## Webpage Loaders
- Load the webpage and extract the data using the `WebBaseLoader` and `BeautifulSoup` libraries.
- Use LLM to extract meaningful data from the webpage.

### Project 1: Share Market Data Analysis Based on Global Cues
- We will extract the data from the stock market website and analyze the data to understand the impact of global cues on the Indian share market.

#### Stock Market Data Extraction

In [1]:
from dotenv import load_dotenv

load_dotenv('../../.env')

True

In [15]:
from langchain_community.document_loaders import WebBaseLoader

urls = ['https://finance.yahoo.com/topic/latest-news/',
        'https://finance.yahoo.com/topic/stock-market-news/',
        'https://finance.yahoo.com/topic/crypto/'
        'https://finance.yahoo.com/topic/economic-news/',
        'https://finance.yahoo.com/topic/earnings/']

In [16]:
loader = WebBaseLoader(web_paths=urls)

In [17]:
docs = []
async for doc in loader.alazy_load():
    docs.append(doc)

Fetching pages: 100%|##########| 4/4 [00:02<00:00,  1.39it/s]


In [18]:
def format_docs(docs):
    return "\n\n".join([x.page_content for x in docs])

In [19]:
context = format_docs(docs)

In [20]:
# print(context)
# context

import re

def text_clean(text):
    text = re.sub(r'\n\n+', '\n\n', text)
    text = re.sub(r'\t+', '\t', text)
    text = re.sub(r'\s+', ' ', text)
    return text

In [21]:
context = text_clean(context)

In [22]:
print(context)



#### Stock Market Data Processing with LLM

In [23]:
from scripts import llm

In [24]:
# response = llm.ask_llm(context, "What is todays news?")
response = llm.ask_llm(context, "Extract stock market news from the given text.")


In [25]:
print(response)

Here is the extracted stock market news:

**Top Gainers**

* PPCB Propanc Biopharma, Inc. +12.00 (+3,999,899.80%)
* NXT Nextracker Inc. +9.62 (+24.28%)
* EAT Brinker International, Inc. +25.16 (+16.28%)
* FFIV F5, Inc. +30.74 (+11.40%)

**Top Losers**

* MANH Manhattan Associates, Inc. -72.26 (-24.49%)
* TEVA Teva Pharmaceutical Industries Limited -3.00 (-13.91%)
* ASTS AST SpaceMobile, Inc. -2.42 (-12.02%)
* DHR Danaher Corporation -23.92 (-9.65%)
* ASH Ashland Inc. -7.02 (-9.87%)

**Most Active**

* NVDA NVIDIA Corporation -5.20 (-4.03%)
* RGTI Rigetti Computing, Inc. -0.42 (-3.21%)

**Trending Tickers**

* SBUX Starbucks Corporation +8.17 (+8.14%)
* TSLA Tesla, Inc. -8.99 (-2.26%)

**Economic Events**

* Colombia: no important events at this time

Note that there are also general market news and indices mentioned in the text, but I have only extracted the specific stock market news related to individual stocks, companies, and economic events.


In [26]:
response = llm.ask_llm(context[:10_000], "Extract stock market news from the given text.")

In [27]:
print(response)

Here are some stock market news extracted from the provided context:

1. Levi Strauss beats quarterly revenue estimates, with $1.84 billion in revenue for the three months ended Dec. 1.
2. Google and ServiceNow announce a new partnership focused on enterprise AI growth.
3. MSFT (Microsoft) beats quarterly revenue estimates, but Azure growth slows.
4. TSLA (Tesla) misses its mark in Q4 earnings, with full-year adjusted net income dropping 23%.
5. Nvidia falls 4% after reports of potential additional curbs on China sales.
6. Intel will report its Q4 earnings as the company seeks a permanent CEO.
7. Apple will report its Q1 earnings after seeing its stock hit with multiple downgrades.

Note that these are just some of the extracted news, and there may be other relevant information in the provided context.


In [28]:
def chunk_text(text, chunk_size, overlap=100):
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i:i + chunk_size])
    return chunks

In [29]:
chunks = chunk_text(context, 10_000)

In [30]:
question = "Extract stock market news from the given text."

chunk_summary = []
for chunk in chunks:
    response = llm.ask_llm(chunk, question)
    chunk_summary.append(response)

In [31]:
for chunk in chunk_summary:
    print(chunk)
    print("\n\n")
    break

Here's the extracted stock market news:

* Levi Strauss beats quarterly revenue estimates with $1.84 billion in revenue, exceeding analysts' average estimate of $1.73 billion.
* Google and ServiceNow announce new partnership focused on enterprise AI growth.
* Tesla Q4 earnings miss the mark, as full-year adjusted net income drops 23%.
* Fed rate decision: Dow, Nasdaq, S&P 500 slip, Nvidia falls as Fed leaves rates unchanged.
* Intel will report its Q4 earnings after the bell Thursday.
* Apple to report Q1 earnings a week after stock hit with multiple downgrades.
* 'I prefer across the board': Trump's top tariff man favors broad duties for a range of issues — including AI
* IBM beats profit estimates as software business surges on AI shift.
* TSLA (Tesla) reported fourth quarter earnings after the bell on Wednesday.





In [38]:
len(chunk_summary)

8

In [32]:
summary = "\n\n".join(chunk_summary)

In [29]:
# print(summary)

In [33]:
# question = "Write a detailed report in Markdown from the given context."
question = """Write a detailed market news report in markdown format. Think carefully then write the report."""
response = llm.ask_llm(summary, question)

In [39]:
import os

os.makedirs("data", exist_ok=True)

with open("data/report.md", "w") as f:
    f.write(response)

In [40]:
with open("data/summary.md", "w") as f:
    f.write(summary)