# Web Page Loaders
- Load the web page and **extract** the data using the `WebBaseLoader` and `BeautifulSoup` libraries.
- Use LLM to **extract meaningful data** from the web page

In [1]:
import os

from dotenv import load_dotenv
load_dotenv("./../.env")

True

## Section 1. Share Market Data Analysis Based on Global Cues
- We will **extract the data** from the stock market website and **analyze the data** to understand **the impact of global cues** on the Indian share market. 

### Stock Market Data Extraction

In [3]:
from langchain_community.document_loaders import WebBaseLoader

urls = [
    'https://economictimes.indiatimes.com/markets/stocks/news',
    'https://www.livemint.com/latest-news',
    'https://www.livemint.com/latest-news/page-2'
    'https://www.livemint.com/latest-news/page-3',
    'https://www.moneycontrol.com/'
]

In [6]:
loader = WebBaseLoader(web_paths=urls)
loader

<langchain_community.document_loaders.web_base.WebBaseLoader at 0x1a195451e10>

In [7]:
docs = []

async for doc in loader.alazy_load():
    docs.append(doc)

Fetching pages: 100%|##########| 4/4 [00:02<00:00,  1.57it/s]


In [10]:
def format_docs(docs):
    return "\n\n".join([t.page_content for t in docs])

In [14]:
context = format_docs(docs)

### Remove Multiple New Lines (\n) and Tabs (\t)

In [18]:
import re

def text_clean(text):
    text = re.sub(r'\n\n+', '\n\n', text) # Remove 2 or more into double \n
    text = re.sub(r'\t+', '\t', text)  # Remove more than 1 into single tab
    text = re.sub(r'\s+', ' ', text) # Remove more than 1 into single space
    return text

In [19]:
context = text_clean(context)
print(context)



### Stock Market Data Processing with LLM

In [20]:
from scripts import llm

In [24]:
response = llm.ask_llm(context, "What is today news?")

In [25]:
print(response)

Today's news includes various topics such as:

1. Kevin Hart to perform in India for the first time with ‘Acting My Age’ tour
2. Pittsburgh Steelers sign Chuck Clark, former Baltimore Ravens' safety
3. Goldman Trading Desk’s Clients Start Shorting Speculative Tech
4. Ukraine anti-corruption chief says his agency faces dirty information campaign
5. Anime Stars’ Pioneer Talent Firm Shuts Down
6. Ryan McMahon trade: Yankees address third base woes with Rockies All-Star
7. OYO's Ritesh Agarwal says India-UK FTA deal game-changer for startups, jobs
8. India vs Australia Live Streaming: How to watch Yuvraj Singh vs Brett Lee in WCL
9. Muizzu calls India ‘closest' ally, PM Modi stresses ‘friendship first’
10. Thailand-Cambodia clashes: 20 killed, over 1.3 lakh displaced

These are just a few of the latest news updates available as of the current time. For more information, please visit the MoneyControl website or check their social media channels for the latest updates and breaking news stori

### Catastrophic Forgetting
- LLM puts **more attention** on the **start** part and the **last** part of the data.
- In the **middle of data**, the attention gets **reduced**


<br />

<p align="center">
<img src="./../ASSETS/document-loaders-1.png" />
</p>


In [30]:
response = llm.ask_llm(context, "Extract stock market news from the given text.")

In [31]:
print(response)

Here are some extracted stock market news:

1. Kevin Hart to perform in India for the first time with ‘Acting My Age’ tour
2. Pittsburgh Steelers sign Chuck Clark, former Baltimore Ravens' safety
3. Goldman Trading Desk’s Clients Start Shorting Speculative Tech
4. Ukraine anti-corruption chief says his agency faces dirty information campaign
5. Anime Stars’ Pioneer Talent Firm Shuts Down
6. Ryan McMahon trade: Yankees address third base woes with Rockies All-Star
7. OYO's Ritesh Agarwal says India-UK FTA deal game-changer for startups, jobs
8. Thailand-Cambodia clashes: 20 killed, over 1.3 lakh displaced

These news articles are related to various sectors such as entertainment, sports, and technology. They may not be directly related to the stock market but can provide insight into the overall economic and business landscape.


### Solution:
- Break our text data into smaller chunks
- And then, we can ask question on these individual chunks, so we'll be getting answer from these individual chunks.
- Finally, we can combine all these together and make here a single document.

In [32]:
response = llm.ask_llm(context[:10_000], "Extract stock market news from the given text.")
print(response)

Here are the extracted stock market news:

1. Bajaj Finserv Q1 Results FY26
2. Bajaj Finance Share is Falling
3. Stock Market Crash Today
4. Why Bajaj Finance share is Falling
5. Reliance Power Share Price
6. Why Stock Market is Falling Today
7. Nestle Q1 Results FY26
8. Why IEX Share Price is Falling
9. Persistent Systems Share Price
10. TVS Motor Company Share Price 2774.40-23.30 (-0.84%)
11. These 9 stocks witness decreasing promoter holdings
12. US stocks rise on US-EU trade deal prospects; Intel falls
13. Airtel Africa Q1 profit jumps 20% on Nigeria-led tariff boost
14. Bharti Airtel Share Price 1937.90-1.81 (0.10%)
15. Laurus Labs among 7 stocks that hit 52-week highs and rallied up to 25%
16. IDFC First Bank Q1 Preview: PAT may fall up to 68% YoY amid NIM pressure
17. HDFC Mutual Fund "Tap2Invest" on WhatsApp
18. Should you opt for the old tax regime, or the new tax regime ?
19. Money deadlines in March 2025
20. Top 5 ELSS funds with up to 23% returns in 3 years
21. New NPS fees

In [33]:
from typing import List


def chunk_text(text, chunk_size, overlap=100) -> List[str]:
    """
    :param overlap: 
        Particular context can be covered here in a next chunk properly
    """
    
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i:i + chunk_size])
        
    return chunks

In [34]:
chunks = chunk_text(context, 10_000)

In [37]:
chunks

 " in gains after US trade deal rallyJul 25, 2025, 01:26 PM ISTJapan's Nikkei share average dropped on Friday, halting a two-day advance that brought the index to the brink of a record, as traders locked in gains spurred by a newly inked trade deal with the United States.Load More...Videos'I never went to Epstein island'Trump warns of secondary sanctions on RussiaTrump dismisses Macron's plan to recognise Palestinian stateTrending in MarketsStock Market LIVEInfosys Q1 Results FY26Infosys Q1 Results Live UpdatesPaytm Q1 ResultsSwiggy share priceZomato share priceWipro Q1 Results Live UpdatesVedanta Share NewsTrent Share PriceHDB Financial Share Price Live UpdatesGlobe Civil Projects IPO Grey Market PremiumBelrise Industries SharesMarkets BSENSEBSENSEGainersLosersCommodity Gainers/Commodity Losers/ForexIndian rupee falls 19 paise to 86.59 against US dollar in early trade52W - High52W - LowTop Mutual FundsETFsIPOs: OpenQuarterly ResultsRevenuePATRecommendationsMarkets Calendar25JULITC Ltd

### Create summary for each chunks

In [38]:
question = "Extract stock market news from the given text."
chunk_summary = []

for chunk in chunks:
    response = llm.ask_llm(chunk, question)
    chunk_summary.append(response)

In [39]:
chunk_summary

["Here are some of the extracted stock market news:\n\n1. Bajaj Finserv Q1 Results FY26\n2. Stock Market Crash Today\n3. Why Bajaj Finance share is Falling\n4. Stock Market Today Live\n5. Nestle Q1 Results FY26\n6. Why IEX Share Price is Falling\n7. Persistent Systems share price\n8. Infosys Q1 Results Live Updates\n9. Lodha Developers Share Price\n10. Top 5 ELSS funds with up to 23% returns in 3 years\n11. New NPS fees from January 2025: Full breakdown of charges\n12. Top 5 aggressive hybrid equity funds with up to 38% returns in 1 year\n13. Record profit for Motilal Oswal\n14. Why UK FTA, US trade deal don't matter\n15. 3 sectors to offer better earnings upside\n16. Time correction likely; rebound in Q2?\n17. Multi-asset funds are a smart choice\n18. Swaraj Engines Ltd. breaks out from flag pattern to hit fresh record high in July 2025\n19. Sensex set to hit 115,836 by FY28 on earnings momentum: Ventura\n20. Short-term valuation headwinds? Yes. Long-term growth potential intact? Yes.

In [41]:
summary = "\n\n".join(chunk_summary)

### Generate Final Stock Market Report

In [56]:
question = "Write a detailed report in Markdown from the given context. If there are sentences that include 'I don't know', just ignore or skip the sentences."
response = llm.ask_llm(summary, question)

In [57]:
print(response)

## Stock Market News Report
### Table of Contents
1. [Introduction](#introduction)
2. [Company-wise Stock Performance](#company-wise-stock-performance)
3. [Market Trend Analysis](#market-trend-analysis)

## Introduction
The given context is a subscription confirmation email with links to various features, including a gift option, WhatsApp updates, and access to JM Financial Services.

## Company-wise Stock Performance

### Listed Companies:
Below are some of the listed companies along with their stock prices mentioned in the provided context:

*   **Bajaj Finserv:** Q1 Results FY26: Cons profit jumps 30% YoY to Rs 2,789 crore; revenue up 13%.
*   **Nestle India:** Q3 Results
*   **Infosys:** Q1 Results FY26, Infosys Q1 Results 2025.
*   **HDB Financial Services:** IPO allotment status.
*   **TCS:** Q4 Results.

### Other Companies:
Below are some of the other companies mentioned along with their stock prices:

*   Adani Enterprises share price
*   Adani Ports share price
*   Apollo Hos

In [58]:
import os
os.makedirs("data", exist_ok=True)

with open('data/report.md', 'w') as f:
    f.write(response)

### Generate Final Market News

In [63]:
question = """
Write a detailed market news report in Markdown format. Think carefully then write the report.

If any sentence includes 'I don't know', it should be ignored or skipped.
And don't write "I don't know" in the report!
"""
response = llm.ask_llm(summary, question)

In [64]:
print(response)

# Market News Report
## Sectoral Performance and Top Gainers/Losers

The Indian stock market witnessed a mixed performance on [current date], with some sectors experiencing significant gains, while others slipped into the red.

### Top Gainers:

1. **Bajaj Finvest** reported a 30% YoY increase in profits to Rs 2,789 crore, driven by robust revenue growth of 13%.
2. **Adani Green** announced its Q2 results, with a significant jump in earnings.
3. **Nestle India** posted impressive Q1 results, showcasing the company's resilience in a challenging market environment.

### Top Losers:

1. **Zomato** shares plummeted 3% after a public dispute within the promoter family affected investor sentiment.
2. **Swiggy** share price also took a hit due to similar reasons.
3. **HDB Financial Services** shares fell 5% as investors reacted to concerns over the company's financial health.

### Market Watch:

* The Indian rupee continued its downward trend against the US dollar, falling 19 paise to 86.59 i

In [65]:
import os
os.makedirs("data", exist_ok=True)

with open('data/news_report.md', 'w') as f:
    f.write(response)
    print("OK")

OK
