## Webpage Loaders
- Load the webpage and extract the data using the `WebBaseLoader` and `BeautifulSoup` libraries.
- Use LLM to extract meaningful data from the webpage.

### Project 1: Share Market Data Analysis Based on Global Cues
- We will extract the data from the stock market website and analyze the data to understand the impact of global cues on the Indian share market.

#### Stock Market Data Extraction

In [2]:
from dotenv import load_dotenv

load_dotenv('./../.env')

True

In [None]:
from langchain_community.document_loaders import WebBaseLoader

urls = ['https://economictimes.indiatimes.com/markets/stocks/news',
        'https://www.livemint.com/latest-news',
        'https://www.livemint.com/latest-news/page-2'
        'https://www.livemint.com/latest-news/page-3',
        'https://www.moneycontrol.com/']

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [4]:
loader = WebBaseLoader(web_paths=urls)

In [5]:
docs = []
async for doc in loader.alazy_load():
    docs.append(doc)

Fetching pages: 100%|##########| 4/4 [00:00<00:00,  5.75it/s]


In [6]:
def format_docs(docs):
    return "\n\n".join([x.page_content for x in docs])

In [7]:
context = format_docs(docs)

In [8]:
# print(context)
# context

import re

def text_clean(text):
    text = re.sub(r'\n\n+', '\n\n', text)
    text = re.sub(r'\t+', '\t', text)
    text = re.sub(r'\s+', ' ', text)
    return text

In [9]:
context = text_clean(context)

In [10]:
print(context)



#### Stock Market Data Processing with LLM

In [11]:
from scripts import llm

In [12]:
# response = llm.ask_llm(context, "What is todays news?")
response = llm.ask_llm(context, "Extract stock market news from the given text.")


In [13]:
print(response)

Here is the extracted stock market news:

1. "OpenAI’s COO says AI needs new devices, teases future beyond screens"
2. "China’s industrial edge is reshaping the global balance of power — and the US is falling behind"

Additionally, there are some market-related announcements such as:
 
1. "Sensex Today"
2. "India GDP Data"
3. " Suzlon Energy Share Price"
4. "Gold Loan"

There are also some news articles related to stocks such as:

1. "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Others Trending Topics more Sensex Today India GDP Data Suzlon Energy Share Price Gold Loan Covid cases in India Karnataka Muslim Quota Bill Shashi Tharoor Haridwar Fire Donald Trump AP DSC Hall Ticket News"
2. "Topper Learning Quick Links more About Us Contact Us Advisory Alert Advertise with Us Support Disclaimer Privacy Policy Cookie Policy Terms & Conditions Financial Terms (Glossary) Sitemap Investors Download MC Apps:"


In [14]:
response = llm.ask_llm(context[:10_000], "Extract stock market news from the given text.")

In [15]:
print(response)

Here are the extracted stock market news:

1. **Alkem Laboratories shares drop 5% after Q4 results**: Alkem Laboratories shares fell 5% after Q4FY25 results showed a modest 4.2% profit rise to Rs 306 crore and margin contraction to 12.4%.
2. **Bharat Dynamics shares rally 6% as govt plans Rs 40,000 crore defence R&D push**: Bharat Dynamics shares jumped 5.6% after reports said the government plans to raise defence R&D spending to Rs 40,000 crore next year and double it over five years.
3. **Mazagon Dock Shipbuilders Share Price**: Mazagon Dock Shipbuilders share price fell by 6.73%.
4. **Emkay Global upgrades SAIL to buy; YES Securities sees 13% upside in VA Tech Wabag**: Brokerages are optimistic about Samvardhana Motherson International, SAIL, and VA Tech Wabag, citing strong growth prospects and favorable sectoral trends.
5. **Sensex falls 300 pts, Nifty below 24,750 amid caution ahead of GDP data**: Indian equity markets traded with mixed signals on Friday, as positive institutiona

In [16]:
def chunk_text(text, chunk_size, overlap=100):
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i:i + chunk_size])
    return chunks

In [17]:
chunks = chunk_text(context, 10_000)

In [1]:
question = "Extract stock market news from the given text."

chunk_summary = []
for chunk in chunks:
    response = llm.ask_llm(chunk, question)
    chunk_summary.append(response)

NameError: name 'chunks' is not defined

In [26]:
for chunk in chunk_summary:
    print(chunk)
    print("\n\n")
    break

Here are the extracted stock market news:

1. **Avenue Supermarts Share Price**: The stock saw a sharper decline and is now trading below its 200-day exponential moving average (DEMA). It had attempted to hold above this level but dropped following a disappointing second-quarter performance.
2. **Elcid Share Price Surge**: Elcid's share price skyrocketed 66,92,535%, climbing from Rs 3.53 to Rs 2,36,250. This impressive surge has made Elcid India's most expensive stock, surpassing MRF, which closed at Rs 1,22,576 per share on Tuesday.
3. **EFC secures SM REIT approval**: EFC (I) Limited's subsidiary received SEBI approval for SM REIT, enabling a ₹500 crore initial offer. The company reported a 230.26% profit increase, boosting its growth potential.
4. **Auto stocks cooling down**: Cooling is a part of long journey of structural change; 7 auto stocks with an upside potential of up to 38%.
5. **NSE crosses 20 crore client accounts**: The National Stock Exchange (NSE) announced it had exce

In [27]:
summary = "\n\n".join(chunk_summary)

In [29]:
# print(summary)

In [43]:
# question = "Write a detailed report in Markdown from the given context."
question = """Write a detailed market news report in markdown format. Think carefully then write the report."""
response = llm.ask_llm(summary, question)

In [44]:
import os

os.makedirs("data", exist_ok=True)

with open("data/report.md", "w") as f:
    f.write(response)

In [45]:
with open("data/summary.md", "w") as f:
    f.write(summary)