### Project 1: Share Market Data Analysis Based on Global Cues
- We will extract news from several stock market websites and analyze it to understand the impact of global cues on the Indian share market.

#### Stock Market Data Extraction

In [1]:
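from dotenv import load_dotenv

# Load environment variables from the .env file one directory up.
# load_dotenv() returns False when the file yields no variables, e.g. if it is missing or empty.
load_dotenv('./../.env')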

False

In [11]:
from langchain_community.document_loaders import WebBaseLoader

# News pages to scrape (note the comma after every entry)
urls = ['https://economictimes.indiatimes.com/markets/stocks/news',
        'https://www.livemint.com/latest-news',
        'https://www.livemint.com/latest-news/page-2',
        'https://www.livemint.com/latest-news/page-3',
        'https://www.moneycontrol.com/']
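Python joins adjacent string literals at compile time, so a missing comma in a list of URLs silently merges two entries into one invalid URL instead of raising an error. The sketch below shows the effect and a small sanity check; `find_suspect_urls` and `sample_urls` are hypothetical names used only for illustration.

```python
from urllib.parse import urlparse

def find_suspect_urls(urls):
    """Return entries that look like two URLs glued together or lack an http(s) host."""
    suspects = []
    for url in urls:
        parsed = urlparse(url)
        glued = url.count("://") > 1  # a second scheme inside one string
        if glued or parsed.scheme not in ("http", "https") or not parsed.netloc:
            suspects.append(url)
    return suspects

# The missing comma after the second literal leaves 2 items, not 3.
sample_urls = ["https://example.com/a",
               "https://example.com/b"
               "https://example.com/c"]
print(len(sample_urls))                # 2
print(find_suspect_urls(sample_urls))  # ['https://example.com/bhttps://example.com/c']
```

Running a check like this on `urls` before loading catches the problem early, since the loader's progress bar only reports how many pages were fetched.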

In [12]:
loader = WebBaseLoader(web_paths=urls)

In [13]:
docs = []
async for doc in loader.alazy_load():
    docs.append(doc)
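The top-level `async for` above relies on Jupyter/IPython already running an event loop for the notebook. In a plain Python script the same loop has to be wrapped in a coroutine and started with `asyncio.run()` (which, conversely, cannot be called from inside a running notebook loop, where `await` works directly); alternatively, the loader's synchronous `load()` method returns the documents as a list. The sketch below shows the script pattern with a hypothetical `fake_alazy_load` generator standing in for `loader.alazy_load()`, so it runs without network access.

```python
import asyncio

async def fake_alazy_load(pages):
    # Hypothetical stand-in for loader.alazy_load(): yields one "document" at a time.
    for page in pages:
        await asyncio.sleep(0)  # yield control, as a real network fetch would
        yield page

async def collect(async_iterable):
    # Drain any async iterator into a list.
    return [item async for item in async_iterable]

sample_docs = asyncio.run(collect(fake_alazy_load(["page one", "page two"])))
print(sample_docs)  # ['page one', 'page two']
```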


In [14]:
def format_docs(docs):
    return "\n\n".join([x.page_content for x in docs])

In [16]:
context = format_docs(docs)



In [17]:
import re

def text_clean(text):
    # Squeeze runs of spaces and tabs into a single space.
    text = re.sub(r'[ \t]+', ' ', text)
    # Collapse runs of blank lines into a single paragraph break.
    text = re.sub(r'\n\s*\n+', '\n\n', text)
    return text.strip()
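Because `\s` matches newlines as well as spaces and tabs, running `re.sub(r'\s+', ' ', ...)` after a paragraph-break step flattens the text to a single line, so the earlier `\n\n+` and `\t+` passes have no visible effect. The comparison below runs both orderings on a made-up sample string; which result is preferable depends on whether the downstream prompt benefits from paragraph boundaries.

```python
import re

sample = "Markets   rally\t\ton\tcues\n\n\n\nSensex up\n\nNifty up"

# Three-pass order: the final \s+ pass also eats the newlines kept by the first pass.
flattened = re.sub(r"\s+", " ", re.sub(r"\t+", "\t", re.sub(r"\n\n+", "\n\n", sample)))

# Alternative: squeeze spaces/tabs only, then normalize blank-line runs.
preserved = re.sub(r"\n\s*\n+", "\n\n", re.sub(r"[ \t]+", " ", sample))

print(repr(flattened))  # 'Markets rally on cues Sensex up Nifty up'
print(repr(preserved))  # 'Markets rally on cues\n\nSensex up\n\nNifty up'
```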

In [18]:
context = text_clean(context)

In [9]:
print(context)



#### Stock Market Data Processing with LLM

In [21]:
### Question Answering using LLM
from langchain_ollama import ChatOllama
from langchain_core.prompts import (SystemMessagePromptTemplate,
                                    HumanMessagePromptTemplate,
                                    ChatPromptTemplate)
from langchain_core.output_parsers import StrOutputParser

# Local Ollama server (default port 11434) and a small Llama 3.2 model
base_url = "http://localhost:11434"
model = 'llama3.2:3b'

llm = ChatOllama(base_url=base_url, model=model)


system = SystemMessagePromptTemplate.from_template(
    "You are a helpful AI assistant who answers user questions based on the provided context.")

prompt = """Answer the user's question based ONLY on the provided context. If you do not know the answer, just say "I don't know".
            ### Context:
            {context}

            ### Question:
            {question}

            ### Answer:"""

prompt = HumanMessagePromptTemplate.from_template(prompt)

messages = [system, prompt]
template = ChatPromptTemplate.from_messages(messages)

# Prompt -> model -> plain string
qna_chain = template | llm | StrOutputParser()

def ask_llm(context, question):
    """Answer `question` using only the text in `context`."""
    return qna_chain.invoke({'context': context, 'question': question})
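In `template | llm | StrOutputParser()`, the `|` operator composes steps so that each one's output becomes the next one's input: the template fills in `context` and `question`, the model returns a message, and the parser extracts its text. The toy `Step` class below is a hypothetical, heavily simplified illustration of that composition idea, not LangChain's actual `Runnable` implementation; `toy_llm` just upper-cases its input in place of a real model.

```python
class Step:
    """Hypothetical minimal stand-in for a composable pipeline step."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # a | b builds a new step that feeds a's output into b
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value):
        return self.fn(value)

toy_template = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
toy_llm = Step(lambda prompt: {"content": prompt.upper()})  # fake "model"
toy_parser = Step(lambda message: message["content"])        # plays the role of StrOutputParser

toy_chain = toy_template | toy_llm | toy_parser
print(toy_chain.invoke({"context": "nifty up", "question": "why?"}))
```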

In [26]:
def chunk_text(text, chunk_size, overlap=100):
    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunks.append(text[i:i + chunk_size])
    return chunks
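Each chunk starts `chunk_size - overlap` characters after the previous one, so with `chunk_size=10_000` and the default `overlap=100` consecutive chunks start 9,900 characters apart and share 100 characters. Sizes here are measured in characters, not model tokens. The worked example below uses small numbers so the boundaries are visible; `sliding_chunks` is a hypothetical variant that also rejects `overlap >= chunk_size`, where the stride would be zero (`range` raises `ValueError`) or negative (`range` silently yields nothing).

```python
def sliding_chunks(text, chunk_size, overlap=100):
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    step = chunk_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

alphabet = "abcdefghijklmnopqrstuvwxy"  # 25 characters
print(sliding_chunks(alphabet, chunk_size=10, overlap=3))
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxy']
```

Windows start at 0, 7, 14 and 21, and each pair of neighbours shares three characters (`hij`, `opq`, `vwx`).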

In [27]:
chunks = chunk_text(context, 10_000)

In [29]:
question = "Extract stock market news from the given text."

# Map step: ask the same extraction question of every chunk
chunk_summary = []
for chunk in chunks:
    response = ask_llm(chunk, question)
    chunk_summary.append(response)

In [30]:
# Preview the answer for the first chunk only
print(chunk_summary[0])

Here are the extracted stock market news:

1. "Market looking for tax cuts, capex in Budget" - Expectations of a budget that will boost market sentiment.
2. "Trump rains on FM's victory parade" - US President Trump's comments on DeepSeek AI technology may impact Indian stocks.
3. "Maha Kumbh stampede updates: PM Modi offers condolences to devotees who lost loved ones" - Prime Minister Modi addresses the Mahakumbh stampede incident and expresses grief for the victims.
4. "Don't believe rumors, situation under control, says CM Yogi Adityanath" - CM Yogi Adityanath reassures that the situation at Maha Kumbh is under control.
5. "Power of patience: How long-term investing drives superior returns" - India's growth at 10% annually while maintaining 15% return ratios is becoming challenging, and a decade ago, only 30 companies formed the CCP portfolio; now, it's halved.
6. "Syngene International Share Price 757.75, 13.75 (1.85%)" - Syngene International shares have increased by 1.85%.
7. "Marke

In [31]:
summary = "\n\n".join(chunk_summary)

In [29]:
# print(summary)

In [32]:
# question = "Write a detailed report in Markdown from the given context."
question = """Write a detailed market news report in markdown format. Think carefully then write the report."""
response = ask_llm(summary, question)
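Taken together, these cells follow a map-reduce pattern: the same extraction question is asked of every chunk (map), the per-chunk answers are joined, and a final question is asked of the joined text (reduce). The sketch below isolates that control flow with a hypothetical `fake_ask` in place of `ask_llm`, so it runs without a model. If the joined answers were themselves too long for the model, one option is to apply the reduce step to smaller groups of answers first and then reduce those results.

```python
from typing import Callable, List

def map_reduce(chunks: List[str],
               ask: Callable[[str, str], str],
               map_question: str,
               reduce_question: str) -> str:
    # Map: one answer per chunk
    partial_answers = [ask(chunk, map_question) for chunk in chunks]
    # Reduce: one final answer over all partial answers
    combined = "\n\n".join(partial_answers)
    return ask(combined, reduce_question)

def fake_ask(context: str, question: str) -> str:
    # Hypothetical stand-in for ask_llm that records which question saw which context.
    return f"{question}({context})"

result = map_reduce(["chunk A", "chunk B"], fake_ask, "extract", "report")
print(result)
```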

In [33]:
import os

os.makedirs("data", exist_ok=True)

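# Write as UTF-8 so non-ASCII characters (e.g. ₹) save correctly on every platform
with open("data/report.md", "w", encoding="utf-8") as f:
    f.write(response)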

In [34]:
with open("data/summary.md", "w", encoding="utf-8") as f:
    f.write(summary)