# Task 2: Analytical Chatbot using RAG

## Objective
The goal of this task is to build a **natural language analytical chatbot** that allows users to query financial market data and understand the **context behind stock price movements**.

The chatbot:
- Understands **user intent semantically**
- Identifies relevant **stock tickers**
- Retrieves **contextual market news and reports**
- Explains **price trends and movements** using Retrieval-Augmented Generation (RAG)

This system is designed to go beyond numerical outputs and provide **human-like financial reasoning**.


In [2]:
import getpass
import os

if "GROQ_API_KEY" not in os.environ:
    os.environ["GROQ_API_KEY"] = getpass.getpass("Enter API Key: ")

Enter API Key: ··········


In [3]:
!pip install -qU langchain-groq

In [4]:
!pip install -qU langchain-community
!pip install -qU langchain

In [5]:
!pip install -qU newspaper3k


In [6]:
!pip install -qU lxml_html_clean
from langchain_groq import ChatGroq
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import PromptTemplate
from langchain_core.documents import Document

import yfinance as yf
from newspaper import Article
import pandas as pd

In [7]:
!pip install -qU chromadb


## Retrieval-Augmented Generation (RAG) Pipeline

This chatbot uses a **Retrieval-Augmented Generation (RAG)** architecture.

### Why RAG?
Financial LLM responses must be:
- Fact-grounded
- Context-aware
- Explainable

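RAG reduces the risk of hallucination by retrieving **relevant external context** before the LLM generates a response. It does not eliminate that risk, so answers should still be checked against the cited sources.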

### RAG Flow
1. User query is semantically embedded
2. Relevant financial news and reports are retrieved from a vector database
3. Retrieved context is injected into the LLM prompt
4. The LLM generates a grounded explanation

This improves **accuracy, relevance, and trustworthiness** of answers.
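The four steps above can be sketched without any external services. The example below is a toy illustration only: it swaps the real embedding model for a bag-of-words vector and the vector database for a Python list, and the names `embed`, `cosine`, `retrieve`, `build_prompt`, and `TOY_DOCS` are hypothetical, not part of this notebook.

```python
import math
from collections import Counter

# Toy corpus standing in for the ingested news articles (illustrative text only).
TOY_DOCS = [
    "apple stock is trading in a range ahead of earnings",
    "microsoft introduced a new ai chip for its data centers",
]

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=1):
    """Steps 1-2: embed the query and return the k most similar documents."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=1):
    """Step 3: inject the retrieved context into the prompt sent to the LLM."""
    context = "\n\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

Step 4 would pass the string returned by `build_prompt` to the LLM. In the notebook itself, the embedding and similarity search are handled by `HuggingFaceEmbeddings` and `Chroma`.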


In [8]:
def get_vectorstore():
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )

    vectordb = Chroma(
        collection_name="financial_news",
        embedding_function=embeddings
    )
    return vectordb


## Data Sources Used

This system uses **unstructured financial news data** as its primary knowledge source.

### News Data
Financial news articles are scraped from reputable sources such as:
- CNBC
- Market-related public reports

These articles are embedded and stored in a vector database to provide **contextual explanations** for stock price movements.

> Note: The focus of this task is **contextual reasoning and explanation**, rather than numerical price prediction.
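Long articles are often split into overlapping chunks before embedding, so that each stored vector covers a focused passage. The notebook embeds each article as a single document, so the following is only a sketch of one possible pre-processing step. The function name `chunk_text` and the default sizes are assumptions chosen for illustration.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into character chunks of at most chunk_size,
    where consecutive chunks share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk could then be wrapped in its own `Document` with the same `source` metadata, so retrieved passages still point back to the original URL.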


In [15]:
NEWS_URLS = [
    "https://www.cnbc.com/2026/01/27/apple-may-be-rangebound-post-earnings-use-options-to-wring-out-profits.html",
    "https://finance.yahoo.com/news/microsoft-maia-200-ai-chip-151114460.html"
]

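def ingest_news():
    """Download each article in NEWS_URLS and add its text to the vector store."""
    vectordb = get_vectorstore()
    docs = []

    for url in NEWS_URLS:
        try:
            article = Article(url)
            article.download()
            article.parse()
        except Exception as exc:  # network errors, blocked requests, parse failures
            print(f"Skipping {url}: {exc}")
            continue

        # Skip pages where no article text could be extracted
        if not article.text.strip():
            print(f"Skipping {url}: no text extracted")
            continue

        docs.append(
            Document(
                page_content=article.text,
                metadata={"source": url}
            )
        )

    # Only index when at least one article was scraped successfully
    if docs:
        vectordb.add_documents(docs)
    return vectordb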

In [16]:

vectordb = ingest_news()

## Intent Detection & Ticker Identification

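The chatbot accepts free-form questions rather than a fixed command syntax. In this notebook, ticker identification and intent detection are both implemented as simple keyword lookups; semantic matching is applied at the retrieval stage, where the query is compared against the news embeddings.

The chatbot:
- Scans the query for known company names
- Maps them to stock tickers (e.g., Apple → AAPL, Microsoft → MSFT)
- Classifies the user intent as one of:
  - Event explanation (e.g., "Why did Apple stock drop?")
  - Trend analysis (e.g., "When did Microsoft go up?")
  - General summary (any other query)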

This enables flexible, natural language interaction.
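Company-name lookup can be made tolerant of misspellings with fuzzy string matching. The sketch below uses the standard library's `difflib`; `COMPANY_TICKERS`, `fuzzy_ticker`, and the `cutoff` of 0.8 are hypothetical names and values for illustration, not part of the notebook.

```python
import difflib

# Illustrative mapping; mirrors the company names used in this notebook.
COMPANY_TICKERS = {
    "apple": "AAPL",
    "microsoft": "MSFT",
    "google": "GOOGL",
    "amazon": "AMZN",
}

def fuzzy_ticker(query, cutoff=0.8):
    """Return the ticker for the first word that closely matches a known
    company name, or None if no word is similar enough."""
    names = list(COMPANY_TICKERS)
    for word in query.lower().split():
        token = word.strip("?.,!\"'")  # drop surrounding punctuation
        matches = difflib.get_close_matches(token, names, n=1, cutoff=cutoff)
        if matches:
            return COMPANY_TICKERS[matches[0]]
    return None
```

Matching whole words, rather than substrings, also avoids false positives such as "pineapple" being read as Apple. A lower `cutoff` tolerates more typos at the cost of more accidental matches.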


In [43]:
def identify_ticker(query):
    mapping = {
        "apple": "AAPL",
        "microsoft": "MSFT",
        "google": "GOOGL",
        "amazon": "AMZN"
    }
    for k, v in mapping.items():
        if k in query.lower():
            return v
    return None


In [42]:
def detect_intent(query):
    q = query.lower()
    if "why" in q or "cause" in q:
        return "explanation"
    elif "when" in q or "date" in q:
        return "trend"
    else:
        return "summary"


## Agent-Based Architecture

The system follows an **agent-style architecture**:

### Research Agent
- Retrieves relevant news and reports
- Performs semantic search over vector database
- Supplies contextual evidence

### Analysis Agent
- Examines historical price movements
- Detects upward or downward trends
- Identifies significant time periods

### Explanation Agent (LLM)
- Combines numerical trends with retrieved news
- Generates user-friendly explanations
- Maintains conversational tone

This modular design improves **scalability and interpretability**.
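The Analysis Agent's task of identifying significant time periods can be illustrated with a moving-average crossover detector, which reports the dates on which a short-term average rises above a long-term one. This is a minimal sketch in plain Python, not the notebook's implementation: `simple_moving_average`, `bullish_crossovers`, and the small default window sizes are hypothetical choices for illustration.

```python
def simple_moving_average(values, window):
    """Return one average per input value; positions before the
    window is full are None."""
    averages = []
    for i in range(len(values)):
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(sum(values[i + 1 - window:i + 1]) / window)
    return averages

def bullish_crossovers(dates, closes, short=2, long=3):
    """Return the dates on which the short SMA crosses above the long SMA."""
    short_sma = simple_moving_average(closes, short)
    long_sma = simple_moving_average(closes, long)

    hits = []
    for i in range(1, len(closes)):
        points = (short_sma[i - 1], long_sma[i - 1], short_sma[i], long_sma[i])
        if any(p is None for p in points):
            continue  # not enough history yet for both averages
        was_below = short_sma[i - 1] <= long_sma[i - 1]
        is_above = short_sma[i] > long_sma[i]
        if was_below and is_above:
            hits.append(dates[i])
    return hits
```

With real data, `dates` and `closes` could come from the index and "Close" column of the price history, with windows such as 20 and 50 days. The resulting dates could give a "when did it go up" question a concrete answer to pass to the LLM.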


In [44]:
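def trend_analysis_agent(ticker):
    # Set auto_adjust explicitly so the result does not depend on the yfinance default
    data = yf.download(ticker, period="6mo", progress=False, auto_adjust=True)
    if data.empty:
        return "No price data available for trend analysis"

    # Depending on the yfinance version, columns may be a MultiIndex even for
    # a single ticker; squeeze() reduces a one-column "Close" frame to a Series
    close = data["Close"].squeeze()
    sma_20 = close.rolling(20).mean()
    sma_50 = close.rolling(50).mean()

    # Short-term average above long-term average suggests upward momentum
    if sma_20.iloc[-1] > sma_50.iloc[-1]:
        return "Stock is in an overall uptrend over the last 6 months"
    else:
        return "Stock is in an overall downtrend over the last 6 months"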


In [50]:
def research_agent(query, vectordb):
    retriever = vectordb.as_retriever(search_kwargs={"k": 3})
    docs = retriever.invoke(query)

    context = "\n\n".join(d.page_content[:700] for d in docs)
    # De-duplicate sources while keeping retrieval order (a set would shuffle them)
    sources = list(dict.fromkeys(d.metadata.get("source", "Unknown") for d in docs))

    return context, sources

## Large Language Model (LLM) Selection

This project uses an **open-weight LLM** (Llama 3.1 8B Instant) accessed via the Groq API.

### Reasons for Model Choice:
- Fast inference suitable for real-time chat
- Strong reasoning ability for explanation tasks
- Cost-effective compared to proprietary APIs

The LLM is responsible for:
- Understanding user intent
- Generating natural language explanations
- Synthesizing numerical trends with textual context


In [51]:
llm = ChatGroq(
    model="llama-3.1-8b-instant",
    temperature=0.2
)

In [52]:
PROMPT = PromptTemplate(
    input_variables=["question", "ticker", "context", "trend"],
    template="""
You are a financial analyst AI.

User Question: {question}
Stock Ticker: {ticker}

Relevant News:
{context}

Trend Signals:
{trend}

Explain:
- What happened
- Why it happened
- Link news with price movement
- Simple language

Answer:
"""
)


## Agent-Based System Design

The chatbot follows a **lightweight agent architecture**, where responsibilities are modularized:

### Research Agent
- Retrieves relevant financial news
- Performs semantic similarity search using vector embeddings

### Reasoning Agent (LLM)
- Interprets user intent
- Connects retrieved news with market impact
- Generates human-readable explanations

This separation improves **clarity, scalability, and reasoning accuracy**.


In [53]:
def analytical_chatbot(query):
    ticker = identify_ticker(query)
    if not ticker:
        return "❌ Unable to identify stock ticker."

    # NOTE: intent is computed here but not yet passed to the prompt
    intent = detect_intent(query)
    context, sources = research_agent(query, vectordb)
    trend = trend_analysis_agent(ticker)

    prompt = PROMPT.format(
        question=query,
        ticker=ticker,
        context=context,
        trend=trend
    )

    response = llm.invoke(prompt).content
    return response + "\n\nSources:\n" + "\n".join(sources)


## Example Queries & Demonstration

Below are example natural language queries demonstrating the chatbot’s ability to:
- Identify stock tickers
- Analyze historical trends
- Explain market movements using external context


In [54]:
analytical_chatbot("Why did Apple stock drop?")




"**What happened:**\nApple's stock (AAPL) has dropped by 15% in just one month. This significant decline has investors concerned.\n\n**Why it happened:**\nThere are a few reasons that contributed to this drop:\n\n1. **Earnings anticipation:** Apple is scheduled to report its earnings on January 29, after the market close. This event is causing traders to position for potential volatility, leading to increased options premiums.\n2. **Range-bound action:** Apple's stock has been trading in a well-defined range between $245 and $265. This range-bound action can lead to increased volatility and price fluctuations.\n3. **Technical setup:** The stock's technical setup, with support at $245 and resistance near $260 and $265, is also contributing to the price drop.\n\n**Linking news with price movement:**\nThe recent news about Microsoft's (MSFT) introduction of its Maia 200 AI chip and new partnerships might have contributed to the overall market sentiment, which could have negatively impacte

In [55]:
analytical_chatbot("When did Microsoft stock go up and why?")




"**What happened:** Microsoft's (MSFT) stock price went up by 5.7% over the past week and 98.6% over the past 3 years.\n\n**Why it happened:** The main reason for this increase is the introduction of Microsoft's new AI chip, Maia 200, which is designed to run large-scale inference workloads across its cloud and data centers. This move is intended to deepen Microsoft's role in AI infrastructure and practical enterprise automation. Additionally, the company is extending Maia-driven AI into real-world uses, such as robotics and embedded payments, which is expected to drive growth and increase the stock price.\n\n**Link news with price movement:** The news about Microsoft's Maia 200 launch and new partnerships is likely the reason for the stock price increase. The introduction of this new AI chip and its applications in various industries is expected to drive growth and increase the stock price.\n\n**Simple language:** Microsoft's stock price went up because the company introduced a new AI

## Limitations & Future Improvements

### Current Limitations
- Sentiment analysis is implicit rather than explicitly scored
- Trend detection relies on a single comparison of the 20-day and 50-day simple moving averages
- News sources are limited to the ingested dataset

### Future Improvements
- Integrate explicit sentiment scoring for prediction support
- Improve trend detection using statistical indicators
- Add real-time news scraping
- Extend agent orchestration for plotting and forecasting

## Conclusion
This analytical chatbot demonstrates how **RAG + LLMs** can transform raw financial data into **explainable, contextual insights**, enabling users to interact with markets using natural language.
