# FinAgentX — Multi-Agent Financial Analysis System

**Author:** Soura Biswal  
**Course:** Natural Language Processing & Generative AI - University of San Diego  
**Deliverable:** Final Project — Code Notebook  
**Date:** September 2, 2025 to October 20, 2025  
**GitHub repo:** https://github.com/BiswalSoura/FinAgentX-Multi-Agent-AI-System.git

---

**Short summary:**  
FinAgentX is a modular, multi-agent pipeline that ingests financial news and market data, classifies and extracts key facts (EPS, revenue), summarizes articles, routes them to specialist agents (earnings & macro), evaluates outputs, and refines the results. The notebook demonstrates the full system end-to-end using lightweight, free tools (Python, yfinance, and a small Kaggle-derived dataset).

## Table of contents

1. Project summary & objectives  
2. Dataset description  
3. Architecture & agent design  
4. Environment & how to run (reproducibility)  
5. Data fetch & samples (prices + news)  
6. Pipeline demo — process articles end-to-end  
7. Routing & specialist agents demonstration  
8. Evaluator & Optimizer demonstration  
9. Memory & learning across runs  
10. Limitations, ethical notes & AI disclosure  
11. How to run (quick start) & submission artifacts


## 1. Project summary & objectives

**Objective:** Build an agentic system that can research stocks and provide structured, explainable research notes by combining news and market signals.  
**Primary learning goals:** agent orchestration, prompt-chaining workflows, routing to specialists, evaluator→optimizer loops, and simple persistent memory.

**Deliverables demonstrated in this notebook:**
- Data ingestion (prices + news)
- Prompt chaining (preprocess → classify → extract → summarize)
- Routing to EarningsAgent and MarketAgent
- Evaluator + Optimizer loop
- Memory persistence and a final human-readable research note


## 2. Dataset description

**Main dataset used:** `Fin_Cleaned.csv`  
**Source:** Kaggle – Financial News Sentiment Dataset (Lightweight Version)  
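**Size:** approximately 15 MB  
**Variables** (as they appear in the CSV header):  
- `Date_published`: publication date of the article  
- `Headline`: news headline  
- `Synopsis`: short synopsis of the article  
- `Full_text`: full article text  
- `Final Status`: sentiment label (e.g., Positive, Negative)  
- `Unnamed: 0`: leading index column with an empty header (pandas assigns this name on read)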

**Sample Data Stored:**  
- `data/sample/*.json` → 5–10 news articles per ticker (AAPL, TSLA)  
- `data/sample/*.csv` → price snapshots from `yfinance`  
- `data/memory/memory.json` → per-run article results and evaluation feedback

The data has been simplified for CPU-only use (no large-scale model training), so the setup demonstrates the **architecture and reasoning pipeline** without requiring a GPU.
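
Before running the pipeline, it can help to confirm that a local copy of `Fin_Cleaned.csv` has the columns seen in the `data.head(5)` preview and to check the label balance. The sketch below uses only the standard library; the helper name `summarize_news_csv`, the expected-column set, and treating `Final Status` as the label column are assumptions based on that preview, not part of the FinAgentX code.

```python
import csv
import io
from collections import Counter

# Columns observed in the notebook's data.head() preview (assumed schema)
EXPECTED_COLUMNS = {"Date_published", "Headline", "Synopsis", "Full_text", "Final Status"}


def summarize_news_csv(stream, label_column="Final Status"):
    """Return (missing_columns, label_counts) for a CSV text stream.

    Hypothetical helper: compares the header with EXPECTED_COLUMNS and
    counts how often each sentiment label appears.
    """
    reader = csv.DictReader(stream)
    header = set(reader.fieldnames or [])
    missing = sorted(EXPECTED_COLUMNS - header)
    counts = Counter(row[label_column] for row in reader) if label_column in header else Counter()
    return missing, counts


# Tiny illustrative sample; for the real file pass
# open("../data/Fin_Cleaned.csv", newline="", encoding="utf8") instead.
sample = io.StringIO(
    ",Date_published,Headline,Synopsis,Full_text,Final Status\n"
    "0,2022-06-21,Banks holding on,Synopsis A,Text A,Negative\n"
    "1,2022-04-19,Bank of Baroda,Synopsis B,Text B,Positive\n"
    "2,2022-05-27,Karnataka,Synopsis C,Text C,Positive\n"
)
missing, counts = summarize_news_csv(sample)
print(missing, dict(counts))
```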


## 3. Architecture & agent design

**High-level flow:**

1. **Fetcher**: collects price data (yfinance) and sample news (local JSON files in `data/sample/`).  
2. **Preprocessor**: cleans text, normalizes dates.  
3. **Classifier**: rule-based classifier (earnings / macro / news).  
4. **Extractor**: regex rules to extract EPS, revenue, and presence flags.  
5. **Summarizer**: templates that produce short and long summaries + sentiment.  
6. **Router**: routes article to specialized agents (earnings, market).  
7. **Specialists**: context-specific logic (e.g., earnings comparison).  
8. **Evaluator**: heuristic scoring (factuality, completeness, clarity).  
9. **Optimizer**: one-step refinement based on feedback.  
10. **Memory store**: JSON file storing per-run notes for learning across runs.
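
The ten stages above form a linear chain in which each agent reads and enriches a shared record. The sketch below shows that prompt-chaining shape with deliberately trivial stage bodies; the function names, the plain `dict` record, and the keyword rule are illustrative assumptions, not the actual `src/` modules.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Stage = Callable[[Record], Record]


def preprocess(rec: Record) -> Record:
    # Collapse runs of whitespace in the article body
    rec["text"] = " ".join(str(rec.get("text", "")).split())
    return rec


def classify(rec: Record) -> Record:
    # Toy whole-word keyword rule standing in for the rule-based classifier
    words = set(str(rec["text"]).lower().split())
    rec["type"] = "earnings" if {"eps", "revenue"} & words else "news"
    return rec


def summarize(rec: Record) -> Record:
    # Template-based short summary
    rec["summary_short"] = f"{rec.get('title', '')}: {rec['type']} article."
    return rec


def run_pipeline(article: Record, stages: List[Stage]) -> Record:
    """Apply each stage in order to a shallow copy of the article."""
    rec = dict(article)
    for stage in stages:
        rec = stage(rec)
    return rec


# The extractor, router, evaluator, and optimizer would be appended here the same way
PIPELINE: List[Stage] = [preprocess, classify, summarize]

out = run_pipeline({"title": "Q3 results", "text": "EPS   of $1.20 beat  estimates"}, PIPELINE)
print(out["type"], "|", out["text"])
```

Keeping every stage as a plain `Record -> Record` function is one way to honor the "small and testable" design rationale: each agent can be unit-tested in isolation and reordered without touching the others.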

**Design rationale:** keep agents small and testable, and prefer deterministic heuristics so that coursework results are reproducible.
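
As an illustration of the deterministic-heuristics approach, here is a sketch of a keyword classifier and a regex extractor for EPS and revenue. The keyword lists, regular expressions, and confidence values are assumptions chosen for this example; the `mentions_beat` / `mentions_miss` keys mirror the `extracted` field in the pipeline output later in this notebook, but the repository's actual rules may differ.

```python
import re

# Hypothetical keyword lists for the "earnings" and "macro" classes
EARNINGS_WORDS = ("eps", "earnings per share", "revenue", "quarterly results")
MACRO_WORDS = ("inflation", "interest rate", "gdp", "central bank", "federal reserve")

# Figures such as "EPS of $1.52" or "revenue of $89.5 billion"
EPS_RE = re.compile(r"\bEPS\b[^$\d]{0,20}\$?(\d+(?:\.\d+)?)", re.IGNORECASE)
REVENUE_RE = re.compile(
    r"\brevenue\b[^$\d]{0,20}\$?(\d+(?:\.\d+)?)\s*(billion|million)?", re.IGNORECASE
)
BEAT_RE = re.compile(r"\b(?:beat|beats|topped)\b", re.IGNORECASE)
MISS_RE = re.compile(r"\b(?:miss|missed|misses)\b", re.IGNORECASE)


def count_hits(text: str, words: tuple) -> int:
    # Whole-word matches only, so "eps" does not fire inside "steps"
    return sum(bool(re.search(rf"\b{re.escape(w)}\b", text, re.IGNORECASE)) for w in words)


def classify(text: str) -> tuple:
    """Return (type, confidence); the confidence values are illustrative."""
    earnings, macro = count_hits(text, EARNINGS_WORDS), count_hits(text, MACRO_WORDS)
    if earnings and earnings >= macro:
        return "earnings", 0.9
    if macro:
        return "macro", 0.8
    return "news", 0.7


def extract(text: str) -> dict:
    """Pull EPS/revenue figures plus beat/miss presence flags."""
    out = {"mentions_beat": bool(BEAT_RE.search(text)),
           "mentions_miss": bool(MISS_RE.search(text))}
    if (m := EPS_RE.search(text)):
        out["eps"] = float(m.group(1))
    if (m := REVENUE_RE.search(text)):
        out["revenue"] = float(m.group(1))
        out["revenue_unit"] = m.group(2)  # None when no unit follows the number
    return out


earnings_text = "Apple's EPS of $1.52 beat estimates as revenue of $89.5 billion rose."
print(classify(earnings_text), extract(earnings_text))
```

Because every rule is a fixed keyword or pattern, the same article always yields the same label and figures, which is what makes the runs reproducible.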


## 4. Environment & reproduction

**Python version:** 3.10+ recommended

**Install dependencies (from repo root):**
```bash
python -m venv venv
# Windows
venv\Scripts\activate
# macOS / Linux
# source venv/bin/activate
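pip install -r requirements.txt
```

## 5. Data fetch & samples (prices + news)

```python
import pandas as pd

# Main news dataset (path is relative to the notebooks/ folder)
data = pd.read_csv("../data/Fin_Cleaned.csv")
data.head(5)
```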


| Unnamed: 0 | Date_published | Headline | Synopsis | Full_text | Final Status |
|---|---|---|---|---|---|
| 0 | 2022-06-21 | Banks holding on to subsidy share, say payment... | The companies have written to the National Pay... | ReutersPayments companies and banks are at log... | Negative |
| 1 | 2022-04-19 | Digitally ready Bank of Baroda aims to click o... | At present, 50% of the bank's retail loans are... | AgenciesThe bank presently has 20 million acti... | Positive |
| 2 | 2022-05-27 | Karnataka attracted investment commitment of R... | Karnataka is at the forefront in attracting in... | PTIKarnataka Chief Minister Basavaraj Bommai.K... | Positive |
| 3 | 2022-04-06 | Splitting of provident fund accounts may be de... | The EPFO is likely to split accounts only at t... | Getty ImagesThe budget for FY22 had imposed in... | Negative |
| 4 | 2022-06-14 | Irdai weighs proposal to privatise Insurance I... | Set up in 2009 as an advisory body, IIB collec... | AgenciesThere is a view in the insurance indus... | Positive |


```python
import glob

# pandas (pd) is already imported in a previous cell; reuse it here.
# sorted() makes the concatenation order deterministic (glob order is arbitrary).
files = sorted(glob.glob("../data/sample/AAPL_prices_1mo_*.csv"))
if not files:
    raise FileNotFoundError(
        "No files matched pattern ../data/sample/AAPL_prices_1mo_*.csv. "
        "Check the path and filename pattern."
    )

# Read all matched CSVs and concatenate them into one price table
df_list = [pd.read_csv(f, on_bad_lines='skip') for f in files]
apple = pd.concat(df_list, ignore_index=True)

apple.tail(3)
```


| Unnamed: 0 | Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|---|
| 18 | 2025-10-09 00:00:00-04:00 | 257.809998 | 258.0 | 253.139999 | 254.039993 | 38322000 | 0.0 | 0.0 |
| 19 | 2025-10-10 00:00:00-04:00 | 254.940002 | 256.380005 | 244.0 | 245.270004 | 61999100 | 0.0 | 0.0 |
| 20 | 2025-10-13 00:00:00-04:00 | 249.380005 | 249.690002 | 245.559998 | 247.660004 | 38104400 | 0.0 | 0.0 |
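
The system is meant to combine news with market signals. One simple signal that could be derived from these snapshots is the day-over-day return of the `Close` column; the sketch below computes it with the standard library from the three closing prices shown above. The function name and the idea of passing this value to the specialist agents are assumptions for illustration.

```python
def daily_returns(closes):
    """Simple day-over-day returns: close[t] / close[t-1] - 1."""
    return [curr / prev - 1 for prev, curr in zip(closes, closes[1:])]


# Close prices for 2025-10-09, 2025-10-10 and 2025-10-13 from apple.tail(3)
closes = [254.039993, 245.270004, 247.660004]
rets = daily_returns(closes)
print([f"{r:.2%}" for r in rets])
```

With the `apple` DataFrame loaded above, `apple["Close"].pct_change()` yields the same values for these rows as a pandas Series.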


## 6. Pipeline demo: process articles end-to-end

```python
import sys
from pathlib import Path
import json

# Ensure the project root (parent of the notebooks folder) is on sys.path
proj_root = Path.cwd().parent
if str(proj_root) not in sys.path:
    sys.path.insert(0, str(proj_root))

from src.orchestrator import run_articles

with open("../data/sample/news_AAPL.json", "r", encoding="utf8") as f:
    articles = json.load(f)

results = run_articles(articles)
# Safe display: (number of results, keys of the first result), or (0, None) if empty
(len(results), list(results[0].keys())) if results else (0, None)
```


```text
Saved end-to-end results to D:\University of San Diego\Natural Language processing and GenAI\FinAgentX Multi Agent AI System\FinAgentX-Multi-Agent-AI-System\data\sample\out\analysis_end2end.json

(5,
 ['title',
  'type',
  'confidence',
  'extracted',
  'summary_short',
  'summary_long',
  'sentiment',
  'specialist',
  'evaluation'])
```
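
The nine keys printed above form the record that every stage contributes to. A `TypedDict` is one lightweight way to document that contract and catch missing fields early. The value types below are inferred from the sample output and are assumptions, not declarations taken from `src/orchestrator`; `missing_keys` is a hypothetical helper.

```python
from typing import TypedDict, get_type_hints


class Evaluation(TypedDict, total=False):
    score: float
    feedback: str
    post_refine: dict  # assumed to appear only when a refinement step ran


class ArticleResult(TypedDict):
    title: str
    type: str            # e.g. "earnings", "macro" or "news"
    confidence: float
    extracted: dict      # e.g. {"mentions_beat": False, "mentions_miss": False}
    summary_short: str
    summary_long: str
    sentiment: str       # e.g. "neutral"
    specialist: dict     # e.g. {"conclusion": "info", "note": "general news"}
    evaluation: Evaluation


REQUIRED_KEYS = set(get_type_hints(ArticleResult))


def missing_keys(result: dict) -> set:
    """Return the schema keys absent from one pipeline result."""
    return REQUIRED_KEYS - set(result)


example = {
    "title": "eMudhra: Should you exit stock after decent listing or hold for long term?",
    "type": "news",
    "confidence": 0.7,
    "extracted": {"mentions_beat": False, "mentions_miss": False},
    "summary_short": "short summary text",
    "summary_long": "long summary text",
    "sentiment": "neutral",
    "specialist": {"conclusion": "info", "note": "general news"},
    "evaluation": {"score": 1.0, "feedback": "OK"},
}
print(missing_keys(example))  # set() when every key is present
```

Running `[missing_keys(r) for r in results]` on the notebook's output would flag any article whose record came back incomplete.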

## 7. Routing & specialist agents demonstration

```python
import pprint

# Inspect the first processed article, including the specialist's verdict
pprint.pprint(results[0])
```

```text
{'confidence': 0.7,
 'evaluation': {'feedback': 'OK', 'score': 1.0},
 'extracted': {'mentions_beat': False, 'mentions_miss': False},
 'sentiment': 'neutral',
 'specialist': {'conclusion': 'info', 'note': 'general news'},
 'summary_long': 'eMudhra: Should you exit stock after decent listing or hold '
                 'for long term? — Key figures not found. Sentiment: neutral. '
                 'Full analysis: Getty ImagesAgainst the issue price of Rs '
                 '256, shares of eMudhra listed at a premium of 6 per cent at '
                 'Rs 271 on BSE and a premium of 5 per cent at Rs 270 on the '
                 'National Stock Exchange (NSE).RelatedStocks in the news: '
                 'eMudhra, Bharat Dynamics, HFCL, GOCL Corp and BataD-Street '
                 'debut: eMudhra lists at 6% premium on BSEeMudhra to have a '
                 "soft debut on Dalal Street? Here's what grey market is "
                 'signalingContrary to market expectations, eMudhra debut
```
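
The printed record shows a `news`-type article handed to a generic specialist that returns `{'conclusion': 'info', 'note': 'general news'}`. A dispatch table keyed on the classifier's `type` is one simple way to implement such routing. In the sketch below only the `news` fallback reproduces the observed output; `earnings_agent` and `market_agent` and their return values are hypothetical stand-ins for EarningsAgent and MarketAgent.

```python
from typing import Callable, Dict


def earnings_agent(result: dict) -> dict:
    # Hypothetical EarningsAgent: compare beat/miss flags from the extractor
    ext = result.get("extracted", {})
    if ext.get("mentions_beat") and not ext.get("mentions_miss"):
        return {"conclusion": "positive", "note": "earnings beat"}
    if ext.get("mentions_miss") and not ext.get("mentions_beat"):
        return {"conclusion": "negative", "note": "earnings miss"}
    return {"conclusion": "mixed", "note": "no clear beat/miss signal"}


def market_agent(result: dict) -> dict:
    # Hypothetical MarketAgent: echo the article sentiment as a macro view
    return {"conclusion": result.get("sentiment", "neutral"), "note": "macro context"}


def general_agent(result: dict) -> dict:
    # Matches the specialist output shown above for a "news" article
    return {"conclusion": "info", "note": "general news"}


ROUTES: Dict[str, Callable[[dict], dict]] = {
    "earnings": earnings_agent,
    "macro": market_agent,
}


def route(result: dict) -> dict:
    """Attach the specialist's verdict; unknown types fall back to general news."""
    agent = ROUTES.get(result.get("type"), general_agent)
    return {**result, "specialist": agent(result)}


print(route({"type": "news", "extracted": {}})["specialist"])
```

A dictionary lookup keeps the router free of `if`/`elif` chains, so adding a specialist is a one-line change to `ROUTES`.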

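## 8. Evaluator & Optimizer demonstration

```python
# Compare the evaluator's verdict before and after the optimizer step
sample_eval = results[0]['evaluation']
print("Before refinement:", sample_eval)

# 'post_refine' is present only if a refinement step was recorded
if 'post_refine' in sample_eval:
    print("After refinement:", sample_eval['post_refine'])
```

```text
Before refinement: {'score': 1.0, 'feedback': 'OK'}
```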
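
In this run the score is 1.0 and no `post_refine` entry was printed. The sketch below shows how a heuristic evaluator and a one-step optimizer could interact: the evaluator scores factuality, completeness, and clarity, and the optimizer acts only when the score falls below a threshold, storing the re-evaluation under `post_refine`. The three checks, their equal weighting, the 0.8 threshold, and the refinement rule are assumptions; only the `score`, `feedback`, and `post_refine` keys come from the cell above.

```python
def evaluate(result: dict) -> dict:
    """Heuristic score in [0, 1] from three equally weighted checks (assumed)."""
    summary = result.get("summary_short", "")
    numbers = [v for v in result.get("extracted", {}).values()
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    checks = {
        # factuality: every extracted number appears in the source text
        "factuality": all(str(v) in result.get("text", "") for v in numbers),
        # completeness: the summary states a sentiment, as the templates do
        "completeness": "Sentiment:" in summary,
        # clarity: the summary is non-empty and short enough to scan
        "clarity": 0 < len(summary) <= 200,
    }
    failed = [name for name, ok in checks.items() if not ok]
    score = round(sum(checks.values()) / len(checks), 2)
    return {"score": score, "feedback": "OK" if not failed else "fix: " + ", ".join(failed)}


def optimize(result: dict, threshold: float = 0.8) -> dict:
    """One refinement step: patch the failed checks, then re-evaluate once."""
    evaluation = evaluate(result)
    if evaluation["score"] >= threshold:
        return {**result, "evaluation": evaluation}
    refined = dict(result)
    summary = refined.get("summary_short", "") or refined.get("title", "")
    if "completeness" in evaluation["feedback"]:
        summary += f" Sentiment: {refined.get('sentiment', 'neutral')}."
    refined["summary_short"] = summary[:200]
    evaluation["post_refine"] = evaluate(refined)
    return {**refined, "evaluation": evaluation}


draft = {"title": "Q3 results", "text": "EPS of $1.52", "extracted": {"eps": 1.52},
         "summary_short": "Q3 results: EPS 1.52", "sentiment": "positive"}
refined_draft = optimize(draft)
print(refined_draft["evaluation"])
```

Limiting the optimizer to a single pass keeps runs deterministic and cheap; a loop would need a stopping rule beyond the threshold.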


## 9. Memory & learning across runs

```python
import json

# Load the persisted run history and show the first stored record
with open("../data/memory/memory.json", "r", encoding="utf8") as f:
    mem = json.load(f)
print("Previous run memory:")
print(json.dumps(mem['runs'][:1], indent=2))
```

```text
Previous run memory:
[
  {
    "title": "eMudhra: Should you exit stock after decent listing or hold for long term?",
    "type": "news",
    "confidence": 0.7,
    "extracted": {
      "mentions_beat": false,
      "mentions_miss": false
    },
    "summary_short": "eMudhra: Should you exit stock after decent listing or hold for long term? \u2014 Key figures not found. Sentiment: neutral.",
    "summary_long": "eMudhra: Should you exit stock after decent listing or hold for long term? \u2014 Key figures not found. Sentiment: neutral. Full analysis: Getty ImagesAgainst the issue price of Rs 256, shares of eMudhra listed at a premium of 6 per cent at Rs 271 on BSE and a premium of 5 per cent at Rs 270 on the National Stock Exchange (NSE).RelatedStocks in the news: eMudhra, Bharat Dynamics, HFCL, GOCL Corp and BataD-Street debut: eMudhra lists at 6% premium on BSEeMudhra to have a soft debut on Dalal Street? Here's what grey market is signalingContrary to market expectations, eMudhra deb
```

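The memory file holds a top-level `runs` list of per-article records, as the `mem['runs'][:1]` lookup above implies. A minimal sketch of appending a run and reading back earlier feedback might look like the following; the helper names, the temp-file-then-replace write, and the title-based lookup are assumptions, not the repository's memory module.

```python
import json
import os
import tempfile
from pathlib import Path


def load_memory(path: Path) -> dict:
    """Return the memory dict, starting fresh if the file does not exist yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf8"))
    return {"runs": []}


def append_runs(path: Path, records: list) -> dict:
    """Append this run's records and rewrite the file in one replace step."""
    mem = load_memory(path)
    mem["runs"].extend(records)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same folder, then swap it in, so a crash
    # mid-write cannot leave a truncated memory.json behind
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".json")
    with os.fdopen(fd, "w", encoding="utf8") as f:
        json.dump(mem, f, indent=2)
    os.replace(tmp, path)
    return mem


def past_feedback(mem: dict, title: str) -> list:
    """Evaluations from earlier runs of the same article (simple 'learning')."""
    return [r.get("evaluation") for r in mem["runs"] if r.get("title") == title]


with tempfile.TemporaryDirectory() as d:
    mem_path = Path(d) / "memory" / "memory.json"
    append_runs(mem_path, [{"title": "eMudhra listing", "evaluation": {"score": 1.0, "feedback": "OK"}}])
    demo_mem = append_runs(mem_path, [{"title": "eMudhra listing", "evaluation": {"score": 0.67, "feedback": "fix: completeness"}}])
    history = past_feedback(demo_mem, "eMudhra listing")
print(len(demo_mem["runs"]), history)
```

A later run could consult `past_feedback` before summarizing, for example to apply a fix that an earlier evaluation requested.
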
## 10. Discussion, limitations & AI disclosure

This system demonstrates:
- A modular, agent-based reasoning flow
- Extraction and sentiment logic for financial articles
- An iterative evaluator → optimizer feedback loop, with per-run results persisted to memory
- An explainable decision structure built on simple heuristics

**Limitations:**
- No real-time trading or investment advice is generated
- Sentiment and extraction are rule-based (no LLM)
- Context awareness is limited to a single article at a time

**Future Work:**
- Integrate OpenAI GPT or a local LLM for summarization and reasoning
- Integrate LangChain agents for more advanced NLP
- Add a Streamlit dashboard for visualization
- Connect to live stock data with auto-refresh for real-time insights


### AI and data disclosure

- Tools used: `pandas`, `yfinance`, `beautifulsoup4`
- Dataset: Financial News Sentiment Dataset (Kaggle)
- No live financial predictions are made
- No LLMs are used in automated decision-making (all logic is heuristic)



## 11. How to run & deliverables

**To reproduce:**
1. Create a venv and install the requirements (see Section 4)
2. Ensure `data/Fin_Cleaned.csv` exists and `data/sample/` contains price CSVs and news JSON files
3. Open `notebooks/final_notebook.ipynb` and run cells top-to-bottom
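
Step 2 can be checked programmatically before opening the notebook. The sketch below assumes the layout that step describes (price snapshots as `*.csv` and news as `*.json` under `data/sample/`); the function name `missing_inputs` is hypothetical.

```python
from pathlib import Path
import tempfile


def missing_inputs(repo_root) -> list:
    """List the expected inputs that are not present under repo_root."""
    root = Path(repo_root)
    sample = root / "data" / "sample"
    problems = []
    if not (root / "data" / "Fin_Cleaned.csv").is_file():
        problems.append("data/Fin_Cleaned.csv")
    if not any(sample.glob("*.csv")):
        problems.append("data/sample/*.csv (price snapshots)")
    if not any(sample.glob("*.json")):
        problems.append("data/sample/*.json (news articles)")
    return problems


# Demo against a throwaway folder; in practice call missing_inputs(".") from the repo root
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "data" / "sample").mkdir(parents=True)
    (Path(d) / "data" / "sample" / "news_AAPL.json").write_text("[]", encoding="utf8")
    report = missing_inputs(d)
print(report)
```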

**Deliverables in repo:**
- `notebooks/final_notebook.ipynb`
- `notebooks/final_notebook.html` (export)
- `data/sample/out/analysis_end2end.json`
- `data/memory/memory.json`
- `documentation/quick_start.md`
- `README.md`
