
# 📊 Stock Market Crash Prediction Dashboard

This project is a stock market crash prediction system using:
- Nifty 50 historical prices
- Global market correlations
- Financial news sentiment analysis (FinBERT)
- Crash risk indicators

**Data Sources:** Yahoo Finance, HuggingFace Transformers (FinBERT)

---


## 1. Setup and Install Required Libraries

In [30]:

# Install necessary packages if not already installed
!pip install yfinance pandas plotly transformers torch





[notice] A new release of pip is available: 23.2.1 -> 25.2
[notice] To update, run: python.exe -m pip install --upgrade pip


In [31]:

# --- Keep output clean by hiding FutureWarnings ---
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)


In [32]:

# Import libraries
import yfinance as yf
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
from transformers import pipeline
import numpy as np


## 2. Data Collection

In [33]:

# Define time range
start_date = "2018-01-01"
end_date = "2025-08-12"

# Download Nifty 50 data
nifty_data = yf.download("^NSEI", start=start_date, end=end_date)

# Download India VIX
vix_data = yf.download("^INDIAVIX", start=start_date, end=end_date)

# Download some global indices for correlation
global_indices = {
    "S&P 500": "^GSPC",
    "FTSE 100": "^FTSE",
    "Hang Seng": "^HSI"
}

global_data = pd.DataFrame()
for name, ticker in global_indices.items():
    temp = yf.download(ticker, start=start_date, end=end_date)["Close"]
    global_data[name] = temp

nifty_data.head()


[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed


Price,Close,High,Low,Open,Volume
Ticker,^NSEI,^NSEI,^NSEI,^NSEI,^NSEI
Date,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
2018-01-02,10442.200195,10495.200195,10404.650391,10477.549805,153400
2018-01-03,10443.200195,10503.599609,10429.549805,10482.650391,167300
2018-01-04,10504.799805,10513.0,10441.450195,10469.400391,174900
2018-01-05,10558.849609,10566.099609,10520.099609,10534.25,180900
2018-01-08,10623.599609,10631.200195,10588.549805,10591.700195,169000


In [34]:
print("Data shape:", merged.shape)


Data shape: (1881, 2)


## 3. Global Market Correlation

In [35]:

# Combine Nifty and global data
correlation_df = pd.concat([nifty_data["Close"], global_data], axis=1).dropna()
correlation_df.columns = ["Nifty 50"] + list(global_indices.keys())

# Calculate correlation
corr_matrix = correlation_df.corr()
px.imshow(corr_matrix, text_auto=True, title="Global Market Correlation Heatmap").show()


## 4. News Sentiment Analysis

In [36]:

# Load FinBERT sentiment analysis model
sentiment_analyzer = pipeline("sentiment-analysis", model="yiyanghkust/finbert-tone")

# Example headlines
headlines = [
    "Nifty plunges amid global economic uncertainty",
    "Indian economy shows promising growth",
    "Foreign investment outflows raise concerns"
]

# Analyze sentiment
sentiments = [sentiment_analyzer(headline)[0] for headline in headlines]
pd.DataFrame(sentiments, index=headlines)


Device set to use cpu


Unnamed: 0,label,score
Nifty plunges amid global economic uncertainty,Negative,0.999979
Indian economy shows promising growth,Positive,1.0
Foreign investment outflows raise concerns,Negative,0.999908


## 5. Crash Risk Indicator

In [37]:
# Get the latest VIX value as a number (float)
latest_vix = float(vix_data["Close"].iloc[-1])

# Simple crash risk calculation
if latest_vix > 25:
    risk_level = "High Risk"
elif latest_vix > 15:
    risk_level = "Moderate Risk"
else:
    risk_level = "Low Risk"

print(f"Latest VIX: {latest_vix:.2f} → Risk Level: {risk_level}")


Latest VIX: 12.22 → Risk Level: Low Risk


In [38]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)


## 6. Summary and Next Steps


**Findings:**
- Nifty trend and volatility are closely linked.
- Certain sectors have outperformed others significantly.
- Global markets show correlation with Nifty — especially S&P 500.
- Sentiment analysis can give hints about market mood.
- India VIX can be used as a quick crash risk indicator.

**Next Steps:**
- Automate data updates daily.
- Use more news sources for sentiment analysis.
- Deploy as a Streamlit or Dash app for real-time monitoring.


## 7. Make It Automation-Ready  _(added 2025-08-14 13:57)_

Below we refactor the notebook into **simple functions**.  
This makes it easy to schedule a daily run (Windows Task Scheduler / cron) and to reuse the logic inside a Streamlit app.


In [39]:

# --- Simple utility functions ---

from typing import Dict, Tuple

def fetch_series(ticker: str, start: str, end: str) -> pd.DataFrame:
    """Download OHLCV for a single ticker."""
    data = yf.download(ticker, start=start, end=end, progress=False)
    return data

def fetch_multiple(tickers: Dict[str, str], start: str, end: str) -> Dict[str, pd.DataFrame]:
    """Download multiple tickers into a dict of DataFrames."""
    out = {}
    for name, tkr in tickers.items():
        out[name] = yf.download(tkr, start=start, end=end, progress=False)
    return out

def calc_drawdown(close: pd.Series) -> pd.Series:
    """Compute drawdown series from a price series."""
    roll_max = close.cummax()
    dd = close / roll_max - 1.0
    return dd

def simple_vix_risk(vix_latest: float) -> str:
    """Beginner-friendly VIX risk rule-of-thumb."""
    if vix_latest > 25:
        return "High Risk"
    elif vix_latest > 15:
        return "Moderate Risk"
    else:
        return "Low Risk"

def compute_global_correlation(nifty_close: pd.Series, global_closes: pd.DataFrame) -> pd.DataFrame:
    df = pd.concat([nifty_close.rename("Nifty 50"), global_closes], axis=1).dropna()
    return df.corr()


## 8. Collect More News Headlines for Sentiment

In [40]:

# --- Collect news headlines from multiple easy sources ---
# Note: These will only work when you have internet access.
# For Google News RSS we keep it simple and parse the XML by hand to avoid extra libraries.

import datetime as dt
import re
import requests

def get_google_news_rss(query: str, max_items: int = 30) -> pd.DataFrame:
    """Fetch headlines from Google News RSS for a query."""
    url = f"https://news.google.com/rss/search?q={query}+when:7d&hl=en-IN&gl=IN&ceid=IN:en"
    r = requests.get(url, timeout=10)
    r.raise_for_status()
    # Very basic parsing
    titles = re.findall(r'<title>(.*?)</title>', r.text)
    pub_dates = re.findall(r'<pubDate>(.*?)</pubDate>', r.text)
    # First title is usually the feed title; skip it
    titles = titles[1:]
    rows = []
    for i, t in enumerate(titles[:max_items]):
        d = pub_dates[i] if i < len(pub_dates) else ""
        rows.append({"date": d, "headline": t})
    df = pd.DataFrame(rows)
    # Parse date if possible
    with warnings.catch_warnings():
        warnings.simplefilter('ignore')
        df["date"] = pd.to_datetime(df["date"], errors="coerce").dt.date
    return df.dropna(subset=["headline"])

def get_yfinance_news(ticker: str, max_items: int = 30) -> pd.DataFrame:
    """Try to use yfinance's .news attribute (not always available)."""
    try:
        tk = yf.Ticker(ticker)
        items = tk.news or []
    except Exception:
        items = []
    rows = []
    for item in items[:max_items]:
        title = item.get("title")
        providerPublishTime = item.get("providerPublishTime")
        date = None
        if providerPublishTime:
            try:
                date = dt.datetime.fromtimestamp(providerPublishTime).date()
            except Exception:
                date = None
        rows.append({"date": date, "headline": title})
    return pd.DataFrame(rows).dropna(subset=["headline"])

# Build one headlines DataFrame for sentiment
google_df = get_google_news_rss("Nifty 50 OR India stock market", max_items=50)
yfin_df = get_yfinance_news("^NSEI", max_items=50)
all_headlines = pd.concat([google_df, yfin_df], ignore_index=True).drop_duplicates().dropna(subset=["headline"])

print(f"Collected {len(all_headlines)} headlines for sentiment.")
all_headlines.head()


Collected 50 headlines for sentiment.


Unnamed: 0,date,headline
0,2025-08-11,Google News
1,2025-08-14,"Nifty 50, Sensex today: What to expect from In..."
2,2025-08-14,"Stock Market close highlights: Sensex, Nifty e..."
3,2025-08-13,"Closing Bell: Nifty above 24,600, Sensex flat;..."
4,2025-08-13,"Stock Market Highlights: Sensex, Nifty 50 rega..."


## 9. Sentiment Trend (FinBERT) Using Collected Headlines

In [41]:

# Use FinBERT if available; otherwise fall back to a simple keyword heuristic.

try:
    from transformers import pipeline
    _sent_ok = True
except Exception as e:
    print("transformers not available, using simple fallback sentiment.", e)
    _sent_ok = False

def score_with_fallback(text: str) -> float:
    text_low = text.lower()
    neg_words = ["plunge", "fall", "uncertain", "concern", "weak", "loss", "fear"]
    pos_words = ["rise", "growth", "gain", "strong", "beat", "optimism", "upbeat"]
    score = 0.0
    for w in neg_words:
        if w in text_low:
            score -= 0.5
    for w in pos_words:
        if w in text_low:
            score += 0.5
    return score

if _sent_ok:
    finbert = pipeline("sentiment-analysis", model="yiyanghkust/finbert-tone")

def compute_sentiment_df(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    if _sent_ok:
        scores = []
        for h in df["headline"].astype(str).tolist():
            res = finbert(h)[0]
            label = res.get("label", "neutral").lower()
            conf = res.get("score", 1.0)
            if "pos" in label:
                val = +conf
            elif "neg" in label:
                val = -conf
            else:
                val = 0.0
            scores.append(val)
        df["sentiment_score"] = scores
    else:
        df["sentiment_score"] = df["headline"].astype(str).apply(score_with_fallback)
    # Group by date for a daily average score
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    daily = df.dropna(subset=["date"]).groupby(df["date"].dt.date)["sentiment_score"].mean()
    daily = daily.sort_index()
    return daily.to_frame(name="sentiment_score")

daily_sentiment = compute_sentiment_df(all_headlines)
fig = px.line(daily_sentiment, title="Daily News Sentiment (average)")
fig.update_layout(xaxis_title="Date", yaxis_title="Sentiment Score")
fig.show()


Device set to use cpu


## 10. Crash Risk (Daily)

In [42]:

# Use the latest VIX to compute a very simple risk label
latest_vix_value = float(vix_data["Close"].iloc[-1])
risk_label = simple_vix_risk(latest_vix_value)
print(f"Latest India VIX: {latest_vix_value:.2f} -> Crash Risk: {risk_label}")


Latest India VIX: 12.22 -> Crash Risk: Low Risk
