# Data Exploration

## 1. Imports

In [1]:
import pandas as pd
import hvplot.pandas # Auto-registers hvplot() on DataFrames
import holoviews as hv
import panel as pn
import os

hv.extension('bokeh')
pn.extension()

## 2. Load Stock Data

In [2]:
DATA_PATH = "../src/data/raw/HDFCBANK_NS_1y_1d.csv"
df = pd.read_csv(DATA_PATH, parse_dates=['Date'])
df = df.iloc[1:] # Drop first row containing stock name
df.set_index("Date", inplace=True)
df = df.sort_index()
df.head()

| Date | Close | High | Low | Open | Volume |
|---|---|---|---|---|---|
| 2024-08-05 | 1598.16845703125 | 1628.0892961618056 | 1583.5789569593262 | 1622.2534719846317 | 20666817 |
| 2024-08-06 | 1583.7767333984375 | 1615.1812505223595 | 1575.9627926604342 | 1612.2633988029302 | 22558025 |
| 2024-08-07 | 1605.8341064453125 | 1612.263377582913 | 1584.7658311899052 | 1599.404835307712 | 21173132 |
| 2024-08-08 | 1624.8251953125 | 1635.3099498365148 | 1601.6304107349945 | 1606.3287244302185 | 16988475 |
| 2024-08-09 | 1632.2435302734375 | 1644.45924206239 | 1627.8915048541417 | 1634.0239921209834 | 13322309 |


## 3. Data Cleaning

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 251 entries, 2024-08-05 to 2025-08-05
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Close   251 non-null    object
 1   High    251 non-null    object
 2   Low     251 non-null    object
 3   Open    251 non-null    object
 4   Volume  251 non-null    object
dtypes: object(5)
memory usage: 11.8+ KB


In [4]:
columns = list(df.columns)

for col in columns:
    df[col] = (
        df[col]
        .astype(str)  # Ensures it's a string
        .str.replace(",", "")  # Remove commas
        .astype(float)
    )

df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 251 entries, 2024-08-05 to 2025-08-05
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Close   251 non-null    float64
 1   High    251 non-null    float64
 2   Low     251 non-null    float64
 3   Open    251 non-null    float64
 4   Volume  251 non-null    float64
dtypes: float64(5)
memory usage: 11.8 KB
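
As an alternative sketch of the conversion above, `pd.to_numeric` with `errors="coerce"` turns any value that cannot be parsed into `NaN` instead of raising, which makes bad cells easy to count. The `raw` frame and the helper name `to_numeric_frame` are made up for illustration; they are not the HDFCBANK file or part of the notebook.

```python
import pandas as pd

# Hypothetical sample standing in for object-typed columns read from a CSV
raw = pd.DataFrame(
    {
        "Close": ["1,598.17", "1583.78", "n/a"],
        "Volume": ["20,666,817", "22558025", "21173132"],
    }
)

def to_numeric_frame(frame: pd.DataFrame) -> pd.DataFrame:
    """Strip thousands separators and convert every column to a numeric dtype."""
    cleaned = frame.apply(lambda s: s.astype(str).str.replace(",", "", regex=False))
    # errors="coerce" maps unparseable strings to NaN rather than raising
    return cleaned.apply(pd.to_numeric, errors="coerce")

clean = to_numeric_frame(raw)
print(clean.dtypes)
print(clean.isna().sum())  # number of cells that failed to parse, per column
```

Counting the `NaN` values after conversion is a quick way to check whether the cleaning step silently dropped anything.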


## 4. Plot Close Price

### 📈 HDFCBANK - Closing Price Over Time

This plot shows the stock's **closing price over time**, so you can see whether the overall trend is upward, downward, or volatile. It is the usual starting point before adding any indicators.

Use this to:
- Understand general stock performance
- Spot recent dips or rallies

In [5]:
df["Close"].hvplot(title="HDFCBANK Closing Price", height=400, width=700)

## 5. Add Moving Averages

### 🔄 Simple Moving Averages (SMA-10 and SMA-30)

This plot overlays **short-term (10-day)** and **long-term (30-day)** simple moving averages on the stock's price.

Use this to:
- Identify trend direction and momentum
- Detect potential buy/sell signals based on crossover points

💡 When SMA-10 crosses above SMA-30 → Bullish Signal  
💡 When SMA-10 falls below SMA-30 → Bearish Signal

In [6]:
df["SMA_10"] = df["Close"].rolling(window=10).mean()
df["SMA_30"] = df["Close"].rolling(window=30).mean()

df[["Close", "SMA_10", "SMA_30"]].hvplot(
    title="HDFCBANK - Close Price with SMA 10 & 30",
    ylabel="Price",
    width=800,
    height=400
)
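
The crossover rule described above can be sketched as a small function that marks the days on which the fast average moves from below to above the slow one (or the reverse). This is a minimal illustration, not a trading rule: the helper name `sma_crossovers`, the short 3/5 windows, and the price series are all made up so the result is easy to check by hand.

```python
import pandas as pd

def sma_crossovers(close: pd.Series, fast: int = 10, slow: int = 30) -> pd.Series:
    """Return +1 where the fast SMA crosses above the slow SMA, -1 where it crosses below, else 0."""
    fast_sma = close.rolling(window=fast).mean()
    slow_sma = close.rolling(window=slow).mean()

    # Only compare days on which both averages are defined
    valid = fast_sma.notna() & slow_sma.notna()
    above = fast_sma > slow_sma

    prev_valid = valid.shift(1, fill_value=False)
    prev_above = above.shift(1, fill_value=False)
    comparable = valid & prev_valid

    signal = pd.Series(0, index=close.index, dtype=int)
    signal[comparable & above & ~prev_above] = 1    # bullish crossover
    signal[comparable & ~above & prev_above] = -1   # bearish crossover
    return signal

# Hypothetical prices: fall, then rise, then fall again
prices = pd.Series(
    list(range(100, 90, -1)) + list(range(92, 102)) + list(range(100, 90, -1)),
    dtype=float,
)
signal = sma_crossovers(prices, fast=3, slow=5)
print(signal[signal != 0])
```

Because both averages lag the price, each signal appears a few days after the actual turning point; longer windows increase that delay.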

## 6. Daily Returns & Volatility

### 📉 Daily Return (%) and Volatility

- **Daily Return:** Percentage change in closing price from one day to the next  
- **Volatility:** Rolling 10-day standard deviation of daily returns

Use this to:
- Measure short-term risk and reward
- Identify periods of high price swings (which may indicate uncertainty or opportunity)


In [7]:
df["Daily Return"] = df["Close"].pct_change() * 100
df["Volatility"] = df["Daily Return"].rolling(window=10).std()

df[["Daily Return", "Volatility"]].hvplot(
    subplots=True, 
    shared_axes=False,
    height=300, 
    width=700, 
    title="Returns & Volatility"
)
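
To make the two definitions concrete, here is a small worked example on made-up prices, using a 3-day window instead of 10 so every number can be checked by hand. Note that `rolling(...).std()` in pandas computes the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical closing prices chosen to give returns of +2%, -2% and +5%
close = pd.Series([100.0, 102.0, 99.96, 104.958])

daily_return = close.pct_change() * 100           # first value is NaN (no previous day)
volatility = daily_return.rolling(window=3).std() # needs 3 non-NaN returns

print(pd.DataFrame({"Close": close, "Daily Return": daily_return, "Volatility": volatility}))
```

The first return is `NaN` because there is no prior close, so the first volatility value only appears once a full window of returns is available. With the notebook's 10-day window, the first 10 rows of `Volatility` are `NaN` for the same reason.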

## 7. Volume Spikes

### 📦 Daily Volume Traded

This bar chart represents how many shares of HDFCBANK were traded each day.

Use this to:
- Spot sudden volume spikes (which often follow news or institutional buying)
- Confirm price trends (a price move accompanied by rising volume is generally read as stronger than one on thin volume)


In [8]:
df["Volume"].hvplot.bar(
    title="HDFCBANK Volume Traded",
    ylabel="Volume",
    height=300,
    width=700
)
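
Spikes can also be flagged programmatically rather than by eye. The sketch below marks a day as a spike when its volume exceeds a multiple of the average over the preceding days. The helper name `flag_volume_spikes`, the window, the multiple, and the sample volumes are illustrative assumptions, not values taken from the notebook.

```python
import pandas as pd

def flag_volume_spikes(volume: pd.Series, window: int = 20, multiple: float = 2.0) -> pd.Series:
    """Return a boolean Series that is True where volume exceeds `multiple` x the prior `window`-day mean."""
    # shift(1) keeps the current day out of its own baseline
    baseline = volume.shift(1).rolling(window=window).mean()
    # Comparisons against NaN are False, so days without a full baseline are never flagged
    return volume > multiple * baseline

# Hypothetical volumes with a single unusually busy day at position 5
volume = pd.Series([100, 100, 100, 100, 100, 350, 100, 100, 100], dtype=float)
spikes = flag_volume_spikes(volume, window=5, multiple=2.0)
print(volume[spikes])
```

Applied to the notebook's data, the call would look like `flag_volume_spikes(df["Volume"])`; the resulting mask can then be used to filter `df` and inspect the dates involved.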

## 8. Interactive Panel

### 🎚️ Interactive SMA Visualizer

Use the slider to change the window size of the moving average dynamically.  
It helps you explore how different durations (e.g., 5-day vs 50-day) affect the smoothness and timing of trend detection.

Use this to:
- Tune your investment strategy based on your timeframe
- Compare short-term vs long-term trend sensitivity


In [9]:
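def price_plot(window=10):
    # Build a temporary frame so the slider callback does not add columns to the global df
    sma = df["Close"].rolling(window=window).mean()
    plot_df = df[["Close"]].assign(SMA=sma)
    return plot_df.hvplot(title=f"Close Price + SMA({window})")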

pn.interact(price_plot, window=(5, 50, 5))
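
The effect of the window size on timing can be illustrated without the widget. The sketch below uses a made-up price series that jumps from 100 to 110 and reports, for each window, how many leading values are undefined and the first position at which the moving average has fully caught up with the new level. The helper name `catch_up_position` is hypothetical.

```python
import pandas as pd

def catch_up_position(close: pd.Series, window: int, level: float) -> int:
    """Return the first index label at which the SMA reaches `level` (within a small tolerance)."""
    sma = close.rolling(window=window).mean()
    return int(sma[sma >= level - 1e-9].index[0])

# Hypothetical step: ten days at 100, then ten days at 110 (jump at position 10)
close = pd.Series([100.0] * 10 + [110.0] * 10)

results = {}
for window in (2, 5):
    leading_nans = int(close.rolling(window=window).mean().isna().sum())
    results[window] = catch_up_position(close, window, level=110.0)
    print(f"window={window}: {leading_nans} undefined values, caught up at position {results[window]}")
```

A window of `w` days leaves `w - 1` undefined values at the start and needs `w` days at the new price before the average fully reflects it, which is the smoothness-versus-timing trade-off the slider lets you explore.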