# TCS Stock Data — Exploratory Data Analysis (EDA)
**Internship Project | TCS Stock Data – Live and Latest**

This notebook performs a thorough EDA on TCS (Tata Consultancy Services) historical stock data.

### What we cover:
1. Load and inspect the dataset
2. Time-series plot of closing price
3. Volume, Dividends, and Stock Splits over time
4. Correlation heatmap
5. Scatter plot: Close vs Volume
6. Moving averages: 5-day/30-day crossover (buy/sell signals) and a 30-day trend line
7. Price distribution

---

## 0. Setup — Imports and Path Configuration

In [None]:
# ── Standard library ────────────────────────────────────────────────
import os
import sys
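import warnings
# Hide library warnings to keep cell output readable.
# Note: this also hides deprecation warnings; comment it out when debugging.
warnings.filterwarnings('ignore')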

# ── Add project root to sys.path so we can import from src/ ─────────
# Assumes the kernel's working directory is notebooks/, so the project root is one level up.
PROJECT_ROOT = os.path.dirname(os.getcwd())
if PROJECT_ROOT not in sys.path:
    sys.path.insert(0, PROJECT_ROOT)

# ── Data science libraries ──────────────────────────────────────────
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns

# ── Project modules ─────────────────────────────────────────────────
from src.data_loader import load_tcs_data, display_basic_info
from src.eda import (
    plot_close_price,
    plot_volume,
    plot_dividends_and_splits,
    plot_correlation_heatmap,
    plot_close_vs_volume,
    plot_moving_averages,
    plot_price_distribution,
)

# Consistent plot style
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

print('✓ All imports successful!')
print(f'  Project root: {PROJECT_ROOT}')

---
## 1. Load the Dataset

In [None]:
# Load TCS historical stock data
# The CSV file must be placed at:  data/TCS_stock_history.csv
# If you haven't downloaded it yet, run: python src/download_latest_tcs_data.py
df = load_tcs_data()
print(f'\nDataset loaded: {df.shape[0]} rows × {df.shape[1]} columns')

---
## 2. Basic Inspection — Head, Info, Describe

In [None]:
# First 5 rows — check column names and sample values
print('── FIRST 5 ROWS ──')
df.head()

In [None]:
# Last 5 rows — verify most recent dates
print('── LAST 5 ROWS ──')
df.tail()

In [None]:
# Data types and null counts
print('── COLUMN INFO ──')
df.info()

In [None]:
# Summary statistics — mean, std, min, max, quartiles
print('── DESCRIPTIVE STATISTICS ──')
df.describe().round(2)

In [None]:
# Check for missing values
print('── NULL VALUE COUNTS ──')
null_summary = df.isnull().sum()
if null_summary.any():
    # Show only the columns that actually contain nulls
    print(null_summary[null_summary > 0])
else:
    print('No missing values found.')

---
## 3. Closing Price Over Time
> A time-series plot of daily closing prices shows the overall trend of TCS stock.

In [None]:
plot_close_price(df)

---
## 4. Trading Volume Over Time
> Volume spikes often coincide with significant news events or price movements.

In [None]:
plot_volume(df)
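
To make "spike" concrete, the following is a minimal sketch that flags days whose volume is far above the preceding 20-row average. The data is synthetic, and the window length and the z-score threshold of 3 are illustrative assumptions, not values taken from `src/eda.py`.

```python
import numpy as np
import pandas as pd

# Synthetic volume series with two injected spikes (illustration only).
rng = np.random.default_rng(1)
volume = pd.Series(rng.integers(900, 1_100, size=120).astype(float), name='Volume')
volume.iloc[[60, 100]] = [5_000.0, 7_000.0]

window = 20  # assumed look-back length, in rows (trading days)

# shift(1) so each day is compared with the window that precedes it,
# not with a window that already contains the day itself.
baseline_mean = volume.rolling(window).mean().shift(1)
baseline_std = volume.rolling(window).std().shift(1)
z_score = (volume - baseline_mean) / baseline_std

# Assumed rule: a spike is a day more than 3 standard deviations above baseline.
spikes = volume[z_score > 3]
print(spikes)
```

On the real data, the same rule could be applied to `df['Volume']`, and the flagged dates could then be compared by hand against known announcements.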

---
## 5. Dividends and Stock Splits
> Most days have zero dividends/splits. Non-zero events mark corporate actions.

In [None]:
plot_dividends_and_splits(df)
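
Because most rows are zero, it can help to list the corporate-action dates directly. This is a minimal sketch on synthetic data; the column names `Dividends` and `Stock Splits` are assumed to follow the yfinance-style layout and should be checked against `df.columns`.

```python
import pandas as pd

# Synthetic stand-in for the real dataset (illustration only).
sample = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=6, freq='D'),
    'Dividends': [0.0, 0.0, 9.0, 0.0, 0.0, 18.0],
    'Stock Splits': [0.0, 2.0, 0.0, 0.0, 0.0, 0.0],
})

# Keep only the rows where a corporate action actually occurred.
dividend_events = sample.loc[sample['Dividends'] > 0, ['Date', 'Dividends']]
split_events = sample.loc[sample['Stock Splits'] > 0, ['Date', 'Stock Splits']]

print(dividend_events)
print(split_events)
```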

---
## 6. Correlation Heatmap
> High correlation between Open, High, Low, Close is expected — they are all prices of the same stock.

In [None]:
plot_correlation_heatmap(df)
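
A heatmap of this kind is a coloured view of a pairwise correlation matrix. The sketch below computes one with `DataFrame.corr()` (Pearson by default) on synthetic OHLCV-like data; whether `plot_correlation_heatmap` uses the same method internally is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic prices: a random walk for Close, with the other price columns
# built as small perturbations around it (illustration only).
rng = np.random.default_rng(0)
close = 100 + np.cumsum(rng.normal(0, 1, 250))
sample = pd.DataFrame({
    'Open': close + rng.normal(0, 0.3, 250),
    'High': close + np.abs(rng.normal(0, 0.5, 250)),
    'Low': close - np.abs(rng.normal(0, 0.5, 250)),
    'Close': close,
    'Volume': rng.integers(1_000, 10_000, 250),
})

# Pairwise Pearson correlation between all numeric columns.
corr = sample.corr()
print(corr.round(3))
```

On the real data, `df.select_dtypes('number').corr()` would give the numbers behind the heatmap.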

---
## 7. Scatter Plot — Close vs Volume
> Helps explore whether higher-volume days tend to have different price levels.

In [None]:
plot_close_vs_volume(df)

---
## 8. Moving Averages & Buy/Sell Signal Detection
> The moving average crossover strategy:
> - **Buy signal (↑)**: When the short-term MA (5-day) crosses above the long-term MA (30-day).
> - **Sell signal (↓)**: When the short-term MA crosses below the long-term MA.

In [None]:
# Moving average crossover — 5-day (short) vs 30-day (long)
plot_moving_averages(df, short_window=5, long_window=30)
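
The crossover rule described above can be sketched in a few lines of pandas. `crossover_signals` is a hypothetical helper written for illustration; it is not the implementation inside `plot_moving_averages`, which may differ. The prices are synthetic.

```python
import numpy as np
import pandas as pd


def crossover_signals(close, short_window=5, long_window=30):
    """Hypothetical helper: +1 on a buy crossover, -1 on a sell crossover, else 0."""
    short_ma = close.rolling(window=short_window).mean()
    long_ma = close.rolling(window=long_window).mean()

    # True where the short MA sits above the long MA. Comparisons against
    # NaN (the warm-up rows) evaluate to False.
    above = short_ma > long_ma
    prev_above = above.shift(1, fill_value=False).astype(bool)
    # Only count a buy if the long MA already existed on the previous row,
    # so the end of the warm-up period is not mistaken for a crossover.
    prev_valid = long_ma.notna().shift(1, fill_value=False).astype(bool)

    buy = above & ~prev_above & prev_valid
    sell = ~above & prev_above

    signal = buy.astype(int) - sell.astype(int)
    return pd.DataFrame({
        'Close': close,
        'MA_short': short_ma,
        'MA_long': long_ma,
        'Signal': signal,
    })


# Synthetic oscillating prices, used only to exercise the function.
close = pd.Series(100 + 10 * np.sin(np.linspace(0, 4 * np.pi, 200)))
result = crossover_signals(close)
print(result.loc[result['Signal'] != 0, ['Close', 'Signal']])
```

Note that both windows count rows, so on daily stock data they span trading days rather than calendar days.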

In [None]:
# Plot: 30-day MA vs actual Close (for a smoother trend view)
df_ma = df.copy()
df_ma['MA_30'] = df_ma['Close'].rolling(window=30).mean()

fig, ax = plt.subplots(figsize=(14, 5))
ax.plot(df_ma['Date'], df_ma['Close'],  color='#2196F3', alpha=0.7, linewidth=1.0, label='Close Price')
ax.plot(df_ma['Date'], df_ma['MA_30'],  color='#FF9800', linewidth=2.0,            label='30-Day MA')
ax.set_title('TCS Stock – Close Price with 30-Day Moving Average', fontsize=14, fontweight='bold')
ax.set_xlabel('Date')
ax.set_ylabel('Price (INR)')
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
fig.autofmt_xdate()
ax.legend()
plt.tight_layout()
plt.show()

---
## 9. Price Distribution
> The shape of the distribution shows whether closing prices are roughly symmetric or skewed toward one end of their range.

In [None]:
plot_price_distribution(df)
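
Skewness can also be checked numerically rather than by eye. This is a minimal sketch on synthetic, log-normally distributed prices; the real `Close` series may behave differently, and the parameters below are arbitrary.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed prices (log-normal), for illustration only.
rng = np.random.default_rng(42)
prices = pd.Series(np.exp(rng.normal(loc=7.0, scale=0.5, size=1_000)), name='Close')

# Sample skewness: near 0 for a symmetric distribution, positive for a long right tail.
skewness = prices.skew()
# Excess kurtosis: near 0 for a normal distribution.
excess_kurtosis = prices.kurt()

# A log transform often reduces right skew in price-like data.
log_skewness = np.log(prices).skew()

print(f'skewness: {skewness:.2f}')
print(f'excess kurtosis: {excess_kurtosis:.2f}')
print(f'skewness after log transform: {log_skewness:.2f}')
```

On the real data, `df['Close'].skew()` gives the equivalent figure. For a stock with a long upward trend, the histogram of price levels mostly reflects how long the price stayed in each range, so it is best read alongside the time-series plot.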

---
## 10. EDA Summary

| Observation | Finding |
|---|---|
| Overall trend | TCS stock has shown strong long-term growth |
| Volume | Spikes often coincide with news events or large price moves |
| Correlation | Open, High, Low, Close are highly correlated (>0.99) |
| Moving Averages | 5-day/30-day crossovers flag possible trend shifts |
| Dividends | Periodic dividend payouts are visible as discrete events |

➡️ **Next step:** Open `02_ml_model_tcs_stock.ipynb` to train machine learning models.