# 2. Technical Indicators and Correlation Analysis Summary

This notebook summarizes how technical indicators are integrated into the dataset and how correlation analysis is then used for feature reduction.

## Overview
The dataset is enriched with technical indicators from Yahoo Finance, then analyzed for correlations to reduce dimensionality and multicollinearity.

## Data Sources
- **Cleaned Dataset**: `stock_data_cleaned_and_features.csv` - Processed data with engineered features
- **Technical Indicators**: `stock_data_with_technical_indicators.csv` - Data with technical indicators
- **Process**: Load → Merge → Null Analysis → Drop High Null Columns → Drop Null Rows → Eliminate Inf/NaN → Correlation Analysis → Feature Reduction
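The final two steps of the process (Correlation Analysis → Feature Reduction) can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's actual implementation: the toy feature values, the `CORR_THRESHOLD` of 0.9, and the "drop the later column of each highly correlated pair" rule are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.normal(size=200)

# Toy features (hypothetical): 'sma' and 'ema' are near-duplicates, 'rsi' is independent
features = pd.DataFrame({
    'sma': base,
    'ema': base + rng.normal(scale=0.01, size=200),
    'rsi': rng.normal(size=200),
})

CORR_THRESHOLD = 0.9  # assumed cut-off, not taken from the notebook

# Absolute pairwise Pearson correlations
corr = features.corr().abs()

# Keep only the strict upper triangle so each pair is considered once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop the later column of every pair whose correlation exceeds the threshold
to_drop = [col for col in upper.columns if (upper[col] > CORR_THRESHOLD).any()]
reduced = features.drop(columns=to_drop)

print(f"Dropped: {to_drop}")
print(f"Remaining: {list(reduced.columns)}")
```

Masking the lower triangle means that for a pair such as `sma`/`ema` only one of the two columns is removed, which reduces multicollinearity while retaining the information they share.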


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

print("Libraries imported successfully!")


## 1. Load and Merge Datasets


In [None]:
# Load cleaned dataset with engineered features
df_clean = pd.read_csv('stock_data_cleaned_and_features.csv')
print(f"Cleaned dataset shape: {df_clean.shape}")

# Load technical indicators dataset
df_indicators = pd.read_csv('stock_data_with_technical_indicators.csv')
print(f"Technical indicators dataset shape: {df_indicators.shape}")

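# Merge datasets on ticker (inner join keeps only tickers present in both files;
# overlapping non-key columns receive pandas' default _x/_y suffixes)
df_merged = df_clean.merge(df_indicators, on='ticker', how='inner')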
print(f"Merged dataset shape: {df_merged.shape}")

# Define technical indicators
technical_indicators = [
    'rsi', 'stoch', 'willr', 'roc', 'mom', 'rvi', 'cci',  # Momentum
    'ema', 'sma', 'wma', 'hma', 'tema', 'dema', 'vwma', 'kama', 'swma', 'fwma', 'hwma',  # Trend
    'atr', 'natr', 'bb_upper', 'bb_middle', 'bb_lower', 'bb_width',  # Volatility
    'mfi', 'obv', 'ad', 'cmf', 'vwap', 'pvt', 'eom', 'nvi'  # Volume
]

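# Report how many of the defined indicators are actually present after the merge
available_indicators = [col for col in technical_indicators if col in df_merged.columns]
print(f"\nTechnical indicators defined: {len(technical_indicators)}")
print(f"Technical indicators present in merged dataset: {len(available_indicators)}")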
print(f"All columns in merged dataset: {list(df_merged.columns)}")


## 2. Null Value Analysis and Cleaning

### 2.1 Drop High Null Columns (>10%) and Null Rows
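A minimal sketch of this step, assuming the 10% threshold applies to the fraction of missing values per column. The `df_demo` frame and its values are hypothetical stand-ins for `df_merged`, and `NULL_THRESHOLD` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_merged (hypothetical values, for illustration only)
n = 20
df_demo = pd.DataFrame({
    'ticker': [f'T{i}' for i in range(n)],
    'rsi': np.linspace(30, 70, n),
    'hma': np.linspace(10, 12, n),
    'atr': np.linspace(0.5, 1.5, n),
})
df_demo.loc[3, 'rsi'] = np.nan                 # 1/20 = 5% null, column is kept
df_demo.loc[[0, 1, 2, 5, 8], 'hma'] = np.nan   # 5/20 = 25% null, column is dropped

NULL_THRESHOLD = 0.10  # the >10% cut-off from the heading

# Fraction of missing values per column
null_fraction = df_demo.isnull().mean()

# Columns whose null fraction exceeds the threshold
high_null_cols = null_fraction[null_fraction > NULL_THRESHOLD].index.tolist()

# Drop those columns first, then drop any rows that still contain nulls
df_reduced = df_demo.drop(columns=high_null_cols)
df_no_nulls = df_reduced.dropna().reset_index(drop=True)

print(f"Dropped columns: {high_null_cols}")
print(f"Shape after cleaning: {df_no_nulls.shape}")
```

Dropping sparse columns before dropping rows matters: if the order were reversed, a single mostly-empty indicator could eliminate a large share of the rows.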

### 2.2 Eliminate Infinite and NaN Values
After dropping high-null columns and null rows, any remaining infinite values must also be handled, because they can cause issues in correlation analysis and clustering.
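One common way to do this is to convert infinite values to NaN and then drop the affected rows. The sketch below assumes that row removal (rather than imputation) is acceptable. The `df_demo` frame is a hypothetical stand-in for the cleaned dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the cleaned dataset (hypothetical values)
df_demo = pd.DataFrame({
    'ticker': ['A', 'B', 'C', 'D'],
    'roc': [0.05, np.inf, -0.02, 0.01],
    'bb_width': [0.10, 0.12, -np.inf, 0.08],
})

# Only numeric columns can hold infinite values
numeric_cols = df_demo.select_dtypes(include=[np.number]).columns.tolist()

# Count infinite values per numeric column before cleaning
inf_counts = np.isinf(df_demo[numeric_cols]).sum()
print(f"Infinite values per column:\n{inf_counts}")

# Convert +/-inf to NaN, then drop rows with NaN in any numeric column
df_finite = (
    df_demo.replace([np.inf, -np.inf], np.nan)
    .dropna(subset=numeric_cols)
    .reset_index(drop=True)
)

print(f"Shape after removing inf/NaN rows: {df_finite.shape}")
```

Infinite values typically arise from ratio-style features with a zero denominator, so counting them per column before removal helps identify which features are responsible.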
