# 🌳📈 Stock Price Prediction Across Market Sectors

This project applies machine learning to stock price prediction, with an emphasis on sector-level diversity and company-level representation. The analysis covers all 11 sectors defined by the Global Industry Classification Standard (GICS). For each sector, a leading stock was selected from a predefined list of 22 well-established, widely traded companies.

The goal is to develop a generalizable, reproducible prediction pipeline and to gain insight into how stocks behave across industries.

### GICS Sectors Covered
- Information Technology  
- Health Care  
- Financials  
- Consumer Discretionary  
- Communication Services  
- Industrials  
- Consumer Staples  
- Energy  
- Utilities  
- Real Estate  
- Materials
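The analysis depends on every one of these 11 sectors being represented, so a small guard can check a ticker-to-sector mapping before any modeling starts. This is a minimal sketch: the mapping format and the helper `sector_coverage` are hypothetical, and the placeholder tickers in the usage line are not the project's companies.

```python
from collections import Counter

# The 11 GICS sectors listed above.
GICS_SECTORS = (
    "Information Technology",
    "Health Care",
    "Financials",
    "Consumer Discretionary",
    "Communication Services",
    "Industrials",
    "Consumer Staples",
    "Energy",
    "Utilities",
    "Real Estate",
    "Materials",
)


def sector_coverage(ticker_sectors: dict[str, str]) -> dict[str, int]:
    """Count tickers per GICS sector; raise on unknown or unrepresented sectors."""
    unknown = set(ticker_sectors.values()) - set(GICS_SECTORS)
    if unknown:
        raise ValueError(f"Not a GICS sector: {sorted(unknown)}")
    counts = Counter(ticker_sectors.values())
    missing = [s for s in GICS_SECTORS if counts[s] == 0]
    if missing:
        raise ValueError(f"No ticker for sector(s): {missing}")
    return {s: counts[s] for s in GICS_SECTORS}


# Hypothetical placeholder tickers, one per sector (not the project's companies).
demo = {f"T{i:02d}": sector for i, sector in enumerate(GICS_SECTORS)}
print(sector_coverage(demo)["Energy"])  # → 1
```

Raising on misspelled sector names as well as on missing ones catches the most likely mistake in a hand-maintained mapping.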

### Focus: Tree-Based Models and Feature Importance

## 🏗️ 1. Project Overview

In this notebook, we use machine learning to predict future stock returns.
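The overview does not pin down the prediction target. One common choice, assumed here rather than taken from the notebook, is the simple $h$-day forward return of stock $i$ computed from closing prices:

$$
r^{(i)}_{t \to t+h} = \frac{P^{(i)}_{t+h}}{P^{(i)}_{t}} - 1
$$

where $P^{(i)}_t$ is the closing price of stock $i$ on trading day $t$, and $h = 1$ gives the next-day return. Because all tickers share one DataFrame, the shift from $t$ to $t+h$ has to be taken within each ticker so that one stock's target never uses another stock's prices.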

We will:

- Load and merge historical stock data covering all 11 GICS sectors.
- Prepare the data and engineer features (200+ indicators).
- Train tree-based regression models (Random Forest, LightGBM).
- Analyze feature importances to understand key drivers.
- Evaluate model performance.
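The steps above can be sketched end to end on synthetic data. This is an illustration under stated assumptions, not the notebook's pipeline: prices are a random walk for two placeholder tickers, a `Close` column and a next-day-return target are assumed, four indicators stand in for the 200+ engineered features, and `make_features` and `chronological_split` are hypothetical helpers. scikit-learn's `RandomForestRegressor` is used; `lightgbm.LGBMRegressor` exposes the same `fit`/`predict`/`feature_importances_` interface and could be swapped in, though its default importance type counts splits rather than impurity reduction.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

FEATURES = ["ret_1d", "ret_5d", "vol_10d", "sma_ratio_10d"]


def make_features(prices: pd.DataFrame) -> pd.DataFrame:
    """Add a few per-ticker indicators and a next-day-return target (illustrative only)."""
    out = prices.sort_values(["TICKER", "Date"]).copy()
    close = out.groupby("TICKER")["Close"]
    # Grouping by ticker keeps every shift and rolling window inside one stock.
    out["ret_1d"] = out["Close"] / close.shift(1) - 1.0
    out["ret_5d"] = out["Close"] / close.shift(5) - 1.0
    out["vol_10d"] = out.groupby("TICKER")["ret_1d"].transform(lambda s: s.rolling(10).std())
    out["sma_ratio_10d"] = out["Close"] / close.transform(lambda s: s.rolling(10).mean())
    # Target: next-day simple return (uses tomorrow's close, so it is never a feature).
    out["target"] = close.shift(-1) / out["Close"] - 1.0
    return out.dropna(subset=FEATURES + ["target"])


def chronological_split(data: pd.DataFrame, test_frac: float = 0.2):
    """Split on calendar date so every test row is later than every training row."""
    dates = np.sort(data["Date"].unique())
    cutoff = dates[int(len(dates) * (1 - test_frac))]
    return data[data["Date"] < cutoff], data[data["Date"] >= cutoff]


# Synthetic random-walk prices for two placeholder tickers (not real market data).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=300)
prices = pd.concat(
    [
        pd.DataFrame({
            "Date": dates,
            "TICKER": ticker,
            "Close": 100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(dates)))),
        })
        for ticker in ["TICKER_A", "TICKER_B"]
    ],
    ignore_index=True,
)

data = make_features(prices)
train, test = chronological_split(data)

model = RandomForestRegressor(n_estimators=100, min_samples_leaf=20, random_state=0)
model.fit(train[FEATURES], train["target"])
pred = model.predict(test[FEATURES])

# Compare against a naive "predict zero return" baseline.
mae = mean_absolute_error(test["target"], pred)
baseline_mae = mean_absolute_error(test["target"], np.zeros(len(test)))
importances = pd.Series(model.feature_importances_, index=FEATURES).sort_values(ascending=False)

print(f"MAE {mae:.5f} vs zero-return baseline {baseline_mae:.5f}")
print(importances)
```

Splitting by date rather than shuffling rows keeps future information out of training; a random split over overlapping time series tends to overstate accuracy. The zero-return baseline gives a reference point, since on a random walk no feature should beat it reliably.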

## 🔄 2. Load and Merge Data

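```python
import pandas as pd
from pathlib import Path

# Folder with one merged CSV per ticker, named <TICKER>.csv.
# Assumes the notebook runs from the repository root; adjust if needed.
data_path = Path("data/merged")

# Load every CSV and tag its rows with the ticker taken from the file name.
# sorted() makes the file order, and therefore the row order, reproducible.
frames = []
for file in sorted(data_path.glob("*.csv")):
    frame = pd.read_csv(file, parse_dates=["Date"])
    frame["TICKER"] = file.stem
    frames.append(frame)

if not frames:
    raise FileNotFoundError(f"No CSV files found in {data_path.resolve()}")

# Stack all tickers into one long DataFrame, ordered by ticker and date
df = (
    pd.concat(frames, ignore_index=True)
    .sort_values(["TICKER", "Date"])
    .reset_index(drop=True)
)

# Sanity check
print(f"✅ Loaded data shape: {df.shape}")
print(f"Tickers loaded: {df['TICKER'].nunique()}")
df.head()
```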
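`pd.concat` stacks the files without checking them, so a quick integrity pass after the merge can catch duplicated dates or missing values before feature engineering. A minimal sketch, assuming the long format produced by the load step (`TICKER` and `Date` columns); `check_merged_prices` is a hypothetical helper and the toy frame with placeholder tickers `AAA`/`BBB` is illustrative only.

```python
import pandas as pd


def check_merged_prices(df: pd.DataFrame) -> pd.DataFrame:
    """Fail on duplicate (TICKER, Date) rows; return per-ticker coverage."""
    dupes = int(df.duplicated(subset=["TICKER", "Date"]).sum())
    if dupes:
        raise ValueError(f"{dupes} duplicate (TICKER, Date) rows after merge")
    summary = df.groupby("TICKER").agg(
        rows=("Date", "size"),
        first_date=("Date", "min"),
        last_date=("Date", "max"),
    )
    # Count missing cells per ticker across all non-key columns.
    value_cols = df.columns.difference(["TICKER", "Date"])
    summary["missing_cells"] = df[value_cols].isna().groupby(df["TICKER"]).sum().sum(axis=1)
    return summary


# Toy frame with placeholder tickers; AAA has one missing close.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-02"]),
    "TICKER": ["AAA", "AAA", "BBB"],
    "Close": [10.0, None, 20.0],
})
print(check_merged_prices(toy))
```

In the notebook this would be called as `check_merged_prices(df)` right after loading; differing `first_date`/`last_date` values across tickers are worth checking before building a pooled training set.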
