# Stock Price Prediction Across Market Sectors

This project applies machine learning to stock price prediction, with an emphasis on sector-level diversity and company-level representation. The analysis covers all 11 sectors defined by the Global Industry Classification Standard (GICS). For each sector, two leading stocks were selected, giving a predefined list of 22 well-established and widely traded companies.

The goal is to develop a generalizable and reproducible prediction pipeline, while gaining insight into the behavior of stocks across different industries. 

### GICS Sectors Covered:
- Information Technology  
- Health Care  
- Financials  
- Consumer Discretionary  
- Communication Services  
- Industrials  
- Consumer Staples  
- Energy  
- Utilities  
- Real Estate  
- Materials


### 📈 Dataset

This project uses historical daily stock price data downloaded with the [yfinance](https://pypi.org/project/yfinance/) Python library, which retrieves data from Yahoo Finance. The dataset includes Open, High, Low, Close, Adjusted Close, and Volume.

We selected 22 companies across 11 sectors of the US stock market:

| Sector                    | Tickers         |
|--------------------------|-----------------|
| Information Technology   | AAPL, MSFT      |
| Health Care              | JNJ, UNH        |
| Financials               | JPM, BAC        |
| Consumer Discretionary   | AMZN, TSLA      |
| Communication Services   | GOOGL, META     |
| Industrials              | UNP, RTX        |
| Consumer Staples         | PG, KO          |
| Energy                   | XOM, CVX        |
| Utilities                | NEE, DUK        |
| Real Estate              | AMT, PLD        |
| Materials                | LIN, SHW        |

These companies were selected due to their market leadership, high liquidity, and rich historical data. They serve as strong representatives of their sectors and offer a diverse foundation for building and evaluating time series forecasting models.
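The sector-to-ticker table can also be kept as a small Python configuration, so that every later step iterates over the same 22 tickers. This is a minimal sketch: `SECTOR_TICKERS`, `all_tickers`, and `sector_of` are hypothetical names, not part of the original project, and the mapping is transcribed directly from the table above.

```python
# Hypothetical configuration: sector -> tickers, transcribed from the table above.
SECTOR_TICKERS: dict[str, list[str]] = {
    "Information Technology": ["AAPL", "MSFT"],
    "Health Care": ["JNJ", "UNH"],
    "Financials": ["JPM", "BAC"],
    "Consumer Discretionary": ["AMZN", "TSLA"],
    "Communication Services": ["GOOGL", "META"],
    "Industrials": ["UNP", "RTX"],
    "Consumer Staples": ["PG", "KO"],
    "Energy": ["XOM", "CVX"],
    "Utilities": ["NEE", "DUK"],
    "Real Estate": ["AMT", "PLD"],
    "Materials": ["LIN", "SHW"],
}


def all_tickers(mapping: dict[str, list[str]]) -> list[str]:
    """Flatten the mapping into a single list, preserving table order."""
    return [ticker for tickers in mapping.values() for ticker in tickers]


def sector_of(ticker: str, mapping: dict[str, list[str]]) -> str:
    """Return the GICS sector of a ticker; raise KeyError if it is not listed."""
    for sector, tickers in mapping.items():
        if ticker in tickers:
            return sector
    raise KeyError(ticker)


# Sanity checks: 11 sectors, two tickers each, 22 unique tickers overall.
assert len(SECTOR_TICKERS) == 11
assert all(len(tickers) == 2 for tickers in SECTOR_TICKERS.values())
assert len(set(all_tickers(SECTOR_TICKERS))) == 22
```

Keeping the assertions next to the mapping means that a future edit (for example, adding a third stock to one sector) fails loudly instead of silently changing the dataset.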

Raw data is saved in `data/raw/` as individual CSV files.
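A minimal sketch of how these raw files could be produced with yfinance, assuming one CSV per ticker named `<TICKER>.csv` and a placeholder start date of 2015-01-01 (the project's actual date range is not stated). `raw_csv_path` and `download_to_csv` are hypothetical helpers; `yfinance` is imported inside the download function so the path helper works even where the library is not installed.

```python
import os


def raw_csv_path(ticker: str, raw_dir: str = "data/raw") -> str:
    """Hypothetical naming scheme: one file per ticker, e.g. data/raw/AAPL.csv."""
    return os.path.join(raw_dir, f"{ticker}.csv")


def download_to_csv(tickers, start="2015-01-01", end=None, raw_dir="data/raw"):
    """Download daily price data for each ticker and save it as its own CSV.

    The default start date is a placeholder assumption, not the project's range.
    """
    import yfinance as yf  # imported lazily; only needed when actually downloading

    os.makedirs(raw_dir, exist_ok=True)
    for ticker in tickers:
        # auto_adjust=False keeps both the 'Close' and 'Adj Close' columns
        df = yf.download(ticker, start=start, end=end, auto_adjust=False, progress=False)
        if df.empty:
            print(f"No data returned for {ticker}; skipping")
            continue
        df.to_csv(raw_csv_path(ticker, raw_dir))
```

For example, `download_to_csv(["AAPL", "MSFT"])` would write `data/raw/AAPL.csv` and `data/raw/MSFT.csv`. Depending on the yfinance version, the returned DataFrame may carry two-level column headers even for a single ticker, which may explain the `Price` header that the cleaning step below renames.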


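### Basic Feature Engineering

As a first preprocessing step, each CSV file in `data/raw/modified/` is loaded, its `Price` column (if present) is renamed to `Date`, and the file is overwritten in place.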

```python
import pandas as pd
import os

# Directory containing the CSV files to clean, relative to the project root (adjust as needed)
directory = 'data/raw/modified'

# List all CSV files in the directory
files = [f for f in os.listdir(directory) if f.endswith('.csv')]

# Loop through each file in the directory
for file in files:
    # Construct the full file path
    file_path = os.path.join(directory, file)

    # Load the stock data WITHOUT index_col=0: with index_col=0, 'Price' would become
    # the index name (so the rename below never matched) and index=False on save
    # would then drop that column, dates included, from the file
    df = pd.read_csv(file_path)

    # Rename the 'Price' column to 'Date' (if 'Price' exists)
    if 'Price' in df.columns:
        df = df.rename(columns={'Price': 'Date'})

    # Overwrite the file; index=False avoids writing pandas' default RangeIndex
    # as an extra column, while 'Date' is kept as a regular column
    df.to_csv(file_path, index=False)
```
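Because the cleaned files store `Date` as an ordinary column, downstream code has to parse it back into a datetime index before computing features. The sketch below assumes plain `YYYY-MM-DD` date strings and a `Close` column. `clean_price_frame`, `load_price_csv`, and `add_basic_features` are hypothetical helpers, and the three features (1-day return, 5-day simple moving average, 5-day return volatility) are illustrative choices, not the project's actual feature set. Any rows whose `Date` value is not a date (for example, leftover header rows from a multi-level export, if any remain) are coerced to `NaT` and dropped.

```python
import pandas as pd


def clean_price_frame(raw: pd.DataFrame) -> pd.DataFrame:
    """Parse 'Date', drop non-date rows, sort by date, and make other columns numeric."""
    df = raw.copy()
    # Values that cannot be parsed as dates (e.g. stray header rows) become NaT
    df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
    df = df.dropna(subset=["Date"]).set_index("Date").sort_index()
    # Stray text rows force object dtype when reading; convert remaining values to numbers
    return df.apply(pd.to_numeric, errors="coerce")


def load_price_csv(path: str) -> pd.DataFrame:
    """Read one cleaned CSV (with a 'Date' column) into a date-indexed DataFrame."""
    return clean_price_frame(pd.read_csv(path))


def add_basic_features(df: pd.DataFrame, price_col: str = "Close") -> pd.DataFrame:
    """Append a few illustrative features computed from a single price column."""
    out = df.copy()
    out["return_1d"] = out[price_col].pct_change()  # day-over-day fractional change
    out["sma_5"] = out[price_col].rolling(window=5).mean()  # 5-day simple moving average
    out["volatility_5"] = out["return_1d"].rolling(window=5).std()  # 5-day std of returns
    return out
```

For a file such as `data/raw/modified/AAPL.csv`, `add_basic_features(load_price_csv(path))` would return the original columns plus `return_1d`, `sma_5`, and `volatility_5`. The first rows of each rolling feature are `NaN` until its window fills, so they typically need to be dropped before model training.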
