# Stock Price Prediction Across Market Sectors

This project applies machine learning to stock price prediction, with an emphasis on sector-level diversity and company-level representation. The analysis covers all 11 sectors defined by the Global Industry Classification Standard (GICS). For each sector, two leading stocks were selected, giving a predefined list of 22 well-established and widely traded companies.

The goal is to develop a generalizable and reproducible prediction pipeline while gaining insight into how stocks behave across different industries.

### GICS Sectors Covered:
- Information Technology  
- Health Care  
- Financials  
- Consumer Discretionary  
- Communication Services  
- Industrials  
- Consumer Staples  
- Energy  
- Utilities  
- Real Estate  
- Materials


### 📈 Dataset

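This project uses historical daily stock price data downloaded with the [yfinance](https://pypi.org/project/yfinance/) library, an open-source Python package (not affiliated with Yahoo) that retrieves data from Yahoo Finance. For each ticker, the dataset includes the Open, High, Low, Close, and Adjusted Close prices, plus the daily trading Volume.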
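Adjusted Close matters for histories this long because stock splits (and dividends) cause jumps in the raw Close that do not reflect a real change in value. As a minimal sketch of the idea, assuming a single split with a known ratio and ignoring dividends (Yahoo's Adjusted Close also accounts for dividends), back-adjusting divides every price before the split date by the split ratio. The function name `split_adjust` and the toy prices are hypothetical.

```python
import pandas as pd


def split_adjust(close: pd.Series, split_date: str, ratio: float) -> pd.Series:
    """Back-adjust a raw close series for one stock split (dividends ignored).

    Prices strictly before `split_date` are divided by `ratio`, so after a
    4-for-1 split (ratio=4) pre-split prices are comparable with post-split ones.
    """
    adjusted = close.astype(float).copy()
    before_split = adjusted.index < pd.Timestamp(split_date)
    adjusted[before_split] = adjusted[before_split] / ratio
    return adjusted


# Toy example: a 4-for-1 split takes effect on 2020-01-03.
raw_close = pd.Series(
    [100.0, 102.0, 26.0, 27.0],
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-06"]),
)
adjusted_close = split_adjust(raw_close, "2020-01-03", ratio=4)
print(adjusted_close.tolist())  # [25.0, 25.5, 26.0, 27.0]
```

Without the adjustment, a model would see a spurious 75% one-day drop on the split date.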

We selected 22 companies across 11 sectors of the US stock market:

| Sector                    | Tickers         |
|--------------------------|-----------------|
| Information Technology   | AAPL, MSFT      |
| Health Care              | JNJ, UNH        |
| Financials               | JPM, BAC        |
| Consumer Discretionary   | AMZN, TSLA      |
| Communication Services   | GOOGL, META     |
| Industrials              | UNP, RTX        |
| Consumer Staples         | PG, KO          |
| Energy                   | XOM, CVX        |
| Utilities                | NEE, DUK        |
| Real Estate              | AMT, PLD        |
| Materials                | LIN, SHW        |

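These companies were selected for their market leadership, high liquidity, and long trading histories. They are strong representatives of their sectors and offer a diverse foundation for building and evaluating time series forecasting models. History length still varies by ticker: although downloads start in 2000, GOOGL's data begins with its August 2004 IPO, TSLA's with its June 2010 IPO, and META's with its May 2012 IPO.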
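Because coverage differs across tickers, it is worth checking it per ticker rather than assuming it. A minimal sketch, assuming each ticker's data is already loaded into a pandas DataFrame indexed by date with `Close` and `Volume` columns; the function name `coverage_summary`, the placeholder tickers `OLD` and `NEW`, and the use of median dollar volume (Close × Volume) as a liquidity proxy are illustrative choices, not part of the original pipeline.

```python
import pandas as pd


def coverage_summary(frames: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Summarize date coverage and a simple liquidity proxy for each ticker."""
    rows = []
    for symbol, df in frames.items():
        # Dollar volume: shares traded times closing price, per day.
        dollar_volume = df["Close"] * df["Volume"]
        rows.append({
            "ticker": symbol,
            "first_date": df.index.min(),
            "last_date": df.index.max(),
            "trading_days": len(df),
            "median_dollar_volume": dollar_volume.median(),
        })
    # Oldest histories first, so short-history tickers stand out at the bottom.
    return pd.DataFrame(rows).set_index("ticker").sort_values("first_date")


# Toy data (hypothetical tickers): one long history, one that starts later.
frames = {
    "OLD": pd.DataFrame(
        {"Close": [10.0] * 5, "Volume": [1_000] * 5},
        index=pd.bdate_range("2000-01-03", periods=5),
    ),
    "NEW": pd.DataFrame(
        {"Close": [38.0] * 3, "Volume": [500] * 3},
        index=pd.bdate_range("2012-05-18", periods=3),
    ),
}
summary = coverage_summary(frames)
print(summary)
```

With the real files, tickers such as TSLA and META would show a much later `first_date` and fewer `trading_days` than the others.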

Raw data is saved in `data/raw/` as one CSV file per ticker (for example, `data/raw/AAPL.csv`).
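Reading a file back needs a little care, because recent yfinance versions return MultiIndex columns even for a single ticker, and saving those with `to_csv` produces two extra header rows (`Ticker,...` and `Date,,...`). A minimal sketch of a loader that handles both the flat and the multi-row layout; the function name `load_raw` and the toy `DEMO` file are hypothetical.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd


def load_raw(symbol: str, raw_dir: str = "data/raw") -> pd.DataFrame:
    """Load one ticker's CSV into a DataFrame indexed by trading date."""
    path = Path(raw_dir) / f"{symbol}.csv"
    with open(path) as f:
        f.readline()  # column-name row
        second_line = f.readline()
    # Files written from yfinance's MultiIndex columns have two extra rows
    # under the column names ("Ticker,..." and "Date,,..."); skip them.
    skip = [1, 2] if second_line.startswith("Ticker") else None
    df = pd.read_csv(path, header=0, index_col=0, skiprows=skip, parse_dates=True)
    df.index.name = "Date"
    return df.sort_index()


# Demo on a toy file in the multi-row header layout (values are made up).
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "DEMO.csv"), "w") as f:
    f.write(
        "Price,Close,Volume\n"
        "Ticker,DEMO,DEMO\n"
        "Date,,\n"
        "2020-01-03,11.0,200\n"
        "2020-01-02,10.0,100\n"
    )
demo = load_raw("DEMO", raw_dir=demo_dir)
print(demo)
```

For the real data, `load_raw("AAPL")` would read `data/raw/AAPL.csv`.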


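```python
# Import the libraries needed to download and save the data
import os

import pandas as pd
import yfinance as yf

# Stock tickers grouped by GICS sector (two per sector, 22 in total)
us_market_sectors = {
    "Information Technology": ["AAPL", "MSFT"],
    "Health Care": ["JNJ", "UNH"],
    "Financials": ["JPM", "BAC"],
    "Consumer Discretionary": ["AMZN", "TSLA"],
    "Communication Services": ["GOOGL", "META"],
    "Industrials": ["UNP", "RTX"],
    "Consumer Staples": ["PG", "KO"],
    "Energy": ["XOM", "CVX"],
    "Utilities": ["NEE", "DUK"],
    "Real Estate": ["AMT", "PLD"],
    "Materials": ["LIN", "SHW"],
}


def download_data(sector_tickers, start="2000-01-01", end="2025-05-05"):
    """Download daily price data for every ticker and save one CSV per ticker.

    yfinance treats `end` as exclusive, so the last row is the final
    trading day before `end`.
    """
    os.makedirs("data/raw", exist_ok=True)

    for sector, symbols in sector_tickers.items():
        for symbol in symbols:
            print(f"Downloading {symbol} from {start} to {end}...")
            # auto_adjust=False keeps both Close and Adj Close; recent yfinance
            # releases default to auto_adjust=True, which drops Adj Close.
            data = yf.download(symbol, start=start, end=end, auto_adjust=False)
            if data.empty:
                print(f"Warning: no data returned for {symbol} ({sector}); skipping")
                continue
            # Recent yfinance releases return (Price, Ticker) MultiIndex columns
            # even for one ticker; flatten them so the CSV has a single header row.
            if isinstance(data.columns, pd.MultiIndex):
                data.columns = data.columns.get_level_values(0)
            file_path = f"data/raw/{symbol}.csv"
            data.to_csv(file_path)
            print(f"Saved {symbol} to {file_path}")


if __name__ == "__main__":
    download_data(us_market_sectors)
```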

```text
Downloading AAPL from 2000-01-01 to 2025-05-05...
Saved AAPL to data/raw/AAPL.csv
Downloading MSFT from 2000-01-01 to 2025-05-05...
Saved MSFT to data/raw/MSFT.csv
Downloading JNJ from 2000-01-01 to 2025-05-05...
Saved JNJ to data/raw/JNJ.csv
Downloading UNH from 2000-01-01 to 2025-05-05...
Saved UNH to data/raw/UNH.csv
Downloading JPM from 2000-01-01 to 2025-05-05...
Saved JPM to data/raw/JPM.csv
Downloading BAC from 2000-01-01 to 2025-05-05...
Saved BAC to data/raw/BAC.csv
Downloading AMZN from 2000-01-01 to 2025-05-05...
Saved AMZN to data/raw/AMZN.csv
Downloading TSLA from 2000-01-01 to 2025-05-05...
Saved TSLA to data/raw/TSLA.csv
Downloading GOOGL from 2000-01-01 to 2025-05-05...
Saved GOOGL to data/raw/GOOGL.csv
Downloading META from 2000-01-01 to 2025-05-05...
Saved META to data/raw/META.csv
Downloading UNP from 2000-01-01 to 2025-05-05...
Saved UNP to data/raw/UNP.csv
Downloading RTX from 2000-01-01 to 2025-05-05...
Saved RTX to data/raw/RTX.csv
Downloading PG from 2000-01-01 to 2025-05-05...
Saved PG to data/raw/PG.csv
Downloading KO from 2000-01-01 to 2025-05-05...
Saved KO to data/raw/KO.csv
Downloading XOM from 2000-01-01 to 2025-05-05...
Saved XOM to data/raw/XOM.csv
Downloading CVX from 2000-01-01 to 2025-05-05...
Saved CVX to data/raw/CVX.csv
Downloading NEE from 2000-01-01 to 2025-05-05...
Saved NEE to data/raw/NEE.csv
Downloading DUK from 2000-01-01 to 2025-05-05...
Saved DUK to data/raw/DUK.csv
Downloading AMT from 2000-01-01 to 2025-05-05...
Saved AMT to data/raw/AMT.csv
Downloading PLD from 2000-01-01 to 2025-05-05...
Saved PLD to data/raw/PLD.csv
Downloading LIN from 2000-01-01 to 2025-05-05...
Saved LIN to data/raw/LIN.csv
Downloading SHW from 2000-01-01 to 2025-05-05...
Saved SHW to data/raw/SHW.csv
```
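Re-running the download cell fetches all 22 histories again, and the result can differ slightly between runs as Yahoo revises adjusted prices. For reproducibility it can help to keep the files already on disk and fetch only the missing ones. A minimal sketch, assuming the download step is passed in as a function; the helper name `download_missing` and the stand-in `fake_fetch` are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable

import pandas as pd


def download_missing(
    symbols: list[str],
    fetch: Callable[[str], pd.DataFrame],
    raw_dir: str = "data/raw",
) -> list[str]:
    """Fetch and save only the symbols whose CSV is not already in raw_dir.

    Returns the symbols that were actually fetched and saved.
    """
    out = Path(raw_dir)
    out.mkdir(parents=True, exist_ok=True)
    fetched = []
    for symbol in symbols:
        path = out / f"{symbol}.csv"
        if path.exists():
            continue  # keep the existing file; delete it to force a refresh
        data = fetch(symbol)
        if data.empty:
            print(f"Warning: no data returned for {symbol}; nothing saved")
            continue
        data.to_csv(path)
        fetched.append(symbol)
    return fetched


# Demo with a stand-in fetcher so no network access is needed.
def fake_fetch(symbol: str) -> pd.DataFrame:
    return pd.DataFrame({"Close": [1.0]}, index=pd.to_datetime(["2020-01-02"]))


demo_dir = tempfile.mkdtemp()
first = download_missing(["AAPL", "MSFT"], fake_fetch, raw_dir=demo_dir)
second = download_missing(["AAPL", "MSFT", "JNJ"], fake_fetch, raw_dir=demo_dir)
print(first, second)  # ['AAPL', 'MSFT'] ['JNJ']
```

With real data, `fetch` could wrap the same `yf.download(symbol, start=..., end=...)` call used in the download cell.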



