# 📊 Stock Performance Data Collection with yFinance

This notebook collects historical stock price data for selected S&P 500 companies using the **yfinance** API.
The goal is to build a clean, structured dataset that can later be used for:
- Financial analysis
- Visualization in Excel or Tableau
- Return and volatility calculations
- Event-based analysis

**Time period:** January 2, 2025 up to (but not including) July 1, 2025 (first half of 2025)  
**Sectors covered:** Technology and Banking

I used:
- **yfinance** to fetch stock market data  
- **pandas** for data manipulation  
- **datetime** to compute the date range from a fixed end date

In [1]:

import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta


Each ticker represents a publicly traded firm.
The companies were selected to compare **two technology firms against six financial institutions**.

- Apple (AAPL)
- NVIDIA (NVDA)
- JPMorgan Chase (JPM)
- Bank of America (BAC)
- Goldman Sachs (GS)
- Morgan Stanley (MS)
- Citigroup (C)
- Wells Fargo (WFC)

In [None]:
tickers = ['AAPL', 'NVDA', 'JPM', 'BAC', 'GS', 'MS', 'C', 'WFC']
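Since the point of the selection is a sector comparison, it can help to record each ticker's sector next to the symbol. The sketch below is one way to do that; the name `sector_by_ticker` and the `Sector` column are my own illustrative choices, not part of yfinance.

```python
import pandas as pd

# Hypothetical lookup table: sector labels follow the grouping described above.
sector_by_ticker = {
    "AAPL": "Technology",
    "NVDA": "Technology",
    "JPM": "Banking",
    "BAC": "Banking",
    "GS": "Banking",
    "MS": "Banking",
    "C": "Banking",
    "WFC": "Banking",
}

# Small stand-in table to show how the labels attach to rows by ticker.
sample = pd.DataFrame({"Ticker": ["AAPL", "JPM", "WFC"]})
sample["Sector"] = sample["Ticker"].map(sector_by_ticker)
print(sample)
```

The same `.map(...)` call could be applied to the `Ticker` column of the combined table once it exists.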

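I then pull the **180 calendar days** leading up to **July 1, 2025**. Because `timedelta(days=180)` counts calendar days, the window holds about 120 trading sessions once weekends and market holidays are excluded.
A six-month window captures recent market behavior while smoothing over short-term noise.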

In [None]:
end_date = datetime(2025, 7, 1)
start_date = end_date - timedelta(days=180)
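To check what this window actually covers, here is a small offline sketch. It assumes that yfinance treats `end` as exclusive, and it counts weekdays only, so the result is an upper bound on trading sessions (exchange holidays are not removed). The names `last_day` and `weekdays` are just for illustration.

```python
from datetime import datetime, timedelta
import pandas as pd

end_date = datetime(2025, 7, 1)
start_date = end_date - timedelta(days=180)  # calendar days, not trading days

# Last date that can appear in the data, assuming `end` is exclusive.
last_day = end_date - timedelta(days=1)

# Monday-to-Friday dates in the window; holidays are NOT excluded here.
weekdays = pd.bdate_range(start_date, last_day)

print(start_date.date(), last_day.date(), len(weekdays))
# prints: 2025-01-02 2025-06-30 128
```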


I use a list to temporarily store each stock's DataFrame.
Later, all of these DataFrames are combined into a single master table.


In [None]:

all_data_list = []



For each ticker:
- Download historical prices
- Reset the index so Date becomes a column
- Ensure consistent column structure
- Add a Ticker column for identification

Giving every ticker the same columns prevents downstream issues during visualization or analysis, such as misaligned fields when the tables are combined.
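The steps above can be tried offline on a synthetic table before touching the network. This is only a sketch: `normalize_prices` is a hypothetical helper of mine, and the input frame is a made-up stand-in that imitates the two-level columns (price field, ticker) that recent yfinance versions return.

```python
import pandas as pd

def normalize_prices(raw: pd.DataFrame, ticker: str) -> pd.DataFrame:
    """Hypothetical helper: flatten columns, expose Date, tag with ticker."""
    df = raw.copy()
    # Keep only the price-field level if the columns are two-level.
    if isinstance(df.columns, pd.MultiIndex):
        df.columns = df.columns.get_level_values(0)
    # Move the Date index into a regular column.
    df = df.reset_index()
    df["Ticker"] = ticker
    return df[["Date", "Open", "High", "Low", "Close", "Volume", "Ticker"]]

# Made-up values, shaped like a single-ticker download.
idx = pd.to_datetime(["2025-01-02", "2025-01-03"]).rename("Date")
cols = pd.MultiIndex.from_product(
    [["Open", "High", "Low", "Close", "Volume"], ["AAPL"]]
)
raw = pd.DataFrame(
    [[1.0, 2.0, 0.5, 1.5, 100], [1.5, 2.5, 1.0, 2.0, 200]],
    index=idx,
    columns=cols,
)

tidy = normalize_prices(raw, "AAPL")
print(tidy.columns.tolist())
```

Flattening before concatenation matters because two-level columns differ per ticker, which would otherwise spread each stock into its own set of columns instead of stacking rows.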


In [None]:

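for ticker in tickers:
    try:
        # auto_adjust=True returns adjusted prices in 'Close' and omits a
        # separate 'Adj Close' column. The `end` date is exclusive.
        data = yf.download(
            ticker,
            start=start_date,
            end=end_date,
            auto_adjust=True
        )

        # Skip tickers that returned nothing instead of storing an empty table.
        if data.empty:
            print(f"❌ No data returned for {ticker}")
            continue

        # Recent yfinance versions return two-level columns such as
        # ('Close', 'AAPL'); keep only the price-field level so every
        # ticker ends up with the same flat column names.
        if isinstance(data.columns, pd.MultiIndex):
            data.columns = data.columns.get_level_values(0)

        # Turn the Date index into a column, then tag rows with the ticker.
        data = data.reset_index()
        data['Ticker'] = ticker

        # 'Close' is already adjusted here, so mirror it into 'Adj Close'.
        if 'Adj Close' not in data.columns:
            data['Adj Close'] = data['Close']

        required_columns = [
            'Date', 'Open', 'High', 'Low',
            'Close', 'Adj Close', 'Volume', 'Ticker'
        ]

        # Fill any other missing column with NaN rather than 0, so a gap
        # is not mistaken for a real price or volume.
        for col in required_columns:
            if col not in data.columns:
                data[col] = float('nan')

        data = data[required_columns]
        all_data_list.append(data)

        print(f"✅ Downloaded data for {ticker}")
    except Exception as e:
        print(f"❌ Error downloading {ticker}: {e}")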




All individual stock tables are stacked row-wise into a single long-format DataFrame, with one row per ticker per date.
This format is ideal for:
- Tableau dashboards
- Excel pivot tables
- Python-based analysis
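As a quick illustration of why the stacked layout is convenient in Python, the sketch below concatenates two tiny tables and computes per-ticker daily returns. All prices are made-up values, and `long_df` and `wide` are illustrative names only.

```python
import pandas as pd

dates = pd.to_datetime(["2025-01-02", "2025-01-03"])

# Made-up closing prices for two tickers, one small table each.
aapl = pd.DataFrame({"Date": dates, "Close": [100.0, 110.0], "Ticker": "AAPL"})
jpm = pd.DataFrame({"Date": dates, "Close": [200.0, 190.0], "Ticker": "JPM"})

# Stack the tables row-wise into one long table.
long_df = pd.concat([aapl, jpm], ignore_index=True)

# Daily return within each ticker; the first row of each ticker is NaN.
long_df["Return"] = long_df.groupby("Ticker")["Close"].pct_change()

# Optional wide view (one column per ticker), e.g. for a pivot-style chart.
wide = long_df.pivot(index="Date", columns="Ticker", values="Close")

print(long_df)
print(wide)
```

Grouping by `Ticker` keeps the return calculation from crossing the boundary between one stock's last row and the next stock's first row.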


In [None]:

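if all_data_list:
    # Stack the per-ticker tables row-wise into one table.
    all_data = pd.concat(all_data_list, ignore_index=True)

    # Safety net: flatten any two-level column names that survived,
    # e.g. ('Close', 'AAPL') becomes 'Close_AAPL'.
    if isinstance(all_data.columns, pd.MultiIndex):
        all_data.columns = [
            col[0] if col[1] == '' else f"{col[0]}_{col[1]}"
            for col in all_data.columns
        ]

    print("Columns:", all_data.columns.tolist())
    print("Data shape:", all_data.shape)

    # Writing .xlsx requires an Excel engine such as openpyxl to be installed.
    all_data.to_excel('stock_data.xlsx', index=False)

    print("✅ Data saved to stock_data.xlsx")
    print("Total rows:", len(all_data))
else:
    print("❌ No data downloaded")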
