# FAANG Stock Data Lab – Infrastructure & Setup

This Jupyter notebook explores hourly stock data for the FAANG companies—**Facebook (Meta), Apple, Amazon, Netflix, and Google (Alphabet)**—using Python. It guides you through:

- 📥 Retrieving data with `yfinance`  
- 🧹 Cleaning and preparing datasets  
- 📊 Visualizing trends  
- 📈 Performing basic statistical analysis  

All code follows **PEP 8** style guidelines for clarity and consistency. Each code cell focuses on a single step and includes comments explaining key lines.

Assignment specifics: this notebook maps directly to the module's problems. Problem 1 downloads the data, and Problem 2 plots the latest dataset. Stubs and notes for Problems 3–4 are provided in the README.

Target audience: an informed computing professional (e.g., a prospective employer). We assume strong computing background but not prior familiarity with these particular Python packages; comments and short explanations are provided where helpful.

### Key concepts: Setup

- Use PEP 8 for readable, consistent code ([PEP 8](https://peps.python.org/pep-0008/)).
- Set plotting defaults for reproducible visuals ([Jupyter best practices](https://jupyter.org/practices)).
- Keep imports minimal to reduce environment friction ([Real Python: imports](https://realpython.com/python-imports/)).

## 📚 Background: Accessing Market Data with `yfinance`

[`yfinance`](https://pypi.org/project/yfinance/) is a Python library that provides a simple interface to download historical and real-time financial data from Yahoo Finance. It’s widely used in research, education, and personal finance projects due to its ease of use.

> ⚠️ **Note:** `yfinance` is not affiliated with or endorsed by Yahoo Inc. Use it only for educational or research purposes.

## ⚙️ Install Dependencies (if needed)

If `yfinance` is not already installed in your notebook environment, run the following cell:

```python
# Install yfinance if not already available
%pip install yfinance
```

## 📊 Problem 1 — FAANG Stock Data with `yfinance`

In this task, you'll download **hourly stock data** for the FAANG companies:

- META (Facebook), AAPL (Apple), AMZN (Amazon), NFLX (Netflix), GOOG (Google)

You'll retrieve data for the **past 5 trading days** (`period='5d'`) using the `yfinance` library and save timestamped CSVs to the `data/` folder.

> 🧼 Keep code cells small, well-commented, and reproducible.

# ⚙️ Setup – Imports and Plotting Defaults

This step performs minimal setup:
- Imports required libraries
- Sets plotting defaults
- Defines a toggle (`SHOW_PREVIEW`) to control DataFrame previews

Tip: For reproducible environments, prefer a virtual environment and install from `requirements.txt` rather than installing within the notebook.

In [1]:
# Minimal Setup: imports, plotting defaults, and preview flag
# - Keep imports small for faster environment setup and easier reproducibility.
# - SHOW_PREVIEW toggles DataFrame previews (useful to suppress output in CI).

# Standard library
import os
from collections import Counter  # Simple frequency counting for diagnostics
from datetime import datetime, timezone

# Core third-party libraries used in the exercises
import pandas as pd  # Data analysis library for tabular data
import numpy as np   # Numerical computing (arrays, math helpers)
import yfinance as yf  # Financial data from Yahoo Finance
import matplotlib.pyplot as plt  # Plotting library
import seaborn as sns  # Plot styling built on top of matplotlib

# Toggle previews of large tables (set False for CI/grading)
SHOW_PREVIEW: bool = True

# Plotting defaults for consistency across environments
plt.rcParams['figure.figsize'] = (10, 5)
sns.set_style('whitegrid')

print("✅ Minimal setup complete. If you need to install packages, see README_SETUP.md.")

✅ Minimal setup complete. If you need to install packages, see README_SETUP.md.


### Key concepts: Setup (continued)

- Use `%pip install` in notebooks only if needed; prefer project venv + requirements.txt ([Jupyter best practices](https://jupyter.org/practices)).
- Keep notebook installs minimal to maintain reproducibility ([The Turing Way](https://the-turing-way.netlify.app/reproducible-research/overview/overview.html)).

# Package Verification and Install Guidance

## 🔍 Quick Verification – Package Availability

This cell checks that essential packages are available and performs a lightweight `yfinance` request to verify network/API access.

> ⚠️ If any package is missing, follow the install instructions in `README_SETUP.md` or use `requirements.txt` from the repo root.

### Quick verification of Installed Packages

In [2]:
# Quick verification of essential packages and yfinance functionality
# - Confirms pandas and yfinance are importable
# - Executes a small history() call to validate network access

ok = True

# Check pandas availability
if 'pd' not in globals() or pd is None:
    print("❌ pandas not available. Install with: python -m pip install pandas")
    ok = False

# Check yfinance availability
if 'yf' not in globals() or yf is None:
    print("❌ yfinance not available. Install with: python -m pip install yfinance")
    ok = False

# If both are present, perform a lightweight yfinance request
if ok:
    try:
        t = yf.Ticker("AAPL")  # Create a Ticker object for Apple
        df = t.history(period="1d", interval="1h")  # Fetch 1 day of hourly data
        print("✅ yfinance request succeeded — rows, cols =", df.shape)

        if SHOW_PREVIEW:
            display(df.head())  # Show first few rows for a quick sanity check
    except Exception as e:
        print("❌ yfinance request failed:", e)
else:
    print("⚠️ Environment not ready — install missing packages and re-run this cell.")

✅ yfinance request succeeded — rows, cols = (7, 7)


Datetime,Open,High,Low,Close,Volume,Dividends,Stock Splits
2025-10-17 09:30:00-04:00,248.020004,250.320007,247.270004,249.560104,11142993,0.0,0.0
2025-10-17 10:30:00-04:00,249.643997,249.860001,248.229996,248.479996,3459331,0.0,0.0
2025-10-17 11:30:00-04:00,248.479996,250.350006,248.097198,250.296494,4277714,0.0,0.0
2025-10-17 12:30:00-04:00,250.293304,251.360001,250.275208,250.815002,6063960,0.0,0.0
2025-10-17 13:30:00-04:00,250.800003,252.535004,250.244995,252.300003,3952984,0.0,0.0


### Key concepts: Setup (verification)

- Verify required packages up front to avoid mid-notebook failures ([Jupyter best practices](https://jupyter.org/practices)).
- Prefer lightweight network checks for external APIs ([Real Python: package management](https://realpython.com/python-package-management/)).
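
An up-front check can also be written so that it does not depend on names already imported into the notebook namespace. The sketch below uses only the standard library; the helper name `check_packages` and the placeholder package name `surely_not_installed_pkg_xyz` are illustrative assumptions, not part of the notebook.

```python
from importlib import metadata
from importlib.util import find_spec


def check_packages(names):
    """Map each top-level package name to its version, 'unknown', or None."""
    report = {}
    for name in names:
        if find_spec(name) is None:
            report[name] = None  # Not importable in this environment
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this exact name
            # (for example, a standard-library module such as json)
            report[name] = "unknown"
    return report


status = check_packages(["json", "surely_not_installed_pkg_xyz"])
for pkg, version in status.items():
    print(f"{pkg}: {'missing' if version is None else version}")
```

In the notebook, the list passed in would presumably be the packages named in `requirements.txt`.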

## Install packages from repository requirements.txt (if needed)

In [3]:
# 📦 Install packages from repository requirements.txt (if needed)
# - This is optional, intended for environments launched outside the project venv.
# - Prefer using the venv in README_SETUP.md. Use this cell only if packages are missing.

import sys
import subprocess
import pathlib
import importlib.util  # find_spec lives in the importlib.util submodule

# Path to requirements.txt (assumes it's one level up from the notebook)
req = pathlib.Path('..') / 'requirements.txt'

if req.exists():
    try:
        subprocess.check_call([
            sys.executable, '-m', 'pip', 'install', '--quiet', '-r', str(req)
        ])
        print(f"✅ Installed packages from {req}")
    except subprocess.CalledProcessError as e:
        print(f"❌ pip install failed with code {e.returncode}")
else:
    print(f"⚠️ requirements.txt not found at: {req}")

# 🔍 Quick availability check (non-fatal)
print("\n📦 Package availability check:")
for pkg in ('yfinance', 'pandas', 'numpy', 'requests', 'matplotlib', 'seaborn'):
    spec = importlib.util.find_spec(pkg)
    status = "✅ available" if spec else "❌ NOT available"
    print(f"{pkg}: {status}")

✅ Installed packages from ..\requirements.txt

📦 Package availability check:
yfinance: ✅ available
pandas: ✅ available
numpy: ✅ available
requests: ✅ available
matplotlib: ✅ available
seaborn: ✅ available


## FAANG Tickers and Data Download Overview

## Canonical Ticker List

### Key concepts: Fetch (ticker list)

- Centralize ticker list and deduplicate defensively ([Real Python: constants](https://realpython.com/python-constants/)).
- Validate DataFrames before downstream steps ([pandas docs](https://pandas.pydata.org/docs/)).

In [4]:
# ✅ Canonical list of FAANG tickers used throughout the notebook
# - Using a list keeps order consistent.
# - Deduplicate defensively to avoid accidental repeats.

tickers = ['META', 'AAPL', 'AMZN', 'NFLX', 'GOOG']

# Remove accidental duplicates while preserving order
# dict.fromkeys(...) creates a dict with keys in insertion order; list(...) extracts keys back.
tickers = list(dict.fromkeys(tickers))

# Sanity check
if len(tickers) != 5:
    print('⚠️ Warning: unexpected tickers list (duplicates removed):', tickers)

### Key concepts: Fetch (ticker list, continued)

- Deduplicate ticker lists with `dict.fromkeys()` ([Real Python: idioms](https://realpython.com/lessons/pythonic-idioms/)).
- Centralize constants to avoid drift ([Real Python: constants](https://realpython.com/python-constants/)).
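
To see what the deduplication idiom does when duplicates are actually present, here is a small sketch. The `raw_tickers` list is made up for illustration.

```python
# Hypothetical input containing accidental repeats
raw_tickers = ["META", "AAPL", "AMZN", "AAPL", "NFLX", "GOOG", "META"]

# dict keys are unique and keep insertion order (guaranteed since Python 3.7),
# so the first occurrence of each symbol survives in its original position.
unique_tickers = list(dict.fromkeys(raw_tickers))

print(unique_tickers)  # ['META', 'AAPL', 'AMZN', 'NFLX', 'GOOG']
```

A `set` would also drop the repeats, but it does not preserve the original order, which matters when the ticker order drives plot legends or file listings.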

# Fetch Function

## Download and label data

This section fetches hourly history for each ticker, labels each DataFrame, and saves CSVs to `data/`. Code is split into a small reusable function and a runner cell.

### Key concepts: Fetch (function)

- Encapsulate data fetching in a reusable function with clear return values ([pandas docs](https://pandas.pydata.org/docs/)).
- Use `df.copy()` to avoid SettingWithCopyWarning ([pandas DataFrame.copy](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.copy.html)).
- Return `None` on failure for defensive programming ([Python for Data Analysis](https://www.oreilly.com/library/view/python-for-data/9781491957653/)).
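
The return-`None`-on-failure pattern can be exercised without network access by passing in a stand-in fetcher. This is a minimal sketch; `safe_fetch` and `fake_fetch` are hypothetical names, and plain lists stand in for DataFrames.

```python
from typing import Callable, Dict, List, Optional


def safe_fetch(
    fetch: Callable[[str], List[float]], symbol: str
) -> Optional[List[float]]:
    """Call fetch(symbol); return its rows, or None on error or empty result."""
    try:
        rows = fetch(symbol)
    except Exception as exc:  # Broad on purpose: any failure means "no data"
        print(f"Error fetching {symbol}: {exc}")
        return None
    return rows if rows else None


def fake_fetch(symbol: str) -> List[float]:
    """Hypothetical stand-in for a network call."""
    if symbol == "BAD":
        raise ValueError("unknown symbol")
    return [] if symbol == "EMPTY" else [1.0, 2.0, 3.0]


collected: Dict[str, List[float]] = {}
for sym in ("AAPL", "BAD", "EMPTY"):
    rows = safe_fetch(fake_fetch, sym)
    if rows is not None:  # Callers need only one check for both failure modes
        collected[sym] = rows

print(sorted(collected))  # ['AAPL']
```

Collapsing "error" and "empty" into a single `None` keeps caller code short, at the cost of hiding which of the two happened; the printed message is the only trace.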

In [5]:
# 📥 Function to fetch hourly stock data for a single ticker
# - Encapsulates the yfinance call and minimal cleanup.
# - Returns a DataFrame with an added 'Ticker' column or None on failure.

from typing import Optional

def fetch_hourly_history(
    ticker: str,
    period: str = '5d',
    interval: str = '1h',
) -> Optional[pd.DataFrame]:
    """
    Fetch hourly historical stock data for a single ticker using yfinance.

    Parameters
    ----------
    ticker : str
        Stock symbol (e.g., 'AAPL').
    period : str
        Range of time to download (e.g., '5d' for 5 days).
    interval : str
        Data granularity (e.g., '1h' for hourly).

    Returns
    -------
    Optional[pd.DataFrame]
        Labeled DataFrame with index named 'Date', or None if fetch fails.
    """
    try:
        t = yf.Ticker(ticker)  # Construct a Ticker object for the symbol
        df = t.history(period=period, interval=interval)  # Retrieve historical data
        if df is None or df.empty:
            return None  # Nothing to process
        df = df.copy()  # Avoid SettingWithCopy issues
        df['Ticker'] = ticker  # Label rows with the symbol
        df.index.name = 'Date'  # Name the index for CSV readability
        return df
    except Exception as e:
        print(f"❌ Error fetching {ticker}: {e}")
        return None

# 🔍 Smoke test (non-blocking)
_test = fetch_hourly_history('AAPL')
if _test is not None:
    print('✅ Fetched AAPL rows:', _test.shape[0])

✅ Fetched AAPL rows: 35


### Key concepts: Save

- Use UTC timestamps in filenames for sorting and reproducibility ([Jupyter best practices](https://jupyter.org/practices)).
- Wrap file I/O in try/except for robust workflows ([Real Python: file I/O](https://realpython.com/python-file-io/)).
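
The reason a `YYYYMMDDTHHMMSSZ` stamp is convenient is that every field is zero-padded and fixed-width, so plain string sorting matches chronological order. A short sketch with made-up dates; `stamped_name` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y%m%dT%H%M%SZ"


def stamped_name(ticker, when):
    """Build '<ticker>_<UTC timestamp>.csv' from a timezone-aware datetime."""
    return f"{ticker}_{when.astimezone(timezone.utc).strftime(FMT)}.csv"


base = datetime(2025, 10, 18, 17, 42, 28, tzinfo=timezone.utc)
moments = [base + timedelta(days=30), base, base - timedelta(hours=5)]
names = [stamped_name("AAPL", m) for m in moments]

# Sorting the strings gives the same order as sorting the datetimes
assert sorted(names) == [stamped_name("AAPL", m) for m in sorted(moments)]

print(sorted(names)[-1])  # AAPL_20251117T174228Z.csv
```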

## Runner – Save CSVs

In [6]:
# Runner: Fetch data for each ticker and save to a timestamped CSV file
# - Ensures the output folder exists
# - Skips duplicates and gracefully handles missing data
# - Uses sortable UTC timestamps (YYYYMMDDTHHMMSSZ)

# Ensure ticker list has no duplicates (defensive)
tickers = list(dict.fromkeys(tickers))

# Create output folder (inside notebooks/ by design for this repo)
os.makedirs('data', exist_ok=True)

# Track saved files
results: dict[str, str] = {}
seen: set[str] = set()

for ticker in tickers:
    # Skip duplicate symbols if any
    if ticker in seen:
        print(f'⏭️ Skipping duplicate ticker {ticker}')
        continue
    seen.add(ticker)

    # Fetch hourly data
    df = fetch_hourly_history(ticker)
    if df is None or df.empty:
        print(f'⚠️ No data for {ticker}')
        continue

    # If a CSV for this ticker already exists in data/ (from this or any
    # earlier run), skip saving and reuse it
    existing = [
        p for p in os.listdir('data')
        if p.startswith(f"{ticker}_") and p.endswith('.csv')
    ]
    if existing:
        print(f'📁 File already exists for {ticker}, skipping save: {existing[0]}')
        results[ticker] = os.path.join('data', existing[0])
        continue

    # Timestamped filename (UTC)
    ts = datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%SZ')
    filename = os.path.join('data', f'{ticker}_{ts}.csv')

    try:
        df.to_csv(filename, index=True)
        results[ticker] = filename
        print(f'✅ Saved {ticker} -> {filename}')
    except Exception as exc:
        print(f'❌ Failed to save {ticker}: {exc}')

# Show mapping of ticker -> file path for reference
results

✅ Saved META -> data\META_20251018T174228Z.csv
✅ Saved AAPL -> data\AAPL_20251018T174228Z.csv
✅ Saved AMZN -> data\AMZN_20251018T174228Z.csv
✅ Saved NFLX -> data\NFLX_20251018T174228Z.csv
✅ Saved GOOG -> data\GOOG_20251018T174228Z.csv


{'META': 'data\\META_20251018T174228Z.csv',
 'AAPL': 'data\\AAPL_20251018T174228Z.csv',
 'AMZN': 'data\\AMZN_20251018T174228Z.csv',
 'NFLX': 'data\\NFLX_20251018T174228Z.csv',
 'GOOG': 'data\\GOOG_20251018T174228Z.csv'}

### Key concepts: Save (runner)

- Check for existing files before saving to avoid duplication ([Real Python: file I/O](https://realpython.com/python-file-io/)).
- Use try/except for robust file operations ([Python Cookbook](https://www.oreilly.com/library/view/python-cookbook-3rd/9781449340377/ch17s12.html)).
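
When several runs have written files for the same ticker, a later step needs a deterministic way to pick one. `os.listdir()` returns entries in arbitrary order, so one option is to sort the matches and take the last, relying on the sortable timestamp in the filename. A minimal sketch; `latest_csv` is a hypothetical helper and the files are empty placeholders in a temporary folder.

```python
import tempfile
from pathlib import Path


def latest_csv(data_dir, ticker):
    """Return the newest '<ticker>_<timestamp>.csv' in data_dir, or None."""
    matches = sorted(Path(data_dir).glob(f"{ticker}_*.csv"))
    return matches[-1] if matches else None


with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "AAPL_20251017T090000Z.csv",
        "AAPL_20251018T174228Z.csv",
        "GOOG_20251018T174228Z.csv",
    ):
        (Path(tmp) / name).touch()  # Create empty placeholder files

    newest_aapl = latest_csv(tmp, "AAPL").name
    missing = latest_csv(tmp, "NFLX")

print(newest_aapl)  # AAPL_20251018T174228Z.csv
print(missing)      # None
```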

## 🔍 Fetch and Preview Hourly Data for Each Ticker

This section builds a dictionary of hourly stock data for each FAANG ticker using the `fetch_hourly_history()` function.

- Each ticker is fetched individually.
- Valid DataFrames are stored in a dictionary keyed by ticker symbol.
- Optionally, a preview of each DataFrame is displayed.

> 📦 This modular approach supports downstream analysis and avoids failures due to missing data.


In [7]:
# 📦 Build a dictionary to store hourly data for each ticker
data = {}

for ticker in tickers:
    # Fetch hourly historical data using the custom function
    df = fetch_hourly_history(ticker)

    # Skip if no data is returned or DataFrame is empty
    if df is None or df.empty:
        print(f"⚠️ No data for {ticker}")
        continue

    # Store valid DataFrame in the dictionary
    data[ticker] = df

# 👀 Preview: Display shape and head of each DataFrame
for sym, df in data.items():
    print(f"{sym}: {df.shape}")
    
    if SHOW_PREVIEW:
        display(df.head())

META: (35, 8)


Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,Ticker
2025-10-13 09:30:00-04:00,713.01001,719.940002,707.641296,716.650024,3474259,0.0,0.0,META
2025-10-13 10:30:00-04:00,716.64502,718.414978,708.080017,713.859985,1424938,0.0,0.0,META
2025-10-13 11:30:00-04:00,714.0,714.75,711.710022,714.099976,621689,0.0,0.0,META
2025-10-13 12:30:00-04:00,714.210022,714.890015,712.149292,713.13501,478944,0.0,0.0,META
2025-10-13 13:30:00-04:00,713.030029,715.48999,712.100098,714.96051,499360,0.0,0.0,META


AAPL: (35, 8)


Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,Ticker
2025-10-13 09:30:00-04:00,249.380005,249.490005,245.559998,246.376007,9359382,0.0,0.0,AAPL
2025-10-13 10:30:00-04:00,246.369995,248.659897,246.229996,248.590302,5816790,0.0,0.0,AAPL
2025-10-13 11:30:00-04:00,248.619995,249.619995,248.360001,249.179993,3115587,0.0,0.0,AAPL
2025-10-13 12:30:00-04:00,249.169998,249.689301,248.550003,249.104202,2298264,0.0,0.0,AAPL
2025-10-13 13:30:00-04:00,249.089996,249.320007,248.688004,249.146194,2175748,0.0,0.0,AAPL


AMZN: (35, 8)


Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,Ticker
2025-10-13 09:30:00-04:00,217.699997,220.429993,217.039993,220.110001,11916565,0.0,0.0,AMZN
2025-10-13 10:30:00-04:00,220.100006,220.539993,218.690002,220.160004,5313426,0.0,0.0,AMZN
2025-10-13 11:30:00-04:00,220.1698,220.669998,219.899994,220.169998,3478186,0.0,0.0,AMZN
2025-10-13 12:30:00-04:00,220.169998,220.589996,219.860504,220.050003,2850696,0.0,0.0,AMZN
2025-10-13 13:30:00-04:00,220.050003,220.570007,219.690094,220.524994,2655303,0.0,0.0,AMZN


NFLX: (35, 8)


Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,Ticker
2025-10-13 09:30:00-04:00,1221.930054,1231.119995,1206.810059,1227.360107,591757,0.0,0.0,NFLX
2025-10-13 10:30:00-04:00,1227.359985,1230.989868,1219.27002,1223.165039,243480,0.0,0.0,NFLX
2025-10-13 11:30:00-04:00,1223.810059,1226.25,1220.300049,1224.194946,187927,0.0,0.0,NFLX
2025-10-13 12:30:00-04:00,1223.27002,1224.98999,1218.22998,1220.255005,172973,0.0,0.0,NFLX
2025-10-13 13:30:00-04:00,1220.64502,1222.094971,1218.180054,1219.209961,178239,0.0,0.0,NFLX


GOOG: (35, 8)


Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,Ticker
2025-10-13 09:30:00-04:00,240.970001,243.179993,240.75,241.630005,3137872,0.0,0.0,GOOG
2025-10-13 10:30:00-04:00,241.585007,243.259995,240.910004,242.940002,1431030,0.0,0.0,GOOG
2025-10-13 11:30:00-04:00,242.899994,243.800003,242.729996,243.546005,1096593,0.0,0.0,GOOG
2025-10-13 12:30:00-04:00,243.5,243.820007,242.544998,243.018402,1244831,0.0,0.0,GOOG
2025-10-13 13:30:00-04:00,243.009995,244.039993,242.899994,243.819901,780801,0.0,0.0,GOOG


### Key concepts: Plot

- Plot all tickers on one chart for comparison ([matplotlib docs](https://matplotlib.org/stable/users/index.html)).
- Label axes, add legend, and use a date-range title for clarity.
- Save plots with timestamped filenames for reproducibility.
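
The three points above can be sketched as follows. This is not the notebook's Problem 2 solution: the symbols `AAA` and `BBB`, their values, the output filename, and the temporary output folder are all made up so the example runs without network access. In the notebook, the `Close` column of each downloaded DataFrame would presumably take the place of the synthetic lists.

```python
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # Headless backend for scripts; omit inside a notebook
import matplotlib.pyplot as plt

# Synthetic stand-in for hourly closing prices (illustrative values only)
start = datetime(2025, 10, 13, 13, 30, tzinfo=timezone.utc)
times = [start + timedelta(hours=h) for h in range(7)]
closes = {
    "AAA": [100 + h for h in range(7)],
    "BBB": [120 - 0.5 * h for h in range(7)],
}

fig, ax = plt.subplots(figsize=(10, 5))
for symbol, values in closes.items():
    ax.plot(times, values, label=symbol)  # One line per ticker, shared axes

ax.set_xlabel("Time (UTC)")
ax.set_ylabel("Close (USD)")
ax.set_title(
    f"Hourly close, {times[0]:%Y-%m-%d %H:%M} to {times[-1]:%Y-%m-%d %H:%M}"
)
ax.legend()
fig.autofmt_xdate()  # Tilt date labels so they do not overlap

# Save under a sortable UTC timestamp, mirroring the CSV naming scheme
out_dir = Path(tempfile.mkdtemp())
stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
out_path = out_dir / f"closes_{stamp}.png"
fig.savefig(out_path, dpi=100)
plt.close(fig)
print(out_path.name)
```

Because the five FAANG prices differ by an order of magnitude, a shared y-axis flattens the lower-priced lines; normalizing each series to its first value is one possible alternative.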

## Diagnostic – Ticker Validation

In [8]:
# 🧪 Diagnostic: Show effective tickers and detect duplicates
# (Counter was imported at the top of the notebook)

# Display the current list of tickers
print("📋 Effective tickers:", tickers)

# Count occurrences of each ticker
counts = Counter(tickers)

# Identify any tickers that appear more than once
dups = [t for t, c in counts.items() if c > 1]

# Report findings
if dups:
    print(f"⚠️ Duplicate tickers found: {dups}")
else:
    print("✅ No duplicate tickers detected.")



📋 Effective tickers: ['META', 'AAPL', 'AMZN', 'NFLX', 'GOOG']
✅ No duplicate tickers detected.


### Key concepts: Diagnostics

- Use `collections.Counter` for quick frequency checks ([Python Cookbook](https://www.oreilly.com/library/view/python-cookbook/0596001673/ch01s02.html)).
- Provide clear feedback to help users debug inputs ([Real Python: notebooks](https://realpython.com/python-notebooks/)).
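
Because `tickers` has already passed through `dict.fromkeys()` by the time the diagnostic cell runs, the check can only report duplicates if it is applied to the raw input. A small sketch of that ordering, using a made-up `raw_tickers` list:

```python
from collections import Counter

# Hypothetical raw input, before any deduplication
raw_tickers = ["META", "AAPL", "AMZN", "AAPL", "NFLX", "GOOG", "META"]

counts = Counter(raw_tickers)
dups = sorted(sym for sym, n in counts.items() if n > 1)
print(dups)  # ['AAPL', 'META']

# After deduplication every count is 1, so the same check finds nothing
deduped = list(dict.fromkeys(raw_tickers))
assert all(n == 1 for n in Counter(deduped).values())
```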

# END