# FAANG Stock Data Lab – Infrastructure & Setup

This Jupyter notebook explores hourly stock data for the FAANG companies—**Facebook (Meta), Apple, Amazon, Netflix, and Google (Alphabet)**—using Python. It guides you through:

- 📥 Retrieving data with `yfinance`  
- 🧹 Cleaning and preparing datasets  
- 📊 Visualizing trends  
- 📈 Performing basic statistical analysis  

All code follows **PEP 8** style guidelines for clarity and consistency. Each code cell focuses on a single step and includes comments explaining key lines.

Assignment specifics: this notebook maps directly to the module Problems. Problem 1 handles data download, and Problem 2 plots the latest dataset. Stubs/notes are provided for Problems 3–4 in the README.

Target audience: an informed computing professional (e.g., a prospective employer). We assume strong computing background but not prior familiarity with these particular Python packages; comments and short explanations are provided where helpful.

### 📘 Explanation of Key Concepts with Sources

- **PEP 8 Compliance**: Following PEP 8 ensures code readability and consistency. As emphasized in *Python Cookbook* (David Beazley, Chapter 1), clean code is easier to maintain and debug. https://www.oreilly.com/library/view/python-cookbook/0596001673/ch01s02.html
- **Plotting Defaults**: Setting consistent styles with `matplotlib` and `seaborn` avoids rendering issues across environments. *Python for Data Analysis* (Wes McKinney, Chapter 5) highlights the importance of reproducible visuals in data workflows. https://www.oreilly.com/library/view/python-for-data/9781491957653/ch05.html
- **Minimal Imports**: Keeping imports lean supports reproducibility and avoids bloated environments. Real Python recommends importing only what’s needed to reduce dependency conflicts and improve notebook performance. https://realpython.com/python-imports/


## 📚 Background: Accessing Market Data with `yfinance`

[`yfinance`](https://pypi.org/project/yfinance/) is a Python library that provides a simple interface to download historical and real-time financial data from Yahoo Finance. It’s widely used in research, education, and personal finance projects due to its ease of use.

> ⚠️ **Note:** `yfinance` is not affiliated with or endorsed by Yahoo Inc. Use it only for educational or research purposes.

## ⚙️ Install Dependencies (if needed)

If `yfinance` is not already installed in your notebook environment, run the following cell:

```python
# Install yfinance if not already available
%pip install yfinance
```

## 📊 Problem 1 — FAANG Stock Data with `yfinance`

In this task, you'll download **hourly stock data** for the FAANG companies:

- META (Facebook), AAPL (Apple), AMZN (Amazon), NFLX (Netflix), GOOG (Google)

You'll retrieve data for the **past 5 days** using the `yfinance` library and save timestamped CSVs to the `data/` folder.

> 🧼 Keep code cells small, well-commented, and reproducible.

# ⚙️ Setup – Imports and Plotting Defaults

This step performs minimal setup:
- Imports required libraries
- Sets plotting defaults
- Defines a toggle (`SHOW_PREVIEW`) to control DataFrame previews

Tip: For reproducible environments, prefer a virtual environment and install from `requirements.txt` rather than installing within the notebook.

In [None]:
# Minimal Setup: imports, plotting defaults, and preview flag
# - Keep imports small for faster environment setup and easier reproducibility.
# - SHOW_PREVIEW toggles DataFrame previews (useful to suppress output in CI).

# Standard library
import os
from collections import Counter  # Simple frequency counting for diagnostics
from datetime import datetime, timezone

# Third-party libraries used in the exercises
import matplotlib.pyplot as plt  # Plotting library
import numpy as np  # Numerical computing (arrays, math helpers)
import pandas as pd  # Data analysis library for tabular data
import seaborn as sns  # Plot styling built on top of matplotlib
import yfinance as yf  # Financial data from Yahoo Finance

# Toggle previews of large tables (set False for CI/grading)
SHOW_PREVIEW: bool = True

# Plotting defaults for consistency across environments
plt.rcParams['figure.figsize'] = (10, 5)
sns.set_style('whitegrid')

print("✅ Minimal setup complete. If you need to install packages, see README_SETUP.md.")

✅ Minimal setup complete. If you need to install packages, see README_SETUP.md.


### 📘 Explanation of Key Concepts with Sources – yfinance & Install Cell

- **yfinance Overview**: `yfinance` provides a simple interface to access financial data. *Python for Data Analysis* (McKinney) discusses using external APIs for real-world datasets, making this library ideal for educational labs. https://www.oreilly.com/library/view/python-for-data/9781491957653/ch04.html
- **Install Cell Design**: Using `%pip install` is preferred in Jupyter notebooks for compatibility. Real Python advises avoiding unnecessary installs in shared environments to preserve reproducibility and prevent side effects. https://realpython.com/python-pip-install/

# Package Verification and Install Guidance

## 🔍 Quick Verification – Package Availability

This step checks that essential packages are available and performs a lightweight `yfinance` request to verify network/API access.

If a package is missing, follow the install instructions in `README_SETUP.md` or install from `requirements.txt` in the repo root.

### Quick verification of Installed Packages

In [None]:
# Quick verification of essential packages and yfinance functionality
# - Confirms pandas and yfinance are importable
# - Executes a small history() call to validate network access

ok = True

# Check pandas availability
if 'pd' not in globals() or pd is None:
    print("❌ pandas not available. Install with: python -m pip install pandas")
    ok = False

# Check yfinance availability
if 'yf' not in globals() or yf is None:
    print("❌ yfinance not available. Install with: python -m pip install yfinance")
    ok = False

# If both are present, perform a lightweight yfinance request
if ok:
    try:
        t = yf.Ticker("AAPL")  # Create a Ticker object for Apple
        df = t.history(period="1d", interval="1h")  # Fetch 1 day of hourly data
        print("✅ yfinance request succeeded — rows, cols =", df.shape)

        if SHOW_PREVIEW:
            display(df.head())  # Show first few rows for a quick sanity check
    except Exception as e:
        print("❌ yfinance request failed:", e)
else:
    print("⚠️ Environment not ready — install missing packages and re-run this cell.")

✅ yfinance request succeeded — rows, cols = (7, 7)


| Datetime | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|
| 2025-10-10 09:30:00-04:00 | 255.320007 | 256.292908 | 254.880005 | 255.179993 | 7963458 | 0.0 | 0.0 |
| 2025-10-10 10:30:00-04:00 | 255.199997 | 255.300003 | 248.949997 | 249.410004 | 9992812 | 0.0 | 0.0 |
| 2025-10-10 11:30:00-04:00 | 249.039993 | 249.664993 | 247.759995 | 248.610001 | 6989327 | 0.0 | 0.0 |
| 2025-10-10 12:30:00-04:00 | 248.589996 | 249.059998 | 247.059998 | 247.809998 | 4394995 | 0.0 | 0.0 |
| 2025-10-10 13:30:00-04:00 | 247.770004 | 248.570007 | 247.080002 | 247.160004 | 4882982 | 0.0 | 0.0 |


### 📘 Explanation of Key Concepts with Sources – Package Verification

- **Subprocess & Importlib**: The optional install cell that follows runs `pip` through `subprocess` using `sys.executable`, so packages are installed into the same interpreter the kernel is running, and it checks availability with `importlib.util.find_spec()`, which locates a package without importing it. *Python Cookbook* (Beazley) recommends these tools for shell automation and safe introspection. https://www.oreilly.com/library/view/python-cookbook/0596001673/ch08s15.html
- **Environment Safety**: Verifying packages before running analysis ensures stability. Real Python emphasises proactive checks to avoid runtime errors and improve user experience. https://realpython.com/python-package-management/
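
As a small sketch of the `importlib.util.find_spec()` check mentioned above, the helper below reports which packages cannot be found, without importing any of them. The helper name `missing_packages` and the package names passed to it are assumptions made for this illustration; they are not part of the notebook.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be located on the current import path."""
    # For a top-level name, find_spec() returns None when the package is
    # not installed; it does not import the package itself.
    return [name for name in names if importlib.util.find_spec(name) is None]


# 'json' ships with Python; the second name is deliberately made up.
print(missing_packages(['json', 'surely_not_a_real_package_xyz']))
```

An empty list means every requested package is importable, so a notebook could skip the install step in that case.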


## Install packages from repository requirements.txt (if needed)

In [None]:
# 📦 Install packages from repository requirements.txt (if needed)
# - This is optional, intended for environments launched outside the project venv.
# - Prefer using the venv in README_SETUP.md. Use this cell only if packages are missing.

import sys
import subprocess
import pathlib
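import importlib.util  # Import the submodule explicitly; `import importlib` alone does not guarantee `importlib.util` is loaded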

# Path to requirements.txt (assumes it's one level up from the notebook)
req = pathlib.Path('..') / 'requirements.txt'

if req.exists():
    try:
        subprocess.check_call([
            sys.executable, '-m', 'pip', 'install', '--quiet', '-r', str(req)
        ])
        print(f"✅ Installed packages from {req}")
    except subprocess.CalledProcessError as e:
        print(f"❌ pip install failed with code {e.returncode}")
else:
    print(f"⚠️ requirements.txt not found at: {req}")

# 🔍 Quick availability check (non-fatal)
print("\n📦 Package availability check:")
for pkg in ('yfinance', 'pandas', 'numpy', 'requests', 'matplotlib', 'seaborn'):
    spec = importlib.util.find_spec(pkg)
    status = "✅ available" if spec else "❌ NOT available"
    print(f"{pkg}: {status}")

✅ Installed packages from ..\requirements.txt

📦 Package availability check:
yfinance: ✅ available
pandas: ✅ available
numpy: ✅ available
requests: ✅ available
scipy: ✅ available
matplotlib: ✅ available


## FAANG Tickers and Data Download Overview

## 📈 FAANG Tickers and Data Download

This section defines the canonical list of FAANG tickers used throughout the notebook:

- **META**, **AAPL**, **AMZN**, **NFLX**, **GOOG**

The list is deduplicated and used as the authoritative source for all data operations.

> 🧠 Lists are a core Python data structure.  
> See: [W3Schools – Python Lists](https://www.w3schools.com/python/python_lists.asp)  
> See: *Fluent Python* by Luciano Ramalho, Chapter 2 – Data Structures

We then define a reusable function to fetch hourly stock data using `yfinance`, label the data, and return a cleaned DataFrame. A runner cell loops through each ticker, saves the data to timestamped CSVs in the `data/` folder, and handles errors gracefully.

> 📘 References:
> - [`yfinance` on PyPI](https://pypi.org/project/yfinance/)
> - [pandas.DataFrame.empty](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.empty.html)
> - [pandas.DataFrame.copy](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.copy.html)
> - [Python for Data Analysis](https://www.oreilly.com/library/view/python-for-data/9781491957653/)
> - [Python Cookbook](https://www.oreilly.com/library/view/python-cookbook/9781449340377/)

## Canonical Ticker List

In [None]:
# ✅ Canonical list of FAANG tickers used throughout the notebook
# - Using a list keeps order consistent.
# - Deduplicate defensively to avoid accidental repeats.

tickers = ['META', 'AAPL', 'AMZN', 'NFLX', 'GOOG']

# Remove accidental duplicates while preserving order
# dict.fromkeys(...) creates a dict with keys in insertion order; list(...) extracts keys back.
tickers = list(dict.fromkeys(tickers))

# Sanity check
if len(tickers) != 5:
    print('⚠️ Warning: unexpected tickers list (duplicates removed):', tickers)

### 📘 Explanation of Key Concepts with Sources - FAANG Ticker List

- **List Deduplication**: Using `dict.fromkeys()` to remove duplicates while preserving order is a [Pythonic idiom](https://realpython.com/lessons/pythonic-idioms/). *Fluent Python* (Luciano Ramalho, Chapter 2) explores the power and flexibility of built-in data structures like lists and dictionaries. https://www.oreilly.com/library/view/fluent-python/9781491946237/ch02.html
- **Canonical Source**: Defining a single authoritative list avoids duplication and confusion. Real Python recommends centralising constants to improve maintainability and reduce bugs. https://realpython.com/python-constants/
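
To illustrate the deduplication idiom described above, here is a minimal sketch. The helper name `dedupe_preserving_order` and the sample list with repeats are assumptions made for the example; the notebook itself applies `dict.fromkeys()` inline.

```python
def dedupe_preserving_order(items):
    """Drop repeated items while keeping the first occurrence of each."""
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so converting back to a list yields an ordered, duplicate-free result.
    return list(dict.fromkeys(items))


raw = ['META', 'AAPL', 'META', 'GOOG', 'AAPL']
print(dedupe_preserving_order(raw))  # ['META', 'AAPL', 'GOOG']
```

A plain `set(raw)` would also remove the repeats, but it does not guarantee the original order, which matters when the list drives plot legends or file naming.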


# Fetch Function

## Download and label data

This section fetches hourly history for each ticker, labels each DataFrame, and saves CSVs to `data/`. Code is split into a small reusable function and a runner cell.

In [None]:
# 📥 Function to fetch hourly stock data for a single ticker
# - Encapsulates the yfinance call and minimal cleanup.
# - Returns a DataFrame with an added 'Ticker' column or None on failure.

from typing import Optional

def fetch_hourly_history(ticker: str, period: str = '5d', interval: str = '1h') -> Optional[pd.DataFrame]:
    """
    Fetch hourly historical stock data for a single ticker using yfinance.

    Parameters
    ----------
    ticker : str
        Stock symbol (e.g., 'AAPL').
    period : str
        Range of time to download (e.g., '5d' for 5 days).
    interval : str
        Data granularity (e.g., '1h' for hourly).

    Returns
    -------
    Optional[pd.DataFrame]
        Labeled DataFrame with index named 'Date', or None if fetch fails.
    """
    try:
        t = yf.Ticker(ticker)  # Construct a Ticker object for the symbol
        df = t.history(period=period, interval=interval)  # Retrieve historical data
        if df is None or df.empty:
            return None  # Nothing to process
        df = df.copy()  # Avoid SettingWithCopy issues
        df['Ticker'] = ticker  # Label rows with the symbol
        df.index.name = 'Date'  # Name the index for CSV readability
        return df
    except Exception as e:
        print(f"❌ Error fetching {ticker}: {e}")
        return None

# 🔍 Smoke test (non-blocking)
_test = fetch_hourly_history('AAPL')
if _test is not None:
    print('✅ Fetched AAPL rows:', _test.shape[0])

✅ Fetched AAPL rows: 35


### 📘 Explanation of Key Concepts with Sources – fetch_hourly_history()

- **DataFrame Validation**: Checking `df.empty` prevents errors during analysis. See [pandas.DataFrame.empty](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.empty.html).
- **Copying DataFrames**: Using `df.copy()` avoids `SettingWithCopyWarning`, a common issue discussed in *Python for Data Analysis* (McKinney, Chapter 5). https://www.oreilly.com/library/view/python-for-data/9781491957653/ch05.html
- **Index Naming**: Setting `df.index.name = 'Date'` improves CSV readability and downstream parsing. This aligns with McKinney’s guidance on preparing data for export. https://www.oreilly.com/library/view/python-for-data/9781491957653/ch05.html
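
The three points above can be tried offline with a small sketch that needs no network access. The helper `label_frame`, the symbol `'DEMO'`, and the prices are all made up for illustration; this is not the notebook's `fetch_hourly_history()`, only the validation and labeling steps applied to synthetic data.

```python
import pandas as pd


def label_frame(df, ticker):
    """Return a labeled copy of df, or None when there is nothing to label."""
    if df is None or df.empty:
        return None
    out = df.copy()          # work on a copy so the caller's frame is untouched
    out['Ticker'] = ticker   # label every row with the symbol
    out.index.name = 'Date'  # becomes the first column header in a CSV export
    return out


# Synthetic prices for illustration only (not real market data).
idx = pd.to_datetime(['2025-01-02 09:30', '2025-01-02 10:30'])
raw = pd.DataFrame({'Close': [100.0, 101.5]}, index=idx)

labeled = label_frame(raw, 'DEMO')
print(labeled.index.name, list(labeled.columns))
print(label_frame(pd.DataFrame(), 'DEMO'))
```

Because the function works on a copy, `raw` still has only its `Close` column afterwards.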


## Runner – Save CSVs

In [None]:
# Runner: Fetch data for each ticker and save to a timestamped CSV file
# - Ensures the output folder exists
# - Skips duplicates and gracefully handles missing data
# - Uses sortable UTC timestamps (YYYYMMDDTHHMMSSZ)

# Ensure ticker list has no duplicates (defensive)
tickers = list(dict.fromkeys(tickers))

# Create output folder (inside notebooks/ by design for this repo)
os.makedirs('data', exist_ok=True)

# Track saved files
results: dict[str, str] = {}
seen: set[str] = set()

for ticker in tickers:
    # Skip duplicate symbols if any
    if ticker in seen:
        print(f'⏭️ Skipping duplicate ticker {ticker}')
        continue
    seen.add(ticker)

    # Fetch hourly data
    df = fetch_hourly_history(ticker)
    if df is None or df.empty:
        print(f'⚠️ No data for {ticker}')
        continue

    # If any CSV for this ticker already exists in data/ (from this or an earlier run), keep it and skip saving
    existing = [p for p in os.listdir('data') if p.startswith(f"{ticker}_") and p.endswith('.csv')]
    if existing:
        print(f'📁 File already exists for {ticker}, skipping save: {existing[0]}')
        results[ticker] = os.path.join('data', existing[0])
        continue

    # Timestamped filename (UTC)
    ts = datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%SZ')
    filename = os.path.join('data', f'{ticker}_{ts}.csv')

    try:
        df.to_csv(filename, index=True)
        results[ticker] = filename
        print(f'✅ Saved {ticker} -> {filename}')
    except Exception as exc:
        print(f'❌ Failed to save {ticker}: {exc}')

# Show mapping of ticker -> file path for reference
results

✅ Saved META -> data\META_20251013T081225Z.csv
✅ Saved AAPL -> data\AAPL_20251013T081225Z.csv
✅ Saved AMZN -> data\AMZN_20251013T081225Z.csv
✅ Saved NFLX -> data\NFLX_20251013T081225Z.csv
✅ Saved GOOG -> data\GOOG_20251013T081225Z.csv


{'META': 'data\\META_20251013T081225Z.csv',
 'AAPL': 'data\\AAPL_20251013T081225Z.csv',
 'AMZN': 'data\\AMZN_20251013T081225Z.csv',
 'NFLX': 'data\\NFLX_20251013T081225Z.csv',
 'GOOG': 'data\\GOOG_20251013T081225Z.csv'}

### 📘 Explanation of Key Concepts with Sources - Runner: Save CSVs

- **Idempotent Saves**: Checking for existing files before saving prevents duplication. Real Python recommends this pattern for safe file I/O. https://realpython.com/python-file-io/
- **Timestamped Filenames**: The fixed-width UTC format (`%Y%m%dT%H%M%SZ`) makes filenames sort lexicographically in chronological order and keeps them independent of the machine's local time zone. *Python for Data Analysis* (McKinney) encourages timestamping for version control. https://www.oreilly.com/library/view/python-for-data/9781491957653/ch05.html
- **Error Handling**: Wrapping `df.to_csv()` in `try/except` ensures robustness. *Python Cookbook* (Beazley) emphasises fault-tolerant scripting for production-grade workflows. https://www.oreilly.com/library/view/python-cookbook-3rd/9781449340377/ch17s12.html
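
The filename convention can be sketched with the standard library alone. The helper names `stamped_name` and `parse_stamp` and the two sample datetimes are assumptions made for this example; only the `%Y%m%dT%H%M%SZ` format comes from the runner cell.

```python
from datetime import datetime, timezone
from pathlib import Path

TS_FORMAT = '%Y%m%dT%H%M%SZ'  # same pattern the runner cell uses


def stamped_name(ticker, when):
    """Build '<TICKER>_<UTC timestamp>.csv' from a timezone-aware datetime."""
    return f"{ticker}_{when.astimezone(timezone.utc).strftime(TS_FORMAT)}.csv"


def parse_stamp(filename):
    """Recover the UTC datetime embedded in a name built by stamped_name()."""
    stamp = Path(filename).stem.rsplit('_', 1)[1]
    return datetime.strptime(stamp, TS_FORMAT).replace(tzinfo=timezone.utc)


earlier = datetime(2025, 10, 13, 8, 12, 25, tzinfo=timezone.utc)
later = datetime(2025, 10, 14, 7, 0, 0, tzinfo=timezone.utc)
names = [stamped_name('AAPL', later), stamped_name('AAPL', earlier)]

# Plain string sorting puts the older file first and the newest last.
print(sorted(names))
```

Every field in the timestamp is zero-padded and ordered from year down to second, which is why comparing the names as strings gives the same result as comparing the datetimes.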


## 🔍 Fetch and Preview Hourly Data for Each Ticker

This section builds a dictionary of hourly stock data for each FAANG ticker using the `fetch_hourly_history()` function.

- Each ticker is fetched individually.
- Valid DataFrames are stored in a dictionary keyed by ticker symbol.
- Optionally, a preview of each DataFrame is displayed.

> 📦 This modular approach supports downstream analysis and avoids failures due to missing data.


In [7]:
# 📦 Build a dictionary to store hourly data for each ticker
data = {}

for ticker in tickers:
    # Fetch hourly historical data using the custom function
    df = fetch_hourly_history(ticker)

    # Skip if no data is returned or DataFrame is empty
    if df is None or df.empty:
        print(f"⚠️ No data for {ticker}")
        continue

    # Store valid DataFrame in the dictionary
    data[ticker] = df

# 👀 Preview: Display shape and head of each DataFrame
for sym, df in data.items():
    print(f"{sym}: {df.shape}")
    
    if SHOW_PREVIEW:
        display(df.head())

META: (35, 8)


| Date | Open | High | Low | Close | Volume | Dividends | Stock Splits | Ticker |
|---|---|---|---|---|---|---|---|---|
| 2025-10-06 09:30:00-04:00 | 705.0 | 706.619995 | 690.659973 | 700.933594 | 7964634 | 0.0 | 0.0 | META |
| 2025-10-06 10:30:00-04:00 | 701.26001 | 706.116699 | 699.0 | 703.590027 | 2664362 | 0.0 | 0.0 | META |
| 2025-10-06 11:30:00-04:00 | 703.659973 | 708.030029 | 703.159973 | 705.799988 | 1677535 | 0.0 | 0.0 | META |
| 2025-10-06 12:30:00-04:00 | 705.785583 | 709.950012 | 705.52002 | 708.565125 | 1489886 | 0.0 | 0.0 | META |
| 2025-10-06 13:30:00-04:00 | 708.537415 | 716.690002 | 707.320007 | 715.119995 | 2070174 | 0.0 | 0.0 | META |


AAPL: (35, 8)


| Date | Open | High | Low | Close | Volume | Dividends | Stock Splits | Ticker |
|---|---|---|---|---|---|---|---|---|
| 2025-10-06 09:30:00-04:00 | 257.945007 | 259.070007 | 255.050003 | 257.779999 | 11669158 | 0.0 | 0.0 | AAPL |
| 2025-10-06 10:30:00-04:00 | 257.799988 | 257.980011 | 257.279999 | 257.5513 | 3447400 | 0.0 | 0.0 | AAPL |
| 2025-10-06 11:30:00-04:00 | 257.565002 | 257.670013 | 256.769989 | 257.130005 | 3501833 | 0.0 | 0.0 | AAPL |
| 2025-10-06 12:30:00-04:00 | 257.130005 | 257.480011 | 256.130493 | 256.559998 | 2877222 | 0.0 | 0.0 | AAPL |
| 2025-10-06 13:30:00-04:00 | 256.549988 | 256.681213 | 255.5 | 256.184998 | 5922536 | 0.0 | 0.0 | AAPL |


AMZN: (35, 8)


| Date | Open | High | Low | Close | Volume | Dividends | Stock Splits | Ticker |
|---|---|---|---|---|---|---|---|---|
| 2025-10-06 09:30:00-04:00 | 220.919998 | 220.919998 | 216.029999 | 218.330002 | 13078703 | 0.0 | 0.0 | AMZN |
| 2025-10-06 10:30:00-04:00 | 218.330002 | 219.160004 | 218.290207 | 219.108795 | 4809466 | 0.0 | 0.0 | AMZN |
| 2025-10-06 11:30:00-04:00 | 219.145996 | 220.434998 | 219.050003 | 219.604996 | 4367656 | 0.0 | 0.0 | AMZN |
| 2025-10-06 12:30:00-04:00 | 219.600006 | 220.565506 | 219.550003 | 220.240005 | 2776867 | 0.0 | 0.0 | AMZN |
| 2025-10-06 13:30:00-04:00 | 220.25 | 221.190002 | 220.229996 | 221.169998 | 3683984 | 0.0 | 0.0 | AMZN |


NFLX: (35, 8)


| Date | Open | High | Low | Close | Volume | Dividends | Stock Splits | Ticker |
|---|---|---|---|---|---|---|---|---|
| 2025-10-06 09:30:00-04:00 | 1160.369995 | 1162.119995 | 1150.5 | 1151.869995 | 551078 | 0.0 | 0.0 | NFLX |
| 2025-10-06 10:30:00-04:00 | 1151.099976 | 1153.14502 | 1145.47998 | 1150.400024 | 407082 | 0.0 | 0.0 | NFLX |
| 2025-10-06 11:30:00-04:00 | 1150.415039 | 1153.439941 | 1146.98999 | 1150.0 | 240097 | 0.0 | 0.0 | NFLX |
| 2025-10-06 12:30:00-04:00 | 1150.622559 | 1154.219971 | 1148.987549 | 1153.330444 | 151067 | 0.0 | 0.0 | NFLX |
| 2025-10-06 13:30:00-04:00 | 1153.089966 | 1161.97998 | 1152.994995 | 1160.98999 | 233125 | 0.0 | 0.0 | NFLX |


GOOG: (35, 8)


| Date | Open | High | Low | Close | Volume | Dividends | Stock Splits | Ticker |
|---|---|---|---|---|---|---|---|---|
| 2025-10-06 09:30:00-04:00 | 246.139999 | 249.660004 | 245.690002 | 246.098007 | 4373843 | 0.0 | 0.0 | GOOG |
| 2025-10-06 10:30:00-04:00 | 246.050003 | 247.281998 | 246.039993 | 247.039993 | 1614988 | 0.0 | 0.0 | GOOG |
| 2025-10-06 11:30:00-04:00 | 247.079895 | 249.419998 | 247.050003 | 249.054993 | 1251392 | 0.0 | 0.0 | GOOG |
| 2025-10-06 12:30:00-04:00 | 249.059998 | 250.5 | 248.850006 | 249.889999 | 1568368 | 0.0 | 0.0 | GOOG |
| 2025-10-06 13:30:00-04:00 | 249.919998 | 251.619293 | 246.740005 | 251.399994 | 2780352 | 0.0 | 0.0 | GOOG |


### 📘 Explanation of Key Concepts with Sources – Fetch & Preview

- **Dictionary Usage**: Storing DataFrames keyed by ticker is efficient and clean. *Fluent Python* (Ramalho) highlights dictionaries as foundational tools for organising structured data. https://www.oreilly.com/library/view/fluent-python/9781491946237/ch02.html
- **Conditional Display**: Using a toggle like `SHOW_PREVIEW` allows flexible output control. Real Python recommends this approach to tailor notebook behavior for students vs. CI environments. https://realpython.com/python-notebooks/
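
One way a ticker-keyed dictionary can feed downstream analysis is sketched below with synthetic frames. The symbols `'AAA'` and `'BBB'`, the prices, and the level names `'Symbol'` and `'Date'` are assumptions chosen for the example; the notebook's own `data` dictionary would hold the real yfinance frames.

```python
import pandas as pd

# Two tiny synthetic frames standing in for per-ticker downloads.
idx = pd.to_datetime(['2025-01-02 09:30', '2025-01-02 10:30'])
frames = {
    'AAA': pd.DataFrame({'Close': [10.0, 10.5]}, index=idx),
    'BBB': pd.DataFrame({'Close': [20.0, 19.0]}, index=idx),
}

# Passing a dict to pd.concat uses its keys as the outer index level.
combined = pd.concat(frames, names=['Symbol', 'Date'])

# Pivot the outer level into columns: one Close series per symbol.
wide = combined['Close'].unstack('Symbol')
print(wide)
```

The wide layout puts each symbol in its own column on a shared time index, which is a convenient shape for plotting or computing per-symbol statistics.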


## Diagnostic – Ticker Validation

In [8]:
# 🧪 Diagnostic: Show effective tickers and detect duplicates
# (Counter was imported at the top of the notebook)

# Display the current list of tickers
print("📋 Effective tickers:", tickers)

# Count occurrences of each ticker
counts = Counter(tickers)

# Identify any tickers that appear more than once
dups = [t for t, c in counts.items() if c > 1]

# Report findings
if dups:
    print(f"⚠️ Duplicate tickers found: {dups}")
else:
    print("✅ No duplicate tickers detected.")



📋 Effective tickers: ['META', 'AAPL', 'AMZN', 'NFLX', 'GOOG']
✅ No duplicate tickers detected.


### 📘 Explanation of Key Concepts with Sources – Diagnostic: Ticker Validation

- **Counter for Frequency Analysis**: `collections.Counter` is ideal for detecting duplicates. *Python Cookbook* (Beazley, Chapter 1) recommends it for quick frequency checks and diagnostics. https://www.oreilly.com/library/view/python-cookbook/0596001673/ch01s02.html
- **Clean Reporting**: Providing clear feedback on duplicates helps students debug their inputs. Real Python emphasises transparent messaging to improve learning outcomes. https://realpython.com/python-notebooks/
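
Since the notebook's ticker list is already clean, the diagnostic never shows its warning branch. The sketch below runs the same `Counter` idea on a deliberately flawed list; the helper name `find_duplicates` and the sample input are assumptions made for the example.

```python
from collections import Counter


def find_duplicates(items):
    """Map each repeated item to the number of times it appears."""
    counts = Counter(items)
    return {item: n for item, n in counts.items() if n > 1}


print(find_duplicates(['META', 'AAPL', 'META', 'GOOG', 'META']))  # {'META': 3}
print(find_duplicates(['META', 'AAPL']))                          # {}
```

Returning the counts rather than only the names makes it easier to see how often a symbol was repeated.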


# End of Problem 1

## 📊 Problem 2 — Plot Close Prices from Latest CSV

In this task, we load the latest CSV for each ticker from `data/` and plot Close prices on a single figure.

Assignment specifics:
- Include axis labels, a legend, and the date as a title.
- Save the plot to a `plots/` folder with a timestamped filename.

Notes for reviewers:
- We select the latest file per ticker by sorting filenames lexicographically; the UTC timestamps embedded in the filenames (`%Y%m%dT%H%M%SZ`) are fixed-width, so this order is also chronological.

In [None]:
# Utilities: discover latest CSV per ticker
# - Scans the data/ folder and returns a mapping ticker -> latest CSV path.
# - Assumes filenames are of the form <TICKER>_YYYYMMDDTHHMMSSZ.csv

from pathlib import Path
from typing import Dict

def find_latest_csvs(data_dir: str = 'data') -> Dict[str, str]:
    root = Path(data_dir)
    if not root.exists():
        print(f"⚠️ Data directory not found: {root.resolve()}")
        return {}

    latest: Dict[str, str] = {}
    # Group by ticker prefix by scanning all CSV files
    files = sorted([p for p in root.glob('*.csv') if '_' in p.stem])
    by_ticker: Dict[str, list[Path]] = {}
    for p in files:
        ticker = p.stem.split('_', 1)[0]
        by_ticker.setdefault(ticker, []).append(p)

    # For each ticker, pick lexicographically last (UTC timestamp ensures correct ordering)
    for ticker, paths in by_ticker.items():
        latest[ticker] = str(sorted(paths)[-1])

    return latest

latest_files = find_latest_csvs('data')
latest_files

In [None]:
# Load the latest CSVs into a single DataFrame
# - Reads each latest CSV and concatenates them vertically
# - Parses the 'Date' column as datetimes and normalizes it to UTC

from typing import Optional

def load_latest_data(latest: dict[str, str]) -> Optional[pd.DataFrame]:
    if not latest:
        print("⚠️ No files discovered — run the download step above first.")
        return None

    frames = []
    for ticker, path in latest.items():
        try:
            df = pd.read_csv(path)
            # Saved timestamps carry exchange-local offsets (e.g., -04:00);
            # convert to UTC so the "(UTC)" plot labels are accurate.
            df['Date'] = pd.to_datetime(df['Date'], utc=True)
            df['Ticker'] = ticker  # Ensure label exists even if missing
            frames.append(df)
        except Exception as e:
            print(f"❌ Failed to read {path}: {e}")

    if not frames:
        return None

    out = pd.concat(frames, ignore_index=True)
    out.sort_values(['Ticker', 'Date'], inplace=True)
    return out

df_latest = load_latest_data(latest_files)
if df_latest is not None and SHOW_PREVIEW:
    display(df_latest.groupby('Ticker').head(2))

In [None]:
# Plot Close prices for each ticker on one chart and save to plots/
# - Uses tight layout and legend
# - Title includes the date range present in the concatenated DataFrame

from pathlib import Path

# Ensure plots directory exists alongside data/
PLOTS_DIR = Path('plots')
PLOTS_DIR.mkdir(exist_ok=True)

if df_latest is None or df_latest.empty:
    print("⚠️ No data loaded — cannot plot. Run the previous cells first.")
else:
    # Determine overall date range for the title
    dmin = df_latest['Date'].min()
    dmax = df_latest['Date'].max()

    # Create a figure
    fig, ax = plt.subplots()

    # Plot a line per ticker
    for ticker, group in df_latest.groupby('Ticker'):
        ax.plot(group['Date'], group['Close'], label=ticker)

    # Labels and formatting
    ax.set_xlabel('Date/Time (UTC)')
    ax.set_ylabel('Close Price (USD)')
    ax.set_title(f'FAANG Close Prices — {dmin:%Y-%m-%d %H:%M} to {dmax:%Y-%m-%d %H:%M} (UTC)')
    ax.legend(loc='best')
    fig.autofmt_xdate()
    plt.tight_layout()

    # Save with UTC timestamp similar to CSV naming
    ts = datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%SZ')
    out_path = PLOTS_DIR / f'faang_close_{ts}.png'
    fig.savefig(out_path, dpi=150)
    print(f"✅ Saved plot -> {out_path}")

## ✅ Submission checklist (for graders/reviewers)

- Setup instructions in `README_SETUP.md` and `README.md` allow minimal setup on Windows PowerShell.
- Code is broken into small, commented cells. Each cell focuses on a single task.
- Outputs:
  - CSVs saved under `notebooks/data/` (ignored by git).
  - Plots saved under `notebooks/plots/` (ignored by git).
- Large files are not committed; the notebook provides code to (re)generate them.
- Assumptions/Constraints are documented inline.

For Problems 3–4 (script and automation), see notes in `README.md` and extend this notebook or extract functions into a small module/script as needed.