# Computer Infrastructure Problems / Tasks

This Jupyter Notebook analyses hourly stock data for the FAANG companies (Facebook, now Meta; Apple; Amazon; Netflix; and Google) using Python. The analysis includes data retrieval, cleaning, visualization, and basic statistical analysis.

The notebook is compliant with [PEP 8 standards](https://www.python.org/dev/peps/pep-0008/) and is structured to ensure clarity and reproducibility.

> Keep the notebook reproducible, clean and concise. Run each code cell to perform a single step and save outputs to the `data/` folder.

## Background: Accessing Market Data in Python with yfinance

The [`yfinance`](https://github.com/ranaroussi/yfinance) library offers a clean, Pythonic interface for retrieving historical and real-time financial data from [Yahoo Finance](https://finance.yahoo.com/). It’s widely used in academic, research, and personal finance projects due to its simplicity and flexibility.

> ⚠️ Note: yfinance is not affiliated with or endorsed by Yahoo Inc. Use it for research/educational purposes.

Run the Install cell below to install `yfinance` into the notebook kernel if needed. Do not run installs unnecessarily in shared or production environments.

```python
%pip install yfinance
```

[Source: pypi.org](https://pypi.org/project/yfinance/)

## Problem 1 — FAANG Stock Data with yfinance

Download hourly stock data for the FAANG companies (META, AAPL, AMZN, NFLX, GOOG) for the past 5 days using `yfinance`. Save timestamped CSVs to `data/`. Keep cells small and well-commented.

## Setup — install & imports

Run the Setup cell below first. It only imports the required libraries and sets plotting defaults; it does not install anything. If an import fails, install the missing packages into the notebook kernel (recommended only for interactive/student use). For reproducible environments, prefer creating a virtual environment and installing from `requirements.txt`.

In [1]:
# Minimal Setup cell: imports, plotting defaults, and instructor-visible flags
import os
from datetime import datetime, timezone

# Instructor toggle: set False for CI/grading to avoid DataFrame previews
SHOW_PREVIEW = True

# Minimal imports used in exercises (do not pip-install in this cell)
import pandas as pd
import numpy as np
import yfinance as yf
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter

# plotting defaults
plt.rcParams['figure.figsize'] = (10, 5)
sns.set_style('whitegrid')

print('Minimal setup complete. If you need to install packages, follow README_SETUP.md.')

Minimal setup complete. If you need to install packages, follow README_SETUP.md.


### Minimal package list & install behavior

The Setup cell above imports the minimal set of packages needed for the exercises but does not install them. A later cell installs packages from `requirements.txt` into the running kernel when that file is found. For full reproducibility, use `requirements.txt` in the repo root with a fresh virtual environment.
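
Whether a package is importable can be checked without installing anything. The sketch below is one possible approach, not the notebook's own mechanism: the helper name `find_missing` and the bogus package name are illustrative assumptions, and the install command is only printed, never executed.

```python
import importlib.util
from typing import Iterable, List


def find_missing(packages: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be found for import."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


# 'json' ships with Python; the second name is deliberately bogus
missing = find_missing(['json', 'surely_not_an_installed_pkg_xyz'])
print('missing:', missing)
if missing:
    # Print the command instead of running it, so nothing is installed silently
    print('python -m pip install ' + ' '.join(missing))
```

Printing the command keeps the decision to install with the person running the notebook, which suits shared environments.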

### Import summary / plotting defaults

The Setup cell configures the notebook's imports and plotting defaults (a default figure size and the seaborn `whitegrid` style). These settings keep figure rendering consistent across environments.

### Quick verification of Installed Packages

In [2]:
# Quick verification (run after the imports cell)
# This small smoke-test checks that the essential packages are importable
# and performs a tiny yfinance request to ensure the network/API call works.

ok = True
# Verify pandas is available in the kernel
if 'pd' not in globals() or pd is None:
    print('pandas is not available. Install with: python -m pip install pandas')
    ok = False
# Verify yfinance is available in the kernel
if 'yf' not in globals() or yf is None:
    print('yfinance is not available. Install with: python -m pip install yfinance')
    ok = False

# If both packages appear present, perform a lightweight request for a single ticker
if ok:
    try:
        # Create a Ticker object and fetch a small recent history (1 day, hourly)
        t = yf.Ticker('AAPL')
        df = t.history(period='1d', interval='1h')

        # Print only the dataframe shape to confirm the request succeeded; avoid printing the whole DataFrame
        print('yfinance request succeeded — rows,cols =', df.shape)

        # Optionally show a small preview if the instructor/student wants it
        # Toggle this with SHOW_PREVIEW defined in the Setup cell above
        if SHOW_PREVIEW:
            display(df.head())
    except Exception as e:
        # Catch and print any errors during the yfinance request (network, API, parsing)
        print('yfinance request failed:', e)
else:
    # If a required package is missing, instruct the user how to install it
    print('Environment not ready — install missing packages and re-run this cell.')

yfinance request succeeded — rows,cols = (7, 7)


| Datetime | Open | High | Low | Close | Volume | Dividends | Stock Splits |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2025-10-10 09:30:00-04:00 | 255.320007 | 256.292908 | 254.880005 | 255.179993 | 7963458 | 0.0 | 0.0 |
| 2025-10-10 10:30:00-04:00 | 255.199997 | 255.300003 | 248.949997 | 249.410004 | 9992812 | 0.0 | 0.0 |
| 2025-10-10 11:30:00-04:00 | 249.039993 | 249.664993 | 247.759995 | 248.610001 | 6989327 | 0.0 | 0.0 |
| 2025-10-10 12:30:00-04:00 | 248.589996 | 249.059998 | 247.059998 | 247.809998 | 4394995 | 0.0 | 0.0 |
| 2025-10-10 13:30:00-04:00 | 247.770004 | 248.570007 | 247.080002 | 247.160004 | 4882982 | 0.0 | 0.0 |


In [3]:
# Install packages from repository requirements (run if needed)
# importlib.util must be imported explicitly for find_spec() below
import importlib.util
import pathlib
import subprocess
import sys
req = pathlib.Path('..') / 'requirements.txt'
if req.exists():
    try:
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', '--quiet', '-r', str(req)])
        print('Installed packages from', req)
    except subprocess.CalledProcessError as e:
        print('pip install failed with code', e.returncode)
else:
    print('requirements.txt not found:', req)

# Quick availability check (non-fatal)
for pkg in ('yfinance', 'pandas', 'numpy', 'requests', 'scipy', 'matplotlib'):
    spec = importlib.util.find_spec(pkg)
    print(pkg, 'available' if spec else 'NOT available')

Installed packages from ..\requirements.txt
yfinance available
pandas available
numpy available
requests available
scipy available
matplotlib available


## FAANG Tickers

The notebook uses the following tickers as the canonical list in the code cell below. Keep the code cell as the authoritative source to avoid duplication.

The tickers are stored in a list, a core Python data structure. For background, see:
- W3Schools, Python Lists: https://www.w3schools.com/python/python_lists.asp
- Fluent Python (Luciano Ramalho), Chapter 2: [An Array of Sequences](https://www.oreilly.com/library/view/fluent-python/9781491946237/ch02.html)

In [4]:
# Canonical list of FAANG tickers used throughout the notebook
tickers = ['META', 'AAPL', 'AMZN', 'NFLX', 'GOOG']
# Remove accidental duplicates while preserving order
tickers = list(dict.fromkeys(tickers))
if len(tickers) != 5:
    print('Warning: unexpected tickers list (duplicates removed):', tickers)
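
The order-preserving deduplication used in the cell above can be illustrated with a list that actually contains repeats. This is a minimal sketch; `raw_tickers` is a made-up input, not the notebook's list.

```python
# Illustrative input with deliberate duplicates (not the notebook's real list)
raw_tickers = ['META', 'AAPL', 'META', 'AMZN', 'AAPL', 'NFLX', 'GOOG']

# dict keys are unique and keep insertion order (guaranteed since Python 3.7),
# so the first occurrence of each ticker decides its position
unique_tickers = list(dict.fromkeys(raw_tickers))
print(unique_tickers)

# A set also removes duplicates, but its iteration order is not guaranteed,
# so the two results are only compared after sorting
print(sorted(set(raw_tickers)) == sorted(unique_tickers))
```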

## Download and label data

This section fetches hourly history for each ticker, labels each DataFrame, and saves CSVs to `data/`. Code is split into a small reusable function and a runner cell.

In [5]:
from typing import Optional

def fetch_hourly_history(ticker: str, period: str = '5d', interval: str = '1h') -> Optional[pd.DataFrame]:
    """
    Fetch hourly historical stock data for a single ticker using yfinance.
    Returns a labelled DataFrame or None if fetch fails.
    """
    try:
        t = yf.Ticker(ticker)
        df = t.history(period=period, interval=interval)
        if df is None or df.empty:
            return None
        df = df.copy()
        df['Ticker'] = ticker
        df.index.name = 'Date'
        return df
    except Exception as e:
        print(f"Error fetching {ticker}: {e}")
        return None

# Test block
if __name__ == '__main__':
    test = fetch_hourly_history('AAPL')
    if test is not None:
        print('Fetched AAPL rows:', test.shape[0])

Fetched AAPL rows: 35


### Explanation of Key Concepts with Sources

This function fetches hourly historical stock data for a single ticker using `yfinance`, labels each row with the ticker, and returns the labelled `pandas.DataFrame` (or `None` if nothing usable was fetched). The notes below expand on key steps, edge cases, and references that are helpful for instructors and students.

- **Key steps:**  
  1. Create `yf.Ticker(ticker)` and call `.history()` with `period` and `interval`.  
     https://pypi.org/project/yfinance/
  2. Validate the returned DataFrame (`None` or `empty` → return `None`).  
     https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.empty.html
  3. Copy the DataFrame, add `df['Ticker'] = ticker`, and set `df.index.name = 'Date'` for clearer CSVs.

- **Why copy the DataFrame?**
  Copying avoids pandas' `SettingWithCopyWarning` and ensures modifications don't affect views of original data.  
  https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.copy.html

- **Edge cases & error handling:**  
  Network errors, invalid symbols, or API changes may raise exceptions; the function catches these and returns `None` so the caller can skip or log the ticker without failing the whole run.

- **Index naming & CSV compatibility:**  
  Naming the index `Date` improves CSV readability and downstream parsing when combining multiple tickers.  
  https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html
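
The "return `None` and let the caller skip" pattern described under edge cases can be sketched without any network access. Everything here is illustrative: `safe_fetch` and `fake_fetcher` are hypothetical stand-ins, and plain lists take the place of DataFrames.

```python
from typing import Callable, Dict, List, Optional


def safe_fetch(fetcher: Callable[[str], List[float]], ticker: str) -> Optional[List[float]]:
    """Call fetcher(ticker); return None instead of raising on failure."""
    try:
        rows = fetcher(ticker)
    except Exception as exc:  # broad on purpose, mirroring the notebook's function
        print(f'Error fetching {ticker}: {exc}')
        return None
    # Treat an empty result the same as a failed fetch
    return rows if rows else None


def fake_fetcher(ticker: str) -> List[float]:
    """Hypothetical stand-in for a network call; fails for one symbol."""
    if ticker == 'BAD':
        raise ValueError('unknown symbol')
    return [1.0, 2.0, 3.0]


fetched: Dict[str, List[float]] = {}
skipped: List[str] = []
for symbol in ['AAPL', 'BAD', 'GOOG']:
    rows = safe_fetch(fake_fetcher, symbol)
    if rows is None:
        skipped.append(symbol)  # record and move on; the loop keeps running
        continue
    fetched[symbol] = rows

print('fetched:', list(fetched))
print('skipped:', skipped)
```

Collecting the skipped symbols in a list makes it easy to report or retry them after the loop finishes.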

References:
- *Python for Data Analysis* (Wes McKinney), Chapter 6: Data Loading, Storage, and File Formats https://www.oreilly.com/library/view/python-for-data/9781491957653/
- [pandas.DataFrame.empty](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.empty.html)
- *Fluent Python* (Luciano Ramalho) https://www.oreilly.com/library/view/fluent-python/9781491946237/
- *Python Cookbook* (David Beazley and Brian K. Jones), Chapter 1: Data Structures and Algorithms https://www.oreilly.com/library/view/python-cookbook/9781449340377/
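
The effect of naming the index on the CSV header can be seen with a tiny in-memory example. This is a sketch under assumptions: the values are made up, the timestamps are timezone-naive for simplicity, and an `io.StringIO` buffer replaces a file in `data/`.

```python
import io

import pandas as pd

# Tiny illustrative frame; the values are made up, not real market data
index = pd.to_datetime(['2025-01-02 09:30', '2025-01-02 10:30', '2025-01-02 11:30'])
df = pd.DataFrame({'Close': [100.0, 101.5, 99.8]}, index=index)

# Without a name, the index column gets an empty header in the CSV
unnamed = io.StringIO()
df.to_csv(unnamed)
print(unnamed.getvalue().splitlines()[0])

# With a name, the header is explicit and can be used when reading back
df.index.name = 'Date'
named = io.StringIO()
df.to_csv(named)
header = named.getvalue().splitlines()[0]
print(header)

# Read the CSV back, restoring the named column as the index
named.seek(0)
restored = pd.read_csv(named, index_col='Date', parse_dates=True)
print(restored.index.name, len(restored))
```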

In [6]:
# Runner: Fetch data for each ticker and save to a timestamped CSV file
# This block is separated from the fetch function for modularity and clarity

# Ensure tickers are unique while preserving their original order
# Converts the list to a dict (which removes duplicates) and back to a list
tickers = list(dict.fromkeys(tickers))

# Create the 'data' folder if it doesn't already exist
os.makedirs('data', exist_ok=True)

# Initialize a dictionary to store saved file paths for each ticker
results = {}

# Track which tickers have already been processed
seen = set()

# Loop through each ticker symbol
for ticker in tickers:
    # Skip duplicate tickers that may have slipped through
    if ticker in seen:
        print(f'Skipping duplicate ticker {ticker}')
        continue
    seen.add(ticker)

    # Fetch hourly data using the custom function
    df = fetch_hourly_history(ticker)

    # Skip if no data is returned or DataFrame is empty
    if df is None or df.empty:
        print(f'No data for {ticker}')
        continue

    # Check if a file already exists for this ticker
    # Prevents duplicate saves on reruns; sorting makes the newest file last
    existing = sorted(
        p for p in os.listdir('data')
        if p.startswith(f'{ticker}_') and p.endswith('.csv')
    )
    if existing:
        print(f'File already exists for {ticker}, skipping save: {existing[-1]}')
        results[ticker] = os.path.join('data', existing[-1])
        continue

    # Generate a UTC timestamp for the filename
    ts = datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%SZ')
    filename = os.path.join('data', f'{ticker}_{ts}.csv')

    # Try saving the DataFrame to CSV
    try:
        df.to_csv(filename, index=True)
        results[ticker] = filename
        print(f'Saved {ticker} -> {filename}')
    except Exception as exc:
        # Catch and log any errors during file save
        print(f'Failed to save {ticker}: {exc}')

# Final output: dictionary of tickers and their saved file paths
results


Saved META -> data\META_20251013T064859Z.csv
Saved AAPL -> data\AAPL_20251013T064859Z.csv
Saved AMZN -> data\AMZN_20251013T064859Z.csv
Saved NFLX -> data\NFLX_20251013T064859Z.csv
Saved GOOG -> data\GOOG_20251013T064859Z.csv


{'META': 'data\\META_20251013T064859Z.csv',
 'AAPL': 'data\\AAPL_20251013T064859Z.csv',
 'AMZN': 'data\\AMZN_20251013T064859Z.csv',
 'NFLX': 'data\\NFLX_20251013T064859Z.csv',
 'GOOG': 'data\\GOOG_20251013T064859Z.csv'}

### Explanation of Key Concepts with Sources

This runner loops through the canonical FAANG tickers, fetches hourly data using `fetch_hourly_history()`, and saves a timestamped CSV for each ticker in the `data/` directory.

- **Deduplicate tickers**  
  Removes accidental duplicates while preserving order using `dict.fromkeys()`.

- **Create `data/` if missing**  
  Uses `os.makedirs(..., exist_ok=True)` to ensure the output directory exists.

- **Validate fetched data**  
  For each ticker, the runner checks `if df is None or df.empty` and skips saving when there is no usable data.  
  📘 https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.empty.html

- **Idempotent saves**  
  If a CSV already exists for a ticker, the runner records the existing file and skips writing a new one to avoid duplicates.

- **Timestamped filenames**  
  Filenames use a sortable UTC timestamp (e.g. `AAPL_20251012T161500Z.csv`) generated with `datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%SZ')`.

- **Error handling**  
  Wraps `df.to_csv()` in `try/except` so one failure doesn't halt the entire loop.
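
Why this timestamp format is "sortable" can be shown with fixed instants: because the fields run from year down to second, alphabetical order matches chronological order. A minimal sketch; the helper `stamped_name`, the constant `FMT`, and the two dates are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

FMT = '%Y%m%dT%H%M%SZ'


def stamped_name(ticker: str, moment: datetime) -> str:
    """Build '<TICKER>_<UTC timestamp>.csv' from a timezone-aware datetime."""
    return f'{ticker}_{moment.astimezone(timezone.utc).strftime(FMT)}.csv'


# Fixed, made-up instants so the example is deterministic
first = datetime(2025, 10, 12, 16, 15, 0, tzinfo=timezone.utc)
later = first + timedelta(days=30, hours=2)

# Deliberately list the later file first, then let sorting restore the order
names = [stamped_name('AAPL', later), stamped_name('AAPL', first)]
print(sorted(names))
```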

References:
- https://docs.python.org/3/library/os.html#os.makedirs
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html
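
The "skip if a file already exists" check can be tried safely in a temporary directory. This sketch uses `pathlib.Path.glob` rather than the runner's `os.listdir` filter; the helper `latest_csv_for` and the file names are illustrative assumptions.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_csv_for(ticker: str, folder: Path) -> Optional[Path]:
    """Return the newest '<ticker>_*.csv' in folder by filename, or None."""
    matches = sorted(folder.glob(f'{ticker}_*.csv'))
    return matches[-1] if matches else None


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    before = latest_csv_for('AAPL', folder)  # nothing saved yet

    # Simulate two earlier runs by creating empty, timestamped files
    (folder / 'AAPL_20251012T161500Z.csv').touch()
    (folder / 'AAPL_20251013T064859Z.csv').touch()
    (folder / 'AMZN_20251013T064859Z.csv').touch()

    after = latest_csv_for('AAPL', folder)
    found_name = after.name if after is not None else None

print(before, found_name)
```

One trade-off to keep in mind: skipping on any existing file means a rerun keeps older data until the CSV is deleted or moved.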


## Fetch and Preview Hourly Data for Each Ticker


In [7]:
# 📦 Build a dictionary to store hourly data for each ticker using fetch_hourly_history()

# Initialize an empty dictionary to hold the DataFrames
data = {}

# Loop through each ticker symbol in the list
for ticker in tickers:
    # Fetch hourly historical data using the custom function
    df = fetch_hourly_history(ticker)
    
    # Skip this ticker if no data is returned or the DataFrame is empty
    if df is None or df.empty:
        print(f'No data for {ticker}')
        continue
    
    # Store the valid DataFrame in the dictionary using the ticker as the key
    data[ticker] = df

# 👀 Optional: Display a preview of the fetched data for each ticker

# Loop through the dictionary to access each ticker and its DataFrame
for sym, df in data.items():
    # Print the ticker symbol and the shape of its DataFrame (rows, columns)
    print(sym, df.shape)
    
    # If previewing is enabled, display the first 5 rows of the DataFrame
    if SHOW_PREVIEW:
        display(df.head())


META (35, 8)


| Date | Open | High | Low | Close | Volume | Dividends | Stock Splits | Ticker |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2025-10-06 09:30:00-04:00 | 705.0 | 706.619995 | 690.659973 | 700.933594 | 7964634 | 0.0 | 0.0 | META |
| 2025-10-06 10:30:00-04:00 | 701.26001 | 706.116699 | 699.0 | 703.590027 | 2664362 | 0.0 | 0.0 | META |
| 2025-10-06 11:30:00-04:00 | 703.659973 | 708.030029 | 703.159973 | 705.799988 | 1677535 | 0.0 | 0.0 | META |
| 2025-10-06 12:30:00-04:00 | 705.785583 | 709.950012 | 705.52002 | 708.565125 | 1489886 | 0.0 | 0.0 | META |
| 2025-10-06 13:30:00-04:00 | 708.537415 | 716.690002 | 707.320007 | 715.119995 | 2070174 | 0.0 | 0.0 | META |


AAPL (35, 8)


| Date | Open | High | Low | Close | Volume | Dividends | Stock Splits | Ticker |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2025-10-06 09:30:00-04:00 | 257.945007 | 259.070007 | 255.050003 | 257.779999 | 11669158 | 0.0 | 0.0 | AAPL |
| 2025-10-06 10:30:00-04:00 | 257.799988 | 257.980011 | 257.279999 | 257.5513 | 3447400 | 0.0 | 0.0 | AAPL |
| 2025-10-06 11:30:00-04:00 | 257.565002 | 257.670013 | 256.769989 | 257.130005 | 3501833 | 0.0 | 0.0 | AAPL |
| 2025-10-06 12:30:00-04:00 | 257.130005 | 257.480011 | 256.130493 | 256.559998 | 2877222 | 0.0 | 0.0 | AAPL |
| 2025-10-06 13:30:00-04:00 | 256.549988 | 256.681213 | 255.5 | 256.184998 | 5922536 | 0.0 | 0.0 | AAPL |


AMZN (35, 8)


| Date | Open | High | Low | Close | Volume | Dividends | Stock Splits | Ticker |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2025-10-06 09:30:00-04:00 | 220.919998 | 220.919998 | 216.029999 | 218.330002 | 13078703 | 0.0 | 0.0 | AMZN |
| 2025-10-06 10:30:00-04:00 | 218.330002 | 219.160004 | 218.290207 | 219.108795 | 4809466 | 0.0 | 0.0 | AMZN |
| 2025-10-06 11:30:00-04:00 | 219.145996 | 220.434998 | 219.050003 | 219.604996 | 4367656 | 0.0 | 0.0 | AMZN |
| 2025-10-06 12:30:00-04:00 | 219.600006 | 220.565506 | 219.550003 | 220.240005 | 2776867 | 0.0 | 0.0 | AMZN |
| 2025-10-06 13:30:00-04:00 | 220.25 | 221.190002 | 220.229996 | 221.169998 | 3683984 | 0.0 | 0.0 | AMZN |


NFLX (35, 8)


| Date | Open | High | Low | Close | Volume | Dividends | Stock Splits | Ticker |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2025-10-06 09:30:00-04:00 | 1160.369995 | 1162.119995 | 1150.5 | 1151.869995 | 551078 | 0.0 | 0.0 | NFLX |
| 2025-10-06 10:30:00-04:00 | 1151.099976 | 1153.14502 | 1145.47998 | 1150.400024 | 407082 | 0.0 | 0.0 | NFLX |
| 2025-10-06 11:30:00-04:00 | 1150.415039 | 1153.439941 | 1146.98999 | 1150.0 | 240097 | 0.0 | 0.0 | NFLX |
| 2025-10-06 12:30:00-04:00 | 1150.622559 | 1154.219971 | 1148.987549 | 1153.330444 | 151067 | 0.0 | 0.0 | NFLX |
| 2025-10-06 13:30:00-04:00 | 1153.089966 | 1161.97998 | 1152.994995 | 1160.98999 | 233125 | 0.0 | 0.0 | NFLX |


GOOG (35, 8)


| Date | Open | High | Low | Close | Volume | Dividends | Stock Splits | Ticker |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2025-10-06 09:30:00-04:00 | 246.139999 | 249.660004 | 245.690002 | 246.098007 | 4373843 | 0.0 | 0.0 | GOOG |
| 2025-10-06 10:30:00-04:00 | 246.050003 | 247.281998 | 246.039993 | 247.039993 | 1614988 | 0.0 | 0.0 | GOOG |
| 2025-10-06 11:30:00-04:00 | 247.079895 | 249.419998 | 247.050003 | 249.054993 | 1251392 | 0.0 | 0.0 | GOOG |
| 2025-10-06 12:30:00-04:00 | 249.059998 | 250.5 | 248.850006 | 249.889999 | 1568368 | 0.0 | 0.0 | GOOG |
| 2025-10-06 13:30:00-04:00 | 249.919998 | 251.619293 | 246.740005 | 251.399994 | 2780352 | 0.0 | 0.0 | GOOG |


### Explanation of Key Concepts with Sources

This block loops through each ticker, fetches hourly data using the `fetch_hourly_history()` function, stores it in a dictionary, and optionally previews the first few rows for each stock.

- **`data = {}`**  
  Initializes an empty dictionary to store each ticker’s DataFrame.  
  📖 *Fluent Python*, Chapter 3: Dictionaries and Sets  
  📘 https://realpython.com/python-dicts/

- **`for ticker in tickers:`**  
  Iterates through each stock symbol in the `tickers` list.  
  📘 https://www.w3schools.com/python/python_for_loops.asp

- **`df = fetch_hourly_history(ticker)`**  
  Calls a custom function to retrieve hourly stock data using `yfinance`.  
  📘 https://pypi.org/project/yfinance/

- **`if df is None or df.empty:`**  
  Validates the returned DataFrame to ensure it contains usable data.  
  📘 https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.empty.html

- **`data[ticker] = df`**  
  Stores the valid DataFrame in the dictionary using the ticker as the key.

- **`for sym, df in data.items():`**  
  Loops through each key-value pair in the dictionary to access the ticker (`sym`) and its DataFrame (`df`).  
  📘 https://docs.python.org/3/library/stdtypes.html#dict.items

- **`print(sym, df.shape)`**  
  Prints the ticker symbol and the shape of its DataFrame (rows, columns).  
  📘 https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.shape.html

- **`display(df.head())`**  
  Displays the first five rows of the DataFrame in a notebook-friendly format.  
  📘 https://ipython.readthedocs.io/en/stable/api/generated/IPython.display.html  
  📖 *Python for Data Analysis*, Chapter 6: Data Loading, Storage, and File Formats

This structure ensures that each ticker’s data is validated, stored, and previewed in a clean, modular way—ideal for exploratory analysis or debugging.


In [8]:
# Diagnostic: show effective tickers and detect duplicates
# Counter is already imported in the Setup cell at the top of the notebook

# Print the list of tickers currently in use
print('Effective tickers:', tickers)

# Count how many times each ticker appears in the list
counts = Counter(tickers)

# Identify any tickers that appear more than once
dups = [t for t, c in counts.items() if c > 1]

# Print a message based on whether duplicates were found
if dups:
    print('Duplicate tickers found:', dups)
else:
    print('No duplicate tickers')


Effective tickers: ['META', 'AAPL', 'AMZN', 'NFLX', 'GOOG']
No duplicate tickers
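
To see the duplicate check report something, the same `Counter` approach can be run on a list with deliberate repeats. A minimal sketch; `sample` is an illustrative list, not the notebook's `tickers`.

```python
from collections import Counter

# Illustrative list with deliberate repeats (the notebook's list has none)
sample = ['META', 'AAPL', 'META', 'GOOG', 'AAPL', 'META']

counts = Counter(sample)
# Keep only symbols seen more than once; Counter keeps first-seen order
dups = [symbol for symbol, n in counts.items() if n > 1]

print(counts['META'], counts['AAPL'], counts['GOOG'])
print(dups)
```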


# END