# Task 1 — Financial Analysis (Cleaning, Returns, Stationarity, Risk Metrics)

This notebook is primarily a **reporting notebook**: the official deliverables are generated by scripts and saved to disk.

## Rubric-fast deliverables covered here
1. Scaling/normalization evidence (a saved scaled dataset)
2. Three visualizations saved as files
3. Returns + stationarity (ADF) + risk metrics loaded from artifacts

**Artifacts expected (after running scripts):**
- `data/task1/processed/prices.parquet`
- `data/task1/processed/returns.parquet`
- `data/task1/processed/scaled_task1_prices.parquet` ✅ scaling evidence
- `data/task1/processed/task1_adf_results.csv`
- `data/task1/processed/task1_risk_metrics.csv`
- `outputs/task1/viz/task1_prices_timeseries.png` ✅ plot 1
- `outputs/task1/viz/task1_daily_pct_change.png` ✅ plot 2
- `outputs/task1/viz/task1_rolling_mean_std.png` ✅ plot 3


In [2]:
import os
import sys
from pathlib import Path

import pandas as pd
import matplotlib.pyplot as plt


def _find_repo_root(start: Path) -> Path:
    """Walk upward until we find a folder containing both `src/` and `outputs/` or `data/`."""
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / "src").is_dir() and ((candidate / "outputs").exists() or (candidate / "data").exists()):
            return candidate
    return start


REPO_ROOT = _find_repo_root(Path.cwd())
if str(REPO_ROOT) not in sys.path:
    sys.path.insert(0, str(REPO_ROOT))

from src import config

print('Notebook working directory:', os.getcwd())
print('Repo root:', REPO_ROOT)
print('PRICES_PATH:', config.PRICES_PATH)
print('RETURNS_PATH:', config.RETURNS_PATH)
print('TASK1_SCALED_PRICES_PATH:', getattr(config, 'TASK1_SCALED_PRICES_PATH', None))
print('TASK1_VIZ_DIR:', getattr(config, 'TASK1_VIZ_DIR', None))


Notebook working directory: d:\Python\Week 9\portfolio-optimization\notebooks
Repo root: D:\Python\Week 9\portfolio-optimization
PRICES_PATH: data/processed/prices.parquet
RETURNS_PATH: data/processed/returns.parquet
TASK1_SCALED_PRICES_PATH: None
TASK1_VIZ_DIR: None


## 1) Load Task 1 datasets (prices + returns)

These should already exist if you ran your Task 1 pipeline scripts.

In [3]:
prices = pd.read_parquet(config.PRICES_PATH)
returns = pd.read_parquet(config.RETURNS_PATH)

display(prices.head())
display(returns.head())
print('prices shape:', prices.shape)
print('returns shape:', returns.shape)

| | date | asset | open | high | low | close | adj_close | volume |
|---|---|---|---|---|---|---|---|---|
| 0 | 2015-01-02 | BND | 82.43 | 82.690002 | 82.419998 | 82.650002 | 60.385941 | 2218800 |
| 1 | 2015-01-05 | BND | 82.739998 | 82.919998 | 82.699997 | 82.889999 | 60.561348 | 5820100 |
| 2 | 2015-01-06 | BND | 83.029999 | 83.379997 | 83.029999 | 83.129997 | 60.736664 | 3887600 |
| 3 | 2015-01-07 | BND | 83.139999 | 83.279999 | 83.050003 | 83.18 | 60.773182 | 2433400 |
| 4 | 2015-01-08 | BND | 83.110001 | 83.110001 | 82.970001 | 83.050003 | 60.678215 | 1873400 |


| | date | asset | open | high | low | close | adj_close | volume | return |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-01-02 | BND | 82.43 | 82.690002 | 82.419998 | 82.650002 | 60.385941 | 2218800 | NaN |
| 1 | 2015-01-05 | BND | 82.739998 | 82.919998 | 82.699997 | 82.889999 | 60.561348 | 5820100 | 0.002905 |
| 2 | 2015-01-06 | BND | 83.029999 | 83.379997 | 83.029999 | 83.129997 | 60.736664 | 3887600 | 0.002895 |
| 3 | 2015-01-07 | BND | 83.139999 | 83.279999 | 83.050003 | 83.18 | 60.773182 | 2433400 | 0.000601 |
| 4 | 2015-01-08 | BND | 83.110001 | 83.110001 | 82.970001 | 83.050003 | 60.678215 | 1873400 | -0.001563 |


prices shape: (8325, 8)
returns shape: (8325, 9)
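
Judging from the previews, the `return` column looks like the simple daily percentage change of `adj_close`, computed per asset. This is an inference from the displayed numbers, not something the pipeline scripts confirm. A minimal sketch that reproduces the five BND values shown above:

```python
import pandas as pd

# The five BND rows from the preview above (only the columns needed here).
sample = pd.DataFrame({
    "date": pd.to_datetime(
        ["2015-01-02", "2015-01-05", "2015-01-06", "2015-01-07", "2015-01-08"]
    ),
    "asset": ["BND"] * 5,
    "adj_close": [60.385941, 60.561348, 60.736664, 60.773182, 60.678215],
})

# Assumption: returns are simple percentage changes of adj_close, computed
# within each asset so one asset's last price never feeds the next asset's
# first return. The first row of every asset is therefore NaN.
sample = sample.sort_values(["asset", "date"])
sample["return"] = sample.groupby("asset")["adj_close"].pct_change()

print(sample["return"].round(6).tolist())
# [nan, 0.002905, 0.002895, 0.000601, -0.001563]
```

The rounded values match the `return` column in the preview, which is consistent with (but does not prove) an `adj_close`-based definition.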


## 2) Scaling / normalization evidence

Rubric requirement: demonstrate scaling/normalization. Preferred evidence: a saved scaled dataset.

We expect `scaled_task1_prices.parquet` to exist. If it does not, run the scaling script:

```bash
python scripts/02_task1_scale_and_viz.py
```
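
The notebook does not say which scaling method the script applies. As an illustration only, here is a sketch of per-asset z-score standardization on a long-format frame like `prices`. The function name `zscore_by_asset` and the output column `adj_close_scaled` are hypothetical, and the real script may use a different method (for example min-max scaling).

```python
import pandas as pd


def zscore_by_asset(df: pd.DataFrame, value_col: str = "adj_close") -> pd.DataFrame:
    """Standardize `value_col` within each asset: (x - mean) / std.

    Hypothetical helper; the scaling actually used by the project script
    is not shown in this notebook.
    """
    out = df.copy()
    grouped = out.groupby("asset")[value_col]
    # transform() broadcasts each asset's statistic back onto its own rows.
    mean = grouped.transform("mean")
    std = grouped.transform("std")  # sample standard deviation (ddof=1)
    out[f"{value_col}_scaled"] = (out[value_col] - mean) / std
    return out


# Toy data: two assets at very different price levels.
toy = pd.DataFrame({
    "asset": ["A", "A", "A", "B", "B", "B"],
    "adj_close": [10.0, 11.0, 12.0, 100.0, 110.0, 120.0],
})
scaled_toy = zscore_by_asset(toy)
print(scaled_toy["adj_close_scaled"].tolist())
# [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

After scaling, both toy assets occupy the same range even though their raw prices differ by a factor of ten, which is the property the saved scaled dataset is meant to demonstrate.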


In [4]:
scaled_path = getattr(config, 'TASK1_SCALED_PRICES_PATH', None)
if scaled_path is None:
    raise ValueError('TASK1_SCALED_PRICES_PATH not found in config. Please add it and rerun.')

if not os.path.exists(scaled_path):
    raise FileNotFoundError(
        f"Missing scaled dataset: {scaled_path}. "
        "Run: python scripts/02_task1_scale_and_viz.py"
    )

scaled_prices = pd.read_parquet(scaled_path)
display(scaled_prices.head())
print('scaled_prices shape:', scaled_prices.shape)
print('scaled columns:', list(scaled_prices.columns))

ValueError: TASK1_SCALED_PRICES_PATH not found in config. Please add it and rerun.

## 3) Visualizations (must exist as files)

This section verifies the three plot files are present.

If any are missing, rerun:
```bash
python scripts/02_task1_scale_and_viz.py
```
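
Going by the file names, plots 2 and 3 are built from the daily percentage change and from a rolling mean and standard deviation. The sketch below computes those series with pandas. The helper name `add_plot_series`, the output column names, and the default 20-day window are assumptions, not values taken from the project script.

```python
import pandas as pd


def add_plot_series(df: pd.DataFrame, window: int = 20,
                    price_col: str = "adj_close") -> pd.DataFrame:
    """Add per-asset daily % change and rolling mean/std columns.

    Hypothetical helper; window length and column names are illustrative.
    """
    out = df.sort_values(["asset", "date"]).copy()
    by_asset = out.groupby("asset")[price_col]
    out["daily_pct_change"] = by_asset.pct_change() * 100.0
    # Rolling statistics are computed inside each asset so windows never
    # span two different tickers.
    out["rolling_mean"] = by_asset.transform(lambda s: s.rolling(window).mean())
    out["rolling_std"] = by_asset.transform(lambda s: s.rolling(window).std())
    return out


dates = pd.to_datetime(["2015-01-02", "2015-01-05", "2015-01-06"])
demo = pd.DataFrame({
    "date": list(dates) * 2,
    "asset": ["A"] * 3 + ["B"] * 3,
    "adj_close": [100.0, 110.0, 121.0, 50.0, 45.0, 54.0],
})
demo_out = add_plot_series(demo, window=2)
print(demo_out[["asset", "daily_pct_change", "rolling_mean", "rolling_std"]])
```

The first `window - 1` rows of each asset have no rolling value, so a plot built from these columns starts slightly later than the price series.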

In [None]:
# Fall back to the default folder when the attribute is missing or set to None.
viz_dir = getattr(config, 'TASK1_VIZ_DIR', None) or 'outputs/task1/viz'
expected = [
    os.path.join(viz_dir, 'task1_prices_timeseries.png'),
    os.path.join(viz_dir, 'task1_daily_pct_change.png'),
    os.path.join(viz_dir, 'task1_rolling_mean_std.png'),
]

missing = [p for p in expected if not os.path.exists(p)]
print('Expected plots:')
for p in expected:
    print(' -', p, 'OK' if os.path.exists(p) else 'MISSING')

if missing:
    raise FileNotFoundError(
        'Missing required Task 1 plot files:\n' + '\n'.join(missing) +
        '\n\nRerun: python scripts/02_task1_scale_and_viz.py'
    )

### Display the saved plot images (optional)

If your Jupyter environment supports it, we can display them inline.

In [None]:
from IPython.display import Image, display

for p in expected:
    display(Image(filename=p))

## 4) Stationarity evidence (ADF results)

ADF outputs should be saved to CSV by your Task 1 scripts.

In [None]:
adf_path = config.TASK1_ADF_PATH
if not os.path.exists(adf_path):
    raise FileNotFoundError(f'Missing ADF results: {adf_path}. Run Task 1 scripts.')

adf = pd.read_csv(adf_path)
display(adf)
print('ADF file:', adf_path)

## 5) Risk metrics evidence

Risk metrics (e.g., annualized return/volatility, Sharpe, VaR) should be saved by scripts.
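
As a reference for how such metrics are commonly defined, here is a minimal sketch. It assumes 252 trading days per year, arithmetic annualization of the mean daily return, a zero risk-free rate by default, and historical (empirical quantile) VaR. The function `risk_metrics` and its output keys are hypothetical, and the project scripts may use different conventions.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed number of trading days per year


def risk_metrics(returns, risk_free_rate: float = 0.0,
                 var_level: float = 0.95) -> dict:
    """Annualized return/volatility, Sharpe ratio and historical VaR.

    Hypothetical helper operating on one asset's daily simple returns.
    """
    r = pd.Series(returns).dropna()
    ann_return = r.mean() * TRADING_DAYS
    ann_vol = r.std(ddof=1) * np.sqrt(TRADING_DAYS)
    sharpe = (ann_return - risk_free_rate) / ann_vol if ann_vol > 0 else np.nan
    # Historical VaR: the loss exceeded on roughly (1 - var_level) of days,
    # reported as a positive number.
    var = -np.quantile(r, 1.0 - var_level)
    return {
        "ann_return": ann_return,
        "ann_volatility": ann_vol,
        "sharpe": sharpe,
        f"var_{int(var_level * 100)}": var,
    }


example = risk_metrics([0.01, -0.02, 0.03, 0.00])
print(example)
```

For a long-format frame like `returns`, the same function could be applied per asset, for example with `returns.groupby("asset")["return"].apply(risk_metrics)`.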

In [None]:
risk_path = config.TASK1_RISK_PATH
if not os.path.exists(risk_path):
    raise FileNotFoundError(f'Missing risk metrics: {risk_path}. Run Task 1 scripts.')

risk = pd.read_csv(risk_path)
display(risk)
print('Risk metrics file:', risk_path)

## 6) Quick interpretation (short)

- Prices are typically non-stationary; returns/log returns tend to be closer to stationary.
- Scaling puts assets with very different price levels on a comparable scale, which helps visualization and scale-sensitive models.
- Risk metrics summarize reward versus risk (annualized return, volatility, Sharpe ratio) and tail behavior (VaR) for each asset.