# ML-Equity-Ranking: End-to-End Workflow

This notebook demonstrates how to run, extend, and automate the ML-Equity-Ranking repository.

**Outline:**
1. Clone and Set Up the Repository
2. Explore Existing Codebase
3. Run Main Script
4. Run and Extend Unit Tests
5. Create and Run a Sample Notebook
6. Automate Workflow with Scripts

## 1. Clone and Set Up the Repository

- Clone the repository (if not already cloned):
  ```bash
  git clone https://github.com/annashahed09-sudo/ML-Equity-Ranking.git
  cd ML-Equity-Ranking
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- (Optional) For isolation, create and activate a virtual environment (e.g., `python -m venv .venv`) before installing the dependencies.

## 2. Explore Existing Codebase

List the main modules and scripts in the repository. This helps you understand the project structure.

In [None]:
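# List main folders and files under src/
# (the relative path assumes this notebook sits one level below the repository root)
import os
for root, dirs, files in os.walk("../src"):
    dirs[:] = [d for d in dirs if d != "__pycache__"]  # skip bytecode caches
    print(f"{root}/")
    for file in sorted(files):
        print(f"  - {file}")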

## 3. Run Main Script

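If the repository provides a main script (e.g., `main.py`), run it here. Otherwise, run a core pipeline (e.g., feature engineering and model training) directly in the notebook. The cell below downloads a small sample, fits a Ridge model, and evaluates it with the information coefficient (IC) and long-short portfolio returns.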

In [None]:
# Example: Run feature engineering and Ridge model training
import sys
sys.path.insert(0, "..")  # make the repository root importable (this notebook sits one level below it)

import pandas as pd
from src.data_loader import load_yfinance_data
from src.features import compute_features, compute_forward_returns, get_feature_columns
from src.models import RidgeModel
from src.evaluation import compute_ic_by_date, long_short_portfolio_returns

# Download a small sample (AAPL, MSFT)
df = load_yfinance_data(["AAPL", "MSFT"], start_date="2022-01-01", end_date="2022-06-01")
df = compute_features(df)
df = compute_forward_returns(df)

feature_cols = get_feature_columns(df)
# Copy so that adding the model_score column later does not write into a view of df
train = df.dropna(subset=feature_cols + ["forward_return"]).copy()

X = train[feature_cols]
y = train["forward_return"]

model = RidgeModel()
model.fit(X, y)
# Scores are in-sample (predicted on the training rows), so the metrics below are optimistic
train["model_score"] = model.predict(X)

# Evaluate IC by date
ic_by_date = compute_ic_by_date(train)
print("Mean IC:", ic_by_date.mean())

# Portfolio returns
returns = long_short_portfolio_returns(train)
print(returns.head())
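
For readers unfamiliar with the metric: the information coefficient (IC) on a given date is commonly computed as the rank correlation between model scores and realized forward returns across the stocks available that day. The sketch below illustrates that definition on toy data. It is not the repository's `compute_ic_by_date`; the helper name `rank_ic_by_date`, the `date` column, and the numbers are assumptions made for illustration.

```python
import pandas as pd

def rank_ic_by_date(df, score_col="model_score", target_col="forward_return", date_col="date"):
    """Hypothetical helper: Spearman rank IC per date (Pearson correlation of the ranks)."""
    return pd.Series({
        date: group[score_col].rank().corr(group[target_col].rank())
        for date, group in df.groupby(date_col)
    })

# Toy data: scores order the returns perfectly on the first date and inversely on the second
toy = pd.DataFrame({
    "date": ["2022-01-03"] * 3 + ["2022-01-04"] * 3,
    "model_score": [0.1, 0.2, 0.3, 0.3, 0.2, 0.1],
    "forward_return": [0.01, 0.02, 0.05, 0.01, 0.02, 0.05],
})
ic = rank_ic_by_date(toy)
print(ic)
```

An IC near +1 means the scores rank that day's returns correctly, near -1 means the ranking is inverted, and near 0 means no relationship.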

## 4. Run and Extend Unit Tests

Use pytest to discover and run all tests. Add a new test case to extend coverage.

In [None]:
# Run all tests from the repository root (output appears below this cell)
!cd .. && pytest tests/ --maxfail=1 --disable-warnings -q

In [None]:
# Example: Add a new test for a custom feature (extend in tests/test_features.py)
def test_custom_feature():
    import pandas as pd
    import numpy as np
    from src.features import compute_rolling_momentum
    df = pd.DataFrame({'close': np.arange(1, 21)})
    mom = compute_rolling_momentum(df, window=5)
    # The first rows lack a full window, so at least one value is expected to be NaN
    assert mom.isnull().sum() > 0
print('Custom test ready to add to tests/test_features.py')
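
A self-contained version of the same idea, runnable without the repository, can help check what such an assertion is testing. The sketch below uses a stand-in `rolling_momentum_sketch` (hypothetical, defined here as the relative price change over `window` rows; the repository's `compute_rolling_momentum` may be defined differently) and asserts that exactly the first `window` values are missing.

```python
import numpy as np
import pandas as pd

def rolling_momentum_sketch(df, window=5):
    """Stand-in for illustration only: relative change of 'close' over `window` rows."""
    return df["close"] / df["close"].shift(window) - 1

def test_rolling_momentum_sketch():
    df = pd.DataFrame({"close": np.arange(1, 21, dtype=float)})
    mom = rolling_momentum_sketch(df, window=5)
    # The first `window` rows have no earlier price to compare against
    assert mom.isnull().sum() == 5
    assert mom.iloc[:5].isnull().all()
    # Row 5 compares close=6 with close=1: 6 / 1 - 1 = 5.0
    assert mom.iloc[5] == 5.0

# pytest would discover the test by its name; calling it directly also works in a notebook
test_rolling_momentum_sketch()
```

Asserting an exact count and a known value catches more regressions than checking only that some NaN exists.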

## 5. Create and Run a Sample Notebook

Add a new notebook to demonstrate the full pipeline: data loading, feature engineering, model training, evaluation, and visualization.

In [None]:
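# Sample: Visualize IC and portfolio returns (reuses ic_by_date and returns from Section 3)
import matplotlib.pyplot as plt

# Pass figsize to .plot(): DataFrame.plot opens its own figure, so a preceding plt.figure() would stay empty
ic_by_date.plot(figsize=(10, 4), title="Information Coefficient by Date")
plt.show()

# cumsum() adds up per-period returns without compounding
returns.set_index('date')[['gross_return', 'net_return']].cumsum().plot(figsize=(10, 4), title="Cumulative Portfolio Returns")
plt.show()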
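
Summing per-period returns and compounding them give different cumulative figures. The toy example below (made-up numbers, not repository output) contrasts the two. Which one is appropriate depends on how `long_short_portfolio_returns` defines its returns, which this notebook does not specify.

```python
import pandas as pd

period_returns = pd.Series([0.10, -0.10, 0.05])  # illustrative values only

additive = period_returns.cumsum()               # simple sum, as in the plot above
compounded = (1 + period_returns).cumprod() - 1  # growth of one unit reinvested each period

comparison = pd.DataFrame({"additive": additive, "compounded": compounded})
print(comparison)
```

The gap between the two columns grows with the size and number of the per-period returns, so the distinction matters more over long backtests.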

## 6. Automate Workflow with Scripts

The cell below writes a small Python script, `run_all.py`, to the repository root. The script runs the test suite and can be extended to run the main pipeline. Execute it from a terminal in the repository root.

In [None]:
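# Example: Write a script that runs the tests (extend it to run the pipeline)
with open('../run_all.py', 'w') as f:
    f.write('''\
import subprocess
import sys

# Run the test suite with the current interpreter and keep its exit code
result = subprocess.run([sys.executable, "-m", "pytest", "tests/", "--maxfail=1", "--disable-warnings", "-q"])
# You can add more automation here (e.g., run main pipeline, save results)
print("Tests passed." if result.returncode == 0 else "Tests failed.")
sys.exit(result.returncode)
''')
print('Script run_all.py created in the repository root. Run it from there with: python run_all.py')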
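
One way to extend the script to several steps is to run each command in order and stop at the first failure, so that a failing test run does not lead silently into the pipeline step. The sketch below is a minimal illustration: `run_steps` is a hypothetical helper, and the demo commands are harmless placeholders rather than the repository's actual steps.

```python
import subprocess
import sys

def run_steps(steps):
    """Run each (name, command) pair in order; return the first non-zero exit code, or 0."""
    for name, cmd in steps:
        print(f"--> {name}")
        code = subprocess.run(cmd).returncode
        if code != 0:
            print(f"Step '{name}' failed with exit code {code}")
            return code
    return 0

# Placeholder steps for demonstration; in the repository these could be, for example,
# ("tests", [sys.executable, "-m", "pytest", "tests/", "-q"]) followed by a pipeline command
demo_steps = [
    ("first", [sys.executable, "-c", "print('first step ran')"]),
    ("second", [sys.executable, "-c", "print('second step ran')"]),
]
exit_code = run_steps(demo_steps)
print("exit code:", exit_code)
```

Passing the result to `sys.exit` at the end of a script lets CI systems and shell scripts detect a failed step.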