# Advanced Machine Learning - Project

## Imported libraries

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
```

## Import the data

### 1. Core Features

- ## **Lagged Returns**  
  ($\text{Return}_{t-1}, \text{Return}_{t-2}, \ldots$)

    **a. For SVM and Neural Networks:**

    **Nonlinear Relationships:**

    SVMs and NNs can capture complex, nonlinear dependencies between lagged returns and future returns.
    Lagged returns provide the primary input for these models to learn patterns such as momentum or mean-reversion.

    **Feature Engineering:**

    Financial time series often exhibit dependencies in which past values (lagged returns) carry information about future values.
    Without lagged returns, SVMs and NNs would lack the most direct summary of recent price dynamics.

    **b. For ARMA (or ARIMA):**

    **Model Assumption:**

    ARMA models explicitly capture serial dependence through lagged values of the series (AR part) and lagged error terms (MA part).
    Because ARMA is built on past observations and past shocks, lagged returns are the natural input for a like-for-like comparison.

    **Direct Comparison:**

    By including lagged returns as input features for SVMs and NNs, the dataset aligns conceptually with the ARMA model, which inherently uses lagged terms.
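As a minimal sketch (not necessarily how `process_data.py` builds them), lagged returns can be created with pandas `shift`: `shift(k)` moves each value down `k` rows, so row $t$ receives the return from $t-k$. The helper `add_lagged_returns` and the toy return values are hypothetical; only the `Lag_k` naming mirrors the `Lag_1` to `Lag_3` columns in the merged dataset.

```python
import pandas as pd

def add_lagged_returns(df: pd.DataFrame, col: str = "returns", n_lags: int = 3) -> pd.DataFrame:
    """Return a copy of df with Lag_1..Lag_n columns (hypothetical helper)."""
    out = df.copy()
    for k in range(1, n_lags + 1):
        # shift(k) places the value from k rows earlier on row t, so Lag_k = return at t-k
        out[f"Lag_{k}"] = out[col].shift(k)
    # The first n_lags rows lack a full lag history (NaN); drop them
    return out.dropna()

# Toy daily returns (illustrative values only)
idx = pd.date_range("2024-01-01", periods=6, freq="B")
toy = pd.DataFrame({"returns": [0.01, -0.02, 0.03, 0.00, 0.01, -0.01]}, index=idx)
lagged = add_lagged_returns(toy, n_lags=2)
print(lagged)
```

Each row now holds today's return next to the returns of the previous two days, which is the same information an AR(2) term would use.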


- ## **Volatility** (e.g., rolling 10-day standard deviation)

  Why it's useful: Markets tend to exhibit volatility clustering, where periods of high volatility tend to be followed by more high volatility, and calm periods by more calm periods.
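A sketch of the rolling 10-day volatility feature on simulated returns (a calm regime followed by a turbulent one, purely illustrative). Shifting by one day, so that the value on day $t$ uses only returns up to $t-1$, is an assumption about what the `volatility_lag_10` column represents; it is the usual way to avoid look-ahead bias.

```python
import numpy as np
import pandas as pd

# Simulated daily returns: 60 calm days followed by 60 turbulent days (illustrative data)
rng = np.random.default_rng(0)
returns = pd.Series(np.concatenate([rng.normal(0, 0.005, 60), rng.normal(0, 0.02, 60)]))

# Rolling 10-day standard deviation of returns (sample std, ddof=1 by default)
vol_10 = returns.rolling(window=10).std()

# Assumption: lag by one day so the feature on day t only uses returns up to t-1
vol_10_lag = vol_10.shift(1)

print(vol_10_lag.iloc[10:60].mean(), vol_10_lag.iloc[70:].mean())
```

On this simulated series the rolling estimate stays low in the calm half and rises in the turbulent half; when volatility clusters, that persistence is what makes yesterday's volatility informative about today's.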

### 2. Key Technical Indicators

- ## **Simple Moving Averages (SMA)**  
  - $\text{SMA}_{10}, \text{SMA}_{20}$

    ### What is SMA (Simple Moving Average)?
    The **Simple Moving Average (SMA)** is a technical indicator that calculates the average of a given data set (e.g., prices or returns) over a specified number of periods. It smooths out price data to identify trends over time.

    #### Formula for SMA
    $$\text{SMA}_t = \frac{P_{t-1} + P_{t-2} + \ldots + P_{t-N}}{N}$$

    Where:
    -  $P_{t-1}, P_{t-2}, \ldots, P_{t-N}$ are the values (prices or returns) over the $N$ periods preceding $t$.
    -  $N$ is the window size (e.g., 10 days, 20 days).

    ---

    ### Why Use SMA as a Feature?
    SMA is a widely used feature in financial modeling because it summarizes the recent trend of a series. Here's why:

    #### 1. Identifies Trends
    - SMA smooths out short-term price fluctuations, helping to identify the overall direction of the market.
    - **Example**: A rising SMA indicates an upward trend, while a declining SMA indicates a downward trend.

    #### 2. Acts as a Support/Resistance Level
    - Many traders treat the SMA as a dynamic support or resistance level:
      - **Price above the SMA**: Read as a bullish signal (uptrend).
      - **Price below the SMA**: Read as a bearish signal (downtrend).

    #### 3. Helps in Detecting Momentum
    - By comparing short-term and long-term SMAs (e.g., the 10-day and 20-day SMAs used here), you can gauge momentum:
      - **Golden Cross**: The short-term SMA crosses above the long-term SMA (bullish signal; the classic version uses the 50-day and 200-day SMAs).
      - **Death Cross**: The short-term SMA crosses below the long-term SMA (bearish signal).

    #### 4. Useful in Mean-Reversion Strategies
    - Prices sometimes revert toward their recent average after deviating significantly. The SMA can serve as a proxy for this mean, helping to flag potential reversals.

    ---

    ### How Does SMA Enhance Predictive Models?

    #### 1. Trend Representation
    - SMA captures underlying trends that are not evident from raw prices or returns alone.

    #### 2. Momentum and Reversal Patterns
    - Combining SMA with lagged returns or volatility provides insights into both momentum and mean-reversion behaviors.

    #### 3. Noise Reduction
    - SMA provides a smoothed version of a noisy series, which can make it easier for models to focus on persistent patterns rather than day-to-day fluctuations.
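A minimal pandas sketch of the SMA formula above, plus a simple crossover detector. The window lengths 2 and 4 and the toy price path are illustrative assumptions chosen to keep the output small (the dataset itself uses 10- and 20-day windows, `SMA_10` and `SMA_20`); `lagged_sma` is a hypothetical helper name. The `shift(1)` makes the window end at $t-1$, matching the formula.

```python
import numpy as np
import pandas as pd

def lagged_sma(series: pd.Series, n: int) -> pd.Series:
    """SMA_t = mean of the n values before t (P_{t-1}, ..., P_{t-n}), as in the formula above."""
    # rolling(n).mean() averages P_{t-n+1}..P_t; shift(1) moves that window back to P_{t-n}..P_{t-1}
    return series.rolling(window=n).mean().shift(1)

# Toy price path: a rise followed by a decline (illustrative values only)
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 13.0, 12.0, 11.0, 10.0, 9.0])

sma_short = lagged_sma(prices, 2)  # stand-in for a short window such as SMA_10
sma_long = lagged_sma(prices, 4)   # stand-in for a longer window such as SMA_20

# Sign of (short - long) on days where both averages exist
spread_sign = np.sign((sma_short - sma_long).dropna())
prev_sign = spread_sign.shift(1)

# A cross occurs where the sign flips from one day to the next
golden_cross_days = spread_sign.index[(prev_sign < 0) & (spread_sign > 0)]
death_cross_days = spread_sign.index[(prev_sign > 0) & (spread_sign < 0)]

print(golden_cross_days.tolist(), death_cross_days.tolist())  # [] [7]
```

The short average starts above the long one and drops below it at index 7, once the decline has lasted long enough; in a model, the signed spread itself (rather than only the cross dates) can also be used as a feature.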


- ## **RSI** - 14-day

    ### What is RSI (Relative Strength Index)?

    The **Relative Strength Index (RSI)** is a momentum oscillator used in technical analysis to measure the speed and magnitude of recent price changes. It is commonly used to flag overbought or oversold conditions. RSI values range from **0 to 100**.

    #### RSI Formula
    $$\text{RSI} = 100 - \left( \frac{100}{1 + RS} \right)$$
    Where:
    - $RS = \dfrac{\text{Average Gain over } n \text{ periods}}{\text{Average Loss over } n \text{ periods}}$

    ##### Key Levels
    - **70 and above**: Overbought (price might decrease).
    - **30 and below**: Oversold (price might increase).

    ---

    ### Why Is RSI Useful as a Feature?

    #### 1. Detects Momentum and Reversals
    - RSI highlights periods of stretched momentum (overbought/oversold) that may precede reversals:
      - High RSI (overbought): Indicates a possible downward reversal.
      - Low RSI (oversold): Indicates a possible upward reversal.

    #### 2. Captures Market Sentiment
    - RSI quantifies the balance between buying and selling pressure, serving as a proxy for market sentiment.
      - **Bullish Sentiment**: RSI rising above 50.
      - **Bearish Sentiment**: RSI falling below 50.

    #### 3. Enhances Predictive Power for Returns
    - Since RSI reflects momentum and trend strength, it complements lagged returns and volatility, giving models such as SVMs and neural networks a bounded (0 to 100) momentum signal in addition to raw returns and their dispersion.

    #### 4. Widely Used in Trading Strategies
    - RSI is a staple in trading algorithms for designing momentum and mean-reversion strategies.

    ---

    ### How RSI Relates to Predicting Returns
    - **Momentum Signals**: RSI provides momentum signals that help in understanding whether returns are likely to continue in the same direction or reverse.
    - **Reversion to Mean**: RSI-based signals often align with mean-reversion patterns, making it relevant for predicting returns.

    ---

    ### Comparison to Other Features

    | **Feature**             | **Primary Insight**                   | **How RSI Complements It**                           |
    |-------------------------|---------------------------------------|------------------------------------------------------|
    | **Lagged Returns**      | Historical return dependencies.       | RSI summarizes momentum beyond raw returns.          |
    | **Volatility (StdDev)** | Magnitude of price movements.         | RSI adds a directional bias to magnitude data.       |
    | **SMA**                 | Short- and long-term smoothed trends. | RSI flags short-term overbought/oversold conditions. |

    ---
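A minimal pandas sketch of the RSI formula above, assuming the average gain and loss are plain $n$-period rolling means. Wilder's original RSI smooths gains and losses with an exponential average instead, so values from charting tools may differ; the function name `rsi` and the toy prices are illustrative. As written, the RSI on day $t$ uses the close on day $t$, so if the target is the return on day $t$, the feature would need to be shifted by one day.

```python
import numpy as np
import pandas as pd

def rsi(prices: pd.Series, n: int = 14) -> pd.Series:
    """RSI = 100 - 100 / (1 + RS), with RS = average gain / average loss over n periods.

    Assumption: plain n-period rolling means of gains and losses (not Wilder's smoothing).
    """
    delta = prices.diff()                # price change from t-1 to t
    gains = delta.clip(lower=0)          # upward moves; 0 on down days
    losses = (-delta).clip(lower=0)      # size of downward moves; 0 on up days
    avg_gain = gains.rolling(window=n).mean()
    avg_loss = losses.rolling(window=n).mean()
    rs = avg_gain / avg_loss             # inf when the window has no losses, giving RSI = 100
    return 100 - 100 / (1 + rs)

toy = pd.Series([10.0, 11.0, 10.0, 12.0])
print(rsi(toy, n=2).round(2).tolist())  # [nan, nan, 50.0, 66.67]
```

With $n = 2$, the third value has equal average gain and loss ($RS = 1$, RSI = 50), and the fourth has twice as much gain as loss ($RS = 2$, RSI $\approx 66.67$).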


### 3. Macroeconomic and Market Sentiment

- **VIX Index**  
  - (daily value)

- **One or two macroeconomic variables relevant to the sample period**  
  - (e.g., Treasury yields and CPI; the merged dataset below uses the unemployment rate `UNRATE` and `CPI` from FRED)
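Monthly macro series such as UNRATE and CPI have to be aligned with daily trading dates; the repeated values in the merged data suggest each daily row carries the latest monthly observation. A sketch of that alignment with `pd.merge_asof`, using toy values and a hypothetical column name `macro` (the actual logic in `process_data.py` is not shown here):

```python
import pandas as pd

# Daily trading dates with a placeholder feature column (illustrative only)
daily = pd.DataFrame(
    {"returns": [0.01, -0.02, 0.005, 0.0, 0.012]},
    index=pd.to_datetime(["2010-01-28", "2010-01-29", "2010-02-01", "2010-02-02", "2010-03-01"]),
)

# Hypothetical monthly macro series stamped on the first day of each month
macro = pd.DataFrame(
    {"macro": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2010-01-01", "2010-02-01", "2010-03-01"]),
)

# For each trading day, take the most recent monthly value on or before that date
# (both frames must be sorted by their index for merge_asof)
merged = pd.merge_asof(daily, macro, left_index=True, right_index=True, direction="backward")
print(merged["macro"].tolist())  # [1.0, 1.0, 2.0, 2.0, 3.0]
```

Monthly figures such as CPI are published with a delay, so aligning by reference month can leak information that was not yet public on a given trading day; shifting the macro dates forward before the merge is one way to account for this.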

### 4. Day-of-Week Effects

- **Include dummy variables for the day of the week**
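A minimal sketch of day-of-week dummies with `pd.get_dummies`, assuming a `DatetimeIndex` of trading days (the toy returns are illustrative). Monday (`dayofweek == 0`) is dropped as the baseline category to avoid perfect collinearity with an intercept term (the dummy-variable trap); for SVMs and neural networks, keeping all five columns is also reasonable.

```python
import pandas as pd

# One business week of trading days: 2010-02-01 is a Monday
idx = pd.date_range("2010-02-01", periods=5, freq="B")
df = pd.DataFrame({"returns": [0.003, -0.001, 0.002, 0.0, -0.004]}, index=idx)

# dayofweek: Monday=0 ... Friday=4; drop_first=True drops dow_0 so Monday is the baseline
dow = pd.get_dummies(df.index.dayofweek, prefix="dow", drop_first=True, dtype=int)
dow.index = df.index  # get_dummies on an Index returns a default RangeIndex
df = df.join(dow)
print(df)
```

Monday rows are all zeros across `dow_1` to `dow_4`, and every other day has exactly one column set to 1.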


```python
%run process_data.py
```

```
[*********************100%***********************]  1 of 1 completed
File downloaded and saved to: data/fred_unrate.csv
DataFrame created successfully.
File downloaded and saved to: data/fred_cpi.csv
DataFrame created successfully.
Macro data loaded.
S&P 500 data processed.
Merged data saved as 'merged_data.csv'.
```





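```python
# Parse the Date index as datetimes so date-based features (e.g., day of week) work directly
data = pd.read_csv('data/merged_data.csv', index_col=0, parse_dates=True)
print(data.shape)
data.head()
```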

(3502, 11)

| Date | returns | volume | volatility_lag_10 | SMA_10 | SMA_20 | RSI_14 | Lag_1 | Lag_2 | Lag_3 | UNRATE | CPI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010-02-02 | 0.012973 | 4749540000.0 | 0.012761 | -0.004082 | -0.001274 | 56.090437 | 0.014266 | -0.009829 | -0.011818 | 9.8 | 1.048684 |
| 2010-02-03 | -0.005474 | 4285450000.0 | 0.012572 | -0.003569 | -0.001703 | 46.255193 | 0.012973 | 0.014266 | -0.009829 | 9.8 | 1.048684 |
| 2010-02-04 | -0.031141 | 5859690000.0 | 0.014649 | -0.004789 | -0.003288 | 41.773785 | -0.005474 | 0.012973 | 0.014266 | 9.8 | 1.048684 |
| 2010-02-05 | 0.002897 | 6438900000.0 | 0.013444 | -0.002285 | -0.003343 | 53.051474 | -0.031141 | -0.005474 | 0.012973 | 9.8 | 1.048684 |
| 2010-02-08 | -0.008863 | 4089820000.0 | 0.013352 | -0.003631 | -0.00393 | 44.991135 | 0.002897 | -0.031141 | -0.005474 | 9.8 | 1.048684 |


## 1. Financial Time Series