# Project 1: Trading with Momentum
## Overview and Instructions
In this project, I’ll be working on implementing functions to create momentum-based trading strategies. For each task, there’s a specific function to complete, along with detailed instructions. After writing the function, I can test it using the provided unit tests. These tests, included in the project_tests package, won’t guarantee my solution is perfect but will help catch major errors.

## Required Packages
I’ll be using familiar libraries like Pandas and NumPy, which are already imported for me. It’s best not to add extra import statements, as this could cause issues with the grading system.

In addition to the standard libraries, there are three custom modules to help:

* helper and project_helper: These provide utility functions and tools for plotting.
* project_tests: This module has unit tests to validate my solutions for each task.

### Install Packages and RESTART the Kernel

In [1]:
import sys
!{sys.executable} -m pip install -r requirements.txt

Collecting plotly>=4.0.0 (from -r requirements.txt (line 6))
  Using cached plotly-5.24.1-py3-none-any.whl.metadata (7.3 kB)
Using cached plotly-5.24.1-py3-none-any.whl (19.1 MB)
Installing collected packages: plotly
  Attempting uninstall: plotly
    Found existing installation: plotly 3.10.0
    Uninstalling plotly-3.10.0:
      Successfully uninstalled plotly-3.10.0
Successfully installed plotly-5.24.1


In [2]:
!python -m pip install plotly==3.10.0 --no-cache

Collecting plotly==3.10.0
  Downloading plotly-3.10.0-py2.py3-none-any.whl.metadata (6.2 kB)
Downloading plotly-3.10.0-py2.py3-none-any.whl (41.5 MB)
   ---------------------------------------- 41.5/41.5 MB 3.6 MB/s eta 0:00:00
Installing collected packages: plotly
  Attempting uninstall: plotly
    Found existing installation: plotly 5.24.1
    Uninstalling plotly-5.24.1:
      Successfully uninstalled plotly-5.24.1
Successfully installed plotly-3.10.0


In [3]:
# Verify
# Should return plotly==3.10.0
import plotly
print(plotly.__version__)

3.10.0


### Load Packages

In [4]:
import pandas as pd
import numpy as np
import helper
import project_helper
import project_tests
import yfinance as yf

## Market Data
### Load Data
For most of the projects, we utilize end-of-day stock market data. This dataset includes information on numerous stocks, with a particular focus on those listed in the S&P 500 index. To simplify and optimize processing, we have restricted the data to a specific time period rather than using the entire dataset.

In [5]:
# Define the tickers and date range
# Fetch S&P 500 tickers from Wikipedia
def get_sp500_tickers():
    url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
    sp500_table = pd.read_html(url)
    tickers = sp500_table[0]['Symbol'].tolist()
    return tickers

# Get the tickers
tickers = get_sp500_tickers()

# Select the first 100 tickers
selected_tickers = tickers[:100]
start_date = '2020-01-01'
end_date = '2020-12-31'

# Fetch stock data
data = yf.download(selected_tickers, start=start_date, end=end_date)

# Extract Close prices from the MultiIndex
close_prices = data.xs(key='Close', level='Price', axis=1)

# Reshape to long format with the columns date, ticker, adj_close
df = close_prices.reset_index().melt(id_vars=['Date'], var_name='ticker', value_name='adj_close')
df.rename(columns={'Date': 'date'}, inplace=True)

# Save the data to a CSV file
df.to_csv('eod-quotemedia.csv', index=False)

print("Data saved in the required format!")

df = pd.read_csv('eod-quotemedia.csv', parse_dates=['date'], index_col=False)

close = df.reset_index().pivot(index='date', columns='ticker', values='adj_close')

print('Loaded Data')

[*********************100%***********************]  100 of 100 completed

2 Failed downloads:
['BRK.B']: YFTzMissingError('$%ticker%: possibly delisted; no timezone found')
['BF.B']: YFPricesMissingError('$%ticker%: possibly delisted; no price data found  (1d 2020-01-01 -> 2020-12-31)')


Data saved in the required format!
Loaded Data
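
The two failed downloads are both class-share tickers. Wikipedia writes them with a dot (`BRK.B`, `BF.B`), while Yahoo Finance uses a hyphen (`BRK-B`, `BF-B`). Below is a minimal sketch of a workaround, assuming this naming difference is the only reason those two tickers failed. `to_yahoo_symbol` is a hypothetical helper, not part of `yfinance`.

```python
def to_yahoo_symbol(ticker):
    """Convert a dot-separated share class (e.g. 'BRK.B') to the hyphenated form."""
    return ticker.replace('.', '-')

# Hypothetical sample of tickers as they appear on the Wikipedia page
wiki_tickers = ['AAPL', 'BRK.B', 'BF.B']
yahoo_tickers = [to_yahoo_symbol(t) for t in wiki_tickers]
print(yahoo_tickers)  # ['AAPL', 'BRK-B', 'BF-B']
```

Applying the helper to the ticker list before the download call should let both tickers resolve, though I haven't re-run the download to confirm it.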


### View Data
Run the cell below to see what the data looks like for `close`.

In [6]:
project_helper.print_dataframe(close)

### Stock Example
Let's see what a single stock looks like from the closing prices. For this example and future display examples in this project, we'll use Apple's stock (AAPL). If we tried to graph all the stocks, it would be too much information.

In [7]:
apple_ticker = 'AAPL'
project_helper.plot_stock(close[apple_ticker], '{} Stock'.format(apple_ticker))

## Resample Adjusted Prices

In this project, I don't need to base my trading signal on daily prices. For instance, I can use month-end prices to make trades once a month. To do this, I first need to resample the daily adjusted closing prices into monthly intervals and select the last available price for each month.

I’ll implement the `resample_prices` function to resample `close_prices` at the frequency specified by `freq`.

In [8]:
def resample_prices(close_prices, freq='M'):
    """
    Resample close prices for each ticker at specified frequency.
    
    Parameters
    ----------
    close_prices : DataFrame
        Close prices for each ticker and date
    freq : str
        What frequency to sample at
        For valid freq choices, see http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
    
    Returns
    -------
    prices_resampled : DataFrame
        Resampled prices for each ticker and date
    """

    # Resample and select the last observation
    return close_prices.resample(freq).last()

project_tests.test_resample_prices(resample_prices)

Tests Passed
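
To make "the last available price for each month" concrete, here is a small sketch on made-up data. It groups by calendar month instead of calling `resample`, which is an alternative I've swapped in so the example doesn't depend on a particular frequency alias. The ticker `XYZ` and its prices are hypothetical.

```python
import pandas as pd

# Hypothetical daily closes for one ticker across two months
dates = pd.to_datetime(['2020-01-30', '2020-01-31', '2020-02-27', '2020-02-28'])
daily_close = pd.DataFrame({'XYZ': [10.0, 11.0, 12.0, 13.0]}, index=dates)

# Group the rows by calendar month and keep the last available price in each group
month_end_close = daily_close.groupby(daily_close.index.to_period('M')).last()
print(month_end_close['XYZ'].tolist())  # [11.0, 13.0]
```

The values match what a monthly `resample(...).last()` would select. The difference is that the result is indexed by monthly periods rather than month-end timestamps.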


### View Data
Let's apply this function to `close` and view the results.

In [9]:
monthly_close = resample_prices(close)
project_helper.plot_resampled_prices(
    monthly_close.loc[:, apple_ticker],
    close.loc[:, apple_ticker],
    '{} Stock - Close Vs Monthly Close'.format(apple_ticker))

## Compute Log Returns

To calculate the log returns ($R_t$) from prices ($P_t$), I use the formula:

$$R_t = \ln(P_t) - \ln(P_{t-1})$$

Here, $\ln$ denotes the natural logarithm. Log returns are convenient for momentum analysis because they add across periods: the log return over several months is the sum of the monthly log returns.

I’ll implement the `compute_log_returns` function, which takes a DataFrame (like the one I get from `resample_prices`) and outputs another DataFrame with the log returns for each ticker. For this, I’ll use NumPy’s [log function](https://docs.scipy.org/doc/numpy/reference/generated/numpy.log.html) to make the calculations straightforward.

In [10]:
def compute_log_returns(prices):
    """
    Compute log returns for each ticker.
    
    Parameters
    ----------
    prices : DataFrame
        Prices for each ticker and date
    
    Returns
    -------
    log_returns : DataFrame
        Log returns for each ticker and date
    """
    
    return np.log(prices) - np.log(prices.shift(1))

project_tests.test_compute_log_returns(compute_log_returns)

Tests Passed
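
As a quick sanity check on the formula, the sketch below uses three made-up prices to show that the per-period log returns sum to the log return over the whole span.

```python
import numpy as np
import pandas as pd

# Hypothetical price path: up 10%, then down 10%
prices = pd.Series([100.0, 110.0, 99.0])
log_returns = np.log(prices) - np.log(prices.shift(1))

# The first entry is NaN because there is no earlier price to compare with;
# Series.sum() skips NaN by default
total_log_return = log_returns.sum()
print(np.isclose(total_log_return, np.log(99.0 / 100.0)))  # True
```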


### View Data
Using the same data returned from `resample_prices`, we'll generate the log returns.

In [11]:
monthly_close_returns = compute_log_returns(monthly_close)
project_helper.plot_returns(
    monthly_close_returns.loc[:, apple_ticker],
    'Log Returns of {} Stock (Monthly)'.format(apple_ticker))

## Shift Returns

To implement the `shift_returns` function, I’ll write a method that shifts the log returns by a specified number of steps (`shift_n`) either forward or backward in the time series. Here’s how it works:

* If `shift_n` is positive, the returns are shifted forward in time.
* If `shift_n` is negative, the returns are shifted backward in time.
* The result is a DataFrame with the same structure as the input but with NaNs where data isn't available after the shift.

For example, if I have the following returns data:

```
                           Returns
               A         B         C         D
2013-07-08     0.015     0.082     0.096     0.020     ...
2013-07-09     0.037     0.095     0.027     0.063     ...
2013-07-10     0.094     0.001     0.093     0.019     ...
2013-07-11     0.092     0.057     0.069     0.087     ...
...            ...       ...       ...       ...
```
If `shift_n` = 2, the output will look like:
```
                        Shift Returns
               A         B         C         D
2013-07-08     NaN       NaN       NaN       NaN       ...
2013-07-09     NaN       NaN       NaN       NaN       ...
2013-07-10     0.015     0.082     0.096     0.020     ...
2013-07-11     0.037     0.095     0.027     0.063     ...
...            ...       ...       ...       ...
```
If `shift_n` = -2, the output will look like:
```
                        Shift Returns
               A         B         C         D
2013-07-08     0.094     0.001     0.093     0.019     ...
2013-07-09     0.092     0.057     0.069     0.087     ...
...            ...       ...       ...       ...       ...
...            ...       ...       ...       ...       ...
...            NaN       NaN       NaN       NaN       ...
...            NaN       NaN       NaN       NaN       ...
```
I’ll implement this function to take a DataFrame and the `shift_n` parameter and return the shifted DataFrame.

In [12]:
def shift_returns(returns, shift_n):
    """
    Generate shifted returns
    
    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date
    shift_n : int
        Number of periods to move, can be positive or negative
    
    Returns
    -------
    shifted_returns : DataFrame
        Shifted returns for each ticker and date
    """
    
    return returns.shift(shift_n)

project_tests.test_shift_returns(shift_returns)

Tests Passed
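
The sketch below repeats the shift on column A of the example table. A shift of 1 lines up each date with the return from the period before it, and a shift of -1 lines up each date with the return from the period after it.

```python
import pandas as pd

# Column A from the example table above
dates = pd.to_datetime(['2013-07-08', '2013-07-09', '2013-07-10', '2013-07-11'])
returns = pd.DataFrame({'A': [0.015, 0.037, 0.094, 0.092]}, index=dates)

prev = returns.shift(1)        # value known before each date
lookahead = returns.shift(-1)  # value only known after each date

print(prev.loc['2013-07-09', 'A'])       # 0.015
print(lookahead.loc['2013-07-08', 'A'])  # 0.037
```

Only the positively shifted series is available when a trade is placed. The negatively shifted series is future information, so it is suitable for evaluating a signal but not for building one.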


### View Data
Let's get the previous month's and next month's returns.

In [13]:
prev_returns = shift_returns(monthly_close_returns, 1)
lookahead_returns = shift_returns(monthly_close_returns, -1)

project_helper.plot_shifted_returns(
    prev_returns.loc[:, apple_ticker],
    monthly_close_returns.loc[:, apple_ticker],
    'Previous Returns of {} Stock'.format(apple_ticker))
project_helper.plot_shifted_returns(
    lookahead_returns.loc[:, apple_ticker],
    monthly_close_returns.loc[:, apple_ticker],
    'Lookahead Returns of {} Stock'.format(apple_ticker))

## Generate Trading Signal

A trading signal is a sequence of trading actions, or results that can be used to take trading actions. A common form is to produce a "long" and "short" portfolio of stocks on each date (e.g. end of each month, or whatever frequency you desire to trade at). This signal can be interpreted as rebalancing your portfolio on each of those dates, entering long ("buy") and short ("sell") positions as indicated.

Here's a strategy that we will try:
> For each month-end observation period, rank the stocks by _previous_ returns, from the highest to the lowest. Select the top performing stocks for the long portfolio, and the bottom performing stocks for the short portfolio.

I will implement the `get_top_n` function to get the top performing stocks for each month. The function marks the top performing stocks in `prev_returns` with a value of 1 and all other stocks with a value of 0. For example, suppose we have the following `prev_returns`:

```
                                     Previous Returns
               A         B         C         D         E         F         G
2013-07-08     0.015     0.082     0.096     0.020     0.075     0.043     0.074
2013-07-09     0.037     0.095     0.027     0.063     0.024     0.086     0.025
...            ...       ...       ...       ...       ...       ...       ...
```

The function `get_top_n` with `top_n` set to 3 should return the following:
```
                                     Previous Returns
               A         B         C         D         E         F         G
2013-07-08     0         1         1         0         1         0         0
2013-07-09     0         1         0         1         0         1         0
...            ...       ...       ...       ...       ...       ...       ...
```
*Note: One way to implement the function is to combine pandas' [`DataFrame.iterrows`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.iterrows.html) with [`Series.nlargest`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.Series.nlargest.html).*

In [14]:
def get_top_n(prev_returns, top_n):
    """
    Select the top performing stocks
    
    Parameters
    ----------
    prev_returns : DataFrame
        Previous shifted returns for each ticker and date
    top_n : int
        The number of top performing stocks to get
    
    Returns
    -------
    top_stocks : DataFrame
        Top stocks for each ticker and date marked with a 1
    """
    
    top_stocks = prev_returns.copy()

    for date, row in prev_returns.iterrows():
        # nlargest ignores NaN and returns at most top_n tickers, even when
        # several stocks share the same return
        top_tickers = row.nlargest(top_n).index
        # Mark the selected tickers with 1 and all other tickers with 0
        top_stocks.loc[date] = row.index.isin(top_tickers).astype(float)

    return top_stocks.astype('int64')

project_tests.test_get_top_n(get_top_n)

Tests Passed
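
For larger universes, the row-by-row loop can be replaced by ranking every row at once. This is a sketch of that alternative, not the version checked by `project_tests`. `get_top_n_ranked` is a hypothetical name, and the data is the example table from above.

```python
import pandas as pd

def get_top_n_ranked(prev_returns, top_n):
    # Rank each row from highest return (rank 1) to lowest;
    # method='first' breaks ties by column order so ranks are unique
    ranks = prev_returns.rank(axis=1, ascending=False, method='first')
    # NaN returns get NaN ranks, and NaN <= top_n is False, so they map to 0
    return (ranks <= top_n).astype('int64')

example = pd.DataFrame(
    [[0.015, 0.082, 0.096, 0.020, 0.075, 0.043, 0.074],
     [0.037, 0.095, 0.027, 0.063, 0.024, 0.086, 0.025]],
    index=pd.to_datetime(['2013-07-08', '2013-07-09']),
    columns=list('ABCDEFG'))

print(get_top_n_ranked(example, 3).values.tolist())
# [[0, 1, 1, 0, 1, 0, 0], [0, 1, 0, 1, 0, 1, 0]]
```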


### View Data
We want to get the best performing and worst performing stocks. To get the best performing stocks, we'll use the `get_top_n` function. To get the worst performing stocks, we'll also use the `get_top_n` function. However, we pass in `-1*prev_returns` instead of just `prev_returns`. Multiplying by negative one will flip all the positive returns to negative and negative returns to positive. Thus, it will return the worst performing stocks.

In [15]:
top_bottom_n = 50
df_long = get_top_n(prev_returns, top_bottom_n)
df_short = get_top_n(-1*prev_returns, top_bottom_n)
project_helper.print_top(df_long, 'Longed Stocks')
project_helper.print_top(df_short, 'Shorted Stocks')

10 Most Longed Stocks:
A, CE, CAT, ALB, APTV, GOOG, BALL, CDNS, ANSS, ALGN
10 Most Shorted Stocks:
T, BAX, AFL, AVB, ATO, BXP, BAC, ALL, AXP, BDX


## Projected Returns
It's now time to check if our trading signal has the potential to become profitable!

To do this, I’ll calculate the net returns for the portfolio. I’m simplifying things by assuming an equal dollar investment across all stocks. This way, I can compute portfolio returns as the arithmetic average of individual stock returns.

I need to implement the `portfolio_returns` function to compute the expected portfolio returns. I’ll use `df_long` to indicate which stocks to long and `df_short` to indicate which stocks to short, then calculate the returns using `lookahead_returns`. To make the calculation easier, I have `n_stocks`, which tells me how many stocks I’m investing in for a single period.

In [16]:
def portfolio_returns(df_long, df_short, lookahead_returns, n_stocks):
    """
    Compute expected returns for the portfolio, assuming equal investment in each long/short stock.
    
    Parameters
    ----------
    df_long : DataFrame
        Top stocks for each ticker and date marked with a 1
    df_short : DataFrame
        Bottom stocks for each ticker and date marked with a 1
    lookahead_returns : DataFrame
        Lookahead returns for each ticker and date
    n_stocks: int
        The number of stocks chosen for each month
    
    Returns
    -------
    portfolio_returns : DataFrame
        Expected portfolio returns for each ticker and date
    """
    
    return (df_long * lookahead_returns + df_short * lookahead_returns * -1) / n_stocks

project_tests.test_portfolio_returns(portfolio_returns)

Tests Passed
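
A small worked example of the arithmetic, using made-up numbers: one stock is held long, one is held short, and each gets half of the capital. The names ending in `_ex` and `n_stocks_ex` are hypothetical. The expression `(long - short) * returns` is algebraically the same as the formula in the function above.

```python
import pandas as pd

date = pd.to_datetime(['2020-06-30'])
cols = ['A', 'B', 'C', 'D']

# Long A, short D; B and C are not held
df_long_ex = pd.DataFrame([[1, 0, 0, 0]], index=date, columns=cols)
df_short_ex = pd.DataFrame([[0, 0, 0, 1]], index=date, columns=cols)
lookahead_ex = pd.DataFrame([[0.10, 0.02, -0.01, -0.04]], index=date, columns=cols)

n_stocks_ex = 2  # one long position plus one short position
contributions = (df_long_ex - df_short_ex) * lookahead_ex / n_stocks_ex

# A contributes 0.10 / 2 = 0.05; the short in D gains 0.04 / 2 = 0.02
total = float(contributions.sum(axis=1).iloc[0])
print(round(total, 4))  # 0.07
```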


### View Data
Time to see how the portfolio did.

In [17]:
expected_portfolio_returns = portfolio_returns(df_long, df_short, lookahead_returns, 2*top_bottom_n)
project_helper.plot_returns(expected_portfolio_returns.T.sum(), 'Portfolio Returns')

## Statistical Tests
### Annualized Rate of Return

In [18]:
expected_portfolio_returns_by_date = expected_portfolio_returns.T.sum().dropna()
portfolio_ret_mean = expected_portfolio_returns_by_date.mean()
portfolio_ret_ste = expected_portfolio_returns_by_date.sem()
portfolio_ret_annual_rate = (np.exp(portfolio_ret_mean * 12) - 1) * 100

print("""
Mean:                       {:.6f}
Standard Error:             {:.6f}
Annualized Rate of Return:  {:.2f}%
""".format(portfolio_ret_mean, portfolio_ret_ste, portfolio_ret_annual_rate))


Mean:                       -0.002979
Standard Error:             0.002978
Annualized Rate of Return:  -3.51%



The annualized rate of return allows us to compare the rate of return from this strategy to other quoted rates of return, which are usually quoted on an annual basis. 
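
The annualization works because the monthly values are log returns: twelve times the monthly mean gives an annual log return, and exponentiating converts it back to a simple rate. The sketch below repeats that calculation using the mean printed above.

```python
import math

mean_monthly_log_return = -0.002979  # mean printed in the output above

# Log returns add across periods, so 12 monthly means give one annual log return
annual_log_return = mean_monthly_log_return * 12

# Convert the annual log return to a simple percentage rate
annual_rate_pct = (math.exp(annual_log_return) - 1) * 100
print(f"{annual_rate_pct:.2f}%")  # -3.51%
```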

### T-Test
Our null hypothesis ($H_0$) is that the actual mean return from the signal is zero. We'll perform a one-sample, one-sided t-test on the observed mean return, to see if we can reject $H_0$.

We'll need to first compute the t-statistic, and then find its corresponding p-value. The p-value will indicate the probability of observing a t-statistic equally or more extreme than the one we observed if the null hypothesis were true. A small p-value means that the chance of observing the t-statistic we observed under the null hypothesis is small, and thus casts doubt on the null hypothesis. It's good practice to set a desired level of significance or alpha ($\alpha$) _before_ computing the p-value, and then reject the null hypothesis if $p < \alpha$.

For this project, we'll use $\alpha = 0.05$, since it's a common value to use.

I will implement the `analyze_alpha` function to perform a t-test on the sample of portfolio returns. We will import the `scipy.stats` module to perform the t-test.

In [19]:
from scipy import stats

def analyze_alpha(expected_portfolio_returns_by_date):
    """
    Perform a t-test with the null hypothesis being that the expected mean return is zero.
    
    Parameters
    ----------
    expected_portfolio_returns_by_date : Pandas Series
        Expected portfolio returns for each date
    
    Returns
    -------
    t_value
        T-statistic from t-test
    p_value
        Corresponding p-value
    """
    
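    # ttest_1samp returns a two-sided p-value
    t_value, p_value = stats.ttest_1samp(expected_portfolio_returns_by_date, popmean=0)

    # Halve it for a one-sided test; this is only valid when t_value has the
    # sign predicted by the alternative hypothesis (here, t_value > 0)
    p_value /= 2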
    
    return t_value, p_value

project_tests.test_analyze_alpha(analyze_alpha)

Tests Passed


### View Data
Let's see what values we get with our portfolio.

In [20]:
t_value, p_value = analyze_alpha(expected_portfolio_returns_by_date)
print("""
Alpha analysis:
 t-value:        {:.3f}
 p-value:        {:.6f}
""".format(t_value, p_value))


Alpha analysis:
 t-value:        -1.000
 p-value:        0.169330
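
Halving the two-sided p-value gives the one-sided p-value only when the t-statistic has the sign that the alternative hypothesis predicts. The text doesn't state the direction of the alternative, so the sketch below assumes it is "the mean return is greater than zero". Under that assumption, a t-value of -1.000 with a halved p-value of 0.169 corresponds to a one-sided p-value of about 1 - 0.169 = 0.83. `one_sided_p_value` is a hypothetical helper, and the sample is made up.

```python
import numpy as np
from scipy import stats

def one_sided_p_value(sample):
    """t-statistic and p-value for H1: mean > 0, valid for either sign of t."""
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    t_value = sample.mean() / (sample.std(ddof=1) / np.sqrt(n))
    # Survival function: P(T >= t_value) under H0 with n - 1 degrees of freedom
    return t_value, stats.t.sf(t_value, df=n - 1)

sample = [-0.02, 0.01, -0.03, 0.00, -0.01]  # hypothetical monthly returns
t_stat, p_greater = one_sided_p_value(sample)
_, p_two_sided = stats.ttest_1samp(sample, popmean=0)

# With a negative t-statistic, the one-sided p-value is 1 - (two-sided / 2)
print(t_stat < 0, np.isclose(p_greater, 1 - p_two_sided / 2))  # True True
```

Recent SciPy releases also accept an `alternative` argument in `ttest_1samp`, which may be simpler if the installed version supports it.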

