# Project 1: Trading with Momentum
## Instructions
Each problem consists of a function to implement and instructions on how to implement the function.  The parts of the function that need to be implemented are marked with a `# TODO` comment. After implementing the function, run the cell to test it against the unit tests we've provided. For each problem, we provide one or more unit tests from our `project_tests` package. These unit tests won't tell you if your answer is correct, but will warn you of any major errors. Your code will be checked for the correct solution when you submit it to Udacity.

## Packages
When you implement the functions, you'll only need to use the packages you've used in the classroom, like [Pandas](https://pandas.pydata.org/) and [Numpy](http://www.numpy.org/). These packages will be imported for you. We recommend you don't add any import statements, otherwise the grader might not be able to run your code.

The other packages that we're importing are `helper`, `project_helper`, and `project_tests`. These are custom packages built to help you solve the problems. The `helper` and `project_helper` modules contain utility functions and graph functions. The `project_tests` module contains the unit tests for all the problems.

### Install Packages

In [1]:
import sys
!{sys.executable} -m pip install -r requirements.txt

Collecting cvxpy==1.0.3 (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/a1/59/2613468ffbbe3a818934d06b81b9f4877fe054afbf4f99d2f43f398a0b34/cvxpy-1.0.3.tar.gz (880kB)
Collecting numpy==1.13.3 (from -r requirements.txt (line 4))
  Downloading https://files.pythonhosted.org/packages/57/a7/e3e6bd9d595125e1abbe162e323fd2d06f6f6683185294b79cd2cdb190d5/numpy-1.13.3-cp36-cp36m-manylinux1_x86_64.whl (17.0MB)

  Downloading https://files.pythonhosted.org/packages/6f/78/8b96476f4ae426db71c6e86a8e6a81407f015b34547e442291cd397b18f3/dill-0.2.8.2.tar.gz (150kB)
Building wheels for collected packages: cvxpy, plotly, ecos, scs, multiprocess, dill
  Running setup.py bdist_wheel for cvxpy ... done
  Stored in directory: /root/.cache/pip/wheels/2b/60/0b/0c2596528665e21d698d6f84a3406c52044c7b4ca6ac737cf3
  Running setup.py bdist_wheel for plotly ... done
  Stored in directory: /root/.cache/pip/wheels/98/54/81/dd92d5b0858fac680cd7bdb8800eb26c001dd9f5dc8b1bc0ba
  Running setup.py bdist_wheel for ecos ... done
  Stored in directory: /root/.cache/pip/wheels/50/91/1b/568de3c087b3399b03d130e71b1fd048ec072c45f72b6b6e9a
  Running setup.py bdist_wheel for scs ... done
  Stored in directory: /root/.cache/pip/wheels/ff/f0/aa/530ccd478d7d9900b4e9ef5bc5a39e895ce110bed3d3ac653e
  Running setup.py 

### Load Packages

In [2]:
import pandas as pd
import numpy as np
import helper
import project_helper
import project_tests

## Market Data
### Load Data
The data we use for most of the projects is end-of-day data. It contains data for many stocks, but we'll be looking at stocks in the S&P 500. We also made things a little easier to run by narrowing the time range instead of using all of the data.

In [3]:
df = pd.read_csv('../../data/project_1/eod-quotemedia.csv', parse_dates=['date'], index_col=False)

close = df.reset_index().pivot(index='date', columns='ticker', values='adj_close')

print('Loaded Data')

Loaded Data
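
To make the reshaping step concrete, here is a minimal sketch of the same `pivot` call on a tiny long-format table. The tickers (`AAA`, `BBB`), dates, and prices are invented for illustration and are not taken from `eod-quotemedia.csv`.

```python
import pandas as pd

# Hypothetical long-format quotes: one row per (date, ticker) pair
toy_df = pd.DataFrame({
    'date': pd.to_datetime(['2013-07-01', '2013-07-01', '2013-07-02', '2013-07-02']),
    'ticker': ['AAA', 'BBB', 'AAA', 'BBB'],
    'adj_close': [10.0, 20.0, 10.5, 19.0],
})

# Wide format: one row per date, one column per ticker
toy_close = toy_df.pivot(index='date', columns='ticker', values='adj_close')
print(toy_close)
```

The wide layout is what makes the later steps simple: each column is one ticker's price series, so row-wise operations act on one date at a time.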


### View Data
Run the cell below to see what the data looks like for `close`.

In [4]:
project_helper.print_dataframe(close)

In [5]:
print(close)

ticker               A         AAL          AAP         AAPL        ABBV  \
date                                                                       
2013-07-01 29.99418563 16.17609308  81.13821681  53.10917319 34.92447839   
2013-07-02 29.65013670 15.81983388  80.72207258  54.31224742 35.42807578   
2013-07-03 29.70518453 16.12794994  81.23729877  54.61204262 35.44486235   
2013-07-05 30.43456826 16.21460758  81.82188233  54.17338125 35.85613355   
2013-07-08 30.52402098 16.31089385  82.95141667  53.86579916 36.66188936   
2013-07-09 30.68916447 16.71529618  82.43619048  54.81320389 36.35973093   
2013-07-10 31.17771395 16.53235227  81.99032166  54.60295791 36.85493502   
2013-07-11 31.45983407 16.72492481  82.00022986  55.45406479 37.08155384   
2013-07-12 31.48047700 16.90786872  81.91105609  55.35309481 38.15724076   
2013-07-15 31.72819223 17.10044125  82.61453801  55.47379158 37.79303181   
2013-07-16 31.59057266 17.28338516  81.62371841  55.83133953 37.10696377   
2013-07-17 3

### Stock Example
Let's see what a single stock looks like from the closing prices. For this example and future display examples in this project, we'll use Apple's stock (AAPL). If we tried to graph all the stocks, it would be too much information.

In [6]:
apple_ticker = 'AAPL'
project_helper.plot_stock(close[apple_ticker], '{} Stock'.format(apple_ticker))

## Resample Adjusted Prices

The trading signal you'll develop in this project does not need to be based on daily prices. For instance, you can use month-end prices to trade once a month. To do this, you must first resample the daily adjusted closing prices into monthly buckets and select the last observation of each month.

Implement the `resample_prices` function to resample `close_prices` at the sampling frequency `freq`.

In [7]:
def resample_prices(close_prices, freq='M'):
    """
    Resample close prices for each ticker at specified frequency.
    
    Parameters
    ----------
    close_prices : DataFrame
        Close prices for each ticker and date
    freq : str
        What frequency to sample at
        For valid freq choices, see http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
    
    Returns
    -------
    prices_resampled : DataFrame
        Resampled prices for each ticker and date
    """
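    # TODO: Implement Function
    # Group rows into `freq` buckets and keep the last observation in each bucket
    prices_resampled = close_prices.resample(freq).last()
    print(prices_resampled)
    return prices_resampled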

project_tests.test_resample_prices(resample_prices)

                    WNH         JNG          UWZE         EML        EVPR
2008-08-31  21.05081048 17.01384381   10.98450376 11.24809343 12.96171273
2008-09-30 482.34539247 35.20258059 3516.54167823 66.40531433 13.50396048
2008-10-31  10.91893302 17.90864387   24.80126542 12.48895419 10.52435923
2008-11-30  11.54549538 23.98146843   24.97476306 36.03196210 14.30433232
Tests Passed
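
As a small illustration of what "last observation of each month" means, the sketch below resamples four made-up daily prices. The ticker and values are hypothetical. It passes the offset object `pd.offsets.MonthEnd()` rather than the `'M'` string, because recent pandas releases deprecate the `'M'` alias in favor of `'ME'`; that substitution is a choice made for this example, not part of the project code.

```python
import pandas as pd

# Hypothetical daily prices for one made-up ticker spanning two months
toy_dates = pd.to_datetime(['2013-07-30', '2013-07-31', '2013-08-01', '2013-08-30'])
toy_prices = pd.DataFrame({'AAA': [10.0, 11.0, 12.0, 13.0]}, index=toy_dates)

# Bucket rows by calendar month and keep the last observation in each bucket.
# The bucket label is the calendar month end, even if no trade happened that day.
toy_monthly = toy_prices.resample(pd.offsets.MonthEnd()).last()
print(toy_monthly)
```

Note that the August bucket is labeled 2013-08-31 although the last price in it is from 2013-08-30, which is why the monthly output in this notebook shows dates such as 2013-08-31 and 2013-11-30.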


### View Data
Let's apply this function to `close` and view the results.

In [8]:
monthly_close = resample_prices(close)
project_helper.plot_resampled_prices(
    monthly_close.loc[:, apple_ticker],
    close.loc[:, apple_ticker],
    '{} Stock - Close Vs Monthly Close'.format(apple_ticker))

ticker               A         AAL          AAP         AAPL        ABBV  \
date                                                                       
2013-07-31 30.77861719 18.63139292  81.73270857  58.73000866 38.52144972   
2013-08-31 32.09288410 15.55986096  79.33492514  63.64994327 36.09056668   
2013-09-30 35.34697923 18.25587647  81.98212977  62.28266407 37.88620154   
2013-10-31 35.00902763 21.15409315  98.34285959  68.28583759 41.39637606   
2013-11-30 36.94707663 22.60801580 100.15741326  73.07037475 41.39637606   
2013-12-31 39.53485221 24.31228275 109.80605374  73.72082947 45.12162270   
2014-01-31 40.19849023 32.30404302 113.90344263  65.78133976 42.40046983   
2014-02-28 39.35511692 35.55851889 126.35434591  69.56208595 43.84740848   
2014-03-31 38.65691442 35.24077420 125.56160936  70.95004943 44.26943225   
2014-04-30 37.44602782 33.76759430 120.39025770  78.00222579 45.26058952   
2014-05-31 39.45552968 38.66856535 123.24889355  84.14255587 47.21597213   
2014-06-30 3

## Compute Log Returns

Compute log returns ($R_t$) from prices ($P_t$) as your primary momentum indicator:

$$R_t = \log_e(P_t) - \log_e(P_{t-1})$$

Implement the `compute_log_returns` function below, such that it accepts a dataframe (like one returned by `resample_prices`), and produces a similar dataframe of log returns. Use Numpy's [log function](https://docs.scipy.org/doc/numpy/reference/generated/numpy.log.html) to help you calculate the log returns.

test1: Using variables and functions prefixed with `test1_`, I've repeated the entire calculation with simple returns instead of log returns to analyse the difference.

For `test1_`, I chose to work with simple returns instead of log returns. Since we are not compounding anything, I thought simple returns were the better choice, especially since the analysis already relies on the approximation $\ln(1+r) \approx r$ as $r$ tends to 0. This gave a significantly different answer, as the results further down show.
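
To see how good the approximation $\ln(1+r) \approx r$ is, the following sketch compares simple and log returns for a few made-up return sizes. The numbers are hypothetical and only illustrate the size of the gap; they do not come from the project data.

```python
import numpy as np

# Hypothetical simple returns, from small to large
simple_returns = np.array([0.001, 0.01, 0.05, 0.10, 0.30])

# Equivalent log returns: ln(1 + r)
log_returns = np.log1p(simple_returns)

# The gap between the two grows roughly like r**2 / 2
gap = simple_returns - log_returns
for r, lr, g in zip(simple_returns, log_returns, gap):
    print('simple={:.3f}  log={:.5f}  gap={:.5f}'.format(r, lr, g))
```

For returns of a percent or less the two are nearly identical, while monthly moves of 10% or more differ noticeably, so some divergence between the log-return and simple-return pipelines is expected.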

In [9]:
def compute_log_returns(prices):
    """
    Compute log returns for each ticker.
    
    Parameters
    ----------
    prices : DataFrame
        Prices for each ticker and date
    
    Returns
    -------
    log_returns : DataFrame
        Log returns for each ticker and date
    """
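    # TODO: Implement Function
    # Log return: difference between the log price and the previous period's log price
    log_returns = np.log(prices) - np.log(prices.shift(1))
    print(log_returns)
    return log_returns

def test1_compute_normal_returns(prices):
    # Simple (arithmetic) return: relative change from the previous period
    normal = (prices - prices.shift(1))/prices.shift(1)
    return normal

project_tests.test_compute_log_returns(compute_log_returns)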

                  VOQS         CNT        VBEO         XAP        PTOC
2008-08-31         nan         nan         nan         nan         nan
2008-09-30  3.13172138  0.72709204  5.76874778  1.77557845  0.04098317
2008-10-31 -3.78816218 -0.67583590 -4.95433863 -1.67093250 -0.24929051
2008-11-30  0.05579709  0.29199789  0.00697116  1.05956179  0.30686995
Tests Passed


### View Data
Using the same data returned from `resample_prices`, we'll generate the log returns.

In [10]:
monthly_close_returns = compute_log_returns(monthly_close)
test1_monthly_close_returns = test1_compute_normal_returns(monthly_close)

project_helper.plot_returns(
    monthly_close_returns.loc[:, apple_ticker],
    'Log Returns of {} Stock (Monthly)'.format(apple_ticker))
project_helper.plot_shifted_returns(
    monthly_close_returns.loc[:, apple_ticker],
    test1_monthly_close_returns.loc[:, apple_ticker],
    'Log vs. Simple Returns of {} Stock'.format(apple_ticker))

ticker               A         AAL         AAP        AAPL        ABBV  \
date                                                                     
2013-07-31         nan         nan         nan         nan         nan   
2013-08-31  0.04181412 -0.18015337 -0.02977582  0.08044762 -0.06518370   
2013-09-30  0.09657861  0.15979244  0.03282284 -0.02171531  0.04855545   
2013-10-31 -0.00960698  0.14734639  0.18195865  0.09201927  0.08860637   
2013-11-30  0.05388057  0.06647111  0.01828314  0.06772063  0.00000000   
2013-12-31  0.06769609  0.07267716  0.09197258  0.00886237  0.08616823   
2014-01-31  0.01664682  0.28421071  0.03663543 -0.11394918 -0.06220213   
2014-02-28 -0.02120344  0.09598737  0.10373913  0.05588347  0.03355617   
2014-03-31 -0.01790035 -0.00897599 -0.00629368  0.01975642  0.00957880   
2014-04-30 -0.03182502 -0.04270218 -0.04205794  0.09476126  0.02214224   
2014-05-31  0.05227357  0.13552541  0.02346722  0.07577509  0.04229556   
2014-06-30  0.01103586  0.06739797  0.

## Shift Returns
Implement the `shift_returns` function to shift the log returns backward or forward in the time series, producing previous or future returns. For example, if the parameter `shift_n` is 2 and `returns` is the following:

```
                           Returns
               A         B         C         D
2013-07-08     0.015     0.082     0.096     0.020     ...
2013-07-09     0.037     0.095     0.027     0.063     ...
2013-07-10     0.094     0.001     0.093     0.019     ...
2013-07-11     0.092     0.057     0.069     0.087     ...
...            ...       ...       ...       ...
```

the output of the `shift_returns` function would be:
```
                        Shift Returns
               A         B         C         D
2013-07-08     NaN       NaN       NaN       NaN       ...
2013-07-09     NaN       NaN       NaN       NaN       ...
2013-07-10     0.015     0.082     0.096     0.020     ...
2013-07-11     0.037     0.095     0.027     0.063     ...
...            ...       ...       ...       ...
```
Using the same `returns` data as above, the `shift_returns` function should generate the following with `shift_n` as -2:
```
                        Shift Returns
               A         B         C         D
2013-07-08     0.094     0.001     0.093     0.019     ...
2013-07-09     0.092     0.057     0.069     0.087     ...
...            ...       ...       ...       ...       ...
...            ...       ...       ...       ...       ...
...            NaN       NaN       NaN       NaN       ...
...            NaN       NaN       NaN       NaN       ...
```
_Note: The "..." represents data points we're not showing._
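
The two tables above can be reproduced with pandas' `DataFrame.shift`. The sketch below uses only column `A` from the example; the variable names are chosen for illustration.

```python
import pandas as pd

toy_returns = pd.DataFrame(
    {'A': [0.015, 0.037, 0.094, 0.092]},
    index=pd.to_datetime(['2013-07-08', '2013-07-09', '2013-07-10', '2013-07-11']))

# Positive shift: each date sees the value from 2 rows earlier
shifted_back = toy_returns.shift(2)
# Negative shift: each date sees the value from 2 rows later
shifted_ahead = toy_returns.shift(-2)

print(shifted_back)
print(shifted_ahead)
```

The index stays fixed while the values move, so rows that have no source value are filled with `NaN`.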

In [11]:
def shift_returns(returns, shift_n):
    """
    Generate shifted returns
    
    Parameters
    ----------
    returns : DataFrame
        Returns for each ticker and date
    shift_n : int
        Number of periods to move, can be positive or negative
    
    Returns
    -------
    shifted_returns : DataFrame
        Shifted returns for each ticker and date
    """
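    # TODO: Implement Function
    print(returns)
    # Positive shift_n moves values to later dates; negative moves them to earlier dates
    shifted_returns = returns.shift(shift_n)
    print(shifted_returns)
    return shifted_returns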

project_tests.test_shift_returns(shift_returns)

                   RTV         OKJ         DOH         JCW         HJZ
2008-08-31         nan         nan         nan         nan         nan
2008-09-30  3.13172138  0.72709204  5.76874778  1.77557845  0.04098317
2008-10-31 -3.78816218 -0.67583590 -4.95433863 -1.67093250 -0.24929051
2008-11-30  0.05579709  0.29199789  0.00697116  1.05956179  0.30686995
                   RTV         OKJ         DOH         JCW         HJZ
2008-08-31         nan         nan         nan         nan         nan
2008-09-30         nan         nan         nan         nan         nan
2008-10-31  3.13172138  0.72709204  5.76874778  1.77557845  0.04098317
2008-11-30 -3.78816218 -0.67583590 -4.95433863 -1.67093250 -0.24929051
Tests Passed


### View Data
Let's get the previous month's and next month's returns.

In [12]:

prev_returns = shift_returns(monthly_close_returns, 1)

lookahead_returns = shift_returns(monthly_close_returns, -1)

test1_prev_returns = shift_returns(test1_monthly_close_returns, 1)
test1_lookahead_returns = shift_returns(test1_monthly_close_returns, -1)

project_helper.plot_shifted_returns(
    prev_returns.loc[:, apple_ticker],
    monthly_close_returns.loc[:, apple_ticker],
    'Previous Returns of {} Stock'.format(apple_ticker))
project_helper.plot_shifted_returns(
    lookahead_returns.loc[:, apple_ticker],
    monthly_close_returns.loc[:, apple_ticker],
    'Lookahead Returns of {} Stock'.format(apple_ticker))

ticker               A         AAL         AAP        AAPL        ABBV  \
date                                                                     
2013-07-31         nan         nan         nan         nan         nan   
2013-08-31  0.04181412 -0.18015337 -0.02977582  0.08044762 -0.06518370   
2013-09-30  0.09657861  0.15979244  0.03282284 -0.02171531  0.04855545   
2013-10-31 -0.00960698  0.14734639  0.18195865  0.09201927  0.08860637   
2013-11-30  0.05388057  0.06647111  0.01828314  0.06772063  0.00000000   
2013-12-31  0.06769609  0.07267716  0.09197258  0.00886237  0.08616823   
2014-01-31  0.01664682  0.28421071  0.03663543 -0.11394918 -0.06220213   
2014-02-28 -0.02120344  0.09598737  0.10373913  0.05588347  0.03355617   
2014-03-31 -0.01790035 -0.00897599 -0.00629368  0.01975642  0.00957880   
2014-04-30 -0.03182502 -0.04270218 -0.04205794  0.09476126  0.02214224   
2014-05-31  0.05227357  0.13552541  0.02346722  0.07577509  0.04229556   
2014-06-30  0.01103586  0.06739797  0.

ticker               A         AAL         AAP        AAPL        ABBV  \
date                                                                     
2013-07-31         nan         nan         nan         nan         nan   
2013-08-31  0.04270065 -0.16485788 -0.02933689  0.08377207 -0.06310466   
2013-09-30  0.10139616  0.17326733  0.03336746 -0.02148123  0.04975358   
2013-10-31 -0.00956098  0.15875527  0.19956459  0.09638595  0.09265047   
2013-11-30  0.05535855  0.06873009  0.01845130  0.07006632  0.00000000   
2013-12-31  0.07004006  0.07538330  0.09633476  0.00890176  0.08998968   
2014-01-31  0.01678615  0.32871287  0.03731478 -0.10769670 -0.06030707   
2014-02-28 -0.02098022  0.10074516  0.10931104  0.05747445  0.03412553   
2014-03-31 -0.01774109 -0.00893582 -0.00627392  0.01995287  0.00962483   
2014-04-30 -0.03132393 -0.04180328 -0.04118577  0.09939636  0.02238920   
2014-05-31  0.05366395  0.14513829  0.02374474  0.07871993  0.04320276   
2014-06-30  0.01109698  0.06972112  0.

ticker               A         AAL         AAP        AAPL        ABBV  \
date                                                                     
2013-07-31  0.04270065 -0.16485788 -0.02933689  0.08377207 -0.06310466   
2013-08-31  0.10139616  0.17326733  0.03336746 -0.02148123  0.04975358   
2013-09-30 -0.00956098  0.15875527  0.19956459  0.09638595  0.09265047   
2013-10-31  0.05535855  0.06873009  0.01845130  0.07006632  0.00000000   
2013-11-30  0.07004006  0.07538330  0.09633476  0.00890176  0.08998968   
2013-12-31  0.01678615  0.32871287  0.03731478 -0.10769670 -0.06030707   
2014-01-31 -0.02098022  0.10074516  0.10931104  0.05747445  0.03412553   
2014-02-28 -0.01774109 -0.00893582 -0.00627392  0.01995287  0.00962483   
2014-03-31 -0.03132393 -0.04180328 -0.04118577  0.09939636  0.02238920   
2014-04-30  0.05366395  0.14513829  0.02374474  0.07871993  0.04320276   
2014-05-31  0.01109698  0.06972112  0.08709637  0.02766193  0.03883674   
2014-06-30 -0.02350279 -0.09334264 -0.

## Generate Trading Signal

A trading signal is a sequence of trading actions, or results that can be used to take trading actions. A common form is to produce a "long" and "short" portfolio of stocks on each date (e.g. end of each month, or whatever frequency you desire to trade at). This signal can be interpreted as rebalancing your portfolio on each of those dates, entering long ("buy") and short ("sell") positions as indicated.

Here's a strategy that we will try:
> For each month-end observation period, rank the stocks by _previous_ returns, from the highest to the lowest. Select the top performing stocks for the long portfolio, and the bottom performing stocks for the short portfolio.

Implement the `get_top_n` function to get the top performing stocks for each month. Mark the top performing stocks from `prev_returns` by assigning them a value of 1. For all other stocks, give them a value of 0. For example, using the following `prev_returns`:

```
                                     Previous Returns
               A         B         C         D         E         F         G
2013-07-08     0.015     0.082     0.096     0.020     0.075     0.043     0.074
2013-07-09     0.037     0.095     0.027     0.063     0.024     0.086     0.025
...            ...       ...       ...       ...       ...       ...       ...
```

The function `get_top_n` with `top_n` set to 3 should return the following:
```
                                     Previous Returns
               A         B         C         D         E         F         G
2013-07-08     0         1         1         0         1         0         0
2013-07-09     0         1         0         1         0         1         0
...            ...       ...       ...       ...       ...       ...       ...
```
*Note: You may have to use pandas' [`DataFrame.iterrows`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.iterrows.html) with [`Series.nlargest`](https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.Series.nlargest.html) in order to implement the function. This is one of those cases where creating a vectorized solution is too difficult.*
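
For comparison only, the sketch below reproduces the example table with a row-wise rank instead of a loop. `get_top_n_ranked` is a hypothetical helper written for this illustration; it is not the project's reference solution, and its tie-breaking (earlier column wins) is an assumption that may differ from what the grader expects.

```python
import pandas as pd

def get_top_n_ranked(prev_returns, top_n):
    """Hypothetical rank-based variant of get_top_n, for illustration only."""
    # Rank each row from highest (rank 1) to lowest; ties go to the earlier column
    ranks = prev_returns.rank(axis=1, ascending=False, method='first')
    # NaN returns get NaN ranks, which compare as False and therefore become 0
    return (ranks <= top_n).astype(int)

toy_prev_returns = pd.DataFrame(
    [[0.015, 0.082, 0.096, 0.020, 0.075, 0.043, 0.074],
     [0.037, 0.095, 0.027, 0.063, 0.024, 0.086, 0.025]],
    index=pd.to_datetime(['2013-07-08', '2013-07-09']),
    columns=list('ABCDEFG'))

print(get_top_n_ranked(toy_prev_returns, 3))
```

On the example data this marks B, C, E on the first date and B, D, F on the second, matching the table above.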

In [13]:
def get_top_n(prev_returns, top_n):
    """
    Select the top performing stocks
    
    Parameters
    ----------
    prev_returns : DataFrame
        Previous shifted returns for each ticker and date
    top_n : int
        The number of top performing stocks to get
    
    Returns
    -------
    top_stocks : DataFrame
        Top stocks for each ticker and date marked with a 1
    """
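    # TODO: Implement Function
    # Start from an all-zero frame, then mark the top_n largest returns in each row with a 1
    top_stocks = pd.DataFrame(0, index=prev_returns.index, columns=prev_returns.columns)
    for date, row in prev_returns.iterrows():
        # A single .loc call avoids chained assignment, which can write to a temporary copy
        top_stocks.loc[date, row.nlargest(top_n).index] = 1

    print(top_stocks)

    return top_stocks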

project_tests.test_get_top_n(get_top_n)

            DNN  RSOP  WMC  OSFZ  FCBG
2008-08-31    0     0    0     0     0
2008-09-30    0     0    0     0     0
2008-10-31    1     0    1     1     0
2008-11-30    0     1    0     1     1
Tests Passed


### View Data
We want to get the best performing and worst performing stocks. To get the best performing stocks, we'll use the `get_top_n` function. To get the worst performing stocks, we'll also use the `get_top_n` function. However, we pass in `-1*prev_returns` instead of just `prev_returns`. Multiplying by negative one will flip all the positive returns to negative and negative returns to positive. Thus, it will return the worst performing stocks.
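
The negation trick can be checked on a single made-up row of returns. The tickers and values below are hypothetical.

```python
import pandas as pd

toy_row = pd.Series({'A': 0.03, 'B': -0.02, 'C': 0.10, 'D': -0.07})

# The largest values of the negated series ...
worst_via_negation = (-1 * toy_row).nlargest(2).index
# ... belong to the same tickers as the smallest values of the original series
worst_direct = toy_row.nsmallest(2).index

print(list(worst_via_negation), list(worst_direct))
```

Both selections pick D and B, the two worst performers, which is why reusing `get_top_n` on `-1*prev_returns` yields the short candidates.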

In [14]:
top_bottom_n = 50
df_long = get_top_n(prev_returns, top_bottom_n)
df_short = get_top_n(-1*prev_returns, top_bottom_n)

test1_df_long = get_top_n(test1_prev_returns, top_bottom_n)
test1_df_short = get_top_n(-1*test1_prev_returns, top_bottom_n)

project_helper.print_top(df_long, 'Longed Stocks')
project_helper.print_top(df_short, 'Shorted Stocks')

ticker      A  AAL  AAP  AAPL  ABBV  ABC  ABT  ACN  ADBE  ADI ...   XL  XLNX  \
date                                                          ...              
2013-07-31  0    0    0     0     0    0    0    0     0    0 ...    0     0   
2013-08-31  0    0    0     0     0    0    0    0     0    0 ...    0     0   
2013-09-30  1    0    0     1     0    0    0    0     0    0 ...    0     0   
2013-10-31  0    1    0     0     0    0    0    0     1    0 ...    0     0   
2013-11-30  0    1    1     0     0    0    0    0     0    0 ...    0     0   
2013-12-31  0    0    0     0     0    0    0    0     0    0 ...    0     0   
2014-01-31  0    0    1     0     1    0    0    0     0    0 ...    0     0   
2014-02-28  0    1    0     0     0    0    0    0     0    0 ...    0     0   
2014-03-31  0    0    0     0     0    0    0    0     1    0 ...    0     1   
2014-04-30  0    0    0     0     0    0    0    0     0    0 ...    0     0   
2014-05-31  0    0    0     1     0    0

## Projected Returns
It's now time to check if your trading signal has the potential to become profitable!

We'll start by computing the net returns this portfolio would earn. For simplicity, we'll assume every stock gets an equal dollar amount of investment. This makes it easier to compute a portfolio's returns as the simple arithmetic average of the individual stock returns.

Implement the `portfolio_returns` function to compute the expected portfolio returns. Using `df_long` to indicate which stocks to long and `df_short` to indicate which stocks to short, calculate the returns using `lookahead_returns`. To help with calculation, we've provided you with `n_stocks` as the number of stocks we're investing in during a single period.
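
A worked example with one long and one short position may help. The frames below are made up for illustration: stock A is held long, stock D is held short, and each position gets a weight of `1/n_stocks`.

```python
import pandas as pd

date = pd.to_datetime(['2013-07-31'])
tickers = ['A', 'B', 'C', 'D']
toy_long = pd.DataFrame([[1, 0, 0, 0]], index=date, columns=tickers)
toy_short = pd.DataFrame([[0, 0, 0, 1]], index=date, columns=tickers)
toy_lookahead = pd.DataFrame([[0.04, 0.01, -0.02, -0.06]], index=date, columns=tickers)

n_stocks = 2  # one long position plus one short position

# +1 exposure for longs, -1 for shorts, 0 otherwise, each scaled by 1/n_stocks
toy_contrib = (toy_long - toy_short) * toy_lookahead / n_stocks
print(toy_contrib)

# Summing across tickers gives the portfolio return for the date
print(toy_contrib.sum(axis=1))
```

The long earns half of +0.04 and the short earns half of the 0.06 drop, so the portfolio return for the date is 0.02 + 0.03 = 0.05.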

From this point on, every variable has a counterpart with the prefix `test2_`. I wasn't convinced that arithmetic operations such as dividing, summing, or averaging can be applied to log returns in the same way as to simple returns, and I didn't want to rely on the assumption that $\ln(1+r) \approx r$. In the `test2_` versions I use the equivalent mathematical operations that should ideally be used when dealing with logs.

For calculations pertaining to log returns, I converted them to simple returns, applied the operations, and converted the result back to a log return. I don't think the sum or mean of log returns makes sense here, because we aren't aggregating along a time series but across individual stock returns within each month, i.e. sum(returns) != sum(log_returns) and mean(returns) != mean(log_returns). The original problem treats these aggregates as equal, relying on an approximation that holds in the limit of very small values. However, I wasn't sure such an approximation is valid.
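
The difference between the two ways of aggregating across stocks can be seen on three made-up returns. This is only a numerical illustration of the inequality stated above, under the assumption of equal weights; it does not settle which convention the project intends.

```python
import numpy as np

# Hypothetical one-month simple returns for three equally weighted stocks
simple_returns = np.array([0.10, -0.05, 0.20])
log_returns = np.log1p(simple_returns)

# Portfolio return from averaging simple returns, expressed as a log return
log_of_mean_simple = np.log1p(simple_returns.mean())
# Shortcut that averages the log returns directly
mean_of_log = log_returns.mean()

print(log_of_mean_simple, mean_of_log)
```

Because the logarithm is concave, the mean of the log returns is never larger than the log of the mean simple return, and the gap widens as the returns become more dispersed.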

In [18]:
def portfolio_returns(df_long, df_short, lookahead_returns, n_stocks):
    """
    Compute expected returns for the portfolio, assuming equal investment in each long/short stock.
    
    Parameters
    ----------
    df_long : DataFrame
        Top stocks for each ticker and date marked with a 1
    df_short : DataFrame
        Bottom stocks for each ticker and date marked with a 1
    lookahead_returns : DataFrame
        Lookahead returns for each ticker and date
    n_stocks: int
        The number of stocks chosen for each month
    
    Returns
    -------
    portfolio_returns : DataFrame
        Expected portfolio returns for each ticker and date
    """
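    # TODO: Implement Function
    # Longs contribute +return, shorts contribute -return, each weighted by 1/n_stocks
    weighted_returns = (df_long*lookahead_returns-df_short*lookahead_returns)/n_stocks
    return weighted_returns

def test2_portfolio_returns(df_long, df_short, lookahead_returns, n_stocks):
    # Convert each signed log return to a simple return, weight it by 1/n_stocks,
    # then convert the weighted simple return back to a log return
    test2_donow = np.log((np.exp(df_long*lookahead_returns-df_short*lookahead_returns) -1 + n_stocks)/n_stocks)
    return test2_donow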

project_tests.test_portfolio_returns(portfolio_returns)

Tests Passed


### View Data
Time to see how the portfolio did.

In [19]:
expected_portfolio_returns = portfolio_returns(df_long, df_short, lookahead_returns, 2*top_bottom_n)
test1_expected_portfolio_returns = portfolio_returns(test1_df_long,test1_df_short, test1_lookahead_returns, 2*top_bottom_n)
test2_expected_portfolio_returns = test2_portfolio_returns(df_long,df_short, lookahead_returns, 2*top_bottom_n)

project_helper.plot_returns(expected_portfolio_returns.T.sum(), 'Portfolio Returns')
project_helper.plot_returns(test1_expected_portfolio_returns.T.sum(), 'Test1 Returns')
project_helper.plot_returns(test2_expected_portfolio_returns.T.sum(), 'Test2 Returns')



## Statistical Tests
### Annualized Rate of Return

In [22]:
expected_portfolio_returns_by_date = expected_portfolio_returns.T.sum().dropna()
portfolio_ret_mean = expected_portfolio_returns_by_date.mean()
portfolio_ret_ste = expected_portfolio_returns_by_date.sem()
portfolio_ret_annual_rate = (np.exp(portfolio_ret_mean * 12) - 1) * 100

test1_expected_portfolio_returns_by_date = test1_expected_portfolio_returns.T.sum().dropna()
test1_portfolio_ret_mean = test1_expected_portfolio_returns_by_date.mean()
test1_portfolio_ret_ste = test1_expected_portfolio_returns_by_date.sem()
test1_portfolio_ret_annual_rate = test1_portfolio_ret_mean *12*100

test2_expected_portfolio_returns_by_date = np.log((np.exp(test2_expected_portfolio_returns)-1).T.sum().dropna()+1)
test2_portfolio_ret_mean = np.log((np.exp(test2_expected_portfolio_returns_by_date)).mean())
test2_portfolio_ret_ste = test2_expected_portfolio_returns_by_date.sem()
test2_portfolio_ret_annual_rate = (np.exp(test2_portfolio_ret_mean) - 1) * 12*100

print("""
Mean:                       {:.6f}
Standard Error:             {:.6f}
Annualized Rate of Return:  {:.2f}%
""".format(portfolio_ret_mean, portfolio_ret_ste, portfolio_ret_annual_rate))

print("""
Mean:                       {:.6f}
Standard Error:             {:.6f}
Annualized Rate of Return:  {:.2f}%
""".format(test1_portfolio_ret_mean, test1_portfolio_ret_ste, test1_portfolio_ret_annual_rate))

print("""
Mean:                       {:.6f}
Standard Error:             {:.6f}
Annualized Rate of Return:  {:.2f}%
""".format(test2_portfolio_ret_mean, test2_portfolio_ret_ste, test2_portfolio_ret_annual_rate))


Mean:                       0.003253
Standard Error:             0.002203
Annualized Rate of Return:  3.98%


Mean:                       0.002946
Standard Error:             0.002189
Annualized Rate of Return:  3.54%


Mean:                       0.006536
Standard Error:             0.002172
Annualized Rate of Return:  7.87%



The annualized rate of return allows you to compare the rate of return from this strategy to other quoted rates of return, which are usually quoted on an annual basis. 
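
As a check on the first figure, the annualization for the log-return version can be redone by hand from the mean monthly log return printed above (0.003253):

```python
import numpy as np

monthly_log_mean = 0.003253  # mean monthly log return printed above

# Log returns add over time, so twelve months of the mean is 12 * mean;
# exponentiating converts the annual log return back to a simple rate
annual_rate_pct = (np.exp(monthly_log_mean * 12) - 1) * 100
print('{:.2f}%'.format(annual_rate_pct))
```

This reproduces the reported 3.98%. The `test1_` version instead multiplies the mean simple return by 12 without compounding, so its annual figure is not computed on the same basis.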

### T-Test
Our null hypothesis ($H_0$) is that the actual mean return from the signal is zero. We'll perform a one-sample, one-sided t-test on the observed mean return, to see if we can reject $H_0$.

We'll need to first compute the t-statistic, and then find its corresponding p-value. The p-value will indicate the probability of observing a mean return equally or more extreme than the one we observed if the null hypothesis were true. A small p-value means that the chance of observing the mean we observed under the null hypothesis is small, and thus casts doubt on the null hypothesis. It's good practice to set a desired level of significance or alpha ($\alpha$) _before_ computing the p-value, and then reject the null hypothesis if $p < \alpha$.

For this project, we'll use $\alpha = 0.05$, since it's a common value to use.

Implement the `analyze_alpha` function to perform a t-test on the sample of portfolio returns. We've imported the `scipy.stats` module for you to perform the t-test.

Note: [`scipy.stats.ttest_1samp`](https://docs.scipy.org/doc/scipy-1.0.0/reference/generated/scipy.stats.ttest_1samp.html) performs a two-sided test, so divide the p-value by 2 to get the one-sided p-value.
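
The sketch below shows where the t-statistic and the halved p-value come from, using a made-up sample of monthly returns. The sample values are hypothetical. Halving the two-sided p-value gives the one-sided p-value for "mean greater than zero" only when the t-statistic is positive, which is the case here.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of monthly portfolio returns (made up for illustration)
toy_sample = np.array([0.012, -0.004, 0.007, 0.001, 0.009, -0.002, 0.005, 0.003])

# Two-sided test against a population mean of zero
t_value, p_two_sided = stats.ttest_1samp(toy_sample, 0)

# Same statistic by hand: sample mean divided by its standard error
t_manual = toy_sample.mean() / stats.sem(toy_sample)

# Halving gives the one-sided p-value for H1: mean > 0 when t_value is positive
p_one_sided = p_two_sided / 2
# Direct upper-tail probability of the t distribution with n - 1 degrees of freedom
p_direct = stats.t.sf(t_value, df=len(toy_sample) - 1)

print(t_value, p_one_sided)
```

The same relationship holds for the notebook's own numbers: the reported mean divided by the reported standard error, 0.003253 / 0.002203, is about 1.48, in line with the t-value printed further down.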

In [24]:
from scipy import stats

def analyze_alpha(expected_portfolio_returns_by_date):
    """
    Perform a t-test with the null hypothesis being that the expected mean return is zero.
    
    Parameters
    ----------
    expected_portfolio_returns_by_date : Pandas Series
        Expected portfolio returns for each date
    
    Returns
    -------
    t_value
        T-statistic from t-test
    p_value
        Corresponding p-value
    """
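    # TODO: Implement Function
    # ttest_1samp returns a two-sided p-value; halve it for the one-sided test
    t_value, p_value = stats.ttest_1samp(expected_portfolio_returns_by_date, 0)
    return t_value, p_value/2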

project_tests.test_analyze_alpha(analyze_alpha)

Tests Passed


### View Data
Let's see what values we get with our portfolio. After you run this, make sure to answer the question below.

In [25]:
t_value, p_value = analyze_alpha(expected_portfolio_returns_by_date.dropna())
print("""
Alpha analysis:
 t-value:        {:.3f}
 p-value:        {:.6f}
""".format(t_value, p_value))

t_value, p_value = analyze_alpha(test1_expected_portfolio_returns_by_date.dropna())
print("""
Alpha analysis:
 t-value:        {:.3f}
 p-value:        {:.6f}
""".format(t_value, p_value))

t_value, p_value = analyze_alpha(test2_expected_portfolio_returns_by_date.dropna())
print("""
Alpha analysis:
 t-value:        {:.3f}
 p-value:        {:.6f}
""".format(t_value, p_value))


Alpha analysis:
 t-value:        1.476
 p-value:        0.073359


Alpha analysis:
 t-value:        1.346
 p-value:        0.092432


Alpha analysis:
 t-value:        2.959
 p-value:        0.002434



### Question: What p-value did you observe? And what does that indicate about your signal?

Since the p-value (0.073) is greater than 0.05, we fail to reject the null hypothesis. We therefore cannot conclude that our strategy contains alpha.

However, when I worked with my `test1_` and `test2_` functions and variables, I got an astonishingly different rate of return. Ideally, they should have produced the same return, but this is clearly not the case. For `test2_`, the p-value (0.002) is less than 0.05, so the null hypothesis is rejected. This implies that my `test2_` version contains alpha. I would really appreciate it if the mentor could check whether I've done anything conceptually wrong in working with `test1_` and `test2_`.

For `test1_`, I chose to work with simple returns instead of log returns. Since we are not compounding anything, I thought simple returns were the better choice, especially since the analysis already relies on the approximation $\ln(1+r) \approx r$ as $r$ tends to 0. This gave a significantly different answer.

All I did for `test2_` was that, for calculations pertaining to log returns, I converted them to simple returns, applied the operations, and converted the result back to a log return. I don't think the sum or mean of log returns makes sense here, because we aren't aggregating along a time series but across individual stock returns within each month, i.e. sum(returns) != sum(log_returns) and mean(returns) != mean(log_returns). The original problem treats these aggregates as equal, relying on an approximation that holds in the limit of very small values. However, I don't think such an approximation is valid.

## Submission
Now that you're done with the project, it's time to submit it. Click the submit button in the bottom right. One of our reviewers will give you feedback on your project with a pass or not passed grade. You can continue to the next section while you wait for feedback.