# Common DataFrame methods

This lesson introduces the `DataFrame` methods that
we will use repeatedly throughout the course.

The first cell loads the data used in this lesson.

In [1]:
# Setup: Load prices
import pandas as pd
prices = pd.read_hdf("data/dataframes.h5", "prices")
sep_04 = pd.read_hdf("data/dataframes.h5", "sep_04")
goog = pd.read_hdf("data/dataframes.h5", "goog")
returns = prices.pct_change().dropna()
spy_returns = returns.SPY
aapl_returns = returns.AAPL
goog_returns = returns.GOOG

## Problem: Constructing portfolio returns

Compute the return of a portfolio with weight $\frac{1}{3}$ in each security using
multiplication (`*`) and `.sum()`.

**Note**: You need to use the `axis` keyword for the sum.

In [8]:
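import numpy as np
import pandas as pd

weight = np.ones(3) / 3  # equal weight of 1/3 in each of the three securities
weighted_returns = weight * returns  # broadcasting scales each column by its weight
weighted_returns.sum(axis=1)  # sum across securities for each date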

2018-09-05   -0.006002
2018-09-06   -0.010768
2018-09-07   -0.005218
2018-09-10   -0.003948
2018-09-11    0.013167
2018-09-12   -0.008176
2018-09-13    0.013609
2018-09-14   -0.004520
2018-09-17   -0.015325
2018-09-18    0.003850
2018-09-19   -0.002537
dtype: float64
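The same equal-weighted portfolio return can also be written as a matrix-vector product. The sketch below is only an illustration: it uses a small hypothetical `returns` DataFrame in place of the lesson's data file, and it assumes the weights are listed in the same order as the columns, because multiplying by a NumPy array matches weights to columns by position, not by name.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns standing in for the lesson's `returns` DataFrame
returns = pd.DataFrame(
    {"SPY": [0.01, -0.02, 0.03],
     "AAPL": [0.02, 0.01, -0.01],
     "GOOG": [-0.01, 0.00, 0.02]},
    index=pd.date_range("2018-09-05", periods=3, freq="B"),
)

weight = np.ones(3) / 3  # equal weights, one per column (by position)

# Broadcasting multiplies each column by its weight; axis=1 sums across columns
port_sum = (weight * returns).sum(axis=1)

# The same weighted sum written as a matrix-vector product
port_dot = returns @ weight

print(port_sum)
```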

## Problem: Compute the Mean and Standard Deviation

Using the `mean` method, compute the mean of each of the three return series one at a time. For example,
```python
goog_mean = goog_returns.mean()
```
Next, compute the mean of the matrix of returns using  

```python
retmean = returns.mean()
```

What is the relationship between these two? Repeat this exercise for the standard deviation (`std()`).


In [13]:
goog_mean = goog_returns.mean()
goog_mean

-0.0029096851629990942

In [12]:
retmean = returns.mean()
retmean

SPY     0.000516
AAPL   -0.004661
GOOG   -0.002910
dtype: float64

In [16]:
ret_std = returns.std()
ret_std

SPY     0.003562
AAPL    0.016113
GOOG    0.008899
dtype: float64
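Each entry of `returns.mean()` equals the mean of the corresponding column computed on its own, and the same holds for `std()`. A minimal check on hypothetical data follows. Note that pandas' `std()` computes the sample standard deviation (`ddof=1`), while `np.std` defaults to `ddof=0`, so the NumPy call needs `ddof=1` to agree.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns (not the lesson's data)
returns = pd.DataFrame(
    {"SPY": [0.01, -0.02, 0.03],
     "AAPL": [0.02, 0.01, -0.01],
     "GOOG": [-0.01, 0.00, 0.02]},
)

# One column at a time
spy_mean = returns["SPY"].mean()
aapl_mean = returns["AAPL"].mean()
goog_mean = returns["GOOG"].mean()

# All columns at once: a Series indexed by column name
retmean = returns.mean()
ret_std = returns.std()  # sample standard deviation (ddof=1)

print(retmean)
print(ret_std)

# NumPy defaults to ddof=0, so pass ddof=1 to match pandas
goog_std_np = np.std(returns["GOOG"].to_numpy(), ddof=1)
```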

## Problem: Compute Correlation
Compute the correlation matrix of the returns using `corr()`.

In [19]:
returns.corr()

|      | SPY      | AAPL     | GOOG     |
|------|----------|----------|----------|
| SPY  | 1.0      | 0.773894 | 0.883557 |
| AAPL | 0.773894 | 1.0      | 0.892719 |
| GOOG | 0.883557 | 0.892719 | 1.0      |
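The correlation matrix is the covariance matrix scaled by the standard deviations, $\rho_{ij} = \sigma_{ij} / (\sigma_i \sigma_j)$. The sketch below, again on hypothetical data, rebuilds `corr()` from `cov()` and `std()` to make that relationship concrete.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns (not the lesson's data)
returns = pd.DataFrame(
    {"SPY": [0.01, -0.02, 0.03],
     "AAPL": [0.02, 0.01, -0.01],
     "GOOG": [-0.01, 0.00, 0.02]},
)

cov = returns.cov()  # covariance matrix
std = returns.std()  # standard deviations (same ddof as cov)

# Divide each covariance by the product of the two standard deviations
corr_manual = cov / np.outer(std, std)

print(corr_manual)
print(returns.corr())
```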


## Problem: Summing all elements

Compute the sum of each column of returns using `.sum()`. How is this related to the mean computed
earlier?

In [20]:
returns.sum()

SPY     0.005673
AAPL   -0.051268
GOOG   -0.032007
dtype: float64

In [21]:
nobs = returns.shape[0]  # number of rows (dates)
retmean * nobs  # mean times number of observations

SPY     0.005673
AAPL   -0.051268
GOOG   -0.032007
dtype: float64
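The two results agree because each column sum equals the column mean times the number of observations. That works with `shape[0]` here because `returns` has no missing values after `dropna()`. If a column contains `NaN`, both `sum()` and `mean()` skip it, so the matching multiplier is the per-column `count()` instead. A small hypothetical example:

```python
import numpy as np
import pandas as pd

# Hypothetical returns with one missing AAPL observation
returns = pd.DataFrame(
    {"SPY": [0.01, -0.02, 0.03],
     "AAPL": [0.02, np.nan, -0.01]},
)

total = returns.sum()      # NaN values are skipped
nobs = returns.shape[0]    # 3 rows, including the one with NaN
count = returns.count()    # non-missing observations per column

print(returns.mean() * count)  # matches total in every column
print(returns.mean() * nobs)   # overstates the AAPL total
```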

## Problem: Maximum and Minimum Values
Compute the minimum and maximum values of each column of returns using the `min()` and `max()` methods.

In [22]:
returns.min()

SPY    -0.005294
AAPL   -0.026626
GOOG   -0.014055
dtype: float64

In [23]:
returns.max()

SPY     0.005914
AAPL    0.025283
GOOG    0.010922
dtype: float64
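To see both extremes side by side, `agg` accepts a list of function names, and `idxmin`/`idxmax` return the index label (here, the date) where each extreme occurs. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical daily returns (not the lesson's data)
returns = pd.DataFrame(
    {"SPY": [0.01, -0.02, 0.03],
     "AAPL": [0.02, 0.01, -0.01]},
    index=pd.date_range("2018-09-05", periods=3, freq="B"),
)

# Minimum and maximum of every column in one table (rows "min" and "max")
extremes = returns.agg(["min", "max"])

# Dates on which each column reached its minimum and maximum
worst_day = returns.idxmin()
best_day = returns.idxmax()

print(extremes)
print(worst_day)
```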

## Problem: Rounding Up, Down and to the Closest Integer

Rounding up is handled by `ceil`, rounding down by `floor`, and rounding to the nearest
integer by `round`. Try each of these on 100 times the returns (that is, returns in percent). For example,
```python
rounded = (100*returns).round()
``` 

A `DataFrame` has no `ceil` or `floor` method, so use the NumPy functions `np.ceil` and `np.floor` to round up and down, respectively.

In [33]:
# Note: this rounds the returns to 5 decimals *before* multiplying by 100
100*returns.round(5)

|            | SPY    | AAPL   | GOOG   |
|------------|--------|--------|--------|
| 2018-09-05 | -0.269 | -0.652 | -0.879 |
| 2018-09-06 | -0.301 | -1.662 | -1.268 |
| 2018-09-07 | -0.194 | -0.807 | -0.564 |
| 2018-09-10 | 0.174  | -1.342 | -0.016 |
| 2018-09-11 | 0.33   | 2.528  | 1.092  |
| 2018-09-12 | 0.024  | -1.242 | -1.235 |
| 2018-09-13 | 0.591  | 2.416  | 1.076  |
| 2018-09-14 | 0.017  | -1.135 | -0.238 |
| 2018-09-17 | -0.529 | -2.663 | -1.406 |
| 2018-09-18 | 0.543  | 0.165  | 0.447  |


In [36]:
# Round up, with returns scaled to basis points (10000 times the returns)
np.ceil(10000*returns)

|            | SPY   | AAPL   | GOOG   |
|------------|-------|--------|--------|
| 2018-09-05 | -26.0 | -65.0  | -87.0  |
| 2018-09-06 | -30.0 | -166.0 | -126.0 |
| 2018-09-07 | -19.0 | -80.0  | -56.0  |
| 2018-09-10 | 18.0  | -134.0 | -1.0   |
| 2018-09-11 | 33.0  | 253.0  | 110.0  |
| 2018-09-12 | 3.0   | -124.0 | -123.0 |
| 2018-09-13 | 60.0  | 242.0  | 108.0  |
| 2018-09-14 | 2.0   | -113.0 | -23.0  |
| 2018-09-17 | -52.0 | -266.0 | -140.0 |
| 2018-09-18 | 55.0  | 17.0   | 45.0   |


In [37]:
# Round down, with returns scaled to basis points (10000 times the returns)
np.floor(10000*returns)

|            | SPY   | AAPL   | GOOG   |
|------------|-------|--------|--------|
| 2018-09-05 | -27.0 | -66.0  | -88.0  |
| 2018-09-06 | -31.0 | -167.0 | -127.0 |
| 2018-09-07 | -20.0 | -81.0  | -57.0  |
| 2018-09-10 | 17.0  | -135.0 | -2.0   |
| 2018-09-11 | 32.0  | 252.0  | 109.0  |
| 2018-09-12 | 2.0   | -125.0 | -124.0 |
| 2018-09-13 | 59.0  | 241.0  | 107.0  |
| 2018-09-14 | 1.0   | -114.0 | -24.0  |
| 2018-09-17 | -53.0 | -267.0 | -141.0 |
| 2018-09-18 | 54.0  | 16.0   | 44.0   |
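The sketch below applies all three roundings to returns in percent, using a few hypothetical values rather than the lesson's data. It also shows two details worth knowing: `round` sends exact halves to the nearest even integer, as NumPy does, and operator precedence means `100*returns.round(5)` rounds before scaling, whereas `(100*returns).round()` rounds the scaled values.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns (not the lesson's data)
returns = pd.DataFrame({"SPY": [0.00269, -0.00531, 0.01448]})

pct = 100 * returns      # returns in percent

rounded = pct.round()    # nearest integer (DataFrame method)
up = np.ceil(pct)        # round up; the NumPy ufunc returns a DataFrame here
down = np.floor(pct)     # round down

# round() sends exact halves to the nearest even integer
halves = pd.Series([0.5, 1.5, 2.5]).round()

# Precedence: the method call binds first, so this rounds, then scales
scaled_after = 100 * returns.round(5)

print(pd.concat({"round": rounded["SPY"], "ceil": up["SPY"], "floor": down["SPY"]}, axis=1))
```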


## Exercises

### Exercise: Compute Quantiles

Compute the 5%, 25%, 50%, 75% and 95% quantiles of momentum using the `quantile`
method.


In [41]:
# Setup: Load data
import pandas as pd
momentum = pd.read_csv("data/momentum.csv", index_col="date", parse_dates=True)
mom_10 = momentum.mom_10

In [52]:
momentum.quantile([0.05, 0.25, 0.5, 0.75, 0.95])

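|      | mom_01 | mom_02 | mom_03 | mom_04 | mom_05 | mom_06 | mom_07 | mom_08 | mom_09 | mom_10 |
|------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| 0.05 | -2.487 | -1.556 | -1.326 | -1.187 | -1.128 | -1.015 | -1.116 | -1.148 | -1.198 | -1.597 |
| 0.25 | -0.615 | -0.385 | -0.305 | -0.26  | -0.21  | -0.25  | -0.28  | -0.29  | -0.33  | -0.395 |
| 0.5  | 0.08   | 0.05   | 0.07   | 0.05   | 0.06   | 0.07   | 0.07   | 0.08   | 0.07   | 0.12   |
| 0.75 | 0.89   | 0.645  | 0.48   | 0.465  | 0.405  | 0.44   | 0.42   | 0.465  | 0.43   | 0.615  |
| 0.95 | 2.798  | 2.094  | 1.618  | 1.32   | 1.339  | 1.158  | 1.164  | 1.23   | 1.319  | 1.44   |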
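On a series with known values the output is easy to verify. With the default `interpolation="linear"`, the quantiles of 0, 1, ..., 100 are the probabilities times 100, and the 50% quantile equals `median()`. Passing a list of probabilities returns a result indexed by those probabilities, which is why the table above has one row per quantile level. The series below is a hypothetical example, not the momentum data.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(101), dtype=float)  # 0.0, 1.0, ..., 100.0

probs = [0.05, 0.25, 0.5, 0.75, 0.95]
q = s.quantile(probs)  # Series indexed by the probabilities

print(q)
```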


### Exercise: Sorting

Use `sort_values` to sort momentum by the column `mom_10`. Verify that the
sort was successful by checking that the minimum of the `diff` of `mom_10` is not negative.

In [54]:
mom_sorted = momentum.sort_values("mom_10")

In [56]:
mom_sorted.diff().min()

mom_01   -8.09
mom_02   -6.92
mom_03   -6.04
mom_04   -5.04
mom_05   -5.41
mom_06   -5.33
mom_07   -3.43
mom_08   -2.54
mom_09   -2.78
mom_10    0.00
dtype: float64
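Only the sort column is guaranteed to be ordered, which is why every other column above has a negative minimum difference while `mom_10` shows `0.00` (a tie). The first `diff` is `NaN`, and `min()` skips it. pandas also offers a direct check, `is_monotonic_increasing`. A sketch on a small hypothetical stand-in for `momentum`:

```python
import pandas as pd

# Hypothetical stand-in for the lesson's momentum data
momentum = pd.DataFrame(
    {"mom_01": [0.5, -1.2, 2.0, 0.1],
     "mom_10": [1.5, -0.3, 0.7, -2.1]},
    index=pd.date_range("2016-01-04", periods=4, freq="B"),
)

mom_sorted = momentum.sort_values("mom_10")

# After an ascending sort, consecutive differences in mom_10 are all >= 0
print(mom_sorted["mom_10"].diff().min())
# Other columns are carried along with their rows, so their diffs can be negative
print(mom_sorted["mom_01"].diff().min())
# Direct check provided by pandas
print(mom_sorted["mom_10"].is_monotonic_increasing)
```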

### Exercise: Sort Descending

Use `sort_values` to sort momentum by the column `mom_10` in descending order
(see the help for `sort_values`). Verify that the sort worked by checking that the maximum of
the `diff` of `mom_10` is not positive.

In [58]:
mom_sort = momentum.sort_values("mom_10", ascending=False)
mom_sort

| date       | mom_01 | mom_02 | mom_03 | mom_04 | mom_05 | mom_06 | mom_07 | mom_08 | mom_09 | mom_10 |
|------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| 2016-01-28 | 1.04   | 0.60   | 0.10   | -0.35  | 0.80   | -0.53  | -0.04  | -0.12  | -0.41  | 2.87   |
| 2016-11-07 | 2.52   | 2.29   | 2.58   | 2.22   | 2.31   | 2.28   | 1.90   | 1.99   | 2.37   | 2.64   |
| 2017-03-01 | 0.65   | 0.75   | 0.93   | 1.30   | 1.61   | 1.38   | 1.97   | 2.12   | 2.44   | 2.62   |
| 2016-03-01 | 1.47   | 2.37   | 2.38   | 2.98   | 2.61   | 2.68   | 2.31   | 2.12   | 1.42   | 2.54   |
| 2017-01-24 | 0.20   | 0.72   | 0.38   | 0.02   | 1.05   | 1.16   | 1.65   | 1.47   | 1.87   | 2.27   |
| ...        | ...    | ...    | ...    | ...    | ...    | ...    | ...    | ...    | ...    | ...    |
| 2016-09-09 | -3.09  | -2.88  | -1.88  | -2.28  | -2.05  | -2.39  | -2.47  | -2.54  | -2.71  | -3.06  |
| 2016-01-13 | -4.88  | -3.06  | -2.70  | -2.00  | -2.34  | -2.18  | -2.44  | -2.77  | -2.65  | -3.71  |
| 2016-02-05 | -2.49  | -2.40  | -1.52  | -1.77  | -1.77  | -0.93  | -1.47  | -1.62  | -2.04  | -3.91  |
| 2017-03-21 | -1.68  | -0.69  | -0.58  | -0.89  | -1.32  | -1.14  | -1.66  | -2.21  | -2.91  | -4.03  |


In [59]:
mom_sort.diff().max()

mom_01    8.08
mom_02    6.92
mom_03    6.04
mom_04    5.04
mom_05    5.41
mom_06    5.33
mom_07    3.43
mom_08    2.54
mom_09    2.78
mom_10    0.00
dtype: float64
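The descending case mirrors the ascending one: every difference in `mom_10` should be at most zero, and `is_monotonic_decreasing` checks this directly. When only the top few rows are needed, `nlargest` is a shortcut. A sketch on the same kind of hypothetical stand-in for `momentum`:

```python
import pandas as pd

# Hypothetical stand-in for the lesson's momentum data
momentum = pd.DataFrame(
    {"mom_01": [0.5, -1.2, 2.0, 0.1],
     "mom_10": [1.5, -0.3, 0.7, -2.1]},
    index=pd.date_range("2016-01-04", periods=4, freq="B"),
)

mom_desc = momentum.sort_values("mom_10", ascending=False)

# After a descending sort, consecutive differences in mom_10 are all <= 0
print(mom_desc["mom_10"].diff().max())
print(mom_desc["mom_10"].is_monotonic_decreasing)

# The two rows with the largest mom_10 values, largest first
top2 = momentum.nlargest(2, "mom_10")
print(top2)
```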

### Exercise: Get Number of Elements

Use the `shape` property to get the number of observations in momentum. Use it
again to get the number of columns.

In [60]:
nobs = momentum.shape[0]
nobs

503

In [61]:
ncols = momentum.shape[1]
ncols

10
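Because `shape` is a `(rows, columns)` tuple, both numbers can be unpacked in one line. `len` gives the number of rows, and `size` gives the total number of elements. A sketch on a hypothetical two-column frame:

```python
import pandas as pd

# Hypothetical stand-in for the lesson's momentum data
momentum = pd.DataFrame(
    {"mom_01": [0.5, -1.2, 2.0, 0.1],
     "mom_10": [1.5, -0.3, 0.7, -2.1]},
)

nobs, ncols = momentum.shape  # shape is a (rows, columns) tuple

print(nobs, ncols)        # 4 2
print(len(momentum))      # 4, the number of rows
print(momentum.size)      # 8, rows * columns
```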

### Exercise: Use `shift` to Compute Returns

Compute the percentage change using only `shift`, division (`/`) and
subtraction (`-`) on the `Series` `mom_10`. Verify that your result matches what `pct_change` produces.

In [66]:
momentum.pct_change().dropna()

| date       | mom_01    | mom_02     | mom_03    | mom_04    | mom_05     | mom_06    | mom_07    | mom_08    | mom_09    | mom_10    |
|------------|-----------|------------|-----------|-----------|------------|-----------|-----------|-----------|-----------|-----------|
| 2016-01-05 | -1.537313 | -7.666667  | -0.602151 | -1.252252 | -1.108844  | -1.108434 | -0.842857 | -1.120192 | -1.169591 | -1.048689 |
| 2016-01-06 | 12.805556 | -12.650000 | 6.027027  | -5.142857 | -11.625000 | -9.055556 | 4.227273  | -6.840000 | -4.931034 | -4.461538 |
| 2016-01-07 | -0.012072 | -0.180258  | 0.165385  | 0.612069  | 0.358824   | 0.586207  | 1.347826  | 0.582192  | 1.070175  | 4.911111  |
| 2016-01-08 | -0.918534 | -0.340314  | -0.676568 | -0.326203 | -0.510823  | -0.556522 | -0.644444 | -0.385281 | -0.601695 | -0.503759 |
| 2016-01-11 | 11.850000 | -0.357143  | 0.357143  | -0.428571 | -0.619469  | -1.137255 | -1.239583 | -1.197183 | -1.308511 | -1.250000 |
| ...        | ...       | ...        | ...       | ...       | ...        | ...       | ...       | ...       | ...       | ...       |
| 2017-12-22 | -0.809524 | -0.900000  | -1.104478 | -1.010204 | -1.137931  | -1.270270 | -2.076923 | -0.384615 | -0.375000 | -0.962963 |
| 2017-12-26 | 2.708333  | 4.384615   | -4.285714 | 2.000000  | -2.250000  | -0.400000 | 0.000000  | -0.562500 | -3.800000 | 64.000000 |
| 2017-12-27 | -1.651685 | -1.785714  | -2.173913 | 5.333333  | -2.400000  | -4.000000 | -2.285714 | -2.428571 | 0.571429  | -1.553846 |
| 2017-12-28 | -1.241379 | -1.418182  | -2.185185 | -1.105263 | -3.071429  | -0.111111 | 0.666667  | 0.500000  | 0.272727  | -0.194444 |


In [72]:
# Applied to the whole DataFrame; the mom_10 column is the requested Series result
((momentum - momentum.shift(1)) / momentum.shift(1)).dropna()

| date       | mom_01    | mom_02     | mom_03    | mom_04    | mom_05     | mom_06    | mom_07    | mom_08    | mom_09    | mom_10    |
|------------|-----------|------------|-----------|-----------|------------|-----------|-----------|-----------|-----------|-----------|
| 2016-01-05 | -1.537313 | -7.666667  | -0.602151 | -1.252252 | -1.108844  | -1.108434 | -0.842857 | -1.120192 | -1.169591 | -1.048689 |
| 2016-01-06 | 12.805556 | -12.650000 | 6.027027  | -5.142857 | -11.625000 | -9.055556 | 4.227273  | -6.840000 | -4.931034 | -4.461538 |
| 2016-01-07 | -0.012072 | -0.180258  | 0.165385  | 0.612069  | 0.358824   | 0.586207  | 1.347826  | 0.582192  | 1.070175  | 4.911111  |
| 2016-01-08 | -0.918534 | -0.340314  | -0.676568 | -0.326203 | -0.510823  | -0.556522 | -0.644444 | -0.385281 | -0.601695 | -0.503759 |
| 2016-01-11 | 11.850000 | -0.357143  | 0.357143  | -0.428571 | -0.619469  | -1.137255 | -1.239583 | -1.197183 | -1.308511 | -1.250000 |
| ...        | ...       | ...        | ...       | ...       | ...        | ...       | ...       | ...       | ...       | ...       |
| 2017-12-22 | -0.809524 | -0.900000  | -1.104478 | -1.010204 | -1.137931  | -1.270270 | -2.076923 | -0.384615 | -0.375000 | -0.962963 |
| 2017-12-26 | 2.708333  | 4.384615   | -4.285714 | 2.000000  | -2.250000  | -0.400000 | -0.000000 | -0.562500 | -3.800000 | 64.000000 |
| 2017-12-27 | -1.651685 | -1.785714  | -2.173913 | 5.333333  | -2.400000  | -4.000000 | -2.285714 | -2.428571 | 0.571429  | -1.553846 |
| 2017-12-28 | -1.241379 | -1.418182  | -2.185185 | -1.105263 | -3.071429  | -0.111111 | 0.666667  | 0.500000  | 0.272727  | -0.194444 |
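The exercise asks for the computation on the `Series` `mom_10`, and the comparison above is done by eye. The sketch below uses a hypothetical stand-in for `mom_10`, computes the change with `shift`, and checks it against `pct_change` with `np.allclose`. An approximate comparison is used because the two routes may not perform identical floating-point arithmetic; a sign-of-zero difference such as `0.000000` versus `-0.000000` in the tables above is harmless, since `-0.0 == 0.0`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lesson's mom_10 Series
mom_10 = pd.Series(
    [1.5, -0.3, 0.7, -2.1, 0.4],
    index=pd.date_range("2016-01-04", periods=5, freq="B"),
    name="mom_10",
)

lagged = mom_10.shift(1)             # previous observation; first value is NaN
manual = (mom_10 - lagged) / lagged  # percentage change from shift, -, and /
builtin = mom_10.pct_change()        # pandas' built-in version

# Both start with NaN; compare the remaining values approximately
print(np.allclose(manual.dropna(), builtin.dropna()))
```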
