# Using DataFrames

This lesson introduces:

* Computing returns (percentage change)
* Basic mathematical operations on DataFrames
* Common DataFrame methods (functions)

This first cell loads the data used in this lesson.

In [1]:
# Setup: Load prices
import pandas as pd
prices = pd.read_hdf("data/dataframes.h5", "prices")
sep_04 = pd.read_hdf("data/dataframes.h5", "sep_04")
goog = pd.read_hdf("data/dataframes.h5", "goog")

## Problem: Compute Returns

Compute returns using 

```python
returns = prices.pct_change()
```

which computes the percentage change between consecutive rows, $P_t/P_{t-1}-1$.

Additionally, extract returns for each name using 

```python
spy_returns = returns["SPY"]
```
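
Here is a minimal sketch of what `pct_change()` computes. It uses a small made-up price table; the tickers `AAA` and `BBB` and all values are hypothetical. Each entry is $P_t/P_{t-1}-1$, so the result matches dividing each row by the previous row, which `shift(1)` provides.

```python
import numpy as np
import pandas as pd

# Hypothetical prices for two tickers on three days (illustration only)
toy = pd.DataFrame(
    {"AAA": [100.0, 101.0, 99.99], "BBB": [50.0, 49.0, 49.49]},
    index=pd.to_datetime(["2018-09-04", "2018-09-05", "2018-09-06"]),
)

# pct_change() gives the simple return P_t / P_{t-1} - 1, column by column
pct = toy.pct_change()

# The same calculation written out with shift(1), which moves each column down one row
manual = toy / toy.shift(1) - 1

print(pct)
print(np.allclose(pct.iloc[1:], manual.iloc[1:]))  # first row is NaN in both
```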

In [2]:
returns = prices.pct_change()
returns

|            | SPY       | GOOG      | AAPL      |
|:-----------|----------:|----------:|----------:|
| 2018-04-09 | NaN       | NaN       | NaN       |
| 2018-05-09 | -0.002691 | -0.006525 | -0.008789 |
| 2018-06-09 | -0.00301  | -0.016617 | -0.012676 |
| 2018-07-09 | -0.001943 | -0.008068 | -0.005643 |
| 2018-10-09 | 0.001739  | -0.013421 | -0.000163 |
| 2018-11-09 | 0.003297  | 0.025283  | 0.010922  |
| 2018-12-09 | 0.000242  | -0.012419 | -0.01235  |
| 2018-09-13 | 0.005914  | 0.024155  | 0.010758  |
| 2018-09-14 | 0.000172  | -0.011351 | -0.002382 |
| 2018-09-17 | -0.005294 | -0.026626 | -0.014055 |


In [3]:
# The first row is NaN because there is no earlier price (no data for Sep 3);
# .dropna() removes rows that contain missing values
returns = returns.dropna()
returns

|            | SPY       | GOOG      | AAPL      |
|:-----------|----------:|----------:|----------:|
| 2018-05-09 | -0.002691 | -0.006525 | -0.008789 |
| 2018-06-09 | -0.00301  | -0.016617 | -0.012676 |
| 2018-07-09 | -0.001943 | -0.008068 | -0.005643 |
| 2018-10-09 | 0.001739  | -0.013421 | -0.000163 |
| 2018-11-09 | 0.003297  | 0.025283  | 0.010922  |
| 2018-12-09 | 0.000242  | -0.012419 | -0.01235  |
| 2018-09-13 | 0.005914  | 0.024155  | 0.010758  |
| 2018-09-14 | 0.000172  | -0.011351 | -0.002382 |
| 2018-09-17 | -0.005294 | -0.026626 | -0.014055 |
| 2018-09-18 | 0.005426  | 0.001652  | 0.004472  |


In [4]:
spy_returns = returns["SPY"]
goog_returns = returns["GOOG"]
aapl_returns = returns["AAPL"]
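
Square brackets with a single column name return a pandas `Series`. A list of names returns a `DataFrame` with the columns in the order listed. Here is a minimal sketch using hypothetical returns:

```python
import pandas as pd

# Hypothetical returns, for illustration only
rets = pd.DataFrame({"SPY": [0.001, -0.002], "GOOG": [0.01, 0.02]})

spy = rets["SPY"]                # single label -> Series named "SPY"
spy_frame = rets[["SPY"]]        # list with one label -> one-column DataFrame
subset = rets[["GOOG", "SPY"]]   # list of labels also sets the column order

print(type(spy).__name__, type(spy_frame).__name__, list(subset.columns))
```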

## Problem: Compute Log Returns

Compute log returns using

```python
import numpy as np

log_returns = np.log(prices).diff()
```

which computes the first difference of the natural log of the prices. Mathematically,
$r_{t}=\ln\left(P_{t}\right)-\ln\left(P_{t-1}\right)=\ln\left(\frac{P_{t}}{P_{t-1}}\right)\approx\frac{P_{t}}{P_{t-1}}-1$,
where the approximation is accurate when the return is small.
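
The approximation comes from $\ln(1+r)\approx r$ for small $r$. Because $\ln(1+r)=r-r^2/2+r^3/3-\dots$, simple returns exceed log returns by roughly $r^2/2$. The sketch below uses hypothetical prices. It checks that log returns equal `np.log1p` of the simple returns exactly, and that the gap between the two is close to $r^2/2$.

```python
import numpy as np
import pandas as pd

# Hypothetical prices, for illustration only
prices_toy = pd.Series([100.0, 101.0, 99.99, 100.5])

log_ret = np.log(prices_toy).diff()   # ln(P_t) - ln(P_{t-1})
simple_ret = prices_toy.pct_change()  # P_t / P_{t-1} - 1

# Exact identity: ln(P_t / P_{t-1}) = ln(1 + r_t)
exact = np.log1p(simple_ret)

# Gap between simple and log returns; approximately r**2 / 2 for small r
gap = (simple_ret - log_ret).iloc[1:]

print(pd.DataFrame({"simple": simple_ret, "log": log_ret}))
print(gap)
```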

In [5]:
import numpy as np

log_returns = np.log(prices).diff()
log_returns

|            | SPY       | GOOG      | AAPL      |
|:-----------|----------:|----------:|----------:|
| 2018-04-09 | NaN       | NaN       | NaN       |
| 2018-05-09 | -0.002695 | -0.006546 | -0.008827 |
| 2018-06-09 | -0.003015 | -0.016757 | -0.012757 |
| 2018-07-09 | -0.001945 | -0.008101 | -0.005659 |
| 2018-10-09 | 0.001737  | -0.013512 | -0.000163 |
| 2018-11-09 | 0.003292  | 0.024969  | 0.010863  |
| 2018-12-09 | 0.000242  | -0.012497 | -0.012427 |
| 2018-09-13 | 0.005897  | 0.023868  | 0.010701  |
| 2018-09-14 | 0.000172  | -0.011416 | -0.002385 |
| 2018-09-17 | -0.005308 | -0.026987 | -0.014155 |


## Problem: Basic Mathematical Operations

|  Operation            | Symbol | Precedence |
|:----------------------|:------:|:----------:|
| Parentheses           | ()     | 4          |
| Exponentiation        | **     | 3          |
| Multiplication        | *      | 2          | 
| Division              | /      | 2          |
| Floor division        | //     | 2          |
| Modulus               | %      | 2          | 
| Matrix multiplication | @      | 2          |
| Addition              | +      | 1          |
| Subtraction           | -      | 1          |

**Note**: Higher precedence operators are evaluated first, and ties are
evaluated left to right. The exception is exponentiation, which groups right to left,
so `2 ** 3 ** 2` is `2 ** 9`.
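
A few plain-Python expressions illustrate the table. The same rules govern expressions built from DataFrames and Series.

```python
# Multiplication binds tighter than addition
a = 2 + 3 * 4        # 14
# Parentheses override precedence
b = (2 + 3) * 4      # 20
# Ties among operators of equal precedence are evaluated left to right ...
c = 10 - 4 - 3       # (10 - 4) - 3 = 3
d = 7 // 2 * 2       # (7 // 2) * 2 = 6
# ... except exponentiation, which groups right to left
e = 2 ** 3 ** 2      # 2 ** (3 ** 2) = 512

print(a, b, c, d, e)
```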

1. Add 1 to all returns
2. Square the returns
3. Multiply the price of Google by 2. 
4. Extract the fractional return using floor division and modulus
5. Add the returns on SPY to those of AAPL 

In [6]:
1 + returns

|            | SPY      | GOOG     | AAPL     |
|:-----------|---------:|---------:|---------:|
| 2018-05-09 | 0.997309 | 0.993475 | 0.991211 |
| 2018-06-09 | 0.99699  | 0.983383 | 0.987324 |
| 2018-07-09 | 0.998057 | 0.991932 | 0.994357 |
| 2018-10-09 | 1.001739 | 0.986579 | 0.999837 |
| 2018-11-09 | 1.003297 | 1.025283 | 1.010922 |
| 2018-12-09 | 1.000242 | 0.987581 | 0.98765  |
| 2018-09-13 | 1.005914 | 1.024155 | 1.010758 |
| 2018-09-14 | 1.000172 | 0.988649 | 0.997618 |
| 2018-09-17 | 0.994706 | 0.973374 | 0.985945 |
| 2018-09-18 | 1.005426 | 1.001652 | 1.004472 |


In [7]:
returns ** 2

|            | SPY          | GOOG     | AAPL         |
|:-----------|-------------:|---------:|-------------:|
| 2018-05-09 | 7.243734e-06 | 4.3e-05  | 7.724016e-05 |
| 2018-06-09 | 9.06051e-06  | 0.000276 | 0.0001606848 |
| 2018-07-09 | 3.776667e-06 | 6.5e-05  | 3.183925e-05 |
| 2018-10-09 | 3.022472e-06 | 0.00018  | 2.660615e-08 |
| 2018-11-09 | 1.087328e-05 | 0.000639 | 0.0001192864 |
| 2018-12-09 | 5.864758e-08 | 0.000154 | 0.0001525142 |
| 2018-09-13 | 3.49813e-05  | 0.000583 | 0.0001157416 |
| 2018-09-14 | 2.955709e-08 | 0.000129 | 5.675399e-06 |
| 2018-09-17 | 2.802939e-05 | 0.000709 | 0.0001975452 |
| 2018-09-18 | 2.944302e-05 | 3e-06    | 1.99999e-05  |


In [8]:
2 * goog

2018-04-09    2394.00
2018-05-09    2372.96
2018-06-09    2342.88
2018-07-09    2329.66
2018-10-09    2329.28
2018-11-09    2354.72
2018-12-09    2325.64
2018-09-13    2350.66
2018-09-14    2345.06
2018-09-17    2312.10
2018-09-18    2322.44
2018-09-19    2317.56
Name: GOOG, dtype: float64

In [9]:
returns % 1

|            | SPY      | GOOG     | AAPL     |
|:-----------|---------:|---------:|---------:|
| 2018-05-09 | 0.997309 | 0.993475 | 0.991211 |
| 2018-06-09 | 0.99699  | 0.983383 | 0.987324 |
| 2018-07-09 | 0.998057 | 0.991932 | 0.994357 |
| 2018-10-09 | 0.001739 | 0.986579 | 0.999837 |
| 2018-11-09 | 0.003297 | 0.025283 | 0.010922 |
| 2018-12-09 | 0.000242 | 0.987581 | 0.98765  |
| 2018-09-13 | 0.005914 | 0.024155 | 0.010758 |
| 2018-09-14 | 0.000172 | 0.988649 | 0.997618 |
| 2018-09-17 | 0.994706 | 0.973374 | 0.985945 |
| 2018-09-18 | 0.005426 | 0.001652 | 0.004472 |


In [10]:
returns - returns // 1

|            | SPY      | GOOG     | AAPL     |
|:-----------|---------:|---------:|---------:|
| 2018-05-09 | 0.997309 | 0.993475 | 0.991211 |
| 2018-06-09 | 0.99699  | 0.983383 | 0.987324 |
| 2018-07-09 | 0.998057 | 0.991932 | 0.994357 |
| 2018-10-09 | 0.001739 | 0.986579 | 0.999837 |
| 2018-11-09 | 0.003297 | 0.025283 | 0.010922 |
| 2018-12-09 | 0.000242 | 0.987581 | 0.98765  |
| 2018-09-13 | 0.005914 | 0.024155 | 0.010758 |
| 2018-09-14 | 0.000172 | 0.988649 | 0.997618 |
| 2018-09-17 | 0.994706 | 0.973374 | 0.985945 |
| 2018-09-18 | 0.005426 | 0.001652 | 0.004472 |
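
In cells 9 and 10, the negative returns come out close to 1. For example, $-0.002691$ becomes $0.997309$. This happens because Python's `//` rounds toward negative infinity, and `%` returns a value with the sign of the divisor. If you want the fractional part with the original sign, one alternative is NumPy's `np.modf`, which truncates toward zero. A small sketch using two returns from the output above:

```python
import numpy as np

# Two returns from the output above: one negative, one positive
r = np.array([-0.002691, 0.001739])

whole = r // 1           # floor division rounds toward -inf: [-1., 0.]
frac_mod = r % 1         # r - floor(r), always in [0, 1): [0.997309, 0.001739]

# np.modf returns (fractional part, integral part), truncating toward zero,
# so the fractional part keeps the sign of r
frac_trunc, whole_trunc = np.modf(r)

print(whole, frac_mod, frac_trunc)
```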


In [11]:
spy_returns + aapl_returns

2018-05-09   -0.011480
2018-06-09   -0.015686
2018-07-09   -0.007586
2018-10-09    0.001575
2018-11-09    0.014219
2018-12-09   -0.012107
2018-09-13    0.016673
2018-09-14   -0.002210
2018-09-17   -0.019349
2018-09-18    0.009898
2018-09-19   -0.000279
dtype: float64

## Problem: Non-conformable math

Add the prices in `sep_04` to the prices of `goog`. What happens? 

In [12]:
sep_04 + goog


2018-04-09 00:00:00   NaN
2018-05-09 00:00:00   NaN
2018-06-09 00:00:00   NaN
2018-07-09 00:00:00   NaN
2018-09-13 00:00:00   NaN
2018-09-14 00:00:00   NaN
2018-09-17 00:00:00   NaN
2018-09-18 00:00:00   NaN
2018-09-19 00:00:00   NaN
2018-10-09 00:00:00   NaN
2018-11-09 00:00:00   NaN
2018-12-09 00:00:00   NaN
AAPL                  NaN
GOOG                  NaN
SPY                   NaN
dtype: float64
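
Every value is `NaN` because pandas aligns the two Series on their index labels before adding. `sep_04` is indexed by ticker and `goog` by date, so no label appears in both. The result therefore contains the union of the labels, with nothing to add at any of them. Here is a sketch with hypothetical values; the dates are stored as strings for simplicity:

```python
import pandas as pd

# Hypothetical prices on one day, indexed by ticker (like sep_04)
by_ticker = pd.Series({"SPY": 290.0, "GOOG": 1197.0})
# Hypothetical GOOG prices, indexed by date (like goog)
by_date = pd.Series([1197.0, 1186.5], index=["2018-09-04", "2018-09-05"])

mismatched = by_ticker + by_date   # labels never match -> union of labels, all NaN
print(mismatched)

# Selecting one value first gives a scalar, which is added to every row
plus_scalar = by_ticker["GOOG"] + by_date
print(plus_scalar)
```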

## Problem: Constructing portfolio returns
Set up a 3-element array of portfolio weights 

$$w=\left(\frac{1}{3},\,\frac{1}{3},\,\frac{1}{3}\right)$$

and compute the return of a portfolio with weight $\frac{1}{3}$ in each security.


In [13]:
import numpy as np

w = np.array([1/3, 1/3, 1/3])

port_ret = returns @ w
port_ret

2018-05-09   -0.006002
2018-06-09   -0.010768
2018-07-09   -0.005218
2018-10-09   -0.003948
2018-11-09    0.013167
2018-12-09   -0.008176
2018-09-13    0.013609
2018-09-14   -0.004520
2018-09-17   -0.015325
2018-09-18    0.003850
2018-09-19   -0.002537
dtype: float64

Repeat the previous calculation using multiplication (`*`) and `.sum()`.
**Note**: You need to use the `axis` keyword for the sum; `axis=1` adds across the columns within each row.

In [14]:
weighted_rets = returns * w
port_ret = weighted_rets.sum(axis=1)
port_ret

2018-05-09   -0.006002
2018-06-09   -0.010768
2018-07-09   -0.005218
2018-10-09   -0.003948
2018-11-09    0.013167
2018-12-09   -0.008176
2018-09-13    0.013609
2018-09-14   -0.004520
2018-09-17   -0.015325
2018-09-18    0.003850
2018-09-19   -0.002537
dtype: float64
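
Multiplying by a NumPy array matches weights to columns by position, and `sum(axis=1)` then adds across the columns of each row. If the weights are stored in a pandas Series indexed by ticker, pandas matches them to columns by name instead, which protects against column-order mistakes. A minimal sketch; the returns and the unequal weights are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical returns for two days
rets = pd.DataFrame({"SPY": [0.01, -0.02], "GOOG": [0.03, 0.00], "AAPL": [-0.01, 0.02]})

w = np.array([1 / 3, 1 / 3, 1 / 3])
by_matmul = rets @ w                 # matrix-vector product, one value per row
by_sum = (rets * w).sum(axis=1)      # weight each column by position, then sum across columns

# Weights as a Series are aligned to columns by label, whatever their order
w_named = pd.Series({"AAPL": 0.5, "SPY": 0.25, "GOOG": 0.25})
by_label = (rets * w_named).sum(axis=1)

print(pd.DataFrame({"matmul": by_matmul, "mul+sum": by_sum, "by label": by_label}))
```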

## Problem: Mean, Standard Deviation and Correlation

Using the `mean()` method, compute the mean of each of the three return series one at a time. For example,
```python
goog_mean = goog_returns.mean()
```
Next, compute the mean of the matrix of returns using  

```python
retmean = returns.mean()
```

What is the relationship between these two? Repeat this exercise for the standard deviation (`std()`).
Finally, compute the correlation of the matrix of returns (`corr()`). 

In [15]:
goog_mean = goog_returns.mean()
spy_mean = spy_returns.mean()
aapl_mean = aapl_returns.mean()
print(spy_mean, aapl_mean, goog_mean)

0.0005157696449953846 -0.0029096851629990942 -0.0046607598305810375


In [16]:
returns.mean()

SPY     0.000516
GOOG   -0.004661
AAPL   -0.002910
dtype: float64

In [17]:
returns.std()

SPY     0.003562
GOOG    0.016113
AAPL    0.008899
dtype: float64

In [18]:
returns.corr()

|      | SPY      | GOOG     | AAPL     |
|:-----|---------:|---------:|---------:|
| SPY  | 1.0      | 0.773894 | 0.883557 |
| GOOG | 0.773894 | 1.0      | 0.892719 |
| AAPL | 0.883557 | 0.892719 | 1.0      |
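
One detail matters when comparing these results with NumPy. pandas `std()` divides by $n-1$ by default (`ddof=1`), while `np.std` divides by $n$ (`ddof=0`). The sketch below uses hypothetical returns. It also checks that `DataFrame.mean()` matches calling `mean()` on each column, and that `corr()` agrees with `np.corrcoef` (both compute the Pearson correlation).

```python
import numpy as np
import pandas as pd

# Hypothetical returns, for illustration only
rets = pd.DataFrame({"A": [0.01, -0.02, 0.03, 0.00],
                     "B": [0.02, -0.01, 0.01, -0.02]})

col_means = rets.mean()                              # one mean per column

sd_pandas = rets["A"].std()                          # divides by n - 1 (ddof=1)
sd_numpy = np.std(rets["A"].to_numpy())              # divides by n (ddof=0)
sd_numpy_sample = np.std(rets["A"].to_numpy(), ddof=1)

corr_ab = rets.corr().loc["A", "B"]                  # Pearson correlation by default
corr_np = np.corrcoef(rets["A"], rets["B"])[0, 1]

print(col_means["A"], rets["A"].mean())
print(sd_pandas, sd_numpy, sd_numpy_sample)
print(corr_ab, corr_np)
```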


## Problem: Summing all elements

Compute the sum of the columns of returns using `.sum()`. How is this related to the mean computed 
in the previous step? 

In [19]:
returns.mean()

SPY     0.000516
GOOG   -0.004661
AAPL   -0.002910
dtype: float64

In [20]:
# count() gives the number of non-missing observations in each column (11 here)
returns.sum() / returns.count()

SPY     0.000516
GOOG   -0.004661
AAPL   -0.002910
dtype: float64
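
The mean is the column sum divided by the number of observations. When a column contains `NaN`, `mean()` skips the missing values. The matching denominator is therefore `count()`, the number of non-missing entries in each column, not the number of rows. A sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical returns; column A has one missing value
df = pd.DataFrame({"A": [0.01, 0.02, np.nan],
                   "B": [0.00, -0.03, 0.03]})

means = df.mean()                  # NaN values are skipped by default
by_count = df.sum() / df.count()   # count() excludes NaN, so this matches mean()
by_rows = df.sum() / len(df)       # divides by every row, including the NaN one

print(pd.DataFrame({"mean": means, "sum/count": by_count, "sum/len": by_rows}))
```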

## Problem: Maximum and Minimum Values
Compute the minimum and maximum values of the columns of returns using the `min()` and `max()` methods.

In [21]:
returns.min()

SPY    -0.005294
GOOG   -0.026626
AAPL   -0.014055
dtype: float64

In [22]:
returns.max()

SPY     0.005914
GOOG    0.025283
AAPL    0.010922
dtype: float64
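
`min()` and `max()` return the extreme values but not when they occurred. The related methods `idxmin()` and `idxmax()` return the index label (here, the date) of each column's minimum and maximum. A sketch with hypothetical returns:

```python
import pandas as pd

# Hypothetical returns, for illustration only
rets = pd.DataFrame(
    {"SPY": [0.004, -0.005, 0.006], "GOOG": [-0.010, 0.025, -0.026]},
    index=pd.to_datetime(["2018-09-13", "2018-09-14", "2018-09-17"]),
)

worst = rets.min()          # smallest return in each column
worst_day = rets.idxmin()   # date on which each minimum occurred
best_day = rets.idxmax()    # date on which each maximum occurred

print(pd.DataFrame({"min": worst, "date of min": worst_day, "date of max": best_day}))
```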

## Problem: Rounding Up, Down and to the Closest Integer

Rounding up is handled by `ceil`, rounding down by `floor`, and rounding to the closest
integer by `round`. Try each of these on 100 times the returns (that is, returns in percent). For example,
```python
rounded = (100*returns).round()
```

Use NumPy's `np.ceil` and `np.floor` to round up and down, respectively.

In [23]:
rounded = (100*returns).round()
rounded

|            | SPY  | GOOG | AAPL |
|:-----------|-----:|-----:|-----:|
| 2018-05-09 | -0.0 | -1.0 | -1.0 |
| 2018-06-09 | -0.0 | -2.0 | -1.0 |
| 2018-07-09 | -0.0 | -1.0 | -1.0 |
| 2018-10-09 | 0.0  | -1.0 | -0.0 |
| 2018-11-09 | 0.0  | 3.0  | 1.0  |
| 2018-12-09 | 0.0  | -1.0 | -1.0 |
| 2018-09-13 | 1.0  | 2.0  | 1.0  |
| 2018-09-14 | 0.0  | -1.0 | -0.0 |
| 2018-09-17 | -1.0 | -3.0 | -1.0 |
| 2018-09-18 | 1.0  | 0.0  | 0.0  |


In [24]:
ceiled = np.ceil(100*returns)
ceiled

|            | SPY  | GOOG | AAPL |
|:-----------|-----:|-----:|-----:|
| 2018-05-09 | -0.0 | -0.0 | -0.0 |
| 2018-06-09 | -0.0 | -1.0 | -1.0 |
| 2018-07-09 | -0.0 | -0.0 | -0.0 |
| 2018-10-09 | 1.0  | -1.0 | -0.0 |
| 2018-11-09 | 1.0  | 3.0  | 2.0  |
| 2018-12-09 | 1.0  | -1.0 | -1.0 |
| 2018-09-13 | 1.0  | 3.0  | 2.0  |
| 2018-09-14 | 1.0  | -1.0 | -0.0 |
| 2018-09-17 | -0.0 | -2.0 | -1.0 |
| 2018-09-18 | 1.0  | 1.0  | 1.0  |


In [25]:
floored = np.floor(100*returns)
floored

|            | SPY  | GOOG | AAPL |
|:-----------|-----:|-----:|-----:|
| 2018-05-09 | -1.0 | -1.0 | -1.0 |
| 2018-06-09 | -1.0 | -2.0 | -2.0 |
| 2018-07-09 | -1.0 | -1.0 | -1.0 |
| 2018-10-09 | 0.0  | -2.0 | -1.0 |
| 2018-11-09 | 0.0  | 2.0  | 1.0  |
| 2018-12-09 | 0.0  | -2.0 | -2.0 |
| 2018-09-13 | 0.0  | 2.0  | 1.0  |
| 2018-09-14 | 0.0  | -2.0 | -1.0 |
| 2018-09-17 | -1.0 | -3.0 | -2.0 |
| 2018-09-18 | 0.0  | 0.0  | 0.0  |
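
Two details explain parts of the output above. First, values such as `-0.0` are negative zeros. When a small negative number rounds to zero, it keeps its sign, but `-0.0 == 0.0` is still `True`. Second, `round()` follows NumPy's rule of sending exact ties to the nearest even integer, so `0.5` becomes `0.0` and `2.5` becomes `2.0`. A sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical values (think of them as returns in percent)
x = pd.Series([0.5, 1.5, 2.5, -0.5, -0.27])

rounded = x.round()      # nearest integer; exact ties go to the even neighbor
ceiled = np.ceil(x)      # round toward +inf
floored = np.floor(x)    # round toward -inf

print(pd.DataFrame({"x": x, "round": rounded, "ceil": ceiled, "floor": floored}))
print(np.signbit(rounded.iloc[3]))  # True: -0.5 rounds to -0.0
```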
