# Using DataFrames

This lesson introduces:

* Basic mathematical operations on DataFrames
* Common DataFrame methods (functions)
* Computing percentage change (returns)

## Problem: Addition and Subtraction

Add the prices of the three series together using `.sum(axis=1)`. Add the prices in `sep_04` to 
the prices of `goog`. What happens? 

In [1]:
# Setup: Load prices
import pandas as pd
prices = pd.read_hdf('data/data.h5', 'prices')
sep_04 = pd.read_hdf('data/data.h5', 'sep_04')
goog = pd.read_hdf('data/data.h5', 'goog')

In [2]:
prices.sum()

SPY      3474.27
AAPL     2665.89
GOOG    14048.48
dtype: float64
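
The cell above uses the default `axis=0`, which totals each column over all dates. Passing `axis=1` instead totals across the tickers for each date. A minimal sketch of the difference, using a made-up two-row frame named `toy` (an assumption for illustration, not the lesson's data):

```python
import pandas as pd

# Made-up prices for illustration only (not the lesson's data)
toy = pd.DataFrame(
    {"SPY": [100.0, 101.0], "AAPL": [50.0, 49.0], "GOOG": [200.0, 202.0]},
    index=pd.to_datetime(["2018-09-04", "2018-09-05"]),
)

col_totals = toy.sum()        # axis=0 (default): one total per column
row_totals = toy.sum(axis=1)  # axis=1: one total per row (date)

print(col_totals)
print(row_totals)
```

The first result is indexed by ticker, the second by date.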

In [3]:
sep_04 + goog

2018-04-09 00:00:00   NaN
2018-05-09 00:00:00   NaN
2018-06-09 00:00:00   NaN
2018-07-09 00:00:00   NaN
2018-10-09 00:00:00   NaN
2018-11-09 00:00:00   NaN
2018-12-09 00:00:00   NaN
2018-09-13 00:00:00   NaN
2018-09-14 00:00:00   NaN
2018-09-17 00:00:00   NaN
2018-09-18 00:00:00   NaN
2018-09-19 00:00:00   NaN
0                     NaN
1                     NaN
2                     NaN
dtype: float64
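
The all-`NaN` result comes from index alignment: pandas adds values that share a label, and the two series here have no labels in common, so the result holds the union of both indexes with every entry missing. The sketch below reproduces the effect with two hypothetical series (`by_date` and `other` are made-up names and values) and shows one way to add by position instead, assuming the two series have the same length.

```python
import pandas as pd

# Hypothetical series whose index labels never overlap
by_date = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.to_datetime(["2018-09-04", "2018-09-05", "2018-09-06"]),
)
other = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2018-10-01", "2018-10-02", "2018-10-03"]),
)

# Label-based addition: no shared labels, so all six entries are NaN
aligned = by_date + other

# A NumPy array carries no index, so this addition is positional
positional = by_date + other.to_numpy()

print(aligned)
print(positional)
```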

## Problem: Multiplication

Multiply the price of Google by 2. 

In [4]:
2 * goog

2018-04-09    2394.00
2018-05-09    2372.96
2018-06-09    2342.88
2018-07-09    2329.66
2018-10-09    2329.28
2018-11-09    2354.72
2018-12-09    2325.64
2018-09-13    2350.66
2018-09-14    2345.06
2018-09-17    2312.10
2018-09-18    2322.44
2018-09-19    2317.56
Name: GOOG, dtype: float64

## Problem: Constructing Portfolio Returns

Set up a vector of portfolio weights $w=\left(\frac{1}{3},\,\frac{1}{3}\,,\frac{1}{3}\right)$ and 
compute the price of a portfolio with $\frac{1}{3}$ share of each.

*Note*: Division uses the slash operator (/). 

In [5]:
import numpy as np

w = np.array([1/3, 1/3, 1/3])

port_price = prices @ w
print(port_price)

2018-04-09    571.723333
2018-05-09    567.460000
2018-06-09    560.900000
2018-07-09    557.910000
2018-10-09    557.023333
2018-11-09    563.420000
2018-12-09    557.670000
2018-09-13    564.190000
2018-09-14    562.416667
2018-09-17    554.423333
2018-09-18    556.790000
2018-09-19    555.620000
dtype: float64
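
The product `prices @ w` is a weighted sum across columns for each row. The sketch below checks this on a made-up frame (`toy` and its values are assumptions for illustration) by comparing the matrix product with an explicit element-by-element multiply followed by a row sum.

```python
import numpy as np
import pandas as pd

# Made-up prices for illustration only
toy = pd.DataFrame(
    {"SPY": [300.0, 303.0], "AAPL": [150.0, 147.0], "GOOG": [1200.0, 1188.0]}
)
w = np.array([1 / 3, 1 / 3, 1 / 3])

via_matmul = toy @ w                       # matrix-vector product
via_weighted_sum = (toy * w).sum(axis=1)   # scale each column, then sum each row

print(via_matmul)
print(np.allclose(via_matmul, via_weighted_sum))
```

With equal weights the result also matches `toy.mean(axis=1)`.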


## Problem: Compute Returns

Compute returns using 

```python
returns = prices.pct_change()
```

which computes the percentage change.

Additionally, extract returns for each name using 

```python
spy_returns = returns['SPY']
```


In [6]:
returns = prices.pct_change()
print(returns)

                 SPY      AAPL      GOOG
2018-04-09       NaN       NaN       NaN
2018-05-09 -0.002691 -0.006525 -0.008789
2018-06-09 -0.003010 -0.016617 -0.012676
2018-07-09 -0.001943 -0.008068 -0.005643
2018-10-09  0.001739 -0.013421 -0.000163
2018-11-09  0.003297  0.025283  0.010922
2018-12-09  0.000242 -0.012419 -0.012350
2018-09-13  0.005914  0.024155  0.010758
2018-09-14  0.000172 -0.011351 -0.002382
2018-09-17 -0.005294 -0.026626 -0.014055
2018-09-18  0.005426  0.001652  0.004472
2018-09-19  0.001822 -0.007331 -0.002101
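
As a rough check of what `pct_change()` does, the sketch below compares it with the formula $P_t / P_{t-1} - 1$ written out using `shift`. The frame `toy` is made up for illustration.

```python
import pandas as pd

# Made-up prices for illustration only
toy = pd.DataFrame({"SPY": [100.0, 102.0, 99.96]})

builtin = toy.pct_change()          # percentage change from the previous row
manual = toy / toy.shift(1) - 1     # same calculation written out

print(builtin)
print(manual)
```

Both versions leave the first row missing because there is no earlier price to divide by.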


In [7]:
# The first row is NaN because there is no price for Sep 3 to compare against.
# Use .dropna() to remove rows with missing values.
print(returns.dropna())

                 SPY      AAPL      GOOG
2018-05-09 -0.002691 -0.006525 -0.008789
2018-06-09 -0.003010 -0.016617 -0.012676
2018-07-09 -0.001943 -0.008068 -0.005643
2018-10-09  0.001739 -0.013421 -0.000163
2018-11-09  0.003297  0.025283  0.010922
2018-12-09  0.000242 -0.012419 -0.012350
2018-09-13  0.005914  0.024155  0.010758
2018-09-14  0.000172 -0.011351 -0.002382
2018-09-17 -0.005294 -0.026626 -0.014055
2018-09-18  0.005426  0.001652  0.004472
2018-09-19  0.001822 -0.007331 -0.002101


In [8]:
spy_returns = returns['SPY']
goog_returns = returns['GOOG']
aapl_returns = returns['AAPL']

## Problem: Compute Log Returns

Compute log returns using

```python
import numpy as np

log_returns = np.log(prices).diff()
```

which computes the first difference of the natural log of the prices. Mathematically this is 
$r_{t}=\ln\left(P_{t}\right)-\ln\left(P_{t-1}\right)=\ln\left(\frac{P_{t}}{P_{t-1}}\right)\approx\frac{P_{t}}{P_{t-1}}-1$.
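
The approximation in the formula is close for small price moves and less so for large ones. The sketch below, on a made-up price series `toy`, checks the exact identity $r_t = \ln(1 + R_t)$, where $R_t$ is the simple return, and measures the gap between the two kinds of return.

```python
import numpy as np
import pandas as pd

# Made-up prices: a 1% rise, a small fall, then an 11% jump
toy = pd.Series([100.0, 101.0, 99.0, 110.0])

simple = toy.pct_change()        # P_t / P_{t-1} - 1
log_ret = np.log(toy).diff()     # ln(P_t) - ln(P_{t-1})

# Exact relationship: log return equals log(1 + simple return)
identity_holds = np.allclose(log_ret.dropna(), np.log1p(simple.dropna()))

# Size of the approximation error for each period
gap = (simple - log_ret).abs()

print(identity_holds)
print(gap)
```

The gap is largest for the final, largest move.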

In [9]:
log_returns = np.log(prices).diff()
log_returns

                 SPY      AAPL      GOOG
2018-04-09       NaN       NaN       NaN
2018-05-09 -0.002695 -0.006546 -0.008827
2018-06-09 -0.003015 -0.016757 -0.012757
2018-07-09 -0.001945 -0.008101 -0.005659
2018-10-09  0.001737 -0.013512 -0.000163
2018-11-09  0.003292  0.024969  0.010863
2018-12-09  0.000242 -0.012497 -0.012427
2018-09-13  0.005897  0.023868  0.010701
2018-09-14  0.000172 -0.011416 -0.002385
2018-09-17 -0.005308 -0.026987 -0.014155


## Problem: Mean, Standard Deviation and Correlation

Using the `mean` method, compute the mean of the three return series one at a time. For example,  
```python
goog_mean = goog_returns.mean()
```
Next, compute the mean of the matrix of returns using  

```python
retmean = returns.mean()
```

What is the relationship between these two? Repeat this exercise for the standard deviation (`std()`).
Finally, compute the correlation of the matrix of returns (`corr()`). 

In [10]:
goog_mean = goog_returns.mean()
spy_mean = spy_returns.mean()
aapl_mean = aapl_returns.mean()
print(spy_mean, aapl_mean, goog_mean)

0.0005157696449953846 -0.0046607598305810375 -0.0029096851629990942


In [11]:
returns.mean()

SPY     0.000516
AAPL   -0.004661
GOOG   -0.002910
dtype: float64
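
Calling `mean()` on the DataFrame applies the same calculation to each column and collects the results in a Series indexed by column name. A small sketch on made-up returns (`toy_returns` is an assumption for illustration) compares the two approaches:

```python
import numpy as np
import pandas as pd

# Made-up returns; the first row is missing, as in the lesson's data
toy_returns = pd.DataFrame(
    {
        "SPY": [np.nan, 0.01, -0.02, 0.03],
        "GOOG": [np.nan, 0.02, 0.00, -0.05],
    }
)

# One call on the whole frame
frame_mean = toy_returns.mean()

# The same numbers, computed one column at a time
one_by_one = pd.Series(
    {name: toy_returns[name].mean() for name in toy_returns.columns}
)

print(frame_mean)
print(one_by_one)
```

Missing values are skipped by default in both cases.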

In [12]:
returns.std()

SPY     0.003562
AAPL    0.016113
GOOG    0.008899
dtype: float64

In [13]:
returns.corr()

           SPY      AAPL      GOOG
SPY   1.000000  0.773894  0.883557
AAPL  0.773894  1.000000  0.892719
GOOG  0.883557  0.892719  1.000000
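
A correlation is a covariance divided by the product of the two standard deviations. The sketch below rebuilds the correlation matrix this way on made-up returns (`toy_returns` is an assumption for illustration) and compares it with `corr()`.

```python
import numpy as np
import pandas as pd

# Made-up returns for illustration only
toy_returns = pd.DataFrame(
    {
        "SPY": [0.01, -0.02, 0.03, 0.00],
        "GOOG": [0.02, -0.03, 0.04, -0.01],
    }
)

cov = toy_returns.cov()   # covariance matrix
std = toy_returns.std()   # standard deviation of each column

# Divide each covariance by the product of the two standard deviations
manual_corr = cov / np.outer(std, std)

print(manual_corr)
print(np.allclose(manual_corr, toy_returns.corr()))
```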


## Problem: Summing all elements

Compute the sum of the columns of returns using `.sum()`. How is this related to the mean computed 
in the previous step? 

In [14]:
returns.dropna().mean()



SPY     0.000516
AAPL   -0.004661
GOOG   -0.002910
dtype: float64

In [15]:
# 11 is the number of non-missing returns in each column
returns.sum() / 11

SPY     0.000516
AAPL   -0.004661
GOOG   -0.002910
dtype: float64
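
The mean is the sum divided by the number of non-missing observations. Rather than hard-coding that number, `count()` can supply it. The sketch below uses made-up returns (`toy_returns` is an assumption for illustration) and also shows that dividing by the total number of rows gives a different answer when a row is missing.

```python
import numpy as np
import pandas as pd

# Made-up returns; the first row is missing
toy_returns = pd.DataFrame({"SPY": [np.nan, 0.01, -0.02, 0.03]})

n_obs = toy_returns.count()                 # non-missing values per column
mean_from_sum = toy_returns.sum() / n_obs   # matches .mean()

# Dividing by the number of rows counts the missing row as well
by_row_count = toy_returns.sum() / len(toy_returns)

print(n_obs)
print(mean_from_sum)
print(by_row_count)
```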

## Problem: Maximum and Minimum Values

Compute the minimum and maximum values of the columns of returns using the `min()` and `max()` methods. 

In [16]:
returns.min()

SPY    -0.005294
AAPL   -0.026626
GOOG   -0.014055
dtype: float64

In [17]:
returns.max()

SPY     0.005914
AAPL    0.025283
GOOG    0.010922
dtype: float64

## Problem: Rounding Up, Down and to the Closest Integer

Rounding up is handled by `ceil`, rounding down is handled by `floor`, and rounding to the closest 
integer is handled by `round`. Try all of these on 100 times returns. For example,  
```python
rounded = (100*returns).round()
``` 

Use NumPy's `np.ceil` and `np.floor` to round up and down, respectively.

In [18]:
rounded = (100*returns).round()
rounded

            SPY  AAPL  GOOG
2018-04-09  NaN   NaN   NaN
2018-05-09 -0.0  -1.0  -1.0
2018-06-09 -0.0  -2.0  -1.0
2018-07-09 -0.0  -1.0  -1.0
2018-10-09  0.0  -1.0  -0.0
2018-11-09  0.0   3.0   1.0
2018-12-09  0.0  -1.0  -1.0
2018-09-13  1.0   2.0   1.0
2018-09-14  0.0  -1.0  -0.0
2018-09-17 -1.0  -3.0  -1.0


In [19]:
ceiled = np.ceil(100*returns)
ceiled

            SPY  AAPL  GOOG
2018-04-09  NaN   NaN   NaN
2018-05-09 -0.0  -0.0  -0.0
2018-06-09 -0.0  -1.0  -1.0
2018-07-09 -0.0  -0.0  -0.0
2018-10-09  1.0  -1.0  -0.0
2018-11-09  1.0   3.0   2.0
2018-12-09  1.0  -1.0  -1.0
2018-09-13  1.0   3.0   2.0
2018-09-14  1.0  -1.0  -0.0
2018-09-17 -0.0  -2.0  -1.0


In [20]:
floored = np.floor(100*returns)
floored

            SPY  AAPL  GOOG
2018-04-09  NaN   NaN   NaN
2018-05-09 -1.0  -1.0  -1.0
2018-06-09 -1.0  -2.0  -2.0
2018-07-09 -1.0  -1.0  -1.0
2018-10-09  0.0  -2.0  -1.0
2018-11-09  0.0   2.0   1.0
2018-12-09  0.0  -2.0  -2.0
2018-09-13  0.0   2.0   1.0
2018-09-14  0.0  -2.0  -1.0
2018-09-17 -1.0  -3.0  -2.0
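
The three operations differ most visibly on negative numbers and on values exactly halfway between two integers. The sketch below applies them to a made-up series `x` (an assumption for illustration). Note that `round()` sends exact halves to the nearest even integer, and that small negative values can round to `-0.0`, which is why `-0.0` appears in the outputs above.

```python
import numpy as np
import pandas as pd

# Made-up values, including negatives and exact halves
x = pd.Series([-1.5, -0.2, 0.2, 0.5, 1.5, 2.5])

rounded = x.round()   # nearest integer; exact halves go to the even neighbour
up = np.ceil(x)       # smallest integer greater than or equal to x
down = np.floor(x)    # largest integer less than or equal to x

print(pd.DataFrame({"x": x, "round": rounded, "ceil": up, "floor": down}))
```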


## Problem: Element-by-Element Multiplication

Arithmetic operators on pandas objects work element-by-element, except the `@` operator, which 
performs matrix multiplication and follows the rules of linear algebra. 

Multiply the returns of Google and SPY together using the `@` (dot product) operator. 

In [21]:
# The result is nan because the first return in each series is missing
goog_returns @ spy_returns

nan
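
The `nan` arises because the dot product sums every element-by-element product, and a single missing value makes the whole sum missing. The sketch below contrasts `*` and `@` on two made-up series (`a` and `b` are assumptions for illustration) and shows two ways to get a finite dot product.

```python
import numpy as np
import pandas as pd

# Made-up returns; the first value is missing in both series
a = pd.Series([np.nan, 0.01, -0.02, 0.03])
b = pd.Series([np.nan, 0.02, 0.01, -0.01])

elementwise = a * b                    # one product per row; first row stays NaN
raw_dot = a @ b                        # NaN propagates into the single result
clean_dot = a.dropna() @ b.dropna()    # drop missing rows before the dot product
sum_of_products = elementwise.sum()    # .sum() skips NaN by default

print(elementwise)
print(raw_dot, clean_dot, sum_of_products)
```

Dropping missing rows first works here because both series are missing the same row, so their indexes still match.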