# Basic Input and Operators

This lesson covers:

* Manually inputting data in scalars, vectors, and matrices 
* Basic mathematical operations 
* Saving and loading data 

<a id="stock-data"></a>
## Data
Adjusted closing prices in September 2018 for the S&P 500 ETF (SPY), Apple (AAPL) and 
Google (GOOG) are listed below:



| Date   | SPY Price | AAPL Price | GOOG Price | 
|:-------|----------:|-----------:|-----------:| 
| Sept4  | 289.81    | 228.36     | 1197.00    | 
| Sept5  | 289.03    | 226.87     | 1186.48    | 
| Sept6  | 288.16    | 223.10     | 1171.44    | 
| Sept7  | 287.60    | 221.30     | 1164.83    | 
| Sept10 | 288.10    | 218.33     | 1164.64    | 
| Sept11 | 289.05    | 223.85     | 1177.36    | 
| Sept12 | 289.12    | 221.07     | 1162.82    | 
| Sept13 | 290.83    | 226.41     | 1175.33    | 
| Sept14 | 290.88    | 223.84     | 1172.53    | 
| Sept17 | 289.34    | 217.88     | 1156.05    | 
| Sept18 | 290.91    | 218.24     | 1161.22    | 
| Sept19 | 291.44    | 216.64     | 1158.78    | 

**Prices in September 2018**
 

## Problem: Input scalar data

Create three variables named `spy`, `aapl` and `goog`, each containing the September 4 price of 
the corresponding asset. For example, to enter the Google data:
```python
goog = 1197.00
```

In [1]:
spy = 289.81
aapl = 228.36
goog = 1197.00

## Problem: Print the values
Print the values of the three variables you created in the previous step using `print`.

In [2]:
print(spy)
print(aapl)
print(goog)
print(spy, aapl, goog)

289.81
228.36
1197.0
289.81 228.36 1197.0


## Problem: Print the values with formatting
Print the values of the three variables you created in the previous step using format strings following
the pattern TICKER: Value. For example, you can print the value of Google using `print(f'GOOG: {goog}')`.

In [3]:
print(f'GOOG: {goog}')
print(f'SPY: {spy}')
print(f'AAPL: {aapl}')

GOOG: 1197.0
SPY: 289.81
AAPL: 228.36
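
The default formatting drops the trailing zero, so `1197.00` prints as `1197.0`. A format specification placed after a colon inside the braces controls the number of decimals. A minimal sketch, assuming two decimal places are wanted (the `.2f` specifier is a choice made here, not something the exercise requires):

```python
goog = 1197.00

# .2f formats the float with exactly two digits after the decimal point
line = f'GOOG: {goog:.2f}'
print(line)  # GOOG: 1197.00
```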


## Problem: Input a Vector

Create a vector for each of the days in the [Table](#stock-data), named `sep_xx` where `xx` is the 
numeric date. For example,  
```python
import pandas as pd

sep_04 = pd.Series([289.81, 228.36, 1197.00], index=['SPY', 'AAPL', 'GOOG'])
```

In [4]:
import pandas as pd

sep_04 = pd.Series([289.81, 228.36, 1197.00])
sep_05 = pd.Series([289.03, 226.87, 1186.48])
sep_06 = pd.Series([288.16, 223.10, 1171.44])
sep_07 = pd.Series([287.60, 221.30, 1164.83])
sep_10 = pd.Series([288.10, 218.33, 1164.64])
sep_11 = pd.Series([289.05, 223.85, 1177.36])
sep_12 = pd.Series([289.12, 221.07, 1162.82])
sep_13 = pd.Series([290.83, 226.41, 1175.33])
sep_14 = pd.Series([290.88, 223.84, 1172.53])
sep_17 = pd.Series([289.34, 217.88, 1156.05])
sep_18 = pd.Series([290.91, 218.24, 1161.22])
sep_19 = pd.Series([291.44, 216.64, 1158.78])
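
The example in the problem statement labels each value with its ticker, while the solution cell uses the default integer index. The sketch below illustrates what the labels make possible, assuming the same September 4 values (the name `sep_04_labeled` is introduced here for illustration only):

```python
import pandas as pd

# Same values as sep_04, but indexed by ticker instead of 0, 1, 2
sep_04_labeled = pd.Series([289.81, 228.36, 1197.00],
                           index=['SPY', 'AAPL', 'GOOG'])

# Select by label rather than by position
print(sep_04_labeled['GOOG'])      # 1197.0
print(list(sep_04_labeled.index))  # ['SPY', 'AAPL', 'GOOG']
```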


## Problem: Create a Vector of Dates

Use the pandas function `pd.to_datetime` to convert a list of string dates to a pandas 
`DatetimeIndex`, which can be used to set dates in other arrays. For example, the first two dates 
are
```python
dates_2 = pd.to_datetime(['4-9-2018','5-9-2018'])
print(dates_2)
```
which produces
```python
DatetimeIndex(['2018-04-09', '2018-05-09'], dtype='datetime64[ns]', freq=None)
```

Create a vector containing all of the dates in the table.

In [5]:
dates = pd.to_datetime(['4-9-2018','5-9-2018','6-9-2018','7-9-2018',
                        '10-9-2018','11-9-2018','12-9-2018','13-9-2018','14-9-2018',
                        '17-9-2018','18-9-2018','19-9-2018'])
print(dates)

DatetimeIndex(['2018-04-09', '2018-05-09', '2018-06-09', '2018-07-09',
               '2018-10-09', '2018-11-09', '2018-12-09', '2018-09-13',
               '2018-09-14', '2018-09-17', '2018-09-18', '2018-09-19'],
              dtype='datetime64[ns]', freq=None)
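
Strings such as `'4-9-2018'` are ambiguous, since they can be read as day-month-year or as month-day-year. One way to remove the ambiguity is to pass an explicit `format` to `pd.to_datetime`. A minimal sketch, assuming the strings are written day-month-year as in the table (the name `dates_explicit` is hypothetical and only three dates are used):

```python
import pandas as pd

# %d-%m-%Y states that the first field is the day and the second is the month
dates_explicit = pd.to_datetime(['4-9-2018', '5-9-2018', '13-9-2018'],
                                format='%d-%m-%Y')
print(dates_explicit)

# Check that every parsed date falls in September
print((dates_explicit.month == 9).all())
```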


## Problem: Input a Vector with Dates

Create vectors for each of the ticker symbols in the [Table](#stock-data) named `spy`, `aapl` and 
`goog`, respectively. Use the variable `dates` that you created in the previous step. 

For example

```python
goog = pd.Series([1197.00,1186.48,1171.44,...], index=dates)
```

In [6]:
goog = pd.Series([1197.00, 1186.48, 1171.44, 1164.83, 1164.64, 1177.36,
                  1162.82, 1175.33, 1172.53, 1156.05, 1161.22, 1158.78],
                 index=dates, name='GOOG') 
aapl = pd.Series([228.36, 226.87, 223.10, 221.30, 218.33, 223.85,
                  221.07, 226.41, 223.84, 217.88, 218.24, 216.64],
                 index=dates, name='AAPL')
spy = pd.Series([289.81, 289.03, 288.16, 287.60, 288.10, 289.05,
                 289.12, 290.83, 290.88, 289.34, 290.91, 291.44],
                index=dates, name='SPY')

## Problem: Create a DataFrame

Create a DataFrame named `prices` containing the [Table](#stock-data). Set the column names equal to 
the tickers and set the index to the dates you created previously. For example, the first two rows are

```python
prices = pd.DataFrame([[289.81, 228.36, 1197.00], [289.03, 226.87, 1186.48]],
                      columns=['SPY', 'AAPL', 'GOOG'], index=dates_2)
```

In [7]:
prices = pd.DataFrame([[289.81, 228.36, 1197.00],
                       [289.03, 226.87, 1186.48],
                       [288.16, 223.10, 1171.44],
                       [287.60, 221.30, 1164.83],
                       [288.10, 218.33, 1164.64],
                       [289.05, 223.85, 1177.36],
                       [289.12, 221.07, 1162.82],
                       [290.83, 226.41, 1175.33],
                       [290.88, 223.84, 1172.53],
                       [289.34, 217.88, 1156.05],
                       [290.91, 218.24, 1161.22],
                       [291.44, 216.64, 1158.78]],
                      columns=['SPY', 'AAPL', 'GOOG'], index=dates)


## Problem: Construct a DataFrame from Series

Create a second DataFrame named `prices_row` from the row vectors previously entered such that 
the results are identical to `prices`. For example, the first two days' worth of data are

```python
prices_row = pd.DataFrame([sep_04, sep_05])
# Set the index after building the DataFrame from the rows
prices_row.index = dates_2
```

Create a third DataFrame named `prices_col` from the 3 column vectors entered such that the results 
are identical to `prices`
```python
prices_col = pd.DataFrame([spy, aapl, goog]).T
```

*Note*: The `.T` above transposes the 2-d array since `DataFrame` builds the array by rows.

Verify that all three DataFrames are identical by printing the difference, e.g., 

```python
print(prices_col - prices)
```

and checking that all elements are 0. 

In [8]:
prices_row = pd.DataFrame([sep_04, sep_05, sep_06, sep_07, sep_10, sep_11,
                           sep_12, sep_13, sep_14, sep_17, sep_18, sep_19])
prices_row.columns = ['SPY', 'AAPL', 'GOOG']
prices_row.index = dates
print(prices_row)
print(prices - prices_row)
# No need to set the index or the column names since index
# and name set in Series
prices_col = pd.DataFrame([spy, aapl, goog]).T
print(prices - prices_col)
prices_col

               SPY    AAPL     GOOG
2018-04-09  289.81  228.36  1197.00
2018-05-09  289.03  226.87  1186.48
2018-06-09  288.16  223.10  1171.44
2018-07-09  287.60  221.30  1164.83
2018-10-09  288.10  218.33  1164.64
2018-11-09  289.05  223.85  1177.36
2018-12-09  289.12  221.07  1162.82
2018-09-13  290.83  226.41  1175.33
2018-09-14  290.88  223.84  1172.53
2018-09-17  289.34  217.88  1156.05
2018-09-18  290.91  218.24  1161.22
2018-09-19  291.44  216.64  1158.78
            SPY  AAPL  GOOG
2018-04-09  0.0   0.0   0.0
2018-05-09  0.0   0.0   0.0
2018-06-09  0.0   0.0   0.0
2018-07-09  0.0   0.0   0.0
2018-10-09  0.0   0.0   0.0
2018-11-09  0.0   0.0   0.0
2018-12-09  0.0   0.0   0.0
2018-09-13  0.0   0.0   0.0
2018-09-14  0.0   0.0   0.0
2018-09-17  0.0   0.0   0.0
2018-09-18  0.0   0.0   0.0
2018-09-19  0.0   0.0   0.0
            SPY  AAPL  GOOG
2018-04-09  0.0   0.0   0.0
2018-05-09  0.0   0.0   0.0
2018-06-09  0.0   0.0   0.0
2018-07-09  0.0   0.0   0.0
2018-10-09  0.0   0.0   0.0


Unnamed: 0,SPY,AAPL,GOOG
2018-04-09,289.81,228.36,1197.0
2018-05-09,289.03,226.87,1186.48
2018-06-09,288.16,223.1,1171.44
2018-07-09,287.6,221.3,1164.83
2018-10-09,288.1,218.33,1164.64
2018-11-09,289.05,223.85,1177.36
2018-12-09,289.12,221.07,1162.82
2018-09-13,290.83,226.41,1175.33
2018-09-14,290.88,223.84,1172.53
2018-09-17,289.34,217.88,1156.05
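
Printing the difference works for small tables, but a single summary number is easier to check. The sketch below builds two small frames in the two ways described in this problem and reports the largest absolute difference. It assumes only the first two days of data, and `by_rows`, `by_cols` and `max_diff` are illustrative names:

```python
import pandas as pd

dates_2 = pd.to_datetime(['2018-09-04', '2018-09-05'])
tickers = ['SPY', 'AAPL', 'GOOG']

# Built from a nested list, one inner list per day
by_rows = pd.DataFrame([[289.81, 228.36, 1197.00],
                        [289.03, 226.87, 1186.48]],
                       columns=tickers, index=dates_2)

# Built from one named Series per ticker, then transposed
spy = pd.Series([289.81, 289.03], index=dates_2, name='SPY')
aapl = pd.Series([228.36, 226.87], index=dates_2, name='AAPL')
goog = pd.Series([1197.00, 1186.48], index=dates_2, name='GOOG')
by_cols = pd.DataFrame([spy, aapl, goog]).T

# Largest absolute difference across all cells; 0.0 means the values agree
max_diff = (by_rows - by_cols).abs().max().max()
print(max_diff)
```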


## Problem: Saving Data 
Save the `prices` DataFrame to a pickle using `prices.to_pickle('prices.pkl')`.

Delete the `prices` variable using `del prices`, and then load it back using 
`prices = pd.read_pickle('prices.pkl')`. Finally, print the loaded data to verify it is the same.

In [9]:
prices.to_pickle('prices.pkl')
del prices
prices = pd.read_pickle('prices.pkl')
# Print first 5 rows
print(prices.head())

               SPY    AAPL     GOOG
2018-04-09  289.81  228.36  1197.00
2018-05-09  289.03  226.87  1186.48
2018-06-09  288.16  223.10  1171.44
2018-07-09  287.60  221.30  1164.83
2018-10-09  288.10  218.33  1164.64
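
The round trip can also be checked programmatically rather than by eye. A minimal sketch, assuming a two-row frame and a temporary directory so that no file is left behind (`prices_2`, `loaded` and the temporary path are illustrative, not part of the exercise):

```python
import os
import tempfile

import pandas as pd

prices_2 = pd.DataFrame([[289.81, 228.36, 1197.00],
                         [289.03, 226.87, 1186.48]],
                        columns=['SPY', 'AAPL', 'GOOG'])

# Write to a temporary directory that is removed when the block ends
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'prices.pkl')
    prices_2.to_pickle(path)
    loaded = pd.read_pickle(path)

# True when the loaded frame has the same shape and elements as the original
print(loaded.equals(prices_2))
```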


## Problem: Addition and Subtraction

Add the prices of the three series together using `.sum(axis=1)`. Add the prices in `sep_04` to 
the prices of `goog`. What happens? 

In [10]:
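prices.sum(axis=1)

2018-04-09    1715.17
2018-05-09    1702.38
2018-06-09    1682.70
2018-07-09    1673.73
2018-10-09    1671.07
2018-11-09    1690.26
2018-12-09    1673.01
2018-09-13    1692.57
2018-09-14    1687.25
2018-09-17    1663.27
2018-09-18    1670.37
2018-09-19    1666.86
dtype: float64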

In [11]:
sep_04 + goog


2018-04-09 00:00:00   NaN
2018-05-09 00:00:00   NaN
2018-06-09 00:00:00   NaN
2018-07-09 00:00:00   NaN
2018-10-09 00:00:00   NaN
2018-11-09 00:00:00   NaN
2018-12-09 00:00:00   NaN
2018-09-13 00:00:00   NaN
2018-09-14 00:00:00   NaN
2018-09-17 00:00:00   NaN
2018-09-18 00:00:00   NaN
2018-09-19 00:00:00   NaN
0                     NaN
1                     NaN
2                     NaN
dtype: float64
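
What happens is index alignment: pandas adds values whose labels match, and any label that appears in only one Series produces `NaN`. Here `sep_04` is labeled 0, 1, 2 while `goog` is labeled by date, so no labels match. A small sketch with made-up labels (`a`, `b` and `c` are purely illustrative):

```python
import pandas as pd

x = pd.Series([1.0, 2.0], index=['a', 'b'])
y = pd.Series([10.0, 20.0], index=['b', 'c'])

# Only label 'b' appears in both, so only 'b' has a value in the sum
total = x + y
print(total)
# a     NaN
# b    12.0
# c     NaN
# dtype: float64
```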

## Problem: Multiplication

Multiply the price of Google by 2. 

In [12]:
2 * goog

2018-04-09    2394.00
2018-05-09    2372.96
2018-06-09    2342.88
2018-07-09    2329.66
2018-10-09    2329.28
2018-11-09    2354.72
2018-12-09    2325.64
2018-09-13    2350.66
2018-09-14    2345.06
2018-09-17    2312.10
2018-09-18    2322.44
2018-09-19    2317.56
Name: GOOG, dtype: float64

## Problem: Constructing portfolio returns
Set up a vector of portfolio weights $w=\left(\frac{1}{3},\,\frac{1}{3},\,\frac{1}{3}\right)$ and 
compute the price of a portfolio with $\frac{1}{3}$ share of each.

*Note*: Division uses the slash operator (/). 

In [13]:
import numpy as np

w = np.array([1/3, 1/3, 1/3])

port_price = prices @ w
print(port_price)

2018-04-09    571.723333
2018-05-09    567.460000
2018-06-09    560.900000
2018-07-09    557.910000
2018-10-09    557.023333
2018-11-09    563.420000
2018-12-09    557.670000
2018-09-13    564.190000
2018-09-14    562.416667
2018-09-17    554.423333
2018-09-18    556.790000
2018-09-19    555.620000
dtype: float64
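
For each date, the `@` operator computes the weighted sum of the three prices in that row. The sketch below checks this against the same calculation written out with element-by-element multiplication and a row sum. It assumes only the first two days of data, and `prices_2`, `via_matmul` and `via_sum` are illustrative names:

```python
import numpy as np
import pandas as pd

prices_2 = pd.DataFrame([[289.81, 228.36, 1197.00],
                         [289.03, 226.87, 1186.48]],
                        columns=['SPY', 'AAPL', 'GOOG'])
w = np.array([1/3, 1/3, 1/3])

# Matrix-vector product: one weighted sum per row
via_matmul = prices_2 @ w

# Same calculation written out: scale each column, then sum across columns
via_sum = (prices_2 * w).sum(axis=1)

print(np.allclose(via_matmul, via_sum))  # True
```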


## Problem: Compute Returns

Compute returns using 

```python
returns = prices.pct_change()
```

which computes the percentage change.

Additionally, extract returns for each name using 

```python
spy_returns = returns['SPY']
```


In [14]:
returns = prices.pct_change()
print(returns)

                 SPY      AAPL      GOOG
2018-04-09       NaN       NaN       NaN
2018-05-09 -0.002691 -0.006525 -0.008789
2018-06-09 -0.003010 -0.016617 -0.012676
2018-07-09 -0.001943 -0.008068 -0.005643
2018-10-09  0.001739 -0.013421 -0.000163
2018-11-09  0.003297  0.025283  0.010922
2018-12-09  0.000242 -0.012419 -0.012350
2018-09-13  0.005914  0.024155  0.010758
2018-09-14  0.000172 -0.011351 -0.002382
2018-09-17 -0.005294 -0.026626 -0.014055
2018-09-18  0.005426  0.001652  0.004472
2018-09-19  0.001822 -0.007331 -0.002101


In [15]:
# First row is missing since no data on Sep 3, can use .dropna() to remove rows with missing values
print(returns.dropna())

                 SPY      AAPL      GOOG
2018-05-09 -0.002691 -0.006525 -0.008789
2018-06-09 -0.003010 -0.016617 -0.012676
2018-07-09 -0.001943 -0.008068 -0.005643
2018-10-09  0.001739 -0.013421 -0.000163
2018-11-09  0.003297  0.025283  0.010922
2018-12-09  0.000242 -0.012419 -0.012350
2018-09-13  0.005914  0.024155  0.010758
2018-09-14  0.000172 -0.011351 -0.002382
2018-09-17 -0.005294 -0.026626 -0.014055
2018-09-18  0.005426  0.001652  0.004472
2018-09-19  0.001822 -0.007331 -0.002101


In [16]:
spy_returns = returns['SPY']
goog_returns = returns['GOOG']
aapl_returns = returns['AAPL']
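
The percentage change is $\frac{P_{t}}{P_{t-1}}-1$, which can be written directly with `shift`. A minimal sketch using the first three Google prices, assuming the illustrative names `goog_3`, `manual` and `builtin`:

```python
import pandas as pd

goog_3 = pd.Series([1197.00, 1186.48, 1171.44])

# shift(1) moves each price down one row, so row t holds the price from t-1
manual = goog_3 / goog_3.shift(1) - 1
builtin = goog_3.pct_change()

# The first entry is NaN in both because there is no earlier price
print(manual)
print((manual - builtin).abs().max())
```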

## Problem: Compute Log Returns

Compute log returns using

```python
import numpy as np

log_returns = np.log(prices).diff()
```

which computes the first difference of the natural log of the prices. Mathematically this is 
$r_{t}=\ln\left(P_{t}\right)-\ln\left(P_{t-1}\right)=\ln\left(\frac{P_{t}}{P_{t-1}}\right)\approx\frac{P_{t}}{P_{t-1}}-1$.

In [17]:
log_returns = np.log(prices).diff()
log_returns

Unnamed: 0,SPY,AAPL,GOOG
2018-04-09,,,
2018-05-09,-0.002695,-0.006546,-0.008827
2018-06-09,-0.003015,-0.016757,-0.012757
2018-07-09,-0.001945,-0.008101,-0.005659
2018-10-09,0.001737,-0.013512,-0.000163
2018-11-09,0.003292,0.024969,0.010863
2018-12-09,0.000242,-0.012497,-0.012427
2018-09-13,0.005897,0.023868,0.010701
2018-09-14,0.000172,-0.011416,-0.002385
2018-09-17,-0.005308,-0.026987,-0.014155
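
Since $\ln\left(\frac{P_{t}}{P_{t-1}}\right)=\ln\left(1+\left(\frac{P_{t}}{P_{t-1}}-1\right)\right)$, the log return is the log of one plus the simple return. The sketch below checks this and shows how small the gap between the two definitions is for daily data. It assumes the first three Google prices, and the variable names are illustrative:

```python
import numpy as np
import pandas as pd

goog_3 = pd.Series([1197.00, 1186.48, 1171.44])

simple = goog_3.pct_change()
log_ret = np.log(goog_3).diff()

# log1p(x) computes ln(1 + x), so this should reproduce log_ret
from_simple = np.log1p(simple)
print(np.allclose(log_ret.dropna(), from_simple.dropna()))  # True

# Largest gap between simple and log returns in this sample
print((simple - log_ret).abs().max())
```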


## Problem: Mean, Standard Deviation and Correlation

Using the `mean` method, compute the mean of the three returns series one at a time. For example  
```python
goog_mean = goog_returns.mean()
```
Next, compute the mean of the matrix of returns using  

```python
retmean = returns.mean()
```

What is the relationship between these two? Repeat this exercise for the standard deviation (`std()`).
Finally, compute the correlation of the matrix of returns (`corr()`). 

In [18]:
goog_mean = goog_returns.mean()
spy_mean = spy_returns.mean()
aapl_mean = aapl_returns.mean()
print(spy_mean, aapl_mean, goog_mean)

0.0005157696449953846 -0.0046607598305810375 -0.0029096851629990942


In [19]:
returns.mean()

SPY     0.000516
AAPL   -0.004661
GOOG   -0.002910
dtype: float64

In [20]:
returns.std()

SPY     0.003562
AAPL    0.016113
GOOG    0.008899
dtype: float64

In [21]:
returns.corr()

Unnamed: 0,SPY,AAPL,GOOG
SPY,1.0,0.773894,0.883557
AAPL,0.773894,1.0,0.892719
GOOG,0.883557,0.892719,1.0
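
On the relationship asked about in this problem: the DataFrame methods operate column by column, so each entry of the DataFrame result should match the value computed from the single series. A minimal sketch, assuming a small frame with three returns per ticker (`returns_small` is an illustrative name):

```python
import math

import pandas as pd

# Three illustrative returns per ticker
returns_small = pd.DataFrame({'SPY': [-0.002691, -0.003010, -0.001943],
                              'GOOG': [-0.008789, -0.012676, -0.005643]})

col_means = returns_small.mean()          # one mean per column
goog_mean = returns_small['GOOG'].mean()  # mean of a single column

# The GOOG entry of the column-wise result matches the single-series mean
print(math.isclose(col_means['GOOG'], goog_mean))  # True

# The same holds for the standard deviation
print(math.isclose(returns_small.std()['GOOG'], returns_small['GOOG'].std()))  # True
```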


## Problem: Summing all elements

Compute the sum of the columns of returns using `.sum()`. How is this related to the mean computed 
in the previous step? 

In [22]:
returns.dropna().mean()



SPY     0.000516
AAPL   -0.004661
GOOG   -0.002910
dtype: float64

In [23]:
returns.sum() / 11

SPY     0.000516
AAPL   -0.004661
GOOG   -0.002910
dtype: float64
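
The divisor is 11 rather than 12 because the first row of returns is missing, and both `sum` and `mean` skip `NaN` by default. Dividing by `count()`, which returns the number of non-missing values, avoids hard-coding the number. A small sketch with made-up values (`r` is illustrative):

```python
import numpy as np
import pandas as pd

r = pd.Series([np.nan, 0.01, -0.02, 0.03])

# count() gives the number of non-missing values
print(r.count())  # 3

# sum divided by the non-missing count reproduces the mean
print(np.isclose(r.sum() / r.count(), r.mean()))  # True
```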

## Problem: Maximum and Minimum Values
Compute the minimum and maximum values of the columns of returns using the `min()` and `max()` methods. 

In [24]:
returns.min()

SPY    -0.005294
AAPL   -0.026626
GOOG   -0.014055
dtype: float64

In [25]:
returns.max()

SPY     0.005914
AAPL    0.025283
GOOG    0.010922
dtype: float64

## Problem: Rounding Up, Down and to the Closest Integer

Rounding up is handled by `np.ceil`, rounding down is handled by `np.floor` and rounding to the 
closest integer is handled by the `round` method. Try all of these on 100 times returns. For example,  
```python
rounded = (100*returns).round()
``` 

Use `np.ceil` and `np.floor` to round up and down, respectively.

In [26]:
rounded = (100*returns).round()
rounded

Unnamed: 0,SPY,AAPL,GOOG
2018-04-09,,,
2018-05-09,-0.0,-1.0,-1.0
2018-06-09,-0.0,-2.0,-1.0
2018-07-09,-0.0,-1.0,-1.0
2018-10-09,0.0,-1.0,-0.0
2018-11-09,0.0,3.0,1.0
2018-12-09,0.0,-1.0,-1.0
2018-09-13,1.0,2.0,1.0
2018-09-14,0.0,-1.0,-0.0
2018-09-17,-1.0,-3.0,-1.0


In [27]:
ceiled = np.ceil(100*returns)
ceiled

Unnamed: 0,SPY,AAPL,GOOG
2018-04-09,,,
2018-05-09,-0.0,-0.0,-0.0
2018-06-09,-0.0,-1.0,-1.0
2018-07-09,-0.0,-0.0,-0.0
2018-10-09,1.0,-1.0,-0.0
2018-11-09,1.0,3.0,2.0
2018-12-09,1.0,-1.0,-1.0
2018-09-13,1.0,3.0,2.0
2018-09-14,1.0,-1.0,-0.0
2018-09-17,-0.0,-2.0,-1.0


In [28]:
floored = np.floor(100*returns)
floored

Unnamed: 0,SPY,AAPL,GOOG
2018-04-09,,,
2018-05-09,-1.0,-1.0,-1.0
2018-06-09,-1.0,-2.0,-2.0
2018-07-09,-1.0,-1.0,-1.0
2018-10-09,0.0,-2.0,-1.0
2018-11-09,0.0,2.0,1.0
2018-12-09,0.0,-2.0,-2.0
2018-09-13,0.0,2.0,1.0
2018-09-14,0.0,-2.0,-1.0
2018-09-17,-1.0,-3.0,-2.0
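
The three operations differ most visibly on negative numbers. A small sketch on made-up values (the Series `x` is illustrative and not taken from the returns):

```python
import numpy as np
import pandas as pd

x = pd.Series([-1.6, -0.2, 0.2, 1.7])

rounded_up = np.ceil(x)     # towards positive infinity
rounded_down = np.floor(x)  # towards negative infinity
nearest = x.round()         # to the closest integer

print(rounded_up.tolist())    # [-1.0, -0.0, 1.0, 2.0]
print(rounded_down.tolist())  # [-2.0, -1.0, 0.0, 1.0]
print(nearest.tolist())       # [-2.0, -0.0, 0.0, 2.0]
```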


## Problem: Element-by-Element Multiplication

Mathematical operators applied to NumPy arrays and pandas objects work element-by-element, except 
the `@` operator, which is matrix multiplication and uses the rules of linear algebra. 

Multiply the returns of Google and SPY together using the `*` operator. 

In [29]:
goog_returns * spy_returns

2018-04-09             NaN
2018-05-09    2.365390e-05
2018-06-09    3.815608e-05
2018-07-09    1.096568e-05
2018-10-09   -2.835778e-07
2018-11-09    3.601436e-05
2018-12-09   -2.990751e-06
2018-09-13    6.363013e-05
2018-09-14   -4.095708e-07
2018-09-17    7.441151e-05
2018-09-18    2.426639e-05
2018-09-19   -3.828182e-06
dtype: float64
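
The difference between the two operators is easiest to see on short arrays. A minimal sketch with made-up numbers (`a` and `b` are illustrative):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# * multiplies matching elements and returns an array of the same length
elementwise = a * b
print(elementwise)  # [ 4. 10. 18.]

# @ on two 1-d arrays is the inner product, a single number
inner = a @ b
print(inner)  # 32.0
```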

## Problem: Save Everything
Save everything created using `dill`.

```python
import dill

dill.dump_session('lesson-4.dill')
```

You can load everything using `dill.load_session('lesson-4.dill')` later if you want to get 
the data back.

In [30]:
import dill

dill.dump_session('lesson-4.dill')
# Show file contents using ls. %ls is an IPython magic function.
%ls *.dill

 Volume in drive C has no label.
 Volume Serial Number is 5303-EBBC

 Directory of c:\git\python-introduction\solutions

09/27/2019  04:25 PM            28,307 lesson-4.dill
               1 File(s)         28,307 bytes
               0 Dir(s)  39,200,886,784 bytes free
