# Series and DataFrames

This lesson covers:

* Manually inputting data in scalars, vectors, and matrices 
* Basic mathematical operations 
* Saving and loading data 

<a id="stock-data"></a>
## Data
September 2018 prices (adjusted closing prices) for the S&P 500 ETF (SPY), Apple (AAPL) and 
Google (GOOG) are listed below:



| Date   | SPY Price | AAPL Price | GOOG Price | 
|:-------|----------:|-----------:|-----------:| 
| Sept4  | 289.81    | 228.36     | 1197.00    | 
| Sept5  | 289.03    | 226.87     | 1186.48    | 
| Sept6  | 288.16    | 223.10     | 1171.44    | 
| Sept7  | 287.60    | 221.30     | 1164.83    | 
| Sept10 | 288.10    | 218.33     | 1164.64    | 
| Sept11 | 289.05    | 223.85     | 1177.36    | 
| Sept12 | 289.12    | 221.07     | 1162.82    | 
| Sept13 | 290.83    | 226.41     | 1175.33    | 
| Sept14 | 290.88    | 223.84     | 1172.53    | 
| Sept17 | 289.34    | 217.88     | 1156.05    | 
| Sept18 | 290.91    | 218.24     | 1161.22    | 
| Sept19 | 291.44    | 216.64     | 1158.78    | 

**Prices in September 2018**
 

## Problem: Input scalar data

Create three variables named `spy`, `aapl` and `goog` that contain the
September 4 price of each asset. For example, to enter the Google data:
```python
goog = 1197.00
```

In [1]:
spy = 289.81
aapl = 228.36
goog = 1197.00

## Problem: Print the values
Print the values of the three variables you created in the previous step using `print`.

In [2]:
print(spy)
print(aapl)
print(goog)
print(spy, aapl, goog)

289.81
228.36
1197.0
289.81 228.36 1197.0


## Problem: Print the values with formatting
Print the values of the three variables you created in the previous step using format strings following
the pattern TICKER: Value. For example, you can print the value of Google using `print(f'GOOG: {goog}')`.

In [3]:
print(f'GOOG: {goog}')
print(f'SPY: {spy}')
print(f'AAPL: {aapl}')

GOOG: 1197.0
SPY: 289.81
AAPL: 228.36
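
The output above shows `1197.0` rather than `1197.00` because a bare `{goog}` uses the default float representation. As a minimal sketch, assuming two decimal places are wanted, a format specification after a colon inside the braces controls the number of digits. The variable `line` is only illustrative.

```python
goog = 1197.00

# `.2f` requests fixed-point notation with two digits after the decimal point
line = f'GOOG: {goog:.2f}'
print(line)  # GOOG: 1197.00
```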


## Problem: Input a Vector

Create a vector for each of the days in the [Table](#stock-data) named `sep_xx`, where `xx` is the 
two-digit day of the month. For example,  
```python
import pandas as pd

sep_04 = pd.Series([289.81, 228.36, 1197.00], index=['SPY', 'AAPL', 'GOOG'])
```

In [4]:
import pandas as pd

sep_04 = pd.Series([289.81, 228.36, 1197.00])
sep_05 = pd.Series([289.03, 226.87, 1186.48])
sep_06 = pd.Series([288.16, 223.10, 1171.44])
sep_07 = pd.Series([287.60, 221.30, 1164.83])
sep_10 = pd.Series([288.10, 218.33, 1164.64])
sep_11 = pd.Series([289.05, 223.85, 1177.36])
sep_12 = pd.Series([289.12, 221.07, 1162.82])
sep_13 = pd.Series([290.83, 226.41, 1175.33])
sep_14 = pd.Series([290.88, 223.84, 1172.53])
sep_17 = pd.Series([289.34, 217.88, 1156.05])
sep_18 = pd.Series([290.91, 218.24, 1161.22])
sep_19 = pd.Series([291.44, 216.64, 1158.78])
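
As a sketch of why the `index` argument shown in the example can be useful: when a `Series` carries ticker labels, elements can be selected by label rather than by position. The name `sep_04_labeled` is hypothetical and is used only so that the unlabeled `sep_04` above is not overwritten.

```python
import pandas as pd

# Same September 4 prices, now labeled by ticker
sep_04_labeled = pd.Series([289.81, 228.36, 1197.00],
                           index=['SPY', 'AAPL', 'GOOG'])

# Select by label with .loc and by position with .iloc
aapl_by_label = sep_04_labeled.loc['AAPL']
aapl_by_position = sep_04_labeled.iloc[1]
print(aapl_by_label, aapl_by_position)  # 228.36 228.36
```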


## Problem: Create a Vector of Dates

Use the pandas function `pd.to_datetime` to convert a list of string dates to a pandas 
`DatetimeIndex`, which can be used to set dates in other arrays. The strings below are written 
day-month-year, so pass `format='%d-%m-%Y'`; without it, pandas reads `'4-9-2018'` as April 9. 
For example, the first two dates are
```python
dates_2 = pd.to_datetime(['4-9-2018','5-9-2018'], format='%d-%m-%Y')
print(dates_2)
```
which produces
```python
DatetimeIndex(['2018-09-04', '2018-09-05'], dtype='datetime64[ns]', freq=None)
```

Create a vector containing all of the dates in the table.

In [5]:
dates = pd.to_datetime(['4-9-2018','5-9-2018','6-9-2018','7-9-2018',
                        '10-9-2018','11-9-2018','12-9-2018','13-9-2018','14-9-2018',
                        '17-9-2018','18-9-2018','19-9-2018'], format='%d-%m-%Y')
print(dates)

DatetimeIndex(['2018-09-04', '2018-09-05', '2018-09-06', '2018-09-07',
               '2018-09-10', '2018-09-11', '2018-09-12', '2018-09-13',
               '2018-09-14', '2018-09-17', '2018-09-18', '2018-09-19'],
              dtype='datetime64[ns]', freq=None)
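
Strings such as `'4-9-2018'` are ambiguous: they can be read as day-month-year or as month-day-year. The sketch below uses only the first date in the table and shows how the default parse differs from one with an explicit `format`. The variable names are illustrative.

```python
import pandas as pd

# Without a format, pandas reads the first number as the month
month_first = pd.to_datetime('4-9-2018')

# With an explicit day-month-year format, the same string is September 4
day_first = pd.to_datetime('4-9-2018', format='%d-%m-%Y')

print(month_first.date())  # 2018-04-09
print(day_first.date())    # 2018-09-04
```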


## Problem: Input a Vector with Dates

Create a vector for each of the ticker symbols in [Table](#stock-data), named `spy`, `aapl` and 
`goog`, respectively. Use the variable `dates` that you created in the previous step as the index. 

For example

```python
goog = pd.Series([1197.00,1186.48,1171.44,...], index=dates)
```

In [6]:
goog = pd.Series([1197.00, 1186.48, 1171.44, 1164.83, 1164.64, 1177.36,
                  1162.82, 1175.33, 1172.53, 1156.05, 1161.22, 1158.78],
                 index=dates, name='GOOG') 
aapl = pd.Series([228.36, 226.87, 223.10, 221.30, 218.33, 223.85,
                  221.07, 226.41, 223.84, 217.88, 218.24, 216.64],
                 index=dates, name='AAPL')
spy = pd.Series([289.81, 289.03, 288.16, 287.60, 288.10, 289.05,
                 289.12, 290.83, 290.88, 289.34, 290.91, 291.44],
                index=dates, name='SPY')
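
A `Series` indexed by dates can be queried by date. The following is a minimal sketch using only the first three Google prices and ISO-formatted date strings (an assumption made here to avoid any day/month ambiguity); `goog_3` and `idx` are illustrative names.

```python
import pandas as pd

idx = pd.to_datetime(['2018-09-04', '2018-09-05', '2018-09-06'])
goog_3 = pd.Series([1197.00, 1186.48, 1171.44], index=idx, name='GOOG')

# Look up a single day with a Timestamp
sep_05_price = goog_3.loc[pd.Timestamp('2018-09-05')]

# Slice a date range; label-based slices include both endpoints
last_two = goog_3.loc['2018-09-05':'2018-09-06']

print(sep_05_price)   # 1186.48
print(len(last_two))  # 2
```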

## Problem: Create a DataFrame

Create a DataFrame named `prices` containing the data in [Table](#stock-data). Set the column names 
equal to the tickers and set the index to the dates you created previously. For example, the first 
two rows are

```python
prices = pd.DataFrame([[289.81, 228.36, 1197.00], [289.03, 226.87, 1186.48]],
                      columns=['SPY', 'AAPL', 'GOOG'], index=dates_2)
```

In [7]:
prices = pd.DataFrame([[289.81, 228.36, 1197.00],
                       [289.03, 226.87, 1186.48],
                       [288.16, 223.10, 1171.44],
                       [287.60, 221.30, 1164.83],
                       [288.10, 218.33, 1164.64],
                       [289.05, 223.85, 1177.36],
                       [289.12, 221.07, 1162.82],
                       [290.83, 226.41, 1175.33],
                       [290.88, 223.84, 1172.53],
                       [289.34, 217.88, 1156.05],
                       [290.91, 218.24, 1161.22],
                       [291.44, 216.64, 1158.78]],
                      columns=['SPY', 'AAPL', 'GOOG'], index=dates)
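
Once the data are in a `DataFrame`, columns and individual cells can be selected by label. The following is a minimal sketch using the first two rows of the table; the names `small` and `idx` are illustrative, and ISO date strings are assumed.

```python
import pandas as pd

idx = pd.to_datetime(['2018-09-04', '2018-09-05'])
small = pd.DataFrame([[289.81, 228.36, 1197.00],
                      [289.03, 226.87, 1186.48]],
                     columns=['SPY', 'AAPL', 'GOOG'], index=idx)

# A single column is returned as a Series
aapl_col = small['AAPL']

# .loc selects by row label and column label
goog_sep_05 = small.loc[pd.Timestamp('2018-09-05'), 'GOOG']

print(small.shape)  # (2, 3)
print(goog_sep_05)  # 1186.48
```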


## Problem: Construct a DataFrame from Series

Create a second DataFrame named `prices_row` from the row vectors previously entered such that 
the results are identical to `prices`. For example, the first two days' worth of data are

```python
prices_row = pd.DataFrame([sep_04, sep_05])
# Set the index after building the DataFrame from the row vectors
prices_row.index = dates_2
```

Create a third DataFrame named `prices_col` from the 3 column vectors entered such that the results 
are identical to `prices`:
```python
prices_col = pd.DataFrame([spy, aapl, goog]).T
```

*Note*: The `.T` above transposes the 2-d array since `DataFrame` builds the array by rows.

Verify that all three DataFrames are identical by printing the difference, e.g., 

```python
print(prices_col - prices)
```

and checking that all elements are 0. 

In [8]:
prices_row = pd.DataFrame([sep_04, sep_05, sep_06, sep_07, sep_10, sep_11,
                           sep_12, sep_13, sep_14, sep_17, sep_18, sep_19])
prices_row.columns = ['SPY', 'AAPL', 'GOOG']
prices_row.index = dates
print(prices_row)
print(prices - prices_row)
# No need to set the index or the column names here, since each
# Series already carries the date index and a name
prices_col = pd.DataFrame([spy, aapl, goog]).T
print(prices - prices_col)
prices_col


               SPY    AAPL     GOOG
2018-09-04  289.81  228.36  1197.00
2018-09-05  289.03  226.87  1186.48
2018-09-06  288.16  223.10  1171.44
2018-09-07  287.60  221.30  1164.83
2018-09-10  288.10  218.33  1164.64
2018-09-11  289.05  223.85  1177.36
2018-09-12  289.12  221.07  1162.82
2018-09-13  290.83  226.41  1175.33
2018-09-14  290.88  223.84  1172.53
2018-09-17  289.34  217.88  1156.05
2018-09-18  290.91  218.24  1161.22
2018-09-19  291.44  216.64  1158.78
            SPY  AAPL  GOOG
2018-09-04  0.0   0.0   0.0
2018-09-05  0.0   0.0   0.0
2018-09-06  0.0   0.0   0.0
2018-09-07  0.0   0.0   0.0
2018-09-10  0.0   0.0   0.0
2018-09-11  0.0   0.0   0.0
2018-09-12  0.0   0.0   0.0
2018-09-13  0.0   0.0   0.0
2018-09-14  0.0   0.0   0.0
2018-09-17  0.0   0.0   0.0
2018-09-18  0.0   0.0   0.0
2018-09-19  0.0   0.0   0.0
            SPY  AAPL  GOOG
2018-09-04  0.0   0.0   0.0
2018-09-05  0.0   0.0   0.0
2018-09-06  0.0   0.0   0.0
2018-09-07  0.0   0.0   0.0
2018-09-10  0.0   0.0   0.0
2018-09-11  0.0   0.0   0.0
2018-09-12  0.0   0.0   0.0
2018-09-13  0.0   0.0   0.0
2018-09-14  0.0   0.0   0.0
2018-09-17  0.0   0.0   0.0
2018-09-18  0.0   0.0   0.0
2018-09-19  0.0   0.0   0.0

               SPY    AAPL     GOOG
2018-09-04  289.81  228.36  1197.00
2018-09-05  289.03  226.87  1186.48
2018-09-06  288.16  223.10  1171.44
2018-09-07  287.60  221.30  1164.83
2018-09-10  288.10  218.33  1164.64
2018-09-11  289.05  223.85  1177.36
2018-09-12  289.12  221.07  1162.82
2018-09-13  290.83  226.41  1175.33
2018-09-14  290.88  223.84  1172.53
2018-09-17  289.34  217.88  1156.05
2018-09-18  290.91  218.24  1161.22
2018-09-19  291.44  216.64  1158.78
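
An alternative to transposing is `pd.concat` with `axis=1`, which joins named `Series` as columns directly. This is a sketch on the first two days only, not the approach used in the solution; `spy_2`, `aapl_2`, `goog_2` and `idx` are illustrative names. It also reduces the comparison to a single number instead of a table of zeros.

```python
import pandas as pd

idx = pd.to_datetime(['2018-09-04', '2018-09-05'])
spy_2 = pd.Series([289.81, 289.03], index=idx, name='SPY')
aapl_2 = pd.Series([228.36, 226.87], index=idx, name='AAPL')
goog_2 = pd.Series([1197.00, 1186.48], index=idx, name='GOOG')

# Transpose approach: each Series becomes a row, then .T flips rows to columns
by_transpose = pd.DataFrame([spy_2, aapl_2, goog_2]).T

# concat approach: each Series becomes a column named after the Series
by_concat = pd.concat([spy_2, aapl_2, goog_2], axis=1)

# Largest absolute difference across all cells; 0.0 means identical values
max_diff = (by_transpose - by_concat).abs().max().max()
print(max_diff)  # 0.0
```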


## Save the price data

This block saves `prices`, `goog` and `sep_04` to an HDF file for use in later lessons.

In [9]:
# Setup: Save prices, goog and sep_04 into a single file for use in other lessons

# Only run if prices has been defined
if 'prices' in globals():
    with pd.HDFStore('lesson-6.h5', mode='w') as h5:
        h5.put('prices', prices)
        h5.put('goog', goog)
        h5.put('sep_04', sep_04)
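
To read the data back in a later session, `pd.read_hdf` can be used. The round trip below is a sketch under a few assumptions: it writes a two-row frame to a temporary file (the path and the name `small` are illustrative, not the lesson's `lesson-6.h5`), and it assumes the optional PyTables package is installed, falling back to `None` if it is not.

```python
import os
import tempfile

import pandas as pd

idx = pd.to_datetime(['2018-09-04', '2018-09-05'])
small = pd.DataFrame([[289.81, 228.36, 1197.00],
                      [289.03, 226.87, 1186.48]],
                     columns=['SPY', 'AAPL', 'GOOG'], index=idx)

# Write to a throwaway location so no existing file is overwritten
path = os.path.join(tempfile.mkdtemp(), 'example.h5')

try:
    small.to_hdf(path, key='small', mode='w')
    loaded = pd.read_hdf(path, key='small')
except ImportError:
    # HDF support in pandas needs the optional PyTables package
    loaded = None

if loaded is not None:
    # Compare the stored values with the originals, cell by cell
    print((loaded.values == small.values).all())
```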