# Importing Data

This lesson covers:

* Importing data 
* Converting dates 

## Problem: Reading in data with Dates

Read in the files `GS10.csv` and `GS10.xls`, both of which were downloaded
from [FRED](https://fred.stlouisfed.org/).

In [1]:
import pandas as pd

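# Use the DATE column as the index and parse it as dates while reading
gs10_csv = pd.read_csv("data/GS10.csv", index_col="DATE", parse_dates=True)
print(gs10_csv.head())

# skiprows=10 skips the rows that sit above the column headers in the Excel file
gs10_excel = pd.read_excel("data/GS10.xls", skiprows=10, index_col="observation_date")
print(gs10_excel.head())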

            GS10
DATE            
1953-04-01  2.83
1953-05-01  3.05
1953-06-01  3.11
1953-07-01  2.93
1953-08-01  2.95
                  GS10
observation_date      
1953-04-01        2.83
1953-05-01        3.05
1953-06-01        3.11
1953-07-01        2.93
1953-08-01        2.95
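
To see what `parse_dates=True` changes, the following minimal sketch reads the same small CSV twice, once with date parsing and once without. The in-memory `csv_text` is a hypothetical stand-in for `data/GS10.csv` (same layout, first rows only), and the names `parsed` and `unparsed` are used only for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for data/GS10.csv (assumed layout, first rows only)
csv_text = """DATE,GS10
1953-04-01,2.83
1953-05-01,3.05
1953-06-01,3.11
"""

# With index_col set, parse_dates=True asks pandas to parse the index as dates
parsed = pd.read_csv(io.StringIO(csv_text), index_col="DATE", parse_dates=True)

# Without parse_dates, the index values stay as plain strings
unparsed = pd.read_csv(io.StringIO(csv_text), index_col="DATE")

print(type(parsed.index).__name__)
print(type(unparsed.index).__name__)
```

Only the parsed version has a `DatetimeIndex`, which is what date-based selection and resampling rely on.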


## Problem: Converting Dates

1. Load the CSV file without converting the dates in `read_csv`.
2. Convert the date column, remove it from the DataFrame, and set it as the
   index. 

In [2]:
gs10_csv = pd.read_csv("data/GS10.csv")
index = pd.to_datetime(gs10_csv["DATE"])
gs10_csv.index = index
del gs10_csv["DATE"]
print(gs10_csv.head())

gs10_csv = pd.read_csv("data/GS10.csv")
# Pop gets a column AND removes it from the frame
index = pd.to_datetime(gs10_csv.pop("DATE"))
gs10_csv.index = index
print(gs10_csv.head())

            GS10
DATE            
1953-04-01  2.83
1953-05-01  3.05
1953-06-01  3.11
1953-07-01  2.93
1953-08-01  2.95
            GS10
DATE            
1953-04-01  2.83
1953-05-01  3.05
1953-06-01  3.11
1953-07-01  2.93
1953-08-01  2.95
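
A third variant, sketched here as an alternative to the two approaches above, converts the column in place and then calls `set_index`, which moves the column into the index and removes it from the columns in one step. The small frame `raw` is a hypothetical stand-in for the unparsed result of `pd.read_csv("data/GS10.csv")`, and the explicit `format` string is an assumption about how the dates are written.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by pd.read_csv("data/GS10.csv")
raw = pd.DataFrame(
    {
        "DATE": ["1953-04-01", "1953-05-01", "1953-06-01"],
        "GS10": [2.83, 3.05, 3.11],
    }
)

# Convert the strings to datetimes; format is optional, but it makes the
# assumed year-month-day layout explicit
raw["DATE"] = pd.to_datetime(raw["DATE"], format="%Y-%m-%d")

# set_index returns a new frame with DATE as the index and no DATE column
converted = raw.set_index("DATE")
print(converted.head())
```

Unlike `pop` and `del`, `set_index` does not modify `raw` itself, so the original frame is still available afterwards.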


## Exercises

### Exercise: Selectively Load Columns

1. Load the columns `sasdate`, `RPI` and `INDPRO` from `data/fred-md.csv`
   using the `usecols` keyword.
2. Remove the first row by selecting the second row through the end.
3. Convert `sasdate` to dates.
4. Set `sasdate` as the index and remove it from the `DataFrame`.

In [3]:
import pandas as pd

data = pd.read_csv("data/fred-md.csv", usecols=["sasdate", "RPI", "INDPRO"])
data = data.iloc[1:]
index = pd.to_datetime(data.pop("sasdate"))
data.index = index
data.head()

                 RPI   INDPRO
sasdate                      
1959-01-01  2437.296   22.625
1959-02-01  2446.902  23.0681
1959-03-01  2462.689  23.4004
1959-04-01  2478.744  23.8989
1959-05-01  2493.228  24.2589
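
The same four steps can be tried without the data file. The sketch below uses a hypothetical miniature of `data/fred-md.csv`: the non-data first row (`Transform:`), the month/day/year date layout, and the `EXTRA` column with placeholder values are all assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical miniature of data/fred-md.csv. The "Transform:" row, the date
# layout and the EXTRA column (placeholder values) are assumptions.
csv_text = """sasdate,RPI,EXTRA,INDPRO
Transform:,5,0,5
1/1/1959,2437.296,0,22.625
2/1/1959,2446.902,0,23.0681
"""

# usecols keeps only the named columns; EXTRA is never loaded
data = pd.read_csv(io.StringIO(csv_text), usecols=["sasdate", "RPI", "INDPRO"])

# Drop the first row, which is not an observation
data = data.iloc[1:]

# pop returns the column and removes it from the frame in one step
data.index = pd.to_datetime(data.pop("sasdate"), format="%m/%d/%Y")
print(data)
```

Dropping the first row before calling `pd.to_datetime` matters here, because the string in that row is not a date and would fail to parse.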


### Exercise: Load and Merge Multiple Sheets

1. Load the data on the sheet "Long Mat" in the Excel file "data/exercise.xlsx".
   These are 10- and 20-year constant maturity yields.
2. Load the data on the sheet "Short Mat" in the Excel file "data/exercise.xlsx".
   These are 1- and 3-year constant maturity yields.
3. Combine the columns of the two `DataFrame`s by creating a dictionary whose keys
   are the column names and whose values are the corresponding columns, then
   build a new `DataFrame` from the dictionary.

In [4]:
long = pd.read_excel("data/exercise.xlsx", sheet_name="Long Mat")
short = pd.read_excel("data/exercise.xlsx", sheet_name="Short Mat")

data = {}
for col in long:
    data[col] = long[col]
for col in short:
    data[col] = short[col]

data = pd.DataFrame(data)
data.head()

  observation_date  GS10  GS20   GS1   GS3
0       1953-04-01  2.83  3.08  2.36  2.51
1       1953-05-01  3.05  3.18  2.48  2.72
2       1953-06-01  3.11  3.21  2.45  2.74
3       1953-07-01  2.93  3.12  2.38  2.62
4       1953-08-01  2.95   3.1  2.28  2.58
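
The dictionary approach pairs rows by position, so it relies on both sheets listing the same dates in the same order. As an alternative that is not part of the exercise, the sketch below aligns the two frames on `observation_date` with `pd.concat`. The small `long` and `short` frames are hypothetical stand-ins for the two sheets, built from the first rows shown above.

```python
import pandas as pd

# Hypothetical stand-ins for the "Long Mat" and "Short Mat" sheets
dates = pd.to_datetime(["1953-04-01", "1953-05-01", "1953-06-01"])
long = pd.DataFrame(
    {"observation_date": dates, "GS10": [2.83, 3.05, 3.11], "GS20": [3.08, 3.18, 3.21]}
)
short = pd.DataFrame(
    {"observation_date": dates, "GS1": [2.36, 2.48, 2.45], "GS3": [2.51, 2.72, 2.74]}
)

# Move the dates into the index, then join the columns side by side;
# axis=1 aligns rows on the index rather than on row position
combined = pd.concat(
    [long.set_index("observation_date"), short.set_index("observation_date")],
    axis=1,
)
print(combined)
```

Because the rows are matched on the date index, a date present in only one sheet would produce missing values in the other sheet's columns rather than a silently misaligned row.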
