# Importing Data

This lesson covers:

* Importing data 
* Converting dates 
* Saving data

## Problem: Reading in data with Dates

Read in the files `GS10.csv` and `GS10.xls`, both of which were downloaded from
[FRED](https://fred.stlouisfed.org/).

In [1]:
import pandas as pd

gs10_csv = pd.read_csv('data/GS10.csv', index_col='DATE', parse_dates=True)
print(gs10_csv.head())

gs10_excel = pd.read_excel('data/GS10.xls', skiprows=10, index_col='observation_date')
print(gs10_excel.head())

            GS10
DATE            
1953-04-01  2.83
1953-05-01  3.05
1953-06-01  3.11
1953-07-01  2.93
1953-08-01  2.95
                  GS10
observation_date      
1953-04-01        2.83
1953-05-01        3.05
1953-06-01        3.11
1953-07-01        2.93
1953-08-01        2.95
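Both readers need to know where the table starts and which column holds the dates. The sketch below shows the two options on a small in-memory file. `skiprows` drops descriptive lines that sit above the header row, which is why the spreadsheet is read with `skiprows=10`. `parse_dates=True` converts the index column to datetimes. The note lines and the three rows are illustrative stand-ins, not the exact contents of the downloaded files.

```python
import io

import pandas as pd

# Hypothetical text with two note lines above the header row
raw = """Title: example series
Notes: two descriptive lines above the header
observation_date,GS10
1953-04-01,2.83
1953-05-01,3.05
1953-06-01,3.11
"""

# skiprows=2 skips the note lines, so the third line becomes the header
without_dates = pd.read_csv(io.StringIO(raw), skiprows=2, index_col='observation_date')
# Adding parse_dates=True converts the index from strings to datetimes
with_dates = pd.read_csv(io.StringIO(raw), skiprows=2, index_col='observation_date',
                         parse_dates=True)

print(isinstance(without_dates.index, pd.DatetimeIndex))  # False
print(isinstance(with_dates.index, pd.DatetimeIndex))     # True
```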


## Problem: Converting Dates

1. Load the CSV file without converting the dates in `read_csv`.
2. Convert the date column, remove it from the DataFrame, and set it as the index. 
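By default `pd.to_datetime` infers the layout of the date strings. When the layout is known, passing `format` makes the expectation explicit, and `errors='coerce'` maps entries that do not match to `NaT` instead of raising an error. A minimal sketch on made-up strings:

```python
import pandas as pd

# Hypothetical date strings; the last one is malformed on purpose
dates = pd.Series(['1953-04-01', '1953-05-01', 'not a date'])

# format= states the expected layout explicitly instead of letting pandas infer it;
# errors='coerce' turns anything that does not match into NaT instead of raising
converted = pd.to_datetime(dates, format='%Y-%m-%d', errors='coerce')
print(converted)
print(converted.isna().sum())  # 1
```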

In [2]:
gs10_csv = pd.read_csv('data/GS10.csv')
index = pd.to_datetime(gs10_csv['DATE'])
gs10_csv.index = index
del gs10_csv['DATE']
print(gs10_csv.head())

gs10_csv = pd.read_csv('data/GS10.csv')
# pop returns the column and also removes it from the DataFrame
index = pd.to_datetime(gs10_csv.pop('DATE'))
gs10_csv.index = index
print(gs10_csv.head())


            GS10
DATE            
1953-04-01  2.83
1953-05-01  3.05
1953-06-01  3.11
1953-07-01  2.93
1953-08-01  2.95
            GS10
DATE            
1953-04-01  2.83
1953-05-01  3.05
1953-06-01  3.11
1953-07-01  2.93
1953-08-01  2.95
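The same result can be reached with `DataFrame.set_index`, which moves a column into the index and, by default (`drop=True`), removes it from the columns. A sketch on a small in-memory copy of the first rows shown above:

```python
import io

import pandas as pd

# Stand-in extract in the same DATE,GS10 layout as the CSV file
raw = """DATE,GS10
1953-04-01,2.83
1953-05-01,3.05
1953-06-01,3.11
"""

gs10 = pd.read_csv(io.StringIO(raw))
# Convert the column to datetimes, then move it into the index;
# set_index drops the column from the data by default
gs10['DATE'] = pd.to_datetime(gs10['DATE'])
gs10 = gs10.set_index('DATE')
print(gs10.head())
```

This chains naturally and avoids the separate `del` or `pop` step.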


## Problem: Export to Excel, CSV, HDF, and Pickle

1. Export both `gs10_excel` and `gs10_csv` to the same Excel file.
2. Export `gs10_excel` to CSV.
3. Export both to an HDF file (the closest thing to a "native" format in pandas).
4. Export `gs10_excel` to a pickle file.
5. Combine `gs10_excel` and `gs10_csv` into a dictionary and pickle the dictionary.

In [3]:
# The context manager saves and closes the workbook when the block ends
with pd.ExcelWriter('gs10-combined.xlsx', mode='w') as writer:
    gs10_csv.to_excel(writer, sheet_name='csv-data')
    gs10_excel.to_excel(writer, sheet_name='excel-data')

print(pd.read_excel('gs10-combined.xlsx', 'excel-data').head())
print(pd.read_excel('gs10-combined.xlsx', 'csv-data').head())

  observation_date  GS10
0       1953-04-01  2.83
1       1953-05-01  3.05
2       1953-06-01  3.11
3       1953-07-01  2.93
4       1953-08-01  2.95
        DATE  GS10
0 1953-04-01  2.83
1 1953-05-01  3.05
2 1953-06-01  3.11
3 1953-07-01  2.93
4 1953-08-01  2.95


In [4]:
gs10_excel.to_csv('gs10-exported.csv')
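A CSV file stores only text, so the index is written as an ordinary first column and the dates come back as strings unless they are parsed again on import. A sketch of the round trip through an in-memory buffer, using a three-row stand-in for `gs10_excel`:

```python
import io

import pandas as pd

# Small stand-in for gs10_excel with a named DatetimeIndex
frame = pd.DataFrame(
    {'GS10': [2.83, 3.05, 3.11]},
    index=pd.DatetimeIndex(['1953-04-01', '1953-05-01', '1953-06-01'],
                           name='observation_date'),
)

buffer = io.StringIO()
frame.to_csv(buffer)  # the index is written as the first column by default
buffer.seek(0)

# The index column and date parsing must be requested again when reading
restored = pd.read_csv(buffer, index_col='observation_date', parse_dates=True)
print(buffer.getvalue().splitlines()[0])  # observation_date,GS10
print(type(restored.index).__name__)      # DatetimeIndex
```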

In [5]:
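# Writing HDF files requires the PyTables package (tables)
# mode='w' creates a new file, overwriting any existing one
gs10_csv.to_hdf('gs10.h5', key='csv', mode='w')
# mode='a' appends to the existing file instead of overwriting it
gs10_excel.to_hdf('gs10.h5', key='excel', mode='a')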

%ls gs10.h5

 Volume in drive C has no label.
 Volume Serial Number is C49F-D3A7

 Directory of C:\git\python-introduction\solutions

09/26/2019  12:45 PM            38,888 gs10.h5
               1 File(s)         38,888 bytes
               0 Dir(s)  57,588,236,288 bytes free


In [6]:
gs10_excel.to_pickle('gs10.pkl')
%ls gs10.pkl

 Volume in drive C has no label.
 Volume Serial Number is C49F-D3A7

 Directory of C:\git\python-introduction\solutions

09/26/2019  12:45 PM            13,693 gs10.pkl
               1 File(s)         13,693 bytes
               0 Dir(s)  57,588,219,904 bytes free
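Unlike CSV, a pickle stores the DataFrame object itself, so the `DatetimeIndex`, its name, and the column dtypes come back exactly as they were, with no `index_col` or `parse_dates` needed. A minimal round trip in a temporary directory (the file name is illustrative). Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for gs10_excel
frame = pd.DataFrame(
    {'GS10': [2.83, 3.05, 3.11]},
    index=pd.DatetimeIndex(['1953-04-01', '1953-05-01', '1953-06-01'],
                           name='observation_date'),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'gs10-demo.pkl')  # hypothetical file name
    frame.to_pickle(path)
    restored = pd.read_pickle(path)

# The object is restored as-is, including the index type and name
print(restored.equals(frame))  # True
print(restored.index.name)     # observation_date
```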


In [7]:
import pickle

out = {'gs10_excel': gs10_excel, 'gs10_csv': gs10_csv}
# 'wb' opens the file for writing in binary mode, which pickle requires
with open('gs10-combined.pkl', 'wb') as pkl_file:
    pickle.dump(out, pkl_file)
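Any picklable Python object can be saved this way, including a dictionary of DataFrames. The sketch below uses `pickle.dumps` and `pickle.loads`, the in-memory counterparts of `dump` and `load`, to show that the dictionary keys and each frame's index survive the round trip; the two-row frames are stand-ins for the real data.

```python
import pickle

import pandas as pd

# Two small stand-ins for gs10_excel and gs10_csv
idx = pd.DatetimeIndex(['1953-04-01', '1953-05-01'])
out = {
    'gs10_excel': pd.DataFrame({'GS10': [2.83, 3.05]}, index=idx.rename('observation_date')),
    'gs10_csv': pd.DataFrame({'GS10': [2.83, 3.05]}, index=idx.rename('DATE')),
}

# dumps/loads work with bytes in memory instead of an open file
payload = pickle.dumps(out)
restored = pickle.loads(payload)

print(sorted(restored))                 # ['gs10_csv', 'gs10_excel']
print(restored['gs10_csv'].index.name)  # DATE
```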

## Problem: Import from HDF and Pickle

Import the data saved in steps 3-5 of the previous problem.

In [8]:
gs10_csv_reloaded = pd.read_hdf('gs10.h5', 'csv')
print(gs10_csv_reloaded.head())

gs10_excel_reloaded = pd.read_pickle('gs10.pkl')
print(gs10_excel_reloaded.head())

# 'rb' opens the file for reading in binary mode (pickles are bytes, not text)
with open('gs10-combined.pkl', 'rb') as pkl_file:
    # Keep the result; pickle.load returns the dictionary that was saved
    combined = pickle.load(pkl_file)

            GS10
DATE            
1953-04-01  2.83
1953-05-01  3.05
1953-06-01  3.11
1953-07-01  2.93
1953-08-01  2.95
                  GS10
observation_date      
1953-04-01        2.83
1953-05-01        3.05
1953-06-01        3.11
1953-07-01        2.93
1953-08-01        2.95
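`pd.read_pickle` can also load the combined file, since it is not restricted to pandas objects: it returns whatever was pickled, here a dictionary whose entries can be unpacked by key. A self-contained sketch with stand-in frames and an illustrative file name:

```python
import os
import pickle
import tempfile

import pandas as pd

# Stand-in frames in place of the real data
frames = {
    'gs10_excel': pd.DataFrame({'GS10': [2.83, 3.05]}),
    'gs10_csv': pd.DataFrame({'GS10': [2.83, 3.05]}),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'combined-demo.pkl')  # hypothetical file name
    with open(path, 'wb') as pkl_file:
        pickle.dump(frames, pkl_file)
    # read_pickle returns the pickled object, which here is the dict
    combined = pd.read_pickle(path)

print(type(combined).__name__)  # dict
gs10_excel_reloaded = combined['gs10_excel']
gs10_csv_reloaded = combined['gs10_csv']
```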
