# Saving and Exporting Data

This lesson covers:

* Saving and reloading data

This first block loads the data that was used in the previous lesson.

In [1]:
# Setup: Load the data to use later
import pandas as pd

gs10_csv = pd.read_csv("data/GS10.csv", index_col="DATE", parse_dates=True)
gs10_excel = pd.read_excel("data/GS10.xls", skiprows=10, index_col="observation_date")

## Problem: Export to Excel

Export `gs10_csv` to the Excel file `gs10-exported.xlsx`.


In [2]:
gs10_csv.to_excel("gs10-exported.xlsx")

## Problem: Export to CSV

Export `gs10_excel` to CSV. 

In [3]:
gs10_excel.to_csv("gs10-exported.csv")
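A quick way to check a CSV export is to read it back. The sketch below round-trips a five-row stand-in for `gs10_excel` through a temporary file. The values are copied from the `head()` output in this lesson; the real frame comes from `data/GS10.xls`. It then compares the result with `pd.testing.assert_frame_equal`. Note that `read_csv` needs `index_col` and `parse_dates` to restore the date index, because CSV stores the dates as plain text.

```python
import os
import tempfile

import pandas as pd

# Hypothetical five-row stand-in for gs10_excel (values from this lesson's head() output)
gs10 = pd.DataFrame(
    {"GS10": [2.83, 3.05, 3.11, 2.93, 2.95]},
    index=pd.DatetimeIndex(
        ["1953-04-01", "1953-05-01", "1953-06-01", "1953-07-01", "1953-08-01"],
        name="observation_date",
    ),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gs10-exported.csv")
    # The index is written as the first column, headed by its name
    gs10.to_csv(path)
    # Use the first column as the index and parse it back into dates
    roundtrip = pd.read_csv(path, index_col=0, parse_dates=True)

# Raises AssertionError if values, dtypes, index or names differ
pd.testing.assert_frame_equal(gs10, roundtrip)
```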

## Problem: Export to HDF

Export both to a single HDF file (the closest thing to a "native" format in pandas).

In [4]:
# mode="w" creates a new file for writing (an existing gs10.h5 is replaced)
gs10_csv.to_hdf("gs10.h5", key="csv", mode="w")
# mode="a" appends to the existing file, so both frames end up in gs10.h5
gs10_excel.to_hdf("gs10.h5", key="excel", mode="a")



## Problem: Import from HDF 

Import the data saved as HDF and verify it is the same as the original data.

In [5]:
gs10_csv_reloaded = pd.read_hdf("gs10.h5", "csv")
gs10_csv_reloaded.head()

| DATE       | GS10 |
|------------|------|
| 1953-04-01 | 2.83 |
| 1953-05-01 | 3.05 |
| 1953-06-01 | 3.11 |
| 1953-07-01 | 2.93 |
| 1953-08-01 | 2.95 |


In [6]:
gs10_excel_reloaded = pd.read_hdf("gs10.h5", "excel")
gs10_excel_reloaded.head()

| observation_date | GS10 |
|------------------|------|
| 1953-04-01       | 2.83 |
| 1953-05-01       | 3.05 |
| 1953-06-01       | 3.11 |
| 1953-07-01       | 2.93 |
| 1953-08-01       | 2.95 |
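`head()` only shows that the first rows look right. A stricter check compares the whole frames. So that it runs without `gs10.h5`, the sketch below uses a small hypothetical frame and a `copy()` as a stand-in for the HDF reload. With the real data you would compare `gs10_csv` with `gs10_csv_reloaded`. It also shows why elementwise `==` is a poor test when data contain missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value
original = pd.DataFrame(
    {"GS10": [2.83, 3.05, np.nan]},
    index=pd.DatetimeIndex(["1953-04-01", "1953-05-01", "1953-06-01"], name="DATE"),
)
# Stand-in for pd.read_hdf(...): an exact copy should compare as identical
reloaded = original.copy()

# Elementwise == treats NaN != NaN, so the missing cell counts as a mismatch
elementwise_all_equal = bool((original == reloaded).all().all())
# DataFrame.equals treats NaNs in the same location as equal
frames_equal = original.equals(reloaded)
# assert_frame_equal also checks dtypes and index/column names, raising on any difference
pd.testing.assert_frame_equal(original, reloaded)

print(elementwise_all_equal, frames_equal)  # False True
```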


## Exercises

### Exercise: Import, export and verify

* Import the data in "data/fred-md.csv"
* Parse the dates and set the index column to "sasdate"
* Remove first row labeled "Transform:" (**Hint**: Transpose, `del` and
  transpose back, or use `drop`)
* Re-parse the dates on the index
* Remove columns that have more than 10% missing values
* Save to "data/fred-md.h5" as HDF.
* Load the data into the variable `reloaded` and verify it is identical.
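The `drop` route from the hint avoids the two transposes. The sketch below assumes a file laid out as the exercise describes: a `sasdate` column, then a `Transform:` row above the dated observations. The miniature file, its numbers and the `%m/%d/%Y` date format are hypothetical, chosen only so the example runs without `data/fred-md.csv`.

```python
import io

import pandas as pd

# Hypothetical miniature file: header, a "Transform:" row, then dated rows
raw = io.StringIO(
    "sasdate,RPI,INDPRO\n"
    "Transform:,5,5\n"
    "1/1/1959,100.0,20.0\n"
    "2/1/1959,101.0,21.0\n"
)

df = pd.read_csv(raw, index_col="sasdate")
# Drop the row whose index label is "Transform:" (no transposing needed)
df = df.drop("Transform:")
# With the non-date label gone, the remaining labels parse cleanly as dates
df.index = pd.to_datetime(df.index, format="%m/%d/%Y")
print(df)
```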

In [7]:
import pandas as pd

fred = pd.read_csv(
    "data/fred-md.csv", parse_dates=True, index_col="sasdate", date_format="%m-%d-%Y"
)
fred = fred.T
del fred["Transform:"]
fred = fred.T
# Could also use
# fred = fred.drop("Transform:")
fred.index = pd.to_datetime(fred.index)
retain = fred.isna().mean() < 0.10
print(f"Retained {retain.sum()} out of {retain.shape[0]}")
fred = fred.loc[:, retain]

fred.to_hdf("data/fred-md.h5", key="fred_md")

reloaded = pd.read_hdf("data/fred-md.h5", "fred_md")
error = (fred - reloaded).abs().max().max()
print(f"The maximum error is {error}")

Retained 124 out of 128
The maximum error is 0.0
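The filter `fred.isna().mean() < 0.10` works because `isna()` returns booleans, and the mean of a boolean column is the fraction of `True` values, which is the share of missing observations. Here is a sketch on a hypothetical 20-row frame where column `B` is 5% missing and column `C` is 25% missing:

```python
import numpy as np
import pandas as pd

# Hypothetical 20-row frame
n = 20
df = pd.DataFrame({col: np.arange(n, dtype=float) for col in ["A", "B", "C"]})
df.loc[0, "B"] = np.nan    # 1 of 20 rows -> 5% missing
df.loc[0:4, "C"] = np.nan  # rows 0-4 (label slice is inclusive) -> 25% missing

# Column means of the True/False mask = share of missing values per column
share_missing = df.isna().mean()
retain = share_missing < 0.10
df = df.loc[:, retain]
print(list(df.columns))  # ['A', 'B']
```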




### Exercise: Looping Export

Export the columns RPI, INDPRO, and HWI from the FRED-MD data to
`data/<variablename>.csv`, so that, e.g., RPI is exported to `data/RPI.csv`.

**Note:** You need to complete the previous exercise first (or at least its first four steps).

In [8]:
variables = ["RPI", "INDPRO", "HWI"]

for var in variables:
    csv_name = f"data/{var}.csv"
    # Pass header to silence a warning in pandas 0.25
    fred[var].to_csv(csv_name, header=True)
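To check the loop, the files can be read back and reassembled into one frame. So that it runs on its own, this sketch uses a hypothetical two-row stand-in for `fred` and writes to a temporary directory instead of `data/`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the cleaned FRED-MD frame
fred = pd.DataFrame(
    {"RPI": [1.0, 2.0], "INDPRO": [3.0, 4.0], "HWI": [5.0, 6.0], "OTHER": [7.0, 8.0]},
    index=pd.DatetimeIndex(["1959-01-01", "1959-02-01"], name="sasdate"),
)
variables = ["RPI", "INDPRO", "HWI"]

with tempfile.TemporaryDirectory() as tmp:
    for var in variables:
        # One file per variable, with the variable name as the column header
        fred[var].to_csv(os.path.join(tmp, f"{var}.csv"), header=True)
    # Read each file back as a Series and combine them column by column
    parts = {
        var: pd.read_csv(os.path.join(tmp, f"{var}.csv"), index_col=0, parse_dates=True)[var]
        for var in variables
    }
    combined = pd.DataFrame(parts)

# The reassembled frame should match the selected columns of the original
pd.testing.assert_frame_equal(combined, fred[variables])
```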