![rmotr](https://user-images.githubusercontent.com/7065401/52071918-bda15380-2562-11e9-828c-7f95297e4a82.png)
<hr style="margin-bottom: 40px;">

<img src="https://user-images.githubusercontent.com/7065401/68501079-0695df00-023c-11ea-841f-455dac84a089.jpg"
    style="width:400px; float: right; margin: 0 40px 40px 40px;"></img>

# Complementary file types and IO tools

The pandas I/O API has a set of top-level functions that let us work with a wide variety of file types.

In this lesson we'll look at some file types pandas can work with beyond the best-known CSV, JSON, and XLSX formats.

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Hands on! 

In [None]:
import pandas as pd

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Pickling

When it comes to something like machine learning, a trained model can't be saved to a .txt or .csv file, because it is a complex Python object rather than plain tabular text.

Fortunately, such objects can be turned into a stream of bytes that is written to a file and loaded back later. In Python this process is called **pickling**; you may know it as serialization, or by some other name.
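To see what pickling means independently of pandas, here is a minimal sketch using the standard library `pickle` module. The `model` dictionary is a hypothetical stand-in for a trained model; any picklable Python object behaves the same way.

```python
import pickle

# Hypothetical stand-in for a trained model: nested Python objects,
# not something that fits naturally in a .txt or .csv file
model = {"weights": [0.25, -1.5, 3.0], "bias": 0.1, "classes": ("cat", "dog")}

# Serialize ("pickle") the object into a byte string...
payload = pickle.dumps(model)

# ...and rebuild an equal, independent object from those bytes later
restored = pickle.loads(payload)

print(type(payload).__name__)  # bytes
print(restored == model)       # True
```

`pickle.dump` and `pickle.load` do the same thing against an open binary file instead of a byte string.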

Conveniently, pandas handles _pickles_ in its I/O module: pandas objects such as `Series` and `DataFrame` have a `to_pickle` method, and the top-level `pd.read_pickle` function loads them back.

In [None]:
df = pd.DataFrame([[1,2,3], [4,5,6]],
                  columns=['A','B','C'])

df

The `to_pickle` method uses Python's `pickle` module to save data structures to disk in the pickle format.

In [None]:
df.to_pickle('out.pkl')

In [None]:
!cat out.pkl

The `pd.read_pickle` function loads any pickled pandas object (or any other pickled object) from a file. Only load pickles from sources you trust, since unpickling can execute arbitrary code:

In [None]:
df = pd.read_pickle('out.pkl')

In [None]:
df
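A quick way to confirm that a round trip preserved the data, dtypes, and index is `pd.testing.assert_frame_equal`. The sketch below also relies on `to_pickle`/`read_pickle` inferring compression from the file extension (`compression='infer'` is the default), and shows `read_pickle` loading a plain Python object pickled with the standard `pickle` module. The file names and the `settings` dictionary are arbitrary examples.

```python
import os
import pickle
import tempfile

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'B', 'C'])

with tempfile.TemporaryDirectory() as tmp:
    # A ".gz" suffix makes pandas gzip-compress the pickle (compression='infer')
    gz_path = os.path.join(tmp, 'out.pkl.gz')
    df.to_pickle(gz_path)
    restored = pd.read_pickle(gz_path)

    # read_pickle is not limited to pandas objects
    obj_path = os.path.join(tmp, 'settings.pkl')
    with open(obj_path, 'wb') as f:
        pickle.dump({'learning_rate': 0.01, 'epochs': 10}, f)
    settings = pd.read_pickle(obj_path)

# Raises AssertionError if values, dtypes, index, or columns differ
pd.testing.assert_frame_equal(restored, df)
print(settings)  # {'learning_rate': 0.01, 'epochs': 10}
```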

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Clipboard content

A handy way to grab data is the `read_clipboard` function, which takes the contents of the clipboard buffer and passes them to `read_csv`, using whitespace (`sep='\s+'`) as the default separator.

For instance, you can copy the following text to the clipboard (CTRL-C on many operating systems):

```
  A B C
x 1 4 p
y 2 5 q
z 3 6 r
```

Then import the data directly into a `DataFrame` by calling `read_clipboard`:

In [None]:
df = pd.read_clipboard()

In [None]:
df
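`read_clipboard` needs access to a system clipboard (on Linux, for example, a tool such as `xclip` or `xsel`), which isn't always available, e.g. on a remote notebook server. To see how the copied text is parsed without a clipboard, the sketch below feeds the same text to `read_csv` with whitespace as the separator, mirroring `read_clipboard`'s default. Because the header row has one field fewer than the data rows, pandas uses the first column (`x`, `y`, `z`) as the index.

```python
from io import StringIO

import pandas as pd

# The same text as in the block above, as it would sit on the clipboard
text = """  A B C
x 1 4 p
y 2 5 q
z 3 6 r
"""

# One or more whitespace characters as the separator, like read_clipboard's default
df = pd.read_csv(StringIO(text), sep=r'\s+')

print(df)
```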

The `to_clipboard` method writes the contents of a `DataFrame` to the clipboard, after which you can paste them into other applications (CTRL-V on many operating systems).

In [None]:
df.to_clipboard()

In [None]:
pd.read_clipboard()

We get back the same content we had written to the clipboard.
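With its default `excel=True`, `to_clipboard` writes the frame in CSV format with a tab separator (`sep='\t'`), so it pastes cleanly into spreadsheet applications. The sketch below is a clipboard-free approximation of that round trip, not the exact internals: it writes the same tab-separated text to a string with `to_csv` and reads it back with `read_csv`.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['p', 'q', 'r']},
                  index=['x', 'y', 'z'])

# Roughly what to_clipboard() puts on the clipboard with excel=True, sep='\t'
text = df.to_csv(sep='\t')
print(text)

# Read it back, treating the first column as the index
restored = pd.read_csv(StringIO(text), sep='\t', index_col=0)

# Raises AssertionError if anything was lost along the way
pd.testing.assert_frame_equal(restored, df)
```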

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## SAS files

The top-level function `read_sas()` can read (but not write) SAS xport (.XPT) and (since v0.18.0) SAS7BDAT (.sas7bdat) format files.

SAS files only contain two value types: ASCII text and floating point values (usually 8 bytes but sometimes truncated). For xport files, there is no automatic type conversion to integers, dates, or categoricals. For SAS7BDAT files, the format codes may allow date variables to be automatically converted to dates. By default the whole file is read and returned as a `DataFrame`.
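Since xport files come back with floating point columns only, any integer or date conversion is up to you. The sketch below uses a hypothetical frame shaped like such a file: a whole-number `YEAR` column and a `DATE` column holding SAS date values, which count days since January 1, 1960. Both column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data as it might come out of an xport file: everything is float
df = pd.DataFrame({'YEAR': [1948.0, 1949.0, 1950.0],
                   'DATE': [0.0, 366.0, 731.0]})

# Cast to integers only if every value is a whole number
year = df['YEAR']
if (year % 1 == 0).all():
    df['YEAR'] = year.astype('int64')

# SAS date values count days from the SAS epoch, 1960-01-01
df['DATE'] = pd.to_datetime(df['DATE'], unit='D', origin='1960-01-01')

print(df.dtypes)
print(df)
```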

We are going to load the `airline.sas7bdat` file into a pandas `DataFrame` using the `read_sas` function.

In [None]:
!cat airline.sas7bdat

In [None]:
df = pd.read_sas('airline.sas7bdat')

In [None]:
df.head()

We can also load the same file from a given URL:

In [None]:
sas_url = 'http://www.principlesofeconometrics.com/sas/airline.sas7bdat'

In [None]:
df = pd.read_sas(sas_url)

In [None]:
df.head()

Plot the `Y` variable:

In [None]:
df.loc[:,'Y'].plot()

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## STATA files

The top-level function `read_stata` reads a DTA file and returns either a `DataFrame` or, when `iterator=True` or a `chunksize` is passed, a **StataReader** that can be used to read the file incrementally.
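Unlike SAS files, Stata files can also be written, with `DataFrame.to_stata`, which makes it easy to try incremental reading without downloading anything. The sketch below writes a small hypothetical frame to a temporary `.dta` file, then passes `chunksize` to `read_stata` to get a `StataReader` and reads the file two rows at a time. The column names and values are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data; float64 columns keep their dtype through a Stata round trip
df = pd.DataFrame({'year': [1950.0, 1951.0, 1952.0, 1953.0, 1954.0],
                   'cpi': [24.1, 26.0, 26.5, 26.7, 26.9]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.dta')
    # write_index=False keeps the RangeIndex out of the file
    df.to_stata(path, write_index=False)

    # With chunksize set, read_stata returns a StataReader instead of a DataFrame
    chunks = []
    with pd.read_stata(path, chunksize=2) as reader:
        for chunk in reader:
            chunks.append(chunk)

# Reassemble the chunks and check nothing was lost
combined = pd.concat(chunks, ignore_index=True)
print([len(c) for c in chunks])  # [2, 2, 1]
pd.testing.assert_frame_equal(combined, df)
```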

We are going to load the `broiler.dta` file into a pandas `DataFrame` using the `read_stata` method.

In [None]:
!cat broiler.dta

In [None]:
df = pd.read_stata('broiler.dta')

In [None]:
df.head()

We can also load the same file from a given URL:

In [None]:
stata_url = 'http://www.principlesofeconometrics.com/stata/broiler.dta'

In [None]:
df = pd.read_stata(stata_url)

In [None]:
df.head()

Plot the Consumer Price Index (CPI):

In [None]:
df.loc[:,'cpi'].plot()

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Google BigQuery

Google BigQuery data can be loaded with the `read_gbq` function of the `pandas-gbq` package (pandas also exposed it as `pd.read_gbq`, which was deprecated in pandas 2.2).

It requires the `pandas-gbq` package and a _BigQuery project_ (you can create one in the [Google Cloud console](https://console.cloud.google.com/bigquery)).

In [None]:
!pip install pandas-gbq

In [None]:
import pandas_gbq

In [None]:
sql = """
    SELECT name, SUM(number) as count
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    GROUP BY name
    ORDER BY count DESC
    LIMIT 10
"""

# Replace 'MY_PROJECT_ID' with your own Google Cloud project ID
pandas_gbq.read_gbq(sql,
                    project_id='MY_PROJECT_ID')

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)