# Importing other file types (Excel, etc.)
* Excel spreadsheets
* MATLAB files
* SAS files
* Stata files
* HDF5 files

## Importing Pickled Files
* File type native to Python
* Motivation: many data types, such as dictionaries and lists, have no obvious way to be stored in a flat file
    * To save such an object and load it back into Python later, you can ***serialize*** it, which Python calls ***pickling***
    * Pickled files are serialized
    * Serialize = convert an object to a byte stream

```python
import pickle
with open('file_name.pkl', 'rb') as file:
    data = pickle.load(file)
```
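
The snippet above assumes a pickle file already exists. As a minimal round-trip sketch, the block below writes a hypothetical dictionary (`record`, invented for illustration) to a temporary file and reads it back:

```python
import os
import pickle
import tempfile

# Hypothetical example object: a nested dict with no obvious flat-file layout.
record = {"name": "Billy", "scores": [1, 2, 3], "meta": {"active": True}}

# Work in a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "record.pkl")

    # Serialize: mode 'wb' because a pickle is a byte stream, not text.
    with open(path, "wb") as file:
        pickle.dump(record, file)

    # Deserialize: mode 'rb' mirrors the write mode.
    with open(path, "rb") as file:
        restored = pickle.load(file)

# The restored object is an equal copy, not the same object.
print(restored == record)
print(restored is record)
```

Only unpickle files from sources you trust: loading a pickle can execute arbitrary code.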

## Importing Excel Spreadsheets
There are many ways to import Excel spreadsheets. The standard convention is to use ***pandas***.

```python
import pandas as pd
file = 'excel_example.xlsx'
data = pd.ExcelFile(file)
print(type(data))
print(f'Sheet Name: {data.sheet_names}\n')
print(data.parse(data.sheet_names))
```

```
<class 'pandas.io.excel.ExcelFile'>
Sheet Name: ['Sheet1']

OrderedDict([('Sheet1',     Name  Age  Profession
0  Billy   16      Waiter
1    Ted   32     Teacher
2   Joey   28  Accountant)])
```


### Customizing your spreadsheet import

To get a DataFrame directly instead of an ExcelFile object, use `pd.read_excel()`:

```python
data = pd.read_excel(file)
print(type(data))
data.head()
```

```
<class 'pandas.core.frame.DataFrame'>
```

| Unnamed: 0 | Name | Age | Profession |
|---|---|---|---|
| 0 | Billy | 16 | Waiter |
| 1 | Ted | 32 | Teacher |
| 2 | Joey | 28 | Accountant |
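
`pd.read_excel()` also accepts keyword arguments that control what gets imported. The sketch below is illustrative rather than part of the original notebook: it rebuilds the small table shown above in a temporary `.xlsx` file (this assumes the `openpyxl` engine is installed) and then reads back only two columns and the first two rows.

```python
import os
import tempfile

import pandas as pd

# Rebuild the example table from the output above.
people = pd.DataFrame(
    {
        "Name": ["Billy", "Ted", "Joey"],
        "Age": [16, 32, 28],
        "Profession": ["Waiter", "Teacher", "Accountant"],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "excel_example.xlsx")

    # Writing .xlsx files requires an Excel engine such as openpyxl (assumed installed).
    people.to_excel(path, sheet_name="Sheet1", index=False)

    # sheet_name picks the sheet, usecols keeps a subset of columns,
    # and nrows limits how many data rows are read.
    subset = pd.read_excel(
        path, sheet_name="Sheet1", usecols=["Name", "Age"], nrows=2
    )

print(subset)
```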


## Importing SAS / Stata files using pandas

### SAS: Statistical Analysis System
* Used in business analytics and biostatistics
* Common uses:
    * Advanced analytics
    * Multivariate analysis
    * Business intelligence
    * Data management
    * Predictive analytics
* Standard for computational analysis
* Extensions: *.sas7bdat* (dataset files), *.sas7bcat* (catalog files)
        
#### Importing SAS files
```python
import pandas as pd
from sas7bdat import SAS7BDAT
with SAS7BDAT('file_name.sas7bdat') as file:
    df_sas = file.to_data_frame()
```

### Stata: "Statistics" + "data"
* Used in academic social sciences research
    * Extension: *.dta*

#### Importing Stata files
```python
import pandas as pd
data = pd.read_stata('file_name.dta')
```

Note: `pd.read_stata()` opens and closes the file itself, so no context manager is needed here.
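
To try this without an existing `.dta` file, a round trip can be sketched with `DataFrame.to_stata()`. The column names and values below are hypothetical, chosen only for illustration:

```python
import os
import tempfile

import pandas as pd

# Hypothetical numeric data, for illustration only.
df = pd.DataFrame({"age": [16, 32, 28], "score": [1.5, 2.5, 3.5]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.dta")

    # write_index=False keeps the DataFrame index out of the Stata file.
    df.to_stata(path, write_index=False)

    # read_stata handles opening and closing the file itself.
    data = pd.read_stata(path)

print(data.shape)
print(list(data.columns))
```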


## Importing HDF5 Files

Hierarchical Data Format version 5
* Standard for storing large quantities of numerical data
* Datasets can be hundreds of gigabytes or even terabytes in size
* HDF5 can scale to exabytes

```python
import h5py
data = h5py.File('file_name.hdf5', 'r')
```

### The structure of HDF5 files

```python
for key in data.keys():
    print(key)
```
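
An HDF5 file is organized like a small file system: groups act as folders and datasets hold the arrays. The sketch below builds a tiny file with invented group and dataset names (`measurements`, `temperature`, `ids`) and then walks it, to show what the keys refer to:

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.hdf5")

    # Build a small file: one group containing a dataset, plus a top-level dataset.
    # All names here are hypothetical.
    with h5py.File(path, "w") as f:
        group = f.create_group("measurements")
        group.create_dataset("temperature", data=np.array([20.5, 21.0, 19.8]))
        f.create_dataset("ids", data=np.arange(3))

    # Reopen read-only; the context manager closes the file afterwards.
    with h5py.File(path, "r") as data:
        top_level = sorted(data.keys())
        nested = list(data["measurements"].keys())
        # Slicing with [:] copies the dataset into a NumPy array,
        # which stays usable after the file is closed.
        temperature = data["measurements"]["temperature"][:]

print(top_level)
print(nested)
print(temperature)
```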

## Importing MATLAB files
* Matrix Laboratory
* Industry standard in engineering and science
    * Extension *.mat*


**SciPy**
* scipy.io.loadmat() - read  .mat files
* scipy.io.savemat() - write .mat files

```python
import scipy.io
mat = scipy.io.loadmat('file_name.mat')
```
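
`scipy.io.loadmat()` returns a dictionary: the keys are the MATLAB variable names and the values are NumPy arrays, alongside a few header entries whose names start with double underscores. A round-trip sketch using a hypothetical variable called `matrix`:

```python
import os
import tempfile

import numpy as np
import scipy.io

# Hypothetical 2x2 array standing in for a MATLAB workspace variable.
matrix = np.array([[1, 2], [3, 4]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.mat")

    # savemat takes a dict mapping variable names to array-like values.
    scipy.io.savemat(path, {"matrix": matrix})

    mat = scipy.io.loadmat(path)

# Skip header keys such as __header__, __version__ and __globals__.
variable_names = [key for key in mat if not key.startswith("__")]

print(variable_names)
print(type(mat["matrix"]))
print(mat["matrix"])
```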