# Introduction to other file types

1) Pickled files & Excel spreadsheets

2) MATLAB files

3) SAS files

4) Stata files

5) HDF5 files

![Pickled%20Files.PNG](attachment:Pickled%20Files.PNG)

## Pickled files

In [41]:
# Pickling Data With Python!
# https://www.youtube.com/watch?v=Pl4Hp8qwwes

import pickle

grades = {'Ian': ['A*', 'A*', 'A', 'B'],
         'Sam': ['A', 'A', 'B', 'A*'],
         'Sarah': ['A*', 'A*', 'A*', 'A']}


# Write to the same path that the next cell reads from
with open("./Data/grades.pkl", 'wb') as pickle_file:
    pickle.dump(grades, pickle_file)

In [42]:
with open("./Data/grades.pkl", 'rb') as pickle_file:
    new_data = pickle.load(pickle_file)

In [43]:
new_data

{'Ian': ['A*', 'A*', 'A', 'B'],
 'Sam': ['A', 'A', 'B', 'A*'],
 'Sarah': ['A*', 'A*', 'A*', 'A']}
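
The dump/load round trip above can be sketched as one self-contained cell. This is a minimal sketch, not part of the original notebook: `round_trip` is a hypothetical helper, and it writes to a temporary directory (an assumption, so that it does not depend on `./Data` existing). Note that `pickle.load` can execute arbitrary code, so only unpickle files you trust.

```python
import pickle
import tempfile
from pathlib import Path

grades = {'Ian': ['A*', 'A*', 'A', 'B'],
          'Sam': ['A', 'A', 'B', 'A*'],
          'Sarah': ['A*', 'A*', 'A*', 'A']}


def round_trip(obj, path):
    """Hypothetical helper: pickle obj to path, then load it back."""
    with open(path, 'wb') as pickle_file:   # pickle is a binary format: 'wb'
        pickle.dump(obj, pickle_file)
    with open(path, 'rb') as pickle_file:   # and 'rb' when reading
        return pickle.load(pickle_file)


# Temporary directory is an assumption made for this sketch
with tempfile.TemporaryDirectory() as tmp_dir:
    restored = round_trip(grades, Path(tmp_dir) / 'grades.pkl')

print(restored == grades)   # True: same contents
print(restored is grades)   # False: a new object was built on load
```

The loaded dictionary compares equal to the original but is a separate object, which is why pickling is useful for saving Python objects between sessions.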

## Importing Excel spreadsheets

In [22]:
import pandas as pd

file = './Data/battledeath.xlsx'
data = pd.ExcelFile(file)
print(data.sheet_names)

['2002', '2004']


In [27]:
df1 = data.parse('2002')   # sheet name, as a string
df2 = data.parse(0)        # sheet index, as an integer (0 is the first sheet)
print(df1)

    War, age-adjusted mortality due to       2002
0                          Afghanistan  36.083990
1                              Albania   0.128908
2                              Algeria  18.314120
3                              Andorra   0.000000
4                               Angola  18.964560
..                                 ...        ...
187                          Venezuela   0.000000
188                            Vietnam   0.040222
189                        Yemen, Rep.   0.074510
190                             Zambia   0.044548
191                           Zimbabwe  33.796200

[192 rows x 2 columns]


## Importing SAS files using pandas

![SAS.PNG](attachment:SAS.PNG)

In [33]:
# Import SAS (sas7bdat) files into Python
# https://www.youtube.com/watch?v=coW7RUEw9PU&t=64s

import pandas as pd
df = pd.read_sas('./Data/sales.sas7bdat')
df.head()

     YEAR     P           S
0  1950.0  12.9  181.899994
1  1951.0  11.9       245.0
2  1952.0  10.7  250.199997
3  1953.0  11.3  265.899994
4  1954.0  11.2       248.5


## Importing Stata files

In [32]:
df = pd.read_stata('./Data/disarea.dta')
df.head()

   wbcode               country  disa1  disa2  disa3  disa4  disa5  disa6  disa7  disa8  ...  disa16  disa17  disa18  disa19  disa20  disa21  disa22  disa23  disa24  disa25
0     AFG           Afghanistan    0.0    0.0   0.76   0.73    0.0    0.0    0.0    0.0  ...     0.0     0.0     0.0     0.0     0.0     0.0     0.0    0.02     0.0     0.0
1     AGO                Angola   0.32   0.02   0.56    0.0    0.0    0.0   0.56    0.0  ...     0.0     0.4     0.0    0.61     0.0     0.0    0.99    0.98    0.61     0.0
2     ALB               Albania    0.0    0.0   0.02    0.0    0.0    0.0    0.0    0.0  ...     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0    0.16
3     ARE  United Arab Emirates    0.0    0.0    0.0    0.0    0.0    0.0    0.0    0.0  ...     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0     0.0
4     ARG             Argentina    0.0   0.24   0.24    0.0    0.0   0.23    0.0    0.0  ...     0.0     0.0     0.0     0.0    0.05     0.0     0.0    0.01     0.0    0.11


## Importing HDF5 files

![HDF5.PNG](attachment:HDF5.PNG)

In [35]:
import h5py
filename = './Data/L1.hdf5'
data = h5py.File(filename, 'r')   # 'r' opens the file read-only
print(type(data))

<class 'h5py._hl.files.File'>


In [36]:
for key in data.keys():
    print(key)

meta
quality
strain


In [37]:
print(type(data['meta']))

<class 'h5py._hl.group.Group'>


In [38]:
for key in data['meta'].keys():
    print(key)

Description
DescriptionURL
Detector
Duration
GPSstart
Observatory
Type
UTCstart
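
The keys listed above name groups and datasets, and the usual next step is to pull a dataset's values into NumPy. The following is a minimal sketch under stated assumptions: it builds a tiny throwaway file rather than reading `L1.hdf5`, and the dataset name `Strain` and all stored values are hypothetical stand-ins chosen for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.hdf5')   # hypothetical file

    # Build a small file whose layout loosely mirrors the groups above
    with h5py.File(path, 'w') as f:
        f.create_group('meta').create_dataset('Duration', data=4096)
        f.create_group('strain').create_dataset('Strain', data=np.arange(5.0))

    # Read it back; the context manager closes the file afterwards
    with h5py.File(path, 'r') as f:
        kind = type(f['strain']['Strain']).__name__
        duration = f['meta']['Duration'][()]   # [()] reads a scalar dataset
        strain = f['strain']['Strain'][:]      # [:] copies values into an ndarray

print(kind)       # Dataset
print(duration)   # 4096
print(strain)
```

Groups behave like dictionaries, while indexing a dataset reads its values from disk. Copying the values out before the file closes keeps them usable afterwards.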


## Importing MATLAB files

![Scipy.PNG](attachment:Scipy.PNG)

In [39]:
import scipy.io

filename = './Data/presents.mat'
mat = scipy.io.loadmat(filename)
print(type(mat))

<class 'dict'>
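
Since `loadmat` returns a dictionary, the stored MATLAB variables can be picked out by key. This is a minimal sketch, not the contents of `presents.mat`: it saves a small hypothetical variable named `counts` with `scipy.io.savemat` to a temporary file and loads it back, so the variable name and values are assumptions.

```python
import os
import tempfile

import numpy as np
import scipy.io

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.mat')   # hypothetical file

    # Save one hypothetical variable, then load the file back
    scipy.io.savemat(path, {'counts': np.array([[1, 2, 3], [4, 5, 6]])})
    mat = scipy.io.loadmat(path)

# loadmat also adds metadata keys such as '__header__'; skip those
variables = {key: value for key, value in mat.items()
             if not key.startswith('__')}

for name, value in variables.items():
    print(name, type(value).__name__, value.shape)
```

Keys are variable names and values are NumPy arrays, so the same filtering applies to any dictionary returned by `loadmat`.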
