# Importing other file types

Python can interact with the file system natively through the `os` module of the standard library, which we load with `import os`.

In [2]:
import os

wd = os.getcwd() # fetch the name of the current working directory
os.listdir(wd) # output the directory contents as a list in the shell

['04-other-file-types.ipynb',
 '.ipynb_checkpoints',
 '02-flat-files.ipynb',
 '03-importing-data-using-pandas.ipynb',
 'data',
 '01-text-files.ipynb']
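`os.listdir()` returns names in arbitrary order and mixes files with directories, as the listing above shows. A minimal sketch of narrowing that listing down to notebook files only, assuming you simply want a sorted list of `.ipynb` names in the current working directory:

```python
import os

wd = os.getcwd()

# Keep only regular files whose name ends in .ipynb, sorted alphabetically
notebooks = sorted(
    name
    for name in os.listdir(wd)
    if name.endswith(".ipynb") and os.path.isfile(os.path.join(wd, name))
)

print(notebooks)
```

`os.path.join()` builds the full path in a platform-independent way, so the same code works on Windows and on Unix-like systems.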

## Pickled file

Reference: https://www.datacamp.com/community/tutorials/pickle-python-tutorial

Some data types, such as lists and dictionaries, cannot be saved easily to flat files. If you want your files to be human-readable, you can save them as text files (in some clever manner) or as JSON files, which map naturally onto Python dictionaries.
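As a rough illustration of the human-readable route, the sketch below writes a small dictionary to a JSON file and reads it back with the standard `json` module. The dictionary contents and the `dogs.json` file name are made up for the example, and the file is written to a temporary directory so nothing in `data/` is touched.

```python
import json
import os
import tempfile

# Hypothetical example data
dog_ages = {"Ozzy": 3, "Filou": 8, "Luna": 5}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dogs.json")  # hypothetical file name

    # Write the dictionary as indented, human-readable JSON text
    with open(path, mode="w", encoding="utf-8") as f:
        json.dump(dog_ages, f, indent=2)

    # Read the raw text back so we can look at it
    with open(path, mode="r", encoding="utf-8") as f:
        text = f.read()

restored = json.loads(text)  # parse the JSON text into a dict again

print(text)
print(restored == dog_ages)
```

Note that JSON only supports a limited set of types (strings, numbers, booleans, `null`, lists and string-keyed objects), so this approach suits plain dictionaries like this one rather than arbitrary Python objects.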

However, if you only need to load them back into Python, you can *serialize* them. All this means is converting the object into a sequence of bytes (a byte stream) that can be written to a file and later turned back into the original object.

You need to import the `pickle` module to create and read pickled files.

In [14]:
import pickle

# object to be pickled
dogs_dict = { 'Ozzy': 3, 'Filou': 8, 'Luna': 5, 'Skippy': 10, 'Barco': 12, 'Balou': 9, 'Laika': 16 }

# serialize the object; the with block closes the file even if dump() fails
with open('data/pickled_dogs', mode='wb') as file: # open file object, write in binary mode
    pickle.dump(dogs_dict, file)

# load a pickled file - unpickle
with open('data/pickled_dogs', mode='rb') as file: # set mode to read binary
    contents = pickle.load(file)
    
print(contents)
print(type(contents))

{'Ozzy': 3, 'Filou': 8, 'Luna': 5, 'Skippy': 10, 'Barco': 12, 'Balou': 9, 'Laika': 16}
<class 'dict'>
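The byte stream itself can also be inspected without writing a file. A minimal sketch, using `pickle.dumps()` and `pickle.loads()` (the in-memory counterparts of `dump()` and `load()`) on a shortened, made-up version of the dictionary:

```python
import pickle

# Shortened example dictionary
dogs_dict = {"Ozzy": 3, "Filou": 8}

payload = pickle.dumps(dogs_dict)  # serialize to a bytes object in memory
print(type(payload))               # <class 'bytes'>

restored = pickle.loads(payload)   # rebuild an object from the bytes
print(restored == dogs_dict)       # True: same contents
print(restored is dogs_dict)       # False: a new, independent object
```

Unpickling can execute arbitrary code, so only load pickled data that comes from a source you trust.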


## Excel spreadsheets

We can open an Excel workbook in pandas with the `pd.ExcelFile()` class. The resulting object provides:

- `sheet_names`: an attribute holding the list of sheet names.
- `parse()`: a method that loads one sheet into a DataFrame. The workbook has to be opened with `pd.ExcelFile()` first before individual sheets can be parsed.

In [2]:
import pandas as pd

xl = pd.ExcelFile('data/battledeath.xlsx')
print(type(xl))

<class 'pandas.io.excel.ExcelFile'>


In [3]:
# list excel sheet names
xl.sheet_names

['2002', '2004']

You can now import the individual sheets into a pandas DataFrame by specifying either the sheet's name or its index (0-indexed).

In [4]:
# using sheet name
df_xl = xl.parse('2002')
print(type(df_xl))
df_xl.head()

<class 'pandas.core.frame.DataFrame'>


| | War, age-adjusted mortality due to | 2002 |
|---|---|---|
| 0 | Afghanistan | 36.08399 |
| 1 | Albania | 0.128908 |
| 2 | Algeria | 18.31412 |
| 3 | Andorra | 0.0 |
| 4 | Angola | 18.96456 |


In [5]:
# using sheet index
df_xl = xl.parse(1)
print(type(df_xl))
df_xl.head()

<class 'pandas.core.frame.DataFrame'>


| | War(country) | 2004 |
|---|---|---|
| 0 | Afghanistan | 9.451028 |
| 1 | Albania | 0.130354 |
| 2 | Algeria | 3.407277 |
| 3 | Andorra | 0.0 |
| 4 | Angola | 2.597931 |
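Since `sheet_names` is an ordinary list, the two steps above can be combined to load every sheet in one go. The sketch below is self-contained: it builds a tiny two-sheet workbook in memory with made-up numbers (not the real `battledeath.xlsx` data) and assumes the `openpyxl` package is installed for reading and writing `.xlsx` files.

```python
import io

import pandas as pd

# Build a tiny two-sheet workbook in memory (made-up values for illustration)
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"country": ["A", "B"], "deaths": [1.5, 2.5]}).to_excel(
        writer, sheet_name="2002", index=False
    )
    pd.DataFrame({"country": ["A", "B"], "deaths": [0.5, 3.0]}).to_excel(
        writer, sheet_name="2004", index=False
    )
buffer.seek(0)  # rewind so the workbook can be read from the start

xl = pd.ExcelFile(buffer, engine="openpyxl")

# One DataFrame per sheet, keyed by sheet name
sheets = {name: xl.parse(name) for name in xl.sheet_names}

for name, df in sheets.items():
    print(name, df.shape)
```

Parsing by index gives the same result as parsing by name, so `xl.parse(1)` and `sheets['2004']` hold the same data here.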


When parsing spreadsheets, we can pass additional arguments: `skiprows` to skip rows (0-indexed), `names` to rename the columns, and `usecols` to select only particular columns (0-indexed). Each of these accepts a list: row numbers, strings, and column numbers, respectively.

In [28]:
# Parse the first sheet, skip the first row of data and rename the columns: df1
df1 = xl.parse(0, skiprows=[0], names=['Country', 'AAM due to War (2002)'])

# Print the head of the DataFrame df1
print(df1.head())

# Parse the first column of the second sheet and rename the column: df2
df2 = xl.parse(1, skiprows=[0], usecols=[0], names=['Country'])

# Print the head of the DataFrame df2
print(df2.head())

               Country  AAM due to War (2002)
0              Albania               0.128908
1              Algeria              18.314120
2              Andorra               0.000000
3               Angola              18.964560
4  Antigua and Barbuda               0.000000
               Country
0              Albania
1              Algeria
2              Andorra
3               Angola
4  Antigua and Barbuda
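Notice that Afghanistan is missing from both outputs above: with `skiprows=[0]` the original header row is skipped, the next row is then treated as the header, and `names` replaces it. The sketch below reproduces this on a tiny in-memory workbook with made-up values, assuming `openpyxl` is installed, and contrasts it with passing `names` alone.

```python
import io

import pandas as pd

# Tiny one-sheet workbook built in memory (made-up values for illustration)
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame(
        {"country": ["A", "B", "C"], "rate": [1.0, 2.0, 3.0]}
    ).to_excel(writer, sheet_name="2002", index=False)
buffer.seek(0)

xl = pd.ExcelFile(buffer, engine="openpyxl")
new_names = ["Country", "Rate"]

# Skips the header row; row "A" then becomes the header and is replaced by names
df_skip = xl.parse(0, skiprows=[0], names=new_names)

# Only renames the columns; every data row is kept
df_keep = xl.parse(0, names=new_names)

print(df_skip["Country"].tolist())
print(df_keep["Country"].tolist())
```

If the goal is only to rename the columns, passing `names` without `skiprows` keeps the first data row.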
