![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/python_logo.png)

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Load-Excel-Files-As-Pandas-DataFrames" data-toc-modified-id="Load-Excel-Files-As-Pandas-DataFrames-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Load Excel Files As Pandas DataFrames</a></span><ul class="toc-item"><li><span><a href="#Pandas-read_excel" data-toc-modified-id="Pandas-read_excel-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Pandas read_excel</a></span></li><li><span><a href="#How-To-Write-Pandas-DataFrames-to-Excel-Files" data-toc-modified-id="How-To-Write-Pandas-DataFrames-to-Excel-Files-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>How To Write Pandas DataFrames to Excel Files</a></span></li></ul></li></ul></div>

# Load Excel Files As Pandas DataFrames
One of the ways that you'll often use to import your files when you're working with them for data science is with the help of the Pandas package. As we saw previously, the Pandas library is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language.

This powerful and flexible library is very frequently used by (aspiring) data scientists to get their data into data structures that are highly expressive for their analyses.

If you already have Pandas available through Anaconda, you can open a workbook with pd.ExcelFile() and then load its sheets into Pandas DataFrames:

In [125]:
# Import pandas
import pandas as pd

# Load spreadsheet
xl = pd.ExcelFile('https://github.com/fjvarasc/DSPXI/blob/master/data/IMDB-Movie-Data.xlsx?raw=true')
# Print the sheet names
print(xl.sheet_names)


['2010', '2009', '2008', '2007', '2006']


In [126]:
# Load a sheet into a DataFrame by name: df1
df1 = xl.parse('2010')

df1.head()

| | Rank | Title | Genre | Description | Director | Actors | Year | Runtime (Minutes) | Rating | Votes | Revenue (Millions) | Metascore |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 81 | Inception | Action,Adventure,Sci-Fi | A thief, who steals corporate secrets through ... | Christopher Nolan | Leonardo DiCaprio, Joseph Gordon-Levitt, Ellen... | 2010 | 148 | 8.8 | 1583625 | 292.57 | 74.0 |
| 1 | 139 | Shutter Island | Mystery,Thriller | In 1954, a U.S. marshal investigates the disap... | Martin Scorsese | Leonardo DiCaprio, Emily Mortimer, Mark Ruffal... | 2010 | 138 | 8.1 | 855604 | 127.97 | 63.0 |
| 2 | 142 | Diary of a Wimpy Kid | Comedy,Family | The adventures of a teenager who is fresh out ... | Thor Freudenthal | Zachary Gordon, Robert Capron, Rachael Harris,... | 2010 | 94 | 6.2 | 34184 | 64.0 | 56.0 |
| 3 | 159 | Scott Pilgrim vs. the World | Action,Comedy,Fantasy | Scott Pilgrim must defeat his new girlfriend's... | Edgar Wright | Michael Cera, Mary Elizabeth Winstead, Kieran ... | 2010 | 112 | 7.5 | 291457 | 31.49 | 69.0 |
| 4 | 220 | Kick-Ass | Action,Comedy | Dave Lizewski is an unnoticed high school stud... | Matthew Vaughn | Aaron Taylor-Johnson, Nicolas Cage, Chloë Gra... | 2010 | 117 | 7.7 | 456749 | 48.04 | 66.0 |
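
The same ExcelFile pattern extends to every sheet in a workbook. The sketch below is only an illustration: it builds a small two-sheet workbook in a temporary folder instead of downloading the IMDB file, the file name movies_demo.xlsx and its rows are made up, and it assumes an Excel engine such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Build a tiny two-sheet workbook so the example runs without network access.
# The file name and the rows are placeholders, not the tutorial's real data.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "movies_demo.xlsx")
with pd.ExcelWriter(path) as writer:  # assumes openpyxl or XlsxWriter is installed
    pd.DataFrame({"Title": ["A", "B"], "Rating": [8.8, 8.1]}).to_excel(
        writer, sheet_name="2010", index=False
    )
    pd.DataFrame({"Title": ["C"], "Rating": [7.5]}).to_excel(
        writer, sheet_name="2009", index=False
    )

# Open the workbook once and parse every sheet it contains.
with pd.ExcelFile(path) as xl:
    frames = {name: xl.parse(name) for name in xl.sheet_names}

# Stack the sheets into one DataFrame, keeping the sheet name as a column.
all_years = pd.concat(frames, names=["sheet", "row"]).reset_index(level="sheet")
print(all_years)
```

Keeping the sheet name as a column makes it possible to tell the years apart after the sheets have been combined.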


## Pandas read_excel
The read_excel() function reads the data into a Pandas DataFrame, where the first parameter is the filename (or URL) and the second parameter, sheet_name, is the sheet to load. The cell below also imports ExcelWriter and ExcelFile from pandas, although read_excel() itself needs neither.

The column labels are available as df.columns.

In [130]:
from pandas import ExcelWriter
from pandas import ExcelFile
 
#df = pd.read_excel('IMDB-Movie-Data.xlsx', sheet_name='2010')
df = pd.read_excel('https://github.com/fjvarasc/DSPXI/blob/master/data/IMDB-Movie-Data.xlsx?raw=true', sheet_name='2010')
print("Column headings:")
print(df.columns)

Column headings:
Index(['Rank', 'Title', 'Genre', 'Description', 'Director', 'Actors', 'Year',
       'Runtime (Minutes)', 'Rating', 'Votes', 'Revenue (Millions)',
       'Metascore'],
      dtype='object')
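
read_excel() accepts a few more parameters that are often useful. The following is a minimal sketch, assuming an Excel engine such as openpyxl is installed; the file read_excel_demo.xlsx and its three rows are invented for the example so that it runs offline.

```python
import os
import tempfile

import pandas as pd

# Create a small single-sheet workbook to read back (placeholder data).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "read_excel_demo.xlsx")
demo = pd.DataFrame(
    {"Rank": [1, 2, 3], "Title": ["A", "B", "C"], "Rating": [8.8, 8.1, 6.2]}
)
demo.to_excel(path, sheet_name="2010", index=False)

# usecols keeps only the listed columns; nrows limits how many rows are read.
subset = pd.read_excel(path, sheet_name="2010", usecols=["Title", "Rating"], nrows=2)
print(subset)

# sheet_name=None loads every sheet and returns a dict keyed by sheet name.
by_sheet = pd.read_excel(path, sheet_name=None)
print(list(by_sheet))
```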


Using the DataFrame, we can get all the values in a column. To do so, index the DataFrame with the column header; the result is a Pandas Series rather than a plain Python list:

In [131]:
print(df['Title'])

0                                             Inception
1                                        Shutter Island
2                                  Diary of a Wimpy Kid
3                           Scott Pilgrim vs. the World
4                                              Kick-Ass
5                                             Predators
6     Percy Jackson & the Olympians: The Lightning T...
7                                            Black Swan
8                                She's Out of My League
9                                    The Social Network
10                                           Robin Hood
11                                        Despicable Me
12                                              Tangled
13         Harry Potter and the Deathly Hallows: Part 1
14                                          Srpski film
15                                       Blue Valentine
16                                           Iron Man 2
17                                             T
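
Selecting a column gives a Series, and a real Python list takes one more step. A short sketch using a hand-built DataFrame (df_demo is a stand-in containing three titles from the sheet above, not the full data):

```python
import pandas as pd

# Small stand-in for the movie sheet, using three rows shown earlier.
df_demo = pd.DataFrame(
    {
        "Title": ["Inception", "Shutter Island", "Kick-Ass"],
        "Rating": [8.8, 8.1, 7.7],
    }
)

titles = df_demo["Title"]                # single brackets: a pandas Series
title_list = titles.tolist()             # convert the Series to a Python list
two_cols = df_demo[["Title", "Rating"]]  # list of headers: a DataFrame

print(type(titles).__name__)
print(title_list)
print(type(two_cols).__name__)
```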

## How To Write Pandas DataFrames to Excel Files
Let's say that after your analysis of the data, you want to write the data back to a new file. You can write your Pandas DataFrames back to Excel files with the to_excel() method.

But, before you use this method, make sure that you have XlsxWriter installed if you want to write your data to multiple worksheets in an .xlsx file:
```
# Install `XlsxWriter` 
pip install XlsxWriter
```

In [111]:
# Specify a writer; the with-block saves and closes the file on exit
# (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('example.xlsx', engine='xlsxwriter') as writer:
    # Write your DataFrame to a sheet named 'Sheet1'
    df1.to_excel(writer, sheet_name='Sheet1')

Note that in the code chunk above, you use an ExcelWriter object to output the DataFrame.

Stated differently, you pass the writer variable to the to_excel() method and you also specify the sheet name. This way, you add a sheet with the data to the workbook that the writer is creating: you can use the ExcelWriter to save multiple, (slightly) different DataFrames to one workbook.

This all means that if you just want to save one DataFrame to a file, you can also go without installing the XlsxWriter package, as long as another writer engine such as openpyxl is installed. Then, you just don't specify the engine argument that you would pass to pd.ExcelWriter(), and Pandas picks an available engine itself. The rest of the steps stay the same.
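
Saving several DataFrames to one workbook could look like the sketch below. The DataFrames, sheet names, and file name are made up for illustration, and the code assumes either XlsxWriter or openpyxl is installed, since no engine is specified.

```python
import os
import tempfile

import pandas as pd

# Two slightly different DataFrames to store side by side (placeholder rows).
high = pd.DataFrame({"Title": ["Inception", "Shutter Island"], "Rating": [8.8, 8.1]})
low = pd.DataFrame({"Title": ["Diary of a Wimpy Kid"], "Rating": [6.2]})

out_path = os.path.join(tempfile.mkdtemp(), "example_multi.xlsx")

# One writer, several to_excel() calls: each call adds one sheet.
with pd.ExcelWriter(out_path) as writer:
    high.to_excel(writer, sheet_name="high_rated", index=False)
    low.to_excel(writer, sheet_name="low_rated", index=False)

# Reopen the workbook to confirm that both sheets were written.
with pd.ExcelFile(out_path) as xl:
    written = xl.sheet_names
print(written)
```

Passing index=False leaves the row index out of the sheet, which avoids an extra unnamed column when the file is read back.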

Similarly to the functions that you used to read in .csv files, there is also a to_csv() method to write the results back to a comma-separated file. Many of its arguments, such as sep, mirror the ones you used when reading the file:


In [112]:
# Write the DataFrame to csv
df1.to_csv("example.csv")

If you want a tab-separated file, pass '\t' to the sep argument. Note that there are various other methods that you can use to output your files. You can find all of them [here](http://pandas.pydata.org/pandas-docs/stable/api.html#id12).
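
As a small illustration of the sep argument, the sketch below writes a tab-separated file into an in-memory buffer and reads it back. The DataFrame df_demo is a hand-built stand-in, and io.StringIO is used only so that no file is left on disk.

```python
import io

import pandas as pd

# Stand-in data; the Genre values contain commas, which is where tabs help.
df_demo = pd.DataFrame(
    {
        "Title": ["Inception", "Kick-Ass"],
        "Genre": ["Action,Adventure,Sci-Fi", "Action,Comedy"],
    }
)

# Write tab-separated text; index=False leaves out the row numbers.
buffer = io.StringIO()
df_demo.to_csv(buffer, sep="\t", index=False)
text = buffer.getvalue()
print(text)

# Read it back with the same separator.
restored = pd.read_csv(io.StringIO(text), sep="\t")
print(restored)
```

To write to disk instead, pass a file name such as "example.tsv" in place of the buffer.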