# Loading data from `Excel` spreadsheets using `pandas`


## Working with `Excel` files in `Python`

We are all familiar with the spreadsheet format of Microsoft `Excel`. 
However, reading `.xlsx` files into `Python` poses some challenges, primarily because an Excel workbook may contain multiple spreadsheets within a single file.

<img src="../images/sheets.gif">

In this notebook, we are going to explore how to read an `.xlsx` workbook file and how to format it so it is easier to work with. 
We will then do the same with `pandas` and compare the two methods to one another.

----

### About the Data

The Excel workbook is a very small dataset that contains three spreadsheets about musical artists, their albums, and their songs. 
Today we will focus primarily on the "Artist" spreadsheet, which contains the following attributes:

Attribute | Description
----------|------------
`artist`  | artist's name
`cat`     | category of music (genre)
`year`    | artist's founding date


<a id='excel_pandas'></a>
## Reading the `Excel` file with `pandas`

The steps shown above were rather complicated just to get the Excel workbook file in a format that we could easily work with. Even then, the data will normally require more work before we can start analyzing it. 
One problem is that **all of the cell contents are strings, even though we may want integers**. This is where data frames are more appealing.

A better approach is to use `pandas` to read in the `Excel` workbook, select a sheet, and return a data frame object, all in a single call to `pd.read_excel`. The short code snippet below shows how little is needed.

```python
import pandas as pd

# Read the "Artist" sheet and label its three columns
pd.read_excel('/dsa/data/all_datasets/Module4Data.xlsx',
              sheet_name='Artist',
              names=['artist', 'cat', 'year'])
```

| | artist | cat | year |
|---|---|---|---|
| 0 | The Rolling Stones | Rock | 1962 |
| 1 | Prince | Rock | 1958 |
| 2 | The Beatles | Rock | 1960 |
| 3 | Nirvana | Grunge | 1987 |
| 4 | Pearl Jam | Grunge | 1990 |
| 5 | Soundgarden | Grunge | 1984 |
| 6 | Red Hot Chili Peppers | Funk Rock | 1983 |
| 7 | Jane’s Addiction | Alternative Rock | 1985 |
| 8 | No Doubt | Ska Punk | 1986 |
| 9 | Bush | Alternative Rock | 1992 |
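
One way to see why a data frame is more convenient than string-only cell contents is to inspect the column types that `pandas` infers. The sketch below does not touch the course dataset: it writes a small throwaway workbook (three rows copied from the output above) to a temporary directory and reads it back. The file name `sample.xlsx` is made up for illustration, and the example assumes the `openpyxl` package is installed, since `pandas` relies on it for `.xlsx` files.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the "Artist" sheet: three rows taken from the output above.
sample = pd.DataFrame({
    "artist": ["The Rolling Stones", "Prince", "The Beatles"],
    "cat": ["Rock", "Rock", "Rock"],
    "year": [1962, 1958, 1960],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Throwaway workbook, not the course dataset.
    path = os.path.join(tmp_dir, "sample.xlsx")
    sample.to_excel(path, sheet_name="Artist", index=False)

    # Read the sheet back; here the first row really is a header row.
    artists = pd.read_excel(path, sheet_name="Artist")

# Show the inferred type of each column.
print(artists.dtypes)

# Because "year" is numeric, arithmetic works without any manual conversion.
year_span = artists["year"].max() - artists["year"].min()
print(year_span)
```

Since `year` comes back as an integer column, operations such as sorting, filtering by date range, or computing differences work directly.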


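Since we already know which spreadsheet we want from the Excel workbook, we pass its name through the `sheet_name` parameter and voilà! 
Notice that we also use the `names` parameter to supply column labels, because the file doesn't contain a header row of its own. Keep in mind that `read_excel` treats the first row as the header by default (`header=0`), so for a sheet with no header row you should also pass `header=None`; otherwise the first data row is consumed as the header and replaced by `names`.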
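
The interaction between `names` and `header` is easy to check on a small example. The following is a minimal sketch, assuming `openpyxl` is installed; it writes a header-less throwaway workbook (the file name `no_header.xlsx` is hypothetical) and reads it twice, once with the default `header=0` and once with `header=None`.

```python
import os
import tempfile

import pandas as pd

# Three data rows with no header row, mimicking a header-less "Artist" sheet.
rows = pd.DataFrame([
    ["The Rolling Stones", "Rock", 1962],
    ["Prince", "Rock", 1958],
    ["The Beatles", "Rock", 1960],
])
column_names = ["artist", "cat", "year"]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Throwaway workbook, not the course dataset.
    path = os.path.join(tmp_dir, "no_header.xlsx")
    rows.to_excel(path, sheet_name="Artist", index=False, header=False)

    # Default header=0: the first row is treated as the header, then relabeled by `names`.
    with_default = pd.read_excel(path, sheet_name="Artist", names=column_names)

    # header=None: every row is kept as data and `names` supplies the labels.
    with_header_none = pd.read_excel(
        path, sheet_name="Artist", header=None, names=column_names
    )

print(len(with_default))      # one row fewer than was written
print(len(with_header_none))  # all rows preserved
print(with_header_none)
```

Comparing the row count against what you expect from the spreadsheet is a quick way to confirm which setting a given file needs.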

The file is now ready for data manipulation!
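
When the sheet names are not known ahead of time, `pandas` can list them or load every sheet at once. The sketch below is illustrative only: the sheet names `Album` and `Song`, their columns, and the placeholder values are assumptions rather than the contents of the course workbook, and the example again assumes `openpyxl` is installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical three-sheet workbook; names and contents are placeholders.
sheets = {
    "Artist": pd.DataFrame({"artist": ["Prince"], "cat": ["Rock"], "year": [1958]}),
    "Album": pd.DataFrame({"album": ["Album A"], "artist": ["Prince"]}),
    "Song": pd.DataFrame({"song": ["Song A"], "album": ["Album A"]}),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "workbook.xlsx")

    # Write each data frame to its own sheet in a single workbook.
    with pd.ExcelWriter(path) as writer:
        for name, frame in sheets.items():
            frame.to_excel(writer, sheet_name=name, index=False)

    # List the sheet names without loading any data.
    with pd.ExcelFile(path) as workbook:
        sheet_names = workbook.sheet_names

    # sheet_name=None returns a dict mapping each sheet name to a data frame.
    all_sheets = pd.read_excel(path, sheet_name=None)

print(sheet_names)
print(all_sheets["Artist"])
```

Loading everything with `sheet_name=None` is convenient for small workbooks like this one; for large files, reading only the sheet you need keeps memory use down.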

# Save your notebook, then `File > Close and Halt`