# EXCEL
Excel is an extremely common format that nearly everyone is familiar with, but managing Excel files within Excel itself can get messy. Automating that work in Python is straightforward and brings many benefits. That said, Excel is rarely the best choice for managing data; a relational database or NoSQL solution is usually a better fit.

This notebook walks through importing a local Excel file with `pandas`, manipulating the data, and exporting it to a new file, all while maintaining the lineage of the files.

### Import pandas
We will use pandas to manipulate the data once it's stored in a `pandas.DataFrame`.

In [1]:
import pandas

### Set the path
The file contains randomly generated data.

In [2]:
file_path = '../inputs/customer_dummy_data.xls'

### Store the data by creating an instance of `pandas.ExcelFile`

In [3]:
data = pandas.ExcelFile(file_path)

### Retrieving sheet names
Some Excel files have multiple sheets. The `sheet_names` attribute lets you retrieve all of the sheet names programmatically.

In [4]:
data.sheet_names

['customers', 'departments']

### Storing the actual data
The data from each sheet can be retrieved and stored using the `.parse` method. Below I store the data for both sheets. You can also do this in a loop and store the `DataFrame`s in something like a dictionary or list.

In [5]:
customer_data = data.parse('customers')

In [6]:
department_data = data.parse('departments')
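
The looping approach mentioned above could look roughly like the sketch below. It is not part of the original notebook: the tiny in-memory workbook, the made-up rows, and the names `buffer`, `workbook` and `frames` are assumptions for illustration, and it assumes the `openpyxl` engine is installed.

```python
import io

import pandas

# Build a tiny two-sheet workbook in memory so the sketch runs on its own.
# The sheet names mirror the notebook; the rows are made up for illustration.
buffer = io.BytesIO()
with pandas.ExcelWriter(buffer, engine="openpyxl") as writer:
    pandas.DataFrame({"customer_name": ["Ann", "Bo"]}).to_excel(
        writer, sheet_name="customers", index=False
    )
    pandas.DataFrame({"department_name": ["legal"]}).to_excel(
        writer, sheet_name="departments", index=False
    )
buffer.seek(0)

workbook = pandas.ExcelFile(buffer, engine="openpyxl")

# One DataFrame per sheet, keyed by sheet name.
frames = {name: workbook.parse(name) for name in workbook.sheet_names}
```

With the notebook's own file you would use `data` in place of `workbook`, and `frames['customers']` would then hold the same `DataFrame` as `customer_data`.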

### Preview data
So you know what you're working with.

In [7]:
customer_data.head()

Unnamed: 0,customer_name,customer_email,join_date,customer_country,customer_guid
0,Lucian,dictum.eu.eleifend@justofaucibus.co.uk,"Dec 25, 2019",Luxembourg,694782D6-D43E-6408-41AE-66051506E79F
1,Len,mi.Aliquam.gravida@idliberoDonec.com,"Jul 20, 2018",Saint Pierre and Miquelon,16F5F202-9BF5-FDAC-FF45-6686BCEC7F3F
2,Ria,tempus.non.lacinia@ornarelectusjusto.ca,"Jul 5, 2019",Senegal,F4D3A063-5803-2547-AD06-C3EEEA39F2BB
3,Grady,felis.Nulla.tempor@ridiculusmusDonec.ca,"Feb 23, 2019",El Salvador,7FA29C6D-F937-6DCF-983B-34C867FC2991
4,Joseph,nunc.ac.mattis@eros.net,"Feb 19, 2019",French Southern Territories,DF7C0532-67D7-06BF-13CB-0189DE2B2B0C


In [8]:
department_data.head()

Unnamed: 0,department_id,department_name,department_start_date
0,0,accounting,2011-07-26
1,1,human resources,2012-07-26
2,2,research and development,2012-02-10
3,3,legal,2012-03-08


### Making a few small transformations 
I'm just going to apply an uppercase transform to both `DataFrame`s for now.

In [9]:
customer_data['customer_name'] = customer_data['customer_name'].astype(str).str.upper()
customer_data['customer_country'] = customer_data['customer_country'].astype(str).str.upper()
department_data['department_name'] = department_data['department_name'].astype(str).str.upper()
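
If more columns need the same treatment, the repeated lines can be folded into a small helper. The following is a minimal sketch rather than the notebook's own code: `uppercase_columns` is a hypothetical function name, and the sample rows are made up so the block runs without the Excel file.

```python
import pandas

# Made-up rows standing in for the customers sheet.
customers = pandas.DataFrame({
    "customer_name": ["Lucian", "Len"],
    "customer_country": ["Luxembourg", "Senegal"],
    "customer_email": ["a@example.com", "b@example.com"],
})


def uppercase_columns(frame, columns):
    """Return a copy of frame with the named columns converted to uppercase strings."""
    result = frame.copy()
    for column in columns:
        result[column] = result[column].astype(str).str.upper()
    return result


upper_customers = uppercase_columns(customers, ["customer_name", "customer_country"])
```

Working on a copy leaves the imported `DataFrame` untouched, which makes it easier to compare the data before and after the transform.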

### Preview
Now we can see that `customer_country`, `customer_name` and `department_name` have been transformed to uppercase.

In [10]:
customer_data.head()

Unnamed: 0,customer_name,customer_email,join_date,customer_country,customer_guid
0,LUCIAN,dictum.eu.eleifend@justofaucibus.co.uk,"Dec 25, 2019",LUXEMBOURG,694782D6-D43E-6408-41AE-66051506E79F
1,LEN,mi.Aliquam.gravida@idliberoDonec.com,"Jul 20, 2018",SAINT PIERRE AND MIQUELON,16F5F202-9BF5-FDAC-FF45-6686BCEC7F3F
2,RIA,tempus.non.lacinia@ornarelectusjusto.ca,"Jul 5, 2019",SENEGAL,F4D3A063-5803-2547-AD06-C3EEEA39F2BB
3,GRADY,felis.Nulla.tempor@ridiculusmusDonec.ca,"Feb 23, 2019",EL SALVADOR,7FA29C6D-F937-6DCF-983B-34C867FC2991
4,JOSEPH,nunc.ac.mattis@eros.net,"Feb 19, 2019",FRENCH SOUTHERN TERRITORIES,DF7C0532-67D7-06BF-13CB-0189DE2B2B0C


In [11]:
department_data.head()

Unnamed: 0,department_id,department_name,department_start_date
0,0,ACCOUNTING,2011-07-26
1,1,HUMAN RESOURCES,2012-07-26
2,2,RESEARCH AND DEVELOPMENT,2012-02-10
3,3,LEGAL,2012-03-08


### Create the export path name
Note: this file does not exist yet

In [12]:
export_path = '../../customer_data_transformed.xlsx'

### Exporting
To export, use a `pandas.ExcelWriter` object. The reason I use this rather than calling `pandas.DataFrame.to_excel` with a file path is that the writer supports multiple sheets in one file. Writing two `DataFrame`s to the writer object and then saving to disk can be done like so. If you don't want the index column, specify `index=False`.

In [14]:
with pandas.ExcelWriter(export_path) as writer:
    customer_data.to_excel(writer, sheet_name='customer_data_transformed', index_label='row_index')
    department_data.to_excel(writer, sheet_name='department_data_transformed', index_label='row_index')
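
To check an export, you can read the sheet back in. The sketch below also shows the `index=False` variant. It is an illustration under assumptions: an in-memory buffer stands in for `export_path`, the rows are made up, and the `openpyxl` engine is assumed to be installed.

```python
import io

import pandas

# Made-up rows standing in for the transformed departments sheet.
departments = pandas.DataFrame({"department_name": ["ACCOUNTING", "LEGAL"]})

# An in-memory buffer stands in for export_path so nothing is written to disk.
buffer = io.BytesIO()
with pandas.ExcelWriter(buffer, engine="openpyxl") as writer:
    # index=False leaves the row index out of the sheet entirely.
    departments.to_excel(writer, sheet_name="department_data_transformed", index=False)
buffer.seek(0)

# Read the sheet back to confirm what was actually written.
round_trip = pandas.read_excel(
    buffer, sheet_name="department_data_transformed", engine="openpyxl"
)
```

Because the index was left out, `round_trip` contains only the `department_name` column, with no extra `row_index` or `Unnamed: 0` column.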