# Summary

The hardest part about doing financial workloads in Python is that the source data usually lives in Excel spreadsheets that are not formatted for data science. Here are some of the specific issues that make it hard for finance groups to use Python:

* *Data is not Tidy:* Excel spreadsheets are formatted for presentation, not data science. People add blank rows and cells to make the sheets look pretty. But data science tools (Python and most BI tools) expect data to be in Tidy Format. Tidy data conforms to three rules:
    * Each variable must have its own column.
    * Each observation must have its own row.
    * Each value must have its own cell.
* *No transparency:* Excel spreadsheets rarely document their transformations, which makes it hard to trace why a given transformation was made. In Python, every transformation can be commented, which allows for greater transparency.
* *No single source of truth:* Excel spreadsheets rarely connect to a system of record. Instead, they pull values from other spreadsheets using Power Query or other tools, so tracing a cell back to its original source is very difficult.
* *Different date standards:* Some financial workloads use work weeks instead of dates. This is an issue because there are multiple standards for aligning work weeks to dates, while data science algorithms usually expect dates in a standard datetime format.
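To make the tidy rules concrete, here is a minimal sketch using pandas. The table, its column names, and its values are made up for illustration: it mimics a presentation-style sheet with a blank spacer row and one column per month. The comments also show the kind of step-by-step explanation that Python code can carry and a spreadsheet usually does not.

```python
import pandas as pd

# Hypothetical presentation-style table: a blank spacer row and
# one column per month, the way a report sheet is often laid out.
report = pd.DataFrame(
    {
        "Region": ["East", None, "West"],
        "Jan": [100, None, 80],
        "Feb": [120, None, 90],
    }
)

# Rule 2 (each observation has its own row): drop the blank spacer row.
report = report.dropna(how="all")

# Rule 1 (each variable has its own column): "month" is a variable hidden in
# the column headers, so melt the month columns into a single Month column.
tidy = report.melt(id_vars="Region", var_name="Month", value_name="Sales")

print(tidy)
```

After the melt, every row is one (Region, Month, Sales) observation, which is the shape pandas and BI tools expect for grouping and plotting.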
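As an example of why work weeks are ambiguous, the sketch below maps the same work-week label to a date under two conventions: ISO 8601 weeks (using the standard library's `date.fromisocalendar`, available in Python 3.8+) and a hypothetical convention in which week 1 starts on January 1. Neither is necessarily the one your organization uses, so confirm which standard your data follows before converting.

```python
from datetime import date, timedelta


def iso_week_start(year: int, week: int) -> date:
    # ISO 8601: weeks start on Monday, and week 1 is the week
    # that contains the year's first Thursday.
    return date.fromisocalendar(year, week, 1)


def jan1_week_start(year: int, week: int) -> date:
    # Hypothetical alternative convention: week 1 starts on January 1
    # and every week is a consecutive 7-day block.
    return date(year, 1, 1) + timedelta(weeks=week - 1)


# The same label "2021 WW01" maps to different dates under the two conventions.
print(iso_week_start(2021, 1))   # 2021-01-04
print(jan1_week_start(2021, 1))  # 2021-01-01
```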


Openpyxl is a library that allows users to create, open, edit, and save Excel files. After completing this tutorial, you will know:
* How to use Openpyxl to open and manipulate an Excel workbook in Python.

Let’s get started.

# Tutorial Overview
This tutorial is divided into 2 parts:
1. How to use Openpyxl
2. When to use Openpyxl

# How to use Openpyxl

```python
import openpyxl

# open the workbook and list its worksheets
file = "data/XL-02-PC-05-Cleaning-Up-Data-Before.xlsx"
wb = openpyxl.load_workbook(file)
print(wb.sheetnames)
# ['Orders', 'Sales-Reps', 'Companies', 'Regions', 'Summary', 'Pivots']
```
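Beyond listing sheets, you can read and write individual cells. The sketch below builds a small in-memory workbook instead of loading the tutorial's file, so the rows are made up; only the sheet name "Orders" matches the workbook above.

```python
import openpyxl

# Build a small in-memory workbook so the example runs without the tutorial's file.
wb = openpyxl.Workbook()
ws = wb.active
ws.title = "Orders"
ws.append(["OrderID", "Rep", "Amount"])
ws.append([1001, "Ana", 250])
ws.append([1002, "Ben", 400])

# Select a worksheet by name, then read a single cell by its coordinate.
orders = wb["Orders"]
print(orders["B2"].value)  # Ana

# Iterate over the data rows as plain tuples of values, skipping the header row.
for order_id, rep, amount in orders.iter_rows(min_row=2, values_only=True):
    print(order_id, rep, amount)

# Write to a cell by row/column number (both 1-based).
orders.cell(row=3, column=3, value=450)
```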

```python
# create a new worksheet; with no title given, openpyxl uses a default name such as "Sheet"
new_ws = wb.create_sheet()
print(new_ws)
# <Worksheet "Sheet">
```

```python
# create a new worksheet, make it the first sheet in the workbook, and name it
first_ws = wb.create_sheet(index=0, title="First Sheet")
print(first_ws)
# <Worksheet "First Sheet">
```
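Sheets can also be reordered and removed. This sketch uses a fresh in-memory workbook with hypothetical sheet names so that it runs on its own: `Workbook.move_sheet` shifts a sheet by an offset, and `Workbook.remove` deletes it.

```python
import openpyxl

wb = openpyxl.Workbook()  # a new workbook starts with one sheet named "Sheet"
wb.create_sheet(title="Orders")
wb.create_sheet(title="Summary")
print(wb.sheetnames)  # ['Sheet', 'Orders', 'Summary']

# Move "Summary" one position toward the front (a negative offset moves left).
wb.move_sheet(wb["Summary"], offset=-1)
print(wb.sheetnames)  # ['Sheet', 'Summary', 'Orders']

# Delete the default sheet that a new Workbook creates.
wb.remove(wb["Sheet"])
print(wb.sheetnames)  # ['Summary', 'Orders']
```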

```python
# save the workbook under the same name as the original
# (note: this overwrites the original file on disk)
wb.save(file)
```
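Saving over the original, as in the cell above, leaves no untouched copy of the source data. The openpyxl documentation also notes that it does not read every item in an Excel file (shapes, for example), so such items can be lost when a file is opened and saved. A safer pattern, sketched below with a hypothetical output name and a temporary directory standing in for a real output folder, is to save to a new file and reload it to check the result.

```python
import os
import tempfile

import openpyxl

wb = openpyxl.Workbook()
wb.active.title = "Orders"
wb.create_sheet(index=0, title="First Sheet")

# Save to a new file instead of overwriting the source workbook.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "cleaned.xlsx")  # hypothetical output name
wb.save(out_path)

# Reload the saved file to confirm the change was written.
reloaded = openpyxl.load_workbook(out_path)
print(reloaded.sheetnames)  # ['First Sheet', 'Orders']
```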

# When to use Openpyxl

Openpyxl makes it easy to work with Excel workbooks in Python. If the goal is just to do data analysis, then Pandas is likely the better option (`df = pd.read_excel(file)`).
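When you only need the data, pandas can read a workbook directly. By default `pd.read_excel` returns the first sheet; passing `sheet_name=None` returns a dictionary of DataFrames keyed by sheet name. The sketch first writes a throwaway workbook with made-up data so that it is self-contained; recent versions of pandas use openpyxl to read `.xlsx` files.

```python
import os
import tempfile

import openpyxl
import pandas as pd

# Create a throwaway two-sheet workbook with made-up data to read back.
wb = openpyxl.Workbook()
orders = wb.active
orders.title = "Orders"
orders.append(["OrderID", "Amount"])
orders.append([1001, 250])
regions = wb.create_sheet(title="Regions")
regions.append(["Region", "Manager"])
regions.append(["East", "Ana"])
path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
wb.save(path)

# One sheet -> one DataFrame (the first sheet by default).
df = pd.read_excel(path)

# sheet_name=None reads every sheet into a dict of {sheet name: DataFrame}.
frames = pd.read_excel(path, sheet_name=None)
print(list(frames))
print(frames["Regions"])
```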

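An advantage of Openpyxl is that it works with the workbook itself rather than only its data: you can move between worksheets within a workbook, create and reorder sheets, edit individual cells, and save the result back to Excel.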
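Here is a minimal sketch of working across worksheets: it reads values from one sheet and writes a total into another. The data is made up, and the sheets are only named after the tutorial's "Orders" and "Summary". Note that openpyxl stores a formula as text and does not calculate it; the value is computed by the spreadsheet application when the file is opened.

```python
import openpyxl

# Made-up data standing in for the tutorial's "Orders" and "Summary" sheets.
wb = openpyxl.Workbook()
orders = wb.active
orders.title = "Orders"
orders.append(["OrderID", "Amount"])
for row in [[1001, 250], [1002, 400], [1003, 150]]:
    orders.append(row)
summary = wb.create_sheet(title="Summary")

# Read values from one worksheet...
total = sum(amount for _, amount in orders.iter_rows(min_row=2, values_only=True))

# ...and write the result into another worksheet in the same workbook.
summary["A1"] = "Total Amount"
summary["B1"] = total

# Alternatively, store an Excel formula that references the other sheet.
summary["A2"] = "Total (formula)"
summary["B2"] = "=SUM(Orders!B2:B4)"

# Visit every worksheet in workbook order.
for ws in wb.worksheets:
    print(ws.title, ws.max_row)
```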

# Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.
* Describe three examples when Openpyxl would be better than using Excel directly.
* Complete the next example that uses Openpyxl to clean a dataset. 

# Further Reading
This section provides more resources on the topic if you are looking to go deeper.

## Books
* Automate the Boring Stuff with Python, by Al Sweigart. Chapter 13. https://automatetheboringstuff.com/2e/chapter13/

## APIs
* Openpyxl. https://openpyxl.readthedocs.io/en/stable/

# Summary

In this tutorial, you were introduced to the Openpyxl library. Specifically, you learned:
* How to use Openpyxl to open and manipulate an Excel workbook in Python.

# Next

In the next section, you will use Openpyxl to clean up a workbook. 