# Introduction
In this module, we will use Python to read from and write to Excel files. Excel is perhaps the most popular business technology, so it is immensely useful to be able to interact with it, either by extracting data from Excel files or by writing data to them. To do this, we will use the openpyxl library, which gives us direct access to Excel files from Python.

# Excel
To be effective in our use of the openpyxl library, we need to establish some terminology that is common to Excel. An Excel document is called a workbook. A single workbook is saved in a file with the .xlsx extension. Data in the workbook are stored on worksheets, and each workbook has one or more sheets (also called worksheets). The sheet the user is currently viewing (or last viewed before closing Excel) is called the active sheet. Each sheet is a collection of cells arranged in columns and rows. Columns are labeled alphabetically, so column ‘A’ is the first column. Rows are numbered, so the first row is row 1. Cells may contain numbers, text, or formulas.
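
As a rough sketch of how these terms map onto openpyxl objects, the code below creates a workbook in memory, writes to cells on its active sheet, and uses the helper functions `get_column_letter` and `column_index_from_string` from `openpyxl.utils` to convert between column letters and column numbers. The cell contents are made up for illustration.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter, column_index_from_string

wb = Workbook()          # a workbook (saved later as an .xlsx file)
ws = wb.active           # the active worksheet
ws["A1"] = "Author"      # column 'A', row 1 holds text (illustrative value)
ws["B1"] = 1900          # column 'B', row 1 holds a number (illustrative value)

# Columns are lettered and rows are numbered starting at 1
print(get_column_letter(1))            # A
print(get_column_letter(28))           # AB
print(column_index_from_string("AB"))  # 28
```

After column Z, Excel continues with two-letter names (AA, AB, ...), which is why column 28 is AB.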

In [None]:
!pip install openpyxl

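We start by installing the openpyxl library. The cell above installs it from within a Jupyter notebook (the leading ! runs the command in the shell). If you are working in PyCharm instead, you can install it through the IDE, as the screenshots below illustrate. In the Project Settings, click on 'Project Interpreter' and click the '+' button. Depending on whether you have a Windows or Apple device, the '+' button may be on the right or at the bottom. <br>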
<img src="https://thislondonhouse.com/Jupyter/Images/11-Excel-01.png" width="50%" /><br>
Then search for ‘openpyxl’ and click 'Install Package'.<br>
<img src="https://thislondonhouse.com/Jupyter/Images/11-Excel-02.png" width="50%" />
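
Whichever route you take, a quick way to confirm that the library is available to your current interpreter is to import it and print its version string:

```python
import openpyxl

# If the installation succeeded, the import works and a version string is reported
print(openpyxl.__version__)
```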

## Accessing Workbooks
We open Excel files using the openpyxl.load_workbook() function. This function accepts a string containing the path to an Excel file and returns a Workbook object. For more information, see the openpyxl tutorial: https://openpyxl.readthedocs.io/en/stable/tutorial.html

In [None]:
import openpyxl

In [None]:
xlsxFile = 'support/censuspopdata.xlsx'
wb = openpyxl.load_workbook(xlsxFile)
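
If you do not have support/censuspopdata.xlsx at hand, the sketch below shows the same call end to end. It first writes a tiny stand-in workbook to a temporary folder (the file name example.xlsx and its contents are hypothetical) and then loads it back with `load_workbook()`.

```python
import os
import tempfile
import openpyxl

# Build a tiny workbook so there is a file to load (stand-in for censuspopdata.xlsx)
path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
wbOut = openpyxl.Workbook()
wbOut.active["A1"] = "Header"
wbOut.save(path)

# load_workbook accepts a path and returns a Workbook object
wbIn = openpyxl.load_workbook(path)
print(type(wbIn))               # <class 'openpyxl.workbook.workbook.Workbook'>
print(wbIn.active["A1"].value)  # Header
```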

Alternatively, we can create a new, empty workbook by calling the openpyxl.Workbook() constructor.

In [None]:
wbEmpty = openpyxl.Workbook()
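
An "empty" workbook is not entirely empty. As a quick check, the sketch below shows that a new workbook already contains one worksheet, titled 'Sheet', and that this sheet is the active one.

```python
import openpyxl

wbNew = openpyxl.Workbook()
# A new workbook starts with a single worksheet titled 'Sheet',
# and that sheet is the active one
print(wbNew.sheetnames)    # ['Sheet']
print(wbNew.active.title)  # Sheet
```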

For the remainder of this session, we will use the empty workbook and populate it with data from the support/novels.txt file.

## Accessing Worksheets
Workbooks are made up of one or more worksheets. Within openpyxl, you can query the number and names of the available worksheets, and you can create and delete worksheets. We can access a list of worksheet names through the .sheetnames property. Because this property is a list of strings, it has a length and you can access individual names by index.

In [None]:
type(wbEmpty)

In [None]:
wbEmpty.sheetnames

In [None]:
sheetList = wbEmpty.sheetnames
print(len(sheetList))
print(sheetList)

In [None]:
wbEmpty[sheetList[0]].title

In [None]:
wbEmpty['Sheet'].title

In [None]:
wbEmpty.active.title
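
Besides the names in .sheetnames, a workbook also exposes the worksheet objects themselves, in the same order, through its .worksheets list, and .index() reports a sheet's position. A small sketch (the sheet title "Second" is hypothetical):

```python
import openpyxl

wb = openpyxl.Workbook()
wb.create_sheet("Second")
# sheetnames is a plain list of strings, so the usual list operations apply
names = wb.sheetnames
print(len(names), names[-1])     # 2 Second
# worksheets holds the Worksheet objects, in the same order as sheetnames
firstSheet = wb.worksheets[0]
print(firstSheet.title)          # Sheet
print(wb.index(wb["Second"]))    # 1
```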

We create sheets using the .create_sheet() method. This method expects a string value which represents the title of the sheet. The method returns a reference to the created worksheet.

In [None]:
wbEmpty.create_sheet("Top English Novels")

In [None]:
ws = wbEmpty.create_sheet("Another Sheet")

In [None]:
ws.title
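
By default, .create_sheet() adds the new sheet after the existing ones. It also accepts an optional index as its second argument, which controls where the sheet is inserted. The sheet titles below are hypothetical:

```python
import openpyxl

wb = openpyxl.Workbook()
wb.create_sheet("Last")       # appended at the end by default
wb.create_sheet("First", 0)   # index 0 inserts it at the front
print(wb.sheetnames)          # ['First', 'Sheet', 'Last']
```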

We can delete worksheets using the .remove() method. This method expects a worksheet object as its input parameter, and the worksheet that is passed to it is then deleted from the workbook. (Older tutorials use .remove_sheet(), a deprecated name for the same operation.)

In [None]:
wbEmpty.remove(ws)
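
A sheet can also be deleted by title with Python's del statement, which saves looking up the worksheet object first. A minimal sketch with hypothetical sheet titles:

```python
import openpyxl

wb = openpyxl.Workbook()
scratch = wb.create_sheet("Scratch")
wb.create_sheet("Temp")

wb.remove(scratch)    # remove() takes the worksheet object
del wb["Temp"]        # del removes a sheet by its title
print(wb.sheetnames)  # ['Sheet']
```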

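Worksheets are accessed using string keys that match the title of the worksheet. The result is a Worksheet object with various methods and properties, such as .title and the .max_row, .max_column, .min_row and .min_column properties, which report the boundaries of the data on the sheet.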

In [None]:
ws = wbEmpty["Top English Novels"]

In [None]:
ws.title

In [None]:
ws.max_row

In [None]:
ws.max_column

In [None]:
ws.min_row

In [None]:
ws.min_column
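
Note that even an empty sheet reports 1 for these bounds, so a value of 1 does not by itself mean that the first row holds data. The sketch below (with a hypothetical data row) shows the bounds before and after appending rows, along with the .dimensions property, which returns the used range as a string.

```python
import openpyxl

wb = openpyxl.Workbook()
ws = wb.active
# An empty sheet still reports 1 for its bounds
emptyBounds = (ws.max_row, ws.max_column)
print(emptyBounds)                  # (1, 1)

ws.append(["Author", "Title", "Year", "Rank"])
ws.append(["Author A", "Title A", 1900, 1])   # hypothetical row
print(ws.max_row, ws.max_column)    # 2 4
print(ws.dimensions)                # A1:D2
```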

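We can add data to worksheets cell by cell or row by row. The .append() method adds rows of data. It expects a list or tuple of values, places the first element in column A, and writes each new row below the last row currently in use (or in row 1 of an empty sheet). We will look at adding data to individual cells in the following section. Rows can be removed with .delete_rows(idx, amount), which deletes amount rows starting at row idx (amount defaults to 1). Below, we append the header row twice, once directly and once from a list built element by element, and then delete the duplicate in row 2. The final call, delete_rows(2, 5), removes rows 2 through 6, which are empty at that point, so it only illustrates the amount argument.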

In [None]:
ws.append(["Author", "Title", "Year", "Rank"])

In [None]:
data = []
data.append("Author")
data.append("Title")
data.append("Year")
data.append("Rank")

In [None]:
data

In [None]:
ws.append(data)

In [None]:
ws.delete_rows(2)

In [None]:
ws.delete_rows(2,5)
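
To see what .delete_rows() does to the rows that follow, the self-contained sketch below appends a duplicate header followed by a hypothetical data row, then deletes row 2. The rows underneath shift up to fill the gap.

```python
import openpyxl

wb = openpyxl.Workbook()
ws = wb.active
header = ("Author", "Title", "Year", "Rank")   # tuples work as well as lists
ws.append(header)
ws.append(header)                              # accidental duplicate in row 2
ws.append(("Author A", "Title A", 1900, 1))    # hypothetical data row in row 3

ws.delete_rows(2)        # amount defaults to 1, so only row 2 is removed
print(ws.max_row)        # 2
print(ws["A2"].value)    # Author A (the old row 3 moved up)
```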

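Before we move on to accessing cells, we need to populate the worksheet. The following cells read support/novels.txt line by line, split each line into its author, title, year and rank, store the results in novelList, and then append each novel to the worksheet.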

In [None]:
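novelList = []
with open("support/novels.txt") as fileHandler:
    for line in fileHandler:
        if not line.strip():
            continue  # skip blank lines
        # The slicing below assumes lines shaped like: rank. Title (Author, year)
        rank = int(line[:line.find(".")].strip())
        title = line[line.find(".") + 1:line.find("(")].strip()
        author = line[line.find("(") + 1:line.find(",", line.find("("))].strip()
        year = int(line.strip()[-5:-1])  # the four digits before the closing ")"
        novelList.append([author, title, year, rank])  # ints are stored as numbers in Excel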

In [None]:
len(novelList)

In [None]:
type(novelList[0])

In [None]:
len(novelList[0])

In [None]:
novelList[0]

In [None]:
novelList[0][0]

In [None]:
for novel in novelList:
    print("Appending ", novel)
    ws.append(novel)
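
The string slicing is easier to check on a single line than on the whole file. The sketch below wraps the same slicing in a hypothetical helper, parseNovel, and converts the year and rank to integers so that Excel stores them as numbers rather than text. The sample line is made up, and the "rank. Title (Author, year)" shape is inferred from the slicing rather than documented anywhere.

```python
def parseNovel(line):
    """Hypothetical helper: split 'rank. Title (Author, year)' into [author, title, year, rank]."""
    rank = int(line[:line.find(".")].strip())
    title = line[line.find(".") + 1:line.find("(")].strip()
    # The author runs from '(' to the first comma after it
    author = line[line.find("(") + 1:line.find(",", line.find("("))].strip()
    # The year is the four characters just before the closing ')'
    year = int(line.strip()[-5:-1])
    return [author, title, year, rank]

sample = "12. Title A (Author A, 1900)\n"   # hypothetical line
print(parseNovel(sample))                 # ['Author A', 'Title A', 1900, 12]
```

A title that itself contains "(" would break this slicing, so the approach relies on the file being consistently formatted.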

## Accessing Cells
Openpyxl offers a variety of ways to access cells. In the following code, we will look at accessing single cells, rows of cells and columns of cells. First, you can access a cell by its column/row coordinate. In Excel, columns are identified by letters and rows by numbers, so 'A1' is the first cell of the first row.

In [None]:
print(ws['A1'])

In [None]:
cellRef = "C" + str(ws.max_row)
print(ws[cellRef])
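
Building a string such as "C" + str(row) works, but openpyxl also provides ws.cell(), which addresses a cell by numeric row and column. This is often more convenient inside loops. Passing value= writes to the cell as well. A minimal sketch with hypothetical values:

```python
import openpyxl

wb = openpyxl.Workbook()
ws = wb.active
ws["C5"] = 1900                          # hypothetical value
# ws.cell() takes numeric row and column positions (column 3 is 'C')
cell = ws.cell(row=5, column=3)
print(cell.coordinate, cell.value)       # C5 1900
# Passing value= also writes to the cell
ws.cell(row=1, column=5, value="Age")
print(ws["E1"].value)                    # Age
```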

Accessing a cell in this way retrieves a Cell object. This object has properties and methods that give your application access to various cell attributes, such as its value, coordinate, row and column.

In [None]:
cell = ws[cellRef]

In [None]:
cell.value

In [None]:
cell.coordinate

In [None]:
cell.row

In [None]:
cell.column

It is important to keep this in mind, as you will usually want to access the .value of a cell rather than the cell itself. This makes sense if you think about all of the things you can do to cells beyond simply entering a value (e.g., labels, colors, formatting), but it is not how we are accustomed to thinking about cells in Excel.
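
The difference is easy to see by printing both. In current openpyxl versions, .column returns the column as an integer, and .column_letter gives the familiar letter. The cell value below is hypothetical.

```python
import openpyxl

wb = openpyxl.Workbook()
ws = wb.active
ws["C2"] = 1900              # hypothetical value
cell = ws["C2"]
print(cell)                  # the Cell object: <Cell 'Sheet'.C2>
print(cell.value)            # 1900 (the contents)
print(cell.column)           # 3
print(cell.column_letter)    # C
```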

Second, we can access cells by row. When accessing cells by row, we retrieve one or more rows, each of which contains one or more cells. Below are three different approaches to accessing rows. The first retrieves all rows through the .rows property and iterates through each row. The second retrieves a single row using its row number as a string key (ws['22']). The final approach uses .iter_rows() to retrieve only the rows between a minimum and a maximum row number.

In [None]:
for row in ws.rows:
    print(row)

In [None]:
for cell in ws['22']:
    print(cell.value)

In [None]:
for row in ws.iter_rows(min_row=2, max_row=20):
    print(row)
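
Because we usually want values rather than Cell objects, .iter_rows() accepts values_only=True, which yields plain tuples of values. The row and column bounds can also be combined. A sketch with hypothetical rows:

```python
import openpyxl

wb = openpyxl.Workbook()
ws = wb.active
ws.append(["Author", "Title", "Year", "Rank"])
ws.append(["Author A", "Title A", 1900, 1])   # hypothetical rows
ws.append(["Author B", "Title B", 1910, 2])

# values_only=True yields tuples of cell values instead of Cell objects
for row in ws.iter_rows(min_row=2, values_only=True):
    print(row)
# ('Author A', 'Title A', 1900, 1)
# ('Author B', 'Title B', 1910, 2)

# Bounds can be combined: rows 2-3, columns B-C only
titlesAndYears = list(ws.iter_rows(min_row=2, max_row=3, min_col=2, max_col=3, values_only=True))
print(titlesAndYears)   # [('Title A', 1900), ('Title B', 1910)]
```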

Finally, we can access cells by column. Just as when accessing cells by row, we retrieve one or more columns, each of which contains one or more cells. For columns, we have the same alternatives available to us: accessing all columns (.columns), accessing a specific column by its letter (ws['D']), and accessing a subset of columns between boundary columns (.iter_cols()).

In [None]:
for col in ws.columns:
    print(col)

In [None]:
for cell in ws['D']:
    print(cell.value)

In [None]:
for col in ws.iter_cols(min_col=2, max_col=4):
    print(col)
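
A range of column letters such as 'B:C' also works as a key and returns a tuple of columns, and .iter_cols() accepts values_only=True just like .iter_rows(). A sketch with hypothetical rows:

```python
import openpyxl

wb = openpyxl.Workbook()
ws = wb.active
ws.append(["Author", "Title", "Year", "Rank"])
ws.append(["Author A", "Title A", 1900, 1])   # hypothetical rows
ws.append(["Author B", "Title B", 1910, 2])

# A letter range such as 'B:C' returns a tuple of columns, each a tuple of cells
for col in ws["B:C"]:
    print([cell.value for cell in col])
# ['Title', 'Title A', 'Title B']
# ['Year', 1900, 1910]

# iter_cols also accepts values_only=True; take column C below the header
years = next(ws.iter_cols(min_col=3, max_col=3, min_row=2, values_only=True))
print(years)   # (1900, 1910)
```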

The row and column approaches can be combined to visit every cell. The code below iterates across the columns and, within each column, down the rows.

In [None]:
for col in ws.columns:
    for cell in col:
        print(cell.value)
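
The nesting order determines the order in which cells are visited. The sketch below compares column-by-column iteration (as in the cell above) with row-by-row iteration on a 2 x 2 sheet whose cells simply hold their own coordinates.

```python
import openpyxl

wb = openpyxl.Workbook()
ws = wb.active
ws.append(["A1", "B1"])
ws.append(["A2", "B2"])

# Column by column: down column A, then down column B
byColumn = [cell.value for col in ws.columns for cell in col]
# Row by row: across row 1, then across row 2
byRow = [cell.value for row in ws.rows for cell in row]
print(byColumn)   # ['A1', 'A2', 'B1', 'B2']
print(byRow)      # ['A1', 'B1', 'A2', 'B2']
```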

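We can add data to our worksheets in a similar manner. In the example below, we write a header to cell E1 and then write a formula into every other cell in that column. Each formula subtracts the novel's publication year (column C of the same row) from 2019, giving the novel's age. openpyxl only stores the formula text; Excel calculates the result when it opens the file.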

In [None]:
ws['E1'].value = "Age"
for cell in ws['E']:
    if cell.row > 1:
        cell.value = "=2019 - C" + str(cell.row)
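
Because openpyxl does not evaluate formulas, reading the cell back returns the formula string itself. The cell's data_type of 'f' marks it as a formula. A minimal sketch with a hypothetical year:

```python
import openpyxl

wb = openpyxl.Workbook()
ws = wb.active
ws.append(["Year", "Age"])
ws.append([1900])               # hypothetical year in A2
ws["B2"] = "=2019 - A2"         # a string starting with '=' is stored as a formula

print(ws["B2"].value)           # =2019 - A2 (openpyxl does not calculate it)
print(ws["B2"].data_type)       # f
```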


## Creating Charts
Openpyxl can also create native Excel charts. For more information on creating charts, please refer to the documentation: https://openpyxl.readthedocs.io/en/stable/charts/introduction.html.
In the following example, we create a bar chart from data in our workbook.

First, we read the publication year from column C of every data row and count the novels published in each decade.

In [None]:
decadesDict = {}
for cell in ws['C']:
    if cell.row > 1:
        decade = str(10 * (int(cell.value) // 10))  # e.g. 1847 -> '1840'
        decadesDict.setdefault(decade, 0)
        decadesDict[decade] += 1

In [None]:
decadesDict
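
The same grouping can be written with collections.Counter from the standard library, which handles the "start at zero, then add one" bookkeeping for us. A sketch with hypothetical publication years:

```python
from collections import Counter

years = [1815, 1847, 1847, 1871, 1925]   # hypothetical publication years
# Integer division drops the last digit, so 1847 // 10 * 10 == 1840
decades = Counter(str(year // 10 * 10) for year in years)
print(sorted(decades.items()))
# [('1810', 1), ('1840', 2), ('1870', 1), ('1920', 1)]
```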

We then add this data to a new worksheet and build a bar chart from the results.

In [None]:
wsChart = wbEmpty.create_sheet('Chart')  # create_sheet returns the new worksheet
wsChart.append(['Decade', 'Count'])
for decade in sorted(decadesDict.keys()):
    wsChart.append([decade, decadesDict[decade]])

The code below initializes the chart as a bar chart. Setting .type to "bar" draws horizontal bars (openpyxl uses "col" for vertical columns), and .style = 11 selects one of the numbered preset chart styles. The chart title and the axis titles are set in the same way.

In [None]:
from openpyxl.chart import BarChart, Reference

In [None]:
chart = BarChart()
chart.type = "bar"
chart.style = 11
chart.title = "Novels by Decade"
chart.y_axis.title = 'Novels Published'
chart.x_axis.title = 'Decade'
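
For comparison, the sketch below configures a vertical column chart instead. Since the chart will show a single series, it also removes the legend by setting it to None, which is optional. The style number here is an arbitrary choice for illustration.

```python
from openpyxl.chart import BarChart

horizontal = BarChart()
horizontal.type = "bar"     # bars run left to right

vertical = BarChart()
vertical.type = "col"       # bars run bottom to top
vertical.style = 10         # an arbitrary preset style number
vertical.legend = None      # a single series does not need a legend
print(horizontal.type, vertical.type)   # bar col
```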

These lines identify the data that will be graphed (the counts, including the column header) and the categories that will be used to group the data (the decades, not including the column header). Each is specified by constructing a string that represents a range of cells, in the same form Excel uses for a range on a named sheet, such as 'Chart'!B1:B25. Excel requires the quotes around the sheet title only when the title contains spaces or other special characters.

In [None]:
data = Reference(wsChart, range_string=wsChart.title + "!B1:B" + str(wsChart.max_row))
categories = Reference(wsChart, range_string=wsChart.title + "!A2:A" + str(wsChart.max_row))
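
Reference also accepts the bounds as numbers (min_col, min_row, max_col, max_row), which avoids building the range string by hand. The sketch below describes the same two ranges this way on a small hypothetical Chart sheet. When max_col is omitted, it defaults to min_col.

```python
import openpyxl
from openpyxl.chart import Reference

wb = openpyxl.Workbook()
wsChart = wb.active
wsChart.title = "Chart"
wsChart.append(["Decade", "Count"])
for decade, count in [("1840", 2), ("1870", 1)]:   # hypothetical counts
    wsChart.append([decade, count])

# Same ranges as the range_string version, expressed as row/column bounds
data = Reference(wsChart, min_col=2, min_row=1, max_row=wsChart.max_row)        # B1:B3
categories = Reference(wsChart, min_col=1, min_row=2, max_row=wsChart.max_row)  # A2:A3
print(data.min_col, data.min_row, data.max_col, data.max_row)   # 2 1 2 3
```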

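The following lines add the data and categories to the chart (titles_from_data=True uses the header cell B1 as the series name), set the chart's height and width, and place it on the Chart worksheet with its top-left corner at cell D1.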

In [None]:
chart.add_data(data, titles_from_data=True)
chart.set_categories(categories)
chart.height = 20
chart.width = 10
chart.shape = 4
wsChart.add_chart(chart, "D1")

<img src="https://thislondonhouse.com/Jupyter/Images/11-Excel-03.png" width="50%" />
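
Putting the chart steps together, here is a self-contained sketch that builds a small hypothetical Chart sheet, adds a column chart anchored at D1, and saves the workbook to a temporary folder. The file name chart_demo.xlsx is hypothetical. The width and height shown are openpyxl's defaults, which the documentation gives in centimetres.

```python
import os
import tempfile
import openpyxl
from openpyxl.chart import BarChart, Reference

wb = openpyxl.Workbook()
ws = wb.active
ws.title = "Chart"
ws.append(["Decade", "Count"])
for row in [("1840", 2), ("1870", 1), ("1920", 1)]:   # hypothetical counts
    ws.append(row)

chart = BarChart()
chart.type = "col"
chart.title = "Novels by Decade"
chart.x_axis.title = "Decade"
chart.y_axis.title = "Novels Published"
# titles_from_data=True uses B1 ('Count') as the series name
chart.add_data(Reference(ws, min_col=2, min_row=1, max_row=ws.max_row), titles_from_data=True)
chart.set_categories(Reference(ws, min_col=1, min_row=2, max_row=ws.max_row))
chart.width = 15       # openpyxl's default width
chart.height = 7.5     # openpyxl's default height
ws.add_chart(chart, "D1")   # anchor the chart's top-left corner at D1

path = os.path.join(tempfile.mkdtemp(), "chart_demo.xlsx")
wb.save(path)
print(os.path.exists(path))   # True
```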

## Saving Workbooks
To save your workbook, call the .save() method and pass in a file name. If a file with that name already exists, it is overwritten.

In [None]:
wbEmpty.save('support/novels.xlsx')
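
.save() can also write to an in-memory binary buffer instead of a path, which is handy when the file is to be sent somewhere rather than kept on disk. This relies on openpyxl writing the .xlsx zip archive through Python's zipfile module, which accepts file-like objects. Treat it as an assumption to verify against your openpyxl version. load_workbook() is documented to accept a binary file-like object as well.

```python
import io
import openpyxl

wb = openpyxl.Workbook()
wb.active["A1"] = "Author"

# Write the .xlsx bytes into memory instead of a file on disk
buffer = io.BytesIO()
wb.save(buffer)
xlsxBytes = buffer.getvalue()
print(xlsxBytes[:2])   # b'PK' (an .xlsx file is a zip archive)

# load_workbook also accepts a binary file-like object
reloaded = openpyxl.load_workbook(io.BytesIO(xlsxBytes))
print(reloaded.active["A1"].value)   # Author
```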