# Good Reads Summary

#### The objective of this assignment is for you to explain what is happening in each cell in clear, understandable language. 

#### _There is no need to code._ The code is there for you, and it already runs. Your task is only to explain what each line in each cell does.

#### The placeholder cells should describe what happens in the cell below it.

**Example**: The cell below imports `pandas` as a dependency because `pandas` functions will be used throughout the program, such as the Pandas `DataFrame` as well as the `read_csv` function.

In [1]:
import pandas as pd

The cell below stores the path to the CSV file in `goodreads_path`, then reads that file into a new DataFrame, `goodreads_df`. Passing `encoding="utf-8"` tells pandas to decode the file as UTF-8, so non-ASCII characters (such as the "é" in "Mary GrandPré") are read correctly. We then preview the first five rows with the `.head()` method.

In [2]:
goodreads_path = "Resources/books_clean.csv"

goodreads_df = pd.read_csv(goodreads_path, encoding="utf-8")
goodreads_df.head()

Unnamed: 0,ISBN,Publication Year,Original Title,Authors,One Star Reviews,Two Star Reviews,Three Star Reviews,Four Star Reviews,Five Star Reviews
0,439023483,2008.0,The Hunger Games,Suzanne Collins,66715,127936,560092,1481305,2706317
1,439554934,1997.0,Harry Potter and the Philosopher's Stone,"J.K. Rowling, Mary GrandPré",75504,101676,455024,1156318,3011543
2,316015849,2005.0,Twilight,Stephenie Meyer,456191,436802,793319,875073,1355439
3,61120081,1960.0,To Kill a Mockingbird,Harper Lee,60427,117415,446835,1001952,1714267
4,743273567,1925.0,The Great Gatsby,F. Scott Fitzgerald,86236,197621,606158,936012,947718
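
As a minimal sketch of the read-and-preview step, the snippet below writes a small, hypothetical two-row CSV to a temporary file and reads it back with `pd.read_csv`. The names `sample_csv`, `sample_path`, and `sample_df` are assumptions made for illustration; the notebook itself reads `Resources/books_clean.csv`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; the notebook's real file is Resources/books_clean.csv.
sample_csv = (
    "ISBN,Original Title,Authors\n"
    "439554934,Harry Potter and the Philosopher's Stone,\"J.K. Rowling, Mary GrandPré\"\n"
    "61120081,To Kill a Mockingbird,Harper Lee\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_books.csv")

    # Write the file as UTF-8 so the accented character is stored correctly.
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write(sample_csv)

    # Read it back with the same encoding, as the notebook cell does.
    sample_df = pd.read_csv(sample_path, encoding="utf-8")

# .head() returns up to the first five rows; this sample only has two.
preview = sample_df.head()
print(preview)
```

Because the second field of the first row is wrapped in double quotes, the comma inside the author list is treated as part of the value rather than as a column separator.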


The cell below calls `.unique()` on the `Authors` column of the DataFrame to get each distinct value, then uses `len()` to count those values and stores the result in `author_count`.

We use `.min()` and `.max()` on the Publication Year column to find the earliest and latest years in the data set, respectively.

To calculate the total number of reviews, we use the `iloc` indexer with `[:, 4:]` to select every row and every column from position 4 onward, which are the five star-rating columns (One Star Reviews through Five Star Reviews). Setting `axis=1` sums across each row, giving each book's review total, which is stored in a new `Total Reviews` column. We then add up that column and assign the result to `total_reviews`.

In [3]:
author_count = len(goodreads_df["Authors"].unique())

earliest_year = goodreads_df["Publication Year"].min()
latest_year = goodreads_df["Publication Year"].max()

goodreads_df['Total Reviews'] = goodreads_df.iloc[:, 4:].sum(axis=1)
total_reviews = sum(goodreads_df['Total Reviews'])
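
The same steps can be sketched on a small, made-up DataFrame to show what each call returns. The frame `toy_df` and its values are hypothetical; only the column layout (four descriptive columns followed by review-count columns) mirrors the notebook.

```python
import pandas as pd

# Hypothetical data with the same layout: four descriptive columns first,
# then the review-count columns.
toy_df = pd.DataFrame({
    "ISBN": [1, 2, 3],
    "Publication Year": [2008.0, 1997.0, 2005.0],
    "Original Title": ["A", "B", "C"],
    "Authors": ["X", "Y", "X"],
    "One Star Reviews": [1, 2, 3],
    "Two Star Reviews": [10, 20, 30],
})

# .unique() returns the distinct values; len() counts them.
toy_author_count = len(toy_df["Authors"].unique())

# .min() and .max() return the smallest and largest value in the column.
toy_earliest = toy_df["Publication Year"].min()
toy_latest = toy_df["Publication Year"].max()

# iloc[:, 4:] keeps all rows and every column from position 4 onward.
review_columns = toy_df.iloc[:, 4:]

# axis=1 sums across the columns, producing one total per row.
toy_df["Total Reviews"] = review_columns.sum(axis=1)
toy_total = toy_df["Total Reviews"].sum()

print(toy_author_count, toy_earliest, toy_latest, toy_total)
```

One thing to keep in mind: because the slice is positional, running the notebook cell a second time would also pick up the newly added `Total Reviews` column and double the totals. Selecting the review columns by name is one way to avoid that.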

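The cell below creates a new one-row summary DataFrame from a dictionary of the values computed above. `author_count` is wrapped in a list so that pandas knows the table has one row; the other scalar values are broadcast to match. The last line displays the table.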

In [4]:
summary_table = pd.DataFrame({"Total Unique Authors": [author_count],
                              "Earliest Year": earliest_year,
                              "Latest Year": latest_year,
                              "Total Reviews": total_reviews})
summary_table

Unnamed: 0,Earliest Year,Latest Year,Total Reviews,Total Unique Authors
0,-1750.0,2017.0,596873216,4664
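
A short sketch of why one value in the dictionary is wrapped in a list: pandas needs at least one list-like value (or an explicit index) to know how many rows to create. The variable names below are assumptions made for illustration; the numbers are taken from the summary output above.

```python
import pandas as pd

# With only scalar values and no index, pandas cannot infer a row count.
try:
    pd.DataFrame({"Earliest Year": -1750.0, "Latest Year": 2017.0})
    all_scalars_ok = True
except ValueError:
    all_scalars_ok = False

# Wrapping one value in a list fixes the length at one row;
# the remaining scalars are repeated to fill that row.
one_row = pd.DataFrame({
    "Total Unique Authors": [4664],
    "Earliest Year": -1750.0,
    "Latest Year": 2017.0,
})

print(all_scalars_ok)
print(one_row)
```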
