# Good Reads Summary

#### The objective of this assignment is for you to explain what is happening in each cell in clear, understandable language. 

#### _There is no need to code._ The code is there for you, and it already runs. Your task is only to explain what each line in each cell does.

#### Each placeholder cell should describe what happens in the cell below it.

**Example**: The cell below imports `pandas` as a dependency because `pandas` objects and functions, such as the `DataFrame` class and the `read_csv` function, are used throughout the program.

In [None]:
import pandas as pd

In [None]:
# stores the relative path to the csv file (located in the Resources folder) in the variable goodreads_path
goodreads_path = "Resources/books_clean.csv"

# reads the csv file at that path into a dataframe, decoding the text as UTF-8 (a widely used encoding)
goodreads_df = pd.read_csv(goodreads_path, encoding="utf-8")

# displays the first 5 rows of the dataframe
goodreads_df.head()
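
To see what `read_csv` and `head` do without the real file, the same two calls can be tried on a tiny in-memory CSV. This is only a sketch: the two sample rows, the column names, and the variable names `csv_bytes`, `sample_df`, and `first_rows` are made up for illustration and are not taken from `books_clean.csv`.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for Resources/books_clean.csv
csv_text = "Title,Authors\nLes Misérables,Victor Hugo\nEmma,Jane Austen\n"
csv_bytes = csv_text.encode("utf-8")

# read_csv accepts a file-like object as well as a path;
# encoding="utf-8" tells pandas how to decode the bytes into text
sample_df = pd.read_csv(io.BytesIO(csv_bytes), encoding="utf-8")

# head(n) returns the first n rows (5 when n is omitted)
first_rows = sample_df.head(1)
print(first_rows)
```

Passing the wrong encoding for a file with accented characters typically produces either garbled text or a decoding error, which is why the encoding is stated explicitly.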

In [None]:
# inside the len(): .unique() returns the distinct values in the "Authors" column, so no author is counted twice
# wrapping the result in len() gives its length, i.e., the number of unique authors
author_count = len(goodreads_df["Authors"].unique())

# finds the minimum and maximum values in the "Publication Year" column to identify the earliest/latest year of publication
earliest_year = goodreads_df["Publication Year"].min()
latest_year = goodreads_df["Publication Year"].max()

# selects the review-count columns by position with iloc (all rows, columns from index 4, the 5th column, onwards),
# starting at "One Star Reviews", then sums across each row (axis=1) and stores the result in a new "Total Reviews" column
goodreads_df['Total Reviews'] = goodreads_df.iloc[:, 4:].sum(axis=1)

# adds up the per-book totals to get the overall number of 1-5 star reviews, then displays it as the cell output
total_reviews = goodreads_df['Total Reviews'].sum()
total_reviews
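
The positional slice and the unique count in the cell above each have a subtlety that a small toy dataframe makes visible. The sketch below is an illustration under assumptions: the column layout of `toy_df` is hypothetical (only "Authors", "Publication Year", and "One Star Reviews" are named in the notebook), and `review_cols`, `rerun`, `safe`, and `authors` are names introduced here.

```python
import pandas as pd

# Hypothetical layout: four descriptive columns, then the review-count columns
toy_df = pd.DataFrame({
    "ISBN": ["1", "2", "3"],
    "Title": ["A", "B", "C"],
    "Authors": ["X", "Y", "X"],
    "Publication Year": [1999, 2005, 2010],
    "One Star Reviews": [1, 2, 3],
    "Two Star Reviews": [4, 5, 6],
})

# Remember the review columns by name before adding the total
review_cols = list(toy_df.columns[4:])

# Same pattern as the notebook: row-wise sum of every column from index 4 on
toy_df["Total Reviews"] = toy_df.iloc[:, 4:].sum(axis=1)

# If the cell is run a second time, the slice now also includes
# "Total Reviews" itself, so the row sums double
rerun = toy_df.iloc[:, 4:].sum(axis=1)

# Selecting the columns by name gives the same answer on every run
safe = toy_df[review_cols].sum(axis=1)

# len(unique()) counts a missing value as one more "author"; nunique() skips it
authors = pd.Series(["X", "Y", "X", None])
print(len(authors.unique()), authors.nunique())
```

Whether either subtlety matters for the real data depends on whether the cell is re-run and whether the "Authors" column has missing values, neither of which the notebook states.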

In [None]:
# creates a new one-row summary dataframe from the variables defined above, assigning each a column title (e.g., Total Unique Authors)
summary_table = pd.DataFrame({"Total Unique Authors": [author_count],
                              "Earliest Year": earliest_year,
                              "Latest Year": latest_year,
                              "Total Reviews": total_reviews})

# displays the summary dataframe as the cell output
summary_table
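
One detail worth explaining in this cell is why only `author_count` is wrapped in a list. The sketch below, using made-up stand-in values and the hypothetical names `mixed` and `all_scalars_ok`, suggests the reason: a single list tells pandas how many rows to create, and the scalar values are then repeated to fit, whereas a dictionary of scalars alone gives pandas no row count to work from.

```python
import pandas as pd

# Hypothetical stand-in values for the notebook's variables
author_count, earliest_year = 3, 1999

# One list-valued entry fixes the number of rows at 1;
# the scalar is broadcast to match
mixed = pd.DataFrame({"Total Unique Authors": [author_count],
                      "Earliest Year": earliest_year})

# With scalars only, pandas cannot infer an index and raises ValueError
try:
    pd.DataFrame({"Total Unique Authors": author_count,
                  "Earliest Year": earliest_year})
    all_scalars_ok = True
except ValueError:
    all_scalars_ok = False

print(mixed.shape, all_scalars_ok)
```

Wrapping every value in a list would work equally well and arguably reads more consistently; the notebook's mixed form is simply the shortest version that runs.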