# Goodreads Data Analysis
### Michael Barmada, mdb120@pitt.edu
---

In [1]:
import pandas as pd
import nltk

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

First things first: we need to load our data. We'll call the new data frame reviews_df.

In [2]:
reviews_df = pd.read_csv('compiled_df.csv')

In [3]:
reviews_df.head()
reviews_df.tail()

Unnamed: 0,book_title,rating,text
0,The Way of Kings,5,I have a Booktube channel now! Subscribe here:...
1,The Way of Kings,3,"A three and a half star read.""What?"" Sanderson..."
2,The Way of Kings,5,"WOW. Ok, so I actually cried during this book ..."
3,The Way of Kings,5,Watch this space for an updated 2020 review! Y...
4,The Way of Kings,5,Me? Giving a Brandon Sanderson's book 5 stars?...


Unnamed: 0,book_title,rating,text
92,Dune,2,I picked this book up at last thanks to my bud...
93,Dune,2,I think teenage me would have liked this a lot...
94,Dune,1,I know this book is HUGE in the science-fictio...
95,Dune,1,Meh! Only finished this out of sheer stubbornn...
96,Dune,2,The Best Buddy Un-Read with Silvana and Elena ...


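As we can see, the data from book_scraper.py is a little unorganized: within a single book the ratings appear in no particular order (the last Dune rows run 2, 2, 1, 1, 2). We'll fix that, but first let's add a few columns to our data: a token count, a type count, and a type-token ratio (TTR) for each review.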

In [4]:
# Tokenize each review once, then derive both counts from the token lists
tokens = reviews_df['text'].map(nltk.word_tokenize)
reviews_df['toks'] = tokens.map(len)                     # token count for each review
reviews_df['types'] = tokens.map(lambda t: len(set(t)))  # type (unique token) count for each review
reviews_df['ttr'] = reviews_df['types'] / reviews_df['toks']  # type-token ratio for each review
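
To make the three new columns concrete, here is a minimal sketch of the same calculation on a single string. It assumes a simple regex tokenizer (the hypothetical `simple_tokenize` below) as a stand-in for `nltk.word_tokenize`, so the counts it produces will not match NLTK's exactly; `type_token_ratio` is likewise a name made up for this illustration.

```python
import re


def simple_tokenize(text):
    """Hypothetical stand-in for nltk.word_tokenize: splits into words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def type_token_ratio(text, tokenize=simple_tokenize):
    """Return unique tokens (types) divided by total tokens, or 0.0 for empty text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # avoid dividing by zero on an empty review
    return len(set(tokens)) / len(tokens)


sample = "the cat sat on the mat"
tokens = simple_tokenize(sample)
n_tokens = len(tokens)      # every token counts, including the repeated "the"
n_types = len(set(tokens))  # repeated tokens count only once
ttr = type_token_ratio(sample)
print(n_tokens, n_types, round(ttr, 4))
```

Like the notebook code, this sketch is case-sensitive, so "The" and "the" would count as two types. Short texts repeat few words and therefore score close to 1.0, which matches the 12-token review above with a TTR of 0.916667.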

In [5]:
reviews_df.head()

Unnamed: 0,book_title,rating,text,toks,types,ttr
0,The Way of Kings,5,I have a Booktube channel now! Subscribe here:...,946,393,0.415433
1,The Way of Kings,3,"A three and a half star read.""What?"" Sanderson...",597,317,0.530988
2,The Way of Kings,5,"WOW. Ok, so I actually cried during this book ...",182,108,0.593407
3,The Way of Kings,5,Watch this space for an updated 2020 review! Y...,35,32,0.914286
4,The Way of Kings,5,Me? Giving a Brandon Sanderson's book 5 stars?...,12,11,0.916667


Now that our dataframe is a little more fleshed out, let's clean it up. First, we'll give it a MultiIndex with book_title and rating as its two levels. Then, we'll sort the reviews in descending order, by title and then by rating, so that each book's highest-rated reviews come first.

In [6]:
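# Move book_title and rating out of the columns and into a two-level index
reviews_df.set_index(["book_title", "rating"], inplace=True)
# Sort on both index levels, descending (highest ratings first within each book)
reviews_df.sort_values(['book_title', 'rating'], ascending=False, inplace=True)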
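
The same reshaping can be sketched on a toy data frame. This is only an illustration under stated assumptions: the titles and numbers are made up, `toy_df` and `emma_rows` are hypothetical names, and it sorts with `sort_index` rather than the `sort_values` call used in the notebook, since both columns now live in the index.

```python
import pandas as pd

# Toy data; the titles and numbers are invented for illustration
toy_df = pd.DataFrame({
    "book_title": ["Dune", "Emma", "Dune", "Emma"],
    "rating": [2, 5, 4, 3],
    "toks": [10, 20, 30, 40],
})

# Move both columns into a two-level index
toy_df = toy_df.set_index(["book_title", "rating"])

# Sort on both index levels in descending order
toy_df = toy_df.sort_index(ascending=False)
print(toy_df)

# Select every row for one book through the first index level
emma_rows = toy_df.loc["Emma"]
print(emma_rows)
```

Once the title is an index level, selecting one book is a single `.loc` lookup, and the result is indexed by rating alone.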

In [8]:
reviews_df.head(15)
reviews_df.tail(15)

book_title,rating,text,toks,types,ttr
The Way of Kings,5,I have a Booktube channel now! Subscribe here:...,946,393,0.415433
The Way of Kings,5,"WOW. Ok, so I actually cried during this book ...",182,108,0.593407
The Way of Kings,5,Watch this space for an updated 2020 review! Y...,35,32,0.914286
The Way of Kings,5,Me? Giving a Brandon Sanderson's book 5 stars?...,12,11,0.916667
The Way of Kings,5,"\n\n“In the end, all men die. How you lived wi...",2577,820,0.318199
The Way of Kings,5,My video review https://youtu.be/mDxFNJ1P_ek,6,6,1.0
The Way of Kings,5,"So, a buddy of mine has been trying to get me ...",429,219,0.51049
The Way of Kings,5,Reread 2020 Bridge 4 is everything! I even own...,289,131,0.453287
The Way of Kings,5,4/29/18: even better the second time. Reviewed...,1853,573,0.309228
The Way of Kings,5,"The Way Of Kings (The Stormlight Archive,#1)Ho...",73,50,0.684932


book_title,rating,text,toks,types,ttr
Dune,2,"I've loved science fiction my whole life, but ...",259,148,0.571429
Dune,2,1st reread (also a buddy read with the World B...,366,211,0.576503
Dune,2,What has mood to do with it? You fight when th...,279,177,0.634409
Dune,2,"Woo, I conquered the Dunes, crossed the sands ...",977,390,0.399181
Dune,2,I picked this book up at last thanks to my bud...,650,288,0.443077
Dune,2,I think teenage me would have liked this a lot...,416,219,0.526442
Dune,2,The Best Buddy Un-Read with Silvana and Elena ...,248,159,0.641129
Dune,1,If this is the gold standard against which all...,1127,488,0.433008
Dune,1,I am so glad I finally fulfilled my half of th...,193,128,0.663212
Dune,1,I read a few years ago that if you've not foun...,724,306,0.422652


Much better!