![Books](Books.jpg)


# [ ![nbviewer](https://raw.githubusercontent.com/jupyter/design/master/logos/Badges/nbviewer_badge.svg) ](https://nbviewer.org/github/alisonnanjez/A-Decade-of-Literary-Tastes/blob/main/notebook.ipynb)

## Introduction

This project explores the literary landscape of the early 21st century by analyzing the **Top Goodreads Books Collection dataset** from [Kaggle](https://www.kaggle.com/datasets/cristaliss/ultimate-book-collection-top-100-books-up-to-2023).

Focusing on the years 2000 through 2013, this analysis identifies the genres that Goodreads users favored and explores how reader engagement varied across them. By examining genre representation among top-rated books alongside their ratings and vote counts, the project offers insight into the evolving "tastes" of the Goodreads community during a dynamic period in online book discovery and discussion.

The core analysis is conducted using Python in this Jupyter Notebook, with key findings and overarching trends visually represented through interactive dashboards and charts created in Power BI. Screenshots of these visualizations are integrated within this notebook to provide immediate insights, and a link to the full, interactive Power BI report is also provided for further exploration.

## The data

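```python
import pandas as pd

# Load the Kaggle export; the CSV is expected to sit next to the notebook.
goodreads = pd.read_csv("goodreads_top100.csv")

# display() is provided by IPython/Jupyter and renders the DataFrame as a table.
display(goodreads)
```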

| Unnamed: 0.1 | Unnamed: 0 | isbn | title | series_title | series_release_number | authors | publisher | language | description | num_pages | format | genres | publication_date | rating_score | num_ratings | num_reviews | current_readers | want_to_read | price | url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 9780689830594 | Summer Story | Brambly Hedge | 2 | Jill Barklem | Atheneum | English | It was such a hot summer. The sky was deep blu... | 32 | Hardcover | ['Picture Books', 'Childrens', 'Fiction', 'Ani... | January 1, 1980 | 4.45 | 1017.0 | 74.0 | 7.0 | 512.0 | 3.49 | https://www.goodreads.com/book/show/421572.Sum... |
| 1 | 1 | 9780375704970 | The Lake of Darkness | | | Ruth Rendell | Vintage Crime/Black Lizard | English | Martin Urban is a quiet bachelor with a comfor... | 210 | Paperback | ['Mystery', 'Fiction', 'Crime', 'Thriller', 'B... | January 1, 1980 | 3.76 | 1388.0 | 114.0 | 77.0 | 623.0 | 4.99 | https://www.goodreads.com/book/show/83394.The_... |
| 2 | 2 | 9780345446671 | Beyond the Blue Event Horizon | Heechee Saga | 2 | Frederik Pohl | Ballantine Books | English | In Book Two of the Heechee Saga, Robinette Bro... | 336 | Paperback | ['Science Fiction', 'Fiction', 'Space Opera', ... | January 1, 1980 | 3.95 | 13307.0 | 339.0 | 181.0 | 3961.0 | 11.99 | https://www.goodreads.com/book/show/373399.Bey... |
| 3 | 3 | 9780446403016 | St. Peter's Fair | Chronicles of Brother Cadfael | 4 | Ellis Peters | Mysterious Press | English | A pause in the civil war offers Shrewsbury's t... | 217 | Mass Market Paperback | ['Mystery', 'Historical Fiction', 'Fiction', '... | May 1, 1981 | 4.12 | 10493.0 | 593.0 | 1298.0 | 2502.0 | 0.00 | https://www.goodreads.com/book/show/751755.St_... |
| 4 | 4 | 9780425198773 | Twice Shy | | | Dick Francis | G.P. Putnam's Sons | English | A computerized horse-betting system falls into... | 304 | Mass Market Paperback | ['Mystery', 'Fiction', 'Thriller', 'Crime', 'S... | January 1, 1981 | 3.92 | 4188.0 | 174.0 | 162.0 | 642.0 | 8.99 | https://www.goodreads.com/book/show/103250.Twi... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4395 | 4395 | 9781451648218 | Murder Your Employer: The McMasters Guide to H... | | | Rupert Holmes | Avid Reader Press / Simon & Schuster | English | A devilish thriller with a killer concept: The... | 389 | Hardcover | ['Mystery', 'Fiction', 'Thriller', 'Mystery Th... | February 21, 2023 | 3.89 | 20992.0 | 3479.0 | 4033.0 | 84900.0 | 12.99 | https://www.goodreads.com/book/show/61272658-m... |
| 4396 | 4396 | 9781250826978 | System Collapse | The Murderbot Diaries | 7 | Martha Wells | Tor Publishing Group/Tordotcom | English | Everyone's favorite lethal SecUnit is back.\nF... | 245 | Hardcover | ['Science Fiction', 'Fiction', 'Audiobook', 'A... | November 13, 2023 | 4.24 | 26566.0 | 3479.0 | 3609.0 | 36600.0 | 11.99 | https://www.goodreads.com/book/show/65211701-s... |
| 4397 | 4397 | 9781496737298 | Georgie, All Along | | | Kate Clayborn | Kensington | English | A wise and witty new novel that echoes with ti... | 340 | Paperback | ['Romance', 'Fiction', 'Contemporary', 'Contem... | January 24, 2023 | 3.82 | 48031.0 | 6590.0 | 3180.0 | 76300.0 | 0.00 | https://www.goodreads.com/book/show/60604190-g... |
| 4398 | 4398 | 9780525619994 | Sword Catcher | Sword Catcher | 1 | Cassandra Clare | Del Rey Books | English | In the vibrant city-state of Castellane, the r... | 624 | Hardcover | ['Fantasy', 'Romance', 'Adult', 'Young Adult',... | October 10, 2023 | 3.84 | 14991.0 | 3523.0 | 4485.0 | 149000.0 | 14.99 | https://www.goodreads.com/book/show/36679274-s... |


The **Top Goodreads Books Collection** dataset from Kaggle records bibliographic details and reader-engagement measures for each book.

Key features include 'ISBN' codes for identification, the 'Title' of each book, and details on 'Series' and 'Release Number' for books belonging to a collection. 

The dataset also specifies the 'Publisher', the 'Language' of the book, and the 'Author(s)'. Crucially for this project, the 'Genres' column offers insights into thematic categorization, while 'Publication Date' provides historical context. 

Reader reception is captured through the 'Rating' (the rating_score column) and the 'Number of Voters' (the num_ratings column), which indicate average sentiment and engagement volume, respectively.

Additional columns such as 'Current Readers', 'Want to Read', and 'Price' offer further context on the book's popularity and market information. 

Considering the objectives of this project, which aim to explore genre trends, key authors, language diversity, and the role of price within the top Goodreads books from 2000 to 2013, certain columns are more relevant than others. 

The 'URL' and 'Description' columns do not directly contribute to these analytical goals and can be excluded to streamline the dataset. 

Similarly, 'ISBN', 'Series', 'Release Number', 'Publisher', 'Num Pages', and 'Format' are less pertinent to understanding the overarching trends in genre, author prominence, language popularity, and the influence of price on reader reception. 

Therefore, these columns will be considered for removal to focus the analysis on 'Title', 'Publication Date', 'Genres', 'Rating', 'Number of Voters', 'Author', 'Language', and 'Price', which are crucial for addressing the project's objectives.
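The column selection described above could be sketched in pandas as follows. This is only an illustration under stated assumptions: the one-row DataFrame stands in for the real CSV (its values are copied from the preview, truncations included), `KEEP_COLUMNS` is a hypothetical name, and the column names are the ones shown in the preview rather than the display labels used in the text.

```python
import pandas as pd

# One-row stand-in for goodreads_top100.csv; values copied from the preview above.
goodreads = pd.DataFrame([{
    "Unnamed: 0.1": 0,
    "Unnamed: 0": 0,
    "isbn": 9780689830594,
    "title": "Summer Story",
    "series_title": "Brambly Hedge",
    "series_release_number": "2",
    "authors": "Jill Barklem",
    "publisher": "Atheneum",
    "language": "English",
    "description": "It was such a hot summer. The sky was deep blu...",
    "num_pages": 32,
    "format": "Hardcover",
    "genres": "['Picture Books', 'Childrens', 'Fiction', 'Ani...",
    "publication_date": "January 1, 1980",
    "rating_score": 4.45,
    "num_ratings": 1017.0,
    "num_reviews": 74.0,
    "current_readers": 7.0,
    "want_to_read": 512.0,
    "price": 3.49,
    "url": "https://www.goodreads.com/book/show/421572.Sum...",
}])

# Hypothetical list of the columns the text singles out as relevant.
KEEP_COLUMNS = [
    "title", "authors", "language", "genres",
    "publication_date", "rating_score", "num_ratings", "price",
]

# Selecting by name keeps only these columns, in this order.
trimmed = goodreads[KEEP_COLUMNS].copy()
print(trimmed.columns.tolist())
```

Selecting the columns to keep, rather than listing the ones to drop, also discards the leftover `Unnamed` index columns without naming them.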

## Data Cleaning

To prepare the book dataset for analysis in Power BI, the data was cleaned in two phases. First, unnecessary columns were removed in Microsoft Excel to streamline the data.

The language column was excluded because the books are overwhelmingly in English, and the price column was removed because of its significant number of missing values.

Additionally, the publication_date column's data type was corrected to ensure proper date formatting, and the data was filtered to include only books published between 2000 and 2013, inclusive. 
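For readers who prefer to stay in Python, the date conversion and year filter might look roughly like the sketch below. The project performed this step in Excel, so this is an equivalent under assumptions, not the method used: the rows are made up, and the format string `"%B %d, %Y"` is inferred from preview values such as "January 1, 1980".

```python
import pandas as pd

# Made-up rows for illustration; only the date format mirrors the preview.
books = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C", "Book D"],
    "publication_date": [
        "January 1, 1980",
        "March 14, 2000",
        "December 31, 2013",
        "February 21, 2023",
    ],
})

# Convert the text dates to real datetimes; unparseable values become NaT.
books["publication_date"] = pd.to_datetime(
    books["publication_date"], format="%B %d, %Y", errors="coerce"
)

# Series.between is inclusive on both ends by default, so 2000 and 2013 are kept.
in_range = books["publication_date"].dt.year.between(2000, 2013)
books_2000_2013 = books[in_range].reset_index(drop=True)
print(books_2000_2013)
```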

Subsequently, the dataset was loaded into Power Query within Excel for further transformation. Specifically, the genres column, which contained genre lists in a string format, was split into multiple columns based on the comma delimiter. 

These newly created genre columns were then unpivoted, converting the data into a long format where each row represents a book and a single genre, thus facilitating accurate genre-based analysis in Power BI.
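The split-and-unpivot step has a close pandas counterpart, sketched here as an alternative to the Power Query route actually used. It assumes that each genres cell holds a complete Python-style list literal (the preview only shows truncated values), and the two rows are made up for illustration.

```python
import ast

import pandas as pd

# Made-up rows; each genres cell is a string that looks like a Python list.
books = pd.DataFrame({
    "title": ["Book A", "Book B"],
    "genres": ["['Fantasy', 'Romance', 'Young Adult']", "['Fiction']"],
})

# Parse the string into a real list without executing arbitrary code.
books["genres"] = books["genres"].apply(ast.literal_eval)

# explode() emits one row per list element, giving one row per book and genre.
long_format = (
    books.explode("genres")
    .rename(columns={"genres": "genre"})
    .reset_index(drop=True)
)
print(long_format)
```

Parsing the list first avoids the stray brackets and quotes that a plain comma split leaves on each genre name.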

## Analysis

### Overview Page of the Analysis

![Overview Page](Overview%20Page.JPG)


### The primary book genres represented in Goodreads' top-rated books between 2000 and 2013 and how their prevalence changed year-over-year.

![Genre Prevalence and Trends](Genre%20Prevalance%20and%20Trends.JPG)


The analysis reveals that Fiction (973 books), Fantasy (661), Romance (576), Young Adult (488), and Contemporary (339) are the primary genres represented. Fiction, Fantasy, and Romance maintain a consistent presence throughout the period, each occupying a substantial portion of the top-rated selections every year, while Young Adult becomes noticeably more prevalent, particularly in the later years. Contemporary is the smallest of the five. Although the proportion of each genre fluctuates somewhat from year to year, these general trends highlight the evolving genre landscape within highly rated books during this timeframe.
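Counts and yearly shares of this kind can be derived from the long-format table (one row per book and genre). The sketch below is not the Power BI measure behind the chart; the rows and the `year` column are made up, and "share" is taken here to mean the fraction of a year's books that carry a given genre.

```python
import pandas as pd

# Made-up long-format rows (one row per book and genre); not real results.
long_format = pd.DataFrame({
    "title": ["Book A", "Book A", "Book B", "Book B", "Book C"],
    "year": [2000, 2000, 2000, 2000, 2001],
    "genre": ["Fiction", "Fantasy", "Fiction", "Romance", "Fiction"],
})

# Number of distinct books per genre over the whole period.
genre_totals = (
    long_format.groupby("genre")["title"].nunique().sort_values(ascending=False)
)

# Book-genre rows per year, one column per genre.
per_year = pd.crosstab(long_format["year"], long_format["genre"])

# Divide by the number of distinct books in each year to get a share.
books_per_year = long_format.groupby("year")["title"].nunique()
share_per_year = per_year.div(books_per_year, axis=0)
print(share_per_year)
```

Because a book can carry several genres, the shares in a row can add up to more than 1.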

### The most frequently appearing authors in the top-rated books and the genres they are most associated with.

![Key Authors and Genre Associations](Key%20Authors%20and%20Genre%20Associations.JPG)


The analysis of frequently appearing authors in top-rated books reveals that Richelle Mead (16 books), Charlaine Harris (15), Meg Cabot (12), Patricia Briggs (12), and Stephen King (12) are the most prominent. The visualization, which ranks these authors by their number of top-rated books and illustrates their genre breakdown, shows that some authors tend to focus on specific genres, while others write across multiple ones. For instance, Richelle Mead is associated with Fantasy, Paranormal, and Romance, while Charlaine Harris is associated with Fiction. Meg Cabot writes in both Fiction and Romance, Patricia Briggs in Fantasy and Paranormal, and Stephen King primarily in Fiction.
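A ranking of this kind, together with each author's genre associations, could be approximated in pandas as sketched below. The authors, titles, and genres are placeholders, and the sketch assumes each authors cell names a single author, which may not hold for co-written books in the real data.

```python
import pandas as pd

# Placeholder long-format rows (one row per book and genre).
long_format = pd.DataFrame({
    "title": ["Book A", "Book A", "Book B", "Book C", "Book C"],
    "authors": ["Author X", "Author X", "Author X", "Author Y", "Author Y"],
    "genre": ["Fantasy", "Romance", "Fantasy", "Fiction", "Romance"],
})

# Count distinct titles per author so multi-genre books are not double counted.
books_per_author = (
    long_format.groupby("authors")["title"].nunique().sort_values(ascending=False)
)
top_authors = books_per_author.head(5)

# Collect the sorted set of genres attached to each author's books.
author_genres = long_format.groupby("authors")["genre"].agg(
    lambda genres: sorted(genres.unique())
)
print(top_authors)
print(author_genres.loc[top_authors.index])
```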

### The average ratings and number of votes received by top books across different genres and by the most frequent authors.

![Reader Engagement by Genre and Author](Reader%20Engagement%20by%20Genre%20and%20Author.JPG)


Reader engagement varies both across genres and among the most frequent authors, whether measured by average rating or by the number of ratings received. The genres with the highest average rating scores are Comic Strips (4.81), Halloween (4.45), MMF (4.44), Church (4.43), and Criticism (4.39), while the authors with the highest average rating scores are Bill Watterson (4.81), Roger K. Driscoll (4.59), Patrick Rothfuss (4.53), and Kathryn Stockett (4.47). In contrast, the highest average numbers of ratings belong to authors Suzanne Collins (3,837,144.50), Kathryn Stockett (2,734,236.00), Khaled Hosseini (2,580,942.67), and Alice Sebold (2,338,321), and to the genres Finance (846,383), Greek Mythology (728,922.23), Spain (627,501), Dystopia (545,715.63), and Coming of Age (542,227.06). The bar charts show that high average ratings do not always align with a high volume of ratings, which suggests that some genres and authors may be rated highly by a smaller, more niche audience, while others reach a much larger audience, though perhaps with slightly lower average ratings.
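One way to check whether a high average comes from a small group of books is to report the book count next to each average. The sketch below uses made-up genres and numbers purely to show the shape of that calculation; the column names follow the preview, and the result names are hypothetical.

```python
import pandas as pd

# Made-up rows: one niche genre with a single well-rated book,
# one popular genre with two heavily rated books.
long_format = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C"],
    "genre": ["Niche", "Popular", "Popular"],
    "rating_score": [4.8, 4.0, 4.2],
    "num_ratings": [1_000, 2_000_000, 1_000_000],
})

# Named aggregation: one output column per (input column, function) pair.
by_genre = long_format.groupby("genre").agg(
    books=("title", "nunique"),
    avg_rating=("rating_score", "mean"),
    avg_num_ratings=("num_ratings", "mean"),
)

highest_rated = by_genre["avg_rating"].idxmax()
most_rated = by_genre["avg_num_ratings"].idxmax()
print(by_genre)
print(highest_rated, most_rated)
```

Here the best-rated genre and the most-rated genre differ, and the `books` column shows that the top average rests on a single title.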

### The relationship between publication year, author, and reader engagement (ratings, number of votes).

![Publication Year,Author and Reader Engagement](Publication%20Year,Author%20and%20Reader%20Engagement.JPG)


Average rating scores by publication year ranged from 3.93 in 2009 to 4.15 in 2013, while the average number of ratings varied from a low of 85,717.16 in 2004 to a high of 321,020.51 in 2005. The line charts illustrate these year-over-year changes in average ratings and number of ratings. The bubble chart relates average rating score, average number of ratings, and number of books for the top authors. It shows that Bill Watterson has the highest average rating, while Suzanne Collins has the highest average number of ratings, indicating a large readership.

### Titles with Ratings Greater Than 4.5

The table below lists the titles with a rating score above 4.5, along with their rating scores and authors.

![Titles with ratings greater than 4.5](Titles%20with%20ratings%20greater%20than%204.5.JPG)
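A table like this amounts to a strict threshold filter. As a minimal sketch with placeholder titles and ratings (not the rows in the screenshot), and assuming the rating lives in a `rating_score` column:

```python
import pandas as pd

# Placeholder rows; the 4.5 entry checks that the cutoff is strict.
books = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C", "Book D"],
    "authors": ["Author X", "Author Y", "Author Z", "Author X"],
    "rating_score": [4.81, 4.50, 4.53, 3.90],
})

# Keep only ratings strictly greater than 4.5, highest first.
top_rated = (
    books.loc[books["rating_score"] > 4.5, ["title", "authors", "rating_score"]]
    .sort_values("rating_score", ascending=False)
    .reset_index(drop=True)
)
print(top_rated)
```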


## Summary and Conclusion

**Summary**

This analysis of Goodreads' top-rated books from 2000 to 2013 reveals that Fiction, Fantasy, and Romance are consistently prevalent genres, while Young Adult shows a notable increase in representation over the years.  Authors like Richelle Mead and Charlaine Harris appear frequently, with distinct genre associations.  Reader engagement, as measured by average ratings and number of ratings, varies across genres and authors, highlighting that high ratings don't always equate to high popularity, and also uncovering the presence of niche genres like MMF, Church, and Spain.

**Conclusion**

In conclusion, this report provides insights into the trends and patterns within Goodreads' top-rated books from 2000 to 2013. For a more interactive exploration of these findings, including detailed visualizations, the Power BI report is available in the Files menu, saved as **A Decade of Literary Tastes( 2000-2013).pbix**.