# Data Analysis for Amazon Top 50 Bestselling Books 2009-2019

![The Amazon logo][1]
[1]: https://upload.wikimedia.org/wikipedia/commons/d/db/Amazon_Books_logo.png


## Introduction
The data has been collated into a CSV file with one row per bestseller-list entry, split into the following columns:

* Name of book (`Name`)
* Author of book (`Author`)
* User rating (`User Rating`)
* Number of book reviews (`Reviews`)
* Price of book (`Price`)
* Year the book was included in the bestsellers list (`Year`)
* Genre (`Genre`), either `Fiction` or `Non Fiction`

### The following areas of analysis were explored in this dataset:
1. Top 10 fiction books by user rating
2. Top 10 non-fiction books by user rating
3. Split of bestseller entries between fiction and non-fiction
4. Top 10 authors by number of bestselling books
5. Price range of bestsellers
6. Top 5 books overall by user rating
7. Top 10 books by number of reviews

In [None]:
#import libraries
import pandas as pd
import plotly as py

#only needed for older plotly versions rendering offline in a notebook:
#import plotly.offline as offline
#offline.init_notebook_mode(connected=True)


In [None]:
#read the data from the csv file and display the first five rows
data=pd.read_csv('../input/amazon-top-50-bestselling-books-2009-2019/bestsellers with categories.csv')
data.head()
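Because each row is one entry in a given year's list, the same title can appear several times. A quick sanity check makes this visible before any ranking is done. The sketch below uses a small made-up DataFrame with the CSV's column names in place of the real file; with the real data, each year's count would be expected to be 50 if the file holds the full top 50 for every year.

```python
import pandas as pd

# Hypothetical stand-in for `data`: same column names as the CSV, made-up rows
data = pd.DataFrame({
    'Name': ['Book A', 'Book B', 'Book A', 'Book C'],
    'Author': ['Author X', 'Author Y', 'Author X', 'Author Z'],
    'User Rating': [4.8, 4.6, 4.8, 4.9],
    'Reviews': [1000, 250, 1000, 400],
    'Price': [10, 8, 10, 15],
    'Year': [2009, 2009, 2010, 2010],
    'Genre': ['Fiction', 'Non Fiction', 'Fiction', 'Non Fiction'],
})

# number of list entries per year
entries_per_year = data.groupby('Year').size()

# titles that appear in more than one year's list
repeats = data['Name'].value_counts()
repeated_titles = repeats[repeats > 1]

print(len(data), data['Name'].nunique())  # entries vs distinct titles
print(entries_per_year.to_dict())
print(repeated_titles.to_dict())
```

The gap between `len(data)` and `data['Name'].nunique()` is what the `drop_duplicates(subset=['Name'])` calls in the ranking cells account for.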

In [None]:
#top 10 fiction books by user rating (high to low), one row per title;
#sorting before drop_duplicates keeps each title's highest-rated row
fiction_data=data[data.Genre=='Fiction']
fiction_data.sort_values(by='User Rating', ascending=False).drop_duplicates(subset=['Name']).head(10).reset_index(drop=True)
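The fiction ranking depends on `drop_duplicates` keeping the first row it meets for each title (its default `keep='first'`), so the order of `sort_values` and `drop_duplicates` matters. The sketch below uses hypothetical rows in which one title is listed twice with different ratings (for example, two editions) to show both orders side by side.

```python
import pandas as pd

# Hypothetical rows: 'Book A' is listed twice with different ratings
fiction_data = pd.DataFrame({
    'Name': ['Book A', 'Book B', 'Book A', 'Book C'],
    'User Rating': [4.5, 4.7, 4.9, 4.6],
    'Genre': ['Fiction'] * 4,
})

# sort first: the first row kept per title is its highest-rated one
sorted_first = (fiction_data.sort_values(by='User Rating', ascending=False)
                .drop_duplicates(subset=['Name'])
                .reset_index(drop=True))

# deduplicate first: the kept row is whichever came first in the file
dedup_first = (fiction_data.drop_duplicates(subset=['Name'])
               .sort_values(by='User Rating', ascending=False)
               .reset_index(drop=True))

print(sorted_first['Name'].tolist(), sorted_first['User Rating'].tolist())
print(dedup_first['Name'].tolist(), dedup_first['User Rating'].tolist())
```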


In [None]:
#top 10 non-fiction books by user rating (high to low), one row per title
nonfiction_data=data[data.Genre=='Non Fiction']
nonfiction_data.sort_values(by='User Rating', ascending=False).drop_duplicates(subset=['Name']).head(10).reset_index(drop=True)
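The fiction and non-fiction cells differ only in the genre label, so they can share one function. Many bestsellers have the same user rating, which leaves the order among tied books unspecified. The sketch below is a hypothetical helper, `top_books`, that adds a secondary sort on `Reviews` (more reviews first) to make the order deterministic; that tie-breaker is an assumption added here, not part of the original analysis. The sample rows are made up.

```python
import pandas as pd


def top_books(df, genre, n=10):
    """Hypothetical helper: top-n titles of one genre by user rating.

    Ties on 'User Rating' are broken by 'Reviews' (more reviews first),
    an assumption made so the ordering is deterministic.
    """
    subset = df[df['Genre'] == genre]
    ranked = subset.sort_values(by=['User Rating', 'Reviews'],
                                ascending=[False, False])
    # keep the first (best-ranked) row for each title
    return ranked.drop_duplicates(subset=['Name']).head(n).reset_index(drop=True)


# Made-up sample rows using the CSV's column names and genre labels
data = pd.DataFrame({
    'Name': ['Book A', 'Book B', 'Book C', 'Book D'],
    'User Rating': [4.9, 4.9, 4.7, 4.8],
    'Reviews': [500, 2000, 100, 300],
    'Genre': ['Fiction', 'Fiction', 'Non Fiction', 'Non Fiction'],
})

fiction_top = top_books(data, 'Fiction')
nonfiction_top = top_books(data, 'Non Fiction', n=1)
print(fiction_top['Name'].tolist())
print(nonfiction_top['Name'].tolist())
```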


In [None]:
#count the bestseller-list entries per genre
#(a title listed in several years is counted once per year)
genredf=data.groupby('Genre').size().reset_index(name='Quantity')
genredf
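The genre counts are counts of list entries, so a book that stayed on the list for several years weighs more than one that appeared once. The sketch below uses hypothetical rows to compare entries with distinct titles per genre, and computes the share of entries per genre, which are the proportions a pie chart built from entry counts displays.

```python
import pandas as pd

# Hypothetical sample: 'Book A' is listed in two different years
data = pd.DataFrame({
    'Name': ['Book A', 'Book A', 'Book B', 'Book C'],
    'Genre': ['Fiction', 'Fiction', 'Non Fiction', 'Non Fiction'],
    'Year': [2009, 2010, 2009, 2010],
})

# entries = rows in the bestseller lists; titles = distinct book names
genre_counts = data.groupby('Genre')['Name'].agg(entries='size', titles='nunique')

# fraction of all list entries that fall in each genre
genre_share = data['Genre'].value_counts(normalize=True)

print(genre_counts)
print(genre_share.to_dict())
```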

In [None]:
#import library for producing interactive graphs
import plotly.graph_objects as go

In [None]:
#pie chart trace: share of bestseller entries by genre
trace1=go.Pie(labels=genredf['Genre'],
              values=genredf['Quantity'],
              title='Genre of Books Between 2009 and 2019',
              marker=dict(colors=['skyblue','pink'],
                          line=dict(color='dimgrey',width=2)))


In [None]:
fig=go.Figure(trace1)
fig.show()

In [None]:
#number of distinct bestselling titles per author
#(a title listed in several years is counted once)
authordf=data.groupby('Author')['Name'].nunique().to_frame(name='Quantity')
authordf
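Where `drop_duplicates(subset=['Name'])` sits in the chain changes what it removes. Applied after `groupby('Author').count()`, the `Name` column holds per-author counts, so any author whose count equals an earlier author's is dropped. Applied to the raw rows before grouping, it removes repeated titles instead. The sketch below shows both on hypothetical rows.

```python
import pandas as pd

# Hypothetical rows: Author X and Author Y both have two list entries
data = pd.DataFrame({
    'Name': ['Book A', 'Book A', 'Book B', 'Book C', 'Book D'],
    'Author': ['Author X', 'Author X', 'Author Y', 'Author Y', 'Author Z'],
    'Year': [2009, 2010, 2009, 2010, 2010],
})

# deduplicating AFTER counting: 'Name' now holds counts, so Author Y
# (same count as Author X) is discarded
after_count = (data.groupby('Author').count()
               .drop_duplicates(subset=['Name']))

# deduplicating titles BEFORE counting: every author kept, repeats ignored
before_count = (data.drop_duplicates(subset=['Name'])
                .groupby('Author')['Name'].count())

print(after_count.index.tolist())
print(before_count.to_dict())
```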

In [None]:
#top 10 authors with the most books in the bestsellers list
top10auth=authordf.sort_values(by='Quantity', ascending=False).head(10).reset_index()
top10auth
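`sort_values(...).head(10)` returns exactly ten authors even if several are tied for tenth place, in which case the cut among them is arbitrary. pandas' `nlargest` with `keep='all'` keeps every tied value and may return more than `n` rows. The sketch below uses hypothetical per-author counts shaped like `authordf['Quantity']` and a top 3 for brevity.

```python
import pandas as pd

# Hypothetical per-author title counts (index = author, like authordf)
quantity = pd.Series({'Author P': 5, 'Author Q': 4, 'Author R': 3,
                      'Author S': 3, 'Author T': 1}, name='Quantity')

# head(n) after sorting cuts ties arbitrarily
top3_strict = quantity.sort_values(ascending=False).head(3)

# nlargest(keep='all') keeps everyone tied with the last place
top3_with_ties = quantity.nlargest(3, keep='all')

print(len(top3_strict), len(top3_with_ties))
print(top3_with_ties.index.tolist())
```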

In [None]:
trace2=go.Bar(x=top10auth['Author'],
             y=top10auth['Quantity'])

In [None]:
df2=[trace2]

In [None]:
layout=go.Layout(title='Top 10 Authors with the Most Books in the Bestsellers List',
       xaxis=dict(title='Author'),
       yaxis=dict(title='Quantity'))

In [None]:
fig=go.Figure(data=df2, layout=layout)

In [None]:
fig.show()

In [None]:
#summary statistics of book prices (the count row is dropped)
pricedf=data['Price'].describe().to_frame().drop(index='count')
pricedf
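A box plot summarises a distribution with its quartiles, and the usual Tukey convention marks points more than 1.5 × IQR beyond the box as outliers. Plotly's box trace follows this convention for its default whiskers; treat the exact match with Plotly's quartile method as an assumption. The sketch below computes the quartiles, fences and outliers by hand on hypothetical prices; with the real data, `data['Price']` would take the place of `prices`.

```python
import pandas as pd

# Hypothetical prices standing in for data['Price']
prices = pd.Series([0, 5, 8, 9, 11, 12, 13, 15, 20, 82], name='Price')

# quartiles using pandas' default linear interpolation
q1, q3 = prices.quantile(0.25), prices.quantile(0.75)
iqr = q3 - q1

# Tukey fences: points beyond 1.5 * IQR from the box count as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = prices[(prices < lower_fence) | (prices > upper_fence)]

print(q1, q3, iqr)
print(outliers.tolist())
```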

In [None]:
#box plot of the raw prices; plotly computes the quartiles itself
trace3=go.Box(y=data['Price'],
              name='Bestsellers')

In [None]:
df3=[trace3]

In [None]:
layout=go.Layout(title='Price Range of Bestsellers',
                yaxis=dict(title='Price, £'))

In [None]:
fig=go.Figure(data=df3, layout=layout)

In [None]:
fig.show()

In [None]:
#top 5 books by user rating, one row per title
#(many titles share the same rating, so ties are cut arbitrarily)
ratings_data=data.sort_values(by='User Rating', ascending=False).drop_duplicates(subset=['Name']).head(5).reset_index(drop=True)
ratings_data
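When more than five titles share the highest rating, any "top 5" is one pick among equally rated books. Looking at the rating distribution and counting the titles tied at the maximum shows whether that is the case. The sketch below uses a hypothetical one-row-per-title Series of ratings.

```python
import pandas as pd

# Hypothetical ratings, one value per title
names = ['Book A', 'Book B', 'Book C', 'Book D',
         'Book E', 'Book F', 'Book G', 'Book H']
ratings = pd.Series([4.9] * 6 + [4.8, 4.7], index=names, name='User Rating')

# how many titles have each rating, highest rating first
distribution = ratings.value_counts().sort_index(ascending=False)

# number of titles tied at the top rating
tied_at_max = int((ratings == ratings.max()).sum())

print(distribution.to_dict())
print(tied_at_max)
```

Here six titles tie at the top, so a top 5 leaves one of them out without any rating-based reason.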

In [None]:
trace4=go.Bar(x=ratings_data['Name'],
             y=ratings_data['User Rating'],
             marker_color='Violet')

In [None]:
df4=[trace4]

In [None]:
layout=go.Layout(title='Top 5 Books with the Highest User Ratings',
                xaxis=dict(title='Name of Book'),
                yaxis=dict(title='User Rating'))

In [None]:
fig=go.Figure(data=df4,layout=layout)
fig.show()

In [None]:
#top 10 books by number of reviews, one row per title
reviews_data=data.sort_values(by='Reviews', ascending=False).drop_duplicates(subset=['Name']).head(10).reset_index(drop=True)
reviews_data
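An equivalent way to rank titles by reviews is to reduce each title to one value first and then take the largest. The sketch below keeps each title's highest review count with `groupby(...).max()` and then uses `nlargest`. Taking the maximum is an assumption about how to handle a title whose rows carry different review counts; the rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows: 'Book A' appears twice with different review counts
data = pd.DataFrame({
    'Name': ['Book A', 'Book B', 'Book A', 'Book C'],
    'Reviews': [9000, 12000, 15000, 3000],
})

# one value per title (its largest review count), then the 2 biggest titles
top_reviews = data.groupby('Name')['Reviews'].max().nlargest(2)

print(top_reviews.to_dict())
```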

In [None]:
trace5=go.Bar(x=reviews_data['Name'],
             y=reviews_data['Reviews'],
             marker_color='Salmon')

In [None]:
df5=[trace5]

In [None]:
layout=go.Layout(title='Top 10 Books with the Highest Number of Customer Reviews',
       xaxis=dict(title='Name of Book'),
       yaxis=dict(title='Number of Book Reviews'))

In [None]:
fig=go.Figure(data=df5, layout=layout)
fig.show()