# 📚 ISBNDB: Global Bibliographic & Review Analysis

## 👋 Introduction
This notebook explores the ISBNDB dataset: book metadata, publication trends, and reader engagement (ratings and review volume) across countries. The analysis relies on descriptive statistics, a correlation matrix, and interactive visualizations built with **Plotly**.

### 🎯 Key Objectives:
1.  **Descriptive Statistics**: Summary of the bibliographic landscape.
2.  **Country Comparisons**: Analyzing geographic variations in publication volume and ratings.
3.  **Visual Discovery**: Using box, bar, pie, and line charts plus a correlation heatmap to uncover insights.

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Correct Kaggle input path
file_path = "/kaggle/input/global-book-isbndb-cleaned-and-ready-for-analysis/isbndb_refined_kaggle.csv"

# Load the refined dataset
df = pd.read_csv(file_path)

# Check if 'Publication_Date' column exists before converting
if 'Publication_Date' in df.columns:
    df['Publication_Date'] = pd.to_datetime(df['Publication_Date'], errors='coerce')

# Preview the dataset
df.head()


| | ISBN13 | Book_Title | Author_Name | Publisher_House | Publication_Date | Page_Count | Language | Country | Average_Rating | Review_Volume | Is_Bestseller | Record_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9789697354961 | Where the Crawdads Sing | Delia Owens | Simon & Schuster | 2015-02-17 | 328.0 | es | UK | 3.71 | 4467 | False | Refined |
| 1 | 9781402418010 | The Midnight Library | Matt Haig | HarperCollins | 2003-06-13 | 716.0 | en | Pakistan | 1.8 | 4464 | False | Refined |
| 2 | 9784460967357 | 1984 | George Orwell | Macmillan Publishers | 2010-11-08 | 384.0 | es | Australia | 4.83 | 2757 | False | Refined |
| 3 | 9788934927891 | The Catcher in the Rye | J.D. Salinger | Hachette Book Group | 2014-04-02 | 926.0 | en | India | 3.14 | 3100 | False | Refined |
| 4 | 9781298737106 | Foundation | Isaac Asimov | HarperCollins | 2013-02-18 | 181.0 | es | UK | 2.52 | 3714 | False | Refined |
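Before relying on the summaries that follow, it may be worth confirming that the loaded columns are complete and plausible. The sketch below runs three basic checks (missing values, repeated ISBNs, ratings outside the 1-5 scale) on a small toy frame named `sample`; the frame and its values are illustrative assumptions, not rows from the dataset, and the same calls could be pointed at `df`.

```python
import pandas as pd

# Toy frame standing in for the notebook's df (hypothetical values)
sample = pd.DataFrame({
    "ISBN13": [9780000000001, 9780000000002, 9780000000002, 9780000000004],
    "Page_Count": [328.0, None, 384.0, 926.0],
    "Average_Rating": [3.71, 1.80, 4.83, 3.14],
})

# Number of missing values in each column
missing_per_column = sample.isna().sum()

# Number of rows whose ISBN13 repeats an earlier row
duplicate_isbns = int(sample["ISBN13"].duplicated().sum())

# Number of ratings that fall outside the inclusive 1-5 range
out_of_range = int((~sample["Average_Rating"].between(1, 5)).sum())

print(missing_per_column)
print("Duplicate ISBNs:", duplicate_isbns)
print("Ratings outside 1-5:", out_of_range)
```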


## 📊 1. Descriptive Statistics
Summary statistics for page counts, average ratings, and review volumes.

In [2]:
stats_summary = df[['Page_Count', 'Average_Rating', 'Review_Volume']].describe()
print("Global Bibliographic Summary Stats:")
stats_summary

Global Bibliographic Summary Stats:


| | Page_Count | Average_Rating | Review_Volume |
|---|---|---|---|
| count | 1000.0 | 1000.0 | 1000.0 |
| mean | 545.127 | 2.97291 | 2566.675 |
| std | 255.991167 | 1.171084 | 1433.605632 |
| min | 100.0 | 1.0 | 1.0 |
| 25% | 320.75 | 1.9275 | 1292.75 |
| 50% | 543.5 | 2.915 | 2663.5 |
| 75% | 762.25 | 3.99 | 3782.0 |
| max | 1000.0 | 5.0 | 4995.0 |
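One way to read the quartiles in this summary is to apply the common 1.5 × IQR rule (Tukey's fences) and see whether the minimum or maximum of any column lies beyond them. The sketch below does this arithmetic with the figures copied from the `describe()` output; the dictionary layout and variable names are assumptions made for illustration.

```python
# Quartiles, minimum, and maximum copied from the describe() output above
summary = {
    "Page_Count": {"q1": 320.75, "q3": 762.25, "min": 100.0, "max": 1000.0},
    "Average_Rating": {"q1": 1.9275, "q3": 3.99, "min": 1.0, "max": 5.0},
    "Review_Volume": {"q1": 1292.75, "q3": 3782.0, "min": 1.0, "max": 4995.0},
}

fences = {}
for column, s in summary.items():
    iqr = s["q3"] - s["q1"]              # interquartile range
    lower = s["q1"] - 1.5 * iqr          # lower Tukey fence
    upper = s["q3"] + 1.5 * iqr          # upper Tukey fence
    has_outliers = s["min"] < lower or s["max"] > upper
    fences[column] = {
        "iqr": iqr,
        "lower": lower,
        "upper": upper,
        "has_outliers": has_outliers,
    }
    print(f"{column}: IQR={iqr:.4f}, fences=({lower:.4f}, {upper:.4f}), "
          f"outliers={has_outliers}")
```

With these figures, the minimum and maximum of all three columns fall inside the fences, so this rule flags no outliers.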


## 🌍 2. Global Distribution of Books
Which countries are leading in the dataset?

In [3]:
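# Count books per country; renaming both columns keeps the names stable
# regardless of what reset_index() calls them in the installed pandas version
country_counts = df['Country'].value_counts().reset_index()
country_counts.columns = ['Country', 'Book_Count']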

fig = px.pie(country_counts, values='Book_Count', names='Country', 
             title='Distribution of Books by Country of Origin',
             hole=0.4, color_discrete_sequence=px.colors.sequential.RdBu)
fig.update_traces(textinfo='percent+label')
fig.show(renderer='iframe')
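The percentages drawn on the pie chart can also be computed as plain numbers, which is easier to quote or compare. A minimal sketch, assuming a toy `countries` Series (the country names come from the preview rows, but the counts are made up for illustration):

```python
import pandas as pd

# Toy country column (hypothetical counts, not the real dataset)
countries = pd.Series(["UK", "Pakistan", "Australia", "India", "UK"], name="Country")

# normalize=True returns fractions of the total; convert to percentages
share = countries.value_counts(normalize=True).mul(100).round(1)

print(share)
```

On the notebook's data, the equivalent call would be `df['Country'].value_counts(normalize=True)`.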

## 📈 3. Rating Trends by Country
A box plot compares the distribution of `Average_Rating` across countries, showing each country's median, interquartile range, and overall spread.

In [4]:
fig = px.box(df, x='Country', y='Average_Rating', color='Country', 
             points="all", title='Variance in Book Ratings by Country',
             labels={'Average_Rating': 'User Rating (1-5)'})
fig.update_layout(showlegend=False)
fig.show(renderer='iframe')
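The quantities a box plot draws (quartiles, median, interquartile range) can be tabulated per country to back up what the chart suggests visually. The sketch below does so on a toy `ratings` frame; the values are illustrative assumptions, and only the column names mirror the dataset.

```python
import pandas as pd

# Toy ratings (hypothetical values)
ratings = pd.DataFrame({
    "Country": ["UK", "UK", "UK", "India", "India", "India"],
    "Average_Rating": [3.0, 4.0, 5.0, 1.0, 2.0, 3.0],
})

# Quartiles per country, reshaped so each quantile becomes a column
quartiles = (
    ratings.groupby("Country")["Average_Rating"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
    .rename(columns={0.25: "q1", 0.5: "median", 0.75: "q3"})
)

# Interquartile range (the height of each box) and group size
quartiles["iqr"] = quartiles["q3"] - quartiles["q1"]
quartiles["n"] = ratings.groupby("Country").size()

print(quartiles)
```

Including the group size `n` helps judge how much weight each country's box deserves.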

## 🔥 4. Publishing Heatmap
Pairwise Pearson correlation (the pandas default) between page count, average rating, and review volume.

In [5]:
corr = df[['Page_Count', 'Average_Rating', 'Review_Volume']].corr()
fig = px.imshow(corr, text_auto=True, color_continuous_scale='Viridis',
                title='Correlation Matrix of Book Metrics')
fig.show(renderer='iframe')
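Pearson correlation only measures linear association, so a monotonic but curved relationship can look weaker than it is. A rank-based (Spearman) matrix is a cheap cross-check. The sketch below contrasts the two on a toy `metrics` frame whose values are assumptions chosen to make the difference visible.

```python
import pandas as pd

# Toy metrics: Review_Volume grows monotonically but non-linearly with Page_Count
metrics = pd.DataFrame({
    "Page_Count": [100, 200, 300, 400, 500],
    "Review_Volume": [1, 8, 27, 64, 125],
})

pearson = metrics.corr(method="pearson")    # linear association
spearman = metrics.corr(method="spearman")  # rank (monotonic) association

print(pearson.round(3))
print(spearman.round(3))
```

If the two matrices disagree noticeably on the real data, the relationship is probably not linear and the heatmap alone may understate it.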

## 📅 5. Publication Timeline
Tracking total review volume by the year each book was published.

In [6]:
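# Drop rows whose Publication_Date failed to parse so Year stays an integer
df_time = df.dropna(subset=['Publication_Date']).sort_values('Publication_Date')
df_time['Year'] = df_time['Publication_Date'].dt.year
# Sum review counts over all books published in each year
yearly_reviews = df_time.groupby('Year')['Review_Volume'].sum().reset_index()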

fig = px.line(yearly_reviews, x='Year', y='Review_Volume', 
              title='Global Review Volume Growth Over Time',
              line_shape='spline', render_mode='svg')
fig.show(renderer='iframe')
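Because the line chart sums `Review_Volume` per publication year, a year with more books will show more reviews even if each book attracts the same attention. Reporting the book count and the per-book mean next to the total separates the two effects. A minimal sketch on a toy `books` frame (hypothetical values, with one unparseable date to show how it is dropped):

```python
import pandas as pd

# Toy data (hypothetical); the None becomes NaT, like errors='coerce' would produce
books = pd.DataFrame({
    "Publication_Date": pd.to_datetime(["2010-11-08", "2010-03-01", "2013-02-18", None]),
    "Review_Volume": [2757, 1000, 3714, 500],
})

# Keep only rows with a valid publication date
dated = books.dropna(subset=["Publication_Date"])

# Per-year book count, total reviews, and mean reviews per book
yearly = (
    dated.assign(Year=dated["Publication_Date"].dt.year)
    .groupby("Year")
    .agg(
        n_books=("Review_Volume", "size"),
        total_reviews=("Review_Volume", "sum"),
        mean_reviews=("Review_Volume", "mean"),
    )
    .reset_index()
)

print(yearly)
```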

## 💎 6. Bestseller Analysis
A look at the top-rated publishers, ranked by the mean `Average_Rating` of their books.

In [7]:
# Mean rating per publisher, sorted from highest to lowest
publisher_performance = (df.groupby('Publisher_House')['Average_Rating'].mean()
                         .sort_values(ascending=False).reset_index())
fig = px.bar(publisher_performance, x='Publisher_House', y='Average_Rating', 
             color='Average_Rating', title='Average Rating per Publishing House',
             color_continuous_scale='Teal')
fig.show(renderer='iframe')
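A mean rating is only as reliable as the number of books behind it, so a publisher with a single highly rated title can top the ranking by chance. One hedge is to report the count alongside the mean and filter on a minimum sample size. In the sketch below, the `MIN_BOOKS` threshold and the toy `books` frame are assumptions for illustration; publisher names come from the preview rows, but the ratings are partly invented.

```python
import pandas as pd

# Toy data (hypothetical ratings)
books = pd.DataFrame({
    "Publisher_House": [
        "HarperCollins", "HarperCollins",
        "Macmillan Publishers",
        "Simon & Schuster", "Simon & Schuster",
    ],
    "Average_Rating": [1.80, 2.52, 4.83, 3.71, 4.29],
})

MIN_BOOKS = 2  # hypothetical threshold; tune to the size of the real dataset

# Mean rating and number of books per publisher
performance = books.groupby("Publisher_House")["Average_Rating"].agg(["mean", "count"])

# Keep publishers with enough books, best mean first
reliable = performance[performance["count"] >= MIN_BOOKS].sort_values(
    "mean", ascending=False
)

print(reliable)
```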

## 📝 Interpretation & Outcomes
- **Geographic diversity**: The dataset shows a balanced distribution across major global markets.
- **Quality Metrics**: Bestsellers are often assumed to combine a high rating with a high review volume; the `Is_Bestseller` flag can be used to check whether that holds in this dataset.
- **Future Insights**: Further analysis could involve Natural Language Processing (NLP) on review text if available.
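The quality-metrics point can be checked directly by comparing flagged and unflagged books. The sketch below groups a toy `books` frame by the `Is_Bestseller` flag and averages the two metrics; all values are illustrative assumptions, and the `Segment` labels are introduced here only for readability.

```python
import pandas as pd

# Toy data (hypothetical values)
books = pd.DataFrame({
    "Is_Bestseller": [False, False, True, True],
    "Average_Rating": [3.0, 2.0, 4.5, 4.7],
    "Review_Volume": [1000, 2000, 4500, 4700],
})

# Readable group labels derived from the boolean flag
books["Segment"] = books["Is_Bestseller"].map({True: "bestseller", False: "other"})

# Mean rating and mean review volume per segment
by_segment = books.groupby("Segment")[["Average_Rating", "Review_Volume"]].mean()

print(by_segment)
```

If the two segments show similar means on the real data, the flag is not driven by rating or review volume.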

---
*Final Analysis Notebook - Kaggle Ready*