# Notebook 1: Exploratory Data Analysis and Dataset Preparation

Welcome to Notebook 1 of this recommendation system project!

In this notebook, we will explore the anime datasets and perform data analysis.

Our goal in this notebook is to gain insights into the data, understand the distribution of anime titles, genres, user ratings, and other important features. We will visualize the data using various plots and charts to identify patterns and trends.

Once we have completed the data exploration, we will move forward to Notebook 2: Model Training.

[Click here to access Notebook 2: Model Training](https://www.kaggle.com/code/dbdmobile/anime-recommendation-2)

Note: It is recommended to run this notebook first to understand the data and perform necessary data preprocessing before proceeding to the model training phase.

Let's get started with the data exploration!

# Importing libraries

In [None]:
!pip install plotly

In [None]:
!pip install wordcloud

In [None]:
!pip install langdetect

In [None]:
# Reading Dataset
import numpy as np
import pandas as pd
# Visualization
import plotly.express as px
import plotly.graph_objects as go  # for 3D plot visualization
import plotly.figure_factory as ff
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

from wordcloud import WordCloud
from langdetect import detect
from datetime import datetime

# Reading our Dataset

In [5]:
# Setting column display to 50
pd.set_option('display.max_columns', 50)

In [None]:
# Importing anime details dataframe
df_anime=pd.read_csv('/kaggle/input/myanimelist-dataset/anime-dataset-2023.csv')
print("Shape of the Dataset:",df_anime.shape)
df_anime.head(3)

In [None]:
# Importing user details dataframe
df_user=pd.read_csv('/kaggle/input/myanimelist-dataset/users-details-2023.csv')
print("Shape of the Dataset:",df_user.shape)
df_user.head()

In [None]:
# Importing user score dataframe
df_score=pd.read_csv('/kaggle/input/myanimelist-dataset/users-score-2023.csv')
print("Shape of the dataset:",df_score.shape)
df_score.head()

# Exploratory Data Analysis

## Data Exploration

#### Checking each DataFrame
In order to gain a better understanding of the data, it is important to examine each DataFrame individually. This includes assessing its structure and identifying any missing values. We will begin this process by using the info() method, which provides a comprehensive overview of the DataFrame's columns and structure.

In [None]:
df_anime.info()
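Note that `info()` only reports null values, while the cells that follow show that this dataset also marks missing entries with the string `'UNKNOWN'`. A minimal sketch of counting both kinds of gap is shown here on a made-up toy frame; the helper name `count_placeholders` and the toy values are assumptions for illustration, not part of the dataset or of pandas.

```python
import pandas as pd

def count_placeholders(df, placeholder="UNKNOWN"):
    """Return per-column counts of null values and of a string placeholder."""
    return pd.DataFrame({
        "n_null": df.isna().sum(),
        "n_placeholder": (df == placeholder).sum(),
    })

# Toy data (made up) that mimics the 'UNKNOWN' convention
toy = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Score": ["8.1", "UNKNOWN", "7.4"],
    "Rank": ["10", "UNKNOWN", "UNKNOWN"],
})

summary = count_placeholders(toy)
print(summary)
```

The same helper could be called on `df_anime` to see which columns need the kind of cleaning applied to `Score` and `Rank`.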

In [None]:
# Preprocessing Score column
df_anime['Score'].value_counts()

In [11]:
scores = df_anime['Score'][df_anime['Score'] != 'UNKNOWN']
scores = scores.astype('float')
score_mean= round(scores.mean() , 2)

In [12]:
df_anime['Score'] = df_anime['Score'].replace('UNKNOWN', score_mean)
df_anime['Score'] = df_anime['Score'].astype('float64')
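The two cells above replace `'UNKNOWN'` scores with the mean of the known scores. An equivalent approach, sketched here on a made-up toy series rather than on `df_anime`, is to let `pd.to_numeric` turn non-numeric entries into `NaN` and then fill them. This is one possible alternative, not the method the notebook uses.

```python
import pandas as pd

# Toy scores (made up); 'UNKNOWN' stands in for a missing value
toy_scores = pd.Series(["8.0", "UNKNOWN", "6.0"])

# Non-numeric entries become NaN instead of raising an error
numeric = pd.to_numeric(toy_scores, errors="coerce")

# Fill the gaps with the rounded mean of the known scores
filled = numeric.fillna(round(numeric.mean(), 2))
print(filled.tolist())
```

Keep in mind that mean imputation narrows the score distribution, which affects the box plot and correlation cells later in the notebook.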

In [None]:
# Processing Rank column
df_anime['Rank'].value_counts()

In [14]:
df_anime['Rank'] = df_anime['Rank'].replace('UNKNOWN', np.nan)
df_anime['Rank'] = df_anime['Rank'].astype('float64')

In [None]:
df_user.info()

In [None]:
df_score.isnull().sum()

## Data Visualization

### For Anime Dataset

In [None]:
# Count the number of anime titles by type
type_counts = df_anime['Type'].value_counts()

# Create a bar chart
fig = px.bar(type_counts, x=type_counts.index, y=type_counts.values, color=type_counts.index, labels={'x':'Anime Type', 'y':'Count'}, 
             title='Count of Anime Titles by Type')

fig.show()

In [None]:
# Filter out anime titles with popularity value 0
df_valid_popularity = df_anime[df_anime['Popularity'] > 0]

# Sort the dataframe by popularity and select the top 15
top_15_popular = df_valid_popularity.sort_values(by='Popularity', ascending=True).head(15)

# Create a bar chart with different colors for each bar
fig = px.bar(top_15_popular, x='Name', y='Popularity',
             labels={'Name': 'Anime Title', 'Popularity': 'Popularity'},
             title='Top 15 Most Popular Animes',
             color='Name')
# Note: Popularity is a rank, so a lower number means a more popular anime.
fig.show()
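Because a lower `Popularity` value means a more popular title, selecting the "top" rows means taking the smallest values. A compact alternative to sorting and calling `head` is `DataFrame.nsmallest`, sketched here on made-up toy data.

```python
import pandas as pd

# Toy data (made up); 0 marks an invalid popularity rank, as in the cell above
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Popularity": [3, 0, 1, 2],
})

# Drop invalid ranks, then keep the two best-ranked titles
valid = toy[toy["Popularity"] > 0]
top_2 = valid.nsmallest(2, "Popularity")
print(top_2["Name"].tolist())
```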

In [None]:
# Create a scatter plot
fig = px.scatter(df_anime, x='Score', y='Members',
                 labels={'Score':'Overall Score', 'Members':'Number of Members'},
                 title='Anime Score vs. Number of Members')

fig.show()

In [None]:
# Sort the dataframe by the number of members of each anime
top_15_scored = df_anime.sort_values(by='Members', ascending=False).head(15)

# Create a bar chart
fig = px.bar(top_15_scored, x='Name', y='Members', labels={'Members':'Number of Users', 'Name':'Anime Title'},color='Name',
             title='Top 15 Animes by Number of Users')

fig.show()

In [None]:
# Split the genres and count their occurrences
genre_counts = df_anime[df_anime['Genres'] != "UNKNOWN"]['Genres'].apply(lambda x: x.split(', ')).explode().value_counts()

# Create a bar chart
fig = px.bar(genre_counts, x=genre_counts.index, y=genre_counts.values,
             labels={'x':'Genre', 'y':'Count'},
             title='Count of Anime Titles by Genre',
             color=genre_counts.index)

fig.show()
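The genre count above relies on splitting each comma-separated string into a list and then giving every list element its own row with `explode`. A minimal sketch of that chain on made-up toy data makes the intermediate result easier to follow.

```python
import pandas as pd

# Toy genre strings (made up) in the same "A, B, C" layout as the Genres column
toy_genres = pd.Series([
    "Action, Comedy",
    "Comedy",
    "UNKNOWN",
    "Action, Drama, Comedy",
])

counts = (
    toy_genres[toy_genres != "UNKNOWN"]  # drop the placeholder rows
    .str.split(", ")                     # "Action, Comedy" -> ["Action", "Comedy"]
    .explode()                           # one row per genre
    .value_counts()                      # occurrences per genre, descending
)
print(counts)
```

An anime with several genres contributes once to each of them, so the counts add up to more than the number of titles.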

In [None]:
# Select the top 20 genres
top_20_genres = genre_counts.head(20)

# Create a bar chart with custom style
fig = px.bar(top_20_genres, x=top_20_genres.index, y=top_20_genres.values,
             labels={'x':'Genre', 'y':'Count'},
             title='Top 20 Most Popular Genres In The Anime Industry')

# Customize the bar chart appearance
fig.update_traces(marker_color='rgb(158,202,225)', marker_line_color='rgb(8,48,107)',
                  marker_line_width=1.5, opacity=0.8)

fig.update_layout(xaxis_tickangle=-45, xaxis=dict(tickfont=dict(size=12)),
                  yaxis=dict(title=dict(font=dict(size=14))))  # title.font replaces the deprecated titlefont

fig.show()

In [None]:
# Create the plotly figure
fig = go.Figure(data=[go.Pie(labels=top_20_genres.index, values=top_20_genres.values,
                             hole=0.6, hoverinfo='label+percent', textinfo='value')])

fig.update_layout(title='Distribution of Anime Genres',
                  legend=dict(font=dict(size=12), title='Genre'),
                  annotations=[dict(text='Genre', x=0.5, y=0.5, font_size=20, showarrow=False)])

fig.show()

In [None]:
# Concatenate all genre values into a single string
genre_text = ' '.join(df_anime[df_anime['Genres'] != "UNKNOWN"]['Genres'].dropna())

# Create a WordCloud object
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(genre_text)

# Convert the WordCloud object to an image
wordcloud_image = wordcloud.to_image()

# Create a Plotly figure to display the WordCloud image
fig = go.Figure(go.Image(z=wordcloud_image))
fig.update_layout(title='Word Cloud - Genre')
fig.show()

In [None]:
# Create a violin plot for anime popularity by type
fig = px.violin(df_anime, x='Type', y='Popularity', 
                labels={'Type':'Anime Type', 'Popularity':'Popularity'},
                title='Distribution of Anime Popularity by Type',
                color='Type')

fig.show()

In [None]:
# Create a box plot for anime scores by type
fig = px.box(df_anime, x='Type', y='Score', 
             labels={'Type':'Anime Type', 'Score':'Score'},
             title='Distribution of Anime Scores by Type',
             color='Type')

fig.show()

In [None]:
# Create a bubble chart to visualize the relationship between popularity, members, and score
fig = px.scatter(df_anime, x='Popularity', y='Members', size='Score', color='Type',
                 labels={'Popularity':'Popularity', 'Members':'Number of Members'},
                 title='Relationship between Popularity, Number of Members, and Score')

fig.show()

In [None]:
# Create a 3D scatter plot to visualize the relationship between popularity, members, and score
fig = go.Figure(data=go.Scatter3d(
    x=df_anime['Popularity'],
    y=df_anime['Members'],
    z=df_anime['Score'],
    mode='markers',
    marker=dict(
        size=5,
        color=df_anime['Rank'],
        colorscale='Viridis',
        opacity=0.8
    ),
    text=df_anime['Name'],
    hovertemplate='<b>Title</b>: %{text}<br><b>Popularity</b>: %{x}<br><b>Members</b>: %{y}<br><b>Score</b>: %{z}',
))

fig.update_layout(scene=dict(
    xaxis_title='Popularity',
    yaxis_title='Members',
    zaxis_title='Score'
), title='Relationship between Popularity, Members, and Score')

fig.show()

In [None]:
# Create a correlation matrix
correlation_matrix = df_anime[['Score', 'Popularity', 'Rank']].corr()

# Create a heatmap of the correlation matrix
fig = ff.create_annotated_heatmap(z=correlation_matrix.values,
                                  x=list(correlation_matrix.columns),
                                  y=list(correlation_matrix.index),
                                  colorscale='Viridis')
fig.update_layout(title='Correlation Matrix')
fig.show()

In [None]:
df_anime['Licensors'].value_counts()

In [None]:
# Create a list of all the individual licensors
licensors_list = [licensor.strip() for licensors in df_anime[df_anime['Licensors']!="UNKNOWN"]['Licensors'].str.split(',') for licensor in licensors]

# Count the occurrences of each licensor
licensor_counts = pd.Series(licensors_list).value_counts()

# Defensive filter in case a differently cased 'Unknown' label is present
filtered_licensor_counts = licensor_counts[licensor_counts.index != 'Unknown']

# Select the top 10 licensors
top_10_licensors = filtered_licensor_counts.head(10)

# Create the bar plot using Plotly
fig = px.bar(top_10_licensors, x=top_10_licensors.index, y=top_10_licensors.values, color=top_10_licensors.index)

# Customize the plot
fig.update_layout(
    title='Top 10 Anime Licensors',
    xaxis_title='Licensors',
    yaxis_title='Count',
    xaxis_tickangle=-45
)

# Show the plot
fig.show()

In [None]:
df_anime['Premiered'].value_counts()

In [33]:
# Function to extract the season and year from the premiered string
def extract_season_year(premiered):
    if premiered == 'UNKNOWN':
        return None, None
    else:
        season, year = premiered.split()
        return season, int(year)

# Apply the function to extract the season and year from the "Premiered" column
season_year = df_anime['Premiered'].map(extract_season_year)
premiered_season = season_year.apply(lambda x: x[0])
premiered_Year = season_year.apply(lambda x: x[1])
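The function above splits each `Premiered` string into a season and a year. A vectorized alternative, sketched here on made-up toy values, uses `Series.str.extract` with named groups so that rows that do not match (such as `'UNKNOWN'`) come out as missing values without an explicit branch. The pattern assumes the "season year" layout implied by the `split()` call above.

```python
import pandas as pd

# Toy values (made up) in the "season year" layout
toy_premiered = pd.Series(["spring 1998", "UNKNOWN", "fall 2021"])

# Named groups become column names; non-matching rows become NaN
parts = toy_premiered.str.extract(r"^(?P<season>\w+)\s+(?P<year>\d{4})$")

# Nullable integer dtype keeps the year integral despite missing rows
parts["year"] = pd.to_numeric(parts["year"]).astype("Int64")
print(parts)
```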

In [None]:
# Filter out None values from premiered_season
filtered_premiered_season = premiered_season.dropna()

# Count the occurrences of each season
season_counts = filtered_premiered_season.value_counts()

# Create the pie plot
fig = go.Figure(data=go.Pie(
    labels=season_counts.index,
    values=season_counts.values,
    hole=0.4,  # Add a donut hole in the center
    hoverinfo='label+percent',  # Display label and percentage on hover
    textinfo='value',  # Display count value as text inside each slice
    textfont=dict(size=14),  # Set the text font size
    marker=dict(
        colors=['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd'],  # Custom color palette
        line=dict(color='#ffffff', width=2)  # Set the color and width of the slice borders
    )
))

# Set the title and font style for the plot
fig.update_layout(
    title='Distribution of Premiered Seasons',
    title_font=dict(size=20),
    font=dict(size=12, color='#555555')
)

fig.show()

In [None]:
# Filter out None values from premiered_Year
filtered_premiered_year = premiered_Year.dropna()

# Count the occurrences of each year
year_counts = filtered_premiered_year.value_counts()

# Sort the years in ascending order
sorted_years = sorted(year_counts.index)

# Create the bar plot
fig = go.Figure(data=go.Bar(
    x=sorted_years,
    y=year_counts[sorted_years],
    marker=dict(color='#1f77b4'),  # Set the color of the bars
))

# Set the title and axis labels
fig.update_layout(
    title='Number of Animes Premiered by Year',
    xaxis_title='Year',
    yaxis_title='Number of Animes',
    title_font=dict(size=20),
    font=dict(size=12, color='#555555')
)

fig.show()

In [None]:
# Count the occurrences of each studio
studio_counts = df_anime['Studios'].value_counts()

# Filter the studio_counts series to exclude the 'UNKNOWN' placeholder
studio_counts = studio_counts[studio_counts.index != 'UNKNOWN']

# Select the top 10 studios with the highest number of animes
top_studios = studio_counts.head(10)

# Create the bar plot
fig = go.Figure(data=go.Bar(
    x=top_studios.index,
    y=top_studios.values,
    marker=dict(color=top_studios.values, colorscale='Blues'),  # Set the color of the bars using a colorscale
    text=top_studios.values,  # Set the text to be displayed on hover
    hovertemplate='Studio: %{x}<br>Number of Animes: %{y}<extra></extra>',  # Customize the hover template
))

# Set the title and axis labels
fig.update_layout(
    title='Number of Animes by Studio (Top 10)',
    xaxis_title='Studios',
    yaxis_title='Number of Animes',
    title_font=dict(size=20),
    font=dict(size=12, color='#555555'),
    plot_bgcolor='rgba(0, 0, 0, 0)'  # Set the background color to transparent
)

fig.show()

In [None]:
# Count the occurrences of each source
source_counts = df_anime['Source'].value_counts()

# Filter the source_counts series to exclude the 'UNKNOWN' placeholder
source_counts = source_counts[source_counts.index != 'UNKNOWN']

# Create the horizontal bar chart
fig = go.Figure(data=go.Bar(
    x=source_counts.values,
    y=source_counts.index,
    orientation='h',  # Set the orientation to horizontal
    marker=dict(color=source_counts.values, colorscale='Viridis'),  # Set the color of the bars using a colorscale
    text=source_counts.values,  # Set the text to be displayed on hover
    hovertemplate='Source: %{y}<br>Number of Animes: %{x}<extra></extra>',  # Customize the hover template
))

# Set the title and axis labels
fig.update_layout(
    title='Number of Animes by Source',
    xaxis_title='Number of Animes',
    yaxis_title='Source',
    title_font=dict(size=20),
    font=dict(size=12, color='#555555')
)

fig.show()

In [None]:
# Sort the DataFrame by the 'Favorites' column in descending order
sorted_df = df_anime.sort_values('Favorites', ascending=False)

# Select the top 10 most favorited anime
top_favorites = sorted_df.head(10)

# Create the horizontal bar chart
fig = go.Figure(data=go.Bar(
    x=top_favorites['Favorites'],
    y=top_favorites['Name'],
    orientation='h',  # Set the orientation to horizontal
    marker=dict(color='#1f77b4'),  # Set the color of the bars
    text=top_favorites['Favorites'],  # Set the text to be displayed on hover
    hovertemplate='Anime: %{y}<br>Favorites: %{x}<extra></extra>',  # Customize the hover template
))

# Set the title and axis labels
fig.update_layout(
    title='Top 10 Most Favorited Anime',
    xaxis_title='Number of Favorites',
    yaxis_title='Anime',
    title_font=dict(size=20),
    font=dict(size=12, color='#555555')
)

fig.show()

In [None]:
# Create a treemap of the same top 10 most favorited anime
fig = go.Figure(go.Treemap(
    labels=top_favorites['Name'],
    parents=[""] * len(top_favorites),
    values=top_favorites['Favorites'],
    hovertemplate='Name: %{label}<br>Favorites: %{value}',
))

# Set the color scale
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd',
          '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
fig.update_traces(marker=dict(colors=colors))

# Set the title
fig.update_layout(
    title='Top 10 Most Favorited Anime (Treemap)',
    title_font=dict(size=20),
    font=dict(size=12, color='#555555'),
)

fig.show()

In [None]:
# Count the occurrences of each rating
rating_counts = df_anime[df_anime['Rating']!="UNKNOWN"]['Rating'].value_counts()

# Defensive filter in case a differently cased 'Unknown' label is present
rating_counts = rating_counts[rating_counts.index != 'Unknown']

# Create the pie plot
fig = go.Figure(data=go.Pie(
    labels=rating_counts.index,
    values=rating_counts.values,
    hoverinfo='label+percent',
    textinfo='value',
    textfont=dict(size=12),
    marker=dict(colors=['#1f77b4']),  # Set the same color for all segments
    hole=0.6,  # Set the size of the inner hole to create a donut shape
))

# Set the title
fig.update_layout(
    title='Distribution of Anime Ratings',
    title_font=dict(size=20),
    font=dict(size=12, color='#555555'),
)

fig.show()

In [41]:
# Function to map abbreviated language codes to full names
def map_language_code(code):
    language_mapping = {
        'ja': 'Japanese',
        'ko': 'Korean',
        'zh-cn': 'Simplified Chinese',
        'de': 'German',
        'vi': 'Vietnamese',
        'en': 'English',
        'zh-tw': 'Traditional Chinese'
    }
    return language_mapping.get(code, 'Other')

# Function to detect language
def detect_language(name):
    try:
        return detect(name)
    except Exception:  # langdetect raises on empty or undetectable text
        return None  # Return None for rows where detection fails

In [None]:
# Apply language detection to the 'Other name' column
Detected_Language = df_anime[df_anime['Other name']!="UNKNOWN"]['Other name'].apply(detect_language)

# Drop rows where language detection failed (i.e., where Detected_Language is None)
Detected_Language = Detected_Language.dropna()

# Count the occurrences of each language
language_counts = Detected_Language.value_counts()

# Map abbreviated language codes to full names for plotting
language_counts.index = language_counts.index.map(map_language_code)

fig = go.Figure(data=go.Bar(
    x=language_counts.values,
    y=language_counts.index,
    orientation='h',
    marker=dict(color=language_counts.values, colorscale='Viridis'),
    text=language_counts.values,  # Set the text to be displayed on hover
    hovertemplate='Native Language: %{y}<br>Number of Animes: %{x}<extra></extra>',
))

# Set the title and axis labels
fig.update_layout(
    title='Count of Animes by Detected Language of Native Name',
    xaxis_title='Number of Animes',
    yaxis_title='Native Language',
    title_font=dict(size=20),
    font=dict(size=12, color='#555555')
)

fig.show()

### For User Dataset

In [None]:
# Distribution of gender
# Count the occurrences of each gender
gender_counts = df_user['Gender'].value_counts(dropna=True)

# Define custom colors for the pie slices
colors = ['rgb(0, 123, 255)', 'rgb(255, 65, 54)', 'rgb(255, 187, 0)', 'rgb(125, 125, 125)']

# Create the pie plot
fig = go.Figure()

fig.add_trace(go.Pie(
    labels=gender_counts.index,
    values=gender_counts.values,
    hole=0.3,
    marker=dict(colors=colors, line=dict(color='#FFFFFF', width=2)),
    hoverinfo='label+percent',
    hovertemplate='<b>%{label}</b><br>%{percent}',
    textinfo='value',
    textposition='inside',
    sort=False
))

# Customize the layout
fig.update_layout(
    title='Gender Distribution',
    title_x=0.5,
    uniformtext_minsize=12,
    uniformtext_mode='hide',
    showlegend=False,
    paper_bgcolor='rgba(0,0,0,0)',
    plot_bgcolor='rgba(0,0,0,0)',
    margin=dict(l=20, r=20, t=100, b=20),
)

# Show the plot
fig.show()

In [None]:
df_user['Birthday'].value_counts(dropna=True)

In [None]:
# Age Distribution
# Convert birthday to age
def calculate_age(birth_date):
    # Missing birthdays are dropped before this is applied; malformed values return None
    try:
        birth_year = int(str(birth_date).split('-')[0])
    except ValueError:
        return None
    age = datetime.now().year - birth_year
    # Keep only plausible ages (modify the range as needed)
    return age if 10 <= age < 60 else None

# Apply age calculation to the 'Birthday' column
Age = df_user['Birthday'].dropna().apply(calculate_age)

# Create the histogram
fig = px.histogram(Age, nbins=20, title='Age Distribution', labels={'value': 'Age', 'count': 'Count'})

# Customize the layout
fig.update_layout(
    xaxis=dict(title='Age'),
    yaxis=dict(title='Count'),
    bargap=0.1,
    showlegend=False,
    paper_bgcolor='rgba(0,0,0,0)',
    plot_bgcolor='rgba(0,0,0,0)',
    margin=dict(l=50, r=20, t=100, b=50),
)

# Show the plot
fig.show()
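The age calculation can also be done without a row-by-row function. The sketch below is one possible vectorized version on made-up toy data; the helper name `ages_from_birthdays` and its `reference_year` parameter are assumptions introduced here so that the result does not depend on the day the notebook is run.

```python
import pandas as pd

def ages_from_birthdays(birthdays, reference_year, min_age=10, max_age=60):
    """Return plausible ages (min_age <= age < max_age) as an integer Series."""
    # Unparseable or missing birthdays become NaT, so their year is NaN
    years = pd.to_datetime(birthdays, errors="coerce").dt.year
    ages = reference_year - years
    # Mask implausible ages, then drop every missing value
    in_range = (ages >= min_age) & (ages < max_age)
    return ages.where(in_range).dropna().astype(int)

# Toy birthdays (made up): one valid, one missing, one malformed, one implausible
toy_birthdays = pd.Series(["1995-06-15", None, "not a date", "1900-01-01"])
toy_ages = ages_from_birthdays(toy_birthdays, reference_year=2023)
print(toy_ages.tolist())
```

Like the cell above, this uses only the birth year, so the result can be off by one for users whose birthday has not yet occurred in the reference year.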

In [None]:
# Location analysis
# Count the occurrences of each location
location_counts = df_user['Location'].value_counts()

# Create a bar chart
fig = px.bar(location_counts.head(20),
             x=location_counts.head(20).index,
             y=location_counts.head(20).values,
             labels={'x': 'Location', 'y': 'Count'},
             title='Top 20 User Locations',
             color=location_counts.head(20).index)

# Customize the layout
fig.update_layout(
    xaxis=dict(title='Location'),
    yaxis=dict(title='Count'),
    bargap=0.1,
    showlegend=False,
    paper_bgcolor='rgba(0,0,0,0)',
    plot_bgcolor='rgba(0,0,0,0)',
    margin=dict(l=50, r=20, t=100, b=50),
)

# Show the plot
fig.show()

In [None]:
# Define the metrics you want to consider for top users
metrics = ['Days Watched']
# You can change the metrics to 'Mean Score', 'Watching', 'Completed', 'On Hold', 'Dropped', 'Plan to Watch', or include
# a combination of them to compare them, such as Watching vs. Completed

# Initialize an empty DataFrame to store the top users
top_users = pd.DataFrame()

for metric in metrics:
    top_users = pd.concat([top_users, df_user.nlargest(15, metric)], ignore_index=True)

# Create a bar chart for the top users based on the chosen metrics
fig = px.bar(top_users, x='Username', y=metrics, barmode='group',
             title='Top 15 Users by Selected Metrics',
             labels={'value': 'Count', 'variable': 'Metric'},
             color_discrete_sequence=px.colors.qualitative.Plotly)

# Customize the layout
fig.update_layout(
    xaxis=dict(title='Users'),
    yaxis=dict(title='Count'),
    legend_title_text='Metric',
    paper_bgcolor='rgba(0,0,0,0)',
    plot_bgcolor='rgba(0,0,0,0)',
    margin=dict(l=50, r=20, t=100, b=50),
)

# Show the plot
fig.show()

In [None]:
# Function to get watching behavior of a specific user
def get_watching_behavior(username):
    user_data = df_user[df_user['Username'] == username]
    if len(user_data) == 0:
        return None
    watching = user_data['Watching'].values[0]
    on_hold = user_data['On Hold'].values[0]
    completed = user_data['Completed'].values[0]
    dropped = user_data['Dropped'].values[0]
    plan_to_watch = user_data['Plan to Watch'].values[0]
    return watching, on_hold, completed, dropped, plan_to_watch

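# Username to inspect (swap in input("Enter the username: ") for interactive use)
username_input = "midnightq2"

# Get the watching behavior of the user; fail clearly if the username is absent
behavior = get_watching_behavior(username_input)
if behavior is None:
    raise ValueError(f"Username {username_input!r} not found in df_user")
watching, on_hold, completed, dropped, plan_to_watch = behavior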

# Create the pie chart
fig = go.Figure(data=[go.Pie(labels=['Watching', 'On Hold', 'Completed', 'Dropped', 'Plan to Watch'],
                             values=[watching, on_hold, completed, dropped, plan_to_watch],
                             hole=0.3,
                             hoverinfo='label+percent',
                             textinfo='value',
                             textfont_size=15)])

# Customize the layout
fig.update_layout(title=f"Watching Behavior of {username_input}",
                  showlegend=True,
                  paper_bgcolor='rgba(0,0,0,0)',
                  plot_bgcolor='rgba(0,0,0,0)')

# Show the plot
fig.show()

In [None]:
# Create a correlation matrix
correlation_matrix = df_user[['Days Watched', 'Mean Score', 'Total Entries', 'Rewatched', 'Episodes Watched']].corr()

# Create a heatmap of the correlation matrix
fig = ff.create_annotated_heatmap(z=correlation_matrix.values,
                                  x=list(correlation_matrix.columns),
                                  y=list(correlation_matrix.index),
                                  colorscale='Viridis')
fig.update_layout(title='Correlation Matrix')
fig.show()

### For User Score Dataset

In [None]:
# Animes watched by the most users in the df_score dataset

# Get the count of users who watched each anime title
anime_watch_count = df_score.groupby('Anime Title')['user_id'].nunique().reset_index()
anime_watch_count = anime_watch_count.rename(columns={'user_id': 'User Count'})

# Sort the dataframe in descending order by the number of users
anime_watch_count = anime_watch_count.sort_values(by='User Count', ascending=False)

# Select the top 10 anime titles with the highest number of users
top_n = 10
top_anime_watch_count = anime_watch_count.head(top_n)

# Define a colorful color palette
color_palette = px.colors.qualitative.Plotly

# Create the bar chart with colorful bars
# (color by title so the discrete palette applies; a numeric color column would use a continuous scale)
fig = px.bar(top_anime_watch_count, x='User Count', y='Anime Title', orientation='h',
             title=f'Top {top_n} Anime Titles Watched by Most Users',
             labels={'User Count': 'Number of Users', 'Anime Title': 'Anime Title'},
             color='Anime Title',
             color_discrete_sequence=color_palette)

# Customize the layout
fig.update_layout(showlegend=False, paper_bgcolor='rgba(0,0,0,0)', plot_bgcolor='rgba(0,0,0,0)',
                  margin=dict(l=50, r=20, t=100, b=50))

# Show the plot
fig.show()
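The cell above counts distinct users per title with `nunique` rather than counting rows. A minimal sketch on made-up toy data shows the difference when the same user appears more than once for a title.

```python
import pandas as pd

# Toy score rows (made up); user 1 appears twice for title "A"
toy_scores = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "Anime Title": ["A", "A", "A", "B", "C"],
})

grouped = toy_scores.groupby("Anime Title")["user_id"]
distinct_users = grouped.nunique().sort_values(ascending=False)  # unique users per title
row_counts = grouped.count()                                     # raw rows per title

print(distinct_users)
print(row_counts)
```

If every (user, title) pair occurs only once in `df_score`, both counts agree; `nunique` is simply the safer choice when that is not guaranteed.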

## That concludes our journey through the world of anime data analysis. I hope you found this notebook insightful and enjoyable. Happy analyzing and keep exploring the fascinating world of anime!
## Let's move on to model training in Notebook 2!