In [1]:
%%capture output
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Recap of Project

1. I found a dataset on Kaggle with information on the 100 most-watched anime in the world. The data is in the public domain and has columns for ratings, number of episodes, budget, release year, genre, and other categories. I want to visualize the most important information in this dataset, and I am most interested in the ratings. Since there are so many anime, I decided to look at the average rating for each genre. I am also curious which anime is the most watched in each country.

2. Below I have created a dashboard summarizing the most important details from this dataset. The visual combines a bar chart, scatterplot, density plot, box plot, pie chart, and world map. I focused primarily on whether budget influences an anime's rating and on the share of each genre in the dataset. Finally, I added a map showing the most-watched anime in each part of the world. I chose easy-to-understand graphics and used a variety of colors to distinguish the genres. I also included a legend so the user can filter out certain genres when comparing ratings to budget. The dashboard is interactive: the user can hover over points in the plots to see exactly what they represent, which helps anyone who has trouble reading smaller details. I created the entire visualization in Kaggle, which caused some issues down the line. Unfortunately, the visual ended up rushed due to time constraints, and the final product has significant room for improvement.

3. For my evaluation, I wasn't able to get any professional insight. Instead, I showed my final product to a peer and two family members. I talked directly with each person, so the evaluation was fairly free-flowing. I asked each of them how pleasing the visual was to look at, how confusing it was, and what insights they got from it. One of the biggest criticisms I received was how scattered the visual was. It was hard for my participants to understand the purpose of the visualization. I think my focus was too broad, and I should have narrowed my goal down to a single variable. I also received a lot of criticism of the world map and the legend. I didn't have data on all countries, so some countries were colored and others were grey and empty. This contrast did not help the plot at all. The legend also had problems: it was hard for the participants to tell what the legend did and which plot it affected. Most of all, the visual was described as "too barebones."

4. Overall, my visualization has many issues. The main positive is that the color scheme isn't unpleasant. I received no complaints about the pie chart, and the plots about ratings were acceptable. I think creating this visualization entirely in Kaggle was a major problem. For future iterations, I would scrap the world map entirely and focus only on anime ratings and genres. The bar chart of the highest-budget anime was out of place and did not fit the theme of the other plots. I would also need to spend more time and build the dashboard in Tableau or another program designed specifically for dashboards. Creating a dashboard in Kaggle ended up feeling unfinished, messy, and unsatisfying.
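
The recap mentions looking at the average rating for each genre. A minimal sketch of that calculation is below. It uses a few made-up rows rather than the real CSV, so the values are purely illustrative; only the column names `Genre` and `Ratings` come from the dataset.

```python
import pandas as pd

# Hypothetical sample rows (not taken from the real dataset);
# only the column names match the Kaggle CSV.
sample = pd.DataFrame({
    "Genre": ["Action", "Action", "Fantasy", "Fantasy", "Comedy"],
    "Ratings": [8.0, 9.0, 7.0, 8.0, 6.5],
})

# Mean rating per genre, highest first
genre_means = (
    sample.groupby("Genre")["Ratings"]
    .mean()
    .sort_values(ascending=False)
)
print(genre_means)
```

Running the same `groupby` on the notebook's `df` would give one number per genre, which could replace several of the dashboard's panels with a single sorted bar chart.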


In [2]:
import warnings
warnings.filterwarnings("ignore")

In [3]:
df = pd.read_csv("/kaggle/input/100-most-watched-anime-in-the-world/most_watched_anime_dataset_100_entries.csv")

In [4]:
df.isnull().sum()

Anime Name                        10
Most Watched in Country           10
Ratings                           10
Number of Episodes                10
Animation Studio Name             10
Budget (in Million USD)           10
Release Year                      10
Genre                             10
Duration per Episode (minutes)    10
dtype: int64

In [5]:
df.dropna(inplace=True)
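
Every column reports the same number of missing values, which suggests (but does not prove) that the gaps are whole blank rows. The sketch below shows one way to check that before dropping anything. The `sample` frame is hypothetical and stands in for `df`.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's df: one row is entirely empty.
sample = pd.DataFrame({
    "Anime Name": ["A", None, "C"],
    "Ratings": [8.1, None, 7.4],
})

# True for rows where every column is missing
fully_empty = sample.isnull().all(axis=1)
n_empty = int(fully_empty.sum())
print(f"Fully empty rows: {n_empty}")

# Drop only the rows that are entirely empty
cleaned = sample.dropna(how="all")
print(len(cleaned))
```

If the count of fully empty rows matches the per-column null count, `dropna()` and `dropna(how="all")` remove the same rows and no partially filled records are lost.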

In [6]:
!pip install --upgrade altair
!pip install vega_datasets


In [7]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import altair as alt
from plotly.subplots import make_subplots
import plotly.graph_objects as go

In [8]:
fig1 = px.density_contour(df, x="Ratings", y="Budget (in Million USD)", 
                         title="KDE Density Plot: Ratings vs Budget")

fig1.update_layout(
    xaxis_title="Ratings",
    yaxis_title="Budget (Million USD)",
    template="plotly_dark"
)

fig1.show()
#fig1.write_image("plotly_chart.png")

In [9]:
df1 = df.loc[df.groupby("Anime Name")["Budget (in Million USD)"].idxmax()]
df1 = df1[df1["Budget (in Million USD)"] > 50].sort_values(by="Budget (in Million USD)", ascending=False).head(5)

fig3 = px.bar(df1, x="Anime Name", y="Budget (in Million USD)", 
             title="Most High-Budget Anime",
             text="Budget (in Million USD)",  
             color="Anime Name",  
             color_discrete_sequence=px.colors.qualitative.Set1,
             hover_data={"Ratings": True, "Anime Name": False, "Budget (in Million USD)": False}  # Show only Ratings
            )  

fig3.update_traces(texttemplate='%{text:.1f}M', textposition="outside")

fig3.update_layout(
    xaxis_title="Anime Name",
    yaxis_title="Budget (Million USD)",
    font=dict(size=13),
    template="plotly_dark",
    showlegend=False,
    xaxis_tickangle=-15  
)

fig3.show()
#fig3.write_image("plotly_chart.png")

In [10]:
fig4 = px.pie(df, names="Genre", 
             title="Genre Distribution of Anime",
             color_discrete_sequence=px.colors.qualitative.Set2, 
             hole=0.3)  

top_genre = df["Genre"].value_counts().idxmax()  # most common genre

fig4.update_traces(textinfo="percent+label",
                  # One pull value per input row, so each slice picks up the value for its own genre
                  pull=[0.05 if genre == top_genre else 0 for genre in df["Genre"]],
                  marker=dict(line=dict(color='black', width=1)))

fig4.update_layout(
    showlegend=False,
    font=dict(size=13),
    template="plotly_dark"  
)

fig4.show()
#fig4.write_image("plotly_chart.png")

In [11]:
fig5 = px.box(df, x="Genre", y="Ratings", 
             title="Rating Distribution by Genre",
             color="Genre",
             color_discrete_sequence=px.colors.qualitative.Set2)

fig5.update_layout(
    xaxis_title="Genre",
    yaxis_title="Ratings",
    font=dict(size=13),
    template="plotly_dark",
    showlegend=False,
    xaxis_tickangle=-15  
)

fig5.show()
#fig5.write_image("box_plot.png")

In [12]:
fig6 = px.scatter(df, x="Budget (in Million USD)", y="Ratings", 
                 color="Genre",  
                 hover_data=["Anime Name", "Animation Studio Name"],  
                 template="plotly_dark",  
                 opacity=0.8)  

fig6.update_layout(
    title="Budget vs Ratings of Anime",
    xaxis_title="Budget (Million USD)",
    yaxis_title="Ratings",
    font=dict(size=13),
    legend_title="Genre"
)

fig6.show()
#fig6.write_image("plotly_chart.png")
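
The scatterplot is meant to show whether budget influences ratings. A correlation coefficient can put a number on that. This is a minimal sketch on made-up values, not a result from the real data; the column names are the only thing borrowed from the dataset.

```python
import pandas as pd

# Hypothetical values chosen to be perfectly linear, for illustration only.
sample = pd.DataFrame({
    "Budget (in Million USD)": [10.0, 20.0, 30.0, 40.0],
    "Ratings": [7.0, 7.5, 8.0, 8.5],
})

# Pearson correlation between budget and rating (pandas default method)
pearson_r = sample["Budget (in Million USD)"].corr(sample["Ratings"])
print(round(pearson_r, 3))
```

On the real `df`, a value near 0 would suggest little linear relationship between budget and rating. Passing `method="spearman"` to `corr` is an alternative if the relationship might be monotonic but not linear.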

In [13]:
import plotly.express as px

fig7 = px.choropleth(df, 
                    locations="Most Watched in Country",  
                    locationmode="country names",  
                    color="Anime Name",  # Color based on the most popular anime
                    hover_name="Most Watched in Country",  
                    title="Most Popular Anime by Country",  
                    color_discrete_sequence=px.colors.qualitative.Bold)  # Use distinct colors

fig7.update_layout(
    geo=dict(showcoastlines=True, showland=True, landcolor="lightgray"),  
    template="plotly_dark",  
    font=dict(size=12, color="white"),  
    title_font_size=20
)

fig7.show()
#fig7.write_image("most_popular_anime_by_country.png")

In [14]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

fig = make_subplots(
    rows=3, cols=2,
    subplot_titles=[
        "High-Budget Anime", "Budget vs Ratings",
        "KDE Density: Ratings vs Budget", "Rating Boxplot by Genre",
        "Genre Distribution", "Most Popular Anime by Country"
    ],
    specs=[
        [{"type": "bar"}, {"type": "scatter"}],      
        [{"type": "contour"}, {"type": "box"}],          
        [{"type": "pie"}, {"type": "choropleth"}]  
    ],
    row_heights=[1, 1, 1.5],  # Increase height of the last row (Choropleth)
    column_widths=[1, 1.3]  # Make Column 2 (Choropleth) slightly wider
)

# Add all traces correctly
for trace in fig3.data:
    trace.showlegend = False  # Hides the legend for High-Budget Anime
    fig.add_trace(trace, row=1, col=1)  # High-Budget Anime
for trace in fig6.data:
    fig.add_trace(trace, row=1, col=2)  # Budget vs Ratings Scatter
fig.add_trace(fig1.data[0], row=2, col=1)  # KDE Density
for trace in fig5.data:
    trace.showlegend = False
    fig.add_trace(trace, row=2, col=2)  # Boxplot (Rating by Genre)
for trace in fig4.data:
    trace.showlegend = False
    fig.add_trace(trace, row=3, col=1)  # Genre Pie Chart
for trace in fig7.data:
    fig.add_trace(trace, row=3, col=2)  # Choropleth (Increased Size)

# Add axis labels
fig.update_xaxes(title_text="Budget (Million USD)", row=1, col=2)
fig.update_yaxes(title_text="Ratings", row=1, col=2)

# The density plot (fig1) has Ratings on x and Budget on y
fig.update_xaxes(title_text="Ratings", row=2, col=1)
fig.update_yaxes(title_text="Budget (Million USD)", row=2, col=1)

#fig.update_xaxes(title_text="Genre", row=2, col=2)  # X-axis label for Rating Boxplot by Genre

# Update Layout
fig.update_layout(
    title=dict(
        text="Anime Data Dashboard",  # Title text
        x=0.5,  # Centers the title (0 is left, 1 is right)
        y=0.98,  # Adjust vertical position if needed
        xanchor='center',  
        yanchor='top',  
        font=dict(size=24)  # Increases title font size
    ),
    height=1100, width=1300,  # Increased height & width for better layout
    template="plotly_dark"
)

fig.update_layout(
    legend=dict(
        x=1.05,  # Moves legend further right
        y=0.5,  # Moves legend down
    )
)

fig.show()