# Anime Recommendation 

In [1]:
# Core libraries
import warnings
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go  # Low-level figure API (histograms, box plots, pie charts)
import plotly.figure_factory as ff
from plotly.offline import init_notebook_mode, iplot
from wordcloud import WordCloud

%matplotlib inline
init_notebook_mode(connected=True)
warnings.filterwarnings(action='ignore')

# Data Preprocessing
from sklearn.preprocessing import MinMaxScaler, LabelEncoder

# Model Training
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, Dot, Flatten, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam


## Exploratory Data Analysis (EDA)

In [2]:
# Load the datasets
anime_data = pd.read_csv('anime-filtered.csv')
user_data = pd.read_csv('user-filtered.csv')

### Explore the anime dataset

In [3]:
# Set display options to show all columns
pd.set_option('display.max_columns', None)

# Explore the anime dataset
# Display the first few rows of the dataset
print("Anime Dataset:")
anime_data.head()

Anime Dataset:


| | anime_id | Name | Score | Genres | English name | Japanese name | sypnopsis | Type | Episodes | Aired | Premiered | Producers | Licensors | Studios | Source | Duration | Rating | Ranked | Popularity | Members | Favorites | Watching | Completed | On-Hold | Dropped |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Cowboy Bebop | 8.78 | Action, Adventure, Comedy, Drama, Sci-Fi, Space | Cowboy Bebop | カウボーイビバップ | In the year 2071, humanity has colonized sever... | TV | 26 | Apr 3, 1998 to Apr 24, 1999 | Spring 1998 | Bandai Visual | Funimation, Bandai Entertainment | Sunrise | Original | 24 min. per ep. | R - 17+ (violence & profanity) | 28.0 | 39 | 1251960 | 61971 | 105808 | 718161 | 71513 | 26678 |
| 1 | 5 | Cowboy Bebop: Tengoku no Tobira | 8.39 | Action, Drama, Mystery, Sci-Fi, Space | Cowboy Bebop:The Movie | カウボーイビバップ 天国の扉 | other day, another bounty—such is the life of ... | Movie | 1 | Sep 1, 2001 | Unknown | Sunrise, Bandai Visual | Sony Pictures Entertainment | Bones | Original | 1 hr. 55 min. | R - 17+ (violence & profanity) | 159.0 | 518 | 273145 | 1174 | 4143 | 208333 | 1935 | 770 |
| 2 | 6 | Trigun | 8.24 | Action, Sci-Fi, Adventure, Comedy, Drama, Shounen | Trigun | トライガン | Vash the Stampede is the man with a $$60,000,0... | TV | 26 | Apr 1, 1998 to Sep 30, 1998 | Spring 1998 | Victor Entertainment | Funimation, Geneon Entertainment USA | Madhouse | Manga | 24 min. per ep. | PG-13 - Teens 13 or older | 266.0 | 201 | 558913 | 12944 | 29113 | 343492 | 25465 | 13925 |
| 3 | 7 | Witch Hunter Robin | 7.27 | Action, Mystery, Police, Supernatural, Drama, ... | Witch Hunter Robin | Witch Hunter ROBIN (ウイッチハンターロビン) | ches are individuals with special powers like ... | TV | 26 | Jul 2, 2002 to Dec 24, 2002 | Summer 2002 | TV Tokyo, Bandai Visual, Dentsu, Victor Entert... | Funimation, Bandai Entertainment | Sunrise | Original | 25 min. per ep. | PG-13 - Teens 13 or older | 2481.0 | 1467 | 94683 | 587 | 4300 | 46165 | 5121 | 5378 |
| 4 | 8 | Bouken Ou Beet | 6.98 | Adventure, Fantasy, Shounen, Supernatural | Beet the Vandel Buster | 冒険王ビィト | It is the dark century and the people are suff... | TV | 52 | Sep 30, 2004 to Sep 29, 2005 | Fall 2004 | TV Tokyo, Dentsu | Unknown | Toei Animation | Manga | 23 min. per ep. | PG - Children | 3710.0 | 4369 | 13224 | 18 | 642 | 7314 | 766 | 1108 |


In [4]:
# Check the shape of the dataset
print("Shape of Anime Dataset:")
print(anime_data.shape)

Shape of Anime Dataset:
(14952, 25)


In [5]:
# Check the data types of each column
print("Data Types of Anime Dataset:")
print(anime_data.dtypes)

Data Types of Anime Dataset:
anime_id           int64
Name              object
Score            float64
Genres            object
English name      object
Japanese name     object
sypnopsis         object
Type              object
Episodes          object
Aired             object
Premiered         object
Producers         object
Licensors         object
Studios           object
Source            object
Duration          object
Rating            object
Ranked           float64
Popularity         int64
Members            int64
Favorites          int64
Watching           int64
Completed          int64
On-Hold            int64
Dropped            int64
dtype: object


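The data types above show that at least one column that should be numeric (`int64` or `float64`) is stored as `object`, namely `Episodes`. This is probably because some entries hold a non-numeric placeholder, for example when an anime had not finished airing at the time the dataset was collected.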
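One way to confirm this is to coerce the column to numbers and list whatever fails to parse. The sketch below uses a small stand-in Series rather than the real data, and the `'Unknown'` placeholder is an assumption based on the values seen in other columns such as `Premiered`.

```python
import pandas as pd

# Stand-in for anime_data['Episodes']; the 'Unknown' entry is an assumed placeholder
episodes = pd.Series(['26', '1', '52', 'Unknown'], name='Episodes')

# Values that cannot be parsed as numbers become NaN instead of raising an error
episodes_numeric = pd.to_numeric(episodes, errors='coerce')

# List the distinct raw values that failed to parse
non_numeric = episodes[episodes_numeric.isna()].unique()

print(non_numeric)
print(episodes_numeric.dtype)
```

On the real dataset, the same two calls applied to `anime_data['Episodes']` would show which placeholder values keep the column from being numeric.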

In [6]:
# Check for missing values
print("Missing Values in Anime Dataset:")
print(anime_data.isnull().sum())


Missing Values in Anime Dataset:
anime_id            0
Name                0
Score               0
Genres              0
English name        0
Japanese name       0
sypnopsis        1350
Type                0
Episodes            0
Aired               0
Premiered           0
Producers           0
Licensors           0
Studios             0
Source              0
Duration            0
Rating              0
Ranked           1721
Popularity          0
Members             0
Favorites           0
Watching            0
Completed           0
On-Hold             0
Dropped             0
dtype: int64
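`isnull()` only counts true nulls. The preview above also shows the string `Unknown` in columns such as `Premiered` and `Licensors`, so the real amount of missing information is likely higher than these counts suggest. A minimal sketch of counting both, on a hypothetical three-row sample:

```python
import pandas as pd

# Hypothetical sample mixing true nulls and 'Unknown' placeholders
sample = pd.DataFrame({
    'Premiered': ['Spring 1998', 'Unknown', 'Spring 1998'],
    'Licensors': ['Funimation', 'Sony Pictures Entertainment', 'Unknown'],
    'Ranked': [28.0, None, 266.0],
})

# Count real nulls and 'Unknown' placeholders per column
summary = pd.DataFrame({
    'null': sample.isnull().sum(),
    'unknown': sample.eq('Unknown').sum(),
})
summary['total_missing'] = summary['null'] + summary['unknown']

print(summary)
```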


### Explore the user dataset

In [7]:

# Display the first few rows of the dataset
print("User Dataset:")
user_data.head()


User Dataset:


| | user_id | anime_id | rating |
|---|---|---|---|
| 0 | 0 | 67 | 9 |
| 1 | 0 | 6702 | 7 |
| 2 | 0 | 242 | 10 |
| 3 | 0 | 4898 | 0 |
| 4 | 0 | 21 | 10 |


In [8]:
# Check the shape of the dataset
print("Shape of User Dataset:")
print(user_data.shape)


Shape of User Dataset:
(109224747, 3)


In [9]:
# Check the data types of each column
print("Data Types of User Dataset:")
print(user_data.dtypes)

Data Types of User Dataset:
user_id     int64
anime_id    int64
rating      int64
dtype: object


In [10]:
# Check for missing values
print("Missing Values in User Dataset:")
print(user_data.isnull().sum())

Missing Values in User Dataset:
user_id     0
anime_id    0
rating      0
dtype: int64
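With 109,224,747 rows and three `int64` columns, the user table needs roughly 2.6 GB for its values alone. The sketch below shows the arithmetic and one possible way to shrink it by downcasting. It assumes that every `user_id` and `anime_id` fits in 32 bits and that ratings stay within 0 to 10; neither has been verified against the full file.

```python
import numpy as np
import pandas as pd

# First rows of the user table, as shown in the preview above
user_sample = pd.DataFrame({
    'user_id': [0, 0, 0, 0, 0],
    'anime_id': [67, 6702, 242, 4898, 21],
    'rating': [9, 7, 10, 0, 10],
})

# Assumed-safe smaller dtypes: 32-bit ids, 8-bit ratings
compact_dtypes = {'user_id': np.int32, 'anime_id': np.int32, 'rating': np.int8}
compact = user_sample.astype(compact_dtypes)

# Back-of-the-envelope memory estimate for the full table
n_rows = 109_224_747
bytes_int64 = n_rows * 3 * 8          # three 8-byte columns
bytes_compact = n_rows * (4 + 4 + 1)  # two 4-byte columns and one 1-byte column

print(f"{bytes_int64 / 1e9:.2f} GB -> {bytes_compact / 1e9:.2f} GB")
```

The same mapping can be passed to `pd.read_csv` through its `dtype` argument so the large frame is never materialized as `int64`.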


### Data Visualization

#### Anime Scores Distribution

In [37]:
# Extract the average scores
average_scores = anime_data['Score']

# Create a histogram
fig = go.Figure()
fig.add_trace(go.Histogram(x=average_scores, nbinsx=20))

# Update the layout
fig.update_layout(
    title='Distribution of Average Anime Scores',
    xaxis=dict(title='Score'),
    yaxis=dict(title='Frequency'),
)

# Show the plot
fig.show()

# Create a box plot
fig = go.Figure()
fig.add_trace(go.Box(x=average_scores))

# Update the layout
fig.update_layout(
    title='Box Plot of Average Anime Scores',
    xaxis=dict(title='Score'),  # The scores are plotted on the x-axis (horizontal box)
)

# Show the plot
fig.show()

In [51]:
# Extract the age ratings
# Note: the 'Rating' column holds content ratings such as 'PG-13 - Teens 13 or older', not user scores
age_ratings = anime_data['Rating']
rating_counts = age_ratings.value_counts()

# Create a bar plot with a colorful palette
fig = px.bar(x=rating_counts.index, y=rating_counts.values, color=rating_counts.index, color_discrete_sequence=px.colors.qualitative.Pastel)

# Update the layout for bar plot
fig.update_layout(
    title='Anime Age Ratings',
    xaxis=dict(title='Age Rating'),
    yaxis=dict(title='Count')
)

# Show the bar plot
fig.show()
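The `Rating` column of the anime table holds age ratings, whereas the scores that users actually gave are in `user_data['rating']`. A minimal sketch of tabulating that distribution, using only the five preview rows shown earlier as stand-in data:

```python
import pandas as pd

# First rows of the user table, as shown in the preview above
user_sample = pd.DataFrame({
    'user_id': [0, 0, 0, 0, 0],
    'anime_id': [67, 6702, 242, 4898, 21],
    'rating': [9, 7, 10, 0, 10],
})

# Count how often each score occurs, ordered by score rather than by frequency
rating_counts = user_sample['rating'].value_counts().sort_index()

# Express the counts as percentages of all ratings
rating_share = (rating_counts / rating_counts.sum() * 100).round(1)

print(rating_share)
```

On the full table, `rating_counts.index` and `rating_counts.values` can be passed to `px.bar` as `x` and `y`, in the same way as the other bar charts in this notebook.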

In [55]:
# Extract the genres
genres = anime_data['Genres'].str.split(',', expand=True).stack().str.strip().value_counts()

# Create a bar chart for genre frequency
fig_bar = px.bar(x=genres.index, y=genres.values, color=genres.index, color_discrete_sequence=px.colors.qualitative.Pastel)
fig_bar.update_layout(title='Frequency of Anime across Different Genres', xaxis=dict(title='Genre'), yaxis=dict(title='Frequency'))


# Select the top 10 genres
top_10_genres = genres[:10]

# Calculate each top genre's share of all genre tags
# (the same denominator is used for 'Other' below, so the slices are comparable)
genre_percentages = top_10_genres / genres.sum() * 100

# Combine the rest of the genres into a single category
other_genres_count = genres[10:].sum()
genre_percentages['Other'] = other_genres_count / genres.sum() * 100

# Create a pie chart for top 10 genres and other
fig_pie = go.Figure(data=[go.Pie(labels=genre_percentages.index, values=genre_percentages.values, 
                                textinfo='label+percent', hole=0.3, 
                                marker=dict(colors=px.colors.qualitative.Pastel))])
fig_pie.update_layout(title='Percentage of Anime across Top 10 Genres')

# Show the bar chart and pie chart
fig_bar.show()
fig_pie.show()

In [57]:
# Extract the anime types
anime_types = anime_data['Type'].value_counts()

# Create a count plot for anime types
fig_count = px.bar(x=anime_types.index, y=anime_types.values, color=anime_types.index, color_discrete_sequence=px.colors.qualitative.Pastel)
fig_count.update_layout(title='Distribution of Anime Types', xaxis=dict(title='Anime Type'), yaxis=dict(title='Count'))

# Create a pie chart for anime types
fig_pie = go.Figure(data=[go.Pie(labels=anime_types.index, values=anime_types.values,
                                textinfo='label+percent', hole=0.3,
                                marker=dict(colors=px.colors.qualitative.Pastel))])
fig_pie.update_layout(title='Distribution of Anime Types')

# Show the count plot and pie chart
fig_count.show()
fig_pie.show()


In [66]:
# Extract the release season
anime_data['Season'] = anime_data['Premiered'].str.split(' ').str[0]

# Count the number of anime in each season
anime_counts = anime_data['Season'].value_counts().sort_index()

# Drop the 'Unknown' season for the bar chart
anime_counts_bar = anime_counts.drop('Unknown')

# Create a bar chart for anime releases by season (excluding 'Unknown')
fig_bar = go.Figure(data=[go.Bar(x=anime_counts_bar.index, y=anime_counts_bar.values,
                                marker=dict(color=px.colors.qualitative.Pastel))])
fig_bar.update_layout(title='Number of Anime Released by Season (excluding "Unknown")',
                      xaxis=dict(title='Season'), yaxis=dict(title='Number of Anime'))

# Create a pie chart for anime releases by season (including 'Unknown')
fig_pie = go.Figure(data=[go.Pie(labels=anime_counts.index, values=anime_counts.values,
                                textinfo='label+percent', hole=0.3,
                                marker=dict(colors=px.colors.qualitative.Pastel))])
fig_pie.update_layout(title='Number of Anime Released by Season (including "Unknown")')

# Show the bar chart and pie chart
fig_bar.show()
fig_pie.show()
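`sort_index()` orders the seasons alphabetically (Fall, Spring, Summer, Winter), which hides any calendar pattern. A minimal sketch of putting them in calendar order with `reindex`, using a few hypothetical `Premiered` values:

```python
import pandas as pd

# Hypothetical 'Premiered' values in the same format as the dataset
premiered = pd.Series([
    'Spring 1998', 'Unknown', 'Spring 1998',
    'Summer 2002', 'Fall 2004', 'Winter 2010',
])

# Keep only the season name, as in the cell above
season = premiered.str.split(' ').str[0]

# Reindex into calendar order; this also drops 'Unknown' and fills absent seasons with 0
season_order = ['Winter', 'Spring', 'Summer', 'Fall']
season_counts = season.value_counts().reindex(season_order, fill_value=0)

print(season_counts)
```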

In [68]:
import re
import numpy as np
import pandas as pd

# Convert a duration string such as '24 min. per ep.', '1 hr. 55 min.'
# or '45 sec.' to minutes; anything unparseable (e.g. 'Unknown') becomes NaN
def convert_duration(duration):
    if not isinstance(duration, str):
        return np.nan
    hours = re.search(r'(\d+)\s*hr', duration)
    minutes = re.search(r'(\d+)\s*min', duration)
    seconds = re.search(r'(\d+)\s*sec', duration)
    if not (hours or minutes or seconds):
        return np.nan
    total = 0.0
    if hours:
        total += int(hours.group(1)) * 60
    if minutes:
        total += int(minutes.group(1))
    if seconds:
        total += int(seconds.group(1)) / 60
    return total

# Duration of a single episode (or of the whole title for movies), in minutes
anime_data['Duration (min)'] = anime_data['Duration'].apply(convert_duration)

# The episode count comes from the 'Episodes' column, not from the duration string;
# non-numeric entries become NaN
episodes = pd.to_numeric(anime_data['Episodes'], errors='coerce')

# Total duration = per-episode duration x number of episodes
anime_data['Total Duration'] = anime_data['Duration (min)'] * episodes

anime_data[['Name', 'Episodes', 'Duration', 'Duration (min)', 'Total Duration']].head()

### Data Visualization Ideas

The following visualizations are candidates for this project:

1. **Anime Scores Distribution**: Plotting a histogram or a box plot to visualize the distribution of anime scores can provide insights into the overall quality and popularity of anime in your dataset.

2. **User Ratings Distribution**: Creating a histogram or a bar plot to display the distribution of user ratings can help you understand how users rate anime and identify any patterns or preferences.

3. **Genres Analysis**: Visualizing the frequency or percentage of anime across different genres using a bar chart or a pie chart can give you an overview of the most popular genres and help users explore anime based on their genre preferences.

4. **Anime Types Breakdown**: Creating a count plot or a pie chart to show the distribution of anime types (e.g., TV series, movies, specials) can provide insights into the types of content available in your dataset.

5. **Seasonal Releases**: Plotting a line chart or a bar chart to visualize the number of anime released in each season (e.g., winter, spring, summer, fall) can help identify any seasonal trends in anime production and releases.

6. **Anime Duration Analysis**: Creating a box plot or a histogram to analyze the distribution of anime durations (e.g., episode length) can provide an understanding of the typical duration of anime episodes and identify any outliers.

7. **Anime Studio Analysis**: Visualizing the top anime studios based on the number of anime produced using a bar chart or a treemap can help users discover anime from popular studios and explore their catalog.

8. **Anime Popularity Analysis**: Plotting a scatter plot or a bubble chart with anime popularity on one axis (e.g., based on rankings or user favorites) and another variable (e.g., score or number of members) on the other axis can reveal insights into the relationship between popularity and other attributes.

These are suggestions only. Which visualizations to include depends on the insights the project should showcase and the questions it should answer.
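As an example, the studio analysis (item 7) could start from a simple count per studio. The sketch below is one possible approach and not part of the notebook so far: `top_studios` is a hypothetical helper, the sample rows come from the dataset preview, and it assumes that multiple studios are comma-separated in the same way as `Genres`.

```python
import pandas as pd

# A few rows from the dataset preview (Name and Studios only)
anime_sample = pd.DataFrame({
    'Name': ['Cowboy Bebop', 'Cowboy Bebop: Tengoku no Tobira', 'Trigun',
             'Witch Hunter Robin', 'Bouken Ou Beet'],
    'Studios': ['Sunrise', 'Bones', 'Madhouse', 'Sunrise', 'Toei Animation'],
})

def top_studios(df, n=10):
    """Return the n studios with the most titles (hypothetical helper)."""
    # One row per (title, studio) pair; assumes comma-separated studio names
    studios = df['Studios'].str.split(',').explode().str.strip()
    # Ignore the 'Unknown' placeholder so it does not rank as a studio
    studios = studios[studios != 'Unknown']
    return studios.value_counts().head(n)

studio_counts = top_studios(anime_sample, n=3)
print(studio_counts)
```

The resulting Series can be plotted with `px.bar` like the genre and type counts above.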


## Data Preprocessing


## Model Building


## Model Training


## Model Evaluation


## Hyperparameter Tuning


## Generating Recommendations


## Model Deployment