# SC1015 FEL1 Group 2 - *Project Title*
## Done by Jordan Choi, Kye Yong and Yu Kai

### Variables Used & Description
<hr>

### Dataset Description
#### Primary Dataset *"anime-dataset-2023"*
Contains anime information

<table style="width: 100%; text-align: left">
    <tr>
        <th>Field Name</th>
        <th>Field Description</th>
    </tr>
    <tr>
        <td>anime_id</td>
        <td>Unique ID for each anime.</td>
    </tr>
    <tr>
        <td>Name</td>
        <td>The name of the anime in its original language.</td>
    </tr>
    <tr>
        <td>English Name</td>
        <td>The English name of the anime.</td>
    </tr>
    <tr>
        <td>Other Name</td>
        <td>Other names or titles of the anime in different languages.</td>
    </tr>
    <tr>
        <td>Score</td>
        <td>The score or rating given to the anime.</td>
    </tr>
    <tr>
        <td>Genres</td>
        <td>The genres of the anime, separated by commas.</td>
    </tr>
    <tr>
        <td>Synopsis</td>
        <td>A brief description or summary of the anime's plot.</td>
    </tr>
    <tr>
        <td>Type</td>
        <td>The type of the anime (e.g., TV series, movie, OVA, etc.).</td>
    </tr>
    <tr>
        <td>Episodes</td>
        <td>The number of episodes in the anime.</td>
    </tr>
    <tr>
        <td>Aired</td>
        <td>The dates when the anime was aired.</td>
    </tr>
    <tr>
        <td>Premiered</td>
        <td>The season and year when the anime premiered.</td>
    </tr>
    <tr>
        <td>Status</td>
        <td>The status of the anime (e.g., Finished Airing, Currently Airing, etc.)</td>
    </tr>
    <tr>
        <td>Producers</td>
        <td>The production companies or producers of the anime.</td>
    </tr>
    <tr>
        <td>Licensors</td>
        <td>The licensors of the anime (e.g., streaming platforms).</td>
    </tr>
    <tr>
        <td>Studios</td>
        <td>The animation studios that worked on the anime.</td>
    </tr>
    <tr>
        <td>Source</td>
        <td>The source material of the anime (e.g., manga, light novel, original).</td>
    </tr>
    <tr>
        <td>Duration</td>
        <td>The duration of each episode.</td>
    </tr>
    <tr>
        <td>Rating</td>
        <td>The age rating of the anime.</td>
    </tr>
    <tr>
        <td>Rank</td>
        <td>The rank of the anime based on popularity or other criteria.</td>
    </tr>
    <tr>
        <td>Popularity</td>
        <td>The popularity rank of the anime.</td>
    </tr>
    <tr>
        <td>Favorites</td>
        <td>The number of times the anime was marked as a favorite by users.</td>
    </tr>
    <tr>
        <td>Scored By</td>
        <td>The number of users who scored the anime.</td>
    </tr>
    <tr>
        <td>Members</td>
        <td>The number of members who have added the anime to their list on the platform.</td>
    </tr>
    <tr>
        <td>Image URL</td>
        <td>The URL of the anime's image or poster.</td>
    </tr>
</table>
<hr>

In [2]:
# Import the Basic Libraries
import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Import Decision Tree Classifier model from Scikit-Learn
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import plot_tree
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.preprocessing import LabelEncoder


# Import & Clean "Anime Dataset 2023" Dataset
#anime_data = pd.read_csv("dataset/anime-dataset-2023.csv")
anime_data = pd.read_csv("anime-dataset-2023.csv")
pd.set_option('display.max_rows', 500)
anime_data.head(len(anime_data))

Unnamed: 0,anime_id,Name,English name,Other name,Score,Genres,Synopsis,Type,Episodes,Aired,...,Studios,Source,Duration,Rating,Rank,Popularity,Favorites,Scored By,Members,Image URL
0,1,Cowboy Bebop,Cowboy Bebop,カウボーイビバップ,8.75,"Action, Award Winning, Sci-Fi","Crime is timeless. By the year 2071, humanity ...",TV,26.0,"Apr 3, 1998 to Apr 24, 1999",...,Sunrise,Original,24 min per ep,R - 17+ (violence & profanity),41.0,43,78525,914193.0,1771505,https://cdn.myanimelist.net/images/anime/4/196...
1,5,Cowboy Bebop: Tengoku no Tobira,Cowboy Bebop: The Movie,カウボーイビバップ 天国の扉,8.38,"Action, Sci-Fi","Another day, another bounty—such is the life o...",Movie,1.0,"Sep 1, 2001",...,Bones,Original,1 hr 55 min,R - 17+ (violence & profanity),189.0,602,1448,206248.0,360978,https://cdn.myanimelist.net/images/anime/1439/...
2,6,Trigun,Trigun,トライガン,8.22,"Action, Adventure, Sci-Fi","Vash the Stampede is the man with a $$60,000,0...",TV,26.0,"Apr 1, 1998 to Sep 30, 1998",...,Madhouse,Manga,24 min per ep,PG-13 - Teens 13 or older,328.0,246,15035,356739.0,727252,https://cdn.myanimelist.net/images/anime/7/203...
3,7,Witch Hunter Robin,Witch Hunter Robin,Witch Hunter ROBIN (ウイッチハンターロビン),7.25,"Action, Drama, Mystery, Supernatural",Robin Sena is a powerful craft user drafted in...,TV,26.0,"Jul 3, 2002 to Dec 25, 2002",...,Sunrise,Original,25 min per ep,PG-13 - Teens 13 or older,2764.0,1795,613,42829.0,111931,https://cdn.myanimelist.net/images/anime/10/19...
4,8,Bouken Ou Beet,Beet the Vandel Buster,冒険王ビィト,6.94,"Adventure, Fantasy, Supernatural",It is the dark century and the people are suff...,TV,52.0,"Sep 30, 2004 to Sep 29, 2005",...,Toei Animation,Manga,23 min per ep,PG - Children,4240.0,5126,14,6413.0,15001,https://cdn.myanimelist.net/images/anime/7/215...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
24900,55731,Wu Nao Monu,UNKNOWN,无脑魔女,UNKNOWN,"Comedy, Fantasy, Slice of Life",No description available for this anime.,ONA,15.0,"Jul 4, 2023 to ?",...,UNKNOWN,Web manga,Unknown,PG-13 - Teens 13 or older,UNKNOWN,24723,0,UNKNOWN,0,https://cdn.myanimelist.net/images/anime/1386/...
24901,55732,Bu Xing Si: Yuan Qi,Blader Soul,捕星司·源起,UNKNOWN,"Action, Adventure, Fantasy",No description available for this anime.,ONA,18.0,"Jul 27, 2023 to ?",...,UNKNOWN,Web novel,Unknown,PG-13 - Teens 13 or older,0.0,0,0,UNKNOWN,0,https://cdn.myanimelist.net/images/anime/1383/...
24902,55733,Di Yi Xulie,The First Order,第一序列,UNKNOWN,"Action, Adventure, Fantasy, Sci-Fi",No description available for this anime.,ONA,16.0,"Jul 19, 2023 to ?",...,UNKNOWN,Web novel,Unknown,PG-13 - Teens 13 or older,0.0,0,0,UNKNOWN,0,https://cdn.myanimelist.net/images/anime/1130/...
24903,55734,Bokura no Saishuu Sensou,UNKNOWN,僕らの最終戦争,UNKNOWN,UNKNOWN,A music video for the song Bokura no Saishuu S...,Music,1.0,"Apr 23, 2022",...,UNKNOWN,Original,3 min,PG-13 - Teens 13 or older,0.0,0,0,UNKNOWN,0,https://cdn.myanimelist.net/images/anime/1931/...


In [3]:
# Check the basic information of the dataset.
print(anime_data.dtypes)

anime_id         int64
Name            object
English name    object
Other name      object
Score           object
Genres          object
Synopsis        object
Type            object
Episodes        object
Aired           object
Premiered       object
Status          object
Producers       object
Licensors       object
Studios         object
Source          object
Duration        object
Rating          object
Rank            object
Popularity       int64
Favorites        int64
Scored By       object
Members          int64
Image URL       object
dtype: object


## Data Cleanup

#### anime_id - done
No issues found. Retained in case we analyse other datasets alongside this one.

#### Name - done
No issues found, and crucial to the analysis. It will serve as the primary name, since there are 3 different name fields and this one contains a unique value for every row.

#### English name - done
To be retained, so that we can find out whether the existence of an English name affects the score.
There seem to be 3 cases:
- "English Name" is the same as "Name",
- "Name" contains words in other languages and "English Name" differs from "Name",
- "English Name" is "UNKNOWN".
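
The three cases can be told apart with a small labelling step. This is only a sketch on made-up rows; the column `english_name_case` and its labels are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real dataset (hypothetical sample).
sample = pd.DataFrame({
    "Name": ["Trigun", "Bouken Ou Beet", "Wu Nao Monu"],
    "English name": ["Trigun", "Beet the Vandel Buster", "UNKNOWN"],
})

# Conditions are checked in order; rows matching neither fall into "different".
conditions = [
    sample["English name"] == "UNKNOWN",
    sample["English name"] == sample["Name"],
]
labels = ["unknown", "same_as_name"]
sample["english_name_case"] = np.select(conditions, labels, default="different")

print(sample)
```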

#### Other name - done
To be removed from the dataframe. Most of the values are the title in its original script (mostly Japanese), which none of us can read and which is probably irrelevant to the analysis.

#### Score - Partially Done
Crucial field. Its type in the raw dataset is "object", so it will be converted to decimal (float) values. It contains 9213 rows with the value "UNKNOWN".
We will use the following approach to replace the unknown values.

1) Find out the correlation of score with other likely fields, such as Rank and Popularity.
2) Separate into different bins, and use the mean/median score of the corresponding bin to replace the "UNKNOWN" value.
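
A minimal sketch of this two-step plan, assuming Score has already been converted to numeric with NaN in place of "UNKNOWN". The sample rows, the bin edges and the helper columns `pop_bin` and `Score_filled` are made up for illustration; Popularity is used as the binning field only as an example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: Score is numeric, NaN where the raw value was "UNKNOWN".
sample = pd.DataFrame({
    "Popularity": [10, 20, 30, 1000, 1100, 1200],
    "Score": [8.5, np.nan, 8.1, 6.0, 6.4, np.nan],
})

# 1) Check how strongly Score tracks the candidate field (NaN pairs are ignored).
corr = sample["Score"].corr(sample["Popularity"])

# 2) Bin by Popularity and fill each missing Score with the median of its bin.
sample["pop_bin"] = pd.cut(sample["Popularity"], bins=[0, 500, 2000])
bin_median = sample.groupby("pop_bin", observed=True)["Score"].transform("median")
sample["Score_filled"] = sample["Score"].fillna(bin_median)

print(corr)
print(sample)
```

A bin that contains no known score at all would still be left as NaN, so the bin edges need to be wide enough for the real data.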

#### Genres - Done
Some titles may contain more than 1 genre, hence, multi-label binarization will be used to clean up this column. A unique column label will be created for each genre. Binary values (0 or 1) will be used to indicate if the title belongs to the respective genre.

4929 titles have "UNKNOWN" genres; these will remain as 0 for all genre columns.

#### Synopsis - Done
We will leave the synopsis as it is. We may or may not use it, depending on whether we can implement a language model to analyse the text. 4535 titles have no description ("No description available for this anime.").

#### Type
Categorical data. Each category will be converted to a numerical value.
- Movie | NA
- Music | NA
- ONA   | 10
- OVA   | 20
- Special | NA
- TV    | 30
- Unknown | NA
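
The mapping above could be applied roughly as follows. This is a sketch on made-up rows; the column name `Type Code` is a hypothetical addition, and types marked NA are simply filtered out.

```python
import pandas as pd

# Hypothetical sample covering kept and dropped types.
sample = pd.DataFrame({"Type": ["TV", "Movie", "ONA", "OVA", "Music", "UNKNOWN"]})

# Codes taken from the list above; types without a code are dropped.
type_codes = {"ONA": 10, "OVA": 20, "TV": 30}

kept = sample[sample["Type"].isin(list(type_codes))].copy()
kept["Type Code"] = kept["Type"].map(type_codes)

print(kept)
```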

There are 74 titles with an unknown type in the raw dataset. These will be dropped from the dataframe. Done

Almost all of the 2686 titles in the "Music" category are animated music videos with a "Duration" of less than 10 minutes. We are not interested in these, so they will be dropped from the dataframe. Done

"Movie" and "Special" titles will also be dropped from the dataframe, since we are only interested in analysing anime series, whether aired on TV, released on online streaming platforms (ONA), or released on home video (OVA). Done


#### Episodes
611 titles have an "UNKNOWN" number of episodes, including popular titles such as "Detective Conan".
We will use the following approach to replace the unknown values.

Priority 1 - Since most anime are released weekly, we will calculate the number of weeks each anime aired for, where the air dates are available. From the titles with a known episode count, we will then get the average number of episodes released per week. For titles with an unknown episode count, we will estimate it as number of running weeks * average number of episodes per week.

Priority 2 - If the type is available, we will group the titles by type and replace the remaining unknown values with the average number of episodes for that type.

The exact number of episodes is likely irrelevant, so a range can be used instead (e.g. < 50, < 100, < 200, < 300, < 400, < 500).
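
One possible way to combine the two priorities and the ranges, sketched on made-up rows. It assumes the air dates have already been split into hypothetical `start` and `end` datetime columns and that Episodes is numeric with NaN for unknown values; `Episodes_filled` and `Episode Range` are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; Episodes is NaN where the raw value was "UNKNOWN".
sample = pd.DataFrame({
    "Type": ["TV", "TV", "TV", "OVA", "OVA"],
    "start": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-01", "2021-01-01", None]),
    "end": pd.to_datetime(["2020-03-25", "2020-06-17", "2020-12-30", "2021-01-01", None]),
    "Episodes": [12.0, 24.0, np.nan, 2.0, np.nan],
})

# Priority 1: episodes = weeks on air * average episodes per week.
weeks = (sample["end"] - sample["start"]).dt.days / 7
known = sample["Episodes"].notna() & (weeks > 0)
eps_per_week = (sample.loc[known, "Episodes"] / weeks[known]).mean()
estimate = (weeks * eps_per_week).round()
sample["Episodes_filled"] = sample["Episodes"].fillna(estimate)

# Priority 2: fall back to the mean episode count of the same Type.
type_mean = sample.groupby("Type")["Episodes"].transform("mean").round()
sample["Episodes_filled"] = sample["Episodes_filled"].fillna(type_mean)

# Optional: coarse ranges instead of exact counts.
sample["Episode Range"] = pd.cut(
    sample["Episodes_filled"], bins=[0, 50, 100, 200, 300, 400, 500]
)

print(sample)
```

Titles that are still airing have no end date, so in practice a cut-off date would have to be chosen for them before Priority 1 can apply.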

#### Aired
Will be separated into a start date and an end date. Is the day field necessary?
The air dates of 915 titles are "Not Available". We will replace these with "NaT".
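
A sketch of the split, assuming the values look like the ones in the preview above ("Apr 3, 1998 to Apr 24, 1999", "Sep 1, 2001", "Jul 4, 2023 to ?"). The column names `Aired Start` and `Aired End` are hypothetical. Any value that does not match the assumed "Mon D, YYYY" format, including partial dates, becomes NaT under this approach.

```python
import pandas as pd

# Hypothetical sample of the shapes seen in the "Aired" column.
sample = pd.DataFrame({"Aired": [
    "Apr 3, 1998 to Apr 24, 1999",
    "Sep 1, 2001",
    "Jul 4, 2023 to ?",
    "Not Available",
]})

# Split on " to "; single-date entries have no end part.
parts = sample["Aired"].str.split(" to ", n=1, expand=True)

# errors="coerce" turns "?", "Not Available" and missing parts into NaT.
sample["Aired Start"] = pd.to_datetime(parts[0], format="%b %d, %Y", errors="coerce")
sample["Aired End"] = pd.to_datetime(parts[1], format="%b %d, %Y", errors="coerce")

print(sample)
```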

#### Premiered
The season in which the title was first released.
19399 titles are "UNKNOWN".
This column will be dropped, since the majority of its values are UNKNOWN and the start date is available.

#### Licensors
20170 titles are "UNKNOWN". This column will be dropped, since the majority of its values are UNKNOWN and it is unlikely to provide any valuable insight into our problem.

#### Studios
10526 titles are "UNKNOWN". Since the studio that produced a title could be an important factor in its success, this column will be retained. A title can be a collaboration between more than one studio, so we will follow a similar approach to Genres and use multi-label binarization. A unique column will be created for each studio, with binary values (0 or 1) indicating whether the title was produced by that studio.

UNKNOWN titles will have 0s in all studio columns.
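
As an illustration of the resulting layout, the sketch below uses pandas' `Series.str.get_dummies` rather than scikit-learn's `MultiLabelBinarizer`; it produces the same kind of 0/1 columns. The sample rows are made up, and it assumes studios are separated by ", " as the genres are.

```python
import pandas as pd

# Hypothetical sample: one studio, a collaboration, and an unknown studio.
sample = pd.DataFrame({"Studios": ["Sunrise", "Madhouse, Bones", "UNKNOWN"]})

# One 0/1 column per distinct studio name.
studios_df = sample["Studios"].str.get_dummies(sep=", ")

# Dropping the UNKNOWN column leaves unknown titles with 0 in every studio column.
studios_df = studios_df.drop(columns="UNKNOWN")

print(studios_df)
```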

#### Source
Will be converted to Numerical Values to represent the categories.

- 4-koma manga
- Book
- Card Game
- Game
- Light Novel
- Manga
- Mixed Media
- Music
- Novel
- Original
- Other
- Picture Book
- Radio
- Unknown
- Visual Novel
- Web Manga
- Web Novel

Since the source is unlikely to be determinable from other columns, the 3689 titles with an unknown source will remain as unknown.
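
One simple way to obtain numerical codes is a pandas categorical, sketched below on made-up rows. The column `Source Code` and the `source_lookup` dictionary are hypothetical names; the codes are assigned in sorted order of the labels and carry no ranking.

```python
import pandas as pd

# Hypothetical sample of Source values.
sample = pd.DataFrame({"Source": ["Manga", "Original", "Web manga", "Unknown", "Manga"]})

# Categorical codes follow the sorted order of the category labels.
source_cat = sample["Source"].astype("category")
sample["Source Code"] = source_cat.cat.codes

# Lookup table for turning a code back into its label.
source_lookup = dict(enumerate(source_cat.cat.categories))

print(sample)
print(source_lookup)
```

"Unknown" keeps its own code here, which matches the decision to leave unknown sources as unknown.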

#### Duration
In the raw data, duration is difficult to process because it is stored as text (object). All values will be converted to a numerical value in minutes.

Any title with a running time of less than 10 minutes will be dropped, as it is unlikely to be an anime series of the kind we are trying to study.
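
A sketch of the conversion, based on the duration strings visible in the preview ("24 min per ep", "1 hr 55 min", "3 min", "Unknown"). The function `duration_to_minutes` and the column `Duration (min)` are hypothetical names, and the handling of a "sec" unit is an assumption about values not shown in the preview.

```python
import re

import numpy as np
import pandas as pd


def duration_to_minutes(text):
    """Convert strings such as '1 hr 55 min' or '24 min per ep' to minutes."""
    hours = re.search(r"(\d+)\s*hr", text)
    minutes = re.search(r"(\d+)\s*min", text)
    seconds = re.search(r"(\d+)\s*sec", text)  # assumed unit, not shown in the preview
    if not (hours or minutes or seconds):
        return np.nan  # e.g. "Unknown"
    total = 0.0
    if hours:
        total += int(hours.group(1)) * 60
    if minutes:
        total += int(minutes.group(1))
    if seconds:
        total += int(seconds.group(1)) / 60
    return total


sample = pd.DataFrame({"Duration": ["24 min per ep", "1 hr 55 min", "3 min", "Unknown"]})
sample["Duration (min)"] = sample["Duration"].apply(duration_to_minutes)

# Keep titles of at least 10 minutes; NaN durations are excluded by this comparison too.
long_enough = sample[sample["Duration (min)"] >= 10]

print(long_enough)
```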


#### Rating
All values will be converted to numerical values, one per category:

- G - All Ages
- PG - Children
- PG-13 - Teens 13 or older
- R - 17+ (violence & profanity)
- R+ - Mild Nudity
- Rx - Hentai
- UNKNOWN

669 titles have an "UNKNOWN" rating. They will remain as unknown, since the rating is unlikely to be estimable from other columns.
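
A possible encoding, sketched on made-up rows. The numeric codes in `rating_codes` are hypothetical (only the rating labels come from the dataset), and the column name `Rating Code` is illustrative. Unknown ratings stay missing.

```python
import pandas as pd

# Hypothetical ordinal codes, ordered from least to most restrictive.
rating_codes = {"G": 0, "PG": 1, "PG-13": 2, "R": 3, "R+": 4, "Rx": 5}

sample = pd.DataFrame({"Rating": [
    "R - 17+ (violence & profanity)",
    "PG-13 - Teens 13 or older",
    "G - All Ages",
    "UNKNOWN",
]})

# The short code is the text before the first " - ".
short = sample["Rating"].str.split(" - ", n=1).str[0]

# Labels without a code (such as "UNKNOWN") map to a missing value.
sample["Rating Code"] = short.map(rating_codes).astype("Int64")

print(sample)
```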

#### Rank
4612 titles have unknown ranks.
187 titles have Rank "0". This is invalid, since each rank should be unique to a single title, so these will be treated as unknown data.
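
Both kinds of unknown rank can be collapsed into NaN in one pass. A minimal sketch on made-up values:

```python
import pandas as pd

# Hypothetical sample of raw Rank values (stored as text in the raw data).
sample = pd.DataFrame({"Rank": ["41.0", "UNKNOWN", "0.0", "328.0"]})

# "UNKNOWN" cannot be parsed and becomes NaN; a rank of 0 is then masked as invalid.
rank = pd.to_numeric(sample["Rank"], errors="coerce")
sample["Rank"] = rank.mask(rank == 0)

print(sample)
```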

#### Popularity
187 titles with Popularity "0"
1 title with no popularity value.

#### Favorites
167 titles with Favorites "0"

#### Scored By
9213 values are "UNKNOWN". They will be replaced with the median of the "Scored By" column.
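
A minimal sketch of the median replacement on made-up values:

```python
import pandas as pd

# Hypothetical sample of raw "Scored By" values (stored as text in the raw data).
sample = pd.DataFrame({"Scored By": ["914193.0", "UNKNOWN", "6413.0", "42829.0"]})

# Convert to numeric ("UNKNOWN" becomes NaN), then fill NaN with the column median.
scored_by = pd.to_numeric(sample["Scored By"], errors="coerce")
sample["Scored By"] = scored_by.fillna(scored_by.median())

print(sample)
```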

#### Members
No cleaning required. Data is processable.

#### Image URL
No cleaning required. We could consider removing it, but perhaps machine learning could be used to determine whether the poster image affects the success of an anime.


### Data Cleansing

##### Cleaning - Unnecessary Columns

In [8]:
# Copy the raw data into a new dataframe for cleaning.
anime_cleaned_df = anime_data.copy()

# Drop unnecessary columns that are irrelevant to our analysis or unmeaningful due to Unknown values
# - "Other Name", "Premiered", "Licensors" 
anime_cleaned_df = anime_cleaned_df.drop(['Other name', 'Premiered', 'Licensors'], axis=1)

# Check if the columns are dropped successfully.
anime_cleaned_df.dtypes

anime_id         int64
Name            object
English name    object
Score           object
Genres          object
Synopsis        object
Type            object
Episodes        object
Aired           object
Status          object
Producers       object
Studios         object
Source          object
Duration        object
Rating          object
Rank            object
Popularity       int64
Favorites        int64
Scored By       object
Members          int64
Image URL       object
dtype: object

##### Cleaning - "Type" Column

In [10]:
# Keep only "TV", "ONA" and "OVA" titles, which are the focus of this project.
# Rows with a Type of "Movie", "Music", "Special" or "UNKNOWN" are removed using "isin()".

# Filter the DataFrame to exclude rows where the "Type" column is "Movie", "Music", "Special" or "UNKNOWN"
anime_cleaned_df = anime_cleaned_df[~anime_cleaned_df['Type'].isin(['Movie', 'Music', 'Special', 'UNKNOWN'])]

# Check if rows are dropped successfully.
filtered_df = anime_cleaned_df[anime_cleaned_df['Type'].isin(['Movie', 'Music', 'Special', 'UNKNOWN'])]
filtered_df

anime_cleaned_df.describe()

Unnamed: 0,anime_id,Popularity,Favorites,Members
count,15206.0,15206.0,15206.0,15206.0
mean,28358.823491,11516.737472,649.390833,51496.0
std,18396.556175,7524.147345,5436.525592,191249.1
min,1.0,0.0,0.0,0.0
25%,8194.25,4963.5,0.0,239.0
50%,33774.0,10624.0,3.0,1943.5
75%,45464.75,17985.25,40.0,15192.5
max,55733.0,24723.0,217606.0,3744541.0


##### Cleaning - "English name" Column

In [12]:
# Where 'English name' merely repeats the "Name" column, replace it with NaN so that only distinct English titles remain.
anime_cleaned_df['English name'] = anime_cleaned_df['English name'].where(anime_cleaned_df['English name'] != anime_cleaned_df['Name'])

In [13]:
# Replace "UNKNOWN" values with NaN.
# Assign the result back instead of calling replace(..., inplace=True) on the column selection;
# that chained in-place form raises a warning and will stop working in pandas 3.0.
anime_cleaned_df['English name'] = anime_cleaned_df['English name'].replace("UNKNOWN", np.nan)


In [14]:
# Verify if "UNKNOWN" values are replaced by "NaN" value successfully.
filtered_df = anime_cleaned_df[anime_cleaned_df['English name'] == 'UNKNOWN']
filtered_df

Unnamed: 0,anime_id,Name,English name,Score,Genres,Synopsis,Type,Episodes,Aired,Status,...,Studios,Source,Duration,Rating,Rank,Popularity,Favorites,Scored By,Members,Image URL


In [15]:
# Check whether the rows with an "UNKNOWN" Score are exactly the rows with an "UNKNOWN" Scored By (9213 each in the raw data).
filtered_df = anime_cleaned_df[anime_cleaned_df['Score'] == 'UNKNOWN']
filtered_df2 = anime_cleaned_df[anime_cleaned_df['Scored By'] == 'UNKNOWN']
are_equal = filtered_df.equals(filtered_df2)
print(are_equal)

True


##### Cleaning - "Score" Column

In [17]:
type(anime_cleaned_df)

pandas.core.frame.DataFrame

In [18]:
# Set Unknown Values of Score to "NaN" first.

In [19]:
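# The placeholder is spelled "UNKNOWN" (upper case) in this dataset.
anime_cleaned_df['Score'] = anime_cleaned_df['Score'].replace("UNKNOWN", np.nan)
# Convert to float; errors='coerce' turns any remaining non-numeric text into NaN.
anime_cleaned_df['Score'] = pd.to_numeric(anime_cleaned_df['Score'], errors='coerce')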

In [20]:
# Verify that no "UNKNOWN" strings remain in Score (they are now NaN, so this filter should return no rows).
filtered_df = anime_cleaned_df[anime_cleaned_df['Score'] == 'UNKNOWN']
filtered_df

# TODO: the NaN scores still need to be replaced with estimated values.

Unnamed: 0,anime_id,Name,English name,Score,Genres,Synopsis,Type,Episodes,Aired,Status,...,Studios,Source,Duration,Rating,Rank,Popularity,Favorites,Scored By,Members,Image URL


##### Cleaning - "Genre" Column - Use Multi Label Binarization

In [22]:
# Encode Genres as 0/1 columns; the "UNKNOWN" label is dropped after encoding.

# Split the genre into a list
anime_cleaned_df['Genres List'] = anime_cleaned_df['Genres'].apply(lambda x: x.split(", "))

# Initialize the MultiLabelBinarizer and fit into the list
mlb = MultiLabelBinarizer()
encoded_genres = mlb.fit_transform(anime_cleaned_df['Genres List'])

# DataFrame for Encoded Genres 
genres_df = pd.DataFrame(encoded_genres, columns=mlb.classes_, index=anime_cleaned_df.index)

# Drop the UNKNOWN category from genres_df
genres_df = genres_df.drop(['UNKNOWN'], axis=1)

# Merge into the original anime dataframe using join.
anime_cleaned_df = anime_cleaned_df.join(genres_df)


In [23]:
# Loop through the genre columns with binary values - check if they are classes generated by mlb.
genre_columns = [col for col in anime_cleaned_df.columns if col in mlb.classes_]  # mlb.classes_ contains the genre names

# Replace NaN values in genre columns with 0
anime_cleaned_df[genre_columns] = anime_cleaned_df[genre_columns].fillna(0)

# Convert genre columns to int64
anime_cleaned_df[genre_columns] = anime_cleaned_df[genre_columns].astype('int64')

print(anime_cleaned_df.dtypes)

anime_id           int64
Name              object
English name      object
Score            float64
Genres            object
Synopsis          object
Type              object
Episodes          object
Aired             object
Status            object
Producers         object
Studios           object
Source            object
Duration          object
Rating            object
Rank              object
Popularity         int64
Favorites          int64
Scored By         object
Members            int64
Image URL         object
Genres List       object
Action             int64
Adventure          int64
Avant Garde        int64
Award Winning      int64
Boys Love          int64
Comedy             int64
Drama              int64
Ecchi              int64
Erotica            int64
Fantasy            int64
Girls Love         int64
Gourmet            int64
Hentai             int64
Horror             int64
Mystery            int64
Romance            int64
Sci-Fi             int64
Slice of Life      int64


In [24]:
# Set pandas to display all columns
pd.set_option('display.max_columns', None)
anime_cleaned_df

Unnamed: 0,anime_id,Name,English name,Score,Genres,Synopsis,Type,Episodes,Aired,Status,Producers,Studios,Source,Duration,Rating,Rank,Popularity,Favorites,Scored By,Members,Image URL,Genres List,Action,Adventure,Avant Garde,Award Winning,Boys Love,Comedy,Drama,Ecchi,Erotica,Fantasy,Girls Love,Gourmet,Hentai,Horror,Mystery,Romance,Sci-Fi,Slice of Life,Sports,Supernatural,Suspense
0,1,Cowboy Bebop,,8.75,"Action, Award Winning, Sci-Fi","Crime is timeless. By the year 2071, humanity ...",TV,26.0,"Apr 3, 1998 to Apr 24, 1999",Finished Airing,Bandai Visual,Sunrise,Original,24 min per ep,R - 17+ (violence & profanity),41.0,43,78525,914193.0,1771505,https://cdn.myanimelist.net/images/anime/4/196...,"[Action, Award Winning, Sci-Fi]",1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
2,6,Trigun,,8.22,"Action, Adventure, Sci-Fi","Vash the Stampede is the man with a $$60,000,0...",TV,26.0,"Apr 1, 1998 to Sep 30, 1998",Finished Airing,Victor Entertainment,Madhouse,Manga,24 min per ep,PG-13 - Teens 13 or older,328.0,246,15035,356739.0,727252,https://cdn.myanimelist.net/images/anime/7/203...,"[Action, Adventure, Sci-Fi]",1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
3,7,Witch Hunter Robin,,7.25,"Action, Drama, Mystery, Supernatural",Robin Sena is a powerful craft user drafted in...,TV,26.0,"Jul 3, 2002 to Dec 25, 2002",Finished Airing,"Bandai Visual, Dentsu, Victor Entertainment, T...",Sunrise,Original,25 min per ep,PG-13 - Teens 13 or older,2764.0,1795,613,42829.0,111931,https://cdn.myanimelist.net/images/anime/10/19...,"[Action, Drama, Mystery, Supernatural]",1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0
4,8,Bouken Ou Beet,Beet the Vandel Buster,6.94,"Adventure, Fantasy, Supernatural",It is the dark century and the people are suff...,TV,52.0,"Sep 30, 2004 to Sep 29, 2005",Finished Airing,"TV Tokyo, Dentsu",Toei Animation,Manga,23 min per ep,PG - Children,4240.0,5126,14,6413.0,15001,https://cdn.myanimelist.net/images/anime/7/215...,"[Adventure, Fantasy, Supernatural]",0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0
5,15,Eyeshield 21,,7.92,Sports,"Shy, reserved, and small-statured, Deimon High...",TV,145.0,"Apr 6, 2005 to Mar 19, 2008",Finished Airing,"TV Tokyo, Nihon Ad Systems, TV Tokyo Music, Sh...",Gallop,Manga,23 min per ep,PG-13 - Teens 13 or older,688.0,1252,1997,86524.0,177688,https://cdn.myanimelist.net/images/anime/1079/...,[Sports],0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
24894,55725,4 Week Lovers,,,Boys Love,Wanna fake-date for 4 weeks?\nDojun's life at ...,ONA,10.0,"Apr 4, 2023 to ?",Finished Airing,UNKNOWN,UNKNOWN,Web manga,5 min per ep,UNKNOWN,0.0,0,0,UNKNOWN,0,https://cdn.myanimelist.net/images/anime/1443/...,[Boys Love],0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
24895,55726,"Die, Please!",,,"Fantasy, Romance",I just want to tell him how I feel!\nMina has ...,ONA,UNKNOWN,"May 31, 2023 to ?",Finished Airing,UNKNOWN,UNKNOWN,Web manga,5 min,G - All Ages,0.0,0,0,UNKNOWN,0,https://cdn.myanimelist.net/images/anime/1621/...,"[Fantasy, Romance]",0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0
24900,55731,Wu Nao Monu,,,"Comedy, Fantasy, Slice of Life",No description available for this anime.,ONA,15.0,"Jul 4, 2023 to ?",Not yet aired,UNKNOWN,UNKNOWN,Web manga,Unknown,PG-13 - Teens 13 or older,UNKNOWN,24723,0,UNKNOWN,0,https://cdn.myanimelist.net/images/anime/1386/...,"[Comedy, Fantasy, Slice of Life]",0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0
24901,55732,Bu Xing Si: Yuan Qi,Blader Soul,,"Action, Adventure, Fantasy",No description available for this anime.,ONA,18.0,"Jul 27, 2023 to ?",Not yet aired,UNKNOWN,UNKNOWN,Web novel,Unknown,PG-13 - Teens 13 or older,0.0,0,0,UNKNOWN,0,https://cdn.myanimelist.net/images/anime/1383/...,"[Action, Adventure, Fantasy]",1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0


##### Cleaning - "Synopsis" Column - Replace "No description available for this anime." with NA value

In [26]:
anime_cleaned_df['Synopsis'] = anime_cleaned_df['Synopsis'].replace("No description available for this anime.", pd.NA)

In [27]:
# Verify that the placeholder synopsis text has been replaced by NA (this filter should return no rows).
filtered_df = anime_cleaned_df[anime_cleaned_df['Synopsis'] == "No description available for this anime."]
filtered_df

Unnamed: 0,anime_id,Name,English name,Score,Genres,Synopsis,Type,Episodes,Aired,Status,Producers,Studios,Source,Duration,Rating,Rank,Popularity,Favorites,Scored By,Members,Image URL,Genres List,Action,Adventure,Avant Garde,Award Winning,Boys Love,Comedy,Drama,Ecchi,Erotica,Fantasy,Girls Love,Gourmet,Hentai,Horror,Mystery,Romance,Sci-Fi,Slice of Life,Sports,Supernatural,Suspense


In [47]:
anime_cleaned_df

Unnamed: 0,anime_id,Name,English name,Score,Genres,Synopsis,Type,Episodes,Aired,Status,Producers,Studios,Source,Duration,Rating,Rank,Popularity,Favorites,Scored By,Members,Image URL,Genres List,Action,Adventure,Avant Garde,Award Winning,Boys Love,Comedy,Drama,Ecchi,Erotica,Fantasy,Girls Love,Gourmet,Hentai,Horror,Mystery,Romance,Sci-Fi,Slice of Life,Sports,Supernatural,Suspense
0,1,Cowboy Bebop,,8.75,"Action, Award Winning, Sci-Fi","Crime is timeless. By the year 2071, humanity ...",TV,26.0,"Apr 3, 1998 to Apr 24, 1999",Finished Airing,Bandai Visual,Sunrise,Original,24 min per ep,R - 17+ (violence & profanity),41.0,43,78525,914193.0,1771505,https://cdn.myanimelist.net/images/anime/4/196...,"[Action, Award Winning, Sci-Fi]",1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
2,6,Trigun,,8.22,"Action, Adventure, Sci-Fi","Vash the Stampede is the man with a $$60,000,0...",TV,26.0,"Apr 1, 1998 to Sep 30, 1998",Finished Airing,Victor Entertainment,Madhouse,Manga,24 min per ep,PG-13 - Teens 13 or older,328.0,246,15035,356739.0,727252,https://cdn.myanimelist.net/images/anime/7/203...,"[Action, Adventure, Sci-Fi]",1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
3,7,Witch Hunter Robin,,7.25,"Action, Drama, Mystery, Supernatural",Robin Sena is a powerful craft user drafted in...,TV,26.0,"Jul 3, 2002 to Dec 25, 2002",Finished Airing,"Bandai Visual, Dentsu, Victor Entertainment, T...",Sunrise,Original,25 min per ep,PG-13 - Teens 13 or older,2764.0,1795,613,42829.0,111931,https://cdn.myanimelist.net/images/anime/10/19...,"[Action, Drama, Mystery, Supernatural]",1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0
4,8,Bouken Ou Beet,Beet the Vandel Buster,6.94,"Adventure, Fantasy, Supernatural",It is the dark century and the people are suff...,TV,52.0,"Sep 30, 2004 to Sep 29, 2005",Finished Airing,"TV Tokyo, Dentsu",Toei Animation,Manga,23 min per ep,PG - Children,4240.0,5126,14,6413.0,15001,https://cdn.myanimelist.net/images/anime/7/215...,"[Adventure, Fantasy, Supernatural]",0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0
5,15,Eyeshield 21,,7.92,Sports,"Shy, reserved, and small-statured, Deimon High...",TV,145.0,"Apr 6, 2005 to Mar 19, 2008",Finished Airing,"TV Tokyo, Nihon Ad Systems, TV Tokyo Music, Sh...",Gallop,Manga,23 min per ep,PG-13 - Teens 13 or older,688.0,1252,1997,86524.0,177688,https://cdn.myanimelist.net/images/anime/1079/...,[Sports],0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
24894,55725,4 Week Lovers,,,Boys Love,Wanna fake-date for 4 weeks?\nDojun's life at ...,ONA,10.0,"Apr 4, 2023 to ?",Finished Airing,UNKNOWN,UNKNOWN,Web manga,5 min per ep,UNKNOWN,0.0,0,0,UNKNOWN,0,https://cdn.myanimelist.net/images/anime/1443/...,[Boys Love],0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
24895,55726,"Die, Please!",,,"Fantasy, Romance",I just want to tell him how I feel!\nMina has ...,ONA,UNKNOWN,"May 31, 2023 to ?",Finished Airing,UNKNOWN,UNKNOWN,Web manga,5 min,G - All Ages,0.0,0,0,UNKNOWN,0,https://cdn.myanimelist.net/images/anime/1621/...,"[Fantasy, Romance]",0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0
24900,55731,Wu Nao Monu,,,"Comedy, Fantasy, Slice of Life",,ONA,15.0,"Jul 4, 2023 to ?",Not yet aired,UNKNOWN,UNKNOWN,Web manga,Unknown,PG-13 - Teens 13 or older,UNKNOWN,24723,0,UNKNOWN,0,https://cdn.myanimelist.net/images/anime/1386/...,"[Comedy, Fantasy, Slice of Life]",0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0
24901,55732,Bu Xing Si: Yuan Qi,Blader Soul,,"Action, Adventure, Fantasy",,ONA,18.0,"Jul 27, 2023 to ?",Not yet aired,UNKNOWN,UNKNOWN,Web novel,Unknown,PG-13 - Teens 13 or older,0.0,0,0,UNKNOWN,0,https://cdn.myanimelist.net/images/anime/1383/...,"[Action, Adventure, Fantasy]",1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
