## How Much Do You Know About Anime?: Filtering, Selection and Sorting

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('anime_dataset.csv')

### Activities

`Note`: Please do not use string handling for solving activities 🙏. Solve them only with the specified methods.

#### Selection with `.iloc[]`

##### 1) Select the first 7 rows of the dataset

first_seven_rows = df.iloc[0:7]
first_seven_rows

# Equivalent shorthand: the start defaults to 0 and the stop position (7) is excluded
first_seven_rows = df.iloc[:7]
first_seven_rows
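`.iloc[]` slices are half-open, so the stop position is not included in the result. Here is a minimal sketch on a hypothetical ten-row frame (`toy` and its values are made up and are not part of the anime dataset):

```python
import pandas as pd

# Hypothetical toy frame standing in for the anime dataset
toy = pd.DataFrame({"Title": [f"anime_{i}" for i in range(10)]})

first_seven = toy.iloc[:7]   # positions 0..6, stop position 7 is excluded
first_six = toy.iloc[:6]     # positions 0..5, one row short of seven

print(len(first_seven), len(first_six))
```

So a stop value of 7 is what yields seven rows when the slice starts at position 0.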

##### 2) Select the last nine records

last_nine_rows = df.iloc[-9:, :]
last_nine_rows

##### 3) Select the `Release_Year` column

year_df = df.iloc[:, [8]]
year_df

df.info()

Validate your `year_df` variable is a DataFrame

type(year_df)
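The result is a DataFrame because the column position is passed as a list. A rough sketch of the difference, using a hypothetical two-row frame (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical toy frame; column names mirror the dataset, values are made up
toy = pd.DataFrame({"Title": ["A", "B"], "Release_Year": [2018, 2020]})

as_frame = toy.iloc[:, [1]]   # list of positions -> DataFrame with one column
as_series = toy.iloc[:, 1]    # scalar position  -> Series

print(type(as_frame).__name__, type(as_series).__name__)
```

The same rule applies to `.loc[]`: `["Episodes"]` gives a DataFrame, while `"Episodes"` gives a Series.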

##### 4) Select rows 3 to 6 and extract only the `Title` and `Genres` columns

selected_rows_cols = df.iloc[2:6,[0,1]]

#### Selection with `.loc[]`

##### 5) Select the `Episodes` column

episodes_df = df.loc[:, ["Episodes"]]
episodes_df

Validate your `episodes_df` variable is a DataFrame

type(episodes_df)

##### 6) Select the popularity scores of anime from rows 200 to 300, Inclusive

pop_200_300_df = df.loc[199:299, ["Popularity"]]
pop_200_300_df
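Unlike `.iloc[]`, a `.loc[]` slice works on index labels and includes the end label. The sketch below assumes a default `RangeIndex` (labels 0, 1, 2, ...) on a hypothetical toy frame. Whether "rows 200 to 300" means labels 200..300 or 1-based row numbers depends on the convention of the activity, which this sketch does not settle:

```python
import pandas as pd

# Hypothetical toy frame with a default RangeIndex (labels 0..9)
toy = pd.DataFrame({"Popularity": range(100, 110)})

by_label = toy.loc[2:5, ["Popularity"]]   # labels 2, 3, 4, 5 (end label included)
by_position = toy.iloc[2:5, [0]]          # positions 2, 3, 4 (stop excluded)

print(len(by_label), len(by_position))
```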

#### What Is Special About the Year 2018?

The bar chart below shows that more anime were released in 2018 than in any other year.

# Run the below cell to see the bar chart

year_counts = df[df['Release_Year'] != 0]['Release_Year'].value_counts().sort_index()

plt.figure(figsize=(10, 6), facecolor='white', edgecolor='black')
year_counts.plot(kind='bar', color='skyblue')  

plt.title('Anime Releases by Year')
plt.xlabel('Release Year')
plt.ylabel('Count')
plt.xticks(rotation=45)

plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

##### 7) Select the anime that were released in the year 2018 and store them in `released_2018`

released_2018 = df.loc[df['Release_Year']==2018]
released_2018

#### Just For Exploration

Hey, did you know? In spring 2022, a bunch of new anime came out. Guess what genre most of them were? Comedy and romance! Maybe it's because spring gets people feeling romantic, who knows? But here's the cool part: I found out by making a bar chart. Shh, it's our little secret!

# Run the below cell to see the plot about genres distribution for spring 2022

genre_counts = df[df['Release Date'] == 'Spring 2022']['Genres'].value_counts()

plt.figure(figsize=(10, 6))

sns.barplot(x=genre_counts.index, y=genre_counts.values, color='#FFF225')

plt.title('Genres Distribution for Spring 2022 Anime')
plt.xlabel('Genres')
plt.ylabel('Count')
plt.xticks(rotation=45, ha='right')  
plt.tight_layout()

plt.show()

##### 8) How many anime were released in `Spring 2022`?

df.loc[df["Release Date"]=="Spring 2022"].shape[0]

##### 9) Which of the following anime have a single episode?

df.loc[df["Episodes"]==1].shape[0]

option_list = ['Gintama: The Final', 'Fullmetal Alchemist: Brotherhood', 'Violet Evergarden Movie', 'Chiisana Kyojin Microman']

# Convert the list to a Series: a plain Python list has no .isin() method, but a Series does
option_series = pd.Series(option_list)

option_series.isin(df.loc[df['Episodes']==1, 'Title'])
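`Series.isin()` returns one boolean per option, and that boolean Series can be used to keep only the options that matched. A small sketch with hypothetical titles and episode counts (none of these values come from the dataset):

```python
import pandas as pd

# Hypothetical titles and episode counts, not taken from the dataset
toy = pd.DataFrame({"Title": ["A", "B", "C"], "Episodes": [1, 24, 1]})
options = pd.Series(["A", "B", "Z"])

single_episode_titles = toy.loc[toy["Episodes"] == 1, "Title"]
is_single = options.isin(single_episode_titles)   # compares against the values, not the index

# Keep only the options that are single-episode titles
matched = options[is_single].tolist()
print(matched)
```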

df

##### 10) Select anime for which the `Release Date` is missing

missing_release_date = df.loc[df["Release Date"].isnull()]
missing_release_date

##### 11) Select the anime that belong to `Action,Comedy,Sci-Fi` in `Genres` column, Only select `Title`, `Score`

# .isin(["Action", "Comedy", "Sci-Fi"]) would only match rows whose Genres value is exactly
# one of those three single strings, so compare against the full combined value instead
specific_genre_df = df.loc[df['Genres'] == 'Action,Comedy,Sci-Fi', ['Title', 'Score']]
specific_genre_df
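The difference between matching the combined value and matching the individual genres can be seen on a hypothetical three-row frame (titles, genres and scores here are invented):

```python
import pandas as pd

# Hypothetical rows; only the column names follow the dataset
toy = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Genres": ["Action,Comedy,Sci-Fi", "Action", "Comedy,Slice of Life"],
    "Score": [8.1, 7.5, 7.9],
})

# Exact match on the full combined string
exact = toy.loc[toy["Genres"] == "Action,Comedy,Sci-Fi", ["Title", "Score"]]

# isin() tests each Genres value against the three separate strings
split_isin = toy.loc[toy["Genres"].isin(["Action", "Comedy", "Sci-Fi"]), ["Title", "Score"]]

print(exact["Title"].tolist(), split_isin["Title"].tolist())
```

Only the equality test picks up the row whose `Genres` is the full `Action,Comedy,Sci-Fi` string.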

##### 12) What is the popularity value that marks the top 1% of anime?

round(df["Popularity"].quantile(0.99), -3)
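The 0.99 quantile is the value below which about 99% of the observations fall, and `round(..., -3)` rounds it to the nearest thousand. A minimal sketch on hypothetical, evenly spaced popularity values (not the real distribution):

```python
import pandas as pd

# Hypothetical popularity values: 100, 200, ..., 10000
popularity = pd.Series(range(100, 10100, 100))

cutoff = popularity.quantile(0.99)          # linear interpolation between neighbours by default
rounded = round(cutoff, -3)                 # nearest thousand
top_share = (popularity > cutoff).mean()    # fraction of values above the cutoff

print(cutoff, rounded, top_share)
```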

#### Top 1% Anime By Popularity

Popularity scores are concentrated around the median of about 5000, but a few very popular anime greatly surpass this value: the top 1% of anime have a popularity score of roughly 14,000 or more.

Here is the distribution of the popularity of anime, with a reference line at the value 14,000.

# Run the below cell for distribution

sns.boxplot(x="Popularity", data=df, showmeans=True, color='#33FF6B')  

plt.axvline(x=14000, color='red', linestyle='--', label='Reference Line')

plt.xlabel("Popularity")
plt.ylabel("Data")  
plt.title("Boxplot of Popularity")  
plt.legend() 
plt.show()

##### 13) Select the anime that have a popularity score greater than `14000`

high_popularity_df = df.loc[df["Popularity"] > 14000]
high_popularity_df

##### 14) Select the anime that have an episode length less than or equal to 5

df.info()

less_episode_length_df = df.loc[df["Episode_Length(In min)"]<= 5]
less_episode_length_df

##### 15) What percentage of anime episodes last half an hour or more?

round(df.loc[df["Episode_Length(In min)"]>=30].shape[0] / (df.shape[0])*100,2)
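The same percentage can also be obtained from the boolean mask directly, since the mean of a boolean Series is the fraction of `True` values. A sketch on four hypothetical episode lengths:

```python
import pandas as pd

# Hypothetical episode lengths in minutes
toy = pd.DataFrame({"Episode_Length(In min)": [5, 24, 30, 45]})

mask = toy["Episode_Length(In min)"] >= 30

pct_by_count = round(mask.sum() / len(toy) * 100, 2)   # matching rows / all rows
pct_by_mean = round(mask.mean() * 100, 2)              # mean of booleans = share of True

print(pct_by_count, pct_by_mean)
```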

##### 16) Select the anime that were released after 2020; select only the `Title` and `Release_Year` columns

df.columns

filtered_year = df.loc[(df['Release_Year']>2020),["Title","Release_Year"]]

#### Beginner Query Activities

##### 17) What are the anime that were released in 2015 ?

anime_2015 = df.query("Release_Year == 2015")
anime_2015
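`.query()` takes the condition as a string and should give the same rows as the equivalent `.loc[]` filter. The sketch below uses a hypothetical frame and also shows two conveniences: `@name` to refer to a Python variable, and backticks for column names containing spaces (backtick quoting assumes a reasonably recent pandas version):

```python
import pandas as pd

# Hypothetical rows; only the column names follow the dataset
toy = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Release_Year": [2015, 2022, 2015],
    "Release Date": ["Fall 2015", "Spring 2022", "Winter 2015"],
})

year = 2015
via_loc = toy.loc[toy["Release_Year"] == year]
via_query = toy.query("Release_Year == @year")            # @year refers to the Python variable

spring = toy.query("`Release Date` == 'Spring 2022'")     # backticks quote a name with a space

print(via_query["Title"].tolist(), spring["Title"].tolist())
```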

#### Did you know?

The most common genre combination in this dataset is `Comedy,Slice of Life`, which accounts for 5.4 percent of the anime.

# Run the below cell to see the pie chart

genre_counts = df['Genres'].value_counts()

total = genre_counts.sum()
genre_percentages = genre_counts / total

others_threshold = 0.02
main_genres = genre_counts[genre_counts / genre_counts.sum() >= others_threshold]
main_genres['Others'] = genre_counts[genre_counts / genre_counts.sum() < others_threshold].sum()

colors = ['#ff9999', '#66b3ff', '#99ff99', '#ffcc99', '#c2c2f0', '#ffb3e6']

plt.figure(figsize=(8, 8))
plt.pie(main_genres, labels=main_genres.index, autopct='%1.1f%%', startangle=140, colors=colors)
plt.title('Genre Distribution')
plt.axis('equal')
plt.show()

##### 18) Filter the dataset to select the anime that belong to genre `Comedy,Slice of Life` 

comedy_slice_of_life_df = df.loc[(df["Genres"]=="Comedy,Slice of Life")]

##### 19) How many anime consist of precisely ten episodes ?

df.loc[df["Episodes"]==10].shape[0]

##### 20) Filter the dataset on the given condition

episodes_less_than_five = df.loc[df["Episodes"] < 5]
episodes_less_than_five

##### 21) Filter the dataset on the given condition

df.head()

high_score_anime = df.loc[df["Score"]>=9]
high_score_anime

##### 22) Which of the following anime are in the top 10 based on Rank?

# Lower Rank values mean higher-ranked anime, so sort ascending and take the first ten
df.sort_values(by="Rank").head(10)

##### 23) Filter the dataset on the specified condition

filtered_popular_score = df.loc[(df["Popularity"] > 1000) & (df["Score"] > 8.5)]

filtered_popular_score 

##### 24) Filtering Dataset for Recent Releases and High Popularity

df.columns

recent_popular_anime = df.loc[(df['Release_Year']>2020) & (df['Popularity'] >9000)]
recent_popular_anime

recent_popular_anime = df.query('Release_Year >= 2020 and Popularity > 9000')
recent_popular_anime


##### 25) Filtering Based on three conditions

Filter the dataframe to only include entries where the Episodes is greater than 20 and less than 30, and the Rank is less than 20. Then store the resulting dataframe in filtered_data variable.

filtered_data = df.loc[(df["Episodes"] >20) & (df["Episodes"] <30)& (df["Rank"] <20)]
filtered_data
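When several conditions are combined, each comparison needs its own parentheses (or its own variable) because `&` binds more tightly than `>` and `<`. A sketch with named masks on a hypothetical frame, plus what should be the equivalent `.query()` call:

```python
import pandas as pd

# Hypothetical episode counts and ranks
toy = pd.DataFrame({"Episodes": [12, 24, 26, 30, 25], "Rank": [5, 10, 50, 3, 19]})

# One named mask per condition keeps the final filter readable
more_than_20 = toy["Episodes"] > 20
fewer_than_30 = toy["Episodes"] < 30
rank_below_20 = toy["Rank"] < 20

filtered = toy.loc[more_than_20 & fewer_than_30 & rank_below_20]
same_via_query = toy.query("Episodes > 20 and Episodes < 30 and Rank < 20")

print(filtered.index.tolist())
```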

#### Beginner Sorting Activities

##### 26) Sort the dataframe by a single column

Sort the dataframe based on Rank column in ascending order and store it in sorted_rank_df variable.

# Sort the whole dataframe (not just the Rank column) so that every column is kept
sorted_rank_df = df.sort_values(by='Rank')
sorted_rank_df


#### In the above activity, we know that `Fullmetal Alchemist: Brotherhood` holds the top rank among anime. Did you know? This acclaimed series secured the Tokyo Anime Award for Best Television Series.

![Fullmetal Alchemist Brotherhood](./Fullmetal-Alchemist-Brotherhood-anime.jpg)


Source: Funimation

##### 27) Sort in descending order

Sort the dataframe in descending order based on Popularity column and store the resulting dataframe in popularity_desc_df variable.

popularity_desc_df = df.sort_values(by="Popularity",ascending=False)
popularity_desc_df

##### 28) Sort by Multiple Columns

sorted_pop_score_df = df.sort_values(by=["Popularity","Score"])
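With a list of columns, `sort_values` sorts by the first column and uses the later ones only to break ties. A per-column direction can be given through a list passed to `ascending`. A sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical popularity and score values with ties in Popularity
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Popularity": [10, 5, 10, 5],
    "Score": [7.0, 8.0, 9.0, 6.0],
})

# Both columns ascending: ties in Popularity are ordered by the lower Score first
default_order = toy.sort_values(by=["Popularity", "Score"])

# Popularity ascending, Score descending within each Popularity value
mixed_order = toy.sort_values(by=["Popularity", "Score"], ascending=[True, False])

print(default_order["Title"].tolist(), mixed_order["Title"].tolist())
```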

##### 29) What is the title of the first anime when the dataframe is sorted by the `Genres` column ?

Enter the title of the first anime in the sorted dataframe, where the dataframe is sorted in ascending order based on the Genres column

df.sort_values(by="Genres")

#### Boolean Logic and Sorting Activities

In this section, you will perform activities using AND (&), OR (|), and NOT (~). Additionally, we will combine sorting techniques with boolean logic.

##### 30) Select anime released in 2017 that have more than 20 episodes

Store the resulting dataframe in a variable named df_2017_more_than_20_episodes

df.columns

df_2017_more_than_20_episodes = df.loc[(df["Release_Year"]==2017) & (df["Episodes"] >20)]

df_2017_more_than_20_episodes

#### Anime Release Distribution in 2009

The following pie chart shows the distribution of release dates in 2009. From it, we can infer that in 2009, most anime were released in the spring season, while the fewest were released in the summer season.

# Run the below cell to see the pie chart

release_date_counts = df.loc[df['Release_Year'] == 2009]['Release Date'].value_counts()

custom_colors = sns.color_palette('Paired')

plt.figure(figsize=(8, 8))
sns.set_style("whitegrid")
plt.pie(release_date_counts, labels=release_date_counts.index, autopct='%1.1f%%', startangle=140, colors=custom_colors)
plt.axis('equal')  
plt.title('Distribution of Anime Release Dates')

plt.show()

##### 31) Choose anime that were released during either `Spring 2009` or `Summer 2009`

df.columns

spring_summer_2009_df = df.loc[(df['Release Date'] == "Spring 2009") | (df['Release Date'] == "Summer 2009")]

spring_summer_2009_df

##### 32) Find anime that do not belong to the genre `Action,Adventure`

not_act_adv_df = df.loc[~ (df["Genres"]=="Action,Adventure")]
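The `~` operator flips every value of a boolean mask, so negating an equality test should select the same rows as `!=`. A small sketch on a hypothetical column that also contains a missing value, which both forms keep:

```python
import pandas as pd

# Hypothetical genres, including one missing value
toy = pd.DataFrame({"Genres": ["Action,Adventure", "Comedy", None]})

negated = toy.loc[~(toy["Genres"] == "Action,Adventure")]   # NOT (equal)
not_equal = toy.loc[toy["Genres"] != "Action,Adventure"]    # not-equal comparison

print(negated.index.tolist())
```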

##### 33) Popularity Range Sorting

pop_range_sorted_df = df.loc[df["Popularity"].between(7500,8000)].sort_values(by="Popularity")

pop_range_sorted_df = df.loc[(df['Popularity'] >= 7500) & (df['Popularity'] <= 8000)].sort_values(by='Popularity', ascending=True)
pop_range_sorted_df

##### 34) What is the score of the anime with the lowest rank where the genre of the anime is `Drama,Sci-Fi`?

Enter the score of the anime categorized under Drama,Sci-Fi with the highest value in the 'Rank' column. Remember, in the context of ranks, lower numerical values indicate higher ranks, while higher numerical values indicate lower ranks.

df.columns

df["Rank"].nlargest(1)

df["Score"].nlargest(1)

df.loc[(df["Genres"]=="Drama,Sci-Fi")].sort_values(by="Rank",ascending=False)
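To read the answer off directly instead of scanning the sorted output, the first row of the sorted frame can be taken with `.iloc[0]`. A sketch on hypothetical rows (titles, ranks and scores are invented):

```python
import pandas as pd

# Hypothetical rows; only the column names follow the dataset
toy = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Genres": ["Drama,Sci-Fi", "Drama,Sci-Fi", "Comedy"],
    "Rank": [10, 500, 900],
    "Score": [8.5, 6.2, 5.0],
})

drama_scifi = toy.loc[toy["Genres"] == "Drama,Sci-Fi"]

# Largest Rank value = lowest-ranked anime, so sort descending and take the first row
lowest_ranked = drama_scifi.sort_values(by="Rank", ascending=False).iloc[0]
score = lowest_ranked["Score"]

print(lowest_ranked["Title"], score)
```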

##### 35) Anime Selection & Ranking (2017-2019)

Select the anime that were released between 2017 and 2019 (inclusive), then sort it based on Score in descending order and store the resulting dataframe in anime_2017_2019_df.

df.columns

anime_2017_2019_df = df.loc[(df['Release_Year'].between(2017,2019))].sort_values(by="Score",ascending=False)

anime_2017_2019_df

##### 36) Select and sort the dataframe based on the given conditions, then store it in `last_activity_df`

Select anime that belong to genre Action,Comedy, having a popularity score above 50, consisting of fewer than 100 episodes, and released between 2015 and 2019 (inclusive). Sort them based on episode length in ascending order.

df.columns

# Filter on all four conditions, then sort by episode length (ascending by default)
last_activity_df = df.loc[(df['Genres'] == "Action,Comedy") &
                          (df['Popularity'] > 50) &
                          (df['Episodes'] < 100) &
                          (df['Release_Year'].between(2015, 2019))].sort_values(by='Episode_Length(In min)')
last_activity_df
