# 🎯 Capstone Project: Feature Extraction with Pandas

## 📌 Aim

In this project, we will perform feature extraction on an Anime dataset using **Pandas**.  
The main objectives are:

1. ➕ Create a new column for **Episode Count**  
2. ⏱️ Create a new column for **Timestamp**  
3. ⭐ Identify **the anime with the highest score**  
4. 🏆 List the **Top 5 highest scoring anime**  
5. 🎬 Find **the anime with the highest episode count**  
6. 📺 List the **Top 5 anime by episode count**  
7. 🕰️ Determine **the longest running anime**


## **Loading the Dataset**

In [12]:
import pandas as pd
import numpy as np

In [13]:
df = pd.read_csv(r'/content/drive/MyDrive/anime.csv')

In [14]:
df.head()

Unnamed: 0,Rank,Title,Score
0,1,Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr...,9.1
1,2,"Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473...",9.07
2,3,Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 -...,9.06
3,4,"Gintama°TV (51 eps)Apr 2015 - Mar 2016605,113 ...",9.06
4,5,Shingeki no Kyojin Season 3 Part 2TV (10 eps)A...,9.05


## 1️⃣ ➕ Create a new column for **Episode Count**


In [15]:
# Inspect a single raw Title value
df.loc[2, 'Title']

'Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 - Dec 2022474,138 members'

In [16]:
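# Extract the text inside the first pair of parentheses, e.g. "13 eps"
def extract_episodes(txt):
    check = False  # True while we are inside the parentheses
    data = ""
    for i in txt:
        if i == ")":
            # Closing parenthesis reached: return what was collected
            return data
        if check:
            data = data + i
        if i == '(':
            check = True
    # No closing parenthesis found
    return None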


In [17]:
df["Episodes"] = df["Title"].apply(extract_episodes)
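The character-by-character loop above can also be expressed as a single vectorized call. The sketch below is an alternative, not the notebook's method: it uses `Series.str.extract` with a regular expression that captures the digits inside `(<n> eps)`. The second and third sample rows are hypothetical and only illustrate a match and a non-match.

```python
import pandas as pd

# Sample titles: the first is taken from the notebook output,
# the other two are hypothetical rows for illustration
titles = pd.Series([
    "Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 - Dec 2022474,138 members",
    "Example ShowTV (24 eps)Apr 2011 - Sep 2011100 members",
    "Title without episode info",
])

# Capture only the digits inside "(<n> eps)"; rows without a match become NaN
episodes = titles.str.extract(r"\((\d+) eps\)", expand=False)

print(episodes.tolist())
```

Because the pattern captures only the digits, the separate step that strips the `" eps"` suffix is not needed with this approach. The values are still strings and need a numeric conversion before aggregation.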

In [18]:
df

Unnamed: 0,Rank,Title,Score,Episodes
0,1,Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr...,9.1,64 eps
1,2,"Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473...",9.07,24 eps
2,3,Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 -...,9.06,13 eps
3,4,"Gintama°TV (51 eps)Apr 2015 - Mar 2016605,113 ...",9.06,51 eps
4,5,Shingeki no Kyojin Season 3 Part 2TV (10 eps)A...,9.05,10 eps
5,6,"Gintama'TV (51 eps)Apr 2011 - Mar 2012534,105 ...",9.04,51 eps
6,7,Gintama: The FinalMovie (1 eps)Jan 2021 - Jan ...,9.04,1 eps
7,8,Hunter x Hunter TV (148 eps)Oct 2011 - Sep 201...,9.04,148 eps
8,9,Kaguya-sama wa Kokurasetai: Ultra RomanticTV (...,9.04,13 eps
9,10,Gintama': EnchousenTV (13 eps)Oct 2012 - Mar 2...,9.03,13 eps


In [19]:
# Remove the " eps" suffix (literal match, not a regular expression)
df['Episodes'] = df['Episodes'].str.replace(" eps", "", regex=False)

In [20]:
# Convert to an integer dtype so the column can be aggregated
df['Episodes'] = df['Episodes'].astype(int)
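`astype(int)` raises an error as soon as one value is missing or is not a clean number. If that is a risk, a more forgiving conversion could look like the sketch below. The sample values are hypothetical; in particular, the `"? eps"` entry is an assumed placeholder for an unknown count and is not taken from the dataset.

```python
import pandas as pd

# Hypothetical raw values, including an unknown count and a missing entry
raw = pd.Series(["64 eps", "24 eps", "? eps", None])

# Strip the literal suffix, then coerce anything non-numeric to NaN
cleaned = raw.str.replace(" eps", "", regex=False)
episodes = pd.to_numeric(cleaned, errors="coerce").astype("Int64")

print(episodes.tolist())
```

The nullable `Int64` dtype keeps the valid counts as integers while marking the unparseable rows as missing, so `max()` and `nlargest()` still work on the column.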


In [21]:
df.head()

Unnamed: 0,Rank,Title,Score,Episodes
0,1,Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr...,9.1,64
1,2,"Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473...",9.07,24
2,3,Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 -...,9.06,13
3,4,"Gintama°TV (51 eps)Apr 2015 - Mar 2016605,113 ...",9.06,51
4,5,Shingeki no Kyojin Season 3 Part 2TV (10 eps)A...,9.05,10


## 2️⃣ ⏱️ Create a new column for **Timestamp**


In [22]:
# Extract the airing period from Title
def extraction_time(txt):
    # The period is the 19 characters right after the first ")",
    # e.g. "Oct 2022 - Dec 2022"
    i = txt.find(')')
    if i == -1:
        return None
    return txt[i + 1:i + 20]

In [23]:
df['Total Time'] = df['Title'].apply(extraction_time)

In [24]:
df.head()

Unnamed: 0,Rank,Title,Score,Episodes,Total Time
0,1,Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr...,9.1,64,Apr 2009 - Jul 2010
1,2,"Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473...",9.07,24,Apr 2011 - Sep 2011
2,3,Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 -...,9.06,13,Oct 2022 - Dec 2022
3,4,"Gintama°TV (51 eps)Apr 2015 - Mar 2016605,113 ...",9.06,51,Apr 2015 - Mar 2016
4,5,Shingeki no Kyojin Season 3 Part 2TV (10 eps)A...,9.05,10,Apr 2019 - Jul 2019
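Taking a fixed 19 characters works only while every period has the exact shape `Mon YYYY - Mon YYYY`. A pattern-based alternative, sketched below under that same format assumption, returns a missing value when the shape does not match. The second sample row is hypothetical.

```python
import pandas as pd

titles = pd.Series([
    "Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 - Dec 2022474,138 members",
    "Short title (1 eps)",  # hypothetical row with no airing period
])

# After ")", expect: three-letter month, four-digit year, " - ", and the same again
pattern = r"\)\s*([A-Z][a-z]{2} \d{4} - [A-Z][a-z]{2} \d{4})"
total_time = titles.str.extract(pattern, expand=False)

print(total_time.tolist())
```

The `\d{4}` quantifier stops after four digits, so the member count that is glued to the end year (`2022474,138`) is not pulled into the result.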


In [25]:
# A little help from GPT is fine too, provided you understand all the logic
from dateutil.relativedelta import relativedelta
from datetime import datetime

def calculate_total_months(period):
    try:
        start_str, end_str = period.split(' - ')
        start_date = datetime.strptime(start_str, '%b %Y')
        end_date = datetime.strptime(end_str, '%b %Y')
        r = relativedelta(end_date, start_date)
        return r.years * 12 + r.months + 1  # +1 to include the starting month
    except (ValueError, AttributeError):  # malformed or missing period
        return None

df['Months'] = df['Total Time'].apply(calculate_total_months)

In [26]:
df.head()

Unnamed: 0,Rank,Title,Score,Episodes,Total Time,Months
0,1,Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr...,9.1,64,Apr 2009 - Jul 2010,16
1,2,"Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473...",9.07,24,Apr 2011 - Sep 2011,6
2,3,Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 -...,9.06,13,Oct 2022 - Dec 2022,3
3,4,"Gintama°TV (51 eps)Apr 2015 - Mar 2016605,113 ...",9.06,51,Apr 2015 - Mar 2016,12
4,5,Shingeki no Kyojin Season 3 Part 2TV (10 eps)A...,9.05,10,Apr 2019 - Jul 2019,4
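The same month count can be computed without a row-wise `apply`. The sketch below is one possible vectorized version, assuming the `Mon YYYY - Mon YYYY` format and English month abbreviations. The third sample value is a hypothetical malformed entry.

```python
import pandas as pd

periods = pd.Series(["Apr 2009 - Jul 2010", "Oct 2022 - Dec 2022", "not a range"])

# Split into start and end; rows that do not match yield NaN in both columns
parts = periods.str.extract(r"^(\w{3} \d{4}) - (\w{3} \d{4})$")
start = pd.to_datetime(parts[0], format="%b %Y", errors="coerce")
end = pd.to_datetime(parts[1], format="%b %Y", errors="coerce")

# Whole months between the two dates, +1 to include the starting month
months = (end.dt.year - start.dt.year) * 12 + (end.dt.month - start.dt.month) + 1
months = months.astype("Int64")

print(months.tolist())
```

For `Apr 2009 - Jul 2010` this gives 12 + 3 + 1 = 16, which agrees with the `Months` value in the table above.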


## 3️⃣ ⭐ Which anime has the **highest score**?


In [27]:
df[df['Score'] == df['Score'].max()]['Title']

Unnamed: 0,Title
0,Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr...
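The boolean mask above returns every row tied for the maximum score. When only one row is wanted, `idxmax` is a shorter option, but it returns only the first maximum. The sketch below contrasts the two on a hypothetical three-row frame that uses the same column names.

```python
import pandas as pd

# Hypothetical miniature frame with a tie for the top score
sample = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Score": [9.10, 9.07, 9.10],
})

# idxmax returns the index label of the first maximum only
first_best = sample.loc[sample["Score"].idxmax(), "Title"]

# The boolean mask keeps every row tied for the maximum
all_best = sample.loc[sample["Score"] == sample["Score"].max(), "Title"].tolist()

print(first_best, all_best)
```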


## 4️⃣ 🏆 Top 5 **highest scoring anime**


In [28]:
# Select by Score rather than relying on the rows already being sorted by Rank
df.nlargest(5, 'Score')['Title']

Unnamed: 0,Title
0,Fullmetal Alchemist: BrotherhoodTV (64 eps)Apr...
1,"Steins;GateTV (24 eps)Apr 2011 - Sep 20112,473..."
2,Bleach: Sennen Kessen-henTV (13 eps)Oct 2022 -...
3,"Gintama°TV (51 eps)Apr 2015 - Mar 2016605,113 ..."
4,Shingeki no Kyojin Season 3 Part 2TV (10 eps)A...


## 5️⃣ 🎬 Which anime has the **highest episode count**?


In [29]:
df[df['Episodes'] == df['Episodes'].max()]

Unnamed: 0,Rank,Title,Score,Episodes,Total Time,Months
15,16,"GintamaTV (201 eps)Apr 2006 - Mar 20101,034,41...",8.94,201,Apr 2006 - Mar 2010,48
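Objective 6 in the aim (Top 5 anime by episode count) has no cell in this notebook. One way to answer it is `DataFrame.nlargest` on the `Episodes` column. The sketch below runs on a hypothetical frame with placeholder titles, so its output is not the dataset's actual top 5.

```python
import pandas as pd

# Hypothetical frame: placeholder titles with assumed episode counts
sample = pd.DataFrame({
    "Title": [f"Show {c}" for c in "ABCDEFG"],
    "Episodes": [64, 24, 13, 201, 148, 51, 110],
})

# Five rows with the largest episode counts, in descending order
top5 = sample.nlargest(5, "Episodes")[["Title", "Episodes"]]

print(top5)
```

On the real data the equivalent call would be `df.nlargest(5, 'Episodes')`, assuming `Episodes` has already been converted to a numeric dtype.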


## 7️⃣ 🕰️ Which is the **longest running anime**?


In [31]:
# Find the longest running anime based on the Months column
longest_anime = df.loc[df['Months'].idxmax()]
print("The longest running anime is:")
display(longest_anime)

The longest running anime is:


Unnamed: 0,11
Rank,12
Title,Ginga Eiyuu DensetsuOVA (110 eps)Jan 1988 - Ma...
Score,9.02
Episodes,110
Total Time,Jan 1988 - Mar 1997
Months,111
