## Data Analysis Project: Analysis of Isekai Sub-Genre Growth (2000 - 2024)

### 1. Project Overview
**Objective:**
    
- Analyse the growth of the Isekai anime sub-genre between 2000 and 2024 to determine whether its expansion is considerable and statistically abnormal compared to the wider anime industry.

**Key questions:**

- Has the Isekai sub-genre grown *considerably* during the period analysed?
- Does the Isekai sub-genre account for an abnormally large share of anime titles?
- If both answers are yes, what explains the growth and the abnormality?

**Note:** See the `README.md` for more information related to these questions.

### 2. Dataset Description
**Dataset:** Animes [1962-2024] by Youxise, licensed under *Creative Commons Attribution 4.0 International (CC BY 4.0)*.

**Source:** [Kaggle](https://www.kaggle.com/datasets/youcmoulai/animes)

**License:** [Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)

### 3. Imports and Environment Setup

In [1]:
import pandas as pd
import matplotlib.pyplot as plt

pd.set_option("display.max_rows", None)

### 4. Data Loading
*Only the relevant columns were loaded to reduce memory usage and focus the analysis on variables directly related to the research questions.*

In [4]:
path = "./csv_files/Animes.csv"

df = pd.read_csv(path, usecols=["Title", "Release", "Theme"])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13551 entries, 0 to 13550
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Title    13551 non-null  object
 1   Release  13551 non-null  object
 2   Theme    8946 non-null   object
dtypes: object(3)
memory usage: 317.7+ KB
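
The effect of restricting the load to the relevant columns can be checked on a small scale. The sketch below is an illustration, not part of the original notebook: it reads a toy in-memory CSV with and without `usecols` and compares the memory footprint. The `Rating` column and all values are hypothetical and not taken from `Animes.csv`. `memory_usage(deep=True)` is used because the `+` in the `df.info()` figure marks a lower bound that does not count the string contents of `object` columns.

```python
import io

import pandas as pd

# Toy CSV standing in for Animes.csv; "Rating" is a hypothetical extra column.
csv_text = """Title,Release,Theme,Rating
A,2001-04-01,Isekai,7.1
B,1998-10-03,,6.5
C,2015-07-10,"School, Isekai",8.0
"""

# Load everything, then load only the three columns used in the analysis.
full = pd.read_csv(io.StringIO(csv_text))
slim = pd.read_csv(io.StringIO(csv_text), usecols=["Title", "Release", "Theme"])

# deep=True also counts the string contents of the text columns.
full_bytes = full.memory_usage(deep=True).sum()
slim_bytes = slim.memory_usage(deep=True).sum()

print(list(slim.columns))
print(slim_bytes < full_bytes)
```

On the real file the saving depends on how many columns are skipped and on their types, so the toy comparison only shows the direction of the effect.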


### 5. Data Cleaning & Filtering 
*Because this analysis **focuses on sub-genres**, we begin by removing all rows without an assigned theme. This keeps the total number of anime entries consistent with the number of entries that can be attributed to a sub-genre.*

In [6]:
df = df.dropna(subset=["Theme"])
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 8946 entries, 1 to 13550
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Title    8946 non-null   object
 1   Release  8946 non-null   object
 2   Theme    8946 non-null   object
dtypes: object(3)
memory usage: 279.6+ KB
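
It is worth quantifying how much data this step discards. The short sketch below only restates the row counts reported by the two `df.info()` outputs; the variable names are illustrative.

```python
# Row counts copied from the two df.info() outputs.
total_rows = 13551       # entries loaded from the CSV
rows_with_theme = 8946   # entries with a non-null Theme

dropped = total_rows - rows_with_theme
dropped_share = dropped / total_rows

print(f"{dropped} rows dropped ({dropped_share:.1%})")
# → 4605 rows dropped (34.0%)
```

Roughly a third of the dataset has no theme, so every later total refers to the themed subset rather than to all anime in the source.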


*Since this analysis covers the period from 2000 to 2024, entries released before 2000 are not relevant and are removed. The dataset itself ends in 2024, so only a lower bound is applied.*

In [20]:
df["Release"] = pd.to_datetime(df["Release"], format="mixed")
df = df[df["Release"].dt.year > 1999]
df.shape

(7308, 3)
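
The parsing and filtering step can be tried on a few rows first. The sketch below uses a hypothetical toy frame, not the real data, and assumes pandas 2.0 or later, which `format="mixed"` requires. It also applies an explicit upper bound with `between`, an optional safeguard that is not part of the original cell.

```python
import pandas as pd

# Hypothetical titles with release dates written in different formats.
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Release": ["2001-04-01", "Oct 3, 1998", "2024-12-31", "1999-12-31"],
})

# format="mixed" infers the format of each value individually.
toy["Release"] = pd.to_datetime(toy["Release"], format="mixed")

# between() is inclusive on both ends, so 2000 and 2024 are both kept.
in_period = toy[toy["Release"].dt.year.between(2000, 2024)]

print(in_period["Title"].tolist())
# → ['A', 'C']
```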

*To compare the growth of anime overall with that of the Isekai sub-genre, we use two separate data frames from this point onwards: `df` contains all **anime** released during the analysed period, while `isekai_df` contains only **Isekai** titles from the same period.*

In [24]:
isekai_df = df[df["Theme"].str.contains("isekai", case=False, na=False)]
isekai_df.shape

(296, 3)
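
Two details of this filter can be illustrated briefly. The sketch below is not part of the original notebook: the theme strings are hypothetical examples, while the two counts are taken from the `shape` outputs. It shows that `str.contains` performs a case-insensitive substring match, so any theme string containing "isekai" is selected, and it computes the overall Isekai share of the filtered dataset.

```python
import pandas as pd

# Hypothetical theme strings; the real labels in the dataset may differ.
themes = pd.Series(["Isekai", "School, Isekai", "Mecha", "Reverse isekai"])

# Case-insensitive substring match, as in the isekai_df filter.
mask = themes.str.contains("isekai", case=False, na=False)
print(mask.tolist())
# → [True, True, False, True]

# Counts taken from df.shape and isekai_df.shape.
all_titles = 7308
isekai_titles = 296
share = isekai_titles / all_titles

print(f"{share:.2%}")
# → 4.05%
```

Because the match is on substrings, multi-theme entries and variants such as the hypothetical "Reverse isekai" are counted as Isekai. Whether that is desirable depends on how the dataset labels its themes.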

### 6. Analysis