# 📌 Anime Dataset EDA Notebook
# ________________________________________________________________

### About me
Name: Wasiq Ali

GitHub profile: click [here](https://www.github.com/wasiqali275)

Facebook profile: click [here](https://www.facebook.com/profile.php?id=100092751110055)

### 👋 Assalam-o-Alaikum! This notebook uses **python==3.12**.

##### This notebook explores the Kaggle anime dataset.
##### The dataset is available [here](https://www.kaggle.com/datasets/wasiqaliyasir/anime-dataset).

###### In this notebook we will practice **data visualization, EDA, and analysis** using Python.

#### ✅ Step 1: Install & Import Libraries
#### Load the dataset directly online using kagglehub

In [None]:
from IPython.display import display, HTML

def set_background(color: str):
    script = f"""
    var cell = this.closest('.cell, .jp-CodeCell');
    var editor = cell.querySelector('.input_area, .jp-Editor');
    editor.style.background = '{color}';
    this.parentNode.removeChild(this)
    """
    display(HTML('<img src onerror="%s" style="display:none">' % script))

# Example usage:
set_background('lightgreen')


# pip install kagglehub[pandas-datasets]
import kagglehub
from kagglehub import KaggleDatasetAdapter
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


### ✅ Step 2: Load Dataset Online

In [None]:
file_path = "anime.csv"  # Replace with exact CSV filename inside dataset if needed

# Load the dataset directly from KaggleHub.
# dataset_load replaces load_dataset, which is deprecated in recent kagglehub releases.
df = kagglehub.dataset_load(
    KaggleDatasetAdapter.PANDAS,
    "wasiqaliyasir/anime-dataset",
    file_path,
)
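
The later cells rely on the columns `name`, `type`, `episodes`, `rating`, and `members`, so it can help to confirm they exist right after loading. The following is a minimal sketch, assuming a made-up one-row frame `sample` in place of `df`; `missing_columns` and `EXPECTED_COLUMNS` are hypothetical names introduced here for illustration.

```python
import pandas as pd

# Columns that the plots and the model in this notebook refer to.
EXPECTED_COLUMNS = ("name", "type", "episodes", "rating", "members")

def missing_columns(frame: pd.DataFrame, expected=EXPECTED_COLUMNS) -> list:
    """Return the expected column names that are absent from the frame."""
    return [col for col in expected if col not in frame.columns]

# Made-up stand-in for df; note the wrongly capitalized 'Episodes'.
sample = pd.DataFrame(
    {"name": ["A"], "type": ["TV"], "Episodes": [12], "rating": [7.5]}
)

absent = missing_columns(sample)
print("Missing columns:", absent)
```

Column names are case-sensitive, so `Episodes` does not count as `episodes`. Running `missing_columns(df)` on the real frame would show whether the names used below match the CSV.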

In [None]:
# check first 5 records.
print("\033[95mFirst 5 records:\033[0m")  # Purple color
print(df.head())

#### ✅ Step 3: Dataset Overview

In [None]:

print("\033[94mDataset Shape:\033[0m", df.shape)  # Blue color
print("\033[94mMissing Values:\033[0m")
print(df.isnull().sum())
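
Raw missing-value counts are easier to judge as a share of all rows. The following is a minimal sketch, assuming a small made-up frame `sample` in place of `df` (the values are illustrative and not taken from the dataset); `missing_report` is a hypothetical helper name.

```python
import pandas as pd

def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column, worst first."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    return report.sort_values("missing", ascending=False)

# Made-up stand-in for df; values are illustrative only.
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "type": ["TV", None, "Movie", "TV"],
    "rating": [8.1, None, None, 6.5],
})

report = missing_report(sample)
print(report)
```

Calling `missing_report(df)` on the loaded dataset would give the same table for the real columns.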

#### ✅ Step 4: Basic EDA & Plots


In [None]:
# Summary statistics of the dataset
df.describe()
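
`describe()` summarizes only numeric columns by default, so a column stored as text is silently left out. The sketch below assumes that `episodes` may contain a placeholder string such as `"Unknown"`; that is an assumption for illustration, not something this notebook confirms. The frame `sample` is made up.

```python
import pandas as pd

# Made-up stand-in for df; "Unknown" is an assumed placeholder value.
sample = pd.DataFrame({
    "episodes": ["12", "Unknown", "1", "24"],
    "rating": [7.9, 6.2, 8.4, 7.1],
})

# Entries that cannot be parsed as numbers become NaN instead of raising an error.
sample["episodes_num"] = pd.to_numeric(sample["episodes"], errors="coerce")

print(sample.dtypes)
print(sample.describe())  # episodes_num now appears next to rating
```

Checking `df.dtypes` first would show whether this conversion is needed for the real data.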

In [None]:
# Anime type distribution plot
plt.figure(figsize=(8,5))
# seaborn 0.13+ expects hue to be set whenever palette is used
sns.countplot(y='type', data=df, hue='type', palette='cool', legend=False)
plt.title('Anime Type Distribution')
plt.show()

In [None]:
# 📊 Type of Anime Count
plt.figure(figsize=(8,5))
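# Bars ordered from most to least common; hue is set because seaborn 0.13+ requires it with palette
sns.countplot(y='type', data=df, order=df['type'].value_counts().index, hue='type', palette='pastel', legend=False)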
plt.title('Number of Anime by Type')
plt.xlabel('Count')
plt.ylabel('Type')
plt.show()
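
The count plot can be backed by exact numbers. The following is a minimal sketch, assuming a made-up `type` column in `sample` (not the real data), that puts counts and percentage shares side by side using `value_counts`.

```python
import pandas as pd

# Made-up stand-in for df['type']; the None row shows that missing values are skipped.
sample = pd.DataFrame({"type": ["TV", "TV", "Movie", "OVA", "TV", None]})

counts = sample["type"].value_counts()
shares = sample["type"].value_counts(normalize=True).mul(100).round(1)

# Both Series share the same index (the type labels), so they align row by row.
summary = pd.DataFrame({"count": counts, "percent": shares})
print(summary)
```

`value_counts` ignores missing values by default, so the percentages are shares of the rows that have a type.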


#### ✅ Step 5: 🎨 Data Visualization

In [None]:
# ⭐ Rating Distribution
plt.figure(figsize=(10,6))
sns.histplot(df['rating'], bins=20, kde=True, color='skyblue')
plt.title('Rating Distribution of Anime')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.show()

In [None]:

# Episodes vs rating scatter plot
plt.figure(figsize=(7,5))
sns.scatterplot(x='episodes', y='rating', data=df, color='blue', alpha=0.6)
plt.title('Episodes vs Rating')
plt.show()


In [None]:
# 📈 Scatter Plot: Members vs Rating
plt.figure(figsize=(10,6))
sns.scatterplot(data=df, x='members', y='rating', hue='type', alpha=0.7)
plt.title('Members vs Rating by Type')
plt.xlabel('Number of Members')
plt.ylabel('Rating')
plt.show()
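
A scatter plot shows the shape of a relationship; a correlation coefficient gives it a number. The following is a minimal sketch, assuming made-up `members` and `rating` values in `sample` (illustrative only), that computes the Pearson correlation with `DataFrame.corr`.

```python
import pandas as pd

# Made-up stand-in for df; values are illustrative only.
sample = pd.DataFrame({
    "members": [100, 500, 2000, 10000, 50000],
    "rating": [5.5, 6.0, 6.8, 7.4, 8.9],
})

# Pearson correlation matrix (the default method of DataFrame.corr).
corr_matrix = sample[["members", "rating"]].corr()
members_vs_rating = corr_matrix.loc["members", "rating"]

print(corr_matrix)
print("Correlation between members and rating:", round(members_vs_rating, 3))
```

A value near 1 or -1 suggests a strong linear relationship, while a value near 0 suggests a weak one. Correlation alone does not show that one column causes the other.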

In [None]:
# ⭐ Rating Distribution
plt.figure(figsize=(10,6))
sns.histplot(df['rating'], bins=20, kde=True, color='orange')
plt.title('Rating Distribution of Anime')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.show()

#### ✅ Step 6: Top 10 Anime by Score

In [None]:
top10 = df.sort_values(by='rating', ascending=False).head(10)
print("\033[95mTop 10 Anime by Score:\033[0m")
print(top10[['name', 'rating']])


In [None]:
# 🌟 Top Rated Anime (with more than 1000 members)
top_rated = df[df['members']>1000].sort_values('rating', ascending=False).head(10)
plt.figure(figsize=(12,6))
# hue is set because seaborn 0.13+ requires it whenever palette is used
sns.barplot(x='rating', y='name', data=top_rated, hue='name', palette='viridis', legend=False)
plt.title('Top 10 Highest Rated Anime (with >1000 members)')
plt.xlabel('Rating')
plt.ylabel('Anime Name')
plt.show()
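
The filter-then-sort pattern above can be wrapped in a small function so the member threshold and the list length are easy to change. This is a minimal sketch, assuming a made-up frame `sample`; `top_rated_titles` is a hypothetical helper name, not part of the original notebook.

```python
import pandas as pd

def top_rated_titles(frame: pd.DataFrame, min_members: int = 1000, n: int = 10) -> pd.DataFrame:
    """Keep titles with more than min_members members, then return the n highest rated."""
    popular = frame[frame["members"] > min_members]
    return popular.sort_values("rating", ascending=False).head(n)

# Made-up stand-in for df; values are illustrative only.
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "rating": [9.8, 8.5, 9.0, 7.0],
    "members": [50, 5000, 1200, 30000],
})

result = top_rated_titles(sample, min_members=1000, n=2)
print(result[["name", "rating", "members"]])
```

Title `A` has the highest rating but only 50 members, so the threshold removes it. This is why the filtered top 10 can differ from the unfiltered list in the previous cell.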


#### ✅ Step 7: Conclusion

In [None]:
print("\033[94mEDA Complete - Dataset insights generated!\033[0m")


#### ✅ Step 8: Simple Machine Learning Model


In [None]:
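# Use the same column names as the cells above ('episodes', 'rating')
df['episodes'] = pd.to_numeric(df['episodes'], errors='coerce')  # non-numeric entries, if any, become NaN
df = df.dropna(subset=['episodes', 'rating'])
X = df[['episodes']]
y = df['rating']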


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
# squared=False was removed in scikit-learn 1.6, so take the square root of the MSE instead
rmse = mean_squared_error(y_test, predictions) ** 0.5
print("\033[94mRMSE of Linear Regression:\033[0m", rmse)
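
An RMSE value is hard to read on its own. One common check is to compare it with a baseline that always predicts the mean of the training targets. The following is a minimal sketch with made-up numbers (not results from this notebook); `rmse_score` and the `*_demo` arrays are hypothetical names.

```python
import numpy as np

def rmse_score(actual, predicted) -> float:
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up numbers standing in for y_train, y_test and the model's predictions.
y_train_demo = np.array([6.0, 7.0, 8.0])
y_test_demo = np.array([6.5, 7.5])
model_preds_demo = np.array([6.4, 7.7])

# Baseline: predict the training mean for every test row.
baseline_preds = np.full_like(y_test_demo, y_train_demo.mean())

baseline_rmse = rmse_score(y_test_demo, baseline_preds)
model_rmse = rmse_score(y_test_demo, model_preds_demo)
print("Baseline RMSE:", baseline_rmse)
print("Model RMSE:", model_rmse)
```

If the model's RMSE is not clearly below the baseline, the chosen feature is adding little predictive value.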



#### ✅ Step 9: Data Visualization - Predicted vs Actual


In [None]:
plt.figure(figsize=(7,5))
plt.scatter(y_test, predictions, color='blue')
plt.xlabel('Actual Score')
plt.ylabel('Predicted Score')
plt.title('Actual vs Predicted Score')
plt.show()
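
An actual-vs-predicted plot is easier to read with a diagonal reference line, because points on that line are perfect predictions. The following is a minimal sketch with made-up values; `plot_actual_vs_predicted` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_actual_vs_predicted(actual, predicted):
    """Scatter actual against predicted values and add a y = x reference line."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)

    fig, ax = plt.subplots(figsize=(7, 5))
    ax.scatter(actual, predicted, color="blue", alpha=0.6)

    # The diagonal spans the full range of both axes.
    low = min(actual.min(), predicted.min())
    high = max(actual.max(), predicted.max())
    ax.plot([low, high], [low, high], color="red", linestyle="--", label="Perfect prediction")

    ax.set_xlabel("Actual Score")
    ax.set_ylabel("Predicted Score")
    ax.set_title("Actual vs Predicted Score")
    ax.legend()
    return ax

# Made-up values; in the notebook these would be y_test and predictions.
ax = plot_actual_vs_predicted([6.5, 7.5, 8.0], [6.4, 7.7, 7.2])
```

In a notebook the figure is usually displayed when the cell finishes; `plt.show()` can be called explicitly otherwise. Points far from the dashed line are the predictions with the largest errors.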


### _____________________________________________________________

#### 📊 Anime Dataset - Exploratory Data Analysis
#### **Dataset:** Kaggle Anime Dataset by wasiqaliyasir

#### ✅ Notebook Goals:
- Load the dataset online (KaggleHub)
- Perform EDA (summary, missing values)
- Visualize distributions & relationships
- Share top insights & summary

#### 📌 Key Features:
- English-mix explanations
- Purple & blue colored print statements for attractive output
- Detailed code with comments
- Size under 1MB

#### ✅ End of Notebook - Thank you for exploring!


<p style="text-align:center;"><span style="font-size:80px;"><span style="color:navy"><span style="font-family:cursive;"><span style="font-weight:1000">Bye 👋</span></span></span></span></p>