# Dashboard prototype charts

This notebook explores the _Anime_ dataset scraped from the internet in order to identify the most relevant information to display in a future dashboard.

First, we import the required libraries.

In [1]:
import os
import re
import pandas as pd
import matplotlib.pyplot as plt

from ast import literal_eval

# Relative paths
# `__file__` is not defined inside a notebook, so it is set manually
__file__ = 'dash_proto.ipynb'
CURRENT = os.path.dirname(os.path.abspath(__file__))
ROOT = os.path.dirname(CURRENT)
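
Because `os.path.abspath` resolves a relative file name against the current working directory, `CURRENT` above is effectively the directory the notebook is run from. The same layout could be expressed with `pathlib`; this is only a sketch, and it assumes the notebook is started from its own folder and that the `data/clean/` folder sits under the project root as in the next cell.

```python
from pathlib import Path

# Assumption: the kernel's working directory is the notebook's folder
CURRENT = Path.cwd()
# Project root is one level above the notebook folder
ROOT = CURRENT.parent
# Same relative location as the CSV loaded in the next cell
DATA_PATH = ROOT / "data" / "clean" / "anime_data_clean.csv"

print(DATA_PATH)
```

A `Path` object can be passed directly to `pd.read_csv`, so the rest of the notebook would not need to change.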

We start with an initial inspection of the dataset.

In [2]:
DATA_PATH = os.path.join(ROOT, 'data/clean/anime_data_clean.csv')

anime = pd.read_csv(DATA_PATH)
anime.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 65 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   ranking           100 non-null    int64  
 1   score             100 non-null    float64
 2   title             100 non-null    object 
 3   emission_date     100 non-null    object 
 4   url               100 non-null    object 
 5   studio            100 non-null    object 
 6   themes            100 non-null    object 
 7   genres            100 non-null    object 
 8   demographics      60 non-null     object 
 9   emission_type     100 non-null    object 
 10  number_episode    99 non-null     float64
 11  members           100 non-null    int64  
 12  first_emission    98 non-null     object 
 13  last_emission     97 non-null     object 
 14  Adult Cast        100 non-null    int64  
 15  Anthropomorphic   100 non-null    int64  
 16  CGDCT             100 non-null    int64  
 17
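
The `info()` summary shows that a few columns (`demographics`, `number_episode`, `first_emission`, `last_emission`) have fewer non-null values than rows. A quick way to list only the columns with missing values might look like the sketch below. The `toy` frame and its values are purely illustrative stand-ins, not rows from the real dataset.

```python
import pandas as pd

# Toy stand-in for the real dataset (illustrative values only)
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "demographics": ["Shounen", None, None],
    "number_episode": [12.0, None, 24.0],
})

# Count missing values per column and keep only columns that have some
missing = toy.isna().sum()
missing = missing[missing > 0].sort_values(ascending=False)

print(missing)
```

Applied to `anime`, the same two lines would give the missing-value counts without scanning the full `info()` listing.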

Global string formatting functions

In [3]:
def global_format(x):
    # Collapse each run of '°', '.' or '-' into a single underscore
    x = re.sub(pattern=r"[°.-]+", repl='_', string=x)
    # Drop the possessive "'s"
    x = re.sub(pattern="'s", repl='', string=x)
    return x
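
To make the effect of `global_format` concrete, the snippet below repeats the function so it runs on its own and applies it to a few sample strings. The sample inputs are hypothetical and chosen only to exercise each substitution.

```python
import re

def global_format(x):
    # Same logic as the notebook cell, repeated so the snippet is self-contained
    x = re.sub(pattern=r"[°.-]+", repl='_', string=x)
    x = re.sub(pattern="'s", repl='', string=x)
    return x

# Hypothetical inputs, already lowercased as in the later transformations
samples = ["sci-fi", "j.c.staff", "children's"]
formatted = {s: global_format(s) for s in samples}

print(formatted)
# {'sci-fi': 'sci_fi', 'j.c.staff': 'j_c_staff', "children's": 'children'}
```

Note that the function does not touch whitespace or letter case; those are handled separately before it is called.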

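The encoded columns start at index 14, right after `last_emission`. We keep only the first 13 columns (`ranking` through `first_emission`; the slice `[0:13]` also leaves out `last_emission`) and apply some convenient transformations for catalog building and further analysis.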

In [4]:
# Format dataset column names in snake case
anime.columns = ['_'.join(col_.split(sep=' ')).lower() for col_ in anime.columns]

# Convenient dataset content transformations
test = (
    anime
    # Ignore encoded columns
    .filter(items=anime.columns[0:13])
    # Transformations in existing columns
    .assign(
        # studio column in snake case
        studio = lambda df_: (
            df_
            .studio
            # Replace whitespace with '_' in column
            .apply( lambda val_: '_'.join(val_.split(sep=' ')).lower() )
            # Remove abnormal characters in column
            .apply(global_format)
        )
        # themes column in snake case
        ,themes = lambda df_: (
            df_
            .themes
            .apply(
                # Additional lambda required to handle list like data in cell
                lambda val_: [
                    # Remove abnormal characters in column
                    global_format(
                        # Replace whitespace with '_' in element
                        '_'.join(k.split(' ')).lower()
                    ) for k in literal_eval(val_) ]
            )
        )
        # genres column in snake case
        ,genres = lambda df_: (
            df_.genres.apply(
                # Additional lambda required to handle list like data in cell
                lambda val_: [
                    # Remove abnormal characters in column
                    global_format(
                        # Replace whitespace with '_' in element
                        '_'.join(k.split(' ')).lower()
                    ) for k in literal_eval(val_) ]
            )
        )
    )
)

test.head()

| | ranking | score | title | emission_date | url | studio | themes | genres | demographics | emission_type | number_episode | members | first_emission |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 9.13 | Shingeki no Kyojin: The Final Season - Kankets... | Mar 2023 - 2023 | https://myanimelist.net/anime/51535/Shingeki_n... | mappa | [gore, military, survival] | [action, drama, suspense] | Shounen | Special | 2.0 | 372149 | 2023-03-01 |
| 1 | 2 | 9.11 | Fullmetal Alchemist: Brotherhood | Apr 2009 - Jul 2010 | https://myanimelist.net/anime/5114/Fullmetal_A... | bones | [military] | [action, adventure, drama, fantasy] | Shounen | TV | 64.0 | 3120633 | 2009-04-01 |
| 2 | 3 | 9.08 | Bleach: Sennen Kessen-hen | Oct 2022 - Dec 2022 | https://myanimelist.net/anime/41467/Bleach__Se... | pierrot | [not_available] | [action, adventure, fantasy] | Shounen | TV | 13.0 | 411507 | 2022-10-01 |
| 3 | 4 | 9.08 | Steins;Gate | Apr 2011 - Sep 2011 | https://myanimelist.net/anime/9253/Steins_Gate | white_fox | [psychological, time_travel] | [drama, sci_fi, suspense] | | TV | 24.0 | 2401772 | 2011-04-01 |
| 4 | 5 | 9.07 | Gintama° | Apr 2015 - Mar 2016 | https://myanimelist.net/anime/28977/Gintama° | bandai_namco_pictures | [gag_humor, historical, parody, samurai] | [action, comedy, sci_fi] | Shounen | TV | 51.0 | 584278 | 2015-04-01 |
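
The `themes` and `genres` transformations repeat the same three steps: parse the stringified list with `literal_eval`, snake-case each element, and normalise special characters. One possible way to factor this out is sketched below. The helper names `to_snake` and `parse_list_cell`, the `toy` frame, and the use of `explode` to build a catalog of distinct values are assumptions made for illustration, not part of the original notebook.

```python
import re
from ast import literal_eval

import pandas as pd

def to_snake(text):
    """Lowercase, join words with '_' and normalise special characters."""
    text = "_".join(text.split(" ")).lower()
    text = re.sub(r"[°.-]+", "_", text)
    return re.sub(r"'s", "", text)

def parse_list_cell(cell):
    """Parse a stringified list and snake-case each of its elements."""
    return [to_snake(item) for item in literal_eval(cell)]

# Toy data mimicking list-like cells stored as strings (illustrative only)
toy = pd.DataFrame({
    "title": ["X", "Y"],
    "genres": ["['Action', 'Sci-Fi']", "['Drama', 'Slice of Life']"],
})

toy["genres"] = toy["genres"].apply(parse_list_cell)

# One row per (title, genre) pair, then the sorted set of distinct genres
catalog = sorted(toy["genres"].explode().unique())

print(catalog)
# ['action', 'drama', 'sci_fi', 'slice_of_life']
```

With helpers like these, the `themes` and `genres` entries in `.assign(...)` could each shrink to a single `.apply(parse_list_cell)` call.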
