# Readme

Before running, import into cluster FileStore/tables. All files should be `gzip`ed already.

* All of [IMDB Non-Commerical Dataset](https://datasets.imdbws.com/) in .gz format
    * `title.akas.tsv.gz`
    * `title.ratings.tsv.gz`
    * `title.principals.tsv.gz`
    * `title.episode.tsv.gz`
    * `title.crew.tsv.gz`
    * `title.basics.tsv.gz`
    * `name.basics.tsv.gz`
* All files from the [Kaggle Anime Dataset](https://www.kaggle.com/datasets/dbdmobile/myanimelist-dataset)
    * `anime_dataset.csv.gz`
        > This dataset contains comprehensive details of 24,905 anime entries.
    * `anime_filtered.csv.gz`
        > This dataset provide information about the different attributes and characteristics of each anime (Based on 2020 data).
    * `final_animedataset.csv.gz`
        * Note, this file needs to be compressed with gzip and uploaded by itself
        > This dataset contains user ratings and information about various anime titles. It is curated for building an anime recommendation system(Based on 2018 data).
    * `user_filtered.csv.gz`
        > This dataset contains the user's ratings for every anime they watched and rated(Based on 2020 data).
    * `users_details_2023.csv.gz`
        > This dataset comprises information on 731,290 users registered on the MyAnimeList platform. It is worth noting that while a significant portion of these users are genuine anime enthusiasts, there may be instances of bots, inactive accounts, and alternate profiles present within the dataset.
    * `users_score_2023.csv.gz`
        > This dataset comprises anime scores provided by 270,033 users, resulting in a total of 24,325,191 rows or samples.
* Justin Huang's Anime IMDB scrape
    * `anime_omdb_data.csv.gz`
        > This dataset is an extraction of the top 800ish animes using IMDB filters, which is superior due to additional annotations that don't exist in the non-commerical IMDB dataset.

# Import data

In [0]:
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from pyspark.sql.functions import col
from pyspark.sql import DataFrame

In [0]:
files = dbutils.fs.ls("dbfs:/FileStore/tables/")
for file in files:
    print(file.path)

dbfs:/FileStore/tables/anime_dataset_2023_csv.gz
dbfs:/FileStore/tables/anime_filtered_csv.gz
dbfs:/FileStore/tables/anime_imdb_data_csv.gz
dbfs:/FileStore/tables/final_animedataset_csv.gz
dbfs:/FileStore/tables/imdb_scraped_datav2_csv.gz
dbfs:/FileStore/tables/name_basics_tsv.gz
dbfs:/FileStore/tables/title_akas.tsv
dbfs:/FileStore/tables/title_akas_tsv.gz
dbfs:/FileStore/tables/title_basics_tsv.gz
dbfs:/FileStore/tables/title_crew_tsv.gz
dbfs:/FileStore/tables/title_episode_tsv.gz
dbfs:/FileStore/tables/title_principals_tsv.gz
dbfs:/FileStore/tables/title_ratings_tsv.gz
dbfs:/FileStore/tables/user_filtered_csv.gz
dbfs:/FileStore/tables/users_details_2023_csv.gz
dbfs:/FileStore/tables/users_score_2023_csv.gz


In [0]:
def add_prefix_to_columns(df: DataFrame, prefix: str) -> DataFrame:
    return df.select([col(c).alias(prefix + "_" + c) for c in df.columns])

## Import Kaggle

In [0]:
anime_dataset = spark.read.option("header", "true") \
    .option("inferSchema", "true") \
    .option("multiLine", "true") \
    .csv("dbfs:/FileStore/tables/anime_dataset_2023_csv.gz")

prefix = "anime"
anime_dataset = add_prefix_to_columns(anime_dataset, prefix)

anime_filtered = spark.read.option("header", "true") \
    .option("inferSchema", "true") \
    .option("multiLine", "true") \
    .csv("dbfs:/FileStore/tables/anime_filtered_csv.gz")

prefix = "anime_filtered"
anime_filtered = add_prefix_to_columns(anime_filtered, prefix)

final_animedataset = spark.read.option("header", "true") \
    .option("inferSchema", "true") \
    .option("multiLine", "true") \
    .csv("dbfs:/FileStore/tables/final_animedataset_csv.gz")

prefix = "final_animedataset"
final_animedataset = add_prefix_to_columns(final_animedataset, prefix)

user_filtered = spark.read.option("header", "true") \
    .option("inferSchema", "true") \
    .option("multiLine", "true") \
    .csv("dbfs:/FileStore/tables/user_filtered_csv.gz")

prefix = "user_filtered"
user_filtered = add_prefix_to_columns(user_filtered, prefix)

users_details_2023 = spark.read.option("header", "true") \
    .option("inferSchema", "true") \
    .option("multiLine", "true") \
    .csv("dbfs:/FileStore/tables/users_details_2023_csv.gz")

prefix = "users_details_2023"
users_details_2023 = add_prefix_to_columns(users_details_2023, prefix)

users_score_2023 = spark.read.option("header", "true") \
    .option("inferSchema", "true") \
    .option("multiLine", "true") \
    .csv("dbfs:/FileStore/tables/users_score_2023_csv.gz")

prefix = "users_score_2023"
users_score_2023 = add_prefix_to_columns(users_score_2023, prefix)

# anime_dataset.csv.gz
# anime_filtered.csv.gz
# final_animedataset.csv.gz
# user_filtered.csv.gz
# users_details_2023.csv.gz
# users_score_2023.csv.gz

## Import IMDB Non-Commerical

In [0]:
imdb_title = spark.read.option("header", "true").option("delimiter", "\t").option("inferSchema", "true").csv("dbfs:/FileStore/tables/title_akas_tsv.gz")

prefix = "imdb_title"
imdb_title = add_prefix_to_columns(imdb_title, prefix)

imdb_ratings = spark.read.option("header", "true").option("delimiter", "\t").option("inferSchema", "true").csv("dbfs:/FileStore/tables/title_ratings_tsv.gz")

prefix = "imdb_ratings"
imdb_ratings = add_prefix_to_columns(imdb_ratings, prefix)

imdb_principals = spark.read.option("header", "true").option("delimiter", "\t").option("inferSchema", "true").csv("dbfs:/FileStore/tables/title_principals_tsv.gz")

prefix = "imdb_principals"
imdb_principals = add_prefix_to_columns(imdb_principals, prefix)

imdb_episode = spark.read.option("header", "true").option("delimiter", "\t").option("inferSchema", "true").csv("dbfs:/FileStore/tables/title_episode_tsv.gz")

prefix = "imdb_episode"
imdb_episode = add_prefix_to_columns(imdb_episode, prefix)

imdb_crew = spark.read.option("header", "true").option("delimiter", "\t").option("inferSchema", "true").csv("dbfs:/FileStore/tables/title_crew_tsv.gz")

prefix = "imdb_crew"
imdb_crew = add_prefix_to_columns(imdb_crew, prefix)

imdb_title_basics = spark.read.option("header", "true").option("delimiter", "\t").option("inferSchema", "true").csv("dbfs:/FileStore/tables/title_basics_tsv.gz")

prefix = "imdb_title_basics"
imdb_title_basics = add_prefix_to_columns(imdb_title_basics, prefix)

imdb_name_basics = spark.read.option("header", "true").option("delimiter", "\t").option("inferSchema", "true").csv("dbfs:/FileStore/tables/name_basics_tsv.gz")

prefix = "imdb_name_basics"
imdb_name_basics = add_prefix_to_columns(imdb_name_basics, prefix)



## Import IMDB Scrape from Justin Huang

In [0]:
imdb_scrape = spark.read.option("header", "true") \
    .option("inferSchema", "true") \
    .csv("dbfs:/FileStore/tables/imdb_scraped_datav2_csv.gz")

prefix = "imdb_scrape"
imdb_scrape = add_prefix_to_columns(imdb_scrape, prefix)

# Inspect data

## IMDB Data

In [0]:
imdb_title.printSchema()
x = imdb_title.count()
print(x)
imdb_title.show(n=5)

root
 |-- imdb_title_titleId: string (nullable = true)
 |-- imdb_title_ordering: integer (nullable = true)
 |-- imdb_title_title: string (nullable = true)
 |-- imdb_title_region: string (nullable = true)
 |-- imdb_title_language: string (nullable = true)
 |-- imdb_title_types: string (nullable = true)
 |-- imdb_title_attributes: string (nullable = true)
 |-- imdb_title_isOriginalTitle: string (nullable = true)

38853125
+------------------+-------------------+--------------------+-----------------+-------------------+----------------+---------------------+--------------------------+
|imdb_title_titleId|imdb_title_ordering|    imdb_title_title|imdb_title_region|imdb_title_language|imdb_title_types|imdb_title_attributes|imdb_title_isOriginalTitle|
+------------------+-------------------+--------------------+-----------------+-------------------+----------------+---------------------+--------------------------+
|         tt0000001|                  1|          Карменсіта|               UA

In [0]:
imdb_title.filter(F.lower(F.col("title")) == 'cowboy bebop').show()

+------------------+-------------------+----------------+-----------------+-------------------+----------------+---------------------+--------------------------+
|imdb_title_titleId|imdb_title_ordering|imdb_title_title|imdb_title_region|imdb_title_language|imdb_title_types|imdb_title_attributes|imdb_title_isOriginalTitle|
+------------------+-------------------+----------------+-----------------+-------------------+----------------+---------------------+--------------------------+
|         tt0213338|                 10|    Cowboy Bebop|               AU|                 \N|     imdbDisplay|                   \N|                         0|
|         tt0213338|                 12|    Cowboy Bebop|               IL|                 en|     imdbDisplay|                   \N|                         0|
|         tt0213338|                 14|    Cowboy Bebop|               FI|                 \N|     imdbDisplay|                   \N|                         0|
|         tt0213338|        

In [0]:
imdb_title.filter(F.col("titleId") == "tt0213338").show(n=100)

+------------------+-------------------+--------------------+-----------------+-------------------+----------------+---------------------+--------------------------+
|imdb_title_titleId|imdb_title_ordering|    imdb_title_title|imdb_title_region|imdb_title_language|imdb_title_types|imdb_title_attributes|imdb_title_isOriginalTitle|
+------------------+-------------------+--------------------+-----------------+-------------------+----------------+---------------------+--------------------------+
|         tt0213338|                 10|        Cowboy Bebop|               AU|                 \N|     imdbDisplay|                   \N|                         0|
|         tt0213338|                 11|      Kaubôi bibappu|               AE|                 \N|     imdbDisplay|                   \N|                         0|
|         tt0213338|                 12|        Cowboy Bebop|               IL|                 en|     imdbDisplay|                   \N|                         0|
|   

In [0]:
imdb_title.filter((F.col("titleId") == "tt0213338") & (F.col("region") == "US")).show()

+------------------+-------------------+----------------+-----------------+-------------------+----------------+---------------------+--------------------------+
|imdb_title_titleId|imdb_title_ordering|imdb_title_title|imdb_title_region|imdb_title_language|imdb_title_types|imdb_title_attributes|imdb_title_isOriginalTitle|
+------------------+-------------------+----------------+-----------------+-------------------+----------------+---------------------+--------------------------+
|         tt0213338|                 21|    Cowboy Bebop|               US|                 \N|     imdbDisplay|                   \N|                         0|
+------------------+-------------------+----------------+-----------------+-------------------+----------------+---------------------+--------------------------+



In [0]:
imdb_title.filter((F.col("titleId") == "tt0213338") & (F.col("region") == "JP")).show()

+------------------+-------------------+------------------+-----------------+-------------------+----------------+---------------------+--------------------------+
|imdb_title_titleId|imdb_title_ordering|  imdb_title_title|imdb_title_region|imdb_title_language|imdb_title_types|imdb_title_attributes|imdb_title_isOriginalTitle|
+------------------+-------------------+------------------+-----------------+-------------------+----------------+---------------------+--------------------------+
|         tt0213338|                 18|カウボーイビバップ|               JP|                 ja|     imdbDisplay|                   \N|                         0|
|         tt0213338|                 51|      Cowboy Bebop|               JP|                 en|              \N| literal English t...|                         0|
+------------------+-------------------+------------------+-----------------+-------------------+----------------+---------------------+--------------------------+



In [0]:
imdb_title_basics.filter(F.col("tconst") == "tt0213338").show(truncate=False)

+------------------------+---------------------------+------------------------------+-------------------------------+-------------------------+---------------------------+-------------------------+--------------------------------+--------------------------+
|imdb_title_basics_tconst|imdb_title_basics_titleType|imdb_title_basics_primaryTitle|imdb_title_basics_originalTitle|imdb_title_basics_isAdult|imdb_title_basics_startYear|imdb_title_basics_endYear|imdb_title_basics_runtimeMinutes|imdb_title_basics_genres  |
+------------------------+---------------------------+------------------------------+-------------------------------+-------------------------+---------------------------+-------------------------+--------------------------------+--------------------------+
|tt0213338               |tvSeries                   |Cowboy Bebop                  |Kaubôi bibappu: Cowboy Bebop   |0                        |1998                       |1999                     |650                          

In [0]:
imdb_ratings.filter(F.col("tconst") == "tt0213338").show(truncate=False)

+-------------------+--------------------------+---------------------+
|imdb_ratings_tconst|imdb_ratings_averageRating|imdb_ratings_numVotes|
+-------------------+--------------------------+---------------------+
|tt0213338          |8.9                       |136589               |
+-------------------+--------------------------+---------------------+



In [0]:
imdb_episode.filter(F.col("parentTconst") == "tt0213338").show(truncate=False)

+-------------------+-------------------------+-------------------------+--------------------------+
|imdb_episode_tconst|imdb_episode_parentTconst|imdb_episode_seasonNumber|imdb_episode_episodeNumber|
+-------------------+-------------------------+-------------------------+--------------------------+
|tt0618963          |tt0213338                |1                        |1                         |
|tt0618964          |tt0213338                |1                        |5                         |
|tt0618965          |tt0213338                |1                        |23                        |
|tt0618966          |tt0213338                |1                        |22                        |
|tt0618967          |tt0213338                |1                        |10                        |
|tt0618968          |tt0213338                |1                        |4                         |
|tt0618969          |tt0213338                |1                        |24                

In [0]:
imdb_name_basics.printSchema()
x = imdb_name_basics.count()
print(x)
imdb_name_basics.show(n=5)

root
 |-- imdb_name_basics_nconst: string (nullable = true)
 |-- imdb_name_basics_primaryName: string (nullable = true)
 |-- imdb_name_basics_birthYear: string (nullable = true)
 |-- imdb_name_basics_deathYear: string (nullable = true)
 |-- imdb_name_basics_primaryProfession: string (nullable = true)
 |-- imdb_name_basics_knownForTitles: string (nullable = true)

13290477
+-----------------------+----------------------------+--------------------------+--------------------------+----------------------------------+-------------------------------+
|imdb_name_basics_nconst|imdb_name_basics_primaryName|imdb_name_basics_birthYear|imdb_name_basics_deathYear|imdb_name_basics_primaryProfession|imdb_name_basics_knownForTitles|
+-----------------------+----------------------------+--------------------------+--------------------------+----------------------------------+-------------------------------+
|              nm0000001|                Fred Astaire|                      1899|                

In [0]:
imdb_crew.printSchema()
x = imdb_crew.count()
print(x)
imdb_crew.show(n=5)

root
 |-- imdb_crew_tconst: string (nullable = true)
 |-- imdb_crew_directors: string (nullable = true)
 |-- imdb_crew_writers: string (nullable = true)

10581319
+----------------+-------------------+-----------------+
|imdb_crew_tconst|imdb_crew_directors|imdb_crew_writers|
+----------------+-------------------+-----------------+
|       tt0000001|          nm0005690|               \N|
|       tt0000002|          nm0721526|               \N|
|       tt0000003|          nm0721526|               \N|
|       tt0000004|          nm0721526|               \N|
|       tt0000005|          nm0005690|               \N|
+----------------+-------------------+-----------------+
only showing top 5 rows



In [0]:
imdb_episode.printSchema()
x = imdb_episode.count()
print(x)
imdb_episode.show(n=5)

root
 |-- imdb_episode_tconst: string (nullable = true)
 |-- imdb_episode_parentTconst: string (nullable = true)
 |-- imdb_episode_seasonNumber: string (nullable = true)
 |-- imdb_episode_episodeNumber: string (nullable = true)

8091190
+-------------------+-------------------------+-------------------------+--------------------------+
|imdb_episode_tconst|imdb_episode_parentTconst|imdb_episode_seasonNumber|imdb_episode_episodeNumber|
+-------------------+-------------------------+-------------------------+--------------------------+
|          tt0041951|                tt0041038|                        1|                         9|
|          tt0042816|                tt0989125|                        1|                        17|
|          tt0042889|                tt0989125|                       \N|                        \N|
|          tt0043426|                tt0040051|                        3|                        42|
|          tt0043631|                tt0989125|         

In [0]:
imdb_principals.printSchema()
x = imdb_principals.count()
print(x)
imdb_principals.show(n=5)

root
 |-- imdb_principals_tconst: string (nullable = true)
 |-- imdb_principals_ordering: integer (nullable = true)
 |-- imdb_principals_nconst: string (nullable = true)
 |-- imdb_principals_category: string (nullable = true)
 |-- imdb_principals_job: string (nullable = true)
 |-- imdb_principals_characters: string (nullable = true)

60643693
+----------------------+------------------------+----------------------+------------------------+--------------------+--------------------------+
|imdb_principals_tconst|imdb_principals_ordering|imdb_principals_nconst|imdb_principals_category| imdb_principals_job|imdb_principals_characters|
+----------------------+------------------------+----------------------+------------------------+--------------------+--------------------------+
|             tt0000001|                       1|             nm1588970|                    self|                  \N|                  ["Self"]|
|             tt0000001|                       2|             nm0005690

## Anime dataset

In [0]:
x = anime_dataset.count()
print(x)
anime_dataset.printSchema()
anime_dataset.show(n=5, truncate=False)
anime_dataset.filter(F.col('anime_id') == '1').collect()

28829
root
 |-- anime_anime_id: string (nullable = true)
 |-- anime_Name: string (nullable = true)
 |-- anime_English name: string (nullable = true)
 |-- anime_Other name: string (nullable = true)
 |-- anime_Score: string (nullable = true)
 |-- anime_Genres: string (nullable = true)
 |-- anime_Synopsis: string (nullable = true)
 |-- anime_Type: string (nullable = true)
 |-- anime_Episodes: string (nullable = true)
 |-- anime_Aired: string (nullable = true)
 |-- anime_Premiered: string (nullable = true)
 |-- anime_Status: string (nullable = true)
 |-- anime_Producers: string (nullable = true)
 |-- anime_Licensors: string (nullable = true)
 |-- anime_Studios: string (nullable = true)
 |-- anime_Source: string (nullable = true)
 |-- anime_Duration: string (nullable = true)
 |-- anime_Rating: string (nullable = true)
 |-- anime_Rank: string (nullable = true)
 |-- anime_Popularity: string (nullable = true)
 |-- anime_Favorites: string (nullable = true)
 |-- anime_Scored By: string (nullable

In [0]:
anime_filtered.printSchema()
x = anime_filtered.count()
print(x)
anime_filtered.show(n=5)
anime_filtered.filter(F.col('anime_id') == 1).collect()

root
 |-- anime_filtered_anime_id: integer (nullable = true)
 |-- anime_filtered_Name: string (nullable = true)
 |-- anime_filtered_Score: string (nullable = true)
 |-- anime_filtered_Genres: string (nullable = true)
 |-- anime_filtered_English name: string (nullable = true)
 |-- anime_filtered_Japanese name: string (nullable = true)
 |-- anime_filtered_sypnopsis: string (nullable = true)
 |-- anime_filtered_Type: string (nullable = true)
 |-- anime_filtered_Episodes: string (nullable = true)
 |-- anime_filtered_Aired: string (nullable = true)
 |-- anime_filtered_Premiered: string (nullable = true)
 |-- anime_filtered_Producers: string (nullable = true)
 |-- anime_filtered_Licensors: string (nullable = true)
 |-- anime_filtered_Studios: string (nullable = true)
 |-- anime_filtered_Source: string (nullable = true)
 |-- anime_filtered_Duration: string (nullable = true)
 |-- anime_filtered_Rating: string (nullable = true)
 |-- anime_filtered_Ranked: string (nullable = true)
 |-- anime_fil

In [0]:
final_animedataset.printSchema()
x = final_animedataset.count()
print(x)
final_animedataset.show(n=5)
final_animedataset.filter(F.col('anime_id') == 1)

root
 |-- final_animedataset_username: string (nullable = true)
 |-- final_animedataset_anime_id: integer (nullable = true)
 |-- final_animedataset_my_score: integer (nullable = true)
 |-- final_animedataset_user_id: integer (nullable = true)
 |-- final_animedataset_gender: string (nullable = true)
 |-- final_animedataset_title: string (nullable = true)
 |-- final_animedataset_type: string (nullable = true)
 |-- final_animedataset_source: string (nullable = true)
 |-- final_animedataset_score: string (nullable = true)
 |-- final_animedataset_scored_by: double (nullable = true)
 |-- final_animedataset_rank: double (nullable = true)
 |-- final_animedataset_popularity: double (nullable = true)
 |-- final_animedataset_genre: string (nullable = true)

35305695
+---------------------------+---------------------------+---------------------------+--------------------------+-------------------------+------------------------+-----------------------+-------------------------+---------------------

In [0]:
user_filtered.printSchema()
x = user_filtered.count()
print(x)
user_filtered.show(n=5)

root
 |-- user_filtered_user_id: integer (nullable = true)
 |-- user_filtered_anime_id: integer (nullable = true)
 |-- user_filtered_rating: integer (nullable = true)

109224747
+---------------------+----------------------+--------------------+
|user_filtered_user_id|user_filtered_anime_id|user_filtered_rating|
+---------------------+----------------------+--------------------+
|                    0|                    67|                   9|
|                    0|                  6702|                   7|
|                    0|                   242|                  10|
|                    0|                  4898|                   0|
|                    0|                    21|                  10|
+---------------------+----------------------+--------------------+
only showing top 5 rows



In [0]:
users_details_2023.printSchema()
x = users_details_2023.count()
print(x)
users_details_2023.show(n=5)

root
 |-- users_details_2023_Mal ID: integer (nullable = true)
 |-- users_details_2023_Username: string (nullable = true)
 |-- users_details_2023_Gender: string (nullable = true)
 |-- users_details_2023_Birthday: timestamp (nullable = true)
 |-- users_details_2023_Location: string (nullable = true)
 |-- users_details_2023_Joined: string (nullable = true)
 |-- users_details_2023_Days Watched: string (nullable = true)
 |-- users_details_2023_Mean Score: string (nullable = true)
 |-- users_details_2023_Watching: double (nullable = true)
 |-- users_details_2023_Completed: double (nullable = true)
 |-- users_details_2023_On Hold: double (nullable = true)
 |-- users_details_2023_Dropped: double (nullable = true)
 |-- users_details_2023_Plan to Watch: double (nullable = true)
 |-- users_details_2023_Total Entries: double (nullable = true)
 |-- users_details_2023_Rewatched: double (nullable = true)
 |-- users_details_2023_Episodes Watched: double (nullable = true)

731290
+--------------------

In [0]:
users_score_2023.printSchema()
x = users_score_2023.count()
print(x)
users_score_2023.show(n=5)

root
 |-- users_score_2023_user_id: integer (nullable = true)
 |-- users_score_2023_Username: string (nullable = true)
 |-- users_score_2023_anime_id: integer (nullable = true)
 |-- users_score_2023_Anime Title: string (nullable = true)
 |-- users_score_2023_rating: string (nullable = true)

24325191
+------------------------+-------------------------+-------------------------+----------------------------+-----------------------+
|users_score_2023_user_id|users_score_2023_Username|users_score_2023_anime_id|users_score_2023_Anime Title|users_score_2023_rating|
+------------------------+-------------------------+-------------------------+----------------------------+-----------------------+
|                       1|                    Xinil|                       21|                   One Piece|                      9|
|                       1|                    Xinil|                       48|                 .hack//Sign|                      7|
|                       1|            

## IMDB Scrape

In [0]:
imdb_scrape.printSchema()
x = imdb_scrape.count()
print(x)
imdb_scrape.show(n=5)

root
 |-- _c0: string (nullable = true)
 |-- title: string (nullable = true)
 |-- description: string (nullable = true)
 |-- url: string (nullable = true)
 |-- image: string (nullable = true)
 |-- rating_value: string (nullable = true)
 |-- genre: string (nullable = true)
 |-- content_rating: string (nullable = true)
 |-- creator: string (nullable = true)
 |-- main_cast: string (nullable = true)
 |-- keywords: string (nullable = true)
 |-- duration: string (nullable = true)
 |-- soundtrack_trackname: string (nullable = true)
 |-- lyrics: string (nullable = true)
 |-- music: string (nullable = true)
 |-- arrangement: string (nullable = true)
 |-- performed_by: string (nullable = true)
 |-- imdb_id: string (nullable = true)

53786
+---+--------------------+--------------------+--------------------+--------------------+------------+--------------------+--------------+--------------------+--------------------+--------------------+--------+--------------------+------+---------------+-------

# Clean data

In [0]:
# imdb_title_us = imdb_title.filter(F.col("region") == "US")

# # Join the filtered imdb_title DataFrame with the anime_dataset DataFrame
# # Assuming both DataFrames have a column named "title" for the join condition
# joined_df = anime_dataset.join(imdb_title_us, anime_dataset["English name"] == imdb_title_us["title"], how="inner")

# filtered_df = joined_df.filter(F.col('title') == 'Sleeping Beauty')
# row_count = filtered_df.count()
# print(row_count)

## IMDB Scrape data

In [0]:
imdb_scrape.printSchema()
imdb_scrape_row_count = imdb_scrape.count()
print(imdb_scrape_row_count)

root
 |-- imdb_scrape__c0: string (nullable = true)
 |-- imdb_scrape_title: string (nullable = true)
 |-- imdb_scrape_description: string (nullable = true)
 |-- imdb_scrape_url: string (nullable = true)
 |-- imdb_scrape_image: string (nullable = true)
 |-- imdb_scrape_rating_value: string (nullable = true)
 |-- imdb_scrape_genre: string (nullable = true)
 |-- imdb_scrape_content_rating: string (nullable = true)
 |-- imdb_scrape_creator: string (nullable = true)
 |-- imdb_scrape_main_cast: string (nullable = true)
 |-- imdb_scrape_keywords: string (nullable = true)
 |-- imdb_scrape_duration: string (nullable = true)
 |-- imdb_scrape_soundtrack_trackname: string (nullable = true)
 |-- imdb_scrape_lyrics: string (nullable = true)
 |-- imdb_scrape_music: string (nullable = true)
 |-- imdb_scrape_arrangement: string (nullable = true)
 |-- imdb_scrape_performed_by: string (nullable = true)
 |-- imdb_scrape_imdb_id: string (nullable = true)

53786


In [0]:
imdb_scrape_unique = imdb_scrape.select("imdb_scrape_imdb_id", "imdb_scrape_title").distinct()
imdb_scrape_unique_row_count = imdb_scrape_unique.count()
print(imdb_scrape_unique_row_count)


18957


In [0]:
imdb_scrape_music_unique_notnull = imdb_scrape_music_unique.filter(
    col("imdb_scrape_soundtrack_trackname").isNotNull() |
    col("imdb_scrape_lyrics").isNotNull() |
    col("imdb_scrape_music").isNotNull() |
    col("imdb_scrape_arrangement").isNotNull() |
    col("imdb_scrape_performed_by").isNotNull())

imdb_scrape_music_unique_notnull.show(n = 10)

imdb_scrape_music_unique_notnull_row_count = imdb_scrape_music_unique_notnull.count()
print(imdb_scrape_music_unique_notnull_row_count)

+-------------------+--------------------------------+--------------------+--------------------+-----------------------+------------------------+
|imdb_scrape_imdb_id|imdb_scrape_soundtrack_trackname|  imdb_scrape_lyrics|   imdb_scrape_music|imdb_scrape_arrangement|imdb_scrape_performed_by|
+-------------------+--------------------------------+--------------------+--------------------+-----------------------+------------------------+
|         tt29510641|                           Rouge|  Performed by Yu-Ka|                null|                   null|                    null|
|         tt26713948|               Shura ni Otoshite|Performed by Sajo...|Lyrics by Tatsuya...|   Composed by Tatsu...|    Arrangement by Ta...|
|         tt21209876|                           LEveL|     [Opening Theme]|     Hiroyuki Sawano|        Hiroyuki Sawano|    Lyrics by Hiroyuk...|
|         tt22248376|                          Yuusha|              (Hero)|     [Opening Theme]|         Music by Ayase|    

## V1 using IMDB US Names

imdb_title_us = imdb_title.filter(F.col("region") == "US")

v1_joined_df1 = imdb_title_us.join(imdb_scrape, imdb_title_us["titleId"] == imdb_scrape["imdb_id"], how="inner")\
    .drop(imdb_scrape["title"])

# val joinedDF = df1.alias("a").join(df2.alias("b"), $"a.id" === $"b.id")
# .select($"a.title".as("title_a"), $"b.title".as("title_b"), $"a.id")

v1_joined_df1.show(n=5)

v1_joined_df1_row_count = v1_joined_df1.count()
print(v1_joined_df1_row_count)
v1_joined_df1.printSchema()

v1_joined_df2 = anime_dataset.join(v1_joined_df1, anime_dataset["English name"] == v1_joined_df1["title"], how="inner")
v1_joined_df2.show(n=5)

v1_joined_df2_row_count = v1_joined_df2.count()
print(v1_joined_df2_row_count)
v1_joined_df2.printSchema()

## V2 using IMDB JP Names

imdb_title_jp = imdb_title.filter((F.col("region") == "JP") & (F.col("language") == "ja"))

imdb_title_jp.filter(F.col("titleId") == "tt0094625").show()
imdb_title_jp.filter(F.col("titleId") == "tt0088595").show()

In [0]:
from pyspark.sql.window import Window
from pyspark.sql.functions import row_number

windowSpec = Window.partitionBy("titleId").orderBy("ordering")

# Apply the row_number function within each partition
imdb_title_jp_rownumber = imdb_title_jp.withColumn("row_number", row_number().over(windowSpec))

# Filter to keep only the first row of each partition (lowest ordering)
imdb_title_jp_unique = imdb_title_jp_rownumber.filter(imdb_title_jp_rownumber.row_number == 1).drop("row_number")

# Show the result
imdb_title_jp_unique.filter(F.col("titleId") == "tt0094625").show()
imdb_title_jp_unique.filter(F.col("titleId") == "tt0088595").show()

+---------+--------+------+------+--------+-----------+----------+---------------+
|  titleId|ordering| title|region|language|      types|attributes|isOriginalTitle|
+---------+--------+------+------+--------+-----------+----------+---------------+
|tt0094625|      21|アキラ|    JP|      ja|alternative|        \N|              0|
+---------+--------+------+------+--------+-----------+----------+---------------+

+---------+--------+--------+------+--------+-----+--------------------+---------------+
|  titleId|ordering|   title|region|language|types|          attributes|isOriginalTitle|
+---------+--------+--------+------+--------+-----+--------------------+---------------+
|tt0088595|      16|Robotech|    JP|      ja|   \N|literal English t...|              0|
+---------+--------+--------+------+--------+-----+--------------------+---------------+



v2_joined_df1 = imdb_title_jp_unique.join(imdb_scrape, imdb_title_jp_unique["titleId"] == imdb_scrape["imdb_id"], how="inner")\
    .drop(imdb_scrape["title"])

# val joinedDF = df1.alias("a").join(df2.alias("b"), $"a.id" === $"b.id")
# .select($"a.title".as("title_a"), $"b.title".as("title_b"), $"a.id")

v2_joined_df1.show(n=5)

v2_joined_df1_row_count = v2_joined_df1.count()
print(v2_joined_df1_row_count)
v2_joined_df1.printSchema()

v2_joined_df2 = anime_dataset.join(v2_joined_df1, anime_dataset["Other name"] == v2_joined_df1["title"], how="inner")
v2_joined_df2.show(n=5)

v2_joined_df2_row_count = v2_joined_df2.count()
print(v2_joined_df2_row_count)
v2_joined_df2.printSchema()

## V3 - Iterative Data Matching

First start with higher confidence joins. With items that are not joinable, join on Japanese Language

In [0]:
# imdb_scrape gives is the in-scope animes. We need to join it to all possible names in imdb_title. This is imdb_title_inscope.

# Then we take all the possibilies in anime_dataset and and left join it

v3_imdb_titles = imdb_title.join(imdb_scrape_unique, imdb_title["imdb_title_titleId"] == imdb_scrape["imdb_scrape_imdb_id"], "inner")\
    .drop(imdb_scrape["imdb_scrape_title"])

priorityExpr = F.when(F.col("imdb_title_region") == "US", 1)\
    .when((F.col("imdb_title_region") == "JP") & (F.col("imdb_title_language") == "ja"), 2)\
    .when(F.col("imdb_title_types") == "original", 3)\
    .otherwise(4)

v3_imdb_titles_priority = v3_imdb_titles\
    .withColumn("priority", priorityExpr)\
    .orderBy("priority", "imdb_title_ordering")

v3_imdb_titles_priority.show()

+------------------+-------------------+--------------------+-----------------+-------------------+----------------+---------------------+--------------------------+-------------------+--------+
|imdb_title_titleId|imdb_title_ordering|    imdb_title_title|imdb_title_region|imdb_title_language|imdb_title_types|imdb_title_attributes|imdb_title_isOriginalTitle|imdb_scrape_imdb_id|priority|
+------------------+-------------------+--------------------+-----------------+-------------------+----------------+---------------------+--------------------------+-------------------+--------+
|         tt0101137|                  1|          Luna Varga|               US|                 \N|     imdbDisplay|                   \N|                         0|          tt0101137|       1|
|         tt0170180|                  1|Lupin III: Dead o...|               US|                 \N|     imdbDisplay|                   \N|                         0|          tt0170180|       1|
|         tt0103179|     

In [0]:
v3_imdb_titles_priority.filter(F.col("imdb_title_titleId") == "tt0213338").show(n=100)

+------------------+-------------------+--------------------+-----------------+-------------------+----------------+---------------------+--------------------------+-------------------+--------+
|imdb_title_titleId|imdb_title_ordering|    imdb_title_title|imdb_title_region|imdb_title_language|imdb_title_types|imdb_title_attributes|imdb_title_isOriginalTitle|imdb_scrape_imdb_id|priority|
+------------------+-------------------+--------------------+-----------------+-------------------+----------------+---------------------+--------------------------+-------------------+--------+
|         tt0213338|                 21|        Cowboy Bebop|               US|                 \N|     imdbDisplay|                   \N|                         0|          tt0213338|       1|
|         tt0213338|                 18|  カウボーイビバップ|               JP|                 ja|     imdbDisplay|                   \N|                         0|          tt0213338|       2|
|         tt0213338|              

In [0]:
v3_join1 = anime_dataset.join(v3_imdb_titles_priority, anime_dataset["anime_Name"] == v3_imdb_titles_priority["imdb_title_title"], "inner")
v3_antijoin1 = anime_dataset.join(v3_imdb_titles_priority, anime_dataset["anime_Name"] == v3_imdb_titles_priority["imdb_title_title"], "left_anti")
v3_join2 = v3_antijoin1.join(v3_imdb_titles_priority, v3_antijoin1["anime_Other name"] == v3_imdb_titles_priority["imdb_title_title"], "inner")
v3_antijoin2 = v3_antijoin1.join(v3_imdb_titles_priority, v3_antijoin1["anime_Other name"] == v3_imdb_titles_priority["imdb_title_title"], "left_anti")
v3_join3 = v3_antijoin2.join(v3_imdb_titles_priority, v3_antijoin2["anime_English name"] == v3_imdb_titles_priority["imdb_title_title"], "inner")

v3_imdb_found_in_animedataset = v3_join1.union(v3_join2).union(v3_join3)


In [0]:
x = v3_imdb_found_in_animedataset.count()
print(x)

16940


In [0]:
windowSpec = Window.partitionBy("anime_Name").orderBy("priority")  
v3_imdb_found_in_animedataset_ranked = v3_imdb_found_in_animedataset.withColumn("rank", F.row_number().over(windowSpec))

v3_imdb_found_in_animedataset_ranked_final = v3_imdb_found_in_animedataset_ranked.filter(F.col("rank") == 1).drop("rank")

v3_imdb_found_in_animedataset_ranked_final.show()


+--------------+--------------------+--------------------+--------------------------------------+-------------------------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+----------------+---------------+--------------------+--------------------+------------------+-------------------+-----------------------------+-----------------+-------------------+----------------+---------------------+--------------------------+-------------------+--------+
|anime_anime_id|          anime_Name|  anime_English name|                      anime_Other name|                          anime_Score|        anime_Genres|      anime_Synopsis|          anime_Type|      anime_Episodes|         anime_Aired|     anime_Premiered|        ani

In [0]:
x = v3_imdb_found_in_animedataset_ranked_final.count()
print(x)

5256


In [0]:
v3_imdb_found_in_animedataset_ranked_final.filter(F.col("imdb_title_titleId") == "tt0213338").show()

+--------------+------------+------------------+------------------+-----------+--------------------+--------------------+----------+--------------+--------------------+---------------+---------------+---------------+--------------------+-------------+------------+--------------+--------------------+----------+----------------+---------------+---------------+-------------+--------------------+------------------+-------------------+----------------+-----------------+-------------------+----------------+---------------------+--------------------------+-------------------+--------+
|anime_anime_id|  anime_Name|anime_English name|  anime_Other name|anime_Score|        anime_Genres|      anime_Synopsis|anime_Type|anime_Episodes|         anime_Aired|anime_Premiered|   anime_Status|anime_Producers|     anime_Licensors|anime_Studios|anime_Source|anime_Duration|        anime_Rating|anime_Rank|anime_Popularity|anime_Favorites|anime_Scored By|anime_Members|     anime_Image URL|imdb_title_titleId|i

## Pivot wide

Join the matched data to the other datasets (namely, )

# Export

In [0]:
final_df = v3_imdb_found_in_animedataset_ranked_final
final_df.printSchema


Out[137]: <bound method DataFrame.printSchema of DataFrame[anime_anime_id: string, anime_Name: string, anime_English name: string, anime_Other name: string, anime_Score: string, anime_Genres: string, anime_Synopsis: string, anime_Type: string, anime_Episodes: string, anime_Aired: string, anime_Premiered: string, anime_Status: string, anime_Producers: string, anime_Licensors: string, anime_Studios: string, anime_Source: string, anime_Duration: string, anime_Rating: string, anime_Rank: string, anime_Popularity: string, anime_Favorites: string, anime_Scored By: string, anime_Members: string, anime_Image URL: string, imdb_title_titleId: string, imdb_title_ordering: int, imdb_title_title: string, imdb_title_region: string, imdb_title_language: string, imdb_title_types: string, imdb_title_attributes: string, imdb_title_isOriginalTitle: string, imdb_scrape_imdb_id: string, priority: int]>

In [0]:
(final_df\
  .coalesce(1)
  .write\
  .option("header", "true")\
  .csv("dbfs:/FileStore/tables/cse6242_dataset_clean.csv"))

In [0]:
%sh
ls /dbfs/FileStore/tables/cse6242_dataset_clean.csv/

In [0]:
%sh mv /dbfs/FileStore/tables/cse6242_dataset_clean.csv/part-00000-...csv /dbfs/FileStore/tables/cse6242_dataset_clean.csv


In [0]:
files = dbutils.fs.ls("dbfs:/FileStore/tables/")
for file in files:
    print(file.path)

dbfs:/FileStore/tables/anime_dataset_2023_csv.gz
dbfs:/FileStore/tables/anime_filtered_csv.gz
dbfs:/FileStore/tables/anime_imdb_data_csv.gz
dbfs:/FileStore/tables/cse6242_group3_dataset_clean.csv/
dbfs:/FileStore/tables/final_animedataset_csv.gz
dbfs:/FileStore/tables/imdb_scraped_datav2_csv.gz
dbfs:/FileStore/tables/name_basics_tsv.gz
dbfs:/FileStore/tables/title_akas.tsv
dbfs:/FileStore/tables/title_akas_tsv.gz
dbfs:/FileStore/tables/title_basics_tsv.gz
dbfs:/FileStore/tables/title_crew_tsv.gz
dbfs:/FileStore/tables/title_episode_tsv.gz
dbfs:/FileStore/tables/title_principals_tsv.gz
dbfs:/FileStore/tables/title_ratings_tsv.gz
dbfs:/FileStore/tables/user_filtered_csv.gz
dbfs:/FileStore/tables/users_details_2023_csv.gz
dbfs:/FileStore/tables/users_score_2023_csv.gz


In [0]:
display(final_df)
# Download from
# https://community.cloud.databricks.com/files/tables/cse6242_group3_dataset_clean.csv/?o=8337381675954717

anime_anime_id,anime_Name,anime_English name,anime_Other name,anime_Score,anime_Genres,anime_Synopsis,anime_Type,anime_Episodes,anime_Aired,anime_Premiered,anime_Status,anime_Producers,anime_Licensors,anime_Studios,anime_Source,anime_Duration,anime_Rating,anime_Rank,anime_Popularity,anime_Favorites,anime_Scored By,anime_Members,anime_Image URL,imdb_title_titleId,imdb_title_ordering,imdb_title_title,imdb_title_region,imdb_title_language,imdb_title_types,imdb_title_attributes,imdb_title_isOriginalTitle,imdb_scrape_imdb_id,priority
52034,"""""""Oshi no Ko""""""",[Oshi No Ko],【推しの子】,8.98,"Drama, Supernatural","In the entertainment world, celebrities often show exaggerated versions of themselves to the public, concealing their true thoughts and struggles beneath elaborate lies. Fans buy into these fabrications, showering their idols with undying love and support, until something breaks the illusion. Sixteen-year-old rising star Ai Hoshino of pop idol group B Komachi has the world captivated; however, when she announces a hiatus due to health concerns, the news causes many to become worried. As a huge fan of Ai, gynecologist Gorou Amemiya cheers her on from his countryside medical practice, wishing he could meet her in person one day. His wish comes true when Ai shows up at his hospital—not sick, but pregnant with twins! While the doctor promises Ai to safely deliver her children, he wonders if this encounter with the idol will forever change the nature of his relationship with her.",TV,11.0,"Apr 12, 2023 to Jun 28, 2023",spring 2023,Currently Airing,"Shueisha, CyberAgent, Kadokawa",Sentai Filmworks,Doga Kobo,Manga,30 min per ep,PG-13 - Teens 13 or older,14.0,401,18336,181665.0,512617,https://cdn.myanimelist.net/images/anime/1812/134736.jpg,tt21030032,16,【推しの子】,JP,ja,imdbDisplay,\N,0,tt21030032,2
16405,"""Boku no Imouto wa """"Osaka Okan""""""",UNKNOWN,僕の妹は「大阪おかん」,5.74,"Comedy, Gourmet","Kyousuke Ishihara is an average high school student residing in Tokyo. Recently, his younger sister, Namika, has moved back home after living in Osaka for 10 years. As Kyousuke tries to understand his sister's peculiar behavior and dialect, the two bond over their differences and the difficulties that come with change.",TV,12.0,"Dec 22, 2012 to Mar 16, 2013",winter 2013,Finished Airing,"Toho, Bouncy",UNKNOWN,Charaction,Original,3 min per ep,PG-13 - Teens 13 or older,9930.0,4662,6,8943.0,19156,https://cdn.myanimelist.net/images/anime/12/45352.jpg,tt2595472,4,僕の妹は「大阪おかん」,JP,ja,imdbDisplay,\N,0,tt2595472,2
31630,"""Gyakuten Saiban: Sono """"Shinjitsu""""","Igi Ari!""",Ace Attorney,逆転裁判 ～その「真実」、異議あり！～,6.5,"Comedy, Drama, Mystery","""Since he was a child, Ryuuichi Naruhodou's dream was to become a defense attorney, protecting the innocent when no one else would. However, when the rookie lawyer finally takes on his first case under the guidance of his mentor Chihiro Ayasato, he realizes that the courtroom is a battlefield. In these fast paced trials, Ryuuichi is forced to think outside the box to uncover the truth of the crimes that have taken place in order to prove the innocence of his clients. Gyakuten Saiban: Sono """"Shinjitsu""""",Igi Ari! follows Ryuuichi as he tackles cases to absolve the falsely accused of the charges they face. It will not be easy—standing in his path is the ruthless Reiji Mitsurugi,a prosecutor who will stop at nothing to hand out guilty verdicts. With his back against the wall,the defense attorney must carefully examine both evidence and witness testimony,"sifting through lies to solve the mystery behind each case. With a shout of """"objection!",""""" the battle in the courtroom begins!""",TV,24.0,"Apr 2, 2016 to Sep 24, 2016",spring 2016,Finished Airing,"Aniplex, Yomiuri Telecasting, Capcom, Trinity Sound","Funimation, Crunchyroll",A-1 Pictures,Game,24 min per ep,PG-13 - Teens 13 or older,tt5603356,12,Ace Attorney,US,\N,imdbDisplay,\N,0,tt5603356,1
33377,"""Trickster: Edogawa Ranpo """"Shounen Tanteidan"""" yori""",Trickster,TRICKSTER -江戸川乱歩「少年探偵団」より-,6.25,"Drama, Mystery, Sci-Fi","""Kogorou Akechi is the founder of a private investigation firm known as the Boy Detectives' Club. Together, this group takes on cases both great and small. One of their junior members, Kensuke Hanasaki, is out solving a case one day when he happens upon Yoshio Kobayashi, a mysterious amnesiac boy with an inability to die. After seeing his abilities in action, Kensuke offers Yoshio a deal: join the Boy Detectives' Club and help them solve cases, and in exchange he will find a way to help Yoshio die. The apathetic Yoshio accepts this deal begrudgingly, unaware of how different his life will become. Although he does not have much use for people, he gradually begins to acknowledge the group as he spends more time with them while solving cases. Trickster: Edogawa Ranpo """"Shounen Tanteidan"""" yori follows Akechi and the rest of the Boy Detectives' Club as they solve the various cases they are given","all while combating a hidden threat from the shadows—""""The Fiend with Twenty Faces.""""""",TV,24.0,"Oct 4, 2016 to Mar 28, 2017",fall 2016,Finished Airing,"Bandai Visual, Yomiuri Telecasting, Lantis, Asmik Ace, JR East Marketing & Communications, Sumitomo, Akatsuki",Funimation,"TMS Entertainment, Shin-Ei Animation",Original,23 min per ep,PG-13 - Teens 13 or older,7635.0,1775,239,36336.0,113746,tt8851064,2,Trickster,\N,\N,original,\N,1,tt8851064,3
4469,.hack//G.U. Trilogy: Parody Mode,.hack//G.U. Trilogy: Parody Mode,.hack//G.U. Trilogy,6.36,"Comedy, Fantasy, Sci-Fi",A special bonus Parody Mode added to the extras of the .hack//G.U. Trilogy (Source: AniDB),Special,1.0,"Mar 25, 2008",UNKNOWN,Finished Airing,Bandai Visual,UNKNOWN,UNKNOWN,Game,6 min,PG-13 - Teens 13 or older,7060.0,5891,10,4136.0,10641,https://cdn.myanimelist.net/images/anime/10/8661.jpg,tt1164545,2,.hack//G.U. Trilogy,\N,\N,original,\N,1,tt1164545,3
454,.hack//Gift,.hack//Gift,.hack//GIFT,6.1,"Comedy, Fantasy","""As an expression of gratitude for the heroes of both the .hack//Sign and the .hack game series, Helba has prepared a special event for all the characters to find the newly established """"Twilight Hot Springs."""" The characters can get their well-deserved rest and relaxation by having a soak in the wonderful hot springs",but there is only one problem—the hot springs are hidden,"and there have been mysterious player murders. With the only clue being the word """"GIFT",""""" the race has begun to find the culprit and the location of the hot springs.""",OVA,1.0,"Nov 16, 2003",UNKNOWN,Finished Airing,CyberConnect2,Bandai Entertainment,Bee Train,Original,26 min,R+ - Mild Nudity,8363.0,4517,19,tt0823406,1,.hack//GIFT,US,\N,imdbDisplay,\N,0,tt0823406,1
48,.hack//Sign,.hack//Sign,.hack//SIGN,6.95,"Adventure, Fantasy, Mystery","""A young wavemaster, only known by the alias of Tsukasa, wakes up in an MMORPG called The World, with slight amnesia. He does not know what he has previously done before he woke up. In The World, the Crimson Knights suspects him of being a hacker, as he was seen accompanying a tweaked character in the form of a cat. Unable to log out from the game, he wanders around looking for answers, avoiding the knights and other players he meets along the way. As Tsukasa explores The World, he stumbles upon a magical item that takes the form of a """"guardian",""""" which promises him protection from all harm. Subaru",the leader of the Crimson Knights,along with several other players who became acquainted with Tsukasa,set out to investigate why Tsukasa is unable to log out,"and attempt to get to the bottom of the problem before it gets out of hand.""",TV,26.0,"Apr 4, 2002 to Sep 26, 2002",spring 2002,Finished Airing,"Bandai Visual, Yomiko Advertising, Bandai, CyberConnect2","Funimation, Bandai Entertainment",Bee Train,Original,24 min per ep,PG-13 - Teens 13 or older,4172.0,tt0361140,4,.hack//SIGN,US,\N,imdbDisplay,\N,0,tt0361140,1
48976,100-man no Inochi no Ue ni Ore wa Tatteiru Recap,UNKNOWN,100万の命の上に俺は立っている,6.2,"Drama, Fantasy",Recap of the first season of 100-man no Inochi no Ue ni Ore wa Tatteiru airing before second season.,Special,1.0,"Jul 2, 2021",UNKNOWN,Finished Airing,UNKNOWN,UNKNOWN,Maho Film,Manga,20 min,PG-13 - Teens 13 or older,7864.0,10750,2,631.0,1955,https://cdn.myanimelist.net/images/anime/1453/122534.jpg,tt12706854,10,100万の命の上に俺は立っている,JP,ja,imdbDisplay,\N,0,tt12706854,2
40679,2.43: Seiin Koukou Danshi Volley-bu,2.43: Seiin High School Boys Volleyball Team,2.43 清陰高校男子バレー部,6.14,"Drama, Sports","Genius setter Kimichika Haijima moves back to Fukui from Tokyo after an incident within his school's volleyball team forces him out. There, he is reunited with his childhood friend, Yuni Kuroba, a member of the Monshiro Middle School Boys' Volleyball Team, who is unaware of his own talents. Haijima notices Kuroba's abilities and is determined to form a new volleyball team with Kuroba as the team's ace. At the prefectural tournament, Kuroba crumbles under the pressure, which causes the Monshiro team to fall apart after losing. The loss also creates a rift between Haijima and Kuroba, leading the former to quit the team. Now, as students at Seiin High School, Haijima and Kuroba find themselves on the same volleyball team once again. Having learned from his past mistakes, Haijima helps Kuroba overcome his performance anxiety to become the ace and carry the team to the prefectural championship. With support from team captain Shinichirou Oda and vice captain Misao Aoki, Seiin aim to win the prefecturals and become Fukui's representatives at the Spring Tournament. To do this, they will need to beat Fukuho Technical High School, the reigning champions of Fukui. Will Haijima's team defeat the odds, or are they doomed to repeat his history of losing?",TV,12.0,"Jan 8, 2021 to Mar 26, 2021",winter 2021,Finished Airing,"Aniplex, Dentsu, Animax, Movic, Fuji TV, Fujipacific Music, Shueisha, Fuji Creative, Tohan Corporation, Japan Volleyball Association",Funimation,David Production,Novel,23 min per ep,PG-13 - Teens 13 or older,8149.0,1842,353,43054.0,108063,https://cdn.myanimelist.net/images/anime/1907/110083.jpg,tt13286958,8,2.43 清陰高校男子バレー部,JP,ja,imdbDisplay,\N,0,tt13286958,2
36793,3D Kanojo: Real Girl,Real Girl,３Ｄ彼女　リアルガール,6.85,Romance,"For Hikari Tsutsui, life within the two-dimensional realm is much simpler. Socially inept and awkward, he immerses himself in video games and anime, only to be relentlessly ridiculed and ostracized by his classmates. Sharing his misery is Yuuto Itou, his only friend, who wears cat ears and is equally obsessed with the world of games. After being forced to clean the pool as punishment for arriving late, Tsutsui meets Iroha Igarashi, but he attempts to steer clear of her, as her notoriety precedes her. Brazenly blunt, loathed by female classmates, and infamous for messing around with boys, Tsutsui believes that getting involved with her would cause nothing but problems. 3D Kanojo: Real Girl is a story revolving around these two outcasts—a boy full of emotions he has never experienced before, struggling to lay them bare, and a girl who strives to break him out of his shell.",TV,12.0,"Apr 4, 2018 to Jun 20, 2018",spring 2018,Finished Airing,"VAP, Nippon Television Network, DeNA",Sentai Filmworks,Hoods Entertainment,Manga,22 min per ep,PG-13 - Teens 13 or older,4630.0,672,1521,161092.0,324468,https://cdn.myanimelist.net/images/anime/1327/93616.jpg,tt7112156,2,3D Kanojo: Real Girl,FR,\N,imdbDisplay,\N,0,tt7112156,4
