A data analysis and dashboard made on Japanese anime for a Data Science class.

aadiraju/animetrics

Anime-trics: Japanese Anime Dataset Analysis and Dashboard

Where to start

To view the dashboard, open the file named Group 100 - Animetrics.twbx. The Jupyter Notebooks milestone1.ipynb and milestone2.ipynb, both located in the analysis folder, walk through the analysis step by step. Have fun! :)

Presentation Video

Here's a link to the dashboard presentation video: Animetrics Dashboard Presentation

About the topic

Japanese animation, colloquially known as anime, has taken the world by storm, especially in recent years, and I have been invested in this field for quite some time. "MyAnimeList" is a database website that records almost every anime in existence and lets its users rate their favourite anime on a scale of 1 to 10. There is a lot to learn from this dataset, such as "What genre of anime is popular among people who have seen very few anime vs. the genre that is popular among anime veterans?", or "Do people rate anime movies/OVAs higher than traditional episodic series?" As the market for anime continues to grow by the day, it might be useful to look back at how viewers feel, allowing us to make informed choices about the direction this industry could or should take in the future.

About the Dataset

The Anime Recommendations Database dataset contains metadata on around 12,000 different anime series, movies, OVAs, etc., along with 76,000 user ratings for these anime. The anime metadata and the user ratings were collected from a public database site called MyAnimeList. The dataset is divided into two separate spreadsheets called "anime.csv" and "rating.csv", located in data/raw.

The anime.csv file contains the following fields:

  • A unique ID to identify each anime generated by myanimelist.net (anime_id)
  • The full name of the anime (name)
  • A comma-separated list of each genre the anime falls under (genre)
  • Whether the anime is a TV series, a movie, an OVA, etc. (type)
  • The number of episodes the anime has (which is 1 for movies) (episodes)
  • The average user rating out of 10 for the anime (rating), and
  • The number of members that are in the anime's "group", i.e. the number of MyAnimeList users that are fans of the anime (members).
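As a rough illustration of the field layout above, here is a minimal sketch that parses anime.csv-style text using only Python's standard library. The sample rows are made up for illustration and are not taken from the dataset, the helper name parse_anime_rows is hypothetical, and the handling of non-numeric episode counts or empty ratings is an assumption rather than something documented here. The notebooks in the analysis folder may well take a different approach.

```python
import csv
import io

# Made-up sample rows in the column layout described above (not real data).
SAMPLE_ANIME_CSV = """anime_id,name,genre,type,episodes,rating,members
1,Example Movie A,"Drama, Romance",Movie,1,8.50,1000
2,Example Series B,"Action, Comedy, Drama",TV,24,7.25,500
"""


def parse_anime_rows(text):
    """Parse anime.csv-style text into a list of dicts with typed fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "anime_id": int(row["anime_id"]),
            "name": row["name"],
            # Split the comma-separated genre string into a clean list.
            "genres": [g.strip() for g in row["genre"].split(",") if g.strip()],
            "type": row["type"],
            # Assumption: non-numeric or missing episode counts become None.
            "episodes": int(row["episodes"]) if row["episodes"].isdigit() else None,
            # Assumption: an empty rating field becomes None.
            "rating": float(row["rating"]) if row["rating"] else None,
            "members": int(row["members"]),
        })
    return rows


anime = parse_anime_rows(SAMPLE_ANIME_CSV)
```

To try this against the real file, the text could be read from data/raw/anime.csv instead of the inline sample.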

The rating.csv file contains the following fields:

  • The unique user id generated by MyAnimeList (user_id)
  • The unique anime ID of the show the user rated, which corresponds to the anime_id field in anime.csv (anime_id), and
  • The rating out of 10 that the user gave to that anime (which is -1 if the user has not rated it) (rating).
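Because a rating of -1 marks an anime the user has not rated, those rows probably should be excluded before averaging. The sketch below shows one way to do that with the standard library. The sample rows are made up, and the names NOT_RATED and mean_rating_per_anime are hypothetical helpers introduced only for this example, not part of the project.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows in the rating.csv layout described above (not real data).
SAMPLE_RATING_CSV = """user_id,anime_id,rating
1,1,9
1,2,-1
2,1,7
2,2,6
"""

# Sentinel described above: the user has not rated this anime.
NOT_RATED = -1


def mean_rating_per_anime(text):
    """Return {anime_id: mean user rating}, skipping unrated (-1) rows."""
    totals = defaultdict(lambda: [0, 0])  # anime_id -> [sum of scores, count]
    for row in csv.DictReader(io.StringIO(text)):
        score = int(row["rating"])
        if score == NOT_RATED:
            continue  # leave placeholder rows out of the average
        entry = totals[int(row["anime_id"])]
        entry[0] += score
        entry[1] += 1
    return {anime_id: total / count for anime_id, (total, count) in totals.items()}


means = mean_rating_per_anime(SAMPLE_RATING_CSV)
```

The resulting dictionary is keyed by anime_id, so it can be matched against the anime_id field in anime.csv, for example to compare average ratings across types such as movies and TV series.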

I will be using these two files for my analysis.

Team Members

  • Abhineeth: 3rd year CS major very interested in practical applications of Data Science

References

Anime Recommendations Dataset on Kaggle

MyAnimeList