Animeniacs

Data Analytics and Visualization Bootcamp

FINAL PROJECT

https://dseg27.github.io/Animeniacs/

OVERVIEW

There is currently no well-known source available to recommend anime series for people to watch, so we're stepping in to fill this need. We hypothesized that we can use sentence embedding and scikit-learn's cosine similarity function on anime and live-action show synopses to recommend anime to someone, given their favorite live-action title.

PIPELINE

Data Source: Kaggle CSVs

TECHNOLOGIES AND FILES

Data Wrangling: We used Python and Pandas to explore, clean, and analyze our data. databases.ipynb: This file shows how we cleaned our data, including how we removed titles that were missing synopses and how we removed an inappropriate genre to make the website more family-friendly.
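The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from databases.ipynb: the column names (`synopsis`, `genre`), the `BLOCKED_GENRE` placeholder, and the helper `drop_missing_and_blocked` are all assumptions made for the example, and the real CSV schema may differ.

```python
import pandas as pd

# Hypothetical column names and genre label; the real CSV schema may differ.
SYNOPSIS_COL = "synopsis"
GENRE_COL = "genre"
BLOCKED_GENRE = "Blocked"


def drop_missing_and_blocked(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with no synopsis, then rows tagged with the blocked genre."""
    # Treat empty or whitespace-only synopses the same as missing ones.
    synopsis = df[SYNOPSIS_COL]
    has_synopsis = synopsis.notna() & (synopsis.astype(str).str.strip() != "")
    cleaned = df[has_synopsis]

    # Genres are assumed to be a comma-separated string such as "Action, Comedy".
    genres = cleaned[GENRE_COL].fillna("").str.lower()
    is_blocked = genres.str.contains(BLOCKED_GENRE.lower(), regex=False)
    return cleaned[~is_blocked].reset_index(drop=True)


# Small made-up sample standing in for the Kaggle data.
sample = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "synopsis": ["A hero rises.", None, "  ", "A quiet town."],
    "genre": ["Action", "Drama", "Comedy", "Drama, Blocked"],
})
cleaned_sample = drop_missing_and_blocked(sample)
```

In this sample, titles B and C are dropped for lacking a usable synopsis and title D is dropped for carrying the blocked genre.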

We also removed titles in Spanish, removed stop words, and dropped duplicate titles.
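As a rough sketch of two of these steps (stop-word removal and de-duplication), assuming hypothetical `title` and `synopsis` columns and a deliberately tiny stop-word list. The Spanish-title filter is left out because the source does not say how it was implemented.

```python
import pandas as pd

# A tiny illustrative stop-word list; a real project would likely use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "of", "in", "to", "is"}


def remove_stop_words(text: str) -> str:
    """Return the text with stop words removed (case-insensitive match)."""
    kept = [word for word in text.split() if word.lower() not in STOP_WORDS]
    return " ".join(kept)


def clean_titles(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the first row per title, then strip stop words from each synopsis."""
    deduped = df.drop_duplicates(subset="title", keep="first").copy()
    deduped["synopsis"] = deduped["synopsis"].apply(remove_stop_words)
    return deduped.reset_index(drop=True)


# Made-up rows for illustration only.
raw = pd.DataFrame({
    "title": ["Show One", "Show One", "Show Two"],
    "synopsis": [
        "The story of a detective",
        "The story of a detective",
        "A chef moves to the city",
    ],
})
cleaned_titles = clean_titles(raw)
```

Removing stop words before embedding is a design choice rather than a requirement; it shortens the text, but sentence-embedding models can also be run on the unmodified synopses.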

Machine Learning: We used SentenceTransformers (a Python framework) to compute sentence embeddings for our live-action and anime data. We also used this framework's cosine similarity function to compare the live-action and anime embeddings and find synopses that were similar.

chart_data_cleaning.py: This file contains several functions that transform our data for display in the charts on our site. It also holds the sentence embedding function, which performs sentence embedding on the live-action and anime titles, applies the cosine similarity function to compare the live-action embedding with the anime embeddings, and returns the five anime recommendations.

model.ipynb: This file contains the isolated framework and implementation of the machine learning aspect of this project, including how recommendations were generated.
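The recommendation step can be sketched along these lines. This is an illustration under stated assumptions, not the contents of model.ipynb: the model name `all-MiniLM-L6-v2` and the helpers `embed` and `top_k_similar` are hypothetical, and the cosine similarity is computed here with NumPy so the ranking logic is visible (SentenceTransformers also ships a `util.cos_sim` helper).

```python
import numpy as np


def embed(texts, model_name="all-MiniLM-L6-v2"):
    """Encode synopses with SentenceTransformers (the model name is an assumption)."""
    # Imported lazily so the ranking helper below runs without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(texts, convert_to_numpy=True)


def top_k_similar(query_vec, candidate_vecs, k=5):
    """Return (index, cosine similarity) pairs for the k closest candidates."""
    query = np.asarray(query_vec, dtype=float)
    candidates = np.asarray(candidate_vecs, dtype=float)
    # Cosine similarity = dot product divided by the product of the vector norms.
    norms = np.linalg.norm(candidates, axis=1) * np.linalg.norm(query)
    sims = (candidates @ query) / norms
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]


# Toy 2-D vectors stand in for real sentence embeddings.
toy_anime = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_ranked = top_k_similar([1.0, 0.0], toy_anime, k=2)
```

With real data, one would call `embed` once on the live-action synopsis and once on the list of anime synopses, then pass the results to `top_k_similar` with `k=5` to get five recommendations and their similarity scores.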

Data Storage: After cleaning the data and generating the necessary results from our machine learning model, we exported the final, cleaned data to CSVs and stored them in our Final Resources folder.

The final CSV that holds our live-action titles and the corresponding five anime recommendations (live_actions_with_anime_recs.csv) was then converted into a JSON file (final.json) that our website could read and display. This JSON file lives in our Updated Website folder and is used for visualization.
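A CSV-to-JSON conversion of this kind can be sketched with the standard library alone. The column names below (`live_action`, `rec_1`, `rec_2`) and the helper `csv_to_json_records` are assumptions for illustration; the actual layout of live_actions_with_anime_recs.csv and final.json is not shown in this README.

```python
import csv
import io
import json


def csv_to_json_records(csv_text: str) -> str:
    """Convert CSV text into a JSON array with one object per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2)


# Made-up row standing in for the real recommendations file.
sample_csv = "live_action,rec_1,rec_2\nShow One,Anime A,Anime B\n"
records = json.loads(csv_to_json_records(sample_csv))
```

To work with files instead of strings, the same `csv.DictReader` can be given an open file handle and the result written out with `json.dump`.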

Visualization:

We used HTML, JavaScript, and CSS to create our own website. These files live in the Updated Website folder of the repo. On this site, a user can search for their favorite live-action show using a drop-down menu that pulls live-action titles from our JSON file.

Once a title is selected, the user clicks a button to display a list of five recommended anime along with the match percentage for each.

The user can also view charts that display anime data, including top-ranked shows and their episode lengths, a comparison of anime and live-action growth over the last two decades, and bar charts that compare anime and live-action titles. These charts were generated using the JavaScript library Chart.js.

Limitations & Future Improvements: While we had over 20,000 live-action titles to work with, our website could not load all of them into the drop-down menu quickly without crashing the page. Therefore, we were only able to display 1,000 live-action titles for a user to choose from.
