moreco

Data & Visual Analytics Group Project

Description

This project presents a novel interface for a recommendation engine. The goal is to give users a better understanding of the rationale behind the recommendations. It also encourages users to explore recommendations beyond the top filtered results that most systems show.

Data Sources

Data sets were obtained from the following sources:

IMDB (Internet Movie Database) Title - Basics info
IMDB Ratings - Ratings info
IMDB Cast/Crew - Principals info
IMDB Director/Writer - Crew info
GroupLens - MovieLens 25M info

Data was also collected from the following sites:

Database Schema

TABLE	COLUMN(S) (DATATYPE)
tags	id (integer), name (text)
movies	id (integer), kind (text), primary_title (text), original_ttitle (text), release_year (integer), runtime_minutes (integer), genres (text)
scores	fk_id (text), tag_id_1 (real), tag_id_2 (real), ... tag_id_n (real)
directors	id (text), name (text)
trailers	id (text), yt_video_id (text)
posters	id (text), img_url (text)
movie_meta	id (text), year (text), genres (text), title (text), runtime_minutes (text)
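
As a rough illustration of how the schema above might be queried, the following sketch looks up movies by title with Python's built-in sqlite3 module. The function name find_movies and the in-memory demo rows are made up for this example; only the table name and the id, primary_title, and release_year columns come from the schema, and the demo table deliberately holds just that subset of columns.

```python
import sqlite3


def find_movies(conn, title_fragment, limit=5):
    """Return (id, primary_title, release_year) rows whose title contains the fragment."""
    sql = (
        "SELECT id, primary_title, release_year "
        "FROM movies "
        "WHERE primary_title LIKE ? "
        "ORDER BY release_year, id "
        "LIMIT ?"
    )
    # Parameters are bound rather than formatted into the SQL string.
    return conn.execute(sql, (f"%{title_fragment}%", limit)).fetchall()


def build_demo_db():
    """Create an in-memory stand-in for the movies table (demo rows, not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE movies (id INTEGER, primary_title TEXT, release_year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO movies VALUES (?, ?, ?)",
        [
            (1, "The Matrix", 1999),
            (2, "The Matrix Reloaded", 2003),
            (3, "Inception", 2010),
        ],
    )
    return conn
```

Against the real database, the connection would instead be opened with sqlite3.connect on the downloaded movie_sqlite.db file.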

Raw Dataset Schema

DATA SET NAME	COLUMN(S)
genome_scores	movieId, tagId, relevance
genome_tags	tagId, tag
links	movieId, imdbId, tmdbId
movies	movieId, title, genres
ratings	userId, movieId, rating, timestamp
tags	userId, movieId, tag, timestamp
imdb_name_basics	nconst, primaryName, birthYear, deathYear, primaryProfession, knownForTitles
imdb_title_basics	tconst, titleType, primaryTitle, originalTitle, isAdult, startYear, endYear, runtimeMinutes, genres
imdb_ratings	tconst, averageRating, numVotes
imdb_crew	tconst, directors, writers
imdb_principals	tconst, ordering, nconst, category, job, characters
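
The scores table in the database holds one row per movie and one column per tag, while genome_scores holds one row per (movieId, tagId) pair. One plausible way to get from the long form to the wide form is sketched below. This is an assumption about the preparation step, not the project's actual code: the function name pivot_scores is hypothetical, missing pairs are filled with 0.0 by choice, and mapping movieId to the keys used in the database is left out.

```python
from collections import defaultdict


def pivot_scores(rows, tag_ids):
    """Turn (movieId, tagId, relevance) rows into {movieId: [relevance per tag]}.

    tag_ids fixes the column order; pairs missing from rows get 0.0.
    """
    by_movie = defaultdict(dict)
    for movie_id, tag_id, relevance in rows:
        by_movie[movie_id][tag_id] = relevance
    return {
        movie_id: [relevances.get(tag_id, 0.0) for tag_id in tag_ids]
        for movie_id, relevances in by_movie.items()
    }
```

Each resulting list lines up with the tag_id_1 ... tag_id_n columns when tag_ids is given in that order.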

How to Run

Automatic

The manual steps have been wrapped in a script for convenience. The script downloads the database and skips the download if the file already exists. It does not detect database changes, so you will need to download the database manually if your copy is out of date.

Run python run_local.py
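
The skip-if-present behaviour described above can be sketched as follows. This is a minimal illustration, not the contents of run_local.py: the function name ensure_database and the fetch parameter are hypothetical, and the download URL has to be supplied by the caller because the release URL is not given here.

```python
from pathlib import Path
import urllib.request


def ensure_database(db_path, url, fetch=urllib.request.urlretrieve):
    """Download the database to db_path unless a file already exists there.

    Returns True if a download happened, False if it was skipped.
    Only existence is checked, so a stale file is never refreshed.
    """
    db_path = Path(db_path)
    if db_path.exists():
        return False
    # Create the target directory (for example db/) if it is missing.
    db_path.parent.mkdir(parents=True, exist_ok=True)
    fetch(url, str(db_path))
    return True
```

Because the check is existence-only, deleting the local file is the simplest way to force a fresh download in this sketch.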

Manual

Download the movie_sqlite.db database from the releases in the repo.
Place the downloaded database in the db directory.
Navigate to the moreco directory.
Install the python library dependencies using pip install -r requirements.txt.
Navigate to the visual dir.
Start the server with python server.py
Open a web browser to the link the server posts.

Technologies Used

Improvements

Features

Tag weights: we currently have a pre-determined set of weights for the tags depending on their order in the permutation. To provide a more personalized experience we could allow the user to control these individual weights. A simple slider next to each tag would be sufficient to enable this feature.
Expand the site to include directors/actors as entities which could be recommended.
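
To make the tag-weight idea concrete, here is a small sketch of scoring a movie with either default position-based weights or user-supplied weights such as slider values. The 1, 1/2, 1/3, ... default and the names positional_weights and score_movie are assumptions for illustration; the project's actual pre-determined weights are not stated here.

```python
def positional_weights(n):
    """Hypothetical default weights: earlier tags count more (1, 1/2, 1/3, ...)."""
    return [1.0 / (i + 1) for i in range(n)]


def score_movie(relevances, selected_tags, weights=None):
    """Weighted sum of a movie's tag relevances over the selected tags.

    relevances: {tag: relevance} for one movie.
    weights: one weight per selected tag (e.g. slider values); when omitted,
    the position-based defaults are used.
    """
    if weights is None:
        weights = positional_weights(len(selected_tags))
    if len(weights) != len(selected_tags):
        raise ValueError("expected one weight per selected tag")
    return sum(
        weight * relevances.get(tag, 0.0)
        for tag, weight in zip(selected_tags, weights)
    )
```

With sliders, the front end would only need to send the weight list along with the selected tags; the scoring itself would not change.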

Fixes / Cleanup

Database: optimizations by indexing columns, normalization of the data, proper column data type usage.
Add links for each recommendation to be able to go to the IMDB website.
Processing: improve the way we make predictions to speed up calculations and allow more tags to be selected.
Data sources: consolidate the data sources which are used for the app (currently a csv and database are being used).
Data: clean the tag data set to remove duplicates. Many of our tags are highly correlated and represent the same concept. For example: sci fi, sci-fi, science fiction, scifi. We can safely assume these all represent the same underlying feature and thus we should reduce them down to a single entity. Prior to doing so, some basic analysis should be done to validate that these are highly correlated throughout the dataset.
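
The tag clean-up described above could start from something like the sketch below: normalize spelling variants to one key, then check that the variants' relevance scores really move together before merging them. The alias map, the function names, and the choice of Pearson correlation as the check are all assumptions for illustration.

```python
import re
from math import sqrt

# Hypothetical alias map: normalized spellings that mean the same tag.
ALIASES = {"sciencefiction": "scifi"}


def canonical_tag(tag):
    """Lowercase, drop spaces and punctuation, then apply known aliases."""
    key = re.sub(r"[^a-z0-9]", "", tag.lower())
    return ALIASES.get(key, key)


def pearson(xs, ys):
    """Pearson correlation of two equal-length relevance lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)
```

Two tags would be merged only if they share a canonical key and their per-movie relevance lists correlate strongly; the threshold for "strongly" is a judgment call left to the analysis.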
