# MovieRecommender

### An AI-based movie recommendation system trained on the MovieLens dataset


## The dataset - a brief description

To train a neural network that can recommend good movies based on your rating history, a large-scale dataset is needed. This project therefore uses the [MovieLens 100K](https://grouplens.org/datasets/movielens/100k/) dataset. It contains:
- 100,000 ratings, each a one-to-five-star rating given by one user (identified by a userID) to one movie (identified by a movieID)
- 943 users, each labeled with a userID, age, gender, occupation and zip code
- 1682 movies, each labeled with a movieID, title, release dates, IMDb URL and a list of genres

You can find the full documentation of the dataset structure in ML_100_INFO.md.


### Acknowledgement

The dataset was published here:
F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets:
History and Context. ACM Transactions on Interactive Intelligent
Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages.
DOI=http://dx.doi.org/10.1145/2827872


### Importing data

First the data must be imported into a usable data format so that it can be pre-processed. Some analysis can also be done at this stage to get a feeling for the dataset, its content, its diversity and its limitations.
To access the data, it is useful to define some classes:

In [34]:
class Movie:
    def __init__(self, id, title, r_date, v_date, url, g_list):
        self.id = id
        self.title = title
        self.release_date = r_date
        self.video_release = v_date
        self.url = url
        self.genre_list = g_list

    def get_genres(self):
        # Names of the 19 genre flags, in the order in which they appear in u.item
        genre = ["unknown", "Action", "Adventure", "Animation", "Children's", "Comedy", "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror", "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"]
        # Keep the name of every genre whose flag is set ("1")
        return [name for name, flag in zip(genre, self.genre_list) if int(flag)]

class User:
    def __init__(self, id, age, gender, occ, zip):
        self.id = id
        self.age = age
        self.gender = gender
        self.occupation = occ
        self.postal_code = zip


Then it's time to import the dataset (and the required modules):

In [38]:
#Module imports
import csv

#Dataset import
# Forward slashes in the paths work on Windows as well as on Linux and macOS.
with open('ml-100k/u.data') as ml_100_data:
    data_csv = csv.reader(ml_100_data, delimiter='\t')
    ratings = list(data_csv)  # rows: user id, movie id, rating, timestamp (as strings)

movies = list()
# u.item contains non-ASCII titles stored as ISO-8859-1 (latin-1)
with open('ml-100k/u.item', encoding='latin-1') as movies_data:
    movies_csv = csv.reader(movies_data, delimiter='|')
    for row in movies_csv:
        movies.append(Movie(id=row[0], title=row[1], r_date=row[2], v_date=row[3], url=str(row[4]), g_list=row[5:24]))
print(f"Sanity check:\nThe movie {movies[0].title} contains the genres: {movies[0].get_genres()}")

users = list()
with open('ml-100k/u.user') as users_data:
    users_csv = csv.reader(users_data, delimiter='|')
    for row in users_csv:
        users.append(User(id=row[0], age=row[1], gender=row[2], occ=row[3], zip=row[4]))
print(f"Sanity check:\nThe user 111 ({users[110].gender} {users[110].age}) works as {users[110].occupation} - ID: {users[110].id}")

Sanity check:
The movie Toy Story (1995) contains the genres: ['Animation', "Children's", 'Comedy']
Sanity check:
The user 111 (M 57) works as engineer - ID: 111
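
Unlike movies and users, the ratings stay plain lists of strings after the import. The following is a minimal sketch of a typed record for them, assuming the column order user id, movie id, rating, timestamp of `u.data`. The `Rating` class and the `parse_rating` helper are hypothetical names and not part of the original notebook:

```python
from typing import NamedTuple


class Rating(NamedTuple):
    user_id: int
    movie_id: int
    stars: int       # 1 to 5
    timestamp: int   # seconds since the Unix epoch


def parse_rating(row):
    """Convert one row of u.data (four strings) into a Rating."""
    user_id, movie_id, stars, timestamp = (int(value) for value in row)
    return Rating(user_id, movie_id, stars, timestamp)


# Illustrative row in the format produced by csv.reader
example = parse_rating(["196", "242", "3", "881250949"])
print(example.stars)
```

With such a record, later steps can use `rating.stars` instead of `row[2]`, and sorting by user and time no longer compares strings.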


### Analysis


In [None]:
# Analysis: best movie, average rating per movie, average number of ratings per movie
# TODO: plot the rating distribution, the best-rated movies, etc.
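
The analysis cell is still a TODO. One possible starting point, as a sketch only, is to compute the number of ratings and the mean rating per movie and to pick the best-rated movie among those with enough ratings. The helper names `rating_stats` and `best_movie` and the `min_count` threshold are assumptions, and the sample rows are made up:

```python
from collections import defaultdict


def rating_stats(ratings):
    """Return {movie_id: (rating_count, mean_rating)} for rows of u.data."""
    totals = defaultdict(lambda: [0, 0])  # movie_id -> [count, sum of stars]
    for row in ratings:
        movie_id, stars = row[1], int(row[2])
        totals[movie_id][0] += 1
        totals[movie_id][1] += stars
    return {m: (count, total / count) for m, (count, total) in totals.items()}


def best_movie(stats, min_count=1):
    """Movie id with the highest mean rating among movies with enough ratings."""
    eligible = {m: mean for m, (count, mean) in stats.items() if count >= min_count}
    return max(eligible, key=eligible.get)


# Tiny made-up sample in the same row format as `ratings`
sample = [["1", "10", "5", "0"], ["2", "10", "3", "0"], ["1", "20", "5", "0"]]
stats = rating_stats(sample)
print(stats["10"])                     # (2, 4.0)
print(best_movie(stats, min_count=2))  # 10
```

A minimum rating count keeps movies with a single five-star rating from dominating the ranking.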

### Preprocessing
After successfully importing all relevant data, the next step is to pre-process the data so that it fits the input of the neural network (nn).

## The nn-architecture
### Goal
The goal of the recommender system is to predict the best movie recommendation for a user, based on their ratings of other movies and on the ratings given by all other users. The easiest way to determine the "best" movie is to predict the user's rating for every movie they have not watched yet and to pick the one with the highest predicted rating.
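
The selection step described above can be sketched as follows. This is not the trained model: `predict` stands in for whatever network is trained later, and the `recommend` helper, its parameters and the prediction values are hypothetical:

```python
def recommend(predict, user_id, all_movie_ids, watched_ids, top_n=1):
    """Return the top_n unwatched movie ids ranked by predicted rating."""
    candidates = [m for m in all_movie_ids if m not in watched_ids]
    ranked = sorted(candidates, key=lambda m: predict(user_id, m), reverse=True)
    return ranked[:top_n]


# Stand-in for the trained network: a lookup table of made-up predictions
fake_predictions = {("7", "1"): 3.2, ("7", "2"): 4.6, ("7", "3"): 4.9}


def predict(user_id, movie_id):
    return fake_predictions[(user_id, movie_id)]


print(recommend(predict, "7", ["1", "2", "3"], watched_ids={"3"}))  # ['2']
```

Movie "3" has the highest predicted rating but is skipped because the user has already watched it.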
### First approach: Genre-recommendation
The first idea for predicting a user's rating (before diving into collaborative filtering) was to calculate every user's genre preferences from their rating history and then train a classical dense neural network with them:

``Picture``

First the genre-preference calculation has to be done. Its output should be a weighted genre list containing values in \[0,1].
The input of the neural network is defined by the number of genres: 18 movie genres + 18 preference-weighted genres, i.e. 36 values.
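
A minimal sketch of how such an input vector could be assembled is shown below. The dataset stores 19 genre flags including "unknown", so the sketch assumes that "unknown" is dropped to arrive at 18 genres. The `build_input` helper and the example values are hypothetical:

```python
def build_input(movie_genres, pref_genres):
    """Concatenate the genre flags of a movie with a user's genre weights."""
    if len(movie_genres) != 18 or len(pref_genres) != 18:
        raise ValueError("expected 18 genre flags and 18 preference weights")
    if not all(0.0 <= w <= 1.0 for w in pref_genres):
        raise ValueError("preference weights must lie in [0, 1]")
    return [float(flag) for flag in movie_genres] + [float(w) for w in pref_genres]


# Made-up example: a movie tagged with three genres and a user with neutral preferences
movie_genres = [0, 0, 1, 1, 1] + [0] * 13
pref_genres = [0.5] * 18
x = build_input(movie_genres, pref_genres)
print(len(x))  # 36
```

The first 18 entries describe the movie, the last 18 describe the user, so the network sees both sides of a rating in one vector.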

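In [None]:
# Approach-specific pre-processing
# The rows hold strings, so a plain ratings.sort() would sort lexicographically.
ratings.sort(key=lambda r: (int(r[0]), int(r[3])))  # by user, then by time
# TODO: change the movie id to its genres or include them at row[4:23]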

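In [None]:
ALPHA = 0.1  # assumed smoothing factor, placeholder value

def calc_pref(pref_genres, movie_genres, rating):
    # Placeholder update rule (an assumption, the final rule is still open):
    # move the weight of every genre of the rated movie towards the rating scaled to [0, 1].
    target = (int(rating) - 1) / 4
    return [p + ALPHA * (target - p) if g else p
            for p, g in zip(pref_genres, movie_genres)]

movie_by_id = {m.id: m for m in movies}
pref_by_user = {}  # user id -> current genre preferences
for row in ratings:  # expects ratings sorted by user and time
    user_id, movie_id, rating = row[0], row[1], row[2]
    movie_genres = [int(g) for g in movie_by_id[movie_id].genre_list[1:]]  # assumption: drop "unknown"
    pref_genres = pref_by_user.get(user_id, [0.5] * len(movie_genres))  # assumed neutral start
    # add movie_genres and the preferences known before this rating to the row
    row.extend(movie_genres + pref_genres)
    pref_by_user[user_id] = calc_pref(pref_genres, movie_genres, rating)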

## Findings and outlook