## Colley Rating Method for Movies within a Genre

This notebook applies the Colley rating method to the movies of a single genre. The data is assumed to be in the MovieLens dataset format:

https://grouplens.org/datasets/movielens/

The userMovie matrix (one row per user, one column per movie in the genre, with 0 marking an unrated movie) has already been created with makeUserMovieMatrix.ipynb.
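
makeUserMovieMatrix.ipynb is not shown here, so the following is only a minimal sketch of how such a matrix could be built. The function name `make_user_movie_matrix`, the toy data, and the column names `userId`, `movieId`, and `rating` (and a `movieId` column in the genre file) are assumptions, not the notebook's actual code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for a ratings file and a genre file;
# the column names are assumptions about the input format.
ratings = pd.DataFrame({
    'userId':  [1, 1, 2, 2, 3],
    'movieId': [10, 20, 10, 30, 20],
    'rating':  [5.0, 3.0, 4.0, 4.0, 2.0],
})
genre = pd.DataFrame({'movieId': [10, 20, 30, 40],
                      'title': ['A', 'B', 'C', 'D']})

def make_user_movie_matrix(ratings, genre):
    """Return a users x genre-movies array with 0 meaning 'not rated'."""
    # Keep only ratings of movies that belong to the genre
    in_genre = ratings[ratings['movieId'].isin(genre['movieId'])]
    table = in_genre.pivot_table(index='userId', columns='movieId',
                                 values='rating', aggfunc='mean')
    # Align columns with the row order of the genre file, so that
    # column j corresponds to row j of genre; unrated entries become 0
    table = table.reindex(columns=genre['movieId'].to_numpy())
    return table.fillna(0).to_numpy()

toy = make_user_movie_matrix(ratings, genre)
print(toy)
```

Aligning the columns with the genre file's row order is what lets a column index be used directly to look up a title later. Users with no ratings in the genre do not appear as rows in this sketch.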

### Load the user-movie matrix and the genre's movie list

In [1]:
import pandas as pd
import numpy as np

userMovie = np.load('userMovieMatrix.npy')

numberUsers, numberGenreMovies = userMovie.shape

genreFilename = 'action.csv'
genre = pd.read_csv(genreFilename)

### Create Colley ratings

Note: each movie competes within a single user's ratings. The games are therefore played row-wise: every pair of nonzero entries in a row is one game, and the movie with the higher rating wins.
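
To make the row-wise games concrete, here is a small sketch that lists every game implied by a toy matrix. The matrix `toyUserMovie` and the helper `list_games` are made up for illustration and are not part of the notebook.

```python
from itertools import combinations
import numpy as np

# Hypothetical toy data: 3 users x 3 movies, 0 means "not rated"
toyUserMovie = np.array([[5, 3, 0],
                         [4, 0, 4],
                         [0, 2, 5]])

def list_games(userMovie):
    """Return (user, movieJ, movieK, outcome) for each pair of movies
    rated by the same user."""
    games = []
    for user, row in enumerate(userMovie):
        rated = np.flatnonzero(row)          # movies this user rated
        for j, k in combinations(rated, 2):  # every pair is one game
            if row[j] > row[k]:
                outcome = 'movieJ wins'
            elif row[j] < row[k]:
                outcome = 'movieK wins'
            else:
                outcome = 'tie'
            games.append((user, int(j), int(k), outcome))
    return games

for game in list_games(toyUserMovie):
    print(game)
```

Each user here rated two movies, so each row contributes exactly one game. A user who rated n movies contributes n(n-1)/2 games.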

In [2]:
# Colley system C r = b: start from C = 2I and b = 1, then update both for every game
colleyMatrix = 2*np.eye(numberGenreMovies)
b = np.ones(numberGenreMovies)

for i in range(numberUsers):
    for j in range(numberGenreMovies):
        if (userMovie[i,j] != 0): # then there are games
            for k in range(j+1,numberGenreMovies):
                team1ID = j
                team1Score = userMovie[i,j]
                if (userMovie[i,k] != 0): # then there is a game between movie j and k
                    team2ID = k
                    team2Score = userMovie[i,k]

                    colleyMatrix[team1ID, team2ID] -= 1
                    colleyMatrix[team2ID, team1ID] -= 1

                    colleyMatrix[team1ID, team1ID] += 1
                    colleyMatrix[team2ID, team2ID] += 1

                    if team1Score > team2Score:
                        b[team1ID] += 1/2
                        b[team2ID] -= 1/2
                    elif team1Score < team2Score:
                        b[team1ID] -= 1/2
                        b[team2ID] += 1/2
                    # A tie counts as half a win and half a loss for both movies,
                    # so b is unchanged and no else branch is needed.
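
The triple loop above can be slow when users have rated many movies. As a possible alternative, the sketch below keeps the loop over users but applies all of one user's games at once with NumPy indexing. The function name `colley_system` and the toy matrix are assumptions for illustration; the update rule is the same as in the cell above.

```python
import numpy as np

def colley_system(userMovie):
    """Build the Colley matrix C and right-hand side b from row-wise games."""
    numberMovies = userMovie.shape[1]
    C = 2.0 * np.eye(numberMovies)
    b = np.ones(numberMovies)
    for row in userMovie:
        rated = np.flatnonzero(row)       # movies this user rated
        n = len(rated)
        if n < 2:
            continue                      # fewer than two ratings: no games
        # Subtract 1 for every pair (this also hits the diagonal) ...
        C[np.ix_(rated, rated)] -= 1
        # ... then add n on the diagonal, a net change of n - 1 games played
        C[rated, rated] += n
        scores = row[rated].astype(float)
        # outcome[p, q] is +1 if movie p beat q, -1 if it lost, 0 for a tie
        outcome = np.sign(scores[:, None] - scores[None, :])
        b[rated] += 0.5 * outcome.sum(axis=1)
    return C, b

# Hypothetical toy data: 3 users x 3 movies, 0 means "not rated"
toyUserMovie = np.array([[5, 3, 0],
                         [4, 0, 4],
                         [0, 2, 5]])
C, b = colley_system(toyUserMovie)
r = np.linalg.solve(C, b)
print(C)
print(b)
print(r)   # approximately [0.6, 0.3, 0.6]
```

In this toy case movie 0 and movie 2 each win one game and tie one, while movie 1 loses both of its games. Colley ratings always average to 1/2, which makes `r.mean()` a quick sanity check on the assembled system.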

### Sort and print the ranking of movies

In [None]:
r = np.linalg.solve(colleyMatrix,b)
iSort = np.argsort(-r)

# Movies with no ratings are skipped when printing; a column sum of zero means no user rated the movie
totalMovieRatings = np.sum(userMovie,0)

print('\n\n************** Colley Rating Method (with ties) **************\n')
print('===========================')
print('Rank       Rating  Movie   ')
print('===========================')
rank = 1
for i in range(numberGenreMovies):
    if (totalMovieRatings[iSort[i]] != 0):  # if the movie has at least 1 rating
        print(f'{rank:4d} {r[iSort[i]]:10.3f}  {genre.at[iSort[i],"title"]}')        
        rank += 1
        
print()   # blank line

print('Total films with ratings: %d' % np.count_nonzero(totalMovieRatings))
print('Total films in genre: %d' % numberGenreMovies)