# Movie Ratings: Movie Association Metrics

In the **Popularity Ratings** notebook we found that the most popular movie among the users is **"Toy Story"**. In this notebook we dig a little deeper and compute movie association scores for **"Toy Story"**.

1) First, we will calculate a **simple association metric** between "Toy Story" and each of the other movies. This metric shows how often a movie was rated together with **"Toy Story"**: it is the share of the movie's raters who also rated **"Toy Story"**. We will display the top 3 movies by this share.

\begin{equation*}
AM = \frac {\#(X \cap Y)} {\#(X)}
\end{equation*}

${\#(X \cap Y)}$ - number of times a selected movie and "Toy Story" were rated together by a user 

${\#(X)}$ - total number of ratings of the selected movie 
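
To make the definition concrete, here is a minimal sketch that computes the metric from two sets of user IDs. The user IDs and the helper name `simple_association` are hypothetical; they are not taken from the dataset or from the notebook's own code.

```python
def simple_association(raters_x, raters_y):
    """Share of movie X's raters who also rated movie Y: #(X and Y) / #(X)."""
    raters_x, raters_y = set(raters_x), set(raters_y)
    if not raters_x:
        raise ValueError("movie X has no ratings")
    return len(raters_x & raters_y) / len(raters_x)


# Hypothetical user IDs, for illustration only
toy_story_users = {1, 2, 3, 4, 5}
other_movie_users = {2, 3, 4, 9}

# 3 of the 4 users who rated the other movie also rated "Toy Story"
am = simple_association(other_movie_users, toy_story_users)
print(f"AM = {am:.2f}")
```

Note that the metric is not symmetric: swapping the two arguments divides by the number of "Toy Story" raters instead.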


2) Next, we will calculate the **lift metric** of each movie in relation to **"Toy Story"**. The **lift metric** indicates whether there is a relationship between two items.

\begin{equation*}
LM = \frac {P(A \cap B)} {P(A) P(B)}
\end{equation*}

In general, the **lift metric** is interpreted as follows:

- If the lift is close to **1**, the products are unrelated: the purchase of one is unlikely to affect the purchase of the other.

- If the lift is greater than **1**, the products may complement one another: if one has been purchased, the other is more likely to be bought as well.

- If the lift is smaller than **1**, the products are likely to be substitutes: if one is bought, the other is unlikely to be bought with it.


3) Lastly, we will calculate the movie ratings correlation matrix. We will display the vector of correlations between **"Toy Story"** and each movie in the movie list, then show the 3 movies with the highest correlation coefficient and the 3 movies with the lowest correlation coefficient. We will use the **Pearson correlation formula** for the computations.


In [17]:
# Settings 
import os
import numpy as np
import pandas as pd
import sqlite3
from sqlite3 import Error as SQLiteError

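# Pandas: display floats with 4 decimal places
# (the full option name avoids an ambiguous match in newer pandas versions)
pd.set_option('display.precision', 4)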

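# SQLite
dbfile = "sqlitedb/movielens.db"
if not os.path.isfile(dbfile):
    # sqlite3.connect() would otherwise silently create an empty database
    raise FileNotFoundError("Failed to detect the database file: " + dbfile)

# Establish DB Connection
try:
    conn = sqlite3.connect(dbfile)
except SQLiteError as err:
    print("Failed to establish DB connection:", err)
    raise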

In [18]:
# Data Query
query = """
    SELECT r.movie_id AS MovieID, 
        m.movie_name AS MovieName,
        r.user_id AS UserID,
        CASE u.user_gender
            WHEN 0 THEN 'M'
            ELSE 'F'
        END Gender,
        r.rating as Rating
        
    FROM ratings AS r
        LEFT JOIN movies as m ON m.movie_id = r.movie_id
        LEFT JOIN users as u ON u.user_id = r.user_id
    ORDER BY r.movie_id, r.user_id
"""

summary = pd.read_sql_query(query, conn)

print("\nSummary DataFrame:\n")
summary.head(n=5)


Summary DataFrame:



,MovieID,MovieName,UserID,Gender,Rating
0,1,Toy Story,139,M,2
1,1,Toy Story,755,M,2
2,1,Toy Story,1577,F,4
3,1,Toy Story,1940,M,4
4,1,Toy Story,2765,M,4


## 1. Association Metrics Data

To solve the task, it is convenient to add an extra column to our summary dataset: **RatedToyStory**. If the user rated "Toy Story", then **RatedToyStory** = 1; otherwise **RatedToyStory** = 0.


In [3]:
# IDs of users that rated "Toy Story"
toy_story_raters = set(summary[summary['MovieName'] == 'Toy Story']['UserID'])

# Data Summary for Computing Association Metrics
association_summary = summary[summary['MovieName'] != 'Toy Story'].filter(['MovieName', 'UserID', 'Rating'])

# Adding new column 'RatedToyStory'
association_summary['RatedToyStory'] = association_summary['UserID']\
    .isin(toy_story_raters).astype(int)

association_summary.head(n=5)


,MovieName,UserID,Rating,RatedToyStory
17,Babe,139,2,1
18,Babe,1577,3,1
19,Babe,3048,5,1
20,Babe,3823,2,1
21,Babe,4117,5,1


## 2. Computing Simple Association Metrics

In this section we find out which movies were most often rated together with **"Toy Story"**.


### 2.1. Toy Story: Simple Association Metrics

In [4]:
simple_association_summary = association_summary[['MovieName', 'RatedToyStory']]\
    .groupby('MovieName')\
    .agg(
        # users who rated both this movie and "Toy Story"
        toy_story_ratings=('RatedToyStory', 'sum'),
        # all users who rated this movie
        total_ratings=('RatedToyStory', 'count'),
        # share of this movie's raters who also rated "Toy Story"
        ratings_proportion=('RatedToyStory', 'mean')
    )\
    .sort_values(['ratings_proportion'], ascending=False)


simple_association_summary

MovieName,toy_story_ratings,total_ratings,ratings_proportion
Independence Day (ID4),13,13,1.0
Star Wars: Episode IV - A New Hope,14,15,0.9333
Star Wars: Episode VI - Return of the Jedi,13,14,0.9286
Total Recall,11,12,0.9167
Groundhog Day,11,12,0.9167
Pulp Fiction,10,11,0.9091
"Sixth Sense, The",10,12,0.8333
Schindler's List,10,12,0.8333
Stand by Me,9,11,0.8182
Raiders of the Lost Ark,9,11,0.8182


### 2.2. Top 3 Movies: Simple Association Metrics with "Toy Story"

In [5]:
simple_association_summary.head(n=3)

MovieName,toy_story_ratings,total_ratings,ratings_proportion
Independence Day (ID4),13,13,1.0
Star Wars: Episode IV - A New Hope,14,15,0.9333
Star Wars: Episode VI - Return of the Jedi,13,14,0.9286


## 3. Computing Lift Metrics

The **lift metric** helps us estimate how much more likely a user is to watch a movie "X" given that they have already watched a movie "Y".

### 3.1. Movie Lift Metrics Related to "Toy Story"

In order to compute the **lift metric** for a movie we need to find:

1) Total number of users;

2) Number of users that watched the movie for which we are computing the **lift metric**;

3) Number of users that watched "Toy Story";

4) Number of users that watched both "Toy Story" and the movie for which we are computing the **lift metric**.


To compute the **total number of users**, the easiest approach is to run a SQL query on the **users** table.

In [6]:
# Total Number of Users

# SQL Query
query = """
    SELECT count(user_id) as total_users FROM users
    """

total_users = pd.read_sql_query(query, conn).iloc[0,0]
print("Total number of users is equal to {0}.".format(total_users))

Total number of users is equal to 20.


Next we compute the number of users who rated "Toy Story", which equals the number of "Toy Story" ratings. Note that we already have the set of "Toy Story" raters from section **1**.

In [7]:
number_of_toy_story_raters = len(toy_story_raters)
print("The number of users who rated 'Toy Story' is {0}.".format(number_of_toy_story_raters))

The number of users who rated 'Toy Story' is 17.


Now we are ready to produce the **lift metric** summary table. Before moving on to the Python code, here is a brief explanation of how the metric is computed. With A standing for the movie being scored and B for "Toy Story", the formula is as follows:

\begin{equation*}
LM = \frac {P(A \cap B)} {P(A) P(B)}
\end{equation*}

${P(A \cap B)} = $ **Number of Users Who Watched A and B / Total Number of Users**

${P(A)} = $ **Number of Users Who Watched A / Total Number of Users**

${P(B)} = $ **Number of Users Who Watched B / Total Number of Users**

**LM = (Number of Users Who Watched A and B x Total Number of Users) / (Number of Users Who Watched A x Number of Users Who Watched B)**
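
As a quick check of the count-based formula, the sketch below plugs in the figures already reported in this notebook for "Independence Day (ID4)": 13 users rated it, all 13 of them also rated "Toy Story", 17 users rated "Toy Story", and there are 20 users in total. The function name `lift` is an illustrative helper and not part of the notebook's own code.

```python
def lift(n_both, n_a, n_b, n_total):
    """Lift = P(A and B) / (P(A) * P(B)), written in terms of user counts."""
    if min(n_a, n_b, n_total) <= 0:
        raise ValueError("n_a, n_b and n_total must be positive")
    return (n_both * n_total) / (n_a * n_b)


# Figures reported earlier in this notebook for "Independence Day (ID4)"
id4_lift = lift(n_both=13, n_a=13, n_b=17, n_total=20)
print(round(id4_lift, 4))
```

Because every rater of this movie also rated "Toy Story", the value reduces to 20 / 17, the largest lift that any movie can reach against "Toy Story" in this dataset.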

In [8]:
# Lift Metrics Summary

lift_metrics_summary = association_summary[['MovieName', 'RatedToyStory']]\
    .groupby('MovieName')\
    .agg(
        toy_story_ratings = ('RatedToyStory', lambda x: x.sum()),
        total_ratings = ('RatedToyStory', lambda x: x.count()),
        lift_metrics = ('RatedToyStory', lambda x: x.sum() * total_users/number_of_toy_story_raters/x.count())
    )\
    .sort_values(['lift_metrics'], ascending=False)

lift_metrics_summary

MovieName,toy_story_ratings,total_ratings,lift_metrics
Independence Day (ID4),13,13,1.1765
Star Wars: Episode IV - A New Hope,14,15,1.098
Star Wars: Episode VI - Return of the Jedi,13,14,1.0924
Total Recall,11,12,1.0784
Groundhog Day,11,12,1.0784
Pulp Fiction,10,11,1.0695
"Sixth Sense, The",10,12,0.9804
Schindler's List,10,12,0.9804
Stand by Me,9,11,0.9626
Raiders of the Lost Ark,9,11,0.9626


### 3.2. Top 3 Movies by "Toy Story" Lift Metrics

In [9]:
# Top 3 Movies by "Toy Story" Lift Metrics

lift_metrics_summary.head(n=3)

MovieName,toy_story_ratings,total_ratings,lift_metrics
Independence Day (ID4),13,13,1.1765
Star Wars: Episode IV - A New Hope,14,15,1.098
Star Wars: Episode VI - Return of the Jedi,13,14,1.0924


### 3.3. Last 3 Movies by "Toy Story" Lift Metrics

It is also interesting to see the last three movies according to the "Toy Story" lift metric.

In [10]:
# Last 3 Movies by "Toy Story" Lift Metrics

lift_metrics_summary.tail(n=3)

MovieName,toy_story_ratings,total_ratings,lift_metrics
Shakespeare in Love,8,11,0.8556
Saving Private Ryan,8,11,0.8556
Forrest Gump,7,10,0.8235


## 4. Movie Ratings Correlations

In this section we compute the correlations among movie rating vectors. A movie rating vector is the vector of user ratings for a particular movie. What does a movie rating vector look like? To find out, we first derive the movie ratings matrix by cross tabulation (crosstab). The rows correspond to the movie names, the columns to the user IDs, and the cells hold the ratings.

If the movie was not rated by the user, we will give it a rating of **0**.

In [11]:
# Movie Ratings Matrix
movie_ratings_matrix = pd.crosstab(summary.MovieName, summary.UserID, 
                                      values=summary.Rating,
                                      aggfunc='sum'
                                  ).fillna(0)
movie_ratings_matrix

MovieName / UserID,139,755,1202,1577,1940,2765,3048,3118,3823,4117,4388,4489,4656,4790,4796,5277,5347,5448,5450,6037
Babe,2.0,0.0,0.0,3.0,0.0,0.0,5.0,0.0,2.0,5.0,2.0,0.0,1.0,0.0,4.0,0.0,0.0,2.0,4.0,0.0
Blade Runner,0.0,2.0,4.0,4.0,0.0,0.0,4.0,2.0,0.0,0.0,3.0,5.0,0.0,0.0,0.0,0.0,2.0,3.0,0.0,0.0
Forrest Gump,2.0,2.0,4.0,0.0,0.0,0.0,1.0,3.0,4.0,0.0,0.0,2.0,0.0,0.0,1.0,0.0,0.0,3.0,5.0,0.0
Gladiator,0.0,4.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,4.0,1.0,0.0,5.0,4.0,2.0,2.0,2.0,4.0,5.0,0.0
Groundhog Day,2.0,0.0,0.0,1.0,3.0,0.0,0.0,3.0,1.0,5.0,5.0,5.0,2.0,0.0,5.0,2.0,4.0,0.0,0.0,0.0
Independence Day (ID4),3.0,5.0,0.0,4.0,4.0,2.0,0.0,2.0,4.0,1.0,1.0,0.0,3.0,0.0,0.0,2.0,3.0,0.0,2.0,0.0
"Matrix, The",5.0,4.0,3.0,1.0,0.0,2.0,0.0,5.0,0.0,0.0,0.0,4.0,0.0,2.0,2.0,0.0,0.0,1.0,1.0,4.0
Pulp Fiction,1.0,0.0,0.0,0.0,0.0,4.0,5.0,0.0,4.0,0.0,4.0,2.0,3.0,3.0,0.0,0.0,3.0,2.0,0.0,2.0
Raiders of the Lost Ark,0.0,0.0,0.0,1.0,5.0,0.0,2.0,0.0,0.0,0.0,1.0,3.0,3.0,0.0,3.0,3.0,5.0,5.0,1.0,0.0
Saving Private Ryan,0.0,2.0,4.0,0.0,5.0,3.0,0.0,0.0,1.0,4.0,3.0,3.0,0.0,1.0,2.0,0.0,0.0,5.0,0.0,0.0


Now we can answer what a movie rating vector looks like: it is the row of the movie ratings matrix that corresponds to one of the movies. Let's extract the **"Toy Story"** rating vector.

In [12]:
# Toy Story Rating Vector
movie_ratings_matrix.loc[["Toy Story"], :]

MovieName / UserID,139,755,1202,1577,1940,2765,3048,3118,3823,4117,4388,4489,4656,4790,4796,5277,5347,5448,5450,6037
Toy Story,2.0,2.0,0.0,4.0,4.0,4.0,4.0,3.0,3.0,4.0,2.0,2.0,2.0,2.0,0.0,1.0,2.0,0.0,5.0,2.0


As mentioned above, it is simply the vector of user ratings for a particular movie, in our case **"Toy Story"**, with **0** standing for a missing rating.

### 4.1. Movie Rating Correlation Matrix

In the introductory part of this section we built the movie ratings matrix, so now we can easily compute the movie correlation matrix. We hope that this matrix shows how the movies are related to each other. Before moving on with the calculations, let's recall what a correlation coefficient is.


**Pearson Correlation Coefficient** is a statistic that measures the linear dependence between two variables **X** and **Y**. It takes values from **-1 to 1**:

- 1 shows a total positive linear dependence;
- 0 shows the absence of linear dependence;
- (-1) shows a total negative linear dependence.
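
For reference, the coefficient can be computed directly from its definition: the covariance of the two variables divided by the product of their standard deviations. The sketch below is a plain-Python illustration; the helper name `pearson` and the two toy vectors are hypothetical and are not taken from the dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of centered products and centered squares
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical vectors: one perfectly increasing pair, one perfectly decreasing
r_up = pearson([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_down = pearson([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
print(r_up, r_down)
```

The two calls correspond to the two extreme cases in the list above: total positive and total negative linear dependence.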

In [13]:
# Movie Correlation Matrix

movie_correlation_matrix = movie_ratings_matrix.T.corr()

# Movie Correlation Matrix for the First Five Movies
movie_correlation_matrix.iloc[:5, :5]

MovieName,Babe,Blade Runner,Forrest Gump,Gladiator,Groundhog Day
Babe,1.0,0.0081,0.113,0.2373,0.1302
Blade Runner,0.0081,1.0,0.174,-0.2451,0.0427
Forrest Gump,0.113,0.174,1.0,0.0796,-0.3058
Gladiator,0.2373,-0.2451,0.0796,1.0,-0.1326
Groundhog Day,0.1302,0.0427,-0.3058,-0.1326,1.0


### 4.2. Movie Correlation with "Toy Story"

Once we have the full movie correlation matrix, we can easily find how other movies correlate with **"Toy Story"**.

In [14]:
# Movie Correlations with Toy Story

correlations_with_toy_story = \
    movie_correlation_matrix.loc[:, ['Toy Story']].sort_values(['Toy Story'], ascending=False)
correlations_with_toy_story

MovieName,Toy Story
Toy Story,1.0
"Shawshank Redemption, The",0.5238
Independence Day (ID4),0.3778
Total Recall,0.3164
Babe,0.3156
Star Wars: Episode IV - A New Hope,0.1607
Pulp Fiction,0.0982
Star Wars: Episode VI - Return of the Jedi,0.0618
Forrest Gump,-0.0605
Groundhog Day,-0.0935


### 4.3. Top 3 Movies by Correlation with Toy Story

In [20]:
# Top 3 Movies by Correlation with Toy Story
correlations_with_toy_story.iloc[1:, :].head(n=3)

MovieName,Toy Story
"Shawshank Redemption, The",0.5238
Independence Day (ID4),0.3778
Total Recall,0.3164


### 4.4. Last 3 Movies by Correlation with Toy Story

It is also of interest to see the bottom three movies by correlation with "Toy Story". We will be dealing here with negative correlations.

In [21]:
# Last 3 Movies by Correlation with Toy Story
correlations_with_toy_story.iloc[1:, :].tail(n=3)

MovieName,Toy Story
Schindler's List,-0.2672
"Silence of the Lambs, The",-0.2872
Shakespeare in Love,-0.5601


## 5. Summary

We are all done! Let's summarize the results. 

I will start with the technical result: the **Association Metrics** and the **Lift Metrics** rank the movies identically, because the formula for the **Lift Metrics** can be expressed as

$$ LM = k \cdot AM $$

The coefficient **k** in our particular case:

**k = Total Number of Users / Users that Watched the "Toy Story"**

If you compare the movie rankings by the **Association Metrics** with the rankings by the **Lift Metrics**, you will see that they are identical: multiplying every score by the same positive constant **k** does not change the order.
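
The scaling argument can also be checked numerically. The sketch below is only an illustration: it uses a few (rated with "Toy Story", total ratings) pairs copied from the tables above together with the reported totals of 20 users and 17 "Toy Story" raters, and the variable names are my own.

```python
n_users = 20       # total number of users reported above
n_toy_story = 17   # number of "Toy Story" raters reported above
k = n_users / n_toy_story

# (rated together with "Toy Story", total ratings) from the tables above
counts = {
    "Independence Day (ID4)": (13, 13),
    "Star Wars: Episode IV - A New Hope": (14, 15),
    "Sixth Sense, The": (10, 12),
    "Forrest Gump": (7, 10),
}

am = {m: both / total for m, (both, total) in counts.items()}
lm = {m: both * n_users / (total * n_toy_story)
      for m, (both, total) in counts.items()}

# LM is AM scaled by the constant k, so both metrics sort the movies identically
assert all(abs(lm[m] - k * am[m]) < 1e-12 for m in counts)
rank_am = sorted(counts, key=am.get, reverse=True)
rank_lm = sorted(counts, key=lm.get, reverse=True)
print(rank_am == rank_lm)
```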

Next let's check the **top 3** movies. We already know that whether we use the **Lift Metrics** or the **Association Metrics**, we get the same result.

Our top 3 movies are as follows:

1) "Independence Day";

2) "Star Wars: Episode IV - A New Hope";

3) "Star Wars: Episode VI - Return of the Jedi".

All the movies listed above are quite popular family-friendly movies, so it is no surprise that their **Association Metrics with "Toy Story"** were the highest ("Toy Story" is also one of the popular family movies).

Let's now turn to the **Movie Correlation Matrix**. What does it tell us? Roughly, if the correlation coefficient between two movies is positive and differs significantly from zero, then our audience rated movie **A** similarly to movie **B**. If the correlation coefficient is significantly negative, then users who liked movie **A** tended not to like movie **B**, and users who liked movie **B** tended not to like movie **A**.

OK, let's try to explain our results (for me this is not an easy exercise!). The top 3 movies most closely correlated with "Toy Story" are:

1) "The Shawshank Redemption" (0.5238)

2) "Independence Day" (0.3778)

3) "Total Recall" (0.3164)

I cannot see an obvious relation between "Toy Story" and "The Shawshank Redemption", so let's look at the ratings of both movies.


In [22]:
movie_ratings_matrix.loc[['Toy Story', 'Shawshank Redemption, The'],:]

MovieName / UserID,139,755,1202,1577,1940,2765,3048,3118,3823,4117,4388,4489,4656,4790,4796,5277,5347,5448,5450,6037
Toy Story,2.0,2.0,0.0,4.0,4.0,4.0,4.0,3.0,3.0,4.0,2.0,2.0,2.0,2.0,0.0,1.0,2.0,0.0,5.0,2.0
"Shawshank Redemption, The",0.0,0.0,1.0,5.0,5.0,5.0,5.0,0.0,4.0,4.0,0.0,4.0,0.0,0.0,0.0,2.0,0.0,1.0,0.0,0.0


From the results we can see two things that affected the value of the correlation coefficient:

1. In the majority of cases, users who liked "The Shawshank Redemption" also liked "Toy Story".

2. In the majority of cases, users who didn't like one of the movies didn't like or didn't watch the other.

Let's move on to "Independence Day". This movie has a correlation coefficient of 0.38 with "Toy Story". Both movies were quite popular among our users, and both can be assigned to the "family friendly" category, so it is reasonable to expect a positive correlation between their ratings.

...and again, let's look at the rating results!

In [23]:
movie_ratings_matrix.loc[['Toy Story', 'Independence Day (ID4)'],:]

MovieName / UserID,139,755,1202,1577,1940,2765,3048,3118,3823,4117,4388,4489,4656,4790,4796,5277,5347,5448,5450,6037
Toy Story,2.0,2.0,0.0,4.0,4.0,4.0,4.0,3.0,3.0,4.0,2.0,2.0,2.0,2.0,0.0,1.0,2.0,0.0,5.0,2.0
Independence Day (ID4),3.0,5.0,0.0,4.0,4.0,2.0,0.0,2.0,4.0,1.0,1.0,0.0,3.0,0.0,0.0,2.0,3.0,0.0,2.0,0.0


I would also like to refer to the **Mean Ratings** summary and point out that both movies have very close mean ratings:

- "Toy Story" (2.82)
- "Independence Day" (2.77)

Another interesting case is the negative correlation between "Toy Story" and "Shakespeare in Love". I suggest looking at 3 indicators that we can pick up from the previous analysis:

- mean rating 
- popularity rating 
- positivity rating

For "Toy Story" we have (2.82, 17, 0.35); for "Shakespeare in Love", (2.90, 11, 0.27).
Let's also look at the actual ratings.

In [25]:
movie_ratings_matrix.loc[['Toy Story', 'Shakespeare in Love'],:]

MovieName / UserID,139,755,1202,1577,1940,2765,3048,3118,3823,4117,4388,4489,4656,4790,4796,5277,5347,5448,5450,6037
Toy Story,2.0,2.0,0.0,4.0,4.0,4.0,4.0,3.0,3.0,4.0,2.0,2.0,2.0,2.0,0.0,1.0,2.0,0.0,5.0,2.0
Shakespeare in Love,3.0,2.0,5.0,0.0,0.0,0.0,0.0,0.0,5.0,2.0,0.0,1.0,0.0,3.0,5.0,3.0,0.0,1.0,0.0,2.0


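In conclusion, I would like to point out that this implementation of the correlation matrix was ineffective and gave rather contradictory results. One of its shortcomings was the replacement of missing ratings with zeros, which makes the coefficient mix two different effects: whether a user rated a movie at all, and how highly they rated it.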

It would probably be better to calculate two correlation matrices instead of one: the first based on rating flags (rated / didn't rate) for all of the movies, and the second based on the actual ratings of the users who have seen both movies. The first matrix would reflect movie choice correlations, and the second would reflect the correlation of the actual ratings.
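
A minimal sketch of this idea is given below. It assumes the three rating vectors shown earlier in this section, typed in by hand with 0 meaning "not rated", and it relies on the fact that pandas computes correlations pairwise while skipping missing values. The variable names and the `min_periods=3` threshold are arbitrary choices for illustration, not a tested recommendation.

```python
import pandas as pd

# Rating vectors copied from the tables above (0 means "not rated")
ratings = pd.DataFrame({
    "Toy Story":
        [2, 2, 0, 4, 4, 4, 4, 3, 3, 4, 2, 2, 2, 2, 0, 1, 2, 0, 5, 2],
    "Shawshank Redemption, The":
        [0, 0, 1, 5, 5, 5, 5, 0, 4, 4, 0, 4, 0, 0, 0, 2, 0, 1, 0, 0],
    "Shakespeare in Love":
        [3, 2, 5, 0, 0, 0, 0, 0, 5, 2, 0, 1, 0, 3, 5, 3, 0, 1, 0, 2],
})

# 1) Movie choice: correlate the rated / didn't rate flags
rated_flags = (ratings > 0).astype(int)
choice_corr = rated_flags.corr()

# 2) Actual ratings: treat 0 as missing, so that each pair of movies is
#    compared only on the users who rated both of them
observed = ratings.where(ratings > 0)
rating_corr = observed.corr(min_periods=3)  # arbitrary minimum overlap

print(choice_corr.round(4))
print(rating_corr.round(4))
```

With only a handful of co-rating users per pair, the second matrix rests on very few observations, so its values should be read with caution.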