# Recommending Movies

-------------------------------------------------

The raw code for this Jupyter notebook is by default hidden for easier reading. The main focus of this particular page of the notebook is on the graphs and their interpretation. To toggle on/off the raw code, click below:

In [1]:
# Setup Code toggle button
from IPython.core.display import HTML  

HTML(''' 
<center><h3>
<a href="javascript:code_toggle()">Talk is cheap, show me the code.</a>
</center></h3>
<script>
    var code_show=true; //true -> hide code at first

    function code_toggle() {
        $('div.prompt').hide(); // always hide prompt

        if (code_show){
            $('div.input').hide();
        } else {
            $('div.input').show();
        }
        code_show = !code_show
    }
    $( document ).ready(code_toggle);
</script>
''')

In [2]:
# Setup notebook theme
from jupyterthemes import get_themes
from jupyterthemes.stylefx import set_nb_theme
set_nb_theme(get_themes()[1])

In [1]:
# Load R magic
%load_ext rpy2.ipython

&nbsp;

## Get the Data

This time we are skipping Python and going streight into R. The data is provided in tab seperated files which can easily be read into an R dataframe. Unfortuantely Python dataframes print with infinitly better formatting than R though. It makes the data much easier to inspect.

&nbsp;

In [12]:
%%R

u.user <- read.delim("../data/u.user",
                     sep="|",
                     col.names=c("user.id", "age", "gender", "occupation", "zip.code")
                    )

u.user

    user.id age gender    occupation zip.code
1         2  53      F         other    94043
2         3  23      M        writer    32067
3         4  24      M    technician    43537
4         5  33      F         other    15213
5         6  42      M     executive    98101
6         7  57      M administrator    91344
7         8  36      M administrator    05201
8         9  29      M       student    01002
9        10  53      M        lawyer    90703
10       11  39      F         other    30329
11       12  28      F         other    06405
12       13  47      M      educator    29206
13       14  45      M     scientist    55106
14       15  49      F      educator    97301
15       16  21      M entertainment    10309
16       17  30      M    programmer    06355
17       18  35      F         other    37212
18       19  40      M     librarian    02138
19       20  42      F     homemaker    95660
20       21  26      M        writer    30068
21       22  25      M        writ

In [16]:
%%R

u.data <- read.delim("../data/u.data",
                     sep="\t",
                     col.names=c("user.id", "item.id", "rating", "timestamp")
                    )

u.data

      user.id item.id rating timestamp
1         186     302      3 891717742
2          22     377      1 878887116
3         244      51      2 880606923
4         166     346      1 886397596
5         298     474      4 884182806
6         115     265      2 881171488
7         253     465      5 891628467
8         305     451      3 886324817
9           6      86      3 883603013
10         62     257      2 879372434
11        286    1014      5 879781125
12        200     222      5 876042340
13        210      40      3 891035994
14        224      29      3 888104457
15        303     785      3 879485318
16        122     387      5 879270459
17        194     274      2 879539794
18        291    1042      4 874834944
19        234    1184      2 892079237
20        119     392      4 886176814
21        167     486      4 892738452
22        299     144      4 877881320
23        291     118      2 874833878
24        308       1      4 887736532
25         95     546    

In [19]:
%%R

u.item <- read.delim("../data/u.item",
                     sep="|",
                     col.names=c("movie.id", "movie.title", "release.date",
                                 "video.release.date", "IMDB.URL", "unknown",
                                 "action", "adventure", "animation", "children",
                                 "comedy", "crime", "documentary", "drama",
                                 "fantasy", "film-noir", "horror", "musical",
                                 "mystery", "romance", "sci-fi", "thriller",
                                 "war", "western"
                                )
                    )

u.item

     movie.id
1           2
2           3
3           4
4           5
5           6
6           7
7           8
8           9
9          10
10         11
11         12
12         13
13         14
14         15
15         16
16         17
17         18
18         19
19         20
20         21
21         22
22         23
23         24
24         25
25         26
26         27
27         28
28         29
29         30
30         31
31         32
32         33
33         34
34         35
35         36
36         37
37         38
38         39
39         40
40         41
41         42
42         43
43         44
44         45
45         46
46         47
47         48
48         49
49         50
50         51
51         52
52         53
53         54
54         55
55         56
56         57
57         58
58         59
59         60
60         61
61         62
62         63
63         64
64         65
65         66
66         67
67         68
68         69
69         70
70         71
71    

&nbsp;

## Find the 3 users who are closest to you

Using the metrics:

*  age
*  gender
*  occupation

Dataframes can be subsetted based on values in each row (column values) using the pattern:

```R
dataframe[ dataframe$column.id == x, ]
```

&nbsp;

In [27]:
%%R

u.user[ u.user$age == 29 & u.user$gender == 'M' & u.user$occupation == 'programmer', ]

    user.id age gender occupation zip.code
44       45  29      M programmer    50233
221     222  29      M programmer    27502


&nbsp;

Only two hits, lets throw in a scientist then.

&nbsp;

In [31]:
%%R

u.user[ u.user$age == 29 & u.user$gender == 'M' & u.user$occupation == 'scientist', ]

    user.id age gender occupation zip.code
482     483  29      M  scientist    43212


&nbsp;

Users 45, 222, and 483 it is.

## Inspecting User 45

Now to choose one from them as a substitute "me" which one do I identitify most with based on their movie rattings?

&nbsp;

In [111]:
%%R

movie.lookup <- function(x) {
    1+1
}

umovies <- function(x) {
    #movie <- u.item[ u.item$movie.id == x[2] ]
    #paste(x[4], sep=" ")
    paste("Moves", x[3], sep=" ")
}

In [112]:
%%R

u45.data <- u.data[ u.data$user.id == 45, ]
by(u45.data, u45.data, umovies)

user.id: 45
item.id: 1
rating: 2
timestamp: 880996629
[1] NA
------------------------------------------------------------ 
user.id: 45
item.id: 7
rating: 2
timestamp: 880996629
[1] NA
------------------------------------------------------------ 
user.id: 45
item.id: 13
rating: 2
timestamp: 880996629
[1] NA
------------------------------------------------------------ 
user.id: 45
item.id: 15
rating: 2
timestamp: 880996629
[1] NA
------------------------------------------------------------ 
user.id: 45
item.id: 21
rating: 2
timestamp: 880996629
[1] NA
------------------------------------------------------------ 
user.id: 45
item.id: 24
rating: 2
timestamp: 880996629
[1] NA
------------------------------------------------------------ 
user.id: 45
item.id: 25
rating: 2
timestamp: 880996629
[1] NA
------------------------------------------------------------ 
user.id: 45
item.id: 50
rating: 2
timestamp: 880996629
[1] NA
------------------------------------------------------------ 
user.id: 4

In [52]:
%%R

u.data[ u.data$user.id == 45, ]

      user.id item.id rating timestamp
535        45      25      4 881014015
1504       45     109      5 881012356
1535       45     118      4 881014550
1921       45     763      2 881013563
2011       45     473      3 881014417
3679       45     472      3 881014417
5317       45     127      5 881007272
6160       45     121      4 881013563
7761       45    1061      2 881016056
8070       45     476      3 881015729
8262       45     276      5 881012184
9865       45      15      4 881012184
11471      45     100      5 881010742
11876      45     237      4 881008636
14556      45     952      4 881014247
15145      45       1      5 881013176
15225      45      21      3 881014193
17966      45    1060      3 881012184
18704      45     108      4 881014620
20876      45     823      4 881014785
21537      45     225      4 881014070
22447      45     762      4 881013563
24011      45     934      2 881015860
25704      45     820      4 881015860
25798      45     756    