
#### Copyright 2018 Google LLC.

# Introduction

We will create a movie recommendation system based on the [MovieLens](https://movielens.org/) dataset, available [here](http://grouplens.org/datasets/movielens/). The data consists of movie ratings (on a scale of 1 to 5).

## Outline
  1. Exploring the MovieLens Data (10 minutes)
  1. Preliminaries (25 minutes)
  1. Training a matrix factorization model (15 minutes)
  1. Inspecting the Embeddings (15 minutes)
  1. Regularization in matrix factorization (15 minutes)
  1. Softmax model training (30 minutes)

## Setup

Let's get started by importing the required packages.

In [1]:
# title Imports 
from __future__ import print_function

import numpy as np
import pandas as pd
import collections
from mpl_toolkits.mplot3d import Axes3D
from IPython import display
from matplotlib import pyplot as plt
import sklearn
import sklearn.manifold
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
tf.logging.set_verbosity(tf.logging.ERROR)

Instructions for updating:
non-resource variables are not supported in the long term


In [2]:
# Add some convenience functions to Pandas DataFrame.
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.3f}'.format

def mask(df, key, function):
  """Returns a filtered dataframe, by applying function to key"""
  return df[function(df[key])]

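def flatten_cols(df):
  """Joins multi-level column labels into single strings (modifies df in place)."""
  df.columns = [' '.join(col).strip() for col in df.columns.values]
  return df

# Note: this replaces pandas' built-in DataFrame.mask with the helper above.
pd.DataFrame.mask = mask
pd.DataFrame.flatten_cols = flatten_cols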
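
As a rough illustration of what the two helpers do, the sketch below calls them as plain functions on a tiny, made-up DataFrame (the `toy` data and its values are invented for this example and are not part of MovieLens). It skips the monkey-patching step so that it runs on its own.

```python
import pandas as pd

def mask(df, key, function):
  """Returns a filtered dataframe, by applying function to key."""
  return df[function(df[key])]

def flatten_cols(df):
  """Joins multi-level column labels into single strings."""
  df.columns = [' '.join(col).strip() for col in df.columns.values]
  return df

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    'user_id': [1, 1, 2, 2],
    'rating': [5, 3, 4, 2],
})

# Keep only the rows whose rating is at least 4.
high = mask(toy, 'rating', lambda x: x >= 4)
print(high)

# Aggregating with several functions yields two-level column labels;
# flatten_cols joins each pair into one string such as 'rating count'.
stats = flatten_cols(toy.groupby('user_id').agg({'rating': ['count', 'mean']}))
print(list(stats.columns))
```

Once the helpers are attached to `pd.DataFrame`, the same calls can be chained as methods, for example `toy.mask('rating', lambda x: x >= 4)`.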


In [3]:
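# Install Altair and activate its colab renderer.
# Note: GitHub no longer serves the unauthenticated git:// protocol; if the
# install below fails, try git+https://github.com/altair-viz/altair.git instead.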
print("Installing Altair...")
!pip install git+git://github.com/altair-viz/altair.git
import altair as alt
alt.data_transformers.enable('default', max_rows=None)
alt.renderers.enable('colab')
print("Done installing Altair.")

Installing Altair...
Collecting git+git://github.com/altair-viz/altair.git
  Cloning git://github.com/altair-viz/altair.git to /tmp/pip-req-build-nfqjujki
  Running command git clone -q git://github.com/altair-viz/altair.git /tmp/pip-req-build-nfqjujki
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Building wheels for collected packages: altair
  Building wheel for altair (PEP 517) ... done
  Created wheel for altair: filename=altair-4.2.0.dev0-py3-none-any.whl size=732855 sha256=6e8934443926ba0d8f3e226fa6d9fbaf1591f77cc84918e910e72ed99fd95393
  Stored in directory: /tmp/pip-ephem-wheel-cache-tliu_5s9/wheels/06/13/e0/5bd72c969fe3954ee1561739e5c58e2ddfe5c10fcdffb12faa
Successfully built altair
Installing collected packages: altair
  Attempting uninstall: altair
    Found existing installation: altair 4.1.0
    Uninstalling altair-4.1.0:
      Successfully uninst

In [4]:
# Download MovieLens data.
print("Downloading movielens data...")
from urllib.request import urlretrieve
import zipfile

urlretrieve("http://files.grouplens.org/datasets/movielens/ml-100k.zip", "movielens.zip")
zip_ref = zipfile.ZipFile('movielens.zip', "r")
zip_ref.extractall()
print("Done. Dataset contains:")
print(zip_ref.read('ml-100k/u.info'))


Downloading movielens data...
Done. Dataset contains:
b'943 users\n1682 items\n100000 ratings\n'
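
The summary above is printed as raw bytes. A minimal sketch, assuming the three-line format shown in that output, decodes it into a dictionary and estimates how much of the user-by-movie rating matrix is actually filled in. The names `counts` and `density` are introduced here for illustration.

```python
# The bytes printed by the download cell, copied from its output.
info = b'943 users\n1682 items\n100000 ratings\n'

# Each line has the form "<number> <name>".
counts = {}
for line in info.decode('utf-8').splitlines():
  number, name = line.split()
  counts[name] = int(number)

# Fraction of all (user, movie) pairs that have a rating.
density = counts['ratings'] / (counts['users'] * counts['items'])
print(counts)
print('{:.2%} of user-movie pairs are rated'.format(density))
```

Only about 6.3% of the possible pairs carry a rating, so the rating matrix is very sparse.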


In [5]:
pwd

'/content'

In [8]:
!ls -l

total 4820
drwxr-xr-x 2 root root    4096 Sep  6 11:38 ml-100k
-rw-r--r-- 1 root root 4924029 Sep  6 11:38 movielens.zip
drwxr-xr-x 1 root root    4096 Sep  1 19:26 sample_data


In [9]:
!ls -l ml-100k

total 15776
-rw-r--r-- 1 root root     716 Sep  6 11:38 allbut.pl
-rw-r--r-- 1 root root     643 Sep  6 11:38 mku.sh
-rw-r--r-- 1 root root    6750 Sep  6 11:38 README
-rw-r--r-- 1 root root 1586544 Sep  6 11:38 u1.base
-rw-r--r-- 1 root root  392629 Sep  6 11:38 u1.test
-rw-r--r-- 1 root root 1583948 Sep  6 11:38 u2.base
-rw-r--r-- 1 root root  395225 Sep  6 11:38 u2.test
-rw-r--r-- 1 root root 1582546 Sep  6 11:38 u3.base
-rw-r--r-- 1 root root  396627 Sep  6 11:38 u3.test
-rw-r--r-- 1 root root 1581878 Sep  6 11:38 u4.base
-rw-r--r-- 1 root root  397295 Sep  6 11:38 u4.test
-rw-r--r-- 1 root root 1581776 Sep  6 11:38 u5.base
-rw-r--r-- 1 root root  397397 Sep  6 11:38 u5.test
-rw-r--r-- 1 root root 1792501 Sep  6 11:38 ua.base
-rw-r--r-- 1 root root  186672 Sep  6 11:38 ua.test
-rw-r--r-- 1 root root 1792476 Sep  6 11:38 ub.base
-rw-r--r-- 1 root root  186697 Sep  6 11:38 ub.test
-rw-r--r-- 1 root root 1979173 Sep  6 11:38 u.data
-rw-r--r-- 1 root root     202 Sep  6 11:38 u.genre
-

In [11]:
!head -10 ml-100k/u.user

1|24|M|technician|85711
2|53|F|other|94043
3|23|M|writer|32067
4|24|M|technician|43537
5|33|F|other|15213
6|42|M|executive|98101
7|57|M|administrator|91344
8|36|M|administrator|05201
9|29|M|student|01002
10|53|M|lawyer|90703
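
The user file is pipe-delimited, and some zip codes start with a zero (for example `05201`). The sketch below parses three of the lines above from an in-memory string and keeps `zip_code` as text so the leading zeros survive. The column names match the ones used in the loading cell later in the notebook; passing `dtype={'zip_code': str}` is an assumption added here, not something the original cell does.

```python
import io
import pandas as pd

# Three lines copied from the u.user output above.
sample = io.StringIO(
    '8|36|M|administrator|05201\n'
    '9|29|M|student|01002\n'
    '10|53|M|lawyer|90703\n')

users_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
# Reading zip_code as str preserves leading zeros (an assumption of this sketch).
sample_users = pd.read_csv(
    sample, sep='|', names=users_cols, dtype={'zip_code': str})
print(sample_users)
```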


In [12]:
!head -10 ml-100k/u.data

196	242	3	881250949
186	302	3	891717742
22	377	1	878887116
244	51	2	880606923
166	346	1	886397596
298	474	4	884182806
115	265	2	881171488
253	465	5	891628467
305	451	3	886324817
6	86	3	883603013
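
Each line of the ratings file holds a user id, a movie id, a rating, and a timestamp, separated by tabs. As a small sketch, assuming the last column counts seconds since the Unix epoch, the timestamps can be converted to readable dates with `pd.to_datetime`. The `rated_at` column name is made up for this example.

```python
import io
import pandas as pd

# Three lines copied from the u.data output above.
sample = io.StringIO(
    '196\t242\t3\t881250949\n'
    '186\t302\t3\t891717742\n'
    '22\t377\t1\t878887116\n')

ratings_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
sample_ratings = pd.read_csv(sample, sep='\t', names=ratings_cols)

# Interpret the integer timestamps as seconds since 1970-01-01 UTC.
sample_ratings['rated_at'] = pd.to_datetime(
    sample_ratings['unix_timestamp'], unit='s')
print(sample_ratings)
```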


In [13]:
!head -10 ml-100k/u.item

1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0
2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
4|Get Shorty (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Get%20Shorty%20(1995)|0|1|0|0|0|1|0|0|1|0|0|0|0|0|0|0|0|0|0
5|Copycat (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Copycat%20(1995)|0|0|0|0|0|0|1|0|1|0|0|0|0|0|0|0|1|0|0
6|Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)|01-Jan-1995||http://us.imdb.com/Title?Yao+a+yao+yao+dao+waipo+qiao+(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0
7|Twelve Monkeys (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Twelve%20Monkeys%20(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|1|0|0|0
8|Babe (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Babe%20(1995)|0|0|0|0|1
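
In the movie file, the first five fields describe the movie and the remaining fields are 0/1 genre flags. The sketch below decodes the flags of the first line shown above. The `GENRES` list is written out by hand, on the assumption that it follows the order in `ml-100k/u.genre`; it is worth checking against that file before relying on it.

```python
# Genre names, assumed to follow the order of ml-100k/u.genre.
GENRES = [
    'unknown', 'Action', 'Adventure', 'Animation', "Children's", 'Comedy',
    'Crime', 'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror',
    'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller', 'War', 'Western',
]

# First line of the u.item output above.
line = ('1|Toy Story (1995)|01-Jan-1995||'
        'http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|'
        '0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0')

fields = line.split('|')
movie_id, title, release_date, video_release_date, imdb_url = fields[:5]

# The remaining fields are one 0/1 flag per genre.
flags = [int(f) for f in fields[5:]]
genres = [name for name, flag in zip(GENRES, flags) if flag]
print(title, genres)
```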

In [14]:
# Load each data set (users, movies, and ratings).
users_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
users = pd.read_csv(
    'ml-100k/u.user', sep='|', names=users_cols, encoding='latin-1')

ratings_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings = pd.read_csv(
    'ml-100k/u.data', sep='\t', names=ratings_cols, encoding='latin-1')
