#### Recommendation system in BigQuery ML

### Task 1: Get MovieLens data
In this task you will use the command line to create a BigQuery dataset to store the MovieLens data. The MovieLens
data will then be loaded from a Cloud Storage bucket into the dataset. <br>
Start the Cloud Shell Editor <br>
Cloud Shell is used to create the BigQuery dataset and load the MovieLens data. <br>
1. In the GCP Console, click Activate Cloud Shell. <br>
2. If prompted, click Continue. <br>
Create and load the BigQuery dataset <br>
1. Run the following command to create a BigQuery dataset named movies:
```shell
bq --location=EU mk --dataset movies
```
2. Run the following commands separately in Cloud Shell to load the ratings and movies files:
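```shell
# Load the ratings file; --autodetect infers the table schema from the CSV.
bq load --source_format=CSV \
--location=EU \
--autodetect movies.movielens_ratings \
gs://dataeng-movielens/ratings.csv

# Load the movies file into a raw table; its genres column is parsed in a later step.
bq load --source_format=CSV \
--location=EU \
--autodetect movies.movielens_movies_raw \
gs://dataeng-movielens/movies.csv
```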


In [None]:
SELECT
COUNT(DISTINCT userId) numUsers,
COUNT(DISTINCT movieId) numMovies,
COUNT(*) totalRatings
FROM
movies.movielens_ratings

In [None]:
SELECT
*   -- columns: movieId, title, genres
FROM
movies.movielens_movies_raw
WHERE
movieId < 5

In [None]:
-- The genres column is a pipe-delimited string. Parse the genres into an array and write the results
-- into a table named movielens_movies.
CREATE OR REPLACE TABLE
movies.movielens_movies AS
SELECT
* REPLACE(SPLIT(genres, "|") AS genres)
FROM
movies.movielens_movies_raw
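
To make the parsing step concrete, here is a minimal Python sketch of what splitting a pipe-delimited genres string does to one row. The function name `parse_genres` and the sample row are hypothetical and only illustrate the shape of the data; the actual transformation is done by the `SPLIT` call in the query above.

```python
def parse_genres(genres):
    """Split a pipe-delimited genres string into a list of genre names."""
    return genres.split("|")


# Illustrative row in the shape of movielens_movies_raw (values are made up).
raw_row = {"movieId": 1, "title": "Example Movie (2000)", "genres": "Comedy|Romance"}

# Rough equivalent of SELECT * REPLACE(SPLIT(genres, "|") AS genres):
# keep every column, but swap the genres string for the parsed list.
parsed_row = {**raw_row, "genres": parse_genres(raw_row["genres"])}

print(parsed_row["genres"])  # ['Comedy', 'Romance']
```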


Matrix factorization is a collaborative filtering technique that relies on two sets of vectors called the user factors and the item
factors. The user factors are a low-dimensional representation of a user_id, and the item factors similarly represent
an item_id. <br>
To perform a matrix factorization of our data, you use the typical BigQuery ML syntax, except that
the model_type is matrix_factorization and you have to identify which columns play which roles in the collaborative filtering
setup: user_col, item_col, and rating_col. <br>
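
The idea of user and item factors can be illustrated with a small, self-contained Python sketch. It learns one short vector per user and per movie with plain stochastic gradient descent and scores a pair by the dot product of the two vectors. This is only an illustration under stated assumptions: the functions `train_mf`, `predict`, and `rmse`, the `learning_rate` and `epochs` parameters, and the sample ratings are all hypothetical, and BigQuery ML's own training procedure is not shown here and may differ. The `num_factors` and `l2_reg` arguments are named after the corresponding model options.

```python
import math
import random


def train_mf(ratings, num_factors=2, l2_reg=0.02, learning_rate=0.05, epochs=200, seed=0):
    """Learn user and item factor vectors from (user, item, rating) triples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    # Start from small random vectors so the factors can move apart.
    user_f = {u: [rng.uniform(-0.1, 0.1) for _ in range(num_factors)] for u in users}
    item_f = {i: [rng.uniform(-0.1, 0.1) for _ in range(num_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            p, q = user_f[u], item_f[i]
            err = r - sum(a * b for a, b in zip(p, q))
            for k in range(num_factors):
                pk, qk = p[k], q[k]
                # Gradient step on squared error with an L2 penalty.
                p[k] += learning_rate * (err * qk - l2_reg * pk)
                q[k] += learning_rate * (err * pk - l2_reg * qk)
    return user_f, item_f


def predict(user_f, item_f, user, item):
    """Predicted rating is the dot product of the two factor vectors."""
    return sum(a * b for a, b in zip(user_f[user], item_f[item]))


def rmse(ratings, user_f, item_f):
    """Root mean squared error of the predictions on the given ratings."""
    sq = [(r - predict(user_f, item_f, u, i)) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(sq) / len(sq))


# Made-up (userId, movieId, rating) triples, for illustration only.
ratings = [
    (901, 1, 5.0), (901, 2, 4.0), (901, 3, 1.0),
    (902, 1, 4.5), (902, 3, 1.5), (902, 4, 2.0),
    (903, 2, 4.5), (903, 4, 1.0),
]

user_factors, item_factors = train_mf(ratings)
print("training RMSE:", rmse(ratings, user_factors, item_factors))
print("predicted rating for user 903, movie 1:", predict(user_factors, item_factors, 903, 1))
```

A larger `num_factors` lets the vectors capture more structure, while a larger `l2_reg` pulls them toward zero to limit overfitting.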


In [None]:
-- NOTE: The query below is for reference only. Please DO NOT EXECUTE this query in your project
CREATE OR REPLACE MODEL movies.movie_recommender
OPTIONS (model_type='matrix_factorization', user_col='userId', 
item_col='movieId', rating_col='rating', l2_reg=0.2, num_factors=16) AS
SELECT userId, movieId, rating
FROM movies.movielens_ratings

In [None]:
SELECT * FROM ML.EVALUATE(MODEL `cloud-training-prod-bucket.movies.movie_recommender`)

In [None]:
-- Predict ratings of Comedy movies for userId 903 and keep the five highest.
SELECT
*
FROM
ML.PREDICT(MODEL `cloud-training-prod-bucket.movies.movie_recommender`,
(
SELECT
movieId,
title,
903 AS userId
FROM
`movies.movielens_movies`,
UNNEST(genres) g
WHERE
g = 'Comedy' ))
ORDER BY
predicted_rating DESC
LIMIT
5 

![prediction](prediction.png)


In [None]:
-- This result includes movies the user has already seen and rated in the past. Let’s remove them:
SELECT
  *
FROM
  ML.PREDICT(MODEL `cloud-training-prod-bucket.movies.movie_recommender`,
    (
    WITH
      seen AS (
      SELECT
        ARRAY_AGG(movieId) AS movies
      FROM
        movies.movielens_ratings
      WHERE
        userId = 903 )
    SELECT
      movieId,
      title,
      903 AS userId
    FROM
      movies.movielens_movies,
      UNNEST(genres) g,
      seen
    WHERE
      g = 'Comedy'
      AND movieId NOT IN UNNEST(seen.movies) ))
ORDER BY
  predicted_rating DESC
LIMIT
  5
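
The filtering logic in the query above (collect the movies a user has rated, then drop them from the candidates before ranking) can be sketched in a few lines of Python. The function `recommend_unseen`, the score dictionary, and the movie ids are hypothetical stand-ins for the model's predictions and the user's rating history.

```python
def recommend_unseen(predicted, seen, k=5):
    """Return the k highest-scoring movie ids that are not in `seen`."""
    candidates = [
        (score, movie_id)
        for movie_id, score in predicted.items()
        if movie_id not in seen  # mirrors: movieId NOT IN UNNEST(seen.movies)
    ]
    candidates.sort(reverse=True)  # mirrors: ORDER BY predicted_rating DESC
    return [movie_id for _, movie_id in candidates[:k]]  # mirrors: LIMIT k


# Made-up predicted ratings keyed by movieId, and the ids already rated.
predicted = {10: 4.8, 11: 4.5, 12: 4.1, 13: 3.9}
seen = {10, 12}

print(recommend_unseen(predicted, seen, k=2))  # [11, 13]
```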



### Task 5: Apply customer targeting
In the previous task you looked at how to identify the top-rated movies for a specific user. Sometimes you have a product and
have to find the customers who are likely to appreciate it.

1. You wish to get more reviews for movieId=96481, which has only one rating, and you wish to send coupons to the 100
users who are likely to rate it the highest. Identify those users using:


In [None]:
SELECT
  *
FROM
  ML.PREDICT(MODEL `cloud-training-prod-bucket.movies.movie_recommender`,
    (
    WITH
      allUsers AS (
      SELECT
        DISTINCT userId
      FROM
        movies.movielens_ratings )
    SELECT
      96481 AS movieId,
      (
      SELECT
        title
      FROM
        movies.movielens_movies
      WHERE
        movieId=96481) title,
      userId
    FROM
      allUsers ))
ORDER BY
  predicted_rating DESC
LIMIT
  100
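
Customer targeting reverses the usual direction: the movie is fixed and every user is scored against it. A minimal Python sketch of that ranking follows, assuming the predicted rating is simply the dot product of a user vector and the movie vector. That is a simplification, and the function `top_users_for_item` and all values are hypothetical; in the lab the scoring is done by `ML.PREDICT`.

```python
import heapq


def top_users_for_item(user_factors, item_vector, n=100):
    """Return the ids of the n users with the highest predicted rating for one item."""
    scored = (
        (sum(a * b for a, b in zip(vec, item_vector)), user_id)
        for user_id, vec in user_factors.items()
    )
    # nlargest avoids sorting every user when only the top n are needed.
    return [user_id for _, user_id in heapq.nlargest(n, scored)]


# Made-up factor vectors for three users and one movie.
user_factors = {1: [1.0, 0.0], 2: [0.5, 0.5], 3: [0.0, 1.0]}
item_vector = [2.0, 1.0]

print(top_users_for_item(user_factors, item_vector, n=2))  # [1, 2]
```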


### Task 6: Perform batch predictions for all users and movies

In [None]:
SELECT
  *
FROM
  ML.RECOMMEND(MODEL `cloud-training-prod-bucket.movies.movie_recommender`)
LIMIT 
  100000
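
Batch prediction scores every user against every movie, so the number of result rows is the number of users times the number of movies, which is why the query above caps its output with `LIMIT`. The following Python sketch shows that cross product on toy data. The function `recommend_all` and the factor values are hypothetical, and the dot-product scoring is a simplifying assumption rather than a description of how `ML.RECOMMEND` computes its output.

```python
from itertools import product


def recommend_all(user_factors, item_factors):
    """Yield (user_id, item_id, predicted_rating) for every user-item pair."""
    for (user_id, p), (item_id, q) in product(user_factors.items(), item_factors.items()):
        yield user_id, item_id, sum(a * b for a, b in zip(p, q))


# Made-up factor vectors: 2 users and 3 movies.
user_factors = {1: [1.0, 0.0], 2: [0.0, 1.0]}
item_factors = {10: [4.0, 1.0], 11: [2.0, 3.0], 12: [0.5, 5.0]}

rows = list(recommend_all(user_factors, item_factors))
print(len(rows))  # 6
```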
