# Comics Rx
## [A comic book recommendation system](https://github.com/MangrobanGit/comics_rx)
<img src="https://images.unsplash.com/photo-1514329926535-7f6dbfbfb114?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=2850&q=80" width="400" align='left'>

# Goal: Develop Matrix Functions for App

### Calculate Implicit Ratings Matrix

As inspired by John Naujoks and suggested by Miles Erickson.

---

# Libraries

In [32]:
%matplotlib inline
%load_ext autoreload
%autoreload 2
# %autoreload 1 #would be where you need to specify the files
# %aimport comic_recs

import pandas as pd # dataframes
import os
import pickle

# Data storage
from sqlalchemy import create_engine # SQL helper
#import psycopg2 as psql #PostgreSQL DBs

# Spark
import pyspark
from pyspark.sql import DataFrame, SparkSession
# from pyspark.sql.types import (StructType, StructField, IntegerType
#                                ,FloatType, LongType, StringType)
from pyspark.sql.types import *

import pyspark.sql.functions as F
from pyspark.sql.functions import col, explode, lit, isnan, when, count, lower
from pyspark.ml.recommendation import ALS, ALSModel
from pyspark.ml.evaluation import BinaryClassificationEvaluator, RegressionEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder, TrainValidationSplit


import time
import itertools
from functools import reduce
import numpy as np

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [33]:
import sys

In [34]:
sys.path.append('..')

In [35]:
# Custom
import data_fcns as dfc
import keys  # Custom keys lib
import comic_recs as cr

In [36]:
spark = pyspark.sql.SparkSession.builder.master("local[*]").getOrCreate()

In [37]:
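# Spark config.
# Note: a session already exists from the previous cell, so getOrCreate()
# returns that session. JVM-level settings such as spark.driver.memory only
# take effect when they are set before the first session is created.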
spark = SparkSession \
    .builder \
    .appName("comic recommendation") \
    .config("spark.driver.maxResultSize", "1g") \
    .config("spark.driver.memory", "1g") \
    .config("spark.executor.memory", "4g") \
    .config("spark.master", "local[*]") \
    .getOrCreate()
# get spark context
#sc = spark.sparkContext

# New Users

Let's develop an input process that takes comic titles from a new user, rather than looking up an existing user's history in the modeling data.
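As a rough illustration of title-based input, the sketch below maps a free-text query to `comic_id` values with a case-insensitive substring match. The helper name `find_comic_ids` and the three-row table are assumptions made for the example; the real lookup would run against the full comics table loaded below.

```python
import pandas as pd

# Toy stand-in for the comics table (the real one also carries img_url).
comics = pd.DataFrame({
    "comic_id": [20, 80, 110],
    "comic_title": [
        "1 For $1 Conan the Barbarian (Dark Horse)",
        "Action Comics Annual (DC)",
        "Adventure Time (Boom)",
    ],
})


def find_comic_ids(comics, query):
    """Return comic_ids whose title contains `query` (hypothetical helper).

    The match is case-insensitive and literal (regex=False), so characters
    such as "$" or "(" in a title are not treated as regex syntax.
    """
    mask = comics["comic_title"].str.contains(query, case=False, regex=False)
    return comics.loc[mask, "comic_id"].tolist()


matches = find_comic_ids(comics, "adventure")
no_matches = find_comic_ids(comics, "batman")
```

A substring match can return several ids for one query, so an app would probably need to show the candidates and let the user pick.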

In [38]:
comics_df = spark.read.json('support_data/comics.json')
comics_df.persist()

DataFrame[comic_id: bigint, comic_title: string, img_url: string]

1. Get the item factors 

In [39]:
# item_factors = als_model.itemFactors.toPandas()

item_factors = pd.read_pickle('support_data/item_factors.pkl')

In [40]:
item_factors.head()

| | id | features |
|---|---|---|
| 0 | 60 | [-0.12471740692853928, -0.13675494492053986, -... |
| 1 | 80 | [0.10105234384536743, 0.3860373795032501, -0.7... |
| 2 | 110 | [-0.17939595878124237, -0.096303790807724, -1.... |
| 3 | 140 | [0.1732894778251648, -0.3066675066947937, -0.4... |
| 4 | 160 | [-0.3997923731803894, -0.17846563458442688, -0... |


In [41]:
item_factors.columns = ['item_id', 'features']

In [42]:
item_factors.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1515 entries, 0 to 1514
Data columns (total 2 columns):
item_id     1515 non-null int32
features    1515 non-null object
dtypes: int32(1), object(1)
memory usage: 17.8+ KB


In [43]:
item_factors.features[0]

[-0.12471740692853928,
 -0.13675494492053986,
 -1.0376602411270142,
 -0.11408093571662903,
 -0.1141376793384552]
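Each row's `features` value is a plain Python list (five floats for the item shown above), so the column has dtype `object`. For matrix arithmetic it can help to stack those lists into one dense array. The snippet below is a minimal sketch on made-up values with the same layout, not the real factors.

```python
import numpy as np
import pandas as pd

# Toy item factors: an id column plus a list-valued features column.
# The numbers are invented for illustration only.
item_factors = pd.DataFrame({
    "item_id": [60, 80, 110],
    "features": [
        [1.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0, 0.0],
        [0.5, 0.5, 0.0, 0.0, 0.0],
    ],
})

# Stack the per-item lists into a dense (n_items, rank) float array.
factor_matrix = np.array(item_factors["features"].tolist(), dtype=float)
rank = factor_matrix.shape[1]
```

Row `i` of `factor_matrix` corresponds to row `i` of the DataFrame, so the id column is still needed to map array positions back to items.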

### Create Official item factors matrix or dataframe

In [44]:
item_factors_df = pd.read_pickle('support_data/item_factors.pkl')

In [45]:
item_factors_df.head()

| | id | features |
|---|---|---|
| 0 | 60 | [-0.12471740692853928, -0.13675494492053986, -... |
| 1 | 80 | [0.10105234384536743, 0.3860373795032501, -0.7... |
| 2 | 110 | [-0.17939595878124237, -0.096303790807724, -1.... |
| 3 | 140 | [0.1732894778251648, -0.3066675066947937, -0.4... |
| 4 | 160 | [-0.3997923731803894, -0.17846563458442688, -0... |


In [46]:
item_factors_df.columns = ['comic_id', 'features']

In [47]:
comics_df.show()

+--------+--------------------+--------------------+
|comic_id|         comic_title|             img_url|
+--------+--------------------+--------------------+
|      17|1 For $1 Axe Cop ...|https://comrx.s3-...|
|      20|1 For $1 Conan th...|https://comrx.s3-...|
|      22|1 For $1 Mass Eff...|https://comrx.s3-...|
|      24|1 For $1 Star War...|https://comrx.s3-...|
|      27|1 For $1 Usagi Yo...|https://comrx.s3-...|
|      18|1 For 1 Baltimore...|https://comrx.s3-...|
|       2|100 Bullets Broth...|https://comrx.s3-...|
|       4|100 Penny Press S...|https://comrx.s3-...|
|       6|100 Penny Press T...|https://comrx.s3-...|
|       8|12 Reasons To Die...|https://comrx.s3-...|
|       9|    13 Coins (Other)|https://comrx.s3-...|
|      11|1602 Witch Hunter...|https://comrx.s3-...|
|      29|2021 Lost Childre...|https://comrx.s3-...|
|      31|23 Skidoo One Sho...|https://comrx.s3-...|
|      36|3 Floyds Alpha Ki...|https://comrx.s3-...|
|      33|30 Days of Night ...|https://comrx.s

#### Get comics info

In [48]:
comics_pdf = comics_df.toPandas()

In [49]:
comics_pdf.head()

| | comic_id | comic_title | img_url |
|---|---|---|---|
| 0 | 17 | 1 For $1 Axe Cop Bad Guy Eart (Dark Horse) | https://comrx.s3-us-west-2.amazonaws.com/cover... |
| 1 | 20 | 1 For $1 Conan the Barbarian (Dark Horse) | https://comrx.s3-us-west-2.amazonaws.com/cover... |
| 2 | 22 | 1 For $1 Mass Effect Foundati (Dark Horse) | https://comrx.s3-us-west-2.amazonaws.com/cover... |
| 3 | 24 | 1 For $1 Star Wars Legacy (Dark Horse) | https://comrx.s3-us-west-2.amazonaws.com/cover... |
| 4 | 27 | 1 For $1 Usagi Yojimb (Dark Horse) | https://comrx.s3-us-west-2.amazonaws.com/cover... |


In [50]:
item_factors_df.shape

(1515, 2)

In [51]:
combo = item_factors_df.merge(comics_pdf, on='comic_id', how='inner')

In [52]:
combo.head()

| | comic_id | features | comic_title | img_url |
|---|---|---|---|---|
| 0 | 60 | [-0.12471740692853928, -0.13675494492053986, -... | 8house (Image) | https://comrx.s3-us-west-2.amazonaws.com/cover... |
| 1 | 80 | [0.10105234384536743, 0.3860373795032501, -0.7... | Action Comics Annual (DC) | https://comrx.s3-us-west-2.amazonaws.com/cover... |
| 2 | 110 | [-0.17939595878124237, -0.096303790807724, -1.... | Adventure Time (Boom) | https://comrx.s3-us-west-2.amazonaws.com/cover... |
| 3 | 140 | [0.1732894778251648, -0.3066675066947937, -0.4... | Age of X-Man Amazing Nightcra (Marvel) | https://comrx.s3-us-west-2.amazonaws.com/cover... |
| 4 | 160 | [-0.3997923731803894, -0.17846563458442688, -0... | Aliens Defiance (Dark Horse) | https://comrx.s3-us-west-2.amazonaws.com/cover... |
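An inner merge silently drops any factor row whose `comic_id` has no match in the comics table. In this notebook the row count is 1515 both before and after the merge, so nothing was lost, but a check like the following could make that explicit. This is a sketch on toy frames; the id `999` is a deliberately unmatched, made-up value.

```python
import pandas as pd

# Toy frames: one factor row (999) has no matching title on purpose.
factors = pd.DataFrame({
    "comic_id": [60, 80, 999],
    "features": [[0.1], [0.2], [0.3]],
})
titles = pd.DataFrame({
    "comic_id": [60, 80, 110],
    "comic_title": ["8house (Image)", "Action Comics Annual (DC)",
                    "Adventure Time (Boom)"],
})

# Left merge with an indicator column; validate raises if comic_id is
# duplicated on either side.
checked = factors.merge(titles, on="comic_id", how="left",
                        validate="one_to_one", indicator=True)

# Factor rows that found no title.
missing_titles = checked.loc[checked["_merge"] == "left_only", "comic_id"].tolist()

# Equivalent of the inner merge, once the unmatched rows are known.
combo = checked.loc[checked["_merge"] == "both"].drop(columns="_merge")
```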


To make lookups by comic easier later, set the index to `comic_id`.

In [53]:
coms = combo.copy()

In [54]:
coms.set_index(['comic_id'], inplace=True)

In [55]:
coms.head()

| comic_id | features | comic_title | img_url |
|---|---|---|---|
| 60 | [-0.12471740692853928, -0.13675494492053986, -... | 8house (Image) | https://comrx.s3-us-west-2.amazonaws.com/cover... |
| 80 | [0.10105234384536743, 0.3860373795032501, -0.7... | Action Comics Annual (DC) | https://comrx.s3-us-west-2.amazonaws.com/cover... |
| 110 | [-0.17939595878124237, -0.096303790807724, -1.... | Adventure Time (Boom) | https://comrx.s3-us-west-2.amazonaws.com/cover... |
| 140 | [0.1732894778251648, -0.3066675066947937, -0.4... | Age of X-Man Amazing Nightcra (Marvel) | https://comrx.s3-us-west-2.amazonaws.com/cover... |
| 160 | [-0.3997923731803894, -0.17846563458442688, -0... | Aliens Defiance (Dark Horse) | https://comrx.s3-us-west-2.amazonaws.com/cover... |


In [56]:
coms.loc[60]

features       [-0.12471740692853928, -0.13675494492053986, -...
comic_title                                       8house (Image)
img_url        https://comrx.s3-us-west-2.amazonaws.com/cover...
Name: 60, dtype: object
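With `comic_id` as the index, several comics can be pulled at once by passing a list to `.loc`. The sketch below wraps that in a hypothetical helper, `factors_for`, which skips ids that are not in the index (for example, titles the model never saw) and returns the matching factors as an array. The frame and its values are toy data.

```python
import numpy as np
import pandas as pd

# Toy frame indexed by comic_id, mirroring the layout of `coms`.
coms = pd.DataFrame(
    {
        "features": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        "comic_title": ["A (toy)", "B (toy)", "C (toy)"],
    },
    index=pd.Index([60, 80, 110], name="comic_id"),
)


def factors_for(coms, comic_ids):
    """Return (known_ids, matrix) for the requested ids (hypothetical helper).

    Ids missing from the index are dropped instead of raising KeyError.
    """
    known = [cid for cid in comic_ids if cid in coms.index]
    matrix = np.array(coms.loc[known, "features"].tolist(), dtype=float)
    return known, matrix


known, matrix = factors_for(coms, [110, 60, 12345])
```

The rows of `matrix` follow the order of `known`, not the order of the index.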

In [57]:
comics_pdf.loc[comics_pdf['comic_id']==20]

| | comic_id | comic_title | img_url |
|---|---|---|---|
| 1 | 20 | 1 For $1 Conan the Barbarian (Dark Horse) | https://comrx.s3-us-west-2.amazonaws.com/cover... |


In [58]:
combo.shape

(1515, 4)

In [59]:
coms.shape

(1515, 3)

In [60]:
combo.head()

| | comic_id | features | comic_title | img_url |
|---|---|---|---|---|
| 0 | 60 | [-0.12471740692853928, -0.13675494492053986, -... | 8house (Image) | https://comrx.s3-us-west-2.amazonaws.com/cover... |
| 1 | 80 | [0.10105234384536743, 0.3860373795032501, -0.7... | Action Comics Annual (DC) | https://comrx.s3-us-west-2.amazonaws.com/cover... |
| 2 | 110 | [-0.17939595878124237, -0.096303790807724, -1.... | Adventure Time (Boom) | https://comrx.s3-us-west-2.amazonaws.com/cover... |
| 3 | 140 | [0.1732894778251648, -0.3066675066947937, -0.4... | Age of X-Man Amazing Nightcra (Marvel) | https://comrx.s3-us-west-2.amazonaws.com/cover... |
| 4 | 160 | [-0.3997923731803894, -0.17846563458442688, -0... | Aliens Defiance (Dark Horse) | https://comrx.s3-us-west-2.amazonaws.com/cover... |


In [62]:
coms.to_pickle('support_data/comics_factors_use.pkl')
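One way the saved factors might be used for a new user is a "fold-in": hold the item factors fixed and solve for a single user vector from the comics the user selected, then score every item by dot product. The sketch below uses the implicit-feedback ALS normal equations, x = (YᵀCY + λI)⁻¹ YᵀCp, with confidence 1 + α on selected items and 1 elsewhere. This is an assumption about how the app could work, not a description of what it does. The function names and the `alpha` and `reg` values are placeholders, since the training configuration is not shown here, and the factors are toy data.

```python
import numpy as np


def fold_in_user(item_factors, liked_rows, alpha=40.0, reg=0.1):
    """Solve for one user's factor vector with item factors held fixed.

    alpha and reg are placeholder values, not the model's trained settings.
    """
    Y = np.asarray(item_factors, dtype=float)
    n_items, rank = Y.shape
    p = np.zeros(n_items)            # preference: 1 for selected items
    p[liked_rows] = 1.0
    c = np.ones(n_items)             # confidence: 1 + alpha for selected items
    c[liked_rows] += alpha
    A = Y.T @ (Y * c[:, None]) + reg * np.eye(rank)
    b = Y.T @ (c * p)
    return np.linalg.solve(A, b)


def top_n(item_factors, user_vector, exclude_rows, n=2):
    """Return row positions of the n best-scoring items, skipping excluded rows."""
    scores = np.asarray(item_factors, dtype=float) @ user_vector
    scores[exclude_rows] = -np.inf   # do not recommend what was already selected
    return np.argsort(-scores)[:n].tolist()


# Toy item factors (rank 2): rows 0 and 1 are similar, rows 2 and 3 are similar.
Y = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
user = fold_in_user(Y, liked_rows=[0])
recs = top_n(Y, user, exclude_rows=[0], n=2)
```

The returned values are row positions in the factor array, so they would still need to be mapped back to `comic_id` and `comic_title` through the saved DataFrame.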