# How-to guide for Personalized Recommendations use-case on Abacus.AI platform
This notebook provides you with a hands-on environment to build a model that creates personalized recommendations using the Abacus.AI Python Client Library.

We'll be using the [User Item Recommendations](https://s3.amazonaws.com//realityengines.exampledatasets/user_recommendations/user_movie_ratings.csv), [Movie Attributes](https://s3.amazonaws.com//realityengines.exampledatasets/user_recommendations/movies_metadata.csv), and [User Attributes](https://s3.amazonaws.com//realityengines.exampledatasets/user_recommendations/users_metadata.csv) datasets, each of which has information about the user and/or their choice of movies.


## Table of contents
[Installation and imports](#scrollTo=-CHABbdhcDZg)

[1. Create a Project](#scrollTo=j_6LiH43cM9Z)

[2. Add Datasets to your Project](#scrollTo=8O41vBUQcgxN)

[3. Train a Model](#scrollTo=RWvYvPEmdfg7)

[(Checkpoint)](#scrollTo=C0mIg2VHdnfA)

[4. Evaluate your Model Metrics](#scrollTo=jBK2e1WNd6L3)

[5. Deploy Model](#scrollTo=Xc5YAK8veBt1)

[6. Make Predictions](#scrollTo=BzFpIsJ_eGmk)

## Installation and imports

1. Install the Abacus.AI library.

In [1]:
!pip install abacusai



We'll also import pandas for visualization in this notebook.

In [2]:
import pandas as pd  # A tool we'll use to download and preview CSV files
pd.set_option('display.max_colwidth', None)  # We set the max_colwidth to None to have an unlimited width of characters

2. Add your Abacus.AI [API Key](https://abacus.ai/app/profile/apikey) generated using the API dashboard as follows:

In [3]:
#@title Abacus.AI API Key
api_key = ''  #@param {type: "string"}

3. Import the Abacus.AI library and instantiate a client

In [4]:
from abacusai import ApiClient
client = ApiClient(api_key)
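
If you would rather not paste the key into the notebook itself, one option is to read it from an environment variable. The snippet below is a minimal sketch of that idea; the variable name `ABACUS_API_KEY` and the helper `load_api_key` are assumptions made for this example and are not part of the Abacus.AI library.

```python
import os


def load_api_key(env=os.environ, var_name="ABACUS_API_KEY"):
    """Return the API key stored in an environment variable.

    `ABACUS_API_KEY` is a hypothetical variable name chosen for this sketch.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running this notebook."
        )
    return key


# Usage in the notebook, once the variable is set:
# api_key = load_api_key()
# client = ApiClient(api_key)
```

Keeping the key out of the notebook source makes it less likely to be shared by accident when the notebook is exported or committed.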

## 1. Create a Project

Abacus.AI projects are containers that have datasets and trained models. By specifying a business **Use Case**, Abacus.AI tailors the deep learning algorithms to produce the best performing model possible for your data.

We'll call the `list_use_cases` method to retrieve the Use Cases currently available on the Abacus.AI platform.

In [5]:
use_cases = client.list_use_cases()
use_cases

[UseCase(use_case='UCPLUGANDPLAY',
   pretty_name='Plug & Play Your Tensorflow Model',
   description='Upload your already trained model and leverage our model serving infrastructure.. Host your models on our infrastructure and get a JSON api with auto scaling and more!'),
 UseCase(use_case='EMBEDDINGS_ONLY',
   pretty_name='Vector Matching Engine',
   description='Upload embeddings and leverage our similarity search infrastructure.. Scale to high traffic, update your index in near realtime'),
 UseCase(use_case='MODEL_WITH_EMBEDDINGS',
   pretty_name='Tensorflow Model With Vector Matching Engine',
   description='Upload your already trained model and leverage our model serving infrastructure.. Host your models on our infrastructure and get a JSON api with auto scaling and more!'),
 UseCase(use_case='TORCH_MODEL_WITH_EMBEDDINGS',
   pretty_name='PyTorch Model With Vector Matching Engine',
   description='Upload your already trained model and leverage our model serving infrastructure.. H

We can use pandas to pretty-print the use cases.

In [6]:
pd.DataFrame(use_case.to_dict() for use_case in use_cases)

Unnamed: 0,use_case,pretty_name,description
0,UCPLUGANDPLAY,Plug & Play Your Tensorflow Model,Upload your already trained model and leverage our model serving infrastructure.. Host your models on our infrastructure and get a JSON api with auto scaling and more!
1,EMBEDDINGS_ONLY,Vector Matching Engine,"Upload embeddings and leverage our similarity search infrastructure.. Scale to high traffic, update your index in near realtime"
2,MODEL_WITH_EMBEDDINGS,Tensorflow Model With Vector Matching Engine,Upload your already trained model and leverage our model serving infrastructure.. Host your models on our infrastructure and get a JSON api with auto scaling and more!
3,TORCH_MODEL_WITH_EMBEDDINGS,PyTorch Model With Vector Matching Engine,Upload your already trained model and leverage our model serving infrastructure.. Host your models on our infrastructure and get a JSON api with auto scaling and more!
4,PYTHON_MODEL,Custom Python Model,Upload your training code and let Abacus.AI handle training. Host your models on our infrastructure and get a JSON api with auto scaling and more!
5,DOCKER_MODEL,Plug & Play Your Dockerized Model,Upload your already trained model and leverage our model serving infrastructure.. Host your models on our infrastructure and get a JSON api with auto scaling and more!
6,DOCKER_MODEL_WITH_EMBEDDINGS,Plug & Play Your Dockerized Model with Vector Matching Engine,Upload your already trained model and leverage our model serving infrastructure.. Host your models on our infrastructure and get a JSON api with auto scaling and more!
7,CUSTOMER_CHURN,Customer Churn Prediction,Identify customers who are most likely to churn out of your system and send them marketing promotions/emails to retain them. Deploy a real-time deep learning model that identifies customers who are most likely to leave and increase retention.
8,ENERGY,Real-Time Forecasting,"Accurately forecast energy or computation usage in real-time. Make downstream planning decisions based on your predictions. We use generative modeling (GANs) to augment your dataset with synthetic data. This unique approach allows us to make accurate predictions in real-time, even when you have little historical data."
9,FINANCIAL_METRICS,Financial Metrics Forecasting,"Accurately plan your cash flow, revenue, and sales with state-of-the-art deep learning-based forecasting. We use generative modeling (GANs) to augment your dataset with synthetic data. This unique approach allows us to make accurate predictions, even when you have little historical data."


In this notebook, we're going to create a model that generates personalized recommendations using the User Item Recommendations, Movie Attributes, and User Attributes datasets. The **USER_RECOMMENDATIONS** use case is best tailored for this situation. As an example, we will use the IMDB movie dataset, which contains movie metadata, user metadata, and user-movie ratings.

In [7]:
#@title Abacus.AI Use Case

use_case = 'USER_RECOMMENDATIONS'  #@param {type: "string"}

By calling the `describe_use_case_requirements` method, we can see which datasets are required for this use case.

In [8]:
requirements = client.describe_use_case_requirements(use_case)
pd.DataFrame(requirement.to_dict() for requirement in requirements)

Unnamed: 0,dataset_type,name,description,required,allowed_feature_mappings,allowed_nested_feature_mappings
0,USER_ITEM_INTERACTIONS,User-Item Interactions,"This dataset corresponds to all the user-item interactions on your website or application. For example, all the actions (e.g. click, purchase, view) taken by a particular user on a particular item (e.g product, video. article) recorded as a time-based log.",True,"{'ITEM_ID': {'description': 'This is the unique identifier of each item in your catalog. This is typically your product id, article id, or the video id.', 'allowed_feature_types': ['CATEGORICAL'], 'required': True}, 'USER_ID': {'description': 'This is a unique identifier of each user in your user base.', 'allowed_feature_types': ['CATEGORICAL'], 'required': True}, 'ACTION_TYPE': {'description': 'This is an optional column that specifies the type of action the user took. This could include any action that is specific to you (e.g., view, click, purchase, rating, comment, like, etc). You can always upload a dataset that has no action_type column if all the actions in the dataset are the same (e.g., a dataset of only purchases or clicks).', 'allowed_feature_types': ['CATEGORICAL'], 'required': False}, 'TIMESTAMP': {'description': 'The timestamp when a particular action occurred.', 'allowed_feature_types': ['TIMESTAMP'], 'required': False}, 'ACTION_WEIGHT': {'description': 'This is an optional column that specifies the weight of the action (e.g., video watch time, price of item purchased). This is used to optimize the the model to maximize actions with this value.', 'allowed_feature_types': ['NUMERICAL'], 'required': False}, 'IGNORE': {'description': 'Ignore this column in training', 'multiple': True, 'required': False}}",
1,CATALOG_ATTRIBUTES,Catalog Attributes,"This dataset corresponds to all the information you have in your catalog. If you want to recommend actions instead of items to users, you are welcome to upload an action catalog.",,"{'ITEM_ID': {'description': 'This is a unique identifier of each item in your catalog. This is typically your product id, article id, or video id.', 'allowed_feature_types': ['CATEGORICAL'], 'required': True}, 'PREDICTION_RESTRICT': {'description': 'This is an optional column that is used to restrict predictions to items matching a specific value of this column. If this is set, then the prediction api call will require that a includeFilter specifying a value for this column be included.', 'allowed_feature_types': ['CATEGORICAL'], 'required': False}, 'IGNORE': {'description': 'Ignore this column in training', 'multiple': True, 'required': False}}",
2,USER_ATTRIBUTES,User Attributes,This dataset corresponds to all the attributes or meta-data that you have about your user base. Any user profile information will be relevant here.,,"{'USER_ID': {'description': 'The unique identifier for the user.', 'allowed_feature_types': ['CATEGORICAL'], 'required': True}, 'IGNORE': {'description': 'Ignore this column in training', 'multiple': True, 'required': False}}",
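
The `allowed_feature_mappings` column above is a nested dictionary, which is hard to scan in a table. As a rough sketch, the helper below pulls out just the mappings flagged as required. The function name `required_mappings` is made up for this example, and the sample dictionary is an abbreviated copy of the USER_ITEM_INTERACTIONS row (descriptions omitted).

```python
def required_mappings(allowed_feature_mappings):
    """Return the names of mappings marked as required, sorted alphabetically."""
    return sorted(
        name
        for name, spec in allowed_feature_mappings.items()
        if spec.get("required")
    )


# Abbreviated copy of the USER_ITEM_INTERACTIONS mappings shown above.
interaction_mappings = {
    "ITEM_ID": {"allowed_feature_types": ["CATEGORICAL"], "required": True},
    "USER_ID": {"allowed_feature_types": ["CATEGORICAL"], "required": True},
    "ACTION_TYPE": {"allowed_feature_types": ["CATEGORICAL"], "required": False},
    "TIMESTAMP": {"allowed_feature_types": ["TIMESTAMP"], "required": False},
    "ACTION_WEIGHT": {"allowed_feature_types": ["NUMERICAL"], "required": False},
    "IGNORE": {"multiple": True, "required": False},
}

print(required_mappings(interaction_mappings))  # ['ITEM_ID', 'USER_ID']
```

In the notebook, the same helper could be applied to `requirements[0].allowed_feature_mappings`.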


Finally, let's create the project.

In [9]:
recommendations_project = client.create_project(name='Movie Recommendations', use_case=use_case)
recommendations_project.to_dict()

{'project_id': 'f4b4fc54',
 'name': 'Movie Recommendations',
 'use_case': 'USER_RECOMMENDATIONS',
 'created_at': '2021-11-24T18:13:33+00:00',
 'feature_groups_enabled': True}

## 2. Add Datasets to your Project

Abacus.AI can read datasets directly from `AWS S3` or `Google Cloud Storage` buckets; alternatively, you can upload and store your datasets with Abacus.AI. For this notebook, we will have Abacus.AI read the datasets directly from a public S3 bucket.

We are using three datasets for this notebook. We'll tell Abacus.AI how the datasets should be used when creating them by tagging each dataset with a special Abacus.AI **Dataset Type**.
- [User Item Recommendations](https://s3.amazonaws.com//realityengines.exampledatasets/user_recommendations/user_movie_ratings.csv) (**USER_ITEM_INTERACTIONS**): This dataset contains information about multiple users' ratings of movies with specified IDs.
- [Movie Attributes](https://s3.amazonaws.com//realityengines.exampledatasets/user_recommendations/movies_metadata.csv) (**CATALOG_ATTRIBUTES**): This dataset contains attributes about movies with specified IDs, such as each movie's name and genre.
- [User Attributes](https://s3.amazonaws.com//realityengines.exampledatasets/user_recommendations/users_metadata.csv) (**USER_ATTRIBUTES**): This dataset contains information about users with specified IDs, such as their age, gender, occupation, and zip code. 

### Add the datasets to Abacus.AI

First we'll use Pandas to preview the files, then add them to Abacus.AI.

In [10]:
pd.read_csv('https://s3.amazonaws.com//abacusai.exampledatasets/user_recommendations/user_movie_ratings.csv')

Unnamed: 0,user_id,movie_id,rating,timestamp
0,1,1193,5,978300760
1,1,3408,4,978300275
2,1,2355,5,978824291
3,1,1287,5,978302039
4,1,2804,5,978300719
...,...,...,...,...
575276,6040,1089,4,956704996
575277,6040,1094,5,956704887
575278,6040,562,5,956704746
575279,6040,1096,4,956715648


In [11]:
pd.read_csv('https://s3.amazonaws.com//abacusai.exampledatasets/user_recommendations/movies_metadata.csv')

Unnamed: 0,movie_id,movie,genres
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy
...,...,...,...
3878,3948,Meet the Parents (2000),Comedy
3879,3949,Requiem for a Dream (2000),Drama
3880,3950,Tigerland (2000),Drama
3881,3951,Two Family House (2000),Drama


In [12]:
pd.read_csv('https://s3.amazonaws.com//abacusai.exampledatasets/user_recommendations/users_metadata.csv')

Unnamed: 0,user_id,gender,age,occupation,zip_code
0,1,F,Under 18,K-12 student,48067
1,2,M,56+,self-employed,70072
2,3,M,25-34,scientist,55117
3,4,M,45-49,executive/managerial,02460
4,5,M,25-34,writer,55455
...,...,...,...,...,...
6035,6036,F,25-34,scientist,32603
6036,6037,F,45-49,academic/educator,76006
6037,6038,F,56+,academic/educator,14706
6038,6039,F,45-49,other,01060


Using the Create Dataset API, we can tell Abacus.AI the public S3 URI where each dataset lives. We will also give each dataset a Refresh Schedule, which tells Abacus.AI when to refresh it (that is, take an updated copy of the dataset).

The Refresh Schedule is given as a cron string. For example, with "0 12 * * *", the dataset is re-read from S3 every day at 12:00 UTC, so that no updates are missed.

If you're unfamiliar with Cron Syntax, Crontab Guru can help translate the syntax back into natural language: [https://crontab.guru/#0_12_\*_\*_\*](https://crontab.guru/#0_12_*_*_*)

**Note: This cron string will be evaluated in UTC time zone**
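
To make the five cron fields explicit, here is a small sketch that labels each position of a standard five-field cron string. It only splits and names the fields; it does not evaluate the schedule, and `describe_cron` is a helper invented for this example.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Split a five-field cron string into a {field_name: value} dictionary."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


schedule = describe_cron("0 12 * * *")
for field, value in schedule.items():
    print(f"{field:>12}: {value}")
```

Reading the result for "0 12 * * *": minute 0 of hour 12, on every day of the month, every month, and every day of the week.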

In [15]:
user_item_dataset = client.create_dataset_from_file_connector(
    name='User Item Recommendations',
    table_name='User_Item_Recommendations',
    location='s3://abacusai.exampledatasets/user_recommendations/user_movie_ratings.csv',
    refresh_schedule='0 12 * * *'
)

movie_attributes_dataset = client.create_dataset_from_file_connector(
    name='Movie Attributes',
    table_name='Movie_Attributes',
    location='s3://abacusai.exampledatasets/user_recommendations/movies_metadata.csv',
    refresh_schedule='0 12 * * *'
)

user_attributes_dataset = client.create_dataset_from_file_connector(
    name='User Attributes',
    table_name='User_Attributes',
    location='s3://abacusai.exampledatasets/user_recommendations/users_metadata.csv',
    refresh_schedule='0 12 * * *'
)

datasets = [user_item_dataset, movie_attributes_dataset, user_attributes_dataset]

## 3. Create Feature Groups and add them to your Project

Datasets are created at the organization level and can be used to create feature groups as follows:

In [17]:
feature_group = client.create_feature_group(
    table_name='personalized_recommendations',
    sql='SELECT * FROM User_Item_Recommendations',
)

Adding Feature Group to the project:

In [18]:
client.add_feature_group_to_project(
    feature_group_id=feature_group.feature_group_id,
    project_id=recommendations_project.project_id,
)

Setting the Feature Group type according to the use case requirements:

In [20]:
client.set_feature_group_type(
    feature_group_id=feature_group.feature_group_id,
    project_id=recommendations_project.project_id,
    feature_group_type="USER_ITEM_INTERACTIONS",
)

Check current Feature Group schema:

In [21]:
client.get_feature_group_schema(feature_group_id=feature_group.feature_group_id)

[Feature(name='user_id',
   select_clause=None,
   feature_mapping=None,
   source_table='User_Item_Recommendations',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='movie_id',
   select_clause=None,
   feature_mapping=None,
   source_table='User_Item_Recommendations',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='rating',
   select_clause=None,
   feature_mapping=None,
   source_table='User_Item_Recommendations',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='timestamp',
   select_clause=None,
   feature_mapping=None,
   s

For each **Use Case**, there are special **Column Mappings** that must be applied to a column to fulfill use case requirements. We can find the list of available **Column Mappings** by calling the *Describe Use Case Requirements* API:

In [22]:
client.describe_use_case_requirements(use_case)[0].allowed_feature_mappings

{'ITEM_ID': {'description': 'This is the unique identifier of each item in your catalog. This is typically your product id, article id, or the video id.',
  'allowed_feature_types': ['CATEGORICAL'],
  'required': True},
 'USER_ID': {'description': 'This is a unique identifier of each user in your user base.',
  'allowed_feature_types': ['CATEGORICAL'],
  'required': True},
 'ACTION_TYPE': {'description': 'This is an optional column that specifies the type of action the user took. This could include any action that is specific to you (e.g., view, click, purchase, rating, comment, like, etc). You can always upload a dataset that has no action_type column if all the actions in the dataset are the same (e.g., a dataset of only purchases or clicks).',
  'allowed_feature_types': ['CATEGORICAL'],
  'required': False},
 'TIMESTAMP': {'description': 'The timestamp when a particular action occurred.',
  'allowed_feature_types': ['TIMESTAMP'],
  'required': False},
 'ACTION_WEIGHT': {'description

In [23]:
# Map each column of the feature group to the role it plays in the use case.
client.set_feature_mapping(project_id=recommendations_project.project_id, feature_group_id=feature_group.feature_group_id, feature_name='movie_id', feature_mapping='ITEM_ID')
client.set_feature_mapping(project_id=recommendations_project.project_id, feature_group_id=feature_group.feature_group_id, feature_name='user_id', feature_mapping='USER_ID')
client.set_feature_mapping(project_id=recommendations_project.project_id, feature_group_id=feature_group.feature_group_id, feature_name='timestamp', feature_mapping='TIMESTAMP')

[Feature(name='user_id',
   select_clause=None,
   feature_mapping='USER_ID',
   source_table='User_Item_Recommendations',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='movie_id',
   select_clause=None,
   feature_mapping='ITEM_ID',
   source_table='User_Item_Recommendations',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='rating',
   select_clause=None,
   feature_mapping=None,
   source_table='User_Item_Recommendations',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='timestamp',
   select_clause=None,
   feature_mapping=
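
When a feature group has many columns, the repeated calls can be driven from a dictionary instead. The sketch below wraps the same `set_feature_mapping` call used in this notebook, with the same keyword arguments; the helper `apply_feature_mappings` and the constant `INTERACTION_MAPPINGS` are names introduced only for this example.

```python
def apply_feature_mappings(client, project_id, feature_group_id, mappings):
    """Call client.set_feature_mapping once per (column, mapping) pair."""
    for feature_name, feature_mapping in mappings.items():
        client.set_feature_mapping(
            project_id=project_id,
            feature_group_id=feature_group_id,
            feature_name=feature_name,
            feature_mapping=feature_mapping,
        )


# Column-to-mapping pairs used in this notebook.
INTERACTION_MAPPINGS = {
    "movie_id": "ITEM_ID",
    "user_id": "USER_ID",
    "timestamp": "TIMESTAMP",
}

# Usage in the notebook:
# apply_feature_mappings(
#     client,
#     recommendations_project.project_id,
#     feature_group.feature_group_id,
#     INTERACTION_MAPPINGS,
# )
```

Keeping the pairs in one dictionary also gives you a single place to review which column plays which role.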

For each required Feature Group Type within the use case, you must assign the Feature Group to be used for training the model:

In [24]:
client.use_feature_group_for_training(project_id=recommendations_project.project_id, feature_group_id=feature_group.feature_group_id)

Now that our feature groups are assigned, we're almost ready to train a model!

To be sure that our project is ready to go, let's call `recommendations_project.validate()` to confirm that all the project requirements have been met:

In [25]:
recommendations_project.validate()

ProjectValidation(valid=True,
  dataset_errors=[],
  column_hints={})

## 4. Train a Model

For each **Use Case**, Abacus.AI offers a set of training options. We can call the `get_training_config_options` API to see the available options.

In [26]:
training_config_options = recommendations_project.get_training_config_options()
training_config_options

[TrainingConfigOptions(name='TEST_SPLIT',
   data_type='INTEGER',
   value_type=None,
   value_options=None,
   value=None,
   default=None,
   options={'range': [5, 20]},
   description='Percent of dataset to use for test data. We support using a range between 6% to 20% of your dataset to use as test data.',
   required=None,
   last_model_value=None),
 TrainingConfigOptions(name='DROPOUT_RATE',
   data_type='INTEGER',
   value_type=None,
   value_options=None,
   value=None,
   default=None,
   options={'range': [0, 90]},
   description='Dropout percentage rate.',
   required=None,
   last_model_value=None),
 TrainingConfigOptions(name='BATCH_SIZE',
   data_type='ENUM',
   value_type=None,
   value_options=None,
   value=None,
   default=None,
   options={'values': [8, 16, 32, 64, 128, 256, 384, 512, 740, 1024]},
   description='Batch size.',
   required=None,
   last_model_value=None),
 TrainingConfigOptions(name='SKIP_HISTORY_FILTERING',
   data_type='BOOLEAN',
   value_type=None,


For a more readable display, we can load the options into a pandas DataFrame:

In [27]:
pd.DataFrame(training_config_option.to_dict() for training_config_option in training_config_options)

Unnamed: 0,name,data_type,value_type,value_options,value,default,options,description,required,last_model_value
0,TEST_SPLIT,INTEGER,,,,,"{'range': [5, 20]}",Percent of dataset to use for test data. We support using a range between 6% to 20% of your dataset to use as test data.,,
1,DROPOUT_RATE,INTEGER,,,,,"{'range': [0, 90]}",Dropout percentage rate.,,
2,BATCH_SIZE,ENUM,,,,,"{'values': [8, 16, 32, 64, 128, 256, 384, 512, 740, 1024]}",Batch size.,,
3,SKIP_HISTORY_FILTERING,BOOLEAN,,,,False,,Do not remove items which have past interactions from recommendations.,,
4,MAX_HISTORY_LENGTH,INTEGER,,,,,"{'range': [0, 200]}",Maximum length of user-item history to include user in training examples.,,
5,USE_ITEM_ATTRIBUTE_BUCKETING,BOOLEAN,,,,,,"Prefer recommending items which have attribute similarity. Useful when we have natural item categories which are related, like e-commerce categories.",,
6,UNORDERED_HISTORY,BOOLEAN,,,,False,,Order of user item interactions is not important.,,
7,MAX_USER_HISTORY_LEN_PERCENTILE,INTEGER,,,,,"{'range': [95, 100]}",Filter out users with history length above this percentile.,,
8,DOWNSAMPLE_ITEM_POPULARITY_PERCENTILE,DECIMAL,,,,,"{'range': [0.1, 1.0]}",Downsample items more popular than this percentile.,,
9,RECENT_DAYS_FOR_TRAINING,INTEGER,,,,,"{'range': [1, 1000]}",Limit training data to a certain latest number of days.,,


In this notebook, we'll just train with the default options, but feel free to experiment, especially if you are familiar with Machine Learning (see the description of the parameters [here](https://abacus.ai/app/help/useCases/user_recommendations/training)).

In [28]:
training_config = {
    'BATCH_SIZE': None,
    'DOWNSAMPLE_ITEM_POPULARITY_PERCENTILE': None,
    'DROPOUT_RATE': None,
    'EXCLUDE_TIME_FEATURES': None,
    'IGNORE_ACTION_WEIGHT_COLUMN': None,
    'MAX_HISTORY_LEN': None,
    'MAX_USER_HISTORY_LEN_PERCENTILE': None,
    'RECENT_DAYS_FOR_TRAINING': None,
    'RERANKING_PERSONALIZATION_FACTOR': None,
    'SEARCH_QUERY_COLUMN': None,
    'SKIP_HISTORY_FILTERING': False,
    'TARGET_EVENT_WEIGHTS': None,
    'TEST_ON_USER_SPLIT': False,
    'TEST_SPLIT': None,
    'TRAINING_START_DATE': None,
    'UNORDERED_HISTORY': False
 }
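
If you do change some options, it can help to check your values against the constraints listed in the table above before starting a training run. The following is a minimal local sketch: `check_training_config` is a hypothetical helper, it treats ranges as inclusive (an assumption), and it treats `None` as "use the default", as the config above does.

```python
def check_training_config(config, options):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for name, value in config.items():
        if value is None:
            continue  # None is treated here as "use the default"
        spec = options.get(name)
        if spec is None:
            continue  # no constraints known for this option in this sketch
        if "range" in spec:
            low, high = spec["range"]
            if not low <= value <= high:
                problems.append(f"{name}={value} is outside [{low}, {high}]")
        elif "values" in spec and value not in spec["values"]:
            problems.append(f"{name}={value} is not one of {spec['values']}")
    return problems


# Constraints copied from the `options` column of the table above.
KNOWN_OPTIONS = {
    "TEST_SPLIT": {"range": [5, 20]},
    "DROPOUT_RATE": {"range": [0, 90]},
    "BATCH_SIZE": {"values": [8, 16, 32, 64, 128, 256, 384, 512, 740, 1024]},
}

for problem in check_training_config({"TEST_SPLIT": 10, "BATCH_SIZE": 100}, KNOWN_OPTIONS):
    print(problem)
```

This check runs entirely on your machine; the platform remains the final authority on which values it accepts.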

In [29]:
recommendations_model = recommendations_project.train_model(training_config=training_config)
recommendations_model.to_dict()

{'name': 'Movie Recommendations Model',
 'model_id': 'd7db57ac2',
 'model_config': {'MAX_HISTORY_LEN': None,
  'TARGET_EVENT_WEIGHTS': None,
  'EXCLUDE_TIME_FEATURES': None,
  'IGNORE_ACTION_WEIGHT_COLUMN': None,
  'RERANKING_PERSONALIZATION_FACTOR': None},
 'created_at': '2021-11-24T18:52:04+00:00',
 'project_id': 'f4b4fc54',
 'shared': False,
 'shared_at': None,
 'train_function_name': None,
 'predict_function_name': None,
 'training_input_tables': None,
 'source_code': None,
 'location': None,
 'refresh_schedules': None,
 'latest_model_version': {'model_version': '137693df6c',
  'status': 'PENDING',
  'model_id': 'd7db57ac2',
  'model_config': {'MAX_HISTORY_LEN': None,
   'TARGET_EVENT_WEIGHTS': None,
   'EXCLUDE_TIME_FEATURES': None,
   'IGNORE_ACTION_WEIGHT_COLUMN': None,
   'RERANKING_PERSONALIZATION_FACTOR': None},
  'training_started_at': None,
  'training_completed_at': None,
  'dataset_versions': None,
  'error': None,
  'pending_deployment_ids': None,
  'failed_deployment_id

After we start training the model, we can make the blocking call below, which periodically checks the model's status until it has been trained and evaluated.

In [30]:
recommendations_model.wait_for_evaluation()

Model(name='Movie Recommendations Model',
  model_id='d7db57ac2',
  model_config={'MAX_HISTORY_LEN': None, 'TARGET_EVENT_WEIGHTS': None, 'EXCLUDE_TIME_FEATURES': None, 'IGNORE_ACTION_WEIGHT_COLUMN': None, 'RERANKING_PERSONALIZATION_FACTOR': None},
  created_at='2021-11-24T18:52:04+00:00',
  project_id='f4b4fc54',
  shared=False,
  shared_at=None,
  train_function_name=None,
  predict_function_name=None,
  training_input_tables=None,
  source_code=None,
  location=None,
  refresh_schedules=None,
  latest_model_version=ModelVersion(model_version='137693df6c',
  status='COMPLETE',
  model_id='d7db57ac2',
  model_config={'MAX_HISTORY_LEN': None, 'TARGET_EVENT_WEIGHTS': None, 'EXCLUDE_TIME_FEATURES': None, 'IGNORE_ACTION_WEIGHT_COLUMN': None, 'RERANKING_PERSONALIZATION_FACTOR': None},
  training_started_at='2021-11-24T18:53:59+00:00',
  training_completed_at='2021-11-24T19:48:22+00:00',
  dataset_versions=['6134b4244', '49ec355a8'],
  error=None,
  pending_deployment_ids=[],
  failed_deploy
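
For readers curious about what a blocking wait like this involves, here is a generic polling loop with a timeout. It is only an illustration of the pattern, not the library's actual implementation; `poll_until` and its parameters are invented for this sketch, and the demo uses a fake status sequence instead of a real model.

```python
import time


def poll_until(get_status, done_states, interval_seconds=30.0,
               timeout_seconds=3 * 3600, sleep=time.sleep, clock=time.monotonic):
    """Call get_status() repeatedly until it returns a value in done_states."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in done_states:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_seconds} seconds")
        sleep(interval_seconds)


# Demo with a fake sequence of statuses (no network calls involved).
fake_statuses = iter(["PENDING", "PENDING", "COMPLETE"])
final = poll_until(lambda: next(fake_statuses), done_states={"COMPLETE"}, interval_seconds=0)
print(final)  # COMPLETE
```

The timeout guards against waiting forever if a job never reaches a terminal state.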

## **(Checkpoint)**
Training can take an hour or two to complete, but we encourage you to run the remaining calls on your own time. If your page times out or you hit refresh, you can restore your progress in this section.

In [None]:
!pip install abacusai
import pandas as pd
pd.set_option('display.max_colwidth', None)
api_key = ''  #@param {type: "string"}
from abacusai import ApiClient
client = ApiClient(api_key) 
recommendations_project = next(project for project in client.list_projects() if project.name == 'Movie Recommendations')
recommendations_model = recommendations_project.list_models()[-1]
recommendations_model.wait_for_evaluation()

## Evaluate your Model Metrics

After your model is done training, you can inspect its quality by reviewing the model's metrics.

In [31]:
recommendations_model.get_metrics().to_dict()

{'model_id': 'd7db57ac2',
 'model_version': '137693df6c',
 'metrics': {'ndcg': 0.330879625622261,
  'ndcg@5': 0.24688789200373065,
  'ndcg@10': 0.28277192204812096,
  'map': 0.061221039683096536,
  'map@5': 0.08421448087431693,
  'map@10': 0.06970096965622921,
  'mrr': 0.24470498577438823,
  'personalization@10': 0.966502239137158,
  'coverage': 0.4511746391168978},
 'baseline_metrics': None,
 'target_column': None}

To get a better understanding of what these metrics mean, visit our [documentation](https://abacus.ai/app/help/useCases/USER_RECOMMENDATIONS/training) page.
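
As a rough intuition for two of the ranking metrics above, the sketch below computes NDCG@k and the reciprocal rank for a single user using the common textbook definitions with binary relevance. The exact formulas Abacus.AI uses may differ, and the "relevant" movie IDs in the example are made up purely for illustration.

```python
import math


def ndcg_at_k(recommended, relevant, k):
    """NDCG@k with binary relevance: an item scores 1 if it is in `relevant`."""
    relevant = set(relevant)
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


def reciprocal_rank(recommended, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is recommended."""
    relevant = set(relevant)
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


recommended = ["1721", "1393", "2858", "593", "1183"]
held_out = {"1393", "1183"}  # hypothetical items the user actually liked

print(round(ndcg_at_k(recommended, held_out, k=5), 4))
print(reciprocal_rank(recommended, held_out))  # 0.5
```

Averaging the reciprocal rank over all test users gives MRR; averaging NDCG@k over users gives the reported `ndcg@k` under these definitions.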

## 5. Deploy Model

After the model has been trained, we need to deploy it before we can start making predictions. Deploying a model reserves cloud resources to host the model for real-time and/or batch predictions.

In [37]:
recommendations_deployment = client.create_deployment(name='Personalized Recommendations Deployment',model_id=recommendations_model.model_id)
recommendations_deployment.wait_for_deployment()

Deployment(deployment_id='13b0891610',
  name='Personalized Recommendations Deployment',
  status='ACTIVE',
  description='',
  deployed_at='2021-11-24T19:54:19+00:00',
  created_at='2021-11-24T19:53:50+00:00',
  project_id='f4b4fc54',
  model_id='d7db57ac2',
  model_version='137693df6c',
  feature_group_id=None,
  feature_group_version=None,
  calls_per_second=5,
  auto_deploy=True,
  regions=[{'name': 'Us East 1', 'value': 'us-east-1'}],
  error=None,
  refresh_schedules=None)

After the model is deployed, we need to create a deployment token for authenticating prediction requests. This token is only authorized to predict on deployments in this project, so it's safe to embed this token inside of a user-facing application or website.

In [38]:
deployment_token = recommendations_project.create_deployment_token().deployment_token
deployment_token

'070add66f05f4a7e9aceaa40e0b4a554'

## 6. Make Predictions


Now that you have an active deployment and a deployment token to authenticate requests, you can make the `get_recommendations` API call below.

To see a full description of the prediction API parameters, visit our [documentation](https://abacus.ai/app/help/useCases/USER_RECOMMENDATIONS/predictions) page. 

NB: The REST API keywords described in the documentation use camelCase, while the Python API keywords below use snake_case; see [here](https://medium.com/better-programming/string-case-styles-camel-pascal-snake-and-kebab-case-981407998841) for more information.
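
When reading the REST documentation alongside this notebook, a small converter can help translate keyword names. The sketch below assumes that each REST keyword is simply the camelCase form of the corresponding Python keyword (for example `numItems` for `num_items`); that correspondence is an assumption, and `camel_to_snake` is a helper written for this example.

```python
import re


def camel_to_snake(name):
    """Convert a camelCase identifier such as 'deploymentId' to 'deployment_id'."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


for rest_name in ["deploymentToken", "deploymentId", "queryData", "numItems", "page"]:
    print(f"{rest_name} -> {camel_to_snake(rest_name)}")
```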

To make the results easier to read, we load the source files into pandas DataFrames.

In [39]:
movies = pd.read_csv('https://s3.amazonaws.com//abacusai.exampledatasets/user_recommendations/movies_metadata.csv', dtype=object)
users = pd.read_csv('https://s3.amazonaws.com//abacusai.exampledatasets/user_recommendations/users_metadata.csv', dtype=object)

### Select a User ID

The first step is to select a user by entering their user ID.

In [40]:
user_id = "10" #@param {type: "string"}
users[users["user_id"] == user_id]

Unnamed: 0,user_id,gender,age,occupation,zip_code
9,10,F,35-44,academic/educator,95370


### Build the query

The query is a dictionary whose key is the column mapped to **USER_ID** (in our example, the *user_id* column) and whose value is the ID of the user to get recommendations for.

In [41]:
my_query_data = {"user_id": user_id}

### Run the Get Recommendations API

This command returns a list of recommendations for the user with the specified ID. The recommendations are determined by which movies the user liked in the past and by how the movies and users relate to each other through their attributes.


In [42]:
recommendations = ApiClient().get_recommendations(
    deployment_token=deployment_token, 
    deployment_id=recommendations_deployment.deployment_id, 
    query_data=my_query_data,
)
recommendations

[{'movie_id': '1721'},
 {'movie_id': '1393'},
 {'movie_id': '2858'},
 {'movie_id': '593'},
 {'movie_id': '1183'},
 {'movie_id': '1639'},
 {'movie_id': '265'},
 {'movie_id': '17'},
 {'movie_id': '2959'},
 {'movie_id': '1907'},
 {'movie_id': '3147'},
 {'movie_id': '34'},
 {'movie_id': '1094'},
 {'movie_id': '1617'},
 {'movie_id': '3044'},
 {'movie_id': '266'},
 {'movie_id': '608'},
 {'movie_id': '296'},
 {'movie_id': '531'},
 {'movie_id': '58'},
 {'movie_id': '25'},
 {'movie_id': '151'},
 {'movie_id': '11'},
 {'movie_id': '2028'},
 {'movie_id': '912'},
 {'movie_id': '1092'},
 {'movie_id': '2329'},
 {'movie_id': '1625'},
 {'movie_id': '2890'},
 {'movie_id': '509'},
 {'movie_id': '2687'},
 {'movie_id': '3148'},
 {'movie_id': '1266'},
 {'movie_id': '50'},
 {'movie_id': '39'},
 {'movie_id': '1680'},
 {'movie_id': '2710'},
 {'movie_id': '2908'},
 {'movie_id': '2085'},
 {'movie_id': '3252'},
 {'movie_id': '534'},
 {'movie_id': '3897'},
 {'movie_id': '1358'},
 {'movie_id': '3105'},
 {'movie_id'

A convenient way to view the results is in a pandas DataFrame, joined with the movies DataFrame to show each movie's name and genres.

In [43]:
recommendations_df = pd.DataFrame(recommendations).merge(movies, on="movie_id")
recommendations_df

Unnamed: 0,movie_id,movie,genres
0,1721,Titanic (1997),Drama|Romance
1,1393,Jerry Maguire (1996),Drama|Romance
2,2858,American Beauty (1999),Comedy|Drama
3,593,"Silence of the Lambs, The (1991)",Drama|Thriller
4,1183,"English Patient, The (1996)",Drama|Romance|War
5,1639,Chasing Amy (1997),Drama|Romance
6,265,Like Water for Chocolate (Como agua para chocolate) (1992),Drama|Romance
7,17,Sense and Sensibility (1995),Drama|Romance
8,2959,Fight Club (1999),Drama
9,1907,Mulan (1998),Animation|Children's
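
The join above only matches rows when `movie_id` has the same type on both sides. The API response shows string IDs, while a table loaded from CSV may hold integers. The following is a minimal sketch on made-up data (`recommendations_sample` and `movies_sample` are hypothetical stand-ins, not objects from this notebook) that casts the key to string and uses a left join so that the ranking order is kept even when a movie has no metadata.

```python
import pandas as pd

# Hypothetical stand-ins for the API response and the movies table
recommendations_sample = [
    {"movie_id": "1721"},
    {"movie_id": "1393"},
    {"movie_id": "9999"},  # an ID with no metadata row
]
movies_sample = pd.DataFrame(
    {
        "movie_id": [1393, 1721],  # integer IDs, as a CSV load might produce
        "movie": ["Jerry Maguire (1996)", "Titanic (1997)"],
        "genres": ["Drama|Romance", "Drama|Romance"],
    }
)

# Cast the key to string on both sides so the dtypes agree
left = pd.DataFrame(recommendations_sample).astype({"movie_id": str})
right = movies_sample.astype({"movie_id": str})

# A left join keeps every recommendation in its ranked order
joined = left.merge(right, on="movie_id", how="left")
print(joined)
```

Rows without a match come back with missing values in `movie` and `genres`, which makes gaps in the metadata easy to spot.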


### Set the number of results per page

By default, the API returns 50 recommendations per page. You can change this with the `num_items` keyword. The example below sets the number of items per page to 10.

In [44]:
recommendations_base = ApiClient().get_recommendations(
    deployment_token=deployment_token, 
    deployment_id=recommendations_deployment.deployment_id, 
    query_data=my_query_data,
    num_items=10,
)
recommendations_base_df = pd.DataFrame(recommendations_base).merge(movies, on="movie_id")
recommendations_base_df

Unnamed: 0,movie_id,movie,genres
0,1721,Titanic (1997),Drama|Romance
1,1393,Jerry Maguire (1996),Drama|Romance
2,2858,American Beauty (1999),Comedy|Drama
3,593,"Silence of the Lambs, The (1991)",Drama|Thriller
4,1183,"English Patient, The (1996)",Drama|Romance|War
5,1639,Chasing Amy (1997),Drama|Romance
6,265,Like Water for Chocolate (Como agua para chocolate) (1992),Drama|Romance
7,17,Sense and Sensibility (1995),Drama|Romance
8,2959,Fight Club (1999),Drama
9,1907,Mulan (1998),Animation|Children's


You can select which page to display with the keyword `page`. For example, if `num_items` is set to 10 and the full list contains 50 recommended items, then `page=2` returns the items ranked 11th through 20th.

In [45]:
recommendations_page = ApiClient().get_recommendations(
    deployment_token=deployment_token, 
    deployment_id=recommendations_deployment.deployment_id, 
    query_data=my_query_data,
    num_items=10,
    page=2,
)
recommendations_page_df = pd.DataFrame(recommendations_page).merge(movies, on="movie_id")
recommendations_page_df

Unnamed: 0,movie_id,movie,genres
0,3147,"Green Mile, The (1999)",Drama|Thriller
1,34,Babe (1995),Children's|Comedy|Drama
2,1094,"Crying Game, The (1992)",Drama|Romance|War
3,1617,L.A. Confidential (1997),Crime|Film-Noir|Mystery|Thriller
4,3044,Dead Again (1991),Mystery|Romance|Thriller
5,266,Legends of the Fall (1994),Drama|Romance|War|Western
6,608,Fargo (1996),Crime|Drama|Thriller
7,296,Pulp Fiction (1994),Crime|Drama
8,531,"Secret Garden, The (1993)",Children's|Drama
9,58,"Postino, Il (The Postman) (1994)",Drama|Romance
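
The relationship between `page`, `num_items`, and rank can be written out as simple arithmetic. The sketch below assumes 1-based page numbers, as the example above implies; `page_rank_range` is a hypothetical helper for illustration and is not part of the Abacus.AI client.

```python
def page_rank_range(page: int, num_items: int) -> tuple[int, int]:
    """Return the 1-based (first, last) ranks shown on a 1-based page."""
    if page < 1 or num_items < 1:
        raise ValueError("page and num_items must be positive integers")
    first = (page - 1) * num_items + 1
    last = page * num_items
    return first, last


# A made-up ranked list of 50 items, best first
ranked = [f"item_{rank}" for rank in range(1, 51)]

first, last = page_rank_range(page=2, num_items=10)
page_items = ranked[first - 1:last]  # convert 1-based ranks to a Python slice
print(first, last)  # 11 20
```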


You can add a column with the relative item scores by passing a column name (for example, "score") to the keyword `score_field`.

In [46]:
recommendations_score = ApiClient().get_recommendations(
    deployment_token=deployment_token, 
    deployment_id=recommendations_deployment.deployment_id, 
    query_data=my_query_data,
    num_items=10,
    score_field="score",
)
recommendations_score_df = pd.DataFrame(recommendations_score).merge(movies, on="movie_id")
recommendations_score_df

Unnamed: 0,movie_id,score,movie,genres
0,1721,100.0,Titanic (1997),Drama|Romance
1,1393,84.93,Jerry Maguire (1996),Drama|Romance
2,2858,71.64,American Beauty (1999),Comedy|Drama
3,593,68.05,"Silence of the Lambs, The (1991)",Drama|Thriller
4,1183,62.65,"English Patient, The (1996)",Drama|Romance|War
5,1639,56.81,Chasing Amy (1997),Drama|Romance
6,265,56.11,Like Water for Chocolate (Como agua para chocolate) (1992),Drama|Romance
7,17,55.29,Sense and Sensibility (1995),Drama|Romance
8,2959,51.21,Fight Club (1999),Drama
9,1907,45.51,Mulan (1998),Animation|Children's


### Use of Scaling Factors to bias the model toward certain items

You can use scaling factors to bias the model toward or away from certain items in your items dataset using the keyword `scaling_factors`.

The input is a list of dictionaries, each in the following format: `{"column": "col0", "values": ["value0", "value1"], "factor": 1.1}`.

The command below uses scaling factors on the column `genres` to add a positive bias to comedies with a scaling factor of 3, and a negative bias to dramas with a scaling factor of 0.25.

We now have a Comedy as the second recommendation for user ID 10, whereas the unscaled run had no Comedy in the first 50 recommendations. Similarly, Drama|Romance no longer appears in the user's first 10 recommendations.

In [47]:
recommendations_scaling = ApiClient().get_recommendations(
    deployment_token=deployment_token, 
    deployment_id=recommendations_deployment.deployment_id, 
    query_data=my_query_data,
    num_items=10,
    score_field="score",
    scaling_factors=[
        {"column": "genres", "values": ["Comedy"], "factor": 3},
        {"column": "genres", "values": ["Drama"], "factor": 0.25},
    ],
)
recommendations_scaling_df = pd.DataFrame(recommendations_scaling).merge(movies, on="movie_id")
recommendations_scaling_df

Unnamed: 0,movie_id,score,movie,genres
0,1721,100.0,Titanic (1997),Drama|Romance
1,1393,85.28,Jerry Maguire (1996),Drama|Romance
2,2858,70.87,American Beauty (1999),Comedy|Drama
3,593,67.25,"Silence of the Lambs, The (1991)",Drama|Thriller
4,1183,63.14,"English Patient, The (1996)",Drama|Romance|War
5,1639,56.94,Chasing Amy (1997),Drama|Romance
6,265,56.8,Like Water for Chocolate (Como agua para chocolate) (1992),Drama|Romance
7,17,55.99,Sense and Sensibility (1995),Drama|Romance
8,2959,50.94,Fight Club (1999),Drama
9,1907,45.7,Mulan (1998),Animation|Children's
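
To build intuition for what a scaling factor does, here is a local sketch that multiplies scores and re-sorts a made-up list. It is an illustration only: how the service applies the factors internally is not described here, and both the multiplication and the matching on `|`-separated genre tokens are assumptions. `apply_scaling_factors` and `sample_items` are hypothetical names.

```python
def apply_scaling_factors(items, scaling_factors):
    """Multiply each item's score by every matching rule's factor, then re-rank."""
    rescored = []
    for item in items:
        score = item["score"]
        for rule in scaling_factors:
            # Assumption: a rule matches if any listed value is one of the tokens
            tokens = set(str(item[rule["column"]]).split("|"))
            if tokens & set(rule["values"]):
                score *= rule["factor"]
        rescored.append({**item, "score": score})
    return sorted(rescored, key=lambda row: row["score"], reverse=True)


# Made-up items and scores, not output from the deployed model
sample_items = [
    {"movie": "Drama A", "genres": "Drama|Romance", "score": 100.0},
    {"movie": "Comedy B", "genres": "Comedy", "score": 40.0},
    {"movie": "Thriller C", "genres": "Thriller", "score": 30.0},
]
rules = [
    {"column": "genres", "values": ["Comedy"], "factor": 3},
    {"column": "genres", "values": ["Drama"], "factor": 0.25},
]

reranked = apply_scaling_factors(sample_items, rules)
print([(row["movie"], row["score"]) for row in reranked])
```

In this toy example the comedy rises to the top (40 × 3 = 120) and the drama falls to the bottom (100 × 0.25 = 25), while the unaffected thriller keeps its score of 30.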


### Item exclusion

You can also exclude certain items from the list of recommendations using the keyword `exclude_items`. The command below removes movies with the genres Comedy|Romance and Drama|Romance from the recommendations list for the user.

In [48]:
recommendations_excluded = ApiClient().get_recommendations(
    deployment_token=deployment_token, 
    deployment_id=recommendations_deployment.deployment_id, 
    query_data=my_query_data,
    score_field="score",
    exclude_items=[{"column": "genres", "values": ["Comedy|Romance", "Drama|Romance"]}],
    num_items=10,
)
recommendations_excluded_df = pd.DataFrame(recommendations_excluded).merge(movies, on="movie_id")
recommendations_excluded_df

Unnamed: 0,movie_id,score,movie,genres
0,1721,100.0,Titanic (1997),Drama|Romance
1,1393,84.93,Jerry Maguire (1996),Drama|Romance
2,2858,71.64,American Beauty (1999),Comedy|Drama
3,593,68.05,"Silence of the Lambs, The (1991)",Drama|Thriller
4,1183,62.65,"English Patient, The (1996)",Drama|Romance|War
5,1639,56.81,Chasing Amy (1997),Drama|Romance
6,265,56.11,Like Water for Chocolate (Como agua para chocolate) (1992),Drama|Romance
7,17,55.29,Sense and Sensibility (1995),Drama|Romance
8,2959,51.21,Fight Club (1999),Drama
9,1907,45.51,Mulan (1998),Animation|Children's
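
One way to confirm that an exclusion took effect is to look for the excluded values in the returned rows. The sketch below runs that check on made-up rows and assumes an exact comparison against the full `genres` string; `find_excluded_rows` is a hypothetical helper, and how the service matches values is not stated here.

```python
def find_excluded_rows(rows, column, excluded_values):
    """Return the rows whose `column` value exactly equals an excluded value."""
    excluded = set(excluded_values)
    return [row for row in rows if row[column] in excluded]


# Made-up rows standing in for a recommendations table
sample_rows = [
    {"movie": "Movie A", "genres": "Drama|Romance"},
    {"movie": "Movie B", "genres": "Comedy|Drama"},
    {"movie": "Movie C", "genres": "Comedy|Romance"},
    {"movie": "Movie D", "genres": "Thriller"},
]

leaked = find_excluded_rows(
    sample_rows, "genres", ["Comedy|Romance", "Drama|Romance"]
)
print([row["movie"] for row in leaked])
```

On real results, the rows could come from `recommendations_excluded_df.to_dict("records")`. An empty list means none of the excluded genre strings remain.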


### Item restriction

You can also restrict the list of recommendations to certain items using the keyword `restrict_items`. 

The input is a list of dictionaries, each in the following format: `{"column": "col0", "values": ["value0", "value1", "value2", ...]}`

The command below returns only comedies and dramas in the user's recommendation list.

In [49]:
recommendations_restricted = ApiClient().get_recommendations(
    deployment_token=deployment_token, 
    deployment_id=recommendations_deployment.deployment_id, 
    query_data=my_query_data,
    score_field="score",
    restrict_items=[{"column": "genres", "values": ["Comedy", "Drama"]}],
    num_items=10,
)
recommendations_restricted_df = pd.DataFrame(recommendations_restricted).merge(movies, on="movie_id")
recommendations_restricted_df

Unnamed: 0,movie_id,score,movie,genres
0,1721,100.0,Titanic (1997),Drama|Romance
1,1393,85.28,Jerry Maguire (1996),Drama|Romance
2,2858,70.87,American Beauty (1999),Comedy|Drama
3,593,67.25,"Silence of the Lambs, The (1991)",Drama|Thriller
4,1183,63.14,"English Patient, The (1996)",Drama|Romance|War
5,1639,56.94,Chasing Amy (1997),Drama|Romance
6,265,56.8,Like Water for Chocolate (Como agua para chocolate) (1992),Drama|Romance
7,17,55.99,Sense and Sensibility (1995),Drama|Romance
8,2959,50.94,Fight Club (1999),Drama
9,1907,45.7,Mulan (1998),Animation|Children's
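
A similar check can flag rows that fall outside a restriction. This sketch treats `genres` as a `|`-separated list and reports rows that share no token with the allowed values. The token-based matching is an assumption made for illustration, and `rows_outside_restriction` is a hypothetical helper.

```python
def rows_outside_restriction(rows, column, allowed_values, sep="|"):
    """Return rows whose `column` shares no token with the allowed values."""
    allowed = set(allowed_values)
    return [
        row for row in rows
        if not (set(row[column].split(sep)) & allowed)
    ]


# Made-up rows standing in for a recommendations table
sample_rows = [
    {"movie": "Movie A", "genres": "Drama|Romance"},
    {"movie": "Movie B", "genres": "Animation|Children's"},
    {"movie": "Movie C", "genres": "Comedy"},
]

outside = rows_outside_restriction(sample_rows, "genres", ["Comedy", "Drama"])
print([row["movie"] for row in outside])
```

An empty result would indicate that every returned row contains at least one of the allowed genres.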


### Compare the different user recommendations

We can compare the recommendation results with pandas commands.

In [50]:
dataframe = pd.concat(
    [
        recommendations_base_df["movie"],
        recommendations_scaling_df["movie"],
        recommendations_excluded_df["movie"],
        recommendations_restricted_df["movie"],
    ],
    axis=1,
    ignore_index=True
)
dataframe.columns = ["Without Filter", "With Scaling Factor", "With Exclusion List", "With Restricted List"]
dataframe

Unnamed: 0,Without Filter,With Scaling Factor,With Exclusion List,With Restricted List
0,Titanic (1997),Titanic (1997),Titanic (1997),Titanic (1997)
1,Jerry Maguire (1996),Jerry Maguire (1996),Jerry Maguire (1996),Jerry Maguire (1996)
2,American Beauty (1999),American Beauty (1999),American Beauty (1999),American Beauty (1999)
3,"Silence of the Lambs, The (1991)","Silence of the Lambs, The (1991)","Silence of the Lambs, The (1991)","Silence of the Lambs, The (1991)"
4,"English Patient, The (1996)","English Patient, The (1996)","English Patient, The (1996)","English Patient, The (1996)"
5,Chasing Amy (1997),Chasing Amy (1997),Chasing Amy (1997),Chasing Amy (1997)
6,Like Water for Chocolate (Como agua para chocolate) (1992),Like Water for Chocolate (Como agua para chocolate) (1992),Like Water for Chocolate (Como agua para chocolate) (1992),Like Water for Chocolate (Como agua para chocolate) (1992)
7,Sense and Sensibility (1995),Sense and Sensibility (1995),Sense and Sensibility (1995),Sense and Sensibility (1995)
8,Fight Club (1999),Fight Club (1999),Fight Club (1999),Fight Club (1999)
9,Mulan (1998),Mulan (1998),Mulan (1998),Mulan (1998)
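
Beyond reading the table by eye, the lists can be compared numerically. The following is a small sketch on made-up titles, where `comparison` is a hypothetical stand-in for the `dataframe` built above. It measures, for each column, how often it agrees with the baseline at the same rank and how many titles it shares with the baseline overall.

```python
import pandas as pd

# Made-up columns standing in for the side-by-side recommendation lists
comparison = pd.DataFrame(
    {
        "Without Filter": ["A", "B", "C"],
        "With Scaling Factor": ["A", "C", "B"],
        "With Exclusion List": ["A", "B", "D"],
    }
)
baseline = comparison["Without Filter"]

# Fraction of rank positions where each list matches the baseline
agreement = comparison.eq(baseline, axis=0).mean()

# Number of titles each list shares with the baseline, ignoring position
overlap = {
    column: len(set(comparison[column]) & set(baseline))
    for column in comparison.columns
}

print(agreement)
print(overlap)
```

An agreement of 1.0 in every column means the filters did not change the ranking at all, which is a useful signal to double-check the arguments passed to the API.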
