## How-to guide for Related Items use-case on Abacus.AI platform
This notebook provides a hands-on environment for building a model that suggests related items using the Abacus.AI Python Client Library.

We'll be using the [User Item Recommendations](https://s3.amazonaws.com//realityengines.exampledatasets/user_recommendations/user_movie_ratings.csv), [Movie Attributes](https://s3.amazonaws.com//realityengines.exampledatasets/user_recommendations/movies_metadata.csv), and [User Attributes](https://s3.amazonaws.com//realityengines.exampledatasets/user_recommendations/users_metadata.csv) datasets, each of which has information about the user and/or their choice of movies.

1. Install the Abacus.AI library.

In [None]:
!pip install abacusai

We'll also import pandas and pprint tools for visualization in this notebook.

In [1]:
import pandas as pd # A tool we'll use to download and preview CSV files
import pprint # A tool to pretty print dictionary outputs
pp = pprint.PrettyPrinter(indent=2)

2. Add your Abacus.AI [API Key](https://abacus.ai/app/profile/apikey) generated using the API dashboard as follows:

In [2]:
#@title Abacus.AI API Key

api_key = '2fdecde877dc45fab937eff82b70eff0'  #@param {type: "string"}
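A key pasted into a cell is easy to leak when the notebook is shared. The following is a minimal sketch of one alternative, assuming the key is stored in an environment variable. The variable name `ABACUS_API_KEY` and the helper `load_api_key` are hypothetical and not part of the Abacus.AI library.

```python
import os

def load_api_key(env_var='ABACUS_API_KEY', environ=None):
    """Return the API key stored in env_var (hypothetical name), or raise a clear error."""
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, '').strip()
    if not key:
        raise RuntimeError(f'Set the {env_var} environment variable to your API key')
    return key

# Usage, once the variable is set in your environment:
# api_key = load_api_key()
```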

3. Import the Abacus.AI library and instantiate a client.

In [3]:
from abacusai import ApiClient
client = ApiClient(api_key)

## 1. Create a Project

Abacus.AI projects are containers that have datasets and trained models. By specifying a business **Use Case**, Abacus.AI tailors the deep learning algorithms to produce the best performing model possible for your data.

We'll call the `list_use_cases` method to retrieve the Use Cases currently available on the Abacus.AI platform.

In [4]:
client.list_use_cases()

[UseCase(use_case='UCPLUGANDPLAY',
   pretty_name='Plug & Play Your Tensorflow Model',
   description='Upload your already trained model and leverage our model serving infrastructure.. Host your models on our infrastructure and get a JSON api with auto scaling and more!'),
 UseCase(use_case='EMBEDDINGS_ONLY',
   pretty_name='Vector Matching Engine',
   description='Upload embeddings and leverage our similarity search infrastructure.. Scale to high traffic, update your index in near realtime'),
 UseCase(use_case='MODEL_WITH_EMBEDDINGS',
   pretty_name='Tensorflow Model With Vector Matching Engine',
   description='Upload your already trained model and leverage our model serving infrastructure.. Host your models on our infrastructure and get a JSON api with auto scaling and more!'),
 UseCase(use_case='TORCH_MODEL_WITH_EMBEDDINGS',
   pretty_name='PyTorch Model With Vector Matching Engine',
   description='Upload your already trained model and leverage our model serving infrastructure.. H

In this notebook, we're going to create a model that suggests related items using the User Item Recommendations, Movie Attributes, and User Attributes datasets. The 'USER_RELATED' use case is best tailored for this situation. As an example, we will use the IMDB movie dataset, which has movie metadata, user metadata, and user-movie ratings.

In [5]:
#@title Abacus.AI Use Case

use_case = 'USER_RELATED'  #@param {type: "string"}

By calling the `describe_use_case_requirements` method we can view what datasets are required for this use_case.

In [6]:
for requirement in client.describe_use_case_requirements(use_case):
  pp.pprint(requirement.to_dict())

{ 'allowed_feature_mappings': { 'ACTION_TYPE': { 'allowed_feature_types': [ 'CATEGORICAL'],
                                                 'description': 'This is an '
                                                                'optional '
                                                                'column that '
                                                                'specifies the '
                                                                'type of '
                                                                'action the '
                                                                'user took. '
                                                                'This could '
                                                                'include any '
                                                                'action that '
                                                                'is specific '
                                                

Finally, let's create the project.

In [7]:
related_items_project = client.create_project(name='Related Movies', use_case=use_case)
related_items_project.to_dict()

{'project_id': '15e31351a2',
 'name': 'Related Movies',
 'use_case': 'USER_RELATED',
 'created_at': '2021-11-23T19:28:45+00:00',
 'feature_groups_enabled': True}

**Note: When `feature_groups_enabled` is False, the use case does not support feature groups (collections of ML features). In that case, datasets are created at the organization level and attached to a project so that they can be used for training ML models.**

## 2. Add Datasets to your Project

Abacus.AI can read datasets directly from `AWS S3` or `Google Cloud Storage` buckets; alternatively, you can upload and store your datasets with Abacus.AI. For this notebook, we will have Abacus.AI read the datasets directly from a public S3 bucket.

We are using three datasets for this notebook. We'll tell Abacus.AI how each dataset should be used when creating it by tagging it with a special Abacus.AI **Dataset Type**.
- [User Item Recommendations](https://s3.amazonaws.com//realityengines.exampledatasets/user_recommendations/user_movie_ratings.csv) (**USER_ITEM_INTERACTIONS**): This dataset contains information about multiple users' ratings of movies with specified IDs.
- [Movie Attributes](https://s3.amazonaws.com//realityengines.exampledatasets/user_recommendations/movies_metadata.csv) (**CATALOG_ATTRIBUTES**): This dataset contains attributes about movies with specified IDs, such as each movie's name and genre.
- [User Attributes](https://s3.amazonaws.com//realityengines.exampledatasets/user_recommendations/users_metadata.csv) (**USER_ATTRIBUTES**): This dataset contains information about users with specified IDs, such as their age, gender, occupation, and zip code. 

### Add the datasets to Abacus.AI

First we'll use Pandas to preview the files, then add them to Abacus.AI.

In [8]:
pd.read_csv('https://s3.amazonaws.com//realityengines.exampledatasets/user_recommendations/user_movie_ratings.csv')

Unnamed: 0,user_id,movie_id,rating,timestamp
0,1,1193,5,978300760
1,1,3408,4,978300275
2,1,2355,5,978824291
3,1,1287,5,978302039
4,1,2804,5,978300719
...,...,...,...,...
575276,6040,1089,4,956704996
575277,6040,1094,5,956704887
575278,6040,562,5,956704746
575279,6040,1096,4,956715648


In [9]:
pd.read_csv('https://s3.amazonaws.com//realityengines.exampledatasets/user_recommendations/movies_metadata.csv')

Unnamed: 0,movie_id,movie,genres
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy
...,...,...,...
3878,3948,Meet the Parents (2000),Comedy
3879,3949,Requiem for a Dream (2000),Drama
3880,3950,Tigerland (2000),Drama
3881,3951,Two Family House (2000),Drama


In [10]:
pd.read_csv('https://s3.amazonaws.com//realityengines.exampledatasets/user_recommendations/users_metadata.csv')

Unnamed: 0,user_id,gender,age,occupation,zip_code
0,1,F,Under 18,K-12 student,48067
1,2,M,56+,self-employed,70072
2,3,M,25-34,scientist,55117
3,4,M,45-49,executive/managerial,02460
4,5,M,25-34,writer,55455
...,...,...,...,...,...
6035,6036,F,25-34,scientist,32603
6036,6037,F,45-49,academic/educator,76006
6037,6038,F,56+,academic/educator,14706
6038,6039,F,45-49,other,01060


Using the Create Dataset API, we can tell Abacus.AI the public S3 URI of where to find the datasets. We will also give each dataset a Refresh Schedule, which tells Abacus.AI when it should refresh the dataset (take an updated/latest copy of the dataset).

If you're unfamiliar with Cron Syntax, Crontab Guru can help translate the syntax back into natural language: [https://crontab.guru/#0_12_\*_\*_\*](https://crontab.guru/#0_12_*_*_*)

**Note: This cron string is evaluated in the UTC time zone.**
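To make the schedule string used in this notebook easier to read, here is a small sketch that labels the five fields of a standard cron expression. The helper `describe_cron` is hypothetical and only splits the string; it does not validate field values.

```python
CRON_FIELDS = ('minute', 'hour', 'day_of_month', 'month', 'day_of_week')

def describe_cron(expression):
    """Label the five whitespace-separated fields of a cron expression."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f'Expected {len(CRON_FIELDS)} fields, got {len(parts)}')
    return dict(zip(CRON_FIELDS, parts))

print(describe_cron('0 12 * * *'))
# {'minute': '0', 'hour': '12', 'day_of_month': '*', 'month': '*', 'day_of_week': '*'}
```

Read this way, `'0 12 * * *'` means minute 0 of hour 12 on every day, that is, a daily refresh at 12:00 UTC.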

In [19]:
user_item_dataset = client.create_dataset_from_file_connector(name='User Item Recommendations', table_name='User_Item_Recommendations',
                                     location='s3://realityengines.exampledatasets/user_recommendations/user_movie_ratings.csv',
                                     refresh_schedule='0 12 * * *')

movie_attributes_dataset = client.create_dataset_from_file_connector(name='Movie Attributes', table_name='Movie_Attributes',
                                     location='s3://realityengines.exampledatasets/user_recommendations/movies_metadata.csv',
                                     refresh_schedule='0 12 * * *')

user_attributes_dataset = client.create_dataset_from_file_connector(name='User Attributes', table_name='User_Attributes',
                                     location='s3://realityengines.exampledatasets/user_recommendations/users_metadata.csv',
                                     refresh_schedule='0 12 * * *')

datasets = [user_item_dataset, movie_attributes_dataset, user_attributes_dataset]
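The three calls above differ only in name, table name, and file name, so they can also be driven from a small table. This is a sketch under that assumption: `DATASET_SPECS`, `S3_PREFIX`, and `create_datasets` are hypothetical names, and the only client method used is `create_dataset_from_file_connector` with the same keyword arguments as in the cell above.

```python
S3_PREFIX = 's3://realityengines.exampledatasets/user_recommendations/'

# (display name, table name, file name) for each dataset used in this notebook
DATASET_SPECS = [
    ('User Item Recommendations', 'User_Item_Recommendations', 'user_movie_ratings.csv'),
    ('Movie Attributes', 'Movie_Attributes', 'movies_metadata.csv'),
    ('User Attributes', 'User_Attributes', 'users_metadata.csv'),
]

def create_datasets(client, specs=DATASET_SPECS, refresh_schedule='0 12 * * *'):
    """Create one dataset per spec and return the results in the same order."""
    return [
        client.create_dataset_from_file_connector(
            name=name,
            table_name=table_name,
            location=S3_PREFIX + file_name,
            refresh_schedule=refresh_schedule,
        )
        for name, table_name, file_name in specs
    ]

# Usage with the client created earlier:
# datasets = create_datasets(client)
```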

## 3. Create Feature Groups and add them to your Project

Datasets are created at the organization level and can be used to create feature groups as follows:

In [23]:
feature_group = client.create_feature_group(table_name='Related_Items1',sql='select * from User_Item_Recommendations')

Adding Feature Group to the project:

In [24]:
client.add_feature_group_to_project(feature_group_id=feature_group.feature_group_id,project_id = related_items_project.project_id)

Setting the Feature Group type according to the use case requirements:

In [25]:
client.set_feature_group_type(feature_group_id=feature_group.feature_group_id, project_id = related_items_project.project_id, feature_group_type= "USER_ITEM_INTERACTIONS")
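The cells above attach only the interactions data. If you also want to attach the Movie Attributes and User Attributes datasets, the same three calls can be repeated for each one. The sketch below assumes the type names `CATALOG_ATTRIBUTES` and `USER_ATTRIBUTES` listed for those datasets earlier are accepted as feature group types for this use case; the feature group table names and the helper `attach_feature_groups` are hypothetical.

```python
# (new feature group table name [hypothetical], source dataset table, feature group type)
EXTRA_FEATURE_GROUPS = [
    ('Related_Items_Movies', 'Movie_Attributes', 'CATALOG_ATTRIBUTES'),
    ('Related_Items_Users', 'User_Attributes', 'USER_ATTRIBUTES'),
]

def attach_feature_groups(client, project_id, specs=EXTRA_FEATURE_GROUPS):
    """Create each feature group, add it to the project, and set its type."""
    groups = []
    for table_name, source_table, group_type in specs:
        group = client.create_feature_group(
            table_name=table_name, sql=f'select * from {source_table}')
        client.add_feature_group_to_project(
            feature_group_id=group.feature_group_id, project_id=project_id)
        client.set_feature_group_type(
            feature_group_id=group.feature_group_id, project_id=project_id,
            feature_group_type=group_type)
        groups.append(group)
    return groups

# Usage with the objects created earlier:
# extra_groups = attach_feature_groups(client, related_items_project.project_id)
```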

Check current Feature Group schema:

In [26]:
client.get_feature_group_schema(feature_group_id=feature_group.feature_group_id)

[Feature(name='user_id',
   select_clause=None,
   feature_mapping=None,
   source_table='User_Item_Recommendations12',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='movie_id',
   select_clause=None,
   feature_mapping=None,
   source_table='User_Item_Recommendations12',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='rating',
   select_clause=None,
   feature_mapping=None,
   source_table='User_Item_Recommendations12',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='timestamp',
   select_clause=None,
   feature_mapping=None

For each **Use Case**, there are special **Column Mappings** that must be applied to a column to fulfill use case requirements. We can find the list of available **Column Mappings** by calling the *Describe Use Case Requirements* API:

In [27]:
client.describe_use_case_requirements(use_case)[0].allowed_feature_mappings

{'ITEM_ID': {'description': 'This is the unique identifier of each item in your catalog. This is typically your product id, article id, or the video id.',
  'allowed_feature_types': ['CATEGORICAL'],
  'required': True},
 'USER_ID': {'description': 'This is a unique identifier of each user in your user base.',
  'allowed_feature_types': ['CATEGORICAL'],
  'required': True},
 'ACTION_TYPE': {'description': 'This is an optional column that specifies the type of action the user took. This could include any action that is specific to you (e.g., view, click, purchase, rating, comment, like, etc). You can always upload a dataset that has no action_type column if all the actions in the dataset are the same (e.g., a dataset of only purchases or clicks).',
  'allowed_feature_types': ['CATEGORICAL'],
  'required': False},
 'TIMESTAMP': {'description': 'The timestamp when a particular action occurred.',
  'allowed_feature_types': ['TIMESTAMP'],
  'required': False},
 'ACTION_WEIGHT': {'description
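Before applying mappings, it can help to check that every mapping marked `'required': True` is covered by some column. The sketch below works on a plain dictionary shaped like the output above (descriptions omitted for brevity); the helper `missing_required_mappings` is hypothetical and not part of the client library.

```python
def missing_required_mappings(allowed_feature_mappings, chosen):
    """Return required mapping names that no column in `chosen` is mapped to.

    `chosen` maps column name -> feature mapping name.
    """
    required = {
        name for name, spec in allowed_feature_mappings.items()
        if spec.get('required')
    }
    return sorted(required - set(chosen.values()))

# Trimmed copy of the requirements shown above
allowed = {
    'ITEM_ID': {'allowed_feature_types': ['CATEGORICAL'], 'required': True},
    'USER_ID': {'allowed_feature_types': ['CATEGORICAL'], 'required': True},
    'ACTION_TYPE': {'allowed_feature_types': ['CATEGORICAL'], 'required': False},
    'TIMESTAMP': {'allowed_feature_types': ['TIMESTAMP'], 'required': False},
}

# The column-to-mapping choices used in this notebook
chosen = {'movie_id': 'ITEM_ID', 'user_id': 'USER_ID', 'timestamp': 'TIMESTAMP'}
print(missing_required_mappings(allowed, chosen))  # prints []
```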

In [28]:
client.set_feature_mapping(project_id=related_items_project.project_id, feature_group_id= feature_group.feature_group_id, feature_name='movie_id', feature_mapping='ITEM_ID')
client.set_feature_mapping(project_id=related_items_project.project_id, feature_group_id= feature_group.feature_group_id,feature_name='user_id', feature_mapping='USER_ID')
client.set_feature_mapping(project_id=related_items_project.project_id, feature_group_id= feature_group.feature_group_id,feature_name='timestamp', feature_mapping='TIMESTAMP')

[Feature(name='user_id',
   select_clause=None,
   feature_mapping='USER_ID',
   source_table='User_Item_Recommendations12',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='movie_id',
   select_clause=None,
   feature_mapping='ITEM_ID',
   source_table='User_Item_Recommendations12',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='rating',
   select_clause=None,
   feature_mapping=None,
   source_table='User_Item_Recommendations12',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='timestamp',
   select_clause=None,
   feature_ma

For each required Feature Group Type within the use case, you must assign the Feature Group to be used for training the model:

In [29]:
client.use_feature_group_for_training(project_id=related_items_project.project_id, feature_group_id=feature_group.feature_group_id)

Now that our feature groups are assigned, we're almost ready to train a model!

To be sure that our project is ready to go, let's call `related_items_project.validate()` to confirm that all the project requirements have been met:

In [30]:
related_items_project.validate()

ProjectValidation(valid=True,
  dataset_errors=[],
  column_hints={})
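In a script, rather than reading the validation result by eye, you may want training to stop when validation fails. This is a minimal sketch that relies only on the `valid` and `dataset_errors` attributes visible in the output above; the helper `ensure_valid` is hypothetical.

```python
def ensure_valid(validation):
    """Raise if the validation result is not valid; otherwise return it unchanged."""
    if not validation.valid:
        raise ValueError(
            f'Project is not ready to train: {validation.dataset_errors}')
    return validation

# Usage with the project created earlier:
# ensure_valid(related_items_project.validate())
```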

## 4. Train a Model

For each **Use Case**, Abacus.AI offers a number of training options. We can call the *Get Training Config Options* API to see the available options.

In [31]:
related_items_project.get_training_config_options()

[TrainingConfigOptions(name='TEST_SPLIT',
   data_type='INTEGER',
   value_type=None,
   value_options=None,
   value=None,
   default=None,
   options={'range': [5, 20]},
   description='Percent of dataset to use for test data. We support using a range between 6% to 20% of your dataset to use as test data.',
   required=None,
   last_model_value=None),
 TrainingConfigOptions(name='DROPOUT_RATE',
   data_type='INTEGER',
   value_type=None,
   value_options=None,
   value=None,
   default=None,
   options={'range': [0, 90]},
   description='Dropout percentage rate.',
   required=None,
   last_model_value=None),
 TrainingConfigOptions(name='BATCH_SIZE',
   data_type='ENUM',
   value_type=None,
   value_options=None,
   value=None,
   default=None,
   options={'values': [8, 16, 32, 64, 128, 256, 384, 512, 740, 1024]},
   description='Batch size.',
   required=None,
   last_model_value=None),
 TrainingConfigOptions(name='SKIP_HISTORY_FILTERING',
   data_type='BOOLEAN',
   value_type=None,


In this notebook, we'll just train with the default options, but definitely feel free to experiment, especially if you have familiarity with Machine Learning.
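If you do experiment, a quick local check against the ranges and enumerated values listed above can catch typos before a training job is submitted. The sketch below copies three options from that output into a plain dictionary; the helper `check_training_config` is hypothetical, and it is an assumption that a dictionary keyed by these option names is what `train_model(training_config=...)` expects.

```python
# Limits copied from the Get Training Config Options output above
OPTIONS = {
    'TEST_SPLIT': {'range': [5, 20]},
    'DROPOUT_RATE': {'range': [0, 90]},
    'BATCH_SIZE': {'values': [8, 16, 32, 64, 128, 256, 384, 512, 740, 1024]},
}

def check_training_config(config, options=OPTIONS):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    for name, value in config.items():
        spec = options.get(name)
        if spec is None:
            problems.append(f'{name}: unknown option')
        elif 'range' in spec and not spec['range'][0] <= value <= spec['range'][1]:
            problems.append(f'{name}: {value} is outside {spec["range"]}')
        elif 'values' in spec and value not in spec['values']:
            problems.append(f'{name}: {value} is not one of {spec["values"]}')
    return problems

print(check_training_config({'TEST_SPLIT': 10, 'BATCH_SIZE': 64}))  # prints []
print(check_training_config({'TEST_SPLIT': 50}))
```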

In [32]:
related_items_model = related_items_project.train_model(training_config={})
related_items_model.to_dict()

{'name': 'Related Movies Model',
 'model_id': '1bb3a9d48',
 'model_config': {},
 'created_at': '2021-11-24T01:56:45+00:00',
 'project_id': '15e31351a2',
 'shared': False,
 'shared_at': None,
 'train_function_name': None,
 'predict_function_name': None,
 'training_input_tables': None,
 'source_code': None,
 'location': None,
 'refresh_schedules': None,
 'latest_model_version': {'model_version': 'a0d9f3f46',
  'status': 'PENDING',
  'model_id': '1bb3a9d48',
  'model_config': {},
  'training_started_at': None,
  'training_completed_at': None,
  'dataset_versions': None,
  'error': None,
  'pending_deployment_ids': None,
  'failed_deployment_ids': None}}

After we start training the model, we can make the following blocking call, which periodically checks the status of the model until it is trained and evaluated:

In [33]:
related_items_model.wait_for_evaluation()

Model(name='Related Movies Model',
  model_id='1bb3a9d48',
  model_config={},
  created_at='2021-11-24T01:56:45+00:00',
  project_id='15e31351a2',
  shared=False,
  shared_at=None,
  train_function_name=None,
  predict_function_name=None,
  training_input_tables=None,
  source_code=None,
  location=None,
  refresh_schedules=None,
  latest_model_version=ModelVersion(model_version='a0d9f3f46',
  status='COMPLETE',
  model_id='1bb3a9d48',
  model_config={},
  training_started_at='2021-11-24T01:58:40+00:00',
  training_completed_at='2021-11-24T02:34:49+00:00',
  dataset_versions=['20674efe8'],
  error=None,
  pending_deployment_ids=[],
  failed_deployment_ids=[]))

**Note: Model training can take anywhere from a few minutes to a few hours, depending on the size of the datasets, the complexity of the models being trained, and a variety of other factors.**
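To illustrate what a blocking wait of this kind does in general, here is a generic polling loop. It is a sketch of the technique only, not the Abacus.AI client's implementation: the helper `wait_until`, the poll interval, and the `'FAILED'` status are assumptions (only `'PENDING'` and `'COMPLETE'` appear in the outputs above).

```python
import time

def wait_until(get_status, done_states=('COMPLETE', 'FAILED'), poll_seconds=30,
               timeout_seconds=None, sleep=time.sleep, clock=time.monotonic):
    """Call get_status() repeatedly until it returns one of done_states.

    Sleeps poll_seconds between checks and raises TimeoutError if
    timeout_seconds elapses first.
    """
    start = clock()
    while True:
        status = get_status()
        if status in done_states:
            return status
        if timeout_seconds is not None and clock() - start >= timeout_seconds:
            raise TimeoutError(f'Still {status} after {timeout_seconds} seconds')
        sleep(poll_seconds)

# Example with a canned sequence of statuses and no real sleeping
statuses = iter(['PENDING', 'PENDING', 'COMPLETE'])
print(wait_until(lambda: next(statuses), sleep=lambda seconds: None))  # prints COMPLETE
```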

## **Checkpoint** [Optional]
Because model training can take hours to complete, your page could time out or you might accidentally refresh it. This section helps you restore your progress:

In [None]:
!pip install abacusai
import pandas as pd
import pprint
pp = pprint.PrettyPrinter(indent=2)
api_key = ''  #@param {type: "string"}
from abacusai import ApiClient
client = ApiClient(api_key)
# Look up the project created earlier by its name
related_items_project = next(project for project in client.list_projects() if project.name == 'Related Movies')
# Take the last model in the project's model list
related_items_model = related_items_project.list_models()[-1]
# Block until training and evaluation have finished
related_items_model.wait_for_evaluation()

## Evaluate your Model Metrics

After your model is done training, you can inspect the model's quality by reviewing its metrics:

In [34]:
pp.pprint(related_items_model.get_metrics().to_dict())

{ 'baseline_metrics': None,
  'metrics': { 'coverage': 0.4511746391168978,
               'map': 0.061221039683096536,
               'map@10': 0.06970096965622921,
               'map@5': 0.08421448087431693,
               'mrr': 0.24470498577438823,
               'ndcg': 0.330879625622261,
               'ndcg@10': 0.28277192204812096,
               'ndcg@5': 0.24688789200373065,
               'personalization@10': 0.966502239137158},
  'model_id': '1bb3a9d48',
  'model_version': 'a0d9f3f46',
  'target_column': None}


To get a better understanding of what these metrics mean, visit our [documentation](https://abacus.ai/app/help/useCases/USER_RELATED/training) page.
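As a rough intuition for two of the ranking metrics above, the sketch below computes the reciprocal rank and NDCG@k for a single toy ranking using their common textbook definitions with binary relevance. The exact formulas Abacus.AI uses may differ, and `mrr` is typically the reciprocal rank averaged over many users; the function names and toy data are illustrative only.

```python
import math

def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant item, or 0.0 if none is present."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG over the top k items of the ranking."""
    dcg = sum(
        1.0 / math.log2(position + 1)
        for position, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 1) for position in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

ranked = ['a', 'b', 'c', 'd']   # toy recommendation order
relevant = {'b', 'd'}           # toy set of items the user actually liked
print(reciprocal_rank(ranked, relevant))         # 0.5
print(round(ndcg_at_k(ranked, relevant, 4), 3))  # 0.651
```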

## 5. Deploy Model

After the model has been trained, we need to deploy it before we can start making predictions. Deploying a model reserves cloud resources to host the model for real-time and/or batch predictions.

In [35]:
related_items_deployment = client.create_deployment(name='Related Items Deployment',description='Related Items Deployment',model_id=related_items_model.model_id)
related_items_deployment.wait_for_deployment()

Deployment(deployment_id='b74e9e9f2',
  name='Related Items Deployment',
  status='ACTIVE',
  description='Related Items Deployment',
  deployed_at='2021-11-24T02:35:50+00:00',
  created_at='2021-11-24T02:35:22+00:00',
  project_id='15e31351a2',
  model_id='1bb3a9d48',
  model_version='a0d9f3f46',
  feature_group_id=None,
  feature_group_version=None,
  calls_per_second=5,
  auto_deploy=True,
  regions=[{'name': 'Us East 1', 'value': 'us-east-1'}],
  error=None,
  refresh_schedules=None)

After the model is deployed, we need to create a deployment token for authenticating prediction requests. This token is only authorized to predict on deployments in this project, so it's safe to embed this token inside of a user-facing application or website.

In [36]:
deployment_token = related_items_project.create_deployment_token().deployment_token
deployment_token

'1012ff1a3e644ea9a9064e0520c0dd81'

## 6. Predict


Now that you have an active deployment and a deployment token to authenticate requests, you can make the `get_related_items` API call below.

This command returns a list of related items based on the provided user_id (1) and movie_id (466). The list is determined by which movies the user liked in the past and by how the movies and users are related to each other through their attributes.



In [37]:
ApiClient().get_related_items(deployment_token=deployment_token,
               deployment_id=related_items_deployment.deployment_id,
               query_data={"user_id":"1","movie_id":"466"})

[{'movie_id': '370'},
 {'movie_id': '1377'},
 {'movie_id': '3869'},
 {'movie_id': '780'},
 {'movie_id': '380'},
 {'movie_id': '520'},
 {'movie_id': '3208'},
 {'movie_id': '1445'},
 {'movie_id': '3107'},
 {'movie_id': '2027'},
 {'movie_id': '368'},
 {'movie_id': '1517'},
 {'movie_id': '733'},
 {'movie_id': '1391'},
 {'movie_id': '157'},
 {'movie_id': '3253'},
 {'movie_id': '3033'},
 {'movie_id': '10'},
 {'movie_id': '1370'},
 {'movie_id': '135'},
 {'movie_id': '2628'},
 {'movie_id': '1580'},
 {'movie_id': '1552'},
 {'movie_id': '2115'},
 {'movie_id': '2002'},
 {'movie_id': '592'},
 {'movie_id': '1676'},
 {'movie_id': '165'},
 {'movie_id': '1918'},
 {'movie_id': '292'},
 {'movie_id': '1405'},
 {'movie_id': '2478'},
 {'movie_id': '2335'},
 {'movie_id': '457'},
 {'movie_id': '1923'},
 {'movie_id': '2170'},
 {'movie_id': '2916'},
 {'movie_id': '112'},
 {'movie_id': '3256'},
 {'movie_id': '3440'},
 {'movie_id': '1196'},
 {'movie_id': '2617'},
 {'movie_id': '1485'},
 {'movie_id': '1608'},
 {'
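The response is a list of dictionaries keyed by `movie_id`, so the raw IDs are not very readable on their own. The sketch below pairs the top results with titles from a lookup dictionary, which in practice could be built from the Movie Attributes file. The helper `top_related`, the fallback text, and the sample `results` list are illustrative and are not real model output.

```python
def top_related(results, titles, n=5, id_key='movie_id'):
    """Return (id, title) pairs for the first n results, with a fallback title."""
    top = []
    for row in results[:n]:
        item_id = row[id_key]
        top.append((item_id, titles.get(item_id, '(title not loaded)')))
    return top

# Titles taken from the Movie Attributes preview earlier in this notebook
titles = {'1': 'Toy Story (1995)', '2': 'Jumanji (1995)'}

# Illustrative response in the same shape as the output above
results = [{'movie_id': '2'}, {'movie_id': '370'}]
print(top_related(results, titles))
# [('2', 'Jumanji (1995)'), ('370', '(title not loaded)')]
```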