## How-to guide for Predictive Lead Scoring use-case on Abacus.AI platform
This notebook provides a hands-on environment for building a model that predicts the probability that a user with specified attributes is a lead, using the Abacus.AI Python Client Library.

We'll be using the [Individual Company Sales Data](https://s3.amazonaws.com//realityengines.exampledatasets/sales_scoring/company_sales_data_v2.csv) dataset, which contains information about users, their attributes, and their lead IDs.

1. Install the Abacus.AI library.

In [None]:
!pip install abacusai

We'll also import pandas and pprint tools for visualization in this notebook.

In [1]:
import pandas as pd # A tool we'll use to download and preview CSV files
import pprint # A tool to pretty print dictionary outputs
pp = pprint.PrettyPrinter(indent=2)

2. Add your Abacus.AI [API Key](https://abacus.ai/app/profile/apikey) generated using the API dashboard as follows:

In [2]:
#@title Abacus.AI API Key

api_key = ''  #@param {type: "string"}
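
A key typed directly into a notebook cell is easy to leak when the notebook is shared. One alternative, sketched below, is to read the key from an environment variable. The variable name `ABACUS_API_KEY` and the helper `load_api_key` are assumptions made for this example; they are not part of the Abacus.AI client.

```python
import os


def load_api_key(env_var="ABACUS_API_KEY"):
    """Return the API key stored in the given environment variable.

    The default variable name is an assumption for this sketch, not a
    name the Abacus.AI client looks up on its own.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

With the variable set in your shell or notebook environment, `api_key = load_api_key()` can stand in for the hard-coded assignment.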

3. Import the Abacus.AI library and instantiate a client.

In [3]:
from abacusai import ApiClient
client = ApiClient(api_key)

## 1. Create a Project

Abacus.AI projects are containers that hold datasets and trained models. When you specify a business **Use Case**, Abacus.AI tailors its deep learning algorithms to produce the best-performing model for your data.

We'll call the `list_use_cases` method to retrieve a list of the Use Cases currently available on the Abacus.AI platform.

In [4]:
client.list_use_cases()

[UseCase(use_case='UCPLUGANDPLAY',
   pretty_name='Plug & Play Your Tensorflow Model',
   description='Upload your already trained model and leverage our model serving infrastructure.. Host your models on our infrastructure and get a JSON api with auto scaling and more!'),
 UseCase(use_case='EMBEDDINGS_ONLY',
   pretty_name='Vector Matching Engine',
   description='Upload embeddings and leverage our similarity search infrastructure.. Scale to high traffic, update your index in near realtime'),
 UseCase(use_case='MODEL_WITH_EMBEDDINGS',
   pretty_name='Tensorflow Model With Vector Matching Engine',
   description='Upload your already trained model and leverage our model serving infrastructure.. Host your models on our infrastructure and get a JSON api with auto scaling and more!'),
 UseCase(use_case='TORCH_MODEL_WITH_EMBEDDINGS',
   pretty_name='PyTorch Model With Vector Matching Engine',
   description='Upload your already trained model and leverage our model serving infrastructure.. H

In this notebook, we're going to create a predictive lead scoring model using the Individual Company Sales Data dataset. The 'SALES_SCORING' use case is best tailored for this situation.

In [5]:
#@title Abacus.AI Use Case

use_case = 'SALES_SCORING'  #@param {type: "string"}

By calling the `describe_use_case_requirements` method, we can see which datasets this use case requires.

In [6]:
for requirement in client.describe_use_case_requirements(use_case):
  pp.pprint(requirement.to_dict())

{ 'allowed_feature_mappings': { 'IGNORE': { 'description': 'Ignore this column '
                                                           'in training',
                                            'multiple': True,
                                            'required': False},
                                'LEAD_ID': { 'allowed_feature_types': [ 'CATEGORICAL'],
                                             'description': 'This is a unique '
                                                            'identifier of '
                                                            'each user in the '
                                                            'user base.',
                                             'required': True},
                                'LEAD_SCORE': { 'allowed_feature_types': [ 'CATEGORICAL',
                                                                           'NUMERICAL'],
                                                'description': 'This denotes 

Finally, let's create the project.

In [7]:
lead_scoring_project = client.create_project(name='Predictive Lead Scoring Project', use_case=use_case)
lead_scoring_project.to_dict()

{'project_id': '14f23cb08e',
 'name': 'Predictive Lead Scoring Project',
 'use_case': 'SALES_SCORING',
 'created_at': '2021-11-23T19:18:30+00:00',
 'feature_groups_enabled': True}

**Note: When `feature_groups_enabled` is `True`, the use case supports feature groups (collections of ML features). Feature groups are created at the organization level and can be attached to a project so that they can be used to train ML models.**

## 2. Add Datasets to your Project

Abacus.AI can read datasets directly from `AWS S3` or `Google Cloud Storage` buckets; alternatively, you can upload your datasets and store them with Abacus.AI. For this notebook, we will have Abacus.AI read the dataset directly from a public S3 bucket.

We are using one dataset for this notebook. We'll tell Abacus.AI how the dataset should be used when creating it by tagging the dataset with a special Abacus.AI **Dataset Type**.
- [Individual Company Sales Data](https://s3.amazonaws.com//realityengines.exampledatasets/sales_scoring/company_sales_data_v2.csv) (**LEADS**): This dataset contains information about users, their attributes, and their lead IDs.

### Add the dataset to Abacus.AI

First we'll use Pandas to preview the file, then add it to Abacus.AI.

In [8]:
pd.read_csv('https://s3.amazonaws.com//realityengines.exampledatasets/sales_scoring/company_sales_data_v2.csv')

Unnamed: 0,lead_id,flag,gender,education,house_val,age,online,customer_psy,marriage,child,occupation,mortgage,house_owner,region,car_prob,fam_income
0,0,Y,M,4. Grad,756460,1_Unk,N,B,,U,Professional,1Low,,Midwest,1,L
1,1,N,F,3. Bach,213171,7_>65,N,E,,U,Professional,1Low,Owner,Northeast,3,G
2,2,N,M,2. Some College,111147,2_<=25,Y,C,,Y,Professional,1Low,Owner,Midwest,1,J
3,3,Y,M,2. Some College,354151,2_<=25,Y,B,Single,U,Sales/Service,1Low,,West,2,L
4,4,Y,F,2. Some College,117087,1_Unk,Y,J,Married,Y,Sales/Service,1Low,,South,7,H
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
39995,39995,Y,F,3. Bach,0,7_>65,Y,C,,U,Retired,1Low,,South,3,F
39996,39996,N,F,1. HS,213596,4_<=45,N,I,Married,U,Blue Collar,1Low,Owner,South,1,D
39997,39997,Y,M,0. <HS,134070,3_<=35,Y,F,Married,U,Sales/Service,1Low,Owner,Midwest,4,E
39998,39998,N,M,1. HS,402210,7_>65,Y,E,,Y,Sales/Service,1Low,,West,2,B


Using the Create Dataset API, we can tell Abacus.AI the public S3 URI where the dataset is located. We will also give the dataset a Refresh Schedule, which tells Abacus.AI when to refresh it (that is, when to take an updated copy of the dataset).

If you're unfamiliar with Cron Syntax, Crontab Guru can help translate the syntax back into natural language: [https://crontab.guru/#0_12_\*_\*_\*](https://crontab.guru/#0_12_*_*_*)

**Note: This cron string is evaluated in the UTC time zone.**
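
To make the schedule string concrete, here is a small sketch that names the five fields of a cron expression and computes the next run time in UTC. It handles only simple daily schedules of the form `M H * * *`, such as the `0 12 * * *` string used in this notebook. The helper names are made up for this example and are not part of the Abacus.AI client.

```python
from datetime import datetime, timedelta, timezone

CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Split a five-field cron expression into named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


def next_daily_run(expression, now):
    """Next run time in UTC for a daily schedule of the form 'M H * * *'."""
    fields = describe_cron(expression)
    if [fields[k] for k in CRON_FIELDS[2:]] != ["*", "*", "*"]:
        raise ValueError("this sketch only supports daily schedules")
    now = now.astimezone(timezone.utc)
    run = now.replace(hour=int(fields["hour"]), minute=int(fields["minute"]),
                      second=0, microsecond=0)
    # If today's slot has already passed, the next run is tomorrow
    if run <= now:
        run += timedelta(days=1)
    return run


schedule = describe_cron("0 12 * * *")
```

Here `schedule["hour"]` is `"12"` and `schedule["minute"]` is `"0"`, so the dataset is refreshed once a day at 12:00 UTC.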

In [10]:
lead_scoring_dataset = client.create_dataset_from_file_connector(
    name='Individual Company Sales Data',
    table_name='Individual_Company_Sales_Data',
    location='s3://realityengines.exampledatasets/sales_scoring/company_sales_data_v2.csv',
    refresh_schedule='0 12 * * *')
datasets = [lead_scoring_dataset]
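
The pandas preview reads the file over HTTPS, while the dataset `location` is given as an `s3://` URI. The sketch below shows one way to derive the second form from the first for path-style `s3.amazonaws.com` URLs. The helper `https_to_s3_uri` is an illustration written for this notebook, not an Abacus.AI or AWS function.

```python
from urllib.parse import urlparse


def https_to_s3_uri(url):
    """Convert https://s3.amazonaws.com/<bucket>/<key> to s3://<bucket>/<key>."""
    parsed = urlparse(url)
    if parsed.netloc != "s3.amazonaws.com":
        raise ValueError("expected a path-style s3.amazonaws.com URL")
    # lstrip tolerates the doubled slash that appears in the preview URL
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    if not bucket or not key:
        raise ValueError("URL must contain both a bucket and a key")
    return f"s3://{bucket}/{key}"


preview_url = ("https://s3.amazonaws.com//realityengines.exampledatasets"
               "/sales_scoring/company_sales_data_v2.csv")
location = https_to_s3_uri(preview_url)
```

The resulting `location` matches the URI passed to `create_dataset_from_file_connector`.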

## 3. Create Feature Groups and add them to your Project

Datasets are created at the organization level and can be used to create feature groups as follows:

In [15]:
# The table in the FROM clause must match the table_name given to the dataset when it was created
feature_group = client.create_feature_group(table_name='Predictive_Lead_Scoring', sql='SELECT * FROM Individual_Company_Sales_Data')
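
The feature group's SQL refers to the dataset by the `table_name` it was given at creation, so a misspelled name in the query is an easy mistake to make. A minimal sketch of one safeguard is to keep the table name in a single variable and build the query from it. The names `DATASET_TABLE_NAME` and `select_all_sql` are assumptions for this example.

```python
import re

# Same table_name that was passed when the dataset was created
DATASET_TABLE_NAME = "Individual_Company_Sales_Data"


def select_all_sql(table_name):
    """Build a SELECT * query after checking the name is a plain identifier."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    return f"SELECT * FROM {table_name}"


feature_group_sql = select_all_sql(DATASET_TABLE_NAME)
```

The resulting string can then be passed as the `sql` argument of `create_feature_group`.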

Adding Feature Group to the project:

In [16]:
client.add_feature_group_to_project(feature_group_id=feature_group.feature_group_id, project_id=lead_scoring_project.project_id)

Setting the Feature Group type according to the use case requirements:

In [17]:
client.set_feature_group_type(feature_group_id=feature_group.feature_group_id, project_id=lead_scoring_project.project_id, feature_group_type='LEADS')

Check current Feature Group schema:

In [18]:
client.get_feature_group_schema(feature_group_id=feature_group.feature_group_id)

[Feature(name='lead_id',
   select_clause=None,
   feature_mapping=None,
   source_table='Individual_Comapny_Sales_Data122',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='flag',
   select_clause=None,
   feature_mapping=None,
   source_table='Individual_Comapny_Sales_Data122',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='gender',
   select_clause=None,
   feature_mapping=None,
   source_table='Individual_Comapny_Sales_Data122',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='education',
   select_clause=None,
   feature_m

#### For each **Use Case**, there are special **Column Mappings** that must be applied to a column to fulfill use case requirements. We can find the list of available **Column Mappings** by calling the *Describe Use Case Requirements* API:

In [20]:
client.describe_use_case_requirements(use_case)[0].allowed_feature_mappings

{'LEAD_ID': {'description': 'This is a unique identifier of each user in the user base.',
  'allowed_feature_types': ['CATEGORICAL'],
  'required': True},
 'LEAD_SCORE': {'description': 'This denotes if the user turned into a lead (customer) or not. The score can be 0 or 1, 0 for not a customer yet and 1 for a customer',
  'allowed_feature_types': ['CATEGORICAL', 'NUMERICAL'],
  'required': True},
 'IGNORE': {'description': 'Ignore this column in training',
  'multiple': True,
  'required': False}}
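
Before setting the mappings, it can help to confirm that the planned column assignments cover every required mapping. The sketch below works on a plain dictionary copied from the output above (descriptions omitted); `planned_mappings` and the two helper functions are illustrative names, not part of the Abacus.AI client.

```python
# Subset of the allowed_feature_mappings output shown above
allowed_feature_mappings = {
    "LEAD_ID": {"allowed_feature_types": ["CATEGORICAL"], "required": True},
    "LEAD_SCORE": {"allowed_feature_types": ["CATEGORICAL", "NUMERICAL"], "required": True},
    "IGNORE": {"multiple": True, "required": False},
}

# Planned column -> mapping assignments for this dataset
planned_mappings = {"lead_id": "LEAD_ID", "flag": "LEAD_SCORE"}


def missing_required(allowed, planned):
    """Return required mappings that no column is assigned to, sorted by name."""
    required = {name for name, spec in allowed.items() if spec.get("required")}
    return sorted(required - set(planned.values()))


def unknown_mappings(allowed, planned):
    """Return planned mappings that the use case does not allow."""
    return sorted(set(planned.values()) - set(allowed))


missing = missing_required(allowed_feature_mappings, planned_mappings)
unknown = unknown_mappings(allowed_feature_mappings, planned_mappings)
```

Both lists are empty for the plan above, meaning `lead_id` and `flag` together satisfy the required `LEAD_ID` and `LEAD_SCORE` mappings.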

In [21]:
client.set_feature_mapping(project_id=lead_scoring_project.project_id, feature_group_id=feature_group.feature_group_id, feature_name='lead_id', feature_mapping='LEAD_ID')
client.set_feature_mapping(project_id=lead_scoring_project.project_id, feature_group_id=feature_group.feature_group_id, feature_name='flag', feature_mapping='LEAD_SCORE')

[Feature(name='lead_id',
   select_clause=None,
   feature_mapping='LEAD_ID',
   source_table='Individual_Comapny_Sales_Data122',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='flag',
   select_clause=None,
   feature_mapping='LEAD_SCORE',
   source_table='Individual_Comapny_Sales_Data122',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='gender',
   select_clause=None,
   feature_mapping=None,
   source_table='Individual_Comapny_Sales_Data122',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='education',
   select_clause=None,
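
The schema output is long, so a compact summary of which features carry a mapping can be easier to check. This sketch relies only on the `name` and `feature_mapping` attributes visible in the output above; the `SimpleNamespace` objects are stand-ins for the returned `Feature` objects, and `mapped_features` is a helper invented for this example.

```python
from types import SimpleNamespace


def mapped_features(schema):
    """Return {feature name: mapping} for features that have a mapping set."""
    return {f.name: f.feature_mapping for f in schema if f.feature_mapping is not None}


# Stand-in objects carrying only the two attributes used here
sample_schema = [
    SimpleNamespace(name="lead_id", feature_mapping="LEAD_ID"),
    SimpleNamespace(name="flag", feature_mapping="LEAD_SCORE"),
    SimpleNamespace(name="gender", feature_mapping=None),
]
mappings = mapped_features(sample_schema)
```

Assuming the real schema objects expose the same two attributes, the list returned by `get_feature_group_schema` could be passed in place of `sample_schema`.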

For each required Feature Group Type within the use case, you must assign the Feature Group to be used for training the model:

In [None]:
client.use_feature_group_for_training(project_id=lead_scoring_project.project_id, feature_group_id=feature_group.feature_group_id)

Now that our feature groups are assigned, we're almost ready to train a model!

To be sure that our project is ready to go, let's call `lead_scoring_project.validate()` to confirm that all the project requirements have been met:

In [None]:
lead_scoring_project.validate()

## 4. Train a Model

For each **Use Case**, Abacus.AI offers a number of training options. We can call the *Get Training Config Options* API to see the available options.

In [None]:
lead_scoring_project.get_training_config_options()

In this notebook, we'll just train with the default options, but definitely feel free to experiment, especially if you have familiarity with Machine Learning.

In [None]:
lead_scoring_model = lead_scoring_project.train_model(training_config={})
lead_scoring_model.to_dict()

After we start training the model, we can make a blocking call that periodically checks the status of the model until it is trained and evaluated:

In [None]:
lead_scoring_model.wait_for_evaluation()

**Note that model training can take anywhere from a few minutes to several hours, depending on the size of the datasets, the complexity of the models being trained, and a variety of other factors.**
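
For readers curious about what a blocking wait does in general, the sketch below polls a status function until it reports a finished state or a timeout elapses. It is a generic illustration, not the implementation of `wait_for_evaluation`, and the status strings are placeholders chosen for this example.

```python
import time


def wait_until(get_status, done_states, poll_interval=30.0, timeout=None, sleep=time.sleep):
    """Call get_status() repeatedly until it returns a value in done_states.

    Returns the final status, or raises TimeoutError once `timeout`
    seconds have elapsed without reaching a finished state.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        status = get_status()
        if status in done_states:
            return status
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(poll_interval)


# Simulated status sequence; these strings are placeholders for illustration
_statuses = iter(["PENDING", "TRAINING", "EVALUATING", "COMPLETE"])
final_status = wait_until(lambda: next(_statuses), {"COMPLETE", "FAILED"}, poll_interval=0)
```

A longer `poll_interval` keeps the number of status requests low during a training run that lasts hours, while `timeout` guards against waiting forever.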

## **Checkpoint** [Optional]
Model training can take hours to complete, so your page could time out or you might accidentally refresh it. This section helps you restore your progress:

In [None]:
!pip install abacusai
import pandas as pd
import pprint
pp = pprint.PrettyPrinter(indent=2)
api_key = ''  #@param {type: "string"}
from abacusai import ApiClient
client = ApiClient(api_key)
lead_scoring_project = next(project for project in client.list_projects() if project.name == 'Predictive Lead Scoring Project')
lead_scoring_model = lead_scoring_project.list_models()[-1]
lead_scoring_model.wait_for_evaluation()
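
A bare `next(...)` over a generator raises `StopIteration` when no project has the expected name, which can be confusing to debug. The sketch below shows a lookup with a clearer error message. The `SimpleNamespace` objects stand in for whatever `client.list_projects()` returns, the `project_id` value `'aaa'` is invented, and `find_project` is a helper written for this example.

```python
from types import SimpleNamespace


def find_project(projects, name):
    """Return the first project whose name matches, or raise a descriptive error."""
    match = next((p for p in projects if p.name == name), None)
    if match is None:
        available = ", ".join(sorted(p.name for p in projects)) or "none"
        raise LookupError(f"No project named {name!r}; available: {available}")
    return match


# Stand-ins for the objects returned by client.list_projects()
projects = [
    SimpleNamespace(name="Some Other Project", project_id="aaa"),
    SimpleNamespace(name="Predictive Lead Scoring Project", project_id="14f23cb08e"),
]
project = find_project(projects, "Predictive Lead Scoring Project")
```

In the checkpoint cell, `find_project(client.list_projects(), 'Predictive Lead Scoring Project')` would play the same role as the `next(...)` expression.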

## Evaluate your Model Metrics

After your model has finished training, you can inspect its quality by reviewing the model's metrics.

In [None]:
pp.pprint(lead_scoring_model.get_metrics().to_dict())

To get a better understanding of what these metrics mean, visit our [documentation](https://abacus.ai/app/help/useCases/SALES_SCORING/training) page.

## 5. Deploy Model

After the model has been trained, we need to deploy it before we can start making predictions. Deploying a model reserves cloud resources to host the model for real-time and/or batch predictions.

In [None]:
lead_scoring_deployment = client.create_deployment(name='Lead Scoring Deployment', model_id=lead_scoring_model.model_id, description='Lead Scoring Deployment')
lead_scoring_deployment.wait_for_deployment()

After the model is deployed, we need to create a deployment token for authenticating prediction requests. This token is only authorized to predict on deployments in this project, so it's safe to embed this token inside of a user-facing application or website.

In [None]:
deployment_token = lead_scoring_project.create_deployment_token().deployment_token
deployment_token

## 6. Predict


Now that you have an active deployment and a deployment token to authenticate requests, you can make the `predict_lead` API call below.

This command returns the probability that a user with the specified attributes is a lead. The prediction is performed based on information about the leads of users with similar attributes.

In [None]:
ApiClient().predict_lead(deployment_token=deployment_token,
                         deployment_id=lead_scoring_deployment.deployment_id,
                         query_data={"house_val":329280,"gender":"M","age":"4_<=45","online":"Y","customer_psy":"D","marriage":"Single","child":"N","occupation":"Professional","mortgage":"1Low","house_owner":"Owner","region":"South","car_prob":"3","fam_income":"E"})
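
The `query_data` dictionary above was written by hand. As a rough sketch, the same kind of dictionary can be built from a row of the source CSV by dropping the columns that are not attributes. The header and row below are copied from the dataset preview earlier in this notebook. Excluding the row index, `lead_id`, and `flag`, and skipping empty values, are assumptions that mirror the hand-written example rather than documented requirements. Note also that every value stays a string here, whereas the example above passes `house_val` as a number.

```python
import csv
import io

# Header and one row copied from the dataset preview earlier in this notebook
preview_csv = (
    "Unnamed: 0,lead_id,flag,gender,education,house_val,age,online,customer_psy,"
    "marriage,child,occupation,mortgage,house_owner,region,car_prob,fam_income\n"
    "3,3,Y,M,2. Some College,354151,2_<=25,Y,B,Single,U,Sales/Service,1Low,,West,2,L\n"
)

# Columns left out of the query: the row index, the identifier and the label
EXCLUDED_COLUMNS = {"Unnamed: 0", "lead_id", "flag"}


def row_to_query_data(row, excluded=EXCLUDED_COLUMNS):
    """Keep the non-empty attribute columns of a CSV row as a query_data dict."""
    return {k: v for k, v in row.items() if k not in excluded and v != ""}


row = next(csv.DictReader(io.StringIO(preview_csv)))
query_data = row_to_query_data(row)
```

The resulting dictionary could then be passed as `query_data` to `predict_lead` in place of the literal shown above.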