## 1. Instalação


Antes de rodar esse notebook, crie um serviço de machile learning do Watson

* Create a Watson Machine Learning service instance (a free plan is offered) and associate it with your project




We'll be using a few libraries for this exercise:

1. [Watson Machine Learning Client](http://wml-api-pyclient.mybluemix.net/): Client library to work with the Watson Machine Learning service on IBM Cloud. Library available on [pypi](https://pypi.org/project/watson-machine-learning-client/). Service available on [IBM Cloud](https://cloud.ibm.com/catalog/services/machine-learning).
1. [Pixiedust](https://github.com/pixiedust/pixiedust): Python Helper library for Jupyter Notebooks. Available on [pypi](https://pypi.org/project/pixiedust/).
1. [ibmos2spark](https://github.com/ibm-watson-data-lab/ibmos2spark): Facilitates Data I/O between Spark and IBM Object Storage services

In [1]:
!pip install --upgrade ibmos2spark
!pip install --upgrade pixiedust
#!pip install watson-machine-learning-client-V4
!pip install -U ibm-watson-machine-learning

Waiting for a Spark session to start...
Spark Initialization Done! ApplicationId = app-20210507182551-0000
KERNEL_ID = c17fa639-1f91-4160-87e4-19e27e162346
Collecting ibmos2spark
  Downloading ibmos2spark-1.0.1-py2.py3-none-any.whl (7.4 kB)
Installing collected packages: ibmos2spark
Successfully installed ibmos2spark-1.0.1
Collecting pixiedust
  Downloading pixiedust-1.1.19.tar.gz (197 kB)
[K     |████████████████████████████████| 197 kB 15.9 MB/s eta 0:00:01
[?25hCollecting geojson
  Downloading geojson-2.5.0-py2.py3-none-any.whl (14 kB)
Collecting astunparse
  Downloading astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting markdown
  Downloading Markdown-3.3.4-py3-none-any.whl (97 kB)
[K     |████████████████████████████████| 97 kB 4.6 MB/s  eta 0:00:01
[?25hCollecting colour
  Downloading colour-0.1.5-py2.py3-none-any.whl (23 kB)
Collecting requests
  Downloading requests-2.25.1-py2.py3-none-any.whl (61 kB)
[K     |████████████████████████████████| 61 kB 7.4 MB/s  eta 0:00

  Building wheel for ibm-cos-sdk-s3transfer (setup.py) ... [?25ldone
[?25h  Created wheel for ibm-cos-sdk-s3transfer: filename=ibm_cos_sdk_s3transfer-2.7.0-py2.py3-none-any.whl size=88603 sha256=6098b3f9831fa15ea622d4fdddde128d86fe1ba9fb52957698284ee0f8c49295
  Stored in directory: /home/spark/shared/.cache/pip/wheels/5f/b7/14/fbe02bc1ef1af890650c7e51743d1c83890852e598d164b9da
Successfully built ibm-cos-sdk ibm-cos-sdk-core ibm-cos-sdk-s3transfer
[31mERROR: conda 4.8.2 requires ruamel_yaml>=0.11.14, which is not installed.[0m
[31mERROR: tensorflow 2.1.0 has requirement scipy==1.4.1; python_version >= "3", but you'll have scipy 1.5.0 which is incompatible.[0m
[31mERROR: botocore 1.16.11 has requirement urllib3<1.26,>=1.20, but you'll have urllib3 1.26.4 which is incompatible.[0m
[31mERROR: aiohttp 3.6.2 has requirement chardet<4.0,>=2.0, but you'll have chardet 4.0.0 which is incompatible.[0m
Installing collected packages: certifi, pyparsing, packaging, jmespath, docutils, cha

In [2]:
import pixiedust

Pixiedust database opened successfully
Table VERSION_TRACKER created successfully
Table METRICS_TRACKER created successfully

Share anonymous install statistics? (opt-out instructions)

PixieDust will record metadata on its environment the next time the package is installed or updated. The data is anonymized and aggregated to help plan for future releases, and records only the following values:

{
   "data_sent": currentDate,
   "runtime": "python",
   "application_version": currentPixiedustVersion,
   "space_id": nonIdentifyingUniqueId,
   "config": {
       "repository_id": "https://github.com/ibm-watson-data-lab/pixiedust",
       "target_runtimes": ["Data Science Experience"],
       "event_id": "web",
       "event_organizer": "dev-journeys"
   }
}
You can opt out by calling pixiedust.optOut() in a new cell.


[31mPixiedust runtime updated. Please restart kernel[0m
Table SPARK_PACKAGES created successfully
Table USER_PREFERENCES created successfully
Table service_connections created successfully


In [4]:
#access_base = pixiedust.sampleData('https://raw.githubusercontent.com/centesimo/recommendation-ml/master/data/base_help_outubro.csv')
access_base = pixiedust.sampleData('https://raw.githubusercontent.com/centesimo/recommendation-ml/master/data/watson_cr_07052021.csv')


Downloading 'https://raw.githubusercontent.com/centesimo/recommendation-ml/master/data/watson_cr_07052021.csv' from https://raw.githubusercontent.com/centesimo/recommendation-ml/master/data/watson_cr_07052021.csv
Downloaded 1962452 bytes
Creating pySpark DataFrame for 'https://raw.githubusercontent.com/centesimo/recommendation-ml/master/data/watson_cr_07052021.csv'. Please wait...
Loading file using 'SparkSession'
Successfully created pySpark DataFrame for 'https://raw.githubusercontent.com/centesimo/recommendation-ml/master/data/watson_cr_07052021.csv'


<a id="kmeans"></a>
## 3. Create a *k*-means model with Spark

In this section of the notebook you use the *k*-means implementation to associate every customer to a cluster based on their shopping history.

First, import the Apache Spark Machine Learning packages ([MLlib](http://spark.apache.org/docs/2.2.0/api/python/pyspark.ml.html)) that you need in the subsequent steps:


In [5]:
from pyspark.ml import Pipeline
from pyspark.ml.clustering import KMeans
from pyspark.ml.clustering import KMeansModel
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.linalg import Vectors

procedures_cols = access_base.drop('produto_id', '_c820').columns

In [6]:
product_id = access_base.drop('procedure_id', '_c820').columns
print(product_id)

['produto_id', '32', '40', '47', '51', '52', '53', '55', '57', '60', '61', '69', '71', '91', '118', '122', '126', '136', '140', '141', '144', '145', '146', '149', '151', '152', '153', '155', '156', '159', '163', '165', '167', '178', '182', '183', '184', '185', '186', '188', '194', '231', '232', '234', '239', '240', '244', '249', '250', '262', '270', '282', '283', '285', '290', '292', '300', '301', '302', '308', '310', '322', '325', '328', '329', '333', '334', '335', '338', '339', '340', '347', '348', '354', '355', '357', '362', '370', '373', '379', '381', '382', '388', '389', '396', '397', '398', '403', '404', '419', '421', '423', '425', '427', '430', '431', '437', '447', '451', '453', '466', '469', '479', '480', '483', '484', '486', '487', '490', '491', '492', '494', '507', '510', '512', '517', '520', '525', '527', '532', '533', '537', '541', '542', '543', '544', '545', '547', '549', '550', '554', '556', '557', '561', '564', '565', '569', '570', '574', '582', '591', '594', '596', '598

<a id="prepare_data"></a>
### 3.1 Prepare data

Create a new data set with just the data that you need. Filter the columns that you want, in this case the customer ID column and the product-related columns. Remove the columns that you don't need for aggregating the data and training the model:

### We won't need this part because out data allready have formated
# Just go to the part 3.2 below


<a id="build_model"></a>
### 3.2 Create clusters and define the model 

Create 100 clusters with a *k*-means model based on the number of times a specific customer purchased a product.

First, create a feature vector by combining the product and quantity columns:

In [7]:
assembler = VectorAssembler(inputCols=["{}".format(x) for x in procedures_cols],outputCol="features") # Assemble vectors using product fields

Next, create the *k*-means clusters and the pipeline to define the model:

In [8]:
kmeans = KMeans(maxIter=10, predictionCol="cluster").setK(100).setSeed(1)  # Initialize model
pipeline = Pipeline(stages=[assembler, kmeans])
model = pipeline.fit(access_base)

In [9]:
help_cluster = model.transform(access_base)

# Here we compare many clusters to get the better cluster for our application

<a id="persist"></a>
## 4. Persist the model 

In this section you will learn how to store your pipeline and model in Watson Machine Learning repository by using Python client libraries.


<a id="persist"></a>
## Now to access the machine Learning of Watson we must use IAM KEY and get a token to access the same.

Below are the code to get the token


In [10]:
import requests

apikey = "75LWtvCg0l7p75_1dsWoXWVHA3jvXpZunfV1GQULnKzC"

# Get an IAM token from IBM Cloud
url     = "https://iam.bluemix.net/oidc/token"
headers = { "Content-Type" : "application/x-www-form-urlencoded" }
data    = "apikey=" + apikey + "&grant_type=urn:ibm:params:oauth:grant-type:apikey"
IBM_cloud_IAM_uid = "bx"
IBM_cloud_IAM_pwd = "bx"
response  = requests.post( url, headers=headers, data=data, auth=( IBM_cloud_IAM_uid, IBM_cloud_IAM_pwd ) )
iam_token = response.json()["access_token"]

print('Token = '+iam_token)


Token = eyJraWQiOiIyMDIxMDQyMDE4MzYiLCJhbGciOiJSUzI1NiJ9.eyJpYW1faWQiOiJJQk1pZC01NTAwMDhIU0RGIiwiaWQiOiJJQk1pZC01NTAwMDhIU0RGIiwicmVhbG1pZCI6IklCTWlkIiwianRpIjoiMWRlOGQ5OTEtNTQwOS00MmU4LTg3NjAtNzRjYzg4ZTQ3NWE2IiwiaWRlbnRpZmllciI6IjU1MDAwOEhTREYiLCJnaXZlbl9uYW1lIjoiUmVjaWNsYXJlIiwiZmFtaWx5X25hbWUiOiJRdWFsaWRhZGUiLCJuYW1lIjoiUmVjaWNsYXJlIFF1YWxpZGFkZSIsImVtYWlsIjoicmVjaWNsYXJlQHJlY2ljbGFyZXF1YWxpZGFkZS5jb20uYnIiLCJzdWIiOiJyZWNpY2xhcmVAcmVjaWNsYXJlcXVhbGlkYWRlLmNvbS5iciIsImF1dGhuIjp7InN1YiI6InJlY2ljbGFyZUByZWNpY2xhcmVxdWFsaWRhZGUuY29tLmJyIiwiaWFtX2lkIjoiaWFtLTU1MDAwOEhTREYiLCJuYW1lIjoiUmVjaWNsYXJlIFF1YWxpZGFkZSIsImdpdmVuX25hbWUiOiJSZWNpY2xhcmUiLCJmYW1pbHlfbmFtZSI6IlF1YWxpZGFkZSIsImVtYWlsIjoicmVjaWNsYXJlQHJlY2ljbGFyZXF1YWxpZGFkZS5jb20uYnIifSwiYWNjb3VudCI6eyJ2YWxpZCI6dHJ1ZSwiYnNzIjoiZDMzNGQ1NDdhZTUxNDZjNThhZWFhMGVjN2JkZjlmZDEiLCJmcm96ZW4iOnRydWV9LCJpYXQiOjE2MjA0MTQzODAsImV4cCI6MTYyMDQxNTU4MCwiaXNzIjoiaHR0cHM6Ly9pYW0ubmcuYmx1ZW1peC5uZXQvb2lkYy90b2tlbiIsImdyYW50X3R5cGUiOiJ1cm46aWJtOnBhcmFtczp

In [11]:
wml_credentials = {
                   "url": "https://us-south.ml.cloud.ibm.com",
                   "apikey":"l9QiumN1_TS0xNhRDhdKeuqVkmshTHhim0d300wKI-Y9",
                  }

from ibm_watson_machine_learning import APIClient
client = APIClient(wml_credentials)
client.set.default_space("0c9fcc9d-6471-478f-80d3-c5b2382dea8f")


'SUCCESS'

### 4.1 Configure IBM Watson Machine Learning credentials

To access your machine learning repository programmatically, you need to copy in your credentials, which you can see in your **IBM Watson Machine Learning** service details in IBM Cloud.

> **IMPORTANT**: Update `apikey` and `instance_id` below. Credentials can be found on _Service Credentials_ tab of the Watson Machine Learning service instance created on the IBM Cloud.

Connect to the Watson Machine Learning service using the provided credentials.

### 4.2 Save the model 

#### Save the model to the Watson Machine Learning repository

You use the Watson Machine Learning client's [Repository class](http://wml-api-pyclient.mybluemix.net/#repository) to store and manage models in the Watson Machine Learning service. 

> **NOTE**: You can also use Watson Studio to manage models. In this notebook we are using the client library instead.

In [12]:
train_data = access_base.withColumnRenamed('produto_id', 'label')
#client.repository.ModelMetaNames.show()
#client.service_instance.get_details()

In [13]:
train_data

DataFrame[label: int, 32: double, 40: double, 47: double, 51: double, 52: double, 53: double, 55: double, 57: double, 60: double, 61: double, 69: double, 71: double, 91: double, 118: double, 122: double, 126: double, 136: double, 140: double, 141: double, 144: double, 145: double, 146: double, 149: double, 151: double, 152: double, 153: double, 155: double, 156: double, 159: double, 163: double, 165: double, 167: double, 178: double, 182: double, 183: double, 184: double, 185: double, 186: double, 188: double, 194: double, 231: double, 232: double, 234: double, 239: double, 240: double, 244: double, 249: double, 250: double, 262: double, 270: double, 282: double, 283: double, 285: double, 290: double, 292: double, 300: double, 301: double, 302: double, 308: double, 310: double, 322: double, 325: double, 328: double, 329: double, 333: double, 334: double, 335: double, 338: double, 339: double, 340: double, 347: double, 348: double, 354: double, 355: double, 357: double, 362: double, 370

> **TIP**: Update the cell below with your name, email, and name you wish to give to your model.

In [14]:
sofware_spec_uid = client.software_specifications.get_id_by_name("spark-mllib_2.4")
#client.software_specifications.list()

In [15]:
#client.software_specifications.get_id_by_name("spark-mllib_2.4") 
model_props = { 
        client.repository.ModelMetaNames.NAME: "Shopping Recommendation Engine", 
        client.repository.ModelMetaNames.TYPE: 'mllib_2.4', 
        client.repository.ModelMetaNames.SOFTWARE_SPEC_UID: sofware_spec_uid
}

published_model = client.repository.store_model(model=model, pipeline=pipeline, meta_props=model_props, training_data=train_data)

Display list of existing models in the Watson Machine Learning repository


In [16]:
#client.repository.list_models()

In [17]:
import json
saved_model_uid = client.repository.get_model_uid(published_model)
model_details = client.repository.get_details(saved_model_uid)
print(json.dumps(model_details, indent=2))

{
  "entity": {
    "label_column": "label",
    "pipeline": {
      "id": "7c1e9bd5-45b2-4028-bbae-46d464d4e525"
    },
    "software_spec": {
      "id": "390d21f8-e58b-4fac-9c55-d7ceda621326",
      "name": "spark-mllib_2.4"
    },
    "training_data_references": [
      {
        "connection": {
          "access_key_id": "not_applicable",
          "endpoint_url": "not_applicable",
          "secret_access_key": "not_applicable"
        },
        "id": "1",
        "location": {},
        "schema": {
          "fields": [
            {
              "metadata": {
                "modeling_role": "target"
              },
              "name": "label",
              "nullable": true,
              "type": "integer"
            },
            {
              "metadata": {},
              "name": "32",
              "nullable": true,
              "type": "double"
            },
            {
              "metadata": {},
              "name": "40",
              "nullable": true,
 

<a id="deploy"></a>
## 5. Deploy model to the IBM cloud

You use the Watson Machine Learning client's [Deployments class](http://wml-api-pyclient.mybluemix.net/#deployments) to deploy and score models.

### 5.1 Create an online deployment for the model

In [18]:
metadata = {
    client.deployments.ConfigurationMetaNames.NAME: "Procedure Recommendation Engine Deployment",
    client.deployments.ConfigurationMetaNames.ONLINE: {}
}

created_deployment = client.deployments.create(saved_model_uid, meta_props=metadata)



#######################################################################################

Synchronous deployment creation for uid: 'dd032404-ff78-4d01-8133-3f3d74c6adca' started

#######################################################################################


initializing
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='4f8f289f-1214-4640-8ad0-1cc16a6b2bea'
------------------------------------------------------------------------------------------------




### 5.2 Retrieve the scoring endpoint for this model

In [19]:
deployments_details = client.deployments.get_details('52d20564-b164-480f-9a93-090b88b99f0e')

In [20]:
scoring_endpoint = client.deployments.get_scoring_href(created_deployment) 
print(scoring_endpoint)

https://us-south.ml.cloud.ibm.com/ml/v4/deployments/4f8f289f-1214-4640-8ad0-1cc16a6b2bea/predictions


<a id="test_deploy"></a>
### 5.3 Test the deployed model

To verify that the model was successfully deployed to the cloud, you'll specify a customer ID, for example customer 14887, to predict this customer's cluster against the Watson Machine Learning deployment, and see if it matches the cluster that was previously associated this customer ID.

In [21]:
customer = help_cluster.collect()
print("Previously calculated cluster = {}".format(customer[0].cluster))

Previously calculated cluster = 1


To determine the customer's cluster using Watson Machine Learning, you need to load the customer's purchase history. This function uses the local data frame to select every product field and the number of times that customer 14887 purchased a product.

In [22]:
from six import iteritems
def get_procedure_counts_for_product(product_id):
        
    product = access_base.filter('produto_id = {}'.format(product_id)).take(1000000)       
    fields = [] 
    values = [] 
    for row in product:                
        for procedure_col in procedures_cols:
            field = procedure_col
            value = row[field]    
            fields.append(field) 
            values.append(value)    
    
    return (fields, values)

This function takes the customer's purchase history and calls the scoring endpoint:

In [23]:
client.deployments.ScoringMetaNames.show()
deployment_id = client.deployments.get_id(created_deployment)

---------------------  ----  --------  ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
META_PROP NAME         TYPE  REQUIRED  SCHEMA
NAME                   str   N
INPUT_DATA             list  N         [{'name(optional)': 'string', 'id(optional)': 'string', 'fields(optional)': 'array[string]', 'values': 'array[array[string]]'}]
INPUT_DATA_REFERENCES  list  N         [{'id(optional)': 'string', 'name(optional)': 'string', 'type(required)': 'string', 'connection(required)': {'href(required)': 'string'}, 'location(required)': {'bucket': 'string', 'path': 'string'}, 'schema(optional)': {'id(required)': 'string', 'fields(required)': [{'name(required)': 'string', 'type(required)': 's

In [24]:
def get_cluster_from_watson_ml(fields, values):
    scoring_payload = {client.deployments.ScoringMetaNames.INPUT_DATA:[{'fields': fields, 'values': [values]}]}
    predictions = client.deployments.score(deployment_id, scoring_payload)
    return predictions['predictions'][0]['fields'][len(procedures_cols)+1]    

Finally, call the functions defined above to get the product history, call the scoring endpoint, and get the cluster associated to customer 14887:

In [25]:
procedure_counts = get_procedure_counts_for_product(32)

fields = procedure_counts[0]
values = procedure_counts[1]
print("Cluster calculated by Watson ML = {}".format(get_cluster_from_watson_ml(fields, values)))



Cluster calculated by Watson ML = cluster


<a id="create_recomm"></a>
## 6. Create product recommendations

Now you can create some procedure recommendations.

First, run this cell to create a function that queries the database and finds the most popular items for a cluster. In this case, the **help_cluster** dataframe is the database.

In [26]:
def get_popular_procedures_in_cluster(cluster):            
    df_cluster_procedures = help_cluster.filter('cluster = {}'.format(cluster))            
    df_cluster_procedures_agg = df_cluster_procedures.groupby('cluster').sum()                
    row = df_cluster_procedures_agg.rdd.collect()[0]
    
    items = []
    for procedure_col in procedures_cols:
        field = 'sum({})'.format(procedure_col)        
        items.append((procedure_col, row[field]))
        sortedItems = sorted(items, key=lambda x: x[1], reverse=True) # Sort by score
        popular = [x for x in sortedItems if x[1] > 0]
        
    return popular

Now, run this cell to create a function that will calculate the recommendations based on a given cluster. This function finds the most popular procedure in the cluster, filters out procedure already purchased by the customer or currently in the customer's shopping cart, and finally produces a list of recommended procedure.

In [27]:
#from pyspark.sql.functions import desc
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
def get_recommendations_by_cluster(cluster, purchased_quantities):
    # Existing customer products    
    user_procedures = []
    for i in range(0, len(procedures_cols)):
        if purchased_quantities[i] > 0:            
            user_procedures.append((procedures_cols[i], purchased_quantities[i]))
            #Aqui ele pega o total de procedimentos que o usuário acessou e coloca dentro da var user_procedures
               
    df_user_procedures = sc.parallelize(user_procedures).toDF(["PROCEDURES","COUNT"])    
    cluster_procedures = get_popular_procedures_in_cluster(cluster)        
    df_cluster_procedures = sc.parallelize(cluster_procedures).toDF(["PROCEDURES","COUNT"])
              
    print('Clustering...')
    df_recommended_procedures = df_cluster_procedures.alias('cl').join(df_user_procedures.alias('cu'), df_cluster_procedures['PROCEDURES'] == df_user_procedures['PROCEDURES'], 'leftouter')            
    return df_recommended_procedures
    
    # AQUI NÂO VAI SER APLICAR, porque temos que recomandar algo que ele já tenha acessado também e difícilmente ele não irá acessar nada
    # Caso o usuário não tenha acessado nada devemos retornar os mais acessaos GERAL
    #print('RECOMMENDED PROCEDURES2:')        
    #df_recommended_procedures_not_accessed_yet = df_recommended_procedures.filter('cu.PROCEDURES IS NULL').select('cl.PROCEDURES','cl.COUNT').sort(desc('cl.COUNT'))
    #df_recommended_procedures.show(10)    

Next, run this cell to create a function that produces a list of recommended items based on the procedures and quantities in a user's cart. This function uses Watson Machine Learning to calculate the cluster based on the history cart contents and then calls the **get_recommendations_by_cluster** function.

In [28]:
# This function would be used to find recommendations based on the procedures and quantities in a user's cart
#ESSA FUNÇÃO NÃO TEM EFETIVIDADE PARA A HELP POR CONTA QUE AS RECOMENDAÇÕES SÃO COM BASE EM ACESSOS
def get_recommendations_for_history_cart(procedures, quantities):
    fields = []
    values = []
    for procedure_col in procedures_cols:
        field = '{}'.format(procedure_col)
        if procedure_col in procedures:
            value = quantities[procedures.index(procedure_col)]
        else:
            value = 0
        fields.append(field)
        values.append(value)
    return get_recommendations_by_cluster(get_cluster_from_watson_ml(fields, values), values)

Run this cell to create a function that produces a list of recommended items based on the purchase history of a customer. This function uses Watson Machine Learning to calculate the cluster based on the customer's purchase history and then calls the **get_recommendations_by_cluster** function.

In [29]:
# This function is used to find recommendations based on the purchase history of a customer
def get_recommendations_for_user_purchase_history(product_id):
    procedures_counts = get_procedure_counts_for_product(product_id)
    fields = procedures_counts[0]
    values = procedures_counts[1]
    return get_recommendations_by_cluster(get_cluster_from_watson_ml(fields, values), values)

Now you can take customer 12027 and produce a recommendation based on that customer's purchase history:

In [30]:
def get_procedures(recommendation):
    recommended_procedures = []        

    for index, row in enumerate(recommendation):                 
        recommended_procedures.append(row[0])
    
    return recommended_procedures

In [31]:
access_rows = access_base.take(10000000);

products_id=[32, 33]

#products_id=[45, 46, 47, 49, 51, 52, 53, 54, 55, 57, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 70, 71]
              


#for row in access_rows:
#    products_id.append(row['produto_id'])

In [None]:
import pandas as pd
products = []
procedures_by_product = pd.Series([])
product_df=[]
data = {'produto_id':  products_id}
product_df = pd.DataFrame (data, columns = ['produto_id'])

mergerd_recommended = []
for i in range(len(product_df)):
    for row in products_id:
        print(product_df['produto_id'][i], row)
        if(product_df['produto_id'][i] == row):
            procedures_recommended = get_recommendations_for_user_purchase_history(row)            
            mergerd_recommended.append(get_procedures(procedures_recommended.take(1000000)))            
            procedures_by_product[i] = mergerd_recommended

32 32
Clustering...
32 33
33 32
33 33
Clustering...


In [None]:
product_df.insert(1, "RECOMMENDED_PROCEDURES_ARRAY", procedures_by_product)

In [None]:
display(product_df)