# Association Rule Mining using EasyMiner API

This example demonstrates association rule mining through the comprehensive REST API of the EasyMiner data mining system.
<br /><br />
To use this example, you must have a working instance of EasyMiner. For testing purposes, you can use our demo server.

## Dataset IRIS

This example code is based on the IRIS dataset from the [UCI Repository](https://archive.ics.uci.edu/ml/datasets/iris). The file used in this example is located in the same folder as this notebook: [iris.csv](./iris.csv)
<br /><br />
The dataset contains the columns *sepallength*, *petalwidth*, *sepalwidth*, *petallength* and *class*. For rule mining as well as for building a classification model, the column *class* should be used in the consequent part of the rules, and the other columns should be used in the antecedent.

## 1. Setup variables, import dependencies

To run this example, you have to configure the following variables.

In [None]:
# Import required libraries
import requests
import json
import time
import urllib.parse

# Setup details about the used file
CSV_FILE = 'iris.csv'
CSV_SEPARATOR = ','
CSV_ENCODING = 'utf8'

To use the integrated data mining API provided by EasyMiner, you must have a user account on a running instance of EasyMiner. Please enter the URL of the API in the variable *API_URL*:

In [None]:
# Configure access variables
API_URL = 'https://br-dev.lmcloud.vse.cz/easyminercenter/api'

To work with EasyMiner, you must register a user account.
You can do this using either the GUI or the API.
<br />
If you already have an account, please enter your API key in the following variable *API_KEY*:

In [None]:
API_KEY = ''

If you do not have a user account yet,
you can register a new one using the following code:

In [None]:
if API_KEY == "":
    user_name = 'testuser' + str(time.time())
    user_email = user_name + "@domain.tld"
    user_password = user_name
    
    # JSON configuration of the API request
    json_data = json.dumps({"name": user_name, "email": user_email, "password": user_password})

    # Send the user registration request
    r = requests.post(API_URL + "/users?apiKey=" + API_KEY, headers = {'Content-Type': 'application/json', "Accept": "application/json"}, data = json_data.encode())
    
    # Get the API key of the newly registered user account
    API_KEY = r.json()["apiKey"]

Check the configuration with a simple API call:

In [None]:
# Check the functionality of the user account
r = requests.get(API_URL + "/auth?apiKey=" + API_KEY, headers={"Accept": "application/json"})

# Parse the response as JSON
auth_user = r.json()

# If everything works correctly, you should get the details of your account:
auth_user
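The calls in this notebook build their URLs by string concatenation, which leaves escaping of query parameters to the caller. As a minimal sketch, a small helper can do the escaping in one place. The name `build_url` and the `https://example.org/api` address are hypothetical and used only for illustration.

```python
from urllib.parse import urlencode


def build_url(base_url, path, **params):
    """Join a base URL and a path, then append URL-encoded query parameters.

    Hypothetical helper, not part of the EasyMiner API.
    """
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    if params:
        url += "?" + urlencode(params)
    return url


# Placeholder base URL and key, used only to show the escaping of ";"
print(build_url("https://example.org/api", "/datasources",
                separator=";", encoding="utf8", apiKey="KEY"))
```

Alternatively, `requests.get` and `requests.post` accept a `params` dictionary and encode it into the query string themselves.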

## 2. Upload CSV file to EasyMiner server (create datasource)

In [None]:
# HTTP request for uploading the CSV file
with open(CSV_FILE, 'rb') as csv_file:
    r = requests.post(API_URL + '/datasources?separator=' + urllib.parse.quote(CSV_SEPARATOR) + '&encoding=' + CSV_ENCODING + '&type=limited&apiKey=' + API_KEY, files = {"file": csv_file}, headers = {"Accept": "application/json"})

# Get the datasource ID (identifies the dataset on the EasyMiner server) from the server response
datasource_id = r.json()["id"]


For debugging purposes, print datasource_id. If the datasource was created successfully, datasource_id should be greater than 0.


In [None]:
datasource_id
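The cells in this notebook read `r.json()["id"]` without checking whether the request succeeded, so a failed upload shows up only as a `KeyError`. The following sketch shows one way to fail with a clearer message. The helper name `extract_id` is hypothetical, and treating any 2xx status code as success is an assumption about the API.

```python
def extract_id(status_code, payload, what="resource"):
    """Return payload["id"], or raise a descriptive error.

    Hypothetical helper; assumes that 2xx status codes mean success.
    """
    if not 200 <= status_code < 300:
        raise RuntimeError(
            "Creating the %s failed with HTTP status %d: %r"
            % (what, status_code, payload)
        )
    if "id" not in payload:
        raise RuntimeError("The response for the %s contains no 'id': %r"
                           % (what, payload))
    return payload["id"]


# Example with a hand-written response body (not real server output)
print(extract_id(201, {"id": 42}, what="datasource"))
```

With a real response, the call would be `extract_id(r.status_code, r.json(), what="datasource")`.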

## 3. Create miner

In [None]:
# Define a name for the miner (optional value that helps you find it in the list of miners)
miner_name = 'TEST MINER'

# JSON configuration of the API request (will be sent as body of the HTTP request)
json_data = json.dumps({"name": miner_name, "type": "cloud", "datasourceId": datasource_id})

# Send request for miner creation
r = requests.post(API_URL + "/miners?apiKey=" + API_KEY, headers = {'Content-Type': 'application/json', "Accept": "application/json"}, data = json_data.encode())

# Get the ID of the created miner (identifies the miner on the EasyMiner server)
miner_id = r.json()["id"]

For debugging purposes, print miner_id. If the miner was created successfully, miner_id should be greater than 0.


In [None]:
miner_id

## 4. Preprocess data 
It is not possible to use the data fields from the uploaded datasource directly in the definition of a data mining task. You have to generate an attribute from each data field you want to use.
<br /><br />
The simplest preprocessing method is to use the values of the data field "as they are" using the preprocessing method "each value - one bin".
<br /><br />
The uploaded data fields are identified by their names. Note that these names need not be exactly the same as in the uploaded file (for example, when the file contains duplicate column names). You should therefore get the list of data fields (columns) in the datasource:

In [None]:
# Request the list of columns (data fields) available in the existing datasource from EasyMiner
r = requests.get(API_URL + '/datasources/' + str(datasource_id) + '?apiKey=' + API_KEY, headers = {'Content-Type': 'application/json', "Accept": "application/json"})

# The response contains the properties of the datasource as well as the list of columns. Get only the columns...
datasource_columns = r.json()['column']


Check the list of columns:

In [None]:
datasource_columns

### 4.a Construction of preprocessing requests: simple usage of the original data values

If you want to preprocess all the columns from the datasource using the method "each value - one bin", you can simply use the following code:

In [None]:
# Define a variable for collecting the prepared attributes (column name -> attribute name)
attributes_columns_map = {}

# Process all the columns...
for col in datasource_columns:
    # You can work with the column name or the column ID. Both these values are parsed from the previous JSON response.
    column_name = col['name']
    
    # You have to select a name for the new attribute; here the column name is reused
    attribute_name = column_name
    
    # Construct the definition of the preprocessing request;
    # to identify the column from the datasource, you can use its ID (set it to property "column") or its name (set it to property "columnName")
    json_data = json.dumps({"miner": miner_id, "name": attribute_name, "columnName": column_name, "specialPreprocessing": "eachOne"})
    
    # Send the request and wait for the response;
    # depending on the size of the datasource, it can take a while...
    r = requests.post(API_URL + "/attributes?apiKey=" + API_KEY, headers = {'Content-Type': 'application/json', "Accept": "application/json"}, data = json_data.encode())
    if r.status_code != 201:
        break  # error occurred - the preprocessing of the selected attribute failed
    attributes_columns_map[column_name] = r.json()['name']


The list of prepared attributes is:

In [None]:
attributes_columns_map
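The preprocessing loop stops silently at the first failed request, so the map may be incomplete. As a small sketch, the following hypothetical helper `missing_attributes` lists the columns for which no attribute was created. It is plain Python and makes no assumptions about the API.

```python
def missing_attributes(expected_columns, attributes_map):
    """Return the sorted names of columns that have no prepared attribute."""
    return sorted(set(expected_columns) - set(attributes_map))


# Example with hand-written values: the attribute for "class" is missing
expected = ["sepallength", "sepalwidth", "petallength", "petalwidth", "class"]
prepared = {"sepallength": "sepallength", "sepalwidth": "sepalwidth",
            "petallength": "petallength", "petalwidth": "petalwidth"}
print(missing_attributes(expected, prepared))
```

In the notebook, the expected names could be taken from the column list, for example `[col['name'] for col in datasource_columns]`.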

### 4.b Other preprocessing methods

The new version of the EasyMiner data mining system also supports all
standard preprocessing methods for preparing attributes from data fields (datasource columns):
 - *equidistant intervals* - group numerical values into intervals of a given length or into a defined number of intervals
 - *equifrequent intervals* - group numerical values into a given number of intervals with almost the same frequencies of values in the datasource
 - *equisized intervals* - group numerical values into intervals with a requested minimal support
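To make the difference between the first two methods concrete, here is a simplified local illustration in plain Python. It is not the EasyMiner implementation and does not call the API. The function names are hypothetical, and the equifrequent variant ignores details such as keeping equal values in the same bin.

```python
def equidistant_edges(values, bin_count):
    """Return the edges of bin_count intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bin_count
    return [lo + i * width for i in range(bin_count + 1)]


def equifrequent_groups(values, bin_count):
    """Split the sorted values into bin_count groups of nearly equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // bin_count:(i + 1) * n // bin_count]
            for i in range(bin_count)]


# Hand-written sample values, skewed towards the lower end
sample = [1.0, 1.5, 2.0, 2.5, 3.0, 9.0]
print(equidistant_edges(sample, 2))    # equal-width intervals
print(equifrequent_groups(sample, 2))  # equal-count groups
```

With skewed data such as the sample above, equal-width intervals leave most values in the first interval, while the equal-count split balances the groups.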

## 4. Define association rule mining task

Define attributes for the antecedent and consequent parts of association rules. Each attribute can be configured either to appear with any value or to be constrained to one fixed value.
<br /><br />
This step also entails defining threshold values for the interest measures (confidence, support, lift). Optionally, you can also enable CBA pruning of the results.
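For a rule "antecedent → consequent", the three interest measures are commonly defined as follows: support is the share of rows matching both parts, confidence is the share of rows matching the antecedent that also match the consequent, and lift is confidence divided by the share of rows matching the consequent. The sketch below computes them on a tiny hand-written table. The function name and the sample rows are illustrative, and the exact definitions used by EasyMiner should be checked in its documentation.

```python
def rule_measures(rows, antecedent, consequent):
    """Return (support, confidence, lift) of a rule over a list of dict rows."""
    def matches(row, condition):
        return all(row.get(key) == value for key, value in condition.items())

    n = len(rows)
    n_ant = sum(1 for row in rows if matches(row, antecedent))
    n_con = sum(1 for row in rows if matches(row, consequent))
    n_both = sum(1 for row in rows
                 if matches(row, antecedent) and matches(row, consequent))

    support = n_both / n
    confidence = n_both / n_ant if n_ant else 0.0
    lift = confidence / (n_con / n) if n_con else 0.0
    return support, confidence, lift


# Hand-written toy rows (not the real IRIS data)
rows = [
    {"petalwidth": "0.2", "class": "Iris-setosa"},
    {"petalwidth": "0.2", "class": "Iris-setosa"},
    {"petalwidth": "1.3", "class": "Iris-versicolor"},
    {"petalwidth": "0.2", "class": "Iris-versicolor"},
]
print(rule_measures(rows, {"petalwidth": "0.2"}, {"class": "Iris-setosa"}))
```

In this toy table the rule matches 2 of 4 rows (support 0.5) and holds for 2 of the 3 rows with `petalwidth` 0.2 (confidence 2/3).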

In [None]:
# Define pattern of association rules you are interested in
antecedent = [
    {"attribute" : attributes_columns_map["sepallength"]},
    {"attribute" : attributes_columns_map["petalwidth"]},
    {"attribute" : attributes_columns_map["sepalwidth"]},
    {"attribute" : attributes_columns_map["petallength"]}
]
consequent = [
    {
        "attribute" : attributes_columns_map["class"],
        # Optionally, you can also select only one value of the given attribute 
        # - uncommenting and editing the following line.
        # The same option works also for attributes in antecedent.
        # "fixedValue" : "Iris-setosa"
    }
]

# Define the requested interest measures:
# - the following definition requests a minimal confidence of 0.5 and a minimal support of 0.1;
# - with the same structure, you can also add the interest measure "LIFT"
interest_measures = [
    {
        "name": "CONF", # confidence
        "value": 0.5
    },
    {
        "name": "SUPP", # support
        "value": 0.1
    }
]

# Define the name of the prepared task (for better identification of results)
task_name = "Test task"

# Define the maximum count of results
max_rules_count = 1000

# Compose the body of the task definition request
json_data = json.dumps({
    "miner": miner_id,
    "name": task_name,
    "limitHits": max_rules_count,
    "IMs": interest_measures,
    "antecedent": antecedent,
    "consequent": consequent
})

# Send the request for simple task creation
r = requests.post(API_URL + "/tasks/simple?apiKey=" + API_KEY, headers = {'Content-Type': 'application/json', "Accept": "application/json"}, data = json_data.encode())

# Get the ID of the created task
task_id = r.json()["id"]


The task ID is:

In [None]:
task_id
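When a dataset has many columns, listing every antecedent attribute by hand becomes tedious. As a sketch, the request body can be derived from the attribute map, with every attribute except the target placed in the antecedent. The function `build_task_body` is a hypothetical helper. The field names (`miner`, `limitHits`, `IMs`, `antecedent`, `consequent`) are the ones used in the cell above.

```python
import json


def build_task_body(miner_id, task_name, attributes_map, target_column,
                    min_confidence=0.5, min_support=0.1, limit_hits=1000):
    """Build the body of a simple task request from a column -> attribute map."""
    antecedent = [{"attribute": attribute}
                  for column, attribute in attributes_map.items()
                  if column != target_column]
    consequent = [{"attribute": attributes_map[target_column]}]
    return {
        "miner": miner_id,
        "name": task_name,
        "limitHits": limit_hits,
        "IMs": [
            {"name": "CONF", "value": min_confidence},
            {"name": "SUPP", "value": min_support},
        ],
        "antecedent": antecedent,
        "consequent": consequent,
    }


# Example with hand-written values (miner ID 1 is a placeholder)
example_map = {"sepallength": "sepallength", "petalwidth": "petalwidth",
               "class": "class"}
print(json.dumps(build_task_body(1, "Test task", example_map, "class"), indent=2))
```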

## 5. Execute the mining task

Everything is prepared! You can execute the task and then work with the results...

In [None]:
# Send the request
r = requests.get(API_URL + "/tasks/" + str(task_id) + "/start?apiKey=" + API_KEY, headers = {'Content-Type': 'application/json', "Accept": "application/json"})

# Wait for the result (depending on the task definition and the size of the analyzed data, it can take a long time)
while True:
    time.sleep(1)
    # Check the task state
    r = requests.get(API_URL + "/tasks/" + str(task_id) + "/state?apiKey=" + API_KEY, headers = {'Content-Type': 'application/json', "Accept": "application/json"})
    task_state = r.json()["state"]
    print("task_state:" + task_state)
    if task_state == "solved":
        break
    if task_state == "failed":
        print("task failed executing")
        break
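The loop above runs until the state is "solved" or "failed", so it never ends if the server keeps reporting some other state. A possible variant adds a timeout. In this sketch the state lookup is passed in as a function so that the logic can be tried without a server. The name `wait_for_task` and the state "in_progress" in the example are assumptions made for illustration.

```python
import time


def wait_for_task(fetch_state, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Poll fetch_state() until it returns "solved" or "failed".

    Raises TimeoutError if no final state is reached within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        if state in ("solved", "failed"):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("Task did not finish, last state: %s" % state)
        sleep(interval)


# Example with a fake sequence of states instead of real API calls
fake_states = iter(["in_progress", "in_progress", "solved"])
print(wait_for_task(lambda: next(fake_states), interval=0.0))
```

In the notebook, `fetch_state` would wrap the `/state` request from the cell above and return `r.json()["state"]`.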


## 6. Export the task results

The task results can be exported in the following formats: 
 - simple JSON format (suitable for simple reading of results etc.)
 - standardized PMML Association Model (we recommend the version 4.2)
 - GUHA PMML

### 6.a Simple export of association rules as JSON

In [None]:
# Send the export request
r = requests.get(API_URL + '/tasks/' + str(task_id) + '/rules?apiKey=' + API_KEY, headers = {"Accept": "application/json"})

# Parse the response as JSON
task_rules = r.json()

# and then work with the results...

The results in JSON are: 

In [None]:
task_rules

### 6.b Export results as PMML

In [None]:
# Select the PMML format - possible values are "guha", "associationmodel", "associationmodel-4.2"
pmml_format = "associationmodel" 

# Send the PMML export request
r = requests.get(API_URL + '/tasks/' + str(task_id) + '/pmml?model=' + pmml_format + '&apiKey=' + API_KEY)

# Get the response as text (and then parse it as XML etc.)
pmml = r.text
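The exported text can be parsed with the standard library. The sketch below reads rules from a small hand-written document that follows the general shape of a PMML 4.2 `AssociationModel` (`Item`, `Itemset` and `AssociationRule` elements). The sample is not real EasyMiner output. The actual export may contain more elements or differ in detail, and the "guha" format uses a different structure.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, not real EasyMiner output
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <AssociationModel functionName="associationRules">
    <Item id="1" value="petalwidth=0.2"/>
    <Item id="2" value="class=Iris-setosa"/>
    <Itemset id="1"><ItemRef itemRef="1"/></Itemset>
    <Itemset id="2"><ItemRef itemRef="2"/></Itemset>
    <AssociationRule antecedent="1" consequent="2" support="0.19" confidence="1.0"/>
  </AssociationModel>
</PMML>"""


def local_name(tag):
    """Strip the XML namespace from a tag such as '{namespace}Item'."""
    return tag.rsplit("}", 1)[-1]


def read_rules(pmml_text):
    """Return the association rules of a PMML document as a list of dicts."""
    root = ET.fromstring(pmml_text)
    items, itemsets = {}, {}
    for element in root.iter():
        name = local_name(element.tag)
        if name == "Item":
            items[element.get("id")] = element.get("value")
        elif name == "Itemset":
            itemsets[element.get("id")] = [
                child.get("itemRef") for child in element
                if local_name(child.tag) == "ItemRef"
            ]
    rules = []
    for element in root.iter():
        if local_name(element.tag) == "AssociationRule":
            rules.append({
                "antecedent": [items[i] for i in itemsets[element.get("antecedent")]],
                "consequent": [items[i] for i in itemsets[element.get("consequent")]],
                "support": float(element.get("support")),
                "confidence": float(element.get("confidence")),
            })
    return rules


print(read_rules(SAMPLE_PMML))
```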


## 7. Definition of another data mining task

Once you have prepared the attributes for mining association rules,
you can reuse them to define and solve further data mining tasks.
<br />
The following lines demonstrate the definition and execution of another task.
They are only a modification of the previous sections 4 to 6.

In [None]:
# Define pattern of association rules you are interested in
antecedent = [
    {"attribute" : attributes_columns_map["sepallength"]},
    {"attribute" : attributes_columns_map["petalwidth"]},
    {"attribute" : attributes_columns_map["sepalwidth"]},
    {"attribute" : attributes_columns_map["petallength"]}
]
consequent = [
    {
        "attribute" : attributes_columns_map["class"]
    }
]

# Define the requested interest measures:
# - the following definition requests a minimal confidence of 0.3 and a minimal support of 0.01;
# - with the same structure, you can also add the interest measure "LIFT"
interest_measures = [
    {
        "name": "CONF", # confidence
        "value": 0.3
    },
    {
        "name": "SUPP", # support
        "value": 0.01
    }
]

special_interest_measures = [
    {
        "name": "CBA" # request rule pruning using rCBA
    }
]

# Define the name of the prepared task (for better identification of results)
task_name = "Test task - classification"

# Define the maximum count of results
max_rules_count = 1000

# Compose the body of the task definition request
json_data = json.dumps({
    "miner": miner_id,
    "name": task_name,
    "limitHits": max_rules_count,
    "IMs": interest_measures,
    "specialIMs": special_interest_measures,
    "antecedent": antecedent,
    "consequent": consequent
})

# Send the request for simple task creation
r = requests.post(API_URL + "/tasks/simple?apiKey=" + API_KEY, headers = {'Content-Type': 'application/json', "Accept": "application/json"}, data = json_data.encode())

# Get the ID of the created task
task2_id = r.json()["id"]

## 8. Execute the mining task

In [None]:
# Send the request
r = requests.get(API_URL + "/tasks/" + str(task2_id) + "/start?apiKey=" + API_KEY, headers = {'Content-Type': 'application/json', "Accept": "application/json"})

# Wait for the result (depending on the task definition and the size of the analyzed data, it can take a long time)
while True:
    time.sleep(1)
    # Check the task state
    r = requests.get(API_URL + "/tasks/" + str(task2_id) + "/state?apiKey=" + API_KEY, headers = {'Content-Type': 'application/json', "Accept": "application/json"})
    task_state = r.json()["state"]
    print("task_state:" + task_state)
    if task_state == "solved":
        break
    if task_state == "failed":
        print("task failed executing")
        break


## 9. Export the task results

In [None]:
# Send the export request
r = requests.get(API_URL + '/tasks/' + str(task2_id) + '/rules?apiKey=' + API_KEY, headers = {"Accept": "application/json"})

# Parse the response as JSON
task2_rules = r.json()

# and then work with the results...
task2_rules
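A rule set pruned for classification is typically applied in the CBA manner: the rules are ordered by confidence and then by support, the first rule whose antecedent matches the instance gives the predicted class, and a default class is used when no rule matches. The following sketch illustrates this idea only. The rule structure (`antecedent`, `class`, `confidence`, `support`) is hypothetical and does not mirror the JSON returned by EasyMiner, and the code is not the rCBA implementation.

```python
def classify(instance, rules, default_class):
    """Predict a class using the first matching rule in CBA order."""
    ordered = sorted(rules,
                     key=lambda rule: (-rule["confidence"], -rule["support"]))
    for rule in ordered:
        if all(instance.get(key) == value
               for key, value in rule["antecedent"].items()):
            return rule["class"]
    return default_class


# Hand-written rules for illustration (not mined results)
example_rules = [
    {"antecedent": {"petalwidth": "1.3"}, "class": "Iris-versicolor",
     "confidence": 0.9, "support": 0.1},
    {"antecedent": {"petalwidth": "0.2"}, "class": "Iris-setosa",
     "confidence": 1.0, "support": 0.2},
]
print(classify({"petalwidth": "0.2", "sepalwidth": "3.5"},
               example_rules, "Iris-virginica"))
```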