## Quantum Feature Selection for ML

A hybrid quantum computing-based approach for optimal feature selection in machine learning.

Quantum Feature Selection is a hybrid quantum computing approach to optimizing feature selection in artificial intelligence/machine learning (AI/ML) model training and prediction. The solution treats feature selection as an optimization problem: it selects the most critical variables and eliminates redundant and irrelevant ones. This increases the predictive power of machine learning applications, decreases overfitting, and reduces training time.

This sample notebook shows you how to use the Quantum Feature Selection algorithm from AWS Marketplace.

> **Note**: This is a reference notebook, and it cannot run unless you make the changes suggested in the notebook.

#### Pre-requisites:
1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role used has **AmazonSageMakerFullAccess**.
1. Some hands-on experience using [Amazon SageMaker](https://aws.amazon.com/sagemaker/).
1. To use this algorithm successfully, ensure that:
    1. Either your IAM role has these three permissions and you have authority to make AWS Marketplace subscriptions in the AWS account used: 
        1. **aws-marketplace:ViewSubscriptions**
        1. **aws-marketplace:Unsubscribe**
        1. **aws-marketplace:Subscribe**  
    2. or your AWS account has a subscription to **Quantum Feature Selection for Machine Learning**.

#### Contents:
1. [Subscribe to the algorithm](#1.-Subscribe-to-the-algorithm)
1. [Prepare dataset](#2.-Prepare-dataset)
	1. [Dataset format expected by the algorithm](#A.-Dataset-format-expected-by-the-algorithm)
	1. [Configure dataset](#B.-Configure-dataset)
	1. [Upload datasets to Amazon S3](#C.-Upload-datasets-to-Amazon-S3)
1. [Execute the training process](#3.-Execute-the-training-process)
	1. [Set up environment](#3.1-Set-up-environment)
	1. [Execute model](#3.2-Execute-model)
	1. [Inspect the Output in S3](#3.3-Inspect-the-Output-in-S3)
1. [Clean-up](#4.-Clean-up)
	1. [Unsubscribe to the listing (optional)](#Unsubscribe-to-the-listing-(optional))


#### Usage instructions
You can run this notebook one cell at a time (press Shift+Enter to run a cell).

### 1. Subscribe to the algorithm

To subscribe to the algorithm:
1. Open the algorithm listing page **Quantum Feature Selection for Machine Learning**.
1. On the AWS Marketplace listing, click the **Continue to subscribe** button.
1. On the **Subscribe to this software** page, review the terms and click **"Accept Offer"** if you agree with the EULA, pricing, and support terms.
1. Once you click the **Continue to configuration** button and choose a **region**, you will see a **Product Arn**. This is the algorithm ARN that you need to specify when training a custom ML model. Copy the ARN corresponding to your region and specify it in the following cell.

In [1]:
algo_arn = ""
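
Because the cell above starts with an empty string, it can help to check the pasted value before launching a job. The following is a minimal sketch, assuming the usual `arn:<partition>:sagemaker:<region>:<account>:algorithm/<name>` shape; the helper name `check_algorithm_arn` is hypothetical and is not part of the SageMaker SDK.

```python
import re

# Assumed general shape of a SageMaker algorithm ARN:
# arn:<partition>:sagemaker:<region>:<12-digit account>:algorithm/<name>
_ALGO_ARN = re.compile(
    r"^arn:(?P<partition>aws[a-z-]*):sagemaker:(?P<region>[a-z0-9-]+):"
    r"(?P<account>\d{12}):algorithm/(?P<name>\S+)$"
)


def check_algorithm_arn(arn, expected_region=None):
    """Return the parsed ARN fields, or raise ValueError with a readable reason."""
    match = _ALGO_ARN.match(arn or "")
    if match is None:
        raise ValueError("algo_arn is empty or is not a SageMaker algorithm ARN")
    fields = match.groupdict()
    if expected_region and fields["region"] != expected_region:
        raise ValueError(
            f"ARN is for {fields['region']}, but the session uses {expected_region}"
        )
    return fields
```

In the notebook this could be called as `check_algorithm_arn(algo_arn, expected_region=boto3.Session().region_name)` so that a region mismatch surfaces before the training job is created.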

### 2. Prepare dataset

In [2]:
import os
import json 
import uuid
import boto3
import pickle
import base64
import tarfile
from pprint import pprint

import numpy as np
import pandas as pd

import urllib.request
from urllib.parse import urlparse

import sagemaker as sage
from sagemaker import ModelPackage
from sagemaker import get_execution_role

#### A. Dataset format expected by the algorithm

For best results, the algorithm requires data in the following format:
* The input file name should be input.zip.
* The archive must contain a CSV file and a JSON file, named input.csv and input_config.json respectively.
* For detailed instructions, refer to the sample input and the algorithm input details.

#### B. Configure dataset

Instructions

    Supported content type: a 'zip' file only, named 'input.zip'. The zip file contains two files with the following names and contents.

    a. 'input.csv': This CSV file contains the features as 'feature_0', 'feature_1', up to 'feature_N', along with the target column 'Class'. The algorithm reports the selected features by these names.

    b. 'input_config.json': This JSON file contains the algorithm configuration, including the D-Wave credentials and the dataset field descriptions.

    Mandatory fields:

    a. 'input_config.json': dwave_sapi_token, target_variable, discrete_features, number_of_features_to_be_selected, alpha, number_of_runs.

    Input field descriptions:

    a. 'dwave_sapi_token': The user's secure API token for the D-Wave Leap quantum cloud service. For example, 'dwave_sapi_token' = 'DEV-****'. This API token is provided after registering for and subscribing to the D-Wave Leap cloud service.

    b. 'target_variable': The name of the target variable as it appears in the input.csv file. For example, "Class" is the target variable name in the sample input file.

    c. 'discrete_features': The list of names of all discrete features/variables in the dataset (the input.csv file), including both independent and dependent variables. For example, "Class" is the only discrete variable in the sample input file.

    d. 'number_of_features_to_be_selected': The number of features to be selected.

    e. 'alpha': A hyperparameter that balances relevancy against redundancy in the dataset. The higher the value, the more the algorithm focuses on maximizing relevancy and the less it focuses on minimizing redundancy. For example, alpha = 0.5 gives equal weight to both objectives.

    f. 'number_of_runs': The number of runs the D-Wave solver should perform to obtain the desired results (numeric).
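
To make the role of 'alpha' concrete, here is a toy sketch of a relevancy/redundancy trade-off. It assumes a common objective, `-alpha * relevance + (1 - alpha) * redundancy`, and solves it by brute force on a classical machine; the listing's actual formulation and the D-Wave solver are not shown in this notebook, so treat the objective, the function name `select_features`, and all scores below as illustrative assumptions.

```python
from itertools import combinations


def select_features(relevance, redundancy, k, alpha):
    """Brute-force the k-feature subset that minimizes
    -alpha * (sum of relevance) + (1 - alpha) * (sum of pairwise redundancy)."""
    names = sorted(relevance)
    best_subset, best_cost = None, float("inf")
    for subset in combinations(names, k):
        rel = sum(relevance[f] for f in subset)
        red = sum(
            redundancy.get(frozenset(pair), 0.0)
            for pair in combinations(subset, 2)
        )
        cost = -alpha * rel + (1.0 - alpha) * red
        if cost < best_cost:
            best_subset, best_cost = subset, cost
    return list(best_subset)


# Hypothetical scores: feature_0 and feature_1 are both relevant but near-duplicates.
relevance = {"feature_0": 0.9, "feature_1": 0.85, "feature_2": 0.5}
redundancy = {
    frozenset(("feature_0", "feature_1")): 0.95,
    frozenset(("feature_0", "feature_2")): 0.10,
    frozenset(("feature_1", "feature_2")): 0.10,
}

balanced = select_features(relevance, redundancy, k=2, alpha=0.5)
relevance_heavy = select_features(relevance, redundancy, k=2, alpha=0.95)
print(balanced)
print(relevance_heavy)
```

With these made-up scores, the balanced setting avoids pairing the two near-duplicate features, while the relevance-heavy setting accepts the redundancy in exchange for the two highest relevance scores.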

In [6]:
training_dataset="input/input.zip"
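
The archive referenced above could be assembled along the following lines. This is a sketch under stated assumptions: the two files are placed at the root of the zip, the JSON value types (list, integer, float) are guesses based on the field descriptions, and every value in `example_config` is a placeholder. The helper `build_input_zip` is hypothetical and not part of the listing.

```python
import csv
import io
import json
import zipfile

# Field names taken from the "Mandatory fields" list above.
MANDATORY_FIELDS = (
    "dwave_sapi_token", "target_variable", "discrete_features",
    "number_of_features_to_be_selected", "alpha", "number_of_runs",
)


def build_input_zip(zip_path, header, rows, config):
    """Write input.csv and input_config.json into a zip archive at zip_path."""
    missing = [name for name in MANDATORY_FIELDS if name not in config]
    if missing:
        raise ValueError(f"input_config.json is missing: {missing}")
    if config["target_variable"] not in header:
        raise ValueError("target_variable is not a column of input.csv")

    # Render the CSV in memory so nothing but the zip touches the disk.
    csv_buffer = io.StringIO()
    writer = csv.writer(csv_buffer)
    writer.writerow(header)
    writer.writerows(rows)

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("input.csv", csv_buffer.getvalue())
        archive.writestr("input_config.json", json.dumps(config, indent=2))


# Placeholder values only; replace them with your own data and token.
example_config = {
    "dwave_sapi_token": "DEV-****",
    "target_variable": "Class",
    "discrete_features": ["Class"],
    "number_of_features_to_be_selected": 2,
    "alpha": 0.5,
    "number_of_runs": 1,
}
example_header = ["feature_0", "feature_1", "feature_2", "Class"]
example_rows = [[0.1, 1.2, 3.4, 0], [0.5, 0.9, 2.8, 1]]
# build_input_zip("input/input.zip", example_header, example_rows, example_config)
```

Validating the mandatory fields locally is cheap compared with discovering a missing key only after a training instance has been provisioned.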

#### C. Upload datasets to Amazon S3

In [7]:
role = get_execution_role()

In [6]:
sagemaker_session = sage.Session()

bucket = sagemaker_session.default_bucket()
bucket

In [None]:
# training input location
common_prefix = "qfs"
training_input_prefix = common_prefix + "/training-input-data"
TRAINING_WORKDIR = "input" #Input directory in Jupyter Server
training_input = sagemaker_session.upload_data(TRAINING_WORKDIR, key_prefix=training_input_prefix) #uploads data from jupyter server to S3
print("Training input uploaded to " + training_input)

## 3. Execute the training process

Now that the dataset is available in an accessible Amazon S3 bucket, we are ready to execute a training job that selects the optimal features using the Quantum Feature Selection algorithm.

### 3.1 Set up environment

In [10]:
output_location = 's3://{}/{}/{}'.format(bucket, common_prefix,'Output')

In [None]:
output_location

### 3.2 Execute model

For information on creating an `Estimator` object, see [documentation](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html)

In [12]:
training_instance_type="ml.m5.4xlarge"

In [13]:
# Create an estimator object for running a training job.
# instance_count/instance_type are the SageMaker Python SDK v2 names; the
# deprecated train_instance_count/train_instance_type duplicates are dropped.
estimator = sage.algorithm.AlgorithmEstimator(
    algorithm_arn=algo_arn,
    base_job_name="qfs",
    role=role,
    input_mode="File",
    output_path=output_location,
    sagemaker_session=sagemaker_session,
    instance_count=1,
    instance_type=training_instance_type
)

#Run the training job.
estimator.fit({"training": training_input})

2024-01-05 08:28:40 Starting - Starting the training job...
2024-01-05 08:29:03 Starting - Preparing the instances for trainingProfilerReport-1704443320: InProgress
......
2024-01-05 08:30:04 Downloading - Downloading input data...
time taken for solver:  2.0213379859924316
training complete

2024-01-05 08:32:31 Uploading - Uploading generated training model
2024-01-05 08:32:31 Completed - Training job completed
Training seconds: 157
Billable seconds: 157


See this [blog post](https://aws.amazon.com/blogs/machine-learning/easily-monitor-and-visualize-metrics-while-training-models-on-amazon-sagemaker/) for more information on how to visualize metrics during training. You can also open the training job from the [Amazon SageMaker console](https://console.aws.amazon.com/sagemaker/home?#/jobs/) and monitor the metrics and logs in the **Monitor** section.
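
The same status and timing information can also be read programmatically. The sketch below wraps the boto3 `describe_training_job` call; the helper name `summarize_training_job` and the choice of fields are illustrative assumptions, not part of the notebook.

```python
def summarize_training_job(sagemaker_client, job_name):
    """Return status and timing fields for a training job.

    sagemaker_client is a boto3 SageMaker client (or any object exposing
    describe_training_job with the same keyword argument).
    """
    description = sagemaker_client.describe_training_job(TrainingJobName=job_name)
    return {
        "status": description["TrainingJobStatus"],
        "secondary_status": description.get("SecondaryStatus"),
        "training_seconds": description.get("TrainingTimeInSeconds"),
        "billable_seconds": description.get("BillableTimeInSeconds"),
        "failure_reason": description.get("FailureReason"),
    }


# In the notebook (assumes the `estimator` and `sagemaker_session` defined above):
# summary = summarize_training_job(
#     sagemaker_session.sagemaker_client,
#     estimator.latest_training_job.job_name,
# )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub when no AWS credentials are available.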

In [None]:
# The output is available at the following path
estimator.output_path

> **Note**: Inference is performed within the training pipeline. A real-time inference endpoint or batch transform job is not required.

### 3.3 Inspect the Output in S3

In [15]:
# Locate the model artifact written by the training job
parsed_url = urlparse(estimator.output_path)
bucket_name = parsed_url.netloc
file_key = parsed_url.path[1:]+'/'+estimator.latest_training_job.job_name+'/output/'+"model.tar.gz"

# Request the artifact from the bucket named in the output path; this raises if the key is missing
s3_client = sagemaker_session.boto_session.client('s3')
response = s3_client.get_object(Bucket = bucket_name, Key = file_key)

In [16]:
# Same object key as file_key above; reusing it avoids positional splitting of the output path
bucketFolder = file_key
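
The key construction in the two cells above can be folded into one small helper. This sketch relies on the `<output_path>/<job_name>/output/model.tar.gz` layout that those cells already use; the function name `model_artifact_location` is hypothetical.

```python
from urllib.parse import urlparse


def model_artifact_location(output_path, job_name):
    """Split an s3:// output path and return (bucket, key) of the job's model.tar.gz."""
    parsed = urlparse(output_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {output_path!r}")
    prefix = parsed.path.strip("/")
    # Skip an empty prefix so a bare "s3://bucket" does not yield a leading slash.
    parts = [prefix, job_name, "output", "model.tar.gz"]
    key = "/".join(part for part in parts if part)
    return parsed.netloc, key
```

For example, `model_artifact_location(estimator.output_path, estimator.latest_training_job.job_name)` would return the bucket and key that the download cell below expects.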

In [None]:
s3_conn = boto3.client("s3")
bucket_name=bucket
with open('output.tar.gz', 'wb') as f:
    s3_conn.download_fileobj(bucket_name, bucketFolder, f)
    print("Output file loaded from bucket")

In [19]:
with tarfile.open('output.tar.gz') as file:
    file.extractall('./output')
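
When an archive comes from a source you do not fully control, a guarded extraction is a reasonable precaution. The following is a minimal sketch, assuming that rejecting links and any member resolving outside the destination is acceptable for this output; `safe_extract` is a hypothetical helper, not part of `tarfile`.

```python
import os
import tarfile


def safe_extract(archive_path, destination):
    """Extract a tar archive, refusing members that would land outside destination."""
    root = os.path.realpath(destination)
    with tarfile.open(archive_path) as archive:
        for member in archive.getmembers():
            target = os.path.realpath(os.path.join(root, member.name))
            # A member such as "../x" resolves outside root and is rejected.
            if os.path.commonpath([root, target]) != root:
                raise ValueError(f"unsafe path in archive: {member.name}")
            if member.issym() or member.islnk():
                raise ValueError(f"links are not extracted: {member.name}")
        archive.extractall(root)
    return root
```

It can stand in for the cell above as `safe_extract('output.tar.gz', './output')`.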

In [22]:
import json

with open("output/output/output.json","r") as op:
    output= json.load(op)

print("Result:")
output
    

Result


{'Optimial_selected_features': ['feature_3',
  'feature_16',
  'feature_18',
  'feature_19',
  'feature_26',
  'feature_28',
  'feature_29',
  'feature_31',
  'feature_35',
  'feature_40',
  'feature_44',
  'feature_52',
  'feature_64',
  'feature_51',
  'feature_53',
  'feature_58',
  'feature_73',
  'feature_80',
  'feature_66',
  'feature_76']}
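
A typical next step is to reduce the original dataset to the selected columns before training a downstream model. The sketch below does this with the standard library only; the helper name `keep_selected_columns` and the file paths in the usage note are assumptions for illustration.

```python
import csv


def keep_selected_columns(src_csv, dst_csv, selected_features, target_variable):
    """Copy src_csv to dst_csv, keeping only the selected features and the target."""
    wanted = list(selected_features) + [target_variable]
    with open(src_csv, newline="") as src, open(dst_csv, "w", newline="") as dst:
        reader = csv.DictReader(src)
        missing = [name for name in wanted if name not in (reader.fieldnames or [])]
        if missing:
            raise KeyError(f"columns not found in {src_csv}: {missing}")
        # extrasaction="ignore" drops every column that is not in `wanted`.
        writer = csv.DictWriter(dst, fieldnames=wanted, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
    return wanted
```

With the result loaded above, a call might look like `keep_selected_columns("input.csv", "reduced.csv", output["Optimial_selected_features"], "Class")`, where the key is spelled exactly as it appears in the algorithm's output.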

### 4. Clean-up

#### Unsubscribe to the listing (optional)

If you would like to unsubscribe from the algorithm, follow these steps. Before you cancel the subscription, ensure that you do not have any [deployable model](https://console.aws.amazon.com/sagemaker/home#/models) created from the model package or using the algorithm. Note: you can find this information by looking at the container name associated with the model.

**Steps to unsubscribe from the product on AWS Marketplace**:
1. Navigate to the __Machine Learning__ tab on [__Your Software subscriptions page__](https://aws.amazon.com/marketplace/ai/library?productType=ml&ref_=mlmp_gitdemo_indust).
2. Locate the listing whose subscription you want to cancel, and then choose __Cancel Subscription__.