Update readme #24
directly on a D-Wave quantum computer's quantum processing unit (QPU).

---

## Installation

You can run this example without installation in cloud-based IDEs that support
the [Development Containers specification](https://containers.dev/supporting)
(aka "devcontainers").

For development environments that do not support `devcontainers`, install the
requirements:

pip install -r requirements.txt

If you are cloning the repo to your local system, working in a
[virtual environment](https://docs.python.org/3/library/venv.html) is
recommended.

## Usage

Run `python app.py` and open http://127.0.0.1:8050/ in your browser. A
features and redundancy penalty). Solutions typically take 1-3 seconds. Once
complete, the bar chart will update to reflect the selected features, and the
bar graph for accuracy scores will also be updated.

## Problem Description

The goal of this feature selection application is to choose features that help
the machine-learning model learn, by promoting diversity among features and strong
relationships to the target variable. The model sets the following objectives and
constraints to achieve this goal:

**Objectives:** minimize the redundancy metric (pairwise correlation between features)
to promote diversity, and maximize the correlation between each feature and the target
to promote strong predictive relationships.

**Constraints:** choose the requested number of features.
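To make the objectives and constraint concrete, here is a minimal NumPy sketch of how such an objective could be scored for a candidate selection. This is illustrative only: the toy data, the `objective` function, and the exact weighting are assumptions for demonstration, not the app's actual code (which builds the model via D-Wave's scikit-learn plug-in).

```python
import numpy as np

# Toy data: 100 samples, 4 features; the target depends mostly on feature 0.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100)

# Absolute feature-feature and feature-target correlations.
corr_ff = np.abs(np.corrcoef(X, rowvar=False))
corr_ft = np.abs([np.corrcoef(X[:, i], y)[0, 1] for i in range(4)])

def objective(x, redund_val):
    """Score a binary selection vector x: penalize pairwise redundancy
    among chosen features, reward correlation with the target.
    Lower is better."""
    x = np.asarray(x, dtype=float)
    # Off-diagonal pairwise terms (the diagonal of corr_ff is all ones).
    redundancy = x @ corr_ff @ x - x @ x
    quality = corr_ft @ x
    return (1 - redund_val) * redundancy - redund_val * quality
```

With `redund_val=1`, selecting the feature most correlated with the target (feature 0 here) scores better than selecting an uninformative one; with `redund_val=0`, only pairwise redundancy matters.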

## Model Overview

In this example we use the Titanic and Scene datasets to generate a constrained quadratic model.
The datasets are assumed to be clean, meaning there are no missing entries or duplicated features.
The dataset's features are used to build two correlation matrices: one comparing the features to
each other, and one comparing the features to the target variable. These correlation matrices are
used to build the objective function.
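The two correlation matrices described above might be computed as in the following pandas sketch. The toy DataFrame is a hypothetical stand-in for a cleaned dataset (with `survived` as the target); this is not the app's actual preprocessing code.

```python
import pandas as pd

# Hypothetical stand-in for a cleaned dataset with no missing entries.
df = pd.DataFrame({
    "age": [22, 38, 26, 35, 28, 54],
    "fare": [7.25, 71.28, 7.92, 53.10, 8.05, 51.86],
    "pclass": [3, 1, 3, 1, 3, 1],
    "survived": [0, 1, 1, 1, 0, 1],
})

features = df.drop(columns="survived")
target = df["survived"]

# Feature-to-feature correlation matrix (source of the redundancy terms).
feature_corr = features.corr().abs()

# Feature-to-target correlations (source of the quality terms).
target_corr = features.corrwith(target).abs()
```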

---
**Note:** Although a model overview is provided here, all of the code to build
the feature selection model is contained within
[D-Wave's scikit-learn plug-in](https://github.com/dwavesystems/dwave-scikit-learn-plugin).

---

### Parameters

These are the parameters of the problem:

- `num_features`: the number of features to select
- `redund_val`: a value in [0, 1] that sets the weighting applied to the redundancy terms relative to feature quality
  - 0: features are selected so as to minimize redundancy, without any consideration of quality
  - 1: places the maximum weight on the quality of the features

### Variables
- `x_i`: binary variable indicating whether feature `i` is selected


### Objective

The objective function has two terms: the first minimizes the correlation between pairs of
chosen features, and the second maximizes the correlation between the chosen features and
the target variable. The `redund_val` parameter sets the relative weighting of the two terms.
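Written out, the model can be sketched as follows, with `α` standing for `redund_val`, `k` for `num_features`, `ρ_ij` the feature-feature correlations, and `ρ_iy` the feature-target correlations. This is a sketch consistent with the description here, not necessarily the plug-in's exact formulation:

```latex
\min_{x \in \{0,1\}^n} \; (1 - \alpha) \sum_{i \ne j} \rho_{ij}\, x_i x_j
\;-\; \alpha \sum_{i} \rho_{iy}\, x_i
\qquad \text{subject to} \qquad \sum_{i} x_i = k
```

At `α = 0` only the redundancy term remains; at `α = 1` only the quality term remains, matching the parameter description above.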
### Constraints

A single constraint requires that the model select exactly `num_features` features.

## Code Overview

Given a selected value for the `num_features` and `redund_val` sliders, the code proceeds as follows:

* The selected dataset and parameters are passed to D-Wave's feature selection scikit-learn plugin
* The resulting selected features are returned by the plugin
* A random forest classifier is trained on the selected features and an accuracy score is calculated
* The display is updated to reflect the selected features and the classifier accuracy
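The steps above can be sketched end-to-end as follows. The real app delegates selection to D-Wave's scikit-learn plug-in, which requires access to a D-Wave solver; this sketch substitutes a simple greedy correlation-based selector so it runs locally, and all data and helper names here are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy dataset: the target depends only on features 0 and 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

def select_features(X, y, num_features):
    """Stand-in for the plug-in's selector: greedily pick the features
    most correlated with the target."""
    corr = np.abs([np.corrcoef(X[:, i], y)[0, 1] for i in range(X.shape[1])])
    return np.sort(np.argsort(corr)[-num_features:])

# Step 1-2: run feature selection for the requested number of features.
selected = select_features(X, y, num_features=2)

# Step 3: train a random forest on the selected features and score it.
X_train, X_test, y_train, y_test = train_test_split(
    X[:, selected], y, test_size=0.25, random_state=0
)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Step 4 (in the app): update the display with `selected` and `accuracy`.
```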

## References

Milne, Andrew, Maxwell Rounds, and Phil Goddard. 2017. "Optimal Feature