This is a customizable pipeline for multilabel classification and feature selection.
To set up the project:
- Clone or download the repository.
- Go to the root folder.
- Run `pip install -r requirements.txt`.

To run the main file, go to the root folder and execute `python main.py`.
Python version used: 3.7.6
Pip version used: 20.0.2
Pipelines can be created and executed as follows:

```python
from multilabel.problem_transformation.multilabel_pipeline import (
    multilabel_pipeline_factory,
    execute_multilabel_pipeline,
)

create_pipeline = multilabel_pipeline_factory(
    dataset_name="emotions",
    problem_transformer="binary_relevance",
    feature_selector="mutual_information",
    feature_filter="select_k_best",
    classifier="decision_tree",
)

results = execute_multilabel_pipeline(create_pipeline)
print("results:", results)
```
The parameters to `multilabel_pipeline_factory` are as follows:

`dataset_name`
Datasets are taken from `skmultilearn.dataset`. The following datasets are available:
- tmc2007_500
- birds
- yeast
- bibtex
- genbase
- rcv1subset1
- rcv1subset2
- rcv1subset3
- rcv1subset4
- rcv1subset5
- Corel5k
- enron
- emotions
- medical
- delicious
- scene
- mediamill
`problem_transformer`
This is the name of the problem transformation algorithm. Currently only one algorithm is available:
- binary_relevance
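Binary relevance turns one multilabel problem into an independent binary problem per label: a separate copy of the classifier is trained on each label column. A minimal, self-contained sketch of the idea (the dense 0/1 list layout and the `MajorityClassifier` stub are illustrative assumptions, not the project's actual implementation):

```python
from copy import deepcopy

class MajorityClassifier:
    """Toy stand-in for a real classifier, for illustration only."""
    def fit(self, X, y):
        # predict the majority class of the binary target
        self.label = 1 if 2 * sum(y) >= len(y) else 0
    def predict(self, X):
        return [self.label] * len(X)

def binary_relevance(classifier, datasets):
    """Train one independent copy of `classifier` per label column.

    `datasets` is assumed to be [(X_train, Y_train), (X_test, Y_test)],
    with Y given as dense 0/1 rows (a simplification of the sparse
    matrices skmultilearn actually uses).
    """
    (X_train, Y_train), (X_test, _) = datasets
    n_labels = len(Y_train[0])
    per_label = []
    for j in range(n_labels):
        clf = deepcopy(classifier)                    # independent model per label
        clf.fit(X_train, [row[j] for row in Y_train])  # binary target for label j
        per_label.append(clf.predict(X_test))
    # transpose back to one 0/1 row per test sample
    return [list(row) for row in zip(*per_label)]
```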
`feature_selector`
This is the name of the feature-scoring algorithm. Currently the following scorers are available:
- mutual_information
- f_classif
`feature_filter`
Currently the following filter method is available:
- select_k_best
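The filter step simply keeps the k features the scorer ranked highest. A sketch of the idea (function names and the list-based layout are assumptions for illustration):

```python
def select_k_best(scores, k):
    """Return the indices of the k highest-scoring features, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def filter_features(X, keep):
    """Project every sample onto the selected feature indices."""
    return [[row[i] for i in keep] for row in X]
```

Here `scores` would come from one of the scorers above (mutual information or an ANOVA F-statistic), computed per feature against the labels.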
`classifier`
This is the name of the classifier. The following classifiers are available:
- decision_tree
- linear_regression
- naive_bayes
- random_forest
- svm
`multilabel_pipeline_factory` returns a curried `create_pipeline` function. Calling that function creates the elements of the pipeline and returns the following tuple:

```python
([dataset_training, dataset_testing], select_features, transform_problem, classifier)
```
- [dataset_training, dataset_testing] - the training and testing sets; their structure is the same as in skmultilearn.
- select_features - a function that takes a dataset and returns a new dataset after selecting features.
- transform_problem - a function that takes the classifier and the datasets as input, and outputs the classification results.
- classifier - the classifier used by the transform_problem function.
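Putting the four elements together, the executor plausibly works along these lines (a hedged sketch of `execute_multilabel_pipeline`; the actual orchestration in the project may differ):

```python
def execute_multilabel_pipeline(create_pipeline):
    """Unpack the pipeline tuple, select features, then classify."""
    datasets, select_features, transform_problem, classifier = create_pipeline()
    filtered = [select_features(ds) for ds in datasets]  # per-split selection
    return transform_problem(classifier, filtered)
```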
To customize parts of the pipeline, create your own factory method. Following is an example:

```python
def custom_factory_method():
    def create_pipeline():
        return [custom_dataset, custom_select_features, custom_transform_problem, custom_classifier]
    return create_pipeline
```
Not every item returned by create_pipeline needs to be custom. You can import any of them from packages in this folder and pass them here to get the default implementations. For example:
```python
from skmultilearn.dataset import load_dataset
from feature_selectors.feature_selector import feature_selector
from problem_transformation.impls.binary_relevance import transform_problem
from classifiers.impls.decision_tree import DecisionTree

def custom_factory_method():
    def create_pipeline():
        dataset_training = load_dataset("emotions", "train")
        dataset_testing = load_dataset("emotions", "test")
        return [[dataset_training, dataset_testing], feature_selector, transform_problem, DecisionTree]
    return create_pipeline
```
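Any custom part only has to match the call signature described earlier; for instance, a hand-rolled select_features is just a function from dataset to dataset. A hypothetical example (the (X, Y) pair layout and the cutoff are assumptions for illustration):

```python
def custom_select_features(dataset, k=10):
    """Hypothetical selector: keep only the first k features of each sample.

    `dataset` is assumed to be an (X, Y) pair; selection touches X only.
    """
    X, Y = dataset
    return ([row[:k] for row in X], Y)
```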
To provide your own parts to the pipeline, see the following examples (part => module that implements it):
- select_features => feature_selectors.feature_selector
- problem_transformer => problem_transformation.impls.binary_relevance
- classifier => classifiers.impls.decision_tree