# How to use the framework


If FrImCla is not installed on your system, the first task is to install it using pip.

In [1]:
!pip install frimcla

Collecting frimcla
  Could not find a version that satisfies the requirement frimcla (from versions: )
No matching distribution found for frimcla
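
Before installing, it can be useful to check whether the package is already importable. The following is a minimal sketch, assuming Python 3; the helper name `is_installed` is hypothetical and not part of FrImCla.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "FrImCla" is the import name used in the import cell of this notebook.
print("FrImCla installed:", is_installed("FrImCla"))
```

If the check prints `False`, the package has to be installed before the import cell can run.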


To begin, we import all the classes and functions that we need to use the framework.

In [3]:
# __future__ imports belong before every other statement
from __future__ import print_function
import warnings
import time
import argparse
from FrImCla.utils.conf import Conf
from imutils import paths
from FrImCla.index_features import generateFeatures
from FrImCla.StatisticalComparison import statisticalComparison
from FrImCla.train import train
from FrImCla.prediction import prediction
warnings.simplefilter(action="ignore", category=FutureWarning)

### Configuring the dataset path

First, we need the path where our dataset is stored. The dataset must contain one folder for each class that we want to predict.

In [4]:
datasetPath = "/home/magarcd/Escritorio/FrImCla/melanoma"
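
The one-folder-per-class layout can be checked before running the framework. This is a minimal sketch, not part of FrImCla: the helper `count_images_per_class`, the list of image extensions, and the placeholder class names are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Assumption: adjust this set to the image types in your dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_class(dataset_path):
    """Map each class folder name to the number of image files directly inside it."""
    counts = {}
    for class_dir in sorted(Path(dataset_path).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts


# Demo on a throwaway directory with two placeholder classes.
with tempfile.TemporaryDirectory() as root:
    for class_name, n_images in [("class_a", 2), ("class_b", 1)]:
        class_dir = Path(root) / class_name
        class_dir.mkdir()
        for i in range(n_images):
            (class_dir / "img_{}.jpg".format(i)).touch()
    demo_counts = count_images_per_class(root)

print(demo_counts)  # {'class_a': 2, 'class_b': 1}
```

Calling `count_images_per_class(datasetPath)` on a real dataset would show at a glance whether a class folder is empty or missing.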

### Configuring output path

Next, we tell the program where to save the results that the framework generates.

In [5]:
outputPath = "/home/magarcd/Escritorio/FrImCla/output/ppppppppp"

### Batch size

This parameter sets the size of the batches into which the framework divides the set of images. The framework then processes the images one batch at a time, which avoids high memory consumption and makes the best possible use of the available memory.

In [6]:
batchSize = 32 
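
The idea of batching can be illustrated with a few lines of plain Python. This is only a sketch of the general technique, not FrImCla's internal code; `iter_batches` and the placeholder file names are hypothetical.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder file names standing in for the images of a dataset.
image_paths = ["img_{}.jpg".format(i) for i in range(70)]

batch_lengths = [len(batch) for batch in iter_batches(image_paths, 32)]
print(batch_lengths)  # [32, 32, 6]
```

With 70 images and a batch size of 32, only 32 images at most are handled at once, and the last batch holds the remaining 6.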

### Feature Extractor

In this step we choose the feature extractor models to use with our dataset. These models extract the most important features of the images. The features are saved, and the classification models that we choose later use them to assign the images to the classes of the dataset. Each feature extractor collects features in a different way, so we have to compare the models: no single model fits every dataset best.

In [7]:
featureExtractors = [["inception","False"]]

Now that we have the feature extractor models, we can run the algorithm that collects the features of the dataset for each model. We only have to indicate the dataset path, the output path, and the models that we want to use for the study. The verbose parameter indicates whether information about the execution should be printed to the console.

In [None]:
verbose = False
generateFeatures(outputPath, batchSize, datasetPath, featureExtractors, verbose)

This algorithm creates a set of files that contain the features of the images. Each file corresponds to one of the models indicated above.

### Classification models

Once the features of the images are stored, we have to choose the classification models to use with the dataset. Every classifier is combined with every feature extractor model to measure the performance of each combination.

In [7]:
modelClassifiers = [ "MLP","SVM","KNN", "LogisticRegression", "GradientBoost", "RandomForest"]

With the classifiers chosen, the next step is to carry out a statistical analysis that studies and compares every combination. Once all the combinations have been compared, the analysis gives us the best combination of feature extractor model and classifier model, together with all the combinations that show no significant differences from the best result.
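
To make "every combination" concrete, the pairs that get compared can be enumerated with `itertools.product`. This is an illustrative sketch that only lists the pairs; it does not reproduce the statistical analysis itself, and the snake_case variable names are chosen for this example.

```python
from itertools import product

# Same values as in the cells above, repeated here so the sketch is self-contained.
feature_extractors = [["inception", "False"]]
model_classifiers = ["MLP", "SVM", "KNN", "LogisticRegression",
                     "GradientBoost", "RandomForest"]

# Pair the name of each feature extractor with each classifier.
combinations = [
    (extractor[0], classifier)
    for extractor, classifier in product(feature_extractors, model_classifiers)
]

for extractor_name, classifier in combinations:
    print(extractor_name, "+", classifier)
```

One feature extractor and six classifiers give six combinations; each additional feature extractor adds six more.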

### Performance measures

We have to select a performance measure to determine which model is best. Five measures are available: accuracy, recall, precision, auroc and f1. The user has to select exactly one of them. Accuracy is the default measure.

In [8]:
measure = "accuracy"
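
As a reminder of what four of these measures mean, the sketch below computes accuracy, precision, recall and f1 for a binary problem from the standard textbook definitions. It is not FrImCla's implementation; `binary_measures` and the example labels are hypothetical, and auroc is left out because it needs prediction scores rather than hard labels.

```python
def binary_measures(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and f1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
measures = binary_measures(y_true, y_pred)
print(measures["accuracy"])  # 0.75
```

On unbalanced datasets accuracy can look good even when one class is rarely predicted correctly, which is one reason to consider recall, precision or f1 instead.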

In [11]:
statisticalComparison(outputPath, datasetPath, featureExtractors, modelClassifiers, measure, verbose)

ImportError: [joblib] Attempting to do parallel computing without protecting your import on a system that does not support forking. To use parallel-computing in a script, you must protect your main loop using "if __name__ == '__main__'". Please see the joblib documentation on Parallel for more information
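
The error message asks for the code that starts the parallel work to be protected by an `if __name__ == '__main__'` guard when it runs as a script on a system without forking. The sketch below shows that pattern only; `run_comparison` is a hypothetical stand-in, and whether the guard is enough to resolve the error in this particular environment is an assumption that has not been verified here.

```python
def run_comparison():
    """Stand-in for the step that starts the parallel computation."""
    # In a script version of this notebook, this is where the call
    # statisticalComparison(outputPath, datasetPath, featureExtractors,
    #                       modelClassifiers, measure, verbose)
    # would go.
    return "comparison finished"


# The guard ensures the call runs only when the file is executed directly,
# not when worker processes import the module again.
if __name__ == "__main__":
    print(run_comparison())
```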

### Training the best model

Finally, we train the resulting combination. In this step, we train only the best model, using all the images of the dataset. We do not split the dataset into training and test data here: all the images are used for training to improve the results of the model.

In [9]:
trainingSize = 1
train(outputPath, datasetPath, trainingSize)

Once the model is trained, we can predict the class of new images.

### Prediction

Now, with the best model trained, we can predict the classes of our images. For this task, we have developed another algorithm that uses the model. Running it gives us the predicted class of the image that we choose.

In [None]:
image = "negat.jpg"
featureExtractor = ["inception", "False"]
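# NOTE: `classifier` is not defined anywhere in this notebook; it must be set before running this cell
# (presumably to the name of the classifier selected by the statistical comparison).
prediction(image, outputPath, datasetPath, featureExtractor, classifier)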