# Machine Learning - Task B

**Part of IAFIG-RMS *Python for Bioimage Analysis* Course.**

*Mikolaj Kundegorski*

2019-12-13

In this task you will use logistic regression to classify toy examples (shaped blobs) into different 'classes', or categories.

## The Task

In an experiment we were able to image lots of different types of cells. These cells are sparse, so we were easily able to segment them, find their bounding boxes and create a database of images, each containing a single cell. We then convinced a PhD student to go through and manually categorise our cells. We want to use regression to automatically categorise new, unlabelled cells from future experiments.

To do this, we will:
1. Use a logistic (categorical) regression.
2. Use training data to fit the regression and test data to check how well our model works.
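
The two steps above can be sketched on synthetic data before touching the real images. This is only a minimal sketch, assuming scikit-learn is installed; the 2D points from `make_blobs`, the chosen cluster centres and all variable names are illustrative stand-ins, not part of the course dataset.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the course dataset):
# 300 points in 2D drawn from 3 well-separated clusters.
features, labels = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=1.0,
    random_state=0,
)

# Hold back 20% of the samples to check the model on data it has not seen.
features_train, features_test, labels_train, labels_test = train_test_split(
    features, labels, test_size=0.2, random_state=0
)

# Fit on the training data only, then score (mean accuracy) on both sets.
toy_model = LogisticRegression(max_iter=1000).fit(features_train, labels_train)
toy_train_score = toy_model.score(features_train, labels_train)
toy_test_score = toy_model.score(features_test, labels_test)
print(f"Train accuracy: {toy_train_score:.2f}, test accuracy: {toy_test_score:.2f}")
```

A test score much lower than the training score would suggest that the model has over-fitted the training data.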

## Task B.1

Run the following two cells to set up and visualise the data. Feel free to change parameters as you explore the system.

In [None]:
# utils is a custom module written to simplify these tutorials
# You do not need to understand its code for this practical
from utils.practice_data import generateBlobsData  # this loads data into a DataFrame
from utils.practice_data import showBlobs  # this allows quick visualisation of the data

# Generate a pandas DataFrame of data
# with a column 'class', i.e. the category a cell belongs to,
# and a column 'raw_data' which holds the NumPy array/image
imageDir = './assets/simple_blobs/'
number_of_samples = 1200
image_size = 64  # in pixels
number_of_classes = 11  # 2-6: normal blobs; 7-11: more difficult
problem = generateBlobsData(imageDir, number_of_classes, number_of_samples, image_size, noiseSize=20)

In [None]:
# Visualise the data
display(problem.loc[:,'class'].describe())  # Note the number of unique classes
showBlobs(problem.sample(20))  # plots the images with their class above.

## Task B.2

Look at the documentation for [sklearn.linear_model.LogisticRegression()](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) and, specifically, this example: https://scikit-learn.org/stable/auto_examples/linear_model/plot_iris_logistic.html#sphx-glr-auto-examples-linear-model-plot-iris-logistic-py.

The following cell needs to 'wrangle' our data into training and test data and then run Logistic Regression to create a simple model with which we can predict the classes of our test data and calculate a 'score', here the mean accuracy, of our model.
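
The 'wrangling' step mostly means turning each image into a flat feature vector, because `LogisticRegression` expects a 2D array of shape `(n_samples, n_features)`. The sketch below illustrates the idea with NumPy only. It assumes each entry of `raw_data` is a 2D greyscale array; `toy_images` is a hypothetical stand-in, not the course data.

```python
import numpy as np

# Hypothetical stand-in for the 'raw_data' column: five random 64x64 "images".
rng = np.random.default_rng(0)
toy_images = [rng.random((64, 64)) for _ in range(5)]

# Stack the list of 2D images into a single 3D array.
stacked = np.stack(toy_images)  # shape (5, 64, 64)

# Flatten each image into one row of 64 * 64 = 4096 pixel values.
flattened = stacked.reshape(stacked.shape[0], -1)  # shape (5, 4096)

print(stacked.shape, flattened.shape)
```

Each pixel becomes one feature, so the model sees no spatial structure beyond the position of each pixel in the row.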

Complete the following code cell by filling in all the `____`s with appropriate methods/functions and parameters.

In [None]:
%matplotlib widget
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn import model_selection

x = np.stack(problem['raw_data'])  # stack the per-sample images into a single array
x = x.reshape(x.shape[0], -1)  # flatten each image into a 1D feature vector
y = problem['class'].values.astype(int)  # class labels as a 1D integer array
x_train, x_test, y_train, y_test = model_selection.train_test_split(x, y, test_size=0.2, random_state=0)

logreg = linear_model.LogisticRegression(random_state=0).fit(x_train, y_train)
y_predict = logreg.predict(x_test)
test_set_score = logreg.score(x_test, y_test)
train_set_score = logreg.score(x_train, y_train)
print("Score on training set: {} and on testing: {} ".format(train_set_score,test_set_score))

Run the following cell to display the results (on your test data) of our model. Does the model look good?

In [None]:
f, axis = plt.subplots(1,1)  # create a figure with a single axis (subplot)

axis.plot(y_test, y_test, '-')  # plot true vs true, i.e. the ideal case
axis.scatter(y_test,y_predict)  # plot a scatter of the true value against the prediction value
axis.set_ylabel('Predicted Value')
axis.set_xlabel('True Value')

plt.show()
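
Because class labels are discrete, many points in the scatter plot above land on top of each other, which hides how often each mistake occurs. A confusion matrix counts every (true, predicted) pair instead. The sketch below uses `sklearn.metrics.confusion_matrix` on made-up labels; `toy_true` and `toy_pred` are illustrative values only, not results from the model.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only (not output from the model above).
toy_true = np.array([0, 0, 1, 1, 2, 2, 2])
toy_pred = np.array([0, 1, 1, 1, 2, 0, 2])

# Rows are true classes, columns are predicted classes.
toy_cm = confusion_matrix(toy_true, toy_pred)
print(toy_cm)

# Correct predictions sit on the diagonal, so the accuracy is trace / total.
toy_accuracy = toy_cm.trace() / toy_cm.sum()
print(f"Accuracy: {toy_accuracy:.2f}")
```

In the notebook, passing `y_test` and `y_predict` in place of the toy arrays should show which classes the model confuses with each other.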