# 7b Machine Learning - Part II

In this final part of the exercise, you will attempt to use a trained k-nearest neighbours model to identify the substitution pattern of an unknown benzene derivative. In particular, you are provided with two isomers of aminobenzoic acid.
<br>
<br>To begin this task, you will need to collect some spectra of your own. You may wish to collect data specifically for isomer *x* or *y*, or you may collect data for both.
<br>
<br>Data should be collected using the left-most IR spectrometer in the first-year (downstairs) lab, since all other spectral data were acquired using this instrument.
<br>
<br>For each isomer you investigate, you will need to collect 5 repeat spectra, cleaning the stage thoroughly with isopropanol between collections.
<br>
<br>The following data collection parameters should be employed:

| Parameter | Value |
| :-------: | :---: |
| Measurement Mode | % Transmittance |
| Apodization | Happ-Genzel |
| No. of Scans | 10 |
| Resolution / cm⁻¹ | 0.9 |
| Range / cm⁻¹ | 400 - 4000 |

Each spectrum should be exported from the software as a .txt file (*File* --> *Export*). It is recommended that you use the same systematic naming convention employed in the original spectral library.

---

Now that you have your own data to be identified, the entire original spectral library can be used as a training dataset for the k-nearest neighbours algorithm, with your own acquired data constituting the smaller test dataset.

✏️ Import all the libraries you've been using throughout the exercise, including your C317 library.

In [202]:
from C317 import *

✏️ Load in a DataFrame of the original spectral data, with **no** PCA applied and with all repeated column headings the same.
<br>
<br>**Note:** If you did not do this earlier, you may need to change the code in your library slightly to allow for no PCA to be an option (do not just delete the PCA code!).
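One way to make PCA optional without deleting it is to guard it behind a parameter. The following is only a minimal sketch: the function name `prepare_features` and its `n_components` argument are hypothetical and are not part of the C317 library, and the data are random numbers standing in for spectra. It assumes one spectrum per column and one wavenumber per row.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def prepare_features(spectra, n_components=None):
    """Return a (samples x features) array, optionally reduced by PCA.

    `spectra` is assumed to hold one spectrum per column and one
    wavenumber per row, so it is transposed before use.
    """
    X = spectra.T.to_numpy(dtype=float)
    if not n_components:  # None or 0 means "skip PCA"
        return X
    return PCA(n_components=n_components).fit_transform(X)


# Synthetic stand-in for a spectral library: 50 wavenumbers, 6 spectra.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((50, 6)), columns=list("oommpp"))

raw = prepare_features(demo)                      # PCA skipped
reduced = prepare_features(demo, n_components=2)  # PCA applied
print(raw.shape, reduced.shape)
```

Keeping the PCA branch in place means the same function can still serve the earlier parts of the exercise.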

In [192]:
load_spectra(20,1)

| Unnamed: 0 | C | C.1 | C.2 | C.3 | C.4 | Y | Y.1 | Y.2 | Y.3 | Y.4 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 630 | 0.000231 | 0.000226 | 0.000227 | 0.000227 | 0.000218 | 0.000277 | 0.000277 | 0.000279 | 0.000279 | 0.00028 |
| 631 | 0.00023 | 0.000226 | 0.000226 | 0.000226 | 0.000218 | 0.000277 | 0.000277 | 0.000279 | 0.000279 | 0.000279 |
| 632 | 0.00023 | 0.000226 | 0.000226 | 0.000226 | 0.000217 | 0.000277 | 0.000277 | 0.000279 | 0.000278 | 0.000279 |
| 633 | 0.00023 | 0.000226 | 0.000226 | 0.000226 | 0.000217 | 0.000277 | 0.000277 | 0.000279 | 0.000279 | 0.000279 |
| 634 | 0.00023 | 0.000226 | 0.000226 | 0.000226 | 0.000217 | 0.000277 | 0.000278 | 0.000279 | 0.000279 | 0.000279 |


✏️ In the same way, load in your collected data, and put it into a separate DataFrame. Process it in exactly the same way as the original data.

In [194]:
load_new_spectra(0,1).head()

| Unnamed: 0 | C | C.1 | C.2 | C.3 | C.4 | Y | Y.1 | Y.2 | Y.3 | Y.4 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 630 | 0.000231 | 0.000226 | 0.000227 | 0.000227 | 0.000218 | 0.000277 | 0.000277 | 0.000279 | 0.000279 | 0.00028 |
| 631 | 0.00023 | 0.000226 | 0.000226 | 0.000226 | 0.000218 | 0.000277 | 0.000277 | 0.000279 | 0.000279 | 0.000279 |
| 632 | 0.00023 | 0.000226 | 0.000226 | 0.000226 | 0.000217 | 0.000277 | 0.000277 | 0.000279 | 0.000278 | 0.000279 |
| 633 | 0.00023 | 0.000226 | 0.000226 | 0.000226 | 0.000217 | 0.000277 | 0.000277 | 0.000279 | 0.000279 | 0.000279 |
| 634 | 0.00023 | 0.000226 | 0.000226 | 0.000226 | 0.000217 | 0.000277 | 0.000278 | 0.000279 | 0.000279 | 0.000279 |
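Part of processing the new data "in exactly the same way" is making sure both DataFrames share the same wavenumber axis, since a k-NN model compares spectra feature by feature. The check below is a hedged sketch only: `align_to_training` is a hypothetical helper, not a C317 function, and the small DataFrames are made-up values used purely for illustration.

```python
import numpy as np
import pandas as pd


def align_to_training(new, train):
    """Reorder the rows of `new` to match the wavenumber index of `train`.

    Raises ValueError if any training wavenumber is missing from `new`,
    because the classifier needs the same features in the same order.
    """
    missing = train.index.difference(new.index)
    if len(missing) > 0:
        raise ValueError(f"{len(missing)} wavenumbers missing from new data")
    return new.loc[train.index]


# Made-up data: the new spectra cover an extra wavenumber and are in reverse order.
train = pd.DataFrame(np.ones((4, 2)), index=[630, 631, 632, 633], columns=["o", "m"])
new = pd.DataFrame(
    np.arange(10).reshape(5, 2),
    index=[634, 633, 632, 631, 630],
    columns=["x", "x.1"],
)

aligned = align_to_training(new, train)
print(aligned)
```

If the new spectra were collected over a different range or at a different point spacing, a check like this fails early instead of producing a misleading prediction.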


✏️ Just as you did before, extract the appropriate labels (o/m/p) from the original spectral data and store them in a list.

In [None]:
labels=[name[0] for name in load_spectra(0,1)]
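Iterating over a DataFrame yields its column names, so taking the first character of each name recovers the class label even when pandas has appended `.1`, `.2`, ... to repeated headings. The sketch below illustrates this with hypothetical column names; it assumes every heading begins with `o`, `m` or `p`.

```python
import pandas as pd

# Hypothetical headings: pandas adds ".1", ".2", ... to repeated names on import.
spectra = pd.DataFrame(columns=["o", "o.1", "m", "m.1", "p", "p.1"])

# The first character of each heading is the class label.
labels = [str(name)[0] for name in spectra.columns]

# Guard against headings that do not follow the o/m/p convention.
unexpected = set(labels) - {"o", "m", "p"}
if unexpected:
    raise ValueError(f"Unexpected labels: {sorted(unexpected)}")

print(labels)
```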

✏️ Using the entire original spectral DataFrame and the labels you have extracted, train a k-NN model (do not include your personally collected data).

In [196]:
MachineLearn(load_spectra(20,1),1,3,return_obj=False)

✏️ Now use the `predict()` method of the trained `KNeighborsClassifier` object to attempt to classify your personally acquired spectra. Consult a demonstrator to determine whether your classification has been successful.

In [201]:
MachineLearn(load_spectra(10,1),1,3,return_obj=True).predict(load_new_spectra(10,1)['C'].T)
MachineLearn(load_spectra(10,1),1,3,return_obj=True).predict(load_new_spectra(10,1)['Y'].T) 
# The model predicts that 'C' is ortho and incorrectly predicts that 'Y' is para (due to the large spread in the meta data)

array(['p', 'p', 'p', 'p', 'p'], dtype='<U1')
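For reference, the train-then-predict workflow can be sketched directly with scikit-learn, independently of the `MachineLearn` wrapper used above. Everything in this example is an assumption made for illustration: the spectra are synthetic, the class baselines are invented, and `n_neighbors=3` is an arbitrary choice. The key point is that scikit-learn expects one spectrum per row, so DataFrames holding one spectrum per column are transposed first.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_points = 40  # number of wavenumbers in each synthetic spectrum

# Synthetic "library": each class sits around a different baseline level.
centres = {"o": 0.2, "m": 0.5, "p": 0.8}
columns, data = [], []
for label, centre in centres.items():
    for _ in range(5):  # five repeats per class
        columns.append(label)
        data.append(centre + 0.01 * rng.standard_normal(n_points))
library = pd.DataFrame(np.array(data).T, columns=columns)

# Synthetic "unknown": five repeats generated close to the "m" class.
unknown = pd.DataFrame(0.5 + 0.01 * rng.standard_normal((n_points, 5)))

# Train on the whole library, then classify the unknown spectra.
labels = [name[0] for name in library.columns]
model = KNeighborsClassifier(n_neighbors=3)
model.fit(library.T.to_numpy(), labels)  # rows = spectra, columns = wavenumbers

predictions = model.predict(unknown.T.to_numpy())
print(predictions)
```

Training the model once and reusing the fitted object for every unknown avoids refitting for each prediction.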

---

### Post-Lab

To submit your work, please save all completed notebooks and add them into a single .zip folder.
<br>
<br>Upload this .zip folder to Canvas to evidence completion of the exercise.

In addition, please complete the post-lab quiz and questionnaire which can be found alongside this exercise on Canvas. All feedback is welcome.
<br>
<br>Both an uploaded .zip folder and a completed post-lab quiz are required for credit to be awarded for this exercise.

---