# PyTorch Learning Notebook  
# PCA, Classification
1 - Carbendazim  
2 - Thiacloprid  
4 - Acetamiprid   
Mixtures of the above mentioned analytes

Batch 3 of the colloids was chosen for all of the recordings due to the superior signal intensity that it showed.  

Mixtures 2 + 4 and 1 + 2 + 4 were recorded differently from all of the other data: the integration time was increased from 500ms to 1500ms because the signal was too weak, and the maps were reduced to 32X32.

In [1]:
%load_ext autoreload
%autoreload 2

In [13]:
from tools.ramanflow.read_data import ReadData as rd
from tools.ramanflow.prep_data import PrepData as rpd

In [14]:
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd

## Read the data

The file naming convention is as follows: NameOfAnalyte_ColloidsReductionSpeed_BatchNo_TypeOfReading, followed by the recording number (e.g. 1_3min_b3_50X50_spectral_mapping_1.tif).  
**NameOfAnalyte**: 1 = car, 2 = thia, 4 = aceta  
**BatchNo**: Here it gets a bit tricky. The colloid batch (b3) is the same in all of the measurements. However, we split the acquisition into 3 separate recordings (7 for the 2 + 4 mixture) to get some variation within the batch, so all of the colloids used come from the same batch. In the variable names below, batch1, batch2 and so on refer to these recordings, not to different colloid batches.  
**ColloidsReductionSpeed**: 3min is how long it took to reduce 90ml of HH+NaOH with 10ml of AgNO3. I call it reduction speed for convenience and because it describes the process more accurately.  
**TypeOfReading**: Generally speaking, there are 2 types of reading: single-spectrum acquisition, or spectral mapping, where multiple spectra are acquired. 50X50 means the map is a 50X50 grid of points, each of which yields a single spectrum, so one recording contains 2500 spectra.  
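Each 50X50 recording is loaded as a 2500x1600 matrix (see the whos output in the preprocessing section): one row per spectrum, with 1600 presumably being the number of spectral points. The sketch below shows how a flat spectrum index could map back to a grid position, assuming the map was flattened in row-major order. That ordering is an assumption; the layout actually produced by rd.read_data is not documented in this notebook, and grid_to_flat / flat_to_grid are hypothetical helpers.

```python
import numpy as np

MAP_ROWS, MAP_COLS = 50, 50   # 50X50 mapping
N_POINTS = 1600               # spectral points per spectrum (from the 2500x1600 shape)

def grid_to_flat(row: int, col: int, n_cols: int = MAP_COLS) -> int:
    """Flat spectrum index of grid point (row, col), ASSUMING row-major flattening."""
    return row * n_cols + col

def flat_to_grid(index: int, n_cols: int = MAP_COLS) -> tuple[int, int]:
    """Inverse of grid_to_flat under the same assumption."""
    return divmod(index, n_cols)

# Stand-in cube (rows, cols, spectral points) filled with random values
cube = np.random.default_rng(0).random((MAP_ROWS, MAP_COLS, N_POINTS))
spectra = cube.reshape(MAP_ROWS * MAP_COLS, N_POINTS)   # (2500, 1600), row-major

# The spectrum at grid point (3, 7) is row 3 * 50 + 7 = 157 of the flat matrix
assert np.array_equal(spectra[grid_to_flat(3, 7)], cube[3, 7])
print(spectra.shape)  # (2500, 1600)
```

Under the same assumption, spectra[:, k].reshape(50, 50) would give a 50X50 intensity map at spectral point k.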

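Each single analyte and the 1 + 2 and 1 + 4 mixtures have 3 × 2500 = 7500 spectra. The 2 + 4 and 1 + 2 + 4 mixtures were mapped at 32X32 (1024 spectra per recording), giving 7 × 1024 = 7168 and 3 × 1024 = 3072 spectra respectively.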
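The per-sample totals follow from map size × number of recordings. A quick check, with the map sizes and recording counts read off the file names in the cells below (the dictionary and total_spectra are illustrative, not part of the project code):

```python
# Map rows, map columns and number of recordings per sample, taken from the file names
recordings = {
    "1": (50, 50, 3),
    "2": (50, 50, 3),
    "4": (50, 50, 3),
    "1 + 2": (50, 50, 3),
    "1 + 4": (50, 50, 3),
    "2 + 4": (32, 32, 7),
    "1 + 2 + 4": (32, 32, 3),
}

def total_spectra(rows: int, cols: int, n_recordings: int) -> int:
    """Each grid point of a map yields one spectrum."""
    return rows * cols * n_recordings

totals = {name: total_spectra(*spec) for name, spec in recordings.items()}
for name, n in totals.items():
    print(f"{name:>10}: {n} spectra")
```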

#### 1 - Carbendazim

In [4]:
f_sup, car_batch1_3min_mapping = rd.read_data("data/20220422 SERS data generation/analyte 1/mapping50X50/1_3min_b3_50X50_spectral_mapping_1.tif")
f_sup, car_batch2_3min_mapping = rd.read_data("data/20220422 SERS data generation/analyte 1/mapping50X50/1_3min_b3_50X50_spectral_mapping_2.tif")
f_sup, car_batch3_3min_mapping = rd.read_data("data/20220422 SERS data generation/analyte 1/mapping50X50/1_3min_b3_50X50_spectral_mapping_3.tif")

<tifffile.TiffFile '1_3min_b3_50X50_…l_mapping_1.tif'> ImageJ series contains unidentified dimension
<tifffile.TiffFile '1_3min_b3_50X50_…l_mapping_2.tif'> ImageJ series contains unidentified dimension
<tifffile.TiffFile '1_3min_b3_50X50_…l_mapping_3.tif'> ImageJ series contains unidentified dimension
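The three read_data calls differ only in the recording number, so the paths can be generated from the naming convention. A minimal sketch; mapping_path is a hypothetical helper, and it only covers the plain 50X50 file names (the mixture files, e.g. 1+4(2days) or the 1500ms variants, deviate from this pattern):

```python
def mapping_path(base_dir: str, analyte: str, n: int, map_size: str = "50X50",
                 speed: str = "3min", batch: str = "b3") -> str:
    """Hypothetical helper: rebuild a mapping file path from the naming convention."""
    filename = f"{analyte}_{speed}_{batch}_{map_size}_spectral_mapping_{n}.tif"
    return f"{base_dir}/{filename}"   # forward slashes, as in the paths above

car_dir = "data/20220422 SERS data generation/analyte 1/mapping50X50"
car_paths = [mapping_path(car_dir, "1", n) for n in (1, 2, 3)]
for p in car_paths:
    print(p)
```

Each path could then be passed to rd.read_data exactly as in the cell above, e.g. f_sup, car_batch1_3min_mapping = rd.read_data(car_paths[0]).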


#### 2 - Thiacloprid

In [5]:
f_sup, thia_batch1_3min_mapping = rd.read_data("data/20220421 SERS data generation/analyte 2/mapping 50X50/2_3min_b3_50X50_spectral_mapping_1.tif")
f_sup, thia_batch2_3min_mapping = rd.read_data("data/20220421 SERS data generation/analyte 2/mapping 50X50/2_3min_b3_50X50_spectral_mapping_2.tif")
f_sup, thia_batch3_3min_mapping = rd.read_data("data/20220421 SERS data generation/analyte 2/mapping 50X50/2_3min_b3_50X50_spectral_mapping_3.tif")

<tifffile.TiffFile '2_3min_b3_50X50_…l_mapping_1.tif'> ImageJ series contains unidentified dimension
<tifffile.TiffFile '2_3min_b3_50X50_…l_mapping_2.tif'> ImageJ series contains unidentified dimension
<tifffile.TiffFile '2_3min_b3_50X50_…l_mapping_3.tif'> ImageJ series contains unidentified dimension


#### 4 - Acetamiprid

In [6]:
f_sup, aceta_batch1_3min_mapping = rd.read_data("data/20220421 SERS data generation/analyte 4/mapping50X50/4_3min_b3_50X50_spectral_mapping_1.tif")
f_sup, aceta_batch2_3min_mapping = rd.read_data("data/20220421 SERS data generation/analyte 4/mapping50X50/4_3min_b3_50X50_spectral_mapping_2.tif")
f_sup, aceta_batch3_3min_mapping = rd.read_data("data/20220421 SERS data generation/analyte 4/mapping50X50/4_3min_b3_50X50_spectral_mapping_3.tif")

<tifffile.TiffFile '4_3min_b3_50X50_…l_mapping_1.tif'> ImageJ series contains unidentified dimension
<tifffile.TiffFile '4_3min_b3_50X50_…l_mapping_2.tif'> ImageJ series contains unidentified dimension
<tifffile.TiffFile '4_3min_b3_50X50_…l_mapping_3.tif'> ImageJ series contains unidentified dimension


### Mixtures

#### 1 + 2

In [7]:
f_sup, car_thia_batch1_3min_mapping = rd.read_data("data/20220422 SERS data generation/analyte 1+2/mapping50X50/1+2_3min_b3_50X50_spectral_mapping_1.tif")
f_sup, car_thia_batch2_3min_mapping = rd.read_data("data/20220422 SERS data generation/analyte 1+2/mapping50X50/1+2_3min_b3_50X50_spectral_mapping_2.tif")
f_sup, car_thia_batch3_3min_mapping = rd.read_data("data/20220422 SERS data generation/analyte 1+2/mapping50X50/1+2_3min_b3_50X50_spectral_mapping_3.tif")

<tifffile.TiffFile '1+2_3min_b3_50X5…l_mapping_1.tif'> ImageJ series contains unidentified dimension
<tifffile.TiffFile '1+2_3min_b3_50X5…l_mapping_2.tif'> ImageJ series contains unidentified dimension
<tifffile.TiffFile '1+2_3min_b3_50X5…l_mapping_3.tif'> ImageJ series contains unidentified dimension


#### 1 + 4  

**This set of measurements had some problems**  

The premixed solution of 1 and 4 that was made on 04/20 and measured on 04/21 showed a strong signal.  
**However**, the premixed solution of 1 and 4 that was made on 04/21 and measured on 04/22 showed no signal at all.  
To have data to train on, we decided to proceed with the 04/20 solution, even though it was already 2 days old at the time of the experiment.  
This raises another concern about data reproducibility and uniformity: normally we would prefer the experimental conditions to stay the same across all measurements. 

In [8]:
f_sup, car_aceta_batch1_3min_mapping = rd.read_data("data/20220422 SERS data generation/analyte 1+4/mapping50X50/1+4(2days)_3min_b3_50X50_spectral_mapping_1.tif")
f_sup, car_aceta_batch2_3min_mapping = rd.read_data("data/20220422 SERS data generation/analyte 1+4/mapping50X50/1+4(2days)_3min_b3_50X50_spectral_mapping_2.tif")
f_sup, car_aceta_batch3_3min_mapping = rd.read_data("data/20220422 SERS data generation/analyte 1+4/mapping50X50/1+4(2days)_3min_b3_50X50_spectral_mapping_3.tif")

<tifffile.TiffFile '1+4(2days)_3min…al_mapping_1.tif'> ImageJ series contains unidentified dimension
<tifffile.TiffFile '1+4(2days)_3min…al_mapping_2.tif'> ImageJ series contains unidentified dimension
<tifffile.TiffFile '1+4(2days)_3min…al_mapping_3.tif'> ImageJ series contains unidentified dimension


#### 2 + 4  

As mentioned before, this mixture did not produce a good signal at 500ms integration time, so 1500ms was used instead. To save acquisition time, the mapping size was reduced from 50X50 to 32X32.  
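A rough estimate of why the map size was reduced, counting only exposure time per grid point. Stage movement, detector readout and any accumulations are ignored, so real acquisitions take longer; this is an assumption, not a measured figure, and map_acquisition_minutes is an illustrative helper.

```python
def map_acquisition_minutes(rows: int, cols: int, integration_ms: float) -> float:
    """Lower bound on mapping time: one exposure per grid point, no overhead."""
    return rows * cols * integration_ms / 1000 / 60

for rows, cols, t_ms in [(50, 50, 500), (50, 50, 1500), (32, 32, 1500)]:
    print(f"{rows}X{cols} at {t_ms} ms: {map_acquisition_minutes(rows, cols, t_ms):.1f} min")
```

Under this estimate a 50X50 map at 1500ms needs about 62.5 min of exposure against about 20.8 min at 500ms, while a 32X32 map at 1500ms (about 25.6 min) stays close to the original duration.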

**However, recordings 6 and 7 had some issues**  

In [9]:
f_sup, thia_aceta_batch1_3min_mapping = rd.read_data("data/20220421 SERS data generation/analyte 2+4/mapping 32X32/2+4_3min_b3_32X32__1500ms_spectral_mapping_1.tif")
f_sup, thia_aceta_batch2_3min_mapping = rd.read_data("data/20220421 SERS data generation/analyte 2+4/mapping 32X32/2+4_3min_b3_32X32__1500ms_spectral_mapping_2.tif")
f_sup, thia_aceta_batch3_3min_mapping = rd.read_data("data/20220421 SERS data generation/analyte 2+4/mapping 32X32/2+4_3min_b3_32X32__1500ms_spectral_mapping_3.tif")
f_sup, thia_aceta_batch4_3min_mapping = rd.read_data("data/20220421 SERS data generation/analyte 2+4/mapping 32X32/2+4_3min_b3_32X32__1500ms_spectral_mapping_4.tif")
f_sup, thia_aceta_batch5_3min_mapping = rd.read_data("data/20220421 SERS data generation/analyte 2+4/mapping 32X32/2+4_3min_b3_32X32__1500ms_spectral_mapping_5.tif")
f_sup, thia_aceta_batch6_3min_mapping = rd.read_data("data/20220421 SERS data generation/analyte 2+4/mapping 32X32/2+4_3min_b3_32X32_1500ms_spectral_mapping_6(problem).tif")
f_sup, thia_aceta_batch7_3min_mapping = rd.read_data("data/20220421 SERS data generation/analyte 2+4/mapping 32X32/2+4_3min_b3_32X32__1500ms_spectral_mapping_7(problem).tif")

<tifffile.TiffFile '2+4_3min_b3_32X3…l_mapping_1.tif'> ImageJ series contains unidentified dimension
<tifffile.TiffFile '2+4_3min_b3_32X3…l_mapping_2.tif'> ImageJ series contains unidentified dimension
<tifffile.TiffFile '2+4_3min_b3_32X3…l_mapping_3.tif'> ImageJ series contains unidentified dimension
<tifffile.TiffFile '2+4_3min_b3_32X3…l_mapping_4.tif'> ImageJ series contains unidentified dimension
<tifffile.TiffFile '2+4_3min_b3_32X3…l_mapping_5.tif'> ImageJ series contains unidentified dimension
<tifffile.TiffFile '2+4_3min_b3_32X3…_6(problem).tif'> ImageJ series contains unidentified dimension
<tifffile.TiffFile '2+4_3min_b3_32X…g_7(problem).tif'> ImageJ series contains unidentified dimension


#### 1 + 2 + 4

In [10]:
f_sup, car_thia_aceta_batch1_3min_mapping = rd.read_data("data/20220422 SERS data generation/analyte 1+2+4/mapping32X32/1+2+4_3min_b3_32X32_spectral_mapping_1500ms_1.tif")
f_sup, car_thia_aceta_batch2_3min_mapping = rd.read_data("data/20220422 SERS data generation/analyte 1+2+4/mapping32X32/1+2+4_3min_b3_32X32_spectral_mapping_1500ms_2.tif")
f_sup, car_thia_aceta_batch3_3min_mapping = rd.read_data("data/20220422 SERS data generation/analyte 1+2+4/mapping32X32/1+2+4_3min_b3_32X32_spectral_mapping_1500ms_3.tif")

<tifffile.TiffFile '1+2+4_3min_b3_3…ing_1500ms_1.tif'> ImageJ series contains unidentified dimension
<tifffile.TiffFile '1+2+4_3min_b3_3…ing_1500ms_2.tif'> ImageJ series contains unidentified dimension
<tifffile.TiffFile '1+2+4_3min_b3_3…ing_1500ms_3.tif'> ImageJ series contains unidentified dimension


## Data preprocessing

#### Collecting all recordings of each analyte into one matrix  

**At the same time we remove cosmic rays that may appear in individual spectra**
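rpd.remove_cosmic_rays comes from the project's tools.ramanflow package, and neither its algorithm nor the meaning of its second argument (7) is documented in this notebook. As a hedged illustration of one common approach, here is a sketch of Whitaker-Hayes-style despiking: a cosmic-ray spike produces jumps between neighbouring points that are outliers by a modified z-score (median/MAD based), and the flagged points are replaced by the mean of nearby unflagged points. This is a swapped-in technique, not necessarily what ramanflow does; despike, despike_rows, threshold and window are hypothetical names and parameters.

```python
import numpy as np

def modified_z_scores(values: np.ndarray) -> np.ndarray:
    """Robust z-scores based on the median and the median absolute deviation (MAD)."""
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        return np.zeros_like(values, dtype=float)
    return 0.6745 * (values - median) / mad

def despike(spectrum: np.ndarray, threshold: float = 7.0, window: int = 3) -> np.ndarray:
    """Replace spike points in one spectrum by the mean of nearby non-spike points."""
    spectrum = np.asarray(spectrum, dtype=float)
    # A spike shows up as anomalous jumps between neighbouring points
    jumps = np.abs(modified_z_scores(np.diff(spectrum))) > threshold
    spikes = np.zeros(spectrum.shape, dtype=bool)
    spikes[:-1] |= jumps  # point before each anomalous jump
    spikes[1:] |= jumps   # point after each anomalous jump
    cleaned = spectrum.copy()
    for i in np.flatnonzero(spikes):
        neighbours = np.arange(max(0, i - window), min(len(spectrum), i + window + 1))
        good = neighbours[~spikes[neighbours]]
        if good.size:  # leave the point unchanged if every neighbour is flagged
            cleaned[i] = spectrum[good].mean()
    return cleaned

def despike_rows(spectra: np.ndarray, threshold: float = 7.0, window: int = 3) -> np.ndarray:
    """Apply despike to every row (one spectrum per row) of a matrix."""
    return np.vstack([despike(row, threshold, window) for row in spectra])

# Smooth synthetic spectrum with a single-point spike
baseline = 100 + 10 * np.sin(np.linspace(0, 3, 200))
noisy = baseline.copy()
noisy[100] += 500
cleaned = despike(noisy)
print(np.max(np.abs(cleaned - baseline)))  # small compared with the 500-count spike
```

Working on first differences rather than raw intensities keeps broad Raman bands from being flagged, since only abrupt point-to-point jumps stand out from the MAD.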

In [27]:
whos

Variable                             Type       Data/Info
---------------------------------------------------------
aceta_batch1_3min_mapping            ndarray    2500x1600: 4000000 elems, type `float64`, 32000000 bytes (30.517578125 Mb)
aceta_batch2_3min_mapping            ndarray    2500x1600: 4000000 elems, type `float64`, 32000000 bytes (30.517578125 Mb)
aceta_batch3_3min_mapping            ndarray    2500x1600: 4000000 elems, type `float64`, 32000000 bytes (30.517578125 Mb)
car_aceta_batch1_3min_mapping        ndarray    2500x1600: 4000000 elems, type `float64`, 32000000 bytes (30.517578125 Mb)
car_aceta_batch2_3min_mapping        ndarray    2500x1600: 4000000 elems, type `float64`, 32000000 bytes (30.517578125 Mb)
car_aceta_batch3_3min_mapping        ndarray    2500x1600: 4000000 elems, type `float64`, 32000000 bytes (30.517578125 Mb)
car_batch1_3min_mapping              ndarray    2500x1600: 4000000 elems, type `float64`, 32000000 bytes (30.517578125 Mb)
car_batch2_3min_mapping

### 1

In [None]:
car_collected_preprocessed = np.zeros((7500, 1600))
car_collected_preprocessed[0:2500] = rpd.remove_cosmic_rays(car_batch1_3min_mapping, 7)
car_collected_preprocessed[2500:5000] = rpd.remove_cosmic_rays(car_batch2_3min_mapping, 7)
car_collected_preprocessed[5000:] = rpd.remove_cosmic_rays(car_batch3_3min_mapping, 7)
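The preallocate-and-slice pattern above hard-codes 2500 spectra per recording, which does not fit the 32X32 mixtures (1024 spectra per recording). A hedged alternative is to preprocess each recording and concatenate along the spectrum axis, so the output size follows from the inputs. collect_recordings is a hypothetical helper; the identity function stands in for the real preprocessing, and the stand-in arrays use 16 spectral points instead of 1600 to keep the demo light.

```python
from typing import Callable, Sequence

import numpy as np

def collect_recordings(recordings: Sequence[np.ndarray],
                       preprocess: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Preprocess each (n_spectra, n_points) recording and stack them row-wise."""
    n_points = {r.shape[1] for r in recordings}
    if len(n_points) != 1:
        raise ValueError(f"recordings differ in spectral length: {sorted(n_points)}")
    return np.concatenate([preprocess(r) for r in recordings], axis=0)

def identity(x: np.ndarray) -> np.ndarray:
    """Placeholder for e.g. lambda x: rpd.remove_cosmic_rays(x, 7)."""
    return x

# Stand-ins: three 50X50 recordings and seven 32X32 recordings
rng = np.random.default_rng(0)
fifty = [rng.random((2500, 16)) for _ in range(3)]
thirty_two = [rng.random((1024, 16)) for _ in range(7)]

print(collect_recordings(fifty, identity).shape)       # (7500, 16)
print(collect_recordings(thirty_two, identity).shape)  # (7168, 16)
```

With it, each cell in this section reduces to one call, e.g. passing the seven thia_aceta recordings and lambda x: rpd.remove_cosmic_rays(x, 7).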

### 2

In [None]:
thia_collected_preprocessed = np.zeros((7500, 1600))
thia_collected_preprocessed[0:2500] = rpd.remove_cosmic_rays(thia_batch1_3min_mapping, 7)
thia_collected_preprocessed[2500:5000] = rpd.remove_cosmic_rays(thia_batch2_3min_mapping, 7)
thia_collected_preprocessed[5000:] = rpd.remove_cosmic_rays(thia_batch3_3min_mapping, 7)

### 4

In [None]:
aceta_collected_preprocessed = np.zeros((7500, 1600))
aceta_collected_preprocessed[0:2500] = rpd.remove_cosmic_rays(aceta_batch1_3min_mapping, 7)
aceta_collected_preprocessed[2500:5000] = rpd.remove_cosmic_rays(aceta_batch2_3min_mapping, 7)
aceta_collected_preprocessed[5000:] = rpd.remove_cosmic_rays(aceta_batch3_3min_mapping, 7)

### 1 + 2

In [None]:
car_thia_collected_preprocessed = np.zeros((7500, 1600))
car_thia_collected_preprocessed[0:2500] = rpd.remove_cosmic_rays(car_thia_batch1_3min_mapping, 7)
car_thia_collected_preprocessed[2500:5000] = rpd.remove_cosmic_rays(car_thia_batch2_3min_mapping, 7)
car_thia_collected_preprocessed[5000:] = rpd.remove_cosmic_rays(car_thia_batch3_3min_mapping, 7)

### 1 + 4

In [None]:
car_aceta_collected_preprocessed = np.zeros((7500, 1600))
car_aceta_collected_preprocessed[0:2500] = rpd.remove_cosmic_rays(car_aceta_batch1_3min_mapping, 7)
car_aceta_collected_preprocessed[2500:5000] = rpd.remove_cosmic_rays(car_aceta_batch2_3min_mapping, 7)
car_aceta_collected_preprocessed[5000:] = rpd.remove_cosmic_rays(car_aceta_batch3_3min_mapping, 7)

### 2 + 4

In [None]:
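# 2 + 4 was mapped at 32X32, so each of the 7 recordings holds 1024 spectra (7 x 1024 = 7168)
thia_aceta_recordings = [thia_aceta_batch1_3min_mapping, thia_aceta_batch2_3min_mapping,
                         thia_aceta_batch3_3min_mapping, thia_aceta_batch4_3min_mapping,
                         thia_aceta_batch5_3min_mapping, thia_aceta_batch6_3min_mapping,
                         thia_aceta_batch7_3min_mapping]
thia_aceta_collected_preprocessed = np.zeros((7168, 1600))
for i, recording in enumerate(thia_aceta_recordings):
    thia_aceta_collected_preprocessed[i * 1024:(i + 1) * 1024] = rpd.remove_cosmic_rays(recording, 7)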

### 1 + 2 + 4

In [None]:
# 1 + 2 + 4 was mapped at 32X32, so each of the 3 recordings holds 1024 spectra (3 x 1024 = 3072)
car_thia_aceta_collected_preprocessed = np.zeros((3072, 1600))
car_thia_aceta_collected_preprocessed[0:1024] = rpd.remove_cosmic_rays(car_thia_aceta_batch1_3min_mapping, 7)
car_thia_aceta_collected_preprocessed[1024:2048] = rpd.remove_cosmic_rays(car_thia_aceta_batch2_3min_mapping, 7)
car_thia_aceta_collected_preprocessed[2048:] = rpd.remove_cosmic_rays(car_thia_aceta_batch3_3min_mapping, 7)
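For the classification step, the per-sample matrices eventually need to be combined into one feature matrix with a label per spectrum. A minimal sketch, assuming one class per sample (each single analyte and each mixture treated as its own class); build_dataset, the class names and the random stand-in arrays are hypothetical, and a multi-label encoding of the mixtures would be another option.

```python
import numpy as np

def build_dataset(samples: dict[str, np.ndarray]) -> tuple[np.ndarray, np.ndarray, list[str]]:
    """Stack per-sample spectra into X and give every row the integer label of its sample."""
    class_names = list(samples)
    X = np.concatenate([samples[name] for name in class_names], axis=0)
    y = np.concatenate([np.full(len(samples[name]), label)
                        for label, name in enumerate(class_names)])
    return X, y, class_names

# Stand-ins with the real row counts but only 8 spectral points each
rng = np.random.default_rng(0)
sizes = {"1": 7500, "2": 7500, "4": 7500, "1 + 2": 7500,
         "1 + 4": 7500, "2 + 4": 7168, "1 + 2 + 4": 3072}
samples = {name: rng.random((n, 8)) for name, n in sizes.items()}

X, y, class_names = build_dataset(samples)
print(X.shape, y.shape)  # (47740, 8) (47740,)
print(np.bincount(y))    # spectra per class
```

np.bincount(y) also makes the class imbalance visible: 1 + 2 + 4 contributes 3072 spectra against 7500 for most classes, which may matter when splitting or weighting the data.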

### SVD on a single analyte ("1" here is a count, not analyte 1 = Carbendazim)

In [18]:
car_collected_preprocessed = np.zeros((7500, 1600))
car_collected_preprocessed[0:2500] = rpd.remove_cosmic_rays(car_batch1_3min_mapping, 7)
car_collected_preprocessed[2500:5000] = rpd.remove_cosmic_rays(car_batch2_3min_mapping, 7)
car_collected_preprocessed[5000:] = rpd.remove_cosmic_rays(car_batch3_3min_mapping, 7)

(7168, 1600)
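A minimal sketch of SVD-based PCA on one analyte's matrix with NumPy alone: mean-center the spectra, take a thin SVD, and read the explained-variance ratio from the squared singular values. The random stand-in matrix, pca_svd and the number of components kept are assumptions; in the notebook the input would be a matrix such as car_collected_preprocessed.

```python
import numpy as np

def pca_svd(X: np.ndarray, n_components: int):
    """PCA via SVD of the mean-centered data (rows = spectra, columns = spectral points)."""
    X_centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_ratio = S**2 / np.sum(S**2)
    scores = U[:, :n_components] * S[:n_components]  # projections onto the components
    components = Vt[:n_components]                    # loadings, one spectrum-shaped row each
    return scores, components, explained_ratio[:n_components]

# Stand-in: 500 "spectra" of 200 points built from 2 underlying shapes plus noise
rng = np.random.default_rng(0)
shapes = rng.random((2, 200))
weights = rng.random((500, 2))
X = weights @ shapes + 0.01 * rng.standard_normal((500, 200))

scores, components, ratio = pca_svd(X, n_components=3)
print(scores.shape, components.shape)  # (500, 3) (3, 200)
print(ratio.round(3))                  # first two components carry almost all the variance
```

full_matrices=False keeps U at (n_spectra, n_points) rather than (n_spectra, n_spectra), which matters for a 7500x1600 matrix.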