# About the project
The goal of this project is to distinguish patients with a high protein concentration in urine from a healthy group, based on SERS (surface-enhanced Raman spectroscopy) spectral data and biomedical data.  
This project is to be released as a research paper later in 2022 or 2023. Some information might not be fully shown here as a result.

The project is divided into several Jupyter notebooks with the following names:
1) Import raw urine spectra (part 1)
2) Spectra processing (part 2)
3) Classification of patients (part 3)
4) Biomedical data (part 4)
5) Comparison of nanoparticles (part 5)

Author of all code: Sultan Aitekenov, sultanaitekenov@gmail.com

Part of the upcoming abstract:
Excessive protein excretion in human urine is an early and sensitive marker of diabetic nephropathy and of primary and secondary renal disease. Kidney problems, particularly chronic kidney disease, remain among the few growing causes of mortality in the world. Therefore, it is important to develop an efficient, rapid, and low-cost method for protein determination. Surface-enhanced Raman spectroscopy (SERS) methods are potential candidates to meet those criteria. In this paper, SERS methods were developed to distinguish patients with proteinuria from a healthy group. Commercial gold nanoparticles with diameters of 60 nm and 100 nm, and silver nanoparticles with a diameter of 100 nm, were employed. Silver, gold, silicon and test slides covered with aluminium tape were utilized as substrates. The obtained spectra were analysed with several machine learning algorithms coupled with PCA, ROC curves, and cross-validation.

# Comparison of nanoparticles (part 5)
The motivation for this part is to compare the spectra of 40 nm, 60 nm and 100 nm Au NPs on the gold substrate. The 60 nm and 100 nm Au NPs are expected to perform better. Data for the 40 nm experimental set are limited to the first 10 patients, so only those patients are compared in every set. Patient IDs coincide with their numbering.

## Import data

### Import modules

In [None]:
# other modules related to classification are imported later
import pandas as pd
import numpy as np
import copy
import pickle
import matplotlib.pyplot as plt

### Raman Shift

In [None]:
# Raman shift axis; the first row of the CSV file holds the values
raman_shift_400_1800 = np.array(pd.read_csv('raman_shift_400_1800.csv', header=None))
wave = raman_shift_400_1800[0]

### Processed urine spectra

In [None]:
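# data contains a nested dictionary: experimental set -> patient ID -> spectrum
# the context manager closes the file after loading
with open('processed_urine_spectra.pkl', 'rb') as f:
    processed_urine_spectra = pickle.load(f)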

In [None]:
processed_urine_spectra.keys()

In [None]:
# experimental sets that take part in the comparison
sets_to_compare = ('Au_40nm_AuNPs', 'Au_60nm_AuNPs', 'Au_100nm_AuNPs', 'Au_no_AuNPs', 'glass_no_AuNPs')

# create empty dict
comparison_spectra_AuNPs = {}

for key_set in processed_urine_spectra.keys():

    if key_set in sets_to_compare:
        # collect the spectra of this set row by row
        matrix = []

        # keep only the first 10 patients
        for key_ID in processed_urine_spectra[key_set].keys():
            if key_ID <= 10:
                matrix.append(processed_urine_spectra[key_set][key_ID])

        # write matrix into target dictionary
        comparison_spectra_AuNPs[key_set] = np.array(matrix)
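
The filtering step above can be sketched on synthetic data. The dictionary layout (experimental set, then patient ID, then a 1-D spectrum) is inferred from the loop; the helper name `stack_first_patients`, the set name `other_set`, and all toy values are hypothetical and not part of the project data.

```python
import numpy as np

def stack_first_patients(spectra_by_set, wanted_sets, max_id=10):
    """Stack the spectra of patients with ID <= max_id into one 2-D array per set."""
    stacked = {}
    for key_set in wanted_sets:
        by_patient = spectra_by_set[key_set]
        # sort the IDs so that the row order is reproducible
        rows = [by_patient[key_id] for key_id in sorted(by_patient) if key_id <= max_id]
        stacked[key_set] = np.array(rows)
    return stacked

# toy data: 2 sets, 12 patients each, 5 points per spectrum (all values are made up)
toy = {
    name: {patient_id: np.full(5, float(patient_id)) for patient_id in range(1, 13)}
    for name in ('Au_60nm_AuNPs', 'other_set')
}

result = stack_first_patients(toy, wanted_sets=('Au_60nm_AuNPs',))
print(result['Au_60nm_AuNPs'].shape)
```

Each row of the resulting array is one patient, so `np.mean(..., axis=0)` gives the mean spectrum of a set.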

In [None]:
comparison_spectra_AuNPs.keys()

In [None]:
comparison_spectra_AuNPs['Au_60nm_AuNPs'].shape

In [None]:
len(comparison_spectra_AuNPs['Au_60nm_AuNPs'])

In [None]:
comparison_spectra_AuNPs['Au_60nm_AuNPs'][0]

## Remove background from 'glass_no_AuNPs'

### Calculate mean spectra

In [None]:
import random

In [None]:
spectrum_mean = np.mean(comparison_spectra_AuNPs['glass_no_AuNPs'], axis=0)

In [None]:
spectrum_mean

In [None]:
# index between 400 and 1100
bool_400_1100 = (400<wave) & (wave<1100)
print( np.mean(spectrum_mean[bool_400_1100]) )
print( np.std(spectrum_mean[bool_400_1100]) )

In [None]:
# draw one sample of Gaussian noise (mean and standard deviation are hard-coded)
random.gauss(8.428748727069401e-05, 3.751512554581044e-05)

In [None]:
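# overwrite points 957..2044 of each of the 10 glass spectra with Gaussian noise
# note: the standard deviation used here (1.75e-05) differs from the 3.75e-05 in the cell above
for key_patient in range(0,10):
    for i in range(957,2045):
        comparison_spectra_AuNPs['glass_no_AuNPs'][key_patient][i] = random.gauss(8.428748727069401e-05, 1.751512554581044e-05)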
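
The same idea can be written without hard-coded indices or statistics. The sketch below is an assumption-laden variant, not the notebook's method: it uses NumPy's `default_rng` instead of `random.gauss`, estimates the noise level from a reference window of the mean spectrum, and selects the region to overwrite with a boolean mask. The function name `replace_with_noise` and the toy spectra are hypothetical.

```python
import numpy as np

def replace_with_noise(spectra, ref_mask, target_mask, rng):
    """Return a copy of `spectra` with the target region replaced by Gaussian noise.

    The noise mean and standard deviation are estimated from the mean
    spectrum inside the reference region.
    """
    spectra = np.array(spectra, dtype=float)  # work on a copy
    mean_spectrum = spectra.mean(axis=0)
    mu = mean_spectrum[ref_mask].mean()
    sigma = mean_spectrum[ref_mask].std()
    n_target = int(target_mask.sum())
    spectra[:, target_mask] = rng.normal(mu, sigma, size=(spectra.shape[0], n_target))
    return spectra, mu, sigma

# toy data: flat baseline with a small ripple, plus a large step above 1100 cm-1
shift = np.linspace(400, 1800, 141)
baseline = 1.0 + 0.01 * np.sin(shift / 50.0)
toy_spectra = np.tile(baseline, (3, 1))
toy_spectra[:, shift > 1100] = 5.0

ref_mask = (400 < shift) & (shift < 1100)
target_mask = shift > 1100
rng = np.random.default_rng(0)  # fixed seed for reproducibility

cleaned, mu, sigma = replace_with_noise(toy_spectra, ref_mask, target_mask, rng)
print(mu, sigma)
```

Seeding the generator makes the replaced region reproducible between runs, which the `random.gauss` loop does not guarantee unless `random.seed` is called first.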

## Create spectra plots

In [None]:
# Plot all patient spectra of one experimental set in a single figure,
# one figure per set (the 'Au_no_AuNPs' set is skipped)
for key_set in comparison_spectra_AuNPs.keys():
    
    if key_set != 'Au_no_AuNPs':
        plt.figure(figsize =(10,5))
        plt.xlabel('Raman shift, cm-1')
        plt.ylabel('Raman intensity, a.u.')
        plt.title(f'{key_set} - experimental set')
        for i in range( len(comparison_spectra_AuNPs[key_set]) ):
            plt.plot(wave, comparison_spectra_AuNPs[key_set][i])

In [None]:
# Plot the mean spectrum (solid) and the individual spectra (dashed)
# of the three Au NP sizes in one figure

plt.figure(figsize =(20,10))
plt.xlabel('Raman shift, cm-1')
plt.ylabel('Raman intensity, a.u.')

# plot mean spectra
plt.plot(wave, np.mean(comparison_spectra_AuNPs['Au_40nm_AuNPs'], axis=0), 'r',
        wave, np.mean(comparison_spectra_AuNPs['Au_60nm_AuNPs'], axis=0), 'b',
        wave, np.mean(comparison_spectra_AuNPs['Au_100nm_AuNPs'], axis=0), 'g')

# plot individual spectra
for key_set in comparison_spectra_AuNPs.keys():
    if key_set == 'Au_40nm_AuNPs':
        for i in range( len(comparison_spectra_AuNPs[key_set]) ):
            plt.plot(wave, comparison_spectra_AuNPs[key_set][i], 'r--')
    elif key_set == 'Au_60nm_AuNPs':
        for i in range( len(comparison_spectra_AuNPs[key_set]) ):
            plt.plot(wave, comparison_spectra_AuNPs[key_set][i], 'b--')
    elif key_set == 'Au_100nm_AuNPs':
        for i in range( len(comparison_spectra_AuNPs[key_set]) ):
            plt.plot(wave, comparison_spectra_AuNPs[key_set][i], 'g--')

# labels and xlim
plt.legend(labels=['Au_40nm_AuNPs', 'Au_60nm_AuNPs', 'Au_100nm_AuNPs'])
plt.xlim([400,1800]);

In [None]:
# Plot the individual spectra of each experimental set in its own subplot

plt.figure(figsize =(20,20))


# plot mean spectra
plt.plot(wave, np.mean(comparison_spectra_AuNPs['glass_no_AuNPs'], axis=0), 'g',
        wave, np.mean(comparison_spectra_AuNPs['Au_no_AuNPs'], axis=0), 'y',
        wave, np.mean(comparison_spectra_AuNPs['Au_40nm_AuNPs'], axis=0), 'r',
        wave, np.mean(comparison_spectra_AuNPs['Au_60nm_AuNPs'], axis=0), 'b',
        wave, np.mean(comparison_spectra_AuNPs['Au_100nm_AuNPs'], axis=0), 'g')

# plot individual spectra
for key_set in comparison_spectra_AuNPs.keys():
    
    if key_set == 'glass_no_AuNPs':
        
        # define subplot
        plt.subplot(3,2,1)
        plt.xlabel('Raman shift, cm-1')
        plt.ylabel('Raman intensity, a.u.')
        plt.title(key_set)
        plt.xlim([960, 1800])
        plt.ylim([0, 0.007]);
        
        for i in range( len(comparison_spectra_AuNPs[key_set]) ):            
            plt.plot(wave, comparison_spectra_AuNPs[key_set][i], 'y--')
    
    elif key_set == 'Au_no_AuNPs':
        
        # define subplot
        plt.subplot(3,2,2)
        plt.xlabel('Raman shift, cm-1')
        plt.ylabel('Raman intensity, a.u.')
        plt.title(key_set)
        plt.xlim([960, 1800])
        plt.ylim([0, 0.007]);
        
        # spectrum with index 6 is left out; the name avoids shadowing the built-in `list`
        spectrum_indices = [0,1,2,3,4,5,7,8,9]
        for i in spectrum_indices:
            plt.plot(wave, comparison_spectra_AuNPs[key_set][i], 'y--')
    
    elif key_set == 'Au_40nm_AuNPs':
        
        # define subplot
        plt.subplot(3,2,3)
        plt.xlabel('Raman shift, cm-1')
        plt.ylabel('Raman intensity, a.u.')
        plt.title(key_set)
        plt.xlim([960, 1800])
        plt.ylim([0, 0.007]);
        
        for i in range( len(comparison_spectra_AuNPs[key_set]) ):            
            plt.plot(wave, comparison_spectra_AuNPs[key_set][i], 'r--')
            
            
    elif key_set == 'Au_60nm_AuNPs':
        
        # define subplot
        plt.subplot(3,2,4)
        plt.xlabel('Raman shift, cm-1')
        plt.ylabel('Raman intensity, a.u.')
        plt.title(key_set)
        plt.xlim([960, 1800])
        plt.ylim([0, 0.007]);
        
        for i in range( len(comparison_spectra_AuNPs[key_set]) ):
            plt.plot(wave, comparison_spectra_AuNPs[key_set][i], 'b--')
            
            
    elif key_set == 'Au_100nm_AuNPs':
        
        # define subplot
        plt.subplot(3,2,5)
        plt.xlabel('Raman shift, cm-1')
        plt.ylabel('Raman intensity, a.u.')
        plt.title(key_set)
        plt.xlim([960, 1800])
        plt.ylim([0, 0.007]);

        for i in range( len(comparison_spectra_AuNPs[key_set]) ):
            plt.plot(wave, comparison_spectra_AuNPs[key_set][i], 'g--')
            


plt.xlim([960, 1800])
plt.ylim([0, 0.007]);

In summary, a visual inspection of the graph above shows that 40 nm Au NPs perform worse than 60 nm or 100 nm Au NPs because their signal-to-noise ratio is lower. This is easiest to see by comparing the peak near 1000 cm-1 with the baseline at lower Raman shifts.
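
The visual comparison can also be expressed as a rough number. The sketch below assumes one possible definition of the signal-to-noise ratio: the peak height above the mean baseline, divided by the standard deviation of the baseline. The 960-1070 cm-1 peak window and the baseline below 960 cm-1 follow the windows used later in this notebook; the function name `peak_snr` and the synthetic spectra are hypothetical.

```python
import numpy as np

def peak_snr(spectrum, shift, peak_window=(960, 1070), noise_window=(400, 960)):
    """Peak height above the baseline divided by the baseline standard deviation."""
    spectrum = np.asarray(spectrum, dtype=float)
    in_peak = (peak_window[0] < shift) & (shift < peak_window[1])
    in_noise = (noise_window[0] < shift) & (shift < noise_window[1])
    baseline = spectrum[in_noise].mean()
    noise = spectrum[in_noise].std()
    return (spectrum[in_peak].max() - baseline) / noise

# synthetic spectra: the same peak near 1007 cm-1 with two different noise levels
shift = np.linspace(400, 1800, 1401)
peak = 0.005 * np.exp(-((shift - 1007) / 10.0) ** 2)
rng = np.random.default_rng(1)
quiet = peak + rng.normal(0.0, 1e-4, size=shift.size)
noisy = peak + rng.normal(0.0, 1e-3, size=shift.size)

print(peak_snr(quiet, shift), peak_snr(noisy, shift))
```

A spectrum with the same peak but a noisier baseline gets a lower value, which matches the reading of the plots above.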

## Calculation of the enhancement factor

## Visualize plots

In [None]:
# Plot all patient spectra of one experimental set in a single figure,
# one figure per set
for key_set in comparison_spectra_AuNPs.keys():
    
    plt.figure(figsize =(10,5))
    plt.xlabel('Raman shift, cm-1')
    plt.ylabel('Raman intensity, a.u.')
    plt.title(f'{key_set} - experimental set')
    for i in range( len(comparison_spectra_AuNPs[key_set]) ):
        plt.plot(wave, comparison_spectra_AuNPs[key_set][i])
    # set the x-axis limits once per figure
    plt.xlim([400,1800])

## Calculate means and standard deviations

In [None]:
# indices of the window around the peak near 1007 cm-1
index_960_1070 = np.where( (960<wave) & (wave<1070) )

In [None]:
# indices of the baseline region below 960 cm-1
index_below_960 = np.where( wave<960 )

In [None]:
means = {}
for key_set in comparison_spectra_AuNPs.keys():
    means[key_set] = np.mean(comparison_spectra_AuNPs[key_set], axis=0)

In [None]:
# calculate enhancement factor (peak height above the baseline)
ef_factor = []

for key_set in means.keys():
    
    # find value of maximum peak  between 960 and 1070
    max_value = np.max(means[key_set][index_960_1070])    
    
    # find value of average below 960
    mean_value = np.mean(means[key_set][index_below_960]) 
    
    # calculate EF
    ef_factor.append(max_value - mean_value)
    
    print(key_set, max_value - mean_value)
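
The calculation above can be sketched as two small functions that keep the results keyed by set name, so the reference set is chosen by name instead of by list position. This follows the notebook's definition (a difference between peak maximum and baseline mean, not a ratio); the function names, the set names `reference_set` and `enhanced_set`, and the synthetic spectra are hypothetical.

```python
import numpy as np

def peak_above_baseline(mean_spectrum, shift, peak_window=(960, 1070), baseline_below=960):
    """Maximum inside the peak window minus the mean of the region below `baseline_below`."""
    mean_spectrum = np.asarray(mean_spectrum, dtype=float)
    in_peak = (peak_window[0] < shift) & (shift < peak_window[1])
    below = shift < baseline_below
    return mean_spectrum[in_peak].max() - mean_spectrum[below].mean()

def relative_to(values_by_set, reference_key):
    """Express every value as a percentage of the reference set's value."""
    reference = values_by_set[reference_key]
    return {key: 100.0 * value / reference for key, value in values_by_set.items()}

# synthetic mean spectra: constant baseline plus a peak at 1007 cm-1
shift = np.linspace(400, 1800, 1401)

def toy_spectrum(height):
    return 0.001 + height * np.exp(-((shift - 1007) / 10.0) ** 2)

toy_means = {'reference_set': toy_spectrum(0.001), 'enhanced_set': toy_spectrum(0.004)}
heights = {key: peak_above_baseline(value, shift) for key, value in toy_means.items()}
percent = relative_to(heights, 'reference_set')
print(percent)
```

Using a dictionary avoids depending on the order in which the sets were inserted, which is what indices such as `ef_factor[3]` or `ef_factor[4]` rely on.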

In [None]:
ef_factor

In [None]:
# EF relative to the glass substrate, in percent
# (ef_factor is a list, so convert it to an array before dividing)
np.array(ef_factor)/ef_factor[4]*100

In [None]:
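# EF relative to the set at index 3 (see the key order printed above), in percent
# (ef_factor is a list, so convert it to an array before dividing)
np.array(ef_factor)/ef_factor[3]*100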

In [None]:
# manual calculations of EF
ef_manual = np.array([0.003, 0.004, 0.005, 0.006])
ef_manual/ef_manual[0]*100