# Part 1 Data Collection

## **ChEMBL Database**

The [*ChEMBL Database*](https://www.ebi.ac.uk/chembl/) contains curated bioactivity data for more than 2 million compounds. It is compiled from more than 76,000 documents and 1.2 million assays, and the data span 13,000 targets, 1,800 cells and 33,000 indications.
[Data as of March 25, 2020; ChEMBL version 26].

Install the ChEMBL web service package so that we can retrieve bioactivity data from the ChEMBL Database.

In [None]:
! pip install chembl_webresource_client

 **Importing libraries**

In [None]:
# Import necessary libraries
import pandas as pd
from chembl_webresource_client.new_client import new_client

**Target search for coronavirus**

In [None]:
# Target search for coronavirus
target = new_client.target
target_query = target.search('coronavirus')
targets = pd.DataFrame.from_dict(target_query)
targets

**Select and retrieve bioactivity data for *SARS coronavirus 3C-like proteinase* (fifth entry, index 4)**

In [None]:
selected_target = targets.target_chembl_id[4]
selected_target

Here, we will retrieve only bioactivity data for *coronavirus 3C-like proteinase* (CHEMBL3927) that are reported as IC$_{50}$ values in nM (nanomolar) unit.

In [None]:
activity = new_client.activity
res = activity.filter(target_chembl_id=selected_target).filter(standard_type="IC50")

In [None]:
df = pd.DataFrame.from_dict(res)

In [None]:
df

In [None]:
df.shape

In [None]:
df.standard_type.unique()
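
The query above filters on `standard_type` only, so the units are not checked. As a minimal sketch, assuming each activity record is a dict with a `standard_units` key, the rows reported in nM could be kept as shown below. The sample records and the helper name `keep_nanomolar` are made up for illustration.

```python
# Hypothetical sample records; real ones come from the ChEMBL activity query.
records = [
    {"molecule_chembl_id": "CHEMBL_A", "standard_value": "7200.0", "standard_units": "nM"},
    {"molecule_chembl_id": "CHEMBL_B", "standard_value": "1.5", "standard_units": "ug.mL-1"},
    {"molecule_chembl_id": "CHEMBL_C", "standard_value": None, "standard_units": "nM"},
]

def keep_nanomolar(rows):
    """Keep rows whose standard_units is exactly 'nM' (illustrative helper)."""
    return [row for row in rows if row.get("standard_units") == "nM"]

nm_records = keep_nanomolar(records)
print([row["molecule_chembl_id"] for row in nm_records])
```

On the DataFrame itself, the equivalent selection would be `df[df.standard_units == 'nM']`.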

The data have now been collected from the ChEMBL database. Next, we save them to a CSV file.

In [None]:
df.to_csv('bioactivity_data.csv', index=False)

For IC50 values, a lower concentration is generally preferred because it implies that a smaller amount of the drug is needed to achieve the desired effect. Conversely, a higher concentration would require a larger amount of the medication, which may not be feasible or practical in many situations.

**Copying the files to Drive**

In [None]:
from google.colab import drive
drive.mount('/content/gdrive/', force_remount=True)

Next, we create a **Bio_project** folder in our **Colab Notebooks** folder on Google Drive.

In [None]:
! mkdir "/content/gdrive/My Drive/Colab Notebooks/Bio_project"

Copy the file "bioactivity_data.csv" to the directory "/content/gdrive/My Drive/Colab Notebooks/Bio_project" on Google Drive.

In [None]:
! cp bioactivity_data.csv "/content/gdrive/My Drive/Colab Notebooks/Bio_project"

In [None]:
! ls -l "/content/gdrive/My Drive/Colab Notebooks/Bio_project"

Let's see the CSV files that we have so far.

In [None]:
! ls

Taking a glimpse of the **bioactivity_data.csv** file that we've just created.

In [None]:
! head bioactivity_data.csv

**Handling missing data**
If any compound has a missing value in the **standard_value** column, drop it.

In [None]:
df2= df[df.standard_value.notna()]
df2

For this dataset there appears to be no missing data, but the code cell above can be reused for the bioactivity data of other target proteins.

**Data pre-processing of the bioactivity data**

### **Labeling compounds as either being active, inactive or intermediate**
The bioactivity data are IC50 values in nM. Compounds with values of 1,000 nM or less are considered **active**, those with values of 10,000 nM or more are considered **inactive**, and those in between 1,000 and 10,000 nM are labeled **intermediate**.

In [None]:
bioactivity_class = []
for i in df2.standard_value:
  if float(i) >= 10000:
    bioactivity_class.append("inactive")
  elif float(i) <= 1000:
    bioactivity_class.append("active")
  else:
    bioactivity_class.append("intermediate")
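
The same thresholds can be wrapped in a small function, which makes the boundary behaviour easy to check. This is only a sketch: the name `classify_ic50` is an assumption for illustration, and it mirrors the comparisons in the loop above, so exactly 1,000 nM counts as active and exactly 10,000 nM as inactive.

```python
def classify_ic50(value_nm):
    """Label an IC50 value given in nM, mirroring the loop above."""
    value = float(value_nm)  # values may arrive as strings
    if value >= 10000:
        return "inactive"
    if value <= 1000:
        return "active"
    return "intermediate"

# Made-up values that include both boundaries
labels = [classify_ic50(v) for v in ["50", 1000, 5000.0, "10000", 250000]]
print(labels)
```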

Collect the molecule_chembl_id, canonical_smiles and standard_value columns into lists, alongside the bioactivity_class list built above.

In [None]:
mol_cid = []
for i in df2.molecule_chembl_id:
  mol_cid.append(i)

In [None]:
canonical_smiles = []
for i in df2.canonical_smiles:
  canonical_smiles.append(i)

In [None]:
standard_value= []
for i in df2.standard_value:
  standard_value.append(i)

Combining the lists into a DataFrame

In [None]:
data_tuples = list(zip(mol_cid, canonical_smiles, bioactivity_class, standard_value))
df3 = pd.DataFrame( data_tuples,  columns=['molecule_chembl_id', 'canonical_smiles', 'bioactivity_class', 'standard_value'])

In [None]:
df3

In [None]:
df3.shape

Alternative method

In [None]:
#selection = ['molecule_chembl_id', 'canonical_smiles', 'standard_value']
#df3 = df2[selection]
#df3

In [None]:
#df4=pd.concat([df3,pd.Series(bioactivity_class)], axis=1)
#df4

In [None]:
#df4.columns

In [None]:
#df4 = df4.rename(columns={0: 'bio_activity_class'})
#df4

In [None]:
df3.to_csv('bioactivity_preprocessed_data.csv', index=False)
df3

In [None]:
! ls -l

Let's copy the file to Google Drive.

In [None]:
! cp bioactivity_preprocessed_data.csv "/content/gdrive/My Drive/Colab Notebooks/Bio_project"

In [None]:
! ls "/content/gdrive/My Drive/Colab Notebooks/Bio_project"

# Part 2 Exploratory Data Analysis

In [None]:
from google.colab import drive
drive.mount('/content/gdrive/', force_remount=True)

In [None]:
! wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh
! chmod +x Miniconda3-py37_4.8.2-Linux-x86_64.sh
! bash ./Miniconda3-py37_4.8.2-Linux-x86_64.sh -b -f -p /usr/local
! conda install -c rdkit rdkit -y
import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')

**Load Bioactivity data**

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv("/content/bioactivity_preprocessed_data.csv")

In [None]:
df

**Calculate Lipinski descriptors**
Christopher Lipinski, a scientist at Pfizer, came up with a rule of thumb for evaluating the **druglikeness** of compounds. Druglikeness is based on Absorption, Distribution, Metabolism and Excretion (ADME), also known as the pharmacokinetic profile. Lipinski analyzed all orally active FDA-approved drugs to formulate what is known as the **Rule-of-Five** or **Lipinski's Rule**.

Lipinski's Rule states that an orally active drug generally has:
* Molecular weight ≤ 500 Dalton
* Octanol-water partition coefficient (LogP) ≤ 5
* Hydrogen bond donors ≤ 5
* Hydrogen bond acceptors ≤ 10
In [None]:
! pip install rdkit

In [None]:
#importing the libraries

import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

In [None]:
import numpy as np
import pandas as pd
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def lipinski(smiles, verbose=False):
    # Compute the four Lipinski descriptors for each SMILES string.
    # An invalid SMILES string yields a row of NaN so that the output
    # stays aligned, row by row, with the input.
    rows = []
    n_valid = 0
    for elem in smiles:
        mol = Chem.MolFromSmiles(elem)
        if mol is None:
            print(f"Invalid SMILES string: {elem}")
            rows.append([np.nan] * 4)
            continue
        n_valid += 1
        rows.append([
            Descriptors.MolWt(mol),       # molecular weight
            Descriptors.MolLogP(mol),     # octanol-water partition coefficient
            Lipinski.NumHDonors(mol),     # hydrogen bond donors
            Lipinski.NumHAcceptors(mol),  # hydrogen bond acceptors
        ])

    if n_valid == 0:
        print("No valid molecules found. Check your input SMILES strings.")
        return None

    columnNames = ["MW", "LogP", "NumHDonors", "NumHAcceptors"]
    descriptors = pd.DataFrame(data=rows, columns=columnNames)

    return descriptors


In [None]:
a=df.canonical_smiles
smiles_list=list(a)
print(smiles_list)

descriptors = lipinski(smiles_list)
descriptors

In [None]:
df_lipinski = lipinski(df.canonical_smiles)

**Combining the dataframes**

In [None]:
df_lipinski
# MW refers to molecular weight
# LogP refers to the octanol-water partition coefficient (a measure of solubility)
# NumHDonors refers to hydrogen bond donors
# NumHAcceptors refers to hydrogen bond acceptors

In [None]:
df

In [None]:
df_lipinski.columns

In [None]:
df.columns

In [None]:
df_combine=pd.concat([df,df_lipinski],axis=1)

In [None]:
df_combine

**Convert IC50 to pIC50**
To allow **IC50** data to be more uniformly distributed, we will convert **IC50** to the negative logarithmic scale which is essentially **-log10(IC50)**.

This custom function pIC50() will accept a DataFrame as input and will:
* Take the IC50 values from the ``standard_value_norm`` column and convert them from nM to M by multiplying by 10$^{-9}$
* Take the molar value and apply -log10
* Delete the ``standard_value_norm`` column and create a new ``pIC50`` column
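
As a quick check of the conversion described above, the sketch below applies it to two values in plain Python. The helper name `to_pic50` is an assumption for illustration. Multiplying by 10$^{-9}$ and taking -log10 is the same as computing 9 - log10(IC50 in nM).

```python
import math

def to_pic50(ic50_nm):
    """Convert an IC50 in nM to pIC50 = -log10(IC50 in M)."""
    molar = ic50_nm * 1e-9  # nM -> M
    return -math.log10(molar)

for ic50_nm in (1000, 10000):
    print(ic50_nm, round(to_pic50(ic50_nm), 3))
```

The activity thresholds of 1,000 nM and 10,000 nM therefore correspond to pIC50 values of 6 and 5.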

In [None]:
import numpy as np

def pIC50(input):
    pIC50 = []

    for i in input['standard_value_norm']:
        molar = i*(10**-9) # Converts nM to M
        pIC50.append(-np.log10(molar))

    input['pIC50'] = pIC50
    # Name the column by keyword; newer pandas versions no longer accept the axis positionally
    x = input.drop(columns='standard_value_norm')

    return x

Point to note: values greater than 100,000,000 nM will be capped at 100,000,000 nM (pIC50 = 1). Without a cap, values above 1,000,000,000 nM would give a negative pIC50.

In [None]:
df_combine.standard_value.describe()

In [None]:
-np.log10( (10**-9)* 10000000000 )

In [None]:
-np.log10( (10**-9)* 100000000 )

In [None]:
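def norm_value(input):
    norm = []

    for i in input['standard_value']:
        if i > 100000000:
          i = 100000000  # cap very large IC50 values
        norm.append(i)

    input['standard_value_norm'] = norm
    # Name the column by keyword; newer pandas versions no longer accept the axis positionally
    x = input.drop(columns='standard_value')

    return x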

In [None]:
df_norm = norm_value(df_combine)
df_norm

In [None]:
df_norm.to_csv('corona_bioactivity_data.csv', index=False)

In [None]:
!cp corona_bioactivity_data.csv "/content/gdrive/My Drive/Colab Notebooks/Bio_project"

In [None]:
df_norm.standard_value_norm.describe()

In [None]:
df_final = pIC50(df_norm)
df_final

In [None]:
df_final.pIC50.describe()

**Removing the 'intermediate' bioactivity class**

Here, we will be removing the ``intermediate`` class from our data set.

In [None]:
df_2class = df_final[df_final.bioactivity_class != 'intermediate']
df_2class

In [None]:
df_2class.to_csv('corona_bioactivity_data.csv', index=False)

**Exploratory Data Analysis (Chemical Space Analysis) via Lipinski descriptors**

In [None]:
import seaborn as sns
sns.set(style='ticks')
import matplotlib.pyplot as plt

**Frequency plot of the 2 bioactivity classes**

In [None]:
plt.figure(figsize=(5.5, 5.5))

sns.countplot(x='bioactivity_class', data=df_2class, edgecolor='black')

plt.xlabel('Bioactivity class', fontsize=14, fontweight='bold')
plt.ylabel('Frequency', fontsize=14, fontweight='bold')

plt.savefig('plot_bio_activity_class.pdf')

**Scatter plot of MW versus LogP**

It can be seen that the 2 bioactivity classes are spanning similar chemical spaces as evident by the scatter plot of MW vs LogP.

In [None]:
plt.figure(figsize=(5.5, 5.5))

sns.scatterplot(x='MW', y='LogP', data=df_2class, hue='bioactivity_class', size='pIC50', edgecolor='black', alpha=0.7)

plt.xlabel('MW', fontsize=14, fontweight='bold')
plt.ylabel('LogP', fontsize=14, fontweight='bold')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0)
plt.savefig('plot_MW_vs_LogP.pdf')

**pIC50 Value**

In [None]:
plt.figure(figsize=(5.5, 5.5))

sns.boxplot(x = 'bioactivity_class', y = 'pIC50', data = df_2class)

plt.xlabel('Bioactivity class', fontsize=14, fontweight='bold')
plt.ylabel('pIC50 value', fontsize=14, fontweight='bold')

plt.savefig('plot_ic50.pdf')

In [None]:
def mannwhitney(descriptor, verbose=False):
  # https://machinelearningmastery.com/nonparametric-statistical-significance-tests-in-python/
  from numpy.random import seed
  from scipy.stats import mannwhitneyu

  # seed the random number generator
  seed(1)

  # actives and inactives
  selection = [descriptor, 'bioactivity_class']
  df = df_2class[selection]
  active = df[df.bioactivity_class == 'active'][descriptor]
  inactive = df[df.bioactivity_class == 'inactive'][descriptor]

  # compare samples
  stat, p = mannwhitneyu(active, inactive)
  #print('Statistics=%.3f, p=%.3f' % (stat, p))

  # interpret
  alpha = 0.05
  if p > alpha:
    interpretation = 'Same distribution (fail to reject H0)'
  else:
    interpretation = 'Different distribution (reject H0)'

  results = pd.DataFrame({'Descriptor':descriptor,
                          'Statistics':stat,
                          'p':p,
                          'alpha':alpha,
                          'Interpretation':interpretation}, index=[0])
  filename = 'mannwhitneyu_' + descriptor + '.csv'
  results.to_csv(filename)

  return results
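
For intuition about what the test measures, the U statistic can be computed by hand. Over all active/inactive pairs, it counts how often the value from one group exceeds the value from the other, with ties counted as half. The sketch below is plain Python with made-up numbers and a hypothetical helper `u_statistic`. It does not compute a p-value, which is what `mannwhitneyu` adds.

```python
def u_statistic(sample_a, sample_b):
    """U for sample_a: pairs where a > b, with ties counted as 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

active = [6.2, 7.1, 6.8]      # made-up pIC50 values
inactive = [4.1, 4.9, 6.2]    # made-up pIC50 values
u_active = u_statistic(active, inactive)
u_inactive = u_statistic(inactive, active)
print(u_active, u_inactive)
```

The two values always add up to the number of pairs. A U close to either extreme indicates that one group tends to have larger values than the other.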

In [None]:
mannwhitney('pIC50')

In [None]:
plt.figure(figsize=(5.5, 5.5))

sns.boxplot(x = 'bioactivity_class', y = 'MW', data = df_2class)

plt.xlabel('Bioactivity class', fontsize=14, fontweight='bold')
plt.ylabel('MW', fontsize=14, fontweight='bold')

plt.savefig('plot_MW.pdf')

In [None]:
mannwhitney("MW")

**LogP**

In [None]:
plt.figure(figsize=(5.5, 5.5))

sns.boxplot(x = 'bioactivity_class', y = 'LogP', data = df_2class)

plt.xlabel('Bioactivity class', fontsize=14, fontweight='bold')
plt.ylabel('LogP', fontsize=14, fontweight='bold')

plt.savefig('plot_LogP.pdf')

In [None]:
mannwhitney('LogP')

**NumHDonors**

In [None]:
plt.figure(figsize=(5.5, 5.5))

sns.boxplot(x = 'bioactivity_class', y = 'NumHDonors', data = df_2class)

plt.xlabel('Bioactivity class', fontsize=14, fontweight='bold')
plt.ylabel('NumHDonors', fontsize=14, fontweight='bold')

plt.savefig('plot_NumHDonors.pdf')

In [None]:
mannwhitney("NumHDonors")

**NumHAcceptors**

In [None]:
plt.figure(figsize=(5.5, 5.5))

sns.boxplot(x = 'bioactivity_class', y = 'NumHAcceptors', data = df_2class)

plt.xlabel('Bioactivity class', fontsize=14, fontweight='bold')
plt.ylabel('NumHAcceptors', fontsize=14, fontweight='bold')

plt.savefig('plot_NumHAcceptors.pdf')

In [None]:
mannwhitney('NumHAcceptors')

**pIC50 values**

Looking at the pIC50 values, the actives and inactives show a statistically significant difference. This is to be expected, since threshold values (IC50 < 1,000 nM = Actives while IC50 > 10,000 nM = Inactives, corresponding to pIC50 > 6 = Actives and pIC50 < 5 = Inactives) were used to define actives and inactives.

**Lipinski's descriptors**

Of the 4 Lipinski descriptors (MW, LogP, NumHDonors and NumHAcceptors), only LogP exhibited no difference between the actives and inactives, while the other 3 descriptors (MW, NumHDonors and NumHAcceptors) show a statistically significant difference between actives and inactives.

In [None]:
# Zip all the output files
! zip -r results.zip . -i *.csv *.pdf

# Part 3 - Descriptor Calculation and Dataset Preparation

We will calculate molecular descriptors, which are quantitative descriptions of the compounds in the dataset.

Lipinski descriptors are a set of simple molecular descriptors that give a quick overview of the drug-like properties of a molecule. Christopher Lipinski analyzed a set of orally active drugs and identified four descriptors associated with drug-like properties. This led to the Rule of Five, under which compounds that pass the rule are expected to make good oral drugs.

Lipinski descriptors describe global features of a molecule: its size, its solubility, and the number of hydrogen bond donors and acceptors (its propensity to donate and accept hydrogen bonds). PubChem fingerprints, which we will also use for model building, describe local features of the molecule.

In [None]:
! wget https://github.com/dataprofessor/bioinformatics/raw/master/padel.zip
! wget https://github.com/dataprofessor/bioinformatics/raw/master/padel.sh

In [None]:
! unzip padel.zip

In [None]:
import pandas as pd

In [None]:
df3 = pd.read_csv('/content/corona_bioactivity_data.csv')
df3

In [None]:
selection = ['canonical_smiles','molecule_chembl_id']
df3_selection = df3[selection]
df3_selection.to_csv('molecule.smi', sep='\t', index=False, header=False)
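
The `.smi` file written above has no header and holds one molecule per line: the SMILES string, a tab, then the identifier. Here is a small sketch of that layout using only the standard library. The two molecules and their identifiers are placeholders, not rows from the dataset.

```python
import csv
import io

# Placeholder molecules (ethanol and benzene) with made-up identifiers
rows = [("CCO", "MOL_1"), ("c1ccccc1", "MOL_2")]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerows(rows)

smi_text = buffer.getvalue()
print(smi_text)
```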

In [None]:
! cat molecule.smi | head -5

In [None]:
! cat molecule.smi | wc -l

**Calculate fingerprint descriptors**

**Calculate PaDEL descriptors**

In [None]:
! cat padel.sh

In [None]:
! bash padel.sh

In [None]:
! ls -l

**Preparing the X and Y Data Matrices**

**X data matrix**

In [None]:
df3_X = pd.read_csv('descriptors_output.csv')

In [None]:
df3_X

In [None]:
df3_X = df3_X.drop(columns=['Name'])
df3_X

**Y variable**

The pIC50 values converted from IC50 in Part 2 are used as the Y variable.

In [None]:
df3_Y = df3['pIC50']
df3_Y

In [None]:
#Combining X and Y
dataset3 = pd.concat([df3_X,df3_Y], axis=1)
dataset3

In [None]:
dataset3.to_csv('corona_bioactivity_data_3class_pIC50_pubchem_fp.csv', index=False)

In [None]:
!cp corona_bioactivity_data_3class_pIC50_pubchem_fp.csv "/content/gdrive/My Drive/Colab Notebooks/Bio_project"


# Part 4 Regression Models with Random Forest

In [None]:
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

In [None]:
df = pd.read_csv('corona_bioactivity_data_3class_pIC50_pubchem_fp.csv')

In [None]:
df.to_csv("data.csv")

**Input Features**

In [None]:
X = df.drop('pIC50', axis=1)
X

In [None]:
Y = df.pIC50
Y

In [None]:
X.shape

In [None]:
Y.shape

In [None]:
# Remove low-variance features
from sklearn.feature_selection import VarianceThreshold
selection = VarianceThreshold(threshold=(.8 * (1 - .8)))
X = selection.fit_transform(X)
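
The threshold `.8 * (1 - .8)` equals 0.16. For a binary fingerprint bit that is set in a fraction p of the molecules, the variance is p(1 - p), so this threshold discards bits that take the same value in more than about 80% of the molecules. The sketch below checks this in plain Python. `bit_variance` is a hypothetical helper, and the comparison assumes that features are kept when their variance is above the threshold.

```python
def bit_variance(column):
    """Population variance of a 0/1 column, equal to p * (1 - p)."""
    p = sum(column) / len(column)
    return p * (1 - p)

threshold = 0.8 * (1 - 0.8)  # 0.16

balanced_bit = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]  # set in 50% of molecules
rare_bit = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]      # set in 10% of molecules

for name, column in [("balanced_bit", balanced_bit), ("rare_bit", rare_bit)]:
    variance = bit_variance(column)
    print(name, round(variance, 2), "keep" if variance > threshold else "drop")
```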

In [None]:
X.shape

In [None]:
X

**Splitting the dataset**

In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

In [None]:
X_train.shape, Y_train.shape

In [None]:
X_test.shape, Y_test.shape

In [None]:
X_train

**Building a Regression Model using Random Forest**

In [None]:
from sklearn.metrics import mean_squared_error


In [None]:
model_forest = RandomForestRegressor(n_estimators=10)
model_forest.fit(X_train, Y_train)


In [None]:
Y_pred = model_forest.predict(X_test)

In [None]:
mse_forest = mean_squared_error(Y_test, Y_pred)
print("MSE for Random Forest: ",mse_forest)

In [None]:
!pip install seaborn

In [None]:
!pip install  matplotlib

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(color_codes=True)
sns.set_style("white")

ax = sns.regplot(x=Y_test, y=Y_pred, scatter_kws={'alpha':0.4})
ax.set_xlabel('Experimental pIC50', fontsize='large', fontweight='bold')
ax.set_ylabel('Predicted pIC50', fontsize='large', fontweight='bold')
ax.set_xlim(0, 12)
ax.set_ylim(0, 12)
ax.figure.set_size_inches(5, 5)
plt.show()

# **ANN**

In [None]:
!pip install visualkeras==0.0.1

In [None]:
import visualkeras


In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense,Dropout

In [None]:
model = Sequential()
model.add(Dense(128, activation='relu', input_dim = X.shape[1]))
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))

model.add(Dense(1, activation = 'linear'))

In [None]:
from tensorflow.keras.utils import plot_model
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)

In [None]:
visualkeras.layered_view(model)

In [None]:
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
model.summary()

In [None]:
from tensorflow.keras.utils import plot_model
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)

In [None]:
history = model.fit(X_train, Y_train, validation_split=0.5, epochs =15)


In [None]:
ann_y_pred = model.predict(X_test)

In [None]:
from sklearn.metrics import r2_score
import numpy as np
from sklearn.metrics import mean_squared_error

# r2 = r2_score(Y_test, ann_y_pred)
mse_ann = mean_squared_error(Y_test, ann_y_pred)
root_mse = np.sqrt(mse_ann)

print('MSE score:', mse_ann)
print('RMSE score:', root_mse)
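
MSE and RMSE are easy to mix up when printing results. RMSE is the square root of MSE and is expressed in the same units as pIC50. The sketch below shows both in plain Python with made-up values.

```python
import math

y_true = [5.0, 6.0, 7.0, 8.0]   # made-up experimental pIC50 values
y_pred = [5.5, 5.0, 7.0, 9.5]   # made-up predictions

squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
mse = sum(squared_errors) / len(squared_errors)
rmse = math.sqrt(mse)

print("MSE: ", mse)
print("RMSE:", round(rmse, 4))
```

When comparing models, the same metric should be used for all of them.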

In [None]:
from matplotlib import pyplot as plt
# Plot the training and validation loss at each epoch
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'y', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

# **CNN**


In [None]:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.layers import Conv1D, MaxPool1D

from tensorflow.keras.optimizers import Adam


print(tf.__version__)

In [None]:
X.shape[1]

In [None]:
X

In [None]:
X_train.shape

In [None]:
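# Conv1D expects input shaped (samples, steps, channels)
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)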
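
`Conv1D` expects 3-D input of shape (samples, steps, channels), so each fingerprint row needs a trailing channel axis. The sketch below uses a zero-filled NumPy array with the same shape as in this notebook (16 training rows, 130 features) as a stand-in, and derives the target shape from the array instead of hard-coding it.

```python
import numpy as np

# Stand-in for the training matrix: 16 molecules x 130 fingerprint bits
features = np.zeros((16, 130))

# Add a trailing channel axis without hard-coding the sizes
features_3d = features.reshape(features.shape[0], features.shape[1], 1)
print(features_3d.shape)
```

Deriving the shape this way keeps the cell working when the train/test split or the variance filter yields a different number of rows or features.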

In [None]:
model_cnn = Sequential()
model_cnn.add(Conv1D(filters=64,kernel_size=3,activation='relu', input_shape =(X_train.shape[1],1)))
# model.add(MaxPool1D(pool_size=2))
model_cnn.add(Conv1D(filters=32, kernel_size=3, activation='relu'))
# model.add(MaxPool1D(pool_size=2))
# model.add(Conv1D(filters=64, kernel_size=2, activation='relu'))
# model_cnn.add(Conv1D(filters=16, kernel_size=2, activation='relu'))
model_cnn.add(Flatten())
model_cnn.add(Dense(16, activation='relu'))


model_cnn.add(Dense(1, activation='linear'))

In [None]:
from tensorflow.keras.utils import plot_model
plot_model(model_cnn, to_file='model_plot.png', show_shapes=True, show_layer_names=True)

In [None]:
model_cnn.summary()


In [None]:
model_cnn.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])
history_cnn=model_cnn.fit(X_train,Y_train,epochs=25, validation_split= 0.3,verbose=1)

In [None]:
cnn_y_pred = model_cnn.predict(X_test)


In [None]:
from sklearn.metrics import r2_score
# r2_score expects the true values first, then the predictions
r2 = r2_score(Y_test, cnn_y_pred)
print('R2 score:', r2)
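
Unlike MSE, R² is not symmetric in its arguments. The total sum of squares is computed around the mean of the true values, so swapping the two inputs changes the result. The sketch below shows this in plain Python with made-up values. `r_squared` is a hypothetical helper implementing 1 - SS_res / SS_tot.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [5.0, 6.0, 7.0, 8.0]   # made-up experimental pIC50 values
y_pred = [5.5, 5.0, 7.0, 9.5]   # made-up predictions

r2_correct = r_squared(y_true, y_pred)   # true values first
r2_swapped = r_squared(y_pred, y_true)   # swapped order gives a different number
print(r2_correct, r2_swapped)
```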

In [None]:
# r2 = r2_score(ann_y_pred, Y_test)
mse_cnn = mean_squared_error(Y_test, cnn_y_pred)
# root_mse = np.sqrt(mse_ann)

print('MSE score:', mse_cnn)

In [None]:
from matplotlib import pyplot as plt
# Plot the training and validation loss at each epoch
loss = history_cnn.history['loss']
val_loss = history_cnn.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'y', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

In [None]:
import matplotlib.pyplot as plt

# Compare the test-set MSE of the three models
model_names = ['ANN', 'CNN', 'Random Forest']
mse_values = [mse_ann, mse_cnn, mse_forest]

fig, ax = plt.subplots()
ax.bar(model_names, mse_values)
ax.set_ylabel('MSE')
ax.set_title('Loss on Different Models')
plt.show()