# Imported Modules

### pubchempy
The packages needed for this project include pubchempy, which lets us look up each statin on PubChem by name and retrieve its compound record, identified by a compound ID (CID). These records give us the SMILES strings of the compounds we want, which in this case are the requested statins.

### rdkit
The rdkit package gives us the Chem module, which converts a SMILES string into an RDKit molecule. From that molecule we compute each statin's fingerprint with rdmolops. RDKit also provides the DataStructs module, which we use to calculate the Tanimoto similarity between the statins' fingerprints.
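As a minimal sketch of the chain described above (SMILES string, then RDKit molecule, then fingerprint, then Tanimoto similarity), here it is on two small molecules. Ethanol and 1-propanol are chosen purely for illustration and are not part of the statin analysis; the calls are the same ones used in the cells below.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdmolops

# Illustrative molecules only (not statins): ethanol and 1-propanol
smiles = {"ethanol": "CCO", "propanol": "CCCO"}

# Step 1: SMILES string -> RDKit Mol object
mols = {name: Chem.MolFromSmiles(s) for name, s in smiles.items()}

# Step 2: Mol -> path-based RDKit fingerprint (an ExplicitBitVect)
fps = {name: rdmolops.RDKFingerprint(m) for name, m in mols.items()}

# Step 3: pair of fingerprints -> Tanimoto similarity between 0 and 1
score = DataStructs.TanimotoSimilarity(fps["ethanol"], fps["propanol"])
print(f"ethanol vs propanol: {score:.3f}")
```

A molecule compared with itself always scores 1.0; structurally related but different molecules land somewhere between 0 and 1.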

In [13]:
import pubchempy as pcp
from rdkit import Chem
from rdkit.Chem import rdmolops
from rdkit import DataStructs

# Compound IDs and SMILES
In this section we look up each statin on PubChem by name. `pcp.get_compounds` returns a list of matching compound records (each identified by its compound ID), and we then take the isomeric SMILES of the first match for each statin.

In [20]:
simvastin = pcp.get_compounds("Simvastatin", "name") # Look up by name; returns a list of matching Compound records
pravastatin = pcp.get_compounds("Pravastatin", "name")
atorvastatin = pcp.get_compounds("Atorvastatin", "name")
fluvastatin = pcp.get_compounds("Fluvastatin", "name")

# SMILES
simvastin_smiles = simvastin[0].isomeric_smiles # Isomeric SMILES of the first match for each statin
pravastatin_smiles = pravastatin[0].isomeric_smiles
atorvastatin_smiles = atorvastatin[0].isomeric_smiles
fluvastatin_smiles = fluvastatin[0].isomeric_smiles

print("Simvastatin :",simvastin_smiles)
print("Pravastatin :",pravastatin_smiles)
print("Atorvastatin :",atorvastatin_smiles)
print("Fluvastatin :",fluvastatin_smiles)

Simvastatin : CCC(C)(C)C(=O)O[C@H]1C[C@H](C=C2[C@H]1[C@H]([C@H](C=C2)C)CC[C@@H]3C[C@H](CC(=O)O3)O)C
Pravastatin : CC[C@H](C)C(=O)O[C@H]1C[C@@H](C=C2[C@H]1[C@H]([C@H](C=C2)C)CC[C@H](C[C@H](CC(=O)O)O)O)O
Atorvastatin : CC(C)C1=C(C(=C(N1CC[C@H](C[C@H](CC(=O)O)O)O)C2=CC=C(C=C2)F)C3=CC=CC=C3)C(=O)NC4=CC=CC=C4
Fluvastatin : CC(C)N1C2=CC=CC=C2C(=C1/C=C/[C@H](C[C@H](CC(=O)O)O)O)C3=CC=C(C=C3)F


# Converting SMILES to RDKit Molecules
In this section we convert each SMILES string into an RDKit molecule (a Mol object), which is the input RDKit needs to compute the statins' fingerprints.

In [22]:
simvastin_mol = Chem.MolFromSmiles(simvastin_smiles) # SMILES string -> RDKit Mol object
pravastatin_mol = Chem.MolFromSmiles(pravastatin_smiles)
atorvastatin_mol = Chem.MolFromSmiles(atorvastatin_smiles)
fluvastatin_mol = Chem.MolFromSmiles(fluvastatin_smiles)
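One detail worth guarding against: `Chem.MolFromSmiles` does not raise on a SMILES string it cannot parse; it logs an error and returns `None`, which would only surface later as a confusing error inside the fingerprint call. A minimal sketch of a check, using a hypothetical helper name `smiles_to_mol` and the Fluvastatin SMILES printed earlier:

```python
from rdkit import Chem


def smiles_to_mol(name, smiles):
    """Hypothetical helper: convert SMILES to a Mol, failing loudly on bad input."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # MolFromSmiles signals a parse failure by returning None
        raise ValueError(f"RDKit could not parse the SMILES for {name}: {smiles}")
    return mol


# Fluvastatin SMILES as printed by the PubChem cell above
checked_mol = smiles_to_mol(
    "Fluvastatin",
    "CC(C)N1C2=CC=CC=C2C(=C1/C=C/[C@H](C[C@H](CC(=O)O)O)O)C3=CC=C(C=C3)F",
)
print(checked_mol.GetNumAtoms())  # heavy atoms only (C24H26FNO4): prints 30
```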

# Fingerprints
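We compute a fingerprint for each statin with rdmolops.RDKFingerprint, RDKit's Daylight-like fingerprint based on hashing paths through the molecule. Printing a fingerprint only shows the ExplicitBitVect object and its memory address, not the bits it contains.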

In [24]:
simvastin_fp = rdmolops.RDKFingerprint(simvastin_mol) #Getting the fingerprint of the statin
pravastatin_fp = rdmolops.RDKFingerprint(pravastatin_mol)
atorvastatin_fp = rdmolops.RDKFingerprint(atorvastatin_mol)
fluvastatin_fp = rdmolops.RDKFingerprint(fluvastatin_mol)

print(simvastin_fp)
print(pravastatin_fp)
print(atorvastatin_fp)
print(fluvastatin_fp)

<rdkit.DataStructs.cDataStructs.ExplicitBitVect object at 0x000001AE8AB0A0A0>
<rdkit.DataStructs.cDataStructs.ExplicitBitVect object at 0x000001AE8AB09D20>
<rdkit.DataStructs.cDataStructs.ExplicitBitVect object at 0x000001AE8AB0A6C0>
<rdkit.DataStructs.cDataStructs.ExplicitBitVect object at 0x000001AE8AB0A570>
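To see what a fingerprint actually contains, the ExplicitBitVect methods `GetNumBits`, `GetNumOnBits` and `GetOnBits` are more informative than `print`. This sketch uses the Atorvastatin SMILES printed earlier and also spells out the path-length and size parameters of `RDKFingerprint`; the explicit values shown are the defaults, so both fingerprints should be identical.

```python
from rdkit import Chem
from rdkit.Chem import rdmolops

# Atorvastatin SMILES as printed by the PubChem cell above
atorvastatin = Chem.MolFromSmiles(
    "CC(C)C1=C(C(=C(N1CC[C@H](C[C@H](CC(=O)O)O)O)C2=CC=C(C=C2)F)C3=CC=CC=C3)C(=O)NC4=CC=CC=C4"
)

# Default settings, as used in the notebook
fp_default = rdmolops.RDKFingerprint(atorvastatin)
# Same fingerprint with the defaults written out: paths of 1 to 7 bonds, 2048 bits
fp_explicit = rdmolops.RDKFingerprint(atorvastatin, minPath=1, maxPath=7, fpSize=2048)

print("Total bits:   ", fp_default.GetNumBits())
print("Bits set:     ", fp_default.GetNumOnBits())
print("First on-bits:", list(fp_default.GetOnBits())[:10])
```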


# Tanimoto score
For each statin, we first build a dictionary that maps the names of the other statins to their fingerprints. We then loop over the dictionary's name/fingerprint pairs and, at each step, calculate the Tanimoto similarity between the fingerprint of the main statin (the statin we are comparing against) and the fingerprint of the statin we are comparing it to. Keeping the names in the dictionary lets each printed line state which pair was compared.

NB: The main statin is not included in its own dictionary, so it is never compared with itself.
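For binary fingerprints, the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number of bits set in either: T(A, B) = c / (a + b − c), where a and b are the bits set in A and B and c the bits they share. The pure-Python sketch below computes this on sets of on-bit indices with toy values (not taken from the statins); `tanimoto` is a hypothetical name, and returning 0.0 when both sets are empty is this sketch's own convention. Because the formula is symmetric in A and B, each pair of statins gets the same score whichever one is the main statin.

```python
def tanimoto(on_bits_a, on_bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(on_bits_a), set(on_bits_b)
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both are empty
    common = len(a & b)  # c: bits set in both
    return common / (len(a) + len(b) - common)  # c / (a + b - c)


# Toy fingerprints: 3 shared bits out of 5 distinct bits set overall
print(tanimoto({1, 4, 7, 9}, {1, 4, 9, 12}))  # prints 0.6
```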

The same process is repeated for each statin:

### 1. Simvastatin

In [33]:
#Calculate Tanimoto score for Simvastatin
statins_comp_one = {"Pravastatin" : pravastatin_fp, "Atorvastatin" : atorvastatin_fp, "Fluvastatin" : fluvastatin_fp}

for statin, fingerprint in statins_comp_one.items():
    simvastatin_tanimoto_score = DataStructs.TanimotoSimilarity(simvastin_fp, fingerprint)
    print("Similarity between Simvastatin and", statin, "is:", simvastatin_tanimoto_score)


Similarity between Simvastatin and Pravastatin is: 0.8719879518072289
Similarity between Simvastatin and Atorvastatin is: 0.3004318322023442
Similarity between Simvastatin and Fluvastatin is: 0.2929368029739777


### 2. Pravastatin

In [35]:
#Calculate Tanimoto score for Pravastatin
statins_comp_two = {"Simvastatin" : simvastin_fp, "Atorvastatin" : atorvastatin_fp, "Fluvastatin" : fluvastatin_fp}

for statin, fingerprint in statins_comp_two.items():
    pravastatin_tanimoto_score = DataStructs.TanimotoSimilarity(pravastatin_fp, fingerprint)
    print("Similarity between Pravastatin and", statin, "is:", pravastatin_tanimoto_score)


Similarity between Pravastatin and Simvastatin is: 0.8719879518072289
Similarity between Pravastatin and Atorvastatin is: 0.30357142857142855
Similarity between Pravastatin and Fluvastatin is: 0.3044776119402985


### 3. Atorvastatin

In [37]:
#Calculate Tanimoto score for Atorvastatin
statins_comp_three = {"Simvastatin" : simvastin_fp,"Pravastatin" : pravastatin_fp,"Fluvastatin" : fluvastatin_fp}

for statin, fingerprint in statins_comp_three.items():
    atorvastatin_tanimoto_score = DataStructs.TanimotoSimilarity(atorvastatin_fp, fingerprint)
    print("Similarity between Atorvastatin and", statin, "is:", atorvastatin_tanimoto_score)


Similarity between Atorvastatin and Simvastatin is: 0.3004318322023442
Similarity between Atorvastatin and Pravastatin is: 0.30357142857142855
Similarity between Atorvastatin and Fluvastatin is: 0.5218404193360513


### 4. Fluvastatin

In [39]:
#Calculate Tanimoto score for Fluvastatin
statins_comp_four = {"Simvastatin" : simvastin_fp,"Pravastatin" : pravastatin_fp,"Atorvastatin" : atorvastatin_fp}

for statin, fingerprint in statins_comp_four.items():
    fluvastatin_tanimoto_score = DataStructs.TanimotoSimilarity(fluvastatin_fp, fingerprint)
    print("Similarity between Fluvastatin and", statin, "is:", fluvastatin_tanimoto_score)

Similarity between Fluvastatin and Simvastatin is: 0.2929368029739777
Similarity between Fluvastatin and Pravastatin is: 0.3044776119402985
Similarity between Fluvastatin and Atorvastatin is: 0.5218404193360513
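The four cells above differ only in which statin is the main statin. As a sketch of one way to collapse them, the hypothetical helper `compare_each_against_rest` below loops over every statin, leaves it out of the list it is compared with, and scores it against the rest in one call to `DataStructs.BulkTanimotoSimilarity` (which compares one fingerprint with a list of fingerprints). It rebuilds the fingerprints from the SMILES printed earlier so that it runs on its own.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdmolops

# SMILES as printed by the PubChem cell earlier in this notebook
statin_smiles = {
    "Simvastatin": "CCC(C)(C)C(=O)O[C@H]1C[C@H](C=C2[C@H]1[C@H]([C@H](C=C2)C)CC[C@@H]3C[C@H](CC(=O)O3)O)C",
    "Pravastatin": "CC[C@H](C)C(=O)O[C@H]1C[C@@H](C=C2[C@H]1[C@H]([C@H](C=C2)C)CC[C@H](C[C@H](CC(=O)O)O)O)O",
    "Atorvastatin": "CC(C)C1=C(C(=C(N1CC[C@H](C[C@H](CC(=O)O)O)O)C2=CC=C(C=C2)F)C3=CC=CC=C3)C(=O)NC4=CC=CC=C4",
    "Fluvastatin": "CC(C)N1C2=CC=CC=C2C(=C1/C=C/[C@H](C[C@H](CC(=O)O)O)O)C3=CC=C(C=C3)F",
}

# Same fingerprint type as the notebook (default RDKFingerprint settings)
fingerprints = {
    name: rdmolops.RDKFingerprint(Chem.MolFromSmiles(smi))
    for name, smi in statin_smiles.items()
}


def compare_each_against_rest(fps):
    """Return {main statin: {other statin: Tanimoto score}} for every statin."""
    results = {}
    for query, query_fp in fps.items():
        others = [name for name in fps if name != query]  # leave the main statin out
        scores = DataStructs.BulkTanimotoSimilarity(query_fp, [fps[n] for n in others])
        results[query] = dict(zip(others, scores))
    return results


for query, scores in compare_each_against_rest(fingerprints).items():
    for other, score in scores.items():
        print(f"Similarity between {query} and {other} is: {score:.3f}")
```

Returning a nested dictionary, rather than only printing, keeps the scores available for later steps such as picking each statin's closest match.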


# Closest Match
Here we list, for each statin, the statin it is most similar to. A higher Tanimoto similarity means the two fingerprints share a larger fraction of their set bits, so the closest match is the statin with the highest score.
### 1. Simvastatin
Simvastatin's closest match is Pravastatin, with a Tanimoto score of 0.872, far higher than its scores against Atorvastatin (0.300) and Fluvastatin (0.293).

### 2. Pravastatin
As noted above, Pravastatin's closest match is Simvastatin, with a Tanimoto score of 0.872 (compared with 0.304 against both Atorvastatin and Fluvastatin).

### 3. Atorvastatin
Atorvastatin's closest match is Fluvastatin, with a Tanimoto score of 0.522 (compared with 0.300 against Simvastatin and 0.304 against Pravastatin).

### 4. Fluvastatin
As noted above, Fluvastatin's closest match is Atorvastatin, with a Tanimoto score of 0.522 (compared with 0.293 against Simvastatin and 0.304 against Pravastatin).

# Challenges
The main challenge I encountered was displaying, inside the loop, the names of the statins whose similarity was being calculated, since a fingerprint does not carry the statin's name. I solved this by storing each statin's name and fingerprint together in a dictionary; looping over its key-value pairs made it easy to print the names of the statins being compared alongside their similarity value.

# Potential Improvements
It would have been better to determine each statin's closest match programmatically, for example by taking the highest score from each comparison, instead of reading it off the printed values manually.
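A minimal sketch of that improvement: store each unordered pair's score once (the Tanimoto coefficient is symmetric) and take the maximum for each statin. The scores here are copied from the printed outputs above; in the notebook they would come straight from the comparison loops instead. The helper name `closest_matches` is hypothetical.

```python
# Tanimoto scores copied from the outputs above, stored once per unordered pair
pair_scores = {
    frozenset({"Simvastatin", "Pravastatin"}): 0.8719879518072289,
    frozenset({"Simvastatin", "Atorvastatin"}): 0.3004318322023442,
    frozenset({"Simvastatin", "Fluvastatin"}): 0.2929368029739777,
    frozenset({"Pravastatin", "Atorvastatin"}): 0.30357142857142855,
    frozenset({"Pravastatin", "Fluvastatin"}): 0.3044776119402985,
    frozenset({"Atorvastatin", "Fluvastatin"}): 0.5218404193360513,
}

statins = ["Simvastatin", "Pravastatin", "Atorvastatin", "Fluvastatin"]


def closest_matches(names, scores):
    """Map each statin to (most similar other statin, its Tanimoto score)."""
    best = {}
    for name in names:
        others = [other for other in names if other != name]
        # Highest Tanimoto score = closest match
        match = max(others, key=lambda other: scores[frozenset({name, other})])
        best[name] = (match, scores[frozenset({name, match})])
    return best


for name, (match, score) in closest_matches(statins, pair_scores).items():
    print(f"{name}: closest match is {match} ({score:.3f})")
```

Run on these scores, it reproduces the manual conclusions in the Closest Match section: Simvastatin and Pravastatin pair up at 0.872, and Atorvastatin and Fluvastatin at 0.522.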