
# **Table of Contents**

- [1. Introduction](#1-introduction)
- [2. Objective](#2-objective)
- [3. Methodology](#3-methodology)
  - [3.1 Data Collection](#31-data-collection)
  - [3.2 Molecular Representation](#32-molecular-representation)
  - [3.3 Fingerprint Generation](#33-fingerprint-generation)
  - [3.4 Tanimoto Score](#34-tanimoto-score)
- [4. Analysis](#4-analysis)
- [5. Conclusion](#5-conclusion)




# __Imports__


In [1]:
# Installation
%pip install rdkit-pypi
%pip install pubchempy


# Importing libraries

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import DataStructs
import pubchempy as pcp



# __1. Introduction__

Molecular fingerprinting represents molecules as fixed-length binary vectors (bit strings), enabling efficient comparison of molecular structures. In this assignment, molecular fingerprints are used to calculate Tanimoto similarity scores between four statins: Simvastatin, Pravastatin, Atorvastatin, and Fluvastatin.

# __2. Objective__
The objective of this assignment is to calculate and analyze the pairwise Tanimoto similarity scores between the selected statins based on their molecular fingerprints. This analysis will help in understanding the structural similarity among these compounds.

# __3. Methodology__

### __3.1 Data Collection__
SMILES strings for the statins were fetched from the PubChem database using the PubChemPy library. These strings provide a textual representation of the molecular structures.

In [2]:
# List of statin names
statins = ["Simvastatin", "Pravastatin", "Atorvastatin", "Fluvastatin"]

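# Fetch SMILES strings using PubChemPy
smiles_dict = {}
for statin in statins:
    results = pcp.get_compounds(statin, 'name')
    # Fail with a clear message if PubChem returns no match for the name
    if not results:
        raise ValueError(f"No PubChem record found for {statin!r}")
    # Use the first record returned for the name
    smiles_dict[statin] = results[0].isomeric_smiles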

### __3.2 Molecular Representation__
The SMILES strings are converted to molecular representations using the RDKit library. These representations are used to generate molecular fingerprints.

In [3]:
# Convert SMILES to molecular representations using RDKit
molecules = {name: Chem.MolFromSmiles(smiles) for name, smiles in smiles_dict.items()}

### __3.3 Fingerprint Generation__


Molecular fingerprints are generated using the RDKit library's Morgan fingerprint algorithm. These fingerprints are binary vectors that capture structural features of the molecules.
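
As a rough illustration of how structural features end up as bits, the sketch below folds a list of feature labels into a fixed-length bit vector. It is a toy model only: the feature strings and the `fold_features` helper are hypothetical, and the SHA-256 hashing used here is an assumption made for the example, not the hashing RDKit uses. What it shares with a Morgan fingerprint is the general idea that each feature (in the real algorithm, a circular atom environment) is hashed to a position in a vector of `n_bits` bits.

```python
import hashlib


def fold_features(features, n_bits=2048):
    """Toy folding: hash each feature label to one position in a bit vector."""
    bits = [0] * n_bits
    for feature in features:
        # Deterministic hash of the label (Python's built-in hash() is salted per run)
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        position = int.from_bytes(digest[:4], "big") % n_bits
        bits[position] = 1  # distinct features may collide on the same bit
    return bits


# Hypothetical feature labels standing in for atom environments
toy_fp = fold_features(["C-O", "C=O", "c1ccccc1"], n_bits=2048)
print(len(toy_fp), sum(toy_fp))
```

Because several features can hash to the same position, the number of set bits can be smaller than the number of features; a larger `n_bits` makes such collisions less likely.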

In [4]:
# Use RDKit's Morgan bit-vector fingerprint function
fp_generator = AllChem.GetMorganFingerprintAsBitVect

# Generate a 2048-bit Morgan fingerprint with radius 2 for each molecule
fingerprints = {name: fp_generator(mol, radius=2, nBits=2048) for name, mol in molecules.items()}


### __3.4 Tanimoto Score__
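For two bit-vector fingerprints, the Tanimoto score is the number of bits set in both divided by the number of bits set in either. The sketch below shows that calculation on plain Python sets of on-bit positions. The `tanimoto` helper and the example bit positions are hypothetical, and returning 0.0 when both fingerprints are empty is a convention assumed for this sketch.

```python
def tanimoto(on_bits_a, on_bits_b):
    """Tanimoto score for two fingerprints given as collections of on-bit positions."""
    a, b = set(on_bits_a), set(on_bits_b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty fingerprints
    return len(a & b) / len(union)


# Hypothetical on-bit positions for two small fingerprints
fp_a = {1, 5, 9, 12}
fp_b = {5, 9, 20}

# 2 shared bits out of 5 distinct bits
print(f"{tanimoto(fp_a, fp_b):.4f}")  # prints 0.4000
```

The cell that follows applies the same measure to the RDKit fingerprints through `DataStructs.TanimotoSimilarity`.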


In [5]:
# Calculate Tanimoto scores between every possible pair
tanimoto_scores = {}
names = list(fingerprints.keys())

for i in range(len(names)):
    for j in range(i+1, len(names)):
        name1, name2 = names[i], names[j]
        score = DataStructs.TanimotoSimilarity(fingerprints[name1], fingerprints[name2])
        tanimoto_scores[(name1, name2)] = score

# Print the Tanimoto scores
for pair, score in tanimoto_scores.items():
    print(f"Tanimoto score between {pair[0]} and {pair[1]}: {score:.4f}")


Tanimoto score between Simvastatin and Pravastatin: 0.3855
Tanimoto score between Simvastatin and Atorvastatin: 0.0755
Tanimoto score between Simvastatin and Fluvastatin: 0.0800
Tanimoto score between Pravastatin and Atorvastatin: 0.2366
Tanimoto score between Pravastatin and Fluvastatin: 0.1978
Tanimoto score between Atorvastatin and Fluvastatin: 0.4026


# __4. Analysis__


The Tanimoto scores measure the similarity between the molecular fingerprints of different statins. A score of 1 indicates identical fingerprints (which usually, but not always, means identical structures), while a score close to 0 indicates that the fingerprints share few or no features.

Simvastatin and Pravastatin (0.3855): The score suggests that these two statins have moderate structural similarity. Although their chemical makeup is different, they share some common molecular features.

Simvastatin and Atorvastatin (0.0755): The low score indicates that Simvastatin and Atorvastatin have very little structural similarity. This suggests that their molecular structures are quite different.

Simvastatin and Fluvastatin (0.0800): Similar to the previous comparison, the low score indicates that Simvastatin and Fluvastatin have very little in common at the molecular level.

Pravastatin and Atorvastatin (0.2366): This score is somewhat higher than the comparison between Simvastatin and Atorvastatin, indicating that Pravastatin shares more molecular features with Atorvastatin, but they are still not very similar.

Pravastatin and Fluvastatin (0.1978): This score is slightly lower than that of Pravastatin and Atorvastatin. Both scores are low, so Pravastatin shares only limited molecular features with either compound.

Atorvastatin and Fluvastatin (0.4026): This is the highest score among all pairs, suggesting that Atorvastatin and Fluvastatin are the most structurally similar pair in this set. The score likely reflects shared functional groups or common parts of their core structures.
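
The ranking discussed above can be reproduced directly from the printed scores. The sketch below hard-codes the six values reported in Section 3.4 and sorts them; the variable names are chosen for illustration only.

```python
# Pairwise Tanimoto scores as printed in Section 3.4
reported_scores = {
    ("Simvastatin", "Pravastatin"): 0.3855,
    ("Simvastatin", "Atorvastatin"): 0.0755,
    ("Simvastatin", "Fluvastatin"): 0.0800,
    ("Pravastatin", "Atorvastatin"): 0.2366,
    ("Pravastatin", "Fluvastatin"): 0.1978,
    ("Atorvastatin", "Fluvastatin"): 0.4026,
}

# Sort pairs from most to least similar
ranked = sorted(reported_scores.items(), key=lambda item: item[1], reverse=True)

for (name1, name2), score in ranked:
    print(f"{name1} / {name2}: {score:.4f}")

most_similar = ranked[0][0]
least_similar = ranked[-1][0]
```

In the notebook itself, the same sort could be applied to the `tanimoto_scores` dictionary instead of the hard-coded values.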

# __5. Conclusion__
Based on the Tanimoto scores, Atorvastatin and Fluvastatin are the most structurally similar pair (0.4026), while Simvastatin and Atorvastatin are the least similar (0.0755). Despite belonging to the same drug class, these statins vary considerably in molecular structure, which could contribute to differences in their pharmacological behavior, efficacy, and side effects. Quantifying such structural similarity is useful in drug discovery and development, particularly when investigating how compounds interact with biological targets.