| Tanimoto Score Range | Classification          | Interpretation                                        |
| -------------------- | ----------------------- | ----------------------------------------------------- |
| **≥ 0.85**           | **High similarity**     | Very close structural analogs; likely shared scaffold |
| **0.60 – < 0.85**    | **Moderate similarity** | Partial scaffold or substructure overlap              |
| **< 0.60**           | **Weak similarity**     | Structurally distinct                                 |
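
The thresholds in the table can also be applied programmatically. The following is a minimal sketch, assuming a score of exactly 0.85 is classified as high similarity. `classify_tanimoto` and the "Not scored" label are hypothetical and do not appear in the original notebook.

```python
import math


def classify_tanimoto(score):
    """Map a Tanimoto score to the classification labels used in the table."""
    # Unscored molecules (None or NaN) get their own label (an assumption of this sketch)
    if score is None or (isinstance(score, float) and math.isnan(score)):
        return "Not scored"
    if score >= 0.85:
        return "High similarity"
    if score >= 0.60:
        return "Moderate similarity"
    return "Weak similarity"
```

Applied to the score column produced later in the notebook, this could be used as `df[score_col].apply(classify_tanimoto)`.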


In [10]:
import pandas as pd
from rdkit import Chem
from rdkit import DataStructs

### Database 2 search: MCDB
### 3NOP similarity

In [11]:
# Load the data from the CSV file
csv_file = '//Users/randyaryee/Desktop/Selected_metabolites_docking_folder_Supantha/Tanimoto_similarity_check_on_Bovine_Milk_databases/MCDB/MCDB_Data_latest.csv'  # Replace with your actual CSV file path
df = pd.read_csv(csv_file)

In [12]:
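# Define a function to calculate Tanimoto similarity
def calculate_tanimoto(smiles1, smiles2):
    # Missing values (e.g. NaN from empty CSV cells) cannot be parsed as SMILES
    if not isinstance(smiles1, str) or not isinstance(smiles2, str):
        return None
    mol1 = Chem.MolFromSmiles(smiles1)
    mol2 = Chem.MolFromSmiles(smiles2)
    if mol1 is None or mol2 is None:
        return None  # Return None if either SMILES string is invalid
    # RDKit topological fingerprints; FingerprintSimilarity uses Tanimoto by default
    fp1 = Chem.RDKFingerprint(mol1)
    fp2 = Chem.RDKFingerprint(mol2)
    return DataStructs.FingerprintSimilarity(fp1, fp2)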

# Canonical SMILES of 3-Nitrooxypropanol
reference_smiles = '[N+](=O)([O-])OCCCO'

# Calculate the Tanimoto similarity for each molecule with 3-Nitrooxypropanol
tanimoto_scores = []
for idx, row in df.iterrows():
    molecule_smiles = row['Smiles']
    tanimoto_score = calculate_tanimoto(reference_smiles, molecule_smiles)
    tanimoto_scores.append(tanimoto_score)

# Add the Tanimoto scores as a new column to the DataFrame
df['Tanimoto_Similarity_with_3-Nitrooxypropanol'] = tanimoto_scores

# Save the result to a new Excel file
df.to_excel('MCDB_database_Tanimoto_Similarity_with_3-Nitrooxypropanol.xlsx', index=False)

print("Tanimoto similarity calculation complete. Results saved to 'MCDB_database_Tanimoto_Similarity_with_3-Nitrooxypropanol.xlsx'.")


### Sort all scored molecules and export to a new file
# Column name created in the previous step
score_col = 'Tanimoto_Similarity_with_3-Nitrooxypropanol'

# Ensure numeric (handles None / strings safely)
df[score_col] = pd.to_numeric(df[score_col], errors='coerce')

# Keep every valid score from 0.00 to 1.00 (inclusive), which drops unscored (NaN) rows,
# and sort descending. To keep only near-identical matches, raise the lower bound (e.g. 0.95).
df_sorted = (
    df.loc[df[score_col].between(0.0, 1.00, inclusive="both")]
      .sort_values(by=score_col, ascending=False)
)

# Export to a new Excel file
output_file = "MCDB_database_3NOP_tanimoto_score_Sorted.xlsx"
df_sorted.to_excel(output_file, index=False)

print(f"Exported {len(df_sorted)} rows to '{output_file}'.")





Tanimoto similarity calculation complete. Results saved to 'MCDB_database_Tanimoto_Similarity_with_3-Nitrooxypropanol.xlsx'.
Exported 2279 rows to 'MCDB_database_3NOP_tanimoto_score_Sorted.xlsx'.
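
For reference, the Tanimoto coefficient that `DataStructs.FingerprintSimilarity` returns by default is the number of bits set in both fingerprints divided by the number of bits set in either. The following is a minimal pure-Python sketch that uses sets of on-bit indices in place of RDKit fingerprints. The function name `tanimoto_from_bits` and the example bit sets are illustrative only, and returning 0.0 for two empty fingerprints is a convention chosen for this sketch.

```python
def tanimoto_from_bits(bits_a, bits_b):
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for two collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Both fingerprints are empty; 0.0 is an assumption made for this sketch
        return 0.0
    return len(a & b) / len(union)


# Three shared bits out of five distinct bits in total
score = tanimoto_from_bits({1, 4, 7, 9}, {4, 7, 9, 12})
print(score)  # 0.6
```

Under the classification table, a score of 0.6 sits at the lower edge of the moderate-similarity range.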
