# Planned Exploration

## Exploring SMILES for chemical molecules

* RDKit:
RDKit is a powerful and widely used toolkit for cheminformatics and drug discovery.
Website: [RDKit](https://www.rdkit.org/)

* Open Babel:
Open Babel is an open, collaborative project that provides a chemical toolbox designed to speak the many languages of chemical data.
Website: [Open Babel](http://openbabel.org/)

* ChemPy:
ChemPy is a Python package for (mainly physical, analytical and inorganic) chemistry, e.g. balancing reaction stoichiometry, solving chemical equilibria and integrating chemical kinetics.
GitHub repository: [ChemPy](https://github.com/bjodah/chempy)

* Pybel (a part of the Open Babel project):
Pybel is a convenient wrapper for the Open Babel toolkit, providing a Pythonic interface.
Documentation: [Pybel Documentation](https://open-babel.readthedocs.io/en/latest/UseTheLibrary/Python_Pybel.html)

* Indigo Toolkit:
Indigo is a cheminformatics toolkit that includes tools for handling molecular structures.
Website: [Indigo Toolkit](https://lifescience.opensource.epam.com/indigo/)

* OEChem (OpenEye Chemistry Toolkit):
OEChem is a cheminformatics toolkit developed by OpenEye Scientific Software.
Website: [OEChem](https://www.eyesopen.com/oechem-tk)

* smilite:
smilite is a Python module to download and analyze SMILES (Simplified Molecular-Input Line-entry System) strings of chemical compounds from ZINC, a free database of commercially available compounds for virtual screening (http://zinc.docking.org).
It supports both Python 3.x and Python 2.x.
Website: [github smilite](https://github.com/rasbt/smilite)



### Installation of RDKit

Installation guide: [RDKit Install.md](https://github.com/rdkit/rdkit/blob/master/Docs/Book/Install.md)

bash: `pip install rdkit`
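After `pip install rdkit`, it is worth confirming that the package is visible to the kernel the notebook actually runs in (pip can install into a different environment). A minimal check using only the standard library; the helper name `is_installed` is hypothetical, and it takes import names, which can differ from pip names:

```python
import importlib.util

def is_installed(module_name):
    """Return True if `module_name` can be found for import in the current environment."""
    return importlib.util.find_spec(module_name) is not None

# Import names (not pip names) of the packages used in these notes
for name in ("rdkit", "numpy", "chemspipy", "requests"):
    print(f"{name}: {'installed' if is_installed(name) else 'missing'}")
```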


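Reference code (CH485 Artificial Intelligence and Chemistry, Practice 05, SMILES CNN for logP): [smiles_cnn_logP.py](https://github.com/SeongokRyu/CH485---Artificial-Intelligence-and-Chemistry/blob/master/Practice%2005/smiles_cnn_logP.py)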
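A CNN over SMILES needs each string turned into a fixed-length sequence of integer tokens. The sketch below shows one common way to do that; it is an assumption, and the linked script may encode characters differently. It uses a simplified regex tokenizer, in the spirit of those used by SMILES language models, that keeps multi-character tokens such as `Cl`, `Br`, `%10` ring closures and bracket atoms intact, then builds a vocabulary and pads to a fixed length. The names `tokenize_smiles`, `build_vocab` and `encode` are hypothetical.

```python
import re

# Simplified tokenizer pattern (an assumption, not a full SMILES grammar):
# bracket atoms, two-letter halogens, %nn ring closures, organic-subset atoms,
# aromatic atoms, bond/branch symbols and single-digit ring closures.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFI]|[bcnops]|[()=#\-+\\/:~@?>*$.]|\d)"
)

def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; raise if any character is left unmatched."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not fully tokenize: {smiles}")
    return tokens

def build_vocab(smiles_list):
    """Map every token seen in the data set to an integer index (0 is reserved for padding)."""
    vocab = {"<pad>": 0}
    for smi in smiles_list:
        for tok in tokenize_smiles(smi):
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(smiles, vocab, max_len):
    """Convert a SMILES string to a list of token indices, truncated/padded to max_len."""
    ids = [vocab[tok] for tok in tokenize_smiles(smiles)][:max_len]
    return ids + [0] * (max_len - len(ids))

vocab = build_vocab(["CCO", "CC(=O)O"])
print(tokenize_smiles("c1ccccc1Br"))
print(encode("CCO", vocab, 5))
```

Reserving index 0 for `<pad>` lets shorter molecules be padded to `max_len`, so a whole batch becomes one rectangular array that an embedding or one-hot layer can consume.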

```python
# RDKit and NumPy imports used by the cells below
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import Descriptors, AllChem
from rdkit.Chem.Crippen import MolLogP
from rdkit.Chem.rdMolDescriptors import CalcTPSA
# import matplotlib.pyplot as plt
```

```python
# Use SMILES to get molecular weight
smiles = "CCO"  # ethanol
mol = Chem.MolFromSmiles(smiles)
if mol is None:  # MolFromSmiles returns None for an invalid SMILES string
    raise ValueError(f"Invalid SMILES: {smiles}")
mw = Descriptors.MolWt(mol)  # average molecular weight in g/mol
mw_integer = int(mw)  # int() truncates the decimals
print(f"Molecular Weight: {mw_integer}")
```

Output:

```
Molecular Weight: 46
```
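`Descriptors.MolWt` returns an average molecular weight: the sum of average atomic weights over all atoms, including the hydrogens that are implicit in the SMILES. For ethanol (C2H6O) this is 2 × 12.011 + 6 × 1.008 + 15.999 = 46.069 g/mol, which `int()` truncates to 46. The sketch below reproduces that arithmetic with rounded standard atomic weights (an assumption: RDKit's internal table may differ in the last digit) and a hypothetical `molecular_weight` helper that takes element counts rather than a SMILES string.

```python
# Rounded standard atomic weights in g/mol (assumed values for illustration)
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

def molecular_weight(formula_counts):
    """Sum atomic weight times atom count over a formula given as {element: count}."""
    return sum(ATOMIC_WEIGHTS[element] * n for element, n in formula_counts.items())

ethanol = {"C": 2, "H": 6, "O": 1}  # CCO -> C2H6O once implicit hydrogens are counted
mw = molecular_weight(ethanol)
print(f"{mw:.3f}")   # → 46.069
print(int(mw))       # → 46 (truncated)
print(round(mw, 1))  # → 46.1 (rounded instead of truncated)
```

If the integer is meant as the nearest whole number, `round(mw)` is the safer choice; `int()` always rounds toward zero.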


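```python
def read_ZINC(num_mol, path='ZINC.smiles'):
    """Read up to num_mol SMILES from path; return Morgan fingerprints, logP and TPSA arrays."""
    with open(path, 'r') as f:  # closes the file automatically
        contents = f.readlines()

    fps = []
    logP = []
    tpsa = []
    for line in contents[:num_mol]:  # slicing avoids IndexError on short files
        smi = line.strip()
        m = Chem.MolFromSmiles(smi)
        if m is None:  # skip lines RDKit cannot parse
            continue
        # Morgan fingerprint with radius 2 folded to 2048 bits
        fp = AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048)
        arr = np.zeros((0,))
        DataStructs.ConvertToNumpyArray(fp, arr)  # resizes arr to the fingerprint length
        fps.append(arr)
        logP.append(MolLogP(m))  # Crippen logP
        tpsa.append(CalcTPSA(m))  # topological polar surface area

    fps = np.asarray(fps).astype(float)
    logP = np.asarray(logP).astype(float)
    tpsa = np.asarray(tpsa).astype(float)

    return fps, logP, tpsa
```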
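Each row of the `fps` array returned by `read_ZINC` is a 0/1 Morgan bit vector (2048 bits by default), and the usual way to compare such fingerprints is Tanimoto similarity, |A ∩ B| / |A ∪ B|. RDKit's `DataStructs.TanimotoSimilarity` does this on the bit-vector objects; the NumPy sketch below works directly on the arrays instead. The helper names are hypothetical, and returning 0.0 for two all-zero fingerprints is a convention chosen here.

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two 0/1 fingerprint vectors."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # both fingerprints empty: convention chosen for this sketch
    return float(np.logical_and(a, b).sum() / union)

def tanimoto_matrix(fps):
    """Pairwise Tanimoto similarities for the rows of an (n_mols, n_bits) 0/1 array."""
    fps = np.asarray(fps, dtype=float)
    inter = fps @ fps.T                                  # |A ∩ B| for every pair of rows
    counts = fps.sum(axis=1)                             # |A| for every row
    union = counts[:, None] + counts[None, :] - inter    # |A ∪ B| = |A| + |B| - |A ∩ B|
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, inter / union, 0.0)

a = [1, 1, 0, 1]
b = [1, 0, 0, 1]
print(tanimoto(a, b))
print(tanimoto_matrix(np.array([a, b])))
```

The matrix form replaces an O(n²) Python loop with a single matrix product, which matters once `num_mol` reaches thousands of molecules.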

### Install ChemSpiPy to get SMILES from a compound name - Failed because an API key could not be obtained (WIP)
bash: `pip install chemspipy`

```python
from chemspipy import ChemSpider

# Initialize the ChemSpider API client
cs = ChemSpider('your_api_key_here')  # You need to obtain an API key from ChemSpider

# Search for a compound by name
compound_name = "acetic acid"
results = cs.search(compound_name)

# Print the first result's SMILES notation
if results:
    smiles = results[0].smiles
    print(f"SMILES for {compound_name}: {smiles}")
else:
    print(f"No results found for {compound_name}")
```

### Attempt to get SMILES from PubChem - Failed (WIP)

```python
import requests
from urllib.parse import quote

def get_smiles_from_name(molecule_name):
    base_url = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name"
    # URL-encode the name so names with spaces ("acetic acid") form a valid path
    search_url = f"{base_url}/{quote(molecule_name)}/property/CanonicalSMILES/JSON"

    response = requests.get(search_url, timeout=10)

    if response.status_code == 200:
        data = response.json()
        # 'PropertyTable' is a JSON object (dict), not a list; its 'Properties' entry is a list
        properties = data.get('PropertyTable', {}).get('Properties', [])
        if properties:
            props = properties[0]
            # Newer PubChem responses may report this property as 'ConnectivitySMILES'
            return props.get('CanonicalSMILES') or props.get('ConnectivitySMILES')
        print(f"No SMILES found for {molecule_name}")
    else:
        print(f"Error: {response.status_code}")
    return None

# Example usage
molecule_name = "acetic acid"
smiles_result = get_smiles_from_name(molecule_name)

if smiles_result:
    print(f"SMILES for {molecule_name}: {smiles_result}")
```

