populate drugs structure property table #142

sgosline · 2024-04-09T21:21:00Z

Need to add drug structures/fingerprints.

jjacobson95 · 2024-04-11T00:14:47Z

This is pretty straightforward with Python's RDKit. We just need to decide how detailed we want the fingerprints to be.

import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

def smiles_to_fingerprint(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for SMILES it cannot parse
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    # radius and nBits are placeholders; update these parameters
    fingerprint = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=1024)
    return np.array(fingerprint)

sgosline · 2024-04-16T18:11:29Z

Also need to add Mordred descriptors.

jjacobson95 · 2024-04-24T00:16:09Z

@sgosline What are your thoughts on changing the schema for Drug Descriptor to have three columns, improve_drug_id, fingerprint, and mordred?

Also what should these files be called? "{dataset}_drug_descriptor.tsv"? or "{dataset}_structures.tsv"?

Current Schema:

  Drug:
    description: List of chemicals/drugs used in the data package. Each identifier corresponds to a distinct structure.
    slots:
      - improve_drug_id
    attributes:
      chem_name:
        description: Name of drug
      canSMILES:
        description: Canonical SMILES string
      isoSMILES:
        description: Isomeric SMILES string
      InChIKey:
        description: InChIKey
      formula:
        description: Chemical formula
      weight:
        description: Molecular weight
        range: float
      pubchem_id:
        description: PubChem Identifier for this drug, can be many.
        range: int
  Drug Descriptor:
    description: Computational summary of drug chemical properties
    slots:
      - improve_drug_id
    attributes:
      structural_descriptor:
        description: string name describing structural descriptor
      descriptor_value:
        range: any
        description: value representing descriptor value
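
For illustration, here is a minimal sketch of how a fingerprint could be stored under the current long-format Drug Descriptor schema (one row per drug and descriptor), written as a TSV. The helper names, the drug ID `SMI_1`, the descriptor names `morgan_fingerprint` and `example_mordred_descriptor`, and all values are hypothetical; only the three column names come from the schema above.

```python
import csv
import io

# Column names taken from the current Drug Descriptor schema.
FIELDNAMES = ["improve_drug_id", "structural_descriptor", "descriptor_value"]


def fingerprint_to_row(improve_drug_id, bits, descriptor_name="morgan_fingerprint"):
    """Pack a bit vector into a single long-format row.

    The fingerprint is serialized as a string of 0/1 characters so that it
    fits in one descriptor_value cell (an assumption, not a settled format).
    """
    bit_string = "".join(str(int(b)) for b in bits)
    return {
        "improve_drug_id": improve_drug_id,
        "structural_descriptor": descriptor_name,
        "descriptor_value": bit_string,
    }


def write_descriptor_tsv(rows, handle):
    """Write long-format descriptor rows as tab-separated text."""
    writer = csv.DictWriter(
        handle, fieldnames=FIELDNAMES, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical example data: a 4-bit fingerprint and one numeric descriptor.
rows = [
    fingerprint_to_row("SMI_1", [0, 1, 1, 0]),
    {
        "improve_drug_id": "SMI_1",
        "structural_descriptor": "example_mordred_descriptor",
        "descriptor_value": 21,
    },
]

buffer = io.StringIO()
write_descriptor_tsv(rows, buffer)
tsv_text = buffer.getvalue()
print(tsv_text)
```

The long format keeps the schema unchanged as new descriptor types are added, at the cost of one row per drug and descriptor. The three-column layout proposed above (improve_drug_id, fingerprint, mordred) would instead give one row per drug.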

sgosline · 2024-06-11T15:42:34Z

There are currently 1800 different Mordred descriptors. Creating an entry for every descriptor of every drug will be time-consuming and space-intensive. We need a use case/algorithm to motivate this further.
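
As a rough back-of-the-envelope sketch of the space concern: in the long-format schema, the row count is the number of drugs times the number of descriptors. The 1800 figure is from the comment above; the drug count and the bytes-per-row value below are hypothetical placeholders, not measurements from any dataset.

```python
def long_format_row_count(n_drugs, n_descriptors=1800):
    """Rows needed when every drug gets one row per descriptor."""
    return n_drugs * n_descriptors


def estimated_size_mb(n_drugs, n_descriptors=1800, bytes_per_row=40):
    """Approximate uncompressed TSV size in megabytes (10**6 bytes).

    bytes_per_row is an assumed average covering the drug ID, the
    descriptor name, the value, and the separators.
    """
    return long_format_row_count(n_drugs, n_descriptors) * bytes_per_row / 1e6


# Hypothetical dataset with 1,000 drugs.
n_drugs = 1000
rows = long_format_row_count(n_drugs)
size_mb = estimated_size_mb(n_drugs)
print(f"{rows:,} rows, about {size_mb:.0f} MB uncompressed")
```

Under these assumed numbers, 1,000 drugs would already produce 1.8 million rows, so the estimate scales quickly with the size of the drug list.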

sgosline · 2024-06-11T18:23:15Z

Here is the script to integrate: https://github.com/adpartin/mol-features/blob/master/src/gen_mol_fea.py

sgosline added the "new data" label (Request for additional data to be added) Apr 9, 2024

jjacobson95 self-assigned this Apr 11, 2024

sgosline mentioned this issue Jun 11, 2024

Drug descriptor addition to pipeline #192

Merged

sgosline assigned sgosline and unassigned jjacobson95 Jun 20, 2024

sgosline closed this as completed in #192 Jul 3, 2024
