# NistChemPy Tutorial

You can load a NIST compound by its NIST ID.

In [1]:
import nistchempy as nist
X = nist.Compound('C85018')
X.__dict__

{'ID': 'C85018',
 'name': 'Phenanthrene',
 'synonyms': ['Phenanthren', 'Phenanthrin', 'Phenantrin'],
 'formula': 'C14 H10',
 'mol_weight': 178.2292,
 'inchi': 'InChI=1S/C14H10/c1-3-7-13-11(5-1)9-10-12-6-2-4-8-14(12)13/h1-10H',
 'inchi_key': 'YNPNZTXNASCQKK-UHFFFAOYSA-N',
 'cas_rn': '85-01-8',
 'ir': [],
 'ms': [],
 'uvvis': [],
 'mol2d': None,
 'mol3d': None,
 'data_refs': {'mol2d': 'https://webbook.nist.gov/cgi/cbook.cgi?Str2File=C85018',
  'mol3d': 'https://webbook.nist.gov/cgi/cbook.cgi?Str3File=C85018',
  'Gas phase thermochemistry data': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C85018&Units=SI&Mask=1#Thermo-Gas',
  'Condensed phase thermochemistry data': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C85018&Units=SI&Mask=2#Thermo-Condensed',
  'Phase change data': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C85018&Units=SI&Mask=4#Thermo-Phase',
  'Reaction thermochemistry data': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C85018&Units=SI&Mask=8#Thermo-React',
  "Henry's Law data": 'ht

If available, you can load MOL-files containing 2D/3D coordinates. They are stored as text but can easily be converted into a molecule object with any cheminformatics library.

In [2]:
X.get_3d() # X.get_2d() for 2D coordinates
print(X.mol3d)


  NIST    07011517253D 1   1.00000  -539.53865
Copyright by the U.S. Sec. Commerce on behalf of U.S.A. All rights reserved.
 24 26  0  0  0  0  0  0  0  0999 V2000
    4.2671    4.2111    6.0319 C    0  0  0  0  0  0  0  0  0  0  0  0
    3.4011    3.3615    5.3683 C    0  0  0  0  0  0  0  0  0  0  0  0
    3.4337    3.2256    3.9602 C    0  0  0  0  0  0  0  0  0  0  0  0
    5.2115    4.9687    5.3136 C    0  0  0  0  0  0  0  0  0  0  0  0
    5.2684    4.8584    3.9386 C    0  0  0  0  0  0  0  0  0  0  0  0
    4.3927    3.9962    3.2378 C    0  0  0  0  0  0  0  0  0  0  0  0
    4.4609    3.8894    1.8079 C    0  0  0  0  0  0  0  0  0  0  0  0
    2.5375    2.3405    3.2259 C    0  0  0  0  0  0  0  0  0  0  0  0
    2.6439    2.2686    1.8051 C    0  0  0  0  0  0  0  0  0  0  0  0
    1.5565    1.5396    3.8570 C    0  0  0  0  0  0  0  0  0  0  0  0
    3.6253    3.0639    1.1234 C    0  0  0  0  0  0  0  0  0  0  0  0
    1.7801    1.4135    1.0811 C    0  
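Because the MOL-file is plain text, you can also inspect it without a cheminformatics toolkit. The sketch below is a minimal, stdlib-only reader for the V2000 counts line and atom block, using the fixed-width columns visible in the output above. It is not part of NistChemPy, and the name read_v2000_atoms is hypothetical. For real work, a toolkit such as RDKit (Chem.MolFromMolBlock) is the more robust choice.

```python
from typing import List, Tuple

# (element, x, y, z)
Atom = Tuple[str, float, float, float]


def read_v2000_atoms(molblock: str) -> List[Atom]:
    """Return (element, x, y, z) for every atom in a V2000 MOL block.

    Hypothetical helper: assumes the usual layout of three header lines,
    one counts line, then one fixed-width line per atom.
    """
    lines = molblock.splitlines()
    counts = lines[3]
    if "V2000" not in counts:
        raise ValueError("only V2000 MOL blocks are handled here")
    n_atoms = int(counts[0:3])  # columns 1-3 of the counts line
    atoms: List[Atom] = []
    for line in lines[4:4 + n_atoms]:
        # x, y, z occupy columns 1-10, 11-20, 21-30; the symbol sits in 32-34
        x, y, z = (float(line[i:i + 10]) for i in (0, 10, 20))
        atoms.append((line[31:34].strip(), x, y, z))
    return atoms
```

Applied to X.mol3d, this should yield 24 atoms for phenanthrene (the counts line above reads 24 atoms and 26 bonds), provided the string keeps its blank first header line as printed.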

You can also download spectra; they are stored as a list.

In [3]:
X.get_spectra('ir')
X.ir

[Spectrum(C85018, IR spectrum #0),
 Spectrum(C85018, IR spectrum #1),
 Spectrum(C85018, IR spectrum #2),
 Spectrum(C85018, IR spectrum #3),
 Spectrum(C85018, IR spectrum #4),
 Spectrum(C85018, IR spectrum #5),
 Spectrum(C85018, IR spectrum #6)]

Each spectrum contains the text of a JCAMP-DX file.

In [4]:
spec = X.ir[2]
print(spec.compound, spec.spec_type, spec.spec_idx)
print()
print(spec.jdx_text)

Compound(C85018) ir 2

##TITLE=PHENANTHRENE
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##CLASS=COBLENTZ
##ORIGIN=CENTRE D'ETUDES NUCLEAIRES DE GRENOBLE
##OWNER=COBLENTZ SOCIETY
Collection (C) 2018 copyright by the U.S. Secretary of Commerce
on behalf of the United States of America. All rights reserved.
##DATE=Not specified, most likely prior to 1970
##CAS REGISTRY NO=85-01-8
##MOLFORM=C14 H10
##SOURCE REFERENCE=COBLENTZ NO. 4253
##$NIST SOURCE=COBLENTZ
##$NIST IMAGE=cob4253
##SPECTROMETER/DATA SYSTEM=Not specified, most likely a prism, grating, or hybrid spectrometer.
##STATE=SOLUTION (SATURATED IN HEPTANE)
##PATH LENGTH=0.05 CM
$$PURITY 99.99%
##SAMPLING PROCEDURE=TRANSMISSION
##RESOLUTION=4
##DATA PROCESSING=DIGITIZED BY NIST FROM HARD COPY
##XUNITS=MICROMETERS
##YUNITS=TRANSMITTANCE
##XFACTOR=1.000000
##YFACTOR=1
##DELTAX=000.011124
##FIRSTX=14.665
##LASTX=35.1221
##FIRSTY=0.843
##MAXX=35.1221
##MINX=14.665
##MAXY=0.93
##MINY=0.358
##NPOINTS=1840
##XYDATA=(X++(Y..Y))
14.665000 0
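Since jdx_text is plain JCAMP-DX, its header records (##LABEL=value) can be read with a few lines of Python. The sketch below is a hypothetical helper, not a NistChemPy function: it skips $$ comment lines, appends unlabelled lines (such as the copyright note after ##OWNER) to the previous value, and stops at the data table. Because this spectrum uses XUNITS=MICROMETERS, a conversion to wavenumbers (cm⁻¹ = 10⁴ / µm) is included as well.

```python
from typing import Dict, Optional


def parse_jdx_header(jdx_text: str) -> Dict[str, str]:
    """Collect ##LABEL=value records up to the data table (hypothetical helper)."""
    header: Dict[str, str] = {}
    last_label: Optional[str] = None
    for raw in jdx_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("$$"):
            continue  # blank line or JCAMP-DX comment
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            label = label.strip().upper()
            if label in ("XYDATA", "XYPOINTS", "PEAK TABLE"):
                break  # the numeric data table starts here
            header[label] = value.strip()
            last_label = label
        elif last_label is not None:
            header[last_label] += " " + line  # continuation of the previous value
    return header


def micrometers_to_wavenumbers(x_um: float) -> float:
    """Convert a wavelength in micrometers to a wavenumber in cm^-1."""
    return 1.0e4 / x_um
```

For the spectrum above, parse_jdx_header(spec.jdx_text)["NPOINTS"] should be "1840", and FIRSTX = 14.665 µm corresponds to roughly 681.9 cm⁻¹.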

There are four available search types: by name, CAS registry number, InChI (or InChI key), and chemical formula. In addition to the main identifier, you can restrict the search using several parameters:

In [5]:
nist.print_search_parameters()

Units      :   Units for thermodynamic data, "SI" or "CAL" for calorie-based
MatchIso   :   Exactly match the specified isotopes (formula search only)
AllowOther :   Allow elements not specified in formula (formula search only)
AllowExtra :   Allow more atoms of elements in formula than specified (formula search only)
NoIon      :   Exclude ions from the search (formula search only)
cTG        :   Contains gas-phase thermodynamic data
cTC        :   Contains condensed-phase thermodynamic data
cTP        :   Contains phase-change thermodynamic data
cTR        :   Contains reaction thermodynamic data
cIE        :   Contains ion energetics thermodynamic data
cIC        :   Contains ion cluster thermodynamic data
cIR        :   Contains IR data
cTZ        :   Contains THz IR data
cMS        :   Contains MS data
cUV        :   Contains UV/Vis data
cGC        :   Contains gas chromatography data
cES        :   Contains vibrational and electronic energy levels
cDI        :   Contains constant

These parameters can be specified when initializing the Search object, or later in the find_compounds method as keyword arguments (**kwargs).

In [6]:
search = nist.Search(NoIon = True, cMS = True)
search.parameters

SearchParameters(Units=SI, NoIon=True, cMS=True)
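To illustrate the "defaults at construction, overrides per call" pattern, here is a hypothetical stdlib-only sketch that validates parameter names against the print_search_parameters listing above and lets per-call values win. It is not how NistChemPy stores parameters internally; the only library behaviour it mirrors is that Units defaults to SI, as the output above shows.

```python
from typing import Dict, Union

Value = Union[str, bool]

# Flag names copied from the print_search_parameters() listing above.
FLAG_NAMES = {
    "MatchIso", "AllowOther", "AllowExtra", "NoIon",
    "cTG", "cTC", "cTP", "cTR", "cIE", "cIC",
    "cIR", "cTZ", "cMS", "cUV", "cGC", "cES", "cDI",
}


def merge_search_parameters(defaults: Dict[str, Value], **overrides: Value) -> Dict[str, Value]:
    """Combine constructor-time defaults with per-call keyword overrides.

    Hypothetical helper: returns a new dict (defaults is left untouched),
    rejecting unknown names, bad Units values and non-boolean flags.
    """
    merged: Dict[str, Value] = {"Units": "SI", **defaults, **overrides}
    for name, value in merged.items():
        if name == "Units":
            if value not in ("SI", "CAL"):
                raise ValueError(f"Units must be 'SI' or 'CAL', got {value!r}")
        elif name not in FLAG_NAMES:
            raise KeyError(f"unknown search parameter: {name}")
        elif not isinstance(value, bool):
            raise TypeError(f"{name} must be True or False")
    return merged
```

Returning a fresh dict keeps the stored defaults unchanged, so a one-off override such as cTC = True does not leak into later calls.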

After that, you can search for compounds. The Search object will then have four properties: success, lost, IDs, and compounds.

In [7]:
search.find_compounds(identifier = '1,2,3*-butane', search_type = 'name')
print(search)
print(search.success, search.lost, search.IDs, search.compounds)

Search(Success=True, Lost=False, Found=4)
True False ['C1871585', 'C298180', 'C1529686', 'C1464535'] []


Compounds are not loaded during the search, because loading each compound requires downloading a separate web page, which takes time. You can load them with the load_found_compounds method or manually with the Compound class, e.g. nist.Compound(ID).

In [8]:
search.load_found_compounds()
print(search.compounds)
print(search.compounds[0].name)
print(search.compounds[0].synonyms)

[Compound(C1871585), Compound(C298180), Compound(C1529686), Compound(C1464535)]
Propane, 1,2,3-trichloro-2-methyl-
['1,2,3-Trichloro-2-methylpropane', '1,2,3-Trichloroisobutane']
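When many IDs are found, loading them one by one can be slow. One option is a small thread pool; the sketch below is a hypothetical helper (load_all is not a NistChemPy function) that takes any loader callable, for example nist.Compound, and keeps the results in the same order as the IDs. It assumes the loader is safe to call from several threads, which this tutorial does not state. Keep max_workers small so as not to overload the NIST server.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def load_all(ids: Sequence[str], load: Callable[[str], T], max_workers: int = 4) -> List[T]:
    """Call `load` on every ID using a few worker threads.

    pool.map returns results in input order, and any exception raised by
    `load` is re-raised here when the results are collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(load, ids))
```

With the search above, this would read compounds = load_all(search.IDs, nist.Compound).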


A search by CAS registry number ignores some search parameters. For example, here it finds AgCl even though no MS data are available for it.

In [9]:
search.find_compounds('7783-90-6', 'cas')
search.parameters

SearchParameters(Units=SI, NoIon=True, cMS=True)

In [10]:
search.load_found_compounds()
X = search.compounds[0]
X.data_refs

{'mol2d': 'https://webbook.nist.gov/cgi/cbook.cgi?Str2File=C7783906',
 'mol3d': 'https://webbook.nist.gov/cgi/cbook.cgi?Str3File=C7783906',
 'Condensed phase thermochemistry data': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C7783906&Units=SI&Mask=2#Thermo-Condensed',
 'Phase change data': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C7783906&Units=SI&Mask=4#Thermo-Phase',
 'Reaction thermochemistry data': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C7783906&Units=SI&Mask=8#Thermo-React',
 'Gas phase ion energetics data': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C7783906&Units=SI&Mask=20#Ion-Energetics',
 'Constants of diatomic molecules': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C7783906&Units=SI&Mask=1000#Diatomic'}
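Because such searches can return compounds that do not satisfy the requested data flags, you may want to re-check the loaded results yourself. A minimal sketch, assuming that a key such as 'ms' appears in data_refs only when the corresponding data exist (as in the data_refs dicts in this tutorial: the AgCl entry above has no 'ms' key, while the C832699 entry in the next example has one). keep_with_data is a hypothetical helper.

```python
from typing import Any, Iterable, List


def keep_with_data(compounds: Iterable[Any], key: str) -> List[Any]:
    """Return only the compounds whose data_refs dict contains `key` (e.g. 'ms').

    Hypothetical post-filter; objects without a data_refs attribute are dropped.
    """
    return [c for c in compounds if key in getattr(c, "data_refs", {})]
```

After search.load_found_compounds(), keep_with_data(search.compounds, 'ms') would drop AgCl from the result above.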

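The same applies to a search by InChI or InChI key. In the example below, the compound is found even though cTC = True was requested and its data_refs list no condensed phase thermochemistry data.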

In [11]:
search.find_compounds('DOWJXOHBNXRUOD-UHFFFAOYSA-N', 'inchi', cTC = True)
search

Search(Success=True, Lost=False, Found=1)

In [12]:
search.load_found_compounds()
X = search.compounds[0]
X.data_refs

{'mol2d': 'https://webbook.nist.gov/cgi/cbook.cgi?Str2File=C832699',
 'mol3d': 'https://webbook.nist.gov/cgi/cbook.cgi?Str3File=C832699',
 'Phase change data': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C832699&Units=SI&Mask=4#Thermo-Phase',
 'Gas phase ion energetics data': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C832699&Units=SI&Mask=20#Ion-Energetics',
 'ir': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C832699&Units=SI&Mask=80#IR-Spec',
 'ms': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C832699&Units=SI&Mask=200#Mass-Spec',
 'uvvis': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C832699&Units=SI&Mask=400#UV-Vis-Spec',
 'Gas Chromatography': 'https://webbook.nist.gov/cgi/cbook.cgi?ID=C832699&Units=SI&Mask=2000#Gas-Chrom'}
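Since CAS and InChI searches take the identifier verbatim, a quick local check can catch typos before any request is sent. The sketch below is hypothetical (not part of NistChemPy): it recognises an InChI by its "InChI=" prefix, an InChI key by its 14-10-1 layout of uppercase letters, and a CAS registry number by its check digit, which equals the sum of the other digits, each multiplied by its position counted from the right, modulo 10. Anything else falls back to 'name'; formulas are not detected.

```python
import re

INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
CAS_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_is_valid(cas: str) -> bool:
    """Check the format and the check digit of a CAS registry number."""
    m = CAS_RE.match(cas)
    if m is None:
        return False
    digits = m.group(1) + m.group(2)
    # weight 1 for the rightmost digit, 2 for the next one, and so on
    total = sum(i * int(d) for i, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(m.group(3))


def guess_search_type(identifier: str) -> str:
    """Guess a search_type for find_compounds (hypothetical helper)."""
    s = identifier.strip()
    if s.startswith("InChI=") or INCHIKEY_RE.match(s):
        return "inchi"  # this tutorial passes InChI keys with search_type 'inchi'
    if cas_is_valid(s):
        return "cas"
    return "name"
```

For instance, 7783-90-6 passes: 6·1 + ... gives 0·1 + 9·2 + 3·3 + 8·4 + 7·5 + 7·6 = 136, and 136 mod 10 = 6.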

The most powerful search is the search by chemical formula. It supports * for a coefficient > 0 and ? for a coefficient > 1. However, such a search may match more entries than the NIST limit of 400 compounds. To detect this, check the lost property of the Search object.

In [13]:
search = nist.Search(NoIon = True, cMS = True)
search.find_compounds('C6H*O?', 'formula')
search

Search(Success=True, Lost=True, Found=400)

To overcome this problem, split your formula into more specific ones.

In [14]:
overflows = []
for i in range(1, 7):
    search.find_compounds(f'C6H?O{i}', 'formula')
    overflows.append( (len(search.IDs), search.lost) )
overflows

[(170, False), (178, False), (80, False), (42, False), (7, False), (24, False)]

This strategy can be used to combine search results and then use the obtained IDs to collect spectroscopic data.
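The combining step can be sketched as a small helper that runs one search per sub-formula, merges the IDs without duplicates, and refuses to continue if any sub-search still hit the limit. collect_ids and the find adapter are hypothetical; the adapter shown in the comments only uses find_compounds, IDs and lost from this tutorial, and copies IDs because each new search replaces them.

```python
from typing import Callable, Iterable, List, Set, Tuple

# find(formula) -> (IDs, lost); a hypothetical adapter around a Search object
Finder = Callable[[str], Tuple[List[str], bool]]


def collect_ids(formulas: Iterable[str], find: Finder) -> List[str]:
    """Run one search per formula and merge the IDs, keeping first-seen order.

    Raises if any sub-search was truncated, since its results are incomplete
    and that formula needs to be split further.
    """
    seen: Set[str] = set()
    merged: List[str] = []
    for formula in formulas:
        ids, lost = find(formula)
        if lost:
            raise RuntimeError(f"{formula!r} exceeded the result limit; split it further")
        for cid in ids:
            if cid not in seen:
                seen.add(cid)
                merged.append(cid)
    return merged


# With NistChemPy (assuming a Search object named `search`, as above):
# def find(formula):
#     search.find_compounds(formula, 'formula')
#     return list(search.IDs), search.lost
# ids = collect_ids([f'C6H?O{i}' for i in range(1, 7)], find)
```

Raising on a truncated sub-search, rather than silently merging, makes an incomplete result impossible to miss.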