# Automated parsing

Automated parsers in ChemDataExtractor extract data from tables and from simple sentences.
First, we import the required elements from ChemDataExtractor:

In [1]:
from chemdataextractor.doc import Document
from chemdataextractor.doc.text import Heading, Paragraph
from chemdataextractor.model.units import TemperatureModel
from chemdataextractor.model.model import Compound, ModelType, StringType
from chemdataextractor.parse.elements import I
from chemdataextractor.parse.actions import join

Let's have a look at an example document:

In [2]:
doc = Document(Heading('Properties of caffeine (1)'),
              Paragraph('Lorem ipsum dolor sit amet. We also determined the glass transition temperature of caffeine, Tg = 123.4°C'))
doc

Without specifying a model, the glass transition temperature is not extracted.

In [3]:
doc.records

[]

Next, we define a model. We set the mandatory `specifier` element and a `compound`.

In [4]:
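class GlassTransitionTemperature(TemperatureModel):
    # Match 'glass transition temperature' or 'Tg' (I matches case-insensitively);
    # join merges the matched tokens into a single string.
    specifier_expr = ((I('Glass') + I('transition') + I('temperature')) | I('Tg')).add_action(join)
    # The specifier is mandatory for every record of this model.
    specifier = StringType(parse_expression=specifier_expr, required=True, contextual=True, updatable=True)
    # Each record must also be associated with a compound.
    compound = ModelType(Compound, required=True, contextual=True)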
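To get a feel for which phrases the specifier expression is meant to pick up, here is a rough analogy using Python's standard `re` module. This is only a sketch: ChemDataExtractor works on tokens rather than on raw strings, so the regular expression and the name `SPECIFIER_PATTERN` are illustrative assumptions, not part of the library.

```python
import re

# Hypothetical plain-regex analogy of the specifier expression above.
# It is NOT how ChemDataExtractor matches internally (which is token-based).
SPECIFIER_PATTERN = re.compile(
    r'\b(?:glass\s+transition\s+temperature|Tg)\b',
    re.IGNORECASE,  # mirrors the case-insensitive behaviour of I(...)
)

sentence = ('We also determined the glass transition temperature '
            'of caffeine, Tg = 123.4°C')

# Collect every phrase in the sentence that the pattern recognises.
matches = SPECIFIER_PATTERN.findall(sentence)
print(matches)
```

Both the spelled-out phrase and the abbreviation are found in the example sentence, which is why either form can serve as the specifier.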

Finally, we can extract the desired information from the document:

In [5]:
doc = Document(Heading('Properties of caffeine (1)'),
              Paragraph('Lorem ipsum dolor sit amet, consetetur sadipscing elitr. We also determined the glass transition temperature of caffeine, Tg = 123.4°C'))
doc.models = [GlassTransitionTemperature]

for record in doc.records:
    print(record.serialize())

{'GlassTransitionTemperature': {'raw_value': '123.4', 'raw_units': '°C', 'value': [123.4], 'units': 'Celsius^(1.0)', 'specifier': 'glass transition temperature', 'compound': {'Compound': {'names': ['caffeine'], 'labels': ['1']}}}}
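The serialized record is an ordinary nested dictionary, so it can be post-processed with plain Python. The following minimal sketch flattens the record shown above into a compact summary; the helper name `summarize` is a hypothetical choice for illustration, and the dictionary is copied from the output above so that the snippet runs without ChemDataExtractor.

```python
# Serialized record copied verbatim from the output above.
record = {'GlassTransitionTemperature': {
    'raw_value': '123.4', 'raw_units': '°C',
    'value': [123.4], 'units': 'Celsius^(1.0)',
    'specifier': 'glass transition temperature',
    'compound': {'Compound': {'names': ['caffeine'], 'labels': ['1']}}}}


def summarize(serialized):
    """Flatten one serialized record (hypothetical helper, not a library call)."""
    # The outer dictionary has a single key: the name of the model.
    (model_name, fields), = serialized.items()
    compound = fields['compound']['Compound']
    return {
        'model': model_name,
        'compound': compound['names'][0],
        'label': compound['labels'][0],
        'value': fields['value'][0],
        'units': fields['raw_units'],
    }


summary = summarize(record)
print(summary)
```

With real data, the same helper could be applied to each `record.serialize()` in the loop above, assuming every record carries the same fields as this one.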
