# Automated parsing for tables with *TableDataExtractor*

First, we look at the table we want to parse. A table can be passed into the *ChemDataExtractor* (CDE) framework manually, or it is processed automatically when a document is passed into CDE. Here we pass two keyword arguments through to *TableDataExtractor* (TDE), ``use_notes_in_first_col=False`` and ``use_title_row=False`` (see the [TDE documentation](https://juraj-mavracic-tabledataextractor.readthedocs-hosted.com/en/latest/)).

At the moment no records will be found since we haven't defined a model yet.

In [1]:
from chemdataextractor.doc.table_new import Table
from tabledataextractor.output.print import print_table

path = "../examples/data/table_example.csv"
table = Table(caption=[], 
              table_data=path, 
              use_notes_in_first_col=False, use_title_row=False)
print_table(table.tde_table.raw_table)

table.records

                                   Temperatures  Temperatures  Magnetic moment  
Type                     Compound  Tc/K          Tn/K          B [T]            
Inorganic                BiFeO3    1100          643                            
Inorganic                 LaCrO3   257           150           0.1 mT           
Organic                  LaCrO2                  10            500              
Inorganic                Gd                      294           659 T            
* This table is nothing                                                         




[]

## Model Creation

We want to retrieve the Curie temperatures, Tc, from the table. To define a suitable model, we subclass one of the base model types. In our case, ``TemperatureModel`` is the right choice, since it automatically assumes units of temperature. Alternatively, the more general ``BaseModel`` can be used for any property. We can also import parsing elements from CDE, such as ``I``, ``W``, ``R``, and ``Optional``, to build parse expressions.

A ``specifier`` is the only mandatory element for the new model.

Finally, due to the current structure of the CDE code, we need to add the newly defined model to the ``Compound`` model (this step will be removed in the near future):

In [2]:
from chemdataextractor.model.units.temperature import TemperatureModel
from chemdataextractor.parse.elements import I
from chemdataextractor.model.model import Compound
from chemdataextractor.model.base import ListType, ModelType

class CurieTemperature(TemperatureModel):
    specifier = I('TC')

Compound.CurieTemperature = ListType(ModelType(CurieTemperature))
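
To see why the specifier ``I('TC')`` selects the ``Tc/K`` column and not ``Tn/K``, the following is a minimal plain-Python sketch of case-insensitive token matching. It does not call CDE. The helpers ``tokenize`` and ``matches_specifier`` are hypothetical, and the way header cells are split into tokens is an assumption made for illustration, not CDE's actual tokenizer.

```python
import re


def tokenize(cell):
    """Split a header cell into word and symbol tokens (assumed tokenization)."""
    return re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", cell)


def matches_specifier(cell, specifier):
    """Return True if any token equals the specifier, ignoring case."""
    return any(token.lower() == specifier.lower() for token in tokenize(cell))


# Header cells taken from the example table above.
headers = ["Tc/K", "Tn/K", "B [T]"]
matching = [h for h in headers if matches_specifier(h, "TC")]
print(matching)
```

Only the ``Tc/K`` header contains a token equal to ``TC`` when case is ignored, so only that column is treated as Curie temperature data in this sketch.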

We then parse the table by passing the newly created models to ``Table`` as a list. Alternatively, we can add the new model to the ``chemdataextractor.model.model`` module, in which case it is used automatically for every table loaded into CDE.

In [3]:
table = Table(models=[CurieTemperature],
              caption=[], 
              table_data=path, 
              use_notes_in_first_col=False, use_title_row=False)
table.records

[{'names': ['BiFeO3'],
  'CurieTemperature': [{'raw_value': '1100',
    'raw_units': 'K',
    'value': [1100.0],
    'units': 'Kelvin^(1.0)'}]},
 {'names': ['LaCrO3'],
  'CurieTemperature': [{'raw_value': '257',
    'raw_units': 'K',
    'value': [257.0],
    'units': 'Kelvin^(1.0)'}]}]
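
Once records are available, they can be post-processed like ordinary Python data. The sketch below assumes the records have the plain-dictionary shape shown in the output above and copies those two records literally. The helper ``curie_temperatures`` is a hypothetical name, not part of CDE.

```python
# Records copied from the output above, assumed to be plain dictionaries.
records = [
    {"names": ["BiFeO3"],
     "CurieTemperature": [{"raw_value": "1100", "raw_units": "K",
                           "value": [1100.0], "units": "Kelvin^(1.0)"}]},
    {"names": ["LaCrO3"],
     "CurieTemperature": [{"raw_value": "257", "raw_units": "K",
                           "value": [257.0], "units": "Kelvin^(1.0)"}]},
]


def curie_temperatures(records):
    """Map each compound name to its first extracted Curie temperature value."""
    result = {}
    for record in records:
        for name in record.get("names", []):
            for entry in record.get("CurieTemperature", []):
                if entry.get("value"):
                    result[name] = entry["value"][0]
                    break
    return result


tc_by_compound = curie_temperatures(records)
print(tc_by_compound)
```

Compounds without a Curie temperature entry are simply skipped, which mirrors the fact that only the two rows with a Tc value produced records.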

## Advanced Features

We can add custom fields to the model, which are then parsed automatically. For each field we specify its data type (``StringType``, ``FloatType``, ...) and provide a ``parse_expression`` composed of parse elements, like all other parse expressions in ChemDataExtractor.

These fields can be made required (``required=True``) if needed, or left optional (``required=False``, the default). In the second example below, the required field ``something_else`` is not found in the table, so no records are returned.

In [4]:
from chemdataextractor.model.base import StringType

class CurieTemperature(TemperatureModel):
    specifier = I('TC')
    label = StringType(parse_expression=I('inorganic'))
    
table = Table(models=[CurieTemperature],
              caption=[], 
              table_data=path, 
              use_notes_in_first_col=False, use_title_row=False)
table.records

[{'names': ['BiFeO3'],
  'CurieTemperature': [{'raw_value': '1100',
    'raw_units': 'K',
    'value': [1100.0],
    'units': 'Kelvin^(1.0)',
    'label': 'Inorganic'}]},
 {'names': ['LaCrO3'],
  'CurieTemperature': [{'raw_value': '257',
    'raw_units': 'K',
    'value': [257.0],
    'units': 'Kelvin^(1.0)',
    'label': 'Inorganic'}]}]

In [5]:
class CurieTemperature(TemperatureModel):
    specifier = I('TC')
    label = StringType(parse_expression=I('inorganic'))
    something_else = StringType(parse_expression=I('TableDataExtractor'), 
                                required=True)
    
table = Table(models=[CurieTemperature],
              caption=[], table_data=path, 
              use_notes_in_first_col=False, use_title_row=False)
table.records 

[]
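
The empty result is consistent with how required fields behave: a record is kept only if every required field was found. The following is a minimal sketch of that filtering logic in plain Python, under the assumption that each candidate record is a dictionary of the fields that were found. ``keep_record`` and the ``fields`` mapping are hypothetical and do not reflect CDE's internal implementation.

```python
def keep_record(record, fields):
    """Keep a record only if every required field has a value."""
    return all(record.get(name) is not None
               for name, required in fields.items() if required)


# Field name -> whether it is required (mirrors the model in the cell above).
fields = {"label": False, "something_else": True}

# Candidate records before filtering; 'something_else' was never found.
candidates = [
    {"names": ["BiFeO3"], "raw_value": "1100", "label": "Inorganic"},
    {"names": ["LaCrO3"], "raw_value": "257", "label": "Inorganic"},
]

kept = [r for r in candidates if keep_record(r, fields)]
print(kept)

# Making the field optional keeps both records again.
fields["something_else"] = False
kept_optional = [r for r in candidates if keep_record(r, fields)]
print(len(kept_optional))
```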

## Future

1. Models will be updated automatically based on what is found in the document. For example, the model specifier will be updated on the fly, based on the specifier obtained with the *definitions* feature in CDE.

2. Interdependency resolution for partial records, for example when the compound is found in the caption of the table, or when a required model label is found as separate data or in a different table.

3. New elements of the model (not manually defined by the user) can be suggested probabilistically, based on what has usually been found in the document. For example, the pressure can be added to a record of the glass transition temperature.

4. Automated creation of new models, based on the *definitions*.
