<!---
Blue Brain Search is a text mining toolbox focused on scientific use cases.

Copyright (C) 2020  Blue Brain Project, EPFL.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
-->

# Attribute Extraction Demo

In [None]:
import spacy

from bluesearch.mining import AttributeExtractor, AttributeAnnotationTab, TextCollectionWidget

In [None]:
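# Replace the "<url>" placeholders with the addresses of the running
# CoreNLP and Grobid Quantities servers.
core_nlp_url = "<url>"
grobid_quantities_url = "<url>/service/processQuantityText"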
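
The Grobid Quantities URL above consists of a server address followed by the fixed path `/service/processQuantityText`. A minimal sketch of building that URL from a base address is shown below. The helper name `grobid_quantities_endpoint` and the address `http://localhost:8060` are assumptions made for illustration only; they are not part of `bluesearch`, and the actual server address depends on the deployment.

```python
from urllib.parse import urljoin

# Endpoint path as used in the cell above.
GROBID_QUANTITIES_PATH = "service/processQuantityText"


def grobid_quantities_endpoint(base_url: str) -> str:
    """Append the quantity-text endpoint path to a server base URL.

    Hypothetical helper for illustration; not part of bluesearch.
    """
    # Normalise to exactly one trailing slash so that urljoin keeps
    # any path prefix already present in base_url.
    return urljoin(base_url.rstrip("/") + "/", GROBID_QUANTITIES_PATH)


# Assumed local deployment; replace with the real server address.
example_url = grobid_quantities_endpoint("http://localhost:8060")
print(example_url)
```

Normalising the trailing slash means the same result is produced whether or not the base address ends with `/`.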

In [None]:
entity_model = spacy.load("en_ner_craft_md")

In [None]:
attribute_extractor = AttributeExtractor(
    core_nlp_url,
    grobid_quantities_url,
    entity_model)

# Widget

The easiest way to explore attribute extraction is the `TextCollectionWidget`, which lets you interactively inspect the annotations for several texts.

In [None]:
texts = [
    "To a stirred solution of 39 (108 mg) in THF (2 mL) was added 1 M THF solution of TBAF-AcOH (1:1, 300 lL, 0.30 mmol) at ice-water temperature, and the mixture was stirred at the same temperature for 1 h and then at room temperature for 5 h. The solvent was removed under reduced pressure, and the resultant residue was purified by preparative thin layer chromatography (Merck, 113895) (methanol/chloroform, 1:10) to give 20 (181 mg, 34% from 38) as a colorless oil.",
    "The first generation of selective inhibitors for ACE2 have been designed and synthesised (table 1) [15, 16] . A series of non-peptide compounds were constructed based upon the ACE2 substrate consensus P-X(1-3)-P 1 -fl-X hydrophobic and the requirement of a centrally located carboxylate to co-ordinate with the zinc ion [15] . This lead resulted in the synthesis of an inhibitor (MLN-4760, table 1) possessing sub-nanomolar affinity (IC 50 , 50% inhibitor concentration 0.44 nM) for ACE2 and 220,000-and 22,000fold less affinity for human tACE and bovine carboxypeptidase A, respectively [15] .",
    "We extended our validated protocol to assess the delivery of aerosolized yeast and spores. Eight A/Jcr mice per group were exposed to aerosolized H99 (a), KN99 (a), or a yeast-spore mixture obtained from mated mixtures in the whole-body exposure chamber for one hour at standardized aerobiology conditions (13 lpm air flow-rate, 19 PSI, and 70% RH). Four mice per group were humanly euthanized and lung, spleen, and brain tissues were sterilely collected at one hour and three weeks post exposure and CFUs were determined as previously described.",
]

In [None]:
TextCollectionWidget(texts, attribute_extractor, entity_model)

# Manual

In [None]:
text = texts[0]

The `AttributeAnnotationTab` widget can be used to display an annotation summary for a given text.

In [None]:
AttributeAnnotationTab(attribute_extractor, entity_model, text)

Alternatively, the methods of the `AttributeExtractor` class can be called directly to extract the attribute information manually.

In [None]:
df_attributes = attribute_extractor.extract_attributes(text)
df_attributes

In [None]:
# Get the measurements found by Grobid Quantities, then annotate them in the text.
measurements = attribute_extractor.get_grobid_measurements(text)
annotated_text = attribute_extractor.annotate_quantities(text, measurements, width=70)
annotated_text