# Using a pretrained model

Import the required modules and parse the config file.

> Note: eventually I will settle on sensible defaults and remove this step.

In [None]:
from config import Config
from sequence_processor import SequenceProcessor

PATH_TO_CONFIG = './config.ini' # set the path to your config here

config = Config(PATH_TO_CONFIG) # parse config

Load everything we need to perform prediction with the __neural net__.

> Note: you have to unzip the pretrained model first if you have not already done so.

In [None]:
path_to_saved_model = '../pretrained_models/CRAFT'

# create a new SequenceProcessor object
sp = SequenceProcessor(config)

# load a previously saved model
sp.load(path_to_saved_model)
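
If the model directory does not exist yet, the unzip step mentioned in the note can also be done from Python before calling `sp.load()`. The following is a minimal sketch using only the standard library; the helper name `ensure_unpacked` and the archive file name in the usage comment are assumptions for illustration, not part of Saber.

```python
import shutil
from pathlib import Path


def ensure_unpacked(model_dir, archive_path):
    """Unpack archive_path next to model_dir if model_dir does not exist yet.

    Hypothetical helper: it assumes the archive contains a top-level
    directory with the same name as model_dir.
    """
    model_dir = Path(model_dir)
    if model_dir.is_dir():
        return model_dir  # already unpacked, nothing to do

    archive_path = Path(archive_path)
    if not archive_path.is_file():
        raise FileNotFoundError(f"No model directory or archive found: {archive_path}")

    # unpack_archive picks the format (.zip, .tar.gz, .tar.bz2, ...) from the extension
    shutil.unpack_archive(str(archive_path), str(model_dir.parent))

    if not model_dir.is_dir():
        raise FileNotFoundError(f"Archive did not contain a '{model_dir.name}' directory")
    return model_dir


# Usage (the archive name is an assumption; adjust it to the file you downloaded):
# ensure_unpacked('../pretrained_models/CRAFT', '../pretrained_models/CRAFT.zip')
```

Checking for the directory first keeps the call cheap when the model has already been unpacked.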

Perform the __prediction__, which returns a simple `json` string. Set `jupyter=True` to get a nicely formatted prediction in the notebook.

In [None]:
# raw text
abstract = '''The aim of this study was to establish the cut-off point of ultrasound (US) B-lines number for detecting the presence of significant interstitial lung disease (ILD) in patients with systemic sclerosis (SSc) (SSc-ILD) in relation to high-resolution computed tomography (HRCT) findings.Consecutive SSc-ILD patients underwent chest HRCT, lung US (LUS), pulmonary function test, and clinical assessment. Exclusion criteria were represented by the presence of a coexisting congestive heart failure and a clinical history suggestive of lung or pleural diseases. HRCT images were scored for the presence of ILD by 2 readers, in accordance with the Warrick scoring system. US assessment was performed by a US skilled rheumatologist, blinded to HRCT results and clinical data, and included the bilateral evaluation of 14 lung intercostal spaces (LIS). In each LIS, the number of B-lines was recorded and summed. To test discriminant validity, we used the receiver operating characteristic (ROC) curve analysis applying a Warrick score of 7 as external criterion for the presence of SSc-ILD.Forty patients completed the study. The US B-lines number and the Warrick score confirmed excellent correlation (Spearman rho: 0.958, P = .0001). The ROC curve analysis revealed that the presence of 10 US B-lines is the cut-off point with the greatest positive likelihood ratio (12.52) for the presence of significant SSc-ILD.The detection of 10 B-lines is highly predictive for the HRCT presence of significant SSc-ILD. In SSc patients, the LUS assessment as first imaging tool may represent an effective model to improve the correct timing of chest HRCT.'''

# perform prediction
annotation = sp.predict(abstract, jupyter=True)

__Saber__ takes about one tenth of a second to process an abstract of ~250 words.

In [None]:
%timeit sp.predict(abstract)
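
`%timeit` is an IPython magic and only works inside a notebook or IPython shell. In a plain Python script, a similar measurement can be sketched with the standard `timeit` module. The `predict` function below is a stand-in so that the example runs on its own; in practice you would pass `sp.predict` instead.

```python
import timeit


def predict(text):
    # Stand-in for sp.predict (assumption), so that this sketch is self-contained.
    return text.lower()


def seconds_per_call(func, text, number=10, repeat=3):
    """Return the best average time per call, in seconds."""
    # Each of the `repeat` runs calls func(text) `number` times.
    timings = timeit.repeat(lambda: func(text), number=number, repeat=repeat)
    return min(timings) / number


elapsed = seconds_per_call(predict, "An abstract of roughly 250 words ...")
print(f"{elapsed:.6f} s per call")
```

Taking the minimum over several runs, rather than the mean, reduces the influence of other processes on the measurement.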

The `predict()` method returns a `json`-formatted string for ease of use in downstream applications. For example:

In [None]:
import json

ann = json.loads(annotation) # parse the json string into a Python dict

In [None]:
ann['ents'] # get a list of annotated entities

In [None]:
ann['text'] # get the processed and normalized text that prediction was performed on
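
As one possible downstream use, the annotation can be checked and written to disk for later processing. This is a minimal sketch: `save_annotation` is a hypothetical helper, and the sample string is a stand-in for the output of `sp.predict()` that relies only on the two top-level keys used in this notebook, `text` and `ents`.

```python
import json
import tempfile
from pathlib import Path


def save_annotation(annotation, path):
    """Parse a json annotation string, check its top-level keys, and write it to path."""
    ann = json.loads(annotation)
    missing = {"text", "ents"} - ann.keys()
    if missing:
        raise KeyError(f"annotation is missing keys: {sorted(missing)}")
    # Pretty-print so that the saved file is easy to inspect by hand.
    Path(path).write_text(json.dumps(ann, indent=2, ensure_ascii=False), encoding="utf-8")
    return ann


# Stand-in for the string returned by sp.predict(). The entity list is left empty
# because the layout of each entry is not shown in this notebook.
sample = '{"text": "Lung ultrasound in systemic sclerosis.", "ents": []}'
out_path = Path(tempfile.gettempdir()) / "annotation.json"
ann = save_annotation(sample, out_path)
print(len(ann["ents"]), "entities in", len(ann["text"]), "characters")
```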