# Extracting the overall response rate (ORR) 

This is an advanced notebook. If you are new to IRuta notebooks, please start with `Notebook [IRuta] - Get started.ipynb`.

This notebook shows how **batch processing** works. In batch processing, Ruta rules are applied to a collection of documents. In this example, the documents are 100 PubMed abstracts. The goal is to extract the overall response rate (ORR), defined as the proportion of patients who have a partial or complete response to therapy. The results are collected in a CSV table.
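
To make the definition concrete, here is a minimal Python sketch of the arithmetic (Python is used only for illustration; it is not part of the Ruta pipeline). The function name `overall_response_rate` is hypothetical. The numbers come from one of the abstracts processed in this notebook (`22044606.txt`), which reports one complete response and 19 partial responses among 61 evaluable patients.

```python
def overall_response_rate(complete: int, partial: int, evaluable: int) -> float:
    """Return the ORR as a percentage of evaluable patients."""
    if evaluable <= 0:
        raise ValueError("evaluable must be positive")
    return 100.0 * (complete + partial) / evaluable


# 1 complete response + 19 partial responses out of 61 evaluable patients
orr = overall_response_rate(complete=1, partial=19, evaluable=61)
print(f"ORR = {orr:.1f}%")  # prints "ORR = 32.8%"
```

The result agrees with the ORR of 32.8% stated in that abstract.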

## Step 1: Extracting sentences, ORR-indicators and numbers
We read the files from an input directory, so all text files in that directory are processed. For each file, the script:
1. Extracts sentences
2. Annotates mentions of the overall response rate (plus synonyms) and finds numeric values
3. Collects sentences that contain an overall response rate indicator in a table

We write the intermediate results into a new directory by setting the output directory.
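
The sentence detection idea can be sketched outside of Ruta as follows. This is a simplified Python approximation, not the rule set used in the cell below: it assumes that a period ends a sentence unless it sits between two digits (as in "5.6") or is followed by a lowercase word. The function name `split_sentences` is hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split text at periods that are plausible sentence ends."""
    sentences = []
    start = 0
    for match in re.finditer(r"\.", text):
        i = match.start()
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1:i + 2]
        rest = text[i + 1:].lstrip()
        # A period between two digits is a decimal point, e.g. "5.6".
        if before.isdigit() and after.isdigit():
            continue
        # A period followed by a lowercase word is treated as an abbreviation.
        if rest[:1].islower():
            continue
        sentence = text[start:i].strip()
        if sentence:
            sentences.append(sentence)
        start = i + 1
    # Keep any trailing text that has no closing period.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


sample = "The ORR was 62.5%. All patients responded. Median PFS was 5.6 months vs. placebo."
for sentence in split_sentences(sample):
    print(sentence)
```

As in the Ruta script, the returned sentences exclude the closing period.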

In [1]:
%inputDir input/pubmed_abstracts/txt100/
%outputDir output/orr_stage1/
%displayMode CSV
%csvConfig SentenceWithORRInd

// Detect sentence ends. Avoid interpreting a dot in a number "5.6" as sentence end.
DECLARE SentEnd, Sentence;
BLOCK(Sentence) Document{}{
    PERIOD{-> SentEnd};
    NUM se:SentEnd{-> UNMARK(se)} NUM;
    se:SentEnd{-> UNMARK(se)} SW;
    SW se:SentEnd{-> UNMARK(se)} NUM _{-PARTOF(PERIOD)};
    ANY+{-PARTOF(Sentence),-PARTOF(SentEnd)-> Sentence};
}

// Find mentions of the overall response rate, treat brackets correctly, detect numbers
DECLARE ORRInd, Value;
DECLARE ORR (Annotation keyword, Value value, Sentence sentence);
WORDLIST orrList = 'wordlists/orr_indicators.txt';
MARKFAST(ORRInd, orrList);
i1:ORRInd{->i1.end=p2.end, UNMARK(i2)} p1:"(" i2:ORRInd p2:")";
(NUM{-PARTOF(Value)} (PERIOD NUM)? SPECIAL.ct=="%"){-> Value};
n:NUM SPECIAL.ct=="%"? SPECIAL.ct=="-" v:@Value{-> v.begin = n.begin};

// Find sentences that contain an overall response rate indicator
DECLARE SentenceWithORRInd;
(ANY ANY @Sentence{CONTAINS(ORRInd)} ANY ANY){-> SentenceWithORRInd};

// Highlight relevant entities within the table.
COLOR(ORRInd, "lightgreen");
COLOR(Value, "pink");

Processed 100/100 files. (took 2s)
205 rows created.


0,1
21723792.txt,"ineligible. The primary endpoint was investigator-assessed overall response rate (ORR); secondary endpoints were PFS, overall survival (OS), safety, and quality of life. RESULTS"
21723792.txt,"). The ORRs were 48.9% (95% confidence interval [CI], 38.5%-59.5%) and 58.7% (95% CI, 47.9%-68.9%; P = .117"
21723792.txt,G. CONCLUSION: The addition of gemcitabine to PB was not associated with a statistically significant improvement in ORR. Treatment
21399997.txt,"2010. The primary endpoint was overall response rate (ORR), and secondary endpoints were overall survival (OS) and progression-free survival (PFS). Of"
21399997.txt,"). Of the 40 evaluable female patients, the ORR was 62.5%. All"
22044606.txt,PSOC. The primary objective of this subsequent phase II study was to determine the overall response rate (ORR; defined by Response Evaluation Criteria in Solid Tumors) of this combination in patients with recurrent PSOC. Secondary
22044606.txt,"treated. Of the 61 patients evaluable for response, there were 20 responders (one complete response and 19 partial responses), for an ORR of 32.8% (95% CI: 21.3%, 46.0%). For"
18981463.txt,"periods. End points included progression-free survival (PFS), objective response rate (ORR), overall survival (OS), and safety. RESULTS"
18815728.txt,weeks. The primary endpoint was overall response rate (ORR). Between
18815728.txt,"8. By intent-to-treat analysis, ORR was 21.1% (95% CI, 8.7-43.7) and disease control rate was 52.6% (95% CI 31.5-72.8) with four PRs and six SDs. Median"


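## Step 2: Collecting ORR mentions with their values in a CSV table

This cell reads the intermediate results of Step 1. Within each sentence, it pairs an ORR indicator with a percentage value that directly follows or precedes it, optionally separated by a single lowercase word (`SW`) such as "was" or "of". The resulting `ORR` annotations are saved to `output/orr_result_table.csv`.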

In [2]:
%inputDir output/orr_stage1
%outputDir output/orr_stage2
%csvConfig ORR keyword value sentence
%saveCSV output/orr_result_table.csv
s:Sentence->{
    (i:ORRInd v:Value){-> CREATE(ORR, "keyword"=i, "value"=v, "sentence"=s)};
    (v:Value i:@ORRInd){-> CREATE(ORR, "keyword"=i, "value"=v, "sentence"=s)};
    (i:ORRInd SW v:Value){-> CREATE(ORR, "keyword"=i, "value"=v, "sentence"=s)};
    };


Processed 100/100 files. (took 2s)
78 rows created.


0,1,2,3,4
16360546.txt.xmi,ORR of 53-62%,ORR,53-62%,"Carboplatin combined with paclitaxel or docetaxel is more effective than carboplatin or taxanes alone, with ORR of 53-62%"
16197624.txt.xmi,ORR was 33%,ORR,33%,"RESULTS: In the pooled series, ORR was 33% (95% confidence interval [CI], 27%-39%)"
18720480.txt.xmi,overall response rate (ORR) was 16%,overall response rate (ORR),16%,"The overall response rate (ORR) was 16% and included 3 complete responses that lasted 10.8 months, > or =32 months, and > or =36 months and 2 partial responses that lasted 13 months and 14 months"
18815728.txt.xmi,ORR was 21.1%,ORR,21.1%,"By intent-to-treat analysis, ORR was 21.1% (95% CI, 8.7-43.7) and disease control rate was 52.6% (95% CI 31.5-72.8) with four PRs and six SDs"
21823829.txt.xmi,ORR was 4%,ORR,4%,The ORR was 4% (one partial response)
19804901.txt.xmi,overall response rate (ORR) was 60%,overall response rate (ORR),60%,The overall response rate (ORR) was 60% (CR 25% and PR 35%)
21561763.txt.xmi,ORR was 3.9%,ORR,3.9%,"In the intention to treat population (n=51); the ORR was 3.9%, median PFS was 1.28 months [95% CI, 1.18-1.90], median OS was 5.81 months [95% CI, 3.48-12.32], the estimated one-year survival rate was 23.7% [95%CI: 12.8-36.5]"
18669461.txt.xmi,ORR was 23.0%,ORR,23.0%,"The ORR was 23.0% (95% CI, 13.2% to 35.5%), median PFS was 30.4 weeks (95% CI, 18.3 to 36.7 weeks), median DR was 44.1 weeks (95% CI, 25.0 to 102.7 weeks), and median OS was 47.1 weeks (95% CI, 36.9 to 79.4 weeks)"
20581887.txt.xmi,ORR was 85%,ORR,85%,"ORR was 85% (CR 12%, very good PR 43% and PR 30%)"
21135284.txt.xmi,ORR of 44%,ORR,44%,"CONCLUSION: Amrubicin shows promising activity, with an ORR of 44% compared with an ORR of 15% for topotecan as second-line treatment in patients with SCLC sensitive to first-line platinum-based chemotherapy"
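
For downstream analysis, the extracted value strings can be converted into numbers. The following Python sketch is one possible post-processing step and not part of the notebook itself. It assumes the column order shown in the table above (file, covered text, keyword, value, sentence) and that values look like `33%`, `21.1%`, or `53-62%`. The names `parse_value` and `read_orr_values` are hypothetical.

```python
import csv
import io
import re

# Three rows copied from the table above.
SAMPLE = '''\
16360546.txt.xmi,ORR of 53-62%,ORR,53-62%,"Carboplatin combined with paclitaxel or docetaxel is more effective than carboplatin or taxanes alone, with ORR of 53-62%"
21823829.txt.xmi,ORR was 4%,ORR,4%,The ORR was 4% (one partial response)
18815728.txt.xmi,ORR was 21.1%,ORR,21.1%,"By intent-to-treat analysis, ORR was 21.1% (95% CI, 8.7-43.7) and disease control rate was 52.6% (95% CI 31.5-72.8) with four PRs and six SDs"
'''

# A number, an optional "%", an optional "-number" range part, and a closing "%".
VALUE = re.compile(r"^(\d+(?:\.\d+)?)%?(?:-(\d+(?:\.\d+)?))?%$")


def parse_value(value: str) -> tuple[float, float]:
    """Return (low, high) in percent; a single value yields low == high."""
    match = VALUE.match(value.strip())
    if match is None:
        raise ValueError(f"unexpected value format: {value!r}")
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) else low
    return low, high


def read_orr_values(csv_text: str) -> list[tuple[str, tuple[float, float]]]:
    """Read (file name, parsed value) pairs from CSV text."""
    result = []
    for row in csv.reader(io.StringIO(csv_text)):
        file_name, value = row[0], row[3]
        result.append((file_name, parse_value(value)))
    return result


for file_name, (low, high) in read_orr_values(SAMPLE):
    print(file_name, low, high)
```

Representing every value as a range keeps single percentages and ranges such as `53-62%` in one uniform shape.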


## Additional Error Analysis: Sentences with ORR indicator, but without match
False positives (sentences where an ORR value was extracted incorrectly) are easy to spot in the table above. False negatives, by contrast, cannot be detected from that table alone. For that reason, it is useful to highlight similar keywords (e.g. "response rate", "RR", "rate of response") in sentences that did not yield an ORR match. This is done in the cell below.
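
The filtering idea can be sketched in Python as follows. This is a simplified illustration, not the Ruta logic itself: it uses plain substring matching and does not remove keywords that are already part of an ORR indicator. The function name `sentences_without_match` and its inputs are hypothetical; the example sentences are taken from the abstracts.

```python
import re

# Broader keywords that may point to a missed ORR mention.
BROAD_INDICATORS = ["RR", "response rate", "rate of response"]


def sentences_without_match(sentences: list[str], matched: set[str]) -> list[str]:
    """Return sentences that contain a broad indicator but yielded no ORR match."""
    pattern = re.compile("|".join(re.escape(k) for k in BROAD_INDICATORS))
    return [s for s in sentences if pattern.search(s) and s not in matched]


sentences = [
    "The primary end point was objective response rate (ORR)",
    "The ORR was 4% (one partial response)",
    "Median OS was 5.81 months",
]
# Sentences for which an ORR annotation was created in Step 2.
matched = {"The ORR was 4% (one partial response)"}

candidates = sentences_without_match(sentences, matched)
print(candidates)
```

Only the first sentence remains as a candidate for manual review: the second already has a match and the third contains no indicator.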

In [3]:
%inputDir output/orr_stage2
%outputDir output/orr_stage3
%csvConfig SentenceWithIndButNoORR

DECLARE Ind;
DECLARE SentenceWithIndButNoORR;

"RR" -> Ind;
"response rate" -> Ind;
"rate of response" -> Ind;
i:Ind{PARTOF(ORRInd) -> UNMARK(i)};
Sentence{CONTAINS(Ind),-CONTAINS(ORR)->SentenceWithIndButNoORR};

COLOR(Ind, "red");

Processed 100/100 files. (took 1s)
68 rows created.


0,1
21249514.txt.xmi,"We evaluated objective response rate (ORR), progression free survival (PFS), overall survival (OS), and toxicity profiles"
19533023.txt.xmi,The primary end point was objective response rate (ORR)
21697017.txt.xmi,"We observed modest activity with a central nervous system objective response rate of 13.3%; however, median PFS was disappointing"
21697017.txt.xmi,The primary endpoint was CNS objective response rate (ORR)
16360546.txt.xmi,"Clinical trials have shown that anthracycline-taxane combinations are more effective than anthracyclines or taxanes alone in terms of overall response rates (ORR), PFS and OS in women who have not received prior anthracycline chemotherapy"
21937232.txt.xmi,"Analysis included progression-free survival (PFS), overall survival (OS), objective response rate (ORR) and toxicity"
18936475.txt.xmi,"Secondary end points were progression-free survival (PFS), objective response rate (ORR), and safety"
21750555.txt.xmi,"Secondary endpoints included overall survival (OS), objective response rate (ORR), adverse events, and pharmacokinetics"
20723384.txt.xmi,"RESULTS: Among a total of 28 patients, the objective response rate (ORR) of erlotinib was 28.6%"
21456002.txt.xmi,"The primary objective of this phase 2 study was to determine the objective response rate (ORR) in patients who had received 1 previous systemic chemotherapy for unresectable/metastatic melanoma; secondary objectives were to evaluate the clinical response rate (CRR), progression-free survival (PFS), overall survival (OS), duration of response, safety, and pharmacokinetics"
