### Reading in the data

A quick example: load the training inputs and targets with pandas.

In [1]:
import pandas as pd

In [3]:
train_inputs_df = pd.read_csv("train-inputs.csv")
train_targets_df = pd.read_csv("train-targets.csv")

With the train inputs and targets read in, let's inspect the first few rows of each.

In [4]:
train_inputs_df.head()

Unnamed: 0.1,Unnamed: 0,ReviewID,PMID,Title,Abstract
0,0,CD007016,11849419,Is peritoneal dialysis adequate for hypercatab...,Peritoneal dialysis (PD) is a therapeutic opti...
1,1,CD007119,16809729,"Phase II trial of alfimeprase, a novel-acting ...","Alfimeprase is a recombinantly produced, genet..."
2,2,CD007119,15178717,Dose-ranging trial with a recombinant urokinas...,Recombinant urokinase (r-UK) is a high-molecul...
3,3,CD007119,11487675,Recombinant tissue plasminogen activator (alte...,Central venous access devices (CVADs) are a ma...
4,4,CD007119,14742778,Alteplase for central catheter clearance: 1 mg...,


In [5]:
train_targets_df.head()

Unnamed: 0.1,Unnamed: 0,ReviewID,Target
0,0,CD007016,"At present, there is insufficient RCT evidence..."
1,1,CD007119,There is inadequate evidence to draw strong co...
2,2,CD006525,Collaborative care is associated with signific...
3,3,CD005571,Administration of systemic prophylactic antibi...
4,4,CD003390,The limited available evidence suggests folate...
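
The `Unnamed: 0.1` and `Unnamed: 0` columns in both frames look like row indices that were written back into the CSV on earlier saves. Assuming that reading is right and they carry no information, they can be dropped after loading. Here is a minimal sketch on a small in-memory CSV standing in for `train-targets.csv`; the sample rows and the `drop_unnamed` helper are illustrative, not part of the dataset.

```python
import io

import pandas as pd

# Tiny stand-in for train-targets.csv with the same header as the real file.
# The rows are placeholders, not real data.
sample_csv = io.StringIO(
    "Unnamed: 0.1,Unnamed: 0,ReviewID,Target\n"
    "0,0,CD007016,first summary\n"
    "1,1,CD007119,second summary\n"
)


def drop_unnamed(df: pd.DataFrame) -> pd.DataFrame:
    """Return df without the columns whose name starts with 'Unnamed'."""
    keep = [c for c in df.columns if not str(c).startswith("Unnamed")]
    return df[keep]


targets = drop_unnamed(pd.read_csv(sample_csv))
print(targets.columns.tolist())
```

The same helper could be applied to `train_inputs_df` and `train_targets_df` right after `pd.read_csv`, if the leftover index columns get in the way.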


The key field here is `ReviewID`, which identifies the Cochrane systematic review. So let's look at a particular review, `CD007119`, which is available [on PubMed](https://pubmed.ncbi.nlm.nih.gov/22513946/) and in the [Cochrane Library](https://www.cochranelibrary.com/cdsr/doi/10.1002/14651858.CD007119.pub2/full).

In [14]:
print(train_targets_df[train_targets_df['ReviewID'] == "CD007119"]["Target"].values[0])

There is inadequate evidence to draw strong conclusions on the efficacy or safety of the drug interventions included in this review. There is some low quality evidence from a meta-analysis of two studies investigating urokinase (various strengths) and some very low evidence from two single studies investigating alteplase 2 mg/2 mL that suggest that these two drug interventions may be effective in treating withdrawal or total occlusion of CVC lumens caused by thrombosis. Further high quality, sufficiently powered research is still required to look at the efficacy and safety of urokinase, alteplase and other chemical, surgical and drug interventions for treating CVC lumen occlusion. Research studies which exclusively include child participants are especially warranted.


This is the summary of the evidence that the review authors reached on the basis of the following *inputs*:

In [16]:
inputs7119 = train_inputs_df[train_inputs_df['ReviewID'] == "CD007119"]
inputs7119.shape

(7, 5)

So we have 7 studies as inputs for this review. Here are their PMIDs, titles, and abstracts (`head()` shows only the first five rows):

In [18]:
inputs7119.head()

Unnamed: 0.1,Unnamed: 0,ReviewID,PMID,Title,Abstract
1,1,CD007119,16809729,"Phase II trial of alfimeprase, a novel-acting ...","Alfimeprase is a recombinantly produced, genet..."
2,2,CD007119,15178717,Dose-ranging trial with a recombinant urokinas...,Recombinant urokinase (r-UK) is a high-molecul...
3,3,CD007119,11487675,Recombinant tissue plasminogen activator (alte...,Central venous access devices (CVADs) are a ma...
4,4,CD007119,14742778,Alteplase for central catheter clearance: 1 mg...,
5,5,CD007119,7878629,Urokinase versus recombinant tissue plasminoge...,Fifty dysfunctional central venous catheters p...
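
Since `ReviewID` links each target to its input studies, one way to line the two up is to group the inputs by review and look up the target under the same key. The sketch below uses toy frames with the same column names as above; the rows are made up. It also assumes that a blank `Abstract`, like the one for PMID 14742778, is a missing value that can be treated as an empty string.

```python
import pandas as pd

# Toy frames with the same column names as the real data; rows are placeholders.
inputs = pd.DataFrame({
    "ReviewID": ["R1", "R1", "R2"],
    "PMID": [101, 102, 201],
    "Title": ["t1", "t2", "t3"],
    "Abstract": ["a1", None, "a3"],  # None mimics a study with no abstract
})
targets = pd.DataFrame({"ReviewID": ["R1", "R2"], "Target": ["s1", "s2"]})

# Series mapping each ReviewID to its target summary.
target_by_review = targets.set_index("ReviewID")["Target"]

examples = {}
for review_id, group in inputs.groupby("ReviewID"):
    # One string per study: title plus abstract, with missing abstracts as "".
    studies = (group["Title"] + " " + group["Abstract"].fillna("")).str.strip().tolist()
    examples[review_id] = {
        "studies": studies,
        "target": target_by_review[review_id],
    }

print(len(examples["R1"]["studies"]))
```

Each entry in `examples` then holds all studies for one review next to the summary they are meant to support. Whether to concatenate titles and abstracts this way is a design choice, not something the dataset prescribes.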
