# Subtask 2
## Find evidence for or against specific drugs for the treatment of COVID-19

The goal of this subtask is to find evidence for or against specific drugs for the treatment of COVID-19 within the OGER-annotated LitCovid dataset. Either the abstract dataset or the full-text dataset can be used.

In [2]:
import requests
import io
import pandas as pd
from operator import itemgetter

The following drugs in particular will be considered: 

| Drug | Concept IDs|  
| :---- | :----- |  
|[hydroxychloroquine](https://en.wikipedia.org/wiki/Hydroxychloroquine) | `RxNorm:5521, CHEBI:5801, MeSH:D006886, UMLS:C0020336`|  
|[remdesivir](https://en.wikipedia.org/wiki/Remdesivir)| `RxNorm:2284718, MeSH:C000606551, UMLS:C4726677, CHEBI:145994`|  
|[favipiravir (Avigan)](https://en.wikipedia.org/wiki/Favipiravir)| `CHEBI:134722, MeSH:C462182, UMLS:C1138226`|  

Define the concept ID lists for each drug and for COVID-19. Unlike in the table, only the CHEBI IDs keep their source prefix in these lists.

In [137]:
hyd_ids = ['5521', 'CHEBI:5801', 'D006886', 'C0020336']
rem_ids = ['2284718', 'C000606551', 'C4726677', 'CHEBI:145994']
avi_ids = ['CHEBI:134722', 'C462182', 'C1138226']

cov_ids = ['D000086382', 'D000086402', 'C000657245', 'C000656484']  # COVID-19 and SARS-CoV-2
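The lists above can also be derived from the prefixed IDs in the table instead of being typed by hand. The following is a minimal sketch, assuming (from the lists above) that only CHEBI IDs keep their prefix; `normalise_concept_id`, `table_ids` and `drug_ids` are hypothetical names introduced for illustration.

```python
# Hypothetical helper: convert a prefixed ID from the table into the form
# used in the ID lists (assumption: only CHEBI IDs keep their prefix).
def normalise_concept_id(concept_id):
    prefix, _, local_id = concept_id.partition(':')
    if prefix == 'CHEBI' or not local_id:
        return concept_id
    return local_id

# Prefixed IDs copied from the table above
table_ids = {
    'hydroxychloroquine': ['RxNorm:5521', 'CHEBI:5801', 'MeSH:D006886', 'UMLS:C0020336'],
    'remdesivir': ['RxNorm:2284718', 'MeSH:C000606551', 'UMLS:C4726677', 'CHEBI:145994'],
    'favipiravir': ['CHEBI:134722', 'MeSH:C462182', 'UMLS:C1138226'],
}

# One list of normalised IDs per drug, so later steps can loop over all drugs
drug_ids = {drug: [normalise_concept_id(i) for i in ids]
            for drug, ids in table_ids.items()}
print(drug_ids['remdesivir'])
```

Keeping the IDs in a single dictionary makes it possible to run the same sentence search for every drug in turn.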


The following function reconstructs a sentence from the token offsets in the OGER output.

In [132]:
# Reconstruct a sentence, given the OGER .tsv output as a pandas DataFrame and the sentence ID
def get_sentence(df, sent_id):
    # keep one row per token: a token with several annotations can appear in more than one row
    _df = df.drop_duplicates(subset=['start_position'])
    offsets = _df[_df["sentence_id"]==sent_id][['start_position','end_position','matched_term']].to_records(index=False)
    
    # sentence boundaries: smallest start offset and largest end offset
    max_offset = max(offsets,key=itemgetter(1))[1]
    min_offset = min(offsets,key=itemgetter(0))[0]
    
    # create a list of whitespace characters of the sentence length and fill in the words
    sent_len = max_offset-min_offset
    chars = list(" "*(sent_len))
    for start_pos, end_pos, word in offsets:
        chars[(start_pos-min_offset):(end_pos-min_offset)] = list(word)
    sent = "".join(chars)
    return sent
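To illustrate the offset-filling idea without a DataFrame, here is a minimal sketch on plain tuples. The helper `rebuild_text` and the offsets in `toy_tokens` are invented for illustration and are not taken from the OGER output.

```python
def rebuild_text(tokens):
    """Rebuild a text span from (start, end, term) tuples with document-wide offsets."""
    tokens = list(tokens)
    min_offset = min(start for start, _, _ in tokens)
    max_offset = max(end for _, end, _ in tokens)
    # start from blanks; positions not covered by any token stay as spaces
    chars = [' '] * (max_offset - min_offset)
    for start, end, term in tokens:
        chars[start - min_offset:end - min_offset] = term
    return ''.join(chars)

# Invented offsets: the sentence is assumed to start at character 100 of the document
toy_tokens = [(100, 110, 'Remdesivir'), (111, 114, 'for'),
              (115, 123, 'Covid-19'), (123, 124, '.')]
print(rebuild_text(toy_tokens))
```

Subtracting the smallest start offset shifts the document-wide positions so that the sentence begins at index 0.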

### Use OGER to annotate the articles
Identify PubMed articles that mention both the drug of interest and COVID-19. Here we use the article 'Remdesivir for the Treatment of Covid-19 - Final Report' (`PubMed:32445440`).

In [None]:
COVID19_IDS = cov_ids
DRUG_IDS = rem_ids

In [133]:
url = 'https://pub.cl.uzh.ch/projects/ontogene/oger/fetch/pubmed/text_tsv/32445440'

In [134]:
req = requests.get(url)
req.raise_for_status()  # stop early if the download failed

df = pd.read_csv(io.StringIO(req.text), sep='\t')
df.columns = [c.lower().replace(' ', '_') for c in df.columns]

Find all sentences that refer to both COVID-19 and the drug of interest. 

In [135]:
sent_ids = df.sentence_id.unique()
found_sentences = []
for sent_id in sent_ids:
    # Check whether the sentence mentions the drug as well as COVID-19
    entity_ids = df[df['sentence_id']==sent_id]['entity_id']
    drug = entity_ids.isin(DRUG_IDS).any()
    covid = entity_ids.isin(COVID19_IDS).any()
    if drug and covid:
        sent = get_sentence(df, sent_id)
        print(sent_id, sent)
        found_sentences.append(sent)

S1 Remdesivir for the Treatment of Covid-19 - Final Report.
S3 We conducted a double-blind, randomized, placebo-controlled trial of intravenous remdesivir in adults who were hospitalized with Covid-19 and had evidence of lower respiratory tract infection.
S11 Our data show that remdesivir was superior to placebo in shortening the time to recovery in adults who were hospitalized with Covid-19 and had evidence of lower respiratory tract infection.
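
The per-sentence check above can also be written without an explicit loop. The following is a minimal sketch using a pandas groupby, assuming the same `sentence_id` and `entity_id` columns. The function name `sentences_with_both` and the rows of the toy DataFrame are invented for illustration.

```python
import pandas as pd

def sentences_with_both(df, drug_ids, covid_ids):
    """Return the sentence IDs annotated with both a drug ID and a COVID-19 ID."""
    # one boolean per sentence: does any of its rows carry a matching entity ID?
    has_drug = df['entity_id'].isin(drug_ids).groupby(df['sentence_id'], sort=False).any()
    has_covid = df['entity_id'].isin(covid_ids).groupby(df['sentence_id'], sort=False).any()
    both = has_drug & has_covid
    return both[both].index.tolist()

# Invented annotation rows: only S1 and S3 contain both kinds of ID
toy = pd.DataFrame({
    'sentence_id': ['S1', 'S1', 'S2', 'S3', 'S3'],
    'entity_id': ['CHEBI:145994', 'D000086382', 'CHEBI:145994',
                  'D000086382', 'C000606551'],
})
matches = sentences_with_both(toy, ['CHEBI:145994', 'C000606551'], ['D000086382'])
print(matches)
```

The returned IDs can then be passed to `get_sentence` one by one, exactly as in the loop above.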


In [136]:
for s in found_sentences:
    print(s, "\n")

Remdesivir for the Treatment of Covid-19 - Final Report. 

We conducted a double-blind, randomized, placebo-controlled trial of intravenous remdesivir in adults who were hospitalized with Covid-19 and had evidence of lower respiratory tract infection. 

Our data show that remdesivir was superior to placebo in shortening the time to recovery in adults who were hospitalized with Covid-19 and had evidence of lower respiratory tract infection. 

