# Subtask 2
## Find evidence in support of or against specific drugs for the treatment of COVID-19

The goal of this subtask is to find evidence in support of or against specific drugs for the treatment of COVID-19 within the OGER-annotated LitCovid dataset. Either the abstract dataset or the full-text dataset can be used.

In [156]:
import requests
import io
import pandas as pd
from operator import itemgetter
import numpy as np

The following drugs in particular will be considered: 

| Drug | Concept IDs |
| :---- | :----- |
|[hydroxychloroquine](https://en.wikipedia.org/wiki/Hydroxychloroquine) | `RxNorm:5521, CHEBI:5801, MeSH:D006886, UMLS:C0020336`|
|[remdesivir](https://en.wikipedia.org/wiki/Remdesivir)| `RxNorm:2284718, MeSH:C000606551, UMLS:C4726677, CHEBI:145994`|
|[avigan (favipiravir)](https://en.wikipedia.org/wiki/Favipiravir)| `CHEBI:134722, MeSH:C462182, UMLS:C1138226`|

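Define the concept ID lists for each drug and for COVID-19. The IDs are written the way they appear in the OGER output: RxNorm, MeSH and UMLS IDs without a prefix, ChEBI IDs with the `CHEBI:` prefix.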

In [157]:
hyd_ids = ['5521', 'CHEBI:5801', 'D006886', 'C0020336']
rem_ids = ['2284718', 'C000606551', 'C4726677', 'CHEBI:145994']
avi_ids = ['CHEBI:134722', 'C462182', 'C1138226']

cov_ids = ['D000086382','D000086402','C000657245','C000656484'] # Covid-19 and SARS-CoV-2
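
When several drugs are checked in turn, it can help to keep the ID lists in one lookup table. The following is a minimal sketch, not part of the original notebook: the names `DRUG_ID_SETS` and `drugs_for_id` are hypothetical, and the IDs are copied from the table above.

```python
# Hypothetical lookup table (an assumption for illustration): drug name -> concept IDs.
DRUG_ID_SETS = {
    "hydroxychloroquine": ['5521', 'CHEBI:5801', 'D006886', 'C0020336'],
    "remdesivir": ['2284718', 'C000606551', 'C4726677', 'CHEBI:145994'],
    "avigan": ['CHEBI:134722', 'C462182', 'C1138226'],
}

def drugs_for_id(entity_id):
    """Return the names of all drugs whose ID list contains entity_id."""
    return [name for name, ids in DRUG_ID_SETS.items() if entity_id in ids]

print(drugs_for_id('D006886'))  # ['hydroxychloroquine']
```

With such a table, `DRUG_IDS` could be set by name, for example `DRUG_IDS = DRUG_ID_SETS["remdesivir"]`.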


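Helper functions to reconstruct a sentence from the OGER annotations: `rm_duplicates` removes duplicate and overlapping spans, and `get_sentence` writes the remaining matched terms at their character offsets.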

In [158]:
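def rm_duplicates(offset_words):
    # Drop exact duplicates first
    offset_words = np.unique(offset_words)
    r = []
    for b in offset_words:
        # A span is shadowed if a longer span shares its end or its start.
        # Building a new list avoids deleting from a list while iterating over it.
        shadowed = any(
            (a[1] == b[1] and a[0] < b[0]) or (a[0] == b[0] and a[1] > b[1])
            for a in offset_words
        )
        if not shadowed:
            r.append(b)
    return r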

In [159]:
# Function to create a sentence, given the OGER .tsv output as a pandas DataFrame and the sentence ID
def get_sentence(df, sent_id):
    #_df = df.drop_duplicates(subset=['start_position']) # instead use rm_duplicates()
    offsets = df[df["sentence_id"]==sent_id][['start_position','end_position','matched_term']].to_records(index=False)
    offsets = rm_duplicates(offsets)
    
    # Get the sentence boundaries: largest end offset and smallest start offset
    max_offset = max(offsets,key=itemgetter(1))[1]
    min_offset = min(offsets,key=itemgetter(0))[0]
    
    # Create list of spaces in the sentence length to fill up the words
    sent_len = max_offset-min_offset
    l = list(" "*(sent_len))
    for start_pos, end_pos, word in offsets:
        l[(start_pos-min_offset):(end_pos-min_offset)] = list(word)
    sent = "".join(l)
    return sent
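
The offset-filling idea can be checked by hand on a few spans. The following is a minimal pure-Python sketch of the same technique without pandas; the function name `fill_sentence` and the toy spans are assumptions made up for illustration.

```python
def fill_sentence(spans):
    """Rebuild a sentence from (start, end, text) spans by writing each text at its offset."""
    min_offset = min(start for start, _, _ in spans)
    max_offset = max(end for _, end, _ in spans)
    # Start from a row of spaces and overwrite the slice that each span covers
    chars = [" "] * (max_offset - min_offset)
    for start, end, text in spans:
        chars[start - min_offset:end - min_offset] = list(text)
    return "".join(chars)

# Toy spans (made up); offsets are document-level, so they do not start at 0.
toy_spans = [(100, 118, "Hydroxychloroquine"), (119, 122, "and"), (123, 135, "azithromycin")]
print(fill_sentence(toy_spans))  # Hydroxychloroquine and azithromycin
```

Positions not covered by any span stay as spaces, which is how the gaps between words are restored.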

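### Use OGER to annotate the articles
Identify articles from PubMed that mention both the drug of interest and COVID-19. Here we use the article 'Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial' - `PubMed:32205204`. To analyse 'Remdesivir for the Treatment of Covid-19 - Final Report' - `PubMed:32445440` instead, set `DRUG_IDS = rem_ids` and change the PubMed ID in the URL.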

In [160]:
COVID19_IDS = cov_ids
DRUG_IDS = hyd_ids

In [161]:
url = 'https://pub.cl.uzh.ch/projects/ontogene/oger/fetch/pubmed/text_tsv/32205204' # 32205204 - hyd, 32445440 - rem

In [162]:
req = requests.get(url)  

df = pd.read_csv(io.StringIO(req.text), sep='\t')
df.columns = [c.lower().replace(' ', '_') for c in df.columns]
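
Switching between articles only changes the PubMed ID at the end of the URL. The following is a small sketch of a URL builder; the helper name `oger_url` is hypothetical, and the base URL is the one used in this notebook.

```python
# Base URL as used in this notebook; the helper itself is an assumption for illustration.
OGER_BASE = 'https://pub.cl.uzh.ch/projects/ontogene/oger/fetch/pubmed/text_tsv/'

def oger_url(pmid):
    """Build the OGER fetch URL for a PubMed ID given as int or str."""
    pmid = str(pmid).strip()
    if not pmid.isdigit():
        raise ValueError(f"not a PubMed ID: {pmid!r}")
    return OGER_BASE + pmid

for pmid in (32205204, 32445440):  # the two articles named in this notebook
    print(oger_url(pmid))
```

Calling `req.raise_for_status()` before parsing the response would surface HTTP errors instead of passing an error page to `pd.read_csv`.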

In [163]:
df

Unnamed: 0,document_id,type,start_position,end_position,matched_term,preferred_form,entity_id,zone,sentence_id,origin,umls_cui
0,32205204,clinical_drug,0,18,Hydroxychloroquine,Hydroxychloroquine,5521,Title,S1,RxNorm,C0020336
1,32205204,chemical,0,18,Hydroxychloroquine,hydroxychloroquine,CHEBI:5801,Title,S1,ChEBI,CUI-less
2,32205204,chemical,0,18,Hydroxychloroquine,Hydroxychloroquine,D006886,Title,S1,MeSH desc (Chemicals and Drugs),C0020336
3,32205204,chemical,0,18,Hydroxychloroquine,Hydroxychloroquine,D006886,Title,S1,CTD (MESH),C0020336
4,32205204,,19,22,and,,,,S1,,
...,...,...,...,...,...,...,...,...,...,...,...
283,32205204,clinical_drug,1519,1531,azithromycin,Azithromycin,18631,CONCLUSION,S11,RxNorm,C0052796
284,32205204,chemical,1519,1531,azithromycin,Azithromycin,D017963,CONCLUSION,S11,MeSH desc (Chemicals and Drugs),C0052796
285,32205204,chemical,1519,1531,azithromycin,azithromycin,CHEBI:2955,CONCLUSION,S11,ChEBI,CUI-less
286,32205204,chemical,1519,1531,azithromycin,Azithromycin,D017963,CONCLUSION,S11,CTD (MESH),C0052796


Find all sentences that refer to both COVID-19 and the drug of interest. 

In [164]:
sent_ids = df.sentence_id.unique()
found_sentences = []
for sent_id in sent_ids:
    # Check whether the sentence mentions the drug as well as COVID-19
    entity_ids = df[df['sentence_id']==sent_id]['entity_id']
    drug = entity_ids.isin(DRUG_IDS).any()
    covid = entity_ids.isin(COVID19_IDS).any()
    if drug and covid:
        sent = get_sentence(df, sent_id)
        print(f"{sent_id}:\n{sent}\n")
        found_sentences.append(sent)

S1:
Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial.

S2:
Chloroquine and hydroxychloroquine have been found to be efficient on SARS-CoV-2, and reported to be efficient in Chinese COV-19 patients.

S4:
French Confirmed COVID-19 patients were included in a single arm protocol from early March to March 16th, to receive 600mg of hydroxychloroquine daily and their viral load in nasopharyngeal swabs was tested daily in a hospital setting.

S11:
Despite its small sample size, our survey shows that hydroxychloroquine treatment is significantly associated with viral load reduction/disappearance in COVID-19 patients and its effect is reinforced by azithromycin.
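
The co-mention test itself is a set intersection per sentence. The following is a minimal sketch of that logic in plain Python, so it can be verified without the OGER service; the function name `co_mention_sentences` and the toy rows are made up for illustration.

```python
from collections import defaultdict

def co_mention_sentences(rows, drug_ids, covid_ids):
    """Return sentence IDs whose annotations include at least one drug ID and one COVID-19 ID."""
    ids_by_sentence = defaultdict(set)
    for sentence_id, entity_id in rows:
        ids_by_sentence[sentence_id].add(entity_id)
    drug_ids, covid_ids = set(drug_ids), set(covid_ids)
    # Keep a sentence only if it overlaps with both ID sets
    return [s for s, ids in ids_by_sentence.items() if ids & drug_ids and ids & covid_ids]

# Toy (sentence_id, entity_id) pairs, made up for illustration.
toy_rows = [
    ("S1", "5521"), ("S1", "C000657245"),
    ("S2", "5521"),
    ("S3", "C000657245"),
]
print(co_mention_sentences(toy_rows, ["5521"], ["C000657245"]))  # ['S1']
```

Note that co-mention only selects candidate sentences; whether a sentence supports or argues against the drug still has to be judged from its content.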

