# Treatment of Covid-19 with Remdesivir - Systematic Review and Meta-Analysis

In [1]:
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
import re
from urllib.request import Request, urlopen
from urllib.error import HTTPError

## Search PubMed for appropriate studies
1. Open [PubMed](https://pubmed.ncbi.nlm.nih.gov/).
2. Type "(((((((((covid[Title/Abstract]) OR (corona[Title/Abstract])) OR (sars-cov-2[Title/Abstract])) AND (remdesivir[Title/Abstract])) NOT (meta-analysis[Title/Abstract])) NOT (meta analysis[Title/Abstract])) NOT (review[Title/Abstract])) NOT (meta-analyses[Title/Abstract])) NOT (meta analyses[Title/Abstract]))" into the search box.
3. On the left-hand side, check "Free Full Text".
4. Click on Save, select all results as the selection and PubMed as the format.
5. Click on Create file and save the file under the name search_results.txt.
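
The search string in step 2 is deeply nested, so an operator or a parenthesis is easily lost when typing it by hand. The following is a minimal sketch that assembles an equivalent query from three term lists. `tag` and `build_query` are hypothetical helpers written for this notebook, not part of any PubMed tool; the result is meant to be pasted into the search box.

```python
FIELD = "[Title/Abstract]"


def tag(term):
    """Wrap a single search term with the Title/Abstract field tag."""
    return "({}{})".format(term, FIELD)


def build_query(any_of, must_have, exclude):
    """Nest the terms left to right, one pair of parentheses per operator."""
    query = tag(any_of[0])
    for term in any_of[1:]:
        query = "({} OR {})".format(query, tag(term))
    for term in must_have:
        query = "({} AND {})".format(query, tag(term))
    for term in exclude:
        query = "({} NOT {})".format(query, tag(term))
    return query


query = build_query(
    any_of=["covid", "corona", "sars-cov-2"],
    must_have=["remdesivir"],
    exclude=["meta-analysis", "meta analysis", "review",
             "meta-analyses", "meta analyses"],
)
print(query)
```

Building the string this way keeps the parentheses balanced by construction and makes it easy to add or remove an exclusion term later.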

## Automated Filtering
### Filter for Randomized Controlled Trials
We only want to include randomized controlled trials in our meta-analysis. We use a machine learning tool called robotsearch to keep only those studies from the PubMed search that used a randomized controlled trial design.
1. Move the file search_results.txt into the robotsearch directory.
2. Open Anaconda.
3. `cd` your way into the robotsearch directory.
4. If the environment is not activated, type `conda activate covid_review`.
5. Run `python setup.py install`.
6. Run `robotsearch search_results.txt`.

The results are saved in the file search_results_robotreviewer_RCTs.txt. Let's look at the result.

In [2]:
from robotsearch.parsers import ris
file_input = "robotsearch/search_results.txt"
file_result = "robotsearch/search_results_robotreviewer_RCTs.txt"
with open(file_input, 'r', encoding="utf8") as f:
    inp = ris.load(f)
with open(file_result, 'r', encoding="utf8") as f:
    result = ris.load(f)
print("The initial search result has {} articles".format(len(inp)))
print("{} articles were classified as RCTs".format(len(result)))

The initial search result has 672 articles
159 articles were classified as RCTs


In [17]:
# Extract the PMC identifiers; they are found under the 'PMC' key.
pmcs = []
for record in result:
    if "PMC" in record.keys():
        pmcs.append(record['PMC'][0].strip())

print("{} studies have a pmc-identifier and can be accessed through pubmed open-access".format(len(pmcs)))

155 studies have a pmc-identifier and can be accessed through pubmed open-access


### Filter for Outcome
Let's reduce the number even further by checking whether each article contains the necessary information. Here we only want articles that use time to clinical improvement or time to recovery as an outcome. The article must also report a ratio as effect measure: a hazard ratio, odds ratio, or rate ratio.

In [None]:
site = "https://www.ncbi.nlm.nih.gov/pmc/articles/{}/"
hdr = {'User-Agent': 'Mozilla/5.0'}
flags = re.DOTALL | re.IGNORECASE
# The outcome measure must come after the word "method", i.e. it should be mentioned in the methods or results section.
regex_time = r"method.+(?:time to (?:clinical improvement|recovery))"
# The same holds for the effect measure: a hazard, odds, or rate ratio.
regex_hr = r"method.+(?:hazard|odds|rate)[\s-]ratio"
pmcs_outcome = []
for i, pmc in enumerate(pmcs):
    print(i)
    req = Request(site.format(pmc), headers=hdr)
    try:
        page = urlopen(req)
        # Name the parser explicitly and render the page once for both searches.
        html = BeautifulSoup(page, "html.parser").prettify()
        if re.search(regex_time, html, flags) and re.search(regex_hr, html, flags):
            print(True)
            pmcs_outcome.append(pmc)
    except HTTPError:
        print("httperror")
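
To see how the two patterns behave, here is a small sketch that applies them to three made-up text snippets (none of them is taken from a real article). `mentions_outcome` is a hypothetical helper that mirrors the condition used in the loop above.

```python
import re

regex_time = r"method.+(?:time to (?:clinical improvement|recovery))"
regex_hr = r"method.+(?:hazard|odds|rate)[\s-]ratio"
FLAGS = re.DOTALL | re.IGNORECASE


def mentions_outcome(text):
    """True if both the outcome and a ratio appear after the word 'method'."""
    return bool(re.search(regex_time, text, FLAGS)
                and re.search(regex_hr, text, FLAGS))


# Made-up snippets for illustration only.
after_methods = ("Methods\nThe primary outcome was time to recovery.\n"
                 "Results\nThe hazard ratio was reported.")
before_methods = ("Background: time to recovery and the hazard ratio matter.\n"
                  "Methods\nPatients were enrolled.")
only_outcome = "Methods\nWe measured time to clinical improvement in days."

print(mentions_outcome(after_methods))
print(mentions_outcome(before_methods))
print(mentions_outcome(only_outcome))
```

Only the first snippet passes. Because `re.DOTALL` lets `.+` span line breaks, the patterns merely require the word "method" somewhere earlier on the page, so this is a coarse pre-filter and the manual screening remains necessary.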

In [5]:
print("{} articles contain time to clinical improvement/recovery and a proper outcome measure".format(len(pmcs_outcome)))
with open("papers_html.txt", "w") as f:
    for pmc in pmcs_outcome:
        f.write("https://www.ncbi.nlm.nih.gov/pmc/articles/{}/\n".format(pmc))

52 articles contain time to clinical improvement/recovery and a proper outcome measure


In [19]:
screening_results = pd.DataFrame(columns = ['pmc', 'non_retracted', 'randomized_controlled', 'placebo_controlled', 'adults', 'infected', 'remdesivir_only', 'propper_outcome'])
screening_results['pmc'] = np.array(pmcs_outcome)
screening_results.to_csv("screening_results.csv", index = False, sep = ";")

NameError: name 'pmcs_outcome' is not defined

## Manual Screening of Articles
Now that we have filtered out the promising studies, we must manually check their eligibility. Even though the classification as randomized controlled trial was automated, we must check it for errors. We use the following protocol. Open the file papers_html.txt and apply the protocol to each study individually. Open the file screening_results.csv in a CSV editor of your choice to record your results. Type 1 into the cell if a criterion is met and 0 if it is not met. Once you have coded one criterion as 0, stop checking the remaining criteria and continue with the next study. Note that when this protocol was first created, studies had to include a placebo group. This turned out to be too strict a criterion, because only two studies fulfilled all criteria. We therefore decided to include open-label studies.
### Not retracted
If the article has been retracted, it shows in a red box on top of the webpage. It's impossible to miss.
### Randomized Controlled Trial
Randomized controlled trials are those in which patients are randomly assigned either to a group receiving treatment with Remdesivir or to a control group (placebo or standard care). Check the abstract and methods section to confirm that the paper reports a randomized controlled trial. Don't include studies in which both the experimental and the control group receive Remdesivir at different dosages. Keywords indicating that a study is not an RCT are "retrospective" (study or analysis) and "observational cohort study".
### Majority Adults
Check that more than 95% of the participants are above 18 years of age.
### Participants infected 
Participants must be infected with Covid-19. No studies analysing Remdesivir as a prevention treatment should be included.
### Remdesivir only 
The only difference between treatment- and control-group should be whether patients received remdesivir or not. Control groups can be placebo or standard-care. If multiple treatments are tested, the study must include a comparison of remdesivir with a placebo or standard care group.
### Proper outcome
To assess the effectiveness of treatment with Remdesivir compared to placebo or standard care, a proper effect measure must be reported.
Check whether at least one of the outcomes "time to recovery", "time to clinical improvement", or an equivalent measure is reported with one of these effect measures: odds ratio, hazard ratio, or rate ratio. Keep in mind that what the authors declare as their primary outcome might not be the primary outcome for this review. You might even have to look into the appendices to retrieve it.

## Extract eligible studies
We are going to use the studies that fulfill the above criteria. After you have completed and saved the screening of articles, execute the cell below to get the studies to be included in the final analysis.

In [None]:
# Please make sure to save your screening_results.csv properly. In Germany the default separator is a semicolon.
df_screening = pd.read_csv("screening_results.csv", sep=";")
# Cells left empty after the first unmet criterion count as not met.
df_screening = df_screening.fillna(0)
# A study is eligible only if all seven criteria columns are coded 1.
df_screening["eligable"] = df_screening.iloc[:, 1:].eq(1).all(axis=1)
eligable_studies = list(df_screening.loc[df_screening.eligable, "pmc"])
print("{} studies were found to be eligible".format(len(eligable_studies)))
for pmc in eligable_studies:
    print("https://www.ncbi.nlm.nih.gov/pmc/articles/{}/".format(pmc))
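
The eligibility rule can be checked on a toy screening sheet before running it on the real file. The sketch below is a dependency-free version of the same rule using only the standard library; the PMC ids are placeholders, the rows are made up, and `is_eligible` is a hypothetical helper.

```python
import csv
import io

# Made-up screening sheet; the ids are placeholders, not real articles.
sheet = io.StringIO(
    "pmc;non_retracted;randomized_controlled;placebo_controlled;"
    "adults;infected;remdesivir_only;propper_outcome\n"
    "PMC_A;1;1;1;1;1;1;1\n"   # all criteria met
    "PMC_B;1;0;;;;;\n"        # screening stopped at the first 0
    "PMC_C;1;1;1;1;1;1;0\n"   # fails only the last criterion
)


def is_eligible(row, criteria):
    """Eligible only if every criterion cell is 1; blank cells count as 0."""
    return all((row[name] or "0").strip() == "1" for name in criteria)


reader = csv.DictReader(sheet, delimiter=";")
criteria = [name for name in reader.fieldnames if name != "pmc"]
eligible = [row["pmc"] for row in reader if is_eligible(row, criteria)]
print(eligible)
```

Only the first placeholder study is kept: a single 0, or a blank cell left after screening was stopped early, is enough to exclude a study.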

## Extract Relevant Data
For each of the studies, extract the sample size and the relevant outcome measure with its confidence interval. The outcome measure is time to recovery; the actual name of the variable can differ slightly between studies. If a study reports no time-to-recovery measure, use time to clinical improvement instead. Extract the values for both the remdesivir group and the control group (placebo or standard care). The effect measure is recorded as a ratio: hazard ratio, odds ratio, or rate ratio. Use whichever ratio the authors used in the study to quantify the difference. As a covariate, document whether the study was placebo-controlled or open-label.
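
Ratio measures are usually pooled on the log scale, so it can help to convert each extracted ratio and confidence interval right away. The sketch below assumes a 95% confidence interval that is symmetric on the log scale, which is a common convention but is not stated in this notebook; `log_effect` is a hypothetical helper and the numbers are made up for illustration.

```python
import math


def log_effect(ratio, ci_low, ci_high, z=1.96):
    """Convert a ratio with its 95% CI into a log effect and a standard error.

    Assumes the interval is symmetric around log(ratio), i.e.
    log(ci_high) - log(ci_low) = 2 * z * SE.
    """
    log_ratio = math.log(ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return log_ratio, se


# Made-up values, not taken from any study.
log_ratio, se = log_effect(ratio=1.25, ci_low=1.0, ci_high=1.5625)
print(round(log_ratio, 4), round(se, 4))
```

If the midpoint of `log(ci_low)` and `log(ci_high)` is far from `log(ratio)`, the reported interval was probably not computed this way and the value should be double-checked against the paper.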