# Treatment of Covid-19 with Remdesivir - Systematic Review and Meta-Analysis

In [None]:
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
import re
from urllib.request import Request, urlopen
from urllib.error import HTTPError

## Search PubMed for appropriate studies
1. Open [PubMed](https://pubmed.ncbi.nlm.nih.gov/).
2. Type "((((((covid[Title/Abstract]) OR (corona[Title/Abstract])) OR (sars-cov-2[Title/Abstract])) AND (remdesivir[Title/Abstract])) NOT (meta-analysis[Title/Abstract])) NOT (meta analysis[Title/Abstract])) NOT (review[Title/Abstract]))" into the search box.
3. On the left-hand side, check "Free Full Text".
4. Click "Save", select "All results" under "Selection" and "PubMed" under "Format".
5. Click "Create file" and save the file as search_results.txt.
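The same search can also be scripted, which makes it easier to rerun later. The sketch below builds a request for NCBI's E-utilities ESearch endpoint. Two assumptions: that the `free full text[sb]` subset reproduces the website's "Free Full Text" checkbox, and that adding one opening parenthesis to the query from step 2 is harmless (as written, that string has one more closing than opening parenthesis). `build_esearch_url` and `fetch_pmids` are hypothetical helpers, not part of this project.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Query from step 2, with one extra "(" at the start so the parentheses balance.
QUERY = (
    "(((((((covid[Title/Abstract]) OR (corona[Title/Abstract])) "
    "OR (sars-cov-2[Title/Abstract])) AND (remdesivir[Title/Abstract])) "
    "NOT (meta-analysis[Title/Abstract])) NOT (meta analysis[Title/Abstract])) "
    "NOT (review[Title/Abstract]))"
)


def build_esearch_url(query, free_full_text=True, retmax=1000):
    """Build an ESearch URL for PubMed.

    Assumption: the "free full text[sb]" subset mirrors the website's
    "Free Full Text" checkbox.
    """
    if free_full_text:
        query = f"({query}) AND free full text[sb]"
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


def fetch_pmids(url):
    """Run the search and return the matching PMIDs (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)["esearchresult"]["idlist"]
```

Usage: `pmids = fetch_pmids(build_esearch_url(QUERY))`. The PMIDs could then be passed to EFetch (`db=pubmed`, `rettype=medline`, `retmode=text`) to produce a file in the same PubMed format as search_results.txt; compare the record count against the website before relying on it.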

## Automated Filtering
### Filter for Randomized Controlled Trials
We only want to include randomized controlled trials (RCTs) in our meta-analysis. We use the machine-learning tool robotsearch to keep only those studies from the PubMed search that used a randomized controlled trial design.
1. Move the file search_results.txt into the robotsearch directory.
2. Open Anaconda.
3. `cd` into the robotsearch directory.
4. If the environment is not yet activated, run `conda activate covid_review`.
5. Run `python setup.py install`.
6. Run `robotsearch search_results.txt`.

The results are saved in the file search_results_robotreviewer_RCTs.txt. Let's look at the results:

In [2]:
from robotsearch.parsers import ris
file_input = "robotsearch/search_results.txt"
file_result = "robotsearch/search_results_robotreviewer_RCTs.txt"
with open(file_input, 'r', encoding="utf8") as f:
    inp = ris.load(f)
with open(file_result, 'r', encoding="utf8") as f:
    result = ris.load(f)
print("The initial search result has {} articles".format(len(inp)))
print("{} articles were classified as RCTs".format(len(result)))

The initial search result has 672 articles
159 articles were classified as RCTs


In [6]:
# Extract PMC IDs: they are stored under the 'PMC' key; records without one are tracked in non_pmc_i.
pmcs = []
non_pmc_i = []
for i in range(len(result)):
    if "PMC" in result[i].keys():
        id_raw = result[i]['PMC'][0]
        pmc = id_raw.strip()
        pmcs.append(pmc)
    else:
        non_pmc_i.append(i)
len(pmcs)

155
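To make the structure behind `result[i]['PMC'][0]` concrete, here is a minimal sketch of how a PubMed-format export could be parsed into records. It assumes the MEDLINE tag layout (`TAG - value`, tags padded to four characters, continuation lines indented by six spaces, blank lines between records) and that each record maps a tag to a list of values, which is what the indexing above suggests. `parse_medline` and `split_by_pmc` are hypothetical helpers, not robotsearch's `ris` parser.

```python
from collections import defaultdict


def parse_medline(text):
    """Parse MEDLINE/PubMed-format text into a list of {tag: [values]} dicts.

    Hypothetical helper for illustration; robotsearch ships its own parser.
    """
    records, current, last_tag = [], defaultdict(list), None
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current record
            if current:
                records.append(dict(current))
                current, last_tag = defaultdict(list), None
        elif line.startswith("      ") and last_tag:  # continuation of the previous field
            current[last_tag][-1] += " " + line.strip()
        elif len(line) >= 5 and line[4] == "-":  # "TAG - value", tag padded to 4 chars
            last_tag = line[:4].strip()
            current[last_tag].append(line[5:].strip())
    if current:  # last record may not be followed by a blank line
        records.append(dict(current))
    return records


def split_by_pmc(records):
    """Return (PMC IDs, indices of records without a PMC tag)."""
    pmcs, non_pmc_i = [], []
    for i, record in enumerate(records):
        if "PMC" in record:
            pmcs.append(record["PMC"][0].strip())
        else:
            non_pmc_i.append(i)
    return pmcs, non_pmc_i
```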

### Filter for Outcome
Let's reduce the number even further by checking whether each article contains the information we need. We only keep articles that report time to clinical improvement or time to recovery as an outcome, together with a ratio effect measure (hazard ratio, odds ratio or rate ratio).
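The two patterns used in the next cell can be tried offline before crawling. The sketch reuses `regex_time` and `regex_hr` with the same flags; `reports_outcome` is a hypothetical helper and the test strings are made-up snippets. With `re.DOTALL`, the only positional requirement is that the word "method" occurs somewhere before each phrase, so a match is a heuristic rather than proof that the phrase sits in the methods or results section.

```python
import re

# Same patterns and flags as the crawling cell.
regex_time = r"method.+(?:time to (?:clinical improvement|recovery))"
regex_hr = r"method.+(?:hazard|odds|rate)[\s-]ratio"
FLAGS = re.DOTALL | re.IGNORECASE


def reports_outcome(text):
    """True if 'method' occurs somewhere before an outcome phrase
    and somewhere before an effect-measure phrase."""
    return bool(re.search(regex_time, text, FLAGS) and re.search(regex_hr, text, FLAGS))
```

For example, `reports_outcome("Time to recovery was shorter. Methods: hazard ratio")` is `False`, because the outcome phrase only appears before "Methods".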

In [None]:
site = "https://www.ncbi.nlm.nih.gov/pmc/articles/{}/"
hdr = {'User-Agent': 'Mozilla/5.0'}
# The outcome measure must come after the word "method", i.e. roughly in the methods or results section
regex_time = r"method.+(?:time to (?:clinical improvement|recovery))"
regex_hr = r"method.+(?:hazard|odds|rate)[\s-]ratio"
flags = re.DOTALL | re.IGNORECASE
pmcs_outcome = []
for i, pmc in enumerate(pmcs):
    print(i)  # progress indicator
    req = Request(site.format(pmc), headers=hdr)
    try:
        with urlopen(req) as page:
            page_html = BeautifulSoup(page).prettify()  # parse once, search twice
        if re.search(regex_time, page_html, flags) and re.search(regex_hr, page_html, flags):
            print(True)
            pmcs_outcome.append(pmc)
    except HTTPError:
        print("httperror")

In [5]:
print("{} articles contain time to clinical improvement/recovery and a proper outcome measure".format(len(pmcs_outcome)))
with open("papers_html.txt", "w") as f:
    for pmc in pmcs_outcome:
        f.write(site.format(pmc) + "\n")  # one article URL per line

52 articles contain time to clinical improvement/recovery and a proper outcome measure


In [15]:
screening_results = pd.DataFrame(columns = ['pmc', 'non_retracted', 'randomized_controlled', 'placebo_controlled', 'adults', 'infected', 'remdesivir_only', 'propper_outcome'])
screening_results['pmc'] = np.array(pmcs_outcome)
screening_results.to_csv("screening_results.csv", index = False)

## Manual Screening of Articles
Now that we have filtered out the promising studies, we must check their eligibility manually. Even though the classification as randomized controlled trial was automated, it can contain errors, so we check it again. Open the file papers_html.txt and apply the protocol below to each study individually. Record your results in screening_results.csv with a CSV editor of your choice: type 1 into a cell if the criterion is met and 0 if it is not. Once a criterion is coded as 0, stop checking the remaining criteria and move on to the next study.
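Once every row is coded, the sheet can be tallied to get the included studies and the criterion at which each excluded study dropped out (e.g. for a flow diagram of the screening). A minimal standard-library sketch, assuming cells hold `1` or `0`, that cells after the first `0` are left blank as the stop rule prescribes, and that the columns are the ones created for screening_results.csv; `tally_screening` is a hypothetical helper.

```python
import csv
import io

# Column order of screening_results.csv, which is also the order of the checks.
CRITERIA = ['non_retracted', 'randomized_controlled', 'placebo_controlled',
            'adults', 'infected', 'remdesivir_only', 'propper_outcome']


def tally_screening(csv_text):
    """Return (included PMC IDs, {criterion: number of studies excluded at it}).

    A blank cell (not yet coded) is counted as not met, like a 0.
    """
    included, excluded_at = [], {c: 0 for c in CRITERIA}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for criterion in CRITERIA:
            if (row[criterion] or "").strip() != "1":
                excluded_at[criterion] += 1  # first unmet criterion
                break
        else:  # loop finished without break: every criterion met
            included.append(row["pmc"])
    return included, excluded_at
```

With the real sheet: `with open("screening_results.csv", encoding="utf8") as f: included, excluded_at = tally_screening(f.read())`.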
### Not retracted
If the article has been retracted, it shows in a red box on top of the webpage. It's impossible to miss.
### Randomized Controlled Trial
Randomized controlled trials are those in which patients are randomly assigned either to a group treated with Remdesivir or to a placebo group. Check the abstract and the methods section to confirm that the paper reports a randomized controlled trial.
### Placebo controlled
Check that both the people administering treatment and the patients were blinded to whether the medication was Remdesivir or placebo. Look for the phrase "open label": an open-label trial is not blinded and does not meet this criterion.
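When searching a full text for "open label", a small regex can surface every spelling variant with some context. This is a hypothetical screening aid: the pattern and `open_label_mentions` are illustrative, and every hit still has to be read in context, since an article may, for instance, describe the open-label design of a different study.

```python
import re

# Matches "open label", "open-label", "Open-Label", "open-labelled", ...
OPEN_LABEL = re.compile(r"\bopen[\s-]label", re.IGNORECASE)


def open_label_mentions(text, width=60):
    """Return snippets of up to `width` characters on each side of every mention."""
    return [text[max(m.start() - width, 0):m.end() + width]
            for m in OPEN_LABEL.finditer(text)]
```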
### Majority Adults
Check that more than 95% of participants are older than 18.
### Participants infected 
Participants must already be infected; exclude prophylaxis studies.
### Remdesivir only 
Exclude combination-therapy studies and studies in which the control group also received Remdesivir as treatment.
### Proper outcome
To assess the effectiveness of treatment with Remdesivir compared to placebo, a proper effect measure must be reported.
Check that at least one of the outcomes "time to recovery", "time to clinical improvement" or an equivalent measure is reported with one of these effect measures: odds ratio, hazard ratio, rate ratio.
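A ratio measure matters because it can be pooled across studies. Meta-analyses typically work with the natural log of the ratio and recover its standard error from the reported 95% confidence interval as SE = (ln(upper) - ln(lower)) / (2 × 1.96). The sketch below assumes the interval is a 95% CI symmetric on the log scale, and shows inverse-variance fixed-effect pooling as one common choice, not necessarily the model used later in this analysis; both helpers are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def log_ratio_and_se(ratio, lower, upper, z=Z_95):
    """Log of a reported ratio (HR, OR or RR) and its SE recovered from the 95% CI."""
    return math.log(ratio), (math.log(upper) - math.log(lower)) / (2 * z)


def pool_fixed_effect(estimates):
    """Inverse-variance fixed-effect pooling of (log_ratio, se) pairs.

    Returns the pooled ratio and its 95% CI, back-transformed to the ratio scale.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return tuple(math.exp(v) for v in
                 (pooled, pooled - Z_95 * se_pooled, pooled + Z_95 * se_pooled))
```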