<h3>Task 5 - Predictive Health Care</h3>
Comparing the adverse effects of pain medications

<h3>Task Overview</h3>

<b>What you'll learn</b>
 - How to analyze adverse drug effects using the provided data.

<b>What you'll do</b>
 - Analyze 2019 FAERS data to find the top 10 Tramal adverse effects.
 - Compare the adverse effects of Tramal and Lyrica.
 - Suggest further investigations based on dataset findings.

<h3>Here is your task:</h3>

<b>Jakob asks you to create a PowerPoint slide deck while tackling the following steps. Use screenshots and diagrams to illustrate your findings as well.</b>

<b>Step 1</b>

Create a descriptive overview of the adverse effects of Tramal based on the available FAERS datasets, which you’ll find in your resource section. Use only the FAERS data from 2019 for your analysis.

Show the 10 most common adverse effects as they are reported in the FAERS database. Jakob loves bar plots, so it would be great if you used one.
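
Because a single report can list the same reaction more than once after merging, counting rows can overstate how common a reaction is. A minimal sketch for Step 1, assuming a frame with the `primaryid`, `drugname` and `pt` columns built in the cells below, counts distinct reports per preferred term instead. The function name `top_adverse_effects` and the demo rows are hypothetical.

```python
import pandas as pd


def top_adverse_effects(df: pd.DataFrame, drug: str, n: int = 10) -> pd.Series:
    """Return the n preferred terms (pt) reported most often for `drug`.

    Counts distinct reports (primaryid) per term, so duplicated
    drug/reaction rows from the merge are only counted once per report.
    """
    subset = df[df["drugname"] == drug]
    counts = subset.groupby("pt")["primaryid"].nunique()
    return counts.sort_values(ascending=False).head(n)


# Hypothetical demo rows with the same columns as final_data (values made up).
demo = pd.DataFrame({
    "primaryid": [1, 1, 2, 2, 3, 3],
    "drugname": ["TRAMAL", "TRAMAL", "TRAMAL", "LYRICA", "TRAMAL", "TRAMAL"],
    "pt": ["Nausea", "Nausea", "Dizziness", "Oedema", "Nausea", "Vomiting"],
})
print(top_adverse_effects(demo, "TRAMAL", n=3))
```

On the real data, something like `top_adverse_effects(final_data, "TRAMAL").sort_values().plot.barh()` should give the horizontal bar plot Jakob asked for.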

<b>Step 2</b>

Compare Tramal with Lyrica, another medication that is also commonly used to treat neurological pain. Are their adverse effects similar?

Use an R script to solve this step and make sure to include it in your presentation.
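
The two drugs have different numbers of reports, so raw counts are hard to compare directly. One option is to compare the share of each drug's reports that mention a given term. Step 2 asks for R, but the notebook is written in Python, so this sketch stays in Python and would need porting. The helper names `effect_profile` and `compare_drugs` and the demo rows are hypothetical.

```python
import pandas as pd


def effect_profile(df: pd.DataFrame, drug: str) -> pd.Series:
    """Fraction of `drug` reports (distinct primaryid) that mention each pt."""
    subset = df[df["drugname"] == drug]
    n_reports = subset["primaryid"].nunique()
    return subset.groupby("pt")["primaryid"].nunique() / n_reports


def compare_drugs(df: pd.DataFrame, drug_a: str, drug_b: str, n: int = 10):
    """Return a side-by-side share table and the set of shared top-n terms."""
    a, b = effect_profile(df, drug_a), effect_profile(df, drug_b)
    top_a, top_b = a.nlargest(n).index, b.nlargest(n).index
    table = (
        pd.DataFrame({drug_a: a, drug_b: b})
        .reindex(top_a.union(top_b))  # keep only terms in either top-n list
        .fillna(0.0)                  # term never reported for that drug
    )
    return table, set(top_a) & set(top_b)


# Hypothetical demo rows with the same columns as final_data (values made up).
demo = pd.DataFrame({
    "primaryid": [1, 1, 2, 3, 3, 4],
    "drugname": ["TRAMAL", "TRAMAL", "TRAMAL", "LYRICA", "LYRICA", "LYRICA"],
    "pt": ["Nausea", "Dizziness", "Nausea", "Dizziness", "Oedema", "Weight increased"],
})
table, shared = compare_drugs(demo, "TRAMAL", "LYRICA", n=3)
print(table)
print("Shared top terms:", shared)
```

The size of the shared set gives a quick answer to "are the adverse effects similar?", while the table shows whether a shared term is reported at a similar rate for both drugs.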

<b>Step 3</b>

Define which further investigations might help determine whether one drug is preferable to another. Base your answer on the results of your dataset work.
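
One possible further investigation is a disproportionality analysis. The reporting odds ratio (ROR) is a standard pharmacovigilance measure that the task itself does not name: it compares how often an event is reported with a drug against how often it is reported with all other drugs. A minimal sketch, assuming the four 2x2 counts have already been computed from FAERS (this would need the other drugs, which `merge_data` currently filters out); the function name and the counts in the test are hypothetical.

```python
import math


def reporting_odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Reporting odds ratio with an approximate (log-normal) confidence interval.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports with other drugs and the event
    d: reports with other drugs, without the event
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    ror = (a / b) / (c / d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(ROR)
    lower = math.exp(math.log(ror) - z * se)
    upper = math.exp(math.log(ror) + z * se)
    return ror, lower, upper
```

A lower confidence bound above 1 is a commonly used signal threshold. FAERS has no information on how many patients actually took each drug, so an ROR flags a reporting signal rather than an incidence rate; studies with exposure data (for example, comparing dose, age, sex and indication) would still be needed to say one drug is safer.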

In [119]:
# import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

In [120]:
# read the dataset (Quarter 1)
# raw strings (r"...") stop the Windows backslashes from being read as escape sequences
DEMO19Q1 = pd.read_csv(r"E:\iNeuron\Dataset\Q1\DEMO19Q1.txt", delimiter="$", low_memory=False)
DRUG19Q1 = pd.read_csv(r"E:\iNeuron\Dataset\Q1\DRUG19Q1.txt", delimiter="$", low_memory=False)
REAC19Q1 = pd.read_csv(r"E:\iNeuron\Dataset\Q1\REAC19Q1.txt", delimiter="$", low_memory=False)

In [121]:
# select columns necessary for our analysis
DEMO19Q1 = DEMO19Q1[["primaryid", "caseid", "age", "sex"]]
DRUG19Q1 = DRUG19Q1[["primaryid", "caseid", "drugname"]]
REAC19Q1 = REAC19Q1[["primaryid", "caseid", "pt"]]
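
The `age` column on its own is not in a single unit: FAERS stores the unit in a separate `age_cod` column, which the selection above does not keep. If age is analysed later, a sketch like the following could convert it to years. It assumes `age_cod` is added to the DEMO column selection and that the unit codes are DEC, YR, MON, WK, DY and HR; check these against the FAERS README shipped with each quarter. The names `AGE_UNIT_TO_YEARS` and `age_in_years` are hypothetical.

```python
import pandas as pd

# Assumed FAERS age_cod unit codes and their length in years.
AGE_UNIT_TO_YEARS = {
    "DEC": 10.0,
    "YR": 1.0,
    "MON": 1 / 12,
    "WK": 1 / 52,
    "DY": 1 / 365,
    "HR": 1 / (365 * 24),
}


def age_in_years(demo: pd.DataFrame) -> pd.Series:
    """Convert age to years using age_cod; unknown units or ages become NaN."""
    age = pd.to_numeric(demo["age"], errors="coerce")
    factor = demo["age_cod"].map(AGE_UNIT_TO_YEARS)
    return age * factor


# Hypothetical rows: 45 years, 6 months, 3 decades.
print(age_in_years(pd.DataFrame({"age": [45, 6, 3], "age_cod": ["YR", "MON", "DEC"]})))
```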

In [122]:
# create a function to clean the data
def primary_clean(data):
    data = data.drop_duplicates()
    data = data.dropna()
    print(data.shape)
    print(data.isnull().sum())
    print(data.duplicated().sum())
    return data

In [123]:
# clean each dataset
DEMO19Q1 = primary_clean(DEMO19Q1)

(226536, 4)
primaryid    0
caseid       0
age          0
sex          0
dtype: int64
0


In [124]:
# clean each dataset
DRUG19Q1 = primary_clean(DRUG19Q1)

(1381706, 3)
primaryid    0
caseid       0
drugname     0
dtype: int64
0


In [125]:
# clean each dataset
REAC19Q1 = primary_clean(REAC19Q1)

(1303521, 3)
primaryid    0
caseid       0
pt           0
dtype: int64
0


In [126]:
# create a function to merge and clean dataset
def merge_data(data1, data2, data3):
    data = pd.merge(left=data1, right=data2, how="inner", on=["primaryid", "caseid"])
    final = pd.merge(left=data, right=data3, how="inner", on=["primaryid", "caseid"])
    final = final.drop_duplicates()
    final = final.dropna()
    print(final.shape)
    print(final.isnull().sum())
    print(final.duplicated().sum())
    final["drugname"] = final["drugname"].replace(to_replace="GABAPENTIN.", value="GABAPENTIN")
    # keep only the drugs of interest
    final = final[final["drugname"].isin(["GABAPENTIN", "LYRICA", "TRAMAL"])]
    final = final.reset_index(drop=True)
    print(final.shape)
    return final
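
The filter in `merge_data` matches `drugname` exactly, so spellings such as "Tramal" or "TRAMAL RETARD" are dropped; the `GABAPENTIN.` replacement hints that such variants exist. A sketch of a looser match, assuming FAERS names are free text entered by reporters; `normalize_drugname` and `match_drug` are hypothetical helpers. Whether to include variants, or to match on the DRUG file's active-ingredient column (`prod_ai`, if present in your files) instead, is an analysis decision.

```python
import re

import pandas as pd


def normalize_drugname(names: pd.Series) -> pd.Series:
    """Upper-case, trim, collapse inner whitespace and drop trailing dots."""
    cleaned = names.astype(str).str.upper().str.strip()
    cleaned = cleaned.str.replace(r"\s+", " ", regex=True)
    return cleaned.str.rstrip(".")  # e.g. "GABAPENTIN." -> "GABAPENTIN"


def match_drug(names: pd.Series, brand: str) -> pd.Series:
    """Boolean mask: True where the name starts with `brand` as a whole word."""
    pattern = rf"^{re.escape(brand.upper())}\b"
    return normalize_drugname(names).str.contains(pattern, regex=True)


names = pd.Series(["Tramal", "TRAMAL RETARD", "gabapentin.", "TRAMALX", " lyrica "])
print(match_drug(names, "TRAMAL").tolist())
```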

In [127]:
# merge and clean
Quarter1 = merge_data(DEMO19Q1, DRUG19Q1, REAC19Q1)

(4407726, 6)
primaryid    0
caseid       0
age          0
sex          0
drugname     0
pt           0
dtype: int64
0
(50134, 6)
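
The merged Q1 frame has 4,407,726 rows, far more than the 1,303,521 reaction rows. The inner join on `primaryid`/`caseid` pairs every drug in a report with every reaction in that report, so row counts multiply, and a row does not mean that this particular drug caused that reaction. A tiny hypothetical example shows the effect:

```python
import pandas as pd

# One hypothetical report (primaryid 1) listing two drugs and three reactions.
drug = pd.DataFrame({"primaryid": [1, 1], "caseid": [10, 10],
                     "drugname": ["TRAMAL", "LYRICA"]})
reac = pd.DataFrame({"primaryid": [1, 1, 1], "caseid": [10, 10, 10],
                     "pt": ["Nausea", "Dizziness", "Somnolence"]})

merged = pd.merge(drug, reac, how="inner", on=["primaryid", "caseid"])
print(len(merged))                    # 6: 2 drugs x 3 reactions
print(merged["primaryid"].nunique())  # 1: still a single report
```

This is why counting distinct `primaryid` values per reaction is safer than counting rows.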


In [128]:
# read the dataset (Quarter 2)
DEMO19Q2 = pd.read_csv(r"E:\iNeuron\Dataset\Q2\DEMO19Q2.txt", delimiter="$", low_memory=False)
DRUG19Q2 = pd.read_csv(r"E:\iNeuron\Dataset\Q2\DRUG19Q2.txt", delimiter="$", low_memory=False)
REAC19Q2 = pd.read_csv(r"E:\iNeuron\Dataset\Q2\REAC19Q2.txt", delimiter="$", low_memory=False)

In [129]:
# select columns necessary for our analysis
DEMO19Q2 = DEMO19Q2[["primaryid", "caseid", "age", "sex"]]
DRUG19Q2 = DRUG19Q2[["primaryid", "caseid", "drugname"]]
REAC19Q2 = REAC19Q2[["primaryid", "caseid", "pt"]]

In [130]:
# clean each dataset
DEMO19Q2 = primary_clean(DEMO19Q2)

(241381, 4)
primaryid    0
caseid       0
age          0
sex          0
dtype: int64
0


In [131]:
# clean each dataset
DRUG19Q2 = primary_clean(DRUG19Q2)

(1551080, 3)
primaryid    0
caseid       0
drugname     0
dtype: int64
0


In [132]:
# clean each dataset
REAC19Q2 = primary_clean(REAC19Q2)

(1408475, 3)
primaryid    0
caseid       0
pt           0
dtype: int64
0


In [133]:
# merge and clean
Quarter2 = merge_data(DEMO19Q2, DRUG19Q2, REAC19Q2)

(4892508, 6)
primaryid    0
caseid       0
age          0
sex          0
drugname     0
pt           0
dtype: int64
0
(59589, 6)


In [134]:
# read the dataset (Quarter 3)
DEMO19Q3 = pd.read_csv(r"E:\iNeuron\Dataset\Q3\DEMO19Q3.txt", delimiter="$", low_memory=False)
REAC19Q3 = pd.read_csv(r"E:\iNeuron\Dataset\Q3\REAC19Q3.txt", delimiter="$", low_memory=False)
DRUG19Q3 = pd.read_csv(r"E:\iNeuron\Dataset\Q3\DRUG19Q3.txt", delimiter="$", low_memory=False, encoding_errors='ignore')

In [135]:
# select columns necessary for our analysis
DEMO19Q3 = DEMO19Q3[["primaryid", "caseid", "age", "sex"]]
DRUG19Q3 = DRUG19Q3[["primaryid", "caseid", "drugname"]]
REAC19Q3 = REAC19Q3[["primaryid", "caseid", "pt"]]

In [136]:
# clean each dataset
DEMO19Q3 = primary_clean(DEMO19Q3)

(253952, 4)
primaryid    0
caseid       0
age          0
sex          0
dtype: int64
0


In [137]:
# clean each dataset
DRUG19Q3 = primary_clean(DRUG19Q3)

(1651755, 3)
primaryid    0
caseid       0
drugname     0
dtype: int64
0


In [138]:
# clean each dataset
REAC19Q3 = primary_clean(REAC19Q3)

(1504784, 3)
primaryid    0
caseid       0
pt           0
dtype: int64
0


In [139]:
# merge and clean
Quarter3 = merge_data(DEMO19Q3, DRUG19Q3, REAC19Q3)

(5540978, 6)
primaryid    0
caseid       0
age          0
sex          0
drugname     0
pt           0
dtype: int64
0
(62882, 6)


In [140]:
# read the dataset (Quarter 4)
DEMO19Q4 = pd.read_csv(r"E:\iNeuron\Dataset\Q4\DEMO19Q4.txt", delimiter="$", low_memory=False)
REAC19Q4 = pd.read_csv(r"E:\iNeuron\Dataset\Q4\REAC19Q4.txt", delimiter="$", low_memory=False)
DRUG19Q4 = pd.read_csv(r"E:\iNeuron\Dataset\Q4\DRUG19Q4.txt", delimiter="$", low_memory=False)

In [141]:
# select columns necessary for our analysis
DEMO19Q4 = DEMO19Q4[["primaryid", "caseid", "age", "sex"]]
DRUG19Q4 = DRUG19Q4[["primaryid", "caseid", "drugname"]]
REAC19Q4 = REAC19Q4[["primaryid", "caseid", "pt"]]

In [142]:
# clean each dataset
DEMO19Q4 = primary_clean(DEMO19Q4)

(245213, 4)
primaryid    0
caseid       0
age          0
sex          0
dtype: int64
0


In [143]:
# clean each dataset
DRUG19Q4 = primary_clean(DRUG19Q4)

(1417965, 3)
primaryid    0
caseid       0
drugname     0
dtype: int64
0


In [144]:
# clean each dataset
REAC19Q4 = primary_clean(REAC19Q4)

(1359912, 3)
primaryid    0
caseid       0
pt           0
dtype: int64
0


In [145]:
# merge and clean
Quarter4 = merge_data(DEMO19Q4, DRUG19Q4, REAC19Q4)

(4879124, 6)
primaryid    0
caseid       0
age          0
sex          0
drugname     0
pt           0
dtype: int64
0
(53822, 6)
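
The read, select, clean and merge cells are repeated almost verbatim for each quarter. A loop can remove that duplication. This sketch assumes the folder layout used above (`<base>\Q<n>\DEMO19Q<n>.txt` and so on) and applies `encoding_errors="ignore"` to every file, not only to the Q3 DRUG file; `COLUMNS` and `load_quarter` are hypothetical names.

```python
from pathlib import Path

import pandas as pd

# Column subsets this notebook keeps from each FAERS file type.
COLUMNS = {
    "DEMO": ["primaryid", "caseid", "age", "sex"],
    "DRUG": ["primaryid", "caseid", "drugname"],
    "REAC": ["primaryid", "caseid", "pt"],
}


def load_quarter(base_dir, quarter: int, year: str = "19") -> dict:
    """Read the DEMO, DRUG and REAC files of one quarter into DataFrames."""
    folder = Path(base_dir) / f"Q{quarter}"
    frames = {}
    for kind, cols in COLUMNS.items():
        path = folder / f"{kind}{year}Q{quarter}.txt"
        frames[kind] = pd.read_csv(path, sep="$", usecols=cols,
                                   low_memory=False, encoding_errors="ignore")
    return frames


# Possible usage with the notebook's own primary_clean and merge_data:
# quarters = []
# for q in range(1, 5):
#     f = load_quarter(r"E:\iNeuron\Dataset", q)
#     quarters.append(merge_data(*(primary_clean(f[k]) for k in ("DEMO", "DRUG", "REAC"))))
# final_data = pd.concat(quarters, ignore_index=True)
```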


In [146]:
# concatenate all Quarter data into one for our analysis
final_data = pd.concat([Quarter1, Quarter2, Quarter3, Quarter4], ignore_index=True)

# check null/missing values
print(final_data.isnull().sum())

# check any duplicate records
print(final_data.duplicated().sum())

# check the dimension of data
final_data.shape

primaryid    0
caseid       0
age          0
sex          0
drugname     0
pt           0
dtype: int64
0


(226427, 6)
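
`drop_duplicates` only removes identical rows. A FAERS case can be resubmitted as a follow-up version in a later quarter, so after concatenating quarters the same case may still be counted more than once. A commonly used step is to keep only the most recent version of each `caseid`. This sketch assumes that a larger `primaryid` means a later version of the same case (FAERS builds `primaryid` from the case ID and case version); `keep_latest_version` is a hypothetical name.

```python
import pandas as pd


def keep_latest_version(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows belonging to the newest primaryid of each caseid."""
    latest = df.groupby("caseid")["primaryid"].transform("max")
    return df[df["primaryid"] == latest].reset_index(drop=True)


# Hypothetical rows: case 100 appears in two versions (1001, 1002).
demo = pd.DataFrame({"primaryid": [1001, 1002, 1002, 2001],
                     "caseid": [100, 100, 100, 200],
                     "pt": ["Nausea", "Nausea", "Dizziness", "Vomiting"]})
print(keep_latest_version(demo))
```

Applied to `final_data`, this would drop the rows of superseded case versions before any counting.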