In [1]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

FILENAME = "drug_sex_values.csv"
filepath = f"../data/{FILENAME}"

drug_sex_df = pd.read_csv(filepath)
drug_sex_df.head()

Unnamed: 0,sex,time,start_time,end_time,setting,all drugs,all opioids,stimulants,cannabis,benzodiazepine
0,female,1,2020-01-01,2020-01-31,In Patient,4812.0,583.0,230.0,303.0,91.0
1,female,1,2020-01-01,2020-01-31,Emergency Department,18839.0,767.0,580.0,1116.0,151.0
2,male,1,2020-01-01,2020-01-31,In Patient,5482.0,778.0,537.0,446.0,154.0
3,male,1,2020-01-01,2020-01-31,Emergency Department,18367.0,1304.0,1181.0,1641.0,291.0
4,female,2,2020-02-01,2020-02-29,In Patient,4659.0,630.0,236.0,280.0,99.0


In [2]:
from scipy import stats

# Split the 'all drugs' counts into two groups by care setting
in_patient = drug_sex_df[drug_sex_df['setting'] == 'In Patient']['all drugs']
emergency_department = drug_sex_df[drug_sex_df['setting'] == 'Emergency Department']['all drugs']

# Independent two-sample t-test (ttest_ind assumes equal variances by default)
t_stat, p_val = stats.ttest_ind(in_patient, emergency_department)

print("t statistic:", t_stat)
print("p-value:", p_val)

t statistic: -47.28032993787939
p-value: 1.0539363110730132e-96


The t statistic of -47.28 tells us that the difference between the means of the two groups is quite large relative to the variation within the groups. The negative sign indicates that the mean of the first group (In Patient) is smaller than the mean of the second group (Emergency Department).
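
To make "large relative to the variation within the groups" concrete, here is a minimal sketch of the pooled-variance t statistic computed by hand with the standard library. It uses only the five rows shown in the `head()` output above, not the full dataset, so its value will differ from the -47.28 reported by `stats.ttest_ind`. The helper name `pooled_t_statistic` and the `*_head` variables are illustrative assumptions, not part of the original notebook.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Two-sample t statistic with a pooled (equal-variance) estimate."""
    n_a, n_b = len(a), len(b)
    # Pooled sample variance: weighted average of the two group variances
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    # Standard error of the difference between the two group means
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_err


# 'all drugs' values from the five rows printed by head() only (assumption:
# these are used purely for illustration, not as the full groups)
in_patient_head = [4812.0, 5482.0, 4659.0]
emergency_department_head = [18839.0, 18367.0]

t_head = pooled_t_statistic(in_patient_head, emergency_department_head)
print("t statistic (head rows only):", t_head)
```

The numerator is the difference in group means and the denominator is its standard error, so a large negative value means the first group's mean sits many standard errors below the second's.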

The p-value is extremely small, far less than 0.05, which is often used as a threshold for statistical significance. This tells us that a difference in means at least this large would be very unlikely if the true difference in population means were zero (the null hypothesis).

So, we reject the null hypothesis and conclude that there is a statistically significant difference in the "all drugs" counts between the In Patient setting and the Emergency Department setting. The Emergency Department setting has a significantly higher mean "all drugs" count than the In Patient setting according to this dataset.

In [4]:
# Split the 'all drugs' counts into two groups by sex
male = drug_sex_df[drug_sex_df['sex'] == 'male']['all drugs']
female = drug_sex_df[drug_sex_df['sex'] == 'female']['all drugs']

# Independent two-sample t-test (male first, so a positive t means a higher male mean)
t_stat, p_val = stats.ttest_ind(male, female)

print("t statistic:", t_stat)
print("p-value:", p_val)

t statistic: 0.30720173144992186
p-value: 0.7590843742191118


The t statistic and p-value above come from a t-test comparing the "all drugs" counts between the male and female groups in the dataset.

The t statistic of 0.307 tells us that the difference between the means of the two groups is small relative to the variation within the groups.

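The p-value is fairly large, at about 0.759. As this value is well above the typically used threshold of 0.05, we fail to reject the null hypothesis: this test does not detect a significant difference in the "all drugs" counts between males and females in this dataset. Note that each group here pools the In Patient and Emergency Department rows, whose counts differ by a wide margin, so the variation within each group is large and a modest difference between the sexes would be hard to detect.

Based on this t-test, sex does not appear to have a statistically significant effect on the "all drugs" counts. Failing to reject the null hypothesis is not evidence that no difference exists.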
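
One way to keep the large gap between settings from swamping the male/female comparison is to run the test separately within each setting. The sketch below is a hedged illustration, not the notebook's method: the helper `ttest_by_setting` is hypothetical, the tiny `toy_df` holds made-up numbers that only mimic the column layout of `drug_sex_df`, and it swaps in Welch's t-test (`equal_var=False`) rather than the default pooled test.

```python
import pandas as pd
from scipy import stats


def ttest_by_setting(df, value_col="all drugs"):
    """Run a male-vs-female Welch t-test separately for each setting."""
    results = {}
    for setting, group in df.groupby("setting"):
        male = group.loc[group["sex"] == "male", value_col]
        female = group.loc[group["sex"] == "female", value_col]
        # equal_var=False selects Welch's t-test (no equal-variance assumption)
        t_stat, p_val = stats.ttest_ind(male, female, equal_var=False)
        results[setting] = (float(t_stat), float(p_val))
    return results


# Made-up toy values for illustration only; NOT taken from drug_sex_values.csv
toy_df = pd.DataFrame({
    "setting": ["In Patient"] * 6 + ["Emergency Department"] * 6,
    "sex": (["male"] * 3 + ["female"] * 3) * 2,
    "all drugs": [10.0, 12.0, 11.0, 9.0, 13.0, 11.0,
                  100.0, 104.0, 102.0, 101.0, 99.0, 103.0],
})

toy_results = ttest_by_setting(toy_df)
for setting, (t_stat, p_val) in toy_results.items():
    print(setting, "t statistic:", t_stat, "p-value:", p_val)
```

Assuming the real data has the same `setting`, `sex`, and `all drugs` columns, calling `ttest_by_setting(drug_sex_df)` would give one t statistic and p-value per setting.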