In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
from scipy.stats import chi2_contingency  # called directly in the cells below


# from src.helpers import multivariant_analysis, load_csv


FILENAME = "drug_sex_values.csv"
filepath = f"../data/{FILENAME}"
drug_sex_df = pd.read_csv(filepath)
drug_sex_df.head()

Unnamed: 0,sex,time,start_time,end_time,value,setting,all drugs,all opioids,stimulants,cannabis,benzodiazepine
0,female,1,01/01/2020,01/31/2020,38478.0,ip,4812.0,583.0,230.0,303.0,91.0
1,female,1,01/01/2020,01/31/2020,124275.0,ed,18839.0,767.0,580.0,1116.0,151.0
2,male,1,01/01/2020,01/31/2020,38478.0,ip,5482.0,778.0,537.0,446.0,154.0
3,male,1,01/01/2020,01/31/2020,124275.0,ed,18367.0,1304.0,1181.0,1641.0,291.0
4,female,2,02/01/2020,02/29/2020,35754.0,ip,4659.0,630.0,236.0,280.0,99.0


In [9]:

# Chi-Square test for independence between "sex" and "setting"
contingency_table_sex_setting = pd.crosstab(drug_sex_df['sex'], drug_sex_df['setting'])

chi2, p, dof, ex = chi2_contingency(contingency_table_sex_setting)

print(f"\n'Sex' vs 'Setting'")
print(f"Chi-Square statistic = {chi2}")
print(f"p-value = {p}")
print(f"Degrees of freedom = {dof}")


'Sex' vs 'Setting'
Chi-Square statistic = 0.0
p-value = 1.0
Degrees of freedom = 1


The p-value is 1.0, well above the conventional significance level of 0.05, so we fail to reject the null hypothesis that 'Sex' and 'Setting' are independent.

A chi-square statistic of exactly 0 means the observed counts equal the counts expected under independence in every cell. That is what a balanced layout produces: in the preview above, each month appears to contribute one row per sex for each setting, so the cross-tabulation counts rows of the file rather than patients or visits. The p-value of 1.0 therefore reflects how the file is laid out and is not strong evidence of independence.

In conclusion, this test gives no evidence of an association between 'Sex' and 'Setting', but it cannot demonstrate that they are independent, and it says nothing about whether one affects the other. The chi-square test also assumes independent observations and expected cell counts that are not too small (a common rule of thumb is at least 5 per cell); both should be checked before relying on the result.
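
The statistic of exactly 0 can be reproduced on a balanced layout. The sketch below is only an illustration: `demo_df` is a small hypothetical frame (not the real file) that assumes one row per sex and setting in each month, as the preview suggests.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical stand-in for drug_sex_df: three months,
# each with one row per (sex, setting) pair.
demo_df = pd.DataFrame({
    "sex": ["female", "female", "male", "male"] * 3,
    "setting": ["ip", "ed", "ip", "ed"] * 3,
    "time": [1] * 4 + [2] * 4 + [3] * 4,
})

# Cross-tabulating the two labels counts rows of the frame, nothing else.
table = pd.crosstab(demo_df["sex"], demo_df["setting"])
chi2, p, dof, expected = chi2_contingency(table)

print(table)     # every cell holds the same count
print(expected)  # identical to the observed table
print(chi2, p, dof)
```

Because every cell of the observed table equals its expected value, the test has nothing to detect, whatever the underlying visit counts are.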

In [18]:
# Convert the 'all drugs' column to categories
drug_sex_df['all_drugs_cat'] = pd.cut(drug_sex_df['all drugs'], bins=[-np.inf, 5000, 10000, np.inf], labels=['Low', 'Medium', 'High'])

# Create a cross-tabulation of the two columns
contingency_table = pd.crosstab(drug_sex_df['setting'], drug_sex_df['all_drugs_cat'])

# Perform the Chi-Square test
chi2, p, dof, ex = chi2_contingency(contingency_table)

print(f"'Setting' vs 'All Drugs Category'")
print(f"Chi-Square statistic = {chi2}")
print(f"p-value = {p}")
print(f"Degrees of freedom = {dof}")

'Setting' vs 'All Drugs Category'
Chi-Square statistic = 159.2
p-value = 2.6925218759599116e-35
Degrees of freedom = 2


The Chi-Square statistic is a measure of how much observed frequencies deviate from the frequencies expected under the null hypothesis, which postulates that the categorical variables are independent. 
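
As a rough illustration of that definition, the statistic can be computed by hand and compared with `chi2_contingency`. The table below is made up for the example and is not taken from the dataset.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2 x 3 table of counts (illustrative numbers only).
observed = np.array([[10, 20, 30],
                     [20, 20, 20]])

# Expected count per cell under independence: row total * column total / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected.
manual_chi2 = ((observed - expected) ** 2 / expected).sum()

chi2, p, dof, ex = chi2_contingency(observed, correction=False)
print(manual_chi2, chi2)
```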

In this case, a chi-square statistic of 159.2 on 2 degrees of freedom is far larger than independence between setting and drug category would plausibly produce.

The p-value is extremely small (approximately 2.69 x 10^-35), far below the conventional significance threshold of 0.05, so the null hypothesis is rejected.

Therefore, we reject the null hypothesis that setting and the 'all drugs' category are independent; the data give very strong evidence that the two are associated. Judging from the preview rows, ED counts (around 18,000) sit well above both bin edges while IP counts sit near 5,000, so the association largely reflects the difference in volume between the two settings.

The degrees of freedom are (number of rows - 1) x (number of columns - 1) = (2 - 1) x (3 - 1) = 2 for a contingency table with two settings and three categories.
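
The degrees-of-freedom rule can be checked against the table's shape. This is a small sketch on a hypothetical 2 x 3 table; the counts are invented for illustration.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical table: two settings by three count categories.
table = pd.DataFrame(
    [[12, 8, 1],
     [1, 2, 16]],
    index=["ip", "ed"],
    columns=["Low", "Medium", "High"],
)

n_rows, n_cols = table.shape
manual_dof = (n_rows - 1) * (n_cols - 1)

_, _, dof, _ = chi2_contingency(table)
print(manual_dof, dof)
```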

A significant association does not imply causality: the test does not show that setting causes changes in the drug category, or vice versa. However, it does suggest that

In [20]:
import numpy as np  

# Define the bin edges and labels
edges = [-np.inf, 500, 1000, np.inf]
labels=['Low', 'Medium', 'High']

# Define the columns to analyze
columns = ['all opioids', 'stimulants', 'cannabis', 'benzodiazepine', 'all drugs']

for col in columns:
    # Create the categorical variables
    drug_sex_df[f'{col}_cat'] = pd.cut(drug_sex_df[col], bins=edges, labels=labels)

    # Create a cross-tabulation of the two columns
    contingency_table = pd.crosstab(drug_sex_df['sex'], drug_sex_df[f'{col}_cat'])

    # Perform the Chi-Square test
    chi2, p, dof, ex = chi2_contingency(contingency_table)

    print(f"\n'Sex' vs '{col} Category'")
    print(f"Chi-Square statistic = {chi2}")
    print(f"p-value = {p}")
    print(f"Degrees of freedom = {dof}")


'Sex' vs 'all opioids Category'
Chi-Square statistic = 56.73737373737374
p-value = 4.782289182019959e-13
Degrees of freedom = 2

'Sex' vs 'stimulants Category'
Chi-Square statistic = 52.36036036036036
p-value = 4.266701028366233e-12
Degrees of freedom = 2

'Sex' vs 'benzodiazepine Category'
Chi-Square statistic = 0.0
p-value = 1.0
Degrees of freedom = 0

'Sex' vs 'all drugs Category'
Chi-Square statistic = 0.0
p-value = 1.0
Degrees of freedom = 0

'Sex' vs 'cannabis Category'
Chi-Square statistic = 7.9781818181818185
p-value = 0.01851653968857337
Degrees of freedom = 2


**'Sex' vs 'all opioids Category'**
A chi-square statistic of 56.74 with a p-value of about 4.8 x 10^-13 is strong evidence against independence of sex and the opioid category, so we reject the null hypothesis. The result is statistically significant, although neither the statistic nor the p-value measures how strong the association is.

**'Sex' vs 'stimulants Category'**
Similarly, a chi-square statistic of 52.36 and a p-value of about 4.3 x 10^-12 lead us to reject the null hypothesis that sex and the stimulant category are independent. This result is also statistically significant.

**'Sex' vs 'cannabis Category'**
A chi-square statistic of 7.98 and a p-value of 0.0185 indicate an association between sex and the cannabis category that is significant at the 0.05 level. The evidence is weaker than in the two previous tests, but a larger p-value alone does not mean a weaker association; an effect-size measure such as Cramér's V is needed to compare strength.
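
To compare the strength of association rather than the strength of evidence, one option is Cramér's V. The sketch below uses a hypothetical helper `cramers_v` (not part of this notebook) and invented tables, and it applies the plain formula sqrt(chi2 / (n * (min(rows, cols) - 1))) without bias correction.

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramér's V for an r x c table of counts (hypothetical helper, no bias correction)."""
    table = np.asarray(table)
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.sum()
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (n * k))

# Invented tables: one nearly independent, one perfectly associated.
weak = cramers_v([[30, 20], [25, 25]])
perfect = cramers_v([[10, 0], [0, 10]])
print(weak, perfect)
```

V ranges from 0 (no association) to 1 (perfect association), so it can be compared across the drug categories in a way the p-values cannot.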

**'Sex' vs 'benzodiazepine Category'**
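A chi-square statistic of 0, a p-value of 1 and, crucially, 0 degrees of freedom show that the contingency table has only one column: every benzodiazepine count falls into the same bin (all values in the preview are below 500, so 'Low'). With a single category there is nothing to compare, so the test is degenerate and tells us nothing about whether sex and the benzodiazepine category are associated. The bin edges would need to suit this column's range before the test is meaningful.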

**'Sex' vs 'all drugs Category'**
The same applies here. Zero degrees of freedom mean that every 'all drugs' count falls into one bin (all values in the preview are above 1,000, so 'High'), so the table has a single column and the test is degenerate. The result is not evidence for or against an association between sex and the all-drugs category.
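
One way to avoid reading degenerate results is to check the table's shape before testing. The following is a minimal sketch: `chi2_or_none` is a hypothetical helper, and `demo_df` mixes the five benzodiazepine values from the preview with invented ones.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def chi2_or_none(df, row, col):
    """Hypothetical helper: run the test only if the table is at least 2 x 2."""
    table = pd.crosstab(df[row], df[col])
    # Drop any all-zero rows or columns left by unobserved categories.
    keep_rows = (table.sum(axis=1) > 0).to_numpy()
    keep_cols = (table.sum(axis=0) > 0).to_numpy()
    table = table.loc[keep_rows, keep_cols]
    if table.shape[0] < 2 or table.shape[1] < 2:
        return None  # nothing to compare
    chi2, p, dof, _ = chi2_contingency(table)
    return chi2, p, dof

# Illustrative data: all counts are below 500, so every row is binned as 'Low'.
demo_df = pd.DataFrame({
    "sex": ["female", "male"] * 4,
    "benzodiazepine": [91.0, 154.0, 151.0, 291.0, 99.0, 160.0, 140.0, 300.0],
})
demo_df["benzodiazepine_cat"] = pd.cut(
    demo_df["benzodiazepine"],
    bins=[-np.inf, 500, 1000, np.inf],
    labels=["Low", "Medium", "High"],
)

result = chi2_or_none(demo_df, "sex", "benzodiazepine_cat")
print(result)
```

Returning `None` instead of a statistic of 0 makes it harder to mistake a single-category table for a finding of independence.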