In [None]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.patches as patches
import matplotlib.gridspec as gridspec 
from matplotlib.offsetbox import AnchoredText
from mpl_toolkits.axes_grid1 import make_axes_locatable
import seaborn as sns
import statistics
from scipy.signal import savgol_filter

# Misc
import warnings
warnings.filterwarnings(action='ignore')
plt.rcParams['axes.titlepad'] = 12
plt.rcParams['figure.dpi'] = 400
mpl.rcParams['font.family'] = 'Serif'

In [None]:
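def read_csv(path):
    # pick the reader from the file extension instead of relying on a decode error
    if path.lower().endswith(('.xlsx', '.xls')):
        data = pd.read_excel(path)
    else:
        data = pd.read_csv(path)
    # missing values are replaced with 0, as in the rest of the notebook
    data.fillna(0, inplace=True)
    return data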
        

PATH = "/kaggle/input/national-drugs-strategy-household-australia/ndshs_2016.csv"
data = read_csv(PATH)

# other paths
PATH_A = "/kaggle/input/national-drugs-strategy-household-australia/Australia Burden due to illicit drug use by linked disease and sex (2021).xlsx"
PATH_B = "/kaggle/input/national-drugs-strategy-household-australia/Australia Consumer provider and total national illicit drug arrests 200607 to 201920.xlsx"
PATH_C = "/kaggle/input/national-drugs-strategy-household-australia/Australia Number and crude rate (per 100000 population) of drug-related hospitalisations by principal diagnosis and remoteness area 202021.xlsx"
PATH_D = "/kaggle/input/national-drugs-strategy-household-australia/Australia Number and rate (per 100000 population) of alcohol-induced and alcohol-related deaths 19972020.xlsx"
PATH_E = "/kaggle/input/national-drugs-strategy-household-australia/Australia drug-induced deaths by drug type and drug class 19972020.xlsx"

# additional datasets
data_a = read_csv(PATH_A)
data_b = read_csv(PATH_B)
data_c = read_csv(PATH_C)
data_d = read_csv(PATH_D)
data_e = read_csv(PATH_E)

<div style="width:100%;text-align: center;"> <img align=middle src="https://www.guelphpolice.ca/en/resourcesGeneral/Alcohol-and-Drug-Safety-Page-Banner.jpg" style="height:300px;margin-top:3rem;"> </div>


<h3>Table of contents</h3>

<b>1. </b>[**Introduction**](#section-1) <br>
<b>2. </b>[**The first drug that comes to mind when thinking about "a drug problem"**](#section-2) <br>
&nbsp;&nbsp;&nbsp;&nbsp; <b>2A. </b>[**How did the different age groups differ in their perception of "a drug problem"?**](#section-2A) <br>
<b>3. </b>[**Which one of these drugs do you think directly or indirectly causes the most deaths in Australia?**](#section-3) <br>
&nbsp;&nbsp;&nbsp;&nbsp; <b>3A. </b>[**How do the age groups differ in what they think causes the most deaths?**](#section-3A) <br>
&nbsp;&nbsp;&nbsp;&nbsp; <b>3B. </b>[**What drug actually causes the most deaths in Australia?**](#section-3B) <br>
<b>4. </b>[**Alcohol use in Australia**](#section-4) <br>
<b>5. </b>[**The most serious concern for the general community**](#section-5) <br>
&nbsp;&nbsp;&nbsp;&nbsp; <b>5A. </b>[**How do the age groups differ in what they think is the most serious concern in their community?**](#section-5A) <br>
&nbsp;&nbsp;&nbsp;&nbsp; <b>5B. </b>[**Drug arrest per year 100,000 (Consumer and Provider)**](#section-5B) <br>
<b>6. </b>[**How the use of "illicit drugs" differ in how people rate themselves in terms of health**](#section-6) <br>
<b>7. </b>[**Impact of alcohol and illicit drug use on the burden of disease and injury in Australia**](#section-7) <br>
&nbsp;&nbsp;&nbsp;&nbsp; <b>7A. </b>[**Impact of alcohol and illicit drug use on the burden of disease and injury in Australia by gender (divided by total sum of both male and female)**](#section-7A) <br>
&nbsp;&nbsp;&nbsp;&nbsp; <b>7B. </b>[**Impact of alcohol and illicit drug use on the burden of disease and injury in Australia by gender (divided by total sum per gender)**](#section-7B) <br>
<b>8. </b>[**Final thoughts**](#section-8) <br>

<a id="section-1"></a>
<h1>1. Introduction</h1>

<p>Drug use in Australia is a complex issue. According to the National Drug Strategy Household Survey, alcohol is the most widely used drug in Australia, with around two-thirds of the population reporting consuming it in the past year. Cannabis is the most commonly used illicit drug, followed by use of prescription and over-the-counter medications. The use of methamphetamine, including ice, is a significant concern in some parts of the country. The Australian government has implemented a range of measures to address drug use, including prevention, early intervention, treatment, and harm reduction strategies.</p>

<p>The datasets provided are related to the burden of illicit drug use in Australia, including data on linked diseases, consumer and provider arrests, hospitalizations, and deaths. They contain information on specific drug types and classes, as well as demographic information such as gender and remoteness area. These datasets provide a comprehensive overview of the impact of illicit drug use in Australia, and can be used to understand patterns and trends in drug-related harms over time.</p>

<p>This notebook aims to provide insights and to help me build my own visualisation and report-writing skills. Some of the data used in this notebook was sourced outside of Kaggle, and I have combined it so we can all have a bit of fun with it. Please provide feedback where possible so I can further improve my visualisation and reporting techniques.</p>

<a id="section-2"></a>
<h1>2. The first drug that comes to mind when thinking about "a drug problem"</h1>

<p>From the sample data gathered from the National Drugs Strategy Household Survey, it seems that the perception of what constitutes a "drug problem" is similar between males and females in Australia. For example, both males and females reported that the most common drug that comes to mind when thinking about "a drug problem" is Methamphetamine, with approximately 47% of respondents from both genders reporting it as the first drug that comes to mind. This suggests that there is a general consensus among both males and females about the seriousness of this particular drug.</p>

<p>The second most common drug that comes to mind for both males and females is Heroin, with approximately 14% of females and 13% of males reporting it as the first drug that comes to mind. Alcohol and Marijuana|Cannabis are also commonly thought of as the first drug that comes to mind when thinking about "a drug problem" by both males and females, with around 7% of males and 6% of females reporting it.</p>

<p>The sample also shows that there are some differences in the percentage of males and females who report specific drugs as the first one that comes to mind when thinking about "a drug problem." For example, a higher percentage of females than males report Ecstasy and Pain-killers|Analgesics|Opioids as the first drug that comes to mind.</p>

<p>Overall, the data suggests that there is some commonality in the drugs that come to mind when thinking about "a drug problem" for both males and females in Australia, but also some differences in the specific drugs that are thought of.</p>

<h4>Comments for visualisation below:</h4>

<p>In the male and female plots (not the overall plot), a bar is coloured red where the male and female percentages for that drug differ by more than 0.5 percentage points, and dark grey where they do not.</p>
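
<p>The colouring rule can be sketched on a toy table. The responses below are hypothetical and only reuse the survey's column names (<code>Sex</code>, <code>A1</code>); <code>pd.crosstab</code> with <code>normalize="index"</code> is one possible way to get the per-sex percentages, not necessarily the approach used in the cell that follows.</p>

```python
import pandas as pd

# Hypothetical toy responses; the real notebook uses the survey's Sex and A1 columns.
toy = pd.DataFrame({
    "Sex": ["Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female"],
    "A1":  ["Meth", "Meth", "Heroin", "Alcohol", "Meth", "Heroin", "Heroin", "Alcohol"],
})

# Share of each answer within each sex, as percentages (each row sums to 100).
share = pd.crosstab(toy["Sex"], toy["A1"], normalize="index") * 100

# Absolute male/female gap per drug, in percentage points.
gap = (share.loc["Male"] - share.loc["Female"]).abs()

# Same rule as the plots: red if the gap exceeds 0.5 percentage points, dark otherwise.
colours = gap.apply(lambda g: "#F05454" if g > 0.5 else "#444444")

print(share.round(2))
print(colours)
```

<p>Because the colour is looked up by drug name rather than by bar position, it stays correct even when the male and female bars are sorted in different orders.</p>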

In [None]:
data_1 = data.groupby(['Sex','A1'])['A1'].count().to_dict()

total_female = data.Sex.value_counts().to_dict()['Female']
total_male = data.Sex.value_counts().to_dict()['Male']

percentage_data = {(gender, drug): (value/total_female if gender == "Female" else value/total_male) * 100 for (gender, drug), value in data_1.items()}

# create the figure and axes for the subplot
fig, ax = plt.subplots(3, 1, figsize=(10, 10))

# separate data for males and females
male_data = {k[1]:v for k,v in percentage_data.items() if k[0] == 'Male'}
female_data = {k[1]:v for k,v in percentage_data.items() if k[0] == 'Female'}

# sorting values
male_data = dict(sorted(male_data.items(), key=lambda x: x[1]))
female_data = dict(sorted(female_data.items(), key=lambda x: x[1]))

# create the bar plots for males and females
male_bars = ax[0].barh(list(male_data.keys()), list(male_data.values()))
female_bars = ax[1].barh(list(female_data.keys()), list(female_data.values()))

# colour each bar by its own drug: the two plots are sorted independently,
# so the same position does not always hold the same drug
for bars, own in ((male_bars, male_data), (female_bars, female_data)):
    for bar, drug in zip(bars, own.keys()):
        # similar (within 0.5 percentage points) -> dark, otherwise red
        diff = abs(male_data.get(drug, 0) - female_data.get(drug, 0))
        bar.set_color('#444444' if diff <= 0.5 else '#F05454')
        
for i in male_data.keys():
    ax[0].annotate(f"{round(male_data[i],2)}%", xy=(male_data[i] + 1.45, i),va = 'center', ha='center',fontsize=8,fontweight='light')
    
for i in female_data.keys():
    ax[1].annotate(f"{round(female_data[i],2)}%", xy=(female_data[i] + 1.45, i),va = 'center', ha='center',fontsize=8,fontweight='light')
    
# creating overall (male and female included)
data_1 = round((data.A1.value_counts()/len(data)*100),2).to_dict()

# color for overall plot
colors = list(reversed(['#F05454','#F28C8C','#F5B5B5','#F8DFDF','#FBFBFB'] + ["#FBFBFB" for i in range(len(data_1)-5)]))

# bar plot
data_1 = dict(sorted(data_1.items(), key=lambda x: x[1]))
ax[2].barh(list(data_1.keys()), list(data_1.values()), color=colors,edgecolor='#171717',linewidth=0.5)

for i in data_1.keys():
    ax[2].annotate(f"{data_1[i]}%", xy=(data_1[i] + 1.45, i),va = 'center', ha='center',fontsize=8,fontweight='light')

[ax[i].spines[s].set_visible(False) for i in range(3) for s in ['top','right','bottom']]
    
# set title and labels for the subplot
ax[0].set_title("Male perception of 'Drug problem'")
ax[1].set_title("Female perception of 'Drug problem'")
ax[2].set_title("Overall perception of 'Drug problem'")
[ax[i].set_xticks([]) for i in range(3)]
ax[0].grid(axis='y', linestyle=':', alpha=0.2)
ax[1].grid(axis='y', linestyle=':', alpha=0.2)
ax[2].grid(axis='y', linestyle=':', alpha=0.2)
ax[0].set_yticklabels(male_data.keys(),fontsize=8)
ax[1].set_yticklabels(female_data.keys(),fontsize=8)
ax[2].set_yticklabels(data_1.keys(),fontsize=8)
fig.tight_layout()
plt.show()

<a id="section-2A"></a>
<h1>2A. How did the different age groups differ in their perception of "a drug problem"?</h1>

<p>Across all age groups, Methamphetamine is considered the most problematic drug. In the 25-34 age group, Methamphetamine is mentioned 1807 times and Heroin 457 times; in the 65+ age group, Methamphetamine is mentioned 2315 times and Heroin 958 times. The second most problematic drug varies by age group, but it is generally either Heroin or Marijuana/Cannabis.</p>

<p>The number of respondents who did not answer or chose "none" as the most problematic drug increases with age, with the highest percentage in the 65+ age group (204 out of 2315 or 8.8%). The number of respondents choosing "drugs other than listed" also increases with age, again with the highest percentage in the 65+ age group (32 out of 2315 or 1.4%). The visualisation below shows each drug's percentage share within each age group.</p>

<h4>Comments for visualisation below:</h4>

<p>Within each age group, bars below 50% of that group's largest value are drawn in dark grey; the remaining bars are red.</p>
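
<p>A minimal sketch of that colouring rule, using invented percentages for a single age group (the drug names and values here are placeholders, not survey results):</p>

```python
# Hypothetical percentages for one age group (not taken from the survey).
shares = {"Ecstasy": 4.0, "Heroin": 12.0, "Alcohol": 26.0, "Meth": 50.0}

# Threshold: half of the largest value in this group.
threshold = 0.5 * max(shares.values())

# Dark grey below the threshold, red otherwise.
bar_colours = {drug: "#444444" if value < threshold else "#F05454"
               for drug, value in shares.items()}

print(bar_colours)
```

<p>A list of these colours can be passed straight to the <code>color</code> argument of <code>barh</code>, which avoids drawing a second bar on top of the first.</p>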

In [None]:
data.Age = data.Age.replace({'Dec-17':'12-17'})

def calculate_percentages(data, age_range, column_name):
    data_age = data[data['AgeGroup'] == age_range]
    data_age = data_age[column_name].value_counts().to_dict()
    total = sum(data_age.values())
    data_age = {key: round((value/total)*100,2) for key, value in data_age.items()}
    data_age = dict(sorted(data_age.items(), key=lambda x: x[1]))
    return data_age

# age groups in display order; the same list drives the calculation and the
# subplot titles, so the two cannot drift apart
ages = ['14-19', '20-29', '30-39', '40-49', '50-59', '60+']
datas = [calculate_percentages(data, age, 'A1') for age in ages]

fig, axs = plt.subplots(6, 1, figsize=(15, 20))

for i, d in enumerate(datas):
    axs[i].barh(list(d.keys()), list(d.values()),color='#F05454')
    axs[i].set_title(f"Age group ({ages[i]}) : perception of 'Drug problem'")
    axs[i].grid(axis='y', linestyle=':', alpha=0.2)
    
    # Find the maximum value in the current dataset
    max_val = max(d.values())
    for j, val in enumerate(d.values()):
        # If the value is less than 50% of the maximum value, color it differently
        if val < max_val * 0.5:
            axs[i].barh(list(d.keys())[j], val, color='#444444')
        # Annotate the bar chart with the value
        axs[i].annotate(f"{val}%", xy=(val, j), xytext=(5, 0),
                        textcoords='offset points', ha='left', va='center')

        
[axs[i].spines[s].set_visible(False) for i in range(6) for s in ['top','right','bottom']]
[axs[i].set_xticks([]) for i in range(6)]
plt.tight_layout()
plt.show()

<a id="section-3"></a>
<h1>3. Which one of these drugs do you think directly or indirectly causes the most deaths in Australia?</h1>

<p>Alcohol is the drug most often named as the leading cause of drug-related deaths in Australia, chosen by 34.36% of respondents. Tobacco is second, at 24.96%. Methamphetamine (18.23%) and Heroin (10.35%) follow, with Cocaine further behind at 2.94%.</p>

<p>The difference in perception between males and females is small: the largest gap is 1.5 percentage points and the smallest is 0.01 percentage points.</p>

<p>It's also interesting to notice that the percentage of people who responded "Other" or "Not answered" is relatively high, indicating that there may be a lack of awareness or understanding of the specific causes of drug-related deaths among the respondents.</p>

<h4>Comments for visualisation below:</h4>

<p>In the male and female plots (not the overall plot), a bar is coloured red where the male and female percentages for that drug differ by more than 0.5 percentage points, and dark grey where they do not.</p>

In [None]:
data_1 = data.groupby(['Sex','A2'])['A2'].count().to_dict()

total_female = data.Sex.value_counts().to_dict()['Female']
total_male = data.Sex.value_counts().to_dict()['Male']

percentage_data = {(gender, drug): (value/total_female if gender == "Female" else value/total_male) * 100 for (gender, drug), value in data_1.items()}

# create the figure and axes for the subplot
fig, ax = plt.subplots(3, 1, figsize=(10, 10))

# separate data for males and females
male_data = {k[1]:v for k,v in percentage_data.items() if k[0] == 'Male'}
female_data = {k[1]:v for k,v in percentage_data.items() if k[0] == 'Female'}

# sorting values
male_data = dict(sorted(male_data.items(), key=lambda x: x[1]))
female_data = dict(sorted(female_data.items(), key=lambda x: x[1]))

# create the bar plots for males and females
male_bars = ax[0].barh(list(male_data.keys()), list(male_data.values()))
female_bars = ax[1].barh(list(female_data.keys()), list(female_data.values()))

# colour each bar by its own drug: the two plots are sorted independently,
# so the same position does not always hold the same drug
for bars, own in ((male_bars, male_data), (female_bars, female_data)):
    for bar, drug in zip(bars, own.keys()):
        # similar (within 0.5 percentage points) -> dark, otherwise red
        diff = abs(male_data.get(drug, 0) - female_data.get(drug, 0))
        bar.set_color('#444444' if diff <= 0.5 else '#F05454')
        
for i in male_data.keys():
    ax[0].annotate(f"{round(male_data[i],2)}%", xy=(male_data[i] + 1.2, i),va = 'center', ha='center',fontsize=8,fontweight='light')
    
for i in female_data.keys():
    ax[1].annotate(f"{round(female_data[i],2)}%", xy=(female_data[i] + 1.2, i),va = 'center', ha='center',fontsize=8,fontweight='light')
    
# creating overall (male and female included)
data_1 = round((data.A2.value_counts()/len(data)*100),2).to_dict()

# color for overall plot
colors = list(reversed(['#F05454','#F28C8C','#F5B5B5','#F8DFDF','#FBFBFB'] + ["#FBFBFB" for i in range(len(data_1)-5)]))

# bar plot
data_1 = dict(sorted(data_1.items(), key=lambda x: x[1]))
ax[2].barh(list(data_1.keys()), list(data_1.values()), color=colors,edgecolor='#171717',linewidth=0.5)

for i in data_1.keys():
    ax[2].annotate(f"{data_1[i]}%", xy=(data_1[i] + 1.2, i),va = 'center', ha='center',fontsize=8,fontweight='light')

[ax[i].spines[s].set_visible(False) for i in range(3) for s in ['top','right','bottom']]
    
# set title and labels for the subplot
ax[0].set_title("Male perception of 'What causes the most deaths?'")
ax[1].set_title("Female perception of 'What causes the most deaths?'")
ax[2].set_title("Overall perception of 'What causes the most deaths?'")
ax[0].set_xticks([])
ax[1].set_xticks([])
ax[2].set_xticks([])
ax[0].grid(axis='y', linestyle=':', alpha=0.2)
ax[1].grid(axis='y', linestyle=':', alpha=0.2)
ax[2].grid(axis='y', linestyle=':', alpha=0.2)
ax[0].set_yticklabels(male_data.keys(),fontsize=8)
ax[1].set_yticklabels(female_data.keys(),fontsize=8)
ax[2].set_yticklabels(data_1.keys(),fontsize=8)
fig.tight_layout()
plt.show()

<a id="section-3A"></a>
<h1>3A. How do the age groups differ in what they think causes the most deaths?</h1>

<p>There are a few key differences and similarities between the age groups in terms of what drugs they believe cause the most deaths.</p>

<ul>
    <li>Alcohol is considered the drug that causes the most deaths across all age groups, with percentages ranging from 34.64% to 39.56%.</li>
    <li>Methamphetamine ranks above Heroin in every age group, with percentages ranging from 16.66% to 22.31%.</li>
    <li>Heroin ranks below Methamphetamine in every age group, with percentages ranging from 8.48% to 12.11%.</li>
</ul>

<p>In terms of specific age groups, the 14-19 age group has the highest percentage of people who believe Methamphetamine causes the most deaths (22.31%). The 30-39 and 40-49 age groups have the highest percentages of people who believe Alcohol causes the most deaths (39.56% and 37.44% respectively).</p>

In [None]:
# age groups in display order; the same list drives the calculation and the
# subplot titles, so the two cannot drift apart
ages = ['14-19', '20-29', '30-39', '40-49', '50-59', '60+']
datas = [calculate_percentages(data, age, 'A2') for age in ages]

fig, axs = plt.subplots(6, 1, figsize=(15, 20))

for i, d in enumerate(datas):
    axs[i].barh(list(d.keys()), list(d.values()),color='#F05454')
    axs[i].set_title(f"Age group ({ages[i]}) : perception of 'What causes the most deaths?'")
    axs[i].grid(axis='y', linestyle=':', alpha=0.2)
    
    # Find the maximum value in the current dataset
    max_val = max(d.values())
    for j, val in enumerate(d.values()):
        # If the value is less than 50% of the maximum value, color it differently
        if val < max_val * 0.5:
            axs[i].barh(list(d.keys())[j], val, color='#444444')
        # Annotate the bar chart with the value
        axs[i].annotate(f"{val}%", xy=(val, j), xytext=(5, 0),
                        textcoords='offset points', ha='left', va='center')

        
[axs[i].spines[s].set_visible(False) for i in range(6) for s in ['top','right','bottom']]
[axs[i].set_xticks([]) for i in range(6)]
plt.tight_layout()
plt.show()

<a id="section-3B"></a>
<h1>3B. What drug actually causes the most deaths in Australia?</h1>

<p>From the additional data retrieved from the Australian Institute of Health and Welfare, drug-induced deaths in Australia have fluctuated over the years. The rate of all drug-induced deaths was highest in 2008, at 6.65%, and lowest in 2001, at 5.39%. Overall, there is a clear upward trend in the rate of drug-induced deaths from 2005 to 2020.</p>

<p>Opioids are a significant contributor to drug-induced deaths in Australia, with a peak rate of 6.52% in 1999 and a low of 2.31% in 2003.</p>

<p>Depressants are also a significant contributor, with a peak rate of 2.85% in 1999 and a low of 1.25% in 2003.</p>
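
<p>One way to put a number on a trend over a range of years is to fit a straight line and read off its slope. The sketch below does this with <code>np.polyfit</code>; the yearly values are invented for illustration and are not the AIHW figures.</p>

```python
import numpy as np

# Invented yearly rates for illustration only; not the AIHW figures.
years = np.array([2005, 2006, 2007, 2008, 2009, 2010])
rates = np.array([5.0, 5.2, 5.1, 5.4, 5.5, 5.7])

# Fit rate = slope * (year - first year) + intercept.
# Centring the years on the first year keeps the fit numerically well behaved.
slope, intercept = np.polyfit(years - years[0], rates, 1)

# A positive slope means the series rises on average over the window.
print(f"average change per year: {slope:.3f}")
```

<p>A straight-line fit is only a rough summary; it complements, rather than replaces, the smoothed curve drawn in the chart.</p>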

In [None]:
my_dict=data_e.to_dict()

for dic in my_dict:
    my_dict[dic] = list(my_dict[dic].values())[0:24]

# get all the keys of the dictionary
keys = list(my_dict.keys())

# removing unwanted
col = ['All drug-induced deaths','Heroin','Natural and semi-synthetic opioids','Methadone','Synthetic opioids','Year','All opioids excluding heroin',
      'Benzodiazepines','Paracetamol','Ibuprofen and aspirin','All non-opioid analgesics','Alcohol', 'Cocaine', 'Cannabinoids ']

for i in col:
    keys.remove(i)

# create the figure
fig = plt.figure(figsize=(15, 7))

# create a grid of 2x1 subplots using gridspec
gs = gridspec.GridSpec(2, 1, height_ratios=[1, 1],hspace=0.5)
ax0 = plt.subplot(gs[0])
ax1 = plt.subplot(gs[1])

# plot the data in the first subplot
color_map = {'All opioids': '#FF2626', 'All depressants': '#676FA3', 'All psychostimulants': '#91C483', 
             'All antidepressants': '#FFAD60', 'All antipsychotics': '#666666'}

for key in keys:
    ax0.plot(my_dict['Year'], my_dict[key], label=key, color=color_map[key],linewidth=1)

# plot the data in the second subplot
y_smooth = savgol_filter(my_dict["All drug-induced deaths"], 21, 6)
ax1.plot(my_dict['Year'], y_smooth, color='#4D96FF',linewidth=1)

# Define the coordinates of the square
x1, x2 = 2005, 2017
y1, y2 = 4.5, 8.5

# Plot the square
rect = plt.Rectangle((x1, y1), x2-x1, y2-y1, linewidth=1, edgecolor='#F05454', facecolor='#FFF2F2',linestyle='--')
ax1.add_patch(rect)

ax0.set_title("Australia drug-induced deaths by drug type (1997-2020)")
ax1.set_title("Australia drug-induced deaths total (1997-2020)")

# add x and y labels
ax0.set_ylabel('Deaths (%)')
ax1.set_ylabel('Deaths (%)')
ax0.set_xlim(1997, 2020)
ax0.set_ylim(0, 8)
ax1.set_xlim(1997, 2020)
ax1.set_ylim(3, 10)
ax0.grid(axis='y', linestyle=':', alpha=0.5)
ax1.grid(axis='y', linestyle=':', alpha=0.5)

# [ax0.spines[s].set_visible(False) for s in ['top','right']]
# [ax1.spines[s].set_visible(False) for s in ['top','right']]

# add the legend to the first subplot
ax0.legend(loc='center right', bbox_to_anchor=(1.2, 0.5),prop={'size': 10},borderaxespad=0,edgecolor='none', facecolor='none')

# show the plot
plt.tight_layout()
plt.show()


<a id="section-4"></a>
<h1>4. Alcohol use in Australia</h1>

<h4>Findings from alcohol-induced deaths</h4>

<ul>
    <li>The total number of alcohol-induced deaths in Australia has fluctuated over the years, with some years having higher deaths than others. The highest number of deaths was recorded in 2020 with 1452 deaths, while the lowest number of deaths was recorded in 2000 with 1019 deaths.</li>
    <li>The age-standardized rate of alcohol-induced deaths per 100,000 population in Australia has also fluctuated over the years. The highest rate was recorded in 1997 with 6.5 deaths per 100,000 population, while the lowest rate was recorded in 2012 with 4.5 deaths per 100,000 population.</li>
    <li>Overall, the total number of alcohol-induced deaths in Australia has increased over the years, while the age-standardized rate of alcohol-induced deaths per 100,000 population has decreased over the years.</li>
    <li>The difference between the highest and lowest number of deaths is 1452 - 1019 = 433 deaths, and the difference between the highest and lowest rate is 6.5 - 4.5 = 2.0 deaths per 100,000 population, indicating that alcohol-induced deaths vary considerably over the period.</li>
</ul>
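
<p>The highest and lowest values of a yearly series can be checked with <code>idxmax</code> and <code>idxmin</code>. The sketch below uses the age-standardised alcohol-induced rates that are hard-coded in this section's plotting cell; the variable names are my own.</p>

```python
import pandas as pd

# Age-standardised alcohol-induced death rates per 100,000 (1997-2020),
# copied from the plotting cell of this section.
years = list(range(1997, 2021))
induced_rate = [6.5, 5.8, 6.0, 5.4, 5.4, 5.8, 5.4, 5.2, 5.3, 5.2, 5.2, 5.4,
                5.1, 5.0, 4.7, 4.5, 4.9, 5.0, 5.2, 4.8, 5.3, 4.7, 4.9, 5.1]
rate = pd.Series(induced_rate, index=years)

# Year of the highest and lowest rate, and the spread between them.
highest_year = rate.idxmax()
lowest_year = rate.idxmin()
spread = round(rate.max() - rate.min(), 1)

print(f"highest: {rate.max()} in {highest_year}")
print(f"lowest:  {rate.min()} in {lowest_year}")
print(f"spread:  {spread} deaths per 100,000")
```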

<h4>Findings from alcohol-related deaths</h4>

<p>The data shows the total number of alcohol-related deaths and the age-standardized rate per 100,000 population in Australia for the years 1997 to 2020.</p>

<ul>
  <li>The total number of alcohol-related deaths has seen an overall increase from 1997 to 2020, with the highest number of deaths recorded in 2020 at 4516.</li>
  <li>The age-standardized rate per 100,000 population does not rise steadily: it fell from 16.5 in 1997 to a low of 11.3 in 2004, climbed to a peak of 18.1 in 2017, and stood at 16.4 in 2020.</li>
  <li>It's worth noting that the difference between alcohol-induced deaths and alcohol-related deaths is that alcohol-induced deaths are deaths where alcohol is the direct cause of the death, while alcohol-related deaths are deaths where alcohol is involved but not the direct cause.</li>
  <li>It's also worth noting that the data provided by the Australian Institute of Health and Welfare is age-standardized, which means that the rates are adjusted to take into account differences in the age structure of the population. This is done to allow for comparison between populations with different age structures.</li>
  <li>From the analysis, it is clear that the number of alcohol-related deaths has been increasing over the years. This points to a public health concern and to a need for more effective prevention and intervention strategies to reduce the number of alcohol-related deaths in Australia.</li>
</ul>
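
<p>To make the idea of age standardisation concrete, here is a minimal sketch of direct standardisation: age-specific rates are weighted by a fixed standard population. All age bands, counts and populations below are hypothetical, and the helper <code>age_standardised_rate</code> is my own illustration rather than the method used by the AIHW.</p>

```python
# Hypothetical age bands, death counts, populations and standard-population sizes.
age_bands = ["0-39", "40-64", "65+"]
deaths = [10, 60, 80]
population = [500_000, 300_000, 100_000]
standard_population = [550_000, 320_000, 130_000]


def age_standardised_rate(deaths, population, standard_population, per=100_000):
    """Direct standardisation: weight each age-specific rate by the standard population."""
    age_specific = [d / p for d, p in zip(deaths, population)]
    weighted = sum(r * w for r, w in zip(age_specific, standard_population))
    return weighted / sum(standard_population) * per


# Crude rate ignores the age structure; the standardised rate adjusts for it.
crude = sum(deaths) / sum(population) * 100_000
asr = age_standardised_rate(deaths, population, standard_population)

print(f"crude rate:            {crude:.1f} per 100,000")
print(f"age-standardised rate: {asr:.1f} per 100,000")
```

<p>In this toy example the standard population has a larger share of older people than the observed one, so the standardised rate comes out higher than the crude rate.</p>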

<h4>Comparison of alcohol-induced vs alcohol-related deaths</h4>

<p>When comparing the age-standardized rates of alcohol-induced deaths and alcohol-related deaths, we can see that the rate of alcohol-induced deaths is consistently lower than the rate of alcohol-related deaths.</p>

<p>For example, in 1997, the age-standardized rate of alcohol-induced deaths was 6.5 deaths per 100,000 population, while the rate of alcohol-related deaths was 16.5 deaths per 100,000 population. In 2020, the rate of alcohol-induced deaths was 5.1 deaths per 100,000 population, while the rate of alcohol-related deaths was 16.4 deaths per 100,000 population.</p>

<p>This suggests that while alcohol may be a contributing factor in many deaths, it is not always the direct cause. It's important to note that the difference between the two types of deaths may be influenced by factors such as the accuracy of death certification and reporting.</p>

In [None]:
"""
Data gathered from: https://www.aihw.gov.au/reports/alcohol/alcohol-tobacco-other-drugs-australia/data
"""

standardize_data = {
    'Year': [1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020],
    'Alcohol-induced deaths': [6.5, 5.8, 6.0, 5.4, 5.4, 5.8, 5.4, 5.2, 5.3, 5.2, 5.2, 5.4, 5.1, 5.0, 4.7, 4.5, 4.9, 5.0, 5.2, 4.8, 5.3, 4.7, 4.9, 5.1],
    'Alcohol-related deaths': [16.5, 15.5, 13.8, 12.9, 12.3, 13.2, 12.6, 11.3, 11.7, 12.4, 13.1, 13.4, 11.9, 11.9, 11.6, 11.6, 13.9, 15.3, 15.5, 16.0, 18.1, 17.3, 17.6, 16.4]
}

standardize_data = pd.DataFrame(standardize_data)

# create the figure 
fig = plt.figure(figsize=(15, 7))

# create a grid of 2x1 subplots using gridspec
gs = gridspec.GridSpec(2, 1, height_ratios=[1, 1],hspace=0.5)
ax0 = plt.subplot(gs[0])
ax1 = plt.subplot(gs[1])

# First subplot
ax0.plot(data_d['Year'], data_d['All alcohol-induced deaths'], color='#F05454',linewidth=2,linestyle="--",label='Alcohol-induced deaths')
ax0.grid(axis='y', linestyle=':', alpha=0.2)
ax2 = ax0.twinx() 
ax2.plot(data_d['Year'], data_d['All alcohol-related deaths'], color='#444444',linewidth=1,label='Alcohol-related deaths  ')
ax0.set_title("Number of alcohol-induced and alcohol-related deaths (1997-2020)")
ax0.legend(loc='center right', bbox_to_anchor=(1.25, 0.5),prop={'size': 10},borderaxespad=0,edgecolor='none', facecolor='none')
ax2.legend(loc='center right', bbox_to_anchor=(1.25, 0.62),prop={'size': 10},borderaxespad=0,edgecolor='none', facecolor='none')

# Second subplot
ax1.plot(standardize_data['Year'], standardize_data['Alcohol-induced deaths'], color='#F05454',linewidth=2,linestyle="--")
ax1.grid(axis='y', linestyle=':', alpha=0.2)
ax2 = ax1.twinx() 
ax2.plot(standardize_data['Year'], standardize_data['Alcohol-related deaths'], color='#444444',linewidth=1)
ax1.set_title("Age-standardised rate (per 100,000 population) of alcohol-induced and alcohol-related deaths (1997–2020)")
plt.tight_layout()
plt.show()

<a id="section-5"></a>
<h1>5. The most serious concern for the general community</h1>

<p>In terms of what drug is a serious concern for the general community, the data shows that Methamphetamine is the most concerning drug for both males and females, and overall. Excessive drinking of alcohol is also a significant concern for the general community, ranking second overall and for females and third for males. Heroin is also a concern, ranking third overall and for males, and fourth for females. Other drugs of concern include tobacco smoking, cocaine, ecstasy, pain-killers/pain-relievers and opioids, marijuana/cannabis, methadone/buprenorphine, and steroids. It is worth noting that the "None of these" and "Not answered" options were also included in the survey, with a significant percentage of respondents choosing these options.</p>

In [None]:
data['A3'] = data['A3'].replace(
{'Non-medical use of Pain-killers|Pain-relievers and Opioids (e.g. Panadeine Forte, Nurofen Plus, Morphine)': 'Pain-killers|Pain-relievers and Opioids',
'Non-medical use of Meth|amphetamine': 'Meth|amphetamine',
'Non-medical use of Methadone|Buprenorphine': 'Methadone|Buprenorphine',
'Non-medical use of Steroids': 'Steroids'}
)

data_1 = data.groupby(['Sex','A3'])['A3'].count().to_dict()

total_female = data.Sex.value_counts().to_dict()['Female']
total_male = data.Sex.value_counts().to_dict()['Male']

percentage_data = {(gender, drug): (value/total_female if gender == "Female" else value/total_male) * 100 for (gender, drug), value in data_1.items()}

# create the figure and axes for the subplot
fig, ax = plt.subplots(3, 1, figsize=(10, 10))

# separate data for males and females
male_data = {k[1]:v for k,v in percentage_data.items() if k[0] == 'Male'}
female_data = {k[1]:v for k,v in percentage_data.items() if k[0] == 'Female'}

# sorting values
male_data = dict(sorted(male_data.items(), key=lambda x: x[1]))
female_data = dict(sorted(female_data.items(), key=lambda x: x[1]))

# create the bar plots for males and females
male_bars = ax[0].barh(list(male_data.keys()), list(male_data.values()))
female_bars = ax[1].barh(list(female_data.keys()), list(female_data.values()))

# colour each bar by its own drug: the two plots are sorted independently,
# so the same position does not always hold the same drug
for bars, own in ((male_bars, male_data), (female_bars, female_data)):
    for bar, drug in zip(bars, own.keys()):
        # similar (within 0.5 percentage points) -> dark, otherwise red
        diff = abs(male_data.get(drug, 0) - female_data.get(drug, 0))
        bar.set_color('#444444' if diff <= 0.5 else '#F05454')
        
for i in male_data.keys():
    ax[0].annotate(f"{round(male_data[i],2)}%", xy=(male_data[i] + 1.45, i),va = 'center', ha='center',fontsize=8,fontweight='light')
    
for i in female_data.keys():
    ax[1].annotate(f"{round(female_data[i],2)}%", xy=(female_data[i] + 1.45, i),va = 'center', ha='center',fontsize=8,fontweight='light')
    
# creating overall (male and female included)
data_1 = round((data.A3.value_counts()/len(data)*100),2).to_dict()

# color for overall plot
colors = list(reversed(['#F05454','#F28C8C','#F5B5B5','#F8DFDF','#FBFBFB'] + ["#FBFBFB" for i in range(len(data_1)-5)]))

# bar plot
data_1 = dict(sorted(data_1.items(), key=lambda x: x[1]))
ax[2].barh(list(data_1.keys()), list(data_1.values()), color=colors,edgecolor='#171717',linewidth=0.5)

for i in data_1.keys():
    ax[2].annotate(f"{data_1[i]}%", xy=(data_1[i] + 1.45, i),va = 'center', ha='center',fontsize=8,fontweight='light')

[ax[i].spines[s].set_visible(False) for i in range(3) for s in ['top','right','bottom']]
    
# set title and labels for the subplot
ax[0].set_title("Male perception of 'The most serious concern for the general community'")
ax[1].set_title("Female perception of 'The most serious concern for the general community'")
ax[2].set_title("Overall perception of 'The most serious concern for the general community'")
ax[0].set_xticks([])
ax[1].set_xticks([])
ax[2].set_xticks([])
ax[0].grid(axis='y', linestyle=':', alpha=0.2)
ax[1].grid(axis='y', linestyle=':', alpha=0.2)
ax[2].grid(axis='y', linestyle=':', alpha=0.2)
ax[0].set_yticklabels(male_data.keys(),fontsize=8)
ax[1].set_yticklabels(female_data.keys(),fontsize=8)
ax[2].set_yticklabels(data_1.keys(),fontsize=8)
fig.tight_layout()
plt.show()

<a id="section-5A"></a>
<h1>5A. How do the age groups differ in what they think is the most serious concern in their community?</h1>

<p>Across all age groups, Methamphetamine (Meth) is consistently ranked as the most serious concern in terms of drug use in the community. Alcohol is also consistently ranked as a significant concern, but typically ranks lower than Meth.</p>

<p>It's worth noting that some other drugs, such as Heroin and Cocaine, are ranked as a significant concern in some age groups, but not in others. For example, Heroin is ranked as a more serious concern among young people, while Cocaine is ranked as a more serious concern among older people.</p>
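<p>One way to compare rankings between age groups is to rank the drugs inside each group and place the ranks side by side. The sketch below uses hypothetical percentages (not survey results) and illustrative column names; it does not reproduce the notebook's <code>calculate_percentages</code> helper.</p>

```python
import pandas as pd

# Hypothetical percentages per age group (not survey results)
toy = pd.DataFrame({
    "AgeGroup": ["14-19", "14-19", "14-19", "60+", "60+", "60+"],
    "Drug":     ["Meth", "Alcohol", "Heroin", "Meth", "Alcohol", "Heroin"],
    "Percent":  [40.0, 25.0, 10.0, 45.0, 30.0, 5.0],
})

# Rank drugs inside each age group; rank 1 is the most frequently named concern
toy["Rank"] = (
    toy.groupby("AgeGroup")["Percent"]
       .rank(ascending=False, method="min")
       .astype(int)
)

# One row per drug, one column per age group, for side-by-side comparison
rank_table = toy.pivot(index="Drug", columns="AgeGroup", values="Rank")
print(rank_table)
```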

In [None]:
data1 = calculate_percentages(data, '14-19', 'A3')
data2 = calculate_percentages(data, '20-29', 'A3')
data3 = calculate_percentages(data, '30-39', 'A3')
data4 = calculate_percentages(data, '40-49', 'A3')
data5 = calculate_percentages(data, '50-59', 'A3')
data6 = calculate_percentages(data, '60+', 'A3')

fig, axs = plt.subplots(6, 1, figsize=(15, 20))

ages = sorted([i for i in data.AgeGroup.unique() if i != "Missing"])
datas = [data1, data2, data3, data4, data5, data6]

for i, d in enumerate(datas):
    axs[i].barh(list(d.keys()), list(d.values()),color='#F05454')
    axs[i].set_title(f"Age group ({ages[i]}) : perception of 'The most serious concern for the general community'")
    axs[i].grid(axis='y', linestyle=':', alpha=0.2)
    
    # Find the maximum value in the current dataset
    max_val = max(d.values())
    for j, val in enumerate(d.values()):
        # If the value is less than 50% of the maximum value, color it differently
        if val < max_val * 0.5:
            axs[i].barh(list(d.keys())[j], val, color='#444444')
        # Annotate the bar chart with the value
        axs[i].annotate(f"{val}%", xy=(val, j), xytext=(5, 0),
                        textcoords='offset points', ha='left', va='center')

        
[axs[i].spines[s].set_visible(False) for i in range(6) for s in ['top','right','bottom']]
[axs[i].set_xticks([]) for i in range(6)]
plt.tight_layout()
plt.show()

<a id="section-5B"></a>
<h1>5B. Drug arrests per 100,000 population per year (Consumer and Provider)</h1>

<h3>Provider drug arrest</h3>

<p>In terms of overall arrest numbers, cannabis had the highest percentage of arrests at 41%, followed by amphetamine-type stimulants at 21%, other and unknown drugs at 9.1%, heroin and other opioids at 3.6%, cocaine at 1.6%, hallucinogens at 0.4%, and steroids at 0.1%.</p>

<p>Comparing 2006 with 2019, arrests for heroin and other opioids saw the largest decrease at 72.22%, followed by cannabis at 67.8%, cocaine at 62.5%, amphetamine-type stimulants at 48.1%, and other and unknown drugs at 29.67%. Hallucinogens were unchanged (0%), and steroids were the only category to increase, rising by 200%.</p>

<h3>Consumer drug arrest</h3>

<ul>
  <li>The most commonly reported drug type is cannabis, followed by amphetamine-type stimulants.</li>
  <li>The number of reported cases of drug use has increased over time for all drug types, with the exception of steroids.</li>
  <li>The number of reported cases of opioid use (heroin and other opioids) has remained relatively stable over time.</li>
  <li>The number of reported cases of cocaine and amphetamine-type stimulant use has increased over time.</li>
  <li>The number of reported cases of cannabis use has steadily increased over time, reaching a peak in 2015-2016.</li>
</ul>

<p>The highest consumer arrest rates in Australia in 2019 were for Cannabis and Amphetamine-type stimulants, at 137.9 and 86.3 arrests per 100,000 population respectively. The lowest was for Hallucinogens, at only 1.8 arrests per 100,000 population.</p>

<p>Looking at the percentage change from 2006 to 2019, the consumer arrest rate for Amphetamine-type stimulants increased by 62.22%, while the rate for Cannabis decreased by 41.67%. The rate for Cocaine increased by 321.05%, heroin and other opioids decreased by 14.49%, Steroids increased by 216.67% and Hallucinogens increased by 125%.</p>

<p>Overall, consumer arrest rates for some drugs, such as Amphetamine-type stimulants and Cocaine, have increased in Australia over the past 13 years, while rates for others, such as Cannabis and heroin, have decreased.</p>
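<p>The two calculations behind these figures, converting arrest counts to a rate per 100,000 people and taking the percentage change between two years, can be sketched as follows. The function names and all numbers here are hypothetical and chosen only to make the arithmetic easy to follow.</p>

```python
def rate_per_100k(count, population):
    """Arrests per 100,000 people, rounded to one decimal place."""
    return round(count / population * 100_000, 1)


def percent_change(old, new):
    """Relative change from old to new, in percent, rounded to two decimals."""
    return round((new - old) / old * 100, 2)


# Hypothetical counts and populations, chosen only to illustrate the arithmetic
rate_2006 = rate_per_100k(10_000, 20_000_000)
rate_2019 = rate_per_100k(20_000, 25_000_000)

print(rate_2006, rate_2019, percent_change(rate_2006, rate_2019))
```

<p>In this toy case the raw count doubles, but the rate rises by less because the population also grew, which is why the comparison is made on per-capita rates.</p>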

<h3>Overall conclusion on what is the most dangerous drug</h3>

<p>Comparing the two plots, it can be inferred that the drug types that are commonly consumed by individuals tend to have a higher number of consumer arrests as compared to provider arrests. This suggests that a majority of the arrests in the consumer dataset are likely to be of individuals who possess drugs for personal use, while a majority of the arrests in the provider dataset are likely to be of individuals who possess drugs for distribution or sale. Furthermore, the percentage change in arrests between the two datasets also differs, with some drug types having a higher increase in consumer arrests and a decrease in provider arrests, and vice versa. This may suggest that the law enforcement agencies are focusing more on certain types of drugs or certain types of individuals.</p>

<p>Amphetamine-type stimulants have some of the highest arrest rates for both consumers and providers, at 86.3 and 10.9 arrests per 100,000 population respectively. Between 2006 and 2019 the consumer rate rose by 62.22%, while the provider rate fell by 48.1%. Based on the arrest rates and the rise in consumer arrests, this suggests that Amphetamine-type stimulants are among the most dangerous drugs for the community. However, it's worth noting that this conclusion is based on the limited data provided and may not reflect the overall situation in the community.</p>

In [None]:
# Get Australia's population data to convert drug arrest per 100,000 people
url = "https://www.macrotrends.net/countries/AUS/australia/population"
population_data = pd.read_html(url)[1] # Get the second table on the webpage

# Rename the columns
population_data.columns = ['Year','Population','Growth Rate']

# Keep only the years between 2006 and 2020 (copy so later edits do not touch a view)
population_data = population_data[(population_data['Year'] <= 2020) & (population_data['Year'] >= 2006)].copy()

def add_populations(df):
    """
    This function adds every two rows of population data to get the total population for each year.
    """
    for i in range(len(df)-1, 0, -2):
        df.loc[df.index[i-1],'Population'] += df.loc[df.index[i],'Population']
    return df

# Add the population data for each two rows
population_data = add_populations(population_data)

# Drop the row labelled 3 and the Growth Rate column
population_data.drop(index=3,columns=['Growth Rate'],inplace=True)

# Change the year column in population data to string so it matches the arrest year labels
population_data['Year'] = population_data['Year'].astype('str')

def arrest_rates(kind):
    """
    Return 2019 arrest rates per 100,000 people for `kind` ('Consumer' or 'Provider'),
    with the percentage change since 2006 in a 'per_change' column.
    """
    # Filter to one arrest category and drop the now-constant column
    arrests = data_b[data_b['Consumer/Provider'] == kind].drop(columns=['Consumer/Provider'])

    # Reshape the dataframe to have one column for year and one column for value
    arrests = arrests.melt(id_vars=['Drug type'], var_name='Year', value_name='Value')

    # Keep the starting year of each financial-year label (the text before the en dash)
    arrests['Year'] = arrests['Year'].str.split("–").str[0]

    # Merge with the population data and convert counts to a rate per 100,000 people
    arrests = arrests.merge(population_data, on='Year')
    arrests['Value'] = round(arrests['Value'] / arrests['Population'] * 100000, 1)

    start = arrests[arrests['Year'] == '2006']
    end = arrests[arrests['Year'] == '2019'].copy()
    # Rows are compared by position, so both years must list the drug types in the same order
    end['per_change'] = ((end['Value'].values - start['Value'].values) / start['Value'].values * 100).round(2)
    return end

# Build both tables so the plotting cell below can use `con` and `pro`
con = arrest_rates('Consumer')
pro = arrest_rates('Provider')

In [None]:
con = con.sort_values('Value',ascending=True)
pro = pro.sort_values('Value',ascending=True)

# color for overall plot
colors = list(reversed(['#F05454','#F28C8C','#F5B5B5','#F8DFDF','#FBFBFB','#FBFBFB','#FBFBFB']))

fig, (ax1, ax2) = plt.subplots(2,1, figsize=(15,10))

ax1.barh(pro["Drug type"], pro["Value"],color=colors,edgecolor='#171717',linewidth=0.5)
ax1.set_title("Provider arrests by drug type, 2019 (labels show percentage change since 2006)")
ax1.set_xlabel("Arrests per 100,000 people")
ax1.grid(axis='y', linestyle=':', alpha=0.2)
for i, (value, drug_type, pct_change) in enumerate(zip(pro["Value"], pro["Drug type"], pro["per_change"])):
    ax1.text(value + 0.075, drug_type, f'{pct_change}%', ha='left', va='center')
[ax1.spines[s].set_visible(False) for s in ['top','right']]


ax2.barh(con["Drug type"], con["Value"],color=colors,edgecolor='#171717',linewidth=0.5)
ax2.set_title("Consumer arrests by drug type, 2019 (labels show percentage change since 2006)")
ax2.set_xlabel("Arrests per 100,000 people")
ax2.grid(axis='y', linestyle=':', alpha=0.2)
for i, (value, drug_type, pct_change) in enumerate(zip(con["Value"], con["Drug type"], con["per_change"])):
    ax2.text(value + 1, drug_type, f'{pct_change}%', ha='left', va='center')
[ax2.spines[s].set_visible(False) for s in ['top','right']]

plt.tight_layout()
plt.show()


<a id="section-6"></a>
<h1>6. How self-rated health differs by use of "illicit drugs"</h1>

<ul>
  <li>Ex users have the lowest average rating of their health with a mean of 7.3 and a median of 7.5, while never used have the highest with a mean of 8.4 and a median of 8.5.</li>
  <li>Ex users also have the highest percentage of poor ratings at 14%, compared to never used at 5% and used in the last 12 months at 9%.</li>
  <li>On the other hand, never used have the highest percentage of very good ratings at 34%, compared to ex users at 24% and used in the last 12 months at 20%.</li>
  <li>The most common rating across all groups is "good" with ex users having 31%, never used having 34% and used in the last 12 months having 25%.</li>
  <li>The distribution of the rating also varies between the three groups. Ex users and never used have similar distribution with a peak at "good" and "very good" ratings, while used in the last 12 months have a peak at "good" rating.</li>
</ul>

<p>Individuals who reported being an "Ex user" rate their health as "Fair" or "Poor" more often than those who have never used drugs or who used them in the last 12 months. Conversely, individuals who have never used drugs or who used them in the last 12 months rate their health as "Very Good" or "Excellent" more often than ex users. This suggests that there may be a correlation between past drug use and self-perceived health. It is important to note that this data is not conclusive and further research is needed to establish a causal relationship.</p>
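<p>The comparison of "Fair" or "Poor" ratings between groups can be sketched as a within-group percentage followed by a sum over the low ratings. The respondents below are hypothetical; only the column names <code>Anyillicit</code> and <code>B1</code> are taken from the survey data.</p>

```python
import pandas as pd

# Hypothetical respondents; the notebook's real columns are 'Anyillicit' and 'B1'
toy = pd.DataFrame({
    "Anyillicit": ["Ex user"] * 4 + ["Never used"] * 5,
    "B1": ["Good", "Fair", "Poor", "Good",
           "Excellent", "Very good", "Good", "Good", "Fair"],
})

# Count respondents per (group, rating)
counts = toy.groupby(["Anyillicit", "B1"]).size().rename("n").reset_index()

# transform returns one value per input row, so the result aligns with counts
counts["pct"] = counts.groupby("Anyillicit")["n"].transform(
    lambda s: (s / s.sum() * 100).round(2)
)

# Share of each group rating its health "Fair" or "Poor"
low = counts[counts["B1"].isin(["Fair", "Poor"])].groupby("Anyillicit")["pct"].sum()
print(low)
```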

In [None]:
df=data.groupby(['Anyillicit','B1'])['B1'].count()
df = pd.DataFrame(df)
df.columns=['Rating']
df=df.reset_index()
# convert counts to percentages within each group; transform keeps the row alignment
df["Rating"] = df.groupby("Anyillicit")["Rating"].transform(lambda x: round((x / x.sum())*100,2))

data1 = df[df['Anyillicit']=='Ex user'].sort_values('Rating',ascending=True)
data2 = df[df['Anyillicit']=='Never used'].sort_values('Rating',ascending=True)
data3 = df[df['Anyillicit']=='Used in the last 12 months'].sort_values('Rating',ascending=True)

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15,5))

ax1.barh(data1['B1'], data1['Rating'],color='#EEEEEE',edgecolor='#171717',linewidth=0.5)
ax1.set_title('Ex User')
ax1.grid(axis='y', linestyle=':', alpha=0.2)
[ax1.spines[s].set_visible(False) for s in ['top','right','bottom']]
ax1.set_xticks([])

ax2.barh(data2['B1'], data2['Rating'],color='#EEEEEE',edgecolor='#171717',linewidth=0.5)
ax2.set_title('Never Used')
ax2.grid(axis='y', linestyle=':', alpha=0.2)
[ax2.spines[s].set_visible(False) for s in ['top','right','bottom']]
ax2.set_xticks([])

ax3.barh(data3['B1'], data3['Rating'],color='#EEEEEE',edgecolor='#171717',linewidth=0.5)
ax3.set_title('Used in the last 12 months')
ax3.grid(axis='y', linestyle=':', alpha=0.2)
[ax3.spines[s].set_visible(False) for s in ['top','right','bottom']]
ax3.set_xticks([])

for i in range(len(data1)):
    ax1.annotate(f"{data1['Rating'].iloc[i]}%", xy=(data1['Rating'].iloc[i], i), xytext=(5, 0), textcoords="offset points", ha='left', va='center')
for i in range(len(data2)):
    ax2.annotate(f"{data2['Rating'].iloc[i]}%", xy=(data2['Rating'].iloc[i], i), xytext=(5, 0), textcoords="offset points", ha='left', va='center')
for i in range(len(data3)):
    ax3.annotate(f"{data3['Rating'].iloc[i]}%", xy=(data3['Rating'].iloc[i], i), xytext=(5, 0), textcoords="offset points", ha='left', va='center')
    
plt.tight_layout()
plt.show()

<a id="section-7"></a>
<h1>7. Impact of alcohol and illicit drug use on the burden of disease and injury in Australia</h1>

<p>Alcohol and illicit drug use can have a significant impact on the burden of disease and injury in Australia. Alcohol is a leading risk factor for death, disability, and illness in Australia, contributing to a range of health problems including liver disease, cancer, and mental health disorders. Illicit drug use can also lead to a range of health problems, including addiction, overdose, and infectious diseases such as HIV and hepatitis. In addition, alcohol and illicit drug use can also lead to a range of social and economic problems, including crime, violence, and lost productivity. Overall, the impact of alcohol and illicit drug use on the burden of disease and injury in Australia is significant and multifaceted.</p>

<h5>The top three leading causes of burden of disease and injury in Australia due to alcohol and illicit drug use in 2021 are:</h5>
<ul>
    <li>accidental poisoning at 33.72%,</li>
    <li>drug use disorders (excluding alcohol) at 31.63%, and</li>
    <li>suicide and self-inflicted injuries at 13.81%.</li>
    <li>The next highest causes are chronic liver disease at 10.24% and liver cancer at 7.28%.</li>
</ul>
<p>Other causes listed in the data include anxiety disorders at 0.55%, depressive disorders at 0.49%, and schizophrenia at 0.73%. Hepatitis B (acute) and Hepatitis C (acute) were the least reported causes at 0.04% and 0.05% respectively.</p>

<p>It can be inferred that drug use disorders and accidental poisoning are the leading causes of burden of disease and injury in Australia due to alcohol and illicit drug use. Additionally, there is a significant burden of disease due to chronic liver disease. The data also suggests that there is a relatively lower burden of disease and injury caused by mental health conditions such as anxiety, depressive disorders and schizophrenia when compared to the leading factors.</p>

In [None]:
linked_diseases = data_a[['Linked disease','Total number (2021)','Gender']]
# copy each subset so the percentage columns added below do not write to a view
male = linked_diseases[linked_diseases['Gender']=='Male'].copy()
female = linked_diseases[linked_diseases['Gender']=='Female'].copy()

# male
male['Overall percentage']=round(male['Total number (2021)']/sum(linked_diseases['Total number (2021)'])*100,2)
male['Male percentage only']=round(male['Total number (2021)']/sum(male['Total number (2021)'])*100,2)
# female
female['Overall percentage']=round(female['Total number (2021)']/sum(linked_diseases['Total number (2021)'])*100,2)
female['Female percentage only']=round(female['Total number (2021)']/sum(female['Total number (2021)'])*100,2)

overall = round(linked_diseases.groupby('Linked disease')['Total number (2021)'].sum()).reset_index().sort_values('Total number (2021)')
overall['Total number (2021)'] = round(overall['Total number (2021)'] / overall['Total number (2021)'].sum()*100,2)

# one colour per disease: the five largest bars are shaded, the rest are near-white
colors = list(reversed(['#F05454','#F28C8C','#F5B5B5','#F8DFDF','#F8DFDF'] + ["#FBFBFB" for i in range(len(overall)-5)]))

fig, ax = plt.subplots(figsize=(15,5))
plt.barh(overall['Linked disease'], overall['Total number (2021)'], color=colors)
plt.title("Overall disease and injury in Australia caused by alcohol and illicit drugs")

for i in range(len(overall)):
    ax.annotate(f"{overall['Total number (2021)'].iloc[i]}%", xy=(overall['Total number (2021)'].iloc[i], i), xytext=(5, 0), textcoords="offset points", ha='left', va='center')

ax.grid(axis='y', linestyle=':', alpha=0.2)
[ax.spines[s].set_visible(False) for s in ['top','right','bottom']]
ax.set_xticks([])    
plt.show()

<a id="section-7A"></a>
<h1>7A. Impact of alcohol and illicit drug use on the burden of disease and injury in Australia by gender (divided by total sum of both male and female)</h1>

<h5>Differences between the two genders show:</h5>
<ul>
    <li>The percentages for drug use disorders, suicide and self-inflicted injuries, chronic liver disease, and liver cancer are higher for males than for females.</li>
    <li>The percentages for HIV/AIDS, Hepatitis B (acute), Hepatitis C (acute), Road traffic injuries (motorcyclists and vehicle occupants), anxiety disorders, depressive disorders and schizophrenia are higher for males than for females.</li>
</ul>

<p>It can be inferred that males are more prone to drug use disorders, suicide and self-inflicted injuries, chronic liver disease, and liver cancer than females. Additionally, males are more prone to HIV/AIDS, Hepatitis B (acute), Hepatitis C (acute), Road traffic injuries (motorcyclists and vehicle occupants), anxiety disorders, depressive disorders, and schizophrenia than females.</p>

In [None]:
fig, ax = plt.subplots(2, 1, figsize=(15,10))

male=male.sort_values('Overall percentage',ascending=True)
plt.subplot(2, 1, 1)
plt.barh(male['Linked disease'], male['Overall percentage'], color=colors)
plt.title("Overall disease and injury in Australia caused by alcohol and illicit drugs (Males)")
for i in range(len(male)):
    ax[0].annotate(f"{male['Overall percentage'].iloc[i]}%", xy=(male['Overall percentage'].iloc[i], i), xytext=(5, 0), textcoords="offset points", ha='left', va='center')
ax[0].grid(axis='y', linestyle=':', alpha=0.2)
[ax[0].spines[s].set_visible(False) for s in ['top','right','bottom']]
ax[0].set_xticks([]) 

female=female.sort_values('Overall percentage',ascending=True)
plt.subplot(2, 1, 2)
plt.barh(female['Linked disease'], female['Overall percentage'], color=colors)
plt.title("Overall disease and injury in Australia caused by alcohol and illicit drugs (Females)")
for i in range(len(female)):
    ax[1].annotate(f"{female['Overall percentage'].iloc[i]}%", xy=(female['Overall percentage'].iloc[i], i), xytext=(5, 0), textcoords="offset points", ha='left', va='center')
ax[1].grid(axis='y', linestyle=':', alpha=0.2)
[ax[1].spines[s].set_visible(False) for s in ['top','right','bottom']]
ax[1].set_xticks([])    
plt.show()


<a id="section-7B"></a>
<h1>7B. Impact of alcohol and illicit drug use on the burden of disease and injury in Australia by gender (divided by total sum per gender)</h1>

<h5>Some similarities between the two genders include:</h5>
  <ul>
    <li>Both show a relatively low percentage of individuals affected by HIV/AIDS.</li>
    <li>Both show a relatively high percentage of individuals affected by accidental poisoning.</li>
    <li>Both show a relatively high percentage of individuals affected by drug use disorders (excluding alcohol).</li>
  </ul>
<h5>Some differences between the two genders include:</h5>
  <ul>
    <li>The percentage of males affected by suicide and self-inflicted injuries is higher than the percentage of females affected.</li>
    <li>The percentage of females affected by depressive and anxiety disorders is higher than the percentage of males affected.</li>
    <li>The percentage of males affected by liver cancer and chronic liver disease is higher than the percentage of females affected.</li>
    <li>The percentage of females affected by Hepatitis B and C is higher than the percentage of males affected.</li>
</ul>   
<p>Male individuals are most affected by accidental poisoning, at 33.79%, followed by drug use disorders (excluding alcohol) at 29.39%. Suicide and self-inflicted injuries also affect a relatively high percentage of males at 16.64%. Chronic liver disease (9.52%) and liver cancer (7.11%) follow, while schizophrenia accounts for 0.85%.</p>
    
<p>Females on the other hand have a relatively high percentage of individuals affected by depressive and anxiety disorders, as well as Hepatitis B and C. Additionally, the percentage of females affected by accidental poisoning is relatively high. The percentage of individuals affected by suicide and self-inflicted injuries, liver cancer, and chronic liver disease is also relatively high. However, the percentage of individuals affected by HIV/AIDS and drug use disorders (excluding alcohol) is relatively low.</p>
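<p>Sections 7A and 7B differ only in the denominator: 7A divides each figure by the combined male and female total, while 7B divides by the total for that gender alone. The sketch below shows both on hypothetical counts (not the AIHW figures), with illustrative column names.</p>

```python
import pandas as pd

# Hypothetical burden counts (not the AIHW figures)
toy = pd.DataFrame({
    "Linked disease": ["Accidental poisoning", "Liver cancer"] * 2,
    "Gender": ["Male", "Male", "Female", "Female"],
    "Total": [600, 200, 150, 50],
})

# 7A-style: each row as a share of the combined male + female total
toy["Overall %"] = (toy["Total"] / toy["Total"].sum() * 100).round(2)

# 7B-style: each row as a share of its own gender's total
toy["Within-gender %"] = (
    toy["Total"] / toy.groupby("Gender")["Total"].transform("sum") * 100
).round(2)

print(toy)
```

<p>In this toy case the overall shares are larger for males simply because the male total is larger, while the within-gender shares are identical. The two views answer different questions and should be read separately.</p>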

In [None]:
fig, ax = plt.subplots(2, 1, figsize=(15,10))

male=male.sort_values('Male percentage only',ascending=True)
plt.subplot(2, 1, 1)
plt.barh(male['Linked disease'], male['Male percentage only'], color=colors)
plt.title("Male percentage only : disease and injury in Australia caused by alcohol and illicit drugs")
for i in range(len(male)):
    ax[0].annotate(f"{male['Male percentage only'].iloc[i]}%", xy=(male['Male percentage only'].iloc[i], i), xytext=(5, 0), textcoords="offset points", ha='left', va='center')
ax[0].grid(axis='y', linestyle=':', alpha=0.2)
[ax[0].spines[s].set_visible(False) for s in ['top','right','bottom']]
ax[0].set_xticks([]) 

female=female.sort_values('Female percentage only',ascending=True)
plt.subplot(2, 1, 2)
plt.barh(female['Linked disease'], female['Female percentage only'], color=colors)
plt.title("Female percentage only : disease and injury in Australia caused by alcohol and illicit drugs")
for i in range(len(female)):
    ax[1].annotate(f"{female['Female percentage only'].iloc[i]}%", xy=(female['Female percentage only'].iloc[i], i), xytext=(5, 0), textcoords="offset points", ha='left', va='center')
ax[1].grid(axis='y', linestyle=':', alpha=0.2)
[ax[1].spines[s].set_visible(False) for s in ['top','right','bottom']]
ax[1].set_xticks([])    
plt.show()

<a id="section-8"></a>
<h1>8. Final thoughts</h1>

<p>The data gathered from the National Drugs Strategy Household Survey and the Australian Institute of Health and Welfare suggests that there is a general consensus among both males and females in Australia about the seriousness of certain drugs, such as Methamphetamine and Heroin, as the most common drugs that come to mind when thinking about "a drug problem." However, there are some differences in the specific drugs that are thought of, with higher percentages of females than males reporting Ecstasy and Pain-killers as the first drug that comes to mind.</p>
    
<p>The data also shows that there are key differences and similarities between the age groups in terms of what drugs they believe cause the most deaths, with Alcohol being considered the highest across all age groups and Methamphetamine being considered the third highest across all age groups. The data also suggests that there is an upward trend in the rate of drug-induced deaths in Australia, with Opioids and Depressants being significant contributors. When comparing the age-standardized rates of alcohol-induced deaths and alcohol-related deaths, the rate of alcohol-induced deaths is consistently lower than the rate of alcohol-related deaths, suggesting that while alcohol may be a contributing factor in many deaths, it is not always the direct cause. Additionally, there are more consumer arrests than provider arrests, which may suggest that law enforcement agencies are focusing more on certain types of drugs or certain types of individuals.</p>

<p>The leading causes of burden of disease and injury in Australia due to alcohol and illicit drug use in 2021 are accidental poisoning, drug use disorders, and suicide and self-inflicted injuries. Additionally, there is a significant burden of disease due to chronic liver disease and liver cancer. The data also suggests that there is a relatively lower burden of disease and injury caused by mental health conditions such as anxiety, depressive disorders, and schizophrenia. The data also shows that males are more prone to drug use disorders, suicide and self-inflicted injuries, chronic liver disease, and liver cancer than females. Additionally, males are more prone to HIV/AIDS, Hepatitis B (acute), Hepatitis C (acute), Road traffic injuries (motorcyclists and vehicle occupants), anxiety disorders, depressive disorders, and schizophrenia than females. It is important to address the issue of alcohol and illicit drug use as it has a significant impact on the burden of disease and injury in Australia.</p>

<h5>Created by:</h5>
<h6>Brandon Smith 28/01/23</h6>