# Case study for Drug-related deaths in Scotland Services

This case study is based on the quarterly report of drug-related deaths in Scotland services. It showcases an application built on the DataQAHelper framework that has completed the refinement cycle. Users only need to set the correct dataset paths and specify the necessary column names, and the application then generates a data report automatically. The report answers most of the key data-related questions.
You can download DataQAHelper from GitHub to run the case study.
If you are working in Colab, the commands below run the case study. Note the folder name: this example uses DataQAHelperWithoutLLM.


In [1]:
from google.colab import drive
drive.mount('/content/drive')
!pip install -r /content/drive/MyDrive/ColabNotebooks/DataQAHelperWithoutLLM/requirements.txt
import sys
sys.path.append('/content/drive/MyDrive/ColabNotebooks/DataQAHelperWithoutLLM/')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
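
If the folder name or location differs from the one used here, the import of `IntegratedPipelines` in a later cell fails with a fairly unhelpful error. The sketch below is one way to check the folder before adding it to `sys.path`. The helper name `add_framework_to_path` and its `required` default are assumptions made for illustration and are not part of DataQAHelper; the two file names are taken from the folder listing shown in this case study.

```python
import sys
from pathlib import Path


def add_framework_to_path(folder, required=("IntegratedPipelines.py", "requirements.txt")):
    """Append `folder` to sys.path after checking it looks like the framework folder.

    Hypothetical helper, not part of DataQAHelper.
    """
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"Folder not found: {root}")
    # Report every expected file that is absent, not only the first one.
    missing = [name for name in required if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{root} is missing: {', '.join(missing)}")
    path = str(root)
    if path not in sys.path:  # avoid duplicate entries when the cell is re-run
        sys.path.append(path)
    return path


# Example call in Colab (path as used in this case study):
# add_framework_to_path('/content/drive/MyDrive/ColabNotebooks/DataQAHelperWithoutLLM/')
```

Raising early keeps the failure next to its cause, namely a wrong path, instead of surfacing later as a `ModuleNotFoundError`.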


In [2]:
# Colab's preinstalled packages have changed, so some packages must be reinstalled at the versions below.
# pip may report dependency conflicts, but they do not affect the use of the framework in Colab.
# The issue does not occur when the framework is downloaded and run locally.
!pip install pandas==2.0.0
!pip install scipy==1.8.0

Collecting pandas==2.0.0
  Using cached pandas-2.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (18 kB)
Using cached pandas-2.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.3 MB)
Installing collected packages: pandas
  Attempting uninstall: pandas
    Found existing installation: pandas 1.5.3
    Uninstalling pandas-1.5.3:
      Successfully uninstalled pandas-1.5.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
arviz 0.18.0 requires scipy>=1.9.0, but you have scipy 1.8.0 which is incompatible.
google-colab 1.0.0 requires pandas==2.1.4, but you have pandas 2.0.0 which is incompatible.
plotnine 0.12.4 requires statsmodels>=0.14.0, but you have statsmodels 0.13.5 which is incompatible.
pycaret 3.0.0 requires pandas<1.6.0,>=1.3.0, but you have pandas 2.0.0 which is incompatible.
Successfully installed

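Because the pip output mixes conflict warnings with the success message, it can be hard to tell which versions ended up installed. A minimal sketch for confirming the pins from the cell above, using the standard-library `importlib.metadata`, is shown here. The names `EXPECTED` and `check_versions` are illustrative assumptions, not part of the framework.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the install cell of this case study.
EXPECTED = {"pandas": "2.0.0", "scipy": "1.8.0"}


def check_versions(expected, get_version=version):
    """Return {package: (wanted, found)} for every package that does not match.

    `found` is None when the package is not installed at all.
    Hypothetical helper, not part of DataQAHelper.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

This reads the installed package metadata, so it reports what pip installed rather than what the running session has already imported.
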
In [3]:
!ls "/content/drive/MyDrive/ColabNotebooks/DataQAHelperWithoutLLM"

ACCCPdata  DataScienceComponents.py  NLGComponents.py  requirements.txt
data	   IntegratedPipelines.py    __pycache__       templates


In [4]:
## Select the dataset that can answer the questions.
## Then, choose the required column names.

from pandas import read_csv
import IntegratedPipelines as IP

data1 = read_csv("/content/drive/MyDrive/ColabNotebooks/DataQAHelperWithoutLLM/data/1998to2018drugdeathsexagetype.csv", header=0)
col_names = ['year', 'drug-related deaths', 'males', 'females', 'Deaths under age 14',
             'Deaths between the ages of 15 and 24', 'Deaths between the ages of 25 and 34',
             'Deaths between the ages of 35 and 44', 'Deaths between the ages of 45 and 54',
             'Deaths between the ages of 55 and 64', 'Deaths ages over 65', 'average age of death',
             'dead by Heroin/morphine 2', 'dead by Methadone', 'dead by Heroin/morphine, Methadone or Bupren-orphine',
             'dead by Codeine or a codeine-containing compound',
             'dead by Dihydro-codeine or a d.h.c-containing compound', 'dead by any opiate or opioid', ]
data2 = read_csv("/content/drive/MyDrive/ColabNotebooks/DataQAHelperWithoutLLM/data/drugdeathsexagetype.csv", header=None, names=col_names)
col_names = ['Year', 'all drug-related deaths', 'more than one drug was found', 'only one drug was found',
             'more than one drug was found in %', 'more than one drug was found to be present in the body',
             'accidental poisonings']
data3 = read_csv("/content/drive/MyDrive/ColabNotebooks/DataQAHelperWithoutLLM/data/onedrug.csv", header=None, names=col_names)
col_names = ['Year', 'death by ‘street’ benzodiazepines (such as etizolam)',
             'death by methadone', 'death by heroin/morphine',
             'death by gabapentin and/or pregabalin',
             'death by cocaine', 'death by opiates/opioids (such as heroin/morphine and methadone)',
             'death by benzodiazepines (such as diazepam and etizolam)']
data4 = read_csv("/content/drive/MyDrive/ColabNotebooks/DataQAHelperWithoutLLM/data/drugsubstancesimplicated.csv", header=None, names=col_names)
num_breaks = 4
choose_year = 2020
Xcol = 'year'
ycol = 'drug-related deaths'
Xcolname = "Year"
ycolname1 = "accidental poisonings"
ycolname2 = "all drug-related deaths"
ycolname3 = "more than one drug was found to be present in the body"
ycolnames = ["death by ‘street’ benzodiazepines (such as etizolam)",
             "death by methadone", "death by heroin/morphine",
             "death by gabapentin and/or pregabalin",
             "death by cocaine", "death by opiates/opioids (such as heroin/morphine and methadone)",
             "death by benzodiazepines (such as diazepam and etizolam)"]
age_groups = ['Deaths under age 14',
             'Deaths between the ages of 15 and 24', 'Deaths between the ages of 25 and 34',
             'Deaths between the ages of 35 and 44', 'Deaths between the ages of 45 and 54',
             'Deaths between the ages of 55 and 64', 'Deaths ages over 65']
y1name = "males"
y2name = "females"
category_name = " number of deaths where one or more of the following substances "
## Select the location of the report template
template_path = '/content/drive/MyDrive/ColabNotebooks/DataQAHelperWithoutLLM/templates/drug-related-deaths-tem.docx'
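
The pipeline receives column names as plain strings, so a typo in one of the variables above only shows up once the pipeline is running. The following sketch checks the names before the run. The helpers `missing_columns` and `require_columns` are hypothetical, and the pairing of each dataset with its column variables in the usage comments is an assumption inferred from the `col_names` lists, not from the framework's documentation.

```python
def missing_columns(available, required):
    """Return the names in `required` that are absent from `available`, in order."""
    present = set(available)
    return [name for name in required if name not in present]


def require_columns(available, required, label="dataset"):
    """Raise KeyError listing every required column that `available` lacks."""
    missing = missing_columns(available, required)
    if missing:
        raise KeyError(f"{label} is missing columns: {missing}")


# Assumed pairing of datasets and column variables (not confirmed by the framework):
# require_columns(data1.columns, [Xcol, ycol], "data1")
# require_columns(data2.columns, age_groups + [y1name, y2name], "data2")
# require_columns(data3.columns, [Xcolname, ycolname1, ycolname2, ycolname3], "data3")
# require_columns(data4.columns, [Xcolname] + ycolnames, "data4")
```

For `data2` to `data4` the column names are assigned from `col_names`, so the check mainly catches mismatches between those lists and the variables passed to the pipeline. For `data1`, which reads its header from the file, it also catches a CSV whose header differs from `Xcol` and `ycol`.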

In [5]:
# Run the pipeline
pipeline = IP.casestudy_datastory_pipeline()
pipeline.DRD_mainquestions(
    data1, data2, data3, data4,
    Xcol, ycol, num_breaks, choose_year,
    age_groups, y1name, y2name,
    Xcolname, ycolname1, ycolname2, ycolname3,
    category_name, ycolnames, template_path,
)

In general, drug-related deaths have risen since 1996, and the rate has been particularly high since 2013.
Replaced {{Q1}} with In general, drug-related deaths have risen since 1996, and the rate has been particularly high since 2013. in paragraph.
Replaced {{Q2}} with In 2000, males were more than 4 times as likely to have a drug-related death as females.
Overall, this gap has closed over the year. In 2020, males were 2.7 times as likely to have a drug-related death as females. in paragraph.
Replaced {{Q3}} with In 2020, 31.29% of all drug-related deaths were of people aged between the ages of 45 and 54, and followed by 31.22% between between the ages of 35 and 44. in paragraph.
Replaced {{Q4}} with In 2020, accidental poisonings accounted for 93.0% of all drug-related deaths. in paragraph.
Replaced {{Q5}} with In 2020, more than one drug was found to be present in the body accounted for 93.0% of all drug-related deaths. in paragraph.
Replaced {{Q6}} with In recent Year, there has bee