# **DASS 42**

This notebook covers data preparation.

The dataset was downloaded from the raw data page of the Open-Source Psychometrics Project at Openpsychometrics.org, a website that provides general public education about psychology and collects data for psychological research.


The folder contains 3 files:

*   demo1.png: An example picture of the online version of the Depression Anxiety Stress Scales (DASS).
*   codebook.txt: An explanation of the dataset.
*   data.csv: The dataset file.

DASS 42 is a 42-item self-report questionnaire designed to measure three related negative emotional states: depression, anxiety, and stress.

Each category has 14 items.

The score for each category is the sum of the scores of its 14 items, which can range from 0 to 42.
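
As a minimal sketch of this scoring rule, assuming the answers are already on the 0 to 3 scale, the snippet below sums 14 item scores for one category. The `item_scores` values are illustrative and the variable names are introduced here only for the example.

```python
# Illustrative answers to the 14 items of one category, on the 0-3 scale.
item_scores = [1, 3, 0, 3, 3, 2, 0, 3, 3, 3, 2, 0, 1, 3]

# The category score is the plain sum of its item scores.
category_score = sum(item_scores)

# With 14 items scored 0-3, the highest possible score is 14 * 3.
max_score = len(item_scores) * 3

print(category_score, max_score)
```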

## Import All Dependencies

In [1]:
import numpy as np
import pandas as pd
from google.colab import files

## Download Dataset

Download the dataset from Openpsychometrics.org and unzip the archive.

In [2]:
!wget https://openpsychometrics.org/_rawdata/DASS_data_21.02.19.zip
!unzip DASS_data_21.02.19.zip

--2022-05-29 16:44:57--  https://openpsychometrics.org/_rawdata/DASS_data_21.02.19.zip
Resolving openpsychometrics.org (openpsychometrics.org)... 45.79.49.240
Connecting to openpsychometrics.org (openpsychometrics.org)|45.79.49.240|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7129804 (6.8M) [application/zip]
Saving to: ‘DASS_data_21.02.19.zip’


2022-05-29 16:44:57 (46.3 MB/s) - ‘DASS_data_21.02.19.zip’ saved [7129804/7129804]

Archive:  DASS_data_21.02.19.zip
   creating: DASS_data_21.02.19/
  inflating: DASS_data_21.02.19/codebook.txt  
  inflating: DASS_data_21.02.19/data.csv  
  inflating: DASS_data_21.02.19/demo1.png  


## Load Dataset

Load the dataset file that will be used: `DASS_data_21.02.19/data.csv`. The file is tab-separated, so the delimiter is set to `'\t'`.

In [3]:
data = pd.read_csv('DASS_data_21.02.19/data.csv', delimiter='\t')
data.head()

Unnamed: 0,Q1A,Q1I,Q1E,Q2A,Q2I,Q2E,Q3A,Q3I,Q3E,Q4A,...,screensize,uniquenetworklocation,hand,religion,orientation,race,voted,married,familysize,major
0,4,28,3890,4,25,2122,2,16,1944,4,...,1,1,1,12,1,10,2,1,2,
1,4,2,8118,1,36,2890,2,35,4777,3,...,2,1,2,7,0,70,2,1,4,
2,3,7,5784,1,33,4373,4,41,3242,1,...,2,1,1,4,3,60,1,1,3,
3,2,23,5081,3,11,6837,2,37,5521,1,...,2,1,2,4,5,70,2,1,5,biology
4,2,36,3215,2,13,7731,3,5,4156,4,...,2,2,3,10,1,10,2,1,4,Psychology
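
The effect of the tab delimiter can be tried on a small in-memory sample. This is only a sketch: the `sample` string is invented to mimic the layout of `data.csv`, and `sep` is the standard `read_csv` parameter for which `delimiter` is an alias.

```python
import io

import pandas as pd

# Invented two-row sample in the same tab-separated layout as data.csv.
sample = "Q1A\tQ1I\tQ1E\n4\t28\t3890\n4\t2\t8118\n"

# Without sep='\t' the whole line would be read as a single column.
toy = pd.read_csv(io.StringIO(sample), sep='\t')

print(toy.shape)
print(list(toy.columns))
```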


## Filter Dataset

Keep only the DASS answer columns, which are named `QnA` (where `n` is the question number).

In [4]:
# Keep only the DASS 42 answer columns (Q1A ... Q42A)
dass = data.copy()
dass = dass.filter(regex=r'Q\d{1,2}A')
dass.head()

Unnamed: 0,Q1A,Q2A,Q3A,Q4A,Q5A,Q6A,Q7A,Q8A,Q9A,Q10A,...,Q33A,Q34A,Q35A,Q36A,Q37A,Q38A,Q39A,Q40A,Q41A,Q42A
0,4,4,2,4,4,4,4,4,2,1,...,2,3,4,4,1,2,4,3,4,4
1,4,1,2,3,4,4,3,4,3,2,...,3,2,2,3,4,2,2,1,2,2
2,3,1,4,1,4,3,1,3,2,4,...,1,4,3,4,4,4,2,2,1,4
3,2,3,2,1,3,3,4,2,3,3,...,2,4,1,1,2,1,3,4,4,2
4,2,2,3,4,4,2,4,4,4,3,...,4,4,3,4,3,3,3,4,4,3
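
The behaviour of the regex filter can be checked on a small frame. The sketch below uses an invented one-row frame (`toy` is a hypothetical name) to show that the pattern keeps the `QnA` columns and drops the other per-question columns as well as unrelated ones.

```python
import pandas as pd

# Toy frame with invented values, mimicking the column naming of data.csv.
toy = pd.DataFrame({
    'Q1A': [4], 'Q1I': [28], 'Q1E': [3890],
    'Q12A': [2], 'Q12I': [5], 'Q12E': [1200],
    'country': ['ID'],
})

# Keep only columns whose name contains 'Q', one or two digits, then 'A'.
answers = toy.filter(regex=r'Q\d{1,2}A')

print(list(answers.columns))
```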


Because DASS 42 uses a scoring scale of 0 to 3 and the raw answers are coded 1 to 4, subtract 1 from each score.

In [5]:
# DASS items are scored 0-3, so subtract 1 from every answer
dass = dass.subtract(1)
dass.head()

Unnamed: 0,Q1A,Q2A,Q3A,Q4A,Q5A,Q6A,Q7A,Q8A,Q9A,Q10A,...,Q33A,Q34A,Q35A,Q36A,Q37A,Q38A,Q39A,Q40A,Q41A,Q42A
0,3,3,1,3,3,3,3,3,1,0,...,1,2,3,3,0,1,3,2,3,3
1,3,0,1,2,3,3,2,3,2,1,...,2,1,1,2,3,1,1,0,1,1
2,2,0,3,0,3,2,0,2,1,3,...,0,3,2,3,3,3,1,1,0,3
3,1,2,1,0,2,2,3,1,2,2,...,1,3,0,0,1,0,2,3,3,1
4,1,1,2,3,3,1,3,3,3,2,...,3,3,2,3,2,2,2,3,3,2
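
Before relying on the shifted scores, it may be worth confirming that every value now lies between 0 and 3. A minimal sketch on an invented frame follows; `toy`, `shifted`, and `in_range` are hypothetical names, and in the notebook the same check would be applied to `dass`.

```python
import pandas as pd

# Invented raw answers coded 1-4, as in the original data.
toy = pd.DataFrame({'Q1A': [4, 1, 3], 'Q2A': [2, 2, 1]})

# Shift every answer down by one to reach the 0-3 scoring scale.
shifted = toy - 1

# True only if every cell is one of the valid scores.
in_range = bool(shifted.isin([0, 1, 2, 3]).all().all())

print(in_range)
```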


In [6]:
dass.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39775 entries, 0 to 39774
Data columns (total 42 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Q1A     39775 non-null  int64
 1   Q2A     39775 non-null  int64
 2   Q3A     39775 non-null  int64
 3   Q4A     39775 non-null  int64
 4   Q5A     39775 non-null  int64
 5   Q6A     39775 non-null  int64
 6   Q7A     39775 non-null  int64
 7   Q8A     39775 non-null  int64
 8   Q9A     39775 non-null  int64
 9   Q10A    39775 non-null  int64
 10  Q11A    39775 non-null  int64
 11  Q12A    39775 non-null  int64
 12  Q13A    39775 non-null  int64
 13  Q14A    39775 non-null  int64
 14  Q15A    39775 non-null  int64
 15  Q16A    39775 non-null  int64
 16  Q17A    39775 non-null  int64
 17  Q18A    39775 non-null  int64
 18  Q19A    39775 non-null  int64
 19  Q20A    39775 non-null  int64
 20  Q21A    39775 non-null  int64
 21  Q22A    39775 non-null  int64
 22  Q23A    39775 non-null  int64
 23  Q24A    397

## Create Dataset for Each Category

Build a dictionary that maps each category to the list of its question numbers.

Then use the dictionary to select the columns of each category into its own dataset.

In [7]:
# Question numbers for each category
Dass42_keys = {'Depression': [3, 5, 10, 13, 16, 17, 21, 24, 26, 31, 34, 37, 38, 42],
               'Anxiety': [2, 4, 7, 9, 15, 19, 20, 23, 25, 28, 30, 36, 40, 41],
               'Stress': [1, 6, 8, 11, 12, 14, 18, 22, 27, 29, 32, 33, 35, 39]}

# Column names for the Depression questions
Dep_Quest = []
for i in Dass42_keys["Depression"]:
    Dep_Quest.append('Q'+str(i)+'A')

# Column names for the Anxiety questions
Anx_Quest = []
for i in Dass42_keys["Anxiety"]:
    Anx_Quest.append('Q'+str(i)+'A')

# Column names for the Stress questions
Stre_Quest = []
for i in Dass42_keys["Stress"]:
    Stre_Quest.append('Q'+str(i)+'A')

# Select the columns of each category
depression = dass.filter(Dep_Quest)
anxiety = dass.filter(Anx_Quest)
stress = dass.filter(Stre_Quest)
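
The three loops can also be written as a single dictionary comprehension. This is an equivalent sketch rather than the notebook's own code: `question_columns` and `all_numbers` are names introduced here, and the final check only confirms that the three lists together cover questions 1 to 42 exactly once.

```python
# Question numbers for each category, copied from the cell above.
Dass42_keys = {'Depression': [3, 5, 10, 13, 16, 17, 21, 24, 26, 31, 34, 37, 38, 42],
               'Anxiety': [2, 4, 7, 9, 15, 19, 20, 23, 25, 28, 30, 36, 40, 41],
               'Stress': [1, 6, 8, 11, 12, 14, 18, 22, 27, 29, 32, 33, 35, 39]}

# Map each category to its answer column names, e.g. 3 -> 'Q3A'.
question_columns = {
    category: [f'Q{n}A' for n in numbers]
    for category, numbers in Dass42_keys.items()
}

# Sanity check: every question from 1 to 42 appears in exactly one category.
all_numbers = sorted(n for numbers in Dass42_keys.values() for n in numbers)
covers_all = all_numbers == list(range(1, 43))

print(question_columns['Depression'][:3], covers_all)
```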


In [8]:
  # Depression dataset
depression.head()

Unnamed: 0,Q3A,Q5A,Q10A,Q13A,Q16A,Q17A,Q21A,Q24A,Q26A,Q31A,Q34A,Q37A,Q38A,Q42A
0,1,3,0,3,3,2,0,3,3,3,2,0,1,3
1,1,3,1,3,2,3,1,1,2,1,1,3,1,1
2,3,3,3,3,3,3,3,3,0,3,3,3,3,3
3,1,2,2,0,1,2,0,0,1,2,3,1,0,1
4,2,3,2,3,2,3,2,1,3,2,3,2,2,2


In [9]:
depression.shape

(39775, 14)

In [10]:
  # anxiety dataset
anxiety.head()

Unnamed: 0,Q2A,Q4A,Q7A,Q9A,Q15A,Q19A,Q20A,Q23A,Q25A,Q28A,Q30A,Q36A,Q40A,Q41A
0,3,3,3,1,3,2,2,3,3,2,1,3,2,3
1,0,2,2,2,2,0,0,0,1,3,2,2,0,1
2,0,0,0,1,3,1,0,1,1,0,1,3,1,0
3,2,0,3,2,1,0,1,0,0,0,2,0,3,3
4,1,3,3,3,3,3,3,3,3,3,3,3,3,3


In [11]:
anxiety.shape

(39775, 14)

In [12]:
  # stress dataset
stress.head()

Unnamed: 0,Q1A,Q6A,Q8A,Q11A,Q12A,Q14A,Q18A,Q22A,Q27A,Q29A,Q32A,Q33A,Q35A,Q39A
0,3,3,3,3,3,3,3,3,3,3,3,1,3,3
1,3,3,3,1,1,3,1,2,2,2,2,2,1,1
2,2,2,2,1,0,0,1,2,1,1,2,0,2,1
3,1,2,1,1,0,3,0,0,3,2,0,1,0,2
4,1,1,3,1,3,3,3,2,1,1,3,3,2,2


In [13]:
stress.shape

(39775, 14)

## Calculate the Scores

The score for each category is calculated by summing, for each respondent, the scores of that category's question items.

In [14]:
# Sum the item scores of each row into a new Cat_Scores column
def scores(category):
    col = list(category)
    category['Cat_Scores'] = category[col].sum(axis=1)
    return category

depression = scores(depression)
anxiety = scores(anxiety)
stress = scores(stress)
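
The `scores` function above adds the column to the frame it receives. A non-mutating variant, sketched here under the hypothetical name `with_scores` and tried on an invented two-row frame, uses `DataFrame.assign` to return a new frame and leave the input unchanged.

```python
import pandas as pd


def with_scores(category: pd.DataFrame) -> pd.DataFrame:
    """Return a new frame with a Cat_Scores column holding the row-wise sum."""
    return category.assign(Cat_Scores=category.sum(axis=1))


# Invented item scores for two respondents.
toy = pd.DataFrame({'Q3A': [1, 3], 'Q5A': [3, 0]})
scored = with_scores(toy)

print(scored['Cat_Scores'].tolist())
print(list(toy.columns))  # the input frame has no Cat_Scores column
```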

In [15]:
depression.head()

Unnamed: 0,Q3A,Q5A,Q10A,Q13A,Q16A,Q17A,Q21A,Q24A,Q26A,Q31A,Q34A,Q37A,Q38A,Q42A,Cat_Scores
0,1,3,0,3,3,2,0,3,3,3,2,0,1,3,27
1,1,3,1,3,2,3,1,1,2,1,1,3,1,1,24
2,3,3,3,3,3,3,3,3,0,3,3,3,3,3,39
3,1,2,2,0,1,2,0,0,1,2,3,1,0,1,16
4,2,3,2,3,2,3,2,1,3,2,3,2,2,2,32


In [16]:
anxiety.head()

Unnamed: 0,Q2A,Q4A,Q7A,Q9A,Q15A,Q19A,Q20A,Q23A,Q25A,Q28A,Q30A,Q36A,Q40A,Q41A,Cat_Scores
0,3,3,3,1,3,2,2,3,3,2,1,3,2,3,34
1,0,2,2,2,2,0,0,0,1,3,2,2,0,1,17
2,0,0,0,1,3,1,0,1,1,0,1,3,1,0,12
3,2,0,3,2,1,0,1,0,0,0,2,0,3,3,17
4,1,3,3,3,3,3,3,3,3,3,3,3,3,3,40


In [17]:
stress.head()

Unnamed: 0,Q1A,Q6A,Q8A,Q11A,Q12A,Q14A,Q18A,Q22A,Q27A,Q29A,Q32A,Q33A,Q35A,Q39A,Cat_Scores
0,3,3,3,3,3,3,3,3,3,3,3,1,3,3,40
1,3,3,3,1,1,3,1,2,2,2,2,2,1,1,27
2,2,2,2,1,0,0,1,2,1,1,2,0,2,1,17
3,1,2,1,1,0,3,0,0,3,2,0,1,0,2,16
4,1,1,3,1,3,3,3,2,1,1,3,3,2,2,29


## Rating Conditions and Saving Datasets

For each category, use the score cutoffs to rate the severity of the condition. See the DASS 42 questionnaire ([Indonesian version](https://digilib.esaunggul.ac.id/public/UEU-Undergraduate-11049-kuesioner.Image.Marked.pdf) or [English version](https://ehchiro.com.au/wp-content/uploads/2017/06/DASS-Questionnaire_FILLABLE.pdf)) for reference. The severity labels used in the code are in Indonesian: `Ringan` (mild), `Sedang` (moderate), `Parah` (severe), and `Sangat Parah` (extremely severe).

After rating the condition, drop the `Cat_Scores` column using `data.drop(columns=['Cat_Scores'])`.

Use `data.to_csv('file_name.csv', index=False)` to save the dataframe to a CSV file without the index.

Download the file with `files.download('file_name.csv')`.
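
The cutoff bands can also be expressed as bin edges. The sketch below is an alternative to the `if`/`elif` functions used in the following sections, not the notebook's own method: it applies `pd.cut` with the depression cutoffs (0 to 9, 10 to 13, 14 to 20, 21 to 27, 28 and above) to the first five depression `Cat_Scores` values shown earlier. The names `bins`, `labels`, and `rated` are introduced here for illustration.

```python
import pandas as pd

# First five depression scores from the Cat_Scores column shown earlier.
scores = pd.Series([27, 24, 39, 16, 32])

# Each edge is the upper bound of a band; intervals are closed on the right,
# so the bands are 0-9, 10-13, 14-20, 21-27 and 28-42.
bins = [-1, 9, 13, 20, 27, 42]
labels = ['Normal', 'Ringan', 'Sedang', 'Parah', 'Sangat Parah']

rated = pd.cut(scores, bins=bins, labels=labels)

print(rated.tolist())
```

The anxiety and stress bands would only need different `bins` values.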

## Depression

In [18]:
# Score scale for Depression
def dep_indic(dep):
    if 0 <= dep <= 9:
        return 'Normal'
    elif 10 <= dep <= 13:
        return 'Ringan'
    elif 14 <= dep <= 20:
        return 'Sedang'
    elif 21 <= dep <= 27:
        return 'Parah'
    elif dep >= 28:
        return 'Sangat Parah'

depression['Scale_Dep'] = depression['Cat_Scores'].apply(dep_indic)
depression.head()

Unnamed: 0,Q3A,Q5A,Q10A,Q13A,Q16A,Q17A,Q21A,Q24A,Q26A,Q31A,Q34A,Q37A,Q38A,Q42A,Cat_Scores,Scale_Dep
0,1,3,0,3,3,2,0,3,3,3,2,0,1,3,27,Parah
1,1,3,1,3,2,3,1,1,2,1,1,3,1,1,24,Parah
2,3,3,3,3,3,3,3,3,0,3,3,3,3,3,39,Sangat Parah
3,1,2,2,0,1,2,0,0,1,2,3,1,0,1,16,Sedang
4,2,3,2,3,2,3,2,1,3,2,3,2,2,2,32,Sangat Parah


In [19]:
depression['Scale_Dep'].value_counts()

Sangat Parah    13577
Normal           8856
Sedang           7079
Parah            6477
Ringan           3786
Name: Scale_Dep, dtype: int64

In [20]:
Depression = depression.drop(columns=['Cat_Scores'])
Depression.head()

Unnamed: 0,Q3A,Q5A,Q10A,Q13A,Q16A,Q17A,Q21A,Q24A,Q26A,Q31A,Q34A,Q37A,Q38A,Q42A,Scale_Dep
0,1,3,0,3,3,2,0,3,3,3,2,0,1,3,Parah
1,1,3,1,3,2,3,1,1,2,1,1,3,1,1,Parah
2,3,3,3,3,3,3,3,3,0,3,3,3,3,3,Sangat Parah
3,1,2,2,0,1,2,0,0,1,2,3,1,0,1,Sedang
4,2,3,2,3,2,3,2,1,3,2,3,2,2,2,Sangat Parah


In [21]:
Depression.shape

(39775, 15)

In [22]:
Depression.to_csv('Depression.csv', index=False, encoding='utf-8-sig')
files.download('Depression.csv')

## Anxiety

In [23]:
# Score scale for Anxiety
def anx_indic(anx):
    if 0 <= anx <= 7:
        return 'Normal'
    elif 8 <= anx <= 9:
        return 'Ringan'
    elif 10 <= anx <= 14:
        return 'Sedang'
    elif 15 <= anx <= 19:
        return 'Parah'
    elif anx >= 20:
        return 'Sangat Parah'

anxiety['Scale_Anx'] = anxiety['Cat_Scores'].apply(anx_indic)
anxiety.head()

Unnamed: 0,Q2A,Q4A,Q7A,Q9A,Q15A,Q19A,Q20A,Q23A,Q25A,Q28A,Q30A,Q36A,Q40A,Q41A,Cat_Scores,Scale_Anx
0,3,3,3,1,3,2,2,3,3,2,1,3,2,3,34,Sangat Parah
1,0,2,2,2,2,0,0,0,1,3,2,2,0,1,17,Parah
2,0,0,0,1,3,1,0,1,1,0,1,3,1,0,12,Sedang
3,2,0,3,2,1,0,1,0,0,0,2,0,3,3,17,Parah
4,1,3,3,3,3,3,3,3,3,3,3,3,3,3,40,Sangat Parah


In [24]:
anxiety['Scale_Anx'].value_counts()

Sangat Parah    14122
Normal           9728
Sedang           7048
Parah            6113
Ringan           2764
Name: Scale_Anx, dtype: int64

In [25]:
Anxiety = anxiety.drop(columns=['Cat_Scores'])
Anxiety.head()

Unnamed: 0,Q2A,Q4A,Q7A,Q9A,Q15A,Q19A,Q20A,Q23A,Q25A,Q28A,Q30A,Q36A,Q40A,Q41A,Scale_Anx
0,3,3,3,1,3,2,2,3,3,2,1,3,2,3,Sangat Parah
1,0,2,2,2,2,0,0,0,1,3,2,2,0,1,Parah
2,0,0,0,1,3,1,0,1,1,0,1,3,1,0,Sedang
3,2,0,3,2,1,0,1,0,0,0,2,0,3,3,Parah
4,1,3,3,3,3,3,3,3,3,3,3,3,3,3,Sangat Parah


In [26]:
Anxiety.shape

(39775, 15)

In [27]:
Anxiety.to_csv('Anxiety.csv', index=False, encoding='utf-8-sig')
files.download('Anxiety.csv')

## Stress

In [29]:
# Score scale for Stress
def stre_indic(stre):
    if 0 <= stre <= 14:
        return 'Normal'
    elif 15 <= stre <= 18:
        return 'Ringan'
    elif 19 <= stre <= 25:
        return 'Sedang'
    elif 26 <= stre <= 33:
        return 'Parah'
    elif stre >= 34:
        return 'Sangat Parah'

stress['Scale_Stre'] = stress['Cat_Scores'].apply(stre_indic)
stress.head()

Unnamed: 0,Q1A,Q6A,Q8A,Q11A,Q12A,Q14A,Q18A,Q22A,Q27A,Q29A,Q32A,Q33A,Q35A,Q39A,Cat_Scores,Scale_Stre
0,3,3,3,3,3,3,3,3,3,3,3,1,3,3,40,Sangat Parah
1,3,3,3,1,1,3,1,2,2,2,2,2,1,1,27,Parah
2,2,2,2,1,0,0,1,2,1,1,2,0,2,1,17,Ringan
3,1,2,1,1,0,3,0,0,3,2,0,1,0,2,16,Ringan
4,1,1,3,1,3,3,3,2,1,1,3,3,2,2,29,Parah


In [30]:
stress['Scale_Stre'].value_counts()

Normal          11800
Sedang           8730
Parah            8575
Sangat Parah     5749
Ringan           4921
Name: Scale_Stre, dtype: int64

In [31]:
Stress = stress.drop(columns=['Cat_Scores'])
Stress.head()

Unnamed: 0,Q1A,Q6A,Q8A,Q11A,Q12A,Q14A,Q18A,Q22A,Q27A,Q29A,Q32A,Q33A,Q35A,Q39A,Scale_Stre
0,3,3,3,3,3,3,3,3,3,3,3,1,3,3,Sangat Parah
1,3,3,3,1,1,3,1,2,2,2,2,2,1,1,Parah
2,2,2,2,1,0,0,1,2,1,1,2,0,2,1,Ringan
3,1,2,1,1,0,3,0,0,3,2,0,1,0,2,Ringan
4,1,1,3,1,3,3,3,2,1,1,3,3,2,2,Parah


In [32]:
Stress.shape

(39775, 15)

In [33]:
Stress.to_csv('Stress.csv', index=False, encoding='utf-8-sig')
files.download('Stress.csv')
