## Anxiety, Stress and Depression in University Students
We found a [dataset](https://figshare.com/articles/dataset/MHP_Anxiety_Stress_Depression_Dataset_of_University_Students/25771164) on the mental health of university students. We will try to predict a student's level of anxiety, stress and depression from the features provided in the dataset. First, we load the data and take a look at it.

### Imports

In [110]:
import pandas as pd

In [111]:
raw_data = pd.read_csv('data/Raw Dataset.csv')
processed = pd.read_csv('data/Processed.csv')
depression = pd.read_csv('data/Depression.csv')
anxiety = pd.read_csv('data/Anxiety.csv')
stress = pd.read_csv('data/Stress.csv')


In [112]:
raw_data.head()

Unnamed: 0,1. Age,2. Gender,3. University,4. Department,5. Academic Year,6. Current CGPA,7. Did you receive a waiver or scholarship at your university?,"1. In a semester, how often have you felt upset due to something that happened in your academic affairs?","2. In a semester, how often you felt as if you were unable to control important things in your academic affairs?","3. In a semester, how often you felt nervous and stressed because of academic pressure?",...,"7. In a semester, how often have you felt afraid, as if something awful might happen?","1. In a semester, how often have you had little interest or pleasure in doing things?","2. In a semester, how often have you been feeling down, depressed or hopeless?","3. In a semester, how often have you had trouble falling or staying asleep, or sleeping too much?","4. In a semester, how often have you been feeling tired or having little energy?","5. In a semester, how often have you had poor appetite or overeating?","6. In a semester, how often have you been feeling bad about yourself - or that you are a failure or have let yourself or your family down?","7. In a semester, how often have you been having trouble concentrating on things, such as reading the books or watching television?","8. In a semester, how often have you moved or spoke too slowly for other people to notice? Or you've been moving a lot more than usual because you've been restless?","9. In a semester, how often have you had thoughts that you would be better off dead, or of hurting yourself?"
0,18-22,Female,"Independent University, Bangladesh (IUB)",Engineering - CS / CSE / CSC / Similar to CS,Second Year or Equivalent,2.50 - 2.99,No,3 - Fairly Often,4 - Very Often,3 - Fairly Often,...,2 - More than half the days,2 - More than half the days,2 - More than half the days,3 - Nearly every day,2 - More than half the days,2 - More than half the days,2 - More than half the days,2 - More than half the days,3 - Nearly every day,2 - More than half the days
1,18-22,Male,"Independent University, Bangladesh (IUB)",Engineering - CS / CSE / CSC / Similar to CS,Third Year or Equivalent,3.00 - 3.39,No,3 - Fairly Often,3 - Fairly Often,4 - Very Often,...,2 - More than half the days,3 - Nearly every day,2 - More than half the days,2 - More than half the days,2 - More than half the days,2 - More than half the days,2 - More than half the days,2 - More than half the days,2 - More than half the days,2 - More than half the days
2,18-22,Male,American International University Bangladesh (...,Engineering - CS / CSE / CSC / Similar to CS,Third Year or Equivalent,3.00 - 3.39,No,0 - Never,0 - Never,0 - Never,...,0 - Not at all,0 - Not at all,0 - Not at all,0 - Not at all,0 - Not at all,0 - Not at all,0 - Not at all,0 - Not at all,0 - Not at all,0 - Not at all
3,18-22,Male,American International University Bangladesh (...,Engineering - CS / CSE / CSC / Similar to CS,Third Year or Equivalent,3.00 - 3.39,No,3 - Fairly Often,1 - Almost Never,2 - Sometimes,...,2 - More than half the days,2 - More than half the days,1 - Several days,2 - More than half the days,1 - Several days,2 - More than half the days,1 - Several days,2 - More than half the days,2 - More than half the days,1 - Several days
4,18-22,Male,North South University (NSU),Engineering - CS / CSE / CSC / Similar to CS,Second Year or Equivalent,2.50 - 2.99,No,4 - Very Often,4 - Very Often,4 - Very Often,...,3 - Nearly every day,1 - Several days,3 - Nearly every day,3 - Nearly every day,3 - Nearly every day,1 - Several days,3 - Nearly every day,0 - Not at all,3 - Nearly every day,3 - Nearly every day


### Pre-processing

There are some things to improve in the dataset. First, we rename the columns to remove the question numbering.

In [113]:
raw_data.rename(lambda x: x.split(". ")[1], axis=1, inplace=True)
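As a quick illustration of what the rename does, the sketch below applies the same idea to a few column names copied from the dataset into a hypothetical `sample_columns` list. It passes `1` as the `maxsplit` argument, which is an assumption added here so that a question containing a second ". " would keep its full text.

```python
# Hypothetical sample of column names copied from the raw dataset
sample_columns = [
    "1. Age",
    "6. Current CGPA",
    "2. In a semester, how often have you been feeling down, depressed or hopeless?",
]

# Keep everything after the first ". ", i.e. drop the question number
stripped = [name.split(". ", 1)[1] for name in sample_columns]

for name in stripped:
    print(name)
```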

Now we extract the numeric value of each answer and add the values up by category (stress, anxiety and depression). We assign different weights to the questions because they are not all equally important.

Then we drop the columns for these questions (and the column asking whether the student received a scholarship, because everyone answered no).

In [114]:
question_weights = [
    [0.1, 0.2, 0.15, 0.2, 0.05, 0.05, 0.05, 0.05, 0.1, 0.2], # Stress
    [0.2, 0.2, 0.15, 0.1, 0.15, 0.1, 0.1], # Anxiety
    [0.15, 0.2, 0.1, 0.1, 0.08, 0.12, 0.1, 0.05, 0.2], # Depression
]
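A quick check, sketched below, shows that the weights do not sum to 1 in every category: the stress weights add up to 1.15 and the depression weights to 1.10, while the anxiety weights add up to 1.00. Since the scores are min-max normalized afterwards, they still end up in the [0, 1] range, but the raw weighted sums are not on the same scale across categories. The `categories` list is introduced here only for illustration.

```python
question_weights = [
    [0.1, 0.2, 0.15, 0.2, 0.05, 0.05, 0.05, 0.05, 0.1, 0.2],  # Stress
    [0.2, 0.2, 0.15, 0.1, 0.15, 0.1, 0.1],  # Anxiety
    [0.15, 0.2, 0.1, 0.1, 0.08, 0.12, 0.1, 0.05, 0.2],  # Depression
]
categories = ["Stress", "Anxiety", "Depression"]  # illustrative labels

# Total weight per category, rounded to hide floating-point noise
weight_sums = {
    name: round(sum(weights), 2)
    for name, weights in zip(categories, question_weights)
}

for name, weights in zip(categories, question_weights):
    print(f"{name}: {len(weights)} questions, weights sum to {weight_sums[name]}")
```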

To do this, we multiply each student's answer to each question by its corresponding weight and sum the results per category. After that, we apply a min-max normalization.

In [115]:
# The numeric answer is the leading digit of each response, e.g. "3 - Fairly Often" -> 3
questions_values = raw_data.iloc[:, 7:].map(lambda x: int(x[0]))

raw_data["Stress value"] = (questions_values.iloc[:, :10] * question_weights[0]).sum(axis=1)
raw_data["Anxiety value"] = (questions_values.iloc[:, 10:17] * question_weights[1]).sum(axis=1)
raw_data["Depression value"] = (questions_values.iloc[:, 17:] * question_weights[2]).sum(axis=1)

# Min-max normalization: rescale each score to the [0, 1] range
for column in ["Stress value", "Anxiety value", "Depression value"]:
    raw_data[column] = (raw_data[column] - raw_data[column].min()) / (raw_data[column].max() - raw_data[column].min())

# Drop the scholarship column and the individual questions, keeping only the calculated values
raw_data = raw_data.drop(columns=raw_data.columns[6:33])
raw_data.head()

Unnamed: 0,Age,Gender,University,Department,Academic Year,Current CGPA,Stress value,Anxiety value,Depression value
0,18-22,Female,"Independent University, Bangladesh (IUB)",Engineering - CS / CSE / CSC / Similar to CS,Second Year or Equivalent,2.50 - 2.99,0.76087,0.716667,0.712121
1,18-22,Male,"Independent University, Bangladesh (IUB)",Engineering - CS / CSE / CSC / Similar to CS,Third Year or Equivalent,3.00 - 3.39,0.684783,0.55,0.712121
2,18-22,Male,American International University Bangladesh (...,Engineering - CS / CSE / CSC / Similar to CS,Third Year or Equivalent,3.00 - 3.39,0.01087,0.0,0.0
3,18-22,Male,American International University Bangladesh (...,Engineering - CS / CSE / CSC / Similar to CS,Third Year or Equivalent,3.00 - 3.39,0.48913,0.483333,0.478788
4,18-22,Male,North South University (NSU),Engineering - CS / CSE / CSC / Similar to CS,Second Year or Equivalent,2.50 - 2.99,0.804348,0.633333,0.769697
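The weighting and normalization steps above can be sketched on a toy example. Everything in it is illustrative: the `answers` frame, its three questions and the `weights` list are made up and do not come from the dataset. The sketch extracts the leading digit with `.str[0]` instead of `DataFrame.map`, which is only available in pandas 2.1 and later.

```python
import pandas as pd

# Hypothetical answers of three students to three questions (toy data)
answers = pd.DataFrame({
    "q1": ["0 - Never", "2 - Sometimes", "4 - Very Often"],
    "q2": ["1 - Almost Never", "2 - Sometimes", "4 - Very Often"],
    "q3": ["0 - Never", "3 - Fairly Often", "4 - Very Often"],
})
weights = [0.5, 0.3, 0.2]  # illustrative weights, one per question

# The numeric answer is the first character of each response
values = answers.apply(lambda col: col.str[0].astype(int))

# Weighted sum per student: weights are matched to columns by position
score = (values * weights).sum(axis=1)

# Min-max normalization: the lowest score maps to 0 and the highest to 1
normalized = (score - score.min()) / (score.max() - score.min())
print(normalized)
```

With this normalization a value of 0 means "the lowest score in the sample", not "no symptoms at all", so the scores are relative to the students in the dataset.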


We convert the Academic Year column to numbers to make it easier to work with. Each year label is replaced by its number, and any other value becomes 0.

In [116]:
years = {
    "First Year or Equivalent": 1,
    "Second Year or Equivalent": 2,
    "Third Year or Equivalent": 3,
    "Fourth Year or Equivalent": 4,
    "Fifth Year or Equivalent": 5
}

# Look up each label in the dictionary; labels that are not listed become 0
raw_data["Academic Year"] = raw_data["Academic Year"].apply(lambda x: years.get(x, 0))
raw_data["Academic Year"] = raw_data["Academic Year"].astype("Int64")
raw_data.rename(columns={"Academic Year": "Year"}, inplace=True)
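The notebook maps any unlisted label to 0. One possible alternative, sketched below as an assumption rather than as the method used here, is to leave such labels missing so that 0 cannot be mistaken for a real year. The `academic_year` series and its "Other" label are hypothetical; whether the dataset contains unlisted labels is not shown above.

```python
import pandas as pd

years = {
    "First Year or Equivalent": 1,
    "Second Year or Equivalent": 2,
    "Third Year or Equivalent": 3,
    "Fourth Year or Equivalent": 4,
    "Fifth Year or Equivalent": 5,
}

# Toy column; "Other" is a hypothetical label that is not in the dictionary
academic_year = pd.Series(
    ["Second Year or Equivalent", "Other", "Fourth Year or Equivalent"]
)

# Series.map leaves unmatched labels missing; the nullable Int64 dtype keeps them as <NA>
year = academic_year.map(years).astype("Int64")
print(year)

# Adding .fillna(0) reproduces the notebook's behavior
year_with_zero = year.fillna(0)
print(year_with_zero)
```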


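We shorten the department names by keeping only the text before the first "/".

In [117]:
# Unique department labels before shortening, kept for reference
departments = raw_data["Department"].unique()
# Keep the text before the first "/" and remove the trailing space it leaves behind
raw_data["Department"] = raw_data["Department"].apply(lambda x: x.split("/")[0].strip())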
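Splitting on "/" keeps the space that preceded the slash, so a `.strip()` is needed to get a clean label. The short sketch below shows the difference on one department name taken from the dataset; the variable names are illustrative.

```python
label = "Engineering - CS / CSE / CSC / Similar to CS"

raw_prefix = label.split("/")[0]   # keeps the space that preceded the slash
clean_prefix = raw_prefix.strip()  # removes the surrounding whitespace

# repr() makes the trailing space visible
print(repr(raw_prefix))
print(repr(clean_prefix))
```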

We check the correlation between the year and the stress, anxiety and depression values. We consider a correlation above 0.90 to be high. Here the highest correlation is about 0.75 (between anxiety and depression), so we don't see any redundancy between the attributes.

In [118]:
raw_data[["Year", "Stress value", "Anxiety value", "Depression value"]].corr()

Unnamed: 0,Year,Stress value,Anxiety value,Depression value
Year,1.0,0.05145,0.070788,0.052358
Stress value,0.05145,1.0,0.674662,0.590094
Anxiety value,0.070788,0.674662,1.0,0.75477
Depression value,0.052358,0.590094,0.75477,1.0
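The threshold check can also be done programmatically. The sketch below rebuilds the correlation matrix from the values in the table above and lists every pair of attributes whose absolute correlation exceeds the cut-off. The names `threshold`, `pairs`, `strongest` and `redundant` are illustrative.

```python
import pandas as pd

# Correlation values copied from the table above
columns = ["Year", "Stress value", "Anxiety value", "Depression value"]
corr = pd.DataFrame(
    [
        [1.0, 0.05145, 0.070788, 0.052358],
        [0.05145, 1.0, 0.674662, 0.590094],
        [0.070788, 0.674662, 1.0, 0.75477],
        [0.052358, 0.590094, 0.75477, 1.0],
    ],
    index=columns,
    columns=columns,
)

threshold = 0.90  # the cut-off used in the text

# Visit each pair once (upper triangle only) and store its absolute correlation
pairs = [
    (a, b, abs(corr.loc[a, b]))
    for i, a in enumerate(columns)
    for b in columns[i + 1:]
]

strongest = max(pairs, key=lambda pair: pair[2])
redundant = [pair for pair in pairs if pair[2] > threshold]

print("Strongest pair:", strongest)
print("Pairs above threshold:", redundant)
```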


In [119]:
raw_data.head()

Unnamed: 0,Age,Gender,University,Department,Year,Current CGPA,Stress value,Anxiety value,Depression value
0,18-22,Female,"Independent University, Bangladesh (IUB)",Engineering - CS,2,2.50 - 2.99,0.76087,0.716667,0.712121
1,18-22,Male,"Independent University, Bangladesh (IUB)",Engineering - CS,3,3.00 - 3.39,0.684783,0.55,0.712121
2,18-22,Male,American International University Bangladesh (...,Engineering - CS,3,3.00 - 3.39,0.01087,0.0,0.0
3,18-22,Male,American International University Bangladesh (...,Engineering - CS,3,3.00 - 3.39,0.48913,0.483333,0.478788
4,18-22,Male,North South University (NSU),Engineering - CS,2,2.50 - 2.99,0.804348,0.633333,0.769697


### Persist the data

To use this dataframe in our classification and clustering analysis, we save it to a CSV file.

In [120]:
raw_data.to_csv("Preprocessed.csv", index=False)
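A small round-trip check, sketched below on a toy frame, shows what to expect when the file is read back in a later notebook. The `frame` contents are illustrative, and a temporary directory is used so that the sketch does not overwrite the real file. A CSV file does not store dtypes, so the nullable `Int64` column typically comes back as a plain integer column.

```python
import os
import tempfile

import pandas as pd

# Toy frame with the same kind of columns as the preprocessed data
frame = pd.DataFrame({
    "Year": pd.array([2, 3], dtype="Int64"),
    "Stress value": [0.76087, 0.684783],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "Preprocessed.csv")
    # index=False avoids an extra "Unnamed: 0" column when the file is reloaded
    frame.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded.dtypes)
print(reloaded)
```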