# Depression among Indian Students

This dataset compiles a wide range of information aimed at understanding, analyzing, and predicting **depression levels** among students. It is primarily designed for research in psychology, data science, and education, providing insights into several factors that contribute to student mental health challenges and aiding in the design of early-intervention strategies.

The dataset, sourced from Kaggle, consists of roughly 28,000 observations across 18 variables, each capturing the individual characteristics of a surveyed Indian student. After some initial pre-processing to clean and transform the data, I conduct exploratory data analysis to investigate and visualise the association between the features and depression. In a separate Jupyter Notebook, I apply several machine learning classifiers to predict whether a student is depressed, in order to understand which features contribute most to students' perceived depressive states.

The dataset is available here: https://www.kaggle.com/datasets/adilshamim8/student-depression-dataset

## 1 - Pre-processing

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
#If I run plt.rcParams['figure.figsize'] = (10, 8), this size applies to all plots below
%matplotlib inline

In [None]:
df = pd.read_csv("student_depression_dataset.csv")

In [None]:
display(df.head())
df.info(verbose=True, show_counts=True)  # info() prints directly and returns None, so it is not wrapped in display()
display(df.shape)

In [None]:
# there are no NaN values and no duplicated rows

print(df.isnull().sum())
print(df[df.duplicated()])

In [None]:
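#transforming financial stress from object to numeric: "?" placeholders are replaced with the mode of the valid values

print(df.query("`Financial Stress` == '?'"))
valid_mode = df.loc[df["Financial Stress"] != "?", "Financial Stress"].mode()[0]
df["Financial Stress"] = df["Financial Stress"].replace("?", valid_mode)  # assign back instead of chained inplace=True
df["Financial Stress"] = df["Financial Stress"].astype(float)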
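
An alternative sketch, assuming every non-numeric entry (not only "?") should be treated as missing, coerces the column with `pd.to_numeric(errors="coerce")` and then fills the gaps with the mode. The toy Series below is illustrative and is not the Kaggle data.

```python
import pandas as pd

# Toy column mimicking the raw "Financial Stress" values: numbers stored as strings plus a "?" placeholder
raw = pd.Series(["1.0", "5.0", "?", "5.0", "3.0"], name="Financial Stress")

# Convert to numbers; anything that cannot be parsed (here "?") becomes NaN
numeric = pd.to_numeric(raw, errors="coerce")

# Impute the missing entries with the most frequent observed value (the mode)
imputed = numeric.fillna(numeric.mode()[0])
print(imputed.tolist())
```

This approach also catches other unexpected strings. The trade-off is that such strings are silently imputed instead of raising an error.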

In [None]:
#stripping the surrounding single quotes from sleep duration

df["Sleep Duration"] = df["Sleep Duration"].str.strip("'")

In [None]:
#transforming age into an integer

df["Age"] = df["Age"].astype(int)

In [None]:
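#given that 99.9% of observations are students, I keep only students and then drop the Profession column

print(df["Profession"].value_counts(normalize=True))
df = df.loc[df["Profession"] == "Student", :].copy()  # assign the filtered rows back, otherwise the filter has no effect
df.drop(["Profession"], axis=1, inplace=True)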

In [None]:
#as only students are retained, I drop "Work Pressure" and "Job Satisfaction", which are not relevant to students

print(df["Work Pressure"].value_counts())
print(df["Job Satisfaction"].value_counts(normalize=True))
df.drop(["Work Pressure", "Job Satisfaction"], axis=1, inplace=True)

In [None]:
#I rename some columns

df.rename(columns={"Have you ever had suicidal thoughts ?":"Suicidal Thoughts",
                   "Family History of Mental Illness":"Family History",
                  "Work/Study Hours":"Work_study_hours"}, inplace=True)

In [None]:
df.set_index("id", inplace=True)

In [None]:
df.to_csv("depression_after_eda.csv", index=False)

In [None]:
#following this short pre-processing, only students remain and the number of columns is reduced to 14

df.shape

In [None]:
df.head()

In [None]:
df.info()

#### Boxplot

In [None]:
print(plt.style.available)
plt.style.use("tableau-colorblind10"); #tableau-colorblind10 has a nice layout

In [None]:
#conducting an analysis of all numerical variables

for feature in df.columns:
    if df[feature].dtypes != "object" and feature != "Depression": #alternative: df.select_dtypes(include="number").columns
        plt.figure()
        df[feature].plot(kind="box", color="green", showfliers=True)
        plt.title(f"Boxplot of {feature}")
        plt.tight_layout()  # adjust the layout before saving, not after showing
        plt.savefig(f"Boxplot_{feature}.png")
        plt.show()
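
By default, matplotlib (and therefore pandas' `kind="box"`) draws the whiskers using the 1.5 × IQR rule and plots points beyond them as fliers. The minimal sketch below implements that rule with a hypothetical helper `iqr_outliers` on toy ages. It can help check which observations a boxplot marks as outliers.

```python
import pandas as pd

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)  # linear interpolation, as np.percentile uses
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]

# Toy ages: one respondent far older than the rest
ages = pd.Series([18, 19, 20, 21, 22, 23, 24, 25, 58])
print(iqr_outliers(ages).tolist())
```

Here Q1 = 20 and Q3 = 24, so the fences are 14 and 30, and only 58 falls outside them.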

#### Histograms

In [None]:
df.hist(figsize=(10,8), edgecolor="black", color="green", bins=20, grid=False)
plt.tight_layout()
plt.savefig("histogram.png", dpi=300, bbox_inches="tight")
plt.show()

## 2 - Exploratory Data Analysis

I conduct exploratory data analysis of the following 13 features, in relation to depression: **Gender, Age, City, Academic Pressure, CGPA, Study Satisfaction, Sleep Duration, Dietary Habits, Degree, Suicidal Thoughts, Work/Study Hours, Financial Stress, Family History**.

#### Depressed students

In [None]:
depressed_students_p=round(df["Depression"].value_counts(normalize=True)[1],2)
f'Depressed students amount to {depressed_students_p} of the total'

#### 0 - Correlation Matrix

In [None]:
#A correlation matrix displays the statistical relationship between all numerical variables

num_var=[index for index,value in df.dtypes.items() if value!="object"]
plt.figure(figsize=(10, 8));
ax=sns.heatmap(df[num_var].corr(), vmin=-1, vmax=+1, linecolor="black", annot=True, cmap="coolwarm", cbar=True);
ax.set_title("Correlation Matrix", fontsize=15);
plt.savefig("Correlation Matrix", dpi=300, bbox_inches="tight");
plt.show()

#alternative 1: sns.heatmap(df.select_dtypes(include="number").corr(),...)
#alternative 2: pd.plotting.scatter_matrix(df, figsize=(20, 15)), to visualise histograms and scatter plots.

The heatmap shows that depression is positively correlated with **academic pressure, financial stress** and **study hours**, and negatively correlated with **age** and **study satisfaction**.
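
Because `Depression` is binary (0/1), the Pearson coefficients in its row of the heatmap are point-biserial correlations. Each one is a standardised gap between the mean feature value of depressed and non-depressed students. The sketch below checks this identity on hypothetical toy data. It is not computed on the notebook's dataset.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): a 1-5 score and a binary depression flag
toy = pd.DataFrame({
    "Academic Pressure": [1, 2, 2, 3, 4, 5, 5, 4],
    "Depression":        [0, 0, 1, 0, 1, 1, 1, 0],
})
x, y = toy["Academic Pressure"], toy["Depression"]

# Pearson correlation, as used by df.corr()
pearson = x.corr(y)

# Point-biserial form: difference in group means, scaled by the spread of x
m1, m0 = x[y == 1].mean(), x[y == 0].mean()
p = y.mean()              # share of depressed rows
s = x.std(ddof=0)         # population standard deviation of the score
point_biserial = (m1 - m0) / s * np.sqrt(p * (1 - p))

print(round(pearson, 4), round(point_biserial, 4))
```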

#### 1 - Gender

In [None]:
df["Gender"].value_counts(normalize=True)
depressed_students=df.groupby(["Gender"])["Depression"].sum()
total_depressed_students=depressed_students.sum()

In [None]:
stud_depression=df["Depression"].value_counts()[1]
# these are shares of all depressed students, not depression rates within each gender
print(f'Share of depressed students who are female: {round(depressed_students["Female"]/stud_depression,2)}')
print(f'Share of depressed students who are male: {round(depressed_students["Male"]/stud_depression,2)}')
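
The cell above measures *composition*: of all depressed students, what fraction is female or male. This differs from *prevalence*, which is the fraction of each gender that is depressed. The sketch below contrasts the two on hypothetical toy data. On the real data, the rate would be `df.groupby("Gender")["Depression"].mean()`.

```python
import pandas as pd

# Hypothetical toy data: 3 female and 5 male students
toy = pd.DataFrame({
    "Gender":     ["Female", "Female", "Female", "Male", "Male", "Male", "Male", "Male"],
    "Depression": [1, 0, 0, 1, 1, 1, 0, 0],
})

# Composition: of all depressed students, what fraction is female / male
share_of_depressed = toy.groupby("Gender")["Depression"].sum() / toy["Depression"].sum()

# Prevalence: within each gender, what fraction of students is depressed
rate_within_gender = toy.groupby("Gender")["Depression"].mean()

print(share_of_depressed)
print(rate_within_gender)
```

Here males make up 75% of depressed students, but only 60% of males are depressed. A group can dominate the composition simply because it is larger.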

In [None]:
ax = (depressed_students/total_depressed_students).plot(kind="bar", figsize=(8,6), fontsize=12, color=["pink","cyan"], lw=2, rot=0)

for bar in ax.patches: #ax.patches is a list of all rectangular shapes (bars) and each bar is a rectangle object
    height = bar.get_height() #this gets the height of the bar
    ax.text( #this gets the actual number as a label on the bar
        bar.get_x() + bar.get_width() / 2, #this finds the horizontal center of the bar
        height,
        f'{height:.2f}',#this creates the text string that will be shown
        ha='center', #horizontal alignment = center
        va='bottom', #vertical alignment = "bottom" means aligning the bottom of the text to the y-position so it sits on top of the bar.
        fontsize=12
    )
ax.set_title("Percentage of male and female depressed students", fontsize=(15))
plt.savefig("Percentage of male and female depressed students", dpi=300, bbox_inches="tight")
plt.show()

#### 2 - Age

In [None]:
df["Age"].value_counts().sort_index()

In [None]:
#ages 18-34, where most respondents lie, are shown individually; ages 35-59 are grouped into a single bin
age_depression_gender_18_34=df.pivot_table(values="Depression", index="Age", columns="Gender", aggfunc="sum").sort_index().loc[18:34]
age_depression_gender_35_on=df.pivot_table(values="Depression", index="Age", columns="Gender", aggfunc="sum").sort_index().loc[35:59].sum(axis=0)
age_depression_gender_35_on_t=pd.DataFrame(age_depression_gender_35_on).T
age_depression_gender_35_on_t.index=["35-59"]
concat_depression_gender=pd.concat([age_depression_gender_18_34,age_depression_gender_35_on_t])

In [None]:
ax = concat_depression_gender.plot(kind="bar", color=["pink", "cyan"], figsize=(10, 7), rot=0, fontsize=11)
ax.set_title("Age and Depression, by gender", fontsize=15)
ax.legend(fontsize=15)
ax.set_xlabel("Age", fontsize=15)
plt.savefig("Age and Depression, by gender", dpi=300, bbox_inches="tight")
plt.show()


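The largest numbers of depressed students are aged 20, 24 and 28. The count of depressed students declines with age, although this partly reflects the smaller number of older students in the sample rather than necessarily a lower rate of depression.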
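
To separate "fewer older students" from "lower depression among older students", counts and rates can be reported side by side for each age bin. The sketch below uses `pd.cut` with the same 18-34 / 35-59 split on hypothetical toy data. The bin edges are an assumption matching the plot above.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "Age":        [18, 20, 20, 24, 28, 33, 36, 41, 59],
    "Depression": [1, 1, 0, 1, 0, 1, 0, 0, 1],
})

# Right-closed bins: (17, 34] -> "18-34", (34, 59] -> "35-59"
toy["Age group"] = pd.cut(toy["Age"], bins=[17, 34, 59], labels=["18-34", "35-59"])

# count = respondents, sum = depressed respondents, mean = depression rate
summary = toy.groupby("Age group", observed=True)["Depression"].agg(["count", "sum", "mean"])
print(summary)
```

In this toy sample, the older bin has fewer depressed students partly because it has fewer respondents. The `mean` column is the quantity to compare across bins.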

#### 3 - City

In [None]:
sum_depressed_students_10_cities=df.groupby(["City"])["Depression"].sum().sort_values(ascending=False).head(10).sum(axis=0)

In [None]:
city_dep=df.groupby(["City"])["Depression"].sum().sort_values(ascending=False).head(10)/sum_depressed_students_10_cities
city_dep.sort_values(ascending=False, inplace=True)

In [None]:
#I want to compare the cities' populations with the incidence of depressed students. From a search online:
city_populations_2025 = {
    "Kalyan": 1820000,
    "Hyderabad": 11340000,
    "Srinagar": 1780000,
    "Vasai-Virar": 1790000,
    "Thane": 2690000,
    "Kolkata": 15850000,
    "Ludhiana": 2030000,
    "Lucknow": 4130000,
    "Ahmedabad": 9061820,
    "Patna": 2689540
}

In [None]:
city_pop = (pd.Series(city_populations_2025)/sum(city_populations_2025.values())).sort_values(ascending=False)

In [None]:
concat=pd.concat([city_dep, city_pop], axis=1)
concat.columns=["Depression %","Population %"]
concat

In [None]:
concat["Depression %"].plot(kind="bar", rot=45, xlabel="Cities", ylim=(0.00,0.32));
concat["Population %"].plot(style="vr", rot=45, markersize=10, ylabel="Percentage", ylim=(0.00,0.32));  # markersize (not lw) controls marker size
plt.title("% of depressed students (bar chart) and population (red marker)");
plt.legend()  # add the legend before saving so that it appears in the saved file
plt.savefig("% of depressed students (bar chart) and population (red marker)", dpi=300, bbox_inches="tight");
plt.show();

Kalyan, Srinagar, Vasai-Virar, Thane and Ludhiana account for a larger share of depressed students than their share of inhabitants within this group of cities. Note that total city population is only a rough proxy for the number of students in each city.
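
The visual comparison can be condensed into a single over-representation ratio per city: depression share divided by population share. A ratio above 1 means the city contributes more depressed students than its population share alone would suggest. The sketch below uses hypothetical shares rather than the values in `concat`.

```python
import pandas as pd

# Hypothetical shares for three cities (not the notebook's values)
shares = pd.DataFrame(
    {"Depression %": [0.30, 0.10, 0.60], "Population %": [0.15, 0.25, 0.60]},
    index=["City A", "City B", "City C"],
)

# Ratio > 1: over-represented among depressed students; < 1: under-represented
shares["Ratio"] = shares["Depression %"] / shares["Population %"]
over = shares.index[shares["Ratio"] > 1].tolist()
print(shares.sort_values("Ratio", ascending=False))
print(over)
```

On the notebook's data, the same column could be computed as `concat["Depression %"] / concat["Population %"]`.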

#### 4 - Academic Pressure

In [None]:
df["Academic Pressure"].value_counts().sort_index()

In [None]:
df["Academic Pressure"] = pd.Categorical(df["Academic Pressure"], categories=[0,1,2,3,4,5])
acad_press_depression=pd.crosstab(df["Academic Pressure"], df["Depression"], margins=True)
acad_press_depression_rel=(acad_press_depression/acad_press_depression.loc["All","All"]).loc[[0,1,2,3,4,5],[0,1]]
acad_press_depression_rel

In [None]:
#what's the proportion of depressed/non-depressed in each category of academic pressure?

ax = acad_press_depression.apply(lambda x: x/acad_press_depression["All"]).drop("All",axis=1).plot(kind="bar", rot=0);
for bar in ax.patches:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2, height, f'{height:.2f}', ha='center', va='bottom', fontsize=10)

#interpretation: almost 60% of the students that picked 0 are non-depressed. Almost 90% of students that picked 5 are depressed

In [None]:
#of all depressed/non-depressed students, what % are in each category?

ax = acad_press_depression.apply(lambda row: row/acad_press_depression.loc["All",:], axis=1).iloc[0:6,0:2].plot(kind="bar", rot=0);
for bar in ax.patches:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2, height, f'{height:.2f}', ha='center', va='bottom', fontsize=10)

#interpretation: of all depressed students, slightly more than 5% are in 1, and over 30% are in 2, etc.

In [None]:
#depressed students with the highest academic pressure (5) are almost 20% of all students.
ax = acad_press_depression_rel.iloc[1:6].plot(kind="bar", figsize=(8,6), rot=0) #shares of all students; academic pressure = 0 is inconsequential
for bar in ax.patches: 
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2, height, f'{height:.2f}', ha='center', va='bottom', fontsize=10)
ax.set_title("Depression and Academic Pressure", fontsize=15)
plt.savefig("Depression and Academic Pressure", dpi=300, bbox_inches="tight")
plt.show()


Academic pressure is higher for depressed students.
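
The two `apply(lambda ...)` cells above compute row and column proportions by hand. `pd.crosstab` can produce both directly through its `normalize` argument, and the joint shares as well. The sketch below uses hypothetical toy data to show the three options side by side.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "Academic Pressure": [1, 1, 3, 3, 5, 5, 5, 5],
    "Depression":        [0, 0, 0, 1, 1, 1, 1, 0],
})
ap, dep = toy["Academic Pressure"], toy["Depression"]

# Row shares: within each pressure level, the fraction (non-)depressed
by_level = pd.crosstab(ap, dep, normalize="index")

# Column shares: of all depressed (or non-depressed) students, the fraction at each level
by_group = pd.crosstab(ap, dep, normalize="columns")

# Joint shares: the fraction of all students in each (level, depression) cell
joint = pd.crosstab(ap, dep, normalize="all")

print(by_level, by_group, joint, sep="\n\n")
```

Each option answers a different question. Row shares give the rate at each level, column shares give the composition of each group, and joint shares give a fraction of the whole sample.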

In [None]:
ax=pd.crosstab(index=df["Academic Pressure"], columns=[df["Depression"], df["Gender"]]).iloc[1:6].plot(kind="bar", rot=0);
ax.set_title("Depression and Academic Pressure, by gender", fontsize=12);
ax.set_ylabel("Frequency");

In [None]:
#academic pressure and financial stress

academ_press_fin_stress = pd.crosstab(index=df["Academic Pressure"], columns=[df["Depression"], df["Financial Stress"]]).iloc[1:6]
academ_press_fin_stress

In [None]:
styles = ["-v"] * 4 + [":v"]
colors = ["lightgrey"] * 4 + ["red"]
ax1=academ_press_fin_stress.iloc[:,0:5].plot(style=styles, color=colors);
ax1.set_title("Academic pressure of non-depressed students across levels of financial stress");
ax2=academ_press_fin_stress.iloc[:,5:10].plot(style=styles, color=colors);
ax2.set_title("Academic pressure of depressed students across levels of financial stress");

In [None]:
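# non-depressed students with financial stress 1 (column 0) vs depressed students with financial stress 5 (column 9)
academ_press_fin_stress.iloc[:,[0,9]].plot(style=["vb", "vr"], figsize=(7,5));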

#### 5 - CGPA

In [None]:
df["CGPA"].describe()

In [None]:
df.groupby("Depression")["CGPA"].describe().T

Interestingly, depressed students have a higher average CGPA than non-depressed ones. 

In [None]:
pd.pivot_table(df, values="CGPA", columns="Depression", index="Age", aggfunc="mean").dropna().plot(kind="bar", figsize=(7,5));


Mean CGPA is fairly stable across ages for both groups, with larger differences past age 36, where fewer students were surveyed.

#### 6 - Study Satisfaction

In [None]:
df.groupby(["Study Satisfaction"])["Depression"].value_counts().unstack().plot(kind="bar", rot=0);

#note: df.pivot_table(index="Study Satisfaction", values=["Depression"], aggfunc=["sum"])
#would only count students with Depression = 1

In [None]:
ax=df.groupby(["Study Satisfaction"])["Depression"].sum().plot(kind="bar");
for bar in ax.patches: 
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2, height, f'{height:.0f}', ha='center', va='bottom')  # counts, so no decimals
ax.set_title("Study Satisfaction of Depressed Students", fontsize=15);
ax.set_xticklabels(ax.get_xticklabels(), rotation=0);
plt.savefig("Study Satisfaction of Depressed Students", dpi=300, bbox_inches="tight")
plt.show()

In [None]:
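pivot_table_ss=pd.pivot_table(df, values="Depression", columns="Gender", index="Study Satisfaction", aggfunc="sum", margins=True)
pivot_table_ss=pivot_table_ss/pivot_table_ss.loc["All","All"]  # divide by the total number of depressed students instead of a hard-coded count
pivot_table_ss=pivot_table_ss.iloc[[1,2,3,4,5],[0,1]]
pivot_table_ss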

In [None]:
ax=pivot_table_ss.plot(kind="bar", color=["pink","cyan"], figsize=(8,6));
ax.set_title("Study satisfaction in depressed female and male students", fontsize=15);
ax.set_xticklabels(ax.get_xticklabels(), rotation=0)

for bar in ax.patches: 
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2, height, f'{height:.2f}', ha='center', va='bottom', fontsize=10)

plt.savefig("Study satisfaction in depressed female and male students", dpi=300, bbox_inches="tight");

In [None]:
#doing the same analysis for the non-depressed students

df_0=df[df["Depression"]==0]
df_0

In [None]:
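crosstab_ss_0 = pd.crosstab(df_0["Study Satisfaction"], df_0["Gender"], margins=True)
ax = (crosstab_ss_0/crosstab_ss_0.loc["All","All"]).iloc[[1,2,3,4,5],[0,1]].plot(kind="bar", color=["pink","cyan"], figsize=(8,6));  # normalise by the number of non-depressed students
ax.set_title("Study satisfaction in non-depressed female and male students", fontsize=15);
plt.xticks(rotation=0);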

for bar in ax.patches: 
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2, height, f'{height:.2f}', ha='center', va='bottom', fontsize=10)
    
plt.savefig("Study satisfaction in non-depressed female and male students", dpi=300, bbox_inches="tight");

#### 7 - Sleep Duration

In [None]:
axes = df.plot(sharex=True, kind="hist", column="Depression", by="Sleep Duration", edgecolor="black", lw=1, grid=False, figsize=(6,20));
for ax in axes.flat:
    ax.title.set_fontsize(12)
plt.savefig("histogram Depression by Sleep Duration.png", dpi=300, bbox_inches="tight")
plt.show()

Sleeping "less than 5 hours" is associated with depression.

In [None]:
group = df.groupby(["Sleep Duration"])["Depression"].value_counts().unstack(fill_value=0)
group.drop(["Others"], inplace=True)
group=group.loc[["Less than 5 hours", "5-6 hours", "7-8 hours", "More than 8 hours"],:]
group

In [None]:
ax=group.plot(kind="bar", figsize=(10,6))
ax.set_xlabel("Sleep Duration")
ax.set_ylabel("Number of People")
ax.set_title("Depression Levels by Sleep Duration")
plt.legend(title="Depression")
plt.tight_layout()
plt.show()

Depressed students most often sleep less than 5 hours, whereas non-depressed students most often sleep 7-8 hours or more than 8 hours.
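
The cell above fixes the bar order by listing the labels with `.loc`. An alternative is to store `Sleep Duration` as an ordered categorical. This makes every later `groupby` or plot follow the natural order and turns labels outside the list (such as "Others") into missing values. The sketch below uses hypothetical toy rows and computes the depression rate per sleep category.

```python
import pandas as pd

order = ["Less than 5 hours", "5-6 hours", "7-8 hours", "More than 8 hours"]

# Hypothetical toy data, including an "Others" label as in the real column
toy = pd.DataFrame({
    "Sleep Duration": ["7-8 hours", "Less than 5 hours", "Others", "5-6 hours",
                       "Less than 5 hours", "More than 8 hours"],
    "Depression": [0, 1, 0, 1, 1, 0],
})

# Labels not in `order` (here "Others") become NaN; groupby then drops them by default
toy["Sleep Duration"] = pd.Categorical(toy["Sleep Duration"], categories=order, ordered=True)

# Depression rate per sleep category, listed in the categorical order
rate = toy.groupby("Sleep Duration", observed=False)["Depression"].mean()
print(rate)
```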

#### 8 - Dietary Habits

In [None]:
axes = df.plot(sharex=True, kind="hist", column="Depression", by="Dietary Habits", edgecolor="black", lw=1, grid=False, figsize=(6,20));
for ax in axes.flat:
    ax.title.set_fontsize(12)
plt.savefig("histogram Depression by Dietary Habits.png", dpi=300, bbox_inches="tight")
plt.show()

In [None]:
df["Dietary Habits"].value_counts(normalize=True)

In [None]:
df["Dietary Habits"] = pd.Categorical(df["Dietary Habits"], categories=["Healthy", "Moderate", "Unhealthy"])
crosstab_diet=pd.crosstab(df["Dietary Habits"], df["Depression"], margins=True)
crosstab_diet_rel=(crosstab_diet/crosstab_diet.loc["All","All"]).iloc[0:3,0:2]

In [None]:
#of all depressed/non-depressed students, what % are in each category?

ax=crosstab_diet.apply(lambda row: row/crosstab_diet.loc["All",:], axis=1).iloc[0:3,0:2].plot(kind="bar", rot=0, fontsize=12);
for bar in ax.patches:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2, height, f'{height:.2f}', ha='center', va='bottom', fontsize=10)
plt.tight_layout()

#interpretation: 45% of all depressed students have an unhealthy diet, whereas only 26% of all non-depressed ones.

In [None]:
ax=crosstab_diet_rel.plot(kind="bar", rot=0, fontsize=12);
for bar in ax.patches: 
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2, height, f'{height:.2f}', ha='center', va='bottom', fontsize=10)
ax.set_title("Dietary habits (categories)", fontsize=12);
ax.set_xlabel("Dietary habits", fontsize=12);
plt.savefig("Dietary habits (categories)", dpi=300, bbox_inches="tight")
plt.tight_layout()
plt.show()

In [None]:
pivot_diet=pd.pivot_table(df, values="Depression", columns="Dietary Habits" , index="Gender", aggfunc="sum", margins=True).T
pivot_diet=(pivot_diet/pivot_diet.loc["All","All"]).iloc[0:3,0:2]
pivot_diet

In [None]:
ax=pivot_diet.plot(kind="bar", fontsize=12, color=["pink", "cyan"], rot=0);
for bar in ax.patches: 
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2, height, f'{height:.2f}', ha='center', va='bottom', fontsize=10)
ax.set_title("Dietary habits for depressed students, by gender", fontsize=12);
ax.set_xlabel("Dietary habits", fontsize=12);  # bars are grouped by dietary habit and coloured by gender
plt.savefig("Dietary habits for depressed students, by gender", dpi=300, bbox_inches="tight")
plt.tight_layout()
plt.show()

Depressed students tend to have a less healthy diet: 26% of all students are both depressed and on an unhealthy diet. Among depressed students, more males than females report an unhealthy diet, although this partly reflects that male students make up 56% of depressed students.

#### 9 - Degree

In [None]:
#do students pursuing certain degrees display higher levels of depression?

crosstab_CGPA=pd.crosstab(df["Degree"], df["Depression"], margins=True).sort_values(by=[1],ascending=False)
# after sorting, row 0 is the "All" margin, so rows 1-10 are the ten degrees with the most depressed students
crosstab_CGPA_short = (crosstab_CGPA/crosstab_CGPA.loc["All","All"]).iloc[1:11,0:2]

In [None]:
ax=(crosstab_CGPA_short*100).plot(kind="bar", figsize=(10,5));
for bar in ax.patches: 
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2, height, f'{height:.1f}', ha='center', va='bottom', fontsize=8)

#15.4% of all surveyed students are depressed and do not have a university degree (Class 12 only).

#Class 12 = high school diploma

In total, 15.4% of surveyed students are depressed and hold a high school diploma (Class 12) rather than a university degree. Further down, 3.7% of surveyed students are depressed and enrolled in a Bachelor's in Education.
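
The 15.4% above is a *joint* share (depressed and Class 12, as a fraction of all students). The 26% quoted in the insights section is the corresponding *conditional* share (Class 12, as a fraction of depressed students). Dividing the joint share by the overall share of depressed students converts one into the other. The worked arithmetic below assumes 16,308 depressed and 11,562 non-depressed students. These counts are an assumption here and can be checked with `df["Depression"].value_counts()`.

```python
# Assumed counts of depressed / non-depressed students (verify on the data)
n_depressed, n_not_depressed = 16308, 11562
p_depressed = n_depressed / (n_depressed + n_not_depressed)

# Joint share read off the Degree plot: depressed AND Class 12, as a fraction of all students
p_depressed_and_class12 = 0.154

# P(Class 12 | depressed) = P(depressed and Class 12) / P(depressed)
p_class12_given_depressed = p_depressed_and_class12 / p_depressed
print(round(p_depressed, 3), round(p_class12_given_depressed, 2))
```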

In [None]:
ax = crosstab_CGPA.apply(lambda row: row/crosstab_CGPA.loc["All",:], axis=1).iloc[1:11,0:2].plot(kind="bar", rot=0, figsize=(10,9));
for bar in ax.patches: 
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2, height, f'{height:.2f}', ha='center', va='bottom', fontsize=8)



#### 10 - Suicidal Thoughts

In [None]:
ax=df.groupby("Depression")["Suicidal Thoughts"].value_counts(normalize=True).unstack().plot(kind="bar", figsize=(7,5), rot=0);
for bar in ax.patches: 
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2, height, f'{height:.2f}', ha='center', va='bottom', fontsize=10)
ax.set_title("Suicidal thoughts and Depression")
plt.savefig("Suicidal thoughts and Depression", dpi=300, bbox_inches="tight")

#85% of depressed people have suicidal thoughts

In [None]:
ax=pd.crosstab(df["Work_study_hours"], df["Suicidal Thoughts"], normalize=True).plot(kind="bar", rot=0, figsize=(12,5));
ax.set_title("Work_study_hours and suicidal thoughts");
for bar in ax.patches: 
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2, height, f'{height:.2f}', ha='center', va='bottom', fontsize=12)
plt.savefig("Suicidal thoughts and work_study_hours", dpi=300, bbox_inches="tight")

#normalize=True gives shares of all students; normalize="index" would instead normalise within each level of work/study hours

#### 11 - Work/study hours

In [None]:
crosstab_study_depression=pd.crosstab(df["Work_study_hours"], df["Depression"])
crosstab_study_depression

In [None]:
# the plot shows that depression is associated with longer hours of study / work(6+)
crosstab_study_depression.plot(kind="bar", stacked=True)

In [None]:
# depressed students tend to study longer hours, most often 10, 11 or 12 hours.
pivot_depression_studyhours=df.pivot_table(values="Depression", index="Work_study_hours", columns="Gender", aggfunc="sum")
ten_or_more_hours = pivot_depression_studyhours.loc[[10.0, 11.0, 12.0],:].sum(axis=0)
share_ten_or_more_hours = ten_or_more_hours/pivot_depression_studyhours.sum(axis=0)
print(f'Share of depressed female students studying 10 or more hours: {round(share_ten_or_more_hours["Female"],2)}')
print(f'Share of depressed male students studying 10 or more hours: {round(share_ten_or_more_hours["Male"],2)}')
print("There is no notable difference between genders")

In [None]:
pivot_depression_studyhours.plot(kind="bar");

#### 12 - Financial Stress

In [None]:
ax=df.groupby("Financial Stress")["Depression"].value_counts(normalize=True).unstack().plot(kind="bar", rot=0);
ax.set_title("Financial stress and depression");
plt.savefig("Financial stress and depression", dpi=300, bbox_inches="tight")

Students with depression report higher financial stress.

#### 13 - Family History

In [None]:
# do people that have a family history of depression have a higher tendency to be depressed themselves?
# there doesn't seem to be a strong relation here
pd.crosstab(df["Family History"], df["Depression"]).plot(kind="bar");

In [None]:
fig, axes = plt.subplots(3,3, figsize=(16,8))
pd.crosstab(df["Family History"], df["Academic Pressure"]).plot(kind="bar", ax=axes[0][0]);
pd.crosstab(df["Family History"], df["Depression"]).plot(kind="bar", ax=axes[0][1]);
pd.crosstab(df["Family History"], df["Gender"]).plot(kind="bar", ax=axes[0][2]);
pd.crosstab(df["Family History"], df["Sleep Duration"]).plot(kind="bar", ax=axes[1][0]);
pd.crosstab(df["Family History"], df["Dietary Habits"]).plot(kind="bar", ax=axes[1][1]);
pd.crosstab(df["Family History"], df["Suicidal Thoughts"]).plot(kind="bar", ax=axes[1][2]);
pd.crosstab(df["Family History"], df["Study Satisfaction"]).plot(kind="bar", ax=axes[2][0]);
axes[2][1].axis("off")  # only 7 of the 9 panels are used
axes[2][2].axis("off")
plt.tight_layout()

In [None]:
axes = df.plot(sharex=True, kind="hist", column="Depression", by="Family History", edgecolor="black", lw=1, grid=False, figsize=(8,10));
for ax in axes.flat:
    ax.title.set_fontsize(12)
plt.savefig("histogram Depression by Family History.png", dpi=300, bbox_inches="tight")
plt.show()

There doesn't seem to be a strong association between having a family history of mental illness and being depressed.
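
The "no strong association" reading can be quantified with Cramér's V. This measure rescales the chi-square statistic of a contingency table to lie between 0 (no association) and 1 (perfect association). The minimal sketch below uses a hypothetical helper `cramers_v` written with plain NumPy and pandas. It applies no continuity correction and performs no significance test.

```python
import numpy as np
import pandas as pd

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V for two categorical series (hypothetical helper, no bias correction)."""
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence: row total * column total / n
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    r, c = observed.shape
    if min(r, c) < 2:
        return 0.0  # a single row or column carries no association
    return float(np.sqrt(chi2 / (n * (min(r, c) - 1))))

# Toy checks: perfectly associated vs fully independent variables
history = pd.Series(["No", "No", "Yes", "Yes"])
print(cramers_v(history, pd.Series([0, 0, 1, 1])))  # perfect association
print(cramers_v(history, pd.Series([0, 1, 0, 1])))  # no association
```

On the notebook's data, it could be called as `cramers_v(df["Family History"], df["Depression"])`. A value close to 0 would support the visual impression above.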

## 3 - Insights and Comparison with Existing Literature

I now compare the results of this data analysis with the study *Depression among Indian university students and its association with perceived university academic environment, living arrangements and personal issues*, published in the Asian Journal of Psychiatry (accepted in 2016). The study, carried out at Pondicherry University, is described as the first attempt in India to investigate depression among university students. A group of 717 students filled out a structured questionnaire, scored on a five-point Likert scale, to gauge their perceptions of the university academic environment, living arrangements, and personal issues.

The results show that 2.4% of participants were suffering from extremely severe depression and 13% from severe depression, with no significant difference by gender. Higher self-reported levels of depression were found among students with a lower CGPA, poor accommodation, a hostile family environment or a lower socio-economic background. The same held for students from the Humanities and Social Sciences departments and for students who felt academically stressed.

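These results are at least partially consistent with the present data analysis, which shows that depression correlates positively with **academic pressure, financial stress** and **longer work/study hours**, and **negatively with age and study satisfaction**. One notable difference concerns CGPA: the paper associates a lower CGPA with depression, whereas in this dataset depressed students have a higher average CGPA than non-depressed ones.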

- *Gender*: In our dataset, 44% of depressed students are female and 56% are male.
- *Age*: The largest numbers of depressed students are aged 20, 24 and 28. The count of depressed students declines with age, although this partly reflects the smaller number of older respondents.
- *Cities*: Kalyan, Srinagar, Vasai-Virar, Thane and Ludhiana account for a larger share of depressed students than their share of inhabitants within this group of cities.
- *Academic pressure*: depressed students who reported an academic pressure of 5 amount to almost 20% of all students, whereas depressed students who reported an academic pressure of 1 amount to only 3%. Of all depressed students, 57% reported high academic pressure (4 or 5).
- *CGPA*: depressed students have a higher average CGPA than non-depressed ones, and mean CGPA is fairly stable across ages.
- *Study satisfaction*: depressed students tend to be less satisfied with their studies.
- *Sleep duration*: depressed students most often sleep less than 5 hours, whereas non-depressed students most often sleep 7-8 hours or more than 8 hours.
- *Diet*: 45% of all depressed students have an unhealthy diet, compared with only 26% of non-depressed students. Of all students, 26% are both depressed and on an unhealthy diet. Among depressed students, more males than females report an unhealthy diet.
- *Degree*: 26% of all depressed students do not have a university degree. Across the whole sample, 15.4% of students are depressed and hold a high school diploma (Class 12) rather than a university degree. Further down, 3.7% of surveyed students are depressed and enrolled in a Bachelor's in Education.
- *Financial stress*: financial stress is higher for students with depression.
- *Family history*: there doesn't seem to be a strong association between those students that have a family history of depression and those who don't.


At the time the paper was published, Indian universities lacked counselling facilities where students could be adequately assisted by trained counsellors and psychologists.