## Mental Illness Prevalence
In this notebook we use the "Our World in Data" Mental Illness Prevalence dataset, which can be accessed [here](https://ourworldindata.org/grapher/mental-illnesses-prevalence). The dataset gives the estimated share of people with each mental illness in each year from 1990 to 2019, whether or not they were diagnosed, based on representative surveys, medical data and statistical modeling.

The included illnesses are Schizophrenia, Depressive disorders, Anxiety disorders, Bipolar disorder, and Eating disorders. For each illness, we select the country whose prevalence varies the most across the years (five countries in total) and draw a stacked bar plot of all five illnesses for these countries. We make 30 such plots, one per year from 1990 to 2019, to see the change over time.

We also make an individual plot for each selected country, showing only the illness for which it has the highest variance, to see this change over time.

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [5]:
df = pd.read_csv("../dat/mental-illnesses-prevalence.csv")
selected_columns = ['Entity', 'Code', 'Year', 'Schizophrenia', 'Depressive disorders', 'Anxiety disorders', 'Bipolar disorder', 'Eating disorders']
df.columns = selected_columns
df

Unnamed: 0,Entity,Code,Year,Schizophrenia,Depressive disorders,Anxiety disorders,Bipolar disorder,Eating disorders
0,Afghanistan,AFG,1990,0.223206,4.996118,4.713314,0.703023,0.127700
1,Afghanistan,AFG,1991,0.222454,4.989290,4.702100,0.702069,0.123256
2,Afghanistan,AFG,1992,0.221751,4.981346,4.683743,0.700792,0.118844
3,Afghanistan,AFG,1993,0.220987,4.976958,4.673549,0.700087,0.115089
4,Afghanistan,AFG,1994,0.220183,4.977782,4.670810,0.699898,0.111815
...,...,...,...,...,...,...,...,...
6415,Zimbabwe,ZWE,2015,0.201042,3.407624,3.184012,0.538596,0.095652
6416,Zimbabwe,ZWE,2016,0.201319,3.410755,3.187148,0.538593,0.096662
6417,Zimbabwe,ZWE,2017,0.201639,3.411965,3.188418,0.538589,0.097330
6418,Zimbabwe,ZWE,2018,0.201976,3.406929,3.172111,0.538585,0.097909


In [12]:
# Not every country has data for every year, so we extract the list of countries and compute each one's standard deviations separately
countries = np.array(list(set(df["Entity"])))
stds = []
for country in countries:
    country_data = df[df['Entity'] == country]
    sch = country_data['Schizophrenia'].std(axis=0)
    dep = country_data['Depressive disorders'].std(axis=0)
    anx = country_data['Anxiety disorders'].std(axis=0)
    bip = country_data['Bipolar disorder'].std(axis=0)
    eat = country_data['Eating disorders'].std(axis=0)
    stds.append([sch, dep, anx, bip, eat])
stds = np.array(stds)
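
The comment in the cell above notes that some countries lack data for some years. A minimal sketch of how that coverage could be checked is given below. It runs on a small made-up table (`toy`, with hypothetical entities `A` and `B`), not on the real dataset, and the names `expected_years`, `years_per_entity` and `incomplete` are introduced only for illustration.

```python
import pandas as pd

# Toy stand-in for the prevalence table (hypothetical values, not from the dataset)
toy = pd.DataFrame({
    "Entity": ["A", "A", "A", "B", "B"],
    "Year":   [1990, 1991, 1992, 1990, 1992],
})

# The years we expect every entity to cover in this toy example
expected_years = set(range(1990, 1993))

# Number of distinct years recorded for each entity
years_per_entity = toy.groupby("Entity")["Year"].nunique()

# Entities that are missing at least one expected year
incomplete = years_per_entity[years_per_entity < len(expected_years)]
print(incomplete)
```

On the real data, the same check would use `df` in place of `toy` and `range(1990, 2020)` as the expected years.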

In [16]:
# For each illness, select the country with the highest standard deviation over time: Schizophrenia in Equatorial Guinea,
# Depressive disorders in Cuba, Anxiety disorders in Brazil, Bipolar disorder in Argentina, and Eating disorders in Australia
selected_countries = countries[np.argmax(stds, axis=0)]
selected_countries

array(['Equatorial Guinea', 'Cuba', 'Brazil', 'Argentina', 'Australia'],
      dtype='<U32')
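
The same selection can probably be expressed more compactly with `groupby` and `idxmax`. The sketch below is an alternative formulation, not the code used in this notebook, and it runs on a small made-up table with two hypothetical entities and two of the five illness columns. Ranking by standard deviation gives the same order as ranking by variance.

```python
import pandas as pd

# Toy stand-in for the prevalence table (hypothetical values, not from the dataset)
toy = pd.DataFrame({
    "Entity": ["A", "A", "A", "B", "B", "B"],
    "Year": [1990, 1991, 1992] * 2,
    "Schizophrenia": [0.20, 0.20, 0.20, 0.10, 0.20, 0.30],
    "Eating disorders": [0.10, 0.50, 0.90, 0.30, 0.30, 0.30],
})
illness_columns = ["Schizophrenia", "Eating disorders"]

# Per-entity sample standard deviation of each illness (pandas default, ddof=1)
stds_by_entity = toy.groupby("Entity")[illness_columns].std()

# For each illness, the entity whose prevalence varies the most over the years
most_variable = stds_by_entity.idxmax()
print(most_variable)
```

`idxmax` returns the entity labels directly, so no separate array of country names is needed, and it skips missing values by default.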

In [None]:
# We make a stacked bar plot of the selected countries for each year from 1990 to 2019.
for year in range(1990,2020):
    selected_columns = ['Schizophrenia', 'Depressive disorders', 'Anxiety disorders', 'Bipolar disorder', 'Eating disorders']

    # Filter data for the selected countries in the current year
    filtered_data = df[df['Entity'].isin(selected_countries) & (df['Year'] == year)]

    # Index by 'Entity' (country); with one row per country and year, the mean leaves the values unchanged
    grouped_data = filtered_data.groupby('Entity')[selected_columns].mean()

    # The values are already shares of the population, so they are plotted without further normalization
    grouped_data_percentage = grouped_data

    # Plotting stacked bar chart
    grouped_data_percentage.plot(kind='bar', stacked=True, colormap='viridis')
    plt.title('Estimated share of people with each mental illness in selected countries ('+str(year)+')')
    plt.xlabel('Country')
    plt.ylabel('Estimated share')
    plt.ylim(0, 15)
    plt.legend(loc='upper left', bbox_to_anchor=(1, 1))
    plt.savefig("prevalence_"+str(year)+".png", dpi=300, bbox_inches='tight')
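
The stacked bars above show the absolute share of the population for each illness. If the relative contribution of each illness within a country were of interest instead, each row could be rescaled to sum to 100. The sketch below shows one way to do that on a made-up table; the values, the entities `A` and `B`, and the name `relative_share` are hypothetical and not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for grouped_data: one row per country, one column per illness
# (hypothetical values, not from the dataset)
grouped = pd.DataFrame(
    {"Depressive disorders": [4.0, 3.0], "Anxiety disorders": [4.0, 1.0]},
    index=pd.Index(["A", "B"], name="Entity"),
)

# Divide each row by its own total, then scale to percent
relative_share = grouped.div(grouped.sum(axis=1), axis=0) * 100
print(relative_share)
```

With rows normalized this way, every bar would reach 100, so the `plt.ylim(0, 15)` call would need to change accordingly.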

In [18]:
# For each illness, we plot the estimated share over time in the country with the highest variance for that illness
years = range(1990,2020)
selected_columns = ['Schizophrenia', 'Depressive disorders', 'Anxiety disorders', 'Bipolar disorder', 'Eating disorders']
for i in range(len(selected_columns)):
    illness = selected_columns[i]
    country = selected_countries[i]
    
    # Select this country's rows and plot the illness against the years actually recorded
    country_data = df[df['Entity'] == country]
    plt.plot(country_data['Year'], country_data[illness])
    # Label the plot and the file with the full year range rather than a single year
    plt.title('Estimated share of people with '+illness+' in '+country+' ('+str(years[0])+'-'+str(years[-1])+')')
    plt.savefig(country+"_"+illness+"_"+str(years[0])+"-"+str(years[-1])+".png", dpi=300, bbox_inches='tight')
    plt.close('all')