# What accommodations best prevent suicide?
The dataset contains age-standardised suicide rates from 2015 and 2016 by country and year, as well as per-capita figures for resources and people, such as mental health units in hospitals, outpatient facilities, and psychiatrists working in the sector. Age is already accounted for, so this project's job is to look at everything else, so to speak. I focus on 2016 because it is the more recent year and has far fewer missing figures.
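
For anyone wondering what "age-standardised" buys us, here is a minimal sketch of direct standardisation: age-specific rates are averaged using the weights of a fixed reference population, so two countries with different age structures can be compared. The age bands, rates, and weights below are all made up, and I'm not claiming this is the exact standard population behind the dataset.

```python
# Hypothetical age bands, rates per 100 000, and standard-population weights.
age_specific_rates = {"15-29": 8.0, "30-49": 12.0, "50-69": 15.0, "70+": 20.0}
standard_weights = {"15-29": 0.35, "30-49": 0.35, "50-69": 0.20, "70+": 0.10}

# The weights describe shares of the reference population, so they must sum to 1.
assert abs(sum(standard_weights.values()) - 1.0) < 1e-9

# Weighted average of the age-specific rates.
age_standardised_rate = sum(
    age_specific_rates[band] * standard_weights[band] for band in age_specific_rates
)
print(f"{age_standardised_rate:.2f} per 100 000")
```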

Plenty of other factors are of course not in the dataset, such as income levels and (at least when comparing countries' *reported* numbers) how much suicide is stigmatised. So: 1) whether a country's rate is high or low for its region is as relevant as how it compares to the world average; 2) this analysis is almost entirely about things like numbers of facilities rather than regional effects anyway; and 3) remember that it covers only the factors presented, not the complete picture.

Warning: I will treat this subject with levity because I'm sad.

In [60]:
import pandas as pd
import plotly.express as px

In [61]:
df = pd.read_csv("suicide_dataset-11.csv")
df = df.replace({'Yes': True, 'No': False})
df2015 = df[df["year"] == 2015]
df2016 = df[df["year"] == 2016]

I cleaned the data in a spreadsheet program before I put it in here, but it's good to check things in code too.

In [62]:
df.duplicated().sum()

0

In [63]:
df2015.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 549 entries, 0 to 548
Data columns (total 22 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   country                            549 non-null    object 
 1   iso                                549 non-null    object 
 2   sex                                549 non-null    object 
 3   year                               549 non-null    int64  
 4   suicide_rate                       549 non-null    float64
 5   mental_hospitals_per_100k          60 non-null     float64
 6   general_h_units_per_100k           51 non-null     float64
 7   outpatient_facilities_per_100k     51 non-null     float64
 8   day_treatment_facilities_per_100k  30 non-null     float64
 9   comres_facilities_per_100k         36 non-null     float64
 10  psychiatrists_per_100k             54 non-null     float64
 11  nurses_per_100k                    48 non-null     float64

In [64]:
df2016.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 549 entries, 549 to 1097
Data columns (total 22 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   country                            549 non-null    object 
 1   iso                                549 non-null    object 
 2   sex                                549 non-null    object 
 3   year                               549 non-null    int64  
 4   suicide_rate                       549 non-null    float64
 5   mental_hospitals_per_100k          270 non-null    float64
 6   general_h_units_per_100k           303 non-null    float64
 7   outpatient_facilities_per_100k     291 non-null    float64
 8   day_treatment_facilities_per_100k  147 non-null    float64
 9   comres_facilities_per_100k         132 non-null    float64
 10  psychiatrists_per_100k             306 non-null    float64
 11  nurses_per_100k                    264 non-null    floa

Nice. And you can see what I mean with 2016 having more to work with. There were a few other years that were even more uselessly empty and I got rid of them beforehand, but 2015 at least deserved to be counted properly.
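
The two `info()` listings can also be lined up in one table. This is a minimal sketch using the same column names as the dataset; the tiny frame is made up for illustration and stands in for the real CSV.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real CSV: two years, two resource columns.
toy = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016, 2016, 2016],
    "suicide_rate": [10.0, 12.5, 7.1, 9.8, 12.0, 7.4],
    "psychiatrists_per_100k": [np.nan, np.nan, 1.2, 0.9, np.nan, 1.3],
    "nurses_per_100k": [np.nan, 3.0, np.nan, 2.5, 3.1, 4.0],
})

# count() ignores NaN, so each cell is the number of usable values for that year.
non_null_by_year = toy.groupby("year").count()
print(non_null_by_year)
```

On the real data, `df.groupby("year").count()` should give the same numbers as the two listings above, one row per year.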

In [65]:
# From here on, having no extra letter shall mean combined statistics and -g shall be separated by gender. That's the easiest way for me to remember things.
df2015g = df2015[df2015["sex"] != "Both"]
df2015 = df2015[df2015["sex"] == "Both"]
df2016g = df2016[df2016["sex"] != "Both"]
df2016 = df2016[df2016["sex"] == "Both"]

Here are the rates by country for 2016 (and the 2015 numbers aren't much different):

In [66]:
fig = px.choropleth(df2016, projection="winkel tripel", locations="iso", color="suicide_rate", color_continuous_scale=px.colors.sequential.Bluered,
                    labels={"suicide_rate": "Suicide rate"})
fig.show()

The rate is the number of suicides per 100 000 people over one year. This is also true of all of the other charts.

If it looks like there are more countries with lower rates, it's because there are. Here's a breakdown for 2016:

In [67]:
fig = px.histogram(df2016, x="suicide_rate")
fig.update_xaxes(title_text="Suicide rate")
fig.update_yaxes(title_text="Countries")
fig.show()
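
The lopsided shape of the histogram can be put into numbers too. A minimal sketch with made-up rates (not the real data): when most values are low and a few are high, the mean sits above the median and the skewness is positive.

```python
import pandas as pd

# Made-up rates for illustration only: most values low, a few high.
rates = pd.Series([3.1, 4.0, 4.8, 5.5, 6.2, 7.0, 8.9, 11.4, 18.7, 30.2])

mean_rate = rates.mean()
median_rate = rates.median()
skewness = rates.skew()  # positive means a longer tail towards high rates

print(f"mean={mean_rate:.2f}, median={median_rate:.2f}, skew={skewness:.2f}")
```

On the real data the same three calls would go on `df2016["suicide_rate"]`.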

## Time for a bunch of linear regressions!

In [68]:
fig = px.scatter(df2016, x="mental_hospitals_per_100k", y="suicide_rate", trendline="ols")
fig.update_xaxes(title_text="Mental hospitals per 100 000 people"); fig.update_yaxes(title_text="Suicide rate")
fig.show()

### Time for a bunch of linear regressions with arbitrary outlier cutoffs!

I'll save you all the other pre-filtering scatterplots. Also, most of the charts will have separate sex statistics since there are differences I find interesting. If a chart doesn't, the implication is they're similar enough that it's better off combined.
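
The cutoffs in the charts below were picked by eye. A less hand-picked alternative, which is not what I actually did, is to trim each column at a fixed quantile. The helper `trim_by_quantile` and the numbers here are hypothetical; it's a sketch of the idea, nothing more.

```python
import pandas as pd

def trim_by_quantile(frame, column, upper=0.95):
    """Keep rows whose value in `column` is at or below the `upper` quantile.

    Rows with a missing value in `column` are dropped, just as they are by
    the `<` comparisons used in the charts.
    """
    cutoff = frame[column].quantile(upper)
    return frame[frame[column] <= cutoff]

# Made-up numbers for illustration only; the last row is the obvious outlier.
toy = pd.DataFrame({
    "mental_hospitals_per_100k": [0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.12, 0.20, 0.30, 4.00],
    "suicide_rate": [5.0, 6.1, 7.3, 8.0, 9.4, 10.2, 9.9, 12.5, 11.0, 14.0],
})

trimmed = trim_by_quantile(toy, "mental_hospitals_per_100k", upper=0.9)
print(len(toy), "->", len(trimmed))
```

The quantile is still an arbitrary choice, but at least it is the same kind of arbitrary for every column.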

In [69]:
fig = px.scatter(df2016g[df2016g["mental_hospitals_per_100k"] < 0.5], x="mental_hospitals_per_100k", y="suicide_rate", trendline="ols", color="sex")
fig.update_xaxes(title_text="Mental hospitals per 100 000 people"); fig.update_yaxes(title_text="Suicide rate")
fig.update_layout(legend_title_text="Sex")
fig.show()

This seems like the most basic statistic there is, and that positive trend is worrying. Even if you look at that tempting river and cut it down to <0.1, the data make an annoying concave shape. But that's okay. This is a good time to mention that there are basically three "genres of accommodation" in the dataset:

* Facilities (per 100 000 people: mental hospitals, *beds in* mental hospitals, mental health units in general hospitals, *beds for mental health in* general hospitals, mental health outpatient facilities, mental health day treatment facilities, community residential facilities, and *beds in* community residential facilities)
* People (working in the mental health sector, per 100 000 people: psychiatrists, nurses, social workers, and psychologists)
* Government (fraction of government expenditure on mental health which was on mental hospitals, whether the country has a standalone law for mental health, when this law was enacted, whether the country has a standalone policy or plan for mental health, and when this policy was published)

Surely the number of beds is a better measure than the number of facilities, right?

In [70]:
fig = px.scatter(df2016g[df2016g["mental_h_beds_per_100k"] < 50], x="mental_h_beds_per_100k", y="suicide_rate", trendline="ols", color="sex")
fig.update_xaxes(title_text="Beds in mental hospitals per 100 000 people"); fig.update_yaxes(title_text="Suicide rate")
fig.update_layout(legend_title_text="Sex")
fig.show()

It's another positive trend for males and no real trend for females. A milder case, but still. And there's an even more tempting place than last time to cut off the already cut-off data, which would make the figure look even more useless. I know this is the opposite of data, but here's an anecdote I don't remember the details of: I saw someone say they lied to a doctor or psychologist of some kind to avoid being put in a mental hospital, because it wouldn't have been good for them. Seems like they aren't the only one. Or countries with a bigger problem tend to have more accommodations and this entire analysis is useless, but let's not make that our base assumption.
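
"A trend" and "a lack of a trend" can be read off as numbers instead of squinting at lines. This sketch fits a least-squares line per sex with `numpy.polyfit`, which I'd expect to match the `trendline="ols"` line, and reports Pearson's r next to the slope. The data and the helper `slope_and_r` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up numbers for illustration only.
toy = pd.DataFrame({
    "sex": ["Male"] * 5 + ["Female"] * 5,
    "mental_h_beds_per_100k": [1.0, 5.0, 10.0, 20.0, 40.0] * 2,
    "suicide_rate": [12.0, 14.0, 13.5, 17.0, 19.0, 4.0, 4.5, 3.8, 4.2, 4.1],
})

def slope_and_r(group, x_col, y_col):
    """Least-squares slope and Pearson r for one group, ignoring missing rows."""
    clean = group[[x_col, y_col]].dropna()
    slope, _intercept = np.polyfit(clean[x_col], clean[y_col], deg=1)
    r = clean[x_col].corr(clean[y_col])
    return pd.Series({"slope": slope, "r": r, "n": len(clean)})

# One row per sex, with columns slope, r, and n.
summary = pd.DataFrame({
    sex: slope_and_r(group, "mental_h_beds_per_100k", "suicide_rate")
    for sex, group in toy.groupby("sex")
}).T
print(summary)
```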

In [71]:
fig = px.scatter(df2016[df2016["general_h_beds_per_100k"] < 25], x="general_h_beds_per_100k", y="suicide_rate", trendline="ols")
fig.update_xaxes(title_text="Beds for mental health in general hospitals per 100 000 people"); fig.update_yaxes(title_text="Suicide rate")
fig.show()

With beds in general hospitals, the trend you see very much depends on where you cut the data off, if you find one at all. Now let's complete the triad of bed statistics:
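
To show what "depends on where you cut the data off" looks like, here is a sketch that refits the line at several cutoffs and prints the slope each time. The numbers are invented so that the sign flips; the helper `slope_below` is hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up numbers for illustration only: rates rise, then fall.
toy = pd.DataFrame({
    "general_h_beds_per_100k": [0.5, 1.0, 2.0, 3.0, 5.0, 8.0, 12.0, 18.0, 24.0],
    "suicide_rate": [9.0, 10.5, 11.0, 12.0, 11.5, 10.0, 9.5, 8.0, 7.5],
})

def slope_below(frame, x_col, y_col, cutoff):
    """Least-squares slope using only rows where x_col is below cutoff."""
    kept = frame[frame[x_col] < cutoff].dropna(subset=[x_col, y_col])
    slope, _intercept = np.polyfit(kept[x_col], kept[y_col], deg=1)
    return slope

cutoffs = [4, 10, 25]
slopes = {
    c: slope_below(toy, "general_h_beds_per_100k", "suicide_rate", c)
    for c in cutoffs
}
for cutoff, slope in slopes.items():
    print(f"cutoff < {cutoff}: slope = {slope:+.3f}")
```

If the sign changes across reasonable cutoffs, as it does in this toy example, the trend line is telling you more about the cutoff than about the data.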

In [72]:
fig = px.scatter(df2016g[df2016g["comres_beds_per_100k"] < 15], x="comres_beds_per_100k", y="suicide_rate", trendline="ols", color="sex")
fig.update_xaxes(title_text="Beds in community residential facilities per 100 000 people"); fig.update_yaxes(title_text="Suicide rate")
fig.update_layout(legend_title_text="Sex")
fig.show()

A negative trend! I had to look up "community residential facility", and different areas can't seem to agree on the exact definition, so I don't even know what this means. It's also another one where cutting it off again gives a confusing shape. But hooray!

To complete the Facilities category I've made up, we'll look at outpatient and day treatment facilities. It could be argued that these are similar enough to make a good combined statistic, but some countries have data on one and not the other in a given year.

In [73]:
fig = px.scatter(df2016[df2016["outpatient_facilities_per_100k"] < 5], x="outpatient_facilities_per_100k", y="suicide_rate", trendline="ols")
fig.update_xaxes(title_text="Outpatient facilities per 100 000 people"); fig.update_yaxes(title_text="Suicide rate")
fig.show()

It's a good thing the gridline at 10 is there or I wouldn't be able to tell which direction the trend line is going.

In [74]:
fig = px.scatter(df2016g[df2016g["day_treatment_facilities_per_100k"] < 2], x="day_treatment_facilities_per_100k", y="suicide_rate", trendline="ols", color="sex")
fig.update_xaxes(title_text="Day treatment facilities per 100 000 people"); fig.update_yaxes(title_text="Suicide rate")
fig.update_layout(legend_title_text="Sex")
fig.show()

Conclusion: going by these charts, mental hospitals and anything related to them look like a waste of resources.

### Scatterplots, part 2: Surely psychologists are useful?
But first, psychiatrists.

In [75]:
fig = px.scatter(df2016g[df2016g["psychiatrists_per_100k"] < 2], x="psychiatrists_per_100k", y="suicide_rate", trendline="ols", color="sex")
fig.update_xaxes(title_text="Psychiatrists per 100 000 people"); fig.update_yaxes(title_text="Suicide rate")
fig.update_layout(legend_title_text="Sex")
fig.show()

It's only for one gender, but we've found a negative trend that just might actually mean something. Although in absolute rather than percentage terms, the lines are very close, and this isn't the only chart that's true of. Is there a certain percentage of male suicides that are inevitable, or perhaps a certain type we haven't figured out how to get to? The WHO states confidently on their website that suicides are preventable, so probably the latter. Not that I'm trying to pretend the field didn't know about this gender difference already.
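
Here is one way to make the absolute-versus-percentage distinction concrete. The two lines below are hypothetical (intercept and slope invented, not read off the chart): both fall by the same amount, but because one starts higher, the percentage drops are very different.

```python
# Hypothetical fitted lines as (intercept, slope); not taken from the real chart.
lines = {
    "Male": (16.0, -1.0),
    "Female": (5.0, -1.0),
}
x_low, x_high = 0.0, 2.0  # psychiatrists per 100 000, the range kept in the chart

changes = {}
for sex, (intercept, slope) in lines.items():
    start = intercept + slope * x_low
    end = intercept + slope * x_high
    absolute = end - start             # change in suicides per 100 000
    relative = absolute / start * 100  # same change as a percentage of the start
    changes[sex] = (absolute, relative)
    print(f"{sex}: {absolute:+.1f} per 100 000, {relative:+.1f}%")
```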

In [76]:
fig = px.scatter(df2016g[df2016g["nurses_per_100k"] < 20], x="nurses_per_100k", y="suicide_rate", trendline="ols", color="sex")
fig.update_xaxes(title_text="Nurses per 100 000 people"); fig.update_yaxes(title_text="Suicide rate")
fig.update_layout(legend_title_text="Sex")
fig.show()

I'm not seeing that much. It makes sense: nurses aren't as strongly associated with suicide prevention as a few other occupations are.

In [77]:
fig = px.scatter(df2016g[df2016g["social_workers_per_100k"] < 1], x="social_workers_per_100k", y="suicide_rate", trendline="ols", color="sex")
fig.update_xaxes(title_text="Social workers per 100 000 people"); fig.update_yaxes(title_text="Suicide rate")
fig.update_layout(legend_title_text="Sex")
fig.show()

Now this one has a gender difference. Almost nothing for males, but for females we've got something that could be significant. This is probably the most interesting one.
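
I haven't actually tested whether it's significant. One way to check without many assumptions would be a permutation test: shuffle the suicide rates, recompute the correlation, and see how often chance alone gives something as strong as the observed value. This is a sketch with made-up numbers; `permutation_p_value` is a hypothetical helper, not something from the analysis above.

```python
import numpy as np

def permutation_p_value(x, y, n_shuffles=2000, seed=0):
    """Share of shuffles whose |Pearson r| is at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = abs(np.corrcoef(x, y)[0, 1])
    hits = 0
    for _ in range(n_shuffles):
        shuffled = rng.permutation(y)  # breaks any real link between x and y
        if abs(np.corrcoef(x, shuffled)[0, 1]) >= observed:
            hits += 1
    # The +1 counts the observed arrangement itself, so p is never exactly 0.
    return (hits + 1) / (n_shuffles + 1)

# Made-up numbers for illustration only.
social_workers = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.75, 0.90]
female_rate = [6.0, 5.8, 5.1, 5.3, 4.2, 4.0, 3.1, 2.9]

p = permutation_p_value(social_workers, female_rate)
print(f"p = {p:.4f}")
```

A small p only says the pattern is unlikely to be pure noise; it says nothing about which way the causation runs.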

In [78]:
fig = px.scatter(df2016g[df2016g["psychologists_per_100k"] < 4], x="psychologists_per_100k", y="suicide_rate", trendline="ols", color="sex")
fig.update_xaxes(title_text="Psychologists per 100 000 people"); fig.update_yaxes(title_text="Suicide rate")
fig.update_layout(legend_title_text="Sex")
fig.show()

Another one with a big percentage difference and much less of an absolute one. What does it mean? What are we missing?

Conclusion: don't pull all psychological education funding. The therapy sector does a little bit, maybe.

### Scatterplots, part 3: Government

In [79]:
fig = px.scatter(df2016g[df2016g["hospital_budget_pct"] < 5], x="hospital_budget_pct", y="suicide_rate", trendline="ols", color="sex")
fig.update_xaxes(title_text="Percent of government mental health budget used on mental hospitals"); fig.update_yaxes(title_text="Suicide rate")
fig.update_layout(legend_title_text="Sex")
fig.show()

Well, we already knew what good all those mental hospitals are.

In [80]:
fig = px.histogram(df2016, x="standalone_law", y="suicide_rate", barmode="group", histfunc="avg")
fig.update_xaxes(title_text="Government has standalone law regarding mental health"); fig.update_yaxes(title_text="Suicide rate (averaged by country)")
fig.show()

Uhhhh.

In [81]:
fig = px.histogram(df2016, x="standalone_policy", y="suicide_rate", barmode="group", histfunc="avg")
fig.update_xaxes(title_text="Government has standalone policy or plan for mental health"); fig.update_yaxes(title_text="Suicide rate (averaged by country)")
fig.show()

Oh, look, having a plan makes it worse. Or, again, those are the countries that need it. Remember that this whole document could have the wrong idea.
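
The bars only show an average, so it helps to see how many countries are behind each one. A minimal sketch with made-up numbers, using the same column names as the dataset:

```python
import numpy as np
import pandas as pd

# Made-up numbers for illustration only; NaN marks a country with no answer.
toy = pd.DataFrame({
    "standalone_policy": [True, True, True, False, False, np.nan],
    "suicide_rate": [11.0, 9.0, 13.0, 8.0, 6.0, 20.0],
})

# groupby drops rows whose key is NaN by default, so the last row is excluded.
summary = toy.groupby("standalone_policy")["suicide_rate"].agg(["mean", "median", "count"])
print(summary)
```

If one group turns out to have only a handful of countries, its average deserves even less trust than the rest of this document.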

## Conclusion
If you're considering suicide, talk to a professional. It's the only thing that does anything.