# Content
Data from a study of serious suicide attempts over three years in a predominantly rural population in Shandong, China.

## Format
A data frame with 2571 observations on the following 11 variables.

- Person_ID: ID number
- Hospitalised: yes/no
- Died: yes/no
- Urban: yes/no/unknown
- Year: 2009, 2010, or 2011
- Month: 1 = January through 12 = December
- Sex: female/male
- Age: years
- Education: Education level (illiterate, primary, secondary, tertiary, or unknown)
- Occupation: one of 10 occupation categories
- Method: one of 9 methods

## Source
Sun J, Guo X, Zhang J, Wang M, Jia C, Xu A (2015) "Incidence and fatality of serious suicide attempts in a predominantly rural population in Shandong, China: a public health surveillance study," BMJ Open 5(2): e006762. https://doi.org/10.1136/bmjopen-2014-006762

## Questions:
1. How many survived and how many died, among hospitalised and non-hospitalised cases?
2. What year had the highest suicide attempts and deaths?
3. What is the year with the highest number of deaths from attempted suicide, considering hospitalised and not hospitalised cases?
4. What month had the highest suicide attempts and deaths?
5. What gender had the highest suicide attempts and deaths?
6. What occupation had the highest suicide attempts and deaths?
7. What age group had the highest suicide attempts and deaths?
8. Which method was most commonly used in suicide attempts?
9. What method of suicide has the highest fatality rate?
10. What is the number of suicide attempts and deaths among individuals with different levels of education, such as primary, secondary, and tertiary?

## Conclusion

In conclusion, the dataset highlights the serious issue of attempted suicide, with an overall fatality rate of 48.85%. All 1018 individuals who were not hospitalised died (a 100% fatality rate), while hospitalised individuals had a fatality rate of 15.33%, which corresponds to a survival rate of 84.67%.

It is deeply concerning that drowning, hanging, and jumping had reported fatality rates of 100%, 97.22%, and 80%, respectively. This highlights the crucial need to increase awareness about the risks associated with different suicide methods and implement effective prevention measures.

Pesticide ingestion was the most commonly used method, accounting for 68.77% of cases, followed by hanging at 16.76%. Pesticide ingestion had a high fatality rate of 43.55%, underscoring the urgent need to address this issue and prevent its use as a suicide method.

Notably, farmers accounted for 79.04% of the total 2571 suicide attempts, which helps to explain the high prevalence of pesticide ingestion as a method of suicide.

Given the sensitivity of this issue, it is essential to approach it with empathy and compassion and emphasize the importance of providing support and resources to individuals struggling with suicidal thoughts. Preventing suicide requires a multifaceted approach, including early identification, intervention, and ongoing support.

In [1]:
import pandas as pd

The data is in CSV format. I'll load it into SQLite and then query it to answer my questions.

In [2]:
df = pd.read_csv(r'SuicideChina.csv').drop(columns=["Unnamed: 0"])
df.columns = df.columns.str.lower()
df.sample(5)

,person_id,hospitalised,died,urban,year,month,sex,age,education,occupation,method
1921,1922,yes,yes,no,2009,6,male,61,Secondary,farming,Pesticide
2445,2446,no,yes,no,2011,12,male,56,Secondary,farming,Pesticide
237,238,yes,no,no,2009,1,female,89,primary,others/unknown,Other poison
546,547,yes,no,no,2010,6,male,19,unknown,others/unknown,Pesticide
474,475,yes,yes,no,2009,8,female,86,iliterate,farming,Pesticide


In [3]:
df.nunique()

person_id       2571
hospitalised       2
died               2
urban              3
year               3
month             12
sex                2
age               87
education          5
occupation        10
method             9
dtype: int64

In [4]:
df.describe().round(2)

,person_id,year,month,age
count,2571.0,2571.0,2571.0,2571.0
mean,1286.0,2010.05,6.3,52.63
std,742.33,0.79,3.2,19.78
min,1.0,2009.0,1.0,12.0
25%,643.5,2009.0,4.0,37.0
50%,1286.0,2010.0,6.0,53.0
75%,1928.5,2011.0,9.0,69.0
max,2571.0,2011.0,12.0,100.0


In [5]:
df.education.value_counts()

Secondary    1280
primary       659
iliterate     533
unknown        80
Tertiary       19
Name: education, dtype: int64

There is a typo in our data: "iliterate" should be "illiterate". We'll fix this before creating the database.

In [6]:
df.education = df.education.replace(to_replace='iliterate', value='illiterate')
df.education.value_counts()

Secondary     1280
primary        659
illiterate     533
unknown         80
Tertiary        19
Name: education, dtype: int64

# Data Wrangling

In [7]:
%load_ext sql
%sql sqlite:///suicide_attempts.db

## Creating the database

With the SQL extension loaded in Jupyter, let's create the table that will hold the data.

In [8]:
%%sql sqlite://

CREATE TABLE shangdong_china (
    Person_ID INTEGER PRIMARY KEY,
    Hospitalised TEXT COLLATE NOCASE CHECK(Hospitalised IN ('yes', 'no')),
    Died TEXT COLLATE NOCASE CHECK(Died IN ('yes', 'no')),
    Urban TEXT COLLATE NOCASE CHECK(Urban IN ('yes', 'no', 'unknown')),
    Year INTEGER CHECK(Year IN (2009, 2010, 2011)),
    Month INTEGER CHECK(Month BETWEEN 1 AND 12),
    Sex TEXT COLLATE NOCASE CHECK(Sex IN ('female', 'male')),
    Age INTEGER,
    Education TEXT COLLATE NOCASE CHECK(Education IN ('illiterate', 'primary', 'secondary', 'tertiary', 'unknown')),
    Occupation TEXT,
    Method TEXT
)

Done.


[]

The constraints in the table definition ensure that the data loaded from the **CSV** file matches the documented format. I also want the string comparisons to be case-insensitive, because the file mixes spellings such as Secondary and primary. To achieve this, the text columns are declared with **COLLATE NOCASE**, so the **CHECK** constraints compare values without regard to case.
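To make the effect of the collation concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The one-column `demo` table is a hypothetical stand-in for the real table, not part of the notebook. It shows mixed-case values passing the check and the misspelled value from the raw CSV being rejected.

```python
import sqlite3

# In-memory database; the "demo" table is a hypothetical one-column stand-in.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE demo (
        Education TEXT COLLATE NOCASE
            CHECK(Education IN ('illiterate', 'primary', 'secondary', 'tertiary', 'unknown'))
    )
    """
)

# Mixed-case spellings pass because the column collation is NOCASE.
conn.execute("INSERT INTO demo VALUES ('Secondary')")
conn.execute("INSERT INTO demo VALUES ('primary')")

# A value outside the list (the misspelling from the raw CSV) is rejected.
try:
    conn.execute("INSERT INTO demo VALUES ('iliterate')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

accepted = [row[0] for row in conn.execute("SELECT Education FROM demo ORDER BY rowid")]
print(accepted, rejected)
```

The rejected insert is also why the typo has to be fixed in pandas before the rows are loaded.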

Let's view the table that we created.

In [9]:
%%sql sqlite://

PRAGMA table_info(shangdong_china)

Done.


cid,name,type,notnull,dflt_value,pk
0,Person_ID,INTEGER,0,,1
1,Hospitalised,TEXT,0,,0
2,Died,TEXT,0,,0
3,Urban,TEXT,0,,0
4,Year,INTEGER,0,,0
5,Month,INTEGER,0,,0
6,Sex,TEXT,0,,0
7,Age,INTEGER,0,,0
8,Education,TEXT,0,,0
9,Occupation,TEXT,0,,0
10,Method,TEXT,0,,0


## Inserting the data

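Now I'll insert the CSV data into the SQLite table using the pandas .to_sql method. With if_exists='append', the rows go into the table created above, so its constraints still apply.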
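For readers without pandas, the same load can be sketched with the standard library alone. This is a swapped-in technique, not what the notebook runs. It assumes an in-memory database, a simplified table without the constraints, and two rows copied from the sample output earlier in place of the real SuicideChina.csv file.

```python
import csv
import io
import sqlite3

# Two rows copied from the sample output, standing in for SuicideChina.csv.
csv_text = """person_id,hospitalised,died,urban,year,month,sex,age,education,occupation,method
1922,yes,yes,no,2009,6,male,61,Secondary,farming,Pesticide
2446,no,yes,no,2011,12,male,56,Secondary,farming,Pesticide
"""

conn = sqlite3.connect(":memory:")
# Simplified schema (constraints omitted to keep the sketch short).
conn.execute(
    """
    CREATE TABLE shangdong_china (
        Person_ID INTEGER PRIMARY KEY, Hospitalised TEXT, Died TEXT, Urban TEXT,
        Year INTEGER, Month INTEGER, Sex TEXT, Age INTEGER,
        Education TEXT, Occupation TEXT, Method TEXT
    )
    """
)

# DictReader yields one dict per row, keyed by the CSV header names.
rows = list(csv.DictReader(io.StringIO(csv_text)))
conn.executemany(
    """
    INSERT INTO shangdong_china VALUES
        (:person_id, :hospitalised, :died, :urban, :year, :month,
         :sex, :age, :education, :occupation, :method)
    """,
    rows,
)
conn.commit()

inserted = conn.execute("SELECT COUNT(*) FROM shangdong_china").fetchone()[0]
year_type = conn.execute("SELECT typeof(Year) FROM shangdong_china LIMIT 1").fetchone()[0]
print(inserted, year_type)
```

Because the rows are appended to an existing table, the column types declared in CREATE TABLE decide how the CSV strings are stored.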

In [10]:
import sqlite3

conn = sqlite3.connect('suicide_attempts.db')
df.to_sql('shangdong_china', conn, if_exists='append', index=False)

2571

## Cleaning the data

Let's look at the table. I'll sample rows at random to get a feel for what the data contains.

In [11]:
%%sql sqlite://

SELECT * 
FROM shangdong_china 
ORDER BY RANDOM() 
LIMIT 10

Done.


Person_ID,Hospitalised,Died,Urban,Year,Month,Sex,Age,Education,Occupation,Method
2018,yes,yes,no,2009,3,female,84,Secondary,farming,Pesticide
1770,no,yes,no,2009,8,female,83,primary,farming,Hanging
1506,no,yes,no,2011,4,male,72,illiterate,farming,Hanging
1520,yes,no,no,2009,8,female,53,primary,household,Pesticide
1776,no,yes,no,2010,4,male,60,primary,farming,Hanging
2417,no,yes,no,2010,1,male,65,Secondary,farming,Hanging
1034,no,yes,no,2010,10,male,74,illiterate,farming,Hanging
1672,yes,no,no,2009,9,female,73,primary,farming,Pesticide
1938,no,yes,no,2010,9,male,45,Secondary,farming,Hanging
1755,yes,yes,no,2011,11,male,73,primary,farming,Pesticide


### Checking for null values

#### Null values observation

This table doesn't contain any null values.

In [12]:
%%sql sqlite://
SELECT *
FROM shangdong_china
WHERE Person_ID IS NULL
OR Hospitalised IS NULL
OR Died IS NULL
OR Urban IS NULL
OR Year IS NULL
OR Month IS NULL
OR Sex IS NULL
OR Age IS NULL
OR Education IS NULL
OR Occupation IS NULL
OR Method IS NULL

Done.


Person_ID,Hospitalised,Died,Urban,Year,Month,Sex,Age,Education,Occupation,Method


### Checking for duplicates

#### Duplicates observation

Repeated values are expected in most columns, such as year and month of the suicide attempt, hospitalised, died, urban, sex, education, occupation, and method, so there is no need to drop any rows on that basis. Only Person_ID must be unique.

In [13]:
%%sql sqlite://
SELECT Person_ID, COUNT(Person_ID) as duplicates
FROM shangdong_china
GROUP BY Person_ID
HAVING COUNT(Person_ID) > 1

Done.


Person_ID,duplicates


There are no duplicates in Person_ID, as expected for a unique ID.

# Exploratory Data Analysis

In [14]:
%%sql sqlite://

SELECT * 
FROM shangdong_china 
ORDER BY RANDOM() 
LIMIT 5

Done.


Person_ID,Hospitalised,Died,Urban,Year,Month,Sex,Age,Education,Occupation,Method
920,no,yes,yes,2010,6,male,57,Secondary,household,Jumping
1168,yes,no,no,2011,9,male,24,Secondary,farming,Pesticide
2014,yes,no,no,2009,11,male,58,Secondary,farming,Pesticide
971,yes,yes,yes,2011,11,male,27,Secondary,others/unknown,Poison unspec
1794,yes,no,no,2009,5,female,25,Secondary,farming,Pesticide


## Question: How many survived and how many died, among hospitalised and non-hospitalised cases?

### Answer:

A total of **2571 suicide attempts** were recorded, with **1256 resulting in death**, and **1315 resulting in survival**.

**1553 people were hospitalised** and **1018 people were not hospitalised**. Out of those hospitalised, **238 individuals died** while **1315 individuals survived**. On the other hand, out of the **1018 people who were not hospitalised, all of them died**.

In [15]:
%%sql sqlite://
    
SELECT 
    COUNT(*) as suicide_attempts, 
    SUM(CASE WHEN Died = 'no' THEN 1 ELSE 0 END) as survivors,
    SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) as deaths,
    SUM(CASE WHEN Hospitalised = 'no' THEN 1 ELSE 0 END) as non_hospitalised,
    SUM(CASE WHEN Hospitalised = 'yes' THEN 1 ELSE 0 END) as hospitalised,
    SUM(CASE WHEN Hospitalised = 'no' AND Died = 'yes' THEN 1 ELSE 0 END) as non_hospitalised_deaths,
    SUM(CASE WHEN Hospitalised = 'yes' AND Died = 'yes' THEN 1 ELSE 0 END) as hospitalised_deaths,
    SUM(CASE WHEN Hospitalised = 'no' AND Died = 'no' THEN 1 ELSE 0 END) as non_hospitalised_survivors,
    SUM(CASE WHEN Hospitalised = 'yes' AND Died = 'no' THEN 1 ELSE 0 END) as hospitalised_survivors
FROM shangdong_china

Done.


suicide_attempts,survivors,deaths,non_hospitalised,hospitalised,non_hospitalised_deaths,hospitalised_deaths,non_hospitalised_survivors,hospitalised_survivors
2571,1315,1256,1018,1553,1018,238,0,1315


## Question: What year had the highest suicide attempts and deaths?

### Answer:
The year with the highest number of suicide attempts was **2010**, with a total of **956 attempts**. This was followed by **2011** with **866 attempts**, and **2009** with **749 attempts**. **2010** also had the most deaths, **487**, as the per-year breakdown in the next question shows.

In [16]:
%%sql sqlite://

SELECT 
    Year, 
    COUNT(*) as suicide_attempts
FROM shangdong_china 
GROUP BY Year

Done.


Year,suicide_attempts
2009,749
2010,956
2011,866


## Question: What is the year with the highest number of deaths from attempted suicide, considering hospitalised and not hospitalised cases?

### Answer:
Overall, **2010** had the most deaths from attempted suicide, **487**. The number of deaths **without hospitalization** was also highest in **2010** at **398**, while deaths among those who were **hospitalized** peaked in **2011** at **102**.

In [17]:
%%sql sqlite://

SELECT 
    Year, 
    COUNT(*) as suicide_attempts, 
    SUM(CASE WHEN Died = 'no' THEN 1 ELSE 0 END) as survivors,
    SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) as deaths,
    SUM(CASE WHEN Hospitalised = 'no' THEN 1 ELSE 0 END) as non_hospitalised,
    SUM(CASE WHEN Hospitalised = 'yes' THEN 1 ELSE 0 END) as hospitalised,
    SUM(CASE WHEN Hospitalised = 'no' AND Died = 'yes' THEN 1 ELSE 0 END) as non_hospitalised_deaths,
    SUM(CASE WHEN Hospitalised = 'yes' AND Died = 'yes' THEN 1 ELSE 0 END) as hospitalised_deaths,
    SUM(CASE WHEN Hospitalised = 'no' AND Died = 'no' THEN 1 ELSE 0 END) as non_hospitalised_survivors,
    SUM(CASE WHEN Hospitalised = 'yes' AND Died = 'no' THEN 1 ELSE 0 END) as hospitalised_survivors,
    ROUND((CAST(SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*)) * 100, 2) as fatality_rate
FROM shangdong_china 
GROUP BY Year

Done.


Year,suicide_attempts,survivors,deaths,non_hospitalised,hospitalised,non_hospitalised_deaths,hospitalised_deaths,non_hospitalised_survivors,hospitalised_survivors,fatality_rate
2009,749,423,326,279,470,279,47,0,423,43.52
2010,956,469,487,398,558,398,89,0,469,50.94
2011,866,423,443,341,525,341,102,0,423,51.15


## Question: What month had the highest suicide attempts and deaths?

### Answer:

Between 2009 and 2011, **June** recorded the **highest number of suicide attempts** with a total of **284**, while **December** had the **lowest** with **137 attempts**. **May** recorded the **highest number of deaths** with **135**, while **November** had the **lowest** with **75 deaths**. 

**March** had the **highest fatality rate** for attempted suicide at **60.53%**.

In [18]:
%%sql sqlite://

SELECT 
    Month, 
    COUNT(*) as suicide_attempts, 
    SUM(CASE WHEN Died = 'no' THEN 1 ELSE 0 END) as total_survivors,
    SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) as total_deaths,
    ROUND((CAST(SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*)) * 100, 2) as fatality_rate
FROM shangdong_china 
GROUP BY Month

Done.


Month,suicide_attempts,total_survivors,total_deaths,fatality_rate
1,201,101,100,49.75
2,208,103,105,50.48
3,190,75,115,60.53
4,208,102,106,50.96
5,263,128,135,51.33
6,284,154,130,45.77
7,247,144,103,41.7
8,229,141,88,38.43
9,241,134,107,44.4
10,211,100,111,52.61
11,152,77,75,49.34
12,137,56,81,59.12


## Question: What gender had the highest suicide attempts and deaths?

### Answer:
In terms of **suicide attempts**, **females had a higher number than males**, with **1328 attempts** compared to **1243 attempts for males**. However, when it comes to **deaths**, **males had a higher number** with a total of **669 deaths**, while **females** had **587 deaths**.

In [19]:
%%sql sqlite://

SELECT 
    Sex, 
    COUNT(*) as suicide_attempts, 
    SUM(CASE WHEN Died = 'no' THEN 1 ELSE 0 END) as total_survivors,
    SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) as total_deaths,
    ROUND((CAST(SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*)) * 100, 2) as fatality_rate
FROM shangdong_china 
GROUP BY Sex
ORDER BY suicide_attempts DESC

Done.


Sex,suicide_attempts,total_survivors,total_deaths,fatality_rate
female,1328,741,587,44.2
male,1243,574,669,53.82


## Question: What occupation had the highest suicide attempts and deaths?

### Answer:

The occupation with the highest number of suicide attempts and deaths was **farming**, with a total of **2032 attempts** and **1093 deaths**. At the other end, **retirees** and the **others** category had the fewest attempts, with **3 each**; no retiree deaths were recorded.

**Workers** and the **others** category had the **highest fatality rate** at **100%**, although these groups contain only 6 and 3 cases respectively. **Professionals** followed with a rate of **70.27%**.

In [20]:
%%sql sqlite://

SELECT 
    Occupation, 
    COUNT(*) as suicide_attempts, 
    SUM(CASE WHEN Died = 'no' THEN 1 ELSE 0 END) as total_survivors,
    SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) as total_deaths,
    ROUND((CAST(SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*)) * 100, 2) as fatality_rate
FROM shangdong_china 
GROUP BY Occupation
ORDER BY suicide_attempts DESC

Done.


Occupation,suicide_attempts,total_survivors,total_deaths,fatality_rate
farming,2032,939,1093,53.79
household,248,154,94,37.9
others/unknown,156,146,10,6.41
professional,37,11,26,70.27
student,35,27,8,22.86
unemployed,30,23,7,23.33
business/service,21,12,9,42.86
worker,6,0,6,100.0
retiree,3,3,0,0.0
others,3,0,3,100.0


## Question: What age group had the highest suicide attempts and deaths?

### Answer:

The age group with the most suicide attempts is **old age (60+)**, with a total of **982 attempts** and **726 deaths**, while the lowest is **adolescence/teenage (10-19)**, with a total of **65 attempts** and **14 deaths**.

I'll group ages into the following brackets:
- adolescence or teenage years (10-19)
- young or early adulthood (20-39)
- middle adulthood (40-59)
- old age (60+)
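
As a quick sanity check on the bracket boundaries, the grouping can be sketched as a plain Python function. The name `age_bracket` is a hypothetical helper introduced only for illustration; the sample ages include the minimum (12), median (53), and maximum (100) reported by df.describe() earlier.

```python
def age_bracket(age):
    """Return the bracket label for an age given in whole years."""
    if 10 <= age <= 19:
        return "10-19"
    if 20 <= age <= 39:
        return "20-39"
    if 40 <= age <= 59:
        return "40-59"
    if age >= 60:
        return "60+"
    # Ages below 10 fall outside every bracket.
    return "Unknown"


# Boundary ages plus the min, median, and max seen in the data.
sample_ages = [12, 19, 20, 53, 60, 100]
labels = [age_bracket(a) for a in sample_ages]
print(labels)
```

Since the youngest person in the data is 12, the "Unknown" branch should never be hit, which matches the four brackets in the query output.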

In [21]:
%%sql sqlite://

SELECT 
    CASE 
        WHEN Age >= 10 AND Age <= 19 THEN '10-19' 
        WHEN Age >= 20 AND Age <= 39 THEN '20-39'
        WHEN Age >= 40 AND Age <= 59 THEN '40-59'
        WHEN Age >= 60 THEN '60+' 
        ELSE 'Unknown' 
    END AS age_bracket,
    COUNT(*) AS suicide_attempts,
    SUM(CASE WHEN Died = 'no' THEN 1 ELSE 0 END) as total_survivors,
    SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) as total_deaths,
    ROUND((CAST(SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*)) * 100, 2) as fatality_rate
FROM shangdong_china
GROUP BY age_bracket
ORDER BY age_bracket

Done.


age_bracket,suicide_attempts,total_survivors,total_deaths,fatality_rate
10-19,65,51,14,21.54
20-39,654,518,136,20.8
40-59,870,490,380,43.68
60+,982,256,726,73.93


## Question: Which method was most commonly used in suicide attempts?

### Answer:

The most commonly used method for suicide attempts was **Pesticide**, with **1768 attempts**. The least common named method was **Jumping**, with only **15 attempts**; the catch-all **Others** category had a single case.

In [22]:
%%sql sqlite://

SELECT Method, COUNT(*) as suicide_attempts
FROM shangdong_china 
GROUP BY Method

Done.


Method,suicide_attempts
Cutting,29
Drowning,26
Hanging,431
Jumping,15
Other poison,146
Others,1
Pesticide,1768
Poison unspec,107
unspecified,48


## Question: What method of suicide has the highest fatality rate?

### Answer:

**Drowning** had the **highest fatality rate** at **100%**, followed by **hanging** at **97.22%**. At the other end, **poisoning (unspecified)** had the **lowest fatality rate** among methods with more than one case, at **2.8%**; the single case recorded as **Others** survived.

Fatality rate = (deaths / (cases)) * 100
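
The formula is easy to verify by hand. Below is a small sketch that applies it to the death and case counts reported for three methods in this section's query output; `fatality_rate` is a hypothetical helper name, not something defined elsewhere in the notebook.

```python
def fatality_rate(deaths, cases):
    """Percentage of cases that ended in death, rounded to two decimals."""
    return round(deaths / cases * 100, 2)


# (deaths, cases) per method, taken from the query output for this question.
counts = {
    "Drowning": (26, 26),
    "Hanging": (419, 431),
    "Jumping": (12, 15),
}

rates = {method: fatality_rate(d, c) for method, (d, c) in counts.items()}
print(rates)
```

The SQL query computes the same quantity with a conditional SUM divided by COUNT(*), casting to FLOAT so the division is not truncated to an integer.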

In [23]:
%%sql sqlite://

SELECT 
    Method, 
    COUNT(*) as suicide_attempts, 
    SUM(CASE WHEN Died = 'no' THEN 1 ELSE 0 END) as total_survivors,
    SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) as total_deaths,
    ROUND((CAST(SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*)) * 100, 2) as fatality_rate
FROM shangdong_china 
GROUP BY Method
ORDER BY fatality_rate DESC

Done.


Method,suicide_attempts,total_survivors,total_deaths,fatality_rate
Drowning,26,0,26,100.0
Hanging,431,12,419,97.22
Jumping,15,3,12,80.0
Pesticide,1768,998,770,43.55
Cutting,29,21,8,27.59
Other poison,146,131,15,10.27
unspecified,48,45,3,6.25
Poison unspec,107,104,3,2.8
Others,1,1,0,0.0


## Question: What is the number of suicide attempts and deaths among individuals with different levels of education, such as primary, secondary, and tertiary?

### Answer:

**Secondary level had the highest number of suicide attempts**, which amounted to **1280 cases**. Out of these attempts, **314 resulted in death**. On the other hand, **primary level** had the **highest number of total deaths**, which were **468** in number.

The **fatality rate of suicide attempts is highest** among individuals with an **illiterate education level**, at **84.99%**. Among the known education levels, **secondary** has the **lowest fatality rate** at **24.53%**; the **unknown** group is lower still at **13.75%**.

In [24]:
%%sql sqlite://

SELECT 
    Education, 
    COUNT(*) as suicide_attempts, 
    SUM(CASE WHEN Died = 'no' THEN 1 ELSE 0 END) as survivors,
    SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) as deaths,
    ROUND((CAST(SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*)) * 100, 2) as fatality_rate
FROM shangdong_china 
GROUP BY Education
ORDER BY suicide_attempts DESC

Done.


Education,suicide_attempts,survivors,deaths,fatality_rate
Secondary,1280,966,314,24.53
primary,659,191,468,71.02
illiterate,533,80,453,84.99
unknown,80,69,11,13.75
Tertiary,19,9,10,52.63


## Conclusion

The queries below produce the figures quoted in the conclusion at the top of this notebook.

In [25]:
%%sql sqlite://

SELECT 
    COUNT(*) as suicide_attempts, 
    SUM(CASE WHEN Died = 'no' THEN 1 ELSE 0 END) as survivors,
    SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) as deaths,
    ROUND((CAST(SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*)) * 100, 2) as fatality_rate
FROM shangdong_china

Done.


suicide_attempts,survivors,deaths,fatality_rate
2571,1315,1256,48.85


In [26]:
%%sql sqlite://

SELECT
    Hospitalised,
    COUNT(*) as person_count, 
    SUM(CASE WHEN Died = 'no' THEN 1 ELSE 0 END) as survivors,
    SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) as deaths,
    ROUND((CAST(SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*)) * 100, 2) as fatality_rate,
    ROUND((CAST(SUM(CASE WHEN Died = 'no' THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*)) * 100, 2) as survival_rate
FROM shangdong_china
GROUP BY Hospitalised

Done.


Hospitalised,person_count,survivors,deaths,fatality_rate,survival_rate
no,1018,0,1018,100.0,0.0
yes,1553,1315,238,15.33,84.67


In [27]:
%%sql sqlite://

SELECT 
    Method, 
    COUNT(*) as suicide_attempts, 
    SUM(CASE WHEN Died = 'no' THEN 1 ELSE 0 END) as total_survivors,
    SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) as total_deaths,
    ROUND((CAST(SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*)) * 100, 2) as fatality_rate,
    ROUND((CAST(COUNT(*) AS FLOAT) / (SELECT COUNT(*) FROM shangdong_china)) * 100, 2) as frequency_rate,
    ROUND((CAST(SUM(CASE WHEN Died = 'no' THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*)) * 100, 2) as survival_rate
FROM shangdong_china 
GROUP BY Method
ORDER BY frequency_rate DESC

Done.


Method,suicide_attempts,total_survivors,total_deaths,fatality_rate,frequency_rate,survival_rate
Pesticide,1768,998,770,43.55,68.77,56.45
Hanging,431,12,419,97.22,16.76,2.78
Other poison,146,131,15,10.27,5.68,89.73
Poison unspec,107,104,3,2.8,4.16,97.2
unspecified,48,45,3,6.25,1.87,93.75
Cutting,29,21,8,27.59,1.13,72.41
Drowning,26,0,26,100.0,1.01,0.0
Jumping,15,3,12,80.0,0.58,20.0
Others,1,1,0,0.0,0.04,100.0


In [31]:
%%sql sqlite://

SELECT 
    Occupation, 
    COUNT(*) as suicide_attempts, 
    SUM(CASE WHEN Died = 'no' THEN 1 ELSE 0 END) as survivors,
    SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) as deaths,
    ROUND((CAST(SUM(CASE WHEN Died = 'yes' THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*)) * 100, 2) as fatality_rate,
    ROUND((CAST(COUNT(*) AS FLOAT) / (SELECT COUNT(*) FROM shangdong_china)) * 100, 2) as frequency_rate,
    ROUND((CAST(SUM(CASE WHEN Died = 'no' THEN 1 ELSE 0 END) AS FLOAT) / COUNT(*)) * 100, 2) as survival_rate
FROM shangdong_china 
GROUP BY Occupation
ORDER BY frequency_rate DESC

Done.


Occupation,suicide_attempts,survivors,deaths,fatality_rate,frequency_rate,survival_rate
farming,2032,939,1093,53.79,79.04,46.21
household,248,154,94,37.9,9.65,62.1
others/unknown,156,146,10,6.41,6.07,93.59
professional,37,11,26,70.27,1.44,29.73
student,35,27,8,22.86,1.36,77.14
unemployed,30,23,7,23.33,1.17,76.67
business/service,21,12,9,42.86,0.82,57.14
worker,6,0,6,100.0,0.23,0.0
retiree,3,3,0,0.0,0.12,100.0
others,3,0,3,100.0,0.12,0.0
