# Business Understanding

Suicide is a serious social problem in India and worldwide. With its large population and a range of social, economic, and psychological challenges, India records a significant number of suicides every year. Analyzing suicide data can offer deeper insight into the contributing factors and the patterns that emerge across demographic categories.

# Business Objectives

1. Identify Suicide Trends and Patterns

   Trace suicide trends by year, age, gender, and profession.
   Understand how cases are distributed across the recorded cause categories.

2. Analyze Contributing Factors

   Identify relationships between socio-economic factors and suicide rates.
   Use these insights to support suicide prevention programs.

3. Provide Recommendations for Decision-Making

   Help the government and health organizations design more effective prevention policies.

# Situation Assessment

Achieving these objectives requires a solid understanding of:

Socio-economic conditions in India that influence suicide rates.

Categories of suicide causes, such as financial pressure, academic pressure, or family factors.

Geographic distribution, to determine whether some regions record more cases than others.
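As a minimal sketch of the geographic question, per-state totals can be obtained with a group-by. The rows below are made up for illustration and only mimic the dataset's `State`, `Year`, `Type_code`, and `Total` columns. Filtering to one `Type_code` rests on the assumption that each `Type_code` is a separate breakdown of the same cases, so summing across all of them would count cases more than once.

```python
import pandas as pd

# Made-up illustration rows (not real figures) that mimic the dataset's columns.
sample = pd.DataFrame({
    "State": ["Kerala", "Kerala", "Goa", "Goa", "Kerala"],
    "Year": [2001, 2002, 2001, 2002, 2001],
    "Type_code": ["Causes", "Causes", "Causes", "Causes", "Social_Status"],
    "Total": [120, 130, 10, 12, 120],
})

# Assumption: each Type_code re-slices the same cases, so keep only one of them.
causes = sample[sample["Type_code"] == "Causes"]

# Sum the cases per state and list the highest totals first.
state_totals = causes.groupby("State")["Total"].sum().sort_values(ascending=False)
print(state_totals)
```

Raw counts favor populous states, so a per-capita rate would need population figures that are not part of this dataset.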

# Analytical Goals

Explore the dataset to understand its structure and completeness.

Visualize the data to identify suicide patterns and trends.

Run a correlation analysis to find the main factors that contribute to rising suicide figures.
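One possible way to prepare data for the trend visualization is to pivot yearly totals by gender. This is a sketch on made-up rows, and the name `trend` is introduced here for illustration. As before, it assumes that a single `Type_code` should be used to avoid counting the same cases several times.

```python
import pandas as pd

# Made-up illustration rows (not real figures) that mimic the dataset's columns.
sample = pd.DataFrame({
    "Year": [2001, 2001, 2002, 2002, 2002, 2001],
    "Type_code": ["Causes", "Causes", "Causes", "Causes", "Causes", "Social_Status"],
    "Gender": ["Female", "Male", "Female", "Male", "Male", "Female"],
    "Total": [40, 60, 45, 50, 20, 40],
})

# Assumption: restrict to one Type_code so each case is counted once.
causes = sample[sample["Type_code"] == "Causes"]

# One row per year, one column per gender, summing all matching rows.
trend = causes.pivot_table(index="Year", columns="Gender", values="Total", aggfunc="sum")
print(trend)
```

A table in this shape can be passed to a plotting library, with one line per gender across the years.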

# Project Plan

1. Data Understanding: Understand the structure and quality of the dataset.

2. Data Preparation: Clean and prepare the data for further analysis.

3. Data Analysis: Apply exploration and modeling techniques to identify patterns in the data.

4. Visualization and Dashboarding: Present the analysis results in an easily understood form.

5. Insight & Action: Provide policy recommendations based on the analysis results.

# Data Understanding

Dataset: Suicides in India, https://www.kaggle.com/datasets/rajanand/suicides-in-india

In [9]:
import numpy as np
import pandas as pd

# Load the Kaggle CSV (path as used in the Colab session)
file_path = "/content/Suicides in India 2001-2012.csv"
df = pd.read_csv(file_path)

# Column names, dtypes, and non-null counts (df.info() prints directly and returns None)
print("\nDataset Information:")
df.info()

# Summary statistics for the numeric columns
print("\nDescriptive Statistics:")
print(df.describe())

# Pairwise correlation between the numeric columns
print("\nCorrelation Analysis:")
numeric_df = df.select_dtypes(include=[np.number])
print(numeric_df.corr())

df


Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 237519 entries, 0 to 237518
Data columns (total 7 columns):
 #   Column     Non-Null Count   Dtype 
---  ------     --------------   ----- 
 0   State      237519 non-null  object
 1   Year       237519 non-null  int64 
 2   Type_code  237519 non-null  object
 3   Type       237519 non-null  object
 4   Gender     237519 non-null  object
 5   Age_group  237519 non-null  object
 6   Total      237519 non-null  int64 
dtypes: int64(2), object(5)
memory usage: 12.7+ MB

Descriptive Statistics:
                Year          Total
count  237519.000000  237519.000000
mean     2006.500448      55.034477
std         3.452240     792.749038
min      2001.000000       0.000000
25%      2004.000000       0.000000
50%      2007.000000       0.000000
75%      2010.000000       6.000000
max      2012.000000   63343.000000

Correlation Analysis:
           Year     Total
Year   1.000000  0.005231
Total  0.005231  1.000000


Unnamed: 0,State,Year,Type_code,Type,Gender,Age_group,Total
0,A & N Islands,2001,Causes,Illness (Aids/STD),Female,0-14,0
1,A & N Islands,2001,Causes,Bankruptcy or Sudden change in Economic,Female,0-14,0
2,A & N Islands,2001,Causes,Cancellation/Non-Settlement of Marriage,Female,0-14,0
3,A & N Islands,2001,Causes,Physical Abuse (Rape/Incest Etc.),Female,0-14,0
4,A & N Islands,2001,Causes,Dowry Dispute,Female,0-14,0
...,...,...,...,...,...,...,...
237514,West Bengal,2012,Social_Status,Seperated,Male,0-100+,149
237515,West Bengal,2012,Social_Status,Widowed/Widower,Male,0-100+,233
237516,West Bengal,2012,Social_Status,Married,Male,0-100+,5451
237517,West Bengal,2012,Social_Status,Divorcee,Male,0-100+,189
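The near-zero correlation between `Year` and `Total` is computed on individual rows, where each row is one state, category, gender, and age-group cell, and many cells are zero. A hedged sketch of an alternative is to aggregate to one total per year before correlating. The rows below are made up for illustration, and `row_level` and `yearly_level` are names introduced here.

```python
import pandas as pd

# Made-up illustration rows (not real figures): two category cells per year.
sample = pd.DataFrame({
    "Year": [2001, 2001, 2002, 2002, 2003, 2003],
    "Total": [0, 100, 5, 115, 0, 140],
})

# Correlation on raw rows: diluted by the spread between category cells.
row_level = sample["Year"].corr(sample["Total"])

# Correlation on yearly totals: one observation per year.
yearly = sample.groupby("Year", as_index=False)["Total"].sum()
yearly_level = yearly["Year"].corr(yearly["Total"])

print(f"row-level: {row_level:.3f}, yearly: {yearly_level:.3f}")
```

With only twelve years in the real data, a yearly correlation rests on very few points and should be read as descriptive rather than conclusive.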


# Data Preparation

In [5]:
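# Percentage of missing values per column
print((df.isna().sum() / len(df)) * 100)

# Drop rows with zero recorded cases; .copy() avoids a SettingWithCopyWarning when columns are added later
df = df[df['Total'] != 0].copy()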
df

State        0.0
Year         0.0
Type_code    0.0
Type         0.0
Gender       0.0
Age_group    0.0
Total        0.0
dtype: float64


Unnamed: 0,State,Year,Type_code,Type,Gender,Age_group,Total
13,A & N Islands,2001,Causes,Love Affairs,Female,0-14,1
20,A & N Islands,2001,Causes,Other Causes (Please Specity),Female,0-14,1
32,A & N Islands,2001,Causes,Other Prolonged Illness,Male,0-14,1
47,A & N Islands,2001,Causes,Failure in Examination,Male,0-14,1
54,A & N Islands,2001,Causes,Other Prolonged Illness,Female,15-29,8
...,...,...,...,...,...,...,...
237514,West Bengal,2012,Social_Status,Seperated,Male,0-100+,149
237515,West Bengal,2012,Social_Status,Widowed/Widower,Male,0-100+,233
237516,West Bengal,2012,Social_Status,Married,Male,0-100+,5451
237517,West Bengal,2012,Social_Status,Divorcee,Male,0-100+,189


In [None]:
# Check for duplicate rows
duplicates = df.duplicated().sum()
print(f"\nNumber of duplicate rows: {duplicates}")



Number of duplicate rows: 0


In [None]:
# Check for outliers with the 1.5 * IQR rule
results = []

# Names of the numeric columns
cols = df.select_dtypes(include=['float64', 'int64']).columns

for col in cols:
    q1 = df[col].quantile(0.25)
    q3 = df[col].quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    # Rows that fall outside the IQR fences for this column
    outliers = df[(df[col] < lower_bound) | (df[col] > upper_bound)]
    percent_outliers = (len(outliers) / len(df)) * 100
    results.append({'Column': col, 'Outlier Percentage': percent_outliers})

# Build a DataFrame from the collected results
results_df = pd.DataFrame(results)
results_df.set_index('Column', inplace=True)
results_df = results_df.rename_axis(None, axis=0).rename_axis('Column', axis=1)

# Show the result (display() is available in notebook environments)
display(results_df)



df


Column,Outlier Percentage
Year,0.0
Total,16.890017


Unnamed: 0,State,Year,Type_code,Type,Gender,Age_group,Total
0,A & N Islands,2001,Causes,Illness (Aids/STD),Female,0-14,0
1,A & N Islands,2001,Causes,Bankruptcy or Sudden change in Economic,Female,0-14,0
2,A & N Islands,2001,Causes,Cancellation/Non-Settlement of Marriage,Female,0-14,0
3,A & N Islands,2001,Causes,Physical Abuse (Rape/Incest Etc.),Female,0-14,0
4,A & N Islands,2001,Causes,Dowry Dispute,Female,0-14,0
...,...,...,...,...,...,...,...
237514,West Bengal,2012,Social_Status,Seperated,Male,0-100+,149
237515,West Bengal,2012,Social_Status,Widowed/Widower,Male,0-100+,233
237516,West Bengal,2012,Social_Status,Married,Male,0-100+,5451
237517,West Bengal,2012,Social_Status,Divorcee,Male,0-100+,189
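To make the IQR rule concrete, here is a small worked sketch on six made-up values. The helper name `iqr_outlier_mask` and its parameter `k` are hypothetical and introduced only for this illustration. The quartiles come from `Series.quantile`, which interpolates linearly by default.

```python
import pandas as pd

def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value lies outside [q1 - k*IQR, q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Made-up counts (not real figures): mostly small, with one large cell.
totals = pd.Series([1, 1, 2, 3, 8, 149], name="Total")

mask = iqr_outlier_mask(totals)
outlier_values = totals[mask].tolist()
outlier_percent = mask.mean() * 100

# Here q1 = 1.25 and q3 = 6.75, so the fences are -7.0 and 15.0.
print(outlier_values, round(outlier_percent, 2))
```

In strongly right-skewed count data, values flagged this way are often genuine large aggregates rather than errors, so flagging does not by itself justify removing them.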


# Construct Data

In [6]:
# pct_change() returns a fraction, so multiply by 100 to express it as a percentage.
# Note: the change is relative to the previous row within each state
# (rows differ by cause, gender, and age group), not to the previous year.
df['Suicide Growth Rate (%)'] = df.groupby('State')['Total'].pct_change() * 100

df

Unnamed: 0,State,Year,Type_code,Type,Gender,Age_group,Total,Suicide Growth Rate (%)
13,A & N Islands,2001,Causes,Love Affairs,Female,0-14,1,
20,A & N Islands,2001,Causes,Other Causes (Please Specity),Female,0-14,1,0.000000
32,A & N Islands,2001,Causes,Other Prolonged Illness,Male,0-14,1,0.000000
47,A & N Islands,2001,Causes,Failure in Examination,Male,0-14,1,0.000000
54,A & N Islands,2001,Causes,Other Prolonged Illness,Female,15-29,8,700.000000
...,...,...,...,...,...,...,...,...
237514,West Bengal,2012,Social_Status,Seperated,Male,0-100+,149,-90.152016
237515,West Bengal,2012,Social_Status,Widowed/Widower,Male,0-100+,233,56.375839
237516,West Bengal,2012,Social_Status,Married,Male,0-100+,5451,2239.484979
237517,West Bengal,2012,Social_Status,Divorcee,Male,0-100+,189,-96.532746


# Data Reduction

In [7]:
# None of these columns appear in the df.info() listing, so with errors='ignore' this call leaves df unchanged
df.drop(columns=['Unnatural Death', 'Economic Factors', 'Unnamed: 0'], inplace=True, errors='ignore')
df

Unnamed: 0,State,Year,Type_code,Type,Gender,Age_group,Total,Suicide Growth Rate (%)
13,A & N Islands,2001,Causes,Love Affairs,Female,0-14,1,
20,A & N Islands,2001,Causes,Other Causes (Please Specity),Female,0-14,1,0.000000
32,A & N Islands,2001,Causes,Other Prolonged Illness,Male,0-14,1,0.000000
47,A & N Islands,2001,Causes,Failure in Examination,Male,0-14,1,0.000000
54,A & N Islands,2001,Causes,Other Prolonged Illness,Female,15-29,8,700.000000
...,...,...,...,...,...,...,...,...
237514,West Bengal,2012,Social_Status,Seperated,Male,0-100+,149,-90.152016
237515,West Bengal,2012,Social_Status,Widowed/Widower,Male,0-100+,233,56.375839
237516,West Bengal,2012,Social_Status,Married,Male,0-100+,5451,2239.484979
237517,West Bengal,2012,Social_Status,Divorcee,Male,0-100+,189,-96.532746
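The `Suicide Growth Rate (%)` column is computed between consecutive rows within a state, and those rows mix causes, genders, and age groups. If a year-over-year rate per state is wanted instead, one possible sketch is to aggregate to one total per state and year first. The rows below are made up for illustration, the column name `YoY Growth (%)` is introduced here, and the input is assumed to be already restricted to a single `Type_code`.

```python
import pandas as pd

# Made-up illustration rows (not real figures), several category cells per state and year.
sample = pd.DataFrame({
    "State": ["Goa", "Goa", "Goa", "Goa", "Kerala", "Kerala", "Kerala"],
    "Year": [2001, 2001, 2002, 2003, 2001, 2002, 2002],
    "Total": [60, 40, 110, 99, 50, 45, 30],
})

# One total per state and year, ordered so each row follows the previous year.
yearly = (
    sample.groupby(["State", "Year"], as_index=False)["Total"]
    .sum()
    .sort_values(["State", "Year"])
)

# Previous year's total within the same state (NaN for each state's first year).
previous = yearly.groupby("State")["Total"].shift(1)

# Relative change against the previous year, expressed in percent.
yearly["YoY Growth (%)"] = (yearly["Total"] / previous - 1) * 100
print(yearly)
```

The first year of each state has no predecessor, so its growth value stays missing rather than being filled with zero.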
