Stage 1: Data Understanding - Collecting and Getting to Know the Relevant Data
**Goals of This Stage:**
1.  **Load the Data**: Read the data from the source file (`smmh.csv`).
2.  **Understand the Data Structure**: Check the number of rows and columns, the data types, and the column names.
3.  **Initial Inspection**: Look at the first few rows and the descriptive statistics to get a general overview.
4.  **Exploratory Data Analysis (EDA)**: Create visualizations to understand the distributions of key variables and the relationships between them.

CELL 2: CODE - Import Libraries and Load the Data

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
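from sklearn.preprocessing import OrdinalEncoder
from IPython.display import display  # display() is used below; the explicit import also works outside a notebook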

# Set the seaborn plot style for a cleaner, more modern look
sns.set_theme(style="whitegrid", palette="viridis")

# Load the dataset from smmh.csv
try:
    df = pd.read_csv('smmh.csv')
    
    # Clean the column names (strip leading question numbers, trailing "(1-5)" ranges,
    # trailing question marks, and surrounding whitespace), then rename them to short identifiers
    df.columns = df.columns.str.replace(r'^\d+\W*\s*|\s*\(\d+-\d+\)$|\?$|^\s*|\s*$', '', regex=True)
    df.rename(columns={
        'What is your age': 'Age', 'Gender': 'Gender', 'Relationship Status': 'Relationship_Status',
        'Occupation Status': 'Occupation_Status', 'Do you use social media': 'Use_Social_Media',
        'What is the average time you spend on social media every day': 'Avg_Time_Social_Media',
        'How often do you find yourself using Social media without a specific purpose': 'SM_No_Purpose',
        'How often do you get distracted by Social media when you are busy doing something': 'SM_Distraction_Busy',
        # NOTE: the cleaned column name contains an apostrophe ("haven't"), so this key ("haven_t")
        # does not match and the column keeps its original name (see the output below)
        'Do you feel restless if you haven_t used Social media in a while': 'SM_Restless',
        'On a scale of 1 to 5, how easily distracted are you': 'Easily_Distracted',
        'On a scale of 1 to 5, how much are you bothered by worries': 'Bothered_By_Worries',
        'Do you find it difficult to concentrate on things': 'Difficult_To_Concentrate',
        'On a scale of 1-5, how often do you compare yourself to other successful people through the use of social media': 'SM_Compare_Success',
        # NOTE: the actual column name ends in "generally speaking", so this key does not match either
        'Following the previous question, how do you feel about these comparisons, generally': 'SM_Compare_Feeling',
        'How often do you look to seek validation from features of social media': 'SM_Seek_Validation',
        'How often do you feel depressed or down': 'Feel_Depressed',
        'On a scale of 1 to 5, how frequently does your interest in daily activities fluctuate': 'Interest_Fluctuation',
        'On a scale of 1 to 5, how often do you face issues regarding sleep': 'Sleep_Issues'
    }, inplace=True)
    
    print("✅ Dataset loaded successfully and column names simplified.")

except FileNotFoundError:
    print("🚨 ERROR: File 'smmh.csv' not found! Make sure the file is in the same folder as this notebook.")
    raise  # stop here: the inspection code below needs df

print("\n(1) First five rows of the data:")
# Use display() so the table renders more neatly in the notebook
display(df.head())

print("\n(2) Data types, non-null counts, and memory usage:")
# Use .info() for a technical summary
df.info()

print("\n(3) Descriptive statistics for all columns (numeric and categorical):")
# Use .describe(include='all') to get statistics for every data type
display(df.describe(include='all').transpose())


✅ Dataset loaded successfully and column names simplified.

(1) First five rows of the data:


Unnamed: 0,Timestamp,Age,Gender,Relationship_Status,Occupation_Status,What type of organizations are you affiliated with,Use_Social_Media,What social media platforms do you commonly use,Avg_Time_Social_Media,SM_No_Purpose,...,Do you feel restless if you haven't used Social media in a while,Easily_Distracted,Bothered_By_Worries,Difficult_To_Concentrate,SM_Compare_Success,"Following the previous question, how do you feel about these comparisons, generally speaking",SM_Seek_Validation,Feel_Depressed,Interest_Fluctuation,Sleep_Issues
0,4/18/2022 19:18:47,21.0,Male,In a relationship,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",Between 2 and 3 hours,5,...,2,5,2,5,2,3,2,5,4,5
1,4/18/2022 19:19:28,21.0,Female,Single,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",More than 5 hours,4,...,2,4,5,4,5,1,1,5,4,5
2,4/18/2022 19:25:59,21.0,Female,Single,University Student,University,Yes,"Facebook, Instagram, YouTube, Pinterest",Between 3 and 4 hours,3,...,1,2,5,4,3,3,1,4,2,5
3,4/18/2022 19:29:43,21.0,Female,Single,University Student,University,Yes,"Facebook, Instagram",More than 5 hours,4,...,1,3,5,3,5,1,2,4,3,2
4,4/18/2022 19:33:31,21.0,Female,Single,University Student,University,Yes,"Facebook, Instagram, YouTube",Between 2 and 3 hours,3,...,4,4,5,5,3,3,3,4,4,1
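
In the table above, two columns still carry their original survey wording (`Do you feel restless if you haven't used Social media in a while` and `Following the previous question, how do you feel about these comparisons, generally speaking`), which suggests that the corresponding keys in the rename mapping did not match any cleaned column name. The sketch below shows one way such mismatches could be detected before renaming. It is only an illustration: the raw header strings in `raw_columns` are made up (the original question numbering in `smmh.csv` is not shown here), and `find_unmatched_keys` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

# Same cleaning pattern as in the loading cell.
CLEAN_PATTERN = r'^\d+\W*\s*|\s*\(\d+-\d+\)$|\?$|^\s*|\s*$'


def find_unmatched_keys(columns, rename_map):
    """Return the rename keys that match no column name (hypothetical helper)."""
    existing = set(columns)
    return [key for key in rename_map if key not in existing]


# Illustrative raw headers; the real ones in smmh.csv may differ.
raw_columns = [
    "1. What is your age?",
    "2. Gender",
    "11. Do you feel restless if you haven't used Social media in a while?",
]
df = pd.DataFrame([[21.0, "Male", 2]], columns=raw_columns)
df.columns = df.columns.str.replace(CLEAN_PATTERN, "", regex=True)

rename_map = {
    "What is your age": "Age",
    "Gender": "Gender",
    # Underscore instead of an apostrophe, as in the loading cell.
    "Do you feel restless if you haven_t used Social media in a while": "SM_Restless",
}

unmatched = find_unmatched_keys(df.columns, rename_map)
print(unmatched)  # keys that rename() would silently ignore
```

By default `DataFrame.rename` ignores keys that are not found; passing `errors='raise'` makes it raise a `KeyError` instead, which is another way to surface the same problem.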



(2) Data types, non-null counts, and memory usage:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 481 entries, 0 to 480
Data columns (total 21 columns):
 #   Column                                                                                        Non-Null Count  Dtype  
---  ------                                                                                        --------------  -----  
 0   Timestamp                                                                                     481 non-null    object 
 1   Age                                                                                           481 non-null    float64
 2   Gender                                                                                        481 non-null    object 
 3   Relationship_Status                                                                           481 non-null    object 
 4   Occupation_Status                                                                     

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
Timestamp,481.0,480.0,5/11/2022 22:54:32,2.0,,,,,,,
Age,481.0,,,,26.13659,9.91511,13.0,21.0,22.0,26.0,91.0
Gender,481.0,9.0,Female,263.0,,,,,,,
Relationship_Status,481.0,4.0,Single,285.0,,,,,,,
Occupation_Status,481.0,4.0,University Student,292.0,,,,,,,
What type of organizations are you affiliated with,451.0,18.0,University,239.0,,,,,,,
Use_Social_Media,481.0,2.0,Yes,478.0,,,,,,,
What social media platforms do you commonly use,481.0,125.0,"Facebook, Instagram, YouTube",35.0,,,,,,,
Avg_Time_Social_Media,481.0,6.0,More than 5 hours,116.0,,,,,,,
SM_No_Purpose,481.0,,,,3.553015,1.096299,1.0,3.0,4.0,4.0,5.0
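
The summary above reports 451 non-null values out of 481 for the affiliation column and 9 distinct labels for `Gender`, so missing values and inconsistent category labels look worth checking next. The sketch below shows one possible way to do that on a tiny made-up frame. The sample rows, the `Affiliation` column name, and the `summarize_quality` helper are illustrative assumptions and are not taken from `smmh.csv`.

```python
import pandas as pd


def summarize_quality(frame, categorical_col):
    """Return (missing values per column, label counts for one column)."""
    missing = frame.isna().sum()
    labels = frame[categorical_col].value_counts(dropna=False)
    return missing, labels


# Made-up sample; the values are illustrative only.
sample = pd.DataFrame({
    "Age": [21.0, 22.0, 35.0, 91.0],
    "Gender": ["Female", "Male", "female", "Female"],
    "Affiliation": ["University", None, "Company", "University"],
})

missing, labels = summarize_quality(sample, "Gender")
print(missing[missing > 0])  # only the columns that contain missing values
print(labels)                # exposes near-duplicate labels such as "female"
```

On the real data, the same two calls (`df.isna().sum()` and `df['Gender'].value_counts(dropna=False)`) would show which columns need imputation and which labels may need to be merged.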


Data Exploration with Visualization