**Gender Behaviour Analysis**

# User Behaviour Analysis: Gender, Swipe Right Ratio, and Likes

This notebook summarizes an exploratory data analysis of dating-app user behaviour
by gender. The analysis focuses on:

1. Differences in **Swipe Right Ratio** across **Gender**.
2. The number of **Likes Received** by **Gender**.
3. The relationship between **App Usage Time (minutes)** and **Likes Received**.
4. The distribution of **Sexual Orientation** by **Gender**, with a chi-square test to check whether the two are associated.

The dataset comes from the `DatingSQL` database and contains roughly 50 thousand rows.
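The first comparison in the list above (Swipe Right Ratio across Gender) could be approached with a one-way ANOVA, which fits the `f_oneway` import used in this notebook. The sketch below is only an illustration under stated assumptions: the `toy` frame is synthetic data invented for the example, not the real `user_data` table, and ANOVA is one possible choice of test rather than a confirmed part of the analysis.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

# Synthetic stand-in for user_data; the values are made up for illustration.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "gender": np.repeat(["Female", "Male", "Non-binary"], 200),
    "swipe_right_ratio": rng.uniform(0, 1, size=600).round(2),
})

# Build one array of swipe-right ratios per gender group.
groups = [g["swipe_right_ratio"].to_numpy() for _, g in toy.groupby("gender")]

# One-way ANOVA: the null hypothesis is that all group means are equal.
f_stat, p_value = f_oneway(*groups)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```

A small p-value would suggest that at least one gender group has a different mean ratio. It does not say which group differs, so a post-hoc comparison would still be needed.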


# Import and Load Data

This section imports all required libraries and gives an initial statistical summary of the data.

In [11]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import f_oneway

# Import helper functions from utils and the EDA scripts
import sys, os
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..')))

from utils.db_connection import *
from utils.cleaning import *

# Load data
df_query = load_query('* FROM user_data')
pd.set_option('display.max_colwidth', 12)
pd.set_option('display.precision', 2)
df_query.head()


  df = pd.read_sql_query("SELECT * FROM user_data", conn)


| | gender | sexual_orientation | location_type | income_bracket | education_level | interest_tags | app_usage_time_min | app_usage_time_label | swipe_right_ratio | swipe_right_label | likes_received | mutual_matches | profile_pics_count | bio_length | message_sent_count | emoji_usage_rate | last_active_hour | swipe_time_of_day | match_outcome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Prefer N... | Gay | Urban | High | Bachelor’s | Fitness,... | 52 | Moderate | 0.6 | Optimistic | 173 | 23 | 4 | 44 | 75 | 0.36 | 13 | Early Mo... | Mutual M... |
| 1 | Male | Bisexual | Suburban | Upper-Mi... | No Forma... | Language... | 279 | Extreme ... | 0.56 | Optimistic | 107 | 7 | 3 | 301 | 35 | 0.42 | 0 | Morning | Chat Ign... |
| 2 | Non-binary | Pansexual | Suburban | Low | Master’s | Movies, ... | 49 | Moderate | 0.41 | Optimistic | 91 | 27 | 2 | 309 | 33 | 0.41 | 1 | After Mi... | Date Hap... |
| 3 | Genderfluid | Gay | Metro | Very Low | Postdoc | Coding, ... | 185 | Extreme ... | 0.32 | Balanced | 147 | 6 | 5 | 35 | 5 | 0.07 | 21 | Morning | No Action |
| 4 | Male | Bisexual | Urban | Middle | Bachelor’s | Clubbing... | 83 | High | 0.32 | Balanced | 94 | 11 | 1 | 343 | 34 | 0.11 | 22 | After Mi... | One-side... |
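The call `load_query('* FROM user_data')` and the echoed `pd.read_sql_query("SELECT * FROM user_data", conn)` line suggest that the helper prepends `SELECT` to its argument and runs the result against an open connection. The real implementation in `utils.db_connection` is not shown here, so the sketch below is an assumption: `load_query_sketch` is a hypothetical name, and the in-memory SQLite table with two made-up rows only stands in for the `DatingSQL` database.

```python
import sqlite3
import pandas as pd

def load_query_sketch(conn, query_tail):
    """Hypothetical stand-in for load_query: prepend SELECT and return a DataFrame."""
    return pd.read_sql_query(f"SELECT {query_tail}", conn)

# In-memory SQLite database with two made-up rows, only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_data (gender TEXT, likes_received INTEGER)")
conn.executemany(
    "INSERT INTO user_data VALUES (?, ?)",
    [("Male", 107), ("Female", 94)],
)
conn.commit()

demo = load_query_sketch(conn, "* FROM user_data")
print(demo)
conn.close()
```

Building SQL from a string tail is convenient in a notebook, but it should only ever receive trusted, hard-coded input.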


In [12]:
print("Brief statistical summary of the dataset:\n")
df_query.describe(include='all')

Brief statistical summary of the dataset:



| | gender | sexual_orientation | location_type | income_bracket | education_level | interest_tags | app_usage_time_min | app_usage_time_label | swipe_right_ratio | swipe_right_label | likes_received | mutual_matches | profile_pics_count | bio_length | message_sent_count | emoji_usage_rate | last_active_hour | swipe_time_of_day | match_outcome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 50000 | 50000 | 50000 | 50000 | 50000 | 50000 | 50000.0 | 50000 | 50000.0 | 50000 | 50000.0 | 50000.0 | 50000.0 | 50000.0 | 50000.0 | 50000.0 | 50000.0 | 50000 | 50000 |
| unique | 6 | 8 | 6 | 7 | 9 | 40206 | | 7 | | 4 | | | | | | | | 6 | 10 |
| top | Female | Straight | Remote Area | High | Bachelor’s | Fitness,... | | Extreme ... | | Optimistic | | | | | | | | After Mi... | One-side... |
| freq | 8384 | 6326 | 8519 | 7309 | 5646 | 6 | | 20140 | | 26873 | | | | | | | | 8524 | 5112 |
| mean | | | | | | | 149.91 | | 0.5 | | 99.53 | 13.87 | 2.99 | 250.17 | 50.07 | 0.29 | 11.52 | | |
| std | | | | | | | 86.99 | | 0.2 | | 58.0 | 9.11 | 2.0 | 144.8 | 29.17 | 0.16 | 6.92 | | |
| min | | | | | | | 0.0 | | 0.0 | | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | | |
| 25% | | | | | | | 74.0 | | 0.37 | | 49.0 | 6.0 | 1.0 | 125.0 | 25.0 | 0.16 | 5.0 | | |
| 50% | | | | | | | 150.0 | | 0.5 | | 100.0 | 13.0 | 3.0 | 250.0 | 50.0 | 0.27 | 12.0 | | |
| 75% | | | | | | | 225.0 | | 0.64 | | 150.0 | 22.0 | 5.0 | 376.0 | 75.0 | 0.39 | 18.0 | | |


**Number of Duplicate Rows**

In [13]:
print("Number of duplicate rows:", df_query.duplicated().sum())

Number of duplicate rows: 0


# Gender-Related Behaviour Analysis

**Gender Heatmap and Chi-Square Test**

This step checks whether the number of samples is imbalanced across genders.
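One way to approach this check is sketched below: count the samples per gender, test those counts against equal group sizes with a chi-square goodness-of-fit test, then cross-tabulate gender against sexual orientation and run a chi-square test of independence. This is a minimal sketch under stated assumptions, not the notebook's actual cell: the `toy` frame is randomly generated, and the choice of `chisquare` for the balance check is an assumption.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, chisquare

# Randomly generated stand-in for user_data; not real results.
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "gender": rng.choice(["Female", "Male", "Non-binary"], size=900),
    "sexual_orientation": rng.choice(["Straight", "Gay", "Bisexual"], size=900),
})

# Sample size per gender, tested against equal expected group sizes.
gender_counts = toy["gender"].value_counts()
gof_stat, gof_p = chisquare(gender_counts.to_numpy())

# Contingency table (the input for a heatmap) and test of independence.
table = pd.crosstab(toy["gender"], toy["sexual_orientation"])
chi2, p, dof, expected = chi2_contingency(table)

print(gender_counts, table, sep="\n\n")
print(f"balance: chi2 = {gof_stat:.2f}, p = {gof_p:.3f}")
print(f"independence: chi2 = {chi2:.2f}, p = {p:.3f}, dof = {dof}")
```

The same `table` can be passed to `sns.heatmap(table, annot=True, fmt="d")` to draw the heatmap. Note that a chi-square test indicates whether two categorical variables are associated; it does not measure the strength or direction of a correlation.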