# **1. Dataset Introduction**


The **Extrovert vs Introvert Behavior Data** dataset, created by **Rakesh Kapilavai** on Kaggle, provides a structured collection of behavioral attributes used to differentiate between **introverted** and **extroverted** personality types. It draws on psychological and social behavior patterns to explore how personality influences daily actions.

The dataset contains 2,900 rows (individual observations) and 8 columns: 7 feature columns and a final target column, `Personality`.

**Column Descriptions**

| Column Name | Description | Data Type | Value/Range |
| :--- | :--- | :--- | :--- |
| **Time_spent_Alone** | The number of hours the person spends alone on a daily basis. | Numeric | 0 - 11 |
| **Stage_fear** | Indicates whether the individual has a fear of being on stage (stage fright). | Categorical | "Yes" / "No" |
| **Social_event_attendance** | The frequency with which the person attends social events. | Numeric | 0 - 10 |
| **Going_outside** | The frequency of the person going outside in a typical week. | Numeric | 0 - 7 |
| **Drained_after_socializing** | Indicates whether the person feels emotionally or mentally drained after social interactions. | Categorical | "Yes" / "No" |
| **Friends_circle_size** | The number of close friends the individual has. | Numeric | 0 - 15 |
| **Post_frequency** | How often the person posts on social media platforms. | Numeric | 0 - 10 |
| **Personality** | The target variable, classifying the person's personality type. | Categorical | "Extrovert" / "Introvert" |

# **2. Import Library**

In [12]:
# Write each imported module's name and version to a pip-style requirements file
def write_requirements(filename, *modules):
    """Write one `name==version` line per module to `filename`."""
    with open(filename, "w") as f:
        for mod in modules:
            f.write(f"{mod.__name__}=={mod.__version__}\n")

In [13]:
import pandas as pd

write_requirements("requirements.txt", pd)

# **3. Load Dataset**

In [9]:
dataset = pd.read_csv("../personality_raw.csv")
dataset.head(10)

Unnamed: 0,Time_spent_Alone,Stage_fear,Social_event_attendance,Going_outside,Drained_after_socializing,Friends_circle_size,Post_frequency,Personality
0,4.0,No,4.0,6.0,No,13.0,5.0,Extrovert
1,9.0,Yes,0.0,0.0,Yes,0.0,3.0,Introvert
2,9.0,Yes,1.0,2.0,Yes,5.0,2.0,Introvert
3,0.0,No,6.0,7.0,No,14.0,8.0,Extrovert
4,3.0,No,9.0,4.0,No,8.0,5.0,Extrovert
5,1.0,No,7.0,5.0,No,6.0,6.0,Extrovert
6,4.0,No,9.0,,No,7.0,7.0,Extrovert
7,2.0,No,8.0,4.0,No,7.0,8.0,Extrovert
8,10.0,Yes,1.0,3.0,Yes,0.0,3.0,Introvert
9,0.0,No,8.0,6.0,No,13.0,8.0,Extrovert
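
The `Unnamed: 0` column in the preview above is not one of the 8 documented columns. It appears to be a row index that was written into the CSV, although that is an assumption based only on its values (0, 1, 2, ...). A minimal sketch of dropping it, using the first three preview rows as an in-memory stand-in for the file (the variable names `sample` and `cleaned` are illustrative):

```python
import io

import pandas as pd

# First three rows of the preview above. The header starts with an empty
# field, which pandas reports as the column "Unnamed: 0".
raw_csv = (
    ",Time_spent_Alone,Stage_fear,Social_event_attendance,Going_outside,"
    "Drained_after_socializing,Friends_circle_size,Post_frequency,Personality\n"
    "0,4.0,No,4.0,6.0,No,13.0,5.0,Extrovert\n"
    "1,9.0,Yes,0.0,0.0,Yes,0.0,3.0,Introvert\n"
    "2,9.0,Yes,1.0,2.0,Yes,5.0,2.0,Introvert\n"
)

sample = pd.read_csv(io.StringIO(raw_csv))

# Assumption: "Unnamed: 0" only repeats the row number, so dropping it
# loses no information.
cleaned = sample.drop(columns=["Unnamed: 0"])

print(cleaned.shape)
print(list(cleaned.columns))
```

Passing `index_col=0` to `pd.read_csv` is an alternative that treats the first column as the index at load time instead of dropping it afterwards.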


# **4. Exploratory Data Analysis (EDA)**

At this stage, you will perform **Exploratory Data Analysis (EDA)** to understand the characteristics of the dataset.

The goal of EDA is to gain solid initial insight into the data and to decide on the next steps in analysis or modeling.
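
As a rough starting point, the sketch below runs a few common EDA checks: missing values per column, summary statistics, class balance of the target, and feature means per personality type. It uses the ten preview rows from the Load Dataset section as a stand-in for the full file so it runs on its own; the choice of checks is an assumption, not a prescribed procedure.

```python
import io

import pandas as pd

# The ten preview rows shown in the Load Dataset section (index column omitted).
preview_csv = """Time_spent_Alone,Stage_fear,Social_event_attendance,Going_outside,Drained_after_socializing,Friends_circle_size,Post_frequency,Personality
4.0,No,4.0,6.0,No,13.0,5.0,Extrovert
9.0,Yes,0.0,0.0,Yes,0.0,3.0,Introvert
9.0,Yes,1.0,2.0,Yes,5.0,2.0,Introvert
0.0,No,6.0,7.0,No,14.0,8.0,Extrovert
3.0,No,9.0,4.0,No,8.0,5.0,Extrovert
1.0,No,7.0,5.0,No,6.0,6.0,Extrovert
4.0,No,9.0,,No,7.0,7.0,Extrovert
2.0,No,8.0,4.0,No,7.0,8.0,Extrovert
10.0,Yes,1.0,3.0,Yes,0.0,3.0,Introvert
0.0,No,8.0,6.0,No,13.0,8.0,Extrovert
"""
sample = pd.read_csv(io.StringIO(preview_csv))

# Number of missing values in each column
missing_counts = sample.isna().sum()

# Count, mean, spread, and quartiles of the numeric features
numeric_summary = sample.describe()

# How many rows belong to each personality type
class_counts = sample["Personality"].value_counts()

# Average of each numeric feature within each personality type
group_means = sample.groupby("Personality").mean(numeric_only=True)

print(missing_counts)
print(numeric_summary)
print(class_counts)
print(group_means)
```

On the real data, replace `sample` with the `dataset` DataFrame loaded earlier. Ten rows are far too few to draw conclusions from, so the printed numbers here only illustrate the shape of each result.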

# **5. Data Preprocessing**

Data preprocessing is an important step for ensuring data quality before the data is used in a machine learning model.

If you are working with text data, the raw data often contains missing values, duplicates, or inconsistent value ranges, all of which can affect model performance. This step therefore aims to clean and prepare the data so that the analysis runs as well as possible.

The following steps can be applied, but the list is **not exhaustive**:
1. Removing or Handling Missing Values
2. Removing Duplicate Data
3. Normalizing or Standardizing Features
4. Detecting and Handling Outliers
5. Encoding Categorical Data
6. Binning (Grouping Data)

Adapt these steps to the characteristics of the data you are using, especially when working with unstructured data.
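
The steps listed above can be sketched with plain pandas as follows. This is one possible pipeline under stated assumptions, not the required one: median and mode imputation, the 1.5 × IQR outlier rule, the bin edges, the 0/1 mappings, and the `Alone_level` column name are all illustrative choices. The ten preview rows from the Load Dataset section stand in for the full file.

```python
import io

import pandas as pd

# The ten preview rows shown in the Load Dataset section (index column omitted).
preview_csv = """Time_spent_Alone,Stage_fear,Social_event_attendance,Going_outside,Drained_after_socializing,Friends_circle_size,Post_frequency,Personality
4.0,No,4.0,6.0,No,13.0,5.0,Extrovert
9.0,Yes,0.0,0.0,Yes,0.0,3.0,Introvert
9.0,Yes,1.0,2.0,Yes,5.0,2.0,Introvert
0.0,No,6.0,7.0,No,14.0,8.0,Extrovert
3.0,No,9.0,4.0,No,8.0,5.0,Extrovert
1.0,No,7.0,5.0,No,6.0,6.0,Extrovert
4.0,No,9.0,,No,7.0,7.0,Extrovert
2.0,No,8.0,4.0,No,7.0,8.0,Extrovert
10.0,Yes,1.0,3.0,Yes,0.0,3.0,Introvert
0.0,No,8.0,6.0,No,13.0,8.0,Extrovert
"""
df = pd.read_csv(io.StringIO(preview_csv))

numeric_cols = df.select_dtypes(include="number").columns
binary_cols = ["Stage_fear", "Drained_after_socializing"]

# 1. Missing values: median for numeric columns, most frequent value for Yes/No columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
for col in binary_cols:
    df[col] = df[col].fillna(df[col].mode()[0])

# 2. Duplicates: keep the first occurrence of each identical row
df = df.drop_duplicates().reset_index(drop=True)

# 4. Outliers: flag values outside 1.5 * IQR of each numeric column (flag only, no removal)
q1 = df[numeric_cols].quantile(0.25)
q3 = df[numeric_cols].quantile(0.75)
iqr = q3 - q1
outlier_mask = (df[numeric_cols] < q1 - 1.5 * iqr) | (df[numeric_cols] > q3 + 1.5 * iqr)

# 6. Binning: group hours spent alone into three levels (bin edges are an assumption)
df["Alone_level"] = pd.cut(
    df["Time_spent_Alone"],
    bins=[0, 3, 7, 11],
    labels=["low", "medium", "high"],
    include_lowest=True,
)

# 5. Encoding: map the two-valued categorical columns to 0/1
for col in binary_cols:
    df[col] = df[col].map({"No": 0, "Yes": 1})
df["Personality"] = df["Personality"].map({"Introvert": 0, "Extrovert": 1})

# 3. Standardization: zero mean and unit standard deviation per numeric feature
df[numeric_cols] = (df[numeric_cols] - df[numeric_cols].mean()) / df[numeric_cols].std()

print(df.head())
print(outlier_mask.sum())
```

Binning and outlier detection run before standardization here so that the bin edges and IQR thresholds stay in the original units. When a train/test split is involved, the medians, means, and standard deviations would normally be computed on the training portion only and then reused on the test portion.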