## **Required libraries**

In [1]:
import pandas as pd
import numpy as np

## **About Dataset**
### **Overview**

Dive into the Extrovert vs. Introvert Personality Traits Dataset, a rich collection of behavioral and social data designed to explore the spectrum of human personality. This dataset captures key indicators of extroversion and introversion, making it a valuable resource for psychologists, data scientists, and researchers studying social behavior, personality prediction, or data preprocessing techniques.

### **Context**

Personality traits like extroversion and introversion shape how individuals interact with their social environments. This dataset provides insights into behaviors such as time spent alone, social event attendance, and social media engagement, enabling applications in psychology, sociology, marketing, and machine learning. Whether you're predicting personality types or analyzing social patterns, this dataset is your gateway to uncovering fascinating insights.

### **Dataset Details**

Size: The dataset contains 2,900 rows and 8 columns.

#### Features:
- Time_spent_Alone: Hours spent alone daily (0–11).
- Stage_fear: Presence of stage fright (Yes/No).
- Social_event_attendance: Frequency of social events (0–10).
- Going_outside: Frequency of going outside (0–7).
- Drained_after_socializing: Feeling drained after socializing (Yes/No).
- Friends_circle_size: Number of close friends (0–15).
- Post_frequency: Social media post frequency (0–10).
- Personality: Target variable (Extrovert/Introvert).

*Taken from: [Kaggle](https://www.kaggle.com/datasets/rakeshkapilavai/extrovert-vs-introvert-behavior-data)*

## **Loading the dataset**

In [2]:
df = pd.read_csv('personality_dataset.csv')

## **A quick look at the dataset**

In [3]:
df.head()

|   | Time_spent_Alone | Stage_fear | Social_event_attendance | Going_outside | Drained_after_socializing | Friends_circle_size | Post_frequency | Personality |
|---|---|---|---|---|---|---|---|---|
| 0 | 4.0 | No | 4.0 | 6.0 | No | 13.0 | 5.0 | Extrovert |
| 1 | 9.0 | Yes | 0.0 | 0.0 | Yes | 0.0 | 3.0 | Introvert |
| 2 | 9.0 | Yes | 1.0 | 2.0 | Yes | 5.0 | 2.0 | Introvert |
| 3 | 0.0 | No | 6.0 | 7.0 | No | 14.0 | 8.0 | Extrovert |
| 4 | 3.0 | No | 9.0 | 4.0 | No | 8.0 | 5.0 | Extrovert |


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2900 entries, 0 to 2899
Data columns (total 8 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Time_spent_Alone           2837 non-null   float64
 1   Stage_fear                 2827 non-null   object 
 2   Social_event_attendance    2838 non-null   float64
 3   Going_outside              2834 non-null   float64
 4   Drained_after_socializing  2848 non-null   object 
 5   Friends_circle_size        2823 non-null   float64
 6   Post_frequency             2835 non-null   float64
 7   Personality                2900 non-null   object 
dtypes: float64(5), object(3)
memory usage: 181.4+ KB


## **Dropping rows with missing data**

In [5]:
df = df.dropna(axis=0, how='any')
df.isna().sum()

Time_spent_Alone             0
Stage_fear                   0
Social_event_attendance      0
Going_outside                0
Drained_after_socializing    0
Friends_circle_size          0
Post_frequency               0
Personality                  0
dtype: int64
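
Before dropping rows, it can help to check how many values are missing in each column and how many rows the drop removes. The following is a minimal sketch on a made-up toy frame (the `toy` data and the variable names are illustrative assumptions, not part of the dataset); the same calls apply to `df`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset (values are made up for illustration).
toy = pd.DataFrame({
    "Time_spent_Alone": [4.0, np.nan, 9.0, 0.0],
    "Stage_fear": ["No", "Yes", None, "No"],
    "Personality": ["Extrovert", "Introvert", "Introvert", "Extrovert"],
})

# Missing values per column, before dropping anything.
missing_per_column = toy.isna().sum()

# Drop every row that has at least one missing value.
toy_clean = toy.dropna(axis=0, how="any")
rows_dropped = len(toy) - len(toy_clean)

print(missing_per_column)
print(f"Rows dropped: {rows_dropped} of {len(toy)}")
```

Comparing `len(df)` before and after the drop in the same way shows how much data the model loses to this step.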

## **Unique values of the categorical columns**

In [6]:
# Print unique values of categorical columns
categorical_cols = df.select_dtypes(include='object').columns
for col in categorical_cols:
	print(f"{col}: {df[col].unique()}")

Stage_fear: ['No' 'Yes']
Drained_after_socializing: ['No' 'Yes']
Personality: ['Extrovert' 'Introvert']
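
These string values are label-encoded in the training step that follows. `LabelEncoder` stores its classes in sorted order, so the integer assigned to each label can be read from `classes_`. A small sketch, where the helper name `label_mapping` is hypothetical:

```python
from sklearn.preprocessing import LabelEncoder


def label_mapping(values):
    """Return the label -> integer mapping a fresh LabelEncoder assigns to `values`."""
    encoder = LabelEncoder().fit(values)
    # classes_ is sorted, and each label's position is its integer code.
    return {str(label): code for code, label in enumerate(encoder.classes_)}


yes_no_mapping = label_mapping(["No", "Yes"])
personality_mapping = label_mapping(["Extrovert", "Introvert"])

print(yes_no_mapping)       # {'No': 0, 'Yes': 1}
print(personality_mapping)  # {'Extrovert': 0, 'Introvert': 1}
```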


## **Training the model**
1. Encode the categorical variables.
2. Select the predictor variables and the target variable.
3. Split the dataset into training and test sets.
4. Train a Random Forest model.
5. Make predictions and print the classification report.

In [7]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Encode categorical variables
df_encoded = df.copy()
label_encoders = {}
for col in ['Stage_fear', 'Drained_after_socializing', 'Personality']:
	le = LabelEncoder()
	df_encoded[col] = le.fit_transform(df_encoded[col])
	label_encoders[col] = le

# Features and target
X = df_encoded.drop('Personality', axis=1)
y = df_encoded['Personality']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train classifier
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Predict and evaluate
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred, target_names=label_encoders['Personality'].classes_))

              precision    recall  f1-score   support

   Extrovert       0.92      0.89      0.91       246
   Introvert       0.90      0.92      0.91       250

    accuracy                           0.91       496
   macro avg       0.91      0.91      0.91       496
weighted avg       0.91      0.91      0.91       496
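
A confusion matrix can complement the report by showing the raw counts behind precision and recall. The sketch below uses made-up labels (`y_true_toy` and `y_pred_toy` are illustrative, not results from this model); with the notebook's variables, the call would take `y_test` and `y_pred` instead.

```python
from sklearn.metrics import confusion_matrix

labels = ["Extrovert", "Introvert"]

# Made-up true and predicted labels, for illustration only.
y_true_toy = ["Extrovert", "Extrovert", "Introvert", "Introvert", "Introvert"]
y_pred_toy = ["Extrovert", "Introvert", "Introvert", "Introvert", "Extrovert"]

# Rows are true classes, columns are predicted classes, both in `labels` order.
cm = confusion_matrix(y_true_toy, y_pred_toy, labels=labels)

# Recall per class: correct predictions divided by the true count of that class.
recall_per_class = cm.diagonal() / cm.sum(axis=1)

print(cm)
print(dict(zip(labels, recall_per_class)))
```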



## **Exporting the model**

In [8]:
import joblib
joblib.dump(clf, 'RF_model.pkl')

['RF_model.pkl']
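
The file above contains only the classifier. Decoding a prediction back to `Extrovert`/`Introvert` in a fresh session also needs the fitted encoders, so one option is to save both in a single object. This is a sketch under assumptions: the toy model, the bundle layout, and the file name `RF_bundle.pkl` are hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Tiny stand-in model and encoder (made-up data, for illustration only).
target_encoder = LabelEncoder().fit(["Extrovert", "Introvert"])
X_toy = [[0.0, 0], [1.0, 0], [9.0, 1], [10.0, 1]]
y_toy = target_encoder.transform(["Extrovert", "Extrovert", "Introvert", "Introvert"])
model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X_toy, y_toy)

# Save the model and its encoders together so they cannot drift apart.
bundle = {"model": model, "label_encoders": {"Personality": target_encoder}}
bundle_path = os.path.join(tempfile.mkdtemp(), "RF_bundle.pkl")  # hypothetical name
joblib.dump(bundle, bundle_path)

# Reload and confirm the encoder survived the round trip.
restored = joblib.load(bundle_path)
restored_classes = [str(c) for c in restored["label_encoders"]["Personality"].classes_]
print(restored_classes)
```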

## **Making a test prediction**

In [9]:
# Load the trained model
clf_loaded = joblib.load('RF_model.pkl')

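# Example new observation; Stage_fear and Drained_after_socializing use the
# LabelEncoder codes from training (0 = No, 1 = Yes)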
new_observation = pd.DataFrame({
	'Time_spent_Alone': [5.0],
	'Stage_fear': [0],
	'Social_event_attendance': [3.0],
	'Going_outside': [2.0],
	'Drained_after_socializing': [0],
	'Friends_circle_size': [7.0],
	'Post_frequency': [4.0]
})

# Predict with the reloaded classifier; label_encoders comes from the training
# cell above and must still be in memory
pred_encoded = clf_loaded.predict(new_observation)[0]
pred_label = label_encoders['Personality'].inverse_transform([pred_encoded])[0]
print(f"Predicted Personality: {pred_label}")

Predicted Personality: Extrovert


In [10]:
# Second example observation (1 = Yes for Stage_fear and Drained_after_socializing)
new_observation = pd.DataFrame({
	'Time_spent_Alone': [10.0],
	'Stage_fear': [1],
	'Social_event_attendance': [1.0],
	'Going_outside': [2.0],
	'Drained_after_socializing': [1],
	'Friends_circle_size': [7.0],
	'Post_frequency': [1.0]
})

# Predict with the reloaded classifier
pred_encoded = clf_loaded.predict(new_observation)[0]
pred_label = label_encoders['Personality'].inverse_transform([pred_encoded])[0]
print(f"Predicted Personality: {pred_label}")

Predicted Personality: Introvert
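
Typing the `0`/`1` codes by hand is easy to get wrong. One way to avoid it is a small helper that accepts the raw `Yes`/`No` answers and returns a one-row frame in the training column order. The names `encode_observation`, `FEATURE_ORDER`, and `YES_NO` are hypothetical, and the mapping assumes the encoders were fitted on exactly the values `No` and `Yes`, as in the cells above.

```python
import pandas as pd

# Column order used when the model was trained.
FEATURE_ORDER = [
    "Time_spent_Alone",
    "Stage_fear",
    "Social_event_attendance",
    "Going_outside",
    "Drained_after_socializing",
    "Friends_circle_size",
    "Post_frequency",
]

# LabelEncoder sorts its classes, so "No" -> 0 and "Yes" -> 1.
YES_NO = {"No": 0, "Yes": 1}


def encode_observation(raw):
    """Turn a dict of raw answers into a one-row DataFrame the model can consume."""
    row = dict(raw)
    for col in ("Stage_fear", "Drained_after_socializing"):
        row[col] = YES_NO[row[col]]  # raises KeyError on unexpected values
    return pd.DataFrame([row], columns=FEATURE_ORDER)


observation = encode_observation({
    "Time_spent_Alone": 10.0,
    "Stage_fear": "Yes",
    "Social_event_attendance": 1.0,
    "Going_outside": 2.0,
    "Drained_after_socializing": "Yes",
    "Friends_circle_size": 7.0,
    "Post_frequency": 1.0,
})
print(observation)
```

The resulting frame can be passed to `clf_loaded.predict` in place of the hand-built `new_observation`.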
