# **OPEN-ARC**
---

### Project 11: Basic Personality Prediction Model:
**Challenge:** Create an AI model capable of classifying a person's basic personality type from a set of features.


### Terms and Use:
Learn more about the project's [LICENSE](https://github.com/Infinitode/OPEN-ARC/blob/main/LICENSE) and read our [CODE_OF_CONDUCT](https://github.com/Infinitode/OPEN-ARC/blob/main/CODE_OF_CONDUCT) before contributing to the project. You can contribute to this project from here: [https://github.com/Infinitode/OPEN-ARC/](https://github.com/Infinitode/OPEN-ARC/).

---

Please fill out this performance sheet to help others quickly see your model's performance **(optional)**:

### Performance Sheet:
| Contributor | Architecture Type | Platform | Base Model | Dataset | Accuracy | Link |
|-------------|-------------------|----------|------------|---------|----------|------|
| Infinitode  | XGBClassifier  | Kaggle   | ✔  | Personality Dataset (Introvert or Extrovert) | 92%    | [Notebook](https://github.com/Infinitode/OPEN-ARC/blob/main/Project-11-BPPM/project-11-bppm.ipynb) |
| Username  | Unknown  | Kaggle   | ✗/✔  | Personality Dataset (Introvert or Extrovert) | Score    | [Notebook](https://github.com) |

---

### Using `XGBClassifier`

In [18]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from xgboost import XGBClassifier
import joblib

DATA_PATH = "/kaggle/input/personality-dataset-introvert-or-extrovert/personality_dataset.csv"
TARGET = "Personality"

# Load and split
df = pd.read_csv(DATA_PATH)
X = df.drop(columns=[TARGET])
y = df[TARGET]

num_cols = X.select_dtypes(include=["int64", "float64"]).columns.tolist()
cat_cols = X.select_dtypes(include=["object", "category"]).columns.tolist()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Preprocessing
num_imputer = SimpleImputer(strategy="median")
cat_imputer = SimpleImputer(strategy="most_frequent")

X_train_num = num_imputer.fit_transform(X_train[num_cols])
X_train_cat = cat_imputer.fit_transform(X_train[cat_cols])

ohe = OneHotEncoder(drop="first", handle_unknown="ignore", sparse_output=False)
X_train_cat_enc = ohe.fit_transform(X_train_cat)

# Prepare final matrix
X_train_ready = np.hstack([X_train_num, X_train_cat_enc])

# Encode labels and train
le = LabelEncoder()
y_train_enc = le.fit_transform(y_train)

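# Labels are already integer-encoded above, so the deprecated
# `use_label_encoder` argument (ignored with a warning by recent XGBoost) is omitted.
model = XGBClassifier(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=4,
    eval_metric="logloss",
    random_state=42,
)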
model.fit(X_train_ready, y_train_enc)

In [21]:
from sklearn.metrics import classification_report, roc_auc_score

# Apply the preprocessing fitted on the training split to the test split.
# Assumption: X_test_ready and y_test_enc are not defined in the cells shown here,
# so they are rebuilt with the already-fitted imputers, encoder, and label encoder.
X_test_num = num_imputer.transform(X_test[num_cols])
X_test_cat = cat_imputer.transform(X_test[cat_cols])
X_test_cat_enc = ohe.transform(X_test_cat)
X_test_ready = np.hstack([X_test_num, X_test_cat_enc])
y_test_enc = le.transform(y_test)

# Predict probabilities and labels
y_proba = model.predict_proba(X_test_ready)[:, 1]  # probability for "Introvert" (label 1; LabelEncoder sorts classes alphabetically)
y_pred = model.predict(X_test_ready)

# Reports
print(classification_report(y_test_enc, y_pred, target_names=le.classes_))
roc_auc = roc_auc_score(y_test_enc, y_proba)
print(f"ROC AUC: {roc_auc:.3f}")

              precision    recall  f1-score   support

   Extrovert       0.94      0.90      0.92       373
   Introvert       0.90      0.93      0.92       352

    accuracy                           0.92       725
   macro avg       0.92      0.92      0.92       725
weighted avg       0.92      0.92      0.92       725

ROC AUC: 0.955
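
The rows of the report follow the order of `le.classes_`, and column 1 of `predict_proba` belongs to the class encoded as 1. A quick way to check which personality that is, sketched here with a hypothetical stand-alone encoder rather than the notebook's fitted `le`:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels using the same two class names as the dataset.
labels = np.array(["Introvert", "Extrovert", "Extrovert", "Introvert"])

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

# LabelEncoder assigns codes in sorted order of the class names.
class_to_code = {name: int(code) for code, name in enumerate(encoder.classes_)}
print(class_to_code)  # {'Extrovert': 0, 'Introvert': 1}

# Index of the predict_proba column that holds P(Introvert).
introvert_column = list(encoder.classes_).index("Introvert")
print(introvert_column)  # 1
```

Looking the column up by name, instead of hard-coding `1`, keeps the code correct even if the class names change.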


Not bad at all: the model reaches 92% accuracy and a ROC AUC of 0.955 on the held-out test set, so it is ready to be saved and used for prediction.

### Saving the necessary files

In [22]:
import joblib

artifacts = {
    "model": model,
    "num_imputer": num_imputer,
    "cat_imputer": cat_imputer,
    "ohe": ohe,
    "label_encoder": le,
    "num_cols": num_cols,
    "cat_cols": cat_cols,
}
joblib.dump(artifacts, "personality_artifacts.pkl")
print("All artifacts saved to personality_artifacts.pkl")

All artifacts saved to personality_artifacts.pkl
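
Before relying on a saved bundle, it can help to confirm that it survives a round trip through disk. The sketch below uses a hypothetical miniature bundle (a single fitted imputer plus a `versions` entry that the notebook does not store) and a temporary directory, so it does not touch `personality_artifacts.pkl`.

```python
import os
import tempfile

import joblib
import numpy as np
import sklearn
from sklearn.impute import SimpleImputer

# Hypothetical miniature bundle: one fitted imputer and some metadata.
imputer = SimpleImputer(strategy="median").fit(np.array([[1.0], [np.nan], [5.0]]))
bundle = {
    "num_imputer": imputer,
    "num_cols": ["Time_spent_Alone"],
    # Assumption: recording library versions is an extra, not in the notebook.
    "versions": {"sklearn": sklearn.__version__},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_artifacts.pkl")
    joblib.dump(bundle, path)
    restored = joblib.load(path)

# The restored imputer fills a missing value with the training median.
sample = np.array([[np.nan]])
print(restored["num_imputer"].transform(sample))  # median of 1 and 5 -> [[3.]]
```

Pickled estimators are generally only dependable when loaded with the same library versions that saved them, which is why storing the versions next to the artifacts can be useful.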


### Predicting personality type

In [26]:
import joblib
import pandas as pd
import numpy as np

# Load artifacts
art = joblib.load("personality_artifacts.pkl")
model          = art["model"]
num_imputer    = art["num_imputer"]
cat_imputer    = art["cat_imputer"]
ohe            = art["ohe"]
le             = art["label_encoder"]
num_cols       = art["num_cols"]
cat_cols       = art["cat_cols"]

def ask(prompt, cast=str, options=None):
    """Tiny helper to get clean input."""
    while True:
        try:
            val = cast(input(prompt).strip())  # strip stray whitespace before casting
            if options and val not in options:
                raise ValueError(f"Must be one of {options}")
            return val
        except Exception as e:
            print(f"Error: {e}. Try again.\n")

def predict_personality():
    print("\n🎭  Introvert vs Extrovert Predictor")
    print("Answer a few quick questions:\n")

    # Gather answers
    answers = {
        "Time_spent_Alone":       ask("Hours spent alone per day (0‑24): ", float),
        "Stage_fear":             ask("Stage fear? (Yes/No): ", str.title, ["Yes", "No"]),
        "Social_event_attendance":ask("Social events per week (0‑10): ", int),
        "Going_outside":          ask("Trips outside per day (0‑10): ", int),
        "Drained_after_socializing": ask("Feel drained after socializing? (Yes/No): ",
                                         str.title, ["Yes", "No"]),
        "Friends_circle_size":    ask("Number of close friends (0‑30): ", int),
        "Post_frequency":         ask("Social‑media posts per week (0‑30): ", int),
    }

    # Build one‑row DataFrame in correct column order
    row = pd.DataFrame([answers])[num_cols + cat_cols]

    # --- Re‑run exact preprocessing -------------------------------------------------
    X_num = num_imputer.transform(row[num_cols])
    X_cat = cat_imputer.transform(row[cat_cols])
    X_cat_enc = ohe.transform(X_cat)
    X_ready = np.hstack([X_num, X_cat_enc])

    # --- Predict --------------------------------------------------------------------
    proba = model.predict_proba(X_ready)[0]
    idx = proba.argmax()
    pred_label = le.inverse_transform([idx])[0]
    confidence = proba[idx]

    print(f"\n🔮 You are likely an **{pred_label}** (confidence {confidence:.0%}).")

predict_personality()


🎭  Introvert vs Extrovert Predictor
Answer a few quick questions:



Hours spent alone per day (0‑24):  9
Stage fear? (Yes/No):  yes
Social events per week (0‑10):  4
Trips outside per day (0‑10):  7
Feel drained after socializing? (Yes/No):  no
Number of close friends (0‑30):  0
Social‑media posts per week (0‑30):  0



🔮 You are likely an **Introvert** (confidence 93%).
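
The prompts mention ranges such as 0 to 24, but `ask` only checks the type and the optional list of allowed values. A range check could be sketched as follows; `parse_number` is a hypothetical helper that is not part of the notebook.

```python
def parse_number(raw, cast=float, lo=None, hi=None):
    """Cast raw text to a number and check it against an optional range."""
    value = cast(raw.strip())
    if lo is not None and value < lo:
        raise ValueError(f"Must be at least {lo}")
    if hi is not None and value > hi:
        raise ValueError(f"Must be at most {hi}")
    return value


print(parse_number(" 9 ", float, 0, 24))  # 9.0

try:
    parse_number("25", float, 0, 24)
except ValueError as err:
    print(f"Rejected: {err}")  # Rejected: Must be at most 24
```

Because `ask` already retries on any exception raised by `cast`, passing something like `lambda s: parse_number(s, float, 0, 24)` as the `cast` argument would make it re-prompt on out-of-range answers.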


### The End:

This is the end of this project notebook. Experiment with the code and contribute to help improve the model and its implementation. You can browse more free, open-source projects in our GitHub repository: https://github.com/Infinitode/OPEN-ARC. If you like this project, star the repo and contribute your own implementation, or help others in the community.

~ Infinitode