## Problem Formulation

### Context
Investors need a lightweight, data-driven tool to assess "grit" in business leaders and CEO funding applicants. Grit (perseverance and passion for long-term goals) is a strong predictor of long-term business success.

### Modeling Objectives

**Regression Task:** Predict continuous grit score (1-5 scale)
- Success criteria: R² > 0.3, RMSE < 0.6
- Justification: R² > 0.3 means explaining 30%+ of the variance in grit, which is meaningful for psychological constructs. RMSE < 0.6 is 12% of the scale maximum (0.6 / 5), or 15% of its 4-point range (1 to 5).

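**Classification Task:** Predict high/low grit (binary)
- Success criteria: Accuracy > 70%, F1 > 0.70
- Justification: 70%+ accuracy clearly beats the 50% expected from random guessing when the two classes are roughly balanced. F1 (the harmonic mean of precision and recall) stays high only when both are high, so it rewards balanced precision/recall for investor decisions.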

**Business Objective:** Minimize survey length while maintaining R² > 0.40
- Target: Reduce from 62 questions (Big Five + Grit items) to ≤15 questions
- Constraint: Must maintain R² > 0.40 (higher bar for practical deployment)
- Trade-off: Balance prediction accuracy vs. respondent burden/completion rate
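One way to explore the survey-length trade-off is greedy forward selection: repeatedly add the single item that most improves cross-validated R², and stop once the R² > 0.40 constraint is met or the 15-item budget is used up. The sketch below is an assumption for illustration, not the project's method: it runs on synthetic Likert-style responses, and the helper `forward_select_items` and its stopping rule are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score


def forward_select_items(X, y, r2_target=0.40, max_items=15, cv=None):
    """Greedy forward selection over the columns of X (hypothetical helper).

    At each step, add the column that gives the highest mean cross-validated R²;
    stop once that R² exceeds r2_target or max_items columns have been chosen.
    Returns the chosen column indices and the R² reached after each addition.
    """
    if cv is None:
        cv = KFold(n_splits=5, shuffle=True, random_state=42)
    selected, history = [], []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_items:
        # Score every candidate item when added to the current selection.
        scores = {
            j: cross_val_score(LinearRegression(), X[:, selected + [j]], y,
                               cv=cv, scoring="r2").mean()
            for j in remaining
        }
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        history.append(scores[best])
        if scores[best] > r2_target:
            break
    return selected, history


# Synthetic responses: 20 Likert-style items (1-5); only items 0-3 carry signal.
rng = np.random.default_rng(0)
X_items = rng.integers(1, 6, size=(500, 20)).astype(float)
y_score = X_items[:, :4] @ np.full(4, 0.3) + rng.normal(0, 0.6, 500)

items, r2_path = forward_select_items(X_items, y_score)
print("chosen items:", items)
print("CV R² after each addition:", np.round(r2_path, 3))
```

Because the same cross-validation scores are used to choose the items, the final R² is optimistic; a held-out test set or nested cross-validation would give a fairer estimate of the reduced survey's performance.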

### Current Baseline Performance
Five-fold cross-validation averages using the Big Five trait scores:
- Linear Regression: R² = 0.489, RMSE = 0.498 ✓ (exceeds targets)
- Logistic Regression: Accuracy = 76.3%, F1 = 0.767 ✓ (exceeds targets)
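To score every later model against the same thresholds, the criteria can be written down once and checked programmatically. This is a minimal sketch; the `TARGETS` dictionary and the `meets_targets` helper are hypothetical names, and the thresholds are copied from the success criteria and business objective above.

```python
# Thresholds copied from the success criteria and business objective above.
TARGETS = {
    "regression": {"r2": 0.30, "rmse": 0.60},
    "classification": {"accuracy": 0.70, "f1": 0.70},
    "business": {"r2": 0.40},
}


def meets_targets(task, metrics):
    """Return {metric: passed} for one task (hypothetical helper).

    Every criterion is a strict inequality; higher is better except for RMSE.
    """
    results = {}
    for name, threshold in TARGETS[task].items():
        value = metrics[name]
        results[name] = value < threshold if name == "rmse" else value > threshold
    return results


# RMSE target as a share of the top of the 1-5 scale and of its 4-point range.
rmse_share_of_max = TARGETS["regression"]["rmse"] / 5
rmse_share_of_range = TARGETS["regression"]["rmse"] / (5 - 1)

# Baseline cross-validation averages reported in this notebook.
print(meets_targets("regression", {"r2": 0.489, "rmse": 0.498}))
print(meets_targets("business", {"r2": 0.489}))
print(meets_targets("classification", {"accuracy": 0.763, "f1": 0.767}))
print(f"{rmse_share_of_max:.0%} of the maximum, {rmse_share_of_range:.0%} of the range")
```

With the baseline numbers, every check returns `True`, and the RMSE threshold works out to 12% of the scale maximum and 15% of its range.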

### Importing required libraries

In [65]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, KFold, StratifiedKFold, cross_validate
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

### Reading in the dataset

In [66]:
df = pd.read_csv('data/cleaned_grit_data.csv')
df.head()

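|   | index | country | surveyelapse | GS1 | GS2 | GS3 | GS4 | GS5 | GS6 | GS7 | ... | browser | introelapse | testelapse | Extraversion | Neuroticism | Agreeableness | Conscientiousness | Openness | Grit | highgrit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | JP | 340 | 5 | 2 | 3 | 3 | 2 | 4 | 2 | ... | Firefox | 3 | 337 | 1.2 | 2.5 | 3.3 | 3.8 | 3.0 | 3.083333 | 0 |
| 1 | 6 | US | 126 | 4 | 1 | 3 | 2 | 1 | 5 | 1 | ... | Chrome | 36 | 212 | 4.0 | 2.0 | 3.6 | 3.4 | 5.0 | 2.583333 | 0 |
| 2 | 8 | EU | 130 | 5 | 3 | 3 | 5 | 4 | 5 | 5 | ... | Microsoft Internet Explorer | 14 | 183 | 4.4 | 4.5 | 4.7 | 4.0 | 4.3 | 4.25 | 1 |
| 3 | 10 | AE | 592 | 5 | 3 | 3 | 2 | 4 | 3 | 3 | ... | Chrome | 726 | 311 | 3.0 | 4.6 | 3.6 | 3.8 | 3.4 | 3.166667 | 0 |
| 4 | 11 | AU | 217 | 3 | 1 | 1 | 2 | 1 | 3 | 1 | ... | Firefox | 376 | 407 | 2.0 | 1.1 | 3.4 | 3.9 | 4.4 | 2.0 | 0 |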


In [67]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2200 entries, 0 to 2199
Columns: 101 entries, index to highgrit
dtypes: float64(6), int64(82), object(13)
memory usage: 1.7+ MB


### Selecting features and splitting the data into train and test sets

In [68]:
X = df[['Openness', 'Conscientiousness', 'Extraversion', 'Agreeableness', 'Neuroticism']]
y = df['Grit']

In [69]:
X.head()

|   | Openness | Conscientiousness | Extraversion | Agreeableness | Neuroticism |
|---|---|---|---|---|---|
| 0 | 3.0 | 3.8 | 1.2 | 3.3 | 2.5 |
| 1 | 5.0 | 3.4 | 4.0 | 3.6 | 2.0 |
| 2 | 4.3 | 4.0 | 4.4 | 4.7 | 4.5 |
| 3 | 3.4 | 3.8 | 3.0 | 3.6 | 4.6 |
| 4 | 4.4 | 3.9 | 2.0 | 3.4 | 1.1 |


In [70]:
y.head()

0    3.083333
1    2.583333
2    4.250000
3    3.166667
4    2.000000
Name: Grit, dtype: float64

In [71]:
print("X shape: ", X.shape)
print("y shape: ", y.shape)

X shape:  (2200, 5)
y shape:  (2200,)


In [72]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [73]:
print("X_train shape: ", X_train.shape)
print("y_train shape: ", y_train.shape)
print("X_test shape: ", X_test.shape)
print("y_test shape: ", y_test.shape)


X_train shape:  (1760, 5)
y_train shape:  (1760,)
X_test shape:  (440, 5)
y_test shape:  (440,)


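We first compare three baseline models, each fit on the training set and evaluated on the test set:

1. Mean model (always predicts the training-set mean)
2. Median model (always predicts the training-set median)
3. Multiple linear regression on the five Big Five traits

We report R², mean absolute error (MAE), and mean squared error (MSE).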
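For reference, the three metrics follow directly from their definitions: MAE averages absolute residuals, MSE averages squared residuals, and R² = 1 − SS_res / SS_tot compares the model's squared error with that of always predicting the mean of the true values. The sketch recomputes them in NumPy on a few made-up scores (not taken from the dataset) and can be checked against the scikit-learn functions used in `evaluate_model`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def metrics_by_hand(y_true, y_pred):
    """Compute R², MAE and MSE directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))                 # average absolute error
    mse = np.mean(residuals ** 2)                    # average squared error
    ss_res = np.sum(residuals ** 2)                  # variation the model leaves unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # variation around the mean of y_true
    r2 = 1 - ss_res / ss_tot
    return r2, mae, mse


# Illustrative grit scores and predictions (made up, not from the dataset).
y_true = [3.08, 2.58, 4.25, 3.17, 2.00]
y_pred = [3.20, 2.90, 3.80, 3.30, 2.60]

r2, mae, mse = metrics_by_hand(y_true, y_pred)
print(f"R²={r2:.4f}  MAE={mae:.4f}  MSE={mse:.4f}")
print(f"sklearn: R²={r2_score(y_true, y_pred):.4f}  "
      f"MAE={mean_absolute_error(y_true, y_pred):.4f}  MSE={mean_squared_error(y_true, y_pred):.4f}")
```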

In [74]:
def evaluate_model(true, pred, name):
    print(f"{name} Model Results")
    print(f"R²  : {r2_score(true, pred):.4f}")
    print(f"MAE : {mean_absolute_error(true, pred):.4f}")
    print(f"MSE : {mean_squared_error(true, pred):.4f}")

### Mean model

In [75]:
mean_pred = np.full(shape=y_test.shape, fill_value=y_train.mean())

evaluate_model(y_test, mean_pred, "Mean Baseline")

Mean Baseline Model Results
R²  : -0.0003
MAE : 0.5718
MSE : 0.4743


### Median model

In [76]:
median_pred = np.full(shape=y_test.shape, fill_value=y_train.median())

evaluate_model(y_test, median_pred, "Median Baseline")

Median Baseline Model Results
R²  : -0.0060
MAE : 0.5718
MSE : 0.4770


### Linear Regression

In [77]:
lr = LinearRegression()
lr.fit(X_train, y_train)
lr_pred = lr.predict(X_test)

In [78]:
evaluate_model(y_test, lr_pred, "Linear Regression")

Linear Regression Model Results
R²  : 0.4855
MAE : 0.3951
MSE : 0.2440


The baseline results give a first sense of how well grit can be predicted with simple statistical approaches. The mean and median models perform poorly: their R² values are at or just below zero (-0.0003 and -0.0060), as expected for a model that predicts the same constant for everyone, and their errors are nearly identical (both have an MAE of 0.5718). Neither captures any variation in grit across individuals. In contrast, the linear regression model using the Big Five personality traits improves substantially, reaching R² ≈ 0.49 with a lower MAE (0.395 vs. 0.572) and MSE (0.244 vs. 0.474). This suggests that nearly half of the variability in grit can be explained by personality traits alone, with conscientiousness likely playing a major role, consistent with what we observed in our EDA. Overall, trivial models offer no predictive value, while linear regression provides a meaningful baseline that justifies moving on to more advanced modeling techniques to improve prediction and understand the underlying relationships.
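The near-zero R² of the constant baselines follows from R² = 1 − SS_res / SS_tot: a constant equal to the mean of the evaluated targets scores 0, and any other constant, including the training-set mean or median, scores below 0. scikit-learn's `DummyRegressor` builds the same constant baselines as the `np.full` approach; the sketch below uses synthetic grit-like scores, so its numbers are illustrative only.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic grit-like scores on the 1-5 scale; the features are random because
# DummyRegressor ignores them and only uses y_train.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = np.clip(rng.normal(3.3, 0.7, size=1000), 1, 5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scores = {}
for strategy in ("mean", "median"):
    dummy = DummyRegressor(strategy=strategy).fit(X_train, y_train)
    scores[strategy] = r2_score(y_test, dummy.predict(X_test))

# The best possible constant is the mean of the evaluated targets, which scores 0.
scores["test_mean"] = r2_score(y_test, np.full_like(y_test, y_test.mean()))
print({name: round(value, 4) for name, value in scores.items()})
```

Both training-set baselines land at or slightly below zero, mirroring the notebook's -0.0003 and -0.0060.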

### Using only 'Conscientiousness' in the linear model to see its impact

In [79]:
X = df[['Conscientiousness']]
y = df['Grit']

In [80]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [81]:
lr = LinearRegression()
lr.fit(X_train, y_train)
lr_pred = lr.predict(X_test)

In [82]:
evaluate_model(y_test, lr_pred, "Linear Regression")

Linear Regression Model Results
R²  : 0.4278
MAE : 0.4243
MSE : 0.2713


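Using conscientiousness alone gives R² = 0.428 on the same test split, compared with 0.486 for all five traits, so the other traits add roughly 0.06 of explained variance. We will therefore keep all five personality traits in the main model and analyze their individual contributions later.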
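The comparison above rests on a single 80/20 split. A sketch of the same one-trait vs. five-trait comparison under five-fold cross-validation, which averages over several splits, is below. It uses synthetic trait scores and an assumed grit-generating formula, so only the procedure, not the numbers, carries over.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

traits = ["Openness", "Conscientiousness", "Extraversion", "Agreeableness", "Neuroticism"]

# Synthetic trait scores and a grit-like target in which conscientiousness has the
# largest effect and two other traits add smaller ones (assumed for illustration).
rng = np.random.default_rng(1)
df_sim = pd.DataFrame(rng.normal(3.5, 0.7, size=(1000, 5)), columns=traits)
df_sim["Grit"] = (0.6 * df_sim["Conscientiousness"]
                  - 0.2 * df_sim["Neuroticism"]
                  + 0.15 * df_sim["Extraversion"]
                  + rng.normal(0, 0.4, size=1000))

# Reusing one KFold object with a fixed random_state gives both models identical folds.
folds = KFold(n_splits=5, shuffle=True, random_state=42)


def cv_r2(columns):
    """Mean five-fold R² of a linear regression on the given trait columns."""
    return cross_val_score(LinearRegression(), df_sim[columns], df_sim["Grit"],
                           cv=folds, scoring="r2").mean()


r2_single = cv_r2(["Conscientiousness"])
r2_all = cv_r2(traits)
print(f"Conscientiousness only: {r2_single:.3f}  all five: {r2_all:.3f}  "
      f"gain: {r2_all - r2_single:.3f}")
```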

We will use five-fold cross-validation to evaluate the performance of a baseline linear regression model on all five traits.
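Five-fold cross-validation splits the rows into five folds, fits the model on four of them and scores it on the held-out fifth, rotating so that every row is tested exactly once. The loop below spells this out and checks that it reproduces the per-fold R² from `cross_validate`; it runs on synthetic data standing in for the trait columns rather than on the grit dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_validate

# Synthetic data standing in for the five trait columns and the grit score.
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 5))
y = X @ np.array([0.1, 0.6, 0.2, 0.0, -0.3]) + rng.normal(0, 0.5, size=200)

folds = KFold(n_splits=5, shuffle=True, random_state=42)

manual_r2, tested_rows = [], []
for train_idx, test_idx in folds.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])  # fresh model for each fold
    manual_r2.append(r2_score(y[test_idx], model.predict(X[test_idx])))
    tested_rows.extend(test_idx.tolist())

# cross_validate runs the same loop; with a single scoring string the key is "test_score".
auto_r2 = cross_validate(LinearRegression(), X, y, cv=folds, scoring="r2")["test_score"]
print("manual:        ", np.round(manual_r2, 4))
print("cross_validate:", np.round(auto_r2, 4))
```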

In [83]:
X = df[['Openness', 'Conscientiousness', 'Extraversion', 'Agreeableness', 'Neuroticism']]

model = LinearRegression()

folds = KFold(n_splits = 5, shuffle = True, random_state = 42)

results = cross_validate(model, X, y, cv = folds, scoring=['neg_mean_absolute_error', 'neg_root_mean_squared_error', 'r2'])

mae = -results['test_neg_mean_absolute_error']
rmse = -results['test_neg_root_mean_squared_error']
r2 = results['test_r2']

print("MAE per fold:", mae)
print("RMSE per fold:", rmse)
print("R^2 per fold:", r2)

print("\nAverage scores:")
print(f"MAE: {np.mean(mae)}")
print(f"RMSE: {np.mean(rmse)}")
print(f"R^2: {np.mean(r2)}")

MAE per fold: [0.39505942 0.39641015 0.39037359 0.41273381 0.39917258]
RMSE per fold: [0.49393897 0.50016897 0.48355819 0.51599462 0.49829142]
R^2 per fold: [0.48547012 0.49832952 0.49514649 0.46627077 0.49973973]

Average scores:
MAE: 0.39874991183193725
RMSE: 0.49839043182801124
R^2: 0.4889913245440196


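The mean MAE of 0.399 indicates that predictions are off by about 0.4 points on the 1-5 grit scale on average, which is relatively small. The RMSE, which penalizes large errors more heavily than MAE, is 0.498; because it exceeds the MAE, error sizes vary and some predictions miss by more than the average. The mean R² indicates that the model explains about 49% of the variability in grit, and the per-fold values (0.466 to 0.500) are close together, so the result does not hinge on one particular split. Both regression targets (R² > 0.3, RMSE < 0.6) are met.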
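Because RMSE squares errors before averaging, it is always at least as large as MAE, and the two are equal only when every absolute error has the same size. The toy residuals below are made up for illustration; the last lines add a simulated normal-error case as a reference point for the RMSE/MAE ratio.

```python
import numpy as np


def mae(errors):
    return np.mean(np.abs(errors))


def rmse(errors):
    return np.sqrt(np.mean(np.square(errors)))


even = np.array([0.4, -0.4, 0.4, -0.4])     # four errors of equal size
skewed = np.array([0.1, -0.1, 0.1, -1.3])   # three small errors and one large one
for name, e in [("even", even), ("skewed", skewed)]:
    print(f"{name}: MAE={mae(e):.3f}  RMSE={rmse(e):.3f}")

# For zero-mean normally distributed errors, RMSE / MAE tends to sqrt(pi / 2) ≈ 1.253.
normal = np.random.default_rng(0).normal(0, 0.5, size=100_000)
print(f"normal: RMSE/MAE={rmse(normal) / mae(normal):.3f}  sqrt(pi/2)={np.sqrt(np.pi / 2):.3f}")
```

Both toy sets have an MAE of 0.400, but RMSE rises from 0.400 to about 0.656 for the skewed set. The notebook's averages give 0.498 / 0.399 ≈ 1.25, close to the normal-error value, which suggests the errors are spread fairly evenly rather than dominated by a few extreme misses.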

We will use five-fold cross-validation to evaluate the performance of a baseline logistic regression model that predicts the binary highgrit label from the same five traits.
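`StratifiedKFold` is used instead of plain `KFold` so that each fold keeps roughly the same share of high-grit respondents as the full dataset. The check below illustrates this on a synthetic, deliberately imbalanced label vector (about 30% positives); the labels are an assumption for illustration, not the notebook's `highgrit` column.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Synthetic, deliberately imbalanced labels (about 30% positive) for illustration.
rng = np.random.default_rng(3)
y = (rng.random(1000) < 0.3).astype(int)
X = np.zeros((len(y), 1))   # fold assignment does not depend on the feature values


def fold_positive_rates(splitter):
    """Share of positive labels in each test fold produced by the splitter."""
    return np.array([y[test_idx].mean() for _, test_idx in splitter.split(X, y)])


stratified = fold_positive_rates(StratifiedKFold(n_splits=5, shuffle=True, random_state=42))
plain = fold_positive_rates(KFold(n_splits=5, shuffle=True, random_state=42))
print("overall:   ", round(y.mean(), 3))
print("stratified:", np.round(stratified, 3))
print("plain:     ", np.round(plain, 3))
```

The stratified folds stay within about half a percentage point of the overall rate, while plain `KFold` rates drift with chance.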

In [84]:
y_log = df["highgrit"]

model = LogisticRegression()

stratified_folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

results = cross_validate(model, X, y_log, cv=stratified_folds, scoring=['accuracy', 'precision', 'recall', 'f1'])

acc = results['test_accuracy']
prec = results['test_precision']
rec = results['test_recall']
f1 = results['test_f1']

print("Accuracy per fold:", acc)
print("Precision per fold:", prec)
print("Recall per fold:", rec)
print("F1-score per fold:", f1)

print("\nAverage scores:")
print(f"Accuracy: {np.mean(acc)}")
print(f"Precision: {np.mean(prec)}")
print(f"Recall: {np.mean(rec)}")
print(f"F1: {np.mean(f1)}")

Accuracy per fold: [0.75454545 0.74772727 0.75909091 0.78409091 0.76818182]
Precision per fold: [0.76146789 0.76056338 0.74369748 0.77489177 0.76651982]
Recall per fold: [0.74774775 0.72972973 0.7972973  0.80630631 0.78026906]
F1-score per fold: [0.75454545 0.74482759 0.76956522 0.79028698 0.77333333]

Average scores:
Accuracy: 0.7627272727272727
Precision: 0.761428069572373
Recall: 0.7722700278754091
F1: 0.7665117134388856


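Mean accuracy is 76.3%, so about three in four held-out predictions are correct, above the 70% target. Mean precision (0.761) and recall (0.772) are close, so the model does not favor one type of error over the other, and the mean F1 of 0.767 clears the 0.70 target.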
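Precision, recall and F1 are all functions of the confusion-matrix counts: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The snippet recomputes them from ten made-up labels and predictions (not from the dataset) and can be checked against scikit-learn's scorers.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Made-up labels and predictions for ten respondents (1 = high grit).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 1])

# For labels {0, 1}, confusion_matrix(...).ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)                            # share of predicted high-grit that are correct
recall = tp / (tp + fn)                               # share of actual high-grit that are found
f1 = 2 * precision * recall / (precision + recall)    # harmonic mean of the two
print(f"TP={tp} FP={fp} FN={fn} TN={tn}  precision={precision:.3f}  recall={recall:.3f}  F1={f1:.3f}")
```

Here TP = 4, FP = 2 and FN = 1, giving precision 0.667, recall 0.800 and F1 ≈ 0.727. Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so it stays high only when both are high.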