# GenAI Application: Hidden Features Recognition in Facial Expressions
## Yilong Liu

### 1. Literature Review
To begin, it is essential to review existing literature exploring the relationship between specific facial features and socially relevant outcomes, such as decisions involving the death penalty or electoral success.

#### Key Studies

A thorough literature review has identified several influential studies pertinent to this research:

(1). Todorov et al. (2005) – Electoral Success

(2). Jaeger et al. (2020) – Electoral Success

(3). Wilson & Rule (2015) – Criminal Sentencing

(4). Eberhardt et al. (2006) – Criminal Sentencing and Race

(5). Columbia University Study (2023) – Jurors' Sentencing Bias

(6). Mueller & Mazur (1996) – Military Leadership (West Point Cadets)

#### Experimental Methodology

Among these studies, the first three papers, along with the fifth, involved experimental designs in which participants evaluated photographs of individuals. For example, Wilson and Rule (2015) required participants to rate photographs of convicted criminals who had received either the death penalty or life imprisonment.

#### Analysis of Specific Facial Features

Notably, the fourth and sixth studies provided detailed analyses of specific facial features, including:

(1). Skin tone

(2). Lip fullness

(3). Nose width

(4). Jawline strength

(5). Brow prominence

(6). Facial width-to-height ratio

#### Significant Findings

Eberhardt et al. (2006): Defendants with more stereotypically Afrocentric facial characteristics, such as darker skin tones and broader noses, received harsher criminal sentences, notably the death penalty. This underscores racial biases in judicial decision-making.

Mueller and Mazur (1996): Facial indicators of dominance, such as pronounced jawlines, prominent eyebrows, and higher facial width-to-height ratios, strongly predicted leadership attainment among West Point cadets, highlighting the significant influence of facial cues on professional advancement.

#### Conclusion

Collectively, these studies underscore the substantial impact facial attributes can have on social perceptions, judgments, and outcomes. Recognizing implicit biases associated with physical appearance is critical when examining decision-making and success in social and organizational contexts.





### 2. Coding Practice

In [None]:
# Step 0: Prepare and import necessary libraries
import os
import random
import numpy as np
import pandas as pd
import concurrent.futures
import ollama
import json
import torch
from PIL import Image
from transformers import BlipProcessor,BlipForConditionalGeneration 
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


In [5]:
# Step 1: Set up a function to get image file paths from a directory.
def get_image_path(directory):
    valid_extensions = ['.jpg', '.jpeg', '.png', '.bmp']
    return [os.path.join(directory, file) for file in os.listdir(directory) if file.lower().endswith(tuple(valid_extensions))]

# Step 2: Intialize BLIP image captioning model to describe images.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
caption_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def caption_image(image_path):
    try:
        image = Image.open(image_path).convert("RGB")
    except Exception as e:
        print(f"Error opening image {image_path}: {e}")
        return ""
    inputs = processor(image, return_tensors = "pt")
    output_ids = caption_model.generate(**inputs)
    caption = processor.decode(output_ids[0], skip_special_tokens = True)
    print(f"Caption for {image_path}:{caption}")
    return caption

# Step 3: Extrac facial features using an LLM from Ollama
def extract_facial_features(image_path):
    default_features = {
        "facial_expression": None,
        "face_shape": None,
        "beard_presence": False,
        "lip_fullness": None,
        "nose_width": None,
        "jawline_strength": None,
        "brow_prominence": None,
        "facial_width_to_height_ratio": 0.0
    }
     
    caption = caption_image(image_path)

    prompt = f"""
You are an expert in facial analysis analyzing portrait of 18th century British naval officers.Based on the following portrait description, extract the facial features intoa JSON object with the following keys:
- facial_expression (e.g., "serious", "smiling", "neutral")
- face_shape (e.g., "oval", "round", "square")
- beard_presence (boolean: true or false)
- lip_fullness (e.g., "thin", "average", "full")
- nose_width (e.g., "narrow", "average", "strong")
- jawline_strength (e.g., "weak", "average", "strong")
- brow_prominence (e.g., "subtle", "prominent")
- facial_width_to_height_ratio (a float value)

Portrait description: "{caption}"

Respond with only the JSON object, and nothing else.
"""

    response = ollama.generate(model = 'llama3', prompt = prompt)
    raw_response = response.get('response','').strip()


    try:
        features = json.loads(raw_response)
    except Exception as e:
        features = {}

    for key, default in default_features.items():
        if key not in features or features[key] is None:
            features[key] = default
    
    if isinstance(features['beard_presence'], bool):
        features['board_presence'] = int(features['beard_presence'])
    return features

# Step 4: Build the complete dataset
def build_dataset(success_dir, failure_dir, max_workers = 8):
    data = []
    all_paths = [(path,1) for path in get_image_path(success_dir)]+[(path,0) for path in get_image_path(failure_dir)]

    with concurrent.futures.ThreadPoolExecutor(max_workers = max_workers) as executor:
        future_to_info = {executor.submit(extract_facial_features, path): (path, label) for path, label in all_paths}
        for future in concurrent.futures.as_completed(future_to_info):
            path, label = future_to_info[future]
            try:
                feats = future.result()
                feats['label'] = label
                data.append(feats)
            except Exception as exc:
                print(f"Image {path} generated an exception: {exc}")
    return pd.DataFrame(data)

# Step 5: Preprocess the dataset.
def preprocess_dataset(df):
    if "beard_presence" in df.columns:
        df['beard_presence'] = df['beard_presence'].apply(lambda x: int(x) if isinstance(x, bool) or x is not None else 0)

    categorical_cols = [
        'facial_expression', 'face_shape',
        'lip_fullness', 'nose_width', 
        'jawline_strength', 'brow_prominence'
    ]
    df = pd.get_dummies(df, columns = categorical_cols, dummy_na = True)
    X = df.drop(columns = ['label'])
    y = df['label']
    return X,y

# Step 6: Perform Bootstrapping to estimate feature importance.
def bootstrap_feature_importance(X,y, B=100, test_size = 0.2):
    feature_coef = {col: [] for col in X.columns}
    for b in range(B):
        X_train, X_test, y_train, y_test = train_test_split(X,y, test_size = test_size, random_state = np.random.randint(0,10000))
        model = LogisticRegression(max_iter = 1000)
        model.fit(X_train, y_train)
        coefs = model.coef_[0]
        for col, coef in zip(X.columns, coefs):
            feature_coef[col].append(coef)

        feature_importance = {}
        for col, coef_list in feature_coef.items():
            feature_importance[col] = {
                "mean_coef": np.mean(coef_list),
                "std_coef": np.std(coef_list)
            }

        return feature_importance

# Step 7: Select features based on a threshold.    
def select_features_split(feature_importance, threshold=0.1):
    """
    Split features into two groups:
      - Selected features: those with absolute mean coefficient >= threshold.
      - Remaining features: the rest.
    """
    selected = {}
    remaining = {}
    for feature, stats in feature_importance.items():
        if abs(stats["mean_coef"]) >= threshold:
            selected[feature] = stats
        else:
            remaining[feature] = stats
    return selected, remaining



In [None]:
# Step 8: Run the complete pipeline
success_dir = "success-spur/features/imgs/successful"
failure_dir = "success-spur/features/imgs/unsuccessful"

df = build_dataset(success_dir, failure_dir)
print('Data built with shape:', df.shape)

X, y = preprocess_dataset(df)


B=100
feature_importance = bootstrap_feature_importance(X,y,B=B)
print("Estimated feature importance (all features):")

for feature, stats in feature_importance.items():
    print(f"{feature}: Mean Coefficient = {stats['mean_coef']:.3f}, Std= {stats['std_coef']:.3f}")

threshold = 0.1
selected_features, remaining_features = select_features_split(feature_importance, threshold = threshold)

print("\nSelected features (|mean_coef| >= {:.2f}):".format(threshold))
for feature, stats in selected_features.items():
    print(f"{feature}: Mean Coefficient = {stats['mean_coef']:.3f}, Std = {stats['std_coef']:.3f}")

print("\nRemaining features (|mean_coef| < {:.2f}):".format(threshold))
for feature, stats in remaining_features.items():
    print(f"{feature}: Mean Coefficient = {stats['mean_coef']:.3f}, Std = {stats['std_coef']:.3f}")

### 3. Result Analysis
#### Data Overview:
A bootstrap method and feature selection process were applied to a dataset comprising 50 samples with 10 facial features. These features, selected based on literature analyzing West Point portraits, include facial width-to-height ratio, presence of beard, facial expression, face shape, lip fullness, nose width, jawline strength, and brow prominence. Additionally, dummy variables indicating missing or unclear feature extraction (marked as "nan") were included.
#### Key Influential Features:
(1). Beard Presence / Board Presence：

Both “beard_presence” and “board_presence” show very high positive coefficients (0.761). This indicates that, in your sample, having a beard is strongly associated with a higher likelihood of career success.

(2). Lip Fullness:

The dummy variables for lip fullness show a large contrast: “lip_fullness_average” has a coefficient of +0.496, while “lip_fullness_thin” is -0.496. This suggests that an average lip fullness is linked with success, whereas thin lips are associated with a lower chance.

(3). Facial Expression:

“facial_expression_neutral” has a positive coefficient (+0.354) and “facial_expression_serious” a negative coefficient (-0.354). This implies that, relative to the reference (or missing) group, a neutral expression may be more favorable than a serious one in predicting success.

(4). Eyebrow Prominence:

The “brow_prominence_prominent” dummy shows a positive coefficient (+0.328) and “brow_prominence_subtle” a negative coefficient (-0.328). This suggests that prominent eyebrows may be a positive indicator, while subtle eyebrows may be less favorable.

(5). Low-Impact Feature:

In contrast, features such as facial width-to-height ratio (–0.013), face shape (both “face_shape_oval” and “face_shape_nan” at 0.000), nose width (0.000), and jawline strength (average: –0.037; strong: 0.037) show near-zero coefficients, indicating that these features have little influence on the model’s predictions.
#### Feature Selection:
Applying a threshold of |mean_coef| ≥ 0.10, the following features were identified as influential:

(1). Beard presence

(2). Lip fullness (average and thin)

(3). Facial expression (neutral and serious)

(4). Brow prominence (prominent and subtle)

Features such as facial width-to-height ratio, face shape, nose width, and jawline strength were deemed less significant and thus excluded from further consideration.
#### Interpretation:
The analysis highlights several facial characteristics strongly correlated with perceptions of career success. Beard presence emerges as the most robust predictor, indicating a societal bias associating beards with positive attributes like maturity or authority. The preference for average lip fullness over thin lips might reflect cultural perceptions of balanced or normative facial features as advantageous. Neutral facial expressions appear to convey approachability or composure, in contrast to serious expressions, which might suggest rigidity or less desirable traits. Prominent eyebrows potentially signify dominance or confidence, enhancing perceived leadership qualities.

#### Conclusion:
Overall, the analysis underscores the significance of specific facial features in influencing career success perceptions, notably beard presence, lip fullness, facial expression, and eyebrow prominence. However, given the subjective nature of facial feature extraction and interpretation, these results should be cautiously interpreted. Future research could further investigate these findings using diverse and larger samples to validate generalizability and understand underlying societal biases.
