# Evolver Loop 2 Analysis: Why Linguistic Features Failed & Path Forward

**Goal:** Analyze why the linguistic features experiment underperformed and identify promising directions based on what's working.

**Key Questions:**
1. Why did linguistic features (0.6118) perform worse than baseline TF-IDF (0.6386)?
2. What patterns in the data can we exploit better?
3. What should be our next priority?

In [5]:
import pandas as pd
import numpy as np
import json
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter
import re

# Load data
print("Loading data...")
with open('/home/data/train.json', 'r') as f:
    train_data = json.load(f)

df_train = pd.DataFrame(train_data)
print(f"Training samples: {len(df_train)}")
print(f"Positive class rate: {df_train['requester_received_pizza'].mean():.3f}")

# Combine text for analysis
df_train['combined_text'] = df_train['request_title'].fillna('') + ' ' + df_train['request_text_edit_aware'].fillna('')
print(f"Average text length: {df_train['combined_text'].str.len().mean():.0f} characters")

Loading data...
Training samples: 2878
Positive class rate: 0.248
Average text length: 467 characters


In [6]:
# Analyze why linguistic features might have failed
# Let's examine the distribution of linguistic patterns

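# Define the patterns from the linguistic features experiment
# Note: these use substring matching (`word in text`), not regex or whole-word matching,
# so 'thanks' also counts as 'thank', 'blessing' as 'bless', and 'rent' matches 'currently'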
def count_gratitude(text):
    if pd.isna(text):
        return 0
    gratitude_words = ['thank', 'thanks', 'appreciate', 'grateful', 'bless', 'blessing']
    return sum(1 for word in gratitude_words if word in str(text).lower())

def count_need_words(text):
    if pd.isna(text):
        return 0
    need_words = ['need', 'desperate', 'urgent', 'emergency', 'starving', 'hungry', 'broke', 'bills', 'rent']
    return sum(1 for word in need_words if word in str(text).lower())

def count_reciprocity(text):
    if pd.isna(text):
        return 0
    reciprocity_words = ['pay it forward', 'help others', 'contribute', 'give back', 'return favor', 'when i can']
    return sum(1 for phrase in reciprocity_words if phrase in str(text).lower())

# Apply pattern detection
df_train['gratitude_count'] = df_train['combined_text'].apply(count_gratitude)
df_train['need_count'] = df_train['combined_text'].apply(count_need_words)
df_train['reciprocity_count'] = df_train['combined_text'].apply(count_reciprocity)

print("Pattern frequency analysis:")
print(f"Gratitude mentions - Mean: {df_train['gratitude_count'].mean():.2f}, Std: {df_train['gratitude_count'].std():.2f}")
print(f"Need words - Mean: {df_train['need_count'].mean():.2f}, Std: {df_train['need_count'].std():.2f}")
print(f"Reciprocity mentions - Mean: {df_train['reciprocity_count'].mean():.2f}, Std: {df_train['reciprocity_count'].std():.2f}")

# Check how often these patterns appear
print(f"\nPosts with gratitude: {(df_train['gratitude_count'] > 0).mean():.1%}")
print(f"Posts with need words: {(df_train['need_count'] > 0).mean():.1%}")
print(f"Posts with reciprocity: {(df_train['reciprocity_count'] > 0).mean():.1%}")

Pattern frequency analysis:
Gratitude mentions - Mean: 0.76, Std: 0.96
Need words - Mean: 0.91, Std: 0.95
Reciprocity mentions - Mean: 0.12, Std: 0.34

Posts with gratitude: 44.8%
Posts with need words: 59.2%
Posts with reciprocity: 11.8%
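
The counting functions above test `word in text`, which matches substrings rather than whole words, so some of the reported counts may be inflated. A minimal sketch of the difference, assuming a whole-word variant built on `re` word boundaries; `count_substring` and `count_whole_words` are hypothetical helpers for illustration, not code from the original experiment:

```python
import re

# Same need-word list as in the cell above
NEED_WORDS = ['need', 'desperate', 'urgent', 'emergency', 'starving',
              'hungry', 'broke', 'bills', 'rent']

def count_substring(text, words):
    """Original approach: count list entries that occur anywhere in the text."""
    lowered = str(text).lower()
    return sum(1 for w in words if w in lowered)

def count_whole_words(text, words):
    """Hypothetical variant: count list entries that occur as whole words or phrases."""
    lowered = str(text).lower()
    return sum(1 for w in words
               if re.search(r'\b' + re.escape(w) + r'\b', lowered))

sample = "I am currently a parent and apparently fine"
print(count_substring(sample, NEED_WORDS))    # 'rent' matches inside 'currently'
print(count_whole_words(sample, NEED_WORDS))  # no whole-word match
```

Whole-word matching trades false positives for false negatives: it would no longer match inflected forms such as "needed" or "thankful", so a stemmed or lemmatized vocabulary may be a better fit than either variant.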


In [7]:
# Analyze success rates by pattern presence

gratitude_success = df_train[df_train['gratitude_count'] > 0]['requester_received_pizza'].mean()
no_gratitude_success = df_train[df_train['gratitude_count'] == 0]['requester_received_pizza'].mean()

need_success = df_train[df_train['need_count'] > 0]['requester_received_pizza'].mean()
no_need_success = df_train[df_train['need_count'] == 0]['requester_received_pizza'].mean()

reciprocity_success = df_train[df_train['reciprocity_count'] > 0]['requester_received_pizza'].mean()
no_reciprocity_success = df_train[df_train['reciprocity_count'] == 0]['requester_received_pizza'].mean()

print("Success rates by pattern presence:")
print(f"With gratitude: {gratitude_success:.1%} vs Without: {no_gratitude_success:.1%} (Diff: {gratitude_success - no_gratitude_success:+.1%})")
print(f"With need words: {need_success:.1%} vs Without: {no_need_success:.1%} (Diff: {need_success - no_need_success:+.1%})")
print(f"With reciprocity: {reciprocity_success:.1%} vs Without: {no_reciprocity_success:.1%} (Diff: {reciprocity_success - no_reciprocity_success:+.1%})")

# Check how prevalent each pattern is among successful requests
print(f"\nPattern prevalence in successful requests:")
successful = df_train[df_train['requester_received_pizza'] == 1]
print(f"Gratitude in successes: {(successful['gratitude_count'] > 0).mean():.1%}")
print(f"Need words in successes: {(successful['need_count'] > 0).mean():.1%}")
print(f"Reciprocity in successes: {(successful['reciprocity_count'] > 0).mean():.1%}")

Success rates by pattern presence:
With gratitude: 28.0% vs Without: 22.3% (Diff: +5.8%)
With need words: 26.6% vs Without: 22.2% (Diff: +4.4%)
With reciprocity: 29.9% vs Without: 24.2% (Diff: +5.7%)

Pattern prevalence in successful requests:
Gratitude in successes: 50.5%
Need words in successes: 63.5%
Reciprocity in successes: 14.3%
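
The printed prevalences are enough to estimate how well each pattern could rank requests on its own. The sketch below assumes the reported scores (0.6118 and 0.6386) are ROC AUC, which the notebook does not state, and it uses only pattern presence (`count > 0`), not the raw counts. `binary_feature_auc` is a hypothetical helper; its inputs are the rounded values from the outputs above, so the results are approximate.

```python
def binary_feature_auc(prev_all, prev_pos, pos_rate):
    """Approximate ROC AUC of a 0/1 feature used alone as the score.

    prev_all: share of all posts that contain the pattern
    prev_pos: share of successful posts that contain it (true positive rate)
    pos_rate: share of posts that are successful
    """
    # Back out the share of unsuccessful posts with the pattern (false positive rate)
    prev_neg = (prev_all - pos_rate * prev_pos) / (1 - pos_rate)
    # A binary score has one interior ROC point (prev_neg, prev_pos);
    # with ties counted as half, the area is 0.5 + 0.5 * (TPR - FPR)
    return 0.5 + 0.5 * (prev_pos - prev_neg)

# Rounded values copied from the printed outputs above
patterns = [
    ("gratitude", 0.448, 0.505),
    ("need", 0.592, 0.635),
    ("reciprocity", 0.118, 0.143),
]
for name, prev_all, prev_pos in patterns:
    auc = binary_feature_auc(prev_all, prev_pos, pos_rate=0.248)
    print(f"{name:<12} AUC ~ {auc:.3f}")
```

Under these assumptions each indicator lands only slightly above 0.5, which is consistent with the small differences in success rate: the patterns are common, but they barely separate the two classes.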


In [8]:
# Let's examine the actual text of successful vs unsuccessful requests
# to understand what patterns we're missing

successful_text = df_train[df_train['requester_received_pizza'] == 1]['combined_text'].sample(3, random_state=42).tolist()
unsuccessful_text = df_train[df_train['requester_received_pizza'] == 0]['combined_text'].sample(3, random_state=42).tolist()

print("=== SAMPLE SUCCESSFUL REQUESTS ===")
for i, text in enumerate(successful_text, 1):
    print(f"\n{i}. {text[:300]}...")
    print(f"   Length: {len(text)} chars, Gratitude: {count_gratitude(text)}, Need: {count_need_words(text)}, Reciprocity: {count_reciprocity(text)}")

print("\n\n=== SAMPLE UNSUCCESSFUL REQUESTS ===")
for i, text in enumerate(unsuccessful_text, 1):
    print(f"\n{i}. {text[:300]}...")
    print(f"   Length: {len(text)} chars, Gratitude: {count_gratitude(text)}, Need: {count_need_words(text)}, Reciprocity: {count_reciprocity(text)}")

=== SAMPLE SUCCESSFUL REQUESTS ===

1. [request] just a hungry guy Hey all, I hate that I have to be this guy, but I know you people are incredibly generous. My name's Chris, I'm 20 and I live on my own (in good old Cincinnati, Ohio). I am out of money and food (I literally have 35 cents and a can of beans /no hobo). I'm currently applyi...
   Length: 482 chars, Gratitude: 0, Need: 2, Reciprocity: 0

2. [Request] New York, USA -- I just really love pizza! Hello!

I haven't had pizza in the longest time and I have this *huge* craving for it. If there is someone who is in dire need of food please skip me and go to them! But if not, I'd love to celebrate the end of a day with some awesome sauce pizza....
   Length: 301 chars, Gratitude: 0, Need: 1, Reciprocity: 0

3. [Request (sorta)] Reddit, if you go on facebook and "like" this picture of me "planking" I could win a free pizza buffet! http://www.facebook.com/InstituteSA/posts/270872372947463

There's the link! If I get the most likes I 

In [9]:
# Analyze TF-IDF performance vs pattern-based features
# Let's see what words are most predictive in TF-IDF

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Fit TF-IDF on the full training set to inspect the top features
vectorizer = TfidfVectorizer(max_features=1000, stop_words='english', ngram_range=(1,2))
X_tfidf = vectorizer.fit_transform(df_train['combined_text'])
y = df_train['requester_received_pizza']

# Train a simple linear model; its coefficients serve as per-feature weights
model = LogisticRegression(class_weight='balanced', random_state=42, max_iter=1000)
model.fit(X_tfidf, y)

# Get top positive and negative features
feature_names = vectorizer.get_feature_names_out()
coefficients = model.coef_[0]

# Indices of the 20 largest and 20 smallest coefficients (strongest success and failure features)
top_positive_idx = np.argsort(coefficients)[-20:]
top_negative_idx = np.argsort(coefficients)[:20]

print("Top 10 TF-IDF features predicting SUCCESS (pizza received):")
for idx in reversed(top_positive_idx[-10:]):
    print(f"  {feature_names[idx]:<20} : {coefficients[idx]:.3f}")

print("\nTop 10 TF-IDF features predicting FAILURE (no pizza):")
for idx in top_negative_idx[:10]:
    print(f"  {feature_names[idx]:<20} : {coefficients[idx]:.3f}")

Top 10 TF-IDF features predicting SUCCESS (pizza received):
  dominos              : 1.909
  rice                 : 1.835
  days                 : 1.760
  currently            : 1.742
  father               : 1.708
  tight                : 1.695
  surprise             : 1.690
  ve                   : 1.643
  daughter             : 1.595
  cover                : 1.483

Top 10 TF-IDF features predicting FAILURE (no pizza):
  say                  : -1.793
  friends              : -1.547
  friend               : -1.396
  final                : -1.361
  eating               : -1.281
  london               : -1.254
  girlfriend           : -1.195
  came                 : -1.178
  point                : -1.176
  area                 : -1.156
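
The feature `ve` in the success list is a tokenization artifact rather than a word. `TfidfVectorizer` by default tokenizes with the pattern `(?u)\b\w\w+\b`, which splits on apostrophes and drops single-character tokens. A small sketch that applies the same pattern with the standard `re` module (the sample sentence is made up for illustration):

```python
import re

# Default token pattern used by scikit-learn's text vectorizers
token_pattern = re.compile(r"(?u)\b\w\w+\b")

sample = "I've been out of work"
# Lowercase first, as the vectorizer does by default
print(token_pattern.findall(sample.lower()))
```

"I've" contributes only the fragment `ve`, so that coefficient likely reflects contractions such as "I've" and "we've". Expanding contractions during preprocessing is one possible way to make such features easier to interpret.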


In [10]:
# Key insights from the analysis
print("="*60)
print("KEY FINDINGS FROM EVOLVER LOOP 2 ANALYSIS")
print("="*60)

print("\n1. WHY LINGUISTIC FEATURES FAILED:")
print("   - Substring matching is too crude ('rent' also matches 'currently'; 'thanks' also counts as 'thank')")
print("   - Patterns are common but weakly discriminative (gratitude in 44.8% of posts, only +5.8 pt success lift)")
print("   - Context and nuance matter (e.g., 'thanks' vs genuine gratitude)")
print("   - TF-IDF captures subtle word patterns better than hand-crafted rules")

print("\n2. WHAT TF-IDF IS CAPTURING:")
print("   - Success indicators: 'dominos', 'rice', 'days', 'currently', 'father', 'tight', 'daughter' (concrete hardship, family?)")
print("   - Failure indicators: 'say', 'friends', 'friend', 'final', 'london', 'girlfriend' (social or casual context?)")
print("   - Context matters more than simple word presence")

print("\n3. PATH FORWARD:")
print("   - Enhanced TF-IDF (character n-grams, better preprocessing)")
print("   - Better text representation (SVD, embeddings)")
print("   - More sophisticated models (XGBoost, ensembles)")
print("   - User metadata features (account age, karma, history)")

KEY FINDINGS FROM EVOLVER LOOP 2 ANALYSIS

1. WHY LINGUISTIC FEATURES FAILED:
   - Substring matching is too crude ('rent' also matches 'currently'; 'thanks' also counts as 'thank')
   - Patterns are common but weakly discriminative (gratitude in 44.8% of posts, only +5.8 pt success lift)
   - Context and nuance matter (e.g., 'thanks' vs genuine gratitude)
   - TF-IDF captures subtle word patterns better than hand-crafted rules

2. WHAT TF-IDF IS CAPTURING:
   - Success indicators: 'dominos', 'rice', 'days', 'currently', 'father', 'tight', 'daughter' (concrete hardship, family?)
   - Failure indicators: 'say', 'friends', 'friend', 'final', 'london', 'girlfriend' (social or casual context?)
   - Context matters more than simple word presence

3. PATH FORWARD:
   - Enhanced TF-IDF (character n-grams, better preprocessing)
   - Better text representation (SVD, embeddings)
   - More sophisticated models (XGBoost, ensembles)
   - User metadata features (account age, karma, history)
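
The first path-forward item (word plus character n-gram TF-IDF) could be tried along the following lines. This is a sketch under assumptions, not the experiment's actual pipeline: `build_text_model` and `cv_auc` are hypothetical helpers, the n-gram ranges and `sublinear_tf` setting are untuned starting points, and ROC AUC is assumed to be the evaluation metric.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import FeatureUnion, make_pipeline

def build_text_model():
    """Word and character n-gram TF-IDF feeding a logistic regression."""
    features = FeatureUnion([
        # Word unigrams and bigrams, as in the analysis cell above
        ("word", TfidfVectorizer(ngram_range=(1, 2), stop_words="english",
                                 sublinear_tf=True)),
        # Character n-grams within word boundaries (assumed range, untuned)
        ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5),
                                 sublinear_tf=True)),
    ])
    clf = LogisticRegression(class_weight="balanced", max_iter=1000)
    return make_pipeline(features, clf)

def cv_auc(texts, labels, n_splits=5, seed=42):
    """Mean and standard deviation of ROC AUC across stratified folds."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(build_text_model(), texts, labels,
                             cv=cv, scoring="roc_auc")
    return scores.mean(), scores.std()
```

With the data loaded earlier, a call such as `cv_auc(df_train['combined_text'], df_train['requester_received_pizza'])` would give an out-of-fold estimate. Because the vectorizers sit inside the pipeline, they are refit on each training fold, which avoids leaking vocabulary statistics from the held-out fold.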
