# Evolver Loop 1 Analysis

Analyzing the competition data and winning solutions to guide the next experiments.

**Current Status:**
- Best CV: 0.4619 (TF-IDF + Gradient Boosting)
- Target: 0.8782
- Gap: 0.4163 points

**Goal:** Understand what winning solutions did and identify the most promising directions.
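The CV and target numbers above are presumably Pearson correlation scores. This is an assumption: the data matches the USPTO Patent Phrase-to-Phrase Matching competition, which is evaluated with the Pearson correlation coefficient. The stdlib sketch below is a version of that metric for sanity-checking CV numbers. On Python 3.10+, `statistics.correlation` gives the same value.

```python
import math
from typing import Sequence


def pearson(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Covariance and variances; the 1/n factors cancel in the ratio
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        return float("nan")  # undefined when either input is constant
    return cov / math.sqrt(var_t * var_p)


scores = [0.0, 0.25, 0.5, 0.75, 1.0]
preds = [0.1, 0.2, 0.5, 0.6, 0.9]
print(round(pearson(scores, preds), 4))
# A positive affine rescaling of the predictions leaves the score unchanged
print(round(pearson(scores, [2 * p + 3 for p in preds]), 4))
```

Because Pearson is invariant to positive affine rescaling, predictions do not need to be calibrated to the 0 to 1 score scale. Only their linear relationship with the labels matters. A constant prediction has no defined correlation at all.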

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load data
train_df = pd.read_csv('/home/data/train.csv')
test_df = pd.read_csv('/home/data/test.csv')

print(f"Train shape: {train_df.shape}")
print(f"Test shape: {test_df.shape}")
print(f"\nColumns: {train_df.columns.tolist()}")
print("\nScore distribution:")
print(train_df['score'].value_counts().sort_index())

Train shape: (36473, 5)
Test shape: (36, 4)

Columns: ['id', 'anchor', 'target', 'context', 'score']

Score distribution:
score
0.00     7471
0.25    11519
0.50    12300
0.75     4029
1.00     1154
Name: count, dtype: int64
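The labels are skewed toward the lower buckets. Together, 0.25 and 0.50 make up about two thirds of the rows, while 1.00 accounts for only about 3%. The quick check below works from the counts printed above, copied by hand rather than recomputed from the CSV.

```python
# Counts copied from the value_counts() output above
counts = {0.00: 7471, 0.25: 11519, 0.50: 12300, 0.75: 4029, 1.00: 1154}

total = sum(counts.values())
# Mean label = count-weighted average of the score buckets
mean_score = sum(score * n for score, n in counts.items()) / total
proportions = {score: n / total for score, n in counts.items()}

print(f"Total rows: {total}")
print(f"Mean score: {mean_score:.4f}")
for score, p in proportions.items():
    print(f"  score={score:.2f}: {p:.1%}")
```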


In [2]:
# Analyze the context feature
print("Context (CPC) distribution:")
print(train_df['context'].value_counts().head(10))

print(f"\nNumber of unique contexts: {train_df['context'].nunique()}")

# Look at examples with different contexts
print("\nSample rows with different contexts:")
sample_contexts = train_df['context'].unique()[:5]
for ctx in sample_contexts:
    print(f"\nContext: {ctx}")
    print(train_df[train_df['context'] == ctx][['anchor', 'target', 'score']].head(3))

Context (CPC) distribution:
context
H01    2186
H04    2177
G01    1812
A61    1477
F16    1091
C07    1069
G06    1063
B60     916
B01     891
G02     877
Name: count, dtype: int64

Number of unique contexts: 106

Sample rows with different contexts:

Context: A47
      anchor                  target  score
0  abatement  abatement of pollution   0.50
1  abatement          act of abating   0.75
2  abatement         active catalyst   0.25

Context: A61
       anchor              target  score
21  abatement  emission abatement   0.50
22  abatement          prevention   0.50
23  abatement         prophylaxis   0.25

Context: A62
          anchor               target  score
24     abatement  pollution abatement    0.5
14280   gas leak                fault    0.5
14281   gas leak    gas leak detector    0.5

Context: C01
                anchor                 target  score
25           abatement    abatement apparatus    0.5
1261  ammonia recovery                ammonia    0.5
1262  ammonia
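Each context is a 3-character CPC code: a section letter (A to H) followed by a two-digit class number. The sketch below splits a code and attaches its section title, taken from the CPC scheme. Class-level titles (e.g. for `H04`) would require the full CPC title list, which this notebook does not load. The `parse_context` helper is hypothetical.

```python
# CPC section letters (first character of each context code) and their titles
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
}


def parse_context(code: str) -> tuple[str, str, str]:
    """Split a CPC class code like 'H04' into (section, class code, section title)."""
    code = code.strip().upper()
    if len(code) != 3 or code[0] not in CPC_SECTIONS or not code[1:].isdigit():
        raise ValueError(f"unexpected context code: {code!r}")
    return code[0], code, CPC_SECTIONS[code[0]]


for ctx in ["A47", "C01", "H04"]:
    print(parse_context(ctx))
```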

In [3]:
# Analyze phrase lengths
train_df['anchor_len'] = train_df['anchor'].str.len()
train_df['target_len'] = train_df['target'].str.len()
train_df['anchor_word_count'] = train_df['anchor'].str.split().str.len()
train_df['target_word_count'] = train_df['target'].str.split().str.len()

print("Anchor length statistics:")
print(train_df['anchor_len'].describe())
print("\nTarget length statistics:")
print(train_df['target_len'].describe())

print("\nAnchor word count statistics:")
print(train_df['anchor_word_count'].describe())
print("\nTarget word count statistics:")
print(train_df['target_word_count'].describe())

Anchor length statistics:
count    36473.000000
mean        15.991720
std          5.538241
min          3.000000
25%         12.000000
50%         15.000000
75%         19.000000
max         38.000000
Name: anchor_len, dtype: float64

Target length statistics:
count    36473.000000
mean        15.758780
std          6.872403
min          2.000000
25%         11.000000
50%         15.000000
75%         20.000000
max         98.000000
Name: target_len, dtype: float64

Anchor word count statistics:
count    36473.000000
mean         2.177885
std          0.641176
min          1.000000
25%          2.000000
50%          2.000000
75%          2.000000
max          5.000000
Name: anchor_word_count, dtype: float64

Target word count statistics:
count    36473.000000
mean         2.171195
std          0.849613
min          1.000000
25%          2.000000
50%          2.000000
75%          3.000000
max         15.000000
Name: target_word_count, dtype: float64
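Both anchors and targets are short: the median is 2 words and 15 characters. Length differences and word overlap are cheap pair features that could be added alongside the TF-IDF + Gradient Boosting baseline. The sketch below is minimal, and its feature names are hypothetical. Like `str.split()` above, it counts whitespace-separated tokens.

```python
def length_features(anchor: str, target: str) -> dict[str, float]:
    """Length- and overlap-based features for one anchor/target pair."""
    a_words, t_words = anchor.split(), target.split()
    a_set, t_set = set(a_words), set(t_words)
    union = a_set | t_set
    return {
        "anchor_len": len(anchor),
        "target_len": len(target),
        "len_diff": abs(len(anchor) - len(target)),
        "word_count_diff": abs(len(a_words) - len(t_words)),
        # Jaccard overlap of the distinct words in both phrases
        "word_jaccard": len(a_set & t_set) / len(union) if union else 0.0,
    }


print(length_features("battery cell assembly", "battery assembly"))
```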


In [4]:
# Analyze score distribution by context
score_by_context = train_df.groupby('context')['score'].agg(['mean', 'std', 'count']).reset_index()
score_by_context = score_by_context.sort_values('mean', ascending=False)

print("Top 10 contexts by average score:")
print(score_by_context.head(10))

print("\nBottom 10 contexts by average score:")
print(score_by_context.tail(10))

Top 10 contexts by average score:
   context      mean       std  count
12     A62  0.445652  0.212621     23
75     F15  0.437500  0.314140     96
69     E06  0.435294  0.256327     85
40     B81  0.421053  0.334604     57
15     B02  0.416667  0.314309     63
82     F25  0.415441  0.196092     68
17     B05  0.413961  0.298449    308
95     G08  0.411850  0.233069    173
2      A22  0.410714  0.237116     70
38     B66  0.403766  0.283745    239

Bottom 10 contexts by average score:
   context      mean       std  count
63     D21  0.327500  0.234988    300
54     C21  0.326705  0.246696     88
44     C04  0.326149  0.245167    348
58     D01  0.325826  0.253873    333
57     C25  0.322222  0.209482     90
46     C07  0.322030  0.235693   1069
45     C06  0.319672  0.242088     61
48     C09  0.319620  0.236289    553
51     C12  0.311216  0.247025    633
43     C03  0.308282  0.234998    163
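The two tables split by CPC section. Every context in the bottom 10 belongs to section C (Chemistry; Metallurgy) or D (Textiles; Paper), while the top 10 come from A, B, E, F and G. This makes the section letter a candidate feature. A per-context mean score could serve the same role, but it would need to be computed out-of-fold to avoid target leakage. The sketch below rolls per-context means up to section level with count weighting. It uses only the 20 rows printed above, so the section values are illustrative rather than dataset-wide.

```python
from collections import defaultdict

# (context, mean score, count) rows copied from the top-10 / bottom-10 tables above
rows = [
    ("A62", 0.445652, 23), ("F15", 0.437500, 96), ("E06", 0.435294, 85),
    ("B81", 0.421053, 57), ("B02", 0.416667, 63), ("F25", 0.415441, 68),
    ("B05", 0.413961, 308), ("G08", 0.411850, 173), ("A22", 0.410714, 70),
    ("B66", 0.403766, 239),
    ("D21", 0.327500, 300), ("C21", 0.326705, 88), ("C04", 0.326149, 348),
    ("D01", 0.325826, 333), ("C25", 0.322222, 90), ("C07", 0.322030, 1069),
    ("C06", 0.319672, 61), ("C09", 0.319620, 553), ("C12", 0.311216, 633),
    ("C03", 0.308282, 163),
]


def section_means(rows):
    """Count-weighted mean score per CPC section (first character of the context)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for context, mean, count in rows:
        section = context[0]
        totals[section] += mean * count  # recovers the sum of scores for this context
        counts[section] += count
    return {s: totals[s] / counts[s] for s in sorted(totals)}


for section, mean in section_means(rows).items():
    print(f"{section}: {mean:.3f}")
```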


In [5]:
# Look at examples with high and low scores to understand patterns
print("Examples with score = 1.0 (perfect match):")
high_score = train_df[train_df['score'] == 1.0].sample(5, random_state=42)
for _, row in high_score.iterrows():
    print(f"Context: {row['context']} | Anchor: '{row['anchor']}' | Target: '{row['target']}'")

print("\nExamples with score = 0.0 (no match):")
low_score = train_df[train_df['score'] == 0.0].sample(5, random_state=42)
for _, row in low_score.iterrows():
    print(f"Context: {row['context']} | Anchor: '{row['anchor']}' | Target: '{row['target']}'")

Examples with score = 1.0 (perfect match):
Context: C04 | Anchor: 'provide in amounts' | Target: 'provide amounts'
Context: H01 | Anchor: 'hinge mechanisms' | Target: 'hinging mechanisms'
Context: F03 | Anchor: 'pushing pin' | Target: 'pushing up pin'
Context: G01 | Anchor: 'microchambers' | Target: 'micro chambers'
Context: H04 | Anchor: 'fiber slack' | Target: 'fiber slacks'

Examples with score = 0.0 (no match):
Context: A46 | Anchor: 'opposing walls' | Target: 'handles'
Context: C07 | Anchor: 'azabicyclo' | Target: 'nitrogen oxide'
Context: H04 | Anchor: 'network load information' | Target: 'vehicle load'
Context: G01 | Anchor: 'electrical current distribution' | Target: 'wholesale distribution'
Context: F24 | Anchor: 'indoor room' | Target: 'indoor games'
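Most of the 1.0 examples are surface variants of the anchor, such as a plural, an inserted space, or an inflection. Character n-gram overlap captures these. The 0.0 examples show where this approach breaks down: 'indoor room' / 'indoor games' and the two 'distribution' phrases share words but not meaning. The stdlib sketch below computes character-trigram Jaccard similarity. Removing whitespace so that 'microchambers' matches 'micro chambers' is a design choice made here, not something the notebook does.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Character n-grams of a lowercased phrase with all whitespace removed."""
    s = "".join(text.lower().split())
    if len(s) < n:
        return {s} if s else set()
    return {s[i : i + n] for i in range(len(s) - n + 1)}


def char_jaccard(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity between the character n-gram sets of two phrases."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


pairs = [
    ("microchambers", "micro chambers"),  # labelled 1.0
    ("fiber slack", "fiber slacks"),  # labelled 1.0
    ("indoor room", "indoor games"),  # labelled 0.0
    ("electrical current distribution", "wholesale distribution"),  # labelled 0.0
]
for a, b in pairs:
    print(f"{char_jaccard(a, b):.3f}  {a!r} vs {b!r}")
```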


In [6]:
# Analyze semantic relationships - look for synonyms and paraphrases
print("Examples showing semantic similarity (score >= 0.75):")
high_sim = train_df[train_df['score'] >= 0.75].sample(10, random_state=42)
for _, row in high_sim.iterrows():
    print(f"Score: {row['score']} | Context: {row['context']}")
    print(f"  Anchor: '{row['anchor']}'")
    print(f"  Target: '{row['target']}'")
    print()

print("\nExamples requiring context understanding:")
# Find examples where same anchor-target pair has different scores in different contexts
from collections import defaultdict
pair_scores = defaultdict(list)
# Iterating with zip over the columns is much faster than iterrows() and keeps row order
for anchor, target, context, score in zip(train_df['anchor'], train_df['target'], train_df['context'], train_df['score']):
    pair_scores[(anchor, target)].append((context, score))

context_dependent = {k: v for k, v in pair_scores.items() if len(v) > 1 and len({s for _, s in v}) > 1}
print(f"Found {len(context_dependent)} anchor-target pairs with context-dependent scores")

if context_dependent:
    print("\nSample context-dependent pairs:")
    for i, (pair, scores) in enumerate(list(context_dependent.items())[:3]):
        print(f"Pair {i+1}: '{pair[0]}' - '{pair[1]}'")
        for ctx, score in scores:
            print(f"  Context {ctx}: score = {score}")
        print()

Examples showing semantic similarity (score >= 0.75):
Score: 0.75 | Context: B29
  Anchor: 'toolpaths'
  Target: 'tool position'

Score: 0.75 | Context: H04
  Anchor: 'ecn'
  Target: 'electronic communication network'

Score: 0.75 | Context: F28
  Anchor: 'battery cell assembly'
  Target: 'battery assembly'

Score: 0.75 | Context: F21
  Anchor: 'resilient spring clip'
  Target: 'flexible retaining member'

Score: 1.0 | Context: B63
  Anchor: 'water intake'
  Target: 'water intake'

Score: 0.75 | Context: G11
  Anchor: 'lifting finger'
  Target: 'lifter member'

Score: 0.75 | Context: H05
  Anchor: 'moisture proof film'
  Target: 'waterproofing layer'

Score: 0.75 | Context: B29
  Anchor: 'project onto surface'
  Target: 'project onto exterior'

Score: 1.0 | Context: B27
  Anchor: 'pneumatic logic'
  Target: 'pneumatic logic'

Score: 0.75 | Context: A61
  Anchor: 'morpholin'
  Target: 'diethylenimide oxide'


Examples requiring context understanding:
Found 674 anchor-target pairs with context-dependent scores

Sample context-dependent pairs:
Pair 1: 'abnormal position' - 'open position'
  Context B23: score = 0.25
  Context E03: score = 0.5

Pair 2: 'absorbent properties' - 'physical properties'
  Context C08: score = 0.25
  Context D01: score = 0.5

Pair 3: 'absorbent properties' - 'properties'
  Context C08: score = 0.25
  Context D01: score = 0.5
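In 674 anchor-target pairs, the score depends on the context. Any model that sees only the two phrases must give each such pair a single prediction, so it cannot fit these labels exactly. The context therefore has to be part of the input. One common approach with a cross-encoder is to concatenate the anchor, the target and a readable context into one string. In the sketch below, the template, the `[SEP]` separator and the use of CPC section titles as context text are all assumptions. The right separator depends on the tokenizer, and full class titles may be more informative than section titles. The `build_input` helper is hypothetical.

```python
# Assumed context text: CPC section titles keyed by the first letter of the context code
SECTION_TITLES = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
}


def build_input(anchor: str, target: str, context: str, sep: str = "[SEP]") -> str:
    """Join anchor, target and a readable context into one cross-encoder input string."""
    title = SECTION_TITLES.get(context[:1].upper(), "")
    context_text = f"{context} {title}".strip()
    return f" {sep} ".join([anchor, target, context_text])


# Same pair, two contexts: the inputs now differ, so the predictions can differ too
print(build_input("absorbent properties", "properties", "C08"))
print(build_input("absorbent properties", "properties", "D01"))
```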

