# Loop 44 Analysis: HFIP Outlier and Hybrid Model Strategy

## Key Findings from Experiment 043

1. **Non-linear mixture features improve mixture predictions by 12.5%** (0.084319 → 0.073776)
2. **Single solvent CV got 9.8% worse** (0.008194 → 0.008994)
3. **The HFIP_2-MeTHF ramp has MSE = 0.583061**, about 30x the median ramp MSE (0.019722)
4. **This single outlier accounts for 53.2% of the summed per-ramp MSE**

## Evaluator's Key Insight

The HFIP outlier is the "smoking gun" for the CV-LB gap. We might be able to change the CV-LB relationship if we can:
1. Use non-linear features ONLY for mixtures (not single solvents)
2. Target the HFIP outlier specifically
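
The first point amounts to routing each row to a model by row type. A minimal sketch of that routing, assuming a row counts as a mixture when it names a second solvent with a non-zero fraction. The `HybridPredictor` class, the `is_mixture` helper, and the `solvent_b` / `frac_b` field names are hypothetical and do not come from the experiment code; the two models are constant stand-ins.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Model = Callable[[Row], float]


def is_mixture(row: Row) -> bool:
    """Assumption: a row is a mixture if it names a second solvent with a non-zero fraction."""
    return bool(row.get("solvent_b")) and float(row.get("frac_b", 0.0)) > 0.0


class HybridPredictor:
    """Route single-solvent rows to one model and mixture rows to another."""

    def __init__(self, single_model: Model, mixture_model: Model) -> None:
        self.single_model = single_model
        self.mixture_model = mixture_model

    def predict(self, rows: List[Row]) -> List[float]:
        return [
            self.mixture_model(r) if is_mixture(r) else self.single_model(r)
            for r in rows
        ]


# Stand-in models that return constants so the routing is easy to see.
hybrid = HybridPredictor(single_model=lambda r: 0.1, mixture_model=lambda r: 0.9)
rows = [
    {"solvent_a": "Ethanol", "solvent_b": None, "frac_b": 0.0},
    {"solvent_a": "HFIP", "solvent_b": "2-MeTHF", "frac_b": 0.4},
]
preds = hybrid.predict(rows)
print(preds)
```

Keeping the routing rule in one helper makes it easy to swap in whatever column actually distinguishes mixtures in the real data.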

In [1]:
# Load data and analyze the HFIP outlier
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

DATA_PATH = '/home/data'

# Load data
df_single = pd.read_csv(f'{DATA_PATH}/catechol_single_solvent_yields.csv')
df_full = pd.read_csv(f'{DATA_PATH}/catechol_full_data_yields.csv')
spange_df = pd.read_csv(f'{DATA_PATH}/spange_descriptors_lookup.csv', index_col=0)

print(f'Single solvent: {len(df_single)} samples')
print(f'Full data: {len(df_full)} samples')
print(f'Spange descriptors: {len(spange_df)} solvents')

Single solvent: 656 samples
Full data: 1227 samples
Spange descriptors: 26 solvents


In [2]:
# Analyze HFIP properties
print('=== HFIP (1,1,1,3,3,3-Hexafluoropropan-2-ol) Properties ===')
print()

# Find HFIP in Spange descriptors
hfip_name = '1,1,1,3,3,3-Hexafluoropropan-2-ol'
if hfip_name in spange_df.index:
    hfip_props = spange_df.loc[hfip_name]
    print('HFIP Spange descriptors:')
    for col in spange_df.columns:
        if col == 'solvent smiles':
            continue  # skip the non-numeric SMILES column
        val = hfip_props[col]
        # Compare HFIP's value to the mean over all solvents
        mean_val = spange_df[col].mean()
        ratio = val / mean_val if mean_val != 0 else float('inf')
        print(f'  {col}: {val:.4f} (mean: {mean_val:.4f}, ratio: {ratio:.2f}x)')
else:
    print('HFIP not found in Spange descriptors')
    print(f'Available solvents: {list(spange_df.index)}')

=== HFIP (1,1,1,3,3,3-Hexafluoropropan-2-ol) Properties ===

HFIP Spange descriptors:
  dielectric constant: 16.7000 (mean: 20.5505, ratio: 0.81x)
  ET(30): 62.1000 (mean: 46.5319, ratio: 1.33x)
  alpha: 1.9600 (mean: 0.5284, ratio: 3.71x)
  beta: 0.0000 (mean: 0.4811, ratio: 0.00x)
  pi*: 0.6500 (mean: 0.6152, ratio: 1.06x)
  SA: 1.0110 (mean: 0.3256, ratio: 3.10x)
  SB: 0.0140 (mean: 0.4568, ratio: 0.03x)
  SP: 0.4990 (mean: 0.6609, ratio: 0.76x)
  SdP: 1.4540 (mean: 0.7496, ratio: 1.94x)
  N: 0.0095 (mean: 0.0163, ratio: 0.58x)
  n: 1.2750 (mean: 1.3741, ratio: 0.93x)
  f(n): 0.1726 (mean: 0.2281, ratio: 0.76x)
  delta: 19.3000 (mean: 23.3969, ratio: 0.82x)


In [3]:
# Analyze HFIP mixtures in the full data
print('\n=== HFIP Mixtures in Full Data ===')
print()

# Find all HFIP-containing mixtures
hfip_mixtures = df_full[(df_full['SOLVENT A NAME'] == hfip_name) | (df_full['SOLVENT B NAME'] == hfip_name)]
print(f'Number of HFIP-containing samples: {len(hfip_mixtures)}')

if len(hfip_mixtures) > 0:
    print(f'\nUnique HFIP mixture partners:')
    partners_a = hfip_mixtures[hfip_mixtures['SOLVENT A NAME'] == hfip_name]['SOLVENT B NAME'].unique()
    partners_b = hfip_mixtures[hfip_mixtures['SOLVENT B NAME'] == hfip_name]['SOLVENT A NAME'].unique()
    all_partners = set(partners_a) | set(partners_b)
    for p in all_partners:
        count = len(hfip_mixtures[(hfip_mixtures['SOLVENT A NAME'] == p) | (hfip_mixtures['SOLVENT B NAME'] == p)])
        print(f'  - {p}: {count} samples')


=== HFIP Mixtures in Full Data ===

Number of HFIP-containing samples: 124

Unique HFIP mixture partners:
  - 2-Methyltetrahydrofuran [2-MeTHF]: 124 samples


In [4]:
# Compare HFIP to other fluorinated alcohols
print('\n=== Fluorinated Alcohols Comparison ===')
print()

# Find TFE (2,2,2-Trifluoroethanol)
tfe_name = '2,2,2-Trifluoroethanol'

if tfe_name in spange_df.index and hfip_name in spange_df.index:
    print('Comparing HFIP vs TFE vs Mean:')
    print()
    for col in spange_df.columns:
        if col != 'solvent smiles':
            hfip_val = spange_df.loc[hfip_name, col]
            tfe_val = spange_df.loc[tfe_name, col]
            mean_val = spange_df[col].mean()
            std_val = spange_df[col].std()
            
            # Z-scores
            hfip_z = (hfip_val - mean_val) / std_val if std_val > 0 else 0
            tfe_z = (tfe_val - mean_val) / std_val if std_val > 0 else 0
            
            print(f'{col}:')
            print(f'  HFIP: {hfip_val:.4f} (z={hfip_z:.2f})')
            print(f'  TFE:  {tfe_val:.4f} (z={tfe_z:.2f})')
            print(f'  Mean: {mean_val:.4f} (std={std_val:.4f})')
            print()


=== Fluorinated Alcohols Comparison ===

Comparing HFIP vs TFE vs Mean:

dielectric constant:
  HFIP: 16.7000 (z=-0.19)
  TFE:  8.5500 (z=-0.59)
  Mean: 20.5505 (std=20.1764)

ET(30):
  HFIP: 62.1000 (z=1.64)
  TFE:  59.8000 (z=1.40)
  Mean: 46.5319 (std=9.4968)

alpha:
  HFIP: 1.9600 (z=2.51)
  TFE:  1.5100 (z=1.72)
  Mean: 0.5284 (std=0.5712)

beta:
  HFIP: 0.0000 (z=-2.04)
  TFE:  0.0000 (z=-2.04)
  Mean: 0.4811 (std=0.2358)

pi*:
  HFIP: 0.6500 (z=0.15)
  TFE:  0.7300 (z=0.49)
  Mean: 0.6152 (std=0.2354)

SA:
  HFIP: 1.0110 (z=1.84)
  TFE:  0.8930 (z=1.52)
  Mean: 0.3256 (std=0.3727)

SB:
  HFIP: 0.0140 (z=-1.69)
  TFE:  0.1070 (z=-1.34)
  Mean: 0.4568 (std=0.2618)

SP:
  HFIP: 0.4990 (z=-2.60)
  TFE:  0.5430 (z=-1.89)
  Mean: 0.6609 (std=0.0623)

SdP:
  HFIP: 1.4540 (z=2.52)
  TFE:  0.9220 (z=0.62)
  Mean: 0.7496 (std=0.2791)

N:
  HFIP: 0.0095 (z=-0.58)
  TFE:  0.0139 (z=-0.20)
  Mean: 0.0163 (std=0.0117)

n:
  HFIP: 1.2750 (z=-2.21)
  TFE:  1.3000 (z=-1.65)
  Mean: 1.3741 (std=

In [5]:
# Analyze the per-ramp MSE distribution from exp_043
print('\n=== Per-Ramp MSE Analysis from Exp 043 ===')
print()

# From the experiment output:
ramp_mses = {
    'Methanol_Ethylene Glycol': 0.013315,
    'HFIP_2-MeTHF': 0.583061,
    'Cyclohexane_IPA': 0.108047,
    'Water.Acetonitrile_Acetonitrile': 0.017854,
    'Acetonitrile_Acetonitrile.Acetic Acid': 0.022840,
    '2-MeTHF_Diethyl Ether': 0.090010,
    'TFE_Water.TFE': 0.010383,
    'DMA_Decanol': 0.011805,
    'Ethanol_THF': 0.034359,
    'Cyrene_Ethyl Acetate': 0.005310,
    'MTBE_Butanone': 0.012690,
    'tert-Butanol_Dimethyl Carbonate': 0.019722,
    'Methyl Propionate_Ethyl Lactate': 0.166745
}

# Sort by MSE
sorted_ramps = sorted(ramp_mses.items(), key=lambda x: x[1], reverse=True)

print('Ramps sorted by MSE (highest first):')
for ramp, mse in sorted_ramps:
    print(f'  {ramp}: {mse:.6f}')

# Calculate statistics
mse_values = list(ramp_mses.values())
print(f'\nStatistics:')
print(f'  Mean MSE: {np.mean(mse_values):.6f}')
print(f'  Median MSE: {np.median(mse_values):.6f}')
print(f'  Std MSE: {np.std(mse_values):.6f}')
print(f'  Min MSE: {np.min(mse_values):.6f}')
print(f'  Max MSE: {np.max(mse_values):.6f}')

# HFIP contribution
hfip_mse = ramp_mses['HFIP_2-MeTHF']
total_mse = sum(mse_values)
print(f'\nHFIP contribution: {hfip_mse/total_mse*100:.1f}% of total MSE')


=== Per-Ramp MSE Analysis from Exp 043 ===

Ramps sorted by MSE (highest first):
  HFIP_2-MeTHF: 0.583061
  Methyl Propionate_Ethyl Lactate: 0.166745
  Cyclohexane_IPA: 0.108047
  2-MeTHF_Diethyl Ether: 0.090010
  Ethanol_THF: 0.034359
  Acetonitrile_Acetonitrile.Acetic Acid: 0.022840
  tert-Butanol_Dimethyl Carbonate: 0.019722
  Water.Acetonitrile_Acetonitrile: 0.017854
  Methanol_Ethylene Glycol: 0.013315
  MTBE_Butanone: 0.012690
  DMA_Decanol: 0.011805
  TFE_Water.TFE: 0.010383
  Cyrene_Ethyl Acetate: 0.005310

Statistics:
  Mean MSE: 0.084319
  Median MSE: 0.019722
  Std MSE: 0.151439
  Min MSE: 0.005310
  Max MSE: 0.583061

HFIP contribution: 53.2% of total MSE
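
Because the mean and standard deviation above are themselves inflated by the HFIP ramp, a median-based view gives a cleaner picture of how far it sits from a typical ramp. A small sketch using only the per-ramp values listed in this cell; the median absolute deviation (MAD) score and the threshold of 20 are illustrative choices made here, not part of exp_043.

```python
import statistics

ramp_mses = {
    'Methanol_Ethylene Glycol': 0.013315,
    'HFIP_2-MeTHF': 0.583061,
    'Cyclohexane_IPA': 0.108047,
    'Water.Acetonitrile_Acetonitrile': 0.017854,
    'Acetonitrile_Acetonitrile.Acetic Acid': 0.022840,
    '2-MeTHF_Diethyl Ether': 0.090010,
    'TFE_Water.TFE': 0.010383,
    'DMA_Decanol': 0.011805,
    'Ethanol_THF': 0.034359,
    'Cyrene_Ethyl Acetate': 0.005310,
    'MTBE_Butanone': 0.012690,
    'tert-Butanol_Dimethyl Carbonate': 0.019722,
    'Methyl Propionate_Ethyl Lactate': 0.166745,
}

values = list(ramp_mses.values())
median = statistics.median(values)
# Median absolute deviation: a spread estimate that one extreme value barely moves.
mad = statistics.median([abs(v - median) for v in values])

# Distance above the median in MAD units, and the plain ratio to the median.
robust_score = {name: (mse - median) / mad for name, mse in ramp_mses.items()}
ratio_to_median = {name: mse / median for name, mse in ramp_mses.items()}

THRESHOLD = 20.0  # arbitrary cut-off, chosen only for illustration
flagged = [name for name, score in robust_score.items() if score > THRESHOLD]

for name in flagged:
    print(f'{name}: {ratio_to_median[name]:.1f}x the median, '
          f'{robust_score[name]:.1f} MADs above it')
```

With this threshold only the HFIP ramp is flagged, at roughly 30x the median ramp MSE.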


In [6]:
# Calculate what happens if we remove HFIP
print('\n=== Impact of Removing HFIP ===')
print()

# Without HFIP
mse_without_hfip = [v for k, v in ramp_mses.items() if 'HFIP' not in k]
mean_without_hfip = np.mean(mse_without_hfip)

print(f'Mean MSE with HFIP: {np.mean(mse_values):.6f}')
print(f'Mean MSE without HFIP: {mean_without_hfip:.6f}')
print(f'Improvement: {(np.mean(mse_values) - mean_without_hfip) / np.mean(mse_values) * 100:.1f}%')

# What if we could perfectly predict HFIP?
print(f'\nIf HFIP MSE = 0:')
mse_with_perfect_hfip = [v if 'HFIP' not in k else 0 for k, v in ramp_mses.items()]
mean_with_perfect_hfip = np.mean(mse_with_perfect_hfip)
print(f'  Mean MSE: {mean_with_perfect_hfip:.6f}')
print(f'  Improvement: {(np.mean(mse_values) - mean_with_perfect_hfip) / np.mean(mse_values) * 100:.1f}%')


=== Impact of Removing HFIP ===

Mean MSE with HFIP: 0.084319
Mean MSE without HFIP: 0.042757
Improvement: 49.3%

If HFIP MSE = 0:
  Mean MSE: 0.039468
  Improvement: 53.2%


In [7]:
# Analyze the hybrid model strategy
print('\n=== Hybrid Model Strategy Analysis ===')
print()

# From exp_043:
# - Single solvent CV: 0.008994 (baseline: 0.008194)
# - Mixture CV with non-linear: 0.073776 (baseline: 0.084319)

# If we use:
# - Baseline for single solvents: 0.008194
# - Non-linear for mixtures: 0.073776

n_single = 656
n_full = 1227
total = n_single + n_full

# Hybrid approach
hybrid_single = 0.008194  # baseline
hybrid_mixture = 0.073776  # non-linear
hybrid_combined = (n_single * hybrid_single + n_full * hybrid_mixture) / total

# Baseline approach
baseline_single = 0.008194
baseline_mixture = 0.084319
baseline_combined = (n_single * baseline_single + n_full * baseline_mixture) / total

print(f'Baseline approach:')
print(f'  Single: {baseline_single:.6f}')
print(f'  Mixture: {baseline_mixture:.6f}')
print(f'  Combined: {baseline_combined:.6f}')

print(f'\nHybrid approach:')
print(f'  Single: {hybrid_single:.6f}')
print(f'  Mixture: {hybrid_mixture:.6f}')
print(f'  Combined: {hybrid_combined:.6f}')

print(f'\nImprovement: {(baseline_combined - hybrid_combined) / baseline_combined * 100:.1f}%')


=== Hybrid Model Strategy Analysis ===

Baseline approach:
  Single: 0.008194
  Mixture: 0.084319
  Combined: 0.057799

Hybrid approach:
  Single: 0.008194
  Mixture: 0.073776
  Combined: 0.050929

Improvement: 11.9%
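
The combined numbers above weight each task by its sample count, while the template notebook is described later as averaging the two scores. A short sketch of how the hybrid's combined CV moves with that choice; `combined_cv` and the scheme labels are names introduced here for illustration, and the true competition weighting is not known.

```python
def combined_cv(single: float, mixture: float, w_single: float) -> float:
    """Weighted combination of the two CV scores; w_single is the weight on single-solvent CV."""
    return w_single * single + (1.0 - w_single) * mixture


n_single, n_full = 656, 1227
single_cv, mixture_cv = 0.008194, 0.073776  # hybrid: baseline single, non-linear mixture

# Candidate weightings (assumptions, not confirmed by the competition).
schemes = {
    'sample-weighted': n_single / (n_single + n_full),
    'equal average': 0.5,
    'single only': 1.0,
}

results = {name: combined_cv(single_cv, mixture_cv, w) for name, w in schemes.items()}
for name, score in results.items():
    print(f'{name}: {score:.6f}')
```

The sample-weighted case reproduces the 0.050929 printed above; the other two show how strongly the combined figure depends on the weighting assumption.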


In [8]:
# Analyze the CV-LB relationship with the hybrid approach
print('\n=== CV-LB Relationship Analysis ===')
print()

# Current relationship: LB = 4.31*CV + 0.0525
# Best CV: 0.008194 → LB = 0.0877

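# exp_043 combined CV = 0.051207 (non-linear features for both single and mixture data;
# the hybrid combination computed above is slightly lower, 0.050929)
# Predicted LB = 4.31 * 0.051207 + 0.0525 = 0.273
# This is WORSE because the combined CV is dominated by the much higher mixture CV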

# BUT the competition may weight single vs mixture differently
# Let's assume the competition uses the same weighting as our CV

print('Current CV-LB relationship: LB = 4.31*CV + 0.0525')
print()

# Our best submission (exp_030)
print('Best submission (exp_030):')
print(f'  CV: 0.008298')
print(f'  LB: 0.0877')
print(f'  Predicted LB: {4.31 * 0.008298 + 0.0525:.4f}')

# If we submit hybrid
print(f'\nHybrid approach:')
print(f'  Combined CV: {hybrid_combined:.6f}')
print(f'  Predicted LB: {4.31 * hybrid_combined + 0.0525:.4f}')

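# "HIGHER" below is relative to the single solvent CV of the best submission (0.008298);
# the hybrid combined CV is still lower than the baseline combined CV (0.057799).
print('\nNOTE: The hybrid approach has HIGHER combined CV because mixture CV is much higher.')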
print('This suggests the competition may weight single solvents more heavily.')


=== CV-LB Relationship Analysis ===

Current CV-LB relationship: LB = 4.31*CV + 0.0525

Best submission (exp_030):
  CV: 0.008298
  LB: 0.0877
  Predicted LB: 0.0883

Hybrid approach:
  Combined CV: 0.050929
  Predicted LB: 0.2720

NOTE: The hybrid approach has HIGHER combined CV because mixture CV is much higher.
This suggests the competition may weight single solvents more heavily.
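
The fitted line can also be inverted to ask what CV the target would require. A small worked check, assuming the linear relationship LB = 4.31*CV + 0.0525 quoted above keeps holding and taking the 0.0347 target from the summary; `predicted_lb` and `required_cv` are helper names introduced here.

```python
SLOPE, INTERCEPT = 4.31, 0.0525  # CV-LB line quoted in this cell
TARGET_LB = 0.0347               # target from the summary cell


def predicted_lb(cv: float) -> float:
    """LB predicted by the fitted line for a given CV."""
    return SLOPE * cv + INTERCEPT


def required_cv(lb: float) -> float:
    """CV the fitted line would need in order to reach a given LB."""
    return (lb - INTERCEPT) / SLOPE


best_possible_lb = predicted_lb(0.0)  # LB the line predicts even for a perfect CV
cv_needed = required_cv(TARGET_LB)

print(f'Predicted LB at CV = 0: {best_possible_lb:.4f}')
print(f'CV needed for LB = {TARGET_LB}: {cv_needed:.5f}')
```

The intercept alone (0.0525) is above the target, so the required CV comes out negative. Under this line, lowering CV cannot reach the target; the relationship itself would have to change.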


In [9]:
# Key insight: The competition CV scheme may be different
print('\n=== Competition CV Scheme Analysis ===')
print()

# From the template notebook:
# 1. Single solvent: leave-one-solvent-out (24 folds)
# 2. Full data: leave-one-ramp-out (13 folds)
# 3. Combined score is the average of both

# BUT our best LB (0.0877) is much lower than the combined CV would suggest
# This implies the competition may weight single solvents more heavily

# Check the fitted line against the best submission (CV = 0.008298):
# LB = 4.31*CV + 0.0525 = 4.31*0.008298 + 0.0525 = 0.0883
# This is close to the actual LB of 0.0877, so the relationship holds for single solvent CV

# BUT if we use the combined CV (0.051207), the predicted LB would be:
# LB = 4.31*0.051207 + 0.0525 = 0.273
# This is much higher than 0.0877

print('The discrepancy suggests:')
print('1. The competition CV is based on SINGLE SOLVENT only, not combined')
print('2. OR the competition uses a different weighting scheme')
print('3. OR the mixture data is evaluated separately')

print('\nThis is a CRITICAL insight!')
print('If the LB is based on single solvent CV only, then:')
print('  - Improving mixture predictions may not help LB')
print('  - We should focus on single solvent predictions')
print('  - The non-linear mixture features may not help LB')


=== Competition CV Scheme Analysis ===

The discrepancy suggests:
1. The competition CV is based on SINGLE SOLVENT only, not combined
2. OR the competition uses a different weighting scheme
3. OR the mixture data is evaluated separately

This is a CRITICAL insight!
If the LB is based on single solvent CV only, then:
  - Improving mixture predictions may not help LB
  - We should focus on single solvent predictions
  - The non-linear mixture features may not help LB
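
Both schemes named in the comments above (leave-one-solvent-out and leave-one-ramp-out) are leave-one-group-out splits. A minimal pure-Python sketch of such a splitter; the function name and the toy ramp labels are illustrative and not taken from the template notebook.

```python
from typing import Hashable, Iterator, List, Sequence, Tuple

Fold = Tuple[Hashable, List[int], List[int]]


def leave_one_group_out(groups: Sequence[Hashable]) -> Iterator[Fold]:
    """Yield (held_out_group, train_idx, test_idx), one fold per distinct group."""
    distinct = dict.fromkeys(groups)  # drops duplicates, keeps first-seen order
    for held_out in distinct:
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx


# Toy labels: one entry per sample, naming the ramp it belongs to.
ramp_labels = ['HFIP_2-MeTHF', 'HFIP_2-MeTHF', 'Ethanol_THF', 'DMA_Decanol', 'Ethanol_THF']
folds = list(leave_one_group_out(ramp_labels))

for held_out, train_idx, test_idx in folds:
    print(f'hold out {held_out}: train={train_idx}, test={test_idx}')
```

If ramps are defined by solvent pair, holding out the HFIP ramp leaves no HFIP mixture rows for training, since all 124 HFIP-containing samples pair HFIP with 2-MeTHF.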


In [10]:
# Summary and recommendations
print('\n=== Summary and Recommendations ===')
print()

print('KEY FINDINGS:')
print('1. HFIP_2-MeTHF ramp has MSE = 0.583, about 30x the median ramp MSE (0.0197)')
print('2. HFIP contributes 53% of total mixture MSE')
print('3. Non-linear features improve mixture CV by 12.5% (0.084 → 0.074)')
print('4. BUT single solvent CV got 9.8% worse (0.008194 → 0.008994)')
print('5. The LB appears to be based on single solvent CV, not combined')
print()

print('STRATEGIC IMPLICATIONS:')
print('1. The hybrid approach (baseline for single + non-linear for mixture) may not help LB')
print('2. If LB is based on single solvent CV, we should focus on that')
print('3. The HFIP outlier may not affect LB if mixture data is weighted differently')
print()

print('RECOMMENDED NEXT STEPS:')
print('1. Submit the hybrid model to verify if mixture improvements help LB')
print('2. If not, focus on single solvent improvements')
print('3. Try HFIP-specific handling (remove from training or special model)')
print('4. Analyze what makes HFIP different and add specific features')
print()

print('REMAINING SUBMISSIONS: 4')
print('TARGET: 0.0347')
print('BEST LB: 0.0877 (2.53x gap)')


=== Summary and Recommendations ===

KEY FINDINGS:
1. HFIP_2-MeTHF ramp has MSE = 0.583, about 30x the median ramp MSE (0.0197)
2. HFIP contributes 53% of total mixture MSE
3. Non-linear features improve mixture CV by 12.5% (0.084 → 0.074)
4. BUT single solvent CV got 9.8% worse (0.008194 → 0.008994)
5. The LB appears to be based on single solvent CV, not combined

STRATEGIC IMPLICATIONS:
1. The hybrid approach (baseline for single + non-linear for mixture) may not help LB
2. If LB is based on single solvent CV, we should focus on that
3. The HFIP outlier may not affect LB if mixture data is weighted differently

RECOMMENDED NEXT STEPS:
1. Submit the hybrid model to verify if mixture improvements help LB
2. If not, focus on single solvent improvements
3. Try HFIP-specific handling (remove from training or special model)
4. Analyze what makes HFIP different and add specific features

REMAINING SUBMISSIONS: 4
TARGET: 0.0347
BEST LB: 0.0877 (2.53x gap)
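
As a starting point for next step 4, the z-scores printed in cell 4 can be compared directly. The TFE ramp has a low MSE (0.010383) while the HFIP ramp does not, so descriptors where the two fluorinated alcohols differ most are natural candidates for HFIP-specific features. A sketch that ranks descriptors by the z-score gap, using only the values printed above; f(n) and delta are left out because that output is truncated, and the ranking is a heuristic rather than evidence of cause.

```python
# (HFIP z-score, TFE z-score) per descriptor, copied from the cell 4 output.
z_scores = {
    'dielectric constant': (-0.19, -0.59),
    'ET(30)': (1.64, 1.40),
    'alpha': (2.51, 1.72),
    'beta': (-2.04, -2.04),
    'pi*': (0.15, 0.49),
    'SA': (1.84, 1.52),
    'SB': (-1.69, -1.34),
    'SP': (-2.60, -1.89),
    'SdP': (2.52, 0.62),
    'N': (-0.58, -0.20),
    'n': (-2.21, -1.65),
}

# Absolute gap between the two solvents, in units of the descriptor's standard deviation.
gaps = {name: abs(hfip_z - tfe_z) for name, (hfip_z, tfe_z) in z_scores.items()}
ranked = sorted(gaps.items(), key=lambda item: item[1], reverse=True)

print('Descriptors where HFIP differs most from TFE:')
for name, gap in ranked[:3]:
    hfip_z, tfe_z = z_scores[name]
    print(f'  {name}: HFIP z={hfip_z:+.2f}, TFE z={tfe_z:+.2f}, gap={gap:.2f}')
```

SdP stands out with a gap of about 1.9 standard deviations, followed by alpha and SP.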
