# ESG Data Augmentation with Real Labels

This notebook:
1. **Loads and analyzes the ESG features** from `esg_features_with_tiers_labels.csv`
2. **Augments the data** with several methods to enlarge the dataset
3. **Uses the real E, S, G labels** instead of synthetic labels
4. **Handles integer columns and ratio columns separately**
5. **Outputs an augmented dataset** with real labels for training

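**Dataset characteristics:**
- Real labels are available: `e_score`, `s_score`, `g_score`
- All feature columns are integer counts except `esg_pos_ratio` and `esg_neg_ratio`; the labels are scores with one decimal place and `filename` is a string
- `esg_tier` is dropped; `esg_cluster` is kept
- Noise injection must preserve the integer constraints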
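The integer-constraint rule can be sketched in a few lines. Add Gaussian noise whose standard deviation grows with the count, round to the nearest whole number, and clip at zero. This mirrors the integer branch of the noise-injection step later in the notebook. `add_integer_noise` is a hypothetical helper name. It uses NumPy's `Generator` API rather than the global `np.random.seed` state that the notebook relies on.

```python
import numpy as np

def add_integer_noise(values, noise_factor=0.05, rng=None):
    """Perturb non-negative counts with Gaussian noise while keeping them integers.

    The noise std is noise_factor * (|x| + 0.1), as in the notebook's noise step,
    so a zero count almost never moves and larger counts move proportionally.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    std = noise_factor * (np.abs(values) + 0.1)
    noisy = values + rng.normal(0.0, std)  # one draw per element (scale broadcasts)
    # Round to the nearest whole number and forbid negative counts
    return np.clip(np.rint(noisy), 0, None).astype(int)

rng = np.random.default_rng(42)
print(add_integer_noise([0, 3, 61, 355], noise_factor=0.04, rng=rng))
```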

In [8]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

print("Libraries imported successfully!")

Libraries imported successfully!


In [9]:
# Load the ESG features together with the real labels
esg_data = pd.read_csv('esg_features_with_tiers_labels.csv')

# Drop the esg_tier column as requested
if 'esg_tier' in esg_data.columns:
    esg_data = esg_data.drop('esg_tier', axis=1)
    print("✅ Removed 'esg_tier' column as requested")

print("=== ESG FEATURES DATA with REAL LABELS ===")
print(f"Dataset shape: {esg_data.shape}")
print(f"Number of companies: {len(esg_data)}")

# Identify column types for augmentation
ratio_cols = ['esg_pos_ratio', 'esg_neg_ratio']
label_cols = ['e_score', 's_score', 'g_score']
metadata_cols = ['filename', 'esg_cluster']
integer_cols = [col for col in esg_data.columns 
                if col not in ratio_cols + label_cols + metadata_cols]

print(f"\nColumn categorization:")
print(f"   Integer columns: {len(integer_cols)}")
print(f"   Ratio columns: {len(ratio_cols)} {ratio_cols}")
print(f"   Label columns: {len(label_cols)} {label_cols}")
print(f"   Metadata columns: {len(metadata_cols)} {metadata_cols}")

# Show the first few rows
print(f"\nFirst 3 rows preview:")
display_cols = ['filename', 'total_esg_mentions', 'esg_pos_ratio', 'esg_neg_ratio', 'esg_cluster', 'e_score', 's_score', 'g_score']
print(esg_data[display_cols].head(3))

# Summarize the real ESG labels
print(f"\n=== REAL ESG LABELS ANALYSIS ===")
for label_col in label_cols:
    scores = esg_data[label_col]
    print(f"{label_col}: μ={scores.mean():.1f}, σ={scores.std():.1f}, range=[{scores.min():.1f}, {scores.max():.1f}]")

# ESG Cluster distribution
print(f"\nESG Cluster distribution:")
print(esg_data['esg_cluster'].value_counts().sort_index())

# Data quality checks
print(f"\n=== DATA QUALITY CHECKS ===")
print(f"Missing values: {esg_data.isnull().sum().sum()}")

# Check integer constraints
print(f"\nInteger columns validation (sample):")
for col in integer_cols[:5]:  # Check first 5 as sample
    is_integer = esg_data[col].apply(lambda x: x == int(x) if pd.notnull(x) else True).all()
    print(f"   {col}: {'✅' if is_integer else '❌'} All integers")

print(f"\nRatio columns validation:")
for col in ratio_cols:
    min_val, max_val = esg_data[col].min(), esg_data[col].max()
    print(f"   {col}: range=[{min_val:.3f}, {max_val:.3f}]")

✅ Removed 'esg_tier' column as requested
=== ESG FEATURES DATA with REAL LABELS ===
Dataset shape: (49, 63)
Number of companies: 49

Column categorization:
   Integer columns: 56
   Ratio columns: 2 ['esg_pos_ratio', 'esg_neg_ratio']
   Label columns: 3 ['e_score', 's_score', 'g_score']
   Metadata columns: 2 ['filename', 'esg_cluster']

First 3 rows preview:
          filename  total_esg_mentions  esg_pos_ratio  esg_neg_ratio  \
0  AR BBC 2023.txt                 355       0.814085       0.185915   
1  AR BBC 2024.txt                 708       0.922316       0.077684   
2  AR BID 2024.txt                 891       0.955129       0.044871   

   esg_cluster  e_score  s_score  g_score  
0            2     70.6     65.6     66.4  
1            2     68.1     71.0     73.0  
2            1     59.7     60.9     65.0  

=== REAL ESG LABELS ANALYSIS ===
e_score: μ=77.3, σ=14.2, range=[35.0, 95.0]
s_score: μ=77.2, σ=13.2, range=[35.0, 88.0]
g_score: μ=80.3, σ=11.2, range=[57.2, 95.0]

ESG Clu
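The validation cell above only spot-checks the first five integer columns. The sketch below checks every integer column at once. It also flags any ratio that leaves [0, 1] and reports how far the two ratios stray from summing to 1. The sum-to-1 check is an assumption based on the preview rows, e.g. 0.814085 + 0.185915 = 1. `validate_esg_frame` is a hypothetical helper. Call it as `validate_esg_frame(esg_data, integer_cols)`, or pass the augmented frame later on.

```python
import numpy as np
import pandas as pd

def validate_esg_frame(df, integer_cols, ratio_cols=("esg_pos_ratio", "esg_neg_ratio"), atol=1e-6):
    """Return simple validation results for an ESG feature frame.

    - non_integer_cols: integer columns holding any non-whole value (NaN is ignored)
    - ratio_out_of_range: True if any ratio falls outside [0, 1]
    - max_ratio_sum_error: largest |pos + neg - 1| over rows
      (ASSUMPTION: the two ratios are complementary, as in the preview rows)
    """
    ints = df[list(integer_cols)].astype(float)
    non_integer = [
        c for c in integer_cols
        # rtol=0 so large counts cannot hide a fractional part
        if not np.allclose(ints[c], np.round(ints[c]), rtol=0.0, atol=atol, equal_nan=True)
    ]
    ratios = df[list(ratio_cols)]
    out_of_range = bool(((ratios < 0) | (ratios > 1)).any().any())
    sum_error = float((ratios.sum(axis=1) - 1).abs().max())
    return {"non_integer_cols": non_integer,
            "ratio_out_of_range": out_of_range,
            "max_ratio_sum_error": sum_error}

# Values taken from the first three preview rows above
preview = pd.DataFrame({
    "total_esg_mentions": [355, 708, 891],
    "esg_pos_ratio": [0.814085, 0.922316, 0.955129],
    "esg_neg_ratio": [0.185915, 0.077684, 0.044871],
})
print(validate_esg_frame(preview, ["total_esg_mentions"]))
```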

In [10]:
# Data augmentation functions for ESG features with real labels
def augment_esg_data_with_labels(df, methods=('noise', 'interpolation', 'scaling', 'synthetic'), 
                                samples_per_method=2, noise_factor=0.05):
    """
    Augment ESG feature data together with its real labels:
    
    - Keep every non-ratio feature column integer-valued
    - Perturb labels only slightly and keep them in [0, 100]
    - Handle each column type (integer, ratio, label, metadata) separately

    noise, interpolation and scaling add samples_per_method rows per original row;
    synthetic adds samples_per_method rows per esg_cluster.
    """
    
    # Define column types
    ratio_cols = ['esg_pos_ratio', 'esg_neg_ratio']
    label_cols = ['e_score', 's_score', 'g_score'] 
    metadata_cols = ['filename', 'esg_cluster']
    integer_cols = [col for col in df.columns 
                    if col not in ratio_cols + label_cols + metadata_cols]
    
    print(f"🔧 Augmentation setup:")
    print(f"   Integer columns: {len(integer_cols)}")
    print(f"   Ratio columns: {len(ratio_cols)}")
    print(f"   Label columns: {len(label_cols)}")
    
    augmented_data = []
    
    # Keep the original rows
    print(f"📊 Original data: {len(df)} samples")
    for idx, row in df.iterrows():
        augmented_data.append(row.to_dict())
    
    # Method 1: Noise Injection với integer constraints
    if 'noise' in methods:
        print(f"🔊 Applying noise injection...")
        for i in range(samples_per_method):
            for idx, row in df.iterrows():
                new_row = row.copy()
                
                # Integer columns: add noise then round
                for col in integer_cols:
                    original_value = row[col]
                    # Tạo noise nhỏ cho integer columns
                    std = noise_factor * (abs(original_value) + 0.1)
                    noise = np.random.normal(0, std)
                    new_value = max(0, original_value + noise)
                    new_row[col] = int(round(new_value))  # Round to integer
                
                # Ratio columns: normal noise
                for col in ratio_cols:
                    original_value = row[col]
                    std = noise_factor * 0.1  # Smaller noise for ratios
                    noise = np.random.normal(0, std)
                    new_value = np.clip(original_value + noise, 0, 1)  # Keep in [0,1]
                    new_row[col] = new_value
                
                # Labels: small noise but reasonable ranges
                for col in label_cols:
                    original_value = row[col]
                    std = noise_factor * 5  # std in score points, e.g. 0.25 when noise_factor = 0.05
                    noise = np.random.normal(0, std)
                    new_value = np.clip(original_value + noise, 0, 100)  # Keep in [0,100]
                    new_row[col] = round(new_value, 1)
                
                # Metadata: keep esg_cluster, tag filename with the method
                new_row['filename'] = f"{row['filename']}_noise_{i+1}"
                
                augmented_data.append(new_row)
    
    # Method 2: Interpolation
    if 'interpolation' in methods:
        print(f"🔄 Applying interpolation...")
        for i in range(samples_per_method):
            for idx in range(len(df)):
                # Pick 2 random rows from the same cluster when possible (they need not include row idx)
                cluster = df.iloc[idx]['esg_cluster']
                same_cluster = df[df['esg_cluster'] == cluster]
                
                if len(same_cluster) > 1:
                    pair = same_cluster.sample(2)
                else:
                    pair = df.sample(2)
                
                row1, row2 = pair.iloc[0], pair.iloc[1]
                
                # Interpolate with a random weight
                alpha = np.random.uniform(0.3, 0.7)
                new_row = {}
                
                # Integer columns: interpolate then round
                for col in integer_cols:
                    interpolated = alpha * row1[col] + (1 - alpha) * row2[col]
                    new_row[col] = int(round(interpolated))
                
                # Ratio columns: normal interpolation
                for col in ratio_cols:
                    new_row[col] = alpha * row1[col] + (1 - alpha) * row2[col]
                
                # Labels: interpolate
                for col in label_cols:
                    interpolated = alpha * row1[col] + (1 - alpha) * row2[col]
                    new_row[col] = round(interpolated, 1)
                
                # Metadata
                new_row['filename'] = f"{row1['filename']}_interp_{i+1}"
                new_row['esg_cluster'] = row1['esg_cluster']  # Keep from first sample
                
                augmented_data.append(new_row)
    
    # Method 3: Feature Scaling
    if 'scaling' in methods:
        print(f"📏 Applying feature scaling...")
        for i in range(samples_per_method):
            for idx, row in df.iterrows():
                new_row = row.copy()
                
                # Scale random subset of integer features
                features_to_scale = np.random.choice(integer_cols, 
                                                   size=max(1, len(integer_cols)//3), 
                                                   replace=False)
                
                for col in features_to_scale:
                    scale_factor = np.random.uniform(0.8, 1.2)
                    scaled_value = row[col] * scale_factor
                    new_row[col] = int(round(max(0, scaled_value)))
                
                # Adjust ratios slightly
                for col in ratio_cols:
                    scale_factor = np.random.uniform(0.95, 1.05)
                    scaled_value = row[col] * scale_factor
                    new_row[col] = np.clip(scaled_value, 0, 1)
                
                # Labels: minor scaling
                for col in label_cols:
                    scale_factor = np.random.uniform(0.95, 1.05)
                    scaled_value = row[col] * scale_factor
                    new_row[col] = round(np.clip(scaled_value, 0, 100), 1)
                
                new_row['filename'] = f"{row['filename']}_scale_{i+1}"
                
                augmented_data.append(new_row)
    
    # Method 4: Synthetic Generation based on clusters
    if 'synthetic' in methods:
        print(f"🤖 Applying synthetic generation...")
        for cluster in df['esg_cluster'].unique():
            cluster_data = df[df['esg_cluster'] == cluster]
            
            for i in range(samples_per_method):
                # Build one synthetic row from per-cluster statistics
                new_row = {}
                
                # Integer columns: sample from cluster distribution
                for col in integer_cols:
                    cluster_mean = cluster_data[col].mean()
                    cluster_std = cluster_data[col].std()
                    if pd.isna(cluster_std) or cluster_std == 0:  # std is NaN for single-row clusters
                        cluster_std = 0.1
                    
                    synthetic_value = np.random.normal(cluster_mean, cluster_std)
                    new_row[col] = int(round(max(0, synthetic_value)))
                
                # Ratio columns: cluster-based generation
                for col in ratio_cols:
                    cluster_mean = cluster_data[col].mean()
                    cluster_std = cluster_data[col].std()
                    if pd.isna(cluster_std) or cluster_std == 0:  # std is NaN for single-row clusters
                        cluster_std = 0.01
                    
                    synthetic_value = np.random.normal(cluster_mean, cluster_std)
                    new_row[col] = np.clip(synthetic_value, 0, 1)
                
                # Labels: cluster-based with some variation
                for col in label_cols:
                    cluster_mean = cluster_data[col].mean()
                    cluster_std = cluster_data[col].std()
                    if pd.isna(cluster_std) or cluster_std == 0:  # std is NaN for single-row clusters
                        cluster_std = 2.0
                    
                    synthetic_value = np.random.normal(cluster_mean, cluster_std)
                    new_row[col] = round(np.clip(synthetic_value, 0, 100), 1)
                
                # Metadata
                new_row['filename'] = f"synthetic_cluster_{cluster}_{i+1}.txt"
                new_row['esg_cluster'] = cluster
                
                augmented_data.append(new_row)
    
    return pd.DataFrame(augmented_data)

print("🛠️ Enhanced data augmentation functions defined successfully!")

🛠️ Enhanced data augmentation functions defined successfully!
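In the function above, noise injection, scaling and synthetic generation perturb `esg_pos_ratio` and `esg_neg_ratio` independently. An augmented row can therefore end up with ratios that no longer sum to 1. Interpolation is unaffected, because both ratios are blended with the same `alpha`. Assume the two ratios are meant to be complementary, as they are in the preview rows. One option is then a post-processing pass that treats the clipped positive ratio as authoritative and recomputes the negative one. `enforce_ratio_complement` is a hypothetical helper, e.g. `augmented_esg_data = enforce_ratio_complement(augmented_esg_data)`.

```python
import pandas as pd

def enforce_ratio_complement(df, pos_col="esg_pos_ratio", neg_col="esg_neg_ratio"):
    """Return a copy where neg_col = 1 - pos_col for every row.

    ASSUMPTION: the two ratios are complementary shares of ESG mentions;
    the (clipped) positive ratio is treated as the source of truth.
    """
    out = df.copy()
    out[pos_col] = out[pos_col].clip(0.0, 1.0)
    out[neg_col] = 1.0 - out[pos_col]
    return out

# Example: rows whose ratios drifted apart after independent perturbation
noisy = pd.DataFrame({"esg_pos_ratio": [0.93, 1.02], "esg_neg_ratio": [0.10, 0.01]})
fixed = enforce_ratio_complement(noisy)
print(fixed)
```

Choosing the positive ratio as the source of truth is arbitrary. Rescaling both ratios by their row sum is an equally valid alternative.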


In [11]:
# Run data augmentation with the real labels
print("=== PERFORMING DATA AUGMENTATION with REAL LABELS ===")

# Augment the data with the enhanced function
augmented_esg_data = augment_esg_data_with_labels(
    esg_data, 
    methods=['noise', 'interpolation', 'scaling', 'synthetic'],
    samples_per_method=2,  # 2 rows per original row for noise/interpolation/scaling; 2 per cluster for synthetic
    noise_factor=0.04  # Small noise factor to preserve the data's characteristics
)

print(f"\n=== AUGMENTATION RESULTS ===")
print(f"Original dataset: {esg_data.shape}")
print(f"Augmented dataset: {augmented_esg_data.shape}")
print(f"Increase factor: {len(augmented_esg_data) / len(esg_data):.1f}x")

# Check the integer constraints
print(f"\n=== INTEGER CONSTRAINT VALIDATION ===")
ratio_cols = ['esg_pos_ratio', 'esg_neg_ratio']
label_cols = ['e_score', 's_score', 'g_score']
metadata_cols = ['filename', 'esg_cluster']
integer_cols = [col for col in augmented_esg_data.columns 
                if col not in ratio_cols + label_cols + metadata_cols]

# Check first 5 integer columns
print("Integer columns validation (sample):")
for col in integer_cols[:5]:
    # Check if all values are integers
    is_integer = augmented_esg_data[col].apply(lambda x: x == int(x) if pd.notnull(x) else True).all()
    print(f"   {col}: {'✅' if is_integer else '❌'} All integers")

# Check augmentation quality
print(f"\n=== QUALITY CHECK ===")
key_features = ['total_esg_mentions', 'esg_pos_ratio', 'total_pos_environmental', 
                'total_pos_social', 'total_pos_governance']

print("Original data statistics:")
orig_stats = esg_data[key_features].describe()
print(orig_stats.loc[['mean', 'std', 'min', 'max']].round(2))

print("\nAugmented data statistics:")
aug_stats = augmented_esg_data[key_features].describe()
print(aug_stats.loc[['mean', 'std', 'min', 'max']].round(2))

# Compare means and standard deviations
print(f"\n=== STATISTICAL COMPARISON ===")
for feature in key_features:
    orig_mean, orig_std = esg_data[feature].mean(), esg_data[feature].std()
    aug_mean, aug_std = augmented_esg_data[feature].mean(), augmented_esg_data[feature].std()
    
    mean_diff = abs(aug_mean - orig_mean) / abs(orig_mean) * 100 if orig_mean != 0 else 0
    std_diff = abs(aug_std - orig_std) / orig_std * 100 if orig_std > 0 else 0
    
    print(f"{feature}:")
    print(f"  Mean difference: {mean_diff:.1f}%")
    print(f"  Std difference: {std_diff:.1f}%")

=== PERFORMING DATA AUGMENTATION with REAL LABELS ===
🔧 Augmentation setup:
   Integer columns: 56
   Ratio columns: 2
   Label columns: 3
📊 Original data: 49 samples
🔊 Applying noise injection...
🔄 Applying interpolation...
📏 Applying feature scaling...
🤖 Applying synthetic generation...

=== AUGMENTATION RESULTS ===
Original dataset: (49, 63)
Augmented dataset: (351, 63)
Increase factor: 7.2x

=== INTEGER CONSTRAINT VALIDATION ===
Integer columns validation (sample):
   pos_env_climate_action: ✅ All integers
   neg_env_climate_action: ✅ All integers
   pos_env_energy_transition: ✅ All integers
   neg_env_energy_transition: ✅ All integers
   pos_env_water_stewardship: ✅ All integers

=== QUALITY CHECK ===
Original data statistics:
      total_esg_mentions  esg_pos_ratio  total_pos_environmental  \
mean              403.02           0.93                    61.41   
std               255.02           0.04                    29.08   
min                 3.00           0.81                
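The comparison loop above handles one feature at a time. A vectorized sketch can cover any list of features in a single call. Where the original mean or standard deviation is 0, it returns NaN instead of a misleading number. `compare_feature_stats` is a hypothetical helper, e.g. `compare_feature_stats(esg_data, augmented_esg_data, key_features)`.

```python
import numpy as np
import pandas as pd

def compare_feature_stats(original, augmented, features):
    """Relative change (%) in mean and sample std per feature, original -> augmented.

    Features whose original mean or std is 0 get NaN for that statistic.
    """
    orig_mean = original[features].mean()
    orig_std = original[features].std()
    mean_diff = (augmented[features].mean() - orig_mean).abs() / orig_mean.abs().replace(0, np.nan) * 100
    std_diff = (augmented[features].std() - orig_std).abs() / orig_std.replace(0, np.nan) * 100
    return pd.DataFrame({"mean_diff_pct": mean_diff, "std_diff_pct": std_diff}).round(1)

original = pd.DataFrame({"total_esg_mentions": [355, 708, 891]})
augmented = pd.DataFrame({"total_esg_mentions": [355, 708, 891, 360, 700, 880]})
print(compare_feature_stats(original, augmented, ["total_esg_mentions"]))
```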

In [12]:
# Analyze the real ESG labels
print("=== ANALYZING REAL ESG LABELS ===")
print("✅ Using real E, S, G labels from dataset!")

# Real label statistics
print(f"\n=== REAL ESG LABELS STATISTICS ===")
label_cols = ['e_score', 's_score', 'g_score']
label_stats = augmented_esg_data[label_cols].describe()
print(label_stats.round(2))

# Correlations between the E, S and G scores
print(f"\n=== INTER-SCORE CORRELATIONS ===")
correlation_matrix = augmented_esg_data[label_cols].corr()
print(correlation_matrix.round(3))

# Label distribution by cluster
print(f"\n=== LABEL DISTRIBUTION BY CLUSTER ===")
for cluster in sorted(augmented_esg_data['esg_cluster'].unique()):
    cluster_data = augmented_esg_data[augmented_esg_data['esg_cluster'] == cluster]
    print(f"\nCluster {cluster} ({len(cluster_data)} samples):")
    for col in label_cols:
        mean_score = cluster_data[col].mean()
        std_score = cluster_data[col].std()
        print(f"   {col}: μ={mean_score:.1f}, σ={std_score:.1f}")

# Check for any anomalies
print(f"\n=== DATA QUALITY CHECKS ===")
for col in label_cols:
    scores = augmented_esg_data[col]
    print(f"{col}:")
    print(f"   Range: [{scores.min():.1f}, {scores.max():.1f}]")
    print(f"   Missing values: {scores.isnull().sum()}")
    print(f"   Outliers (>3σ): {len(scores[abs(scores - scores.mean()) > 3*scores.std()])}")

print(f"\n✅ Real labels analysis completed!")

=== ANALYZING REAL ESG LABELS ===
✅ Using real E, S, G labels from dataset!

=== REAL ESG LABELS STATISTICS ===
       e_score  s_score  g_score
count   351.00   351.00   351.00
mean     77.26    77.10    80.22
std      13.81    12.85    11.18
min      34.60    34.10    56.30
25%      71.35    70.95    71.85
50%      82.10    82.60    85.50
75%      87.40    87.90    87.90
max      98.40    92.40    99.30

=== INTER-SCORE CORRELATIONS ===
         e_score  s_score  g_score
e_score    1.000    0.955    0.847
s_score    0.955    1.000    0.815
g_score    0.847    0.815    1.000

=== LABEL DISTRIBUTION BY CLUSTER ===

Cluster 0 (163 samples):
   e_score: μ=83.2, σ=9.8
   s_score: μ=84.1, σ=10.1
   g_score: μ=87.0, σ=2.0

Cluster 1 (51 samples):
   e_score: μ=53.4, σ=4.0
   s_score: μ=56.3, σ=4.0
   g_score: μ=60.9, σ=3.0

Cluster 2 (93 samples):
   e_score: μ=71.9, σ=1.9
   s_score: μ=71.2, σ=4.7
   g_score: μ=72.1, σ=2.5

Cluster 3 (44 samples):
   e_score: μ=94.1, σ=1.8
   s_score: μ=87
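The statistics and correlations above mix the 49 original rows with their perturbed copies. To check that augmentation has not distorted the E/S/G relationships, compare against the correlation matrix of the original rows alone. `correlation_shift` is a hypothetical helper, e.g. `correlation_shift(esg_data, augmented_esg_data)`. It uses pandas' default Pearson correlation.

```python
import pandas as pd

def correlation_shift(original, augmented, cols=("e_score", "s_score", "g_score")):
    """Element-wise |corr_augmented - corr_original| for the label columns (Pearson)."""
    cols = list(cols)
    return (augmented[cols].corr() - original[cols].corr()).abs()

# Label values taken from the first three preview rows; the "augmented" copy is a toy shift
orig = pd.DataFrame({"e_score": [70.6, 68.1, 59.7],
                     "s_score": [65.6, 71.0, 60.9],
                     "g_score": [66.4, 73.0, 65.0]})
aug = pd.concat([orig, orig + 0.5], ignore_index=True)
print(correlation_shift(orig, aug).round(3))
```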

In [13]:
# Prepare the data with real labels
print("=== PREPARING DATA with REAL LABELS ===")

# Define features and targets using the real labels
exclude_cols = ['filename', 'esg_cluster', 'e_score', 's_score', 'g_score']
feature_cols = [col for col in augmented_esg_data.columns if col not in exclude_cols]
X = augmented_esg_data[feature_cols]

print(f"📊 Features prepared:")
print(f"  Number of features: {len(feature_cols)}")
print(f"  Data shape: {X.shape}")
print(f"  Sample features: {feature_cols[:8]}...")

print(f"\n📈 Real Target statistics:")
label_cols = ['e_score', 's_score', 'g_score']
for label_col in label_cols:
    scores = augmented_esg_data[label_col]
    print(f"  {label_col}: μ={scores.mean():.1f}, σ={scores.std():.1f}, range=[{scores.min():.1f}, {scores.max():.1f}]")

print(f"\n✅ Data preparation with real labels completed!")
print(f"\n📊 FINAL AUGMENTED DATASET with REAL LABELS:")
print(f"   • Original dataset: {len(esg_data)} companies")
print(f"   • Augmented dataset: {len(augmented_esg_data)} samples ({len(augmented_esg_data)/len(esg_data):.1f}x increase)")
print(f"   • Features: {len(feature_cols)} columns")
print(f"   • Real labels: E, S, G scores")

=== PREPARING DATA with REAL LABELS ===
📊 Features prepared:
  Number of features: 58
  Data shape: (351, 58)
  Sample features: ['pos_env_climate_action', 'neg_env_climate_action', 'pos_env_energy_transition', 'neg_env_energy_transition', 'pos_env_water_stewardship', 'neg_env_water_stewardship', 'pos_env_biodiversity_nature', 'neg_env_biodiversity_nature']...

📈 Real Target statistics:
  e_score: μ=77.3, σ=13.8, range=[34.6, 98.4]
  s_score: μ=77.1, σ=12.9, range=[34.1, 92.4]
  g_score: μ=80.2, σ=11.2, range=[56.3, 99.3]

✅ Data preparation with real labels completed!

📊 FINAL AUGMENTED DATASET with REAL LABELS:
   • Original dataset: 49 companies
   • Augmented dataset: 351 samples (7.2x increase)
   • Features: 58 columns
   • Real labels: E, S, G scores
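Every augmented row is derived from an original company report. A random row-level train/test split would therefore put near-copies of the same report on both sides. The sketch below is a group-aware split. It assumes the filename suffixes produced by `augment_esg_data_with_labels` (`_noise_k`, `_interp_k`, `_scale_k`), strips them to recover the source filename, and then holds out whole groups. Two caveats apply. An interpolated row records only its first parent (`row1`), so its second parent can still leak. Synthetic rows are drawn from cluster-level statistics that include held-out companies. `source_group` and `group_train_test_split` are hypothetical helpers, e.g. `train_df, test_df = group_train_test_split(augmented_esg_data)`.

```python
import re

import numpy as np
import pandas as pd

# Suffixes appended by the augmentation function, e.g. "AR BBC 2023.txt_noise_1"
_AUG_SUFFIX = re.compile(r"_(noise|interp|scale)_\d+$")

def source_group(filename):
    """Strip the augmentation suffix to recover the original filename.

    Synthetic rows ("synthetic_cluster_<c>_<k>.txt") have no single source,
    so they keep their own name and form their own group.
    """
    return _AUG_SUFFIX.sub("", filename)

def group_train_test_split(df, test_frac=0.2, seed=42):
    """Hold out whole source groups so no report appears in both splits."""
    groups = df["filename"].map(source_group)
    unique_groups = groups.unique()
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(test_frac * len(unique_groups))))
    test_groups = rng.choice(unique_groups, size=n_test, replace=False)
    is_test = groups.isin(list(test_groups))
    return df[~is_test], df[is_test]

demo = pd.DataFrame({"filename": [
    "AR BBC 2023.txt", "AR BBC 2023.txt_noise_1", "AR BBC 2023.txt_interp_1",
    "AR BID 2024.txt", "AR BID 2024.txt_scale_2", "synthetic_cluster_1_1.txt",
]})
train_df, test_df = group_train_test_split(demo, test_frac=0.34, seed=0)
print(sorted(train_df["filename"]), sorted(test_df["filename"]))
```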


In [14]:
# Save the augmented dataset with real labels
print("=== SAVING AUGMENTED DATASET with REAL LABELS ===")

# Save the complete augmented dataset with real labels
output_filename = 'augmented_esg_dataset_with_real_labels.csv'
augmented_esg_data.to_csv(output_filename, index=False)

print(f"✅ Saved augmented dataset: {output_filename}")
print(f"   • Shape: {augmented_esg_data.shape}")
print(f"   • Columns: {list(augmented_esg_data.columns)}")

# Display sample of the final dataset
print(f"\n📋 Sample of augmented dataset with real labels:")
sample_cols = ['filename', 'total_esg_mentions', 'esg_pos_ratio', 'esg_cluster', 'e_score', 's_score', 'g_score']
if all(col in augmented_esg_data.columns for col in sample_cols):
    print(augmented_esg_data[sample_cols].head())

print(f"\n🎉 DATASET AUGMENTATION with REAL LABELS COMPLETED!")
print(f"🚀 Ready for XGBoost training with real E, S, G labels!")

# Final summary
print(f"\n📊 FINAL SUMMARY:")
print(f"   • Original dataset: {len(esg_data)} companies")
print(f"   • Augmented dataset: {len(augmented_esg_data)} samples ({len(augmented_esg_data)/len(esg_data):.1f}x increase)")
print(f"   • Features: {len([col for col in augmented_esg_data.columns if col not in ['filename', 'esg_cluster', 'e_score', 's_score', 'g_score']])} feature columns")
print(f"   • Real labels: E ({augmented_esg_data['e_score'].mean():.1f}±{augmented_esg_data['e_score'].std():.1f}), S ({augmented_esg_data['s_score'].mean():.1f}±{augmented_esg_data['s_score'].std():.1f}), G ({augmented_esg_data['g_score'].mean():.1f}±{augmented_esg_data['g_score'].std():.1f})")
print(f"   • Integer constraints: Maintained for all integer feature columns")
print(f"   • Output file: {output_filename}")

# Validation summary
print(f"\n🔍 VALIDATION SUMMARY:")
ratio_cols = ['esg_pos_ratio', 'esg_neg_ratio']
label_cols = ['e_score', 's_score', 'g_score']
metadata_cols = ['filename', 'esg_cluster']
integer_cols = [col for col in augmented_esg_data.columns 
                if col not in ratio_cols + label_cols + metadata_cols]

print(f"   • Integer columns: {len(integer_cols)} (should all be integers)")
print(f"   • Ratio columns: {len(ratio_cols)} (float values in [0,1])")
print(f"   • Real labels: {len(label_cols)} (E, S, G scores)")
print(f"   • No esg_tier column: {'✅' if 'esg_tier' not in augmented_esg_data.columns else '❌'}")
print(f"   • esg_cluster preserved: {'✅' if 'esg_cluster' in augmented_esg_data.columns else '❌'}")

=== SAVING AUGMENTED DATASET with REAL LABELS ===
✅ Saved augmented dataset: augmented_esg_dataset_with_real_labels.csv
   • Shape: (351, 63)
   • Columns: ['filename', 'pos_env_climate_action', 'neg_env_climate_action', 'pos_env_energy_transition', 'neg_env_energy_transition', 'pos_env_water_stewardship', 'neg_env_water_stewardship', 'pos_env_biodiversity_nature', 'neg_env_biodiversity_nature', 'pos_env_pollution_prevention', 'neg_env_pollution_prevention', 'pos_env_circular_economy', 'neg_env_circular_economy', 'pos_env_sustainable_practices', 'neg_env_sustainable_practices', 'pos_social_diversity_inclusion', 'neg_social_diversity_inclusion', 'pos_social_workforce_development', 'neg_social_workforce_development', 'pos_social_health_safety', 'neg_social_health_safety', 'pos_social_human_rights', 'neg_social_human_rights', 'pos_social_community_engagement', 'neg_social_community_engagement', 'pos_social_customer_stakeholder', 'neg_social_customer_stakeholder', 'pos_social_financial_incl