# AI Challenge - Prediction of Spatial Cellular Organization from Histology Images

## Random submission notebook

This notebook attempts to understand how the leaderboard is scored. It generates random submissions, then modifies a portion of the data points to study how the leaderboard score changes.

# 1. Data Loading and Visualization

Below, we load the spot tables for the Train and Test slides from the HDF5 file. Only the spot tables are read here; the H&E images are not needed to build a random submission.

In [1]:
import h5py
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
# Path to the challenge data (HDF5)
DATA_PATH = "/kaggle/input/el-hackathon-2025/elucidata_ai_challenge_data.h5"

with h5py.File(DATA_PATH, "r") as f:
    # Test spot table for slide 'S_7' (spot coordinates only)
    test_spot_table = pd.DataFrame(np.array(f["spots/Test"]["S_7"]))

    # Training spot tables, one DataFrame per slide
    train_spots = f["spots/Train"]
    train_spot_tables = {
        slide_name: pd.DataFrame(np.array(train_spots[slide_name]))
        for slide_name in train_spots.keys()
    }

# Show the test spot coordinates for slide 'S_7'
# (kept as the last expression so the notebook displays it)
test_spot_table
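
The loading cell converts each HDF5 dataset to a DataFrame through `np.array`. A minimal sketch of why this gives named columns, assuming the spot datasets are compound (structured) arrays. The field names `x`, `y`, `C1`, `C2` and all values below are made up for illustration and are not read from the challenge file:

```python
import numpy as np
import pandas as pd

# Hypothetical structured array standing in for one HDF5 spot dataset.
# Field names and values are illustrative assumptions, not real data.
spot_dtype = np.dtype([("x", "<i8"), ("y", "<i8"), ("C1", "<f8"), ("C2", "<f8")])
fake_spots = np.array(
    [(10, 20, 0.5, 1.5), (30, 40, 0.0, 2.0), (50, 60, 1.0, 0.25)],
    dtype=spot_dtype,
)

# pandas maps each named field of a structured array to one column.
fake_table = pd.DataFrame(fake_spots)

# Under this layout, skipping the first two columns leaves the cell types.
cell_type_columns = list(fake_table.columns[2:])
print(fake_table)
print(cell_type_columns)
```

If the real tables follow this layout, `columns[2:]` selects exactly the cell type fields, which is the assumption the submission function further down relies on.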

# 3. Creating a Random Submission

In this section, we generate random predictions for the 35 cell types.  
The predictions are random floats between 0 and 2 (without any normalization) for each spot in the Test slide.  
The order of spots is preserved as in the test spots table.
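
As a rough illustration of the sampling described above, the sketch below draws uniform values and rescales them to a chosen range. The helper name `uniform_block` is hypothetical, and it uses `np.random.default_rng` rather than the global `np.random.seed` used in this notebook, so its values will not match the notebook's output:

```python
import numpy as np

def uniform_block(n_spots, n_cell_types, low=0.0, high=2.0, seed=None):
    """Return an (n_spots, n_cell_types) array of uniform draws in [low, high)."""
    rng = np.random.default_rng(seed)
    # rng.random samples from [0, 1); scale and shift to [low, high).
    return (high - low) * rng.random((n_spots, n_cell_types)) + low

block = uniform_block(n_spots=5, n_cell_types=35, seed=0)
print(block.shape)
print(block.min() >= 0.0, block.max() < 2.0)
```

No normalization is applied, so the rows do not sum to any fixed value.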

In [3]:
import numpy as np
import pandas as pd

def generate_random_submission_partial_v2(train_spot_tables, test_spot_table, seed=None, 
                                            n_rows=None, n_columns=None, random_range=(0, 2)):
    """
    Generate a random submission DataFrame for cell type abundances as follows:
      1. Generate random values for the full table.
      2. Replace (zero out) entries outside the top-left submatrix defined by n_rows and n_columns.
      
    Parameters:
        train_spot_tables (dict): Dictionary of pandas DataFrames containing training spot data.
                                   It is assumed that train_spot_tables['S_1'] has cell type columns 
                                   starting at index 2 (e.g., 'C1', 'C2', ..., 'C35').
        test_spot_table (pd.DataFrame): DataFrame containing test spot data; its index is used for the submission.
        seed (int, optional): Random seed for reproducibility.
        n_rows (int, optional): Number of rows (spots) to retain random predictions.
                                All other rows will be set to 0. If None, use all rows.
        n_columns (int, optional): Number of cell type columns to retain random predictions.
                                   All other columns will be set to 0. If None, use all cell type columns.
        random_range (tuple, optional): (min, max) range for generating random values. Defaults to (0, 2).
    
    Returns:
        pd.DataFrame: A DataFrame with the same index as test_spot_table and columns corresponding to
                      the cell type fields. Only the top-left submatrix (n_rows x n_columns) contains random numbers;
                      the rest are 0.
    """
    # Set the random seed if provided
    if seed is not None:
        np.random.seed(seed)
    
    # Get all cell type column names (assumes the first two columns are not cell types)
    all_columns = train_spot_tables['S_1'].columns[2:].values
    total_columns = len(all_columns)
    
    # Number of test spots (rows)
    indices = test_spot_table.index.values
    total_rows = len(indices)
    
    # Generate a full table with random values in the given range
    submission = pd.DataFrame(
        (random_range[1] - random_range[0]) * np.random.rand(total_rows, total_columns) + random_range[0],
        columns=all_columns,
        index=indices,
        dtype=float
    )
    
    # Determine n_rows and n_columns (if not provided, use the full dimensions)
    if n_rows is None:
        n_rows = total_rows
    else:
        n_rows = min(n_rows, total_rows)
    if n_columns is None:
        n_columns = total_columns
    else:
        n_columns = min(n_columns, total_columns)
    
    # Zero out rows beyond n_rows and columns beyond n_columns
    submission.iloc[n_rows:, :] = 0   # Set all rows after the first n_rows to 0
    submission.iloc[:, n_columns:] = 0  # Set all columns after the first n_columns to 0
    
    return submission

# Example usage:
# Assuming you have loaded train_spot_tables (a dict with key 'S_1') and test_spot_table (a DataFrame):
submission_df = generate_random_submission_partial_v2(train_spot_tables, test_spot_table, 
                                                       seed=42, n_rows=300, n_columns=35, random_range=(0,2))
submission_df.head()


|   | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | ... | C26 | C27 | C28 | C29 | C30 | C31 | C32 | C33 | C34 | C35 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.74908 | 1.901429 | 1.463988 | 1.197317 | 0.312037 | 0.311989 | 0.116167 | 1.732352 | 1.20223 | 1.416145 | ... | 1.570352 | 0.399348 | 1.028469 | 1.184829 | 0.092901 | 1.21509 | 0.341048 | 0.130103 | 1.897771 | 1.931264 |
| 1 | 1.616795 | 0.609228 | 0.195344 | 1.368466 | 0.880305 | 0.244076 | 0.990354 | 0.068777 | 1.818641 | 0.51756 | ... | 0.777355 | 0.542698 | 1.657475 | 0.713507 | 0.561869 | 1.085392 | 0.281848 | 1.604394 | 0.149101 | 1.973774 |
| 2 | 1.54449 | 0.397431 | 0.011044 | 1.630923 | 1.413715 | 1.458014 | 1.542541 | 0.148089 | 0.716931 | 0.231738 | ... | 0.987591 | 1.045466 | 0.855082 | 0.050838 | 0.215783 | 0.062858 | 1.272821 | 0.628712 | 1.017141 | 1.815133 |
| 3 | 0.498584 | 0.820766 | 1.511102 | 0.457596 | 0.15396 | 0.579503 | 0.322443 | 1.859395 | 1.616241 | 1.266808 | ... | 0.834822 | 0.444216 | 0.239731 | 0.67523 | 1.885819 | 0.646406 | 1.037581 | 1.406038 | 0.727259 | 1.943564 |
| 4 | 1.924895 | 0.503565 | 0.994497 | 0.601757 | 0.569681 | 0.073774 | 1.219129 | 1.005358 | 0.102958 | 0.557293 | ... | 1.670605 | 0.64156 | 0.373037 | 0.08155 | 1.181786 | 1.355129 | 0.033176 | 1.024186 | 0.452992 | 1.290346 |
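
The `n_rows` and `n_columns` arguments make it possible to build a series of submissions that differ only in how many entries are non-zero. A minimal sketch of such a sweep on a toy table is given below; the helper `keep_top_left` is a hypothetical stand-in for the masking step, and the 6 x 4 table is invented for illustration:

```python
import numpy as np
import pandas as pd

def keep_top_left(table, n_rows, n_columns):
    """Return a copy of `table` with everything outside the top-left
    n_rows x n_columns block set to 0."""
    masked = table.copy()
    masked.iloc[n_rows:, :] = 0.0
    masked.iloc[:, n_columns:] = 0.0
    return masked

# Toy stand-in for a full random submission (6 spots, 4 cell types).
rng = np.random.default_rng(42)
full = pd.DataFrame(2 * rng.random((6, 4)), columns=["C1", "C2", "C3", "C4"])

# One variant per row count; each could be submitted separately to see
# how the score responds as more rows carry non-zero predictions.
variants = {n: keep_top_left(full, n_rows=n, n_columns=4) for n in (0, 2, 4, 6)}
for n, variant in variants.items():
    nonzero_rows = int((variant != 0).any(axis=1).sum())
    print(n, nonzero_rows)
```

Because every variant is cut from the same random table, any change in score between two submissions can be attributed to the rows that were switched on, not to a different random draw.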


# 4. Submission File Generation

Finally, we generate the submission file in the required format.  
Each row corresponds to a test spot: its identifier (here, the row index of the test spot table) followed by the 35 predictions.

In [4]:
# Prepare the submission DataFrame: an 'ID' column followed by the predictions for each cell type

submission_df.insert(0, 'ID', submission_df.index)

# Save the submission file as submission.csv
submission_df.to_csv("./submission.csv", index=False)
print("Submission file 'submission.csv' created!")

Submission file 'submission.csv' created!
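
Before uploading, it can help to sanity-check the table layout. The sketch below is one possible check, assuming the expected layout is an `ID` column followed by `C1` to `C35`, as in the output shown earlier; the function name `check_submission` is hypothetical and the authoritative format is whatever the challenge rules specify:

```python
import pandas as pd

def check_submission(df, n_cell_types=35):
    """Return a list of format problems found in a submission table (empty if none)."""
    problems = []
    # Assumed layout: 'ID' first, then C1..C<n_cell_types> in order.
    expected = ["ID"] + [f"C{i}" for i in range(1, n_cell_types + 1)]
    if list(df.columns) != expected:
        problems.append(f"columns differ from ID, C1..C{n_cell_types}")
    if df.isna().any().any():
        problems.append("missing values present")
    if "ID" in df.columns and df["ID"].duplicated().any():
        problems.append("duplicate IDs")
    return problems

# Toy example with 3 cell types instead of 35.
toy = pd.DataFrame({"ID": [0, 1], "C1": [0.1, 0.2], "C2": [0.3, 0.4], "C3": [0.5, 0.6]})
print(check_submission(toy, n_cell_types=3))
print(check_submission(toy.drop(columns="C3"), n_cell_types=3))
```

For the real file, the same function could be applied to `pd.read_csv("./submission.csv")` with the default `n_cell_types=35`.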
