# AI Challenge - Prediction of Spatial Cellular Organization from Histology Images

## Random submission notebook

This notebook attempts to understand how the leaderboard is scored. It generates a random submission, then zeroes out part of it (every value outside a chosen block of rows and columns) to study how the leaderboard score changes.
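The notebook does not state which metric the leaderboard uses. Because every retained row below is a permutation of ranks 1 to 35, a rank-based metric is a plausible guess. The sketch below assumes, without confirmation from the competition, a mean per-spot Spearman correlation. The function name `mean_spearman`, the made-up data, and the rule that a constant (all-zero) row scores 0 are all hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd


def mean_spearman(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Hypothetical leaderboard metric: mean per-spot (per-row) Spearman correlation.

    Rows where either side is constant, such as an all-zero masked row, have an
    undefined correlation; this sketch scores them as 0, which is an assumption.
    """
    scores = []
    for t, p in zip(y_true, y_pred):
        # Spearman correlation = Pearson correlation of the ranks
        rt = pd.Series(t).rank().to_numpy()
        rp = pd.Series(p).rank().to_numpy()
        if rt.std() == 0 or rp.std() == 0:
            scores.append(0.0)
        else:
            scores.append(float(np.corrcoef(rt, rp)[0, 1]))
    return float(np.mean(scores))


rng = np.random.default_rng(0)
y_true = rng.random((200, 35))          # made-up "ground truth" for 200 spots
masked = y_true.copy()
masked[150:] = 0                        # zero the last 50 rows, as the notebook does
print(mean_spearman(y_true, y_true))    # perfect ranking on every row
print(mean_spearman(y_true, masked))    # 150 perfect rows + 50 zeroed rows
```

Under this assumed metric, zeroing rows lowers the score in proportion to the fraction of rows removed. Whether the real leaderboard behaves this way is what the experiments in this notebook are meant to reveal.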

# 1. Data Loading

Below, we load the spot tables (both Train and Test) from the HDF5 file. Only the spot tables are used in this notebook; the HE images are not loaded or plotted.

In [1]:
```python
import h5py
import numpy as np
import pandas as pd
```

In [2]:
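```python
config = {
    "n_rows": 2000,  # number of leading test spots that keep their random values
    "n_cols": 35     # number of leading cell-type columns that keep their values (35 = all)
}
```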
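`config` decides how much of the random submission survives: only the top-left `n_rows` × `n_cols` block keeps its values. A small helper makes the retained fraction explicit. `retained_fraction` is hypothetical (not part of the original notebook), and the test-slide size of 2087 spots in the usage line is an assumption read off the last index (2086) in the submission output table.

```python
def retained_fraction(n_rows: int, n_cols: int, total_rows: int, total_cols: int = 35) -> float:
    """Fraction of submission entries left non-zero by the top-left mask."""
    kept_rows = min(n_rows, total_rows)   # same clamping as generate_permuted_submission
    kept_cols = min(n_cols, total_cols)
    return (kept_rows * kept_cols) / (total_rows * total_cols)


# Assumed test-slide size of 2087 spots; with n_cols = 35 only rows are masked
print(f"{retained_fraction(2000, 35, total_rows=2087):.4f}")
```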

In [3]:
```python
DATA_PATH = "/kaggle/input/el-hackathon-2025/elucidata_ai_challenge_data.h5"

with h5py.File(DATA_PATH, "r") as f:
    # Test slide 'S_7': only the spot coordinates are available
    test_spot_table = pd.DataFrame(np.array(f["spots/Test"]["S_7"]))

    # Train slides: spot coordinates followed by the cell-type columns C1-C35
    train_spots = f["spots/Train"]
    train_spot_tables = {
        slide_name: pd.DataFrame(np.array(train_spots[slide_name]))
        for slide_name in train_spots.keys()
    }

# Show the test spot coordinates for slide 'S_7'
# (as the last expression of the cell, this is what the notebook displays)
test_spot_table
```
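The `pd.DataFrame(np.array(dataset))` pattern works because each spot table is read as a NumPy structured array, whose field names become the DataFrame's column names. The sketch below builds such an array in memory instead of reading the HDF5 file; the field names `x`, `y`, `C1` to `C35` and their dtypes are assumptions, inferred from the `columns[2:]` slice used in the submission code.

```python
import numpy as np
import pandas as pd

# Hypothetical record layout: two coordinates followed by 35 cell-type abundances
fields = [("x", "i8"), ("y", "i8")] + [(f"C{i}", "f8") for i in range(1, 36)]
records = np.zeros(3, dtype=fields)
records["x"] = [10, 20, 30]
records["y"] = [5, 15, 25]
records["C1"] = [0.1, 0.2, 0.3]

table = pd.DataFrame(records)                    # field names become column names
coords = table[["x", "y"]]
cell_type_columns = table.columns[2:].tolist()   # same slice the submission code uses
print(coords.shape, len(cell_type_columns), cell_type_columns[:3])
```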

# 3. Creating a Random Submission

In this section, we generate a random submission for the 35 cell types.  
Each row (one Test spot) is an independent random permutation of the integers 1 to 35, so every cell type receives a distinct rank. All values outside the top-left `n_rows` × `n_cols` block set in `config` are then set to 0.  
The order of spots is preserved as in the test spot table.
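The same construction can be written without a Python-level loop. This sketch uses NumPy's `Generator.permuted`, which shuffles each row independently when given `axis=1`. The name `permuted_block` is hypothetical, and because it draws from `np.random.default_rng` rather than the legacy global seed, it will not reproduce the exact values that `generate_permuted_submission` produces for `seed=42`.

```python
import numpy as np


def permuted_block(total_rows: int, total_cols: int, n_rows: int, n_cols: int, seed=None) -> np.ndarray:
    """Each row is an independent permutation of 1..total_cols; entries outside
    the top-left n_rows x n_cols block are set to 0."""
    rng = np.random.default_rng(seed)
    base = np.tile(np.arange(1, total_cols + 1), (total_rows, 1))
    matrix = rng.permuted(base, axis=1)   # shuffle within each row independently
    matrix[n_rows:, :] = 0                # zero rows beyond n_rows
    matrix[:, n_cols:] = 0                # zero columns beyond n_cols
    return matrix


m = permuted_block(total_rows=6, total_cols=5, n_rows=4, n_cols=3, seed=0)
print(m)
```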

In [4]:
```python
def generate_permuted_submission(train_spot_tables, test_spot_table, seed=None,
                                 n_rows=None, n_columns=None):
    """
    Generate a submission where each row is a random permutation of the integers 1 to 35,
    and values outside the top-left (n_rows x n_columns) block are zeroed.

    Parameters:
        train_spot_tables (dict): Train spot tables; used only to get the C1-C35 column names.
        test_spot_table (pd.DataFrame): Test spot table; its index becomes the submission index.
        seed (int, optional): Random seed for reproducibility (seeds NumPy's global RNG).
        n_rows (int, optional): How many leading rows (spots) keep their values.
        n_columns (int, optional): How many leading columns (cell types) keep their values.

    Returns:
        pd.DataFrame: Submission DataFrame with shape (num_spots, 35)
    """
    if seed is not None:
        np.random.seed(seed)

    # Cell-type column names: skip the first two (coordinate) columns of a train table
    all_columns = train_spot_tables['S_1'].columns[2:].tolist()
    total_columns = len(all_columns)

    # Clamp n_columns and n_rows to the available size (None means "keep everything")
    if n_columns is None:
        n_columns = total_columns
    else:
        n_columns = min(n_columns, total_columns)

    indices = test_spot_table.index.values
    total_rows = len(indices)

    if n_rows is None:
        n_rows = total_rows
    else:
        n_rows = min(n_rows, total_rows)

    # Full matrix: each row is an independent random permutation of 1..total_columns
    matrix = np.array([np.random.permutation(np.arange(1, total_columns + 1)) for _ in range(total_rows)])
    submission = pd.DataFrame(matrix, columns=all_columns, index=indices, dtype=int)

    # Zero out everything beyond the first n_rows rows and n_columns columns
    if n_rows < total_rows:
        submission.iloc[n_rows:, :] = 0
    if n_columns < total_columns:
        submission.iloc[:, n_columns:] = 0

    return submission


submission_df = generate_permuted_submission(
    train_spot_tables, test_spot_table,
    seed=42, n_rows=config["n_rows"], n_columns=config["n_cols"]
)
submission_df
```

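|  | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | ... | C26 | C27 | C28 | C29 | C30 | C31 | C32 | C33 | C34 | C35 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 27 | 14 | 25 | 22 | 16 | 30 | 20 | 13 | 9 | 17 | ... | 11 | 23 | 19 | 26 | 7 | 21 | 35 | 8 | 15 | 29 |
| 1 | 16 | 17 | 5 | 10 | 28 | 22 | 11 | 6 | 19 | 13 | ... | 30 | 14 | 25 | 4 | 18 | 34 | 9 | 21 | 7 | 3 |
| 2 | 2 | 11 | 5 | 33 | 6 | 3 | 31 | 34 | 24 | 12 | ... | 32 | 8 | 15 | 16 | 14 | 4 | 30 | 10 | 26 | 18 |
| 3 | 11 | 35 | 4 | 20 | 21 | 13 | 6 | 31 | 32 | 10 | ... | 16 | 1 | 2 | 12 | 8 | 9 | 7 | 28 | 5 | 33 |
| 4 | 20 | 3 | 8 | 31 | 9 | 17 | 1 | 18 | 22 | 21 | ... | 15 | 35 | 7 | 23 | 5 | 30 | 4 | 34 | 32 | 13 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2083 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2084 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2085 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2086 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |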
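Before uploading, it can help to confirm that the submission has the intended structure. `check_submission` below is a hypothetical helper, not part of the original notebook. It raises `ValueError` if any entry outside the retained block is non-zero, or if a retained row contains a rank outside 1 to 35 or repeats a rank. It ignores an `ID` column if one has already been added.

```python
import numpy as np
import pandas as pd


def check_submission(sub: pd.DataFrame, n_rows: int, n_cols: int, n_types: int = 35) -> None:
    """Hypothetical sanity check for a permuted-and-masked submission."""
    # Ignore an ID column if it has already been added
    values = sub.drop(columns=["ID"], errors="ignore").to_numpy()

    # Entries outside the top-left n_rows x n_cols block must all be zero
    outside = np.ones(values.shape, dtype=bool)
    outside[:n_rows, :n_cols] = False
    if (values[outside] != 0).any():
        raise ValueError("non-zero value outside the retained block")

    # Inside the block, each row holds distinct ranks drawn from 1..n_types
    kept = values[:n_rows, :n_cols]
    if ((kept < 1) | (kept > n_types)).any():
        raise ValueError("rank outside 1..n_types")
    for i, row in enumerate(kept):
        if len(np.unique(row)) != len(row):
            raise ValueError(f"repeated rank in row {i}")
```

In this notebook it would be called as `check_submission(submission_df, config["n_rows"], config["n_cols"])`.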



# 4. Submission File Generation

Finally, we generate the submission file in the required format.  
Each row corresponds to a test spot: an `ID` column (taken from the DataFrame index) followed by the 35 predictions C1 to C35.
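A quick way to confirm the CSV layout without touching the real file is to round-trip a toy frame through an in-memory buffer. The toy values are made up, and the column layout simply mirrors what this notebook writes; the official format requirements are not restated here.

```python
import io

import numpy as np
import pandas as pd

cols = [f"C{i}" for i in range(1, 36)]
toy = pd.DataFrame(np.arange(2 * 35).reshape(2, 35), columns=cols)

# Turn the index into a leading ID column without mutating `toy`
out = toy.rename_axis("ID").reset_index()

buf = io.StringIO()
out.to_csv(buf, index=False)
header = buf.getvalue().splitlines()[0].split(",")
print(header[:3], len(header))
```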

In [6]:
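```python
# Prepare the submission: an ID column (from the index) followed by C1-C35.
# rename_axis/reset_index returns a new frame, so re-running this cell does not
# fail the way a second in-place insert of an existing 'ID' column would.
submission_out = submission_df.rename_axis("ID").reset_index()

# Save the submission file as submission.csv
submission_out.to_csv("./submission.csv", index=False)
print("Submission file 'submission.csv' created!")
```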

Submission file 'submission.csv' created!
