# AI Challenge - Prediction of Spatial Cellular Organization from Histology Images

## Random submission notebook

This notebook tries to work out how the leaderboard is scored. It generates random submissions, then modifies a portion of the data points, to study how the leaderboard score changes.
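
The step of modifying a portion of the data points is described above, but no cell in this notebook performs it. The following is a minimal sketch of one way it could be done, assuming a submission whose rows are rank permutations. The helper name `perturb_rows` and the `C1` to `C35` column names are illustrative assumptions, not part of the original code.

```python
import numpy as np
import pandas as pd


def perturb_rows(submission: pd.DataFrame, fraction: float, seed=None) -> pd.DataFrame:
    """Return a copy of `submission` with a random fraction of rows re-permuted.

    Hypothetical helper: the remaining rows are left untouched, so two
    submissions differ only in the selected rows.
    """
    rng = np.random.default_rng(seed)
    n_change = int(round(fraction * len(submission)))
    # Pick the positions of the rows to modify, without replacement
    rows = rng.choice(len(submission), size=n_change, replace=False)
    values = submission.to_numpy(copy=True)
    # Shuffle each selected row independently (requires NumPy >= 1.20)
    values[rows] = rng.permuted(values[rows], axis=1)
    return pd.DataFrame(values, index=submission.index, columns=submission.columns)


# Toy base submission: 100 rows, each a permutation of 1..35 (assumed layout)
columns = [f"C{i}" for i in range(1, 36)]
base = pd.DataFrame(
    np.random.default_rng(0).permuted(np.tile(np.arange(1, 36), (100, 1)), axis=1),
    columns=columns,
)
perturbed = perturb_rows(base, fraction=0.25, seed=1)
n_changed = int((base != perturbed).any(axis=1).sum())
print(f"{n_changed} of {len(base)} rows differ from the base submission")
```

Submitting `base` and `perturbed` separately and comparing their scores would then show how much a known fraction of changed rows moves the leaderboard value.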

# 1. Data Loading and Visualization

Below, we load the spot coordinate tables for the Train and Test slides from the HDF5 file. The H&E images themselves are not loaded or plotted in this notebook.

In [1]:
import h5py
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
config = {
    "n_rows": 500,
    "n_cols": 25
}

In [3]:
# Load the spot table for the Test slide (2D spot coordinates plus the Test_Set flag)
with h5py.File("/kaggle/input/el-hackathon-2025/elucidata_ai_challenge_data.h5", "r") as f:
    test_spots = f["spots/Test"]
    spot_array = np.array(test_spots['S_7'])
    test_spot_table = pd.DataFrame(spot_array)
    
# Show the test spots coordinates for slide 'S_7'
test_spot_table

# Load training data
with h5py.File("/kaggle/input/el-hackathon-2025/elucidata_ai_challenge_data.h5", "r") as f:
    train_spots = f["spots/Train"]
    train_spot_tables = {slide_name: pd.DataFrame(np.array(train_spots[slide_name])) for slide_name in train_spots.keys()}

In [4]:
test_spot_table.Test_Set.value_counts()


Test_Set
2    1586
1     502
Name: count, dtype: int64

In [5]:
test_spot_table

Unnamed: 0,x,y,Test_Set
0,1499,1260,2
1,1435,1503,2
2,558,1082,2
3,736,1304,1
4,1257,1592,1
...,...,...,...
2083,736,639,2
2084,1016,684,2
2085,1181,839,2
2086,735,1436,1


# 3. Creating a Random Submission

In this section, we generate random predictions for the 35 cell types.  
Each spot in the Test slide with `Test_Set == 1` receives a random permutation of the integers 1 to 35, one value per cell type, without any normalization.  
The original row index of the test spot table is preserved for the selected spots.

In [6]:
def generate_permuted_submission(train_spot_tables, test_spot_table, seed=None):
    """
    Generate a submission DataFrame where each row with Test_Set==1 is filled
    with a random permutation of integers from 1 to 35 (or 1 to the total number of columns).
    
    Parameters:
        train_spot_tables (dict): For getting C1–C35 column names.
        test_spot_table (pd.DataFrame): Test spot table; must include a "Test_Set" column.
        seed (int, optional): Random seed for reproducibility.
    
    Returns:
        pd.DataFrame: Submission DataFrame with random permutations for rows where Test_Set==1.
    """

    if seed is not None:
        np.random.seed(seed)

    # Get column names from the training table (assumes they start at the third column)
    all_columns = train_spot_tables['S_1'].columns[2:].tolist()
    total_columns = len(all_columns)
    
    # Filter the test_spot_table to only include rows where Test_Set equals 1
    submission_indices = test_spot_table[test_spot_table['Test_Set'] == 1].index
    num_rows = len(submission_indices)
    
    # Create a matrix where each row is a random permutation of numbers from 1 to total_columns
    matrix = np.array([np.random.permutation(np.arange(1, total_columns + 1)) for _ in range(num_rows)])
    
    submission = pd.DataFrame(matrix, index=submission_indices, columns=all_columns, dtype=int)
    return submission

# Example usage to generate a single submission:
submission_df = generate_permuted_submission(train_spot_tables, test_spot_table, seed=42)
print(submission_df)

# Example usage to generate multiple submission files:
num_submissions = 100  # change this to however many files you need
for i in range(num_submissions):
    # Draw a fresh seed for each submission. The global RNG was seeded with 42
    # by the call above, so this sequence of seeds is itself reproducible.
    submission_df = generate_permuted_submission(train_spot_tables, test_spot_table, seed=np.random.randint(10000))
    filename = f"submission_{i}.csv"
    submission_df.to_csv(filename)
# submission_df now holds the last submission generated by the loop
submission_df

      C1  C2  C3  C4  C5  C6  C7  C8  C9  C10  ...  C26  C27  C28  C29  C30  \
3     27  14  25  22  16  30  20  13   9   17  ...   11   23   19   26    7   
4     16  17   5  10  28  22  11   6  19   13  ...   30   14   25    4   18   
8      2  11   5  33   6   3  31  34  24   12  ...   32    8   15   16   14   
9     11  35   4  20  21  13   6  31  32   10  ...   16    1    2   12    8   
12    20   3   8  31   9  17   1  18  22   21  ...   15   35    7   23    5   
...   ..  ..  ..  ..  ..  ..  ..  ..  ..  ...  ...  ...  ...  ...  ...  ...   
2072  23  16  28   2  29  30  17  24  13   31  ...   22   26   19    9   25   
2078  24   9  12  19  34  25  16  33   6    1  ...   15   18    3    5   31   
2081  33  24  23  11  26  12  16  17  20    2  ...    6   29   21    3   22   
2082  19  21  24  26  28  20   5  27   3   32  ...   13   34   25   12   31   
2086  20   8   5  32  18   7  22  28   2   34  ...   30   12   15    9   21   

      C31  C32  C33  C34  C35  
3      21   35    8

Unnamed: 0,C1,C2,C3,C4,C5,C6,C7,C8,C9,C10,...,C26,C27,C28,C29,C30,C31,C32,C33,C34,C35
3,12,2,14,35,21,4,9,22,16,23,...,1,10,18,33,26,28,30,29,32,24
4,15,24,16,8,23,33,27,12,14,2,...,9,21,30,10,11,17,3,19,26,6
8,25,10,18,9,5,22,14,31,13,15,...,29,2,3,11,35,20,12,19,7,32
9,26,18,30,17,7,9,13,29,4,27,...,5,19,35,33,12,31,28,32,34,1
12,18,1,19,16,6,13,30,26,29,12,...,4,15,11,25,24,9,7,32,5,27
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2072,16,14,26,18,4,30,19,8,17,35,...,27,24,20,15,13,2,33,9,34,29
2078,21,35,34,14,29,28,7,5,27,12,...,15,30,16,1,18,23,2,33,22,17
2081,26,13,15,16,17,25,31,28,30,32,...,23,11,2,7,19,4,1,33,18,14
2082,7,24,3,1,20,9,8,28,25,34,...,19,13,32,29,17,2,18,12,6,15
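
The row-by-row list comprehension in `generate_permuted_submission` can also be expressed as a single vectorized call. The sketch below assumes NumPy 1.20 or later, where `Generator.permuted` is available. The helper name `permuted_rank_matrix` is hypothetical. Because it uses a different random generator than `np.random.seed`, it will not reproduce the exact numbers shown above.

```python
import numpy as np
import pandas as pd


def permuted_rank_matrix(n_rows: int, n_cols: int = 35, seed=None) -> np.ndarray:
    """Build an (n_rows, n_cols) matrix whose rows are independent
    random permutations of 1..n_cols."""
    rng = np.random.default_rng(seed)
    # Start from identical rows 1..n_cols, then shuffle within each row
    base = np.tile(np.arange(1, n_cols + 1), (n_rows, 1))
    return rng.permuted(base, axis=1)


ranks = permuted_rank_matrix(n_rows=5, seed=42)
# Column names C1..C35 mirror the output above; they are assumed here
demo = pd.DataFrame(ranks, columns=[f"C{i}" for i in range(1, 36)])
print(demo.iloc[:, :5])
```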


In [7]:
!ls

__notebook__.ipynb  submission_32.csv  submission_56.csv  submission_7.csv
submission_0.csv    submission_33.csv  submission_57.csv  submission_80.csv
submission_10.csv   submission_34.csv  submission_58.csv  submission_81.csv
submission_11.csv   submission_35.csv  submission_59.csv  submission_82.csv
submission_12.csv   submission_36.csv  submission_5.csv   submission_83.csv
submission_13.csv   submission_37.csv  submission_60.csv  submission_84.csv
submission_14.csv   submission_38.csv  submission_61.csv  submission_85.csv
submission_15.csv   submission_39.csv  submission_62.csv  submission_86.csv
submission_16.csv   submission_3.csv   submission_63.csv  submission_87.csv
submission_17.csv   submission_40.csv  submission_64.csv  submission_88.csv
submission_18.csv   submission_41.csv  submission_65.csv  submission_89.csv
submission_19.csv   submission_42.csv  submission_66.csv  submission_8.csv
submission_1.csv    submission_43.csv  submission_67.csv  submission_90.csv
s
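
The leaderboard metric is not stated in this notebook. Since every submission row is a permutation of ranks, one possible local proxy is the mean row-wise Spearman correlation between two submissions. This is purely an assumption for illustration. The helper `rowwise_spearman` below is hypothetical and uses only NumPy and pandas.

```python
import numpy as np
import pandas as pd


def rowwise_spearman(a: pd.DataFrame, b: pd.DataFrame) -> pd.Series:
    """Spearman correlation between matching rows of two frames.

    Computed as the Pearson correlation of within-row ranks.
    A constant row yields NaN because its rank variance is zero.
    """
    # Align b to a's rows and columns before ranking
    b = b.loc[a.index, a.columns]
    ra = a.rank(axis=1).to_numpy(dtype=float)
    rb = b.rank(axis=1).to_numpy(dtype=float)
    ra -= ra.mean(axis=1, keepdims=True)
    rb -= rb.mean(axis=1, keepdims=True)
    denom = np.sqrt((ra ** 2).sum(axis=1) * (rb ** 2).sum(axis=1))
    return pd.Series((ra * rb).sum(axis=1) / denom, index=a.index)


# Two independent random rank submissions (toy data, 500 rows x 35 columns)
rng = np.random.default_rng(0)
columns = [f"C{i}" for i in range(1, 36)]
template = np.tile(np.arange(1, 36), (500, 1))
a = pd.DataFrame(rng.permuted(template, axis=1), columns=columns)
b = pd.DataFrame(rng.permuted(template, axis=1), columns=columns)

print("mean Spearman, a vs b:", rowwise_spearman(a, b).mean())
print("mean Spearman, a vs a:", rowwise_spearman(a, a).mean())
```

Under this assumption, two unrelated random submissions should average close to zero, which gives a baseline against which leaderboard scores of random files can be compared.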

# 4. Submission File Generation

Finally, we generate the submission file in the required format.  
Each row corresponds to a test spot: its identifier (taken here from the DataFrame index) followed by 35 predictions.

In [8]:
# Prepare submission DataFrame: an ID column followed by the predictions for each cell type

submission_df.insert(0, 'ID', submission_df.index)

# Save the submission file as submission.csv
submission_df.to_csv("./submission.csv", index=False)
print("Submission file 'submission.csv' created!")

Submission file 'submission.csv' created!
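
Before uploading, it can help to confirm that the file matches the layout described above: an `ID` column followed by 35 prediction columns. The following is a minimal sketch of such a check. The function name `check_submission` and the specific checks are assumptions, not rules taken from the competition.

```python
import numpy as np
import pandas as pd


def check_submission(df: pd.DataFrame, n_cell_types: int = 35) -> list:
    """Return a list of human-readable problems; an empty list means
    the frame passed these (assumed) layout checks."""
    problems = []
    if len(df.columns) == 0 or df.columns[0] != "ID":
        problems.append("first column is not 'ID'")
    preds = df.drop(columns=["ID"], errors="ignore")
    if preds.shape[1] != n_cell_types:
        problems.append(
            f"expected {n_cell_types} prediction columns, found {preds.shape[1]}"
        )
    if preds.isna().any().any():
        problems.append("predictions contain missing values")
    if "ID" in df.columns and df["ID"].duplicated().any():
        problems.append("duplicate IDs")
    return problems


# Toy frame in the same layout as submission.csv (values are placeholders)
good = pd.DataFrame(
    np.tile(np.arange(1, 36), (3, 1)),
    columns=[f"C{i}" for i in range(1, 36)],
)
good.insert(0, "ID", [3, 4, 8])
print(check_submission(good))
print(check_submission(good.drop(columns=["C35"])))
```

The same function could be applied to `pd.read_csv("./submission.csv")` to check the file as written to disk.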


In [9]:
# Create the config directory first; `touch` fails if it does not exist
!mkdir -p ~/.config/kaggle
!touch ~/.config/kaggle/kaggle.json
!chmod 600 ~/.config/kaggle/kaggle.json
