# AI Challenge - Prediction of Spatial Cellular Organization from Histology Images

## Random submission notebook

This notebook attempts to understand how the leaderboard is scored. It generates random submissions, then modifies a portion of the data points to study how the leaderboard score changes.
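The cells below do not implement the second step, which re-randomizes only part of a submission. The sketch below shows one way it could be done. It assumes a submission DataFrame with an `ID` column followed by the C1–C35 rank columns, as produced later by `generate_permuted_submission`. The helper `perturb_rows` and its `fraction` parameter are hypothetical names, not part of the notebook.

```python
import numpy as np
import pandas as pd


def perturb_rows(submission, fraction, seed=None):
    """Hypothetical helper: re-randomize a random `fraction` of the rows.

    Every column except 'ID' is treated as a prediction column; each selected
    row gets a fresh random permutation of 1..K (K = number of prediction columns).
    Returns the perturbed copy and the sorted positions of the changed rows.
    """
    rng = np.random.default_rng(seed)
    pred_cols = [c for c in submission.columns if c != "ID"]
    k = len(pred_cols)

    # Choose which rows to re-randomize, without replacement
    n_rows = len(submission)
    n_changed = int(round(fraction * n_rows))
    changed = np.sort(rng.choice(n_rows, size=n_changed, replace=False))

    perturbed = submission.copy()
    if n_changed > 0:
        # One independent permutation of 1..k per selected row
        new_values = np.array([rng.permutation(np.arange(1, k + 1)) for _ in range(n_changed)])
        perturbed.iloc[changed, perturbed.columns.get_indexer(pred_cols)] = new_values
    return perturbed, changed
```

For example, `perturb_rows(submission_df, 0.1, seed=0)` would re-randomize about 209 of the 2088 rows. The returned positions record which spots changed, so a difference in leaderboard score can be related to that subset.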

# 1. Data Loading and Visualization

Below, we download the competition HDF5 file and load the spot tables (the x/y coordinates of each spot) for the Train and Test slides.

In [7]:
!python -m pip install -q h5py pandas matplotlib numpy kaggle




In [10]:
import h5py
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import kaggle

In [13]:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
api.model_list_cli()


In [None]:
# Downloads the file from the competition "el-hackathon-2025"
api.competition_download_file(
    'el-hackathon-2025',
    'elucidata_ai_challenge_data.h5',
    path='./data'  # downloads the file into the local folder "./data"
)

Downloading elucidata_ai_challenge_data.h5 to ./data
100%|██████████| 318M/318M [00:00<00:00, 1.16GB/s]


In [18]:
!dir data

 Volume in drive C is Windows
 Volume Serial Number is C2B7-260F

 Directory of c:\Users\dallo\workspace\kaggle_competitions\data

02/04/2025  11:56    <DIR>          .
02/04/2025  11:56    <DIR>          ..
12/03/2025  17:34       333,866,720 elucidata_ai_challenge_data.h5
               1 File(s)    333,866,720 bytes
               2 Dir(s)  588,329,377,792 bytes free


In [11]:
config = {
    "n_rows": 500,
    "n_cols": 25
}

In [19]:
# Load the spot table for the Test slide 'S_7' (spot coordinates and Test_Set label)
# and the spot tables of all Train slides, keyed by slide name
with h5py.File("./data/elucidata_ai_challenge_data.h5", "r") as f:
    test_spot_table = pd.DataFrame(np.array(f["spots/Test"]["S_7"]))
    train_spots = f["spots/Train"]
    train_spot_tables = {
        slide_name: pd.DataFrame(np.array(train_spots[slide_name]))
        for slide_name in train_spots.keys()
    }
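The loading code relies on `np.array(...)` returning a NumPy structured array with one named field per column. `pd.DataFrame` turns each field into a column. The toy array below only imitates that layout. Its field names `x`, `y`, `Test_Set` come from the table displayed further down, and the integer dtypes are an assumption.

```python
import numpy as np
import pandas as pd

# Toy stand-in for np.array(f["spots/Test"]["S_7"]); the dtypes are assumed
spot_dtype = np.dtype([("x", np.int64), ("y", np.int64), ("Test_Set", np.int64)])
spot_array = np.array(
    [(1499, 1260, 2), (1435, 1503, 2), (736, 1304, 1)],
    dtype=spot_dtype,
)

# Each named field of the structured array becomes a DataFrame column
toy_spot_table = pd.DataFrame(spot_array)
print(toy_spot_table.columns.tolist())  # ['x', 'y', 'Test_Set']
print(len(toy_spot_table))              # 3
```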

In [20]:
test_spot_table.Test_Set.value_counts()


Test_Set
2    1586
1     502
Name: count, dtype: int64
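The `Test_Set` column splits the Test slide's 2088 spots into two groups: 502 labelled 1 and 1586 labelled 2, roughly 24% and 76%. This notebook does not say what the labels mean. One hypothesis worth testing with random submissions is that they mark a public/private leaderboard split. The sketch below separates the groups and computes their proportions. The helper name `split_by_test_set` is hypothetical.

```python
import pandas as pd


def split_by_test_set(spot_table):
    """Hypothetical helper: map each Test_Set label to its rows and report proportions."""
    groups = {label: rows for label, rows in spot_table.groupby("Test_Set")}
    proportions = spot_table["Test_Set"].value_counts(normalize=True).sort_index()
    return groups, proportions


# Toy table with the same columns as test_spot_table
toy = pd.DataFrame({
    "x": [1499, 1435, 558, 736],
    "y": [1260, 1503, 1082, 1304],
    "Test_Set": [2, 2, 2, 1],
})
groups, proportions = split_by_test_set(toy)
print(groups[1].index.tolist())  # [3]
```

On the real table, `split_by_test_set(test_spot_table)` would return the 502 and 1586 spots as separate DataFrames.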

In [21]:
test_spot_table

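|      | x    | y    | Test_Set |
|------|------|------|----------|
| 0    | 1499 | 1260 | 2        |
| 1    | 1435 | 1503 | 2        |
| 2    | 558  | 1082 | 2        |
| 3    | 736  | 1304 | 1        |
| 4    | 1257 | 1592 | 1        |
| ...  | ...  | ...  | ...      |
| 2083 | 736  | 639  | 2        |
| 2084 | 1016 | 684  | 2        |
| 2085 | 1181 | 839  | 2        |
| 2086 | 735  | 1436 | 1        |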
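Section 1 covers visualization as well as loading. The sketch below uses the already-imported matplotlib to scatter the spot coordinates, coloured by `Test_Set`. It plots coordinates only. Overlaying them on the H&E image would also require reading the image, which this notebook does not do. The function name `plot_spots` and the inverted y-axis (image-style coordinates) are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_spots(spot_table, title="Test slide S_7"):
    """Hypothetical helper: scatter the spot coordinates, one colour per Test_Set label."""
    fig, ax = plt.subplots(figsize=(6, 6))
    for label, rows in spot_table.groupby("Test_Set"):
        ax.scatter(rows["x"], rows["y"], s=4, label=f"Test_Set = {label}")
    ax.invert_yaxis()  # assumption: image-style coordinates, y grows downwards
    ax.set_aspect("equal")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title(title)
    ax.legend()
    return fig, ax
```

In the notebook, `fig, ax = plot_spots(test_spot_table)` would draw all 2088 spots.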


# 3. Creating a Random Submission

In this section, we generate random predictions for the 35 cell types.  
Each spot in the Test slide receives a random permutation of the integers 1 to 35 (one value per cell type, without any normalization).  
The order of spots is preserved as in the test spots table.
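One random permutation per row can also be drawn without a Python loop. Argsorting each row of a matrix of i.i.d. uniform values gives a uniformly random permutation of the column positions. The sketch below uses NumPy's `Generator` API. `random_rank_matrix` is a hypothetical name; the notebook's own function below uses a per-row loop instead.

```python
import numpy as np


def random_rank_matrix(n_rows, n_cols, seed=None):
    """Hypothetical helper: each row is an independent random permutation of 1..n_cols."""
    rng = np.random.default_rng(seed)
    # argsort of i.i.d. uniform values is a uniformly random permutation of 0..n_cols-1
    return np.argsort(rng.random((n_rows, n_cols)), axis=1) + 1


ranks = random_rank_matrix(2088, 35, seed=0)
print(ranks.shape)  # (2088, 35)
```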

In [33]:
import time
import random
import numpy as np
import pandas as pd
from kaggle.api.kaggle_api_extended import KaggleApi

def generate_permuted_submission(train_spot_tables, test_spot_table, seed=None):
    """
    Generate a submission DataFrame where each row is a random permutation
    of the integers 1 to 35 (more generally, 1 to the number of cell-type columns).
    The DataFrame starts with an 'ID' column holding each test spot's index
    and has one row per entry in test_spot_table (2088 rows).

    Parameters:
        train_spot_tables (dict): Train spot tables, used to get the C1–C35 column names.
        test_spot_table (pd.DataFrame): Test spot table with 2088 rows.
        seed (int, optional): Random seed for reproducibility.

    Returns:
        pd.DataFrame: Submission DataFrame with random permutations for all rows.
    """
    # Local generator, so NumPy's global random state is left untouched
    rng = np.random.default_rng(seed)

    # Get column names from the training table (assumes the cell-type columns start at the third column)
    all_columns = train_spot_tables['S_1'].columns[2:].tolist()
    total_columns = len(all_columns)

    # Use the index from test_spot_table (2088 rows)
    indices = test_spot_table.index
    num_rows = len(indices)

    # Each row is an independent random permutation of 1..total_columns
    matrix = np.array([rng.permutation(np.arange(1, total_columns + 1)) for _ in range(num_rows)])

    # Build the submission DataFrame, keeping the test-spot order
    submission = pd.DataFrame(matrix, index=indices, columns=all_columns, dtype=int)

    # Add the 'ID' column based on the index, then reset to a clean 0..n-1 index
    submission.insert(0, 'ID', submission.index)
    submission.reset_index(drop=True, inplace=True)

    return submission




In [34]:


# Set the competition slug (update as needed)
competition_slug = "el-hackathon-2025"

# Number of submissions to generate and submit
num_submissions = 2

for i in range(num_submissions):
    # Generate a submission with a randomly drawn seed
    submission_df = generate_permuted_submission(train_spot_tables, test_spot_table, seed=np.random.randint(10000))

    # Save the submission file locally
    filename = f"submission_{i}.csv"
    submission_df.to_csv(filename, index=False)

    # Submit the file to the competition
    api.competition_submit(file_name=filename, message=f"Automated submission {i}", competition=competition_slug)
    print(f"Submitted {filename} to {competition_slug}")

    # Wait a random 3 to 5 seconds before the next submission
    time.sleep(random.uniform(3, 5))


100%|██████████| 207k/207k [00:00<00:00, 303kB/s] 


Submitted submission_0.csv to el-hackathon-2025


100%|██████████| 207k/207k [00:00<00:00, 301kB/s] 


Submitted submission_1.csv to el-hackathon-2025
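Each submission in the loop above uses a randomly drawn seed, but the seed is not saved anywhere. A leaderboard score therefore cannot be traced back to a reproducible submission. The sketch below keeps a manifest of seeds. It assumes the generator accepts an integer seed, as `generate_permuted_submission` does. `build_manifest` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def build_manifest(n_submissions, entropy=None):
    """Hypothetical helper: one reproducible integer seed per submission file."""
    root = np.random.SeedSequence(entropy)
    # Spawned children are independent streams; generate_state(1) yields one 32-bit integer each
    seeds = [int(child.generate_state(1)[0]) for child in root.spawn(n_submissions)]
    manifest = pd.DataFrame({
        "filename": [f"submission_{i}.csv" for i in range(n_submissions)],
        "seed": seeds,
    })
    # root.entropy is the single value to store if the whole batch must be regenerated
    return manifest, root.entropy
```

In the loop, `seed=int(manifest.loc[i, 'seed'])` would replace `np.random.randint(10000)`. Writing the manifest to CSV would keep the file-to-seed mapping next to the submissions.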


In [25]:
submission_df

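|      | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | ... | C26 | C27 | C28 | C29 | C30 | C31 | C32 | C33 | C34 | C35 |
|------|----|----|----|----|----|----|----|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 3    | 25 | 3  | 20 | 26 | 35 | 13 | 28 | 18 | 4  | 10  | ... | 19  | 31  | 9   | 32  | 29  | 24  | 8   | 34  | 16  | 33  |
| 4    | 15 | 31 | 19 | 34 | 6  | 18 | 20 | 9  | 5  | 25  | ... | 33  | 11  | 22  | 14  | 23  | 7   | 24  | 32  | 29  | 3   |
| 8    | 5  | 21 | 30 | 22 | 29 | 1  | 13 | 19 | 9  | 33  | ... | 35  | 24  | 23  | 6   | 2   | 7   | 16  | 17  | 3   | 11  |
| 9    | 11 | 20 | 15 | 19 | 10 | 1  | 14 | 33 | 17 | 26  | ... | 35  | 30  | 22  | 23  | 18  | 31  | 21  | 34  | 6   | 4   |
| 12   | 7  | 15 | 11 | 10 | 25 | 35 | 31 | 5  | 33 | 17  | ... | 3   | 26  | 2   | 8   | 30  | 21  | 22  | 4   | 14  | 13  |
| ...  | ...| ...| ...| ...| ...| ...| ...| ...| ...| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2072 | 17 | 13 | 15 | 28 | 18 | 32 | 22 | 10 | 27 | 16  | ... | 2   | 33  | 31  | 24  | 30  | 7   | 29  | 19  | 25  | 21  |
| 2078 | 16 | 35 | 23 | 21 | 6  | 12 | 14 | 2  | 8  | 15  | ... | 11  | 3   | 25  | 27  | 20  | 30  | 13  | 17  | 19  | 29  |
| 2081 | 11 | 35 | 10 | 26 | 28 | 1  | 6  | 24 | 30 | 9   | ... | 34  | 32  | 4   | 5   | 12  | 29  | 27  | 15  | 22  | 8   |
| 2082 | 1  | 25 | 10 | 6  | 2  | 24 | 30 | 15 | 9  | 21  | ... | 18  | 19  | 26  | 17  | 8   | 14  | 28  | 27  | 33  | 20  |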


# 4. Submission File Generation

The submission files written above follow the required format.  
Each row corresponds to a test spot: its identifier (the test spot table's index, stored in the `ID` column) followed by the 35 cell-type predictions C1–C35.
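A quick local check of this format, run before calling `api.competition_submit`, can catch mistakes early. The sketch below assumes the layout produced by `generate_permuted_submission`: an `ID` column followed by C1–C35, one row per test spot, and each row a permutation of 1 to 35. The permutation check describes this notebook's own submissions. It is not a confirmed rule of the competition. `check_submission` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def check_submission(submission, expected_ids, n_types=35):
    """Hypothetical helper: return a list of problems found (empty list = all checks passed)."""
    problems = []
    expected_cols = ["ID"] + [f"C{i}" for i in range(1, n_types + 1)]
    if submission.columns.tolist() != expected_cols:
        problems.append(f"columns differ from ID, C1..C{n_types}")
        return problems
    if submission["ID"].tolist() != list(expected_ids):
        problems.append("ID column does not match the test spot order")
    values = submission[expected_cols[1:]].to_numpy()
    # Each row should contain every integer 1..n_types exactly once
    if not (np.sort(values, axis=1) == np.arange(1, n_types + 1)).all():
        problems.append(f"some rows are not permutations of 1..{n_types}")
    return problems
```

In the notebook, `check_submission(submission_df, test_spot_table.index)` should return an empty list.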

In [None]:
# Create the config directory first: `touch` fails with "No such file or directory" if it is missing.
# kaggle.json must then contain your Kaggle username and API key.
!mkdir -p ~/.config/kaggle
!touch ~/.config/kaggle/kaggle.json
!chmod 600 ~/.config/kaggle/kaggle.json
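`touch` only creates an empty file. The Kaggle client needs `kaggle.json` to hold a JSON object with your `username` and `key`, the API token from your Kaggle account settings. The sketch below writes that file with owner-only permissions. The function name is hypothetical. The `~/.config/kaggle` location mirrors the cell above; the client also reads `~/.kaggle`, or the directory named by `KAGGLE_CONFIG_DIR`.

```python
import json
import os
from pathlib import Path


def write_kaggle_credentials(username, key, config_dir="~/.config/kaggle"):
    """Hypothetical helper: write kaggle.json readable only by the current user."""
    directory = Path(config_dir).expanduser()
    directory.mkdir(parents=True, exist_ok=True)  # create the directory if it is missing
    path = directory / "kaggle.json"
    path.write_text(json.dumps({"username": username, "key": key}))
    os.chmod(path, 0o600)  # same as `chmod 600`; on Windows only the read-only flag is affected
    return path
```

As an alternative to the file, the client also accepts the environment variables `KAGGLE_USERNAME` and `KAGGLE_KEY`.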
