[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crunchdao/quickstarters/blob/master/competitions/broad-3/quickstarters/random-submission/random-submission.ipynb)

![Cover](https://raw.githubusercontent.com/crunchdao/competitions/refs/heads/master/competitions/broad-3/assets/cover.png)

In [1]:
%pip install --upgrade crunch-cli

Collecting crunch-cli
  Downloading crunch_cli-5.7.0-py3-none-any.whl.metadata (3.1 kB)
Collecting packaging>=24.2 (from crunch-cli)
  Downloading packaging-24.2-py3-none-any.whl.metadata (3.2 kB)
Collecting redbaron (from crunch-cli)
  Downloading redbaron-0.9.2-py2.py3-none-any.whl.metadata (15 kB)
Collecting baron>=0.7 (from redbaron->crunch-cli)
  Downloading baron-0.10.1-py2.py3-none-any.whl.metadata (16 kB)
Collecting rply (from baron>=0.7->redbaron->crunch-cli)
  Downloading rply-0.7.8-py2.py3-none-any.whl.metadata (4.2 kB)
Collecting appdirs (from rply->baron>=0.7->redbaron->crunch-cli)
  Downloading appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Downloading crunch_cli-5.7.0-py3-none-any.whl (103 kB)
Downloading packaging-24.2-py3-none-any.whl (65 kB)

Get a new token: https://hub.crunchdao.com/competitions/broad-3/submit/via/notebook

In [6]:
!crunch setup --notebook broad-3 random --token nxUyCVZJwD8up77ZLXhd1ojtMYulv4kAIzhKJzrnuOTwIevqzBc5Eja5WAKgeQ3B

# To retrieve a larger dataset, include the --size large argument as shown below:
#!crunch setup --notebook --size large broad-3 hello --token aaaabbbbccccddddeeeeffff


---
Your token seems to have expired or is invalid.

Please follow this link to copy and paste your new setup command:
https://hub.crunchdao.com/competitions/broad-3/submit

If you think that is an error, please contact an administrator.


In [None]:
!pip install spatialdata

In [1]:
import spatialdata
import scanpy
import numpy
import pandas
import os
from skimage import io

In [2]:
import crunch
crunch = crunch.load_notebook()

loaded inline runner with module: <module '__main__'>


In [3]:
# The train function is where you build and train the model that will later run inference on the test data.
# The trained model must be saved in `model_directory_path`.
def train(
    data_directory_path: str, 
    model_directory_path: str
):    
    # Loading scRNAseq data
    # The single-cell RNA sequencing (scRNA-seq) data provides gene expression data 
    # for 18,615 protein-coding genes from colon tissue samples with and without dysplasia.
    scRNAseq = scanpy.read_h5ad(os.path.join(data_directory_path, 'Crunch3_scRNAseq.h5ad'))
    
    # Loading Spatial Data
    # UC9_I.zarr contains the H&E image of noncancerous mucosa (already provided in Crunch 1 and Crunch 2)
    sdata = spatialdata.read_zarr(os.path.join(data_directory_path, 'UC9_I.zarr'))
    
    # Load dysplasia-related files
    # These files include:
    # - HE: An H&E image of tissue regions exhibiting dysplasia
    # - HE_nuc: A nuclear segmentation mask
    # - region: An ROI mask indicating dysplastic vs. non-dysplastic regions of the tissue
    #
    # Using these images, you can extract additional spatial features and labels that may
    # be relevant for training or evaluating your model.      
    dysplasia_file = {
        # H&E image of tissue with dysplasia
        'HE': os.path.join(data_directory_path, 'UC9_I-crunch3-HE.tif'), 
        # Nucleus segmentation of H&E image
        'HE_nuc': os.path.join(data_directory_path, 'UC9_I-crunch3-HE-label-stardist.tif'),
        # Regions in H&E image highlighting dysplasia and non-dysplasia
        'region': os.path.join(data_directory_path, 'UC9_I-crunch3-HE-dysplasia-ROI.tif')
    }     
        
    # Read the dysplasia-related images and store them in a dictionary
    dysplasia_img_list = {}
    for key in dysplasia_file:
        dysplasia_img_list[key] = io.imread(dysplasia_file[key])
        
    # TODO Add your training code here and save the trained model to the specified model_directory_path.
    
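The TODO in `train` asks for the trained model to be saved under `model_directory_path`. The following is a minimal sketch of one way to do that, assuming the "model" is just a dictionary of per-gene scores. The helper names `save_gene_scores` and `load_gene_scores`, the file name `gene_scores.json`, and the example gene names and scores are all hypothetical and not part of the competition API.

```python
import json
import os
import tempfile

# Hypothetical file name; the competition does not prescribe one.
SCORES_FILE_NAME = "gene_scores.json"


def save_gene_scores(model_directory_path: str, gene_scores: dict) -> str:
    """Write per-gene scores into the model directory and return the file path."""
    os.makedirs(model_directory_path, exist_ok=True)
    scores_path = os.path.join(model_directory_path, SCORES_FILE_NAME)
    with open(scores_path, "w") as handle:
        json.dump(gene_scores, handle)
    return scores_path


def load_gene_scores(model_directory_path: str) -> dict:
    """Read back the scores written by save_gene_scores."""
    scores_path = os.path.join(model_directory_path, SCORES_FILE_NAME)
    with open(scores_path) as handle:
        return json.load(handle)


# Round trip with made-up scores in a temporary directory.
with tempfile.TemporaryDirectory() as temporary_directory:
    save_gene_scores(temporary_directory, {"GENE_A": 0.9, "GENE_B": 0.1})
    restored = load_gene_scores(temporary_directory)
```

In a real submission, `save_gene_scores` would be called at the end of `train` with the directory passed in as `model_directory_path`. A heavier model would likely need a different serialization format than JSON.
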

In [4]:
# In the inference function, the trained model is loaded and used to make inferences on a
# sample of data that matches the characteristics of the training set.
def infer(
    data_file_path: str,
):
    data_path = os.path.dirname(data_file_path)
    
    # Load the list of genes to predict                 
    gene_list = pandas.read_csv(os.path.join(data_path, 'Crunch3_gene_list.csv'))        
    gene_names = gene_list['gene_symbols']
    
    # The intended goal is to rank all 18,615 protein-coding genes based on their ability 
    # to distinguish dysplasia from noncancerous mucosa regions, assigning them ranks 
    # from 1 (best discriminator) to 18,615 (worst).
    #
    # Currently, we generate a random permutation of gene names as a placeholder.
    # Replace the logic below with actual model inference:
    # 1. Load the trained model from the model directory.
    # 2. Use the model to score and rank the genes accordingly.
    # 3. Return the predicted ranking as a DataFrame.
    
    prediction = pandas.DataFrame(
        numpy.random.permutation(gene_names), 
        index=numpy.arange(1, len(gene_names) + 1),
        columns=['Gene Name']
    )
       
    return prediction
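The random permutation above is only a placeholder. As a sketch of step 2 and step 3 in the comments, the snippet below turns per-gene scores into a ranking with the same layout as `prediction`: the index runs from 1 (best discriminator) upward and the single column is `'Gene Name'`. The function name `rank_genes_by_score` and the example genes and scores are hypothetical, and it assumes that a higher score means a better discriminator.

```python
import numpy
import pandas


def rank_genes_by_score(gene_names, scores):
    """Return a DataFrame of gene names ordered from highest to lowest score."""
    gene_names = numpy.asarray(gene_names)
    scores = numpy.asarray(scores, dtype=float)
    if gene_names.shape != scores.shape:
        raise ValueError("gene_names and scores must have the same length")

    # Negate the scores so that argsort yields descending order;
    # the stable sort keeps the input order for tied scores.
    order = numpy.argsort(-scores, kind="stable")

    return pandas.DataFrame(
        gene_names[order],
        index=numpy.arange(1, len(gene_names) + 1),
        columns=['Gene Name']
    )


# Made-up example: GENE_B has the highest score, so it gets rank 1.
example_ranking = rank_genes_by_score(
    ["GENE_A", "GENE_B", "GENE_C"],
    [0.2, 0.9, 0.5],
)
```

Inside `infer`, the `gene_names` series read from `Crunch3_gene_list.csv` could be passed as the first argument, with the scores coming from whatever model was saved during training.
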

In [9]:
# This command runs a local test of your submission
# to make sure that it can be accepted by the system
crunch.test(
    no_determinism_check=True,
)

20:54:32 no forbidden library found
20:54:32 
20:54:33 started
20:54:33 running local test
20:54:33 internet access isn't restricted, no check will be done
20:54:33 
20:54:33 starting spatial loop...
20:54:33 call: train


data/UC9_I-crunch3-HE-label-stardist.tif: download from https://crunchdao--competition--production.s3.eu-west-1.amazonaws.com/data-releases/98/UC9_I-crunch3-HE-label-stardist.tif (23778030 bytes)
data/UC9_I-crunch3-HE-label-stardist.tif: already exists, file length match
data/Crunch3_gene_list.csv: download from https://crunchdao--competition--production.s3.eu-west-1.amazonaws.com/data-releases/98/Crunch3_gene_list.csv (122177 bytes)
data/Crunch3_gene_list.csv: already exists, file length match
data/Crunch3_scRNAseq.h5ad.zip: download from https://crunchdao--competition--production.s3.eu-west-1.amazonaws.com/data-releases/98/Crunch3_scRNAseq.h5ad.zip (924652673 bytes)
data/Crunch3_scRNAseq.h5ad.zip: already exists, file length match
data/Crunch3_scRNAseq.h5ad.zip: already uncompressed, marker is present
data/UC9_I.zarr.zip: download from https://crunchdao--competition--production.s3.eu-west-1.amazonaws.com/data-releases/98/UC9_I.zarr.zip (927207232 bytes)
data/UC9_I.zarr.zip: already exists, f

  compressor, fill_value = _kwargs_compat(compressor, fill_value, kwargs)
20:54:49 save prediction - path=data/prediction.parquet
20:54:49 duration - time=00:00:16
20:54:49 memory - before="1.04 GB" after="1.63 GB" consumed="592.95 MB"


FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/fn/mtvxtck11kzdss39fl63ffdr0000gn/T/tmpirvnxd_4/file.parquet'

Now remember to download this notebook and then submit it at https://hub.crunchdao.com/competitions/broad-3/submit/