
User_Building an Ingestion Program for a Competition

Adrien Pavão edited this page Feb 2, 2023 · 20 revisions

Ingestion Program

On the left (blue) is the organizer, who supplies an ingestion program and a scoring program. On the right (red) is the participant, who supplies results or code (or even data). For a simple use case, see the Iris competition.

What is an ingestion program?

An ingestion program is a piece of code that is executed when a challenge participant makes a submission. It "ingests" the submission and executes whatever is needed to process it. There are several possible use cases:

  1. Parsing the submission and deciding how to process it, e.g. the organizers may allow the participants to submit either results, code or data.
  2. Allowing submission of source code or libraries. The ingestion program can then call functions supplied by the participants and execute them on the input data. Advantage: the input data is read by the same reader for everybody, so the participants are not penalized if their code fails to read the data and/or takes a long time to read it. Other advantage: the organizers can run cross-validation experiments with no possibility of cheating by the participants.
  3. Allowing time series predictions, active learning or query learning. The ingestion program can "serve" data on demand to the code supplied by the participants. In fact, the ingestion program can even generate artificial data, if needed!

However, you may not need one. Read on:

Result submission challenges

If you are organizing a challenge with RESULT submission (no participant-supplied code executed on the challenge platform), you should not supply an ingestion program. The participants should submit a zip file with prediction results and NO metadata file.

Code submission challenges

Submission of executables

If your participants must supply executables, you do not necessarily need to supply an ingestion program. The challenge platform will execute any submission that comes with a metadata file. This file needs to include the command to be executed, e.g.:

command: python $program/run.py $input $output
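
As an illustration of how such a command line could be turned into something runnable, here is a minimal sketch that substitutes directory placeholders such as $program, $input and $output with concrete paths. The function name `expand_command` and the example paths are hypothetical; how the platform performs this substitution internally is not described here.

```python
import shlex
from string import Template


def expand_command(metadata_text, directories):
    """Return the argv list for the `command:` line of a metadata file.

    `directories` maps placeholder names (without the `$`) to paths.
    This is an illustrative helper, not the platform's own parser.
    """
    for line in metadata_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "command":
            # Replace $program, $input, ... with concrete directory paths.
            expanded = Template(value.strip()).substitute(directories)
            return shlex.split(expanded)
    raise ValueError("metadata has no 'command:' line")


argv = expand_command(
    "command: python $program/run.py $input $output",
    {"program": "/tmp/program", "input": "/tmp/input", "output": "/tmp/output"},
)
```

With the example paths above, `argv` is a list that could be passed to `subprocess.run`.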

Submission of libraries or source code

If you ask the participants to supply code that is NOT executable, for instance a Python class in a file model.py, you must supply a so-called "Ingestion Program". Your ingestion program reads the data and the participants' submissions, then calls that class to train and test the predictive model. We provide an example of an ingestion program for the Iris challenge.
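
A minimal sketch of what such an ingestion program might do is given below. It assumes that the submitted model.py defines a class named `Model` with `fit` and `predict` methods, and that predictions go to a file named predictions.txt; these names are assumptions made for illustration and are not taken from the Iris example.

```python
import importlib.util
import os


def load_submission_module(submission_dir, filename="model.py"):
    """Import the participant's non-executable module from its directory."""
    path = os.path.join(submission_dir, filename)
    spec = importlib.util.spec_from_file_location("participant_model", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def run_ingestion(submission_dir, X_train, y_train, X_test, output_dir):
    """Train and test the participant's model, then write its predictions."""
    module = load_submission_module(submission_dir)
    model = module.Model()               # assumed class name
    model.fit(X_train, y_train)          # assumed method
    predictions = model.predict(X_test)  # assumed method
    os.makedirs(output_dir, exist_ok=True)
    out_path = os.path.join(output_dir, "predictions.txt")  # hypothetical file name
    with open(out_path, "w") as f:
        for p in predictions:
            f.write(f"{p}\n")
    return out_path
```

Because the ingestion program does the data reading itself, every submission sees the data through the same reader, which is the advantage described earlier.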

Execution priority

The following logic is implemented:

  • If participant submission has no "metadata" file:
    • treat the submission as a result submission and forward it to the scoring program
  • else: # treat the submission as a code submission
    • If the organizers did NOT provide an ingestion program:
      • execute the code submission of the participants (according to the command in its metadata file)
    • else: # organizer-supplied ingestion program
      • execute the ingestion program (via its metadata command)
      • execute simultaneously the code submission of the participants (if there is a command in its metadata file)
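
The decision logic above can be sketched as a small Python function. The function name, its parameters and the step descriptions are hypothetical; the sketch only mirrors the branching and does not launch anything.

```python
def plan_execution(submission_has_metadata, submission_command=None,
                   ingestion_command=None):
    """Return the steps the platform would take, following the rules above."""
    if not submission_has_metadata:
        # No metadata file: treat the submission as results.
        return ["forward submission to scoring program as results"]
    if ingestion_command is None:
        # Code submission, no organizer-supplied ingestion program.
        return [f"run submission: {submission_command}"]
    # Organizer-supplied ingestion program.
    steps = [f"run ingestion: {ingestion_command}"]
    if submission_command:
        steps.append(f"run submission simultaneously: {submission_command}")
    return steps
```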

If an ingestion program is supplied by the organizers and the code of the participants is executable, both programs run simultaneously and can exchange data (input data from the ingestion program to the participant's program, and results the other way around). This exchange happens via the $shared directory.

This feature can help implement competitions in which data is not provided all at once to the code of the participants. This includes implementing:

  • cross-validation
  • time series prediction
  • on-line learning
  • active or query learning
  • iterative experimental design
  • reinforcement learning
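
As a rough sketch of how data could be served step by step through a shared directory, the example below simulates both sides with two threads and a temporary directory. The file-naming protocol (data_N.json, pred_N.json, done.json) is entirely hypothetical; the platform only provides the directory, and the organizers define the exchange format.

```python
import json
import os
import tempfile
import threading
import time


def write_json_atomic(path, payload):
    """Write to a temporary name, then rename, so readers never see a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(payload, f)
    os.replace(tmp, path)


def wait_for_json(path, timeout=10.0, poll=0.01):
    """Poll until `path` exists, then return its parsed content."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() > deadline:
            raise TimeoutError(path)
        time.sleep(poll)
    with open(path) as f:
        return json.load(f)


def ingestion_side(shared, series, results):
    """Serve one observation at a time and collect the participant's predictions."""
    for step, value in enumerate(series):
        write_json_atomic(os.path.join(shared, f"data_{step}.json"), {"value": value})
        reply = wait_for_json(os.path.join(shared, f"pred_{step}.json"))
        results.append(reply["prediction"])
    # Signal that no more data will be served.
    write_json_atomic(os.path.join(shared, "done.json"), {})


def participant_side(shared, poll=0.01):
    """Toy participant: predict that the next value equals the last one seen."""
    step = 0
    while True:
        data_path = os.path.join(shared, f"data_{step}.json")
        if os.path.exists(data_path):
            with open(data_path) as f:
                value = json.load(f)["value"]
            write_json_atomic(os.path.join(shared, f"pred_{step}.json"),
                              {"prediction": value})
            step += 1
        elif os.path.exists(os.path.join(shared, "done.json")):
            return
        else:
            time.sleep(poll)


def demo():
    """Run both sides against a temporary directory standing in for $shared."""
    shared = tempfile.mkdtemp()
    results = []
    participant = threading.Thread(target=participant_side, args=(shared,), daemon=True)
    participant.start()
    ingestion_side(shared, [1.0, 2.0, 3.0], results)
    participant.join(timeout=5)
    return results
```

The participant never sees observation N+1 before it has answered for observation N, which is the property that time series prediction and the other settings listed above rely on.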

Arguments

The following arguments are available to the various programs. All arguments are DIRECTORIES.

ingestion program:

command: python $ingestion_program/test.py $ingestion_program $input $output $hidden $shared $submission_program
  • $ingestion_program directory where the ingestion program is located.
  • $input input data directory.
  • $output output directory (where predictions are written).
  • $hidden reference data directory.
  • $shared directory shared with the participant's code (which is executed simultaneously).
  • $submission_program directory of the code being run; during the scoring phase, this is the scoring program.
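
Inside the ingestion program, these positional arguments could be read as follows. This is a minimal sketch: the `IngestionDirs` tuple and the `parse_ingestion_args` helper are hypothetical names, and the argument order simply follows the command shown above.

```python
from collections import namedtuple

# Field order matches the command line shown above.
IngestionDirs = namedtuple(
    "IngestionDirs",
    ["ingestion_program", "input", "output", "hidden", "shared", "submission_program"],
)


def parse_ingestion_args(argv):
    """Map the six positional directory arguments to named fields."""
    args = argv[1:]  # argv[0] is the script path
    if len(args) != len(IngestionDirs._fields):
        raise ValueError(
            f"expected {len(IngestionDirs._fields)} directories, got {len(args)}"
        )
    return IngestionDirs(*args)


# In a real script this would be parse_ingestion_args(sys.argv).
dirs = parse_ingestion_args(
    ["test.py", "/ing", "/in", "/out", "/hid", "/shared", "/sub"]
)
```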

participant's submission:

command: python $program/code.py $program $input $output $shared $submission_program
  • $program directory of the submitted code.
  • $input input data directory.
  • $output output directory (where predictions are written).
  • $shared directory shared with the ingestion program (which is executed simultaneously).
  • $submission_program directory of the code submitted by the participants.

scoring program:

command: python $program/score.py $input $output
  • $program directory of the scoring program.
  • $input input data directory. It contains 2 subdirectories ref/ and res/ containing the solutions and the predictions respectively.
  • $output output directory (where scores are written).
  • $hidden hidden reference data directory, only available if ingestion is run during the scoring program.
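
A scoring program using the ref/ and res/ layout described above might look like the following sketch. The file names solution.txt, predictions.txt and scores.txt, the one-label-per-line format and the accuracy metric are all assumptions made for illustration.

```python
import os


def read_labels(path):
    """Read one label per line, ignoring blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def score(input_dir, output_dir):
    """Compare res/ predictions with ref/ solutions and write the accuracy."""
    truth = read_labels(os.path.join(input_dir, "ref", "solution.txt"))     # hypothetical name
    preds = read_labels(os.path.join(input_dir, "res", "predictions.txt"))  # hypothetical name
    if not truth or len(truth) != len(preds):
        raise ValueError("solutions and predictions must be non-empty and of equal length")
    accuracy = sum(t == p for t, p in zip(truth, preds)) / len(truth)
    os.makedirs(output_dir, exist_ok=True)
    # Assumed output file name and "key: value" format.
    with open(os.path.join(output_dir, "scores.txt"), "w") as f:
        f.write(f"accuracy: {accuracy}\n")
    return accuracy
```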

A simple test example is provided in the Yellow World competition.
