Conversation

@Gautzilla (Contributor) commented on Sep 22, 2025

🐳 What's new?

This PR aims to lighten the job module in order to make it easier to configure the servers created by each task.
It also makes it possible to export Core API datasets on datarmor through jobs.

🐳 How to use it?

🐬 Public API

Simply set the requested server config in a JobConfig instance, and attach a JobBuilder with the specified config to the public API Dataset:

import os

from pandas import Timedelta

from osekit.utils.job import JobConfig, JobBuilder
from osekit.public_api.dataset import Dataset

dataset = Dataset(...) # See the Dataset documentation

job_config = JobConfig(
    nb_nodes=1, # Number of nodes on which the job runs
    ncpus=28, # Number of total cores used per node
    mem="60gb", # Maximum amount of physical memory used by the job
    walltime=Timedelta(hours=5), # Maximum amount of real time during which the job can be running
    venv_name=os.environ["CONDA_DEFAULT_ENV"], # Works only for conda venvs
    queue="omp" # Queue in which the job will be submitted
)

dataset.job_builder = JobBuilder(
    config=job_config,
)

# Now the dataset has a non-None job_builder attribute,
# running an analysis will write a PBS file in the logs directory
# and submit it to the requested queue.

dataset.run_analysis(...) # See the Analysis documentation
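If no JobBuilder is attached, the analysis should instead run locally (this assumes job_builder defaults to None, as the comment above suggests). A minimal sketch under that assumption:

# Assumption: leaving job_builder to its default (None) makes
# run_analysis execute locally instead of writing and submitting a PBS file.
dataset.job_builder = None
dataset.run_analysis(...) # Runs in the current process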

🐬 Core API

To export Core API datasets through jobs, the Job instances must be created manually, with the path to the export_analysis script and the list of arguments passed to that script.

Here is the example from the doc:

import os
from pathlib import Path

from pandas import Timedelta

from osekit.core_api.spectro_dataset import SpectroDataset
from osekit.core_api.audio_dataset import AudioDataset
from osekit.utils.job import JobConfig, Job

# Some Public API imports are required
from osekit.public_api.analysis import AnalysisType
from osekit.public_api import export_analysis

ads = AudioDataset(...) # See the AudioDataset doc
sds = SpectroDataset(...) # See the SpectroDataset doc

# We must specify the folders in which the files will be exported.
# This is an example with both audio and spectro exports.
ads.folder = Path(...)
sds.folder = Path(...)

# Datasets must be serialized
ads.write_json(ads.folder/"output")
sds.write_json(sds.folder/"output")

# Export specifications
# All parameters are listed in this example, but every parameter other than analysis has a default value
args = {
    "analysis": (AnalysisType.AUDIO|AnalysisType.SPECTROGRAM).value,
    "ads-json": ads.foler/"output"/f"{ads.name}.json",
    "sds-json": sds.foler/"output"/f"{sds.name}.json",
    "subtype": "FLOAT",
    "matrix-folder-path": "None", # Folder in which npz matrices are exported
    "spectrogram-folder-path": sds.folder/"output", # Folder in which png spectrograms are exported
    "welch-folder-path": "None",  # Folder in which npz welch matrices are exported
    "first": 0, # First data of the dataset to be exported
    "last": -1, # Last data of the dataset to be exported
    "downsampling-quality": "HQ",
    "upsampling-quality": "VHQ",
    "umask": 0o022,
    "tqdm-disable": "False", # Disable TQDM progress bars
    "multiprocessing": "True",
    "nb-processes": "None",  # Should be a string. "None" uses the max number of processes, otherwise e.g. "3" will use 3.
    "use-logging-setup": "True", # Call osekit.setup_logging() before exporting the dataset.
}

# Job and server configuration
job_config = JobConfig(
    nb_nodes=1,
    ncpus=28,
    mem="60gb",
    walltime=Timedelta(hours=1),
    venv_name=os.environ["CONDA_DEFAULT_ENV"],
    queue="omp"
)

job = Job(
    script_path=Path(export_analysis.__file__),
    script_args=args,
    config=job_config,
    name="test_job_core",
    output_folder=Path(...), # Path in which the .out and .err files are written
)

# Write the PBS file and submit the job
job.write_pbs(Path(...) / f"{job.name}.pbs")
job.submit_pbs()
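Since the script arguments expose first and last indices, one possible pattern (a sketch built on the API above, not something documented in this PR) is to split an export into several jobs, each handling a slice of the dataset:

# Sketch: submit one job per slice of the dataset.
# The slice bounds below are arbitrary examples; "first" and "last"
# are the data indices described in the args dict above.
slices = [(0, 499), (500, 999)]
for i, (first, last) in enumerate(slices):
    chunk_args = {**args, "first": first, "last": last}
    chunk_job = Job(
        script_path=Path(export_analysis.__file__),
        script_args=chunk_args,
        config=job_config,
        name=f"test_job_core_{i}",
        output_folder=Path(...), # Same kind of output folder as above
    )
    chunk_job.write_pbs(Path(...) / f"{chunk_job.name}.pbs")
    chunk_job.submit_pbs()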

🐡 Outro

I'm not sure the documentation is clear enough; please tell me if this is too clumsy to use!


@Gautzilla self-assigned this on Sep 22, 2025
@Gautzilla added the "job monitoring" label (Work related to monitoring of the jobs) on Sep 22, 2025
@Gautzilla changed the title from [DRAFT] Job rework to Job rework on Oct 31, 2025
@Gautzilla marked this pull request as ready for review on Oct 31, 2025
@mathieudpnt (Contributor) left a comment

The default value of last in script_args from the Job instance needs to be changed.

@Gautzilla (Contributor, Author) commented

> The default value of last in script_args from the Job instance needs to be changed.

It should be solved with the last commit; do I have your green light?

@mathieudpnt self-requested a review on Nov 18, 2025
@mathieudpnt previously approved these changes on Nov 18, 2025
@mathieudpnt (Contributor) left a comment

Looking good!

@mathieudpnt self-requested a review on Nov 18, 2025
@mathieudpnt merged commit 85f3c75 into Project-OSmOSE:main on Nov 18, 2025 (2 checks passed)
@Gautzilla deleted the job-rework branch on Nov 18, 2025