Code style guide
Maksim Buslovaev edited this page Feb 12, 2025 · 1 revision
Moving from a focus on research to a focus on operation means shifting from writing code to reading it.
The code in a Jupyter notebook is a language of communication, so everything in it should be clear down to the last word. Our notebooks (substeps) are above all a means of communication: they are the basis of code transferability, a kind of living knowledge about the subject area, and the best documentation. These requirements do not apply to utils.
Example. If a reader sees the name of a function or variable in a notebook and cannot understand its meaning and purpose within one minute, the code is unfit.
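A hypothetical illustration of the one-minute test: both functions below do the same thing, but only the second can be understood from its names alone. The function names, the detection tuples and the threshold are assumptions made for this sketch and are not part of SinaraML.

```python
# Hypothetical sketch: every name and value here is made up for illustration.

# Unfit: the reader cannot tell what "proc", "d" or "t" stand for.
def proc(d, t):
    return [x for x in d if x[1] >= t]


# Fit: each name states its meaning and purpose.
def select_confident_detections(scored_detections, min_confidence_score):
    return [
        (label, confidence_score)
        for label, confidence_score in scored_detections
        if confidence_score >= min_confidence_score
    ]


# Uppercase name for a parameter, specific names for everything else.
MIN_CONFIDENCE_SCORE = 0.5
val_scored_detections = [("cat", 0.91), ("dog", 0.42), ("car", 0.77)]

val_confident_detections = select_confident_detections(val_scored_detections, MIN_CONFIDENCE_SCORE)
print(val_confident_detections)  # prints [('cat', 0.91), ('car', 0.77)]
```

The two functions return identical results; the difference is only in how long a new reader needs to understand the cell.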
- Make a notebook simple and expressive.
- Use snake case for variables and functions and upper camel case for classes. Use uppercase names for parameters.
- Notebook names should be verbs.
- Put all notebook-wide imports into a dedicated imports cell in the substep, located directly under the cell with substep.interface. If an import is used in only one cell, it may stay in that cell. Python imports from the Sinara library remain at their place of use. Print library versions in the same cell as the corresponding imports.
- Example. import os is widely used and thus will go to the imports cell.
- Example. from sinara.spark import Spark stays at its place of use, as it does in the examples.
- Prefer long names to short and obscure ones. Abbreviations should be used only for well-known cases.
- Import classes from well-known libraries under Python aliases with a library-specific prefix (the as clause). The alias may also be a prefix followed by an underscore.
- Example.
from mmengine.config import Config as MmengineConfig
- Example.
from mmengine.runner import set_random_seed as mm_set_random_seed
- Use substep.interface to transfer data (entities) inside a step (between substeps) and between steps.
- Example. Entities that look like a config file with parameters are suspicious. Think twice: this is probably an anti-pattern.
- Example. Transfer data between substeps inside a step via tmp_inputs/tmp_outputs. To work with temporary data inside a substep, use tmp_entities.
- Use immutable entities in the interface, and keep pipeline_params, step_params, substep_params and other parameters and variables immutable across steps and substeps. Strive not to change them after the first initialization. Ideally, a variable is changed only within a single cell and never afterwards.
- Example. It is forbidden to drag one entity through the pipeline and change it repeatedly. This is an anti-pattern.
- Declare variables where they are used.
- Avoid creating new terms and entities (with the exception of Sinara entities). If a new term is really needed, take it from the official SinaraML glossary or from the glossary of the framework libraries in use when naming and describing code and variables.
- Example. Self-made concepts such as external/internal storage are prohibited. Use SinaraStorage and Sinara tmp_entities instead.
- All variables must be unambiguous. We minimize the number of terms in the code and don't call the same entity by different names.
- Example 1. Instead of an abstract dataset, name the specific dataset, for example train_coco_dataset.
- Example 2. For mmcv, do not use the bare word config; use a name such as mmcv_config.
- Example 3. Variable names such as data_url, data, config and url are forbidden.
- Example 4. Do we use inference instead of predict in a notebook? Yes. A function named predict is still fine, and predict can also be replaced with detect.
- Don't use the term model, as it is usually ambiguous.
- Example. When dealing with weights, call them weights rather than model. Depending on context, use an exact term such as obj_detector, model_service and so on.
- Avoid spaghetti code. To understand (or safely change) one cell of spaghetti code, you have to read (or change) the entire substep, the entire step or the entire pipeline.
- Move code that does not need to be edited during ML pipeline operation (releasing new versions of the model services) to utils.
- Example. Function checking_CUDA_is_avaiable goes to utils.
- Use a standalone notebook (not included as substep in step_params.json) for serious troubleshooting.
- Prefer linear code in notebooks (avoid loops and conditions).
- Keep lines no longer than 120 characters. Split a longer line into several lines.
- Assign a computed path to a variable before passing it into a more complex expression, for example when working with os.path.join, shutil.copy and so on. Use non-obviousness as the criterion: the line is longer than 90 characters and there are nested operations inside the brackets (join, concatenation).
- Example.
src_test_image_file_name = osp.join(
    mmengine_cfg.val_dataloader.dataset.data_prefix.img,
    val_coco_annotations["images"][0]["file_name"],
)
assert osp.exists(src_test_image_file_name)
test_image_file_extension = Path(src_test_image_file_name).suffix
dst_test_image_file_name = osp.join(tmp_entities.obj_detect_inference_files, f"test{test_image_file_extension}")
shutil.copy(src_test_image_file_name, dst_test_image_file_name)
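The path rule above can also be tried as a self-contained sketch that needs only the standard library. The temporary directories, the file name sample.jpg and the placeholder file contents are assumptions for illustration; in a real substep the directories would come from Sinara entities such as tmp_entities rather than from tempfile.

```python
import os.path as osp
import shutil
import tempfile
from pathlib import Path

# Hypothetical stand-ins for directories that a substep would normally provide.
src_image_dir = tempfile.mkdtemp(prefix="src_images_")
dst_inference_dir = tempfile.mkdtemp(prefix="inference_files_")

# Create a small placeholder file so that the copy below has something to work on.
src_test_image_file_name = osp.join(src_image_dir, "sample.jpg")
Path(src_test_image_file_name).write_bytes(b"placeholder image bytes")
assert osp.exists(src_test_image_file_name)

# Compute each path in its own named variable first ...
test_image_file_extension = Path(src_test_image_file_name).suffix
dst_test_image_file_name = osp.join(dst_inference_dir, f"test{test_image_file_extension}")

# ... so that the final call stays short and obvious.
shutil.copy(src_test_image_file_name, dst_test_image_file_name)
print(osp.basename(dst_test_image_file_name))  # prints "test.jpg"
```

Each intermediate variable names one step of the path computation, so the shutil.copy call can be read without unpacking nested brackets.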