# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## 1.0.0 - 2024-10-14

### Added

- Configurable builder timeout through the env var `IMAGE_BUILD_TIMEOUT` (#992)
- Check if a function is linked with compute plans (through the compute tasks) before building. If all compute plans have been cancelled or have failed, cancel the function. (#997)
- Django commands `sync_users`, `sync_incoming_organizations` and `sync_outgoing_organizations` that can add, update and remove the corresponding entities. (#999)
- Celery soft timeout on `SaveImageTask` and env variable `IMAGE_SAVING_TIMEOUT_SECONDS` to customize the timeout (#1003)
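
The timeout knobs above are plain environment variables. A minimal sketch of how such a value might be read with a fallback default (the helper name and the default values are illustrative, not the backend's actual code):

```python
import os

# Illustrative helper: read an integer timeout from an env var,
# falling back to a default when the variable is unset.
# The variable names match the changelog; the defaults are made up.
def read_timeout(name: str, default: int) -> int:
    raw = os.environ.get(name)
    return int(raw) if raw is not None else default

build_timeout = read_timeout("IMAGE_BUILD_TIMEOUT", 3600)
saving_timeout = read_timeout("IMAGE_SAVING_TIMEOUT_SECONDS", 1800)
```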

### Changed

- Replaced `django-celery-results` by a connection to the existing Redis instance. (#995)
- Compute plans now start when the first compute task status transitions to building instead of executing. (#998)

### Removed

- `tqdm` dependency, only used in logging (#990)

## 0.48.0 - 2024-09-12

### Added

- Propagate function profiling step through the orchestrator (#886)
  - Create `FunctionProfilingStep` model in the API
  - Create `FunctionProfilingStep` in `orchestrator.resources` representing the different profiling steps for the function
  - Save in the local DB when receiving the event `ASSET_PROFILING_STEP`
- Views returning the function duration (#949)

### Changed

- `get_orchestrator_client` moved from `substrapp.orchestrator` to `orchestrator` (#886)
- Function images are only downloaded if they do not already exist on the local backend when a task is launched. (#934)
- Disable never expiring users `BearerToken` (#969)

### Fixed

- Casting when receiving `FunctionProfilingStep` from the orchestrator (#932)
- Improved resilience to docker image deletion (in particular when plugged in to an external registry). (#934)
- Function durations received from the orchestrator are now correctly saved as milliseconds instead of seconds (#949)
- `tqdm` and `python-dxf` versions have been pinned. (#957)
- Pin `pyopenssl` to version `24.1.0` to avoid deprecation warnings (#958)
- Celery was failing silently when a task called `FailableTask.on_failure` if the task didn't have a `logs` attribute (now returns an internal error) (#970)
- `OrganizationHttpError.status_code` is now always getting a value properly (#982)

### Removed

- `argh` dependency has been removed. (#957)

## 0.47.0 - 2024-06-11

### Fixed

- Repackage Harbor 412 errors to a user-friendly message. (#926)
- SSL connection errors to the registry are now retried. (#917)
- All exceptions raised during the build of an image are caught and properly handled. (#911)
- Bypass `AssetFailureReport` creation when the error type is `ERROR_TYPE_INTERNAL`. It fixes errors when the backend receives an internal error from the orchestrator (without log address/checksum, as they are not provided in this scenario) (#924)
- Errors when saving the image to the registry are properly handled. (#926)

### Changed

- Bubble up `BuildError` in the builder (#927)

## 0.46.0 - 2024-06-03

### Added

- Add privateCa to Kaniko pods (#877)
- Parametrize user docker repository for user docker images through an environment variable. (#888)
- Resources to Kaniko pods and computeTask pods (#892)
- Profiling step `download_function` (#1520)

### Changed

- Requests to get a Docker manifest now use manifests in version 2 (#862)
- The substra-backend docker image is now based on Ubuntu 23.10 (#866)
- `MinIO` is now the default S3 storage, instead of `localstack`. (#878)
- Dxf authenticator now uses the content of `dockerconfig` to log in (#862)

## 0.45.0 - 2024-03-27

### Changed

- BREAKING: remove `type` from `datamanager` (#860)
- [chore] `towncrier` is now used for changelog management (#861)

## 0.44.0 - 2024-03-07

### Added

- Ensure that we always use `safezip` and `tarsafe` to avoid path traversal attacks (#845)
- Add a timeout on all OIDC requests (value set with `OIDC_TIMOUT`, defaulting to `HTTP_CLIENT_TIMEOUT_SECONDS`) (#846)
- Check if the builder is enabled during function registration (#847)
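
Libraries such as `safezip` and `tarsafe` guard against archive members whose paths escape the extraction directory. A minimal sketch of that kind of check (illustrative only, not the libraries' actual implementation):

```python
import os

# Illustrative path-traversal check: an archive member is safe to extract
# only if its resolved path stays inside the target directory.
def is_within_directory(directory: str, member_path: str) -> bool:
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(os.path.join(directory, member_path))
    # The common path of the two must be the extraction directory itself.
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory
```

A member named `../../etc/passwd` resolves outside the target directory and is rejected by this check.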

### Changed

- Compute task status `DOING` is renamed `EXECUTING` (#827)
- Compute plan statuses `EMPTY`, `WAITING` and `TODO` are merged into the new status `CREATED` (#827)
- `worker` does not wait for the cache to be pre-populated to start (#831)
- Remove docker pull configuration from the `worker` used by kaniko (#831)

## 0.43.0 - 2024-02-26

### Added

- Add `Image checksum` and `Image storage address` to `Function` (#739)
- Add checks on the built `Image checksum` (#739)
- Field `asset_type` on `AssetFailureReport` (based on the protobuf enum `orchestrator.FailedAssetKind`) (#727)
- Celery task `FailableTask` that contains the logic to store the failure report, and that can be re-used for different assets. (#727)
- Add `FunctionStatus` enum (#714)
- BREAKING: Add `status` on `api.Function` (type `FunctionStatus`) (#714)
- Tests to ensure the build order follows the order of submission (including retries) (#740)
- BREAKING: Added compute task statuses `STATUS_WAITING_FOR_BUILDER_SLOT` and `STATUS_BUILDING` in `STATUS_WAITING_FOR_PARENT_TASKS` reflecting the current task status (#823)

### Changed

- Rename `function.function` to `function.archive` (#739)
- Refactor image transfer and add tests (#775)
- `ComputeTaskFailureReport` renamed to `AssetFailureReport` (#727)
- Field `AssetFailureReport.compute_task_key` renamed to `asset_key` (#727)
- Docker images are no longer removed at the end of a compute plan (#777)
- Rename `LEDGER_CHANNEL` to `CHANNELS` and `LEDGER_MSP_ID` to `MSP_ID` (#794)
- Propagate the creation of `AssetFailureReport` to all backends, so that logs are accessible from all backends (#807)
- BREAKING: Rename `STATUS_WAITING` and `STATUS_TODO` to `STATUS_WAITING_FOR_PARENT_TASKS` and `STATUS_WAITING_FOR_EXECUTOR_SLOT` (#823)
- Rename all instances of `COMPUTE_TASK_LOGS_STORAGE_ADDRESS` to `ASSET_LOGS_STORAGE_ADDRESS` and `COMPUTE_TASK_LOGS_BUCKET_NAME` to `ASSET_LOGS_BUCKET_NAME` (#825)

### Removed

- Remove mention of `chaincode` after `distributed mode` deprecation (#795)
- BREAKING: remove `distributed` Skaffold profile (#768)
- Remove `wait_for_image_built`, as the logic for changing the status to `TODO` has been moved to the orchestrator and is set only when the function is built (#819)

### Fixed

- Convert asset uuid to str in `FailedAssetLogsViewSet` (#804)
- Organisations which are not the function owner load the function image only once (#833)

## 0.42.2 - 2023-10-18

### Changed

- Update charts.

## 0.42.1 - 2023-10-06

### Added

- `three_orgs` Skaffold profile for standalone orchestrator (#733)

### Changed

- Pre-populate Skaffold deployment with the substra-tools base docker image (not minimal) (#736)
- `settings.DEBUG` is not `True` by default.

## 0.42.0 - 2023-09-08

### Changed

- Update to pydantic 2.3.0 (#718)

## 0.41.0 - 2023-09-07

### Added

- New `SECRET_KEY` optional environment variable (#671)
- `/api-token-auth/` and the associated tokens can now be disabled through the `EXPIRY_TOKEN_ENABLED` environment variable and the `server.allowImplicitLogin` chart value (#698)
- Tokens issued by `/api-token-auth/` can now be deleted like other API tokens, through a `DELETE` request on the `/active-api-tokens` endpoint (#698)
- A standalone Docker image for updating Private CA Certificates. (#726)

### Changed

- Increase the number of tasks displayable in the frontend workflow (#697)
- BREAKING: Change the format of many API responses from `{"message": ...}` to `{"detail": ...}` (#705)

### Removed

- BREAKING: `SECRET_KEY_PATH` and `SECRET_KEY_LOAD_AND_STORE` environment variables (#671)
- Removed the logic for storing `SECRET_KEY` at startup, in order to increase stability; it should be done at a higher level, i.e. the chart (#671)

### Fixed

- `/api-token-auth/` sometimes handing out tokens that are about to expire (#698)

## 0.40.0 - 2023-07-25

### Changed

- Remove pagination on `get_performances` to remove the limitation to the first 1000 points (#690)
- Update or create the step profiling, instead of raising an error if it already exists (#691)

### Added

- New UserAwaitingApproval (base user with no channel) (#680)

## 0.39.0 - 2023-06-27

### Added

- New `SECRET_KEY_PATH` and `SECRET_KEY_LOAD_AND_STORE` environment variables (#668)

## 0.38.0 - 2023-06-12

### Changed

- BREAKING: Support for multiple API tokens with expanded functionality (#639)
- BREAKING: database backend now defaults to `backend_{ORG_NAME}` rather than `substra` in all cases (this was already the production setting)
- `ORG_NAME` now defaults to `default` rather than being mandatory (this is always overridden in the chart)

### Removed

- References to `substra` cli commands in `localdev.md` (#667)

## 0.37.0 - 2023-05-11

### Changed

- Performance is now unique regarding a ComputeTaskOutput and a Metric (#634)
- BREAKING: `TaskProfiling` does not use `create_or_update` but retries with `PUT` if the error is `409` (#636)

### Fixed

- Raise a serializable Exception so that CeleryRetryError won't crash (#641)
- Do not retry on non-timeout build errors (#641)

### Removed

- Metric from Performance (#650)

## 0.36.1 - 2023-04-21

### Fixed

- Catch all exceptions in `get_pod_logs` and always return a string containing either the logs, or the reason we couldn't get them (#637)

### Removed

- `redis` dependency for `metric-exporter` (#640)
- Skaffold `monitoring` profile (#640)

### Changed

- Increase the `max_attempts` in `watch_log` to allow kaniko pods to take longer to start (#637)

### Added

- Add function name to workflow view & ordering on functionName in cpTasks list (#635)

## 0.36.0 - 2023-03-31

### Added

- Add filters to performances export (#590)
- Filter warnings in `pyproject.toml` to follow deprecation messages in `pkg_resources` (#612)
- Prefetch `function__inputs`, `function__outputs` in `ComputeTaskViewSet` (#613)
- Prefetch `inputs`, `outputs`, `inputs__asset`, `outputs__assets`, `function__inputs` and `function__outputs` in `CPTaskViewSet` (#613)
- Add `ComputeTaskWithDetailsSerializer` as a full-view serializer (including inputs and outputs) (#613)
- Prefetch `outputs` in `_PerformanceMetricSerializer` (#611)
- Index on `DataManager.channel` (#607)
- Prefetch on `DataManager.data_samples` (#607)
- OpenID Connect integration (#609)

### Fixed

- Order of `data_sample_key` in tests (#607)

### Changed

- BREAKING: rename Algo to Function (#573)
- BREAKING: List views of `ComputeTask` objects no longer include the `ComputeTaskInput` nor the `ComputeTaskOutput` objects (#613)
- Rename fields in the exported perf csv (#593)
- Durations in task profiling are formatted in microseconds instead of 'DD hh:mm:ss.uuuuuu' (#598)
- Loop through assets only once in `get_exec_command_args` (in `substrapp`) (#615)

### Removed

- Filter warnings in `pyproject.toml` for a previous deprecation warning in `rest_framework_simplejwt` (#612)
- Model `TaskDataSample` and fields `ComputeTask.data_samples` / `DataManager.compute_tasks` (#614)
- `data_samples_keys` in `_PerformanceComputeTaskSerializer` (#611)

## 0.35.1 - 2023-02-16

### Fixed

- `IncomingOrganization` password hashed twice when the hasher algorithm was updated (#592)

### Removed

- BREAKING: asset values in compute task inputs/outputs. (#509)

## 0.35.0 - 2023-02-06

### Added

- Contributing, contributors & code of conduct files (#563)
- Skaffold default example configuration (#570)

### Fixed

- Skaffold spawning instances in the `default` namespace (#574)

### Removed

- Test-only field for data samples (#551)
- BREAKING: failed task category from compute plan API response (#525)

## 0.34.1 - 2023-01-09

### Fixed

- Update or create the task profiling or step profiling, instead of raising an error if it already exists (#559)
- `parent_tasks` in `ComputeTask` (#554)

## 0.34.0 - 2022-12-19

### Changed

- Pass the rank of a task in a task properties dictionary, sent in a `--task-properties` argument (#548)
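
A task properties dictionary passed through a CLI flag is typically serialized as JSON. A hypothetical sketch of parsing such a `--task-properties` argument (the helper function and its default are assumptions for illustration, not the actual substra code):

```python
import argparse
import json

# Illustrative parser: decode a JSON dictionary passed on the command
# line as --task-properties '{"rank": 0}'. The flag name matches the
# changelog entry; everything else here is made up for the sketch.
def parse_task_properties(argv: list[str]) -> dict:
    parser = argparse.ArgumentParser()
    parser.add_argument("--task-properties", type=json.loads, default={})
    args = parser.parse_args(argv)
    return args.task_properties

props = parse_task_properties(["--task-properties", '{"rank": 0}'])
```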

## 0.33.0 - 2022-11-22

- Add `create` api from model view to submit compute task output.
- Add role filter to users list.
- Endpoints to list task input/output assets
- "Kind" filters on task input and output assets endpoints.
- Return train_data_sample_keys and test_data_sample_keys fields in data manager "list" API responses
- Extract inputs and outputs from db to build the workflow_graph response.
- Add compute task category unknown value
- Improved validation at user creation.
- Compute task outputs/inputs valid storage address.
- Signature cookie expiration date (#540)
- Algo creation events aren't included in newsfeed anymore.
- Remove task category from the compute engine.
- BREAKING: compute task specific endpoints.
- BREAKING: category related fields to create task.
- BREAKING: compute task specific data
- BREAKING: task category
- Container image build error logs are now saved to the DB and served via the API.
- Replace `algo` by `algo_key` in gRPC communication
- Allow configuring gRPC keepalive
- Output identifier added in the metric response object in the `compute_plan perf` view.
- Prevent use of `__` in asset metadata keys
- Task input asset
- Accept `CSRF_TRUSTED_ORIGINS` env var as a settings option
- Generic task endpoint to list/retrieve tasks
- Add inputs and output kinds in the workflow_graph response
- Bug in migration 0028_data_migration_compute_task_output.
- BREAKING: model categories
- BREAKING: algo categories
- Outdated information on patching RabbitMQ for Apple Silicon chips.
- Don't use the `status` field of the compute plan protocol buffer messages.
- BREAKING: remove `delete_intermediary_models` field from the `compute_plan` view.
- Use task output asset in API response
- Add channel column to input/output tables.
- The test task uses the same CLI arguments format as the other tasks.
- Bug when executing compute tasks with too many data samples (command line length exceeding max.) The substra-tools arguments are now passed using a file.
- CI job to test forward migrations.
- BREAKING! Rename `localrep` app to `api`, see `UPGRADE.md` for migration.
- Accept `ALGO_UNKNOWN` as a valid algo category.
- Stop generating algo method to execute from task category as it is now passed within substra cli
- Expose a default value (`dev`) for the backend version on the `/info` endpoint
- BREAKING: allow registration of users with role from the API and reset password mechanism
- Add address and permissions for `inputs` of kind datamanager and model in the compute_tasks api response
- Synchronize compute task output assets into localrep
- Add `compute_task_key` filter on the model view.
- Enable gRPC keepalive
- Increase limit for tasks displayed in CP workflow graph to 1000
- Periodic celery task to queue pending tasks
- Local folder support
- Saving of tasks timings
- Add CP default status at creation
- Add method to update algo, compute_plan and data manager name
- BREAKING: Replace celery RabbitMQ by Redis
- `transient` field to task outputs
- Include tasks output models and performances in output fields in the task endpoint response
- Use gRPC stream to listen to events
- Update orchestrator protobuf definitions
- Pass output identifier when registering assets
- BREAKING: Orchestrator RabbitMQ connection to listen to events
- The /task/bulk_create/ endpoint now accepts the "inputs" field
- Category filter from /algos/ route
- Legacy compute task permission fields
- Removed invalid metric asset kind from newsfeed
- Add missing compute task outputs data migration
- CP Performance endpoint returns array of performances in "results" instead of dict
- Rule-based logic for compute task output permissions. Instead, permissions are now explicitly provided by the end-user
- The "out_trunk_model_permissions" field was removed from the /task/bulk_create/ endpoint. It is superseded by the "outputs" field.
- Scheduler worker now processes predict tasks too
- Compute plan status is now correctly set to CANCELED
- Local representation of task outputs
- Local representation of task inputs
- Compute tasks returned by the API now have the "outputs" field
- Compute tasks returned by the API now have the "inputs" field
- The /task/bulk_create/ endpoint now accepts the "outputs" field
- Compute plan ranks and round_idx list to cp perf view
- `python manage.py get_outgoing_organization` can now be used to check the inter-org connection.
- BREAKING: Removed the `metric_keys` property of test tasks in favor of the generic `algo` field.
- Execute predict tasks.
- Backend hostname to the organization view
- Compute plan workflow graph endpoint handles predict tasks
- Don't raise an error when retrying certain orchestrator requests - @grpc_retry
- Filtering on compute tasks metadata
- View to build task workflow graph
- arm64 stage in the dockerfile to install dependencies and build psycopg2 from source
- Filtering and ordering on duration in CPs and Tasks views
- deprecated metadata was used during event processing
- datamanagers' and metrics' storage_addresses in task responses
- CP localrep sync issues resulting in missing CP duration
- Removed the search parameter from API
- Return HTTP 413 "Payload Too Large" when the orchestrator gRPC payload exceeds max size
- failure_report KeyError when updating compute task (localrep sync)
- password displayed when running django commands to add users
- Filtering on compute plan metadata
- use tasks from event for task processing
- set task status to doing in compute task task
- Remove the `prepare_task` Celery task
- Delete metrics endpoints; use algo endpoints instead
- BREAKING: drop the s3 upload feature
- BREAKING: rename node to organization
- Duplicated datasamples
- Support for predict Algos
- Hardcode task outputs
- Cross asset filtering by key on asset views
- Health service for the event app
- (BREAKING) New register tasks endpoint
- Use assets embedded in events during sync
- Full resync: also resync metrics when syncing algos
- CP name properly included in newsfeed items
- Clean up `taskdir` before adding assets to it
- Only run `remove_directory_contents` when the directory exists
- Empty compute plan status
- Possibility to filter only important news in newsfeed view
- (BREAKING) Store enum values as string (impact ordering) - Model
- Within a CP, allow image builds to occur concurrently with another task's execution
- (BREAKING) Store enum values as string (impact ordering) - ComputePlan ComputeTask
- Add algo inputs and outputs
- Mount GPU manually in GKE to be able to share them
- ComputePlanMetadataViewSet to list all existing compute plans metadata
- Filtering on timestamp for newsfeed
- View to export performances as csv file download
- (BREAKING) Store enum values as string (impact ordering) - Algo
- (BREAKING) Add mandatory name field to compute plan
- Use the algo checksum as image tag
- Completed pod state in `kubernetes_utils.watch_pod()`
- Merge Metric and Algo view and model, see `UPGRADE.md` for migration
- (BREAKING) feat!: Get compute plan key from the request data
- Add django-filters on datasample view and define custom IN lookup filters for all views
- Filtering on permissions for algos and datamanagers
- (BREAKING) Pass named inputs/outputs to algos
- Set default pagination page size and max page size
- Use standard ModelViewSet to handle DataSample list and retrieve calls
- Newsfeed now returns items for metric, algo and datamanager creation
- DataSamples for Localrep ComputeTask are foreign keys
- Removed MELLODDY-specific code
- Filtering using django-filters
- Create ModelRep DB instance when registering model in the orchestrator
- End date for failed CP
- Prevent 301 redirects when downloading failure reports
- Add SETFCAP capability to kaniko image builder to avoid build issue on security.capability
- (BREAKING) Fix cancelled CP inconsistencies
- Cancel CP should return json
- Uuid substring collision made tests randomly fail
- Drop support for `DEBUG_QUICK_IMAGE`
- Set log level to DEBUG in dev
- Remove Orchestrator{Aggregate, CompositeTrain, Test, Train}TaskSerializer and use orchestrator client directly
- Handle error values in mapping functions
- Remove `single-snapshot` in kaniko build option
- Handle disabled model in resync by making address not mandatory
- Store computetasks logs address and owner in localrep
- Optimize computetask list queryset
- Full-text search in CP name
- Local representation of node assets
- Retrieve files to download permissions and storage address from localrep
- Add API endpoint to serve all performances of a given compute plan
- Extra ordering options for compute plans and tasks
- Full text search in all assets names and keys
- Make possible to start the backend-server without orchestrator connection available
- Use the substra-tools image 0.10.0 in the example yaml files
- Return 410 error for all attempts at file download in isolated mode
- Removed unused description files cache for remote assets
- Use standard ModelViewSet to handle Metric list and retrieve calls
- Use standard ModelViewSet to handle Algo list and retrieve calls
- Removed unused create_or_update_model method
- Use standard ModelViewSet to handle DataManager list and retrieve calls
- Use standard ModelViewSet to handle ComputeTask list and retrieve calls
- Use standard ModelViewSet to handle Model list and retrieve calls
- Add field compute plan name in NewsFeedViewSet
- Use standard ModelViewSet to handle ComputePlan list and retrieve calls
- Remove OrchestratorAlgoSerializer and use orchestrator client directly
- Remove OrchestratorDataManagerSerializer and use orchestrator client directly
- Disabled models do not expose an address
- Increase max page size default
- Remove OrchestratorDataSampleSerializer and OrchestratorDataSampleUpdateSerializer and use orchestrator client directly
- Remove OrchestratorMetricSerializer and use orchestrator client directly
- Remove OrchestratorModelSerializer
- Remove OrchestratorComputePlanSerializer and use orchestrator client directly
- Dev conf uses latest substra-tools image
- Disable model in localrep
- Compute CP dates after updating related tasks
- Compute task stays doing forever if saving the model raises an OSError
- Compute CP dates before updating CP status
- W340 null has no effect on ManyToManyField during migrations
- Clear the asset buffer when the disk is full
- Add API endpoint to serve failed compute task logs
- In the API, allow filtering events by timestamp
- Local representation of datamanager assets
- Add task category in news feed view
- Local representation of datasample assets
- Local representation of computeplan assets
- Local representation of computetask assets
- Add `ORCHESTRATOR_RABBITMQ_ACTIVITY_TIMEOUT` to restart the event app if there is no activity
- Exponential backoff of celery tasks retry
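
An exponential backoff schedule simply grows the retry delay with each attempt up to a cap. A generic sketch (the base, cap and function name are illustrative, not the actual Celery configuration used here):

```python
# Illustrative exponential backoff: the delay doubles with each retry
# attempt and is capped so retries never wait longer than `cap` seconds.
def backoff_delay(attempt: int, base: float = 2.0, cap: float = 300.0) -> float:
    return min(cap, base * (2 ** attempt))
```

With these illustrative defaults the schedule is 2s, 4s, 8s, 16s, ... flattening out at 300s.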
- Compute CP tasks count and status from localrep data
- Use localrep data in CPAlgoViewSet
- Local representation of performance assets
- Use localrep data in tasks views
- Use localrep performances in tasks views
- Localrep computeplan status field, that is synced when receiving computetask event update instead of in the computeplan view
- Compute and store cp dates and duration during sync
- Local representation of model assets
- Use localrep data in newsfeed view
- When resyncing the local representation, only fetch the latest events
- Accept datamanager events with missing logs_permissions in localsync
- Update datasamples protobuf. `register_datasamples` now returns the list of registered datasamples
- When resyncing the local representation, fetch events by page of 1000 instead of one-by-one
- Update computetask protobuf. `register_tasks` now returns the list of registered tasks
- Stream asset files directly from MinIO without loading them on disk
- compute_plan["failed_task"] is populated when syncing a failed task event in the event app instead of in the views
- Switch from aiopika to pika in the event app
- Cancel a compute task when its compute plan is not runnable
- Handle incomplete metadata in newsfeed
- Fix container image build lock, to prevent ImageEntrypoint concurrency issues
- Fix filters on datamanager list view
- Handle missing failed task for failed CP status
- Handle case of resync with no new events
- Corrupted asset buffer when asset download fails
- Possible race condition when deleting pod
- Use failure report owner to determine whether storage address is local
- Check that start_date and end_date are not `None` to compute the duration
- Assets filtering on enums
- Synchronize performances and models before updating the computetask status
- Compute CP status after computetasks resync
- Safe serializer should not break the db transaction
- The datamanager asset now has a `logs_permission` field
- Local representation of algo assets
  - `localrep` app with `Event` and `Algo`: migration, model and serializer
  - `localsync` module in the `events` app, using localrep models and serializers to load the orchestrator response and save events and algos metadata in DB
  - `resync` the local representation at the start of the event app
  - `sync_on_event_message` method to save the algos and events in the local representation when an event on the algo is received
- Local representation of metric assets
- Add a news feed endpoint to watch compute task event update
- Update `substrapp` algo viewset:
  - For `list` and `retrieve`, replace the orchestrator query by a `localrep` models queryset.
  - For `create`, create the `localrep` algo before synchronization, to be able to instantly (locally) query a newly created algo.
- News feed now handles incomplete event metadata
- Preserve order of parent tasks when registering a new task
- Memory leak in MinIO client
- Zombie compute pods (pods which are never deleted) in edge cases
- Missing timeout on event app orchestrator RabbitMQ connection
- Fixed task failure due to concurrent download of input assets
- When executing compute tasks, store Algos and Metrics in temporary folders instead of the Asset Buffer
- On compute task failure, send the type of the error that occurred to the orchestrator
- Remove validated field on Datasample, Algo, Metrics, Model and Datamanager models
- Update backend and metric export python dependencies
- Reorganize and rename exported archives
- Metrics support for the Django app behind the flag `ENABLE_METRICS`
- Limit file size upload to `DATA_UPLOAD_MAX_SIZE`
- Setting to run in "isolated" mode (when there is no backend data, only orchestrator data)
- Add `COMPUTE_POD_GKE_GPUS_LIMITS` setting to enable usage of GPU by the compute pod on GKE
- Add new route to list all algos of a compute plan
- Add cp start date, end date and duration
- Datasample upload by path is possible only from the `servermedias` volume
- Algo, metrics and datamanager are stored in MinIO instead of the medias volume
- Rename `RUN_AS_GROUP` setting to `COMPUTE_POD_RUN_AS_GROUP`.
- Rename `RUN_AS_USER` setting to `COMPUTE_POD_RUN_AS_USER`.
- Rename `FS_GROUP` setting to `COMPUTE_POD_FS_GROUP`.
- Do not openly expose the media directory
- Do not mount the serviceAccount token on the compute pod
- Switch log format to JSON
- `COMPUTE_REGISTRY` setting removed; you should provide the whole kaniko image name in `KANIKO_IMAGE`
- Properly prevent path traversal in archives and don't allow symbolic links
- Inject task extra information even if there are no query params
- Support for filters on compute plan sub routes
- `COMMON_HOST_DOMAIN` env variable now required to set the domain property of JWT cookies
- Models and Datasamples are now stored in MinIO (they were previously stored on the filesystem)
- Possibility to deploy multiple workers on different kubernetes nodes in order to use compute resources in parallel. See the charts CHANGELOG.md for more details
- post_delete signal upon model deletion to remove the model from MinIO storage
- Add task extra information related to start_date and end_date
- Serve the API behind a subpath with the `SUBPATH` env var
- Task data are now mounted on `/substra_internal` instead of `/sandbox`
- (BREAKING) Replace objective by metric
- (BREAKING) Multiple metrics and performances per test task
- Insert full data manager, metrics and parent tasks objects in tuple responses for retrieve calls
- Validate orchestrator connection on readiness and liveness api checks
- Set the local folder dynamically instead of leaving it to substra-tools default
- Fix trailing comma that turned versions at /info into lists
- Accept `UUID.hex` UUID as asset keys
- Trying to download a disabled model will now result in a consistent http 410 error code instead of an http 500 or http 404 error code
- Models exported to bucket are prefixed with their compute plan's ID
- Backend version built into the docker image and exposed in `/info`
- Orchestrator and chaincode version in `/info`
- Unified all 3 categories of algos in a single endpoint:
  - all algos are now served through `/algo` and `/algo/:key`
  - when creating a new algo, you must give a `category` property whose value is one of `ALGO_SIMPLE`, `ALGO_COMPOSITE` or `ALGO_AGGREGATE`
- Search objective by metrics with `/objective?search=objective:metrics_name:[METRIC_NAME]` instead of `/objective?search=objective:metrics:[METRIC_NAME]`
- Switched to structured logging
- Made `/info` semi-public: it returns some information for anonymous users and some more for authenticated ones
- Routes `/aggregate_algo`, `/aggregate_algo/:key`, `/composite_algo` and `/composite_algo/:key` (all algos are now served through `/algo` and `/algo/:key`)
- Asset filters on attributes from different assets, for example `GET /objective?search=traintuple:key:foo`. The composed filters that are removed are:
  - `/dataset?search=model:field_key:value`
  - `/dataset?search=objective:field_key:value`
  - `/algo?search=model:field_key:value`
  - `/objective?search=model:field_key:value`
  - `/objective?search=dataset:field_key:value`
  - `/model?search=algo:field_key:value`
  - `/model?search=dataset:field_key:value`
  - `/model?search=objective:field_key:value`
- Add routes to get a compute plan's compute tasks by type
- Compute-plan-less compute tasks all using the same lock key
- Asset buffer skipping some data samples
- API: Added Pagination for lists of asset
- Fix kaniko local directory cache for base images
- The backend is compatible with FL workflow again
- lock_resource was raising a FileNotFound exception in high concurrency scenarios
- Refactor views/datasamples.py
- The opener is now downloaded instead of being copied from disk
- Better use of shutil.copytree
- Add docker config secret name for kaniko builder.
- Add registry cleaning tasks.
- Use a single compute pod for all the tasks of a compute plan.
- Add two missing `__init__` files.
- Update python dependencies.
- Export models.
- Binding to service.port instead of 8000.
- Datasample order for metrics.
- Auto-allocate docker-registry node port.
- Bump django from 2.2.19 to 2.2.20 in /backend.
- Update cryptography to its latest release.
- Update django and django-celery-results.
- docker-registry default service value to nodePort
- Update grpcio
- Change local peer hostname to prevent issue from grpc client
- Fix JWT token blacklist at logout
- Add django shared cache to prevent issue in throttling
- Less permissive CORS & AllowHosts