SDK version: 2.32.0
When an ingest() worker fails to put a record, it logs an output message and continues working, e.g.:
Failed to ingest row 1147613: An error occurred (ValidationException) when calling the PutRecord operation: <output removed>
Failed to ingest row 432713: An error occurred (InternalFailure) when calling the PutRecord operation (reached max retries: 4): Internal server error. Please try again later.
Once all workers are done, the ingest() method will complete with an exception.
This creates the following problems:
- The return value isn't populated, which makes it impossible to access the list of failed rows in IngestionManagerPandas.failed_rows.
- If the ingestion process is run in a script, a SageMaker Processing job, etc., the job is marked as failed, unless you do something like:
try:
    feature_group.ingest(data_frame=data, max_workers=max_workers, wait=True)
except Exception:
    pass
IMHO, a nicer behavior would be for ingest() to catch the exception, return the identifiers of the failed rows, and let the caller decide what to do next. In some use cases, ingestion errors aren't serious enough to justify failing the job and any associated workflow.
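
To make the proposal concrete, here is a rough, self-contained sketch of the behavior I have in mind. It does not use the SageMaker SDK at all: `IngestResult`, `ingest_collecting_failures` and `fake_put_record` are hypothetical names invented for illustration, and the thread pool only stands in for whatever the real ingest() workers do. The point is just that per-row failures are logged, collected, and returned instead of being turned into an exception at the end.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Sequence, Tuple


@dataclass
class IngestResult:
    """Hypothetical return value describing one ingestion run."""

    total_rows: int
    failed_rows: List[int] = field(default_factory=list)

    @property
    def succeeded(self) -> bool:
        # True only when every row was ingested.
        return not self.failed_rows


def ingest_collecting_failures(
    rows: Sequence[Any],
    put_record: Callable[[Any], None],
    max_workers: int = 4,
) -> IngestResult:
    """Ingest rows in parallel; return failed row indices instead of raising.

    `put_record` is a stand-in for the real per-row call (an assumption,
    not the SDK's API). Any exception it raises marks that row as failed.
    """

    def _try_put(item: Tuple[int, Any]) -> Optional[int]:
        index, row = item
        try:
            put_record(row)
            return None
        except Exception as exc:  # log the failure and keep going
            print(f"Failed to ingest row {index}: {exc}")
            return index

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so indices stay sorted.
        outcomes = list(pool.map(_try_put, enumerate(rows)))

    failed = [index for index in outcomes if index is not None]
    return IngestResult(total_rows=len(rows), failed_rows=failed)


def fake_put_record(row: dict) -> None:
    """Demo stub: rejects every row whose id is a multiple of 3."""
    if row["id"] % 3 == 0:
        raise ValueError("simulated ValidationException")


rows = [{"id": i} for i in range(1, 8)]
result = ingest_collecting_failures(rows, fake_put_record, max_workers=2)

# The caller decides whether the failures are worth failing the job for.
if not result.succeeded:
    print(f"{len(result.failed_rows)} of {result.total_rows} rows failed: {result.failed_rows}")
```

With something like this, a script or Processing job could retry only the failed rows, or set its own threshold for when to fail, rather than wrapping the call in a blanket try/except.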