[Feature Store] Feature group creation: provide a DataCatalogConfig while enabling glue table creation #6702

@simonvdk

Confirm by changing [ ] to [x] below:

Issue is about usage on:

  • Service API : I want to do X using Y service, what should I do?
  • CLI : passing arguments or cli configurations.
  • Other/Not sure.

Platform/OS/Hardware/Device
Linux

Describe the question
Use case
Create a feature group with automatic glue table creation for the offline store metadata, while configuring the glue data catalog database and table names

Issue encountered
It seems that providing a DataCatalogConfig and setting disable_glue_table_creation to false are mutually exclusive:

  • I can either not configure the glue database and table names and enable the glue table creation, so that the glue table with the default name and database is created upon feature group creation
  • OR I can provide a DataCatalogConfig but then I have to disable the glue table creation, so that the requested glue table is not created upon feature group creation

But I cannot provide a DataCatalogConfig and enable the glue table creation. Error encountered:

An error occurred (ValidationException) when calling the CreateFeatureGroup operation: Validation Error: DataCatalogConfig is not permitted in the request unless AutoCreateGlueTable is turned off. Please either set AutoCreateGlueTable to false or remove DataCatalogConfig from the request.
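
The rule implied by this error can be written as a small client-side check. This is only a sketch inferred from the error message above, not documented service behaviour. `check_offline_store_config` is a hypothetical helper name, and it assumes that an omitted `DisableGlueTableCreation` is treated as false.

```python
def check_offline_store_config(offline_store_config: dict) -> None:
    """Raise ValueError for the combination that CreateFeatureGroup rejected.

    Hypothetical helper: mirrors the observed ValidationException, nothing more.
    """
    has_catalog_config = "DataCatalogConfig" in offline_store_config
    # Assumption: a missing DisableGlueTableCreation means table creation is enabled.
    table_creation_disabled = bool(
        offline_store_config.get("DisableGlueTableCreation", False)
    )
    if has_catalog_config and not table_creation_disabled:
        raise ValueError(
            "DataCatalogConfig was provided while glue table creation is enabled; "
            "the service rejected this combination with a ValidationException."
        )
```

With this check, the two accepted variants (no `DataCatalogConfig` with table creation enabled, or a `DataCatalogConfig` with table creation disabled) pass, and the combination described in this issue fails before any request is sent.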

Why this seems to be an issue:

  • this behaviour (mutually exclusive) is not mentioned in the documentation. Also, there is no further mention or example of how to configure the offline store data catalog in the documentation
  • given the current state of the documentation, a user may want to configure the name of the glue database and table where the offline store metadata will be stored, while benefiting from the glue table creation upon feature group creation (with all the configuration - schema, storage descriptor etc - coming from the feature group information)
  • this extract from the java SDK documentation seems to indicate that the DataCatalogConfig should not be mutually exclusive with the automatic table creation

Ways to reproduce issue
Reproduced with AWS SDK (2.50.0) and AWS CLI.
Providing an OfflineStoreConfig with both DisableGlueTableCreation=False and a DataCatalogConfig that names an existing glue database and a glue table that does not yet exist raises the above error. Providing the DataCatalogConfig with DisableGlueTableCreation=True does not raise an error, but the glue table is not created either.

Example with AWS CLI:

aws sagemaker create-feature-group --cli-input-json '{
  "EventTimeFeatureName": "timestamp",
  "Description": "",
  "RecordIdentifierFeatureName": "record_id",
  "FeatureDefinitions": [
    {"FeatureName": "record_id", "FeatureType": "Integral"},
    {"FeatureName": "timestamp", "FeatureType": "String"}
  ],
  "OfflineStoreConfig": {
    "S3StorageConfig": {
      "S3Uri": "s3://my_bucket/my_prefix",
      "KmsKeyId": "arn:aws:kms:region:account_id:key/key_id"
    },
    "DataCatalogConfig": {"TableName": "my_table", "Catalog": "account_id", "Database": "my_db"},
    "DisableGlueTableCreation": false
  },
  "FeatureGroupName": "my-feature-group"
}'
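
For reference, the same request can be sent from Python. The sketch below assumes boto3 is installed and credentials are configured. The payload is copied from the CLI example above with its placeholder values unchanged, and `build_request` and `reproduce` are hypothetical helper names.

```python
def build_request() -> dict:
    """Return the CreateFeatureGroup payload from the CLI example (placeholders kept)."""
    return {
        "FeatureGroupName": "my-feature-group",
        "Description": "",
        "RecordIdentifierFeatureName": "record_id",
        "EventTimeFeatureName": "timestamp",
        "FeatureDefinitions": [
            {"FeatureName": "record_id", "FeatureType": "Integral"},
            {"FeatureName": "timestamp", "FeatureType": "String"},
        ],
        "OfflineStoreConfig": {
            "S3StorageConfig": {
                "S3Uri": "s3://my_bucket/my_prefix",
                "KmsKeyId": "arn:aws:kms:region:account_id:key/key_id",
            },
            "DataCatalogConfig": {
                "TableName": "my_table",
                "Catalog": "account_id",
                "Database": "my_db",
            },
            # Table creation left enabled: this is the combination that is rejected.
            "DisableGlueTableCreation": False,
        },
    }


def reproduce() -> str:
    """Send the request and return the error code, or "OK" if it was accepted."""
    # Imported lazily so the payload can be inspected without boto3 installed.
    import boto3
    from botocore.exceptions import ClientError

    client = boto3.client("sagemaker")
    try:
        client.create_feature_group(**build_request())
    except ClientError as err:
        return err.response["Error"]["Code"]
    return "OK"
```

Calling `reproduce()` against a real account should surface the same ValidationException as the CLI command, assuming the service behaves as described in this issue.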

Expected output
Clearer documentation on how to configure the offline store data catalog (e.g. with an example in a notebook), and possibly the ability to configure the data catalog while still benefiting from automatic glue table creation.

Logs/output
Get full traceback and error logs by adding --debug to the command.

NB: A similar issue has been opened on the sagemaker-python-sdk repository

    Labels

    documentation: This is a problem with documentation.
    feature-request: A feature should be added or improved.
    service-api: This issue is due to a problem in a service API, not the SDK implementation.