[Feature] Infer string feature type from pandas 'object' dtype

**Describe the feature you'd like**

[FeatureGroup.load_feature_definitions()](https://github.com/aws/sagemaker-python-sdk/blob/0f5cf1824c0b116c9b218c803f3b94a85e09fd45/src/sagemaker/feature_store/feature_group.py#L609) should default to the `String` data type for columns with pandas dtype `object`, instead of raising `ValueError: Failed to infer Feature type based on dtype object for column ...`.

(Incidentally, why doesn't this method show up in the [FeatureGroup API doc](https://sagemaker.readthedocs.io/en/stable/api/prep_data/feature_store.html#sagemaker.feature_store.feature_group.FeatureGroup)? It has a docstring.)

**How would this feature be used? Please describe.**

Per [the Pandas doc](https://pandas.pydata.org/docs/user_guide/text.html#working-with-text-data), Pandas now has a dedicated string dtype, and it's the preferred way to handle text data. However:
1. It wasn't available before pandas v1.0
2. For backward-compatibility, `object` remains the default dtype inferred when parsing lists of strings or reading CSVs.

For both of these reasons, users commonly have dataframes whose `object` columns hold strings. IMO the way to explicitly convert columns to the new string dtype is non-obvious (see "Additional context" below), so it's a bit of a pain that this function can't just infer string by default for `object` columns.

**Describe alternatives you've considered**

1. Leave as-is (users must figure out how to explicitly convert all their `object` dtype fields to a different dtype)
2. Map `object` dtype to SMFS `String` during type inference
3. (Preferred?) Map `object` dtype to SMFS `String`, but *raise a warning* (because in theory an `object` dataframe field could hold any Python object, even though in practice it's very likely to be text strings)
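
To make option 3 concrete, here's a rough sketch of what I have in mind. It is not the SDK's actual code: `infer_feature_type` and the dtype tables are hypothetical names I made up for illustration, and the real mapping inside `load_feature_definitions()` may be organised differently. Only the `String` / `Integral` / `Fractional` feature type names and the error message come from Feature Store itself.

```python
import warnings

# Hypothetical lookup tables for illustration; the SDK's real mapping may differ.
INTEGER_DTYPES = {"int8", "int16", "int32", "int64",
                  "uint8", "uint16", "uint32", "uint64"}
FLOAT_DTYPES = {"float16", "float32", "float64"}
STRING_DTYPES = {"string"}


def infer_feature_type(column: str, dtype_name: str) -> str:
    """Return a Feature Store type name for a pandas dtype name.

    `dtype_name` is expected to be `str(df[column].dtype)`.
    """
    if dtype_name in INTEGER_DTYPES:
        return "Integral"
    if dtype_name in FLOAT_DTYPES:
        return "Fractional"
    if dtype_name in STRING_DTYPES:
        return "String"
    if dtype_name == "object":
        # Option 3: fall back to String, but tell the user we guessed.
        warnings.warn(
            f"Column {column} has dtype object; assuming it holds text "
            "and inferring feature type String.",
            UserWarning,
        )
        return "String"
    # Anything else keeps today's behaviour.
    raise ValueError(
        f"Failed to infer Feature type based on dtype {dtype_name} "
        f"for column {column}."
    )
```

With something like this, `infer_feature_type(col, str(df[col].dtype))` would return `String` for an `object` column and emit a `UserWarning`, while unsupported dtypes such as `datetime64[ns]` would still raise the existing `ValueError`.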

**Additional context**

My current workaround for explicitly setting object->str dtypes in Pandas 1.0+ is (as per [here](https://github.com/aws-samples/sagemaker-101-workshop/blob/ef22bc022272b9d6c0acb337977fa4d206eb6397/builtin_algorithm_hpo_tabular/util/data.py#L181)):

```python
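import pandas as pd

# Convert every object-dtype column of an existing DataFrame `df` to the pandas string dtype
for col in df:
    if pd.api.types.is_object_dtype(df[col].dtype):
        df[col] = df[col].astype(pd.StringDtype())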
```
