sagemaker.model_monitor.DefaultModelMonitor suggest_baseline is not able to read Japanese text

**Describe the bug**
When creating statistics and constraints with DefaultModelMonitor.suggest_baseline for a UTF-8 encoded CSV containing Japanese text, the column names and categorical values all appear as ????? in the output JSON, making it unusable.
 

**To reproduce**
Create a UTF-8 encoded CSV dataset whose column names and categorical values are in Japanese, then run suggest_baseline on it.
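A minimal sketch for generating such a file, in case it helps with reproduction. The column names and row values are hypothetical placeholders chosen for illustration, not the data from the original run. Only the file name matches the snippet that follows.

```python
import csv

# Hypothetical Japanese column names and categorical values (illustration only).
HEADER = ["年齢", "都道府県", "職業"]
ROWS = [
    [34, "東京都", "会社員"],
    [27, "大阪府", "学生"],
    [45, "北海道", "自営業"],
]


def write_baseline_csv(path="baselining_data_set.csv"):
    """Write the sample rows to `path` as a UTF-8 CSV with a header row."""
    # The encoding is passed explicitly so the file does not depend on the
    # platform's default encoding.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(ROWS)
    return path
```

Calling `write_baseline_csv()` writes `baselining_data_set.csv` to the current directory, which can then be passed as `baseline_dataset` in the call below.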
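```
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

# `role` (IAM role ARN) and `output_s3_uri` (S3 output prefix) are assumed
# to be defined earlier in the session.
my_default_monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

# Run the baselining job on the CSV that contains Japanese text.
my_default_monitor.suggest_baseline(
    baseline_dataset="baselining_data_set.csv",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=output_s3_uri,
)
```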

Check the generated statistics.json and constraints.json. Japanese text appears as ?????? in both files:
```
{
  "version" : 0.0,
  "features" : [ {
    "name" : "????",
    "inferred_type" : "Integral",
    "completeness" : 1.0,
    "num_constraints" : {
      "is_non_negative" : true
    }
  }, {
    "name" : "???????",
    "inferred_type" : "Integral",
    "completeness" : 1.0,
    "num_constraints" : {
      "is_non_negative" : true
    }
  }, {
    "name" : "???????",
    "inferred_type" : "Integral",
    "completeness" : 1.0,
    "num_constraints" : {
      "is_non_negative" : true
    }
  }, {
    "name" : "????",
    "inferred_type" : "Integral",
    "completeness" : 1.0,
    "num_constraints" : {
      "is_non_negative" : true
    }
  }
```
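To confirm the symptom programmatically after downloading the output, a check along these lines may help. It is only a sketch: `find_mangled_names` is a hypothetical helper, not part of the SageMaker SDK, and the `sample` document is a hand-written excerpt shaped like the output above (the `age` feature is invented for contrast).

```python
import json


def find_mangled_names(constraints):
    """Return feature names that consist only of '?' characters."""
    names = [feature.get("name", "") for feature in constraints.get("features", [])]
    return [name for name in names if name and set(name) == {"?"}]


# Hand-written excerpt shaped like constraints.json; not real job output.
sample = json.loads("""
{
  "version": 0.0,
  "features": [
    {"name": "????", "inferred_type": "Integral", "completeness": 1.0},
    {"name": "age", "inferred_type": "Integral", "completeness": 1.0}
  ]
}
""")

print(find_mangled_names(sample))  # prints ['????']
```

Running the same function on the parsed constraints.json from the job would list every feature whose name was replaced.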

**Expected behavior**
Japanese column names and categorical values should appear correctly in statistics.json and constraints.json.

**System information**
A description of your system. Please provide:
- **SageMaker Python SDK version**:  2.224.4
- **Framework name (eg. PyTorch) or algorithm (eg. KMeans)**:
- **Framework version**:
- **Python version**:
- **CPU or GPU**:
- **Custom Docker image (Y/N)**:

sagemaker.model_monitor.DefaultModelMonitor suggest_baseline is not able to read Japanese text #4822
