
InvalidSerDe: None in not in the valid SerDe list #123

@JPFrancoia

I'm trying to create/update a table in the Glue catalog with the following snippet:

```python
import awswrangler as wr

# ...some code fetching csv files from a bucket...

for file in valid_files:
    df = wr.pandas.read_csv(path=file)

    # Needed: if a column name has a trailing "\r", the call to metadata_to_glue crashes
    df.columns = [c.strip() for c in df.columns]

    # Call without serde in extra_args
    wr.glue.metadata_to_glue(
        df,
        BUCKET_SCAN + SUB_PATH,
        valid_files,
        "csv",
        database=DATABASE,
        table="my_table_20200129",
        # extra_args={"serde": "LazySimpleSerDe"},
        preserve_index=False,
    )
```
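The trailing "\r" workaround in the snippet can be reproduced without AWS. The sketch below assumes, as one possible cause, a CRLF-terminated header that was split only on "\n"; the actual cause in the original CSV files is not known. It shows that str.strip() removes the leftover "\r" (along with any surrounding spaces).

```python
# Hypothetical reproduction: a CRLF ("\r\n") header split only on "\n"
# leaves a "\r" glued to the last column name. This is an assumed cause,
# not necessarily what happened with the original CSV files.
raw = "id,name\r\n1,alice\r\n"

header = raw.split("\n")[0].split(",")
print(header)  # ['id', 'name\r']

# Same cleanup as in the snippet: str.strip() drops whitespace, including "\r".
cleaned = [c.strip() for c in header]
print(cleaned)  # ['id', 'name']
```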

I get the following error:

```
Traceback (most recent call last):
  File "qof_scripts/crawler.py", line 72, in <module>
    preserve_index=False,
  File "/Users/jpfrancoia/.local/share/virtualenvs/test_aws_lake-KkaPCkQ0/lib/python3.7/site-packages/awswrangler/glue.py", line 114, in metadata_to_glue
    columns_comments=columns_comments)
  File "/Users/jpfrancoia/.local/share/virtualenvs/test_aws_lake-KkaPCkQ0/lib/python3.7/site-packages/awswrangler/glue.py", line 182, in create_table
    extra_args=extra_args)
  File "/Users/jpfrancoia/.local/share/virtualenvs/test_aws_lake-KkaPCkQ0/lib/python3.7/site-packages/awswrangler/glue.py", line 313, in csv_table_definition
    raise InvalidSerDe(f"{serde} in not in the valid SerDe list.")
awswrangler.exceptions.InvalidSerDe: None in not in the valid SerDe list
```

I managed to track down the issue to this line: https://github.com/awslabs/aws-data-wrangler/blob/d50b214274583eb6dd2cbc1c6c54c60f9f87035c/awswrangler/glue.py#L295

The serde is read from the extra_args parameter:

```python
serde = extra_args.get("serde")
```

But serde is None if the "serde" key isn't provided in the extra_args dict, and the rest of the function raises InvalidSerDe unless serde is OpenCSVSerDe or LazySimpleSerDe: https://github.com/awslabs/aws-data-wrangler/blob/d50b214274583eb6dd2cbc1c6c54c60f9f87035c/awswrangler/glue.py#L313
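To make the failure mode concrete, here is a simplified stand-alone re-creation of the check described above. It is not the awswrangler source: InvalidSerDe is a local stand-in, and check_serde and VALID_SERDES are hypothetical names. The message string copies the one in the traceback verbatim, typo included.

```python
class InvalidSerDe(Exception):
    """Local stand-in for awswrangler.exceptions.InvalidSerDe."""


# Hypothetical name for the two serdes the function accepts.
VALID_SERDES = ("OpenCSVSerDe", "LazySimpleSerDe")


def check_serde(extra_args: dict) -> str:
    serde = extra_args.get("serde")  # None when the "serde" key is absent
    if serde not in VALID_SERDES:
        # Message copied from the traceback, including "in not in".
        raise InvalidSerDe(f"{serde} in not in the valid SerDe list.")
    return serde


try:
    check_serde({})  # no "serde" key, as in the snippet above
except InvalidSerDe as exc:
    print(exc)  # None in not in the valid SerDe list.
```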

I think this is a bug: the extra_args parameter is typed Optional[dict], yet csv_table_definition can't run unless serde is set.

This can be solved by falling back to a default serde when none is provided. I'll make a PR.
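A minimal sketch of the proposed fallback, again using hypothetical names rather than the library's code. This issue does not say which serde the PR would pick as the default; OpenCSVSerDe is an assumption here. The sketch also treats a missing extra_args (None) as an empty dict, since the parameter is Optional[dict].

```python
from typing import Optional


class InvalidSerDe(Exception):
    """Local stand-in for awswrangler.exceptions.InvalidSerDe."""


VALID_SERDES = ("OpenCSVSerDe", "LazySimpleSerDe")
# Assumption: illustrative default only; the issue does not name one.
DEFAULT_SERDE = "OpenCSVSerDe"


def resolve_serde(extra_args: Optional[dict] = None) -> str:
    """Return the serde to use, falling back to DEFAULT_SERDE when none is given."""
    extra_args = extra_args or {}  # accept None, since extra_args is Optional[dict]
    serde = extra_args.get("serde") or DEFAULT_SERDE  # also covers {"serde": None}
    if serde not in VALID_SERDES:
        # An explicitly given but unsupported serde is still rejected.
        raise InvalidSerDe(f"{serde} is not in the valid SerDe list.")
    return serde


print(resolve_serde())                              # OpenCSVSerDe
print(resolve_serde({"serde": "LazySimpleSerDe"}))  # LazySimpleSerDe
```

Using `or` instead of a .get() default means an explicit {"serde": None} also falls back, while an unknown serde name still raises.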

Metadata

Assignees

Labels

bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests