Closed
Labels
bug: Something isn't working
Description
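Consider the following pandas DataFrame:
import datetime
import pandas as pd

# Assumed starting point: any DataFrame that already has rows
df = pd.DataFrame(index=range(3))
df['datetime'] = datetime.datetime.today()  # full timestamp
df['normalized_date'] = df['datetime'].dt.normalize()  # timestamp truncated to midnight
df['date'] = datetime.date.today()  # plain datetime.date objects
df = df[['datetime', 'normalized_date', 'date']]
This results in a schema like this: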
datetime datetime64[ns]
normalized_date datetime64[ns]
date object
When saved, this results in the following schema in Glue:
[image: Glue table schema]
The schema in the Parquet file, however, looks like this when the file is read with Spark:
root
|-- datetime: timestamp (nullable = true)
|-- normalized_date: timestamp (nullable = true)
|-- date: date (nullable = true)
This issue might be caused by the Glue.type_pandas2athena() function, which converts all pandas 'object' dtypes to string. Maybe, instead of using the pandas schema, you need to use the pyarrow.Table schema.
