You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Leaving out engine specific parameters for pandas materializers
Current behavior
The only arguments that can be passed into the read_FORMAT and to_FORMAT for each of the materilzers are the ones specified on the respective page, but some pandas functions allow the user to add additional arguments that are passed into the engine directly .
As the code is written, Hamilton throws an error when an otherwise valid argument can be passed, unless they are passed as a dict as a part of the extra_kwargs argument which is unituituve.
Stack Traces
PandasParquetWriter.__init__() got an unexpected keyword argument 'read_dictionary'
Screenshots
Steps to replicate behavior
Pass any parameter not specified in the initializtion of the class but is a valid parameter to any Read and Write class
There shouldn't be a failure for any parameter that is valid, but should they have to pass it in as part of the extra_kwargs or into the class like other arguments
Additional context
I'm working on the geopandas materilzers and noticed that there are many instances where I would need the user to pass in their own kwargs that I had not initially listed. I just need to know which implementation should be used.
Yeah this is a bit of a hard one. We decided to use extra_kwargs since it would otherwise require listing all possible kwargs in the class since we're using dataclasses (AFAIK). That seems hard to scale/ensure it's right?
So extra_kwargs I think is the happy medium, but it does mean the API doesn't 1-1 map. That said, I think extra_kwargs helps someone else who comes to read the code, as it makes it clear these kwargs are extra and likely specific to some use case... versus you having to check the function signature to understand what is used, and what is being passed on under the hood...
I'll noodle on this for the week, but I'm leaning towards extra_kwargs. Thoughts?
JoJo10Smith
changed the title
Bug Report
'extra_kwargs' vs writing out each argument
Oct 30, 2023
Leaving out engine specific parameters for pandas materializers
Current behavior
The only arguments that can be passed into the read_FORMAT and to_FORMAT for each of the materilzers are the ones specified on the respective page, but some pandas functions allow the user to add additional arguments that are passed into the engine directly .
For example the Pandas to_parquet (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_parquet.html#pandas.DataFrame.to_parquet) has a few arguments listed on the page (path, engine, compression etc) but has the option to pass in additional arguments to the engine as **kwargs.
If the user set
engine = pyarrow
as their engine they would be able to use the following arguments in addition to the ones listed on the pandas page: https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetWriter.htmlAs the code is written, Hamilton throws an error when an otherwise valid argument can be passed, unless they are passed as a dict as a part of the
extra_kwargs
argument which is unituituve.Stack Traces
PandasParquetWriter.__init__() got an unexpected keyword argument 'read_dictionary'
Screenshots
Steps to replicate behavior
Pass any parameter not specified in the initializtion of the class but is a valid parameter to any Read and Write class
Library & System Information
Hamilton: 1.35.1
Python 3.10.2
System: Ubuntu 22.04
Expected behavior
There shouldn't be a failure for any parameter that is valid, but should they have to pass it in as part of the
extra_kwargs
or into the class like other argumentsAdditional context
I'm working on the geopandas materilzers and noticed that there are many instances where I would need the user to pass in their own kwargs that I had not initially listed. I just need to know which implementation should be used.
Potential solutions
The text was updated successfully, but these errors were encountered: