Data type anomaly for specific fields within daily data #107

Closed
amotl opened this issue Jul 5, 2020 · 5 comments · Fixed by #115

Comments

@amotl
Member

amotl commented Jul 5, 2020

As outlined within panodata/dwdweather2#27, acquiring precipitation information from daily observations should yield data like

"precipitation_form": 4,
"precipitation_height": 8.8,

However, I just found out that

wetterdienst readings --resolution=daily --parameter=kl --period=recent --station=44

will yield data like

"precipitation_form":4.0,
"precipitation_height":1.5,

So, we should adjust the data type for precipitation_form to be an integer, as designated within the dwdweather2 knowledge base module, line 142.
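
One plausible way such a float can appear, offered as an assumption rather than a diagnosis of wetterdienst: pandas stores an integer column as float64 as soon as it contains a missing value, so the code 4 comes out as 4.0. The sketch below uses made-up sample values and shows that the nullable Int64 dtype keeps whole numbers while still allowing gaps.

```python
import pandas as pd

# Hypothetical sample values; None stands in for a missing observation.
raw = pd.Series([4, 0, None], name="precipitation_form")

# With a missing value present, pandas falls back to float64.
print(raw.dtype)       # float64
print(raw.iloc[0])     # 4.0

# The nullable integer dtype keeps whole numbers and still allows gaps.
as_int = raw.astype("Int64")
print(as_int.dtype)    # Int64
print(as_int.iloc[0])  # 4
```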

cc @BenjaminMews

@amotl
Member Author

amotl commented Jul 5, 2020

While we are at it, we might also want to adjust the data type for fields like daily_quality_level_4 accordingly, as outlined within the dwdweather2 knowledge base module, line 140.

@gutzbenj
Member

gutzbenj commented Jul 6, 2020

Do you think that we should reinvent the dtype mapping creation, or simply implement another if-else for, say, static data columns (-> int) and dynamic data columns (-> float)?

@amotl
Member Author

amotl commented Jul 6, 2020

I believe doing it in a dynamic manner would be okay. Then we can say things like

if column_name in ['precipitation_form', 'any_others'] or 'quality_level' in column_name:
    value = int(value)

Note that this is just non-Pandas pseudocode and should probably be written in a more elaborate way. Also, it should be performed before humanizing column names, which is an optional feature.
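
A runnable variant of the pseudocode above might look like the following sketch. The `INTEGER_FIELDS` set, the two helper functions, and the sample record are illustrative assumptions, not wetterdienst code. It works on plain dicts, matches on the original (non-humanized) column names, and leaves `None` values untouched.

```python
# Illustrative only: the names below are assumptions, not part of wetterdienst.
INTEGER_FIELDS = {"precipitation_form"}


def is_integer_field(column_name):
    """Decide by name whether a column should hold integers."""
    return column_name in INTEGER_FIELDS or "quality_level" in column_name


def coerce_record(record):
    """Return a copy of the record with integer fields cast to int."""
    coerced = {}
    for column_name, value in record.items():
        # Skip missing values so int(None) is never attempted.
        if value is not None and is_integer_field(column_name):
            value = int(value)
        coerced[column_name] = value
    return coerced


# Made-up sample record for demonstration.
record = {
    "precipitation_form": 4.0,
    "precipitation_height": 1.5,
    "daily_quality_level_4": 3.0,
}
print(coerce_record(record))
```

Keeping the name test in its own function means the rule can later be swapped for a complete per-resolution list without touching the coercion loop.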

@gutzbenj
Member

gutzbenj commented Jul 6, 2020

Btw, I just noticed that if we want to apply this to the library successfully, we need the full set of parameter names to be typed for all resolutions. Otherwise we'd break the functionality for some time resolutions. In conclusion, we should first fully name the whole set of parameters from 1_minute to annual...

@amotl
Member Author

amotl commented Jul 6, 2020

Btw, I just noticed that if we want to apply this to the library successfully, we need the full set of parameter names to be typed for all resolutions. Otherwise we'd break the functionality for some time resolutions.

I see. Thanks for looking at the nitty gritty details.

For now, we could also take a dynamic approach and use integer_field in df.columns as the condition for applying the coercion. I just quickly put together and submitted #108 to give us an idea of how things might be implemented that way.
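
The membership check could be sketched roughly as follows. This is not the code from #108: the `INTEGER_FIELDS` list, the `coerce_integer_fields` helper, and the sample frame are assumptions for illustration, and the nullable `Int64` dtype is just one possible choice for columns that may contain gaps.

```python
import pandas as pd

# Assumed field names for illustration; the real list would come from the library.
INTEGER_FIELDS = ["precipitation_form", "daily_quality_level_4"]


def coerce_integer_fields(df):
    """Cast known integer fields to nullable Int64, skipping absent columns."""
    df = df.copy()
    for integer_field in INTEGER_FIELDS:
        if integer_field in df.columns:
            df[integer_field] = df[integer_field].astype("Int64")
    return df


# Sample frame lacking daily_quality_level_4, as another resolution might.
df = pd.DataFrame(
    {
        "precipitation_form": [4.0, None],
        "precipitation_height": [1.5, 8.8],
    }
)
result = coerce_integer_fields(df)
print(result.dtypes)
```

Because columns that are absent are simply skipped, the same helper can run against every resolution without the full parameter list being in place first.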


In conclusion, we should first fully name the whole set of parameters from 1_minute to annual...

I will not stop you from doing this. Thanks already! It might save some unnecessary cycles iterating through all special integer fields. However, if you feel the dynamic solution outlined in #108 will also be okay, I will be happy to help get it out of draft mode.

@amotl amotl linked a pull request Jul 13, 2020 that will close this issue