Dubious value added of the 'freq' variable #288

Open
rideofyourlife opened this issue Jan 15, 2024 · 7 comments

Comments

@rideofyourlife

rideofyourlife commented Jan 15, 2024

Many datasets that are available at only one frequency (such as namq_10_gdp, sts_inpr_m, etc.) were given a new variable, "freq". I generally understand the idea behind it, but while working with the package it has mostly proven to be an unnecessary extra step of %>% select(-freq) in the majority of the code I write.

Does anyone else have similar thoughts?
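For readers who run into the same repeated select(-freq) step, here is a minimal sketch of one way to wrap it. The helper `drop_constant_freq()` is hypothetical and not part of the eurostat package, and the toy data frame only imitates the column layout mentioned in this thread. It drops `freq` only when the column holds a single value, so the information is kept whenever frequencies are actually mixed.

```r
# Hypothetical helper (not part of eurostat): drop `freq` only when it
# carries no information, i.e. every row has the same frequency code.
drop_constant_freq <- function(df) {
  if ("freq" %in% names(df) && length(unique(df$freq)) <= 1) {
    df$freq <- NULL
  }
  df
}

# Toy data imitating a single-frequency dataset; the values are made up.
quarterly <- data.frame(
  freq = "Q",
  geo = c("DE", "FR"),
  TIME_PERIOD = as.Date(c("2023-01-01", "2023-01-01")),
  OBS_VALUE = c(1.0, 2.0)
)

# Toy data with two frequencies: `freq` is informative here and is kept.
mixed <- data.frame(
  freq = c("A", "Q"),
  geo = c("DE", "DE"),
  TIME_PERIOD = as.Date(c("2023-01-01", "2023-01-01")),
  OBS_VALUE = c(4.0, 1.0)
)

names(drop_constant_freq(quarterly))
names(drop_constant_freq(mixed))
```

The sketch uses base R so it runs without extra packages; the same check could be placed in a dplyr pipeline instead.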

@pitkant
Member

pitkant commented Jan 15, 2024

This is intentional behaviour but if many people find this annoying you can report it under this issue and we can reconsider.

@antaldaniel
Contributor

Statistical agencies worldwide follow similar standards for treating metadata, and metadata in this case is there to avoid unforeseen logical errors when joining or linking data. I think the freq variable is present when there are similar statistical products or datasets available with the same variables but different frequencies. In that case, a join without frequency adjustment results in a hard-to-find logical error. The freq variable serves the same purpose as the unit variable: you really want to avoid unknowingly dividing euros by thousands of euros, or multiplying annual values in a chain with quarterly values.
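The joining risk described here can be illustrated with a small sketch on toy data. The tables and values are made up, and the sketch assumes that both annual and quarterly observations are stamped with the first day of their period, so an annual 2023 value and a 2023 Q1 value share the date 2023-01-01.

```r
# Toy annual and quarterly tables; all values are made up for illustration.
annual <- data.frame(
  freq = "A",
  geo = "DE",
  TIME_PERIOD = as.Date("2023-01-01"),
  OBS_VALUE = 400
)
quarterly <- data.frame(
  freq = "Q",
  geo = "DE",
  TIME_PERIOD = as.Date(c("2023-01-01", "2023-04-01")),
  OBS_VALUE = c(100, 101)
)

keep <- c("geo", "TIME_PERIOD", "OBS_VALUE")

# Without `freq`, the annual value silently pairs with the Q1 value.
no_freq <- merge(annual[, keep], quarterly[, keep],
                 by = c("geo", "TIME_PERIOD"), suffixes = c("_a", "_q"))

# With `freq` among the join keys, nothing matches, so the mismatch is visible.
with_freq <- merge(annual, quarterly,
                   by = c("freq", "geo", "TIME_PERIOD"), suffixes = c("_a", "_q"))

nrow(no_freq)
nrow(with_freq)
```

The first join returns a row that looks valid but mixes an annual figure with a quarterly one; the second returns no rows, which is easier to notice than a plausible-looking wrong result.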

@rideofyourlife
Author

> Statistical agencies worldwide follow similar standards for treating metadata, and metadata in this case is there to avoid unforeseen logical errors when joining or linking data;

Well, we are all aware of that. At least I hope that is the case.

> In that case, a join without frequency adjustment results in a hard-to-find logical error. The freq variable serves the same purpose as the unit variable: you really want to avoid unknowingly dividing euros by thousands of euros, or multiplying annual values in a chain with quarterly values.

This would assume users are somewhat unaware of what they are doing. It seems to me that implementing this technique is a triumph of form over content.

@pitkant pitkant added this to In progress in eurostat 4.1.0 Feb 7, 2024
@pitkant
Member

pitkant commented Apr 29, 2024

@rideofyourlife I have uploaded some WIP code to the v4.1 branch. It lets users make queries the same way as before but adds a parameter, legacy.data.output, to the get_* functions. The parameter renames dimensions such as TIME_PERIOD and OBS_VALUE to time and values, the names used before, and removes extra columns such as freq, DATAFLOW and LAST UPDATE altogether.

If you could test this and give some feedback on what you think, that would be great!
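To make the described transformation concrete, here is a rough sketch of the renaming and column removal on a toy data frame. The function `to_legacy_layout()` is hypothetical and is not the package's implementation; it only mirrors the behaviour described in this comment, and the cell values are placeholders.

```r
# Hypothetical illustration (not the eurostat implementation): rename the
# new-style columns to their old names and drop the extra metadata columns.
to_legacy_layout <- function(df) {
  extra <- c("freq", "DATAFLOW", "LAST UPDATE")
  df <- df[, setdiff(names(df), extra), drop = FALSE]
  names(df)[names(df) == "TIME_PERIOD"] <- "time"
  names(df)[names(df) == "OBS_VALUE"] <- "values"
  df
}

# Toy new-style table; check.names = FALSE keeps the space in "LAST UPDATE".
new_style <- data.frame(
  DATAFLOW = "toy-dataflow",
  `LAST UPDATE` = "toy-timestamp",
  freq = "Q",
  geo = c("DE", "FR"),
  TIME_PERIOD = as.Date(c("2023-01-01", "2023-01-01")),
  OBS_VALUE = c(1.0, 2.0),
  check.names = FALSE
)

names(to_legacy_layout(new_style))
```

In the package itself the equivalent result is requested through the legacy.data.output parameter rather than a separate helper like this one.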

@pitkant pitkant moved this from In progress to Done in eurostat 4.1.0 Apr 29, 2024
@rideofyourlife
Author

I have already laboriously replaced "time" with "TIME_PERIOD" in all my code, so getting "time" back is not as essential now as it was before the recent change. That said, where do I use legacy.data.output? In which function?

@pitkant
Member

pitkant commented May 20, 2024

I'm sorry for the laborious process. In version 4.1, passing legacy.data.output = TRUE to the get_eurostat() function should return a data.frame / tibble similar to the one returned in version 3.8.3 and earlier.

@rideofyourlife
Author

Ah, yes: it works. For some reason, RStudio just does not suggest it while typing.
