Wish list / roadmap #7

Open
12 tasks
larnsce opened this issue May 16, 2023 · 5 comments

Comments

@larnsce
Collaborator

larnsce commented May 16, 2023

This wish list was copied from the README and added here as a to do list:

@larnsce
Collaborator Author

larnsce commented May 16, 2023

@nickdickinson

I have reviewed the to do list and other open issues. Some notes and questions:

Complete labeling of all of the datasets

Do you mean something like this? https://strengejacke.github.io/sjlabelled/articles/labelleddata.html

Complete how-to documentation and several case studies to demonstrate use

I would add a very simple README for the pkgdown landing site that's also the README for the GitHub Repo and then more extensive "Articles" vignette pages to the pkgdown site (#4)

Add use cases on combining with other data sets (national monitoring data, country TrackFin studies, etc.)

Combine this with "Articles" page on website in issue #4 and "how-to documentation" point

Add tests for data extraction and validation to cross validate country files against world files and different sheets against one another (as an extraction test and internal validation of the data sets)

That's issue #1, right?
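A minimal sketch of the cross-validation idea in the wish-list item above, assuming both the country files and the world files have already been extracted into rows keyed by country, year and indicator. The package itself is written in R; Python is used here only as a language-neutral illustration, and the function name, column names and sample values are all hypothetical.

```python
import math


def cross_validate(country_rows, world_rows,
                   key_cols=("iso3", "year", "indicator"), tol=1e-9):
    """Compare values from country files against the world file.

    Returns a list of (key, reason) tuples for every country-file row that
    is missing from the world file or whose value differs beyond `tol`.
    """
    # Index the world file by its key columns for constant-time lookups.
    world = {tuple(r[c] for c in key_cols): r["value"] for r in world_rows}
    problems = []
    for r in country_rows:
        key = tuple(r[c] for c in key_cols)
        if key not in world:
            problems.append((key, "missing in world file"))
        elif not math.isclose(r["value"], world[key], abs_tol=tol):
            problems.append((key, "value mismatch"))
    return problems


# Made-up sample rows; real data would come from the extracted files.
country_rows = [
    {"iso3": "AAA", "year": 2020, "indicator": "wat_basic", "value": 80.0},
    {"iso3": "AAA", "year": 2020, "indicator": "san_basic", "value": 61.0},
    {"iso3": "CCC", "year": 2020, "indicator": "wat_basic", "value": 50.0},
]
world_rows = [
    {"iso3": "AAA", "year": 2020, "indicator": "wat_basic", "value": 80.0},
    {"iso3": "AAA", "year": 2020, "indicator": "san_basic", "value": 60.0},
]

problems = cross_validate(country_rows, world_rows)
for key, reason in problems:
    print(key, reason)
```

An empty result would mean the two sources agree; in a test suite, the check would simply assert that the list is empty.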

Standardize the (long) data format used by datasets in the package.

Would you say the data is in a "long" format? I would argue that it is in a wide format. One could standardize a single long-format table that contains all data from the individual data frames. That long-format table could then serve as a kind of API.
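To make the wide versus long distinction concrete, here is a minimal sketch that melts a wide table (one column per indicator) into a single long table (one row per country, year and indicator). It is a language-neutral illustration in Python rather than the package's R code, and the column names and values are hypothetical.

```python
def wide_to_long(rows, id_cols, name_col="indicator", value_col="value"):
    """Turn one-column-per-indicator rows into one-row-per-indicator rows."""
    long_rows = []
    for row in rows:
        # Identifier columns are repeated on every output row.
        ids = {c: row[c] for c in id_cols}
        for col, value in row.items():
            if col in id_cols:
                continue
            long_rows.append({**ids, name_col: col, value_col: value})
    return long_rows


# Made-up wide data: one row per country and year, one column per indicator.
wide_rows = [
    {"iso3": "AAA", "year": 2020, "wat_basic": 80.0, "san_basic": 60.0},
    {"iso3": "BBB", "year": 2020, "wat_basic": 95.0, "san_basic": 90.0},
]

long_rows = wide_to_long(wide_rows, id_cols=("iso3", "year"))
for r in long_rows:
    print(r)
```

Because every data frame reduces to the same four columns, tables with different indicator sets can be stacked into one table without schema changes.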

Post article on “Enhancing the use and quality of official statistics using open source”

How about we make it a goal to get this on CRAN and then alongside write an article for: https://joss.theoj.org/

@nickdickinson
Member

Labelling: yes, adding the label attributes, and possibly value attributes. I would avoid an additional dependency unless strictly required, though. If we do add one, let's ideally stick to tidyverse/haven: https://haven.tidyverse.org/reference/labelled.html
How-to and case studies: basically vignettes. I have used this package in a few different consultancies to support D30 and SWA, among others, in producing data products, so I'd like to give some examples. Indeed, having a pkgdown site would be best.
Long data form: most are wide, a couple are long (the inequalities files, I believe). I would like to add a few functions (an API, if you will) to make it easy to switch back and forth.
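As an illustration of the reverse direction that such switching functions would need, here is a minimal sketch that pivots long rows back into wide rows. It is written in Python as a language-neutral example, not as the package's R API; the function name, column names and values are hypothetical.

```python
def long_to_wide(long_rows, id_cols, name_col="indicator", value_col="value"):
    """Pivot one-row-per-indicator rows into one-column-per-indicator rows."""
    wide = {}
    for row in long_rows:
        key = tuple(row[c] for c in id_cols)
        # Create the wide record on first sight of this key, then fill it in.
        record = wide.setdefault(key, dict(zip(id_cols, key)))
        record[row[name_col]] = row[value_col]
    return list(wide.values())


# Made-up long data: one row per country, year and indicator.
long_input = [
    {"iso3": "AAA", "year": 2020, "indicator": "wat_basic", "value": 80.0},
    {"iso3": "AAA", "year": 2020, "indicator": "san_basic", "value": 60.0},
    {"iso3": "BBB", "year": 2020, "indicator": "wat_basic", "value": 95.0},
]

wide_output = long_to_wide(long_input, id_cols=("iso3", "year"))
for r in wide_output:
    print(r)
```

Note that a country lacking an indicator (BBB has no san_basic here) simply ends up without that column; a real implementation would need to decide how to represent such gaps.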

As a side note, having a separate package to quickly convert the whole dataset into a star-schema database that can be uploaded into a backend database (SQLite or mssql) would be best. That is a separate but related issue that I should add to the roadmap. This is critical for applications like Excel and Power BI that cannot easily work with the size of the long dataset without massive files. Ultimately, a lot of end-users are in this category. For R users it could also be a more efficient way of working with the data in dashboards like Shiny. I'm not a huge Shiny fan because of the way it is not usually built to scale.
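A minimal sketch of what such a star-schema export could look like, assuming the data is already in long form and using an in-memory SQLite database. This is an illustration in Python with the standard `sqlite3` module, not the proposed companion package; the table names, column names and values are hypothetical.

```python
import sqlite3


def build_star_schema(long_rows):
    """Load long-format rows into two dimension tables and one fact table."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_country (
            country_id INTEGER PRIMARY KEY, iso3 TEXT UNIQUE);
        CREATE TABLE dim_indicator (
            indicator_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE fact_estimate (
            country_id INTEGER REFERENCES dim_country(country_id),
            indicator_id INTEGER REFERENCES dim_indicator(indicator_id),
            year INTEGER,
            value REAL);
    """)
    for row in long_rows:
        # Each distinct country and indicator is stored exactly once.
        con.execute("INSERT OR IGNORE INTO dim_country (iso3) VALUES (?)",
                    (row["iso3"],))
        con.execute("INSERT OR IGNORE INTO dim_indicator (name) VALUES (?)",
                    (row["indicator"],))
        # The fact row stores only the small integer keys plus the measure.
        con.execute(
            """INSERT INTO fact_estimate (country_id, indicator_id, year, value)
               SELECT c.country_id, i.indicator_id, ?, ?
               FROM dim_country AS c, dim_indicator AS i
               WHERE c.iso3 = ? AND i.name = ?""",
            (row["year"], row["value"], row["iso3"], row["indicator"]),
        )
    con.commit()
    return con


# Made-up long data.
sample_rows = [
    {"iso3": "AAA", "year": 2020, "indicator": "wat_basic", "value": 80.0},
    {"iso3": "AAA", "year": 2020, "indicator": "san_basic", "value": 60.0},
    {"iso3": "BBB", "year": 2020, "indicator": "wat_basic", "value": 95.0},
    {"iso3": "BBB", "year": 2020, "indicator": "san_basic", "value": 90.0},
]

con = build_star_schema(sample_rows)
n_countries = con.execute("SELECT COUNT(*) FROM dim_country").fetchone()[0]
n_indicators = con.execute("SELECT COUNT(*) FROM dim_indicator").fetchone()[0]
n_facts = con.execute("SELECT COUNT(*) FROM fact_estimate").fetchone()[0]
print(n_countries, n_indicators, n_facts)
```

Storing repeated text labels once in the dimension tables is what keeps the fact table compact, which is the property that matters for tools that struggle with the size of the full long dataset.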

CRAN: Yes, this would be great. I think we are mostly there, but some documentation of the datasets is still outstanding.
Article: Sounds great. Will need to free up some time ;-)

@larnsce
Collaborator Author

larnsce commented May 25, 2023

Thanks, @nickdickinson.

I have reworked the wish list in this issue a bit and tried to convert some of the to-dos into actual issues, but GitHub wouldn't let me.

It might be because I am not a collaborator on this project. Could you add me?

And then: How would you like me to contribute?

  • clone -> dev branch -> feature branch -> PR to dev branch?
  • fork to openwashdata/jmpwashdata -> dev branch -> PR to dev branch on WASHNote/jmpwashdata?
  • clone -> push to main? ;)

@nickdickinson
Member

The first would be great. For such a small project we may not even need the dev branch, but if we think more people will join, then let's set up a dev branch.

It would also be good to set up a few tests to make sure we don't introduce breaking changes as we move along. I'll create an issue.

@larnsce
Collaborator Author

larnsce commented Apr 16, 2024

@nickdickinson: Can you add Colin as a contributor with push rights to this repo, please?

2 participants