Dominic Comtois edited this page Dec 22, 2020 · 6 revisions

Welcome to the summarytools wiki! I'll be writing down some ideas here over the coming weeks, and I encourage you to discuss them in the Discussions section.

The main goal for now is to make the package more of a collective effort, which raises one main question:

How do I/we make it easier for you to contribute?

I don't think there is an easy answer, and there are several angles from which to tackle this. As I have said on Twitter on a few occasions, I know summarytools can be improved, and I can't keep doing it all by myself. If just a few people join in at first, we can work together on defining a clearer roadmap, which will, in turn, make it easier for even more people to participate.

Involving the R Community & Ideas for a Roadmap

Whether we see it through the lenses of design, implementation, project management, or documentation, I think this can be a great opportunity to learn from each other and make summarytools a durable, flexible, and collaborative project that evolves as R does!

Areas of Improvement

Documentation

Developer's Perspective

I've started working on this wiki page... More to come soon, but feel free to chip in & discuss!

User's Perspective (inline, vignettes)

Please feel free to add to the existing documentation (inline & vignettes), either by forking the project on GitHub and creating pull requests (always against the "dev-current" branch, please), or by creating something new (demos that could eventually become vignettes, for instance).

Code Organization / Internal structure

There is certainly room for improvement here. Some functions have lots of lines of code, and I am sure there are ways to make everything clearer and tidier. I've spent a lot of time developing this package on my own, not overly concerned with making things clean and simple: as long as I understood where I was going, I was satisfied and kept on going, not always mindful of posterity. Now that I don't have as much time to dedicate to the package, I can understand that it might be a bit scary to dive in. Having the input of people with solid package development experience would be great. I'm aware that some prerequisites are lacking in the Developer's Perspective wiki pages; this will be my point of focus in the short term.

Refining Testing

Since functions produce outputs, the way I handle testing is to generate a bunch of outputs that serve as a point of reference. When a change is implemented, I re-generate those outputs and compare them with the reference outputs. This has served me well so far, at least in preventing regressions: if I work on a new functionality and break something, it will show up that way -- maybe a heading is broken, or the row order has changed. This approach is not as rigorous as it could be; it would work best in combination with more traditional, systematic unit testing.
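
To make the idea concrete, here is a minimal sketch of such a reference-output comparison in base R. It is not the actual testing code used for the package: `matches_reference()` and `make_table()` are hypothetical names, and a plain `table()` call stands in for whatever summarytools output one would really want to track.

```r
# Hypothetical helper (not part of summarytools): compares the printed output
# of a function with a stored reference file.
matches_reference <- function(output_fun, ref_file) {
  # Capture what would be printed to the console, one element per line
  current <- capture.output(print(output_fun()))
  if (!file.exists(ref_file)) {
    # First run: store the current output as the reference
    writeLines(current, ref_file)
    return(TRUE)
  }
  # Later runs: any difference (heading, row order, counts) returns FALSE
  identical(current, readLines(ref_file))
}

# Illustration with a stand-in function; the reference lives in a temp file
ref_file <- tempfile(fileext = ".txt")
make_table <- function() table(c("a", "b", "a"))

matches_reference(make_table, ref_file)  # first call stores the reference
matches_reference(make_table, ref_file)  # unchanged output matches it
matches_reference(function() table(c("b", "a", "b")), ref_file)  # changed counts are flagged
```

A comparison like this only says that something changed, not whether the change is wanted, which is why pairing it with targeted unit tests seems worthwhile.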

Note that I've removed the testing logic from the GitHub repo; I'm making a new repo for it, as it will make it simpler to work across several branches.

What is on the Horizon

Flexibility in Headings

One thing I'd like to do is offer more flexibility in terms of heading content. I'm thinking maybe using the existing framework for translations and pushing it further by allowing variables to be put in. For instance, maybe you want to have the variable name in the main title, rather than on a separate line.
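
As a rough illustration of what "allowing variables to be put in" could look like, here is a small base R sketch. Everything in it is an assumption made for the example: the `{placeholder}` syntax, the `fill_heading()` function, and the value names are hypothetical and do not reflect how summarytools currently handles headings or translations.

```r
# Hypothetical sketch: replace {placeholders} in a heading template with values.
fill_heading <- function(template, values) {
  for (key in names(values)) {
    placeholder <- paste0("{", key, "}")
    # fixed = TRUE treats the placeholder as literal text, not as a regex
    template <- gsub(placeholder, as.character(values[[key]]), template,
                     fixed = TRUE)
  }
  template
}

# Variable name placed in the main title rather than on a separate line
fill_heading("Frequencies: {var_name} ({data_frame})",
             list(var_name = "gender", data_frame = "tobacco"))
```

With a mechanism along these lines, translated heading strings could carry their own placeholders, so each language could decide where the variable name goes.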

Quality of Output

To create good output, one must put in some time and effort. I've seen lots of blog entries trying to promote summarytools but unfortunately showing ill-formatted tables (missing linefeeds, odd breaks in table headings, and so on). Working on our own in RStudio is one thing; creating content for the Web is another. I think there's room for improvement here as well, although I'm not sure how to get there. Having the input of Web developers/integrators would be nice.

Performance

The package performs well with small or medium-sized datasets, but when very large datasets are involved, there is some lag. I've identified bottlenecks and tried to optimize as much as I could, but maybe including some C would be the next step. Or maybe even removing some non-essential functionality like checking for barcode data (this is one bottleneck).

Testing

Adding unit tests is a must. Not knowing the extent to which the package will need to be reworked, it may not be the best place to start, though. But if you have some ideas in that area, please share your thoughts!