CRAN Task View Proposal: CompositionalData #58
Comments
Dear Michail, |
The list prepared by Michail includes all packages I know on compositional data analysis and even more. It is really a comprehensive list. I support it. |
Michail, I agree with the others, very nice proposal! I will read a few things in more detail but a couple of quick comments:
|
Achim thanks for your nice comments.
|
Patrice is definitely a good addition, welcome on board. Additionally, it would be good to increase diversity a bit and maybe find two more co-maintainers, ideally a woman and/or someone from a different region/field/application area etc. For the links: DOIs will be more persistent and always resolve to the journal links (which may change over time). arXiv also added DOIs recently. |
Ok, in that case I will add Christophe as well. He is an expert in the CTVs and from a different field, but not a female. |
Dear all,
You are going forward too fast for me. May I read the proposal first and
give some suggestions (also regarding possible maintainers) until tomorrow
afternoon?
Thank you
Best
Matthias
Michail Tsagris ***@***.***> wrote on Thu, 28 Sept 2023, 09:02:
… Ok, in that case I will add Christophe as well. He is an expert in the
CTVs and from a different field, but not a female.
I will change the links with the DOIs everywhere, later on today. What
shall I do about the books?
|
Sure, Matthias, that's perfectly fine. The other CRAN Task View Editors haven't reacted, yet, either. So we're still in the review process stage and are still collecting feedback. |
Re: Michail. Christophe is, of course, a great collaborator...but he is already the principal maintainer of two task views. So if possible I would ask you to reach out to other persons in order to distribute the workload better. Moreover, it would also be good to team up with people that really bring in a different perspective who might be aware of packages/activities/etc that you don't know, yet. So I would encourage you to think about potential co-maintainers and then reach out to them. |
Achim,
I cannot speak for Christophe but on my side, I have a rather good
experience of the R packages related to compositional data since I work
in this field and have tested the top 10 packages. Thanks to my RWsearch
package, detecting new packages on CRAN is easy and allowed the
Distribution task view to expand from 150 packages in 2018 to 250+
packages in 2023. RWsearch also detected the new isopleuros package
(strange name!) that appeared on 2023-05-16 and is still in version 1.
The CoDa community is rather small and the number of packages will not
grow so much. If Matthias accepts to co-maintain the task view, we will
be 3 persons (excluding Christophe). For ladies, we need to contact them
one by one. My idea is to make a call at the next CoDa meeting in July
2024 and ask for a (female) volunteer.
https://www.coda-association.org/en/coda-info/news-info/coda-book-applied-compositional-data-analysis/
Patrice
|
Patrice, thanks for the input! Two quick comments:
|
Achim hello.
|
Thanks for the DOIs! Regarding the maintenance: Maintainers should pick up new and interesting packages, tutorials, etc. that are relevant for the task view. So it's good to have people with different backgrounds (scientific field, methodology vs. applications, geographical region, etc.) who follow what is going on in the R world from their perspective. Then they will notice different relevant innovations. |
In that case can I ask someone from the bioinformatics field to join us? |
Bioinformatics sounds good to me. But maybe wait for Matthias in case he has further suggestions. Adding Bioconductor packages can be done via |
Thanks Achim, |
Sorry to join the conversation so late! Thanks for the proposal, which is indeed very useful @statlink. I agree that the "Bioiformatics/ecology related packages" section (be careful: there is a typo) could maybe be improved. Methods that explicitly use the compositional nature of the data are the standard in metagenomics, and this could be a subsection of this part (I can help you find some packages of interest in addition to the ones that you already cited). For other types of omics, such as sequencing data in general, it is less standard but sometimes useful for some tasks (for instance, the bioconductor package |
In addition, I am far from being an expert, but some omics data obtained from spectrometers (proteomics, metabolomics) are also often compositional (you cite some packages related to this in your current proposal) and, in my opinion, this should be a different subsection (because the reason for the compositional nature of the data is very different from metagenomics and other kinds of sequencing data). |
Tuxette hi. I am not an expert either, and I included all of them in one section because this is a different field to mine. |
I can help you sort this section (if you agree of course). I'll try to do that next week if that works for you? |
We need Achim to agree with this also. Because in that case you would have to be a co-maintainer. |
Nathalie @tuxette is a CRAN Task View Editor - like myself. And we help to improve task view proposals while they are under review, so that we can eventually approve them. (See also the proposal guidelines.) |
Achim I am happy if she joins us alongside Patrice. |
Dear all Thanks for this initiative. Really great to see so many (new) packages in this field and you did a great job finding and listing them. In short: In long:
I also disagree with the second sentence: "The most popular approach is to use the logarithm transformation applied to ratios of the variables, initially suggested by Aitchison (1982). However this approach has drawbacks and for this many alternative transformations have been developed throughout the years." You most probably meant the additive log-ratio and centered log-ratio transformations, and by "many" you most probably mean only the isometric log-ratio transformation (plus some "exotic" ones), since most of the other power transformations do not fulfill the principles of compositional data analysis. I would thus recommend: "The most popular approach is to apply a log-ratio analysis, initially suggested by Aitchison (1982)". From these points, you may see that I propose that at least one guy from the inner circle of CoDa should be included in the task view. This could be, e.g., Karel Hron or, in case you need a woman, Kamila Facevicova. They could be the CoDa police ;-)
I am sorry to be so critical. Despite being critical, I really look forward to such a task view, but my impression is that the current version needs a lot of discussion and rewriting and also needs people from the inner circle (e.g. some of those I mentioned in my points (1) and (7)). |
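For readers less familiar with the transformations named in the comment above, the standard textbook definitions can be written out as follows. This is only a reference sketch, not text from the proposal; the symbols x, D, g and V are notation introduced here for illustration.

```latex
% x = (x_1, ..., x_D): a composition with strictly positive parts
\begin{align*}
\operatorname{alr}(\mathbf{x}) &= \left( \ln\frac{x_1}{x_D}, \ldots, \ln\frac{x_{D-1}}{x_D} \right), \\
\operatorname{clr}(\mathbf{x}) &= \left( \ln\frac{x_1}{g(\mathbf{x})}, \ldots, \ln\frac{x_D}{g(\mathbf{x})} \right),
\qquad g(\mathbf{x}) = \Bigl( \prod_{i=1}^{D} x_i \Bigr)^{1/D}, \\
\operatorname{ilr}(\mathbf{x}) &= \mathbf{V}^{\top} \operatorname{clr}(\mathbf{x}),
\qquad \mathbf{V} \in \mathbb{R}^{D \times (D-1)},\;
\mathbf{V}^{\top}\mathbf{V} = \mathbf{I}_{D-1},\;
\mathbf{V}^{\top}\mathbf{1}_D = \mathbf{0}.
\end{align*}
```

Here g(x) is the geometric mean of the parts, and V is any matrix whose columns form an orthonormal basis of the clr hyperplane, so the ilr coordinates depend on the basis chosen.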
I am afraid that would be too much for me but I can help in organizing things with the "bioinformatics" part. However, maybe first, I think that Matthias's comments above have to be accounted for. I agree with most of them (but I am not an expert in CoDa), especially with comment 6 (which is in line with my previous comment) and also with the fact that the description of packages is too long. |
Matthias @matthias-da, thank you for the thorough feedback, this is very much appreciated...and exactly what I would have hoped for. I agree with Nathalie @tuxette that this feedback should be incorporated first. Michail @statlink, Matthias' feedback reflects why we push for a diverse team of co-maintainers. What feels completely obvious and natural for some readers might feel awkward for others. So rather than pushing for one side or the other, we try to make the task view accessible for all sorts of different readers from different backgrounds. Hence, establishing a mixed team is a good idea. |
I agree.
Patrice
|
Thanks @matthias-da . Regarding your points:
regarding your point 3., 5.,..., do you have a proposal for the structure/outline of the view? |
ad 1. I wrote the police with a ;-) The theory is not that simple in compositional data analysis and I outlined pitfalls in the intro. Thus - in my point of view - the CTV would benefit from someone who has outstanding theoretical knowledge (and is using R in daily business and is from the "inner circle" of the compositional data community). In my point of view, somebody from the "Viennese/Czech group" (Peter Filzmoser, Karel Hron, Matthias Templ) and/or from the "German group" (best suited from this group is Raimon Tolosana-Delgado) and/or from the "Girona group" (best suited from this group is Javier Palarea-Albaladejo) should be part of it; at least this would be natural looking at their achievements in the field. This doesn't mean that you are not experts, it's just to have somebody on board with the traditional (log-ratio analysis) view.
ad 2. Personally, I would create one section "Rounded zeros, structural zeros, count zeros and missing values" and make paragraphs for all these issues. One might also give the name "Preprocessing of compositional data" as an alternative.
ad 3. As already written in my point (8), unfortunately, I have no answer to this question, but it should be discussed. Whenever the principles are introduced in the beginning, one probably should mark methods that do not fulfill the three key principles of CoDa (scale invariance, sub-compositional coherence (including subcompositional dominance and ratio preservingness), and permutation invariance). I see this as an open question of how to deal with this. I tend to not discuss this matter in the CTV, because it would involve a deep dive into all methods listed.
Regarding the outline: I am not sure about the structure. I see several possibilities. One lists packages according to the type of methods (such as regression methods, compositional tables, robust methods, visualization, high-dimensional data, ...), and the other one lists packages (also) based on fields of application (such as omics science and bioinformatics, chemometrics, ecology, ...). Maybe something like this?
However, there are other methods like cluster analysis, discriminant analysis and classification methods, principal component analysis, and correlation analysis. Why would they be less important than "regression analysis", for example? So should one extend the above list with another (at least) 4 sections on these methods? And why not also have a section on log-ratio (and other) transformations in the beginning? One problem is also maybe that the packages compositions and robCompositions, for example, could be listed in almost all sections. I think this all needs further discussion, and I am afraid that it might need time to find a good solution. Another idea is to have a similar structure on sections like the sections in the books of CoDa:
|
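The three principles named in the comment above (scale invariance, permutation invariance, subcompositional coherence) can be illustrated numerically. The following is a minimal sketch in plain Python, not code from any of the R packages under discussion; the helper names `closure`, `clr` and `all_close` and the example composition are assumptions made for illustration only.

```python
import math


def closure(x, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(x)
    return [total * v / s for v in x]


def clr(x):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [lv - mean_log for lv in logs]


def all_close(a, b, tol=1e-12):
    """Element-wise comparison that tolerates floating-point rounding."""
    return len(a) == len(b) and all(
        math.isclose(u, v, abs_tol=tol) for u, v in zip(a, b)
    )


x = [0.2, 0.3, 0.5]            # hypothetical 3-part composition
scaled = [100 * v for v in x]  # the same composition expressed in percent

# Scale invariance: multiplying all parts by a constant leaves clr unchanged.
scale_invariant = all_close(clr(x), clr(scaled))

# Permutation invariance: reordering the parts only reorders the coordinates.
order = [2, 0, 1]
permuted = [x[i] for i in order]
permutation_ok = all_close(clr(permuted), [clr(x)[i] for i in order])

# Subcompositional coherence: the log-ratio of two parts is the same in the
# full composition and in the re-closed subcomposition of those two parts.
sub = closure(x[:2])
coherent = math.isclose(
    math.log(sub[0] / sub[1]), math.log(x[0] / x[1]), abs_tol=1e-12
)

print(scale_invariant, permutation_ok, coherent)
```

The same checks could be run against any other transformation to see which of the principles it satisfies, which is the kind of "mark" on methods the comment raises as an open question.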
Dear Achim, dear all,
I started rewriting the task view this weekend, taking into consideration
the useful remarks from Matthias.
This is a side activity for me and I plan to complete the new version by
the end of the week. Please give me time.
I will also wait for Nathalie's suggestions and then add the suggested
packages to the task view.
Let's wait for the second draft to be completed before we seek new
contributors.
I have to leave and will be out of my office the full day. I will be
able to read your remarks only in the evening.
Best regards to all.
Patrice Kiener
On 29/09/2023 at 21:01, Matthias Templ wrote:
…
ad 1. I wrote the police with a ;-) The theory is not that simple in
compositional data analysis and I outlined pitfalls in the intro. Thus
- in my point of view - the ctv would benefit from someone who has
outstanding theoretical knowledge (and is using R in daily business
and is from the "inner circle" of the compositional data community).
In my point of view, somebody from the "Viennese/Czech group" (Peter
Filzmoser, Karel Hron, Matthias Templ) or/and from the "German group"
(best suited from this group is Raimon Tolosana-Delgado) or/and from
the "Girona group" (best suited from this group is Javier
Palarea-Albaladejo) should be part of it, at least this would be
natural looking at their achievements in the field. This doesn't mean
that you are not experts, it's just to have somebody on board with the
/traditional (log-ratio analysis) view/.
ad 2. Personally, I would create one section "Rounded zeros,
structural zeros, count zeros and missing values" and make paragraphs
for all these issues. One might also give the name "Preprocessing of
compositional data" as an alternative.
ad 3. As already written in my point (8), unfortunately, I have no
answer to this question, but it should be discussed. Whenever the
principles are introduced in the beginning, one probably should give a
mark on methods that do not fulfill the three key principles of CoDa
(scale invariance, sub-compositional coherence (including
subcompositional dominance and ratio preservingness), and permutation
invariance). I see this as an open question of how to deal with this.
I tend to not discuss this matter in the CTV, because it would involve
a deep dive into all methods listed.
ad. regarding the outline: I am not sure about the structure. I see
several possibilities. One lists packages according to the type of
methods (such as regression methods, compositional tables, robust
methods, visualization, high-dimensional data, ...), and the other one
lists packages (also) based on applicational fields (such as omics
science and bioinformatics, chemometrics, ecology, ...).
Personally, I think the main categories should be built based on the
kind of methods and there could be some extra sections with very
specialized fields (or even subsume them in the previous sections
and have 8) High-dimensional data as the last section).
Maybe something like this?
1. General purpose packages
2. Robust methods
3. Rounded zeros, structural zeros, count zeros, and missing values
4. Regression modelling
5. Functional data analysis and probability density functions
6. Contingency tables and compositional tables
7. Visualization (?)
8. Special applications in Omics science and bioinformatics (?)
including high-dimensional data (?)
9. Special applications in ecology (?)
However, there are *other methods* like cluster analysis, discriminant
analysis and classification methods, principal component analysis, and
correlation analysis. Why would they be less important than
"regression analysis", for example? So should one extend the above
list with *another (at least) 4 sections on these methods?* And why
not also have a section on log-ratio (and other) transformations in
the beginning? One problem is also maybe that package compositions and
robCompositions, for example, could be listed in almost all sections.
I think this all needs further discussion, and I am afraid that it
might need time to find a good solution.
Another idea is to have a similar structure on sections like the
sections in the books of CoDa:
* https://link.springer.com/book/10.1007/978-3-642-36809-7#toc (a
bit too few sections, but anyhow good to look at it).
* https://link.springer.com/book/10.1007/978-3-319-96422-5#toc (this
would be my choice (but I am biased here) because of a bit more
sections, thus maybe a good start)
* https://www.routledge.com/Compositional-Data-Analysis-in-Practice/Greenacre/p/book/9781138316430
|
Patrice @pkR-pkR thanks for this! There's no rush. |
@pkR-pkR : No rush indeed. Since Matthias has suggested deep modifications, tell me when you have a first version and I'll make my suggestion on that basis (next week at best probably). |
@pkR-pkR : There has been no activity in this discussion since last October. There is no rush, but I'm checking whether you still plan to submit this proposal. |
Alternative: I can imagine completely rewriting this CTV from scratch together with Raimon Tolosana-Delgado and Javier Palarea-Albaladejo. Both are experts in compositional data analysis and R and are well known in the community. What do you think? |
From the viewpoint of the CRAN Task View Editors it would be best if the different approaches to this topic could be resolved unanimously - with contributors from both sides! So maybe - now that some time has passed since the original proposal - you can coordinate a revision that you do jointly and that encompasses ideas from both sides? That would be much preferred over a decision between two different teams of co-maintainers with different ideas. |
Agree. I offered my participation and listed other potential co-authors in October, and that is surely still a good way forward. Best |
Thanks, Matthias, very much appreciated! |
Michail @statlink and Patrice @pkR-pkR, we haven't had any update from you in almost a year. Hence, it's time to close it. Matthias @matthias-da, if you still want to propose something on the same topic, feel free to create a new issue. If you do so, I would ask you to consider including some of the ideas of Michail, Patrice, and Christophe. |
Achim @zeileis, I would then proceed with some co-authors and specialists in the field mentioned earlier. However, we surely need 1-2 months from now to have a version to share. |
Matthias, ok, good, thanks for the follow-up. Just open a new issue for the proposal when you are ready to do so. In the issue please also briefly discuss how you incorporated the ideas from this first proposal. Thanks! |
Hello,
I would like to propose a new CTV named CompositionalData. The CTV is about packages dedicated to compositional data analysis.
The relevant github link is
https://github.com/statlink/CompositionalData
Michail Tsagris