Setting pData factor levels #297
Comments
Because each column of |
I certainly use it a lot to order samples by experiment variables, it would be great if we could keep this capability. |
Do you have to use factor? Ordering doesn't necessarily require that. |
How would you do it? I need the ordering to be retained by the GatingSet and it should not affect the sample order. |
I don't fully understand your intention. Please give a concrete use case showing that character isn't sufficient and a factor is necessary. |
@DillonHammill Default factors are not ordered. Do you mean that you use |
Well, factor levels always have a default ordering, usually dictated by the values as they appear in the |
No, they are alphabetical:
|
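A minimal base R sketch of that default, using hypothetical treatment labels that are not taken from this thread:

```r
# Hypothetical sample annotations (illustration only)
treatment <- c("Stim", "Control", "Unstim", "Control")

# Default: levels are the sorted unique values, not the order of appearance
f <- factor(treatment)
levels(f)      # "Control" "Stim" "Unstim"
is.ordered(f)  # FALSE

# Order of first appearance has to be requested explicitly
g <- factor(treatment, levels = unique(treatment))
levels(g)      # "Stim" "Control" "Unstim"
```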
Oh, right, by default. For some reason I thought I had run into an exception to that, but the levels must have been manually reordered. Anyway, if for some reason he has a method that arbitrarily reorders the levels, then it would be a problem if that ordering doesn't persist. |
Basically, even if a factor is not ordered (that is, the ordered flag is not set to true and the factor levels should not be considered ordinal), default group ordering information for tables/plots/summaries can still be preserved (which may be related to the use case here). That is
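As a minimal sketch in base R, with hypothetical dose labels:

```r
# Hypothetical dose labels (illustration only)
dose <- c("low", "high", "medium", "low", "high")

# Unordered factor with an explicit level order
f <- factor(dose, levels = c("low", "medium", "high"))
is.ordered(f)   # FALSE: the levels are not treated as ordinal

# Summaries still follow the level order rather than alphabetical order
counts <- table(f)
names(counts)   # "low" "medium" "high"

# A round trip through character drops the custom level order
levels(factor(as.character(f)))  # "high" "low" "medium"
```

The last line shows what is presumably at stake here: once a column is held as plain character, rebuilding the factor falls back to the default sorted levels.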
But in order to preserve that behavior in the |
Thanks for looking into this. As @jacobpwagner mentioned, I primarily use this for summaries and plots. For example, in my plotting function
I also use |
That's kind of what I suspected. @mikejiang , maybe we could add representation of the
of course, this starts to look a bit like we're re-inventing the wheel of |
Just documenting some of our offline discussions: 2 possible approaches: 1. The distributed approach:
do something like this:
This extra information would then also have to be written out to the Pros:
Cons:
2. The centralized approach: Pros:
Cons:
We can discuss run-time efficiency as well, but the crux of the issue may be where the metadata should live. In some ways it's nice to have it attached to the H5 for each sample's cytoframe, but maybe it doesn't make sense to store group-level information (like factor levels) with individual samples. |
I favor the first approach. |
@jacobpwagner, just letting you know that I have a workaround for ordering things now, so this is no longer essential. Users can specify the ordering in a list when plotting, and they can set factor levels on the exported stats prior to plotting with ggplot2. It may actually be beneficial not to change this, as then we don't have to worry about differing column classes. |
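A rough sketch of the second half of that workaround in base R. The `stats` data frame, its column names, and its values are made up for illustration and stand in for whatever table of statistics has been exported:

```r
# Hypothetical exported statistics (values are invented for illustration)
stats <- data.frame(
  name      = c("s1.fcs", "s2.fcs", "s3.fcs"),
  Treatment = c("Stim", "Control", "Unstim"),
  freq      = c(42.1, 3.5, 4.0),
  stringsAsFactors = FALSE
)

# Set the desired group order on the exported table, not on the pData
stats$Treatment <- factor(stats$Treatment,
                          levels = c("Unstim", "Control", "Stim"))

# The row order is untouched; only the level order changes
levels(stats$Treatment)

# Sorting by the factor follows the level order
sorted <- stats[order(stats$Treatment), ]
```

ggplot2 lays out discrete axes and legends in level order, so a factor prepared this way controls the group order in the plot without touching the sample order.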
Thanks for the update @DillonHammill. This was still on my list, but it's good to know it's not urgent for your use case. |
Thanks for using the NEWS.md to track changes, this is very useful. |
Hello, I'd just like to chime in and add support to this use case. I played around with CytoExploreR (thanks @DillonHammill by the way for your development work on this - of course, this extends to the whole cyto suite) and had to update to the development version of flowWorkspace for that purpose. I really like to use ggcyto in conjunction with |
Just updating with a note here that it would also be helpful to store other sorts of type information attached to the pheno data stored at the
Also, we cannot assume that every column that can be converted to a numeric type when loaded back into R should in fact be converted. For example, subject IDs often contain formatted digit strings that are not meant to be converted into simple numeric types. If the stored character ID is "001", type conversion would succeed but reduce it to 1 (integer). Similarly, subject IDs with decimals like "1.24", implying arm 1 subject 24, would be converted to doubles. The point is that we will ultimately likely need type information attached to the pheno data stored at the C++ level in |
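The conversion problem can be reproduced in base R. This is a small sketch with hypothetical IDs, independent of how flowWorkspace actually reloads pheno data:

```r
# Hypothetical subject IDs stored as character
ids <- c("001", "002", "010")

# Automatic conversion succeeds and silently drops the formatting
converted <- type.convert(ids, as.is = TRUE)
class(converted)         # "integer"
converted                # 1 2 10

# Converting back does not restore the original strings
as.character(converted)  # "1" "2" "10"

# IDs that merely look like decimals become doubles
arm_ids <- c("1.24", "2.03")
class(type.convert(arm_ids, as.is = TRUE))  # "numeric"
```

Since the stored strings alone cannot distinguish an ID from a measurement, the intended type has to be recorded somewhere alongside the values.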
I arrived here because I had the issue that To my understanding Maybe you could improve the |
Hi @mikejiang & @jacobpwagner ,
I am having trouble setting factor levels of pData variables using the latest flowWorkspace. I used to be able to do this:
Now the column retains the original class and is not converted to a factor.
Any idea why this is happening?