cluster ordering issues #21

Closed
RvV1979 opened this issue Jul 2, 2024 · 3 comments

RvV1979 commented Jul 2, 2024

I am using mapmixture on a data set with more than 9 clusters and find there are some issues with cluster ordering and colors. Below is a minimal example illustrating the issues.

# Prepare an admixture data frame in which site1 is mainly Cluster1,
# site2 is mainly Cluster2, and site3 is mainly Cluster3
library(mapmixture)

admixture_df <- data.frame(
  Site = c("site1", "site2", "site3"),
  Ind = c("ind1", "ind2", "ind3"),
  Cluster1 = c(0.9, 0.05, 0.05),
  Cluster2 = c(0.02, 0.8, 0.05),
  Cluster3 = c(0.02, 0.05, 0.7),
  Cluster10 = c(0.02, 0.05, 0.1),
  Cluster11 = c(0.04, 0.05, 0.1)
)

# One color per cluster, in the same order as the cluster columns
cluster_cols <- c("orange", "skyblue", "green4", "gold", "darkblue")
structure_plot(admixture_df = admixture_df, type = "structure", legend = "right", cluster_cols = cluster_cols)

However, the resulting structure plot shows something different, namely that site2 is mainly cluster 10 and site3 cluster 11. This is incorrect; see figure below:
[Figure: structure_plot]
When I plot the admixture percentages on a map, the cluster assignments are correct. However, the colors do not match the order in which they are defined (for example, I intended cluster2 to be skyblue but it is gold); see code and figure below:

# Site coordinates for placing the pie charts on the map
coords_df <- data.frame(Site = c("site1", "site2", "site3"), Longitude = c(5, 15, 25), Latitude = c(10, 20, 30))
mapmixture(admixture_df = admixture_df, coords_df = coords_df, cluster_cols = cluster_cols, expand = TRUE)

[Figure: mapmixture]
What I think happens is that the clusters are internally ordered by name in a strict lexicographic (i.e., not natural) sense, as can be seen in the legend of the mapmixture plot, where clusters 10 and 11 come before 2 and 3. In the mapmixture plot, this means that colors are not applied in the expected order. I believe the same internal ordering happens in the structure plot but, for some reason, the cluster names in the legend stay in the original order, which makes the cluster assignments incorrect.
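
To illustrate the suspected ordering problem outside of mapmixture, here is a minimal base R sketch. It assumes (this is only my hypothesis, not something confirmed from the package source) that colors are assigned by position to the lexicographically sorted cluster names. The variable names `strict`, `natural` and `cols_by_strict` are made up for this illustration.

```r
# Cluster column names and colors as in the example above
clusters <- c("Cluster1", "Cluster2", "Cluster3", "Cluster10", "Cluster11")
cluster_cols <- c("orange", "skyblue", "green4", "gold", "darkblue")

# Strict (lexicographic) ordering; method = "radix" sorts in the C locale,
# so the result does not depend on the session's collation settings
strict <- sort(clusters, method = "radix")
print(strict)

# Natural ordering: sort by the numeric suffix of each name
natural <- clusters[order(as.integer(gsub("[^0-9]", "", clusters)))]
print(natural)

# Hypothesis: colors are matched by position to the strictly sorted names
cols_by_strict <- setNames(cluster_cols, strict)
print(cols_by_strict["Cluster2"])
```

Under that assumption Cluster2 ends up with "gold" instead of the intended "skyblue", which matches what I see on the map.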

Thanks


RvV1979 commented Jul 2, 2024

I have found that adding [order(colnames(admixture_df[-c(1,2)]))] to cluster_cols in the mapmixture call makes the colors match their clusters as intended:
mapmixture(admixture_df = admixture_df, coords_df = coords_df, expand = TRUE, cluster_cols = cluster_cols[order(colnames(admixture_df[-c(1, 2)]))])
[Figure: mapmixture_fixed]
However, the legend is still not in the intended (natural) order.
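
A small base R check of why the reordering helps, again under the assumption (not verified against the package code) that the supplied colors are applied by position to the lexicographically sorted cluster names. The names `idx`, `assigned` and `intended` are only for this sketch, and `method = "radix"` is used so the ordering does not depend on the locale.

```r
cluster_names <- c("Cluster1", "Cluster2", "Cluster3", "Cluster10", "Cluster11")
cluster_cols <- c("orange", "skyblue", "green4", "gold", "darkblue")

# Positions of the columns in lexicographic order: 1 4 5 2 3
idx <- order(cluster_names, method = "radix")

# Workaround: permute the colors the same way the names are (presumably) sorted
reordered_cols <- cluster_cols[idx]

# Assumed internal behavior: the i-th color goes to the i-th sorted name
assigned <- setNames(reordered_cols, cluster_names[idx])

# Intended mapping: the i-th color goes to the i-th cluster column
intended <- setNames(cluster_cols, cluster_names)

# After the workaround, every cluster gets its intended color
print(assigned[cluster_names])
stopifnot(identical(assigned[cluster_names], intended))
```

This only corrects the color-to-cluster mapping; it does not change the order in which the clusters appear in the legend.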

Tom-Jenkins added a commit that referenced this issue Jul 2, 2024
Tom-Jenkins (Owner) commented

Hello @RvV1979,

Thank you for pointing this out. I think this was indeed a bug, which occurred when the cluster column names contain numbers and are therefore sorted in an unexpected order. I have pushed a fix to GitHub which will hopefully address this.
e2b9857

Could you please re-install the development version from GitHub and re-run your original code? Let me know if this works on both your subset and your full data.

Thanks, Tom


RvV1979 commented Jul 2, 2024

Thanks for the quick reply and fix. I did some tests and everything seems to work as expected.

RvV1979 closed this as completed Jul 2, 2024