duplicate id–axis pairings should only throw errors when found within PANELs #65

Closed
liupfskygre opened this issue Aug 17, 2020 · 8 comments

@liupfskygre

Hi,
I ran into this error with ggalluvial 0.12.1 when running the following code. The same code worked a few days ago, and I could not figure out why it fails now.

summarized_arch_for_alluvial<-read.delim("alluvia.txt",header=T, check.names = FALSE)

is_alluvia_form(summarized_arch_for_alluvial, weight = "mean") 
#>TRUE

color_gene<-c("#6A51A3","#9E9AC8","#7F2704","#D94801","#FD8D3C","#FDD0A2","#08519C","#4292C6","#9ECAE1","#DEEBF7","#00441B","#238B45","#74C476")

ggplot(data = summarized_arch_for_alluvial,
       aes(x = Month, y = mean, alluvium = gene_type_META)) +
  geom_alluvium(aes(fill = gene_type_META, colour = gene_type_META),
                alpha = .75, decreasing = FALSE) +
  theme_bw() +
  scale_fill_manual(values = color_gene) +
  scale_color_manual(values = color_gene) +
  theme(axis.text.x = element_text(angle = -45, hjust = 0),
        text = element_text(size = 16)) +
  facet_grid(Depth ~ Eco_sites, scales = "fixed") +
  theme(legend.position = "bottom") +
  theme(legend.text = element_text(size = 14))

#error info
Error in f(...) :
Data is not in a recognized alluvial form (see help('alluvial-data') for details).

#testing
is_alluvia_form(summarized_arch_for_alluvial)
Missing alluvia for some stratum combinations.
[1] TRUE

but when I run the example code here: https://cran.r-project.org/web/packages/ggalluvial/vignettes/ggalluvial.html

data(Refugees, package = "alluvial")
country_regions <- c(
  Afghanistan = "Middle East",
  Burundi = "Central Africa",
  `Congo DRC` = "Central Africa",
  Iraq = "Middle East",
  Myanmar = "Southeast Asia",
  Palestine = "Middle East",
  Somalia = "Horn of Africa",
  Sudan = "Central Africa",
  Syria = "Middle East",
  Vietnam = "Southeast Asia"
)
Refugees$region <- country_regions[Refugees$country]
ggplot(data = Refugees,
       aes(x = year, y = refugees, alluvium = country)) +
  geom_alluvium(aes(fill = country, colour = country),
                alpha = .75, decreasing = FALSE) +
  scale_x_continuous(breaks = seq(2003, 2013, 2)) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = -30, hjust = 0)) +
  scale_fill_brewer(type = "qual", palette = "Set3") +
  scale_color_brewer(type = "qual", palette = "Set3") +
  facet_wrap(~ region, scales = "fixed") +
  ggtitle("refugee volume by country and region of origin")

it all runs fine (strange)

#testing
is_alluvia_form(Refugees)
Missing alluvia for some stratum combinations.
[1] TRUE

Since my dataset has a structure similar to the example dataset, I am not sure what is going wrong here.

#> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] remotes_2.2.0 vegan_2.5-6 lattice_0.20-41 permute_0.9-5 RColorBrewer_1.1-2 colorspace_1.4-1
[7] ggalluvial_0.12.1 phyloseq_1.30.0 hrbrthemes_0.8.0 viridis_0.5.1 viridisLite_0.3.0 forcats_0.5.0
[13] stringr_1.4.0 dplyr_1.0.1 purrr_0.3.4 readr_1.3.1 tidyr_1.1.1 tibble_3.0.3
[19] tidyverse_1.3.0 ggplot2_3.3.2 alluvial_0.1-2

#example file here
alluvia.txt

thanks!

@corybrunson
Owner

corybrunson commented Aug 17, 2020

Hi @liupfskygre and thank you for raising the issue. I'm able to replicate the problem: Your code runs fine in v0.11.3 (the most recent before v0.12.0), but it fails in v0.12.0 as well as v0.12.1. You can revert to v0.11.3 for the time being as follows:

remotes::install_version("ggalluvial", version = "0.11.3")

In fact, the problem seems to have arisen at commit d31ff48. As of this commit, is_lodes_form() (which is used inside the stats) returns FALSE if any combination of id and axis appears in more than one row of the data—that is, if any alluvium flows through the same axis more than once. It is this checking function, not is_alluvia_form(), that corresponds to setting x, alluvium, and stratum in the aes() part of the ggplot() call, which is why the problem is not detected in the above code.

However, i see what you're trying to do: The same alluvia flow across the same axes in different facets, so in fact there's no conceptual problem with translating the data into an alluvial plot. Rather, the problem is that the check that no duplicate id–axis pairings exist is performed without first grouping the data by the faceting variable. You're right that it's a bug. I'll come back to this within a couple of weeks and send a patch to CRAN ASAP.
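To make the distinction concrete, here is a minimal base R sketch of an ungrouped versus a per-facet duplicate check. It is only an illustration of the idea, not ggalluvial's internal code, and the data frame `toy` and its column names are hypothetical.

```r
# Toy lodes-form data (hypothetical names): the same alluvia ("a", "b")
# cross the same axes ("Jan", "Feb") in two facets ("site1", "site2").
toy <- data.frame(
  facet    = rep(c("site1", "site2"), each = 4),
  alluvium = rep(c("a", "a", "b", "b"), times = 2),
  x        = rep(c("Jan", "Feb"), times = 4),
  y        = c(1, 2, 3, 4, 5, 6, 7, 8)
)

# Ungrouped check: every (alluvium, x) pair occurs once in each facet,
# so the pairs repeat when the whole data frame is considered at once.
dup_overall <- anyDuplicated(toy[, c("alluvium", "x")]) > 0

# Per-facet check: add the faceting variable to the key, so a pair only
# counts as duplicated when it repeats within a single facet.
dup_within_facet <- anyDuplicated(toy[, c("facet", "alluvium", "x")]) > 0

print(c(overall = dup_overall, within_facet = dup_within_facet))
```

On data like this, only the ungrouped check reports duplicates, which matches the behavior described above: the pairings are legitimate within each facet.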

@corybrunson corybrunson changed the title Data is not in a recognized alluvial form (see help('alluvial-data') for details). duplicate id–axis pairings should only throw errors when found within PANELs Aug 17, 2020
@liupfskygre
Author

liupfskygre commented Aug 18, 2020 via email

@corybrunson
Owner

@liupfskygre a patch is on its way to CRAN as v0.12.2. I verified that it works on the example you shared, but i should have also asked you if you would give it a try on your original problem. If you have the chance, would you install from main and see if the problem is resolved? Here's how to install if the patch is not on CRAN yet:

remotes::install_github("corybrunson/ggalluvial")

@epjungd

epjungd commented Feb 24, 2022

Hello!
I'm having trouble making a plot. I want to have 3 axes, as shown in the plot:
alluvial_loans_fdi1

I did this plot and it worked, but when I use a dataset in the long form (as suggested in issue #72) the plot doesn't show the alluvia.

This is a part of my data: top30companies.csv

This is my code:

library(tidyverse)
library(ggplot2)
library(reprex)
library(ggalluvial)
library(readxl)

top30companies_reprex <- read_csv("top30companies.csv")

top30companies_reprex$variable <- factor(top30companies_reprex$variable, levels = c("corporate_name", "country_name", "bank_name"))

top30companies_reprex$variable2 <- factor(top30companies_reprex$variable2, levels = c("Company", "Country", "Bank"))

is_lodes_form(top30companies_reprex,
              key = "variable",
              value = "value",
              id = "group_strata")

ggplot(top30companies_reprex,
       aes(x = variable2,
           stratum = value,
           alluvium = group_strata,
           fill = value,
           y = freq,
           label = value)) +
  geom_flow(stat = "alluvium") +
  geom_stratum(na.rm = TRUE) +
  guides(fill = FALSE) +
  geom_fit_text(stat = "stratum",
                width = 1/4,
                min.size = 3,
                reflow = T,
                grow = T)

This is the plot without the alluvia:
alluvial_reprex

I don't know how to go on. Maybe the problem is related to the warning message that I get when I use the is_lodes_form command to check my data frame: "Missing id-axis pairings (at some sites)."

I would appreciate any kind of help!
Thanks in advance

@corybrunson
Owner

Hi @epjungd, thank you for the very clear description of the problem. I cannot get to it right now but should have time within a week.

I see why you commented on this issue rather than open a new issue. Depending on how it plays out, i might ask to make this a new issue.

@corybrunson
Owner

Hi @epjungd, i hope i've resolved the issue. Some alternative code is below, with changes commented to explain what i did.

Basically, i realized that the column "group_strata" took a unique value for each row, making it unsuitable for the alluvium aesthetic, since the alluvium should be the identifier that links values taken at different axes. It looked like "unique_alluvium_entries" was better suited for this role, but it failed the is_lodes_form() test due to the presence of duplicate pairings of the alluvium identifier and the axis aesthetic "variable". There turned out to be only one duplicated identifier, however, so i removed it from the data set and fed the result into your ggplot() call, with only the aforementioned aesthetic specifications changed.
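As a rough illustration of why a per-row identifier cannot serve as the alluvium, the base R sketch below counts how many axes each identifier value reaches. The data frame `toy`, its columns, and the helper `axes_per_id()` are all hypothetical and are not taken from the attached CSV or from ggalluvial.

```r
# Hypothetical long-form data with three axes and two candidate id columns.
toy <- data.frame(
  axis    = rep(c("Company", "Country", "Bank"), times = 2),
  value   = c("Acme", "Chile", "BankOne", "Bolt", "Peru", "BankTwo"),
  row_id  = 1:6,                  # unique per row, like "group_strata"
  flow_id = rep(1:2, each = 3)    # shared by the rows of one flow
)

# Hypothetical helper: number of distinct axes each id value appears at.
axes_per_id <- function(id, axis) {
  tapply(axis, id, function(a) length(unique(a)))
}

# Each row_id sits at a single axis, so there is nothing to connect.
print(axes_per_id(toy$row_id, toy$axis))

# Each flow_id spans all three axes, so a flow can be drawn between them.
print(axes_per_id(toy$flow_id, toy$axis))
```

An identifier whose values each reach only one axis gives the strata but no flows between them, which is consistent with the plot that showed no alluvia.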

Please let me know if this is not what you're after!

library(tidyverse)
#> Warning: package 'tidyr' was built under R version 4.1.2
#> Warning: package 'readr' was built under R version 4.1.2
#> Warning: package 'dplyr' was built under R version 4.1.2
library(ggplot2)
library(reprex)
library(ggalluvial)
library(readxl)
# attach library for fit-text geom
library(ggfittext)
# set working directory to ggalluvial local repo
setwd("~/Documents/software/R/ggalluvial/")

top30companies_reprex <- read_csv("sandbox/issues/top30companies.csv")
#> Rows: 351 Columns: 6
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (4): variable, value, variable2, group_strata
#> dbl (2): unique_alluvium_entries, freq
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

top30companies_reprex$variable <- factor(
  top30companies_reprex$variable,
  levels = c("corporate_name", "country_name", "bank_name")
)

top30companies_reprex$variable2 <- factor(
  top30companies_reprex$variable2,
  levels = c("Company", "Country", "Bank")
)

# check whether using `unique_alluvium_entries` as id satisfies lodes form
is_lodes_form(top30companies_reprex,
              key = "variable",
              value = "value",
              id = "unique_alluvium_entries")
#> Duplicated id-axis pairings.
#> [1] FALSE
# how many duplicated id-axis pairings are there?
top30companies_reprex %>%
  count(variable, unique_alluvium_entries, name = "count") %>%
  count(count)
#> # A tibble: 2 × 2
#>   count     n
#>   <int> <int>
#> 1     1   345
#> 2     2     3
# which `unique_alluvium_entries` appear twice at any axis?
top30companies_reprex %>%
  group_by(unique_alluvium_entries) %>%
  add_count(name = "count") %>%
  filter(count > 3L) %>%
  select(unique_alluvium_entries, variable, count)
#> # A tibble: 6 × 3
#> # Groups:   unique_alluvium_entries [1]
#>   unique_alluvium_entries variable       count
#>                     <dbl> <fct>          <int>
#> 1                      33 country_name       6
#> 2                      33 corporate_name     6
#> 3                      33 bank_name          6
#> 4                      33 corporate_name     6
#> 5                      33 country_name       6
#> 6                      33 bank_name          6
# remove 33 from consideration and render the alluvial plot
top30companies_reprex %>%
  filter(unique_alluvium_entries != 33) %>%
  ggplot(aes(x = variable2,
             stratum = value,
             alluvium = unique_alluvium_entries,
             fill = value,
             y = freq,
             label = value)) +
  geom_flow(stat = "alluvium") +
  geom_stratum(na.rm = TRUE) +
  guides(fill = FALSE) +
  geom_fit_text(stat = "stratum",
                width = 1/4,
                min.size = 3,
                reflow = T,
                grow = T)
#> Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
#> "none")` instead.
#> Warning: Removed 2 rows containing missing values (geom_fit_text).

Created on 2022-02-25 by the reprex package (v2.0.1)

@epjungd

epjungd commented Feb 28, 2022

Thank you very much, I could finally get my plot with your indications!

I leave you the final plot that I was trying to do :)

alluvial_loans_fdi6

@corybrunson
Owner

You're welcome! And that is an intense plot! Feel free to raise a new issue if you encounter a new problem.
