Line plot master ticket: markers and connection lines #602

Open
danicahelb opened this issue Mar 14, 2023 · 3 comments
Comments


danicahelb commented Mar 14, 2023

This is the master ticket for the related line plot tickets VEuPathDB/web-eda#1411, VEuPathDB/web-eda#1519 and VEuPathDB/web-eda#1082, as fixes to one issue are having downstream effects and creating other issues.

I will use this hypothetical example:

X-axis variable: Timepoint

  • Values in the full dataset = 1, 3, 4, 5, 6, 7 (There is no Timepoint = 2)
  • Values in the subset = 1, 4, 5, 6, 7 (There is no Timepoint = 2 in the full dataset, and Timepoint = 3 has been removed from the subset either by directly filtering on Timepoint or indirectly by filtering on other variables in the dataset)

Y-axis variable: Malaria

  • Values in the full dataset = Yes, No, Not tested
  • At Timepoint=4, the study protocol indicated that no participants were tested for malaria (so all participants have Malaria = Not tested at Timepoint = 4)

This is what we want:

  1. No markers
  • There should be no marker at an X-axis value that has no data, whether it is missing from the entire dataset or only from the subset, regardless of how the Y-axis proportion is configured.
  • Ex: no markers at Timepoint = 2, 3. Lines should connect Timepoint 1 to Timepoint 4.
  2. Filled markers
  • There should be a filled marker whenever there IS data for a given X-axis value, as long as there is at least 1 value for the Y-axis in the subset AND the Y-axis proportion contains all possible values in the denominator.
  • Ex: no markers at Timepoint = 2, 3 and filled markers at Timepoint = 1, 4, 5, 6, and 7 when Malaria proportion = Yes/(Yes+No+Not tested). Lines should connect Timepoint 1 to Timepoint 4 to Timepoint 5 to Timepoint 6 to Timepoint 7.
  3. Hollow markers
  • There should be a hollow marker whenever there IS data for a given X-axis value, as long as there is at least 1 value for the Y-axis in the subset BUT the Y-axis proportion does NOT contain all possible values in the denominator.
  • Ex: no markers at Timepoint = 2, 3, a hollow marker at Timepoint = 4, and filled markers at Timepoint = 1, 5, 6, and 7 when Malaria proportion = Yes/(Yes+No). There should be a break in the line whenever there is a hollow marker. So the filled marker at Timepoint = 1 would NOT be connected to Timepoint 4, and Timepoint 4 would NOT be connected to Timepoint 5. But a line would connect Timepoint 5 to Timepoint 6 to Timepoint 7.
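The three rules above can be sketched in a few lines of Python. This is only an illustration of the requested behaviour, not the web-eda implementation: the function names `classify_points` and `line_segments`, the row format, and the sample rows are all made up for the hypothetical Timepoint/Malaria example.

```python
from collections import Counter

def classify_points(x_positions, subset_rows, numerator, denominator):
    """Map each x position to (marker, proportion).

    marker is "none" (no rows in the subset), "hollow" (rows exist but the
    proportion is 0/0), or "filled" (the proportion is defined).
    """
    counts = {}
    for x, y in subset_rows:
        counts.setdefault(x, Counter())[y] += 1

    points = {}
    for x in x_positions:
        at_x = counts.get(x)
        if at_x is None:
            points[x] = ("none", None)
            continue
        denom = sum(at_x[v] for v in denominator)
        if denom == 0:
            points[x] = ("hollow", None)
        else:
            points[x] = ("filled", sum(at_x[v] for v in numerator) / denom)
    return points

def line_segments(points):
    """Group filled markers into connected runs.

    Positions without a marker are skipped over (the line passes through),
    while a hollow marker ends the current run.
    """
    segments, current = [], []
    for x in sorted(points):
        marker = points[x][0]
        if marker == "filled":
            current.append(x)
        elif marker == "hollow":
            if len(current) > 1:
                segments.append(current)
            current = []
    if len(current) > 1:
        segments.append(current)
    return segments

# Hypothetical subset: Timepoint 3 filtered out, nobody tested at Timepoint 4.
subset_rows = [
    (1, "Yes"), (1, "No"),
    (4, "Not tested"), (4, "Not tested"),
    (5, "Yes"), (5, "No"),
    (6, "No"),
    (7, "Yes"),
]
x_positions = range(1, 8)

all_values = classify_points(x_positions, subset_rows, {"Yes"}, {"Yes", "No", "Not tested"})
tested_only = classify_points(x_positions, subset_rows, {"Yes"}, {"Yes", "No"})
```

With the full denominator, Timepoint 4 gets a filled marker at proportion 0 and a single line runs through 1, 4, 5, 6, 7. With Yes/(Yes+No), Timepoint 4 becomes hollow and only 5, 6, 7 are joined.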

danicahelb commented Mar 14, 2023

VEuPathDB/web-eda#1411 was created because markers were appearing at X-axis values that were not in the subset. This made it look like the Y-axis proportion at these X-axis values was equal to 0, when in fact there were no rows of data for them.

VEuPathDB/web-eda#1519 was created because a connecting line should be drawn through the markers. Lines should be drawn even where an X-axis value has data in the full dataset but does NOT have a marker because it has no data in the subset. In that case, the line should be drawn between the 2 markers immediately adjacent to the X-axis value that has no data in the subset.

Unrelated to these tickets, VEuPathDB/web-eda#1082 was created because markers for X-axis values that were IN the subset but did not have any Y-axis values included in the proportion calculation (i.e., had an undefined proportion, 0/0) looked like they had a proportion equal to 0. We wanted some sort of marker at these X-axis values to indicate that the subset does contain data for that X-axis value and that the proportion configuration does not remove this data from the subset. Connecting lines should NEVER go through hollow markers. If 2 filled markers are separated by a hollow marker, there will be a break in the line.


danicahelb commented Mar 14, 2023

"the Y-axis proportion does NOT contain all possible values in the denominator"

This means the same thing as:

"the Y-axis proportion does NOT contain data for any selected values in the denominator"

For example, y values are A, B, C and the proportion is configured as A/(A+B) instead of A/(A+B+C).

Since the numerator must be a subset of the denominator, C cannot be in the numerator either. Therefore rows with y=C will be excluded from the plot (though they remain in the subset).

There can be some values of x where y is only ever equal to C. At these points, y=0/0.

These points where y=0/0 are given hollow markers, with no connecting line through them.

The only time hollow, unconnected points are ever used is where y=0/0.
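To make the difference between a proportion of 0 and an undefined 0/0 concrete, here is a small sketch using the A, B, C example. The helper name `proportion` is hypothetical, and returning `None` for 0/0 is just one possible way to flag the hollow-marker case.

```python
def proportion(y_values, numerator, denominator):
    """Return numerator count / denominator count, or None when it is 0/0."""
    numerator, denominator = set(numerator), set(denominator)
    if not numerator <= denominator:
        # The numerator must be a subset of the denominator.
        raise ValueError("numerator values must also appear in the denominator")
    denom = sum(1 for y in y_values if y in denominator)
    if denom == 0:
        return None  # 0/0: rows exist at this x, but none count toward the proportion
    num = sum(1 for y in y_values if y in numerator)
    return num / denom

# Proportion configured as A/(A+B); rows with y=C are ignored.
only_c = proportion(["C", "C"], {"A"}, {"A", "B"})        # undefined, hollow marker
no_a = proportion(["B", "C"], {"A"}, {"A", "B"})          # a genuine 0, filled marker
mixed = proportion(["A", "B", "C"], {"A"}, {"A", "B"})    # C is excluded from both counts
```

Keeping `None` distinct from `0.0` is what lets the plot show a hollow marker rather than a misleading point at zero.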


bobular commented Mar 14, 2023

Hi @danicahelb - thank you so much for consolidating everything here.

I think there could be some more discussion either in EDA UX or data viz about breaking the lines.

We may want different behaviour depending on direct vs. indirect filtering (e.g. the new filter-aware behaviours we've talked about):

  • directly filtered on the x-axis variable: join the lines (because we assume the user knew what they were doing)
  • x-axis values drop out as a side effect of another filter: make gaps in the line (to alert users to the missing data?)
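For discussion purposes, the proposed filter-aware behaviour could look roughly like the sketch below. Nothing here is decided or implemented. The function name and its three arguments are hypothetical, and it assumes we can tell which missing x-axis values were removed by a direct filter on the x-axis variable.

```python
def segments_with_filter_policy(marked_x, missing_x, directly_filtered_x):
    """Group marked x values into connected runs.

    marked_x: x values that have a filled marker.
    missing_x: x values present in the full dataset but absent from the subset.
    directly_filtered_x: the missing values removed by a filter on the x variable.
    Only indirectly removed values break the line.
    """
    breaks = set(missing_x) - set(directly_filtered_x)
    segments, current = [], []
    for x in sorted(set(marked_x) | breaks):
        if x in breaks:
            if len(current) > 1:
                segments.append(current)
            current = []
        else:
            current.append(x)
    if len(current) > 1:
        segments.append(current)
    return segments

marked = [1, 4, 5, 6, 7]

# Timepoint 3 removed by filtering directly on Timepoint: the line is joined.
direct = segments_with_filter_policy(marked, [3], [3])

# Timepoint 3 dropped out because of a filter on another variable: gap at 3.
indirect = segments_with_filter_policy(marked, [3], [])
```

In the indirect case, Timepoint 1 ends up as an isolated marker, which may or may not be the alert we want users to see.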

Not 100% sure of the reasoning myself, and only briefly discussed with Danielle, so I suggest it goes to committee!

2 participants