Improved colour themes from new data sources #419

Open
jrosindell opened this issue Oct 7, 2021 · 7 comments

Comments

@jrosindell
Member

These could be from the advanced search or from other features.

For example, we could add a pin on the tree itself and/or colour all branches descending from a higher taxon if that taxon was selected for highlighting.

@jrosindell
Member Author

See also #396 and #182

@jrosindell
Member Author

After some discussion with @hyanwong and with possible users of this feature I wanted to open a discussion about how it could work and specify a small project around it that can be costed.

It is clear that the most sensible approach for highlighting regions of the tree depends on how simple it is to define those regions, and also on how much of the tree is involved in the highlights. For simple cases where only one species or clade (possibly including all direct ancestor nodes) is to be highlighted, it's relatively easy to pass around the information defining that region, and the tree can be expanded to include the highlighted area. What we're interested in scoping out here is a solution for highlighting any generic region of the tree, e.g. a species list consisting of tens of thousands of species.

One approach is to expand the database with another field for leaves and nodes to say whether they are in this region or not, but that's really very ugly, obliges OneZoom to play a big role in defining the regions (which may be managed by and for third parties) and doesn't scale well to cover large numbers of regions with different use cases. That's out, but I'm recording the possibility as part of the thought process.

The solution suggested is to have a string of digits (rather like the tree topology and cut map) that lines up with ordered leaves and ordered nodes and tells us immediately, for every leaf and node, whether it is within the highlighted region or not. These strings can be stored in compressed files (I'll call each one a tree region file), rather like the tree topology file we already have. This makes them static and easy to load and express on a tree view. The downside is that tree region files will need to be remade whenever the tree itself is remade, as the leaf and node positions will change. We'd need unit tests to make sure it all works, because a small mistake in the data could easily lead to nonsensical highlights that bear no resemblance to the intended region of the tree. Finally, if we later had more than one tree (e.g. a bigger tree including extinct species and maybe even a third tree including selected extinct species but not all) then we'd need one version of these static files per tree (three in that example) and would need to serve the correct one to the visualisation.
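
A minimal sketch of the tree region file idea, assuming one digit per leaf in tree order and zlib as the compression (both are assumptions; the real file could equally cover ordered nodes or use a different encoding). The function names `build_region_bytes` and `is_highlighted` are hypothetical.

```python
import zlib
from typing import Iterable


def build_region_bytes(n_leaves: int, highlighted: Iterable[int]) -> bytes:
    """Return a compressed string of '0'/'1' digits, one per ordered leaf."""
    flags = bytearray(b"0" * n_leaves)
    for idx in highlighted:
        if not 0 <= idx < n_leaves:
            # A bad index would silently highlight the wrong part of the tree
            raise ValueError(f"leaf index {idx} is outside the tree")
        flags[idx] = ord("1")
    return zlib.compress(bytes(flags))


def is_highlighted(compressed: bytes, leaf_index: int) -> bool:
    """Look up a single leaf in a compressed region string."""
    flags = zlib.decompress(compressed)
    return flags[leaf_index] == ord("1")
```

For example, `build_region_bytes(5, {1, 3})` decompresses to the digits `01010`. A real client would decompress once and keep the result in memory rather than decompressing per lookup.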

In terms of the graphic approach to highlighting, I think what we have now for advanced search works OK but there should be an option to add flags to highlighted species as well and there should be an option to fully colour the tree by these highlights (rather than allow any colour scheme and attempt the highlights over the top as it is now for advanced search and common ancestor marking). I guess that graphical part is probably reasonably clear and well defined.

We'd also need some kind of JSON format for the source information defining the tree region (I'll call it a tree region source) from which the tree region files can be built - I think it's fine just to make the tree region source an ordered list of OTT IDs with a pair of flags on each to say that you include (or not) all descendants / ancestors of that OTT in the region as well.
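
One possible shape for a tree region source, sketched under assumptions: the key names `ott`, `descendants` and `ancestors`, the `RegionEntry` class and the placeholder IDs 111 and 222 are all hypothetical and only illustrate "an ordered list of OTT IDs with a pair of flags on each".

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RegionEntry:
    ott: int
    include_descendants: bool
    include_ancestors: bool


def parse_region_source(text: str) -> List[RegionEntry]:
    """Parse a JSON list of entries, keeping the order given by the source."""
    entries = []
    for item in json.loads(text):
        entries.append(
            RegionEntry(
                ott=int(item["ott"]),
                # Assumed default: a missing flag means "do not include"
                include_descendants=bool(item.get("descendants", False)),
                include_ancestors=bool(item.get("ancestors", False)),
            )
        )
    return entries


# Placeholder OTT IDs, not real taxa
EXAMPLE_SOURCE = """
[
  {"ott": 111, "descendants": true, "ancestors": false},
  {"ott": 222, "ancestors": true}
]
"""
```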

The thorniest issue, and the part that Yan and I had most difficulty specifying, is the process of creating and updating the tree region files, which need to be rebuilt whenever the tree region source from the third party changes and also whenever the tree is rebuilt. We'd need to create an API to which you pass the URL of a JSON tree region source and get back a region map file. One key issue there is how tree region files would get cached, and the extent to which we need to manage that process to make it performant vs. letting the server do it all itself. Should we store all the tree region source files and rebuild all the tree region files as static files whenever the tree is built? Should we not store the tree region sources but store the tree region files on the OneZoom server? Or should we provide a service to create the tree region files and let third parties store them / rebuild them as needed?
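
Whichever storage option is chosen, a cached tree region file becomes stale when either the tree or the source changes. A minimal sketch of a cache key that captures both, assuming the tree build exposes some version string (the `tree_version` parameter and the function name are hypothetical):

```python
import hashlib


def region_cache_key(tree_version: str, source_json: str) -> str:
    """Key that changes if either the tree build or the region source changes."""
    digest = hashlib.sha256()
    digest.update(tree_version.encode("utf-8"))
    digest.update(b"\0")  # separator so the two parts cannot run together
    digest.update(source_json.encode("utf-8"))
    return digest.hexdigest()
```

A lookup that misses on this key would then trigger a rebuild, regardless of who stores the resulting file.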

@jrosindell
Member Author

Scoping this out further for a specific project.

  • Binary mappings to leaves and branches which tells if they are highlighted or not based on a static file like the cut map which is specific to the tree and includes a check to prevent it from being displayed
  • A system to make these mappings for a tree based on a list of OTTs that we store or that comes from an external URL. The system would highlight the OTT IDs listed and any ancestors of those IDs
  • A system for storing those mappings as static compressed objects with some information about them which enables us to organise multiple mappings and multiple trees. Not sure if the information about the mappings should be in a database but maybe that's going too far and hard coded is enough.
  • Integrate the building of these mappings into the tree building pipeline and permit the mappings to be updated (perhaps by cron job) even when the tree hasn't been rebuilt.
  • We'd need to be able to have more than one binary mapping displayed simultaneously (though not that many). This part should be alright, as our highlighting system already enables multiple highlights on branches
  • We would need to revamp the colouring system as described in Tidy up tree colouring styles #695
  • The highlighted regions would need to be accessible from user settings, URL and tours (probably this is easy as all three work from the URL)
  • The museum display kiosk version should support the same features as well
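
The "listed OTT IDs and any ancestors" rule from the list above could be sketched as follows, assuming the tree is available as a child-to-parent dictionary with the root mapped to `None` (an assumption for illustration; the function name and that representation are hypothetical):

```python
from typing import Dict, Iterable, Optional, Set


def highlighted_ids(
    otts: Iterable[int], parent_of: Dict[int, Optional[int]]
) -> Set[int]:
    """Return the listed IDs plus every ancestor of each listed ID."""
    marked: Set[int] = set()
    for ott in otts:
        if ott not in parent_of:
            continue  # ID is not in this tree, so there is nothing to mark
        node: Optional[int] = ott
        # Stop early at an already-marked node: its ancestors are marked too
        while node is not None and node not in marked:
            marked.add(node)
            node = parent_of.get(node)
    return marked
```

The early stop keeps the walk cheap for large lists, since each node is visited at most once across the whole input.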

@lentinj
Collaborator

lentinj commented Jul 19, 2024

> Binary mappings to leaves and branches which tells if they are highlighted or not

Highlighting a bunch of individual things (vs. things and their descendants) has come up already, but a binary map would only make sense if you're highlighting a significant proportion of the tree, which seems unlikely (but I could be wrong).

Having an external URL to define the list of OTTs (vs. stuffing in an incredibly long URL) sounds sensible, even if it creates some weirdness when that list is refreshed. Presumably it would need to be Pinpoints rather than OTTs to cope with highlighting nodes without an OTT?

Caching the results of the Pinpoint -> OZid API lookup (possibly manually rather than via NGINX, so we have more control of expiry) would be a lot simpler than building this into the tree-building pipeline, and could potentially benefit other parts of the site where we do bulk Pinpoint lookups.
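
A minimal sketch of manual caching with explicit expiry, assuming the real Pinpoint -> OZid lookup can be wrapped as a plain function. The `ExpiringCache` class and its parameters are hypothetical and do not reflect any existing OneZoom code.

```python
import time
from typing import Callable, Dict, Tuple


class ExpiringCache:
    """Cache lookup results and discard them after ttl_seconds."""

    def __init__(
        self,
        lookup: Callable[[str], int],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, int]] = {}

    def get(self, pinpoint: str) -> int:
        now = self._clock()
        hit = self._store.get(pinpoint)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh
        value = self._lookup(pinpoint)  # miss or expired: ask the real API
        self._store[pinpoint] = (now + self._ttl, value)
        return value
```

Passing the clock in makes expiry easy to unit test, and the TTL could be replaced by an explicit flush whenever the tree is rebuilt.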

@lentinj
Collaborator

lentinj commented Jul 22, 2024

We think this is actually a colour scheme, not a highlight scheme (rule of thumb: A colour scheme is a data visualisation tool, a highlight is a mechanism to find things).

We want a colour scheme in which, based on a list of OTTs (non-OTT nodes won't be supported), we can assign a bunch of "traits" to a node/leaf and then map these to a colour (or CBF colour) for both the branch and the point itself.

In our case the traits would inherit, as highlights do, and apply to the branches, but that may not be the case.

Here we're looking at 50k OTTs, but that could be higher (imagine the IUCN colour scheme being implemented through the same mechanism).
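
A rough sketch of the trait-to-colour step, assuming each trait has a standard colour and a CBF alternative. The trait names, hex values and the `colour_for` helper are all hypothetical placeholders, not part of any existing scheme.

```python
from typing import Iterable, NamedTuple


class Colours(NamedTuple):
    standard: str
    cbf: str  # alternative used when the CBF option is on


# Hypothetical palette: trait names and hex values are placeholders
PALETTE = {
    "trait_a": Colours("#e31a1c", "#d55e00"),
    "trait_b": Colours("#33a02c", "#009e73"),
}
DEFAULT = Colours("#888888", "#888888")


def colour_for(traits: Iterable[str], cbf: bool = False) -> str:
    """Colour of the first trait that has a palette entry, else the default."""
    for trait in traits:
        if trait in PALETTE:
            chosen = PALETTE[trait]
            return chosen.cbf if cbf else chosen.standard
    return DEFAULT.cbf if cbf else DEFAULT.standard
```

The same function could serve both the branch and the point, with separate palettes if the two need different colours.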

@lentinj
Collaborator

lentinj commented Jul 22, 2024

There are two places the inheritance logic could sit:

  • If we have a byte array we can use the cut map locations and get the traits of all descendants. Then have a set of configurable algorithms for summarising to the branch / interior node colour.
  • Precompute a node's colour traits when walking the tree in src/projection, as we currently do with highlights. Then the algorithms can just recurse upwards / downwards in a similar fashion (if we don't share the entire highlight code)
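
The first option above could be sketched as follows, assuming the byte array holds one trait code per ordered leaf (0 meaning "no trait") and that the cut map can be turned into a `(start, end)` range of leaf indices for each interior node. Both points are assumptions, and the summariser names are hypothetical.

```python
from collections import Counter


def majority_trait(leaf_traits: bytes, start: int, end: int) -> int:
    """Most common non-zero trait code among leaves[start:end], or 0 if none."""
    counts = Counter(code for code in leaf_traits[start:end] if code != 0)
    if not counts:
        return 0
    # Break ties by the smaller trait code so the result is deterministic
    return min(counts, key=lambda code: (-counts[code], code))


def any_trait(leaf_traits: bytes, start: int, end: int) -> int:
    """1 if any descendant leaf carries a trait, as highlights inherit today."""
    return 1 if any(leaf_traits[start:end]) else 0


# Configurable set of algorithms for summarising to a branch / interior node
SUMMARISERS = {"majority": majority_trait, "any": any_trait}
```

A colour scheme would then pick one entry from `SUMMARISERS` and apply it to each interior node's leaf range.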

@lentinj changed the title from "Improved highlighting of species and groups" to "Improved colour themes from new data sources" on Jul 26, 2024
@lentinj
Collaborator

lentinj commented Jul 26, 2024

@jrosindell On a leaf we put the IUCN status in text once you zoom in enough. Of course this makes sense for the default extinction risk colour scheme, but if you select "popularity", should we actually be displaying the relative popularity in that place, and similar for other colour schemes?


2 participants