[usavian] Data sources: bird ranges, distributions, other information #2

Open
3 of 15 tasks
trashbirdecology opened this issue Dec 6, 2019 · 8 comments

Comments

@trashbirdecology
Collaborator

trashbirdecology commented Dec 6, 2019


  • (✔) symbol indicates the data was added to /data/data_sources.csv

Below is a list of desired data sources for bird ranges and distributions:

Data warehouses

Taxonomy

trashbirdecology assigned and then unassigned krburgio, Dec 6, 2019
trashbirdecology changed the title from "Identify potential and get bird range, distn layers" to "Identify bird ranges and predicted distributions shapefiles", Dec 6, 2019
@krburgio
Collaborator

krburgio commented Dec 7, 2019

Requested shapefiles for all North American birds through the BirdLife International data portal. The estimated response time is 5-10 business days. I will let you know when I hear something back!

@trashbirdecology
Collaborator Author

brilliant

@trashbirdecology
Collaborator Author

trashbirdecology commented Jan 6, 2020

any news, @krburgio, from BirdLife on adapting the files?

@krburgio
Collaborator

krburgio commented Jan 7, 2020

He never replied to my email. I will just submit a new request with the caveat that we will not be publishing their shapefiles on our larger map.

@skybristol

I apologize, in advance, if I'm overloading this one issue with too much additional complexity.

Related to #11, data restrictions (as part of a license or not) are a huge part of the metadata we need to capture behind all of these sources. This is a messy world where many data providers have not gone through the process of formalizing an actual license for their product but have provided some language about what their desires are for usage. NatureServe has been a "great" example of that where they have language that is quite restrictive for information pulled from their API. It's in the language for their API key agreement, but they haven't gone about designating an actual license.
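To make the point about restrictions concrete, here is a minimal sketch of what a per-source metadata record might look like if formal licenses and informal usage language are tracked separately. Every field name, the example provider, and the example URL are assumptions for illustration; nothing here reflects the actual columns of /data/data_sources.csv or the actual terms of any provider.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class DataSource:
    """One row of source metadata. All field names are hypothetical."""
    name: str
    url: str
    formal_license: str      # e.g. "CC-BY-NC"; empty string if the provider has none
    usage_terms: str         # informal language from the provider, quoted or summarized
    redistribution_ok: bool  # may we serve the raw data ourselves?
    derivatives_ok: bool     # may we publish value-added derivative products?


def to_csv(sources):
    """Serialize the records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(DataSource)])
    writer.writeheader()
    for src in sources:
        writer.writerow(asdict(src))
    return buf.getvalue()


def needs_review(src):
    """Flag sources with no formal license so a person checks the stated terms."""
    return src.formal_license == ""


# Placeholder record, not a description of any real provider.
example = DataSource(
    name="Example Provider",
    url="https://example.org/data",
    formal_license="",
    usage_terms="Terms stated only in the API key agreement",
    redistribution_ok=False,
    derivatives_ok=False,
)
print(to_csv([example]))
print(needs_review(example))
```

Keeping `formal_license` and `usage_terms` as separate fields is one way to handle providers that have stated their wishes without designating a license.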

For something like BirdLife (or any third-party source this system will use), we need to determine what we are going to use this source for and make sure we nail down whether or not that can be done legitimately. Simple redistribution (i.e., serving up a WMS to show visually on a map) is often something that data owners balk at because they worry about a number of things (loss of credit, inability to capture full metrics on access, problems with versions getting out of whack, etc.). But it may be okay for us to create, use, and distribute our own value-added derivative product following whatever stipulations have been specified by the owner.

If this is just a matter of pointing a management user simply looking for "raw" information products they may find of interest about particular species, then our information system just needs to have a metadata pointer that is discoverable because we connect the dots to related information. The usage mode is then someone discovering the metadata on our end and going and visiting BirdLife to get the referenced product. But I'm guessing this will be more a matter of using the corpus of BirdLife distributions with other sources to provide an interface that takes a user's point of interest (geographic area, management objectives, ecological disturbance regime, etc.) and deriving an aggregate report on species potentially of interest to the subject input vector.

The best case would be one where BirdLife (or any of our sources) is providing an interface on their end that lets us work with their data dynamically to get the answers we want. That way, we just hit their service with some code and process the response we get. But in a whole lot of cases, we likely need the ability to acquire the digital data in whatever form we can, spin them up on our own infrastructure, and provide an interface that drives our applications. That interface is a derivative product, and we just need to make sure what we need from it is consistent with whatever usage stipulations are in effect. Since we will be following a principle of openness and transparency in what we put out, we also need to make sure that we are able to provide a clear provenance trace back to the original source and through any steps we've taken to build and provide the derivatives.
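As a rough illustration of the provenance trace mentioned above, the sketch below records the original source, a checksum of the file as received, and each processing step applied to it. The class names, field names, step descriptions, and example URL are all hypothetical; this is one possible shape for such a record, not an agreed design.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ProvenanceStep:
    description: str  # what was done to the data
    tool: str         # what it was done with


@dataclass
class DerivedProduct:
    name: str
    source_name: str
    source_url: str
    source_sha256: str  # checksum of the file exactly as received
    steps: list = field(default_factory=list)

    def add_step(self, description, tool):
        """Append one processing step, in the order it was applied."""
        self.steps.append(ProvenanceStep(description, tool))

    def to_json(self):
        """Serialize the whole trace so it can be published with the product."""
        return json.dumps(asdict(self), indent=2)


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Placeholder bytes stand in for a downloaded file.
raw = b"placeholder bytes standing in for a downloaded file"
product = DerivedProduct(
    name="richness_grid",
    source_name="Example Provider",
    source_url="https://example.org/data",
    source_sha256=sha256_of(raw),
)
product.add_step("Clipped ranges to study area", "hypothetical clip script")
product.add_step("Summed presence per cell", "hypothetical richness script")
print(product.to_json())
```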

@trashbirdecology
Collaborator Author

trashbirdecology commented Jan 8, 2020

@skybristol

> Simple redistribution (i.e., serving up a WMS to show visually on a map)

What is WMS?


> But it may be okay for us to create, use, and distribute our own value-added derivative product following whatever stipulations have been specified by the owner.

Yes, I would like to ensure that by combining existing data (e.g., BirdLife, BBS) that we can legally use the synthesized data product in our work here.


> If this is just a matter of pointing a management user simply looking for "raw" information products they may find of interest about particular species, then our information system just needs to have a metadata pointer that is discoverable because we connect the dots to related information.

To clarify, the idea here would be that the end-user might benefit from having a simple base layer which contains species ranges/distributions (the synthesized product of e.g. BirdLife etc.) over top of the 'bulk' of the work (the conservation related things).


> The best case would be one where BirdLife (or any of our sources) is providing an interface on their end that lets us work with their data dynamically to get the answers we want.

This sounds like a long-term goal

@skybristol

> What is WMS?

https://www.opengeospatial.org/standards/wms - Essentially putting a picture of map data on a map interface. If you poke around at BirdLife a bit, you'll see they are serving their species distributions via a Geoserver here, http://birdlaa8.miniserver.com/geoserver. That gets at your last point about it being a long-term goal. If their Geoserver infrastructure is robust enough (it's on a commercial cloud provider in the UK, but I have no idea what the machine is configured to handle), you could drive much of what you might want USAvian to do directly from the same services they have online. However, they don't advertise its existence, referring instead to a geodatabase file, which I'm guessing is their preferred distribution method.
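For anyone who has not seen one, a WMS call is just an HTTP GET with a standard set of query parameters, and the server answers with a rendered image. Below is a minimal sketch that builds a WMS 1.1.1 `GetMap` URL using only the standard library. The endpoint and layer name are made up for illustration and are not BirdLife's; a real server lists its layers in its `GetCapabilities` response.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width=512, height=512):
    """Build a WMS 1.1.1 GetMap URL.

    bbox is (min_lon, min_lat, max_lon, max_lat) in EPSG:4326.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",             # empty string asks for the default style
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


url = wms_getmap_url(
    "https://example.org/geoserver/wms",  # hypothetical endpoint
    "workspace:species_range",            # hypothetical layer name
    (-125.0, 24.0, -66.0, 50.0),          # rough lon/lat box, for illustration only
)
print(url)
```

The sketch uses version 1.1.1 because its `BBOX` is always in x,y (lon, lat) order; WMS 1.3.0 uses `CRS` instead of `SRS` and follows the axis order of the coordinate system, which for EPSG:4326 is lat, lon.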

> To clarify, the idea here would be that the end-user might benefit from having a simple base layer which contains species ranges/distributions (the synthesized product of e.g. BirdLife etc.) over top of the 'bulk' of the work (the conservation related things).

Is your intent then to provide a visual interface that shows a user, here's your area of conservation interest and a visual depiction of the modeled distribution of species that may occur in that area? By "synthesized product" do you mean something like a species richness map based on multiple distributions? Or is it more about a report based on using the data for a calculation of some kind like species in the area, spatial area represented in the potential distribution, stats on the species (IUCN status, FWS status, etc.)?

> This sounds like a long-term goal

Again, it's not necessarily a long-term thing, but many groups that provide data like this are still working in the mode of, "We have a web app and stuff behind it like a Geoserver to show stuff in our context, but if you want to use our data, here's a download you can get from us with permission and associated stipulations on use." The audience considered for these cases is almost always the individual researcher, the analytical work they are doing, and a paper they are going to publish. It's not usually the user who is going to take all or most of the data, put them together with other data, and build some regular-use application somewhere else. They may not have a problem with it, but they haven't set up the infrastructure or the legal framework to support that use.

In this case, they are essentially laying out the stipulations of CC-BY-NC or something similar and may not have a problem with what you are laying out. But I'm sure they would not want you to put copies of their geodatabase files online where someone else could find and download them, bypassing their request form and opportunity to know about who's accessing their product. (Even though anyone could figure out their Geoserver address and write code to do that now.) For our own purposes, USGS could obtain their data legitimately via the process you already set in motion, spin them up on our own Geoserver, and use that for our applications (dynamically generating species richness maps, etc.). But we would want to make sure that was completely understood and sanctioned by BirdLife and put online in a way that guided users of our derivative products back to BirdLife if they wanted to do something different, disabling the ability for our services to become a proxy for someone to bypass their system.
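To illustrate the kind of derivative being discussed, here is a toy sketch of a species richness grid: per-species presence grids are summed cell by cell, so the output carries counts rather than the source range geometries. The species names and grids are invented, and real range data would first need to be rasterized onto a common grid. Whether a product like this falls within a provider's terms is still something to confirm with them.

```python
def richness(presence_grids):
    """Sum 0/1 presence grids cell by cell.

    presence_grids maps a species name to a list of rows, all the same
    shape, where 1 means the modeled range covers that cell.
    """
    grids = list(presence_grids.values())
    if not grids:
        return []
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    for g in grids:
        if len(g) != n_rows or any(len(row) != n_cols for row in g):
            raise ValueError("all grids must share one shape")
    return [
        [sum(g[r][c] for g in grids) for c in range(n_cols)]
        for r in range(n_rows)
    ]


def species_at(presence_grids, row, col):
    """List the species whose range covers a single cell, sorted by name."""
    return sorted(name for name, g in presence_grids.items() if g[row][col])


# Invented 2 x 3 grids for three placeholder species.
toy = {
    "species_a": [[1, 1, 0], [0, 1, 0]],
    "species_b": [[0, 1, 1], [0, 1, 1]],
    "species_c": [[0, 0, 0], [1, 1, 0]],
}
print(richness(toy))
print(species_at(toy, 1, 1))
```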

P.S. Sorry if I'm spouting off about a bunch of stuff you already know and are thinking about.

@trashbirdecology
Collaborator Author

> Is your intent then to provide a visual interface that shows a user, here's your area of conservation interest and a visual depiction of the modeled distribution of species that may occur in that area?

Yes, the primary goal is the 'conservation network' itself. By having the taxonomic distributions/filtering system, one can then visualize the network relative to the species of interest.

> By "synthesized product" do you mean something like a species richness map based on multiple distributions?

Yes, rather than being a tool which just re-maps existing products (e.g. BirdLife distributions), we could provide a simple range/distribution map.

> Or is it more about a report based on using the data for a calculation of some kind like species in the area, spatial area represented in the potential distribution, stats on the species (IUCN status, FWS status, etc.)?

Generating reports, unless very general, sounds like an option for specific use-case decision support tools. If the tool proves useful for this purpose, then perhaps we will get to this point. But for now I do not envision the tool as being a single-decision DST. But, the usability tests may reveal opportunity in the area...

> I'm sure they would not want you to put copies of their geodatabase files online where someone else could find and download them,

Indeed. This is certainly not an objective of USAvian. Rather we would like to use their data to create the above-defined synthesized product to help visualize the conservation network.

> disabling the ability for our services to become a proxy for someone to bypass their system.

Again, yes, we do not want to provide the taxonomic geographies data products or byproducts. Rather, we will provide an otherwise non-existent conservation data layer and visualization.

> P.S. Sorry if I'm spouting off about a bunch of stuff you already know and are thinking about.

This is helpful indeed -- especially the more technical data serving bits...

trashbirdecology changed the title from "Identify bird ranges and predicted distributions shapefiles" to "Data Sources: bird ranges, distributions, other information", Jan 14, 2020
trashbirdecology changed the title from "Data Sources: bird ranges, distributions, other information" to "[usavian] Data sources: bird ranges, distributions, other information", Jan 14, 2020