# Extracting Taxon-Location Data from Legacy Biodiversity Literature

The [BIOfid search](https://biofid.de/de/search/) lets you search for documents that mention both a taxon and
specific locations. If you specify a region, the search engine also finds locations within that region and marks
them as relevant to your search. For example, you can search for [Taxus baccata in Germany](https://www.biofid.de/en/search/Taxus%20baccata%20in%20Germany/).

The example below shows one way to analyse the data that you can download from the search results.

In [1]:
from scripts.biofid_data import generate_dataframe_from_json_file

BIOFID_URI = 'https://www.biofid.de/bio-ontologies/Tracheophyta/gbif/5284517'
JSON_FILE_PATH = 'data/biofid-response-taxus-baccata-in-germany.json'

taxon_df = generate_dataframe_from_json_file(JSON_FILE_PATH, 'taxus-baccata-in-germany.tsv')


## Visualize taxon-location relations on a map
The script above prepared the data: it looked up all locations that are mentioned on the same page as
the taxon (in this case _Taxus baccata_). This is a very crude approach, but it suffices for demonstration purposes.
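
The page-level pairing described above can be sketched in a few lines. This is only an illustration, not the code in `scripts.biofid_data`: the record layout (`document`, `page`, `type`, `label`), the sample values, and the function name `cooccurrences_per_page` are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical annotation records: one entry per recognised entity.
# Field names and values are made up for illustration only.
annotations = [
    {"document": "doc-1", "page": 12, "type": "taxon", "label": "Taxus baccata"},
    {"document": "doc-1", "page": 12, "type": "location", "label": "Jena"},
    {"document": "doc-1", "page": 12, "type": "location", "label": "Weimar"},
    {"document": "doc-1", "page": 13, "type": "location", "label": "Erfurt"},
    {"document": "doc-2", "page": 3, "type": "taxon", "label": "Taxus baccata"},
]


def cooccurrences_per_page(records, taxon):
    """Pair the taxon with every location mentioned on the same page."""
    taxon_pages = set()
    locations_by_page = defaultdict(list)
    for record in records:
        key = (record["document"], record["page"])
        if record["type"] == "taxon" and record["label"] == taxon:
            taxon_pages.add(key)
        elif record["type"] == "location":
            locations_by_page[key].append(record["label"])

    # Only pages that mention the taxon contribute pairs.
    return [
        (document, page, location)
        for (document, page) in sorted(taxon_pages)
        for location in locations_by_page.get((document, page), [])
    ]


pairs = cooccurrences_per_page(annotations, "Taxus baccata")
print(pairs)
```

The location on page 13 is dropped because the taxon is not mentioned there. A location that merely shares a page with the taxon is still paired with it, which is why this approach is crude.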

Now we can put the extracted data onto an interactive map. Click any marker for further details. The markers are
restricted to documents published between 1900 and 1920.

In [2]:
import folium
from scripts.biofid_data import generate_marker_text

m = folium.Map()

dataset = taxon_df

time_restricted = dataset[dataset.document_publication_year.between(1900, 1920)]

# Add one marker per row; the popup text is generated from the row's data.
for _, row in time_restricted.iterrows():
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=folium.Popup(generate_marker_text(row), max_width=300)
    ).add_to(m)

sw = dataset[['latitude', 'longitude']].min().values.tolist()
ne = dataset[['latitude', 'longitude']].max().values.tolist()

m.fit_bounds([sw, ne])

m.save('species-location-map.html')

# Display
m
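
The two data steps in the cell above, the inclusive year filter and the bounding box passed to `fit_bounds`, can also be written in plain Python. The sketch below assumes records are dictionaries whose keys mirror the data frame columns; the sample coordinates and the helper names `filter_by_year` and `bounding_box` are made up for illustration. Like `Series.between` with its default settings, the filter includes both end years.

```python
# Made-up sample records; the keys mirror the data frame columns used above.
records = [
    {"latitude": 50.93, "longitude": 11.59, "document_publication_year": 1900},
    {"latitude": 53.55, "longitude": 9.99, "document_publication_year": 1920},
    {"latitude": 48.14, "longitude": 11.58, "document_publication_year": 1921},
]


def filter_by_year(rows, first_year, last_year):
    """Keep rows whose publication year lies in [first_year, last_year]."""
    return [
        row for row in rows
        if first_year <= row["document_publication_year"] <= last_year
    ]


def bounding_box(rows):
    """Return the [south-west, north-east] corners as [lat, lon] pairs."""
    latitudes = [row["latitude"] for row in rows]
    longitudes = [row["longitude"] for row in rows]
    return [[min(latitudes), min(longitudes)], [max(latitudes), max(longitudes)]]


selected = filter_by_year(records, 1900, 1920)
bounds = bounding_box(selected)
print(len(selected), bounds)
```

Note that the notebook cell derives the bounds from the full `dataset`, whereas this sketch uses only the filtered rows. Which is preferable depends on whether the map should frame all known locations or only the visible markers.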

## Getting some context
Plotting data from the literature on a map is nice, but we would also like to relate it to other sources such as GBIF.
We can therefore query the GBIF API for occurrences recorded between 1900 and 1920 and plot them on the same map to see
how recorded occurrences are distributed.

In [8]:
from scripts.commons import Biofid, get_gbif_occurrences_for_germany
import re
from copy import copy

map_with_gbif_data = copy(m)

try:
    biofid = Biofid()
    biofid_data = biofid.get_biofid_data_for_uri(BIOFID_URI)

    gbif_url = list(filter(lambda x: x.get('predicate', {}).get('value') == 'https://dwc.tdwg.org/terms/#taxonID',
                           biofid_data['data']))[0]
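    gbif_id = re.search(r'/species/([0-9]+)$', gbif_url['object']['value']).group(1)
except (IndexError, KeyError, AttributeError):
    # No usable taxonID entry: fall back to the numeric ID at the end of the BIOfid URI.
    gbif_id = re.search(r'[0-9]+$', BIOFID_URI).group()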

gbif_data = get_gbif_occurrences_for_germany(gbif_id, year='1900,1920')

for data in gbif_data:
    try:
        longitude = data['decimalLongitude']
        latitude = data['decimalLatitude']
        label = data['scientificName']
        occurrence_key = data['occurrenceID']

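        # Add the GBIF occurrence to the comparison map.
        folium.CircleMarker(
            location=(latitude, longitude),
            popup=folium.Popup(
                f'GBIF Data Key: <a href="{occurrence_key}">{occurrence_key}</a>; Species: {label}',
                max_width=300
            )
        ).add_to(map_with_gbif_data)
    except KeyError:
        # Skip records that lack coordinates, a name or an occurrence ID.
        continue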

map_with_gbif_data.save('species-location-map-with-gbif-comparison.html')

map_with_gbif_data
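
The helper `get_gbif_occurrences_for_germany` is not shown in this notebook. As a rough idea of what such a request might look like, the sketch below builds a query URL for the public GBIF occurrence search API. The function names `gbif_id_from_uri` and `build_occurrence_query` are hypothetical, and the actual helper in `scripts.commons` may work differently.

```python
import re
from urllib.parse import urlencode

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def gbif_id_from_uri(uri):
    """Return the trailing numeric ID of a URI, or None if there is none."""
    match = re.search(r"([0-9]+)$", uri)
    return match.group(1) if match else None


def build_occurrence_query(taxon_key, year_range, country="DE", limit=300, offset=0):
    """Build a GBIF occurrence search URL (hypothetical helper).

    year_range is a 'first,last' string such as '1900,1920'.
    """
    params = {
        "taxonKey": taxon_key,
        "country": country,          # ISO 3166-1 alpha-2 code, here Germany
        "year": year_range,
        "hasCoordinate": "true",     # only records that can be plotted
        "limit": limit,
        "offset": offset,
    }
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


taxon_key = gbif_id_from_uri("https://www.biofid.de/bio-ontologies/Tracheophyta/gbif/5284517")
url = build_occurrence_query(taxon_key, "1900,1920")
print(url)
```

The search endpoint returns results in pages, so a complete download would increase `offset` step by step until the response reports the end of the records.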

## Getting the Data
You can download this Jupyter Notebook, run it on your desktop and modify the scripts to suit your needs. If you do so,
please be aware of the [AGPLv3 license](https://tldrlegal.com/license/gnu-affero-general-public-license-v3-(agpl-3.0)) of this project!

# References

GBIF.org (14 October 2021). Derived dataset: Filtered export of GBIF occurrence data. https://doi.org/10.15468/dd.age5zq