# Extracting Taxon-Location Data from Legacy Biodiversity Literature

Run a search that combines a taxon and a location in the [BIOfid search](https://biofid.de/de/search/).
To receive the desired result, phrase the query explicitly, e.g. "Taxus baccata in Germany"
rather than "Taxus baccata Germany" (this limitation should be fixed soon).

Next, download the search result as JSON by clicking the corresponding button in the left
panel, next to the search results. This may take up to a few minutes, since there is some
heavy lifting involved. Then save the resulting dataset to your hard drive.

To run the script, set the variable `JSON_FILE_PATH` to the (absolute) path of the downloaded
file. Then you can simply run the script!
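
Before running the cells, it can help to confirm that the downloaded file exists and parses as JSON. The following is a minimal sketch using only the standard library. The helper name `check_json_file` is hypothetical and not part of the BIOfid scripts, and it makes no assumption about the structure of the BIOfid response.

```python
import json
from pathlib import Path


def check_json_file(path):
    """Return the parsed JSON content of `path`, or raise a descriptive error.

    Hypothetical helper for illustration; not part of the BIOfid scripts.
    """
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f'No file found at {file_path.resolve()}')

    with file_path.open(encoding='utf-8') as json_file:
        try:
            return json.load(json_file)
        except json.JSONDecodeError as error:
            # An interrupted download typically ends up here
            raise ValueError(f'{file_path} is not valid JSON: {error}') from error
```

Calling `check_json_file(JSON_FILE_PATH)` before `generate_dataframe_from_json_file` surfaces a wrong path or an incomplete download early.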

In [11]:
from scripts.biofid_data import generate_dataframe_from_json_file

BIOFID_URI = 'https://www.biofid.de/bio-ontologies/Tracheophyta/gbif/5284517'
JSON_FILE_PATH = 'data/biofid-response-taxus-baccata-in-germany.json'

taxon_df = generate_dataframe_from_json_file(JSON_FILE_PATH, 'taxus-baccata-in-germany.tsv')


In [12]:
import folium
from scripts.biofid_data import generate_marker_text

m = folium.Map()

dataset = taxon_df

time_restricted = dataset[dataset.document_publication_year.between(1900, 1920)]

# Add one marker per record published between 1900 and 1920
for _, row in time_restricted.iterrows():
    folium.Marker(
        location=[row["latitude"], row["longitude"]],
        popup=folium.Popup(generate_marker_text(row), max_width=300),
    ).add_to(m)

sw = dataset[['latitude', 'longitude']].min().values.tolist()
ne = dataset[['latitude', 'longitude']].max().values.tolist()

m.fit_bounds([sw, ne])

m.save('species-location-map.html')

# Display
m
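
The cell above places markers for the 1900 to 1920 subset but fits the map to the bounds of the full dataset. As a sketch, the bounds could instead be derived from the rows that are actually plotted, after dropping rows without coordinates. The helper name `bounds_for_period` and the sample values are illustrative assumptions; only the column names `latitude`, `longitude` and `document_publication_year` come from the cell above.

```python
import pandas as pd


def bounds_for_period(dataset, first_year, last_year):
    """Return ([south, west], [north, east]) for records in the given period.

    Hypothetical helper; rows with missing coordinates are ignored.
    Returns None if no located record falls into the period.
    """
    in_period = dataset[dataset['document_publication_year'].between(first_year, last_year)]
    located = in_period.dropna(subset=['latitude', 'longitude'])
    if located.empty:
        return None

    coordinates = located[['latitude', 'longitude']]
    south_west = coordinates.min().tolist()
    north_east = coordinates.max().tolist()
    return south_west, north_east


# Made-up sample records, for illustration only
sample = pd.DataFrame({
    'document_publication_year': [1895, 1905, 1910, 1915],
    'latitude': [47.5, 50.1, 52.5, None],
    'longitude': [8.0, 8.7, 13.4, 9.0],
})
print(bounds_for_period(sample, 1900, 1920))
```

If the result is not `None`, its two corners can be passed on as `m.fit_bounds([south_west, north_east])`.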

In [13]:
from scripts.commons import Biofid, get_gbif_occurrences_for_germany
import re
from copy import copy

# Note: copy() is shallow, so the copy may share its child elements (markers) with m
map_with_gbif_data = copy(m)

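try:
    biofid = Biofid()
    biofid_data = biofid.get_biofid_data_for_uri(BIOFID_URI)

    # Pick the first statement whose predicate is the Darwin Core taxonID
    gbif_url = [entry for entry in biofid_data['data']
                if entry.get('predicate', {}).get('value') == 'https://dwc.tdwg.org/terms/#taxonID'][0]
    gbif_id = re.search(r'/species/([0-9]+)$', gbif_url['object']['value']).group(1)
except (IndexError, AttributeError):
    # No taxonID statement or no ID in its URL: fall back to the trailing number of the BIOfid URI
    gbif_id = re.search(r'[0-9]+$', BIOFID_URI).group()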

gbif_data = get_gbif_occurrences_for_germany(gbif_id, year='1900,1920')

for data in gbif_data:
    try:
        longitude = data['decimalLongitude']
        latitude = data['decimalLatitude']
        label = data['scientificName']
        dataset_key = data['datasetKey']

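        # Add the GBIF record to the comparison map, not to the original map
        folium.CircleMarker(location=(latitude, longitude),
                            popup=folium.Popup(
                                f'GBIF Data Key: {dataset_key}; Species: {label}',
                                max_width=300
                            )).add_to(map_with_gbif_data)
    except KeyError:
        # Skip GBIF records that lack coordinates, a scientific name, or a dataset key
        pass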

map_with_gbif_data.save('species-location-map-with-gbif-comparison.html')

map_with_gbif_data
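
`get_gbif_occurrences_for_germany` is a project helper whose implementation is not shown here. As a rough sketch of what such a lookup might involve, the code below extracts the trailing numeric ID from a BIOfid URI (mirroring the fallback in the cell above) and builds a request URL for the public GBIF occurrence search API, restricted to Germany and a year range. The function names, the `limit` value and the `hasCoordinate` filter are assumptions for illustration, not the helper's actual behaviour.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_OCCURRENCE_SEARCH = 'https://api.gbif.org/v1/occurrence/search'


def trailing_id(uri):
    """Return the run of digits at the end of `uri`, or None if there is none."""
    match = re.search(r'[0-9]+$', uri)
    return match.group() if match else None


def build_occurrence_url(taxon_key, year_range, country='DE', limit=100):
    """Build a GBIF occurrence search URL (hypothetical helper)."""
    query = urlencode({
        'taxonKey': taxon_key,
        'country': country,
        'year': year_range,        # e.g. '1900,1920' for a range of years
        'hasCoordinate': 'true',   # assumption: only georeferenced records are useful on a map
        'limit': limit,
    })
    return f'{GBIF_OCCURRENCE_SEARCH}?{query}'


def fetch_occurrences(url):
    """Fetch one page of results; requires network access."""
    with urlopen(url) as response:
        return json.load(response).get('results', [])


taxon_key = trailing_id('https://www.biofid.de/bio-ontologies/Tracheophyta/gbif/5284517')
print(build_occurrence_url(taxon_key, '1900,1920'))
```

The GBIF API returns results in pages, so a complete comparison would need to repeat the request with an increasing `offset` until the response reports `endOfRecords` as true.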

# References

Derived dataset: GBIF.org (14 October 2021). Filtered export of GBIF occurrence data. https://doi.org/10.15468/dd.age5zq