# Invasive alien species internet activity data mining and processing for **iEcology-IAS-miner**

In this notebook, we explore the functionality of the **iEcology-IAS-miner** Python package, which is built to seamlessly extract internet activity, images, mentions and occurrences of invasive alien species across the EU from a variety of platforms. For the demonstration to work, all input files should be located in the same folder as this notebook. Note that the scripts will not work unless the API keys for the respective platforms have been set in a local .env file in the root directory of the library. The .env file can be opened in any text editor and should look something like this:

YT_API_KEY='*insert youtube API key*'  
FLICKR_API_KEY='*insert flickr API key*'  
FLICKR_API_SECRET='*insert flickr secret*'  
WIKI_USER_AGENT='*insert project name* (*insert personal email*)'  
EASIN_EMAIL='*insert personal email*'  
EASIN_PW='*insert easin password*'  
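
Before running the cells below, it can help to confirm that every variable from the example above is present and non-empty. The following is a minimal sketch using only the standard library; the `.env` location and the helper names `parse_env_text` and `missing_keys` are assumptions for illustration and are not part of the package.

```python
from pathlib import Path

# Variable names copied from the .env example above
REQUIRED_KEYS = [
    "YT_API_KEY", "FLICKR_API_KEY", "FLICKR_API_SECRET",
    "WIKI_USER_AGENT", "EASIN_EMAIL", "EASIN_PW",
]

def parse_env_text(text):
    """Parse KEY='value' lines into a dict, skipping blank lines and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and single or double quotes
        values[key.strip()] = value.strip().strip("'\"")
    return values

def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    values = parse_env_text(text)
    return [key for key in required if not values.get(key)]

env_path = Path(".env")  # assumed location; adjust to the root of the library
if env_path.exists():
    print("Missing keys:", missing_keys(env_path.read_text(encoding="utf-8")))
```

An empty list means all six variables are set; the check says nothing about whether the keys themselves are valid.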

If issues persist, please contact me at **simon.reynaert@plantentuinmeise.be**.

In [1]:
#set the paths correctly on local device so that functions can be imported
import sys
import os

notebook_dir = os.getcwd()
print(f"The notebook is located at '{notebook_dir}'.")
src_path = os.path.abspath(os.path.join(notebook_dir, "../src"))
print(f"The functions are located at '{src_path}'.")

# Add to Python path if not already there
if src_path not in sys.path:
    sys.path.insert(0, src_path)

The notebook is located at 'c:\Users\simon\Documents\GitHub\iEcology-IAS-miner\scripts'.
The functions are located at 'c:\Users\simon\Documents\GitHub\iEcology-IAS-miner\src'.


## 1. Species list and synonyms mining

### 1.1. Get EASIN union list species names, synonyms and R identifiers

In [3]:
from list_mining.get_EASIN_unionlistofconcern import fetch_and_process_easin_data

fetch_and_process_easin_data(url = "https://easin.jrc.ec.europa.eu/apixg/catxg/euconcern",
                             output_file="EASIN_unionlist_species_and_synonyms.csv")

import pandas as pd

df = pd.read_csv("EASIN_unionlist_species_and_synonyms.csv")
df.head()

Data successfully saved to EASIN_unionlist_species_and_synonyms.csv


Unnamed: 0,EASINID,Scientific Name,Label,All Names
0,R00046,Acacia mearnsii,Common Name,Acácia
1,R00046,Acacia mearnsii,Common Name,Acácia negra
2,R00046,Acacia mearnsii,Common Name,Acacia noir
3,R00046,Acacia mearnsii,Common Name,Acácia-negra
4,R00046,Acacia mearnsii,Common Name,Aromo
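
The saved table is in long format, with one row per name. As a rough sketch of how it might be collapsed into one list of names per species, the snippet below uses the standard `csv` module on the five preview rows; the helper name `names_by_species` is hypothetical, and the index column is omitted for brevity.

```python
import csv
import io
from collections import defaultdict

# Five rows copied from the preview above (index column left out)
sample = """EASINID,Scientific Name,Label,All Names
R00046,Acacia mearnsii,Common Name,Acácia
R00046,Acacia mearnsii,Common Name,Acácia negra
R00046,Acacia mearnsii,Common Name,Acacia noir
R00046,Acacia mearnsii,Common Name,Acácia-negra
R00046,Acacia mearnsii,Common Name,Aromo
"""

def names_by_species(csv_file):
    """Group the 'All Names' column by (EASINID, Scientific Name), keeping order."""
    grouped = defaultdict(list)
    for row in csv.DictReader(csv_file):
        key = (row["EASINID"], row["Scientific Name"])
        if row["All Names"] not in grouped[key]:
            grouped[key].append(row["All Names"])
    return dict(grouped)

lookup = names_by_species(io.StringIO(sample))
print(lookup[("R00046", "Acacia mearnsii")])
```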


### 1.2. Get Wikipedia union list species names and Q identifiers

In [None]:
from list_mining.get_unionlist_wiki import (
    run_easin_sitelinks_pipeline
)
# Define custom output filenames
custom_q_numbers_file = 'unionconcern_invasive_species_qnumbers_2025.csv'
custom_sitelinks_file = 'unionconcern_invasive_species_wikipedia_links_2025.csv'

# Run the pipeline
df_q_numbers, df_sitelinks = run_easin_sitelinks_pipeline(
    wiki_url='https://en.wikipedia.org/wiki/List_of_invasive_alien_species_of_Union_concern',
    q_number_file=custom_q_numbers_file,
    sitelinks_file=custom_sitelinks_file
)

df_sitelinks.head()

--- Starting Pipeline for URL: https://en.wikipedia.org/wiki/List_of_invasive_alien_species_of_Union_concern ---
Step 1/4: Fetching webpage and extracting scientific names...
Step 2/4: Getting Wikidata Q-numbers (This may take time)...


100%|██████████| 88/88 [00:54<00:00,  1.62it/s]


Step 3/4: Fetching sitelinks for all EU languages (This may take time)...


Fetching sitelinks: 100%|██████████| 88/88 [00:55<00:00,  1.58it/s]

Step 4/4: Saving data to unionconcern_invasive_species_qnumbers_2025.csv and unionconcern_invasive_species_wikipedia_links_2025.csv...
Pipeline completed and data saved successfully.





Unnamed: 0,Scientific Name,Q-number,Language,Wikipedia Title
0,Acacia saligna,Q402385,de,Weidenblatt-Akazie
1,Acacia saligna,Q402385,en,Acacia saligna
2,Acacia saligna,Q402385,es,Acacia saligna
3,Acacia saligna,Q402385,fi,Siniakaasia
4,Acacia saligna,Q402385,fr,Acacia saligna
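
Each row pairs a language code with a page title, which is enough to reconstruct the page address. A small sketch, assuming the usual `https://<language>.wikipedia.org/wiki/<Title>` address pattern; the function `wikipedia_url` is illustrative and not part of the package.

```python
from urllib.parse import quote

def wikipedia_url(language, title):
    """Build a page URL from a language code and a page title."""
    # Spaces become underscores; remaining special characters are percent-encoded
    return f"https://{language}.wikipedia.org/wiki/{quote(title.replace(' ', '_'))}"

# Language/title pairs copied from the preview above
rows = [("de", "Weidenblatt-Akazie"), ("en", "Acacia saligna"), ("fi", "Siniakaasia")]
for language, title in rows:
    print(language, wikipedia_url(language, title))
```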


### 1.3. Get GBIF species synonyms and common names

In [3]:
from list_mining.get_synonyms_GBIF import fetch_gbif_names_and_synonyms

df = fetch_gbif_names_and_synonyms(input_csv_path="unionconcern_invasive_species_wikipedia_links_2025.csv",
                              output_csv_path="GBIF_unionlist_synonyms.csv",
                              max_workers=10) #multithreading for speed-up

df.head()



🚀 Starting GBIF synonyms and common names fetching for species in: **unionconcern_invasive_species_wikipedia_links_2025.csv**
🔍 Found **88** unique scientific names. Processing with 10 workers...


Fetching GBIF Data: 100%|██████████| 88/88 [00:22<00:00,  3.87it/s]


✅ Data collection completed in 22.75 seconds.
📂 Saved results to **GBIF_unionlist_synonyms.csv** (Total rows: 2921)





Unnamed: 0,Scientific Name,Name,Type,Language (optional)
0,Acacia saligna,Acacia cyanophylla var. cyanophylla,Synonym,
1,Acacia saligna,Acacia bracteata Maiden & Blakely,Synonym,
2,Acacia saligna,Acacia lindleyi Meisn.,Synonym,
3,Acacia saligna,Acacia cyanophylla Lindl.,Synonym,
4,Acacia saligna,Racosperma salignum (Labill.) Pedley,Synonym,
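
The `max_workers` argument suggests that requests are spread over a pool of threads. How the package does this internally is not shown here, so the following is only a generic sketch of the pattern with `concurrent.futures`; `fetch_names` is a hypothetical stand-in for a network request.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_names(scientific_name):
    """Hypothetical stand-in for one network request per species."""
    time.sleep(0.05)  # simulate network latency
    return scientific_name, len(scientific_name)

species = ["Acacia saligna", "Ailanthus altissima", "Xenopus laevis", "Axis axis"]

# Threads suit this workload because each task mostly waits on the network
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch_names, species))  # map preserves input order

print(results)
```

More workers shorten the wall-clock time only until the remote service starts throttling requests, so a moderate value is usually the safer choice.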


## 2. Invasive alien species internet activity mining 

### 2.1. Fetching Flickr images

In [7]:
from activity_mining.get_flickr_mentions_final import (
    get_flickr_client, 
    load_species_list, 
    scrape_flickr_data,
    EU_BOUNDING_BOXES #predefined bounding box for Europe
)

#load client
flickr = get_flickr_client()

#load sp list
species_list = load_species_list("unionconcern_invasive_species_qnumbers_2025.csv")

#fetch flickr data

results_df = scrape_flickr_data(
    flickr_client = flickr,
    species_list = species_list,
    bounding_boxes= EU_BOUNDING_BOXES,
    start_date = "2022-01-01",
    end_date = "2022-12-31")

results_df.head()

✅ Loaded 88 species names.
📅 Scraping data between 2022-01-01 and 2022-12-31...


Fetching for EU:   2%|▏         | 2/88 [00:00<00:39,  2.20it/s]

Loading formatted geocoded file...


Fetching for EU: 100%|██████████| 88/88 [00:37<00:00,  2.32it/s]


Unnamed: 0,photo_id,scientific_name,country,date_taken,latitude,longitude,url,tags
0,52511929340,Ailanthus altissima,AT,2022-11-01 08:20:46,48.241958,16.393618,https://live.staticflickr.com/65535/5251192934...,ailanthusaltissima donauinsel götterbaum wien ...
1,54776815125,Alopochen aegyptiaca,FR,2022-06-03 06:24:17,45.811663,4.975644,,ouettedégypte egyptiangoose alopochenaegyptiac...
2,54641104858,Alopochen aegyptiaca,FR,2022-03-25 09:10:12,48.957035,2.50926,https://live.staticflickr.com/65535/5464110485...,bird birdwatching wildlife birding france egyp...
3,52626004432,Alopochen aegyptiaca,LU,2022-10-06 16:13:40,49.900985,5.869946,,alopochenaegyptiaca carygreisch fuussefeld ins...
4,52581534119,Alopochen aegyptiaca,GB,2022-01-31 08:46:04,51.501966,-0.132782,https://live.staticflickr.com/65535/5258153411...,sonya68 sonyalpha68 alpha68 sony alpha 68 a68 ...
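
The search above is restricted by bounding boxes. The structure of `EU_BOUNDING_BOXES` is not shown in this notebook, so the sketch below assumes the order that the Flickr search API uses for its `bbox` parameter (minimum longitude, minimum latitude, maximum longitude, maximum latitude); the box values and helper names are made up for illustration.

```python
# Hypothetical box roughly covering Belgium:
# (min_lon, min_lat, max_lon, max_lat)
BELGIUM_BOX = (2.5, 49.5, 6.4, 51.5)

def in_box(latitude, longitude, box):
    """Return True if the point lies inside the box (edges included)."""
    min_lon, min_lat, max_lon, max_lat = box
    return min_lat <= latitude <= max_lat and min_lon <= longitude <= max_lon

def to_bbox_string(box):
    """Format a box as the comma-separated string used in a search request."""
    return ",".join(str(value) for value in box)

# Coordinates of the first photo in the preview above (Vienna)
print(in_box(48.241958, 16.393618, BELGIUM_BOX))
print(to_bbox_string(BELGIUM_BOX))
```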


### 2.2. Fetching Wikipedia geolocated pageviews

### 2.3. Fetching Wikipedia language-based pageviews

### 2.4. Fetching YouTube videos

### 2.5. Fetching iNaturalist observations

In [6]:
from activity_mining.get_inaturalist_nonresearch_observations_final import run_inat_pipeline

run_inat_pipeline(species_csv_path="unionconcern_invasive_species_qnumbers_2025.csv",
                  place_id = 97391, #Europe
                  start_date="2023-01-01",
                  end_date="2023-12-31",
                  output_folder="inat_obs_nonresearch_2023")

--- Starting iNaturalist Pipeline ---
Fetching observations from 2023-01-01 to 2023-12-31 in place_id=97391...
Output folder: inat_obs_nonresearch_2023
Found 88 species to process.


Overall Species Progress:   1%|          | 1/88 [00:04<06:58,  4.81s/it]

Finished fetching 190 new non-research observations for Acacia saligna (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:   2%|▏         | 2/88 [00:06<04:23,  3.07s/it]

Finished fetching 3 new non-research observations for Acridotheres tristis (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:   3%|▎         | 3/88 [00:15<07:45,  5.48s/it]

Finished fetching 377 new non-research observations for Ailanthus altissima (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:   5%|▍         | 4/88 [00:20<07:42,  5.51s/it]

Finished fetching 227 new non-research observations for Alopochen aegyptiaca (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:   6%|▌         | 5/88 [00:22<05:48,  4.19s/it]

Finished fetching 1 new non-research observations for Alternanthera philoxeroides (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:   7%|▋         | 6/88 [00:24<04:51,  3.56s/it]

Finished fetching 14 new non-research observations for Ameiurus melas (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:   8%|▊         | 7/88 [00:26<04:08,  3.07s/it]

Finished fetching 0 new non-research observations for Andropogon virginicus (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:   9%|▉         | 8/88 [00:28<03:34,  2.68s/it]

Finished fetching 4 new non-research observations for Arthurdendyus triangulatus (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  10%|█         | 9/88 [00:31<03:31,  2.68s/it]

Finished fetching 33 new non-research observations for Asclepias syriaca (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  11%|█▏        | 10/88 [00:33<03:24,  2.62s/it]

Finished fetching 18 new non-research observations for Axis axis (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  12%|█▎        | 11/88 [00:36<03:14,  2.53s/it]

Finished fetching 22 new non-research observations for Baccharis halimifolia (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  14%|█▎        | 12/88 [00:38<02:56,  2.33s/it]

Finished fetching 0 new non-research observations for Cabomba caroliniana (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  15%|█▍        | 13/88 [00:39<02:45,  2.21s/it]

Finished fetching 3 new non-research observations for Callosciurus erythraeus (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  16%|█▌        | 14/88 [00:41<02:33,  2.08s/it]

Finished fetching 0 new non-research observations for Callosciurus finlaysonii (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  17%|█▋        | 15/88 [00:43<02:29,  2.04s/it]

Finished fetching 5 new non-research observations for Cardiospermum grandiflorum (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  18%|█▊        | 16/88 [00:46<02:36,  2.17s/it]

Finished fetching 26 new non-research observations for Celastrus orbiculatus (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  19%|█▉        | 17/88 [00:48<02:27,  2.08s/it]

Finished fetching 0 new non-research observations for Channa argus (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  20%|██        | 18/88 [00:49<02:20,  2.00s/it]

Finished fetching 1 new non-research observations for Cortaderia jubata (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  22%|██▏       | 19/88 [00:51<02:16,  1.97s/it]

Finished fetching 2 new non-research observations for Corvus splendens (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  23%|██▎       | 20/88 [00:53<02:11,  1.93s/it]

Finished fetching 0 new non-research observations for Ehrharta calycina (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  24%|██▍       | 21/88 [00:56<02:22,  2.13s/it]

Finished fetching 30 new non-research observations for Eichhornia crassipes (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  25%|██▌       | 22/88 [00:58<02:24,  2.20s/it]

Finished fetching 34 new non-research observations for Elodea nuttallii (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  26%|██▌       | 23/88 [01:00<02:22,  2.19s/it]

Finished fetching 3 new non-research observations for Eriocheir sinensis (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  27%|██▋       | 24/88 [01:02<02:15,  2.11s/it]

Finished fetching 0 new non-research observations for Faxonius rusticus (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  28%|██▊       | 25/88 [01:04<02:07,  2.03s/it]

Finished fetching 0 new non-research observations for Fundulus heteroclitus (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  30%|██▉       | 26/88 [01:06<02:01,  1.97s/it]

Finished fetching 0 new non-research observations for Gambusia affinis (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  31%|███       | 27/88 [01:09<02:16,  2.24s/it]

Finished fetching 47 new non-research observations for Gambusia holbrooki (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  32%|███▏      | 28/88 [01:15<03:32,  3.54s/it]

Finished fetching 239 new non-research observations for Gunnera tinctoria (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  33%|███▎      | 29/88 [01:17<02:58,  3.02s/it]

Finished fetching 0 new non-research observations for Gymnocoronis spilanthoides (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  34%|███▍      | 30/88 [01:19<02:37,  2.72s/it]

Finished fetching 0 new non-research observations for Hakea sericea (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  35%|███▌      | 31/88 [01:27<04:00,  4.21s/it]

Finished fetching 302 new non-research observations for Heracleum mantegazzianum (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  36%|███▋      | 32/88 [01:29<03:20,  3.58s/it]

Finished fetching 14 new non-research observations for Heracleum persicum (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  38%|███▊      | 33/88 [01:31<02:55,  3.18s/it]

Finished fetching 64 new non-research observations for Heracleum sosnowskyi (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  39%|███▊      | 34/88 [01:33<02:30,  2.79s/it]

Finished fetching 0 new non-research observations for Herpestes javanicus (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  40%|███▉      | 35/88 [01:35<02:15,  2.55s/it]

Finished fetching 4 new non-research observations for Humulus scandens (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  41%|████      | 36/88 [01:38<02:13,  2.56s/it]

Finished fetching 23 new non-research observations for Hydrocotyle ranunculoides (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  42%|████▏     | 37/88 [01:41<02:19,  2.74s/it]

Finished fetching 107 new non-research observations for Impatiens glandulifera (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  43%|████▎     | 38/88 [01:43<02:12,  2.65s/it]

Finished fetching 13 new non-research observations for Koenigia polystachya (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  44%|████▍     | 39/88 [01:45<02:00,  2.45s/it]

Finished fetching 5 new non-research observations for Lagarosiphon major (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  45%|████▌     | 40/88 [01:47<01:52,  2.34s/it]

Finished fetching 9 new non-research observations for Lampropeltis getula (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  47%|████▋     | 41/88 [01:49<01:48,  2.30s/it]

Finished fetching 21 new non-research observations for Lepomis gibbosus (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  48%|████▊     | 42/88 [01:51<01:38,  2.15s/it]

Finished fetching 0 new non-research observations for Lespedeza cuneata (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  49%|████▉     | 43/88 [01:53<01:32,  2.05s/it]

Finished fetching 0 new non-research observations for Limnoperna fortunei (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  50%|█████     | 44/88 [01:55<01:29,  2.02s/it]

Finished fetching 8 new non-research observations for Lithobates catesbeianus (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  51%|█████     | 45/88 [01:57<01:29,  2.07s/it]

Finished fetching 21 new non-research observations for Ludwigia grandiflora (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  52%|█████▏    | 46/88 [02:00<01:39,  2.36s/it]

Finished fetching 92 new non-research observations for Ludwigia peploides (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  53%|█████▎    | 47/88 [02:02<01:31,  2.22s/it]

Finished fetching 2 new non-research observations for Lygodium japonicum (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  55%|█████▍    | 48/88 [02:05<01:32,  2.31s/it]

Finished fetching 51 new non-research observations for Lysichiton americanus (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  56%|█████▌    | 49/88 [02:07<01:25,  2.20s/it]

Finished fetching 0 new non-research observations for Microstegium vimineum (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  57%|█████▋    | 50/88 [02:08<01:18,  2.07s/it]

Finished fetching 0 new non-research observations for Morone americana (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  58%|█████▊    | 51/88 [02:11<01:25,  2.32s/it]

Finished fetching 77 new non-research observations for Muntiacus reevesi (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  59%|█████▉    | 52/88 [02:18<02:06,  3.52s/it]

Finished fetching 204 new non-research observations for Myocastor coypus (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  60%|██████    | 53/88 [02:20<01:49,  3.14s/it]

Finished fetching 16 new non-research observations for Myriophyllum aquaticum (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  61%|██████▏   | 54/88 [02:22<01:34,  2.79s/it]

Finished fetching 8 new non-research observations for Myriophyllum heterophyllum (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  62%|██████▎   | 55/88 [02:24<01:25,  2.59s/it]

Finished fetching 11 new non-research observations for Nasua nasua (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  64%|██████▎   | 56/88 [02:26<01:22,  2.58s/it]

Finished fetching 38 new non-research observations for Nyctereutes procyonoides (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  65%|██████▍   | 57/88 [02:29<01:20,  2.60s/it]

Finished fetching 51 new non-research observations for Ondatra zibethicus (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  66%|██████▌   | 58/88 [02:31<01:12,  2.43s/it]

Finished fetching 8 new non-research observations for Orconectes limosus (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  67%|██████▋   | 59/88 [02:33<01:06,  2.29s/it]

Finished fetching 4 new non-research observations for Orconectes virilis (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  68%|██████▊   | 60/88 [02:35<01:00,  2.17s/it]

Finished fetching 2 new non-research observations for Oxyura jamaicensis (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  69%|██████▉   | 61/88 [02:38<01:01,  2.29s/it]

Finished fetching 35 new non-research observations for Pacifastacus leniusculus (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  70%|███████   | 62/88 [02:39<00:55,  2.14s/it]

Finished fetching 0 new non-research observations for Parthenium hysterophorus (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  72%|███████▏  | 63/88 [02:43<01:01,  2.47s/it]

Finished fetching 100 new non-research observations for Pennisetum setaceum (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  73%|███████▎  | 64/88 [02:45<00:55,  2.30s/it]

Finished fetching 3 new non-research observations for Perccottus glenii (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  74%|███████▍  | 65/88 [02:46<00:49,  2.16s/it]

Finished fetching 0 new non-research observations for Persicaria perfoliata (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  75%|███████▌  | 66/88 [02:49<00:49,  2.23s/it]

Finished fetching 28 new non-research observations for Pistia stratiotes (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  76%|███████▌  | 67/88 [02:51<00:44,  2.14s/it]

Finished fetching 1 new non-research observations for Plotosus lineatus (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  77%|███████▋  | 68/88 [02:54<00:47,  2.36s/it]

Finished fetching 66 new non-research observations for Procambarus clarkii (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  78%|███████▊  | 69/88 [02:56<00:43,  2.31s/it]

Finished fetching 17 new non-research observations for Procambarus virginalis (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  80%|███████▉  | 70/88 [02:59<00:45,  2.55s/it]

Finished fetching 83 new non-research observations for Procyon lotor (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  81%|████████  | 71/88 [03:01<00:39,  2.35s/it]

Finished fetching 0 new non-research observations for Prosopis juliflora (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  82%|████████▏ | 72/88 [03:03<00:35,  2.22s/it]

Finished fetching 5 new non-research observations for Pseudorasbora parva (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  83%|████████▎ | 73/88 [03:05<00:33,  2.26s/it]

Finished fetching 5 new non-research observations for Pueraria montana (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  84%|████████▍ | 74/88 [03:07<00:30,  2.15s/it]

Finished fetching 0 new non-research observations for Pycnonotus cafer (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  85%|████████▌ | 75/88 [03:09<00:28,  2.20s/it]

Finished fetching 31 new non-research observations for Rugulopteryx okamurae (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  86%|████████▋ | 76/88 [03:11<00:25,  2.13s/it]

Finished fetching 4 new non-research observations for Salvinia molesta (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  88%|████████▊ | 77/88 [03:15<00:29,  2.67s/it]

Finished fetching 181 new non-research observations for Sciurus carolinensis (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  89%|████████▊ | 78/88 [03:17<00:24,  2.50s/it]

Finished fetching 1 new non-research observations for Sciurus niger (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  90%|████████▉ | 79/88 [03:19<00:21,  2.34s/it]

Finished fetching 0 new non-research observations for Solenopsis geminata (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  91%|█████████ | 80/88 [03:21<00:17,  2.23s/it]

Finished fetching 2 new non-research observations for Solenopsis invicta (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  92%|█████████▏| 81/88 [03:23<00:15,  2.17s/it]

Finished fetching 0 new non-research observations for Solenopsis richteri (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  93%|█████████▎| 82/88 [03:25<00:12,  2.14s/it]

Finished fetching 3 new non-research observations for Tamias sibiricus (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  94%|█████████▍| 83/88 [03:28<00:11,  2.38s/it]

Finished fetching 95 new non-research observations for Threskiornis aethiopicus (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  95%|█████████▌| 84/88 [03:31<00:10,  2.54s/it]

Finished fetching 38 new non-research observations for Trachemys scripta elegans (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  97%|█████████▋| 85/88 [03:33<00:07,  2.34s/it]

Finished fetching 0 new non-research observations for Triadica sebifera (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  98%|█████████▊| 86/88 [03:35<00:04,  2.26s/it]

Finished fetching 3 new non-research observations for Vespa velutina nigrithorax (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress:  99%|█████████▉| 87/88 [03:37<00:02,  2.13s/it]

Finished fetching 0 new non-research observations for Wasmannia auropunctata (d1:2023-01-01, d2:2023-12-31).


Overall Species Progress: 100%|██████████| 88/88 [03:39<00:00,  2.49s/it]

Finished fetching 5 new non-research observations for Xenopus laevis (d1:2023-01-01, d2:2023-12-31).

✅ Script has finished processing all species.
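
The per-species counts in the log above can be turned into a quick summary without reopening the output folder. The following sketch parses the "Finished fetching" lines with a regular expression; it relies only on the message format visible in the log, and `counts_from_log` is a hypothetical helper.

```python
import re

# Matches the message format printed in the log above
LOG_PATTERN = re.compile(
    r"Finished fetching (\d+) new non-research observations for (.+?) \(d1:"
)

def counts_from_log(log_text):
    """Map each species name to the number of fetched observations."""
    return {m.group(2): int(m.group(1)) for m in LOG_PATTERN.finditer(log_text)}

# Three lines copied from the log above
log = """Finished fetching 190 new non-research observations for Acacia saligna (d1:2023-01-01, d2:2023-12-31).
Finished fetching 3 new non-research observations for Acridotheres tristis (d1:2023-01-01, d2:2023-12-31).
Finished fetching 377 new non-research observations for Ailanthus altissima (d1:2023-01-01, d2:2023-12-31).
"""

counts = counts_from_log(log)
# Species sorted from most to fewest observations
top = sorted(counts.items(), key=lambda item: item[1], reverse=True)
print(top)
```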





(                Scientific Name Wikidata Q-number
 0                Acacia saligna           Q402385
 1          Acridotheres tristis           Q116667
 2           Ailanthus altissima           Q159570
 3          Alopochen aegyptiaca           Q274179
 4   Alternanthera philoxeroides          Q1472735
 ..                          ...               ...
 83    Trachemys scripta elegans           Q207839
 84            Triadica sebifera           Q702175
 85   Vespa velutina nigrithorax           Q136668
 86       Wasmannia auropunctata           Q978483
 87               Xenopus laevis           Q654718
 
 [88 rows x 2 columns],
 ['Acacia saligna',
  'Acridotheres tristis',
  'Ailanthus altissima',
  'Alopochen aegyptiaca',
  'Alternanthera philoxeroides',
  'Ameiurus melas',
  'Andropogon virginicus',
  'Arthurdendyus triangulatus',
  'Asclepias syriaca',
  'Axis axis',
  'Baccharis halimifolia',
  'Cabomba caroliniana',
  'Callosciurus erythraeus',
  'Callosciurus finlaysonii',
  'C

### 2.6. Fetching GBIF observations

### 2.7. Fetching EASIN observations

In [None]:
#get EASIN credentials (prerequisite for mining EASIN data)

from EASIN_mining_and_map_generation.EASIN_API_credentials_registration import register_user

register_user() # make sure to set your EASIN email and password (EASIN_EMAIL and EASIN_PW in the .env example above) in your .env file - not shown here for safety reasons

📡 Sending registration request to EASIN...
⚠️ Unexpected response [406]: {'Message': "Generating EASIN user didn't succeed. Message: Name simon.reynaert@plantentuinmeise.be is already taken."}


In [None]:
# get EASIN union list IAS PER COUNTRY occurrence data ('presence/absence') through publicly available REST API

from EASIN_mining_and_map_generation.get_unionlist_presence_EASIN_final import fetch_easin_presence

# 1. Define your file paths
input_csv_path = "list_of_union_concern.csv"
output_csv_path = "EASIN_IAS_occurrences_EU.csv"

# 2. Call the function and capture its return values
rows, missing_species = fetch_easin_presence(input_csv=input_csv_path, #actual function call
                                             output_csv=output_csv_path)

# 3. Print an informative summary using the captured values
print("\n Data Processing Complete")
print(f"   - {len(rows)} total presence records were written to '{output_csv_path}'.")

# Estimate the species count as records per country; this assumes that
# every species has one row for each country
unique_countries = set(r['country'] for r in rows)
# Guard against division by zero if no rows are returned
if unique_countries:
    estimated_species = len(rows) // len(unique_countries)
    print(f"   - These records cover approximately {estimated_species} species.")
else:
    print("   - No country records found in the output data.")


if missing_species:
    print("\n **Species with No Confirmed Match in EASIN:**")
    print(f"   - **{len(missing_species)}** species were not matched.")
    for species in missing_species:
        print(f"    - {species}")
else:
    print("\n All input species were successfully matched and processed.")


 Data Processing Complete
   - 5368 total presence records were written to 'EASIN_IAS_occurrences_EU.csv'.
   - These records cover approximately 88 species.

 All input species were successfully matched and processed.
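
The species estimate in the summary divides the number of records by the number of countries. With the figures above, 5368 records equal 88 × 61, which is consistent with one row per species for each of 61 countries. The sketch below, on made-up rows, shows how the estimate drifts from the true count once a species lacks a row for some country; the `species` key is an assumed name, since only `country` appears in the cell above.

```python
# Made-up rows for illustration; 'species' is an assumed key name
rows = [
    {"species": "A", "country": "BE"},
    {"species": "A", "country": "FR"},
    {"species": "B", "country": "BE"},
    {"species": "B", "country": "FR"},
    {"species": "C", "country": "BE"},  # species C has no FR row
]

unique_countries = {r["country"] for r in rows}
estimated = len(rows) // len(unique_countries)  # records per country
actual = len({r["species"] for r in rows})      # direct distinct count

print(f"estimated={estimated}, actual={actual}")
```

Counting distinct species directly avoids the assumption, provided the rows carry a species identifier.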


In [None]:
# get all available (so full records) EASIN IAS occurrences using personal API credentials - fetching was paused due to long runtime, data shown below

from EASIN_mining_and_map_generation.get_EASIN_observations import run_easin_fetcher

run_easin_fetcher(species_file = "UnionList_Species_Traits_85_present.csv",
                   output_file = "EASIN_observations_BE_2010-2015.csv",
                   countries= ["BE"],
                   start_date="2010",
                   end_date="2015")

✅ Created new output file 'EASIN_observations_BE_2010-2015.csv' with 11 fixed fields.
🔄 Loading species data from UnionList_Species_Traits_85_present.csv...
🔍 Found 85 unique species IDs to process.
ℹ️ Resuming: 0 species already processed in EASIN_observations_BE_2010-2015.csv.
🔎 Date Filter:  (2010 to 2015)


🦎 Processing species:   0%|          | 0/85 [00:00<?, ?species/s, Status=Saved, Records=0]

✅ Saved 0 records for species R00053


🦎 Processing species:   1%|          | 1/85 [00:04<02:43,  1.94s/species, Status=Saved, Records=22]

✅ Saved 22 records for species R00212


🦎 Processing species:   2%|▏         | 2/85 [00:07<03:44,  2.71s/species, Status=Saved, Records=3107]

✅ Saved 3085 records for species R00460


🦎 Processing species:   4%|▎         | 3/85 [00:26<12:09,  8.90s/species, Status=Saved, Records=3107]


KeyboardInterrupt: 

In [5]:
#show first few EASIN species records

import pandas as pd

df2 = pd.read_csv("EASIN_observations_BE_2010-2015.csv")
df2.head()

Unnamed: 0,EASIN_ID,ScientificName,Country,Latitude,Longitude,DataPartner,Date,ObservationId,Reference,ReferenceUrl,Timestamp
0,R00053,,,,,,,,,,
1,R00212,Acridotheres tristis,BE,51.02226,4.26531,1.0,2002.0,1D518D40-767B-EF11-9114-B47AF17505B8,Natuurpunt,https://www.gbif.org/occurrence/1570633082,2021-01-02T00:00:00
2,R00212,Acridotheres tristis,BE,51.25525,4.20914,1.0,2004.0,990ADA7D-767B-EF11-9114-B47AF17505B8,Natuurpunt,https://www.gbif.org/occurrence/1570678649,2021-01-02T00:00:00
3,R00212,Acridotheres tristis,BE,51.06,4.72,6.0,2005.0,DF441CDB-2A95-EF11-9114-B47AF17505B8,"Bosmans J, 2010. Eerste broedgevallen van Treu...",https://www.google.com/url?sa=t&rct=j&q=&esrc=...,2018-11-12T00:00:00
4,R00212,Acridotheres tristis,BE,51.0692,4.77084,1.0,2006.0,2DED497A-777B-EF11-9114-B47AF17505B8,Natuurpunt,https://www.gbif.org/occurrence/1570633064,2021-01-02T00:00:00
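
The preview contains a row that carries only an `EASIN_ID`, and the `Date` column holds years stored as floats. As a sketch of how such a file might be re-checked against a year window after loading, the snippet below skips rows without a date and keeps those inside the range. The last sample row (2012) is invented for illustration, and `records_in_range` is a hypothetical helper.

```python
import csv
import io

# First rows follow the preview above (columns reduced); the 2012 row is invented
sample = """EASIN_ID,ScientificName,Country,Date
R00053,,,
R00212,Acridotheres tristis,BE,2002.0
R00212,Acridotheres tristis,BE,2004.0
R00212,Acridotheres tristis,BE,2012.0
"""

def records_in_range(csv_file, first_year, last_year):
    """Keep rows whose Date year falls within [first_year, last_year]."""
    kept = []
    for row in csv.DictReader(csv_file):
        if not row["Date"]:
            continue  # row without any record details
        year = int(float(row["Date"]))  # years are stored as floats, e.g. 2002.0
        if first_year <= year <= last_year:
            kept.append(row)
    return kept

kept = records_in_range(io.StringIO(sample), 2010, 2015)
print(len(kept), "record(s) between 2010 and 2015")
```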


## 3. Cleaning up internet activity data 

### 3.1. Deduplicating and geolocating Flickr images

In [18]:
# !!only works if the mined data .csv is located in the same folder as this notebook!!
from data_processing.process_flickr_images import process_flickr_data

process_flickr_data("flickr_species_observations_eu_combined_latin_normtag_2004-now.csv", "output_flickr_processing_test.csv", 100)

Deduplicating rows: 100%|██████████| 5054/5054 [00:00<00:00, 23721.05it/s]

Deduplicated & geocoded European data saved to: output_flickr_processing_test.csv
Number of rows in final CSV: 3055
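
The criteria that `process_flickr_data` uses to detect duplicates are not shown in this notebook. As a simple point of reference, the sketch below removes repeated records by `photo_id` (a column present in the mined Flickr table) and keeps the first occurrence; the helper `deduplicate` and the choice of key are assumptions.

```python
def deduplicate(records, key="photo_id"):
    """Keep the first record for each value of `key`, preserving order."""
    seen = set()
    unique = []
    for record in records:
        if record[key] in seen:
            continue
        seen.add(record[key])
        unique.append(record)
    return unique

photos = [
    {"photo_id": "52511929340", "scientific_name": "Ailanthus altissima", "country": "AT"},
    {"photo_id": "54776815125", "scientific_name": "Alopochen aegyptiaca", "country": "FR"},
    # Repeated on purpose to illustrate the removal
    {"photo_id": "52511929340", "scientific_name": "Ailanthus altissima", "country": "AT"},
]

unique_photos = deduplicate(photos)
print(len(photos), "->", len(unique_photos))
```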





### 3.2. Geolocating and pivoting iNaturalist observations

In [None]:
# !!only works if the mined data .csv is located in the same folder as this notebook!!
from data_processing.geolocate_process_inaturalist_data import process_inat_data

process_inat_data("species_inat_observations_onlycasual", "processed_inat_observations.csv")

Geolocated CSV saved to: species_inat_observations_onlycasual\Acacia_saligna_geolocated.csv
Geolocated CSV saved to: species_inat_observations_onlycasual\Acacia_saligna_geolocated.csv
Geolocated CSV saved to: species_inat_observations_onlycasual\Acridotheres_tristis_geolocated.csv
Geolocated CSV saved to: species_inat_observations_onlycasual\Acridotheres_tristis_geolocated.csv
Geolocated CSV saved to: species_inat_observations_onlycasual\Ailanthus_altissima_geolocated.csv
Geolocated CSV saved to: species_inat_observations_onlycasual\Ailanthus_altissima_geolocated.csv
Geolocated CSV saved to: species_inat_observations_onlycasual\Alopochen_aegyptiaca_geolocated.csv
Geolocated CSV saved to: species_inat_observations_onlycasual\Alopochen_aegyptiaca_geolocated.csv
Geolocated CSV saved to: species_inat_observations_onlycasual\Alternanthera_philoxeroides_geolocated.csv
Geolocated CSV saved to: species_inat_observations_onlycasual\Alternanthera_philoxeroides_geolocated.csv
Geolocated CSV saved

date_str,Scientific Name,Country,2016-01-01,2016-01-02,2016-01-03,2016-01-04,2016-01-05,2016-01-06,2016-01-07,2016-01-08,...,2025-07-07,2025-07-08,2025-07-09,2025-07-10,2025-07-11,2025-07-12,2025-07-13,2025-07-14,2025-07-15,2025-07-16
0,Acacia saligna,AL,0.0,0.0,0,0.0,0,0.0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Acacia saligna,AT,0.0,0.0,0,0.0,0,0.0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Acacia saligna,ES,0.0,0.0,0,0.0,0,0.0,0,0,...,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0
3,Acacia saligna,FR,0.0,0.0,0,0.0,0,0.0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Acacia saligna,GR,0.0,0.0,0,0.0,0,0.0,0,0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
970,Xenopus laevis,PT,0.0,0.0,0,0.0,0,0.0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
971,Xenopus laevis,RU,0.0,0.0,0,0.0,0,0.0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
972,Xenopus laevis,SE,0.0,0.0,0,0.0,0,0.0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
973,Xenopus laevis,SK,0.0,0.0,0,0.0,0,0.0,0,0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
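
The wide table above has one row per species and country and one column per day. The sketch below shows one way such a pivot could be built with pandas. It is not the code inside `process_inat_data`: the helper name `pivot_daily_counts` and the `observed_on` input column are assumptions made for illustration, and only species/country pairs with at least one observation get a row.

```python
import pandas as pd


def pivot_daily_counts(obs, start, end):
    """Pivot long-format observations into daily counts per species and country.

    Assumes (hypothetically) that `obs` has the columns
    'Scientific Name', 'Country' and 'observed_on'.
    """
    obs = obs.copy()
    # Normalise every observation date to a YYYY-MM-DD string.
    obs["date_str"] = pd.to_datetime(obs["observed_on"]).dt.strftime("%Y-%m-%d")

    # Count observations per species/country/day, then spread days into columns.
    counts = (
        obs.groupby(["Scientific Name", "Country", "date_str"])
        .size()
        .unstack("date_str", fill_value=0)
    )

    # Add a column for every day in the requested window, including days
    # without observations; days outside the window are dropped.
    all_days = pd.date_range(start, end, freq="D").strftime("%Y-%m-%d")
    counts = counts.reindex(columns=all_days, fill_value=0)
    counts.columns.name = "date_str"
    return counts.reset_index()
```

Calling `pivot_daily_counts(obs, "2016-01-01", "2025-07-16")` on a long-format table would give a frame with the same layout as the output shown above.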


### 3.3. Processing dates and pivoting GBIF data

In [None]:
# !!only works if the mined data .csv is located in the same folder as this notebook!!
from data_processing.process_GBIF_observations import process_gbif_data

process_gbif_data(
    input_file="GBIF_species_occurrences_EU.csv")

Loading data from GBIF_species_occurrences_EU.csv...
Successfully loaded 2,640,153 rows.

Parsing event dates...


Parsing event dates: 100%|██████████| 2640153/2640153 [00:57<00:00, 45602.54it/s]



--- Parsing Summary ---
Total rows: 2,640,153
Parsed successfully: 2,630,741 (99.64%)
Failed parses: 9,412 (0.36%)
Saved failed date ranges to: GBIF_species_occurrences_EU_failed_dates.csv

Creating time series from 2016-01-01 to 2025-07-13...

Processed dataset saved to: GBIF_species_occurrences_EU_processed.csv


date_str,Scientific Name,Country,2016-01-01,2016-01-02,2016-01-03,2016-01-04,2016-01-05,2016-01-06,2016-01-07,2016-01-08,...,2025-07-04,2025-07-05,2025-07-06,2025-07-07,2025-07-08,2025-07-09,2025-07-10,2025-07-11,2025-07-12,2025-07-13
0,Acacia saligna,AL,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Acacia saligna,BE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Acacia saligna,CY,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Acacia saligna,DK,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Acacia saligna,ES,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
902,Xenopus laevis,FR,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
903,Xenopus laevis,GB,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
904,Xenopus laevis,IT,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
905,Xenopus laevis,NL,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
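
The parsing summary above reports that a small share of event dates could not be parsed and that failed date ranges are written to a separate file. The sketch below illustrates one plausible rule for this kind of parsing, using only the standard library. The function name `parse_event_date` and the rule itself (accept single days, reject intervals that span more than one day) are assumptions, not the actual logic of `process_gbif_data`.

```python
from datetime import date, datetime
from typing import Optional


def parse_event_date(value) -> Optional[date]:
    """Return the day of an ISO 8601 event date, or None if no single day can be derived.

    Hypothetical rule: intervals written as 'start/end' are only accepted
    when both ends fall on the same calendar day.
    """
    if not isinstance(value, str) or not value.strip():
        return None
    text = value.strip()

    if "/" in text:
        # ISO 8601 interval: parse both ends and compare the days.
        start, _, end = text.partition("/")
        first, last = parse_event_date(start), parse_event_date(end)
        return first if first is not None and first == last else None

    try:
        # Keep only the YYYY-MM-DD part; any time component is ignored.
        return datetime.strptime(text[:10], "%Y-%m-%d").date()
    except ValueError:
        # Year-only, year-month or otherwise malformed values count as failures.
        return None
```

Applying such a function to every row and counting the `None` results would give the kind of parsed/failed summary printed above.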


## 4. Data exploration and visualizations (more in the Rmd files)

### 4.1. EASIN IAS presence map generation (HTML output, cannot be displayed in the notebook)

In [None]:
# !!only works if the mined data .csv is located in the same folder as this notebook!!
from EASIN_mining_and_map_generation.generate_html_maps_IAS_presence_EASIN import generate_species_maps

generate_species_maps(csv_file="species_by_country_presence_EASIN_updated.csv",  # input CSV
                      shapefile_dir="natural_earth",  # folder holding the Natural Earth shapefile
                      map_output_dir="easin_species_maps_output")  # folder for the generated HTML maps

Downloading Natural Earth shapefile...
Shapefile downloaded and extracted.
Saving map for Acacia saligna → easin_species_maps_output\Acacia_saligna_map.html
Saving map for Acridotheres tristis → easin_species_maps_output\Acridotheres_tristis_map.html
Saving map for Ailanthus altissima → easin_species_maps_output\Ailanthus_altissima_map.html
Saving map for Alopochen aegyptiaca → easin_species_maps_output\Alopochen_aegyptiaca_map.html
Saving map for Alternanthera philoxeroides → easin_species_maps_output\Alternanthera_philoxeroides_map.html
Saving map for Ameiurus melas → easin_species_maps_output\Ameiurus_melas_map.html
Saving map for Andropogon virginicus → easin_species_maps_output\Andropogon_virginicus_map.html
Saving map for Arthurdendyus triangulatus → easin_species_maps_output\Arthurdendyus_triangulatus_map.html
Saving map for Asclepias syriaca → easin_species_maps_output\Asclepias_syriaca_map.html
Saving map for Axis axis → easin_species_maps_output\Axis_axis_map.html
Saving map 

['easin_species_maps_output\\Acacia_saligna_map.html',
 'easin_species_maps_output\\Acridotheres_tristis_map.html',
 'easin_species_maps_output\\Ailanthus_altissima_map.html',
 'easin_species_maps_output\\Alopochen_aegyptiaca_map.html',
 'easin_species_maps_output\\Alternanthera_philoxeroides_map.html',
 'easin_species_maps_output\\Ameiurus_melas_map.html',
 'easin_species_maps_output\\Andropogon_virginicus_map.html',
 'easin_species_maps_output\\Arthurdendyus_triangulatus_map.html',
 'easin_species_maps_output\\Asclepias_syriaca_map.html',
 'easin_species_maps_output\\Axis_axis_map.html',
 'easin_species_maps_output\\Baccharis_halimifolia_map.html',
 'easin_species_maps_output\\Cabomba_caroliniana_map.html',
 'easin_species_maps_output\\Callosciurus_erythraeus_map.html',
 'easin_species_maps_output\\Callosciurus_finlaysonii_map.html',
 'easin_species_maps_output\\Cardiospermum_grandiflorum_map.html',
 'easin_species_maps_output\\Celastrus_orbiculatus_map.html',
 'easin_species_maps_ou