# Invasive alien species internet activity data mining and preprocessing for iEcology-IAS-miner

In this notebook, we explore the functionality of the **iEcology-IAS-miner** Python package, which is built to seamlessly extract internet activity, images, mentions and occurrences of invasive alien species across the EU from a variety of platforms. Note that these scripts will not work unless the API keys and credentials for each platform have been set in your local `.env` file. The `.env` file can be opened in any text editor and should look something like this:

```
YT_API_KEY='<insert YouTube API key>'
FLICKR_API_KEY='<insert Flickr API key>'
FLICKR_API_SECRET='<insert Flickr secret>'
WIKI_USER_AGENT='<insert project name> (<insert personal email>)'
EASIN_EMAIL='<insert personal email>'
EASIN_PW='<insert EASIN password>'
```
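Before running any of the miners, it can help to check that every key above is actually filled in. The snippet below is a minimal sketch using only the standard library; `parse_env_text` and `missing_keys` are hypothetical helpers (not part of iEcology-IAS-miner), and the parser only handles simple `KEY='value'` lines like the ones shown, not the full `.env` syntax supported by dedicated loaders.

```python
from pathlib import Path

# Keys listed in the example .env file above
REQUIRED_KEYS = (
    "YT_API_KEY", "FLICKR_API_KEY", "FLICKR_API_SECRET",
    "WIKI_USER_AGENT", "EASIN_EMAIL", "EASIN_PW",
)


def parse_env_text(text: str) -> dict:
    """Parse simple KEY='value' lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(values: dict, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or left empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")  # hypothetical location; point this at your own .env
    if env_path.exists():
        missing = missing_keys(parse_env_text(env_path.read_text(encoding="utf-8")))
        print("Missing keys:", missing or "none")
```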

Please contact me with persistent issues at **simon.reynaert@plantentuinmeise.be**.

```python
# Set the paths correctly on the local device so that functions can be imported
import sys
import os

notebook_dir = os.getcwd()
print(f"The notebook is located here:{notebook_dir}")
src_path = os.path.abspath(os.path.join(notebook_dir, "../src"))
print(f"The functions are located here:{src_path}")

# Add to Python path if not already there
if src_path not in sys.path:
    sys.path.insert(0, src_path)
```

```
The notebook is located here:c:\Users\simon\Documents\GitHub\iEcology-IAS-miner\scripts
The functions are located here:c:\Users\simon\Documents\GitHub\iEcology-IAS-miner\src
```


## 1. Invasive alien species internet activity mining 

### 1.1. Fetching Flickr images

### 1.2. Fetching Wikipedia geolocated pageviews

### 1.3. Fetching Wikipedia language-based pageviews
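As an illustration of what language-based pageview data looks like at the source, the sketch below builds a request URL for the public Wikimedia REST per-article pageviews endpoint, one language edition at a time. It is an assumption that the package queries this endpoint; `pageviews_url` and `fetch_daily_views` are hypothetical helpers, and the article title must be the one used in that language edition, which may differ from the scientific name. Wikimedia asks API clients to send a descriptive `User-Agent`, which is presumably what `WIKI_USER_AGENT` in the `.env` file is for.

```python
import json
from datetime import date
from urllib.parse import quote
from urllib.request import Request, urlopen

PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article: str, lang: str, start: date, end: date,
                  access: str = "all-access", agent: str = "user",
                  granularity: str = "daily") -> str:
    """Build a per-article pageviews URL for one Wikipedia language edition."""
    project = f"{lang}.wikipedia"  # e.g. "de.wikipedia" for the German edition
    # Titles use underscores instead of spaces and are fully percent-encoded (incl. "/")
    title = quote(article.replace(" ", "_"), safe="")
    # agent="user" excludes traffic Wikimedia classifies as crawlers or automated
    return (f"{PAGEVIEWS_BASE}/{project}/{access}/{agent}/{title}/"
            f"{granularity}/{start:%Y%m%d}/{end:%Y%m%d}")


def fetch_daily_views(url: str, user_agent: str) -> list:
    """Download the JSON response and return (timestamp, views) pairs (needs network).

    urlopen raises HTTPError (e.g. 404) when no data exists for the title/period.
    """
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return [(item["timestamp"], item["views"]) for item in payload.get("items", [])]
```

For example, `pageviews_url("Ailanthus altissima", "de", date(2024, 1, 1), date(2024, 1, 31))` targets the German article's daily views for January 2024; repeating the call per language code gives one series per language edition.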

### 1.4. Fetching YouTube videos

### 1.5. Fetching iNaturalist observations

### 1.6. Fetching GBIF observations
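Similarly, here is a hypothetical sketch of paging through GBIF's public occurrence search API for one species in one country. The endpoint and its `scientificName`, `country`, `limit` and `offset` parameters are GBIF's; the helper names are ours, and whether the package uses this endpoint (rather than, for instance, GBIF's download service) is an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def gbif_search_url(scientific_name: str, country: str,
                    limit: int = 300, offset: int = 0) -> str:
    """Build the URL for one page of results; country is an ISO 3166-1 alpha-2 code."""
    params = {"scientificName": scientific_name, "country": country,
              "limit": limit, "offset": offset}
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


def fetch_all_occurrences(scientific_name: str, country: str, page_size: int = 300) -> list:
    """Follow offset-based paging until GBIF reports endOfRecords (needs network).

    GBIF caps how deep search paging can go, so very large result sets are
    better retrieved through its download service.
    """
    results, offset = [], 0
    while True:
        url = gbif_search_url(scientific_name, country, page_size, offset)
        with urlopen(url, timeout=30) as response:
            page = json.load(response)
        results.extend(page["results"])
        if page.get("endOfRecords", True) or not page["results"]:
            break
        offset += page_size
    return results
```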

## 2. Cleaning up internet activity data 

### 2.1. Deduplicating and geolocating Flickr images

```python
# The input file name is resolved relative to the working directory (the notebook's folder),
# so the mined Flickr .csv must be located in the same folder as this notebook
from data_processing.process_flickr_images import process_flickr_data

process_flickr_data(
    "flickr_species_observations_eu_combined_latin_normtag_2004-now.csv",
    "output_flickr_processing_test.csv",
    100,
)
```

```
Deduplicating rows: 100%|██████████| 5054/5054 [00:00<00:00, 23721.05it/s]

Deduplicated & geocoded European data saved to: output_flickr_processing_test.csv
Number of rows in final CSV: 3055
```
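The exact deduplication rule applied by `process_flickr_data` is not shown here. As a hypothetical illustration of the idea, the sketch below treats photos as duplicates when the same user photographed the same species on the same day at roughly the same spot (coordinates rounded to 2 decimals, about 1 km). The column names `owner`, `species`, `date_taken`, `latitude` and `longitude` are assumptions about the mined CSV, and every row is assumed to carry coordinates.

```python
import csv


def dedup_key(row: dict, precision: int = 2) -> tuple:
    """Hypothetical duplicate key: same user, species, day and rounded location."""
    return (
        row["owner"],
        row["species"],
        row["date_taken"][:10],  # keep YYYY-MM-DD from "YYYY-MM-DD HH:MM:SS"
        round(float(row["latitude"]), precision),
        round(float(row["longitude"]), precision),
    )


def deduplicate(rows, precision: int = 2) -> list:
    """Keep the first row for each key, preserving the input order."""
    seen = set()
    unique = []
    for row in rows:
        key = dedup_key(row, precision)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def deduplicate_csv(in_path: str, out_path: str, precision: int = 2) -> int:
    """Deduplicate a CSV file and return the number of rows written."""
    with open(in_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = deduplicate(reader, precision)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Raising `precision` makes the rule stricter (fewer photos merged); lowering it merges photos taken further apart.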





### 2.2. Geolocating and pivoting iNaturalist observations

```python
from data_processing.geolocate_process_inaturalist_data import process_inat_data

# Geolocate the per-species CSVs in the input folder and pivot them into one table
process_inat_data("species_inat_observations_onlycasual", "processed_inat_observations.csv")
```

```
Geolocated CSV saved to: species_inat_observations_onlycasual\Acacia_saligna_geolocated.csv
Geolocated CSV saved to: species_inat_observations_onlycasual\Acacia_saligna_geolocated.csv
Geolocated CSV saved to: species_inat_observations_onlycasual\Acridotheres_tristis_geolocated.csv
Geolocated CSV saved to: species_inat_observations_onlycasual\Acridotheres_tristis_geolocated.csv
Geolocated CSV saved to: species_inat_observations_onlycasual\Ailanthus_altissima_geolocated.csv
Geolocated CSV saved to: species_inat_observations_onlycasual\Ailanthus_altissima_geolocated.csv
Geolocated CSV saved to: species_inat_observations_onlycasual\Alopochen_aegyptiaca_geolocated.csv
Geolocated CSV saved to: species_inat_observations_onlycasual\Alopochen_aegyptiaca_geolocated.csv
Geolocated CSV saved to: species_inat_observations_onlycasual\Alternanthera_philoxeroides_geolocated.csv
Geolocated CSV saved to: species_inat_observations_onlycasual\Alternanthera_philoxeroides_geolocated.csv
Geolocated CSV saved
```

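| date_str | Scientific Name | Country | 2016-01-01 | 2016-01-02 | 2016-01-03 | 2016-01-04 | 2016-01-05 | 2016-01-06 | 2016-01-07 | 2016-01-08 | ... | 2025-07-07 | 2025-07-08 | 2025-07-09 | 2025-07-10 | 2025-07-11 | 2025-07-12 | 2025-07-13 | 2025-07-14 | 2025-07-15 | 2025-07-16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Acacia saligna | AL | 0.0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Acacia saligna | AT | 0.0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Acacia saligna | ES | 0.0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0 | 0 | ... | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | Acacia saligna | FR | 0.0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Acacia saligna | GR | 0.0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 970 | Xenopus laevis | PT | 0.0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 971 | Xenopus laevis | RU | 0.0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 972 | Xenopus laevis | SE | 0.0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 973 | Xenopus laevis | SK | 0.0 | 0.0 | 0 | 0.0 | 0 | 0.0 | 0 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |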
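The pivoted table above has one row per species and country and one column per day, with values that appear to be daily observation counts. A minimal pandas sketch of such a pivot follows, assuming the geolocated observations carry `Scientific Name`, `Country` and an iNaturalist-style `observed_on` date column; `pivot_daily_counts` is a hypothetical helper and not necessarily how `process_inat_data` implements it.

```python
import pandas as pd


def pivot_daily_counts(obs: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    """Count observations per species, country and day, with one column per day."""
    # Normalise dates to plain YYYY-MM-DD strings, matching the column labels above
    obs = obs.assign(date_str=pd.to_datetime(obs["observed_on"]).dt.strftime("%Y-%m-%d"))
    counts = (
        obs.groupby(["Scientific Name", "Country", "date_str"])
        .size()
        .unstack("date_str", fill_value=0)
    )
    # Add every calendar day in the window so days without observations become 0;
    # observations dated outside start..end are dropped by the reindex
    all_days = pd.date_range(start, end, freq="D").strftime("%Y-%m-%d")
    counts = counts.reindex(columns=all_days, fill_value=0)
    counts.columns.name = "date_str"
    return counts.reset_index()
```

Reindexing against the full date range is what makes empty days explicit zeros, which is why most cells in a table like the one above are 0.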
