Releases: Big-Bee-Network/Bee-Specialization-Modeling
Release list
v0.6.1
Pollen specialist bee species are accurately predicted from visitation, occurrence and phylogenetic data
Colleen Smith1, Nick Bachelder1, Avery L. Russell2, Vanessa Morales2, Abilene R. Mosher2, Katja C. Seltmann1
1Cheadle Center for Biodiversity and Ecological Restoration, University of California, Santa Barbara, Harder South Building 578, Santa Barbara, CA 93106, U.S.A.
2Department of Biology, Missouri State University, 910 S John Q Hammons Parkway, Temple Hall, Springfield, MO 65897
This repository contains data and code for the manuscript Pollen specialist bee species are accurately predicted from visitation, occurrence and phylogenetic data currently under review in Oecologia. All files and figures can be found at
Pollen specialist bee species are accurately predicted from visitation, occurrence and phylogenetic data
Predicting specialist and generalist bee species
This repository contains data and code for the manuscript Pollen specialist bee species are accurately predicted from visitation, occurrence and phylogenetic data.
Abstract
An animal’s diet breadth is a central aspect of its life history. Yet the factors driving which species have narrow dietary breadths (specialists) and which have broad dietary breadths (generalists) remain poorly understood. Studying diet breadth in herbivorous insects is especially challenging as comprehensive datasets on plant hosts are missing for many taxa and regions. Leveraging interaction data from museum specimens and published literature can help address this gap. Here, we focus on bees to predict pollen diet breadth using machine learning models, supplemented by a bee phylogeny and occurrence data for 682 bee species native to the United States. We found that, on average, 72.9% of the visits made by pollen specialist bees were to their host plants, and that specialist bees can be predicted with high accuracy (mean 94%). Our models predicted generalist bee species, which made up a minority of the species in our dataset, with lower accuracy (mean 70%). Models tested on spatially and phylogenetically blocked data reveal that the most informative predictors of diet breadth are plant phylogenetic diversity, bee species' geographic range, and regional abundance. These results suggest we can predict specialist bee species in regions and for taxonomic groups where diet breadth is unknown, but predicting generalists may be more challenging. Additionally, we found that male bees tend to visit the same species females use as pollen hosts. Overall, our findings align with prior research, indicating that range size is predictive of diet breadth and specialist bees mostly visit their host plants.
Description of data
In final_data
| File Name | Description |
|---|---|
| AppendixS2_20Aug2024-revision.csv | List of USA bee species in GloBI dataset, their diet breadth classifications and references. |
| globi_speciesLevelFinal-12Aug2024_revision2_genera_removed.csv | Species-level data used to predict specialist and generalist bee species with random forest model. |
| fowler_formatted-15Aug2024.csv | List of bee species in our dataset and their pollen host records from Jarrod Fowler's dataset. Used to assess how often specialist bees visit their host plants. |
| russell_formatted-15Aug2024.csv | List of bee species in the GloBI dataset and their pollen host records from Avery's Russell's dataset. Used to assess how often specialist bees visit their host plants. |
| sources_table28nov2023_revision.csv | Sources of bee interaction data found in GloBI visitation dataset |
In Large data files stored on Zenodo
| File Name | Description |
|---|---|
| globi_allNamesUpdated.csv.zip | GloBI dataset with interaction records of bee species from the contiguous United States. Both plant and bee names are updated to the currently accepted species name. |
| globi_allNamesUpdated_Henriquez_Piskulich.csv.zip | GloBI dataset with interaction records of bee species from the contiguous United States. Both plant and bee names are updated to the currently accepted species names with generic changes in bees to match the phylogeny we used in the analysis. |
| globi_speciesLevelFinal-12Aug2024_revision2.csv.zip | Species-level data used to predict specialist and generalist bee species with random forest model with additional genera that were not found in the GloBI visitation dataset. |
General workflow for running the analyses:
-
Download full GloBI interaction dataset from Zenodo.
-
Filter the dataset to just include bee data using scripts/format_bee_data.sh
a. Output: all_bee_data.tsv (not included in archive)
-
Reformat the data using scripts/GLOBI_bee_update_clean.Rmd
a. Output: interactions-14dec2022.csv (not included in archive)
-
Get list of USA native bee species and calculate extent of occurence using scripts/Chesshire2023-beeArea.R
a. Output: modeling_data/Chesshire2023_nameAlignment.csv
b. Output: modeling_data/chesshire2023_beeArea-11april2023.csv
-
Update Chesshire2023_nameAlignment.csv to include information about whether name alignment was informed by geography using scripts/name-alignment-chesshire.R
a. Output: modeling_data/Chesshire2023_nameAlignment-geoUpdate.csv
-
Update bee names and filter to just be native US bees using scripts/update bee names.R
a. Output: modeling_data/large_data_files/globi_american_native_bees_7march2023.csv.zip
-
Make bee phylogeny using scirpts/make bee phylogeny_Henriquez Piskulich.R
a. Output: modeling_data/bee_phylogenetic_data_Henriquez_Piskulich_tree.csv
b. Output: modeling_data/modified_tree_Henriquez_Piskulich.nwk
-
Update plant names and make plant phylogeny using scripts/make plant phylogeny.R
a. get plant families from WFO, using scripts/update plant names.R
b. Output: modeling_data/large_data_files/globi_allNamesUpdated.csv
c. Output: modeling_data/globi_phyloDiv.csv
-
Format and update name of list of specialist bees using scripts/format_fowler_hosts.R and scripts/format_russell_hosts.R
a. Output: modeling_data/russell_formatted-30nov2023.csv
b. Output: modeling_data/fowler_formatted-30nov2023.csv
-
Update datasets with taxon names in GloBI to match Henriquez Piskulich bee phylogeny scripts/make bee phylogeny_Henriquez Piskulich.R
a. Output: globi_allNamesUpdated_Henriquez_Piskulich.csv
-
Make final dataset using scripts/finalDataset.R
a. Output: modeling_data/large_data_files/globi_speciesLevelFinal-12Aug2024_revision2.csv.zip
-
Run analyses using scripts/main_analyses.R
-
Run analysis to predict specialist host plants using scripts/predict specialist hosts.R
-
Run code to look at how often specialist bees visit their host plants using scripts/specialists-nonhosts.R
Defintions of column names in globi_speciesLevelFinal-12Aug2024_revision2.csv
| Column Name | Defintion |
|---|---|
| scientificName | bee species name |
| bee_genus | bee species' genus |
| bee_family | bee species' family |
| diet_breadth_conservative | whether bee is a pollen specialist or generalist using conservative criteria to define generalists |
| diet_breadth_liberal | whether bee is a pollen specialist or generalist using liberal criteria to define generalists |
| phylo_rich | Faith's phylogenetic diversity of plant genera visited |
| phylo_simp | phylogenetic Simpson diversity of plant genera visited |
| eigen1_plantGenus | first eigenvalue of Morisita-Horn distance-matrix for plant genera visited |
| eigen2_plantGenus | second eigenvalue of Morisita-Horn distance-matrix for plant genera visited |
| eigen1_plantFam | first eigenvalue of Morisita-Horn distance-matrix for plant families visited |
| eigen2_plantFam | second eigenvalue of Morisita-Horn distance-matrix for plant families visited |
| rich_genus | number of plant genera visited |
| simpson_genus | inverse Simpson diversity of plant genera visited |
| rich_fam | number of plant families visited |
| simpson_fam | inverse Simpson diversity of plant families visited |
| n_globi | sample size in GLOBI |
| med_lat | median latitude of bee species in Chesshire et al 2023 |
| med_long | median longitude of bee species in Chesshire et al 2023 |
| n_chesshire | sample size in Chesshire et al 2023 |
| area_m2 | area in m2 of extent of occurrence in Chesshire et al 2023 |
| area_ha | area in hectares of extent of occurrence in Chesshire et al 2023 |
| spherical_geometry | was spherical geometry used to calculate area of bee's extent of occurrence? |
| mean_doy | mean day of year of collection in Chesshire et al 2023 |
| median_doy | median day of year of collection in Chesshire et al 2023 |
| quant10 | 10% quantile of day of year of collection in Chesshire et al 2023 |
| quant90 | 90% quantile of day of year of collection in Chesshire et al 2023 |
| flight_season | 'quant90' - 'quant10' |
| eigen1 | first eigenvalue of matrix of bee phylogenetic distance |
| eigen2 | second eigenvalue of matrix of bee phylogenetic distance |
| all other columns | phylogenetic distance of bee genus to bee genus in column name |
Version 0.4.0
Publishing new release for Zenodo.
Version 0.3.0
Version 0.2.0
Version 0.1.0
Version 0.1.0 of repository Bee-Specialization-Modeling