# Interactias Geo Selected Network

One way of examining the impact of invasive species is to look at all their interactions and the interations those organisms have with each other. This full interaction network gives you a good idea whether a species might be a "keystone" species 

The networks create from all interacting species globally can be misleading, because not all members of the network live in one place. The next step is to filter the species in the network geographically, just to find the interactions that might occur in an area.

I will harvest species interactions data from GLOBI (https://www.globalbioticinteractions.org/) to discover the species that interact with an invasive species.
I will then harvest all the interactions for those species to create two tiers of interactions.
I will then count all the occurences of these in species in GBIF for an area.
I will then create a network diagram to visualize this.

This notebook takes considerable insperation and code from Yikang Li's project on GLoBI (https://curiositydata.org/part1_globi_access/).

In [160]:
import sys
print(sys.version)

#Python 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
#pygbif 0.3.0

3.7.4 (default, Aug  9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]


In [72]:
import pandas as pd
import re
import matplotlib.pyplot as plt
from pygbif import species
from pygbif import occurrences as occ

## Load the GLoBI data

The current snapshot of GLoBI was taken on 2019-11-05 from https://depot.globalbioticinteractions.org/snapshot/target/data/tsv/interactions.tsv.gz


In [73]:
# This takes a few minutes to load in.
# the low_memory=False property will get rid of a warning, but will not help if there is really no memory left
data = pd.read_csv('C://Users//quentin//Documents//interactias//interactias//data//interactions.tsv', delimiter='\t', encoding='utf-8', low_memory=False)
len(data)

3878740

In [74]:
# Take a little look at the data to make sure it makes sense
data.head()

Unnamed: 0,sourceTaxonId,sourceTaxonIds,sourceTaxonName,sourceTaxonRank,sourceTaxonPathNames,sourceTaxonPathIds,sourceTaxonPathRankNames,sourceTaxonSpeciesName,sourceTaxonSpeciesId,sourceTaxonGenusName,...,eventDateUnixEpoch,argumentTypeId,referenceCitation,referenceDoi,referenceUrl,sourceCitation,sourceNamespace,sourceArchiveURI,sourceDOI,sourceLastSeenAtUnixEpoch
0,EOL:12001247,EOL:12001247 | OTT:133330 | IRMNG:11733708 | N...,Leptoconchus massini,species,Animalia | Mollusca | Gastropoda | Neogastropo...,EOL:1 | EOL:2195 | EOL:2366 | EOL:2447 | EOL:4...,kingdom | phylum | class | order | superfamily...,Leptoconchus massini,EOL:12001247,Leptoconchus,...,,https://en.wiktionary.org/wiki/support,"Gittenberger, A., Gittenberger, E. (2011). Cry...",10.1007/s13127-011-0039-1,,Jorrit H. Poelen. 2014. Species associations m...,FloraVincent/template-dataset,https://github.com/FloraVincent/template-datas...,,2019-03-30T23:08:44.205Z
1,EOL:12001247,EOL:12001247 | OTT:133330 | IRMNG:11733708 | N...,Leptoconchus massini,species,Animalia | Mollusca | Gastropoda | Neogastropo...,EOL:1 | EOL:2195 | EOL:2366 | EOL:2447 | EOL:4...,kingdom | phylum | class | order | superfamily...,Leptoconchus massini,EOL:12001247,Leptoconchus,...,,https://en.wiktionary.org/wiki/support,"Gittenberger, A., Gittenberger, E. (2011). Cry...",10.1007/s13127-011-0039-1,,Jorrit H. Poelen. 2014. Species associations m...,FloraVincent/template-dataset,https://github.com/FloraVincent/template-datas...,,2019-03-30T23:08:44.205Z
2,EOL:12001243,EOL:12001243 | WD:Q13393577 | OTT:550603 | WOR...,Leptoconchus inpleuractis,species,Animalia | Mollusca | Gastropoda | Neogastropo...,EOL:1 | EOL:2195 | EOL:2366 | EOL:2447 | EOL:4...,kingdom | phylum | class | order | superfamily...,Leptoconchus inpleuractis,EOL:12001243,Leptoconchus,...,,https://en.wiktionary.org/wiki/support,"Gittenberger, A., Gittenberger, E. (2011). Cry...",10.1007/s13127-011-0039-1,,Jorrit H. Poelen. 2014. Species associations m...,FloraVincent/template-dataset,https://github.com/FloraVincent/template-datas...,,2019-03-30T23:08:44.205Z
3,EOL:12001243,EOL:12001243 | WD:Q13393577 | OTT:550603 | WOR...,Leptoconchus inpleuractis,species,Animalia | Mollusca | Gastropoda | Neogastropo...,EOL:1 | EOL:2195 | EOL:2366 | EOL:2447 | EOL:4...,kingdom | phylum | class | order | superfamily...,Leptoconchus inpleuractis,EOL:12001243,Leptoconchus,...,,https://en.wiktionary.org/wiki/support,"Gittenberger, A., Gittenberger, E. (2011). Cry...",10.1007/s13127-011-0039-1,,Jorrit H. Poelen. 2014. Species associations m...,FloraVincent/template-dataset,https://github.com/FloraVincent/template-datas...,,2019-03-30T23:08:44.205Z
4,EOL:12001243,EOL:12001243 | WD:Q13393577 | OTT:550603 | WOR...,Leptoconchus inpleuractis,species,Animalia | Mollusca | Gastropoda | Neogastropo...,EOL:1 | EOL:2195 | EOL:2366 | EOL:2447 | EOL:4...,kingdom | phylum | class | order | superfamily...,Leptoconchus inpleuractis,EOL:12001243,Leptoconchus,...,,https://en.wiktionary.org/wiki/support,"Gittenberger, A., Gittenberger, E. (2011). Cry...",10.1007/s13127-011-0039-1,,Jorrit H. Poelen. 2014. Species associations m...,FloraVincent/template-dataset,https://github.com/FloraVincent/template-datas...,,2019-03-30T23:08:44.205Z


## Drop duplicates

This line gets rid of duplicate interations. I currently can't see a reason to keep them, but this perhaps should be checked. 
Some more common interactions might have more support in the literature and therefore more records. Deduplicating them tends to equal out rare interactions with common ones.

In [75]:
data.drop_duplicates(['sourceTaxonName', 'interactionTypeName', 'targetTaxonName'], inplace = True)

In [76]:
## Check how many rows are left
len(data)

1103723

## Remove ranks that are not species
Many entries in GLoBI are non-specific interations at a high taxonomic level. For example, roses are visited by bees.
I have chossen to remove these interactions from the study, at least preliminarly

In [77]:
data.drop(data[data['sourceTaxonRank'] != 'species'].index, inplace = True)
data.drop(data[data['targetTaxonRank'] != 'species'].index, inplace = True)

In [78]:
## Check how many rows are left
len(data)

536018

## Remove "interactsWith"
There are many vague interactions with the term interactsWith. These aren't really that useful, because it could be any kind of interaction, positive or negative, direct or indirect.

In [79]:
data = data[data.interactionTypeName != 'interactsWith']

len(data)

394573

## Define the key taxon for the notebook for which to find all interactions


In [157]:
taxon = "Oxalis corniculata"
#taxon = "Oxalis pes-caprae"
#taxon = "Lantanophaga pusillidactyla"
#taxon = "Lantana camara"
#taxon = "Cirsium vulgare"
#taxon = "Procyon lotor" # raccoon
#taxon = "not exist"
#taxon = "Sciurus carolinensis" # Eastern grey squirrel

In [245]:
## Define the country of interest

In [246]:
country  = 'BE' #Belgium

## Check to see if the taxon exits in GBIF

In [158]:
try:
    key = species.name_suggest(q=taxon, limit = 1)
    #print(key)
    
    if len(key) == 0:
        raise ValueError("Taxon not found on GBIF")
except ValueError as ve:
    print(ve)
    exit(1)




Taxon not found on GBIF


In [82]:
print('The taxon to be studied is ' + key[0]['scientificName'])

The taxon to be studied is Lantana camara L.


In [17]:
# What are all the types of interactions involving taxon as source taxon?
data[data['sourceTaxonName'] == taxon]['interactionTypeName'].unique()

array(['visitsFlowersOf', 'pollinates', 'hasDispersalVector'],
      dtype=object)

In [83]:
# What are all the types of interactions involving taxon as target taxon?
data[data['targetTaxonName'] == taxon]['interactionTypeName'].unique()

array(['eats', 'visitsFlowersOf', 'dispersalVectorOf', 'hasHost',
       'parasiteOf', 'mutualistOf', 'pollinates'], dtype=object)

How many taxon sources do I have?

In [84]:
len(data[data['sourceTaxonName'] == taxon])

49

How many taxon targets do I have?

In [85]:
len(data[data['targetTaxonName'] == taxon])

272

Gather together all the data where the target is the taxon in question.

In [86]:
# What are the columns of this dataset?
data.columns

Index(['sourceTaxonId', 'sourceTaxonIds', 'sourceTaxonName', 'sourceTaxonRank',
       'sourceTaxonPathNames', 'sourceTaxonPathIds',
       'sourceTaxonPathRankNames', 'sourceTaxonSpeciesName',
       'sourceTaxonSpeciesId', 'sourceTaxonGenusName', 'sourceTaxonGenusId',
       'sourceTaxonFamilyName', 'sourceTaxonFamilyId', 'sourceTaxonOrderName',
       'sourceTaxonOrderId', 'sourceTaxonClassName', 'sourceTaxonClassId',
       'sourceTaxonPhylumName', 'sourceTaxonPhylumId',
       'sourceTaxonKingdomName', 'sourceTaxonKingdomId', 'sourceId',
       'sourceOccurrenceId', 'sourceCatalogNumber', 'sourceBasisOfRecordId',
       'sourceBasisOfRecordName', 'sourceLifeStageId', 'sourceLifeStageName',
       'sourceBodyPartId', 'sourceBodyPartName', 'sourcePhysiologicalStateId',
       'sourcePhysiologicalStateName', 'interactionTypeName',
       'interactionTypeId', 'targetTaxonId', 'targetTaxonIds',
       'targetTaxonName', 'targetTaxonRank', 'targetTaxonPathNames',
       'targetTaxonPath

## Simplify the table to make it readable

## Get the primary interation data for the species in question

In [87]:
    interactDataTaxon = data[data['targetTaxonName'] == taxon]
    interactDataTaxon = interactDataTaxon.append(data[data['sourceTaxonName'] == taxon])

## Get a list of all the primary interacting species

In [90]:
interactingTaxa = pd.DataFrame(interactDataTaxon['sourceTaxonName'].append(interactDataTaxon['targetTaxonName']).unique())

## Get all the secondary interactions

In [91]:
for name in interactingTaxa:
    interactDataTaxon = interactDataTaxon.append(data[data['sourceTaxonName'] == name])

In [92]:
    cleanInteractDataTaxon = interactDataTaxon[['sourceTaxonId', 'sourceTaxonName', 'sourceTaxonRank',
       'sourceTaxonFamilyName', 'interactionTypeName',
       'targetTaxonName','targetTaxonRank',
        ]].dropna(subset=['targetTaxonName'])

In [93]:
cleanInteractDataTaxon.head()

Unnamed: 0,sourceTaxonId,sourceTaxonName,sourceTaxonRank,sourceTaxonFamilyName,interactionTypeName,targetTaxonName,targetTaxonRank
206732,EOL_V2:1050136,Streptopelia chinensis,species,Columbidae,eats,Lantana camara,species
638690,EOL:2757005,Thyreus calceatus,species,Apidae,visitsFlowersOf,Lantana camara,species
639089,GBIF:1342584,Braunsapis bouyssoui,species,Apidae,visitsFlowersOf,Lantana camara,species
640953,EOL:2759914,Xylocopa fenestrata,species,Apidae,visitsFlowersOf,Lantana camara,species
1000508,EOL:326533,Lemur catta,species,Lemuridae,dispersalVectorOf,Lantana camara,species


In [94]:
# How many different sort of interaction do I have left?
# Checking out all the interaction types
cleanInteractDataTaxon['interactionTypeName'].unique()

array(['eats', 'visitsFlowersOf', 'dispersalVectorOf', 'hasHost',
       'parasiteOf', 'mutualistOf', 'pollinates', 'hasDispersalVector'],
      dtype=object)

In [95]:
cleanInteractDataTaxon.groupby(cleanInteractDataTaxon['interactionTypeName']).size().sort_values(ascending = False)

interactionTypeName
visitsFlowersOf       129
eats                   81
parasiteOf             42
pollinates             25
hasDispersalVector     16
mutualistOf            15
hasHost                12
dispersalVectorOf       1
dtype: int64

In [96]:
len(cleanInteractDataTaxon)

321

In [97]:
cleanInteractDataTaxon.head()

Unnamed: 0,sourceTaxonId,sourceTaxonName,sourceTaxonRank,sourceTaxonFamilyName,interactionTypeName,targetTaxonName,targetTaxonRank
206732,EOL_V2:1050136,Streptopelia chinensis,species,Columbidae,eats,Lantana camara,species
638690,EOL:2757005,Thyreus calceatus,species,Apidae,visitsFlowersOf,Lantana camara,species
639089,GBIF:1342584,Braunsapis bouyssoui,species,Apidae,visitsFlowersOf,Lantana camara,species
640953,EOL:2759914,Xylocopa fenestrata,species,Apidae,visitsFlowersOf,Lantana camara,species
1000508,EOL:326533,Lemur catta,species,Lemuridae,dispersalVectorOf,Lantana camara,species


## Create a file with all the nodes and their attributes

In [98]:
# Get the source nodes
#nodes = cleanInteractDataTaxon[['sourceTaxonName']].unique().tolist()
nodes = cleanInteractDataTaxon.drop_duplicates(subset=['sourceTaxonName'])

In [99]:
# Get the target nodes
nodes = nodes.append(cleanInteractDataTaxon.drop_duplicates(subset=['targetTaxonName']))

In [100]:
nodes.head()

Unnamed: 0,sourceTaxonId,sourceTaxonName,sourceTaxonRank,sourceTaxonFamilyName,interactionTypeName,targetTaxonName,targetTaxonRank
206732,EOL_V2:1050136,Streptopelia chinensis,species,Columbidae,eats,Lantana camara,species
638690,EOL:2757005,Thyreus calceatus,species,Apidae,visitsFlowersOf,Lantana camara,species
639089,GBIF:1342584,Braunsapis bouyssoui,species,Apidae,visitsFlowersOf,Lantana camara,species
640953,EOL:2759914,Xylocopa fenestrata,species,Apidae,visitsFlowersOf,Lantana camara,species
1000508,EOL:326533,Lemur catta,species,Lemuridae,dispersalVectorOf,Lantana camara,species


In [123]:
interactingTaxa.head(3)


Unnamed: 0,0
0,Streptopelia chinensis
1,Thyreus calceatus
2,Braunsapis bouyssoui


### This function takes a name string and checks on GBIF to see if the name exists there.

In [175]:
def speciesExistsInGBIF(name, rank):
    try:
        key = species.name_suggest(q=name, rank=rank, limit = 1)
        #print(key)

        if len(key) == 0:
            return False
        else:
            return key
    except ValueError as ve:
        print(ve)
        exit(1)

### Check to see which taxa in the interaction network are found in GBIF and list those ones that are not

In [187]:
taxaNotFound = []
taxaFound = []

print('Taxa from GLoBI, but not found in GBIF')
for name in interactingTaxa.iterrows():
    GBIFName = speciesExistsInGBIF(name[1], "species")
    if GBIFName == False:
        print(name[1][0])
        taxaNotFound.append({'name': name[1][0]})
    else:
        taxaFound.append(GBIFName)
    
    #print(name[1])
    
taxaFound = pd.DataFrame(taxaFound)

Taxa from GLoBI, but not found in GBIF
Papilio polyxenes
Parrhasius m-album
Graphium choredon
Aphelinus basilicus
Lantanophaga pusillidactyla


### This function takes a GBIF species key and counts how many occurences exist.

In [234]:
def speciesCountInGBIF(key, country):
    try:
        return occ.count(taxonKey=key, country = country)
    except ValueError as ve:
        print(ve)
        exit(1)

### Loop over all the taxa that are in the interaction network and are in GBIF to find the ones that have been found in the country

In [256]:
taxaFoundInCountry = []

for GBIFtaxon in taxaFound.iterrows():
    #print('{0} found {1} time in Belgium.'.format(GBIFtaxon[1][0]['species'],speciesCountInGBIF(GBIFtaxon[1][0]['key'],country)))
    GBIFOccCount = speciesCountInGBIF(GBIFtaxon[1][0]['key'],country)
    if GBIFOccCount > 0:
        taxaFoundInCountry.append({'key': GBIFtaxon[1][0]['key'], 'species': GBIFtaxon[1][0]['species'], 'count': GBIFOccCount})

taxaFoundInCountry.count()

TypeError: count() takes exactly one argument (0 given)

In [273]:
print("The number of species left in the network is {0}".format(len(taxaFoundInCountry)))

The number of species left in the network is 20
