## Characters influencing pollinator shift in the genus Fritillaria

![Fritillaria](Fritillaria_Flowers.jpg)

In [None]:
#import libraries
import pandas as pd
import re
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier, VotingClassifier, AdaBoostClassifier
from sklearn.model_selection import cross_val_score, cross_val_predict
from xgboost import plot_importance
from xgboost import plot_tree
import graphviz
from sklearn import metrics
from sklearn.preprocessing import StandardScaler, LabelEncoder
from scipy.spatial.distance import pdist, squareform
from sklearn.decomposition import PCA
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, accuracy_score, confusion_matrix, classification_report
import itertools
from functools import reduce

In [None]:
#nectar, morphology and flower structure data
data = pd.read_csv("PELNE DANE.csv", delimiter=";", decimal=",")
#amino acids, nectar properties
amino = pd.read_csv("aminokwasy.csv")
nectar = data.copy()
#reflectance
xls = pd.ExcelFile('Fritillaria Warsaw.xlsx')

In [None]:
data.info()

* Gatunek - species
* KOL - colour of the flower (1 - greenish, 2 - white, 3 - claret, 4 - yellow, 5 - orange, 6 - pink, 7 - red, 1/3 - greenish and claret)
* APZ - stomata of the outer tepal
* APW - stomata of the inner tepal
* D1 - horizontal diagonal of flower
* D2 - vertical diagonal of flower
* PS - diagonal of the entrance
* R - ratio PS / PL
* SC - scape length
* PL - petal diagonal
* PYL - anther length
* K-J - difference PL - PYL
* PYLZN - distance between stigma and anther
* PYLPL - distance between tepals and anther
* ZN - style length
* SZY - stigma length
* NB - number of flowers
* KAT - angle between flower's diagonal and scape
* V - nectar volume
* KON - nectar concentration
* MASA - nectar mass
* POW - area of nectaries
* NAS - distance between the petal and the nectary
* WYS - nectary height
* SZER - nectary width
* SZEW - diagonal of the scape
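
The two derived columns in the list above, R and K-J, follow directly from PS, PL and PYL. The sketch below recomputes them on a tiny hypothetical table; the numbers are made up for illustration and are not taken from the Fritillaria data set, and the `flowers` name is introduced only for this example.

```python
import pandas as pd

# Hypothetical measurements for two flowers (values are made up for illustration)
flowers = pd.DataFrame({
    "PS": [12.0, 30.0],   # diagonal of the entrance
    "PL": [24.0, 60.0],   # petal diagonal
    "PYL": [8.0, 15.0],   # anther length
})

# R is the ratio PS / PL, K-J is the difference PL - PYL
flowers["R"] = flowers["PS"] / flowers["PL"]
flowers["K-J"] = flowers["PL"] - flowers["PYL"]

print(flowers)
```

Because both columns are functions of other measured columns, they carry no independent information, which is consistent with dropping them before modelling.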

In [None]:
amino.info()

* Gatunek - species of the flower
* SEZON - year of study
* ORIGIN - origin of the flower
* dominujący cukier - dominant sugar
* 'ASP', 'GLU', 'ASN', 'SER', 'GLN', 'GLY', 'THR', 'VAL', 'MET', 'LEU', 'LYS', 'ARG', 'BALA', 'ALA', 'TYR', 'PRO', 'TRP', 'PHE', 'ILE', 'AABA', 'CY2', 'OSER+HIS', 'CIT', 'NVA', 'ORN', 'TAU', 'GABA', 'HYP', 'SAR', 'BABA' - amino acids in nectar
* v - nectar volume
* % - nectar concentration
* pollinator - main pollinator
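
A total amino acid concentration per sample can be obtained by summing the amino acid columns listed above. The sketch below selects those columns by name rather than by position, which is one way to stay robust to column reordering; the `samples` table and its values are hypothetical and use only three of the amino acid columns.

```python
import pandas as pd

# Hypothetical nectar samples with a subset of the columns listed above (values are made up)
samples = pd.DataFrame({
    "Gatunek": ["F. imperialis", "F. meleagris"],
    "ASP": [10.0, 1.0],
    "GLU": [20.0, 2.0],
    "PRO": [30.0, 3.0],
    "v": [100.0, 5.0],
    "%": [10.0, 40.0],
})

# Name the amino acid columns explicitly so that v and % are never included in the sum
amino_acid_columns = ["ASP", "GLU", "PRO"]
samples["Total"] = samples[amino_acid_columns].sum(axis=1)

print(samples[["Gatunek", "Total"]])
```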

In [None]:
amino.columns

In [None]:
#removing unnecessary columns
data.drop(["R", "K-J", "APZ", "APW"], axis=1, inplace=True)
amino.drop(["suma", 'SEZON'], axis=1, inplace=True)

In [None]:
#returns the subgenus of a species
def subgenus(name):
    if name in ('F. amabilis', 'F. ayakoana'):
        return "Japonica"
    elif name in ('F. affinis', 'F. gentneri', 'F. camtschatcensis',
                  'F. eastwoodiae', 'F. liliacea', 'F. recurva'):
        return "Liliorhiza"
    elif name == 'F. sewerzowii':
        return "Korolkowia"
    elif name in ('F. eduardii', 'F. imperialis', 'F. raddeana'):
        return "Petilium"
    elif name in ('F. bucharica', 'F. sthenanthera'):
        return "Rhinopetalum"
    elif name == 'F. persica':
        return "Theresia"
    elif name in ('F. grandiflora', 'F. olgae'):
        return "Other"
    else:
        return "Fritillaria"

In [None]:
#returns the main pollinator of a species
#both spellings of camtschatcensis used in this notebook are accepted,
#so the result does not depend on which one appears in the data
def pollinator(name):
    if name in ('F. eduardii', 'F. imperialis'):
        return "PAS"
    elif name in ('F. camtschatcensis', 'F. camtchatcensis', 'F. drenovskii', 'F. graeca',
                  'F. pyrenaica', 'F. thessala', 'F. ussuriensis'):
        return "FLY"
    elif name in ("F. recurva", "F. gentneri"):
        return "HUM"
    else:
        return "BEE"

In [None]:
#adding columns with the subgenus (stored in 'Subspecies') and the main pollinator
data['Subspecies'] = data['Gatunek'].apply(subgenus)
data['Pollinator'] = data['Gatunek'].apply(pollinator)

In [None]:
#mean for nectar features group by species
df = data.groupby(['Gatunek']).agg({'V': 'mean', 'KON': 'mean', 
                                    'MASA':'mean'}).reset_index()

In [None]:
df.info()

In [None]:
df.dropna(inplace=True)

In [None]:
df['Pollinator'] = df['Gatunek'].apply(pollinator)
df['Subspecies'] = df["Gatunek"].apply(subgenus)

## Flower characters

In [None]:
# mean of angle and diagonal of flower
flow_prop = data.groupby(['Gatunek']).agg({'V': 'mean','KAT': 'mean',
                                   'PS': 'mean'}).reset_index()

In [None]:
flow_prop['Pollinator'] = flow_prop["Gatunek"].apply(pollinator)

In [None]:
col = {'BEE':'#228B22', 'PAS':'#ffe732', 'FLY':"#e5247e", 'HUM':'#46c4fd'}
flow_prop["colors"] = flow_prop.Pollinator.apply(lambda x: col[x])

In [None]:
labels = []
sns.set_style("white")
for key, value in col.items():
    labels.append(key)
    #nectar volume on x and flower angle on y, matching the axis labels and annotations below
    plt.scatter(x = flow_prop.V[flow_prop.Pollinator == key], 
                y = flow_prop.KAT[flow_prop.Pollinator == key],
                s = 80,
                c = flow_prop.colors[flow_prop.Pollinator == key], alpha=0.7, edgecolors='black')

plt.yscale('log')
plt.xscale('log') 
plt.xlabel('Nectar volume (log scale)')
plt.ylabel('Angle (log scale)')
plt.title('Angle of flower vs nectar volume', fontsize=18)

plt.text(20, 64, 'F. recurva')
plt.text(65, 180, 'F. eduardii')
plt.legend(labels, title = "Pollinator", loc = 'lower left')

plt.show()

In [None]:
labels = []
sns.set_style("white")
for key, value in col.items():
    labels.append(key)
    #nectar volume on x and entrance diagonal on y, matching the axis labels below
    plt.scatter(x = flow_prop.V[flow_prop.Pollinator == key], 
                y = flow_prop.PS[flow_prop.Pollinator == key],
                s = 80,
                c = flow_prop.colors[flow_prop.Pollinator == key], alpha=0.7, edgecolors='black')

plt.yscale('log')
plt.xscale('log') 
plt.xlabel('Nectar volume (log scale)')
plt.ylabel('Diagonal (log scale)')
plt.title('Diagonal vs nectar volume', fontsize=18)

#plt.xticks([1000,10000,100000], ['1k','10k','100k'])
#plt.text(120, 24, 'F.imperialis')
#plt.text(70, 4, 'F.eduardi')

plt.legend(labels, title = "Pollinator", loc = 'lower right')
plt.show()

In [None]:
data.info()

In [None]:
#removing columns with too many missing values (NB, SZEW, SZY) and the nectar properties
data.drop(["V", "MASA", "KON","Subspecies", "D1", "D2", "NB", "KAT", "SZY", "SZEW","PS", "SC","ZN", "NAS", "WYS", "SZER"], axis=1, inplace=True)

In [None]:
columns = data.columns
columns[2:12]

In [None]:
#filling NAs with the mean value for each species
for column in columns[2:7]:
    data[column] = data[column].fillna(data.groupby('Gatunek')[column].transform('mean'))

In [None]:
data = data.dropna()

In [None]:
data.drop("Gatunek", axis=1, inplace=True)

In [None]:
data.info()

In [None]:
sns.pairplot(data)
plt.show()

In [None]:
data['PYL'].corr(data['PL'])

In [None]:
data.drop("PL", axis=1, inplace=True)

In [None]:
sns.pairplot(data.iloc[:,1:10])
plt.show()

In [None]:
X = data.drop("Pollinator", axis=1)
y = data.Pollinator

In [None]:
y.value_counts()

In [None]:
X = pd.get_dummies(X,drop_first=True)
columnX = X.columns

In [None]:
le = LabelEncoder()
le.fit(y)
y_e = le.transform(y)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y_e, test_size=0.25, random_state=42)

In [None]:
models = [LogisticRegression(),
          DecisionTreeClassifier(),
          SVC(probability=True),  
          RandomForestClassifier(), 
          AdaBoostClassifier(),
         xgb.XGBClassifier()]

for model in models:

    model.fit(X_train, y_train)
    #classifiers already return class labels, so no rounding is needed
    y_pred = model.predict(X_test)

    accuracy = accuracy_score(y_test, y_pred)
    print("%s accuracy: %.2f%%" % (model.__class__.__name__,accuracy * 100.0))

In [None]:
model = RandomForestClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
conf_mat = confusion_matrix(y_test, y_pred)

In [None]:
#plotting confusion matrix
def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')


In [None]:
#class names in the order used by the LabelEncoder (alphabetical), so the ticks match the matrix rows
labels = list(le.classes_)

In [None]:
# Plot non-normalized confusion matrix
plt.figure()
plot_confusion_matrix(conf_mat, classes=labels,
                      title='Confusion matrix')
plt.show()

In [None]:
importances = model.feature_importances_
indices = np.argsort(importances)[::-1]

# Rearrange feature names so they match the sorted feature importances
names = [columnX[i] for i in indices]

# Create plot
plt.figure()

# Create plot title
plt.title("Feature Importance")

# Add bars
plt.bar(range(X.shape[1]), importances[indices])

# Add feature names as x-axis labels
plt.xticks(range(X.shape[1]), names, rotation=90)

# Show plot
plt.show()

In [None]:
print(classification_report(y_test, y_pred))

Flower features can determine which pollinators, or groups of pollinators, are attracted. Moreover, the character and the location of the reward influence which species are attracted. Our Fritillaria study indicates that the quality and quantity of the reward offered in the flower might be the foundation for a pollinator switch. However, such a shift seems unlikely to generate reproductive isolation. Our analysis indicated flower colour (orange), the area of the nectary (POW) and anther length (PYL) as the key features causing pollinator shift.
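
Impurity-based importances from a random forest can be cross-checked with permutation importance on held-out data. The sketch below is not the analysis used in this notebook; it runs scikit-learn's `permutation_importance` on synthetic data in which only one column (named `signal` here, a hypothetical stand-in) carries class information, to show how such a check could be set up.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-ins, NOT the Fritillaria data: only "signal" determines the class
signal = rng.normal(size=n)
noise_a = rng.normal(size=n)
noise_b = rng.normal(size=n)
X_demo = np.column_stack([signal, noise_a, noise_b])
y_demo = (signal > 0).astype(int)
feature_names = ["signal", "noise_a", "noise_b"]

# Stratified split keeps the class proportions equal in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Shuffle one column at a time on the test set and measure the drop in accuracy
result = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=0)
ranking = [feature_names[i] for i in np.argsort(result.importances_mean)[::-1]]

print(ranking)
```

If the features ranked highest by `feature_importances_` also rank highest here, the conclusion about key characters is less likely to be an artefact of how tree-based importances are computed.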

## Reflectance

Flowers serve as filters, signalling attractants for pollinators as well as deterrents for flower antagonists. Due to differences in the number of photoreceptor types, the spectral range of sensitivity of the colour photoreceptors, and differences in the colour preference of various flower visitors, flowers can be attractive to pollinators and unattractive to illegitimate visitors, i.e. pollen thieves and nectar robbers, at the same time. Since many flower visitors, such as hummingbirds, bees, hoverflies, beetles, butterflies and moths, are sensitive to ultraviolet light, these flower visitors necessarily view different aspects of the spectral reflectance of flowers as compared to UV-blind humans (Verhoeven et al. 2018).

![Spectral sensitivity](spectral_sensitivity.jpg)
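
Because UV-sensitive visitors and UV-blind humans sample different parts of the spectrum, it can help to summarise a reflectance curve separately per band. The sketch below averages a made-up spectrum over a UV band and a visible band; the 400 nm boundary between the bands and all values are assumptions chosen for illustration, not measurements from the workbook.

```python
import numpy as np
import pandas as pd

# Hypothetical spectrum: wavelength in nm and a made-up reflectance curve
spectrum = pd.DataFrame({"nm": np.arange(300, 701, 50)})
spectrum["reflectance"] = [2.0, 4.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]

# Assumed band limits: lower bound inclusive, upper bound exclusive
bands = {"UV": (300, 400), "visible": (400, 700)}

band_means = {
    name: spectrum.loc[(spectrum.nm >= low) & (spectrum.nm < high), "reflectance"].mean()
    for name, (low, high) in bands.items()
}

print(band_means)
```

A low UV mean next to a high visible mean would describe a flower that looks bright to humans but is UV-absorbing for insects.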

In [None]:
names = xls.sheet_names

### Nectaries

In [None]:
#searching for "nectaries" in sheets name
nectaries = []
pattern = ".+nectari"
nectaries = [x for x in names if re.match(pattern,x)]

In [None]:
#getting indexes of "nectaries" sheets
index_nec = []
for i in range(len(nectaries)):
    index_nec.append(names.index(nectaries[i]))   

In [None]:
#nectaries sheets
sheets_nect = []
for i in range(len(nectaries)):
    sheet = xls.parse(index_nec[i])
    sheets_nect.append(sheet)

In [None]:
#columns with reflectance values
columns_ref = sheets_nect[0].columns[1::2]

In [None]:
#selecting UV & visible spectrum
for i in range(len(sheets_nect)):
    sheets_nect[i] = sheets_nect[i][(sheets_nect[i].nm > 300) & (sheets_nect[i].nm < 700)]

In [None]:
#mean of the reflectance values
for i in range(len(sheets_nect)):
    sheets_nect[i] = sheets_nect[i].assign(mean=sheets_nect[i][columns_ref].mean(axis=1)/1000)

In [None]:
#list of analysed species
species = ["F. imperialis","F. michailovskyi",'F. whittallii','F. tubiformis','F. gracilis',
           'F. eduardii','F. minima','F. thunbergii','F. eastwoodiae', "F. liliaceae", 
           "F. liliaceae", "F. kurdica",'F. pyrenaica', 'F. pontica', 'F. thunbergii',
           'F. uva vulpis', 'F. affinis','F. montana',  'F. aryiana', 'F. dasyphylla', 
           'F. verticillata','F. serpenticola', 'F. mutabilis', 'F. raddeana', 'F. graeca', 
           'F. caucasica', 'F. gibbosa', 'F. armena','F. stenanthera','F. ussuriensis',
           'F. kotschyana','F. meleagris', "F. michailovskyi", 'F. latakesis', 'F. gentneri',
          'F. recurva', 'F. sewerzowii']

In [None]:
#changing labels for species name
nect_sp=[]
for i in range(len(nectaries)):
    nect_sp.append(nectaries[i].replace('Sp'+str(i+1), species[i]))

In [None]:
nect_species = []
for i in range(len(nect_sp)):
    nect_species.append(nect_sp[i].replace(" nectaries", ""))

In [None]:
col_nect = ("#000000", "#ff1493", "#187e03", "#a8224a", "#58559c", "#ffb653", "#0d8dd7", 
            "#eaec32", "#ff0000", "#3701fe", "#fff138", "#767676")

In [None]:
import matplotlib as mpl
N = 37
l_styles = ['-','--','-.',':']
m_styles = ['','.','o','^','*']
colormap = mpl.cm.Dark2.colors   # Qualitative colormap
for i,(marker,linestyle,color) in zip(range(N),itertools.product(m_styles,l_styles, col_nect)):
    plt.plot(sheets_nect[i]["nm"],sheets_nect[i]["mean"], 
             #color=mpl.colors.to_hex(col_nect[i]), 
             color = color, 
             markersize=1,
             linestyle=linestyle, marker=marker, label=nect_species[i])
    plt.ylim(0,70)
    #plt.title("Nectaries reflectancy", fontsize=16)
plt.xlabel("Wavelength [nm]")
plt.ylabel("Reflectance")
#plt.figsize=(10,10)
plt.legend(bbox_to_anchor=(1.1, 1), loc=2, borderaxespad=0.,ncol=2)
#plt.savefig('Nectaries.jpg', bbox_inches="tight", dpi=1000)
plt.show()

### Petal outside

In [None]:
outside = []
pattern = ".+ou.+"
outside = [x for x in names if re.match(pattern,x)]

In [None]:
indexes_outside = []
for i in range(len(outside)):
    indexes_outside.append(names.index(outside[i]))

sheets_outside = []
for i in range(len(outside)):
    sheet = xls.parse(indexes_outside[i])
    sheets_outside.append(sheet)

In [None]:
#returns UV & visible spectrum
for i in range(len(sheets_outside)):
    sheets_outside[i] = sheets_outside[i][(sheets_outside[i].nm > 300) & (sheets_outside[i].nm < 700)]

In [None]:
#returns mean of wavelengths (relative)
for i in range(len(sheets_outside)):
    sheets_outside[i] = sheets_outside[i].assign(mean=sheets_outside[i][columns_ref].mean(axis=1)/1000)

In [None]:
numbers = []
for i in range(1,38):
    numbers.append("Sp"+str(i))
nms = list(zip(numbers, species))
outside_species = []
for i in range(11):
    outside_species.append(reduce(lambda a, kv: a.replace(*kv), nms, outside[i])) 

In [None]:
for i in range(11, len(outside)):
    outside_species.append(reduce(lambda a, kv: a.replace(*kv), nms[9:], outside[i])) 

In [None]:
N = 45
l_styles = ['-','--','-.',':']
m_styles = ['','.','o','^','*']
colormap = mpl.cm.Dark2.colors   # Qualitative colormap
for i,(marker,linestyle,color) in zip(range(N),itertools.product(m_styles,l_styles, col_nect)):
    plt.plot(sheets_outside[i]["nm"],sheets_outside[i]["mean"], 
             #color=mpl.colors.to_hex(col_nect[i]), 
             color = color, 
             markersize=1,
             linestyle=linestyle, 
             marker=marker, label=outside_species[i])
plt.title("Tepal reflectance", fontsize=16)
plt.xlabel("Wavelength [nm]")
plt.ylabel("Reflectance")
#plt.figsize=(20,20)
plt.legend(bbox_to_anchor=(1.1, 1), loc=2, borderaxespad=0.,ncol=2)
#plt.savefig('Outside.jpg', bbox_inches="tight", dpi=1000)
plt.show()

In [None]:
#grouping species by color
green = ['F. whittallii', 'F. pontica', 'F. thunbergii', 'F. verticillata']
orange = ["F. imperialis",'F. eduardii']
pink = ['F. aryiana', 'F. gibbosa','F. stenanthera']
purple = ["F. michailovskyi",'F. tubiformis','F. gracilis', "F. kurdica",'F. pyrenaica', 
           'F. uva vulpis', 'F. affinis','F. montana', 'F. mutabilis', 'F. graeca', 
           'F. caucasica', 'F. armena','F. ussuriensis','F. kotschyana','F. meleagris', 
          "F. michailovskyi", 'F. latakesis', 'F. sewerzowii']
red = ['F. eastwoodiae', 'F. gentneri', 'F. recurva']
white = ["F. liliaceae"]
yellow = ['F. minima','F. thunbergii','F. dasyphylla', 'F. serpenticola', 'F. raddeana']
flower_colors = ["Green", "Orange", "Pink", "Red", "White", "Yellow"]
flower = [green, orange, pink, red, white, yellow]
col_ref = ['#228B22', "#e5247e", "#00447C", 'y', '#991188',"#ffe732", 'k', '#46c4fd']

### Petal reflectance by flower colour

In [None]:
for z in range(len(flower)):
    #indexes of the sheets whose species belong to this colour group
    matching = [i for i, x in enumerate(outside_species) for keyword in flower[z] if keyword in x]
    plt.figure()
    for i in range(len(matching)):  
        plt.plot(sheets_outside[matching[i]]["nm"], sheets_outside[matching[i]]["mean"], 
                c = col_ref[i],
                label = outside_species[matching[i]])
    plt.legend()
    plt.ylim(0,70)
    plt.xlabel("Wavelength [nm]")
    plt.ylabel("Reflectance")
    plt.title(flower_colors[z])
    #saving plot with proper name
    #plt.savefig('{}.jpg'.format(flower_colors[z]))
plt.show()

In [None]:
matching_purple = [i for i, x in enumerate(outside_species) for keyword in purple if keyword in x]
N=27
l_styles = ['-','--','-.',':']
m_styles = ['','.','o','^','*']
colormap = mpl.cm.Dark2.colors   # Qualitative colormap
for i,(marker,linestyle,color) in zip(range(N),itertools.product(m_styles,l_styles, col_nect)):
    plt.plot(sheets_outside[matching_purple[i]]["nm"],sheets_outside[matching_purple[i]]["mean"], 
             #color=mpl.colors.to_hex(col_nect[i]), 
             color = color, 
             markersize=1,
             linestyle=linestyle, marker=marker, label=outside_species[matching_purple[i]])
plt.ylim(0,70)
plt.title("Purple")
plt.xlabel("Wavelength [nm]")
plt.ylabel("Reflectance")
#plt.figsize=(10,10)

plt.legend(bbox_to_anchor=(1.2, 1), loc=2, borderaxespad=0.,ncol=2)
plt.savefig('purple.jpg', bbox_inches="tight", dpi=1000)
plt.show()

### Petal inside

In [None]:
inside = []
pattern = ".+in.+"
inside = [x for x in names if re.match(pattern,x)]

In [None]:
#getting inside sheets
indexes_inside = []
for i in range(len(inside)):
    indexes_inside.append(names.index(inside[i]))

sheets_inside = []
for i in range(len(inside)):
    sheet = xls.parse(indexes_inside[i])
    sheets_inside.append(sheet)

In [None]:
#UV & visible spectrum
for i in range(len(sheets_inside)):
    sheets_inside[i] = sheets_inside[i][(sheets_inside[i].nm > 300) & (sheets_inside[i].nm < 700)]

In [None]:
#returns mean of wavelengths
for i in range(len(sheets_inside)):
    sheets_inside[i] = sheets_inside[i].assign(mean=sheets_inside[i][columns_ref].mean(axis=1)/1000)

In [None]:
inside_species = []
for i in range(11):
    inside_species.append(reduce(lambda a, kv: a.replace(*kv), nms, inside[i]))

In [None]:
#use the inside sheet names here (the outside names were used by mistake)
for i in range(11, len(inside)):
    inside_species.append(reduce(lambda a, kv: a.replace(*kv), nms[9:], inside[i]))

In [None]:
N = 45
l_styles = ['-','--','-.',':']
m_styles = ['','.','o','^','*']
colormap = mpl.cm.Dark2.colors   # Qualitative colormap
for i,(marker,linestyle,color) in zip(range(N),itertools.product(m_styles,l_styles, col_nect)):
    plt.plot(sheets_inside[i]["nm"],sheets_inside[i]["mean"], 
             #color=matplotlib.colors.to_hex(col[i]), 
             color = color, markersize=1,
             linestyle=linestyle, marker=marker, label=inside_species[i])
#plt.figsize=(10,10)
plt.title("Tepal inside reflectance", fontsize=16)
plt.xlabel("Wavelength [nm]")
plt.ylabel("Reflectance")
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.,ncol=3)
#plt.savefig('Inside.jpg', bbox_inches="tight", dpi=1000)
plt.show()

The wax layer on the petal surface causes the lack of reflectance in the UV range. The red colour of the bird-pollinated species is not cryptic (invisible) to insect pollinators: red flowers (F. gentneri) remain visible to insects, because the red coloration activates receptors other than the green receptor in insect eyes.

In all cases the reflectance values were similar among most of the studied species. This indicates that colour as a floral character seems to be conservative, which means that pollinator preferences play a minor role. However, floral colour proved to be one of the most important characters deciding whether a pollinator shift occurs.
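
The statement that reflectance curves are similar among species can be made quantitative with a pairwise distance matrix between spectra. The sketch below uses `pdist` and `squareform` from SciPy on three made-up curves; the Euclidean metric and the curves themselves are assumptions chosen for illustration, not the method or data behind the conclusion above.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Made-up reflectance curves sampled at the same five wavelengths (300..700 nm)
wavelengths = np.arange(300, 701, 100)
spectra = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],   # hypothetical species A
    [1.0, 2.0, 3.0, 4.0, 6.0],   # hypothetical species B, close to A
    [5.0, 5.0, 5.0, 5.0, 5.0],   # hypothetical species C, flat curve
])

# Condensed pairwise distances expanded into a symmetric species-by-species matrix
distances = squareform(pdist(spectra, metric="euclidean"))

print(np.round(distances, 2))
```

Small off-diagonal values indicate similar curves; a species with consistently large distances to all others would be a candidate for a distinct colour signal.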

## Nectar properties

In [None]:
col = {'BEE':'#228B22', 'PAS':'#ffe732', 'FLY':"#e5247e", 'HUM':'#46c4fd'}
df["colors"] = df.Pollinator.apply(lambda x: col[x])

In [None]:
labels =[]
sns.set_style("white")
for key, value in col.items():
    labels.append(key)
    plt.scatter(x = df.V[df.Pollinator == key], y = df.KON[df.Pollinator == key],
            s = np.array(df.MASA[df.Pollinator == key]) * 10000, 
            c = df.colors[df.Pollinator == key], alpha=0.7, edgecolors='black')

#plt.yscale('log')
plt.xscale('log') 
plt.xlabel('Nectar volume (log scale)')
plt.ylabel('Nectar concentration')
plt.title('Nectar volume vs nectar concentration', fontsize=18)
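plt.xlim(right=350)   #a log-scaled axis cannot start at 0, so only the upper limit is set

#plt.xticks([1000,10000,100000], ['1k','10k','100k'])
plt.text(120, 24, 'F. imperialis')
plt.text(70, 4, 'F. eduardii')
lgnd = plt.legend(labels, title='Color by pollinator')
#the attribute is legend_handles in matplotlib >= 3.7 and legendHandles in older releases
handles = lgnd.legend_handles if hasattr(lgnd, "legend_handles") else lgnd.legendHandles
for handle in handles:
    handle.set_sizes([60])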

plt.text(67,55, "Size by mass")
plt.scatter(110, 51, s=100, c = 'white', edgecolors='k')
plt.text(140, 49.8,  r'$0.1 \mu g$')
# Show the plot
plt.show()

In [None]:
import plotly as py
import plotly.graph_objs as go
import ipywidgets as widgets 
import numpy as np
from scipy import special
py.offline.init_notebook_mode(connected = True)
data = df
#df = df.sort_values(['pollinator'])
slope = 2.666051223553066e-05
hover_text = []
bubble_size = []

for index, row in df.iterrows():
    hover_text.append(('Species: {Gatunek}<br>'+
                      'Mass: {MASA}<br>'
                      ).format(Gatunek=row['Gatunek'],
                                            MASA=round(row['MASA'],3),
                                            ))
    bubble_size.append(row['MASA']*1000)

df['text'] = hover_text
df['size'] = bubble_size
sizeref = 2.*max(df['size'])/(100**2)

#one trace per pollinator group, coloured as in the static plots
pollinator_colors = {'BEE': '#228B22', 'PAS': '#ffe732', 'FLY': '#e5247e', 'HUM': '#46c4fd'}

data = []
for name, color in pollinator_colors.items():
    group = df[df['Pollinator'] == name]
    data.append(go.Scatter(
        x=group['V'],
        y=group['KON'],
        mode='markers',
        name=name,
        text=group['text'],
        marker=dict(
            symbol='circle',
            sizemode='area',
            sizeref=sizeref,
            size=group['size'],
            color=color,
            line=dict(
                width=2
            ),
        )
    ))
layout = go.Layout(
    title='Nectar volume vs nectar concentration',
    xaxis=dict(
        title='Nectar volume (log scale)',
        gridcolor='rgb(255, 255, 255)',
        #range=[2.003297660701705, 5.191505530708712],
        type='log',
        zerolinewidth=1,
        hoverformat = '.2f',
        ticklen=5,
        gridwidth=2,
    ),
    yaxis=dict(
        title='Nectar concentration',
        gridcolor='rgb(255, 255, 255)',
        #range=[36.12621671352166, 91.72921793264332],
        zerolinewidth=1,
        ticklen=5,
        hoverformat = '.2f',
        gridwidth=2,
    ),
    paper_bgcolor='rgb(243, 243, 243)',
    plot_bgcolor='rgb(243, 243, 243)',
)

fig = dict(data=data, layout=layout)
py.offline.iplot(fig)

In [None]:
col = {'Fritillaria':'#228B22', 'Rhinopetalum':'red', 'Liliorhiza':"#e5247e", 
       'Petilium':'#46c4fd', 'Other': "#00447C", 'Theresia': '#991188','Korolkowia': "#ffe732", "Japonica":'k'}

In [None]:
df["colors"] = df.Subspecies.apply(lambda x: col[x])

In [None]:
labels =[]
for key, value in col.items():
    labels.append(key)
    plt.scatter(x = df.V[df.Subspecies == key], y = df.KON[df.Subspecies == key],
            s = np.array(df.MASA[df.Subspecies == key]) * 10000, 
            c = df.colors[df.Subspecies == key], alpha=0.7, edgecolors='black')

#plt.yscale('log')
plt.xscale('log') 
plt.xlabel('Nectar volume (log scale)')
plt.ylabel('Nectar concentration')
plt.title('Nectar volume vs nectar concentration', fontsize=16)
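plt.xlim(right=450)   #a log-scaled axis cannot start at 0, so only the upper limit is set
#plt.xticks([1000,10000,100000], ['1k','10k','100k'])
plt.text(120, 24, 'F. imperialis')
plt.text(70, 4, 'F. eduardii')
lgnd = plt.legend(labels, title='Color by subgenus', loc="lower left")
#the attribute is legend_handles in matplotlib >= 3.7 and legendHandles in older releases
handles = lgnd.legend_handles if hasattr(lgnd, "legend_handles") else lgnd.legendHandles
for handle in handles:
    handle.set_sizes([60])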

plt.text(0.23,75, "Size by mass")
plt.scatter(0.32, 70, s=100, c = 'white', edgecolors='k')
plt.text(0.4, 69,  r'$0.1 \mu g$')
# Show the plot
plt.show()

In [None]:
#total concentration of AA's in nectar
amino["Total"] = amino.iloc[:,3:33].sum(axis = 1)

In [None]:
#mean concentration of AAs, nectar volume and nectar concentration
#numeric_only=True skips the text columns, which cannot be averaged
amino_mean = amino.groupby('Gatunek').mean(numeric_only=True).reset_index()

In [None]:
amino_mean["Pollinator"] = amino_mean["Gatunek"].apply(pollinator)

In [None]:
col = {'BEE':'#228B22', 'PAS':'#ffe732', 'FLY':"#e5247e", 'HUM':'#46c4fd'}
amino_mean["colors"] = amino_mean.Pollinator.apply(lambda x: col[x])

In [None]:
labels =[]
sns.set_style("white")
for key, value in col.items():
    labels.append(key)
    plt.scatter(x = amino_mean.v[amino_mean.Pollinator == key], 
                y = amino_mean.Total[amino_mean.Pollinator == key],
                s = 80,
                c = amino_mean.colors[amino_mean.Pollinator == key], alpha=0.7, edgecolors='black')

plt.yscale('log')
plt.xscale('log') 
plt.xlabel('Mean nectar volume (log scale)')
plt.ylabel('Total concentration of amino acids (log scale)')
plt.title('Nectar volume vs total concentration of amino acids', fontsize=16)

plt.text(100, 15000, 'F. imperialis')
plt.text(55, 50000, 'F. eduardii')
#lgnd = plt.legend(labels, title='Color by pollinator')
#for i in range(len(col)):
#    lgnd.legendHandles[i]._sizes = [60]
plt.legend(labels, title = "Pollinator", loc = 'upper left')
# Show the plot
plt.show()

In [None]:
#Exploratory analysis of aminoacids in nectar by pollinator
df2 = amino.drop(["Gatunek", "ORIGIN", "dominujący cukier"], axis=1)
#reshaping df
df2 = pd.melt(df2, id_vars=['pollinator'])
df2.info()

In [None]:
#selecting essential AAs except GLN
essential = df2[df2.variable.isin(["ASP", "GLU", "SER", "PRO", "GLY", "ALA", "VAL"])]

In [None]:
#boxplots grouped by pollinator
plt.figure(figsize=(15,8))
sns.set_style("white")
ax = sns.boxplot(x="variable", y="value", hue="pollinator", data=essential, linewidth=1, width=1,
                 palette=["#228B22", "#e5247e", "#ffe732", "#00447C","#991188"])
plt.xlabel("Amino acid")
plt.ylabel("Value")
plt.legend(title="Pollinator")
plt.title("Essential AAs by pollinator", fontsize=20)
#ax.set_facecolor('white')
ax.yaxis.label.set_color('k')
ax.xaxis.label.set_color('k')
ax.set_ylim(0,3000)
plt.show()

In [None]:
df2.variable.unique()

In [None]:
#boxplots grouped by pollinator
sns.set_style("white")
plt.figure(figsize=(15,8))
ax = sns.boxplot(x="variable", y="value", hue="pollinator", 
                 data=df2[(df2.variable!="GLN") & (df2.variable !="Total") &
                         (df2.variable != 'v') & (df2.variable != '%')], 
                 linewidth=1, width=1,
                 palette=["#228B22", "#e5247e", "#ffe732", "#00447C","#991188"])
plt.xlabel("Amino acids")
plt.ylabel("Value")
plt.legend(title="Pollinator")
plt.title("Amino acids by pollinator", fontsize = 20)
plt.xticks(rotation=45)
ax.set_facecolor('white')
ax.yaxis.label.set_color('k')
ax.xaxis.label.set_color('k')
ax.set_ylim(0,3000)
plt.show()

In [None]:
sns.boxplot(data=df2[df2.variable == "GLN"], x='pollinator', y='value', 
           palette=["#228B22", "#e5247e", "#ffe732", "#00447C","#991188"])
plt.xlabel("Pollinator")
plt.ylabel("Value")
plt.title("GLN concentration by pollinator", fontsize = 18)
plt.show()

In [None]:
sns.boxplot(data=df2[df2.variable == "Total"], x='pollinator', y='value', 
           palette=["#228B22", "#e5247e", "#ffe732", "#00447C","#991188"])
plt.xlabel("Pollinator")
plt.ylabel("Value")
plt.title("Total concentration of AAs by pollinator", fontsize=18)
plt.show()

In [None]:
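#preparing data
X = amino.drop(["pollinator", "Gatunek", "Total", "v", "%", "ORIGIN", "dominujący cukier" ], axis=1)
#integer class codes in alphabetical order (BEE, FLY, HUM, LEP, PAS);
#recent xgboost releases reject string class labels
y = pd.Series(LabelEncoder().fit_transform(amino.pollinator), index=amino.index, name="pollinator")
index = X.columns
X = StandardScaler().fit_transform(X)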

In [None]:
models = [LogisticRegression(), SVC(), DecisionTreeClassifier(), RandomForestClassifier(),
         AdaBoostClassifier(), xgb.XGBClassifier(),
          BaggingClassifier(DecisionTreeClassifier(), bootstrap=False)]

for m in models:
    
    print("%s %.1f%%" % (m.__class__.__name__, cross_val_score(m,X,y,cv=3).mean()*100))

In [None]:
model = xgb.XGBClassifier()
print(cross_val_score(model,X,y,cv=3).mean())
model.fit(X, y)
# Sort feature importances in descending order
importances = model.feature_importances_
indices = np.argsort(importances)[::-1]

# Rearrange feature names so they match the sorted feature importances
names = [index[i] for i in indices]

# Create plot
plt.figure()

# Create plot title
plt.title("Feature Importance")

# Add bars
plt.bar(range(X.shape[1]), importances[indices])

# Add feature names as x-axis labels
plt.xticks(range(X.shape[1]), names, rotation=90)

# Show plot
plt.show()

In [None]:
plot_tree(model)
#plt.savefig("tree.png", dpi=800)
plt.show()

In [None]:
y.value_counts()

In [None]:
y_pred = cross_val_predict(model, X, y, cv=3)
conf_mat = confusion_matrix(y, y_pred)
print(classification_report(y, y_pred))
labels=['BEE', 'FLY', 'HUM', 'LEP','PAS']

In [None]:
plt.figure()
plot_confusion_matrix(conf_mat, classes=labels,
                      title='Confusion matrix')
plt.show()

In [None]:
amino.dropna(inplace=True)
X_ext = amino.drop(["Gatunek", "ORIGIN", "pollinator"], axis=1)
X_ext = pd.get_dummies(X_ext,drop_first=True)
index_ext = X_ext.columns
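#integer class codes in alphabetical order; recent xgboost releases reject string class labels
y_ext = pd.Series(LabelEncoder().fit_transform(amino["pollinator"]), index=amino.index, name="pollinator")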
models = [LogisticRegression(), SVC(), DecisionTreeClassifier(), RandomForestClassifier(),
         AdaBoostClassifier(), xgb.XGBClassifier(),
          BaggingClassifier(DecisionTreeClassifier(), bootstrap=False)]

for m in models:
    
    print("%s %.1f%%" % (m.__class__.__name__, cross_val_score(m,X_ext,y_ext,cv=3).mean()*100))


In [None]:
model_ext = xgb.XGBClassifier(n_estimators=100)
print(cross_val_score(model_ext,X_ext,y_ext,cv=3).mean())
model_ext.fit(X_ext,y_ext)
importances_ext = model_ext.feature_importances_
indices_ext = np.argsort(importances_ext)[::-1]

# Rearrange feature names so they match the sorted feature importances
names_ext = [index_ext[i] for i in indices_ext]

# Create plot
plt.figure()

# Create plot title
plt.title("Feature Importance")

# Add bars
plt.bar(range(X_ext.shape[1]), importances_ext[indices_ext])

# Add feature names as x-axis labels
plt.xticks(range(X_ext.shape[1]), names_ext, rotation=90)

# Show plot
plt.show()

In [None]:
y_pred_ext = cross_val_predict(model_ext, X_ext, y_ext, cv=3)
conf_mat_ext = confusion_matrix(y_ext, y_pred_ext)
print(classification_report(y_ext, y_pred_ext))

In [None]:
plt.figure()
plot_confusion_matrix(conf_mat_ext, classes=['BEE', 'FLY', 'HUM','LEP', 'PAS'],
                      title='Confusion matrix')
plt.show()

In [None]:
nectar = nectar[["Gatunek","V", "KON", "MASA"]]

In [None]:
nectar.info()

In [None]:
nectar.loc[nectar.MASA.isnull(), 'MASA'] = nectar.groupby('Gatunek').MASA.transform('mean')

In [None]:
nectar.loc[nectar.V.isnull(), 'V'] = nectar.groupby('Gatunek').V.transform('mean')
nectar.loc[nectar.KON.isnull(), 'KON'] = nectar.groupby('Gatunek').KON.transform('mean')
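
The imputation above replaces each missing measurement with the mean of its own species. A small sketch on a toy frame shows the behaviour, including the case that the later `dropna()` handles. The frame `toy` and its values are made up; only the column names follow the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame (assumption): species C has no measured value at all.
toy = pd.DataFrame({
    "Gatunek": ["A", "A", "A", "B", "B", "C"],
    "V": [10.0, np.nan, 20.0, 4.0, 6.0, np.nan],
})

# transform("mean") returns one value per row, aligned with the original index,
# so it can be used directly to fill the gaps.
toy["V"] = toy["V"].fillna(toy.groupby("Gatunek")["V"].transform("mean"))
print(toy)
```

The missing value of species A becomes 15.0, the mean of 10 and 20. Species C stays `NaN` because its group has nothing to average.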

In [None]:
nectar = nectar.dropna()

In [None]:
nectar['Pollinator'] = nectar["Gatunek"].apply(pollinator)

In [None]:
nectar.Pollinator.value_counts()

In [None]:
#area = data.groupby(['Gatunek']).agg({'V': 'mean','POW': 'mean'}).reset_index()

In [None]:
# preparing data
X = nectar.drop(["Pollinator", "Gatunek"], axis=1)
y = nectar.Pollinator
index_nec = X.columns
X = StandardScaler().fit_transform(X)

In [None]:
le = LabelEncoder()
le.fit(y)
y_e = le.transform(y)
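
`LabelEncoder` assigns integer codes in sorted order of the class names, which matters later when codes are mapped back to names for a confusion matrix. A short illustration on made-up labels (`y_toy` and `le_toy` are hypothetical names):

```python
from sklearn.preprocessing import LabelEncoder

# Made-up pollinator labels, for illustration only.
y_toy = ["HUM", "BEE", "PAS", "FLY", "BEE"]

le_toy = LabelEncoder().fit(y_toy)
codes = le_toy.transform(y_toy)

print(list(le_toy.classes_))   # class names in code order (alphabetical)
print(list(codes))             # integer code of each sample
print(list(le_toy.inverse_transform([0, 1, 2, 3])))
```

Taking the names from `classes_` keeps tick labels in step with the encoded values.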

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y_e, test_size=0.25, random_state=42)

In [None]:
models = [LogisticRegression(),
          DecisionTreeClassifier(),
          SVC(probability=True),  
          RandomForestClassifier(), 
          AdaBoostClassifier(),
         xgb.XGBClassifier()]

for model in models:

    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    accuracy = accuracy_score(y_test, y_pred)
    print("%s accuracy: %.2f%%" % (model.__class__.__name__,accuracy * 100.0))

In [None]:
model_nec = RandomForestClassifier()
model_nec.fit(X_train, y_train)
y_pred = model_nec.predict(X_test)
conf_mat_nec = confusion_matrix(y_test, y_pred)

In [None]:
# Plot non-normalized confusion matrix
plt.figure()
# LabelEncoder sorts class names alphabetically, so take the order from le.classes_
plot_confusion_matrix(conf_mat_nec, classes=le.classes_,
                      title='Confusion matrix')
plt.show()

In [None]:
importances = model_nec.feature_importances_
indices = np.argsort(importances)[::-1]

# Rearrange feature names so they match the sorted feature importances
names = [index_nec[i] for i in indices]

# Create plot
plt.figure()

# Create plot title
plt.title("Feature Importance")

# Add bars
plt.bar(range(X.shape[1]), importances[indices])

# Add feature names as x-axis labels
plt.xticks(range(X.shape[1]), names, rotation=90)

# Show plot
plt.show()
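
Impurity-based `feature_importances_` are computed on the training data. As a possible cross-check, permutation importance measures how much the held-out accuracy drops when one feature is shuffled. The sketch below assumes synthetic data from `make_classification` and placeholder feature names (`f0`, `f1`, `f2`); it does not use the nectar measurements.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in (assumption): 3 features, 2 of them informative.
X_syn, y_syn = make_classification(
    n_samples=200, n_features=3, n_informative=2, n_redundant=0,
    n_classes=2, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.25, random_state=42, stratify=y_syn
)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Shuffle each feature 10 times on the test set and record the drop in accuracy.
result = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)

for name, mean, std in zip(["f0", "f1", "f2"],
                           result.importances_mean, result.importances_std):
    print("%s: %.3f +/- %.3f" % (name, mean, std))
```

A feature whose permutation importance stays close to zero contributes little to held-out accuracy, whatever its impurity-based score.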

AA concentration and composition in Fritillaria might be influenced by several factors. While phylogeny plays a role, as several closely related species have similar AA composition and concentration, the prevailing evidence is that AA composition and concentration can vary even within a single species, as well as between closely related taxa.
Similarly to subgenus Petilium, differences in AA concentration and composition were found in the nectar of closely related species in subgenus Liliorhiza. Fritillaria affinis, F. eastwoodiae and F. liliacea produced nectar with higher AA concentrations. Furthermore, a higher amount of bee-preferred proline was present in the presumably insect-pollinated species of this subgenus, which also indicates a strong influence of pollinators.
Changes in AA concentration and composition might play an important role in attracting new floral visitors in the case of a pollinator shift. The analysis revealed high AA concentrations in passerine bird-pollinated species and very low AA concentrations in hummingbird-pollinated species. These tendencies were not reflected in closely related species from the same subgenus.
Such low AA concentrations in hummingbird-pollinated flowers may also have a repellent effect, helping to avoid competition with bees, which favour higher AA concentrations (Baker and Baker 1982; Nicolson 2007; Heil 2011), whereas the shortage of AAs in the birds' diet could be overcome via additional food sources, e.g. fruits (Nicolson 2007).
Both the AA concentration and the amount of nectar produced (V, MASA) strongly influence the pollinator type.

## References
Verhoeven C., Ren Z.-X., Lunau K. (2018) False-colour photography: a novel digital approach to visualize the bee view of flowers. Journal of Pollination Ecology 23(12): 102-118.