# Evaluation of *Sceloporus jarrovii* and *S. virgatus* population sizes in Crystal Creek
## George Middendorf and Christopher Agard

## Purpose
The intent of this notebook is to determine whether the numbers of *Sceloporus jarrovii* and *S. virgatus* captured in Crystal Creek in 2018 are in fact lower than in previous years.

## Methods
We will begin by identifying a set of years in the existing cleaned data set during which the search period begins at a point in the year comparable to the start of the 2018 search period (i.e., mid-August). We will then determine the number of lizards captured for each year.  Finally, we will compare the number of lizards captured, looking for significant differences in the overall number of captures and the number of captures according to various demographics (sex, size-range, and previous capture status).

## Results

### Import necessary packages

The pandas package will be our workhorse here, but we will need the os package for a few things too. We'll also increase the maximum number of rows that are displayed on each print in order to make reviewing results easier. We'll be using plotly to create any figures.

In [1]:
import pandas as pd 
import numpy as np
import os,glob,time
import plotly
import plotly.plotly as py
import plotly.graph_objs as go

plotly.tools.set_config_file(world_readable=True)

pd.options.display.max_rows = 99999
pd.options.display.max_columns = 50

### Read in necessary data

First we will designate the paths from which to read data and to which output files should be written.

In [2]:
# Source Data
sourceDataBig = 'S:/Chris/TailDemography/TailDemography/Raw Data'
sourceBlack = 'C:/Users/test/Desktop'
gdrivefileURL = "https://docs.google.com/spreadsheets/d/1gJZ1S3-ToP2br8OkGmf1BVuutzvYJG4_cNG3DHM8avU/edit#gid=0"

#Output Data paths
outputBig = 'S:/Chris/TailDemography/data'
outBlack = 'C:/Users/test/Desktop'

Next we convert the Google Sheets link into a CSV export link and read the 2018 data directly from it.

In [3]:
#make URL readable for pandas
editableURL = gdrivefileURL.replace('edit#gid','export?format=csv&gid')
df = pd.read_csv(editableURL)
df.head()

Unnamed: 0,Species,Toes,Date,Sex,SVL,TL,RTL,Autotomized (autotomized=TRUE),Mass,Paint Mark,Location,Meters,New/Recap,Painted,Sighting,Misc.,Vial,Time,Click Video
0,,,8/15/2018,,,,,,,,bottom of site,,,,,"IN: T=24.6C, H=57%, W=0.4 m/s; in at 0800; Mid...",,,
1,Sj,10-11,8/15/2018,m,49.0,71.0,0.0,True,3.3,w1c,T right sb above bottom site entrance,-25.0,new,yes,,,18-14,,
2,Sv,10-11,8/15/2018,f,57.0,68.0,0.0,True,6.0,w1a,bottom left wall,-7.0,new,yes,,,18-15,,
3,Sj,9-10,8/15/2018,m,54.0,57.0,20.0,False,4.5,w2c,opp wall v wall v juniper crossing,80.0,new,yes,,shed recently loose scales T,18-12,,
4,Sj,10-12,8/15/2018,m,50.0,64.0,0.0,True,3.5,3c,mid R island,150.0,new,yes,,"sex not recorded when caught on 15Aug, but rec...",18-13,,
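
The URL rewrite used in the cell above can be isolated into a small helper, which makes the substitution easy to check without touching the network. This is only a minimal sketch: the function name `to_csv_export_url` and the placeholder sheet ID are illustrative assumptions, not part of the notebook.

```python
def to_csv_export_url(edit_url: str) -> str:
    """Turn a Google Sheets 'edit' link into a CSV export link (hypothetical helper)."""
    # Same string substitution the notebook applies before calling pd.read_csv
    return edit_url.replace('edit#gid', 'export?format=csv&gid')


# Placeholder sheet ID, for illustration only
example = 'https://docs.google.com/spreadsheets/d/SHEET_ID/edit#gid=0'
csv_url = to_csv_export_url(example)
print(csv_url)
```

The resulting link can be passed to `pd.read_csv` as in the cell above, provided the sheet is shared so that it can be read without logging in.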


In [4]:
locationDict = {'bottom of site':-50,'R sb at brownR ^blackR ^ 1oakR':435,'on wall vwvjx just below fox den':100}

In [29]:
#Drop date
df_captures = df.loc[(df.Sighting!='yes')&(df.Species.notna())&(df.Species!='Other')&(df.SVL.notna())&
                     (df.Meters.notna()),
                     ['Species', 'Toes','Sex', 'SVL', 'TL', 'RTL',
       'Autotomized (autotomized=TRUE) ', 'Mass', 'Paint Mark', 'Location',
       'Meters']].sort_values(['Meters','Location'])
df_captures.loc[df_captures['Autotomized (autotomized=TRUE) '],'RTL']=-1
df_captures = df_captures.drop(columns='Autotomized (autotomized=TRUE) ')
df_captures_multi = df_captures.groupby(['Species','Toes','Paint Mark']).Location.unique().reset_index()\
.merge(df_captures.groupby(['Species','Toes','Paint Mark']).Meters.unique().reset_index())\
.merge(df_captures.groupby(['Species','Toes','Paint Mark']).SVL.unique().reset_index()
       ,on=['Species','Toes','Paint Mark'])

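# df_captures.loc[df_captures.Meters.notna()].to_csv('2018 Captures Cheat Sheet.csv', index=False)
# Spread of SVL values recorded for each Species/Toes/Paint Mark combination
# (assumed intent of the original empty apply() call, which raises a TypeError as written)
df_captures_multi['SVL_diff'] = df_captures_multi.SVL.apply(lambda svl: svl.max() - svl.min())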
df_captures_multi.loc[df_captures_multi.SVL.apply(len)>1]

Unnamed: 0,Species,Toes,Paint Mark,Location,Meters,SVL
0,Sj,1,w4c,[on wall vwvjx just below fox den],[103.0],"[88.0, 87.0]"
20,Sj,10-19,w12c,"[1 falls 1m up, bottom 1falls]",[0.0],"[43.0, 46.0]"
21,Sj,10-20,w13c.t,"[H3, 6m v H6, 5m up wall above sb]","[180.0, 202.0]","[80.0, 79.0]"
30,Sj,5-11,w62c,"[left wall at lizardR, R sb 5m ^ slab]","[135.0, 268.0]","[75.0, 49.0]"
49,Sj,7-11,w40c,"[sb 3m v lizardR, sb 2m v top island]","[132.0, 156.0]","[60.0, 63.0]"
52,Sj,7-15,w44c,"[sb at fox den, sb rt side 2m ^ foxden]","[103.0, 105.0]","[57.0, 53.0]"
56,Sj,8-11,w29c,"[T left sb opp wvwvwjx, bottom wvwvwvjx]",[85.0],"[56.0, 62.0]"
58,Sj,8-13,w32c,"[on Rs left side sb 2m v 1oakR, R sb at brownR...","[418.0, 435.0]","[44.0, 55.0]"
59,Sj,8-15,w33c,"[left Rs 5m ^ 1oakR, R left sb 3m v blackR ^1o...","[425.0, 427.0]","[55.0, 60.0]"
62,Sj,8-19,w37c,"[sb at 15, sb at btwn top rt wall, bottom left...","[15.0, 25.0]","[47.0, 48.0]"


What to do with multiple captures?
- Some of these are recaptured due to shedding

In [36]:
test=df_captures.groupby(['Species','Toes','Paint Mark']).Location.nunique().reset_index()
print("There are {} Species, Toes, Paint Mark combinations that have more than one record:\n"\
      .format(test.loc[test.Location>1].shape[0]))
test.loc[test.Location>1]

There are 15 Species, Toes, Paint Mark combinations that have more than one record:



Unnamed: 0,Species,Toes,Paint Mark,Location
20,Sj,10-19,w12c,2
21,Sj,10-20,w13c.t,2
30,Sj,5-11,w62c,2
49,Sj,7-11,w40c,2
52,Sj,7-15,w44c,2
56,Sj,8-11,w29c,2
58,Sj,8-13,w32c,2
59,Sj,8-15,w33c,2
62,Sj,8-19,w37c,2
67,Sj,9-11,w14c,2


In [38]:
print("Some of these cases are individuals who were captured multiple times (e.g., to repaint).")
df.loc[(df.Species=='Sj')&(df.Toes=='10-20')&(df['Paint Mark']=='w13c.t')]

Some of these cases are individuals who were captured multiple times (e.g., to repaint).


Unnamed: 0,Species,Toes,Date,Sex,SVL,TL,RTL,Autotomized (autotomized=TRUE),Mass,Paint Mark,Location,Meters,New/Recap,Painted,Sighting,Misc.,Vial,Time,Click Video
19,Sj,10-20,8/16/2018,m,80.0,111.0,0.0,False,17.5,w13c.t,H3,180.0,new,yes,,Bss,18-07,,
161,Sj,10-20,8/21/2018,m,79.0,111.0,0.0,False,16.3,w13c.t,"6m v H6, 5m up wall above sb",202.0,recap,yes,,"Bshedding, lost ""13"" but "".t"" remains on T, do...",,,


In [39]:
print("There are some that are separate individuals with the same toe and paint mark info.")
df.loc[(df.Species=='Sj')&(df.Toes=='5-11')&(df['Paint Mark']=='w62c')]

There are some that are separate individuals with the same toe and paint mark info.


Unnamed: 0,Species,Toes,Date,Sex,SVL,TL,RTL,Autotomized (autotomized=TRUE),Mass,Paint Mark,Location,Meters,New/Recap,Painted,Sighting,Misc.,Vial,Time,Click Video
508,Sj,5-11,9/7/2018,m,75.0,105.0,0.0,False,15.0,w62c,left wall at lizardR,135.0,new,yes,,,18-71,,
519,Sj,5-11,9/8/2018,f,49.0,40.0,22.0,True,3.7,w62c,R sb 5m ^ slab,268.0,new,yes,,,18-71,,
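
One possible way to separate the two situations above (a genuine recapture versus two different animals sharing an ID) is to check whether the recorded sex changes or the SVL differs by more than a recapture could plausibly explain. The sketch below uses only the rows shown in the two examples; the tolerance `MAX_SVL_DIFF` and the column name `likely_conflict` are assumptions made for illustration, not values from the notebook.

```python
import pandas as pd

# Toy rows copied from the two examples above (only the columns needed here)
toy = pd.DataFrame({
    'Species': ['Sj', 'Sj', 'Sj', 'Sj'],
    'Toes': ['10-20', '10-20', '5-11', '5-11'],
    'Paint Mark': ['w13c.t', 'w13c.t', 'w62c', 'w62c'],
    'Sex': ['m', 'm', 'm', 'f'],
    'SVL': [80.0, 79.0, 75.0, 49.0],
})

id_cols = ['Species', 'Toes', 'Paint Mark']
MAX_SVL_DIFF = 5.0  # assumed tolerance in mm; not taken from the notebook

# Per ID combination: number of distinct sexes and the spread of SVL values
summary = toy.groupby(id_cols).agg(
    n_sexes=('Sex', 'nunique'),
    svl_range=('SVL', lambda s: s.max() - s.min()),
).reset_index()

# Flag a combination if the recorded sex changes or SVL differs by more than the tolerance
summary['likely_conflict'] = (summary.n_sexes > 1) | (summary.svl_range > MAX_SVL_DIFF)
print(summary)
```

With these rows, the `w13c.t` animal is treated as a recapture and the `w62c` pair is flagged for manual review. Any real threshold would need to be chosen with the growth rates of the animals in mind.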


In [None]:
os.chdir(sourceDataBig)

# combinedFiles = glob.glob('cleaned CC data 2000-2017*')
combinedFiles = 'cleaned CC data 2000-2017_2019-01-07 21hrs27min.csv'
combinedFiles

In [None]:
# Read the cleaned 2000-2017 file named above (not the 2018 data entry sheet)
dfPrev=pd.read_csv(combinedFiles)
dfPrev.columns = dfPrev.columns.str.lower()
print('2000-2017 species included:\n{}'.format(dfPrev.species.unique()))

Now we read in the 2018 data set and change its column headers to lowercase.

In [None]:
df2018=pd.read_csv('CC Data 2018 - Data Entry Sheet.csv')
df2018.columns = df2018.columns.str.lower()
print('2018 species included:\n{}'.format(df2018.species.unique()))


We will only consider *S. jarrovii* and *S. virgatus* for our comparisons.

In [None]:
dfPrev = dfPrev.loc[dfPrev.species.isin(['j','v'])]
print('2000-2017 species included:\n{}'.format(dfPrev.species.unique()))

df2018 = df2018.loc[df2018.species.isin(['Sj','Sv'])]
print('2018 species included:\n{}'.format(df2018.species.unique()))

Now we need to coerce the dates in both data sets to datetime objects. In order to compare across years in a figure we will also have to create a new variable, __week__, which represents each date as the ISO week of the year (*i.e.*, from 1 to 53).

In [None]:
dfPrev.date = pd.to_datetime(dfPrev.date,errors='coerce')
df2018.date = pd.to_datetime(df2018.date,errors='coerce')

dfPrev['week'] = dfPrev.date.dt.weekofyear
df2018['week'] = df2018.date.dt.weekofyear

In [None]:
dfPrev.week.sort_values().unique()
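
A note for anyone re-running this step on a recent pandas: `dt.weekofyear` was deprecated in pandas 1.1 and removed in pandas 2.0. The sketch below shows the same coercion and week extraction using `dt.isocalendar().week` instead. The two sample values are made up for illustration, and the explicit `format` argument is an assumption about how the dates are written in the sheet.

```python
import pandas as pd

# One valid date in the sheet's month/day/year style and one value that cannot be parsed
dates = pd.Series(['8/15/2018', 'not a date'])

# errors='coerce' turns unparseable entries into NaT instead of raising
parsed = pd.to_datetime(dates, format='%m/%d/%Y', errors='coerce')

# ISO week number; missing dates come through as <NA>
weeks = parsed.dt.isocalendar().week
print(weeks)
```

Because the result is a nullable integer column, rows with missing dates stay missing rather than being turned into a misleading week number.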

Now we can graph the time periods over which we captured lizards in both data sets to see which years overlapped with 2018. We will use a series of horizontal box plots for this.

In [None]:
# One horizontal box (with all points shown) per comparison year, 2000 through 2017
dataPrev = [go.Box(x=dfPrev.loc[dfPrev.year == year,"week"],name=str(year),boxpoints = 'all')
            for year in range(2000,2018)]

# Add the 2018 field season as the final box
dataPrev.append(go.Box(x=df2018.week,name= '2018',boxpoints = 'all'))

layout = go.Layout(
    title = 'Distribution of Captures in Crystal Creek by Year',
    titlefont = dict(
        size = 20),
    yaxis = dict(
        title = 'Year',
        dtick = 1,
        titlefont = dict(
            size = 18)),
    xaxis = dict(
        title = 'Week of the Year',
        dtick = 1,
        titlefont = dict(
            size = 18)))

fig = go.Figure(
    data = dataPrev, 
    layout = layout)

py.iplot(fig, filename = 'Boxplot of Captures in CC by Week in the Year')

It looks like no year between 2000 and 2017 completely overlaps the 2018 field season. The closest years end 2 weeks before the 2018 season started.

To get an idea of which year is the closest, let's look at the distance between the median week for captures and find those with the closest distance to 2018.

In [None]:
def vocab_run(x: list, connector_dict={1: None, 2: ' and ', 'run': ', '}):
    """vocab_run takes a list and joins its elements with one separator, placing a different separator
     between the penultimate and final members of the list, and returns the result as a string.
     :param x: a list of items to be concatenated (each is converted to a string)
     :param connector_dict: a dictionary with keys describing the size of the list and values indicating the
     connectors that separate the list elements.
    """
    x = [str(el) for el in x]
    if len(x) == 1:
        # A single element needs no connector; return it as a string, not a one-item list
        vocab = x[0]
    else:
        if len(x) == 2:
            vocab = (connector_dict[len(x)]).join(x)
        else:
            connector = connector_dict['run']
            connector_final = connector_dict[2]
            vocab = connector.join(x[:-1])+connector_final+x[-1]
    return vocab
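
To make the intended joining behaviour concrete, here is a compact stand-alone sketch that produces the same kind of string for lists of one, two, and three items. The name `join_with_and` and its parameters are hypothetical and are not used elsewhere in the notebook.

```python
def join_with_and(items, sep=', ', final_sep=' and '):
    """Join items as 'a, b and c' (hypothetical stand-alone equivalent of vocab_run)."""
    items = [str(el) for el in items]
    if len(items) <= 1:
        # '' for an empty list, the item itself for a single element
        return ''.join(items)
    # Regular separator between all but the last pair, final separator before the last item
    return sep.join(items[:-1]) + final_sep + items[-1]


print(join_with_and([2003]))
print(join_with_and([2003, 2007]))
print(join_with_and([2003, 2007, 2011]))
```

Unlike the dictionary-based signature above, this version takes the two separators as plain keyword arguments, which avoids a mutable default argument.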

In [None]:
distance = dfPrev.groupby('year').apply(lambda x: df2018.week.median() - x.week.median())\
.reset_index()\
.rename(columns={0:'distMed'})

distance.year = distance.year.astype(int)

nearest = distance[distance.distMed <= distance.distMed.min()]

print('The smallest median distance between any of the comparison years and the 2018 field season is {}.\
  The associated years are {}.'.format(int(distance.distMed.min()),vocab_run(nearest.year)))

distance
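
The distance computed above is signed (2018 median minus the comparison year's median), so taking the minimum only finds the nearest year when every comparison season falls earlier in the year than 2018. A hedged sketch of a variant that uses the absolute distance is shown below. The capture weeks are made up for illustration and are not real data.

```python
import pandas as pd

# Made-up capture weeks, for illustration only (not real data)
prev = pd.DataFrame({
    'year': [2015, 2015, 2015, 2016, 2016, 2016, 2017, 2017, 2017],
    'week': [20, 22, 24, 30, 32, 34, 43, 44, 45],
})
weeks_2018 = pd.Series([33, 34, 35])

# Median capture week per comparison year
median_by_year = prev.groupby('year').week.median()

# Unsigned distance, in weeks, between each year's median and the 2018 median
dist = (weeks_2018.median() - median_by_year).abs()

# Every year that ties for the smallest distance
nearest_years = dist[dist == dist.min()].index.tolist()
print(dist)
print(nearest_years)
```

In this toy example the signed minimum would pick 2017 (a distance of -10 weeks), whereas the absolute distance picks 2016, which is only 2 weeks away.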


In [None]:
[int(i) for i in nearest.year]

In [None]:
data = go.Scatter(x = distance.year ,y=distance.distMed, mode = 'markers')

layout = go.Layout(
    title = 'Distance Between Median Week and 2018 Median Week',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        title = 'Year',
        titlefont = dict(
        size = 18)),
    yaxis = dict(
        title = 'Distance in Weeks Between Median Week and Median Week for 2018 Season',
        titlefont = dict(
            size = 18))
)

fig = go.Figure(data = data, layout = layout)

py.iplot(fig,filename = 'Scatter Plot of Distance Between Median Week and 2018 Median Week' )
# plot_url = py.plot(data, filename='Scatter Plot of Distance Between Median Week and 2018 Median Week')