# Evaluation of *Sceloporus jarrovii* and *S. virgatus* population sizes in Crystal Creek
## George Middendorf and Christopher Agard

## Purpose
The intent of this notebook is to determine whether the numbers of *Sceloporus jarrovii* and *S. virgatus* captured in Crystal Creek in 2018 are in fact lower than in previous years.

## Methods
We will begin by identifying the years in the existing cleaned data set whose search period began at a point in the year comparable to the start of the 2018 search period (i.e., mid-August). We will then determine the number of lizards captured in each of those years. Finally, we will compare the numbers of lizards captured, looking for significant differences both in the overall number of captures and in the number of captures within demographic groups (sex, size range, and previous capture status).
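The counting step described above can be sketched with pandas `groupby` and `pd.crosstab`. This is a minimal illustration on made-up records, not the Crystal Creek data; the `year` and `species` columns mirror names used later in this notebook, but the `sex` column name is an assumption, and the significance test itself is not shown.

```python
import pandas as pd

# Made-up capture records for illustration only; the 'sex' column name is assumed.
captures = pd.DataFrame({
    'year':    [2000, 2000, 2001, 2001, 2001, 2018, 2018],
    'species': ['j', 'v', 'j', 'j', 'v', 'j', 'v'],
    'sex':     ['m', 'f', 'f', 'm', 'm', 'f', 'f'],
})

# Total number of captures in each year
totals = captures.groupby('year').size()

# Captures per year broken down by one demographic (here, sex);
# combinations with no captures appear as 0
by_sex = pd.crosstab(captures['year'], captures['sex'])

print(totals)
print(by_sex)
```

The same `pd.crosstab` call would work for the other demographics (size range, previous capture status) by swapping in the corresponding column.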

## Results

### Import necessary packages

The pandas package will be our workhorse here, but we will also need the os and glob modules for working with files. We'll also increase the maximum number of rows and columns displayed on each print to make reviewing results easier. We'll be using plotly to create any figures.

In [1]:
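import os
import glob
import time

import numpy as np
import pandas as pd
import plotly
import plotly.plotly as py  # plotly 3.x online plotting (moved to chart_studio.plotly in plotly 4)
import plotly.graph_objs as go

# Make figures uploaded to the plotly cloud publicly viewable
plotly.tools.set_config_file(world_readable=True)

# Show (nearly) all rows and up to 50 columns when printing DataFrames
pd.options.display.max_rows = 99999
pd.options.display.max_columns = 50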

### Read in necessary data

First we will designate the paths from which to read data and to which output files should be written.

In [2]:
# Source data path
sourceDataBig = 'S:/Chris/TailDemography/data'

# Output data path
outputBig = 'S:/Chris/TailDemography/data'


Next, we'll change the working directory to the source data path, find the cleaned 2000-2017 data files, and read in the data.

In [3]:
os.chdir(sourceDataBig)

combinedFiles = sorted(glob.glob('cleaned CC data 2000-2017*'))  # timestamped names sort chronologically, so [-1] is the newest
combinedFiles

['cleaned CC data 2000-2017_2018-09-07 22_47_08.525360.csv',
 'cleaned CC data 2000-2017_2018-09-08 16_41_11.234222.csv',
 'cleaned CC data 2000-2017_2018-09-08 23_41_22.661706.csv',
 'cleaned CC data 2000-2017_2018-09-11 21_19_30.768558.csv']
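`glob.glob` makes no guarantee about the order of its results, so finding the newest file by position depends on the listing being sorted. A more explicit alternative is to parse the timestamp embedded in each file name and take the latest. This is a sketch: the helper names `file_timestamp` and `newest_file` are hypothetical, and the timestamp format is inferred from the file names listed above.

```python
import os
from datetime import datetime

# Format inferred from names like '..._2018-09-11 21_19_30.768558.csv' (an assumption)
TIMESTAMP_FORMAT = '%Y-%m-%d %H_%M_%S.%f'

def file_timestamp(name):
    """Parse the timestamp that follows the first '_' in a cleaned-data file name."""
    stem = os.path.splitext(os.path.basename(name))[0]  # drop directory and '.csv'
    return datetime.strptime(stem.split('_', 1)[1], TIMESTAMP_FORMAT)

def newest_file(names):
    """Return the file name whose embedded timestamp is the most recent."""
    return max(names, key=file_timestamp)

# Usage with an unsorted listing of three of the file names shown above
listing = [
    'cleaned CC data 2000-2017_2018-09-08 16_41_11.234222.csv',
    'cleaned CC data 2000-2017_2018-09-11 21_19_30.768558.csv',
    'cleaned CC data 2000-2017_2018-09-07 22_47_08.525360.csv',
]
latest = newest_file(listing)
```

Parsing the timestamp makes the choice independent of how the file system happens to order the directory listing.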

In [4]:
dfPrev=pd.read_csv(combinedFiles[-1])
print('2000-2017 species included:\n{}'.format(dfPrev.species.unique()))

2000-2017 species included:
['j' 'uo' 'v' 'sc' 'cn ex']


In [5]:
df2018=pd.read_csv('CC Data 2018 - Data Entry Sheet.csv')
print('2018 species included:\n{}'.format(df2018.Species.unique()))

2018 species included:
[nan 'Sj' 'Sv' 'Other (Other)' 'Other' 'Uo' 'A spp']


We will only consider *S. jarrovii* and *S. virgatus* for our comparisons.

In [6]:
# .copy() avoids SettingWithCopyWarning when new columns are added later
dfPrev = dfPrev.loc[dfPrev.species.isin(['j','v'])].copy()
print('2000-2017 species included:\n{}'.format(dfPrev.species.unique()))

df2018 = df2018.loc[df2018.Species.isin(['Sj','Sv'])].copy()
print('2018 species included:\n{}'.format(df2018.Species.unique()))

2000-2017 species included:
['j' 'v']
2018 species included:
['Sj' 'Sv']
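The two data sets record species with different codes ('j'/'v' versus 'Sj'/'Sv') and in columns with different capitalization (`species` versus `Species`). If the records are later pooled, one option is to map the 2018 codes onto the older ones. This is a minimal sketch; the `SPECIES_CODES` dictionary and the `harmonize_species` helper are hypothetical, and any code not in the dictionary becomes NaN.

```python
import pandas as pd

# Mapping from 2018 data-entry codes to the 2000-2017 codes,
# inferred from the unique values printed above
SPECIES_CODES = {'Sj': 'j', 'Sv': 'v'}

def harmonize_species(df, column='Species'):
    """Return a copy of df with a lowercase 'species' column using the older codes."""
    out = df.copy()
    out['species'] = out[column].map(SPECIES_CODES)  # unmapped codes -> NaN
    return out

# Illustrative rows only
sample2018 = pd.DataFrame({'Species': ['Sj', 'Sv', 'Sj']})
print(harmonize_species(sample2018)['species'].tolist())  # ['j', 'v', 'j']
```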


Now we need to convert the dates in both data sets to datetime objects. To compare years in a single figure, we will also create a new variable, __week__, which represents each date as its ISO week of the year (*i.e.*, from 1 to 53).

In [7]:
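# Unparseable dates become NaT rather than raising an error
dfPrev['date'] = pd.to_datetime(dfPrev['date'], errors='coerce')
df2018['Date'] = pd.to_datetime(df2018['Date'], errors='coerce')

# ISO week of the year (dt.weekofyear was removed in pandas 2.0)
dfPrev['week'] = dfPrev['date'].dt.isocalendar().week
df2018['week'] = df2018['Date'].dt.isocalendar().week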
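To see what the __week__ values look like: `Series.dt.weekofyear` returns the ISO 8601 week number, but it was deprecated in pandas 1.1 and removed in pandas 2.0, and `Series.dt.isocalendar().week` gives the same number in current versions. The dates below are illustrative only. They show that ISO weeks can reach 53 and that a date coerced to `NaT` yields a missing week.

```python
import pandas as pd

# Illustrative dates; the last entry is deliberately unparseable
dates = pd.to_datetime(
    pd.Series(['2018-08-15', '2020-12-31', 'not a date']), errors='coerce')

# ISO 8601 week number (nullable integer dtype; NaT becomes <NA>)
weeks = dates.dt.isocalendar().week
print(weeks.tolist())
```

Mid-August 2018 falls in ISO week 33, and 31 December 2020 falls in week 53 because 2020 had 53 ISO weeks.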

Now we can graph the time periods over which lizards were captured in each year to see which years overlapped with 2018. We will use a series of horizontal violin plots for this.

In [8]:
# One horizontal violin trace per survey year (2000-2017), then one for 2018
dataPrev = [go.Violin(x=dfPrev.loc[dfPrev.year == yr, "week"], name=str(yr))
            for yr in range(2000, 2018)]
dataPrev.append(go.Violin(x=df2018.week, name='2018'))

layout = go.Layout(
    title = 'Distribution of Captures in Crystal Creek by Year',
    titlefont = dict(
        size = 20),
    yaxis = dict(
        title = 'Year',
        dtick = 1,
        titlefont = dict(
            size = 18)),
    xaxis = dict(
        title = 'Week of the Year',
        dtick = 1,
        titlefont = dict(
            size = 18)))

fig = go.Figure(
    data = dataPrev, 
    layout = layout)

py.iplot(fig, filename = 'Violin plot of Captures in CC by Week in the Year')

There appears to be no year that completely overlaps 2018, but the greatest overlap occurs with 2001, 2010, 2002, and 2000, with 2001 overlapping the most. These are the years we will use to determine whether 2018 capture rates were abnormal.
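The overlap judgement above was made by eye from the violin plot. One possible numerical cross-check counts how many distinct capture weeks each year shares with 2018. This is a sketch under assumptions: it expects the previous-years and 2018 records to have been combined into one frame with `year` and `week` columns (the 2018 records may first need a `year` column added), `weeks_shared_with` is a hypothetical helper, and the toy data below are not the Crystal Creek records.

```python
import pandas as pd

def weeks_shared_with(df, reference_year, year_col='year', week_col='week'):
    """Count the distinct capture weeks each year shares with reference_year."""
    # Set of weeks with at least one capture, per year (missing weeks dropped)
    weeks = {year: set(group[week_col].dropna())
             for year, group in df.groupby(year_col)}
    ref = weeks[reference_year]  # raises KeyError if the reference year is absent
    shared = pd.Series({year: len(w & ref)
                        for year, w in weeks.items() if year != reference_year})
    return shared.sort_values(ascending=False)

# Toy data for illustration only
toy = pd.DataFrame({
    'year': [2000, 2000, 2001, 2001, 2001, 2005, 2018, 2018, 2018],
    'week': [31,   32,   32,   33,   34,   25,   32,   33,   34],
})
shared = weeks_shared_with(toy, 2018)
print(shared)
```

Counting shared weeks ignores how many captures fall in each week; weighting by capture counts would be a reasonable variation if the shapes of the violins matter.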

Let's reduce dfPrev to include only data from 2000-2002 and 2010.

In [10]:
dfPrev = dfPrev.loc[dfPrev.year.isin([2000,2001,2002,2010])]