# Evaluation of *Sceloporus jarrovii* and *S. virgatus* population sizes in Crystal Creek
## George Middendorf and Christopher Agard

## Purpose
The intent of this notebook is to determine whether the numbers of *Sceloporus jarrovii* and *S. virgatus* captured in Crystal Creek in 2018 are in fact lower than in previous years.

## Methods
We will begin by identifying the years in the existing cleaned data set whose search period begins at a point in the year comparable to the start of the 2018 search period (i.e., mid-August). We will then determine the number of lizards captured in each year. Finally, we will compare the numbers of lizards captured, looking for significant differences both in the overall number of captures and in the number of captures by demographic group (sex, size range, and previous capture status).
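The counting and comparison steps can be sketched with pandas on a small toy table. The column names (`year`, `species`, `sex`), the values, and the helper `poisson_lower_pvalue` are hypothetical. The test shown, an exact conditional test for two Poisson counts, is only one possible choice; it assumes equal search effort in the two periods being compared and is not necessarily the analysis used later in this notebook.

```python
import math

import pandas as pd

# Hypothetical toy capture records (one row per capture)
captures = pd.DataFrame({
    'year':    [2016, 2016, 2016, 2017, 2017, 2018],
    'species': ['j', 'j', 'v', 'j', 'v', 'j'],
    'sex':     ['m', 'f', 'f', 'm', 'm', 'f'],
})

# Total captures per year
byYear = captures.groupby('year').size()

# Captures per year broken down by one demographic (here, sex);
# missing year/sex combinations are filled with 0 instead of NaN
bySex = captures.groupby(['year', 'sex']).size().unstack(fill_value=0)


def poisson_lower_pvalue(k_new, k_old):
    """One-sided exact p-value for H1: the new count's rate is lower.

    Assumes both counts are Poisson with equal search effort. Conditional on
    the total n = k_new + k_old, k_new ~ Binomial(n, 0.5) under H0, so the
    p-value is P(X <= k_new).
    """
    n = k_new + k_old
    return sum(math.comb(n, i) for i in range(k_new + 1)) / 2 ** n


print(byYear)
print(bySex)
print(poisson_lower_pvalue(0, 10))  # 1/1024 = 0.0009765625
```

With unequal effort (e.g. seasons of different length) the 0.5 in the binomial would have to be replaced by the share of effort in the new period, which is why comparable search periods matter here.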

## Results

### Import necessary packages

The pandas package will be our workhorse here, but we will also need the os and glob modules to locate files. We'll also increase the maximum number of rows and columns that are displayed on each print to make reviewing results easier. We'll use plotly to create all figures.

In [1]:
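import os
import glob
import time
import numpy as np
import pandas as pd
import plotly
import plotly.graph_objs as go

# Online plotting (plotly.plotly) moved to the separate chart_studio package in plotly 4
try:
    import chart_studio.plotly as py
    import chart_studio.tools as plotlyTools
except ImportError:  # older plotly releases
    import plotly.plotly as py
    plotlyTools = plotly.tools

# Make figures uploaded to Chart Studio publicly viewable
plotlyTools.set_config_file(world_readable=True)

# Show (nearly) all rows and up to 50 columns when printing data frames
pd.options.display.max_rows = 99999
pd.options.display.max_columns = 50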


### Read in necessary data

First we will designate the paths from which to read data and to which output files should be written.

In [None]:
# Source Data
sourceDataBig = 'S:/Chris/TailDemography/data'
sourceBlack = 'C:/Users/test/Desktop'

# Output data paths
outputBig = 'S:/Chris/TailDemography/data'
outBlack = 'C:/Users/test/Desktop'

We'll need to change the working directory to the source path and find the cleaned 2000-2017 data file.

In [None]:
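os.chdir(sourceBlack)

# glob returns matches in arbitrary order; sort so that [-1] reliably picks the last file by name
combinedFiles = sorted(glob.glob('cleaned CC data 2000-2017*'))
combinedFiles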
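Even sorted by name, the last match is only the newest file if the file names encode a date or version. A minimal alternative, assuming "newest" should mean "most recently modified", is to pick the match with the latest modification time using `pathlib`, which also avoids changing the working directory. The helper name `newest_match` and the demo file names are hypothetical.

```python
import os
import pathlib
import tempfile


def newest_match(folder, pattern):
    """Return the most recently modified file in folder matching pattern.

    Raises FileNotFoundError if nothing matches, rather than failing later
    with an IndexError on an empty list.
    """
    matches = list(pathlib.Path(folder).glob(pattern))
    if not matches:
        raise FileNotFoundError(f'no file matching {pattern!r} in {folder}')
    return max(matches, key=lambda p: p.stat().st_mtime)


# Self-contained demonstration with two dummy files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    older = pathlib.Path(tmp, 'cleaned CC data 2000-2017 v1.csv')
    newer = pathlib.Path(tmp, 'cleaned CC data 2000-2017 v2.csv')
    older.write_text('a\n1\n')
    newer.write_text('a\n2\n')
    # Force distinct modification times (seconds since the epoch)
    os.utime(older, (1_000_000, 1_000_000))
    os.utime(newer, (2_000_000, 2_000_000))
    picked = newest_match(tmp, 'cleaned CC data 2000-2017*').name

print(picked)  # cleaned CC data 2000-2017 v2.csv

# Possible use in this notebook (path as defined above):
# dfPrev = pd.read_csv(newest_match(sourceBlack, 'cleaned CC data 2000-2017*'))
```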

In [4]:
dfPrev=pd.read_csv(combinedFiles[-1])
print('2000-2017 species included:\n{}'.format(dfPrev.species.unique()))

2000-2017 species included:
['j' 'uo' 'v' 'sc' 'cn ex']


Now we read in the 2018 data set and change its column headers to lowercase.

In [5]:
df2018=pd.read_csv('CC Data 2018 - Data Entry Sheet.csv')
df2018.columns = df2018.columns.str.lower()
print('2018 species included:\n{}'.format(df2018.species.unique()))


2018 species included:
[nan 'Sj' 'Sv' 'Other (Other)' 'Other' 'Uo' 'A spp']


We will only consider *S. jarrovii* and *S. virgatus* for our comparisons.

In [6]:
# .copy() makes independent frames, so adding columns later does not raise SettingWithCopyWarning
dfPrev = dfPrev.loc[dfPrev.species.isin(['j','v'])].copy()
print('2000-2017 species included:\n{}'.format(dfPrev.species.unique()))

df2018 = df2018.loc[df2018.species.isin(['Sj','Sv'])].copy()
print('2018 species included:\n{}'.format(df2018.species.unique()))

2000-2017 species included:
['j' 'v']
2018 species included:
['Sj' 'Sv']
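The two data sets code the same species differently (`'j'`/`'v'` versus `'Sj'`/`'Sv'`). If the tables are later combined, one option is to map the 2018 codes onto the older ones. The mapping below is an assumption read off the printed codes, and the toy frames are hypothetical.

```python
import pandas as pd

# Assumed correspondence between 2018 codes and 2000-2017 codes
speciesMap = {'Sj': 'j', 'Sv': 'v'}

toy2018 = pd.DataFrame({'species': ['Sj', 'Sv', 'Sj'], 'week': [33, 34, 34]})
toyPrev = pd.DataFrame({'species': ['j', 'v'], 'week': [30, 31]})

# .map returns NaN for any code missing from speciesMap, so unmapped codes are easy to catch
toy2018['species'] = toy2018['species'].map(speciesMap)
assert toy2018['species'].notna().all(), 'unmapped species code'

# With shared codes, the years can be stacked and counted together
combined = pd.concat([toyPrev.assign(year=2017), toy2018.assign(year=2018)],
                     ignore_index=True)
print(combined.groupby(['year', 'species']).size())
```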


Now we need to coerce the dates in both data sets to datetime objects. To compare years in a single figure we will also create a new variable, __week__, which represents each date as its ISO week of the year (*i.e.*, from 1 to 53).

In [7]:
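# Unparseable dates become NaT rather than raising an error
dfPrev['date'] = pd.to_datetime(dfPrev['date'], errors='coerce')
df2018['date'] = pd.to_datetime(df2018['date'], errors='coerce')

# ISO week number; Series.dt.weekofyear was deprecated in pandas 1.1 and removed in 2.0
dfPrev['week'] = dfPrev.date.dt.isocalendar().week
df2018['week'] = df2018.date.dt.isocalendar().week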
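`isocalendar()` follows the ISO 8601 week convention: weeks start on Monday, week 1 is the week containing the year's first Thursday, and a year has 52 or 53 weeks. The short check below (dates chosen for illustration) confirms that mid-August 2018 falls in week 33 and that a date coerced to `NaT` yields a missing week rather than an error.

```python
import datetime

import pandas as pd

# Standard-library ISO week numbers
print(datetime.date(2018, 8, 15).isocalendar()[1])   # 33
print(datetime.date(2015, 12, 31).isocalendar()[1])  # 53 (2015 had 53 ISO weeks)

# The same calculation in pandas; errors='coerce' turns bad strings into NaT
dates = pd.to_datetime(pd.Series(['2018-08-15', 'not a date']), errors='coerce')
weeks = dates.dt.isocalendar().week  # nullable integer column; NaT gives <NA>
print(weeks)
```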

In [8]:
dfPrev.week.sort_values().unique()

array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
       28, 29, 30, 31], dtype=int64)

Now we can graph the time periods over which we captured lizards in both data sets to see which years overlapped with 2018. We will use a series of horizontal box plots, with every capture shown as a point.

In [9]:
# One horizontal box plot (with every capture shown as a point) per year, 2000-2017
dataPrev = [go.Box(x=dfPrev.loc[dfPrev.year == yr, 'week'], name=str(yr), boxpoints='all')
            for yr in range(2000, 2018)]

# Add the 2018 season as the final trace
dataPrev.append(go.Box(x=df2018.week, name='2018', boxpoints='all'))

layout = go.Layout(
    title = 'Distribution of Captures in Crystal Creek by Year',
    titlefont = dict(
        size = 20),
    yaxis = dict(
        title = 'Year',
        dtick = 1,
        titlefont = dict(
            size = 18)),
    xaxis = dict(
        title = 'Week of the Year',
        dtick = 1,
        titlefont = dict(
            size = 18)))

fig = go.Figure(
    data = dataPrev, 
    layout = layout)

py.iplot(fig, filename = 'Boxplot of Captures in CC by Week in the Year')

None of the field seasons between 2000 and 2017 overlaps the 2018 field season: even the closest years end 2 weeks before the 2018 season started.
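The overlap judgement read from the figure can also be checked numerically by tabulating each year's first, median, and last capture week, along with the gap between each year's last week and the first 2018 week. The toy weeks below are hypothetical stand-ins for `dfPrev` and `df2018`.

```python
import pandas as pd

# Hypothetical capture weeks (one row per capture)
prev = pd.DataFrame({'year': [2016, 2016, 2016, 2017, 2017],
                     'week': [27, 29, 31, 25, 28]})
weeks2018 = pd.Series([33, 34, 35, 35])

# First, median and last capture week for each earlier year
windows = prev.groupby('year').week.agg(['min', 'median', 'max'])

# Weeks between each year's last capture and the first 2018 capture.
# Assuming each earlier season starts before 2018's does (as here),
# a gap <= 0 would mean the two seasons overlap.
start2018 = weeks2018.min()
windows['gapTo2018'] = start2018 - windows['max']
windows['overlaps2018'] = windows['gapTo2018'] <= 0

print(windows)
```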

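To find the closest years, we will compute, for each year, the distance in weeks between its median capture week and the 2018 median capture week, and keep the year(s) with the smallest distance. The helper function `vocab_run` formats the resulting list of years for printing.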

In [10]:
def vocab_run(x, connector_dict={1: None, 2: ' and ', 'run': ', '}):
    """Join the elements of x into a single human-readable string.

    All but the last two elements are separated by connector_dict['run'];
    the last two are separated by connector_dict[2] (e.g. 'a, b and c').

    :param x: an iterable of items to be concatenated (each converted with str())
    :param connector_dict: a dictionary with keys describing the size of the list
        and values indicating the connectors that separate the list elements
    :return: the joined string ('' for an empty input)
    """
    x = [str(el) for el in x]
    if len(x) == 0:
        return ''
    if len(x) == 1:
        return x[0]
    if len(x) == 2:
        return connector_dict[2].join(x)
    return connector_dict['run'].join(x[:-1]) + connector_dict[2] + x[-1]

In [None]:
# Signed distance (in weeks) between the 2018 median capture week and each
# earlier year's median capture week
distance = (df2018.week.median() - dfPrev.groupby('year').week.median())\
    .rename('distMed')\
    .reset_index()

distance.year = distance.year.astype(int)

# Keep every year tied at the smallest absolute distance
minDist = distance.distMed.abs().min()
nearest = distance[distance.distMed.abs() == minDist]

print('The smallest median distance between any of the comparison years and '
      'the 2018 field season is {:g} weeks. The associated years are {}.'
      .format(minDist, vocab_run(nearest.year)))

distance
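A toy example (hypothetical weeks) makes the median-distance criterion concrete, including the case where two years tie for nearest:

```python
import pandas as pd

# Hypothetical capture weeks for three earlier years and for 2018
toyPrev = pd.DataFrame({'year': [2015] * 3 + [2016] * 3 + [2017] * 3,
                        'week': [29, 30, 31, 28, 30, 31, 27, 29, 31]})
toyWeeks2018 = pd.Series([33, 34, 35])

# 2018 median (34) minus each year's median (30, 30 and 29)
toyDistance = (toyWeeks2018.median() - toyPrev.groupby('year').week.median())\
    .rename('distMed')\
    .reset_index()

# All years tied at the smallest absolute distance are kept
toyMin = toyDistance.distMed.abs().min()
toyNearest = toyDistance.loc[toyDistance.distMed.abs() == toyMin, 'year'].tolist()

print(toyDistance)
print(toyNearest)  # [2015, 2016]
```

Note that the median ignores how long each season lasted, so two years with the same median can still differ in how close their last captures come to the 2018 season.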


In [None]:
[int(i) for i in nearest.year]

In [None]:
data = go.Scatter(x = distance.year ,y=distance.distMed, mode = 'markers')

layout = go.Layout(
    title = 'Distance Between Median Week and 2018 Median Week',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        title = 'Year',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Distance in Weeks Between Median Week and Median Week for 2018 Season',
        titlefont = dict(
            size = 18))
)

# Wrap the single trace in a list, which every plotly version accepts
fig = go.Figure(data = [data], layout = layout)

py.iplot(fig,filename = 'Scatter Plot of Distance Between Median Week and 2018 Median Week' )
# plot_url = py.plot(data, filename='Scatter Plot of Distance Between Median Week and 2018 Median Week')