<a id='TOC'></a>

# Table of Contents
1. [Things to Do](#Things-to-Do)
1. [Introduction](#Introduction)
1. [Set up Python](#Set-up-Python)
2. [Functions](#Functions)
3. [Getting Data](#Get-Data)
4. [Analyze Data](#Analyze-Data):
    - [Population Size](#Population-Size)
    - [Species Distribution](#Species-Distribution)
    - [Sex Distribution](#Sex-Distribution)
    - [Tail Condition Distribution](#Tail-Condition-Distribution)
    - [Location](#Location)
    - [Morphometrics](#Morphometrics):
        - [SVL](#SVL)
        - [TL](#TL)
        - [RTL](#RTL)
        - [Mass](#Mass)
    - [Survival and Rates and Likelihood of Recapture](#Survival-and-Rates-and-Likelihood-of-Recapture)
    - [Captures](#Captures)
    - [Growth](#Growth)
        - [SVL Growth](#SVL-Growth)
5. [Export Files](#Export-Files)

# Things to Do


## [Resume Here](#resume)

## Introduction

This notebook contains code and output of descriptive analyses for the 2000-2017 CC dataset after cleaning.

The objective of this notebook is to describe the community of _Sceloporus jarrovii_ and _Sceloporus virgatus_ lizards in the Crystal Creek wash from 2000 until 2017.  The population demographic metrics we examine are: [population size](#Population-Size), [sex distribution](#Sex-Distribution), [tail condition distribution](#Tail-Condition-Distribution), [location](#Location), [morphometrics](#Morphometrics) ([SVL](#SVL), [TL](#TL), [RTL](#RTL), [mass](#Mass)), [survival and rates and likelihood of recapture](#Survival-and-Rates-and-Likelihood-of-Recapture), and [growth](#Growth).

We will examine these metrics and interactions among them with particular interest in the impact of environmental factors from year to year.


##  Set up Python

First we set up the Python environment by importing the necessary packages and setting the display options.

[Top](#TOC)

In [1]:
import pandas as pd
import numpy as np
import os, glob, logging
from summary_functions import *
from scipy import stats
from monthlit import *
from prettyprint import *


import plotly
import plotly.plotly as py
import plotly.figure_factory as ff
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

init_notebook_mode(connected=True)
plotly.tools.set_config_file(world_readable=True)


# increase print limit
pd.options.display.max_rows = 99999
pd.options.display.max_columns = 50

### Setting File Locations

In [2]:
deviceDict = {'dataBig':{'source':'S:/Chris/TailDemography/TailDemography/outputFiles'
                         ,'log':'S:/Chris/TailDemography/TailDemography/Scripts and notes/Descriptive/'
                         ,'output':'S:/Chris/TailDemography/TailDemography/outputFiles/'},
              'silverSurfer':{'source':'C:\\Users\\craga_eowcrpe\\Google Drive\\TailDemography\\outputFiles'
                              ,'log':'C:\\Users\\craga_eowcrpe\\Google Drive\\TailDemography\\Scripts and notes\\Descriptive\\'
                              ,'output':'C:\\Users\\craga_eowcrpe\\Google Drive\\TailDemography\\outputFiles\\Descriptive\\'}
              ,'dataPers':{'source':'C:/Users/Christopher/Google Drive/TailDemography/outputFiles'
                           ,'log': 'C:\\Users\\craga_eowcrpe\\Google Drive\\TailDemography\\Scripts and notes\\Descriptive\\'
                           ,'output':'C:/Users/Christopher/Google Drive/TailDemography/outputFiles/Descriptve/'}}

### Choose Device

In [3]:
device = deviceDict['silverSurfer']
device

{'source': 'C:\\Users\\craga_eowcrpe\\Google Drive\\TailDemography\\outputFiles',
 'log': 'C:\\Users\\craga_eowcrpe\\Google Drive\\TailDemography\\Scripts and notes\\Descriptive\\',
 'output': 'C:\\Users\\craga_eowcrpe\\Google Drive\\TailDemography\\outputFiles\\Descriptive\\'}

# Source Data


### Logging

In [4]:
logging.basicConfig(filename=device['log']+'Desriptive Analyses.log'
                    , filemode='a',
                    format='%(funcName)s - %(levelname)s - %(message)s - %(asctime)s', level=logging.DEBUG)

## Functions

This section contains functions that were created for this notebook.

- [distribution](#distribution) #delete this we will use scipy stats describe instead
- [monthlit](#monthlit)
- [description](#description)
- [vocab_run](#vocab_run)

### distribution
[Back to Top](#TOC)

[Back to Functions](#Functions)

*distribution* takes a series or list of numeric objects, *x*, and returns descriptive statistics of *x*: n, minimum, maximum, median, sIQR (semi-interquartile range), mean, and stdev.
    
Here are a few examples of how *distribution* works.

In [5]:
foo = [0,1,2,'r']
distribution(foo)

In [6]:
bar = [0,1,2]
distribution(bar)

| | n | minimum | maximum | median | siqr | mean | stdev |
|---|---|---|---|---|---|---|---|
| 0 | 3 | 0 | 2 | 1.0 | 0.5 | 1.0 | 1.0 |
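The source of *distribution* is not shown in this notebook, so the following is only a minimal sketch of a function that would reproduce the output above. The name `distribution_sketch`, the dictionary return type, and the choice of the "inclusive" quartile method are assumptions; the quartile method was picked because it yields the sIQR of 0.5 shown for `[0, 1, 2]`.

```python
import statistics

def distribution_sketch(x):
    """Hypothetical stand-in for distribution(): summary statistics of numeric values."""
    values = sorted(x)
    # Quartiles by linear interpolation between data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method='inclusive')
    return {
        'n': len(values),
        'minimum': values[0],
        'maximum': values[-1],
        'median': statistics.median(values),
        'siqr': (q3 - q1) / 2,          # semi-interquartile range
        'mean': statistics.mean(values),
        'stdev': statistics.stdev(values),  # sample standard deviation
    }

summary = distribution_sketch([0, 1, 2])
print(summary)
```

The sIQR is half the distance between the first and third quartiles, which makes it a robust companion to the median in the same way that the standard deviation accompanies the mean.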


[Back to Functions](#Functions)

### monthlit
[Back to Top](#TOC)

[Back to Functions](#Functions)

Here are a few examples of how _monthlit_ works.

In [7]:
dates = pd.DataFrame(data={'dates':['2018-12-9','2019-8-5', '2017/7/4',np.nan,None]})
dates.dates = pd.to_datetime(dates.dates)
dates

| | dates |
|---|---|
| 0 | 2018-12-09 |
| 1 | 2019-08-05 |
| 2 | 2017-07-04 |
| 3 | NaT |
| 4 | NaT |


In [8]:
np.isnan(np.nan)

True

In [9]:
monthlit(dates.dates.dt.month[0])

'Dec'

In [10]:
dates.dates.dt.month.apply(monthlit)

0    Dec
1    Aug
2    Jul
3    NaN
4    NaN
Name: dates, dtype: object
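*monthlit* is imported from a separate module whose source is not included here. A minimal sketch that behaves like the examples above might look as follows; the name `monthlit_sketch` and the fixed English abbreviation list are assumptions.

```python
import math

# Fixed English abbreviations, so the result does not depend on the locale.
MONTH_ABBR = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

def monthlit_sketch(month):
    """Hypothetical stand-in for monthlit(): month number (1-12) to abbreviation."""
    # Missing months (None or NaN) are passed through as NaN,
    # matching the NaN rows in the output above.
    if month is None or (isinstance(month, float) and math.isnan(month)):
        return float('nan')
    return MONTH_ABBR[int(month) - 1]

print(monthlit_sketch(12))
```

Passing missing values through matters because `dt.month` returns floats with NaN whenever the date column contains `NaT`.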

[Back to Functions](#Functions)

### description
[Back to Top](#TOC)

[Back to Functions](#Functions)

In [42]:
def description(x,variable,percentage=False):
    # x is a DataFrame or a DataFrame groupby; variable is the column to summarize
    res = x[variable].describe()
    if percentage:
        # express proportions as percentages (count is left untouched)
        stat_cols = ['mean','std','min','25%','50%','75%','max']
        res[stat_cols] = res[stat_cols].apply(lambda v:v*100)
    # semi-interquartile range
    res['siqr'] = (res['75%']-res['25%'])/2
    # TODO: add CI calculation to this function
    res['meanCI'] = 'not calculated'
    return res
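The `meanCI` entry is a placeholder. One possible way to fill it, sketched here under stated assumptions, is a normal-approximation interval for the mean computed from the summary statistics that `describe()` already returns. The function name `mean_ci_normal` is hypothetical, and the input values are the mean, standard deviation, and count reported for population size later in this notebook. For small samples a t critical value (for example from `scipy.stats.t.ppf`) would give a wider and more appropriate interval.

```python
import math
from statistics import NormalDist

def mean_ci_normal(mean, std, n, level=0.95):
    """Normal-approximation confidence interval for a mean (hypothetical helper)."""
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * std / math.sqrt(n)
    return mean - half_width, mean + half_width

low, high = mean_ci_normal(81.7222, 31.0309, 18)
print(low, high)
```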

### vocab_run
[Back to Top](#TOC)

[Back to Functions](#Functions)

*vocab_run* takes a list, joins all but its last element with one separator, places a different separator between the penultimate and final elements, and returns the result as a string.

- `x`: a list of strings to be concatenated
- `connector_dict`: a dictionary whose keys describe the size of the list and whose values give the connectors that separate the list elements
    
Here are a few examples of how *vocab_run* works.

In [12]:
print("Could you bring some {} please?".format(vocab_run(['foo','bar','stuffkins'])))

Could you bring some foo, bar and stuffkins please?


In [13]:
print("You can either have {}.  You'll have to make a choice."\
      .format(vocab_run(['foo','bar','stuffkins'],connector_dict={1: None, 2: ' or ', 'run': ', '})))

You can either have foo, bar or stuffkins.  You'll have to make a choice.
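The two examples above are consistent with a function along these lines. This is a sketch, not the imported implementation: the name `vocab_run_sketch` and the default `connector_dict` (inferred from the first example) are assumptions, and the real function may treat the key `1` differently.

```python
def vocab_run_sketch(x, connector_dict=None):
    """Hypothetical stand-in for vocab_run(): join items in natural-language style."""
    if connector_dict is None:
        # Assumed default, inferred from the "foo, bar and stuffkins" example.
        connector_dict = {1: None, 2: ' and ', 'run': ', '}
    items = [str(item) for item in x]
    if len(items) <= 1:
        # Nothing to connect for an empty or single-item list.
        return ''.join(items)
    # 'run' separates the leading items; key 2 joins the final pair.
    head = connector_dict['run'].join(items[:-1])
    return head + connector_dict[2] + items[-1]

print(vocab_run_sketch(['foo', 'bar', 'stuffkins']))
```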


[Back to Functions](#Functions)

## Get Data
[Top](#TOC)

Here we move to the source folder from which we read the data. We list all files in that folder with the prefix _'cleaned CC data 2000-2017'_ and save the file names in the variable _mysourcefiles_.

In [14]:
os.chdir(device['source'])
mysourcefiles = glob.glob('cleaned CC data 2000-2017_*.csv')
mysourcefiles

['cleaned CC data 2000-2017_2019-01-31 01hrs43min.csv',
 'cleaned CC data 2000-2017_2019-03-10 14hrs42min.csv',
 'cleaned CC data 2000-2017_2019-03-12 21hrs48min.csv',
 'cleaned CC data 2000-2017_2019-03-12 22hrs52min.csv',
 'cleaned CC data 2000-2017_2019-03-12 22hrs55min.csv',
 'cleaned CC data 2000-2017_2019-04-25 00hrs58min.csv',
 'cleaned CC data 2000-2017_2019-04-25 01hrs00min.csv',
 'cleaned CC data 2000-2017_2019-05-02 00hrs17min.csv',
 'cleaned CC data 2000-2017_2019-05-02 01hrs03min.csv',
 'cleaned CC data 2000-2017_2019-05-02 01hrs08min.csv',
 'cleaned CC data 2000-2017_2019-05-04 00hrs33min.csv']

In [15]:
pd.to_datetime(mysourcefiles[0].split("_")[1].split(".csv")[0].split(' ')[0])

Timestamp('2019-01-31 00:00:00')

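Next we automate getting the latest file. The cell below selects the file or files whose embedded date is the most recent. Only the date is compared, not the time, so several files from the same day would all be returned.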

In [16]:
[latestFile for latestFile in mysourcefiles if \
 max({pd.to_datetime(afile.split("_")[1].split(".csv")[0].split(' ')[0]) \
      for afile in mysourcefiles}) == pd.to_datetime(latestFile.split("_")\
                                                     [1].split(".csv")[0].split(' ')[0])]

['cleaned CC data 2000-2017_2019-05-04 00hrs33min.csv']

In [17]:
min({afile.split(' ')[-1].replace('hrs','').replace('min.csv','') for afile in mysourcefiles})

'0017'

In [18]:
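# glob does not guarantee any ordering; the zero-padded timestamps sort chronologically as text
latest = sorted(mysourcefiles)[-1]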
latest

'cleaned CC data 2000-2017_2019-05-04 00hrs33min.csv'
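Because several files can share a date, a selection that parses both the date and the time is less ambiguous. The following is a minimal sketch that assumes every file name ends in `_YYYY-MM-DD HHhrsMMmin.csv`, as the names listed above do; the helper names `file_timestamp` and `latest_file` are hypothetical.

```python
import os
from datetime import datetime

def file_timestamp(name):
    """Parse the 'YYYY-MM-DD HHhrsMMmin' stamp that follows the last underscore."""
    stamp = os.path.splitext(name)[0].rsplit('_', 1)[1]
    return datetime.strptime(stamp, '%Y-%m-%d %Hhrs%Mmin')

def latest_file(names):
    """Return the single file name with the most recent date and time."""
    return max(names, key=file_timestamp)

# A few of the file names listed above, deliberately out of order.
example_files = [
    'cleaned CC data 2000-2017_2019-05-02 01hrs08min.csv',
    'cleaned CC data 2000-2017_2019-05-04 00hrs33min.csv',
    'cleaned CC data 2000-2017_2019-05-02 00hrs17min.csv',
]
print(latest_file(example_files))
```

Using `max` with a key function does not depend on the order in which `glob` returns the names.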

The most recent file is the one we will use as _df_ in our descriptive analysis.

In [32]:
df=pd.read_csv(latest)
df.date=pd.to_datetime(df.date)
df['month']=df.date.dt.month.apply(monthlit)
df.head()

Unnamed: 0,species,toes_orig,sex,date,svl,tl,rtl,autotomized,mass,location,meters,newRecap,painted,sighting,paint.mark,vial,misc,rtl_orig,toes,toe_pattern,description,action,pattern_b,replacement,year,tl_svl,mass_svl,all_meters,initialCaptureDate,year_diff,smallest_svl,svl_diff,liznumber,sex_count,daysSinceCapture,capture,month
0,j,3-7-11-19,m,2002-07-14,63.0,92.0,0.0,False,10.0,halfway up to site,-200,N,,,b101t,toes in vial 58-02,,0.0,3-7-11-19,,,,,,2002,1.460317,0.15873,['-200'],2002-07-14,0,63.0,0.0,375,1,0,1,Jul
1,j,3-7-11-18,m,2002-07-14,66.0,92.0,0.0,False,10.8,left downstream 100m v 1 falls,-100,N,,,b102t,toes in vial 59-02,,0.0,3-7-11-18,,,,,,2002,1.393939,0.163636,['-100' '87'],2002-07-14,0,66.0,0.0,374,1,0,1,Jul
2,j,3-7-12-16,m,2002-07-14,68.0,103.0,0.0,False,10.3,90m v 1 falls,-90,N,,,b103t,toes in vial 60-02,,0.0,3-7-12-16,,,,,,2002,1.514706,0.151471,['-90'],2002-07-14,0,68.0,0.0,376,1,0,1,Jul
3,j,10,m,2002-07-14,85.0,118.0,0.0,False,19.5,sb - trail intersection v 1 falls,-20,R,toe loss may be natural,,b104t,,,0.0,10,,,,,,2002,1.388235,0.229412,['-20' '-12' '-35'],2002-07-14,0,85.0,0.0,830,2,0,1,Jul
4,v,10-16,f,2002-07-03,63.0,49.0,21.0,True,11.6,left side @ base juniper 8 m ^ sb; 15m v 1 falls,-15,N,painted; gravid; <5 mites in pockets,,w1a,toes in vial 34-02 (10-16),,21.0,10-16,,,,,,2002,0.777778,0.184127,['-15'],2002-07-03,0,63.0,0.0,1414,1,0,1,Jul


## Analyze Data
[Top](#TOC)

We will first examine the range and distribution of a number of variables in our data set:
- [Population Size](#Population-Size)
- [Species Distribution](#Species-Distribution)
- [Sex Distribution](#Sex-Distribution)
- [Tail Condition Distribution](#Tail-Condition-Distribution)
- [Location](#Location)
- [Morphometrics](#Morphometrics):
    - [SVL](#SVL)
    - [TL](#TL)
    - [RTL](#RTL)
    - [Mass](#Mass)
- [Survival and Rates and Likelihood of Recapture](#Survival-and-Rates-and-Likelihood-of-Recapture)
- [Captures](#Captures)
- [Growth](#Growth)
    - [SVL Growth](#SVL-Growth)

We will use the first capture of each lizard in each year for these analyses.

## Reducing the analysis sample by date range and capture

In [33]:
monthsExcluded = ['Oct','Dec']
idx_exclusion = (df.month.isin(monthsExcluded))&(df.capture==1)&(df.species!='j')
print("The number of individuals captured for the first time in {} is {}. \
These are excluded from further analyses.".format(vocab_run(monthsExcluded,{1: None, 2: ' or ', 'run': ', '}),
                                                df.loc[idx_exclusion].liznumber.nunique()))
df=df.loc[~idx_exclusion]
df=df.loc[(df.species=='j')& (df.sex.isin(['m','f']))]

The number of individuals captured for the first time in Oct or Dec is 1. These are excluded from further analyses.


Here we create datasets including only the first or last sighting in each year for a given animal.

In [34]:
df_lastInYear = df.loc[~df.duplicated(subset=['liznumber','year'],keep='last')]
df_firstInYear = df.loc[~df.duplicated(subset=['liznumber','year'])]
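To see what the two `duplicated` calls select, here is a small illustration on a made-up capture log (the values are invented for the example and are not from the data set). Note that "first" and "last" refer to row order, so this approach assumes the rows are already in chronological order within each lizard and year.

```python
import pandas as pd

# Toy capture log: lizard 1 is seen twice in 2000.
captures = pd.DataFrame({
    'liznumber': [1, 1, 1, 2],
    'year':      [2000, 2000, 2001, 2000],
    'svl':       [50.0, 55.0, 60.0, 70.0],
})

keys = ['liznumber', 'year']
# By default duplicated() flags every row after the first one per key;
# with keep='last' it flags every row before the last one instead.
first_in_year = captures.loc[~captures.duplicated(subset=keys)]
last_in_year = captures.loc[~captures.duplicated(subset=keys, keep='last')]

print(first_in_year)
print(last_in_year)
```

If the row order is not guaranteed, sorting by `date` before de-duplicating would make the selection explicit.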

### Population Size

We will begin by examining the range and distribution of individuals in the population.

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

In [36]:
populationSize = df_firstInYear.groupby(['year']).liznumber.nunique().reset_index()
populationSize

| | year | liznumber |
|---|---|---|
| 0 | 2000 | 153 |
| 1 | 2001 | 135 |
| 2 | 2002 | 119 |
| 3 | 2003 | 97 |
| 4 | 2004 | 70 |
| 5 | 2005 | 79 |
| 6 | 2006 | 66 |
| 7 | 2007 | 94 |
| 8 | 2008 | 88 |
| 9 | 2009 | 105 |


In [38]:
Sjarrovii = go.Scatter(x = populationSize.year
           , y = populationSize.liznumber
          ,name = 'S. jarrovii')
data = [Sjarrovii]
layout = go.Layout(
    title = 'Population Size for Sceloporus jarrovii 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18),
        range = [1999.5,2017.5]),
    yaxis = dict(
        title = 'Number of Lizards',
        titlefont = dict(
            size = 18))
)
fig = go.Figure(
        data = data,
        layout = layout)
plot(fig, filename = 'Population Size for Sceloporus jarrovii.html')
iplot(fig, filename = 'Population Size for Sceloporus jarrovii.html')
# pio.to_image(fig, format='html')

In [44]:
description(populationSize,'liznumber')

count                 18
mean             81.7222
std              31.0309
min                   44
25%                 56.5
50%                   76
75%                96.25
max                  153
siqr              19.875
meanCI    not calculated
Name: liznumber, dtype: object

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

### Sex Distribution

Next we examine the range and distribution of _sex_ values.

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

In [45]:
populationSize_sex = df_firstInYear.groupby(['year','sex']).liznumber.nunique().reset_index()\
.merge(df_firstInYear.groupby(['year']).liznumber.nunique().reset_index()\
       .rename(columns={'liznumber':'liznumberYear'}),
       on=['year'])
populationSize_sex\
.loc[populationSize_sex.sex=='m','propMale'] = populationSize_sex\
.loc[populationSize_sex.sex=='m'].liznumber/populationSize_sex\
.loc[populationSize_sex.sex=='m'].liznumberYear
populationSize_sex\
.loc[populationSize_sex.sex=='f','propFemale'] = (populationSize_sex\
.loc[populationSize_sex.sex=='f'].liznumber/populationSize_sex\
.loc[populationSize_sex.sex=='f'].liznumberYear)
populationSize_sex

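| | year | sex | liznumber | liznumberYear | propMale | propFemale |
|---|---|---|---|---|---|---|
| 0 | 2000 | f | 84 | 153 | | 0.54902 |
| 1 | 2000 | m | 69 | 153 | 0.45098 | |
| 2 | 2001 | f | 72 | 135 | | 0.533333 |
| 3 | 2001 | m | 63 | 135 | 0.466667 | |
| 4 | 2002 | f | 67 | 119 | | 0.563025 |
| 5 | 2002 | m | 52 | 119 | 0.436975 | |
| 6 | 2003 | f | 54 | 97 | | 0.556701 |
| 7 | 2003 | m | 43 | 97 | 0.443299 | |
| 8 | 2004 | f | 38 | 70 | | 0.542857 |
| 9 | 2004 | m | 32 | 70 | 0.457143 | |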
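The same proportions can be obtained in a wide layout, with one row per year and no empty cells, by pivoting the counts and dividing each row by its total. This is an alternative sketch rather than the method used in this notebook; the variable names `counts`, `wide`, and `props` are illustrative, and the counts are the 2000 and 2001 values from the table above.

```python
import pandas as pd

# Per-year, per-sex counts of unique lizards (2000 and 2001 from the table above).
counts = pd.DataFrame({
    'year': [2000, 2000, 2001, 2001],
    'sex': ['f', 'm', 'f', 'm'],
    'liznumber': [84, 69, 72, 63],
})

# One row per year, one column per sex.
wide = counts.pivot(index='year', columns='sex', values='liznumber')
# Divide each row by its yearly total to get proportions.
props = wide.div(wide.sum(axis=1), axis=0)

print(props)
```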


In [50]:
Sjarrovii = go.Scatter(x = populationSize_sex.loc[(populationSize_sex.propMale.notna())].year
           , y = populationSize_sex.loc[(populationSize_sex.propMale.notna())].propMale
          ,name = 'S. jarrovii')
data = [Sjarrovii]
layout = go.Layout(
    title = 'Proportion of Males by Year for Sceloporus jarrovii 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18),
        range = [1999.5,2017.5]),
    yaxis = dict(
        tickformat = ".2%",
        title = 'Percentage of Males',
        titlefont = dict(
            size = 18),
    range = [0,1])
    
)
fig = go.Figure(
        data = data,
        layout = layout)
plot(fig, filename = 'Proportion of Males by Species and Year for Sceloporus jarrovii 2000-2017.html')
iplot(fig, filename = 'Proportion of Males by Species and Year for Sceloporus jarrovii 2000-2017.html')

In [52]:
description(populationSize_sex,'propMale',True)

count                 18
mean             45.5842
std              4.61074
min              34.0909
25%               43.732
50%              45.4545
75%              49.0281
max              54.4304
siqr             2.64804
meanCI    not calculated
Name: propMale, dtype: object

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

### Tail Condition Distribution

Here we look at the proportion of individuals in our data who have experienced autotomy.

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

In [55]:
populationSize_aut = df_firstInYear.groupby(['year','autotomized']).liznumber.nunique()\
.reset_index()\
.merge(df_firstInYear.groupby(['year']).liznumber.nunique().reset_index()\
       .rename(columns={'liznumber':'liznumberYear'})
       ,on=['year'])
populationSize_aut\
.loc[populationSize_aut.autotomized,'propAutotomized'] = populationSize_aut\
.loc[populationSize_aut.autotomized].liznumber/populationSize_aut\
.loc[populationSize_aut.autotomized].liznumberYear
populationSize_aut\
.loc[~populationSize_aut.autotomized,'propIntact'] = (populationSize_aut\
.loc[~populationSize_aut.autotomized].liznumber/populationSize_aut\
.loc[~populationSize_aut.autotomized].liznumberYear)
populationSize_aut

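| | year | autotomized | liznumber | liznumberYear | propAutotomized | propIntact |
|---|---|---|---|---|---|---|
| 0 | 2000 | False | 106 | 153 | | 0.69281 |
| 1 | 2000 | True | 47 | 153 | 0.30719 | |
| 2 | 2001 | False | 97 | 135 | | 0.718519 |
| 3 | 2001 | True | 38 | 135 | 0.281481 | |
| 4 | 2002 | False | 86 | 119 | | 0.722689 |
| 5 | 2002 | True | 33 | 119 | 0.277311 | |
| 6 | 2003 | False | 72 | 97 | | 0.742268 |
| 7 | 2003 | True | 25 | 97 | 0.257732 | |
| 8 | 2004 | False | 45 | 70 | | 0.642857 |
| 9 | 2004 | True | 25 | 70 | 0.357143 | |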


In [56]:
Sjarrovii = go.Scatter(x = populationSize_aut.loc[(populationSize_aut.propAutotomized.notna())].year
           , y = populationSize_aut.loc[(populationSize_aut.propAutotomized.notna())].propAutotomized
          ,name = 'Autotomized S. jarrovii')



data = [Sjarrovii]
layout = go.Layout(
    title = 'Proportion Autotomized by Year for Sceloporus jarrovii 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18),
        range = [1999.5,2017.5]),
    yaxis = dict(
        tickformat=".2%",
        title = 'Proportion of Autotomized Lizards',
        titlefont = dict(
            size = 18),
    range=[0,1])
)
fig = go.Figure(
        data = data,
        layout = layout)
iplot(fig, filename = 'Proportion Autotomized by Year for Sceloporus jarrovii 2000-2017.html')
plot(fig, filename = 'Proportion Autotomized by Year for Sceloporus jarrovii 2000-2017.html')

'Proportion Autotomized by Year for Sceloporus jarrovii 2000-2017.html'

In [58]:
description(populationSize_aut, 'propAutotomized',True)

count                 18
mean             30.4747
std              6.34579
min              14.8936
25%              27.6471
50%              29.9242
75%              36.2013
max                   40
siqr             4.27712
meanCI    not calculated
Name: propAutotomized, dtype: object

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

### Morphometrics

In this section we describe the distributions of various morphometrics.

- [SVL](#SVL)
- [TL](#TL)
- [RTL](#RTL)
- [Mass](#Mass)

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

#### SVL

Now we examine the range and distribution of SVL values by sex.

[Back to Morphometrics](#Morphometrics)


We will use the [description](#description) function to do this and then plot these values.

- [Histogram of SVL](#SVLhist)

In [59]:
print("svl values in the data set range from {} to {} and are distributed across sex \
as displayed here:"\
      .format(df_firstInYear.svl.min(), df_firstInYear.svl.max()))
description(df_firstInYear.groupby('sex'),variable='svl')

svl values in the data set range from 13.0 to 98.0 and are distributed across sex as displayed here:


| sex | count | mean | std | min | 25% | 50% | 75% | max | siqr | meanCI |
|---|---|---|---|---|---|---|---|---|---|---|
| f | 795.0 | 61.542138 | 14.155369 | 13.0 | 54.0 | 65.0 | 73.0 | 86.0 | 9.5 | not calculated |
| m | 676.0 | 68.008876 | 18.280204 | 24.0 | 55.0 | 73.0 | 83.0 | 98.0 | 14.0 | not calculated |


In [66]:
SVLbyYear = description(df_firstInYear.groupby(['year','sex']),'svl').reset_index()
SVLbyYear

| | year | sex | count | mean | std | min | 25% | 50% | 75% | max | siqr | meanCI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000 | f | 84.0 | 60.071429 | 12.676499 | 31.0 | 54.0 | 60.0 | 68.5 | 83.0 | 7.25 | not calculated |
| 1 | 2000 | m | 69.0 | 68.289855 | 17.052774 | 31.0 | 60.0 | 70.0 | 83.0 | 98.0 | 11.5 | not calculated |
| 2 | 2001 | f | 72.0 | 58.736111 | 13.863513 | 37.0 | 43.0 | 58.5 | 71.0 | 82.0 | 14.0 | not calculated |
| 3 | 2001 | m | 63.0 | 65.650794 | 17.982212 | 36.0 | 49.0 | 67.0 | 81.5 | 93.0 | 16.25 | not calculated |
| 4 | 2002 | f | 67.0 | 60.41791 | 12.344977 | 27.0 | 55.0 | 60.0 | 70.0 | 80.0 | 7.5 | not calculated |
| 5 | 2002 | m | 52.0 | 69.346154 | 17.347526 | 27.0 | 63.0 | 69.5 | 83.25 | 92.0 | 10.125 | not calculated |
| 6 | 2003 | f | 54.0 | 55.537037 | 17.084378 | 26.0 | 36.25 | 62.0 | 69.0 | 80.0 | 16.375 | not calculated |
| 7 | 2003 | m | 43.0 | 61.348837 | 21.156879 | 29.0 | 35.0 | 65.0 | 80.5 | 89.0 | 22.75 | not calculated |
| 8 | 2004 | f | 38.0 | 60.631579 | 16.066452 | 30.0 | 58.25 | 66.0 | 72.25 | 80.0 | 7.0 | not calculated |
| 9 | 2004 | m | 32.0 | 67.5 | 19.660013 | 29.0 | 64.5 | 72.0 | 80.0 | 91.0 | 7.75 | not calculated |


Let's plot these values. 

##### Histogram of SVL values

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)


In [69]:
female = go.Bar(x=SVLbyYear.loc[(SVLbyYear.sex=='f')]['year'], y = SVLbyYear.loc[(SVLbyYear.sex=='f')]['50%']
                      ,name='Females')
male = go.Bar(x=SVLbyYear.loc[(SVLbyYear.sex=='m')]['year'], y = SVLbyYear.loc[(SVLbyYear.sex=='m')]['50%']
                    ,name='Males')

data = [male,female]
layout = go.Layout(
    title = 'Histogram of SVL at Capture for CC 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Median SVL (mm)',
        titlefont = dict(
            size = 18)))

fig = go.Figure(
        data = data,
        layout = layout)
iplot(fig, filename = 'Histogram of Median SVL Sceloporus jarrovii in CC 2000-2017.html')
plot(fig, filename = 'Histogram of Median SVL Sceloporus jarrovii in CC 2000-2017.html')

'Histogram of Median SVL Sceloporus jarrovii in CC 2000-2017.html'

Outliers will be addressed in the Cleaning notebook, but will be removed for the remainder of the analyses here.

#### TL

Now we examine the range and distribution of TL values by year, tail condition, and sex.

[Back to Morphometrics](#Morphometrics)


We will use the [description](#description) function to do this and then plot these values.

- [Histogram of TL](#TLhist)

In [71]:
TLbyYear = description(df_firstInYear.groupby(['year','autotomized','sex']),'tl').reset_index()
TLbyYear

| | year | autotomized | sex | count | mean | std | min | 25% | 50% | 75% | max | siqr | meanCI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000 | False | f | 60.0 | 77.233333 | 17.040696 | 38.0 | 71.75 | 78.5 | 88.25 | 114.0 | 8.25 | not calculated |
| 1 | 2000 | False | m | 46.0 | 87.413043 | 25.0054 | 38.0 | 80.0 | 90.0 | 108.5 | 122.0 | 14.25 | not calculated |
| 2 | 2000 | True | f | 24.0 | 67.208333 | 15.348075 | 30.0 | 61.0 | 66.5 | 77.25 | 93.0 | 8.125 | not calculated |
| 3 | 2000 | True | m | 23.0 | 76.347826 | 19.508993 | 30.0 | 64.0 | 75.0 | 93.0 | 108.0 | 14.5 | not calculated |
| 4 | 2001 | False | f | 59.0 | 77.627119 | 18.380929 | 50.0 | 57.5 | 79.0 | 92.5 | 109.0 | 17.5 | not calculated |
| 5 | 2001 | False | m | 38.0 | 80.236842 | 24.043243 | 48.0 | 55.0 | 79.0 | 100.0 | 124.0 | 22.5 | not calculated |
| 6 | 2001 | True | f | 13.0 | 62.846154 | 17.742821 | 35.0 | 56.0 | 66.0 | 71.0 | 92.0 | 7.5 | not calculated |
| 7 | 2001 | True | m | 25.0 | 71.96 | 20.235036 | 23.0 | 66.0 | 80.0 | 83.0 | 103.0 | 8.5 | not calculated |
| 8 | 2002 | False | f | 52.0 | 79.942308 | 17.875839 | 31.0 | 73.75 | 81.0 | 92.5 | 106.0 | 9.375 | not calculated |
| 9 | 2002 | False | m | 34.0 | 91.117647 | 26.114857 | 34.0 | 80.0 | 92.0 | 110.75 | 130.0 | 15.375 | not calculated |


In [75]:
femaleIntact = go.Scatter(x=TLbyYear.loc[~(TLbyYear.autotomized)&(TLbyYear.sex=='f')]['year'], 
                      y = TLbyYear.loc[~(TLbyYear.autotomized)&(TLbyYear.sex=='f')]['50%']
                      ,name='Intact Females')
femaleAut = go.Scatter(x=TLbyYear.loc[(TLbyYear.autotomized)&(TLbyYear.sex=='f')]['year'], 
                   y = TLbyYear.loc[(TLbyYear.autotomized)&(TLbyYear.sex=='f')]['50%']
                      ,name='Autotomized Females')
maleIntact = go.Scatter(x=TLbyYear.loc[~(TLbyYear.autotomized)&(TLbyYear.sex=='m')]['year'], 
              y = TLbyYear.loc[~(TLbyYear.autotomized)&(TLbyYear.sex=='m')]['50%']
                    ,name='Intact Males')
maleAut = go.Scatter(x=TLbyYear.loc[(TLbyYear.autotomized)&(TLbyYear.sex=='m')]['year'], 
              y = TLbyYear.loc[(TLbyYear.autotomized)&(TLbyYear.sex=='m')]['50%']
                    ,name='Autotomized Males')

data = [maleIntact,maleAut,femaleIntact,femaleAut]
layout = go.Layout(
    title = 'Median TL for CC 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Median TL (mm)',
        titlefont = dict(
            size = 18)))

fig = go.Figure(
        data = data,
        layout = layout)
iplot(fig, filename = 'Median TL Sceloporus jarrovii in CC 2000-2017.html')
plot(fig, filename = 'Median TL Sceloporus jarrovii in CC 2000-2017.html')

'Median TL Sceloporus jarrovii in CC 2000-2017.html'

#### RTL

Now we examine the range and distribution of RTL values among autotomized lizards by species and sex.

[Back to Morphometrics](#Morphometrics)


We will use the [description](#description) function to do this and then plot these values.

- [Histogram of RTL](#RTLhist)

In [None]:
# restrict to autotomized lizards so the printed range matches the summary below
df_aut = df_firstInYear.loc[df_firstInYear.autotomized==True]
print("rtl values among autotomized lizards in the data set range from {} to {} and are \
distributed across species and sex as displayed here:"\
      .format(df_aut.rtl.min(), df_aut.rtl.max()))
description(df_aut.groupby(['species','sex']),variable='rtl')

#### Mass

Now we examine the range and distribution of mass values by species and sex.

[Back to Morphometrics](#Morphometrics)


We will use the [description](#description) function to do this and then plot these values.

- [Histogram of mass](#masshist)

In [None]:
print("Mass values in the data set range from {} to {} and are \
distributed across species and sex as displayed here:"\
      .format(df_firstInYear.mass.min(), df_firstInYear.mass.max()))
description(df_firstInYear.groupby(['species','sex']),variable='mass')

## Captures

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

Let's take a look at the number of times that lizards have been captured.  To do this, we group lizards by lizard number, take the maximum number of captures for each lizard, and finally count the number of lizards that have a given number of captures.  We will use all captures for this.

In [None]:
print("The maximum number of captures among lizards in the data set ranges from {} to {} and is \
distributed across species and sex as displayed here:"\
      .format(df_firstInYear.groupby('liznumber').capture.max().min(), df_firstInYear.groupby('liznumber').capture.max().max()))
description(df_firstInYear.groupby(['species','sex']),variable='capture')
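The three steps described in this section (group by lizard, take each lizard's maximum capture number, count lizards per maximum) can be sketched on a made-up table as follows. The toy values are invented for illustration; only the column names `liznumber` and `capture` come from the data set.

```python
import pandas as pd

# Toy data: 'capture' is the running capture number of each lizard.
toy = pd.DataFrame({
    'liznumber': [1, 1, 1, 2, 3, 3, 4],
    'capture':   [1, 2, 3, 1, 1, 2, 1],
})

# Steps 1 and 2: maximum capture number reached by each lizard.
max_caps = toy.groupby('liznumber')['capture'].max()
# Step 3: number of lizards per maximum capture number.
lizards_per_count = max_caps.value_counts().sort_index()

print(lizards_per_count)
```

The resulting series is indexed by the number of captures, which is the same quantity the histograms in this section display.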

## Years

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

Let's take a look at the number of years over which lizards have been captured.  To do this, we group lizards by lizard number, take the maximum number of years over which each lizard was captured, and finally count the number of lizards that were captured over a given number of years.  We will use all captures for this.

In [None]:
(df_firstInYear.groupby('liznumber').year_diff.max()+1).value_counts(normalize=True).reset_index()\
.rename(columns={'index':'year_diff','year_diff':'count'}).sort_values('year_diff')

In [None]:
print("The maximum number of years over which lizards in the data set were captured ranges from {} to {} \
and is distributed across species and sex as displayed here:"\
      .format(df_firstInYear.groupby('liznumber').year_diff.max().min()+1, 
              df_firstInYear.groupby('liznumber').year_diff.max().max()+1))
description(df_firstInYear.groupby(['species','sex']),variable='year_diff')

In [None]:
Sj = go.Histogram(x = df_firstInYear.loc[(df_firstInYear.species=='j')].groupby('liznumber').capture.max(),name='Sj')
Sv = go.Histogram(x = df_firstInYear.loc[(df_firstInYear.species=='v')].groupby('liznumber').capture.max(),name='Sv')

data = [Sj,Sv]
layout = go.Layout(
    title = 'Histogram of Maximum Number of Captures for CC 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Maximum Number of Captures',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Number of Unique Lizards',
        titlefont = dict(
            size = 18)))

fig = go.Figure(
        data = data,
        layout = layout)
iplot(fig, filename = 'Histogram of Maximum Number of Captures for CC 2000-2017.html')
plot(fig, filename = 'Histogram of Maximum Number of Captures for CC 2000-2017.html')

In [None]:
# drop this fig

In [None]:
SjIntact = go.Histogram(x = df_firstInYear.loc[(df_firstInYear.species=='j')&(df_firstInYear.autotomized==False)]\
                  .groupby('liznumber').capture.max(),name='Sj-Intact')
SvIntact = go.Histogram(x = df_firstInYear.loc[(df_firstInYear.species=='v')&(df_firstInYear.autotomized==False)]\
                  .groupby('liznumber').capture.max(),name='Sv-Intact')
SjAut = go.Histogram(x = df_firstInYear.loc[(df_firstInYear.species=='j')&(df_firstInYear.autotomized==True)]\
                  .groupby('liznumber').capture.max(),name='Sj-Autotomized')
SvAut = go.Histogram(x = df_firstInYear.loc[(df_firstInYear.species=='v')&(df_firstInYear.autotomized==True)]\
                  .groupby('liznumber').capture.max(),name='Sv-Autotomized')
data = [SjIntact,SvIntact,SjAut,SvAut]
layout = go.Layout(
    title = 'Histogram of Maximum Number of Captures for CC 2000-2017 by Autotomy Status',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Maximum Number of Captures',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Number of Unique Lizards',
        titlefont = dict(
            size = 18)))

fig = go.Figure(
        data = data,
        layout = layout)
py.iplot(fig, filename = 'Histogram of Maximum Number of Captures for CC 2000-2017 by Autotomy Status')

### Number of lizards (*Sj*) by year and sex

In [None]:
df_firstInYear.groupby('year').sex.value_counts()

Pull out all Sj individuals that we have recaptured and write them to csv.

In [None]:
multicapToes=df_firstInYear.loc[(df_firstInYear.species=='j')& (df_firstInYear.toes!="")& (df_firstInYear.toes!='NA')]\
.toes.value_counts()[df_firstInYear.loc[df_firstInYear.species=='j']\
                     .toes.value_counts()>1].index.tolist()
df_firstInYear.loc[df_firstInYear.toes.isin(multicapToes)].sort_values(by=['toes','date']).shape[0]
# df_firstInYear.loc[df_firstInYear.toes.isin(multicapToes)].sort_values(by=['toes','date']).to_csv('multicaps.csv')
# multicapToes

### Maximum Number of Captures based on Tail condition
 - match for species, sex, size and location

In [None]:
# separate by sex
dfF = df_firstInYear.loc[df_firstInYear.sex =='f']
dfM = df_firstInYear.loc[df_firstInYear.sex =='m']

females = go.Histogram(x = dfF.groupby('liznumber').capture.max(),name='females')
males = go.Histogram(x = dfM.groupby('liznumber').capture.max(),name='males')

data = [males,females]
layout = go.Layout(
    title = 'Maximum Number of Captures per Individual 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Maximum Number of Captures',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Number of Lizards',
        titlefont = dict(
            size = 18)))

fig = go.Figure(
        data = data,
        layout = layout)
py.iplot(fig, filename = 'Histogram of Maximum Captures per Individual in Crystal Creek 2000 - 2017')

In [None]:
#Drop this

In [None]:
SjFemales = go.Histogram(x = df_firstInYear.loc[(df_firstInYear.sex =='f')&(df_firstInYear.species=='j')]\
                       .groupby('liznumber').capture.max(),name='Sj-females')
SjMales = go.Histogram(x = df_firstInYear.loc[(df_firstInYear.sex =='m')&(df_firstInYear.species=='j')]\
                     .groupby('liznumber').capture.max(),name='Sj-males')
SvFemales = go.Histogram(x = df_firstInYear.loc[(df_firstInYear.sex =='f')&(df_firstInYear.species=='v')]\
                       .groupby('liznumber').capture.max(),name='Sv-females')
SvMales = go.Histogram(x = df_firstInYear.loc[(df_firstInYear.sex =='m')&(df_firstInYear.species=='v')]\
                     .groupby('liznumber').capture.max(),name='Sv-males')

data = [SjMales,SjFemales,SvMales,SvFemales]
layout = go.Layout(
    title = 'Maximum Number of Captures per Individual 2000-2017 by sex and species',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Maximum Number of Captures',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Number of Lizards',
        titlefont = dict(
            size = 18)))

fig = go.Figure(
        data = data,
        layout = layout)
py.iplot(fig, filename = 'Histogram of Maximum Captures per Individual in Crystal Creek 2000 - 2017 by sex and species')

## Growth

- [SVL Growth](#SVL-Growth)
- [TL Growth](#TL-Growth)
- [RTL Growth](#RTL-Growth)
- [Mass Growth](#Mass-Growth)

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)


### SVL Growth

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

What is the body size growth rate?

### TL Growth

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

What is the tail size growth rate?

### RTL Growth

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

What is the regrown tail size growth rate?

### Mass Growth

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

What is the body size growth rate in terms of mass?

In [None]:
df.capture.value_counts()

## Export Files
[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)