# Table of Contents
1. [Things to Do](#Things-to-Do)
1. [Introduction](#Introduction)
1. [Set up Python](#Set-up-Python)
2. [Functions](#Functions)
3. [Getting Data](#Get-Data)
4. [Analyze Data](#Analyze-Data):
    - [Population Size](#Population-Size) - noticeable changes
    - [Sex Distribution](#Sex-Distribution) - does not change
    - [Tail Condition Distribution](#Tail-Condition-Distribution) - noticeable change
        - [Severity of Autotomy](#Severity-of-Autotomy) - noticeable change
    - [Location](#Location)
    - [Morphometrics](#Morphometrics):
        - [SVL](#SVL)
        - [TL](#TL)
        - [RTL](#RTL)
        - [Mass](#Mass) - noticeable change (Check Males)
    - [Survival and Rates and Likelihood of Recapture](#Survival-and-Rates-and-Likelihood-of-Recapture)
    - [Captures](#Captures)
    - [Growth](#Growth)
        - [SVL Growth](#SVL-Growth)
        - [TL Growth](#TL-Growth)
        - [RTL Growth](#RTL-Growth)
        - [Mass Growth](#Mass-Growth)
    - [Correlations to Population](#Correlations-to-Population)
5. [Export Files](#Export-Files)

# Things to Do

Also search for "to be done"

## [Resume Here](#resume)

## Introduction

This notebook contains code and output of descriptive analyses for the 2000-2017 CC dataset after cleaning.

The objective of this notebook is to describe the community of _Sceloporus jarrovii_ and _Sceloporus virgatus_ lizards in the Crystal Creek wash from 2000 until 2017. The population demographic metrics we examine are [population size](#Population-Size), [sex distribution](#Sex-Distribution), [tail condition distribution](#Tail-Condition-Distribution), [location](#Location), [morphometrics](#Morphometrics) ([SVL](#SVL), [TL](#TL), [RTL](#RTL), and [mass](#Mass)), [survival and rates and likelihood of recapture](#Survival-and-Rates-and-Likelihood-of-Recapture), and [growth](#Growth).

We will examine these metrics and interactions among them with particular interest in the impact of environmental factors from year to year.


##  Set up Python

First we need to set up the Python environment by importing the necessary packages and setting the display options.

[Top](#Table-of-Contents)

In [1]:
import pandas as pd
import numpy as np
import pingouin as pg
import os, glob, logging
from summary_functions import *
from scipy import stats
from monthlit import *
from prettyprint import *


import plotly
import chart_studio.plotly as py
import plotly.figure_factory as ff
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

init_notebook_mode(connected=True)
# plotly.tools.set_config_file(world_readable=True)


# increase print limit
pd.options.display.max_rows = 99999
pd.options.display.max_columns = 50

### Setting File Locations

In [2]:
deviceDict = {'dataBig':{'source':'S:/Chris/TailDemography/TailDemography/outputFiles'
                         ,'log':'S:/Chris/TailDemography/TailDemography/Scripts and notes/Descriptive/'
                         ,'output':'S:/Chris/TailDemography/TailDemography/outputFiles/'},
              'silverSurfer':{'source':'C:\\Users\\craga_eowcrpe\\Google Drive\\TailDemography\\outputFiles'
                              ,'log':'C:\\Users\\craga_eowcrpe\\Google Drive\\TailDemography\\Scripts and notes\\Descriptive\\'
                              ,'output':'C:\\Users\\craga_eowcrpe\\Google Drive\\TailDemography\\outputFiles\\Descriptive\\'}
              ,'dataPers':{'source':'C:/Users/Christopher/Google Drive/TailDemography/outputFiles'
                           ,'log': 'C:\\Users\\craga_eowcrpe\\Google Drive\\TailDemography\\Scripts and notes\\Descriptive\\'
                           ,'output':'C:/Users/Christopher/Google Drive/TailDemography/outputFiles/Descriptve/'}
             ,'gandolf':{'source':'C:/Users/craga/Google Drive/TailDemography/outputFiles'
                           ,'log': 'C:/Users/craga/Google Drive/TailDemography/Scripts and notes/Descriptive/'
                           ,'output':'C:/Users/craga/Google Drive/TailDemography/outputFiles/Descriptive/'}}

### Choose Device

In [3]:
device = deviceDict['gandolf']
device

{'source': 'C:/Users/craga/Google Drive/TailDemography/outputFiles',
 'log': 'C:/Users/craga/Google Drive/TailDemography/Scripts and notes/Descriptive/',
 'output': 'C:/Users/craga/Google Drive/TailDemography/outputFiles/Descriptive/'}

# Source Data


### Logging

In [4]:
logging.basicConfig(filename=device['log']+'Desriptive Analyses.log'
                    , filemode='a',
                    format='%(funcName)s - %(levelname)s - %(message)s - %(asctime)s', level=logging.DEBUG)

## Functions

This section contains functions that were created for this notebook.

- [distribution](#distribution) (to be deleted; we will use `scipy.stats.describe` instead)
- [monthlit](#monthlit)
- [description](#description)
- [vocab_run](#vocab_run)

### distribution
[Back to Top](#Table-of-Contents)

[Back to Functions](#Functions)

*distribution* takes a series or list of numeric objects, *x*, and returns descriptive statistics of *x*: n, minimum, maximum, median, sIQR, mean, and stdev.
    
Here are a few examples of how *distribution* works.

In [5]:
foo = [0,1,2,'r']
distribution(foo)

In [6]:
bar = [0,1,2]
distribution(bar)

Unnamed: 0,n,minimum,maximum,median,siqr,mean,stdev
0,3,0,2,1.0,0.5,1.0,1.0
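
As a rough cross-check on the output above, the same statistics can be computed with the standard library alone. This is a minimal sketch, not the code in `summary_functions`: the name `distribution_sketch` is hypothetical, and it assumes that the quartiles use linear interpolation (the pandas default) and that stdev is the sample standard deviation.

```python
from statistics import mean, median, quantiles, stdev

def distribution_sketch(x):
    """Return n, minimum, maximum, median, sIQR, mean, and stdev for numeric values."""
    # method='inclusive' interpolates linearly between order statistics
    q1, _, q3 = quantiles(x, n=4, method='inclusive')
    return {
        'n': len(x),
        'minimum': min(x),
        'maximum': max(x),
        'median': median(x),
        'siqr': (q3 - q1) / 2,  # semi-interquartile range
        'mean': mean(x),
        'stdev': stdev(x),      # sample standard deviation (n - 1 denominator)
    }

bar_stats = distribution_sketch([0, 1, 2])
```

For `[0, 1, 2]` the quartiles are 0.5 and 1.5, so the sIQR is 0.5, which agrees with the table above.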


[Back to Functions](#Functions)

### monthlit
[Back to Top](#Table-of-Contents)

[Back to Functions](#Functions)

Here are a few examples of how _monthlit_ works.

In [7]:
dates = pd.DataFrame(data={'dates':['2018-12-9','2019-8-5', '2017/7/4',np.nan,None]})
dates.dates = pd.to_datetime(dates.dates)
dates

Unnamed: 0,dates
0,2018-12-09
1,2019-08-05
2,2017-07-04
3,NaT
4,NaT


In [8]:
np.isnan(np.nan)

True

In [9]:
monthlit(dates.dates.dt.month[0])

'Dec'

In [10]:
dates.dates.dt.month.apply(monthlit)

0    Dec
1    Aug
2    Jul
3    NaN
4    NaN
Name: dates, dtype: object
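
The implementation of _monthlit_ lives in its own module and is not shown here. The sketch below illustrates one way a helper with the same observable behavior could be written; the name `month_abbreviation_sketch` and the handling of missing values are assumptions based only on the outputs above.

```python
import math

MONTH_ABBREVIATIONS = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                       'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')

def month_abbreviation_sketch(month_number):
    """Map a month number (1-12) to a three-letter abbreviation; missing values stay missing."""
    # Missing dates arrive as None or NaN; return NaN so they remain missing
    if month_number is None or (isinstance(month_number, float) and math.isnan(month_number)):
        return float('nan')
    # .dt.month yields floats (e.g. 12.0) when the column contains NaT, so cast to int
    return MONTH_ABBREVIATIONS[int(month_number) - 1]

labels = [month_abbreviation_sketch(m) for m in (12.0, 8.0, 7.0, float('nan'))]
```

The explicit NaN check matters because a column containing `NaT` yields float month numbers, and indexing with NaN would otherwise raise an error.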

[Back to Functions](#Functions)

### description
[Back to Top](#Table-of-Contents)

[Back to Functions](#Functions)

In [11]:
def description(x, variable, percentage=False):
    """Describe x[variable], optionally as percentages, and add the semi-interquartile range."""
    res = x[variable].describe()
    if percentage:
        # Convert proportions to percentages (count is left unchanged)
        stats_cols = ['mean','std','min','25%','50%','75%','max']
        res[stats_cols] = res[stats_cols] * 100
    # To be done: add a CI calculation for the mean to this function
    res['siqr'] = (res['75%'] - res['25%']) / 2
    res['meanCI'] = 'not calculated'
    return res

### vocab_run
[Back to Top](#Table-of-Contents)

[Back to Functions](#Functions)

*vocab_run* takes a list of strings and joins its elements with one separator, placing a different separator between the penultimate and final elements, and returns the result as a string.
- `x`: a list of strings to be concatenated
- `connector_dict`: a dictionary whose keys describe the size of the list and whose values give the connectors that separate the list elements
    
Here are a few examples of how *vocab_run* works.

In [12]:
print("Could you bring some {} please?".format(vocab_run(['foo','bar','stuffkins'])))

Could you bring some foo, bar and stuffkins please?


In [13]:
print("You can either have {}.  You'll have to make a choice."\
      .format(vocab_run(['foo','bar','stuffkins'],connector_dict={1: None, 2: ' or ', 'run': ', '})))

You can either have foo, bar or stuffkins.  You'll have to make a choice.
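
The joining rule described above can be sketched in a few lines. This is an illustration rather than the code of *vocab_run* itself: the function name `join_with_connectors` is hypothetical, and the default connectors are inferred from the first example's output.

```python
def join_with_connectors(items, connector_dict=None):
    """Join strings, using one separator within the run and another before the last item."""
    if connector_dict is None:
        # Assumed default, inferred from the output of the first example
        connector_dict = {1: None, 2: ' and ', 'run': ', '}
    if not items:
        return ''
    if len(items) == 1:
        return items[0]
    # All but the last item are joined by the 'run' separator
    head = connector_dict['run'].join(items[:-1])
    # The key 2 holds the separator placed before the final item
    return head + connector_dict[2] + items[-1]

either_or = join_with_connectors(['foo', 'bar', 'stuffkins'],
                                 connector_dict={1: None, 2: ' or ', 'run': ', '})
```

With only two items the 'run' separator is never used, so `['Oct', 'Dec']` with the `' or '` connector becomes `Oct or Dec`.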


[Back to Functions](#Functions)

## Get Data
[Top](#Table-of-Contents)

The locations from which we get data and to which we export it were set in _device_ above. Here we move to the source folder and display all files with the prefix _'cleaned CC data 2000-2017'_. The file names are saved in a variable, _mysourcefiles_.

In [14]:
os.chdir(device['source'])
mysourcefiles = glob.glob('cleaned CC data 2000-2017_*.csv')
mysourcefiles

['cleaned CC data 2000-2017_2019-01-31 01hrs43min.csv']

In [15]:
pd.to_datetime(mysourcefiles[0].split("_")[1].split(".csv")[0].split(' ')[0])

Timestamp('2019-01-31 00:00:00')

Automate getting the latest file

In [16]:
[latestFile for latestFile in mysourcefiles if \
 max({pd.to_datetime(afile.split("_")[1].split(".csv")[0].split(' ')[0]) \
      for afile in mysourcefiles}) == pd.to_datetime(latestFile.split("_")\
                                                     [1].split(".csv")[0].split(' ')[0])]

['cleaned CC data 2000-2017_2019-01-31 01hrs43min.csv']

In [17]:
min({afile.split(' ')[-1].replace('hrs','').replace('min.csv','') for afile in mysourcefiles})

'0143'

In [18]:
latest = mysourcefiles[-1]
latest

'cleaned CC data 2000-2017_2019-01-31 01hrs43min.csv'
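
Taking `mysourcefiles[-1]` relies on the order in which `glob` returns names, which is not guaranteed to be chronological. A minimal sketch of selecting the newest file by its embedded date and time follows. It assumes every file name ends in `_YYYY-MM-DD HHhrsMMmin.csv`; the function name `file_timestamp` is hypothetical, and the second and third file names are made up for illustration.

```python
import os
from datetime import datetime

def file_timestamp(filename):
    """Parse the 'YYYY-MM-DD HHhrsMMmin' stamp that follows the last underscore."""
    stem = os.path.splitext(filename)[0]
    stamp = stem.rsplit('_', 1)[1]
    return datetime.strptime(stamp, '%Y-%m-%d %Hhrs%Mmin')

# Only the first name comes from the listing above; the others are invented
candidate_files = [
    'cleaned CC data 2000-2017_2019-01-31 01hrs43min.csv',
    'cleaned CC data 2000-2017_2019-02-02 00hrs05min.csv',
    'cleaned CC data 2000-2017_2019-01-31 13hrs02min.csv',
]
latest_by_stamp = max(candidate_files, key=file_timestamp)
```

Parsing the date and time together avoids comparing the date and the clock time in separate steps.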

The most recent file is the one we will use as _df_ in our descriptive analysis.

In [19]:
df=pd.read_csv(latest)
print('orig: ',df.liznumber.nunique())
df = df.loc[df.svl<56]
print('adult: ',df.liznumber.nunique())
df.date=pd.to_datetime(df.date)
df['month']=df.date.dt.month.apply(monthlit)
df.head()

orig:  1625
adult:  872


Unnamed: 0,species,toes_orig,sex,date,svl,tl,rtl,autotomized,mass,location,meters,newRecap,painted,sighting,paint.mark,vial,misc,rtl_orig,toes,toe_pattern,year,tl_svl,mass_svl,initialCaptureDate,year_diff,svl_diff,liznumber,sex_count,daysSinceCapture,capture,month
3,j,3-6-11-17,m,2010-08-18,50.0,68.0,0.0,False,4.0,1m vT at top R island,157,N,yes,,y<c.t,,Bss; lost toes,0.0,3-6-11-17,,2010,1.36,0.08,2010-08-18,0,0.0,936,1,0,1,Aug
6,j,3-6-15-17,m,2010-08-18,52.0,70.0,0.0,True,4.5,r outcrop ^ oak R,425,N,yes,,y68c,67-10-cc,Tss,-1.0,3-6-15-17,,2010,1.346154,0.086538,2010-08-18,0,0.0,949,2,0,1,Aug
8,j,3-6-15-18,f,2010-08-18,53.0,72.0,0.0,False,5.0,pine R,408,N,yes,,y69c,68-10-cc,,0.0,3-6-15-18,,2010,1.358491,0.09434,2010-08-18,0,0.0,950,1,0,1,Aug
10,j,2-6-13-16,m,2010-08-19,55.0,74.0,0.0,False,5.0,3m ^ 1 falls,3,N,yes,,y71c,70-10-cc,,0.0,2-6-13-16,,2010,1.345455,0.090909,2010-08-19,0,0.0,896,1,0,1,Aug
11,j,2-6-14-17,m,2010-08-19,47.0,64.0,0.0,False,3.5,3m ^ 1 falls,3,N,yes,,y÷c.t,71-10-cc,possible Bss; mark is a yellow division symbol...,0.0,2-6-14-17,,2010,1.361702,0.074468,2010-08-19,0,0.0,899,1,0,1,Aug


## Analyze Data
[Top](#Table-of-Contents)

We will first examine the range and distribution of a number of variables in our data set:
- [Population Size](#Population-Size)
- [Species Distribution](#Species-Distribution)
- [Sex Distribution](#Sex-Distribution)
- [Tail Condition Distribution](#Tail-Condition-Distribution)
- [Severity of Autotomy](#Severity-of-Autotomy)
- [Location](#Location)
- [Morphometrics](#Morphometrics):
    - [SVL](#SVL)
    - [TL](#TL)
    - [RTL](#RTL)
    - [Mass](#Mass)
- [Survival and Rates and Likelihood of Recapture](#Survival-and-Rates-and-Likelihood-of-Recapture)
- [Captures](#Captures)
- [Growth](#Growth)
    - [SVL Growth](#SVL-Growth)
    - [TL Growth](#TL-Growth)
    - [RTL Growth](#RTL-Growth)
    - [NTL Growth](#NTL-Growth)
    - [Mass Growth](#Mass-Growth)
- [Correlations to Population](#Correlations-to-Population)

We will use first captures of each lizard in a year for these analyses.

## Reducing the analysis sample by date range and capture

In [20]:
df.loc[~(df.species=='j')].species.unique()

array(['v', 'other', 'sj?'], dtype=object)

In [21]:
df.loc[~(df.sex.isin(['m','f']))].liznumber.nunique()

24

In [22]:
df.loc[~(df.sex.isin(['m','f']))].sex.unique()

array([nan, 'n', '[m]'], dtype=object)

In [23]:
sexdict= {'male':'m', 'f`':'f','n':'m','[m]':'m'}
df.loc[df.sex.isin(sexdict.keys()),'sex'] = df.loc[df.sex.isin(sexdict.keys()),'sex'].apply(lambda x: sexdict[x])
df.sex.unique()

array(['m', 'f', nan], dtype=object)

In [24]:
df.loc[~(df.sex.isin(['m','f']))].liznumber.nunique()

21

In [25]:
monthsExcluded = ['Oct','Dec']
idx_exclusion = (df.month.isin(monthsExcluded))&(df.capture==1)&(df.species!='j')
print("The number of individuals captured for the first time in {} is {}. \
These are excluded for further analyses.".format(vocab_run(monthsExcluded,{1: None, 2: ' or ', 'run': ', '}),
                                                df.loc[idx_exclusion].liznumber.nunique()))
df=df.loc[~idx_exclusion]
df=df.loc[(df.species=='j')& (df.sex.isin(['m','f']))]

The number of individuals captured for the first time in Oct or Dec is 1. These are excluded for further analyses.


Here we create datasets including only the first or last sighting in each year for a given animal.

In [26]:
df_lastInYear = df.loc[~df.duplicated(subset=['liznumber','year'],keep='last')]
df_firstInYear = df.loc[~df.duplicated(subset=['liznumber','year'])]
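
The logic of the two `duplicated` filters can be illustrated without pandas. The sketch below assumes, as the cell above implicitly does, that rows are already ordered by date; the capture records are made up.

```python
# Made-up capture records, ordered by date
captures = [
    {'liznumber': 936, 'year': 2010, 'date': '2010-06-02'},
    {'liznumber': 936, 'year': 2010, 'date': '2010-08-18'},
    {'liznumber': 949, 'year': 2010, 'date': '2010-08-18'},
    {'liznumber': 936, 'year': 2011, 'date': '2011-07-01'},
]

first_in_year = {}
last_in_year = {}
for record in captures:
    key = (record['liznumber'], record['year'])
    # setdefault keeps the earliest record seen for each animal-year
    first_in_year.setdefault(key, record)
    # plain assignment overwrites, so the latest record wins
    last_in_year[key] = record
```

If the rows were not in date order, "first" and "last" would refer to row position rather than capture date, so sorting by date beforehand is the safer choice.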

### Population Size

We will begin by examining the range and distribution of individuals in the population.

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

To be done: add a column of lizard counts weighted by the person-hours for the year, or provide an argument, based on asymptote analysis, that we captured all lizards at the site.

In [27]:
populationSize = df_firstInYear.groupby(['year']).liznumber.nunique().reset_index()
populationSize['percChange'] = -(1-populationSize.liznumber/populationSize.liznumber.shift())
populationSize

Unnamed: 0,year,liznumber,percChange
0,2000,38,
1,2001,50,0.315789
2,2002,29,-0.42
3,2003,38,0.310345
4,2004,16,-0.578947
5,2005,20,0.25
6,2006,2,-0.9
7,2007,43,20.5
8,2008,21,-0.511628
9,2009,51,1.428571
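
The expression `-(1 - n/n.shift())` is equivalent to `n/n.shift() - 1`, the proportional change from the previous year, so a value of 20.5 means a 2050% increase. The sketch below reproduces the column from the counts in the rows shown above; pandas' `Series.pct_change()` should give the same quantity.

```python
# Yearly counts copied from the rows shown above (2000-2009)
counts = [38, 50, 29, 38, 16, 20, 2, 43, 21, 51]

# Proportional change relative to the previous year; the first year has no predecessor
perc_change = [None] + [
    current / previous - 1
    for previous, current in zip(counts, counts[1:])
]
```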


For some reason the first trace disappears after I add the second one. This needs to be fixed.

In [28]:
popSize = go.Scatter(x = populationSize.year
           , y = populationSize.liznumber
          ,name = 'population size')
percentChange = go.Scatter(x = populationSize.year
           , yaxis = 'y2', y = populationSize.percChange
          ,name = 'percent change')
data = [popSize]#,percentChange]
layout = go.Layout(
    title = 'Population Size for Sceloporus jarrovii 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18),
        range = [1999.5,2017.5]),
    yaxis = dict(
        title = 'Number of Lizards',
        titlefont = dict(
            size = 18),
        rangemode = 'tozero'
        ),
    yaxis2 = dict(
        title = 'Percent Change',
        titlefont = dict(
            size = 18),
        rangemode = 'tozero',
        side = 'right'
        )
)
fig = go.Figure(
        data = data,
        layout = layout)
#plot(fig, filename = 'Population Size for Sceloporus jarrovii.html')
iplot(fig, filename = 'Population Size for Sceloporus jarrovii.html')
# pio.to_image(fig, format='html')
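
One possible cause of the disappearing trace, offered as an assumption rather than a confirmed diagnosis, is that `yaxis2` is not declared as overlaying the first y-axis. The sketch below writes the relevant layout fragment as a plain dictionary, which plotly accepts in place of `go.Layout`; the name `dual_axis_layout` and the axis title and tick format for the second axis are illustrative choices.

```python
# Layout fragment as a plain dict; only the y-axis settings are shown
dual_axis_layout = {
    'yaxis': {'title': 'Number of Lizards', 'rangemode': 'tozero'},
    'yaxis2': {
        'title': 'Percent Change',
        'tickformat': '.0%',
        'side': 'right',
        'overlaying': 'y',  # draw the second axis on top of the first one
    },
}
# Usage (not run here):
# go.Figure(data=[popSize, percentChange], layout=dual_axis_layout)
```

The `rangemode = 'tozero'` setting is left off the second axis in this sketch because the percent change takes negative values.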

We see a large decline in the <i>S. jarrovii</i> population size.

In [29]:
description(populationSize,'liznumber')

count                 16
mean             24.8125
std              18.1153
min                    1
25%                 7.25
50%                   25
75%                38.75
max                   51
siqr               15.75
meanCI    not calculated
Name: liznumber, dtype: object

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

### Sex Distribution

Next we examine the range and distribution of _sex_ values.

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

Maybe asymptote analysis should be extended to examine asymptotes by demographic factors as well.

In [30]:
populationSize_sex = df_firstInYear.groupby(['year','sex']).liznumber.nunique().reset_index()\
.merge(df_firstInYear.groupby(['year']).liznumber.nunique().reset_index()\
       .rename(columns={'liznumber':'liznumberYear'}),
       on=['year'])
populationSize_sex\
.loc[populationSize_sex.sex=='m','propMale'] = populationSize_sex\
.loc[populationSize_sex.sex=='m'].liznumber/populationSize_sex\
.loc[populationSize_sex.sex=='m'].liznumberYear
populationSize_sex\
.loc[populationSize_sex.sex=='f','propFemale'] = (populationSize_sex\
.loc[populationSize_sex.sex=='f'].liznumber/populationSize_sex\
.loc[populationSize_sex.sex=='f'].liznumberYear)
populationSize_sex.to_csv(device['output']+'population size 56 mm plus.csv',index = False)
populationSize_sex

Unnamed: 0,year,sex,liznumber,liznumberYear,propMale,propFemale
0,2000,f,26,38,,0.684211
1,2000,m,12,38,0.315789,
2,2001,f,30,50,,0.6
3,2001,m,20,50,0.4,
4,2002,f,22,29,,0.758621
5,2002,m,7,29,0.241379,
6,2003,f,22,38,,0.578947
7,2003,m,16,38,0.421053,
8,2004,f,9,16,,0.5625
9,2004,m,7,16,0.4375,
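
Each proportion in the table is the count for one sex divided by the yearly total. A small sketch using the first three years from the rows above makes the arithmetic explicit; the dictionary layout is only for illustration.

```python
# Counts of unique lizards by year and sex, copied from the rows above
counts_by_year = {
    2000: {'f': 26, 'm': 12},
    2001: {'f': 30, 'm': 20},
    2002: {'f': 22, 'm': 7},
}

# Proportion of males = male count / (female count + male count)
prop_male = {
    year: sexes['m'] / (sexes['f'] + sexes['m'])
    for year, sexes in counts_by_year.items()
}
```

Because only two sexes remain after filtering, the proportion of females is one minus the proportion of males in every year.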


In [31]:
Sjarrovii = go.Scatter(x = populationSize_sex.loc[(populationSize_sex.propMale.notna())].year
           , y = populationSize_sex.loc[(populationSize_sex.propMale.notna())].propMale
          ,name = '<i>S. jarrovii</i>')
data = [Sjarrovii]
layout = go.Layout(
    title = 'Proportion of Males by Year for <i>Sceloporus jarrovii</i> 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18),
        range = [1999.5,2017.5]),
    yaxis = dict(
        tickformat = ".2%",
        title = 'Percentage of Males',
        titlefont = dict(
            size = 18),
    range = [0,1])
    
)
fig = go.Figure(
        data = data,
        layout = layout)
# plot(fig, filename = 'Proportion of Males by Species and Year for Sceloporus jarrovii 2000-2017.html')
iplot(fig, filename = 'Proportion of Males by Species and Year for Sceloporus jarrovii 2000-2017.html')

In [32]:
description(populationSize_sex,'propMale',True)[['mean','std','min','25%','50%','75%','max','siqr']]

mean    47.2748
std     12.6559
min     24.1379
25%          40
50%      44.186
75%     57.1429
max     66.6667
siqr    8.57143
Name: propMale, dtype: object

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

To be Done

There is not much variation in the sex ratio across years (median proportion of males 44.2%, sIQR 8.6 percentage points), suggesting that the population drops do not differentially impact males or females overall. We should confirm that there is no difference within particular age/size groups. Such a difference might be explained by developmental or social changes which leave certain groups more vulnerable than others.

### Tail Condition Distribution

Here we look at the proportion of individuals in our data who have experienced autotomy.

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

In [33]:
populationSize_aut = df_firstInYear.groupby(['year','autotomized']).liznumber.nunique()\
.reset_index()\
.merge(df_firstInYear.groupby(['year']).liznumber.nunique().reset_index()\
       .rename(columns={'liznumber':'liznumberYear'})
       ,on=['year'])
populationSize_aut\
.loc[populationSize_aut.autotomized,'propAutotomized'] = populationSize_aut\
.loc[populationSize_aut.autotomized].liznumber/populationSize_aut\
.loc[populationSize_aut.autotomized].liznumberYear
populationSize_aut\
.loc[~populationSize_aut.autotomized,'propIntact'] = (populationSize_aut\
.loc[~populationSize_aut.autotomized].liznumber/populationSize_aut\
.loc[~populationSize_aut.autotomized].liznumberYear)
populationSize_aut

Unnamed: 0,year,autotomized,liznumber,liznumberYear,propAutotomized,propIntact
0,2000,False,35,38,,0.921053
1,2000,True,3,38,0.078947,
2,2001,False,46,50,,0.92
3,2001,True,4,50,0.08,
4,2002,False,24,29,,0.827586
5,2002,True,5,29,0.172414,
6,2003,False,35,38,,0.921053
7,2003,True,3,38,0.078947,
8,2004,False,11,16,,0.6875
9,2004,True,5,16,0.3125,


In [34]:
Sjarrovii = go.Scatter(x = populationSize_aut.loc[(populationSize_aut.propAutotomized.notna())].year
           , y = populationSize_aut.loc[(populationSize_aut.propAutotomized.notna())].propAutotomized
          ,name = 'Autotomized <i>S. jarrovii</i>')



data = [Sjarrovii]
layout = go.Layout(
    title = 'Proportion Autotomized by Year for <i>Sceloporus jarrovii</i> 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18),
        range = [1999.5,2017.5]),
    yaxis = dict(
        tickformat=".2%",
        title = 'Proportion of Autotomized Lizards',
        titlefont = dict(
            size = 18),
    range=[0,1])
)
fig = go.Figure(
        data = data,
        layout = layout)
iplot(fig, filename = 'Proportion Autotomized by Year for Sceloporus jarrovii 2000-2017.html')
# plot(fig, filename = 'Proportion Autotomized by Year for Sceloporus jarrovii 2000-2017.html')

In [35]:
description(populationSize_aut, 'propAutotomized',True)[['mean','std','min','25%','50%','75%','max','siqr']]

mean     12.835
std     7.75402
min     4.87805
25%     7.89474
50%     10.5335
75%     15.0246
max       31.25
siqr    3.56495
Name: propAutotomized, dtype: object

The proportion of _S. jarrovii_ with evidence of autotomy also oscillates over the years.  
To be done:
- does this track with oscillations in the population?

### Severity of Autotomy
Here we regress TL on SVL, separately for each sex, in intact _S. jarrovii_ to estimate what the TL would have been had an animal not autotomized.

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

In [36]:
# Copy so that new columns can be added without a SettingWithCopyWarning
dfPred = df_firstInYear[['liznumber','svl','sex','tl','autotomized','year']].copy()

In [37]:
dfPred['tl_svl'] = dfPred.tl/dfPred.svl

In [38]:
intactRatio = go.Box(y=dfPred.loc[dfPred.autotomized==False].tl_svl, name = 'Intact')
autotomizedRatio = go.Box(y=dfPred.loc[dfPred.autotomized].tl_svl, name = 'Autotomized')
data = [autotomizedRatio, intactRatio]
layout = go.Layout(
    title = 'Box Plot of TL/SVL Ratio by Autotomy Status for <i>Sceloporus jarrovii</i> CC 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Autotomy Status',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'TL/SVL Ratio',
        titlefont = dict(
            size = 18))
)

fig = go.Figure(
        data = data,
        layout = layout)

iplot(fig, filename = 'Box Plot of TL-SVL Ratio by Autotomy Status for Sceloporus jarrovii CC 2000-2017.html')
# plot(fig, filename = 'Box Plot of TL-SVL Ratio by Autotomy Status for Sceloporus jarrovii CC 2000-2017.html')

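We next flag intact _Sceloporus jarrovii_ with an extreme tl/svl ratio. To do this, we create a boolean column, _extremeRatio_, which is **True** for individuals whose tl/svl ratio falls outside the chosen percentile bounds and **False** for intact individuals within them. The intent is to exclude ratios outside the center 95%; note that the cell below uses the 2.5th percentile as the lower bound but the 92.5th percentile (`.925`), rather than the 97.5th, as the upper bound. We only use individuals without extreme ratios to model tl from svl.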

In [39]:
print(dfPred.loc[dfPred.autotomized==False].tl_svl.describe())
dfPred.loc[(dfPred.autotomized==False)&
                   (dfPred.tl_svl>=dfPred.tl_svl.quantile(.025))&
                   (dfPred.tl_svl<=dfPred.tl_svl.quantile(.925)),'extremeRatio']= False
dfPred.loc[(dfPred.autotomized==False)&
                   (dfPred.tl_svl<dfPred.tl_svl.quantile(.025))|
                   (dfPred.tl_svl>dfPred.tl_svl.quantile(.925)),'extremeRatio']= True
dfPred.loc[dfPred.extremeRatio.isna()].autotomized.value_counts(dropna=False)


count    354.000000
mean       1.352573
std        0.347068
min        0.342857
25%        1.294913
50%        1.340909
75%        1.380714
max        7.615385
Name: tl_svl, dtype: float64


True    42
Name: autotomized, dtype: int64
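
In the cell above, `&` binds more tightly than `|`, so the second condition is evaluated as `(intact & below lower bound) | (above upper bound)`. The sketch below shows the flagging rule with explicit grouping in plain Python. It assumes that the intended bounds are the 2.5th and 97.5th percentiles of all ratios and that autotomized animals should remain unclassified; the function names and the data are made up for illustration.

```python
def linear_quantile(values, q):
    """Quantile with linear interpolation between order statistics (the pandas default)."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    low = int(position)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (position - low)

def flag_extreme(ratios, autotomized, lower_q=0.025, upper_q=0.975):
    """Return True/False for intact animals and None for autotomized ones."""
    lower = linear_quantile(ratios, lower_q)
    upper = linear_quantile(ratios, upper_q)
    flags = []
    for ratio, lost_tail in zip(ratios, autotomized):
        if lost_tail:
            flags.append(None)  # autotomized animals are never classified
        else:
            # Both tail conditions apply only to intact animals
            flags.append((ratio < lower) or (ratio > upper))
    return flags

# Made-up tl/svl ratios; the last animal is autotomized
ratios = [1.30, 1.32, 1.34, 1.35, 1.36, 1.38, 7.60, 0.40]
autotomized = [False] * 7 + [True]
flags = flag_extreme(ratios, autotomized)
```

In pandas the equivalent grouping would wrap the two tail conditions in one pair of parentheses before combining them with the intact condition.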

In [40]:
sjM = pg.linear_regression(dfPred.loc[(dfPred.extremeRatio==False)&(dfPred.sex=='m')].svl,
                     dfPred.loc[(dfPred.extremeRatio==False)&(dfPred.sex=='m')].tl,
                     remove_na=True)
sjM

Unnamed: 0,names,coef,se,T,pval,r2,adj_r2,CI[2.5%],CI[97.5%]
0,Intercept,-7.467649,1.114087,-6.702931,4.891518e-10,0.957691,0.957382,-9.67068,-5.264618
1,svl,1.49701,0.026882,55.687177,5.6606380000000005e-96,0.957691,0.957382,1.443851,1.550168


In [41]:
sjF = pg.linear_regression(dfPred.loc[(dfPred.extremeRatio==False)&(dfPred.sex=='f')].svl,
                     dfPred.loc[(dfPred.extremeRatio==False)&(dfPred.sex=='f')].tl,
                     remove_na=True)
sjF

Unnamed: 0,names,coef,se,T,pval,r2,adj_r2,CI[2.5%],CI[97.5%]
0,Intercept,-5.293225,0.936443,-5.652482,5.981374e-08,0.961044,0.960832,-7.140838,-3.445613
1,svl,1.467836,0.021846,67.191265,6.542026999999999e-131,0.961044,0.960832,1.424734,1.510937
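
For a single predictor, the intercept and slope reported in these tables come from ordinary least squares. The following sketch shows that calculation in plain Python; it is not how `pingouin` is implemented internally, the function name `fit_line` is hypothetical, and the SVL and TL values are made up so that they fall exactly on a line.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = sum of cross-deviations / sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up SVL and TL values (mm) that lie exactly on TL = 1.5 * SVL - 7
svl = [40.0, 45.0, 50.0, 55.0]
tl = [53.0, 60.5, 68.0, 75.5]
intercept, slope = fit_line(svl, tl)
```

The slope is the expected increase in TL (mm) per additional millimeter of SVL, which is roughly 1.5 in both fitted models above.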


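Within each sex, SVL explains about 96% of the variance in TL (r² = 0.958 for males and 0.961 for females), so SVL and sex are sufficient to predict TL.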

In [42]:
# Extract the fitted intercept and slope for each sex
femaleIntercept = sjF.loc[sjF.names=='Intercept','coef'].iloc[0]
maleIntercept = sjM.loc[sjM.names=='Intercept','coef'].iloc[0]
femaleSvl = sjF.loc[sjF.names=='svl','coef'].iloc[0]
maleSvl = sjM.loc[sjM.names=='svl','coef'].iloc[0]

print(femaleIntercept,maleIntercept,femaleSvl,maleSvl)

-5.293225121160946 -7.4676489722314185 1.4678356397402224 1.4970097127269817


In [43]:
dfPred.loc[dfPred.sex=='f','tlPred'] = dfPred.loc[dfPred.sex=='f'].svl*femaleSvl+femaleIntercept
dfPred.loc[dfPred.sex=='m','tlPred'] = dfPred.loc[dfPred.sex=='m'].svl*maleSvl+maleIntercept
dfPred['diff'] = dfPred.tlPred - dfPred.tl
dfPred['propDiff'] = dfPred['diff']/dfPred.tlPred
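
To make the three derived columns concrete, the sketch below applies the female coefficients printed above to a single animal. The function name `tail_loss` is hypothetical; the intact example uses the SVL and TL of a female from the data preview, and the autotomized example is invented.

```python
# Coefficients printed above for the female model
female_intercept = -5.293225121160946
female_slope = 1.4678356397402224

def tail_loss(svl, tl, intercept, slope):
    """Return (predicted TL, shortfall in mm, shortfall as a proportion of predicted TL)."""
    tl_pred = svl * slope + intercept
    diff = tl_pred - tl
    return tl_pred, diff, diff / tl_pred

# Intact example: SVL 53 mm, TL 72 mm (a female row from the data preview)
intact = tail_loss(53.0, 72.0, female_intercept, female_slope)
# Hypothetical autotomized female of the same SVL with a 40 mm tail
autotomized = tail_loss(53.0, 40.0, female_intercept, female_slope)
```

A _propDiff_ near zero means the observed tail matches the prediction for an intact animal, while larger positive values indicate a greater proportion of the tail missing.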

In [44]:
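# Bracket access is used because .diff on a groupby can resolve to the diff() method rather than the column
dfPred.groupby(['autotomized','sex'])['diff'].describe()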

Unnamed: 0_level_0,Unnamed: 1_level_0,count,mean,std,min,25%,50%,75%,max
autotomized,sex,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
False,f,202.0,-0.84652,6.538281,-85.211362,-1.983306,-0.562265,0.883652,11.566393
False,m,152.0,-0.376809,4.102237,-19.58128,-2.057358,-0.132115,0.91872,32.927691
True,f,19.0,16.37434,12.094129,-3.837114,11.548858,14.741844,18.58979,44.291543
True,m,24.0,21.562435,12.242207,0.376856,13.944138,20.406759,26.420215,51.394798


In [45]:
male = go.Scatter(x=dfPred.loc[(dfPred.extremeRatio==False)&(dfPred.sex=='m')].sort_values('svl')['svl'],
          y=dfPred.loc[(dfPred.extremeRatio==False)&(dfPred.sex=='m')].sort_values('svl')['tl'],
                  name = 'Male',mode='markers',marker=dict(size=5,opacity=0.8))
female = go.Scatter(x=dfPred.loc[(dfPred.extremeRatio==False)&(dfPred.sex=='f')].sort_values('svl')['svl'],
          y=dfPred.loc[(dfPred.extremeRatio==False)&(dfPred.sex=='f')].sort_values('svl')['tl'],
                    name = 'Female',mode='markers',marker=dict(size=5,opacity=0.8))
All = go.Scatter(x=dfPred.loc[(dfPred.extremeRatio==False)].sort_values('svl')['svl'],
          y=dfPred.loc[(dfPred.extremeRatio==False)].sort_values('svl')['tl'],
                    name = 'All',mode='markers',marker=dict(size=20,opacity=0.5,line=dict(width=2)))
maleOut = go.Scatter(x=dfPred.loc[(dfPred.extremeRatio!=False)&(dfPred.sex=='m')].sort_values('svl')['svl'],
          y=dfPred.loc[(dfPred.extremeRatio!=False)&(dfPred.sex=='m')].sort_values('svl')['tl'],
                  name = 'Male Extreme',mode='markers',marker=dict(size=5,opacity=0.8))
femaleOut = go.Scatter(x=dfPred.loc[(dfPred.extremeRatio!=False)&(dfPred.sex=='f')].sort_values('svl')['svl'],
          y=dfPred.loc[(dfPred.extremeRatio!=False)&(dfPred.sex=='f')].sort_values('svl')['tl'],
                    name = 'Female Extreme',mode='markers',marker=dict(size=5,opacity=0.8))
AllOut = go.Scatter(x=dfPred.loc[(dfPred.extremeRatio!=False)].sort_values('svl')['svl'],
          y=dfPred.loc[(dfPred.extremeRatio!=False)].sort_values('svl')['tl'],
                    name = 'All Extreme',mode='markers',marker=dict(size=20,opacity=0.5,line=dict(width=2)))

data = [All,male,female,AllOut,maleOut,femaleOut]
layout = go.Layout(
    title = 'Scatter Plot of SVL vs TL for Intact <i>Sceloporus jarrovii</i> CC 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        tickangle=45,
        dtick = 1,
        title = 'SVL (mm)',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'TL (mm)',
#         tickformat = ".0%",
        titlefont = dict(
            size = 18))
)

fig = go.Figure(
        data = data,
        layout = layout)
# fig.update_xaxes(range=[0,dfPred.loc[(dfPred.autotomized==False)]['count'].describe()['max']+1])
# fig.update_yaxes(range=[0,1])

iplot(fig, filename = 'Scatter Plot of SVL vs TL for Intact Sceloporus jarrovii CC 2000-2017.html')
# plot(fig, filename = 'Scatter Plot of SVL vs TL for Intact Sceloporus jarrovii CC 2000-2017.html')

In [46]:
dfPred.loc[((dfPred.autotomized==False)&(dfPred.extremeRatio==False))|
          (dfPred.autotomized==True)].groupby(['autotomized','sex']).propDiff.describe()

Unnamed: 0_level_0,Unnamed: 1_level_0,count,mean,std,min,25%,50%,75%,max
autotomized,sex,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
False,f,185.0,0.00054,0.042678,-0.084099,-0.030019,-0.007453,0.024059,0.215479
False,m,139.0,0.00011,0.043088,-0.087522,-0.025787,-0.001706,0.021457,0.198774
True,f,19.0,0.330552,0.254929,-0.058885,0.210518,0.304676,0.372659,0.832044
True,m,24.0,0.436977,0.214088,0.005355,0.332212,0.421,0.543241,0.849744


In [47]:
dfPred.loc[((dfPred.autotomized==False)&(dfPred.extremeRatio==False))|
          (dfPred.autotomized==True)].groupby(['year','autotomized'])\
.propDiff.describe()[['count','50%']].reset_index()

Unnamed: 0,year,autotomized,count,50%
0,2000,False,31.0,0.010799
1,2000,True,3.0,0.304676
2,2001,False,44.0,-0.001731
3,2001,True,4.0,0.377471
4,2002,False,24.0,-0.000407
5,2002,True,5.0,0.357981
6,2003,False,32.0,-0.010459
7,2003,True,3.0,0.409342
8,2004,False,10.0,0.007264
9,2004,True,5.0,0.431209


In [48]:
dfYearlyEst = dfPred.loc[((dfPred.autotomized==False)&(dfPred.extremeRatio==False))|
          (dfPred.autotomized==True)].groupby(['year','autotomized','sex'])\
.propDiff.describe()[['count','50%']].reset_index(drop=False).merge(dfPred.loc[((dfPred.autotomized==False)&(dfPred.extremeRatio==False))|
          (dfPred.autotomized==True)].groupby(['year','autotomized'])\
.propDiff.describe()[['count','50%']].reset_index().rename(columns = {'count':'count_all',
                                                                      '50%':'50%_all'}),
                                                                    how = 'left',
                                                                    on =['year','autotomized'])
dfYearlyEst.loc[dfYearlyEst.autotomized]

Unnamed: 0,year,autotomized,sex,count,50%,count_all,50%_all
2,2000,True,f,2.0,0.226479,3.0,0.304676
3,2000,True,m,1.0,0.410791,3.0,0.304676
6,2001,True,f,2.0,0.30748,4.0,0.377471
7,2001,True,m,2.0,0.490644,4.0,0.377471
10,2002,True,f,3.0,0.211648,5.0,0.357981
11,2002,True,m,2.0,0.492038,5.0,0.357981
14,2003,True,m,3.0,0.409342,3.0,0.409342
17,2004,True,f,2.0,0.606279,5.0,0.431209
18,2004,True,m,3.0,0.431209,5.0,0.431209
21,2005,True,m,1.0,0.645255,1.0,0.645255


In [49]:
male = go.Scatter(x=dfYearlyEst.loc[(dfYearlyEst.autotomized)&(dfYearlyEst.sex=='m')].sort_values('count_all')['count'],
          y=dfYearlyEst.loc[(dfYearlyEst.autotomized)&(dfYearlyEst.sex=='m')].sort_values('count_all')['50%'],
                  name = 'Male',mode='markers')
female = go.Scatter(x=dfYearlyEst.loc[(dfYearlyEst.autotomized)&(dfYearlyEst.sex=='f')].sort_values('count_all')['count'],
          y=dfYearlyEst.loc[(dfYearlyEst.autotomized)&(dfYearlyEst.sex=='f')].sort_values('count_all')['50%'],
                  name = 'Female',mode='markers')
All = go.Scatter(x=dfYearlyEst.loc[(dfYearlyEst.autotomized)].sort_values('count_all')['count_all'],
          y=dfYearlyEst.loc[(dfYearlyEst.autotomized)].sort_values('count_all')['50%_all'],
                    name = 'All',mode='markers',marker=dict(size=20,opacity=0.5,line=dict(width=2)))
medianAll = go.Scatter(x=[0,dfYearlyEst.loc[(dfYearlyEst.autotomized)]['count_all'].describe()['max']+1],
                      y=[dfYearlyEst.loc[(dfYearlyEst.autotomized)]['50%_all'].median(),
                        dfYearlyEst.loc[(dfYearlyEst.autotomized)]['50%_all'].median()],
                         name='medianAll',mode='lines')
data = [All,male,female,medianAll]
layout = go.Layout(
    title = 'Scatter Plot of Population Size vs Median Proportion of Tail Loss for CC 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Population Size',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Median Proportion of Tail Lost',
        tickformat = ".0%",
        titlefont = dict(
            size = 18)),
    boxmode= 'group')

fig = go.Figure(
        data = data,
        layout = layout)
fig.update_xaxes(range=[0,dfYearlyEst.loc[(dfYearlyEst.autotomized)]['count_all'].describe()['max']+1])
fig.update_yaxes(range=[0,1])

iplot(fig, filename = 'Scatter Plot of Population Size vs Median Proportion of Tail Loss for CC 2000-2017.html')
# plot(fig, filename = 'Scatter Plot of Population Size vs Median Proportion of Tail Loss for CC 2000-2017.html')

In [50]:
male = go.Scatter(x=dfYearlyEst.loc[(dfYearlyEst.autotomized)&(dfYearlyEst.sex=='m')]['year'],
          y=dfYearlyEst.loc[(dfYearlyEst.autotomized)&(dfYearlyEst.sex=='m')]['50%'],
                  name = 'Male',mode='markers')
female = go.Scatter(x=dfYearlyEst.loc[(dfYearlyEst.autotomized)&(dfYearlyEst.sex=='f')]['year'],
          y=dfYearlyEst.loc[(dfYearlyEst.autotomized)&(dfYearlyEst.sex=='f')]['50%'],
                  name = 'Female',mode='markers')
All = go.Scatter(x=dfYearlyEst.loc[(dfYearlyEst.autotomized)]['year'],
          y=dfYearlyEst.loc[(dfYearlyEst.autotomized)]['50%_all'],
                    name = 'All',mode='lines',marker=dict(size=20,opacity=0.5,line=dict(width=2)))
medianAll = go.Scatter(x=[dfYearlyEst.loc[(dfYearlyEst.autotomized)]['year'].describe()['min'],
                          dfYearlyEst.loc[(dfYearlyEst.autotomized)]['year'].describe()['max']],
                      y=[dfYearlyEst.loc[(dfYearlyEst.autotomized)]['50%_all'].median(),
                        dfYearlyEst.loc[(dfYearlyEst.autotomized)]['50%_all'].median()],
                         name='medianAll',mode='lines')
data = [All,male,female,medianAll]
layout = go.Layout(
    title = 'Scatter Plot of Median Proportion of Tail Loss for CC 2000-2017 by Year',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Median Proportion of Tail Lost',
        tickformat = ".0%",
        titlefont = dict(
            size = 18)),
    boxmode= 'group')

fig = go.Figure(
        data = data,
        layout = layout)
# fig.update_xaxes(range=[0,dfYearlyEst.loc[(dfYearlyEst.autotomized)]['count_all'].describe()['max']+1])
# fig.update_yaxes(range=[0,1])

iplot(fig, filename = 'Scatter Plot of Median Proportion of Tail Loss for CC 2000-2017 by Year.html')
# plot(fig, filename = 'Scatter Plot of Median Proportion of Tail Loss for CC 2000-2017 by Year.html')

To be done:

- for each sex and all, regress prop tl on relevant count 
- add these lines to plots with CI
- report on stat (what does this tell us?)
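
The first two to-do items could be approached along the following lines. This is a minimal sketch, assuming an ordinary least-squares fit of the yearly median proportion of tail lost on the yearly count, with a percentile bootstrap for the slope's confidence interval. The six pairs are the `count_all` and `50%_all` values for the years displayed in the table above; the helper names `ols` and `bootstrap_slope_ci` are hypothetical and not part of this notebook. It uses only the standard library so it can be run outside the notebook.

```python
import random

# (count_all, 50%_all) for 2000-2005, copied from the dfYearlyEst rows shown above
pairs = [(3, 0.304676), (4, 0.377471), (5, 0.357981),
         (3, 0.409342), (5, 0.431209), (1, 0.645255)]

def ols(points):
    """Return (slope, intercept) of the least-squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def bootstrap_slope_ci(points, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the slope (resamples years with replacement)."""
    rng = random.Random(seed)
    slopes = []
    while len(slopes) < n_boot:
        sample = [rng.choice(points) for _ in points]
        if len({x for x, _ in sample}) > 1:  # skip resamples with a single x value
            slopes.append(ols(sample)[0])
    slopes.sort()
    lower = slopes[int(n_boot * alpha / 2)]
    upper = slopes[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

slope, intercept = ols(pairs)
ci_low, ci_high = bootstrap_slope_ci(pairs)
print(f"slope = {slope:.4f} per lizard, intercept = {intercept:.4f}")
print(f"95% bootstrap CI for slope: [{ci_low:.4f}, {ci_high:.4f}]")
```

With so few years the interval is likely to be wide, so the fit is better read as a description of the trend than as a test. The same function could be applied to the male and female rows separately.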

# resume
[TOC](#Table-of-Contents)

In [51]:
male = go.Box(y=dfYearlyEst.loc[(dfYearlyEst.autotomized)&(dfYearlyEst.sex=='m')]['50%'],
                  name = 'Male (n = {:.0f})'\
              .format(dfYearlyEst.loc[(dfYearlyEst.autotomized)&(dfYearlyEst.sex=='m')]['count'].sum()))
female = go.Box(y=dfYearlyEst.loc[(dfYearlyEst.autotomized)&(dfYearlyEst.sex=='f')]['50%'],
                    name = 'Female (n = {:.0f})'\
              .format(dfYearlyEst.loc[(dfYearlyEst.autotomized)&(dfYearlyEst.sex=='f')]['count'].sum()))
All = go.Box(y=dfYearlyEst.loc[(dfYearlyEst.autotomized)]['50%'],
                    name = 'All (n = {:.0f})'\
              .format(dfYearlyEst.loc[(dfYearlyEst.autotomized)]['count'].sum()))
data = [All,male,female,]
layout = go.Layout(
    title = 'Boxplot Median Proportion of Tail Loss for CC 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Group',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Median Proportion of Tail Lost',
        tickformat = ".0%",
        titlefont = dict(
            size = 18)))

fig = go.Figure(
        data = data,
        layout = layout)
iplot(fig, filename = 'Boxplot Median Proportion of Tail Loss for CC 2000-2017.html')
# plot(fig, filename = 'Boxplot Median Proportion of Tail Loss for CC 2000-2017.html')

In [52]:
dfPred.loc[(dfPred.autotomized==True)].groupby('sex').propDiff.describe()

sex,count,mean,std,min,25%,50%,75%,max
f,19.0,0.330552,0.254929,-0.058885,0.210518,0.304676,0.372659,0.832044
m,24.0,0.436977,0.214088,0.005355,0.332212,0.421,0.543241,0.849744


In [53]:
female = go.Box(x=dfPred.loc[(dfPred.autotomized==True)&(dfPred.sex=='f')]['year'], 
                y = dfPred.loc[(dfPred.autotomized==True)&(dfPred.sex=='f')].propDiff
                      ,name='Females')
male = go.Box(x=dfPred.loc[(dfPred.autotomized==True)&(dfPred.sex=='m')]['year'], 
              y = dfPred.loc[(dfPred.autotomized==True)&(dfPred.sex=='m')].propDiff
                    ,name='Males')
All = go.Box(x=dfPred.loc[(dfPred.autotomized==True)]['year'], 
              y = dfPred.loc[(dfPred.autotomized==True)].propDiff
                    ,name='All')
data = [All,male,female]
layout = go.Layout(
    title = 'Box Plot of Tail Lost at Capture for <i>Sceloporus jarrovii</i> by Year',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'propDiff',
        tickformat = ".0%",
        titlefont = dict(
            size = 18)),
    boxmode= 'group')

fig = go.Figure(
        data = data,
        layout = layout)
iplot(fig, filename = 'Box Plot of TL/SVL Sceloporus jarrovii by Year.html')
# plot(fig, filename = 'Box Plot of TL/SVL Sceloporus jarrovii by Year.html')

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

### Morphometrics

In this section we describe the distributions of various morphometrics.

- [SVL](#SVL)
- [TL](#TL)
- [RTL](#RTL)
- [Mass](#Mass)

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

#### SVL

Now we examine the range and distribution of SVL values by sex.

[Back to Morphometrics](#Morphometrics)


We will use the [distribution](#distribution) function to do this and then plot these values.

- [Histogram of SVL](#SVLhist)

In [55]:
print("SVL values in the data set range from {} to {} and are distributed across sex \
as displayed here:"\
      .format(df_firstInYear.svl.min(), df_firstInYear.svl.max()))
description(df_firstInYear.groupby('sex'),variable='svl')

SVL values in the data set range from 13.0 to 55.0 and are distributed across sex as displayed here:


sex,count,mean,std,min,25%,50%,75%,max,siqr,meanCI
f,221.0,42.004525,7.938684,13.0,37.0,42.0,48.0,55.0,5.5,not calculated
m,176.0,40.795455,7.053524,24.0,35.0,40.0,46.0,55.0,5.5,not calculated
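
The `siqr` column in these tables is consistent with a semi-interquartile range, that is, half the distance between the 25% and 75% quantiles. This is an inference from the printed values, since the `description` function is defined elsewhere in the notebook. A minimal sketch of that calculation, using the quartiles from the table above (the function name `semi_interquartile_range` is hypothetical):

```python
def semi_interquartile_range(q1, q3):
    """Half the spread between the first and third quartiles."""
    return (q3 - q1) / 2

# 25% and 75% SVL quantiles by sex, copied from the table above
quartiles = {"f": (37.0, 48.0), "m": (35.0, 46.0)}

siqr = {sex: semi_interquartile_range(q1, q3)
        for sex, (q1, q3) in quartiles.items()}
print(siqr)
```

Both values match the `siqr` column reported for the corresponding sex.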


In [56]:
SVLbyYear = description(df_firstInYear.groupby(['year','sex']),'svl').reset_index()
SVLbyYear

Unnamed: 0,year,sex,count,mean,std,min,25%,50%,75%,max,siqr,meanCI
0,2000,f,26.0,45.653846,9.286301,31.0,36.0,50.0,53.0,55.0,8.5,not calculated
1,2000,m,12.0,39.333333,9.393744,31.0,32.75,34.5,42.5,55.0,4.875,not calculated
2,2001,f,30.0,43.866667,5.66741,37.0,40.0,42.0,48.75,55.0,4.375,not calculated
3,2001,m,20.0,43.2,5.307393,36.0,39.0,42.0,45.75,53.0,3.375,not calculated
4,2002,f,22.0,47.409091,10.878809,27.0,50.25,53.0,54.0,55.0,1.875,not calculated
5,2002,m,7.0,31.0,2.828427,27.0,29.0,31.0,33.5,34.0,2.25,not calculated
6,2003,f,22.0,37.090909,9.206153,26.0,30.25,34.0,41.25,55.0,5.5,not calculated
7,2003,m,16.0,37.4375,9.79094,29.0,31.0,33.5,39.25,55.0,4.125,not calculated
8,2004,f,9.0,34.888889,7.9285,30.0,31.0,32.0,34.0,55.0,1.5,not calculated
9,2004,m,7.0,33.571429,3.457222,29.0,31.5,33.0,35.5,39.0,2.0,not calculated


Let's plot these values. 

##### Figures of SVL values

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)


In [57]:
female = go.Box(x=df_firstInYear.loc[(df_firstInYear.sex=='f')]['year'], 
                y = df_firstInYear.loc[(df_firstInYear.sex=='f')].svl
                      ,name='Female Box',yaxis='y1')
male = go.Box(x=df_firstInYear.loc[(df_firstInYear.sex=='m')]['year'], 
              y = df_firstInYear.loc[(df_firstInYear.sex=='m')].svl
                    ,name='Males Box',yaxis='y1')
female_line = go.Scatter(x=SVLbyYear.loc[(SVLbyYear.sex=='f')]['year'], 
                y = SVLbyYear.loc[(SVLbyYear.sex=='f')]['50%']
                      ,name='Female Median', line = dict(color = 'red'),yaxis='y1')
male_line = go.Scatter(x=SVLbyYear.loc[(SVLbyYear.sex=='m')]['year'], 
              y = SVLbyYear.loc[(SVLbyYear.sex=='m')]['50%']
                    ,name='Male Median', line = dict(color = 'blue')
                      ,yaxis='y1')
female_count = go.Scatter(x=SVLbyYear.loc[(SVLbyYear.sex=='f')]['year'], 
                y = SVLbyYear.loc[(SVLbyYear.sex=='f')]['count']
                      ,name='Females Count',line = dict(color = 'red',dash = 'dash'),
                          yaxis = 'y2')  # counts are drawn against the right-hand axis
male_count = go.Scatter(x=SVLbyYear.loc[(SVLbyYear.sex=='m')]['year'], 
              y = SVLbyYear.loc[(SVLbyYear.sex=='m')]['count']
                    ,name='Male Count',line = dict(color = 'blue',dash = 'dash'),
                       yaxis = 'y2')  # counts are drawn against the right-hand axis

data = [male,female,male_line,female_line,male_count,female_count]
layout = go.Layout(
    title = 'Box Plot of SVL at Capture for CC 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Median SVL (mm)',
        titlefont = dict(
            size = 18)),
    yaxis2 = dict(
        title = 'Count of Lizards',
        titlefont = dict(
            size = 18),
        overlaying = 'y',  # share the plot area with the primary y-axis
        side = 'right'),
    boxmode= 'group')

fig = go.Figure(
        data = data,
        layout = layout)
# fig.update_yaxes(title_text="<b>secondary</b> yaxis title", secondary_y=True)
iplot(fig, filename = 'Box Plot of Median SVL Sceloporus jarrovii in CC 2000-2017.html')
plot(fig, filename = 'Box Plot of Median SVL Sceloporus jarrovii in CC 2000-2017.html')

'Box Plot of Median SVL Sceloporus jarrovii in CC 2000-2017.html'

To be done:
- look at correlation between the count and the median svl

Outliers will be addressed in the Cleaning notebook, but will be removed for the remainder of the analyses here.
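
For the correlation listed under "To be done", a rank correlation is one reasonable choice because the yearly medians are few and not obviously linear in the counts. The sketch below computes Spearman's rho for the female rows of `SVLbyYear` displayed above (2000-2004). The helper names are hypothetical, and the code uses only the standard library so it runs outside the notebook. Within the notebook, pandas' `Series.corr(method='spearman')` computes the same statistic.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Female 'count' and '50%' columns for 2000-2004, copied from SVLbyYear above
female_count = [26, 30, 22, 22, 9]
female_median_svl = [50.0, 42.0, 53.0, 34.0, 32.0]

rho = spearman(female_count, female_median_svl)
print(f"Spearman rho (female count vs median SVL): {rho:.3f}")
```

Five years are too few to say much about significance; the full `SVLbyYear` table would be needed for that.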

#### TL

Now we examine the range and distribution of TL values by sex and tail condition.

[Back to Morphometrics](#Morphometrics)


We will use the [distribution](#distribution) function to do this and then plot these values.

- [Histogram of TL](#TLhist)

In [58]:
TLbyYear = description(df_firstInYear.groupby(['year','autotomized','sex']),'tl').reset_index()
TLbyYear

Unnamed: 0,year,autotomized,sex,count,mean,std,min,25%,50%,75%,max,siqr,meanCI
0,2000,False,f,24.0,62.458333,14.337483,38.0,46.75,69.0,74.0,78.0,13.625,not calculated
1,2000,False,m,11.0,52.0,17.005881,38.0,40.5,43.0,61.5,80.0,10.5,not calculated
2,2000,True,f,2.0,41.5,16.263456,30.0,35.75,41.5,47.25,53.0,5.75,not calculated
3,2000,True,m,1.0,30.0,,30.0,30.0,30.0,30.0,30.0,0.0,not calculated
4,2001,False,f,28.0,59.642857,8.486217,48.0,53.75,56.5,68.25,76.0,7.25,not calculated
5,2001,False,m,18.0,58.333333,8.764903,48.0,52.5,55.0,63.0,78.0,5.25,not calculated
6,2001,True,f,2.0,37.5,3.535534,35.0,36.25,37.5,38.75,40.0,1.25,not calculated
7,2001,True,m,2.0,26.0,4.242641,23.0,24.5,26.0,27.5,29.0,1.5,not calculated
8,2002,False,f,19.0,64.0,16.599866,31.0,66.5,71.0,74.0,78.0,3.75,not calculated
9,2002,False,m,5.0,39.8,5.310367,34.0,34.0,43.0,44.0,44.0,5.0,not calculated


##### Histogram of TL (Intact)

In [59]:
# x and y use the same row filter so that each tail length is paired with its own year
femaleBoxIntact = go.Box(x=df_firstInYear.loc[(df_firstInYear.sex=='f')&~(df_firstInYear.autotomized)]['year'], 
                y = df_firstInYear.loc[(df_firstInYear.sex=='f')&~(df_firstInYear.autotomized)].tl
                      ,name='Females')
maleBoxIntact = go.Box(x=df_firstInYear.loc[(df_firstInYear.sex=='m')&~(df_firstInYear.autotomized)]['year'], 
              y = df_firstInYear.loc[(df_firstInYear.sex=='m')&~(df_firstInYear.autotomized)].tl
                    ,name='Males')
femaleIntact = go.Scatter(x=TLbyYear.loc[~(TLbyYear.autotomized)&(TLbyYear.sex=='f')]['year'], 
                      y = TLbyYear.loc[~(TLbyYear.autotomized)&(TLbyYear.sex=='f')]['50%']
                      ,name='Intact Females')
maleIntact = go.Scatter(x=TLbyYear.loc[~(TLbyYear.autotomized)&(TLbyYear.sex=='m')]['year'], 
              y = TLbyYear.loc[~(TLbyYear.autotomized)&(TLbyYear.sex=='m')]['50%']
                    ,name='Intact Males')
female_count = go.Scatter(x=TLbyYear.loc[~(TLbyYear.autotomized)&(TLbyYear.sex=='f')]['year'], 
                y = TLbyYear.loc[~(TLbyYear.autotomized)&(TLbyYear.sex=='f')]['count']
                      ,name='Females Count',line = dict(color = 'red',dash = 'dash'))
male_count = go.Scatter(x=TLbyYear.loc[~(TLbyYear.autotomized)&(TLbyYear.sex=='m')]['year'], 
              y = TLbyYear.loc[~(TLbyYear.autotomized)&(TLbyYear.sex=='m')]['count']
                    ,name='Male Count',line = dict(color = 'blue',dash = 'dash'))

data = [maleBoxIntact,femaleBoxIntact,maleIntact,femaleIntact,male_count,female_count]
layout = go.Layout(
    title = 'TL for Intact <i>Sceloporus jarrovii</i> in CC 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'TL (mm)',
        titlefont = dict(
            size = 18)),
boxmode='group')

fig = go.Figure(
        data = data,
        layout = layout)
#iplot(fig, filename = 'TL for Intact Sceloporus jarrovii in CC 2000-2017.html')
plot(fig, filename = 'TL for Intact Sceloporus jarrovii in CC 2000-2017.html')

'TL for Intact Sceloporus jarrovii in CC 2000-2017.html'

To be done:
- look at correlation between the count and the median tl

Outliers will be addressed in the Cleaning notebook, but will be removed for the remainder of the analyses here.

##### Histogram of TL (Autotomized)

In [60]:
# x and y use the same row filter so that each tail length is paired with its own year
femaleBoxAutotomized = go.Box(x=df_firstInYear.loc[(df_firstInYear.sex=='f')&(df_firstInYear.autotomized)]['year'], 
                y = df_firstInYear.loc[(df_firstInYear.sex=='f')&(df_firstInYear.autotomized)].tl
                      ,name='Females')
maleBoxAutotomized = go.Box(x=df_firstInYear.loc[(df_firstInYear.sex=='m')&(df_firstInYear.autotomized)]['year'], 
              y = df_firstInYear.loc[(df_firstInYear.sex=='m')&(df_firstInYear.autotomized)].tl
                    ,name='Males')
femaleAutotomized = go.Scatter(x=TLbyYear.loc[(TLbyYear.autotomized)&(TLbyYear.sex=='f')]['year'], 
                      y = TLbyYear.loc[(TLbyYear.autotomized)&(TLbyYear.sex=='f')]['50%']
                      ,name='Autotomized Females')
maleAutotomized = go.Scatter(x=TLbyYear.loc[(TLbyYear.autotomized)&(TLbyYear.sex=='m')]['year'], 
              y = TLbyYear.loc[(TLbyYear.autotomized)&(TLbyYear.sex=='m')]['50%']
                    ,name='Autotomized Males')
female_count = go.Scatter(x=TLbyYear.loc[(TLbyYear.autotomized)&(TLbyYear.sex=='f')]['year'], 
                y = TLbyYear.loc[(TLbyYear.autotomized)&(TLbyYear.sex=='f')]['count']
                      ,name='Females Count',line = dict(color = 'red',dash = 'dash'))
male_count = go.Scatter(x=TLbyYear.loc[(TLbyYear.autotomized)&(TLbyYear.sex=='m')]['year'], 
              y = TLbyYear.loc[(TLbyYear.autotomized)&(TLbyYear.sex=='m')]['count']
                    ,name='Male Count',line = dict(color = 'blue',dash = 'dash'))

data = [maleBoxAutotomized,femaleBoxAutotomized,maleAutotomized,femaleAutotomized,male_count,female_count]
layout = go.Layout(
    title = 'TL for Autotomized <i>Sceloporus jarrovii</i> in CC 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'TL (mm)',
        titlefont = dict(
            size = 18)),
boxmode='group')

fig = go.Figure(
        data = data,
        layout = layout)
#iplot(fig, filename = 'TL for Autotomized Sceloporus jarrovii in CC 2000-2017.html')
plot(fig, filename = 'TL for Autotomized Sceloporus jarrovii in CC 2000-2017.html')

'TL for Autotomized Sceloporus jarrovii in CC 2000-2017.html'

##### Histogram of TL (Intact vs. Autotomized) 2

In [61]:
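# select columns with a list; tuple-style selection on a groupby is not accepted by recent pandas
TLbyYear.groupby('year', group_keys=False)[['year','50%']].apply(lambda x: x[['50%']])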

Unnamed: 0,50%
0,69.0
1,43.0
2,41.5
3,30.0
4,56.5
5,55.0
6,37.5
7,26.0
8,71.0
9,43.0


In [62]:
# BoxAut = go.Box(x=df_firstInYear.loc[(df_firstInYear.autotomized)]['year'], 
#                 y = df_firstInYear.loc[(df_firstInYear.autotomized)].tl
#                       ,name='Autotomized')
# BoxIntact = go.Box(x=df_firstInYear.loc[(df_firstInYear.autotomized==False)]['year'], 
#               y = df_firstInYear.loc[(df_firstInYear.autotomized==False)].tl
#                     ,name='Intact')
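# Pair intact and autotomized medians by year and sex before subtracting;
# positional subtraction misaligns rows when a year/sex group has no autotomized lizards.
intactMed = TLbyYear.loc[~(TLbyYear.autotomized)][['year','sex','50%']]
autMed = TLbyYear.loc[(TLbyYear.autotomized)][['year','sex','50%']]
compared = intactMed.merge(autMed, on=['year','sex'], suffixes=('_intact','_aut'))
compared['diff'] = compared['50%_intact'] - compared['50%_aut']
Comparison = go.Scatter(x=compared['year'], 
                   y = compared['diff'],name='Intact - Autotomized',
                       mode = 'markers')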
# Intact = go.Scatter(x=TLbyYear.loc[~(TLbyYear.autotomized)]['year'], 
#               y = TLbyYear.loc[~(TLbyYear.autotomized)]['50%']
#                     ,name='Intact')
data = [Comparison]
layout = go.Layout(
    title = 'Comparison on Median TL of <i>Sceloporus jarrovii</i> in CC 2000-2017 by Tail Condition',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'TL (mm)',
        titlefont = dict(
            size = 18)))

fig = go.Figure(
        data = data,
        layout = layout)
#iplot(fig, filename = 'Comparison on Median TL of Sceloporus jarrovii in CC 2000-2017.html')
plot(fig, filename = 'Comparison on Median TL of Sceloporus jarrovii in CC 2000-2017.html')

'Comparison on Median TL of Sceloporus jarrovii in CC 2000-2017.html'

##### Histogram of TL (Intact vs. Autotomized)

In [63]:
BoxAut = go.Box(x=df_firstInYear.loc[(df_firstInYear.autotomized)]['year'], 
                y = df_firstInYear.loc[(df_firstInYear.autotomized)].tl
                      ,name='Autotomized')
BoxIntact = go.Box(x=df_firstInYear.loc[(df_firstInYear.autotomized==False)]['year'], 
              y = df_firstInYear.loc[(df_firstInYear.autotomized==False)].tl
                    ,name='Intact')
# Autotomized = go.Scatter(x=TLbyYear.loc[(TLbyYear.autotomized)]['year'], 
#                    y = TLbyYear.loc[(TLbyYear.autotomized)]['50%']
#                       ,name='Autotomized')
# Intact = go.Scatter(x=TLbyYear.loc[~(TLbyYear.autotomized)]['year'], 
#               y = TLbyYear.loc[~(TLbyYear.autotomized)]['50%']
#                     ,name='Intact')
data = [BoxAut,BoxIntact]
layout = go.Layout(
    title = 'TL of <i>Sceloporus jarrovii</i> in CC 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'TL (mm)',
        titlefont = dict(
            size = 18)),
boxmode='group')

fig = go.Figure(
        data = data,
        layout = layout)
iplot(fig, filename = 'TL of Sceloporus jarrovii in CC 2000-2017.html')
#plot(fig, filename = 'TL of Sceloporus jarrovii in CC 2000-2017.html')

#### RTL

Now we examine the range and distribution of RTL values by sex for autotomized lizards.

[Back to Morphometrics](#Morphometrics)


We will use the [distribution](#distribution) function to do this and then plot these values.

- [Histogram of RTL](#RTLhist)

In [64]:
RTLbyYear = description(df_firstInYear.loc[df_firstInYear.autotomized].groupby(['year','sex']),'rtl').reset_index()
RTLbyYear

Unnamed: 0,year,sex,count,mean,std,min,25%,50%,75%,max,siqr,meanCI
0,2000,f,2.0,11.5,16.263456,0.0,5.75,11.5,17.25,23.0,5.75,not calculated
1,2000,m,1.0,5.0,,5.0,5.0,5.0,5.0,5.0,0.0,not calculated
2,2001,f,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,not calculated
3,2001,m,2.0,11.0,9.899495,4.0,7.5,11.0,14.5,18.0,3.5,not calculated
4,2002,f,3.0,11.666667,20.207259,0.0,0.0,0.0,17.5,35.0,8.75,not calculated
5,2002,m,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,not calculated
6,2003,m,3.0,7.333333,7.023769,0.0,4.0,8.0,11.0,14.0,3.5,not calculated
7,2004,f,2.0,2.5,2.12132,1.0,1.75,2.5,3.25,4.0,0.75,not calculated
8,2004,m,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,not calculated
9,2005,m,1.0,10.0,,10.0,10.0,10.0,10.0,10.0,0.0,not calculated


##### Histogram of rtl

In [65]:
femaleAut = go.Scatter(x=RTLbyYear.loc[(RTLbyYear.sex=='f')]['year'], 
                   y = RTLbyYear.loc[(RTLbyYear.sex=='f')]['50%']
                      ,name='Females')
maleAut = go.Scatter(x=RTLbyYear.loc[(RTLbyYear.sex=='m')]['year'], 
              y = RTLbyYear.loc[(RTLbyYear.sex=='m')]['50%']
                    ,name='Males')

data = [maleAut,femaleAut]
layout = go.Layout(
    title = 'Median RTL for CC 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Median RTL (mm)',
        titlefont = dict(
            size = 18)))

fig = go.Figure(
        data = data,
        layout = layout)
iplot(fig, filename = 'Median RTL Sceloporus jarrovii in CC 2000-2017.html')
#plot(fig, filename = 'Median RTL Sceloporus jarrovii in CC 2000-2017.html')

#### Mass

Now we examine the range and distribution of mass values by sex.

[Back to Morphometrics](#Morphometrics)


We will use the [distribution](#distribution) function to do this and then plot these values.

- [Plot of mass](#Plot-of-mass)

We probably need to adjust this analysis to consider the month in which the females were captured, since gravidity or nearness to parturition may bias the analysis.
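
One way to check for such a bias is to compare median mass by capture month within each sex. The sketch below is only an illustration: the records are hypothetical placeholders (not values from the data set), the assumption that a capture month is available for each record is untested here, and `median_mass_by_month` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import median

# HYPOTHETICAL capture records: (capture month, sex, mass); placeholders only
records = [
    (4, "f", 4.2), (4, "f", 4.4), (5, "f", 4.3),
    (8, "f", 2.5), (8, "f", 2.7), (9, "f", 2.6),
    (4, "m", 2.0), (8, "m", 1.7),
]

def median_mass_by_month(rows, sex):
    """Return {month: median mass} for one sex, ordered by month."""
    by_month = defaultdict(list)
    for month, row_sex, mass in rows:
        if row_sex == sex:
            by_month[month].append(mass)
    return {month: median(masses) for month, masses in sorted(by_month.items())}

female_by_month = median_mass_by_month(records, "f")
for month, mass in female_by_month.items():
    print(f"month {month}: median female mass {mass:.2f}")
```

If female medians differ strongly between months while male medians do not, restricting the yearly comparison to a common capture window would be one way to adjust.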

In [66]:
MassbyYear = description(df_firstInYear.loc[df_firstInYear.autotomized].groupby(['year','sex']),'mass').reset_index()
MassbyYear

Unnamed: 0,year,sex,count,mean,std,min,25%,50%,75%,max,siqr,meanCI
0,2000,f,2.0,2.1,1.555635,1.0,1.55,2.1,2.65,3.2,0.55,not calculated
1,2000,m,1.0,2.0,,2.0,2.0,2.0,2.0,2.0,0.0,not calculated
2,2001,f,2.0,2.5,0.0,2.5,2.5,2.5,2.5,2.5,0.0,not calculated
3,2001,m,2.0,1.35,0.494975,1.0,1.175,1.35,1.525,1.7,0.175,not calculated
4,2002,f,3.0,3.266667,1.792577,1.2,2.7,4.2,4.3,4.4,0.8,not calculated
5,2002,m,2.0,1.1,0.141421,1.0,1.05,1.1,1.15,1.2,0.05,not calculated
6,2003,m,2.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,0.0,not calculated
7,2004,f,2.0,0.85,0.353553,0.6,0.725,0.85,0.975,1.1,0.125,not calculated
8,2004,m,3.0,1.066667,0.814453,0.5,0.6,0.7,1.35,2.0,0.375,not calculated
9,2005,m,1.0,0.8,,0.8,0.8,0.8,0.8,0.8,0.0,not calculated


##### Plot of mass

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

In [67]:
female = go.Scatter(x=MassbyYear.loc[(MassbyYear.sex=='f')]['year'], 
                   y = MassbyYear.loc[(MassbyYear.sex=='f')]['50%']
                      ,name='Females')
male = go.Scatter(x=MassbyYear.loc[(MassbyYear.sex=='m')]['year'], 
              y = MassbyYear.loc[(MassbyYear.sex=='m')]['50%']
                    ,name='Males')

data = [male,female]
layout = go.Layout(
    title = 'Median Mass for CC 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Median Mass (g)',
        titlefont = dict(
            size = 18),
    range = [0,MassbyYear['50%'].max()+5]),
)

fig = go.Figure(
        data = data,
        layout = layout)
iplot(fig, filename = 'Median Mass Sceloporus jarrovii in CC 2000-2017.html')
#plot(fig, filename = 'Median Mass Sceloporus jarrovii in CC 2000-2017.html')

## Captures

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

Let's take a look at the number of times that lizards have been captured. To do this, we will group lizards by lizard number, look at the maximum number of captures for each lizard, and finally count the number of lizards that have a given number of captures. We will use all captures for this.

In [68]:
df_firstInYear.groupby('liznumber').capture.apply(lambda x: x.values)

liznumber
1                   [1]
4                   [1]
5                   [1]
6                   [2]
7                [2, 1]
8                   [5]
9                [3, 1]
11                  [1]
18                  [2]
19               [9, 1]
23                  [1]
30                  [1]
33                  [1]
35                  [1]
40                  [1]
48                  [1]
51                  [1]
71                  [1]
74                  [1]
77                  [2]
80                  [1]
81                  [1]
95                  [1]
97                  [1]
105                 [1]
107                 [1]
112                 [1]
113                 [1]
114                 [1]
115                 [1]
118                 [1]
120                 [1]
121                 [1]
122                 [1]
124                 [1]
125                 [1]
126                 [1]
127              [1, 2]
128                 [1]
131                 [1]
133                 [1]
138   

In [69]:
print("The maximum number of captures among lizards in the data set ranges from {} to {}; \
capture numbers are distributed across year and sex as displayed here:"\
      .format(df_firstInYear.groupby('liznumber').capture.max().min(),
              df_firstInYear.groupby('liznumber').capture.max().max()))
captureMedYr = description(df_firstInYear.groupby(['year','sex']),variable='capture')[['50%','75%','max']].reset_index()
captureMedYr

The maximum number of captures among lizards in the data set ranges from 1 to 66; capture numbers are distributed across year and sex as displayed here:


Unnamed: 0,year,sex,50%,75%,max
0,2000,f,1.0,1.0,1.0
1,2000,m,1.0,1.0,1.0
2,2001,f,1.0,1.0,2.0
3,2001,m,1.0,1.0,2.0
4,2002,f,1.0,1.0,2.0
5,2002,m,1.0,1.0,1.0
6,2003,f,1.0,1.0,1.0
7,2003,m,1.0,1.0,2.0
8,2004,f,1.0,1.0,1.0
9,2004,m,1.0,1.0,1.0


In [70]:
female = go.Scatter(x=captureMedYr.loc[(captureMedYr.sex=='f')]['year'], 
                   y = captureMedYr.loc[(captureMedYr.sex=='f')]['50%']
                      ,name='Females')
male = go.Scatter(x=captureMedYr.loc[(captureMedYr.sex=='m')]['year'], 
              y = captureMedYr.loc[(captureMedYr.sex=='m')]['50%']
                    ,name='Males')
# femaleBox = go.Box(x=df_firstInYear.loc[df_firstInYear.sex=='f']['year'], 
#                    y = df_firstInYear.loc[df_firstInYear.sex=='f'].groupby('liznumber').capture.apply(lambda x: x.values)
#                       ,name='Females')
# maleBox = go.Box(x=df_firstInYear.loc[df_firstInYear.sex=='m']['year'], 
#               y = df_firstInYear.loc[df_firstInYear.sex=='m'].groupby('liznumber').capture.apply(lambda x: x.values)
#                     ,name='Males')

data = [male,female]#,maleBox,femaleBox]
layout = go.Layout(
    title = 'Median Years Captured for CC 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Median Years Captured',
        titlefont = dict(
            size = 18),
    range = [0,captureMedYr['50%'].max()+1]),
)

fig = go.Figure(
        data = data,
        layout = layout)
iplot(fig, filename = 'Median Years Captured Sceloporus jarrovii in CC 2000-2017.html')
#plot(fig, filename = 'Median Years Captured  Sceloporus jarrovii in CC 2000-2017.html')

In [71]:
# female = go.Scatter(x=captureMedYr.loc[(captureMedYr.sex=='f')]['year'], 
#                    y = captureMedYr.loc[(captureMedYr.sex=='f')]['50%']
#                       ,name='Females')
# male = go.Scatter(x=captureMedYr.loc[(captureMedYr.sex=='m')]['year'], 
#               y = captureMedYr.loc[(captureMedYr.sex=='m')]['50%']
#                     ,name='Males')
# y holds one capture number per row so that it lines up with the per-row years in x
femaleBox = go.Box(x=df_firstInYear.loc[df_firstInYear.sex=='f']['year'], 
                   y = df_firstInYear.loc[df_firstInYear.sex=='f'].capture
                      ,name='Females')
maleBox = go.Box(x=df_firstInYear.loc[df_firstInYear.sex=='m']['year'], 
              y = df_firstInYear.loc[df_firstInYear.sex=='m'].capture
                    ,name='Males')

data = [maleBox,femaleBox]
layout = go.Layout(
    title = 'Median Years Captured for CC 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Median Years Captured',
        titlefont = dict(
            size = 18))
)

fig = go.Figure(
        data = data,
        layout = layout)
iplot(fig, filename = 'Box Years Captured Sceloporus jarrovii in CC 2000-2017.html')
#plot(fig, filename = 'Box Years Captured  Sceloporus jarrovii in CC 2000-2017.html')

In order to interpret these data we need to factor in person-hours for each year.
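
A simple way to factor in effort is to divide each year's count by that year's person-hours. The sketch below shows the arithmetic only. The yearly counts are the female plus male `count` values from the `SVLbyYear` rows displayed earlier, while the person-hours are hypothetical placeholders, since no effort data appears in this notebook. The function name `catch_per_unit_effort` is also hypothetical.

```python
# Lizards per year: female + male 'count' from the SVLbyYear rows shown earlier
lizards_per_year = {2000: 38, 2001: 50, 2002: 29, 2003: 38, 2004: 16}

# HYPOTHETICAL survey effort in person-hours; placeholders, not values from the data set
person_hours = {2000: 40.0, 2001: 50.0, 2002: 40.0, 2003: 40.0, 2004: 20.0}

def catch_per_unit_effort(counts, effort):
    """Return {year: count / effort} for years with positive recorded effort."""
    return {year: counts[year] / effort[year]
            for year in sorted(counts)
            if year in effort and effort[year] > 0}

cpue = catch_per_unit_effort(lizards_per_year, person_hours)
for year, rate in cpue.items():
    print(f"{year}: {rate:.2f} lizards per person-hour")
```

Years with missing or zero effort are dropped rather than guessed, so gaps in the effort record remain visible in the result.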

## Years

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

Let's take a look at the number of years over which lizards have been captured. To do this, we will group lizards by lizard number, look at the maximum number of years over which each lizard was captured, and finally count the number of lizards that have a given number of years over which they were captured. We will use all captures for this.

In [72]:
# name the index and value columns explicitly so the result does not depend on pandas' default labels
(df_firstInYear.groupby('liznumber').year_diff.max()+1).value_counts(normalize=True)\
.rename_axis('year_diff').reset_index(name='proportion').sort_values('year_diff')

Unnamed: 0,year_diff,proportion
0,1,0.95288
1,2,0.015707
3,3,0.007853
9,6,0.002618
8,7,0.002618
7,8,0.002618
6,9,0.002618
5,11,0.002618
4,14,0.002618
2,16,0.007853


In [73]:
print("The number of years over which each lizard was captured ranges from {} to {} \
and is distributed across sex as displayed here:"\
      .format(df_firstInYear.groupby('liznumber').year_diff.max().min()+1, 
              df_firstInYear.groupby('liznumber').year_diff.max().max()+1))
description(df_firstInYear.groupby('sex'),variable='year_diff')

The number of years over which each lizard was captured ranges from 1 to 16 and is distributed across sex as displayed here:


sex,count,mean,std,min,25%,50%,75%,max,siqr,meanCI
f,221.0,0.244344,1.369813,0.0,0.0,0.0,0.0,13.0,0.0,not calculated
m,176.0,0.363636,2.057359,0.0,0.0,0.0,0.0,15.0,0.0,not calculated


In [74]:
Males = go.Histogram(x = df_firstInYear.loc[(df_firstInYear.sex=='m')].groupby('liznumber').capture.max(),
                     name='Males')
Females = go.Histogram(x = df_firstInYear.loc[(df_firstInYear.sex=='f')].groupby('liznumber').capture.max(),
                       name='Females')

data = [Males,Females]
layout = go.Layout(
    title = 'Histogram of Maximum Number of Captures by Sex for CC 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Maximum Number of Captures',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Number of Unique Lizards',
        titlefont = dict(
            size = 18)))

fig = go.Figure(
        data = data,
        layout = layout)
iplot(fig, filename = 'Histogram of Maximum Number of Captures by Sex for CC 2000-2017.html')
#plot(fig, filename = 'Histogram of Maximum Number of Captures by Sex for CC 2000-2017.html')

### Maximum Number of Captures Based on Tail Condition
 - match for species, sex, size and location

In [75]:
intactFemale = go.Histogram(x = df_firstInYear.loc[(~df_firstInYear.autotomized)&
                                                   (df_firstInYear.sex=='f')].groupby('liznumber').capture.max()
                      ,name='intact females')
intactMale = go.Histogram(x = df_firstInYear.loc[(~df_firstInYear.autotomized)&
                                                 (df_firstInYear.sex=='m')].groupby('liznumber').capture.max()
                      ,name='intact males')
autotomizedFemale = go.Histogram(x = df_firstInYear.loc[(df_firstInYear.autotomized)&
                                                        (df_firstInYear.sex=='f')].groupby('liznumber').capture.max()
                           ,name='autotomized females')
autotomizedMale = go.Histogram(x = df_firstInYear.loc[(df_firstInYear.autotomized)&
                                                      (df_firstInYear.sex=='m')].groupby('liznumber').capture.max()
                           ,name='autotomized males')

data = [intactFemale,intactMale,autotomizedFemale,autotomizedMale]
layout = go.Layout(
    title = 'Maximum Number of Captures by Tail Condition 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Maximum Number of Captures',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Number of Lizards',
        titlefont = dict(
            size = 18)))

fig = go.Figure(
        data = data,
        layout = layout)
py.iplot(fig, filename = 'Histogram of Maximum Captures by Tail Condition in Crystal Creek 2000 - 2017')

## Growth

- [SVL Growth](#SVL-Growth)
- [TL Growth](#TL-Growth)
- [RTL Growth](#RTL-Growth)
- [Mass Growth](#Mass-Growth)

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

Let's take a look at how much lizards grow between captures.  For each recaptured lizard, the growth rate of a measurement is the change in that measurement divided by the number of years between captures.

### SVL Growth

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

What is the body size growth rate?
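
As a minimal sketch of the calculation, the annual growth rate is `svl_diff / year_diff`, defined only for recaptures (`year_diff > 0`). The toy values below are made up, and the vectorized form is an alternative to a row-wise `apply`, not necessarily the approach used in this notebook.

```python
import pandas as pd

# Toy rows (made-up values): svl_diff is the change in SVL and year_diff
# the number of years elapsed, both relative to an earlier capture.
toy = pd.DataFrame({
    "liznumber": [1, 1, 2],
    "svl_diff": [0.0, 20.0, 0.0],
    "year_diff": [0, 4, 0],
})

# Growth is undefined when no time has elapsed, so those rows stay NaN.
recaptured = toy["year_diff"] > 0
toy["svl_growth"] = (toy["svl_diff"] / toy["year_diff"]).where(recaptured)
print(toy)
```

Masking with `where` keeps first captures as missing values, so they do not enter summaries as zero growth.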

In [76]:
# Annual SVL growth: svl_diff divided by year_diff, computed only for rows with year_diff > 0
df_firstInYear['svl_growth'] = df_firstInYear.loc[df_firstInYear.year_diff>0]\
.apply(lambda x: x.svl_diff/x.year_diff, axis=1)
df_firstInYear.loc[df_firstInYear.svl_growth.notna()]


Unnamed: 0,species,toes_orig,sex,date,svl,tl,rtl,autotomized,mass,location,meters,newRecap,painted,sighting,paint.mark,vial,misc,rtl_orig,toes,toe_pattern,year,tl_svl,mass_svl,initialCaptureDate,year_diff,svl_diff,liznumber,sex_count,daysSinceCapture,capture,month,svl_growth
15,j,a3-6-12-19,m,2010-07-27,46.0,63.0,0.0,False,3.5,2m ^ CC/CCC,242.0,N,yes,,yVc,50-10-cc,everted np,0.0,,9.0,2010,1.369565,0.076087,2003-03-25,7,20.0,1056,4,2681,44,Jul,2.857143
77,j,4 - 6 - 11 - 19,f,2011-06-20,55.0,75.0,0.0,False,4.0,below one falls tree 1m up,-18.0,R,yes,,g13b,06 - 11,,0.0,,1.0,2011,1.363636,0.072727,2003-03-25,8,39.0,1055,4,3009,66,Jun,4.875
97,j,4 6 14 19,f,2006-05-20,52.0,76.0,0.0,False,4.3,4m v 1 falls left side,-4.0,N,painted,,w5b,03-06,,0.0,,2.0,2006,1.461538,0.082692,2003-03-25,3,36.0,1055,4,1152,17,May,12.0
107,j,4 7 12 19,m,2006-05-20,55.0,84.0,0.0,False,5.0,sb between chute and 2 triple R,345.0,N,painted,,w25b,16-06,,0.0,,2.0,2006,1.527273,0.090909,2003-03-25,3,29.0,1056,4,1152,16,May,9.666667
169,j,,f,2009-07-23,45.0,61.0,0.0,False,2.8,2m v talus,321.0,N,,,,,"DEAD; no toes missing; T rec shed, Bss; dead i...",0.0,,,2009,1.355556,0.062222,2003-03-25,6,29.0,1055,4,2312,56,Jul,4.833333
241,j,3-6-13-20,f,2010-05-08,48.0,69.0,0.0,True,3.5,5m up ccc,,N,yes,,>c,59-10-cc,,-1.0,3-6-13-20,,2010,1.4375,0.072917,2004-07-02,6,0.0,945,1,2136,3,May,0.0
313,j,2-8-13-19,m,2009-07-24,43.0,58.0,0.0,False,2.4,opp slab,262.0,N,painted,,yGa,09-98,,0.0,2-8-13-19,,2009,1.348837,0.055814,2004-07-03,5,0.0,917,1,1847,2,Jul,0.0
374,j,3-10-14-20,m,2007-07-15,40.0,51.0,0.0,False,2.0,1m v opp slab,261.0,N,yes,,y^ab,07-77,,0.0,3-10-14-20,,2007,1.275,0.05,2005-07-20,2,5.0,345,1,725,3,Jul,2.5
876,j,5-13-17,f,2013-07-03,13.0,99.0,0.0,False,9.8,5m ^ 2 3R,307.0,R,yes,,o13c,,,0.0,5-13-17,,2013,7.615385,0.753846,2012-05-25,1,0.0,544,2,404,2,Jul,0.0
1062,j,5-15-16,f,2014-07-04,37.0,51.0,0.0,False,1.4,stacked wall,70.0,N,yes,,o.t,14-13,Trec Shed,0.0,5-15-16,,2014,1.378378,0.037838,2012-05-28,2,0.0,554,1,767,2,Jul,0.0


In [77]:
print("svl_growth values in the data set range from {} to {} and are distributed across sex \
as displayed here:"\
      .format(df_firstInYear.svl_growth.min(), df_firstInYear.svl_growth.max()))
description(df_firstInYear.groupby('sex'),variable='svl_growth')

svl_growth values in the data set range from 0.0 to 25.0 and are distributed across sex as displayed here:


sex,count,mean,std,min,25%,50%,75%,max,siqr,meanCI
f,12.0,5.259028,6.084244,0.0,0.0,3.116667,11.25,16.0,5.625,not calculated
m,9.0,6.669312,9.569092,0.0,0.0,2.5,9.666667,25.0,4.833333,not calculated


In [78]:
svl_growthbyYear = description(df_firstInYear.groupby(['year','sex']),'svl_growth').reset_index()
svl_growthbyYear

Unnamed: 0,year,sex,count,mean,std,min,25%,50%,75%,max,siqr,meanCI
0,2000,f,0.0,,,,,,,,,not calculated
1,2000,m,0.0,,,,,,,,,not calculated
2,2001,f,0.0,,,,,,,,,not calculated
3,2001,m,1.0,20.0,,20.0,20.0,20.0,20.0,20.0,0.0,not calculated
4,2002,f,4.0,10.0,6.97615,0.0,8.25,12.0,13.75,16.0,2.75,not calculated
5,2002,m,0.0,,,,,,,,,not calculated
6,2003,f,0.0,,,,,,,,,not calculated
7,2003,m,1.0,25.0,,25.0,25.0,25.0,25.0,25.0,0.0,not calculated
8,2004,f,0.0,,,,,,,,,not calculated
9,2004,m,0.0,,,,,,,,,not calculated


Let's plot these values. 

##### Figures of SVL values

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)


In [79]:
female = go.Box(x=df_firstInYear.loc[(df_firstInYear.sex=='f')]['year'], 
                y = df_firstInYear.loc[(df_firstInYear.sex=='f')].svl_growth
                      ,name='Female Box',yaxis='y1')
male = go.Box(x=df_firstInYear.loc[(df_firstInYear.sex=='m')]['year'], 
              y = df_firstInYear.loc[(df_firstInYear.sex=='m')].svl_growth
                    ,name='Males Box',yaxis='y1')
female_line = go.Scatter(x=svl_growthbyYear.loc[(svl_growthbyYear.sex=='f')]['year'], 
                y = svl_growthbyYear.loc[(svl_growthbyYear.sex=='f')]['50%']
                      ,name='Female Median', line = dict(color = 'red'),yaxis='y1')
male_line = go.Scatter(x=svl_growthbyYear.loc[(svl_growthbyYear.sex=='m')]['year'], 
              y = svl_growthbyYear.loc[(svl_growthbyYear.sex=='m')]['50%']
                    ,name='Male Median', line = dict(color = 'blue')
                      ,yaxis='y1')
female_count = go.Scatter(x=svl_growthbyYear.loc[(svl_growthbyYear.sex=='f')]['year'], 
                y = svl_growthbyYear.loc[(svl_growthbyYear.sex=='f')]['count']
                      ,name='Females Count',line = dict(color = 'red',dash = 'dash'),
                          yaxis = 'y1')
male_count = go.Scatter(x=svl_growthbyYear.loc[(svl_growthbyYear.sex=='m')]['year'], 
              y = svl_growthbyYear.loc[(svl_growthbyYear.sex=='m')]['count']
                    ,name='Male Count',line = dict(color = 'blue',dash = 'dash'),
                       yaxis = 'y1')

data = [male,female,male_line,female_line]
layout = go.Layout(
    title = 'Box Plot of SVL Growth for CC 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'SVL Growth (mm/year)',
        titlefont = dict(
            size = 18)),
    yaxis2 = dict(
        title = 'Count of Lizard',
        titlefont = dict(
            size = 18),
        side = 'right'),
    boxmode= 'group')

fig = go.Figure(
        data = data,
        layout = layout)
# fig.update_yaxes(title_text="<b>secondary</b> yaxis title", secondary_y=True)
iplot(fig, filename = 'Box Plot of Median SVL Growth Sceloporus jarrovii in CC 2000-2017.html')
# plot(fig, filename = 'Box Plot of Median SVL Growth Sceloporus jarrovii in CC 2000-2017.html')

In [80]:
popVar = 'liznumberYear'
summerPred = df_reg_season.loc[(df_reg_season.source=='portal')&
                                               (df_reg_season.season.isin(['summer']))]\
                             [['popinYearplus1','TMIN 50%']].dropna()

sjWsumm = pg.linear_regression(summerPred[['TMIN 50%']],
                     summerPred['popinYearplus1'],
                     remove_na=True)

sjWsumm

NameError: name 'df_reg_season' is not defined

To do: turn this into a function, then group by lizard and apply it.

df['svl_growth_ann'] = df['liznumber'].map(df.groupby('liznumber').apply(lambda x: (x['svl'].max()-x['svl'].min())/(x['year_diff'].max()+1)))
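
The per-lizard calculation above could be wrapped in a reusable function along these lines. This is a sketch under assumptions: the helper name `annual_growth` is hypothetical, the toy values are made up, and it uses `groupby(...).transform` instead of `apply` so that each result stays aligned with its original row.

```python
import pandas as pd

def annual_growth(df, measure):
    """Per-lizard (max - min) of `measure` divided by (max year_diff + 1),
    returned as a Series aligned with the rows of `df`."""
    grouped = df.groupby("liznumber")
    value_range = grouped[measure].transform("max") - grouped[measure].transform("min")
    n_years = grouped["year_diff"].transform("max") + 1
    return value_range / n_years

# Toy data (made-up values) with the column names used in this notebook.
toy = pd.DataFrame({
    "liznumber": [1, 1, 2],
    "year_diff": [0, 3, 0],
    "svl": [40.0, 60.0, 50.0],
    "mass": [2.0, 6.0, 3.0],
})

# The same function serves every morphometric column.
for measure in ["svl", "mass"]:
    toy[measure + "_growth_ann"] = annual_growth(toy, measure)
print(toy)
```

Because `transform` returns one value per row, every capture of a lizard carries that lizard's single growth value, which is what the `nunique` check in the Mass Growth section tests for.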

### TL Growth

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

What is the tail size growth rate?

df['tl_growth_ann'] = df['liznumber'].map(df.groupby('liznumber').apply(lambda x: (x['tl'].max()-x['tl'].min())/(x['year_diff'].max()+1)))

### RTL Growth

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

What is the regrown tail size growth rate?

df['rtl_growth_ann'] = df['liznumber'].map(df.groupby('liznumber').apply(lambda x: (x['rtl'].max()-x['rtl'].min())/(x['year_diff'].max()+1)))

### NTL Growth

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

### Mass Growth

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

What is the body size growth rate in terms of mass?

df['mass_growth_ann'] = df['liznumber'].map(df.groupby('liznumber').apply(lambda x: (x['mass'].max()-x['mass'].min())/(x['year_diff'].max()+1)))

tmp = df.groupby('liznumber').mass_growth_ann.nunique().reset_index()
check = tmp.loc[tmp.mass_growth_ann>1,'liznumber']
print(len(check))
check

## Correlations to Population

[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)

In [None]:
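from functools import reduce

def candidate(m,dv,placement=(1,1)):
    # Sort column dv of correlation matrix m in ascending order and return the rows
    # at positions placement[0] through placement[1] (inclusive) with their labels.
    assert dv in m.columns
    return m[dv].sort_values().reset_index().iloc[placement[0]:placement[1]+1,:]

def topcorr(corrdf,lowestrank,dvs):
    # Collect the rows at positions 1..lowestrank for every variable in dvs and
    # outer-merge them on the variable label; missing entries are shown as '--'.
    candidates = [candidate(corrdf,dv,(1,lowestrank)) for dv in dvs]
    merger =  reduce(lambda x, y: pd.merge(x, y, on = 'index', how = 'outer'), candidates).fillna('--')
    return merger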
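
To see what these helpers return, here is a small usage sketch on a made-up three-variable frame. The two functions are repeated so the example runs on its own; the data and the names `toy`, `corr`, and `top` are illustrative only.

```python
from functools import reduce

import pandas as pd

def candidate(m, dv, placement=(1, 1)):
    assert dv in m.columns
    return m[dv].sort_values().reset_index().iloc[placement[0]:placement[1] + 1, :]

def topcorr(corrdf, lowestrank, dvs):
    candidates = [candidate(corrdf, dv, (1, lowestrank)) for dv in dvs]
    return reduce(lambda x, y: pd.merge(x, y, on="index", how="outer"), candidates).fillna("--")

# Made-up data: b rises with a, c falls as a rises.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 9.0],
    "c": [4.0, 3.0, 2.0, 0.0],
})
corr = toy.corr()

# Keep the rows at sorted positions 1 and 2 for each variable.
top = topcorr(corr, 2, corr.columns.tolist())
print(top)
```

Because the sort is ascending, the row skipped at position 0 is the most negative correlation, and the self-correlation of 1.0 is kept. If the intent is to skip the self-correlation instead, a descending sort would be needed.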

I need to create a df with the values to be correlated:
- count
- prop Male
- SVL
- TL
- RTL
- Mass
- TL Severity
- Sex Distr
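
A per-year frame holding some of the metrics listed above could be assembled with named aggregation, roughly as sketched here. The toy records and the output column names (`count`, `propMale`, `medianSVL`, `medianMass`, `propAutotomized`) are assumptions for illustration, not the notebook's actual tables.

```python
import pandas as pd

# Toy capture records (made-up values) using the notebook's column names.
toy = pd.DataFrame({
    "year": [2000, 2000, 2000, 2001, 2001, 2002, 2002],
    "liznumber": [1, 2, 3, 1, 4, 2, 5],
    "sex": ["m", "f", "f", "m", "m", "f", "m"],
    "svl": [50.0, 60.0, 55.0, 52.0, 48.0, 62.0, 45.0],
    "mass": [4.0, 6.0, 5.0, 4.5, 3.5, 6.5, 3.0],
    "autotomized": [False, True, False, False, True, True, True],
})

# One row per year, one column per metric to be correlated.
yearly = toy.groupby("year").agg(
    count=("liznumber", "nunique"),
    propMale=("sex", lambda s: (s == "m").mean()),
    medianSVL=("svl", "median"),
    medianMass=("mass", "median"),
    propAutotomized=("autotomized", "mean"),
)

# Pairwise Pearson correlations among the yearly metrics.
corr = yearly.corr()
print(yearly)
```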


In [None]:
dfCorrPrep = populationSize_sex.loc[populationSize_sex.sex=='f',
                                    ['year', 'sex', 'liznumber', 
                                     'liznumberYear','propFemale']]\
.merge(populationSize_aut.loc[populationSize_aut.propAutotomized.notna(),
                              ['year','liznumberYear', 'propAutotomized']], 
                         how = 'left', on = ['year','liznumberYear'])\
.merge(dfYearlyEst.loc[dfYearlyEst.autotomized,['year', 'sex', 
                                                'count_all', '50%_all']],
       on = ['year','sex'])\
.rename(columns = {'count_all':'nAutotomized', 
                   '50%_all':'medianPropTailAutotomized',
                                             'liznumber':'numberFemale'})
dfCorrPrep = dfCorrPrep[['liznumberYear', 
                         'propFemale', 'propAutotomized', 
                         'medianPropTailAutotomized']]
dfCorrPrep

In [None]:
#Dropping proportion of Females, but will put it back once I can order the y-axis
dfCorr = dfCorrPrep.corr()
dfCorr = topcorr(dfCorr,2,dfCorr.columns.tolist()) 
dfCorr = dfCorr.set_index('index')
dfCorr

In [None]:
testx = dfCorr.columns
testy = dfCorr.index
testz = dfCorr.values
test = go.Figure(go.Heatmap(x=testx,y=testy,z=testz))
#plot(test, filename = 'population correlation matrix.html')
iplot(test, filename = 'population correlation matrix.html')

## Export Files
[Back to Top](#Table-of-Contents)

[Back to Analyze Data](#Analyze-Data)