# Introduction
This notebook contains code and output of descriptive analyses for the 2000-2017 CC dataset after cleaning.

<a id='TOC'></a>

# Table of Contents
1. [Set up Python](#Setup)
2. [Functions](#Functions)
3. [Getting Data](#GetData)
4. [Analyzing Data](#Analyze)
5. [Export files](#ExportFiles)

## [Resume Here](#resume)

<a id='Setup'></a>
# Set up Python

[Top](#TOC)

In [1]:
import pandas as pd
import numpy as np
import os,glob
from scipy import stats

import plotly
import plotly.plotly as py
import plotly.figure_factory as ff
import plotly.graph_objs as go

plotly.tools.set_config_file(world_readable=True)

# increase print limit
pd.options.display.max_rows = 99999
pd.options.display.max_columns = 50

<a id = 'Functions'></a>

# Functions

This section contains functions that were created for this notebook.

- [distribution](#distribution)
- [monthlit](#monthlit)

<a id = 'distribution'></a>

## distribution
[Back to Top](#TOC)

[Back to Functions](#Functions)

In [2]:
def distribution(x):
    """*distribution* takes a series or list of numeric objects, *x*, and returns descriptive stats of x"""
    if isinstance(x, list):
        x = pd.Series(x)
    try:
        # siqr is the semi-interquartile range (half of the interquartile range)
        tmp_dict = {'n': [x.count()], 'minimum': [x.min()], 'maximum': [x.max()],
                    'median': [x.median()], 'siqr': [stats.iqr(x)/2], 'mean': [x.mean()],
                    'stdev': [x.std()]}
        res = pd.DataFrame.from_dict(tmp_dict)
    except Exception:
        # report which values prevented the numeric summaries from being computed
        nonnumericvalues = [v for v in x if not isinstance(v, (int, float))]
        if len(nonnumericvalues) > 1:
            grammar = 'values in x are'
        else:
            grammar = 'value in x is'
        print("x must contain only numeric, or NoneType variables:\n x:\n{}\n the following {} non-numeric:\n{}"
              .format(x, grammar, nonnumericvalues))
        res = None
    return res

In [3]:
foo = [0,1,2,'r']
distribution(foo)

x must contain only numeric, or NoneType variables:
 x:
0    0
1    1
2    2
3    r
dtype: object
 the following value in x is non-numeric:
['r']


In [4]:
bar = [0,1,2]
distribution(bar)

Unnamed: 0,n,minimum,maximum,median,siqr,mean,stdev
0,3,0,2,1.0,0.5,1.0,1.0
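
As a quick sanity check on the `siqr` column, the sketch below recomputes the semi-interquartile range for the same three values by hand. It assumes NumPy's default (linear) percentile interpolation, which is also the default used by `stats.iqr`; the variable names are illustrative only.

```python
import numpy as np
from scipy import stats

values = [0, 1, 2]

# Quartiles with NumPy's default linear interpolation
q1, q3 = np.percentile(values, [25, 75])

# Semi-interquartile range: half the distance between the quartiles
siqr_manual = (q3 - q1) / 2
siqr_scipy = stats.iqr(values) / 2

print(q1, q3, siqr_manual, siqr_scipy)
```

Both routes should agree with the `siqr` value reported by _distribution_ for `bar`.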


[Back to Functions](#Functions)

<a id = 'monthlit'></a>

## monthlit
[Back to Top](#TOC)

[Back to Functions](#Functions)

In [5]:
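def monthlit(x):
    """*monthlit* converts a month number, *x*, to its abbreviated name; missing values are returned unchanged"""
    months = {1:'Jan',2:'Feb',3:'Mar',4:'Apr',5:'May',6:'Jun',7:'Jul',8:'Aug',9:'Sept',10:'Oct',11:'Nov',12:'Dec'}
    # `or` short-circuits, so np.isnan is never called on None (which would raise a TypeError)
    if (x is None) or np.isnan(x):
        res = x
    else:
        res = months[int(x)]
    return res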

Here are a few examples of how _monthlit_ works.

In [6]:
dates = pd.DataFrame(data={'dates':['2018-12-9','2019-8-5', '2017/7/4',np.nan,None]})
dates.dates = pd.to_datetime(dates.dates)
dates

Unnamed: 0,dates
0,2018-12-09
1,2019-08-05
2,2017-07-04
3,NaT
4,NaT


In [7]:
np.isnan(np.nan)

True

In [8]:
monthlit(dates.dates.dt.month[0])

'Dec'

In [9]:
dates.dates.dt.month.apply(monthlit)

0    Dec
1    Aug
2    Jul
3    NaN
4    NaN
Name: dates, dtype: object
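
For comparison, pandas can produce similar labels without a custom function. This is only a sketch of an alternative, not what the notebook uses: it relies on `Series.dt.month_name()` and slices the first three letters, so September comes out as 'Sep' rather than the 'Sept' used by _monthlit_.

```python
import pandas as pd

# Two valid dates and one missing value
dates = pd.Series(pd.to_datetime(['2018-12-09', '2019-08-05', None]))

# Full English month names, then the first three letters; NaT stays missing
month_abbrev = dates.dt.month_name().str[:3]
print(month_abbrev)
```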

[Back to Functions](#Functions)

<a id='GetData'></a>

# Get Data
[Top](#TOC)

Here we can set the locations from which we get data and to which we export it.

In [10]:
# Source Data
sourceDataPers = 'C:/Users/Christopher/Google Drive/TailDemography/outputFiles'
sourceDataBig = 'S:/Chris/TailDemography/TailDemography/outputFiles'

#Output Data paths
outputPers = 'C:/Users/Christopher/Google Drive/TailDemography/outputFiles'
outputBig = 'S:/Chris/TailDemography/TailDemography/outputFiles'

We'll display all files in the source folder with the prefix _'cleaned CC data 2000-2017'_. The file names will be saved in a variable, _mysourcefiles_.

In [11]:
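os.chdir(sourceDataBig)
# glob returns matches in arbitrary order, so sort before taking the last one;
# this picks the most recent file only if the file names sort chronologically
mysourcefiles = sorted(glob.glob('cleaned CC data 2000-2017*'))
mysourcefiles
latest = mysourcefiles[-1]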
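
If the file names do not encode a sortable date, one option is to choose the file by modification time instead. The sketch below is an assumption about how that could be done; `latest_file` and the demonstration file names are hypothetical and not part of this notebook.

```python
import glob
import os
import tempfile
import time

def latest_file(pattern):
    """Return the most recently modified file matching *pattern*, or None."""
    matches = glob.glob(pattern)
    if not matches:
        return None
    return max(matches, key=os.path.getmtime)

# Demonstration in a temporary folder with hypothetical file names
with tempfile.TemporaryDirectory() as folder:
    now = time.time()
    for age, suffix in [(200, 'a'), (100, 'b')]:
        path = os.path.join(folder, 'cleaned CC data 2000-2017_{}.csv'.format(suffix))
        open(path, 'w').close()
        os.utime(path, (now - age, now - age))  # set explicit modification times
    found = latest_file(os.path.join(folder, 'cleaned CC data 2000-2017*'))
    newest = os.path.basename(found)

print(newest)
```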

The most recent file is the one we will use as _df_ in our descriptive analysis.

In [12]:
df=pd.read_csv(latest)
df.head()

Unnamed: 0,species,toes_orig,sex,date,svl,tl,rtl,autotomized,mass,location,meters,newRecap,painted,sighting,paint.mark,vial,misc,rtl_orig,toes,toe_pattern,year,tl_svl,mass_svl,initialCaptureDate,year_diff,smallest_svl,svl_diff,liznumber,sex_count,daysSinceCapture,capture
0,j,3-7-11-19,m,2002-07-14,63.0,92.0,0.0,False,10.0,halfway up to site,-200,N,,,b101t,toes in vial 58-02,,0.0,3-7-11-19,,2002,1.460317,0.15873,2002-07-14,0,63.0,0.0,375,1,0,1
1,j,3-7-11-18,m,2002-07-14,66.0,92.0,0.0,False,10.8,left downstream 100m v 1 falls,-100,N,,,b102t,toes in vial 59-02,,0.0,3-7-11-18,,2002,1.393939,0.163636,2002-07-14,0,66.0,0.0,374,1,0,1
2,j,3-7-11-18,m,2005-07-20,87.0,85.0,29.0,True,21.5,R sb,87,R,painted,,w59c,,this is male missed many times already!,29.0,3-7-11-18,,2005,0.977011,0.247126,2002-07-14,3,66.0,21.0,374,1,1102,2
3,j,3-7-12-16,m,2002-07-14,68.0,103.0,0.0,False,10.3,90m v 1 falls,-90,N,,,b103t,toes in vial 60-02,,0.0,3-7-12-16,,2002,1.514706,0.151471,2002-07-14,0,68.0,0.0,376,1,0,1
4,j,10,m,2002-07-14,85.0,118.0,0.0,False,19.5,sb - trail intersection v 1 falls,-20,R,toe loss may be natural,,b104t,,,0.0,10,,2002,1.388235,0.229412,2002-07-14,0,85.0,0.0,825,2,0,1


<a id ='Analyze'></a>

# Analyze Data
[Top](#TOC)

We will first examine the range and distribution of a number of variables in our data set:

- [Species](#Species)
- [Sex](#Sex)
- [Autotomy Status](#Autotomy)
- [Morphometrics](#Morphometrics)
    - [SVL](#SVL)
    - [TL](#TL)
    - [RTL](#RTL)
    - [Mass](#SVL)
- [Date](#Date)
- [Captures](#Captures)
- [Growth](#Growth)
    - [SVL Growth](#SVLgrowth)

<a id = 'Species'></a>

# Species

We will begin by examining the range and distribution of _species_ values.

[Back to Top](#TOC)

[Back to Analyze Data](#Analyze)

In [13]:
df.species.value_counts(dropna=False)

j        1790
v         833
other     233
sj?         1
Name: species, dtype: int64

Valid values are _'j', 'v', and 'other'_. Any other values will be removed in the cleaning notebook. For the analyses here we keep only _'j'_ and _'v'_.

In [14]:
df = df.loc[df.species.isin(['j','v'])]
df.species.value_counts(dropna=False)

j    1790
v     833
Name: species, dtype: int64

<a id = 'Sex'></a>

# Sex

Next we examine the range and distribution of _sex_ values.

[Back to Top](#TOC)

[Back to Analyze Data](#Analyze)

In [15]:
df.sex.value_counts(dropna=False)

f       1345
m       1250
NaN       23
n          2
f`         1
[m]        1
male       1
Name: sex, dtype: int64

These values should only be _'m' and 'f'_.  Any other values will be removed in the cleaning notebook and excluded from further analyses here.

In [16]:
df = df.loc[df.sex.isin(['m','f'])]
df.sex.value_counts(dropna=False)

f    1345
m    1250
Name: sex, dtype: int64
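
Some of the excluded labels look like typos of valid values. As an illustration only (the cleaning notebook may handle this differently), the sketch below shows one way such labels could be normalised before filtering. The `raw_sex` series and the long-form mapping are assumptions made for the example.

```python
import pandas as pd

# Example labels mirroring the kinds of values seen above
raw_sex = pd.Series(['f', 'm', None, 'n', 'f`', '[m]', 'male'])

cleaned = (raw_sex.str.lower()
                  .str.replace(r'[^a-z]', '', regex=True)    # drop stray punctuation
                  .replace({'male': 'm', 'female': 'f'}))     # hypothetical long-form labels

# Anything that is still not 'm' or 'f' becomes missing
cleaned = cleaned.where(cleaned.isin(['m', 'f']))
print(cleaned)
```

Whether a label such as `[m]` should really be read as `m` is a judgment call that belongs with the field notes, so this is a sketch rather than a recommendation.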

<a id = 'Autotomy'></a>

## Autotomy Status

Here we look at the proportion of individuals in our data who have experienced autotomy.

[Back to Analysis](#Analyze)

[Back to Top](#TOC)

In [17]:
# Count unique lizards per year so that repeat captures within a year are counted once.
# Note: a lizard whose autotomy status changes within a year still appears in both groups.
def yearly_counts(species, autotomized):
    subset = df.loc[(df.species==species)&(df.autotomized==autotomized)]
    return subset.groupby(['year']).liznumber.nunique()

data = []
for species, label in [('j','S. jarrovii'),('v','S. virgatus')]:
    for status, status_label in [(True,'Autotomized'),(False,'Intact')]:
        counts = yearly_counts(species, status)
        # use the grouped years as x so that each bar lines up with its own count
        data.append(go.Bar(x = counts.index, y = counts.values,
                           name = '{} ({})'.format(label, status_label)))
layout = go.Layout(
    title = 'Distribution of Autotomy for Sceloporus jarrovii and S. virgatus 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Year',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Number of Lizards',
        titlefont = dict(
            size = 18))
)
fig = go.Figure(
        data = data,
        layout = layout)
py.iplot(fig, filename = 'Distribution of Autotomy for Sceloporus jarrovii and S. virgatus 2000-2017')
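
The text above speaks of the proportion of individuals that have experienced autotomy, so a tabular version may be useful alongside the plot. The following is a minimal sketch on made-up data. It assumes a lizard should count as autotomized in a given year if any of its captures that year was recorded as autotomized, which is one possible convention and not necessarily the one intended here.

```python
import pandas as pd

# Toy data: lizard 1 is captured twice in 2002
toy = pd.DataFrame({
    'species':     ['j', 'j', 'j', 'j', 'v', 'v'],
    'year':        [2002, 2002, 2002, 2003, 2002, 2002],
    'liznumber':   [1, 1, 2, 1, 3, 4],
    'autotomized': [False, False, True, True, False, False],
})

# One row per lizard per year; autotomized if any capture that year was
per_lizard = (toy.groupby(['species', 'year', 'liznumber'])
                 .autotomized.any()
                 .reset_index())

# The mean of a boolean column is the proportion of True values
proportion = per_lizard.groupby(['species', 'year']).autotomized.mean()
print(proportion)
```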

Let's look at how this distribution changes throughout the years.

<a id = 'resume'></a>

[Back to Top](#TOC)

 <a id = 'Morphometrics'></a>

# Morphometrics

In this section we describe the distributions of various morphometrics.

- [SVL](#SVL)
- [TL](#TL)
- [RTL](#RTL)
- [Mass](#SVL)

[Back to Analyze Data](#Analyze)

[Back to Top](#TOC)

 <a id = 'SVL'></a>

### SVL

Now we examine the range and distribution of svl values by species.

[Back to Morphometrics](#Morphometrics)


We will use the [distribution](#distribution) function to do this and then plot these values.

- [Histogram of SVL](#SVLhist)

In [18]:
print("svl values in the data set range from {} to {} and are distributed across species and sex \
as displayed here:"\
      .format(df.svl.min(), df.svl.max()))
df.groupby(['species','sex']).svl.apply(distribution)

svl values in the data set range from 13.0 to 98.0 and are distributed across species and sex as displayed here:


species,sex,,n,minimum,maximum,median,siqr,mean,stdev
j,f,0,971,13.0,87.0,65.0,9.0,61.993821,13.828328
j,m,0,797,24.0,98.0,73.0,12.5,68.61857,17.68776
v,f,0,374,31.0,66.0,56.0,3.5,55.002674,5.916985
v,m,0,453,28.0,79.0,51.0,3.0,50.441501,4.965745


Let's plot these values. 

<a id = 'SVLhist'></a>

## Histogram of SVL values

[back to Top](#TOC)

[back to Analyze](#Analyze)


In [19]:
femaleSj = go.Histogram(x = df.loc[(df.species=='j')&(df.sex=='f')].svl,name='Sj females')
maleSj = go.Histogram(x = df.loc[(df.species=='j')&(df.sex=='m')].svl,name='Sj males')
femaleSv = go.Histogram(x = df.loc[(df.species=='v')&(df.sex=='f')].svl,name='Sv females')
maleSv = go.Histogram(x = df.loc[(df.species=='v')&(df.sex=='m')].svl,name='Sv males')

data = [maleSj,maleSv,femaleSj,femaleSv]
layout = go.Layout(
    title = 'Histogram of SVL at Capture for CC 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'SVL at time of capture',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Number of Lizards',
        titlefont = dict(
            size = 18)))

fig = go.Figure(
        data = data,
        layout = layout)
py.iplot(fig, filename = 'Histogram of SVL at Capture for CC 2000-2017')

Outliers will be addressed in the Cleaning notebook, but will be removed for the remainder of the analyses here.

In [20]:
# blank out the outlying S. virgatus record; this sets every column in the row to missing
# rather than dropping the row
df.loc[(df.species=='v')&(df.svl==79)] = None
print("svl values in the data set range from {} to {} and are distributed across species and sex \
as displayed here:"\
      .format(df.svl.min(), df.svl.max()))
df.groupby(['species','sex']).svl.apply(distribution)

svl values in the data set range from 13.0 to 98.0 and are distributed across species and sex as displayed here:


species,sex,,n,minimum,maximum,median,siqr,mean,stdev
j,f,0,971,13.0,87.0,65.0,9.0,61.993821,13.828328
j,m,0,797,24.0,98.0,73.0,12.5,68.61857,17.68776
v,f,0,374,31.0,66.0,56.0,3.5,55.002674,5.916985
v,m,0,452,28.0,60.0,51.0,3.0,50.378319,4.785489
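
Assigning `None` to the selected rows blanks them but keeps them in the frame, which is why an all-missing row can show up in later tables. The sketch below contrasts blanking with dropping on a toy frame; the data and variable names are made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({'species': ['v', 'v', 'j'], 'svl': [51.0, 79.0, 73.0]})
outlier = (toy.species == 'v') & (toy.svl == 79)

# Blanking keeps the row but sets every column to missing
blanked = toy.copy()
blanked.loc[outlier] = None

# Dropping removes the row entirely
dropped = toy.loc[~outlier]

print(len(blanked), len(dropped))
```

Group-wise summaries are the same either way because `groupby` skips rows with missing keys, but row counts and any later row-level filters will differ.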


 <a id = 'TL'></a>

### TL

Now we examine the range and distribution of TL values by species.

[Back to Morphometrics](#Morphometrics)


We will use the [distribution](#distribution) function to do this and then plot these values.

- [Histogram of TL](#TLhist)

In [21]:
print("tl values in the data set range from {} to {}; among intact lizards they are \
distributed across species and sex as displayed here:"\
      .format(df.tl.min(), df.tl.max()))
df.loc[df.autotomized==False].groupby(['species','sex']).tl.apply(distribution)

tl values in the data set range from 7.0 to 132.0; among intact lizards they are distributed across species and sex as displayed here:


species,sex,,n,minimum,maximum,median,siqr,mean,stdev
j,f,0,696,10.0,121.0,85.0,13.5,80.25431,19.27663
j,m,0,545,12.0,132.0,98.0,22.0,89.190826,26.190133
v,f,0,307,30.0,83.0,69.0,4.0,68.70684,7.573953
v,m,0,356,20.0,86.0,69.0,5.0,67.148876,8.631927


In [22]:
print("tl values in the data set range from {} to {}; among autotomized lizards they are \
distributed across species and sex as displayed here:"\
      .format(df.tl.min(), df.tl.max()))
df.loc[df.autotomized==True].groupby(['species','sex']).tl.apply(distribution)

tl values in the data set range from 7.0 to 132.0; among autotomized lizards they are distributed across species and sex as displayed here:


species,sex,,n,minimum,maximum,median,siqr,mean,stdev
j,f,0,275,7.0,102.0,67.0,11.0,64.090909,18.89799
j,m,0,252,7.0,125.0,76.0,13.5,73.09127,24.381014
v,f,0,67,10.0,63.0,49.0,5.5,46.432836,10.709984
v,m,0,95,12.0,68.0,49.0,7.25,45.747368,13.922253


 <a id = 'RTL'></a>

### RTL

Now we examine the range and distribution of RTL values by species.

[Back to Morphometrics](#Morphometrics)


We will use the [distribution](#distribution) function to do this and then plot these values.

- [Histogram of RTL](#RTLhist)

In [23]:
print("rtl values in the data set range from {} to {}; among autotomized lizards they are \
distributed across species and sex as displayed here:"\
      .format(df.rtl.min(), df.rtl.max()))
df.loc[df.autotomized==True].groupby(['species','sex']).rtl.apply(distribution)

rtl values in the data set range from 0.0 to 60.0; among autotomized lizards they are distributed across species and sex as displayed here:


species,sex,,n,minimum,maximum,median,siqr,mean,stdev
j,f,0,275,0.0,58.0,19.0,14.25,19.829091,15.997601
j,m,0,252,0.0,60.0,21.0,15.625,20.706349,17.068443
v,f,0,67,0.0,39.0,26.0,12.5,20.776119,13.202182
v,m,0,95,0.0,40.0,17.0,10.75,18.042105,12.764575


<a id = 'Date'></a>

# Date

Next we will look at the distribution of the data points by the month in which they occur.

[Back to Top](#TOC)

[Back to Analyze Data](#Analyze)

In [24]:
pd.to_datetime(df.date).dt.month.apply(monthlit).value_counts(dropna = False)

Jul    1591
Mar     343
Jun     315
May     277
Apr      32
Aug      27
Oct       8
Dec       1
NaN       1
Name: date, dtype: int64

Observations in December and October are odd, so we will inspect these values.

In [25]:
idx_OctDecNan = pd.to_datetime(df.date).dt.month.isin([10,12,np.nan])
# 'svl' is listed once; repeating it produces a duplicate column in the output
df.loc[idx_OctDecNan,['species','toes_orig','svl','date','tl','rtl_orig','mass','location','paint.mark']]

Unnamed: 0,species,toes_orig,svl,date,tl,rtl_orig,mass,location,paint.mark
1281,j,1-13-16,90.0,2010-10-07,120.0,0.0,23.0,3m v bottom rt curved wall in sb,y26c
1288,j,12-19,92.0,2010-10-07,100.0,17.0,21.0,juniper in R island,y27c
1298,j,1-14-18,79.0,2010-10-07,99.0,0.0,12.0,4m v wall v wall v juniper xing,y24c
1352,j,2-6-12-15,86.0,2010-10-07,102.0,28.0,17.0,6m v bottoom bowl right side 4 m up,y22c
1376,v,3-7-15-16,53.0,2010-10-07,69.0,0.0,6.0,on Lizard R,y12a
1395,j,1-6-15-17,72.0,2010-10-07,69.0,35.0,9.0,opp 2m ^ bottom rt curved wall,y23c
1396,j,1-7-11-16,65.0,2010-10-07,89.0,0.0,9.0,bottom R island,y25c.t
1397,j,1-8-11,75.0,2010-10-07,104.0,0.0,14.5,log in sb v CC/CCC,y28c
1398,j,3-10-13-(14)-16,92.0,2010-12-06,112.0,-1.0,18.0,talus^Rwall v talus left side 4m up,w1c
2520,,,,,,,,,


Based on field notes, the month and day for these entries seem to have been reversed somehow. In other words, the day is being interpreted as the month. We will fix this below.

In [26]:
df.loc[idx_OctDecNan,'date'] = pd.to_datetime(df.loc[idx_OctDecNan].date,format='%Y-%d-%m')
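
To make the effect of the `'%Y-%d-%m'` format concrete, here is a small self-contained sketch on toy dates. It uses the two suspect date strings seen above plus one ordinary July date; `toy`, `swapped` and `fixed` are names introduced only for this example.

```python
import pandas as pd

toy = pd.DataFrame({'date': ['2010-10-07', '2010-12-06', '2010-07-15']})

# Rows whose parsed month is October or December are treated as swapped
swapped = pd.to_datetime(toy['date']).dt.month.isin([10, 12])

# Re-parse only the flagged strings with day and month exchanged
fixed = pd.to_datetime(toy['date'])
fixed[swapped] = pd.to_datetime(toy.loc[swapped, 'date'], format='%Y-%d-%m')

print(fixed)
```

Here '2010-10-07' is read as 10 July and '2010-12-06' as 12 June, while the unflagged date is left alone.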

Now let's look at the same distribution.

In [27]:
pd.to_datetime(df.date).dt.month.apply(monthlit).value_counts(dropna = False)

Jul    1599
Mar     343
Jun     316
May     277
Apr      32
Aug      27
NaN       1
Name: date, dtype: int64

In [28]:
pd.to_datetime(df.loc[df.year==2010].date).dt.month.unique()

array([5, 8, 6, 7], dtype=int64)

In [29]:
pd.to_datetime(df.loc[df.year==2010].date).max()

Timestamp('2010-08-19 00:00:00')

The latest date in 2010 now corresponds to the latest capture date in the field notes for that year.

<a id = 'Captures'></a>

# Captures

[Back to Top](#TOC)

[Back to Analyze Data](#Analyze)

Let's take a look at the number of times that lizards have been captured.  To do this, we will group lizards by lizard number and then look at the maximum number of captures for each lizard and finally count the number of lizards that have a given number of captures.

In [30]:
df.groupby(['species','liznumber']).capture.max().reset_index().rename(columns={'capture':'numCaptures'})

Unnamed: 0,species,liznumber,numCaptures
0,j,1.0,2.0
1,j,2.0,4.0
2,j,3.0,3.0
3,j,4.0,8.0
4,j,5.0,3.0
5,j,6.0,2.0
6,j,7.0,2.0
7,j,8.0,5.0
8,j,9.0,3.0
9,j,10.0,3.0
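
The table above covers the first two steps described in this section. The final step, counting how many lizards reach each number of captures, could look like the sketch below. It runs on made-up data and reuses the `numCaptures` column name from the cell above; everything else is illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    'species':   ['j', 'j', 'j', 'j', 'v', 'v'],
    'liznumber': [1, 1, 2, 3, 4, 4],
    'capture':   [1, 2, 1, 1, 1, 2],
})

# Maximum capture number per lizard
max_captures = (toy.groupby(['species', 'liznumber']).capture.max()
                   .reset_index()
                   .rename(columns={'capture': 'numCaptures'}))

# Number of lizards per species that reached each capture count
lizards_per_count = max_captures.groupby('species').numCaptures.value_counts()
print(lizards_per_count)
```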


In [31]:
Sj = go.Histogram(x = df.loc[(df.species=='j')].groupby('liznumber').capture.max(),name='Sj')
Sv = go.Histogram(x = df.loc[(df.species=='v')].groupby('liznumber').capture.max(),name='Sv')

data = [Sj,Sv]
layout = go.Layout(
    title = 'Histogram of Maximum Number of Captures for CC 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Maximum Number of Captures',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Number of Unique Lizards',
        titlefont = dict(
            size = 18)))

fig = go.Figure(
        data = data,
        layout = layout)
py.iplot(fig, filename = 'Histogram of Maximum Number of Captures for CC 2000-2017')

## Reducing the analysis sample by date range and capture

In [32]:
# convert date to pandas datetime
#df.date=pd.to_datetime(df.date)
# limiting months to between May and August
# df = df.loc[(df.date.dt.month>=5) & (df.date.dt.month<=8)]
# limit to first captures
df_first = df.sort_values(by=['liznumber','date'])
df_first = df_first.loc[~df_first.duplicated(subset='liznumber')]
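
The first-capture selection can be checked on a toy frame. The sketch below uses `drop_duplicates`, which keeps the first row per `liznumber` just as the `~duplicated(...)` mask does. It assumes `date` is a true datetime column so that sorting is chronological; in the cell above the conversion is commented out, so that assumption may not hold there.

```python
import pandas as pd

toy = pd.DataFrame({
    'liznumber': [374, 374, 375],
    'date': pd.to_datetime(['2005-07-20', '2002-07-14', '2002-07-14']),
    'svl': [87.0, 66.0, 63.0],
})

# Sort so the earliest capture of each lizard comes first, then keep that row
first = (toy.sort_values(by=['liznumber', 'date'])
            .drop_duplicates(subset='liznumber'))
print(first)
```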

### Reducing data to species and sex of interest

In [33]:
species2keep=['j']
df_first = df_first.loc[df_first.species.isin(species2keep)]
print ("\n{} first-capture entries belong to a species of interest {}"\
       .format(df_first.shape[0],species2keep))
sex2keep=['m','f']
df = df_first.loc[df_first.sex.isin(sex2keep)]
# report the size of the sex-filtered frame (df), not df_first
print ("\n{} first-capture entries belong to a sex category of interest {}"\
       .format(df.shape[0],sex2keep))


1036 first-capture entries belong to a species of interest ['j']

1036 first-capture entries belong to a sex category of interest ['m', 'f']


## Number of lizards (*Sj*) by year and sex

In [34]:
df.groupby('year').sex.value_counts()

year    sex
2000.0  f      84
        m      69
2001.0  f      51
        m      48
2002.0  f      33
        m      30
2003.0  f      39
        m      26
2004.0  f      36
        m      25
2005.0  m      35
        f      30
2007.0  f      47
        m      31
2008.0  m      32
        f      27
2009.0  f      52
        m      48
2010.0  f      20
        m      19
2011.0  f       1
2012.0  f      25
        m      18
2013.0  m      19
        f      14
2014.0  f      10
        m       6
2015.0  m      28
        f      26
2016.0  m      22
        f      14
2017.0  f      39
        m      32
Name: sex, dtype: int64

Pull out all Sj individuals whose toe pattern appears more than once (i.e. recaptures). The CSV export is currently commented out.

In [35]:
# count each toe pattern once, then keep the patterns seen more than once
sj_toes = df.loc[(df.species=='j')& (df.toes!="")& (df.toes!='NA')].toes
toe_counts = sj_toes.value_counts()
multicapToes = toe_counts[toe_counts>1].index.tolist()
# df.loc[df.toes.isin(multicapToes)].sort_values(by=['toes','date']).to_csv('multicaps.csv')

## Maximum Number of Captures

In [36]:
dfF = df.loc[df.sex =='f']
dfM = df.loc[df.sex =='m']

females = go.Histogram(x = dfF.groupby('liznumber').capture.max(),name='females')
males = go.Histogram(x = dfM.groupby('liznumber').capture.max(),name='males')

data = [males,females]
layout = go.Layout(
    title = 'Maximum Number of Captures per Individual 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Maximum Number of Captures',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Number of Lizards',
        titlefont = dict(
            size = 18)))

fig = go.Figure(
        data = data,
        layout = layout)
py.iplot(fig, filename = 'Histogram of Maximum Captures per Individual in Crystal Creek 2000 - 2017')

In [37]:
df.capture.value_counts()

1.0    1032
2.0       4
Name: capture, dtype: int64

<a id = 'Growth'></a>

# Growth

[Back to Top](#TOC)

[Back to Analyze Data](#Analyze)


 <a id = 'SVLgrowth'></a>

## SVL Growth

[Back to Top](#TOC)

[Back to Analyze Data](#Analyze)

What is the body size growth rate?

In [38]:
df.capture.value_counts()

1.0    1032
2.0       4
Name: capture, dtype: int64
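
One way to approach the growth-rate question is to divide the change in SVL by the time elapsed. The sketch below is a rough illustration under two assumptions drawn from the column names and the sample rows shown earlier: that `svl_diff` is the change in SVL relative to the smallest recorded SVL, and that `daysSinceCapture` is the number of days since the initial capture. The new column name `svl_growth_per_day` is hypothetical.

```python
import pandas as pd

# Two captures of the same lizard, mirroring the kind of values shown in df.head()
toy = pd.DataFrame({
    'liznumber':        [374, 374],
    'svl_diff':         [0.0, 21.0],
    'daysSinceCapture': [0, 1102],
})

# Growth is only defined for recaptures, where some time has elapsed
recaptures = toy.loc[toy.daysSinceCapture > 0].copy()
recaptures['svl_growth_per_day'] = recaptures.svl_diff / recaptures.daysSinceCapture
print(recaptures)
```

Because `df` has been reduced to first captures at this point, a calculation like this would need the full capture history rather than the reduced frame.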