This notebook contains the code and output of descriptive analyses for the cleaned 2000-2017 Crystal Creek (CC) dataset.

<a id='TOC'></a>

# Table of Contents
1. [Getting Data](#GetData)
2. [Analyzing Data](#Analyze)


In [1]:
import pandas as pd
import numpy as np
import os,glob

import plotly
import plotly.plotly as py
import plotly.figure_factory as ff
import plotly.graph_objs as go

plotly.tools.set_config_file(world_readable=True)

# increase print limit
pd.options.display.max_rows = 99999
pd.options.display.max_columns = 50

Run the following cell when working from a local folder.

In [2]:
# Source Data
sourceDataPers = 'C:/Users/Christopher/Google Drive/TailDemography/outputFiles'
sourceDataBig = 'S:/Chris/TailDemography/data'

#Output Data paths
outputPers = 'C:/Users/Christopher/Google Drive/TailDemography/outputFiles'
outputBig = 'S:/Chris/TailDemography/data'

In [3]:
os.chdir(sourceDataBig)
mysourcefile = glob.glob('cleaned CC data 2000-2017*')
mysourcefile

['cleaned CC data 2000-2017_2018-09-07 22_47_08.525360.csv',
 'cleaned CC data 2000-2017_2018-09-08 16_41_11.234222.csv',
 'cleaned CC data 2000-2017_2018-09-08 23_41_22.661706.csv',
 'cleaned CC data 2000-2017_2018-09-11 21_19_30.768558.csv',
 'cleaned CC data 2000-2017_2018-11-03 15_57_01.760674.csv',
 'cleaned CC data 2000-2017_2018-11-03 16_03_29.602954.csv',
 'cleaned CC data 2000-2017_2018-12-04 19_39_29.623717.csv']
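
`glob.glob` makes no promise about the order of its results, so taking the last element of a listing like the one above returns the newest file only when the operating system happens to list names in order. Because the timestamps in these file names are zero-padded, sorting the names alphabetically also sorts them chronologically. A minimal sketch, assuming that naming scheme holds for every file; the helper name `newest_cleaned_file` is hypothetical and not part of the original notebook:

```python
import glob
import os


def newest_cleaned_file(folder, pattern='cleaned CC data 2000-2017*'):
    """Return the match whose name sorts last, or None if nothing matches.

    Assumes the zero-padded timestamp in each file name makes alphabetical
    order equal to chronological order.
    """
    matches = sorted(glob.glob(os.path.join(folder, pattern)))
    return matches[-1] if matches else None
```

With the folder defined in In [2], `newest_cleaned_file(sourceDataBig)` would return a full path that can be passed directly to `pd.read_csv`.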

<a id='GetData'></a>

# Get Data
[Top](#TOC)

In [4]:
df=pd.read_csv(mysourcefile[-1])
df.head()

Unnamed: 0,species,toes_orig,date,sex,svl,tl,rtl_orig,mass,paint.mark,location,meters,newRecap,painted,misc,vial,year,rtl,autotomized,new.recap_orig,sighting,review_sex,review_species,review_painted,review_new.recap,review_rtl,toes,toe_pattern,tl_svl,mass_svl,year_diff,svl_diff,initialCaptureDate,liznumber,daysSinceCapture,capture
0,j,1-13-19,2000-03-17,f,52.0,74.0,0.0,4.2,r1c,1falls,,new,,,,2000,0.0,intact,new,,True,False,False,False,False,1-13-19,,1.423077,0.080769,0,0.0,2000-03-17,37,0,1
1,j,1-13-20,2000-03-17,m,56.0,77.0,0.0,5.6,r2c,1falls,,new,,,,2000,0.0,intact,new,,True,False,False,False,False,1-13-20,,1.375,0.1,0,0.0,2000-03-17,512,0,1
2,j,1-14-19,2000-03-17,f,57.0,81.0,0.0,6.6,r3c,wall on rt side v wall at pine xing,,new,,,,2000,0.0,intact,new,,True,False,False,False,False,1-14-19,,1.421053,0.115789,0,0.0,2000-03-17,44,0,1
3,j,1-14-20,2000-03-17,f,57.0,79.0,0.0,5.5,r4c,wall on rt side v wall at pine xing,,new,,,,2000,0.0,intact,new,,True,False,False,False,False,1-14-20,,1.385965,0.096491,0,0.0,2000-03-17,45,0,1
4,j,3-8,2000-03-17,f,82.0,89.0,27.0,17.0,r5c,oak across from bottom wall at pine xing,,recap,,shed since last recapture,,2000,27.0,autotomized,recap,,True,False,False,False,False,3-8,,1.085366,0.207317,0,0.0,2000-03-17,273,0,1


Let's look at how many times each lizard has been captured. To do this, we take the maximum capture number for each lizard and then count how many lizards reach each value.

In [5]:
df.groupby('liznumber').capture.max().value_counts().reset_index()\
.rename(columns={'index':'number of captures','capture':'number of lizards'})

Unnamed: 0,number of captures,number of lizards
0,1,947
1,2,272
2,3,116
3,4,45
4,5,23
5,6,9
6,7,4
7,8,3


<a id='Analyze'></a>

# Analyze Data
[Top](#TOC)

## Reducing the analysis sample by date range and capture

In [6]:
# convert date to pandas datetime
df.date=pd.to_datetime(df.date)
# limiting months to between May and August
# df = df.loc[(df.date.dt.month>=5) & (df.date.dt.month<=8)]
# limit to first captures
df_first = df.sort_values(by=['liznumber','date'])
df_first = df_first.loc[~df_first.duplicated(subset='liznumber')]

### Reducing data to species and sex of interest

In [7]:
species2keep=['j']
df_first = df_first.loc[df_first.species.isin(species2keep)]
print ("\n{} first-capture entries belong to a species of interest {}"\
       .format(df_first.shape[0],species2keep))
sex2keep=['m','f']
# copy so that later column assignments do not act on a view of df_first
df = df_first.loc[df_first.sex.isin(sex2keep)].copy()
# report the size of the sex-filtered frame (df), not df_first
print ("\n{} first-capture entries belong to a sex category of interest {}"\
       .format(df.shape[0],sex2keep))


912 first-capture entries belong to a species of interest ['j']

909 first-capture entries belong to a sex category of interest ['m', 'f']


## Number of lizards (*Sj*) by year and sex

In [8]:
df.year = df.year.astype(int)
df.year = df.year.astype(str)

In [9]:
df.groupby('year').sex.value_counts()

year  sex
2000  f      84
      m      68
2001  f      50
      m      46
2002  f      32
      m      30
2003  f      32
      m      21
2004  f      22
      m      17
2005  m      30
      f      22
2007  f      47
      m      31
2008  m      32
      f      27
2009  f      42
      m      41
2010  f       9
      m       7
2011  f       1
2012  f      22
      m      17
2013  m      16
      f      14
2014  f      11
      m       6
2015  m      28
      f      25
2016  m      22
      f      14
2017  f      26
      m      17
Name: sex, dtype: int64
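
The stacked output above lists only the year and sex combinations that occur, so the single female in 2011 has no matching male row, and within each year the sexes are ordered by count rather than by label. A wide table with one column per sex can be easier to scan. A minimal sketch on toy data (the values are made up for illustration), using `pd.crosstab`:

```python
import pandas as pd

# Toy data (hypothetical values): one row per lizard
toy = pd.DataFrame({
    'year': ['2010', '2010', '2011', '2012', '2012', '2012'],
    'sex':  ['f', 'm', 'f', 'f', 'm', 'm'],
})

# Rows are years, columns are sexes; combinations that never occur count as 0
year_by_sex = pd.crosstab(toy['year'], toy['sex'])
print(year_by_sex)
```

Applied to the notebook's data, the equivalent call would be `pd.crosstab(df.year, df.sex)`.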

In [10]:
df.to_csv('Cleaned Sj data.csv')

Pull out all *Sj* individuals that have been recaptured and write them to CSV.

In [11]:
# compute the toe-ID counts once and reuse them for the >1 mask
toeCounts=df.loc[(df.species=='j')& (df.toes!="")& (df.toes!='NA')].toes.value_counts()
multicapToes=toeCounts[toeCounts>1].index.tolist()
df.loc[df.toes.isin(multicapToes)].sort_values(by=['toes','date']).to_csv('multicaps.csv')

## Maximum Number of Captures

In [12]:
dfF = df.loc[df.sex =='f']
dfM = df.loc[df.sex =='m']

In [13]:
females = go.Histogram(x = dfF.groupby('liznumber').capture.max(),name='females')
males = go.Histogram(x = dfM.groupby('liznumber').capture.max(),name='males')

data = [males,females]
layout = go.Layout(
    title = 'Maximum Number of Captures per Individual 2000-2017',
    titlefont = dict(
        size = 20),
    xaxis = dict(
        dtick = 1,
        title = 'Maximum Number of Captures',
        titlefont = dict(
            size = 18)),
    yaxis = dict(
        title = 'Number of Lizards',
        titlefont = dict(
            size = 18)))

fig = go.Figure(
        data = data,
        layout = layout)
py.iplot(fig, filename = 'Histogram of Maximum Captures per Individual in Crystal Creek 2000 - 2017')

In [16]:
df.capture.value_counts()

1    909
Name: capture, dtype: int64
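
A single capture value of 1 is what one would expect if `df` holds only each lizard's first capture, which is how In [6] and In [7] appear to build it. In that case the per-lizard maximum taken from `df` is always 1, and a histogram of maximum captures would need the full capture history instead. A minimal sketch of the difference on toy data (values are hypothetical):

```python
import pandas as pd

# Toy capture history (hypothetical values): one row per capture event
captures = pd.DataFrame({
    'liznumber': [1, 1, 1, 2, 3, 3],
    'sex':       ['f', 'f', 'f', 'm', 'f', 'f'],
    'capture':   [1, 2, 3, 1, 1, 2],
})

# Keep only the first row per lizard, mirroring the first-capture reduction
first_only = captures.drop_duplicates(subset='liznumber')

# Maximum taken from first captures alone: always 1
max_from_first = first_only.groupby('liznumber').capture.max()

# Maximum taken from the full history: the real number of captures
max_from_all = captures.groupby('liznumber').capture.max()

print(max_from_first.tolist())
print(max_from_all.tolist())
```

One way to keep both pieces of information would be to compute the per-lizard maximum on the full frame before reducing to first captures, then merge it back on `liznumber`.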