# Terrestrial Righting

### Christopher Agard, George Middendorf

# Table of Contents

- [Introduction](#Introduction)
- [Set Up Python](#Set-Up-Python)
- [Functions](#Functions)
- [Get Data](#Get-Data)
- [Clean Data](#Clean-Data)
- [Descriptive Analyses](#Descriptive-Analyses)

[RESUME](#Resume)

## Introduction

The goal of this paper is to analyze terrestrial righting speed in _Sceloporus jarrovii_ and _S. virgatus_.

### Plan
- Get raw data from each year (Done):
    - 2007
    - 2008
    - 2010
    - 2011
    - 2012?
    - 2015
    - 2016
- Combine and clean data (Doing)
- Analyze data
        
[Table of Contents](#Table-of-Contents)

## Set Up Python
[Table of Contents](#Table-of-Contents)

In [73]:
import pandas as pd
import numpy as np
import scipy.stats as ss
import os, glob, logging
import plotly
import chart_studio.plotly as py
import plotly.figure_factory as ff
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

init_notebook_mode(connected=True)


In [2]:
pd.options.display.max_columns = 100
pd.options.display.max_colwidth = 1500

## Functions
[Table of Contents](#Table-of-Contents)
- [review](#review)

### review
[Functions](#Functions) [Table of Contents](#Table-of-Contents)

In [3]:
def review(x, pretty = True):
    """Return the unique item types, the unique values, and a status flag ('Check' if more than one type is present, otherwise 'OK') for a series or list"""
    uniqueTypes = {type(val) for val in x}
    if len(uniqueTypes)>1:
        typeStatus = "Check"
    else:
        typeStatus = "OK"
    uniqueVal = {val for val in x}
    if pretty:
        res = ("Unique types include the following: {}".format(uniqueTypes),
               "Unique values include:{}".format(uniqueVal),typeStatus)
    else:
        res = (uniqueTypes,uniqueVal,typeStatus)
    return res
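As a quick illustration of what `review` reports, the sketch below repeats the function so that the block runs on its own and applies it to a small made-up series that mixes numbers with a string, similar to the `'5?'` entry that turns up later in _RTL_. The sample values are hypothetical.

```python
import pandas as pd

def review(x, pretty=True):
    """Return the unique item types, the unique values, and a type status flag."""
    uniqueTypes = {type(val) for val in x}
    typeStatus = "Check" if len(uniqueTypes) > 1 else "OK"
    uniqueVal = set(x)
    if pretty:
        return ("Unique types include the following: {}".format(uniqueTypes),
                "Unique values include:{}".format(uniqueVal), typeStatus)
    return (uniqueTypes, uniqueVal, typeStatus)

# Hypothetical sample: an object column mixing integers with a string
mixed = pd.Series([0, 7, '5?'], dtype=object)
types, values, status = review(mixed, pretty=False)
print(status)   # "Check" signals that the column needs a closer look
```

With `pretty=False` the raw sets are returned, which is convenient when the result feeds further code rather than a printed report.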

## Get Data
[Table of Contents](#Table-of-Contents)

In [4]:
# setting locations
gandolf = {'dropboxsource':'C:/Users/craga/Dropbox/Papers/Righting Paper/Righting data from drive/',
          'googledrivesource':'C:/Users/craga/Google Drive/TailDemography/Righting Files/'}

source = gandolf

In [5]:
dtafiles = glob.glob(source['dropboxsource']+'*dta')
excelfiles = glob.glob(source['googledrivesource']+'*xls')
csvfiles = glob.glob(source['dropboxsource']+'*csv')+glob.glob(source['googledrivesource']+'*csv')
files = dtafiles + excelfiles + csvfiles
files

['C:/Users/craga/Dropbox/Papers/Righting Paper/Righting data from drive\\Combined Righting (2008-2011)(d).dta',
 'C:/Users/craga/Dropbox/Papers/Righting Paper/Righting data from drive\\Combined Righting (2008-2011).dta',
 'C:/Users/craga/Dropbox/Papers/Righting Paper/Righting data from drive\\Terrestrial Righting Analyses Revisited with combined data including 2012.dta',
 'C:/Users/craga/Dropbox/Papers/Righting Paper/Righting data from drive\\Terrestrial Righting combined.dta',
 'C:/Users/craga/Google Drive/TailDemography/Righting Files\\2007 righting stats play sheet alpha 14iii10 (good one).xls',
 'C:/Users/craga/Google Drive/TailDemography/Righting Files\\2015 Data (video confirmed outcomes).xls',
 'C:/Users/craga/Google Drive/TailDemography/Righting Files\\Proccessing Information (22ix2010).xls',
 'C:/Users/craga/Google Drive/TailDemography/Righting Files\\Schedule, Lizard Tracking, and Data capture sheet (2016).xls',
 'C:/Users/craga/Dropbox/Papers/Righting Paper/Righting data fro

In [6]:
df2008_12 = pd.read_csv('C:/Users/craga/Dropbox/Papers/Righting Paper/Righting data from drive\\Terrestrial Righting combined2.csv')
df2008_12.groupby(['Year']).head()

Unnamed: 0,Year,Trial,Drop,Tail Condition (1=Intact; 2=50% Autotomy; 3= 25% autotomy; 4=75% autotomy; 5=Regrown),Paint mark,Lizard,Time to right post fall,Species (Sj=1; Sv=2),Sex (1=M; 0=2=F),SVL (mm),TL (mm),RTL (mm),RemTL (mm),Mass (g),RemMass (g),righting Temp,tl/svl
0,2007,1,2.0,1,wI1b,1,0.17,1,0,76,105,0,0.0,16.5,0.0,,1.381579
1,2007,1,1.0,1,wJ14b,5,0.32,1,1,91,116,0,0.0,26.5,0.0,,1.274725
2,2007,2,1.0,1,wK2b,8,0.13,1,1,79,108,0,0.0,16.0,0.0,,1.367089
3,2007,1,1.0,1,wK4b,9,0.13,1,0,79,100,0,0.0,14.5,0.0,,1.265823
4,2007,2,2.0,1,wL3b,11,0.21,1,0,83,110,0,0.0,16.5,0.0,,1.325301
12,2008,1,,1,1f,34,0.15,2,0,48,60,0,0.0,3.75,0.0,,1.25
13,2008,1,,1,2f,35,0.19,2,0,49,67,0,0.0,4.5,0.0,,1.367347
14,2008,1,1.0,1,B5,36,0.17,2,0,56,65,0,0.0,6.75,0.0,,1.160714
15,2008,1,,1,C1,37,0.11,2,0,62,63,0,0.0,4.25,0.0,,1.016129
16,2010,2,2.0,1,p5b,38,0.22,2,0,61,75,0,0.0,7.0,0.0,,1.229508


In [7]:
df2015 = pd.read_excel('C:/Users/craga/Google Drive/TailDemography/Righting Files\\2015 Data (video confirmed outcomes).xls')
df2015.head()

Unnamed: 0,Year,Trial,Treatment,Lizard,Drop Outcome,Species,sex,SVL,TL,RTL,RemTL,Mass,Mass2 (post-autotomy),New,toes,paintmark,Tail Vial,toe vial,Flag,Site,Notes,Random number,ART1,ART2,Drop Outcome.1,Video File Name,TRT1,TRT2,Unnamed: 28,if sv,if sj
0,2015.0,1.0,,L2xv2,,0.0,0.0,,,,,,,1.0,,o..b,,,,1.0,autotomized tail,,,,,,,,,<=0.8=4,<=0.5=4
1,2015.0,1.0,,L1xv2,,1.0,0.0,79.0,68.0,42.0,,17.5,,0.0,2-5-12-18,o.b.t,,,,1.0,,,,,,,,,,,
2,2015.0,1.0,,L1xv1,,1.0,0.0,,,,,,,1.0,,o:b,,,,1.0,tail has been broken; did not keep,,,,,,,,,,
3,2015.0,1.0,,L2xv3,,0.0,1.0,57.0,75.0,0.0,,7.0,,1.0,,o+b,,,R outcrop 20m v top site left side,0.0,did not cut toes,,,,,,,,,,
4,2015.0,1.0,,L1xv3,,1.0,1.0,76.0,76.0,20.0,,14.7,,0.0,11,o+b+t,,,,1.0,,,,,,,,,,,


In [8]:
df2016 = pd.read_excel('C:/Users/craga/Google Drive/TailDemography/Righting Files\\Schedule, Lizard Tracking, and Data capture sheet (2016).xls'
                       ,sheet_name='Lizard Data')
df2016.head()

Unnamed: 0,Species,New,toes,paintmark,SVL,TL,RTL,RemTL,Mass,Mass2 (post-autotomy),sex,Tail Vial,toe vial,Flag,Site,Notes,Treatment,Random Assignment number,ART1,ART2,Video T/S,Additional Video T/S,TRT1,TRT2,TRT3,Unnamed: 25,Unnamed: 26,Sv,Sj,prop complete = 00.97,Unnamed: 30
0,sv,recap,1-12-20,w1c,55,62,7,,5.4,,M,,,7m ^bottom site,^CC,kink in bottom on T @33,0.0,0.276719,,,,,,,,,1.0,0.0,0.0,,
1,sv,new,,w.b2c,52,72,0,32.0,4.7,0.038,M,CAT16-1,,,^CC,,2.0,0.766538,,,DSC_2649/20150710_053420,,0.34,,,,2.0,0.714286,0.0,,
2,sv,recap,1-13-18,w3c,54,53,19,,5.8,,F,,,4^ w.b2c,,,,,,,,,,,,,,,,,
3,sv,new,2-9-15,w4c,46,63,0,35.0,2.8,0.031,M,CAT16-2,C16-7,2m v opposite stacked wall v stump,^CC,,2.0,0.462795,,,DSC_2649/20150710_053420,,0.34,,,,3.0,0.0,0.1875,,
4,sv,recap,1-15-19,w5c,60,50,14,,6.2,,F,,,bottom rock wall v S-curve,,[photo]s,,,,,,,,,,,,,,,


## Clean Data
- [Cleaning 2008-2012](#Cleaning-2008-2012)
- [Cleaning 2015](#Cleaning-2015)
- [Cleaning 2016](#Cleaning-2016)

[Table of Contents](#Table-of-Contents)

### Cleaning 2008-2012

- [Cleaning Analysis Columns 2007-2012](#Cleaning-Analysis-Columns-2007-2012)


[Cleaning 2008-2012](#Cleaning-2008-2012) [Clean Data](#Clean-Data) [Table of Contents](#Table-of-Contents)

#### Cleaning Analysis Columns 2007-2012

Renaming columns (for example, _Year_ to _year_) to match the other data sources.

[Cleaning 2008-2012](#Cleaning-2008-2012) [Clean Data](#Clean-Data) [Table of Contents](#Table-of-Contents)

In [9]:
df2008_12 = df2008_12.rename(columns={'Tail Condition (1=Intact; 2=50% Autotomy; 3= 25% autotomy; 4=75% autotomy; 5=Regrown)':'Treatment',
                         'Time to right post fall':'TRTmin','Species (Sj=1; Sv=2)':'Species','Sex (1=M; 0=2=F)':'sex',
                         'SVL (mm)':'SVL','TL (mm)':'TL','RTL (mm)':'RTL','RemTL (mm)':'RemTL','Mass (g)':'Mass',
                          'RemMass (g)':'RemMass','Paint mark':'paintmark','Year':'year'})
df2008_12.groupby('year').head()

Unnamed: 0,year,Trial,Drop,Treatment,paintmark,Lizard,TRTmin,Species,sex,SVL,TL,RTL,RemTL,Mass,RemMass,righting Temp,tl/svl
0,2007,1,2.0,1,wI1b,1,0.17,1,0,76,105,0,0.0,16.5,0.0,,1.381579
1,2007,1,1.0,1,wJ14b,5,0.32,1,1,91,116,0,0.0,26.5,0.0,,1.274725
2,2007,2,1.0,1,wK2b,8,0.13,1,1,79,108,0,0.0,16.0,0.0,,1.367089
3,2007,1,1.0,1,wK4b,9,0.13,1,0,79,100,0,0.0,14.5,0.0,,1.265823
4,2007,2,2.0,1,wL3b,11,0.21,1,0,83,110,0,0.0,16.5,0.0,,1.325301
12,2008,1,,1,1f,34,0.15,2,0,48,60,0,0.0,3.75,0.0,,1.25
13,2008,1,,1,2f,35,0.19,2,0,49,67,0,0.0,4.5,0.0,,1.367347
14,2008,1,1.0,1,B5,36,0.17,2,0,56,65,0,0.0,6.75,0.0,,1.160714
15,2008,1,,1,C1,37,0.11,2,0,62,63,0,0.0,4.25,0.0,,1.016129
16,2010,2,2.0,1,p5b,38,0.22,2,0,61,75,0,0.0,7.0,0.0,,1.229508


### Cleaning 2015
- [Identifying Rows for Analysis 2015](#Identifying-Rows-for-Analysis-2015)
- [Creating Columns 2015](#Creating-Columns-2015)
- [Cleaning Analysis Columns 2015](#Cleaning-Analysis-Columns-2015)


In [10]:
df2015.head()

Unnamed: 0,Year,Trial,Treatment,Lizard,Drop Outcome,Species,sex,SVL,TL,RTL,RemTL,Mass,Mass2 (post-autotomy),New,toes,paintmark,Tail Vial,toe vial,Flag,Site,Notes,Random number,ART1,ART2,Drop Outcome.1,Video File Name,TRT1,TRT2,Unnamed: 28,if sv,if sj
0,2015.0,1.0,,L2xv2,,0.0,0.0,,,,,,,1.0,,o..b,,,,1.0,autotomized tail,,,,,,,,,<=0.8=4,<=0.5=4
1,2015.0,1.0,,L1xv2,,1.0,0.0,79.0,68.0,42.0,,17.5,,0.0,2-5-12-18,o.b.t,,,,1.0,,,,,,,,,,,
2,2015.0,1.0,,L1xv1,,1.0,0.0,,,,,,,1.0,,o:b,,,,1.0,tail has been broken; did not keep,,,,,,,,,,
3,2015.0,1.0,,L2xv3,,0.0,1.0,57.0,75.0,0.0,,7.0,,1.0,,o+b,,,R outcrop 20m v top site left side,0.0,did not cut toes,,,,,,,,,,
4,2015.0,1.0,,L1xv3,,1.0,1.0,76.0,76.0,20.0,,14.7,,0.0,11,o+b+t,,,,1.0,,,,,,,,,,,


#### Identifying Rows for Analysis 2015

We check the video notes to determine whether any runs need to be excluded, and we introduce a column, _analyze_, that identifies the rows to be used in subsequent analyses.

[Cleaning 2015](#Cleaning-2015) [Clean Data](#Clean-Data) [Table of Contents](#Table-of-Contents)

In [11]:
df2015['analyze'] = True

In [12]:
df2015.loc[df2015['Unnamed: 28'].notna(),['TRT1','TRT2','Unnamed: 28']]

Unnamed: 0,TRT1,TRT2,Unnamed: 28
8,0.33,,first or second?
9,0.4,,this was an SJ not sv
23,0.23,,Do not use questionable start given position
24,0.36,,do not use contact during righting event
29,0.16,,Do not questionable start given position
34,0.43,,multiple start and stop and half-turns (exclude?)
50,0.36,,90% sure of paint mark; should be f sv with orange badge


We will set _analyze_ to False for any row whose note indicates that the trial was questionable.

In [13]:
df2015.loc[(df2015['Unnamed: 28'].notna())&(df2015['Unnamed: 28'].str.contains('|'.join(['Do not','exclude'])))
                                            ,['TRT1','TRT2','Unnamed: 28']]

Unnamed: 0,TRT1,TRT2,Unnamed: 28
23,0.23,,Do not use questionable start given position
29,0.16,,Do not questionable start given position
34,0.43,,multiple start and stop and half-turns (exclude?)


In [14]:
df2015.loc[(df2015['Unnamed: 28'].notna())&(df2015['Unnamed: 28'].str.contains('|'.join(['Do not','exclude'])))
                                            ,['analyze']]=False
#label empty TRT rows as analyze false
df2015.loc[(df2015.TRT1.isna())&(df2015.TRT2.isna()),'analyze']=False
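One detail worth checking: the note for row 24 begins with a lowercase "do not", so the case-sensitive pattern `'Do not'` does not match it and the row stays flagged for analysis. Whether that row should be excluded is a judgment call for the authors. The sketch below only shows how a case-insensitive match could be written, using a toy frame that reuses a few of the notes shown above. It is an assumption about the intent, not what the cell above does, and adopting it would change which rows are kept.

```python
import pandas as pd

# Toy frame reusing a few notes from the output above; row 60 has no note
notes = pd.DataFrame({
    'TRT1': [0.33, 0.23, 0.36, 0.43, 0.30],
    'Unnamed: 28': ['first or second?',
                    'Do not use questionable start given position',
                    'do not use contact during righting event',
                    'multiple start and stop and half-turns (exclude?)',
                    None],
}, index=[8, 23, 24, 34, 60])

# case=False ignores capitalisation; na=False treats missing notes as "no match",
# so a separate notna() test is not needed
flagged = notes['Unnamed: 28'].str.contains('do not|exclude', case=False, na=False)
notes['analyze'] = ~flagged
print(notes.index[flagged].tolist())
```

Passing `na=False` also keeps the mask strictly boolean, which avoids combining a boolean mask with an object-typed one.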

#### Creating Columns 2015

Here we create two new columns: _year_, and _TRTmin_, which holds the fastest righting time.

In [15]:
df2015['year']=2015

In [16]:
df2015.loc[(df2015.analyze)&(df2015.TRT1.isna()),'TRT1'] = df2015.loc[(df2015.analyze)&(df2015.TRT1.isna()),'TRT2']
df2015['TRTmin'] = np.nan
df2015.loc[df2015.analyze,'TRTmin'] = df2015.loc[df2015.analyze,['TRT1','TRT2']].apply(min, axis = 1)
df2015.loc[df2015.analyze].TRTmin

6     0.43
7     0.26
8     0.33
9     0.40
10    0.30
11    0.36
12    0.36
13    0.33
14    0.30
15    0.40
16    0.26
17    0.26
19    0.40
20    0.36
21    0.36
22    0.20
24    0.36
25    0.30
27    0.23
28    0.46
30    0.40
31    0.16
32    0.43
35    0.33
36    0.33
37    0.20
38    0.26
39    0.40
41    0.30
50    0.36
56    0.36
69    0.23
Name: TRTmin, dtype: float64
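As an aside, pandas can take the row-wise minimum directly. `DataFrame.min(axis=1)` skips missing values by default, so a sketch like the following would give the fastest time without first copying _TRT2_ into an empty _TRT1_. The values are hypothetical and the frame name `trials` is only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical righting times; NaN marks a trial that was not recorded
trials = pd.DataFrame({'TRT1': [0.43, np.nan, 0.40, np.nan],
                       'TRT2': [np.nan, 0.26, 0.33, np.nan]})

# min(axis=1) ignores NaN within a row; a row with no times at all stays NaN
trials['TRTmin'] = trials[['TRT1', 'TRT2']].min(axis=1)
print(trials)
```

The same call works unchanged with three or more trial columns.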

#### Cleaning Analysis Columns 2015

Here we clean columns to be used in the actual analysis of Terrestrial Righting Times.  The following columns will be used:
Species, Sex, SVL, TL, RTL, Mass, and TRTmin.

[Cleaning 2015](#Cleaning-2015) [Clean Data](#Clean-Data) [Table of Contents](#Table-of-Contents)

In [17]:
df2015.loc[df2015.analyze,['Species','sex','SVL','TL','RTL','Mass','TRTmin']].apply(review,0)

Species                                                                                                                                    (Unique types include the following: {<class 'float'>}, Unique values include:{0.0, 1.0, nan}, OK)
sex                                                                                                                                        (Unique types include the following: {<class 'float'>}, Unique values include:{0.0, 1.0, nan}, OK)
SVL        (Unique types include the following: {<class 'float'>}, Unique values include:{nan, 46.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 59.0, 60.0, 61.0, 65.0, 66.0, 68.0, 69.0, 70.0, 72.0, 74.0, 77.0, 78.0, 79.0, 80.0, 84.0}, OK)
TL          (Unique types include the following: {<class 'float'>}, Unique values include:{nan, 61.0, 64.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 75.0, 76.0, 89.0, 90.0, 92.0, 93.0, 94.0, 95.0, 96.0, 106.0, 107.0, 111.0, 114.0, 118.0}, OK)
RTL                                             

It looks like we need to take a closer look at several columns that contain _nan_ values, and at **_RTL_**, which has a mix of item types.  Let's look at [Species](#Species-2015) first to see whether resolving it addresses the _nan_ values in the other columns.  Then we will address [RTL](#RTL-2015) if need be.

[Cleaning Analysis Columns 2015](#Cleaning-Analysis-Columns-2015) [Cleaning 2015](#Cleaning-2015) [Clean Data](#Clean-Data) [Table of Contents](#Table-of-Contents)

##### Species 2015
[Cleaning Analysis Columns 2015](#Cleaning-Analysis-Columns-2015) [Cleaning 2015](#Cleaning-2015) [Clean Data](#Clean-Data) [Table of Contents](#Table-of-Contents)

In [18]:
review(df2015.loc[df2015.analyze].Species)

("Unique types include the following: {<class 'float'>}",
 'Unique values include:{0.0, 1.0, nan}',
 'OK')

In [19]:
# Since there are nan values among values we will analyze, we will inspect these rows.
df2015.loc[(df2015.analyze)&(df2015.Species.isna())]

Unnamed: 0,Year,Trial,Treatment,Lizard,Drop Outcome,Species,sex,SVL,TL,RTL,RemTL,Mass,Mass2 (post-autotomy),New,toes,paintmark,Tail Vial,toe vial,Flag,Site,Notes,Random number,ART1,ART2,Drop Outcome.1,Video File Name,TRT1,TRT2,Unnamed: 28,if sv,if sj,analyze,year,TRTmin
69,,,,,,,,,,,,,,,,,,,,,,,,,,MOV024,0.23,,,sj with white mark on T,,True,2015,0.23


For now we will exclude this row from analysis.

In [20]:
df2015.loc[(df2015.analyze)&(df2015.Species.isna()),'analyze'] = False

Let's rerun _review_ on all analysis columns to see whether this has addressed the _nan_ values in the other columns.

In [21]:
df2015.loc[df2015.analyze,['Species','sex','SVL','TL','RTL','Mass','RemTL','Mass2 (post-autotomy)','TRTmin']].apply(review,0)

Species                                                                                                                                                                      (Unique types include the following: {<class 'float'>}, Unique values include:{0.0, 1.0}, OK)
sex                                                                                                                                                                          (Unique types include the following: {<class 'float'>}, Unique values include:{0.0, 1.0}, OK)
SVL                                          (Unique types include the following: {<class 'float'>}, Unique values include:{46.0, 49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0, 59.0, 60.0, 61.0, 65.0, 66.0, 68.0, 69.0, 70.0, 72.0, 74.0, 77.0, 78.0, 79.0, 80.0, 84.0}, OK)
TL                                            (Unique types include the following: {<class 'float'>}, Unique values include:{61.0, 64.0, 66.0, 67.0, 68.0, 69.0, 70.0, 71.0, 75.0, 76.0, 89.0, 90.0, 92

It looks like excluding that row addressed the issue.  We still need to take a closer look at [RTL](#RTL-2015) to address the multiple item types found in that column.

[Cleaning Analysis Columns 2015](#Cleaning-Analysis-Columns-2015) [Cleaning 2015](#Cleaning-2015) [Clean Data](#Clean-Data) [Table of Contents](#Table-of-Contents)

##### RTL 2015
[Cleaning Analysis Columns 2015](#Cleaning-Analysis-Columns-2015) [Cleaning 2015](#Cleaning-2015) [Clean Data](#Clean-Data) [Table of Contents](#Table-of-Contents)

In [22]:
df2015.loc[(df2015.analyze)&(df2015.RTL=='5?')]

Unnamed: 0,Year,Trial,Treatment,Lizard,Drop Outcome,Species,sex,SVL,TL,RTL,RemTL,Mass,Mass2 (post-autotomy),New,toes,paintmark,Tail Vial,toe vial,Flag,Site,Notes,Random number,ART1,ART2,Drop Outcome.1,Video File Name,TRT1,TRT2,Unnamed: 28,if sv,if sj,analyze,year,TRTmin
15,2015.0,1.0,3.0,L1xv10,1.0,1.0,0.0,77.0,106.0,5?,83.0,15.3,14.4,1.0,,o19b,CT-20-15; CT-21-15,,C11,1.0,unsure if RTL is 5 or 0 check TL/SVL index to be sure and treat as 0 until then; hissed w/ mouth open when attempting to noose in LAHF for mass measuring; I was using my fingers to close the loop and my fingers were close to its head when the lizard hissed,0.490889,1,--,1.0,MOV03A,0.4,,,,,True,2015,0.4


Based on the _Notes_ column, which says to treat the value as 0 until the TL/SVL index can be checked, we will set RTL to 0.

In [23]:
df2015.loc[(df2015.analyze)&(df2015.RTL=='5?'),"RTL"]= 0
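Replacing the single string does not by itself change the column's dtype, so _RTL_ may remain an object column. A minimal sketch, assuming a numeric column is wanted downstream, is to locate entries that fail to parse with `pd.to_numeric(..., errors='coerce')` and then convert once they are resolved. The series below is hypothetical.

```python
import pandas as pd

# Hypothetical RTL values with one uncertain entry recorded as text
rtl = pd.Series([42.0, 0.0, '5?', 20.0], dtype=object)

# errors='coerce' turns anything that cannot be parsed into NaN
as_number = pd.to_numeric(rtl, errors='coerce')
suspect = rtl[as_number.isna() & rtl.notna()]   # entries that failed to parse

# Apply the decision made above ('5?' is treated as 0), then convert the column
rtl_clean = pd.to_numeric(rtl.mask(rtl == '5?', 0))
print(suspect.tolist(), rtl_clean.dtype)
```

Listing `suspect` first makes every non-numeric entry visible before any value is overwritten.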

For analysis we will use the object _df2015Analysis_.

In [24]:
df2015Analysis = df2015.loc[df2015.analyze,['paintmark','Species','sex','SVL','TL','RTL','RemTL','Treatment',
                                            'Mass2 (post-autotomy)','TRTmin','year']]\
.rename(columns={'Mass2 (post-autotomy)':'RemMass'})
df2015Analysis.head()

Unnamed: 0,paintmark,Species,sex,SVL,TL,RTL,RemTL,Treatment,RemMass,TRTmin,year
6,o10b,1.0,1.0,68.0,90.0,0,25.0,3.0,,0.43,2015
7,o11b,1.0,1.0,79.0,114.0,0,28.0,4.0,,0.26,2015
8,o12b.t,1.0,0.0,72.0,96.0,0,24.0,4.0,,0.33,2015
9,o13b+t,1.0,1.0,68.0,93.0,0,73.0,3.0,10.3,0.4,2015
10,o14b,0.0,1.0,53.0,61.0,0,18.0,4.0,,0.3,2015


### Cleaning 2016
- [Identifying Rows for Analysis 2016](#Identifying-Rows-for-Analysis-2016)
- [Creating Columns 2016](#Creating-Columns-2016)
- [Cleaning Analysis Columns 2016](#Cleaning-Analysis-Columns-2016)

[Clean Data](#Clean-Data) [Table of Contents](#Table-of-Contents)

#### Identifying Rows for Analysis 2016

We check the video notes to determine whether any runs need to be excluded, and we introduce a column, _analyze_, that identifies the rows to be used in subsequent analyses.

[Cleaning 2016](#Cleaning-2016) [Clean Data](#Clean-Data) [Table of Contents](#Table-of-Contents)

In [25]:
df2016.head()

Unnamed: 0,Species,New,toes,paintmark,SVL,TL,RTL,RemTL,Mass,Mass2 (post-autotomy),sex,Tail Vial,toe vial,Flag,Site,Notes,Treatment,Random Assignment number,ART1,ART2,Video T/S,Additional Video T/S,TRT1,TRT2,TRT3,Unnamed: 25,Unnamed: 26,Sv,Sj,prop complete = 00.97,Unnamed: 30
0,sv,recap,1-12-20,w1c,55,62,7,,5.4,,M,,,7m ^bottom site,^CC,kink in bottom on T @33,0.0,0.276719,,,,,,,,,1.0,0.0,0.0,,
1,sv,new,,w.b2c,52,72,0,32.0,4.7,0.038,M,CAT16-1,,,^CC,,2.0,0.766538,,,DSC_2649/20150710_053420,,0.34,,,,2.0,0.714286,0.0,,
2,sv,recap,1-13-18,w3c,54,53,19,,5.8,,F,,,4^ w.b2c,,,,,,,,,,,,,,,,,
3,sv,new,2-9-15,w4c,46,63,0,35.0,2.8,0.031,M,CAT16-2,C16-7,2m v opposite stacked wall v stump,^CC,,2.0,0.462795,,,DSC_2649/20150710_053420,,0.34,,,,3.0,0.0,0.1875,,
4,sv,recap,1-15-19,w5c,60,50,14,,6.2,,F,,,bottom rock wall v S-curve,,[photo]s,,,,,,,,,,,,,,,


In [26]:
df2016['analyze'] = True

In [27]:
df2016.loc[df2016['Unnamed: 25'].notna(),['TRT1','TRT2','TRT3','Unnamed: 25']]

Unnamed: 0,TRT1,TRT2,TRT3,Unnamed: 25
7,0.36,,,long contact on 1st trial
9,0.59,,,need to think about how we count the start of righting events like this?
17,0.33,0.29,,pushed off of hand with tail during 1st trial
21,,,0.11,Do not use; made contact during all attempts to right
23,0.28,,,Questionable contact during start of righting event
31,,,,Do not use; made contact during all attempts to right
32,0.13,,,pushed off of hand with tail during 1st trial
33,0.38,0.54,,TRT1 was with an intact tail; TRT2 was with 25% autotomy
43,0.19,0.29,,TRT2 was recorded after a brief chase. Use TRT1


We will set _analyze_ to False for any row whose note indicates that the trial was questionable.

#### TRT1
Here we exclude the first trial (_TRT1_) where the notes in the column _Unnamed: 25_ flag it.

In [28]:
df2016.loc[(df2016['Unnamed: 25'].notna())&(df2016['Unnamed: 25'].str.contains('|'.join(['1st'])))
                                            ,['TRT1','TRT2','TRT3','Unnamed: 25']]

Unnamed: 0,TRT1,TRT2,TRT3,Unnamed: 25
7,0.36,,,long contact on 1st trial
17,0.33,0.29,,pushed off of hand with tail during 1st trial
32,0.13,,,pushed off of hand with tail during 1st trial


In [29]:
df2016.loc[(df2016['Unnamed: 25'].notna())&(df2016['Unnamed: 25'].str.contains('|'.join(['1st'])))
                                            ,['TRT1']]=np.nan
df2016.loc[(df2016['Unnamed: 25'].notna())&(df2016['Unnamed: 25'].str.contains('|'.join(['Use TRT1'])))
                                            ,['TRT2','TRT3']]=np.nan

In [30]:
df2016.loc[(df2016['Unnamed: 25'].notna())
           &(df2016['Unnamed: 25']
             .str.contains('|'.join(['Do not','exclude','Questionable',
                                     'TRT1 was with an intact tail; TRT2 was with 25% autotomy',
                                     'need to think about how we count the start of righting events like this?'])))
                                            ,['TRT1','TRT2','TRT3','Unnamed: 25']]

Unnamed: 0,TRT1,TRT2,TRT3,Unnamed: 25
9,0.59,,,need to think about how we count the start of righting events like this?
21,,,0.11,Do not use; made contact during all attempts to right
23,0.28,,,Questionable contact during start of righting event
31,,,,Do not use; made contact during all attempts to right
33,0.38,0.54,,TRT1 was with an intact tail; TRT2 was with 25% autotomy


In [31]:
df2016.loc[(df2016['Unnamed: 25'].notna())
           &(df2016['Unnamed: 25']
             .str.contains('|'.join(['Do not','exclude','Questionable',
                                     'TRT1 was with an intact tail; TRT2 was with 25% autotomy',
                                     'need to think about how we count the start of righting events like this?'])))
           ,['analyze']]=False
#label empty TRT rows as analyze false
df2016.loc[(df2016.TRT1.isna())&(df2016.TRT2.isna())&(df2016.TRT3.isna()),'analyze']=False

#### Remaining entries with notes
Here we inspect the remaining trials that still have notes in the column _Unnamed: 25_.

In [32]:
df2016.loc[(df2016['Unnamed: 25'].notna())&(df2016['analyze']),['TRT1','TRT2','TRT3','Unnamed: 25']]

Unnamed: 0,TRT1,TRT2,TRT3,Unnamed: 25
17,,0.29,,pushed off of hand with tail during 1st trial
43,0.19,,,TRT2 was recorded after a brief chase. Use TRT1


We can proceed with these values.

#### Creating Columns 2016
[Cleaning 2016](#Cleaning-2016) [Clean Data](#Clean-Data) [Table of Contents](#Table-of-Contents)

Here we create two new columns: _year_, and _TRTmin_, which holds the fastest righting time.

In [33]:
df2016['year']=2016

In [34]:
df2016.loc[df2016.analyze,['TRT1','TRT2','TRT3']] = df2016.loc[df2016.analyze,['TRT1','TRT2','TRT3']].fillna(999)
df2016.loc[(df2016['Unnamed: 25'].notna())&(df2016['analyze']),['TRT1','TRT2','TRT3','Unnamed: 25']]

Unnamed: 0,TRT1,TRT2,TRT3,Unnamed: 25
17,999.0,0.29,999.0,pushed off of hand with tail during 1st trial
43,0.19,999.0,999.0,TRT2 was recorded after a brief chase. Use TRT1


In [35]:
df2016['TRTmin'] = np.nan
df2016.loc[df2016.analyze,'TRTmin'] = df2016.loc[df2016.analyze,['TRT1','TRT2','TRT3']].apply(min, axis = 1)
df2016.loc[df2016.analyze].TRTmin

1     0.34
3     0.34
6     0.51
13    0.26
14    0.46
15    0.29
17    0.29
20    0.31
22    0.16
24    0.34
25    0.34
26    0.23
27    0.29
28    0.34
29    0.29
30    0.26
34    0.34
35    0.34
36    0.34
37    0.31
38    0.34
39    0.31
40    0.71
41    0.33
42    0.41
43    0.19
44    0.26
45    0.68
Name: TRTmin, dtype: float64

#### Cleaning Analysis Columns 2016

Here we clean columns to be used in the actual analysis of Terrestrial Righting Times.  The following columns will be used:
Species, Sex, SVL, TL, RTL, Mass, and TRTmin.

[Cleaning 2016](#Cleaning-2016) [Clean Data](#Clean-Data) [Table of Contents](#Table-of-Contents)

In [36]:
df2016.loc[df2016.analyze,['Species','sex','SVL','TL','RTL','Mass','RemTL','Mass2 (post-autotomy)','TRTmin']].apply(review,0)

Species                                                                                                                                                                                                        (Unique types include the following: {<class 'str'>}, Unique values include:{'sj', 'sv'}, OK)
sex                                                                                                                                                                                                              (Unique types include the following: {<class 'str'>}, Unique values include:{'M', 'F'}, OK)
SVL                                                                                                                            (Unique types include the following: {<class 'int'>}, Unique values include:{44, 46, 47, 48, 50, 51, 52, 56, 60, 61, 62, 63, 67, 68, 69, 70, 71, 72, 73, 82, 83, 86, 92}, OK)
TL                                                                                               

It looks like no columns require further inspection.
[Cleaning Analysis Columns 2016](#Cleaning-Analysis-Columns-2016) [Cleaning 2016](#Cleaning-2016) [Clean Data](#Clean-Data) [Table of Contents](#Table-of-Contents)

In [37]:
df2016.head(1)

Unnamed: 0,Species,New,toes,paintmark,SVL,TL,RTL,RemTL,Mass,Mass2 (post-autotomy),sex,Tail Vial,toe vial,Flag,Site,Notes,Treatment,Random Assignment number,ART1,ART2,Video T/S,Additional Video T/S,TRT1,TRT2,TRT3,Unnamed: 25,Unnamed: 26,Sv,Sj,prop complete = 00.97,Unnamed: 30,analyze,year,TRTmin
0,sv,recap,1-12-20,w1c,55,62,7,,5.4,,M,,,7m ^bottom site,^CC,kink in bottom on T @33,0.0,0.276719,,,,,,,,,1,0.0,0.0,,,False,2016,


For analysis we will use the object _df2016Analysis_.

In [38]:
df2016Analysis = df2016.loc[df2016.analyze,['Species','sex','Treatment','paintmark','SVL','TL','RTL','RemTL','Mass2 (post-autotomy)','year','TRTmin']]\
.rename(columns={'Mass2 (post-autotomy)':'RemMass'})
df2016Analysis.head()

Unnamed: 0,Species,sex,Treatment,paintmark,SVL,TL,RTL,RemTL,RemMass,year,TRTmin
1,sv,M,2.0,w.b2c,52,72,0,32.0,0.038,2016,0.34
3,sv,M,2.0,w4c,46,63,0,35.0,0.031,2016,0.34
6,sv,F,2.0,w6c,48,64,0,32.0,0.032,2016,0.51
13,sv,M,2.0,w11c,47,61,0,32.0,0.033,2016,0.26
14,sj,M,4.0,w4b,69,93,0,26.0,0.037,2016,0.46


### Creating a Combined Dataset

In [39]:
commoncols8_12and15 = [col for col in df2008_12.columns if col in df2015Analysis.columns]
commoncols = [col for col in commoncols8_12and15 if col in df2016Analysis]
commoncols

['year',
 'Treatment',
 'paintmark',
 'TRTmin',
 'Species',
 'sex',
 'SVL',
 'TL',
 'RTL',
 'RemTL',
 'RemMass']

In [67]:
dfAnalysis = pd.concat([df2008_12[commoncols],df2015Analysis[commoncols],df2016Analysis[commoncols]],sort = False)
print("The resulting df has a shape of {} and {} rows with na values for sex or Species."\
      .format(dfAnalysis.shape,dfAnalysis.loc[(dfAnalysis.sex.isna())|(dfAnalysis.Species.isna())].shape[0]))

The resulting df has a shape of (144, 11) and 0 rows with na values for sex or Species.
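Because each source frame keeps its own row labels, the concatenated frame appears to contain repeated index values (for example, label 6 occurs in both the 2015 and 2016 subsets). That is harmless for boolean-mask selection, but label-based lookups can then return several rows. A small sketch with hypothetical frames `a` and `b` shows the difference that `ignore_index=True` makes.

```python
import pandas as pd

# Hypothetical stand-ins for two yearly subsets with overlapping row labels
a = pd.DataFrame({'year': [2007, 2008], 'TRTmin': [0.17, 0.15]}, index=[0, 12])
b = pd.DataFrame({'year': [2015], 'TRTmin': [0.43]}, index=[0])

stacked = pd.concat([a, b], sort=False)                         # keeps labels 0, 12, 0
renumbered = pd.concat([a, b], sort=False, ignore_index=True)   # labels 0, 1, 2

print(stacked.index.is_unique, renumbered.index.is_unique)
```

Keeping the original labels can still be useful for tracing a row back to its source sheet, so this is a trade-off rather than a fix.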


In [68]:
dfAnalysis.apply(review,axis = 0)

year                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               (Unique types include the following: {<class 'int'>}, Unique values include:{2016, 2007, 2008, 2010, 2011, 2012, 2015}, OK)
Treatment                

##### Create propRemTL combined

Here we create a column that reports the proportion of TL that was removed.

In [76]:
dfAnalysis['propRemTL'] = dfAnalysis.RemTL/dfAnalysis.TL

##### Creating a Consistent Treatment column
Since the coding for Treatment was inconsistent over the years, we will create a column that identifies the treatment group based on the proportion of tail that was actually removed.

In [92]:
dfAnalysis['Treatment2']= np.nan
dfAnalysis.loc[dfAnalysis.propRemTL.apply(lambda x:(float('{:.2}'.format(x))))>=.75,'Treatment2'] = '75%'
dfAnalysis.loc[(dfAnalysis.propRemTL.apply(lambda x:(float('{:.2}'.format(x))))<.75)
               &(dfAnalysis.propRemTL.apply(lambda x:(float('{:.2}'.format(x))))>=.5),
               'Treatment2'] = '50%'
dfAnalysis.loc[(dfAnalysis.propRemTL.apply(lambda x:(float('{:.2}'.format(x))))<.5)
               &(dfAnalysis.propRemTL.apply(lambda x:(float('{:.2}'.format(x))))>=.25),
               'Treatment2'] = '25%'
dfAnalysis.loc[(dfAnalysis.propRemTL==0),'Treatment2'] = 'Intact'
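The same grouping can be sketched with `pd.cut`, which may be easier to read than the repeated masks. This is an alternative under stated assumptions, not the method used above: it rounds to two decimal places with `Series.round(2)` rather than formatting to two significant digits, which should agree for proportions between 0.1 and 1 apart from possible floating-point edge cases. The proportions below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical proportions of tail removed, including an intact tail and a missing value
prop = pd.Series([0.0, 0.245614, 0.28, 0.5, 0.621212, 0.747664, 0.857143, np.nan])

rounded = prop.round(2)
# right=False gives left-closed bins: [0.25, 0.5), [0.5, 0.75), [0.75, inf)
groups = pd.cut(rounded, bins=[0.25, 0.5, 0.75, np.inf],
                labels=['25%', '50%', '75%'], right=False).astype(object)
# Intact tails have nothing removed; everything else outside the bins stays NaN
treatment2 = groups.where(prop != 0, 'Intact')
print(treatment2.tolist())
```

Values that fall below 0.25 after rounding, other than exact zeros, remain unlabeled here, as they do in the cell above.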

In [93]:
dfAnalysis.groupby('Treatment2').propRemTL.describe()

Unnamed: 0_level_0,count,mean,std,min,25%,50%,75%,max
Treatment2,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
25%,50.0,0.272273,0.029705,0.245614,0.253574,0.265625,0.28,0.444444
50%,25.0,0.524778,0.028595,0.5,0.5,0.52,0.533333,0.621212
75%,27.0,0.780004,0.022641,0.747664,0.764699,0.782609,0.7863,0.857143
Intact,37.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


##### Species and Sex combined

Here we standardize values for _sex_ and _Species_ across years.

In [69]:
dfAnalysis.Species.unique()

array([1, 2, 0.0, 'sv', 'sj'], dtype=object)

In [63]:
dfAnalysis.loc[dfAnalysis.Species==0.0].sort_values('paintmark')

Unnamed: 0,year,Treatment,paintmark,TRTmin,Species,sex,SVL,TL,RTL,RemTL,RemMass
10,2015,4.0,o14b,0.3,0,1,53.0,61.0,0,18.0,
13,2015,4.0,o17b,0.33,0,1,55.0,70.0,0,17.0,5.5
14,2015,2.0,o18b,0.3,0,0,51.0,69.0,0,35.0,4.0
16,2015,4.0,o1b,0.26,0,0,54.0,67.0,0,18.0,
17,2015,4.0,o20b,0.26,0,0,61.0,69.0,0,17.0,7.8
20,2015,4.0,o23b,0.36,0,1,55.0,76.0,0,19.0,5.5
22,2015,4.0,o25b,0.2,0,0,60.0,71.0,0,18.0,6.0
24,2015,4.0,o27b,0.36,0,0,46.0,61.0,0,18.0,3.1
27,2015,4.0,o2b,0.23,0,1,50.0,67.0,0,19.0,
30,2015,4.0,o32b,0.4,0,0,52.0,64.0,0,17.0,5.8


In [60]:
dfAnalysis.groupby('year').sex.value_counts(dropna=False)

year  sex
2007  1      17
      0      16
2008  0       4
2010  0       1
2011  0       7
      1       5
2012  1      23
      0      12
2015  0      20
      1      11
2016  M      16
      F      12
Name: sex, dtype: int64

In [70]:
demoDict = {'Species':{1:'sj', 2:'sv', 0.0:'sv', 'sv':'sv', 'sj':'sj'},'sex':{0:'F',1:'M','F':'F','M':'M'}}
dfAnalysis['sex'] = dfAnalysis['sex'].map(demoDict['sex'])
dfAnalysis['Species'] = dfAnalysis['Species'].map(demoDict['Species'])
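`Series.map` with a dictionary silently turns any value that is missing from the dictionary into _nan_, so it can be worth listing unmapped codes before overwriting the column. The sketch below uses a hypothetical series that includes one deliberately unknown code (3) alongside the codes seen above.

```python
import pandas as pd

# Species codes as they appear across years, plus one made-up unknown code (3)
species = pd.Series([1, 2, 0.0, 'sv', 'sj', 3], dtype=object)
codes = {1: 'sj', 2: 'sv', 0.0: 'sv', 'sv': 'sv', 'sj': 'sj'}

mapped = species.map(codes)
# Values that were present before mapping but are missing afterwards
unmapped = species[mapped.isna() & species.notna()].unique()
print(unmapped)
```

An empty `unmapped` array indicates that every code was covered by the dictionary.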

In [71]:
dfAnalysis.apply(review,0)

year                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               (Unique types include the following: {<class 'int'>}, Unique values include:{2016, 2007, 2008, 2010, 2011, 2012, 2015}, OK)
Treatment                

## Descriptive Analyses
[Table of Contents](#Table-of-Contents)

### TRTmin 

[Descriptive Analyses](#Descriptive-Analyses) [Clean Data](#Clean-Data) [Table of Contents](#Table-of-Contents)

Let's determine if the times for TRTmin are skewed.

In [74]:
dfAnalysis.groupby(['Species','Treatment']).TRTmin.count().reset_index()

Unnamed: 0,Species,Treatment,TRTmin
0,sj,1.0,11
1,sj,2.0,10
2,sj,3.0,14
3,sj,4.0,28
4,sj,5.0,12
5,sv,1.0,13
6,sv,2.0,16
7,sv,3.0,9
8,sv,4.0,31


In [109]:
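# define the S. jarrovii mask here so the notebook runs from top to bottom
idxSj = dfAnalysis.Species =='sj'
dfAnalysis.loc[(idxSj)].groupby('Treatment2').TRTmin.count()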

Treatment2
25%       26
50%       10
75%       13
Intact    23
Name: TRTmin, dtype: int64

In [118]:
# pass the group labels as x so that the treatment names appear on the x-axis
idxSj = dfAnalysis.Species =='sj'
# idxIntact = dfAnalysis.Treatment2=='Intact'
# idx25 = dfAnalysis.Treatment2=='25%'
# idx50 = dfAnalysis.Treatment2=='50%'
# idx75 = dfAnalysis.Treatment2=='75%'
countsSj = dfAnalysis.loc[(idxSj)].groupby('Treatment2').TRTmin.count()
countsSv = dfAnalysis.loc[(~idxSj)].groupby('Treatment2').TRTmin.count()
Sj = go.Bar(x=countsSj.index, y=countsSj, name='Sj')
Sv = go.Bar(x=countsSv.index, y=countsSv, name='Sv')
# Sj25 = go.Histogram(x=dfAnalysis.loc[(idxSj)].groupby('Treatment2').TRTmin.count(),name ='25%')
# Sj50 = go.Histogram(x=dfAnalysis.loc[(idxSj)].groupby('Treatment2').TRTmin.count(),name = '50%')
# Sj75 = go.Histogram(x=dfAnalysis.loc[(idxSj)].groupby('Treatment2').TRTmin.count(),name = '75%')
data=[Sj,Sv]
layout = go.Layout(
    title = 'Histogram of Records by Treatment and Species',
    titlefont = dict(
        size = 20),
    xaxis= dict(
        title = 'Treatment'
    ),
    yaxis = dict(
        title = 'Number of records',
        titlefont = dict(
            size = 18)))

fig = go.Figure(
        data = data,
        layout = layout)
py.iplot(fig, filename = 'Histogram of Number of Records by treatment and trial.html')



In [129]:
threshold = 0.05
res = ss.skewtest(np.array(dfAnalysis.loc[idxSj].TRTmin))
if res[1] <= threshold:
    decision = "are "
else:
    decision = "are not "
print("Times for TRTmin in S. jarrovii {}skewed. {}".format(decision,res))

Times for TRTmin in S. jarrovii are skewed. SkewtestResult(statistic=4.32352236105819, pvalue=1.5355756588039867e-05)


In [139]:
dfAnalysis['TRTminLog'] = np.log(dfAnalysis.TRTmin)
threshold = 0.05
res = ss.skewtest(np.array(dfAnalysis.loc[idxSj].TRTminLog))
if res[1] <= threshold:
    decision = "are "
else:
    decision = "are not "
print("Times for TRTminLog in S. jarrovii {}skewed. {}".format(decision,res))

Times for TRTminLog in S. jarrovii are not skewed. SkewtestResult(statistic=0.521271271524251, pvalue=0.6021778109212036)


We will use the log-transformed values for the _S. jarrovii_ analysis of TRTmin.

In [140]:
threshold = 0.05
res = ss.skewtest(np.array(dfAnalysis.loc[~idxSj].TRTmin))
if res[1] <= threshold:
    decision = "are "
else:
    decision = "are not "
print("Times for TRTmin in S. virgatus {}skewed. {}".format(decision,res))

Times for TRTmin in S. virgatus are not skewed. SkewtestResult(statistic=0.4248916686480463, pvalue=0.6709156483522115)
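The three skewness checks above repeat the same steps, so they could be wrapped in a small helper. The sketch below is one way to do that. The name `report_skew` and the sample data are hypothetical, and dropping _nan_ values before testing is an added assumption rather than something the cells above do.

```python
import numpy as np
import scipy.stats as ss

def report_skew(values, label, threshold=0.05):
    """Run scipy's skewness test and print the outcome (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]   # assumption: missing times are ignored
    res = ss.skewtest(values)
    skewed = bool(res.pvalue <= threshold)
    verdict = "are" if skewed else "are not"
    print("Times for {} {} skewed. {}".format(label, verdict, res))
    return skewed

# Hypothetical, strongly right-skewed sample (exponentially spaced values)
sample = np.exp(np.linspace(0, 5, 50))
raw_skewed = report_skew(sample, "the raw sample")
```

With the notebook's data, a call such as `report_skew(dfAnalysis.loc[idxSj].TRTmin, "TRTmin in S. jarrovii")` would take the place of one of the cells above.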


In [144]:
dfAnalysis[['sex','Treatment2','Species',]].apply(pd.Series.value_counts)

Unnamed: 0,sex,Treatment2,Species
25%,,50.0,
50%,,25.0,
75%,,27.0,
F,72.0,,
Intact,,37.0,
M,72.0,,
sj,,,75.0
sv,,,69.0


# Resume
[Table of Contents](#Table-of-Contents)

## Export

In [145]:
dfAnalysis.to_csv('TRT 2008-2016 analysis.csv')