# Terrestrial Righting

### Christopher Agard, George Middendorf

# Table of Contents

- [Introduction](#Introduction)
- [Set Up Python](#Set-Up-Python)
- [Get Data](#Get-Data)
- [Clean Data](#Clean-Data)
- [Descriptive Analyses](#Descriptive-Analyses)

[RESUME](#Resume)

## Introduction

The goal of this paper is to analyze terrestrial righting speed in _Sceloporus jarrovii_ and _S. virgatus_.

### Plan
- Get raw data from each year (done):
    - 2007
    - 2008
    - 2010
    - 2011
    - 2012?
    - 2015
    - 2016
- Combine and clean data (in progress)
- Analyze data

[Table of Contents](#Table-of-Contents)
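The "combine" step of the plan is still in progress. One possible way to stack the cleaned yearly frames is `pd.concat` with a dict keyed by year, sketched below. It assumes each year has already been reduced to a shared set of columns; the frame names, columns, and values are hypothetical placeholders, not the real sheets.

```python
import pandas as pd

# Hypothetical cleaned frames; the real ones would first be trimmed to shared columns.
clean_2015 = pd.DataFrame({'Lizard': ['L1', 'L2'], 'TRTmin': [0.30, 0.26]})
clean_2016 = pd.DataFrame({'Lizard': ['L3'], 'TRTmin': [0.34]})

# The dict keys become an outer index level named 'Year'; the first reset_index
# turns it into an ordinary column, the second drops the per-year row numbers.
combined = (pd.concat({2015: clean_2015, 2016: clean_2016}, names=['Year', 'row'])
              .reset_index(level='Year')
              .reset_index(drop=True))
print(combined)
```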

## Set Up Python
[Table of Contents](#Table-of-Contents)

In [2]:
import pandas as pd
import numpy as np
import scipy.stats as ss
import os, glob

In [3]:
pd.options.display.max_columns = 100
pd.options.display.max_colwidth = 1500

## Get Data
[Table of Contents](#Table-of-Contents)

In [4]:
# source folders holding the raw righting files
gandolf = {'dropboxsource':'C:/Users/craga/Dropbox/Papers/Righting Paper/Righting data from drive/',
          'googledrivesource':'C:/Users/craga/Google Drive/TailDemography/Righting Files/'}

source = gandolf

In [5]:
dtafiles = glob.glob(source['dropboxsource']+'*dta')
excelfiles = glob.glob(source['googledrivesource']+'*xls')
files = dtafiles + excelfiles
files

['C:/Users/craga/Dropbox/Papers/Righting Paper/Righting data from drive\\Combined Righting (2008-2011)(d).dta',
 'C:/Users/craga/Dropbox/Papers/Righting Paper/Righting data from drive\\Combined Righting (2008-2011).dta',
 'C:/Users/craga/Dropbox/Papers/Righting Paper/Righting data from drive\\Terrestrial Righting Analyses Revisited with combined data including 2012.dta',
 'C:/Users/craga/Dropbox/Papers/Righting Paper/Righting data from drive\\Terrestrial Righting combined.dta',
 'C:/Users/craga/Google Drive/TailDemography/Righting Files\\2007 righting stats play sheet alpha 14iii10 (good one).xls',
 'C:/Users/craga/Google Drive/TailDemography/Righting Files\\2015 Data (video confirmed outcomes).xls',
 'C:/Users/craga/Google Drive/TailDemography/Righting Files\\Proccessing Information (22ix2010).xls',
 'C:/Users/craga/Google Drive/TailDemography/Righting Files\\Schedule, Lizard Tracking, and Data capture sheet (2016).xls']

In [6]:
# df07_12 = pd.read_stata('C:/Users/craga/Dropbox/Papers/Righting Paper/Righting data from drive//Terrestrial Righting combined.dta')
df2015 = pd.read_excel(files[-3])  # 2015 Data (video confirmed outcomes).xls

In [7]:
df2016 = pd.read_excel(files[-1],sheet_name='Lizard Data')
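`files[-3]` and `files[-1]` depend on the order in which `glob` returns paths, which is not guaranteed to be sorted, so adding or renaming a file in either folder could silently change which workbook is read. A hedged alternative is to pick each workbook by a unique fragment of its file name. `find_file` below is a hypothetical helper, not part of pandas.

```python
from pathlib import PureWindowsPath

def find_file(paths, fragment):
    """Return the one path whose file name contains `fragment` (hypothetical helper).

    Raises ValueError when zero or several names match, so a new file in the
    folder cannot silently change which workbook is read.
    """
    # PureWindowsPath splits on both '/' and '\\', matching the mixed separators glob produced
    matches = [p for p in paths if fragment in PureWindowsPath(p).name]
    if len(matches) != 1:
        raise ValueError(f'{len(matches)} files match {fragment!r}: {matches}')
    return matches[0]

example = [
    'C:/Users/craga/Google Drive/TailDemography/Righting Files\\2015 Data (video confirmed outcomes).xls',
    'C:/Users/craga/Google Drive/TailDemography/Righting Files\\Schedule, Lizard Tracking, and Data capture sheet (2016).xls',
]
print(find_file(example, '(2016)'))
```

In the notebook this would read `df2015 = pd.read_excel(find_file(files, '2015 Data'))` and `df2016 = pd.read_excel(find_file(files, '(2016)'), sheet_name='Lizard Data')`.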

## Clean Data
- [2015](#2015)
- [2016](#2016)

[Table of Contents](#Table-of-Contents)

### 2015
First we will clean the 2015 data.

In [8]:
df2015.head()

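| | Year | Trial | Treatment | Lizard | Drop Outcome | Species | sex | SVL | TL | RTL | RemTL | Mass | Mass2 (post-autotomy) | New | toes | paintmark | Tail Vial | toe vial | Flag | Site | Notes | Random number | ART1 | ART2 | Drop Outcome.1 | Video File Name | TRT1 | TRT2 | Unnamed: 28 | if sv | if sj |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015.0 | 1.0 |  | L2xv2 |  | 0.0 | 0.0 |  |  |  |  |  |  | 1.0 |  | o..b |  |  |  | 1.0 | autotomized tail |  |  |  |  |  |  |  |  | <=0.8=4 | <=0.5=4 |
| 1 | 2015.0 | 1.0 |  | L1xv2 |  | 1.0 | 0.0 | 79.0 | 68.0 | 42.0 |  | 17.5 |  | 0.0 | 2-5-12-18 | o.b.t |  |  |  | 1.0 |  |  |  |  |  |  |  |  |  |  |  |
| 2 | 2015.0 | 1.0 |  | L1xv1 |  | 1.0 | 0.0 |  |  |  |  |  |  | 1.0 |  | o:b |  |  |  | 1.0 | tail has been broken; did not keep |  |  |  |  |  |  |  |  |  |  |
| 3 | 2015.0 | 1.0 |  | L2xv3 |  | 0.0 | 1.0 | 57.0 | 75.0 | 0.0 |  | 7.0 |  | 1.0 |  | o+b |  |  | R outcrop 20m v top site left side | 0.0 | did not cut toes |  |  |  |  |  |  |  |  |  |  |
| 4 | 2015.0 | 1.0 |  | L1xv3 |  | 1.0 | 1.0 | 76.0 | 76.0 | 20.0 |  | 14.7 |  | 0.0 | 11 | o+b+t |  |  |  | 1.0 |  |  |  |  |  |  |  |  |  |  |  |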


Next we check the video notes to determine whether any runs need to be excluded. We add a column, _analyze_, that marks the rows to be used in subsequent analyses.

In [9]:
df2015['analyze'] = True

In [10]:
df2015.loc[df2015['Unnamed: 28'].notna(),['TRT1','TRT2','Unnamed: 28']]

| | TRT1 | TRT2 | Unnamed: 28 |
|---|---|---|---|
| 8 | 0.33 |  | first or second? |
| 9 | 0.4 |  | this was an SJ not sv |
| 23 | 0.23 |  | Do not use questionable start given position |
| 24 | 0.36 |  | do not use contact during righting event |
| 29 | 0.16 |  | Do not questionable start given position |
| 34 | 0.43 |  | multiple start and stop and half-turns (exclude?) |
| 50 | 0.36 |  | 90% sure of paint mark; should be f sv with orange badge |


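Rows whose notes say the trial should not be used ("Do not ..." or "exclude") get _analyze_ set to False. The match is case-sensitive, so row 24 ("do not use contact during righting event") is not flagged and remains in the 2015 analysis.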

In [11]:
df2015.loc[(df2015['Unnamed: 28'].notna())&(df2015['Unnamed: 28'].str.contains('|'.join(['Do not','exclude'])))
                                            ,['TRT1','TRT2','Unnamed: 28']]

| | TRT1 | TRT2 | Unnamed: 28 |
|---|---|---|---|
| 23 | 0.23 |  | Do not use questionable start given position |
| 29 | 0.16 |  | Do not questionable start given position |
| 34 | 0.43 |  | multiple start and stop and half-turns (exclude?) |


In [12]:
df2015.loc[(df2015['Unnamed: 28'].notna())&(df2015['Unnamed: 28'].str.contains('|'.join(['Do not','exclude'])))
                                            ,['analyze']]=False
#label empty TRT rows as analyze false
df2015.loc[(df2015.TRT1.isna())&(df2015.TRT2.isna()),'analyze']=False
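Passing `case=False` and `na=False` to `Series.str.contains` would also catch lowercase notes such as row 24's and make the separate `.notna()` check unnecessary. A minimal sketch on a toy Series whose strings are copied from the 2015 notes table; applying it to `df2015` would also exclude row 24 and therefore change the 2015 summary statistics reported below.

```python
import pandas as pd

notes = pd.Series([
    'first or second?',
    'Do not use questionable start given position',
    'do not use contact during righting event',
    None,
    'multiple start and stop and half-turns (exclude?)',
])

pattern = '|'.join(['do not', 'exclude'])
# case=False ignores capitalisation; na=False treats a missing note as "no match",
# so the result is a plain boolean mask usable directly in .loc
exclude = notes.str.contains(pattern, case=False, na=False)
print(exclude.tolist())  # [False, True, True, False, True]
```

In the notebook this would be `df2015.loc[df2015['Unnamed: 28'].str.contains(pattern, case=False, na=False), 'analyze'] = False`.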

Next we create _TRTmin_, the fastest (minimum) righting time for each analyzed lizard. Where TRT1 is missing, it is first filled with TRT2.

In [13]:
# fill a missing TRT1 with TRT2: Python's built-in min() returns NaN when its first value is NaN
df2015.loc[(df2015.analyze)&(df2015.TRT1.isna()),'TRT1'] = df2015.loc[(df2015.analyze)&(df2015.TRT1.isna()),'TRT2']
df2015['TRTmin'] = np.nan
df2015.loc[df2015.analyze,'TRTmin'] = df2015.loc[df2015.analyze,['TRT1','TRT2']].apply(min, axis = 1)
df2015.loc[df2015.analyze].TRTmin

6     0.43
7     0.26
8     0.33
9     0.40
10    0.30
11    0.36
12    0.36
13    0.33
14    0.30
15    0.40
16    0.26
17    0.26
19    0.40
20    0.36
21    0.36
22    0.20
24    0.36
25    0.30
27    0.23
28    0.46
30    0.40
31    0.16
32    0.43
35    0.33
36    0.33
37    0.20
38    0.26
39    0.40
41    0.30
50    0.36
56    0.36
69    0.23
Name: TRTmin, dtype: float64
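`DataFrame.min(axis=1)` skips NaN by default, which avoids having to fill missing trials before taking the minimum. Below is a sketch of a helper that would work for both years' trial columns (two in 2015, three in 2016); `add_trtmin` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

def add_trtmin(df, trial_cols, mask):
    """Hypothetical helper: store the fastest righting time per row in 'TRTmin'.

    Only rows where `mask` is True are filled. DataFrame.min(axis=1) skips NaN,
    so rows with some missing trials still get a value.
    """
    df = df.copy()  # leave the caller's frame untouched
    df['TRTmin'] = np.nan
    df.loc[mask, 'TRTmin'] = df.loc[mask, trial_cols].min(axis=1)
    return df

toy = pd.DataFrame({'TRT1': [np.nan, 0.33, 0.19, np.nan],
                    'TRT2': [0.29, np.nan, 0.29, np.nan],
                    'TRT3': [np.nan, np.nan, np.nan, np.nan],
                    'analyze': [True, True, True, False]})
out = add_trtmin(toy, ['TRT1', 'TRT2', 'TRT3'], toy['analyze'])
print(out['TRTmin'].tolist())  # [0.29, 0.33, 0.19, nan]
```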

In [14]:
TRTdesc = df2015.loc[df2015.analyze,['TRTmin']].describe()
TRTdesc

| | TRTmin |
|---|---|
| count | 32.0 |
| mean | 0.325625 |
| std | 0.074701 |
| min | 0.16 |
| 25% | 0.26 |
| 50% | 0.33 |
| 75% | 0.37 |
| max | 0.46 |


Next we test whether TRTmin is skewed with `scipy.stats.skewtest`, whose null hypothesis is that the skewness matches that of a normal distribution.

In [15]:
ss.skewtest(np.array(df2015.loc[df2015.analyze,'TRTmin']))

SkewtestResult(statistic=-0.9169770387757284, pvalue=0.3591546722640926)

The 2015 TRTmin distribution shows no significant skew (z = -0.92, p = 0.36), so no transformation is applied.
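Because the describe-then-skewtest steps repeat for each year, a small helper may keep them consistent. This is a sketch: `skew_report` is a hypothetical name, the synthetic lognormal sample stands in for real righting times, and the 0.05 cut-off is an assumption rather than a threshold stated in the notebook. `scipy.stats.skewtest` requires at least 8 observations and, by default, returns NaN if any value is NaN, so missing times are dropped first.

```python
import numpy as np
import scipy.stats as ss

def skew_report(values, alpha=0.05):
    """Hypothetical helper: sample skewness plus scipy's skewtest."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]              # skewtest propagates NaN by default
    res = ss.skewtest(x)
    return {'n': x.size,
            'skew': ss.skew(x),
            'z': res.statistic,
            'p': res.pvalue,
            'skewed': res.pvalue < alpha}

rng = np.random.default_rng(0)
right_skewed = rng.lognormal(mean=-1.1, sigma=0.8, size=100)  # synthetic times
print(skew_report(right_skewed))
```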

### 2016
First we will clean the 2016 data.

[Clean Data](#Clean-Data);[Table of Contents](#Table-of-Contents)

In [16]:
df2016.head()

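| | Species | New | toes | paintmark | SVL | TL | RTL | RemTL | Mass | Mass2 (post-autotomy) | sex | Tail Vial | toe vial | Flag | Site | Notes | Treatment | Random Assignment number | ART1 | ART2 | Video T/S | Additional Video T/S | TRT1 | TRT2 | TRT3 | Unnamed: 25 | Unnamed: 26 | Sv | Sj | prop complete = 00.97 | Unnamed: 30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | sv | recap | 1-12-20 | w1c | 55 | 62 | 7 |  | 5.4 |  | M |  |  | 7m ^bottom site | ^CC | kink in bottom on T @33 | 0.0 | 0.276719 |  |  |  |  |  |  |  |  | 1.0 | 0.0 | 0.0 |  |  |
| 1 | sv | new |  | w.b2c | 52 | 72 | 0 | 32.0 | 4.7 | 0.038 | M | CAT16-1 |  |  | ^CC |  | 2.0 | 0.766538 |  |  | DSC_2649/20150710_053420 |  | 0.34 |  |  |  | 2.0 | 0.714286 | 0.0 |  |  |
| 2 | sv | recap | 1-13-18 | w3c | 54 | 53 | 19 |  | 5.8 |  | F |  |  | 4^ w.b2c |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 3 | sv | new | 2-9-15 | w4c | 46 | 63 | 0 | 35.0 | 2.8 | 0.031 | M | CAT16-2 | C16-7 | 2m v opposite stacked wall v stump | ^CC |  | 2.0 | 0.462795 |  |  | DSC_2649/20150710_053420 |  | 0.34 |  |  |  | 3.0 | 0.0 | 0.1875 |  |  |
| 4 | sv | recap | 1-15-19 | w5c | 60 | 50 | 14 |  | 6.2 |  | F |  |  | bottom rock wall v S-curve |  | [photo]s |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |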


Next we check the video notes to determine whether any runs need to be excluded. We add a column, _analyze_, that marks the rows to be used in subsequent analyses.

In [17]:
df2016['analyze'] = True

In [18]:
df2016.loc[df2016['Unnamed: 25'].notna(),['TRT1','TRT2','TRT3','Unnamed: 25']]

| | TRT1 | TRT2 | TRT3 | Unnamed: 25 |
|---|---|---|---|---|
| 7 | 0.36 |  |  | long contact on 1st trial |
| 9 | 0.59 |  |  | need to think about how we count the start of righting events like this? |
| 17 | 0.33 | 0.29 |  | pushed off of hand with tail during 1st trial |
| 21 |  |  | 0.11 | Do not use; made contact during all attempts to right |
| 23 | 0.28 |  |  | Questionable contact during start of righting event |
| 31 |  |  |  | Do not use; made contact during all attempts to right |
| 32 | 0.13 |  |  | pushed off of hand with tail during 1st trial |
| 33 | 0.38 | 0.54 |  | TRT1 was with an intact tail; TRT2 was with 25% autotomy |
| 43 | 0.19 | 0.29 |  | TRT2 was recorded after a brief chase. Use TRT1 |


Rows whose notes say the trial should not be used get _analyze_ set to False.

#### TRT1
Here we discard the first trial (set TRT1 to NaN) wherever the note in _Unnamed: 25_ reports a problem during the 1st trial.

In [19]:
df2016.loc[(df2016['Unnamed: 25'].notna())&(df2016['Unnamed: 25'].str.contains('1st'))
                                            ,['TRT1','TRT2','TRT3','Unnamed: 25']]

| | TRT1 | TRT2 | TRT3 | Unnamed: 25 |
|---|---|---|---|---|
| 7 | 0.36 |  |  | long contact on 1st trial |
| 17 | 0.33 | 0.29 |  | pushed off of hand with tail during 1st trial |
| 32 | 0.13 |  |  | pushed off of hand with tail during 1st trial |


In [20]:
df2016.loc[(df2016['Unnamed: 25'].notna())&(df2016['Unnamed: 25'].str.contains('1st'))
                                            ,['TRT1']]=np.nan

In [21]:
df2016.loc[(df2016['Unnamed: 25'].notna())&(df2016['Unnamed: 25'].str.contains('|'.join(['Do not','exclude'])))
                                            ,['TRT1','TRT2','TRT3','Unnamed: 25']]

| | TRT1 | TRT2 | TRT3 | Unnamed: 25 |
|---|---|---|---|---|
| 21 |  |  | 0.11 | Do not use; made contact during all attempts to right |
| 31 |  |  |  | Do not use; made contact during all attempts to right |


In [22]:
df2016.loc[(df2016['Unnamed: 25'].notna())&(df2016['Unnamed: 25'].str.contains('|'.join(['Do not','exclude'])))
                                            ,['analyze']]=False
#label empty TRT rows as analyze false
df2016.loc[(df2016.TRT1.isna())&(df2016.TRT2.isna())&(df2016.TRT3.isna()),'analyze']=False

#### Remaining entries with notes
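Here we review the remaining trials that carry notes in _Unnamed: 25_. None of them is excluded. For row 43 ("Use TRT1") and row 33 (TRT1 recorded with an intact tail), TRT1 is also the faster trial, so _TRTmin_ equals TRT1 in both cases.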

In [23]:
df2016.loc[(df2016['Unnamed: 25'].notna())&(df2016['analyze']),['TRT1','TRT2','TRT3','Unnamed: 25']]

| | TRT1 | TRT2 | TRT3 | Unnamed: 25 |
|---|---|---|---|---|
| 9 | 0.59 |  |  | need to think about how we count the start of righting events like this? |
| 17 |  | 0.29 |  | pushed off of hand with tail during 1st trial |
| 23 | 0.28 |  |  | Questionable contact during start of righting event |
| 33 | 0.38 | 0.54 |  | TRT1 was with an intact tail; TRT2 was with 25% autotomy |
| 43 | 0.19 | 0.29 |  | TRT2 was recorded after a brief chase. Use TRT1 |


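Next we create _TRTmin_, the fastest (minimum) righting time for each analyzed lizard. Missing trials are first filled with the placeholder value 999 so that the row-wise `min` ignores them.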

In [24]:
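# 999 marks a missing trial; it never wins the row-wise min because rows with no trials are already analyze=False
df2016.loc[df2016.analyze,['TRT1','TRT2','TRT3']] = df2016.loc[df2016.analyze,['TRT1','TRT2','TRT3']].fillna(999)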
df2016.loc[(df2016['Unnamed: 25'].notna())&(df2016['analyze']),['TRT1','TRT2','TRT3','Unnamed: 25']]

| | TRT1 | TRT2 | TRT3 | Unnamed: 25 |
|---|---|---|---|---|
| 9 | 0.59 | 999.0 | 999.0 | need to think about how we count the start of righting events like this? |
| 17 | 999.0 | 0.29 | 999.0 | pushed off of hand with tail during 1st trial |
| 23 | 0.28 | 999.0 | 999.0 | Questionable contact during start of righting event |
| 33 | 0.38 | 0.54 | 999.0 | TRT1 was with an intact tail; TRT2 was with 25% autotomy |
| 43 | 0.19 | 0.29 | 999.0 | TRT2 was recorded after a brief chase. Use TRT1 |


In [25]:
df2016['TRTmin'] = np.nan
df2016.loc[df2016.analyze,'TRTmin'] = df2016.loc[df2016.analyze,['TRT1','TRT2','TRT3']].apply(min, axis = 1)
df2016.loc[df2016.analyze].TRTmin

1     0.34
3     0.34
6     0.51
9     0.59
13    0.26
14    0.46
15    0.29
17    0.29
20    0.31
22    0.16
23    0.28
24    0.34
25    0.34
26    0.23
27    0.29
28    0.34
29    0.29
30    0.26
33    0.38
34    0.34
35    0.34
36    0.34
37    0.31
38    0.34
39    0.31
40    0.71
41    0.33
42    0.41
43    0.19
44    0.26
45    0.68
Name: TRTmin, dtype: float64
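After the row-wise `min`, TRT1 to TRT3 in `df2016` still hold the 999 placeholder wherever a trial was missing, which would distort any later summary of the individual trials (for example, a mean of TRT2). A sketch of restoring NaN once the minimum has been taken; the toy values are hypothetical.

```python
import numpy as np
import pandas as pd

SENTINEL = 999  # placeholder the notebook uses for a missing trial

trials = pd.DataFrame({'TRT1': [0.59, SENTINEL, 0.19],
                       'TRT2': [SENTINEL, 0.29, 0.29],
                       'TRT3': [SENTINEL, SENTINEL, SENTINEL]}, dtype=float)

# The sentinel cannot win the minimum as long as each row has one real trial.
trmin = trials.min(axis=1)
# Afterwards, turn the placeholders back into proper missing values.
trials = trials.replace(SENTINEL, np.nan)

print(trmin.tolist())                   # [0.59, 0.29, 0.19]
print(trials.isna().sum().tolist())     # [1, 1, 3]
```

In the notebook this would correspond to `df2016[['TRT1','TRT2','TRT3']] = df2016[['TRT1','TRT2','TRT3']].replace(999, np.nan)`, run after In [25].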

In [26]:
TRTdesc = df2016.loc[df2016.analyze,['TRTmin']].describe()
TRTdesc

| | TRTmin |
|---|---|
| count | 31.0 |
| mean | 0.350323 |
| std | 0.124378 |
| min | 0.16 |
| 25% | 0.29 |
| 50% | 0.34 |
| 75% | 0.34 |
| max | 0.71 |


Next we test whether TRTmin is skewed with `scipy.stats.skewtest`, whose null hypothesis is that the skewness matches that of a normal distribution.

In [27]:
ss.skewtest(np.array(df2016.loc[df2016.analyze,'TRTmin']))

SkewtestResult(statistic=3.269946192428115, pvalue=0.0010756794353216244)

The 2016 TRTmin distribution is significantly right-skewed (z = 3.27, p = 0.001), so we apply a natural-log transformation.

In [28]:
df2016.loc[df2016.analyze,'TRTmin_log'] = df2016.loc[df2016.analyze].TRTmin.apply(np.log)

In [29]:
ss.skewtest(np.array(df2016.loc[df2016.analyze].TRTmin_log))

SkewtestResult(statistic=1.1717692607520804, pvalue=0.24128970718767684)

After the log transformation the skew is no longer significant (z = 1.17, p = 0.24).
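Why the log helps can be illustrated on simulated data: if times are lognormal (a common working assumption for right-skewed durations, not something tested in this notebook), their logarithm is exactly normal, so the skew disappears. The sample below is synthetic, not the 2016 data. `np.log` also requires strictly positive values, which holds for TRTmin (minimum 0.16).

```python
import numpy as np
import scipy.stats as ss

rng = np.random.default_rng(1)
times = rng.lognormal(mean=-1.1, sigma=0.8, size=100)  # synthetic, right-skewed

raw = ss.skewtest(times)
logged = ss.skewtest(np.log(times))  # log of a lognormal sample is normal
print(f'raw z = {raw.statistic:.2f}, log z = {logged.statistic:.2f}')
```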

## Descriptive Analyses
[Table of Contents](#Table-of-Contents)

### 2015

In [30]:
df2015.head(2)

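| | Year | Trial | Treatment | Lizard | Drop Outcome | Species | sex | SVL | TL | RTL | RemTL | Mass | Mass2 (post-autotomy) | New | toes | paintmark | Tail Vial | toe vial | Flag | Site | Notes | Random number | ART1 | ART2 | Drop Outcome.1 | Video File Name | TRT1 | TRT2 | Unnamed: 28 | if sv | if sj | analyze | TRTmin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015.0 | 1.0 |  | L2xv2 |  | 0.0 | 0.0 |  |  |  |  |  |  | 1.0 |  | o..b |  |  |  | 1.0 | autotomized tail |  |  |  |  |  |  |  |  | <=0.8=4 | <=0.5=4 | False |  |
| 1 | 2015.0 | 1.0 |  | L1xv2 |  | 1.0 | 0.0 | 79.0 | 68.0 | 42.0 |  | 17.5 |  | 0.0 | 2-5-12-18 | o.b.t |  |  |  | 1.0 |  |  |  |  |  |  |  |  |  |  |  | False |  |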


In [31]:
df2015.loc[df2015.analyze,['sex','Trial','Treatment','Species','New']].apply(pd.Series.value_counts)

| | sex | Trial | Treatment | Species | New |
|---|---|---|---|---|---|
| 0.0 | 20.0 |  |  | 16.0 |  |
| 1.0 | 11.0 | 31.0 |  | 15.0 | 31.0 |
| 2.0 |  |  | 2.0 |  |  |
| 3.0 |  |  | 6.0 |  |  |
| 4.0 |  |  | 23.0 |  |  |


# Resume
[Table of Contents](#Table-of-Contents)

In [None]:
print(df2015.RTL.apply(type).unique())
print(df2015.TL.apply(type).unique())

In [None]:
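df2015_sub = df2015.loc[(df2015.analyze)&(df2015.TRTmin.notna())]
# group the RTL/TL ratio by Treatment before describing, so each treatment gets its own summary row
(df2015_sub.RTL/df2015_sub.TL).groupby(df2015_sub.Treatment).describe()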
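A sketch of a per-treatment summary of the RTL/TL ratio using made-up values (only the column names match the sheet). Grouping the ratio Series by Treatment before calling `describe()` yields one row per treatment, and `pd.to_numeric(..., errors='coerce')` turns text or blank entries, which the type check above is probing for, into numbers or NaN.

```python
import pandas as pd

toy = pd.DataFrame({'Treatment': [2, 2, 3, 4, 4, 4],
                    'RTL': ['42', 0, 20, 7, None, 14],   # one text entry, one blank
                    'TL': [68, 75, 76, 62, 72, 50]})

# Coerce to numbers first: unparseable or missing entries become NaN.
rtl = pd.to_numeric(toy['RTL'], errors='coerce')
tl = pd.to_numeric(toy['TL'], errors='coerce')

ratio = rtl / tl
summary = ratio.groupby(toy['Treatment']).describe()  # NaN ratios are not counted
print(summary[['count', 'mean']])
```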

### 2016

In [None]:
df2016.head(2)

In [None]:
df2016.loc[df2016.analyze,['sex','Treatment','Species','New']].apply(pd.Series.value_counts)