# Table of Contents
1. [Things to Do](#Things-to-Do)
2. [Introduction](#Introduction)
3. [Set up Python](#Set-up-Python)
4. [Functions](#Functions)
5. [Get Data](#Get-Data)
6. [Analyze Data](#Analyze-Data)
7. [Export Files](#Export-Files)

# Things to Do


- [Resume Here](#Resume-Here)

## Introduction

This notebook contains code and output of descriptive analyses for the 2000-2017 CC dataset after cleaning.

The objectives of this notebook are to:

The metrics we examine are: .




##  Set up Python

First we need to set up the Python environment by importing the necessary packages and setting the display options.

[Top](#Table-of-Contents)

In [21]:
import pandas as pd
import numpy as np
import os, glob, logging
from summary_functions import *
from scipy import stats
from monthlit import *
from prettyprint import *


import plotly
import chart_studio.plotly as py
import plotly.figure_factory as ff
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

init_notebook_mode(connected=True)
# plotly.tools.set_config_file(world_readable=True)


# increase print limit
pd.options.display.max_rows = 99999
pd.options.display.max_columns = 50

### Setting File Locations

In [22]:
deviceDict = {'dataBig':{'source':'S:/Chris/TailDemography/TailDemography/weather data files'
                         ,'log':'S:/Chris/TailDemography/TailDemography/weather data files/logs'
                         ,'output':'S:/Chris/TailDemography/TailDemography/weather data files/outputFiles/'},
              'silverSurfer':{'source':'C:\\Users\\craga_eowcrpe\\Google Drive\\TailDemography\\weather data files/outputFiles'
                              ,'log':'C:\\Users\\craga_eowcrpe\\Google Drive\\TailDemography\\weather data files/logs'
                              ,'output':'C:\\Users\\craga_eowcrpe\\Google Drive\\TailDemography\\weather data files/outputFiles'}
              ,'dataPers':{'source':'C:/Users/Christopher/Google Drive/TailDemography/weather data files'
                           ,'log': 'C:\\Users\\craga_eowcrpe\\Google Drive\\TailDemography\\weather data files/logs'
                           ,'output':'C:/Users/Christopher/Google Drive/TailDemography/weather data files/outputFiles'}
             ,'gandolf':{'source':'C:/Users/craga/Google Drive/TailDemography/weather data files'
                           ,'log': 'C:/Users/craga/Google Drive/TailDemography/weather data files/logs'
                           ,'output':'C:/Users/craga/Google Drive/TailDemography/weather data files/outputFiles'}}

### Choose Device

In [23]:
device = deviceDict['gandolf']
device

{'source': 'C:/Users/craga/Google Drive/TailDemography/weather data files',
 'log': 'C:/Users/craga/Google Drive/TailDemography/weather data files/logs',
 'output': 'C:/Users/craga/Google Drive/TailDemography/weather data files/outputFiles'}

# Source Data


### Logging

In [24]:
# os.path.join supplies the path separator that plain string concatenation left out
logging.basicConfig(filename=os.path.join(device['log'], 'Descriptive Analyses.log'),
                    filemode='a',
                    format='%(funcName)s - %(levelname)s - %(message)s - %(asctime)s', level=logging.DEBUG)

## Functions

This section contains functions that were created for this notebook.

- [distribution](#distribution) (to be deleted; we will use `scipy.stats.describe` instead)
- [monthlit](#monthlit)
- [description](#description)
- [vocab_run](#vocab_run)

### distribution
[Back to Top](#Table-of-Contents)

[Back to Functions](#Functions)

*distribution* takes a series or list of numeric objects, *x*, and returns descriptive statistics of *x*:
        n, minimum, maximum, median, sIQR (semi-interquartile range), mean, and standard deviation.
    
Here are a few examples of how *distribution* works.

In [25]:
foo = [0,1,2,'r']
distribution(foo)

In [26]:
bar = [0,1,2]
distribution(bar)

Unnamed: 0,n,minimum,maximum,median,siqr,mean,stdev
0,3,0,2,1.0,0.5,1.0,1.0
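
*distribution* itself is imported from `summary_functions`, so its source is not shown here. The block below is only a minimal sketch of what such a function could look like, assuming pandas' default linear quantile interpolation and the sample standard deviation; the name `distribution_sketch` is hypothetical.

```python
import pandas as pd

def distribution_sketch(x):
    """Return n, min, max, median, sIQR, mean and stdev of x as a one-row DataFrame."""
    # Forcing a float dtype makes non-numeric entries such as 'r' fail here
    s = pd.Series(x, dtype="float64")
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    return pd.DataFrame([{
        "n": int(s.count()),
        "minimum": s.min(),
        "maximum": s.max(),
        "median": s.median(),
        "siqr": (q3 - q1) / 2,   # semi-interquartile range
        "mean": s.mean(),
        "stdev": s.std(),        # sample standard deviation (ddof=1)
    }])

summary = distribution_sketch([0, 1, 2])
print(summary)
```

For `[0, 1, 2]` the sketch gives the same sIQR (0.5) and standard deviation (1.0) as the output of *distribution* above.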


[Back to Functions](#Functions)

### monthlit
[Back to Top](#Table-of-Contents)

[Back to Functions](#Functions)

Here are a few examples of how _monthlit_ works.

In [27]:
dates = pd.DataFrame(data={'dates':['2018-12-9','2019-8-5', '2017/7/4',np.nan,None]})
dates.dates = pd.to_datetime(dates.dates)
dates

Unnamed: 0,dates
0,2018-12-09
1,2019-08-05
2,2017-07-04
3,NaT
4,NaT


In [28]:
np.isnan(np.nan)

True

In [29]:
monthlit(dates.dates.dt.month[0])

'Dec'

In [30]:
dates.dates.dt.month.apply(monthlit)

0    Dec
1    Aug
2    Jul
3    NaN
4    NaN
Name: dates, dtype: object
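
_monthlit_ is imported from its own module, so its definition is not part of this notebook. A rough sketch of the behaviour seen in the examples (month number in, abbreviation out, missing values passed through) might look like the following. The name `monthlit_sketch` is hypothetical, and the spelling 'Sept' is an assumption inferred from the keys of the `seasons` dictionary used later.

```python
import math

# Abbreviations indexed by month number - 1; 'Sept' is assumed from the seasons dictionary
MONTH_ABBR = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sept', 'Oct', 'Nov', 'Dec']

def monthlit_sketch(month):
    """Map a month number (1-12) to its abbreviation; return NaN for missing values."""
    if month is None or (isinstance(month, float) and math.isnan(month)):
        return float('nan')
    return MONTH_ABBR[int(month) - 1]

print(monthlit_sketch(12), monthlit_sketch(8.0), monthlit_sketch(float('nan')))
```

The `int()` conversion matters because `dt.month` returns floats once the column contains `NaT`.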

[Back to Functions](#Functions)

### description
[Back to Top](#Table-of-Contents)

[Back to Functions](#Functions)

In [31]:
def description(x, variable, percentage=False):
    """Describe x[variable]; with percentage=True the summary statistics are multiplied by 100."""
    res = x[variable].describe()
    if percentage:
        scaled = ['mean', 'std', 'min', '25%', '50%', '75%', 'max']
        res[scaled] = res[scaled] * 100
    # Need to add a CI calculation to this function
    res['siqr'] = (res['75%'] - res['25%']) / 2
    res['meanCI'] = 'not calculated'
    return res
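
The confidence interval that *description* still reports as 'not calculated' could be computed along the following lines. This is only a sketch under stated assumptions: a two-sided interval based on Student's t distribution, missing values ignored, and a hypothetical helper name `mean_ci_sketch` that does not exist in the notebook.

```python
import numpy as np
from scipy import stats

def mean_ci_sketch(values, confidence=0.95):
    """Two-sided t-based confidence interval for the mean of values, ignoring NaNs."""
    arr = np.asarray(values, dtype=float)
    arr = arr[~np.isnan(arr)]
    n = arr.size
    if n < 2:
        # The standard error is undefined for fewer than two observations
        return (np.nan, np.nan)
    mean = arr.mean()
    sem = arr.std(ddof=1) / np.sqrt(n)                  # standard error of the mean
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return (mean - t_crit * sem, mean + t_crit * sem)

low, high = mean_ci_sketch([1, 2, 3, 4, 5])
print(low, high)
```

A tuple like this could replace the placeholder string in `res['meanCI']`; for percentage variables it would need the same scaling by 100 as the other statistics.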

### vocab_run
[Back to Top](#Table-of-Contents)

[Back to Functions](#Functions)

*vocab_run* takes a list, joins all but the last element with one separator, places a different separator between
     the penultimate and final members of the list, and returns the result as a string.
     :param x: a list of strings to be concatenated
     :param connector_dict: a dictionary whose keys describe the size of the list and whose values give the
     connectors that separate the list elements.
    
Here are a few examples of how *vocab_run* works.

In [32]:
print("Could you bring some {} please?".format(vocab_run(['foo','bar','stuffkins'])))

Could you bring some foo, bar and stuffkins please?


In [33]:
print("You can either have {}.  You'll have to make a choice."\
      .format(vocab_run(['foo','bar','stuffkins'],connector_dict={1: None, 2: ' or ', 'run': ', '})))

You can either have foo, bar or stuffkins.  You'll have to make a choice.
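
The definition of *vocab_run* is not shown in this notebook. As a hedged sketch, the behaviour in the two examples could be reproduced as follows; the name `vocab_run_sketch` and the default connectors (`', '` and `' and '`) are assumptions inferred from the example outputs.

```python
def vocab_run_sketch(x, connector_dict=None):
    """Join list items with a 'run' connector and a different final connector."""
    if connector_dict is None:
        # Assumed defaults, inferred from the first example output
        connector_dict = {1: None, 2: ' and ', 'run': ', '}
    items = [str(item) for item in x]
    if len(items) <= 1:
        # Nothing to connect for an empty or single-item list
        return ''.join(items)
    head = connector_dict['run'].join(items[:-1])
    return head + connector_dict[2] + items[-1]

print(vocab_run_sketch(['foo', 'bar', 'stuffkins']))
print(vocab_run_sketch(['foo', 'bar', 'stuffkins'],
                       connector_dict={1: None, 2: ' or ', 'run': ', '}))
```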


[Back to Functions](#Functions)

## Get Data
[Top](#Table-of-Contents)

Here we set the location from which we read the data. We list all files in the source folder whose names match the pattern `*_weather*.csv` and save the file names in the variable _mysourcefiles_.

In [34]:
os.chdir(device['source'])
mysourcefiles = glob.glob('*_weather*.csv')
mysourcefiles

['paradise_weatherdata.csv', 'portal_weatherdata.csv']

In [35]:
def getweatherdata(afile,sourcename):
    tmp = pd.read_csv(afile)
    tmp['source'] = sourcename
    return tmp

Get weather data

In [36]:
df = pd.concat([getweatherdata(afile,afile.split('_')[0]) for afile in mysourcefiles]).drop(columns = 'Unnamed: 0')

Get population data.

In [37]:
df_pop = pd.read_csv('C:/Users/craga/Google Drive/TailDemography/outputFiles/Descriptive/population size.csv')
df_pop.head()

Unnamed: 0,year,sex,liznumber,liznumberYear,propMale,propFemale
0,2000,f,84,153,,0.54902
1,2000,m,69,153,0.45098,
2,2001,f,72,135,,0.533333
3,2001,m,63,135,0.466667,
4,2002,f,67,119,,0.563025


## Analyze Data
[Top](#Table-of-Contents)

We will first examine the range and distribution of a number of variables in our data set:


In [38]:
seasons={'Dec':'winter','Jan':'winter','Feb':'winter',
         'Mar':'spring','Apr':'spring','May':'spring',
         'Jun':'summer','Jul':'summer','Aug':'summer',
         'Sept':'fall','Oct':'fall','Nov':'fall'}

In [39]:
# This could be used to generate season-level weather data (use season dates) - Chris
# This could also be used to approximate the start of the monsoon season
# Check historical data in May and June to identify in the notes when the first juveniles were spotted - (George and Chris)
## Look for correlates in the data
# Use SWRS data to identify the start of the monsoons (George to get SWRS data)
# Which other precipitation and temperature metrics in the NOAA data set have been used for this? (George and Chris to check the lit)
df['month'] = df.month.apply(monthlit)
df['season'] = df.month.apply(lambda x: seasons[x])
# Columns are selected with a list; tuple-style selection on a groupby fails in recent pandas versions
df_season = pd.DataFrame(df.groupby(['source','year','season'])[['PRCP','SNOW','TMAX','TMIN','TAVG']].describe())[1:-1]
df_season.columns = [' '.join(col).strip() for col in df_season.columns.values]
df_season = df_season.reset_index()

In [40]:
df_season['year-season'] = df_season.year.astype(str) + '-' + df_season.season
df_season

Unnamed: 0,source,year,season,PRCP count,PRCP mean,PRCP std,PRCP min,PRCP 25%,PRCP 50%,PRCP 75%,PRCP max,SNOW count,SNOW mean,SNOW std,SNOW min,SNOW 25%,SNOW 50%,SNOW 75%,SNOW max,TMAX count,TMAX mean,TMAX std,TMAX min,TMAX 25%,TMAX 50%,TMAX 75%,TMAX max,TMIN count,TMIN mean,TMIN std,TMIN min,TMIN 25%,TMIN 50%,TMIN 75%,TMIN max,TAVG count,TAVG mean,TAVG std,TAVG min,TAVG 25%,TAVG 50%,TAVG 75%,TAVG max,year-season
0,paradise,2007,summer,4.0,3.865,1.102739,2.91,2.91,3.865,4.82,4.82,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,86.5,3.002221,83.9,83.9,86.5,89.1,89.1,4.0,59.05,0.173205,58.9,58.9,59.05,59.2,59.2,4.0,72.8,1.616581,71.4,71.4,72.8,74.2,74.2,2007-summer
1,paradise,2007,winter,2.0,2.51,0.0,2.51,2.51,2.51,2.51,2.51,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,51.3,0.0,51.3,51.3,51.3,51.3,51.3,2.0,23.8,0.0,23.8,23.8,23.8,23.8,23.8,2.0,37.6,0.0,37.6,37.6,37.6,37.6,37.6,2007-winter
2,paradise,2008,fall,6.0,0.88,0.725148,0.27,0.345,0.57,1.4925,1.8,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,72.1,6.575105,63.9,66.5,74.3,77.15,78.1,6.0,40.0,8.187307,31.0,33.175,39.7,46.9,49.3,6.0,56.033333,7.327937,47.4,49.8,57.0,62.025,63.7,2008-fall
3,paradise,2008,summer,4.0,8.44,5.773503,3.44,3.44,8.44,13.44,13.44,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,82.5,0.69282,81.9,81.9,82.5,83.1,83.1,4.0,58.45,0.750555,57.8,57.8,58.45,59.1,59.1,4.0,70.5,0.69282,69.9,69.9,70.5,71.1,71.1,2008-summer
4,paradise,2008,winter,2.0,1.2,0.0,1.2,1.2,1.2,1.2,1.2,2.0,1.3,0.0,1.3,1.3,1.3,1.3,1.3,2.0,55.3,0.0,55.3,55.3,55.3,55.3,55.3,2.0,26.9,0.0,26.9,26.9,26.9,26.9,26.9,2.0,41.1,0.0,41.1,41.1,41.1,41.1,41.1,2008-winter
5,paradise,2009,fall,6.0,1.65,0.488999,1.02,1.2475,1.93,1.9825,2.0,6.0,1.933333,2.995107,0.0,0.0,0.0,4.35,5.8,6.0,72.6,7.553013,64.7,66.425,71.6,79.025,81.5,6.0,41.266667,8.586656,31.7,34.075,41.2,48.475,50.9,6.0,56.933333,8.060438,48.2,50.25,56.4,63.75,66.2,2009-fall
6,paradise,2009,spring,6.0,0.466667,0.368492,0.0,0.155,0.62,0.74,0.78,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,73.266667,7.555837,66.4,67.475,70.7,79.7,82.7,6.0,39.666667,7.353004,33.5,34.25,36.5,45.875,49.0,6.0,56.5,7.457077,50.0,50.9,53.6,62.825,65.9,2009-spring
7,paradise,2009,summer,6.0,1.97,1.042036,0.66,1.085,2.36,2.7575,2.89,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,87.6,2.176235,84.8,85.8,88.8,89.1,89.2,6.0,57.433333,3.820297,52.6,54.2,59.0,60.275,60.7,6.0,72.533333,3.009762,68.7,70.0,73.9,74.725,75.0,2009-summer
8,paradise,2009,winter,6.0,1.106667,0.539728,0.7,0.73,0.82,1.555,1.8,6.0,5.4,3.547393,1.0,2.375,6.5,8.15,8.7,6.0,55.8,5.822027,49.1,50.875,56.2,60.625,62.1,6.0,25.433333,0.859457,24.4,24.7,25.6,26.125,26.3,6.0,40.6,3.362142,36.7,37.75,40.9,43.375,44.2,2009-winter
9,paradise,2010,fall,6.0,0.743333,0.545698,0.14,0.2875,0.73,1.2025,1.36,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,72.833333,9.237027,61.7,64.95,74.7,80.25,82.1,6.0,40.9,11.196071,27.7,31.375,42.4,50.05,52.6,6.0,56.866667,10.230282,44.7,48.15,58.5,65.175,67.4,2010-fall


In [49]:
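# Columns are selected with a list; tuple-style selection on a groupby fails in recent pandas versions
df_annual = pd.DataFrame(df.groupby(['source','year'])[['PRCP','SNOW','TMAX','TMIN','TAVG']].describe())[1:-1]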
df_annual.columns = [' '.join(col).strip() for col in df_annual.columns.values]
df_annual = df_annual.reset_index().sort_values('year')
df_annual

Unnamed: 0,source,year,PRCP count,PRCP mean,PRCP std,PRCP min,PRCP 25%,PRCP 50%,PRCP 75%,PRCP max,SNOW count,SNOW mean,SNOW std,SNOW min,SNOW 25%,SNOW 50%,SNOW 75%,SNOW max,TMAX count,TMAX mean,TMAX std,TMAX min,TMAX 25%,TMAX 50%,TMAX 75%,TMAX max,TMIN count,TMIN mean,TMIN std,TMIN min,TMIN 25%,TMIN 50%,TMIN 75%,TMIN max,TAVG count,TAVG mean,TAVG std,TAVG min,TAVG 25%,TAVG 50%,TAVG 75%,TAVG max
10,portal,2000,22.0,1.713636,2.417855,0.0,0.0125,0.52,2.255,7.81,22.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,22.0,71.5,12.366237,54.6,60.75,67.7,85.075,86.5,22.0,37.536364,12.363562,23.1,26.375,35.4,50.975,55.6,22.0,54.518182,12.108936,39.5,43.05,54.3,68.15,71.0
11,portal,2001,24.0,1.295,1.136918,0.37,0.5275,0.64,1.865,3.81,24.0,0.95,2.202173,0.0,0.0,0.0,0.0,6.6,22.0,69.954545,13.110618,50.7,58.2,72.0,82.05,88.1,20.0,35.33,10.755958,20.9,26.5,34.6,43.0,54.2,20.0,51.75,11.449592,36.6,41.6,50.95,62.6,69.1
12,portal,2002,24.0,1.029167,1.373719,0.0,0.0525,0.645,1.31,5.07,24.0,0.241667,0.570977,0.0,0.0,0.0,0.0,1.8,24.0,72.183333,13.329721,50.7,61.825,73.15,82.925,91.5,24.0,38.091667,12.445181,23.3,27.525,37.75,50.35,57.8,24.0,55.133333,12.67276,37.2,44.65,55.5,66.5,72.4
13,portal,2003,24.0,0.963333,0.934483,0.05,0.34,0.5,1.725,2.57,24.0,0.091667,0.310563,0.0,0.0,0.0,0.0,1.1,24.0,73.141667,12.824261,55.0,62.05,74.1,83.85,92.2,24.0,38.225,12.228913,21.2,28.025,37.5,48.925,56.4,24.0,55.691667,12.416081,38.1,44.7,55.8,66.8,74.2
14,portal,2004,24.0,1.7575,1.011689,0.5,0.9175,1.62,2.275,3.83,22.0,0.409091,0.946628,0.0,0.0,0.0,0.0,3.0,20.0,66.55,11.356635,52.6,55.0,67.85,78.4,82.4,20.0,35.72,11.321038,22.6,26.2,34.35,45.5,53.7,18.0,49.333333,10.37531,37.7,39.4,50.6,54.6,67.4
15,portal,2005,24.0,1.884167,1.826074,0.0,0.2075,1.135,3.3525,4.89,20.0,0.27,0.831042,0.0,0.0,0.0,0.0,2.7,22.0,73.727273,12.896075,53.5,62.875,73.7,83.15,91.2,22.0,39.409091,10.783537,26.6,29.9,37.6,49.275,55.3,22.0,56.572727,11.489299,40.8,45.85,55.0,67.65,73.3
16,portal,2006,22.0,2.144545,2.788276,0.12,0.2825,0.77,4.205,8.59,22.0,0.409091,1.324102,0.0,0.0,0.0,0.0,4.5,20.0,73.21,12.571769,53.5,63.7,73.5,84.9,91.1,18.0,40.766667,12.577665,22.0,30.9,39.5,53.0,58.1,18.0,58.077778,11.59947,40.8,47.3,57.0,67.7,73.2
17,portal,2007,24.0,1.8075,1.752842,0.03,0.485,1.135,2.815,5.76,24.0,1.175,3.115773,0.0,0.0,0.0,0.25,11.1,22.0,71.681818,13.7122,47.5,61.075,76.0,82.925,88.4,22.0,39.245455,13.280875,21.8,25.675,39.7,50.7,57.5,22.0,55.472727,13.321239,34.7,43.375,57.8,68.65,72.4
18,portal,2008,22.0,1.741818,2.941962,0.0,0.005,0.61,1.8175,9.91,20.0,0.1,0.307794,0.0,0.0,0.0,0.0,1.0,20.0,74.04,9.95328,56.0,66.5,75.6,81.1,89.4,20.0,40.66,11.642138,26.3,27.7,39.7,50.7,57.1,20.0,57.35,10.455746,41.8,47.1,58.0,68.8,70.2
0,paradise,2008,12.0,3.453333,4.786191,0.27,0.57,1.5,3.44,13.44,12.0,0.216667,0.506024,0.0,0.0,0.0,0.0,1.3,12.0,72.766667,10.48543,55.3,63.9,76.2,81.9,83.1,12.0,43.966667,12.978537,26.9,31.0,44.5,57.8,59.1,12.0,58.366667,11.62969,41.1,47.4,60.35,69.9,71.1


## Population Size

Can we predict the change in population size using the previous year's weather?
First, let's make a new data set that will allow us to visualize the potential relationship between precipitation and population size.

In [50]:
df_reg_annual = df_annual.merge(df_pop.loc[df_pop.sex=='f'].drop(columns=['propMale','sex','liznumber']),on = ['year'],how='left')
df_reg_annual.head()

Unnamed: 0,source,year,PRCP count,PRCP mean,PRCP std,PRCP min,PRCP 25%,PRCP 50%,PRCP 75%,PRCP max,SNOW count,SNOW mean,SNOW std,SNOW min,SNOW 25%,SNOW 50%,SNOW 75%,SNOW max,TMAX count,TMAX mean,TMAX std,TMAX min,TMAX 25%,TMAX 50%,TMAX 75%,TMAX max,TMIN count,TMIN mean,TMIN std,TMIN min,TMIN 25%,TMIN 50%,TMIN 75%,TMIN max,TAVG count,TAVG mean,TAVG std,TAVG min,TAVG 25%,TAVG 50%,TAVG 75%,TAVG max,liznumberYear,propFemale
0,portal,2000,22.0,1.713636,2.417855,0.0,0.0125,0.52,2.255,7.81,22.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,22.0,71.5,12.366237,54.6,60.75,67.7,85.075,86.5,22.0,37.536364,12.363562,23.1,26.375,35.4,50.975,55.6,22.0,54.518182,12.108936,39.5,43.05,54.3,68.15,71.0,153,0.54902
1,portal,2001,24.0,1.295,1.136918,0.37,0.5275,0.64,1.865,3.81,24.0,0.95,2.202173,0.0,0.0,0.0,0.0,6.6,22.0,69.954545,13.110618,50.7,58.2,72.0,82.05,88.1,20.0,35.33,10.755958,20.9,26.5,34.6,43.0,54.2,20.0,51.75,11.449592,36.6,41.6,50.95,62.6,69.1,135,0.533333
2,portal,2002,24.0,1.029167,1.373719,0.0,0.0525,0.645,1.31,5.07,24.0,0.241667,0.570977,0.0,0.0,0.0,0.0,1.8,24.0,72.183333,13.329721,50.7,61.825,73.15,82.925,91.5,24.0,38.091667,12.445181,23.3,27.525,37.75,50.35,57.8,24.0,55.133333,12.67276,37.2,44.65,55.5,66.5,72.4,119,0.563025
3,portal,2003,24.0,0.963333,0.934483,0.05,0.34,0.5,1.725,2.57,24.0,0.091667,0.310563,0.0,0.0,0.0,0.0,1.1,24.0,73.141667,12.824261,55.0,62.05,74.1,83.85,92.2,24.0,38.225,12.228913,21.2,28.025,37.5,48.925,56.4,24.0,55.691667,12.416081,38.1,44.7,55.8,66.8,74.2,97,0.556701
4,portal,2004,24.0,1.7575,1.011689,0.5,0.9175,1.62,2.275,3.83,22.0,0.409091,0.946628,0.0,0.0,0.0,0.0,3.0,20.0,66.55,11.356635,52.6,55.0,67.85,78.4,82.4,20.0,35.72,11.321038,22.6,26.2,34.35,45.5,53.7,18.0,49.333333,10.37531,37.7,39.4,50.6,54.6,67.4,70,0.542857


In [53]:
df_reg_season = df_season.merge(df_pop.loc[df_pop.sex=='f'].drop(columns=['propMale','sex','liznumber']),on = ['year'],how='left')
df_reg_season.head()

Unnamed: 0,source,year,season,PRCP count,PRCP mean,PRCP std,PRCP min,PRCP 25%,PRCP 50%,PRCP 75%,PRCP max,SNOW count,SNOW mean,SNOW std,SNOW min,SNOW 25%,SNOW 50%,SNOW 75%,SNOW max,TMAX count,TMAX mean,TMAX std,TMAX min,TMAX 25%,TMAX 50%,TMAX 75%,TMAX max,TMIN count,TMIN mean,TMIN std,TMIN min,TMIN 25%,TMIN 50%,TMIN 75%,TMIN max,TAVG count,TAVG mean,TAVG std,TAVG min,TAVG 25%,TAVG 50%,TAVG 75%,TAVG max,year-season,liznumberYear,propFemale
0,paradise,2007,summer,4.0,3.865,1.102739,2.91,2.91,3.865,4.82,4.82,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,86.5,3.002221,83.9,83.9,86.5,89.1,89.1,4.0,59.05,0.173205,58.9,58.9,59.05,59.2,59.2,4.0,72.8,1.616581,71.4,71.4,72.8,74.2,74.2,2007-summer,94,0.574468
1,paradise,2007,winter,2.0,2.51,0.0,2.51,2.51,2.51,2.51,2.51,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,51.3,0.0,51.3,51.3,51.3,51.3,51.3,2.0,23.8,0.0,23.8,23.8,23.8,23.8,23.8,2.0,37.6,0.0,37.6,37.6,37.6,37.6,37.6,2007-winter,94,0.574468
2,paradise,2008,fall,6.0,0.88,0.725148,0.27,0.345,0.57,1.4925,1.8,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,72.1,6.575105,63.9,66.5,74.3,77.15,78.1,6.0,40.0,8.187307,31.0,33.175,39.7,46.9,49.3,6.0,56.033333,7.327937,47.4,49.8,57.0,62.025,63.7,2008-fall,88,0.5
3,paradise,2008,summer,4.0,8.44,5.773503,3.44,3.44,8.44,13.44,13.44,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,82.5,0.69282,81.9,81.9,82.5,83.1,83.1,4.0,58.45,0.750555,57.8,57.8,58.45,59.1,59.1,4.0,70.5,0.69282,69.9,69.9,70.5,71.1,71.1,2008-summer,88,0.5
4,paradise,2008,winter,2.0,1.2,0.0,1.2,1.2,1.2,1.2,1.2,2.0,1.3,0.0,1.3,1.3,1.3,1.3,1.3,2.0,55.3,0.0,55.3,55.3,55.3,55.3,55.3,2.0,26.9,0.0,26.9,26.9,26.9,26.9,26.9,2.0,41.1,0.0,41.1,41.1,41.1,41.1,41.1,2008-winter,88,0.5


In [54]:
# To do: drop paradise (not done in this cell)
# shift(-k) takes liznumberYear from k rows further down within each source,
# i.e. the population size k years later when there is one row per source and year
for k in range(1, 6):
    df_reg_annual[f'popinYearless{k}'] = df_reg_annual.groupby('source').liznumberYear.shift(-k)
df_reg_annual

Unnamed: 0,source,year,PRCP count,PRCP mean,PRCP std,PRCP min,PRCP 25%,PRCP 50%,PRCP 75%,PRCP max,SNOW count,SNOW mean,SNOW std,SNOW min,SNOW 25%,SNOW 50%,SNOW 75%,SNOW max,TMAX count,TMAX mean,TMAX std,TMAX min,TMAX 25%,TMAX 50%,TMAX 75%,TMAX max,TMIN count,TMIN mean,TMIN std,TMIN min,TMIN 25%,TMIN 50%,TMIN 75%,TMIN max,TAVG count,TAVG mean,TAVG std,TAVG min,TAVG 25%,TAVG 50%,TAVG 75%,TAVG max,liznumberYear,propFemale,popinYearless1,popinYearless2,popinYearless3,popinYearless4,popinYearless5
0,portal,2000,22.0,1.713636,2.417855,0.0,0.0125,0.52,2.255,7.81,22.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,22.0,71.5,12.366237,54.6,60.75,67.7,85.075,86.5,22.0,37.536364,12.363562,23.1,26.375,35.4,50.975,55.6,22.0,54.518182,12.108936,39.5,43.05,54.3,68.15,71.0,153,0.54902,135.0,119.0,97.0,70.0,79.0
1,portal,2001,24.0,1.295,1.136918,0.37,0.5275,0.64,1.865,3.81,24.0,0.95,2.202173,0.0,0.0,0.0,0.0,6.6,22.0,69.954545,13.110618,50.7,58.2,72.0,82.05,88.1,20.0,35.33,10.755958,20.9,26.5,34.6,43.0,54.2,20.0,51.75,11.449592,36.6,41.6,50.95,62.6,69.1,135,0.533333,119.0,97.0,70.0,79.0,66.0
2,portal,2002,24.0,1.029167,1.373719,0.0,0.0525,0.645,1.31,5.07,24.0,0.241667,0.570977,0.0,0.0,0.0,0.0,1.8,24.0,72.183333,13.329721,50.7,61.825,73.15,82.925,91.5,24.0,38.091667,12.445181,23.3,27.525,37.75,50.35,57.8,24.0,55.133333,12.67276,37.2,44.65,55.5,66.5,72.4,119,0.563025,97.0,70.0,79.0,66.0,94.0
3,portal,2003,24.0,0.963333,0.934483,0.05,0.34,0.5,1.725,2.57,24.0,0.091667,0.310563,0.0,0.0,0.0,0.0,1.1,24.0,73.141667,12.824261,55.0,62.05,74.1,83.85,92.2,24.0,38.225,12.228913,21.2,28.025,37.5,48.925,56.4,24.0,55.691667,12.416081,38.1,44.7,55.8,66.8,74.2,97,0.556701,70.0,79.0,66.0,94.0,88.0
4,portal,2004,24.0,1.7575,1.011689,0.5,0.9175,1.62,2.275,3.83,22.0,0.409091,0.946628,0.0,0.0,0.0,0.0,3.0,20.0,66.55,11.356635,52.6,55.0,67.85,78.4,82.4,20.0,35.72,11.321038,22.6,26.2,34.35,45.5,53.7,18.0,49.333333,10.37531,37.7,39.4,50.6,54.6,67.4,70,0.542857,79.0,66.0,94.0,88.0,105.0
5,portal,2005,24.0,1.884167,1.826074,0.0,0.2075,1.135,3.3525,4.89,20.0,0.27,0.831042,0.0,0.0,0.0,0.0,2.7,22.0,73.727273,12.896075,53.5,62.875,73.7,83.15,91.2,22.0,39.409091,10.783537,26.6,29.9,37.6,49.275,55.3,22.0,56.572727,11.489299,40.8,45.85,55.0,67.65,73.3,79,0.455696,66.0,94.0,88.0,105.0,54.0
6,portal,2006,22.0,2.144545,2.788276,0.12,0.2825,0.77,4.205,8.59,22.0,0.409091,1.324102,0.0,0.0,0.0,0.0,4.5,20.0,73.21,12.571769,53.5,63.7,73.5,84.9,91.1,18.0,40.766667,12.577665,22.0,30.9,39.5,53.0,58.1,18.0,58.077778,11.59947,40.8,47.3,57.0,67.7,73.2,66,0.545455,94.0,88.0,105.0,54.0,45.0
7,portal,2007,24.0,1.8075,1.752842,0.03,0.485,1.135,2.815,5.76,24.0,1.175,3.115773,0.0,0.0,0.0,0.25,11.1,22.0,71.681818,13.7122,47.5,61.075,76.0,82.925,88.4,22.0,39.245455,13.280875,21.8,25.675,39.7,50.7,57.5,22.0,55.472727,13.321239,34.7,43.375,57.8,68.65,72.4,94,0.574468,88.0,105.0,54.0,45.0,51.0
8,portal,2008,22.0,1.741818,2.941962,0.0,0.005,0.61,1.8175,9.91,20.0,0.1,0.307794,0.0,0.0,0.0,0.0,1.0,20.0,74.04,9.95328,56.0,66.5,75.6,81.1,89.4,20.0,40.66,11.642138,26.3,27.7,39.7,50.7,57.1,20.0,57.35,10.455746,41.8,47.1,58.0,68.8,70.2,88,0.5,105.0,54.0,45.0,51.0,55.0
9,paradise,2008,12.0,3.453333,4.786191,0.27,0.57,1.5,3.44,13.44,12.0,0.216667,0.506024,0.0,0.0,0.0,0.0,1.3,12.0,72.766667,10.48543,55.3,63.9,76.2,81.9,83.1,12.0,43.966667,12.978537,26.9,31.0,44.5,57.8,59.1,12.0,58.366667,11.62969,41.1,47.4,60.35,69.9,71.1,88,0.5,105.0,54.0,45.0,51.0,55.0


In [55]:
# To do: drop paradise (not done in this cell)
# Caution: shift(-k) moves k rows, not k years. With several season rows per
# source and year, these columns do not line up with calendar years.
for k in range(1, 6):
    df_reg_season[f'popinYearless{k}'] = df_reg_season.groupby('source').liznumberYear.shift(-k)
df_reg_season

Unnamed: 0,source,year,season,PRCP count,PRCP mean,PRCP std,PRCP min,PRCP 25%,PRCP 50%,PRCP 75%,PRCP max,SNOW count,SNOW mean,SNOW std,SNOW min,SNOW 25%,SNOW 50%,SNOW 75%,SNOW max,TMAX count,TMAX mean,TMAX std,TMAX min,TMAX 25%,TMAX 50%,...,TMAX max,TMIN count,TMIN mean,TMIN std,TMIN min,TMIN 25%,TMIN 50%,TMIN 75%,TMIN max,TAVG count,TAVG mean,TAVG std,TAVG min,TAVG 25%,TAVG 50%,TAVG 75%,TAVG max,year-season,liznumberYear,propFemale,popinYearless1,popinYearless2,popinYearless3,popinYearless4,popinYearless5
0,paradise,2007,summer,4.0,3.865,1.102739,2.91,2.91,3.865,4.82,4.82,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,86.5,3.002221,83.9,83.9,86.5,...,89.1,4.0,59.05,0.173205,58.9,58.9,59.05,59.2,59.2,4.0,72.8,1.616581,71.4,71.4,72.8,74.2,74.2,2007-summer,94,0.574468,94.0,88.0,88.0,88.0,105.0
1,paradise,2007,winter,2.0,2.51,0.0,2.51,2.51,2.51,2.51,2.51,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,51.3,0.0,51.3,51.3,51.3,...,51.3,2.0,23.8,0.0,23.8,23.8,23.8,23.8,23.8,2.0,37.6,0.0,37.6,37.6,37.6,37.6,37.6,2007-winter,94,0.574468,88.0,88.0,88.0,105.0,105.0
2,paradise,2008,fall,6.0,0.88,0.725148,0.27,0.345,0.57,1.4925,1.8,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,72.1,6.575105,63.9,66.5,74.3,...,78.1,6.0,40.0,8.187307,31.0,33.175,39.7,46.9,49.3,6.0,56.033333,7.327937,47.4,49.8,57.0,62.025,63.7,2008-fall,88,0.5,88.0,88.0,105.0,105.0,105.0
3,paradise,2008,summer,4.0,8.44,5.773503,3.44,3.44,8.44,13.44,13.44,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,82.5,0.69282,81.9,81.9,82.5,...,83.1,4.0,58.45,0.750555,57.8,57.8,58.45,59.1,59.1,4.0,70.5,0.69282,69.9,69.9,70.5,71.1,71.1,2008-summer,88,0.5,88.0,105.0,105.0,105.0,105.0
4,paradise,2008,winter,2.0,1.2,0.0,1.2,1.2,1.2,1.2,1.2,2.0,1.3,0.0,1.3,1.3,1.3,1.3,1.3,2.0,55.3,0.0,55.3,55.3,55.3,...,55.3,2.0,26.9,0.0,26.9,26.9,26.9,26.9,26.9,2.0,41.1,0.0,41.1,41.1,41.1,41.1,41.1,2008-winter,88,0.5,105.0,105.0,105.0,105.0,54.0
5,paradise,2009,fall,6.0,1.65,0.488999,1.02,1.2475,1.93,1.9825,2.0,6.0,1.933333,2.995107,0.0,0.0,0.0,4.35,5.8,6.0,72.6,7.553013,64.7,66.425,71.6,...,81.5,6.0,41.266667,8.586656,31.7,34.075,41.2,48.475,50.9,6.0,56.933333,8.060438,48.2,50.25,56.4,63.75,66.2,2009-fall,105,0.514286,105.0,105.0,105.0,54.0,54.0
6,paradise,2009,spring,6.0,0.466667,0.368492,0.0,0.155,0.62,0.74,0.78,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,73.266667,7.555837,66.4,67.475,70.7,...,82.7,6.0,39.666667,7.353004,33.5,34.25,36.5,45.875,49.0,6.0,56.5,7.457077,50.0,50.9,53.6,62.825,65.9,2009-spring,105,0.514286,105.0,105.0,54.0,54.0,54.0
7,paradise,2009,summer,6.0,1.97,1.042036,0.66,1.085,2.36,2.7575,2.89,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,87.6,2.176235,84.8,85.8,88.8,...,89.2,6.0,57.433333,3.820297,52.6,54.2,59.0,60.275,60.7,6.0,72.533333,3.009762,68.7,70.0,73.9,74.725,75.0,2009-summer,105,0.514286,105.0,54.0,54.0,54.0,54.0
8,paradise,2009,winter,6.0,1.106667,0.539728,0.7,0.73,0.82,1.555,1.8,6.0,5.4,3.547393,1.0,2.375,6.5,8.15,8.7,6.0,55.8,5.822027,49.1,50.875,56.2,...,62.1,6.0,25.433333,0.859457,24.4,24.7,25.6,26.125,26.3,6.0,40.6,3.362142,36.7,37.75,40.9,43.375,44.2,2009-winter,105,0.514286,54.0,54.0,54.0,54.0,45.0
9,paradise,2010,fall,6.0,0.743333,0.545698,0.14,0.2875,0.73,1.2025,1.36,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0,72.833333,9.237027,61.7,64.95,74.7,...,82.1,6.0,40.9,11.196071,27.7,31.375,42.4,50.05,52.6,6.0,56.866667,10.230282,44.7,48.15,58.5,65.175,67.4,2010-fall,54,0.5,54.0,54.0,54.0,45.0,45.0
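
In the season-level output above, the row for summer 2007 gets 94 as its one-step lead, which is still the 2007 population. One way to tie the leads to calendar years instead of row positions is to look the population up by `year + k`. The sketch below assumes one population value per year; the helper name `add_year_leads` is hypothetical and the small frames only reuse a few year-level values visible in the output above.

```python
import pandas as pd

# Year-level population counts as they appear in the liznumberYear column above
pop = pd.DataFrame({'year': [2007, 2008, 2009, 2010],
                    'liznumberYear': [94, 88, 105, 54]})
season = pd.DataFrame({'year': [2007, 2007, 2008, 2008, 2008],
                       'season': ['summer', 'winter', 'fall', 'summer', 'winter']})

def add_year_leads(weather, population, max_lead=5, value='liznumberYear'):
    """Attach the population size k calendar years after each weather row."""
    out = weather.copy()
    lookup = population.set_index('year')[value]   # assumes one row per year
    for k in range(1, max_lead + 1):
        # Years without a population record map to NaN
        out[f'popinYearless{k}'] = (out['year'] + k).map(lookup)
    return out

leads = add_year_leads(season, pop, max_lead=2)
print(leads)
```

With this approach every season row of a given year receives the same lead values, regardless of how many seasons that year has.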


In [57]:
#Dropping proportion of Females, but will put it back once I can order the y-axis
corrPortal_annual = df_reg_annual.loc[(df_reg_annual.source=='portal')]\
.drop(columns=['PRCP count', 'SNOW count', 'TMAX count', 'TMIN count', 'TAVG count','propFemale',
              'SNOW min', 'SNOW 25%', 'SNOW 50%',]).corr()
testx = corrPortal_annual.columns
testy = corrPortal_annual.index
testz = corrPortal_annual.values
test = go.Figure(go.Heatmap(x=testx,y=testy,z=testz))
plot(test, filename = 'portal annual correlation matrix.html')
iplot(test, filename = 'portal annual correlation matrix.html')

## Seasons

Spring

In [62]:
corrPortal_season_spring = df_reg_season.loc[(df_reg_season.source=='portal')&(df_reg_season.season.isin(['spring']))]\
.drop(columns=['PRCP count', 'SNOW count', 'TMAX count', 'TMIN count', 'TAVG count','propFemale',
              'SNOW min', 'SNOW 25%', 'SNOW 50%',]).corr()
testx = corrPortal_season_spring.columns
testy = corrPortal_season_spring.index
testz = corrPortal_season_spring.values
test = go.Figure(go.Heatmap(x=testx,y=testy,z=testz))
plot(test, filename = 'portal spring correlation matrix.html')
iplot(test, filename = 'portal spring correlation matrix.html')

Summer

In [60]:
corrPortal_season_summer = df_reg_season.loc[(df_reg_season.source=='portal')&(df_reg_season.season.isin(['summer']))]\
.drop(columns=['PRCP count', 'SNOW count', 'TMAX count', 'TMIN count', 'TAVG count','propFemale',
              'SNOW min', 'SNOW 25%', 'SNOW 50%',]).corr()
testx = corrPortal_season_summer.columns
testy = corrPortal_season_summer.index
testz = corrPortal_season_summer.values
test = go.Figure(go.Heatmap(x=testx,y=testy,z=testz))
plot(test, filename = 'portal summer correlation matrix.html')
iplot(test, filename = 'portal summer correlation matrix.html')

Fall

In [61]:
corrPortal_season_fall = df_reg_season.loc[(df_reg_season.source=='portal')&(df_reg_season.season.isin(['fall']))]\
.drop(columns=['PRCP count', 'SNOW count', 'TMAX count', 'TMIN count', 'TAVG count','propFemale',
              'SNOW min', 'SNOW 25%', 'SNOW 50%',]).corr()
testx = corrPortal_season_fall.columns
testy = corrPortal_season_fall.index
testz = corrPortal_season_fall.values
test = go.Figure(go.Heatmap(x=testx,y=testy,z=testz))
plot(test, filename = 'portal fall correlation matrix.html')
iplot(test, filename = 'portal fall correlation matrix.html')

Winter

In [59]:
corrPortal_season_winter = df_reg_season.loc[(df_reg_season.source=='portal')&(df_reg_season.season.isin(['winter']))]\
.drop(columns=['PRCP count', 'SNOW count', 'TMAX count', 'TMIN count', 'TAVG count','propFemale',
              'SNOW min', 'SNOW 25%', 'SNOW 50%',]).corr()
testx = corrPortal_season_winter.columns
testy = corrPortal_season_winter.index
testz = corrPortal_season_winter.values
test = go.Figure(go.Heatmap(x=testx,y=testy,z=testz))
plot(test, filename = 'portal winter correlation matrix.html')
iplot(test, filename = 'portal winter correlation matrix.html')
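
The annual and seasonal cells above repeat the same filter, drop and correlate steps. A small helper could hold that logic in one place. This is a sketch, not the notebook's own code: the name `correlation_matrix` is hypothetical, the demo rows are toy values for illustration only, and `select_dtypes` is used so that string columns such as `source` never reach `corr`, which newer pandas versions reject.

```python
import pandas as pd

DROP_COLS = ['PRCP count', 'SNOW count', 'TMAX count', 'TMIN count', 'TAVG count',
             'propFemale', 'SNOW min', 'SNOW 25%', 'SNOW 50%']

def correlation_matrix(data, source, season=None, drop_cols=DROP_COLS):
    """Correlation matrix of the numeric columns for one source and, optionally, one season."""
    mask = data['source'] == source
    if season is not None:
        mask = mask & (data['season'] == season)
    # errors='ignore' lets the helper work even if some listed columns are absent
    subset = data.loc[mask].drop(columns=drop_cols, errors='ignore')
    return subset.select_dtypes('number').corr()

# Toy rows, for illustration only
demo = pd.DataFrame({
    'source': ['portal'] * 4 + ['paradise'],
    'season': ['spring', 'spring', 'spring', 'fall', 'spring'],
    'TMIN mean': [37.5, 35.3, 38.1, 38.2, 44.0],
    'liznumberYear': [153, 135, 119, 97, 88],
    'PRCP count': [22, 24, 24, 24, 12],
})
corr = correlation_matrix(demo, 'portal', season='spring')
print(corr)
```

The result can be handed to `go.Heatmap` in the same way as in the cells above, with `x=corr.columns`, `y=corr.index` and `z=corr.values`.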

- Correlations with temperature metrics were the most strongly negative for population size in a given year; the strongest was -0.62 for the mean minimum temperature, 'TMIN mean'.
- None of the precipitation metrics appears to be strongly negatively correlated with drops in population size. Several temperature metrics are more strongly correlated, with the average minimum temperature for the year being the most strongly negative.
- In most cases the correlation between temperature or precipitation and population size is strongest in the same year, but there are some notable exceptions:
    - 'PRCP 25%' was 5X more negatively correlated in the following year than in the previous one
    - 'PRCP 50%' was more negatively correlated in the \_\_ year, by ~25%
    - 'SNOW 75%' was much more negatively correlated in the 3rd year, by almost 10X
    - Max temperature metrics also showed a stronger correlation with later years (expound)
    - The lowest minimum temperature, 'TMIN', was over 3X more negatively correlated in the second subsequent year than in the year the temperature was recorded
    - The lowest average temperature, 'TAVG', also showed a negative correlation with population size that was over 1.5 times as high in the second subsequent year. Note, however, that in the year immediately following, the strength of that correlation had dropped to less than half of what it was in the year the temperature was recorded
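
Comparisons like these are easier to read from a single table of correlations by lag than from separate heatmaps. The sketch below computes Pearson's r between one weather metric and the population size 0 to 5 years later. The function name `lagged_correlations` is hypothetical. The values are copied from the portal rows of the annual table shown earlier and cover only 2000-2008, so the printed numbers will not necessarily match the figures quoted in the list above.

```python
import pandas as pd

# 'TMIN mean' for portal, 2000-2008, as displayed in the annual table
weather = pd.DataFrame({
    'year': list(range(2000, 2009)),
    'TMIN mean': [37.536364, 35.33, 38.091667, 38.225, 35.72,
                  39.409091, 40.766667, 39.245455, 40.66],
})
# liznumberYear by year, 2000-2013, read off the population columns displayed above
pop_by_year = dict(zip(range(2000, 2014),
                       [153, 135, 119, 97, 70, 79, 66, 94, 88, 105, 54, 45, 51, 55]))

def lagged_correlations(weather, metric, pop_by_year, max_lag=5):
    """Pearson r between a weather metric and population size 0..max_lag years later."""
    rows = {}
    for lag in range(max_lag + 1):
        pop = (weather['year'] + lag).map(pop_by_year)   # NaN where no record exists
        rows[lag] = weather[metric].corr(pop)            # NaN pairs are skipped
    return pd.Series(rows, name=metric)

lag_table = lagged_correlations(weather, 'TMIN mean', pop_by_year)
print(lag_table)
```

Running the same function over every metric column and concatenating the results would give one row per metric and one column per lag.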

# Resume Here

[Back to TOC](#Table-of-Contents)

We need to model this with regression:
- two predictors: population in a given year and weather in that year
- dependent variable (DV): population in year 2

Train a model using captured juveniles to predict age class; include weather.
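
As a starting point for the regression described above, the sketch below fits an ordinary least-squares model with two predictors using NumPy only. It rests on several assumptions: 'TMIN mean' stands in for "weather", the dependent variable is taken to be the population in the following year, and the nine data points are the portal rows for 2000-2008 from the annual table. It illustrates the mechanics and is not a fitted result to report.

```python
import numpy as np

# Portal, 2000-2008: population in year t, 'TMIN mean' in year t, population in year t+1
pop_t = np.array([153, 135, 119, 97, 70, 79, 66, 94, 88], dtype=float)
tmin_t = np.array([37.536364, 35.33, 38.091667, 38.225, 35.72,
                   39.409091, 40.766667, 39.245455, 40.66])
pop_next = np.array([135, 119, 97, 70, 79, 66, 94, 88, 105], dtype=float)

# Design matrix with an intercept column followed by the two predictors
X = np.column_stack([np.ones_like(pop_t), pop_t, tmin_t])
coef, _, rank, _ = np.linalg.lstsq(X, pop_next, rcond=None)

fitted = X @ coef
ss_res = np.sum((pop_next - fitted) ** 2)
ss_tot = np.sum((pop_next - pop_next.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(dict(zip(['intercept', 'pop_t', 'tmin_t'], coef)))
print('R-squared:', r_squared)
```

With so few observations the estimates will be unstable; a statistics package that reports standard errors would be the natural next step once the full 2000-2017 data are used.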

In [None]:
import numpy as np
import pingouin as pg
from scipy import stats

In [None]:
var = 'TMIN mean' 
# df_reg is not defined above; the annual table df_reg_annual is assumed here
r,p = stats.pearsonr(df_reg_annual.liznumberYear,df_reg_annual[var])
print('{}: r={}; p={}'.format(var,r,p))

In [None]:
var = 'TMIN mean' 
# df_reg is not defined above; the annual table df_reg_annual is assumed here
slope, intercept, r_value, p_value, std_err = stats.linregress(df_reg_annual.liznumberYear,df_reg_annual[var])
print("slope: {}    intercept: {}".format(slope, intercept))
# slope: 1.944864    intercept: 0.268578
print("R-squared: {}".format(r_value**2))

In [None]:
aov = pg.anova(data=df, dv='liznumberYear', between='group', detailed=True)
print(aov)

## Growth

## Sex Ratio