# Table of Contents
1. [Things to Do](#Things-to-Do)
2. [Introduction](#Introduction)
3. [Set up Python](#Set-up-Python)
4. [Functions](#Functions)
5. [Get Data](#Get-Data)
6. [Analyze Data](#Analyze-Data)
7. [Export Files](#Export-Files)

# Things to Do


- [Resume Here](#Resume-Here)

## Introduction

This notebook contains code and output of descriptive analyses for the 2000-2017 CC dataset after cleaning.

The objectives of this notebook are to:

- describe relationships between weather, particularly precipitation and temperature, and changes in population size 
- describe relationships between weather, particularly precipitation and temperature, and changes in population demographics

The metrics we examine are: .




##  Set up Python

First, we set up the Python environment by importing the necessary packages and setting the display options.

[Top](#Table-of-Contents)

In [1]:
import pandas as pd
import numpy as np
import os, glob, logging
from summary_functions import *
from scipy import stats
from monthlit import *
from prettyprint import *


import plotly
import chart_studio.plotly as py
import plotly.figure_factory as ff
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

init_notebook_mode(connected=True)
# plotly.tools.set_config_file(world_readable=True)


# increase print limit
pd.options.display.max_rows = 99999
pd.options.display.max_columns = 50

### Setting File Locations

In [2]:
deviceDict = {'dataBig':{'source':'S:/Chris/TailDemography/TailDemography/weather data files'
                         ,'log':'S:/Chris/TailDemography/TailDemography/weather data files/logs'
                         ,'output':'S:/Chris/TailDemography/TailDemography/weather data files/outputFiles/'},
              'silverSurfer':{'source':'C:\\Users\\craga_eowcrpe\\Google Drive\\TailDemography\\weather data files/outputFiles'
                              ,'log':'C:\\Users\\craga_eowcrpe\\Google Drive\\TailDemography\\weather data files/logs'
                              ,'output':'C:\\Users\\craga_eowcrpe\\Google Drive\\TailDemography\\weather data files/outputFiles'}
              ,'dataPers':{'source':'C:/Users/Christopher/Google Drive/TailDemography/weather data files'
                           ,'log': 'C:\\Users\\craga_eowcrpe\\Google Drive\\TailDemography\\weather data files/logs'
                           ,'output':'C:/Users/Christopher/Google Drive/TailDemography/weather data files/outputFiles'}
             ,'gandolf':{'source':'C:/Users/craga/Google Drive/TailDemography/weather data files'
                           ,'log': 'C:/Users/craga/Google Drive/TailDemography/weather data files/logs'
                           ,'output':'C:/Users/craga/Google Drive/TailDemography/weather data files/outputFiles'}}

### Choose Device

In [3]:
device = deviceDict['gandolf']
device

{'source': 'C:/Users/craga/Google Drive/TailDemography/weather data files',
 'log': 'C:/Users/craga/Google Drive/TailDemography/weather data files/logs',
 'output': 'C:/Users/craga/Google Drive/TailDemography/weather data files/outputFiles'}
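Rather than editing the key by hand on each machine, the device could be chosen by checking which configured source folder exists. The sketch below is one possible approach and is not part of the original notebook; `pick_device` and the toy dictionary are hypothetical names used only for illustration.

```python
import os
import tempfile

def pick_device(device_dict):
    """Return the first key whose 'source' folder exists on this machine, else None."""
    for name, paths in device_dict.items():
        if os.path.isdir(paths['source']):
            return name
    return None

# Toy demonstration: a temporary folder stands in for a real data folder
existing = tempfile.mkdtemp()
toy_devices = {
    'elsewhere': {'source': os.path.join(existing, 'not_here')},
    'here': {'source': existing},
}
chosen = pick_device(toy_devices)
print(chosen)
```

With the real dictionary, this would be called as `pick_device(deviceDict)`, falling back to a manual choice when it returns `None`.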

# Source Data


### Logging

In [4]:
logging.basicConfig(filename=os.path.join(device['log'], 'Descriptive Analyses.log')  # os.path.join supplies the path separator
                    , filemode='a',
                    format='%(funcName)s - %(levelname)s - %(message)s - %(asctime)s', level=logging.DEBUG)

## Functions

This section contains functions that were created for this notebook.

- [distribution](#distribution) (to be deleted; `scipy.stats.describe` will be used instead)
- [monthlit](#monthlit)
- [description](#description)
- [vocab_run](#vocab_run)

### distribution
[Back to Top](#TOC)

[Back to Functions](#Functions)

*distribution* takes a series or list of numeric objects, *x*, and returns descriptive statistics of *x*: n, minimum, maximum, median, sIQR (semi-interquartile range), mean, and stdev.
    
Here are a few examples of how *distribution* works.

In [5]:
foo = [0,1,2,'r']
distribution(foo)

In [6]:
bar = [0,1,2]
distribution(bar)

Unnamed: 0,n,minimum,maximum,median,siqr,mean,stdev
0,3,0,2,1.0,0.5,1.0,1.0
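The source of *distribution* is imported from elsewhere and is not shown in this notebook. The following is only a sketch of a function that reproduces the output above; `distribution_sketch` is a hypothetical name, and returning `None` for non-numeric input is an assumption about the intended behavior.

```python
import pandas as pd

def distribution_sketch(x):
    """Return a one-row DataFrame of descriptive statistics, or None if x is not numeric."""
    try:
        s = pd.to_numeric(pd.Series(x))
    except (ValueError, TypeError):
        # Assumption: non-numeric input yields no result
        return None
    return pd.DataFrame([{
        'n': int(s.count()),
        'minimum': s.min(),
        'maximum': s.max(),
        'median': s.median(),
        # Semi-interquartile range: half the distance between the quartiles
        'siqr': (s.quantile(0.75) - s.quantile(0.25)) / 2,
        'mean': s.mean(),
        'stdev': s.std(),  # sample standard deviation (ddof=1)
    }])

result = distribution_sketch([0, 1, 2])
print(result)
```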


[Back to Functions](#Functions)

### monthlit
[Back to Top](#TOC)

[Back to Functions](#Functions)

Here are a few examples of how _monthlit_ works.

In [7]:
dates = pd.DataFrame(data={'dates':['2018-12-9','2019-8-5', '2017/7/4',np.nan,None]})
dates.dates = pd.to_datetime(dates.dates)
dates

Unnamed: 0,dates
0,2018-12-09
1,2019-08-05
2,2017-07-04
3,NaT
4,NaT


In [8]:
np.isnan(np.nan)

True

In [9]:
monthlit(dates.dates.dt.month[0])

'Dec'

In [10]:
dates.dates.dt.month.apply(monthlit)

0    Dec
1    Aug
2    Jul
3    NaN
4    NaN
Name: dates, dtype: object
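The *monthlit* module itself is not shown here. A minimal sketch consistent with the examples above might look like the following; `monthlit_sketch` and `MONTH_ABBR` are hypothetical names, and the spelling 'Sept' is an assumption inferred from the keys of the `seasons` dictionary used later in this notebook.

```python
import numpy as np
import pandas as pd

# Abbreviations are an assumption; 'Sept' follows the keys of the seasons dictionary
MONTH_ABBR = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sept', 'Oct', 'Nov', 'Dec']

def monthlit_sketch(m):
    """Map a month number (1-12) to its abbreviation; pass missing values through as NaN."""
    if pd.isna(m):
        return np.nan
    # Month numbers arrive as floats when the column contains NaT, hence int()
    return MONTH_ABBR[int(m) - 1]

months = pd.Series([12, 8, 7, np.nan])
labels = months.apply(monthlit_sketch)
print(labels.tolist())
```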

[Back to Functions](#Functions)

### description
[Back to Top](#TOC)

[Back to Functions](#Functions)

In [11]:
def description(x,variable,percentage=False):
    # Summary statistics for column `variable` of DataFrame x
    res = x[variable].describe()
    if percentage:
        # Express proportions as percentages ('count' is left unchanged)
        stat_cols = ['mean','std','min','25%','50%','75%','max']
        res[stat_cols] = res[stat_cols] * 100
    # Semi-interquartile range
    res['siqr'] = (res['75%']-res['25%'])/2
    # TODO: add a confidence-interval calculation for the mean
    res['meanCI'] = 'not calculated'
    return res
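The `meanCI` entry is still a placeholder. One possible way to fill it in, offered here only as a sketch, is a t-based confidence interval for the mean using `scipy.stats.t.interval`. The helper name `mean_ci` and the 95% default are assumptions, not the notebook's chosen method.

```python
import numpy as np
import pandas as pd
from scipy import stats

def mean_ci(values, confidence=0.95):
    """Two-sided t-based confidence interval for the mean, ignoring missing values."""
    s = pd.Series(values, dtype=float).dropna()
    n = len(s)
    if n < 2:
        # An interval cannot be estimated from fewer than two observations
        return (np.nan, np.nan)
    sem = s.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    return stats.t.interval(confidence, n - 1, loc=s.mean(), scale=sem)

low, high = mean_ci([1, 2, 3, 4, 5])
print(round(low, 3), round(high, 3))
```

Inside *description*, the result could be stored as `res['meanCI'] = mean_ci(x[variable])`, scaled by 100 when `percentage` is set.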

### vocab_run
[Back to Top](#TOC)

[Back to Functions](#Functions)

*vocab_run* takes a list, joins all but the last element with one separator, places a different separator between the penultimate and final elements, and returns the result as a string.
     :param x: a list of strings to be concatenated
     :param connector_dict: a dictionary whose keys describe the size of the list and whose values give the
     connectors that separate the list elements.
    
Here are a few examples of how *vocab_run* works.

In [12]:
print("Could you bring some {} please?".format(vocab_run(['foo','bar','stuffkins'])))

Could you bring some foo, bar and stuffkins please?


In [13]:
print("You can either have {}.  You'll have to make a choice."\
      .format(vocab_run(['foo','bar','stuffkins'],connector_dict={1: None, 2: ' or ', 'run': ', '})))

You can either have foo, bar or stuffkins.  You'll have to make a choice.
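The definition of *vocab_run* is not shown in this notebook. A minimal sketch that reproduces the two examples above could look like this; `vocab_run_sketch` is a hypothetical name, and the default `connector_dict` is an assumption inferred from the first example.

```python
def vocab_run_sketch(x, connector_dict=None):
    """Join list items with connector_dict['run'], using connector_dict[2] before the last item."""
    if connector_dict is None:
        # Assumed default, inferred from the 'foo, bar and stuffkins' example
        connector_dict = {1: None, 2: ' and ', 'run': ', '}
    items = [str(item) for item in x]
    if not items:
        return ''
    if len(items) == 1:
        return items[0]
    # Everything except the last item is joined with the 'run' connector
    head = connector_dict['run'].join(items[:-1])
    return head + connector_dict[2] + items[-1]

with_and = vocab_run_sketch(['foo', 'bar', 'stuffkins'])
with_or = vocab_run_sketch(['foo', 'bar', 'stuffkins'],
                           connector_dict={1: None, 2: ' or ', 'run': ', '})
print(with_and)
print(with_or)
```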


[Back to Functions](#Functions)

We'll list all files in the source folder whose names match the pattern `*_weather*.csv`. The file names will be saved in a variable, _mysourcefiles_.

## Get Data
[Top](#TOC)

Here we can set the locations from which we get data and to which we export it.

In [14]:
os.chdir(device['source'])
mysourcefiles = glob.glob('*_weather*.csv')
mysourcefiles

['paradise_weatherdata.csv', 'portal_weatherdata.csv']

In [15]:
def getweatherdata(afile,sourcename):
    tmp = pd.read_csv(afile)
    tmp['source'] = sourcename
    return tmp

Get weather data

In [16]:
df = pd.concat([getweatherdata(afile,afile.split('_')[0]) for afile in mysourcefiles]).drop(columns = 'Unnamed: 0')

Get population data.

In [17]:
df_pop = pd.read_csv('C:/Users/craga/Google Drive/TailDemography/outputFiles/Descriptive/population size.csv')
df_pop.head()

Unnamed: 0,year,sex,liznumber,liznumberYear,propMale,propFemale
0,2000,f,84,153,,0.54902
1,2000,m,69,153,0.45098,
2,2001,f,72,135,,0.533333
3,2001,m,63,135,0.466667,
4,2002,f,67,119,,0.563025


# Analyze Data
[Top](#TOC)

We will first examine the range and distribution of a number of variables in our data set:


In [18]:
# Weather for a year includes weather since the last collection date of the previous calendar year 
seasons={'Dec':'winter','Jan':'winter','Feb':'winter',
         'Mar':'spring','Apr':'spring','May':'spring',
         'Jun':'summer','Jul':'summer','Aug':'summer',
         'Sept':'fall','Oct':'fall','Nov':'fall'}

Split analysis up:
- Analysis 1
    - weather in the previous 365 days relative to the first date of collection/sighting for the current calendar year
    - additional factor would be population for previous calendar year (year -1)
- Analysis 2 (Skip this for now)
    - weather in the previous 365 days relative to the first date of collection/sighting for the current calendar year
    - additional factor would be population in the current calendar year (year 0)
    - dv: population in (year 1 through year x)
- Analysis 3,4,5
    - IV
        - population in year -1
        - onset of monsoon in year 0
        - precipitation in summer
        - interaction ?
    - DV
        - population in year 1
        - age/size structure in year 1 (looking for 45mm to 65mm)
        - sex ratio in year 1
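The "previous 365 days relative to the first date of collection" window in Analyses 1 and 2 could be selected along the following lines. This is only a sketch: the `date` column, the function name `weather_before`, and the toy data are hypothetical, since the weather files here are indexed by `year` and `month`.

```python
import pandas as pd

def weather_before(weather, first_date, days=365):
    """Return the weather rows dated within `days` days before first_date (exclusive)."""
    end = pd.Timestamp(first_date)
    start = end - pd.Timedelta(days=days)
    mask = (weather['date'] >= start) & (weather['date'] < end)
    return weather.loc[mask]

# Toy monthly records for two calendar years
toy_weather = pd.DataFrame({
    'date': pd.date_range('2007-01-01', '2008-12-01', freq='MS'),
    'PRCP': range(24),
})
window = weather_before(toy_weather, '2008-05-10')
print(window['date'].min().date(), window['date'].max().date(), len(window))
```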

In [19]:
# This could be used to generate season-level weather data (use season dates) - Chris
# This could also be used to approximate the start of the monsoon season
# Check historical data in May and June to identify in notes when the first juveniles were spotted - (George and Chris)
## Look for correlates in the data
# Use SWRS data to identify start of monsoons (George to get SWRS data)
# what other precipitation and temperature in the NOAA data set have been used for this (George and Chris to check the lit)
df['month'] = df.month.apply(monthlit)
df['season'] = df.month.apply(lambda x: seasons[x])
# Select columns with a list; recent pandas versions reject tuple-style selection after groupby
df_season = pd.DataFrame(df.groupby(['source','year','season'])[['PRCP','SNOW','TMAX','TMIN','TAVG']].describe())[1:-1]
df_season.columns = [' '.join(col).strip() for col in df_season.columns.values]
df_season = df_season.reset_index()

In [20]:
df_season['year-season'] = df_season.year.astype(str) + '-' + df_season.season
df_season

Unnamed: 0,source,year,season,PRCP count,PRCP mean,PRCP std,PRCP min,PRCP 25%,PRCP 50%,PRCP 75%,PRCP max,SNOW count,SNOW mean,SNOW std,SNOW min,SNOW 25%,SNOW 50%,SNOW 75%,SNOW max,TMAX count,TMAX mean,TMAX std,TMAX min,TMAX 25%,TMAX 50%,TMAX 75%,TMAX max,TMIN count,TMIN mean,TMIN std,TMIN min,TMIN 25%,TMIN 50%,TMIN 75%,TMIN max,TAVG count,TAVG mean,TAVG std,TAVG min,TAVG 25%,TAVG 50%,TAVG 75%,TAVG max,year-season
0,paradise,2007,summer,2.0,3.865,1.350574,2.91,3.3875,3.865,4.3425,4.82,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,86.5,3.676955,83.9,85.2,86.5,87.8,89.1,2.0,59.05,0.212132,58.9,58.975,59.05,59.125,59.2,2.0,72.8,1.979899,71.4,72.1,72.8,73.5,74.2,2007-summer
1,paradise,2007,winter,1.0,2.51,,2.51,2.51,2.51,2.51,2.51,1.0,0.0,,0.0,0.0,0.0,0.0,0.0,1.0,51.3,,51.3,51.3,51.3,51.3,51.3,1.0,23.8,,23.8,23.8,23.8,23.8,23.8,1.0,37.6,,37.6,37.6,37.6,37.6,37.6,2007-winter
2,paradise,2008,fall,3.0,0.88,0.81074,0.27,0.42,0.57,1.185,1.8,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,72.1,7.35119,63.9,69.1,74.3,76.2,78.1,3.0,40.0,9.153688,31.0,35.35,39.7,44.5,49.3,3.0,56.033333,8.192883,47.4,52.2,57.0,60.35,63.7,2008-fall
3,paradise,2008,summer,2.0,8.44,7.071068,3.44,5.94,8.44,10.94,13.44,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,82.5,0.848528,81.9,82.2,82.5,82.8,83.1,2.0,58.45,0.919239,57.8,58.125,58.45,58.775,59.1,2.0,70.5,0.848528,69.9,70.2,70.5,70.8,71.1,2008-summer
4,paradise,2008,winter,1.0,1.2,,1.2,1.2,1.2,1.2,1.2,1.0,1.3,,1.3,1.3,1.3,1.3,1.3,1.0,55.3,,55.3,55.3,55.3,55.3,55.3,1.0,26.9,,26.9,26.9,26.9,26.9,26.9,1.0,41.1,,41.1,41.1,41.1,41.1,41.1,2008-winter
5,paradise,2009,fall,3.0,1.65,0.546717,1.02,1.475,1.93,1.965,2.0,3.0,1.933333,3.348632,0.0,0.0,0.0,2.9,5.8,3.0,72.6,8.444525,64.7,68.15,71.6,76.55,81.5,3.0,41.266667,9.600174,31.7,36.45,41.2,46.05,50.9,3.0,56.933333,9.011844,48.2,52.3,56.4,61.3,66.2,2009-fall
6,paradise,2009,spring,3.0,0.466667,0.411987,0.0,0.31,0.62,0.7,0.78,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,73.266667,8.447682,66.4,68.55,70.7,76.7,82.7,3.0,39.666667,8.220908,33.5,35.0,36.5,42.75,49.0,3.0,56.5,8.337266,50.0,51.8,53.6,59.75,65.9,2009-spring
7,paradise,2009,summer,3.0,1.97,1.165032,0.66,1.51,2.36,2.625,2.89,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,87.6,2.433105,84.8,86.8,88.8,89.0,89.2,3.0,57.433333,4.271222,52.6,55.8,59.0,59.85,60.7,3.0,72.533333,3.365016,68.7,71.3,73.9,74.45,75.0,2009-summer
8,paradise,2009,winter,3.0,1.106667,0.603435,0.7,0.76,0.82,1.31,1.8,3.0,5.4,3.966106,1.0,3.75,6.5,7.6,8.7,3.0,55.8,6.509224,49.1,52.65,56.2,59.15,62.1,3.0,25.433333,0.960902,24.4,25.0,25.6,25.95,26.3,3.0,40.6,3.758989,36.7,38.8,40.9,42.55,44.2,2009-winter
9,paradise,2010,fall,3.0,0.743333,0.610109,0.14,0.435,0.73,1.045,1.36,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,72.833333,10.32731,61.7,68.2,74.7,78.4,82.1,3.0,40.9,12.517588,27.7,35.05,42.4,47.5,52.6,3.0,56.866667,11.437803,44.7,51.6,58.5,62.95,67.4,2010-fall
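The column names in the table above come from flattening the two-level column index that `describe()` produces after a `groupby`. The toy example below, with made-up values, sketches that step in isolation.

```python
import pandas as pd

# Made-up values, only to illustrate the reshaping
toy = pd.DataFrame({
    'source': ['a', 'a', 'b', 'b'],
    'year': [2000, 2000, 2000, 2000],
    'PRCP': [1.0, 3.0, 2.0, 4.0],
    'TMAX': [70.0, 80.0, 60.0, 90.0],
})

summary = toy.groupby(['source', 'year'])[['PRCP', 'TMAX']].describe()
# Columns are (variable, statistic) pairs at this point, e.g. ('PRCP', 'mean')
summary.columns = [' '.join(col).strip() for col in summary.columns.values]
summary = summary.reset_index()
print(summary.columns.tolist())
```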


In [21]:
# Select columns with a list; recent pandas versions reject tuple-style selection after groupby
df_annual = pd.DataFrame(df.groupby(['source','year'])[['PRCP','SNOW','TMAX','TMIN','TAVG']].describe())[1:-1]
df_annual.columns = [' '.join(col).strip() for col in df_annual.columns.values]
df_annual = df_annual.reset_index().sort_values('year')
df_annual

Unnamed: 0,source,year,PRCP count,PRCP mean,PRCP std,PRCP min,PRCP 25%,PRCP 50%,PRCP 75%,PRCP max,SNOW count,SNOW mean,SNOW std,SNOW min,SNOW 25%,SNOW 50%,SNOW 75%,SNOW max,TMAX count,TMAX mean,TMAX std,TMAX min,TMAX 25%,TMAX 50%,TMAX 75%,TMAX max,TMIN count,TMIN mean,TMIN std,TMIN min,TMIN 25%,TMIN 50%,TMIN 75%,TMIN max,TAVG count,TAVG mean,TAVG std,TAVG min,TAVG 25%,TAVG 50%,TAVG 75%,TAVG max
10,portal,2000,11.0,1.713636,2.477564,0.0,0.025,0.52,2.06,7.81,11.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,11.0,71.5,12.671622,54.6,61.7,67.7,84.95,86.5,11.0,37.536364,12.668881,23.1,26.75,35.4,48.65,55.6,11.0,54.518182,12.407967,39.5,43.9,54.3,67.1,71.0
11,portal,2001,12.0,1.295,1.16247,0.37,0.5275,0.64,1.865,3.81,12.0,0.95,2.251666,0.0,0.0,0.0,0.0,6.6,11.0,69.954545,13.434386,50.7,59.7,72.0,82.0,88.1,10.0,35.33,11.050696,20.9,27.15,34.6,42.0,54.2,10.0,51.75,11.763338,36.6,42.775,50.95,61.275,69.1
12,portal,2002,12.0,1.029167,1.404593,0.0,0.0525,0.645,1.31,5.07,12.0,0.241667,0.583809,0.0,0.0,0.0,0.0,1.8,12.0,72.183333,13.629302,50.7,61.825,73.15,82.925,91.5,12.0,38.091667,12.724883,23.3,27.525,37.75,50.35,57.8,12.0,55.133333,12.957576,37.2,44.65,55.5,66.5,72.4
13,portal,2003,12.0,0.963333,0.955485,0.05,0.34,0.5,1.725,2.57,12.0,0.091667,0.317543,0.0,0.0,0.0,0.0,1.1,12.0,73.141667,13.112482,55.0,62.05,74.1,83.85,92.2,12.0,38.225,12.503754,21.2,28.025,37.5,48.925,56.4,12.0,55.691667,12.695129,38.1,44.7,55.8,66.8,74.2
14,portal,2004,12.0,1.7575,1.034427,0.5,0.9175,1.62,2.275,3.83,11.0,0.409091,0.970005,0.0,0.0,0.0,0.0,3.0,10.0,66.55,11.667833,52.6,55.475,67.85,76.55,82.4,10.0,35.72,11.631261,22.6,26.4,34.35,43.675,53.7,9.0,49.333333,10.694625,37.7,39.4,50.6,54.6,67.4
15,portal,2005,12.0,1.884167,1.867115,0.0,0.2075,1.135,3.3525,4.89,10.0,0.27,0.853815,0.0,0.0,0.0,0.0,2.7,11.0,73.727273,13.214544,53.5,63.95,73.7,83.0,91.2,11.0,39.409091,11.049838,26.6,30.4,37.6,48.55,55.3,11.0,56.572727,11.773028,40.8,46.1,55.0,67.2,73.3
16,portal,2006,11.0,2.144545,2.857133,0.12,0.285,0.77,3.29,8.59,11.0,0.409091,1.356801,0.0,0.0,0.0,0.0,4.5,10.0,73.21,12.916265,53.5,63.8,73.5,83.65,91.1,9.0,40.766667,12.96476,22.0,30.9,39.5,53.0,58.1,9.0,58.077778,11.95646,40.8,47.3,57.0,67.7,73.2
17,portal,2007,12.0,1.8075,1.792236,0.03,0.485,1.135,2.815,5.76,12.0,1.175,3.185799,0.0,0.0,0.0,0.25,11.1,11.0,71.681818,14.050823,47.5,62.75,76.0,82.25,88.4,11.0,39.245455,13.608847,21.8,26.35,39.7,50.5,57.5,11.0,55.472727,13.650208,34.7,44.55,57.8,67.6,72.4
18,portal,2008,11.0,1.741818,3.014614,0.0,0.01,0.61,1.675,9.91,10.0,0.1,0.316228,0.0,0.0,0.0,0.0,1.0,10.0,74.04,10.226023,56.0,68.05,75.6,80.475,89.4,10.0,40.66,11.961159,26.3,29.05,39.7,50.05,57.1,10.0,57.35,10.742258,41.8,48.55,58.0,67.25,70.2
0,paradise,2008,6.0,3.453333,5.019799,0.27,0.7275,1.5,3.03,13.44,6.0,0.216667,0.530723,0.0,0.0,0.0,0.0,1.3,6.0,72.766667,10.997212,55.3,66.5,76.2,80.95,83.1,6.0,43.966667,13.612005,26.9,33.175,44.5,55.675,59.1,6.0,58.366667,12.197322,41.1,49.8,60.35,68.35,71.1


## Population Size

Can we predict the change in population size using the previous year's weather?
First, let's make a new data set that will allow us to visualize the potential relationship between precipitation and population size.

In [22]:
df_reg_annual = df_annual.merge(df_pop.loc[df_pop.sex=='f'].drop(columns=['propMale','sex','liznumber']),on = ['year'],how='left')
df_reg_annual.head()

Unnamed: 0,source,year,PRCP count,PRCP mean,PRCP std,PRCP min,PRCP 25%,PRCP 50%,PRCP 75%,PRCP max,SNOW count,SNOW mean,SNOW std,SNOW min,SNOW 25%,SNOW 50%,SNOW 75%,SNOW max,TMAX count,TMAX mean,TMAX std,TMAX min,TMAX 25%,TMAX 50%,TMAX 75%,TMAX max,TMIN count,TMIN mean,TMIN std,TMIN min,TMIN 25%,TMIN 50%,TMIN 75%,TMIN max,TAVG count,TAVG mean,TAVG std,TAVG min,TAVG 25%,TAVG 50%,TAVG 75%,TAVG max,liznumberYear,propFemale
0,portal,2000,11.0,1.713636,2.477564,0.0,0.025,0.52,2.06,7.81,11.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,11.0,71.5,12.671622,54.6,61.7,67.7,84.95,86.5,11.0,37.536364,12.668881,23.1,26.75,35.4,48.65,55.6,11.0,54.518182,12.407967,39.5,43.9,54.3,67.1,71.0,153,0.54902
1,portal,2001,12.0,1.295,1.16247,0.37,0.5275,0.64,1.865,3.81,12.0,0.95,2.251666,0.0,0.0,0.0,0.0,6.6,11.0,69.954545,13.434386,50.7,59.7,72.0,82.0,88.1,10.0,35.33,11.050696,20.9,27.15,34.6,42.0,54.2,10.0,51.75,11.763338,36.6,42.775,50.95,61.275,69.1,135,0.533333
2,portal,2002,12.0,1.029167,1.404593,0.0,0.0525,0.645,1.31,5.07,12.0,0.241667,0.583809,0.0,0.0,0.0,0.0,1.8,12.0,72.183333,13.629302,50.7,61.825,73.15,82.925,91.5,12.0,38.091667,12.724883,23.3,27.525,37.75,50.35,57.8,12.0,55.133333,12.957576,37.2,44.65,55.5,66.5,72.4,119,0.563025
3,portal,2003,12.0,0.963333,0.955485,0.05,0.34,0.5,1.725,2.57,12.0,0.091667,0.317543,0.0,0.0,0.0,0.0,1.1,12.0,73.141667,13.112482,55.0,62.05,74.1,83.85,92.2,12.0,38.225,12.503754,21.2,28.025,37.5,48.925,56.4,12.0,55.691667,12.695129,38.1,44.7,55.8,66.8,74.2,97,0.556701
4,portal,2004,12.0,1.7575,1.034427,0.5,0.9175,1.62,2.275,3.83,11.0,0.409091,0.970005,0.0,0.0,0.0,0.0,3.0,10.0,66.55,11.667833,52.6,55.475,67.85,76.55,82.4,10.0,35.72,11.631261,22.6,26.4,34.35,43.675,53.7,9.0,49.333333,10.694625,37.7,39.4,50.6,54.6,67.4,70,0.542857


In [23]:
df_reg_season = df_season.merge(df_pop.loc[df_pop.sex=='f'].drop(columns=['propMale','sex','liznumber']),on = ['year'],how='left')
df_reg_season.head()

Unnamed: 0,source,year,season,PRCP count,PRCP mean,PRCP std,PRCP min,PRCP 25%,PRCP 50%,PRCP 75%,PRCP max,SNOW count,SNOW mean,SNOW std,SNOW min,SNOW 25%,SNOW 50%,SNOW 75%,SNOW max,TMAX count,TMAX mean,TMAX std,TMAX min,TMAX 25%,TMAX 50%,TMAX 75%,TMAX max,TMIN count,TMIN mean,TMIN std,TMIN min,TMIN 25%,TMIN 50%,TMIN 75%,TMIN max,TAVG count,TAVG mean,TAVG std,TAVG min,TAVG 25%,TAVG 50%,TAVG 75%,TAVG max,year-season,liznumberYear,propFemale
0,paradise,2007,summer,2.0,3.865,1.350574,2.91,3.3875,3.865,4.3425,4.82,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,86.5,3.676955,83.9,85.2,86.5,87.8,89.1,2.0,59.05,0.212132,58.9,58.975,59.05,59.125,59.2,2.0,72.8,1.979899,71.4,72.1,72.8,73.5,74.2,2007-summer,94,0.574468
1,paradise,2007,winter,1.0,2.51,,2.51,2.51,2.51,2.51,2.51,1.0,0.0,,0.0,0.0,0.0,0.0,0.0,1.0,51.3,,51.3,51.3,51.3,51.3,51.3,1.0,23.8,,23.8,23.8,23.8,23.8,23.8,1.0,37.6,,37.6,37.6,37.6,37.6,37.6,2007-winter,94,0.574468
2,paradise,2008,fall,3.0,0.88,0.81074,0.27,0.42,0.57,1.185,1.8,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,72.1,7.35119,63.9,69.1,74.3,76.2,78.1,3.0,40.0,9.153688,31.0,35.35,39.7,44.5,49.3,3.0,56.033333,8.192883,47.4,52.2,57.0,60.35,63.7,2008-fall,88,0.5
3,paradise,2008,summer,2.0,8.44,7.071068,3.44,5.94,8.44,10.94,13.44,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,82.5,0.848528,81.9,82.2,82.5,82.8,83.1,2.0,58.45,0.919239,57.8,58.125,58.45,58.775,59.1,2.0,70.5,0.848528,69.9,70.2,70.5,70.8,71.1,2008-summer,88,0.5
4,paradise,2008,winter,1.0,1.2,,1.2,1.2,1.2,1.2,1.2,1.0,1.3,,1.3,1.3,1.3,1.3,1.3,1.0,55.3,,55.3,55.3,55.3,55.3,55.3,1.0,26.9,,26.9,26.9,26.9,26.9,26.9,1.0,41.1,,41.1,41.1,41.1,41.1,41.1,2008-winter,88,0.5


In [24]:
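# TODO: drop paradise
# shift(-k) takes liznumberYear from k rows later within each source (later years), despite 'less' in the column names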
df_reg_annual['popinYearless1'] = df_reg_annual.groupby('source').liznumberYear.shift(-1)
df_reg_annual['popinYearless2'] = df_reg_annual.groupby('source').liznumberYear.shift(-2)
df_reg_annual['popinYearless3'] = df_reg_annual.groupby('source').liznumberYear.shift(-3)
df_reg_annual['popinYearless4'] = df_reg_annual.groupby('source').liznumberYear.shift(-4)
df_reg_annual['popinYearless5'] = df_reg_annual.groupby('source').liznumberYear.shift(-5)
df_reg_annual

Unnamed: 0,source,year,PRCP count,PRCP mean,PRCP std,PRCP min,PRCP 25%,PRCP 50%,PRCP 75%,PRCP max,SNOW count,SNOW mean,SNOW std,SNOW min,SNOW 25%,SNOW 50%,SNOW 75%,SNOW max,TMAX count,TMAX mean,TMAX std,TMAX min,TMAX 25%,TMAX 50%,TMAX 75%,TMAX max,TMIN count,TMIN mean,TMIN std,TMIN min,TMIN 25%,TMIN 50%,TMIN 75%,TMIN max,TAVG count,TAVG mean,TAVG std,TAVG min,TAVG 25%,TAVG 50%,TAVG 75%,TAVG max,liznumberYear,propFemale,popinYearless1,popinYearless2,popinYearless3,popinYearless4,popinYearless5
0,portal,2000,11.0,1.713636,2.477564,0.0,0.025,0.52,2.06,7.81,11.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,11.0,71.5,12.671622,54.6,61.7,67.7,84.95,86.5,11.0,37.536364,12.668881,23.1,26.75,35.4,48.65,55.6,11.0,54.518182,12.407967,39.5,43.9,54.3,67.1,71.0,153,0.54902,135.0,119.0,97.0,70.0,79.0
1,portal,2001,12.0,1.295,1.16247,0.37,0.5275,0.64,1.865,3.81,12.0,0.95,2.251666,0.0,0.0,0.0,0.0,6.6,11.0,69.954545,13.434386,50.7,59.7,72.0,82.0,88.1,10.0,35.33,11.050696,20.9,27.15,34.6,42.0,54.2,10.0,51.75,11.763338,36.6,42.775,50.95,61.275,69.1,135,0.533333,119.0,97.0,70.0,79.0,66.0
2,portal,2002,12.0,1.029167,1.404593,0.0,0.0525,0.645,1.31,5.07,12.0,0.241667,0.583809,0.0,0.0,0.0,0.0,1.8,12.0,72.183333,13.629302,50.7,61.825,73.15,82.925,91.5,12.0,38.091667,12.724883,23.3,27.525,37.75,50.35,57.8,12.0,55.133333,12.957576,37.2,44.65,55.5,66.5,72.4,119,0.563025,97.0,70.0,79.0,66.0,94.0
3,portal,2003,12.0,0.963333,0.955485,0.05,0.34,0.5,1.725,2.57,12.0,0.091667,0.317543,0.0,0.0,0.0,0.0,1.1,12.0,73.141667,13.112482,55.0,62.05,74.1,83.85,92.2,12.0,38.225,12.503754,21.2,28.025,37.5,48.925,56.4,12.0,55.691667,12.695129,38.1,44.7,55.8,66.8,74.2,97,0.556701,70.0,79.0,66.0,94.0,88.0
4,portal,2004,12.0,1.7575,1.034427,0.5,0.9175,1.62,2.275,3.83,11.0,0.409091,0.970005,0.0,0.0,0.0,0.0,3.0,10.0,66.55,11.667833,52.6,55.475,67.85,76.55,82.4,10.0,35.72,11.631261,22.6,26.4,34.35,43.675,53.7,9.0,49.333333,10.694625,37.7,39.4,50.6,54.6,67.4,70,0.542857,79.0,66.0,94.0,88.0,105.0
5,portal,2005,12.0,1.884167,1.867115,0.0,0.2075,1.135,3.3525,4.89,10.0,0.27,0.853815,0.0,0.0,0.0,0.0,2.7,11.0,73.727273,13.214544,53.5,63.95,73.7,83.0,91.2,11.0,39.409091,11.049838,26.6,30.4,37.6,48.55,55.3,11.0,56.572727,11.773028,40.8,46.1,55.0,67.2,73.3,79,0.455696,66.0,94.0,88.0,105.0,54.0
6,portal,2006,11.0,2.144545,2.857133,0.12,0.285,0.77,3.29,8.59,11.0,0.409091,1.356801,0.0,0.0,0.0,0.0,4.5,10.0,73.21,12.916265,53.5,63.8,73.5,83.65,91.1,9.0,40.766667,12.96476,22.0,30.9,39.5,53.0,58.1,9.0,58.077778,11.95646,40.8,47.3,57.0,67.7,73.2,66,0.545455,94.0,88.0,105.0,54.0,45.0
7,portal,2007,12.0,1.8075,1.792236,0.03,0.485,1.135,2.815,5.76,12.0,1.175,3.185799,0.0,0.0,0.0,0.25,11.1,11.0,71.681818,14.050823,47.5,62.75,76.0,82.25,88.4,11.0,39.245455,13.608847,21.8,26.35,39.7,50.5,57.5,11.0,55.472727,13.650208,34.7,44.55,57.8,67.6,72.4,94,0.574468,88.0,105.0,54.0,45.0,51.0
8,portal,2008,11.0,1.741818,3.014614,0.0,0.01,0.61,1.675,9.91,10.0,0.1,0.316228,0.0,0.0,0.0,0.0,1.0,10.0,74.04,10.226023,56.0,68.05,75.6,80.475,89.4,10.0,40.66,11.961159,26.3,29.05,39.7,50.05,57.1,10.0,57.35,10.742258,41.8,48.55,58.0,67.25,70.2,88,0.5,105.0,54.0,45.0,51.0,55.0
9,paradise,2008,6.0,3.453333,5.019799,0.27,0.7275,1.5,3.03,13.44,6.0,0.216667,0.530723,0.0,0.0,0.0,0.0,1.3,6.0,72.766667,10.997212,55.3,66.5,76.2,80.95,83.1,6.0,43.966667,13.612005,26.9,33.175,44.5,55.675,59.1,6.0,58.366667,12.197322,41.1,49.8,60.35,68.35,71.1,88,0.5,105.0,54.0,45.0,51.0,55.0


In [25]:
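# TODO: drop paradise
# NOTE: df_reg_season has one row per season, so shift(-k) moves k seasons ahead, not k years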
df_reg_season['popinYearless1'] = df_reg_season.groupby('source').liznumberYear.shift(-1)
df_reg_season['popinYearless2'] = df_reg_season.groupby('source').liznumberYear.shift(-2)
df_reg_season['popinYearless3'] = df_reg_season.groupby('source').liznumberYear.shift(-3)
df_reg_season['popinYearless4'] = df_reg_season.groupby('source').liznumberYear.shift(-4)
df_reg_season['popinYearless5'] = df_reg_season.groupby('source').liznumberYear.shift(-5)
df_reg_season

Unnamed: 0,source,year,season,PRCP count,PRCP mean,PRCP std,PRCP min,PRCP 25%,PRCP 50%,PRCP 75%,PRCP max,SNOW count,SNOW mean,SNOW std,SNOW min,SNOW 25%,SNOW 50%,SNOW 75%,SNOW max,TMAX count,TMAX mean,TMAX std,TMAX min,TMAX 25%,TMAX 50%,...,TMAX max,TMIN count,TMIN mean,TMIN std,TMIN min,TMIN 25%,TMIN 50%,TMIN 75%,TMIN max,TAVG count,TAVG mean,TAVG std,TAVG min,TAVG 25%,TAVG 50%,TAVG 75%,TAVG max,year-season,liznumberYear,propFemale,popinYearless1,popinYearless2,popinYearless3,popinYearless4,popinYearless5
0,paradise,2007,summer,2.0,3.865,1.350574,2.91,3.3875,3.865,4.3425,4.82,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,86.5,3.676955,83.9,85.2,86.5,...,89.1,2.0,59.05,0.212132,58.9,58.975,59.05,59.125,59.2,2.0,72.8,1.979899,71.4,72.1,72.8,73.5,74.2,2007-summer,94,0.574468,94.0,88.0,88.0,88.0,105.0
1,paradise,2007,winter,1.0,2.51,,2.51,2.51,2.51,2.51,2.51,1.0,0.0,,0.0,0.0,0.0,0.0,0.0,1.0,51.3,,51.3,51.3,51.3,...,51.3,1.0,23.8,,23.8,23.8,23.8,23.8,23.8,1.0,37.6,,37.6,37.6,37.6,37.6,37.6,2007-winter,94,0.574468,88.0,88.0,88.0,105.0,105.0
2,paradise,2008,fall,3.0,0.88,0.81074,0.27,0.42,0.57,1.185,1.8,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,72.1,7.35119,63.9,69.1,74.3,...,78.1,3.0,40.0,9.153688,31.0,35.35,39.7,44.5,49.3,3.0,56.033333,8.192883,47.4,52.2,57.0,60.35,63.7,2008-fall,88,0.5,88.0,88.0,105.0,105.0,105.0
3,paradise,2008,summer,2.0,8.44,7.071068,3.44,5.94,8.44,10.94,13.44,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,82.5,0.848528,81.9,82.2,82.5,...,83.1,2.0,58.45,0.919239,57.8,58.125,58.45,58.775,59.1,2.0,70.5,0.848528,69.9,70.2,70.5,70.8,71.1,2008-summer,88,0.5,88.0,105.0,105.0,105.0,105.0
4,paradise,2008,winter,1.0,1.2,,1.2,1.2,1.2,1.2,1.2,1.0,1.3,,1.3,1.3,1.3,1.3,1.3,1.0,55.3,,55.3,55.3,55.3,...,55.3,1.0,26.9,,26.9,26.9,26.9,26.9,26.9,1.0,41.1,,41.1,41.1,41.1,41.1,41.1,2008-winter,88,0.5,105.0,105.0,105.0,105.0,54.0
5,paradise,2009,fall,3.0,1.65,0.546717,1.02,1.475,1.93,1.965,2.0,3.0,1.933333,3.348632,0.0,0.0,0.0,2.9,5.8,3.0,72.6,8.444525,64.7,68.15,71.6,...,81.5,3.0,41.266667,9.600174,31.7,36.45,41.2,46.05,50.9,3.0,56.933333,9.011844,48.2,52.3,56.4,61.3,66.2,2009-fall,105,0.514286,105.0,105.0,105.0,54.0,54.0
6,paradise,2009,spring,3.0,0.466667,0.411987,0.0,0.31,0.62,0.7,0.78,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,73.266667,8.447682,66.4,68.55,70.7,...,82.7,3.0,39.666667,8.220908,33.5,35.0,36.5,42.75,49.0,3.0,56.5,8.337266,50.0,51.8,53.6,59.75,65.9,2009-spring,105,0.514286,105.0,105.0,54.0,54.0,54.0
7,paradise,2009,summer,3.0,1.97,1.165032,0.66,1.51,2.36,2.625,2.89,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,87.6,2.433105,84.8,86.8,88.8,...,89.2,3.0,57.433333,4.271222,52.6,55.8,59.0,59.85,60.7,3.0,72.533333,3.365016,68.7,71.3,73.9,74.45,75.0,2009-summer,105,0.514286,105.0,54.0,54.0,54.0,54.0
8,paradise,2009,winter,3.0,1.106667,0.603435,0.7,0.76,0.82,1.31,1.8,3.0,5.4,3.966106,1.0,3.75,6.5,7.6,8.7,3.0,55.8,6.509224,49.1,52.65,56.2,...,62.1,3.0,25.433333,0.960902,24.4,25.0,25.6,25.95,26.3,3.0,40.6,3.758989,36.7,38.8,40.9,42.55,44.2,2009-winter,105,0.514286,54.0,54.0,54.0,54.0,45.0
9,paradise,2010,fall,3.0,0.743333,0.610109,0.14,0.435,0.73,1.045,1.36,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,72.833333,10.32731,61.7,68.2,74.7,...,82.1,3.0,40.9,12.517588,27.7,35.05,42.4,47.5,52.6,3.0,56.866667,11.437803,44.7,51.6,58.5,62.95,67.4,2010-fall,54,0.5,54.0,54.0,54.0,45.0,45.0
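In the seasonal table, `shift` moves by rows, so the shifted columns above pick up values from neighbouring seasons (for example, the 2007 summer row receives the 2007 value of 94). If the intent is the population k calendar years later, one possible approach is to build a yearly lookup and merge it back. This is a sketch under that assumption; `add_future_pop` and the `popYearPlus{k}` column names are hypothetical.

```python
import pandas as pd

def add_future_pop(df, k, pop_col='liznumberYear'):
    """Attach the population of year + k to every row of the same source and year."""
    yearly = df[['source', 'year', pop_col]].drop_duplicates(['source', 'year']).copy()
    # Relabel each population with the earlier year it should be attached to
    yearly['year'] = yearly['year'] - k
    yearly = yearly.rename(columns={pop_col: 'popYearPlus{}'.format(k)})
    return df.merge(yearly, on=['source', 'year'], how='left')

# Toy seasonal rows: two seasons in 2007, one in 2008
toy = pd.DataFrame({
    'source': ['p', 'p', 'p'],
    'year': [2007, 2007, 2008],
    'season': ['summer', 'winter', 'fall'],
    'liznumberYear': [94, 94, 88],
})
out = add_future_pop(toy, 1)
print(out)
```

Both 2007 rows receive the 2008 population, and the 2008 row is left missing because no 2009 value exists in the toy data.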


## Correlations

In [26]:
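def candidate(m,dv,placement=(1,1)):
    # Sort column dv of correlation matrix m in ascending order (most negative first)
    # and keep the rows at positions placement[0] through placement[1], inclusive
    assert(dv in m.columns)
    return m[dv].sort_values().reset_index().iloc[placement[0]:placement[1]+1,:]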

In [27]:
from functools import reduce

In [28]:
def topcorr(corrdf,lowestrank,dvs):
    candidates = [candidate(corrdf,dv,(1,lowestrank)) for dv in dvs]
    merger =  reduce(lambda x, y: pd.merge(x, y, on = 'index', how = 'outer'), candidates).fillna('--')
    return merger
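To see which rows these helpers select, here is a toy run of *candidate* on a made-up correlation column (the function body is copied from the cell above so the example is self-contained). Because the sort is ascending and `topcorr` starts at position 1, the most negative correlation at position 0 is not returned; the notebook does not state whether that is intended.

```python
import pandas as pd

def candidate(m, dv, placement=(1, 1)):
    assert dv in m.columns
    return m[dv].sort_values().reset_index().iloc[placement[0]:placement[1] + 1, :]

# Made-up correlations of five variables with 'y' (including y itself)
toy_corr = pd.DataFrame({'y': [1.0, -0.9, -0.5, 0.2, -0.1]},
                        index=['y', 'a', 'b', 'c', 'd'])

picked = candidate(toy_corr, 'y', (1, 2))
print(picked)
```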

In [29]:
#Dropping proportion of Females, but will put it back once I can order the y-axis
corrPortal_annual = df_reg_annual.loc[(df_reg_annual.source=='portal')]\
.drop(columns=['PRCP count', 'SNOW count', 'TMAX count', 'TMIN count', 'TAVG count','propFemale',
              'SNOW min', 'SNOW 25%', 'SNOW 50%',]).corr()
testx = corrPortal_annual.columns
testy = corrPortal_annual.index
testz = corrPortal_annual.values
test = go.Figure(go.Heatmap(x=testx,y=testy,z=testz))
plot(test, filename = 'portal annual correlation matrix.html')
iplot(test, filename = 'portal annual correlation matrix.html')

In [30]:
mydvs = ['liznumberYear', 'popinYearless1', 'popinYearless2', 'popinYearless3', 'popinYearless4', 'popinYearless5']

In [31]:
annual =topcorr(corrPortal_annual,3,mydvs) 
annual

Unnamed: 0,index,liznumberYear,popinYearless1,popinYearless2,popinYearless3,popinYearless4,popinYearless5
0,TMIN mean,-0.629165,--,--,--,--,--
1,TMIN 50%,-0.603868,--,--,--,--,--
2,TMIN max,-0.569143,-0.419539,--,--,--,--
3,TMIN 25%,--,-0.42179,--,--,--,-0.530104
4,TMAX 50%,--,-0.411493,-0.351992,-0.419513,--,--
5,TMAX min,--,--,-0.50909,--,--,--
6,TAVG min,--,--,-0.464191,--,--,--
7,TMAX 25%,--,--,--,-0.419912,--,--
8,TAVG 25%,--,--,--,-0.346994,-0.456096,-0.688057
9,PRCP max,--,--,--,--,-0.522149,--


## To-Do
- Run MV correlation
    - IV should be pop at year 0
    - DV should be pop at year 1- year X plus abiotic factors
    - Which abiotic

## Season

### Spring

In [32]:
corrPortal_spring = df_reg_season.loc[(df_reg_season.source=='portal')&(df_reg_season.season.isin(['spring']))]\
.drop(columns=['PRCP count', 'SNOW count', 'TMAX count', 'TMIN count', 'TAVG count','propFemale',
              'SNOW min', 'SNOW 25%', 'SNOW 50%',]).corr()
testx = corrPortal_spring.columns
testy = corrPortal_spring.index
testz = corrPortal_spring.values
test = go.Figure(go.Heatmap(x=testx,y=testy,z=testz))
plot(test, filename = 'portal spring correlation matrix.html')
iplot(test, filename = 'portal spring correlation matrix.html')

In [33]:
spring= topcorr(corrPortal_spring,3,mydvs) 
spring

Unnamed: 0,index,liznumberYear,popinYearless1,popinYearless2,popinYearless3,popinYearless4,popinYearless5
0,TAVG min,-0.41007,-0.41007,-0.463851,-0.354253,-0.354253,-0.354253
1,TMIN min,-0.386554,-0.386554,-0.41594,-0.395319,-0.395319,-0.395319
2,TMAX min,-0.365931,-0.365931,-0.392629,-0.236499,-0.236499,-0.236499


### Summer

In [34]:
corrPortal_summer = df_reg_season.loc[(df_reg_season.source=='portal')&(df_reg_season.season.isin(['summer']))]\
.drop(columns=['PRCP count', 'SNOW count', 'TMAX count', 'TMIN count', 'TAVG count','propFemale',
              'SNOW min', 'SNOW 25%', 'SNOW 50%',]).corr()
testx = corrPortal_summer.columns
testy = corrPortal_summer.index
testz = corrPortal_summer.values
test = go.Figure(go.Heatmap(x=testx,y=testy,z=testz))
plot(test, filename = 'portal summer correlation matrix.html')
iplot(test, filename = 'portal summer correlation matrix.html')

In [35]:
summer= topcorr(corrPortal_summer,3,mydvs) 
summer

Unnamed: 0,index,liznumberYear,popinYearless1,popinYearless2,popinYearless3,popinYearless4,popinYearless5
0,TMIN 75%,-0.599567,-0.596948,-0.524271,-0.524271,-0.524271,-0.525202
1,TMAX std,-0.582448,-0.580352,--,--,--,--
2,TMIN max,-0.567684,-0.569143,--,--,--,--
3,TMIN 50%,--,--,-0.565723,-0.565723,-0.565723,-0.57141
4,TMIN std,--,--,-0.497294,-0.497294,-0.497294,-0.503155


### Fall

In [36]:
corrPortal_fall = df_reg_season.loc[(df_reg_season.source=='portal')&(df_reg_season.season.isin(['fall']))]\
.drop(columns=['PRCP count', 'SNOW count', 'TMAX count', 'TMIN count', 'TAVG count','propFemale',
              'SNOW min', 'SNOW 25%', 'SNOW 50%',]).corr()
testx = corrPortal_fall.columns
testy = corrPortal_fall.index
testz = corrPortal_fall.values
test = go.Figure(go.Heatmap(x=testx,y=testy,z=testz))
plot(test, filename = 'portal fall correlation matrix.html')
iplot(test, filename = 'portal fall correlation matrix.html')

In [37]:
fall= topcorr(corrPortal_fall,3,mydvs) 
fall

Unnamed: 0,index,liznumberYear,popinYearless1,popinYearless2,popinYearless3,popinYearless4,popinYearless5
0,TAVG 50%,-0.610478,-0.610478,-0.610478,-0.61753,-0.553253,-0.553253
1,TMIN 75%,-0.596465,-0.596465,-0.596465,-0.594623,-0.535238,-0.535238
2,TAVG 75%,-0.576486,-0.576486,-0.576486,--,-0.552898,-0.552898
3,TMAX 50%,--,--,--,-0.581165,--,--


In [38]:
fall['index'].tolist()

['TAVG 50%', 'TMIN 75%', 'TAVG 75%', 'TMAX 50%']

In [39]:
popvars = ['popinYearless1','popinYearless2','popinYearless3','popinYearless4']
weathvars = fall['index'].tolist()
for popvar in popvars:
    for weathvar in weathvars:
        r,p = stats.pearsonr(df_reg_season.loc[(df_reg_season.source=='portal')&(df_reg_season.season.isin(['fall']))]\
                             [popvar].dropna(),df_reg_season.loc[(df_reg_season.source=='portal')&
                                                                 (df_reg_season.season.isin(['fall']))]\
                             [weathvar].dropna())
        print('{} vs {}: r={}; p={}'.format(popvar,weathvar,r,p))

popinYearless1 vs TAVG 50%: r=-0.6104777707920975; p=0.007127843214674913
popinYearless1 vs TMIN 75%: r=-0.5964653777823807; p=0.00897940949730995
popinYearless1 vs TAVG 75%: r=-0.5764857561854956; p=0.012269157294082995
popinYearless1 vs TMAX 50%: r=-0.5761087906255336; p=0.012339370082801468
popinYearless2 vs TAVG 50%: r=-0.6104777707920975; p=0.007127843214674913
popinYearless2 vs TMIN 75%: r=-0.5964653777823807; p=0.00897940949730995
popinYearless2 vs TAVG 75%: r=-0.5764857561854956; p=0.012269157294082995
popinYearless2 vs TMAX 50%: r=-0.5761087906255336; p=0.012339370082801468


ValueError: x and y must have the same length.
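
The `ValueError` above is consistent with `dropna()` being applied to each column separately: if a population column has missing years that the weather column does not, the two series passed to `stats.pearsonr` end up with different lengths. A minimal sketch of pairwise deletion follows. The toy values and the helper name `pearson_pairwise` are hypothetical and are not taken from the CC dataset.

```python
import numpy as np
import pandas as pd
from scipy import stats


def pearson_pairwise(df, xcol, ycol):
    """Return Pearson r, p-value, and n using only rows where both columns are present."""
    pair = df[[xcol, ycol]].dropna()  # drop a row if either value is missing
    r, p = stats.pearsonr(pair[xcol], pair[ycol])
    return r, p, len(pair)


# Toy data (hypothetical values): the population column has gaps, the weather column does not.
toy = pd.DataFrame({
    'popinYearless3': [np.nan, np.nan, 12.0, 15.0, 9.0, 20.0, 7.0],
    'TAVG 50%':       [18.0, 17.5, 16.0, 15.2, 17.1, 14.0, 17.9],
})

r, p, n = pearson_pairwise(toy, 'popinYearless3', 'TAVG 50%')
print('popinYearless3 vs TAVG 50%: r={}; p={}; n={}'.format(r, p, n))
```

Dropping incomplete rows from the two-column frame keeps each population value aligned with the weather value from the same row, which independent `dropna()` calls do not guarantee even when the lengths happen to match.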

### Winter

In [None]:
corrPortal_winter = df_reg_season.loc[(df_reg_season.source=='portal')&(df_reg_season.season.isin(['winter']))]\
.drop(columns=['PRCP count', 'SNOW count', 'TMAX count', 'TMIN count', 'TAVG count','propFemale',
              'SNOW min', 'SNOW 25%', 'SNOW 50%',]).corr()
testx = corrPortal_winter.columns
testy = corrPortal_winter.index
testz = corrPortal_winter.values
test = go.Figure(go.Heatmap(x=testx,y=testy,z=testz))
plot(test, filename = 'portal winter correlation matrix.html')
iplot(test, filename = 'portal winter correlation matrix.html')

In [None]:
winter= topcorr(corrPortal_winter,3,mydvs) 
winter

In [None]:
popvars = ['popinYearless1','popinYearless2','popinYearless3','popinYearless4']
# subset once instead of re-filtering inside the loop
portal_winter = df_reg_season.loc[(df_reg_season.source=='portal')&(df_reg_season.season.isin(['winter']))]
# precipitation metrics first, then the temperature metric
for weathvars in (['PRCP mean','PRCP 75%','PRCP max'], ['TMIN 75%']):
    for popvar in popvars:
        for weathvar in weathvars:
            # keep only rows where both variables are present so the series stay aligned
            pair = portal_winter[[popvar,weathvar]].dropna()
            r,p = stats.pearsonr(pair[popvar],pair[weathvar])
            print('{} vs {}: r={}; p={}'.format(popvar,weathvar,r,p))

Unlike the other seasons, winter precipitation in year 0 is correlated with population size in subsequent years.

# Resume Here

[Back to TOC](#Table-of-Contents)

Need to model this with regression:

- two predictors: pop in year, weather in year
- dv: pop in year 2

Train a model using captured juveniles to predict age class; include weather.
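
One way the planned regression could be sketched is ordinary least squares with two predictors and a later-year population as the outcome. The block below runs on synthetic data and uses NumPy's `lstsq`. The column names `pop_year0`, `weather_year0`, and `pop_year2` are placeholders rather than columns of `df_reg`, and the coefficients used to generate the data are arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 18  # arbitrary number of rows, one per year

# Synthetic predictors (hypothetical): population size and a weather metric in year 0
demo = pd.DataFrame({
    'pop_year0': rng.integers(20, 80, size=n).astype(float),
    'weather_year0': rng.normal(15.0, 2.0, size=n),
})
# Synthetic outcome built from known coefficients plus noise
demo['pop_year2'] = (5.0 + 0.6 * demo['pop_year0']
                     - 1.5 * demo['weather_year0']
                     + rng.normal(0.0, 1.0, size=n))

# Design matrix: intercept column, then the two predictors
X = np.column_stack([np.ones(n), demo['pop_year0'], demo['weather_year0']])
y = demo['pop_year2'].to_numpy()
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

# R-squared from residual and total sums of squares
fitted = X @ coef
r_squared = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
print('intercept: {}  pop: {}  weather: {}'.format(*coef))
print('R-squared: {}'.format(r_squared))
```

With real data, rows with missing values in any of the three columns would need to be dropped together before fitting. pingouin's `linear_regression` is an alternative that also reports standard errors and p-values.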

In [None]:
import numpy as np
import pingouin as pg
from scipy import stats

In [None]:
mydvs = ['liznumberYear', 'popinYearless1', 'popinYearless2', 'popinYearless3', 'popinYearless4', 'popinYearless5']

In [None]:
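weathermetrics = ['']  # TODO: list the weather metric columns to test
dv = 'popinYearless1'
for weathermetric in weathermetrics:
    iv = ['liznumberYear',weathermetric]
    r,p = stats.pearsonr(df_reg.liznumberYear,df_reg[weathermetric])
    print('{}: r={}; p={}'.format(weathermetric,r,p))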

In [None]:
var = 'TMIN mean' 
slope, intercept, r_value, p_value, std_err = stats.linregress(df_reg.liznumberYear,df_reg[var])
print("slope: {}    intercept: {}".format(slope, intercept))
# slope: 1.944864    intercept: 0.268578
print("R-squared: {}".format(r_value**2))

In [None]:
# assign the ANOVA table so it can be printed
aov = pg.anova(data=df, dv='liznumberYear', between='group', detailed=True)
print(aov)

## Growth

## Sex Ratio