# Bioindicators of Strawberry Creek (Virtual Communities)
### Professors George Roderick, John Huelsenbeck & Alan Shabel
_Estimated Time: 50 minutes_

--- 

Welcome! Throughout this lab you will use Python to analyze the data that you collected from Strawberry Creek. Python is a general-purpose programming language. Here it also lets us simulate data sets that we may not have the resources to collect in real life and analyze them with the same methods.

This notebook samples from two virtual biological communities, representing the North and South Forks of Strawberry Creek.

**Learning Outcomes**

By the end of the notebook, students should be able to:

1. Explain the use of biological organisms as indicators of ecosystem health
2. Interpret biological metrics of diversity: taxon richness, %EPT, biotic index (FBI), % filterers, % predators, Shannon index
3. Use simulated resampling, or permutations, to determine whether two samples are likely to have come from the same underlying distribution
4. Use p-values to describe statistical significance

## Table of Contents 

1. [Jupyter Notebooks](#1)
    - [Types of Cells](#1.1)
    - [Running Cells](#1.2)
    - [Editing, Saving and Submitting](#1.3)
<br/><br/>
2. [Data Recording](#2)
<br/><br/>
3. [Introduction to Data Analytics](#3)
    - [Null and Alternate Hypothesis](#3.1)
    - [Permutation Test](#3.2)
    - [P-values & Statistical Significance](#3.3)
<br/><br/>
4. [Your Data](#4)
<br/><br/>
5. [Submitting the Lab](#5)

<br>

# 1. Jupyter Notebooks <a id='1'></a>
---

This lab is currently set up in a Jupyter Notebook. A Jupyter Notebook is an online, interactive computing environment, composed of different types of __cells__. Cells are chunks of code or text that are used to break up a larger notebook into smaller, more manageable parts and to let the viewer modify and interact with the elements of the notebook.
 
### Types of cells <a id= '1.1'> </a>

There are two types of cells in Jupyter, __code__ cells and __markdown__ cells. Code cells are indicated with “In [  ]:” to the left of the cell. In these cells you can write your own code and run it within the individual cell.
Markdown cells usually hold formatted text and do not have the “In [ ]” to the left of the cell.

### Running cells <a id= '1.2'> </a>

'Running' a cell is similar to pressing 'Enter' on a calculator once you've typed in an expression; it computes all of the expressions contained within the cell.

To run a code cell, you can do one of the following:
- press __Shift + Enter__
- click __Cell -> Run Cells__ in the menu bar at the top of the screen.

You can navigate the cells by either clicking on them or by using your up and down arrow keys. Try running the cell below to see what happens. 

In [None]:
print("Hello, World")

The input of the cell consists of the text/code that is contained within the cell's enclosing box. Here, the input is an expression in Python that "prints" or repeats whatever text or number is passed in. 

The output of running a cell is shown in the line immediately after it. Notice that markdown cells have no output. 

### Editing, Saving and Submitting <a id='1.3'> </a>

- To __edit__ a cell simply click on the desired cell and begin typing 
- To __save__ your notebook press _Command + S_ (Mac) or _Ctrl + S_ (Windows/Linux) on the keyboard
- We will go into the specifics of how to __submit__ your work at the end of the lab, but you will essentially be converting your work into a PDF file and then including it in your Lab Report

Run this cell before proceeding with the rest of the lab!

In [None]:
import numpy as np
import pandas as pd
import ipywidgets as widgets
from ipywidgets import interact, interact_manual
from IPython.display import display
from IPython.display import clear_output
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline

<br>

# 2. Data Recording <a id='2'> </a>
---

In this section you will be importing the data you collected in the lab!

To import your data you must:
1. Open the desired Google Sheets file.
2. Navigate to the __File__ tab and hover over __Download__.
3. Another drop-down menu should appear, listing the different formats the sheet can be downloaded as. Select the __Comma-Separated Values (csv)__ option.

To import the data set, just run the following cell! If all goes smoothly, you will see the first few rows of your data file.

In [None]:
# To load a different file, replace the file name below:
# data = pd.read_csv("data set name")
data = pd.read_csv("CommunityData.csv")
data.head()  # show only the first few rows

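Next, sample 50 individuals from each virtual community of 10,000. The `North` and `South` columns supply the probability of drawing each row of the data.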

In [None]:
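sample_size = 50  # individuals drawn from each fork

# Draw row positions at random, weighted by the North (or South) column.
# Dividing by the column sum ensures the weights sum to 1, as np.random.choice requires.
north_indices = np.random.choice(len(data.index), size=sample_size, p=data['North'] / data['North'].sum())
NorthSample = data.iloc[north_indices]
south_indices = np.random.choice(len(data.index), size=sample_size, p=data['South'] / data['South'].sum())
SouthSample = data.iloc[south_indices]

# Display one sample; replace with NorthSample to inspect the other fork.
SouthSample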
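To see what the weighted draw above is doing, it can help to try it on a tiny made-up community. The sketch below is only an illustration: the three taxa, their abundances, and the names `toy`, `weights`, and `toy_sample` are hypothetical and are not part of `CommunityData.csv`. It uses NumPy's seeded `default_rng` generator so that the draw is repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical toy community: three taxa with made-up relative abundances.
toy = pd.DataFrame({
    "Name": ["Mayfly", "Midge", "Snail"],
    "North": [0.6, 0.3, 0.1],
})

rng = np.random.default_rng(seed=0)  # seeded so the draw is repeatable

# The weights passed as `p` must sum to 1, so normalize them first.
weights = (toy["North"] / toy["North"].sum()).to_numpy()

# Draw 50 row positions with replacement; common taxa are drawn more often.
positions = rng.choice(len(toy), size=50, p=weights)
toy_sample = toy.iloc[positions]

# Tally how many individuals of each taxon ended up in the sample.
print(toy_sample["Name"].value_counts())
```

Because the draw is random, the counts only approximate the 60/30/10 split. Changing the seed or the sample size shows how much a sample of 50 can vary from one draw to the next.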


In [None]:
# Mean of the EPT column (Ephemeroptera, Plecoptera, Trichoptera)
print('EPT North', NorthSample['EPT'].mean())
print('EPT South', SouthSample['EPT'].mean())

# Proportion of filterers (Function == 'F')
print('Filterers North', (NorthSample['Function'] == 'F').mean())
print('Filterers South', (SouthSample['Function'] == 'F').mean())

# Proportion of predators (Function == 'P')
print('Predators North', (NorthSample['Function'] == 'P').mean())
print('Predators South', (SouthSample['Function'] == 'P').mean())

# Biotic index (FBI): mean of the Biotic column across sampled individuals
print('FBI North', NorthSample['Biotic'].mean())
print('FBI South', SouthSample['Biotic'].mean())

# Count how many sampled individuals belong to each taxon
unique_NorthTaxa, counts_NorthTaxa = np.unique(NorthSample['Name'], return_counts=True)
NorthTaxa = np.asarray((unique_NorthTaxa, counts_NorthTaxa))
prop_NorthTaxa = counts_NorthTaxa / sample_size

unique_SouthTaxa, counts_SouthTaxa = np.unique(SouthSample['Name'], return_counts=True)
SouthTaxa = np.asarray((unique_SouthTaxa, counts_SouthTaxa))
prop_SouthTaxa = counts_SouthTaxa / sample_size

# Taxon richness: number of distinct taxa in the sample
print('Richness North', len(unique_NorthTaxa))
print('Richness South', len(unique_SouthTaxa))

# Shannon index: H = -sum(p_i * ln(p_i)) over the taxa present in the sample
print('Shannon North', round(-(prop_NorthTaxa * np.log(prop_NorthTaxa)).sum(), 3))
print('Shannon South', round(-(prop_SouthTaxa * np.log(prop_SouthTaxa)).sum(), 3))
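Two of the metrics above can be easier to interpret when written as small functions and tried on made-up samples. The sketch below is illustrative only: the function names `shannon_index` and `family_biotic_index`, the taxon labels, and the tolerance values are all hypothetical. It also assumes that the biotic index is the mean tolerance value across individuals, which is what taking the mean of the `Biotic` column amounts to if that column holds each individual's tolerance value.

```python
import numpy as np

def shannon_index(names):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over the taxa present."""
    _, counts = np.unique(np.asarray(names), return_counts=True)
    proportions = counts / counts.sum()
    return float(-(proportions * np.log(proportions)).sum())

def family_biotic_index(tolerances):
    """Mean tolerance value across individuals, i.e. sum(n_i * t_i) / N."""
    return float(np.mean(tolerances))

# Hypothetical samples of 20 individuals each, both with 4 taxa (same richness).
even_sample = ["A", "B", "C", "D"] * 5           # every taxon equally common
skewed_sample = ["A"] * 17 + ["B", "C", "D"]     # one taxon dominates

print(round(shannon_index(even_sample), 3))      # ln(4), about 1.386
print(shannon_index(skewed_sample) < shannon_index(even_sample))  # True

# Hypothetical tolerance values for four individuals.
print(family_biotic_index([2, 2, 2, 8]))         # 3.5
```

Both toy samples have the same richness, yet the skewed one has a lower Shannon index. This is why richness and the Shannon index are reported side by side: the first counts taxa, while the second also reflects how evenly individuals are spread among them.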



### Bibliography 