# Assignment 3: Advanced Python (30 pt)

This assignment covers material from the lectures on loops, functions, and NumPy.

Note that these questions are longer and somewhat more open-ended than those in previous assignments. Please reach out if you need assistance getting started.

Feel free to create as many Python or Markdown cells as you desire to answer the questions.

## Question 1: For loops (10 pt)

Below, we have a nested dictionary structure containing information about several species ranging from vulnerable to critically endangered. Note that in some cases, species populations are listed as `None`. This means that the wild populations of these species are unknown.

Use for loops to accomplish the following tasks: 

- Create a data structure containing all unique types of "Threats". This variable should not contain duplicate entries. Print the structure (2 pt).
- Create a list of all of the species listed as "Critically Endangered". Print the list (2 pt).
- Create a separate list containing the names of species with populations of fewer than 50 individuals and species with unknown population sizes. Print the list (3 pt).
- Find the species with the largest population size. Print this species name, and what its population size is (3 pt).

If you hard code the solutions (e.g. manually pick out which species has the largest population) you will receive NO points.
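As a starting point, the sketch below shows one way to loop over a nested dictionary with `.items()`, which yields each key together with its inner dictionary. The `toy_data` dictionary, its "Color" field, and the `known_counts` name are made up for illustration and are not part of the assignment data.

```python
# Hypothetical toy data with the same nested shape as conservation_data
toy_data = {
    "Species A": {"Color": "grey", "Population": 12},
    "Species B": {"Color": "brown", "Population": None},
    "Species C": {"Color": "grey", "Population": 340},
}

known_counts = {}
for name, info in toy_data.items():
    # Skip entries whose population is unknown (None)
    if info["Population"] is None:
        continue
    known_counts[name] = info["Population"]

print(known_counts)
```

Checking for `None` before any numeric comparison matters, because comparing `None` with a number using `<` or `>` raises a `TypeError` in Python 3.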

In [121]:
conservation_data = {
    "Giant Panda": {
        "Status": "Endangered",
        "Population": 1800,
        "Threats": ["Habitat loss", "Poaching"]
    },
    "Mountain Gorilla": {
        "Status": "Critically Endangered",
        "Population": 1063,
        "Threats": ["Habitat loss", "Poaching", "Civil unrest"]
    },
    "Amur Leopard": {
        "Status": "Critically Endangered",
        "Population": 84,
        "Threats": ["Habitat loss", "Poaching"]
    },
    "Vaquita": {
        "Status": "Critically Endangered",
        "Population": 10,
        "Threats": ["Bycatch in fishing nets"]
    },
    "African Elephant": {
        "Status": "Vulnerable",
        "Population": 415000,
        "Threats": ["Habitat loss", "Poaching"]
    },
    "Javan Rhino": {
        "Status": "Critically Endangered",
        "Population": 72,
        "Threats": ["Habitat loss", "Poaching"]
    },
    "Sumatran Orangutan": {
        "Status": "Critically Endangered",
        "Population": 14600,
        "Threats": ["Habitat loss", "Poaching"]
    },
    "Hawksbill Turtle": {
        "Status": "Critically Endangered",
        "Population": None,
        "Threats": ["Habitat loss", "Poaching"]
    },
    "Saola": {
        "Status": "Critically Endangered",
        "Population": None,
        "Threats": ["Habitat loss", "Poaching"]
    },
    "Iberian Lynx": {
        "Status": "Endangered",
        "Population": 94,
        "Threats": ["Habitat loss", "Poaching"]
    }
}


In [122]:
threats = set()
for species in conservation_data:
    for i in conservation_data[species]["Threats"]:
        threats.add(i)
    
print(threats)

critically_endangered = []
for species in conservation_data:
    if conservation_data[species]["Status"] == "Critically Endangered":
        critically_endangered.append(species)

print(critically_endangered)

small_or_unknown = []
for species in conservation_data:
    population = conservation_data[species]["Population"]
    # Check for None first, because None cannot be compared with a number
    if population is None or population < 50:
        small_or_unknown.append(species)

print(small_or_unknown)

largest_population = 0
largest_population_species = "None"
for species in conservation_data:
    x = conservation_data[species]["Population"]
    if type(x) == int:
        if x > largest_population:
            largest_population = x
            largest_population_species = species
print("The species with the largest population is the", largest_population_species, "with", largest_population, "individuals")

{'Habitat loss', 'Poaching', 'Bycatch in fishing nets', 'Civil unrest'}
['Mountain Gorilla', 'Amur Leopard', 'Vaquita', 'Javan Rhino', 'Sumatran Orangutan', 'Hawksbill Turtle', 'Saola']
['Vaquita', 'Hawksbill Turtle', 'Saola']
The species with the largest population is the African Elephant with 415000 individuals


## Question 2: Functions (10 pt)

When considering the health of an ecosystem, an important concept to quantify is the diversity of that system. There are several metrics commonly used to calculate ecosystem diversity, one of which is called Simpson's Diversity Index.

This metric takes into account not only how many species are present in a location, but also whether one species has far more individuals than the others. For example, an ecosystem with 500 species but only one species above 10 individuals is not that diverse.

We can calculate Simpson's Diversity ($D$) as follows:

$D = 1 - [(\frac{n_1}{N})^2 + (\frac{n_2}{N})^2 + (\frac{n_3}{N})^2 + ...]$

For example, if an ecosystem has four species with 5, 2, 2, and 1 individuals (10 individuals total), you can calculate $D$ like this:

$D = 1 - [(\frac{5}{10})^2 + (\frac{2}{10})^2 + (\frac{2}{10})^2 + (\frac{1}{10})^2] = 0.66$
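The arithmetic in the worked example can be checked with a few lines of Python. This is only a sketch of the calculation for these four specific counts, not a general solution; the variable names are chosen here for illustration.

```python
# Species counts from the worked example above
counts = [5, 2, 2, 1]
total = sum(counts)  # N = 10

# Squared proportion (n_i / N)^2 for each species
squared_props = [(n / total) ** 2 for n in counts]
d = 1 - sum(squared_props)

print(round(d, 2))  # 0.66
```

Note that $N$ is the total number of individuals (10), not the number of species (4).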

Define a function that calculates and returns $D$ given a list of species population levels, and run the function on several example lists (3 pt).

Your answer should work for a list of **any** length (1 pt).

Add documentation to the function that describes what it does, the desired parameters, and what data types the parameters should be (2 pt).

Within the function, check that the input is a list. If the input is not a list, give a custom error message (2 pt).

Also, make sure all entries in the list are integers. If there are floats, convert them to integers. If there are entries that are not floats or integers, give a custom error message (2 pt).
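One common way to give a custom error message is to `raise` an exception, which stops the function instead of letting it continue with bad input. The sketch below applies that pattern to a deliberately different, hypothetical helper called `total_individuals()`; it assumes that raising `TypeError` is an acceptable way to report the problem.

```python
def total_individuals(counts):
    """Return the total number of individuals in a list of counts (hypothetical helper)."""
    # Reject anything that is not a list, with a custom message
    if not isinstance(counts, list):
        raise TypeError("counts must be a list, got " + type(counts).__name__)

    cleaned = []
    for value in counts:
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError("every entry must be an int or a float")
        cleaned.append(int(value))  # int() truncates floats toward zero
    return sum(cleaned)


print(total_individuals([3, 4.0, 5.9]))  # 12

try:
    total_individuals("not a list")
except TypeError as err:
    print(err)  # counts must be a list, got str
```

Wrapping the call in `try`/`except` lets the rest of a notebook cell keep running after the error is reported.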



In [125]:
# example_input = [1882, 400, 321, 24]

def simpson_diversity(pop):
    """
    Calculate Simpson's Diversity Index, D = 1 - sum((n_i / N)**2).

    Parameters
    ----------
    pop : list of int or float
        Number of individuals of each species. Floats are converted to integers.

    Returns
    -------
    float
        Simpson's Diversity Index for the given populations.
    """
    # The input must be a list
    if type(pop) != list:
        raise TypeError("Error, insert a list")

    counts = []
    for i in pop:
        # Every entry must be an int or a float
        if type(i) not in (int, float):
            raise TypeError("Error, list contains a value that isn't a number")
        counts.append(int(i))

    # N is the total number of individuals, not the number of species
    total = sum(counts)
    return 1 - sum((n / total) ** 2 for n in counts)

list_1 = [1882, 400, 321, 24, 100, 200, 300]
list_2 = [3.0, 9.0, 90.0, 'hi']
list_3 = [1, 2.0, 3, 4.0]
list_4 = [190, 21, 38, 10,0.008]
for example in (list_1, list_2, list_3, list_4):
    try:
        print(round(simpson_diversity(example), 3))
    except TypeError as err:
        print(err)


0.621
Error, list contains a value that isn't a number
0.7
0.432


## Question 3: Simulating data (10 pt)

In data analysis, we often simulate data to help test our predictions and get a feel for what the real data should look like. This question asks you to use the functions found in `numpy.random` to simulate rolling a die.

Define a function called `dice_simulator()` with an integer parameter called `n`. This function should create a list of integers 1 through 6 and randomly sample this list with replacement `n` times. The function should return the `n` samples as a list or numpy array. Note that `n` should be a positive integer (2 pt).

Define a function called `proportions()` to calculate what proportion of the "rolls" are 1s, 2s, 3s, 4s, 5s, and 6s. Print these 6 proportions. `proportions()` should have a single parameter called `rolls`, which should take in the output of `dice_simulator()` (3 pt).
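To make the idea of a proportion concrete, here is a minimal sketch that tallies a short, hand-written list of rolls with `collections.Counter` from the standard library. The `example_rolls` values are invented for illustration, and this is only one of several possible approaches.

```python
from collections import Counter

# Hypothetical example rolls, written out by hand rather than simulated
example_rolls = [1, 3, 3, 6, 2, 3, 1, 6]

tally = Counter(example_rolls)
# Counter returns 0 for faces that never appeared
face_props = [tally[face] / len(example_rolls) for face in range(1, 7)]

print(face_props)  # [0.25, 0.125, 0.375, 0.0, 0.0, 0.25]
```

The six proportions always sum to 1, which is a quick sanity check on any implementation.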

Define a function called `three_streak()` to calculate the maximum number of times 3 was "rolled" in a row and print this value. To be in a row, the 3's have to be next to each other in a list (such as if `rolls[1]` and `rolls[2]` are both 3). Like `proportions()`, `three_streak()` should have a single parameter called `rolls`, which should take in the output of `dice_simulator()` (3 pt). 
- *Hint: `max()` is a built-in function in Python that finds the largest value in a list.*
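The hint about `max()` can be illustrated with a sketch that collects the length of every run of a target value and then takes the largest. It uses `itertools.groupby` from the standard library, which is an alternative to writing an explicit counting loop; the helper name `longest_run()` is hypothetical.

```python
from itertools import groupby

def longest_run(values, target):
    """Length of the longest unbroken run of `target` in `values` (hypothetical helper)."""
    # groupby yields one (key, group) pair for each stretch of equal neighbours
    run_lengths = [len(list(group)) for key, group in groupby(values) if key == target]
    # default=0 covers the case where target never appears
    return max(run_lengths, default=0)

print(longest_run([3, 3, 1, 3, 3, 3, 2], 3))  # 3
print(longest_run([1, 2, 4], 3))  # 0
```

Without `default=0`, calling `max()` on an empty list raises a `ValueError`, so the case with no 3s at all needs explicit handling.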

Define a function called `simulation()` that calls `dice_simulator()`, `proportions()`, and `three_streak()`. Make sure that `proportions()` and `three_streak()` are called so that they use the same dice rolls. `simulation()` should take a single parameter `n` that is fed into `dice_simulator()`. Have this function print the value of n, as well (1 pt). 

Call `simulation()` several times with the `n` parameter at different values (1 pt). 





In [126]:
import numpy as np 

def dice_simulator(n):
    # n must be a positive integer
    if not isinstance(n, (int, np.integer)) or n <= 0:
        print("Error, n must be a positive integer")
        return None

    dice_numbers = np.array([1,2,3,4,5,6])
    samples = np.random.choice(dice_numbers, size = n, replace = True)
    return samples

test = dice_simulator(10)
print(test)

[3 5 2 2 2 5 2 5 3 4]


In [139]:

def proportions(rolls):
    n = len(rolls)
    counts = np.array([0,0,0,0,0,0])
    print(rolls)
    
    for i in rolls:
        counts[i -1] += 1

    props = counts/n 
    print(props)
    return props
test = dice_simulator(3)
proportions(test)
print(test)

   

[2 1 6]
[0.33333333 0.33333333 0.         0.         0.         0.33333333]
[2 1 6]


In [130]:
def three_streak(rolls):
    current_streak = 0
    max_streak = 0
    for roll in rolls:
        if roll == 3:
            # Extend the current run of 3s and remember the longest run so far
            current_streak += 1
            max_streak = max(max_streak, current_streak)
        else:
            # Any other value breaks the run
            current_streak = 0
    print(max_streak)
    return max_streak

rolls = dice_simulator(5)
three_streak(rolls)
print(rolls)

0
[4 2 1 1 6]


In [133]:
def simulation(n):
    rolls = dice_simulator(n)
    prop = proportions(rolls)
    max_streak = three_streak(rolls)
    print(n)
    print(rolls)
    print(prop)
    print(max_streak)
    
simulation(10)
simulation(7)
simulation(134)
simulation(2)

[2 4 1 5 5 2 3 6 4 2]
[0.1 0.3 0.1 0.2 0.2 0.1]
1
10
[2 4 1 5 5 2 3 6 4 2]
[0.1 0.3 0.1 0.2 0.2 0.1]
1
[3 6 1 2 3 6 4]
[0.14285714 0.14285714 0.28571429 0.14285714 0.         0.28571429]
1
7
[3 6 1 2 3 6 4]
[0.14285714 0.14285714 0.28571429 0.14285714 0.         0.28571429]
1
[2 6 2 3 6 1 5 1 4 4 6 6 2 5 5 4 6 2 6 4 5 4 3 5 3 5 5 6 4 4 1 1 5 1 1 4 3
 1 5 4 6 6 1 3 3 1 6 4 5 1 3 4 3 5 3 4 6 4 5 1 4 3 4 5 3 1 6 5 3 2 1 6 4 3
 4 5 2 4 5 6 1 6 5 1 1 6 4 4 2 3 1 6 1 5 4 5 6 2 1 3 6 5 5 3 3 3 2 5 1 5 6
 6 1 1 4 2 1 2 2 2 3 6 6 5 4 3 1 2 4 1 5 5 5 5]
[0.18656716 0.10447761 0.14925373 0.17910448 0.20895522 0.17164179]
3
134
[2 6 2 3 6 1 5 1 4 4 6 6 2 5 5 4 6 2 6 4 5 4 3 5 3 5 5 6 4 4 1 1 5 1 1 4 3
 1 5 4 6 6 1 3 3 1 6 4 5 1 3 4 3 5 3 4 6 4 5 1 4 3 4 5 3 1 6 5 3 2 1 6 4 3
 4 5 2 4 5 6 1 6 5 1 1 6 4 4 2 3 1 6 1 5 4 5 6 2 1 3 6 5 5 3 3 3 2 5 1 5 6
 6 1 1 4 2 1 2 2 2 3 6 6 5 4 3 1 2 4 1 5 5 5 5]
[0.18656716 0.10447761 0.14925373 0.17910448 0.20895522 0.17164179]
3
[1 1]
[1. 0. 0. 0. 0. 0.]
0
2
[1