# Example Analysis with Zooniverse Snapshot Serengeti Data

<img src="http://publicradio1.wpengine.netdna-cdn.com/daily-circuit/files/2013/08/SLP-cheetah-with-cubs.jpg">

### Step 1: Lions and tigers and impala, oh my!

What different animals coexist in the Serengeti? Let's find out.

<a href="data/Serengeti_data.txt">Serengeti_data.txt</a> is the data file we'll work with (it's in your data folder). The file has the classification results for the Snapshot Serengeti animals (i.e., for a given date, which lions, zebra, wildebeest, etc. are there, where they are, and what they are doing). 

# Make a data table

A convenient way to look at all the data that has been collected is in a data table. The code below will help us create one.

Notice how we use the command 'ascii.read' to read in this Serengeti_data.txt data table. This is different from what we did in  <a href="../Workshop1/Part_2.ipynb">Part 2</a>, where you used the command 'fits.open' to open the MWbubbles.fits data table. You use a different command depending on what type of data file you're working with.

In [None]:
#Import needed astropy library
from astropy.table import Table,Column
from astropy.io import ascii

#Print out the data table
ascii.read('data/Serengeti_data.txt')

#Run this cell (shortcut=shift+enter) to read in your data table.

For a description of what each column is referring to, see <a href="data/SnapshotSerengetiDataFields.pdf">here</a>. 

### Great, we've printed out the data table to the screen so we can see what's in it. 

### Now we are going to store the data table as the word "Animals". That will let us work with the data table by simply typing "Animals".

In [None]:
#Make Animals equal to ascii.read('data/Serengeti_data.txt') by filling in the blank
Animals =

#Run this cell

### Identify all Species Seen at Least Once in all the Snapshot Serengeti Images

We are going to create a function that goes through all of the data and builds a list containing each species' name exactly once. In Python we can use a "for loop" to do this. A for loop lets us step through a large data set and repeat the same instructions for every entry.
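As a minimal sketch of how a for loop repeats the same steps for every item, the cell below walks through a short made-up list of numbers (not Snapshot Serengeti data) and adds them up one at a time. The names `group_sizes` and `total` are invented for this illustration.

```python
# Hypothetical example data: number of animals seen in four images
group_sizes = [3, 1, 7, 2]

total = 0  # running total, starts at zero

# The indented line runs once for every item in the list
for size in group_sizes:
    total = total + size

print(total)
```

The function below follows the same pattern, except that the indented lines decide whether to add a name to a list instead of adding numbers together.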

In [None]:
def getUniqueSpecies(data_type): #creates a function with the input 'data_type'
    
    #create an empty list to store the names of the species
    -- = []   # fill in the blank by calling this empty list unique
    
    #loop through each animal in the list
    for -- in --: #fill in the blanks with animal and data_type
        if animal not in unique: #if the animal has not already been added to the list
            unique.append(animal) #add the animal to the list using the .append() function
    return sorted(unique) #returns an alphabetical version of the list using the sorted() function

Now we are going to use the getUniqueSpecies function we just created to identify all the unique species in our "Animals" data table.

In [None]:
uniqueSpecies = getUniqueSpecies(Animals['species']) #store the list returned by the function in uniqueSpecies for use later

print(uniqueSpecies) #print the list

#Run this cell
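For comparison, Python's built-in `set` offers a shorter route to a list of unique values. The sketch below is an alternative shown on a made-up list of names (`sightings` is invented for this example), not a replacement for the loop you just wrote.

```python
# Hypothetical list with repeated names, invented for this example
sightings = ['zebra', 'lion', 'zebra', 'giraffe', 'lion']

# set() keeps one copy of each name; sorted() returns them as an alphabetical list
unique_names = sorted(set(sightings))

print(unique_names)
```

Writing the loop by hand makes each step visible, which is why the function above spells it out.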

Here's a <a href="data/SnapshotSerengetiCommonSpecies.pdf">page with images</a> of the different species found in the Serengeti.

### Make a pie chart showing how common (or uncommon) each species is.

###### First determine the number of times an example of each species was seen. <br>Then populate a dictionary that uses each species name as a key and the number of times it was seen as the value. 

A dictionary allows us to access values by calling on the assigned key. Examples of what dictionaries look like are:
* my_dict = {key1: value1, key2: value2, key3: value3} 
* usernames_passwords = {'jdoe': 'password', 'jsmith': '1234'} 

Using dictionaries helps us better organize large sets of data. You can access the "value" by using my_dict[key].
* For example, my_dict[key1] gives you value1. 

We are going to create a dictionary that assigns the number of times a species was seen to the name of the species. Our dictionary will look something like this: 
* totNum = {'aardvark': 218, 'aardwolf': 73, 'baboon': 797}
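To see these ideas in action, here is a small sketch using a made-up dictionary (the name `sightings` and its numbers are invented for illustration, not taken from the data file). Note that in real Python code, text keys such as species names need quotes around them.

```python
# Hypothetical counts, invented for this example
sightings = {'zebra': 12, 'lion': 3}

# Look up a value by its key
print(sightings['zebra'])

# Add a new key together with its value
sightings['giraffe'] = 5

# Check whether a key is in the dictionary, and count the entries
print('lion' in sightings)
print(len(sightings))
```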

In [None]:
import numpy as np

#Here you should call the empty dictionary totNum
-- = {}  #fill in the blank with totNum

#Now we loop through each species and determine how many times that species was seen in Snapshot Serengeti
#Make a for loop that goes through every "animal" in "uniqueSpecies"
for -- in --:  #fill in blanks with animal and uniqueSpecies
    totNum[animal] = len(np.where(Animals['species'] == animal)[0]) 
    #fills the dictionary we made earlier by assigning every species name to the number of times it was seen 
    
#Print to screen how many images each species has been seen in
for animal in uniqueSpecies:
    print('{} was seen in {} images'.format(animal, totNum[animal]))
    
print('') #print a blank line
print('this is the whole dictionary: {}'.format(totNum)) #print the dictionary

#Run the cell

Above you should now see a printed list with each species and how many images it was seen in. Below that list, you should be able to see that we have a dictionary where every animal is the key in the dictionary, and the number of times it was seen is the value. We will use this dictionary to create a pie chart. 

## *Important*: To access the keys in our dictionary, you must use totNum.keys(), and to access the values, you must use totNum.values().

In [None]:
#print out the keys and the values
print(list(totNum.keys()))    #the names of the animals  
print(list(totNum.values()))  #how many images the animals were seen in

#Run the cell
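As long as the dictionary is not changed in between, the keys and the values come out in matching order, so they can be paired up. The sketch below uses a small made-up dictionary (not the real `totNum`) to show this. Wrapping the results in `list()` gives ordinary lists; in Python 3, `keys()` and `values()` return view objects rather than lists.

```python
# Hypothetical counts, invented for this example
sightings = {'zebra': 12, 'lion': 3, 'giraffe': 5}

names = list(sightings.keys())     # the keys as a plain list
counts = list(sightings.values())  # the values as a plain list

# zip() pairs the first name with the first count, and so on
for name, count in zip(names, counts):
    print('{} -> {}'.format(name, count))
```

This matching order is what lets a pie chart built from the values share a legend built from the keys.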

## Now we can make a pie chart by using our dictionary

In [None]:
#http://matplotlib.org/examples/pie_and_polar_charts/pie_demo_features.html

#Have plot appear within the notebook
%matplotlib inline
#import the needed matplotlib libraries
import matplotlib.pyplot as plt
from matplotlib import cm

#Set the figure size to larger
#Try changing the numbers to see what will happen!
plt.figure(figsize = (12,12))

#Create the pie chart that represents the number of species
#Fill in the blank with the values from our totNum dictionary, wrapped in list() (see above for help)
plt.pie(--, colors = cm.Set1(np.arange(len(uniqueSpecies))/float(len(uniqueSpecies))))

#Create a legend for the chart that contains the names of the species
#Fill in the blank with the keys from our totNum dictionary (see above for help)
plt.legend(--, loc = 9, bbox_to_anchor=(1.25, 1))

# Set aspect ratio to be equal so that pie is drawn as a circle.
plt.axis('equal')

#Run the cell

### If you want to make a pie chart with the default settings, use the following code:

In [None]:
#http://matplotlib.org/examples/pie_and_polar_charts/pie_demo_features.html

#Set the figure size to larger
plt.figure(figsize = (9,9))

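#Create the pie chart
#list() turns the dictionary's values and keys into plain lists, which plt.pie needs under Python 3
plt.pie(list(totNum.values()), labels = list(totNum.keys()))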

# Set aspect ratio to be equal so that pie is drawn as a circle.
plt.axis('equal')

#Run the cell

### Step 2: Where are all the animals?

Let's visualize where the animals live in the Serengeti. We can do this by using the longitude and latitude values from the Snapshot Serengeti data table and plotting them on an x-y axis. To make the plot, reuse the matplotlib.pyplot commands you used in <a href="../Workshop1/Part_2.ipynb">Part 2</a>.

In [None]:
#Identify the longitude and latitude arrays in our data table
#lon corresponds to the x values on your plot
#lat corresponds to the y values on your plot
lon = Animals['longitude']
lat = Animals['latitude']

Create your latitude versus longitude plot using commands you learned in <a href="../Workshop1/Part_2.ipynb">Part 2</a>.

In [None]:
#Import the needed libraries
%matplotlib inline
import matplotlib.pyplot as plt

# Fill in the blank lines below to create your labeled plot
---
---
---
---

#Run the cell

Your plot should map out something like the image below, which also shows where the rivers cut through the Serengeti. The red oval is just pointing out a particular river system.
<img src="https://snapshotserengeti.files.wordpress.com/2013/05/slide1.jpg" width="500">

### Now let's just plot the cheetah.

The "where" function allows you to pick out the locations in an array (the index values) that have a specific value you're interested in. 

In [None]:
#import the needed numpy library
import numpy as np

x = np.array([1, 0, 2, 0, 1, 0, 5, 6, 0]) #create an array
print(np.where(x == 0)[0]) #use the 'where' function to pick out the index values where x equals zero

#Run the cell

The new array lists the index values where the number zero was located in the original array; you should see [1 3 5 8].<br>
Remember, computer programmers start counting at zero! 
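The index values returned by `where` become useful when you apply them to a second array of the same length. As a small sketch with made-up arrays (`names` and `heights` are invented for this example), the cell below finds the positions where one array matches a value and uses those positions to pull out the matching entries of the other array.

```python
import numpy as np

# Hypothetical data: two arrays of the same length, one row per tree
names = np.array(['oak', 'pine', 'oak', 'elm'])
heights = np.array([12.0, 20.5, 9.5, 15.0])

# Index values of the entries in names that equal 'oak'
oak_ind = np.where(names == 'oak')[0]
print(oak_ind)

# Use those index values to select the matching heights
print(heights[oak_ind])
```

The same idea applies to a data table: index values found in one column can be used to select entries from any other column.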

Now, create an array with the indices for just the cheetah.

In [None]:
#import the needed numpy library
import numpy as np

#Use the 'where' function to identify all the cheetah in your data table.
#The 'where' function gives the indices of values that satisfy a given condition
#In this case, we get the indices for all cheetah in our data table.
#Fill in the blank with 'cheetah'
cheetah_ind = np.where(Animals['species'] == '-----')[0]

#Run the cell

### Plot the location of the cheetah.

In [None]:
#Create your plot using the same plotting commands you used above
#Make your x values lon[cheetah_ind]
#Make your y values lat[cheetah_ind]


#Remember to label your plot
#Run the cell

### Step 3: Now let's compare where impala tend to live with where cheetahs live. 

<img src="https://snapshotserengeti.files.wordpress.com/2015/02/tommie.jpg?w=479&h=336">

First, what do you think you'll see? Will cheetahs and impalas live near each other? Can you think of why impala might choose to live in different parts of the Serengeti than cheetahs?

### To do: Pick out the impala, plot their location, and compare their location with cheetahs'.


To do this, reuse commands used above for the cheetah and add another 'plt.plot(x,y)' command. Check out <a href="../examples/overplotExample.ipynb">this page</a> for hints on overplotting.
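As a rough sketch of overplotting, the cell below draws two made-up sets of points on the same axes (the coordinates and labels are invented, not Serengeti data). Calling `plt.plot` twice before the plot is displayed puts both sets on one figure, and the `label` argument supplies the text that `plt.legend()` shows.

```python
import matplotlib.pyplot as plt

# Hypothetical coordinates for two groups, invented for this example
x1, y1 = [1, 2, 3, 4], [2, 3, 1, 4]
x2, y2 = [1.5, 2.5, 3.5], [3, 2, 3]

# 'o' draws unconnected circular markers; markersize sets how big they are
plt.plot(x1, y1, 'o', markersize=10, label='group A')
plt.plot(x2, y2, 'o', markersize=5, label='group B')

# Label the plot
plt.legend()
plt.title('Two groups on one plot')
plt.xlabel('x')
plt.ylabel('y')
```

Drawing the second group with a smaller marker keeps it from hiding the first group where the points overlap.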

In [None]:
#Identify index positions of impala in the original Animals data table
#Fill in the blank with 'impala'
impala_ind = np.where(Animals['species'] == --)[0]

#Plot the impala 
#Remember to use the impala indices to find the longitude and latitudes of only the impala
plt.plot()

#Overplot the cheetah, using a smaller marker size
plt.plot()

#label your plot
plt.legend()
plt.title()
plt.xlabel()
plt.ylabel()

# Congratulations! You've finished Part 3!

## Extension Activities

Extension \#1: The final column in the data table is 'vegetation'. Do you expect hyenas to prefer open grassland while giraffes might prefer treed grassland? Find out what type of vegetation each species seems to prefer. 

Extension \#2: Look for other trends in the data. Do your results make sense?
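For Extension \#1, one possible starting point is a dictionary of dictionaries that tallies vegetation types for each species. The sketch below uses a handful of made-up records; the species and vegetation labels are invented for illustration and may not match the values in the data file.

```python
# Hypothetical (species, vegetation) pairs, invented for this example
records = [
    ('hyena', 'open grassland'),
    ('giraffe', 'treed grassland'),
    ('hyena', 'open grassland'),
    ('hyena', 'treed grassland'),
]

veg_counts = {}  # maps species name -> {vegetation type: number of sightings}

for species, vegetation in records:
    # Create an empty inner dictionary the first time a species appears
    if species not in veg_counts:
        veg_counts[species] = {}
    # Start the tally at zero the first time a vegetation type appears
    if vegetation not in veg_counts[species]:
        veg_counts[species][vegetation] = 0
    veg_counts[species][vegetation] += 1

for species in sorted(veg_counts):
    print('{}: {}'.format(species, veg_counts[species]))
```

With the real table, the pairs might come from `zip(Animals['species'], Animals['vegetation'])`, assuming the column is named 'vegetation' as described above.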