# Natural History Museums and Biodiversity Science Data Science Module Pre-lab

The goal of this data science module is to access and integrate diverse data sets, visualize correlations, and discover patterns that address questions about species’ responses to environmental change. We will use programmatic tools to work with Berkeley resources such as biodiversity data from biocollections and online databases, field stations, climate models, and other environmental data. If you have trouble getting the Jupyter notebook to run, try dropping into [data peer consulting](https://data.berkeley.edu/education/data-peer-consulting).

Before we begin analyzing and visualizing biodiversity data next class, this introductory notebook will help familiarize you with the basics of programming in Python.

## Table of Contents

Please complete Parts 1 and 2 before coming to class.

1 - [Jupyter Notebooks](#jupyter)
    
2 - [Python Basics](#python)

# Part 1: Our Computing Environment, Jupyter notebooks  <a id='jupyter'></a>
This webpage is called a Jupyter notebook. A notebook is a place to write programs and view their results. 

### Text cells
In a notebook, each rectangle containing text or code is called a *cell*.

Text cells (like this one) can be edited by double-clicking on them. They're written in a simple format called [Markdown](http://daringfireball.net/projects/markdown/syntax) to add formatting and section headings.  You don't need to learn Markdown, but you might want to.

After you edit a text cell, click the "run cell" button at the top that looks like ▶| to confirm any changes. You can also hold `shift` and then press `enter` or `return`. (Try not to delete the instructions of the lab.)

### Code cells
Other cells contain code in the Python 3 language. Running a code cell will execute all of the code it contains.

To run the code in a code cell, first click on that cell to activate it.  It'll be highlighted with a little green or blue rectangle.  Next, either press ▶| or hold down the `shift` key and press `return` or `enter`.

Try running this cell:

In [None]:
print("Hello, World!")

And this one:

In [None]:
print("\N{WAVING HAND SIGN}, \N{EARTH GLOBE ASIA-AUSTRALIA}!")

In order to finish the setup for this notebook, run the following cell:

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scripts.espm_module import *
import json
from IPython.display import display, HTML  # public import path; IPython.core.display is deprecated for these names
plt.style.use('seaborn')
%matplotlib inline

# Part 2: Python basics <a id='python'></a>
Before getting into the higher-level analyses we will do on the GBIF and Cal-Adapt data, we need to cover a few of the foundational elements of programming in Python.

#### A. Expressions
The departure point for all programming is the concept of the __expression__. An expression is a combination of variables, operators, and other Python elements that the language interprets and acts upon. Expressions act as a set of instructions to be fed through the interpreter, with the goal of generating specific outcomes. See below for some examples of basic expressions. Keep in mind that most of these just map to your understanding of mathematical expressions:

In [1]:
2 + 2

'me' + ' and I'

12 ** 2

6 + 4

10

You will notice that only the value of the last expression in a cell gets displayed. If you want to see the values of earlier expressions, you need to call the `print` function on each of them. Functions use parentheses around their arguments, just like in math!

In [None]:
print(2 + 2)

print('you' + ' and I')

print(12 ** 2)

print(6 + 4)

#### B. Variables
In the example below, `a` and `b` are __variables__. A variable gives an object (in this case an integer and a float, two Python data types written `int` and `float`) a name that we can store for later use. To use that value, we simply type the name we stored it under. Variables are stored within the notebook's environment, meaning their values carry over from cell to cell.

In [None]:
a = 4
b = 10/5

Notice that when you create a variable, unlike what you previously saw with the expressions, it does not print anything out.

We can continue to perform mathematical operations on these variables, which are now placeholders for what we've assigned:

In [None]:
print(a + b)
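
As a small additional sketch (the names `count`, `ratio`, and `total` are made up for illustration and are not part of the lab), the cell below shows how to check a variable's data type and what happens when a variable is reassigned:

```python
# Hypothetical names, used only for illustration
count = 3
ratio = 10 / 5          # division with / always produces a float

print(type(count))      # <class 'int'>
print(type(ratio))      # <class 'float'>

total = count * 2       # total is computed from the current value of count
count = count + 1       # reassignment: count now refers to 4

print(count)            # 4
print(total)            # 6 (total does not change when count is reassigned)
```

Reassigning `count` does not go back and update `total`; each assignment stores the value computed at the moment the line runs.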

#### C. Lists
The following few cells will introduce the concept of __lists__.

A list is an ordered collection of objects. Lists allow us to store groups of variables and other objects together for easy access and analysis. Check out this [documentation](https://www.tutorialspoint.com/python/python_lists.htm) for an in-depth look at the capabilities of lists.

To initialize a list, you use brackets. Putting objects separated by commas in between the brackets will add them to the list. 

We use `lst` as the variable name because `list` is the name of a built-in Python type. It is not a reserved keyword, so Python would let us reuse it, but doing so would hide the built-in `list` for the rest of the notebook.

In [None]:
# an empty list
lst = []
print(lst)

# reassigning our empty list to a new list
lst = [1, 3, 6, 'lists', 'are', 'fun', 4]
print(lst)

To access a value in the list, put the index of the item you wish to access in brackets following the variable that stores the list. Lists in Python are zero-indexed, so the indices for `lst` are 0, 1, 2, 3, 4, 5, and 6.

In [None]:
# Elements are selected like this:
example = lst[2]

# The above line selects the 3rd element of lst (list indices are 0-offset) and assigns it to a variable named example.
print(example)
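
A few other common list operations are sketched below. The list `animals` and its contents are hypothetical and only serve as an illustration; `len`, negative indices, and `append` are standard Python features.

```python
# Hypothetical list, used only for illustration
animals = ['newt', 'frog', 'toad']

print(len(animals))   # number of elements: 3
print(animals[0])     # first element: newt
print(animals[-1])    # negative indices count from the end: toad

animals.append('salamander')  # add an element to the end of the list
print(animals)        # ['newt', 'frog', 'toad', 'salamander']
```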

#### D. Dictionaries

Dictionaries are collections of `key`-`value` pairs. Just like in a word dictionary, each key indexes a specific definition.

In [None]:
my_dict = {'python': 'a large heavy-bodied nonvenomous constrictor snake occurring throughout the Old World tropics.'}

We can get a `value` back out by indexing the `key`:

In [None]:
my_dict['python']

But like real dictionaries, there can be more than one definition. You can store a `list`, or even another dictionary, as the value for a specific `key`:

In [None]:
my_dict = {'python': ['a large heavy-bodied nonvenomous constrictor snake occurring throughout the Old World tropics.',
                      'a high-level general-purpose programming language.']}

We can index the `list` after the `key`:

In [None]:
my_dict['python'][0]

In [None]:
my_dict['python'][1]
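
The value stored under a key can also be another dictionary. The sketch below uses a hypothetical dictionary named `word_info` (not part of the lab) to show how to index a nested dictionary and how to add a new key:

```python
# Hypothetical dictionary, used only for illustration
word_info = {'python': {'animal': 'a large constrictor snake',
                        'computing': 'a programming language'}}

# Index the outer key first, then the inner key
print(word_info['python']['computing'])   # a programming language

# Assigning to a key that does not exist yet adds a new entry
word_info['list'] = 'an ordered collection of objects'
print(list(word_info.keys()))             # ['python', 'list']
```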

#### E. Functions

In programming, we often reuse chunks of code. So instead of copy/pasting it and repeating the same code over and over again, we have something called a function, which gives a name to a block of code. This allows us to just call the function instead of rewriting code we used before.

For example, this is a function that squares an input.

In [None]:
# This code creates a function named square
def square(n):
    return n * n

In [None]:
# Let's find the square of 5
square(5)

In [None]:
# Let's try it with -3
square(-3)

The functions you will see in class later are more complex than this example. We will use them to reduce the amount of code in the notebook. For now, you can ignore the details of how functions are structured. Just remember that a **function** is a shortcut to easily re-run old code and that the `def` keyword means we are creating a function.
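
If you are curious, here is one more sketch that combines a function with a list. The function name `average` is hypothetical and is not one of the functions used in class; it only illustrates how a single definition can be reused on different inputs.

```python
# Hypothetical function, used only for illustration
def average(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# The same function works on any list of numbers
print(average([1, 2, 3, 4]))   # 2.5
print(average([10, 20]))       # 15.0
```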

## You're done!

You have completed the pre-lab! Please fill out this [Google Form](https://forms.gle/h4fuE4k9Nv5nQ5wm6) to show that you completed this notebook.

---

Notebook developed by: Michelle Koo, Nina Pak, Natalie Graham, Monica Wilkinson, Andy Sheu, Harry Li

[Data Science Modules](http://data.berkeley.edu/education/modules)