# SW 282: Lab 1 - Introduction to Python and Jupyter

---

### Professor Erin Kerrison

In this lab, we introduce Python and the Jupyter Notebook environment.

---

## Welcome to Jupyter  <a id='section 0'></a>

Welcome to the Jupyter Notebook! **Notebooks** are documents that can contain text, code, visualizations, and more. We'll be using them in this lab to manipulate and visualize our data.

A notebook is composed of rectangular sections called **cells**. There are 2 kinds of cells: markdown and code. A **markdown cell**, such as this one, contains text. A **code cell** contains code in Python, a programming language that we will be using for the remainder of this module. You can select any cell by clicking it once. After a cell is selected, you can navigate the notebook using the up and down arrow keys.

To run a code cell once it's been selected, 
- press Shift-Enter, or
- click the Run button in the toolbar at the top of the screen. 

If a code cell is running, you will see an asterisk (\*) appear in the square brackets to the left of the cell. Once the cell has finished running, a number will replace the asterisk and any output from the code will appear under the cell.

In [None]:
# run this cell
print("Hello World!")

You'll notice that many code cells contain lines of blue text that start with a `#`. These are *comments*. Comments often contain helpful information about what the code does or what you are supposed to do in the cell. The leading `#` tells the computer to ignore them.

In [None]:
# this is a comment- running the cell will do nothing!

Code cells can be edited any time after they are highlighted. Try editing the next code cell to print your name.

In [None]:
# edit the code to print your name
print("Hello: my name is NAME")

#### Saving and Loading

Your notebook can record all of your text and code edits, as well as any graphs you generate or calculations you make. You can save the notebook in its current state by pressing Control-S, clicking the floppy disc icon in the toolbar at the top of the page, or going to the File menu and selecting "Save and Checkpoint".

The next time you open the notebook, it will look the same as when you last saved it.

**Note:** after loading a notebook you will see all the outputs (graphs, computations, etc.) from your last session, but you won't be able to use any variables you assigned or functions you defined. You can get the functions and variables back by re-running the cells where they were defined. The easiest way is to highlight the cell where you left off work, then go to the Cell menu at the top of the screen and click "Run all above". You can also use this menu to run all cells in the notebook by clicking "Run all".

#### Completing the Notebooks


<div class="alert alert-info"> 

**QUESTION** cells are in blue and ask you to enter in lab data, make graphs, or do other lab tasks. To receive full credit for your lab, you must complete all **QUESTION** cells.


</div>



### 1. Python <a id='section 1'></a>

**Python** is a programming language: a way for us to communicate with the computer and give it instructions. Just like any language, Python has a *vocabulary* made up of words it can understand, and a *syntax* giving the rules for how to structure communication.

Python doesn't have a large vocabulary or syntax, but it can be used for many, many computational tasks.

Bits of communication in Python are called **expressions**- they tell the computer what to do with the data we give it.

Here's an example of an expression. 

In [None]:
# an expression
14 + 20

When you run the cell, the computer evaluates the expression and prints the result. Note that only the last line in a code cell will be printed, unless you explicitly tell the computer you want to print the result.

In [None]:
# more expressions. what gets printed and what doesn't?
100 / 10

print(4.3 + 10.98)

33 - 9 * (40000 + 1)

884

Many basic arithmetic operations are built into Python, like `*` (multiplication), `+` (addition), `-` (subtraction), and `/` (division). There are many others, which you can find information about [here](http://www.inferentialthinking.com/chapters/03/1/expressions.html).

The computer evaluates arithmetic according to the PEMDAS order of operations (just like you probably learned in middle school): anything in parentheses is done first, followed by exponents, then multiplication and division, and finally addition and subtraction.
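
As a worked illustration of that order, the sketch below evaluates one expression a step at a time and then all at once. The expression and the names `step_1` through `step_4` are ours, chosen only for illustration; `**` is Python's exponent operator.

```python
# Evaluate 2 + 3 * (8 - 2) ** 2 / 4 one step at a time.
step_1 = 8 - 2            # parentheses first: 6
step_2 = step_1 ** 2      # then the exponent: 36
step_3 = 3 * step_2 / 4   # multiplication and division, left to right: 27.0
step_4 = 2 + step_3       # addition last: 29.0

# Python reaches the same value when given the whole expression at once.
whole = 2 + 3 * (8 - 2) ** 2 / 4
print(step_4, whole)      # 29.0 29.0
```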

In [None]:
# before you run this cell, can you say what it should print?
4 - 2 * (1 + 6 / 3)

#### A Note on Errors <a id="subsection error"></a>

Python is a language, and like natural human languages, it has rules.  It differs from natural language in two important ways:
1. The rules are *simple*.  You can learn most of them in a few weeks and gain reasonable proficiency with the language in a semester.
2. The rules are *rigid*.  If you're proficient in a natural language, you can understand a non-proficient speaker, glossing over small mistakes.  A computer running Python code is not smart enough to do that.

Whenever you write code, you'll make mistakes.  When you run a code cell that has errors, Python will sometimes produce error messages to tell you what you did wrong.

Errors are normal; experienced programmers see errors all the time.  When you make an error, you just have to find the source of the problem, fix it, and move on.

We have made an error in the next cell.  Delete the `#`, then run it and see what happens.


In [None]:
# print("This line is missing something."

You should see something like this (minus our annotations):

<img src="images/error.jpg"/>

The last line of the error output attempts to tell you what went wrong.  The *syntax* of a language is its structure, and this `SyntaxError` tells you that you have created an illegal structure.  "`EOF`" means "end of file," so the message is saying Python expected you to write something more (in this case, a right parenthesis) before finishing the cell.

There's a lot of terminology in programming languages, but you don't need to know it all in order to program effectively. If you see a cryptic message like this, you can often get by without deciphering it. (Of course, if you're frustrated, you can usually find out what it means by searching for the error message online or asking course staff for help.)

### 1a. Entering and Naming your data <a id='section 1a'></a>
Sometimes, the values you work with can get cumbersome- maybe the expression that gives the value is very complicated, or maybe the value itself is long. In these cases it's useful to give the value a **name**.

We can name values using what's called an *assignment* statement.

In [None]:
# assigns 442 to x
x = 442

The assignment statement has three parts. On the left is the *name* (`x`). On the right is the *value* (442). The *equals sign* in the middle tells the computer to assign the value to the name.

You'll notice that when you run the cell with the assignment, it doesn't print anything. But, if we try to access `x` again in the future, it will have the value we assigned it.

In [None]:
# show the value of x
x

You can also assign names to expressions. The computer will compute the expression and assign the name to the result of the computation.

In [None]:
y = 50 * 2 + 1
y
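
One detail worth seeing in action: the name is attached to the *result* of the expression, not to the expression itself. A minimal sketch, using hypothetical names `price` and `total` that are not part of the lab:

```python
# Hypothetical names, for illustration only.
price = 50
total = price * 2 + 1   # the right-hand side is computed now: total is 101
price = 0               # giving price a new value later...
print(total)            # ...does not change total, which is still 101
```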

We can then use these names as if they were whatever they stand for (in this case, numbers).

In [None]:
x - 42

In [None]:
x + y

#### Lists
In Python, you can also make lists of numbers. A Python **list** is enclosed in square brackets. Items inside the list are separated by commas.

In [None]:
# a list
[7.0, 6.24, 9.98, 4]

Lists can have names too, which is handy when you want to save a set of items without writing them out over and over again.

In [None]:
my_list = [4, 8, 15, 16, 23, 42]
my_list

### 1b. Basic calculations <a id='section 1b'></a>
Once you have your data in a list, Python has a variety of functions that can be used to perform calculations and draw conclusions.

#### Built-in functions
The most basic functions are built into Python. This means that Python already knows how to perform these functions without you needing to define them or import a library of functions. The `print()` function you saw earlier is an example of a built-in function. A full list of all built-in Python functions can be found [here](https://docs.python.org/3/library/functions.html).

Below are a few examples of functions you may find useful during this class:

In [None]:
# what do you think this function calculates?
min(my_list)

In [None]:
# what about this one?
max(my_list)

In this example, we passed a single list to each of the functions. However, you can also pass multiple numbers separated by commas, or even multiple lists! You can try it out below and see if you can figure out how Python is choosing which list is greater than the other.

In [None]:
max([1, 2, 3], [3, 2, 0])

Some functions have _optional_ arguments. For instance, the most basic usage of the `round()` function takes a single argument.

In [None]:
round(3.14159)

You can also pass a second argument, which specifies how many decimal places the output should have. If you leave this argument out, Python uses the _default_ behavior and rounds to the nearest whole number.

In [None]:
round(3.14159, 2)
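
To compare the two ways of calling `round()` side by side, here is a short sketch. The variable names are ours, for illustration only.

```python
whole = round(3.14159)           # no second argument: nearest whole number
two_places = round(3.14159, 2)   # keep two decimal places
four_places = round(3.14159, 4)  # keep four decimal places

print(whole)        # 3
print(two_places)   # 3.14
print(four_places)  # 3.1416
```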

### 2. Libraries

#### `numpy`

For more complex calculations, you will need to either define functions or import functions that someone else has written. For numerical calculations, `numpy` is a popular library containing a wide variety of functions. If you are curious about all of the functions in the library, the `numpy` documentation can be found [here](https://docs.scipy.org/doc/numpy/reference/).

In order to use these functions, you have to first run an import statement. Import statements for all required libraries are typically run at the beginning of a notebook.

In [None]:
# This gives numpy an abbreviation so that when we refer to it later we don't need to write the whole name out.
# We could abbreviate it however we want, but np is the conventional abbreviation for numpy.
import numpy as np

Now you can use all the functions in the `numpy` library. When using these functions, you must prefix them with `np.` so that Python knows to look in the `numpy` library for the function.

In [None]:
np.mean(my_list)

#### `datascience`

In this module, you will use the `datascience` library, which was developed for Data 8 here at UC Berkeley. This library lets you manipulate data without thinking about many of the concepts that make other packages like `pandas` (the industry standard for working with rectangular data) harder to work with. To use the functions in the `datascience` library, we import them with slightly different syntax than before. The statement below imports the functions so that we can call them without the abbreviation prefix that our `numpy` import above requires.

In [None]:
from datascience import *
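
To see the difference between the two import styles, here is a minimal sketch. It uses Python's standard `statistics` module as a stand-in, which is our assumption and not part of this lab, and the names `values`, `prefixed`, and `unprefixed` are ours. The `*` in the cell above imports every public name from a library at once; the sketch imports a single name so it is easy to see where it comes from.

```python
# Style 1: import the module under a short name, then prefix each call.
import statistics as st

# Style 2: import a name directly, so no prefix is needed.
from statistics import median

values = [4, 8, 15, 16, 23, 42]

prefixed = st.mean(values)    # called through the abbreviation
unprefixed = median(values)   # called by its bare name

print(prefixed, unprefixed)
```

The prefix makes it obvious which library a function comes from; importing names directly is shorter to type but hides that information.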

### 3. Functions

So far, you've learnt how to carry out basic operations on your inputs and assign names to values. Now, let's try to be more efficient.

Let's say we want to perform a certain operation on many different inputs that will produce distinct outputs. What do we do? We write a _**function**_. A function is a block of code which works a lot like a machine: it takes an input, does something to it, and produces an output. The input goes between parentheses and is also called the _argument_ or _parameter_. Functions can have multiple arguments.

Try running the cell below after changing the value assigned to the variable `name`:

In [None]:
# Edit this cell to your own name!
name = "John Doe"

# Our function
def hello(name):
    return "Hello " + name + "!"

hello(name)

Interesting, right? Now, you don't need to write 10 different lines with 10 different names to print a special greeting for each person. All you need to do is write one function that does all the work for you!
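
For instance, one definition can serve several people. The sketch below repeats the `hello` function so that it runs on its own; the names passed in are hypothetical.

```python
def hello(name):
    return "Hello " + name + "!"

# Hypothetical names, for illustration only.
greeting_1 = hello("Ada")
greeting_2 = hello("Grace")
greeting_3 = hello("Alan")

print(greeting_1)   # Hello Ada!
print(greeting_2)   # Hello Grace!
print(greeting_3)   # Hello Alan!
```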

Functions are very useful in programming because they help you write shorter and more modular code. A good example to think of is the _print_ function, which we've used quite a lot in this module. It takes many different inputs and performs the specified task, printing its input, in a simple manner.

Now, let's write our own function. Let's look at the following rules: 

#### Defining
- All functions must start with the `def` keyword.  
- All functions must have a name, followed by parentheses, followed by a colon, e.g. `def hello():`
- The parentheses may contain one or more names that store the function's arguments (inputs).
- Functions in this lab should end with a `return` statement, which hands back the output. Think of a function like a machine: when you put something in, you want to get something out. A function without a `return` statement still runs, but it gives back the special value `None`.
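
The rules above can be sketched with a function that takes two arguments. Both function names here are hypothetical, chosen only for illustration; the second one shows what you get back when the `return` statement is left out.

```python
# def keyword, then the name, then parentheses holding the inputs, then a colon.
def add_and_double(a, b):
    total = a + b        # the indented lines are the body of the function
    return total * 2     # return hands the output back to the caller

def forgets_to_return(a, b):
    total = a + b        # computes a value but never returns it

print(add_and_double(3, 4))       # 14
print(forgets_to_return(3, 4))    # None
```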

#### Calling
After you define a function, it's time to use it. This is known as _calling_ a function. 

To call a function, simply write the name of the function followed by your input (the argument) in parentheses.


In [None]:
# Complete this function
def my_first_function(argument):
    return "something" # function must return a value

# Calling our function below...
my_first_function(name)

Great! Now let's do some math. Let's write a function that returns the square of the input.

<div class="alert alert-info">
    
**QUESTION:** Replace the `...` below to write a function called `square` which returns the square of its input.
    
</div>

In [None]:
# square function 
def square(x):
    return ...

square(5)

Neat stuff! Try different inputs and check if you get the correct answer each time.

You've successfully written your first function! Let's take this up one notch.

#### The Power Function

`pow` is a function that takes in two numbers: `x`, which is the "base", and `y`, the "power". So when you write `pow(3, 2)`, the function returns 3 raised to the power 2, which is $3^2 = 9$.
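
A quick sketch of `pow` in use, with variable names of our own choosing. The `**` operator is another way to write the same calculation.

```python
squared = pow(3, 2)     # 3 raised to the power 2
cubed = pow(2, 3)       # 2 raised to the power 3
same_thing = 3 ** 2     # the ** operator computes the same power

print(squared, cubed, same_thing)   # 9 8 9
```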

<div class="alert alert-info">

**QUESTION:** Write a function called `mulpowply` which takes in three inputs `x`, `y`, `z` and returns the product of `x` and `y` raised to the power `z`. Symbolically, it should return $(xy)^z$.

</div>

In [None]:
# mulpowply function
...

---

## Submission

Congrats on finishing your first lab notebook! To turn in this lab assignment, go to File > Download as > PDF via Chrome and upload the PDF to bCourses.

---
### References

- Sections of "Intro to Jupyter" adapted from materials by Kelly Chen and Ashley Chien in [UC Berkeley Data Science Modules core resources](http://github.com/ds-modules/core-resources).
- "A Note on Errors" subsection and "error" image adapted from materials by Chris Hench and Mariah Rogers for the Medieval Studies 250: Text Analysis for Graduate Medievalists [data science module](https://github.com/ds-modules/MEDST-250).
- "Intro to Jupyter" and "Intro to Python" adapted from materials by Keeley Takimoto for the Berkeley Executive Education [Program on Data Science and Analytics Module](https://github.com/ds-modules/BEE).
---
Notebook developed by: Chris Pyles

Data Science Modules: http://data.berkeley.edu/education/modules