# Jupyter and Python

* * * 

### Icons Used In This Notebook
🔔 **Question**: A quick question to help you understand what's going on.<br>
🥊 **Challenge**: Interactive exercise. We'll work through these in the workshop!<br>
⚠️ **Warning**: Heads-up about tricky stuff or common mistakes.<br>
💡 **Tip**: How to do something a bit more efficiently or effectively.<br>
🎬 **Demo**: Showing off something more advanced – so you know what Python can be used for!<br>

### Learning Objectives
1. [What it Means to "Know How to Code"](#how_to_code)
2. [Working with Jupyter notebooks](#jupyter)
3. [Variables in Python](#variables)
4. [Calculating with Variables](#calculate)
5. [Demo: Working with Data](#demo)

<a id='how_to_code'></a>
# What it Means to "Know How to Code"

Python is a general-purpose, powerful, and high-level programming language. It can be used for many tasks, including building websites and software, automating tasks, conducting data analyses, and more.

This workshop will take you through the fundamentals of Python, with a focus on data analysis. However, knowing **how to code** is a general, extensible skill. To "know" Python, R, Matlab, or any other language is not a matter of memorization but of having a set of problem-solving skills.

A programmer knows:

1) General structures and programming logic
2) How to find and use new functions
3) How to work through novel problems

It is these three aspects we want to give you an intuition for.

When you're programming, most of your time will be spent debugging, looking stuff up (like program-specific syntax, [documentation](https://github.com/dlab-berkeley/python-intensive/blob/master/glossary.md#documentation) for packages, useful functions, etc.), or testing. Relatively little time is actually spent typing out the code - most of it goes into the thinking, planning, and testing.

<a id='jupyter'></a>
# Working with Jupyter Notebooks

We use Jupyter Notebooks as our interface for coding. This format allows us to compile all aspects of a data project in one place, making it easier to show the entire process of a project to your intended audience! 

In Jupyter notebook documents, code and text are divided into cells which can each be run separately.

Jupyter notebooks consist of **Markdown cells** and **code cells**. You can change a cell's type with the drop-down menu at the top of the screen that reads "Markdown" or "Code".

## Markdown cells

This cell is written in **Markdown**, a simple markup language. We use it to narrate the workshop and provide context. 

Markdown has its own syntax, which is fairly straightforward. Here's a [cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) if you want to know more.

**💡 Tip:** Double click this text cell to see the Markdown code rendering the output. Press **Shift + Enter** to go back to the formatted text.

### 💡 Tip: Command Mode & Edit Mode 

If you want to create new empty cells, you can use the `+` button at the top of the notebook. 

Enter **Edit Mode** by pressing **Enter** or using the mouse to click on a cell’s editor area. Edit Mode allows you to edit cells.<br>
Enter **Command Mode** by pressing **Esc** or using the mouse to click outside a cell's editor area. Command Mode allows you to use shortcut keys:

* **a**: Add cell above  
* **b**: Add cell below  
* **dd**: Delete cell (to undo a deletion, click Edit -> Undo Cell Operation)
* **m**: Convert cell to Markdown  
* **y**: Convert cell to Code  

## Code cells

The cell below is a code cell. Press **Shift + Enter** on it to run the code and advance to the next cell.<br>

In [None]:
print('Welcome to D-Lab!')

`print()` is a **function**. A function is like a little program that performs an action on some value or data. You can often identify a function because of its trailing round parentheses `()`. This function just prints out whatever you put in between the parentheses.

🥊 **Challenge**: Write your own `print` statement in the code cell below. Follow the syntax of the example above, and change the text in the quotation marks.

In [None]:
# Write your own awesome print statement here!


🔔 **Question:** What does the code cell below do? What does the `print(a)` statement do?

In [None]:
a = 1 + 2
print(a)

You can run the same cell over and over again.

🥊 **Challenge**:  Try using **Control + Enter** to run the following cell three times. What is the output? Run the cell one more time. Is the output the same?

In [None]:
a = a + 1
print(a)

### 💡 Tip: Output in Jupyter

You don't always need a print statement. Jupyter automatically displays the value of the **last expression** in a cell. When you just want to check what value a certain variable holds, you can type the variable name and run the cell. Let's try it:

In [None]:
a

## The Kernel

One of the tabs in the Jupyter menu is "Kernel". The kernel is the computational engine that executes the code in a Jupyter Notebook. Each time you run a code cell, the kernel executes its code and keeps a record of what was run, including any variables it created.
 
⚠️ **Warning:** Jupyter remembers the effects of all code it has executed, **even if that code is no longer displayed in the notebook**. If a line of code has already been run, deleting it or changing its cell to Markdown does not remove the variables it created from the kernel's memory! This can cause a lot of confusion.
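
A plain Python script has no cells to delete, so the sketch below only simulates the situation: imagine that the first assignment lived in a cell you ran and then deleted. The name `hidden_value` is purely illustrative. The variable stays in memory until you restart the kernel or remove it explicitly with `del`.

```python
# Imagine this assignment was in a cell that you ran and then deleted
hidden_value = 42

# A later cell can still use it: the kernel remembers what it has executed
print(hidden_value)  # 42

# del removes the name from memory for real
del hidden_value
print('hidden_value' in globals())  # False
```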

### Clearing the Kernel

To clear your session in a Jupyter notebook, use `Kernel -> Restart` in the menu. Restarting the kernel is like opening the notebook for the first time: every variable is erased from memory and the execution counter starts over.

🔔 **Question:** First, run the cell below. What is the output?

In [None]:
mystring = 'And three shall be the count.'

print(mystring)

🔔 **Question:** Now use `Kernel -> Restart` in the menu! Then run the code below. What happens?

In [None]:
print(mystring)

Note that the error message tells you where the error happened (with an arrow, no less!). It is telling us that `mystring` is not defined, since we just reset the kernel.

💡 **Tip**: The number next to each code cell is a **counter**. It gives you a record of what order you ran your cells in. To reset the counter, you have to reset the kernel.

## Jupyter Autocomplete

Jupyter notebooks also allow for tab completion, just like many command line interpreters and text editors. If you begin typing the name of something (e.g., a variable, a file in the current directory, etc.) that already exists, you can simply hit **Tab** and Jupyter will autocomplete it for you. If there is more than one possibility, it will show them to you and you can choose from there. For example:

In [None]:
test_me = 1
test_me_2 = 2

🔔 **Question:** Try typing `te`, and see what happens when you hit **Tab**! What are you seeing?

In [None]:
# YOUR CODE HERE



## Exiting Jupyter

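When you shut down a notebook, all of the variables stored in memory are lost. Your code, however, is kept in the notebook file, as long as you save it first (for example with **Control + S**, or **Command + S** on a Mac).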

First go to `File -> Close and Shutdown Notebook` in order to shut down the notebook you are using and close its window. Once all notebooks are shut down, you can shut down the entire Jupyter server by closing Anaconda Navigator. You may get a warning dialog box alerting you that Jupyter Notebook is still running. Just click **Quit** to shut everything down.

<a id='variables'></a>

# Variables in Python

You may have noticed our usage of the equals sign, `=`, in previous cells to create "placeholders" for certain values. We were creating **variables**.

*   Variables are containers for values that we want to refer to again later in the code.
*   In Python, the `=` symbol assigns the value on the right to the name on the left.
*   The variable is created when a value is assigned to it. When you use the variable, it will refer to whatever value it currently holds.
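
Here is a minimal sketch of that last point (the names `year` and `next_year` are just illustrative). Python uses the value a variable holds at the moment a line runs, and reassigning a variable later does not reach back to update results computed from it.

```python
year = 2020
next_year = year + 1  # computed from the value year holds right now (2020)

year = 2021  # reassigning year does not update next_year
print(year)       # 2021
print(next_year)  # 2021
```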

Here's Python code that assigns a year to a variable `year` and a month in quotation marks to a variable `month`.

In [None]:
year = 2020
month = 'July'

# Printing our new variables
print(year)
print(month)

💡 **Tip:** To see which variables we have assigned, use the magic commands `%who` and `%whos`. Magic commands are Jupyter-specific: [read about all of them here](https://ipython.readthedocs.io/en/stable/interactive/magics.html). There are a lot of really useful ones!

In [None]:
# This is a magic command
%whos

⚠️ **Warning**: If your kernel crashes (or if you reset it), you will **lose the variables** that have been set!

## Any Comments?

You might have seen a pound sign `#` at the beginning of the code cell above. This is a **comment**: Python ignores everything after the `#` on that line, so it won't run.
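
A comment can take up a whole line, or it can follow code on the same line. In the second case, the code before the `#` still runs. A small sketch (the name `total` is illustrative):

```python
# This whole line is a comment, so Python skips it
total = 1 + 1  # An inline comment: the code before the # still runs
print(total)   # 2
```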

🔔 **Question:** Try running the cell below, then comment out the last two lines and run it again. What changes?


In [None]:
print(1 + 1)

This line should be commented
this_variable_too = 'a' * 'b'

## Indenting

Consistent indentation is essential in Python. Python pays close attention to the whitespace at the start of each line and uses it to understand how your code is structured. You should only indent lines where Python expects an indented block; otherwise, you'll run into an error.
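
To preview where Python *does* expect indentation, here is a small sketch using an `if` statement, which is covered properly in a later workshop. The variable names are illustrative. Each indented line belongs to the `if` or `else` line directly above it.

```python
temperature = 25

if temperature > 20:
    # Indented: this line only runs when the condition is True
    status = 'warm'
else:
    # Indented: this line only runs when the condition is False
    status = 'cool'

# Not indented: this line runs either way
print(status)  # warm
```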

💡 **Tip**: To move multiple lines of code at once, you can select them and then hit `Control + ]` to indent them (move to the right), or `Control + [` to dedent them (move to the left).

💡 **Tip**: If you are on a Mac, use `Command` instead of `Control`.

🔔 **Question:** Read the error message down below. What is the error type? How can we fix it?

In [None]:
move_me = 1
    move_me_too = 'abc'

## Naming Variables

Variable names ***must*** follow a few rules:

* They cannot start with a digit.
* They cannot contain spaces, quotation marks, or other punctuation.
* They *may* contain an underscore (typically used to separate words in long variable names).
* They cannot be one of Python's reserved keywords, such as `for`, `if`, or `class`.
    
Ignoring these rules will result in an error in Python. 
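
If you're ever unsure whether a name is allowed, Python can check it for you. The sketch below uses two standard-library tools: `str.isidentifier()` checks the character rules, and `keyword.iskeyword()` catches reserved words such as `class`. The helper `is_valid_variable_name` is a hypothetical name chosen for this example.

```python
import keyword

def is_valid_variable_name(name):
    """Return True if `name` could be used as a Python variable name."""
    # isidentifier() rejects leading digits, spaces, and punctuation
    # (underscores are fine); iskeyword() rejects reserved words like 'class'
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ['year', 'start_year', '1a', 'b-2', 'my var', 'class']:
    print(repr(candidate), is_valid_variable_name(candidate))
```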

### 💡 Tip: Guidelines for Variable Names

*   Python is case-sensitive (`A_variable` and `a_variable` are two separate variables).
*   Use meaningful variable names (e.g. `year` is more informative than `a_variable`). Ideally, you should be able to tell what is going on in the code and variables without having to run it.
*   There are different styles of writing variable names, like **snake case** (`start_year`) and **camel case** (`startYear`). You're free to choose, but be consistent (e.g., avoid mixing `startYear` and `stop_year`).
*   Don't use variable names that are already taken by Python's built-in functions and types (e.g., `print`, `sum`, `str`). Assigning to such a name hides the built-in version.
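
Here is a sketch of what goes wrong when a variable hides a built-in function. It uses `try`/`except` only so that the error is printed instead of stopping the code; you'll learn more about that later.

```python
print(str(2020))  # str() converts the number 2020 to the text '2020'

str = 'hello'  # a variable named str hides the built-in str() function
try:
    str(2020)  # now this tries to call the text 'hello' like a function
except TypeError as err:
    print('TypeError:', err)

del str  # delete our variable so the built-in str() is found again
print(str(2020))  # works again
```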

## 🥊 Challenge: Debugging Variable Names

The following two blocks of code include variable names that cause an error. For each block of code, consider the following questions:
1. Which **rule** is being broken? Can you find this information in the error message?
2. What **guidelines** aren't being followed? 
3. How would you change the code?

In [None]:
1a = 'Washington'
b-2 = 'Olympia'

print('The capital of', 1a, 'is', b-2)

In [None]:
A_variable = 2012
anotherVariable = 42
print(a_variable - anotherVariable)

## Debugging

You've seen a few types of errors by now: `SyntaxError` (you're writing something wrong), `IndentationError` (a kind of `SyntaxError` raised when a line is indented where Python doesn't expect it), and `NameError` (the variable, function, or module you're calling doesn't exist). There are many other errors which we will go over later. For now, just remember: <span style="color:red">**error messages are your friend!**</span>
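
The error type is the first word on the last line of a traceback. The sketch below triggers each of these errors on purpose and prints its type. It relies on `exec()` (which runs code stored in a string), a loop, and `try`/`except` purely to keep the script running; you won't need any of these tools for everyday work yet.

```python
# Each snippet is broken on purpose
broken_snippets = {
    'name starting with a digit': '1a = 5',
    'undefined variable': 'print(undefined_name)',
    'unexpected indent': 'x = 1\n    y = 2',
}

error_types = {}
for description, code in broken_snippets.items():
    try:
        exec(code)
    except Exception as err:
        # type(err).__name__ is the error type shown in the traceback
        error_types[description] = type(err).__name__
        print(description, '->', type(err).__name__)
```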

When you want to try and debug an error, think of the following:

1. **Read the errors!** They often point to what's wrong, and in which line the error is found. 
2. **Check your syntax.** You might just be spelling something wrong.
3. **Read the documentation.** You might just be using a function in a wrong way. Get into the habit of reading documentation. It looks daunting at first but will get easier over time. 
4. **Make it smaller.** If you're dealing with large chunks of code, it can help to split them up. This will allow you to zero in on where the error occurs.
5. **Print statements.** Printing out the state of your code can be very helpful to spot bugs that don't produce an error message, but cause incorrect output. This is sometimes called a *confidence check*.
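
As a sketch of a bug that raises no error, consider converting a temperature from Celsius to Fahrenheit (multiply by 9/5, then add 32). The variable names are illustrative. The buggy line runs without complaint but gives the wrong answer, and printing the intermediate step helps reveal why.

```python
celsius = 100

# Buggy: we meant to multiply celsius by 9/5 first and then add 32,
# but this adds 32 * 9 / 5 to celsius instead. No error is raised!
fahrenheit = celsius + 32 * 9 / 5
print('buggy fahrenheit:', fahrenheit)  # not 212, so something is off

# Printing the intermediate step makes the mistake easier to see
scaled = celsius * 9 / 5
print('scaled:', scaled)  # 180.0
fahrenheit = scaled + 32
print('fixed fahrenheit:', fahrenheit)  # 212.0
```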

<a id='calculate'></a>
# Calculating with Variables

*   The key feature of variables is that we can use them just as if they were values.
*   **Operators** are special symbols, such as `+` and `*`, that tell Python to perform a calculation. Jupyter typically highlights them in purple.
*   **Functions**, like `print()`, are named pieces of code that carry out a task, often by combining several operations. We will cover these in a later notebook.

Let's check out some common operations below. 

🔔 **Question:** What outputs do you expect for each line below? Note what values get substituted in for the variables in each operation. 

In [None]:
apples = 15
students = 5

# Addition
apples = apples + 3
print('Addition:', apples)

# Subtraction
apples = apples - 3
print('Subtraction:', apples)

# Multiplication
print('Multiplication:', apples * students)

# Division
print('Division:', apples / students)

# Exponentiation
print('Exponentiation:', students**2)
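
Writing `apples = apples + 3` is so common that Python has a shorthand for it, `+=` (likewise `-=`, `*=`, and `/=`). This sketch also shows two more arithmetic operators that often come in handy: `//` (floor division, which drops the fractional part) and `%` (the remainder).

```python
apples = 15
students = 5

apples += 3    # same as: apples = apples + 3  (now 18)
apples -= 3    # same as: apples = apples - 3  (back to 15)
print('Shorthand:', apples)  # Shorthand: 15

# 17 / 5 is 3.4: floor division keeps the 3, % gives the 2 left over
print('Floor division:', 17 // students)  # Floor division: 3
print('Remainder:', 17 % students)        # Remainder: 2
```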

## 🥊 Challenge: Swapping Values

Let's say we have two variables and we want to swap the values for each of them. 

Does the following method accomplish the goal?

💡 **Tip**: What are the values of `first` and `last` at the end of the cell?

In [None]:
first = 'a'
last = 'z'

first = last
last = first

Using a third temporary variable (you could call it `temp`), swap the first and last variables, so that `first = 'z'` and `last = 'a'`.

In [None]:
first = 'a'
last = 'z'

# YOUR CODE HERE


This is a common technique that is used for swapping variables around. However, often we might choose to just use new variables, rather than overwrite the ones here. Can you think of a reason why we might avoid overwriting a variable? How about a reason why we *would* overwrite a variable?
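
Once you've tried the temporary-variable approach, it's worth knowing that Python also has a one-line idiom for swapping, based on tuple unpacking. It works because Python evaluates the entire right-hand side before assigning anything.

```python
first = 'a'
last = 'z'

# The right-hand side (last, first) is built first: ('z', 'a').
# Its values are then assigned to first and last, in that order.
first, last = last, first
print(first, last)  # z a
```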

<a id='demo'></a>

# 🎬 Demo: Working with Data

To cap off this session, here's a demo to see what reproducible data science with Python looks like.
Just run the code cell below, and don't worry if you don't understand everything!

* We'll be using a `pandas` data frame to store and manipulate the data - you'll learn more about `pandas` in the next workshop!
* Our data comes from the California Health Interview Survey (CHIS), the nation's largest state health survey. The data has been altered for demonstration purposes.

Let's have a look at the data:

In [None]:
import pandas as pd

# Reading in a comma-separated values file
chis_df = pd.read_csv('../data/chis_extract.csv')
chis_df.head()

Looks like we have a bunch of information here. Let's focus on two columns: how much fruit people eat per week (the "fruit_perweek" column) and how many vegetables people eat per week (the "veg_perweek" column).

In the next steps, we'll...

1. Check whether there's a correlation between these two variables
2. Plot their relationship

In [None]:
# Getting the correlation
chis_df['fruit_perweek'].corr(chis_df['veg_perweek'])

In [None]:
# Plotting
chis_df.plot(kind='scatter', x='fruit_perweek', y='veg_perweek');

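The correlation coefficient (a number between -1 and 1) and the scatter plot show how closely weekly fruit and vegetable consumption move together. Remember that correlation is not causation: there might be a confounding variable that could explain this relationship...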
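
By default, pandas' `Series.corr()` computes the Pearson correlation coefficient. To show roughly what that number measures, here is a minimal pure-Python sketch of the same formula. `pearson_correlation` is a hypothetical helper, and the two lists are made-up values, not CHIS data.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations: positive when x and y tend to be
    # above (or below) their means at the same time
    co = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each variable around its own mean
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co / math.sqrt(spread_x * spread_y)

fruit = [1, 2, 3, 4, 5]  # made-up weekly values
veg = [2, 4, 5, 4, 5]
print(round(pearson_correlation(fruit, veg), 3))  # 0.775
```

A value near 1 means the two variables rise together, near -1 means one falls as the other rises, and near 0 means there's little linear relationship.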