## 201A Lecture 10/3: Introduction to Jupyter Notebooks and Python

### 1. An Introduction to the Environment

The goal for today's lecture is to play with a Jupyter Notebook and introduce elements that will help you get familiar with Python as an interactive computational environment for exploring data. The material is presented in an interactive environment that runs within your web browser, combining text and graphics with Python code that you can run on the spot. We are looking at a Jupyter notebook now. Jupyter is a relatively recent name, so you may still see it referred to as an IPython notebook. Jupyter grew out of IPython notebooks and now also supports a variety of other languages and tools.

Let's start by getting familiar with the Jupyter Notebook and how it works.

### 1.1 Launching a Jupyter Notebook on DataHub

In this course we will be using hosted computing facilities to make it easy to start learning to code in Python and to leverage university computing infrastructure. In class we will use DataHub, which you can connect to with your CalNet login at http://datahub.berkeley.edu.

### 1.2 Using Jupyter Notebooks

Once you launch the Jupyter Notebook and it opens in your browser, you will see a listing of the folder you were in when you launched it. You can either open an existing notebook from that listing or create a new one.

Your notebook records all of your text and code edits, as well as any graphs you generate or calculations you make. You can save the notebook in its current state by pressing Ctrl-S (Cmd-S on a Mac), clicking the floppy disk icon in the toolbar at the top of the page, or going to the File menu and selecting "Save and Checkpoint".

The next time you open the notebook, it will look the same as when you last saved it.

The Jupyter Notebook is made up of cells. We run a cell by hitting Shift-Enter.

In [None]:
#run this cell
print("I'm excited we're starting to learn Python")

### 1.3 Markdown Cells

Some cells are known as ***markdown cells***, which contain text formatted with Markdown that you can use to share your analysis with others, as well as keep notes on what the different lines of code do. (Like this cell!) You can edit the contents of a markdown cell by double-clicking on it.

Try it on this cell. When you are done editing, press Shift-Enter to render the formatted text.

You can use Markdown syntax to format your text. Documentation on the markdown syntax is available here: https://www.markdownguide.org/basic-syntax/ 

### Hashtags are for headings
**Two stars** make text bold, *one star* makes it italic, and ***three stars*** make it bold and italic. Take a minute and create a new markdown cell with some formatted text. Write yourself an affirming note that you can do this!

### 1.4 Code Cells

In addition to markdown cells, Jupyter has ***code cells***, which contain programming code. It's in these cells that the magic happens! Try it: put your cursor in the cell below and press Ctrl-Enter to run the code cell. (Ctrl-Enter runs the cell and keeps it selected; Shift-Enter runs it and moves to the next cell.)


In [None]:
import math
math.sqrt(144)

That was cool!!!  Try changing the 144 to a different number, and run the cell again.

You can also add comments to your code cells using the hash sign (`#`):

    # Change the next line so it takes the square root of 10
    
This is called a *comment*. It doesn't make anything happen in Python; Python ignores anything on a line after a `#`.  Instead, it's there to communicate something about the code to you, the human reader. Comments are extremely useful. 
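
As a small illustration (a sketch; the variable name `root` is just an example), Python runs only the code on each line below and skips everything after a `#`:

```python
import math

# This whole line is a comment, so Python skips it
root = math.sqrt(10)  # a comment can also follow code on the same line
print(root)
```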

### 1.5 Learning the Python Language

Python is a language, and like natural human languages, it takes time to learn. There is vocabulary (names like `print` and `math.sqrt`), and there are rules, called *syntax*, for how that vocabulary is put together. Both just take practice, practice, practice. We will teach you the rules, but you need to practice on your own.

However, programming languages differ from natural language in one important way:

> The rules are rigid. If you're proficient in a natural language, you can understand a non-proficient speaker, glossing over small mistakes. A computer running Python code is not smart enough to do that.

Whenever you write code, you'll make mistakes. Errors are okay; even experienced programmers make many errors. When you make an error, you just have to find the source of the problem, fix it, and move on. Fixing it can take a long time. A long, long time.

In [None]:
print("This line is missing something."

The last line of the error output attempts to tell you what went wrong. The *syntax* of a language is its structure, and this `SyntaxError` tells you that you have created an illegal structure. In older versions of Python the message reads `unexpected EOF while parsing`, where "EOF" means "end of file": Python expected you to write something more (in this case, a right parenthesis) before the cell ended. Python 3.10 and later say the same thing more directly: `'(' was never closed`.

There's a lot of terminology in programming languages, but you don't need to know it all in order to program effectively. If you can't figure it out intuitively, Google is your best friend!

Try to fix the code above so that you can run the cell and see the intended message instead of an error.

## 2. Python's Data Types

### 2.1 Numerals

One of the most commonly used data types in Python is numbers. With numbers, Python acts just like a calculator.

In [None]:
3**2

**Create a code cell and try some simple math operations below.** Many basic arithmetic operations are built into Python, like `*` (multiplication), `+` (addition), `-` (subtraction), `/` (division), and `**` (exponentiation, as in the cell above). There are many others, which you can find information about [here](http://www.inferentialthinking.com/chapters/03/1/expressions.html).

The computer evaluates arithmetic according to the PEMDAS order of operations (just like you probably learned in middle school): anything in parentheses is done first, followed by exponents, then multiplication and division, and finally addition and subtraction.
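
Here is a quick check of those precedence rules (the variable names `a` through `d` are illustrative); each comment gives the value Python computes:

```python
a = 2 + 3 * 4       # multiplication before addition: 14
b = (2 + 3) * 4     # parentheses first: 20
c = 2 * 3 ** 2      # exponent before multiplication: 18
d = 10 - 4 / 2      # division before subtraction: 8.0
print(a, b, c, d)
```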

Python distinguishes between integers and floats.  Floats use decimal points, whereas integers are always whole numbers.  Run the two cells below to see the difference.

In [None]:
float(1776)

In [None]:
int(1776)
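
A few more ways to see the integer/float distinction, using Python's built-in `type()` function. Note that `/` always produces a float, `//` (floor division) keeps whole numbers, and `int()` chops off the decimal part rather than rounding. The variable names are just for illustration:

```python
print(type(1776))      # <class 'int'>
print(type(1776.0))    # <class 'float'>

half = 7 / 2           # 3.5: "/" always returns a float
floor_half = 7 // 2    # 3: "//" is floor division and returns an int here
truncated = int(3.9)   # 3: int() drops the decimal part, it does not round
print(half, floor_half, truncated)
```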

### 2.2 Variables

Variables store data values in Python (and in many other programming languages). A variable can hold string data (e.g., `fips = "06001240650"`) or numeric data (e.g., `pct_BA_higher = BA_Higher / Total_pop`). Strings in Python are always written between double or single quotation marks.

In [None]:
fips="06001240650"

In [None]:
fips

In [None]:
pct_BA_higher=1345/23984

In [None]:
pct_BA_higher
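
Why store the FIPS code as a string rather than a number? A minimal sketch: converting it with `int()` drops the leading zero, which changes the code. The slicing below assumes the standard 11-digit census tract layout (2-digit state, 3-digit county, 6-digit tract); `state_code`, `county_code`, `ba_higher`, and `total_pop` are illustrative names, and the counts are the example values from the cell above.

```python
fips = "06001240650"
print(int(fips))          # 6001240650: the leading 0 is gone
print(len(fips))          # 11 characters

# Assumes the 2-digit state + 3-digit county + 6-digit tract layout
state_code = fips[:2]     # "06" (California)
county_code = fips[2:5]   # "001" (Alameda County)

ba_higher = 1345
total_pop = 23984
pct_BA_higher = ba_higher / total_pop
print(round(pct_BA_higher * 100, 1))   # 5.6 (percent)
```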

## 3. Conditional Statements

Conditional statements are built on boolean expressions: comparisons that evaluate to either `True` or `False`. With conditional statements, we can tell the computer under which condition we want a specific operation to be executed. We are going to use these a lot.

**A == B: True if A equals B**

**A != B: True if A is not equal to B**

**A > B: True if A is greater than B. Same syntax for "<"**

**A >= B: True if A is greater than or equal to B. Same syntax for "<="**

**Note:** "=" and "==" are not the same operation in Python. A single equals sign (`=`) assigns the value on its right to the variable name on its left, while the double equals sign (`==`) checks whether the value on its left is equal to the value on its right.
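
A short sketch of the difference (the year 1790 and the name `is_washington_era` are arbitrary examples): the first line assigns, and every line after it only compares.

```python
year = 1790                           # "=" assigns 1790 to the name year
print(year == 1790)                   # True: "==" compares, it does not change year
print(year != 1790)                   # False
print(year >= 1789 and year < 1797)   # True: "and" needs both comparisons to be True
is_washington_era = year >= 1789 and year < 1797
```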

In the cell below, guess what the output will be before you run it. Feel free to change the number assigned to the variable `year`.

In [None]:
year = 1782

# Who's the president?

if year>=1789 and year<1797:
    print("George Washington")
elif year>=1797 and year<1801:
    print("John Adams")
elif year>=1801:
    print("A later president")  # Adams's term ended in 1801
else:
    print("King George")
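
Python also lets you chain comparisons, so `1789 <= year < 1797` means the same as `year >= 1789 and year < 1797`. A small sketch comparing the two forms (the year 1795 is an arbitrary example):

```python
year = 1795
chained = 1789 <= year < 1797                 # chained comparison
spelled_out = year >= 1789 and year < 1797    # the same test written with "and"
print(chained, spelled_out)                   # True True
```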

## 4. Playful Example of the Power of Programming

In lab today, we're going to work with Python and American Community Survey (ACS) data. But first, let's do a quick fun exercise together to see how soon we'll be able to do super cool things.

First, we need to import some "libraries" (collections of code that other people have written); more on that in lab today. Just run the code below.

In [None]:
# This imports various libraries that Python will
# call upon to do the fun analysis below.
# We're going to learn more about libraries as we go.

from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)

from urllib.request import urlopen
import re

def read_url(url):
    # Download the page, decode the bytes into text, and collapse every
    # run of whitespace (spaces, tabs, newlines) into a single space
    return re.sub(r'\s+', ' ', urlopen(url).read().decode())
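
To see what `read_url` does to the downloaded text without going online, here is the same `re.sub` call applied to a short sample string (the string is only an illustration): every run of whitespace becomes a single space.

```python
import re

raw = "CHAPTER ONE\n\n   PLAYING\tPILGRIMS"
cleaned = re.sub(r'\s+', ' ', raw)   # newlines, tabs, and repeated spaces -> one space
print(cleaned)                       # CHAPTER ONE PLAYING PILGRIMS
```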

In [None]:
from IPython.display import Audio, Image, YouTubeVideo

In [None]:
video_id = 'AST2-4db4ic'  # avoid naming a variable "id", which hides Python's built-in id()
YouTubeVideo(video_id, width=900, height=400)

In [None]:
# The first line stores the web address of the book's text file.
# The second line tells Python to go to that website and assigns what it "reads"
# there to a variable called little_women_text.
# The third line creates a new variable, little_women_chapters, by splitting
# the text every time there's a new chapter heading; [1:] drops the piece
# that comes before the first heading.
little_women_url = 'https://www.inferentialthinking.com/data/little_women.txt'
little_women_text = read_url(little_women_url)
little_women_chapters = little_women_text.split('CHAPTER ')[1:]
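
Here is the same split-then-slice pattern on a tiny made-up "book", so you can see what `split('CHAPTER ')` and `[1:]` each do:

```python
text = "Title page CHAPTER ONE Jo and Meg CHAPTER TWO Amy and Beth"
pieces = text.split("CHAPTER ")
print(pieces)      # ['Title page ', 'ONE Jo and Meg ', 'TWO Amy and Beth']
chapters = pieces[1:]   # drop everything before the first heading
print(chapters)    # ['ONE Jo and Meg ', 'TWO Amy and Beth']
```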

In [None]:
# Now, we can create a table with the chapter titles, and how the chapter starts

Table().with_column('Chapters', little_women_chapters)

In [None]:
# Which Little Woman gets the most attention?
# Let's assign the main characters' names to a variable "people".
# Then, in one line, we tell Python to count how many times each of those
# names appears in each chapter.
people = ['Amy', 'Beth', 'Jo', 'Laurie', 'Meg']
people_counts = {pp: np.char.count(little_women_chapters, pp) for pp in people}

counts = Table().with_columns(
        'Amy', people_counts['Amy'],
        'Beth', people_counts['Beth'],
        'Jo', people_counts['Jo'],
        'Laurie', people_counts['Laurie'],
        'Meg', people_counts['Meg']
    )

# The lines above create a new table with one row per chapter and one column of counts per character. Let's display it:
counts
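
`np.char.count` applies Python's `str.count` to every string in a list and returns the results as an array. A plain-Python sketch of the same idea on three made-up "chapters" also shows a caveat worth knowing: it counts substrings, not whole words, so a name like "Jo" also matches inside "Joy".

```python
sample_chapters = ["Jo laughed and Meg smiled.", "Amy asked Jo.", "Joy! Jo, Jo, Jo."]

# One count per chapter, like np.char.count(sample_chapters, "Jo")
jo_counts = [chapter.count("Jo") for chapter in sample_chapters]
print(jo_counts)   # [1, 1, 4]: the "Jo" inside "Joy" is counted too
```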

In [None]:
# Plot the cumulative counts: how many times each name appears in Chapter 1,
# in Chapters 1 and 2, and so on.

cum_counts = counts.cumsum().with_column('Chapter', np.arange(1, counts.num_rows + 1))
cum_counts.plot(column_for_xticks=5)
plots.title('Cumulative Number of Times Name Appears');
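
`cumsum()` turns per-chapter counts into running totals. A standard-library sketch with hypothetical counts for three chapters, using `itertools.accumulate` to do the same running sum:

```python
from itertools import accumulate

per_chapter = [3, 0, 5]                        # hypothetical counts for Chapters 1-3
running_total = list(accumulate(per_chapter))  # 3, then 3 + 0, then 3 + 0 + 5
print(running_total)                           # [3, 3, 8]
```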

## Fun, right?  And super powerful when we put it to use for planning questions.

**Note:** Save your work. When you reopen a notebook you will see all the outputs (graphs, computations, etc.) from your last session, but you won't be able to use any variables you assigned or functions you defined, because the Python process that held them (the *kernel*) is usually no longer running. You can get the functions and variables back by re-running the cells where they were defined. The easiest way is to select the cell where you left off, then go to the Cell menu at the top of the screen and click "Run All Above". You can also use this menu to run every cell in the notebook by clicking "Run All". (In JupyterLab, these commands are under the Run menu.)