# First things first

Jupyter notebooks consist of a mix of *code cells* and *markdown cells*. Markdown cells (like this one) help document/explain the purpose and contents of the notebook. Code cells (like the cell below this one) let you write and run Python code. To run a code cell, activate the cell (by clicking in or next to it). You will see a thick bar to the left of the active cell. Then click the "play" button in the toolbar above, or press Shift-Enter (or Ctrl-Enter if you don't want to auto-advance to the next cell). Lots more helpful info about Jupyter notebooks [here](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/examples_index.html).

It is programming tradition to start with a ["Hello world!" program](https://en.wikipedia.org/wiki/%22Hello,_World!%22_program). Activate the code cell below and run it!

In [None]:
print('Hello world!')

Congratulations, you've written your first program. 

This program consists of a single **command**, `print('Hello world!')`. A command tells the computer what to do. The `print` command causes Python to display something: in this case, the text `Hello world!` below the cell.

In [None]:
# These lines are comments. The Python interpreter ignores everything in a line after the # character.
# Comments are just notes for humans to read to help understand the code.
# Best practice: add a comment every few lines of code to explain what's going on and why.
# You'd be amazed at how quickly you forget your code's logic.

# Basic math

In [None]:
# addition
5 + 2

In [None]:
# subtraction
5 - 2

In [None]:
# multiplication
5 * 2

In [None]:
# division
5 / 2

# Integer (floor) division is also available, using the // operator:
# it rounds the result down to the nearest whole number, so 5 // 2 = 2

In [None]:
# exponentiation: raising 5 to the 2nd power
5 ** 2

In [None]:
# exponentiation part 2: the square root of 5 (5 to the 0.5th power)
5 ** 0.5

# Note that spaces don't matter here: 5**0.5 == 5 ** 0.5

In [None]:
# the modulus operator: what is the remainder when you divide 5 by 2?
5 % 2 
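
Floor division and the modulus operator are two halves of the same calculation. Here is a minimal standalone sketch that checks the relationship between them. It also shows `divmod()`, a built-in that returns both results at once:

```python
# For integers a and b (b != 0), Python guarantees: a == b * (a // b) + (a % b)
a, b = 5, 2
quotient = a // b   # floor division
remainder = a % b   # modulus
print(quotient, remainder)            # 2 1
print(b * quotient + remainder == a)  # True

# divmod() returns the quotient and the remainder together, as a tuple
print(divmod(a, b))                   # (2, 1)

# With negative numbers, // rounds *down* (toward negative infinity), not toward zero
print(-5 // 2, -5 % 2)                # -3 1
```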

In [None]:
# test for equality
5 == 2
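
`==` is one of several comparison operators. Each one evaluates to `True` or `False`, and these true/false values (called *booleans*) are what the `if` statements later in this notebook test. A quick sketch:

```python
x = 5
print(x == 5)   # True: equal to
print(x != 5)   # False: not equal to
print(x > 2)    # True: greater than
print(x <= 4)   # False: less than or equal to

# a single = assigns a value; a double == compares two values
print(type(x == 5))  # <class 'bool'>
```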

# Variables and print commands

In [None]:
# variables, such as x here, contain values and their values can vary
x = 5

In [None]:
# what is the value of x?
x

In [None]:
# you can perform operations on variables, just like you can on two numbers
x + 3

In [None]:
# what is the value of x now?
x

In [None]:
# to update the value of a variable, you need to do an assignment again
x = x + 3

In [None]:
# and now what is the value of x?
x

In [None]:
# create a new variable y from an operation on x
x = 5
y = x * 2
y

In [None]:
# a cell only displays the value of its last line, so only y is shown here
x
y

In [None]:
# use print() to write some value to the console
print(x)
print(y)

In [None]:
# you can pass several comma-separated values to print() to display them on one line
print(x, y)

In [None]:
# you can also print the result of an expression
print(x * y)

In [None]:
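# Naming conventions: variable names can't contain spaces or dashes
# Also, variable names must begin with a letter or an underscore (after that, numbers are fine)
# Underscores _ and MixedCase are fine
# Variable names (like the rest of Python) are case sensitive
# Stylistically, variable names should begin with a lowercase letter
# names_with_underscores are one good approach (the style PEP 8 recommends)
# namesWithCapitals are another

my_var = 1
my_var2 = 2
anotherVar = 3
this-aint-a-var = 4  # SyntaxError: Python reads the dashes as minus signs
# Because this is a *syntax* error, Python refuses to run any line of this cell,
# so my_var, my_var2, and anotherVar are not created either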
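
You can check whether a name is legal without triggering a syntax error. The built-in string method `isidentifier()` reports whether a string is a syntactically valid Python name, and the standard-library `keyword` module flags reserved words. A small sketch; the candidate names are just examples:

```python
import keyword

# str.isidentifier() tells you whether a string is a syntactically valid Python name
candidates = ['my_var', 'my_var2', 'anotherVar', '_private', 'this-aint-a-var', '2nd_var', 'my var']
for name in candidates:
    print(repr(name), name.isidentifier())

# reserved words like 'for' pass isidentifier(), but you still can't assign to them
print('for'.isidentifier(), keyword.iskeyword('for'))  # True True
```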

# Data types

In [None]:
# integers are whole numbers
type(125)

In [None]:
# every value has a data type, and a variable takes on the type of whatever value it holds
x = 125
type(x)

In [None]:
# float is a floating point (aka decimal) number
some_rate = 4.3
type(some_rate)

In [None]:
# strings are sequences of characters, written inside quotes
my_string = 'abc'
type(my_string)

In [None]:
# a list is a collection of elements denoted by square brackets
my_list = [1, 2, 3, 4]
print(my_list)
type(my_list)

In [None]:
# the elements of a list can be of different types
new_list = [1, 'Q', 8, 'four']
type(new_list)

In [None]:
# a dictionary is a collection of key:value pairs, denoted by curly braces
person = {'first_name': 'Geoff', 'last_name': 'Boeing'}
print(person)
type(person)
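
Values in a dictionary are looked up by key rather than by position, using square brackets. Below is a short sketch using the `person` dictionary from the cell above. The extra `'city'` entry is made up for illustration:

```python
person = {'first_name': 'Geoff', 'last_name': 'Boeing'}

# look up a value by its key
print(person['first_name'])       # Geoff

# assigning to a new key adds a key:value pair (this 'city' value is illustrative)
person['city'] = 'Berkeley'
print(len(person))                # 3

# .get() returns None instead of raising a KeyError when the key is missing
print(person.get('middle_name'))  # None
```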

In [None]:
# A tuple is also a collection of elements, but you can't change individual elements after creation
# You will generally want to use lists instead of tuples. I mention them here only because they
# come up later when we build our geocoding script.
my_list = [1, 2, 3]
my_tuple = (1, 2, 3)

my_list[1] = 6
print(my_list)
my_tuple[1] = 4 # throws a TypeError: a tuple's elements can't be changed after creation
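
Tuples come back later as the geocoder's lat/long pair. The sketch below uses approximate, made-up coordinates to show two things: reading a tuple by position, and *unpacking* it into separate variables. Unpacking is the same mechanism behind `for index, row in ...` later in the notebook:

```python
# approximate, illustrative coordinates (not geocoder output)
latlong = (37.87, -122.27)

print(latlong[0])   # 37.87, read by position just like a list

# unpacking: assign each element of the tuple to its own variable in one step
lat, lng = latlong
print(lat, lng)     # 37.87 -122.27
```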

In [None]:
# Sequence types (strings, lists, and tuples) support indexing:
# you can get an element by its position with [n] indexing notation
# In Python, the index starts with zero, not one, so [0] is the first element and [1] the second
print(my_string[0])
print(my_list[1])
print(my_tuple[2])

# String processing

In [None]:
# some of the operators we saw earlier work on strings
city = 'Berkeley'
sep = ', '
state = 'California'
zip_code = '94703'

location = city + sep + state + ' ' + zip_code
print(location)

In [None]:
# multiplying a string by an integer repeats it that many times
zip_code * 3

# subtraction, division, and exponentiation don't work with strings.
# (% does work on strings, but it performs old-style string formatting, not modulus.)

In [None]:
# how many characters are in this string?
len(location)

In [None]:
# get a substring from some position up to but not including a second position
location[2:5]

In [None]:
# get the first n characters from the string
location[:5]

In [None]:
# get the characters from position n onward (i.e., skip the first n characters)
location[5:]

In [None]:
# you can replace characters in a string with the replace() method
location.replace('e', 'E')

# We'll look at methods a lot more later on.
# We don't have enough time to really get into it, but Python's string methods and
# string formatting are incredibly flexible and powerful and merit further investigation.

In [None]:
# note that the replace() method does not mutate location itself!
# as before, if we wanted to store the modified string, we'd need to assign the result
# of location.replace() to location itself.
location
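
String methods such as `replace()` always return a *new* string and leave the original untouched, because strings can't be modified in place. To keep a result, assign it to a name. A minimal sketch:

```python
location = 'Berkeley, California 94703'

shouted = location.replace('e', 'E')  # a new string; location itself is unchanged
print(location)   # Berkeley, California 94703
print(shouted)    # BErkElEy, California 94703

# reassigning the name is how you "update" a string
location = location.lower()
print(location)   # berkeley, california 94703
```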

# Python "Control Flow" or "Programming Flow" tools
## i.e. "if" statements, "for" loops, and functions

Now we're going to learn some simple tools for giving Python more complicated directions: which commands to run, how many times, and under what conditions.

### Loops

This is how to tell Python to run something over and over. There are different types of loops but we'll focus on "for" loops. A simple structure you can use is:

1. For every element...
2. In a certain "iterable" (e.g. list)...
3. Do a certain thing
4. Repeat until you've gone through every item in the iterable

In [None]:
# Example: we're going to run through this list below.

bart = ["Berkeley", "Ashby", "MacArthur", "19th Street", "12th Street", "West Oakland"]
# side note: spaces after commas and single vs. double quotes don't matter
# i.e. ['Berkeley', 'Ashby'] == ["Berkeley","Ashby"]

In [None]:
# Here's the simple structure above in python syntax

# Steps 1 and 2
# syntax is:  for "x" in "y"
# x is an arbitrary name that will refer to the current item on each iteration
# y is the list you're iterating over
# statements that are within the loop must be indented! Indentation is key to Python syntax

for station in bart:
    
    # Step 3: what you do for each item in the list
    # Here we will just print
    print(station)
    
    # Step 4: this structure will automatically repeat! 
    
print('How many times will this print?')

Let's break this down. What happened here is:

First iteration: 
* The variable "station" is set to the first item in the list "bart", which happens to be a string, "Berkeley"
* Printing "station" therefore gives you an output of "Berkeley"

Second iteration:
* The variable "station" now equals the second item in the list, the string "Ashby", so printing "station" now outputs "Ashby"

And this continues until you reach West Oakland, after which the loop runs out of items and ends.

The last line, which is not indented, is not part of the for loop, so it gets printed only once.
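
One way to watch the iterations described above is to keep a counter that increases on every pass through the loop. Here is a sketch reusing the `bart` list; the `count` variable is our own addition:

```python
bart = ["Berkeley", "Ashby", "MacArthur", "19th Street", "12th Street", "West Oakland"]

count = 0
for station in bart:
    count = count + 1          # runs once per item in the list
    print('Iteration', count, 'station is', station)

# not indented, so this runs once, after the loop has finished
print('The loop ran', count, 'times')   # The loop ran 6 times
```

After the loop ends, `station` still holds the last item it was assigned, `"West Oakland"`.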

In [None]:
# the variable "station" above is arbitrary; you can change it to whatever you want as long as it is consistent within the loop
# you'll often see people use "i" by convention

for i in bart:
    print(i)

In [None]:
# The body of a loop can contain more complicated commands as well.

for i in bart:
    print("Now I am at", i, "Station")

In [None]:
# You can also create and use other variables inside a loop

for i in bart:
    before = "Now I am at "
    after = " Station"
    print(before + i + after)

In [None]:
# Notice the difference between using , and + in the two cells above
# , puts spaces between each piece (or "argument"), and can display many different data types side by side
# + *concatenates* strings, as we've seen before. This means that (1) there will be no extra space and
# (2) all the components of the concatenation must be strings.
# It is possible to convert one data type to another using str(), float(), int()

print('I am at station #', 2, 'now')
print() # blank line
print('I am at station #' + str(2) + ' now') # note the space we had to add ourselves before 'now'
print('I am at station #', 2, ' now', sep='') # we can define sep (separator) as an empty string to remove the spaces
print('I am at station #' + 2 + 'now') # throws a TypeError: can't concatenate a string and an integer
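
The `str()`, `int()`, and `float()` conversions mentioned above also work in the other direction, turning text into numbers. A short sketch:

```python
zip_code = '94703'                 # a string, so + would concatenate
print(int(zip_code) + 1)           # 94704: converted to an integer first
print(float('4.3'))                # 4.3
print('Station #' + str(2))        # Station #2
print(int(4.9))                    # 4: int() drops the decimal part
```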

In [None]:
# Finally, just a taste of Python string formatting
# This is called an f-string (f for "format") and it's SO powerful
station_number = 2
print(f'I am at station #{station_number} now, and the sky is {0.927:.0%} clear')
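
Inside an f-string, anything in curly braces is evaluated as Python. An optional *format spec* after a colon controls how the value is displayed. For example, the `:.0%` above means "as a percentage with zero decimal places". Here are a few more format specs, as a sketch:

```python
station_number = 2
clear_sky = 0.927

print(f'{clear_sky:.0%}')                   # 93%: percentage, 0 decimal places
print(f'{clear_sky:.2f}')                   # 0.93: fixed-point, 2 decimal places
print(f'{1234567:,}')                       # 1,234,567: thousands separators
print(f'Next stop: #{station_number + 1}')  # Next stop: #3 (expressions are allowed)
```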

### if statements

You can also have Python run commands based on conditions: do x if y. 

In [None]:
# Simple structure:

warriors_wins = 70
bulls_record = 72

if warriors_wins < bulls_record:
    print("Warriors did not break the record :(")

In [None]:
# What happens if the condition is not met here?

warriors_wins = 73

if warriors_wins < bulls_record:
    print("Warriors did not break the record :(")

In [None]:
# an if statement can have an "else" clause that runs when the condition is not met

if warriors_wins < bulls_record:
    print("Warriors did not break the record :(")
else:
    print("Warriors tied or broke the record!")

In [None]:
# you can set more than 2 conditions using "elif" which is short for "else if"

warriors_wins = 73

if warriors_wins < bulls_record:
    print("Warriors did not break the record :(")
elif warriors_wins == bulls_record:
    print("Warriors tied the record.")
else:
    print("Warriors broke the record!")

### Functions

So you've written a useful piece of code, and you're using it multiple times but finding it tedious to keep copying and pasting it. This is where **functions** come in. A function is a chunk of code that is saved under a name and can take zero or more inputs (called *arguments*) for that code to use. Here's an example.

In [None]:
# Create a function to save the warriors code above

# "def" creates a function
# next is the function name: I call mine "record_check"
# in parentheses are the "arguments" - what will the function take in and use?
# I have one argument, which will be a number of wins.

def record_check(number_wins):
    
    bulls_record = 72
    
    # here I copy my code from above, but I changed "warriors_wins" to "number_wins"
    # We are also now using "return" instead of print()
    # Functions can *return* one or more values, which you can then use however you want
    if number_wins < bulls_record:
        return "Warriors did not break the record :("
    elif number_wins == bulls_record:
        return "Warriors tied the record."
    else:
        return "Warriors broke the record!"
# when I run this cell, the code above will be saved or "defined" as the function "record_check"
# No output is printed though
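
Functions can take more than one argument, and an argument can have a *default value* that is used when the caller leaves it out. Here is a hypothetical variation: the name `compare_to_record` and the `record` parameter are our own. In this version, the record becomes an input instead of being fixed inside the function:

```python
def compare_to_record(number_wins, record=72):
    # record defaults to 72 (the Bulls' record used above) unless another value is passed
    if number_wins < record:
        return "did not break the record :("
    elif number_wins == record:
        return "tied the record."
    else:
        return "broke the record!"

print(compare_to_record(73))             # broke the record!
print(compare_to_record(73, record=74))  # did not break the record :(
```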

In [None]:
# now we can run the function with various numbers 

print(record_check(70))
print(record_check(72))

# We don't use the print() command here so you can see the difference between print()ing a string and 
# displaying it as output
record_check(74)

In [None]:
# run the function in a loop

win_list = [i for i in range(69, 76)]
# This is called a *list comprehension*. Super powerful. We don't have time to get into it,
# but definitely look up Python comprehensions when you get a chance.

print(win_list)
# Note that range(69, 76) includes 69 but not 76!
# Think: "start at 69 and end right before 76"

for i in win_list: # can also say simply "for i in range(69, 76):"
    print('With', i, 'wins, the', record_check(i))
    
# Consider how making our function return, rather than print, its strings enables us to
# combine those strings in a novel fashion within the for loop.
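
As a quick taste of the list comprehensions mentioned above, this sketch compares a comprehension with the equivalent `for` loop. It also shows the optional `if` clause, which filters items:

```python
# a comprehension builds a list from an expression and a for clause
squares = [n ** 2 for n in range(1, 6)]
print(squares)                   # [1, 4, 9, 16, 25]

# the equivalent for loop, for comparison
squares_loop = []
for n in range(1, 6):
    squares_loop.append(n ** 2)
print(squares_loop == squares)   # True

# an optional if clause keeps only some of the items
evens = [n for n in range(10) if n % 2 == 0]
print(evens)                     # [0, 2, 4, 6, 8]

# the win_list comprehension above gives the same result as list(range(69, 76))
print([i for i in range(69, 76)] == list(range(69, 76)))  # True
```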

## Micro-Project

This should take five minutes or so:

* Create a function
  * It should check whether the number it takes as an argument equals 30
  * If it does not equal 30, it should print "This is not Steph Curry's number"
  * If it does equal 30, it should print "This is Steph Curry's number"
* Create a list of the numbers 23, 11, 30, 12, and 40
* Create a for loop with this list. Each iteration should run your new function using a number in this new list.


In [None]:
# Write your function here (change names as desired!):

def function_name(argument):
    # content of function here
    pass  # placeholder so the cell runs; replace it with your code
    
# Create your list here:


# Write your loop here:


# Let's build a geocoder!

Now that we've covered the basics of programming in Python, we're going to jump into a more complex, real-world example. Step by step, we will assemble a program that will:
1. read in a list of addresses from a CSV file,
2. clean up the data formatting,
3. geocode the addresses, and
4. export the geocoded data to a new CSV.

A script like this, which improves on ArcGIS's temperamental geocoding operations, is exactly the kind of tool that a planner might create as part of a typical workflow. In other words, we've reached the realm of the actually useful.

## Installing a new package

We'll use `geopy` to help us geocode our addresses. You may need to install this package - you'll know if this is the case, because the following cell will throw a `ModuleNotFoundError`. In another Anaconda Prompt/Terminal, activate your environment and type `conda install conda-forge::geopy`. (Yes, another conda-forge package; we discussed this briefly in the "getting started" guide.) Then restart this notebook's kernel and proceed with the cells below.

In [None]:
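# We import the packages we need for this task. If you followed the instructions
# on installing Python and making sure everything is set up correctly, these commands
# should look familiar. If not, the syntax is simple: "import package as alias" loads a
# whole package under a short nickname, and "from module import Name" loads just one name.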

import pandas as pd
from geopy.geocoders import GoogleV3

In [None]:
# For our geocoding demo, we'll use a table of Oakland libraries I found
# through Code for Oakland: http://codeforoakland.org/data-sets/

# Reading a CSV into a pandas DataFrame is easy:
libraries = pd.read_csv('LibraryBranches.csv')

# Let's look at the table
libraries

In [None]:
# When you have a large DataFrame, it can be more convenient to see only the first few rows:
libraries.head() # defaults to 5 rows, but you can specify the number of rows

In [None]:
# Unlike Excel, you can't directly look at and modify cells in a table.
# Instead, you access data by specifying its *location.* There are two basic forms
# of location syntax, .loc[] and .iloc[] - note the square brackets, not parentheses.

# .loc[x, y]: look up the data in the row whose index *label* is x and the column named y
print(libraries.loc[0, 'PHONE'])

# .iloc[m, n]: look up the data by *position*: row m and column n, counting from 0 for both
print(libraries.iloc[2, 2])
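
In `libraries`, row labels happen to equal row positions (0, 1, 2, ...), so the difference between `.loc[]` and `.iloc[]` is easy to miss. The sketch below uses a made-up table with non-default row labels to make the distinction visible:

```python
import pandas as pd

# made-up table; the row labels 10, 20, 30 differ from the positions 0, 1, 2
df = pd.DataFrame(
    {'name': ['Branch A', 'Branch B', 'Branch C'],
     'phone': ['555-0100', '555-0101', '555-0102']},
    index=[10, 20, 30],
)

print(df.loc[20, 'phone'])   # 555-0101: the row *labeled* 20, the column *named* 'phone'
print(df.iloc[1, 1])         # 555-0101: the second row and second column, counting from 0
print(df.iloc[0, 0])         # Branch A
# df.loc[1, 'phone'] would raise a KeyError: there is no row labeled 1
```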

## Important concept: _objects_

It is common in Python and other programming languages to bundle related data, along with the functions that work with it, into structures known as **objects**. Objects have **attributes**: for example, a `Person` object might have `height` and `age` numeric attributes, plus a `name` string attribute.

If an object has a function as an attribute, that function is called a **method** of the object. So our `Person` object can have a `get_older()` method that increases the Person's `age` attribute. (Like a stand-alone function, a method is called with a set of parentheses.) In the cells above, `head()` is a method of the `libraries` object.

An object's methods and other attributes are accessed via **dot notation**: if `drew` is a `Person` object, `drew.height` will return `drew`'s `height` attribute. `drew.get_older()` will increase `drew`'s `age`.

We're not going to look at how to define our own kinds of objects (whose blueprints are known as *classes*) today, but it's a good skill to pick up, and tutorials are out there, such as https://jeffknupp.com/blog/2014/06/18/improve-your-python-python-classes-and-object-oriented-programming/
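
For the curious, here is a minimal sketch of how the hypothetical `Person` object described above could be defined as a class. The attribute values, and the choice of inches for `height`, are made up:

```python
class Person:
    def __init__(self, name, height, age):
        # attributes: data stored on each Person object
        self.name = name
        self.height = height   # in inches (an arbitrary choice for this sketch)
        self.age = age

    def get_older(self):
        # a method: a function attached to the object that can change its attributes
        self.age = self.age + 1

drew = Person('Drew', 70, 30)
print(drew.height)   # 70, accessed via dot notation
drew.get_older()     # methods are called with parentheses
print(drew.age)      # 31
```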

Right now, we're about to interact with three useful kinds of objects that are already defined in the packages we've imported: the `DataFrame` from `pandas`, plus the `GoogleV3` geocoder and the `Location` objects it returns, from `geopy`. Let's see how we can combine some `DataFrame` attributes and methods to quickly modify our `libraries` object.

In [None]:
# These ALL-CAPS column names are irritating. Let's change them to lowercase:

libraries.columns = libraries.columns.str.lower()
libraries.head()

# What happened here?
# libraries.columns is an attribute of libraries (a pandas DataFrame object).
# The columns object is itself an Index object, containing the names of each column.
# libraries.columns.str allows us to access and modify each string in libraries.columns -
# in this case, we use the lower() method to convert each string to lowercase.
# So we are telling Python to overwrite libraries.columns (the set of column names)
# with a version of libraries.columns converted to lowercase.
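
The same pattern works on any DataFrame, as this sketch with a made-up two-column table shows. `DataFrame.rename(columns=str.lower)` is an alternative that produces the same column names. It applies a function to each column name and returns a new DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'NAME': ['Branch A'], 'PHONE': ['555-0100']})  # made-up data
df.columns = df.columns.str.lower()
print(list(df.columns))   # ['name', 'phone']

# alternative: rename() calls str.lower on each column name and returns a new DataFrame
df2 = pd.DataFrame({'NAME': ['Branch A'], 'PHONE': ['555-0100']}).rename(columns=str.lower)
print(list(df2.columns))  # ['name', 'phone']
```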

In [None]:
# Let's look at just one column

libraries['name']

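# libraries.name (dot notation) is equivalent to libraries['name'] (bracket notation),
# as long as the column name is a valid Python name and doesn't clash with a DataFrame attribute or method.
# However, when you want to create a new column, you must use bracket notation.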

In [None]:
# We need to clean up our data in preparation for geocoding.
# There are some extra dashes in the address field, and the city and state aren't specified.
# Pandas provides some powerful methods for quickly modifying many pieces of data:

libraries['full_address'] = libraries.address.str.replace('- ', '') + ', Oakland, CA'
libraries.head()

# What happened here?
# We looked at libraries.address (one column), used str to access that column's
# contents as strings, then used the replace() string method to remove dashes.
# Then we used the + operator to concatenate city and state to each address.
# Finally, we used bracket notation to create a new full_address column to store
# the results of this operation.
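
Here is a sketch of the `.str.replace()` and `+` steps on a made-up Series of addresses, so you can see what they do without the library data. Passing the optional `regex=False` argument makes it explicit that `'- '` is matched as literal text rather than as a regular expression:

```python
import pandas as pd

addresses = pd.Series(['100 - Main St', '200 Example Ave'])  # made-up addresses
full = addresses.str.replace('- ', '', regex=False) + ', Oakland, CA'
print(full.tolist())
# ['100 Main St, Oakland, CA', '200 Example Ave, Oakland, CA']
```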

In [None]:
# Now that we've properly formatted our addresses, it's time to set up the Geocoder.

api_key = 'paste_your_api_key_here' # Ask Carolina for a sample Google Maps API key
# or get your own here: https://developers.google.com/maps/documentation/geocoding/get-api-key
g = GoogleV3(api_key=api_key)

# g is a GoogleV3 Geocoder object. Like other objects, it has methods (built-in functions).
# We'll be using the geocode() method, which takes a street address and returns a Location
# object holding the matched address and its lat/long pair.
# There is also a reverse() method that takes a lat/long pair and returns a street address.
# More info: https://geopy.readthedocs.org/en/1.11.0/#geopy.geocoders.GoogleV3
# and https://developers.google.com/maps/documentation/geocoding/intro
# Usage of the Google Maps Geocoding API is metered. The old free limit of 2500 addresses
# per 24-hour period was replaced in 2018, so check Google's current pricing and quotas.

In [None]:
# Let's see the Geocoder in action. 
home = g.geocode('1001 Channin Brkley')
home

# Notice that the Google Maps Geocoding API is quite flexible - it was able
# to identify Channing Way from "Channin" and Berkeley, CA from "Brkley"!

In [None]:
# So, when we pass an address string to the Geocoder's geocode() method, the Geocoder
# looks up that address via the Google Maps Geocoding API and returns the result as a Location object.
# Now we work our way through the various components of this Location object.
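print(home[0]) # matched address - a string (also available as home.address)
print(home[1]) # lat/long pair - a tuple
print(home[1][0]) # latitude - a float (also available as home.latitude)
print(home[1][1]) # longitude - a float (also available as home.longitude)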

In [None]:
# OK! So we know how to get what we need out of the Location object that the Geocoder returns.
# We're now going to use a powerful feature of pandas - "apply," which applies a function to each
# item in a Series. 

# First, we need to define a function that takes an address string as a parameter,
# passes it to our Geocoder object, and returns the lat/long pair.

def getLatLong(address):
    result = g.geocode(address)
    return result[1]

In [None]:
# Now we can *apply* the function we defined to each address in the full_address column.
# As before, we use bracket notation to store the results of the apply operation into
# a new latlong column.

libraries['latlong'] = libraries.full_address.apply(getLatLong)
libraries.head()
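
Because `apply()` simply calls a function once per item, you can try the pattern without an API key by swapping in a stand-in for `g.geocode()`. In this sketch, `fake_geocode` and its lookup table are hypothetical, and the coordinates are invented. Like the real geocoder, the stand-in returns `None` when it can't match an address, so the helper checks for `None` before indexing:

```python
import pandas as pd

# hypothetical stand-in for g.geocode(): maps an address to (matched_address, (lat, long))
FAKE_RESULTS = {
    '100 Main St, Oakland, CA': ('100 Main St, Oakland, CA, USA', (37.80, -122.27)),
}

def fake_geocode(address):
    return FAKE_RESULTS.get(address)   # None when there is no match

def get_lat_long(address):
    result = fake_geocode(address)
    if result is None:                 # no match: return None rather than crash
        return None
    return result[1]                   # the (lat, long) tuple

addresses = pd.Series(['100 Main St, Oakland, CA', 'not a real address'])
latlongs = addresses.apply(get_lat_long)
print(latlongs.tolist())   # [(37.8, -122.27), None]
```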

In [None]:
libraries.latlong[0]

In [None]:
# It is a good idea to keep the lat/long pair itself. This is a common format
# that is used in web mapping (e.g. Leaflet). However, for some applications
# (e.g. ArcGIS, CartoDB), we will need separate latitude and longitude fields.
# Let's create those now, using a for loop.

for index, row in libraries.iterrows():
    libraries.loc[index, 'latitude'] = row['latlong'][0]
    libraries.loc[index, 'longitude'] = row['latlong'][1]

libraries.head()

# What happened here?
# libraries.iterrows(), a method of libraries, lets us iterate over each row in the
# libraries DataFrame, yielding each row's index label and contents. For each row, we store
# element 0 of 'latlong' as that row's 'latitude' and element 1 as that row's 'longitude'.
# Pandas creates new 'latitude' and 'longitude' columns if they don't already exist,
# or else overwrites data stored in these columns if they do already exist.
# Note, as before, how we use indentation to indicate which statements are part of the for loop.
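
The `iterrows()` loop is easy to read, but pandas can also split the pairs with a single statement per column. `Series.str.get(i)` pulls element `i` out of each tuple and gives a missing value (NaN) where an entry is `None`. Here is a sketch on made-up coordinates:

```python
import pandas as pd

# made-up lat/long pairs; None stands in for an address that failed to geocode
latlong = pd.Series([(37.80, -122.27), None, (37.85, -122.25)])
df = pd.DataFrame({'latlong': latlong})

df['latitude'] = df['latlong'].str.get(0)    # element 0 of each tuple
df['longitude'] = df['latlong'].str.get(1)   # element 1 of each tuple
print(df)
```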

In [None]:
# Almost there now. Before we export our data to a CSV file, let's remove
# some columns we don't need.

libraries.drop(['objectid', 'address'], inplace=True, axis=1)
libraries.head()

# What happened here?
# The drop() method removes data from a DataFrame. Its first argument is a list
# of labels to remove (here, column names). The 'inplace' keyword specifies that the drop operation
# should modify libraries directly. (By default, inplace = False - you can always
# instead use libraries_new = libraries.drop(xyz) to save the modified
# DataFrame to libraries_new.) Axis is either 0 for rows or 1 for columns.
# In this case, we set it = 1 to drop columns, not rows.
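
Here is a sketch of the non-inplace form on a made-up table. It uses the `columns=` keyword, which is an alternative to `axis=1`. Without `inplace=True`, `drop()` leaves the original DataFrame alone and returns a trimmed copy:

```python
import pandas as pd

df = pd.DataFrame({'objectid': [1, 2], 'address': ['a', 'b'], 'name': ['Branch A', 'Branch B']})  # made-up

# columns=[...] is equivalent to passing the list with axis=1
trimmed = df.drop(columns=['objectid', 'address'])
print(list(trimmed.columns))   # ['name']
print(list(df.columns))        # ['objectid', 'address', 'name'] (df itself is unchanged)
```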

In [None]:
# Time to save our data to disk. pandas DataFrames have a very convenient
# to_csv() method. The index=False argument tells pandas not to write the index
# as its own column, again eliminating unnecessary data.

libraries.to_csv('LibraryBranchesGeocoded.csv', index=False)
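
To check what `to_csv()` writes without creating a file, you can pass it an in-memory text buffer (`io.StringIO`) instead of a filename, then read the result back with `pd.read_csv()`. Here is a sketch with a made-up one-row table:

```python
import io
import pandas as pd

df = pd.DataFrame({'name': ['Branch A'], 'latitude': [37.8], 'longitude': [-122.27]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # write CSV text into the buffer instead of a file
print(buffer.getvalue())
# name,latitude,longitude
# Branch A,37.8,-122.27

# read it back: because of index=False, no extra unnamed index column appears
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
print(list(round_trip.columns))  # ['name', 'latitude', 'longitude']
```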

In [None]:
# Finally, let's collect all the pieces of the puzzle into a single cell.
# I have added some try/except syntax to handle situations in which an address string
# is totally invalid. Google will try hard to match each address, but if it can't, 
# the Geocoder will return None instead of a Location object.
# I also added three print() statements to inform the user about what the program
# has done. See if you can figure out how each statement uses attributes and methods!

# import the needed packages
import pandas as pd
from geopy.geocoders import GoogleV3

# read the CSV into a pandas DataFrame
libraries = pd.read_csv('LibraryBranches.csv')
print('Read', len(libraries), 'records')

# clean up the column names and address field
libraries.columns = libraries.columns.str.lower()
libraries['full_address'] = libraries.address.str.replace('- ', '') + ', Oakland, CA'

# in case you want to see how the program handles junk data, uncomment the next line
# libraries.loc[0, 'full_address'] = 'dkjfbga;knga;sd'

# create a geopy Geocoder
api_key = 'paste_your_API_key_here_again'
g = GoogleV3(api_key=api_key)

# define helper function
def getLatLong(address):
    result = g.geocode(address)
    try:
        return result[1]
    except Exception:
        return None

# geocode the addresses
libraries['latlong'] = libraries.full_address.apply(getLatLong)
print('Geocoded', sum(libraries.latlong.notnull()), 'addresses')

# extract latitude and longitude
for index, row in libraries.iterrows():
    try:
        libraries.loc[index, 'latitude'] = row['latlong'][0]
        libraries.loc[index, 'longitude'] = row['latlong'][1]
    except Exception:
        pass

# drop unnecessary columns
libraries.drop(['objectid', 'address'], inplace=True, axis=1)

# export the DataFrame to a new CSV file
libraries.to_csv('LibraryBranchesGeocoded.csv', index=False)
print('Exported', len(libraries), 'records')

# Not bad for 25 lines of code.