## Lab - September 3: Basic Python Programming Skills


# 1. Introduction

The goal of today's lab is to work through some basic features of Python programming.  There is SO much to learn, and so many ways of writing code that does the same thing, that we're going to focus on a few core concepts. Over time, you'll likely develop your own coding preferences and approaches.

We're working in Jupyter notebooks - some of you may move to command line approaches to coding, but for now, we're going to keep it easy!

Lots more helpful info about Jupyter notebooks here: https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/examples_index.html 

## 1.1 Developing Good Habits

In lecture, we went over the difference between a code and a markdown cell.  But, in your code cells, you will also want to create notes that explain what a set of coding lines do.  You do this with a # sign in front of the text.

In [30]:
# These lines are comments. The Python interpreter ignores everything in a line after the # character.
# Comments are just notes for humans to read to help understand the code.

# Best practice: add a comment for every couple lines of code to explain what's going on and why.
# You'd be amazed at how quickly you forget your code's logic!

## 1.2 Some Basics

Let's do math!!!

In [None]:
# Addition
5 + 2

In [None]:
# Subtraction
5 - 2

In [None]:
# EXERCISE: See if you can figure out multiplication and division on your own


In [None]:
# EXERCISE: How about exponentiation and square root?
# Hint: Look this up on Google if you can't figure it out.


Okay - that was fun!  Now let's learn some concepts about variables and the different types of values they can hold.

In [None]:
# Variables, such as x here, contain values.  Those values can be numbers or strings, and we can manipulate them.
# The equal sign (=) in x = 5 does not mean equality.
# It assigns the value on the right to the variable on the left: the name x now refers to the value 5 stored in memory.

x = 5
print(x)

In [None]:
# What if we decide to assign a new value to x?
x = 3
print(x)

# Notice that 5 has been overwritten by 3. 
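
To make the difference between assignment and equality concrete, here is a small sketch. The variable names `a` and `b` are made up for illustration, and `==` is the comparison operator that actually asks "are these equal?".

```python
# Assignment (=) stores a value under a name; comparison (==) asks a question.
a = 5          # a now refers to the value 5
b = a          # b refers to the same value as a
a = 3          # reassigning a does not change b

print(a)       # 3
print(b)       # 5
print(a == b)  # False, because 3 is not equal to 5
```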

In [None]:
# EXERCISE: Multiply x by 100 and then print the new value


In [None]:
# Now what if we change the value of x to a string? Will the calculation work?

x = "CYPLAN 201a"
x*100

In [None]:
# EXERCISE: Let's go one step harder - can you figure out how you might create a new y variable that is the square root of x?


Naming conventions and best-practices for variables: 

* You can't have spaces or dashes in variable names
* Variable names must begin with a letter or an underscore (after that, numbers are fine)
* Variable names (like the rest of Python) are case sensitive, so make it easy on yourself and use lowercase
* It is also good practice to keep your variable names short - I recommend under 8 characters
* names_with_underscores are one good approach

In [None]:
# Here are some examples of how to (and not to) name variables. 

my_var = 1
my_var2 = 2
this-aint-a-var = 4 # Notice that this will generate an error because you cannot use dashes in place of underscores
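
The naming rules listed above can also be checked from Python itself. This is just a sketch: it uses the built-in string method `str.isidentifier()` and the standard-library `keyword` module, and the candidate names are made up for illustration.

```python
import keyword

# str.isidentifier() reports whether a string follows Python's naming rules.
candidates = ["my_var", "my_var2", "this-aint-a-var", "2nd_var", "my var"]

for name in candidates:
    print(name, "->", name.isidentifier())

# isidentifier() does not check for reserved words such as "for" or "if",
# which also cannot be used as variable names; keyword.iskeyword() covers those.
print(keyword.iskeyword("for"))   # True
```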

## 1.3 Data Types

When working with any programming language, you need a way to store and work with information. In Python, different kinds of information are called data types. Think of them as different “boxes” designed to hold different kinds of things: numbers, words, lists of items, and more.
Here are some of the most common data types:

1. **Integers (int)**: Whole numbers without a decimal point.
2. **Floating-Point Numbers (float)**: Numbers that have a decimal point.
3. **Strings (str)**: Text inside quotation marks (single or double).
4. **Booleans (bool)**: A value that is either True or False.
5. **Lists (list)**: An ordered collection of items (like a shopping list).
6. **Dictionaries (dict)**: A way to store data as key–value pairs (like a mini phone book).

<img src="Python-Data-Types.webp" width="500">

[Image Source](https://www.geeksforgeeks.org/python/python-data-types/)

In [None]:
# You can ask Python to tell you the type of any variable 
type(125)

In [None]:
# EXERCISE: Create a string variable with your name, and then look to see what type it is


In [None]:
# A list is a collection of elements denoted by square brackets
my_list = [1, 2, 3, 4]
print(my_list)
type(my_list)

In [None]:
# A dictionary is a collection of key:value pairs, denoted by curly braces
person = {'first_name': ['Carolina', 'Laura', 'Leila'], 
          'last_name': ['Reid', 'Latendresse', 'Tjiang']}
# i.e. 'first_name' is a key, and the list ['Carolina', 'Laura', 'Leila'] is its value.

print(person)
print(person['first_name']) # This will print all the values under the key 'first_name'.
type(person)
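
Here is a short sketch of how key lookups behave with the same `person` dictionary. It assumes nothing beyond built-in Python; the `[0]` and `[1]` at the end of some lines pick out single items from a list, which the next section on indexing covers in more detail.

```python
# Same structure as the person dictionary above: each key maps to a list.
person = {'first_name': ['Carolina', 'Laura', 'Leila'],
          'last_name': ['Reid', 'Latendresse', 'Tjiang']}

first_names = person['first_name']   # look up a value by its key; here the value is a list
print(type(first_names))             # <class 'list'>
print(first_names[0])                # Carolina

# Combine a key lookup with a list index to build one full name
full_name = person['first_name'][1] + " " + person['last_name'][1]
print(full_name)                     # Laura Latendresse

print(list(person.keys()))           # ['first_name', 'last_name']
```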

## 1.4 Indexing and Slicing with Lists (and Strings)

For all sequential types (i.e. lists, strings, tuples) in Python, there are many useful operations we can use to manipulate objects. These operations are a key feature of how we will work with tabular data (i.e. data in tables) later on in this lab and in the course!

In [None]:
# We can pull out specific items in a list by referring to their index

animals = ["giraffe", "elephant", "lizard", "zebra"]
print(animals[0]) # Index 0 refers to the FIRST item in the list
print(animals[1]) # Index 1 refers to the SECOND item in the list

# What do you notice about the indexes?

In [None]:
# We can also use negative indexes

print(animals[-1]) # Index -1 refers to the LAST item in the list
print(animals[-3]) # Index -3 refers to the THIRD TO LAST item in the list 

# What do you notice about the indexes?
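
One way to think about negative indexes, sketched with the same `animals` list: counting back `k` items from the end lands on the same item as the positive index `len(animals) - k`. The helper variable `n` is introduced here only for illustration.

```python
animals = ["giraffe", "elephant", "lizard", "zebra"]
n = len(animals)                     # 4 items, with positive indexes 0 through 3

# A negative index counts back from the end: animals[-k] is animals[n - k]
print(animals[-1], animals[n - 1])   # zebra zebra
print(animals[-3], animals[n - 3])   # elephant elephant
```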

In [None]:
# We can also slice a list (i.e. take a subset).
# The most basic way to do this is to use the indexes.

letters = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]

# Example 1
print("Example 1: ", letters[2:7]) # This will return the items at indexes 2 through 6 (inclusive).

# Example 2
print("Example 2.1: ", letters[0:3]) # This will return first 3 items in the list (indexes 0, 1, and 2).
print("Example 2.2: ", letters[:3]) # This is the same as above.

# Example 3
print("Example 3.1: ", letters[-3:]) # This will return the last 3 items in the list.
print("Example 3.2: ", letters[:-2]) # This will return everything but the last two items in the list.

Here's a quick reminder of how indexes work when slicing a list (or any sequence type). 

Example code:
```python
nums = [10, 20, 30, 40, 50, 60, 70, 80, 90]
some_nums = nums[2:7]
```

Here is a visual representation of the slicing: <img src="first-slice.png" width="500">

[Image Source](https://railsware.com/blog/indexing-and-slicing-for-lists-tuples-strings-sequential-types/) _(Note: This link has an extensive list of almost all sequential type operations in Python.)_
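
If you'd like to check the slice above for yourself, this sketch prints the result. The key idea is that the slice starts at the first index and stops just before the second one, so the number of items returned is the second index minus the first.

```python
nums = [10, 20, 30, 40, 50, 60, 70, 80, 90]
some_nums = nums[2:7]   # start at index 2, stop before index 7

print(some_nums)        # [30, 40, 50, 60, 70]
print(len(some_nums))   # 5, which is 7 - 2

# Slicing makes a new list; the original list is unchanged
print(len(nums))        # 9
```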


In [None]:
# Remember, these operations work on any sequential type in Python. For example:

sentence = "Python makes data analysis so much easier!"
sentence[4:20]

In [None]:
# Now you try!
# EXERCISE: Create a list with 5 or more items and complete the following exercises: 

# 1. Print the third item in the list

# 2. Print the second to last item in the list

# 3. Print the second, third, and fourth item in the list.

## 1.5 Libraries and/or Packages

One of the challenging things about using open source technologies is that they are rarely presented as a "complete" software package.  If you're working in Excel, for example, you don't need to go outside of the program to insert an equation. It's a function in Excel. With open source software, different functions are created by different programmers, and we often have to "call" in that external code to do what we want.

Some important libraries/packages in Python are:

>**numpy**: used for math and logic operations.

>**pandas**: used for the storing and basic handling of data in tabular format.

>**matplotlib**: used for data visualization, creating plots, graphs, etc.

>**math**: part of Python's standard library, a collection of basic math functions.

We import these with the following commands. The abbreviation after `as` is what we use to "call" functions that belong to that library or package.  As we start to get more sophisticated, we'll import ever more libraries or packages into our Python notebooks.

#### NOTE: If you are working in the desktop version of Python, you may have to install some of these libraries/packages first. 
For more information about the distinction between packages and libraries, see: https://realpython.com/videos/scripts-modules-packages-and-libraries/

In [2]:
import numpy as np
import pandas as pd
import math
import matplotlib.pyplot as plt
%matplotlib inline

pd.options.display.float_format = '{:.2f}'.format
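
To see what "calling" a function from a library looks like, here is a minimal sketch that uses the `math` module and the `np` abbreviation for numpy. The numbers in `values` are made up for illustration.

```python
import math
import numpy as np

# Functions are called with the library name (or its abbreviation) as a prefix
print(math.sqrt(16))             # 4.0
print(np.sqrt(16))               # 4.0

# numpy functions also work on a whole list of numbers at once
values = [1, 4, 9, 16]           # made-up numbers
print(np.sqrt(values))           # [1. 2. 3. 4.]
print(np.mean(values))           # 7.5
```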

### 1.5.1 PANDAS

<img src="panda.jpeg" width="300">

No, not those pandas.  In Python, the name "pandas" is derived from the term "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals. With pandas, we can clean, transform and analyze our data.  While computer programmers often work with other forms of data, when we are doing planning data analysis, we are almost always working in pandas.

The building blocks of pandas are "series": one-dimensional, labeled, indexed arrays.  A dataframe is a two-dimensional table made up of a collection of series.

<img src="Series.png" width="300">

#### Note how Python has indexed my values - you can think of a dataframe as an Excel worksheet.  Just like Excel, it is assigning row numbers to each observation, but in this case, the row numbers start with 0 rather than 1.  
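
Here is a small sketch of both structures, assuming pandas is installed. The names `pop` and `toy` and all of the values are made up for illustration; they are not from the lab's data file.

```python
import pandas as pd

# A Series is one labeled column of values
pop = pd.Series([100, 250, 75], name="population")
print(pop)
print(pop[0])                    # 100: the first row has index label 0

# A DataFrame is a table whose columns are Series that share the same index
toy = pd.DataFrame({
    "place": ["A", "B", "C"],
    "population": [100, 250, 75],
})
print(toy)
print(type(toy["population"]))   # each column is a pandas Series
print(toy.shape)                 # (3, 2): 3 rows, 2 columns
```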

# 2. Let's explore one easy ACS table to get us started!

## 2.1 Read in data from the American Community Survey
For this exercise, I've downloaded a table from the 2023 1-Year ACS for all of the counties in California. Next week, we'll pull these data directly from the Census website using an API (Application Programming Interface).  But for now, we'll just read in a csv file.  

I downloaded table B03002: https://api.census.gov/data/2022/acs/acs5/groups/B03002.html.  I cleaned up the data, renaming the variables I want so it's easier to work with.

I want to answer two questions today:
- Which county in California has the highest share of Hispanic/Latinx residents?
- What is the racial and ethnic distribution of Alameda county?

In [3]:
# Read in the CSV file through pandas
df = pd.read_csv('ca_county_race_ethnicity_2023.csv')

# Note: As long as the file you're reading into Python is in the same folder as your notebook, you don't have to specify the path (a.k.a. the location)

In [None]:
# Check out what's inside the dataframe
df.info()

In [None]:
# How many rows and columns (variables) do we have?
df.shape

In [6]:
# Take a sneak peek!
df.head()

Unnamed: 0,NAME,GEO_ID,total,total_moe,nh_white,nh_white_moe,nh_black,nh_black_moe,nh_native,nh_native_moe,...,nh_pi,nh_pi_moe,nh_1other,nh_1other_moe,nh_multi,nh_multi_moe,hispanic,hispanic_moe,state,county
0,"Alameda County, California",0500000US06001,1651949,-555555555,466445,1170,159042,1736,4002,418,...,11651,588,10314,1134,86873,3145,385245,-555555555,6,1
1,"Alpine County, California",0500000US06003,1695,234,993,215,0,14,417,27,...,0,14,6,11,22,23,249,115,6,3
2,"Amador County, California",0500000US06005,41029,-555555555,30234,341,781,143,398,112,...,26,22,489,298,2153,315,6361,-555555555,6,5
3,"Butte County, California",0500000US06007,209470,-555555555,139527,767,3550,393,1611,294,...,554,130,1072,543,11317,895,40829,-555555555,6,7
4,"Calaveras County, California",0500000US06009,45995,-555555555,35599,318,529,140,424,197,...,24,33,3,18,1947,486,6403,-555555555,6,9


In [None]:
# Here is a list of all the columns in the data
df.columns

## 2.2 Working with a Dataframe
Remember the list indexing we covered earlier? That same logic applies to how a pandas DataFrame is structured — we’re still accessing specific parts of a data structure using brackets.

With a list, we were using the **positional index** to retrieve, say, the fourth element in a given list with **list[3]**. 
With our dataframes, we’re going to access elements by **series**, identified by their **column name**. 

So instead of `list[3]`, we’ll point to a column with **`dataframe[column_name]`**. These labels make it easier to identify what part of the data we’re calling. 

Note: Remember that the column name is a string, so it has to be in quotation marks for pandas to find it.

In [None]:
# EXERCISE: Try to access only the column that shows the total population for each county (i.e., each row)


In [None]:
# Using this technique, we can apply a function to a single column in the dataframe. For example: 
df['NAME'].unique()
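
To see what the bracket notation returns, here is a sketch on a tiny made-up table (the `toy` dataframe and its columns `region` and `visits` are hypothetical, not part of the ACS file). Single brackets with one column name give back a series; a list of column names gives back a smaller dataframe.

```python
import pandas as pd

# A tiny made-up table; the lab's real dataframe has many more columns
toy = pd.DataFrame({
    "region": ["North", "South", "North"],
    "visits": [12, 30, 7],
})

one_col = toy["visits"]               # single brackets with one name: a Series
two_cols = toy[["region", "visits"]]  # a list of names: a smaller DataFrame

print(one_col.sum())                  # 49
print(toy["region"].unique())         # each distinct region appears once
print(type(one_col).__name__, type(two_cols).__name__)  # Series DataFrame
```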

## 2.3 Doing Calculations Using Pandas

If you’re coming from Excel, it’s natural to think in terms of rows: calculate something for the first row, then copy that logic down.

pandas is built for **vectorized** operations, meaning that it handles entire columns of data at once, not row-by-row. 

So to get the sum of White and Asian respondents in this dataframe, for instance, we tell pandas to add those two columns together...

In [None]:
# Calculate the sum of White and Asian respondents for each California county
df["nh_white"] + df["nh_asian"]

Yay! That worked! But without creating a new column in our dataframe to store that data, we won't be able to use that sum later on...

We can create a new column in pandas by using the same format as existing ones — just give it a new name in square brackets and assign it a value or expression.

For example:

* `df["new"] = df["old1"] * df["old2"]` to create a new column by performing a calculation on existing column(s)
* `df["new"] = 5` to create a new column with a specified value across every row.
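
The two patterns listed above can be sketched on a small made-up table. The dataframe `toy` and its columns `price`, `qty`, `revenue`, and `store` are hypothetical names chosen for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "price": [2.0, 3.5, 1.25],   # made-up values
    "qty": [10, 4, 8],
})

# Vectorized: one expression multiplies every row of the two columns
toy["revenue"] = toy["price"] * toy["qty"]

# A single value is repeated down the whole column
toy["store"] = 5

print(toy)
print(toy["revenue"].tolist())   # [20.0, 14.0, 10.0]
```

Notice that no loop over rows is needed: pandas lines up the two columns by row and does the arithmetic for all of them at once.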

In [None]:
# EXERCISE: 
# 1. Create a new column that stores the sum of non-Hispanic Asian and White respondents
# 2. Create a new column called "year" with the ACS year you're working with.


Now let's figure out the percents!

In [None]:
# EXERCISE: Create a new column that stores the percent of respondents that are non-Hispanic White


In [None]:
# EXERCISE: Can you figure out how to sort the table by the percent column you just created?  Then write the code to calculate the percent for the Hispanic population and sort by it.
# Hint: Check out the Pandas documentation or Google the solution!


## 2.4 Bringing it all together!

We've covered a lot of ground, and we know the initial stages of learning Python can be overwhelming. But here's an example of what we're working towards.

Let's create a bar chart that shows the percent of non-Hispanic White, non-Hispanic Black, and Hispanic residents in Alameda County. (We'll work on making our charts more effective later in the semester.)

In [28]:
# Here, we use a "for" loop to apply the same percent calculation for each race/ethnicity variable, 
# and then automatically add _pct to the end of each created variable

race_cols = ["nh_white", "nh_black", "nh_native", "nh_asian", # First, create a list of all race/ethnicity counts
             "nh_pi", "nh_1other", "nh_multi", "hispanic"]

for col in race_cols: # This is a for loop, it loops through items in a list to expedite our analysis
    df[col + "_pct"] = df[col] / df["total"] * 100 # Here is the actual calculation being applied to each item in the race_cols list

df[["NAME"] + [c+"_pct" for c in race_cols]].head() # Display the county name and the new percent columns for the first five counties (rows)

Unnamed: 0,NAME,nh_white_pct,nh_black_pct,nh_native_pct,nh_asian_pct,nh_pi_pct,nh_1other_pct,nh_multi_pct,hispanic_pct
0,"Alameda County, California",28.24,9.63,0.24,31.99,0.71,0.62,5.26,23.32
1,"Alpine County, California",58.58,0.0,24.6,0.47,0.0,0.35,1.3,14.69
2,"Amador County, California",73.69,1.9,0.97,1.43,0.06,1.19,5.25,15.5
3,"Butte County, California",66.61,1.69,0.77,5.26,0.26,0.51,5.4,19.49
4,"Calaveras County, California",77.4,1.15,0.92,2.32,0.05,0.01,4.23,13.92
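
The last line of the cell above packs a few ideas together, so here is a sketch that unpacks them using plain Python lists (no pandas needed). The list is shortened for illustration, and `pct_cols` is a name introduced just for this example.

```python
race_cols = ["nh_white", "nh_black", "hispanic"]   # shortened list for illustration

# A for loop runs the indented line once per item in the list
for col in race_cols:
    print(col + "_pct")

# A list comprehension builds the same names as a new list in one line
pct_cols = [c + "_pct" for c in race_cols]
print(pct_cols)              # ['nh_white_pct', 'nh_black_pct', 'hispanic_pct']

# Adding two lists joins them end to end
print(["NAME"] + pct_cols)   # ['NAME', 'nh_white_pct', 'nh_black_pct', 'hispanic_pct']
```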


In [15]:
# Pick Alameda County
alameda = df[df["NAME"] == "Alameda County, California"] # Here we are selecting only the rows in the dataframe where "NAME" equals "Alameda County, California"

# Collect the percent columns; how would you add more to the graph?
alameda_percents = alameda[["nh_white_pct","nh_black_pct","hispanic_pct"]]
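
Selecting rows with a condition, as in the Alameda example above, happens in two steps: the comparison produces a True/False value for every row, and the dataframe then keeps only the rows marked True. Here is a sketch on a made-up table; the county names, values, and the variable names `mask` and `selected` are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "NAME": ["County A", "County B", "County C"],   # hypothetical names
    "total": [1000, 250, 4000],                     # made-up values
})

mask = toy["NAME"] == "County B"   # a Series of True/False, one per row
print(mask.tolist())               # [False, True, False]

selected = toy[mask]               # keep only the rows where mask is True
print(selected)

# Conditions can also be numeric
print(toy[toy["total"] > 500]["NAME"].tolist())   # ['County A', 'County C']
```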

In [None]:
import matplotlib.pyplot as plt # pyplot is the part of matplotlib we use for plotting

alameda_percents.plot(kind="bar", figsize=(8,5)) # Generate the plot and specify the figure size
plt.ylabel("Percent of Population") # Add a y-axis label
plt.title("Race/Ethnicity Distribution in Alameda County, CA") # Add a chart title
plt.show() # Tell Python to show us the plot

# 3. [Optional] Additional Resources for Practicing

### Are you confused? Want to review the Python basics we just covered?
First of all, do not fret! Learning Python is like learning a new language. It will take time (and lots of practice) to absorb everything you just learned.

Here are some resources to help you practice and clarify confusing concepts: 
* Practice Platforms
    * [HackerRank](https://www.hackerrank.com/) (Beginner to Intermediate)
    * [Codewars](https://www.codewars.com/) (Beginner to Advanced)
* Online Courses
    * [Programming for Everybody (UMich)](https://www.coursera.org/learn/python) on Coursera (Beginner)
    * [CS50's Introduction to Programming with Python (Harvard)](https://cs50.harvard.edu/python/) (Beginner)
    * [Data8: Foundations of Data Science (UCB)](https://www.data8.org/sp25/) (Beginner)
    * [How to Think Like a Computer Scientist](https://runestone.academy/ns/books/published/thinkcspy/index.html) (Beginner)
* Videos
    * [Tech with Tim](https://www.youtube.com/@TechWithTim) (Beginner to Intermediate)
 

Finally, please remember to make use of GSI office hours! You can drop in to review Python concepts and skills, whether or not your questions are related to a class assignment.

### Too easy? Here are some more advanced Python resources you can review. 

Here are some resources to help you advance your skills:
* Practice Platforms
    * [LeetCode](https://leetcode.com/) (Intermediate to Advanced)
    * [Kaggle](https://www.kaggle.com/) (Intermediate)
* Python Documentation
    * Dig around on the [official Python documentation](https://docs.python.org/3/)!
* Videos
    * [Corey Schafer's Python Tutorials](https://www.youtube.com/user/schafer5) (Intermediate)
    * [ArjanCode](https://www.youtube.com/@ArjanCodes) (Advanced)
  


Source: https://ucb-urban-informatics.github.io/cp255_web/docs/tutorials/python_reference.html