# Lecture 2, September 18, 2025 – Python Basics

In [None]:
# Imports
import pandas as pd
import numpy as np

import plotly.express as px
import matplotlib.pyplot as plt
plt.style.use('ggplot')

## Demo

### _Little Women_ (1868)

- _Little Women_, by Louisa May Alcott, is a novel that follows the life of four sisters – Meg, Jo, Beth, and Amy.
    - A movie based on the novel was released in 2019, starring Emma Watson (Meg) and Timothée Chalamet (Laurie).
- Using tools from this class, we'll learn (a bit) about the plot of the book, without reading it.
- Do not worry about any of this code – we'll cover the necessary pieces in the weeks to come. Sit back and relax!

In [None]:
# Read in 'lw.txt' to a variable called little_women_text.
little_women_text = open('data/lw.txt').read()

In [None]:
# See the first three thousand characters.
little_women_text[:3000]

In [None]:
# Print the first three thousand characters.
print(little_women_text[:3000])

In [None]:
# Create a variable "chapters" by splitting the text on 'CHAPTER '.
chapters = little_women_text.split('CHAPTER ') 

# Create a DataFrame with one column: the text of each chapter.
pd.DataFrame().assign(chapters=chapters)

In [None]:
# Number of occurrences of each name in each chapter.

counts = pd.DataFrame().assign(
    Amy=np.char.count(chapters, 'Amy'),
    Beth=np.char.count(chapters, 'Beth'),
    Jo=np.char.count(chapters, 'Jo'),
    Meg=np.char.count(chapters, 'Meg'),
    Laurie=np.char.count(chapters, 'Laurie'),
)
counts

In [None]:
# Cumulative number of times each name appears.

cumulative_counts = pd.DataFrame().assign(
    Amy=np.cumsum(counts.get('Amy')),
    Beth=np.cumsum(counts.get('Beth')),
    Jo=np.cumsum(counts.get('Jo')),
    Meg=np.cumsum(counts.get('Meg')),
    Laurie=np.cumsum(counts.get('Laurie')),
    Chapter=np.arange(1, 49, 1)
)

cumulative_counts

In [None]:
df = cumulative_counts.drop(columns=['Chapter']).melt().rename(columns={'variable': 'name', 'value': 'Count'})
df.assign(Chapter=list(range(1, 49)) * 5)

In [None]:
# Putting it all together, we get a helpful visualization.
cumulative_counts_df = cumulative_counts.drop(columns=['Chapter']).melt().rename(columns={'variable': 'name', 'value': 'Count'})
cumulative_counts_df = cumulative_counts_df.assign(Chapter=list(range(1, 49)) * 5)
px.line(cumulative_counts_df, x='Chapter', y='Count', color='name', width=900, height=600, title='Cumulative Number of Times Each Name Appears', template='ggplot2').show()

### Jupyter Notebooks 📓

- Often, code is written in a text editor and then run in a command-line interface (or both steps are done in an IDE).

<center>
<img src='images/terminal.png' width=800>
</center>

- **Jupyter Notebooks** allow us to write and run code within a single document. They also allow us to embed text (in Markdown format) alongside code.

## Expressions

- An **expression** is a combination of values, operators, and functions that **evaluates** to some **value**.

- For now, let's think of Python like a calculator – it takes expressions and evaluates them.

- We will enter our expressions in **code cells**. To run a code cell, either:
    - **Hit `shift` + `enter` (or `shift` + `return`) on your keyboard (strongly preferred)**, or
    - Press the "▶ Run" button in the toolbar.

In [None]:
23

In [None]:
-15 + 2.718

In [None]:
4 ** 3

In [None]:
(2 + 3 + 4) / 3

In [None]:
# Only one value is displayed. Why?
9 + 10
13 / 4
21

### Arithmetic operations

| Operation | Operator | Example | Value | 
| --- | --- | --- | --- |
| Addition | `+` | `2 + 3` | `5` |
| Subtraction | `-` | `2 - 3` | `-1` |
| Multiplication | `*` | `2 * 3` | `6` |
| Division | `/` | `7 / 3` | `2.33333` |
| Remainder | `%` | `7 % 3` | `1` |
| Exponentiation | `**` | `2 ** 0.5` | `1.41421` |

### Python uses the typical order of operations – PEMDAS

- **P**arentheses (and other grouping symbols)
- **E**xponents
- **MD** Multiplication and Division
- **AS** Addition and Subtraction

In [None]:
5 * 2 ** 3

In [None]:
(5 * 2) ** 3

## Variables

### Motivation

Below, we compute the number of seconds in a year.

In [None]:
60 * 60 * 24 * 365

If we want to use the above value later in our notebook to find, say, the number of seconds in 12 years, we'd have to copy-and-paste the expression. **This is inconvenient, and prone to introducing errors.**

In [None]:
60 * 60 * 24 * 365 * 12

It would be great if we could **store** the initial value and refer to it later on!

### Variables and assignment statements

- A **variable** is a place to store a value so that it can be referred to later in our code. To define a variable, we use an **assignment statement**.

$$ \overbrace{\texttt{zebra}}^{\text{name}} = \overbrace{\texttt{23 - 14}}^{\text{any expression}} $$


-  An assignment statement changes the meaning of the **name** to the left of the `=` symbol.

- The expression on the right-hand side of the `=` symbol is evaluated before being assigned to the name on the left-hand side.
    * e.g. `zebra` is bound to `9` (value) not `23 - 14` (expression).
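
As a small sketch of the point above: the right-hand side is evaluated first, so the name holds the resulting value rather than the expression. The second name, `stripes`, is made up for illustration.

```python
# The right-hand side is evaluated first, then bound to the name.
zebra = 23 - 14
print(zebra)    # prints 9

# Hypothetical second variable: it stores the value 18, not "zebra * 2".
stripes = zebra * 2

# Reassigning zebra later does not change stripes.
zebra = 100
print(stripes)  # prints 18
```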

### Think of variable names as nametags!

In [None]:
# Note: This is an assignment statement, not an expression.
# Assignment statements don't output anything!
a = 1

In [None]:
a = 2

In [None]:
b = 2

### Naming variables

- Give your variables helpful names so that you know what they refer to.
- Variable names can contain uppercase and lowercase characters, the digits 0-9, and underscores.
    - They cannot start with a number.
    - They are case sensitive!

The following assignment statements are **valid**, but use **poor** variable names 😕.

In [None]:
six = 15

In [None]:
i_45love_chocolate_9999 = 60 * 60 * 24 * 365

The following assignment statements are **valid**, and use **good** variable names ✅.

In [None]:
seconds_per_hour = 60 * 60
hours_per_year = 24 * 365
seconds_per_year = seconds_per_hour * hours_per_year

### Python functions

- Functions in Python work the same way functions in math do.
- The inputs to functions are called **arguments**.
- Python comes with a number of built-in functions that we are free to use.
- **Calling** a function, or using a function, means asking the function to "run its recipe" on the given input.

In [None]:
abs(-23)

### Some functions can take a variable number of arguments

In [None]:
max(4, -8)

In [None]:
max(2, -3, -6, 10, -4)

In [None]:
max(9)

In [None]:
max(9 + 10, 9 - 10)

### Put ```?``` after a function's name to see its documentation 📄

Or use the `help` function, e.g. `help(round)`.

In [None]:
round(1.45678)

In [None]:
round?

In [None]:
round(1.45678, 3)

### Nested evaluation

We can **nest** many function calls to evaluate sophisticated expressions.

In [None]:
min(abs(max(-1, -2, -3, min(4, -2))), max(5, 100))

### Import statements
- Python doesn't have everything we need built in.
- In order to gain additional functionality, we import **modules** through **import statements**.
- **Modules** are collections of Python functions and values.
- Call these functions using the syntax `module.function()`, called "dot notation".

### Example: `import math`

Some of the many functions built into the `math` module are `sqrt`, `pow`, and `log`.

In [3]:
import math

In [None]:
math.sqrt(16)

In [None]:
math.pow(2, 5)

`math` also has constants built in!

In [4]:
math.pi

3.141592653589793

## Data types

### What's the difference? 🧐

In [None]:
4 / 2

In [None]:
5 - 3

To us, `2.0` and `2` are the same number, $2$. But to Python, these appear to be different! 

### Data types
- Every value in Python has a **type**.
    - Use the `type` function to check a value's type.
- It's important to understand how different types work with different operations, as the results may not always be what we expect.
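
A quick sketch of how `type` can be used to tell apart two values that look like the same number:

```python
# type reports the data type of any value.
print(type(5 - 3))   # <class 'int'>
print(type(4 / 2))   # <class 'float'>

# The two values compare as equal even though their types differ.
print(5 - 3 == 4 / 2)              # True
print(type(5 - 3) == type(4 / 2))  # False
```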

### Two numeric data types: ```int``` and ```float``` 
- ```int```: An integer of any size.
- ```float```: A number with a decimal point.

### ```int```
- If you add (`+`), subtract (`-`), multiply (`*`), or exponentiate (`**`) `int`s, the result will be another `int`.
- `int`s have arbitrary precision in Python, meaning that your calculations will always be exact. 

In [None]:
7 - 15

In [None]:
type(7 - 15)

In [None]:
2 ** 300

In [None]:
2 ** 3000

### ```float```
* A `float` is specified using a **decimal** point.
* A `float` might be printed using scientific notation.

In [None]:
3.2 + 2.5

In [None]:
type(3.2 + 2.5)

In [None]:
# The result is in scientific notation: e+90 means "times 10^90".
2.0 ** 300

### The pitfalls of ```float```
* `float`s have limited precision; after arithmetic, the final few decimal places can be wrong in unexpected ways.
* `float`s have limited size, though the limit is huge.
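
One common way to cope with limited precision, shown here as a sketch rather than a rule, is to compare `float`s with a small tolerance instead of `==`. The variable name `total` is made up; `math.isclose` is part of Python's standard `math` module.

```python
import math

total = 0.1 + 0.2
print(total)         # not exactly 0.3
print(total == 0.3)  # False

# math.isclose compares floats up to a small tolerance.
print(math.isclose(total, 0.3))  # True
```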

In [None]:
1 + 0.2

In [None]:
1 + 0.1 + 0.1

In [None]:
2.0 ** 3000

### Converting between ```int``` and ```float```
- If you mix `int`s and `float`s in an expression, the result will always be a `float`.
     - Note that when you divide two `int`s, you get a `float` back.
- A value can be explicitly **coerced** (i.e. converted) using the ```int``` and ```float``` functions.
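
A short sketch of what coercion does in practice. Note in particular that `int` drops the fractional part rather than rounding:

```python
# int truncates toward zero; it does not round.
print(int(2.9))    # 2
print(int(-2.9))   # -2

# round goes to the nearest integer instead.
print(round(2.9))  # 3

# float converts an int to the corresponding float.
print(float(3))    # 3.0
```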

In [None]:
2.0 + 3

In [None]:
12 / 2

In [None]:
int(12 / 2)

In [None]:
int(-2.9)

### Summary

- Expressions evaluate to values. Python will display the value of the last expression in a cell by default.
- Python knows about all of the standard mathematical operators and follows PEMDAS.
- Assignment statements allow us to bind values to variables.
- We can call functions in Python similar to how we call functions in math.
    - Python knows some functions by default, and import statements allow us to bring additional functionality from modules.
- All values in Python have a data type.
    - `int`s and `float`s are numbers.
    - `int`s are integers, while `float`s contain decimal points.

## Strings

### Strings 

- A string is a snippet of text of any length.
- In Python, strings are enclosed by either single quotes or double quotes (doesn't matter which!)

In [None]:
'woof'

In [None]:
type('woof')

In [None]:
"woof   🐶🐶"

In [None]:
# A string, not an int!
"1998" + "45"

### String arithmetic

When using the `+` symbol between two strings, the operation is called "concatenation".

In [None]:
s1 = 'baby'
s2 = '🐼'

In [None]:
s1 + s2

In [None]:
s1 + ' ' + s2

In [None]:
s1 * 3 # multiplication is repeated addition, same as s1 + s1 + s1

### String methods
* Associated with strings are special functions, called **string methods**.
* Access string methods with a `.` after the string ("dot notation").    
    * For instance, to use the `upper` method on string `s`, we write `s.upper()`.
* Examples include `upper`, `title`, and `replace`, but there are [many more](https://docs.python.org/3/library/stdtypes.html#string-methods).
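
One detail worth seeing in a sketch: string methods return a new string and leave the original unchanged. The string `greeting` is a made-up example.

```python
greeting = 'hello there'   # hypothetical example string

shouted = greeting.upper()
print(shouted)    # HELLO THERE
print(greeting)   # hello there (unchanged)

# Method calls can be chained, since each one returns a string.
print(greeting.replace('hello', 'goodbye').title())  # Goodbye There
```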

In [None]:
my_cool_string = 'data science is super cool!'

In [None]:
my_cool_string.title()

In [None]:
my_cool_string.upper()

In [None]:
my_cool_string.replace('super cool', '💯' * 3)

In [None]:
"hello".upper()

In [None]:
# len is not a method, since it doesn't use dot notation.
len(my_cool_string)

# Lists

## What is a list?

A list is an ordered sequence of values.

A list of integers:
- `[3, 1, 4, 1, 5, 9]`

A list of strings:
- `["Once", "upon", "a", "time", "there"]`


Each value has an index.
- For `[3, 1, 4, 1, 5, 9]`:

| Index| 0 | 1 | 2| 3 | 4 | 5 |
| ----- | - | - | - | - | - | - |
| List Entry  | 3 | 1 | 4 | 1 | 5 | 9 |


Indexing is zero-based (counting starts at zero).
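
A minimal sketch of zero-based indexing, using the list from the table. The name `digits` is chosen here for illustration.

```python
digits = [3, 1, 4, 1, 5, 9]

print(digits[0])                 # 3, the first entry
print(digits[5])                 # 9, the last entry
print(digits[len(digits) - 1])   # 9, since the last index is len - 1
```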



## Loop Examples

[see in python tutor](https://tinyurl.com/y6a34bpd)

In [None]:
for num in [2, 4, 6]:
    print(num)

In [None]:
for i in [1, 2, 3]:
    print("Hi there!")

In [None]:
for char in "happy": #sequence is a string, NOT a list
    print(char)      #prints the values of sequence

## The range function

A typical for loop does not use an explicit list:

In the following,
```
for i in range(5):
    ...  # loop body
```
`range(5)` produces the numbers 0, 1, 2, 3, 4, one at a time.
 - Note that `range(5)` __is not__ the same as the list `[0, 1, 2, 3, 4]`.
 - `list(range(5))` is equivalent to the list `[0, 1, 2, 3, 4]`.



__Detailed usage (1, 2, or 3 arguments)__

- range(5): cycles through [0, 1, 2, 3, 4] -> __Upper limit (exclusive)__

- range(1, 5): cycles through [1, 2, 3, 4] -> __Lower limit (inclusive)__

- range(1, 10, 2): cycles through [1, 3, 5, 7, 9] -> __step (distance between elements)__
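
The three forms can be checked directly by converting each range to a list, as in this short sketch:

```python
print(list(range(5)))         # [0, 1, 2, 3, 4]
print(list(range(1, 5)))      # [1, 2, 3, 4]
print(list(range(1, 10, 2)))  # [1, 3, 5, 7, 9]

# A range is its own type, not a list, until it is converted.
print(type(range(5)))               # <class 'range'>
print(range(5) == [0, 1, 2, 3, 4])  # False
```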



In [None]:
# Getting length of a list
print(len([3,1,4,1,5,9]))

## List Operations

What operations should a list support efficiently and conveniently?
- Creation
- Querying
- Modification


## List Creation
- run the following cells from __top to bottom__
- [see in python tutor](https://tinyurl.com/thb52x)

In [None]:
a = [3, 1, 2 * 2, 1, 10 / 2, 10 - 1]
print(a)

In [None]:
b = [5, 3, 'hi']
print(b)

In [None]:
c = [4, 'a', a]
print(c)

In [None]:
d = [[1, 2], [3, 4], [5, 6]]
print(d)

## List Querying

Expressions that return parts of lists:


Single element:  		__mylist[index]__

- The single element stored at that location



In [None]:
l = [3,1,4,1,5,9]
print(l[2])

Sublist (“slicing”):  	__mylist[start:end]__

- The sublist that starts at index *start* and ends at index *end – 1*

- If start is omitted: defaults to 0

- If end is omitted: defaults to len(mylist)

- mylist[:] and mylist[0:len(mylist)] both evaluate to the whole list

In [None]:
a = [0, 1, 2, 3, 4]
print(a[0])

In [None]:
a = [0, 1, 2, 3, 4]
print(a[5])

In [None]:
a = [0, 1, 2, 3, 4]
print(a[6])

In [None]:
a = [0, 1, 2, 3, 4]
print(a[-1]) # last element in list

In [None]:
a = [0, 1, 2, 3, 4]
print(a[-2]) # next to last element

In [None]:
a = [0, 1, 2, 3, 4]
print(a[0:2])

In [None]:
a = [0, 1, 2, 3, 4]
print(a[0:-1])

## List Modification
- Insertion
- Removal
- Replacement
- Rearrangement

### List insertions
mylist.append(x)

- Extend mylist by inserting x at the end

mylist.extend(L)
- Extend mylist by appending all the items in the argument list L to the end of mylist

mylist.insert(i, x)
- Insert item x before position i.

a.insert(0, x)
- inserts at the front of the list

a.insert(len(a), x)
- is equivalent to    a.append(x)
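
A brief sketch of the two special cases above. The list names are made up for illustration.

```python
front = [1, 2, 3]
front.insert(0, 'x')    # insert before position 0
print(front)            # ['x', 1, 2, 3]

via_insert = [1, 2, 3]
via_insert.insert(len(via_insert), 'x')   # insert before "one past the end"

via_append = [1, 2, 3]
via_append.append('x')

print(via_insert == via_append)  # True
```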

In [None]:
lst = [1, 2, 3, 4]
lst.append(5)
print(lst)

In [None]:
lst = [1, 2, 3, 4]
lst.extend([6, 7, 8])
print(lst)

In [None]:
lst = [1, 2, 3, 4]
lst.insert(3, 3.5)
print(lst)

__Comprehension check__

Which of the following is printed by the code below?

- A) 4
- B) 5
- C) 2
- D) [4, 6]
- E) IndexError: list index out of range


```
lst = [1, 3, 5]
lst.insert(2, [4, 6])
print(lst[2])
```




In [None]:
lst = [1, 3, 5]
lst.insert(2, [4, 6])
print(lst[2])

### List removal

mylist.remove(x)
- Remove the first item from the list whose value is x
- It is an error if there is no such item
- __Returns None__



mylist.pop([i])
- Remove the item at the given position in the list, and return it.
- If no index is given, mylist.pop() removes and returns the last item in the list.
- Notation from the Python Library Reference: *The square brackets around the parameter, “[i]”, means the argument is optional.
It does not mean you should type square brackets at that position.*

In [None]:
#Examples
lst = [1, 2, 3, 4, 5, 6, 7]
print(lst.pop())

In [None]:
lst = [1, 2, 3, 4, 5, 6, 7]
print(lst.pop(1))

In [None]:
lst = [1, 2, 3, 4, 5, 6, 7]
print(lst.remove(3))


In [None]:
lst = [1, 2, 3, 4, 5, 6, 7]
lst.remove(3)
print(lst)

### List replacement


```
mylist[index] = new_value
mylist[start:end] = new_sublist
```
The second form replaces mylist[start] through mylist[end – 1] with new_sublist.


This can change the length of the list.

Examples:
```
mylist[start:end] = []
```
removes mylist[start]… mylist[end – 1]
```
mylist[len(mylist):] = L
```
is equivalent to mylist.extend(L)
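
The slice-assignment cases can be traced step by step. This is a sketch on a made-up list called `nums`:

```python
nums = [1, 2, 3, 4, 5]   # hypothetical list

nums[1:3] = []           # removes nums[1] and nums[2]
print(nums)              # [1, 4, 5]

nums[len(nums):] = [6, 7]   # same effect as nums.extend([6, 7])
print(nums)                 # [1, 4, 5, 6, 7]

nums[0:1] = [10, 11, 12]    # one element replaced by three
print(nums)                 # [10, 11, 12, 4, 5, 6, 7]
```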


In [None]:
#Examples
lst = [1, 2, 3, 4, 5, 6, 7]
lst[3] = 'blue'
print(lst)

In [None]:
lst = [1, 2, 3, 4, 5, 6, 7]
lst[1:3] = [10, 11, 12]
print(lst)

### List Rearrangement

mylist.sort()
- Sort the items of the list, in place.
- “in place” means by modifying the original list,
not by creating a new list.

mylist.reverse()
- Reverse the elements of the list, in place.

Both sort() and reverse() __return None__.
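
Because these methods return `None`, writing `lst = lst.sort()` loses the list. The sketch below shows the pitfall, along with the built-in `sorted` function, which returns a new list instead. All variable names here are made up.

```python
scores = [3, 1, 2]       # hypothetical list

result = scores.sort()   # sorts in place...
print(result)            # None  ...and returns None
print(scores)            # [1, 2, 3]

# sorted is a built-in function that leaves the original alone
# and returns a new sorted list.
original = [3, 1, 2]
ordered = sorted(original)
print(original)   # [3, 1, 2]
print(ordered)    # [1, 2, 3]
```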






In [None]:
#Example
lst = [1, 2, 3, 4, 5, 6, 7]
lst.reverse()
print(lst)
lst.sort()
print(lst)

### List modification examples

- [see in python tutor](https://tinyurl.com/z2ktkhf2)

In [None]:
lst = [10, 12, 23, 54, 15]
lst.append(7)
lst.extend([8, 9, 3])
lst.insert(2, 2.75)
lst.remove(3)
print(lst.pop())
print(lst.pop(4))
lst[1:5] = [20, 21, 22]
lst2 = [4, 6, 8, 2, 0]
lst2.sort()
lst2.reverse()
lst3 = lst2
lst4 = lst2[:]
lst2[-1]= 17

print(lst)
print(lst2)
print(lst3)
print(lst4)

## List operations comprehension check

1. Which of the following will convert list a into [1, 2, 3, 4, 5]?

a = [1, 3, 5]

- A. a.insert(1, 2)

    a.insert(2, 4)
- B. a[1:2] = [2, 3, 4]
- C. a.extend([2, 4])
- D. a[1] = 2

     a[3] = 4


## Exercise: List lookup



Goal: implement the following ([see in python tutor](https://tinyurl.com/5dme2skr))

```
def my_index(lst, value):
  """Return the position of the first occurrence
  of value in the list lst. Return None if value
  does not appear in lst."""
```

Examples:

story = ["Once", "upon", "a", "time", "there", "was", "a"]

my_index(story, "a") → 2

my_index(story, "there") → 4

__Note__ :  my_list[my_index(my_list, x)] == x

In [None]:
def my_index(lst, value):
  """Return the position of the first
  occurrence of value in the list lst.
  Return None if value does not appear
  in lst."""


## Exercise:  Convert Units

([see in python tutor](https://tinyurl.com/rccvsf49))

Using the following:

```
def cent_to_fahr(cent):
  return cent / 5.0 * 9 + 32

ctemps = [-40, 0, 20, 37, 100]

ftemps = []
```
Without setting it directly, set ftemps to [-40, 32, 68, 98.6, 212]


## More on list slicing

- mylist[startindex:endindex] evaluates to a sublist of the original list
  - mylist[index] evaluates to an element of the original list

- Arguments are like those to the range function: mylist[start:end:step]
  - The start index is inclusive; the end index is exclusive
  - All 3 indices are optional
  - Can assign to a slice:  mylist[s:e] = yourlist


Example ([see in python tutor](https://tinyurl.com/y2cvczn6)):

In [None]:
test_list = ['e0', 'e1', 'e2', 'e3', 'e4', 'e5', 'e6']
my_list = test_list[2:]
print(my_list)

In [None]:
test_list = ['e0', 'e1', 'e2', 'e3', 'e4', 'e5', 'e6']
my_list = test_list[:5]
print(my_list)

In [None]:
test_list = ['e0', 'e1', 'e2', 'e3', 'e4', 'e5', 'e6']
my_list = test_list[-1]
print(my_list)

In [None]:
test_list = ['e0', 'e1', 'e2', 'e3', 'e4', 'e5', 'e6']
my_list = test_list[-4:]
print(my_list)

In [None]:
test_list = ['e0', 'e1', 'e2', 'e3', 'e4', 'e5', 'e6']
my_list = test_list[:-3]
print(my_list)

In [None]:
test_list = ['e0', 'e1', 'e2', 'e3', 'e4', 'e5', 'e6']
my_list = test_list[:]
print(my_list)

In [None]:
test_list = ['e0', 'e1', 'e2', 'e3', 'e4', 'e5', 'e6']
my_list = test_list[::-1]
print(my_list)

__Answer__

| Expression | Result |
| --- | --- |
| `test_list[2:]` | From e2 to the end of the list |
| `test_list[:5]` | From beginning up to (but not including) e5 |
| `test_list[-1]` | Last element |
| `test_list[-4:]` | Last four elements |
| `test_list[:-3]` | Everything except last three elements |
| `test_list[:]` | Get a copy of the whole list |
| `test_list[::-1]` | Reverse the list |

## How to evaluate a list expression

There are two new forms of expression:
- List creation (aka, list literals):   [1, 2, 3, 4]
- List indexing (aka, dereferencing): a[i]

__Note:__ In the examples above, the same tokens “[]” are used with two distinct meanings.

## Evaluating list expressions

[a, b, c, d]		list __creation__
  - To evaluate:
    - evaluate each element to a value, from left to right
    - make a list of the values
  - The elements can be arbitrary values, including lists:
    - ["a", 3, fahr_to_cent(-40), [3 + 4, 5 * 6]]


a[b] 		list __indexing__

a is the list expression, b is the index

- To evaluate:
  - evaluate the list expression to a value
  - evaluate the index expression to a value
  - if the list value is not a list, execution terminates with an error
  - if the element is not in range (not a valid index), execution terminates with an error
  - the value is the given element of the list value (counting from zero)


### List expression examples

What does this mean (or is it an error)? ([see in python tutor](https://tinyurl.com/y5sg98eo))

In [None]:
["Once", "upon", "a", "time", "there"][2]

In [None]:
["Once", "upon", "a", "time", "there"][0,2,3]

In [None]:
["Once", "upon", "a", "time", "there"][[0,2,3]]

In [None]:
["Once", "upon", "a", "time", "there"][[0,2,3][1]]

## Two dimensional nested lists

Each single element within mylist is __also a list__.

(Remember: a list can contain just about anything.)


In [None]:
mylist = [
  [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9]
]

mylist[0] evaluates to a list

In [None]:
mylist = [
  [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9]
]
mylist[0]

mylist[0] is an expression that evaluates to the first item in mylist, which happens to be another list.

mylist[0][0] evaluates to the first element of the first list in the nested list


In [None]:
mylist = [
  [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9]
]
mylist[0][0]

Note the double square brackets below: we are appending a list that itself contains a list, which gives a list within a list within a list.




In [None]:
mylist = [
  [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9]
]
mylist.append([[10,11,12]])
print(mylist)

Recall that items within a list don't have to all be the same type. What will the following do?


In [None]:
mylist[0][0][0]

# Dictionaries

**Dictionaries** are a data structure that lets us map **keys** to **values**. This is incredibly helpful! However, there are a few rules to keep in mind when using **dictionaries**. We will see these in the examples below!

Let's take a look at how keys map to values.

In [None]:
value1 = 1
value2 = 2
value3 = 3
value4 = 4

# Let's create a dictionary
my_dictionary = {
    "key1": value1,
    "key2": value2,
    "key3": value3,
    "key4": value4
}

The **order** of the keys inside your dictionary does not matter when comparing dictionaries: two dictionaries with the same key: value pairs are equal.

In [None]:
# Let's make a dictionary with the same contents but a different key order
my_dictionary_diff_order = {
    "key3": value3,
    "key2": value2,
    "key1": value1,
    "key4": value4
}

# Let's test whether our dictionaries are equal
my_dictionary == my_dictionary_diff_order

Let's try having **non-unique keys**.

In [None]:
# Let's create a dictionary with repeating keys
my_dictionary_rep_keys = {
    "key1": value1,
    "key1": value1,
    "key1": value1,
    "key4": value1
}
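
The cell above does not display its result, so here is a small sketch of what happens with repeated keys: Python keeps a single entry per key, and the last value given wins. The dictionary `repeated` and its values are made up so that the effect is visible.

```python
repeated = {
    "key1": 1,
    "key1": 2,
    "key1": 3,
    "key4": 4,
}
print(len(repeated))      # 2
print(repeated["key1"])   # 3, the last value given for "key1"
```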

The **keys** must be **immutable data types**. **Mutable data types** include: lists, sets, and dictionaries (so don't use these as your keys)! However, **mutable** data types can be used as the values in your dictionary.
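
A sketch of the rule in action, using made-up data: a tuple works as a key, while a list raises a `TypeError`. The `try`/`except` block is only there so that the cell keeps running after the error.

```python
# A tuple is immutable, so it can be a key.
locations = {(47.6, -122.3): "Seattle"}   # hypothetical data
print(locations[(47.6, -122.3)])           # Seattle

# A list is mutable, so using it as a key raises a TypeError.
list_key_failed = False
try:
    bad = {[47.6, -122.3]: "Seattle"}
except TypeError as err:
    list_key_failed = True
    print("TypeError:", err)
```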

In [None]:
# Let's create a dictionary using these guidelines
my_dictionary = {
    "key": [1, 2],
    42: "The answer",
    0: None,
    33.2: 22.3
}

Now that we know a bit about **dictionaries**, let's see what we can do with them. With dictionaries we can **create**, **query** (access), and **modify**. Let's take a look at each!

**Creating** a dictionary

In [None]:
d1 = {}
d2 = dict()
# d1 and d2 are now empty dictionaries

favorite_colors = {
    "asfg": "Blue",
    "among": "Blue",
    "paolopan": "yellow",
    "sg721": "green"
}
# favorite_colors is now a dictionary with some initial keys and values

In [None]:
state_capitals = {"GA" : "Atlanta", "WA": "Olympia" }

phonebook = dict()
phonebook["Alice"] = "206-555-4455"
phonebook["Bob"] = "212-555-2211"

atomic_number = {}
atomic_number["H"] = 1
atomic_number["Fe"] = 26
atomic_number["Au"] = 79

**Accessing** a dictionary

In [None]:
atomic_number = {"H": 1, "Fe": 26, "Au": 79}

# Let's print the value stored under the key "Au"
print(atomic_number["Au"])

# What if we use a key that does not exist in our dictionary?
# print(atomic_number["B"])  # uncommenting this raises a KeyError

# Let's check if the key "Au" is in our dictionary
print("Au" in atomic_number)

# Let's check if 26 is in our dictionary
# (in looks at keys, not values, so this prints False)
print(26 in atomic_number)

# Let's print the keys of our dictionary
print(atomic_number.keys())
print(list(atomic_number.keys()))

# Let's print the values of our dictionary
print(atomic_number.values())
print(list(atomic_number.values()))

print(list(atomic_number.items()))
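
Two ways to avoid a `KeyError` when a key might be missing, sketched with the same dictionary: check membership with `in` first, or use the dictionary's `get` method, which returns a default value (`None` unless another is given).

```python
atomic_number = {"H": 1, "Fe": 26, "Au": 79}

# Option 1: check membership before looking up.
if "B" in atomic_number:
    print(atomic_number["B"])
else:
    print("B is not a key")

# Option 2: get returns a default when the key is absent.
print(atomic_number.get("B"))      # None
print(atomic_number.get("B", 0))   # 0
print(atomic_number.get("Au"))     # 79
```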

**Iterating** through a dictionary

In [None]:
# We will use the atomic_number we had created above

# Print out all the keys: 
for element_name in atomic_number.keys():
    print(element_name)

# Another way to print out all the keys: 
for element_name in atomic_number:
    print(element_name)
    
# Print out all the values: 
for element_number in atomic_number.values():
    print(element_number)

# Print out the keys and the values
for (element_name, element_number) in atomic_number.items():
    print("name:", element_name, "number:", element_number)

**Modifying** a dictionary

In [None]:
# Let's create a new dictionary mapping each species of pet to a list of pet names!
pet_names = {
    "Ducks": ["Duckula", "Howard", "Daffy"],
    "Mice": ["Stuart", "Mickey", "Templeton"],
    "Horses": ["Secretariat", "Black Beauty"]
}

# Let's add a new mapping to our dictionary
pet_names["Rabbits"] = ["Peter", "Energizer"]

# Now let's change an existing mapping
pet_names["Horses"] = ["Misty"]  # Change mapping

# Finally, let's remove a mapping
del pet_names["Mice"]

Let's work through some **dictionary exercises**.

In [None]:
# What do you think this does?
squares = {1: 1, 2: 4, 3: 9, 4: 16}
print(squares[3] + squares[3])
print(squares[3 + 3])
print(squares[2] + squares[2])
print(squares[2 + 2])

In [None]:
# How about converting a list to a dictionary?
# E.g. Given [5, 6, 7], produce {5: 25, 6: 36, 7: 49}

d = {}
for i in [5, 6, 7]:    # or range(5, 8)
    d[i] = i * i
    
# Invert d: map each square back to its root, giving {25: 5, 36: 6, 49: 7}
k = {}
for i in d.keys():
    k[d[i]] = i

Aside: A **list** is (kind of) like a **dictionary**

In [None]:
# Lists map integer indices to values
mylist = ['a', 'b', 'c']
mylist[3] = 'c'  # error! Index 3 does not exist; the valid indices are 0, 1, 2

Let's stop and think...

When is a list **more convenient** to use than a dictionary?

When is a list **less convenient** to use than a dictionary?

Let's dive into the specifics of **keys** for dictionaries.

Not every value is allowed to be a **key** in a **dictionary**.

Dictionaries hold key: value pairs.

As we mentioned earlier, the keys of a dictionary must be **immutable**; the values in a dictionary, however, can be **anything**.

**Mutable** and **immutable** data types

**Immutable datatypes:**
int, float, boolean, string, function, tuple, frozenset

**Mutable datatypes:**
list, dictionary, set


### Reading data from a file 

- We'll usually work with data stored in the CSV format. CSV stands for "comma-separated values."

- We can read in a CSV using `pd.read_csv(...)`. Replace the `...` with a path to the CSV file relative to your notebook; if the file is in the same folder as your notebook, this is just the name of the file.

In [None]:
# Our CSV file is not stored in the same folder as our notebook,
# but within a folder called data.
states = pd.read_csv('data/states.csv')
states

### About the data 🗽

Most of the data is self-explanatory, but there are a few things to note:

- `'Population'` figures come from the 2020 census.

- `'Land Area'` is measured in square miles.

- The `'Region'` column places each state in one of four regions, as determined by the US Census Bureau.

<center>
<img src='images/regions.png' width=600>
</center>

- The `'Party'` column classifies each state as `'Democratic'` or `'Republican'` based on a political science measurement called the Cook Partisan Voter Index. 


<center>
<img src='images/party.png' width=600>
</center>

### Structure of a DataFrame

- DataFrames have *columns* and *rows*.
    - Think of each column as an array. Columns contain data of the same type.
- Each column has a label, e.g. `'Capital City'` and `'Land Area'`.
    - Column labels are stored as strings.
- Each row has a label too – these are shown in bold at the start of the row.
    - Right now, the row labels are 0, 1, 2, and so on.
    - Together, the row labels are called the _index_. The index is **not** a column!
    

In [None]:
# This DataFrame has 50 rows and 6 columns.
states

## Example 1: Population density

**Key concepts**: Accessing columns, performing calculations with them, and adding new columns.

### Finding population density

**Question**: What is the population density of each state, in people per square mile?

In [None]:
states

- We have, separately, the population and land area of each state.

- Steps:
    - Get the `'Population'` column.
    - Get the `'Land Area'` column.
    - Divide these columns element-wise.
    - Add a new column to the DataFrame with these results.

#### Step 1 – Getting the `'Population'` column

- We can get a column from a DataFrame using `.get(column_name)`.
- Column names are case sensitive!
- Column names are strings, so we need to use quotes.
- The result looks like a 1-column DataFrame, but is actually a *Series*.

In [None]:
states

In [None]:
states.get('Population')

### Digression: Series

- A *Series* is like an array, but with an index.
- In particular, Series support arithmetic, just like arrays.

In [None]:
states.get('Population')

In [None]:
type(states.get('Population'))

#### Steps 2 and 3 – Getting the `'Land Area'` column and dividing element-wise

In [None]:
states.get('Land Area')

- Just like with arrays, we can perform arithmetic operations with two Series, as long as they have the same length and same index. 
- Operations happen element-wise (by matching up corresponding index values), and the result is also a Series.

In [None]:
states.get('Population') / states.get('Land Area')

#### Step 4 – Adding the densities to the DataFrame as a new column

- Use `.assign(name_of_column=data_in_series)` to assign a Series (or array, or list) to a DataFrame.
- Don't put quotes around `name_of_column`.
- This creates a new DataFrame, which we must save to a variable if we want to keep using it.
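
The same pattern on a tiny DataFrame, as a sketch. The DataFrame `toy` and its numbers are made up for illustration and are not rows of `states.csv`. It is built with the `pd.DataFrame` constructor to keep the example self-contained.

```python
import pandas as pd

# Made-up numbers for illustration only.
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Population': [100, 300, 50],
    'Land Area': [10, 20, 25],
})

# Element-wise division of two Series gives another Series.
density = toy.get('Population') / toy.get('Land Area')

# assign returns a new DataFrame; toy itself is unchanged.
with_density = toy.assign(Density=density)

print(list(toy.columns))            # ['Name', 'Population', 'Land Area']
print(list(with_density.columns))   # ['Name', 'Population', 'Land Area', 'Density']
print(with_density.get('Density').tolist())  # [10.0, 15.0, 2.0]
```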

In [None]:
states.assign(
    Density=states.get('Population') / states.get('Land Area')
)

In [None]:
states

In [None]:
states = states.assign(
    Density=states.get('Population') / states.get('Land Area')
)
states

## Example 2: Exploring population density
**Key concept**: Computing statistics of columns using Series methods.

### Questions

- What is the highest population density of any one state? 
- What is the average population density across all states?

Series, like arrays, have helpful methods, including `.min()`, `.max()`, and `.mean()`.

In [None]:
states.get('Density').max()

What state does this correspond to? We'll see how to find out shortly!

Other statistics:

In [None]:
states.get('Density').min()

In [None]:
states.get('Density').mean()

In [None]:
states.get('Density').median()

In [None]:
# Lots of information at once!
states.get('Density').describe()

## Example 3: *Which* state has the highest population density?

**Key concepts**: Sorting. Accessing using integer positions.

#### Step 1  – Sorting the DataFrame

- Use the `.sort_values(by=column_name)` method to sort.
    - The `by=` can be omitted, but helps with readability.
- Like most DataFrame methods, this returns a new DataFrame.

In [None]:
states.sort_values(by='Density')

This sorts, but in ascending order (small to large). The opposite would be nice!

#### Step 1 – Sorting the DataFrame in *descending* order

- Use `.sort_values(by=column_name, ascending=False)` to sort in *descending* order.
- `ascending` is an optional argument. If omitted, it will be set to `True` by default.
    - This is an example of a *keyword argument*, or a *named argument*.
    - If we want to specify the sorting order, we **must** use the keyword `ascending=`.

In [None]:
ordered_states = states.sort_values(by='Density', ascending=False)
ordered_states

In [None]:
# We must specify the role of False by using ascending=, 
# otherwise Python does not know how to interpret this.
states.sort_values(by='Density', False)

#### Step 2 – Extracting the state name

- We saw that the most densely populated state is New Jersey, but how do we extract that information using code?
- First, grab an entire column as a Series.
- Navigate to a particular entry of the Series using `.iloc[integer_position]`.
    - `iloc` stands for "integer location" and is used to count the rows, starting at 0.

In [None]:
ordered_states

In [None]:
ordered_states.get('State')

In [None]:
# We want the first entry of the Series, which is at "integer location" 0.
ordered_states.get('State').iloc[0]

- The row label that goes with New Jersey is 29, because our original data was alphabetized by state and New Jersey is the 30th state alphabetically. But we **don't use the row label** when accessing with `iloc`; we use the integer position counting from the top.

- If we try to use the row label (29) with `iloc`, we get the state with the 30th highest population density, which is **not** New Jersey.
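
The difference between row labels and integer positions is easier to see on a tiny DataFrame. This is a sketch with made-up data (`toy` is not part of `states.csv`):

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Density': [10.0, 15.0, 2.0],
})
ordered = toy.sort_values(by='Density', ascending=False)

# Row labels travel with their rows when sorting.
print(ordered.index.tolist())        # [1, 0, 2]

# iloc counts positions from the top of the sorted DataFrame.
print(ordered.get('Name').iloc[0])   # B (densest; its row label is 1)
print(ordered.get('Name').iloc[1])   # A (its row label is 0)
```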

In [None]:
ordered_states.get('State').iloc[29]

## Example 4: What is the population density of Pennsylvania?

**Key concepts**: Setting the index. Accessing using row labels.

### Population density of Pennsylvania

We know how to get the `'Density'` of all states. How do we find the one that corresponds to Pennsylvania?

In [None]:
states

In [None]:
# Which one is Pennsylvania?
states.get('Density')

### Utilizing the index

- When we load in a DataFrame from a CSV, columns have meaningful names, but rows do not.

In [None]:
pd.read_csv('data/states.csv')

- The row labels (or the *index*) are how we refer to specific rows. Instead of using numbers, let's refer to these rows by the names of the states they correspond to.

- This way, we can easily identify, for example, which row corresponds to Pennsylvania.

### Setting the index

- To change the index, use `.set_index(column_name)`.
- Row labels should be unique identifiers.
    - Each row should have a different, descriptive name that corresponds to the contents of that row's data.

In [None]:
states

In [None]:
states.set_index('State')

- Now there is one fewer column. When you set the index, a column becomes the index, and the old index disappears.

- 🚨 Like most DataFrame methods, `.set_index` returns a new DataFrame; it does not modify the original DataFrame.

In [None]:
states

In [None]:
states = states.set_index('State')
states

In [None]:
# Which one is Pennsylvania? The one whose row label is "Pennsylvania"!
states.get('Density')

### Accessing using the row label

To pull out one particular entry of a DataFrame corresponding to a row and column with certain labels:
1. Use `.get(column_name)` to extract the entire column as a Series.
2. Use `.loc[]` to access the element of a Series with a particular row label.

In this class, we'll always first access a column, then a row (but row, then column is also possible).

In [None]:
states.get('Density')

In [None]:
states.get('Density').loc['Pennsylvania']

### Summary: Accessing elements of a DataFrame

- First, `.get` the appropriate column as a Series.
- Then, use one of two ways to access an element of a Series:
    - `.iloc[]` uses the integer position.
    - `.loc[]` uses the row label.
    - Each is best for different scenarios.
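
A compact sketch contrasting the two access styles, again on made-up data rather than `states.csv`:

```python
import pandas as pd

# Made-up data for illustration only.
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Density': [10.0, 15.0, 2.0],
}).set_index('Name')

densities = toy.get('Density')
print(densities.iloc[2])    # 2.0, the entry in integer position 2
print(densities.loc['C'])   # 2.0, the entry whose row label is 'C'
```

`.iloc` is handy after sorting, when the position matters ("the first row"); `.loc` is handy when the row is known by name.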

In [None]:
states.get('Density')

In [None]:
states.get('Density').iloc[4]

In [None]:
states.get('Density').loc['California']

### Note

- Sometimes the integer position and row label are the same.
- This happens by default with `pd.read_csv`.

In [None]:
pd.read_csv('data/states.csv')

In [None]:
pd.read_csv('data/states.csv').get('Capital City').loc[35]

In [None]:
pd.read_csv('data/states.csv').get('Capital City').iloc[35]