# 1.1 Python Lists and Dictionaries

This section introduces Python lists, dictionaries and comprehensions.  These objects are foundational to Python and are used extensively.

## Introduction to Lists
Lists are *sequences* of objects.  A *sequence* is a "positionally ordered collection of other objects."  Lists can include any types of Python objects (and do not need to be homogeneous - i.e., object types can be different in the same list).

In [None]:
# Define a list and then show the list.  Note that the list contains objects of 
# different types (ints, floats, strings).
l1 = [1, 2, 3, 17.24, 967, 45, "dog", 'cat', [1, 2, 3], {'one':1, 'two':2}]
l1

In [None]:
# Element referencing by index -- indices are 0-based
l1[0], l1[1]

In [None]:
# Indices can also be negative (-1 refers to the last element of the list)
l1[-1]

In [None]:
# Lists have "iterators" that allow you to easily iterate through the list elements
# without using explicit indices
for item in l1:
    print(item)

In [None]:
# as compared to the traditional indexing method ...
for j in range(len(l1)):
    print(l1[j])

In [None]:
# List slicing - as with string slicing, the slice l1[i:j]
# includes elements i, i+1, i+2, ..., up to, but not including, j (and is a separate list)
l2 = l1[2:6]
l2
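
The point that a slice is a separate list can be checked directly. The following is a minimal sketch; the names `letters` and `middle` are illustrative and do not appear elsewhere in this notebook.

```python
# Illustrative names (not from the notebook): letters, middle
letters = ['a', 'b', 'c', 'd', 'e', 'f']

# Elements at positions 2, 3, 4 and 5 (position 6 is not included)
middle = letters[2:6]

# The slice is a new list, so modifying it does not touch letters
middle[0] = 'CHANGED'

print(middle)   # the modified slice
print(letters)  # the original list, unchanged
```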

In [None]:
# concatenation
l3 = l1 + l2 + [86.4, 91.8, 'pony']
l3

In [None]:
# repetition
l4 = l3[:3]*3
l4

In [None]:
# Nested lists - this is a "list of lists."
l2 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
l2

In [None]:
# this one and the previous one are equivalent -- which is clearer (visually)?
l2 = [ [1, 2, 3]
     , [4, 5, 6]
     , [7, 8, 9]
     ]
l2

In [None]:
# to view the nested list (list of lists) in matrix form:
for r in l2:
    print(r)

### List Mutability
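This is IMPORTANT and causes beginners problems.  Unlike simple objects such as numbers and strings, lists are *mutable* in Python: a list can be changed in place, and the change is visible through every name that refers to that list.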

In [None]:
# Create a list and assign two references to that list
l1 = [1, 2, 3]
l2 = l1
l1, l2

In [None]:
# Update the first element of l1
l1[0] = "The dog ate my homework."
# Now show both lists.  
l1, l2

In [None]:
# compare to the similar actions with simple objects (which are immutable)
x = 123
y = x
x, y

In [None]:
x = "The dog ate my homework."
x, y

In [None]:
# compare with this version (reset the two lists first)
l1 = [1, 2, 3]
l2 = l1
l1, l2

In [None]:
# Why was l2 changed last time but not this time?
l1 = ['a', 'b']
l1, l2
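
One way to read the cells above is to separate *mutating* a list from *rebinding* a name. The following is a minimal sketch of that distinction; the names `original`, `alias` and `separate` are hypothetical and chosen only for illustration.

```python
# Hypothetical names for illustration: original, alias, separate
original = [1, 2, 3]
alias = original         # a second name for the SAME list object
separate = original[:]   # a new list holding the same elements (shallow copy)

original[0] = 'changed'  # mutation: the one shared list is changed in place

print(alias is original)     # True  - both names refer to one object
print(separate is original)  # False - the slice created a different object
print(alias)                 # shows the change
print(separate)              # does not show the change

original = ['a', 'b']    # rebinding: original now names a brand-new list
print(alias)             # alias still refers to the earlier list
```

Assigning `l2 = l1` creates an alias, not a copy. A full slice such as `original[:]` (or `list(original)`) is one common way to get an independent shallow copy.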

### Sample List/Sequence Functions/Methods

In [None]:
# append
l = [1, 2, 3, 'cow']
l.append(0)
l

In [None]:
# pop
x = l.pop()
l, x

In [None]:
l.remove?

In [None]:
# remove - try l.remove? and enter (to see the help)
l.remove(3)
l

In [None]:
#sort - Note that in the default version (with 'cow' in the list), you'll get an error
#  since the default sorting can't process numbers and strings together.  We will revisit
#  this limitation later.
l.sort()
l
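
As a sketch of one possible workaround for the mixed-type sorting error (not necessarily the approach revisited later in the course), a `key` function can convert every element to a common type before comparing. Here `str` is used as the key, so numbers are ordered by their text form. The list `mixed` and the other names are made up for illustration.

```python
# Illustrative data (not from the notebook)
mixed = [3, 'cow', 1, 'ant', 2]

# The default ordering cannot compare int with str, so this raises TypeError
try:
    sorted(mixed)
    sort_failed = False
except TypeError as err:
    sort_failed = True
    print("default sort failed:", err)

# Assumption: comparing the text form of each element is acceptable here
by_text = sorted(mixed, key=str)
print(by_text)
```

Sorting by text form is only a convenience: under this key, `10` would sort before `2`, because the strings are compared character by character.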

In [None]:
l2 = [1, 22, 3, 9, 2]
l2.sort()
l2

In [None]:
l

In [None]:
# reverse
l.reverse()
l

### Range Objects and Lists

A nice (short) discussion on Python 3 iterators - https://stackoverflow.com/questions/22147757/iterators-in-python-3

In [None]:
# syntax: range([start], stop [, step])
# the brackets mean that start and step are optional arguments
r = range(10)
r, list(r)

In [None]:
r = range(7, 24, 2)
r, list(r)
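
A range object is a sequence in its own right: it computes its values on demand rather than storing them. A small sketch of what can be done without converting it to a list first:

```python
r = range(7, 24, 2)   # 7, 9, 11, ..., 23

print(len(r))         # 9 values
print(r[0], r[-1])    # 7 23 - indexing works as with lists
print(15 in r)        # True - membership test
print(list(r[:3]))    # [7, 9, 11] - a slice of a range is another range
```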

In [None]:
# Range objects are often used as iterators for loop operations
for j in range(5):
    print(j)

In [None]:
# (named s rather than str, to avoid shadowing Python's built-in str type)
s = "The dog ate my homework."
for j in range(len(s)):
    print(s[j])
# What exactly does the expression range(len(s)) return?

In [None]:
# Note that since strings are sequences, they also have iterators
for char in s:
    print(char)

In [None]:
l = ["one", 2, "three", 4, 'Five']
for j in range(len(l)):
    print(l[j])

In [None]:
# Note that list objects also have built-in iterators
for item in l:
    print(item)
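
When both the position and the item are needed, the built-in `enumerate` is a common middle ground between the index-based and iterator-based loops. A brief sketch; `items` and `pairs` are illustrative names:

```python
items = ["one", 2, "three", 4, 'Five']

pairs = []
for position, item in enumerate(items):
    # position counts from 0, item is the element at that position
    pairs.append((position, item))
    print(position, item)
```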

## Introduction to List Comprehensions
This is an **important** concept in the Python world -- comprehensions are your friends!

In [None]:
# Define a matrix
m = [ [1, 2, 3]
    , [4, 5, 6]
    , [7, 8, 9]
    ]
# Now, m is a list of 3 lists
# Equivalent to m = [[1, 2, 3], [4,5,6],[7,8,9]] (A list of three three-element lists)
m

In [None]:
# Or in more familiar matrix form:
for r in m:
    print(r)

In [None]:
# To extract the third row (as a list)
m[2]

In [None]:
# the second element of the third row
m[2][1]

In [None]:
# extract the third column using a standard programmatic approach:
c = []
for i in [0, 1, 2]:
    c.append(m[i][2])
c

In [None]:
# more generally (works with any length (number of rows) matrix):
c = []
for i in range(len(m)):
    c.append(m[i][2])
c

In [None]:
# or, by using the list iterator rather than an index:
c = []
for r in m:
    c.append(r[2])
c

In [None]:
# or, equivalently - use a list comprehension
c = [r[2] for r in m]
c

In [None]:
# or
c = [m[i][2] for i in range(len(m))]
c

In [None]:
# to extract the diagonal (upper left to lower right)
[m[i][i] for i in [0, 1, 2]]

In [None]:
# more generally ... (what does this mean "more generally?")
[m[i][i] for i in range(len(m))]

In [None]:
# even elements of the second column
[r[1] for r in m if r[1] % 2 == 0]
# added an "if condition" to the comprehension

In [None]:
# odd elements of the second column
[r[1] for r in m if r[1] % 2]
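
A comprehension with an `if` condition is shorthand for a loop that appends only when the condition holds. The sketch below writes the "even elements of the second column" example both ways for comparison; the names `evens_loop` and `evens_comp` are illustrative.

```python
m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Explicit loop: test each row, append when the condition is true
evens_loop = []
for r in m:
    if r[1] % 2 == 0:
        evens_loop.append(r[1])

# Equivalent comprehension: expression, then the loop, then the condition
evens_comp = [r[1] for r in m if r[1] % 2 == 0]

print(evens_loop, evens_comp)
```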

In [None]:
# 18 random die rolls
import random
[random.randint(1, 6) for i in range(18)]
# note that the loop variable (i) only drives the iteration and isn't used in the expression.

In [None]:
# 100 "craps" rolls (rolling a pair of dice in each roll)
rolls = [[random.randint(1,6), random.randint(1,6)] for i in range(100)]
rolls

In [None]:
# Or, using Numpy (which is vectorized)
import numpy as np
rolls = [np.random.randint(1, 7, 2) for i in range(100)]
rolls
# Notice that NumPy's randint uses [low, high) while random.randint uses [low, high] -- look at the help to see this.

In [None]:
# Suppose you'd like to estimate the probability of rolling a '7' ...
#  (true prob. is 6/36=.1667)
import numpy as np
obs = 50000
p = len([r for r in [np.random.randint(1, 7, 2) for i in range(obs)] if sum(r) == 7])/obs
#
# Or, if you want to see it step-by-step - remember, think inside-to-outside, left-to-right.
#rolls = [np.random.randint(1, 7, 2) for i in range(obs)]
#sevens = [r for r in rolls if sum(r) == 7]
#p = len(sevens)/obs

"Estimate of prob. based on {:,d} samples: {:.4f}".format(obs,p)

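The stated true probability of 6/36 can be checked by listing every outcome instead of sampling. This sketch uses a comprehension with two `for` clauses, a form not used elsewhere in this notebook; `outcomes`, `sevens` and `p_exact` are illustrative names.

```python
# All 36 equally likely (die1, die2) outcomes
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]

# Keep only the outcomes whose two dice sum to 7
sevens = [o for o in outcomes if sum(o) == 7]

p_exact = len(sevens) / len(outcomes)
print(len(sevens), len(outcomes), round(p_exact, 4))
```
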
### Sample List-based Data Structure and Accessing/Processing/Comprehensions

In [None]:
# Slide 16 of 03 Introduction to Python.pptx
#
# creating a list to define a person
person = ["Tom Howard", 54, 6.0]

# creating a list of lists to define a team
people = [
    ["Tom Howard",          54,  6.0],
    ["Jane Grimm",          19,  4.9],
    ["Sam Brown",           25,  6.2],
    ["Sarah Joan Spade",    26, 5.25],
    ["Blaine Jones",        62,  5.8],
    ["Devin Callahan",      32, 5.92],
]

In [None]:
person

In [None]:
people

In [None]:
# How many people on the team
len(people)

In [None]:
# Print each person's name and age
for p in people:
    print("{:} is {:} years old".format(p[0], p[1]))
# In this statement, each p represents a "record" in the dataset and
# each element of p (it is a list) is an "attribute"

In [None]:
# Create a list of all names
[p[0] for p in people]

In [None]:
# Create a list of all last names
[p[0].split()[-1] for p in people]

In [None]:
# Create a list of all ages
[p[1] for p in people]

In [None]:
# Compute the average age
sum([p[1] for p in people])/float(len(people))

In [None]:
# Using a more human-friendly format
"The average age of the team members is {:.1f} years.".format(
    sum([p[1] for p in people])/float(len(people))
)
# Note that the previous examples create anonymous objects -- nothing
# persists after displaying the expressions and the memory is 
# marked for garbage collection.

In [None]:
# Find the oldest person.  Go through it step-by-step, one cell at a time,
# to see the progress.
# Step 1: the max age
ages =[p[1] for p in people] 
max(ages)

In [None]:
# which is max? - 62
ages.index(62)

In [None]:
# who - person 4
people[4][0]

In [None]:
# all together
people[[p[1] for p in people].index(max([p[1] for p in people]))][0]

In [None]:
# Find the youngest person
people[[p[1] for p in people].index(min([p[1] for p in people]))][0]

In [None]:
# Suppose that we had defined variables to store the "column numbers" -- the
#  indices into the internal lists for the given data items
name   = 0
age    = 1
height = 2
# Now, we can use these instead of the literals in all of the expressions
# For example.  
people[[p[age] for p in people].index(max([p[age] for p in people]))][name]
# now it's a little clearer that we're looking for the name of the oldest person
# Further, if we add or remove columns from the data set, we can simply update the
# index variables and all of the expression code will still work as expected.
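
As an alternative sketch to the `index(max(...))` expression, the built-in `max` and `min` accept a `key` function and return the whole record directly. This is offered as one possible formulation, not as the notebook's method; `oldest` and `youngest` are illustrative names.

```python
people = [
    ["Tom Howard",          54,  6.0],
    ["Jane Grimm",          19,  4.9],
    ["Sam Brown",           25,  6.2],
    ["Sarah Joan Spade",    26, 5.25],
    ["Blaine Jones",        62,  5.8],
    ["Devin Callahan",      32, 5.92],
]
name, age, height = 0, 1, 2

# key tells max/min which attribute of each record to compare
oldest = max(people, key=lambda p: p[age])
youngest = min(people, key=lambda p: p[age])

print(oldest[name], youngest[name])
```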


#### Student question - what if there are multiple people with the same min/max age?

In [None]:
# What if multiple people have the same min/max age?
people = [
    ["Tom Howard",          19,  6.0],
    ["Jane Grimm",          19,  4.9],
    ["Sam Brown",           25,  6.2],
    ["Sarah Joan Spade",    26, 5.25],
    ["Blaine Jones",        62,  5.8],
    ["Devin Callahan",      62, 5.92],
]

In [None]:
# Our previous statement:
people[[p[1] for p in people].index(max([p[1] for p in people]))][0]

In [None]:
# Only finds one of the two.  Why and which one?
people.index?

In [None]:
# Need something that supports multiple occurrences.  One (of many possible) example(s) is:
[p[0] for p in people if p[1] == max([p[1] for p in people])]

In [None]:
# min age
[p[0] for p in people if p[1] == min([p[1] for p in people])]
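
In the two expressions above, `max(...)` and `min(...)` are re-evaluated for every person in the list. A sketch of the same idea with the extreme values computed once up front; `oldest_age`, `youngest_age`, `oldest` and `youngest` are illustrative names.

```python
people = [
    ["Tom Howard",          19,  6.0],
    ["Jane Grimm",          19,  4.9],
    ["Sam Brown",           25,  6.2],
    ["Sarah Joan Spade",    26, 5.25],
    ["Blaine Jones",        62,  5.8],
    ["Devin Callahan",      62, 5.92],
]

ages = [p[1] for p in people]
oldest_age = max(ages)     # computed once
youngest_age = min(ages)   # computed once

# Every person whose age matches, so ties are all reported
oldest = [p[0] for p in people if p[1] == oldest_age]
youngest = [p[0] for p in people if p[1] == youngest_age]

print(oldest, youngest)
```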

### Dictionaries

In [None]:
# A dictionary is similar to a list, but uses a 'key' rather than an integer
# index to specify 'location' in the dictionary.
# Dictionaries are very useful, but there's usually a bit of a learning curve
d = {"a": 123, "b": 345, "c": 789}
d

In [None]:
d['a'], d['b'], d['c']

In [None]:
my_key = 'a'
d[my_key]

In [None]:
for k in ['a', 'b','c']:
    print(d[k])

In [None]:
# all of the keys
d.keys()

In [None]:
# as a list
list(d.keys())

In [None]:
# iterator -- note that iterating over a dictionary yields its keys
for k in d:
    print(d[k])

In [None]:
d.values()

In [None]:
list(d.values())
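
Alongside `keys()` and `values()`, dictionaries also provide `items()`, which yields each key together with its value. A short sketch; `pairs` is an illustrative name.

```python
d = {"a": 123, "b": 345, "c": 789}

# Each item is a (key, value) tuple, unpacked here into two loop variables
for key, value in d.items():
    print(key, value)

# As a list of tuples (dictionaries keep insertion order in Python 3.7 and later)
pairs = list(d.items())
print(pairs)
```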

In [None]:
# a dictionary of lists
dl = { 
     'one' : [1, 2, 3, 4, 5]
    ,'two' : [6, 7, 8, 9, 10]
    ,'three' : [127, 96, 455, 32, 5] 
}
dl

In [None]:
for key in dl:
    print(dl[key])

In [None]:
for key in dl:
    for e in dl[key]:
        print(e)

In [None]:
student = {'name': 'Joe', 'class':'sr', 'grade':92}
student

In [None]:
student['grade']

In [None]:
# Nested dictionaries
students = {
    'Jane Doe'   : {'ID':'b0001','Gender': 'F','HW1':95,'HW2':87, 'HW3':92,'Exam1': 88,'Exam2':93,'FinalExam':90},
    'John Blue'  : {'ID':'b0002','Gender': 'M','HW1':55,'HW2':76, 'HW3':89,'Exam1': 77,'Exam2':82,'FinalExam':80},
    'Kim Tester' : {'ID':'b0003','Gender': 'F','HW1':80,'HW2':75, 'HW3':65,'Exam1': 70,'Exam2':75,'FinalExam':80},
    'Larry Black': {'ID':'b0004','Gender': 'M','HW1':90,'HW2':90, 'HW3':92,'Exam1': 95,'Exam2':85,'FinalExam':94},
    'Susan White': {'ID':'b0005','Gender': 'F','HW1':65,'HW2':52, 'HW3':85,'Exam1': 45,'Exam2':80,'FinalExam':82}
    }


In [None]:
# show all students
for student in students:
    print (student)

In [None]:
# Final Exams
for student in students:
    print ("{:} made a {:d} on the final exam".format(student, students[student]['FinalExam']))

In [None]:
# Scores on the final exam using a comprehension
fe = [students[k]['FinalExam'] for k in students]
fe

In [None]:
# average score
sum(fe)/len(students)

In [None]:
# All together
sum([students[k]['FinalExam'] for k in students])/len(students)

In [None]:
# Human-friendly
"The average score on the final exam (for {:} students) is {:.1f}".format(
    len(students), sum([students[k]['FinalExam'] for k in students])/len(students))
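
Comprehensions can also build dictionaries. The sketch below computes each student's homework average with a dictionary comprehension. It uses a trimmed copy of the `students` data (two students, scores only) to stay short; `hw_keys` and `hw_avg` are illustrative names.

```python
# Trimmed copy of the students dictionary, for illustration only
students = {
    'Jane Doe' : {'HW1': 95, 'HW2': 87, 'HW3': 92},
    'John Blue': {'HW1': 55, 'HW2': 76, 'HW3': 89},
}
hw_keys = ['HW1', 'HW2', 'HW3']

# {key_expression: value_expression for ... in ...} builds a new dictionary
hw_avg = {
    name: sum(record[k] for k in hw_keys) / len(hw_keys)
    for name, record in students.items()
}

for name in hw_avg:
    print("{:} has a homework average of {:.1f}".format(name, hw_avg[name]))
```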

See the companion Spyder Python script "lists_dictionaries.py" for more data structure examples.