Control Structures
------------------

We've spent some time going into detail about some of the data types and structures available in Python. It's now time to talk about how to navigate through this data and use it to make decisions. Traversing data and making decisions based on it is a common aspect of every programming language, known as control flow. Python provides rich control flow, with a lot of conveniences for power users. Here, we're just going to talk about the basics. To learn more, please [consult the documentation](https://docs.python.org/3/tutorial/controlflow.html). 

A common theme throughout this discussion of control structures is the notion of a "block of code." Blocks of code are **demarcated by a specific level of indentation**, and are typically introduced by a control structure statement that ends with a colon, `:`. We'll see examples below. 

Finally, note that control structures can be nested arbitrarily, depending on the tasks you're trying to accomplish. 
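
As a rough illustration of blocks and nesting, here is a minimal sketch. The names `readings`, `threshold`, and `large` are made up for this example and do not come from the rest of the text. Each colon opens a block, and each extra level of indentation marks a block nested inside another.

```python
# Hypothetical data, chosen only to illustrate indentation
readings = [3, 8, 12, 5]
threshold = 6
large = []

for value in readings:       # the colon opens the loop's block
    if value > threshold:    # a nested block, indented one more level
        large.append(value)  # runs only when the condition holds

# Back at the left margin: this line runs once, after the loop ends
print(large)  # [8, 12]
```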

### while statements:

`while` loops keep iterating as long as a given condition remains true. For example, the following loop counts up and stops once `i` becomes larger than 5:

In [95]:
i=0
while i<=5:
    print(20*i)
    i += 1

0
20
40
60
80
100


Or consider the following example: The tea starts at 115 degrees Fahrenheit. You want it at 110 degrees. A chip of ice turns out to lower the temperature one degree every second. You test the temperature each time, and also print out the temperature before reducing the temperature. In Python you could write and run the code below:

In [96]:
import time # This is just to use the time.sleep function

temperature = 115  
while temperature > 110: # first while loop code
    print(temperature)
    time.sleep(1.0) # Wait for 1 sec
    temperature -= 1
     
print('The tea is cool enough.')

115
114
113
112
111
The tea is cool enough.


And here is a simplified simulator that starts with an amount of money in the bank, and then examines the effect of withdrawing a specific amount of money per year, until you run out of money. Notice that the loop will keep running forever if the withdrawal is small enough that the interest earned replaces it, because the balance then never drops to zero. (We will deal with this issue next.)

In [97]:
money_in_bank = 1000
interest = 6
year = 2017
widthdrawal_per_period = 200

while money_in_bank>0:
    print(f"At the beginning of {year} you have ${money_in_bank:8.2f}.")
    money_in_bank = money_in_bank - widthdrawal_per_period
    money_in_bank = money_in_bank * (1 + interest/100)
    year = year + 1
    print(f"At the end of {year} you have ${money_in_bank:8.2f}.")
    print("-----------------")

print("You have no money left!")

At the beginning of 2017 you have $ 1000.00.
At the end of 2018 you have $  848.00.
-----------------
At the beginning of 2018 you have $  848.00.
At the end of 2019 you have $  686.88.
-----------------
At the beginning of 2019 you have $  686.88.
At the end of 2020 you have $  516.09.
-----------------
At the beginning of 2020 you have $  516.09.
At the end of 2021 you have $  335.06.
-----------------
At the beginning of 2021 you have $  335.06.
At the end of 2022 you have $  143.16.
-----------------
At the beginning of 2022 you have $  143.16.
At the end of 2023 you have $  -60.25.
-----------------
You have no money left!


### Break and Continue: 

These two statements are used to modify iteration of loops. 
- `break` is used to *exit immediately* the *innermost _loop_* in which it appears. 
- In contrast, `continue` skips the rest of the current iteration and goes on to the *next iteration of the same loop*.
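
To make the difference concrete, here is a small sketch that runs the same counting loop twice, once with `break` and once with `continue`. The variable names and the cutoff value 4 are arbitrary choices for illustration.

```python
# Variant 1: break leaves the loop entirely when n reaches 4
n = 0
seen_with_break = []
while n < 7:
    n += 1
    if n == 4:
        break
    seen_with_break.append(n)

# Variant 2: continue skips only the iteration where n is 4
n = 0
seen_with_continue = []
while n < 7:
    n += 1
    if n == 4:
        continue
    seen_with_continue.append(n)

print(seen_with_break)     # [1, 2, 3]
print(seen_with_continue)  # [1, 2, 3, 5, 6, 7]
```

Note that in the second variant the counter is incremented before the `continue`. If the increment came after it, the loop would get stuck at the same value.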

For example, consider our previous example. The loop will keep running forever if the withdrawal is small enough that the interest earned replaces it. To avoid this infinite loop, we can add an extra check in the code that tests whether the year is above a certain limit, and stops execution of the loop at that point.

In [98]:
money_in_bank = 1000
interest = 6
year = 2017

# Try also the values 56.6 and 56.7 and see its behavior
widthdrawal_per_period = 50 

while money_in_bank>0:
    print(f"At the beginning of {year} you have ${money_in_bank}.")
    money_in_bank = money_in_bank - widthdrawal_per_period
    money_in_bank = money_in_bank * (1 + interest/100)
    year = year + 1
    if year > 2117:
        print("I am pretty sure you will not be alive by then")
        break
    if year > 2020:
        continue
    print(f"At the end of {year} you have ${money_in_bank}.")
    print("-----------------")

print("You have no money left (or you are dead)!")

At the beginning of 2017 you have $1000.
At the end of 2018 you have $1007.0.
-----------------
At the beginning of 2018 you have $1007.0.
At the end of 2019 you have $1014.4200000000001.
-----------------
At the beginning of 2019 you have $1014.4200000000001.
At the end of 2020 you have $1022.2852000000001.
-----------------
At the beginning of 2020 you have $1022.2852000000001.
At the beginning of 2021 you have $1030.6223120000002.
At the beginning of 2022 you have $1039.4596507200004.
At the beginning of 2023 you have $1048.8272297632004.
At the beginning of 2024 you have $1058.7568635489924.
At the beginning of 2025 you have $1069.282275361932.
At the beginning of 2026 you have $1080.439211883648.
At the beginning of 2027 you have $1092.265564596667.
At the beginning of 2028 you have $1104.801498472467.
At the beginning of 2029 you have $1118.0895883808153.
At the beginning of 2030 you have $1132.1749636836641.
At the beginning of 2031 you have $1147.105461504684.
At the beginning 

In [99]:
import time
temperature = 143 
while temperature > 110:     # first while loop code
    temperature = temperature - 1
    if temperature % 5 != 0: # If the temperature is not divisible by 5
        continue             # We keep running the loop, but will not print
                             # the temperature and have a delay
    time.sleep(0.5)
    print(temperature)
     
print('The tea is cool enough.')

140
135
130
125
120
115
110
The tea is cool enough.


### for Statements:

**See also LPTHW, Exp 32.**

`for` statements are a convenient way to iterate through the values contained in a data structure. The loop goes through the elements of the data structure one at a time, assigning each element in turn to a variable. The code block associated with the `for` statement (or `for` loop) is then evaluated with this value.

In [100]:
set_a = {1, 2, 3, 4}
for i in set_a:
    print(f"{i} squared is: {i*i}")

1 squared is: 1
2 squared is: 4
3 squared is: 9
4 squared is: 16
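
Sets do not promise any particular iteration order, so the ascending order in the output above should not be relied on. A minimal sketch, assuming you want a predictable order: wrap the set in the built-in `sorted()`, which returns a sorted list. The name `squares_in_order` is made up for this example.

```python
set_a = {3, 1, 4, 2}

squares_in_order = []
for i in sorted(set_a):           # sorted() returns a list in ascending order
    squares_in_order.append(i * i)

print(squares_in_order)  # [1, 4, 9, 16]
```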


In [101]:
print("a more complex block")
set_a = {1, 2, 3, 4, 5, 6}
for i in set_a:
    # print(i)
    if i >= 3:
        print(f"==> {i} squared is: {i*i}")
    else:
        print(f"We will ignore {i}")

a more complex block
We will ignore 1
We will ignore 2
==> 3 squared is: 9
==> 4 squared is: 16
==> 5 squared is: 25
==> 6 squared is: 36


In [102]:
# Look for a person with many possible nicknames
names            = ['amy', 'brandon', 'charlotte', 'joe']
joseph_nicknames = ['joe', 'joey', 'joseph', 'giuseppe']

for n in names:                # Look through all names in list
    if n in joseph_nicknames:  # Check if the name is one of Joseph's nicknames
        print(f"Found a version of Joseph, called {n}.")

Found a version of Joseph, called joe.


In [103]:
# To build some intuition on how everything is working, 
# let's just run some nested for loops and print the content
list_a     = ['amy', 'brandon', 'charlotte', 'joseph']
candidates = ['joe', 'joey', 'joseph', 'giuseppe']
letters    = ['a', 'b', 'c']
for name in list_a:
    for cand in candidates:
        for l in letters:
            print(f"Here, name={name} and cand={cand} and letter={l}")

Here, name=amy and cand=joe and letter=a
Here, name=amy and cand=joe and letter=b
Here, name=amy and cand=joe and letter=c
Here, name=amy and cand=joey and letter=a
Here, name=amy and cand=joey and letter=b
Here, name=amy and cand=joey and letter=c
Here, name=amy and cand=joseph and letter=a
Here, name=amy and cand=joseph and letter=b
Here, name=amy and cand=joseph and letter=c
Here, name=amy and cand=giuseppe and letter=a
Here, name=amy and cand=giuseppe and letter=b
Here, name=amy and cand=giuseppe and letter=c
Here, name=brandon and cand=joe and letter=a
Here, name=brandon and cand=joe and letter=b
Here, name=brandon and cand=joe and letter=c
Here, name=brandon and cand=joey and letter=a
Here, name=brandon and cand=joey and letter=b
Here, name=brandon and cand=joey and letter=c
Here, name=brandon and cand=joseph and letter=a
Here, name=brandon and cand=joseph and letter=b
Here, name=brandon and cand=joseph and letter=c
Here, name=brandon and cand=giuseppe and letter=a
Here, name=bra

In [104]:
print("dictionaries let you iterate through keys, values, or both")
phones = {
    "Panos": "212-998-0803",
    "Maria": "656-233-5555",
    "John":  "693-232-5776",
    "Jake": "415-794-3423"
}

dictionaries let you iterate through keys, values, or both


In [105]:
# Iterate keys
print("Iterating over keys")
for k in phones.keys():
    print(k)

Iterating over keys
Panos
Maria
John
Jake


In [106]:
# Iterate values
print("Iterating over values")
for v in phones.values():
    print(v)

Iterating over values
212-998-0803
656-233-5555
693-232-5776
415-794-3423


In [107]:
print("Iterating over both keys and values")
# Items returns *tuples* that correspond to key-value pairs
# ("Panos", "212-998-0803"), ("Maria", "656-233-5555"), etc.
for (k,v) in phones.items():
    print(k,v)

Iterating over both keys and values
Panos 212-998-0803
Maria 656-233-5555
John 693-232-5776
Jake 415-794-3423


In [108]:
# Observe that .items() gives us (key, value) pairs 
phones.items()

dict_items([('Panos', '212-998-0803'), ('Maria', '656-233-5555'), ('John', '693-232-5776'), ('Jake', '415-794-3423')])

In [109]:
nba_teams = ["Atlanta Hawks", "Boston Celtics", "Brooklyn Nets", 
             "Charlotte Hornets", "Chicago Bulls", "Cleveland Cavaliers", 
             "Dallas Mavericks", "Denver Nuggets", "Detroit Pistons", 
             "Golden State Warriors", "Houston Rockets", "Indiana Pacers", 
             "LA Clippers", "Los Angeles Lakers", "Memphis Grizzlies", 
             "Miami Heat", "Milwaukee Bucks", "Minnesota Timberwolves", 
             "New Orleans Pelicans", "New York Knicks", "Oklahoma City Thunder", 
             "Orlando Magic", "Philadelphia 76ers", "Phoenix Suns", "Portland Trail Blazers", 
             "Sacramento Kings", "San Antonio Spurs", "Toronto Raptors", "Utah Jazz"]
print("The list contains", len(nba_teams), "teams:")


i=0  # Initialize a loop counter
for team in nba_teams:
    
    print(i, team) # Print the counter and the team name
    i+=1           # Increment the counter

The list contains 29 teams:
0 Atlanta Hawks
1 Boston Celtics
2 Brooklyn Nets
3 Charlotte Hornets
4 Chicago Bulls
5 Cleveland Cavaliers
6 Dallas Mavericks
7 Denver Nuggets
8 Detroit Pistons
9 Golden State Warriors
10 Houston Rockets
11 Indiana Pacers
12 LA Clippers
13 Los Angeles Lakers
14 Memphis Grizzlies
15 Miami Heat
16 Milwaukee Bucks
17 Minnesota Timberwolves
18 New Orleans Pelicans
19 New York Knicks
20 Oklahoma City Thunder
21 Orlando Magic
22 Philadelphia 76ers
23 Phoenix Suns
24 Portland Trail Blazers
25 Sacramento Kings
26 San Antonio Spurs
27 Toronto Raptors
28 Utah Jazz
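
The manual counter above works, but Python's built-in `enumerate` can maintain the counter for you. The sketch below uses a shortened, three-team list (`teams` and `numbered` are names chosen for this example) to keep the output brief.

```python
teams = ["Atlanta Hawks", "Boston Celtics", "Brooklyn Nets"]  # shortened list

numbered = []
for i, team in enumerate(teams):  # yields (index, element) pairs, starting at 0
    numbered.append((i, team))
    print(i, team)
```

Passing `start=1`, as in `enumerate(teams, start=1)`, begins the count at 1 instead of 0.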


### Exercise

* print the names of the people from the dictionary below, by iterating through the keys
* print the age of each person, by iterating through the keys, and then looking up the "YOB" entry.
* print the names of people born after 1980
* print the number of children for each person. You need to check if the "Children" list exists in the dictionary.

In [110]:
data = {
        "Foster": {
            "Job": "Professor", 
            "YOB": 1965, 
            "Children": ["Hannah"],
            "Awards": ["Best Teacher 2014", "Best Researcher 2015"],
            "Salary": 120000
        }, 
        "Joe": {
            "Job": "Data Scientist", 
            "YOB": 1981,
            "Salary": 200000
        },
        "Maria": { 
            "Job": "Software Engineer", 
            "YOB": 1993, 
            "Children": [],
            "Awards": ["Dean's List 2013", "Valedictorian 2011", "First place in Math Olympiad 2010"]
        }, 
        "Panos": { 
            "Job": "Professor", 
            "YOB": 1976, 
            "Children": ["Gregory", "Anna"]
        },
    }

In [111]:
# Print the names of people in the data who were born after 1980
for name in data.keys():
    age = 2018 - data[name]['YOB']  # Compute each person's age in 2018 from their year of birth
    
    # alternate: 
    # if data[name]['YOB']>1980:
    if age<38:                      # Print only if < 38, i.e. born after 1980
        print(f"Name: {name}, Age: {age}")  # Print the results
    

Name: Joe, Age: 37
Name: Maria, Age: 25


In [112]:
# Print how many children everyone has, including if they have none
for name in data.keys():              # Go through the master dictionary key by key
    
    if 'Children' not in data[name]:  # If a person's dictionary does not have a "Children" key
        data[name]['Children'] = []   # Give them an empty list of children
    
    children   = data[name]['Children']          # Retrieve everyone's children
    n_children = len(children)                   # Count the children
    
    print(f"{name:6s} has {n_children} children." ) # Print number of children

Foster has 1 children.
Joe    has 0 children.
Maria  has 0 children.
Panos  has 2 children.
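
The cell above adds an empty `Children` list to every dictionary that lacks one, which changes the data. If you would rather leave the data untouched, one alternative is the dictionary method `.get(key, default)`, which returns the default when the key is missing. This is only a sketch: `people` is a cut-down, hypothetical copy of the data, and `counts` is a name invented here.

```python
# A cut-down copy of the data, for illustration only
people = {
    "Foster": {"YOB": 1965, "Children": ["Hannah"]},
    "Joe": {"YOB": 1981},
}

counts = {}
for name in people:
    children = people[name].get("Children", [])  # [] if the key is missing
    counts[name] = len(children)

print(counts)                        # {'Foster': 1, 'Joe': 0}
print("Children" in people["Joe"])   # False: the original data is unchanged
```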


In [113]:
# Give everyone a new child named Katherine
for name in data.keys():
    data[name]['Children'].append("Katherine") 

print(data)

{'Foster': {'Job': 'Professor', 'YOB': 1965, 'Children': ['Hannah', 'Katherine'], 'Awards': ['Best Teacher 2014', 'Best Researcher 2015'], 'Salary': 120000}, 'Joe': {'Job': 'Data Scientist', 'YOB': 1981, 'Salary': 200000, 'Children': ['Katherine']}, 'Maria': {'Job': 'Software Engineer', 'YOB': 1993, 'Children': ['Katherine'], 'Awards': ["Dean's List 2013", 'Valedictorian 2011', 'First place in Math Olympiad 2010']}, 'Panos': {'Job': 'Professor', 'YOB': 1976, 'Children': ['Gregory', 'Anna', 'Katherine']}}


In [114]:
# Give Joe only a child called Joe Jr
data['Joe']['Children'].append("Joe Jr") 

# Create a new key for Joe's dictionary, called "Pets"
data['Joe']['Pets'] = ''  # Initialize the value of Joe's pets to be an empty string
                          # (Can also use 'NA', or empty list [])

# Check on our modifications
data['Joe']

{'Job': 'Data Scientist',
 'YOB': 1981,
 'Salary': 200000,
 'Children': ['Katherine', 'Joe Jr'],
 'Pets': ''}

Let's do some more practice with dictionary and list modification.

In [115]:
# Create a new dictionary
practice = {'thing1': [],
            'thing2': 'summertime'}

# Add new k,v pair
practice['thing3'] = {'a':1, 'b':2} 
practice

{'thing1': [], 'thing2': 'summertime', 'thing3': {'a': 1, 'b': 2}}

In [116]:
# Overwrite k,v pair that already exists
practice['thing3'] = ['ice cream', 'watermelon'] 
practice

{'thing1': [], 'thing2': 'summertime', 'thing3': ['ice cream', 'watermelon']}

In [117]:
# Append a list, ['corn']
practice['thing3'].append(['corn'])
practice

{'thing1': [],
 'thing2': 'summertime',
 'thing3': ['ice cream', 'watermelon', ['corn']]}

In [118]:
# Oops! I didn't mean to do this. Remove the third item of 'thing3'
practice['thing3'].pop(2)
practice

{'thing1': [], 'thing2': 'summertime', 'thing3': ['ice cream', 'watermelon']}

In [119]:
# Append a string, 'corn'
practice['thing3'].append('corn')
practice

{'thing1': [],
 'thing2': 'summertime',
 'thing3': ['ice cream', 'watermelon', 'corn']}

In [120]:
# Append two things
practice['thing3'].extend(['barbecue', 'popsicles'])
practice

{'thing1': [],
 'thing2': 'summertime',
 'thing3': ['ice cream', 'watermelon', 'corn', 'barbecue', 'popsicles']}

**Answer**: <span style="color:white">
'''Print the names of people in the data'''
for person in data.keys():
    print(person)
'''Print the names and age'''
for person in data.keys():
    age = 2018 - data[person]['YOB']
    print(f"{person:>6s} is {age} years old.")
'''Print the names of people born after 1980'''
for person in data.keys():
    if data[person]['YOB']>1980:
        print(person)
'''Print the number of children for each person'''
for person in data.keys():
    if 'Children' in data[person].keys():
        n_children = len(data[person]['Children'])
        if n_children==1:
            print(f"{person} has {n_children} child.")
        else:
            print(f"{person} has {n_children} children.")
    else:
        print(f"{person} may or may not have children.")

### Using Break/Continue with for loops

Let's see an example of using `break` and `continue` within a for loop.

In [121]:
nba_teams = ["Atlanta Hawks", "Boston Celtics", "Brooklyn Nets", "Charlotte Hornets", 
             "Chicago Bulls", "Cleveland Cavaliers", "Dallas Mavericks", "Denver Nuggets", 
             "Detroit Pistons", "Golden State Warriors", "Houston Rockets", "Indiana Pacers", 
             "LA Clippers", "Los Angeles Lakers", "Memphis Grizzlies", "Miami Heat", "Milwaukee Bucks",
             "Minnesota Timberwolves", "New Orleans Pelicans", "New York Knicks", "Oklahoma City Thunder", 
             "Orlando Magic", "Philadelphia 76ers", "Phoenix Suns", "Portland Trail Blazers", "Sacramento Kings", 
             "San Antonio Spurs", "Toronto Raptors", "Utah Jazz"]
print("The list contains", len(nba_teams), "teams")

The list contains 29 teams


We will now search through the list of teams, to find whether there is a team that contains the `looking_for` string. Try the variant with the `break` and with the `continue`, to see the difference.

In [122]:
looking_for = "Brooklyn"
for team in nba_teams:
    if looking_for in team: 
        print(f"We found the team: {team} containing {looking_for}.")
        print("We will stop searching now.")
        # we go out of the loop
        continue # we skip the remaining of the code in the nested block
    # else:
    print(team, "does not contain the string", looking_for)
    
print("Out of the loop!")

Atlanta Hawks does not contain the string Brooklyn
Boston Celtics does not contain the string Brooklyn
We found the team: Brooklyn Nets containing Brooklyn.
We will stop searching now.
Charlotte Hornets does not contain the string Brooklyn
Chicago Bulls does not contain the string Brooklyn
Cleveland Cavaliers does not contain the string Brooklyn
Dallas Mavericks does not contain the string Brooklyn
Denver Nuggets does not contain the string Brooklyn
Detroit Pistons does not contain the string Brooklyn
Golden State Warriors does not contain the string Brooklyn
Houston Rockets does not contain the string Brooklyn
Indiana Pacers does not contain the string Brooklyn
LA Clippers does not contain the string Brooklyn
Los Angeles Lakers does not contain the string Brooklyn
Memphis Grizzlies does not contain the string Brooklyn
Miami Heat does not contain the string Brooklyn
Milwaukee Bucks does not contain the string Brooklyn
Minnesota Timberwolves does not contain the string Brooklyn
New Orle

In [123]:
# Note that we can achieve the same thing as continue with 'if-else'
looking_for = "Brooklyn"
for team in nba_teams:
    if looking_for in team: 
        print(f"We found the team: {team} containing {looking_for}.")
        print("We will stop searching now.")
        # we go out of the loop
        # continue # we skip the remaining of the code in the nested block
    else:
        print(team, "does not contain the string", looking_for)
    
print("Out of the loop!")

Atlanta Hawks does not contain the string Brooklyn
Boston Celtics does not contain the string Brooklyn
We found the team: Brooklyn Nets containing Brooklyn.
We will stop searching now.
Charlotte Hornets does not contain the string Brooklyn
Chicago Bulls does not contain the string Brooklyn
Cleveland Cavaliers does not contain the string Brooklyn
Dallas Mavericks does not contain the string Brooklyn
Denver Nuggets does not contain the string Brooklyn
Detroit Pistons does not contain the string Brooklyn
Golden State Warriors does not contain the string Brooklyn
Houston Rockets does not contain the string Brooklyn
Indiana Pacers does not contain the string Brooklyn
LA Clippers does not contain the string Brooklyn
Los Angeles Lakers does not contain the string Brooklyn
Memphis Grizzlies does not contain the string Brooklyn
Miami Heat does not contain the string Brooklyn
Milwaukee Bucks does not contain the string Brooklyn
Minnesota Timberwolves does not contain the string Brooklyn
New Orle

Technically, we can simulate the use of `break` and `continue` with `if-else` statements, but their usage often makes the code easier to read. Consider the following example, where we want to search for a team, and if we find a team that matches, we want to see if they made it to the playoffs.

In [124]:
playoff = ["Atlanta Hawks", "Boston Celtics", "Charlotte Hornets","Cleveland Cavaliers", 
           "Dallas Mavericks", "Detroit Pistons","Golden State Warriors", "Houston Rockets", 
           "Indiana Pacers", "LA Clippers", "Memphis Grizzlies", "Miami Heat", "Oklahoma City Thunder", 
           "Portland Trail Blazers", "San Antonio Spurs", "Toronto Raptors"]
print("The list contains", len(playoff), "teams")

The list contains 16 teams


In [125]:
looking_for = "Knicks"

for team in nba_teams:
    #print(team)
    # If the team does not match, we continue searching
    # without executing the remaining code
    if looking_for not in team: 
        continue
 
    # If we have found a matching team, we check for their status in playoffs
    if team in playoff:
        print(team, "was in the playoffs!")
    else:
        print(team, "was not in the playoffs...")


New York Knicks was not in the playoffs...
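
For comparison, here is a sketch of the `break` variant of the earlier team search, which really does stop at the first match. It uses a shortened list, and the names `teams`, `checked`, and `found` are introduced only for this example.

```python
teams = ["Atlanta Hawks", "Boston Celtics", "Brooklyn Nets", "Charlotte Hornets"]
looking_for = "Brooklyn"

checked = []   # teams we looked at before stopping
found = None   # stays None if nothing matches

for team in teams:
    checked.append(team)
    if looking_for in team:
        found = team
        break  # stop searching: the remaining teams are never examined

print(found)    # Brooklyn Nets
print(checked)  # ['Atlanta Hawks', 'Boston Celtics', 'Brooklyn Nets']
```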


### Ranges of Integers:

Often it is convenient to define (and iterate through) ranges of integers. Python has a convenient `range` function that allows you to do just this.
- When `range(j)` is given only one parameter, it implicitly uses 0 as the starting point and takes steps of 1
- When `range(i,j)` is given two parameters, it uses `i` as the starting point and `j` as the end, and takes steps of 1
- When `range(i,j,k)` is given three parameters, `i` is the starting point, `j` is the end, and `k` is the step size
- Note that `range` is not inclusive on the upper end, i.e. `range(i,j)` will not include `j`

In [126]:
list(range(10)) # start at zero, < the specified ceiling value
                # range(10) <=> range(0,10)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [127]:
# range is convenient for use in loops
for i in range(10):
    print(f"{i} squared is {i*i}")

0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
6 squared is 36
7 squared is 49
8 squared is 64
9 squared is 81


In [128]:
# When the range command has two parameters, 
# it starts from the first parameter
# and finishes at the second 
print(list(range(5, 10))) # from the left value, < right value

[5, 6, 7, 8, 9]


In [129]:
# When range has a third argument, this is the "step" value
print(list(range(-5, 50, 5)) )# from the left value, to the middle value, incrementing by the right value

[-5, 0, 5, 10, 15, 20, 25, 30, 35, 40, 45]


In [130]:
# We can also go in reverse
print(list(range(50, -21, -5)) )

[50, 45, 40, 35, 30, 25, 20, 15, 10, 5, 0, -5, -10, -15, -20]
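
The cells above wrap `range` in `list(...)` only to display its contents. A `range` object can also be queried directly, without building a list. The short sketch below reuses the reverse range from the previous cell; the variable name `r` is chosen for this example.

```python
r = range(50, -21, -5)

print(len(r))             # 15 values in total
print(r[0], r[-1])        # 50 -20 (first and last value)
print(0 in r)             # True
print(-21 in r)           # False: the end point is never included
print(list(range(5, 5)))  # []: start equals end, so the range is empty
```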


#### Warning

Those that are already familiar with programming will tend to write code like this:

In [131]:
# Old style, using indexing for loops
names = ["Abe", "Bill", "Chris", "Dorothy", "Ellis"]
for i in range(0,len(names)):
    print(f"Retrieve index {i}, which is {names[i]}")

Retrieve index 0, which is Abe
Retrieve index 1, which is Bill
Retrieve index 2, which is Chris
Retrieve index 3, which is Dorothy
Retrieve index 4, which is Ellis


instead of 

In [132]:
# Pythonic style, use iterators
names = ["Abe", "Bill", "Chris", "Dorothy", "Ellis"]
for n in names:
    print(n)

Abe
Bill
Chris
Dorothy
Ellis


*Avoid* using the indexing style method for iterating through data structures. While technically both generate the same result, the "Pythonic" way of doing things is the latter: It is simpler, more readable, and less prone to errors. 
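
Index-based loops often creep in when two lists need to be walked in step. The built-in `zip` covers that case without indices. In this sketch the `ages` list is hypothetical data invented for the example.

```python
names = ["Abe", "Bill", "Chris"]
ages = [31, 45, 27]  # hypothetical values, for illustration only

pairs = []
for name, age in zip(names, ages):  # pairs up elements by position
    pairs.append(f"{name} is {age}")

print(pairs)  # ['Abe is 31', 'Bill is 45', 'Chris is 27']
```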

#### Exercise

* print your name 10 times (easy, peasy). 
* print on the screen a "triangle", by printing first "#", then "##", then "###", etc. Repeat 10 times.   
_Hint: The command `print(i*'#')` will print the character '#' a total of `i` times._

```
    #
    ##
    ###
    ####
    #####
    ######
    #######
    ########
    #########
    ##########
```

In [133]:
# Print Katherine 10 times
i=10
print(i*"Katherine ")

Katherine Katherine Katherine Katherine Katherine Katherine Katherine Katherine Katherine Katherine 


In [134]:
# Print Katherine 10 times, the FOR loop way 
for i in range(10):
    print(i, "Katherine")

0 Katherine
1 Katherine
2 Katherine
3 Katherine
4 Katherine
5 Katherine
6 Katherine
7 Katherine
8 Katherine
9 Katherine


In [135]:
# Print Katherine 10 times, the WHILE loop way
i = 0
while i<10:
    print(i, "Katherine")
    
    i+=1  # IMPORTANT!!!! Increment your counter to break the 'while' condition
          # If you don't, you could go infinitely

0 Katherine
1 Katherine
2 Katherine
3 Katherine
4 Katherine
5 Katherine
6 Katherine
7 Katherine
8 Katherine
9 Katherine


In [136]:
# Build ascending triangle
for i in range(1,11):   # 1 through 10, since the upper end is excluded
    print(i*'#')

#
##
###
####
#####
######
#######
########
#########
##########


In [137]:
# Do this with a while loop
i=1
while i<11:
    print(i*'#')
    i+=1

#
##
###
####
#####
######
#######
########
#########
##########


In [138]:
# Build descending triangle
for i in range(10,0, -1):
    print(i*'#')

##########
#########
########
#######
######
#####
####
###
##
#


In [139]:
# Have some fun with nested loops 

for j in [1,2]:               # Multipliers == 1*nr_hashtags, 2*nr_hashtags
    for i in range(1,6):      # Build ascending triangle
        print(i*j*'#')
    
    for i in range(5,0, -1):  # Build descending triangle
        print(i*j*'#')

#
##
###
####
#####
#####
####
###
##
#
##
####
######
########
##########
##########
########
######
####
##


**Answer:** <span style="color:white">
\# Print your name
for i in range(10):
    print("Katherine")
\# Print triangle
for i in range(1,11):
    print(i*"#")

List Comprehension
-------------------

The practical data scientist often faces situations where one list has to be transformed into another list, by transforming the values in the input list, filtering out certain undesired values, etc. List comprehensions are a natural, flexible way to perform these transformations on the elements in a list. 

The syntax of list comprehensions is based on the way mathematicians define sets and lists, a syntax that leaves it clear what the contents should be:

+ `S = {x² : x in {0 ... 9}}`

Python's list comprehensions give a very natural way to write statements just like these. It may look strange early on, but it becomes a very natural and concise way of creating lists, without having to write for-loops.

In [140]:
# This code below will create a list with the squares
# of the numbers from 0 to 9 
S = []              # we create an empty list
for i in range(10): # We iterate over all numbers from 0 to 9
    S.append(i*i)   # We add in the list the square of the number i
print(S)            # we print the list

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


In [141]:
# List comprehension way
S = [i*i for i in range(10)]
print(S)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


In [142]:
# OR:
S = [i**2 for i in range(10)]
print(S)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


In [143]:
# What about fifth powers greater than 50?
S = [i**5 for i in range(10) if i**5>50]
print(S)

[243, 1024, 3125, 7776, 16807, 32768, 59049]


Now let's do one more example:

`V` $= [2^0, 2^1, 2^2, \ldots, 2^{12}]$


In [144]:
V= []
for i in range(13):
    V.append(2**i)
print(V)

[1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]


In [145]:
# List comprehension way
V = [2**i for i in range(13)]
print(V)

[1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]


In [146]:
# Another simple example of adding 100 to numbers
l = [1,2,3,4]
[i+100 for i in l]

[101, 102, 103, 104]

In [147]:
# Recall that the following will just return the original list
[i for i in l]

[1, 2, 3, 4]

In [148]:
# Recall that if we iterate through a string, we will go letter by letter:
for i in 'long string':
    print(i)

l
o
n
g
 
s
t
r
i
n
g


In [149]:
# Therefore using list comprehension on a string returns a list of letters
[i for i in 'long string']

['l', 'o', 'n', 'g', ' ', 's', 't', 'r', 'i', 'n', 'g']

In [150]:
# Remember that we can join strings like so:
'_'.join([i for i in 'long string'])

'l_o_n_g_ _s_t_r_i_n_g'

In [151]:
# The join character can be anything we like. For example, it can be an empty string:
''.join([i for i in 'long string'])

'long string'

In [152]:
# The join character can be anything we like. Or, it can be some text
' ZZZ '.join([i for i in 'long string'])

'l ZZZ o ZZZ n ZZZ g ZZZ   ZZZ s ZZZ t ZZZ r ZZZ i ZZZ n ZZZ g'

### The *if* statement within a list comprehension

Now let's consider the following case:

+ `M = {x | x in S and x even}`

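**Note that the list comprehension for deriving M uses an `if` clause to filter out those values that aren't of interest**, keeping only the even elements of `S`. (In the cells below, `S` still holds the fifth powers greater than 50 computed earlier, not the squares.)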

In [153]:
M = []
for i in S:         # iterate through all elements in S
    if i%2 == 0:    # if i is an even number
        M.append(i) # ... add it to the list
print(M)

[1024, 7776, 32768]


In [154]:
# Let's do the same with list comprehension
M = [x for x in S if x%2 == 0]
print(M)

[1024, 7776, 32768]
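
A comprehension can transform and filter in the same expression. The sketch below, with variable names chosen for this example, builds the even squares of the numbers 0 to 9 both ways, to show that the comprehension and the explicit loop are equivalent.

```python
numbers = range(10)

# Comprehension: [expression  for item in iterable  if condition]
even_squares = [n * n for n in numbers if n % 2 == 0]

# Equivalent explicit loop
loop_version = []
for n in numbers:
    if n % 2 == 0:
        loop_version.append(n * n)

print(even_squares)                  # [0, 4, 16, 36, 64]
print(even_squares == loop_version)  # True
```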


These are simple examples, using numerical computation. Let's see a more "practical" use: In the following operation we transform a string into a list of values, a more complex operation: 

In [155]:
words = 'The quick brown fox jumps over the lazy dog'

# Split the sentence into a list of words
split_words = words.split()

# Iterate through and return a tuple of UPPERCASE/LOWERCASE/LENGTH
[(w.upper(), w.lower(), len(w)) for w in split_words]

[('THE', 'the', 3),
 ('QUICK', 'quick', 5),
 ('BROWN', 'brown', 5),
 ('FOX', 'fox', 3),
 ('JUMPS', 'jumps', 5),
 ('OVER', 'over', 4),
 ('THE', 'the', 3),
 ('LAZY', 'lazy', 4),
 ('DOG', 'dog', 3)]

In [156]:
# We can do this in one line
[(w.upper(), w.lower(), len(w)) for w in words.split()]

[('THE', 'the', 3),
 ('QUICK', 'quick', 5),
 ('BROWN', 'brown', 5),
 ('FOX', 'fox', 3),
 ('JUMPS', 'jumps', 5),
 ('OVER', 'over', 4),
 ('THE', 'the', 3),
 ('LAZY', 'lazy', 4),
 ('DOG', 'dog', 3)]

**Exercise:**
Now let's abbreviate the sentence: `The quick brown fox jumps over the lazy dog`
- Get a list with the first letter of every word in the sentence.
- Make all of the letters uppercase, and add a period afterwards.
- Join the letters (with periods) together to create the abbreviation.

You should wind up with something like: `T.Q.B.F.J.O.T.L.D.`

In [157]:
words

'The quick brown fox jumps over the lazy dog'

In [158]:
# Let's build our loop slowly. 
# Step 1: Write a for statement
for w in words:
    print(w)

T
h
e
 
q
u
i
c
k
 
b
r
o
w
n
 
f
o
x
 
j
u
m
p
s
 
o
v
e
r
 
t
h
e
 
l
a
z
y
 
d
o
g


In [159]:
# Oops! We iterated over words, which is a string.
# Therefore, every step assigns w to a LETTER.
# We want w to be assigned to a WORD.
# For that, we need to split 'words' into a list of words.

# Step 2: Split 'words'
for w in words.split(): # Split into list of words, on spaces
    print(w)

The
quick
brown
fox
jumps
over
the
lazy
dog


In [160]:
# Much better. 

# Step 3: Get the first letter of every word
for w in words.split(): 
    print(w[0])   # Extract first character

T
q
b
f
j
o
t
l
d


In [161]:
# Step 4: Make the first letter uppercase
for w in words.split(): 
    print(w[0].upper()) # Make first character uppercase

T
Q
B
F
J
O
T
L
D


In [162]:
# Now, we need to store our letters somewhere!

# Step 5: Store letters in a list
abbrev = []                     # Make a list for our letters
for w in words.split(): 
    abbrev.append(w[0].upper()) # Add each letter to abbrev
    print(f"After seeing word {w:5s} and adding letter {w[0].upper()}, abbrev = {abbrev}") 

After seeing word The   and adding letter T, abbrev = ['T']
After seeing word quick and adding letter Q, abbrev = ['T', 'Q']
After seeing word brown and adding letter B, abbrev = ['T', 'Q', 'B']
After seeing word fox   and adding letter F, abbrev = ['T', 'Q', 'B', 'F']
After seeing word jumps and adding letter J, abbrev = ['T', 'Q', 'B', 'F', 'J']
After seeing word over  and adding letter O, abbrev = ['T', 'Q', 'B', 'F', 'J', 'O']
After seeing word the   and adding letter T, abbrev = ['T', 'Q', 'B', 'F', 'J', 'O', 'T']
After seeing word lazy  and adding letter L, abbrev = ['T', 'Q', 'B', 'F', 'J', 'O', 'T', 'L']
After seeing word dog   and adding letter D, abbrev = ['T', 'Q', 'B', 'F', 'J', 'O', 'T', 'L', 'D']


In [163]:
# Almost done! Now we need to join the characters in abbrev together.
# Let's do that with a period as the separator.
# (join puts the period only BETWEEN letters, so there is no trailing period.)
".".join(abbrev)

'T.Q.B.F.J.O.T.L.D'

In [164]:
# Now try it with list comprehension
print("___Step 1: Iterate letters \n\n [w for w in words] --> \n")
print([w for w in words])

print("\n___Step 2: Iterate through WORDS using split()\n\n[w for w in words.split()] --> \n")
print([w for w in words.split()])

print("\n___Step 3: Take first letter only\n\n[w[0] for w in words.split()] --> \n")
print([w[0] for w in words.split()])

print("\n___Step 4: Uppercase it \n\n[w[0].upper() for w in words.split()] --> \n")
print([w[0].upper() for w in words.split()])

print("\n___Step 5: Join letters together\n\n'.'.join([w[0].upper() for w in words.split()]) -->\n")
print('.'.join([w[0].upper() for w in words.split()]))

___Step 1: Iterate letters 

 [w for w in words] --> 

['T', 'h', 'e', ' ', 'q', 'u', 'i', 'c', 'k', ' ', 'b', 'r', 'o', 'w', 'n', ' ', 'f', 'o', 'x', ' ', 'j', 'u', 'm', 'p', 's', ' ', 'o', 'v', 'e', 'r', ' ', 't', 'h', 'e', ' ', 'l', 'a', 'z', 'y', ' ', 'd', 'o', 'g']

___Step 2: Iterate through WORDS using split()

[w for w in words.split()] --> 

['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

___Step 3: Take first letter only

[w[0] for w in words.split()] --> 

['T', 'q', 'b', 'f', 'j', 'o', 't', 'l', 'd']

___Step 4: Uppercase it 

[w[0].upper() for w in words.split()] --> 

['T', 'Q', 'B', 'F', 'J', 'O', 'T', 'L', 'D']

___Step 5: Join letters together

'.'.join([w[0].upper() for w in words.split()]) -->

T.Q.B.F.J.O.T.L.D


In [165]:
words = "you only live once"
print(words)
print([w            for w in words.split()]) # Copy structure of our original for loop
print([w[0]         for w in words.split()]) # Take the first character
print([w[0].upper() for w in words.split()]) # Uppercase
print('.'.join([w[0].upper() for w in words.split()])) # Join

you only live once
['you', 'only', 'live', 'once']
['y', 'o', 'l', 'o']
['Y', 'O', 'L', 'O']
Y.O.L.O
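
Joining with `'.'` as the separator leaves off the final period, while the exercise asks for `T.Q.B.F.J.O.T.L.D.`. One way to get the trailing period is to attach it to each letter and join with an empty string. The sketch below wraps that in a function so it can be reused on any sentence; the name `abbreviate` is hypothetical and not part of the original notebook.

```python
def abbreviate(sentence):
    """Return the first letter of each word, uppercased and followed by a period."""
    # Attach the period to every letter, then join with no separator
    return ''.join([w[0].upper() + '.' for w in sentence.split()])

print(abbreviate('The quick brown fox jumps over the lazy dog'))  # T.Q.B.F.J.O.T.L.D.
print(abbreviate('you only live once'))                           # Y.O.L.O.
```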


**Answer:**<span style="color:white">
''.join([w[0].upper()+"." for w in words.split()])

#### Exercise

* List each word and its length from the string `The quick brown fox jumps over the lazy dog`, keeping only the words that are four characters or longer
* List only words with the letter o in them

In [166]:
# List each word and its length from the string 
# 'The quick brown fox jumps over the lazy dog', 
# conditioned on the length of the word being four characters and above

# words = 'The quick brown fox jumps over the lazy dog'
words = 'you only live once'

long_words = []  # Make a list of long words to keep

for w in words.split():       # Iterate over words in sentence
    if len(w)>3:              # Check length
        long_words.append(w)  # Add to long words
        print(f"We added '{w}' and now we have long_words={long_words}")

We added 'only' and now we have long_words=['only']
We added 'live' and now we have long_words=['only', 'live']
We added 'once' and now we have long_words=['only', 'live', 'once']


In [167]:
# Let's write our list comprehension
# [ (expression) for (sequence) if (condition)]

# Print all the words
print([w for w in words.split()])

['you', 'only', 'live', 'once']


In [168]:
# Print words with length>3
print([w for w in words.split() if len(w)>3])

['only', 'live', 'once']


In [169]:
# Print words containing "on"
print([w for w in words.split() if 'on' in w])

['only', 'once']


In [170]:
# Print the first three letters, uppercase, of words containing "on"
print([w.upper()[0:3] for w in words.split() if 'on' in w])

['ONL', 'ONC']
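
It can help to see how the three parts of a filtered comprehension map onto a loop: the `if` at the end is tested on the original item first, and the expression at the front is applied only to items that pass. A minimal sketch, with `loop_result` and `comp_result` as illustrative names:

```python
words = 'you only live once'

# Loop form: test the condition on w, then transform what passes
loop_result = []
for w in words.split():
    if 'on' in w:                           # condition (end of the comprehension)
        loop_result.append(w.upper()[0:3])  # expression (front of the comprehension)

# Comprehension form: [expression for item in sequence if condition]
comp_result = [w.upper()[0:3] for w in words.split() if 'on' in w]

print(comp_result)                 # ['ONL', 'ONC']
print(loop_result == comp_result)  # True
```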


In [171]:
words.split()

['you', 'only', 'live', 'once']

In [172]:
# Look for a word in the list
'you' in words.split()

True

In [173]:
# Note that `in` on a list only matches whole items: it does not search INSIDE each word
'on' in words.split()

False
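
To ask whether any word in the list contains a substring, one option is to combine the built-in `any()` with a per-word `in` test. The sketch below contrasts the three kinds of membership check; the variable names are chosen here for illustration.

```python
words = 'you only live once'
word_list = words.split()

in_list = 'on' in word_list                     # is some ITEM exactly equal to 'on'?
inside_any = any('on' in w for w in word_list)  # does some item CONTAIN 'on'?
in_string = 'on' in words                       # substring search on the original string

print(in_list)     # False
print(inside_any)  # True ('only' and 'once' contain 'on')
print(in_string)   # True
```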

**Answer:** <span style="color:white">
[(w, len(w)) for w in words.split() if len(w)>3]
[(w, len(w)) for w in words.split() if 'o' in w]

* You are given the `wsj` article below. Write a list comprehension for getting the words that appear more than once. 
    * Use the `.split()` command for splitting, without passing a parameter.
    * When counting words, case does not matter (i.e., YAHOO is the same as Yahoo).

* Find all the *characters* in the article that are not letters or numbers. You can use the string methods `isdigit()` and `isalpha()` (e.g., `"Panos".isalpha()` and `"1234".isdigit()` both return `True`).

In [174]:
wsj = """
Yahoo Inc. disclosed a massive security breach by a “state-sponsored actor” affecting at least 500 million users, potentially the largest such data breach on record and the latest hurdle for the beaten-down internet company as it works through the sale of its core business.
Yahoo said certain user account information—including names, email addresses, telephone numbers, dates of birth, hashed passwords and, in some cases, encrypted or unencrypted security questions and answers—was stolen from the company’s network in late 2014 by what it believes is a state-sponsored actor.
Yahoo said it is notifying potentially affected users and has taken steps to secure their accounts by invalidating unencrypted security questions and answers so they can’t be used to access an account and asking potentially affected users to change their passwords.
Yahoo recommended users who haven’t changed their passwords since 2014 do so. It also encouraged users change their passwords as well as security questions and answers for any other accounts on which they use the same or similar information used for their Yahoo account.
The company, which is working with law enforcement, said the continuing investigation indicates that stolen information didn't include unprotected passwords, payment-card data or bank account information.
With 500 million user accounts affected, this is the largest-ever publicly disclosed data breach, according to Paul Stephens, director of policy and advocacy with Privacy Rights Clearing House, a not-for-profit group that compiles information on data breaches.
No evidence has been found to suggest the state-sponsored actor is currently in Yahoo’s network, and Yahoo didn’t name the country it suspected was involved. In August, a hacker called “Peace” appeared in online forums, offering to sell 200 million of the company’s usernames and passwords for about $1,900 in total. Peace had previously sold data taken from breaches at Myspace and LinkedIn Corp.
"""

In [175]:
wsj_words = wsj.lower().split() # List of lowercase words
n_words = 0                     # Count how many words we have seen
keep_words = []                 # List of words appearing more than once

for word in wsj_words:            # Iterate over words in wsj_words
    n_words +=1                   # Count how many words we saw
    n = wsj_words.count(word)     # Count appearances of a given word
    
    # Print statements that we were trying (unsuccessfully) to write in class
    if word not in keep_words and n>1:  # Word appears > once and not already seen
        print(f"{word:20s}: appears {n} times.")
    elif n>1:                           # Word appears > once and seen before
        print(f"{word:20s}: already seen this word before.")
    else:                               # Word only appears once
        print(f"{word:20s}: only appears once.")
    
    if n>1:                      # If word appears more than once (n>1)
        keep_words.append(word)  # Add the word to keep_words
    
    
    if n_words>=15:              # Only do this for 15 words
        break                    # (so that we don't print too much output - can change this later)
set(keep_words) # Remove duplicates and view keep words

yahoo               : appears 6 times.
inc.                : only appears once.
disclosed           : appears 2 times.
a                   : appears 5 times.
massive             : only appears once.
security            : appears 4 times.
breach              : appears 2 times.
by                  : appears 3 times.
a                   : already seen this word before.
“state-sponsored    : only appears once.
actor”              : only appears once.
affecting           : only appears once.
at                  : appears 2 times.
least               : only appears once.
500                 : appears 2 times.


{'500', 'a', 'at', 'breach', 'by', 'disclosed', 'security', 'yahoo'}
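
Notice in the output above that `“state-sponsored` and `actor”` are reported as appearing once, because `split()` leaves punctuation attached to the words. One possible refinement, not required by the exercise, is to strip punctuation from the edges of each word before counting. The sketch below uses a short sample adapted from the article; `sample`, `raw` and `cleaned` are illustrative names, and the curly quotes are added by hand because `string.punctuation` only covers ASCII characters.

```python
import string

sample = ("breach by a “state-sponsored actor” affecting users; "
          "it believes is a state-sponsored actor.")

raw = sample.lower().split()

# strip() only removes characters at the two ends, so inner hyphens survive
edge_chars = string.punctuation + '“”'
cleaned = [w.strip(edge_chars) for w in raw]

print(raw.count('state-sponsored'))      # 1
print(cleaned.count('state-sponsored'))  # 2
print(cleaned.count('actor'))            # 2
```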

In [176]:
# List comprehension version
# [expression for sequence if condition ]
# set([word for word in wsj_words if wsj_words.count(word)>1])
set([(word, wsj_words.count(word)) for word in wsj_words if wsj_words.count(word)>1])

{('2014', 2),
 ('500', 2),
 ('a', 5),
 ('account', 3),
 ('accounts', 3),
 ('affected', 2),
 ('and', 10),
 ('answers', 2),
 ('as', 3),
 ('at', 2),
 ('breach', 2),
 ('by', 3),
 ('change', 2),
 ('company’s', 2),
 ('data', 5),
 ('disclosed', 2),
 ('for', 4),
 ('from', 2),
 ('has', 2),
 ('in', 6),
 ('information', 3),
 ('is', 5),
 ('it', 5),
 ('million', 3),
 ('of', 4),
 ('on', 3),
 ('or', 3),
 ('passwords', 4),
 ('potentially', 3),
 ('questions', 3),
 ('said', 3),
 ('security', 4),
 ('state-sponsored', 2),
 ('stolen', 2),
 ('taken', 2),
 ('that', 2),
 ('the', 12),
 ('their', 5),
 ('they', 2),
 ('to', 6),
 ('unencrypted', 2),
 ('used', 2),
 ('user', 2),
 ('users', 4),
 ('which', 2),
 ('with', 3),
 ('yahoo', 6)}
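
The comprehension above calls `wsj_words.count(word)` for every word, which rescans the whole list each time. For longer texts, a common alternative is `collections.Counter` from the standard library, which tallies all words in a single pass. The sketch below is a minimal illustration on a made-up sentence (the `text` string is not from the article), so it runs on its own.

```python
from collections import Counter

text = "yahoo said it is notifying users and yahoo said users should change passwords"

counts = Counter(text.lower().split())  # maps each word to its number of appearances

# Keep (word, count) pairs for words that appear more than once
repeated = [(word, n) for word, n in counts.items() if n > 1]
print(repeated)  # [('yahoo', 2), ('said', 2), ('users', 2)]
```

Because `Counter` keys are unique, there is no need to wrap the result in `set()` to remove duplicates.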

In [177]:
# Some more practice with lists
test = ["hello, hi, goodbye hello", 
        "hello, world", 
        "goodbye world", 
        "hello"]

In [178]:
# "hello" is in test (it is exactly equal to the last item)
# "good" is not (even though it appears inside the first and third items of the "test" list)
print("Is 'hello' in test?", "hello" in test)
print("Is 'good' in test?", "good" in test)

Is 'hello' in test? True
Is 'good' in test? False


In [179]:
# Instead, we can search INSIDE each item of "test" for the word "good" using list comprehension
# Keep all items with "good" in them
[t for t in test if "good" in t]

['hello, hi, goodbye hello', 'goodbye world']

In [180]:
# Another example: find items with "1" in them
test2 = ['hello1', 'goodbye3', 'good morning 1']
[ t for t in test2 if '1' in t]

['hello1', 'good morning 1']

In [181]:
# Find characters that are not letters or numbers
test = "650-888-9999 hello friend"
''.join([l for l in test if not l.isdigit() and not l.isalpha()])

'--  '

In [182]:
# Find only letters or numbers
''.join([l for l in test if  l.isdigit() or l.isalpha()])

'6508889999hellofriend'
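
The pair of checks `isdigit() or isalpha()` can often be replaced by the single string method `isalnum()`. The sketch below compares the two on the same phone-number string. For plain ASCII text like this they agree; for some Unicode characters (for example fraction symbols such as `½`) `isalnum()` accepts more, so treat this as a convenience rather than an exact synonym.

```python
test = "650-888-9999 hello friend"

two_checks = ''.join([ch for ch in test if ch.isdigit() or ch.isalpha()])
one_check = ''.join([ch for ch in test if ch.isalnum()])

print(one_check)                # 6508889999hellofriend
print(two_checks == one_check)  # True
```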

In [183]:
# Keep only numbers or spaces
''.join([l for l in test if l.isdigit() or l==' '])

'6508889999  '

In [184]:
# Remove spaces
''.join([l for l in test if l!=' '])

'650-888-9999hellofriend'

In [185]:
# Remove digits
''.join([l for l in test if not l.isdigit()])

'-- hello friend'

In [186]:
# Generate a list of numbers divisible by five
# Multiply each of them by 100
# Convert the result to a string
# Add the word "number"
nums = range(0,100)
print(", ".join([str(n*100) + ' number' for n in nums if n%5 == 0]))

0 number, 500 number, 1000 number, 1500 number, 2000 number, 2500 number, 3000 number, 3500 number, 4000 number, 4500 number, 5000 number, 5500 number, 6000 number, 6500 number, 7000 number, 7500 number, 8000 number, 8500 number, 9000 number, 9500 number


In [187]:
# Test another separator
print("~~~~".join([str(n*100) + ' number' for n in nums if n%5 == 0]))

0 number~~~~500 number~~~~1000 number~~~~1500 number~~~~2000 number~~~~2500 number~~~~3000 number~~~~3500 number~~~~4000 number~~~~4500 number~~~~5000 number~~~~5500 number~~~~6000 number~~~~6500 number~~~~7000 number~~~~7500 number~~~~8000 number~~~~8500 number~~~~9000 number~~~~9500 number


In [188]:
# Use a newline as the separator, so each item prints on its own line
print("\n".join([str(n*100) + ' number' for n in nums if n%5 == 0]))

0 number
500 number
1000 number
1500 number
2000 number
2500 number
3000 number
3500 number
4000 number
4500 number
5000 number
5500 number
6000 number
6500 number
7000 number
7500 number
8000 number
8500 number
9000 number
9500 number
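
When the filter only selects evenly spaced numbers, the third argument of `range` (the step) can do the same job without an `if`. The sketch below builds the list both ways and confirms they match; `filtered` and `stepped` are names introduced for this comparison.

```python
# Filter version: generate 0..99 and keep multiples of five
filtered = [str(n * 100) + ' number' for n in range(0, 100) if n % 5 == 0]

# Step version: range(start, stop, step) yields 0, 5, 10, ..., 95 directly
stepped = [str(n * 100) + ' number' for n in range(0, 100, 5)]

print(filtered == stepped)     # True
print(", ".join(stepped[:3]))  # 0 number, 500 number, 1000 number
```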


**Answer:** <span style="color:white">
list(set([w for w in wsj.lower().split() if wsj.lower().split().count(w)>1]))
list(set([w for w in wsj if not w.isalpha() and not w.isdigit()]))