In [4]:
# Behind-the-scenes setup: import the baseball data that DataCamp provides by default

#Set working directory to import data files
import os
os.chdir(r'c:\datacamp\data')

import pandas as pd
import numpy as np
MLB = pd.read_csv('baseball.csv')
height = MLB.iloc[:,3].tolist()
weight = MLB.iloc[:,4].tolist()
baseball = [[h, w] for h, w in zip(height, weight)]

np_height = np.array(height)
np_weight = np.array(weight)
np_baseball = np.array(baseball)

# Intermediate Python for Data Science

## Chapter 4 - Loops

### while Loop

An if-elif-else statement runs through its construct only once: as soon as one of the conditions is met, the remaining conditions are skipped and execution moves on past the statement. A while loop resembles an if statement in that it executes its code block when the condition is True, but unlike an if statement it keeps executing that block over and over for as long as the condition remains True.

The syntax of a while loop is:

![whileloopsyntax.png](attachment:whileloopsyntax.png)


The while loop is not that common, but it can be very useful. For example, suppose you are fitting a numerical model to your data. This typically involves repeating the same steps until the error between your model and your data falls below some boundary. When you can formulate a problem as "repeat some action until some condition is met", a while loop is often the way to go.

Let's say we have an error that starts at 50 and our algorithm divides the error by 4 on each run until the error is no longer above 1. 

In [1]:
error = 50
while error > 1:
    error = error/4
    print(error)

12.5
3.125
0.78125
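
As a rough illustration of the "repeat until the error is below some boundary" idea, here is a minimal sketch that approximates the square root of 2 with Newton's method. The names target, guess, tolerance and steps, the starting estimate and the tolerance value are all assumptions made for this example; they are not part of the course material.

```python
# Hypothetical example: approximate sqrt(2) by repeating an update step
# until the error falls below a chosen boundary.
target = 2.0
guess = 1.0          # arbitrary starting estimate (assumption)
tolerance = 1e-6     # boundary on the error (assumption)
error = abs(guess * guess - target)
steps = 0

while error > tolerance:
    guess = (guess + target / guess) / 2    # Newton update for square roots
    error = abs(guess * guess - target)     # recompute the error after the update
    steps = steps + 1

print(guess, steps)
```

The loop has the same shape as the error example above: the condition is checked before every run, and the body must change the variable used in the condition, otherwise the loop never ends.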


### Exercise 1

#### Basic while loop
Below you can find the example from the video where the error variable, initially equal to 50.0, is divided by 4 and printed out on every run:<br>
<br>
error = 50.0<br>
while error > 1 :<br>
> error = error / 4<br>
> print(error)<br>

This example will come in handy, because it's time to build a while loop yourself! We're going to code a while loop that implements a very basic control system for an inverted pendulum. If there's an offset from standing perfectly straight, the while loop will incrementally fix this offset.<br>
<br>
Note that if your while loop takes too long to run, you might have made a mistake. In particular, remember to indent the contents of the loop!

__Instructions:__
* Create the variable offset with an initial value of 8.
* Code a while loop that keeps running as long as offset is not equal to 0. Inside the while loop:
* Print out the sentence "correcting...".
* Next, decrease the value of offset by 1. You can do this with offset = offset - 1.
* Finally, still within your loop, print out offset so you can see how it changes.

In [1]:
offset = 8

while offset != 0 :
    print("correcting...")
    offset = offset - 1
    print(offset)

correcting...
7
correcting...
6
correcting...
5
correcting...
4
correcting...
3
correcting...
2
correcting...
1
correcting...
0


#### Add conditionals

The while loop that corrects the offset is a good start, but what if offset is negative? You can try to run the following code where offset is initialized to -6: <br>
<br>
Comment: Initialize offset<br>
offset = -6<br>
<br>
Comment: Code the while loop<br>
while offset != 0 :<br>
> print("correcting...")<br>
> offset = offset - 1<br>
> print(offset)<br>

<br>
but your session will be disconnected. The while loop will never stop running, because offset will be further decreased on every run. offset != 0 will never become False and the while loop continues forever.<br>
<br>
Fix things by putting an if-else statement inside the while loop. If your code is still taking too long to run, you probably made a mistake!<br>

__Instructions:__
*  Inside the while loop, complete the if-else statement:
*  If offset is greater than zero, you should decrease offset by 1.
*  Else, you should increase offset by 1.
*  If you've coded things correctly, hitting Submit Answer should work this time.
*  If your code is still taking too long to run (or your session is expiring), you probably made a mistake. Check your code and make sure that the statement offset != 0 will eventually evaluate to False!


In [1]:
offset = -6

while offset != 0 :
    print('correcting')
    if offset > 0 :
        offset = offset - 1
    else :
        offset = offset + 1
    print(offset)

correcting
-5
correcting
-4
correcting
-3
correcting
-2
correcting
-1
correcting
0
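
While experimenting with loops like this one, a cap on the number of runs can protect against an accidental infinite loop. The sketch below is one possible way to do that; max_runs and runs are hypothetical names introduced here and are not part of the exercise.

```python
offset = -6
max_runs = 100   # hypothetical safety cap (assumption, not in the exercise)
runs = 0

# Stop when offset reaches 0 OR when the cap is hit, whichever comes first
while offset != 0 and runs < max_runs:
    if offset > 0:
        offset = offset - 1
    else:
        offset = offset + 1
    runs = runs + 1

print(offset, runs)
```

If the loop ends with runs equal to max_runs, the condition never became False on its own, which usually points to a mistake in the loop body.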


### for Loop

![forloopsyntax.png](attachment:forloopsyntax.png)

The for loop syntax can be read as: "for each var in sequence, execute expression." Let's look at the fam list again for an example. Rather than printing out the entire list, you want to print out each element separately. You could call 4 print statements, each with a different element from the list, but it would be best to use a for loop:

In [3]:
fam = [1.73, 1.68, 1.71, 1.89]
print(fam)

[1.73, 1.68, 1.71, 1.89]


In [4]:
print(fam[0])
print(fam[1])
print(fam[2])
print(fam[3])

1.73
1.68
1.71
1.89


In [5]:
for height in fam:
    print(height)

1.73
1.68
1.71
1.89


During each iteration of the for loop, Python assigns the next element of the list, in order, to the variable height defined in the for statement, and then executes the indented code following the colon. Here that code prints height, and the loop ends once there are no more elements in the list.

#### Enumerate
If you also need the index of each element, use enumerate(): a plain for loop over a list gives you only the values, not their positions. enumerate() produces an index-value pair on every iteration, which lets you build a more informative print statement.

In [7]:
for index, height in enumerate(fam):
    print("index " + str(index) + " : " + str(height))

index 0 : 1.73
index 1 : 1.68
index 2 : 1.71
index 3 : 1.89


For loops can also iterate over every character in a string:

In [8]:
for c in "family":
    print(c.capitalize())

F
A
M
I
L
Y


### Exercise 2

#### Loop over a list
Have another look at the for loop that Filip showed in the video:<br>
<br>
fam = [1.73, 1.68, 1.71, 1.89]<br>
for height in fam : <br>
> print(height)<br>

As usual, you simply have to indent the code with 4 spaces to tell Python which code should be executed in the for loop.<br>
<br>
The areas variable, containing the area of different rooms in your house, is already defined.

__Instructions:__
* Write a for loop that iterates over all elements of the areas list and prints out every element separately.

In [2]:
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

for a in areas:
    print(a)

11.25
18.0
20.0
10.75
9.5


#### Indexes and values (1)
Using a for loop to iterate over a list only gives you access to every list element in each run, one after the other. If you also want to access the index information, so where the list element you're iterating over is located, you can use enumerate().<br>
<br>
As an example, have a look at how the for loop from the video was converted:<br>
<br>
fam = [1.73, 1.68, 1.71, 1.89]<br>
for index, height in enumerate(fam) :<br>
> print("person " + str(index) + ": " + str(height))<br>

__Instructions:__
* Adapt the for loop in the sample code to use enumerate() and use two iterator variables.
* Update the print() statement so that on each run, a line of the form "room x: y" should be printed, where x is the index of the list element and y is the actual list element, i.e. the area. Make sure to print out this exact string, with the correct spacing.

In [3]:
for index, a in enumerate(areas):
    print ('room ' + str(index) + " : " + str(a))

room 0 : 11.25
room 1 : 18.0
room 2 : 20.0
room 3 : 10.75
room 4 : 9.5


#### Indexes and values (2)
For non-programmer folks, room 0: 11.25 is strange. Wouldn't it be better if the count started at 1?<br>
<br>
__Instructions:__
* Adapt the print() function in the for loop on the right so that the first printout becomes "room 1: 11.25", the second one "room 2: 18.0" and so on.

In [11]:
for index, a in enumerate(areas):
    print("room " + str(index+1) + " : " + str(a))

room 1 : 11.25
room 2 : 18.0
room 3 : 20.0
room 4 : 10.75
room 5 : 9.5
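
As an alternative to adding 1 inside the print call, the built-in enumerate() accepts a start argument. The sketch below assumes the same areas list; the names number and lines are introduced only for this illustration.

```python
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

lines = []
# start=1 makes the counter begin at 1 instead of 0
for number, a in enumerate(areas, start=1):
    line = "room " + str(number) + ": " + str(a)
    lines.append(line)
    print(line)
```

Both approaches give the same numbering; start=1 simply keeps the arithmetic out of the loop body.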


#### Loop over list of lists
Remember the house variable from the Intro to Python course? Have a look at its definition on the right. It's basically a list of lists, where each sublist contains the name and area of a room in your house.<br>
<br>
It's up to you to build a for loop from scratch this time!<br>

__Instructions:__
* Write a for loop that goes through each sublist of house and prints out "the x is y sqm", where x is the name of the room and y is the area of the room.

In [16]:
house = [["hallway", 11.25], 
         ["kitchen", 18.0], 
         ["living room", 20.0], 
         ["bedroom", 10.75], 
         ["bathroom", 9.50]]

# Loop over house itself so that every sublist (room) is visited
for room in house :
    print("the " + room[0] + " is " + str(room[1]) + " sqm")

the hallway is 11.25 sqm
the kitchen is 18.0 sqm
the living room is 20.0 sqm
the bedroom is 10.75 sqm
the bathroom is 9.5 sqm
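
Because every sublist of house has exactly two elements, the loop can also unpack each sublist into two variables directly. This is a small sketch of that variation; the variable names name, area and messages are chosen here for illustration.

```python
house = [["hallway", 11.25],
         ["kitchen", 18.0],
         ["living room", 20.0],
         ["bedroom", 10.75],
         ["bathroom", 9.50]]

messages = []
# Each sublist is unpacked into name and area on every iteration
for name, area in house:
    message = "the " + name + " is " + str(area) + " sqm"
    messages.append(message)
    print(message)
```

Unpacking only works when every sublist has the same number of elements as there are loop variables.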


### Looping Data Structures, Part 1

#### Loop over Dictionaries

For loops can be used over dictionaries, NumPy arrays and other data structures, but the way you define the sequence to iterate over differs depending on the data structure. To print the key and value of each entry in the world dictionary, defined below, you need to call the .items() method on the dictionary in the for loop. The names of the variables in the for loop don't matter, but their order does: the first variable is assigned the key and the second variable is assigned the value.

In [17]:
world = {"afghanistan": 30.55, 'albania': 2.77, 'algeria': 39.21}

for key, value in world.items():
    print(key + " -- " + str(value))

afghanistan -- 30.55
albania -- 2.77
algeria -- 39.21
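
To see why .items() is needed for key-value pairs, it can help to compare it with the other ways of looping over a dictionary. The sketch below reuses the world dictionary; the names keys_seen, values_seen, pairs and unpack_failed are introduced only for this illustration.

```python
world = {"afghanistan": 30.55, "albania": 2.77, "algeria": 39.21}

# Looping over the dictionary itself yields only the keys
keys_seen = [country for country in world]

# .values() yields only the values
values_seen = [population for population in world.values()]

# .items() yields (key, value) pairs
pairs = list(world.items())

# Without .items(), Python tries to unpack each key string into two variables
unpack_failed = False
try:
    for key, value in world:
        pass
except ValueError as err:
    unpack_failed = True
    print("without .items():", err)

print(keys_seen)
print(values_seen)
print(pairs)
```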


#### Loop over NumPy Arrays

Using the height and weight arrays, you can see that the most basic for loop works for the 1D bmi array. A 2D array, like meas below, is built up from 1D arrays, so the same for loop simply prints each 1D array on a separate line. To iterate over every individual element of the array, wrap the array in the np.nditer() function in the for statement.

In [18]:
import numpy as np
np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])
bmi = np_weight / np_height**2

for val in bmi:
    print(val)

21.85171572722109
20.97505668934241
21.750282138093777
24.74734749867025
21.44127836209856


In [20]:
meas = np.array([np_height, np_weight])
for val in meas:
    print(val)

[1.73 1.68 1.71 1.89 1.79]
[65.4 59.2 63.6 88.4 68.7]


In [21]:
for val in np.nditer(meas):
    print(val)

1.73
1.68
1.71
1.89
1.79
65.4
59.2
63.6
88.4
68.7
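
For comparison, the sketch below visits every element of a small 2D array in three ways: with np.nditer(), with two nested for loops, and with the array's .flat attribute. The array is a shortened stand-in for meas, and the list names are assumptions made for this example. The three orders agree here because the array is stored row by row (the NumPy default for arrays built this way).

```python
import numpy as np

# Shortened stand-in for meas: first row heights, second row weights
meas = np.array([[1.73, 1.68, 1.71],
                 [65.4, 59.2, 63.6]])

# np.nditer visits every element; float() turns each 0-d array into a plain number
from_nditer = [float(val) for val in np.nditer(meas)]

# Nested loops: the outer loop yields 1D rows, the inner loop yields elements
from_nested = [float(val) for row in meas for val in row]

# .flat is a 1D iterator over the same elements
from_flat = [float(val) for val in meas.flat]

print(from_nditer)
print(from_nditer == from_nested == from_flat)
```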


#### Recap
To iterate over a dictionary, use the .items() method:
    for key, val in my_dict.items() :

To iterate over every element of a NumPy array, use the np.nditer() function:
    for val in np.nditer(my_array) :

### Exercise 3

#### Loop over dictionary
In Python 3, you need the items() method to loop over the key-value pairs of a dictionary:<br>
<br>
world = { "afghanistan":30.55, "albania":2.77, "algeria":39.21 }

for key, value in world.items() :
>print(key + " -- " + str(value))<br>

<br>
Remember the europe dictionary that contained the names of some European countries as key and their capitals as corresponding value? Go ahead and write a loop to iterate over it!

__Instructions:__
* Write a for loop that goes through each key:value pair of europe. On each iteration, "the capital of x is y" should be printed out, where x is the key and y is the value of the pair.

In [22]:
europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin',
          'norway':'oslo', 'italy':'rome', 'poland':'warsaw', 'austria':'vienna' }

for key, value in europe.items():
    print('the capital of ' + key + " is " + str(value))

the capital of spain is madrid
the capital of france is paris
the capital of germany is berlin
the capital of norway is oslo
the capital of italy is rome
the capital of poland is warsaw
the capital of austria is vienna


#### Loop over NumPy array
If you're dealing with a 1D NumPy array, looping over all elements can be as simple as:<br>
<br>
for x in my_array :<br>
>...<br>

If you're dealing with a 2D NumPy array, it's more complicated. A 2D array is built up of multiple 1D arrays. To explicitly iterate over all separate elements of a multi-dimensional array, you'll need this syntax:
<br>
for x in np.nditer(my_array) :<br>
>...

Two NumPy arrays that you might recognize from the intro course are available in your Python session: np_height, a NumPy array containing the heights of Major League Baseball players, and np_baseball, a 2D NumPy array that contains both the heights (first column) and weights (second column) of those players.

__Instructions:__
* Import the numpy package under the local alias np.
* Write a for loop that iterates over all elements in np_height and prints out "x inches" for each element, where x is the value in the array.
* Write a for loop that visits every element of the np_baseball array and prints it out.

In [5]:
for h in np_height :
    print(str(h) + " inches")
    
for m in np.nditer(np_baseball) : 
    print(m)

74 inches
74 inches
72 inches
72 inches
73 inches
69 inches
69 inches
71 inches
76 inches
71 inches
73 inches
73 inches
74 inches
74 inches
69 inches
70 inches
73 inches
75 inches
78 inches
79 inches
76 inches
74 inches
76 inches
72 inches
71 inches
75 inches
77 inches
74 inches
73 inches
74 inches
78 inches
73 inches
75 inches
73 inches
75 inches
75 inches
74 inches
69 inches
71 inches
74 inches
73 inches
73 inches
76 inches
74 inches
74 inches
70 inches
72 inches
77 inches
74 inches
70 inches
73 inches
75 inches
76 inches
76 inches
78 inches
74 inches
74 inches
76 inches
77 inches
81 inches
78 inches
75 inches
77 inches
75 inches
76 inches
74 inches
72 inches
72 inches
75 inches
73 inches
73 inches
73 inches
70 inches
70 inches
70 inches
76 inches
68 inches
71 inches
72 inches
75 inches
75 inches
75 inches
75 inches
68 inches
74 inches
78 inches
71 inches
73 inches
76 inches
74 inches
74 inches
79 inches
75 inches
73 inches
76 inches
74 inches
74 inches
73 inches
72 inches
74 inches


75
200
73
215
76
229
78
240
75
207
73
205
77
208
74
185
72
190
74
170
72
208
71
225
73
190
75
225
73
185
67
180
67
165
76
240
74
220
73
212
70
163
75
215
70
175
72
205
77
210
79
205
78
208
74
215
75
180
75
200
78
230
76
211
75
230
69
190
75
220
72
180
75
205
73
190
74
180
75
205
75
190
73
195


### Looping Data Structures, Part 2

#### Pandas DataFrames
For Pandas, you have to explicitly tell Python that you want to iterate over the rows by calling the .iterrows() method on the DataFrame you want to loop over. On each iteration, iterrows() generates two pieces of data: the label of the row and the actual data in the row as a Pandas Series.

In [6]:
import pandas as pd
brics = pd.read_csv('brics.csv', index_col = 0)
brics

for lab, row in brics.iterrows():
    print(lab)
    print(row)

BR
country         Brazil
capital       Brasilia
area             8.516
population       200.4
Name: BR, dtype: object
RU
country       Russia
capital       Moscow
area            17.1
population     143.5
Name: RU, dtype: object
IN
country           India
capital       New Delhi
area              3.286
population         1252
Name: IN, dtype: object
CH
country         China
capital       Beijing
area            9.597
population       1357
Name: CH, dtype: object
SA
country       South Africa
capital           Pretoria
area                 1.221
population           52.98
Name: SA, dtype: object


In the first iteration through the brics DataFrame, the label (lab) is BR and row holds the data of that row as a Pandas Series. Because row is a Series, we can subset the data using square brackets.

In [8]:
for lab, row in brics.iterrows() : 
    print(lab + " : " + row['capital'])

BR : Brasilia
RU : Moscow
IN : New Delhi
CH : Beijing
SA : Pretoria
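
The sketch below repeats this loop on a small DataFrame built inline, so it can be run without brics.csv. It also shows itertuples(), another pandas method for row iteration that yields one namedtuple per row instead of a Series. The inline data is copied from the output above; the list names are assumptions made for this example.

```python
import pandas as pd

# Small inline stand-in for brics.csv (values copied from the output above)
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India"],
        "capital": ["Brasilia", "Moscow", "New Delhi"],
    },
    index=["BR", "RU", "IN"],
)

# iterrows(): a (label, Series) pair on every iteration
rows_version = []
for lab, row in brics.iterrows():
    rows_version.append(lab + " : " + row["capital"])

# itertuples(): one namedtuple per row; the label is available as row.Index
tuples_version = []
for row in brics.itertuples():
    tuples_version.append(row.Index + " : " + row.capital)

print(rows_version)
print(rows_version == tuples_version)
```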


##### Add a Column

We can use the for loop to do more than just print out values. We can add a column, let's say the length of each country name. We will need to blend together a number of different tools. 

In [9]:
for lab, row in brics.iterrows() :
    brics.loc[lab,"name_length"] = len(row["country"])
print(brics)

         country    capital    area  population  name_length
BR        Brazil   Brasilia   8.516      200.40          6.0
RU        Russia     Moscow  17.100      143.50          6.0
IN         India  New Delhi   3.286     1252.00          5.0
CH         China    Beijing   9.597     1357.00          5.0
SA  South Africa   Pretoria   1.221       52.98         12.0


##### apply( )
While the code worked, it's not very efficient, as we are creating a Pandas Series object on every iteration. That is not a big deal on a small DataFrame, but it can become an issue with large datasets. If you want to calculate an entire DataFrame column by applying a function to another column element-wise, it's better to use the apply() method. In this case, you don't even need a for loop.

Here you select the country column from the brics DataFrame and apply the length function (len()) to each element, which creates a new Series that can then be stored in the new column, 'name_length'.

In [18]:
brics['name_length'] = brics['country'].apply(len)
print(brics)

         country    capital    area  population  name_length
BR        Brazil   Brasilia   8.516      200.40            6
RU        Russia     Moscow  17.100      143.50            6
IN         India  New Delhi   3.286     1252.00            5
CH         China    Beijing   9.597     1357.00            5
SA  South Africa   Pretoria   1.221       52.98           12
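
As a self-contained check, the sketch below computes the name lengths on an inline DataFrame with apply(len) and, for comparison, with the pandas string accessor .str.len(). The inline data is copied from the output above, and the names via_apply and via_str are assumptions made for this example.

```python
import pandas as pd

# Inline stand-in for the country column of brics
brics = pd.DataFrame(
    {"country": ["Brazil", "Russia", "India", "China", "South Africa"]},
    index=["BR", "RU", "IN", "CH", "SA"],
)

# apply() calls len on every element of the column
via_apply = brics["country"].apply(len)

# The .str accessor offers the same operation for string columns
via_str = brics["country"].str.len()

brics["name_length"] = via_apply
print(brics)
```

Both calls return a Series with one length per country, so either can be assigned to the new column.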


### Exercise 4

#### Loop over DataFrame (1)
Iterating over a Pandas DataFrame is typically done with the iterrows() method. Used in a for loop, every observation is iterated over and on every iteration the row label and actual row contents are available:
<br>
for lab, row in brics.iterrows() :<br>
> ...<br>

In this and the following exercises you will be working on the cars DataFrame. It contains information on the cars per capita and whether people drive right or left for seven countries in the world.<br>

__Instructions:__
* Write a for loop that iterates over the rows of cars and on each iteration performs two print() calls: one to print out the row label and one to print out all of the row's contents.


In [7]:
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

for lab, row in cars.iterrows() :
    print(lab)
    print(row)

US
cars_per_cap              809
country         United States
drives_right             True
Name: US, dtype: object
AUS
cars_per_cap          731
country         Australia
drives_right        False
Name: AUS, dtype: object
JAP
cars_per_cap      588
country         Japan
drives_right    False
Name: JAP, dtype: object
IN
cars_per_cap       18
country         India
drives_right    False
Name: IN, dtype: object
RU
cars_per_cap       200
country         Russia
drives_right      True
Name: RU, dtype: object
MOR
cars_per_cap         70
country         Morocco
drives_right       True
Name: MOR, dtype: object
EG
cars_per_cap       45
country         Egypt
drives_right     True
Name: EG, dtype: object


#### Loop over DataFrame (2)
The row data that's generated by iterrows() on every run is a Pandas Series. This format is not very convenient to print out. Luckily, you can easily select variables from the Pandas Series using square brackets:<br>
<br>
for lab, row in brics.iterrows() :<br>
> print(row['country'])<br>

__Instructions:__
* Using the iterators lab and row, adapt the code in the for loop such that the first iteration prints out "US: 809", the second iteration "AUS: 731", and so on.
* The output should be in the form "country: cars_per_cap". Make sure to print out this exact string (with the correct spacing).
* You can use str() to convert your integer data to a string so that you can print it in conjunction with the country label.

In [11]:
for lab, row in cars.iterrows() : 
    print(str(lab) + " : " + str(row['cars_per_cap']))

US : 809
AUS : 731
JAP : 588
IN : 18
RU : 200
MOR : 70
EG : 45


#### Add column (1)
In the video, Filip showed you how to add the length of the country names of the brics DataFrame in a new column:<br>
<br>
for lab, row in brics.iterrows() :<br>
> brics.loc[lab, "name_length"] = len(row["country"])<br>

You can do similar things on the cars DataFrame.

__Instructions:__
* Use a for loop to add a new column, named COUNTRY, that contains an uppercase version of the country names in the "country" column. You can use the string method upper() for this.
* To see if your code worked, print out cars. Don't indent this code, so that it's not part of the for loop.

In [12]:
for lab, row in cars.iterrows():
    cars.loc[lab, "COUNTRY"] = row['country'].upper()

print(cars)

     cars_per_cap        country  drives_right        COUNTRY
US            809  United States          True  UNITED STATES
AUS           731      Australia         False      AUSTRALIA
JAP           588          Japan         False          JAPAN
IN             18          India         False          INDIA
RU            200         Russia          True         RUSSIA
MOR            70        Morocco          True        MOROCCO
EG             45          Egypt          True          EGYPT


#### Add column (2)
Using iterrows() to iterate over every observation of a Pandas DataFrame is easy to understand, but not very efficient. On every iteration, you're creating a new Pandas Series.<br>
<br>
If you want to add a column to a DataFrame by calling a function on another column, the iterrows() method in combination with a for loop is not the preferred way to go. Instead, you'll want to use apply().<br>
<br>
Compare the iterrows() version with the apply() version to get the same result in the brics DataFrame:<br>
<br>
for lab, row in brics.iterrows() :<br>
> brics.loc[lab, "name_length"] = len(row["country"])<br>
<br>
brics["name_length"] = brics["country"].apply(len)<br>
We can do a similar thing to call the upper() method on every name in the country column. However, upper() is a method, so we'll need a slightly different approach:

__Instructions:__
* Replace the for loop with a one-liner that uses .apply(str.upper). The call should give the same result: a column COUNTRY should be added to cars, containing an uppercase version of the country names.
* As usual, print out cars to see the fruits of your hard labor.

In [8]:
cars = pd.read_csv('cars.csv', index_col = 0)
cars["COUNTRY"] = cars['country'].apply(str.upper)
print(cars)

     cars_per_cap        country  drives_right        COUNTRY
US            809  United States          True  UNITED STATES
AUS           731      Australia         False      AUSTRALIA
JAP           588          Japan         False          JAPAN
IN             18          India         False          INDIA
RU            200         Russia          True         RUSSIA
MOR            70        Morocco          True        MOROCCO
EG             45          Egypt          True          EGYPT
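
The reason .apply(str.upper) works is that str.upper, accessed on the str type itself, is an ordinary function that takes a string as its argument. The sketch below illustrates this on a small inline Series and compares the result with the pandas string accessor .str.upper(). The Series contents are copied from the cars output above; the variable names are assumptions made for this example.

```python
import pandas as pd

# Calling the method through the type is the same as calling it on the string
print(str.upper("japan") == "japan".upper())

# Small inline stand-in for the country column of cars
countries = pd.Series(["Japan", "India", "Egypt"], index=["JAP", "IN", "EG"])

# apply() passes each element to str.upper
via_apply = countries.apply(str.upper)

# The .str accessor provides the same operation for string data
via_str = countries.str.upper()

print(via_apply.tolist())
```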
