<hr style="height:.9px;border:none;color:#333;background-color:#333;" />
<hr style="height:.9px;border:none;color:#333;background-color:#333;" />
<br><br><br>
<h1>Python for Business Analytics</h1>
<em>A Nontechnical Approach for Nontechnical People</em><br><br>
<em><strong>Custom Edition for Hult International Business School</strong></em><br>

Written by Chase Kusterer - Faculty of Analytics <br>
Hult International Business School <br>
https://github.com/chase-kusterer<br><br><br><br><br>

<hr style="height:.9px;border:none;color:#333;background-color:#333;" />
<hr style="height:.9px;border:none;color:#333;background-color:#333;" />
<br>

<h1><u>Chapter 8: while Loops and Making Assumptions</u></h1>

As mentioned in <strong>Chapter 7: for Loops and Basic Data Manipulation</strong>, a <strong>while loop</strong> can be thought of as an extension of a conditional statement. More specifically, a <strong>while loop</strong> is like a hybrid between a conditional statement and a for loop. Like a for loop, it iterates. Like a conditional statement, it checks a condition, and it continues running as long as that condition is met. This coding structure enables programmers to accomplish a wide array of tasks. For example, <strong>while loops</strong> can be used to:

* take a list of all the students in a cohort and break them into teams of four
* allow a user to reenter their password if their first attempt was invalid
* calculate how many NBA seasons it took for Michael Jordan to score 25,000 points

<br>
In addition to the fundamentals of <strong>while loops</strong>, this chapter will cover some syntax that is also useful in other coding structures, namely for loops and user-defined functions (covered in the next chapter). For example, in many situations a programmer needs to <strong>break</strong> out of a loop entirely before it has finished iterating. At other times, it may be necessary to skip over an item in an iterable and <strong>continue</strong> the loop by iterating on the next item. As you may have guessed, the syntaxes that enable such functionality are <a href="https://docs.python.org/3/tutorial/controlflow.html#break-and-continue-statements-and-else-clauses-on-loops">break</a> and <a href="https://docs.python.org/3/tutorial/controlflow.html#break-and-continue-statements-and-else-clauses-on-loops">continue</a>. Both are essential in building a solid foundation in Python.
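<br><br>
As a minimal sketch of these two syntaxes, the loop below skips a missing value with <strong>continue</strong> and stops at an end-of-input marker with <strong>break</strong>. The list <em>scores</em> and its values are hypothetical and were chosen only for illustration.

```python
# hypothetical data: None marks a missing value, -1 marks the end of input
scores = [88, None, 92, 75, -1, 60]

# collecting valid scores
valid_scores = []

for score in scores:

    # skipping missing values and moving on to the next item
    if score is None:
        continue

    # leaving the loop entirely at the end-of-input marker
    if score == -1:
        break

    valid_scores.append(score)

print(valid_scores)   # [88, 92, 75]
```

Notice that 60 is never collected, as the loop has already stopped by the time it would have been reached.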
<br><br>
<strong>Note:</strong> As mentioned, <strong>while loops</strong> will run until a specified condition is no longer met. If we are not careful, we may get stuck in a loop that does not stop running. Before moving forward, you may want to go back to <strong>Chapter 1: Setting Up for Success</strong> to refresh on what to do in such a situation. As an example, notice the difference between <em>Codes 8.0.1</em> and <em>8.0.2</em>. Without the final line in the body of the <strong>while loop</strong>, the loop's condition is always met and the code runs on into infinity (<em>Code 8.0.2</em>). <strong>Make sure you understand how to interrupt your kernel before running this code!</strong>

<br>

In [None]:
## Code 8.0.1 ##

# adapted from Code 7.1.2

# declaring x
x = 5

# while loop
while x > 0:
    print(x)
    x -= 1

In [None]:
## Code 8.0.2 ##

#!# WARNING! The while loop will run forever.

# declaring x
x = 5

# while loop with final line commented out
while x > 0:
    print(x)
#   x -= 1
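
<br>
While experimenting, one possible safety net against a runaway loop is a counter that forces an exit after a fixed number of iterations. The sketch below is a variation on <em>Code 8.0.2</em>. The names <em>iterations</em> and <em>max_iterations</em> are assumptions introduced for illustration and are not part of the original code.

```python
# declaring x
x = 5

# hypothetical safety net: stop after a fixed number of iterations
iterations     = 0
max_iterations = 10

# same loop as Code 8.0.2 (x never changes), plus a guard
while x > 0:
    print(x)
    iterations += 1

    # forcing an exit once the guard is reached
    if iterations >= max_iterations:
        print("Stopping: iteration limit reached.")
        break
```

A guard like this is a debugging aid, not a substitute for writing a condition that eventually evaluates to <em>False</em>.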

<br><hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>

<h2>8.1 The Fundamentals of <em>while loops</em></h2>

As mentioned in the introduction to this chapter, <strong>while loops</strong> can be utilized to accomplish tasks such as taking a list of all the students in a cohort and breaking them into teams of four. To exemplify this, let's assume our cohort had a total of eight students, as in <em>Code 8.1.1</em>. Let's also assume that teams need to be random and that no student is allowed to be on more than one team. In order to accomplish our task, we need to develop code that:

* randomly selects a student and puts them on a team
* removes the student from the selection pool so that they do not end up on more than one team (sampling without replacement)
* fills each team to exactly four members
* keeps iterating until every student has been placed on a team
<br><br>

We will start by importing the random package so that we can utilize the method <a href="https://docs.python.org/3/library/random.html#functions-for-sequences">choice</a> in our code. This method is designed to randomly choose one element from a population. Given our requirements, a drawback is that each call to <strong>random.choice</strong> is independent of the last. Calling it repeatedly on an unchanged list amounts to sampling with replacement, meaning there is a chance that a student will be selected more than once.

Unfortunately, there is no optional argument in <strong>random.choice</strong> that enables sampling without replacement. However, the problem of selecting a student more than once can be avoided through a clever design of our code. What if, after a student is selected in an iteration of our loop, they were removed from the <em>students</em> list? This way, there would be no chance of the same student being selected in the next iteration of the loop, as the student would no longer be available for selection. This can be achieved by applying the <strong>.remove()</strong> method on <em>students</em>.
<br><br>
Also note that at first, the condition of the <strong>while loop</strong> may seem unclear. Essentially, this condition is telling Python to keep looping as long as there is something to iterate over in <em>students</em>. In fact, the syntax:
<br><br>

~~~
while students:
~~~

<br>
Could also be written as:
<br><br>

~~~
while len(students) > 0:
~~~

<br>
Both syntaxes will achieve the same result, although the first is much more common. Also, note that once the loop has iterated over every item in <em>students</em>, the condition that there is still something left to iterate over evaluates to <em>False</em> and the loop stops running.
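<br><br>
The equivalence of the two conditions can be checked directly. The sketch below uses a hypothetical list named <em>queue</em> to show that an empty list evaluates to <em>False</em> while a list with at least one item evaluates to <em>True</em>.

```python
# hypothetical list for illustration
queue = ['a', 'b', 'c']

# a non-empty list is treated as True, an empty list as False
print(bool(queue))   # True
print(bool([]))      # False

# counting iterations until the list is empty
loops = 0

while queue:
    queue.pop()      # removing the last item
    loops += 1

print(loops)         # 3
print(queue)         # []
```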
<br><br>
Finally, each time <em>Code 8.1.1</em> is run, different teams are created. This does not violate any of the requirements for our task, but it makes it difficult to replicate our results. If our goal was to avoid this, we could employ the use of <strong>random.seed()</strong>, as explained in <strong>Chapter 4: Numbers, Comparisons, and Randomness</strong>.

In [None]:
## Code 8.1.1 ##

# importing random
import random

# list of people
students = ['Neil', 'Ariel', 'Alex', 'Cristine', 
            'Andy', 'Alena', 'Ross', 'Isabel']

team_1 = []
team_2 = []

# while loop
while students:
    person = random.choice(students)

    if len(team_1) < 4:
        team_1.append(person)
    
    elif len(team_1) >= 4:
        team_2.append(person)
    
    else:
        print("Something went wrong.")
        
    # removing so students don't get repeated
    students.remove(person)

# printing teams
print(team_1)
print(team_2)
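
<br>
As an aside to the discussion of sampling without replacement, the random package also offers <strong>random.sample</strong>, which draws unique elements in a single call. The sketch below is an alternative to <em>Code 8.1.1</em>, not a replacement for it. The names <em>team_size</em>, <em>shuffled</em>, and <em>teams</em> are assumptions introduced for illustration, and the seed value is arbitrary.

```python
import random

# fixing the seed so results can be replicated (arbitrary value)
random.seed(8)

students = ['Neil', 'Ariel', 'Alex', 'Cristine',
            'Andy', 'Alena', 'Ross', 'Isabel']

team_size = 4

# sampling every student exactly once, in random order
shuffled = random.sample(students, k = len(students))

teams = []

# slicing off one team per iteration until no one is left
while shuffled:
    teams.append(shuffled[:team_size])
    shuffled = shuffled[team_size:]

# printing teams
for team in teams:
    print(team)
```

Because the teams are stored in a list of lists, this version would also work for a larger cohort without declaring a new object for each team. If the cohort size were not a multiple of four, the final team would simply be smaller.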

<br><hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>

<h2>8.2 <em>while True</em> and <em>break</em></h2>

Sometimes, it makes sense to allow a loop to run on into infinity. This technique becomes very useful in situations such as developing code to allow a user to reenter their password if their first attempt was invalid (user input does not match the stored password). Using a stand-alone conditional statement such as the one below will only permit a user to make a single input attempt, as after evaluating whether or not <em>pwd_attempt</em> and <em>password</em> match, Python will move on to the next line of code formatted at Column 0.
<br><br>

~~~
if pwd_attempt == password:
    [do something]

elif pwd_attempt != password:
    [do something else]

else:
    [catch bugs]
~~~

<br>
However, by wrapping a <strong>while True</strong> loop around this code, a user will have an infinite number of tries to correctly input their password. To reiterate, <strong>a while loop will run until a condition is no longer met</strong> (i.e., the condition evaluates to <em>False</em>). Since the condition of this loop is simply <em>True</em>, it can never evaluate to <em>False</em>, and the loop will run indefinitely. Given that our goal is to stop looping when a user enters the correct password, we need to tell Python to <strong>break</strong> out of the loop when this happens. This can be coded as follows:
<br><br>

~~~
while True:
    if pwd_attempt == password:
        break

    elif pwd_attempt != password:
        [do something else]
    
    else:
        [catch bugs]
~~~

<br>
<em>Code 8.2.1</em> exemplifies the use of these structures to create a basic check to see if a user's input matches a stored password.

<br>

In [None]:
## Code 8.2.1 ##

# creating a password
password = 'please open the door'


# while True
while True:
    pwd_attempt = input("Please enter your password.\n> ")

    if pwd_attempt == password:
        print('\nThat is correct. You may enter.\n')
        input('< Press enter to continue. >\n')
        break

    elif pwd_attempt != password:
        print('\nThat is not the correct password.\n')
        input('< Press enter to try again. >\n')
    
    else:
        print("Something went wrong.")

<br><hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>

<h2>8.3 <em>while loops</em> with specific conditions</h2>

Although we have met the requirement of allowing a user to have multiple password entry attempts, allowing unlimited attempts is dangerous, as it leaves a user's account open to malicious attacks such as repeated password guessing. Thus, it would be wise to modify our code so that it only allows a user a specific number of tries. In order to achieve this, the condition of the original <strong>while loop</strong>:
<br><br>

~~~
while True:
    [do something]
~~~

<br>
can be altered using a comparison operator (covered in <strong>Chapter 4: Numbers, Comparisons, and Randomness</strong>). In other words, the <em>'True'</em> in <strong>while True</strong> can be replaced with something such as <em>'password_attempts > 0'</em>. For example, if a user were to be allowed only three attempts, <em>Code 8.2.1</em> could be modified to include an object with a stored value of three. Then, the loop could diminish this value by one each time a user inputs an incorrect password. This has been done in <em>Code 8.3.1</em>.

<br>

In [None]:
## Code 8.3.1 ##

# adapted from Code 8.2.1

# creating a password
password = 'please open the door'

# setting password attempts to three
password_attempts = 3

# exiting the loop after three attempts
while password_attempts > 0:
    pwd_attempt = input("Please enter your password.\n> ")

    if pwd_attempt == password:
        print('\nThat is correct. You may enter.\n')
        input('< Press enter to continue. >\n')
        break

    # diminishing password_attempts
    elif pwd_attempt != password:
        password_attempts -= 1
        
        print(f"""
That is not the correct password.
You have {password_attempts} attempt(s) remaining.\n""")
    
    else:
        print("Something went wrong.")
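
<br>
To trace the logic of <em>Code 8.3.1</em> without typing at a prompt, the sketch below replaces <strong>input()</strong> with a hypothetical list of entries. The names <em>simulated_attempts</em> and <em>access_granted</em> are assumptions introduced for illustration.

```python
password = 'please open the door'
password_attempts = 3

# hypothetical user entries standing in for input()
simulated_attempts = ['open sesame', 'please open', 'please open the door']

# recording whether or not the user got in
access_granted = False

while password_attempts > 0:

    # taking the next simulated entry
    pwd_attempt = simulated_attempts.pop(0)

    if pwd_attempt == password:
        access_granted = True
        break

    # diminishing password_attempts after an incorrect entry
    password_attempts -= 1
    print(f"Incorrect. {password_attempts} attempt(s) remaining.")

print(access_granted)
```

Storing the outcome in an object such as <em>access_granted</em> makes it possible for later code to tell the difference between a loop that ended on a correct password and one that ran out of attempts.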

<br>
<h3>Looping over Michael Jordan's NBA Career</h3>

To further exemplify <strong>while loops</strong> with specific conditions, let's turn our attention to basketball legend <a href="https://www.basketball-reference.com/players/j/jordami01.html">Michael Jordan</a>. According to <a href="https://www.basketball-reference.com/">basketball-reference.com</a>, Jordan played in over 1,000 NBA games across 15 seasons, and in 11 of those seasons, he led the league in total points scored. He also made 14 All-Star appearances and won the league MVP award 5 times. Additionally, he spent the <em>'93-'94</em> season playing Minor League Baseball for the Birmingham Barons (labeled as <em>'DNP'</em>), and went into retirement for three seasons between 1998 and 2001 (labeled as <em>'Retired'</em>). Each sublist in <em>Code 8.3.2</em> represents one season from '84-'85 through '02-'03 and has the following format:
<br><br>

~~~
[ SEASON, TOTAL POINTS SCORED, LEAGUE SCORING LEADER Y/N ]
~~~

<br>
First, let's load this information into our working environment and utilize it to determine how many seasons it took for Jordan to surpass 25,000 points.
<br><br>

In [None]:
## Code 8.3.2 ##

# declaring stats list
jordan_stats = [["'84-'85", 2313, 'Y'],
                ["'85-'86", 408,  'N'],
                ["'86-'87", 3041, 'Y'],
                ["'87-'88", 2868, 'Y'],
                ["'88-'89", 2633, 'Y'],
                ["'89-'90", 2753, 'Y'],
                ["'90-'91", 2580, 'Y'],
                ["'91-'92", 2404, 'Y'],
                ["'92-'93", 2541, 'Y'],
                ["'93-'94", 'DNP', 'DNP'],
                ["'94-'95", 457,  'N'],
                ["'95-'96", 2491, 'Y'],
                ["'96-'97", 2431, 'Y'],
                ["'97-'98", 2357, 'Y'],
                ["'98-'99", 'Retired', 'Retired'],
                ["'99-'00", 'Retired', 'Retired'],
                ["'00-'01", 'Retired', 'Retired'],
                ["'01-'02", 1375, 'N'],
                ["'02-'03", 1640, 'N']]


# looping over results
for stat in jordan_stats:
    print(stat)

<br>
The long solution to this task would be to sum points in each season one-by-one until we reached the desired amount:
<br><br>

~~~
jordan_stats[0][1] + jordan_stats[1][1] + jordan_stats[2][1] ...
~~~

<br>

However, this is improper for a number of reasons:

1. This is a tedious copy/paste approach. Copy/pasting is inefficient and prone to bugs (a programmer would need to remember to update the code after a new line has been pasted). Forgetting to update just one line of code would lead to a bug that may go unnoticed (the code will likely run without throwing an error).
<br>

2. What if we wanted to change our code so that it calculated something else, such as how many points Jordan scored in seasons where he also led the league in scoring? This slight alteration to our requirements would require us to significantly rewrite our code.
<br>

3. Let's say our requirements changed again and we wanted to calculate Michael Jordan's total career points. However, let's also say Jordan wasn't retired. In other words, what if new lists were continually being added to <em>jordan_stats</em>? This would require continual updates to our copy/paste solution.

<br>
Given the above, it is highly beneficial to use loops. First, this would significantly alleviate concerns of accidental copy/paste bugs. Second, if our requirements changed, we may be able to take advantage of other coding structures, such as a conditional statement in the body of the loop. Finally, if the <em>jordan_stats</em> list were continually growing, our loop could adjust to its new length. In other words, it makes no difference if Jordan keeps playing as the loop could be programmed to continue iterating until it has run out of things to iterate over. Before developing our code, however, let's take a moment to discuss the assumptions involved in our task.
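<br><br>
To make the contrast concrete, the sketch below totals points both ways on an excerpt of the first five seasons from <em>Code 8.3.2</em>. The names <em>jordan_excerpt</em>, <em>manual_total</em>, and <em>loop_total</em> are assumptions introduced for illustration.

```python
# excerpt: first five seasons from Code 8.3.2
jordan_excerpt = [["'84-'85", 2313, 'Y'],
                  ["'85-'86", 408,  'N'],
                  ["'86-'87", 3041, 'Y'],
                  ["'87-'88", 2868, 'Y'],
                  ["'88-'89", 2633, 'Y']]

# copy/paste approach: one term per season
manual_total = (jordan_excerpt[0][1] + jordan_excerpt[1][1] +
                jordan_excerpt[2][1] + jordan_excerpt[3][1] +
                jordan_excerpt[4][1])

# loop approach: works for any number of seasons
loop_total = 0

for season, points, lead_scorer in jordan_excerpt:
    loop_total += points

print(manual_total)   # 11263
print(loop_total)     # 11263
```

If a sixth season were appended to the excerpt, the loop would pick it up without any changes, whereas the copy/paste version would need a new term.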
<br>

<br><hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>

<h2>8.4 Making Assumptions about Data</h2>

Before making our calculation, we should clean up our data as this would simplify the coding for our analysis. In seasons where Jordan did not play in the NBA, total points scored is coded as a string. Given our goal, it seems unfair to consider these seasons in our calculation, and thus they should be removed. In this, we are making an assumption, and <strong>all assumptions should be documented.</strong> Another analyst could approach the same problem and feel it is more appropriate to include seasons where Jordan did not play in the NBA. The rationale for such a decision could be argued from a number of viewpoints, such as:

* This calculation should be made as the total number of seasons between Jordan's first and the one in which he surpassed 25,000 points.
* Jordan chose not to play basketball in the '93-'94 season even though he was capable of doing so.
* Even though Jordan retired, he came back from retirement. Therefore, those seasons should be counted.

<br>
The value of an analysis stretches far beyond the sciences, and in many cases, it is much more of an art. Given this, each analysis requires substantial thought regarding the problem at hand, and such considerations lead to various assumptions that make each analytical perspective unique. Without documentation, it is incredibly difficult to understand an analyst's thought process. Moreover, it becomes challenging to trust the findings and actionable insights that resulted from such work. Assumptions need to be documented, whether embedded within scripts, in a document or project management tool, or even in a notebook that you keep next to you while you are working. Although this takes time up front, it will save you countless hours in the long run, especially when working with other analysts or reviewing an analysis that you conducted several months ago.
<br><br>
<em>Code 8.4.1</em> is designed to create a new list of lists that excludes seasons where Jordan did not play in the NBA. The aforementioned assumption has been embedded at the top of the code block as a triple-quoted string.
<br><br>

In [None]:
## Code 8.4.1 ##

"""
Assumptions:
    Calculations should only include seasons where Jordan played.
"""

# creating an empty list
jordan_stats_2 = []


# writing a loop on original stats lists
for season, points, lead_scorer in jordan_stats:
    
    # appending new list if points is an integer
    if type(points) == int:
        jordan_stats_2.append([season, points, lead_scorer])


# writing a loop to print new stats lists one-by-one
for stats in jordan_stats_2:
    print(stats)

<br>
<h4>Developing our Solution</h4>
In order to calculate how many seasons it took for Jordan to score more than 25,000 points, we first need to develop a method to iterate over total points in each list of <em>jordan_stats_2</em>. This can be accomplished with the use of a for loop, as covered in <strong>Chapter 7: for Loops and Basic Data Manipulation</strong>. Our goal is to stop iteration after accumulating a certain number of total points, and for this we have multiple options. More specifically, we may choose to fill the body of the for loop with either a conditional statement or a <strong>while loop</strong>, as both could solve this task. Since conditional statements were already covered in <strong>Chapter 6: Conditional Statements and Controlling Input</strong>, we will build a solution using this method first. <em>Code 8.4.2</em> has been left open for this task. Make sure to:

* iterate over <em>jordan_stats_2</em>
* stop iteration after accumulating at least 25,000 points
* use a conditional statement inside the body of a for loop (don't use a <strong>while loop</strong>)
* avoid hard coding wherever feasible
<br><br>

Below is a skeleton (i.e., an outline) to help get you started:
<br><br>

~~~
for _______ in jordan_stats_2:

    if ________:
~~~

<br><br>
After developing a conditional statement solution, compare your results to the sample solution. When you are ready, develop a solution using a <strong>while loop</strong> in <em>Code 8.4.3</em>, which has also been left open. Below is another skeleton to help get you started:
<br><br>

~~~
for _______ in jordan_stats_2:

    while ________:
~~~

<br>

In [None]:
## Code 8.4.2 ##

# open coding block (for loop + conditional statement)



In [None]:
## Sample Solution 8.4.2 ##

"""
Assumptions:
    Calculations should only include seasons where Jordan played.
"""

# declaring objects
total_points  = 0
total_seasons = 0
point_limit   = 25000 # avoiding hard coding

# writing the loop
for season, points, lead_scorer in jordan_stats_2:

    # writing the conditional
    if total_points < point_limit:
        total_points  += points
        total_seasons += 1
    
    elif total_points >= point_limit:
        break
    
    else:
        print('Something went wrong.')


# printing the results
print(f"""
{'*' * 40}

It took {total_seasons} seasons for Jordan to score
more than {point_limit} points (scoring {total_points}).

{'*' * 40}
""")

<br><hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>

In [None]:
## Code 8.4.3 ##

# open coding block (for loop + while loop)


In [None]:
## Sample Solution 8.4.3 ##

"""
Assumptions:
    Calculations should only include seasons where Jordan played.
"""

# declaring objects
total_points  = 0
total_seasons = 0
point_limit   = 25000 # avoiding hard coding


# writing the outer loop
for season, points, lead_scorer in jordan_stats_2:
    
    # writing the inner loop
    while total_points < point_limit:
        total_points  += points
        total_seasons += 1
        break


# printing the results
print(f"""
{'*' * 40}

It took {total_seasons} seasons for Jordan to score
more than {point_limit} points (scoring {total_points}).

{'*' * 40}
""")

<br><hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>
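
For comparison with <em>Sample Solution 8.4.3</em>, the same question can also be answered with a single <strong>while loop</strong> and no for loop. This is a sketch under the same assumption (only seasons where Jordan played are counted). The list <em>season_points</em> simply restates the point totals from <em>jordan_stats_2</em>, and using <em>total_seasons</em> as an index is a design choice made here for illustration.

```python
"""
Assumptions:
    Calculations should only include seasons where Jordan played.
"""

# point totals for seasons played (restated from jordan_stats_2)
season_points = [2313, 408, 3041, 2868, 2633, 2753, 2580, 2404,
                 2541, 457, 2491, 2431, 2357, 1375, 1640]

# declaring objects
total_points  = 0
total_seasons = 0
point_limit   = 25000

# looping until the limit is reached or the data runs out
while total_points < point_limit and total_seasons < len(season_points):
    total_points  += season_points[total_seasons]
    total_seasons += 1

print(total_seasons, total_points)   # 12 26920
```

The second half of the condition keeps the loop from reading past the end of the list if the point limit were never reached.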

<h3>Benefits of using a <em>while</em> loop</h3>

The bodies of <em>Codes 8.4.2</em> and <em>8.4.3</em> look very similar. As mentioned earlier in this chapter: <strong>a while loop can be thought of as an extension of a conditional statement.</strong> Also, as expected, both codes lead to the same result. The solution utilizing the <strong>while loop</strong>, however, uses fewer lines of code. Additionally, this solution appears to be easier to extend if there were a change in our requirements. For example, let's assume that it was decided that points should only be accumulated in seasons where Jordan led the league in scoring. This change is very easy to implement given the <strong>while loop</strong> solution, as only a single line of code needs to be modified:
<br><br>

~~~
while total_points < point_limit and lead_scorer == 'Y':
~~~

<br>
This has been implemented in <em>Code 8.4.4</em>. Notice how the assumptions and print statement have also been updated.

<br>

In [None]:
## Code 8.4.4 ##

# adapted from Code 8.4.3

"""
Assumptions:
    Calculations should only include seasons where Jordan:
      - played
      - led the league in points
"""

# declaring objects
total_points  = 0
total_seasons = 0
point_limit   = 25000 # avoiding hard coding


# writing the outer loop
for season, points, lead_scorer in jordan_stats_2:

    # MODIFIED inner loop
    while total_points < point_limit and lead_scorer == 'Y':
        total_points  += points
        total_seasons += 1
        break
        
# MODIFIED the print statement
print(f"""
{'*' * 40}

In seasons where he led the league in scoring,
Jordan surpassed {point_limit} points in {total_seasons}
seasons (scoring {total_points}).

{'*' * 40}
""")

<br>
<h4>Adjusting the Conditional Statement Solution</h4>
Adjusting the conditional statement solution is a bit more daunting, as the same modification would cause the <em>else</em> clause to run in every season where Jordan did not lead the league in scoring and the point limit has not yet been reached, even though nothing has actually gone wrong. This can be observed in <em>Code 8.4.5</em>.
<br><br>

In [None]:
## Code 8.4.5 ##

# adapted from Code 8.4.2

# declaring objects
total_points  = 0
total_seasons = 0
point_limit   = 25000 # avoiding hard coding

# writing the loop
for season, points, lead_scorer in jordan_stats_2:

    # MODIFIED conditional
    if total_points < point_limit and lead_scorer == 'Y':
        total_points  += points
        total_seasons += 1
        print('All is well!') # added print statement for clarity
    
    elif total_points >= point_limit:
        break
    
    else:
        print('Something went wrong.')

<br>
By rewriting our code to include a nested conditional statement, we can attain the functionality we desire. If you need a refresher on nested conditionals, please see <strong>Chapter 6: Conditional Statements and Controlling Input</strong>. <em>Code 8.4.6</em> includes this change and also introduces a new syntax: <strong>continue</strong>. The role of this syntax is to terminate the current iteration of a loop and move on to the next iteration. In other words, it stops what's currently happening and <em>continues</em> by starting the next iteration. Since we expect <em>lead_scorer</em> to be equal to <em> 'N' </em> in some iterations, it is a good practice to write an <em>elif</em> statement to <strong>continue</strong> when this is the case. If we do not do this, the <em>else</em> clause will run, falsely indicating that something went wrong.
<br><br>

In [None]:
## Code 8.4.6 ##

# adapted from Code 8.4.5

"""
Assumptions:
    Calculations should only include seasons where Jordan:
      - played
      - led the league in points
"""

# declaring objects
total_points  = 0
total_seasons = 0
point_limit   = 25000 # avoiding hard coding

# writing the loop
for season, points, lead_scorer in jordan_stats_2:

    # writing the conditional
    if total_points < point_limit:
        
        if lead_scorer == 'Y':
            total_points  += points
            total_seasons += 1
    
        # applying continue
        elif lead_scorer == 'N':
            continue
        
        else:
            print('Something went wrong.')
    
    elif total_points >= point_limit:
        break
    
    else:
        print('Something went wrong.')


# printing the results
print(f"""
{'*' * 40}

In seasons where he led the league in scoring,
Jordan surpassed {point_limit} points in {total_seasons}
seasons (scoring {total_points}).

{'*' * 40}
""")

<br>
As expected, the results when using a conditional statement are the same as when using a <strong>while</strong> loop. As before, the <strong>while</strong> loop uses fewer lines of code. One drawback, however, is that it does not contain an <em>else</em> clause to help catch bugs. Python does allow an <em>else</em> clause on a loop, but it behaves differently than it does in a conditional statement: it runs once when the loop finishes without reaching a <strong>break</strong>, so it cannot act as a catch-all for unexpected values. Thus, we need to be diligent in testing our code to ensure it does what it is intended to do. Otherwise, potential errors due to things that should have been controlled for may go undetected.
<br><br>
To illustrate, let's assume we wanted to run our code on every player in the history of the NBA. One challenge we may run into is that what we call the NBA today is actually the result of a merger that took place in 1976. From 1967 to 1976, two nationally-scoped basketball leagues existed in the United States (the National Basketball Association and the <a href="https://en.wikipedia.org/wiki/American_Basketball_Association">American Basketball Association</a>). Thus, each season played within these years has two scoring leaders. If our code were designed to allow only one scoring leader per season, it may throw an error. We may also run into a situation where a player was traded and thus has two records for a given season. This may prove to be a challenge when calculating how many seasons the player took to attain a certain number of points. The good news is that Python contains more advanced coding syntaxes that are incredibly useful in such situations (covered in the next chapter).
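<br><br>
Surprises like these are easier to handle when they are caught before the analysis runs. As a minimal sketch of such a check, the code below scans a small, hypothetical list and collects any season whose scoring-leader flag is not <em>'Y'</em> or <em>'N'</em>. The names <em>sample_stats</em>, <em>valid_flags</em>, and <em>unexpected</em> are assumptions introduced for illustration, and the lowercase <em>'y'</em> is a deliberate data-entry error.

```python
# hypothetical records; the lowercase 'y' is a deliberate data-entry error
sample_stats = [["'84-'85", 2313, 'Y'],
                ["'85-'86", 408,  'N'],
                ["'86-'87", 3041, 'y']]

valid_flags = ['Y', 'N']
unexpected  = []

# flagging records with an unrecognized scoring-leader value
for season, points, lead_scorer in sample_stats:

    if lead_scorer not in valid_flags:
        unexpected.append(season)

# reporting the results of the check
if unexpected:
    print(f"Check these seasons before analysis: {unexpected}")

else:
    print("All scoring-leader flags are valid.")
```

Without a check like this, a loop whose condition tests for <em>'Y'</em> would silently skip the mislabeled season.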

<br><hr style="height:.9px;border:none;color:#333;background-color:#333;" /><br>

<h2>8.5 Summary</h2>

A <strong>while loop</strong> is a useful coding structure that allows programmers to accomplish several tasks. It can be thought of as a hybrid between a conditional statement and a for loop in the sense that it: 1) iterates, and 2) runs until a condition or set of conditions is no longer met. The syntax <strong>break</strong> stops a loop from iterating further, and the syntax <strong>continue</strong> ends a loop's current iteration and moves on to the next one.

<br>

~~~
             _                  __  __     _                                _ 
  /\  /\__ _| |_ ___      ___  / _|/ _|   | |_ ___      _   _  ___  _   _  / \
 / /_/ / _` | __/ __|    / _ \| |_| |_    | __/ _ \    | | | |/ _ \| | | |/  /
/ __  / (_| | |_\__ \   | (_) |  _|  _|   | || (_) |   | |_| | (_) | |_| /\_/ 
\/ /_/ \__,_|\__|___/    \___/|_| |_|      \__\___/     \__, |\___/ \__,_\/   
                                                        |___/                 
~~~

<br>