In [None]:
import babypandas as bpd
import numpy as np
nba = bpd.read_csv('data/nba_salaries.csv')
nba = nba.assign(SALARY=nba.get("'15-'16 SALARY")).drop(columns=["'15-'16 SALARY"])

## When NOT to use `for`-loops ❌

### To create ranges

Instead, use `np.arange`.

In [None]:
np.arange(1, 21)

There are ways to create such arrays using `for`-loops, but they're almost always more complicated than this.

Here's an example of what **not ❌** to do:

In [None]:
# BAD! ❌
numbers = np.array([])
for i in np.arange(1, 21):
    numbers = np.append(numbers, i)
numbers

### To perform some operation for every element of an array/Series

Instead, use the array, Series, and DataFrame methods from earlier in the quarter.

In [None]:
nba

Suppose I want to determine the number of players whose `'SALARY'` was above 11 million.

We learned how to do this in the second week of the class, well before we knew about `for`-loops:

In [None]:
nba[nba.get('SALARY') > 11].shape[0]

I _could_ do this with a `for`-loop:

In [None]:
# BAD! ❌
num_players = 0
for salary in np.array(nba.get('SALARY')):
    if salary > 11:
        num_players = num_players + 1
num_players

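But again, using a `for`-loop here is a **bad** idea: the first approach is shorter to write and runs much faster, because the comparison is applied to the whole column at once instead of one element at a time.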
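As a minimal sketch of why the two approaches agree, the cell below counts values above 11 both ways on a small, made-up array of salaries. The values are hypothetical (they are not taken from `nba_salaries.csv`), and `np.count_nonzero` on a Boolean array is used here as a NumPy-only stand-in for the DataFrame query above.

```python
import numpy as np

# Hypothetical salaries in millions; these values are made up for illustration.
salaries = np.array([18.7, 2.1, 11.0, 12.5, 0.9, 25.0])

# Array approach: compare every element at once, then count the Trues.
vectorized_count = np.count_nonzero(salaries > 11)

# Loop approach: the same logic, one element at a time (the pattern to avoid).
loop_count = 0
for salary in salaries:
    if salary > 11:
        loop_count = loop_count + 1

print(vectorized_count, loop_count)
```

Both counts match; the array version simply gets there in one line.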

### To sample repeatedly from an array

Instead, provide a second argument to `np.random.choice`.

In [None]:
moves = np.array(['Rock', 'Paper', 'Scissors'])

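Suppose we want to randomly pick a move from `moves` 50 times. With one argument, `np.random.choice` picks a single move; with a second argument of `50`, it picks 50 moves at once.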

In [None]:
np.random.choice(moves)

In [None]:
np.random.choice(moves, 50)

In [None]:
# What proportion of these moves were 'Paper'?
np.count_nonzero(np.random.choice(moves, 50) == 'Paper') / 50

We could instead write a `for`-loop that calls `np.random.choice(moves)` 50 times, but passing the second argument is much simpler and more efficient.
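For comparison, here is a rough sketch of what the loop-based version might look like next to the single call. The names `picks_loop` and `picks_direct` are introduced here for illustration only, and the empty array is given a string type as an assumption to keep the appended values as text.

```python
import numpy as np

moves = np.array(['Rock', 'Paper', 'Scissors'])

# Loop version (not recommended): one call to np.random.choice per pick.
picks_loop = np.array([], dtype=str)  # Empty array of strings (an assumption of this sketch)
for i in np.arange(50):
    picks_loop = np.append(picks_loop, np.random.choice(moves))

# Preferred version: ask for all 50 picks in a single call.
picks_direct = np.random.choice(moves, 50)

print(len(picks_loop), len(picks_direct))
```

Both arrays hold 50 random moves; the second takes one line and avoids repeatedly rebuilding the array.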

#### Aside: `np.random.multinomial`

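- `np.random.choice` samples from an array of pre-determined options, whereas `np.random.multinomial` samples from a categorical distribution described by a list of probabilities.
- To randomly choose between `'Rock'`, `'Paper'`, and `'Scissors'` 50 times, we can also use `np.random.multinomial`. Instead of the 50 individual choices, it returns the number of times each category was chosen.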

In [None]:
np.random.multinomial(50, [1/3, 1/3, 1/3])
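To make the shape of that output concrete, the sketch below draws counts with `np.random.multinomial` and, separately, builds comparable counts from `np.random.choice`. The two are independent random samples, so the numbers will generally differ; the mapping of positions to `'Rock'`, `'Paper'`, `'Scissors'` is an assumption we impose by the order of the probabilities.

```python
import numpy as np

moves = np.array(['Rock', 'Paper', 'Scissors'])

# np.random.multinomial returns how many times each category came up,
# in the same order as the probabilities, not the individual picks.
counts = np.random.multinomial(50, [1/3, 1/3, 1/3])

# Comparable counts built from np.random.choice (a separate random sample,
# so these numbers will usually differ from `counts`).
picks = np.random.choice(moves, 50)
counts_from_choice = np.array([
    np.count_nonzero(picks == 'Rock'),
    np.count_nonzero(picks == 'Paper'),
    np.count_nonzero(picks == 'Scissors'),
])

print(counts, counts_from_choice)
```

In both cases the three counts add up to 50.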

## So when do we need `for`-loops? 🤔

### To perform an experiment multiple times

- Suppose one "experiment" entails flipping a coin 50 times and determining the number of heads.
- We don't need a `for`-loop to run one experiment.
- If we want to run 100,000 experiments (i.e. 100,000 repetitions), we'd need a `for`-loop.
    - **The majority of `for`-loops you will write in DSC 10 will be for this reason and this reason only.**

In [None]:
def one_simulation():
    flips = np.random.choice(['Heads', 'Tails'], 50)
    return np.count_nonzero(flips == 'Heads')

In [None]:
one_simulation()

In [None]:
# GOOD! ✅
results = np.array([])
for i in np.arange(100000):
    result = one_simulation() # The number of heads in 50 coin flips
    results = np.append(results, result)

In [None]:
results

In [None]:
bpd.DataFrame().assign(results=results).plot(kind='hist', density=True, ec='w');

Note that we **could have** done this with a "nested" `for`-loop. With that said, you will almost never need to write one in this class.

In [None]:
for i in np.arange(1, 6):
    for j in np.arange(1, 6):
        print(i, 'x', j, '=', i * j)

In [None]:
# BAD! ❌
results = np.array([])
for n in np.arange(100000):
    number_heads = 0
    for j in np.arange(50):
        flip = np.random.choice(['Heads', 'Tails']) # Flip a coin just once
        if flip == 'Heads':
            number_heads = number_heads + 1
    results = np.append(results, number_heads)