# Python Data Processing

When processing datasets, we almost always want to repeat the same operation *for every element* of the dataset.

In [1]:
balances = [1_000, 2_500, 10_678, -2_000]

In [3]:
decisions = [0, 1, 1, 0]  # 0 = No, 1 = Yes

We could copy/paste,

In [9]:
entry = balances[0]
if entry > 1_500:
    print(entry, "Yes")
    
entry = balances[1]
if entry > 1_500:
    print(entry, "Yes")
    
    
entry = balances[2]
if entry > 1_500:
    print(entry, "Yes")
    
    
entry = balances[3]
if entry > 1_500:
    print(entry, "Yes")

2500 Yes
10678 Yes


This approach isn't sustainable: in general, we won't know how many entries the collection will contain (e.g. consider a database or a file).

Python has a *looping* (repeating) syntax which works on datasets,

In [44]:
for entry in balances:  # entry = balances[index], REPEAT:
    if entry > 1_500:
        print(entry, "Yes")

2500 Yes
10678 Yes
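
As a minimal sketch of why this matters, the loop below reads balances from a file-like object without knowing in advance how many lines it holds. `io.StringIO` stands in for a real file here, and the one-balance-per-line layout is an assumption made for illustration.

```python
import io

# Stand-in for a real file (assumed layout: one balance per line).
# We do not need to know how many lines there are.
fake_file = io.StringIO("1000\n2500\n10678\n-2000\n")

large_balances = []
for line in fake_file:      # one repetition per line, however many there are
    entry = int(line)       # int() ignores the trailing newline
    if entry > 1_500:
        large_balances.append(entry)
        print(entry, "Yes")
```

The body of the loop is the same as before; only the source of the entries has changed.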


In [12]:
balances

[1000, 2500, 10678, -2000]

Consider the example below, which repeats `print()` for every entry in `balances`, 

In [14]:
# b = balances[index]
for b in balances:
    # repeat this code for each entry in balances
    print(b)

1000
2500
10678
-2000


## English Algorithms in Python

Suppose, in English, I want to:

```
FOR EACH ENTRY, CALL IT b,
    IN THE DATASET balances
        REPORT THE VALUE OF b
```
---
```
FOR EACH ENTRY, CALL IT balance,
    IN THE DATASET balances
        REPORT THE balance
        AND WHETHER THE balance IS MORE THAN 100
```
---
```
FOR EACH ENTRY, CALL IT customer_balance,
    IN THE DATASET balances
        REPORT WHETHER THE customer_balance IS POSITIVE
```

Let's write those in Python,

In [15]:
for b in balances:
    print(b)

1000
2500
10678
-2000


In [19]:
for balance in balances:
    if balance > 100:
        print(balance, "is more than 100")

1000 is more than 100
2500 is more than 100
10678 is more than 100


In [45]:
for customer_balance in balances:
    if customer_balance > 0:
        print(customer_balance, "is positive")

1000 is positive
2500 is positive
10678 is positive


## Understanding the Syntax

```python

for TEMPORARY_NAME_OF_ENTRY in EXISTING_DATASET:  # REPEAT:
    CODE_TO_REPEAT(TEMPORARY_NAME_OF_ENTRY)

```

We can loop over lots of different types of collections,

In [22]:
names = ["Alice", "Eve", "Bob"]

In [42]:
for n in names: 
    print(n[0].upper()) # uppercase first letter

A
E
B


In [43]:
prices = [2, 54, 7]

for p in prices:
    print(f"The price is £{p}.00")

The price is £2.00
The price is £54.00
The price is £7.00
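
Both examples above loop over lists. As a rough sketch, the same syntax also works on other built-in collections such as strings, tuples, and dictionaries. The values below are made up for illustration.

```python
visited = []

# A string is a collection of characters
for letter in "Bob":
    visited.append(letter)
    print(letter)

# A tuple behaves just like a list in a loop
for coordinate in (51.5, -0.1):
    visited.append(coordinate)
    print(coordinate)

# Looping over a dictionary visits its keys
ages = {"Alice": 31, "Eve": 27}  # made-up values
for name in ages:
    visited.append(name)
    print(name, ages[name])
```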


## Exercise

### Part 1
Consider the survey below, which asks several questions and records the answers.

In [39]:
questions = [
    "How happy are you working from home (/5)?",
    "How technical is your job (/5)?",
    "To what degree have you saved money (-5, 5)?",
]

In [40]:
answers = []

for q in questions:
    answer = input(q)

    answers.append(answer)

How happy are you working from home (/5)?5
How technical is your job (/5)?4
To what degree have you saved money (-5, 5)?3


In [41]:
answers

['5', '4', '3']

* Modify the above loop to
    * store the `float()` of your answer
        * HINT: use `float()` on each answer
    * print the answer
        * EXTRA: use an f-string (`f""`) to add a little formatting
        * e.g., "Your answer was..."

### Part 2

* write a loop over the `answers` variable 
    * if any answer is more than 3, report "That's Bad!"
    * otherwise, report "That's GOOD!"
    
* HINT:
```python
for ... in ...:
    if ...:
        ...
    else:
        ...
```

## Advanced: Built-In Operations for Looping

Python has some built-in operations that help us when looping in some specific situations,

* `zip()`
* `range()`
* others
    * `enumerate()`


In [47]:
for i in range(0, 10, 2):
    print(i)

0
2
4
6
8


In [49]:
numbers = [10, 23, 13]
colors = ["R", "G", "B"]

for n, c in zip(numbers, colors):
    print(n, c)

10 R
23 G
13 B
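
The same idea can be sketched with the `balances` and `decisions` lists from the start of this section. This assumes the two lists line up position by position (the first decision belongs to the first balance, and so on), which the data above suggests but does not state.

```python
balances = [1_000, 2_500, 10_678, -2_000]
decisions = [0, 1, 1, 0]  # 0 = No, 1 = Yes

labels = []
for balance, decision in zip(balances, decisions):
    # Translate the 0/1 code into a readable label
    if decision == 1:
        label = "Yes"
    else:
        label = "No"
    labels.append(label)
    print(balance, label)
```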


In [51]:
numbers = [10, 23, 13]
colors = ["R", "G", "B"]

for i, n, c in zip(range(0, 3), numbers, colors):
    print(i, n, c)

0 10 R
1 23 G
2 13 B


In [55]:
colors = ["R", "G", "B"]

for i, c in enumerate(colors): # enumerate = range + zip
    print(i, c)

0 R
1 G
2 B
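
Two details of these built-ins are worth a short sketch: `enumerate()` accepts a `start` argument if we want to count from something other than 0, and `zip()` stops as soon as its shortest input runs out. The extra number `99` below is made up to show the second point.

```python
colors = ["R", "G", "B"]

numbered = []
for position, c in enumerate(colors, start=1):  # count from 1 instead of 0
    numbered.append((position, c))
    print(position, c)

# zip() stops at the shortest input, so 99 is never paired with a color
pairs = list(zip([10, 23, 13, 99], colors))
print(pairs)
```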
