# Python Programming: Review
### with ML, DS

Store, Products

What will sell *well* based on location in the store *and* price?

$X : (X_0, X_1)$ : (Location, Price)

In [1]:
X = [
    (0, 1),
    (0, 1.50),
    (1, 1),
    (1, 1.50),
    (2, 1),
    (2, 1.50),
]

In [4]:
len(X)

6

### Regression

$y$ : Quantity

$y$ is the number of items sold per month,

In [3]:
y_orders = [
    10,
    12,
    8,
    6,
    5,
    4
]

Attempting a solution,

$\hat{y}$ is a guess for $y$, 

In [21]:
base = 9
location_weight = -2
price_weight = -1


for (x0, x1), y in zip(X, y_orders):
    yhat = location_weight * x0 + price_weight * x1 + base
    
    print(yhat, y)

8 10
7.5 12
6 8
5.5 6
4 5
3.5 4


Keeping track of the predictions and the errors,

In [37]:
base = 9
location_weight = -2
price_weight = -1


predictions = []
errors = []

for (x0, x1), y in zip(X, y_orders):
    yhat = location_weight * x0 + price_weight * x1 + base
    
    predictions.append(yhat)
    errors.append(abs(y - yhat)) # abs = throw away sign

In [38]:
sum(errors)

10.5
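
To compare several guesses, the loop above can be wrapped in a function that returns the total absolute error. This is only a minimal sketch: `total_abs_error` is a hypothetical helper name, and the second set of weights is an arbitrary guess, not taken from the notes.

```python
# Data repeated from the cells above so the block runs on its own.
X = [(0, 1), (0, 1.50), (1, 1), (1, 1.50), (2, 1), (2, 1.50)]
y_orders = [10, 12, 8, 6, 5, 4]


def total_abs_error(base, location_weight, price_weight):
    """Sum of |y - yhat| over the data (hypothetical helper, not from the notes)."""
    errors = []
    for (x0, x1), y in zip(X, y_orders):
        yhat = location_weight * x0 + price_weight * x1 + base
        errors.append(abs(y - yhat))  # abs = throw away sign
    return sum(errors)


# The guess used in the cells above.
first_guess = total_abs_error(base=9, location_weight=-2, price_weight=-1)

# A second, arbitrary guess (an assumption for illustration only).
second_guess = total_abs_error(base=12, location_weight=-3, price_weight=-2)

print(first_guess, second_guess)
```

A lower total means the guesses sit closer to the observed `y_orders`, so the two numbers can be compared directly.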

### Classification

$y$ : (True, False)

In [75]:
y_like = [
    True,
    True,
    True, 
    False, 
    False,
    False
]

In [76]:
X

[(0, 1), (0, 1.5), (1, 1), (1, 1.5), (2, 1), (2, 1.5)]

"Decision-Tree" algorithmic solution,

In [129]:
predictions_like = []
errors_like = []


for (x_loc, x_price), y in zip(X, y_like):
    yhat = None
    
    if x_loc <= 0:
        yhat = True
    elif x_price -1 <= 0:
        yhat = True
    else:
        yhat = False
    
    predictions_like.append(yhat)
    errors_like.append( int(y != yhat) )
        

In [130]:
predictions_like

[True, True, True, False, True, False]

In [131]:
sum(errors_like)

1

#### Aside: Using logical operators,

In [138]:
for (x_loc, x_price), y in zip(X, y_like):
    yhat = (x_loc <= 0) or (x_price -1 <= 0)
    
    print(y, yhat)
    
    # Q. this formula produces the same predictions, why?
    # yhat = bool((x_loc <= 0) + (x_price -1 <= 0))

True True
True True
True True
False False
False True
False False


### Classification: Score-based solution,

In [87]:
score_like = {'location': 3, 'price': 2, 'base': -7}

In [88]:
y_like

[True, True, True, False, False, False]

In [89]:
for (x0, x1) in X:
    print(x0 * score_like['location']  + x1 * score_like['price'])

2
3.0
5
6.0
8
9.0


In [90]:
predictions_score = []

In [91]:
for (x0, x1) in X:
    score = x0 * score_like['location']  + x1 * score_like['price'] + score_like['base']
    
    predictions_score.append( score < 0 )

In [92]:
y_like

[True, True, True, False, False, False]

In [93]:
predictions_score

[True, True, True, True, False, False]
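
The score-based predictions can be checked against `y_like` in the same way as the decision-tree ones. A minimal sketch, with the two lists copied from the outputs above; `errors_score` and `wrong_rows` are names introduced here for illustration.

```python
# Lists copied from the cell outputs above so the block runs on its own.
y_like = [True, True, True, False, False, False]
predictions_score = [True, True, True, True, False, False]

# 1 where the prediction disagrees with the observation, 0 otherwise.
errors_score = [int(y != yhat) for y, yhat in zip(y_like, predictions_score)]

# Positions in X where the score-based rule gets it wrong.
wrong_rows = [i for i, e in enumerate(errors_score) if e]

print(sum(errors_score), wrong_rows)
```

With these weights the only disagreement is at index 3, the product at location 1 priced at 1.50.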

## Reflections

$(X, y) \rightarrow \mathcal{Alg} \rightarrow (w, b) \rightarrow \hat{y} = \hat{f}(X; w, b)$

A prediction function $\hat{f}$ produces guesses, $\hat{y}$, for $y$. It uses the parameters $(w, b)$ directly to compute the quantity of interest,

$\hat{y} = \hat{f}(X) = w_0x_0 + w_1x_1 + b$ 

In the case of classification, we can reuse this same prediction function to produce a *score*, which we then *test* to give us `True` or `False`,

$\hat{y} = \hat{f}(X) = (w_0x_0 + w_1x_1 + b) < 0 $ 
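
The two formulas above can be sketched with one shared function. This assumes the weights used earlier in the notes; `linear_score` is a hypothetical name introduced here, not something defined in the original cells.

```python
# Data repeated from the cells above so the block runs on its own.
X = [(0, 1), (0, 1.50), (1, 1), (1, 1.50), (2, 1), (2, 1.50)]


def linear_score(x, w, b):
    """Compute w_0*x_0 + w_1*x_1 + b for one row x = (x_0, x_1)."""
    return w[0] * x[0] + w[1] * x[1] + b


# Regression: the score itself is the prediction (weights from the regression cells).
yhat_orders = [linear_score(x, w=(-2, -1), b=9) for x in X]

# Classification: the same score, then a test against 0 (weights from score_like).
yhat_like = [linear_score(x, w=(3, 2), b=-7) < 0 for x in X]

print(yhat_orders)
print(yhat_like)
```

Only the parameters $(w, b)$ and the final test differ between the two tasks; the arithmetic in the middle is identical.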

Other approaches will use $(w, b)$ differently,

Approximately, recall that the formula for the decision tree is something like,

```python
    yhat = (x_loc <= 0) or (x_price -1 <= 0)
```

$\hat{y} = \hat{f}(X) = \mathbb{1}(x_0 + w_0 \le 0) + \mathbb{1}(x_1 + w_1 \le 0)$

Where $\mathbb{1}(C)$ means `1` if $C$ is `True`, `0` otherwise. For the tree above, $w_0 = 0$ and $w_1 = -1$, and a nonzero sum is read as `True`.
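
The indicator form can be sketched in Python as follows. This is an illustration under stated assumptions: `indicator` and `votes` are hypothetical names, and the thresholds $w_0 = 0$, $w_1 = -1$ are read off the decision-tree conditions used earlier.

```python
# Data repeated from the cells above so the block runs on its own.
X = [(0, 1), (0, 1.50), (1, 1), (1, 1.50), (2, 1), (2, 1.50)]


def indicator(condition):
    """Return 1 if the condition holds, 0 otherwise."""
    return 1 if condition else 0


# Thresholds matching (x_loc <= 0) and (x_price - 1 <= 0).
w0, w1 = 0, -1

# Each row collects one "vote" per condition that holds.
votes = [indicator(x0 + w0 <= 0) + indicator(x1 + w1 <= 0) for x0, x1 in X]

# Any nonzero number of votes counts as True, which mirrors `or`.
predictions = [bool(v) for v in votes]

print(votes)
print(predictions)
```

The resulting list matches `predictions_like` from the decision-tree cell, because a sum of indicators is nonzero exactly when at least one condition is true.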

---

The principal question that remains is: which algorithms, $\mathcal{Alg}$, and which prediction models, $\hat{f}$, can fill in this pipeline?

$(X, y) \rightarrow \mathcal{Alg} \rightarrow (w, b) \rightarrow \hat{y} = \hat{f}(X; w, b)$

## Exercise

* Loop over `X_age_location` and `y_holiday` using `zip`, as above.

* Define an `age_weight`, a `location_weight` and a `base`.

* Use the linear formula above (age * age_weight ...) to compute a prediction.

* Revise your weights/base until your computed prediction "is close to" the observed `y_holiday` entry.

In [97]:
X_age_location = [
    (18, 100), # age, distance from London
    (20, 5), 
    (34, 15),
    (52, 20),
]

y_holiday = [ # amount spent on holiday
    100,
    200,
    500, 
    550
]

HINT: see the first `zip` loop above.

---

#### Stretch
* Using three nested loops, find a good set of weights/base:

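```python
attempts = []

for a_weight in range(-10, 10):
    for l_weight in range(-10, 10):
        errors = []  # start a fresh error list for each pair of weights
        for x, y in ...:
            yhat = ... a_weight/10 ... + l_weight/10 ... + ...
            errors.append(...)  # error for this row, as in the regression cells
        total_error = sum(errors)

        # tuples compare on their first element first, so min() picks the lowest error
        attempts.append((total_error, (a_weight, l_weight)))

min(attempts) # entry with the lowest error
```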