# [Elements of AI: Building AI](https://buildingai.elementsofai.com/)

# Getting started with AI

In [30]:
# Install necessary packages within the current environment
!python -m pip install numpy
!python -m pip install -U scikit-learn




## II. Optimization

### Exercise 1: Listing pineapple routes

### -- Advanced

Imagine that you've been assigned the task to plan the route of a container ship loaded with pineapples. The ship starts in Panama, loaded with delicious Fairtrade pineapples. There are four other ports, New York, Casablanca, Amsterdam, and Helsinki, where pineapple-craving citizens are eagerly waiting. The ship must visit each of the four destination ports exactly once, but the order in which each port is visited is free. The goal is to minimize the carbon emissions, which means that a shorter route is better than a longer one.

To solve this problem, it is enough to list all the possible routes that start from Panama and visit each of the other ports exactly once, calculate the carbon emissions of each route, and print out the one with the least emissions.

Before we try to find the optimal route, let's start by listing all the alternative routes. After all, it wouldn't make sense to stop at any port more than once.

Write a program that takes a list (in this case, the names of the ports) and prints out all the possible orderings of them. The mathematical term for such orderings is a permutation. Note that your program should work for an input list of any length. The order in which the permutations are printed doesn't matter, but they should all begin with Panama (PAN).

The format of the output should be such that each permutation is printed on its own row as one string with the port names separated by spaces. You can use the join function as follows: `print(' '.join([portnames[i] for i in route]))`.
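
One way to generate the orderings without a library is recursion: fix the route so far, then try each remaining port as the next stop. The following is only a minimal sketch of that idea; the function name `list_routes` and the three-port example are assumptions made for illustration.

```python
def list_routes(route, ports, names):
    """Return every route that starts with `route` and visits all of `ports` once."""
    if not ports:
        # no ports left to visit: the route is complete
        return [" ".join(names[i] for i in route)]
    routes = []
    for k, port in enumerate(ports):
        remaining = ports[:k] + ports[k + 1:]  # every port except the chosen one
        routes.extend(list_routes(route + [port], remaining, names))
    return routes


names = ["PAN", "AMS", "CAS"]
for line in list_routes([0], [1, 2], names):
    print(line)
```

With n destination ports the function returns n! routes, so the list grows very quickly as ports are added.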

In [31]:
from itertools import permutations as perm

portnames = ["PAN", "AMS", "CAS", "NYC", "HEL"]


def permutations(route, ports):
    # append every ordering of the remaining ports to the fixed start of the route
    for rest in perm(ports):
        print(" ".join([portnames[i] for i in route + list(rest)]))


# start from Panama (index 0) and permute the remaining ports
permutations([0], list(range(1, len(portnames))))

PAN AMS CAS NYC HEL
PAN AMS CAS HEL NYC
PAN AMS NYC CAS HEL
PAN AMS NYC HEL CAS
PAN AMS HEL CAS NYC
PAN AMS HEL NYC CAS
PAN CAS AMS NYC HEL
PAN CAS AMS HEL NYC
PAN CAS NYC AMS HEL
PAN CAS NYC HEL AMS
PAN CAS HEL AMS NYC
PAN CAS HEL NYC AMS
PAN NYC AMS CAS HEL
PAN NYC AMS HEL CAS
PAN NYC CAS AMS HEL
PAN NYC CAS HEL AMS
PAN NYC HEL AMS CAS
PAN NYC HEL CAS AMS
PAN HEL AMS CAS NYC
PAN HEL AMS NYC CAS
PAN HEL CAS AMS NYC
PAN HEL CAS NYC AMS
PAN HEL NYC AMS CAS
PAN HEL NYC CAS AMS



### Exercise 2: Pineapple route emissions

### -- Advanced

Having listed the alternatives, next we can calculate the carbon emissions for each of them.
Modify the code so that it finds the route with minimum carbon emissions and prints it out. Again, the program should work for any number of ports. You can assume that the distances between the ports are given in an array of the appropriate size so that the distance between ports i and j is found in `D[i][j]`.
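
As a worked example of the emission calculation, the sketch below sums the leg distances of a single route and multiplies by the emission factor. It uses the distance matrix and the 20 g per km per ton factor from the exercise's data block; the helper name `route_emissions` is an assumption made for illustration.

```python
# distances in km between PAN, AMS, CAS, NYC, HEL (copied from the exercise data)
D = [
    [0, 8943, 8019, 3652, 10545],
    [8943, 0, 2619, 6317, 2078],
    [8019, 2619, 0, 5836, 4939],
    [3652, 6317, 5836, 0, 7825],
    [10545, 2078, 4939, 7825, 0],
]
co2 = 0.020  # kg of CO2 per km per metric ton, as assumed in the exercise


def route_emissions(route, D, co2):
    """Sum the distances of consecutive legs and convert them to emissions."""
    return co2 * sum(D[a][b] for a, b in zip(route[:-1], route[1:]))


route = [0, 3, 2, 1, 4]  # PAN NYC CAS AMS HEL
# legs: 3652 + 5836 + 2619 + 2078 = 14185 km
print("%.1f kg" % route_emissions(route, D, co2))  # 283.7 kg
```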

In [32]:
from itertools import permutations as perm

portnames = ["PAN", "AMS", "CAS", "NYC", "HEL"]

# https://sea-distances.org/
# nautical miles converted to km

D = [
    [0, 8943, 8019, 3652, 10545],
    [8943, 0, 2619, 6317, 2078],
    [8019, 2619, 0, 5836, 4939],
    [3652, 6317, 5836, 0, 7825],
    [10545, 2078, 4939, 7825, 0],
]

# https://timeforchange.org/co2-emissions-shipping-goods
# assume 20g per km per metric ton (of pineapples)

co2 = 0.020

# DATA BLOCK ENDS

# these variables are initialised to nonsensical values
# your program should determine the correct values for them
smallest = 1000000
bestroute = [0, 0, 0, 0, 0]


def permutations(route, ports):
    global smallest, bestroute

    for r in perm(ports):
        r = route + list(r)
        # em = co2 * (D[r[0]][r[1]] + D[r[1]][r[2]] + D[r[2]][r[3]] + D[r[3]][r[4]])
        em = co2 * sum(D[i][j] for i, j in zip(r[:-1], r[1:]))
        if em < smallest:
            smallest = em
            bestroute = r


def main():
    # search all routes that start from Panama (index 0)
    permutations([0], list(range(1, len(portnames))))

    # print the best route and its emissions
    print(" ".join([portnames[i] for i in bestroute]) + " %.1f kg" % smallest)


main()

PAN NYC CAS AMS HEL 283.7 kg



## III. Hill climbing

### Exercise 3: Reach the highest summit

### -- Advanced

Let the elevation at each point on the mountain be stored in array h of size 100. The elevation at the leftmost point is thus stored in `h[0]` and the elevation at the rightmost point is stored in `h[99]`.

The following program starts at a random position and keeps going to the right until Venla can no longer go up. However, perhaps the mountain is a bit rugged which means it's necessary to look a bit further ahead.

Edit the program so that Venla doesn't stop climbing as long as she can go up by moving up to five steps either left or right. If there are multiple choices within five steps that go up, any one of them is good. To check how your climbing algorithm works in action, you can plot the results of your hill climbing using the Plot button. The summit will be marked with a blue triangle.
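
The look-ahead idea can be sketched as follows. This variant always jumps to the highest point within reach, which is one of the allowed choices, not the only valid one. The function name `climb_lookahead`, the `reach` parameter, and the small hand-made elevation list are assumptions made for illustration.

```python
def climb_lookahead(x, h, reach=5):
    """Climb until no point within `reach` steps left or right is higher."""
    while True:
        lo = max(0, x - reach)
        hi = min(len(h), x + reach + 1)  # +1 because range() excludes its end
        best = max(range(lo, hi), key=lambda i: h[i])
        if h[best] <= h[x]:
            return x  # nothing higher within reach: this is our summit
        x = best


# a tiny hand-made mountain: peaks at indices 1, 6 and 12
h = [0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 3]
print(climb_lookahead(1, h))  # 6: the peak at index 12 is six steps away from 6
print(climb_lookahead(7, h))  # 12
```

The first call shows that looking five steps ahead still finds only a local optimum when the next higher peak lies farther away.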

In [33]:
import math
import random  # just for generating random mountains

# generate random mountains
w = [0.05, random.random() / 3, random.random() / 3]
h = [
    1.0
    + math.sin(1 + x / 0.6) * w[0]
    + math.sin(-0.3 + x / 9.0) * w[1]
    + math.sin(-0.2 + x / 30.0) * w[2]
    for x in range(100)
]


def climb(x, h):
    # keep climbing until we've found a summit
    summit = False
    steps_max = 5  # range to check

    # Edit the program so that Venla doesn't stop climbing as long as she can go up by moving up to five steps either left or right.
    while not summit:
        summit = True
        # range() excludes its end point, so add 1 to include x + steps_max
        for x_new in range(max(0, x - steps_max), min(len(h), x + steps_max + 1)):
            if h[x_new] > h[x]:
                x = x_new  # here is higher, go here
                summit = False  # and keep going
    return x


def main(h):
    # start at a random place
    x0 = random.randint(1, 98)
    x = climb(x0, h)

    return x0, x


main(h)

(81, 72)


### Exercise 4: Probabilities

### -- Advanced

Write a program that prints "I love" followed by one word: the additional word should be 'dogs' with 80% probability, 'cats' with 10% probability, and 'bats' with 10% probability.
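
An alternative to comparing one random number against cumulative thresholds is the standard library's `random.choices`, which draws from a list with given weights. The sketch below is one possible approach; the helper name `favourite_animal` is an assumption made for illustration.

```python
import random
from collections import Counter


def favourite_animal():
    """Draw one word with probabilities 80 % / 10 % / 10 %."""
    return random.choices(["dogs", "cats", "bats"], weights=[0.8, 0.1, 0.1], k=1)[0]


print("I love", favourite_animal())

# sanity check: over many draws the frequencies should approach the weights
counts = Counter(favourite_animal() for _ in range(10000))
print({word: counts[word] / 10000 for word in ("dogs", "cats", "bats")})
```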

In [34]:
import random

x = random.random()
if x < 0.8:
    favourite = "dogs"
elif x < 0.9:
    favourite = "cats"
else:
    favourite = "bats"

print("I love", favourite)

I love dogs



### Exercise 5: Warm-up Temperature

### -- Advanced

**Simulated Annealing: the math**

The probability of accepting the new solution with score `S_new` when the current solution has score `S_old` is given by the formula:

`prob = exp(-(S_old - S_new)/T)`

where T is the temperature. (Remember that the temperature is an abstract concept that ideally starts high and gradually decreases towards zero.) The function `exp(x)` is the exponential function, which can also be written mathematically as `e^x` (Euler's number e ≈ 2.71828 raised to the power x).

Suppose the current solution has score S_old = 150 and you try a small modification to create a new solution with score S_new = 140. In the greedy solution, this new solution wouldn't be accepted because it would mean a decrease in the score. In simulated annealing, the new solution is accepted with a certain probability as explained above.

Modify the accept_prob function so that it returns the probability of accepting the new state using simulated annealing. The program should take the two score values (the current and the new) and the temperature value as arguments.
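
To get a feel for the formula, the sketch below evaluates the acceptance probability for the example scores S_old = 150 and S_new = 140 at a few temperatures. The function name `accept_probability` and the choice to return 0 at zero temperature are assumptions made for illustration.

```python
import math


def accept_probability(s_old, s_new, temperature):
    """Acceptance probability of a candidate solution in simulated annealing."""
    if s_new > s_old:
        return 1.0  # improvements are always accepted
    if temperature <= 0:
        return 0.0  # assumption: no worsening moves once the system is frozen
    return math.exp(-(s_old - s_new) / temperature)


# the same 10-point drop becomes less and less acceptable as T decreases
for t in (100, 10, 1):
    print("T = %3d: prob = %.4f" % (t, accept_probability(150, 140, t)))
```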

In [35]:
import random
from numpy import exp

# from math import e


def accept_prob(S_old, S_new, T):
    # acceptance probability in simulated annealing: better solutions are
    # always accepted, worse ones with probability exp(-(S_old - S_new) / T)
    if S_new > S_old:
        return 1.0
    if T == 0:
        return 0.0  # at zero temperature no worsening move is accepted
    return exp(-(S_old - S_new) / T)


# the above function will be used as follows. this is shown just for
# your information; you don't have to change anything here
def accept(S_old, S_new, T):
    if random.random() < accept_prob(S_old, S_new, T):
        print(True)
    else:
        print(False)


### Exercise 6: Simulated Annealing

### -- Intermediate

1D simulated annealing: modify the program below to use simulated annealing instead of plain hill climbing. In simulated annealing the probability of accepting a solution that lowers the score is given by `prob = exp(-(S_old - S_new)/T)`. Setting the temperature T and gradually decreasing it can be done in many ways, some of which lead to better outcomes than others. A good choice in this case is for example: `T = 2*max(0, ((steps-step*1.2)/steps))**3`.

The code below uses the plain hill-climbing strategy to only go up towards a peak. As you can see, the hill-climbing strategy tends to get stuck in local optima.
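
The suggested temperature schedule is easier to read once a few values are printed. The sketch below wraps the formula from the exercise in a helper; the function name `temperature` is an assumption made for illustration.

```python
def temperature(step, steps):
    """Cooling schedule from the exercise: starts at 2 and reaches 0 early."""
    return 2 * max(0, (steps - step * 1.2) / steps) ** 3


steps = 5000
for step in (0, 1000, 2500, 4000, 5000):
    print("step %4d: T = %.4f" % (step, temperature(step, steps)))
```

Because of the factor 1.2, the temperature hits zero after about steps/1.2 ≈ 4167 iterations, so the last sixth of the run behaves like plain hill climbing. This is why the T = 0 case has to be handled before dividing by T.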

In [36]:
import math, random  # just for generating random mountains
import numpy as np

n = 10000  # size of the problem: number of possible solutions x = 0, ..., n-1

# generate random mountains
def mountains(n):
    h = [0] * n
    for i in range(50):
        c = random.randint(20, n - 20)
        w = random.randint(3, int(math.sqrt(n / 5))) ** 2
        s = random.random()
        h[max(0, c - w) : min(n, c + w)] = [
            h[i] + s * (w - abs(c - i)) for i in range(max(0, c - w), min(n, c + w))
        ]

    # scale the height so that the lowest point is 0.0 and the highest peak is 1.0
    low = min(h)
    high = max(h)
    h = [y - low for y in h]
    h = [y / (high - low) for y in h]
    return h


h = mountains(n)

# start at a random place
x0 = random.randint(1, n - 1)
x = x0

# keep climbing for 5000 steps
steps = 5000


def main(h, x):
    n = len(h)
    # the climbing starts here
    for step in range(steps):
        # this is our temperature to be used for simulated annealing
        # it starts large and decreases with each step. you don't have to change this
        T = 2 * max(0, ((steps - step * 1.2) / steps)) ** 3

        # let's try randomly moving (max. 1000 steps) left or right
        # making sure we don't fall off the edge of the world at 0 or n-1
        # the height at this point will be our candidate score, S_new
        # while the height at our current location will be S_old
        x_new = random.randint(max(0, x - 1000), min(n - 1, x + 1000))

        if h[x_new] > h[x]:
            x = x_new  # the new position is higher, go there
        else:
            if T != 0 and random.random() <= np.exp(-(h[x] - h[x_new]) / T):
                x = x_new
            # if T == 0:
            #    pass
            # elif random.random() <= np.exp(-(h[x] - h[x_new])/T):
            #    x = x_new
    return x


x = main(h, x0)
print("Ended up at %d, highest point is %d" % (x, np.argmax(h)))

Ended up at 4958, highest point is 883



### -- Advanced

Let's use simulated annealing to solve a simple two-dimensional optimization problem. The following code runs 50 optimization tracks in parallel (at the same time). It currently only looks around the current solution and only accepts moves that go up. Modify the program so that it uses simulated annealing.

Remember that the probability of accepting a solution that lowers the score is given by `prob = exp(-(S_old - S_new)/T)`. Remember to also adjust the temperature so that it decreases as the simulation goes on, and to handle the T=0 case correctly.

Your goal is to ensure that on the average, at least 30 of the optimization tracks find the global optimum (the highest peak).

In [37]:
import numpy as np
import random

N = 100  # size of the problem is N x N
steps = 3000  # total number of iterations
tracks = 50

# generate a landscape with multiple local optima
def generator(x, y, x0=0.0, y0=0.0):
    return (
        np.sin((x / N - x0) * np.pi)
        + np.sin((y / N - y0) * np.pi)
        + 0.07 * np.cos(12 * (x / N - x0) * np.pi)
        + 0.07 * np.cos(12 * (y / N - y0) * np.pi)
    )


x0 = np.random.random() - 0.5
y0 = np.random.random() - 0.5
h = np.fromfunction(np.vectorize(generator), (N, N), x0=x0, y0=y0, dtype=int)
peak_x, peak_y = np.unravel_index(np.argmax(h), h.shape)

# starting points
x = np.random.randint(0, N, tracks)
y = np.random.randint(0, N, tracks)


def main():
    global x
    global y

    for step in range(steps):
        # add a temperature schedule here
        T = 2 * max(0, ((steps - step * 1.2) / steps)) ** 3
        # update solutions on each search track
        for i in range(tracks):
            # try a new solution near the current one
            x_new = np.random.randint(max(0, x[i] - 2), min(N, x[i] + 2 + 1))
            y_new = np.random.randint(max(0, y[i] - 2), min(N, y[i] + 2 + 1))
            S_old = h[x[i], y[i]]
            S_new = h[x_new, y_new]

            # change this to use simulated annealing
            if S_new > S_old:
                x[i], y[i] = x_new, y_new  # new solution is better, go there
            else:
                if T != 0 and random.random() <= np.exp(-(S_old - S_new) / T):
                    x[i], y[i] = x_new, y_new

    # number of tracks that found the peak
    print(sum([x[j] == peak_x and y[j] == peak_y for j in range(tracks)]))


main()

39



# Dealing with uncertainty

## I. Probability fundamentals

### Exercise 7: Flip the coin

### -- Advanced

Write a program that generates 10000 random zeros and ones where the probability of one is p1 and the probability of zero is 1-p1 (hint: `np.random.choice([0,1], p=[1-p1, p1], size=10000)`), counts the number of occurrences of 5 consecutive ones ("11111") in the sequence, and outputs this number as a return value. Check that for p1 = 2/3, the count is close to 10000 x (2/3)^5 ≈ 1316.9.

In [38]:
import numpy as np


def generate(p1):
    # generate 10000 random zeros and ones
    # where the probability of one is p1
    seq = np.random.choice([0, 1], p=[1 - p1, p1], size=10000)
    return seq


def count(seq):
    # insert code to return the number of occurrences of 11111 in the sequence
    seq = "".join(map(str, seq))
    tofind = "11111"
    found_num = 0
    i = seq.find(tofind)
    while i != -1:
        found_num += 1
        i = seq.find(tofind, i + 1)
    return found_num


def main(p1):
    seq = generate(p1)
    return count(seq)


print(main(2 / 3))

1352



*The probability of "11111" at any given position in the sequence can be calculated as (2/3)^5 ≈ 0.13169. The number of occurrences is close to 10000 times this: 1316.9. To be more precise, the expected number of occurrences is about 0.13169 x 9996 ≈ 1316.3, because there are only 9996 places for a subsequence of length five in a sequence of 10000. The actual number will usually (in fact, with over 99% probability) be somewhere between 1230 and 1404. We check the solution allowing for an even wider margin that covers 99.99% of the cases.*
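
The arithmetic above, and an alternative way to count the runs, can be sketched as follows. Summing each window of five values with `np.convolve` and checking for a sum of 5 counts overlapping occurrences, just like the string search. The function names `expected_count` and `count_runs` are assumptions made for illustration.

```python
import numpy as np


def expected_count(p1, n=10000, run=5):
    """Expected number of length-`run` windows that contain only ones."""
    return (n - run + 1) * p1 ** run


def count_runs(seq, run=5):
    """Count (possibly overlapping) windows of `run` consecutive ones."""
    window_sums = np.convolve(seq, np.ones(run, dtype=int), mode="valid")
    return int(np.sum(window_sums == run))


print(round(expected_count(2 / 3), 1))  # 1316.3

# six ones in a row contain two windows of five; the last five ones add a third
print(count_runs(np.array([1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1])))  # 3
```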

### Exercise 8: Fishing in the Nordics

### -- Advanced

Suppose we also happen to know the gender of the lottery winner. Here are the same OECD statistics as above broken down by gender:

| Country | Population | Male fishers | Female fishers | Fishers (total) |
|:--------|:-----------|:-------------|:---------------|:----------------|
| Denmark | 5,615,000  | 1,822        | 69             | 1,891           |
| Finland | 5,439,000  | 2,575        | 77             | 2,652           |
| Iceland | 324,000    | 3,400        | 400            | 3,800           |
| Norway  | 5,080,000  | 11,291       | 320            | 11,611          |
| Sweden  | 9,609,000  | 1,731        | 26             | 1,757           |
| TOTAL   | 26,067,000 | 20,819       | 892            | 21,711          |

Write a function that uses the above numbers and tries to guess the nationality of the winner when we know that the winner is a fisher and their gender (either female or male).

The argument of the function should be the gender of the winner ('female' or 'male'). The return value of the function should be a pair (country, probability) where country is the most likely nationality of the winner and probability is the probability of the country being the nationality of the winner.
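
Since the winner is known to be a fisher of a given gender, the probability of each nationality is that country's share of all fishers of that gender. The sketch below computes the shares for female fishers from the table; the helper name `nationality_probabilities` is an assumption made for illustration.

```python
countries = ["Denmark", "Finland", "Iceland", "Norway", "Sweden"]
female_fishers = [69, 77, 400, 320, 26]


def nationality_probabilities(fishers):
    """P(country | fisher of the given gender) for every country."""
    total = sum(fishers)
    return {country: n / total for country, n in zip(countries, fishers)}


probs = nationality_probabilities(female_fishers)
best = max(probs, key=probs.get)
print(best, round(probs[best], 4))  # Iceland 0.4484, i.e. 400 / 892
```

Note that the populations are not needed here: once we condition on the winner being a fisher, only the fisher counts matter.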

In [39]:
countries = ["Denmark", "Finland", "Iceland", "Norway", "Sweden"]
populations = [5615000, 5439000, 324000, 5080000, 9609000]
male_fishers = [1822, 2575, 3400, 11291, 1731]
female_fishers = [69, 77, 400, 320, 26]


def guess(winner_gender):
    """guess the nationality of the winner
    when we know that the winner is a fisher and their gender
    """
    # P(nat | male fisher) = male_fishers(nat) / male_fishers(total)
    # P(nat | female fisher) = female_fishers(nat) / female_fishers(total)

    if winner_gender == "female":
        fishers = female_fishers
    else:
        fishers = male_fishers

    # pick the country with the largest share of fishers of the given gender
    total = sum(fishers)
    best_country = None
    biggest = 0.0
    for country, fisher in zip(countries, fishers):
        frac = fisher / total * 100  # probability in percent
        if frac > biggest:
            best_country = country
            biggest = frac
    return (best_country, biggest)


def main():
    country, fraction = guess("male")
    print(
        "if the winner is male, my guess is he's from %s; probability %.2f%%"
        % (country, fraction)
    )
    country, fraction = guess("female")
    print(
        "if the winner is female, my guess is she's from %s; probability %.2f%%"
        % (country, fraction)
    )


main()

if the winner is male, my guess is he's from Norway; probability 54.23%
if the winner is female, my guess is she's from Iceland; probability 44.84%



## II. The Bayes Rule

### Exercise 9: Block or not

### -- Advanced

Let's suppose you have a social media account on Instagram, Twitter, or some other platform. (Just in case you don't, it doesn't matter. We'll fill you in with the relevant information.) You check your account and notice that you have a new follower – this means that another user has decided to start following you to see things that you post. You don't recognize the person, and their username (or "handle" as it's called) is a little strange: John37330190. You don't want to have creepy bots following you, so you wonder. To decide whether you should block the new follower, you decide to use the Bayes rule!

Suppose we know the probability that a new follower is a bot. You'll be writing a program that takes this value as an input. For now, let's just call this value P(bot). You'll also be given the probability that the username of a bot account includes an 8-digit number, which we'll call P(8-digits | bot), as well as the same probability for human (non-bot) accounts, P(8-digits | human).

To use the Bayes rule, we'll also need to know the probability that a new follower (can be either bot or human) has an 8-digit number in their username, P(8-digits). The nice thing is that we can calculate P(8-digits) from the above information. The formula is as follows:

`P(8-digits) = P(8-digits | bot) x P(bot) + P(8-digits | human) x P(human)`

Remember that you can get P(human) simply as 1–P(bot), since these are the only options. (We consider business and other accounts as "human" as long as they aren't bots.)

Write a program that takes as input the probability of a follower being a bot (pbot), the probability of a bot having a username with 8 digits (p8_bot), and the probability of a human having a username with 8 digits (p8_human). The values for these inputs are free for you to choose, but they have to be probabilities, so they have to be between 0 and 1.

Using the numbers you give, the program should calculate P(8-digits) and then use it and the Bayes rule to calculate and print out the probability of the new follower being a bot, P(bot | 8-digits):

`P(bot | 8-digits) =  P(8-digits | bot) x P(bot) / P(8-digits)`.
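
As a worked example, take P(bot) = 0.1, P(8-digits | bot) = 0.8 and P(8-digits | human) = 0.05. Then P(8-digits) = 0.8 x 0.1 + 0.05 x 0.9 = 0.125, and P(bot | 8-digits) = 0.08 / 0.125 = 0.64. The sketch below reproduces this calculation; the function name `posterior_bot` is an assumption made for illustration, and it returns the value instead of printing it.

```python
def posterior_bot(pbot, p8_bot, p8_human):
    """P(bot | 8-digits) from the prior and the two conditional probabilities."""
    p8 = p8_bot * pbot + p8_human * (1 - pbot)  # total probability of 8 digits
    return p8_bot * pbot / p8  # Bayes rule


print(round(posterior_bot(0.1, 0.8, 0.05), 2))  # 0.64
```

Even though only 10 % of followers are assumed to be bots, the 8-digit username raises the probability to 64 %, because such usernames are 16 times more common among bots than among humans in this example.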

In [40]:
def bot8(pbot, p8_bot, p8_human):
    # P(8-digits) = P(8-digits | bot) x P(bot) + P(8-digits | human) x P(human)
    p8 = p8_bot * pbot + p8_human * (1 - pbot)
    # P(bot | 8-digits) =  P(8-digits | bot) x P(bot) / P(8-digits)
    pbot_8 = p8_bot * pbot / p8
    print(pbot_8)


# you can change these values to test your program with different values
pbot = 0.1
p8_bot = 0.8
p8_human = 0.05

bot8(pbot, p8_bot, p8_human)

0.64



## III. Naive Bayes classifier

### Exercise 10: Naive Bayes classifier

### -- Advanced

We have two dice in our desk drawer. One is a normal, plain die with six sides such that each of the sides comes up with equal 1/6 probability. The other one is a loaded die that also has six sides but gives the outcome 6 on every second roll on average, with the other five sides being equally probable.

Thus with the first, normal die the probabilities of each side are the same, 0.167 (or 16.7 %). With the second, loaded die, the probability of 6 is 0.5 (or 50 %) and each of the other five sides has probability 0.1 (or 10 %).

The following program gets as its input the choice of the die and then simulates a sequence of ten rolls.

Your task: starting from the odds 1:1, use the naive Bayes method to update the odds after each outcome to decide which of the dice is more likely. Edit the function bayes so that it returns True if the most likely die is the loaded one, and False otherwise. Remember to be careful with the indices when accessing list elements!
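
Each roll multiplies the odds by a likelihood ratio: a 6 gives 0.5 / (1/6) = 3 in favor of the loaded die, and any other outcome gives 0.1 / (1/6) = 0.6. The sketch below applies this to the sequence from the sample run of this exercise (five sixes and five other outcomes); the names `p_normal`, `p_loaded` and `posterior_odds` are assumptions made for illustration.

```python
p_normal = [1 / 6] * 6
p_loaded = [0.1] * 5 + [0.5]


def posterior_odds(sequence, prior_odds=1.0):
    """Odds (loaded : normal) after observing all rolls in the sequence."""
    odds = prior_odds
    for outcome in sequence:
        # outcomes are 1..6, list indices are 0..5
        odds *= p_loaded[outcome - 1] / p_normal[outcome - 1]
    return odds


rolls = [6, 4, 1, 4, 2, 6, 6, 6, 6, 4]
# 3**5 * 0.6**5 = 1.8**5
print(round(posterior_odds(rolls), 1))  # 18.9
```

Odds of about 19:1 explain why the classifier answers "loaded" for this sequence even though it was produced by the normal die: five sixes in ten rolls is simply an unlucky sample.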

In [41]:
import numpy as np

p1 = [1 / 6, 1 / 6, 1 / 6, 1 / 6, 1 / 6, 1 / 6]  # normal
p2 = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]  # loaded


def roll(loaded):
    if loaded:
        print("Rolling a loaded die")
        p = p2
    else:
        print("Rolling a normal die")
        p = p1

    # roll the die 10 times
    # add 1 to get die rolls from 1 to 6 instead of 0 to 5
    sequence = np.random.choice(6, size=10, p=p) + 1
    for outcome in sequence:
        print("rolled %d" % outcome)
    return sequence


def bayes(sequence):
    """
    Starting from the odds 1:1, use the naive Bayes method to update the odds after each outcome to decide which of the dice is more likely
    Edit the function bayes so that it returns True if the most likely die is the loaded one, and False otherwise.
    """
    odds = 1.0  # start with odds 1:1
    for roll in sequence:
        # multiply the odds by the likelihood ratio P(roll | loaded) / P(roll | normal)
        odds *= p2[roll - 1] / p1[roll - 1]
    return odds > 1


sequence = roll(False)  # False = normal die, try changing to True
if bayes(sequence):
    print("I think loaded")
else:
    print("I think normal")

Rolling a normal die
rolled 6
rolled 4
rolled 1
rolled 4
rolled 2
rolled 6
rolled 6
rolled 6
rolled 6
rolled 4
I think loaded



# Machine learning

## I. Linear regression

### Exercise 11: Real estate price predictions

### -- Advanced

Edit the following program so that it can process multiple cabins at the same time, each described by any number of details (five in the example below). You can assume that each of the lists contained in the list X and the coefficient list c contain the same number of elements.
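
The prediction for one cabin is the sum of its feature values multiplied by the matching coefficients, so predicting for all cabins at once is a matrix-vector product. The sketch below shows this with NumPy as an alternative to an explicit loop; it uses the example values from this exercise.

```python
import numpy as np

# one row per cabin, one column per feature
X = np.array([[66, 5, 15, 2, 500], [21, 3, 50, 1, 100], [120, 15, 5, 2, 1200]])
c = np.array([3000, 200, -50, 5000, 100])  # one coefficient per feature

prices = X @ c  # row i gives sum_j X[i, j] * c[j]
print(prices.tolist())  # [258250, 76100, 492750]
```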

In [42]:
# input values for three mökkis: size, size of sauna, distance to water, number of indoor bathrooms,
# proximity of neighbors
X = [[66, 5, 15, 2, 500], [21, 3, 50, 1, 100], [120, 15, 5, 2, 1200]]
c = [3000, 200, -50, 5000, 100]  # coefficient values


def predict(X, c):
    for cabin in X:
        # price = sum of feature values times their coefficients
        price = sum(xx * cc for xx, cc in zip(cabin, c))
        print(price)


predict(X, c)

258250
76100
492750



### Exercise 12: Least squares

### -- Advanced

Write a program that calculates the squared error for multiple sets of coefficient values and prints out the index of the set that yields the smallest squared error: this is a poor man's version of the least squares method where we only consider a fixed set of alternative coefficient vectors instead of finding the global optimum.
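
The squared error of a coefficient set is the sum over all cabins of (actual price - predicted price)^2. The sketch below computes it for all candidate sets at once using the data of this exercise; treating the candidates as columns of a prediction matrix is just one possible way to organize the calculation.

```python
import numpy as np

X = np.array([[66, 5, 15, 2, 500], [21, 3, 50, 1, 100], [120, 15, 5, 2, 1200]])
y = np.array([250000, 60000, 525000])
c = np.array(
    [
        [3000, 200, -50, 5000, 100],
        [2000, -250, -100, 150, 250],
        [3000, -100, -150, 0, 150],
    ]
)

predictions = X @ c.T  # column j holds the predictions of coefficient set j
squared_errors = ((y[:, None] - predictions) ** 2).sum(axis=0)

print(squared_errors.tolist())  # [1367335000, 144765000, 676665000]
print(int(np.argmin(squared_errors)))  # 1
```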

In [43]:
import numpy as np

# data
X = np.array([[66, 5, 15, 2, 500], [21, 3, 50, 1, 100], [120, 15, 5, 2, 1200]])
y = np.array([250000, 60000, 525000])

# alternative sets of coefficient values
c = np.array(
    [
        [3000, 200, -50, 5000, 100],
        [2000, -250, -100, 150, 250],
        [3000, -100, -150, 0, 150],
    ]
)


def find_best(X, y, c):
    smallest_error = np.inf  # np.Inf was removed in NumPy 2.0
    best_index = -1
    for ind, coeff in enumerate(c):
        sqerr = sum((y - X @ coeff) ** 2)
        if sqerr < smallest_error:
            best_index = ind
            smallest_error = sqerr
    print("the best set is set %d" % best_index)


find_best(X, y, c)

the best set is set 1



### Exercise 13: Predictions with more data

### -- Advanced

Write a program that reads cabin details and prices from a CSV file (a standard format for tabular data) and fits a linear regression model to it. The program should be able to handle any number of data points (cabins) described by any number of features (like size, size of sauna, number of bathrooms, ...).

You can read a CSV file with the function `np.genfromtxt(datafile, skip_header=1)`. This will return a numpy array that contains the feature data in the columns preceding the last one, and the price data in the last column. The option skip_header=1 just means that the first line in the file is supposed to contain just the column names and shouldn't be included in the actual data.

The output of the program should be the **estimated** coefficients and the **predicted or "fitted"** prices for the same set of cabins used to estimate the parameters. So if you fit the model using data for six cabins with known prices, the program will print out the prices that the model predicts for those six cabins (even if the actual prices are already given in the data).

Note that here we only simulate the file input using Python's **io.StringIO** class, which takes an input string and pretends that the contents are coming from a file. In practice, you would just name the input file that contains the data in the same format as the string input below.
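
The reading and fitting steps can be tried out on a tiny made-up data set first. In the sketch below the prices are generated exactly by y = 2*x1 + 3*x2, so least squares should recover the coefficients 2 and 3; the data and the column names are assumptions made for illustration.

```python
import numpy as np
from io import StringIO

# made-up data: the last column equals 2 * x1 + 3 * x2
text = """x1 x2 y
1 0 2
0 1 3
1 1 5
2 1 7
"""

data = np.genfromtxt(StringIO(text), skip_header=1)  # skip the column names
features = data[:, :-1]  # every column except the last
target = data[:, -1]  # the last column

# rcond=None selects NumPy's machine-precision default explicitly
coef, residuals, rank, singular_values = np.linalg.lstsq(features, target, rcond=None)
print(coef)
print(features @ coef)  # fitted values for the same rows
```

`np.linalg.lstsq` returns four values; only the first one, the coefficient vector, is needed for the predictions.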

In [44]:
import numpy as np
from io import StringIO

input_string = """
25 2 50 1 500 127900
39 3 10 1 1000 222100
13 2 13 1 1000 143750
82 5 20 2 120 268000
130 6 10 2 600 460700
115 6 10 1 550 407000
"""

np.set_printoptions(
    precision=1
)  # this just changes the output settings for easier reading


def fit_model(input_file):
    # read the data in and fit it. the values below are placeholder values
    c = np.asarray([])  # coefficients of the linear regression
    x = np.asarray([])  # input data to the linear regression
    # This will return a numpy array that contains the feature data in the columns preceding the last one,
    # and the price data in the last column.
    data = np.genfromtxt(input_file, skip_header=1)
    x = data[:, :-1]
    y = data[:, -1]
    # rcond=None picks the machine-precision default and avoids the FutureWarning
    c = np.linalg.lstsq(x, y, rcond=None)[0]
    print(c)
    print(x @ c)


# simulate reading a file
input_file = StringIO(input_string)
fit_model(input_file)

[2989.6  800.6  -44.8 3890.8   99.8]
[127907.6 222269.8 143604.5 268017.6 460686.6 406959.9]



### Exercise 14: Training data vs test data

### -- Advanced

Write a program that reads data about one set of cabins (training data), estimates linear regression coefficients based on it, then reads data about another set of cabins (test data), and predicts the prices in it. Note that both data sets contain the actual prices, but the program should ignore the prices in the second set. They are given only for comparison.

You can read the data into the program the same way as in the previous exercise.

You should then separate the feature and price data that you have just read from the file into two separate arrays named `x_train` and `y_train`, so that you can use them as arguments to `np.linalg.lstsq`.

The program should work even if the number of features used to describe the cabins differs from five (as long as the same number of features are given in each file).

The output should be the set of coefficients for the linear regression and the predicted prices for the second set of cabins.
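
Since the test set also contains the actual prices, the predictions can be compared against them after fitting. The sketch below fits on the training cabins of this exercise and reports the mean absolute error on the two test cabins; the error measure and the helper name `load` are assumptions made for illustration and are not required by the exercise.

```python
import numpy as np
from io import StringIO

train_string = """
25 2 50 1 500 127900
39 3 10 1 1000 222100
13 2 13 1 1000 143750
82 5 20 2 120 268000
130 6 10 2 600 460700
115 6 10 1 550 407000
"""

test_string = """
36 3 15 1 850 196000
75 5 18 2 540 290000
"""


def load(text):
    """Split simulated file contents into features and prices."""
    data = np.genfromtxt(StringIO(text), skip_header=1)
    return data[:, :-1], data[:, -1]


x_train, y_train = load(train_string)
x_test, y_test = load(test_string)

c = np.linalg.lstsq(x_train, y_train, rcond=None)[0]  # fit on training data only
y_pred = x_test @ c  # the test prices are not used for prediction

mae = np.mean(np.abs(y_pred - y_test))  # used only for comparison
print("mean absolute error on the test set: %.1f" % mae)
```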

In [45]:
import numpy as np
from io import StringIO


train_string = """
25 2 50 1 500 127900
39 3 10 1 1000 222100
13 2 13 1 1000 143750
82 5 20 2 120 268000
130 6 10 2 600 460700
115 6 10 1 550 407000
"""

test_string = """
36 3 15 1 850 196000
75 5 18 2 540 290000
"""


def main():
    np.set_printoptions(
        precision=1
    )  # this just changes the output settings for easier reading

    # read in the training data and separate it to x_train and y_train
    #   simulate reading a file
    input_train, input_test = StringIO(train_string), StringIO(test_string)
    data_train, data_test = np.genfromtxt(input_train, skip_header=1), np.genfromtxt(
        input_test, skip_header=1
    )

    x_train = data_train[:, :-1]
    y_train = data_train[:, -1]

    # fit a linear regression model to the data and get the coefficients
    c = np.asarray([])
    # rcond=None picks the machine-precision default and avoids the FutureWarning
    c = np.linalg.lstsq(x_train, y_train, rcond=None)[0]

    # read in the test data and separate x_test from it
    x_test = np.asarray([])
    x_test = data_test[:, :-1]

    # print out the linear regression coefficients
    print(c)

    # this will print out the predicted prices for the two new cabins in the test data set
    print(x_test @ c)


main()

[2989.6  800.6  -44.8 3890.8   99.8]
[198102.4 289108.3]



## II. The nearest neighbor method

### Exercise 15: Vector distances

### -- Advanced

You are given an array x_train with multiple input vectors (the "training data") and another array x_test with one more input vector (the "test data"). Find the vector in x_train that is most similar to the vector in x_test. In other words, find the nearest neighbor of the test data point x_test.

The code template gives the function dist to calculate the distance between any two vectors. What you need to add is the implementation of the function nearest that takes the arrays x_train and x_test and prints the index (as an integer between 0, ..., len(x_train)-1) of the nearest neighbor.
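
For comparison, the same search can be written without an explicit loop: subtract the test vector from every training vector, take the Euclidean norm of each row, and pick the smallest. The sketch below uses fixed example vectors instead of random ones; the function name `nearest_index` is an assumption made for illustration.

```python
import numpy as np


def nearest_index(x_train, x_test):
    """Index of the training vector with the smallest Euclidean distance."""
    distances = np.linalg.norm(x_train - x_test, axis=1)  # one distance per row
    return int(np.argmin(distances))


x_train = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, 0.4, 0.6]])
x_test = np.array([0.6, 0.5, 0.5])
print(nearest_index(x_train, x_test))  # 2
```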

In [46]:
import numpy as np

x_train = np.random.rand(10, 3)  # generate 10 random vectors of dimension 3
x_test = np.random.rand(3)  # generate one more random vector of the same dimension


def dist(a, b):
    total = 0  # avoid shadowing the built-in sum()
    for ai, bi in zip(a, b):
        total = total + (ai - bi) ** 2
    return np.sqrt(total)


def nearest(x_train, x_test):
    nearest = -1
    min_distance = np.inf  # np.Inf was removed in NumPy 2.0
    # add a loop here that goes through all the vectors in x_train and finds the one that
    # is nearest to x_test. return the index (between 0, ..., len(x_train)-1) of the nearest
    # neighbor
    for i, x in enumerate(x_train):
        distance = dist(x, x_test)
        if distance < min_distance:
            min_distance = distance
            nearest = i
    print(nearest)


nearest(x_train, x_test)

3



### Exercise 16: Nearest neighbor

### -- Intermediate

In the basic nearest neighbor classifier, the only thing that matters is the class label of the nearest neighbor. But the nearest neighbor may sometimes be noisy or otherwise misleading. Therefore, it may be better to also consider the other nearby data points in addition to the nearest neighbor.

This idea leads us to the so-called k-nearest neighbor method, where we consider all the k nearest neighbors. If k=3, for example, we'd take the three nearest points and choose the class label based on the majority class among them.

The program below uses the library **sklearn** to create random data. Our input variable X has two features (compare to, say, cabin size and cabin price) and our target variable y is binary: it is either 0 or 1 (again think, for example, "is the cabin awesome or not.")

Complete the following program so that it finds the three nearest data points (k=3) for each of the test data points and classifies them based on the majority class among the neighbors. Currently it generates the random data, splits it into training and test sets, and calculates the distances between each test set item and the training set items, but it fails to classify the test set items correctly, setting them all to belong to class 0. Instead of looking at just the nearest neighbor's class, it should use three neighbors, pick the majority (most common) class among them, and use that as the class for the test item.

In [47]:
## INTERM
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split


# create random data with two classes
X, y = make_blobs(n_samples=16, n_features=2, centers=2, center_box=(-2, 2))

# scale the data so that all values are between 0.0 and 1.0
X = MinMaxScaler().fit_transform(X)

# split two data points from the data as test data and
# use the remaining n-2 points as the training data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=2)

# place-holder for the predicted classes
y_predict = np.empty(len(y_test), dtype=np.int64)

# produce line segments that connect the test data points
# to the nearest neighbors for drawing the chart
lines = []


# distance function
def dist(a, b):
    sum = 0
    for ai, bi in zip(a, b):
        sum = sum + (ai - bi) ** 2
    return np.sqrt(sum)


def main(X_train, X_test, y_train, y_test):

    global y_predict
    global lines

    # process each of the test data points
    for i, test_item in enumerate(X_test):
        # calculate the distances to all training points
        distances = [dist(train_item, test_item) for train_item in X_train]

        # find the index of the nearest neighbor
        nearest = np.argmin(distances)

        # create a line connecting the points for the chart
        lines.append(np.stack((test_item, X_train[nearest])))

        # add your code here:
        # y_predict[i] = 0          # this just classifies everything as 0
        y_predict[i] = y_train[nearest]

    print(y_predict)


main(X_train, X_test, y_train, y_test)

[0 1]



In [48]:
## ADVANCED

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split


# create random data with two classes
X, Y = make_blobs(n_samples=16, n_features=2, centers=2, center_box=(-2, 2))

# scale the data so that all values are between 0.0 and 1.0
X = MinMaxScaler().fit_transform(X)

# split two data points from the data as test data and
# use the remaining n-2 points as the training data
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=2)

# place-holder for the predicted classes
y_predict = np.empty(len(y_test), dtype=np.int64)

# produce line segments that connect the test data points
# to the nearest neighbors for drawing the chart
lines = []

# distance function
def dist(a, b):
    sum = 0
    for ai, bi in zip(a, b):
        sum = sum + (ai - bi) ** 2
    return np.sqrt(sum)


def main(X_train, X_test, y_train, y_test):

    global y_predict
    global lines

    k = 3  # classify our test items based on the classes of 3 nearest neighbors

    # process each of the test data points
    for i, test_item in enumerate(X_test):
        # calculate the distances to all training points
        distances = [dist(train_item, test_item) for train_item in X_train]

        # add your code here
        # nearest = np.argmin(distances)       # this just finds the nearest neighbour (so k=1)
        # indices of the k nearest training points (in no particular order)
        nearest_k = np.argpartition(distances, k)[:k]
        # majority class among the k neighbors
        majority = np.bincount(y_train[nearest_k]).argmax()

        # create lines connecting the test point to each of its k nearest neighbors
        # for the chart (this is not checked in the tests)
        for idx in nearest_k:
            lines.append(np.stack((test_item, X_train[idx])))

        y_predict[i] = majority

    print(y_predict)


main(X_train, X_test, y_train, y_test)

[0 1]
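
To see why the majority vote can differ from the single nearest neighbor, here is a minimal sketch on a tiny hand-made data set. The helper `knn_predict` and the data points are made up for this illustration and are not part of the exercise template.

```python
import numpy as np


def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distances from x to every training point
    distances = np.linalg.norm(X_train - x, axis=1)
    # indices of the k smallest distances
    nearest_k = np.argsort(distances)[:k]
    # majority vote over the labels of those neighbors
    return int(np.bincount(y_train[nearest_k]).argmax())


# hypothetical data: the point at (0.2, 0.0) carries a "noisy" label 1
X_train = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [1.0, 1.0], [0.9, 1.0]])
y_train = np.array([0, 0, 1, 1, 1])
x = np.array([0.18, 0.02])

print(knn_predict(X_train, y_train, x, k=1))  # follows the single nearest point
print(knn_predict(X_train, y_train, x, k=3))  # follows the majority of three
```

With k=1 the prediction follows the noisy neighbor, while with k=3 the two other nearby points outvote it.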



## III. Working with text

### Exercise 17: Bag of words

### -- Advanced

Your task is to write a program that calculates the distances (or differences) between every pair of lines in the This Little Piggy rhyme and finds the most similar pair. Use the Manhattan distance as your distance metric.

You can start by building a numpy array with all the distances. Notice that the diagonal elements (elements at positions [i, j] with i=j) will be equal to zero because each row is equal to itself. To avoid selecting them, you can assign them the value np.inf (positive infinity, which is larger than any finite floating point value). Note that to do this, it's necessary to make sure the type of the array is float. A convenient and fast way to get the index of the element with the lowest value in a 2D array (or in fact, an array of any dimension) is the function call
`np.unravel_index(np.argmin(dist), dist.shape)`
where dist is the array. This will return the index of the lowest valued element as a tuple of length two (assuming the array is two-dimensional).

In [49]:
import numpy as np

data = [
    [1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 0, 1, 3, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1],
]


def distance(row1, row2):
    return sum(abs(i - j) for i, j in zip(row1, row2))


def find_nearest_pair(data):
    N = len(data)
    dist = np.empty((N, N), dtype=float)

    # fill in the pairwise distances; the diagonal is set to np.inf so that
    # a row is never selected as its own nearest neighbor. comparing indices
    # (rather than row contents) keeps identical rows eligible as a pair
    for i in range(N):
        for j in range(N):
            dist[i, j] = np.inf if i == j else distance(data[i], data[j])

    print(np.unravel_index(np.argmin(dist), dist.shape))


find_nearest_pair(data)

(2, 3)
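
The pairwise Manhattan distances can also be computed in one step with NumPy broadcasting. This is a sketch under the assumption that all rows have the same length; the helper name `nearest_pair` and the three-row example are made up for illustration.

```python
import numpy as np


def nearest_pair(data):
    X = np.asarray(data, dtype=float)
    # dist[i, j] = sum over k of |X[i, k] - X[j, k]|, computed by broadcasting
    dist = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    # exclude the diagonal so that a row is never paired with itself
    np.fill_diagonal(dist, np.inf)
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return int(i), int(j)


rows = [[1, 0, 2], [0, 0, 2], [5, 5, 5]]
print(nearest_pair(rows))
```

Because the diagonal is masked by position rather than by comparing row contents, two identical rows would still be reported as the closest pair.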



### Exercise 18: TF-IDF

### -- Intermediate

Modify the following program to print out the tf-idf values for each document and each word. The following code calculates the tf and df values, so you'll just need to combine them according to the correct formula. There are three documents (sentences) and a total of eight terms (unique words), so the output should be three lists of eight tf-idf values each.

In [50]:
# Modify the following program to print out the tf-idf values for each document and each word.

# DATA BLOCK

text = """he really really loves coffee
my sister dislikes coffee
my sister loves tea"""

import math


def main(text):
    # split the text first into lines and then into lists of words
    docs = [line.split() for line in text.splitlines()]

    N = len(docs)

    # create the vocabulary: the list of words that appear at least once
    vocabulary = list(set(text.split()))

    df = {}
    tf = {}
    for word in vocabulary:
        # tf: number of occurrences of word w in document divided by document length
        # note: tf[word] will be a list containing the tf of the word for each document
        # for example tf['he'][0] contains the term frequency of the word 'he' in the first
        # document
        tf[word] = [doc.count(word) / len(doc) for doc in docs]

        # df: number of documents containing word w divided by the number of documents
        df[word] = sum([word in doc for doc in docs]) / N

    # loop through documents to calculate the tf-idf values
    for doc_index, doc in enumerate(docs):
        tfidf = []
        for word in vocabulary:
            to_append = tf[word][doc_index] * math.log(1 / df[word], 10)
            if to_append != 0:
                tfidf.append(to_append)
        print(tfidf)


main(text)

[0.19084850188786498, 0.09542425094393249, 0.03521825181113625, 0.03521825181113625]
[0.04402281476392031, 0.04402281476392031, 0.04402281476392031, 0.11928031367991561]
[0.04402281476392031, 0.04402281476392031, 0.04402281476392031, 0.11928031367991561]
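
Note that the program above drops zero-valued entries before printing, which is why each printed list has four values rather than the eight the task statement mentions. The following sketch keeps the zeros, so every document gets one value per vocabulary word. The helper name `tfidf_vectors` is made up here, and sorting the vocabulary is an added assumption that only serves to make the column order the same on every run.

```python
import math


def tfidf_vectors(text):
    docs = [line.split() for line in text.splitlines()]
    n_docs = len(docs)
    # sorted so that the column order is the same on every run
    vocabulary = sorted(set(text.split()))
    vectors = []
    for doc in docs:
        row = []
        for word in vocabulary:
            tf = doc.count(word) / len(doc)  # term frequency in this document
            df = sum(word in d for d in docs) / n_docs  # fraction of documents with the word
            row.append(tf * math.log10(1 / df))
        vectors.append(row)
    return vocabulary, vectors


text = """he really really loves coffee
my sister dislikes coffee
my sister loves tea"""

vocabulary, vectors = tfidf_vectors(text)
print(vocabulary)
for row in vectors:
    print([round(value, 4) for value in row])
```

Keeping the zeros also means all vectors have the same length, which is needed as soon as documents are compared with a distance function.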



### -- Advanced

Write a program that uses the tf-idf vectors to find the most similar pair of lines in a given data set. You can test your solution with the example text below. Note, however, that your solution will be tested on other data sets too, so make sure you don't make use of any special properties of the example data (like there being four lines of text).

In [51]:
text = """Humpty Dumpty sat on a wall
Humpty Dumpty had a great fall
all the king's horses and all the king's men
couldn't put Humpty together again"""

import math
import numpy as np


def main(text):
    # 1. split the text into words, and get a list of unique words that appear in it
    # a short one-liner to separate the text into sentences (with words lower-cased to make words equal
    # despite casing) can be done with
    # docs = [line.lower().split() for line in text.split('\n')]
    text = text.lower()
    voc = list(set(text.split()))
    docs = [line.split() for line in text.split("\n")]

    # 2. go over each unique word and calculate its term frequency, and its document frequency
    # The document frequency of a word is the number of documents that contain at least one occurrence of the word
    tf = dict()
    df = dict()
    for word in voc:
        tf[word] = [doc.count(word) / len(doc) for doc in docs]
        df[word] = sum([word in doc for doc in docs]) / len(docs)

    # 3. after you have your term frequencies and document frequencies, go over each line in the text and
    # calculate its TF-IDF representation, which will be a vector
    tfdf = []  # TF-IDF vectors of all documents
    for doc_index, doc in enumerate(docs):
        tfidf = []
        for word in voc:
            to_append = tf[word][doc_index] * math.log(1 / df[word], 10)
            # adding even 0 values as otherwise vectors have different length
            tfidf.append(to_append)
        tfdf.append(tfidf)

    # 4. after you have calculated the TF-IDF representations for each line in the text, you need to
    # calculate the distances between each line to find which are the closest.
    def distance(row1, row2):
        return sum(abs(i - j) for i, j in zip(row1, row2))

    def find_nearest_pair(data):
        N = len(data)
        dist = np.empty((N, N), dtype=float)
        # SAME: dist = np.array([np.array([distance(sent1, sent2) if sent1 != sent2 else np.inf for sent1 in data]) for sent2 in data])
        for i in range(N):
            for j in range(N):
                dist[i, j] = np.inf if i == j else distance(data[i], data[j])
        print(np.unravel_index(np.argmin(dist), dist.shape))

    find_nearest_pair(tfdf)


main(text)

(0, 1)



## IV.Overfitting

### Exercise 19: Looking out for overfitting
### -- Advanced
The program below uses the k-nearest neighbors algorithm. The idea is to not only look at the single nearest training data point (neighbor) but for example the five nearest points, if k=5. The normal nearest neighbor classifier amounts to using k=1.

Write a program that does the classification for some value of k and prints out the training and testing accuracy.

Hint: You can get the model accuracy for a given set using the function knn.score.

Try different values of k to answer the questions below.

In [52]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
import numpy as np

# do not edit this
# create fake data
x, y = make_moons(
    n_samples=500,  # the number of observations
    random_state=42,
    noise=0.3,
)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.33, random_state=42
)

# Create a classifier and fit it to our data
knn = KNeighborsClassifier(n_neighbors=42)  # <-- that's the k!
knn.fit(x_train, y_train)

train_acc = knn.score(x_train, y_train)
test_acc = knn.score(x_test, y_test)
print(f"training accuracy: {train_acc}")
print(f"testing accuracy: {test_acc}")

training accuracy: 0.9253731343283582
testing accuracy: 0.9090909090909091
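
To try different values of k in one go, the classifier can be refit in a loop. The sketch below reuses the same data generation as above; the `results` dictionary and the particular values of k (taken from the answer options of this exercise) are choices made for this illustration.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

x, y = make_moons(n_samples=500, random_state=42, noise=0.3)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.33, random_state=42
)

# maps k to a (training accuracy, testing accuracy) pair
results = {}
for k in (1, 42, 250):
    knn = KNeighborsClassifier(n_neighbors=k).fit(x_train, y_train)
    results[k] = (knn.score(x_train, y_train), knn.score(x_test, y_test))
    print(
        f"k={k:3d}  training accuracy: {results[k][0]:.3f}"
        f"  testing accuracy: {results[k][1]:.3f}"
    )
```

With k=1 every training point is its own nearest neighbor, so the training accuracy says little about how well the model generalizes; the testing accuracy is the number to compare across values of k.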



**What would be a reasonable baseline accuracy your model should outperform in order for it to be considered useful?**

- [x] 0.50
- [ ] 0.25
- [ ] any performance that is better than all wrong is enough as a baseline

There are two classes, and the data points are evenly split among them. Assigning every point to the same class, or picking a class at random, would result in an accuracy of about 50%.

**Which of the following values of k do you think was "best"?**

- [ ] the choice of k doesn't matter
- [ ] k = 1
- [ ] k = 250
- [x] k = 42

**Why?**

- [ ] it gave the lowest training accuracy
- [ ] it gave the highest training accuracy
- [x] it gave the highest testing accuracy
- [ ] it gave the lowest testing accuracy
- [ ] the choice of k doesn't matter

**Is it possible to have a higher test set accuracy than training set accuracy?**

- [x] yes
- [ ] no

# Neural networks

## I.Logistic regression

### Exercise 20: Logistic regression

### -- Advanced

You are given a set of three input values and you also have multiple alternative sets of three coefficients. Calculate the predicted output value using the linear formula combined with the logistic activation function.

Do this with all the alternative sets of coefficients. Which of the coefficient sets yields the highest sigmoid output?

In [53]:
import math
import numpy as np

x = np.array([4, 3, 0])
c1 = np.array([-0.5, 0.1, 0.08])
c2 = np.array([-0.2, 0.2, 0.31])
c3 = np.array([0.5, -0.1, 2.53])


def sigmoid(z):
    # add your implementation of the sigmoid function here
    # Sigmoid function: s(z) = 1 / (1 + exp(-z))
    return 1 / (1 + math.exp(-z))


# calculate the output of the sigmoid for x with all three coefficients
print(sigmoid(x @ c1))
print(sigmoid(x @ c2))
print(sigmoid(x @ c3))  # <-- this one

0.1544652650835347
0.45016600268752216
0.8455347349164652
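
The three evaluations can also be done in a single step by stacking the coefficient sets into a matrix. This is a sketch, not the course's reference solution: the matrix name `C` and the use of `np.exp` (so that the function works elementwise on arrays) are choices made here.

```python
import numpy as np


def sigmoid(z):
    # logistic function; np.exp makes it work elementwise on arrays
    return 1 / (1 + np.exp(-z))


x = np.array([4, 3, 0])
# one coefficient set per row: c1, c2, c3
C = np.array(
    [
        [-0.5, 0.1, 0.08],
        [-0.2, 0.2, 0.31],
        [0.5, -0.1, 2.53],
    ]
)

outputs = sigmoid(C @ x)  # one linear combination per coefficient set
best = int(np.argmax(outputs))
print(outputs)
print(f"highest output: coefficient set c{best + 1}")
```

Since the sigmoid is monotonically increasing, the coefficient set with the largest linear combination `C @ x` is also the one with the highest sigmoid output.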



## II.From logistic regression to neural networks

### Exercise 21: Neural Networks

### -- Advanced

We have trained a simple neural network with a larger set of cabin price data. The network predicts the price of the cabin based on the attributes of the cabin. The network consists of an input layer with five nodes, a hidden layer with two nodes, a second hidden layer with two nodes, and finally an output layer with a single node. In addition, there is a single bias node for each hidden layer and the output layer.

The program below uses the weights of this trained network to perform what is called a forward pass of the neural network. The forward pass is running the input variables through the neural network to obtain output, in this case the price of a cabin of given attributes.

The program is incomplete though. The bias nodes are not used in the version below, and the activation functions for the hidden layers and the output layer have not been properly defined.

Modify the program to use the bias nodes, and to use the ReLU activation function for the hidden nodes, and a linear (identity) activation for the output node. ReLU activation function returns either the input value of the function, or zero, whichever is the largest, and linear activation just returns the input as output. After these are done, get a prediction for the price of a cabin which is described by the following feature vector `[74, 5, 10, 2, 100]`.

In [54]:
import numpy as np

w0 = np.array(
    [
        [1.19627687e01, 2.60163283e-01],
        [4.48832507e-01, 4.00666119e-01],
        [-2.75768443e-01, 3.43724167e-01],
        [2.29138536e01, 3.91783025e-01],
        [-1.22397711e-02, -1.03029800e00],
    ]
)

w1 = np.array([[11.5631751, 11.87043684], [-0.85735419, 0.27114237]])

w2 = np.array([[11.04122165], [10.44637262]])

b0 = np.array([-4.21310294, -0.52664488])
b1 = np.array([-4.84067881, -4.53335139])
b2 = np.array([-7.52942418])

x = np.array(
    [
        [111, 13, 12, 1, 161],
        [125, 13, 66, 1, 468],
        [46, 6, 127, 2, 961],
        [80, 9, 80, 2, 816],
        [33, 10, 18, 2, 297],
        [85, 9, 111, 3, 601],
        [24, 10, 105, 2, 1072],
        [31, 4, 66, 1, 417],
        [56, 3, 60, 1, 36],
        [49, 3, 147, 2, 179],
    ]
)
y = np.array(
    [
        335800.0,
        379100.0,
        118950.0,
        247200.0,
        107950.0,
        266550.0,
        75850.0,
        93300.0,
        170650.0,
        149000.0,
    ]
)


def hidden_activation(z):
    # ReLU activation: returns either the input value or zero, whichever is larger
    return np.maximum(0, z)


def output_activation(z):
    # identity (linear) activation: just returns the input as output
    return z


x_test = [[72, 2, 25, 3, 450], [60, 3, 15, 1, 300], [74, 5, 10, 2, 100]]
for item in x_test:
    # linear combination of inputs and weights, plus the bias term
    h1_in = np.dot(item, w0) + b0
    h1_out = hidden_activation(h1_in)  # apply activation function

    # the output of the previous layer is the input for this layer
    h2_in = np.dot(h1_out, w1) + b1
    h2_out = hidden_activation(h2_in)

    out_in = np.dot(h2_out, w2) + b2
    out = output_activation(out_in)
    print(out)

[230008.7]
[183615.4]
[232721.4]



**What price does the neural network predict for the cabin in question?**
- roughly 233000

**What type of a machine learning problem is this?**

- [ ] unsupervised learning
- [x] supervised learning
- [ ] reinforcement learning

**How can we make sure we are not overfitting the neural network to the data?**

- [ ] neural network will always overfit because there are too many parameters for a linear problem like this
- [ ] use the full set of cabin data as a training set, and a small subset of it as a testing set
- [x] use cross-validation

### Exercise 21: Neural Networks

### -- Advanced

We have trained a simple neural network with a larger set of cabin price data. The network predicts the price of the cabin based on the attributes of the cabin. The network consists of an input layer with five nodes, a hidden layer with two nodes, a second hidden layer with two nodes, and finally an output layer with a single node. In addition, there is a single bias node for each hidden layer and the output layer.

The program below uses the weights of this trained network to perform what is called a forward pass of the neural network. The forward pass is running the input variables through the neural network to obtain output, in this case the price of a cabin of given attributes.

The program is incomplete though. The program only does the forward pass up to the first hidden layer and is missing the second hidden layer and the output layer.

Modify the program to do a full forward pass and print out the price prediction. To do this, write out the remaining forward pass operations and use the ReLU activation function for the hidden nodes, and a linear (identity) activation for the output node. ReLU activation function returns either the input value of the function, or zero, whichever is the largest, and linear activation just returns the input as output. After these are done, get a prediction for the price of a cabin which is described by the following feature vector `[82, 2, 65, 3, 516]`.

In [55]:
import numpy as np

w0 = np.array(
    [
        [1.19627687e01, 2.60163283e-01],
        [4.48832507e-01, 4.00666119e-01],
        [-2.75768443e-01, 3.43724167e-01],
        [2.29138536e01, 3.91783025e-01],
        [-1.22397711e-02, -1.03029800e00],
    ]
)

w1 = np.array([[11.5631751, 11.87043684], [-0.85735419, 0.27114237]])

w2 = np.array([[11.04122165], [10.44637262]])

b0 = np.array([-4.21310294, -0.52664488])
b1 = np.array([-4.84067881, -4.53335139])
b2 = np.array([-7.52942418])

x = np.array(
    [
        [111, 13, 12, 1, 161],
        [125, 13, 66, 1, 468],
        [46, 6, 127, 2, 961],
        [80, 9, 80, 2, 816],
        [33, 10, 18, 2, 297],
        [85, 9, 111, 3, 601],
        [24, 10, 105, 2, 1072],
        [31, 4, 66, 1, 417],
        [56, 3, 60, 1, 36],
        [49, 3, 147, 2, 179],
    ]
)
y = np.array(
    [
        335800.0,
        379100.0,
        118950.0,
        247200.0,
        107950.0,
        266550.0,
        75850.0,
        93300.0,
        170650.0,
        149000.0,
    ]
)


def hidden_activation(z):
    # ReLU activation: returns either the input value or zero, whichever is larger
    return np.maximum(0, z)


def output_activation(z):
    # identity (linear) activation: just returns the input as output
    return z


x_test = [[82, 2, 65, 3, 516]]
for item in x_test:
    h1_in = (
        np.dot(item, w0) + b0
    )  # this calculates the linear combination of inputs and weights
    h1_out = hidden_activation(h1_in)  # apply activation function

    # fill out the missing parts:
    # the output of the first hidden layer, h1_out, will need to go through
    # the second hidden layer with weights w1 and bias b1
    h2_in = np.dot(h1_out, w1) + b1
    h2_out = hidden_activation(h2_in)  # apply activation function

    # and finally to the output layer with weights w2 and bias b2.
    # remember correct activations: relu in the hidden layers and linear (identity) in the output
    out_in = np.dot(h2_out, w2) + b2
    out = output_activation(out_in)
    print(out)

[257136.4]
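
The layer-by-layer steps above can be wrapped into a single function that also accepts several cabins at once. This is a sketch under the assumption that the weights and biases are the ones listed in the exercise; the function name `forward` and the `layers` list are introduced here for illustration.

```python
import numpy as np

# weights and biases copied from the exercise
w0 = np.array(
    [
        [1.19627687e01, 2.60163283e-01],
        [4.48832507e-01, 4.00666119e-01],
        [-2.75768443e-01, 3.43724167e-01],
        [2.29138536e01, 3.91783025e-01],
        [-1.22397711e-02, -1.03029800e00],
    ]
)
w1 = np.array([[11.5631751, 11.87043684], [-0.85735419, 0.27114237]])
w2 = np.array([[11.04122165], [10.44637262]])
b0 = np.array([-4.21310294, -0.52664488])
b1 = np.array([-4.84067881, -4.53335139])
b2 = np.array([-7.52942418])

layers = [(w0, b0), (w1, b1), (w2, b2)]


def forward(x, layers):
    # x has shape (n_samples, n_features)
    out = np.atleast_2d(np.asarray(x, dtype=float))
    # hidden layers: linear combination plus bias, then ReLU
    for w, b in layers[:-1]:
        out = np.maximum(0, out @ w + b)
    # output layer: linear combination plus bias, identity activation
    w, b = layers[-1]
    return out @ w + b


predictions = forward([[82, 2, 65, 3, 516]], layers)
print(np.round(predictions, 1))
```

Each row of the input is treated as one cabin, so passing a list of several feature vectors returns one predicted price per row.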



# Conclusion

## Your AI idea

The optional final task of this course consists of your own AI idea. We are not giving you a made-up problem to solve. Instead, we want to hear what kind of a problem you'd like to solve using AI – and how.

To make it easier for you, we’re proposing you structure the project description around a list of topics. Once you have written down a few thoughts about each of these topics, you already have enough material to submit your project! If you’re up to it, you can also expand this into a working demo or prototype with code and data.

### The topics we’ll ask you to elaborate on are:

1. Your idea in a nutshell: Name your project and prepare to describe it briefly.

2. Background: What is the problem your idea will solve? How common or frequent is this problem? What is your personal motivation? Why is this topic important or interesting?

3. Data and AI techniques: What data sources does your project depend on? Almost all AI solutions depend on some data. The availability and quality of the data are essential. Which AI techniques do you think will be helpful? Depending on whether you've been doing the programming exercises or not, you may choose to include a concrete demo implemented by coding, using some actual data!

4. How is it used: What is the context in which your solution is used, and by whom? Who are the people affected by it? It’s important to appreciate the viewpoints of all those affected.

5. Challenges: What does your project not solve? It’s important to understand that any technological solution will have its limitations.

6. What next: How could your project grow and become something even more?

7. Acknowledgments: If you’re using open source code or documents in your project, make sure you give credit to the creators. Mention your sources of inspiration, too.