In [1]:
import sys
assert sys.version_info >= (3, 7), "This script requires at least Python 3.7"

import sklearn
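# Compare the numeric major/minor parts; a plain string comparison would rank "0.9" above "0.20".
assert tuple(int(p) for p in sklearn.__version__.split(".")[:2]) >= (0, 20), "This script requires sklearn 0.20 or above"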

# What is a model anyways?
> A machine learning model is a computer program that recognises patterns in data or makes predictions.
> It is created by machine learning algorithms that undergo a training process using labeled, unlabeled, or mixed data.

##### Neural Models
> A neural network is a type of machine learning model that is inspired by the human brain. It consists of interconnected layers of nodes, or “neurons”, each of which takes in input, processes it, and passes it on to the next layer. The network learns from data by adjusting the weights and biases of these connections based on the error of its predictions; the gradients that guide those adjustments are computed by a process known as backpropagation. Neural networks are particularly good at tasks that involve recognizing patterns or making predictions from complex, high-dimensional data. They’re used in a wide range of applications, from image and speech recognition to natural language processing and autonomous driving.

* The ***Perceptron***
> The perceptron is a simple way into neural networks and the baseline model of an artificial neuron.
The way it works is simple.
Several binary inputs go in and are combined into a single output, which is also binary.
Each input has a ***weight***, a real number that expresses how important that input is to the output.
The neuron also has a threshold value, which is a real number and a parameter of the neuron. The output is 0 or 1 depending on whether the weighted sum of the inputs is below or above that threshold:

output = 0 if sum_j(w_j * x_j) <= threshold
output = 1 if sum_j(w_j * x_j) > threshold

* A way you can think about the perceptron is that it's a device that makes decisions by weighing up evidence.

Example: 
> It's not a very realistic example, but it's easy to understand, and we'll soon get to more realistic examples. Suppose the weekend is coming up, and you've heard that there's going to be a cheese festival in your city. You like cheese, and are trying to decide whether or not to go to the festival. You might make your decision by weighing up three factors:

> * Is the weather good?
> * Does your boyfriend or girlfriend want to accompany you?
> * Is the festival near public transit? (You don't own a car.)
>
> We can represent these three factors by corresponding binary variables x1, x2 and x3. For instance, we'd have x1=1 if the weather is good, and x1=0 if the weather is bad. Similarly, x2=1 if your boyfriend or girlfriend wants to go, and x2=0 if not. And similarly again for x3 and public transit.
>
> Now, suppose you absolutely adore cheese, so much so that you're happy to go to the festival even if your boyfriend or girlfriend is uninterested and the festival is hard to get to. But perhaps you really loathe bad weather, and there's no way you'd go to the festival if the weather is bad. You can use perceptrons to model this kind of decision-making. One way to do this is to choose a weight w1=6 for the weather, and w2=2 and w3=2 for the other conditions. The larger value of w1 indicates that the weather matters a lot to you, much more than whether your boyfriend or girlfriend joins you, or the nearness of public transit. Finally, suppose you choose a threshold of 5 for the perceptron. With these choices, the perceptron implements the desired decision-making model, outputting 1 whenever the weather is good, and 0 whenever the weather is bad. It makes no difference to the output whether your boyfriend or girlfriend wants to go, or whether public transit is nearby.
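
The cheese festival decision can be checked with a few lines of Python. This is only a minimal sketch: the function name `perceptron` and the variable names are made up for illustration, while the weights (6, 2, 2) and the threshold (5) come from the example above.

```python
from itertools import product

def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

# Weights and threshold from the cheese festival example.
weights = [6, 2, 2]   # weather, partner, public transit
threshold = 5

# Evaluate the perceptron on every combination of the three binary inputs.
decisions = {x: perceptron(x, weights, threshold) for x in product([0, 1], repeat=3)}

for x, out in decisions.items():
    print(x, "->", out)
```

In every row the output equals the first input (the weather), which is the behaviour the example describes.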
 

* We can simplify how we describe perceptrons even more.

The first change is to stop writing the weighted sum out term by term.

Instead we write it as w * x, the dot product of w and x, where w and x are vectors holding the weights and the inputs.

The second change is to move the threshold to the other side of the inequality and replace it with the bias, b = -threshold.
The formula would be:

output = 0 if w * x + b <= 0
output = 1 if w * x + b > 0

The bias measures how easy it is to get the perceptron to output 1: with a large positive bias almost any input gives 1, while with a very negative bias it is hard to get a 1 at all.
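
To see that the threshold form and the bias form describe the same perceptron, here is a small sketch. It assumes the usual convention b = -threshold and reuses the cheese festival weights; the helper names `dot`, `perceptron_threshold` and `perceptron_bias` are made up for this example.

```python
from itertools import product

def dot(w, x):
    """Dot product of two equal-length sequences."""
    return sum(wj * xj for wj, xj in zip(w, x))

def perceptron_threshold(x, w, threshold):
    # Original rule: fire when the weighted sum exceeds the threshold.
    return 1 if dot(w, x) > threshold else 0

def perceptron_bias(x, w, b):
    # Rewritten rule: fire when weighted sum plus bias is positive.
    return 1 if dot(w, x) + b > 0 else 0

w = [6, 2, 2]
threshold = 5
b = -threshold  # assumption: bias is the negative of the threshold

# Both forms should agree on every binary input.
same = all(
    perceptron_threshold(x, w, threshold) == perceptron_bias(x, w, b)
    for x in product([0, 1], repeat=3)
)
print("forms agree on all inputs:", same)
```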

#### NAND gates
* **A NAND gate is a logic gate whose output is false only if all of its inputs are true**

To see what that means for the formula:
w * x + b <= 0 / > 0
take a perceptron with two inputs, each with a weight of -2, and a bias of 3, and try all four input pairs >> 00, 01, 10, 11
0 * (-2) + 0 * (-2) + 3 = 3 -- positive -> output is 1
0 * (-2) + 1 * (-2) + 3 = 1 -- positive -> output is 1
1 * (-2) + 0 * (-2) + 3 = 1 -- positive -> output is 1
1 * (-2) + 1 * (-2) + 3 = -1 -- negative -> output is 0
**Only the input 11, where both inputs are true, makes the weighted sum (-4) negative enough to outweigh the bias of 3, so it is the only input that gives an output of 0**
* ***That is a NAND gate***
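
The arithmetic above can be reproduced in code. A minimal sketch, assuming two inputs that share the weight -2 and a bias of 3 as in the text; the function name `nand` is just a label chosen here.

```python
def nand(x1, x2, weight=-2, bias=3):
    """Perceptron with two equal weights and a bias, using the rule w * x + b > 0."""
    z = x1 * weight + x2 * weight + bias
    return 1 if z > 0 else 0

# Build the full truth table for the four binary input pairs.
truth_table = {(a, b): nand(a, b) for a in (0, 1) for b in (0, 1)}

for inputs, out in truth_table.items():
    print(inputs, "->", out)
```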


When working with multiple perceptrons, it is possible for the output of one perceptron to act as the input to several others. If the same output reaches a perceptron over two separate connections, we can keep both connections as normal, or merge them into a single connection whose weight is the sum of the two weights.
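
A quick way to convince yourself that merging is safe is to compare both wirings directly. This is a sketch under assumed values: the connection weight of -2 and bias of 3 are borrowed from the NAND example purely for illustration, and all function names are hypothetical.

```python
def fires(weighted_sum, bias):
    """Perceptron rule: output 1 when weighted sum plus bias is positive."""
    return 1 if weighted_sum + bias > 0 else 0

def two_connections(signal, w_each=-2, bias=3):
    # The same signal arrives twice, once on each connection.
    return fires(signal * w_each + signal * w_each, bias)

def merged_connection(signal, w_each=-2, bias=3):
    # One connection whose weight is the sum of the two original weights.
    return fires(signal * (w_each + w_each), bias)

# Compare both wirings for each possible binary signal.
results = [(s, two_connections(s), merged_connection(s)) for s in (0, 1)]
for s, separate, merged in results:
    print(f"signal={s}  separate={separate}  merged={merged}")
```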

# SIGMOID NEURONS

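When using perceptrons, a small change to a single weight or bias can flip a perceptron's output completely, say from 0 to 1, and that flip can then change the behaviour of the rest of the network in ways that are hard to control...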

***Well that's when sigmoid neurons come in.***

* Sigmoid neurons are similar to perceptrons, except that small changes to their weights and biases cause only small changes to their output

We will depict sigmoid neurons in the same way as perceptrons:
for example with 3 inputs. They also have weights and a bias.
The difference here is that the inputs and the output of a sigmoid neuron can take on any value between 0 and 1, not just 0 or 1.

The formula is more complex:
output = 1 / (1 + exp(-(w * x + b)))
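
As a rough sketch, a sigmoid neuron can be written as the logistic function applied to the weighted sum plus the bias. The input values, weights and bias below are invented for illustration and do not come from the notes.

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(x, w, b):
    """Output of a sigmoid neuron with inputs x, weights w and bias b."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical inputs, weights and bias, chosen only to show a call.
output = sigmoid_neuron([0.5, 0.2, 0.9], [1.0, -2.0, 0.5], 0.1)
print(f"sigmoid neuron output: {output:.3f}")
```

Unlike a perceptron, the result is a number strictly between 0 and 1 rather than exactly 0 or 1.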

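Sigmoid neurons seem very different from perceptrons, but in the extreme cases they behave almost the same. This shorthand helps:
z === w * x + b -> if z is a large positive value, the sigmoid function is approx 1
if z is a large negative value, the sigmoid function is approx 0

Only when z is of modest size does the output sit well between 0 and 1, and that is where sigmoid neurons noticeably differ from perceptrons.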
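
The comparison can be made concrete by evaluating both kinds of neuron on the same values of z. This is a sketch only: the sample values of z and the size of the nudge (from -0.01 to 0.01) are arbitrary choices for illustration.

```python
import math

def step(z):
    """Perceptron rule: 1 when z is positive, otherwise 0."""
    return 1 if z > 0 else 0

def sigmoid(z):
    """Sigmoid neuron output for a given z = w * x + b."""
    return 1.0 / (1.0 + math.exp(-z))

# Large |z|: the two agree closely. Modest |z|: they differ noticeably.
for z in (-10, -1, 0, 1, 10):
    print(f"z={z:>4}  perceptron={step(z)}  sigmoid={sigmoid(z):.4f}")

# Nudge z slightly across zero and compare how much each output moves.
before, after = -0.01, 0.01
step_change = step(after) - step(before)           # jumps from 0 to 1
sigmoid_change = sigmoid(after) - sigmoid(before)  # moves only slightly
print(f"step change: {step_change}, sigmoid change: {sigmoid_change:.4f}")
```

The tiny nudge flips the perceptron from 0 to 1, while the sigmoid output barely moves.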