
# <u>Intro to Neural Networks</u>



<img src="https://static1.squarespace.com/static/5800c6211b631b49b4d63657/t/5a6caf4753450a17187dd1d3/1517075821859/fullyconnected_525.gif" alt="gif" title="gif"/>
<img src="https://cdn-images-1.medium.com/max/1600/1*V1mNUbnpA7thNIUCHJNpuA.png" alt="abstraction" title="abstraction"/>


___

## <u>Neuron</u>:
### The basic unit of computation in a neural network is the neuron, often called a node or unit. It receives input from some other nodes, or from an external source and computes an output. Each input has an associated weight (w), which is assigned on the basis of its relative importance to other inputs. The node applies a function f (defined below) to the weighted sum of its inputs as shown in the Figure below:





<img src="https://ujwlkarn.files.wordpress.com/2016/08/screen-shot-2016-08-09-at-3-42-21-am.png?w=568&h=303" alt="Image of Neuron" title="An Image of a Neuron"/>



> ### **Operations at one neuron of a neural network**:

<img src="https://cdn-images-1.medium.com/max/1600/1*H2GFATdntBwfKcR4Kuc5mw.png" alt="op" title="ops"/>

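As a rough illustration of the operation described above, the sketch below computes the weighted sum of a neuron's inputs, adds a bias, and applies an activation function `f`. The input values, weights, bias, and the choice of a sigmoid for `f` are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b, f):
    """Apply the activation f to the weighted sum of the inputs plus a bias."""
    return f(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # hypothetical inputs
w = np.array([0.4, 0.3, -0.2])   # hypothetical weights, one per input
b = 0.1                          # hypothetical bias

z = np.dot(w, x) + b             # 0.2 - 0.3 - 0.4 + 0.1 = -0.4
y = neuron_output(x, w, b, sigmoid)
print(z, y)
```

Because the weighted sum is negative here, the sigmoid output falls below 0.5.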

> ### **Connections:**
> ### Information flows through the network across connections. Every connection has a weight value associated with it. The goal of training is to update these weights so that the loss (error) decreases.

<img src="https://cdn-images-1.medium.com/max/1200/1*5zQUUvTbSTJyGQH_qdB2hA.png" alt="conn" title="conns"/>



> ### <u>Weights:</u>
> ### A weight represents the strength of the connection between units. If the weight from node 1 to node 2 has a greater magnitude, neuron 1 has greater influence over neuron 2. A weight scales the input value: a weight near zero means that changing this input will barely change the output, and a negative weight means that increasing this input will decrease the output. In short, a weight decides how much influence the input has on the output.
> ### <u>Bias(Offset):</u>
> ### A bias unit is an "extra" neuron added to each pre-output layer that stores the value of a constant (typically 1 or -1). Bias units aren't connected to any previous layer and in this sense don't represent a true "activity".

<img src="https://cdn-images-1.medium.com/max/1200/1*0NKtEk20-qnaLkwOa8DlnA.png" alt="conn" title="conns"/>

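The effect of weights and bias can be checked numerically. The sketch below uses a plain weighted sum (no activation) with made-up weights: one positive, one zero, and one negative. All values are hypothetical.

```python
import numpy as np

def linear_unit(x, w, b=0.0):
    """Weighted sum of the inputs plus an optional bias."""
    return float(np.dot(w, x) + b)

w = np.array([2.0, 0.0, -1.0])   # hypothetical weights
x = np.array([1.0, 1.0, 1.0])    # hypothetical inputs

base = linear_unit(x, w)                                        # 2 + 0 - 1 = 1
zero_weight = linear_unit(x + np.array([0.0, 5.0, 0.0]), w)     # second input changed
negative_weight = linear_unit(x + np.array([0.0, 0.0, 1.0]), w) # third input increased
with_bias = linear_unit(x, w, b=0.5)                            # same inputs, shifted

print(base, zero_weight, negative_weight, with_bias)  # 1.0 1.0 0.0 1.5
```

Changing the input attached to the zero weight leaves the result untouched, raising the input attached to the negative weight lowers it, and the bias shifts the result by a constant regardless of the inputs.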



___

## <u>Activation Function</u>: 
### An activation function decides whether, and how strongly, a neuron is activated, based on the weighted sum of its inputs plus the bias. The purpose of the activation function is to introduce non-linearity into the output of a neuron.

<img src="https://qph.fs.quoracdn.net/main-qimg-d131b1b1ffb1ae9d842e135f05635f1c" alt="act" title="acts" height="899.2" width="1199.2"/>

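One way to see why the non-linearity matters: without an activation function, two stacked layers collapse into a single linear map. The sketch below uses small hand-picked matrices (hypothetical values) and ReLU as an example activation.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: negative values are clipped to zero."""
    return np.maximum(z, 0.0)

W1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])     # hypothetical first-layer weights
W2 = np.array([[1.0, 1.0]])      # hypothetical second-layer weights
x = np.array([1.0, 0.0])

two_linear_layers = W2 @ (W1 @ x)    # no activation between the layers
single_layer = (W2 @ W1) @ x         # one equivalent linear layer
with_relu = W2 @ relu(W1 @ x)        # activation between the layers

print(two_linear_layers, single_layer, with_relu)  # [0.] [0.] [1.]
```

The first two results are identical because a product of matrices is itself a matrix. Only the version with an activation in between computes something a single linear layer cannot.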
___
## <u>Layers</u>: 
### A node layer is a row of neurons that turn on or off as the input is fed through the net. Each layer’s output is simultaneously the subsequent layer’s input, starting from an initial input layer receiving your data.


> ### <u>Input Layer</u>: First layer - this is where information from the outside world enters the neural network. Nodes in this layer pass information to the hidden layer.
> ### <u>Hidden Layer</u>: Nodes in the hidden layer have no direct connection to the outside world - they perform computations and transfer information from the input nodes to the output nodes.
> ### <u>Output Layer</u>: This is the layer where information exits the neural network (ideally structured so that it is interpretable by humans).


<img src="https://www.jeremyjordan.me/content/images/2017/07/Screen-Shot-2017-07-26-at-1.44.58-PM.png" alt="A diagram of a Neural Network" title="Neural Network Diagram" height="408.24" width="715.05"  />

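The idea that each layer's output is the next layer's input can be sketched as a short loop. The layer sizes (3 inputs, 4 hidden units, 2 outputs), the random weights, and the sigmoid activation are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Feed x through each (weights, bias) pair in turn."""
    activation = x
    for W, b in layers:
        # The output of this layer becomes the input of the next one
        activation = sigmoid(W @ activation + b)
    return activation

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 3)), np.zeros(4)),   # input layer (3) -> hidden layer (4)
    (rng.normal(size=(2, 4)), np.zeros(2)),   # hidden layer (4) -> output layer (2)
]

x = np.array([0.2, -0.5, 1.0])                # hypothetical input vector
output = forward(x, layers)
print(output.shape)  # (2,)
```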



___

## <u>Perceptron</u>: 
### The Perceptron is the precursor to modern neural networks. The Perceptron algorithm was invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt, funded by the United States Office of Naval Research. 
> ### In a 1958 press conference organized by the US Navy, Rosenblatt made statements about the perceptron that caused a heated controversy among the fledgling AI community; based on Rosenblatt's statements, The New York Times reported the perceptron to be "the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence."

### Although the perceptron initially seemed promising, it was quickly proved that perceptrons could not be trained to recognise many classes of patterns. This caused the field of neural network research to stagnate for many years, before it was recognised that a feedforward neural network with two or more layers (also called a multilayer perceptron) had far greater processing power than perceptrons with one layer.

### **Although the perceptron was invented decades ago, researchers lacked the computational power to make effective use of it. This algorithm only proved useful after rapid advances in computer hardware.**

![Figure 2.1](http://www.ryanleeallred.com/wp-content/uploads/2019/04/Screen-Shot-2019-04-01-at-2.34.58-AM.png)

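XOR is a standard example of a pattern that a single-layer perceptron cannot represent but a two-layer network can. The sketch below uses hand-picked (not trained) weights with a step activation: the hidden units compute OR and NAND, and the output unit computes their AND. It also searches a coarse grid of weights to show that no single step unit on that grid reproduces XOR. The specific weight values and the grid are illustrative assumptions.

```python
import numpy as np

def step(z):
    """Heaviside step activation: 1 where z > 0, else 0."""
    return (z > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
xor_target = [0, 1, 1, 0]

# Hidden layer: unit 0 computes OR, unit 1 computes NAND
W_hidden = np.array([[1.0, 1.0],
                     [-1.0, -1.0]])
b_hidden = np.array([-0.5, 1.5])

# Output unit: AND of the two hidden units
w_out = np.array([1.0, 1.0])
b_out = -1.5

hidden = step(X @ W_hidden.T + b_hidden)
two_layer_output = step(hidden @ w_out + b_out)
print(two_layer_output)  # [0 1 1 0]

# No single step unit on this grid of weights and biases matches XOR
grid = np.linspace(-2, 2, 21)
single_unit_found = any(
    np.array_equal(step(X @ np.array([w1, w2]) + b), xor_target)
    for w1 in grid for w2 in grid for b in grid
)
print(single_unit_found)  # False
```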
____

# <u>Performing Logic Operations with a Single Perceptron</u>: 

<img src="http://slideplayer.com/slide/5070403/16/images/1/Logic%3A+Connectives+AND+OR+NOT+P+Q+%28P+%5E+Q%29+T+F+P+Q+%28P+v+Q%29+T+F+P+~P+T+F.jpg" alt="logcon" title="logconns"/>

## Propositional Calculus
>  Propositional calculus is a branch of logic. It is also called propositional logic, statement logic, sentential calculus, sentential logic, or sometimes zeroth-order logic. It deals with propositions (which can be true or false) and argument flow. 
 Compound propositions are formed by connecting propositions by logical connectives.
 Logical connectives are found in natural languages. In English for example, some examples are "and" (conjunction), "or" (disjunction), "not" (negation) and "if" (but only when used to denote material conditional).

**The following is an example of a very simple inference within the scope of propositional logic:**

>* #### Premise 1: If it's raining then it's cloudy.
* ####  Premise 2: It's raining.
* #### Conclusion: It's cloudy.

**Both premises and the conclusion are propositions. The premises are taken for granted and then with the application of modus ponens (an inference rule) the conclusion follows.**

>As propositional logic is not concerned with the structure of propositions beyond the point where they can't be decomposed any more by logical connectives, this inference can be restated replacing those atomic statements with statement letters, which are interpreted as variables representing statements:

>* #### Premise 1: ${\displaystyle P\to Q}$
* #### Premise 2: ${\displaystyle P}$
* #### Conclusion: ${\displaystyle Q}$

**The same can be stated succinctly in the following way:**

> ${\displaystyle P\to Q,P\vdash Q}$ 

**When P is interpreted as “It's raining” and Q as “it's cloudy” the above symbolic expressions can be seen to exactly correspond with the original expression in natural language. Not only that, but they will also correspond with any other inference of this form, which will be valid on the same basis that this inference is.**

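The validity of this inference can be checked mechanically by enumerating every truth assignment: in each row where both premises hold, the conclusion must hold too. The helper name `implies` is introduced here only for illustration.

```python
from itertools import product

def implies(p, q):
    """Material conditional: P -> Q is false only when P is true and Q is false."""
    return (not p) or q

# Modus ponens is valid if Q is true in every row where both premises are true
modus_ponens_valid = all(
    q
    for p, q in product([False, True], repeat=2)
    if implies(p, q) and p
)
print(modus_ponens_valid)  # True
```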
>#### Propositional logic may be studied through a formal system in which formulas of a formal language may be interpreted to represent propositions. A system of inference rules and axioms allows certain formulas to be derived. These derived formulas are called theorems and may be interpreted to be true propositions. A constructed sequence of such formulas is known as a derivation or proof and the last formula of the sequence is the theorem. The derivation may be interpreted as proof of the proposition represented by the theorem.


___

<img src="https://cdn-images-1.medium.com/max/800/1*zq3cbyx-xd_SRq8EwzER0w.jpeg" alt="logic" title="loggates"/>

>#### Logic gates are the basic building blocks of any digital system. A logic gate is an electronic circuit with one or more inputs and exactly one output. The relationship between the inputs and the output is based on a certain logic. Based on this, logic gates are named AND gate, OR gate, NOT gate, etc.

_____

## The NAND gate (NOT-AND)
>#### The NAND or “Not AND” function is a combination of the two separate logical functions, the AND function and the NOT function in series.

<img src="https://bjc.edc.org/bjc-r/img/6-computers/LogicGates_img/TruthTables/NAND_TruthTableTF.png" alt="NAND" title="NANDG"/>

**The NAND function outputs TRUE whenever at least one of its inputs is FALSE; in Boolean algebra terms, the output is FALSE only when all of its inputs are TRUE.**
____

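Before training anything, it may help to see that a single step-activated unit can represent NAND at all. The sketch below compares the Boolean definition with a unit whose weights and bias are hand-picked (hypothetical values, not the result of training).

```python
def nand(p, q):
    """Boolean NAND: NOT applied to the result of AND."""
    return int(not (p and q))

def perceptron_nand(p, q, w=(-2.0, -2.0), b=3.0):
    """Step-activated unit with hand-picked weights and bias."""
    return int(w[0] * p + w[1] * q + b > 0)

rows = [(p, q, nand(p, q), perceptron_nand(p, q))
        for p in (0, 1) for q in (0, 1)]
for row in rows:
    print(row)
# (0, 0, 1, 1)
# (0, 1, 1, 1)
# (1, 0, 1, 1)
# (1, 1, 0, 0)
```

The weighted sum stays positive unless both inputs are 1, which is exactly the NAND behaviour.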
## Creating a NAND gate with a single perceptron

In [1]:
import numpy as np

#### Establish training examples

In [2]:
#Random State for reproducibility

np.random.seed(43)


#The first two columns represent T/F, the third column is a constant bias
inputs = np.array([[0,0,1],
                   [0,1,1],
                   [1,0,1],
                   [1,1,1]])

#Correct labels: NAND of the first two input columns
correct_outputs = np.array([[1],
                            [1],
                            [1],
                            [0]])

# Initialize weights uniformly at random in the range [-1, 1)
weights = 2 * np.random.random((3,1)) - 1

#### Creating an activation function

In [3]:
def sigmoid(x):
  return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
  return sigmoid(x) * (1 - sigmoid(x))

<img src="https://i.stack.imgur.com/inMoa.png" alt="sigmoid" title="sig"/>

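A quick numerical sanity check of the derivative formula: the sketch below compares `sigmoid_derivative` against a central finite difference of `sigmoid`. The step size `h` and the sample points are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

z = np.linspace(-5, 5, 11)   # sample points, including z = 0
h = 1e-5                     # finite-difference step

# Central difference approximation of the slope of sigmoid at each z
numeric_slope = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

print(np.allclose(numeric_slope, sigmoid_derivative(z), atol=1e-8))  # True
print(sigmoid_derivative(0.0))  # 0.25, the steepest point of the curve
```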
In [4]:
for iteration in range(10000):
  
  # Weighted sum of inputs and weights
  weighted_sum = np.dot(inputs, weights)
  
  # Activate with sigmoid function
  activated_output = sigmoid(weighted_sum)
  
  # Calculate Error
  error = correct_outputs - activated_output
  
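  # Scale the error by a sigmoid slope term. sigmoid_derivative applies
  # sigmoid() to its argument, so passing activated_output evaluates the
  # slope at sigmoid(weighted_sum) rather than at weighted_sum itself;
  # the exact gradient would use sigmoid_derivative(weighted_sum).
  adjustments = error * sigmoid_derivative(activated_output)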
  
  # Update weights
  weights += np.dot(inputs.T, adjustments)
  
print('optimized weights after training: ')
print(weights)

print("Output After Training:")
print(activated_output)

optimized weights after training: 
[[-11.83921936]
 [-11.83921936]
 [ 17.80769383]]
Output After Training:
[[0.99999998]
 [0.99744813]
 [0.99744813]
 [0.00281312]]


## The model converged on the correct values: 


| Inputs | Model output (rounded) | Target label |
|----|----|----|
| 0, 0 | 1.000 | 1 |
| 0, 1 | 0.997 | 1 |
| 1, 0 | 0.997 | 1 |
| 1, 1 | 0.003 | 0 |

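The training loop above passes `activated_output` to `sigmoid_derivative`, which applies the sigmoid a second time. For comparison, here is a sketch of the same loop with the slope evaluated at the weighted sum, using the identity that the sigmoid's derivative equals `a * (1 - a)` where `a` is the activated output. The random generator and seed are assumptions, so the learned weights will differ from the ones printed above.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(43)   # assumed seed, for reproducibility only

# First two columns are the NAND inputs, third column is a constant bias input
inputs = np.array([[0, 0, 1],
                   [0, 1, 1],
                   [1, 0, 1],
                   [1, 1, 1]], dtype=float)
targets = np.array([[1], [1], [1], [0]], dtype=float)
weights = 2 * rng.random((3, 1)) - 1

for _ in range(10000):
    activated = sigmoid(inputs @ weights)
    error = targets - activated
    # Slope of the sigmoid at the weighted sum, written in terms of its output
    slope = activated * (1 - activated)
    weights += inputs.T @ (error * slope)

predictions = np.round(sigmoid(inputs @ weights)).astype(int).ravel()
print(predictions)
```

Rounding the final outputs gives the predicted class for each of the four input rows, which can be compared directly with the target column.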
____
# Classification of Diabetic Patients with a Perceptron



In [5]:
import pandas as pd
import numpy as np

In [6]:
df = pd.read_csv('https://raw.githubusercontent.com/ryanleeallred/datasets/master/diabetes.csv')

## Context

#### **This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.**

>#### The dataset consists of several medical predictor variables and one target variable, **Outcome**. Predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.

In [7]:
df.head(10)

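| | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|----|----|----|----|----|----|----|----|----|----|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |
| 5 | 5 | 116 | 74 | 0 | 0 | 25.6 | 0.201 | 30 | 0 |
| 6 | 3 | 78 | 50 | 32 | 88 | 31.0 | 0.248 | 26 | 1 |
| 7 | 10 | 115 | 0 | 0 | 0 | 35.3 | 0.134 | 29 | 0 |
| 8 | 2 | 197 | 70 | 45 | 543 | 30.5 | 0.158 | 53 | 1 |
| 9 | 8 | 125 | 96 | 0 | 0 | 0.0 | 0.232 | 54 | 1 |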


In [8]:
df.shape

(768, 9)

#### There are 768 patient records


## Creating a Perceptron Model

In [9]:

class Perceptron(object):

    def __init__(self, no_of_inputs, epochs=10000, learning_rate=0.01):
        self.epochs = epochs
        self.learning_rate = learning_rate
        self.weights = np.zeros(no_of_inputs + 1)
           
    def predict(self, inputs):
        summation = np.dot(inputs, self.weights[1:]) + self.weights[0]
        if summation > 0:
            activation = 1
        else:
            activation = 0
        return activation

    def train(self, training_inputs, labels):
        for _ in range(self.epochs):
            for inputs, label in zip(training_inputs, labels):
                prediction = self.predict(inputs)
                self.weights[1:] += self.learning_rate * (label - prediction) * inputs
                self.weights[0] += self.learning_rate * (label - prediction)

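To see the update rule in action on something small, the sketch below applies the same rule (add `learning_rate * (label - prediction) * inputs` to the weights) to the four rows of an AND gate. The function name `train_perceptron`, the choice of the AND gate, and a learning rate of 1.0 (which keeps every weight an exact integer) are assumptions made for this illustration.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, learning_rate=1.0):
    """Perceptron learning rule; weights[0] is the bias term."""
    weights = np.zeros(X.shape[1] + 1)
    for _ in range(epochs):
        for inputs, label in zip(X, y):
            prediction = int(np.dot(inputs, weights[1:]) + weights[0] > 0)
            # The update is zero when the prediction is already correct
            update = learning_rate * (label - prediction)
            weights[1:] += update * inputs
            weights[0] += update
    return weights

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                      # AND gate labels

w = train_perceptron(X, y)
and_predictions = [int(np.dot(x, w[1:]) + w[0] > 0) for x in X]
print(w)                # [-2.  2.  1.]
print(and_predictions)  # [0, 0, 0, 1]
```

Weights only change on misclassified examples, so once every row is classified correctly the weights stop moving for the remaining epochs.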
### Creating a test for the model's predictions

In [10]:
from sklearn.model_selection import train_test_split
X = df.iloc[:,0:8].values
y = df.iloc[:,8].values  # 1-D array of Outcome labels, so each label is a scalar

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

### Training the network

In [11]:
classifier = Perceptron(8)

In [12]:
classifier.train(X_train,y_train)

### Generating predictions

In [13]:
# Predict one label per test row
preds = [classifier.predict(x) for x in X_test]

### The network classified 77% of the test-set patients correctly

In [14]:
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, preds)  # signature is accuracy_score(y_true, y_pred)
print(accuracy)

0.7716535433070866

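For context on what the accuracy figure measures, the sketch below computes it by hand on a tiny set of made-up labels: it is simply the fraction of predictions that match the true labels. It also computes the accuracy of always predicting the most common class, which is a useful reference point when reading an accuracy score. The arrays are hypothetical and unrelated to the diabetes data.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])   # hypothetical true labels
y_pred = np.array([1, 0, 0, 1, 0])   # hypothetical predictions

# Accuracy: fraction of positions where prediction and label agree
manual_accuracy = np.mean(y_true == y_pred)

# Accuracy obtained by always predicting the more frequent class
positive_rate = np.mean(y_true)
majority_baseline = max(positive_rate, 1 - positive_rate)

print(manual_accuracy, majority_baseline)  # 0.8 0.6
```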