# Intro to Deep Learning and TensorFlow

- Deep learning: an artificial neural network with more than one hidden layer.
- As you stack layers, they can extract higher-level features: borders, shapes, forms, and so on.

## TensorFlow

- It is not specifically for neural networks. More generally, it is an architecture for executing a graph of numerical operations.
- TensorFlow can optimize the processing of the graph and distribute that processing across a network.
- It can also distribute work across GPUs, so it can handle massive scale.
- Runs on just about anything.
- Highly efficient C++ code with easy-to-use Python APIs.
- A tensor is just a fancy name for an array or matrix of values.

# Building Deep Neural Networks with Keras, Normalization, and One-Hot Encoding

## Creating a neural network with TensorFlow

- Mathematical insights:
1. All those interconnected arrows multiplying weights can be thought of as one big matrix multiplication.
2. The bias term can just be added onto the result of that matrix multiplication.

- So in TensorFlow, we can define a layer of a neural network as: `output = tf.matmul(previous_layer, layer_weights) + layer_biases`
- By using TensorFlow directly, we are doing this the "hard way".
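The matrix view of a layer can be checked with a small sketch. It uses NumPy rather than TensorFlow so that it runs without extra setup; the sizes (4 samples, 3 inputs, 2 neurons) and the random values are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a batch of 4 samples, 3 inputs feeding 2 neurons.
previous_layer = rng.normal(size=(4, 3))
layer_weights = rng.normal(size=(3, 2))
layer_biases = np.array([0.5, -0.5])

# One matrix multiplication covers every connection;
# the bias vector is broadcast across the batch.
output = np.matmul(previous_layer, layer_weights) + layer_biases

# The same result computed connection by connection, to show both views agree.
manual = np.empty((4, 2))
for sample in range(4):
    for neuron in range(2):
        weighted_sum = sum(
            previous_layer[sample, i] * layer_weights[i, neuron] for i in range(3)
        )
        manual[sample, neuron] = weighted_sum + layer_biases[neuron]

print(np.allclose(output, manual))
```

An activation function would normally be applied to `output` before passing it to the next layer.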

## Keras to the rescue

- Easy and fast prototyping
1. Higher-level API for TensorFlow
2. Made for building deep neural nets
3. scikit-learn integration
4. Less to think about, which often yields better results without even trying.
5. This is really important! The faster you can experiment, the better your results.

## Make sure your features are normalized

- Neural networks usually work best if your input data is normalized.
1. That is, zero mean and unit variance
2. The real goal is that every input feature is comparable in terms of magnitude

- scikit-learn's StandardScaler can do this for you
- Many data sets are normalized to begin with
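As a minimal sketch of what "zero mean and unit variance" means, the snippet below standardizes each feature by hand with NumPy. The data (age and income columns) is made up for illustration. StandardScaler performs the same per-feature computation by default; in practice the mean and standard deviation should be computed on the training data and reused for the test data.

```python
import numpy as np

# Hypothetical data: two features on very different scales
# (age in years, income in dollars).
X = np.array([
    [25.0, 40_000.0],
    [35.0, 60_000.0],
    [45.0, 80_000.0],
    [55.0, 100_000.0],
])

mean = X.mean(axis=0)  # per-feature mean
std = X.std(axis=0)    # per-feature (population) standard deviation

X_scaled = (X - mean) / std

print(X_scaled.mean(axis=0))  # approximately 0 for each feature
print(X_scaled.std(axis=0))   # approximately 1 for each feature
```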

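```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

model = Sequential()  # We can build the model one layer at a time
model.add(Dense(64, activation="relu", input_dim=20))  # 20 input features
model.add(Dense(64, activation="relu"))
model.add(Dense(10, activation="softmax"))  # Output layer: 10 classes
# Newer Keras optimizers no longer support the `decay` argument, so it is omitted here.
sgd = SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
model.compile(loss="categorical_crossentropy", optimizer=sgd, metrics=["accuracy"])
```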




# ReLU Activation, and Preventing Overfitting with Dropout Regularization

## Activation Function

- Step functions don't work with gradient descent: there is no gradient!
1. Mathematically, they have no useful derivative: it is zero everywhere except at the jump, where it is undefined.

- Alternatives:
1. Logistic (sigmoid) function
2. Hyperbolic tangent function
3. Exponential linear unit (ELU)
4. ReLU function (Rectified Linear Unit)

- ReLU is common. Fast to compute and works well.
1. Also: "Leaky ReLU", "Noisy ReLU"
2. ELU can sometimes lead to faster learning though
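The activation functions listed above can be written in a few lines of NumPy. This is a sketch for comparison only; the `alpha` defaults (0.01 for Leaky ReLU, 1.0 for ELU) are common choices assumed here, not values taken from these notes.

```python
import numpy as np

def step(x):
    # Flat on both sides of 0, so its gradient carries no information.
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Small slope for negative inputs instead of a hard zero.
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Smooth exponential curve for negative inputs.
    return np.where(x > 0, x, alpha * np.expm1(x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for fn in (step, sigmoid, np.tanh, relu, leaky_relu, elu):
    print(fn.__name__, fn(x))
```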

## Avoid Overfitting with regularization

- With thousands of weights to tune, overfitting is a problem. Common remedies:
1. Early stopping (stop training when performance on a validation set starts dropping)
2. Regularization terms added to the cost function during training
3. Dropout: ignore, say, 50% of all neurons randomly at each training step
- Works surprisingly well
- Forces your model to spread out its learning
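Dropout can be sketched in NumPy as a random mask applied to a layer's activations. The helper name `dropout` and the rescaling by `1 / (1 - rate)` (often called "inverted dropout") are assumptions added here; the rescaling keeps the expected activation unchanged so nothing special is needed at prediction time.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Randomly zero a fraction `rate` of the activations during training."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    # True where a neuron is kept, False where it is ignored for this step.
    mask = rng.random(activations.shape) < keep_prob
    # Rescale the survivors so the expected value stays the same.
    return activations * mask / keep_prob

rng = np.random.default_rng(42)
activations = np.ones((1000, 64))

dropped = dropout(activations, rate=0.5, rng=rng)
print("fraction zeroed:", np.mean(dropped == 0.0))  # roughly 0.5
print("mean activation:", dropped.mean())           # roughly 1.0

# At prediction time, dropout is switched off and activations pass through.
unchanged = dropout(activations, rate=0.5, rng=rng, training=False)
```

A fresh mask is drawn at every training step, so no single neuron can be relied on all the time.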

# Max Pooling

Max pooling is a down-sampling technique commonly used in convolutional neural networks (CNNs) for image processing and other applications. Its primary purpose is to reduce the spatial dimensions of the input feature maps while retaining the most significant features.

- How Max Pooling Works:
1. Input Feature Map: Max pooling operates on a feature map generated by a convolutional layer. This feature map contains activations that represent different features detected in the input.
2. Pooling Window: A pooling window (or filter) is defined, typically of size 2x2 or 3x3. This window slides over the input feature map.
3. Stride: The stride determines how far the window moves each time. A common stride value is 2, meaning the window shifts two pixels at a time. With a 2x2 window and a stride of 2, the height and width are each halved.
4. Operation: For each position of the pooling window, max pooling takes the maximum value within that window and outputs it to the new, down-sampled feature map.
5. Output Feature Map: The result is a smaller feature map, which retains the most significant activations from the original. This reduces computational load and can help limit overfitting.

- Benefits of Max Pooling:
1. Dimensionality Reduction: Shrinks the feature maps, which reduces the computations in later layers and the number of parameters in any dense layers that follow, making the network more efficient. The pooling layer itself has no trainable parameters.
2. Feature Retention: By selecting the maximum values, max pooling preserves the most prominent features while discarding less significant details.
3. Translation Invariance: Helps the model become more robust to small translations in the input, as the exact position of the features becomes less important.
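The sliding-window operation described above can be sketched in plain NumPy for a single 2D feature map. The function name `max_pool_2d` and the 4x4 example values are made up for illustration; frameworks such as Keras provide their own pooling layers that also handle batches and channels.

```python
import numpy as np

def max_pool_2d(feature_map, pool_size=2, stride=2):
    """Max-pool a single 2D feature map (no padding)."""
    height, width = feature_map.shape
    out_h = (height - pool_size) // stride + 1
    out_w = (width - pool_size) // stride + 1
    pooled = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = feature_map[top:top + pool_size, left:left + pool_size]
            pooled[i, j] = window.max()  # keep only the strongest activation
    return pooled

feature_map = np.array([
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
])

pooled = max_pool_2d(feature_map)
print(pooled)
# [[6 4]
#  [8 9]]
```

The 4x4 input becomes a 2x2 output: each value is the maximum of one non-overlapping 2x2 block.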

## Image value definition

The value of each pixel in an image is defined based on the type of image and the color model being used. Here’s a breakdown of how pixel values are determined:

- Grayscale Images:
1. Each pixel value represents a shade of gray, typically ranging from 0 (black) to 255 (white) in an 8-bit image.
2. The value indicates the intensity of light; higher values correspond to lighter shades.

- RGB Images: In an RGB (Red, Green, Blue) image, each pixel is defined by three values, one for each color channel. The combination of these three channels produces the final color of the pixel.
1. Red Channel: Intensity of red (0-255)
2. Green Channel: Intensity of green (0-255)
3. Blue Channel: Intensity of blue (0-255)

- Other Color Models: Other models, like CMYK (Cyan, Magenta, Yellow, Key/Black) for printing, or HSV (Hue, Saturation, Value), have different ways to define color, but the principle is similar.
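In code, these pixel values are usually stored as arrays. The sketch below builds a tiny grayscale image and a tiny RGB image with NumPy; the 2x2 size and the specific colors are assumptions for illustration. Dividing by 255 is one common way to bring 8-bit pixel values into a range that is comparable across inputs before feeding them to a network.

```python
import numpy as np

# Grayscale: one 8-bit intensity per pixel, shape (height, width).
gray = np.array([
    [0, 128],
    [200, 255],
], dtype=np.uint8)

# RGB: three 8-bit intensities per pixel, shape (height, width, 3).
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[0, 0] = [255, 0, 0]      # pure red
rgb[0, 1] = [0, 255, 0]      # pure green
rgb[1, 0] = [0, 0, 255]      # pure blue
rgb[1, 1] = [255, 255, 255]  # white: all channels at full intensity

# Scale the 0-255 integers to floats in [0, 1].
rgb_scaled = rgb.astype(np.float32) / 255.0

print(gray.shape, rgb.shape)               # (2, 2) (2, 2, 3)
print(rgb_scaled.min(), rgb_scaled.max())  # 0.0 1.0
```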