## Neural Networks
#### Understanding Neural Networks and Tensors
Neural networks are computational models inspired by the structure of the human brain. They consist of layers of interconnected units called neurons, organized into an input layer, one or more hidden layers, and an output layer. These networks are designed to learn patterns in data by adjusting the weights of connections between neurons through a process called training. The hidden layers are especially important, as they transform the input data using weighted sums and activation functions to capture complex relationships and features.

In deep learning, data is represented using tensors, which are multi-dimensional arrays. A tensor can take various forms: a scalar (0D), a vector (1D), a matrix (2D), or higher-dimensional arrays (3D and beyond). For instance, images are commonly represented as 3D tensors—height, width, and color channels (e.g., RGB). So, a 64x64 pixel image with 3 channels would be a tensor of shape [64, 64, 3]. Tensors are foundational to how data flows through a neural network and are essential to model computation.

![alt text](image1.png)
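Tensor ranks and shapes are easy to see with NumPy arrays. Here they stand in for TensorFlow or PyTorch tensors, which is an assumption for illustration, though both libraries use the same `shape` idea. The zero-filled "image" is placeholder data, not a real picture.

```python
import numpy as np

# A tensor's rank (ndim) is its number of axes; its shape lists the size of each axis.
scalar = np.array(5.0)                        # 0D tensor
vector = np.array([1.0, 2.0, 3.0])            # 1D tensor
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2D tensor
image = np.zeros((64, 64, 3))                 # 3D tensor: height, width, RGB channels

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("image", image)]:
    print(f"{name}: ndim={t.ndim}, shape={t.shape}")

# A batch of 32 such images adds a leading batch axis, giving a 4D tensor.
batch = np.zeros((32, 64, 64, 3))
print("batch:", batch.shape)
```

The leading batch axis in the last lines reflects the fact that models process many examples at once. Keras, for instance, expects image batches as `[batch, height, width, channels]` by default.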

#### Forward and Backward Propagation
**Forward** and **backward propagation** are the core processes that enable a neural network to learn. In forward propagation, input data (represented as tensors) flows through the network layer by layer. Each neuron in a layer computes a weighted sum of its inputs, applies an activation function (like ReLU or sigmoid), and passes the result to the next layer. This continues until the network produces an output, such as class probabilities or a class label in image classification.
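The forward pass can be sketched in a few lines of NumPy. This is a minimal illustration, not how Keras implements it internally. The layer sizes (4 inputs, 3 hidden neurons, 1 output) and the random weights are hypothetical.

```python
import numpy as np

def relu(z):
    # Keep positive values, replace negatives with 0
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squash any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 input features, one hidden layer of 3 neurons, 1 output neuron
x = rng.normal(size=(2, 4))                     # batch of 2 examples
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)   # hidden-layer weights and biases
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # output-layer weights and bias

# Forward propagation: weighted sum + activation, layer by layer
h = relu(x @ W1 + b1)          # hidden activations, shape (2, 3)
y_hat = sigmoid(h @ W2 + b2)   # predicted probabilities, shape (2, 1)
print(y_hat)
```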

Once the output is compared to the true label using a loss function, the network uses backward propagation to update its internal parameters (weights and biases). Backpropagation works by computing gradients of the loss with respect to each weight using the chain rule of calculus. These gradients indicate how much each weight contributed to the error, allowing the model to adjust the weights in the direction that minimizes the loss. This optimization is typically done using gradient descent or a variant like Adam. Through many iterations of forward and backward passes, the network gradually improves its performance.
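Here is a minimal sketch of forward and backward propagation for the simplest possible network: a single sigmoid neuron trained with binary cross-entropy, the same loss used later in this notebook. The toy data and learning rate are made up. For this activation/loss pair, the chain rule simplifies to `dL/dz = (p - y) / n`, and the code carries that through to the weights and bias. Plain gradient descent is used instead of Adam to keep the update visible.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y, p):
    # Binary cross-entropy averaged over the batch
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical toy data: 4 examples, 2 features, binary labels
X_toy = np.array([[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0], [2.0, 0.0]])
y_toy = np.array([1.0, 0.0, 1.0, 0.0])

w = np.zeros(2)
b = 0.0
lr = 0.1

losses = []
for step in range(100):
    # Forward pass
    z = X_toy @ w + b
    p = sigmoid(z)
    losses.append(bce(y_toy, p))

    # Backward pass (chain rule): dL/dz = (p - y) / n for sigmoid + BCE
    dz = (p - y_toy) / len(y_toy)
    dw = X_toy.T @ dz   # dL/dw = sum over examples of dL/dz * x
    db = dz.sum()       # dL/db = sum over examples of dL/dz

    # Gradient descent update
    w -= lr * dw
    b -= lr * db

print(f"loss at step 0: {losses[0]:.4f}, after 100 steps: {losses[-1]:.4f}")
```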

#### Tensor Operations in Neural Networks
Neural networks rely heavily on tensor operations to process and transform data. Common operations include element-wise addition and multiplication, matrix multiplication, and broadcasting, which allows operations on tensors of different shapes. For example, when data flows through a layer, it’s multiplied by a weight matrix and added to a bias vector—both of which are tensors. These operations are performed efficiently using libraries like TensorFlow or PyTorch, which are optimized for hardware acceleration (like GPUs). Understanding how these operations work at the tensor level is crucial to grasp how information propagates and is transformed in a model.

![alt text](image.png)
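To see what a dense layer computes at the tensor level, here is a small NumPy sketch of `inputs @ W + b` with made-up shapes and values. The bias vector has shape `(4,)` while the matrix product has shape `(2, 4)`, so broadcasting adds the same bias row to every example in the batch.

```python
import numpy as np

inputs = np.arange(6, dtype=float).reshape(2, 3)   # batch of 2 inputs, 3 features each
W = np.ones((3, 4))                                # weight matrix: 3 inputs -> 4 neurons
b = np.array([0.0, 1.0, 2.0, 3.0])                 # one bias per neuron, shape (4,)

# Matrix multiplication, then b is broadcast across the batch axis
Z = inputs @ W + b
print(Z.shape)  # (2, 4)
print(Z)

# Element-wise multiplication works position by position on same-shaped tensors
print(Z * Z)
```

TensorFlow and PyTorch follow the same broadcasting rules as NumPy for this case.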

```python
import pandas as pd

data = pd.read_csv('./vendors.csv')

data.head()
# data.info()
```



| Unnamed: 0 | variety | pricing | location | sales |
|---|---|---|---|---|
| 0 | strawberry | 3.396088 | mall | 1 |
| 1 | chocolate | 2.823228 | park | 0 |
| 2 | strawberry | 1.585136 | mall | 0 |
| 3 | strawberry | 3.437355 | park | 0 |
| 4 | chocolate | 2.744190 | beach | 0 |


```python
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras.optimizers import Adam

# Sequential model: layers are stacked one after another
model = models.Sequential()
```

```python
from sklearn.model_selection import train_test_split

# Preprocessing: one-hot encode the categorical columns.
# drop_first=True drops the first category of each column (here 'chocolate' and 'beach')
# to avoid redundant columns; dtype=int produces 0/1 integers instead of booleans.
data_encoded = pd.get_dummies(data[['variety', 'location']], drop_first=True, dtype=int)

# Features: numeric price plus the encoded categories; target: sales (0 or 1)
X = pd.concat([data['pricing'], data_encoded], axis=1)
y = data['sales'].astype(int)

# Hold out 30% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```


```python
# Build the neural network
model.add(layers.Dense(18, activation='relu'))    # hidden layer 1
model.add(layers.Dense(9, activation='relu'))     # hidden layer 2
model.add(layers.Dense(3, activation='relu'))     # hidden layer 3
model.add(layers.Dense(1, activation='sigmoid'))  # output layer: probability of a sale

# Compile the model: optimizer, loss for binary classification, and a metric to track
model.compile(optimizer=Adam(learning_rate=0.01), loss='binary_crossentropy', metrics=['accuracy'])

# Train; 20% of the training rows are held out for validation
model.fit(X_train, y_train, epochs=20, batch_size=200, validation_split=0.2)
```



```text
Epoch 1/20
28/28 ━━━━━━━━━━━━━━━━━━━━ 4s 18ms/step - accuracy: 0.6383 - loss: 0.6807 - val_accuracy: 0.7986 - val_loss: 0.6226
Epoch 2/20
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7964 - loss: 0.6029 - val_accuracy: 0.7986 - val_loss: 0.5322
Epoch 3/20
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7927 - loss: 0.5183 - val_accuracy: 0.8021 - val_loss: 0.4825
Epoch 4/20
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8114 - loss: 0.4750 - val_accuracy: 0.8071 - val_loss: 0.4755
Epoch 5/20
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8062 - loss: 0.4752 - val_accuracy: 0.8186 - val_loss: 0.4665
Epoch 6/20
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8144 - loss: 0.4681 - val_accuracy: 0.8214 - val_loss: 0.4579
Epoch 7/20
...
```

### Deep Neural Networks


![alt text](image2.png)

Objectives:
- Understanding gradient descent
- Activation functions in the hidden layers
- Loss functions at the output layer
- Tuning neural networks
    - Tuning with regularization
    - Tuning with normalization
- Model interpretability
    - White-box models (e.g., decision trees)
    - Black-box models (e.g., neural networks)


### Activation Functions
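Activation functions act as gatekeepers at each neuron: they add non-linearity to the network. Without them, stacking layers would still only produce a linear function of the input, so the network could not learn complex patterns.

Here are examples of activation functions:
- `ReLU` - if the neuron's weighted sum is positive, it passes through unchanged; if it is negative, it is replaced with zero (`0`), i.e. `max(0, x)`
- `sigmoid` - squashes any input into the range 0 to 1, which is why it is often used at the output layer for binary classification
- `tanh` - outputs values between -1 and 1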
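The three activation functions can be written directly in NumPy. This is a sketch for intuition only; Keras ships its own `relu`, `sigmoid`, and `tanh` activations. The last lines check the non-linearity point numerically: two weight matrices applied one after another with no activation in between give the same result (up to floating-point rounding) as a single combined matrix.

```python
import numpy as np

def relu(x):
    # Positive inputs pass through unchanged; negative inputs become 0
    return np.maximum(0.0, x)

def sigmoid(x):
    # Squashes any real input into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Squashes any real input into (-1, 1), centered on 0
    return np.tanh(x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("relu:   ", relu(x))
print("sigmoid:", sigmoid(x).round(3))
print("tanh:   ", tanh(x).round(3))

# Without activations, two stacked linear layers collapse into one linear layer
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 2))
inputs = rng.normal(size=(5, 3))
two_layers = (inputs @ W1) @ W2
one_layer = inputs @ (W1 @ W2)
print(np.allclose(two_layers, one_layer))
```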

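### Adjusting Weights
Mathematically, each weight takes a step against its gradient:

$$w_{\text{new}} = w_{\text{old}} - \eta \, \frac{\partial L}{\partial w}$$

where $\eta$ is the learning rate and $\frac{\partial L}{\partial w}$ is the gradient of the loss $L$ with respect to that weight. In words: new weight = old weight - learning rate * gradient.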
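A worked example of the update rule on a hypothetical one-weight loss, L(w) = (w - 3)², whose gradient is 2(w - 3). Starting from w = 0 with a learning rate of 0.1, the first step gives 0 - 0.1 × (-6) = 0.6, and repeated steps move the weight toward the minimum at w = 3.

```python
# Hypothetical toy loss L(w) = (w - 3)**2, with gradient dL/dw = 2 * (w - 3)
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0              # starting (old) weight
learning_rate = 0.1

for step in range(50):
    # new weight = old weight - learning rate * gradient
    w = w - learning_rate * grad(w)

print(round(w, 4))
```

Each step here shrinks the distance to 3 by a factor of 0.8, so a larger learning rate would converge faster on this loss, while one above 1.0 would make the updates overshoot and diverge.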

### Gradient Descent Algorithms
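1. **Adam (Adaptive Moment Estimation)** - keeps running averages of each weight's recent gradients (the first moment) and squared gradients (the second moment), and uses them to give every weight its own adaptive step size. As a result, the step size depends less on the raw size of the gradient: steep directions do not cause huge jumps, and shallow ones do not stall.
2. **SGD (Stochastic Gradient Descent)** - updates the weights using the gradient computed on a single example or a small random mini-batch, rather than on the whole dataset. Each step is cheap but noisy.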
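The sketch below contrasts the two update rules on a hypothetical toy regression problem: fitting `w` and `b` in y = 2x + 1 from noisy data. Both optimizers draw random mini-batches, which is the "stochastic" part. The Adam branch follows the standard published update with commonly used defaults (`beta1=0.9`, `beta2=0.999`). The learning rate, batch size, and step count are arbitrary choices for illustration, not tuned values, and this is not Keras's internal implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy data: y = 2x + 1 plus a little noise
x_toy = rng.uniform(-1.0, 1.0, size=200)
y_toy = 2.0 * x_toy + 1.0 + rng.normal(scale=0.1, size=200)

def grads(params, xb, yb):
    # Gradient of the loss 0.5 * mean((w*x + b - y)**2) with respect to [w, b]
    w, b = params
    err = w * xb + b - yb
    return np.array([np.mean(err * xb), np.mean(err)])

def train(optimizer, steps=500, batch_size=16, lr=0.05):
    params = np.zeros(2)   # [w, b], both start at 0
    m = np.zeros(2)        # Adam: running mean of gradients (1st moment)
    v = np.zeros(2)        # Adam: running mean of squared gradients (2nd moment)
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    for t in range(1, steps + 1):
        idx = rng.choice(len(x_toy), size=batch_size, replace=False)  # random mini-batch
        g = grads(params, x_toy[idx], y_toy[idx])
        if optimizer == "sgd":
            params -= lr * g                     # step size proportional to the gradient
        elif optimizer == "adam":
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g**2
            m_hat = m / (1 - beta1**t)           # bias correction for the zero initialization
            v_hat = v / (1 - beta2**t)
            params -= lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter scaled step
        else:
            raise ValueError(f"unknown optimizer: {optimizer}")
    return params

sgd_params = train("sgd")
adam_params = train("adam")
print("SGD  [w, b]:", sgd_params.round(2))
print("Adam [w, b]:", adam_params.round(2))
```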


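### Tuning a Neural Network
- Feature scaling (use scikit-learn's `StandardScaler` to give each feature mean 0 and standard deviation 1, or `MinMaxScaler` to rescale each feature into a fixed range, 0 to 1 by default)
- Regularization
    * **L1 (Lasso)** - penalizes the absolute values of the weights and can drive some weights to exactly zero, which acts as feature selection.
    * **L2 (Ridge)** - penalizes the squared weights, shrinking them toward zero but rarely to exactly zero.
    * **Dropout** - during training, randomly "turns off" (sets to zero) a given percentage of the neurons in a layer at each step; at inference time all neurons are used.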
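A NumPy sketch of what each tuning technique computes, using made-up numbers. It shows the per-column formulas behind `StandardScaler` and `MinMaxScaler`, the L1 and L2 penalty terms that get added to the loss, and an inverted-dropout mask like the one Keras's `Dropout` layer applies during training (kept values are scaled by 1 / (1 - rate) so their expected sum is unchanged). The penalty strength `lam` and the dropout `rate` are hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Feature scaling, computed per column ---
features = np.array([[1.5, 100.0], [3.0, 250.0], [2.5, 400.0]])            # hypothetical values
standardized = (features - features.mean(axis=0)) / features.std(axis=0)  # mean 0, std 1
col_min, col_max = features.min(axis=0), features.max(axis=0)
min_max = (features - col_min) / (col_max - col_min)                       # range [0, 1]

# --- L1 and L2 penalty terms added to the loss ---
weights = np.array([0.5, -1.0, 0.0, 2.0])
lam = 0.01                                   # regularization strength (hypothetical)
l1_penalty = lam * np.sum(np.abs(weights))   # Lasso-style penalty
l2_penalty = lam * np.sum(weights ** 2)      # Ridge-style penalty

# --- Inverted dropout on one layer's activations (training time only) ---
activations = np.ones((2, 10))
rate = 0.3
keep_mask = rng.random(activations.shape) >= rate    # keep each unit with probability 0.7
dropped = activations * keep_mask / (1.0 - rate)     # rescale the kept units

print(standardized.round(3))
print(min_max.round(3))
print(l1_penalty, l2_penalty)
print(dropped.round(3))
```

In a Keras model these are normally not written by hand: penalties are attached through the `kernel_regularizer` argument of `layers.Dense` (e.g. `regularizers.l2(0.01)` from `tensorflow.keras`), and dropout is added as a `layers.Dropout` layer.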

```python
# Rebuild the network from scratch, this time with dropout. Without re-creating the
# model, model.add() would append these layers to the model trained above.
model = models.Sequential()

model.add(layers.Dense(18, activation='relu'))    # hidden layer 1
model.add(layers.Dropout(0.3))                    # randomly zero 30% of hidden layer 1's outputs during training
model.add(layers.Dense(9, activation='relu'))     # hidden layer 2
model.add(layers.Dense(3, activation='relu'))     # hidden layer 3
model.add(layers.Dense(1, activation='sigmoid'))  # output layer

# Compile the model
model.compile(optimizer=Adam(learning_rate=0.01), loss='binary_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=20, batch_size=200, validation_split=0.2)
```

```text
Epoch 1/20
28/28 ━━━━━━━━━━━━━━━━━━━━ 6s 26ms/step - accuracy: 0.7284 - loss: 0.6263 - val_accuracy: 0.7400 - val_loss: 0.5738
Epoch 2/20
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.7390 - loss: 0.5756 - val_accuracy: 0.7400 - val_loss: 0.5731
Epoch 3/20
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.7371 - loss: 0.5768 - val_accuracy: 0.7400 - val_loss: 0.5728
Epoch 4/20
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.7354 - loss: 0.5774 - val_accuracy: 0.7400 - val_loss: 0.5674
Epoch 5/20
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.7305 - loss: 0.5742 - val_accuracy: 0.7400 - val_loss: 0.5439
Epoch 6/20
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.7408 - loss: 0.5414 - val_accuracy: 0.8021 - val_loss: 0.5102
Epoch 7/20
...
```

### Model Interpretability
- **White-box models** - models whose decision-making is transparent and easy to follow, such as the if/else splits of a decision tree.

- **Black-box models** - models that are opaque and hard to interpret intuitively. Their internal workings (for example, the thousands of learned weights in a neural network) are complex and not easy to understand.



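```python
# Save the trained model in Keras's native format (architecture, weights, and optimizer state)
model.save('first_classification.keras')

# To reload it later:
# from tensorflow.keras.models import load_model
# model = load_model('first_classification.keras')
```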

Here is the link to the Tom and Jerry image classification dataset on Kaggle: [Click here](https://www.kaggle.com/datasets/balabaskar/tom-and-jerry-image-classification/data)