### Artificial Neural Networks (ANN)

An Artificial Neural Network (ANN) is a computational model inspired by the structure and functioning of the human brain. It utilizes interconnected layers of nodes, often called neurons, to process information and learn from data.

The process begins with the **input layer**, which receives the raw data. This information then flows through one or more **hidden layers**. In these layers, each neuron multiplies its inputs by a set of **weights**, sums the weighted inputs together with a **bias** term, and passes the result through an **activation function**. The data keeps propagating forward in this way until it reaches the **output layer**, which produces the final result: a prediction, a classification, or another type of output depending on the task.
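The forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustration, not how Keras implements it: the layer sizes, the random weights, the bias terms, and the names `forward` and `params` are assumptions chosen for the example.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):
    """Propagate a batch x of shape (n_samples, n_features) through the network."""
    # Hidden layer: weighted sum plus bias, then ReLU
    h = relu(x @ params["W1"] + params["b1"])
    # Output layer: weighted sum plus bias, then sigmoid (a value in (0, 1))
    return sigmoid(h @ params["W2"] + params["b2"])

rng = np.random.default_rng(0)
params = {
    "W1": rng.normal(scale=0.5, size=(3, 4)), "b1": np.zeros(4),  # 3 inputs -> 4 hidden units
    "W2": rng.normal(scale=0.5, size=(4, 1)), "b2": np.zeros(1),  # 4 hidden units -> 1 output
}
x = rng.normal(size=(5, 3))   # hypothetical batch of 5 samples
y_hat = forward(x, params)
print(y_hat.shape)            # (5, 1)
```

Each row of `y_hat` is the network's output for one input sample; with untrained random weights the values carry no meaning yet.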

The network "learns" through a process called **backpropagation**, combined with gradient-based optimization. A loss function first measures the error (the difference between the predicted output and the actual target output) at the output layer. Backpropagation then applies the chain rule to propagate this error backward through the network, which yields the gradient of the loss with respect to every weight, and an optimizer adjusts the weights in the direction that reduces the error. Complex ANNs are capable of recognizing intricate patterns and relationships within data that simpler models might not detect, making them powerful tools for a wide variety of prediction tasks.
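To make the backward pass concrete, the sketch below trains a tiny one-hidden-layer network with hand-written gradients. It assumes a ReLU hidden layer, a sigmoid output, binary cross-entropy as the error measure, and plain gradient descent; the synthetic data, layer sizes, and learning rate are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                              # synthetic inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)   # synthetic binary targets

W1 = rng.normal(scale=0.5, size=(2, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1))
b2 = np.zeros(1)
lr = 0.1

def bce(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy averaged over the batch."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))

losses = []
for step in range(200):
    # Forward pass
    z1 = X @ W1 + b1
    h = np.maximum(0.0, z1)
    y_prob = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    losses.append(bce(y, y_prob))

    # Backward pass: chain rule from the output layer back to the first layer
    dz2 = (y_prob - y) / len(X)        # gradient of the loss w.r.t. the output pre-activation
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (z1 > 0)      # ReLU passes gradient only where z1 > 0
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)

    # Gradient descent update
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Frameworks such as TensorFlow compute these gradients automatically; writing them out by hand is only useful for understanding what happens during training.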

#### Pros
* Exceptional ability to model complex non-linear relationships
* Capable of automatic feature extraction, reducing the need for manual feature engineering
* Highly adaptable to different types of problems and data structures
* Can achieve state-of-the-art performance on many challenging tasks
* Handles high-dimensional data efficiently
* Can be transfer-learned from pre-trained models to new domains

#### Cons
* Often requires large amounts of training data to perform well
* Training can be computationally intensive and time-consuming
* Prone to overfitting without proper regularization techniques
* Limited interpretability ("black box" nature) compared to simpler models
* Requires careful hyperparameter tuning and architecture design
* May struggle with small datasets relative to the number of parameters


#### Key Components & Concepts

##### Activation Functions

Activation functions introduce non-linearity into the network, allowing it to learn complex patterns.

| Activation                       | Use Case                                              | Formula/Description                                                                  |
| :------------------------------- | :---------------------------------------------------- | :----------------------------------------------------------------------------------- |
| **ReLU** (Rectified Linear Unit) | Default for hidden layers                             | $f(x) = \max(0, x)$                                                                  |
| **Sigmoid**                      | Binary classification output                          | $f(x) = \frac{1}{1 + e^{-x}}$                                                        |
| **Tanh** (Hyperbolic Tangent)    | Hidden layers or outputs that must lie between -1 and 1 | $f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$                                         |
| **Softmax**                      | Multi-class classification output                     | $f(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$; converts outputs to probabilities that sum to 1 |
| **Linear**                       | Regression output                                     | $f(x) = x$                                                                           |
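As a quick reference, the formulas in the table can be written directly in NumPy. This is a sketch for building intuition; the max-subtraction in `softmax` is a common numerical-stability trick that the table does not mention, and it does not change the result.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def softmax(x):
    # Subtracting the maximum avoids overflow in exp() without changing the output
    shifted = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def linear(x):
    return x

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))      # negative inputs are clipped to zero
print(sigmoid(z))   # values in (0, 1)
print(tanh(z))      # values in (-1, 1)
print(softmax(z))   # non-negative values that sum to 1
```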

<br>

##### Task-Specific Configurations

The choice of output activation and loss function depends on the specific task.

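| Task                       | Output Activation | Loss Function                                                                                                   |
| :------------------------- | :---------------- | :-------------------------------------------------------------------------------------------------------------- |
| Binary Classification      | `sigmoid`         | `binary_crossentropy`                                                                                           |
| Multi-Class Classification | `softmax`         | `categorical_crossentropy` (one-hot labels) or `sparse_categorical_crossentropy` (integer labels)               |
| Regression                 | `linear`          | `mean_squared_error` or `mean_absolute_error`                                                                   |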

<br>

##### Data Preprocessing & Handling

Proper data preparation is crucial for training effective ANNs.

| Concept                      | Why It Matters                                                          | Tools / Methods                   | Notes                                                                                   |
| :--------------------------- | :---------------------------------------------------------------------- | :-------------------------------- | :-------------------------------------------------------------------------------------- |
| **Normalization**            | Speeds up convergence and helps keep gradients stable                   | `StandardScaler`, `MinMaxScaler`  | Fit on the training split only, then apply the same transform to validation and test data |
| **One-hot Encoding**         | Converts categorical data into numerical format                         | `pd.get_dummies`, `OneHotEncoder` | Needed for categorical features and for multi-class targets                             |
| **Train-Test Split**         | Ensures model is evaluated fairly on unseen data                        | `train_test_split()`              | Typical ratio: 80/20 or 70/30                                                           |
| **Shuffling**                | Avoids learning sequence bias                                           | `shuffle=True` in `fit()`         | Appropriate when sample order carries no meaning; keep chronological order for time series |
| **Handling Imbalanced Data** | Prevents biased predictions towards the majority class                  | SMOTE, class weights, resampling  | Use for classification tasks with uneven classes                                        |
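The sketch below illustrates three rows of the table with plain NumPy on synthetic data: a shuffled 80/20 split, standardization using statistics from the training split only, and class weights for an imbalanced target. The data and variable names are hypothetical. The weight formula `n_samples / (n_classes * class_count)` is the "balanced" heuristic used by scikit-learn; in practice `StandardScaler` and `train_test_split()` do the first two steps for you.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(loc=50.0, scale=10.0, size=(1000, 3))   # synthetic numeric features
y = (rng.random(1000) < 0.25).astype(int)              # roughly 25% positive class

# Shuffled 80/20 train-test split
idx = rng.permutation(len(X))
n_train = int(0.8 * len(X))
train_idx, test_idx = idx[:n_train], idx[n_train:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Standardize with statistics computed on the training split only,
# so no information from the test set leaks into training
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

# "Balanced" class weights: rarer classes receive larger weights
counts = np.bincount(y_train, minlength=2)
class_weight = {c: len(y_train) / (2 * counts[c]) for c in range(2)}
print(class_weight)
```

A dictionary of this shape can be passed to the `class_weight` argument of Keras `fit()` so that errors on the minority class contribute more to the loss.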

<br>

##### Loss Functions in TensorFlow

Loss functions measure how well the model's predictions match the actual target values.

| Task                       | Loss Function              | TensorFlow Name                |
| :------------------------- | :------------------------- | :----------------------------- |
| Binary Classification      | Binary Crossentropy        | `'binary_crossentropy'`        |
| Multi-Class Classification | Categorical Crossentropy   | `'categorical_crossentropy'`   |
| Regression                 | Mean Squared Error (MSE)   | `'mean_squared_error'`         |
| Regression (Robust)        | Mean Absolute Error (MAE)  | `'mean_absolute_error'`        |
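For intuition about what these names compute, here is a minimal NumPy sketch of the four losses. The function names mirror the TensorFlow strings but are plain Python helpers written for this example; the clipping constant `eps` is an assumption added to avoid `log(0)`.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))

def categorical_crossentropy(y_onehot, y_prob, eps=1e-7):
    y_prob = np.clip(y_prob, eps, 1.0)
    # Only the probability assigned to the true class contributes per sample
    return float(-np.mean(np.sum(y_onehot * np.log(y_prob), axis=1)))

def mean_squared_error(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def mean_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

# A maximally uncertain binary prediction (p = 0.5) costs ln(2), about 0.693
bce_value = binary_crossentropy(np.array([1.0, 0.0]), np.array([0.5, 0.5]))

# Three classes, true class is index 1, predicted with probability 0.5
cce_value = categorical_crossentropy(np.array([[0.0, 1.0, 0.0]]),
                                     np.array([[0.2, 0.5, 0.3]]))

# One prediction is off by 2: MSE squares the error, MAE does not
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 5.0])
mse_value = mean_squared_error(y_true, y_pred)
mae_value = mean_absolute_error(y_true, y_pred)
print(bce_value, cce_value, mse_value, mae_value)
```

Because MSE squares each error, a single large miss dominates it, which is why MAE is listed as the more robust choice for regression.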

<br>

##### Optimizers

Optimizers are algorithms used to adjust the weights of the network to minimize the loss function.

| Optimizer   | Use Case                  | Notes                                                                         |
| :---------- | :------------------------ | :---------------------------------------------------------------------------- |
| **SGD**     | Simple tasks              | May converge slowly; often combined with momentum                             |
| **Adam**    | Default choice            | Adaptive per-parameter learning rates + fast convergence                      |
| **RMSprop** | Time series / RNNs        | Scales each update by a moving average of squared gradients                   |
| **Adagrad** | Rare features (e.g., NLP) | Suits sparse gradients; its effective learning rate decreases over time       |
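To show how two of these update rules differ, the sketch below minimizes the toy function $f(w) = (w - 3)^2$ with plain SGD and with Adam. The function, starting point, learning rates, and step counts are illustrative assumptions; the Adam hyperparameters `beta1=0.9`, `beta2=0.999`, and `eps=1e-8` follow commonly used defaults.

```python
import math

def grad(w):
    """Gradient of f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

def sgd(w, lr=0.1, steps=100):
    for _ in range(steps):
        w -= lr * grad(w)              # step against the gradient
    return w

def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m = 0.0   # moving average of gradients (first moment)
    v = 0.0   # moving average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_sgd = sgd(0.0)
w_adam = adam(0.0)
print(w_sgd, w_adam)   # both should end close to the minimum at w = 3
```

On a one-dimensional quadratic like this, plain SGD is perfectly adequate; Adam's per-parameter scaling tends to matter more when gradients differ widely in magnitude across parameters.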

<br>

##### Tips and Tricks for Training ANNs

| Trick                        | Why It Helps                                                        |
| :--------------------------- | :------------------------------------------------------------------ |
| **Use normalization** | Stabilizes gradients and helps faster training                      |
| **Use dropout** | Prevents overfitting by randomly dropping neurons during training     |
| **Use early stopping** | Stops training when validation performance stalls, preventing overfitting |
| **Start small** | Use a small network first; scale up complexity as needed            |
| **Plot learning curves** | Understand training vs. validation dynamics (loss/accuracy over epochs) |
| **Try multiple activations** | `ReLU` works most of the time, but don’t be afraid to experiment    |
| **Tune with validation set** | Avoid overfitting to training data by using a separate validation set |
| **Use callbacks** | For checkpoints (saving model periodically) and early stopping in Keras |
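The early-stopping rule from the table can be expressed as a small pure-Python function. This is a simplified sketch of patience-based stopping, not the Keras implementation: the name `early_stopping_epoch` and the validation losses are hypothetical, and features such as a minimum improvement threshold are left out.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs pass without a new
    best validation loss; otherwise it runs through all epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses), best_epoch

# Hypothetical validation losses: improvement stalls after epoch 3
val_losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.36, 0.38, 0.39, 0.41]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch, best_epoch)   # 6 3
```

Restoring the weights from `best_epoch` rather than keeping those from `stop_epoch` is what `restore_best_weights=True` does in the Keras `EarlyStopping` callback.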

<br>

#### General Workflow for Building an ANN

1.  **Preprocess data:**
    * Normalize numerical features.
    * Encode categorical features (e.g., one-hot encoding).
    * Handle missing values.
2.  **Split data:**
    * Divide the dataset into training and testing sets (e.g., 80% train, 20% test).
    * Optionally, create a validation set from the training data.
3.  **Build ANN Architecture:**
    * Define the layers of the network. A common starting point:
        * Input Layer (implicitly defined by the first `Dense` layer's `input_shape`)
        * One or more `Dense` hidden layers with `ReLU` activation.
        * Output Layer with an activation function appropriate for the task (e.g., `sigmoid` for binary classification, `softmax` for multi-class, `linear` for regression).
4.  **Compile the Model:**
    * Specify the **Optimizer**: `Adam` is often a good default.
    * Specify the **Loss Function**: Choose based on the task (e.g., `binary_crossentropy`, `categorical_crossentropy`, `mean_squared_error`).
    * Specify **Metrics**: What to track during training and evaluation (e.g., `accuracy` for classification, `mse` or `mae` for regression).
5.  **Fit Model (Train):**
    * Train the model on the training data using the `fit()` method.
    * Specify the number of `epochs` (passes through the entire training dataset) and `batch_size`.
    * Consider using `callbacks` like `EarlyStopping` and `ModelCheckpoint`.
    * Provide validation data to monitor performance on unseen data during training.
6.  **Evaluate Model:**
    * Assess the trained model's performance on the test set using the `evaluate()` method.
7.  **Tune Model (Iterate):**
    * If performance is not satisfactory, adjust:
        * Number of layers and neurons per layer.
        * Learning rate of the optimizer.
        * Dropout rates for regularization.
        * Activation functions.
        * Batch size and number of epochs.
    * Re-train and re-evaluate.
8.  **Save Model (Optional):**
    * Once satisfied with the model, save its architecture and weights for future use or deployment.

In [None]:
%pip install --quiet tensorflow pandas scikit-learn matplotlib

import tensorflow as tf
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix


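# Load the UCI Adult dataset; the raw file has no header row, so column names are supplied
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data'
column_names = ['age', 'workclass', 'fnlwgt', 'education', 'education-num', 'marital-status', 'occupation',
                'relationship', 'race', 'sex', 'capital-gain', 'capital-loss', 'hours-per-week', 'native-country', 'income']
# Fields are separated by a comma followed by whitespace; the raw string avoids an invalid-escape warning for \s
df = pd.read_csv(url, names=column_names, sep=r',\s', engine='python')
df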


Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,income
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32556,27,Private,257302,Assoc-acdm,12,Married-civ-spouse,Tech-support,Wife,White,Female,0,0,38,United-States,<=50K
32557,40,Private,154374,HS-grad,9,Married-civ-spouse,Machine-op-inspct,Husband,White,Male,0,0,40,United-States,>50K
32558,58,Private,151910,HS-grad,9,Widowed,Adm-clerical,Unmarried,White,Female,0,0,40,United-States,<=50K
32559,22,Private,201490,HS-grad,9,Never-married,Adm-clerical,Own-child,White,Male,0,0,20,United-States,<=50K


In [17]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   age             32561 non-null  int64 
 1   workclass       32561 non-null  object
 2   fnlwgt          32561 non-null  int64 
 3   education       32561 non-null  object
 4   education-num   32561 non-null  int64 
 5   marital-status  32561 non-null  object
 6   occupation      32561 non-null  object
 7   relationship    32561 non-null  object
 8   race            32561 non-null  object
 9   sex             32561 non-null  object
 10  capital-gain    32561 non-null  int64 
 11  capital-loss    32561 non-null  int64 
 12  hours-per-week  32561 non-null  int64 
 13  native-country  32561 non-null  object
 14  income          32561 non-null  object
dtypes: int64(6), object(9)
memory usage: 3.7+ MB


In [18]:
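# One-hot encode categorical columns; drop_first=True drops one level per column,
# so the target becomes the single boolean column 'income_>50K'
df = pd.get_dummies(df, drop_first=True)
df.head()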

Unnamed: 0,age,fnlwgt,education-num,capital-gain,capital-loss,hours-per-week,workclass_Federal-gov,workclass_Local-gov,workclass_Never-worked,workclass_Private,...,native-country_Puerto-Rico,native-country_Scotland,native-country_South,native-country_Taiwan,native-country_Thailand,native-country_Trinadad&Tobago,native-country_United-States,native-country_Vietnam,native-country_Yugoslavia,income_>50K
0,39,77516,13,2174,0,40,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False
1,50,83311,13,0,0,13,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False
2,38,215646,9,0,0,40,False,False,False,True,...,False,False,False,False,False,False,True,False,False,False
3,53,234721,7,0,0,40,False,False,False,True,...,False,False,False,False,False,False,True,False,False,False
4,28,338409,13,0,0,40,False,False,False,True,...,False,False,False,False,False,False,False,False,False,False


In [19]:
# Separate features and labels
X = df.drop('income_>50K', axis=1)
y = df['income_>50K']

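# Scale the features to zero mean and unit variance.
# Caveat: fitting the scaler on the full dataset lets test-set statistics leak into
# training; fitting it on the training split only gives a cleaner evaluation.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)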

pd.DataFrame(X_scaled, columns=X.columns)

Unnamed: 0,age,fnlwgt,education-num,capital-gain,capital-loss,hours-per-week,workclass_Federal-gov,workclass_Local-gov,workclass_Never-worked,workclass_Private,...,native-country_Portugal,native-country_Puerto-Rico,native-country_Scotland,native-country_South,native-country_Taiwan,native-country_Thailand,native-country_Trinadad&Tobago,native-country_United-States,native-country_Vietnam,native-country_Yugoslavia
0,0.030671,-1.063611,1.134739,0.148453,-0.21666,-0.035429,-0.174295,-0.262097,-0.014664,-1.516792,...,-0.033729,-0.059274,-0.019201,-0.049628,-0.039607,-0.023518,-0.024163,0.340954,-0.045408,-0.022173
1,0.837109,-1.008707,1.134739,-0.145920,-0.21666,-2.222153,-0.174295,-0.262097,-0.014664,-1.516792,...,-0.033729,-0.059274,-0.019201,-0.049628,-0.039607,-0.023518,-0.024163,0.340954,-0.045408,-0.022173
2,-0.042642,0.245079,-0.420060,-0.145920,-0.21666,-0.035429,-0.174295,-0.262097,-0.014664,0.659286,...,-0.033729,-0.059274,-0.019201,-0.049628,-0.039607,-0.023518,-0.024163,0.340954,-0.045408,-0.022173
3,1.057047,0.425801,-1.197459,-0.145920,-0.21666,-0.035429,-0.174295,-0.262097,-0.014664,0.659286,...,-0.033729,-0.059274,-0.019201,-0.049628,-0.039607,-0.023518,-0.024163,0.340954,-0.045408,-0.022173
4,-0.775768,1.408176,1.134739,-0.145920,-0.21666,-0.035429,-0.174295,-0.262097,-0.014664,0.659286,...,-0.033729,-0.059274,-0.019201,-0.049628,-0.039607,-0.023518,-0.024163,-2.932948,-0.045408,-0.022173
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32556,-0.849080,0.639741,0.746039,-0.145920,-0.21666,-0.197409,-0.174295,-0.262097,-0.014664,0.659286,...,-0.033729,-0.059274,-0.019201,-0.049628,-0.039607,-0.023518,-0.024163,0.340954,-0.045408,-0.022173
32557,0.103983,-0.335433,-0.420060,-0.145920,-0.21666,-0.035429,-0.174295,-0.262097,-0.014664,0.659286,...,-0.033729,-0.059274,-0.019201,-0.049628,-0.039607,-0.023518,-0.024163,0.340954,-0.045408,-0.022173
32558,1.423610,-0.358777,-0.420060,-0.145920,-0.21666,-0.035429,-0.174295,-0.262097,-0.014664,0.659286,...,-0.033729,-0.059274,-0.019201,-0.049628,-0.039607,-0.023518,-0.024163,0.340954,-0.045408,-0.022173
32559,-1.215643,0.110960,-0.420060,-0.145920,-0.21666,-1.655225,-0.174295,-0.262097,-0.014664,0.659286,...,-0.033729,-0.059274,-0.019201,-0.049628,-0.039607,-0.023518,-0.024163,0.340954,-0.045408,-0.022173


In [20]:
# Train-test split (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Build the network: three ReLU hidden layers with dropout, sigmoid output for binary classification.
# tf.keras.Input(shape=...) replaces InputLayer(input_shape=...), which newer Keras versions deprecate.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model with early stopping; 20% of the training data is held out for validation
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
history = model.fit(X_train, y_train, validation_split=0.2, epochs=50, batch_size=64, callbacks=[early_stop])



Epoch 1/50
326/326 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.7483 - loss: 0.4882 - val_accuracy: 0.8422 - val_loss: 0.3414
Epoch 2/50
326/326 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8333 - loss: 0.3522 - val_accuracy: 0.8488 - val_loss: 0.3260
Epoch 3/50
326/326 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8469 - loss: 0.3285 - val_accuracy: 0.8549 - val_loss: 0.3171
Epoch 4/50
326/326 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8469 - loss: 0.3260 - val_accuracy: 0.8553 - val_loss: 0.3159
Epoch 5/50
326/326 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8468 - loss: 0.3237 - val_accuracy: 0.8516 - val_loss: 0.3139
Epoch 6/50
326/326 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8562 - loss: 0.3059 - val_accuracy: 0.8532 - val_loss: 0.3110
Epoch 7/50
326/326

In [None]:
model.summary()


In [22]:
# Predict on test set
y_pred = (model.predict(X_test) > 0.5).astype('int32')

# Classification report
print(classification_report(y_test, y_pred))

# Confusion matrix
print(confusion_matrix(y_test, y_pred))




[1m204/204[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step
              precision    recall  f1-score   support

       False       0.89      0.93      0.91      4942
        True       0.75      0.63      0.69      1571

    accuracy                           0.86      6513
   macro avg       0.82      0.78      0.80      6513
weighted avg       0.86      0.86      0.86      6513

[[4611  331]
 [ 579  992]]
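The figures in the classification report for the positive class (`True`, income above 50K) can be reproduced by hand from the confusion matrix above. The sketch assumes scikit-learn's layout, where rows are true labels and columns are predicted labels, so the matrix reads `[[TN, FP], [FN, TP]]`.

```python
# Confusion matrix entries copied from the output above
tn, fp = 4611, 331
fn, tp = 579, 992

accuracy = (tp + tn) / (tn + fp + fn + tp)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # of predicted positives, how many are right
recall = tp / (tp + fn)                      # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# accuracy=0.86 precision=0.75 recall=0.63 f1=0.69
```

The recall of 0.63 means that roughly 37% of high-income individuals are missed, which the overall accuracy of 0.86 hides because the negative class is about three times larger.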
