In [2]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy

In [3]:
model = Sequential([
    Dense(units=16, input_shape=(1,), activation='relu'),
    Dense(units=32, activation='relu'),
    Dense(units=2, activation='softmax')
])


In [4]:
model.summary()

The sections below walk through the main parts of this model:

* `units`
* `input_shape`
* `activation='relu'` and `activation='softmax'`
* How to decide the number of units per layer (e.g., 16, 32, 2)

---

## 🧠 Model Summary:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(units=16, input_shape=(1,), activation='relu'),
    Dense(units=32, activation='relu'),
    Dense(units=2, activation='softmax')
])
```

---

### 🔹 `input_shape=(1,)`

* This tells the model what the **shape of each input sample** is.
* `input_shape=(1,)` means each sample is a 1-D array holding a **single value** (one feature).
* If you had, say, 5 features per sample, you'd use `input_shape=(5,)`.

---

### 🔹 `units=16` / `32` / `2`

* `units` defines how many **neurons** (also called nodes) are in that layer.
* Each neuron receives input from the previous layer and outputs one number.
* You can think of it as the **dimensionality** of the output of that layer.

#### So:

* `units=16`: this layer has 16 neurons.
* `units=32`: this layer has 32 neurons.
* `units=2`: final output layer has 2 neurons — useful for **binary classification** with **softmax**.
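
One way to see what `units` controls is to count the trainable parameters of each `Dense` layer: one weight per (input, unit) pair plus one bias per unit. The sketch below does that arithmetic for the 1 → 16 → 32 → 2 model. The helper name `dense_param_count` is made up for illustration and assumes each layer keeps its default bias term.

```python
# Layer widths for the model above: 1 input feature, then 16, 32 and 2 units.
layer_sizes = [1, 16, 32, 2]

def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

param_counts = [
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

print(param_counts)       # [32, 544, 66]
print(sum(param_counts))  # 642
```

These per-layer numbers are what you would expect to see in the parameter column printed by `model.summary()`.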

---

### 🔹 `activation='relu'` and `'softmax'`

* Activation functions introduce **non-linearity** into the model. Without them, the network would behave like a linear model, no matter how deep it is.
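
To make the "linear model" point concrete, here is a small sketch with made-up scalar weights (not taken from the model above). Two stacked layers without an activation reduce to a single linear function, while putting a ReLU between them changes the result.

```python
import math

# Two stacked "layers" with no activation, using made-up scalar weights.
w1, b1 = 3.0, 1.0
w2, b2 = -2.0, 0.5

def two_linear_layers(x):
    hidden = w1 * x + b1       # first layer, no activation
    return w2 * hidden + b2    # second layer, no activation

def one_linear_layer(x):
    # Algebraically the same mapping: (w2 * w1) * x + (w2 * b1 + b2)
    return (w2 * w1) * x + (w2 * b1 + b2)

def two_layers_with_relu(x):
    hidden = max(0.0, w1 * x + b1)   # ReLU between the layers
    return w2 * hidden + b2

for x in [-2.0, 0.0, 1.5, 10.0]:
    assert math.isclose(two_linear_layers(x), one_linear_layer(x))

# With ReLU in between, the network is no longer that single line.
print(two_linear_layers(-2.0))     # 10.5
print(two_layers_with_relu(-2.0))  # 0.5
```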

#### Common ones:

* **ReLU (Rectified Linear Unit)**: `relu(x) = max(0, x)`

  * Used in hidden layers
  * Efficient and helps with gradient flow
* **Softmax**: Converts output to **probabilities** that sum to 1.

  * Used in the final layer for **multi-class classification** (or binary classification with 2 output units)
  * Each value shows how likely the input is to belong to each class.
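
Both functions are short enough to write by hand. The following is a plain-Python sketch for illustration, not how Keras implements them internally. The input scores are made up.

```python
import math

def relu(x):
    """ReLU: pass positive values through, clamp negatives to 0."""
    return max(0.0, x)

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the max first avoids overflow in exp() for large scores.
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

print([relu(x) for x in [-3.0, 0.0, 2.5]])  # [0.0, 0.0, 2.5]

probs = softmax([2.0, 0.0])  # made-up scores for class 0 and class 1
print(probs)                 # roughly [0.88, 0.12]
print(sum(probs))            # 1.0 up to floating-point rounding
```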

---

### 🧠 How to Decide the Units (16 → 32 → 2)

There’s **no single rule**, but here are some guidelines:

#### 🔹 Hidden Layers (16 and 32):

* These are called **dense (fully connected) hidden layers**.
* Going from 16 to 32 units widens the network, giving the second hidden layer more capacity than the first.
* You choose this based on:

  * Size/complexity of your input data
  * How nonlinear the mapping is
  * Trial and error + cross-validation
  * Rule of thumb: Start small and increase if underfitting

#### 🔹 Output Layer (2 with softmax):

* For **binary classification** using `softmax`, we use **2 output units**:

  * `[prob_class_0, prob_class_1]`
* For **multi-class (e.g., 3 classes)**, use 3 units.
* If you’re doing binary classification with a single neuron, you can use:

  ```python
  Dense(units=1, activation='sigmoid')
  ```
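
The two output designs describe the same kind of prediction. As a sketch with made-up scores, the probability a 2-unit softmax assigns to class 1 equals a sigmoid applied to the difference of the two scores:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax2(z0, z1):
    """Softmax over exactly two scores."""
    e0, e1 = math.exp(z0), math.exp(z1)
    return e0 / (e0 + e1), e1 / (e0 + e1)

z0, z1 = 0.3, 1.7  # made-up raw scores for class 0 and class 1
p0, p1 = softmax2(z0, z1)

# P(class_1) from the 2-unit softmax equals a sigmoid of the score difference.
assert math.isclose(p1, sigmoid(z1 - z0))
assert math.isclose(p0 + p1, 1.0)
```

Which form you pick mainly affects the label format and the loss function you pair it with, not what the model can represent.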

---

### ✅ Visual Summary:

```
Input (1 feature)
   ↓
Dense (16 units, ReLU)
   ↓
Dense (32 units, ReLU)
   ↓
Dense (2 units, Softmax) → Outputs: [P(class_0), P(class_1)]
```

---

### 🚀 TL;DR:

| Term          | Meaning                                                                               |
| ------------- | ------------------------------------------------------------------------------------- |
| `units`       | Number of neurons in the layer                                                        |
| `input_shape` | Shape of input sample (not batch!)                                                    |
| `relu`        | Activation function for hidden layers (fast, simple)                                  |
| `softmax`     | Turns outputs into probabilities (used in last layer for classification)              |
| `16 → 32 → 2` | Arbitrary starting choice based on intuition, trial-and-error, and problem complexity |

---



Understanding **features** and **samples** is fundamental in machine learning.

---

### What is a **Feature** and what is a **Sample**?

* A **feature** is an individual measurable property or characteristic of the data.
* A **sample** (also called an observation or data point) is **one instance** of your data that contains a set of features.

---

### 🔹 Example 1: **1 Feature per Sample**

Imagine you're predicting house prices based **only on the size of the house** (in square feet).

* Each house (sample) is described by **one feature**: size.
* So your dataset might look like this:

| House Size (sq ft) |
| ------------------ |
| 1200               |
| 1500               |
| 900                |
| 2000               |

Here:

* Each **row** is a **sample** (one house).
* Each **column** is a **feature** (house size).

**Input shape** in this case: `(1,)` — one feature.

---

### 🔹 Example 2: **5 Features per Sample**

Now imagine you want to predict house prices based on **5 features**:

1. Size (sq ft)
2. Number of bedrooms
3. Number of bathrooms
4. Distance to city center (miles)
5. Age of the house (years)

Your dataset might look like this:

| Size | Bedrooms | Bathrooms | Distance | Age |
| ---- | -------- | --------- | -------- | --- |
| 1200 | 3        | 2         | 5        | 10  |
| 1500 | 4        | 3         | 3        | 5   |
| 900  | 2        | 1         | 7        | 20  |
| 2000 | 5        | 4         | 2        | 2   |

Here:

* Each row is a **sample** (one house).
* Each row has **5 features** describing it.

**Input shape** in this case: `(5,)` — 5 features per sample.

---

### Summary:

| Term               | Meaning                        | Example                        |
| ------------------ | ------------------------------ | ------------------------------ |
| Sample             | One data point / observation   | One house                      |
| Feature            | One characteristic of a sample | House size, number of bedrooms |
| Input shape `(1,)` | Each sample has 1 feature      | `[1200]` (just size)           |
| Input shape `(5,)` | Each sample has 5 features     | `[1200, 3, 2, 5, 10]`          |

---
**Important:**
If you’re feeding this data to a neural network, your input layer’s shape needs to match the number of features per sample.
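
As a rough illustration, the per-sample shape can be read straight off the data. The helper `infer_input_shape` below is hypothetical (it is not a Keras function) and assumes the dataset is a list of rows, one row per sample. Note that the number of samples never appears in the result.

```python
def infer_input_shape(samples):
    """Return the per-sample shape for a list of equally long feature rows."""
    n_features = len(samples[0])
    if any(len(row) != n_features for row in samples):
        raise ValueError("every sample must have the same number of features")
    return (n_features,)

one_feature = [[1200], [1500], [900], [2000]]
five_features = [
    [1200, 3, 2, 5, 10],
    [1500, 4, 3, 3, 5],
    [900, 2, 1, 7, 20],
    [2000, 5, 4, 2, 2],
]

print(infer_input_shape(one_feature))    # (1,)
print(infer_input_shape(five_features))  # (5,)
```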

---



Training ANN

In [6]:
model.compile(
    optimizer=Adam(learning_rate=0.0001),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']   # metrics should be a list, even if just one metric
)


Your `model.compile` call is almost correct; the only fix needed is the **metrics** argument, which should be a list.

---

### What you wrote:

```python
model.compile(
    optimizer=Adam(learning_rate=0.0001),
    loss='sparse_categorical_crossentropy',
    metrics='accuracy'
)
```

---

### ✅ Correct version:

```python
model.compile(
    optimizer=Adam(learning_rate=0.0001),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']   # metrics should be a list, even if just one metric
)
```

---

### Explanation:

* **optimizer**: `Adam` is a popular, efficient optimizer.
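* **learning\_rate**: 0.0001 is ten times smaller than Adam's default of 0.001, so each update step is small. Training is slower but steadier, which also suits fine-tuning.
* **loss**: `'sparse_categorical_crossentropy'` is used for classification where labels are integers such as `0` and `1` (not one-hot encoded). It pairs with the 2-unit softmax output defined above.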
* **metrics**: Should be a **list** of metrics, so wrap `'accuracy'` in square brackets `['accuracy']`.
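
To show what the "sparse" in the loss name refers to, here is a hand-written sketch of the per-sample loss. The function names `sparse_cce` and `one_hot_cce` and the probabilities are made up for illustration; Keras additionally averages the loss over the batch.

```python
import math

def sparse_cce(label, probs):
    """Loss for one sample: negative log of the probability given to the true class."""
    return -math.log(probs[label])

def one_hot_cce(one_hot, probs):
    """Same loss, but with the label written as a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

probs = [0.2, 0.8]  # made-up softmax output: [P(class_0), P(class_1)]

loss_int = sparse_cce(1, probs)               # integer label: class 1
loss_one_hot = one_hot_cce([0, 1], probs)     # one-hot label: class 1
assert math.isclose(loss_int, loss_one_hot)

print(round(loss_int, 4))  # 0.2231
```

Both forms give the same value; the sparse version simply lets you keep labels as plain integers.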

---

