### **2. Debugging ML Code & Error Analysis (Interview Coding Questions with Solutions)**  

#### **Q1: The model is training, but accuracy is stuck at random (e.g., ~10% for 10-class classification). What could be wrong?**
##### **Problem Code:**
```python
import tensorflow as tf
import numpy as np

# Generate random data
train_data = np.random.rand(1000, 784)
train_labels = np.random.rand(1000, 10)  # Labels should be categorical integers

# Define a simple model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile and train
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_data, train_labels, epochs=5)
```

##### **Issue:**
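- `train_labels` has shape `(1000, 10)` and contains floating-point numbers, but `sparse_categorical_crossentropy` expects a 1-D array of **integer class labels** (e.g., 0 to 9).
- With malformed labels the loss is computed against meaningless targets (and many TensorFlow versions reject the shape mismatch outright), so the model cannot learn the class structure.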

##### **Solution:**
Convert `train_labels` to integer class labels:
```python
train_labels = np.random.randint(0, 10, size=(1000,))  # Integer class labels
```
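With integer labels the loss is computed as intended. Note that this toy dataset pairs random inputs with random labels, so accuracy will still hover around chance here; on real data, correctly encoded labels let the model learn.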
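
A quick sanity check on the labels can catch this class of bug before training starts. The sketch below is one possible helper, not part of Keras; the name `check_sparse_labels` and its messages are made up for illustration.

```python
import numpy as np

def check_sparse_labels(labels, num_classes):
    """Return a list of problems found in labels meant for a sparse loss."""
    labels = np.asarray(labels)
    problems = []
    if labels.ndim != 1:
        problems.append(f"expected 1-D labels, got shape {labels.shape}")
    if not np.issubdtype(labels.dtype, np.integer):
        problems.append(f"expected integer dtype, got {labels.dtype}")
    elif labels.size and (labels.min() < 0 or labels.max() >= num_classes):
        problems.append("label values fall outside [0, num_classes)")
    return problems

bad_labels = np.random.rand(1000, 10)                  # the buggy labels
good_labels = np.random.randint(0, 10, size=(1000,))   # the fixed labels

print(check_sparse_labels(bad_labels, num_classes=10))   # two problems reported
print(check_sparse_labels(good_labels, num_classes=10))  # empty list
```

Running a check like this right after loading the data is cheap and turns a silent training failure into an explicit message.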

---

#### **Q2: The model's loss is NaN during training. How would you debug it?**
##### **Problem Code:**
```python
# Define a simple model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile with high learning rate
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=10.0), loss='categorical_crossentropy', metrics=['accuracy'])
```

##### **Issue:**
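- The **learning rate is too high** (`10.0`, versus a typical value around `0.001`).
- Each update overshoots, so the weights grow without bound until they overflow, which produces NaN values in the loss.
- A learning rate that is too high makes training diverge; one that is too low makes convergence very slow (it does not by itself cause vanishing gradients).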

##### **Solution:**
Reduce the learning rate to a reasonable value:
```python
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
```

If the issue persists, check for NaNs in tensors:
```python
tf.debugging.check_numerics(model.trainable_variables[0], "Check NaN values in weights")
```
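
To see why an oversized learning rate ends in NaN, the following toy sketch minimizes `f(w) = w**2`. It uses plain gradient descent rather than Adam, which is a deliberate simplification, and the function name is illustrative.

```python
import math

def run_gradient_descent(learning_rate, steps=300, w0=1.0):
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w                  # derivative of w**2
        w = w - learning_rate * grad    # each step multiplies w by (1 - 2 * lr)
    return w

stable = run_gradient_descent(learning_rate=0.1)
diverged = run_gradient_descent(learning_rate=10.0)

print("lr=0.1  ->", stable)
print("lr=10.0 ->", diverged)
print("is NaN:", math.isnan(diverged))
```

With `lr=0.1` every step shrinks `w` by a factor of 0.8. With `lr=10.0` every step multiplies it by -19, so the value overflows to infinity and the next update computes `inf - inf`, which is NaN.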

---

#### **Q3: The model is overfitting. What steps can you take to mitigate this?**
##### **Problem Code:**
```python
# Define a deep model without regularization
model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile and train
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_data, train_labels, epochs=50)  # Too many epochs, leading to overfitting
```

##### **Issue:**
- The model is **large relative to the dataset** and uses no **regularization**.
- **Too many epochs** let it memorize the training data.

##### **Solution:**
Introduce **dropout** and **L2 regularization**:
```python
from tensorflow.keras.regularizers import l2

model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation='relu', kernel_regularizer=l2(0.01), input_shape=(784,)),
    tf.keras.layers.Dropout(0.5),  # Dropout to reduce overfitting
    tf.keras.layers.Dense(512, activation='relu', kernel_regularizer=l2(0.01)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Reduce epochs to prevent excessive training
model.fit(train_data, train_labels, epochs=10)
```
These changes help **prevent overfitting** by reducing reliance on specific neurons and penalizing large weights.
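
As a rough illustration of "penalizing large weights", the sketch below adds an L2 term (strength times the sum of squared weights) to a loss value. The weight lists and the `data_loss` figure are invented for the example.

```python
def l2_penalty(weights, strength):
    """Sum of squared weights scaled by the regularization strength."""
    return strength * sum(w * w for w in weights)

small_weights = [0.1, -0.2, 0.05]
large_weights = [3.0, -4.0, 2.0]

data_loss = 0.30  # hypothetical cross-entropy value, same for both cases
for name, weights in [("small", small_weights), ("large", large_weights)]:
    total = data_loss + l2_penalty(weights, strength=0.01)
    print(name, round(total, 4))
```

The same data loss costs far more when the weights are large, so the optimizer is pushed toward smaller weights.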

---

#### **Q4: The model training is extremely slow. How do you speed it up?**
##### **Problem Code:**
```python
model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Training on CPU with large dataset
model.fit(train_data, train_labels, epochs=10, batch_size=1)  # Batch size is too small!
```

##### **Issue:**
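- **Very small batch size (`batch_size=1`)** means one weight update per sample (1,000 steps per epoch here), which cannot exploit the parallelism of a GPU or a vectorized CPU.
- Training may be running on the CPU instead of a GPU.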

##### **Solution:**
1. Increase **batch size** to take advantage of parallel processing:
```python
model.fit(train_data, train_labels, epochs=10, batch_size=32)  # More efficient
```
2. Ensure you’re using a **GPU** if available:
```python
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
```
3. Use **prefetching** to speed up data loading:
```python
train_dataset = tf.data.Dataset.from_tensor_slices((train_data, train_labels))
train_dataset = train_dataset.shuffle(1000).batch(32).prefetch(tf.data.AUTOTUNE)
```

---

#### **Q5: The model's validation accuracy is much lower than training accuracy. What does this indicate?**
##### **Problem:**
- If training accuracy is **98%**, but validation accuracy is **65%**, the model is likely **overfitting**.

##### **Solution:**
- **Early stopping:** Stop training when validation loss stops improving.
```python
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3)
model.fit(train_data, train_labels, epochs=50, validation_data=(val_data, val_labels), callbacks=[callback])
```
- **Data Augmentation:** Helps generalize to unseen data.
```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(rotation_range=10, width_shift_range=0.1, height_shift_range=0.1, horizontal_flip=True)
```
- **Reduce model complexity:** Use a smaller architecture.
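
The patience logic behind early stopping can be sketched in a few lines. This is a simplified stand-in for what the callback does (no `min_delta`, no weight restoring), and the validation losses are made up.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.63, 0.70]
print(early_stopping_epoch(val_losses, patience=3))
```

Here the best loss occurs at epoch 3, and training stops once three further epochs pass without beating it.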

---



#### **Q6: You get a `ResourceExhaustedError` during training. What is happening?**
##### **Problem Code:**
```python
# Defining a very large model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(100000, activation='relu', input_shape=(784,)),  # Too many neurons
    tf.keras.layers.Dense(10, activation='softmax')
])
```

##### **Issue:**
- **Too many parameters** are being stored in memory, exhausting the GPU’s VRAM.

##### **Solution:**
- **Reduce model size:** Use fewer neurons or layers.
```python
model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax')
])
```
- **Reduce batch size:** A large batch size uses more memory.
```python
model.fit(train_data, train_labels, batch_size=16)  # Reduce batch size
```
- **Enable mixed precision training** for better memory efficiency:
```python
from tensorflow.keras.mixed_precision import set_global_policy
set_global_policy('mixed_float16')
```
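
Before shrinking the model, it can help to estimate how much memory its parameters need. The sketch below counts dense-layer parameters and assumes float32 values with four copies per parameter (weights, gradients, and two Adam moment estimates). Activations and framework overhead come on top, so treat the result as a lower bound.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

def model_params(layer_sizes):
    """Total parameters for a stack of dense layers, e.g. [784, 512, 10]."""
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

BYTES_PER_FLOAT32 = 4
# Assumption: weights + gradients + two Adam moment estimates per parameter.
COPIES_PER_PARAM = 4

for sizes in ([784, 100000, 10], [784, 512, 10]):
    n = model_params(sizes)
    megabytes = n * BYTES_PER_FLOAT32 * COPIES_PER_PARAM / 1e6
    print(sizes, f"{n:,} params, ~{megabytes:,.0f} MB")
```

Whether the larger model actually exhausts memory depends on the device and the batch size, but the comparison shows how quickly a single wide layer dominates the budget.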

---

### **Q1: Your model has high precision but low recall. What could be the issue?**
#### **Scenario:**
You are building a binary classifier for fraud detection. The model achieves **95% precision** but only **40% recall**.  

#### **Analysis:**
- **High precision, low recall** means that the model is being **too conservative**, predicting fraud only when it’s extremely confident.
- This results in **many false negatives** (fraud cases missed).  

#### **Possible Causes & Solutions:**
1. **Threshold Adjustment**  
   - The default decision threshold (0.5) might be too high. Lowering it can improve recall, usually at the cost of some precision.  
   ```python
   y_pred_probs = model.predict(X_test)
   threshold = 0.3  # Reduce threshold to classify more cases as fraud
   y_pred = (y_pred_probs > threshold).astype(int)
   ```
   
2. **Class Imbalance**  
   - If the dataset is imbalanced (fraud cases are rare), the model may favor the majority class.
   - **Solution:** Use class weighting or oversampling.
   ```python
   import numpy as np
   from sklearn.utils.class_weight import compute_class_weight
   # `classes` must be a NumPy array in recent scikit-learn versions
   class_weights = compute_class_weight('balanced', classes=np.array([0, 1]), y=y_train)
   model.fit(X_train, y_train, class_weight={0: class_weights[0], 1: class_weights[1]})
   ```
   
3. **Change Loss Function**  
   - Use **focal loss** instead of binary cross-entropy to focus on hard-to-classify samples.
   ```python
   # TensorFlow Addons has reached end of life; recent TensorFlow versions ship a built-in focal loss.
   model.compile(loss=tf.keras.losses.BinaryFocalCrossentropy(gamma=2.0), optimizer='adam', metrics=['accuracy'])
   ```
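
The precision/recall trade-off behind threshold adjustment can be checked on a handful of made-up predictions. The labels and probabilities below are invented for illustration, and `precision_recall` is a hand-rolled helper rather than a library function.

```python
def precision_recall(y_true, y_prob, threshold):
    """Precision and recall for the positive class at a given threshold."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up labels (1 = fraud) and predicted probabilities.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.90, 0.80, 0.45, 0.40, 0.35, 0.42, 0.20, 0.10, 0.05, 0.30]

for threshold in (0.5, 0.3):
    p, r = precision_recall(y_true, y_prob, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data, lowering the threshold from 0.5 to 0.3 recovers every fraud case but lets one false positive through, which is the trade-off to weigh against the cost of a missed fraud.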

---

### **Q2: Your deep learning model is producing identical outputs for all inputs.**
#### **Scenario:**
You train a neural network, but when testing, it always outputs the **same predicted value** regardless of the input.

#### **Possible Causes & Fixes:**
1. **Vanishing Gradient (All Outputs Converge to Same Value)**
   - If all activations saturate, gradients become too small for learning.
   - **Solution:** Switch activation functions.
   ```python
   model.add(tf.keras.layers.Dense(128, activation='relu'))  # Replace sigmoid/tanh with ReLU
   ```

2. **High L2 Regularization (Weights Shrink Too Much)**
   - If L2 regularization is too strong, weights converge to near-zero.
   - **Solution:** Reduce regularization strength.
   ```python
   model.add(tf.keras.layers.Dense(128, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.001)))
   ```

3. **Batch Normalization Issues**
   - If BatchNorm is misconfigured (for example, run in training mode on a batch of one, where every feature is normalized to zero), it can cause the model to produce constant outputs.
   - **Solution:** Ensure BatchNorm is properly placed.
   ```python
   model.add(tf.keras.layers.BatchNormalization())
   ```

---

### **Q3: Your model is not improving beyond a certain accuracy after several epochs.**
#### **Scenario:**
You train a model for **image classification**, but its accuracy **plateaus at 70%**, despite additional training.

#### **Possible Causes & Fixes:**
1. **Learning Rate Too High or Too Low**
   - **Fix:** Use a learning rate scheduler to adjust it dynamically.
   ```python
   lr_scheduler = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3)
   model.fit(X_train, y_train, validation_data=(X_val, y_val), callbacks=[lr_scheduler])
   ```

2. **Insufficient Model Capacity**
   - If the model is too simple, it may not have enough parameters to capture complexity.
   - **Fix:** Add more layers or units.
   ```python
   model.add(tf.keras.layers.Dense(512, activation='relu'))
   ```

3. **Data Augmentation**
   - More diverse training data can help the model generalize better.
   ```python
   datagen = tf.keras.preprocessing.image.ImageDataGenerator(rotation_range=20, zoom_range=0.2, horizontal_flip=True)
   datagen.fit(X_train)
   ```

---

### **Q4: Your model performs well on training but poorly on validation/test data (overfitting).**
#### **Scenario:**
Your model achieves **98% accuracy on training** but **60% on validation**.

#### **Possible Causes & Fixes:**
1. **Dropout**
   - Add dropout layers to prevent reliance on specific features.
   ```python
   model.add(tf.keras.layers.Dropout(0.5))
   ```

2. **Reduce Model Complexity**
   - A too-large model memorizes training data but fails to generalize.
   - **Fix:** Reduce the number of layers or neurons.
   ```python
   model = tf.keras.Sequential([
       tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
       tf.keras.layers.Dense(10, activation='softmax')
   ])
   ```

3. **Data Augmentation & Early Stopping**
   - Augment the training data, and stop training once validation loss stops improving.
   ```python
   early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
   ```

---


### **Q5: The model's predictions are biased towards a particular class.**
#### **Scenario:**
Your multi-class model **always predicts class 0**, even though multiple classes exist.

#### **Possible Causes & Fixes:**
1. **Class Imbalance**
   - Some classes might be underrepresented in the dataset.
   - **Fix:** Resample or use class weighting.
   ```python
   from imblearn.over_sampling import SMOTE
   smote = SMOTE()
   X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)
   ```

2. **Wrong Activation Function in Output Layer**
   - If the final layer uses `softmax`, but loss function is `binary_crossentropy`, the model behaves incorrectly.
   - **Fix:** Ensure correct configuration.
   ```python
   model.add(tf.keras.layers.Dense(10, activation='softmax'))
   model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
   ```

---

### **Q6: Model takes too long to train on large datasets.**
#### **Scenario:**
Your dataset has **millions of samples**, and training takes **hours per epoch**.

#### **Possible Causes & Fixes:**
1. **Use `tf.data` API for Efficient Loading**
   ```python
   train_dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
   train_dataset = train_dataset.shuffle(1000).batch(32).prefetch(tf.data.AUTOTUNE)  # shuffle samples before batching
   ```

2. **Mixed Precision Training**
   - Speed up training using float16 precision.
   ```python
   from tensorflow.keras.mixed_precision import set_global_policy
   set_global_policy('mixed_float16')
   ```

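3. **Use TensorFlow Lite for Deployment**
   - This does not shorten training; it produces a smaller, faster model for inference once training is done.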
   ```python
   converter = tf.lite.TFLiteConverter.from_keras_model(model)
   tflite_model = converter.convert()
   ```

---

### **Q7: The model’s loss increases instead of decreasing during training.**
#### **Scenario:**
Instead of decreasing, your loss **keeps increasing** over epochs.

#### **Possible Causes & Fixes:**
1. **Too High Learning Rate**
   - Loss might diverge due to large weight updates.
   - **Fix:** Reduce learning rate.
   ```python
   optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)
   ```

2. **Exploding Gradients**
   - **Fix:** Gradient clipping.
   ```python
   optimizer = tf.keras.optimizers.Adam(clipnorm=1.0)
   ```

3. **Incorrect Loss Function**
   - Example: Using `binary_crossentropy` for multi-class classification.
   - **Fix:** Ensure correct loss.
   ```python
   model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
   ```
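
Gradient clipping by norm is simple to state: if the gradient's L2 norm exceeds a limit, rescale the gradient so the norm equals that limit. The sketch below shows the idea on a single vector; it is not the optimizer's internal code.

```python
import math

def clip_by_norm(gradient, max_norm):
    """Rescale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm:
        return list(gradient)           # already small enough, leave unchanged
    scale = max_norm / norm
    return [g * scale for g in gradient]

print(clip_by_norm([3.0, 4.0], max_norm=1.0))   # norm 5 is scaled down to norm 1
print(clip_by_norm([0.3, 0.4], max_norm=1.0))   # norm 0.5 is left as is
```

Clipping keeps the direction of the update and only limits its size, which is why it tames occasional gradient spikes without changing normal steps.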

---

Here are **5 intermediate-level debugging ML code & error analysis questions** with solutions:  

---

### **Q1: Your TensorFlow model trains successfully but fails during inference (runtime error).**  
#### **Scenario:**  
You trained a model successfully, but when running inference, it **throws a shape mismatch error**.  

#### **Possible Causes & Fixes:**  
1. **Different Input Shape During Training & Inference**  
   - **Fix:** Ensure input shapes match between training and inference.
   ```python
   print("Training Input Shape:", X_train.shape)
   print("Inference Input Shape:", X_test.shape)
   ```

2. **Batch Normalization Behavior Changes Between Training & Inference**  
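   - BatchNorm and Dropout behave differently in training mode, which changes predictions (though not shapes). **Fix:** Call `model.eval()` in PyTorch or pass `training=False` in Keras.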
   ```python
   preds = model(X_test, training=False)
   ```
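
Printing shapes works, but an explicit check fails faster and with a clearer message. The helper below is a hypothetical sketch that compares per-sample shapes and ignores the batch dimension.

```python
def check_inference_shape(train_shape, inference_shape):
    """Compare per-sample shapes; the batch dimension (axis 0) may differ."""
    expected = tuple(train_shape[1:])
    actual = tuple(inference_shape[1:])
    if expected != actual:
        raise ValueError(
            f"model was trained on samples of shape {expected}, "
            f"but inference input has per-sample shape {actual}"
        )

check_inference_shape((1000, 784), (32, 784))         # fine: only batch size differs

try:
    check_inference_shape((1000, 784), (32, 28, 28))  # unflattened images
except ValueError as err:
    print(err)
```

Two common culprits are images that were flattened for training but not for inference, and a single sample passed without its batch dimension.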

---

### **Q2: Model converges slowly or not at all.**  
#### **Scenario:**  
Your deep learning model **trains very slowly** or **fails to converge** after multiple epochs.  

#### **Possible Causes & Fixes:**  
1. **Learning Rate Too High or Too Low**  
   - **Fix:** Use **learning rate scheduling**.
   ```python
   lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
       initial_learning_rate=0.01,
       decay_steps=1000,
       decay_rate=0.9)
   optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
   ```

2. **Weight Initialization Issue**  
   - **Fix:** Use **He/Xavier initialization**.
   ```python
   model.add(tf.keras.layers.Dense(128, activation='relu', kernel_initializer='he_normal'))
   ```

3. **Gradient Vanishing in Deep Networks**  
   - **Fix:** Use **Batch Normalization**.
   ```python
   model.add(tf.keras.layers.BatchNormalization())
   ```
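
For intuition about the schedule in item 1, the continuous (non-staircase) exponential decay formula is `initial_learning_rate * decay_rate ** (step / decay_steps)`. The sketch below evaluates it in plain Python with the same values as the snippet above.

```python
def exponential_decay(step, initial_learning_rate=0.01, decay_steps=1000, decay_rate=0.9):
    """Continuous exponential decay: lr = initial * rate ** (step / decay_steps)."""
    return initial_learning_rate * decay_rate ** (step / decay_steps)

for step in (0, 1000, 5000, 20000):
    print(step, round(exponential_decay(step), 6))
```

Every 1,000 steps the learning rate shrinks by 10%, so early training takes large steps and later training settles with smaller ones.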

---

### **Q3: Your model performs well on training data but poorly on validation data (overfitting).**  
#### **Scenario:**  
Your **training accuracy is high**, but **validation accuracy is low**.  

#### **Possible Causes & Fixes:**  
1. **Lack of Regularization**  
   - **Fix:** Add **Dropout** or **L2 regularization**.
   ```python
   model.add(tf.keras.layers.Dropout(0.3))
   model.add(tf.keras.layers.Dense(128, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(0.01)))
   ```

2. **Data Augmentation Missing**  
   - **Fix:** Use `ImageDataGenerator` for image tasks.
   ```python
   from tensorflow.keras.preprocessing.image import ImageDataGenerator
   datagen = ImageDataGenerator(rotation_range=20, width_shift_range=0.2, height_shift_range=0.2, horizontal_flip=True)
   ```

3. **Reduce Model Complexity**  
   - **Fix:** Reduce number of layers or parameters.

---

### **Q4: Your multi-class classification model predicts only one class.**  
#### **Scenario:**  
Your model **always predicts the same class**, even though there are multiple classes.  

#### **Possible Causes & Fixes:**  
1. **Imbalanced Dataset**  
   - **Fix:** Use **class weights** to balance training.
   ```python
   from sklearn.utils.class_weight import compute_class_weight
   class_weights = compute_class_weight("balanced", classes=np.unique(y_train), y=y_train)
   model.fit(X_train, y_train, class_weight=dict(enumerate(class_weights)))
   ```

2. **Softmax Activation Issue**  
   - **Fix:** Ensure softmax is used for multi-class classification.
   ```python
   model.add(tf.keras.layers.Dense(num_classes, activation='softmax'))
   ```

3. **Cross-Entropy Loss Issue**  
   - **Fix:** Use `sparse_categorical_crossentropy` if labels are integers.
   ```python
   model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
   ```
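
The "balanced" class weights used in item 1 follow a simple formula: `n_samples / (n_classes * class_count)`. The sketch below reproduces that formula in plain Python on made-up labels, to show how strongly rare classes are up-weighted.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}

y_train_example = [0] * 90 + [1] * 8 + [2] * 2   # made-up imbalanced labels
print(balanced_class_weights(y_train_example))
```

With these weights each class contributes the same total weight to the loss, so the model can no longer minimize it by always predicting the majority class.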

---

### **Q5: Your deep learning model runs out of memory (OOM error).**  
#### **Scenario:**  
When training a deep learning model, you **get an Out-Of-Memory (OOM) error**.  

#### **Possible Causes & Fixes:**  
1. **Batch Size Too Large**  
   - **Fix:** Reduce batch size.
   ```python
   model.fit(X_train, y_train, batch_size=16)
   ```

2. **Use Mixed Precision Training**  
   - **Fix:** Enable `float16` precision to reduce memory usage.
   ```python
   from tensorflow.keras import mixed_precision
   # The `experimental` API was removed in newer TensorFlow releases
   mixed_precision.set_global_policy('mixed_float16')
   ```

3. **Use Gradient Checkpointing** (for large Transformer models)  
   ```python
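   model = tf.keras.models.Model(inputs, outputs)  # `inputs`/`outputs` are placeholders
   # The result is a plain callable, not a Keras Model: use it in a custom
   # training loop instead of calling model.fit() on it.
   checkpointed_forward = tf.recompute_grad(model)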
   ```

---

### **Final Thoughts**  
These **intermediate debugging questions** test:  
✅ **Shape mismatches**  
✅ **Convergence issues**  
✅ **Overfitting solutions**  
✅ **Handling class imbalance**  
✅ **Memory optimization**  
