### **1. Understanding & Modifying Existing TensorFlow Code**  
*These questions test how well you can read, modify, and improve existing ML models.*

**Q1: Identify issues in the given TensorFlow model**  
_You are given a simple MLP model for binary classification in TensorFlow. Can you spot and fix any issues?_ 

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(64, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
```

- **How would you fix the loss function issue?**   
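`mse` is a regression loss. With a sigmoid output the model predicts a probability, so the appropriate loss for binary classification is `binary_crossentropy`. MSE on sigmoid outputs still trains, but its gradient shrinks when the sigmoid saturates, so the model learns slowly from confident wrong predictions.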
- **What happens if we don’t specify an `input_shape` in the first layer?**  
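Keras builds the model lazily (deferred building): the weights are created the first time the model is called on data, for example in `fit()` or `predict()`, and the input shape is inferred from that first batch. Until then the model has no weights, so `model.summary()` cannot report output shapes or parameter counts.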
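
A minimal sketch of both fixes, assuming a hypothetical input of 20 features (`NUM_FEATURES` is not part of the original question): the loss is switched to `binary_crossentropy`, and an explicit `keras.Input` makes the model build immediately. The second model omits the input shape to show the lazy build.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

NUM_FEATURES = 20  # assumption: the real feature count depends on the dataset

# Fixed model: explicit input shape and a loss that matches the sigmoid output
model = keras.Sequential([
    keras.Input(shape=(NUM_FEATURES,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Without an input shape the model stays unbuilt until it first sees data
lazy_model = keras.Sequential([layers.Dense(1, activation='sigmoid')])
built_before = lazy_model.built              # no weights exist yet
lazy_model(tf.zeros((1, NUM_FEATURES)))      # first call creates the weights
built_after = lazy_model.built
```

Declaring the input shape up front is optional, but it surfaces shape mistakes before training starts.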


### **2. Debugging ML Code Using Documentation & Error Analysis**

---

**Q2: Debugging a Training Pipeline**

**Error message:**
```
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type int).
```

#### **Follow-up:**  

**What is causing this error?**  
This error is raised when TensorFlow is handed a NumPy array whose dtype is `object` rather than a numeric dtype such as `float32`, `float64`, `int32`, or `int64`. An `object` array holds references to Python objects (here Python `int`s), which TensorFlow cannot convert to a tensor. Common sources are a pandas DataFrame with mixed-type columns, rows of unequal length, or numeric values mixed with strings or `None`.
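
A quick way to confirm the diagnosis is to inspect the array's dtype before it reaches TensorFlow. The sketch below uses a small hypothetical array (`raw` is illustrative, not the actual training data) that was created with `dtype=object`, which is the situation that typically produces this message.

```python
import numpy as np

# Hypothetical data: an object array, as often produced from mixed-type tables
raw = np.array([[1, 2], [3, 4]], dtype=object)

print(raw.dtype)         # object
print(type(raw[0, 0]))   # <class 'int'>

# Converting to a numeric dtype gives TensorFlow something it can consume
X_train = raw.astype(np.float32)
print(X_train.dtype)     # float32
```

If `astype` itself fails, the array contains values that are not numeric at all (strings, `None`, nested lists), and those need cleaning first.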

#### **How would you fix it?**  
To fix this error:
1. **Ensure that `X_train` and `y_train` are NumPy arrays or TensorFlow tensors.**  
   You can convert the data to NumPy arrays or TensorFlow tensors with the correct type.

2. **Check and convert the data types to supported formats:**

```python
import numpy as np

# Convert to NumPy arrays with the correct dtype if necessary
X_train = np.array(X_train, dtype=np.float32)
y_train = np.array(y_train, dtype=np.int64)  # For classification tasks
```

3. **Ensure that the labels (`y_train`) are integers** for classification tasks, or **floats for regression tasks.**

4. **Verify that nothing non-numeric or ragged is passed in**, such as nested lists of unequal length or columns containing strings or `None`. These produce `object` arrays that TensorFlow cannot convert.

#### **How does TensorFlow handle data type conversions?**  
TensorFlow automatically converts supported inputs (Python scalars, lists, NumPy arrays) into tensors and infers the dtype from the values. If it encounters values it cannot map to a tensor dtype (e.g., an `object` array), it raises a `ValueError`. TensorFlow can handle types like:
- NumPy arrays (`np.float32`, `np.int64`, etc.)
- Tensors (`tf.float32`, `tf.int64`, etc.)

Keras layers cast floating-point inputs to their compute dtype (`float32` by default), but most low-level TensorFlow ops do not cast between dtypes implicitly, so explicit type conversion (as shown above) is good practice to prevent issues.

---

### **3. Evaluating ML Model Performance**

---

**Q3: How would you evaluate a ranking model for Etsy Ads?**

- **What metrics would you use?**  
  You would typically use **ranking-specific metrics** such as:
  - **NDCG (Normalized Discounted Cumulative Gain):** Measures the quality of the ranking by considering the position of relevant items.
  - **Precision@K (P@K):** Evaluates how many relevant items appear in the top `K` results.
  - **Click-Through Rate (CTR):** Measures the percentage of impressions that resulted in clicks.
  - **Mean Reciprocal Rank (MRR):** Evaluates the position of the first relevant item.

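- **How would you handle class imbalance in evaluation?**  
  Clicks and conversions are rare, so plain accuracy is misleading. You could handle the imbalance in the following ways:
  - **Imbalance-aware metrics:** Report Precision, Recall, and PR-AUC for the minority class, or use weighted versions of Precision, Recall, or NDCG that give more importance to underrepresented classes.
  - **Resampling (training data only):** Use oversampling, undersampling, or synthetic data generation such as SMOTE (Synthetic Minority Over-sampling Technique) to balance the training set.
  - **Realistic evaluation set:** Keep the validation and test sets at the real class distribution so the reported metrics reflect production traffic.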

- **How do you know if a model is overfitting?**  
  You can detect overfitting by:
  - Monitoring the **training loss** and **validation loss** over epochs. If the training loss keeps decreasing while the validation loss increases, overfitting is likely.
  - Checking the **validation accuracy**: If it remains stagnant or decreases while training accuracy increases, overfitting is happening.
  - Using **cross-validation** to check that performance is consistent across folds.
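
The ranking metrics listed under Q3 can be sketched in plain Python. This is an illustrative implementation under simple assumptions: each query is a list of relevance labels in ranked order (position 1 first), and the function names are hypothetical rather than taken from any library.

```python
import math

def precision_at_k(relevance, k):
    """Fraction of the top-k results that are relevant (label > 0)."""
    return sum(1 for r in relevance[:k] if r > 0) / k

def reciprocal_rank(relevance):
    """1 / position of the first relevant result, or 0 if none is relevant."""
    for position, r in enumerate(relevance, start=1):
        if r > 0:
            return 1.0 / position
    return 0.0

def mean_reciprocal_rank(queries):
    """Average reciprocal rank over several ranked result lists."""
    return sum(reciprocal_rank(q) for q in queries) / len(queries)

def dcg_at_k(relevance, k):
    """Discounted cumulative gain: gains are discounted by log2(position + 1)."""
    return sum(r / math.log2(pos + 1) for pos, r in enumerate(relevance[:k], start=1))

def ndcg_at_k(relevance, k):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0

ranked = [0, 1, 0, 1, 0]  # hypothetical labels for five ads, in ranked order
print(precision_at_k(ranked, 3))   # one relevant ad in the top 3
print(reciprocal_rank(ranked))     # 0.5, first relevant ad is at position 2
print(ndcg_at_k(ranked, 5))
```

CTR is different in kind: it is measured online from impressions and clicks, so it complements these offline metrics rather than replacing them.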

---

**Q4: You trained a TensorFlow model, and its accuracy is high, but the business metric (e.g., conversions) is low. What could be wrong?**

- **How would you debug this issue?**  
  Possible causes include:
  1. **Mismatch between training accuracy and business goal:** The model might optimize for accuracy but not align with business metrics like conversions. You may need to focus on **precision, recall, or other business-relevant metrics**.
  2. **Data leakage or misalignment between training and test data**: Verify that the data is preprocessed correctly and that there is no leakage between the training and validation sets.
  3. **Class imbalance or skewed distribution**: If conversions are rare, the model might focus on predicting the majority class.
  
  **Solution:**  
  - Bring the business outcome closer to training: use **conversions** as the label or as a sample weight in the loss, and track it as an evaluation metric.
  - Use **ROC-AUC, F1 score**, or **precision/recall curves** to better align model predictions with business objectives.

- **What additional evaluation strategies would you use?**  
  - **Monitor business KPIs directly**: Track the business impact (e.g., conversion rate, sales) as a key evaluation metric.
  - Use **ablation studies** to analyze how different model components impact conversions.
  - **Post-deployment analysis**: If possible, A/B test the model in production to directly observe business outcomes.
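
A small worked example shows how high accuracy and poor business value can coexist. The numbers are hypothetical: 1,000 impressions with a 2% conversion rate, scored by a degenerate model that never predicts a conversion.

```python
# Hypothetical traffic: 1000 ad impressions, 20 of which convert (2%)
labels = [1] * 20 + [0] * 980
# A degenerate model that always predicts "no conversion"
predictions = [0] * 1000

def accuracy(y_true, y_pred):
    """Share of predictions that match the label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def recall(y_true, y_pred):
    """Share of actual positives that the model found."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0

print(accuracy(labels, predictions))  # 0.98
print(recall(labels, predictions))    # 0.0
```

The model is 98% accurate yet finds none of the conversions, which is why recall or precision on the positive class is the more informative check here.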

---

### **4. Scalability & Performance Considerations**

---

**Q5: Your TensorFlow model is slow during inference. How do you optimize it?**

- **How would you reduce latency in model serving?**
  - Use **TensorFlow Lite** for mobile or embedded devices.
  - Implement **model quantization** (e.g., `float32` to `int8`) to reduce the model size and speed up inference.
  - **Batch inference**: Process multiple inputs in one pass. This raises throughput, though an individual request may wait slightly longer while its batch fills.

- **What techniques can be used to optimize TensorFlow models for speed?**
  - **TensorRT**: Optimize the model for NVIDIA GPUs.
  - **Model pruning**: Remove unnecessary weights or neurons to reduce computation.
  - **Graph optimization**: Wrap inference in `tf.function` for graph execution, and pass `jit_compile=True` to compile it with XLA (Accelerated Linear Algebra).
  - **Distributed inference**: Use multiple machines to handle high request volumes.
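
To make the quantization idea concrete, here is the arithmetic behind symmetric `int8` quantization in NumPy. This is only an illustration of the principle on hypothetical weights; it is not how TensorFlow Lite or TensorRT implement it internally.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 4)).astype(np.float32)  # hypothetical layer weights

# Symmetric quantization: map [-max_abs, max_abs] onto the int8 range [-127, 127]
scale = np.abs(weights).max() / 127.0
quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize to see how much precision was lost
dequantized = quantized.astype(np.float32) * scale
max_error = np.abs(weights - dequantized).max()

print(weights.nbytes, quantized.nbytes)  # 64 16
```

Storage drops by 4x, and the rounding error per weight is bounded by half the scale, which is the trade-off quantization makes.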

---

**Q6: How would you scale a recommendation system for millions of Etsy users and items?**

- **Would you precompute embeddings or compute them in real-time?**
  - For large-scale systems, **precomputing embeddings** for users and items is usually the better approach to reduce real-time latency. Store embeddings in a distributed database like Redis or Elasticsearch for quick retrieval.
  
- **How would you store and retrieve features efficiently?**
  - Use a **feature store** to store and retrieve features efficiently. For example:
    - **HDFS or cloud-based storage**: Store features in a distributed file system for easy access across multiple machines.
    - **NoSQL databases** (like MongoDB, Cassandra): These are good for fast lookups of user/item features in a recommendation system.
    - Use **caching** (e.g., Redis) to store frequently accessed features for real-time serving.
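
Once embeddings are precomputed, serving reduces to a similarity lookup. The sketch below uses brute-force dot products over a tiny hypothetical embedding table; at the scale of millions of items an approximate nearest-neighbor index would normally replace the exhaustive scan.

```python
import numpy as np

# Hypothetical precomputed embeddings; in production these would be produced by
# a trained model offline and loaded from the embedding store
item_embeddings = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.7, 0.7],
    [-1.0, 0.0],
], dtype=np.float32)
user_embedding = np.array([0.9, 0.1], dtype=np.float32)

def top_k_items(user_vec, item_matrix, k):
    """Score every item by dot product and return the k best indices and scores."""
    scores = item_matrix @ user_vec
    top = np.argsort(-scores)[:k]
    return top, scores[top]

indices, scores = top_k_items(user_embedding, item_embeddings, k=2)
print(indices)  # [0 2]
```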

---

### **5. Handling Edge Cases & Model Robustness**

---

**Q7: Your ad ranking model gives high scores to irrelevant ads. What could be going wrong?**

- **How would you debug feature importance?**
  - Use **SHAP (SHapley Additive exPlanations)** or **LIME** to explain model predictions and understand which features are causing the model to give high scores to irrelevant ads.
  - Analyze the feature distribution for the irrelevant ads and check if there's any bias in the data.

- **Would you adjust the loss function or introduce new features?**
  - You could introduce new features such as:
    - **User behavioral data** (click history, session time).
    - **Item attributes** (e.g., category, price).
  - If the model doesn't distinguish between relevant and irrelevant ads, consider modifying the **loss function** to penalize irrelevant ads more heavily (e.g., using weighted cross-entropy).
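
The weighted cross-entropy idea can be written out for a single example. This is a sketch with hypothetical weights (`w_neg=3.0` is an arbitrary choice): raising the weight on the negative class makes a high score on an irrelevant ad cost more.

```python
import math

def weighted_bce(y_true, p_pred, w_pos=1.0, w_neg=1.0, eps=1e-7):
    """Binary cross-entropy with separate weights for positive and negative labels."""
    p = min(max(p_pred, eps), 1 - eps)  # clip to keep the logs finite
    return -(w_pos * y_true * math.log(p) + w_neg * (1 - y_true) * math.log(1 - p))

# An irrelevant ad (label 0) that the model scored at 0.9
plain = weighted_bce(0, 0.9)
weighted = weighted_bce(0, 0.9, w_neg=3.0)
print(plain, weighted)  # the weighted loss is three times the plain one
```

In Keras, a similar effect is available without a custom loss through the `class_weight` argument of `model.fit`.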

---

**Q8: Your ML model performs well on training data but poorly on new listings (cold start problem). How do you handle this?**

- **What strategies would you use to rank new Etsy listings fairly?**
  - Use **content-based filtering** where you rank new listings based on their attributes (e.g., category, price, description).
  - Incorporate **metadata features** for new items and leverage **heuristics** such as ranking based on similar past listings.

- **Would you use zero-shot learning, transfer learning, or heuristics?**
  - **Zero-shot learning** can be effective in certain scenarios where you can use semantic embeddings (e.g., from a pre-trained transformer model) to rank new items without needing labeled training data.
  - **Transfer learning** using a pre-trained model can be fine-tuned on Etsy-specific data to handle new listings.
  - **Heuristics** based on item metadata could provide an initial ranking until more data is available for training.
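
One way to combine content embeddings with a heuristic is to give a new listing the average performance of its most similar past listings. The sketch below assumes hypothetical content embeddings (for example from a pre-trained text model) and hypothetical CTR values; `cold_start_score` is an illustrative name.

```python
import numpy as np

# Hypothetical content embeddings of past listings and their observed CTRs
past_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
past_ctr = np.array([0.10, 0.08, 0.01])

def cosine_similarity(vec, matrix):
    """Cosine similarity between one vector and each row of a matrix."""
    return (matrix @ vec) / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(vec))

def cold_start_score(new_embedding, k=2):
    """Prior score for a new listing: mean CTR of its k most similar past listings."""
    sims = cosine_similarity(new_embedding, past_embeddings)
    nearest = np.argsort(-sims)[:k]
    return float(past_ctr[nearest].mean())

score = cold_start_score(np.array([1.0, 0.05]))
print(score)  # mean CTR of the two closest listings
```

The prior would be replaced by the listing's own statistics once enough impressions have accumulated.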

---

### **6. Working with TensorFlow and ML Libraries**

---

**Q9: What are the differences between `tf.data.Dataset` and using NumPy arrays for feeding data into TensorFlow models?**

- **When would you use `tf.data.Dataset`?**
  - Use `tf.data.Dataset` when you need efficient data loading, processing, and pipeline integration (e.g., batching, shuffling, prefetching).
  - It is especially useful for large datasets that cannot fit in memory or require transformations during training.
  
- **How can you optimize dataset loading for large-scale training?**
  - Use `tf.data` features like **batching**, **prefetching**, **shuffling**, and **parallel mapping** to optimize dataset loading for large-scale training.

```python
dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
dataset = dataset.shuffle(buffer_size=10000).batch(32).prefetch(tf.data.AUTOTUNE)
```
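
A slightly fuller sketch adds the parallel mapping step. The data here is random and the `normalize` transform is a hypothetical stand-in for real preprocessing; the point is the order of operations (shuffle, map in parallel, batch, prefetch).

```python
import numpy as np
import tensorflow as tf

# Hypothetical in-memory data standing in for a real training set
X = np.random.rand(100, 4).astype(np.float32)
y = np.random.randint(0, 2, size=100).astype(np.int64)

def normalize(features, label):
    """Hypothetical per-example transform: rescale features from [0, 1) to [-1, 1)."""
    return (features - 0.5) / 0.5, label

dataset = (
    tf.data.Dataset.from_tensor_slices((X, y))
    .shuffle(buffer_size=100)
    .map(normalize, num_parallel_calls=tf.data.AUTOTUNE)  # transform in parallel
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)  # overlap data preparation with training
)

first_x, first_y = next(iter(dataset))
print(first_x.shape, first_y.shape)
```

The resulting `dataset` can be passed directly to `model.fit`, which then takes the batch size from the dataset.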

---

**Q10: Your training loss is NaN in TensorFlow. How do you debug it?**

- **What could be the possible causes?**
  - **Exploding gradients**: The gradients may become too large, causing the loss to become NaN.
  - **Incorrect data preprocessing**: Check if there are NaNs or infinities in the dataset.
  - **Numerically unstable loss**: For example, taking `log(0)` in a hand-written cross-entropy, or using a learning rate that is too high.
  
- **How would you use TensorFlow debugging tools (`tf.debugging`) to investigate?**
  - Use `tf.debugging.check_numerics` to catch NaN or Inf values in the tensors.

```python
tf.debugging.check_numerics(loss, 'NaN or Inf detected')
```
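
A minimal reproduction shows what the check looks like when it fires. The values are hypothetical: a predicted probability of exactly 0 makes the log infinite, and clipping the probabilities is one common remedy.

```python
import tensorflow as tf

probs = tf.constant([0.5, 0.0])   # hypothetical predictions, one exactly 0
loss = -tf.math.log(probs)        # log(0) makes the second entry infinite

caught = False
try:
    tf.debugging.check_numerics(loss, "loss contains NaN or Inf")
except tf.errors.InvalidArgumentError:
    caught = True                 # the check raised because of the Inf

# Clipping the probabilities keeps the log finite
safe_probs = tf.clip_by_value(probs, 1e-7, 1.0)
safe_loss = -tf.math.log(safe_probs)
tf.debugging.check_numerics(safe_loss, "loss contains NaN or Inf")  # passes
```

To instrument every op instead of a single tensor, `tf.debugging.enable_check_numerics()` can be called once at startup, at some cost in speed.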

---
