
# Machine Learning Interview Practice Notebook


## Supervised Learning

**Theory Questions**

1. Explain the difference between Linear Regression and Logistic Regression. What types of problems are they each suitable for?
2. What are the assumptions made by linear regression models?
3. What are common methods to evaluate regression and classification models? Describe each briefly.
4. Explain bias-variance tradeoff and how it affects model performance.

**Programming Task: Linear Regression**

Implement a linear regression model to predict house prices. Use a dataset of your choice (such as Boston Housing, if available; note that it was removed from scikit-learn in version 1.2),
and report model evaluation metrics (e.g., Mean Squared Error and R-squared).
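
One possible starting point is sketched below. It assumes scikit-learn is installed and uses `make_regression` to generate a synthetic stand-in for a house-price table; the data, the feature count, and the noise level are illustrative assumptions, not a real housing dataset.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for house prices (assumption: not real data)
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit ordinary least squares on the training split
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out split
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.2f}, R^2: {r2:.3f}")
```

Swapping in a real dataset only changes how `X` and `y` are loaded; the split, fit, and evaluation steps stay the same.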

**Programming Task: Classification with Logistic Regression**

Implement a logistic regression model using scikit-learn to classify digits from the MNIST dataset. Evaluate the model using accuracy,
precision, recall, and F1-score.


In [None]:
# Code for Linear Regression Model

# Your code here


## Unsupervised Learning

**Theory Questions**

1. Describe the K-means clustering algorithm and its main limitations.
2. What is the role of distance metrics in clustering algorithms?
3. Explain Principal Component Analysis (PCA) and its use cases in machine learning.
4. What are some ways to determine the optimal number of clusters?

**Programming Task: K-means Clustering**

Use K-means clustering on the Iris dataset and visualize the clusters. Try to determine the optimal number of clusters using 
the Elbow method.
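
A minimal sketch of the Elbow method, assuming scikit-learn's bundled Iris data and its `KMeans` implementation. The range of `k` values is an arbitrary choice for illustration, and plotting is left out so the snippet runs without a display.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data

# Inertia is the within-cluster sum of squared distances to the nearest centroid
inertias = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = km.inertia_

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.1f}")
```

Plotting `inertias` against `k` and looking for the point where the curve flattens gives the "elbow".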

**Programming Task: PCA**

Perform PCA on the Iris dataset and plot the first two principal components. Explain the variance captured by these components.
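
The following sketch shows one way to get the two-component projection and the variance each component captures. Standardizing the features first is an assumption made here, not something the task requires.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# Put all four features on the same scale before projecting (a design choice)
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Fraction of total variance explained by each principal component
ratios = pca.explained_variance_ratio_
print(f"PC1: {ratios[0]:.3f}, PC2: {ratios[1]:.3f}, total: {ratios.sum():.3f}")
```

`X_2d` holds the coordinates to scatter-plot, and `ratios` supports the discussion of how much variance the first two components retain.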


In [None]:
# Code for K-means Clustering

# Your code here


## Neural Networks

**Theory Questions**

1. Explain the structure of a feed-forward neural network. How does backpropagation work in this network?
2. What is the difference between activation functions such as ReLU, Sigmoid, and Tanh?
3. Describe how overfitting can be prevented in neural networks.
4. What are some advantages and disadvantages of using deep neural networks?

**Programming Task: Feed-Forward Neural Network**

Implement a simple feed-forward neural network using TensorFlow to classify digits from the MNIST dataset. 
Use techniques such as dropout to mitigate overfitting and evaluate the model's performance.


In [None]:
# Code for Feed-Forward Neural Network

# Your code here


## Transformers and NLP

**Theory Questions**

1. What is a transformer architecture, and how does it differ from recurrent neural networks (RNNs)?
2. Explain the concept of self-attention in transformers and why it is effective for NLP tasks.
3. What are common NLP tasks for which transformers are particularly well-suited?
4. Discuss the difference between BERT and GPT architectures.

**Programming Task: Text Classification**

Use the Hugging Face Transformers library to fine-tune a pre-trained BERT model on a text classification task, 
such as sentiment analysis. Evaluate the model on accuracy and F1-score.


In [None]:
# Code for Text Classification with Transformers

# Your code here


## Data Processing and Feature Engineering

**Theory Questions**

1. Explain the difference between normalization and standardization. When would you use each?
2. What are the advantages of using one-hot encoding for categorical variables?
3. Describe the process of handling missing data in a dataset.
4. How does feature scaling impact the performance of certain machine learning algorithms?

**Programming Task: Feature Engineering**

Given a dataset with categorical and numerical features (e.g., Titanic dataset), preprocess the data for use in a machine learning model.
Tasks include handling missing data, encoding categorical variables, and feature scaling.
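
A hedged sketch of such a preprocessing pipeline follows. The four-row table is made up for illustration (it only mimics Titanic-style columns), and median / most-frequent imputation are assumed strategies; other choices may suit a real dataset better.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical Titanic-style rows (invented values, not the real dataset)
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 58.0],
    "fare": [7.25, 71.28, np.nan, 26.55],
    "sex": ["male", "female", "female", np.nan],
    "embarked": ["S", "C", np.nan, "S"],
})
numeric_cols = ["age", "fare"]
categorical_cols = ["sex", "embarked"]

# Numerical features: fill gaps with the median, then standardize
numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical features: fill gaps with the most frequent value, then one-hot encode
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric_pipe, numeric_cols),
    ("cat", categorical_pipe, categorical_cols),
])

X = preprocess.fit_transform(df)
if hasattr(X, "toarray"):  # the encoder may produce a sparse matrix
    X = X.toarray()
print(X.shape)
```

Wrapping the steps in a `ColumnTransformer` keeps the imputation and scaling statistics tied to the training data, so the same fitted object can later transform a test split.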


In [None]:
# Code for Data Processing and Feature Engineering

# Your code here


## Dimensionality Reduction: t-SNE and PCA

**Theory Questions**

1. Explain the purpose of t-Distributed Stochastic Neighbor Embedding (t-SNE) and how it differs from PCA.
2. What are the main applications of PCA and t-SNE in machine learning?
3. Discuss the limitations of t-SNE for large datasets.

**Programming Task: t-SNE and PCA Comparison**

Use both PCA and t-SNE on the Iris dataset, and visualize the results in a 2D plot. Discuss any differences observed between the two visualizations.
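
As a rough starting point, both embeddings can be computed as below. The `perplexity` value and the PCA initialization for t-SNE are assumptions chosen for illustration; plotting is omitted so the snippet runs headless.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

iris = load_iris()
X, y = iris.data, iris.target

# Linear projection onto the two directions of largest variance
X_pca = PCA(n_components=2).fit_transform(X)

# Non-linear embedding that tries to preserve local neighborhoods
X_tsne = TSNE(
    n_components=2, perplexity=30, init="pca", random_state=42
).fit_transform(X)

print(X_pca.shape, X_tsne.shape)
```

Scatter-plotting each array colored by `y` makes the comparison visible. Unlike PCA axes, t-SNE axes have no direct interpretation, and distances between well-separated clusters should be read with caution.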


In [None]:
# Code for t-SNE and PCA Comparison

# Your code here


## Hierarchical Clustering

**Theory Questions**

1. Explain the difference between agglomerative and divisive hierarchical clustering.
2. What are linkage methods, and why are they important in hierarchical clustering?
3. Describe dendrograms and their role in hierarchical clustering.

**Programming Task: Hierarchical Clustering**

Use hierarchical clustering on the Iris dataset and visualize the clusters using a dendrogram. Experiment with different linkage methods (e.g., single, complete, average) and observe any differences.
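
One way to compare linkage methods is sketched here with SciPy's `scipy.cluster.hierarchy` module. Cutting each tree into three clusters and scoring against the known species with the adjusted Rand index are assumptions added for the comparison; the task itself only asks for dendrograms.

```python
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

iris = load_iris()
X, y = iris.data, iris.target

scores = {}
trees = {}
for method in ("single", "complete", "average"):
    # Agglomerative merge history: one row per merge
    Z = linkage(X, method=method)
    # Cut the tree so that at most three flat clusters remain
    labels = fcluster(Z, t=3, criterion="maxclust")
    scores[method] = adjusted_rand_score(y, labels)
    # no_plot=True returns the layout only; drop it to draw with matplotlib
    trees[method] = dendrogram(Z, no_plot=True)

for method, score in scores.items():
    print(f"{method}: adjusted Rand index = {score:.3f}")
```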


In [None]:
# Code for Hierarchical Clustering and Dendrogram

# Your code here


## Random Forest

**Theory Questions**

1. Describe how a Random Forest algorithm works and its main advantages.
2. What are hyperparameters in Random Forest, and how do they affect model performance?
3. Explain the concept of feature importance in Random Forest.

**Programming Task: Random Forest Classification**

Use Random Forest to classify the Iris dataset. Experiment with different numbers of estimators and observe their effect on model accuracy. 
Plot the feature importances based on the Random Forest model.
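
A minimal sketch of the experiment, assuming 5-fold cross-validation as the accuracy estimate and an arbitrary set of forest sizes. The importances are collected in a dictionary that can be passed to a bar plot.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

iris = load_iris()
X, y = iris.data, iris.target

# Mean cross-validated accuracy for forests of different sizes
accuracy_by_size = {}
for n_estimators in (1, 10, 100):
    forest = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
    accuracy_by_size[n_estimators] = cross_val_score(forest, X, y, cv=5).mean()

# Impurity-based importances from a forest fitted on all the data
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
importances = dict(zip(iris.feature_names, forest.feature_importances_))

print(accuracy_by_size)
print(importances)
```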


In [None]:
# Code for Random Forest Classification

# Your code here


## Decision Tree

**Theory Questions**

1. What is a Decision Tree, and how does it make decisions based on data?
2. Explain the concepts of information gain and Gini impurity in the context of Decision Trees.
3. How can overfitting be mitigated in Decision Trees?

**Programming Task: Decision Tree Classification**

Use a Decision Tree model to classify the Iris dataset. Experiment with different depth levels for the tree and evaluate the model's performance.
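
The depth experiment might look like the sketch below. The depth values and the use of 5-fold cross-validation are assumptions for illustration; `None` lets the tree grow until its leaves are pure.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

cv_accuracy = {}
for depth in (1, 2, 3, None):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    cv_accuracy[depth] = cross_val_score(tree, X, y, cv=5).mean()

for depth, acc in cv_accuracy.items():
    print(f"max_depth={depth}: accuracy={acc:.3f}")
```

A depth-1 tree has only two leaves, so it cannot separate all three Iris classes; this makes it a convenient example of underfitting.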


In [None]:
# Code for Decision Tree Classification

# Your code here


## Support Vector Machine (SVM)

**Theory Questions**

1. What is a Support Vector Machine, and how does it classify data?
2. Describe the concept of the margin and support vectors in SVM.
3. Explain the differences between linear and non-linear SVMs.

**Programming Task: SVM Classification**

Use an SVM to classify the Iris dataset. Experiment with different kernels (linear, polynomial, RBF) and observe their impact on model accuracy.
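
A possible way to run the kernel comparison is shown below. Standardizing the features inside a pipeline and keeping `SVC`'s default hyperparameters are assumptions; tuning `C`, `gamma`, or `degree` would likely change the ranking.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

accuracy_by_kernel = {}
for kernel in ("linear", "poly", "rbf"):
    # Scaling inside the pipeline keeps test folds out of the scaler's statistics
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    accuracy_by_kernel[kernel] = cross_val_score(clf, X, y, cv=5).mean()

for kernel, acc in accuracy_by_kernel.items():
    print(f"{kernel}: accuracy={acc:.3f}")
```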


In [None]:
# Code for SVM Classification

# Your code here


## Logistic Regression

**Theory Questions**

1. Explain how Logistic Regression is used for binary classification.
2. What is the sigmoid function, and why is it used in Logistic Regression?
3. Describe how regularization (L1, L2) affects Logistic Regression.

**Programming Task: Logistic Regression Classification**

Use Logistic Regression to classify the digits in the MNIST dataset. Implement regularization and observe its impact on model performance.


In [None]:
# Code for Logistic Regression Classification

# Your code here

### 1.1 What is the difference between bias and variance?
- **Bias** refers to the error introduced by approximating a complex problem using a simpler model. High bias indicates that the model is too simple and cannot capture the underlying patterns of the data, leading to underfitting.
- **Variance** measures how much the model's predictions change when using different training data. High variance indicates that the model is too complex and captures noise in the data, leading to overfitting.

#### Practical Implications
- A high-bias model, such as a linear regression model for non-linear data, will result in low accuracy on both training and test data.
- A high-variance model, such as a deep neural network with insufficient data, will have high accuracy on the training data but low accuracy on the test data.

#### Techniques to Address Bias-Variance Tradeoff
1. Use regularization techniques like L1 and L2 penalties to reduce variance.
2. Increase model complexity or add more features to reduce bias.
3. Employ cross-validation techniques to identify the best model complexity.
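
The tradeoff can be made concrete with a small experiment. The sketch below fits polynomials of increasing degree to noisy samples of a sine curve; the target function, noise level, sample sizes, and degrees are all assumptions picked for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of sin(2*pi*x) on [0, 1] (toy data for illustration)."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

# degree -> (training MSE, test MSE)
results = {}
for degree in (1, 3, 12):
    poly = Polynomial.fit(x_train, y_train, degree)
    train_mse = np.mean((poly(x_train) - y_train) ** 2)
    test_mse = np.mean((poly(x_test) - y_test) ** 2)
    results[degree] = (train_mse, test_mse)

for degree, (train_mse, test_mse) in results.items():
    print(f"degree={degree}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```

The degree-1 model is the high-bias case: it has a large error on both splits. Training error keeps falling as the degree grows, while a growing gap between training and test error is the usual sign of high variance.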


### 1.2 Explain how gradient descent works.
**Gradient Descent** is an optimization algorithm that minimizes a function by iteratively adjusting the model parameters in the direction of steepest descent, which is the negative of the gradient. Each update has the form $\theta \leftarrow \theta - \eta \nabla J(\theta)$, where the learning rate $\eta$ controls the step size.

#### Types of Gradient Descent
1. **Batch Gradient Descent**: Uses the entire dataset to compute the gradient. It is computationally expensive for large datasets but provides a stable convergence path.
2. **Stochastic Gradient Descent (SGD)**: Uses one data point at a time to compute the gradient. Each update is cheap to compute, but the updates are noisy.
3. **Mini-Batch Gradient Descent**: Uses a small batch of data points to compute the gradient. It strikes a balance between the stability of batch gradient descent and the efficiency of SGD.
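
The three types differ only in how many samples feed each gradient estimate, which the sketch below exposes as a `batch_size` argument. It minimizes mean squared error for a linear model on synthetic data; the function name, the data, and the hyperparameters are assumptions made for illustration.

```python
import numpy as np

def gradient_descent(X, y, batch_size, learning_rate=0.05, n_epochs=100, seed=0):
    """Fit a linear model by minimizing mean squared error with batched updates."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    weights, bias = np.zeros(n_features), 0.0
    for _ in range(n_epochs):
        order = rng.permutation(n_samples)  # reshuffle the data every epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            error = X[idx] @ weights + bias - y[idx]
            # Gradient of the batch MSE with respect to weights and bias
            grad_w = 2 * (X[idx].T @ error) / len(idx)
            grad_b = 2 * error.mean()
            weights -= learning_rate * grad_w
            bias -= learning_rate * grad_b
    return weights, bias

# Synthetic data with known parameters (assumed values)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
true_weights, true_bias = np.array([3.0, -2.0]), 0.5
y = X @ true_weights + true_bias + rng.normal(scale=0.01, size=200)

fits = {
    "batch": gradient_descent(X, y, batch_size=len(X)),
    "mini-batch": gradient_descent(X, y, batch_size=32),
    "sgd": gradient_descent(X, y, batch_size=1),
}
for name, (w, b) in fits.items():
    print(name, np.round(w, 3), round(float(b), 3))
```

With `batch_size=len(X)` every epoch performs a single update; with `batch_size=1` it performs one update per sample.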

#### Gradient Descent Variants
- **Momentum**: Helps accelerate SGD in relevant directions by adding a fraction of the previous update to the current update.
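- **Adam**: Combines the advantages of Momentum and RMSProp by tracking exponentially decaying averages of past gradients (as in Momentum) and of past squared gradients (as in RMSProp), which gives each parameter its own adaptive step size.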
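
The two update rules can be sketched on a toy quadratic loss. Everything specific here is an assumption for illustration: the loss $f(\theta) = \tfrac{1}{2}\theta^\top A\theta$, the starting point, the step counts, and the hyperparameters (the Adam values for `beta1`, `beta2`, and `eps` are the commonly used defaults).

```python
import numpy as np

# Toy loss f(theta) = 0.5 * theta^T A theta with different curvature per axis
A = np.diag([1.0, 10.0])

def loss(theta):
    return 0.5 * theta @ A @ theta

def grad(theta):
    return A @ theta

def run_momentum(theta0, learning_rate=0.05, beta=0.9, n_steps=300):
    theta = theta0.copy()
    update = np.zeros_like(theta)
    for _ in range(n_steps):
        # Carry over a fraction of the previous update, then add the new step
        update = beta * update - learning_rate * grad(theta)
        theta += update
    return theta

def run_adam(theta0, learning_rate=0.1, beta1=0.9, beta2=0.999,
             eps=1e-8, n_steps=500):
    theta = theta0.copy()
    m = np.zeros_like(theta)  # running average of gradients
    v = np.zeros_like(theta)  # running average of squared gradients
    for t in range(1, n_steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        # Correct the bias caused by initializing m and v at zero
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        theta -= learning_rate * m_hat / (np.sqrt(v_hat) + eps)
    return theta

theta0 = np.array([2.0, -1.0])
theta_momentum = run_momentum(theta0)
theta_adam = run_adam(theta0)
print("momentum loss:", loss(theta_momentum))
print("adam loss:", loss(theta_adam))
```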

## 2. Programming Challenges

### 2.1 Implement Logistic Regression from Scratch
Implement a logistic regression model using only NumPy. This exercise tests your understanding of the mathematics behind logistic regression and your ability to translate that into code.

#### Mathematical Background
The logistic regression model is defined as:

$$h(z) = \frac{1}{1 + e^{-z}}$$

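Where:
- $z = w^\top x + b$ is the linear combination of the input features $x$ and the weights $w$, plus a bias term $b$.
- The logistic function $h(z)$ maps any real-valued number into the open interval (0, 1), so its output can be read as a probability.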

The model is trained using the **cross-entropy loss** function, which measures the difference between the predicted probability and the actual label.
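
For reference, the loss and its gradients can be written out as below. The symbols are notation introduced here rather than taken from the text above: $n$ is the number of samples, $x_i$ and $y_i \in \{0, 1\}$ are the features and label of sample $i$, $X$ is the matrix whose rows are the $x_i$, and $\hat{y}_i = h(w^\top x_i + b)$ is the predicted probability.

$$J(w, b) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]$$

$$\frac{\partial J}{\partial w} = \frac{1}{n} X^\top (\hat{y} - y), \qquad \frac{\partial J}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)$$

These gradients correspond to the `dw` and `db` terms computed in the `fit` method of the implementation that follows.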


In [None]:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Logistic Regression model
class LogisticRegression:
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        # Initialize parameters
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        # Gradient Descent
        for _ in range(self.n_iterations):
            # Linear model
            linear_model = np.dot(X, self.weights) + self.bias
            # Sigmoid function
            y_predicted = sigmoid(linear_model)

            # Compute gradients
            dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))
            db = (1 / n_samples) * np.sum(y_predicted - y)

            # Update parameters
            self.weights -= self.learning_rate * dw
            self.bias -= self.learning_rate * db

    def predict(self, X):
        # Predicted probability of the positive class, thresholded at 0.5
        linear_model = np.dot(X, self.weights) + self.bias
        y_predicted = sigmoid(linear_model)
        return (y_predicted > 0.5).astype(int)

# Create a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Logistic Regression model
model = LogisticRegression(learning_rate=0.01, n_iterations=1000)
model.fit(X_train, y_train)

# Evaluate the model
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)

print(f"Accuracy: {accuracy * 100:.2f}%")
