# Interview Questions
# Q1.

## Common Hyperparameters of Decision Tree Models

Decision trees are powerful machine learning models, and their performance depends heavily on the choice of hyperparameters. Below are the most common hyperparameters (named as in scikit-learn's `DecisionTreeClassifier` and `DecisionTreeRegressor`) and their impact on the model's performance:

### Common Hyperparameters

- **`max_depth`**  
  Maximum depth of the tree. Restricting the depth helps prevent overfitting by simplifying the model, but setting it too low may cause underfitting.

- **`min_samples_split`**  
  Minimum number of samples required to split an internal node. Higher values help prevent overfitting by stopping the tree from splitting nodes that contain only a few samples.

- **`min_samples_leaf`**  
  Minimum number of samples required in each leaf node. Larger values smooth the model (each prediction is based on more samples) and help reduce overfitting.

- **`criterion`**  
  The function used to measure the quality of a split (e.g., `gini` for Gini impurity or `entropy` for information gain). It determines which feature and threshold the tree chooses at each node.

- **`max_features`**  
  Maximum number of features considered when searching for the best split. Considering fewer features adds randomness and can reduce overfitting, but it may produce a less expressive model.

- **`max_leaf_nodes`**  
  Limits the number of leaf nodes in the tree. This helps control the complexity and reduce overfitting.

- **`splitter`**  
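  The strategy used to choose the split at each node: `best` picks the best split among all candidates, while `random` picks the best of a set of randomly drawn splits. `random` is faster to train and adds randomness, but its individual splits are usually less informative, which can reduce accuracy.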
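A minimal sketch of setting these hyperparameters with scikit-learn's `DecisionTreeClassifier`, assuming scikit-learn 0.21 or later (for `get_depth()` and `get_n_leaves()`). The synthetic dataset from `make_classification` and the specific values below are illustrative assumptions, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, slightly noisy data (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Unconstrained tree: keeps splitting until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=0)

# Constrained tree: arbitrary illustrative values, not tuned.
small_tree = DecisionTreeClassifier(
    criterion="gini",        # or "entropy"
    splitter="best",         # or "random"
    max_depth=4,             # at most 4 levels of splits
    min_samples_split=10,    # a node needs at least 10 samples to be split
    min_samples_leaf=5,      # every leaf keeps at least 5 samples
    max_features=None,       # consider all features at each split
    max_leaf_nodes=10,       # at most 10 leaves
    random_state=0,
)

for name, model in [("unconstrained", full_tree), ("constrained", small_tree)]:
    model.fit(X_train, y_train)
    print(
        f"{name:>13}: depth={model.get_depth()}, leaves={model.get_n_leaves()}, "
        f"train acc={model.score(X_train, y_train):.3f}, "
        f"test acc={model.score(X_test, y_test):.3f}"
    )
```

Here the unconstrained tree fits the training split perfectly, while the constrained tree trades some training accuracy for a much smaller model; comparing the test accuracies shows which one generalizes better on a given dataset.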

### Effect on Model's Performance

- **Overfitting vs. Underfitting**  
  Increasing `max_depth` or lowering `min_samples_split` allows for deeper and more complex trees, which can capture more intricate patterns but risks overfitting.  
  Tightening constraints (a lower `max_depth` or `max_leaf_nodes`, a higher `min_samples_leaf`) reduces overfitting but might result in underfitting if the tree becomes too simple.

- **Choice of Criterion**  
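  The `criterion` changes how candidate splits are scored. In practice, `gini` and `entropy` usually produce very similar trees; `gini` is slightly cheaper to compute because it avoids logarithms, so differences in accuracy tend to be small and dataset-dependent.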
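The two criteria can be written out directly. The sketch below is a plain-Python illustration, not scikit-learn's implementation, and the helper names `gini`, `entropy`, and `impurity_decrease` are hypothetical. Gini impurity is `1 - sum(p_k^2)` and entropy is `-sum(p_k * log2(p_k))`, where `p_k` is the fraction of a node's samples in class `k`; a split is scored by how much it lowers the size-weighted impurity of the children relative to the parent.

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def impurity_decrease(parent, left, right, impurity=gini):
    """Parent impurity minus the size-weighted impurity of the two children.
    With impurity=entropy this is the information gain of the split."""
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted

parent = ["a", "a", "a", "b", "b", "b"]
left, right = ["a", "a", "a", "b"], ["b", "b"]
print(gini(parent), entropy(parent))                    # 0.5 1.0
print(impurity_decrease(parent, left, right))           # Gini decrease of this split
print(impurity_decrease(parent, left, right, entropy))  # information gain of this split
```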

---
# Q2.
## Label Encoding vs. One-Hot Encoding

Both **Label Encoding** and **One-Hot Encoding** are techniques used to convert categorical data into numerical form. However, each technique has its own use case and potential downsides.

### Comparison Table

| **Aspect**             | **Label Encoding**                                          | **One-Hot Encoding**                                      |
|------------------------|-------------------------------------------------------------|-----------------------------------------------------------|
| **Definition**         | Converts categorical labels into integer values.            | Converts each category into a binary vector with one active bit. |
| **Output Format**      | Integer values (e.g., [0, 1, 2, ...]).                      | Binary vectors (e.g., [1, 0, 0], [0, 1, 0]).               |
| **Use Case**           | Suitable for ordinal data where order matters.              | Used for nominal data where no order exists.               |
| **Disadvantage**       | May introduce unintended ordinal relationships.             | Can create a large number of columns for high cardinality.  |
| **Implementation**     | `LabelEncoder` (intended for target labels) or `OrdinalEncoder` (for features) from sklearn. | `OneHotEncoder` from sklearn or `pandas.get_dummies()`.   |

### Example

#### Label Encoding

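Given categories: `["Red", "Blue", "Green"]`  
Encoded (assigning codes in order of appearance) as: `[0, 1, 2]`. Note that sklearn's `LabelEncoder` sorts the categories alphabetically, so it would produce `[2, 0, 1]` (Blue = 0, Green = 1, Red = 2).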
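A minimal plain-Python sketch of label encoding; `label_encode` is a hypothetical helper, not a library function. When an explicit category order is given, the codes follow it; otherwise the sorted unique values are used, which mirrors the convention of sklearn's `LabelEncoder`.

```python
def label_encode(values, categories=None):
    """Map each category to an integer code.

    Codes follow `categories` if given, else the sorted unique values.
    """
    if categories is None:
        categories = sorted(set(values))
    mapping = {cat: code for code, cat in enumerate(categories)}
    return [mapping[v] for v in values]

colors = ["Red", "Blue", "Green"]
print(label_encode(colors, categories=colors))  # [0, 1, 2]  (order of appearance)
print(label_encode(colors))                     # [2, 0, 1]  (alphabetical: Blue, Green, Red)
```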

#### One-Hot Encoding

Given categories: `["Red", "Blue", "Green"]`  
Encoded as (columns ordered Red, Blue, Green):

- **Red**: `[1, 0, 0]`  
- **Blue**: `[0, 1, 0]`  
- **Green**: `[0, 0, 1]`
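The same one-hot vectors can be built by hand. This is a sketch with a hypothetical helper `one_hot_encode`, assuming the category list (and therefore the column order) is fixed up front:

```python
def one_hot_encode(values, categories):
    """Return one binary vector per value, with a 1 at its category's position."""
    index = {cat: i for i, cat in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1  # exactly one active bit per row
        vectors.append(vec)
    return vectors

categories = ["Red", "Blue", "Green"]
for color, vec in zip(categories, one_hot_encode(categories, categories)):
    print(color, vec)
# Red [1, 0, 0]
# Blue [0, 1, 0]
# Green [0, 0, 1]
```

Each new category adds one column, which is why high-cardinality features produce very wide outputs.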

### Key Considerations

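- **Label Encoding** is suitable when the feature has a natural order (e.g., `"Low"`, `"Medium"`, `"High"`), provided the integer codes follow that order; alphabetical sorting would give `High = 0, Low = 1, Medium = 2`.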
- **One-Hot Encoding** is ideal for nominal features with no inherent order (e.g., `"Dog"`, `"Cat"`, `"Bird"`).
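A sketch of both cases with scikit-learn, assuming a recent version in which `OneHotEncoder` returns a sparse matrix by default (hence `.toarray()`). For ordinal features, `OrdinalEncoder` accepts an explicit category order, whereas `LabelEncoder` is documented for encoding target labels `y`. The toy arrays are illustrative.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Ordinal feature: pass the order explicitly so that Low < Medium < High.
priority = np.array([["Low"], ["High"], ["Medium"], ["Low"]])
ordinal = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
print(ordinal.fit_transform(priority).ravel())  # [0. 2. 1. 0.]

# Nominal feature: one column per category, no implied order.
animals = np.array([["Dog"], ["Cat"], ["Bird"], ["Dog"]])
onehot = OneHotEncoder(handle_unknown="ignore")  # unseen categories encode as all zeros
print(onehot.fit_transform(animals).toarray())
print(onehot.categories_[0].tolist())  # ['Bird', 'Cat', 'Dog'] (sorted column order)
```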
