### Question 1

A Decision Tree is a supervised machine learning algorithm that is used for classification and regression tasks. It works by splitting the data into subsets based on the value of input features, creating a tree-like structure of decisions. In classification, it assigns a class label to each leaf node based on the majority class of the training samples that reach that node. The model starts from the root node and recursively splits the dataset into child nodes until a stopping criterion is met (e.g., all samples belong to the same class, maximum depth reached, etc.).
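
As a rough illustration of how prediction follows a path from the root to a leaf, here is a minimal sketch with a hand-written tree. The feature names, thresholds, and the `toy_tree` / `predict_one` names are hypothetical and chosen only for illustration; a real tree learns its splits from training data.

```python
# A hand-written toy tree (hypothetical thresholds, not learned from data).
# Internal nodes test one feature against a threshold; leaves hold a class label.
toy_tree = {
    "feature": "petal_length", "threshold": 2.5,
    "left": {"label": "setosa"},                  # petal_length <= 2.5
    "right": {
        "feature": "petal_width", "threshold": 1.7,
        "left": {"label": "versicolor"},          # petal_width <= 1.7
        "right": {"label": "virginica"},          # petal_width > 1.7
    },
}


def predict_one(node, sample):
    """Walk from the root to a leaf, following one branch per test."""
    while "label" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


print(predict_one(toy_tree, {"petal_length": 1.4, "petal_width": 0.2}))  # setosa
print(predict_one(toy_tree, {"petal_length": 5.1, "petal_width": 2.0}))  # virginica
```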

### Question 2

Gini Impurity and Entropy are two impurity measures used to determine how a decision tree splits data at each node.

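- Gini Impurity: Measures the probability that a randomly chosen sample would be misclassified if it were labeled at random according to the class distribution in the node. A pure node has a Gini of 0; lower values mean purer nodes.
  Formula: Gini = 1 - Σ(p_i)^2, where p_i is the proportion of class i in the node

- Entropy: Measures the uncertainty of the class distribution in a node. A pure node has zero entropy; for two classes, the maximum is 1 bit at a 50/50 split.
  Formula: Entropy = - Σ(p_i * log2(p_i))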

Both are used to evaluate candidate splits: the algorithm chooses the split whose child nodes have the lowest weighted impurity (equivalently, the highest Information Gain when entropy is the measure).
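
The two formulas can be checked by hand on a small list of labels. The following is a minimal sketch in plain Python; the helper names `class_proportions`, `gini`, and `entropy` are made up for this example and are not part of scikit-learn.

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Return the proportion p_i of each class present in `labels`."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    """Gini = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Entropy = -sum(p_i * log2(p_i)), in bits."""
    return sum(-p * log2(p) for p in class_proportions(labels))


mixed = ["a", "a", "b", "b"]   # 50/50 split: maximally impure for two classes
pure = ["a", "a", "a", "a"]    # single class: perfectly pure

print(gini(mixed), entropy(mixed))  # 0.5 1.0
print(gini(pure), entropy(pure))    # 0.0 0.0
```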

### Question 3

Pre-Pruning stops tree growth early, before the tree perfectly classifies the training set. Typical criteria are a maximum depth, a minimum number of samples per node, or a minimum impurity decrease.
Post-Pruning grows the full tree first and then removes branches that contribute little to classification performance, for example with cost-complexity pruning.

Advantage of Pre-Pruning: Training is cheaper because the tree is never fully grown, and overfitting is limited from the start.
Advantage of Post-Pruning: Pruning decisions are made with the full tree in view, so a split that only pays off further down is not discarded prematurely.
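
The two strategies can be compared in scikit-learn roughly as follows. This is a sketch, not a tuned model: the `max_depth`, `min_samples_leaf`, and mid-range `ccp_alpha` values are arbitrary assumptions, and in practice the pruning strength would be chosen by cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Pre-pruning: constrain growth up front with stopping criteria.
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=42)
pre.fit(X_tr, y_tr)

# Post-pruning: grow the full tree, then prune it back with cost-complexity pruning.
full = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)
path = full.cost_complexity_pruning_path(X_tr, y_tr)
# Arbitrary choice for illustration: take an alpha from the middle of the path.
alpha = max(path.ccp_alphas[len(path.ccp_alphas) // 2], 0.0)
post = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42).fit(X_tr, y_tr)

for name, model in [("full", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print(f"{name}: leaves={model.get_n_leaves()}, "
          f"depth={model.get_depth()}, test accuracy={model.score(X_te, y_te):.3f}")
```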

### Question 4

Information Gain measures the reduction in entropy after a dataset is split on an attribute. It is used to select the best attribute for a node in the decision tree.
Formula: Information Gain = Entropy(parent) - Σ_k (n_k / n) * Entropy(child_k), where n_k is the number of samples in child k and n is the number of samples in the parent

It is important because, at each node, the tree picks the feature (and threshold) with the highest Information Gain. This produces purer child nodes and usually a shallower tree.
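
A small worked example may help: a split that separates the classes perfectly recovers all of the parent's entropy, while a split that leaves each child as mixed as the parent gains nothing. The helper names below are illustrative only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]

# Perfect split: each child is pure, so the full 1 bit of entropy is removed.
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0

# Uninformative split: each child is as mixed as the parent.
print(information_gain(parent, [["yes", "no"], ["yes", "no"]]))  # 0.0
```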

### Question 5

Applications:
- Medical Diagnosis
- Credit Risk Assessment
- Marketing and Sales Predictions
- Fraud Detection
- Customer Churn Prediction

Advantages:
- Easy to interpret and visualize
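- Requires little data preprocessing (no feature scaling or normalization needed)
- Handles both numerical and categorical data in principle (scikit-learn's implementation requires categorical features to be encoded numerically)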

Limitations:
- Prone to overfitting
- Can be unstable with small changes in data
- Split criteria such as Information Gain are biased toward categorical features with many levels

### Question 6

In [None]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

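# Load the Iris dataset (150 samples, 4 features, 3 classes)
data = load_iris()
X, y = data.data, data.target

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
clf = DecisionTreeClassifier(criterion='gini', random_state=42)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print("Accuracy:", accuracy)

# Pair each importance with its feature name so the output is readable
print("Feature Importances:")
for name, importance in zip(data.feature_names, clf.feature_importances_):
    print(f"  {name}: {importance:.3f}")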

### Question 7

In [None]:
# Unconstrained tree: grows until leaves are pure or cannot be split further
clf_full = DecisionTreeClassifier(random_state=42)
clf_full.fit(X_train, y_train)
acc_full = accuracy_score(y_test, clf_full.predict(X_test))

# Pre-pruned tree: growth stops at depth 3
clf_depth3 = DecisionTreeClassifier(max_depth=3, random_state=42)
clf_depth3.fit(X_train, y_train)
acc_depth3 = accuracy_score(y_test, clf_depth3.predict(X_test))

print("Full Tree Accuracy:", acc_full)
print("Max Depth=3 Accuracy:", acc_depth3)

### Question 8

In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

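# California Housing: 8 numeric features; the target is median house value in units of $100,000
# Note: this cell reuses the names X, y, X_train, ... from the Iris cells above
data = fetch_california_housing()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
reg = DecisionTreeRegressor(random_state=42)
reg.fit(X_train, y_train)

mse = mean_squared_error(y_test, reg.predict(X_test))
print("MSE:", mse)

# Pair each importance with its feature name so the output is readable
print("Feature Importances:")
for name, importance in zip(data.feature_names, reg.feature_importances_):
    print(f"  {name}: {importance:.3f}")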

### Question 9

In [None]:
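from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Reload Iris: X_train/y_train were overwritten by the housing data in Question 8,
# and a classifier cannot be fit on that continuous target
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

params = {
    'max_depth': [2, 3, 4, 5, None],
    'min_samples_split': [2, 4, 6]
}
# 5-fold cross-validation over every parameter combination
grid = GridSearchCV(DecisionTreeClassifier(random_state=42), params, cv=5)
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
print("Best Score:", grid.best_score_)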

### Question 10

Step-by-step process:

1. Handle Missing Values:
   - Use imputation (mean, median, mode) or model-based methods.

2. Encode Categorical Features:
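   - Use One-Hot Encoding for nominal features and ordinal encoding for ordered ones. In scikit-learn, LabelEncoder is intended for target labels; OrdinalEncoder is the feature-side equivalent.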

3. Train Decision Tree Model:
   - Use scikit-learn’s DecisionTreeClassifier.

4. Hyperparameter Tuning:
   - Use GridSearchCV to tune parameters like max_depth, min_samples_split.

5. Evaluate Performance:
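   - Use metrics like accuracy, precision, recall, F1-score, confusion matrix. With imbalanced classes, which are common in diagnosis, accuracy alone can be misleading, so check recall on the positive class.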
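
Steps 1 to 5 above can be sketched as a single scikit-learn pipeline. This is a minimal illustration under stated assumptions: the data is synthetic, the column names (`age`, `blood_pressure`, `smoker`) and the label rule are hypothetical, and `scoring="recall"` is one possible choice rather than a requirement.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for patient records; columns and label rule are hypothetical.
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "age": rng.integers(20, 80, n).astype(float),
    "blood_pressure": rng.normal(120, 15, n),
    "smoker": rng.choice(["yes", "no"], n),
})
y = ((df["age"] > 50) & (df["smoker"] == "yes")).astype(int)

# Inject missing values so the imputation step has something to do.
df.loc[rng.choice(n, 30, replace=False), "age"] = np.nan
df.loc[rng.choice(n, 30, replace=False), "smoker"] = np.nan

numeric = ["age", "blood_pressure"]
categorical = ["smoker"]

# Steps 1 and 2: impute missing values, then one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

# Step 3: the decision tree sits at the end of the pipeline.
model = Pipeline([
    ("prep", preprocess),
    ("tree", DecisionTreeClassifier(random_state=42)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.3, stratify=y, random_state=42)

# Step 4: tune the tree inside the pipeline so preprocessing is refit per fold.
grid = GridSearchCV(
    model,
    {"tree__max_depth": [2, 3, 4, 5, None], "tree__min_samples_split": [2, 4, 6]},
    cv=5,
    scoring="recall",
)
grid.fit(X_train, y_train)

# Step 5: report precision, recall, and F1 on the held-out test set.
print("Best parameters:", grid.best_params_)
print(classification_report(y_test, grid.predict(X_test)))
```

Keeping imputation and encoding inside the pipeline means they are fit only on the training folds during cross-validation, which avoids leaking information from the validation data.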

Business Value:
This model can help in early diagnosis, reduce human error, and optimize resources, potentially saving costs and improving patient outcomes.