**Instructions:**

- For questions that require coding, you need to write the relevant code and display its output. Your output should either be the direct answer to the question or clearly display the answer in it.
- For questions that require a written answer (sometimes along with the code), you need to put your answer in a Markdown cell. Writing the answer as a comment or as a print line is not acceptable.
- You need to render this file as HTML using Quarto and submit the HTML file. **Please note that this is a requirement and not optional.** A submission cannot be graded until it is properly rendered.

Import all the libraries and tools you need below.

In [None]:
import warnings
warnings.filterwarnings("ignore")

# Run the line below to install the xgboost library. It is not in Anaconda by default.
!pip install xgboost



In this assignment, you will use the data from the **cirrhosis_outcomes.csv** file. Each observation is a patient with liver cirrhosis. 

- The `Status` variable represents the survival state of the patient at `N_Days`: `C` for censored (alive), `D` for death and `CL` for censored (alive) with liver transplant.
- All other variables are medical predictors, either about the treatment or the patient.

## 1) Preprocessing (15 points)

### a)

Read the data. Use `index_col=0` to assign the `id` variable to the index; it should not be a predictor. **(2 points)**

### b)

`Status` will be the response (target) variable for the classification task. Print the `value_counts` of the classes. Are the classes balanced? Which one is the minority class? **(5 points)**

### c)

`map` the class labels to 0, 1 and 2. This is necessary because some models that are included do not recognize non-numeric input. **(2 points)**
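
As a minimal sketch of how `Series.map` relabels classes with a dictionary, consider the toy series below. The particular integer assigned to each label is an assumption made for illustration; the assignment does not prescribe an order.

```python
import pandas as pd

# Toy labels only; the label-to-integer assignment below is an illustrative assumption.
status = pd.Series(["C", "D", "CL", "C"])
label_map = {"C": 0, "CL": 1, "D": 2}

# map() looks each value up in the dictionary; labels missing from it would become NaN.
encoded = status.map(label_map)
print(encoded.tolist())
```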

### d)

- Separate the response and the predictors. All variables other than `Status` should be a predictor.
- One-hot-encode the categorical predictors. (This can and should be done with one function in one line.) Use `drop_first=True`.
- Create the training and test data with an 80%-20% split. **Stratify the data.** Use `random_state=42`. 
- Scale the training and the test data.

**(6 points)**
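
The steps above can be sketched on a small made-up table. The column names `age`, `sex` and `label` are hypothetical and are not taken from the assignment file, and `dtype=float` is an optional choice made here so that the encoded columns are numeric. Note that the scaler is fitted on the training data only and then applied to both sets.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; column names are illustrative, not from the assignment file.
toy = pd.DataFrame({
    "age": [50, 61, 47, 58, 39, 66, 52, 44, 70, 55],
    "sex": ["F", "M", "F", "F", "M", "F", "M", "F", "M", "F"],
    "label": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
})

# Separate the response from the predictors.
y = toy["label"]
# One call encodes every categorical column; drop_first avoids a redundant dummy.
X = pd.get_dummies(toy.drop(columns="label"), drop_first=True, dtype=float)

# stratify=y keeps the class proportions similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training data only, then reuse it for the test data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X.columns.tolist())
print(X_train_scaled.shape, X_test_scaled.shape)
```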

## 2) Tuning and Evaluating Different Multi-Class Classifiers (40 points)

### a)

Create four models with the specified inputs:

- A [Logistic Regression](https://scikit-learn.org/1.5/modules/generated/sklearn.linear_model.LogisticRegression.html) model: Use `multi_class = 'ovr'`, `solver = 'liblinear'` and `random_state=1`.
- A [Linear SVC](https://scikit-learn.org/dev/modules/generated/sklearn.svm.LinearSVC.html): Use the `LinearSVC` object for efficiency reasons. Use `multi_class = 'ovr'` and `random_state=1`.
- A [KNN (K-Nearest Neighbors)](https://scikit-learn.org/1.5/modules/generated/sklearn.neighbors.KNeighborsClassifier.html) classifier: Do not use any inputs.
- An [XGBoost](https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn) classifier: Use `random_state=1`. Do not use any other inputs.

**(10 points)**

### b)

Note that the links to the model documentation are given in Part a. Using the documentation, answer the following questions:

- Do you see a `multi_class` input option for the models that did not take any such input in Part a? Why is that the case? (Only consider the scikit-learn API for XGBoost and disregard the experimental/work-in-progress inputs; they are not fully developed yet.)
- Among the models that took a `multi_class` input, `ovr` is an option along with some other algorithms. Is **OvO** (One vs One) one of the options? Why do you think this is the case?

**(10 points)**

### c)

Using the given hyperparameter grids and the following specifications, tune and evaluate each model:

- Use `cv=5`. When the estimator is a classifier, an integer `cv` in `GridSearchCV` uses stratified folds by default. (The previous in-class assignment required explicit cross-validation objects only to get everyone familiar with their usage.)
- Use `f1_macro` for scoring. The F1-score of a class is calculated as: $$F_1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$ The macro F1-score is the unweighted mean of the per-class F1-scores, so every class counts equally regardless of its size. It is a good metric to use if you want to tune your model with both precision and recall.
- Print the cross-validation performance of the best model (`best_score_`).
- Print the `confusion_matrix` and the `classification_report` for the test data.
- Print the **micro** recall score for the test data.

**(20 points)**
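
To make the averaging options concrete, here is a small sketch with made-up labels (not taken from the assignment data). In scikit-learn, `average="macro"` computes the F1-score of each class and takes their unweighted mean, which generally differs from plugging the macro precision and macro recall into the F1 formula. With `average="micro"`, recall pools the counts of all classes and, in single-label multi-class problems, equals accuracy.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 0, 2]

# One F1-score per class, then their unweighted mean.
per_class_f1 = f1_score(y_true, y_pred, average=None)
macro_f1 = f1_score(y_true, y_pred, average="macro")

# For comparison: F1 computed from the macro precision and macro recall.
macro_p = precision_score(y_true, y_pred, average="macro")
macro_r = recall_score(y_true, y_pred, average="macro")
f1_of_macro_pr = 2 * macro_p * macro_r / (macro_p + macro_r)

# Micro recall pools true positives and false negatives over all classes.
micro_recall = recall_score(y_true, y_pred, average="micro")

print("per-class F1:", per_class_f1)
print("macro F1:", macro_f1)
print("F1 from macro precision/recall:", f1_of_macro_pr)
print("micro recall:", micro_recall)
```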

In [None]:
grid_lr = {
    'penalty': [None, 'l1', 'l2', 'elasticnet'],
    'l1_ratio': [0, 0.3, 0.6, 1],
    'C': [0.01,0.1,1,10,100]
}

In [None]:
grid_svm = {
    'C': [0.01, 0.1, 1, 10, 100]
}

In [None]:
# Requires `import numpy as np` in the import cell at the top.
param_grid = {
    'n_neighbors': np.arange(1,25,2)
}

In [None]:
param_grid_xgb = {
    'n_estimators': [100, 200, 300],
    'max_depth': [3, 4, 5],
    'learning_rate': [0.1, 0.01, 0.001]
}
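
A grid like the ones above is typically passed to `GridSearchCV` along the following lines. This is only a sketch on synthetic data from `make_classification`; the data set, the shortened grid and the variable names are assumptions for illustration and do not reproduce the assignment's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic three-class data standing in for the real predictors and response.
X_demo, y_demo = make_classification(
    n_samples=150, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)

# A shortened grid to keep the sketch fast.
demo_grid = {"n_neighbors": np.arange(1, 10, 2)}

# cv=5 uses stratified folds for classifiers; scoring selects the macro F1-score.
search = GridSearchCV(KNeighborsClassifier(), demo_grid, cv=5, scoring="f1_macro")
search.fit(X_demo, y_demo)

print(search.best_params_)
print(search.best_score_)
```

After fitting, `search.best_estimator_` is refitted on all of the data passed to `fit` and can be used to predict on held-out data.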

## 3) Interpretation (45 points)

Using the prediction results of all four models, answer the following questions. **You need to justify your answers with the corresponding results for credit.**

### a)

In this classification task, what is the random baseline accuracy that the accuracy values would be compared against? **(5 points)**

### b)

How do the linear models handle the minority class? What do the False Negatives (FNs) and False Positives (FPs) of the minority class indicate about the linear models' capacity to handle the minority class? **(10 points)**

### c)

Is there a considerable difference between the micro and macro recall scores for all models? Why or why not? **(10 points)**

### d)

Compare the test accuracies of the linear models with that of the KNN classifier. Which is higher? Is accuracy a useful metric to evaluate the model performance in this case, especially regarding the minority class? Why or why not? **(10 points)**

### e)

Which model performs the best overall? How does its performance still vary with the support (number of observations) of each class? What do you think can be done to overcome this persistent issue? (You will explore some options in this regard in Homework Assignment 2.) **(10 points)**