# Instructions

1. This assignment is worth 5% of the final grade.
2. For each question below, insert a cell (code or markdown, as appropriate) and fill in your answer there.
3. You are required to work on this individually. Any form of plagiarism will result in a grade of 0.
4. Please submit your notebook file (name it `IND5003_A2_<Your_Name>.ipynb`) through Canvas before **17th Nov 2023 23:59hrs**.

# Question 1

In Python, you can save a binary version of an object by pickling it. The file `IND5003_2310_Assignment2.pickle` contains the training and test data that we used in class for the KNN model. The pickle was created in this way:
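
The cell that created the pickle is not reproduced in this copy. As a minimal sketch, assuming the four objects were dumped as a single tuple in the same order in which they are unpacked when loading, the round trip might look like the following. The stand-in values and the temporary file path are hypothetical, chosen only so that the example runs on its own.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in objects (assumption): in the lecture these would be the transformed
# feature matrices and the label vectors, not small Python lists.
X_ttrain, X_ttest = [[0.1, 1.0], [0.3, 0.0]], [[0.2, 1.0]]
y_train, y_test = [0, 1], [1]

# Hypothetical path inside a temporary directory, not the assignment file.
pickle_path = Path(tempfile.mkdtemp()) / 'demo.pickle'

# Dump all four objects as one tuple, opening the file in binary write mode.
with open(pickle_path, 'wb') as f:
    pickle.dump((X_ttrain, X_ttest, y_train, y_test), f)

# Loading returns the same tuple, which can be unpacked in the same order.
with open(pickle_path, 'rb') as f:
    restored = pickle.load(f)
```

Storing the objects in one tuple keeps them together in a single file, so one `pickle.load` call restores all of them.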

To read pickled objects into your notebook, you can do this:

In [1]:
import pickle

with open('../data/IND5003_2310_Assignment2.pickle', 'rb') as f:
    X_ttrain, X_ttest, y_train, y_test = pickle.load(f)

If this does not work for you, run the following cell to obtain `X_ttrain`, `X_ttest`, `y_train`, `y_test`.

In [None]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn import set_config

set_config(display='diagram')

# Load the loans data and derive the issue year and month from the issue date.
loans = pd.read_excel('../data/loans.xlsx', index_col=0)
loans.loc[:, 'issue_yr'] = loans.issue_d.apply(lambda x: x.year)
loans.loc[:, 'issue_mth'] = loans.issue_d.apply(lambda x: x.month)

# Drop columns with fewer than 40000 non-missing values, then keep only complete rows.
drop_these_columns = loans.apply(lambda x: np.sum(pd.notna(x)), axis=0) < 40000
loans.drop(columns=loans.columns[drop_these_columns], inplace=True)
no_miss = loans[pd.notna(loans).all(axis=1)].copy()

# Split earliest_cr_line on '-' into separate month and year columns.
cr_line_cols = no_miss.earliest_cr_line.str.split('-', expand=True)
cr_line_cols.columns = ['ecrl_mth', 'ecrl_yr']
cr_line_cols.ecrl_yr = cr_line_cols.ecrl_yr.astype(int)
no_miss = pd.concat([no_miss, cr_line_cols], axis=1)

# Stratified 70/30 train/test split on the response y.
y = no_miss.y
X_train, X_test, y_train, y_test = train_test_split(no_miss, y, test_size=0.3, random_state=41, stratify=y)

# Keep only the numeric and categorical features used for modelling.
num_features = ['loan_amnt', 'int_rate', 'installment', 'total_pymnt', 'total_pymnt_inv', 'total_rec_prncp', 'issue_yr']
cat_features = ['term', 'grade', 'emp_length', 'home_ownership', 'loan_status', 'purpose', 'addr_state']
all_features = num_features + cat_features
X_train = X_train.loc[:, all_features]
X_test = X_test.loc[:, all_features]

# Standardize numeric columns and one-hot encode object (string) columns.
# The transformer is fitted on the training set only, then applied to both sets.
ct = ColumnTransformer([
    ('scale', StandardScaler(), make_column_selector(dtype_include=np.number)),
    ('onehot', OneHotEncoder(), make_column_selector(dtype_include=object))])
ct.fit(X_train)
X_ttrain = ct.transform(X_train)
X_ttest = ct.transform(X_test)

1. A logistic regression classifier is a widely used classification model. It models the log-odds of a class as a linear combination of one or more independent variables. **With L2 regularization,** the hyper-parameter is `C`, the inverse of the regularization strength. Create a scikit-learn logistic regression classifier with L2 regularization, `random_state=42` and `solver='saga'`, then generate and plot the validation curve by varying `C` from 1 to 3001 (inclusive of both endpoints), using at least 16 different values.
2. Apply the best estimator to the *test set*, and compare its confusion matrix with that of the KNN model from class, which achieved an accuracy of 0.89 and an F1-score of 0.58.
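
As a rough illustration of the scikit-learn calls involved (not a model answer), the sketch below runs `validation_curve` over 16 values of `C` and then evaluates the selected model on a held-out set. It uses synthetic data from `make_classification` rather than the loans data. The names `X_demo`, `y_demo`, `X_tr`, `X_te`, `y_tr` and `y_te`, as well as `cv=5`, `scoring='f1'` and `max_iter=5000`, are assumptions made for the example and are not specified by the assignment.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split, validation_curve

# Synthetic stand-in data (assumption): NOT the loans data from the pickle.
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo)

# 16 evenly spaced values of C from 1 to 3001, both endpoints included.
C_values = np.linspace(1, 3001, 16)

# L2 is scikit-learn's default penalty, so it is not passed explicitly.
# max_iter=5000 is an illustrative choice to give 'saga' room to converge.
clf = LogisticRegression(solver='saga', random_state=42, max_iter=5000)

# Each row of the score arrays is one value of C; each column is one CV fold.
train_scores, valid_scores = validation_curve(
    clf, X_tr, y_tr, param_name='C', param_range=C_values, cv=5, scoring='f1')

# Pick the C with the highest mean validation score.
mean_valid = valid_scores.mean(axis=1)
best_C = C_values[np.argmax(mean_valid)]

# Refit with the selected C and evaluate once on the held-out set.
best_clf = LogisticRegression(C=best_C, solver='saga', random_state=42, max_iter=5000)
best_clf.fit(X_tr, y_tr)
y_pred = best_clf.predict(X_te)
cm = confusion_matrix(y_te, y_pred)
acc = accuracy_score(y_te, y_pred)
f1 = f1_score(y_te, y_pred)
```

The row means of `train_scores` and `valid_scores` can then be plotted against `C_values`, for example with matplotlib, to obtain the validation curve.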

# Question 2

Review the section on 'Running Detections Programmatically' in `08_vision_lecture.ipynb`. Recall that the YOLOv4 model detected 12 objects in `football1.png` and all 12 objects were labeled `person`.

Modify `vision.py` and then use it with the code in the section on 'Running Detections Programmatically' to detect the objects in `train_station_bournemouth.jpg` and **report all the labels detected and how many objects there were of each label.**
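
For the tallying step, one option is `collections.Counter` from the standard library. The sketch below assumes the detection code ends up with a plain list of label strings. The name `detected_labels` is hypothetical, and its values are placeholders rather than the actual detections for `train_station_bournemouth.jpg`.

```python
from collections import Counter

# Hypothetical labels (assumption): placeholder values, not real detections.
detected_labels = ['person', 'train', 'person', 'bench', 'person']

# Counter maps each distinct label to the number of times it appears.
label_counts = Counter(detected_labels)

# Report the labels from most to least frequent.
for label, count in label_counts.most_common():
    print(f'{label}: {count}')
```

How the list of labels is obtained depends on what the modified `vision.py` returns, which this sketch does not cover.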

1. In the cell below, list the lines you modify in and/or add to `vision.py`, and state where they are.
2. Insert other cells after this next cell and fill in the code you use to detect the objects in `train_station_bournemouth.jpg` and report all the labels detected and how many objects there were of each label.

In [None]:
# Fill in the details of how you modified vision.py here:
#
# Modify line xxx

# Insert after line yyy


## References

1. The Python [pickle module](https://docs.python.org/3/library/pickle.html)
2. For more details on logistic regression, see Section 4.3 of [this book](https://www.statlearning.com/).
3. sklearn [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression) on logistic regression.