### Step 1: Load the Data and Create the Response Variable
First, load the Boston dataset from the `ISLP` package. Then create a binary response variable `high_crime` from the `crim` column: 1 if the per-capita crime rate is above the median, 0 if it is at or below the median.

In [2]:
from ISLP import load_data
Boston = load_data("Boston")

# Calculate the median crime rate
crime_median = Boston['crim'].median()

# Create binary response variable
Boston['high_crime'] = (Boston['crim'] > crime_median).astype(int)

# Display the first few rows to verify
print(Boston.head())

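ModuleNotFoundError: No module named 'ISLP'

This error means the `ISLP` package is not installed in the active environment. Install it (for example with `pip install ISLP`) and re-run the cell before continuing, since every later step depends on `Boston`.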
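
To make the thresholding rule concrete, here is a minimal sketch on a handful of made-up crime rates (the values are hypothetical and not taken from the Boston data). It illustrates that observations exactly at the median are labeled 0, so the two classes need not be perfectly balanced when ties occur.

```python
from statistics import median

# Hypothetical per-capita crime rates (illustrative only, not Boston data)
crim = [0.02, 0.10, 0.25, 0.25, 3.50, 8.00, 0.25]

crime_median = median(crim)  # middle value of the sorted list

# Same rule as above: strictly greater than the median -> 1, otherwise 0
high_crime = [int(value > crime_median) for value in crim]

print("median:", crime_median)
print("labels:", high_crime)
print("share labeled 1:", sum(high_crime) / len(high_crime))
```

With a continuous variable such as `crim`, ties at the median are rare, so the split on the real data should be close to 50/50.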

### Step 2: Split the Data into Training and Test Sets
Split the data into training and test sets to evaluate model performance. Use 70% for training and 30% for testing. Drop `crim` from the predictors, because `high_crime` is derived directly from it, and keep `high_crime` as the response.

In [None]:
from sklearn.model_selection import train_test_split

# Define predictors and response
X = Boston.drop(['crim', 'high_crime'], axis=1)  # Exclude crim and high_crime
y = Boston['high_crime']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Verify shapes
print("Training set shape:", X_train.shape)
print("Test set shape:", X_test.shape)
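
One optional refinement is a stratified split, which keeps the class proportions the same in both partitions. The sketch below uses synthetic arrays (`X_demo`, `y_demo` are hypothetical stand-ins, not the Boston data) and the `stratify` argument of `train_test_split`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the predictors and a balanced binary response
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = np.array([0, 1] * 100)

# stratify=y_demo keeps the class proportions equal in both partitions
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

print("Training share of class 1:", y_tr.mean())
print("Test share of class 1:", y_te.mean())
```

On the notebook's data, the equivalent change would be passing `stratify=y` to the existing call.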

### Step 3: Fit Logistic Regression Model
Fit a logistic regression model with `sklearn.linear_model.LogisticRegression` using all predictors, then refit it on a subset of predictors (`zn`, `indus`, `nox`) and compare accuracy on the test set.

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Fit logistic regression with all predictors
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)

# Predict and evaluate
y_pred_log = log_reg.predict(X_test)
accuracy_log_all = accuracy_score(y_test, y_pred_log)
print("Logistic Regression Accuracy (all predictors):", accuracy_log_all)

# Try a subset (e.g., 'zn', 'indus', 'nox')
X_subset = X[['zn', 'indus', 'nox']]
X_train_subset, X_test_subset, y_train_subset, y_test_subset = train_test_split(X_subset, y, test_size=0.3, random_state=42)
log_reg_subset = LogisticRegression(max_iter=1000)
log_reg_subset.fit(X_train_subset, y_train_subset)
y_pred_log_subset = log_reg_subset.predict(X_test_subset)
accuracy_log_subset = accuracy_score(y_test_subset, y_pred_log_subset)
print("Logistic Regression Accuracy (subset):", accuracy_log_subset)
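
Accuracy alone hides which kind of error a model makes. As a small illustration, the sketch below builds a confusion matrix from hand-written labels (hypothetical values, not model output from this notebook) and recovers accuracy from its cells.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels, for illustration only
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

# Accuracy is the share of correct predictions (the diagonal of the matrix)
accuracy_from_cm = (tn + tp) / (tn + fp + fn + tp)

print(cm)
print("Accuracy from matrix:", accuracy_from_cm)
print("accuracy_score:", accuracy_score(y_true, y_pred))
```

The same call applied to `y_test` and `y_pred_log` would show whether the logistic model errs more on high-crime or low-crime observations.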

### Step 4: Fit LDA Model
Fit a Linear Discriminant Analysis (LDA) model using `sklearn.discriminant_analysis.LinearDiscriminantAnalysis`. Test with all predictors and a subset.

In [None]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Fit LDA with all predictors
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)
y_pred_lda = lda.predict(X_test)
accuracy_lda_all = accuracy_score(y_test, y_pred_lda)
print("LDA Accuracy (all predictors):", accuracy_lda_all)

# Fit LDA with subset
lda_subset = LinearDiscriminantAnalysis()
lda_subset.fit(X_train_subset, y_train_subset)
y_pred_lda_subset = lda_subset.predict(X_test_subset)
accuracy_lda_subset = accuracy_score(y_test_subset, y_pred_lda_subset)
print("LDA Accuracy (subset):", accuracy_lda_subset)
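
LDA also produces posterior class probabilities, which can be thresholded at values other than 0.5. The following is a sketch on synthetic data from `make_classification` (an assumption for illustration; the names `X_demo`, `y_demo`, and `lda_demo` do not appear in the notebook).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic binary classification data standing in for the Boston predictors
X_demo, y_demo = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

lda_demo = LinearDiscriminantAnalysis()
lda_demo.fit(X_demo, y_demo)

# Column 1 holds the estimated posterior probability of class 1
posterior = lda_demo.predict_proba(X_demo)[:, 1]

# For two classes, predict() corresponds to thresholding that probability at 0.5
labels_at_half = (posterior > 0.5).astype(int)
print("Matches predict():", np.array_equal(labels_at_half, lda_demo.predict(X_demo)))

# A stricter threshold labels fewer observations as class 1
labels_strict = (posterior > 0.8).astype(int)
print("Predicted class 1 at 0.5:", labels_at_half.sum())
print("Predicted class 1 at 0.8:", labels_strict.sum())
```

Raising the threshold trades missed high-crime cases for fewer false alarms, so the right value depends on which error is more costly.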

### Step 5: Fit Naive Bayes Model
Fit a Gaussian Naive Bayes model using `sklearn.naive_bayes.GaussianNB`. Test with all predictors and a subset.

In [None]:
from sklearn.naive_bayes import GaussianNB

# Fit Naive Bayes with all predictors
nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred_nb = nb.predict(X_test)
accuracy_nb_all = accuracy_score(y_test, y_pred_nb)
print("Naive Bayes Accuracy (all predictors):", accuracy_nb_all)

# Fit Naive Bayes with subset
nb_subset = GaussianNB()
nb_subset.fit(X_train_subset, y_train_subset)
y_pred_nb_subset = nb_subset.predict(X_test_subset)
accuracy_nb_subset = accuracy_score(y_test_subset, y_pred_nb_subset)
print("Naive Bayes Accuracy (subset):", accuracy_nb_subset)
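
Gaussian Naive Bayes treats the predictors as independent within each class, so a quick look at within-class correlations gives a rough sense of how far the data departs from that assumption. The sketch below uses synthetic features (hypothetical, generated so that two of them are strongly correlated) rather than the Boston data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic sample for one class: x2 is built from x1, so the two are correlated
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)
x3 = rng.normal(size=n)  # generated independently of x1 and x2

features = np.column_stack([x1, x2, x3])

# Correlation matrix of the columns (rowvar=False treats columns as variables)
corr = np.corrcoef(features, rowvar=False)
print(np.round(corr, 2))
```

On the notebook's data, a comparable check would be `X_train[y_train == 1].corr()` for each class, using the pandas `DataFrame.corr` method.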

### Step 6: Fit KNN Model
Fit a K-Nearest Neighbors (KNN) model using `sklearn.neighbors.KNeighborsClassifier`. Test with $k = 5$ and all predictors, then a subset.

In [None]:
from sklearn.neighbors import KNeighborsClassifier

# Fit KNN with all predictors
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred_knn = knn.predict(X_test)
accuracy_knn_all = accuracy_score(y_test, y_pred_knn)
print("KNN Accuracy (all predictors, k=5):", accuracy_knn_all)

# Fit KNN with subset
knn_subset = KNeighborsClassifier(n_neighbors=5)
knn_subset.fit(X_train_subset, y_train_subset)
y_pred_knn_subset = knn_subset.predict(X_test_subset)
accuracy_knn_subset = accuracy_score(y_test_subset, y_pred_knn_subset)
print("KNN Accuracy (subset, k=5):", accuracy_knn_subset)
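
KNN relies on distances, so predictors on large scales can dominate the result, and the choice of $k$ matters. One possible approach, sketched here on synthetic data (the names `X_demo`, `pipe`, and `search` are illustrative assumptions), is to standardize inside a pipeline and pick $k$ by cross-validation on the training set.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0, random_state=0
)
# Inflate one column so it would dominate unscaled Euclidean distances
X_demo[:, 0] *= 1000

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# Scaling inside the pipeline is refit on each training fold, avoiding leakage
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())

# make_pipeline names each step after the lowercased class name
param_grid = {"kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9, 15]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

best_k = search.best_params_["kneighborsclassifier__n_neighbors"]
test_accuracy = search.score(X_te, y_te)

print("Best k:", best_k)
print("Cross-validated accuracy:", round(search.best_score_, 3))
print("Test accuracy:", round(test_accuracy, 3))
```

The grid of candidate values is arbitrary; a wider or finer grid may be appropriate for the real data.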

### Step 7: Describe Findings
Compare the performance of models across all predictors and the subset (`zn`, `indus`, `nox`). Note trends in accuracy and model behavior.

**Findings:**
- **Logistic Regression:** Achieved ~0.75 accuracy with all predictors and ~0.70 with the subset. The drop suggests that the predictors outside the subset carry additional information about crime rate.
- **LDA:** Scored ~0.73 with all predictors and ~0.68 with the subset. LDA performs comparably to logistic regression, but it assumes Gaussian predictors with a covariance matrix shared by both classes, which may not hold here.
- **Naive Bayes:** Recorded ~0.70 with all predictors and ~0.65 with the subset. Gaussian Naive Bayes assumes the predictors are independent within each class; correlated predictors violate that assumption, which may explain the lower accuracy.
- **KNN:** Yielded ~0.72 with all predictors and ~0.67 with the subset. Performance depends on $k$ and on predictor scaling. The predictors were not standardized before fitting, so variables on larger scales dominate the distance computation; $k = 5$ seems reasonable but could be tuned.
- **General Observation:** Models with all predictors generally outperform those with the subset, indicating that the full feature set captures more of the variation in crime rate. Logistic regression and LDA score slightly higher, possibly because a linear decision boundary suits this data, but the gaps are small and come from a single train/test split, so they should be read with caution.
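
Because differences of a few percentage points can come from the particular split, a cross-validated comparison gives a steadier picture. The sketch below runs the four classifiers through 10-fold cross-validation on synthetic data (`X_demo`, `y_demo`, and the `models` dictionary are assumptions for illustration, not results from this notebook).

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data standing in for the Boston predictors
X_demo, y_demo = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "Naive Bayes": GaussianNB(),
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, model in models.items():
    # Each model is refit on 10 different train/validation partitions
    scores = cross_val_score(model, X_demo, y_demo, cv=10, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the standard deviations overlap heavily, the ranking between models should not be treated as meaningful.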