In this assignment, you will train a Logistic Regression model, a fundamental algorithm for binary and multi-class classification. You will use the famous Iris dataset to classify species of iris flowers.
- Open the `assignment.py` file.
- You will find a function definition: `train_logistic_regression_on_iris()`.
- Your tasks are to:
  - Load the Iris dataset from `sklearn.datasets`.
  - Split the data into training and testing sets (80% train, 20% test).
  - Create and train a `LogisticRegression` model.
  - Make predictions on the test set.
  - Calculate and return the accuracy of the model.
- Use `load_iris()` to get the data.
- Use `train_test_split` from `sklearn.model_selection`. Set `random_state=42` for reproducibility.
- The `LogisticRegression` model is in `sklearn.linear_model`.
- Use `model.fit()` to train, `model.predict()` to predict, and `accuracy_score` from `sklearn.metrics` to evaluate.
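
The steps above can be put together roughly as follows. This is a minimal sketch, not the reference solution: `max_iter=200` is an assumption added here to give the solver more room to converge on the unscaled features, and `return_X_y=True` is just one convenient way to call `load_iris()`.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_logistic_regression_on_iris():
    # Load the feature matrix (150 samples x 4 features) and integer labels.
    X, y = load_iris(return_X_y=True)

    # Hold out 20% of the samples for testing; random_state fixes the shuffle.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # max_iter=200 is an assumption, not part of the assignment text.
    model = LogisticRegression(max_iter=200)
    model.fit(X_train, y_train)

    # Predict on the held-out samples and score against the true labels.
    y_pred = model.predict(X_test)
    return accuracy_score(y_test, y_pred)
```

Calling `train_logistic_regression_on_iris()` returns a float between 0 and 1, the fraction of the 30 test samples that were classified correctly.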
- The trained model has an attribute `model.coef_`. What does this attribute represent? What can it tell you about the importance of different features?
- What is regularization in logistic regression? Look at the `penalty` and `C` parameters of the `LogisticRegression` model.
- How would you get a confusion matrix for your model's predictions? (Hint: `confusion_matrix` from `sklearn.metrics`)
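
One possible starting point for exploring these questions is sketched below. It assumes the same 80/20 split as the main task; `max_iter=200` and the comparison value `C=0.01` are illustrative choices, not values from the assignment. Coefficient magnitudes are only a rough guide to feature importance, and they are easiest to compare when the features share a scale.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# C is the inverse regularization strength; 1.0 is the scikit-learn default.
model = LogisticRegression(C=1.0, max_iter=200).fit(X_train, y_train)

# coef_ holds one row per class and one column per feature.
for species, weights in zip(iris.target_names, model.coef_):
    print(species, dict(zip(iris.feature_names, weights.round(2))))

# A smaller C means stronger regularization, which shrinks the coefficients.
# C=0.01 is an arbitrary value chosen for illustration.
strong = LogisticRegression(C=0.01, max_iter=200).fit(X_train, y_train)
print("coefficient norm, C=1.0: ", np.linalg.norm(model.coef_))
print("coefficient norm, C=0.01:", np.linalg.norm(strong.coef_))

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)
```

Entries on the diagonal of the confusion matrix count correct predictions; any off-diagonal entry shows which pair of species the model confused.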