Import necessary libraries: The first cell imports `pandas` for loading the dataset, `matplotlib` for visualizing it, and the scikit-learn pieces used to split the data, train the model, and score its predictions.

In [11]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

Load the dataset: The `pd.read_csv()` function loads the dataset from a CSV file into a `pandas` DataFrame, and `df.head()` displays the first five rows.

In [12]:
# Load the dataset from a CSV file
df = pd.read_csv("dataset.csv")
df.head()

|   | feature_1 | feature_2 | feature_3 | feature_4 | feature_5 | feature_6 | feature_7 | feature_8 | feature_9 | feature_10 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1.085631 | 0.997345 | 0.282978 | -1.506295 | -0.5786 | 1.651437 | -2.426679 | -0.428913 | 1.265936 | -0.86674 | 1.0 |
| 1 | -0.678886 | -0.094709 | 1.49139 | -0.638902 | -0.443982 | -0.434351 | 2.20593 | 2.186786 | 1.004054 | 0.386186 | 1.0 |
| 2 | 0.737369 | 1.490732 | -0.935834 | 1.175829 | -1.253881 | -0.637752 | 0.907105 | -1.428681 | -0.140069 | -0.861755 | 0.0 |
| 3 | -0.255619 | -2.798589 | -1.771533 | -0.699877 | 0.927462 | -0.173636 | 0.002846 | 0.688223 | -0.879536 | 0.283627 | 1.0 |
| 4 | -0.805367 | -1.727669 | -0.3909 | 0.573806 | 0.338589 | -0.01183 | 2.392365 | 0.412912 | 0.978736 | 2.238143 | 0.0 |


In [13]:
# Data exploration
print(df.describe())

         feature_1    feature_2    feature_3    feature_4    feature_5  \
count  1000.000000  1000.000000  1000.000000  1000.000000  1000.000000   
mean     -0.005297     0.012115     0.017455    -0.011119    -0.008053   
std       0.986820     0.998430     0.990776     0.978192     1.003635   
min      -3.167055    -3.685499    -3.066988    -3.581474    -3.587494   
25%      -0.693351    -0.684767    -0.685417    -0.702782    -0.670098   
50%       0.007288    -0.005697    -0.004751     0.052642    -0.025822   
75%       0.641805     0.671780     0.713492     0.660364     0.672929   
max       3.050755     3.571579     3.386115     2.789487     3.558981   

         feature_6    feature_7    feature_8    feature_9   feature_10  \
count  1000.000000  1000.000000  1000.000000  1000.000000  1000.000000   
mean      0.044696     0.054121    -0.056954     0.028626     0.021528   
std       0.986880     1.016036     1.015486     1.023492     0.980629   
min      -3.231055    -3.570243    -3

Visualize the dataset: The `plt.scatter()` function is used to create a scatter plot of the dataset, with a different color for each target class. The resulting plot is saved to a PNG file using `plt.savefig()`.

![Dataset.png](dataset-image.png)
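The plotting cell itself is not shown above, so the following is only a minimal sketch of how such a plot could be produced. The choice of `feature_1` and `feature_2` as axes, the helper name `plot_classes`, the output file name `dataset-sketch.png`, and the randomly generated stand-in data are all assumptions made for illustration; they are not taken from the original notebook.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_classes(df, x_col, y_col, label_col, out_path):
    """Scatter two columns with one color per class and save the figure as a PNG."""
    fig, ax = plt.subplots(figsize=(6, 5))
    # One scatter call per class, so matplotlib assigns each class its own color
    for label, group in df.groupby(label_col):
        ax.scatter(group[x_col], group[y_col], s=12, alpha=0.7,
                   label=f"{label_col} = {label}")
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.legend()
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return Path(out_path)


# Stand-in data (assumption) so that the sketch runs without dataset.csv
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "feature_1": rng.normal(size=100),
    "feature_2": rng.normal(size=100),
    "target": rng.integers(0, 2, size=100).astype(float),
})
saved = plot_classes(demo, "feature_1", "feature_2", "target", "dataset-sketch.png")
```

With the real data, the same helper could be called as `plot_classes(df, "feature_1", "feature_2", "target", "dataset-image.png")`. Only two of the ten features fit on one scatter plot, so which pair to show is a judgment call.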

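Split the data into training and testing sets: The `train_test_split()` function from scikit-learn splits the dataset into training and testing sets. Here every column except the last is used as a feature, the last column is the target, 20% of the rows are held out for testing, and `random_state=123` makes the split reproducible.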

In [14]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df.iloc[:, :-1], df.iloc[:, -1], test_size=0.2, random_state=123)
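The positional split above relies on the target being the last column. As a hedged alternative, the sketch below selects the target by name and passes `stratify` so that both subsets keep the same class proportions. The helper name `split_by_name`, the three-feature stand-in frame, and the `demo_` variable names are assumptions for illustration only; `stratify` is a real `train_test_split` parameter, but the original notebook does not use it.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_by_name(df, target_col="target", test_size=0.2, seed=123):
    """Split a DataFrame into train/test parts, selecting the target by column name."""
    X = df.drop(columns=[target_col])
    y = df[target_col]
    # stratify=y keeps the class ratio the same in the training and testing sets
    return train_test_split(X, y, test_size=test_size, random_state=seed, stratify=y)


# Stand-in data (assumption): 1000 rows, three noise features, balanced binary target
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(1000, 3)),
                    columns=["feature_1", "feature_2", "feature_3"])
demo["target"] = np.repeat([0.0, 1.0], 500)

demo_X_train, demo_X_test, demo_y_train, demo_y_test = split_by_name(demo)
print(demo_X_train.shape, demo_X_test.shape)
```

Selecting by name also makes it harder for a stray column, such as a saved row index, to slip into the feature matrix unnoticed.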

Train a logistic regression model: The `LogisticRegression` class is used to create a logistic regression model, which is then trained on the training set using the `fit()` method.

In [15]:
# Train a logistic regression model
model = LogisticRegression(random_state=123)
model.fit(X_train, y_train)

Evaluate the model on the testing set: The trained model is used to make predictions on the testing set using the `predict()` method, and the accuracy of the predictions is calculated using `accuracy_score()` from scikit-learn.

In [16]:
# Evaluate the model on the testing set
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
# Print the model accuracy
print(f"Model accuracy: {accuracy:.2f}")

Model accuracy: 0.49
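An accuracy figure is easier to interpret next to a reference point. One common check, sketched below, is to compare it with a baseline that always predicts the most frequent training class. If the two classes are roughly balanced (an assumption; the class counts are not shown above), that baseline scores close to 0.5, which would put 0.49 at about chance level. The helper name `majority_baseline_accuracy` and the synthetic noise data are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def majority_baseline_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy of a classifier that always predicts the most frequent training class."""
    baseline = DummyClassifier(strategy="most_frequent")
    baseline.fit(X_train, y_train)
    return accuracy_score(y_test, baseline.predict(X_test))


# Stand-in data (assumption): features are pure noise, unrelated to the labels
rng = np.random.default_rng(123)
X = rng.normal(size=(1000, 10))
y = rng.integers(0, 2, size=1000)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=123)

model_acc = accuracy_score(yte, LogisticRegression().fit(Xtr, ytr).predict(Xte))
base_acc = majority_baseline_accuracy(Xtr, ytr, Xte, yte)
print(f"Model accuracy:    {model_acc:.2f}")
print(f"Baseline accuracy: {base_acc:.2f}")
```

In the notebook, the same helper could be called with the existing `X_train`, `y_train`, `X_test`, and `y_test`. A model that does not clearly beat this baseline has probably not learned a usable relationship between the features and the target.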


Visualize the model predictions on the testing set: The `plt.scatter()` function is used again to create a scatter plot of the testing set predictions, with different colors for each predicted class. The resulting plot is saved to a PNG file using `plt.savefig()`.

![Predictions.png](predictions.png)
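The cell that draws this figure is not shown either. The sketch below is one possible version, under stated assumptions: it plots two feature columns of the testing set, colors each point by its predicted class, and additionally circles the points whose prediction disagrees with the true label. The circling, the helper name `plot_predictions`, the default column indices, the output file name `predictions-sketch.png`, and the stand-in data are all illustrative additions rather than the notebook's actual code.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np


def plot_predictions(X, y_true, y_pred, out_path, x_idx=0, y_idx=1):
    """Scatter two feature columns colored by predicted class; circle the errors.

    Returns the number of misclassified points.
    """
    X = np.asarray(X)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    wrong = y_true != y_pred

    fig, ax = plt.subplots(figsize=(6, 5))
    for label in np.unique(y_pred):
        mask = y_pred == label
        ax.scatter(X[mask, x_idx], X[mask, y_idx], s=14, alpha=0.7,
                   label=f"predicted {label}")
    # Hollow black rings around the points the model got wrong
    ax.scatter(X[wrong, x_idx], X[wrong, y_idx], s=60, facecolors="none",
               edgecolors="black", linewidths=0.8, label="misclassified")
    ax.set_xlabel(f"feature column {x_idx}")
    ax.set_ylabel(f"feature column {y_idx}")
    ax.legend()
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
    return int(wrong.sum())


# Stand-in data (assumption): random features, labels, and predictions
rng = np.random.default_rng(1)
demo_X = rng.normal(size=(200, 10))
demo_true = rng.integers(0, 2, size=200)
demo_pred = rng.integers(0, 2, size=200)
n_wrong = plot_predictions(demo_X, demo_true, demo_pred, "predictions-sketch.png")
```

In the notebook, the call would be along the lines of `plot_predictions(X_test, y_test, y_pred, "predictions.png")`.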