# Machine Learning and Statistics Project

### Introduction:

### Assumptions:

I assume that all the data from the data source (the Iris dataset bundled with scikit-learn) is correct.

### Step 1: Import required libraries and dataset 

In [19]:
# Import the required libraries

import pandas as pd  # pandas DataFrames are used to explore the data [03]
from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score

In [4]:
# Load the Iris dataset from scikit-learn [01][02] and print summary statistics as a quick check.
# Convert the data into a pandas DataFrame [03] for exploratory analysis. This may not be strictly required, but it is convenient later on.

iris = load_iris()

# Create separate variables for features (X) and target (y)
X = iris.data
y = iris.target

# Create a DataFrame for further analysis if needed
data = pd.DataFrame(data=X, columns=iris.feature_names)
data['target'] = y

print(data.describe())

       sepal length (cm)  sepal width (cm)  petal length (cm)  \
count         150.000000        150.000000         150.000000   
mean            5.843333          3.057333           3.758000   
std             0.828066          0.435866           1.765298   
min             4.300000          2.000000           1.000000   
25%             5.100000          2.800000           1.600000   
50%             5.800000          3.000000           4.350000   
75%             6.400000          3.300000           5.100000   
max             7.900000          4.400000           6.900000   

       petal width (cm)      target  
count        150.000000  150.000000  
mean           1.199333    1.000000  
std            0.762238    0.819232  
min            0.100000    0.000000  
25%            0.300000    0.000000  
50%            1.300000    1.000000  
75%            1.800000    2.000000  
max            2.500000    2.000000  
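The summary above pools all three species together, so the wide spread in petal length (std ≈ 1.77 cm) hides how different the classes are. Here is a minimal sketch of a per-species summary. It assumes the integer target codes follow the order of `iris.target_names`, which is how scikit-learn ships the dataset. The `species` column and `species_means` are illustrative names that are not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
data = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# Map the integer target codes (0, 1, 2) onto the species names shipped with the dataset
data['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Mean of each measurement within each species
species_means = data.groupby('species', observed=True).mean()
print(species_means.round(2))
```

Grouping by species makes it easier to see which features are likely to separate the classes before any model is trained.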


In [5]:
# Check the size of the dataset: (rows, columns)
data.shape

(150, 5)
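The shape shows 150 rows, but it does not show how those rows are split across the three classes. A quick balance check could look like the following sketch. `class_counts` is an illustrative name.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Count samples per target class (0 = setosa, 1 = versicolor, 2 = virginica)
class_counts = pd.Series(iris.target).value_counts().sort_index()
print(class_counts)
```

Because the classes are balanced, plain accuracy is a reasonable headline metric for this dataset.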

### Step 2: Explore, understand and remediate the data

In [6]:
data.columns

Index(['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)',
       'petal width (cm)', 'target'],
      dtype='object')

In [7]:
data.head()

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
0                5.1               3.5                1.4               0.2       0
1                4.9               3.0                1.4               0.2       0
2                4.7               3.2                1.3               0.2       0
3                4.6               3.1                1.5               0.2       0
4                5.0               3.6                1.4               0.2       0


In [8]:
# Continue reviewing the data: column dtypes and non-null counts
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
 4   target             150 non-null    int32  
dtypes: float64(4), int32(1)
memory usage: 5.4 KB


In [9]:
# Count missing values in each column (if any) and display the result
missing_values = data.isnull().sum()
print(missing_values)
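Missing values are only one part of remediation. The sketch below also looks for exact duplicate rows, which is an extra check beyond the original cell. The names `n_missing`, `n_duplicates` and `deduplicated` are illustrative. Dropping duplicates is shown only as one possible choice, not a required step.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
data = pd.DataFrame(data=iris.data, columns=iris.feature_names)
data['target'] = iris.target

# Total number of missing values across the whole DataFrame
n_missing = int(data.isnull().sum().sum())
# Number of rows that exactly repeat an earlier row (all features and the target)
n_duplicates = int(data.duplicated().sum())

print(f"Missing values: {n_missing}")
print(f"Duplicate rows: {n_duplicates}")

# Optional remediation: drop exact duplicates and renumber the index
deduplicated = data.drop_duplicates().reset_index(drop=True)
print(deduplicated.shape)
```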

### Step 3: Split the data, train and evaluate the model

In [10]:
# Split the data into training and testing sets (70% training, 30% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
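The split above is random, so the three classes are not guaranteed to appear in equal proportions in the 45-sample test set. Here is a sketch of a stratified alternative that uses the `stratify` parameter of `train_test_split`. Its results would differ from the split used in this notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the 1:1:1 class ratio in both the training and the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(np.bincount(y_train))  # samples per class in the training set
print(np.bincount(y_test))   # samples per class in the test set
```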

In [15]:
# Create a Logistic Regression model
model = LogisticRegression()
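`LogisticRegression()` is used here with its default settings on unscaled features. A common variant, sketched below, standardises the features first and raises `max_iter` so the default `lbfgs` solver has more iterations to converge. The value `max_iter=1000` and the name `scaled_accuracy` are assumptions for illustration, not taken from the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# StandardScaler gives each feature zero mean and unit variance; inside the pipeline
# it is fitted on the training data only, so no test statistics leak into training.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X_train, y_train)

scaled_accuracy = accuracy_score(y_test, pipeline.predict(X_test))
print(scaled_accuracy)
```

Wrapping both steps in a pipeline means `fit` and `predict` apply the same scaling automatically.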

In [16]:
# Train the model on the training data
model.fit(X_train, y_train)

LogisticRegression()

In [17]:
# Make predictions on the test data
y_pred = model.predict(X_test)
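`predict` returns one class label per sample. To see how confident the model is, `predict_proba` can be inspected, as in this sketch. It refits the same default `LogisticRegression()` on the same split so that the block runs on its own. `proba` is an illustrative name.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)

# One row per test sample, one column per class (in the order of model.classes_)
proba = model.predict_proba(X_test)
print(np.round(proba[:5], 3))
```

Each row sums to 1, and the column with the largest probability matches the label returned by `predict`.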

In [20]:
# Evaluate the model's performance
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')
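With `average='macro'`, precision and recall are computed separately for each class and then averaged without weighting. The per-class values behind those averages can be shown with a confusion matrix and `classification_report`, as in this sketch. `cm` is an illustrative name.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
y_pred = LogisticRegression().fit(X_train, y_train).predict(X_test)

# Rows are the true classes, columns are the predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Per-class precision, recall and F1, plus macro and weighted averages
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```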

In [21]:
# Print the evaluation metrics
print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")

Accuracy: 1.0
Precision: 1.0
Recall: 1.0
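These perfect scores come from a single split with only 45 test samples, so they may partly reflect that particular split. Cross-validation gives a sense of how much the score varies. The sketch below uses 5-fold cross-validation. The fold count and `max_iter=1000` are assumed values.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# With an integer cv and a classifier, scikit-learn uses stratified folds
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, scoring='accuracy')

print(scores)
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

Reporting the mean and spread across folds is a more cautious summary than a single test-set score.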


### References:

[01] Scikit-learn (2023) Machine Learning in Python. Available at: https://scikit-learn.org/stable/index.html (Accessed: 31/10/2023).<br>
[02] Scikit-learn (2023) sklearn.datasets.load_iris. Available at: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html#sklearn.datasets.load_iris (Accessed: 31/10/2023).<br>
[03] pandas (2023) pandas. Available at: https://pandas.pydata.org/ (Accessed: 31/10/2023).<br>
[04] Scikit-learn (2023) 1.1. Linear Models. Available at: https://scikit-learn.org/stable/modules/linear_model.html (Accessed: 26/11/2023).<br>
[05] Scikit-learn (2023) sklearn.linear_model.LinearRegression. Available at: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html (Accessed: 26/11/2023).<br>
[06] ActiveState (2021) How To Run Linear Regressions In Python Scikit-learn. Available at: https://www.activestate.com/resources/quick-reads/how-to-run-linear-regressions-in-python-scikit-learn/