
# **Machine Learning Algorithms: Practical Implementations**

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler, label_binarize
# Models used in the cells below
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay, roc_curve, auc


The notebook uses the following imports:

pandas: Working with tabular data (DataFrames).

numpy: Working with numbers and arrays.

from sklearn.model_selection import train_test_split: Splits the data into training and test sets.

from sklearn.preprocessing import LabelEncoder, StandardScaler: Converts text labels to integer codes and rescales numeric features.

from sklearn.linear_model import LinearRegression, LogisticRegression: LinearRegression predicts a continuous value; LogisticRegression predicts a class label.

from sklearn.cluster import KMeans: Groups similar rows into clusters without using labels.

from sklearn.neighbors import KNeighborsClassifier: Classifies a row by a vote among its most similar training rows.

from sklearn.svm import SVC: Classifies data with a Support Vector Machine.

from sklearn.metrics import accuracy_score: Measures the fraction of predictions that are correct.


In [None]:
from google.colab import files
uploaded = files.upload()


Saving healthcare_dataset.csv to healthcare_dataset.csv


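**Function Used:** files.upload()

**Why:** Opens a file picker so you can upload the dataset file (healthcare_dataset.csv) from your local computer into the Colab session. The file is saved to the working directory, so pd.read_csv() can open it by name.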



In [None]:
df = pd.read_csv('healthcare_dataset.csv')
df.head()


| Unnamed: 0 | Name | Age | Gender | Blood Type | Medical Condition | Date of Admission | Doctor | Hospital | Insurance Provider | Billing Amount | Room Number | Admission Type | Discharge Date | Medication | Test Results |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bobby JacksOn | 30 | Male | B- | Cancer | 2024-01-31 | Matthew Smith | Sons and Miller | Blue Cross | 18856.281306 | 328 | Urgent | 2024-02-02 | Paracetamol | Normal |
| 1 | LesLie TErRy | 62 | Male | A+ | Obesity | 2019-08-20 | Samantha Davies | Kim Inc | Medicare | 33643.327287 | 265 | Emergency | 2019-08-26 | Ibuprofen | Inconclusive |
| 2 | DaNnY sMitH | 76 | Female | A- | Obesity | 2022-09-22 | Tiffany Mitchell | Cook PLC | Aetna | 27955.096079 | 205 | Emergency | 2022-10-07 | Aspirin | Normal |
| 3 | andrEw waTtS | 28 | Female | O+ | Diabetes | 2020-11-18 | Kevin Wells | Hernandez Rogers and Vang, | Medicare | 37909.78241 | 450 | Elective | 2020-12-18 | Ibuprofen | Abnormal |
| 4 | adrIENNE bEll | 43 | Female | AB+ | Cancer | 2022-09-19 | Kathleen Hanna | White-White | Aetna | 14238.317814 | 458 | Urgent | 2022-10-09 | Penicillin | Abnormal |


**Functions Used:** pd.read_csv(), df.head()\
**Why:**
read_csv() loads the CSV file into a DataFrame.

head() shows the first 5 rows for a quick look at the columns and value formats.
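The `Unnamed: 0` column in the output usually appears when a CSV was saved together with its row index. The index is written as a first column with an empty header, and `read_csv()` names that column `Unnamed: 0`. Whether that is what happened with this file is an assumption. The following minimal sketch uses a made-up two-row table (hypothetical data, not the healthcare file) to show the effect and how `index_col=0` reads that column as the index instead:

```python
import io
import pandas as pd

# Hypothetical two-row table, saved the way DataFrame.to_csv() does by default
# (the row index is written as an unnamed first column).
original = pd.DataFrame({"Age": [30, 62], "Test Results": ["Normal", "Inconclusive"]})
csv_text = original.to_csv()  # index=True by default

plain = pd.read_csv(io.StringIO(csv_text))                 # saved index becomes "Unnamed: 0"
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)  # first column used as the index

print(list(plain.columns))    # ['Unnamed: 0', 'Age', 'Test Results']
print(list(indexed.columns))  # ['Age', 'Test Results']
```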


In [None]:
df.dropna(inplace=True)


In [None]:
df.isnull().sum()

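| Column | Missing values |
|---|---|
| Unnamed: 0 | 0 |
| Name | 0 |
| Age | 0 |
| Gender | 0 |
| Blood Type | 0 |
| Medical Condition | 0 |
| Date of Admission | 0 |
| Doctor | 0 |
| Hospital | 0 |
| Insurance Provider | 0 |
| Billing Amount | 0 |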


**Functions Used:** dropna(), isnull().sum()\
**Why:** dropna() removes every row that has at least one missing value.\
isnull().sum() counts the missing values left in each column; all zeros confirm the cleanup worked.
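The following sketch shows both calls on a tiny frame with two missing values. The frame is hypothetical and only borrows two column names from the dataset. Each affected row is dropped as a whole, even though only one of its cells is empty:

```python
import numpy as np
import pandas as pd

# Hypothetical 4-row frame with two missing values (not the healthcare data).
toy = pd.DataFrame({
    "Age": [30, np.nan, 76, 28],
    "Billing Amount": [18856.28, 33643.33, np.nan, 37909.78],
})

print(toy.isnull().sum())        # missing values per column: Age 1, Billing Amount 1

toy.dropna(inplace=True)         # drop every row that contains at least one NaN
print(len(toy))                  # 2 rows remain
print(toy.isnull().sum().sum())  # 0 missing values left in total
```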

In [None]:
le = LabelEncoder()
df['Test Results'] = le.fit_transform(df['Test Results'])

if 'Gender' in df.columns:
    df['Gender'] = LabelEncoder().fit_transform(df['Gender'])

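**Functions Used:** LabelEncoder(), fit_transform()\
**Why:** Converts text labels into integer codes so the models can use them. LabelEncoder assigns codes in alphabetical order. With the values seen in the preview, Test Results becomes Abnormal = 0, Inconclusive = 1, Normal = 2, and Gender becomes Female = 0, Male = 1. (Gender is encoded but is not one of the features used below.)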
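The following sketch shows the alphabetical mapping on the three Test Results values that appear in the preview. The short list is made up for illustration. `classes_` stores the original labels in code order, and `inverse_transform()` maps codes back to text:

```python
from sklearn.preprocessing import LabelEncoder

# Example values taken from the preview's Test Results column (list is illustrative).
results = ["Normal", "Inconclusive", "Normal", "Abnormal", "Abnormal"]

le = LabelEncoder()
codes = le.fit_transform(results)

print(le.classes_.tolist())                   # ['Abnormal', 'Inconclusive', 'Normal'] -> codes 0, 1, 2
print(codes.tolist())                         # [2, 1, 2, 0, 0]
print(le.inverse_transform([0, 2]).tolist())  # ['Abnormal', 'Normal']
```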

In [None]:

X = df[['Age', 'Billing Amount']]

y = df['Test Results']


X holds the two input features (Age and Billing Amount); y holds the encoded Test Results, the target the models learn to predict.

In [None]:
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

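**Function Used:** StandardScaler().fit_transform()\
**Why:** Rescales each feature to mean 0 and standard deviation 1. Without this, Billing Amount (values in the tens of thousands) would dominate Age (values in the tens) in distance-based models such as KNN and KMeans. Here the scaler is fit on all rows before the train/test split, so the test rows also influence the scaling.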
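StandardScaler applies z = (x - mean) / std to each column, using the population standard deviation (ddof=0). The following sketch checks that formula by hand on the Age and Billing Amount values of the five preview rows:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Age and Billing Amount of the five patients shown in the preview.
X = np.array([
    [30, 18856.281306],
    [62, 33643.327287],
    [76, 27955.096079],
    [28, 37909.78241],
    [43, 14238.317814],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Same result by hand: z = (x - mean) / std, with the population std (ddof=0).
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, manual))   # True
print(X_scaled.mean(axis=0))           # both values close to 0
print(X_scaled.std(axis=0).round(6))   # [1. 1.]
```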

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.1, random_state=42
)


**Function Used:** train_test_split()\
**Why:** Splits the data into 90% for training and 10% for testing (test_size=0.1), so performance is measured on rows the model never saw.\
random_state=42 makes the split reproducible.
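In the notebook, the scaler is fit on all rows before `train_test_split()`, so the test rows' mean and standard deviation leak into the transformation. The following sketch shows a leakage-free ordering: split first, then fit the scaler on the training rows only. The data is a synthetic stand-in (random ages, bills, and 3-class labels, all hypothetical). `stratify=y` is an optional scikit-learn parameter that keeps class proportions equal across both parts:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for [Age, Billing Amount] and a 3-class target (hypothetical data).
X = np.column_stack([rng.integers(18, 90, 1000), rng.uniform(1000, 50000, 1000)])
y = rng.integers(0, 3, 1000)

# Split first: 90% train, 10% test, as in the notebook.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42, stratify=y
)

# Fit the scaler on the training rows only, then apply the same transform to the test rows.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train.shape, X_test.shape)  # (900, 2) (100, 2)
```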

In [None]:
# Linear Regression
from sklearn.linear_model import LinearRegression

lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
lr_preds = lr_model.predict(X_test)


print("Linear Regression Predictions (rounded):")
print(np.round(lr_preds[:5]))


Linear Regression Predictions (rounded):
[1. 1. 1. 1. 1.]


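**Functions Used:** LinearRegression(), fit(), predict(), np.round()

**Why:** fit() learns a linear function of the features,\
predict() applies it to the test rows,\
np.round() maps each prediction to the nearest integer code.\
Linear regression treats the class codes 0, 1, 2 as continuous numbers, so it is not a natural fit for a categorical target. Every prediction shown rounds to 1, the middle code.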
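A linear fit tends to produce predictions like these when the features say little about the target. The fitted function then stays close to the mean of the encoded labels, which is about 1 when the three codes 0, 1, 2 occur equally often. This sketch assumes pure-noise features and uniformly drawn codes (synthetic data). It does not prove that the real features are uninformative. Clipping keeps the rounded output within the valid codes:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Synthetic, deliberately uninformative data: features are pure noise,
# targets are class codes 0, 1, 2 drawn independently of the features.
X = rng.normal(size=(3000, 2))
y = rng.integers(0, 3, 3000)

lr = LinearRegression().fit(X, y)
preds = lr.predict(X[:5])

print(round(y.mean(), 2), lr.intercept_.round(2))  # both near 1
# Round to the nearest class code and clip so the result is always a valid label.
rounded = np.clip(np.round(preds), 0, 2)
print(rounded)  # [1. 1. 1. 1. 1.]
```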

In [None]:
# Logistic Regression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

log_model = LogisticRegression(max_iter=1000, C=10)
log_model.fit(X_train, y_train)
log_preds = log_model.predict(X_test)


print("Logistic Regression Accuracy:")
print(accuracy_score(y_test, log_preds))


Logistic Regression Accuracy:
0.33981981981981985


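**Functions Used:** LogisticRegression(), fit(), predict(), accuracy_score()

**Why:** Logistic regression is a classification model. scikit-learn's LogisticRegression handles the three Test Results classes directly, not only binary (0 or 1) targets. max_iter=1000 allows up to 1000 solver iterations so the fit can converge. C=10 weakens the regularization, since smaller C means stronger regularization. accuracy_score() returns the fraction of test rows predicted correctly. If the three classes are similarly sized, an accuracy of about 0.34 is roughly what guessing would achieve, so Age and Billing Amount carry little information about the test result.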
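A majority-class baseline makes the "close to guessing" comparison concrete. The following sketch uses scikit-learn's `DummyClassifier` on synthetic stand-in data (two noise features, three balanced classes, all hypothetical). `predict_proba()` also shows that the model produces one probability per class:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in: two noise features and three balanced classes (hypothetical data).
X = rng.normal(size=(3000, 2))
y = rng.integers(0, 3, 3000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

log_model = LogisticRegression(max_iter=1000, C=10).fit(X_train, y_train)
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# predict_proba returns one probability per class; each row sums to 1.
proba = log_model.predict_proba(X_test)
print(proba.shape)  # (300, 3)

# With uninformative features, both accuracies land near 1/3.
acc_model = accuracy_score(y_test, log_model.predict(X_test))
acc_baseline = accuracy_score(y_test, baseline.predict(X_test))
print(acc_model, acc_baseline)
```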

In [None]:
# KMeans Clustering
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X_scaled)


print("KMeans Cluster Centers:")
print(kmeans.cluster_centers_)


KMeans Cluster Centers:
[[ 0.94477237 -0.59578901]
 [-0.91383112 -0.66614254]
 [-0.04114028  1.050907  ]]


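**Functions Used:** KMeans(), fit(), cluster_centers_

**Why:** fit() groups the scaled rows into 3 clusters without using the labels (unsupervised learning), which can reveal patterns in the data.\
cluster_centers_ gives each cluster's center in the scaled (Age, Billing Amount) space. For example, the first center (0.94, -0.60) describes older-than-average patients with below-average bills.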
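After fitting, k-means assigns every row to its nearest center, and `inertia_` is the total squared distance from rows to their assigned centers. The following sketch checks both facts by hand on synthetic 2-D data standing in for the scaled features (hypothetical data). `n_init=10` is set explicitly because scikit-learn's default for `n_init` has changed between versions:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic data standing in for (Age, Billing Amount) after StandardScaler.
X = rng.normal(size=(500, 2))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Squared Euclidean distance from every row to every center: shape (500, 3).
sq_dists = ((X[:, None, :] - kmeans.cluster_centers_[None, :, :]) ** 2).sum(axis=2)
nearest = sq_dists.argmin(axis=1)

print(np.array_equal(nearest, kmeans.labels_))                   # True
# inertia_ is the total squared distance from each row to its assigned center.
print(np.isclose(sq_dists.min(axis=1).sum(), kmeans.inertia_))  # True
```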

In [None]:
# K-Nearest Neighbors (KNN)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

knn = KNeighborsClassifier(n_neighbors=5)  # k = 5
knn.fit(X_train, y_train)
knn_preds = knn.predict(X_test)

print("KNN Accuracy:")
print(accuracy_score(y_test, knn_preds))


KNN Accuracy:
0.34


**Functions Used:** KNeighborsClassifier(), fit(), predict(), accuracy_score()

**Why:** Predicts each test row's result by majority vote among its 5 nearest training rows (Euclidean distance on the scaled features).\
accuracy_score() measures the fraction of correct predictions.
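The vote can be reproduced by hand: compute distances, take the 5 closest training rows, and count their labels. The following sketch does that on synthetic data (hypothetical, with three classes) and compares the result with `KNeighborsClassifier`. With three classes and k = 5, a 2-2-1 tie is possible. Tie-breaking is an implementation detail, so the comparison covers only untied rows. `knn_votes` is a hypothetical helper written for this illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in data: two scaled features, three classes (hypothetical).
X_train = rng.normal(size=(200, 2))
y_train = rng.integers(0, 3, 200)
X_query = rng.normal(size=(20, 2))

def knn_votes(train_X, train_y, query_X, k=5, n_classes=3):
    """Count class labels among the k nearest training rows (Euclidean distance)."""
    votes = np.zeros((len(query_X), n_classes), dtype=int)
    for i, x in enumerate(query_X):
        dists = np.linalg.norm(train_X - x, axis=1)
        nearest = np.argsort(dists)[:k]
        votes[i] = np.bincount(train_y[nearest], minlength=n_classes)
    return votes

votes = knn_votes(X_train, y_train, X_query)
manual = votes.argmax(axis=1)  # majority vote
# Rows where exactly one class has the highest count (no tie).
untied = (votes == votes.max(axis=1, keepdims=True)).sum(axis=1) == 1

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
sk = knn.predict(X_query)
print(np.array_equal(manual[untied], sk[untied]))  # True
```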

In [None]:
# Support Vector Machine (SVM)
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

svm = SVC(kernel='linear')
svm.fit(X_train, y_train)
svm_preds = svm.predict(X_test)

print("SVM Accuracy:")
print(accuracy_score(y_test, svm_preds))


SVM Accuracy:
0.3403603603603604


**Functions Used:** SVC(), fit(), predict(), accuracy_score()

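**Why:** Fits a linear decision boundary (a hyperplane; with two features, a line) that separates the classes with as wide a margin as possible while tolerating some misclassified points. For the three Test Results classes, SVC trains one such boundary per pair of classes (one-vs-one). An accuracy of about 0.34 shows that these two features do not separate the classes.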
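The following sketch shows what a linear SVC learns when the classes are separable. It uses three well-separated synthetic blobs (hypothetical data, unlike the real features). With a linear kernel, `coef_` holds one weight vector per pair of classes, so three classes give 3 * 2 / 2 = 3 boundaries with one weight per feature:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Three well-separated synthetic blobs, 50 points each (hypothetical data).
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(size=(50, 2)) for c in centers])
y = np.repeat([0, 1, 2], 50)

svm = SVC(kernel="linear").fit(X, y)

# One linear boundary per pair of classes (one-vs-one), each with 2 feature weights.
print(svm.coef_.shape)   # (3, 2)
print(svm.n_support_)    # number of support vectors per class
print(svm.score(X, y))   # close to 1.0 on separable blobs
```

By contrast, the notebook's accuracy of about 0.34 on Age and Billing Amount suggests that no such clean boundaries exist in that data.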