In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


These lines import the libraries used for data loading, preprocessing, model training, and evaluation.
numpy supplies np.nan, used below to mark missing values; pandas loads the data and manipulates it as a DataFrame.
train_test_split splits the dataset into training and testing sets.
RandomForestClassifier is an ensemble method that trains many decision trees and combines their predictions for classification.
accuracy_score evaluates the model by computing the fraction of correct predictions.

In [None]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data"
column_names = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg',
                'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target']
data = pd.read_csv(url, names=column_names)


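These lines load the dataset from a URL into a pandas DataFrame named data.
url points to the processed Cleveland heart-disease file in the UCI Machine Learning Repository.
column_names lists the 14 attribute names; the file has no header row, so the names must be supplied.
pd.read_csv() reads the comma-separated file directly from the URL, and names=column_names labels its columns.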
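Because the file has no header row, names= is what gives the DataFrame its column labels. The sketch below is not part of the original notebook: it parses two illustrative rows in the same comma-separated layout from an in-memory string (io.StringIO stands in for the URL so it runs offline), and it shows read_csv's na_values='?' option, an alternative to a separate replace step that turns '?' into NaN while parsing.

```python
import io

import pandas as pd

column_names = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg',
                'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target']

# Two illustrative rows in the header-less layout of processed.cleveland.data;
# the values are for demonstration only. '?' marks a missing entry.
raw = io.StringIO(
    "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0\n"
    "67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,?,3.0,2\n"
)

# names= supplies the header; na_values='?' parses '?' as NaN,
# so 'ca' comes back as a float column instead of strings.
df = pd.read_csv(raw, names=column_names, na_values='?')
print(df.shape)               # (2, 14)
print(df['ca'].dtype)         # float64
print(df['ca'].isna().sum())  # 1
```

Without names=, read_csv would treat the first data row as the header and the dataset would silently lose a patient.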

In [None]:
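data.replace('?', np.nan, inplace=True)
data.dropna(inplace=True)
# 'ca' and 'thal' contained '?' and were read as strings; convert all columns to numbers
data = data.apply(pd.to_numeric)


These lines handle missing values, which the raw file marks with '?'.
data.replace('?', np.nan, inplace=True) replaces each '?' with NaN, pandas' missing-value marker.
data.dropna(inplace=True) drops every row that contains at least one missing value.
data.apply(pd.to_numeric) converts the columns that held '?' (ca and thal), which pandas read as strings, into numeric columns.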
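A toy frame with made-up values shows what these steps do: rows containing '?' disappear after dropna, but a column that once held '?' still contains strings until it is converted explicitly with pd.to_numeric.

```python
import numpy as np
import pandas as pd

# Made-up values; 'ca' and 'thal' contain '?' as in the raw file.
toy = pd.DataFrame({
    'age': [63.0, 67.0, 41.0],
    'ca': ['0.0', '?', '2.0'],
    'thal': ['6.0', '3.0', '?'],
})

cleaned = toy.replace('?', np.nan).dropna()  # keeps only the first row
print(cleaned.dtypes)                        # 'ca' and 'thal' still hold strings
cleaned = cleaned.apply(pd.to_numeric)       # convert string columns to numbers
print(cleaned.dtypes)                        # all float64
```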

In [None]:
data = pd.get_dummies(data, columns=['cp', 'restecg', 'slope', 'thal'])


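This line converts the categorical variables into numeric indicator columns using one-hot encoding.
pd.get_dummies() replaces each listed column with one indicator column per distinct value (boolean in pandas 2.0 and later, 0/1 integers in earlier versions).
The columns parameter names the columns to encode: cp (chest pain type), restecg (resting electrocardiographic results), slope (slope of the peak exercise ST segment), and thal.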
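A minimal illustration on made-up chest-pain codes: get_dummies keeps the other columns, appends one indicator column per observed value, and sets exactly one indicator per row.

```python
import pandas as pd

# Made-up chest-pain codes for three patients.
toy = pd.DataFrame({'age': [63.0, 67.0, 37.0], 'cp': [1.0, 4.0, 3.0]})

encoded = pd.get_dummies(toy, columns=['cp'])
print(encoded.columns.tolist())  # ['age', 'cp_1.0', 'cp_3.0', 'cp_4.0']
print(encoded)
```

Only values present in the data get a column: cp=2.0 never occurs here, so there is no cp_2.0 column.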

In [None]:
X = data.drop('target', axis=1)
y = data['target']


These lines separate the features (X) from the target (y).
data.drop('target', axis=1) returns all columns except target; axis=1 tells pandas to drop a column rather than a row.
y contains only the target column.
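In the UCI Cleveland data the target column holds values 0 to 4 (0 = no heart disease, 1 to 4 = presence of heart disease), so y here is multiclass. Many analyses collapse it to presence versus absence. The sketch below shows that variant on made-up labels; it is not what this notebook does, and applying it would change the reported accuracy.

```python
import pandas as pd

# Made-up target values in the 0-4 range used by the Cleveland data.
y = pd.Series([0, 2, 0, 1, 4, 3], name='target')

# 1 = any heart disease (values 1-4), 0 = none.
y_binary = (y > 0).astype(int)
print(y_binary.tolist())  # [0, 1, 0, 1, 1, 1]
```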

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


This line splits the data into training and testing sets.
train_test_split() shuffles the rows and splits X and y together, so each feature row stays paired with its label.
test_size=0.2 reserves 20% of the rows for the test set.
random_state=42 fixes the shuffle so the same split is produced on every run.
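A quick check on a synthetic array of 297 rows (a size chosen for illustration): scikit-learn rounds the test share up, so test_size=0.2 gives ceil(0.2 × 297) = 60 test rows and 237 training rows, and repeating the call with the same random_state reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 297 rows of one feature, with labels in 0-4.
X = np.arange(297).reshape(-1, 1)
y = np.arange(297) % 5

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))  # 237 60

# Same random_state -> identical split.
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
print(np.array_equal(X_test, X_test_again))  # True
```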

In [None]:
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)


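These lines create and train a random forest classifier.
RandomForestClassifier(n_estimators=100, random_state=42) builds an ensemble of 100 decision trees; random_state makes the trees' random bootstrap samples and feature choices reproducible.
model.fit() trains the model on the training data (X_train, y_train).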
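After fitting, the individual trees are available in estimators_, and feature_importances_ gives an impurity-based importance score per feature, normalized to sum to 1. The sketch below uses synthetic data from make_classification as a stand-in for the heart-disease features. (n_estimators=100 is also scikit-learn's default since version 0.22.)

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data in place of the heart-disease features.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

print(len(model.estimators_))            # 100 fitted decision trees
print(model.feature_importances_.sum())  # approximately 1.0
```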

In [None]:
y_pred = model.predict(X_test)


This line makes predictions on the test set using the trained model.
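For a random forest, predict() returns, for each sample, the class with the highest probability averaged over all trees. A sketch on synthetic three-class data checks this against predict_proba; the data is a stand-in, not the heart-disease set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic three-class data as a stand-in for the heart-disease features.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

proba = model.predict_proba(X[:5])  # shape (5, 3): one column per class in model.classes_
labels = model.predict(X[:5])

# predict() picks the class whose averaged tree probability is highest.
print(np.array_equal(labels, model.classes_[proba.argmax(axis=1)]))  # True
```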

In [None]:
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


Accuracy: 0.6166666666666667


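These lines compute and print the model's accuracy on the test set.
accuracy_score() returns the fraction of test samples whose predicted label (y_pred) matches the true label (y_test).
In this dataset the target takes five values (0 = no heart disease, 1 to 4 = presence of heart disease), so the model is solving a five-class problem; the accuracy of about 0.62 should be read in that light, not as a yes/no diagnosis score.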
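Accuracy is simply the share of matching labels. The sketch below, on made-up labels, checks that accuracy_score agrees with a hand-computed mean, and adds scikit-learn's confusion_matrix (not used in the original notebook), which shows which classes are confused with which when the target has several classes.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true and predicted labels for six test samples.
y_true = np.array([0, 1, 2, 0, 3, 0])
y_pred = np.array([0, 2, 2, 0, 1, 0])

acc = accuracy_score(y_true, y_pred)
print(acc)                        # 0.666... (4 of 6 correct)
print(np.mean(y_true == y_pred))  # same value, computed by hand

# Rows are true classes, columns predicted classes; the diagonal holds correct predictions.
cm = confusion_matrix(y_true, y_pred)
print(cm)
```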