PROBLEM STATEMENT -> Write a Python program that utilizes the scikit-learn library to perform several tasks on the Iris and Breast Cancer datasets. First, fetch both datasets and display their classes (target labels) and attributes (features). Next, split each dataset into training and testing sets to prepare for model evaluation. Implement a Gaussian Naive Bayes classifier, training it on the training set and evaluating its accuracy on the testing set. Print the accuracy of the model for each dataset. Finally, apply Lasso regression to perform feature selection and shrinkage, thereby identifying the most significant features in the datasets.

Importing the required libraries

In [1]:
from sklearn.datasets import load_iris, load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

Loading the Iris dataset from the scikit-learn library

In [2]:
iris = load_iris()
iris_data, iris_target = iris.data, iris.target

In [3]:
print("Iris Dataset:")
print("Attributes:", iris.feature_names)
print("Classes:", iris.target_names)

Iris Dataset:
Attributes: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Classes: ['setosa' 'versicolor' 'virginica']
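A quick check that is not part of the original notebook: counting the samples per class shows whether the Iris classes are balanced, which matters when reading a plain accuracy figure. A minimal sketch using `np.bincount` on the integer labels:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Count how many samples carry each integer class label (0, 1, 2)
counts = np.bincount(iris.target)
for name, count in zip(iris.target_names, counts):
    print(f"{name}: {count}")

print("Shape of the feature matrix:", iris.data.shape)
```

All three species have the same number of samples, so accuracy is not skewed by a majority class here.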


Splitting the Iris dataset into training and testing sets

In [4]:
iris_X_train, iris_X_test, iris_y_train, iris_y_test = train_test_split(
    iris_data, iris_target, test_size=0.3, random_state=42
)
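With `test_size=0.3`, 45 of the 150 samples go to the test set. A plain random split does not guarantee equal class counts in that set; passing `stratify=y` to `train_test_split` does. A short sketch comparing the two (the stratified variant is an optional alternative, not what the notebook uses):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same split as the notebook
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print("Train/test shapes:", X_train.shape, X_test.shape)
print("Test class counts (plain split):", np.bincount(y_test))

# stratify=y preserves the class proportions of y in both partitions
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print("Test class counts (stratified):", np.bincount(ys_test))
```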


Loading the Breast Cancer dataset

In [5]:
breast_cancer = load_breast_cancer()
bc_data, bc_target = breast_cancer.data, breast_cancer.target


In [6]:
print("\nBreast Cancer Dataset:")
print("Attributes:", breast_cancer.feature_names)
print("Classes:", breast_cancer.target_names)



Breast Cancer Dataset:
Attributes: ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
Classes: ['malignant' 'benign']
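Unlike Iris, the Breast Cancer classes are not balanced, and its 30 features live on very different scales. Both facts matter later: the first for interpreting accuracy, the second for why the Lasso step standardizes the data. A minimal sketch that prints them:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

bc = load_breast_cancer()
print("Shape of the feature matrix:", bc.data.shape)

# Label 0 = malignant, 1 = benign (order given by bc.target_names)
class_counts = np.bincount(bc.target)
for name, count in zip(bc.target_names, class_counts):
    print(f"{name}: {count}")

# Largest value of each feature: these range from well below 1 to several thousand
col_max = bc.data.max(axis=0)
print("Smallest / largest feature maximum:", col_max.min(), col_max.max())
```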


Splitting the Breast Cancer dataset into training and testing sets

In [7]:
bc_X_train, bc_X_test, bc_y_train, bc_y_test = train_test_split(
    bc_data, bc_target, test_size=0.3, random_state=42
)
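For 569 samples, `test_size=0.3` asks for 170.7 test samples; scikit-learn rounds the test size up, giving 171 test and 398 training samples. A short sketch confirming the sizes and the share of benign cases in each partition:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print("Train/test sizes:", len(y_train), len(y_test))
# Mean of 0/1 labels = fraction of benign samples (label 1)
print("Fraction benign (train, test):", round(y_train.mean(), 3), round(y_test.mean(), 3))
```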


Apply Gaussian Naive Bayes classifier to the Iris dataset

In [8]:
gnb_iris = GaussianNB()
gnb_iris.fit(iris_X_train, iris_y_train)
iris_y_pred = gnb_iris.predict(iris_X_test)
iris_accuracy = accuracy_score(iris_y_test, iris_y_pred)
print("\nIris Dataset Accuracy: {:.2f}%".format(iris_accuracy * 100))



Iris Dataset Accuracy: 97.78%
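To make the Gaussian Naive Bayes step less of a black box, here is a minimal from-scratch sketch of what `GaussianNB` computes: per-class feature means and variances, class priors, and a per-class log-likelihood that sums independent Gaussian terms over the features. The helper `gaussian_nb_predict` is hypothetical, and its variance floor assumes scikit-learn's `var_smoothing` behaviour (a fraction of the largest feature variance). On the notebook's split its predictions are compared with the library's:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def gaussian_nb_predict(X_train, y_train, X_test, var_smoothing=1e-9):
    """Hypothetical helper: Gaussian Naive Bayes written out with NumPy."""
    classes = np.unique(y_train)
    # Variance floor relative to the largest feature variance (assumed to mirror var_smoothing)
    eps = var_smoothing * np.var(X_train, axis=0).max()
    means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    variances = np.array([X_train[y_train == c].var(axis=0) for c in classes]) + eps
    log_priors = np.log([np.mean(y_train == c) for c in classes])

    # Gaussian log-density per feature, summed over features (the "naive" independence assumption)
    diff = X_test[:, None, :] - means[None, :, :]  # shape (n_test, n_classes, n_features)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None]).sum(axis=2)
    joint = log_lik + log_priors[None, :]
    return classes[np.argmax(joint, axis=1)], means


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

manual_pred, manual_means = gaussian_nb_predict(X_train, y_train, X_test)
sk_model = GaussianNB().fit(X_train, y_train)
print("Predictions match GaussianNB:", np.array_equal(manual_pred, sk_model.predict(X_test)))
```

The fitted per-class means are also available on the library model as `theta_`.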


Apply Gaussian Naive Bayes classifier to the Breast Cancer dataset

In [9]:
gnb_bc = GaussianNB()
gnb_bc.fit(bc_X_train, bc_y_train)
bc_y_pred = gnb_bc.predict(bc_X_test)
bc_accuracy = accuracy_score(bc_y_test, bc_y_pred)
print("Breast Cancer Dataset Accuracy: {:.2f}%".format(bc_accuracy * 100))


Breast Cancer Dataset Accuracy: 94.15%
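The accuracy figures are simply the fraction of test samples whose predicted label matches the true one. A small sketch with made-up toy labels, plus the arithmetic showing that the reported percentages correspond to 44 of 45, 161 of 171 and 159 of 171 correct predictions (the test-set sizes follow from `test_size=0.3`):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Toy labels: 4 of the 5 predictions are correct
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

manual = np.mean(y_true == y_pred)  # fraction of matching labels
print(manual, accuracy_score(y_true, y_pred))  # 0.8 0.8

# Whole-sample counts behind the percentages reported in this notebook
print(f"{44 / 45:.2%}")    # 97.78%  (Iris test set: 45 samples)
print(f"{161 / 171:.2%}")  # 94.15%  (Breast Cancer test set: 171 samples)
print(f"{159 / 171:.2%}")  # 92.98%
```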


Importing the libraries needed for Lasso

In [10]:
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

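Function to apply Lasso for feature selection: the L1 penalty shrinks the coefficients and sets some of them exactly to zero, and only the features with non-zero coefficients are kept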

In [11]:
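def select_features_with_lasso(X_train, X_test, y_train, alpha=0.01):
    # Standardize so the L1 penalty treats all features on the same scale
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)

    # Lasso is a regression model; the class labels are treated as numeric targets
    lasso = Lasso(alpha=alpha)
    lasso.fit(X_train_scaled, y_train)

    # Keep the features whose coefficients were not shrunk exactly to zero
    important_features = lasso.coef_ != 0
    if not important_features.any():
        raise ValueError(f"alpha={alpha} removed every feature; try a smaller alpha")

    # Return the selected columns of the original (unscaled) data
    X_train_selected = X_train[:, important_features]
    X_test_selected = X_test[:, important_features]

    return X_train_selected, X_test_selected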
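How aggressive the selection is depends entirely on `alpha`. A short sketch, not part of the original notebook, that refits Lasso on the standardized Breast Cancer training split for a few `alpha` values and counts the surviving coefficients; the grid of values and `max_iter=10_000` are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.3, random_state=42)
X_train_scaled = StandardScaler().fit_transform(X_train)

# Larger alpha = stronger L1 penalty = more coefficients driven exactly to zero
n_kept = {}
for alpha in [0.001, 0.01, 0.1, 1.0]:
    coef = Lasso(alpha=alpha, max_iter=10_000).fit(X_train_scaled, y_train).coef_
    n_kept[alpha] = int(np.count_nonzero(coef))
    print(f"alpha={alpha}: {n_kept[alpha]} of {X.shape[1]} features kept")
```

For a large enough `alpha` every coefficient becomes zero and no feature survives, so `alpha` acts as the knob between "keep everything" and "keep nothing".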

Apply Lasso for feature selection on the Iris dataset

In [12]:
iris_X_train_selected, iris_X_test_selected = select_features_with_lasso(iris_X_train, iris_X_test, iris_y_train)
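`select_features_with_lasso` returns the selected columns but never reports which features they are, although the task asks to identify the most significant ones. A minimal sketch that repeats the same split, scaling and `alpha=0.01` outside the function and maps the non-zero coefficients back to `iris.feature_names`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, _, y_train, _ = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)
lasso = Lasso(alpha=0.01).fit(StandardScaler().fit_transform(X_train), y_train)

# Boolean mask of kept features, translated into feature names
mask = lasso.coef_ != 0
selected = [name for name, keep in zip(iris.feature_names, mask) if keep]
print("Coefficients:", np.round(lasso.coef_, 3))
print("Selected features:", selected)
```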

Apply Gaussian Naive Bayes classifier to the selected Iris dataset

In [14]:
gnb_iris = GaussianNB()
gnb_iris.fit(iris_X_train_selected, iris_y_train)
iris_y_pred = gnb_iris.predict(iris_X_test_selected)
iris_accuracy = accuracy_score(iris_y_test, iris_y_pred)
print("\nIris Dataset Accuracy after Lasso feature selection: {:.2f}%".format(iris_accuracy * 100))


Iris Dataset Accuracy after Lasso feature selection: 97.78%


Apply Lasso for feature selection on the Breast Cancer dataset

In [15]:
bc_X_train_selected, bc_X_test_selected = select_features_with_lasso(bc_X_train, bc_X_test, bc_y_train)
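With 30 Breast Cancer features, a ranking is more informative than a bare mask. Because the inputs are standardized, the absolute size of a Lasso coefficient gives a rough (not causal) measure of each feature's weight in the linear fit. A minimal sketch, assuming the same split and `alpha=0.01` as the notebook, that lists the kept features from largest to smallest |coefficient|:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

bc = load_breast_cancer()
X_train, _, y_train, _ = train_test_split(bc.data, bc.target, test_size=0.3, random_state=42)
lasso = Lasso(alpha=0.01).fit(StandardScaler().fit_transform(X_train), y_train)

# Sort feature indices by decreasing absolute coefficient and drop the zeroed ones
order = np.argsort(-np.abs(lasso.coef_))
ranking = [(bc.feature_names[i], float(lasso.coef_[i])) for i in order if lasso.coef_[i] != 0]

print(f"{len(ranking)} of {len(bc.feature_names)} features kept")
for name, coef in ranking[:10]:
    print(f"{name:>25s} {coef:+.4f}")
```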

Apply Gaussian Naive Bayes classifier to the selected Breast Cancer dataset

In [16]:
gnb_bc = GaussianNB()
gnb_bc.fit(bc_X_train_selected, bc_y_train)
bc_y_pred = gnb_bc.predict(bc_X_test_selected)
bc_accuracy = accuracy_score(bc_y_test, bc_y_pred)
print("Breast Cancer Dataset Accuracy after Lasso feature selection: {:.2f}%".format(bc_accuracy * 100))


Breast Cancer Dataset Accuracy after Lasso feature selection: 92.98%
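Lasso is a regression model, so `select_features_with_lasso` fits a linear model to the integer class labels. A common alternative for classification, swapped in here and not the notebook's method, is L1-penalized logistic regression wrapped in `SelectFromModel`. The sketch below also puts scaling, selection and the classifier into one pipeline, so every step is fitted on the training split only; `C=0.1` is an untuned assumption:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

bc = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    bc.data, bc.target, test_size=0.3, random_state=42
)

# L1-penalized logistic regression as the sparse selector (C=0.1 is an assumed value)
selector = SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.1))
pipe = make_pipeline(StandardScaler(), selector, GaussianNB())
pipe.fit(X_train, y_train)

# make_pipeline names each step after its lowercased class name
kept = bc.feature_names[pipe.named_steps["selectfrommodel"].get_support()]
acc = accuracy_score(y_test, pipe.predict(X_test))
print(f"{len(kept)} features kept:", list(kept))
print(f"Accuracy: {acc:.2%}")
```

Whether this beats the Lasso-based selection on this split is not established here; it is offered as the more conventional choice when the target is a class label.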


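Summary: Gaussian Naive Bayes reached 97.78% test accuracy on Iris and 94.15% on Breast Cancer. Keeping only the features with non-zero Lasso coefficients (alpha=0.01) left the Iris accuracy unchanged at 97.78% and lowered the Breast Cancer accuracy slightly, to 92.98%.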


