# Cancer Cell Classification

This notebook trains classifiers to label samples from scikit-learn's built-in breast cancer dataset as malignant or benign. First, we need to import the necessary libraries.

In [2]:
import sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

Now, let's take a look at the data.

In [26]:
cancer_data = load_breast_cancer()

# print(cancer_data["feature_names"])
print(cancer_data["target_names"])

['malignant' 'benign']
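
The printed names do not show which number stands for which class or how many samples each class has. As a minimal sketch (the variable names `label_names` and `class_counts` are made up for this illustration), we can pair each name with its numeric label and count the samples:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer_data = load_breast_cancer()

# Position i in target_names is the name for numeric label i.
label_names = {int(i): str(name) for i, name in enumerate(cancer_data["target_names"])}
print(label_names)  # {0: 'malignant', 1: 'benign'}

# Count how many samples carry each numeric label.
class_counts = np.bincount(cancer_data["target"])
for label, count in enumerate(class_counts):
    print(f"{label_names[label]}: {count}")
```

The classes are not perfectly balanced, which is worth keeping in mind when reading an accuracy score later on.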


The dataset has two classes, malignant and benign. In the target column they are encoded as 0 and 1, respectively, following the order of the names printed above. Now, let's create the training and testing sets.

In [22]:
data, target = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(data, target, train_size=0.8, random_state=42)

target
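
To check what the split produced, we can look at the shapes of the resulting sets. The sketch below also passes `stratify=target`, which is an addition and not part of the cell above; it asks `train_test_split` to keep the malignant/benign ratio roughly the same in both sets. The `_s` suffix on the variable names is only there to avoid overwriting the original split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data, target = load_breast_cancer(return_X_y=True, as_frame=True)

# Same settings as the original split, plus stratify=target (an assumption
# added for illustration, not used in the rest of this notebook).
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    data, target, train_size=0.8, random_state=42, stratify=target
)

# Rows and columns in each feature set.
print(X_train_s.shape, X_test_s.shape)

# Share of benign samples (label 1) in each set; stratifying keeps these close.
print(y_train_s.mean(), y_test_s.mean())
```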


Then, we can use an SGD classifier, which fits a linear model (a linear SVM by default) using stochastic gradient descent, to predict whether each sample is malignant or benign.

In [14]:
from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(max_iter=10_000)
sgd_clf.fit(X_train, y_train)

predictions = sgd_clf.predict(X_test)
print(predictions)

[1 0 0 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1
 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 0 1 1 1 0 0 1 1 1 0 1 1 1 1 0 1 1
 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 0 1 0 1 1 1 1 0 1 1 0
 1 1 0]
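
The raw output is an array of numeric labels. As a small sketch, we can turn labels back into class names by indexing `target_names` with them. Here `first_predictions` is a hand-copied stand-in for the first four values printed above, so the example does not depend on retraining the model.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

target_names = load_breast_cancer()["target_names"]

# First four predicted labels, copied by hand from the printed output.
first_predictions = np.array([1, 0, 0, 1])

# Indexing the names array with the labels maps each label to its class name.
decoded = target_names[first_predictions]
print(decoded)  # ['benign' 'malignant' 'malignant' 'benign']
```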


Now that we have the predictions, we measure the accuracy of the classification model: the fraction of test labels that it predicted correctly.

In [16]:
from sklearn.metrics import accuracy_score

print(accuracy_score(y_test, predictions)) # 84.2%

0.8421052631578947
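
SGD-based models tend to be sensitive to the scale of the input features, and the features in this dataset span very different ranges. One common variant, which is not what the cells above ran, is to standardize the features before fitting. The sketch below assumes a `StandardScaler` step and a fixed `random_state=42` for repeatability; both are choices made for this illustration, and the name `scaled_sgd_clf` is hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data, target = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    data, target, train_size=0.8, random_state=42
)

# The scaler is fitted on the training set only, then applied to both sets
# by the pipeline, so no information leaks from the test set.
scaled_sgd_clf = make_pipeline(
    StandardScaler(),
    SGDClassifier(max_iter=10_000, random_state=42),
)
scaled_sgd_clf.fit(X_train, y_train)

scaled_accuracy = accuracy_score(y_test, scaled_sgd_clf.predict(X_test))
print(scaled_accuracy)
```

Wrapping both steps in a pipeline keeps the scaling and the classifier together, so `predict` applies the same transformation that was learned during `fit`.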


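The SGD classifier, trained on the raw features, reaches 84.2% accuracy in this run (96 of the 114 test samples). Because no `random_state` was passed to `SGDClassifier`, the exact figure can change from run to run. This score is decent, but we can do better. Now, let's try a Ridge classification model.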

In [27]:
from sklearn.linear_model import RidgeClassifier

ridge_clf = RidgeClassifier(max_iter=10_000)
ridge_clf.fit(X_train, y_train)

ridge_pred = ridge_clf.predict(X_test)
print(accuracy_score(y_test, ridge_pred))

0.956140350877193


Great! We got 95.6% accuracy using the Ridge classifier (109 of the 114 test samples), up from 84.2% with the SGD classifier.
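
Accuracy alone does not say which kind of mistake the model makes. As a rough sketch of one way to look closer, a confusion matrix splits the test results by true and predicted class. The code repeats the Ridge setup from above so it can run on its own; the name `cm` is just for this illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

data, target = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    data, target, train_size=0.8, random_state=42
)

ridge_clf = RidgeClassifier(max_iter=10_000)
ridge_clf.fit(X_train, y_train)
ridge_pred = ridge_clf.predict(X_test)

# Rows are true labels, columns are predicted labels (0 = malignant, 1 = benign).
cm = confusion_matrix(y_test, ridge_pred)
print(cm)

# The diagonal holds the correct predictions, so it ties back to accuracy.
print(cm.trace() / cm.sum(), accuracy_score(y_test, ridge_pred))
```

The off-diagonal entry in the first row counts malignant samples predicted as benign, which is usually the more costly error in a setting like this one.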