
# BREAST CANCER CLASSIFICATION

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
ds = pd.read_csv('bcan.csv')
x = ds.iloc[:,:-1].values
y = ds.iloc[:,-1].values

In [None]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.3,random_state=0)

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

## Logistic Regression Classification Model

Logistic regression is a machine learning algorithm used for binary classification tasks, where the goal is to predict one of two possible classes (e.g., Yes/No, Spam/Not Spam, etc.).

- Input: It takes one or more features as input. Numerical features can be used directly, while categorical features must first be encoded as numbers (e.g., one-hot encoding). For binary classification, the two classes are labeled 0 and 1.

- Output: The algorithm outputs a probability score between 0 and 1: the estimated likelihood that the input belongs to class 1. A probability close to 0 points to class 0, and one close to 1 points to class 1.

- Modeling: Logistic regression uses the sigmoid function to model the relationship between the input features and the predicted probability. The sigmoid function ensures that the output stays within the range (0, 1).

- Training: During training, the algorithm adjusts the model's parameters (weights and bias) to minimize the log loss (binary cross-entropy), which measures the mismatch between the predicted probabilities and the actual class labels in the training data.

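- Decision Boundary: A threshold on the predicted probability (usually 0.5) separates the two classes. If the predicted probability is greater than the threshold, the instance is classified as class 1; otherwise, it's classified as class 0. The points where the probability equals the threshold form the decision boundary, which for logistic regression is a hyperplane (a straight line in two dimensions) in feature space.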
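The pieces above (sigmoid, training, threshold) can be sketched from scratch in NumPy. This is a minimal illustration, not how scikit-learn's `LogisticRegression` works internally (it uses solvers such as `lbfgs` and applies L2 regularization by default). Here the training objective is assumed to be the plain log loss, minimized with batch gradient descent, and the two-cluster data is synthetic.

```python
import numpy as np


def sigmoid(z):
    # Squashes any real-valued score into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(y, p, eps=1e-12):
    # Binary cross-entropy; probabilities are clipped to avoid log(0)
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    # Batch gradient descent on the mean log loss, starting from all-zero weights
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)             # predicted P(class 1) for every sample
        w -= lr * X.T @ (p - y) / len(y)   # gradient of the mean log loss w.r.t. w
        b -= lr * np.mean(p - y)           # gradient w.r.t. the bias
    return w, b


def predict(X, w, b, threshold=0.5):
    # Class 1 when the predicted probability exceeds the threshold, else class 0
    return (sigmoid(X @ w + b) > threshold).astype(int)


# Hypothetical data: two well-separated 2-D clusters labeled 0 and 1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)), rng.normal(2.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

w, b = fit_logistic(X, y)
loss_before = log_loss(y, sigmoid(X @ np.zeros(2)))  # all-zero weights predict 0.5 everywhere
loss_after = log_loss(y, sigmoid(X @ w + b))
accuracy = np.mean(predict(X, w, b) == y)
print(f"log loss: {loss_before:.3f} -> {loss_after:.3f}, training accuracy: {accuracy:.2f}")
```

Starting from zero weights every prediction is 0.5, so the initial loss is log 2 ≈ 0.693; training lowers it as the weights separate the clusters.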


Applications:

Logistic regression has various applications in real-world scenarios, including:

- Email Spam Detection: To classify incoming emails as spam or not spam based on their content and features.

- Medical Diagnosis: For predicting whether a patient has a particular disease based on medical test results and patient characteristics.

- Credit Risk Assessment: To determine the likelihood of a customer defaulting on a loan based on their credit history and financial attributes.

- Customer Churn Prediction: To predict whether a customer will cancel their subscription or leave a service based on historical usage data and behavior.

- Online Ad Click Prediction: To predict the probability of a user clicking on an online advertisement based on their browsing behavior and demographics.

- Sentiment Analysis: To determine whether a piece of text (e.g., a review or tweet) expresses a positive or negative sentiment.


Multi-class Logistic Regression:

In the case of multi-class classification (more than two classes), logistic regression can be adapted to handle multiple classes by using the following approaches:

1. One-vs-Rest (OvR) or One-vs-All (OvA):

In this strategy, you train multiple binary logistic regression classifiers, where each classifier is responsible for distinguishing one class from all the others. For example, if you have three classes (A, B, C), you would train three binary classifiers: A vs. (B, C), B vs. (A, C), and C vs. (A, B). During prediction, the class with the highest probability from the individual classifiers is assigned as the final class label.

2. Softmax Regression (Multinomial Logistic Regression):

Softmax regression generalizes logistic regression to handle multiple classes directly. Instead of predicting a probability for just one class, it predicts a probability distribution across all classes. The probabilities are normalized using the softmax function, which ensures that they sum to 1. The class with the highest probability is considered the predicted class.
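Both strategies can be tried in scikit-learn. The sketch below first implements the softmax function by hand, then fits a one-vs-rest wrapper (`OneVsRestClassifier`, which trains one binary classifier per class) and a plain `LogisticRegression`, which with its default `lbfgs` solver fits a multinomial (softmax) model when there are more than two classes in current scikit-learn versions. The bundled iris dataset is an assumed stand-in for a multi-class problem; the breast cancer task in this notebook is binary.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier


def softmax(scores):
    # Subtracting the row max avoids overflow and leaves the result unchanged
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Raw class scores for one sample over three hypothetical classes
probs = softmax(np.array([[2.0, 1.0, 0.1]]))
print(probs.round(3), probs.sum())

X, y = load_iris(return_X_y=True)
# One-vs-Rest: one binary logistic regression per class
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
# Softmax (multinomial) logistic regression fitted directly
multinomial = LogisticRegression(max_iter=1000).fit(X, y)

print("binary classifiers in OvR:", len(ovr.estimators_))
print("training accuracy  OvR:", ovr.score(X, y), " multinomial:", multinomial.score(X, y))
```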

Applications of Multi-class Logistic Regression:

- Handwritten Digit Recognition: Classifying images of handwritten digits (0 to 9) into their corresponding classes.

- Species Classification: Identifying different species of plants or animals based on various features.

- Sentiment Analysis (with multiple classes): Categorizing text into several sentiment classes like positive, negative, and neutral.

- Image Classification: Identifying objects or scenes in images from a predefined set of classes.

In [None]:
from sklearn.linear_model import LogisticRegression
classifier1 = LogisticRegression()
classifier1.fit(x_train,y_train)

In [None]:
y_pred1 = classifier1.predict(x_test)
print(np.concatenate((y_pred1.reshape(len(y_pred1),1),y_test.reshape(len(y_test),1)),1))

In [None]:
from sklearn.metrics import confusion_matrix,accuracy_score
cm1 = confusion_matrix(y_test,y_pred1)
print(cm1)
accuracy_score(y_test,y_pred1)

[[105   3]
 [  3  60]]


0.9649122807017544

## KNN Classification Model

K-Nearest Neighbors (KNN) Classification:

K-Nearest Neighbors is a simple yet powerful classification algorithm used in supervised machine learning. It's a type of instance-based learning, where the algorithm doesn't build an explicit model but instead memorizes the training dataset. KNN makes predictions based on the majority class of its k-nearest neighbors in the feature space.

How KNN Works:

1. Data Collection and Preprocessing: Just like with other machine learning algorithms, you start by collecting and preprocessing your data. Ensure that your features are appropriately scaled and relevant for the classification task.

2. Choosing k: The "k" in KNN is the number of neighbors considered when making a prediction, and choosing it well is crucial. A small k makes predictions sensitive to noise, while a large k makes the model less flexible. Choose k with cross-validation or a similar validation technique.

3. Calculating Distances: KNN works by calculating the distances between data points in the feature space. Common metrics include Euclidean distance, Manhattan distance, and cosine distance (one minus cosine similarity). The right metric depends on the nature of your data.

4. Finding Nearest Neighbors: Once the distances are calculated, the algorithm identifies the k-nearest neighbors to the data point you're trying to classify.

5. Voting or Weighting: For classification tasks, the algorithm looks at the class labels of the k-nearest neighbors. It either counts the votes from each class and assigns the class with the most votes to the data point, or it can apply weighted voting, where closer neighbors have a greater influence on the prediction.

6. Making Predictions: After voting or weighting, the class label with the highest count (or weighted score) among the k-nearest neighbors is assigned to the data point being classified.
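The six steps above fit in a few lines of NumPy. This brute-force sketch computes the Euclidean distance to every training point and takes an unweighted majority vote, then checks its predictions against scikit-learn's `KNeighborsClassifier` (default settings: 5 neighbors, Euclidean distance, uniform weights). scikit-learn's bundled `load_breast_cancer` dataset is an assumed stand-in for `bcan.csv`; the split and scaling mirror the cells in this notebook.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler


def knn_predict(X_train, y_train, X_query, k=5):
    """Classify each query point by majority vote of its k nearest training points."""
    predictions = []
    for x in X_query:
        # Step 3: Euclidean distance from the query point to every training point
        distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # Step 4: indices of the k closest training points
        nearest = np.argsort(distances)[:k]
        # Steps 5-6: unweighted majority vote over their labels
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


# Stand-in data: scikit-learn's bundled Wisconsin diagnostic breast cancer dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

ours = knn_predict(X_train, y_train, X_test, k=5)
reference = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).predict(X_test)
print("agreement with KNeighborsClassifier:", np.mean(ours == reference))
print("test accuracy:", np.mean(ours == y_test))
```

Brute force costs one distance computation per training point for every query, which is why libraries can switch to tree-based indexes (KD-tree, ball tree) on larger datasets.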

Example:

Let's say you're working on a project to classify whether an email is spam or not spam (ham). You've collected a dataset with features like the frequency of certain words and the length of the email. To make a prediction for a new email, the KNN algorithm would:

1. Calculate distances between the new email's feature values and all the labeled emails in the dataset.

2. Identify the k-nearest labeled emails based on the shortest distances.

3. Count the number of spam and ham emails among the k-nearest neighbors.

4. Assign the class label (spam or ham) based on the majority count.

Unique Aspect:

One distinctive feature of KNN is its simplicity and versatility. It doesn't make strong assumptions about the underlying data distribution, which makes it suitable for a wide range of problems, and it can be used for both classification and regression. KNN also highlights the importance of the distance metric and of choosing the right k. The bias-variance trade-off is critical here: a low k tends to overfit (high variance), while a high k smooths the decision boundary and can underfit (high bias).
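The bias-variance trade-off is usually settled empirically, by cross-validating several values of k. A minimal sketch, again using scikit-learn's `load_breast_cancer` as an assumed stand-in for `bcan.csv`: the candidate values are arbitrary odd numbers (an odd k avoids tied votes in a two-class problem), and the scaler sits inside a pipeline so that each fold is scaled using only its own training portion.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

cv_scores = {}
for k in [1, 3, 5, 7, 9, 11, 15, 21]:
    # Scaling inside the pipeline: each fold's scaler is fit on that fold's training part only
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    # Mean accuracy over 5 cross-validation folds
    cv_scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(cv_scores, key=cv_scores.get)
for k, score in cv_scores.items():
    print(f"k={k:2d}  mean CV accuracy={score:.3f}")
print("best k:", best_k)
```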

In summary, K-Nearest Neighbors is a powerful and intuitive classification algorithm that makes predictions based on the majority class of its k-nearest neighbors. Its simplicity and flexibility have made it a popular choice in various domains, from image recognition to recommendation systems. Just remember to preprocess your data, choose an appropriate k value, and select a suitable distance metric for your specific problem.

In [None]:
from sklearn.neighbors import KNeighborsClassifier
# metric='minkowski' with p=2 is the Euclidean distance
classifier2 = KNeighborsClassifier(n_neighbors=5,metric='minkowski',p=2)
classifier2.fit(x_train,y_train)

In [None]:
y_pred2 = classifier2.predict(x_test)
print(np.concatenate((y_pred2.reshape(len(y_pred2),1),y_test.reshape(len(y_test),1)),1))

In [None]:
cm2 = confusion_matrix(y_test,y_pred2)
print(cm2)
accuracy_score(y_test,y_pred2)

[[107   1]
 [  6  57]]


0.9590643274853801

## Support Vector Machine Classification Model

Support Vector Machines (SVMs) are powerful and versatile machine learning algorithms used for both classification and regression tasks. They are supervised learning algorithms, meaning they require labeled data for training. SVMs work by finding a hyperplane that best separates the classes while maximizing the margin between them, and with kernel functions they can also handle data that isn't linearly separable in its original feature space.

Important Terms:

1. Hyperplane:

A hyperplane is a decision boundary that separates the data into different classes. In two-dimensional space it's a line; in three-dimensional space it's a plane; the idea generalizes to any number of dimensions.

2. Margin:

The margin is the distance between the hyperplane and the nearest data points from each class. SVM aims to maximize this margin.

3. Support Vectors:

These are the data points that lie closest to the decision boundary (hyperplane). They play a critical role in defining the hyperplane.

4. Kernel Trick:

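SVM can handle non-linear data by implicitly mapping the original features into a higher-dimensional space. A kernel function computes dot products in that space directly from the original features, so the transformed features never have to be constructed. The SVM then finds a linear hyperplane in the transformed space, which corresponds to a non-linear boundary in the original space.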
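The hyperplane, margin, and support vectors can be inspected directly on a fitted linear SVM. In the sketch below (synthetic, clearly separable data; a large `C` is assumed to approximate a hard-margin SVM), `coef_` and `intercept_` give the hyperplane w·x + b = 0, the distance from the hyperplane to the nearest points is 1/||w||, and the decision function evaluates to roughly +1 or -1 on the support vectors.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical, clearly separable 2-D data: two tight clusters far apart
X = np.vstack([rng.normal(-3.0, 0.5, size=(30, 2)), rng.normal(3.0, 0.5, size=(30, 2))])
y = np.array([0] * 30 + [1] * 30)

# A large C penalizes margin violations heavily, approximating a hard-margin SVM
clf = SVC(kernel="linear", C=1000).fit(X, y)

w = clf.coef_[0]          # normal vector of the hyperplane w.x + b = 0
b = clf.intercept_[0]
margin = 1.0 / np.linalg.norm(w)  # distance from the hyperplane to the nearest points

print("support vectors:", len(clf.support_vectors_), "of", len(X))
print("margin:", round(margin, 3))
# Support vectors lie on the margin boundaries, where w.x + b is about +1 or -1
print(clf.decision_function(clf.support_vectors_).round(3))
```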

Advantages:

- SVMs are effective in high-dimensional spaces, making them suitable for tasks like text categorization and image classification.

- They can handle both linear and non-linear data through the use of different kernel functions.

- SVMs are less prone to overfitting when compared to some other complex models.

- They work well when there's a clear margin of separation between classes.

Disadvantages:

- SVMs can be computationally expensive, especially with large datasets.

- Choosing the right kernel function and tuning hyperparameters can be challenging.

- Interpreting the results can be less intuitive compared to simpler models like logistic regression.

- SVMs might struggle with noisy data or overlapping classes.

Applications:

1. Image Recognition: They are used for object recognition, face detection, and image classification.

2. Text Classification: SVMs are effective in sentiment analysis, topic classification, and spam detection.

3. Bioinformatics: They're used in protein classification and gene expression analysis.

4. Finance: SVMs have been applied to stock price forecasting and fraud detection.

5. Medical Diagnosis: They assist in disease classification and prognosis.

Example:

Imagine you have a dataset of flower features (like petal length and petal width) and you want to classify the flowers into species (e.g., iris-setosa, iris-versicolor, iris-virginica). By training an SVM model on this data, the algorithm finds hyperplanes that separate the flowers into their respective classes. Because there are more than two classes, scikit-learn's `SVC` combines several binary classifiers using a one-vs-one scheme.

In [None]:
from sklearn.svm import SVC
classifier3 = SVC(kernel='linear')
classifier3.fit(x_train,y_train)

In [None]:
y_pred3 = classifier3.predict(x_test)
print(np.concatenate((y_pred3.reshape(len(y_pred3),1),y_test.reshape(len(y_test),1)),1))

In [None]:
cm3 = confusion_matrix(y_test,y_pred3)
print(cm3)
accuracy_score(y_test,y_pred3)

[[100   8]
 [  3  60]]


0.935672514619883

## Kernel Support Vector Machine Classification Model

A Kernel SVM is an enhancement of the traditional SVM that introduces the concept of kernel functions. These functions allow the SVM to transform the original feature space into a higher-dimensional space where the data might become linearly separable. This transformation enables the SVM to handle complex, non-linear relationships between features.
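A small numeric check makes this "transformation" concrete. For 2-D inputs, the degree-2 polynomial kernel K(x, z) = (x·z)² equals an ordinary dot product after the explicit mapping φ(x) = (x₁², √2·x₁x₂, x₂²), so an SVM can work with the three-dimensional features without ever computing them. The vectors below are arbitrary examples.

```python
import numpy as np


def phi(v):
    # Explicit degree-2 feature map for a 2-D vector: (x1^2, sqrt(2)*x1*x2, x2^2)
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


def poly_kernel(x, z):
    # The same quantity computed directly from the original 2-D vectors
    return np.dot(x, z) ** 2


x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(np.dot(phi(x), phi(z)), poly_kernel(x, z))

# The identity holds for any pair of 2-D vectors, not just this one
rng = np.random.default_rng(0)
pairs = rng.normal(size=(100, 2, 2))
matches = all(np.isclose(np.dot(phi(a), phi(b)), poly_kernel(a, b)) for a, b in pairs)
print(matches)
```

Expanding the square shows why: (x₁z₁ + x₂z₂)² = x₁²z₁² + 2x₁x₂z₁z₂ + x₂²z₂², which is exactly φ(x)·φ(z).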

Important Terms:

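1. Kernel Function:

A kernel function computes the dot product between two data points in the transformed space, without ever constructing the transformed features explicitly. Common kernel functions include:

- Linear Kernel: The plain dot product of the original features. It's equivalent to the standard SVM without any transformation.

- Polynomial Kernel: Raises a scaled and shifted dot product to a power, which corresponds to working with polynomial combinations of the original features.

- Gaussian (RBF) Kernel: Measures similarity as a Gaussian (bell-shaped) function of the distance between two points, decaying exponentially with the squared Euclidean distance. It corresponds to an infinite-dimensional feature space.

- Sigmoid Kernel: Applies the hyperbolic tangent (an S-shaped function) to a scaled and shifted dot product. It borrows its form from neural-network activation functions, and an SVM with this kernel is often likened to a two-layer neural network.

2. Regularization:

As in the standard SVM, the regularization parameter C controls the trade-off between maximizing the margin and minimizing classification errors on the training data. A higher C penalizes misclassified training points more heavily, giving a narrower margin and a higher risk of overfitting; a lower C tolerates some errors in exchange for a wider margin.

3. Margin and Support Vectors:

These terms retain their meanings from standard SVM. Support vectors in the transformed space still define the optimal hyperplane.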
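The kernels described above correspond to the formulas used by scikit-learn's `sklearn.metrics.pairwise` helpers: linear ⟨x, z⟩; polynomial (γ⟨x, z⟩ + r)^d; RBF exp(-γ||x - z||²); sigmoid tanh(γ⟨x, z⟩ + r), where γ, r and d are the `gamma`, `coef0` and `degree` parameters. The sketch computes each by hand and compares it with the library function on a few random points. The parameter values are arbitrary, and `gamma` is passed explicitly because `SVC`'s default (`gamma='scale'`) differs from these helpers' default of 1/n_features.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel,
    polynomial_kernel,
    rbf_kernel,
    sigmoid_kernel,
)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))  # 4 hypothetical samples with 3 features
gamma, coef0, degree = 0.5, 1.0, 3

dots = X @ X.T                                                  # pairwise dot products
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances

manual = {
    "linear": dots,
    "polynomial": (gamma * dots + coef0) ** degree,
    "rbf": np.exp(-gamma * sq_dists),
    "sigmoid": np.tanh(gamma * dots + coef0),
}
library = {
    "linear": linear_kernel(X),
    "polynomial": polynomial_kernel(X, degree=degree, gamma=gamma, coef0=coef0),
    "rbf": rbf_kernel(X, gamma=gamma),
    "sigmoid": sigmoid_kernel(X, gamma=gamma, coef0=coef0),
}
for name in manual:
    print(name, np.allclose(manual[name], library[name]))
```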

Advantages:

- Non-linearity Handling: Kernel SVMs can handle complex non-linear data patterns, which standard SVMs cannot do without transformation.

- Flexible Kernel Choices: Different kernel functions can be used based on the problem's nature and requirements.

- Powerful in High Dimensions: Kernel SVMs can work well in high-dimensional spaces, such as text data, where relationships might be intricate.

Disadvantages:

- Hyperparameter Tuning: Selecting the right kernel and its parameters can be challenging and require experimentation.

- Computational Intensity: Kernel transformations can be computationally expensive, especially with large datasets.

- Overfitting: If not properly tuned, kernel SVMs can overfit to noise or small datasets.

Applications:

Kernel SVMs are particularly useful in cases where data has non-linear relationships:

1. Image Segmentation: Identifying objects in images where boundaries are not linear.

2. Text Sentiment Analysis: Analyzing sentiment in text data where the correlation between words might not be linear.

3. Biomedical Data: In areas like genomics, where the relationship between genes might not be linear.

Example:

Consider a scenario where you're working with a dataset of medical test results, and you want to predict whether a patient has a certain disease or not. The relationship between different test results might not be linear, but by using a suitable kernel function (e.g., Gaussian kernel), you can transform the data to a space where a linear boundary is effective for classification.
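The same idea can be checked on synthetic data. `make_circles` produces one class arranged as a ring around the other, a pattern no straight line can separate; the sketch compares a linear-kernel `SVC` with an RBF-kernel `SVC` on a held-out split. The dataset parameters are arbitrary choices for illustration, not medical data.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic non-linear data: one class forms a ring around the other
X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Same data, two kernels: a straight-line boundary vs. a Gaussian (RBF) kernel
linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
rbf_acc = SVC(kernel="rbf").fit(X_train, y_train).score(X_test, y_test)
print(f"linear kernel: {linear_acc:.2f}, RBF kernel: {rbf_acc:.2f}")
```

On the breast cancer data in this notebook the gap is much smaller (the recorded test accuracies are about 0.936 for the linear kernel and 0.971 for RBF), which suggests that data is already close to linearly separable after scaling.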

In [None]:
from sklearn.svm import SVC
classifier4 = SVC(kernel='rbf')
classifier4.fit(x_train,y_train)

In [None]:
y_pred4 = classifier4.predict(x_test)
print(np.concatenate((y_pred4.reshape(len(y_pred4),1),y_test.reshape(len(y_test),1)),1))

In [None]:
cm4 = confusion_matrix(y_test,y_pred4)
print(cm4)
accuracy_score(y_test,y_pred4)

[[106   2]
 [  3  60]]


0.9707602339181286

## Naive Bayes Classification Model

Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem, with the "naive" assumption of independence between features. Despite this assumption, Naive Bayes often performs surprisingly well in practice and is widely used in various applications like text classification, spam filtering, and sentiment analysis.

Important Terms:

1. Bayes' Theorem:

A fundamental concept in probability theory that describes the probability of an event, based on prior knowledge of conditions that might be related to the event.

2. Prior Probability:

The initial probability of a certain class being true, before considering any evidence.

3. Likelihood:

The likelihood measures how likely the observed data is given the class.

4. Posterior Probability:

The probability of a certain class being true after considering the observed data.

5. Conditional Independence:

The "naive" assumption that the features are conditionally independent of each other given the class label.
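Put together, these terms give the rule the model follows: by Bayes' theorem the posterior is proportional to the prior times the likelihood, and under conditional independence the likelihood factorizes into one term per feature, P(c | x₁, ..., xₙ) ∝ P(c) · P(x₁ | c) · ... · P(xₙ | c). The notebook's `GaussianNB` models each P(xᵢ | c) as a normal distribution with a per-class mean and variance. The sketch below reimplements that in NumPy (working with log-probabilities to avoid underflow) and compares its predictions with scikit-learn's. The small variance floor mirrors `GaussianNB`'s `var_smoothing` parameter, and `load_breast_cancer` is an assumed stand-in for `bcan.csv`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.naive_bayes import GaussianNB


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    classes = np.unique(y)
    # Prior P(c): fraction of training samples in each class
    priors = np.array([np.mean(y == c) for c in classes])
    # Per-class, per-feature mean and variance of the normal likelihood P(x_i | c)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes])
    # Tiny variance floor so no feature divides by zero
    variances = variances + var_smoothing * X.var(axis=0).max()
    return classes, priors, means, variances


def predict_gaussian_nb(model, X):
    classes, priors, means, variances = model
    diff = X[:, None, :] - means[None, :, :]  # shape (n_samples, n_classes, n_features)
    # Naive step: the joint log-likelihood is a plain sum of per-feature log-likelihoods
    log_likelihood = -0.5 * (np.log(2 * np.pi * variances) + diff ** 2 / variances).sum(axis=2)
    # log P(c) + log P(x | c); the shared evidence term P(x) doesn't change the argmax
    log_posterior = np.log(priors) + log_likelihood
    return classes[np.argmax(log_posterior, axis=1)]


X, y = load_breast_cancer(return_X_y=True)
model = fit_gaussian_nb(X, y)
ours = predict_gaussian_nb(model, X)
reference = GaussianNB().fit(X, y).predict(X)
print("agreement with GaussianNB:", np.mean(ours == reference))
```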

Advantages:

- Simple and Fast: Naive Bayes is relatively simple to understand and implement. It's particularly fast for training and prediction.

- Handles High Dimensions: Naive Bayes can handle high-dimensional data effectively, making it useful for text classification tasks.

- Works with Small Datasets: Even with limited training data, Naive Bayes can still produce reasonable results.

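- Interpretable Results: Naive Bayes outputs class probabilities, and each feature's contribution to a prediction can be inspected through its per-class likelihood. Because of the independence assumption, however, these probabilities are often poorly calibrated, so treat them as a rough indication of confidence rather than exact values.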

Disadvantages:

- Strong Independence Assumption: The naive assumption of feature independence may not hold in all real-world scenarios.

- Sensitive to Feature Choice: The choice of features can heavily influence the model's performance.

- Limited Expressiveness: Due to its simplicity, Naive Bayes might not capture complex relationships in data as well as more advanced models.

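- Poor Handling of Rare Events: In count-based variants (e.g., models of word counts), if a feature value never appears with a particular class in the training data, its estimated likelihood for that class is zero. That zero wipes out the whole posterior for the class regardless of the other features, so Naive Bayes cannot predict it. Additive (Laplace) smoothing is the usual fix.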
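The zero-frequency problem is easiest to see with word counts. In the sketch below, the counts, vocabulary and message are all hypothetical. Without smoothing, each class has a word it never saw in training, so both unnormalized posteriors collapse to 0 and the message can't be classified; additive smoothing with α = 1 (Laplace smoothing) adds one pseudo-count per word and class. This affects count-based variants such as scikit-learn's `MultinomialNB` and `CategoricalNB`, which expose the smoothing strength as `alpha`; the `GaussianNB` used in this notebook models continuous features with normal densities instead.

```python
import numpy as np

# Hypothetical word counts per class from a tiny training set (vocabulary of 3 words)
vocab = ["free", "meeting", "prize"]
counts = {
    "spam": np.array([4, 0, 3]),  # "meeting" never appeared in spam
    "ham": np.array([1, 5, 0]),   # "prize" never appeared in ham
}
priors = {"spam": 0.5, "ham": 0.5}


def word_likelihoods(class_counts, alpha):
    # Additive smoothing: (count + alpha) / (total + alpha * vocabulary size)
    return (class_counts + alpha) / (class_counts.sum() + alpha * len(class_counts))


def unnormalized_posteriors(message_words, alpha):
    # P(c) times the product of P(word | c) over the words in the message
    scores = {}
    for label, class_counts in counts.items():
        probs = word_likelihoods(class_counts, alpha)
        score = priors[label]
        for word in message_words:
            score *= probs[vocab.index(word)]
        scores[label] = score
    return scores


message = ["free", "prize", "meeting"]
unsmoothed = unnormalized_posteriors(message, alpha=0.0)
smoothed = unnormalized_posteriors(message, alpha=1.0)
print("no smoothing:", unsmoothed)
print("Laplace (alpha=1):", smoothed)
```

With smoothing, spam scores 0.5 · 0.5 · 0.4 · 0.1 = 0.01 and ham scores 0.5 · (2/9) · (1/9) · (6/9) ≈ 0.0082, so the message is labeled spam.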

Applications:

Naive Bayes finds applications in text classification and other areas:

1. Text Classification: Naive Bayes is often used for spam filtering, sentiment analysis, and topic categorization in natural language processing.

2. Medical Diagnosis: It can assist in predicting medical conditions based on patient symptoms and test results.

3. Recommendation Systems: Naive Bayes can contribute to building simple recommendation systems.

4. Document Classification: It's used to categorize documents into predefined classes, like news articles or research papers.

Example:

Imagine you're building a spam email filter. By training a Naive Bayes model on a dataset of labeled emails (spam or not spam) with their word frequencies, the model learns the conditional probabilities of words given the class labels. When presented with a new email, the model calculates the probability of it being spam or not spam based on the words it contains and their corresponding probabilities.

As another example, suppose you want to estimate the probability of getting good marks from two features:
1. How much you study
2. The teacher's partiality

Naive Bayes assumes these two features are independent of each other given the outcome. It estimates a separate probability for each feature and multiplies them, without considering whether one feature influences the other. This simplification is useful in some cases.

In [None]:
from sklearn.naive_bayes import GaussianNB
classifier5 = GaussianNB()
classifier5.fit(x_train,y_train)

In [None]:
y_pred5 = classifier5.predict(x_test)
print(np.concatenate((y_pred5.reshape(len(y_pred5),1),y_test.reshape(len(y_test),1)),1))

In [None]:
cm5 = confusion_matrix(y_test,y_pred5)
print(cm5)
accuracy_score(y_test,y_pred5)

0.9122807017543859

## Decision Tree Classification Model

In [None]:
from sklearn.tree import DecisionTreeClassifier
classifier6 = DecisionTreeClassifier(criterion='entropy',random_state=0)
classifier6.fit(x_train,y_train)

In [None]:
y_pred6 = classifier6.predict(x_test)
print(np.concatenate((y_pred6.reshape(len(y_pred6),1),y_test.reshape(len(y_test),1)),1))

In [None]:
cm6 = confusion_matrix(y_test,y_pred6)
print(cm6)
accuracy_score(y_test,y_pred6)

[[105   3]
 [  3  60]]


0.9649122807017544

## Random Forest Classification Model

Random Forest is an ensemble learning algorithm that combines multiple decision trees to create a more robust and accurate classification model. It's designed to reduce overfitting and improve the generalization performance of individual decision trees.

Important Terms:

1. Ensemble Learning:

A technique that combines multiple individual models to create a stronger overall model. In the case of Random Forest, the individual models are decision trees.

2. Decision Tree:

A tree-like structure where each internal node represents a decision based on a certain feature, leading to branches representing the possible outcomes or classes.

3. Bagging (Bootstrap Aggregating):

The process of creating multiple subsets (bags) of the training data by random sampling with replacement. Each subset is then used to train a separate decision tree.

4. Feature Randomization:

At each node of each decision tree, the best split is searched for among a random subset of the features rather than all of them. This reduces the correlation between trees and improves diversity.

5. Voting:

In classification, each decision tree "votes" for a class, and the class with the most votes becomes the predicted class of the Random Forest.
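The three ingredients listed above (bagging, feature randomization, voting) can be hand-rolled on top of scikit-learn's `DecisionTreeClassifier` to see how they fit together. This is a simplified sketch, not the library's `RandomForestClassifier`: it uses a hard majority vote, whereas scikit-learn's implementation averages the trees' predicted class probabilities. The tree count (25) is arbitrary, and the bundled `load_breast_cancer` dataset is an assumed stand-in for `bcan.csv`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(25):
    # Bagging: a bootstrap sample of the training rows, drawn with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Feature randomization: each split considers only sqrt(n_features) random features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=int(rng.integers(1_000_000)))
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Voting: each tree predicts 0 or 1; with an odd number of trees the majority is never tied
votes = np.array([tree.predict(X_test) for tree in trees])
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)

single_tree_acc = np.mean(trees[0].predict(X_test) == y_test)
ensemble_acc = np.mean(ensemble_pred == y_test)
print(f"one tree: {single_tree_acc:.3f}, 25-tree vote: {ensemble_acc:.3f}")
```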

Advantages:

- Reduces Overfitting: Random Forest reduces overfitting by averaging the predictions of multiple trees, which reduces the risk of any individual tree capturing noise in the data.

- Handles High Dimensions: It can handle a large number of features and high-dimensional data effectively.

- Robust to Outliers: Random Forest is less sensitive to outliers compared to individual decision trees.

- Feature Importance: It provides a measure of feature importance, helping to identify which features contribute the most to the classification.
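On a fitted scikit-learn forest, these scores are available as `feature_importances_`. The sketch uses `load_breast_cancer` (whose feature names are bundled with it) as an assumed stand-in for `bcan.csv`; the tree count and seed are arbitrary. These are impurity-based importances, which can favor features with many distinct values; `sklearn.inspection.permutation_importance` is a common alternative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(data.data, data.target)

# Impurity-based importances: one non-negative value per feature, normalized to sum to 1
importances = forest.feature_importances_
top = importances.argsort()[::-1][:5]  # indices of the five most important features
for i in top:
    print(f"{data.feature_names[i]}: {importances[i]:.3f}")
```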

Disadvantages:

- Complexity: Random Forest can be more complex to understand and interpret compared to single decision trees.

- Computationally Intensive: Training multiple decision trees can be computationally expensive, especially with large datasets.

- Possible Bias: If one class dominates the data, Random Forest may be biased towards that class.

Applications:

- Image Classification: It's used for object recognition, facial recognition, and scene classification.

- Healthcare: Predicting medical conditions, identifying diseases, and prognosis.

- Finance: Credit risk assessment, fraud detection, and stock market prediction.

- Ecology: Species classification based on environmental factors.

Example:

Consider a scenario where you're building a model to classify whether a customer will purchase a product based on their demographic and browsing history. By training a Random Forest model on historical customer data, the ensemble of decision trees can provide a more accurate prediction by considering various features and their interactions.

In [None]:
from sklearn.ensemble import RandomForestClassifier
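# random_state=0 fixes the bootstrap samples and feature draws, so reruns produce the same forest
classifier7 = RandomForestClassifier(n_estimators=10,criterion='entropy',random_state=0)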
classifier7.fit(x_train,y_train)

In [None]:
y_pred7 = classifier7.predict(x_test)
print(np.concatenate((y_pred7.reshape(len(y_pred7),1),y_test.reshape(len(y_test),1)),1))

In [None]:
cm7 = confusion_matrix(y_test,y_pred7)
print(cm7)
accuracy_score(y_test,y_pred7)

[[104   4]
 [  5  58]]


0.9473684210526315