Scikit-learn Classification Examples

Classification is a supervised learning technique in which we train a model on labeled data to make predictions on unseen instances. The labeled data consists of input variables (features) and output variables (labels or classes). The goal is to learn a mapping function that can accurately predict the class labels of new instances.

When it comes to classification, there are two main types: binary classification and multiclass classification. In binary classification, the target variable has only two classes, such as spam or not spam. In multiclass classification, the target variable can have more than two classes, like classifying images into different categories.
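Scikit-learn can report which kind of problem a target represents. This is a minimal sketch using `sklearn.utils.multiclass.type_of_target`; the label lists are made-up examples, not data from this notebook.

```python
from sklearn.utils.multiclass import type_of_target

# Hypothetical labels: exactly two distinct values -> binary classification
spam_labels = ["spam", "not spam", "spam", "not spam"]
# Hypothetical labels: more than two distinct values -> multiclass classification
image_labels = ["cat", "dog", "bird", "dog"]

print(type_of_target(spam_labels))
print(type_of_target(image_labels))
```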

Popular Classification Algorithms in Scikit-Learn
Scikit-Learn offers a rich collection of classification algorithms that can be used for different types of data and problem domains. Let’s explore some of the popular ones:
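All scikit-learn classifiers share the same interface: `fit` to learn from training data, `predict` to label new samples, and `score` for mean accuracy. The sketch below relies on that shared interface to compare three classifiers on the built-in Iris data. The choice of models, `random_state=0`, and the 25% test split are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

models = {
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=3),
    "Gaussian naive Bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # learn from the training split
    scores[name] = model.score(X_test, y_test)   # mean accuracy on the test split
    print(f"{name}: {scores[name]:.3f}")
```

Because every estimator exposes the same methods, swapping one algorithm for another is usually a one-line change.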

Data



Every scikit-learn workflow starts with data. You give the data to a model, the model learns from it, and you can then use the model to make predictions. What does "giving data to the model" mean in practice? Typically, a dataset that is useful for prediction is split into two parts, called X and y.
X contains everything used to make the prediction (the features), and y contains the value we want to predict (the target). In house price prediction, for example, y holds the house prices and X holds information about each house. Once the data is split this way, we pass it to the model, whose job is to learn the patterns that let it predict y from X.
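Here is a minimal pandas sketch of the X/y split described above. The house data and its column names are hypothetical, invented only for illustration. House prices are a regression target, but a classification dataset is split the same way, with y holding class labels instead.

```python
import pandas as pd

# Hypothetical house data; the column names are illustrative only
houses = pd.DataFrame({
    "bedrooms": [2, 3, 4, 3],
    "area_sqft": [850, 1200, 2000, 1500],
    "age_years": [30, 15, 5, 10],
    "price": [150_000, 210_000, 390_000, 265_000],
})

X = houses.drop(columns="price")  # everything used to make the prediction
y = houses["price"]               # the value we want to predict

print(X.shape)  # (4, 3)
print(y.shape)  # (4,)
```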


In [None]:
from sklearn.datasets import load_iris
import pandas as pd

In [3]:
# Load a CSV file into a DataFrame (this DataFrame is not used by the examples below)
df = pd.read_csv("HistoricalPrices.csv")

In [4]:
# Load the built-in Iris dataset that ships with scikit-learn
iris = load_iris()

In [5]:
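x = iris.data    # feature matrix: 150 samples x 4 measurements
y = iris.target  # class labels: 0, 1, 2 for the three iris species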

In [6]:
# train_test_split(*arrays, ...) splits arrays or matrices into random train and test subsets
# *arrays      => lists, NumPy arrays, or DataFrames with the same number of rows
# test_size    => float: fraction of samples in the test set; int: absolute number of test samples
# train_size   => same convention as test_size (defaults to the complement of test_size)
# random_state => seed for the shuffling applied before the split, for reproducible results
# shuffle      => bool, whether to shuffle the data before splitting (default True)
# stratify     => if given (usually the labels), both splits keep the same class proportions
# Returns a list containing the train and test parts of every input array
from sklearn.model_selection import train_test_split

# Example for train_test_split
data = [20, 4, 12, 9, 0, 10]
labels = ["A", "B", "B", "A", "C", "A"]

# Split the list in half
train, test = train_test_split(data, test_size=0.5)

print("==== Splitting train and test ====")
print("=> Train <= ", train)
print("=> Test <= ", test)

# Hold out 20% of the list for testing (with 6 items, 20% rounds up to 2 test samples)
train_1, test_1 = train_test_split(data, test_size=0.2)
print("==== Splitting train and test by 20% ====")
print("=> Train <= ", train_1)
print("=> Test <= ", test_1)

# Split two lists at once; rows stay aligned (default test_size is 0.25)
train_data, test_data, train_label, test_label = train_test_split(data, labels)
print("=== Multiple list split ===")
print("=> Train data <= ", train_data)
print("=> Test data <= ", test_data)
print("=> Train label <= ", train_label)
print("=> Test label <= ", test_label)

==== Splitting train and test ====
=> Train <=  [0, 12, 20]
=> Test <=  [4, 10, 9]
==== Splitting train and test by 20% ====
=> Train <=  [10, 9, 20, 4]
=> Test <=  [12, 0]
=== Multiple list split ===
=> Train data <=  [12, 10, 20, 9]
=> Test data <=  [4, 0]
=> Train label <=  ['B', 'A', 'A', 'A']
=> Test label <=  ['B', 'C']
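The `stratify` and `random_state` parameters listed in the comments are easiest to see on labeled data. This sketch uses the Iris data, which has 50 samples per class. The values `test_size=0.2` and `random_state=42` are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions of y in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
print(np.bincount(y_test))  # [10 10 10]

# Re-running with the same random_state reproduces exactly the same split
_, X_test_again, _, _ = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
print(np.array_equal(X_test, X_test_again))  # True
```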


In [7]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)

In [8]:
from sklearn.neighbors import KNeighborsClassifier
# k-nearest neighbors: a sample gets the majority class of its 3 closest training samples
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(x_train, y_train)

In [9]:
y_pred = knn.predict(x_test)
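The KNN cells stop at `y_pred` without checking how good the predictions are. Below is a sketch of one way to evaluate them: `accuracy_score` on the held-out split, and 5-fold cross-validation to compare a few values of `n_neighbors`. The candidate k values are an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=1)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)

# Fraction of test samples whose predicted class matches the true class
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy (k=3): {test_accuracy:.3f}")

# Compare a few values of k with 5-fold cross-validation on the training data only
cv_scores = {}
for k in (1, 3, 5, 7, 9):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), x_train, y_train, cv=5)
    cv_scores[k] = scores.mean()
    print(f"k={k}: mean CV accuracy {cv_scores[k]:.3f}")
```

Choosing k on the training folds keeps the test split untouched for the final accuracy estimate.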

Logistic regression

In [10]:
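from sklearn.datasets import load_diabetes
# The diabetes dataset has a continuous target, so it suits regression rather than classification;
# it is not used by the classifiers below
x, y = load_diabetes(return_X_y=True)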

In [11]:
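# Logistic Regression Algorithm
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25)
# One-vs-rest: one binary logistic regression per class
# (the multi_class="ovr" argument is deprecated in recent scikit-learn releases)
clf = OneVsRestClassifier(LogisticRegression(C=1.0, solver="lbfgs", max_iter=200))
# fit() returns the fitted estimator, not a score, so accuracy is computed on the test set
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print("====== Accuracy ======", accuracy)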
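Besides hard labels, `LogisticRegression` exposes `predict_proba`, which returns one probability per class for each sample. This sketch uses the default multiclass handling, not the one-vs-rest wrapper. The raised `max_iter=200` and `random_state=0` are assumptions chosen to keep the run quiet and repeatable.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0)

# max_iter is raised from the default 100 so lbfgs has room to converge on unscaled data
clf = LogisticRegression(C=1.0, max_iter=200)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)    # mean accuracy on the held-out data
proba = clf.predict_proba(X_test[:3])   # one row per sample, one column per class
print("Accuracy:", round(accuracy, 3))
print(proba.round(3))
print(proba.sum(axis=1))                # each row sums to 1
```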



In [12]:
# Predict the class of a new flower
import numpy as np
# Feature order: sepal length, sepal width, petal length, petal width (cm)
new_array = np.array([[5.1, 3.5, 1.4, 0.2]])
prediction = clf.predict(new_array)
print("===== Prediction ======", prediction)
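`predict` returns integer class indices. The Iris bunch's `target_names` array maps them back to species names. This is a self-contained sketch that trains on the full dataset purely for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
clf = LogisticRegression(max_iter=200).fit(iris.data, iris.target)

# Feature order: sepal length, sepal width, petal length, petal width (cm)
new_array = np.array([[5.1, 3.5, 1.4, 0.2]])
prediction = clf.predict(new_array)

# target_names turns the integer class index back into a species name
print(iris.feature_names)
print(iris.target_names[prediction])
```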



In [13]:
# Artificial neural network (ANN): a multi-layer perceptron with two hidden layers of 10 units
from sklearn.neural_network import MLPClassifier
neural_network = MLPClassifier(hidden_layer_sizes=(10, 10), activation="relu")
neural_network.fit(X_train, y_train)
new_prediction = neural_network.predict(X_test)
print("===== New Pred =====", new_prediction)
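Multi-layer perceptrons are sensitive to feature scaling, so standardizing inputs usually helps. This sketch chains `StandardScaler` and `MLPClassifier` in a pipeline. That way the scaler is fit on the training split only. `max_iter=1000` and `random_state=0` are assumptions for convergence and repeatability.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# The pipeline fits StandardScaler on the training data, then applies it before the MLP
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(10, 10), activation="relu",
                  max_iter=1000, random_state=0),
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print("MLP test accuracy:", round(accuracy, 3))
```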

===== New Pred ===== [2 2 2 2 1 0 2 1 2 2 0 0 2 2 2 2 2 1 1 0 0 2 0 2 0 0 0 2 2 1 1 2 2 2 2 1 0
 2]




In [14]:
# Support Vector Machine (SVM), via the SVC class
from sklearn.svm import SVC
svm = SVC(kernel="linear", C=1.0)
svm.fit(X_train, y_train)
svm_prediction = svm.predict(X_test)
print("===== SVM prediction ======", svm_prediction)

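The `kernel` argument controls the shape of the SVM decision boundary. This sketch compares three built-in kernels with 5-fold cross-validation rather than a single split. `C=1.0` simply mirrors the cell above, and the other settings are scikit-learn defaults.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Compare kernels with 5-fold cross-validation rather than a single split
kernel_scores = {}
for kernel in ("linear", "rbf", "poly"):
    svm = SVC(kernel=kernel, C=1.0)
    kernel_scores[kernel] = cross_val_score(svm, X, y, cv=5).mean()
    print(f"{kernel}: mean accuracy {kernel_scores[kernel]:.3f}")
```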


In [15]:
# Naive Bayes (NB)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
iris = load_iris()
X_train,X_test,Y_train,Y_test = train_test_split(iris.data,iris.target,test_size=0.5)
clf = GaussianNB()
clf.fit(X_train,Y_train)
predict = clf.predict(X_test)
accuracy= accuracy_score(Y_test,predict)
print("===== Accuracy =====",accuracy)

===== Accuracy ===== 1.0
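An accuracy of 1.0 on one random split does not guarantee the model is perfect, because a different split may score lower. This sketch estimates GaussianNB's accuracy over ten cross-validation folds. The choice of `cv=10` is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Ten different train/test partitions instead of one random split
scores = cross_val_score(GaussianNB(), X, y, cv=10)
print("Per-fold accuracy:", scores.round(3))
print(f"Mean {scores.mean():.3f} (std {scores.std():.3f})")
```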


In [16]:
# ExtraTreesClassifier
# Returns an array of predicted class labels.
from sklearn.ensemble import ExtraTreesClassifier
clf = ExtraTreesClassifier(n_estimators=100)
clf.fit(X_train,Y_train)
prediction = clf.predict(X_test)
print("===== prediction ===== ",prediction)

===== prediction =====  [1 2 0 2 2 0 0 1 0 1 2 2 0 1 0 1 2 2 1 1 2 2 2 2 0 1 0 0 1 0 0 0 0 0 1 2 1
 2 1 2 0 2 1 0 2 1 0 0 1 2 1 0 1 1 2 2 0 2 2 0 2 2 2 0 0 0 2 1 2 2 0 2 0 1
 2]
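Tree ensembles such as `ExtraTreesClassifier` expose `feature_importances_` after fitting, which indicates how much each feature contributed to the splits. This is a sketch on the full Iris data. `random_state=0` is only there to make runs repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier

iris = load_iris()
clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
clf.fit(iris.data, iris.target)

# Impurity-based importances: one value per feature, normalized to sum to 1
importances = dict(zip(iris.feature_names, clf.feature_importances_))
for name, value in sorted(importances.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.3f}")
```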


In [17]:
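# Gradient Boosting Trees (GBT)
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
import pandas as pd

# Assumes both CSV files contain the feature columns plus a 'target' label column
X_train = pd.read_csv("train.csv")
X_test = pd.read_csv("test.csv")
Y_train = X_train.pop("target")  # remove the label from the features
Y_test = X_test.pop("target")

# GradientBoostingClassifier needs numeric input, so one-hot encode the string columns
X_train = pd.get_dummies(X_train)
# Give the test set exactly the training columns (categories unseen in training become 0)
X_test = pd.get_dummies(X_test).reindex(columns=X_train.columns, fill_value=0)

clf = GradientBoostingClassifier(n_estimators=100)
clf.fit(X_train, Y_train)
prediction = clf.predict(X_test)
print(" ===== prediction ===== ", prediction)
accuracy = accuracy_score(Y_test, prediction)
print("Accuracy", accuracy)

Without the encoding step, fitting fails with ValueError: could not convert string to float: ' State-gov', because the CSV contains text categories that GradientBoostingClassifier cannot convert to numbers.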
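The ValueError shown above comes from a string category such as ' State-gov' reaching a model that only accepts numeric features. One way to handle this inside scikit-learn is a `ColumnTransformer` that one-hot encodes the text columns ahead of the model. The tiny census-style DataFrame below is hypothetical, not the notebook's train.csv. The column names `workclass`, `age`, and `target` are assumptions as well.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical census-style rows: one string column, one numeric column, one label
train = pd.DataFrame({
    "workclass": [" State-gov", " Private", " Private", " Self-emp", " State-gov", " Private"],
    "age": [39, 50, 38, 53, 28, 37],
    "target": [0, 1, 0, 1, 0, 1],
})
test = pd.DataFrame({
    "workclass": [" Private", " Federal-gov"],  # " Federal-gov" never appears in training
    "age": [45, 31],
})

X_train = train.drop(columns="target")
y_train = train["target"]

# One-hot encode the string column and pass the numeric column through unchanged.
# handle_unknown="ignore" encodes unseen categories as all zeros instead of raising.
preprocess = ColumnTransformer(
    [("categorical", OneHotEncoder(handle_unknown="ignore"), ["workclass"])],
    remainder="passthrough",
)
model = make_pipeline(preprocess, GradientBoostingClassifier(n_estimators=100, random_state=0))
model.fit(X_train, y_train)
prediction = model.predict(test)
print(prediction)
```

Keeping the encoder inside the pipeline means the test data is transformed with the categories learned from the training data, so train and test columns always line up.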