# Brief overview of OOP and Classes [3] in ML

- Si_ja
- 2019-05-20
- https://github.com/Si-ja

This file is a bit different. It takes the idea of classes and applies it to Machine Learning functions, specifically KNN and Support Vector Machines (SVMs). The goal is a class that can prepare models of different kinds and produce a log of how well each one performed.

We will not build any algorithms from scratch; instead we will use the standard scikit-learn package. This leads to a slightly odd pattern: inside the methods we define on our class, we call methods of scikit-learn model objects. That happens because scikit-learn's models are themselves implemented as classes.

This might not be what you want to build in general. My idea, however, was to have a class that can work with several ML training approaches and, specifically, generate logs about the results for me.

In [1]:
class MLs_log:
    """This class creates an object that can run a few scikit-learn models and output a log.
    Models featured:
    --KNN
    --SVM
    
    TODO: this could easily be expanded, but the idea is to show how easily we can swap the data."""
    
    def __init__(self, data, targets, fun_app = None, dependency = None, accuray_res = None):
        """Initialize the object. For now we only need the data and targets; everything else starts as None."""
        self.data = data                #Data we will be using (initially only this matters)
        self.targets = targets          #The labels of our data (i.e. target values)
        self.fun_app = fun_app          #What type of algorithm we applied
        self.dependency = dependency    #Fraction of the data that was used for training
        self.accuray_res = accuray_res  #What is the accuracy of our model

In [2]:
#Let's import the iris dataset (we will later experiment with something different...maybe)
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
y = iris.target

In [3]:
#Let's put our data into our newly created class
record_1 = MLs_log(X, y)
#So no error, that's good

In [4]:
#Let's see what our class now has
print("Dataset:\n", record_1.data[:5])
print()
print("Targets:\n", record_1.targets[:5])
print()
print("Type of the function applied:\n", record_1.fun_app)
print()
print("How much data we used for training:\n", record_1.dependency)
print()
print("What is the accuracy of our model:\n", record_1.accuray_res)
#As we can see, some information, like our data and target values, has actually made it into the object.
#Other things, like the type of function we used or the accuracy of the model, have not, because we haven't done anything with them yet.

Dataset:
 [[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]

Targets:
 [0 0 0 0 0]

Type of the function applied:
 None

How much data we used for training:
 None

What is the accuracy of our model:
 None


In [5]:
#Let's update our class
#So far we only had initialization; now let's add a few functions.
#We won't be adding more classes, though, but rather methods.

class MLs_log:
    """This class creates an object that can run a few scikit-learn models and output a log.
    Models featured:
    --KNN
    --SVM"""
    
    def __init__(self, data, targets, fun_app = None, dependency = None, accuray_res = None):
        """Initialize the object. For now we only need the data and targets; everything else starts as None."""
        self.data = data                #Data we will be using (initially only this matters)
        self.targets = targets          #The labels of our data (i.e. target values)
        self.fun_app = fun_app          #What type of algorithm we applied
        self.dependency = dependency    #Fraction of the data that was used for training
        self.accuray_res = accuray_res  #What is the accuracy of our model
        
    #Let's implement the KNN algorithm here
    def ml_knn(self, trainer = 0.33, nn = 3, rand_st = 666):
        """This method lets us use the KNN classifier on our prepared data.
        *trainer - fraction of the data held out for validation (passed to train_test_split as test_size). Default = 0.33.
        *nn - number of neighbours to use. Default = 3.
        *rand_st - random state for the data shuffling. Default = 666."""
        
        #Import the KNN classifier and the function that shuffles and splits the data
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.model_selection import train_test_split
        
        #Record that the method applied is KNN
        self.fun_app = "For this instance a KNN classifier has been used."
        
        #Since we know how the data is split, store the fraction used for training
        self.dependency = 1 - trainer
        
        #Prepare the data
        X_train, X_test, y_train, y_test = train_test_split(self.data, self.targets, test_size = trainer, random_state = rand_st)
        
        #Now we can apply the KNN algorithm
        #First create the KNN classifier
        neigh = KNeighborsClassifier(n_neighbors=nn)
        #fit it on the prepared training data
        neigh.fit(X_train, y_train)
        #and store its accuracy on the held-out data
        self.accuray_res = neigh.score(X_test, y_test)
        
        #Return everything we have, in case the caller needs it;
        #the attributes on the object are already updated either way
        return self.fun_app, self.dependency, self.accuray_res

In [6]:
#Let's run this one
record_2 = MLs_log(X, y)
#What data do we have now?
print("Dataset:\n", record_2.data[:5])
print()
print("Targets:\n", record_2.targets[:5])
print()
print("Type of the function applied:\n", record_2.fun_app)
print()
print("How much data we used for training:\n", record_2.dependency)
print()
print("What is the accuracy of our model:\n", record_2.accuray_res)
#Same data, as we have not done anything to it yet. So let's actually do something.

Dataset:
 [[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]

Targets:
 [0 0 0 0 0]

Type of the function applied:
 None

How much data we used for training:
 None

What is the accuracy of our model:
 None


In [7]:
#We will not pass any arguments for this one, just let's see what happens
record_2.ml_knn()
#and now also let's run the previous small report
print("Dataset:\n", record_2.data[:5])
print()
print("Targets:\n", record_2.targets[:5])
print()
print("Type of the function applied:\n", record_2.fun_app)
print()
print("How much data we used for training:\n", record_2.dependency)
print()
print("What is the accuracy of our model:\n", record_2.accuray_res)
#Cool, we actually have an updated report more or less

Dataset:
 [[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]

Targets:
 [0 0 0 0 0]

Type of the function applied:
 For this instance a KNN classifier has been used.

How much data we used for training:
 0.6699999999999999

What is the accuracy of our model:
 1.0
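
The `0.6699999999999999` in the output above is not a bug in the class. It is ordinary binary floating-point behaviour of `1 - 0.33`. A minimal sketch of one way to tidy it, assuming two decimals are enough for a split fraction (the name `train_fraction` is only illustrative):

```python
# 0.33 has no exact binary representation, so the subtraction is slightly off
train_fraction = 1 - 0.33
print(train_fraction)            # 0.6699999999999999

# Rounding when the value is stored gives the number we expect to read
print(round(train_fraction, 2))  # 0.67
```

Inside `ml_knn` this would amount to storing `round(1 - trainer, 2)` in `self.dependency`, if a clean value in the log matters more than the raw result.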


In [8]:
#Let's expand our class even further with SVM, as planned
class MLs_log:
    """This class creates an object that can run a few scikit-learn models and output a log.
    Models featured:
    --KNN
    --SVM"""
    
    def __init__(self, data, targets, fun_app = None, dependency = None, accuray_res = None):
        """Initialize the object. For now we only need the data and targets; everything else starts as None."""
        self.data = data                #Data we will be using (initially only this matters)
        self.targets = targets          #The labels of our data (i.e. target values)
        self.fun_app = fun_app          #What type of algorithm we applied
        self.dependency = dependency    #Fraction of the data that was used for training
        self.accuray_res = accuray_res  #What is the accuracy of our model
        
    def ml_knn(self, trainer = 0.33, nn = 3, rand_st = 666):
        """This method lets us use the KNN classifier on our prepared data.
        *trainer - fraction of the data held out for validation (passed to train_test_split as test_size). Default = 0.33.
        *nn - number of neighbours to use. Default = 3.
        *rand_st - random state for the data shuffling. Default = 666."""
        
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.model_selection import train_test_split
        
        self.fun_app = "For this instance a KNN classifier has been used."
        self.dependency = 1 - trainer
        X_train, X_test, y_train, y_test = train_test_split(self.data, self.targets, test_size = trainer, random_state = rand_st)
        neigh = KNeighborsClassifier(n_neighbors=nn)
        neigh.fit(X_train, y_train)
        self.accuray_res = neigh.score(X_test, y_test)
        return self.fun_app, self.dependency, self.accuray_res
        
    def ml_svm(self, trainer = 0.33, rand_st = 666):
        """This is the method for SVM.
        *trainer - fraction of the data held out for validation (passed to train_test_split as test_size). Default = 0.33.
        *rand_st - random state for the data shuffling. Default = 666."""
        from sklearn.svm import SVC
        from sklearn.model_selection import train_test_split
        
        #Naturally, you can add more parameters here to make the method more useful,
        #as sklearn's SVMs have more settings that can be set
        self.fun_app = "For this instance a SVM classifier has been used."
        self.dependency = 1 - trainer
        X_train, X_test, y_train, y_test = train_test_split(self.data, self.targets, test_size = trainer, random_state = rand_st)
        clf = SVC(gamma='auto')
        clf.fit(X_train, y_train) 
        self.accuray_res = clf.score(X_test, y_test)
        
        return self.fun_app, self.dependency, self.accuray_res        

In [9]:
#Let's do the same routine
record_3 = MLs_log(X, y)
#and use the svm method right away
record_3.ml_svm()
print("Dataset:\n", record_3.data[:5])
print()
print("Targets:\n", record_3.targets[:5])
print()
print("Type of the function applied:\n", record_3.fun_app)
print()
print("How much data we used for training:\n", record_3.dependency)
print()
print("What is the accuracy of our model:\n", record_3.accuray_res)
#And it updates as well

Dataset:
 [[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]

Targets:
 [0 0 0 0 0]

Type of the function applied:
 For this instance a SVM classifier has been used.

How much data we used for training:
 0.6699999999999999

What is the accuracy of our model:
 1.0
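
`ml_knn` and `ml_svm` differ only in which model they create, so the shared steps (split, fit, score, log) could live in one helper that accepts any object with `fit()` and `score()`. The sketch below is only an illustration of that idea: `fit_and_log` and `MajorityClassifier` are hypothetical names, and a hand-written shuffle stands in for `train_test_split` so the example runs without scikit-learn.

```python
import random

class MajorityClassifier:
    """Toy stand-in (hypothetical): always predicts the most common training label."""
    def fit(self, X, y):
        self.label_ = max(set(y), key=list(y).count)
        return self

    def score(self, X, y):
        # Share of samples whose true label equals the majority label
        return sum(1 for t in y if t == self.label_) / len(y)

def fit_and_log(estimator, data, targets, test_size=0.33, seed=666):
    """Split, fit and score any object exposing fit() and score(); return a log dict."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)          # reproducible shuffle
    n_test = int(round(len(idx) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train = [data[i] for i in train_idx]
    y_train = [targets[i] for i in train_idx]
    X_test = [data[i] for i in test_idx]
    y_test = [targets[i] for i in test_idx]
    estimator.fit(X_train, y_train)
    return {
        "model": estimator,
        "name": type(estimator).__name__,
        "train_fraction": len(train_idx) / len(idx),
        "accuracy": estimator.score(X_test, y_test),
    }

data = [[i] for i in range(10)]
targets = [0] * 7 + [1] * 3
log = fit_and_log(MajorityClassifier(), data, targets)
print(log["name"], log["train_fraction"], log["accuracy"])
```

scikit-learn classifiers such as `KNeighborsClassifier` and `SVC` expose the same `fit`/`score` methods, so they should be usable in place of the toy class.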


In [10]:
#And let's update it a little bit more to get more useful details out of this class for the reporting
class MLs_log:
    """This class creates an object that can run a few scikit-learn models and output a log.
    Models featured:
    --KNN
    --SVM"""
    
    def __init__(self, data, targets, fun_app = None, dependency = None, accuray_res = None):
        """Initialize the object. For now we only need the data and targets; everything else starts as None."""
        self.data = data                #Data we will be using (initially only this matters)
        self.targets = targets          #The labels of our data (i.e. target values)
        self.fun_app = fun_app          #What type of algorithm we applied
        self.dependency = dependency    #Fraction of the data that was used for training
        self.accuray_res = accuray_res  #What is the accuracy of our model
          
    def ml_knn(self, trainer = 0.33, nn = 3, rand_st = 666):
        """This method lets us use the KNN classifier on our prepared data.
        *trainer - fraction of the data held out for validation (passed to train_test_split as test_size). Default = 0.33.
        *nn - number of neighbours to use. Default = 3.
        *rand_st - random state for the data shuffling. Default = 666."""

        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.model_selection import train_test_split
        self.fun_app = "KNN Classifier model is prepared."
        self.dependency = 1 - trainer
        X_train, X_test, y_train, y_test = train_test_split(self.data, self.targets, test_size = trainer, random_state = rand_st)
        neigh = KNeighborsClassifier(n_neighbors=nn)
        neigh.fit(X_train, y_train)
        self.accuray_res = neigh.score(X_test, y_test)
        return self.fun_app, self.dependency, self.accuray_res
        
    def ml_svm(self, trainer = 0.33, rand_st = 666):
        """This is the method for SVM.
        *trainer - fraction of the data held out for validation (passed to train_test_split as test_size). Default = 0.33.
        *rand_st - random state for the data shuffling. Default = 666."""
        from sklearn.svm import SVC
        from sklearn.model_selection import train_test_split
        
        self.fun_app = "SVM classifier has been prepared."
        self.dependency = 1 - trainer
        X_train, X_test, y_train, y_test = train_test_split(self.data, self.targets, test_size = trainer, random_state = rand_st)
        clf = SVC(gamma='auto')
        clf.fit(X_train, y_train) 
        self.accuray_res = clf.score(X_test, y_test)
        return self.fun_app, self.dependency, self.accuray_res        
    
    def report(self):
        """Print a report based on which model was run (KNN or SVM), or a blank report if none was run."""
        import numpy as np
        
        #We give a blank report if nothing has been done with our data yet.
        if self.fun_app is None:
            print("----------------------------------------------------------")
            print("At the current moment nothing was done with the data.")
            print("----------------------------------------------------------")
        
        else:
            print("----------------------------------------------------------")
            print("The state of your model: {}".format(self.fun_app))
            print("----------------------------------------------------------")
            print("You used {}% of original data to train your classifier.".format(np.round(self.dependency,2)*100))
            print("----------------------------------------------------------")
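            #Note: accuray_res comes from score(X_test, y_test), i.e. the held-out split, not the training data
            print("On the training data your classifier showed {}% accuracy.".format(np.round(self.accuray_res,4)*100))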
            print("----------------------------------------------------------")

In [11]:
#Let's do the same routine
record_4 = MLs_log(X, y)
record_4.ml_svm()
print("Dataset:\n", record_4.data[:5])
print()
print("Targets:\n", record_4.targets[:5])
print()
print("Type of the function applied:\n", record_4.fun_app)
print()
print("How much data we used for training:\n", record_4.dependency)
print()
print("What is the accuracy of our model:\n", record_4.accuray_res)

Dataset:
 [[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]

Targets:
 [0 0 0 0 0]

Type of the function applied:
 SVM classifier has been prepared.

How much data we used for training:
 0.6699999999999999

What is the accuracy of our model:
 1.0


In [12]:
#Naturally the data got prepared, but can we generate a report for it?
record_4.report()
#Bam, it fully works.

----------------------------------------------------------
The state of your model: SVM classifier has been prepared.
----------------------------------------------------------
You used 67.0% of original data to train your classifier.
----------------------------------------------------------
On the training data your classifier showed 100.0% accuracy.
----------------------------------------------------------


In [13]:
#What if we never call any of the methods, so our data is never prepared?
record_5 = MLs_log(X, y)
record_5.report()
#Again, also good, showing we have nothing to work with. 

----------------------------------------------------------
At the current moment nothing was done with the data.
----------------------------------------------------------


In [14]:
#Let's take our class a step further by making it save the model so that it can be reused
#So far we only had numbers describing what our model can do
#Let's bring practicality into it
class MLs_log:
    """This class creates an object that can run a few scikit-learn models and output a log.
    Models featured:
    --KNN
    --SVM
    
    TODO: this could easily be expanded, but the idea is to show how easily we can swap the data."""
    
    def __init__(self, data, targets, fun_app = None, dependency = None, accuray_res = None, model = None):
        """Initialize the object. For now we only need the data and targets; everything else starts as None."""
        self.data = data                #Data we will be using (initially only this matters)
        self.targets = targets          #The labels of our data (i.e. target values)
        self.fun_app = fun_app          #What type of algorithm we applied
        self.dependency = dependency    #Fraction of the data that was used for training
        self.accuray_res = accuray_res  #What is the accuracy of our model
        self.model = model              #The trained model, kept so that we can reuse it later
          
    def ml_knn(self, trainer = 0.33, nn = 3, rand_st = 666):
        """This method lets us use the KNN classifier on our prepared data.
        *trainer - fraction of the data held out for validation (passed to train_test_split as test_size). Default = 0.33.
        *nn - number of neighbours to use. Default = 3.
        *rand_st - random state for the data shuffling. Default = 666."""
        
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.model_selection import train_test_split

        self.fun_app = "KNN Classifier model is prepared."
        self.dependency = 1 - trainer
        X_train, X_test, y_train, y_test = train_test_split(self.data, self.targets, test_size = trainer, random_state = rand_st)
        neigh = KNeighborsClassifier(n_neighbors = nn)
        neigh.fit(X_train, y_train)
        
        #Let's save our model
        self.model = neigh
        self.accuray_res = neigh.score(X_test, y_test)
        return self.fun_app, self.dependency, self.accuray_res, self.model
        
    def ml_svm(self, trainer = 0.33, rand_st = 666):
        """This is the method for SVM.
        *trainer - fraction of the data held out for validation (passed to train_test_split as test_size). Default = 0.33.
        *rand_st - random state for the data shuffling. Default = 666."""
        from sklearn.svm import SVC
        from sklearn.model_selection import train_test_split
        
        self.fun_app = "SVM classifier has been prepared."
        self.dependency = 1 - trainer
        X_train, X_test, y_train, y_test = train_test_split(self.data, self.targets, test_size = trainer, random_state = rand_st)
        clf = SVC(gamma='auto')
        clf.fit(X_train, y_train) 
        
        #And save it here as well
        self.model = clf
        self.accuray_res = clf.score(X_test, y_test)    
        return self.fun_app, self.dependency, self.accuray_res, self.model  
    
  
    def report(self):
        """Print a report based on which model was run (KNN or SVM), or a blank report if none was run."""
        import numpy as np
        if self.fun_app is None:
            print("----------------------------------------------------------")
            print("At the current moment nothing was done with the data.")
            print("----------------------------------------------------------")
        
        else:
            print("----------------------------------------------------------")
            print("The state of your model: {}".format(self.fun_app))
            print("----------------------------------------------------------")
            print("You used {}% of original data to train your classifier.".format(np.round(self.dependency,2)*100))
            print("----------------------------------------------------------")
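            #Note: accuray_res comes from score(X_test, y_test), i.e. the held-out split, not the training data
            print("On the training data your classifier showed {}% accuracy.".format(np.round(self.accuray_res,4)*100))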
            print("----------------------------------------------------------")

In [15]:
#We initiate a blank object
record_6 = MLs_log(X, y)
record_6.report()
print()
#We train our model
record_6.ml_knn()
record_6.report()
#But we need a function to use it, right?
#WRONG! It is saved in the object, so we can just access it through the .model attribute.
#There, you can view it.
print(record_6.model) #<- our model is saved and can be reused

----------------------------------------------------------
At the current moment nothing was done with the data.
----------------------------------------------------------

----------------------------------------------------------
The state of your model: KNN Classifier model is prepared.
----------------------------------------------------------
You used 67.0% of original data to train your classifier.
----------------------------------------------------------
On the training data your classifier showed 100.0% accuracy.
----------------------------------------------------------
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=None, n_neighbors=3, p=2,
           weights='uniform')


In [16]:
#It looks a bit messy in my opinion, but it works:
#X <- from our previously prepared iris dataset
record_6.model.predict(X[48:52])
#The call is messy because our model is stored in an attribute of our object,
#while the ability to classify comes from the scikit-learn package.
#If the code is fully yours, you could make this more convenient, e.g. by wrapping it in a new method and calling that.

#You don't need classes all the time. Use functions as well, if that's enough to do your work.
#Here, 6 objects have been generated so far, each holding several attributes. Much more information is stored in them,
#but all I wanted was the report data.

#Always think about what satisfies your needs. For reports like these, I thought working with classes might be more beneficial
#than generating a report every time a classifier is trained.

array([0, 0, 1, 1])
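
The "wrap it in a new method" idea from the comments above could look roughly like this. It is a minimal sketch, not part of the original class: `ModelHolder` is a hypothetical, trimmed-down stand-in for `MLs_log`, and `AlwaysZero` is a toy model used only so the example runs without scikit-learn.

```python
class ModelHolder:
    """Hypothetical, trimmed-down stand-in for MLs_log that only keeps the model."""
    def __init__(self, model=None):
        self.model = model

    def predict(self, samples):
        """Delegate to the stored model, with a clear error if nothing was trained."""
        if self.model is None:
            raise RuntimeError("No model trained yet; call ml_knn() or ml_svm() first.")
        return self.model.predict(samples)

class AlwaysZero:
    """Toy model: predicts class 0 for every sample."""
    def predict(self, samples):
        return [0 for _ in samples]

holder = ModelHolder(AlwaysZero())
print(holder.predict([[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]]))  # [0, 0]
```

With such a method on the class, the call would read `record_6.predict(X[48:52])` instead of reaching into `.model` directly.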

In [17]:
#To show a different version of a report - with bad accuracy and a different split:
record_7 = MLs_log(X, y)
record_7.ml_knn(0.95, 5) #0.95 is the fraction held out for testing (so only 5% is used for training); 5 is the number of neighbours
record_7.report()

----------------------------------------------------------
The state of your model: KNN Classifier model is prepared.
----------------------------------------------------------
You used 5.0% of original data to train your classifier.
----------------------------------------------------------
On the training data your classifier showed 31.47% accuracy.
----------------------------------------------------------


In [18]:
#And let me show that this works with different data, not just iris:
#(the targets need to be classes, though)
from sklearn.datasets import load_diabetes
diab = load_diabetes()
X_di = diab.data
y_di = diab.target
#and as you can see, we do not have classes but continuous values (maybe OLS would deal better with this)
print(y_di[:5])

[151.  75. 141. 206. 135.]
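
Since the targets need to be classes, a quick sanity check before training can catch data like this. The function below is a rough heuristic of my own, not a scikit-learn rule: `looks_like_class_labels` and its `max_classes` threshold are assumptions chosen for illustration.

```python
def looks_like_class_labels(targets, max_classes=20):
    """Heuristic: whole-number values and only a few distinct ones."""
    values = [float(t) for t in targets]
    all_whole = all(v.is_integer() for v in values)
    return all_whole and len(set(values)) <= max_classes

# Iris-style labels: a handful of integer classes
print(looks_like_class_labels([0, 0, 1, 2, 1]))                          # True

# A made-up stand-in for a continuous target: 45 distinct values
print(looks_like_class_labels([float(v) for v in range(25, 250, 5)]))    # False
```

A check like this has to look at the whole target array; the first five diabetes values alone are whole numbers and would pass.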


In [19]:
record_8 = MLs_log(X_di, y_di)
record_8.report()
print()
record_8.ml_knn()
record_8.report()
#Not the best data to show how it works, but hey, it works.

----------------------------------------------------------
At the current moment nothing was done with the data.
----------------------------------------------------------

----------------------------------------------------------
The state of your model: KNN Classifier model is prepared.
----------------------------------------------------------
You used 67.0% of original data to train your classifier.
----------------------------------------------------------
On the training data your classifier showed 0.6799999999999999% accuracy.
----------------------------------------------------------
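
The `0.6799999999999999%` above comes from multiplying an already rounded float by 100. One way around it is to let string formatting do the percentage conversion, and to return the report as text so it can also be written to a log file. This is a sketch under those assumptions; `format_report` is a hypothetical helper, and its wording says "held-out test data" because that is what `score(X_test, y_test)` measures.

```python
def format_report(fun_app, dependency, accuracy):
    """Return the report as text instead of printing it (hypothetical helper)."""
    line = "-" * 58  # separator line
    if fun_app is None:
        return "\n".join([line, "At the current moment nothing was done with the data.", line])
    return "\n".join([
        line,
        f"The state of your model: {fun_app}",
        line,
        # The % format spec multiplies by 100 and rounds in a single step
        f"You used {dependency:.1%} of original data to train your classifier.",
        line,
        f"On the held-out test data your classifier showed {accuracy:.2%} accuracy.",
        line,
    ])

print(format_report("KNN Classifier model is prepared.", 1 - 0.33, 0.0068))
```

The returned string can be printed as before or appended to a file, which fits the logging purpose of the class.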
