# Object-Oriented Programming

- in DATA1030, we mostly used functions or simply wrote lines of code into cells to organize our work
    - most software engineers would be outraged at how badly we organized code in 1030 :)
    - it's an OK start if you are new to Python and coding in general
    - but it is time to do much, much better!
- OOP is a method of structuring your code into reusable units called classes
- a class is a template and it does two things:
    - it describes how information should be organized (the attributes)
    - it defines methods which perform operations or interact with other classes or outside info/data
- when you fill in the template, you create an object
    - an object is an instance of a class
    

### Most Python packages are organized around the idea of OOP
- sklearn, pandas, numpy, matplotlib, tensorflow, keras, etc. are all object oriented
- sklearn's LogisticRegression is a class
    - the 'data' in the LogisticRegression class are the hyperparameters (and, after fitting, the learned coefficients)
    - some methods are .fit(), .predict(), .score(), etc. which all take outside info (e.g., some X and y)
    - once you set the hyperparameters of a logistic regression model, you create a logistic regression object
    - you can create as many objects as you want!
- pandas's DataFrame is a class
    - it describes that tabular data in a DataFrame should have rows and columns, row and column indices, etc.
    - some methods are .head(), .describe(), etc.; .shape is an attribute (no parentheses)
    - once you read in csv/excel/sql data into a pd.DataFrame, you create a DataFrame object
    - you can create as many objects as you want!
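The points above can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn and pandas are installed; the toy data and the variable names (`clf_weak_reg`, `clf_strong_reg`) are made up for this example.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# two independent objects of the same class, each with its own hyperparameters
clf_weak_reg = LogisticRegression(C=10.0)
clf_strong_reg = LogisticRegression(C=0.1)

# toy data, made up for illustration
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

# methods take outside info (X and y); only this one object is trained
clf_weak_reg.fit(X, y)
y_pred = clf_weak_reg.predict(X)

# a DataFrame object: .head() is a method, .shape is an attribute
df = pd.DataFrame({'feature': [0.0, 1.0, 2.0, 3.0], 'label': y})
print(df.head(2))
print(df.shape)
```

Training `clf_weak_reg` leaves `clf_strong_reg` untouched, which is why a fresh object per hyperparameter combination works well in cross-validation.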

In [4]:
class Pokemon:
    def __init__(self,name):
        '''
        initialize the object
        arguments passed as input will be bound to the object as attributes
        __init__ can also contain other commands
        __init__ runs when the object is created; self is the object itself and holds all of its attributes
        '''
        self.name = name
        
    def say_hi(self):
        '''
        the first argument passed to all methods is self
        the method can access (and modify if necessary) the attributes through self
        '''
        print('Hi, I am '+self.name+'.')
        if self.name == 'Pikachu':
            print('Pika-pika-chuuu!')
            


p1 = Pokemon('Pikachu') # one instance of the Pokemon class, an object
p1.say_hi()

p2 = Pokemon('Charmander') # another instance of the Pokemon class, another object
p2.say_hi()


Hi, I am Pikachu.
Pika-pika-chuuu!
Hi, I am Charmander.
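Methods can also modify the attributes of the object they are called on. A minimal sketch, assuming a hypothetical `level` attribute and `level_up` method that are not part of the class above:

```python
class Pokemon:
    def __init__(self, name, level=1):
        self.name = name
        self.level = level  # hypothetical attribute, for illustration only

    def level_up(self):
        # modifies the attribute of this particular object only
        self.level += 1


p1 = Pokemon('Pikachu')
p2 = Pokemon('Charmander')

p1.level_up()
p1.level_up()

print(p1.level, p2.level)  # 3 1
```

Each object carries its own copy of the attributes, so changing `p1` does not affect `p2`.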


## Typical class structure in this course

In [None]:
class ML_algorithm:
    '''
    The class of a supervised ML algorithm, a mathematical function which converts feature values into predictions.
    It minimizes a loss function using some optimization algorithm in .train().
    It uses the trained model to provide predictions.
    '''
    def __init__(self, hyperparameter1, hyperparameter2, ...):
        '''
        the attributes of the model
        '''
        # hyperparameters like regularization, kernel width, max depth, etc.
        # hyperparameters are not updated by the methods of the class!
        # when you do cross-validation, you'd create a new object for each hyperparameter combination
        self.hyperparameter1 = hyperparameter1
        self.hyperparameter2 = hyperparameter2
        ...
        # you would initialize any other model parameters here (e.g., weights in linear and logistic regression)
        # these parameters are updated by .train() to minimize the loss
        self.parameters = ...
        
    def train(self, X, Y):
        '''
        Trains the ML model by finding the optimal set of parameters using an optimization algorithm.
        In sklearn, the equivalent method is called .fit()
        @params:
            X: 2D Numpy array where each row contains an example, padded by 1 column for the bias
            Y: 1D Numpy array containing the corresponding values for each example
        @return:
            None - self.parameters will be updated, nothing needs to be returned
        '''
        # [TODO]


    def predict(self, X):
        '''
        Returns predictions of the model on a set of examples X.
        @params:
            X: a 2D Numpy array where each row contains an example, padded by 1 column for the bias
        @return:
            A 1D Numpy array with one element for each row in X containing the predicted value.
        '''
        # [TODO]
        return y_pred


    def loss(self, X, Y):
        '''
        Returns the loss function on some dataset (X, Y).
        @params:
            X: 2D Numpy array where each row contains an example, padded by 1 column for the bias
            Y: 1D Numpy array containing the corresponding values for each example
        @return:
            A float number which is the loss of the model on the dataset
        '''
        # [TODO]
        return loss
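As one possible way to fill in the template, here is a minimal sketch of linear regression trained with gradient descent on the mean squared error. The class name `LinearRegressionGD`, the hyperparameters `learning_rate` and `n_iterations`, and the toy data are assumptions made for this illustration, not the course's reference solution.

```python
import numpy as np


class LinearRegressionGD:
    def __init__(self, learning_rate=0.1, n_iterations=1000):
        # hyperparameters: set once, never updated by the methods
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        # model parameters: updated by .train()
        self.parameters = None

    def train(self, X, Y):
        # X is assumed to be padded by one column of ones for the bias
        self.parameters = np.zeros(X.shape[1])
        for _ in range(self.n_iterations):
            residuals = X @ self.parameters - Y
            # gradient of the mean squared error w.r.t. the parameters
            gradient = 2 * X.T @ residuals / X.shape[0]
            self.parameters -= self.learning_rate * gradient

    def predict(self, X):
        return X @ self.parameters

    def loss(self, X, Y):
        # mean squared error of the model on (X, Y)
        return float(np.mean((self.predict(X) - Y) ** 2))


# toy data: y = 2x + 1, with a column of ones appended for the bias
x = np.linspace(0, 1, 20)
X = np.column_stack([x, np.ones_like(x)])
Y = 2 * x + 1

model = LinearRegressionGD(learning_rate=0.5, n_iterations=2000)
model.train(X, Y)
print(model.parameters)   # should be close to [2, 1]
print(model.loss(X, Y))   # should be close to 0
```

The hyperparameters stay fixed after `__init__`, while `self.parameters` is the only state that `.train()` changes.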

# Pros and cons of OOP
- Pros:
    - code structure is nice and clean, easy to maintain, develop, and debug
    - code is reusable
    - data is encapsulated
        - data provided to an object is stored with the object and is normally accessed through the methods of its class (Python does not strictly enforce this: attributes can still be reached from outside)
- Cons:
    - code base can be larger than with other approaches
    - it takes some time to get used to it
    - OOP is not suitable for all problems (e.g., creating one object per record can be much slower than operating on a whole table at once)
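A note on the data-access point in the list above: in Python this is a convention rather than an enforced rule. A minimal sketch, assuming a hypothetical `hp` attribute and `take_damage` method, of how a leading underscore and a read-only property signal that data should only change through methods:

```python
class Pokemon:
    def __init__(self, name, hp):
        self.name = name
        self._hp = hp  # leading underscore: "internal" by convention only

    @property
    def hp(self):
        # read-only view of the internal value
        return self._hp

    def take_damage(self, amount):
        # the intended way to change the internal value
        self._hp = max(0, self._hp - amount)


p = Pokemon('Pikachu', 35)
p.take_damage(10)
print(p.hp)  # 25

try:
    p.hp = 999  # the property has no setter, so this fails
except AttributeError:
    print('hp cannot be set directly')

p._hp = 999  # still possible: Python does not forbid it, it is just discouraged
```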

## When not to use OOP?
- retail example
- you work with the log files of a retail company
- each row in the log describes a customer buying a certain product
- you have the genius idea to write a customer class to handle the data

In [None]:
import pandas as pd

class Customer:
    """
    a class to collect all data on a customer and to calculate some stats
    """
    def __init__(self, customer_ID, DataFrame):
        self.customer_ID = customer_ID
        # keep only the rows that belong to this customer
        self.data = DataFrame[DataFrame['customer'] == self.customer_ID]
        
    def nr_products_bought(self):
        return self.data.shape[0] # return number of rows
    
    def avg_price(self):
        return self.data['price'].mean()
    
# open the log file
df = pd.read_csv('log_file.csv')
customer_IDs = df['customer'].unique() # one ID per distinct customer
customers = []
for customer_ID in customer_IDs:
    customer = Customer(customer_ID, df) # we create a customer object
    customers.append(customer) # store it in a list

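- the approach above is very slow: every object filters the full DataFrame again, so the work grows with the number of customers times the number of rows
- sometimes it is better to manipulate the data on all customers at once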
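One way to handle all customers at once is a pandas `groupby`. A minimal sketch, assuming the same `customer` and `price` columns as above and a small made-up table in place of the log file:

```python
import pandas as pd

# made-up log data for illustration; each row is one purchase
df = pd.DataFrame({
    'customer': ['a', 'b', 'a', 'c', 'b', 'a'],
    'price': [10.0, 20.0, 30.0, 5.0, 40.0, 20.0],
})

# one pass over the table computes both statistics for every customer
stats = df.groupby('customer')['price'].agg(
    nr_products_bought='count',
    avg_price='mean',
)
print(stats)
```

The result is a single DataFrame with one row per customer, so no per-customer objects or Python-level loops are needed.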


# Time for our first Mud card!