Imagine you are starting a ride share business. Let's call it fuber. All rides generally have the same basic information: a driver, passenger(s), an origin, a destination, car information, and a price. You plan on having a pretty large client base, so you can expect many rides to be taken every day.

So, you will need a way to bundle up and operate on all of the above information about a particular ride. And as the business takes off, you are going to need to create rides over and over.

Here is what our Ride class would look like in Python:

### `class Ride:`

    #code for distance attribute

    #code for time attribute

    #code for price attribute

    #code to start a ride

    #code to end a ride

In [96]:
class Ride:
    pass

In [97]:
first_ride = Ride()
print(first_ride)

<__main__.Ride object at 0x000002BACCF2B0A0>


In [98]:
second_ride = Ride()
third_ride = Ride()

print(Ride)
print(first_ride)
print(second_ride)
print(third_ride)

<class '__main__.Ride'>
<__main__.Ride object at 0x000002BACCF2B0A0>
<__main__.Ride object at 0x000002BACCF46490>
<__main__.Ride object at 0x000002BACCEEB1F0>
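
Each call to `Ride()` builds a separate object, which is why the memory addresses above differ even though every object comes from the same class. A small sketch of how you might check this yourself (it redefines the empty `Ride` class so that it runs on its own; the variable names `same_class` and `same_object` are only for illustration):

```python
class Ride:
    pass

first_ride = Ride()
second_ride = Ride()

# Both objects are instances of the same class...
same_class = isinstance(first_ride, Ride) and isinstance(second_ride, Ride)
# ...but they are two distinct objects in memory.
same_object = first_ride is second_ride

print(same_class)   # True
print(same_object)  # False
```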


## Instance Methods

So, let's take the example of a Dog class. What are the things that all dogs do? They can bark, beg to go for a walk, chase squirrels, etc. When you create a new dog instance object, the dog should be able to automatically bark, beg, and chase squirrels.

Let's see how you would create a single dog, rex, and get him to bark. First define a Dog class:


In [99]:
class Dog:
    pass

In [100]:
rex = Dog()
print(rex)

<__main__.Dog object at 0x000002BACCF2B730>


In [101]:
#can rex  bark?
rex.bark()

AttributeError: 'Dog' object has no attribute 'bark'

Okay, here you have an instance of the Dog class, but as you can see, rex cannot bark yet. Let's see if we can fix that. Instance methods are basically functions stored as callable attributes of an instance object. So, let's write a function that returns the string 'ruff ruff!' and assign it as an attribute of rex.

Note: Dictionary object attributes are accessed using the bracket ([ ]) notation. However, instance object attributes are accessed using the dot (.) notation.
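
To make the difference between the two notations concrete, here is a short sketch. The names `spot_dict` and `spot` are made up for this illustration, and the empty `Dog` class is repeated so the snippet runs on its own:

```python
class Dog:
    pass

spot_dict = {'name': 'Spot'}   # a dictionary stores key/value pairs
spot = Dog()
spot.name = 'Spot'             # an instance stores attributes

name_from_dict = spot_dict['name']   # bracket notation
name_from_instance = spot.name       # dot notation
print(name_from_dict, name_from_instance)

# vars() shows an instance's attributes as a dictionary
print(vars(spot))  # {'name': 'Spot'}
```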

In [None]:
def make_a_bark():
    return 'ruff ruff!'

rex.bark = make_a_bark
rex.bark

<function __main__.make_a_bark()>

In [None]:
#Now you can make rex bark by calling the .bark() method:
rex.bark()

'ruff ruff!'

## Define an instance method

In [None]:
class Dog:

    def bark():
        return "I'm an instance method! Oh and... ruff ruff!"

In [None]:
fluffy = Dog()
fluffy.bark

<bound method Dog.bark of <__main__.Dog object at 0x000001590F1C27F0>>
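
Looking up `fluffy.bark` works, but calling it does not. Python passes the instance as the first argument to any method called on it, and this version of `bark` accepts no parameters at all. A minimal sketch of the failure, with the error caught so the cell finishes running (the exact wording of the message can vary slightly between Python versions):

```python
class Dog:
    def bark():  # no parameter available to receive the instance
        return "I'm an instance method! Oh and... ruff ruff!"

fluffy = Dog()

try:
    fluffy.bark()  # Python effectively calls Dog.bark(fluffy)
except TypeError as error:
    message = str(error)
    print("TypeError:", message)
```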

## `self` to the rescue!

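Python automatically passes the instance as the first argument whenever you call a method on it, so every instance method must declare a first parameter to receive it. As with any function or method, you can name the parameters however you want, but the convention in Python is to name this first parameter of every instance method `self`, which makes sense since it is the object itself on which you are calling the method.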

In [None]:
class Dog:
    def bark(self):
        return 'I am actually going to bark this time. bark!!'
    
xoxo = Dog()
xoxo.bark()

'I am actually going to bark this time. bark!!'

In [None]:
class Dog:
    def bark(self):
        return 'I am actually going to bark this time. bark!!'
    
    def who_am_i(self):
        return self

In [None]:
fido = Dog()
print("1.", fido.who_am_i()) #check return value of method
print("2.", fido) #comparing return of the fido instance object

1. <__main__.Dog object at 0x000001590F552D60>
2. <__main__.Dog object at 0x000001590F552D60>
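
Another way to see what `self` is: calling a method on an instance is equivalent to calling the function on the class and passing the instance in explicitly. A short sketch, repeating a trimmed `Dog` class so it runs on its own (`via_instance` and `via_class` are illustrative names):

```python
class Dog:
    def who_am_i(self):
        return self

fido = Dog()

# Calling the method on the instance...
via_instance = fido.who_am_i()
# ...is equivalent to calling it on the class and passing the instance.
via_class = Dog.who_am_i(fido)

print(via_instance is fido)  # True
print(via_class is fido)     # True
```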


## Using `self`

Let's use the example of a Person class. A class produces instance objects, which in turn are just pieces of code that bundle together descriptors and behaviors as attributes. For example, an instance object of a Person class can have descriptors like height, weight, age, etc. and also have behaviors such as say_hello, eat_breakfast, talk_about_weather, etc.

In [None]:
class Person():
    
    def say_hello(self):
        return 'Hi, how are you?'
        
    def eat_breakfast(self):
        self.hungry = False
        return 'Yum that was delish!'

ems = Person()
print('1.', vars(ems))
ems.name = 'Emilly'
ems.age = 22
ems.weight = 'None of your business!'
print('2.', ems.say_hello())
print('3.', ems.eat_breakfast())
print('4.', vars(ems))

1. {}
2. Hi, how are you?
3. Yum that was delish!
4. {'name': 'Emilly', 'age': 22, 'weight': 'None of your business!', 'hungry': False}


Let's say it is Gail's birthday. Gail is 29 and she is turning 30. To ensure the instance object reflects that, you can define an instance method that updates Gail's age:

In [None]:
class Person:
    def happy_birthday(self):
        self.age += 1
        return f"Happy birthday to {self.name} (aka ME!) Can't believe I am {self.age}!!"

gail = Person()
gail.name = 'Gail'
gail.age = 29

print('1. ', gail.age)
print('2. ', gail.happy_birthday())
print('3. ', gail.age)

1.  29
2.  Happy birthday to Gail (aka ME!) Can't believe I am 30!!
3.  30


## Calling Instance Methods on self
Another very important behavior people have is eating. It is something that we all do and it helps prevent us from getting hangry, or angry because we're hungry.

In [None]:
class Person:
    def eat_sandwich(self):
        if self.hungry:
            self.relieve_hunger()
            return "Wow, that really hit the spot! I am so full, but more importantly, I'm not hangry anymore!"
        else:
            self.drink_water()
            return "Oh, I don't think I can eat another bite, I'll just drink water. Thank you, though!"
    
    def relieve_hunger(self):
        print('Hunger is being relieved')
        self.hungry = False

    def drink_water(self):
        print("Drinking water to stay hydrated.")

the_snail = Person()
the_snail.name = 'the Snail'
the_snail.hungry = True

print('1. ', the_snail.hungry)
print('2. ', the_snail.eat_sandwich())
print('3. ', the_snail.hungry)
print('4. ', the_snail.eat_sandwich())

1.  True
Hunger is being relieved
2.  Wow, that really hit the spot! I am so full, but more importantly, I'm not hangry anymore!
3.  False
Drinking water to stay hydrated.
4.  Oh, I don't think I can eat another bite, I'll just drink water. Thank you, though!


In the cell below define a Driver class.

For this class, create a method called `greet_passenger()`, which returns the string `Hello! I'll be your driver today. My name is ` followed by that driver's first name and last name (e.g. `Hello! I'll be your driver today. My name is John Doe`). Keep in mind that the driver's name is stored under two separate attributes: `first` and `last`.

Now create an instance of your `Driver` class. Then, create the following attributes for your instance:

first - the first name of the driver. Set it to Matthew.

last - the last name of the driver. Set it to Mitchell.

miles_driven - the number of miles driven by the driver. Set it to 100.

rating - the driver's rating. Set it to 4.9

In [None]:
# Define the Driver class here, then set each attribute on an instance
class Driver:
    
    def greet_passenger(self):
        return f"Hello! I'll be your driver today. My name is {self.first} {self.last}"
    
matthew_mitchell = Driver()
matthew_mitchell.first = 'Matthew'
matthew_mitchell.last = 'Mitchell'
matthew_mitchell.miles_driven = 100
matthew_mitchell.rating = 4.9

print('1. ', matthew_mitchell.miles_driven)
print('2. ', matthew_mitchell.rating)

matthew_mitchell.greet_passenger()

1.  100
2.  4.9


"Hello! I'll be your driver today. My name is Matthew Mitchell"

## Object Initialization
### Introducing `__init__`

The `__init__` method allows classes to have default behaviors and attributes.

By using the `__init__` method, you can initialize instances of objects with defined attributes. Without this, attributes are not defined until other methods are called to populate these fields, or you set attributes manually. This can be problematic. For example, if you had tried to call the `greet_passenger()` method from the previous lab without first setting the driver's `first` and `last` attributes, you would have encountered an error. Here's another example to demonstrate:

In [None]:
class Person:
    def set_name(self, name):
        self.name = name
    def set_job(self, job):
        self.job = job

In [None]:
bob = Person()

In [None]:
#If we try to access an attribute before setting it we'll get an error.
bob.name

AttributeError: 'Person' object has no attribute 'name'

In [None]:
bob.set_name('Bob')
bob.name

'Bob'

In [None]:
#To avoid errors such as this, you can use the __init__ method to set attributes on instantiation.
class Person:
    def __init__(self, name, job):
        self.name = name
        self.job = job

bob = Person('Bob', 'Carpenter')
print(bob.name)
print(bob.job)

Bob
Carpenter


## Setting default arguments in the `__init__` method

Requiring every argument at instantiation is not always convenient. To relax this, you can define `__init__` with default arguments, so that parameters can be supplied when desired but are not required.

In [None]:
class Person:
    def __init__(self, name=None, job=None):
        self.name = name
        self.job = job

girl = Person()
print(girl.name)
print(girl.job)

print('\n')
murugi = Person(job = 'Analyst')
print(murugi.name)
print(murugi.job)

print('\n')
ems = Person('Murugi', 'Analyst')
print(ems.name)
print(ems.job)

None
None


None
Analyst


Murugi
Analyst


## Inheritance
You can use inheritance to create relationships between Superclasses and Subclasses to further save you from writing redundant code!

#### Writing D.R.Y. code
Assume for a second that you are going to build a data model around the most popular band of the last century, the Beatles!



#### Our First Subclass

The Guitarist class and the Bass_Guitarist class are extremely similar. In fact, we could say that bass guitarists are a special case of guitarists. With a few notable exceptions, the Bass_Guitarist class is generally going to be more similar than different to the Guitarist class.

In Python, we can make Bass_Guitarist a subclass of Guitarist. This will mean that the Bass_Guitarist class will inherit the attributes and methods of its superclass, Guitarist. This will save us a lot of redundant code!

In [None]:
class Guitarist(object):
    
    def __init__(self):
        self.name = 'George'
        self.role = 'Guitarist'
        self.instrument_type = 'Stringed Instrument'
        
    def tune_instrument(self):
        print('Tune the strings!')
        
    def practice(self):
        print('Strumming the old 6 string!')
        
    def perform(self):
        print('Hello, New  York!')
        
class Bass_Guitarist(Guitarist):
    
    def __init__(self):
        super().__init__()
        self.name = 'Paul'
        self.role = 'Bass Guitarist'
        
    def practice(self):
        print('I play the Seinfeld Theme Song when I get bored')
        
    def perform(self):
        super().perform()
        print('Thanks for coming out!')

In [None]:
george = Guitarist()
paul = Bass_Guitarist()

print(george.instrument_type)
print(paul.instrument_type)

Stringed Instrument
Stringed Instrument


In [None]:
george.tune_instrument()
paul.tune_instrument()

Tune the strings!
Tune the strings!


In [None]:
george.practice()
paul.practice()

Strumming the old 6 string!
I play the Seinfeld Theme Song when I get bored


In [None]:
george.perform()
paul.perform()

Hello, New  York!
Hello, New  York!
Thanks for coming out!



Take a look at the way the classes were created and the corresponding outputs in the cells above.  A couple of things stand out:

1.  The `.tune_instrument()` method was never declared for class `Bass_Guitarist()`, but the `paul` instance still has access to this method.  

2. The `.instrument_type` attribute was never set for the `Bass_Guitarist()` class, but the `paul` instance nonetheless has that attribute, and the attribute has the same value as it had in the `Guitarist` class. This is because `Bass_Guitarist.__init__()` first calls `super().__init__()`, which runs the `Guitarist` initializer and sets the attribute.  

3. With inheritance, you can still change or overwrite specific attributes or methods. For example, in the `Bass_Guitarist()` class, the `.practice()` and `.perform()` methods, as well as the values for the `.name` and `.role` attributes all differ from the inherited `Guitarist()` class.

### Using `super()`

The built-in `super()` function gives you access to the superclass of the object that calls it.  In this case, you saw how `super()` was used in the `__init__()` method to initialize the object just as if you were creating a new `Guitarist` object. Afterward, you can modify attributes as needed.  Although not shown in this example, it is worth noting that you can also add attributes and methods to a subclass that a superclass does not have. For instance, if you added the attribute `self.string_type = 'bass'` inside the `Bass_Guitarist.__init__()` method, all bass guitarist objects would have that attribute, but guitarist objects would not.  Conversely, any change that you make to the superclass `Guitarist()` is reflected in the subclass `Bass_Guitarist()`, unless the subclass overrides that attribute or method.
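
The `string_type` idea described above can be sketched as follows. The classes are trimmed-down versions of the ones in this section, repeated so the snippet runs on its own:

```python
class Guitarist:
    def __init__(self):
        self.name = 'George'
        self.instrument_type = 'Stringed Instrument'

class Bass_Guitarist(Guitarist):
    def __init__(self):
        super().__init__()         # run Guitarist.__init__ first
        self.name = 'Paul'         # then override an inherited attribute
        self.string_type = 'bass'  # and add one the superclass lacks

george = Guitarist()
paul = Bass_Guitarist()

print(paul.instrument_type)            # Stringed Instrument
print(paul.string_type)                # bass
print(hasattr(george, 'string_type'))  # False
```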


### Changing Values and Methods 

Note that in both of these classes, you have methods named `.practice()` that have the same name, but different behaviors. This is an example of **_Polymorphism_**, meaning that you can have methods that have the same name, but contain different code inside their bodies.  This is not a naming collision because these methods exist attached to different classes.  

Also, take note of the way the `.perform()` method is written inside of `Bass_Guitarist()`. If you want a method in a subclass to do everything that method does in a superclass and *then* do something else, you can accomplish this by simply calling the superclass's version of the method by accessing it with `super()` and then adding any remaining behavior afterward in the body of the function. 

### Accessing Methods

By default, subclasses have access to all methods contained in a superclass. Because they are a subclass, they can automatically do the same things as the corresponding superclass. You do not need to declare the functions in the body of the subclass to have access to them. For example, while there was no mention of the method `.tune_instrument()`, `paul` still has access to the exact same `.tune_instrument()` method as `george`.  You only declare methods that are mentioned in the superclass if you want to override their behavior in the subclass.  

## Abstract Superclasses

When you make use of a subclass and a superclass, you are defining levels of **_Abstraction_**. In this case, the superclass `Guitarist()` is one level of abstraction higher than the subclass `Bass_Guitarist()`. Intuitively, this makes sense -- bass guitarists are a kind of guitarist, but not all guitarists are bass guitarists.

It's also worth noting that you can always go a level of abstraction higher by defining a class that is more vague but still captures the common thread amongst the subclasses. Here's an example to demonstrate.

At first glance, it may seem that guitarists, singers, and drummers don't have enough in common with each other to make use of inheritance -- a drummer is not a type of singer, etc. However, one thing they all have in common is they are all a type of `Musician()`. No matter what sort of musician you are, you:

* have a `name`  
* have an `instrument`  
* know how to `tune_instrument`  
* can `practice` and `perform`

In this way, you can write a single superclass that will be useful for all of the subclasses in our band: `Drummer()`, `Guitarist()`, `Bass_Guitarist()`, and `Singer()` are all types of musicians!

This is called an **_Abstract Superclass_**. The superclass being used is at a level of abstraction where it does not make sense for it to exist on its own.  For example, it makes sense to instantiate drummers, singers, and guitarists -- they are members of a band, and by playing these instruments, they are musicians.  However, you cannot be a `musician` without belonging to one of these subclasses -- there is no such thing as a musician that doesn’t play any instruments or sing! It makes no sense to instantiate a `Musician()`, because they don't really exist in the real world -- you only create this **_Abstract Superclass_** to define the commonalities between our subclasses and save ourselves some redundant code!
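
Python does not stop you from instantiating a class just because you think of it as abstract. If you want that rule enforced, the standard library's `abc` module is one option. The sketch below is an assumption layered on top of this lesson rather than the approach used in the rest of the section, and the names `AbstractMusician` and `Vocalist` are made up for illustration:

```python
from abc import ABC, abstractmethod

class AbstractMusician(ABC):
    def __init__(self, name):
        self.name = name

    @abstractmethod
    def tune_instrument(self):
        """Each concrete subclass must supply its own version."""

    def perform(self):
        print("Hello New York!!")

class Vocalist(AbstractMusician):
    def tune_instrument(self):
        print("No tuning needed...I'm a singer!!")

try:
    AbstractMusician('Nobody')   # abstract classes cannot be instantiated
except TypeError as error:
    abstract_error = str(error)
    print("TypeError:", abstract_error)

john = Vocalist('John Lennon')   # concrete subclasses work as usual
john.tune_instrument()
```

A plain superclass that you simply agree never to instantiate works too; `abc` just turns that agreement into an error.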

### Creating The Beatles Using an Abstract Superclass

The cell below models the Beatles by making use of the abstract superclass `Musician()`, and then subclassing it when creating `Drummer()`, `Singer()`, and `Guitarist()`.  Note that since you can have multiple layers of abstraction, it makes sense to keep `Bass_Guitarist()` as a subclass of `Guitarist()`.

In [None]:
class Musician(object):
    def __init__(self, name):
        self.name = name
        self.band = "THE BEATLES"
    
    def tune_instrument(self):
        print("Tuning instrument")
    
    def practice(self):
        print("Practicing!!")
    
    def perform(self):
        print("Hello New York!!")

class Singer(Musician):
    def __init__(self, name):
        super().__init__(name)  # Pass the name argument along to super().__init__(), which expects it
        self.role = "Singer"
    
    def tune_instrument(self):
        print("No tuning needed...I'm a singer!!")

class Guitarist(Musician):
    def __init__(self, name):
        super().__init__(name)
        self.role = "Guitarist"

    def practice(self):
        print("Strumming the old 6 string!!")

class Bass_Guitarist(Guitarist):
    
    def __init__(self, name):
        super().__init__(name)
        self.role = "Bass Guitarist"
        
    def practice(self):
        print("I play the Seinfeld Theme Song when I get bored")
        
    def perform(self):
        super().perform()
        print("Thanks for coming out!")
        
class Drummer(Musician):
    
    def __init__(self, name):
        super().__init__(name)
        self.role = "Drummer"
        
    def tune_instrument(self):
        print('Where did I put those drum sticks?')
        
    def practice(self):
        print('Why does my chair still say "Pete Best"?')

In [None]:
john = Singer('John Lennon')
paul = Bass_Guitarist('Paul McCartney')
ringo = Drummer('Ringo Starr')
george = Guitarist('George Harrison')

the_beatles = [john, ringo, george, paul]

In [None]:
for musician in the_beatles:
    print(f" {musician.name} is the {musician.role} ")

 John Lennon is the Singer 
 Ringo Starr is the Drummer 
 George Harrison is the Guitarist 
 Paul McCartney is the Bass Guitarist 


In [None]:
for musician in the_beatles:
    musician.tune_instrument()

No tuning needed...I'm a singer!!
Where did I put those drum sticks?
Tuning instrument
Tuning instrument


In [None]:
for musician in the_beatles:
    musician.practice()

Practicing!!
Why does my chair still say "Pete Best"?
Strumming the old 6 string!!
I play the Seinfeld Theme Song when I get bored


In [None]:
for musician in the_beatles:
    musician.perform()

Hello New York!!
Hello New York!!
Hello New York!!
Hello New York!!
Thanks for coming out!


## Modeling a Zoo

Consider the following scenario:  You've been hired by a zookeeper to build a program that keeps track of all the animals in the zoo.  This is a great opportunity to make use of inheritance and object-oriented programming!

## Creating an Abstract Superclass

Start by creating an abstract superclass, `Animal()`.  When your program is complete, all subclasses of `Animal()` will have the following attributes:

* `name`, which is a string set at instantiation time
* `size`, which can be `'small'`, `'medium'`, `'large'`, or `'enormous'` 
* `weight`, which is an integer set at instantiation time 
* `species`, a string that tells us the species of the animal
* `food_type`, which can be `'herbivore'`, `'carnivore'`, or `'omnivore'`
* `nocturnal`, a boolean value that is `True` if the animal sleeps during the day, otherwise `False`

They'll also have the following behaviors:

* `sleep`, which prints a string saying if the animal sleeps during day or night
* `eat`, which takes in the string `'plants'` or `'meat'`, and prints `'{animal name} the {animal species} thinks {food} is Yummy!'` or `'I don't eat this!'` based on the animal's `food_type` attribute 

In the cell below, create an abstract superclass that meets these specifications.

**_NOTE:_** For some attributes in an abstract superclass such as `size`, the initial value doesn't matter -- just make sure that you remember to override it in each of the subclasses!

In [None]:
class Animal(object):
    def __init__(self, name, weight):
        self.name = name
        self.size = None
        self.weight = weight
        self.species = None
        self.food_type = None
        self.nocturnal = False
        
    def sleep(self):
        if self.nocturnal:
            print("{} sleeps during the day!".format(self.name))
        else:
            print("{} sleeps during the night!".format(self.name))
            
    def eat(self, food):
        if self.food_type == 'omnivore':
            print("{} the {} thinks {} is Yummy!".format(self.name, self.species, food))
        elif (food == 'meat' and self.food_type == "carnivore") or (food == 'plants' and self.food_type == 'herbivore'):
            print("{} the {} thinks {} is Yummy!".format(self.name, self.species, food))
        else:
            print("I don't eat this!")

Now that you have our abstract superclass, you can begin building out the specific animal classes.

In the cell below, complete the `Elephant()` class.  This class should:

* subclass `Animal` 
* have a species of `'elephant'` 
* have a size of `'enormous'` 
* have a food type of `'herbivore'` 
* set nocturnal to `False` 

**_Hint:_** Remember to make use of `super()` during initialization, and be sure to pass in the values it expects at instantiation time!

In [None]:
class Elephant(Animal):
    def __init__(self, name, weight):
        super().__init__(name, weight)
        self.species = 'elephant'
        self.size = 'enormous'
        self.food_type = 'herbivore'
        self.nocturnal = False

Now, in the cell below, create a `Tiger()` class.  This class should: 

* subclass `Animal` 
* have a species of `'tiger'` 
* have a size of `'large'` 
* have a food type of `'carnivore'` 
* set nocturnal to `True` 

In [None]:
class Tiger(Animal):
    def __init__(self, name, weight):
        super().__init__(name, weight)
        self.species = 'tiger'
        self.size = 'large'
        self.food_type = 'carnivore'
        self.nocturnal = True

Two more classes to go. In the cell below, create a `Raccoon()` class. This class should:

* subclass `Animal` 
* have a species of `'raccoon'` 
* have a size of `'small'` 
* have a food type of `'omnivore'` 
* set nocturnal to `True` 

In [None]:
class Raccoon(Animal):
    def __init__(self, name, weight):
        super().__init__(name, weight)
        self.species = 'raccoon'
        self.size = 'small'
        self.food_type = 'omnivore'
        self.nocturnal = True

Create a `Gorilla()` class. This class should:

* subclass `Animal` 
* have a species of `'gorilla'` 
* have a size of `'large'` 
* have a food type of `'herbivore'` 
* set nocturnal to `False` 

In [None]:
class Gorilla(Animal):
    def __init__(self, name, weight):
        super().__init__(name, weight)
        self.species = 'gorilla'
        self.size = 'large'
        self.food_type = 'herbivore'
        self.nocturnal = False

## Using Our Objects

Now it's time to populate the zoo! To ease the creation of animal instances, create a function `add_animal_to_zoo()`.

This function should take in the following parameters:

* `zoo`, a list representing the current state of the zoo 
* `animal_type`, a string.  Can be `'Gorilla'`, `'Raccoon'`, `'Tiger'`, or `'Elephant'` 
* `name`, the name of the animal being created 
* `weight`, the weight of the animal being created 

The function should then:

* use `animal_type` to determine which object to create
* Create an instance of that animal, passing in the `name` and `weight`
* Append the newly created animal to `zoo`
* Return `zoo`

In [None]:
def add_animal_to_zoo(zoo, animal_type, name, weight):
    animal = None
    if animal_type == 'Gorilla':
        animal = Gorilla(name, weight)
    elif animal_type == 'Raccoon':
        animal = Raccoon(name, weight)
    elif animal_type == 'Tiger':
        animal = Tiger(name, weight)
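    elif animal_type == 'Elephant':
        animal = Elephant(name, weight)
    else:
        # Fail loudly instead of silently creating an Elephant for a typo
        raise ValueError(f"Unknown animal type: {animal_type!r}")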
    
    zoo.append(animal)
    
    return zoo
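
As the number of animal types grows, the `if`/`elif` chain grows with it. One alternative is to look the class up in a dictionary. This is only a sketch of that idea, not part of the exercise: `ANIMAL_CLASSES` and `add_animal_by_lookup` are hypothetical names, and the stand-in classes below exist only so the snippet runs on its own (in the notebook you would use the real classes defined above and skip the stand-ins).

```python
class Animal:
    def __init__(self, name, weight):
        self.name = name
        self.weight = weight

# Stand-in subclasses so the sketch is self-contained
class Gorilla(Animal):
    pass

class Raccoon(Animal):
    pass

class Tiger(Animal):
    pass

class Elephant(Animal):
    pass

# Map each accepted string to the class it should create
ANIMAL_CLASSES = {
    'Gorilla': Gorilla,
    'Raccoon': Raccoon,
    'Tiger': Tiger,
    'Elephant': Elephant,
}

def add_animal_by_lookup(zoo, animal_type, name, weight):
    try:
        animal_class = ANIMAL_CLASSES[animal_type]
    except KeyError:
        raise ValueError(f"Unknown animal type: {animal_type!r}") from None
    zoo.append(animal_class(name, weight))
    return zoo

demo_zoo = add_animal_by_lookup([], 'Tiger', 'Tony', 400)
print(type(demo_zoo[0]).__name__)  # Tiger
```

This works because classes are themselves objects in Python, so they can be stored in a dictionary and called later like any other callable.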

Great! Now, add some animals to your zoo. 

Create the following animals and add them to your zoo.  The names and weights are up to you.

* 2 Elephants
* 2 Raccoons
* 1 Gorilla
* 3 Tigers

In [None]:
# Create your animals and add them to the 'zoo' in this cell!
to_create = ['Elephant', 'Elephant', 'Raccoon', 'Raccoon', 'Gorilla', 'Tiger', 'Tiger', 'Tiger']

zoo = []

for i in to_create:
    zoo = add_animal_to_zoo(zoo, i, 'name', 100)
    
zoo

[<__main__.Elephant at 0x2bac7e278b0>,
 <__main__.Elephant at 0x2bac817a730>,
 <__main__.Raccoon at 0x2bac817a820>,
 <__main__.Raccoon at 0x2bac817a940>,
 <__main__.Gorilla at 0x2bac817ae80>,
 <__main__.Tiger at 0x2bac817ab20>,
 <__main__.Tiger at 0x2bac817ac10>,
 <__main__.Tiger at 0x2bac817a5e0>]

Great! Now that you have a populated zoo, you can do what the zookeeper hired you to do -- write a program that feeds the correct animals the right food at the right times!

To do this, write a function called `feed_animals()`. This function should take in two arguments:

* `zoo`, the zoo list containing all the animals
* `time`, which can be `'Day'` or `'Night'`.  This should default to day if nothing is entered for `time` 

This function should:

* Feed only the non-nocturnal animals if `time='Day'`, or only the nocturnal animals if `time='Night'`
* Check the food type of each animal before feeding.  If the animal is a carnivore, feed it `'meat'`; otherwise, feed it `'plants'`. Feed the animals by using their `.eat()` method 

In [None]:
def feed_animals(zoo, time='Day'):
    # Anything other than 'Day' is treated as a night feeding
    feed_nocturnal = (time != 'Day')
    for animal in zoo:
        # Feed nocturnal animals at night and all other animals during the day
        if animal.nocturnal == feed_nocturnal:
            food = 'meat' if animal.food_type == 'carnivore' else 'plants'
            animal.eat(food)

Call the function for a daytime feeding below.

In [None]:
feed_animals(zoo)

If the elephants and gorillas were fed, then things should be good!

Call `feed_animals()` again, but this time set `time='Night'`.

In [None]:
feed_animals(zoo, 'Night')

## OOP with Scikit-Learn

**Most scikit-learn classes are mutable**, which means that calling methods on them changes their internal data.

### Scikit-Learn Classes
Scikit-learn has four main classes to be aware of:

- Estimator

- Transformer

- Predictor

- Model

They are defined based on which methods they possess. The classes are not mutually exclusive.

### Estimator

Almost all scikit-learn classes you will use will be some kind of estimator. It is the "base object" in scikit-learn.

An estimator is defined by having a `fit` method. There are two typical forms for this method:

```python
estimator.fit(data)
```

and

```python
estimator.fit(X, y)
```

The first one is typically used in the context of a transformer or unsupervised learning predictor, while the second is used in the context of a supervised learning predictor.

### Transformer


A transformer is an estimator that has a `transform` method:

```python
transformer.transform(data)
```

The `transform` method is called after the `fit` method and returns a modified form of the input data.

An example of a transformer (that is not also a predictor or model) is:

#### `StandardScaler`

`StandardScaler` is used to standardize features by removing the mean and scaling to unit variance ([documentation here](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html))

In [None]:
# Import class from scikit-learn
from sklearn.preprocessing import StandardScaler

# Instantiate the scaler (same step for all estimators, though specific args differ)
scaler = StandardScaler()

In [None]:
#When the estimator is first instantiated, these are all of its attributes:
scaler.__dict__

{'with_mean': True, 'with_std': True, 'copy': True}

In [None]:
#The next step, like with any scikit-learn estimator, is to fit the scaler on the data:

data = [[10],[20],[30],[40],[50]] #data representing a single feature

# Fit the scaler (same step for all estimators, though specific args differ)
scaler.fit(data)

StandardScaler()

In [None]:
#Now that fit has been called, because transformers are mutable, there are additional attributes:

scaler.__dict__

{'with_mean': True,
 'with_std': True,
 'copy': True,
 'n_features_in_': 1,
 'n_samples_seen_': 5,
 'mean_': array([30.]),
 'var_': array([200.]),
 'scale_': array([14.14213562])}

The underscore (`_`) at the end of these new variables (e.g. `mean_`) is a scikit-learn convention, which means that these attributes are not available until the estimator has been fit.

We can access these fitted attributes using the standard dot notation:

In [None]:
scaler.mean_

array([30.])
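
The same pattern is easy to imitate in a plain Python class, which may help connect it back to the earlier sections on instance attributes. `MeanCenterer` below is a toy class invented for this illustration; it is not part of scikit-learn and only borrows its naming conventions:

```python
class MeanCenterer:
    """Toy transformer that subtracts the mean from a list of numbers."""

    def __init__(self, with_mean=True):
        self.with_mean = with_mean  # set at instantiation: no trailing underscore

    def fit(self, data):
        self.mean_ = sum(data) / len(data)  # learned from data: trailing underscore
        return self  # scikit-learn's fit methods also return the estimator

    def transform(self, data):
        if not hasattr(self, 'mean_'):
            raise RuntimeError("Call fit before transform.")
        if not self.with_mean:
            return list(data)
        return [x - self.mean_ for x in data]

centerer = MeanCenterer()
print(vars(centerer))  # {'with_mean': True}

centerer.fit([10, 20, 30, 40, 50])
print(vars(centerer))  # {'with_mean': True, 'mean_': 30.0}
print(centerer.transform([10, 20, 30, 40, 50]))  # [-20.0, -10.0, 0.0, 10.0, 20.0]
```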

In [None]:
#Now the scaler is fit, we can use it to transform the data:

scaler.transform(data) 

array([[-1.41421356],
       [-0.70710678],
       [ 0.        ],
       [ 0.70710678],
       [ 1.41421356]])
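
To see where these numbers come from, the standardization can be reproduced by hand: subtract the mean, then divide by the square root of the population variance (the `mean_`, `var_`, and `scale_` values shown earlier). A small sketch in plain Python, using a flat list named `values` instead of the nested `data` list:

```python
import math

values = [10, 20, 30, 40, 50]

mean = sum(values) / len(values)                                # 30.0
variance = sum((x - mean) ** 2 for x in values) / len(values)   # 200.0
scale = math.sqrt(variance)                                     # about 14.142

scaled = [(x - mean) / scale for x in values]
rounded = [round(value, 8) for value in scaled]
print(rounded)
# [-1.41421356, -0.70710678, 0.0, 0.70710678, 1.41421356]
```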

### Predictor

As you might have...*predicted*...a predictor is an estimator that has a `predict` method:

```python
predictor.predict(X)
```

The `predict` method is called after the `fit` method and can be part of a supervised or unsupervised learning model. It returns an array of predictions `y` associated with the input data `X`.

An example of a predictor is:

#### `LinearRegression`

`LinearRegression` is a class that represents an ordinary least squares linear regression model ([documentation here](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html))

In [None]:
# Import class from scikit-learn
from sklearn.linear_model import LinearRegression

#instantiate the model
lr = LinearRegression()

In [None]:
lr.__dict__

{'fit_intercept': True, 'normalize': False, 'copy_X': True, 'n_jobs': None}

In [None]:
# Data representing X (features) and y (target), where y = 10x + 5
X = [[1], [2], [3], [4], [5]]
y = [15, 25, 35, 45, 55]

# Fit the linear regression
lr.fit(X,y)

LinearRegression()

In [None]:
lr.__dict__

{'fit_intercept': True,
 'normalize': False,
 'copy_X': True,
 'n_jobs': None,
 'n_features_in_': 1,
 'coef_': array([10.]),
 '_residues': 1.7452973362415567e-29,
 'rank_': 1,
 'singular_': array([3.16227766]),
 'intercept_': 4.999999999999993}

We can access the fitted attributes using dot notation. For example, below we access the intercept and coefficient of the regression:

In [None]:
print(lr.intercept_)
print(lr.coef_[0])

4.999999999999993
10.000000000000002


Because this is a predictor and not a transformer, the next step is to use the `predict` method rather than the `transform` method:

In [None]:
lr.predict(X)

array([15., 25., 35., 45., 55.])

### Model

A model is an estimator that has a `score` method. There are two typical forms for this method:

```python
model.score(X, y)
```

and

```python
model.score(X)
```

For example, using the linear regression model from above, we can score the model using r-squared:

In [None]:
# According to the documentation, this score is the coefficient of determination (R-squared)
# Note: X and y must still be the regression data defined above
lr.score(X, y)

1.0
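
For a regressor such as `LinearRegression`, `score(X, y)` reports the coefficient of determination, R-squared, which is 1 minus the ratio of the residual sum of squares to the total sum of squares. The calculation can be checked by hand for the toy data above. In this sketch the predictions are written out from the known relationship y = 10x + 5 rather than taken from the fitted model, and all variable names are illustrative:

```python
X_values = [1, 2, 3, 4, 5]
y_true = [15, 25, 35, 45, 55]

# Predictions from the line y = 10x + 5
y_pred = [10 * x + 5 for x in X_values]

y_mean = sum(y_true) / len(y_true)
ss_res = sum((actual - predicted) ** 2 for actual, predicted in zip(y_true, y_pred))
ss_tot = sum((actual - y_mean) ** 2 for actual in y_true)

r_squared = 1 - ss_res / ss_tot
print(r_squared)  # 1.0
```

A value of 1.0 means the line explains all of the variation in `y`, which is expected here because the data were generated from that exact line.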

An example of a model that produces a score with just `X` would be `PCA` ([documentation here](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)):

In [None]:
# Import class from scikit-learn
from sklearn.decomposition import PCA

# Instantiate the model (same step for all estimators, though specific args differ)
pca = PCA(n_components=1)



In [None]:
pca.__dict__

{'n_components': 1,
 'copy': True,
 'whiten': False,
 'svd_solver': 'auto',
 'tol': 0.0,
 'iterated_power': 'auto',
 'random_state': None}

In [None]:
# Data representing two features
X = [[1, 11], [2, 12], [3, 14], [4, 16], [5, 18]]

# Fit the PCA (same step for all estimators, though specific args differ)
pca.fit(X)

PCA(n_components=1)

In [None]:
pca.__dict__

{'n_components': 1,
 'copy': True,
 'whiten': False,
 'svd_solver': 'auto',
 'tol': 0.0,
 'iterated_power': 'auto',
 'random_state': None,
 'n_features_in_': 2,
 '_fit_svd_solver': 'full',
 'mean_': array([ 3. , 14.2]),
 'noise_variance_': 0.02341572863058885,
 'n_samples_': 5,
 'n_features_': 2,
 'components_': array([[0.48215553, 0.87608564]]),
 'n_components_': 1,
 'explained_variance_': array([10.67658427]),
 'explained_variance_ratio_': array([0.99781161]),
 'singular_values_': array([6.53500858])}

In [None]:
pca.score(X)

-1.9447298858494055

To understand what a given score means, look at the documentation for the model (e.g. [here](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html?highlight=score#sklearn.linear_model.LinearRegression.score) for `LinearRegression` or [here](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html?highlight=score#sklearn.decomposition.PCA.score) for `PCA`).



### Overlapping Classes

As stated previously, these scikit-learn classes are not mutually exclusive.

`StandardScaler` is an **estimator** and a **transformer** but not a predictor or a model.

`LinearRegression` is an **estimator**, a **predictor**, and a **model** but not a transformer.

`KMeans` is an **estimator**, a **transformer**, a **predictor**, and a **model**.

`PCA` is an **estimator**, a **transformer**, and a **model** but not a predictor.

## Takeaways

**You do not need to memorize** these labels for every scikit-learn class you encounter. You can always figure out what a class can do by looking at its documentation:

* If it has a `fit` method, it's an estimator
* If it has a `transform` method, it's a transformer
* If it has a `predict` method, it's a predictor
* If it has a `score` method, it's a model
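
The checklist above translates directly into code. The helper below is a hypothetical convenience function, not something scikit-learn provides, and `ToyScaler` is a made-up stand-in so the sketch runs without scikit-learn installed:

```python
def scikit_learn_roles(obj):
    """Return the role labels from the checklist that apply to obj."""
    method_to_role = {
        'fit': 'estimator',
        'transform': 'transformer',
        'predict': 'predictor',
        'score': 'model',
    }
    return [role for method, role in method_to_role.items()
            if callable(getattr(obj, method, None))]

class ToyScaler:
    """Stand-in with the same method names as a scaler."""
    def fit(self, data):
        return self

    def transform(self, data):
        return data

roles = scikit_learn_roles(ToyScaler())
print(roles)  # ['estimator', 'transformer']
```

Passing a real instance such as `StandardScaler()` should give the same two labels, since the check only looks at which methods the object has.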

Recognizing these terms can help you navigate the official documentation as well as third-party resources, which might refer to these classes and their instances with various labels interchangeably, since multiple labels often apply.

Also, keep in mind that estimators are mutable and store important information during the `fit` step, which means that you always need to call the `fit` method before you can call the `transform`, `predict`, or `score` methods. 

In [None]:
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True, as_frame=True)



In [None]:
X

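| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |
| ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 |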


In [None]:
y

0      0
1      0
2      0
3      0
4      0
      ..
145    2
146    2
147    2
148    2
149    2
Name: target, Length: 150, dtype: int32

In [None]:
#import
from sklearn.preprocessing import MinMaxScaler

#Instantiate
scaler = MinMaxScaler()

#Fit
scaler.fit(X)

MinMaxScaler()

In [None]:
#import
from sklearn.tree import DecisionTreeClassifier

#instantiate
decision = DecisionTreeClassifier()

#fit
decision.fit(X,y)

DecisionTreeClassifier()