# Lab - Object Oriented Programming

## Class

We are always trying to group related things together.

> Dicts: group data together 

> Functions: group actions together

> Classes: group data and actions together <3

### Examples

> Car

> Person

> DataFrame

In [3]:
class Car:
    pass

class Person:
    pass

class Dataframe:
    pass

In [4]:
df = Dataframe()

In [6]:
type(df)

__main__.Dataframe

In [7]:
import pandas as pd

In [10]:
real_df = pd.DataFrame()

type(real_df)

pandas.core.frame.DataFrame
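To tie this back to the comparison at the start of this section, here is a small sketch of the same idea written three ways. The names `car_data`, `describe` and `SimpleCar`, as well as the sample values, are made up for illustration.

```python
# A dict groups data together.
car_data = {"brand": "Fiat", "year": 2015}


# A function groups actions together.
def describe(car):
    """Return a short description built from a car dict."""
    return f"{car['brand']} ({car['year']})"


# A class groups the data and the actions that work on it.
class SimpleCar:
    def __init__(self, brand, year):
        self.brand = brand
        self.year = year

    def describe(self):
        """Return a short description built from the object's own data."""
        return f"{self.brand} ({self.year})"


from_dict = describe(car_data)
from_class = SimpleCar("Fiat", 2015).describe()
print(from_dict)   # Fiat (2015)
print(from_class)  # Fiat (2015)
```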

## Attributes

Values stored inside of an object

In [29]:
class Person:
    name = 'Andre'
    surname = 'Aguiar'
    birth_date = '08/01/1992'
    address = 'Alameda Jaú, 1301'
    email = 'andre.aguiar@ironhack.com'
    
    
x = Person


In [30]:
x.email

'andre.aguiar@ironhack.com'

In [31]:
y = Person

In [32]:
y.address

'Alameda Jaú, 1301'

Hmm... interesting.
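One detail worth noticing in the cells above: `x = Person` has no parentheses, so `x` and `y` are just two more names for the class itself, not two separate people. The sketch below uses a shortened, hypothetical version of `Person` to show what that implies under the usual Python rules for class attributes.

```python
class Person:
    # Class attribute: defined once, on the class itself.
    name = 'Andre'


x = Person        # no parentheses: x is the class
y = Person        # y is the very same class
same_object = x is y

a = Person()      # parentheses: a and b are separate instances
b = Person()
a.name = 'Maria'  # creates an instance attribute on a only

print(same_object)     # True
print(a.name, b.name)  # Maria Andre
```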

What if we want to pass these values as arguments?

## Methods

### The `__init__` method

In [124]:
class Person:    
    """Person class."""
    def __init__(self, nome, sobrenome, birth_date):
        """
        The purpose of this method is thus to set up a new object using data that we have provided.
         
        Creates a person with a name associated to it.
        """
        self.name = nome
        self.surname = sobrenome
        self.birth_date = birth_date
        self.attrs = {}
        
    def get_age(self):
        """Return the approximate age in completed years."""
        from datetime import datetime
        
        n_days = (datetime.today() - datetime.strptime(self.birth_date, '%d/%m/%Y')).days
        # Floor division keeps only the completed years.
        return n_days // 365

In [118]:
andre = Person(nome='Andre',sobrenome='Aguiar',birth_date='08/01/1992')

In [119]:
andre.get_age()

28

In [121]:
andre.attrs.update({'tipo':'gato'})

In [75]:
andre.surname

'Aguiar'

In [108]:
import pandas as pd

In [110]:
df = pd.DataFrame()

In [100]:
from datetime import datetime
date_diff = datetime.today() - datetime.strptime(andre.birth_date, '%d/%m/%Y')
date_diff.days // 365

28
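Dividing the number of days by 365 only approximates the age, since it ignores leap years. A possible alternative, sketched here with a hypothetical helper named `age_on`, compares calendar dates directly.

```python
from datetime import date, datetime


def age_on(birth_date, today):
    """Return the full years between birth_date ('dd/mm/YYYY') and today."""
    born = datetime.strptime(birth_date, '%d/%m/%Y').date()
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


print(age_on('08/01/1992', date(2020, 1, 7)))  # 27, the day before the birthday
print(age_on('08/01/1992', date(2020, 1, 8)))  # 28, on the birthday
```

Passing `today` in explicitly keeps the helper easy to test; in a notebook it could simply be called with `date.today()`.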

# Lab

In order to understand the benefits of simple object-oriented programming, we have to build up our classes from the beginning. 

In [126]:
import pandas as pd

In [127]:
chars = ['a', 'b', 'c','d', 'e', 'f', ' ', 'á','é','ó']

In [128]:
import numpy as np

In [225]:
def create_weird_dataframe(size=10):
    def create_weird_colnames(size=size):
        probs = [.2,.2,.15,.1,.1,.1,.05,.05,.025,.025]

        return [''.join(
            [(char.upper() if np.random.random() < 0.2 else char) 
                     for char in np.random.choice(chars,size=12, p=probs)]) for i in range(size)]
    
    data = np.random.random(size=(size,size))
    colnames = create_weird_colnames(size)
    return pd.DataFrame(data=data, columns=colnames)

In [244]:
df = create_weird_dataframe()

In [245]:
df

|   | FAcfaDaacc E | a  a ebáBfaA | dcfdódedbácB | BBBcAcFaecbb | FaDadéaefbóc | cCdfaóaéEÁaB | céóa ÉCabfbA | bcabddbbd ce | bEfabcEBfaaf | cbaafeféBDaa |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.163118 | 0.752995 | 0.5444 | 0.859675 | 0.274013 | 0.481413 | 0.503844 | 0.253995 | 0.692665 | 0.559837 |
| 1 | 0.247302 | 0.106836 | 0.126144 | 0.434204 | 0.572324 | 0.472847 | 0.883387 | 0.815491 | 0.754484 | 0.034371 |
| 2 | 0.697422 | 0.878556 | 0.64881 | 0.00067 | 0.181008 | 0.048661 | 0.920137 | 0.192451 | 0.600667 | 0.773147 |
| 3 | 0.619737 | 0.67567 | 0.078263 | 0.967575 | 0.646317 | 0.01373 | 0.637818 | 0.005379 | 0.052359 | 0.149963 |
| 4 | 0.511138 | 0.985973 | 0.003934 | 0.072659 | 0.580214 | 0.544005 | 0.625536 | 0.300977 | 0.11344 | 0.077549 |
| 5 | 0.781726 | 0.228816 | 0.869564 | 0.347575 | 0.478251 | 0.306618 | 0.684339 | 0.517895 | 0.34496 | 0.413945 |
| 6 | 0.558554 | 0.954637 | 0.196378 | 0.787453 | 0.031651 | 0.142039 | 0.689468 | 0.106734 | 0.755942 | 0.351402 |
| 7 | 0.231181 | 0.782204 | 0.032489 | 0.165517 | 0.612893 | 0.862522 | 0.391886 | 0.778282 | 0.470036 | 0.641185 |
| 8 | 0.802357 | 0.291661 | 0.009872 | 0.255891 | 0.329959 | 0.005271 | 0.042944 | 0.842972 | 0.58105 | 0.576628 |
| 9 | 0.254199 | 0.027583 | 0.82072 | 0.864777 | 0.280942 | 0.758441 | 0.873803 | 0.515068 | 0.850621 | 0.46489 |


## Correcting the column names

### Let's start simple: get the column names of the dataframe.

Store them in a variable called `col_names`.


In [250]:
col_names = df.columns
col_names

Index(['FAcfaDaacc E', 'a  a ebáBfaA', 'dcfdódedbácB', 'BBBcAcFaecbb',
       'FaDadéaefbóc', 'cCdfaóaéEÁaB', 'céóa ÉCabfbA', 'bcabddbbd ce',
       'bEfabcEBfaaf', 'cbaafeféBDaa'],
      dtype='object')

### Let's iterate through these columns and transform them into lowercase column names

Use a list comprehension if possible, and store the result in a variable called `lower_colnames`.

In [253]:
lower_colnames = [col.lower() for col in col_names]

In [254]:
lower_colnames

['facfadaacc e',
 'a  a ebábfaa',
 'dcfdódedbácb',
 'bbbcacfaecbb',
 'fadadéaefbóc',
 'ccdfaóaéeáab',
 'céóa écabfba',
 'bcabddbbd ce',
 'befabcebfaaf',
 'cbaafefébdaa']

### Let's remove the spaces from these column names!

Replace each space ` ` in the column names with an underscore `_`. Again, try to use a list comprehension.
For this first task, use the `.replace(' ', '_')` method.

In [292]:
[col.replace(' ','_') for col in lower_colnames]

['facfadaacc_e',
 'a__a_ebábfaa',
 'dcfdódedbácb',
 'bbbcacfaecbb',
 'fadadéaefbóc',
 'ccdfaóaéeáab',
 'céóa_écabfba',
 'bcabddbbd_ce',
 'befabcebfaaf',
 'cbaafefébdaa']

### Create a function that groups the steps above and returns the lowercase, underscored names as a list

Name the function `normalize_cols`. This function should receive a dataframe, get its column names and return the treated list of column names.

In [427]:
def normalize_cols(dataframe):
    """
    Receive a dataframe, get its column names, convert them to
    lowercase and then replace spaces with underscores.
    """
    colnames = dataframe.columns
    lower_colnames = [col.lower() for col in colnames]

    return [col.replace(' ','_') for col in lower_colnames]

### Test your results

Use the following line of code to test your results. The column names are random, so run it several times and watch how the output varies.

In [430]:
normalize_cols(create_weird_dataframe())

['eb_c__aafadb',
 'abd_eacfbéca',
 'dáefed_ób_e_',
 'becácbcccbbf',
 'bbefc_bdbbab',
 'aaóbáeebbbeb',
 'cbbfdcbaób_b',
 'ca_d_c_abada',
 'abbbbfcebacd',
 'abaacbfbabdd']

### Hmmm, we made a mistake!

We've made several mistakes by doing this. Have you noticed any bugs in our results?

To spot the problems in our results, we have to look for edge cases.

For example:

**Problem #1:** What if there are two or more consecutive spaces? Do we want to replace them with several underscores or condense them into one?

**Problem #2:** What if there are spaces at the beginning or at the end? Should we replace them with underscores or drop them?
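Random column names make these edge cases hard to spot, so it can help to reproduce them with hand-written names. The list below is made up for illustration, and the comprehension mirrors the lowercase-and-replace logic used so far.

```python
# Hand-written names that trigger both problems.
names = ['Two  Spaces', '  Leading', 'Trailing ']

# Same logic as the first version of normalize_cols, applied to a plain list.
naive = [name.lower().replace(' ', '_') for name in names]

print(naive)  # ['two__spaces', '__leading', 'trailing_']
```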

Let's correct each problem, starting with Problem 2.

## Correcting our function

Instead of replacing the spaces right away, let's first remove the leading and trailing spaces!

Recreate `normalize_cols` with the solution to Problem 2.

*Hint: Copy and paste the last `normalize_cols` function to change it.*

In [432]:
def normalize_cols(dataframe):
    """
    Receive a dataframe, get its column names, convert them to
    lowercase, strip leading and trailing spaces and then replace
    the remaining spaces with underscores.
    """
    colnames = dataframe.columns
    lower_colnames = [col.lower().strip() for col in colnames]

    return [col.replace(' ','_') for col in lower_colnames]

### Test your results again.

For now, at least, you should not see any leading or trailing underscores.

In [448]:
normalize_cols(create_weird_dataframe())

['cbfféfacbaaa',
 'ceaéea__éd_c',
 'bdfffcbófaca',
 'cdéfcecacca',
 'bcbacaaóbaac',
 'aceeacóabc_é',
 'bbcdbaaafacé',
 'bbb__dáeabec',
 'bdfaóafc_cdd',
 'ábbafcabeb_d']

### Correcting problem 1

To correct Problem 1, instead of using the `.replace()` string method, we want to use a regular expression. Use the `re` module to replace every run of one or more spaces with a single underscore `_`.

Test your solution on the variable below:

In [460]:
import re 

text = 'these spaces      should all be one underline'

In [462]:
re.sub(r'\s+', '_', text)

'these_spaces_should_all_be_one_underline'
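Two details are worth knowing here. Writing the pattern as a raw string (`r'\s+'`) keeps Python from treating `\s` as a string escape, which newer Python versions warn about. Also, `\s` matches any whitespace, including tabs and newlines; if only literal spaces should be collapsed, a pattern such as `' +'` is one alternative. A short sketch with made-up sample text:

```python
import re

sample = 'a  b\tc'  # two spaces, then a tab

any_whitespace = re.sub(r'\s+', '_', sample)  # spaces and tabs are replaced
only_spaces = re.sub(r' +', '_', sample)      # the tab is left untouched

print(any_whitespace)     # a_b_c
print(repr(only_spaces))  # 'a_b\tc'
```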

### Now correct your `normalize_cols` function

*Hint: Copy and paste the last `normalize_cols` function to change it.*

In [487]:
def normalize_cols(dataframe):
    """
    Receive a dataframe, get its column names, convert them to
    lowercase and strip leading and trailing spaces. Each remaining
    run of whitespace (one or more characters) is then replaced by
    a single underscore.
    """
    colnames = dataframe.columns
    lower_colnames = [col.lower().strip() for col in colnames]

    return [re.sub(r'\s+', '_', col) for col in lower_colnames]

### Again, test your results.

Some column names may now come out shorter, because consecutive spaces are collapsed into a single underscore (and leading or trailing spaces are stripped).

In [488]:
normalize_cols(create_weird_dataframe())

['eabóébcfbcbb',
 'cfdcdáaedbfc',
 'fcdaécfab_eb',
 'fcadbbaa_aca',
 'dbdacdaabae',
 'bacedbeaffea',
 'accba_abaábc',
 'aebcadfaábóe',
 'caedfdaafá',
 'óedcbdcebacc']

## Last step: remove accents

The last step is to remove accents from the strings.

Import the `unidecode` function from the `unidecode` package and use it to remove accents. Test it on the word below.

In [484]:
from unidecode import unidecode

In [485]:
text = 'aéóúaorowó'

In [486]:
unidecode(text)

'aeouaorowo'
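`unidecode` is a third-party package, so it has to be installed first (for example with `pip install unidecode`). If installing it is not an option, a rough alternative for accented Latin letters is the standard library's `unicodedata` module. This is a different technique from the one used in this lab and it handles fewer characters; the helper name `strip_accents` is made up for this sketch.

```python
import unicodedata


def strip_accents(text):
    """Remove combining accent marks from text using Unicode decomposition."""
    # NFKD splits 'é' into 'e' plus a separate combining accent character.
    decomposed = unicodedata.normalize('NFKD', text)
    # Keep every character that is not a combining mark.
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents('aéóúaorowó'))  # aeouaorowo
```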

### Now remove the accents from each column name in your `normalize_cols` function.

*Hint: Copy and paste the last `normalize_cols` function to change it.*

In [489]:
def normalize_cols(dataframe):
    """
    Receive a dataframe, get its column names, convert them to
    lowercase and strip leading and trailing spaces. Each remaining
    run of whitespace is then replaced by a single underscore and
    accents are removed.
    """
    colnames = dataframe.columns
    lower_colnames = [col.lower().strip() for col in colnames]

    return [unidecode(re.sub(r'\s+', '_', col)) for col in lower_colnames]

### Test your results

In [497]:
normalize_cols(create_weird_dataframe())

['fefebbbaeca',
 'ofbaccabfbcb',
 'dcbeec_aaaad',
 'ebcddacabcbd',
 'ceaaadbabeba',
 'adaaedaaefa',
 'bae_dcfebbad',
 'fbeaaeaabecf',
 'cdecebedaafa',
 'cbaafabedce']

## Good job. 

You now have a function that receives a dataframe and returns its column names in a clean, consistent format.
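Note that the function only returns the new names; the dataframe itself is unchanged until they are assigned back. A minimal sketch of that last step, using a small hand-made dataframe and a reduced stand-in called `tidy_names` (lowercase, strip and collapse spaces only, so that it runs without `unidecode`):

```python
import re

import pandas as pd


def tidy_names(dataframe):
    """Reduced stand-in for normalize_cols: no accent removal."""
    return [re.sub(r'\s+', '_', col.lower().strip()) for col in dataframe.columns]


small_df = pd.DataFrame({' First  Col': [1, 2], 'Second Col ': [3, 4]})

# Assigning a list to .columns renames every column of the dataframe.
small_df.columns = tidy_names(small_df)

print(list(small_df.columns))  # ['first_col', 'second_col']
```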

# Creating our own dataframe.

In [578]:
from pandas import DataFrame

A dataframe is just an instance of a class. It has its own attributes and methods.

When you call `pd.DataFrame()`, you are instantiating the `DataFrame` class as an object that you can store in a variable. From this point onwards, you have access to all `DataFrame` attributes (`.columns`, for example) and methods (`.isna()`, for example). We've been using those all along!

If we wish, we can create our own class that inherits everything from the `DataFrame` class.
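As a rough sketch of what inheriting gives us, the hypothetical `TinyFrame` below adds one method and otherwise behaves like a regular dataframe. One caveat, based on the pandas notes on subclassing: many operations build their result through the `_constructor` property, so without overriding it the result tends to come back as a plain `DataFrame` and loses the custom methods.

```python
import pandas as pd


class TinyFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass used only for illustration."""

    @property
    def _constructor(self):
        # Tell pandas to build the results of operations as TinyFrame objects.
        return TinyFrame

    def n_cells(self):
        """Return the total number of cells (rows times columns)."""
        return self.shape[0] * self.shape[1]


tiny = TinyFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

print(isinstance(tiny, pd.DataFrame))  # True: inherited behaviour is available
print(tiny.n_cells())                  # 6
print(type(tiny.head(2)).__name__)     # TinyFrame
```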

In [None]:
class myDataFrame(DataFrame):
    pass


In [575]:
class myDataFrame(DataFrame):
    
    
    def normalize_cols(self):
        """
        Normalize the column names of this dataframe in place: convert
        them to lowercase, strip leading and trailing spaces, replace
        each remaining run of whitespace by a single underscore and
        remove accents. Return the dataframe itself.
        """
        colnames = self.columns
        lower_colnames = [col.lower().strip() for col in colnames]

        self.columns = [unidecode(re.sub(r'\s+', '_', col)) for col in lower_colnames]
        
        return self

In [576]:
df = myDataFrame(create_weird_dataframe())

In [577]:
df.normalize_cols()

|   | cfbbacaabobb | aeaeaaaacbdb | c_afeabeobaa | adbafebaaccc | ddfcbceaba | eccddaeeabae | edccbfabcfda | afadbaafcbfb | ebfebafbeba | f_c_aafddbcc |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.472835 | 0.848847 | 0.906653 | 0.903822 | 0.109248 | 0.786137 | 0.746215 | 0.910873 | 0.127374 | 0.254131 |
| 1 | 0.895307 | 0.82413 | 0.581754 | 0.677616 | 0.543269 | 0.693672 | 0.791333 | 0.296323 | 0.574805 | 0.010372 |
| 2 | 0.361676 | 0.568477 | 0.310177 | 0.932225 | 0.52098 | 0.36634 | 0.949068 | 0.457448 | 0.88052 | 0.229401 |
| 3 | 0.44729 | 0.587247 | 0.065008 | 0.232211 | 0.226403 | 0.882723 | 0.211655 | 0.888467 | 0.399023 | 0.018939 |
| 4 | 0.968006 | 0.730009 | 0.429434 | 0.491126 | 0.5665 | 0.005355 | 0.194978 | 0.040664 | 0.825781 | 0.002027 |
| 5 | 0.53529 | 0.106083 | 0.238623 | 0.833247 | 0.330043 | 0.240625 | 0.428158 | 0.543237 | 0.823002 | 0.878882 |
| 6 | 0.182044 | 0.50974 | 0.253705 | 0.10303 | 0.6009 | 0.033798 | 0.184457 | 0.235554 | 0.116627 | 0.185139 |
| 7 | 0.094324 | 0.66616 | 0.510271 | 0.538886 | 0.83168 | 0.483634 | 0.756369 | 0.750077 | 0.981544 | 0.02283 |
| 8 | 0.19314 | 0.694843 | 0.468497 | 0.474393 | 0.753057 | 0.839238 | 0.163925 | 0.418921 | 0.708624 | 0.617461 |
| 9 | 0.91885 | 0.914129 | 0.665494 | 0.702666 | 0.302457 | 0.492509 | 0.922612 | 0.405814 | 0.980236 | 0.543627 |


## Understanding the `self` argument even more

Now change your method to return the dataframe itself. That is, return the `self` argument this time and see the results!
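Returning `self` is what makes method chaining possible: each call hands back the same object, so the next method can be called on it straight away. A minimal sketch with a hypothetical `Tally` class (not part of the lab):

```python
class Tally:
    """Hypothetical counter used to illustrate returning self."""

    def __init__(self):
        self.total = 0

    def add(self, amount):
        self.total += amount
        return self  # hand the same object back to the caller


tally = Tally()
result = tally.add(2).add(3)  # chaining works because add returns self

print(result is tally)  # True
print(tally.total)      # 5
```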

# Challenge 1

## Creating a class

First of all, let's create a simple class. Name this class `Car`. ([PEP 8](https://www.python.org/dev/peps/pep-0008/#class-names) suggests the CapWords convention, also known as CamelCase, for class names, i.e. capitalizing the first letter of each word.)

That should be as simple as possible. Use the class syntax to create it and its content should be only the 
```python 
pass
```
statement.


The `pass` statement is used just as a placeholder. This will be a class that doesn't do anything (yet).

In [585]:
# your code here

In [586]:
class Car:
    pass

In [587]:
my_car = Car()

## Let's think about which attributes a car should have

Think of attributes that are intrinsic to a car. Come up with 5 attributes that all cars have, along with their possible values, and write them down for later use.

In [642]:
# write the attribute names you've chosen as a comment here.


We will create the `__init__(self)` special method. This is the first thing that runs when you instantiate a new object (by calling `Car()`, for example).

So each object you create will immediately perform whatever operations you put inside `__init__(self)`. If you create new attributes there, they will be accessible as soon as the object is created. If you instead call other methods there, they will run as soon as the object is created.

Let's check that.
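Before adding the progress bar, here is a bare-bones sketch of the same idea. The class name `Greeter` and its `log` list are made up for illustration.

```python
class Greeter:
    log = []  # shared list, only used here to record what happened

    def __init__(self):
        # Runs automatically every time Greeter() is called.
        Greeter.log.append('__init__ ran')
        self.ready = True


first = Greeter()
second = Greeter()

print(Greeter.log)  # ['__init__ ran', '__init__ ran']
print(first.ready)  # True
```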

### Create an `__init__(self)` special method inside your `Car` class and then run a `for` loop inside of it.


To see what happens when you instantiate a class that has an `__init__(self)` method, define this method and put the following piece of code inside it.

```python
from tqdm.auto import tqdm
import time

for i in tqdm(range(10), desc='__init__ is running, yay'):
    time.sleep(.1)
```

In [638]:
# your code here

In [639]:
class Car:
    def __init__(self):
        from tqdm.auto import tqdm
        import time
        
        for i in tqdm(range(10), desc='__init__ is running, yay'):
            time.sleep(.1)

### Afterwards, instantiate your `Car` class and see this beauty.

In [637]:
my_car = Car()

HBox(children=(FloatProgress(value=0.0, description='__init__ is running, yay', max=10.0, style=ProgressStyle(…




## Understanding the self argument

Now, below the `for` loop you've created, let's create the attributes of the `Car` class. Remember the attributes you wrote down earlier? Let's add them as arguments of the `__init__(self)` method.

Remember, the first argument of `__init__` should always be `self`. (Strictly speaking, `self` is a naming convention rather than a reserved keyword, but everyone follows it.)

The `self` argument represents the object itself. It is how you get access to the object's own attributes.
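One way to see what `self` stands for: calling a method on an object is equivalent to calling the function on the class and passing the object in explicitly. A short sketch with a hypothetical `Dog` class:

```python
class Dog:
    def __init__(self, name):
        self.name = name

    def speak(self):
        return f'{self.name} says woof'


rex = Dog('Rex')

via_object = rex.speak()    # Python passes rex as self for us
via_class = Dog.speak(rex)  # here we pass the object explicitly

print(via_object)               # Rex says woof
print(via_object == via_class)  # True
```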


### First, let's start creating one single attribute of this car.

Let's say you have chosen `name` as a car attribute (what? can't a car have a name?). 

If you want your class to receive a specific car name as an argument, you have to add it as a parameter of the `__init__` method. So, to add `name`, your special method definition would become:

```python
def __init__(self, name):
    ...
```

Now, when you instantiate your `Car` class, the syntax is similar to calling a function (which, as you should know by now, is effectively what you are doing: you are calling the `__init__` method), so the syntax would be:

```python
my_car = Car('jeguinho')
```

If you don't specify an argument, the Python interpreter will complain that a required argument is missing (try it; if you don't try it now, no problem, you'll run into it in the future even when you don't want to).
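For reference, this is roughly what that complaint looks like. The sketch catches the error so the cell keeps running; the exact wording of the message can vary between Python versions.

```python
class Car:
    def __init__(self, name):
        pass  # the argument is received but not stored yet


try:
    Car()  # no name given
except TypeError as error:
    message = str(error)

print(message)  # mentions the missing 'name' argument
```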



### Now let's store that new argument

So far, you are only receiving the name of the car as an argument; you are not doing anything with the variable called `name`.

Let's store it in the object. That's the first use of the `self` argument.

To store the value so that the user can access it via `car.SOMETHING`, you have to specify that the object itself is receiving the attribute `name` (for example).

Then, **create an attribute called `name` that receives the argument `name`** (keep in mind that the attribute name need not be the same as the argument name; you could assign the argument `name` to an attribute called `chimpanze`, for example).

Also **create the 5 attributes that you previously had in mind**.
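Here is the same pattern on a different, hypothetical class, so the exercise is not given away. The class name, attribute names and values are made up; note that the attribute name (`label`) does not have to match the argument name (`title`).

```python
class Book:
    def __init__(self, title, pages):
        # Attribute name and argument name may differ...
        self.label = title
        # ...although keeping them identical is the usual choice.
        self.pages = pages


book = Book('Dom Casmurro', 256)

print(book.label)  # Dom Casmurro
print(book.pages)  # 256
```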


In [None]:
# your code here

### Access the attribute

You should now be able to access the object's attribute once you instantiate it as `my_car.name`

You can try to write `my_car.<TAB>` to check what attributes or methods your object contains.
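Outside an interactive environment, where tab completion is not available, the built-ins `vars()` and `dir()` give similar information. A sketch with a hypothetical `Bike` class:

```python
class Bike:
    def __init__(self, name, gears):
        self.name = name
        self.gears = gears

    def ring(self):
        return 'ding'


my_bike = Bike('magrela', 21)

# vars() returns the instance attributes as a dict.
print(vars(my_bike))  # {'name': 'magrela', 'gears': 21}

# dir() lists attributes and methods, including the dunder ones.
public_names = [n for n in dir(my_bike) if not n.startswith('_')]
print(public_names)   # ['gears', 'name', 'ring']
```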

## Understanding special methods

Special methods are the ones whose names start and end with double underscores (usually called `dunder`), for example the `__init__` method or the `__repr__` method (read as `dunder init` and `dunder repr`). Some special attributes, such as `__doc__`, follow the same naming pattern.

The `__repr__` method defines how your object is displayed on screen.
Let's create a `__repr__(self)` method in our `Car` class that returns the string below:

```python
    car = f'''
                  ______--------___
                 /|             / |
      o___________|_\__________/__|
     ]|___     |  |=   ||  =|___  |"
     //   \\    |  |____||_///   \\|"
    |  X  |\--------------/|  X  |\"
     \___/                  \___/
    '''
```

Your class should now have two special methods, `__init__` and `__repr__`
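A smaller sketch of how `__repr__` gets used, with a hypothetical `Point` class. When a class defines `__repr__` but not `__str__`, `print()` and `str()` fall back to `__repr__` as well.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Must return a string; this is what the notebook shows for the object.
        return f'Point(x={self.x}, y={self.y})'


p = Point(1, 2)

print(repr(p))  # Point(x=1, y=2)
print(str(p))   # Point(x=1, y=2), because __str__ is not defined
print([p])      # [Point(x=1, y=2)], containers use repr for their items
```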

### Now instantiate your Car class again

### And check what happens when you print your object on screen

In [None]:
## The self 

In [656]:
class Car:
    
    def __init__(self, car_name):
        self.car_name = car_name
    
    def __repr__(self):
        
        car = f'''
                      ______--------___
                     /|             / |
          o___________|_\__________/__|
         ]|___     |  |=   ||  =|___  |"
         //   \\    |  |____||_///   \\|"
        |  X  |\--------------/|  X  |\"
         \___/                  \___/
                 {self.car_name}
        '''
        
        return car

In [657]:
my_car = Car('jequitinhonha')

In [658]:
my_car


                      ______--------___
                     /|             / |
          o___________|_\__________/__|
         ]|___     |  |=   ||  =|___  |"
         //   \    |  |____||_///   \|"
        |  X  |\--------------/|  X  |"
         \___/                  \___/
                 jequitinhonha
        