# Lab - Object Oriented Programming

In [32]:
import pandas as pd
import numpy as np

# Challenge 2

In order to understand the benefits of simple object-oriented programming, we have to build up our classes from the beginning. 

You'll use the following dataframe generator to test some things. Try to understand what the following function does.

In [33]:
chars = ['a', 'b', 'c','d', 'e', 'f', ' ', 'á','é','ó']

def create_weird_dataframe(size=10):
    def create_weird_colnames(size=size):
        probs = [.2,.2,.15,.1,.1,.1,.05,.05,.025,.025]

        return [''.join(
            [(char.upper() if np.random.random() < 0.2 else char) 
                     for char in np.random.choice(chars,size=12, p=probs)]) for i in range(size)]
    
    data = np.random.random(size=(size,size))
    colnames = create_weird_colnames(size)
    return pd.DataFrame(data=data, columns=colnames)

Test the results of running that function below. Run it several times to see how the column names change.

In [34]:
df = create_weird_dataframe()
df.head()

Unnamed: 0,EbabeCÓÉáábá,bDaBfeóAdCda,ebbBaeBdfcba,Baccfcbebdé,Eb cDcbafabE,áaáaBcaEcaed,aócebdadÁeaÓ,aAcáafdFdeab,eEc ccebcCf,BBAb dafdabe
0,0.560372,0.66171,0.398995,0.217707,0.832576,0.541624,0.175059,0.412926,0.590882,0.892858
1,0.305179,0.251231,0.582118,0.220873,0.057252,0.810775,0.021666,0.811478,0.02872,0.416771
2,0.367333,0.960143,0.171967,0.849724,0.870932,0.687468,0.141847,0.333568,0.270926,0.924395
3,0.931438,0.226208,0.781358,0.705584,0.861286,0.824949,0.838881,0.314062,0.034455,0.70291
4,0.320304,0.032452,0.34521,0.936688,0.686717,0.078747,0.341173,0.305504,0.609939,0.261063


## Correcting the column names

We'll create a function that renames the weird column names. Later, we'll extend that idea to our own brand-new dataframe class.

### Let's start simple: get the column names of the dataframe.

Store them in a variable called `col_names`.


In [35]:
col_names=df.columns

### Let's iterate through these columns and transform them into lower-case column names

Create a list comprehension to do that if possible. Store it in a variable called `lower_colnames`.

In [36]:
# Lower-case every column name with a list comprehension
lower_colnames = [name.lower() for name in col_names]
lower_colnames

['ebabecóéáábá',
 'bdabfeóadcda',
 'ebbbaebdfcba',
 'baccfcbebdé ',
 'eb cdcbafabe',
 'áaáabcaecaed',
 'aócebdadáeaó',
 'aacáafdfdeab',
 ' eec ccebccf',
 'bbab dafdabe']

### Let's remove the spaces from these column names!

Replace each space ` ` in the column names with an underscore `_`. Again, try to use a list comprehension to do that.
For this first task, use the `.replace(' ','_')` method.

In [37]:
# Replace every space with an underscore
lower_colnames = [name.replace(' ', '_') for name in lower_colnames]
lower_colnames

['ebabecóéáábá',
 'bdabfeóadcda',
 'ebbbaebdfcba',
 'baccfcbebdé_',
 'eb_cdcbafabe',
 'áaáabcaecaed',
 'aócebdadáeaó',
 'aacáafdfdeab',
 '_eec_ccebccf',
 'bbab_dafdabe']

### Create a function that groups the results obtained above and returns the lower-case, underscored names as a list

Name the function `normalize_cols`. This function should receive a dataframe, get its column names, and return the treated list of column names.

In [40]:
def normalize_cols(df):
    # Lower-case each column name, then replace every space with an underscore
    return [name.lower().replace(' ', '_') for name in df.columns]

### Test your results

Use the following line of code to test your results. Run it several times to see how the function behaves.

In [45]:
normalize_cols(create_weird_dataframe())

['acfbadcaéább',
 'bebccfdabbab',
 'cáab_dce_cca',
 'aacebaffffda',
 'bfababbá_adf',
 'fbaefébfbcbb',
 'e_eadeaaffaf',
 'cabbcabáadóc',
 'bcacaáacabcb',
 'ebaaffbfdada']

### Hmmm, we've made a mistake!

We've made several mistakes by doing this. Have you observed any bugs in our results?

To see the problems in our results, we have to look for edge cases.

For example:

**Problem #1:** what if there are 2 or more consecutive spaces? Do we want to replace them with several underscores or condense them into one?

**Problem #2:** what if there are spaces at the beginning or at the end? Should we replace them with underscores or drop them?

Let's correct each problem, starting with problem 2.
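Both edge cases can be reproduced on plain strings, without waiting for the random generator to produce them. The names below are made up for illustration and are not taken from the lab data.

```python
# Hypothetical column names chosen to trigger both edge cases
names = ['a  b', ' ab', 'ab ']

# The naive approach: lower-case, then replace every space
naive = [name.lower().replace(' ', '_') for name in names]
print(naive)  # ['a__b', '_ab', 'ab_']
```

The first result shows Problem #1 (a double underscore), and the other two show Problem #2 (an underscore at the start or at the end).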

## Correcting our function

Instead of replacing the spaces right away, let's first remove the leading and trailing spaces!

Recreate the `normalize_cols` with the solution to `Problem 2`.

*Hint: Copy and paste the last `normalize_cols` function to change it.*
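As a quick reminder, Python strings offer three related methods for this. The sketch below compares them on a made-up name with spaces on both sides.

```python
# Illustrative name with two leading spaces and one trailing space
name = '  eec ccebccf '

left_only = name.lstrip()   # removes leading spaces only
right_only = name.rstrip()  # removes trailing spaces only
both_sides = name.strip()   # removes both; inner spaces are untouched

print(repr(left_only))
print(repr(right_only))
print(repr(both_sides))
```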

In [52]:
def normalize_cols(df):
    # Lower-case each name and drop leading and trailing spaces first,
    # so only inner spaces are turned into underscores
    return [name.lower().strip().replace(' ', '_') for name in df.columns]

### Test your results again.

For now, at least, you should not have any leading or trailing underscores.

In [65]:
normalize_cols(create_weird_dataframe())

['eaffbcbfébfb',
 'bbdbcfbadcáb',
 'b_badccfceád',
 'afáacáfeddad',
 'aecfbábcáeda',
 'bácaaeaccfdd',
 'aéádbacbacfc',
 'aabcfabbeába',
 'ceáfdcdbcbdd',
 'cócccfedcfba']

### Correcting problem 1

To correct problem 1, instead of using the `.replace()` string method, we want to use a regular expression. Use the `re` module to replace the pattern `1 or more spaces` with a single underscore `_`.

Test your solution on the variable below:

In [66]:
import re 

text = 'these spaces      should all be one underline'

In [80]:
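# Replacing single spaces keeps one underscore per space
print(re.sub(' ', '_', text))
# Matching one or more spaces at once condenses each run into a single underscore
print(re.sub(' +', '_', text))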

these_spaces______should_all_be_one_underline
these_spaces_should_all_be_one_underline
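If the same substitution is applied many times, the pattern can be compiled once and reused. The sketch below also shows `\s+`, which matches tabs and newlines in addition to spaces. The second input string is made up for illustration.

```python
import re

# One or more literal spaces, compiled once for reuse
SPACES = re.compile(r' +')

sample = 'these spaces      should all be one underline'
condensed = SPACES.sub('_', sample)
print(condensed)

# \s+ matches any run of whitespace (spaces, tabs, newlines)
mixed = re.sub(r'\s+', '_', 'tab\tand  spaces')
print(mixed)
```

Whether tabs should count as spaces in column names is a design choice; the lab only asks for literal spaces.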


### Now correct your `normalize_cols` function

*Hint: Copy and paste the last `normalize_cols` function to change it.*

In [87]:
def normalize_cols(df):
    # Lower-case, strip the outer spaces, then condense each run of
    # inner spaces into a single underscore
    return [re.sub(' +', '_', name.lower().strip()) for name in df.columns]

### Again, test your results.

Now some column names may come out shorter, because leading and trailing spaces are removed and consecutive spaces are condensed.

In [88]:
normalize_cols(create_weird_dataframe())

['beaccaabéacá',
 'fcd_be_accác',
 'bebbabb_cbcd',
 'bbbbófaabbea',
 'baéfbbffdaf',
 'b_baacaabbba',
 'fadccbcfbbfb',
 'abdcc_affccb',
 'ab_bbadbbaád',
 'caaócbfbabfb']

## Last step: remove accents

The last step consists in removing accents from the strings.

Import the `unidecode` package to use its function, also called `unidecode`, to remove accents. Test it on the word below.

In [97]:
!pip3 install unidecode

Collecting unidecode
  Using cached https://files.pythonhosted.org/packages/d0/42/d9edfed04228bacea2d824904cae367ee9efd05e6cce7ceaaedd0b0ad964/Unidecode-1.1.1-py2.py3-none-any.whl
Installing collected packages: unidecode
Successfully installed unidecode-1.1.1


In [98]:
from unidecode import unidecode

In [101]:
text = 'aéóúaorowó' 

In [102]:
unidecode(text)

'aeouaorowo'
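If installing an extra package is not an option, a similar effect for accented Latin letters can be sketched with the standard library's `unicodedata` module. This is an alternative technique, not what `unidecode` does internally, and the helper name `strip_accents` is hypothetical. It only drops combining marks, so characters without a decomposition are left as they are.

```python
import unicodedata


def strip_accents(text):
    # NFKD splits an accented letter into a base letter plus combining marks
    decomposed = unicodedata.normalize('NFKD', text)
    # Keep everything except the combining marks (the accents)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


plain = strip_accents('aéóúaorowó')
print(plain)  # aeouaorowo
```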

### Now remove the accents from each column name in your `normalize_cols` function.

*Hint: Copy and paste the last `normalize_cols` function to change it.*

In [105]:
def normalize_cols(df):
    # Remove accents, lower-case, strip the outer spaces, then condense
    # each run of inner spaces into a single underscore
    return [re.sub(' +', '_', unidecode(name).lower().strip()) for name in df.columns]

### Test your results

In [106]:
normalize_cols(create_weird_dataframe())

['aacccdacbfbf',
 'abdbecaacaaa',
 'babdaafb_a',
 'cabfbecbaccc',
 'beebdccaeacc',
 'f_dcdbcafdaa',
 'ccdebbadedd',
 'fafoabbaodba',
 'eeocobbcbece',
 'facef_aaabaa']

## Good job. 

You now have a function that receives a dataframe and returns its column names in a clean format.

# Creating our own dataframe.

In [107]:
from pandas import DataFrame

A dataframe is just a class. It has its own attributes and methods.

When you call `pd.DataFrame()`, you are just instantiating the DataFrame class as an object that you can store in a variable. From this point onwards, you have access to all DataFrame attributes (`.columns`, for example) and methods (`.isna()`, for example). We've been using those all along!

If we wish, we can create our own class that inherits everything from the DataFrame class.

In [108]:
class myDataFrame(DataFrame):
    pass
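The same inheritance mechanism can be seen on a small class with no pandas involved. The `Animal` and `Dog` classes below are hypothetical and exist only to illustrate what a subclass gets for free and what it adds.

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f'{self.name} is an animal'


class Dog(Animal):
    # No __init__ or describe here: both are inherited from Animal
    def speak(self):
        return f'{self.name} says woof'


rex = Dog('Rex')
inherited = rex.describe()  # method defined on the parent class
added = rex.speak()         # method defined on the subclass
print(inherited)
print(added)
```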

Instead of just creating myDataFrame, put your function inside your new inherited class, that is, turn `normalize_cols` into a method of your own DataFrame.

Remember that you'll have to give `self` as the first argument of `normalize_cols`. So you can replace everything you once called `df` inside your `normalize_cols` with `self`.

At the end, return the list of corrected names.

In [113]:
class myDataFrame(DataFrame):
    def normalize_cols(self):
        # Remove accents, lower-case, strip the outer spaces, then condense
        # each run of inner spaces into a single underscore
        return [re.sub(' +', '_', unidecode(name).lower().strip())
                for name in self.columns]

Test your results.

In [114]:
df = myDataFrame(create_weird_dataframe())
df.normalize_cols()

['bdeebeab_eab',
 'acbbafaccc_b',
 'efec_ebaede',
 'eabcb_a_afab',
 'cabbaafcafbe',
 'fcbaaabbfbdb',
 'aabbcea_abac',
 'cccdaeaabdc',
 'baaa_abdfdac',
 'efbfdb_febaf']

## Understanding the `self` argument even more

Instead of returning a list containing the correct columns, you should now assign the correct columns to `self.columns`. This will effectively replace the column names of your object with the corrected ones.

Now change your method to return the dataframe itself. That is, return the `self` argument this time and see the results!

```python
class myDataFrame(DataFrame):
    def normalize_cols(self):
        ...
        return self
```

In [122]:
class myDataFrame(DataFrame):
    def normalize_cols(self):
        # Remove accents, lower-case, strip the outer spaces, then condense
        # each run of inner spaces into a single underscore
        self.columns = [re.sub(' +', '_', unidecode(name).lower().strip())
                        for name in self.columns]
        # Returning self hands back the same, now renamed, dataframe
        return self

In [123]:
df = myDataFrame(create_weird_dataframe())
df.normalize_cols()

Unnamed: 0,bdca_aababac,ffccdecfcccc,aacaaaafcfdd,cabbfebfcbbd,dabf_acbfeab,cbbfaedbfbdb,cbaaeabedaff,eccabaeeca_d,cbcebedfcdab,ccfbabcdaffa
0,0.61642,0.931237,0.46963,0.746373,0.918487,0.758473,0.235616,0.96189,0.27693,0.246395
1,0.849017,0.250975,0.739936,0.427323,0.116952,0.561149,0.546146,0.238881,0.78578,0.226862
2,0.303931,0.323895,0.367229,0.254722,0.249803,0.630437,0.586975,0.585635,0.180715,0.048303
3,0.494516,0.566789,0.322082,0.2835,0.11505,0.569276,0.859821,0.240007,0.645257,0.127804
4,0.530873,0.735414,0.394613,0.481686,0.333123,0.658203,0.840639,0.948086,0.239187,0.767037
5,0.832373,0.730057,0.150634,0.98701,0.566729,0.99705,0.077597,0.872504,0.608233,0.701973
6,0.75296,0.54661,0.558165,0.233757,0.435297,0.929127,0.928201,0.914252,0.145037,0.987561
7,0.537113,0.662166,0.588955,0.554291,0.343417,0.99573,0.201902,0.010038,0.565643,0.847529
8,0.524365,0.248021,0.578534,0.207728,0.232216,0.303676,0.680903,0.63988,0.131868,0.230204
9,0.423364,0.218112,0.815582,0.693277,0.886677,0.53114,0.237161,0.028835,0.826256,0.898408
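One practical benefit of returning `self` is that method calls can be chained. The sketch below uses a hypothetical `Playlist` class, unrelated to pandas, to show the pattern in isolation.

```python
class Playlist:
    def __init__(self):
        self.songs = []

    def add(self, title):
        self.songs.append(title)
        return self  # returning self allows another call right after

    def normalize(self):
        # Modify the object in place, then hand the same object back
        self.songs = [title.strip().lower() for title in self.songs]
        return self


playlist = Playlist().add('  First ').add('SECOND').normalize()
print(playlist.songs)  # ['first', 'second']
```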


## Understanding the self argument

Now, below the `for loop` you've created, let's create the attributes of the `Car` class. Remember the attributes you wrote down earlier? Let's put them as arguments of the `__init__(self,)` function.

Remember, the first argument of the `__init__(self,)` function should always be `self` (a naming convention, not a reserved keyword).

The `self` argument represents the object itself. It is how you access the object's own attributes.


### First, let's start creating one single attribute of this car.

Let's say you have chosen `name` as a car attribute (what? can't a car have a name?). 

If you want your class to receive a specific car name as an argument, you have to add this variable as an argument of the `__init__` function. So, to add `name`, your special method definition would look like this:

```python
def __init__(self, name):
    pass
```

Now, when you instantiate your Car class, the syntax is similar to calling a function (which, as you should know by now, is effectively what you are doing: you are calling the `__init__` method). So what would the syntax be?

*Hint: If you don't pass the argument, the Python interpreter will complain that your class requires one argument (try it; if you don't try it now, no problem, you'll run into it in the future even when you don't want to).*
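The complaint mentioned in the hint is a `TypeError`. A minimal sketch, assuming the `__init__(self, name)` signature shown above, catches it so the message can be read without stopping the notebook.

```python
class Car:
    def __init__(self, name):
        pass


try:
    Car()  # no name given, so __init__ is missing an argument
    raised = False
except TypeError as error:
    raised = True
    print(error)  # the exact wording depends on the Python version
```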


In [0]:
# your code here

### Now let's store that new argument

So far, you are only receiving the name of the car as an argument, but you are not doing anything with that `name` variable.

Let's store it in the object. That's the first use of `self`.

To store the variable so that the user can access it via `car.SOMETHING`, you have to specify that the object itself receives the attribute `name` (for example).

Then, **create an attribute called `name` that receives the argument `name`** (keep in mind that the attribute's name need not be the same as the argument's; you could assign the argument `name` to an attribute called `chimpanze`, for example).

Also **create the other 5 attributes that you previously had in mind**.
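The pattern can be sketched on a different, hypothetical class so that the exercise stays open. Here `Bicycle` stores one argument under the same name and another under a different attribute name.

```python
class Bicycle:
    def __init__(self, name, gears):
        # Attribute and argument share the same name
        self.name = name
        # Attribute name differs from the argument name
        self.number_of_gears = gears


my_bike = Bicycle('Bluebird', 21)
print(my_bike.name)
print(my_bike.number_of_gears)
```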


In [0]:
# your code here

### Access the attribute

You should now be able to access the object's attribute once you instantiate it as `my_car.name`

You can try to write `my_car.<TAB>` to check what attributes or methods your object contains.

## Understanding special methods

Special methods are the ones that start and end with double underscores (usually called `dunder`), for example the `__init__` method or the `__repr__` method (read as `dunder init` and `dunder repr`). Special attributes such as `__doc__` follow the same naming.

The `__repr__` method defines how your object is shown on screen when you display it.
Let's create a `__repr__(self)` method on our `Car` class that returns the string below (copy it):

```python
    car = f'''
                  ______--------___
                 /|             / |
      o___________|_\__________/__|
     ]|___     |  |=   ||  =|___  |"
     //   \\    |  |____||_///   \\|"
    |  X  |\--------------/|  X  |\"
     \___/                  \___/
    '''
```

Your class should now have two special methods, `__init__` and `__repr__`

In [0]:
class Car:
    
    def __init__(self, car_name):
        self.car_name = car_name
    
    def __repr__(self):
        
        car = f'''
                      ______--------___
                     /|             / |
          o___________|_\__________/__|
         ]|___     |  |=   ||  =|___  |"
         //   \\    |  |____||_///   \\|"
        |  X  |\--------------/|  X  |\"
         \___/                  \___/
        '''
        
        return car

### Now instantiate your Car class again

In [0]:
my_car = Car('Jeguinho')

### And check what happens when you print your object on screen

In [0]:
print(my_car)
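`print` asks the object for its `__str__`, and when a class does not define one, Python falls back to `__repr__`. The hypothetical `Ticket` class below shows that both paths give the same text in that case.

```python
class Ticket:
    def __init__(self, code):
        self.code = code

    def __repr__(self):
        # !r inserts the repr of the value, so strings keep their quotes
        return f'Ticket(code={self.code!r})'


ticket = Ticket('A1')
printed = str(ticket)  # what print() uses; falls back to __repr__ here
shown = repr(ticket)   # what the notebook shows for a bare expression
print(ticket)
```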

### Now create a simple method to receive and return the `self` variable

Create a simple method inside your `Car` class that returns the `self` argument. Name this method `get_itself`.

In [0]:
class Car:
    
    def __init__(self, car_name):
        self.car_name = car_name
    
    def __repr__(self):
        
        car = f'''
                      ______--------___
                     /|             / |
          o___________|_\__________/__|
         ]|___     |  |=   ||  =|___  |"
         //   \\    |  |____||_///   \\|"
        |  X  |\--------------/|  X  |\"
         \___/                  \___/
        '''
        
        return car
    
    def get_itself(self):
        return self

#### Now instantiate the Car class and call `get_itself()`

In [0]:
my_car = Car('andre')

In [0]:
my_car.get_itself()

This happens because you are displaying this specific object: `get_itself()` returns the object itself, and the notebook shows it using its `__repr__`.
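That a method returning `self` hands back the very same object, not a copy, can be checked with the `is` operator. The `Box` class below is a hypothetical stand-in for `Car` with a shorter `__repr__`.

```python
class Box:
    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return f'Box({self.label!r})'

    def get_itself(self):
        return self


box = Box('books')
same = box.get_itself()
print(same is box)  # True: both names point to one object
print(repr(same))   # Box('books')
```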