<a href="https://colab.research.google.com/github/codeforgirls-sa/ds/blob/master/unit2/feature-engineering/demo_feature_engineering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# About the dataset

The task is to predict the age of abalone from physical measurements. The age of an abalone is determined by cutting the shell
through the cone, staining it, and counting the number of rings through a microscope, which is a boring and time-consuming
task. Other measurements, which are easier to obtain, are used to predict the age instead. Further information,
such as weather patterns and location (hence food availability), may be required to solve the problem.

![Abalone](https://informationdensity.files.wordpress.com/2018/02/chileuncultivatedabalone.jpeg?w=758&h=454)

In [None]:
import pandas as pd

In [None]:

!wget https://raw.githubusercontent.com/codeforgirls-sa/ds/master/unit2/feature-engineering/abalone.csv


--2020-06-17 09:12:17--  https://raw.githubusercontent.com/codeforgirls-sa/ds/master/unit2/feature-engineering/abalone.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 191962 (187K) [text/plain]
Saving to: ‘abalone.csv’


2020-06-17 09:12:17 (4.29 MB/s) - ‘abalone.csv’ saved [191962/191962]



In [None]:
snails = pd.read_csv('abalone.csv')
snails.head()

Unnamed: 0,Sex,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Rings
0,M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
1,M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
2,F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
3,M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
4,I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7
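
The `Unnamed: 0` column in the preview looks like a row index that was saved into the CSV file. The following is a minimal sketch, assuming that is the case, of reading that column as the DataFrame index with `index_col=0`. The inline `sample_csv` text and the name `snails_indexed` are hypothetical stand-ins built from the first rows shown above, so the example runs without the download.

```python
import io

import pandas as pd

# Hypothetical stand-in for abalone.csv: the first three rows from the preview,
# assuming the first header field is empty (which is what produces "Unnamed: 0").
sample_csv = """,Sex,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Rings
0,M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
1,M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
2,F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
"""

# index_col=0 tells read_csv to use the first column as the row index
# instead of loading it as a regular data column.
snails_indexed = pd.read_csv(io.StringIO(sample_csv), index_col=0)
print(snails_indexed.columns.tolist())
```

The rest of this notebook keeps the default `read_csv` call, so its outputs still show the `Unnamed: 0` column.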


# Create new feature age = Rings + 1.5

In [None]:
# Create a new feature: age in years, computed as Rings + 1.5
snails['age'] = snails['Rings'] + 1.5
# print(snails.head())
snails.head()

Unnamed: 0,Sex,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Rings,age
0,M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15,16.5
1,M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7,8.5
2,F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9,10.5
3,M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10,11.5
4,I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7,8.5


# Create categories from age column

In [None]:

# Create categories from age column
def create_age_categories(age):
    if age < 9.5:
        return 'Young'
    elif age > 12.5:
        return 'Old'
    else:
        return 'Middle-aged'


# Series.apply calls the given function on every value of the pandas Series.
snails['categorical_age'] = snails['age'].apply(create_age_categories)
# print(snails.head())
snails.head()

Unnamed: 0,Sex,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Rings,age,categorical_age
0,M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15,16.5,Old
1,M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7,8.5,Young
2,F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9,10.5,Middle-aged
3,M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10,11.5,Middle-aged
4,I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7,8.5,Young
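
The same thresholds can also be applied to the whole column at once instead of value by value. The sketch below uses `numpy.select` as one possible vectorized alternative to `apply`; it is not part of the original notebook. The `ages` series is a hypothetical sample made of the five ages from the preview plus the two boundary values 9.5 and 12.5, which both fall into `'Middle-aged'` under the rules above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the five ages from the preview plus the two boundary values.
ages = pd.Series([16.5, 8.5, 10.5, 11.5, 8.5, 9.5, 12.5])

# Conditions are checked in order; values matching none of them get the default.
conditions = [ages < 9.5, ages > 12.5]
labels = ['Young', 'Old']
categorical_age = pd.Series(
    np.select(conditions, labels, default='Middle-aged'),
    index=ages.index,
)
print(categorical_age.tolist())
```

Keeping the comparisons identical to those in `create_age_categories` ensures that boundary ages are labeled the same way in both versions.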


# Convert string categories into integers

In [None]:

# What if our algorithm needs categories expressed as integers? => Convert string categories into integers!
age_to_int = {'Young': 1, 'Middle-aged': 2, 'Old': 3}

snails['categorical_age_int'] = snails['categorical_age'].map(age_to_int)
# print(snails.head())
snails.head()

Unnamed: 0,Sex,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Rings,age,categorical_age,categorical_age_int
0,M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15,16.5,Old,3
1,M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7,8.5,Young,1
2,F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9,10.5,Middle-aged,2
3,M,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10,11.5,Middle-aged,2
4,I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7,8.5,Young,1
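
One thing to watch for with `map` and a dictionary: any label that is missing from the dictionary becomes `NaN`, and the resulting column is then stored as floats rather than integers. The sketch below illustrates this with a hypothetical `labels` series that contains one unexpected value, `'Unknown'`, which does not occur in the data above.

```python
import pandas as pd

# Hypothetical sample of category labels, including one unexpected value.
labels = pd.Series(['Old', 'Young', 'Middle-aged', 'Unknown'])

age_to_int = {'Young': 1, 'Middle-aged': 2, 'Old': 3}
mapped = labels.map(age_to_int)

# Labels missing from the dictionary become NaN, which makes the column float-typed.
print(mapped.tolist())
print(mapped.isna().sum())
```

A quick `isna().sum()` check after mapping is a cheap way to confirm that every category was covered by the dictionary.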
