<div class="alert alert-block alert-success">
    <h1 align="center">Pandas Trick</h1>
    <h3 align="center">Convert Data</h3>
    <h4 align="center"><a href="https://github.com/SMSajadi99/Practical-Machine-Learning">Seyed Mohammad Sajadi</a></h4>
</div>

### Importing the libraries

In [4]:
import pandas as pd

In [5]:
pd.__version__

'2.0.3'

## Import the Dataset and Create a DataFrame

In [6]:
df = pd.read_csv('titanic.csv')
df.head()

| Unnamed: 0 | pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29.0 | 0 | 0 | 24160 | 211.3375 | B5 | S | 2.0 |  | St Louis, MO |
| 1 | 1 | 1 | Allison, Master. Hudson Trevor | male | 0.92 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | 11.0 |  | Montreal, PQ / Chesterville, ON |
| 2 | 1 | 0 | Allison, Miss. Helen Loraine | female | 2.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S |  |  | Montreal, PQ / Chesterville, ON |
| 3 | 1 | 0 | Allison, Mr. Hudson Joshua Creighton | male | 30.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S |  | 135.0 | Montreal, PQ / Chesterville, ON |
| 4 | 1 | 0 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 25.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S |  |  | Montreal, PQ / Chesterville, ON |
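The `Unnamed: 0` column in the output above usually means the CSV was saved together with its index (for example by `to_csv()` with its default `index=True`). A minimal sketch, using a tiny in-memory CSV built from the first rows above rather than `titanic.csv`, of how `index_col=0` turns that column into the row index instead of a regular column:

```python
import io

import pandas as pd

# A tiny CSV whose first header cell is empty, the way to_csv() writes the index by default
csv_text = """,pclass,survived,sex
0,1,1,female
1,1,1,male
2,1,0,female
"""

# Default: the unnamed first column becomes a regular column called "Unnamed: 0"
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns.tolist())

# index_col=0: the first column is used as the row index instead
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed.columns.tolist())
```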


## Converting Column Values to Numbers

In [7]:
df.sex.head()

0    female
1      male
2    female
3      male
4    female
Name: sex, dtype: object

In [8]:
# Replace each string label with the integer assigned to it in the dict
df['sex_num'] = df.sex.map({'male': 0, 'female': 1})
df.sex_num.head()

0    1
1    0
2    1
3    0
4    1
Name: sex_num, dtype: int64
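`map()` with a dict only knows the keys you give it. A minimal sketch on a hypothetical Series containing an unexpected label (`'unknown'` is made up for illustration) shows what happens to values missing from the dict: they become NaN and the result is upcast to a float dtype. The `sex_num` column above is `int64` only because every value was covered by the dict.

```python
import pandas as pd

# Hypothetical Series: 'unknown' is not a key in the mapping dict
sex = pd.Series(['female', 'male', 'female', 'unknown'])

mapped = sex.map({'male': 0, 'female': 1})
print(mapped.tolist())  # the unmatched label becomes NaN
print(mapped.dtype)     # NaN forces a float dtype

# Quick check for labels the dict did not cover
print(sex[mapped.isna()].tolist())
```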

In [9]:
df.embarked.head(10)

0    S
1    S
2    S
3    S
4    S
5    S
6    S
7    S
8    S
9    C
Name: embarked, dtype: object

In [10]:
df['embarked_num'] = df.embarked.factorize()[0]
df.embarked_num.head(10)

0    0
1    0
2    0
3    0
4    0
5    0
6    0
7    0
8    0
9    1
Name: embarked_num, dtype: int64

### `factorize()`
`factorize()` returns a tuple whose first element is an array of integer codes (the new values), which is why I used `[0]` to extract them.

Here "S" has become 0 and "C" has become 1, and "Q" becomes 2. The codes follow the order in which each value first appears in the Series. If you need to reference the mapping, it's stored in the second element of the tuple:

In [11]:
df.embarked.factorize()[1]

Index(['S', 'C', 'Q'], dtype='object')
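A minimal sketch of the same `factorize()` call on a small hypothetical Series (not the real `embarked` column), unpacking the tuple into `codes` and `uniques` instead of indexing with `[0]` and `[1]`. It also shows that missing values receive the code -1 by default, and that `uniques` can decode the codes back to labels.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the embarked column, with one missing port
embarked = pd.Series(['S', 'S', 'C', np.nan, 'Q', 'C'])

# factorize() returns (codes, uniques); unpacking names both parts
codes, uniques = embarked.factorize()
print(codes)    # integer codes in order of first appearance; NaN gets -1
print(uniques)  # Index of the distinct non-missing values

# Decode: look each valid code up in uniques (the -1 entries are skipped)
valid = codes >= 0
decoded = uniques.take(codes[valid])
print(decoded.tolist())
```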

In [12]:
df.sibsp.head(10)

0    0
1    1
2    1
3    1
4    1
5    0
6    1
7    0
8    2
9    0
Name: sibsp, dtype: int64

In [15]:
# Compare each count with 0 (True/False), then convert the booleans to 1/0
df['sibsp_binary'] = (df.sibsp > 0).astype('int')
df.sibsp_binary.head(10)

0    0
1    1
2    1
3    1
4    1
5    0
6    1
7    0
8    1
9    0
Name: sibsp_binary, dtype: int32

Notice that the only value greater than 1 in these rows (the 2 at index 8) has been converted to 1: any positive count becomes 1, and 0 stays 0.
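The same 0/1 conversion on the first ten `sibsp` values from the output above, as a self-contained sketch. One deliberate difference, which assumes you want a fixed width: it requests `'int64'` explicitly, because `astype('int')` resolves to NumPy's default integer type, which is 32-bit on some platforms (such as Windows with NumPy 1.x). That is likely why the output above shows `int32`.

```python
import pandas as pd

# First ten sibsp values from the output above
sibsp = pd.Series([0, 1, 1, 1, 1, 0, 1, 0, 2, 0])

# Boolean mask: True where the passenger had at least one sibling or spouse aboard
has_sibsp = sibsp.gt(0)

# Request int64 explicitly instead of the platform-dependent 'int'
sibsp_binary = has_sibsp.astype('int64')
print(sibsp_binary.tolist())
print(sibsp_binary.dtype)
```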

In [16]:
df.head(10)

| Unnamed: 0 | pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest | sex_num | embarked_num | sibsp_binary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29.0 | 0 | 0 | 24160 | 211.3375 | B5 | S | 2 |  | St Louis, MO | 1 | 0 | 0 |
| 1 | 1 | 1 | Allison, Master. Hudson Trevor | male | 0.92 | 1 | 2 | 113781 | 151.55 | C22 C26 | S | 11 |  | Montreal, PQ / Chesterville, ON | 0 | 0 | 1 |
| 2 | 1 | 0 | Allison, Miss. Helen Loraine | female | 2.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S |  |  | Montreal, PQ / Chesterville, ON | 1 | 0 | 1 |
| 3 | 1 | 0 | Allison, Mr. Hudson Joshua Creighton | male | 30.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S |  | 135.0 | Montreal, PQ / Chesterville, ON | 0 | 0 | 1 |
| 4 | 1 | 0 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 25.0 | 1 | 2 | 113781 | 151.55 | C22 C26 | S |  |  | Montreal, PQ / Chesterville, ON | 1 | 0 | 1 |
| 5 | 1 | 1 | Anderson, Mr. Harry | male | 48.0 | 0 | 0 | 19952 | 26.55 | E12 | S | 3 |  | New York, NY | 0 | 0 | 0 |
| 6 | 1 | 1 | Andrews, Miss. Kornelia Theodosia | female | 63.0 | 1 | 0 | 13502 | 77.9583 | D7 | S | 10 |  | Hudson, NY | 1 | 0 | 1 |
| 7 | 1 | 0 | Andrews, Mr. Thomas Jr | male | 39.0 | 0 | 0 | 112050 | 0.0 | A36 | S |  |  | Belfast, NI | 0 | 0 | 0 |
| 8 | 1 | 1 | Appleton, Mrs. Edward Dale (Charlotte Lamson) | female | 53.0 | 2 | 0 | 11769 | 51.4792 | C101 | S | D |  | Bayside, Queens, NY | 1 | 0 | 1 |
| 9 | 1 | 0 | Artagaveytia, Mr. Ramon | male | 71.0 | 0 | 0 | PC 17609 | 49.5042 |  | C |  | 22.0 | Montevideo, Uruguay | 0 | 1 | 0 |
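In `embarked_num`, the codes follow appearance order ("S" = 0, "C" = 1). If alphabetical codes are preferred, here is a minimal sketch of two options on a hypothetical sample (not the real `embarked` column): `factorize(sort=True)`, or converting to the `category` dtype and reading `.cat.codes`. Both give C = 0, Q = 1, S = 2 here, and both mark missing values with -1.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the embarked column, including one missing value
embarked = pd.Series(['S', 'S', 'C', np.nan, 'Q', 'C'])

# Option 1: factorize with the uniques sorted before codes are assigned
sorted_codes, sorted_uniques = embarked.factorize(sort=True)
print(sorted_uniques.tolist(), sorted_codes.tolist())

# Option 2: category dtype; categories are sorted and missing values get code -1
as_category = embarked.astype('category')
print(as_category.cat.categories.tolist(), as_category.cat.codes.tolist())
```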
