# Lesson 16: Pandas Part 6b - Transforming Data in Pandas

## Using `map()`, `apply()`, `applymap()` to transform data

Sources:
>- [map doc](https://pandas.pydata.org/docs/reference/api/pandas.Series.map.html)
>- [apply doc](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html)
>- [applymap doc](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.applymap.html)


#### First import pandas and numpy and build a practice DataFrame

NumPy underlies many of pandas' core data structures and provides fast, vectorized mathematical functions.
>- numpy documentation: https://numpy.org/devdocs/user/whatisnumpy.html

In [56]:
# Imports

import os, pandas as pd

from google.colab import drive

drive.mount('/content/drive/')

os.chdir('/content/drive/MyDrive/Files_for_pandas/')



Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


In [57]:
import numpy as np





In [58]:
df = pd.DataFrame({'A': [1,2,3,4],
                   'B': [10,20,30,40],
                   'C': [20, 40, 60, 90]},
                   index= ['Row1', 'Row2', 'Row3', 'Row4']
                  )

## Using `map()` to modify a Series

#### First, add 10 to all numbers in column 'A'
>- Store the results in a new column named `'A+10'`

In [59]:
df['A+10'] = df['A'].map(lambda x: x+10)

In [60]:
df

|      | A | B  | C  | A+10 |
|------|---|----|----|------|
| Row1 | 1 | 10 | 20 | 11   |
| Row2 | 2 | 20 | 40 | 12   |
| Row3 | 3 | 30 | 60 | 13   |
| Row4 | 4 | 40 | 90 | 14   |
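As a small self-contained sketch of what `map()` accepts: besides a function, `Series.map()` also takes a dict, in which case each value is looked up as a key. The Series `s` and the dict below are made up for illustration and are not part of the lesson's `df`.

```python
import pandas as pd

# Illustrative Series mirroring column 'A' from the lesson
s = pd.Series([1, 2, 3, 4], index=['Row1', 'Row2', 'Row3', 'Row4'], name='A')

# map() with a function: the function is called once per element
plus_ten = s.map(lambda x: x + 10)

# map() with a dict: each value is looked up as a key;
# values with no matching key (here 4) become NaN
labels = s.map({1: 'one', 2: 'two', 3: 'three'})

print(plus_ten.tolist())  # [11, 12, 13, 14]
print(labels.tolist())    # ['one', 'two', 'three', nan]
```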


#### We can also use conditional logic
>- Add a new column that stores a `0` for all values in 'A' below 3 and `1` for all values 3 and above
>- Name the new column `flag3`

In [61]:
df['flag3'] = df['A'].map(lambda x: 0 if x<3 else 1)

In [62]:
df

|      | A | B  | C  | A+10 | flag3 |
|------|---|----|----|------|-------|
| Row1 | 1 | 10 | 20 | 11   | 0     |
| Row2 | 2 | 20 | 40 | 12   | 0     |
| Row3 | 3 | 30 | 60 | 13   | 1     |
| Row4 | 4 | 40 | 90 | 14   | 1     |


### If you have more conditions or more complicated logic, define a custom function
>- Then pass your function to `map()`

In [63]:
def myfilter(x):

  if x<2:
    return 0

  elif x==3:
    return 3

  else:
    return 1

In [64]:
df['flagX']= df['A'].map(myfilter)

In [65]:
df

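|      | A | B  | C  | A+10 | flag3 | flagX |
|------|---|----|----|------|-------|-------|
| Row1 | 1 | 10 | 20 | 11   | 0     | 0     |
| Row2 | 2 | 20 | 40 | 12   | 0     | 1     |
| Row3 | 3 | 30 | 60 | 13   | 1     | 3     |
| Row4 | 4 | 40 | 90 | 14   | 1     | 1     |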


## Using `apply()` to apply a function along an axis of a DataFrame or to the values of a Series

## Let's use `apply()` to sum all the values across the columns
>- Name the column `rowTot`
>- We will only sum columns 'A', 'B', 'C'

### First, define a function

In [66]:
def mysum(value):
  return value.sum()

### Then, use `apply()` to apply your function across the columns
>- `axis=1` tells `apply()` to pass each row to the function, so the sum runs across the columns of that row

In [67]:
df['rowTot']= df[['A','B','C']].apply(mysum, axis= 1)

df

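|      | A | B  | C  | A+10 | flag3 | flagX | rowTot |
|------|---|----|----|------|-------|-------|--------|
| Row1 | 1 | 10 | 20 | 11   | 0     | 0     | 31     |
| Row2 | 2 | 20 | 40 | 12   | 0     | 1     | 62     |
| Row3 | 3 | 30 | 60 | 13   | 1     | 3     | 93     |
| Row4 | 4 | 40 | 90 | 14   | 1     | 1     | 134    |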


#### You can also use a lambda function, because our custom sum function is so simple

In [68]:
df['rowTot1']= df[['A','B','C']].apply(lambda x: x.sum(), axis= 1)
df

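|      | A | B  | C  | A+10 | flag3 | flagX | rowTot | rowTot1 |
|------|---|----|----|------|-------|-------|--------|---------|
| Row1 | 1 | 10 | 20 | 11   | 0     | 0     | 31     | 31      |
| Row2 | 2 | 20 | 40 | 12   | 0     | 1     | 62     | 62      |
| Row3 | 3 | 30 | 60 | 13   | 1     | 3     | 93     | 93      |
| Row4 | 4 | 40 | 90 | 14   | 1     | 1     | 134    | 134     |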


### We can also use `apply()` to sum down the rows of each column
>- We will label the sum row `colTot`
>- Passing `axis=0` (the default) tells `apply()` to pass each column to the function, so the sum runs down the rows

In [69]:
df.loc['colTot']= df.apply(lambda x:x.sum(), axis=0)

df

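|        | A  | B   | C   | A+10 | flag3 | flagX | rowTot | rowTot1 |
|--------|----|-----|-----|------|-------|-------|--------|---------|
| Row1   | 1  | 10  | 20  | 11   | 0     | 0     | 31     | 31      |
| Row2   | 2  | 20  | 40  | 12   | 0     | 1     | 62     | 62      |
| Row3   | 3  | 30  | 60  | 13   | 1     | 3     | 93     | 93      |
| Row4   | 4  | 40  | 90  | 14   | 1     | 1     | 134    | 134     |
| colTot | 10 | 100 | 210 | 50   | 2     | 5     | 320    | 320     |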
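Because the total is written back into the same DataFrame, running the cell a second time would include the existing `colTot` row in the new sum. The sketch below illustrates the effect on a small stand-in frame and one possible way to avoid it by summing only the data rows. The names `naive`, `safe`, and `data_rows` are hypothetical.

```python
import pandas as pd

# Small stand-in frame for the illustration
df_demo = pd.DataFrame({'A': [1, 2, 3, 4],
                        'B': [10, 20, 30, 40]},
                       index=['Row1', 'Row2', 'Row3', 'Row4'])

# Running the total cell twice: the second run also sums the old 'colTot' row
naive = df_demo.copy()
naive.loc['colTot'] = naive.apply(lambda x: x.sum(), axis=0)
naive.loc['colTot'] = naive.apply(lambda x: x.sum(), axis=0)

# Restricting the sum to the data rows gives the same total on every run
safe = df_demo.copy()
data_rows = ['Row1', 'Row2', 'Row3', 'Row4']
for _ in range(2):
    safe.loc['colTot'] = safe.loc[data_rows].apply(lambda x: x.sum(), axis=0)

print(naive.loc['colTot'].tolist())  # [20, 200]
print(safe.loc['colTot'].tolist())   # [10, 100]
```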


#### `apply()` also works on a subset of rows
>- Here `colTot2` sums only `Row1` and `Row2`

In [70]:
df.loc['colTot2']= df.loc[['Row1','Row2'],].apply(lambda x:x.sum(), axis=0)

df

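|         | A  | B   | C   | A+10 | flag3 | flagX | rowTot | rowTot1 |
|---------|----|-----|-----|------|-------|-------|--------|---------|
| Row1    | 1  | 10  | 20  | 11   | 0     | 0     | 31     | 31      |
| Row2    | 2  | 20  | 40  | 12   | 0     | 1     | 62     | 62      |
| Row3    | 3  | 30  | 60  | 13   | 1     | 3     | 93     | 93      |
| Row4    | 4  | 40  | 90  | 14   | 1     | 1     | 134    | 134     |
| colTot  | 10 | 100 | 210 | 50   | 2     | 5     | 320    | 320     |
| colTot2 | 3  | 30  | 60  | 23   | 0     | 1     | 93     | 93      |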
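The row selection in the cell above can be written a little more explicitly. As a sketch on a stand-in frame (`df_demo`, `subset_a`, and `subset_b` are illustrative names), passing a list of row labels to `.loc` selects those rows, and adding `:` as the second argument spells out "all columns":

```python
import pandas as pd

# Stand-in for columns 'A', 'B', 'C' of the lesson's DataFrame
df_demo = pd.DataFrame({'A': [1, 2, 3, 4],
                        'B': [10, 20, 30, 40],
                        'C': [20, 40, 60, 90]},
                       index=['Row1', 'Row2', 'Row3', 'Row4'])

# Two equivalent ways to select Row1 and Row2 with every column
subset_a = df_demo.loc[['Row1', 'Row2']]
subset_b = df_demo.loc[['Row1', 'Row2'], :]

# Column totals over just those two rows
col_tot2 = subset_a.apply(lambda x: x.sum(), axis=0)

print(subset_a.equals(subset_b))  # True
print(col_tot2.tolist())          # [3, 30, 60]
```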


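#### Use `.loc` to look up a value by row and column label
>- Passing the column as a list (`['A']`) returns a one-element Series rather than a scalar

In [71]: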
df.loc[ 'Row1',['A'] ]

A    1
Name: Row1, dtype: int64
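The output above is a Series because the column was given as a list. A short sketch of the difference, using a stand-in frame (`df_demo`, `as_series`, and `as_scalar` are illustrative names):

```python
import pandas as pd

# Stand-in frame for the illustration
df_demo = pd.DataFrame({'A': [1, 2, 3, 4],
                        'B': [10, 20, 30, 40]},
                       index=['Row1', 'Row2', 'Row3', 'Row4'])

# Column label inside a list: the result is a Series named after the row
as_series = df_demo.loc['Row1', ['A']]

# Column label as a plain string: the result is a single value
as_scalar = df_demo.loc['Row1', 'A']

print(type(as_series).__name__)  # Series
print(as_series.tolist())        # [1]
print(as_scalar)                 # 1
```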

## Using `applymap()` to apply a function to every element of a DataFrame

## Square everything in the DataFrame using `np.square`
>- Check here for other numpy math functions: https://numpy.org/doc/stable/reference/routines.math.html

In [72]:
df.applymap(np.square)

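|         | A   | B     | C     | A+10 | flag3 | flagX | rowTot | rowTot1 |
|---------|-----|-------|-------|------|-------|-------|--------|---------|
| Row1    | 1   | 100   | 400   | 121  | 0     | 0     | 961    | 961     |
| Row2    | 4   | 400   | 1600  | 144  | 0     | 1     | 3844   | 3844    |
| Row3    | 9   | 900   | 3600  | 169  | 1     | 9     | 8649   | 8649    |
| Row4    | 16  | 1600  | 8100  | 196  | 1     | 1     | 17956  | 17956   |
| colTot  | 100 | 10000 | 44100 | 2500 | 4     | 25    | 102400 | 102400  |
| colTot2 | 9   | 900   | 3600  | 529  | 0     | 1     | 8649   | 8649    |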
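Note that `DataFrame.applymap()` is deprecated since pandas 2.1 in favor of `DataFrame.map()`, which does the same element-wise job. The sketch below is one possible way to write code that runs on either side of that change. The helper `elementwise` and the frame `df_demo` are hypothetical names, not part of pandas or the lesson.

```python
import numpy as np
import pandas as pd

# Stand-in for columns 'A', 'B', 'C' of the lesson's DataFrame
df_demo = pd.DataFrame({'A': [1, 2, 3, 4],
                        'B': [10, 20, 30, 40],
                        'C': [20, 40, 60, 90]},
                       index=['Row1', 'Row2', 'Row3', 'Row4'])

def elementwise(frame, func):
    # Hypothetical helper: prefer DataFrame.map (pandas 2.1+),
    # fall back to applymap on older versions
    if hasattr(pd.DataFrame, 'map'):
        return frame.map(func)
    return frame.applymap(func)

squared = elementwise(df_demo, np.square)

print(squared.loc['Row4'].tolist())  # [16, 1600, 8100]
```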


#### You can also use a `lambda` function

In [73]:
df.applymap(lambda x: x**2 )

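|         | A   | B     | C     | A+10 | flag3 | flagX | rowTot | rowTot1 |
|---------|-----|-------|-------|------|-------|-------|--------|---------|
| Row1    | 1   | 100   | 400   | 121  | 0     | 0     | 961    | 961     |
| Row2    | 4   | 400   | 1600  | 144  | 0     | 1     | 3844   | 3844    |
| Row3    | 9   | 900   | 3600  | 169  | 1     | 9     | 8649   | 8649    |
| Row4    | 16  | 1600  | 8100  | 196  | 1     | 1     | 17956  | 17956   |
| colTot  | 100 | 10000 | 44100 | 2500 | 4     | 25    | 102400 | 102400  |
| colTot2 | 9   | 900   | 3600  | 529  | 0     | 1     | 8649   | 8649    |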
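For purely arithmetic operations like squaring, an element-wise function call is not the only option: the operation can be applied to the whole DataFrame at once. A minimal sketch on a stand-in frame (`df_demo` is an illustrative name), shown as an alternative rather than as the lesson's method:

```python
import numpy as np
import pandas as pd

# Stand-in for columns 'A', 'B', 'C' of the lesson's DataFrame
df_demo = pd.DataFrame({'A': [1, 2, 3, 4],
                        'B': [10, 20, 30, 40],
                        'C': [20, 40, 60, 90]},
                       index=['Row1', 'Row2', 'Row3', 'Row4'])

# The power operator works on the whole DataFrame
squared_pow = df_demo ** 2

# NumPy functions such as np.square can also take the DataFrame directly
squared_np = np.square(df_demo)

print(squared_pow.loc['Row3'].tolist())  # [9, 900, 3600]
print(squared_np.loc['Row3'].tolist())   # [9, 900, 3600]
```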
