## The .apply() method

Here we will learn about a very useful DataFrame method known as **apply**. It lets us apply custom functions to a DataFrame column, with the function broadcast over every value in that column.

In [2]:
import numpy as np 
import pandas as pd
import os

os.chdir("C:\\Users\\sfrie\\Python\\pandas\\udemy_pandas_files")
df = pd.read_csv("tips.csv")

## Apply on a Single Column

#### If we want to get the last four digits of the credit card number, we could look for a built-in pandas function that does it. However, when we check the data types, we run into a difficulty:


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   total_bill        244 non-null    float64
 1   tip               244 non-null    float64
 2   sex               244 non-null    object 
 3   smoker            244 non-null    object 
 4   day               244 non-null    object 
 5   time              244 non-null    object 
 6   size              244 non-null    int64  
 7   price_per_person  244 non-null    float64
 8   Payer Name        244 non-null    object 
 9   CC Number         244 non-null    int64  
 10  Payment ID        244 non-null    object 
dtypes: float64(3), int64(2), object(6)
memory usage: 21.1+ KB


#### The CC Number column is an integer. We cannot use slice notation on an integer, such as `123456[0]`. Instead, we need to convert it to a string first, as in `str(123456)[0]`.
#### pandas does have string accessors for this kind of task (for example `.astype(str).str[-4:]`), but the apply method is more general: it lets us apply any function of our own.

In [8]:
def last_four(num):
    return str(num)[-4:]
df['last_four'] = df['CC Number'].apply(last_four)
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,3410
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,9230
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,1322
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,5994
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,7221
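
As a small self-contained sketch of the same idea, the block below builds a made-up two-row DataFrame (the card numbers are illustrative and not taken from tips.csv) and compares `.apply(last_four)` with pandas' string accessor, `.astype(str).str[-4:]`. Note that the result is a string, so a leading zero in the last four digits is preserved.

```python
import pandas as pd

# Hypothetical card numbers, made up for illustration only
demo = pd.DataFrame({"CC Number": [1234567812345678, 9876543210000344]})

def last_four(num):
    # Convert the integer to a string, then slice off the last four characters
    return str(num)[-4:]

# Custom function applied to every value in the column
via_apply = demo["CC Number"].apply(last_four)

# Same result using the built-in string accessor
via_str = demo["CC Number"].astype(str).str[-4:]

print(via_apply.tolist())                       # ['5678', '0344']
print(via_apply.tolist() == via_str.tolist())   # True
```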


#### Another example would be creating a Yelp-style price rating based on the total bill. Depending on the price range, the function assigns a different number of '$' signs.


In [10]:
def yelp(price):
    if price < 10:
        return '$'
    elif price >= 10 and price < 30:
        return '$$'
    else:
        return '$$$'
df['yelp'] = df['total_bill'].apply(yelp)
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four,yelp
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,3410,$$
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,9230,$$
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,1322,$$
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,5994,$$
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,7221,$$
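
For fixed price bands like these, one possible alternative is `pd.cut`, which bins numeric values into labelled intervals. The sketch below uses a made-up Series of bills (not the tips.csv data) and checks that the two approaches agree, including at the boundary values 10 and 30.

```python
import numpy as np
import pandas as pd

# Hypothetical bill amounts, chosen to include the boundary values 10 and 30
bills = pd.Series([3.07, 10.0, 16.99, 30.0, 48.27])

def yelp(price):
    if price < 10:
        return '$'
    elif price < 30:
        return '$$'
    else:
        return '$$$'

via_apply = bills.apply(yelp)

# right=False makes each bin closed on the left: [-inf, 10), [10, 30), [30, inf)
via_cut = pd.cut(
    bills,
    bins=[-np.inf, 10, 30, np.inf],
    right=False,
    labels=['$', '$$', '$$$'],
)

print(via_apply.tolist())              # ['$', '$$', '$$', '$$$', '$$$']
print(via_cut.astype(str).tolist())    # ['$', '$$', '$$', '$$$', '$$$']
```

`pd.cut` returns a categorical column rather than plain strings, which may or may not be what you want downstream.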


In [12]:
# Filter the DataFrame to the rows where the yelp rating is '$'
df[df['yelp'] == '$'].head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four,yelp
6,8.77,2.0,Male,No,Sun,Dinner,2,4.38,Kristopher Johnson,2223727524230344,Sun5985,0344,$
30,9.55,1.45,Male,No,Sat,Dinner,2,4.78,Grant Hall,30196517521548,Sat4099,1548,$
43,9.68,1.32,Male,No,Sun,Dinner,2,4.84,Christopher Spears,4387671121369212,Sun3279,9212,$
53,9.94,1.56,Male,No,Sun,Dinner,2,4.97,Curtis Morgan,4628628020417301,Sun4561,7301,$
67,3.07,1.0,Female,Yes,Sat,Dinner,1,3.07,Tiffany Brock,4359488526995267,Sat3455,5267,$
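
The filter above works in two steps: the comparison produces a boolean Series, and indexing the DataFrame with that Series keeps only the rows marked True. A minimal sketch on a made-up three-row DataFrame (values are illustrative):

```python
import pandas as pd

# Hypothetical data for illustration only
demo = pd.DataFrame({
    "total_bill": [8.77, 16.99, 9.55],
    "yelp": ["$", "$$", "$"],
})

# Step 1: the comparison yields one True/False value per row
mask = demo["yelp"] == "$"

# Step 2: indexing with the mask keeps only the True rows
cheap = demo[mask]

print(mask.tolist())          # [True, False, True]
print(cheap.index.tolist())   # [0, 2]
```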


## Apply on Multiple Columns

In [13]:
# Lambda
# We can create a simple function that takes a number and returns it multiplied by two
def simple(num):
    return num * 2
# Another way to create a function is with lambda
lambda num: num * 2
# lambda is great for anonymous functions which you only plan to use once.

<function __main__.<lambda>(num)>

In [15]:
df['total_bill'].apply(lambda num:num*2).head()

0    33.98
1    20.68
2    42.02
3    47.36
4    49.18
Name: total_bill, dtype: float64
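
To make the equivalence concrete, the sketch below applies the named function and the lambda to the same made-up Series and checks that they match. It also compares both against plain arithmetic on the Series (`bills * 2`), which for a simple operation like this gives the same result without calling apply at all.

```python
import pandas as pd

# Hypothetical bill amounts for illustration
bills = pd.Series([16.99, 10.34, 21.01], name="total_bill")

def simple(num):
    return num * 2

doubled_named = bills.apply(simple)                   # named function
doubled_lambda = bills.apply(lambda num: num * 2)     # anonymous function
doubled_vector = bills * 2                            # arithmetic on the whole Series

print(doubled_named.equals(doubled_lambda))    # True
print(doubled_lambda.equals(doubled_vector))   # True
```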

### There are two ways we can apply a function to multiple columns:
#### 1) lambdas
#### 2) numpy vectorize


#### Lambdas

In [28]:
def quality(total_bill, tip):
    if tip / total_bill > 0.25:
        return 'Generous'
    else:
        return 'Other'

# The lambda receives one row at a time (as a Series) and passes that row's total_bill and tip to quality().
# axis=1, the second argument to apply, makes the function run once per row instead of once per column.
df['Quality'] = df[['total_bill', 'tip']].apply(lambda row: quality(row['total_bill'], row['tip']), axis=1)
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four,yelp,Quality
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,3410,$$,Other
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,9230,$$,Other
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,1322,$$,Other
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,5994,$$,Other
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,7221,$$,Other
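
To see what the lambda actually receives when `axis=1` is set, here is a minimal sketch on a made-up two-row DataFrame (the values are illustrative). Each call gets a single row as a pandas Series, so columns can be looked up by name.

```python
import pandas as pd

# Hypothetical bills and tips for illustration
demo = pd.DataFrame({"total_bill": [20.0, 10.0], "tip": [2.0, 4.0]})

def quality(total_bill, tip):
    if tip / total_bill > 0.25:
        return "Generous"
    return "Other"

# With axis=1 the function is called once per row, and each row is a Series
row_types = demo.apply(lambda row: type(row).__name__, axis=1)

# Look up the two columns by name on each row and pass them to quality()
labels = demo.apply(lambda row: quality(row["total_bill"], row["tip"]), axis=1)

print(row_types.tolist())   # ['Series', 'Series']
print(labels.tolist())      # ['Other', 'Generous']
```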


#### Numpy Vectorize

In [26]:
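# np.vectorize gives a more compact call and is typically faster than a row-wise apply, because it avoids building a Series for every row.
# Note that np.vectorize still calls quality() once per element under the hood; it is a convenience wrapper, not true vectorization.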
df['Quality'] = np.vectorize(quality)(df['total_bill'], df['tip'])
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size,price_per_person,Payer Name,CC Number,Payment ID,last_four,yelp,Quality
0,16.99,1.01,Female,No,Sun,Dinner,2,8.49,Christy Cunningham,3560325168603410,Sun2959,3410,$$,Other
1,10.34,1.66,Male,No,Sun,Dinner,3,3.45,Douglas Tucker,4478071379779230,Sun4608,9230,$$,Other
2,21.01,3.5,Male,No,Sun,Dinner,3,7.0,Travis Walters,6011812112971322,Sun4458,1322,$$,Other
3,23.68,3.31,Male,No,Sun,Dinner,2,11.84,Nathaniel Harris,4676137647685994,Sun5260,5994,$$,Other
4,24.59,3.61,Female,No,Sun,Dinner,4,6.15,Tonya Carter,4832732618637221,Sun2251,7221,$$,Other
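
When the logic is a simple two-way condition like `quality`, another option is `np.where`, which evaluates the condition on whole columns at once. The sketch below is an illustration on made-up data rather than a benchmark; it only checks that `np.vectorize` and `np.where` produce the same labels.

```python
import numpy as np
import pandas as pd

# Hypothetical bills and tips; the last row sits exactly on the 0.25 threshold
demo = pd.DataFrame({
    "total_bill": [10.0, 20.0, 8.0],
    "tip": [4.0, 2.0, 2.0],
})

def quality(total_bill, tip):
    if tip / total_bill > 0.25:
        return "Generous"
    return "Other"

# np.vectorize wraps quality() so it accepts whole columns,
# but it still calls the Python function once per element
via_vectorize = np.vectorize(quality)(demo["total_bill"], demo["tip"])

# np.where evaluates the condition on the full columns in one step
via_where = np.where(demo["tip"] / demo["total_bill"] > 0.25, "Generous", "Other")

print(via_vectorize.tolist())   # ['Generous', 'Other', 'Other']
print(via_where.tolist())       # ['Generous', 'Other', 'Other']
```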
