
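**Applying Functions to DataFrames in pandas**
The apply() function applies a function along an axis of a DataFrame, that is, to each column or to each row. The applymap() function applies a function elementwise, to every individual value; in pandas 2.1 and later, applymap() is deprecated in favor of the equivalent DataFrame.map().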
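
To see the difference between the two, the sketch below passes a small test function to each and checks what it receives. It assumes a toy DataFrame with the same values used later in this tutorial, and it falls back to applymap() when DataFrame.map() is not available (pandas older than 2.1):

~~~python
import pandas as pd

simple = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# apply() calls the function once per column and passes a whole Series
receives_series = simple.apply(lambda col: isinstance(col, pd.Series))
print(receives_series)

# Elementwise mapping: DataFrame.map() in pandas >= 2.1, applymap() before that
elementwise = simple.map if hasattr(simple, 'map') else simple.applymap
receives_scalar = elementwise(lambda value: not isinstance(value, pd.Series))
print(receives_scalar)
~~~

Every entry in both results is True: apply() sees Series, while the elementwise version sees single values.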



## **Apply()**

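In pandas 1.x, DataFrame.apply() has the following signature:

~~~python
def apply(
    self,
    func,
    axis=0,
    raw=False,
    result_type=None,
    args=(),
    **kwargs,
): ...
~~~

Versions before 1.0 also accepted the **broadcast** and **reduce** parameters, which were deprecated in favor of **result_type** and later removed; pandas 2.1 and later add further optional parameters such as **by_row**.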

The most important parameters are **func**, the function to apply; **axis**, which controls whether the function is applied to each column or to each row; and **args** and **kwargs**, which pass additional positional and keyword arguments to the function.
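
Because the parameter list has changed between pandas releases, it can help to check the signature of the version you actually have installed. A minimal sketch using the standard-library inspect module:

~~~python
import inspect

import pandas as pd

# Names of the parameters accepted by DataFrame.apply in the installed pandas
apply_params = list(inspect.signature(pd.DataFrame.apply).parameters)

print(pd.__version__)
print(apply_params)
~~~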

apply() is a pandas DataFrame method, so you need the pandas package to use it; this tutorial also uses NumPy.

In [None]:
#If using a local Python instance

#pip install pandas
#pip install numpy

In [2]:
import pandas as pd
import numpy as np

In [3]:
simple_data = {'A':[1, 2, 3],'B':[10, 20, 30]}
simple = pd.DataFrame(simple_data)
print(simple)

   A   B
0  1  10
1  2  20
2  3  30


### ***Passing a function***

Let's make a basic function to apply to our DataFrame. By default, apply() calls it once for each column, passing that column as a Series. These examples only use the **func** argument.

In [None]:
def cube(x):
  return x * x * x 

In [None]:
cubed_simple = simple.apply(cube)
print(cubed_simple)
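
Because apply() hands cube() one whole column at a time, x * x * x is computed on the entire Series at once. As a quick check, the result should match the vectorized expression simple ** 3; this sketch rebuilds the DataFrame so it runs on its own:

~~~python
import pandas as pd

simple = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

def cube(x):
    # x is a Series (one column), so this multiplies whole columns
    return x * x * x

cubed_simple = simple.apply(cube)
print(cubed_simple)

# Same values and dtypes as raising the whole DataFrame to the third power
print(cubed_simple.equals(simple ** 3))
~~~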

You can also pass a NumPy universal function (ufunc), such as **np.cbrt**, which computes the cube root:

In [None]:
cube_root_simple = simple.apply(np.cbrt)
print(cube_root_simple)
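
NumPy ufuncs also work on a DataFrame directly, without apply(); pandas applies them elementwise and keeps the row and column labels. A small sketch comparing the two forms, and confirming that cubing the cube roots gives back the original values up to floating-point rounding:

~~~python
import numpy as np
import pandas as pd

simple = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

via_apply = simple.apply(np.cbrt)
direct = np.cbrt(simple)  # ufunc called on the DataFrame itself

print(direct)
print(via_apply.equals(direct))

# Cubing the cube roots recovers the original values (as floats)
print(np.allclose((direct ** 3).to_numpy(), simple.to_numpy()))
~~~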

Applying these one after the other returns our original data, although the values are now floats because np.cbrt returns floating-point results:

In [None]:
original_simple = simple.apply(cube).apply(np.cbrt)
print(original_simple)

     A     B
0  1.0  10.0
1  2.0  20.0
2  3.0  30.0
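
The round trip changes the dtype from int64 to float64. If you need integers again, one option is to round and then cast; rounding first guards against tiny floating-point errors that a plain cast would truncate. A minimal sketch:

~~~python
import numpy as np
import pandas as pd

simple = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

def cube(x):
    return x * x * x

original_simple = simple.apply(cube).apply(np.cbrt)
print(original_simple.dtypes)

# Round before casting so a value like 9.999999 becomes 10, not 9
restored = original_simple.round().astype('int64')
print(restored.equals(simple))
~~~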


### ***Choosing an axis***
The default, axis=0, applies the function to each column (it works down the rows). Setting axis=1 applies it to each row instead. A DataFrame is two-dimensional, so these are its only two axes.



In [None]:
column_sum = simple.apply(np.sum, axis=0)  # sum of each column
print(column_sum, '\n')

row_sum = simple.apply(np.sum, axis=1)  # sum of each row
print(row_sum)
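
axis also accepts the names 'index' (same as 0) and 'columns' (same as 1). With axis=1, each call receives one row as a Series indexed by the column labels, so a function can combine values from several columns. A small sketch:

~~~python
import numpy as np
import pandas as pd

simple = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# axis='index' is the same as axis=0: one call per column
column_sum = simple.apply(np.sum, axis='index')
# axis='columns' is the same as axis=1: one call per row
row_sum = simple.apply(np.sum, axis='columns')

# Each row is a Series, so values can be looked up by column label
difference = simple.apply(lambda row: row['B'] - row['A'], axis='columns')

print(column_sum, row_sum, difference, sep='\n\n')
~~~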

### ***Using lambda***
A **lambda** function is a small, anonymous function that you can use within the **apply()** function.

A **lambda** takes any number of parameters and returns the value of a single expression. Its syntax is `lambda parameter_list: expression`:
~~~python
var = lambda x, y: x * y

var(2, 4)  # returns 8
~~~
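
A lambda is shorthand for a def whose body is a single return statement. The two hypothetical definitions below behave the same way; the only visible difference is that the lambda has no name of its own:

~~~python
def multiply_def(x, y):
    return x * y

multiply_lambda = lambda x, y: x * y  # same behavior as multiply_def

print(multiply_def(2, 4), multiply_lambda(2, 4))

# The lambda is anonymous: its __name__ is the placeholder '<lambda>'
print(multiply_def.__name__, multiply_lambda.__name__)
~~~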

Let's use a lambda to multiply each value by 8.

In [None]:
scaled_simple = simple.apply(lambda x: x * 8)
print(scaled_simple)
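
Since the lambda receives a whole column, it can also call Series methods. For example, this sketch divides each column by its own maximum, so every column ends at 1.0:

~~~python
import pandas as pd

simple = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# col.max() is computed separately for column A (3) and column B (30)
normalized_simple = simple.apply(lambda col: col / col.max())
print(normalized_simple)
~~~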

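The **lambda** function's return value does not have to depend on its input; it only has to return something. For example, we can replace the values of a DataFrame with a **list** or a **Series**. Note how differently these are applied: with the default axis=0, each column is replaced by the two-element list, so the result has two rows, while with axis=1 each row's returned **Series** is expanded into columns labeled 'height' and 'width'.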

In [None]:
replaced_simple_list = simple.apply(lambda x: [10, 100])

replaced_simple_series = simple.apply(lambda x: pd.Series([10, 100], index=['height', 'width']), axis=1)
print(replaced_simple_list, replaced_simple_series, sep='\n\n')
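
The result_type parameter from the signature controls how list-like return values are handled when axis=1. By default, a list returned for each row stays a single object, giving a Series of lists; result_type='expand' spreads each list across columns instead. A short sketch:

~~~python
import pandas as pd

simple = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# Default (result_type=None): one list per row, stored in a Series
as_lists = simple.apply(lambda row: [10, 100], axis=1)

# result_type='expand': each list becomes a row with columns 0 and 1
expanded = simple.apply(lambda row: [10, 100], axis=1, result_type='expand')

print(as_lists, expanded, sep='\n\n')
~~~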

### ***Using Arguments***

Sometimes the function you are calling needs inputs in addition to the values from the DataFrame. You can pass these additional positional arguments as a **tuple** through the **args** parameter (default **args=()**). Here we scale the data by a factor, y, and shift it by an offset, z.

In [None]:
def scale_and_adjust(x, y, z):
  return x * y + z

adjusted_simple = simple.apply(scale_and_adjust, args=(2,1))
print(adjusted_simple)
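
args must be a tuple, so a single extra argument needs a trailing comma: args=(2,). The same values can also be bound with a lambda instead. In this sketch, scale is a hypothetical helper, and scale_and_adjust is redefined so the block runs on its own:

~~~python
import pandas as pd

simple = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

def scale(x, y):
    return x * y

def scale_and_adjust(x, y, z):
    return x * y + z

# One extra argument: (2,) is a tuple, while (2) would just be the integer 2
doubled_simple = simple.apply(scale, args=(2,))

# Binding the extra arguments with a lambda instead of args
adjusted_via_args = simple.apply(scale_and_adjust, args=(2, 1))
adjusted_via_lambda = simple.apply(lambda x: scale_and_adjust(x, 2, 1))

print(doubled_simple)
print(adjusted_via_args.equals(adjusted_via_lambda))
~~~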

### ***Using kwargs***

Additional arguments can also be passed as keyword arguments directly in the apply() call; any keyword that apply() does not recognize is forwarded to the function. Here the keyword argument **m** acts as a switch in the function **bamboozal**.

In [None]:
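def bamboozal(x, y, z, m=0):
  # m selects the behavior: 0 scales and adjusts, 1 returns a constant string,
  # anything else returns an error message
  if m == 0:
    return x * y + z
  elif m == 1:
    return "almond"
  else:
    return "invalid 'm' argument"

# y=2 and z=1 go through args; m is forwarded to bamboozal as a keyword
bamboozaled_simple_1 = simple.apply(bamboozal, args=(2, 1))       # m defaults to 0
bamboozaled_simple_2 = simple.apply(bamboozal, args=(2, 1), m=1)  # one string per column
bamboozaled_simple_3 = simple.apply(bamboozal, args=(2, 1), m=2)

print(bamboozaled_simple_1, bamboozaled_simple_2, bamboozaled_simple_3, sep='\n\n')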
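
Since unrecognized keywords are forwarded to the function, y and z could be passed by name as well. This sketch redefines bamboozal so the block runs on its own, and checks that the positional and keyword styles agree:

~~~python
import pandas as pd

simple = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

def bamboozal(x, y, z, m=0):
    if m == 0:
        return x * y + z
    elif m == 1:
        return "almond"
    else:
        return "invalid 'm' argument"

# Extra arguments passed positionally through args
positional = simple.apply(bamboozal, args=(2, 1))
# The same extra arguments passed by keyword instead
by_keyword = simple.apply(bamboozal, y=2, z=1)

print(positional.equals(by_keyword))

# With m=1 each column returns one string, so the result is a Series
almonds = simple.apply(bamboozal, y=2, z=1, m=1)
print(almonds)
~~~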