# Python User Defined Functions

* A function has three parts: 1) input, 2) function body, 3) output.
* The idea is to group commonly or repeatedly performed tasks into a function, so that instead of writing the same code again and again for different inputs, we can simply call the function.
* Functions that ship with Python are called built-in functions, for example print(). We can also create our own functions; these are known as user-defined functions.
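
As a minimal sketch of the reuse idea (the function name `rectangle_area` and the sample values are made up for illustration), the calculation is written once and then called with different inputs:

```python
# Hypothetical example: one function reused for several inputs
def rectangle_area(width, height):
    # Input: width and height; body: the multiplication; output: the area
    return width * height

# Call the same function with different inputs instead of repeating the formula
area_small = rectangle_area(2, 3)
area_large = rectangle_area(10, 4)
print(area_small, area_large)  # 6 40
```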

### Steps for writing a user-defined function
* Step 1: The def keyword is used to declare a user-defined function, followed by the function name, parentheses, and a colon.
* Step 2: An indented block of statements follows the function name and arguments; this block is the body of the function.

In [None]:
def function_name():
    statements

function_name() # calling the function

#### Non Parameterised Functions

In [1]:
# This function takes no input, so its output is always the same
def fun():
    print("Inside Function")

* Function name: "fun"
* It is a non-parameterised function, as it does not take any input.
* Whenever we call this function, it prints "Inside Function".

In [2]:
fun()

Inside Function


#### Parameterised Function

* A function may take argument(s), also called parameters, as input. They are listed inside the parentheses just after the function name, and the closing parenthesis is followed by a colon.

In [None]:
def function_name(argument_1, argument_2, ....):
    statements
    .
    .
    .

In [3]:
def evenOdd(x):
    if (x % 2 == 0):
        print("Even")
    else:
        print("Odd")

In [4]:
evenOdd(-2)

Even


#### Default Arguments 

* A default argument is a parameter that assumes a default value if a value is not provided in the function call for that argument.

In [5]:
def myFun(x , y=50):
    print(x)
    print(y)

In [6]:
myFun(20) 

20
50


In [7]:
myFun(30)

30
50
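
Neither call above overrides the default. As a small sketch (the function name `show_values` is hypothetical, chosen so that it does not clash with myFun), passing a second argument, by position or by keyword, replaces the default value:

```python
def show_values(x, y=50):
    # y falls back to 50 only when the caller does not supply it
    return (x, y)

print(show_values(20))        # (20, 50): default used
print(show_values(20, 10))    # (20, 10): default overridden by position
print(show_values(20, y=99))  # (20, 99): default overridden by keyword
```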


In [8]:
def students(x,y):
    print(x ,y)

In [9]:
students('A','B')

A B


#### Function with return value

* A return statement is used to end the execution of the function call and “returns” the result (value of the expression following the return keyword) to the caller.

In [None]:
def fun():
    statements
    .
    .
    return [expression]

In [10]:
def addition(a,b):
    return a+b

In [11]:
addition(2,5)

7

In [12]:
c = addition(2,5)
c

7
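
To illustrate that return ends the execution of the call, here is a minimal sketch (the function name `absolute_value` is hypothetical): once a return statement runs, the remaining lines of the function are skipped.

```python
def absolute_value(x):
    if x < 0:
        return -x   # the function ends here for negative inputs
    return x        # reached only when x is not negative

print(absolute_value(-4))  # 4
print(absolute_value(3))   # 3
```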

In [13]:
def addition(a,b):
    print(a+b)

In [14]:
addition(2,3)

5


**print() only displays the value on the console, whereas a return statement hands the value back to the caller, so it can also be assigned to a variable.**

**Ques: We have seen that 'return' and 'print' appear to give the same output when the function is called. Does it mean that there is no difference between them inside a function?**
***
* No. If you want to use the result of the function elsewhere in your code, use "return". If you just want to display the result for informational purposes, you can use "print".

In [15]:
# Calling an existing function in the function body of another function
def calculation(a,b):
    return addition(a,b) + 7 

In [16]:
def addition(a,b):
    return a+b

In [17]:
calculation(2,5)

14

In [18]:
# Now suppose we have:
def addition(a,b):
    print(a+b)

In [19]:
def calculation(a,b):
    return addition(a,b) + 7 

In [20]:
# Here the inner function only prints (and therefore returns None), so adding 7 to its result raises an error
calculation(2,7)

9


TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'
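
The error arises because a function without a return statement returns None. A minimal sketch (the names `print_sum` and `return_sum` are hypothetical) makes the difference visible:

```python
def print_sum(a, b):
    print(a + b)      # displays the value, but the function returns None

def return_sum(a, b):
    return a + b      # hands the value back to the caller

printed = print_sum(2, 7)    # prints 9
returned = return_sum(2, 7)

print(printed)        # None
print(returned + 7)   # 16
```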

**Example : Function with for loop**
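
One possible example, sketched here purely for illustration (the function name `sum_of_squares` is hypothetical): the function body contains a for loop that builds up a result before returning it.

```python
def sum_of_squares(numbers):
    total = 0
    for n in numbers:      # visit each input value once
        total += n ** 2    # accumulate the square of the value
    return total

print(sum_of_squares([1, 2, 3]))  # 14
```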

# Sampling

* The pandas sample() method returns a random sample of rows or columns from the DataFrame it is called on.

In [None]:
df.sample(n=None , frac = None , replace = False , weights = None, random_state = None, axis = None)

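* n : number of random rows (or columns) to return; cannot be used together with frac
* frac : fraction of rows (or columns) to return, e.g. 0.1 for 10%; cannot be used together with n
* replace : boolean value; sample with replacement if True
* random_state : if set to a particular integer, the same rows are returned every time the sample is drawn
* axis : 0 for rows, 1 for columns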
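
The effect of these parameters can be sketched on a small made-up DataFrame (the name `toy` and its values are assumptions for illustration, not the employees data):

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the parameters
toy = pd.DataFrame({"name": ["A", "B", "C", "D", "E"],
                    "score": [10, 20, 30, 40, 50]})

# random_state makes the draw reproducible: both calls return the same rows
first = toy.sample(n=2, random_state=42)
second = toy.sample(n=2, random_state=42)
print(first.equals(second))  # True

# frac selects a fraction of the rows: 40% of 5 rows gives 2 rows
frac_rows = toy.sample(frac=0.4, random_state=0)
print(len(frac_rows))  # 2

# replace=True allows more draws than there are rows
with_replacement = toy.sample(n=8, replace=True, random_state=0)
print(len(with_replacement))  # 8
```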

In [21]:
import pandas as pd
import numpy as np

In [22]:
data_path = "xxx"
df = pd.read_csv(data_path + "employees.csv")
df.head()

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,8/6/1993,12:42 PM,97308,6.945,True,Marketing
1,Thomas,Male,3/31/1996,6:53 AM,61933,4.17,True,
2,Maria,Female,4/23/1993,11:17 AM,130590,11.858,False,Finance
3,Jerry,Male,3/4/2005,1:00 PM,138705,9.34,True,Finance
4,Larry,Male,1/24/1998,4:47 PM,101004,1.389,True,Client Services


In [24]:
# Pick one random row
row_1 = df.sample(n=1)
row_1

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
900,Christina,Female,6/23/2002,3:18 PM,35477,18.178,False,Human Resources


In [25]:
# Randomly select 10% of the rows
df.sample(frac = 0.1)

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
645,Anna,,3/13/1985,9:19 AM,45418,10.162,False,Marketing
176,Victor,Male,1/8/2003,1:02 PM,124486,10.166,False,Product
707,Patricia,Female,3/7/1998,1:10 AM,75825,7.839,False,Engineering
526,Barbara,Female,3/22/2004,8:11 AM,144677,8.696,False,Finance
538,Adam,Male,10/8/2010,9:53 PM,45181,3.491,False,Human Resources
...,...,...,...,...,...,...,...,...
699,Amy,,5/19/1984,11:47 AM,102839,10.385,True,Distribution
674,,Male,8/28/2012,7:56 PM,88733,1.932,,Human Resources
607,,Male,10/13/1983,11:59 PM,139754,12.740,,Sales
188,Charles,Male,10/14/2000,9:40 PM,71749,15.931,False,Legal


In [27]:
# Pick one random column
df.sample(n=1, axis =1)

Unnamed: 0,First Name
0,Douglas
1,Thomas
2,Maria
3,Jerry
4,Larry
...,...
995,Henry
996,Phillip
997,Russell
998,Larry


# Lambda Function

* A lambda function can take any number of arguments, but can only have one expression.

In [31]:
# Syntax
lambda arguments : expression

# In expression we are doing something with the arguments

<function __main__.<lambda>(arguments)>

In [32]:
# Example1 : Add 10 to argument a, and return the result
# Here we have built a lambda function and assigned it to 'x'
x = lambda a : a + 10 

In [33]:
x(25)

35

**Lambda functions can take any number of arguments**

In [34]:
x = lambda a,b : a *b 
x(5,2)

10
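
For comparison, a lambda behaves like a function defined with def whose body is a single return statement. A minimal sketch (the names `multiply_lambda` and `multiply_def` are hypothetical):

```python
# A lambda is an unnamed function limited to a single expression
multiply_lambda = lambda a, b: a * b

# The equivalent function written with def
def multiply_def(a, b):
    return a * b

print(multiply_lambda(5, 2))  # 10
print(multiply_def(5, 2))     # 10
```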

**Lambda function with if condition**

In [35]:
x = lambda a : 3 if np.isnan(a) else 4

In [36]:
x(2 )

4

In [37]:
x(np.nan)

3

**Lambda function with a dataframe**

In [39]:
df.head()

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,8/6/1993,12:42 PM,97308,6.945,True,Marketing
1,Thomas,Male,3/31/1996,6:53 AM,61933,4.17,True,
2,Maria,Female,4/23/1993,11:17 AM,130590,11.858,False,Finance
3,Jerry,Male,3/4/2005,1:00 PM,138705,9.34,True,Finance
4,Larry,Male,1/24/1998,4:47 PM,101004,1.389,True,Client Services


In [43]:
df.loc[0:50 , 'Salary'] = np.nan

In [47]:
salary_mean = df['Salary'].mean()
salary_mean

90430.39726027397

In [48]:
# Here we are applying a lambda function to the 'Salary' column of the dataframe
df['Salary'] = df['Salary'].apply(lambda x : int(np.random.normal(salary_mean, 3)) if np.isnan(x) else x)

* Here 'x' refers to each element of the 'Salary' column in the data frame 'df'.
* If the salary of an employee is null, it is replaced by int(np.random.normal(salary_mean, 3)); otherwise (the salary is not null) it is kept as it is.
* The resulting object is also a Series whose index is the index of the data frame 'df'. Hence, replacing the existing 'Salary' column with the transformed one works smoothly, because the index of the Series and of the data frame are the same.
* Hence this method is primarily used to transform a column in the data frame based on custom, complex conditions.
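
The same pattern can be tried on a small, self-contained Series. This is only a sketch: the toy values and the fixed random seed are assumptions made so that the example is repeatable.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical toy column with two missing values
salary = pd.Series([50000.0, np.nan, 70000.0, np.nan], name="Salary")
salary_mean = salary.mean()  # mean() skips NaN by default: 60000.0

# Fill each missing value with a random draw close to the mean
filled = salary.apply(
    lambda x: int(np.random.normal(salary_mean, 3)) if np.isnan(x) else x
)

print(filled.isna().sum())                # 0: no missing values remain
print(filled.index.equals(salary.index))  # True: the index is unchanged
```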

In [49]:
df

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,8/6/1993,12:42 PM,90426.0,6.945,True,Marketing
1,Thomas,Male,3/31/1996,6:53 AM,90434.0,4.170,True,
2,Maria,Female,4/23/1993,11:17 AM,90433.0,11.858,False,Finance
3,Jerry,Male,3/4/2005,1:00 PM,90434.0,9.340,True,Finance
4,Larry,Male,1/24/1998,4:47 PM,90432.0,1.389,True,Client Services
...,...,...,...,...,...,...,...,...
995,Henry,,11/23/2014,6:09 AM,132483.0,16.655,False,Distribution
996,Phillip,Male,1/31/1984,6:30 AM,42392.0,19.675,False,Finance
997,Russell,Male,5/20/2013,12:39 PM,96914.0,1.421,False,Product
998,Larry,Male,4/20/2013,4:45 PM,60500.0,11.985,False,Business Development


# Try - Except statement

* Errors in Python are of two types: syntax errors and exceptions.
* The try and except statement is used to handle exceptions, i.e. errors raised while the code is running. A syntax error in the code itself is reported before that code runs, so it has to be fixed in the source.
* The try block contains the code to be checked for errors: it runs normally as long as no error occurs. The code inside the except block runs only when the program encounters an error in the preceding try block.

In [None]:
# Syntax
try:
    # Some code
except:
    # Execute if error in the try block 

In [54]:
def divide(x,y):
    try:
        result = x/y
        print("Your Answer is : ", result)
    except ZeroDivisionError:
        # Catch only division by zero, so other errors are not mislabelled
        print("You are dividing by 0")

In [55]:
divide(10,2)

Your Answer is :  5.0


In [56]:
divide(5,0)

You are dividing by 0
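
A bare except catches every kind of error, so its message may not match the actual problem. As a sketch (the function name `safe_divide` and its messages are hypothetical), separate except clauses can handle each exception type on its own:

```python
def safe_divide(x, y):
    try:
        return x / y
    except ZeroDivisionError:
        # Raised when y is 0
        return "You are dividing by 0"
    except TypeError:
        # Raised when x or y is not a number, e.g. a string
        return "Both inputs must be numbers"

print(safe_divide(10, 2))   # 5.0
print(safe_divide(5, 0))    # You are dividing by 0
print(safe_divide(5, "a"))  # Both inputs must be numbers
```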


# Transformation of a variable based on conditions

### Where Command 

**df.where(cond , other = nan)**
* For every value in the DataFrame where cond is True, the original value is retained.
* For every value where cond is False, the original value is replaced by the value specified by the other argument.

In [57]:
import pandas as pd
df = pd.DataFrame({'points' : [25, 12, 15, 14, 19, 23, 25, 29] , 'assists' : [5,7,7,9,12,9,9,4] , 'rebounds':[11,8,10,6,6,5,9,12]})
df

Unnamed: 0,points,assists,rebounds
0,25,5,11
1,12,7,8
2,15,7,10
3,14,9,6
4,19,12,6
5,23,9,5
6,25,9,9
7,29,4,12


**Replace values in the entire dataframe**

In [58]:
# Here by default 'other = nan'
df.where(df>7)

Unnamed: 0,points,assists,rebounds
0,25,,11.0
1,12,,8.0
2,15,,10.0
3,14,9.0,
4,19,12.0,
5,23,9.0,
6,25,9.0,9.0
7,29,,12.0


In [59]:
df.where(df>7 , 'low')

Unnamed: 0,points,assists,rebounds
0,25,low,11
1,12,low,8
2,15,low,10
3,14,9,low
4,19,12,low
5,23,9,low
6,25,9,9
7,29,low,12


**Replace values in a specific column of the DataFrame**

In [60]:
df.where(df['points'] > 15 , 'low')

Unnamed: 0,points,assists,rebounds
0,25,5,11
1,low,low,low
2,low,low,low
3,low,low,low
4,19,12,6
5,23,9,5
6,25,9,9
7,29,4,12


* Here the condition is a Series with one True/False value per row, so for every row where the condition is not satisfied, the entire row of 'df' is replaced by 'low' (or by nan if other is not given). That is why it is important to select the specific column before calling where.

In [61]:
# This is a series 
df['points'].where(df['points'] > 15 , 'low')

0     25
1    low
2    low
3    low
4     19
5     23
6     25
7     29
Name: points, dtype: object

In [62]:
# The above series is assigned back to the 'points' column of the dataframe
df['points'] = df['points'].where(df['points'] > 15 , 'low')
df

Unnamed: 0,points,assists,rebounds
0,25,5,11
1,low,7,8
2,low,7,10
3,low,9,6
4,19,12,6
5,23,9,5
6,25,9,9
7,29,4,12
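
The dtype: object seen in the Series output above is a side effect of mixing strings with numbers. A short sketch on a hypothetical toy Series shows the difference between a string and a numeric replacement value:

```python
import pandas as pd

points = pd.Series([25, 12, 15, 14, 19], name="points")

# Replacing with a string mixes text and numbers, so the dtype becomes object
with_label = points.where(points > 15, "low")
print(with_label.dtype)  # object

# Replacing with a number keeps the column numeric
with_zero = points.where(points > 15, 0)
print(pd.api.types.is_numeric_dtype(with_zero))  # True
print(with_zero.tolist())  # [25, 0, 0, 0, 19]
```

Keeping the column numeric matters if later steps, such as mean() or comparisons, are expected to work on it.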


# References

**Where command**
* https://www.statology.org/pandas-where/