# Python user-defined functions


## 1- `def` statement 
* the Python `def` statement is a true executable statement: when it runs, it creates a new function object and assigns it to a name.   
* `def`s are not evaluated until they are reached and run, hence functions do not need to be fully defined before a program starts running.  
* in the `def` statement we define the arguments to be provided (arguments are optional; a function can take zero arguments).   
* the body of a `def` statement often contains an optional `return` statement, which may show up anywhere in the body of the function.  
* `def` can appear nested within an `if` statement, so that the function is defined depending on a condition.  
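
Because `def` is executed like any other statement, the same name can be bound to different functions depending on a condition. A minimal sketch, in which the flag `use_metric` and the function `describe` are hypothetical names chosen only for illustration:

```python
use_metric = True  # hypothetical flag, not part of the course material

if use_metric:
    # this def runs only when the condition is true
    def describe(distance):
        return '{} km'.format(distance)
else:
    def describe(distance):
        return '{} miles'.format(distance)

# def simply binds a function object to a name, so the object can be given another name
label = describe
print(label(5))  # 5 km
```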
&nbsp;

`def function_name(arg_1, arg_2, ..., arg_N):
    statement(s)
    return value`
    
* let us write a simple function. It takes two numbers and multiplies one by the other.  

In [None]:
def multiply(a, b):
    return(a * b)

In [None]:
multiply(3,4)

In [None]:
multiply(12,0.3)

* the value returned by a function can be assigned to a name:

In [None]:
f = multiply('too',5)
f

In [None]:
multiply(3,'Na')

* this is an example of *polymorphism* in Python, a type-dependent behavior. What the expression `a * b` returns depends upon the kinds of objects that `a` and `b` are. With two numbers the function performs a multiplication, while with a string and an integer it performs a repetition.      
* any two objects that support  **`*`** will work, no matter what their types are.  
* polymorphism means that the meaning of an operation depends on the object being operated upon.   
* because Python is a dynamically typed language, almost every operation is a polymorphic operation.  
* this is by design, and it accounts to no small extent for Python's conciseness and flexibility.  
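
The type-dependent behavior of `*` can be seen by calling the same function with different kinds of objects. This is a small sketch that redefines `multiply` so it is self-contained; the list argument and the failing call are extra cases added for illustration:

```python
def multiply(a, b):
    return a * b

print(multiply(3, 4))        # 12, numeric multiplication
print(multiply('Na', 3))     # NaNaNa, string repetition
print(multiply([1, 2], 2))   # [1, 2, 1, 2], list repetition

# two strings do not support *, so Python raises a TypeError
try:
    multiply('too', 'Na')
    failed = False
except TypeError as err:
    failed = True
    print('TypeError:', err)
```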
&nbsp;


In [None]:
def intersect(seq1, seq2):
    return([x for x in seq1 if x in seq2])

In [None]:
intersect('tribe','entreat')


* the `return` statement is optional:

In [None]:
def test_func(n1,n2):
    print('if you multiply {} by {} you get {}'.format(n1,n2,n1*n2))

In [None]:
test_func(22,0.9)

* to avoid having to specify all arguments, we can assign default values to a function's arguments.
* the arguments are passed in order, so the first argument will always be assigned to `y` and the second to `x` unless they are passed by keyword.   

In [None]:
def power(y,x = 2):
    return(y**x)

In [None]:
power(20)
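
The difference between positional and keyword passing can be sketched with the same `power` function, repeated here so the example is self-contained:

```python
def power(y, x=2):
    return y ** x

print(power(20))         # 400, x falls back to its default of 2
print(power(2, 5))       # 32, positional order: y=2, x=5
print(power(x=2, y=5))   # 25, keywords override the positional order
```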

* an optional argument with no meaningful default value can be given the placeholder default `None`, and the function can then test whether a value was supplied.    

In [None]:
def double_power(y,x = 2,z = None):
    # `is None` is the idiomatic test for a missing optional argument
    val = y**x if z is None else y**(x**z)
    return(val)

In [None]:
double_power(3)

In [None]:
double_power(y=3,z=4)

* functions can call, or be called from, other functions

In [None]:
def power_call():
    # use map to capture multiple entries
    i,j,k = map(float,input('enter i,j,k: ').split(','))
    return(double_power(i,j,k))
    

def double_power(y,x = 2,z = None):
    val = y**x if z==None else y**(x**z)
    return(val)

In [None]:
power_call()

&nbsp;

* if we rearrange the argument order in the function `double_power` so that an argument with a default comes first, we get a SyntaxError!
* in Python functions, **non-default arguments must always precede default arguments.**   

In [None]:
# y has no default but follows x, which has one: this raises a SyntaxError
def double_power(x = 2,y,z = None):
    val = y**x if z is None else y**(x**z)
    return(val)
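
The ordering rule can also be checked without stopping the notebook, by compiling an offending definition from a string and catching the error. This is only a sketch; the variable names are hypothetical and the exact wording of the error message varies between Python versions:

```python
# a definition in which a non-default argument follows a default one
bad_source = "def double_power(x=2, y, z=None):\n    return y ** x\n"

try:
    compile(bad_source, '<example>', 'exec')
    raised = False
except SyntaxError as err:
    raised = True
    print('SyntaxError:', err.msg)
```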

&nbsp;

## 2- `*args` and `**kwargs`

* the extensions `*` and `**` support passing any number of arguments into a function. 
* by convention `*args` collects extra positional arguments into a tuple, whereas `**kwargs` collects extra keyword arguments into a dictionary. However, any other name can be used with `*` and `**`.
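
A quick way to see what the two extensions collect is to return the collected objects and inspect their types. A minimal sketch with hypothetical function names:

```python
def show_collected(*args, **kwargs):
    # args is a tuple of the extra positional arguments
    # kwargs is a dict of the extra keyword arguments
    return args, kwargs

positional, named = show_collected(1, 'spam', size=3)
print(type(positional).__name__, positional)  # tuple (1, 'spam')
print(type(named).__name__, named)            # dict {'size': 3}

# any names work after the stars
def show_other_names(*items, **options):
    return len(items), sorted(options)

print(show_other_names('a', 'b', debug=True))  # (2, ['debug'])
```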

In [None]:
# one star collects the positional arguments into a tuple
def count_arguments(*args):
    for obj in args:
        print(obj)
    print('\nthere\'s {} object(s) in this argument'.format(len(args)))

In [None]:
count_arguments('bar',3,2,'spam','foo')

In [None]:
#two stars creates a dictionary 
def count_dict(small = 15, large = 22, **kwargs):
    for keys in kwargs:
        print(keys)
        
    print('\nsmall = {}, large = {},'.format(small, large), kwargs)

In [None]:
count_dict(small = 15, large = 25, pepperoni=2,sausage=5,beef=4,chicken=3)

notice that `kwargs` is printed in its entirety as a dictionary. 

&nbsp;


In [None]:
count_dict(pepperoni='two',sausage='five',beef='four',chicken='three')

* if a function mixes an `*args` argument with other arguments, it is important to pay attention to their order in the `def` statement.  
* since an `*args` argument collects every positional argument that comes after it, any argument defined after `*args` needs to be passed explicitly by keyword.

In [None]:
def test_func(a,*b,c):
    print(a,b,c)

In [None]:
# an error occurs because *b collects the values 2,3,4,5 and c is then found to be missing.
test_func(1,2,3,4,5)

In [None]:
test_func(1,2,3,4,c=5)

In [None]:
test_func(a=1,c=22)

* the `*` extension can be used by itself (without a name) to force all arguments following it to be passed by keyword.

In [None]:
# the following construct forces b, c and d to be passed by keyword
def forced_args(a,*,b,c,d):
    print(a,b,c,d)

In [None]:
forced_args(5, 2, 1,'foo')

while the first argument passed is automatically assigned to `a`, the `*` extension forces every argument that follows it to be passed by keyword.

In [None]:
forced_args(5, c=2, d='foo',b=1)

* unlike `*`, the two-star argument `**` cannot appear by itself as an argument.
* unlike `*args`, `**kwargs` does not accept any named arguments after it. The `**kwargs` (or equivalent) has to be the last argument.

In [None]:
def count_dict(**kwargs, small, large):
    for keys in kwargs:
        print(keys)
    print('\n','small={},'.format(small),'large={},'.format(large),kwargs)
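
For contrast with the failing definition above, a valid signature puts ordinary arguments first, then `*args`, then keyword-only arguments, and `**kwargs` last. A sketch in which the names `extras` and `unit` are hypothetical additions:

```python
def count_dict(small, large, *extras, unit='cm', **kwargs):
    # order: positional, *args, keyword-only, **kwargs
    return small, large, extras, unit, kwargs

result = count_dict(15, 25, 'a', 'b', unit='in', pepperoni=2)
print(result)  # (15, 25, ('a', 'b'), 'in', {'pepperoni': 2})
```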

&nbsp;

In the Gregorian calendar three criteria must be taken into account to identify leap years:

* If the year is evenly divisible by 4, it is a leap year, unless:
    * The year is evenly divisible by 100, in which case it is NOT a leap year, unless:   
        * The year is also evenly divisible by 400, in which case it is a leap year. 
        
        
* the years 2000 and 2400 are leap years, while 1800, 1900, 2100, 2200, 2300 and 2500 are NOT leap years
        
<span style='color:blue'>write the function `is_leap()` that takes integers (years) as arguments, goes through the logic above and then prints out whether each year is a leap year or not. use the **`*args`** extension to pass multiple years into the function</span>

expected output:

`1904 IS a leap year
1932 IS a leap year
1986 IS NOT a leap year
2000 IS a leap year
2008 IS a leap year
2016 IS a leap year
2021 IS NOT a leap year`

In [None]:
#skipped code
years = list(range(1900, 2300))
def is_leap(*args):
    for year in args:
        if year % 4 == 0:
            if year % 100 != 0:
                # divisible by 4 but not a century year
                print(year, 'IS a leap year')
            elif year % 400 == 0:
                # century year divisible by 400
                print(year, 'IS a leap year')
            else:
                print(year, 'IS NOT a leap year')
        else:
            print(year, 'IS NOT a leap year')
        
    
    


In [None]:
def is_leap(*args):
    for year in args:
        leap= 'IS NOT'
        if year%4 ==0:
            if year % 100 != 0:
                leap = 'IS'
            elif year%400 == 0:
                leap = 'IS'
        print('{1} {0} a leap year'.format(leap,year))

In [None]:
#check your logic
is_leap(1904, 1932, 1986, 2000, 2008, 2016, 2021)
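
One way to check the logic independently is to compare it against `calendar.isleap` from the standard library. The helper `leap_flag` below is a hypothetical boolean version of the same rule, written only for this comparison:

```python
from calendar import isleap

def leap_flag(year):
    # divisible by 4, except century years that are not divisible by 400
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

years = [1800, 1900, 1904, 1986, 2000, 2016, 2021, 2100, 2400]
mismatches = [y for y in years if leap_flag(y) != isleap(y)]
print(mismatches)  # []
```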


&nbsp;

## 3- Generators and the `yield` statement:

* we have already come across a few iterables, such as `range()` objects and the lists produced by list comprehensions.   
* a generator is an iterable object that produces its values one at a time, on demand, and supports the iteration protocol.  
* generators can be created in two different ways:  
 - 1 -  using a `def` statement with a `yield` statement instead of a `return` statement in the function body.      
 - 2 -  using a comprehension expression enclosed in parentheses `(`  `)`, known as a generator expression.     
         
         
* when such a function is called, or such an expression is evaluated, the resulting object is a generator object.
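
The on-demand behavior can be observed by pulling values with the built-in `next()`. A minimal sketch, where `countdown` is a hypothetical generator function used only for illustration:

```python
def countdown(n):
    # each yield hands back one value and pauses the function
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
print(next(gen))  # 3
print(next(gen))  # 2
print(next(gen))  # 1

# once the values run out, next() raises StopIteration
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
    print('generator is exhausted')

# the second way: a generator expression in parentheses
squares = (k * k for k in range(4))
print(list(squares))  # [0, 1, 4, 9]
```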


the function below finds all numbers between 0 and 10000 that are divisible by 11 and creates an iterator.

In [None]:
my_list = list(range(10001))

In [None]:
def divisible_11(a_list):
    for num in a_list:
        if num % 11 == 0:
            yield num
            

In [None]:
iterator_11 = divisible_11(my_list)

In [None]:
type(iterator_11)

we can use `iterator_11` to do something with those values. 

In [None]:
for i in iterator_11:
    my_list[i] = '------'
    
print(my_list[:1000], end = ' ')
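
Note that a generator can be consumed only once: after the loop has run through it, it yields nothing more. A small sketch of this, using a hypothetical generalisation `divisible_by` of the function above:

```python
def divisible_by(numbers, divisor):
    for num in numbers:
        if num % divisor == 0:
            yield num

gen = divisible_by(range(50), 11)
first_pass = list(gen)
second_pass = list(gen)   # the generator is already exhausted

print(first_pass)   # [0, 11, 22, 33, 44]
print(second_pass)  # []
```

To iterate again, the generator has to be created again by calling the function once more.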

* but why do this when we can do it with a simple filtered list comprehension?
    - generators save memory.
    - like `range()` objects, generators do not require the entire list of objects to be constructed.   
    - generators are better suited to very large result sets; for small sets, list comprehensions run faster.  

In [None]:
# let's try to build the index list using a conventional approach
divisible_list = []
for num in range(10001):
    if num % 11 == 0:
        divisible_list.append(num)


In [None]:
from sys import getsizeof
getsizeof(iterator_11), getsizeof(divisible_list)

* the other way to create generators is a generator expression:

In [None]:
iterator_7 = (num for num in range(10000) if num % 7 == 0)

In [None]:
type(iterator_7)

In [None]:
iterator_7 = (num for num in range(10000) if num % 7 == 0)
for i in iterator_7:
    my_list[i] = '_____'

In [None]:
print(my_list[:1000], sep=' ')

&nbsp;

<span style="color:blue">write a user-defined function that iterates over a list of integers 1 to `n` and yields the integers which are perfect squares</span>

In [None]:
#skipped code
from math import sqrt

n = int(input('please select 1 to '))
x = list(range(1, n + 1))

def perf_sqr(a_list):
    for a in a_list:
        # a is a perfect square when its square root has no fractional part
        if sqrt(a).is_integer():
            yield a

print(list(perf_sqr(x)))

In [None]:
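import numpy as np

n = int(input('please select 1 to '))
# generator expression: keep the numbers whose square root is a whole number
squares = (num for num in range(1, n + 1) if np.sqrt(num).is_integer())
for square in squares:
    print(square)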


In [None]:
import numpy as np
np.sqrt(49)


as with `range()` objects, to display the contents of a generator either `print()` the elements one by one or convert the generator object to a list
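
A short sketch of both options; the generator `evens` is a hypothetical example:

```python
evens = (n for n in range(10) if n % 2 == 0)
print(evens)             # shows a generator object, not its values

as_list = list(evens)    # converting to a list materialises the values
print(as_list)           # [0, 2, 4, 6, 8]

# printing the elements one by one needs a fresh generator
for n in (n for n in range(10) if n % 2 == 0):
    print(n, end=' ')
```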

In [None]:
# Write a UDF that takes a list of states and a state abbreviation and
# returns an iterator over the indexes of that state within the list

def state_index(list_, ST):
    for index, st in enumerate(list_):
        if st == ST:
            yield index

# hypothetical sample data; the original list is not shown in this notebook
random_list = ['IL', 'WI', 'IL', 'IN', 'MI', 'IL']
IL_list = list(state_index(random_list, 'IL'))

&nbsp;

&nbsp;

## 4- `lambda` operator

* the `lambda` operator is an **expression** used to create anonymous functions. Because it is an expression, it can appear in places where a `def` statement is not allowed, such as inside a list literal or a function call argument. 
* the body of a `lambda` function is similar to that of a `def`, yet the result is written as a bare expression rather than returned explicitly. Being a single expression, it is more limited than a `def`.   
* it is designed to run simple tasks and it is very instrumental in PySpark when used in conjunction with `map` and `reduce`.   
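
The two placements mentioned above, inside a list literal and as a function call argument, can be sketched as follows. The lists `operations` and `words` are hypothetical examples:

```python
# lambdas inside a list literal, where a def statement could not appear
operations = [lambda v: v + 1, lambda v: v * 2, lambda v: v ** 2]
results = [op(5) for op in operations]
print(results)  # [6, 10, 25]

# a lambda as a function call argument: sort words by their last letter
words = ['gronk', 'foo', 'bar', 'sling']
by_last_letter = sorted(words, key=lambda w: w[-1])
print(by_last_letter)  # ['sling', 'gronk', 'foo', 'bar']
```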

### general syntax:


#### `lambda`:
`lambda x,y: x+y`

if `x=(1,2,3)` is a list or a tuple of 3 elements:

`lambda x: x[2]*(x[0] + x[1])`


#### `map()`:

`map(function, iterable)` or `map(lambda ..., list)` or `map(lambda ..., column)`

`map()`, `filter()` and `reduce()` in conjunction with `lambda` allow a user to apply functions to an object without having to write a formal loop or a user-defined function. 
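
A minimal sketch of the three functions on a plain list. Note that in Python 3 `reduce` has to be imported from `functools`; the list `numbers` is a hypothetical example:

```python
from functools import reduce  # reduce lives in functools in Python 3

numbers = [1, 2, 3, 4, 5, 6]

doubled = list(map(lambda n: n * 2, numbers))         # apply to every element
odds = list(filter(lambda n: n % 2 == 1, numbers))    # keep matching elements
total = reduce(lambda acc, n: acc + n, numbers)       # fold into a single value

print(doubled)  # [2, 4, 6, 8, 10, 12]
print(odds)     # [1, 3, 5]
print(total)    # 21
```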

In [None]:
def multiply(x,y):
    return(x*y)

In [None]:
multiply(5,6)

In [None]:
f = lambda x, y: x*y

In [None]:
f(5,6)

* the above is only a demonstration to show that `lambda` and `def` essentially do the same task; a lambda is rarely assigned to a name because it is almost always applied as an expression within a statement. 

 
* `lambda` is used to perform operations on Python containers and data frames without the need to write a formal user-defined function, in a manner that is similar to a list comprehension yet offering more flexibility. 

In [None]:
word_list = ('foo', 'bar', '_molly_', '423','gronk', '_wrong_' ,'hello kitty', 'sling', 'drag', '8', '__make__')

digits = filter(lambda word: word.isdigit(), word_list)
list(digits)

In [None]:
tup = (3,2)

tup_dir = filter(lambda attri:not attri.endswith('_'), dir(tup))
list(tup_dir)

<span style='color:blue'>convert the following list of speeds in mph to Mach. 1 Mach = 767.269 mph.    
use `round(object, decimals)` to round to 2 decimal places.  </span>

In [None]:
mph = [1562, 8965, 124, 1125, 754, 3368]

In [None]:
mach = list(map(lambda z: round(z/767.269, 2), mph))
mach

should look like 


`[2.04, 11.68, 0.16, 1.47, 0.98, 4.39]`

* this can be achieved using a simple list comprehension:  

In [None]:
mach = [round(speed/767.269,2) for speed in mph]
mach

### so why use the lambda operator if we can achieve the same using list comprehensions?

* the true power of the `lambda` operator comes when it is used along with functions such as `map()`, `filter()` and `reduce()` to iterate over the rows of one or more columns in a dataframe. As such it allows fairly complex processing without the need for an explicit *for-statement*. 

In [None]:
# skipping ahead
import numpy as np
import pandas as pd

In [None]:
np.random.seed(11)
df = pd.DataFrame(np.random.random((20)).round(3).reshape(10,2), columns = ['col1','col2'])
df.head()

* `col3` is created by placing the values in `col1` and `col2` in a tuple for every row. 

In [None]:
df['col3'] = list(map(lambda x, y: (x,y) , df.col1, df.col2))
df

* `col4` uses the values in a tuple to calculate $\ \ x^2 + 2xy + y^2$ 

In [None]:
df['col4'] = list(map(lambda x: x[0]**2 + 2*x[0]*x[1] + x[1]**2, df.col3))
df
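
The expression used for `col4` is the expansion of $(x + y)^2$, which gives a quick sanity check. A sketch on plain tuples, without pandas; the values in `pairs` are hypothetical:

```python
import math

pairs = [(0.18, 0.019), (0.463, 0.725), (2, 3)]  # hypothetical sample values

# same lambda as for col4, applied to a list of tuples
col4 = list(map(lambda p: p[0]**2 + 2*p[0]*p[1] + p[1]**2, pairs))
check = [(x + y) ** 2 for x, y in pairs]

# compare with a tolerance because of floating-point rounding
same = all(math.isclose(a, b) for a, b in zip(col4, check))
print(same)  # True
```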

* the `lambda` operator can also be used in conjunction with `filter` to find the occurrence of key words in a text.  
* below we read a small file of sample tweets. 

In [None]:
# the with-statement closes the file automatically
with open('data/tweets.txt', 'r') as file:
    tweets = file.readlines()

* this is a small sample of tweets collected from accounts that tweet about topics related to data science and machine learning.      
* we would like to check if any of the phrases in our list `key_words` appear in these tweets. 

In [None]:
tweets

* we can read the tweets into a DataFrame given the structure of the file. 

In [4]:
import pandas as pd

In [6]:
pd.set_option('display.max_colwidth', 60)

In [7]:
tweets = pd.read_table('C:/Users/u353822/Desktop/Short Python Course/tweets.txt', names = ['text'])
tweets


|  | text |
|---|---|
| 0 | Meet Vestri - relies on a deeplearning technology called... |
| 1 | How Can Natural Language Processing Change Business Inte... |
| 2 | You have until Friday to save 65% on tickets to Open Dat... |
| 3 | New blog post: "What's the difference between data scien... |
| 4 | In an era of algorithms and big data mining, Hawkey's a ... |
| 5 | Machine Learning e Data Mining https://click.linksynergy... |
| 6 | EXCLUSIVE: Cambridge Analytica, the pro-Trump data-minin... |
| 7 | How Will Machine Learning Address Cyber Security Problem... |
| 8 | How Can Natural Language Processing Change Business Inte... |
| 9 | AI And Deep Learning – A Review of The Past 12 Months AI... |


<span style='color:blue'>add a new column `split` in which every row from column `text` is converted to lower case and split by space. use `map` and `lambda`</span>

In [15]:
#skipped code
tweets['split'] = list(map(lambda x: x.lower().split(" "), tweets.text))

In [10]:
tweets

|  | text | split |
|---|---|---|
| 0 | Meet Vestri - relies on a deeplearning technology called... | [meet, vestri, -, relies, on, a, deeplearning, technolog... |
| 1 | How Can Natural Language Processing Change Business Inte... | [how, can, natural, language, processing, change, busine... |
| 2 | You have until Friday to save 65% on tickets to Open Dat... | [you, have, until, friday, to, save, 65%, on, tickets, t... |
| 3 | New blog post: "What's the difference between data scien... | [new, blog, post:, "what's, the, difference, between, da... |
| 4 | In an era of algorithms and big data mining, Hawkey's a ... | [in, an, era, of, algorithms, and, big, data, mining,, h... |
| 5 | Machine Learning e Data Mining https://click.linksynergy... | [machine, learning, e, data, mining, https://click.links... |
| 6 | EXCLUSIVE: Cambridge Analytica, the pro-Trump data-minin... | [exclusive:, cambridge, analytica,, the, pro-trump, data... |
| 7 | How Will Machine Learning Address Cyber Security Problem... | [how, will, machine, learning, address, cyber, security,... |
| 8 | How Can Natural Language Processing Change Business Inte... | [how, can, natural, language, processing, change, busine... |
| 9 | AI And Deep Learning – A Review of The Past 12 Months AI... | [ai, and, deep, learning, –, a, review, of, the, past, 1... |


&nbsp;

`filter()`, as its name suggests, applies a filter to an object: it keeps only the elements for which the given function returns a true value.    

In [None]:
statement = 'The Koh-i-Noor is a 106 carats diamond which was once the largest diamond in the world.'

In [None]:
stop_words = ['is', 'was','are','which','a','the','in','on','for','from']

In [None]:
list(filter(lambda x: x not in stop_words, statement.lower().split()))

In [None]:
' '.join( list(filter(lambda x: x not in stop_words, statement.lower().split())) )
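
For comparison, the same stop-word removal can be written as a filtered list comprehension. This sketch repeats `statement` and `stop_words` from the cells above so that it is self-contained and checks that both forms give the same result:

```python
statement = 'The Koh-i-Noor is a 106 carats diamond which was once the largest diamond in the world.'
stop_words = ['is', 'was', 'are', 'which', 'a', 'the', 'in', 'on', 'for', 'from']

words = statement.lower().split()

with_filter = list(filter(lambda x: x not in stop_words, words))
with_comprehension = [x for x in words if x not in stop_words]

print(with_filter == with_comprehension)  # True
print(' '.join(with_comprehension))
```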

&nbsp;

<span style="color:blue">add a new column `filtered` to the <u>tweets</u> data frame. this column checks the words in the column `split` against the words in the list `key_words` and keeps the ones common to both</span>   

In [18]:
key_words = ['data', 'science', 'scientist', 'machine', 'learning', 'deeplearning', 'natural', 'language', \
             'big', 'data', 'deep','machinelearning', 'predictive', 'ai', 'mining', 'data-mining']

In [22]:
#skipped code
# for every row, keep only the words of the `split` column that appear in key_words
tweets['filtered'] = list(map(lambda words: list(filter(lambda word: word in key_words, words)), tweets['split']))

In [None]:
tweets

&nbsp;

`lambda` is a flexible operator and allows the new object to be of any container type. 

In [23]:
pizza_meat = pd.DataFrame([['pepperoni', 5],['beef', 7],['steak', 5],['sausage', 10], ['italian sausage', 23],['chicken', 6],['anchoves', 6]], 
             columns = ['toppings', 'price'])

pizza_meat

|  | toppings | price |
|---|---|---|
| 0 | pepperoni | 5 |
| 1 | beef | 7 |
| 2 | steak | 5 |
| 3 | sausage | 10 |
| 4 | italian sausage | 23 |
| 5 | chicken | 6 |
| 6 | anchoves | 6 |


In [24]:
pizza_meat['dict'] = list(map(lambda x,y: {x: y}, pizza_meat.toppings, pizza_meat.price))
pizza_meat

|  | toppings | price | dict |
|---|---|---|---|
| 0 | pepperoni | 5 | {'pepperoni': 5} |
| 1 | beef | 7 | {'beef': 7} |
| 2 | steak | 5 | {'steak': 5} |
| 3 | sausage | 10 | {'sausage': 10} |
| 4 | italian sausage | 23 | {'italian sausage': 23} |
| 5 | chicken | 6 | {'chicken': 6} |
| 6 | anchoves | 6 | {'anchoves': 6} |
