<img src="https://www.python.org/static/community_logos/python-powered-w-200x80.png" style="float: left; margin: 20px; height: 55px">

# Python Basics - Functions

_Author: Alfred Zou_

---

# Functions 
---

* A function takes an input and either performs an operation or returns an output
* We create a function when we are going to reuse code throughout a project. If the code is only repeated within one section, a loop is usually enough
* Alongside the built-in functions provided by Python, users can create their own functions
* The standard layout is:

``` python
# A parameter is the variable named in the function definition
# An argument is the value passed in when calling (running) the function
def function_name(parameter):
    # The docstring is a string literal explaining the function. It is surrounded by ''' ''' or """ """
    # and must be the first statement in the function body
    # It can be displayed with help(function_name), or in Jupyter by pressing shift + tab after typing function_name
    ''' Prints the input
    '''

    print(parameter)

function_name('hello world')
# output: hello world

help(function_name)
# output: Prints the input
```

In [69]:
# Let's say we're tired of typing print(..., end=" ") throughout this fizzbuzz for loop
# We can create a function to do it for us

for number in range(1,31):
    if number % 3 == 0 and number % 5 == 0:
        print("fizzbuzz", end=" ")
    elif number % 3 == 0:
        print("fizz", end=" ")
    elif number % 5 == 0:
        print("buzz", end=" ")
    else:
        print(number, end=" ")

1 2 fizz 4 buzz fizz 7 8 fizz buzz 11 fizz 13 14 fizzbuzz 16 17 fizz 19 buzz fizz 22 23 fizz buzz 26 fizz 28 29 fizzbuzz 

In [64]:
def sprint(x):
    '''Space print: prints the argument with a space behind, allowing results to be displayed horizontally
    '''
    
    print(x,end=" ")

In [71]:
help(sprint)

Help on function sprint in module __main__:

sprint(x)
    Space print: prints the argument with a space behind, allowing results to be displayed horizontally



In [73]:
# Rewriting this fizzbuzz for loop
# After typing sprint, press shift + tab to display the docstring

for number in range(1,31):
    if number % 3 == 0 and number % 5 == 0:
        sprint("fizzbuzz")
    elif number % 3 == 0:
        sprint("fizz")
    elif number % 5 == 0:
        sprint("buzz")
    else:
        sprint(number)

1 2 fizz 4 buzz fizz 7 8 fizz buzz 11 fizz 13 14 fizzbuzz 16 17 fizz 19 buzz fizz 22 23 fizz buzz 26 fizz 28 29 fizzbuzz 
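
A possible next step, sketched here rather than taken from the notebook, is to move the decision logic into a function that returns a value instead of printing it. The name `fizzbuzz_label` is hypothetical.

```python
def fizzbuzz_label(number):
    '''Returns "fizzbuzz", "fizz", "buzz" or the number itself'''
    if number % 3 == 0 and number % 5 == 0:
        return "fizzbuzz"
    elif number % 3 == 0:
        return "fizz"
    elif number % 5 == 0:
        return "buzz"
    else:
        return number

# Build all the labels first, then print them on one line
labels = [str(fizzbuzz_label(number)) for number in range(1, 31)]
print(" ".join(labels))
```

Because the function returns its result, the same logic can be reused for printing, storing or testing.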

### Function Syntax

* One important concept regarding functions is that there are two types of parameters:
    * Positional parameters: these have no default value, so a matching argument must always be supplied when the function is called, and
    * Keyword parameters: these have a predefined (default) value and are optional when the function is called
* When defining or calling a function, the positional parameters/arguments must come before the optional keyword parameters/arguments
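
The rules above can be sketched with a small function. `greet` is a hypothetical example, not part of the notebook. `compile()` is used only so that the deliberately invalid call can be shown without stopping the cell.

```python
def greet(name, greeting="Hello"):
    '''name is required; greeting has a default value and is optional'''
    return f"{greeting}, {name}!"

print(greet("Ada"))                 # Hello, Ada!
print(greet("Ada", greeting="Hi"))  # Hi, Ada!

# Leaving out the required argument raises a TypeError
try:
    greet()
except TypeError as error:
    print("TypeError:", error)

# Putting a positional argument after a keyword argument is a SyntaxError
try:
    compile('greet(greeting="Hi", "Ada")', "<example>", "eval")
except SyntaxError as error:
    print("SyntaxError:", error.msg)
```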

In [85]:
# We can observe this by reading the docstring for the print function
# In this case the positional parameters are the values to print, and one of the keyword parameters is end=

help(print)

Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.



In [10]:
# Instead of only performing an operation, a function can use return to send a value back to the caller, which also ends the function
# Ending the function this way works similarly to break in a loop
def my_sum(a,b,c=3,d=5):
    return a+b+c+d    
    print("this is not printing due to return")

# Keyword arguments are optional and do not need to be supplied
print(my_sum(1,2))

# Keyword arguments can be supplied in any order
print(my_sum(1,2,d=3,c=5))

11
11
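
To make the comparison with break concrete, here is a minimal sketch where return leaves both the loop and the function as soon as a result is found. `first_even` is a hypothetical helper.

```python
def first_even(numbers):
    '''Returns the first even number, or None if there is none'''
    for number in numbers:
        if number % 2 == 0:
            return number  # ends the loop and the function at once
    return None  # only reached if no even number was found

print(first_even([3, 7, 8, 10]))  # 8
print(first_even([1, 3, 5]))      # None
```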


### Local and Global Variables in Functions, or Python Scoping/Namespace

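* Variables assigned inside a function are local, and are separate from global variables of the same name
* If we want to reassign a global variable from inside a function, we need to declare it with the global keyword inside the function
* Methods that modify an object in place, such as .append, do not need this, because they change the existing object rather than reassigning the name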

In [24]:
x = 2 # global x

def f():
    x=5 # local x in function
    return(x)

print('function x:',f())
print('global x:',x)

function x: 5
global x: 2


In [25]:
x = 2 # global x

def g():
    global x 
    x=5
    return(x)

print('function x:',g())
print('global x:', x )

function x: 5
global x: 5


# Modules, Packages and Libraries

* When we want to use functions across multiple projects, we use modules
* A module is a .py file containing functions (and other definitions) that can be imported
* A package is a folder that stores a collection of modules. Note that pprint, used below, is a single module that contains a function of the same name
* A library is a collection of packages
* There are two main ways to import from a module:

In [12]:
# Importing the whole module, using import __
import pprint
pprint.pprint({1:{'name':'tim','age':24,'gender':"male"},2:{'name':'ashley','age':27,'gender':"female"}})

# If we check the directory of the pprint module, we can see the pprint function
print(dir(pprint))

# Importing just the function, using from __ import __ 
from pprint import pprint
pprint({1:{'name':'tim','age':24,'gender':"male"},2:{'name':'ashley','age':27,'gender':"female"}})

{1: {'age': 24, 'gender': 'male', 'name': 'tim'},
 2: {'age': 27, 'gender': 'female', 'name': 'ashley'}}
['PrettyPrinter', '_StringIO', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_builtin_scalars', '_collections', '_perfcheck', '_recursion', '_safe_key', '_safe_repr', '_safe_tuple', '_sys', '_types', '_wrap_bytes_repr', 'isreadable', 'isrecursive', 'pformat', 'pprint', 're', 'saferepr']
{1: {'age': 24, 'gender': 'male', 'name': 'tim'},
 2: {'age': 27, 'gender': 'female', 'name': 'ashley'}}
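
A third form, `import __ as __`, gives the module a shorter alias. The sketch below uses the standard `math` module (chosen only for illustration) to show that each form reaches the same function.

```python
import math as m        # whole module under an alias
from math import sqrt   # a single function from the module

print(m.sqrt(16))       # 4.0
print(sqrt(16))         # 4.0

# Both names refer to the same function object
print(m.sqrt is sqrt)   # True
```

With the alias form, the origin of each function stays visible at the call site, which helps when two modules define functions with the same name.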


### Writing our own Module
* We can further demonstrate this idea by writing our own module and importing it in our Jupyter Lab/notebook

In [35]:
%%writefile math_operations.py
# First create a module to store our functions
# This magic command writes our code into the file math_operations.py in the working directory; any existing code in that file is overwritten
'''Contains modules for '''

def add_two_numbers(a,b):
    '''adds two numbers together'''
    return(a+b)

def subtract_two_numbers(a,b):
    '''subtracts the second number from the first number'''
    return(a-b)

Overwriting math_operations.py


In [38]:
# Now let's import the module
import math_operations
help(math_operations)

dir(math_operations)

Help on module math_operations:

NAME
    math_operations - adfsdf

FUNCTIONS
    add_two_numbers(a, b)
        adds two numbers together
    
    subtract_two_numbers(a, b)
        subtracts the second number from the first number

FILE
    c:\users\draciel\dropbox\general assembly\pre-work\math_operations.py




['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'add_two_numbers',
 'subtract_two_numbers']

In [40]:
import sys
sys.path

['C:\\Users\\draciel\\Dropbox\\General Assembly\\Pre-work',
 'C:\\Users\\draciel\\Anaconda3\\python37.zip',
 'C:\\Users\\draciel\\Anaconda3\\DLLs',
 'C:\\Users\\draciel\\Anaconda3\\lib',
 'C:\\Users\\draciel\\Anaconda3',
 '',
 'C:\\Users\\draciel\\Anaconda3\\lib\\site-packages',
 'C:\\Users\\draciel\\Anaconda3\\lib\\site-packages\\win32',
 'C:\\Users\\draciel\\Anaconda3\\lib\\site-packages\\win32\\lib',
 'C:\\Users\\draciel\\Anaconda3\\lib\\site-packages\\Pythonwin',
 'C:\\Users\\draciel\\Anaconda3\\lib\\site-packages\\IPython\\extensions',
 'C:\\Users\\draciel\\.ipython']

In [44]:
import math_operations
print(dir(math_operations))
math_operations.__file__
math_operations.__package__

['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'add_two_numbers', 'subtract_two_numbers']


''
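
One thing to be aware of when rewriting a file that has already been imported: running `import` again does not re-read it. The sketch below uses `importlib.reload` to pick up the change. It writes a hypothetical module called `demo_operations` into a temporary folder so that no real file is overwritten.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module written to a temporary folder
folder = Path(tempfile.mkdtemp())
module_path = folder / "demo_operations.py"
module_path.write_text("def add_two_numbers(a, b):\n    return a + b\n")

# Python searches the folders in sys.path when importing
sys.path.insert(0, str(folder))
importlib.invalidate_caches()

import demo_operations
print(demo_operations.add_two_numbers(1, 2))       # 3

# Rewrite the file with an extra function, then reload the module
module_path.write_text(
    "def add_two_numbers(a, b):\n    return a + b\n\n"
    "def multiply_two_numbers(a, b):\n    return a * b\n"
)
importlib.reload(demo_operations)
print(demo_operations.multiply_two_numbers(3, 4))  # 12
```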

`dir()` is useful for seeing which methods belong to an object, which makes it a good way to explain methods
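
Because a full `dir()` listing can be long, a membership test or a filter is often quicker to read. This is a minimal sketch that uses built-in types instead of pandas so that it runs without extra installs.

```python
# .append is a method of lists, but not of tuples
print('append' in dir(list))   # True
print('append' in dir(tuple))  # False

# Keep only the public methods by skipping names that start with an underscore
public_list_methods = [name for name in dir(list) if not name.startswith('_')]
print(public_list_methods)
```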

In [2]:
# We can see that .astype() method is specific to Pandas Dataframe and Series
import pandas as pd
dir(pd.DataFrame)

['T',
 '_AXIS_ALIASES',
 '_AXIS_IALIASES',
 '_AXIS_LEN',
 '_AXIS_NAMES',
 '_AXIS_NUMBERS',
 '_AXIS_ORDERS',
 '_AXIS_REVERSED',
 '_AXIS_SLICEMAP',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_priority__',
 '__array_wrap__',
 '__bool__',
 '__bytes__',
 '__class__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__dir__',
 '__div__',
 '__doc__',
 '__eq__',
 '__finalize__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__imod__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__module__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdiv__',
 '__reduce__',

In [3]:
import pandas as pd
dir(pd.Series)

['T',
 '_AXIS_ALIASES',
 '_AXIS_IALIASES',
 '_AXIS_LEN',
 '_AXIS_NAMES',
 '_AXIS_NUMBERS',
 '_AXIS_ORDERS',
 '_AXIS_REVERSED',
 '_AXIS_SLICEMAP',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_prepare__',
 '__array_priority__',
 '__array_wrap__',
 '__bool__',
 '__bytes__',
 '__class__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__dir__',
 '__div__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__finalize__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__imod__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__long__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__module__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__or__',
 

##### [Further Reading](https://realpython.com/python-modules-packages/#python-modules-overview)

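* Packages: to be continued
* `import module as alias` is preferred over `from module import name`, because it keeps the origin of each name visible
* `__file__` stores the location of the file a module was loaded from
* `__name__ == '__main__'` checks whether a file is being run as a script rather than imported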
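
The two attributes in the notes above can be sketched as follows. The standard `json` module is used only as a convenient imported module, and `describe_run_mode` is a hypothetical helper.

```python
import json

# Every imported module records its own name and the file it was loaded from
print(json.__name__)  # json
print(json.__file__)  # the path varies between installations

def describe_run_mode(module_name):
    '''Interprets a __name__ value'''
    if module_name == "__main__":
        return "run as a script"
    return "imported as a module"

# __name__ is "__main__" in the file (or notebook) that is executed directly
print(describe_run_mode(__name__))
print(describe_run_mode(json.__name__))  # imported as a module
```

In a .py file, this check usually appears as `if __name__ == "__main__":` around code that should only run when the file is executed directly.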

### Sorting using sorted() and Lambda Expressions

##### Lambda Expression


In [75]:
list(map(lambda x: x**2,[1,2,3]))

[1, 4, 9]

### Lambda Expressions
* Lambda expressions are anonymous functions
* They are useful as the key when sorting
* They suit functions we only use once and then throw away

In [9]:
# This function can be rewritten as a lambda expression
# The format is lambda input: output
def f(x):
    return 3*x + 1
print(f(2))

g = lambda x: 3*x + 1
g(2)

7


7

In [10]:
scifi_authors = ["Isaac Asimov", "Ray Bradbury", "Robert Heinlein", "Arthur C. Clarke",
                 "Frank Herbert", "Orson Scott Card", "Douglas Adams",
                 "H. G. Wells", "Leigh Brackett"]

# Sort in place by last name: the key takes the final word of each name
scifi_authors.sort(key = lambda name: name.split(" ")[-1].title())
scifi_authors

['Douglas Adams',
 'Isaac Asimov',
 'Leigh Brackett',
 'Ray Bradbury',
 'Orson Scott Card',
 'Arthur C. Clarke',
 'Robert Heinlein',
 'Frank Herbert',
 'H. G. Wells']
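
The cell above uses the list method .sort, which changes the list in place. The built-in sorted() accepts the same key argument but returns a new list and leaves the original untouched. A minimal sketch with a shortened, illustrative list:

```python
authors = ["Ray Bradbury", "Isaac Asimov", "Douglas Adams"]

# sorted() returns a new list, ordered by last name
by_last_name = sorted(authors, key=lambda name: name.split(" ")[-1])
print(by_last_name)  # ['Douglas Adams', 'Isaac Asimov', 'Ray Bradbury']
print(authors)       # ['Ray Bradbury', 'Isaac Asimov', 'Douglas Adams']

# .sort() changes the list itself and returns None
result = authors.sort(key=lambda name: name.split(" ")[-1])
print(result)        # None
print(authors)       # ['Douglas Adams', 'Isaac Asimov', 'Ray Bradbury']
```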

In [11]:
# map applies a function to every item of an iterable. The function could be a defined function or a lambda expression
# map(function, iterable)
temps = [("Berlin",29),("Cairo",36),("Tokyo",34)]
c_to_f = lambda data: (data[0], (9/5)*data[1] + 32)
list(map(c_to_f, temps))

[('Berlin', 84.2), ('Cairo', 96.8), ('Tokyo', 93.2)]

In [13]:
# filter keeps the items of an iterable for which a function returns True. The function could be a defined function or a lambda expression
# filter(function, iterable)

import statistics

data = [1.3, 2.7, 0.8, 4.1, 4.3, -0.1]
avg = statistics.mean(data)
print(avg)

print(list(filter(lambda x: x > avg, data)))

2.183333333333333
[2.7, 4.1, 4.3]
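
Both map and filter can also be written as list comprehensions, which many Python programmers find easier to read. The sketch below repeats the two examples above in both styles and checks that the results agree.

```python
import statistics

temps = [("Berlin", 29), ("Cairo", 36), ("Tokyo", 34)]

# map with a lambda, and the equivalent list comprehension
mapped = list(map(lambda data: (data[0], (9/5)*data[1] + 32), temps))
comprehended = [(city, (9/5)*celsius + 32) for city, celsius in temps]
print(mapped == comprehended)  # True

data = [1.3, 2.7, 0.8, 4.1, 4.3, -0.1]
avg = statistics.mean(data)

# filter with a lambda, and the equivalent list comprehension
filtered = list(filter(lambda x: x > avg, data))
selected = [x for x in data if x > avg]
print(filtered == selected)    # True
```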


In [79]:
categories = list({row['category'] for row in movies_db})
categories

['Drama',
 'Romance',
 'Thriller',
 'Comedy',
 'War',
 'Action',
 'Crime',
 'Suspense',
 'Adventure']

In [86]:
import numpy as np

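# For each category, collect the IMDB scores of its movies and store their mean
category_mean = []
for category in categories:
    scores = []
    for row in movies_db:
        if category == row['category']:
            scores.append(row['imdb'])
    category_mean.append(np.mean(scores))
# Despite its name, this dictionary is not sorted; it maps each category to its mean score
category_mean_sorted = dict(zip(categories,category_mean))
category_mean_sorted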

{'Drama': 8.0,
 'Romance': 6.44,
 'Thriller': 5.6,
 'Comedy': 7.2,
 'War': 3.2,
 'Action': 6.3,
 'Crime': 4.0,
 'Suspense': 8.1,
 'Adventure': 9.0}

In [2]:
category_scores = {}
for movie in movies_db:
    
    if movie['category'] not in category_scores:
        # add the category key with its first value being the IMDB score
        category_scores[movie['category']] = [movie['imdb']]
#     else:
#         # otherwise append the score to the existing categories values list
#         category_scores[movie['category']].append(movie['imdb'])
print(category_scores)

{'Thriller': [7.0], 'Action': [6.3], 'Adventure': [9.0], 'Drama': [8.0], 'Romance': [6.2], 'War': [3.2], 'Crime': [4.0], 'Comedy': [7.2], 'Suspense': [9.2]}


In [100]:
category_scores = {}
for movie in movies_db:
    
    if movie['category'] not in category_scores:
        # add the category key with its first value being the IMDB score
        category_scores[movie['category']] = [movie['imdb']]
    else:
        # otherwise append the score to the existing categories values list
        category_scores[movie['category']].append(movie['imdb'])
print(category_scores)

{'Thriller': [7.0, 4.2], 'Action': [6.3], 'Adventure': [9.0], 'Drama': [8.0], 'Romance': [6.2, 7.4, 6.0, 5.4, 7.2], 'War': [3.2], 'Crime': [4.0], 'Comedy': [7.2], 'Suspense': [9.2, 7.0]}


In [37]:
# Sorts the movies by the mean score of their category, then by their own IMDB score, highest first
sorted(movies_db,key=lambda x:(category_mean_sorted[x['category']],x['imdb']),reverse=True)

NameError: name 'movies_db' is not defined
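
Since movies_db is not defined in this notebook, the sketch below uses a small hypothetical stand-in (the titles and scores are invented for illustration) to show the grouping and the two-level sort end to end.

```python
from statistics import mean

# Hypothetical sample data; the real movies_db is not shown in this notebook
movies_db = [
    {"title": "Movie A", "category": "Drama", "imdb": 8.0},
    {"title": "Movie B", "category": "Romance", "imdb": 6.0},
    {"title": "Movie C", "category": "Romance", "imdb": 7.0},
    {"title": "Movie D", "category": "War", "imdb": 3.5},
]

# Group the scores by category
category_scores = {}
for movie in movies_db:
    if movie["category"] not in category_scores:
        category_scores[movie["category"]] = [movie["imdb"]]
    else:
        category_scores[movie["category"]].append(movie["imdb"])

# Mean score per category
category_mean = {category: mean(scores) for category, scores in category_scores.items()}

# The key is a tuple: sort by category mean first, then by the movie's own score
ranked = sorted(
    movies_db,
    key=lambda movie: (category_mean[movie["category"]], movie["imdb"]),
    reverse=True,
)
for movie in ranked:
    print(movie["title"], movie["category"], movie["imdb"])
```

When the key returns a tuple, Python compares the first elements and only looks at the second when the first are equal, which is what produces the two-level ordering.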


### Opening Files

In [15]:
# The best way to open a file for reading is with the with statement
# The file is then closed automatically when the block ends, so you don't have to close it yourself

with open('earthquake.csv','r') as f:
    lines = f.readlines()
print(lines[0:5])

['earthquake_id,occurred_on,latitude,longitude,depth,mangnitude,calculation_method,network_id,place,cause\n', '1,1969-01-01 09:07:06,51.096,-179.392,45,5.6,mw,iscgem812771,"Andreanof Islands, Aleutian Islands, Alaska",earthquake\n', '2,1969-01-02 17:50:48,-56.096,-27.842,80.1,6,mw,iscgemsup812819,South Sandwich Islands region,earthquake\n', '3,1969-01-03 03:16:40,37.14,57.899,10,5.5,mw,iscgem812826,Turkmenistan-Iran border region,earthquake\n', '4,1969-01-03 13:28:12,51.132,-179.306,15,5.9,mw,iscgem812841,"Andreanof Islands, Aleutian Islands, Alaska",earthquake\n']
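
The output above shows that some fields contain commas inside quotes, so splitting each line on "," would break them apart. The standard csv module handles this. Because earthquake.csv is not included here, the sketch writes the first rows shown above to a temporary file with a hypothetical name. The column name "mangnitude" is spelled as it appears in the file.

```python
import csv
import os
import tempfile

# The header and first two rows copied from the output above
sample = (
    'earthquake_id,occurred_on,latitude,longitude,depth,mangnitude,'
    'calculation_method,network_id,place,cause\n'
    '1,1969-01-01 09:07:06,51.096,-179.392,45,5.6,mw,iscgem812771,'
    '"Andreanof Islands, Aleutian Islands, Alaska",earthquake\n'
    '2,1969-01-02 17:50:48,-56.096,-27.842,80.1,6,mw,iscgemsup812819,'
    'South Sandwich Islands region,earthquake\n'
)

# Hypothetical file name in a temporary folder
path = os.path.join(tempfile.mkdtemp(), "earthquake_sample.csv")
with open(path, "w", newline="") as f:
    f.write(sample)

# DictReader uses the header row as the keys for each row
with open(path, "r", newline="") as f:
    rows = list(csv.DictReader(f))

print(rows[0]["place"])             # Andreanof Islands, Aleutian Islands, Alaska
print(float(rows[1]["depth"]))      # 80.1
print(float(rows[0]["mangnitude"])) # 5.6
```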
