# Functions

Functions are an essential part of coding, and are a way of defining a procedure that you can reuse. They are incredibly helpful in writing clean, readable, and reusable code. I would encourage you to code using functions (or classes) whenever you can. 

Python has a very large number of built-in functions, as well as a vast repository of modules/libraries through which other functions (as well as classes) are available. Check out the [built-in function docs](https://docs.python.org/2/library/functions.html) to see all of the built-in functions that are available to use. One of the other libraries whose functions I use regularly is the [itertools](https://docs.python.org/2/library/itertools.html) library. 

At a high level, a function gives a name to a piece of code that takes zero or more arguments, letting you write your own "tiny commands". You should usually be able to describe a function's purpose in one sentence, and as a rough guideline, most functions should be no longer than about 10 lines. 

### How do I create them?

Functions are defined using a `def` statement, followed by the name that you wish to give the function (this name must follow variable naming conventions), followed by a set of parentheses that contain any arguments that may be passed to the function. After the closing parenthesis we place a colon, and then finally we write the function body on one or more indented lines. The indentation is **crucial**: it is how Python knows which lines belong to the function. 

```python 
def my_func(passed_arg1, passed_arg2, passed_arg3): 
    # code goes here 
    pass
```

Note that a function doesn't have to take any arguments: 

```python 
def my_func_no_args(): 
    print 'There are no args passed :) ' 
```

### Examples

```python
def is_palindrome(word): 
    '''
    Input: String
    Output: Bool
    
    Return whether or not the inputted word is a palindrome. 
    '''
    
    # Note we use return to return something back from the function. 
    return word == word[::-1]

print is_palindrome('hello')
print is_palindrome('racecar')
```

```
False
True
```


```python
def get_divisors(number):
    '''
    Input: Integer
    Output: List

    Return a list of the divisors of the inputted number
    '''
    # The return statement can return any kind of data structure. 
    return [divisor for divisor in xrange(1, number + 1) if number % divisor == 0]

print get_divisors(10)
print get_divisors(100)
```

```
[1, 2, 5, 10]
[1, 2, 4, 5, 10, 20, 25, 50, 100]
```


### Variable Scope

Variable scope is an important concept to consider when building functions. If you've built a function, you're sure the code is written correctly, and you're still getting the wrong result, there is a good chance you are dealing with a scope issue. 

**Variable scope** determines the part (or block) of the program in which that variable is visible. We typically refer to one of two scopes of variables - **global** scope and **local** scope. A variable with **global** scope is visible everywhere and can be used by any function, while a variable with **local** scope is visible only in the function in which it was defined. 


```python
my_global_var = 'This is a global variable.'

def scoping_func(): 
    my_local_var = 'This is a local variable, only usable in the scoping_func.'
    print my_local_var
```

```python
print my_global_var
```

```
This is a global variable.
```

```python
print my_local_var
```

```
NameError: name 'my_local_var' is not defined
```

```python
scoping_func()
```

```
This is a local variable, only usable in the scoping_func.
```


### Variable Scope Part Two

When **referencing** a variable in an expression, Python will search the following scopes to resolve the reference: 

1.) The current function's scope.   
2.) Any enclosing scopes (like other containing functions).   
3.) The scope of the module that contains the code (also called *global scope*).   
4.) The built-in scope (contains the built-in functions).  
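A small sketch of the lookup order above, using hypothetical names (`x`, `local_first`, `outer`, `read_global`, `uses_builtin`) that aren't from the original notebook. Each function only *reads* `x` (or `len`), so it returns whatever the first matching scope holds:

```python
x = 'global'

def local_first():
    # 1.) A local 'x' is found first, so the global one is never consulted.
    x = 'local'
    return x

def outer():
    x = 'enclosing'

    def inner():
        # 2.) No local 'x' in inner(), so Python checks the enclosing
        # function (outer) next and finds 'enclosing' there.
        return x

    return inner()

def read_global():
    # 3.) No local or enclosing 'x', so the module-level (global) one is used.
    return x

def uses_builtin():
    # 4.) 'len' isn't defined locally, in an enclosing function, or globally,
    # so Python falls back to the built-in scope.
    return len('abc')

print(local_first())   # local
print(outer())         # enclosing
print(read_global())   # global
print(uses_builtin())  # 3
```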

When **assigning** a value to a variable, things work a little bit differently. If the variable is **already defined** in the current scope, then it simply takes on the new value that you assign it. However, if it is **not defined** in the **current** scope, then Python treats the assignment as the definition of a brand-new variable in the current scope, even if a variable with the same name exists in an enclosing or global scope. Let's take a look at how this plays out...

```python
# This 'found' is in the global scope, so everything has access to it. 
found = True
def find_number(numbers_lst, search_number): 
    # This 'found' is in the scope of 'find_number', and of anything enclosed in it. 
    found = False
    def inner_func(): 
        for num in numbers_lst:      
            # This has access to the current function's scope and anything above it. So when 
            # it looks for the 'found' variable, it doesn't find it in the 'inner_func' scope, 
            # but does find it in the containing function's ('find_number') scope. Since it 
            # finds it in the 'find_number' scope, it stops looking, and so it never 
            # reaches the one in the global scope. 
            print found
    inner_func()
        
find_number([1, 2, 3, 4, 5], 3)
```

```
False
False
False
False
False
```


```python
def find_number(numbers_lst, search_number): 
    # This 'found' is in the scope of 'find_number', and of anything enclosed in it. 
    found = False
    def inner_func(): 
        for num in numbers_lst:  
            # Assigning to 'found' here creates a brand-new local variable in 'inner_func'.
            # It does not change the 'found' defined in 'find_number'.
            found = True if num == search_number else False
    inner_func()
    return found
        
print find_number([1, 2, 3, 4, 5], 3)
```

```
False
```
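There are actually two problems in the cell above: the assignment inside `inner_func` creates a new local `found`, and even if it didn't, the loop overwrites `found` on every pass, so only the last number in the list would count. One way around both, sketched below, is to have the inner function *return* its result instead of assigning to an enclosing variable. (Python 3 also adds a `nonlocal` statement that lets an inner function rebind an enclosing variable, but it doesn't exist in the Python 2 used in this notebook.)

```python
def find_number(numbers_lst, search_number):
    def inner_func():
        for num in numbers_lst:
            if num == search_number:
                # Stop as soon as we find a match, so later numbers
                # can't overwrite the result.
                return True
        return False

    # Bind the returned value in find_number's own scope.
    found = inner_func()
    return found

print(find_number([1, 2, 3, 4, 5], 3))  # True
print(find_number([1, 2, 3, 4, 5], 7))  # False
```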


### Giving arguments default values

If you'd like, you can give your function arguments default values. You do this within the function definition statement: 

```python 
def find_number(numbers_lst, search_number=3):
    for num in numbers_lst: 
        if num == search_number: 
            print 'Found'
```

The way this works is that if the caller of your function passes in a value for search_number, the function uses that. If the caller doesn't pass in a value for search_number, then your function uses the default value that you gave it. 

Note that you can also call your functions with either positional arguments (like I have done up until now), or with keyword arguments. The only stipulation is that all positional arguments must be placed before all keyword arguments (i.e. you can't call your function with a keyword argument placed before a positional argument). 
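To make the positional/keyword rules concrete, here is a small sketch with a hypothetical `describe_pet` function (not from the original notebook). Keyword arguments can be given in any order, as long as they come after any positional ones:

```python
def describe_pet(name, species='dog', age=1):
    # 'name' is required; 'species' and 'age' fall back to their defaults.
    return '{} the {} is {}'.format(name, species, age)

print(describe_pet('Rex'))                        # Rex the dog is 1
print(describe_pet('Tom', 'cat', 3))              # Tom the cat is 3
print(describe_pet('Tom', age=3, species='cat'))  # Tom the cat is 3
print(describe_pet(age=5, name='Bo'))             # Bo the dog is 5
```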

```python
def find_number(numbers_lst, search_number=3):
    for num in numbers_lst: 
        if num == search_number: 
            print 'Found'

find_number([1, 2, 3, 4, 5]) # Okay because we specified a default value. 
find_number([1, 2, 3, 4, 5], 4) # The second passed argument (4) overrides the default 3. 
find_number([1, 2, 3, 4, 5], search_number=4) # Okay because all positional arguments come first. 
find_number(numbers_lst=[1, 2, 3, 4, 5], 4) # Not okay: a positional argument follows a keyword argument.
```

```
SyntaxError: non-keyword arg after keyword arg
```

Because this is a `SyntaxError`, Python rejects the whole cell before running any of it, so even the three valid calls print nothing.

### `*args` and `**kwargs`

You might see (or use) `*args` and `**kwargs` in function definitions. This is one of the really nice features of Python (although I don't use it often): it allows your functions to accept an arbitrary number of optional arguments. `*args` accepts an arbitrary number of optional positional arguments, whereas `**kwargs`, which stands for *keyword arguments*, accepts an arbitrary number of optional keyword arguments. (The names `args` and `kwargs` are just conventions; it's the `*` and `**` that do the work.)

```python
def args_func(first_arg, *args): 
    print first_arg
    for arg in args: 
        print arg
```

```python
args_func(1)
```

```
1
```

```python
args_func(1, 2, 3, 4)
```

```
1
2
3
4
```

```python
args_func(1, [2, 3, 4]) # The list is passed as a single positional argument.
```

```
1
[2, 3, 4]
```


```python
def kwargs_func(first_arg, **kwargs): 
    print first_arg
    for kwarg, value in kwargs.iteritems(): 
        print kwarg, value
```

```python
kwargs_func(1)
```

```
1
```

```python
kwargs_func(1, 2, 3, 4)
```

```
TypeError: kwargs_func() takes exactly 1 argument (4 given)
```

```python
kwargs_func(1, second_arg=2, third_arg=3, fourth_arg=4)
```

```
1
second_arg 2
fourth_arg 4
third_arg 3
```

(In Python 2, dictionaries are unordered, so the keyword arguments may print in a different order than they were passed.)
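A quick sketch tying the two together, using a hypothetical `collect_args` function: inside the function, `args` arrives as a tuple and `kwargs` as a dictionary. The same `*` and `**` syntax also works in the other direction at the call site, unpacking a list into positional arguments and a dictionary into keyword arguments (compare with `args_func(1, [2, 3, 4])` above, where the list arrived as a single argument).

```python
def collect_args(first_arg, *args, **kwargs):
    # args: tuple of any extra positional arguments.
    # kwargs: dict of any extra keyword arguments.
    return first_arg, args, kwargs

direct = collect_args(1, 2, 3, color='red')
print(direct)  # (1, (2, 3), {'color': 'red'})

# Unpack a list and a dict into the call instead of passing them whole.
extra_positional = [2, 3]
extra_keywords = {'color': 'red'}
unpacked = collect_args(1, *extra_positional, **extra_keywords)
print(unpacked == direct)  # True
```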


# OOP and Classes

From [Wikipedia](https://en.wikipedia.org/wiki/Object-oriented_programming): "Object-oriented programming (OOP) is a programming paradigm based on the concept of "objects", which are data structures that contain data, in the form of fields, often known as attributes; and code, in the form of procedures, often known as methods."

Object-oriented programming has many benefits (see *encapsulation*, *polymorphism*, and *inheritance* in the Wikipedia article), but it also kind of matches how we think about the world. The world is composed of *objects*, where objects can be people, houses, cars, buildings, etc. These *objects* have some properties (i.e. they contain data), and they can do things (i.e. they have methods that can be applied). Object-oriented programming approaches a programming problem by using objects that interact with each other, much like they do in the real world. 

### Some Terminology 

1.) **Class** - Used to refer to the abstract concept of an object (a blueprint for creating objects).  
2.) **Object** - An actual instance of a class.   
3.) **Instance** - What Python returns when you call a class to create a new object (e.g. `MyClass()`); in practice, "object" and "instance" are used interchangeably.   
4.) **self** - Inside of a class, a variable for the instance/object being accessed (i.e. it holds a reference to the instance/object of that class).  
5.) **attribute** / **field** / **property** - A piece of data that a class or its instances hold, stored in a variable. Attributes that belong to an individual instance are assigned via `self`.  
6.) **method** / **procedure** - A block of code that is accessible via the class, and typically acts on or with the class's attributes/fields/properties. All methods/procedures within a class are created via `def` (they are really just functions). 

From here on out, I will treat attribute, field, and property as interchangeable, and I'll do the same with method and procedure. 

### Defining a Class 

Defining a class looks a lot like defining a function: we just replace `def` with `class`. That is, we write `class`, then the name of the class that we are defining, followed by a set of parentheses, and finally a colon. After the colon is an indented block of code that we use to define the class attributes and methods. One subtle difference is the naming convention: function names typically start in lowercase with words separated by underscores (`my_func`), while class names typically start each word with an uppercase letter and don't separate words at all (`MyClass`). 

```python 
class MyClass(): 
    # Attributes and methods go in here. 
    pass # A class body needs at least one statement; pass is a placeholder.
```

### Instantiation

Instantiation is just a fancy word for saying that we're going to create an instance of a particular class. 

```python 
my_class = MyClass() # Now we have my_class as an instance of MyClass
```

### Inner Workings 

Inside of a class, we can have both *attributes* and *methods*. We can then think of these *attributes* and *methods* as belonging to the class, and they become accessible via any instances of the class (through dot notation, which we'll get to in a second). Inside of the class, all of these *attributes* and *methods* are set and retrieved via *self*. Let's dive in...

##### The `__init__()` method 

Almost every class you ever write will have an `__init__()` method. This method gets called every time that you create a new instance of a class, and handles any kind of setup that the class may require. Setup typically just involves assigning values to variables, which we can do with or without passing values in. 

```python
class MyClass(): 
    
    def __init__(self):
        # No values have been passed in here. 
        self.meetup_name = 'Data Science'

my_class = MyClass()
print my_class.meetup_name # Note the dot notation here to access the 'meetup_name' field. 
```

```
Data Science
```

```python
class MyClass(): 
    
    def __init__(self, meetup_name):
        # Here we passed the value that will be assigned in. Note the assignment using self. 
        self.meetup_name = meetup_name

my_class = MyClass('Data Science')
print my_class.meetup_name
```

```
Data Science
```


##### What happens if we don't use self?

```python
class MyClass(): 
    
    def __init__(self, meetup_name):
        # Without self, this just rebinds the local name 'meetup_name' to itself;
        # nothing is stored on the instance.
        meetup_name = meetup_name

my_class = MyClass('Data Science')
print my_class.meetup_name
```

```
AttributeError: MyClass instance has no attribute 'meetup_name'
```

#### Magic Methods

The `__init__()` method is a special type of [**magic method**](http://www.rafekettler.com/magicmethods.html). Magic methods let you build a lot of functionality into your classes, much of it allowing your instances to work with Python's built-in functions and operators. I don't use most of them in my day-to-day work, but `__len__()`, `__str__()`, and `__repr__()` are pretty common. The first lets you use the `len()` function on instances of your class, the second lets you define a readable display of an instance (used when printing it or applying the `str()` function), and the third defines an unambiguous, developer-facing representation (used by the `repr()` function and when an instance is displayed at the interactive prompt). 
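Here is a minimal sketch of those three magic methods on a hypothetical `Meetup` class (the class name and its fields are made up for illustration):

```python
class Meetup(object):
    # Hypothetical example class for illustrating magic methods.
    def __init__(self, name, members):
        self.name = name
        self.members = list(members)

    def __len__(self):
        # Called by len(meetup).
        return len(self.members)

    def __str__(self):
        # Called by str(meetup) and by print.
        return '{} meetup ({} members)'.format(self.name, len(self))

    def __repr__(self):
        # Called by repr(meetup) and when displayed at the interactive prompt.
        return 'Meetup({!r}, {!r})'.format(self.name, self.members)

meetup = Meetup('Data Science', ['Josh', 'Joanna', 'Sean'])
print(len(meetup))   # 3
print(meetup)        # Data Science meetup (3 members)
print(repr(meetup))  # Meetup('Data Science', ['Josh', 'Joanna', 'Sean'])
```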

#### Other Methods 

We can of course define other methods of our classes...

```python
class MyClass(): 
    
    def __init__(self, meetup_name='Data Science'):
        # Here we passed the value that will be assigned in. 
        self.meetup_name = meetup_name
        self.meetup_questions = []
        self.meetup_answers = []
    
    def add_question(self, question):
        # Note the reference to the meetup_questions field via self. 
        self.meetup_questions.append(question)
    
    def add_answer(self, answer): 
        self.meetup_answers.append(answer)
        
my_class = MyClass()
print my_class.meetup_name
print my_class.meetup_questions
print my_class.meetup_answers
```

```
Data Science
[]
[]
```


```python
my_class.add_question('What question should I ask?')
my_class.add_answer('Think of anything!')
```

```python
print my_class.meetup_name
print my_class.meetup_questions
print my_class.meetup_answers
```

```
Data Science
['What question should I ask?']
['Think of anything!']
```


### Using Multiple Objects

That's the whole point of them, right?

```python
class Member(): 
    
    def __init__(self, name): 
        self.name = name
        self.questions_asked = []
        self.question_answers = []
    
    def add_question(self, question): 
        self.questions_asked.append(question)
    
    def add_answer(self, answer): 
        self.question_answers.append(answer)
        
class MyClass(): 
    
    def __init__(self, name='Data Science'): 
        self.name = name
        self.members = []
    
    def num_questions_asked(self): 
        total_questions = 0
        # Use self.members (this instance's members), not a global list.
        for member in self.members: 
            total_questions += len(member.questions_asked)
        
        return total_questions
        
    def num_questions_answered(self): 
        total_questions = 0
        for member in self.members: 
            total_questions += len(member.question_answers)
        
        return total_questions
```

```python
# Create some members. 
josh = Member('Josh')
joanna = Member('Joanna')
sean = Member('Sean')
members = [josh, joanna, sean]

# Create an instance of MyClass and give it the members. 
my_class = MyClass()
my_class.members = members
print my_class.name
for member in my_class.members: 
    print member.name
```

```
Data Science
Josh
Joanna
Sean
```


```python
josh.add_question('Hellooooo?')
joanna.add_answer('???????')

print my_class.num_questions_asked()
print my_class.num_questions_answered()
```

```
1
1
```
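One design tweak worth considering: rather than assigning `my_class.members` from outside the class, the class can manage its own list through a method. The sketch below uses a hypothetical `add_member` method on a renamed `MeetupGroup` class (neither is part of the original notebook), and counts questions with `sum()` over a generator expression instead of a manual running total:

```python
class Member(object):
    def __init__(self, name):
        self.name = name
        self.questions_asked = []
        self.question_answers = []

    def add_question(self, question):
        self.questions_asked.append(question)

    def add_answer(self, answer):
        self.question_answers.append(answer)


class MeetupGroup(object):
    # Hypothetical variant of MyClass that owns its member list.
    def __init__(self, name='Data Science'):
        self.name = name
        self.members = []

    def add_member(self, member):
        # Callers go through this method instead of touching self.members.
        self.members.append(member)

    def num_questions_asked(self):
        # Count questions across this group's members only.
        return sum(len(member.questions_asked) for member in self.members)

    def num_questions_answered(self):
        return sum(len(member.question_answers) for member in self.members)


group = MeetupGroup()
josh = Member('Josh')
joanna = Member('Joanna')
group.add_member(josh)
group.add_member(joanna)

josh.add_question('Hellooooo?')
joanna.add_answer('???????')
print(group.num_questions_asked())     # 1
print(group.num_questions_answered())  # 1
```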


# Intro to Pandas 

Today, in terms of Pandas, we're just going to look at how to get data into a DataFrame and how to look at that data. `DataFrame` is a class, so every DataFrame we work with is an object: we access its attributes just like we would those of any other object, and we manipulate its data via its methods. 

### Pandas Import 

```python
import pandas as pd
```

### Loading External Data

The [Pandas documentation](http://pandas.pydata.org/pandas-docs/stable/io.html) lists all of the ways that you can load external data into a DataFrame. There are readers for most common formats (CSV, JSON, SQL, Excel, HTML, and more), and they are all functions whose names start with `read_`. So, if we wanted to load data in from a CSV, we would simply use: 

```python
df = pd.read_csv('my_data.csv')
```

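By default, `read_csv()` treats the first row of the `.csv` as the column names. If your file has no header row, pass `header=None` (and, optionally, a list of column names via `names`). 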
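For example, here is a sketch that reads a small header-less CSV from an in-memory string. The `io.StringIO` object stands in for a real file, and the column names `a`, `b`, `c` are made up for illustration:

```python
import io

import pandas as pd

# A tiny CSV with no header row (in place of a file on disk).
raw_csv = io.StringIO(u'1,2,3\n4,5,6\n')

# header=None: don't treat the first row as column names.
# names: supply the column names ourselves.
df = pd.read_csv(raw_csv, header=None, names=['a', 'b', 'c'])
print(df)
```

Without `header=None`, the first data row (`1,2,3`) would have been used as the column names and the DataFrame would have only one row.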

### Instantiating a DataFrame with data from your Python program

If we are instantiating a DataFrame from data that already exists in our program, there are a couple of ways we can do this. One is to pass a list of dictionaries to the DataFrame constructor. Each dictionary becomes a row: Pandas uses the dictionary keys as column names and the dictionary values as that row's values, filling in `NaN` wherever a row has no value for a column. Another way is to pass in a list of lists of values as the `data` argument and a list of column names as the `columns` argument.

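```python
import pandas as pd
data_lst = [{'a': 1, 'b': 2, 'c':3}, {'a': 4, 'b':5, 'c':6, 'd':7}]
df = pd.DataFrame(data_lst)
df
```

|   | a | b | c | d   |
|---|---|---|---|-----|
| 0 | 1 | 2 | 3 | NaN |
| 1 | 4 | 5 | 6 | 7.0 |

Column `d` shows `7.0` rather than `7` because `NaN` is a floating-point value, so the whole column is stored as floats.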


```python
data_vals = [[1, 2, 3], [4, 5, 6]]
data_cols = ['a', 'b', 'c']
df = pd.DataFrame(data=data_vals, columns=data_cols)
df
```

|   | a | b | c |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |


### Looking at the data

The data below is the Forest Fires dataset from the UCI Machine Learning Repository ([here](http://archive.ics.uci.edu/ml/machine-learning-databases/forest-fires/)). I'm just going to detail four methods for looking at our data: `info()`, `describe()`, `head()`, and `tail()`. 

```python
df = pd.read_csv('data/forestfires.csv')
```

```python
# Gives us a very high level overview of our data. 
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 517 entries, 0 to 516
Data columns (total 13 columns):
X        517 non-null int64
Y        517 non-null int64
month    517 non-null object
day      517 non-null object
FFMC     517 non-null float64
DMC      517 non-null float64
DC       517 non-null float64
ISI      517 non-null float64
temp     517 non-null float64
RH       517 non-null int64
wind     517 non-null float64
rain     517 non-null float64
area     517 non-null float64
dtypes: float64(8), int64(3), object(2)
memory usage: 56.5+ KB
```


```python
# Gives us a more detailed look at each of the columns in our dataset. Note that, by 
# default, it doesn't include non-numeric columns in this summary. 
df.describe()
```

|       | X | Y | FFMC | DMC | DC | ISI | temp | RH | wind | rain | area |
|-------|---|---|------|-----|----|-----|------|----|------|------|------|
| count | 517.0 | 517.0 | 517.0 | 517.0 | 517.0 | 517.0 | 517.0 | 517.0 | 517.0 | 517.0 | 517.0 |
| mean | 4.669246 | 4.299807 | 90.644681 | 110.87234 | 547.940039 | 9.021663 | 18.889168 | 44.288201 | 4.017602 | 0.021663 | 12.847292 |
| std | 2.313778 | 1.2299 | 5.520111 | 64.046482 | 248.066192 | 4.559477 | 5.806625 | 16.317469 | 1.791653 | 0.295959 | 63.655818 |
| min | 1.0 | 2.0 | 18.7 | 1.1 | 7.9 | 0.0 | 2.2 | 15.0 | 0.4 | 0.0 | 0.0 |
| 25% | 3.0 | 4.0 | 90.2 | 68.6 | 437.7 | 6.5 | 15.5 | 33.0 | 2.7 | 0.0 | 0.0 |
| 50% | 4.0 | 4.0 | 91.6 | 108.3 | 664.2 | 8.4 | 19.3 | 42.0 | 4.0 | 0.0 | 0.52 |
| 75% | 7.0 | 5.0 | 92.9 | 142.4 | 713.9 | 10.8 | 22.8 | 53.0 | 4.9 | 0.0 | 6.57 |
| max | 9.0 | 9.0 | 96.2 | 291.3 | 860.6 | 56.1 | 33.3 | 100.0 | 9.4 | 6.4 | 1090.84 |


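```python
# Shows us the first 5 rows of our data set. 
df.head()
```

|   | X | Y | month | day | FFMC | DMC | DC | ISI | temp | RH | wind | rain | area |
|---|---|---|-------|-----|------|-----|----|-----|------|----|------|------|------|
| 0 | 7 | 5 | mar | fri | 86.2 | 26.2 | 94.3 | 5.1 | 8.2 | 51 | 6.7 | 0.0 | 0 |
| 1 | 7 | 4 | oct | tue | 90.6 | 35.4 | 669.1 | 6.7 | 18.0 | 33 | 0.9 | 0.0 | 0 |
| 2 | 7 | 4 | oct | sat | 90.6 | 43.7 | 686.9 | 6.7 | 14.6 | 33 | 1.3 | 0.0 | 0 |
| 3 | 8 | 6 | mar | fri | 91.7 | 33.3 | 77.5 | 9.0 | 8.3 | 97 | 4.0 | 0.2 | 0 |
| 4 | 8 | 6 | mar | sun | 89.3 | 51.3 | 102.2 | 9.6 | 11.4 | 99 | 1.8 | 0.0 | 0 |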


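```python
# Shows us the last 5 rows of our data set. 
df.tail()
```

|     | X | Y | month | day | FFMC | DMC | DC | ISI | temp | RH | wind | rain | area |
|-----|---|---|-------|-----|------|-----|----|-----|------|----|------|------|------|
| 512 | 4 | 3 | aug | sun | 81.6 | 56.7 | 665.6 | 1.9 | 27.8 | 32 | 2.7 | 0 | 6.44 |
| 513 | 2 | 4 | aug | sun | 81.6 | 56.7 | 665.6 | 1.9 | 21.9 | 71 | 5.8 | 0 | 54.29 |
| 514 | 7 | 4 | aug | sun | 81.6 | 56.7 | 665.6 | 1.9 | 21.2 | 70 | 6.7 | 0 | 11.16 |
| 515 | 1 | 4 | aug | sat | 94.4 | 146.0 | 614.7 | 11.3 | 25.6 | 42 | 4.0 | 0 | 0.0 |
| 516 | 6 | 3 | nov | tue | 79.5 | 3.0 | 106.7 | 1.1 | 11.8 | 31 | 4.5 | 0 | 0.0 |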


# List Comprehensions 

We're going to take a brief look at list comprehensions. List comprehensions are frequently used in Python in place of **for** loops that build lists, so it's important to at least be able to recognize them (and they also tend to run a bit faster than the equivalent loop). 

### Basics

A list comprehension does the same thing as a for loop that builds a list: 

```python 
doubles = []
for num in xrange(10): 
    doubles.append(num * 2)
``` 

is the **same** as: 

```python 
doubles = [num * 2 for num in xrange(10)]
```

### But why? 

List comprehensions are a more compact way of writing for loops that create lists (1 line of code above instead of 3). They also tend to be somewhat faster, because the comprehension adds each item directly instead of looking up and calling the list's `append` method on every iteration.
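If you want to check the speed difference on your own machine, the standard library's `timeit` module is a simple way to do it. The sketch below times both versions of the doubling code; the exact numbers will vary by machine and Python version, so treat it as a way to measure rather than a guaranteed result. (It uses `range` instead of `xrange` so it also runs on Python 3.)

```python
import timeit

loop_code = '''doubles = []
for num in range(1000):
    doubles.append(num * 2)'''

comp_code = 'doubles = [num * 2 for num in range(1000)]'

# Run each snippet 1,000 times and report the total time in seconds.
loop_time = timeit.timeit(loop_code, number=1000)
comp_time = timeit.timeit(comp_code, number=1000)

print('for loop:      %.4f s' % loop_time)
print('comprehension: %.4f s' % comp_time)
```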

### What about more complicated list comprehensions? 

You'll often see (and maybe write) more complicated list comprehensions than those above. When writing them yourself, there is a tradeoff between readability and speed that you have to keep in mind. If you are writing code that becomes unwieldy and unreadable but doesn't have huge gains in efficiency (or maybe even if it does), consider writing it in a clearer manner (without a list comp.) even though it may be a little slower. This goes for all your code writing in general.

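```python
# Two for clauses: equivalent to nesting 'for num2 ...' inside 'for num1 ...'.
num_pairs = [(num1, num2) for num1 in xrange(100) for num2 in xrange(100)]
# A trailing if clause filters: keep only the pairs where num2 is even.
num_pairs = [(num1, num2) for num1 in xrange(100) for num2 in xrange(100) if num2 % 2 == 0]
```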
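To see how the two `for` clauses and the `if` clause map onto ordinary loops, here is the filtered comprehension above written both ways on a smaller range (3 instead of 100, purely so the result is easy to read; `range` is used so it also runs on Python 3):

```python
# Nested for loops with an if, building a list of pairs...
pairs_loop = []
for num1 in range(3):
    for num2 in range(3):
        if num2 % 2 == 0:
            pairs_loop.append((num1, num2))

# ...and the equivalent comprehension: the for clauses appear in the same
# order as the nested loops, and the if clause comes last.
pairs_comp = [(num1, num2) for num1 in range(3) for num2 in range(3) if num2 % 2 == 0]

print(pairs_comp)  # [(0, 0), (0, 2), (1, 0), (1, 2), (2, 0), (2, 2)]
```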

### List Comps. v Map and Filter

In other programming languages (particularly functional ones), using something like a map or filter function is pretty common. 

```python
def square_num(x): 
    return x ** 2
    
def num_less_20(x): 
    return x < 20

print map(square_num, [1, 2, 3, 4, 5]) # Returns a list of the squares of all the numbers. 
print filter(num_less_20, [1, 2, 3, 4, 5, 25]) # Returns only those numbers less than 20. 
```

```
[1, 4, 9, 16, 25]
[1, 2, 3, 4, 5]
```


However, in Python it is considered the norm (and more Pythonic) to use list comprehensions to achieve the same thing. 

```python
print [x ** 2 for x in [1, 2, 3, 4, 5]]
print [x for x in [1, 2, 3, 4, 5, 25] if x < 20]
```

```
[1, 4, 9, 16, 25]
[1, 2, 3, 4, 5]
```
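One portability note: the `map` and `filter` output above is from Python 2, where both return lists. In Python 3 they return lazy iterators instead, so you would wrap them in `list()` to get the same result. The sketch below compares the two styles in a way that runs on either version; it uses `lambda` (an inline, anonymous function) in place of the named `square_num` and `num_less_20` functions, purely for brevity:

```python
nums = [1, 2, 3, 4, 5, 25]

# map/filter wrapped in list() so the result is a list on Python 2 and 3.
squares_map = list(map(lambda x: x ** 2, nums))
small_map = list(filter(lambda x: x < 20, nums))

# The equivalent list comprehensions.
squares_comp = [x ** 2 for x in nums]
small_comp = [x for x in nums if x < 20]

print(squares_comp)  # [1, 4, 9, 16, 25, 625]
print(small_comp)    # [1, 2, 3, 4, 5]
print(squares_map == squares_comp and small_map == small_comp)  # True
```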
