# Fun with functions and pandas!

## Overview of functions

Python is an object-oriented language, and functions are themselves objects. We use functions to create new objects or to modify existing ones.

The function-oriented habits common in data science and R carry over well to Python: we write functions and apply them *to* objects, so the two styles complement each other.

Because functions are objects, we can assign them to variables, store them in lists, pass them as arguments to other functions, create them inside functions, and even return a function as the result of a function.
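A minimal sketch of each of these uses. The names `double`, `apply_twice`, and `make_adder` are made up for illustration and are not part of the course material.

```python
def double(x):
    return x * 2

def apply_twice(func, x):
    # Takes another function as an argument and calls it twice
    return func(func(x))

def make_adder(n):
    # Creates a function inside a function and returns it
    def add(x):
        return x + n
    return add

alias = double                            # assign a function to a variable
operations = [double, abs]                # store functions in a list
results = [op(-4) for op in operations]   # call each stored function
quadrupled = apply_twice(double, 3)       # pass a function as an argument
add_ten = make_adder(10)                  # a function produced by a function
fifteen = add_ten(5)

print(results, quadrupled, fifteen)
```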

Make sure to keep in mind that:

- Functions should be "pure": calling a function again with the same inputs gives the same result. `time.time()` is not a pure function, because it returns a different value on every call.

- Executing the function shouldn't change global variables or have any other side effects.
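To make the two points above concrete, here is a small sketch that contrasts a pure function with two impure ones. All names (`add_tax`, `append_and_count`, `timestamped`) are hypothetical examples.

```python
import time

def add_tax(price, rate):
    # Pure: the result depends only on the inputs, and nothing outside changes
    return price * (1 + rate)

log = []

def append_and_count(item):
    # Impure: mutates `log`, which lives outside the function (a side effect),
    # so the same input gives a different result on each call
    log.append(item)
    return len(log)

def timestamped(price):
    # Impure: the result depends on the clock, not just on the input
    return (price, time.time())

same_both_times = add_tax(100, 0.5) == add_tax(100, 0.5)
first_call = append_and_count("a")
second_call = append_and_count("a")
print(same_both_times, first_call, second_call)
```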

For reference, these are the types of functions you'll see:
![Function Types](fp.png)

## Let's build a function!
Below, we construct a simple function.

In [2]:
def my_mean(x):
    Sum_val = sum(x)    # total of all values
    N = len(x)          # number of values
    return Sum_val / N  # unlike R, Python needs an explicit return; without it the function returns None

Create a small list, pass it to the function, and see if it works. Then try to access the `Sum_val` and `N` variables outside the function... does this work?

Now, let's construct a "function factory".

In [27]:
def power1(exp):
    def action(x):
        return x**exp
    return action

In [28]:
square = power1(2)
cube = power1(3)

In [31]:
print(square(3), cube(3))

9 27


## Quick exercise
Create a function that computes the range of a variable and then, for no good reason, adds 100 and divides by 10. Write out the steps you would need in pseudocode first, then develop the function.

Pseudocode:

(enter here)


In [None]:
# develop function here

## `pandas` functions

1. `df.loc[]` -- Pick observations (rows) by their values
2. `df.sort_values()` -- Reorder the rows
3. `df.loc[:, x]` -- Pick variables by their column names and/or conditions (x)
4. `df.assign()` -- Create new variables with functions of existing variables
5. `df.describe()` -- Collapse many values down to a set of summary statistics per numeric column
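The five operations above can be sketched on a tiny made-up DataFrame. The `city` and `temp_c` columns are hypothetical and are not taken from the course data.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
df = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "temp_c": [21.0, 30.5, 15.2, 27.8],
})

warm = df.loc[df["temp_c"] > 20]       # 1. pick rows by their values
ordered = df.sort_values("temp_c")     # 2. reorder the rows
temps_only = df.loc[:, ["temp_c"]]     # 3. pick columns by name
with_f = df.assign(                    # 4. new column from an existing one
    temp_f=lambda d: d["temp_c"] * 9 / 5 + 32
)
summary = df.describe()                # 5. summary statistics for numeric columns

print(summary)
```

None of these calls modifies `df`; each returns a new object.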

### `.loc[]`

`.loc[]` selects rows and columns by label. To select by integer position, use `.iloc[]` instead.

One row: `.loc['row']`

Multiple rows: `.loc[['row1', 'row2']]`; multiple columns: `.loc[:, ['colname1', 'colname2', 'colname3']]`

One row, one column: `.loc['row', 'column']`

All rows, one column: `.loc[:, 'column']`


You can also select rows that meet some criterion by using `.loc[df['column'] (condition) value]`, for example `.loc[df['year'] > 2009]`.

For more, check out the documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html
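A short sketch of these selection patterns, using a made-up DataFrame whose index labels (`jan`, `feb`, `mar`) and values are hypothetical:

```python
import pandas as pd

# Hypothetical data with string row labels so label-based selection is visible
df = pd.DataFrame(
    {"tmax": [27.8, 29.9, 32.1], "tmin": [14.5, 10.7, 14.2]},
    index=["jan", "feb", "mar"],
)

one_row = df.loc["jan"]                      # one row, returned as a Series
two_rows = df.loc[["jan", "mar"]]            # multiple rows
one_cell = df.loc["feb", "tmax"]             # one row, one column
one_col = df.loc[:, "tmin"]                  # all rows, one column
hot = df.loc[df["tmax"] > 28]                # rows that meet a condition
hot_tmin = df.loc[df["tmax"] > 28, "tmin"]   # condition on rows plus a column

print(hot)
```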

#### Example
Load `weather.csv`. It contains daily temperature data for 2010 at one location.

In [37]:
import pandas as pd
weather = pd.read_csv('~/Desktop/DS-3001/data/weather.csv')

weather.head(2)

Unnamed: 0,id,year,month,element,d1,d2,d3,d4,d5,d6,...,d22,d23,d24,d25,d26,d27,d28,d29,d30,d31
0,MX17004,2010,1,tmax,,,,,,,...,,,,,,,,,27.8,
1,MX17004,2010,1,tmin,,,,,,,...,,,,,,,,,14.5,


In [44]:
# get list of applicable columns; all that start with 'd'
days = [col for col in weather.columns if col.startswith('d')]
selected = weather[days]
selected.head(5)

Unnamed: 0,d1,d2,d3,d4,d5,d6,d7,d8,d9,d10,...,d22,d23,d24,d25,d26,d27,d28,d29,d30,d31
0,,,,,,,,,,,...,,,,,,,,,27.8,
1,,,,,,,,,,,...,,,,,,,,,14.5,
2,,27.3,24.1,,,,,,,,...,,29.9,,,,,,,,
3,,14.4,14.4,,,,,,,,...,,10.7,,,,,,,,
4,,,,,32.1,,,,,34.5,...,,,,,,,,,,


The main contrast with the R code is how the columns get picked. In Python, `startswith` is a string method, so it works on one column name at a time. In R, `starts_with()` is a helper used inside the `select` function. As a result, we needed the intermediate step of collecting the exact column names first.
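pandas can also do this selection without a separate list, which is a little closer to the R idiom. The sketch below uses a small stand-in DataFrame (`weather_demo` is hypothetical, shaped like a slice of the weather data) and shows two options: a boolean mask from `columns.str.startswith` and `DataFrame.filter` with a regular expression.

```python
import pandas as pd

# Hypothetical stand-in with the same kind of column names as weather.csv
weather_demo = pd.DataFrame({
    "id": ["MX17004"],
    "year": [2010],
    "element": ["tmax"],
    "d1": [float("nan")],
    "d2": [27.3],
})

# Option 1: vectorized string method on the column index gives a boolean mask
by_mask = weather_demo.loc[:, weather_demo.columns.str.startswith("d")]

# Option 2: keep columns whose name is 'd' followed by digits
by_regex = weather_demo.filter(regex=r"^d\d+$")

print(list(by_mask.columns), list(by_regex.columns))
```

The regular expression is stricter than `startswith('d')`: it would skip a column such as `date`, which may or may not be what you want.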

### `.assign()`

In [45]:
# talk to Brian about what dataset could be a replacement for mpg
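A minimal sketch of `.assign()` on made-up data. The `cars` DataFrame and its columns are hypothetical placeholders, not the course dataset.

```python
import pandas as pd

# Hypothetical data, for illustration only
cars = pd.DataFrame({
    "model": ["a", "b"],
    "miles": [300.0, 420.0],
    "gallons": [10.0, 12.0],
})

# Each keyword becomes a new column; a lambda receives the DataFrame itself
cars_mpg = cars.assign(
    mpg=lambda d: d["miles"] / d["gallons"],
    km=lambda d: d["miles"] * 1.609344,
)

print(cars_mpg)
```

`.assign()` returns a new DataFrame and leaves `cars` unchanged, which fits the "no side effects" guideline from earlier.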

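### `.drop()`

`.drop()` removes rows or columns by label. To remove rows based on some criterion, first find the labels of the matching rows and pass them to `.drop()`, or filter with `.loc[]` instead.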

In [46]:
# find another dataset; also ask Brian if we want to include this in the actual assignment file in the intro
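A minimal sketch of `.drop()` on made-up data. The DataFrame and its column names are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "id": [1, 2, 3],
    "tmax": [27.8, float("nan"), 32.1],
    "note": ["x", "y", "z"],
})

no_note = df.drop(columns=["note"])     # drop a column by name
no_first = df.drop(index=[0])           # drop a row by index label

# Drop rows that meet a condition: look up their labels, then drop them
too_hot = df.index[df["tmax"] > 30]
cooler = df.drop(index=too_hot)

print(cooler)
```

Like `.assign()`, `.drop()` returns a new DataFrame rather than changing `df` in place.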

## Basic data types in `pandas`
![Data Types](pandas_datatypes.png)