
#### pain points:

* Parallel processing.
* Python-Spark API
* Classes
* How do you easily subset a df and create new variables?
* Functions, and how you can leverage code to make it generic to many data wrangling processes, e.g. data load.
* There’s a %include functionality in SAS that allows you to “call” different programs in the current program. Is that possible?
* Variable creation (all kinds: strings, dummies, combinations of existing vars, binning, etc.)
* How do you run multiple programs in batch?
* Subsetting an existing df by passing the names of the vars you want to keep in a list format
* How to load an existing Python program
 

#### wish points:

* Learn to use decorators more effectively
* How to create variables using longitudinal data
* There’s a %include functionality in SAS that allows you to “call” different programs in the current program. Is that possible?
* Variable creation (all kinds: strings, dummies, combinations of existing vars, binning, etc.)
* How do you run multiple programs in batch?
* Subsetting an existing df by passing the names of the vars you want to keep in a list format
 



## Parallel processing

https://docs.python.org/2.7/library/multiprocessing.html

    multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.


In [1]:
from multiprocessing import Pool
import os

def f(x):
    print(os.getpid(), x)
    return x * 2

p = Pool(3)
p.map(f, [1, 2, 3])


(8814, 2)
(8813, 1)
(8815, 3)


[2, 4, 6]
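
The cell above runs as-is in a Unix notebook session. As a minimal sketch of the same idea in script form, assuming Python 3 (where `Pool` can be used in a `with` block), the version below adds the `if __name__ == "__main__":` guard that the multiprocessing documentation asks for on platforms that start workers by re-importing the script, such as Windows. The names `double` and `parallel_double` are illustrative, not from the original cell.

```python
from multiprocessing import Pool
import os


def double(x):
    # Runs in a worker process; os.getpid() shows which one handled x.
    print("pid {} got {}".format(os.getpid(), x))
    return x * 2


def parallel_double(values, workers=3):
    # The with-block shuts the pool down once map() has returned.
    with Pool(workers) as pool:
        return pool.map(double, values)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block
    # when they re-import the script.
    print(parallel_double([1, 2, 3]))
```

`map` returns results in input order even though the workers may print their lines in any order.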

## pandas subset

* Subsetting an existing df by passing the names of the vars you want to keep in a list format
* How do you easily subset a df and create new variables?


In [2]:
import pandas as pd
import numpy as np

dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df

| | A | B | C | D |
|---|---|---|---|---|
| 2000-01-01 | -0.342438 | 0.172574 | 1.848999 | -1.65338 |
| 2000-01-02 | -0.327576 | 0.626668 | 1.080234 | -0.03322 |
| 2000-01-03 | -2.158234 | -1.335349 | 0.192175 | -0.597858 |
| 2000-01-04 | -0.777459 | 0.103227 | -0.337235 | 0.217696 |
| 2000-01-05 | 0.645967 | 0.818281 | 0.964814 | 0.11894 |
| 2000-01-06 | 1.065188 | -0.46884 | -0.832021 | -0.406287 |
| 2000-01-07 | -1.813782 | 0.427384 | 1.056419 | 0.863527 |
| 2000-01-08 | 0.513334 | -0.145134 | -0.016359 | -1.458833 |


In [10]:
df[(df.A < 0) & (df.B > 0)][["A", "B"]]

| | A | B |
|---|---|---|
| 2000-01-01 | -0.342438 | 0.172574 |
| 2000-01-02 | -0.327576 | 0.626668 |
| 2000-01-04 | -0.777459 | 0.103227 |
| 2000-01-07 | -1.813782 | 0.427384 |
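
The filter above chains two indexing steps. One possible way to answer both bullets at once is sketched below: the columns to keep are passed as a list, `.loc` applies the row filter and the column list together, and `assign` adds new variables. The seed and the new column names (`A_plus_B`, `A_is_small`) are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Fixed seed so the sketch is repeatable; the values themselves are arbitrary.
rng = np.random.RandomState(0)
dates = pd.date_range("1/1/2000", periods=8)
df = pd.DataFrame(rng.randn(8, 4), index=dates, columns=["A", "B", "C", "D"])

keep = ["A", "B"]                      # names of the variables to keep
mask = (df["A"] < 0) & (df["B"] > 0)   # same row condition as the cell above

# .loc takes the row filter and the column list in one step.
subset = df.loc[mask, keep]

# assign() returns a new frame with the extra columns; df is left unchanged.
subset = subset.assign(
    A_plus_B=subset["A"] + subset["B"],
    A_is_small=subset["A"] < -1,
)
print(subset)
```

Because `keep` is an ordinary list, it can be built elsewhere (read from a config, passed to a function) and reused across data frames.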


## calling different programs

```
%macro words(string);
   %local count word;
   %let count=1;
   %let word=%qscan(&string,&count,%str( ));
   %do %while(&word ne);
      %let count=%eval(&count+1);
      %let word=%qscan(&string,&count,%str( ));
   %end;
   %let count=%eval(&count-1);
   %put The string contains &count words.;
%mend words;

%words(This is a very long string)
```


In [13]:

def words(string):
    """Count the whitespace-separated words in string, print the count and return it."""
    count = len(string.split())
    print("The string contains {} words".format(count))
    return count

words("This is a very long string")



The string contains 6 words


6

In [19]:
def do(func, string):
    func(string)

do(words, "One two three")

The string contains 3 words


In [20]:
do = words
do("hi world")

The string contains 2 words


2
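
The cells above translate a SAS macro into a function. For the `%include` and batch questions from the lists at the top, a rough sketch using only the standard library follows, assuming Python 3. `runpy.run_path` executes a file in the current process and returns the names it defined, which is loosely comparable to `%include`; `subprocess.call` with `sys.executable` runs each program in its own interpreter, one after another. The script content and the file name `load_data.py` are made up for the example. For functions you want to reuse, a plain `import` of the module is the more usual route.

```python
import os
import runpy
import subprocess
import sys
import tempfile

# A throwaway script standing in for an existing program (hypothetical content).
script_body = 'greeting = "hello from the included program"\nprint(greeting)\n'

with tempfile.TemporaryDirectory() as tmp:
    script_path = os.path.join(tmp, "load_data.py")  # hypothetical file name
    with open(script_path, "w") as handle:
        handle.write(script_body)

    # Rough analogue of %include: run the file in this process and get back
    # the names it defined as a dictionary.
    included = runpy.run_path(script_path)
    print(included["greeting"])

    # Batch style: run each program in its own interpreter, in sequence,
    # and collect the exit codes (0 means the program finished normally).
    batch = [script_path, script_path]
    exit_codes = [subprocess.call([sys.executable, path]) for path in batch]
    print(exit_codes)
```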

## Variable creation (all kinds: strings, dummies, combinations of existing vars, binning, etc.)

In [22]:
a = "string"
b = a
a, b
a == b

True

In [30]:
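import random
from mock import Mock  # on Python 3: from unittest.mock import Mock

# Replace randint with a stub that always returns "hi" and records each call.
random.randint = Mock(return_value="hi")
random.randint()
random.randint()
random.randint()
random.randint()
random.randint.call_count  # how many times the stub has been called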



4

In [35]:
c = a + b
c

'stringstring'
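
The cells in this section work on plain Python values. For the data-frame version of the question, here is a small sketch with pandas covering the kinds listed in the heading, plus a lagged variable for longitudinal data. The data, column names, and bin edges are all invented for illustration.

```python
import pandas as pd

# Small made-up panel: two people observed over three years.
panel = pd.DataFrame({
    "person": ["a", "a", "a", "b", "b", "b"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "income": [10.0, 12.0, 15.0, 40.0, 38.0, 45.0],
    "region": ["north", "north", "south", "south", "south", "north"],
})

# String variable built from existing columns.
panel["label"] = panel["person"].str.upper() + "_" + panel["year"].astype(str)

# Dummy (indicator) variable from a condition.
panel["is_south"] = (panel["region"] == "south").astype(int)

# One dummy column per category.
region_dummies = pd.get_dummies(panel["region"], prefix="region")

# Combination of existing variables.
panel["income_if_south"] = panel["income"] * panel["is_south"]

# Binning: intervals are (0, 20], (20, 40], (40, 100].
panel["income_bin"] = pd.cut(
    panel["income"], bins=[0, 20, 40, 100], labels=["low", "mid", "high"]
)

# Longitudinal: previous year's income within each person, then the change.
panel = panel.sort_values(["person", "year"])
panel["income_lag"] = panel.groupby("person")["income"].shift(1)
panel["income_change"] = panel["income"] - panel["income_lag"]

print(panel)
```

The first observation of each person has no previous year, so `income_lag` and `income_change` are missing (`NaN`) there.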

## decorators

http://thecodeship.com/patterns/guide-to-python-function-decorators/

In [36]:
def greet(name):
    return "hello " + name

greet_someone = greet
print(greet_someone("John"))

hello John


In [38]:
def greet(name):
    def get_message():
        return "Hello "

    result = get_message() + name
    return result

print(greet("John"))

Hello John


In [40]:

def greet(name):
    return "Hello " + name

def call_func(func):
    other_name = "John"
    return func(other_name)

print(call_func(greet))


Hello John


In [41]:
def compose_greet_func():
    def get_message():
        return "Hello there!"

    return get_message

greet = compose_greet_func()
print(greet())

Hello there!


In [42]:
def compose_greet_func(name):
    def get_message():
        return "Hello there "+name+"!"

    return get_message

greet = compose_greet_func("John")
print(greet())


Hello there John!


In [48]:

# The numbered prints trace the order in which the lines actually run.
print(1)

def get_text(name):
    print(6)
    return "lorem ipsum, {0} dolor sit amet".format(name)

def p_decorate(func):
    print(3)
    def func_wrapper(name):
        print(5)
        name = "Brian and {}".format(name)
        ret = func(name)
        ret += "!!!!"
        return "<p>{0}</p>".format(ret)
    print(4)
    return func_wrapper

print(2)
my_get_text = p_decorate(get_text)

print(my_get_text("John"))
print(7)

1
2
3
4
5
6
<p>lorem ipsum, Brian and John dolor sit amet!!!!</p>
7


In [50]:
def p_decorate(func):
    def func_wrapper(name):
        return "<p>{0}</p>".format(func(name))
    return func_wrapper

@p_decorate
def get_text(name):
    return "lorem ipsum, {0} dolor sit amet".format(name)

@p_decorate
def get_more_text(string):
    return string

print(get_more_text("Hello world"))

<p>Hello world</p>


In [51]:

def p_decorate(func):
    def func_wrapper(self):
        return "<p>{0}</p>".format(func(self))
    return func_wrapper

class Person(object):
    def __init__(self):
        self.name = "John"
        self.family = "Doe"

    @p_decorate
    def get_fullname(self):
        return self.name + " " + self.family

my_person = Person()
print(my_person.get_fullname())

<p>John Doe</p>
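
The two `p_decorate` versions so far differ only in the wrapper's argument name (`name` for a function, `self` for a method). One common way to avoid writing the decorator twice is to have the wrapper accept `*args, **kwargs` and forward them unchanged. The sketch below reuses the names from the cells above.

```python
def p_decorate(func):
    # *args and **kwargs forward whatever the wrapped callable receives,
    # so the wrapper no longer cares whether the first argument is
    # `name` or `self`.
    def func_wrapper(*args, **kwargs):
        return "<p>{0}</p>".format(func(*args, **kwargs))
    return func_wrapper


@p_decorate
def get_text(name):
    return "lorem ipsum, {0} dolor sit amet".format(name)


class Person(object):
    def __init__(self):
        self.name = "John"
        self.family = "Doe"

    @p_decorate
    def get_fullname(self):
        return self.name + " " + self.family


print(get_text("John"))
print(Person().get_fullname())
```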


In [53]:
def tags(tag_name):
    def tags_decorator(func):
        def func_wrapper(name):
            return "<{0}>{1}</{0}>".format(tag_name, func(name))
        return func_wrapper
    return tags_decorator



@tags("div")
def get_text(name):
    return "Hello "+name


print(get_text("John"))



<div>Hello John</div>
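
Two things the `tags` example leaves out may be worth a sketch: a decorated function normally loses its own `__name__` and docstring to the wrapper, and decorators can be stacked. The version below uses `functools.wraps` from the standard library to copy that metadata over, and applies `tags` twice. The docstring on `get_text` is added here for illustration.

```python
from functools import wraps


def tags(tag_name):
    def tags_decorator(func):
        @wraps(func)  # copy __name__ and __doc__ from func onto the wrapper
        def func_wrapper(*args, **kwargs):
            return "<{0}>{1}</{0}>".format(tag_name, func(*args, **kwargs))
        return func_wrapper
    return tags_decorator


# Decorators apply bottom-up: "p" wraps the function first, then "div".
@tags("div")
@tags("p")
def get_text(name):
    """Return a greeting for name."""
    return "Hello " + name


print(get_text("John"))
print(get_text.__name__)
```

Without `@wraps(func)`, the last line would report `func_wrapper` instead of `get_text`, which makes tracebacks and `help()` output harder to read.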
