### Stacking dataframes on top of one another (similar to vstack):

In [6]:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(np.random.rand(2, 4))
df = pd.concat([df1, df1, df1], ignore_index=True)
df

Unnamed: 0,0,1,2,3
0,0.52744,0.641283,0.50615,0.489345
1,0.977062,0.589091,0.752786,0.106942
2,0.52744,0.641283,0.50615,0.489345
3,0.977062,0.589091,0.752786,0.106942
4,0.52744,0.641283,0.50615,0.489345
5,0.977062,0.589091,0.752786,0.106942


### Stacking dataframes horizontally (similar to hstack):

In [9]:
pd.concat([df1, df1, df1], ignore_index=True, axis = 1)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11
0,0.52744,0.641283,0.50615,0.489345,0.52744,0.641283,0.50615,0.489345,0.52744,0.641283,0.50615,0.489345
1,0.977062,0.589091,0.752786,0.106942,0.977062,0.589091,0.752786,0.106942,0.977062,0.589091,0.752786,0.106942


### Applying a function to all columns (or rows) of a df:

In [13]:
print df.apply(lambda x: x.sum(), axis = 0)
print df.apply(lambda x: x.sum(), axis = 1)

0    4.513507
1    3.691123
2    3.776808
3    1.788861
dtype: float64
0    2.164219
1    2.425881
2    2.164219
3    2.425881
4    2.164219
5    2.425881
dtype: float64


### Dropping all NaN values:

In [24]:
df.iloc[0, 3] = None
df.iloc[2, 1] = None
#df.dropna()
print df
df.dropna()

          0         1         2         3
0  0.527440  0.641283  0.506150       NaN
1  0.977062  0.589091  0.752786  0.106942
2  0.527440       NaN  0.506150  0.489345
3  0.977062  0.589091  0.752786  0.106942
4  0.527440  0.641283  0.506150  0.489345
5  0.977062  0.589091  0.752786  0.106942


Unnamed: 0,0,1,2,3
1,0.977062,0.589091,0.752786,0.106942
3,0.977062,0.589091,0.752786,0.106942
4,0.52744,0.641283,0.50615,0.489345
5,0.977062,0.589091,0.752786,0.106942


### loc and iloc in pandas:
When we select some rows from a df (e.g., with a boolean filter, similar to a where clause), the selected rows keep their original index labels in the new df. So for the new df, df.index is not necessarily 0, 1, etc. (the labels don't even have to be numbers). To select by these labels, use df.loc. To ignore the labels and select by integer position starting from zero, use df.iloc.  

In [3]:
f = pd.read_csv('features.csv')

In [10]:
f_holiday = f[f.IsHoliday == True]
f_holiday.head(10)

Unnamed: 0,Store,Date,Temperature,Fuel_Price,MarkDown1,MarkDown2,MarkDown3,MarkDown4,MarkDown5,CPI,Unemployment,IsHoliday
1,1,2010-02-12,38.51,2.548,,,,,,211.24217,8.106,True
31,1,2010-09-10,78.69,2.565,,,,,,211.49519,7.787,True
42,1,2010-11-26,64.52,2.735,,,,,,211.748433,7.838,True
47,1,2010-12-31,48.43,2.943,,,,,,211.404932,7.838,True
53,1,2011-02-11,36.39,3.022,,,,,,212.936705,7.742,True
83,1,2011-09-09,76.0,3.546,,,,,,215.861056,7.962,True
94,1,2011-11-25,60.14,3.236,410.31,98.0,55805.51,8.0,554.92,218.467621,7.866,True
99,1,2011-12-30,44.55,3.129,5762.1,46011.38,260.36,983.65,4735.78,219.53599,7.866,True
105,1,2012-02-10,48.02,3.409,13925.06,6927.23,101.64,8471.88,6886.04,220.265178,7.348,True
135,1,2012-09-07,83.96,3.73,5204.68,35.74,50.94,4120.32,2737.17,222.439015,6.908,True


In [8]:
f_holiday.iloc[0, :]

Store                    1
Date            2010-02-12
Temperature          38.51
Fuel_Price           2.548
MarkDown1              NaN
MarkDown2              NaN
MarkDown3              NaN
MarkDown4              NaN
MarkDown5              NaN
CPI                211.242
Unemployment         8.106
IsHoliday             True
Name: 1, dtype: object

In [9]:
f_holiday.loc[0, :]

KeyError: 'the label [0] is not in the [index]'
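
The same behavior can be reproduced without features.csv. The following is a minimal sketch on a small made-up DataFrame (the column values and the names toy and toy_holiday are invented for illustration); it uses print() so that it also runs on Python 3.

```python
import pandas as pd

# Hypothetical stand-in for features.csv: 5 rows with the default index 0..4
toy = pd.DataFrame({'Store': [1, 1, 1, 2, 2],
                    'IsHoliday': [False, True, False, True, False]})

# Boolean filtering keeps the original index labels (here 1 and 3)
toy_holiday = toy[toy.IsHoliday]
print(list(toy_holiday.index))

# iloc is positional: position 0 is the first remaining row (label 1)
first_by_position = toy_holiday.iloc[0]

# loc is label based: label 1 exists, label 0 was filtered out
first_by_label = toy_holiday.loc[1]
try:
    toy_holiday.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False
print(label_zero_found)

# reset_index(drop=True) renumbers the rows from zero if the old labels are not needed
renumbered = toy_holiday.reset_index(drop=True)
print(list(renumbered.index))
```

If the original labels carry no meaning, renumbering with reset_index(drop=True) makes loc and iloc agree again.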

### Empty array of type object:

In [18]:
np.empty((2, 3, 4), dtype = object)

array([[[None, None, None, None],
        [None, None, None, None],
        [None, None, None, None]],

       [[None, None, None, None],
        [None, None, None, None],
        [None, None, None, None]]], dtype=object)

## Regular Expressions

The re module provides two types of interfaces:
* re.compile(pat), which compiles a pattern into a *re* pattern object. Methods like search and match can then be called on that object
* Module-level functions such as re.search(pat, str) and re.match(pat, str), which first compile pat to obtain a pattern object and then call the corresponding method on it
    

In [13]:
import re
r = re.compile(r'ab*')
m = r.search('abbbbccabb')
print m.group(0)

abbbb


In [56]:
m = re.search(r'ab*', 'abbbbccabb')
print m.group()

abbbb


There is always a default group, which is the whole match (no parentheses needed). This group is given by m.group() or m.group(0).

If there are other (parenthesized) groups in the regular expression, they are given by m.group(1), m.group(2), etc.,
and m.groups() returns them as a tuple of the form (m.group(1), m.group(2), etc).

The argument of m.groups(default) is not a group index: it is the value reported for groups that did not take part in the match (None if omitted). So m.groups(0) returns the same tuple as m.groups(), with 0 in place of None for such groups.

In general, since that default argument is rarely needed, m.group() and m.groups() give most of the necessary info.

In [16]:
m = re.search(r'a(bc)de', 'fghabcdehi')
print m.group()
print m.group(0)
print m.group(1)
print m.groups()
print m.groups(0)


abcde
abcde
bc
('bc',)
('bc',)
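
In the cell above, groups() and groups(0) print the same tuple because the only group took part in the match. A small sketch with an optional group (the pattern and input are made up for illustration) shows what the argument of groups() actually does:

```python
import re

# Group 1 is optional and does not take part in the match for 'ac'
m = re.search(r'a(b)?(c)', 'ac')

whole = m.group()            # the whole match
default_none = m.groups()    # groups that did not match are reported as None
default_zero = m.groups(0)   # ... or as the given default value

print(whole)
print(default_none)
print(default_zero)
```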


In [44]:
m = re.search(r'(\d\.)+$', 'ffasdfd2.2.3.4.fdagfds2.3.5.')
m.groups()

('5.',)
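
A repeated group such as (\d\.)+ only keeps its last repetition, which is why groups() returns ('5.',) above. If every repetition is needed, one possible approach (a sketch, not the only way) is to run re.findall over the matched text:

```python
import re

text = 'ffasdfd2.2.3.4.fdagfds2.3.5.'
m = re.search(r'(\d\.)+$', text)

matched = m.group()                        # the whole run at the end of the string
last_only = m.groups()                     # the group holds only its last repetition
all_parts = re.findall(r'\d\.', matched)   # every repetition inside the matched run

print(matched)
print(last_only)
print(all_parts)
```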

In [61]:
def grep(pat, l):
     import re
     r = re.compile(pat)
     return filter(lambda i : r.search(i) is not None, l)

In [59]:
import sys
grep(r'(\d\.){2}', sys.path)  # raw string, so the backslashes reach the regex engine unchanged

['/usr/local/lib/python2.7/dist-packages/pip-6.0.6-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/tweepy-3.1.0-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/six-1.7.3-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/requests_oauthlib-0.4.1-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/requests-2.4.3-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/oauthlib-0.7.2-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/Lasagne-0.1.dev0-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/Theano-0.7.0-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/nolearn-0.6a0.dev0-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/tabulate-0.7.5-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/Django-1.9.dev20150715192252-py2.7.egg',
 '/usr/local/lib/python2.7/dist-packages/statsmodels-0.8.0rc1-py2.7-linux-x86_64.egg',
 '/usr/local/lib/python2.7/dist-packages/patsy-0.4.1-py2.7.egg']

**Sequence unpacking:**
The statement "for i in x" works as follows:

* Get an iterator from x (it = iter(x)). This is the GET_ITER opcode.
* Call next on it repeatedly until it raises StopIteration (i = next(it)). This is the FOR_ITER opcode.

If there is sequence unpacking, as in "for i, j in x", we treat next(it) as a sequence and unpack it
into 2 objects, since i, j is equivalent to (i, j). So s = next(it), unpack(s)->i, j.

Another example: "for i, (j, k, t) in x"
Here we treat next(it) as a sequence of 2 objects and unpack them. Then we unpack the second object into 3 objects: j, k, and t.
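
The steps above can be written out by hand. The following sketch (with made-up data) spells out what "for i, (j, k, t) in x" does using iter, next, and explicit unpacking, and compares the result with the ordinary for loop:

```python
x = [(1, (2, 3, 4)), (5, (6, 7, 8))]

# Hand-written equivalent of: for i, (j, k, t) in x
manual = []
it = iter(x)                  # GET_ITER
while True:
    try:
        s = next(it)          # FOR_ITER
    except StopIteration:     # iterator exhausted: leave the loop
        break
    i, rest = s               # unpack into 2 objects
    j, k, t = rest            # unpack the second object into 3 objects
    manual.append((i, j, k, t))

# The ordinary loop, for comparison
looped = []
for i, (j, k, t) in x:
    looped.append((i, j, k, t))

print(manual == looped)
```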

In [6]:
def f1():
    for i, (k, t, r), (j), (u, t) in x:
        pass
import dis
print dis.dis(f1)  # dis.dis prints the listing itself and returns None, hence the trailing "None"

  2           0 SETUP_LOOP              41 (to 44)
              3 LOAD_GLOBAL              0 (x)
              6 GET_ITER            
        >>    7 FOR_ITER                33 (to 43)
             10 UNPACK_SEQUENCE          4
             13 STORE_FAST               0 (i)
             16 UNPACK_SEQUENCE          3
             19 STORE_FAST               1 (k)
             22 STORE_FAST               2 (t)
             25 STORE_FAST               3 (r)
             28 STORE_FAST               4 (j)
             31 UNPACK_SEQUENCE          2
             34 STORE_FAST               5 (u)
             37 STORE_FAST               2 (t)

  3          40 JUMP_ABSOLUTE            7
        >>   43 POP_BLOCK           
        >>   44 LOAD_CONST               0 (None)
             47 RETURN_VALUE        
None


## Numpy:

In [3]:
x = np.random.rand(4, 5)
x

array([[ 0.13166059,  0.37771059,  0.20451729,  0.40347615,  0.48344973],
       [ 0.89125383,  0.05904047,  0.73347805,  0.00196137,  0.80282951],
       [ 0.19040538,  0.27233316,  0.94902031,  0.60226128,  0.43548302],
       [ 0.12637652,  0.97853763,  0.48097779,  0.58749681,  0.32132346]])

In [5]:
# gives a 1d array
print x[:, 3]
print x[2, :]

[ 0.40347615  0.00196137  0.60226128  0.58749681]
[ 0.19040538  0.27233316  0.94902031  0.60226128  0.43548302]


In [17]:
# returns a 2d array
print x[2:3, :]
print x[2:3, 3:4]
# empty array
print x[2:3, 3:3]

[[ 0.19040538  0.27233316  0.94902031  0.60226128  0.43548302]]
[[ 0.60226128]]
[]
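
The difference between the two cells above is easiest to see in the shapes. A small sketch on an array of the same (4, 5) shape, filled with integers instead of random numbers purely for illustration:

```python
import numpy as np

a = np.arange(20).reshape(4, 5)   # stand-in for x with the same (4, 5) shape

col = a[:, 3]          # an integer index drops that axis -> 1d
row_2d = a[2:3, :]     # a slice keeps the axis -> 2d with one row
cell_2d = a[2:3, 3:4]  # two slices -> 2d with a single element
empty = a[2:3, 3:3]    # an empty slice -> 2d with zero columns

for name, arr in [('col', col), ('row_2d', row_2d),
                  ('cell_2d', cell_2d), ('empty', empty)]:
    print('%s %s' % (name, arr.shape))
```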


In [61]:
print x
print 
print x[0:3:2]
print
# index can be an iterable
print x[[1, 3]]
print 
print x[range(1,3)]
# x[(1, 2)] is the same as x[1, 2]
print x[[(1, 2), (2, 3)]]
print
# boolean types are converted to int before using
print x[[False, True, True, False]]
print 
# an np.array of dtype=bool is treated as a mask: np.where is applied to it first (this NumPy version accepts a mask shorter than the axis)
print x[np.array([False, True, True])]
print x[((1, 2), (2, 3))]


[[ 0.13166059  0.37771059  0.20451729  0.40347615  0.48344973]
 [ 0.89125383  0.05904047  0.73347805  0.00196137  0.80282951]
 [ 0.19040538  0.27233316  0.94902031  0.60226128  0.43548302]
 [ 0.12637652  0.97853763  0.48097779  0.58749681  0.32132346]]

[[ 0.13166059  0.37771059  0.20451729  0.40347615  0.48344973]
 [ 0.19040538  0.27233316  0.94902031  0.60226128  0.43548302]]

[[ 0.89125383  0.05904047  0.73347805  0.00196137  0.80282951]
 [ 0.12637652  0.97853763  0.48097779  0.58749681  0.32132346]]

[[ 0.89125383  0.05904047  0.73347805  0.00196137  0.80282951]
 [ 0.19040538  0.27233316  0.94902031  0.60226128  0.43548302]]
[ 0.73347805  0.60226128]

[[ 0.13166059  0.37771059  0.20451729  0.40347615  0.48344973]
 [ 0.89125383  0.05904047  0.73347805  0.00196137  0.80282951]
 [ 0.89125383  0.05904047  0.73347805  0.00196137  0.80282951]
 [ 0.13166059  0.37771059  0.20451729  0.40347615  0.48344973]]

[[ 0.89125383  0.05904047  0.73347805  0.00196137  0.80282951]
 [ 0.19040538  0.27

For the compiler, x[a, b, c] and x[(a, b, c)] are equivalent: both are regarded as x[(a, b, c)] and translated to a *tuple* given to a BINARY_SUBSCR opcode.

x[((a, b), (c, d))], for example, is translated by building the tuple (a, b), followed by (c, d), and finally building the tuple ((a, b), (c, d)) and passing it to the subscription operator.

x[[(a, b), (c, d)]] is equivalent to l = [(a, b), (c, d)]; x[l]

This means that when the compiler encounters x[expr], expr is either a single expression or an expression list. A single expression is passed to the operator as is; an expression list is converted into a tuple before it is passed to the operator.

Therefore, for numpy or any other library, x[a, b] is the same as x[(a, b)], and x[(a, b), d] is the same as x[((a, b), d)]. How the key that gets passed in is interpreted, however, is up to the type and how its `__getitem__` is implemented.
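
This can be checked without numpy. The sketch below uses a hypothetical helper class, KeyProbe (not part of any library), whose `__getitem__` simply returns the key it receives, so we can see exactly what Python passes in:

```python
class KeyProbe(object):
    """Hypothetical helper: returns the subscript key exactly as Python passes it."""
    def __getitem__(self, key):
        return key

probe = KeyProbe()

print(probe[1, 2])               # a comma-separated list arrives as one tuple
print(probe[(1, 2)])             # an explicit tuple gives the same key
print(probe[(1, 2), 3])          # nested: a tuple containing a tuple
print(probe[[(1, 2), (2, 3)]])   # a list stays a list
print(probe[1:3, ::2])           # slices arrive as slice objects inside the tuple
```

What numpy then does with a tuple, a list, or a slice object is decided entirely inside its own `__getitem__`.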

## ssh

Popen from subprocess can be used to capture input and output from an ssh login (http://python-for-system-administrators.readthedocs.io/en/latest/ssh.html)

In [None]:
import subprocess
import sys

HOST="www.example.org"
# Ports are handled in ~/.ssh/config since we use OpenSSH
COMMAND="uname -a"

ssh = subprocess.Popen(["ssh", "%s" % HOST, COMMAND],
                       shell=False,
                       stdout=subprocess.PIPE,
                       stderr=subprocess.PIPE)
result = ssh.stdout.readlines()
if result == []:
    error = ssh.stderr.readlines()
    print >>sys.stderr, "ERROR: %s" % error
else:
    print result

In the above solution, fcntl can be used to make the stdout file descriptor non-blocking. But the password prompt is still printed even though stdout is redirected to a pipe, most likely because ssh writes the prompt to the controlling terminal (/dev/tty) rather than to stdout.
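
As a rough sketch of the fcntl part (Unix only), the following sets O_NONBLOCK on the read end of a plain pipe. The pipe created with os.pipe() is only a stand-in for illustration; with Popen, the descriptor would presumably be ssh.stdout.fileno().

```python
import errno
import fcntl
import os

# A plain pipe stands in for the stdout pipe of the child process
read_fd, write_fd = os.pipe()

# Add O_NONBLOCK to the flags already set on the read end
flags = fcntl.fcntl(read_fd, fcntl.F_GETFL)
fcntl.fcntl(read_fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

# Reading an empty non-blocking pipe fails right away instead of waiting
try:
    os.read(read_fd, 1024)
    would_block = False
except OSError as e:
    would_block = e.errno in (errno.EAGAIN, errno.EWOULDBLOCK)

# Once data is available, the read returns it as usual
os.write(write_fd, b'hello\n')
data = os.read(read_fd, 1024)

os.close(read_fd)
os.close(write_fd)
print(would_block)
```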

A better way to communicate with interactive processes is through pexpect (http://stackoverflow.com/questions/15166973/sending-a-password-over-ssh-or-scp-with-subprocess-popen). An even simpler solution, built specifically for ssh, is pexpect's pxssh module:

In [None]:
from pexpect import pxssh  # pxssh lives inside the pexpect package; older releases also allowed "import pxssh"
import getpass
try:
    s = pxssh.pxssh()
    hostname = raw_input('hostname: ')
    username = raw_input('username: ')
    password = getpass.getpass('password: ')
    s.login(hostname, username, password)
    s.sendline('uptime')   # run a command
    s.prompt()             # match the prompt
    print(s.before)        # print everything before the prompt. 
    s.logout()
except pxssh.ExceptionPxssh as e:
    print("pxssh failed on login.")
    print(e)