# Permutation and Random Sampling

Permuting (randomly reordering) a Series or the rows in a DataFrame is easy to do using the numpy.random.permutation function. Calling permutation with the length of the axis you want to permute produces an array of integers indicating the new ordering:

In [1]:
import pandas as pd
import numpy as np
from pandas import DataFrame, Series

In [6]:
df = DataFrame(np.arange(5 * 4).reshape(5, 4))

df

    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19


In [133]:
sampler = np.random.permutation(5)

In [134]:
sampler

array([3, 4, 0, 1, 2])

That array can then be used in iloc-based indexing or the take function:

In [135]:
df

    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19


In [136]:
df.take(sampler)

    0   1   2   3
3  12  13  14  15
4  16  17  18  19
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
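
Positional indexing gives the same reordering as take. The sketch below is a minimal illustration, assuming a recent pandas (where iloc is the positional indexer; the older ix indexer was removed in pandas 1.0) and NumPy's Generator API. The seed value and the names rng, by_iloc, and by_take are illustrative choices, not part of the original session:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(5 * 4).reshape(5, 4))

# Seeded generator so the ordering is repeatable (the seed is arbitrary)
rng = np.random.default_rng(0)
sampler = rng.permutation(len(df))

# Both calls select rows by integer position, in the order given by sampler
by_iloc = df.iloc[sampler]
by_take = df.take(sampler)

# The two results should contain the same rows in the same order
print(by_iloc.equals(by_take))
```

Either form works here; take is a method call that is convenient to chain, while iloc also supports slices and column selection.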


To select a random subset without replacement, one way is to slice off the first k elements of the array returned by permutation, where k is the desired subset size. There are much more efficient sampling-without-replacement algorithms, but this is an easy strategy that uses readily available tools:

In [141]:
df.take(np.random.permutation(len(df))[:3])

    0   1   2   3
4  16  17  18  19
1   4   5   6   7
2   8   9  10  11
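
As an alternative to slicing a full permutation, recent pandas versions offer DataFrame.sample, and NumPy's Generator.choice can draw distinct positions directly. The following is a sketch under those assumptions; the seed and the names subset, positions, and subset_np are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(5 * 4).reshape(5, 4))

# DataFrame.sample draws rows without replacement by default;
# random_state (arbitrary here) makes the draw repeatable
subset = df.sample(n=3, random_state=0)

# NumPy-only alternative: pick 3 distinct row positions, then take them
rng = np.random.default_rng(0)
positions = rng.choice(len(df), size=3, replace=False)
subset_np = df.take(positions)

print(subset)
print(subset_np)
```

Both approaches return three distinct rows; which rows appear depends on the seed.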


To generate a sample with replacement, the fastest way is to use np.random.randint to draw random integers:

In [142]:
bag = np.array([5, 7, -1, 6, 4])

In [148]:
sampler = np.random.randint(0, len(bag), size=10)

In [149]:
sampler

array([3, 4, 1, 0, 4, 4, 0, 0, 2, 2])

In [150]:
draws = bag.take(sampler)

In [151]:
draws

array([ 6,  4,  7,  5,  4,  4,  5,  5, -1, -1])
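
The same with-replacement draw can be written with NumPy's newer Generator API. This is a minimal sketch, assuming NumPy 1.17 or later; the seed and the names rng and draws_direct are illustrative and not part of the original session:

```python
import numpy as np

bag = np.array([5, 7, -1, 6, 4])
rng = np.random.default_rng(0)  # seed is arbitrary

# integers() excludes the upper bound, so indices fall in 0..len(bag)-1
sampler = rng.integers(0, len(bag), size=10)
draws = bag.take(sampler)

# choice() with replace=True samples the values directly in one call
draws_direct = rng.choice(bag, size=10, replace=True)

print(draws)
print(draws_direct)
```

Starting the index range at 0 matters: a lower bound of -1 would also produce the index -1, which refers to the last element and would make that element more likely to be drawn than the others.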

In [153]:
a = [0,3,1,2,4]

b = DataFrame(np.arange(20).reshape(5,4))

a,b

([0, 3, 1, 2, 4],
     0   1   2   3
 0   0   1   2   3
 1   4   5   6   7
 2   8   9  10  11
 3  12  13  14  15
 4  16  17  18  19)