**REINDEXING METHODS**

In [21]:
import pandas as pd
import numpy as np
from pandas import Series, DataFrame
from numpy.random import randn, rand

In [2]:
sample = Series([1,2,3,4], index = ['e','f','g','h'])
sample

e    1
f    2
g    3
h    4
dtype: int64

In [3]:
#creating a new index using reindex
sample.reindex(['e','a','b','d','n','o']) #reindex keeps the value of any label that already exists ('e')
#and assigns NaN to every label that is new ('a','b','d','n','o')


e    1.0
a    NaN
b    NaN
d    NaN
n    NaN
o    NaN
dtype: float64

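Because a, b, d, n and o are not present in the original sample, they are assigned NaN. The labels f, g and h disappear because they are not in the new index, and the dtype changes from int64 to float64 because NaN is a floating-point value.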
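
To check which labels came back empty after a reindex, something like the following sketch can help. The names `reindexed` and `missing` are only illustrative and do not appear in the notebook above.

```python
import pandas as pd

sample = pd.Series([1, 2, 3, 4], index=['e', 'f', 'g', 'h'])

# Labels absent from the original index come back as NaN;
# labels left out of the new index ('f', 'g', 'h') are dropped.
reindexed = sample.reindex(['e', 'a', 'b', 'd', 'n', 'o'])

# isna() flags the labels that had no matching value
missing = reindexed[reindexed.isna()].index.tolist()
print(missing)          # labels with no match in the original index
print(reindexed.dtype)  # float64, because an int64 Series cannot hold NaN
```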

In [8]:
sample.reindex(['e','f','g','h','i'])

e    1.0
f    2.0
g    3.0
h    4.0
i    NaN
dtype: float64

In [9]:
#See the difference? Every original label was kept, so only the new label 'i' is NaN. Now to fill the new labels with a value instead:
sample.reindex(['e','f','g','h','i','j','k'], fill_value = 10)

e     1
f     2
g     3
h     4
i    10
j    10
k    10
dtype: int64

Notice how the new labels i, j and k were filled with 10 instead of NaN. That is what the fill_value argument does. The dtype also stays int64, because no NaN had to be introduced.
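
A small side-by-side sketch of the same reindex with and without fill_value. The names `new_index`, `without_fill` and `with_fill` are made up for this illustration.

```python
import pandas as pd

sample = pd.Series([1, 2, 3, 4], index=['e', 'f', 'g', 'h'])
new_index = ['e', 'f', 'g', 'h', 'i', 'j', 'k']

without_fill = sample.reindex(new_index)              # NaN for 'i', 'j', 'k'
with_fill = sample.reindex(new_index, fill_value=10)  # 10 for 'i', 'j', 'k'

print(without_fill.dtype)  # upcast to float to make room for NaN
print(with_fill.dtype)     # integer dtype is preserved

# fill_value only applies to labels introduced by the reindex;
# values that already existed are left as they were.
print(with_fill['e'], with_fill['k'])
```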

With forward fill (method = 'ffill'), each new label takes the value of the nearest preceding label instead of NaN. The illustration is as follows:

In [19]:
cars = Series(['Audi', 'Mercedes', 'Toyota'], index = [0,5,9])
cars.reindex(list(range(13)), method = 'ffill')

0         Audi
1         Audi
2         Audi
3         Audi
4         Audi
5     Mercedes
6     Mercedes
7     Mercedes
8     Mercedes
9       Toyota
10      Toyota
11      Toyota
12      Toyota
dtype: object

In [22]:
#each new label took the value of the closest earlier label in the original index: 'Audi' fills 1-4, 'Mercedes' fills 6-8 and 'Toyota' fills 10-12
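
For comparison, here is a sketch that puts forward fill next to backward fill ('bfill'), which the notebook does not use but which reindex also accepts. Both methods expect the original index to be sorted (monotonic), as it is here. The names `forward` and `backward` are illustrative.

```python
import pandas as pd

cars = pd.Series(['Audi', 'Mercedes', 'Toyota'], index=[0, 5, 9])
new_index = list(range(13))

forward = cars.reindex(new_index, method='ffill')   # carry the last seen value forward
backward = cars.reindex(new_index, method='bfill')  # pull the next value backward

print(forward[3], backward[3])  # label 3 sits between 0 ('Audi') and 5 ('Mercedes')
print(backward[12])             # nothing comes after label 9, so this stays NaN
```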


Using NumPy's randn and rand: randn draws random samples from the standard normal distribution, while rand draws random samples from the uniform distribution over [0, 1).
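
A quick sketch to see the difference between the two generators. The seed value and the sample size of 1000 are arbitrary choices for this example.

```python
import numpy as np
from numpy.random import rand, randn

np.random.seed(0)  # arbitrary seed so the run is repeatable

uniform = rand(1000)  # uniform samples on [0, 1)
normal = randn(1000)  # standard normal samples: mean 0, standard deviation 1

print(uniform.min() >= 0, uniform.max() < 1)  # uniform samples stay inside [0, 1)
print(normal.min() < 0)                       # normal samples can be negative
```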

In [26]:
#for example using rand
df = DataFrame(rand(5,5), index = ['a', 'b', 'c', 'd','e'], columns = list('drunk'))
df

          d         r         u         n         k
a  0.913747  0.011323  0.773519  0.693440  0.564718
b  0.797631  0.467265  0.441079  0.305198  0.620102
c  0.612508  0.336756  0.364927  0.543276  0.196298
d  0.370039  0.930850  0.990737  0.027525  0.120823
e  0.600214  0.238313  0.758946  0.072403  0.954785


In [27]:
#for example using randn
df_1 = DataFrame(randn(3,2), index = ['phill', 'angela', 'michael'], columns = list('ab'))
df_1

                a         b
phill    1.375402  0.380087
angela   0.023274  0.377312
michael  0.234272  0.385452


In [31]:
#to reindex rows and fill them with 23
df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g'], fill_value = 23)

           d          r          u          n          k
a   0.913747   0.011323   0.773519   0.693440   0.564718
b   0.797631   0.467265   0.441079   0.305198   0.620102
c   0.612508   0.336756   0.364927   0.543276   0.196298
d   0.370039   0.930850   0.990737   0.027525   0.120823
e   0.600214   0.238313   0.758946   0.072403   0.954785
f  23.000000  23.000000  23.000000  23.000000  23.000000
g  23.000000  23.000000  23.000000  23.000000  23.000000


In [32]:
#to reindex the columns and fill the new column values with 100
df_1.reindex(columns = ['a', 'b', 'c', 'd'], fill_value = 100)

                a         b    c    d
phill    1.375402  0.380087  100  100
angela   0.023274  0.377312  100  100
michael  0.234272  0.385452  100  100
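
Rows and columns can also be reindexed in a single call by passing both `index` and `columns`. The sketch below assumes a frame shaped like df_1; the labels 'dwight' and 'c' and the fill value 0 are invented for the illustration.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame(np.random.randn(3, 2),
                    index=['phill', 'angela', 'michael'],
                    columns=list('ab'))

# 'index' targets the rows and 'columns' targets the columns;
# every cell that did not exist before receives fill_value
both = df_1.reindex(index=['phill', 'angela', 'michael', 'dwight'],
                    columns=['a', 'b', 'c'],
                    fill_value=0)
print(both)
```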


In [38]:
#You can select rows and columns by label with loc, or by integer position with iloc
df.loc[['a','b'], ['d', 'k']]

          d         k
a  0.913747  0.564718
b  0.797631  0.620102


In [39]:
df.iloc[[0, 1, 3], :2]

          d         r
a  0.913747  0.011323
b  0.797631  0.467265
d  0.370039  0.930850
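
One detail worth seeing in code is how slices behave: a loc slice includes both endpoints, while an iloc slice excludes the stop position, like regular Python slicing. A minimal sketch, with `by_label` and `by_position` as illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(5, 5),
                  index=['a', 'b', 'c', 'd', 'e'],
                  columns=list('drunk'))

by_label = df.loc['a':'c', 'd':'u']  # label slices include both endpoints
by_position = df.iloc[0:3, 0:3]      # position slices exclude the stop value

# Both selections cover rows a-c and columns d, r, u
print(by_label.equals(by_position))
```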


In [41]:
#quick tip: dropping entries in pandas data frames and series
dl = DataFrame(rand(3,2), index = ['a','b','c'], columns = list('ab'))
dl

          a         b
a  0.933605  0.327167
b  0.789118  0.267957
c  0.023228  0.883843


In [54]:
#to drop a row, use df.drop(label); it returns a new object and leaves the original unchanged
dl.drop('b')

          a         b
a  0.933605  0.327167
c  0.023228  0.883843


In [46]:
#for pandas series
a = Series([1,2,3])
a.drop(2)

0    1
1    2
dtype: int64
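
Since dl has a row and a column that are both named 'b', it makes a handy sketch of how drop chooses between them. The variable names below are illustrative.

```python
import numpy as np
import pandas as pd

dl = pd.DataFrame(np.random.rand(3, 2), index=['a', 'b', 'c'], columns=list('ab'))

no_row_b = dl.drop('b')              # axis=0 is the default, so the row 'b' goes
no_col_b = dl.drop('b', axis=1)      # axis=1 drops the column 'b' instead
same_thing = dl.drop(columns=['b'])  # keyword form of the column drop

print(dl.shape)  # drop returned new frames, so dl itself is unchanged
```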

In [50]:
#Another way of creating a dataframe
DataFrame(np.arange(9).reshape(3,3))

   0  1  2
0  0  1  2
1  3  4  5
2  6  7  8
