In [0]:
import pandas as pd
import numpy as np

## 3. Indexing and Selection (Continued)

To select rows (and optionally columns), use loc for label-based selection and iloc for integer-position-based selection. 

In [0]:
data=pd.DataFrame(np.arange(16).reshape(4,4),index=['Ohio','Colorado','Utah','New York'],
                 columns=['one','two','three','four'])
data

Unnamed: 0,one,two,three,four
Ohio,0,1,2,3
Colorado,4,5,6,7
Utah,8,9,10,11
New York,12,13,14,15


In [0]:
data.loc['Colorado',['two','three']]

two      5
three    6
Name: Colorado, dtype: int64

In [0]:
#similar selection with integer positions
data.iloc[1,[1,2]]

two      5
three    6
Name: Colorado, dtype: int64

In [0]:
data.iloc[2] #select the entire row at integer position 2

one       8
two       9
three    10
four     11
Name: Utah, dtype: int64

In [0]:
data.iloc[:,:3][data.three>5]

Unnamed: 0,one,two,three
Colorado,4,5,6
Utah,8,9,10
New York,12,13,14


In [0]:
data.at['Ohio','one']#select a single scalar value by row and column label. 

0
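
One detail the cells above do not show is how slices behave. As a small sketch (it rebuilds the same `data` frame so it runs on its own; the names `by_label` and `by_position` are only illustrative), a label slice with `loc` includes the end label, while a position slice with `iloc` excludes the end position, like ordinary Python slicing:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape(4, 4),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# loc slices by label and INCLUDES the end label 'Utah'
by_label = data.loc['Ohio':'Utah', 'two']

# iloc slices by position and EXCLUDES the end position 2
by_position = data.iloc[0:2, 1]

print(by_label.tolist())     # [1, 5, 9]
print(by_position.tolist())  # [1, 5]
```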

## 4. Arithmetic and Data Alignment

You can apply arithmetic to objects of the same type. We will try Series first and then DataFrame. 

When the labels do not match, the internal data alignment introduces missing values (NaN) at the labels that do not overlap. These missing values then propagate through later arithmetic computations. 

In [0]:
s1=pd.Series([1.2,3.4],index=['a','c'])
s1

a    1.2
c    3.4
dtype: float64

In [0]:
s2=pd.Series([-1.2,2.,-3.4,5.0],index=['a','b','c','d'])
s2

a   -1.2
b    2.0
c   -3.4
d    5.0
dtype: float64

In [0]:
s1+s2

a    0.0
b    NaN
c    0.0
d    NaN
dtype: float64
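
If NaN for non-overlapping labels is not what you want, one option is the `add` method with `fill_value`. The sketch below reuses `s1` and `s2` from above; the name `filled` is only illustrative. A label missing from one operand is treated as 0 before the addition:

```python
import pandas as pd

s1 = pd.Series([1.2, 3.4], index=['a', 'c'])
s2 = pd.Series([-1.2, 2., -3.4, 5.0], index=['a', 'b', 'c', 'd'])

# Labels 'b' and 'd' are missing from s1, so s1 contributes 0 there
filled = s1.add(s2, fill_value=0)
print(filled.tolist())  # [0.0, 2.0, 0.0, 5.0]
```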

In [0]:
df1=pd.DataFrame(np.arange(6.).reshape(2,3),columns=list('bcd'),index=list('AB'))
df1

Unnamed: 0,b,c,d
A,0.0,1.0,2.0
B,3.0,4.0,5.0


In [0]:
df2=pd.DataFrame(np.arange(4.).reshape(2,2),
                 columns=list('ab'),index=list('AC'))
df2

Unnamed: 0,a,b
A,0.0,1.0
C,2.0,3.0


In [0]:
df1+df2

Unnamed: 0,a,b,c,d
A,,1.0,,
B,,,,
C,,,,
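
The same `fill_value` idea can be sketched for DataFrames. Here `df1` and `df2` are rebuilt so the cell is self-contained, and `filled` is an illustrative name. Note that a cell stays NaN when it is missing from both operands, for example row B, column a:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6.).reshape(2, 3),
                   columns=list('bcd'), index=list('AB'))
df2 = pd.DataFrame(np.arange(4.).reshape(2, 2),
                   columns=list('ab'), index=list('AC'))

# A cell present in only one frame is combined with 0;
# a cell present in neither frame remains NaN
filled = df1.add(df2, fill_value=0)
print(filled)
```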


**Arithmetic between DataFrame and Series**
matches the index of the Series on the DataFrame's **columns**, broadcasting down the rows. 

In [0]:
df1

Unnamed: 0,b,c,d
A,0.0,1.0,2.0
B,3.0,4.0,5.0


In [0]:
#Do you remember how to select row 0 from a DataFrame?
S=df1.iloc[0]
S
#note: the index of this Series holds the column labels of the DataFrame

b    0.0
c    1.0
d    2.0
Name: A, dtype: float64

In [0]:
df1-S

Unnamed: 0,b,c,d
A,0.0,0.0,0.0
B,3.0,3.0,3.0


In [0]:
df1+S

Unnamed: 0,b,c,d
A,0.0,2.0,4.0
B,3.0,5.0,7.0


In [0]:
#What if we define a standalone Series whose index does not fully match the columns?
S2=pd.Series(range(4),index=list('bcde'))
S2

b    0
c    1
d    2
e    3
dtype: int64

In [0]:
df1+S2

Unnamed: 0,b,c,d,e
A,0.0,2.0,4.0,
B,3.0,5.0,7.0,
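
To broadcast in the other direction, that is, to match the Series on the row labels and broadcast across the columns, one approach is to use an arithmetic method such as `sub` with `axis='index'`. A minimal sketch, with `col` and `result` as illustrative names:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6.).reshape(2, 3),
                   columns=list('bcd'), index=list('AB'))

col = df1['c']  # a Series indexed by the row labels 'A' and 'B'

# axis='index' matches the Series on the row labels,
# so column 'c' is subtracted from every column
result = df1.sub(col, axis='index')
print(result.values.tolist())  # [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
```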


NumPy element-wise array methods (ufuncs) also work with pandas objects. 

In [0]:
frame=pd.DataFrame(np.random.randn(4,3),
                   columns=list('abc'),index=list('ABCD'))
frame

Unnamed: 0,a,b,c
A,0.334692,-0.417124,-0.196852
B,0.425473,-0.097992,0.854161
C,0.269041,0.041798,-1.089126
D,-0.062225,0.677381,-1.233493


In [0]:
np.abs(frame)

Unnamed: 0,a,b,c
A,0.334692,0.417124,0.196852
B,0.425473,0.097992,0.854161
C,0.269041,0.041798,1.089126
D,0.062225,0.677381,1.233493
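
As a further sketch, the result of such a NumPy function is still a DataFrame with the original row and column labels. The example uses a small deterministic frame (an assumption made here so the values are reproducible, unlike the random frame above) and `np.square`:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(-6., 6.).reshape(4, 3),
                     columns=list('abc'), index=list('ABCD'))

squared = np.square(frame)     # applied element by element

print(type(squared).__name__)  # DataFrame
print(squared.index.tolist())  # ['A', 'B', 'C', 'D']
print(squared.loc['A', 'a'])   # 36.0
```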


## 5. Sorting and Ranking
By default, data is sorted by row or column labels in ascending lexicographic order. The sort_index method applies to both Series and DataFrame. 

In [0]:
S=pd.Series(range(4),index=list('cdba'))
S

c    0
d    1
b    2
a    3
dtype: int64

In [0]:
S.sort_index()

a    3
b    2
c    0
d    1
dtype: int64

In [0]:
frame=pd.DataFrame(np.arange(8).reshape(2,4),index=['B','A'],columns=list('badc'))
frame

Unnamed: 0,b,a,d,c
B,0,1,2,3
A,4,5,6,7


In [0]:
frame.sort_index()#sort rows

Unnamed: 0,b,a,d,c
A,4,5,6,7
B,0,1,2,3


In [0]:
frame.sort_index(axis=1)#sort columns

Unnamed: 0,a,b,c,d
B,1,0,3,2
A,5,4,7,6


In [0]:
#sort the index in descending order. 
frame.sort_index(axis=1,ascending=False)

Unnamed: 0,d,c,b,a
B,2,3,0,1
A,6,7,4,5


In [0]:
# To sort a Series by its values. 
# Any missing value will be sorted to the end of the Series by default. 
S.sort_values()

c    0
d    1
b    2
a    3
dtype: int64

In [0]:
# To sort a DataFrame by its values. 
# Pass one or more column names to the by option of sort_values. Here we sort in reverse (descending) order. 
frame.sort_values(by='a',ascending=False)

Unnamed: 0,b,a,d,c
A,4,5,6,7
B,0,1,2,3
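
Sorting by more than one column can be sketched as follows. The frame `frame2` is a hypothetical example chosen so that column 'a' contains ties, which column 'b' then breaks:

```python
import pandas as pd

frame2 = pd.DataFrame({'a': [1, 0, 1, 0], 'b': [4, 7, -3, 2]})

# Sort by 'a' first; rows with equal 'a' are then ordered by 'b'
result = frame2.sort_values(by=['a', 'b'])

print(result.index.tolist())  # [3, 1, 2, 0]
```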


### Ranking
Ranking assigns ranks from 1 through the number of valid data points in an array. 
For example, the list [3, 1.5, 3] sorted in ascending order is [1.5, 3, 3]: 1.5 takes rank 1, and the two 3s occupy ranks 2 and 3. 
Those two 3s are called a tied group. 
By default, each 3 is assigned the mean of those two ranks, 2.5. 

In [0]:
S=pd.Series([7,-5,7,4])
S.rank()

0    3.5
1    1.0
2    3.5
3    2.0
dtype: float64

Ranks can also be assigned according to the order in which the values are observed in the data. 
Take [1.5, 3, 3] again: the first 3 takes rank 2, and the second 3 takes rank 3. 

In [0]:
S.rank(method='first')

0    3.0
1    1.0
2    4.0
3    2.0
dtype: float64

**More tie-breaking methods with rank**

'average': The default; assign the mean rank to each entry in the group

'min': Use the minimum rank for the whole group

'max': Use the maximum rank for the whole group

'first': Assign ranks in the order the values appear in the data


In [0]:
S.rank(method='min')

0    3.0
1    1.0
2    3.0
3    2.0
dtype: float64

In [0]:
S.rank(method='max')

0    4.0
1    1.0
2    4.0
3    2.0
dtype: float64
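
Two further options of `rank` that are not listed above can be sketched here: `method='dense'`, which behaves like 'min' but increases the rank by exactly 1 between groups, and `ascending=False`, which gives rank 1 to the largest value. The Series `S3` extends the earlier data with one extra value, 9, an addition made here only so that 'min' and 'dense' give different results:

```python
import pandas as pd

S3 = pd.Series([7, -5, 7, 4, 9])

min_rank = S3.rank(method='min')      # 9 gets rank 5, after the tied 7s at rank 3
dense_rank = S3.rank(method='dense')  # 9 gets rank 4, no gap after the tie
descending = S3.rank(ascending=False) # largest value gets rank 1, ties share the mean

print(min_rank.tolist())    # [3.0, 1.0, 3.0, 2.0, 5.0]
print(dense_rank.tolist())  # [3.0, 1.0, 3.0, 2.0, 4.0]
print(descending.tolist())  # [2.5, 5.0, 2.5, 4.0, 1.0]
```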