### B. Data Indexing, Selection and Iteration

Indexing and selection work on both Series and DataFrame.

Because a DataFrame is made of Series, let's focus on how to select data in a DataFrame.

In [8]:
# importing numpy and pandas
import pandas as pd
import numpy as np

In [32]:
# Creating DataFrame from a dictionary

player = {'name': ['MJ', 'SP', 'DR', 'AI'],
          'number': [23, 33, 91, 3]}

In [33]:
df = pd.DataFrame(player, index = ['a', 'b', 'c', 'd'])
df

Unnamed: 0,name,number
a,MJ,23
b,SP,33
c,DR,91
d,AI,3


In [34]:
df['name']

a    MJ
b    SP
c    DR
d    AI
Name: name, dtype: object

In [35]:
df.name

a    MJ
b    SP
c    DR
d    AI
Name: name, dtype: object

In [36]:
df.number

a    23
b    33
c    91
d     3
Name: number, dtype: int64

In [37]:
df['number']

a    23
b    33
c    91
d     3
Name: number, dtype: int64

In [38]:
# Passing a list of column names selects those columns and returns a DataFrame

df[['name', 'number']]

Unnamed: 0,name,number
a,MJ,23
b,SP,33
c,DR,91
d,AI,3


In [39]:
# Slicing with integers selects rows by position; this returns the first two rows
df[0:2]

Unnamed: 0,name,number
a,MJ,23
b,SP,33


You can also use `loc` to select rows by index label and `iloc` to select them by integer position. Slices with `loc` include the end label, while slices with `iloc` exclude the end position.

In [40]:
df.loc['a']

name      MJ
number    23
Name: a, dtype: object

In [41]:
df.loc['b':'c']

Unnamed: 0,name,number
b,SP,33
c,DR,91


In [42]:
df.loc[:'c']

Unnamed: 0,name,number
a,MJ,23
b,SP,33
c,DR,91


In [43]:
df.iloc[0]

name      MJ
number    23
Name: a, dtype: object

In [44]:
df.iloc[2]

name      DR
number    91
Name: c, dtype: object

In [46]:
df.iloc[0:4]

Unnamed: 0,name,number
a,MJ,23
b,SP,33
c,DR,91
d,AI,3
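
To make the difference between the two accessors concrete, here is a minimal sketch that rebuilds the same `df` and compares a label slice with a position slice. It also shows that both accessors take an optional second argument for the column; the variable names (`by_label`, `by_position`, and so on) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['MJ', 'SP', 'DR', 'AI'], 'number': [23, 33, 91, 3]},
    index=['a', 'b', 'c', 'd'],
)

# loc slices by label and includes the end label ('c' is returned)
by_label = df.loc['a':'c']

# iloc slices by position and excludes the end position (row 2 is not returned)
by_position = df.iloc[0:2]

# Both accessors accept a second argument that selects the column
cell_by_label = df.loc['c', 'number']
cell_by_position = df.iloc[2, 1]

print(by_label)
print(by_position)
print(cell_by_label, cell_by_position)
```

Both cell lookups point at the same value: the `number` of the row labelled `c`, which is also the row at position 2.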


### Conditional Selection

In [47]:
df

Unnamed: 0,name,number
a,MJ,23
b,SP,33
c,DR,91
d,AI,3


In [48]:
df[df['number'] == 3]

Unnamed: 0,name,number
d,AI,3


In [49]:
df[df['number'] < 33]

Unnamed: 0,name,number
a,MJ,23
d,AI,3


In [50]:
df[df['name'] == 'AI']

Unnamed: 0,name,number
d,AI,3


In [53]:
# Combine conditions with & (and) or | (or); wrap each condition in parentheses
# df[(condition 1) & (condition 2)]

df[(df['number'] == 33) | (df['name'] == 'AI')]

Unnamed: 0,name,number
b,SP,33
d,AI,3
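
The cell above uses `|`. As a small sketch on the same data, the example below uses `&` to require both conditions and `~` to negate one. The parentheses matter because `&`, `|` and `~` bind more tightly than comparison operators such as `>` or `==`. The names `both` and `not_ai` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['MJ', 'SP', 'DR', 'AI'], 'number': [23, 33, 91, 3]},
    index=['a', 'b', 'c', 'd'],
)

# & keeps only the rows where both conditions are True
both = df[(df['number'] > 20) & (df['number'] < 50)]

# ~ negates a condition: every row whose name is not 'AI'
not_ai = df[~(df['name'] == 'AI')]

print(both)
print(not_ai)
```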


You can also use `isin()` and `where()` to select data in a Series or DataFrame.

In [54]:
# isin() returns True for each element that appears in the provided values, and False otherwise
sample_numbers_names = [1, 3, 33, 'AI', 'DR']

df.isin(sample_numbers_names)

Unnamed: 0,name,number
a,False,False
b,False,True
c,True,False
d,True,True


As you can see, it returned `True` wherever a player number or name was found, and `False` otherwise. You can also pass a dictionary to restrict the match by column: each key must be a column name, and each value is a list of the values to look for in that column.

In [55]:
sample_numbers_names = {'number':[1,3,3], 'name':['AI', 'SC', 'DR']}

df.isin(sample_numbers_names)

Unnamed: 0,name,number
a,False,False
b,False,False
c,True,False
d,True,True
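
Since `isin()` returns booleans, it can also serve as a row filter when called on a single column. The sketch below assumes the same `df` as above; `wanted` and `selected` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['MJ', 'SP', 'DR', 'AI'], 'number': [23, 33, 91, 3]},
    index=['a', 'b', 'c', 'd'],
)

wanted = ['MJ', 'AI']

# Series.isin() gives one boolean per row, which is then used as a mask
selected = df[df['name'].isin(wanted)]

print(selected)
```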


In [56]:
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['column 1', 'column 2', 'column 3'])

df2

Unnamed: 0,column 1,column 2,column 3
0,1,2,3
1,4,5,6
2,7,8,9


In [57]:
df2.isin([0,3,4,5,7])

Unnamed: 0,column 1,column 2,column 3
0,False,False,True
1,True,True,False
2,True,False,False


In [58]:
df2 [df2 > 4]

Unnamed: 0,column 1,column 2,column 3
0,,,
1,,5.0,6.0
2,7.0,8.0,9.0


In [59]:
df2.where(df2 > 4)

Unnamed: 0,column 1,column 2,column 3
0,,,
1,,5.0,6.0
2,7.0,8.0,9.0


`where` lets you replace the values that do not meet the condition with another value. For example, `df2.where(df2 > 4, 0)` replaces every value less than or equal to `4` with `0`.

In [60]:
df2.where(df2 > 4, 0)

Unnamed: 0,column 1,column 2,column 3
0,0,0,0
1,0,5,6
2,7,8,9


In [61]:
# Boolean-mask assignment gives the same result, but it modifies df2 in place
df2[df2 <= 4] = 0
df2

Unnamed: 0,column 1,column 2,column 3
0,0,0,0
1,0,5,6
2,7,8,9
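
A related method, `mask()`, is the inverse of `where()`: it replaces the values where the condition is `True` instead of where it is `False`. The sketch below rebuilds a fresh `df2` (the cell above changed the original in place) and checks that the two calls agree. Unlike boolean-mask assignment, both return a new DataFrame and leave `df2` untouched. The names `kept` and `masked` are only illustrative.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['column 1', 'column 2', 'column 3'])

# where(): keep values that satisfy the condition, replace the rest with 0
kept = df2.where(df2 > 4, 0)

# mask(): replace values that satisfy the condition with 0, keep the rest
masked = df2.mask(df2 <= 4, 0)

print(kept)
print(kept.equals(masked))
```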
