# Indexing using loc, iloc and normal df notation

In [92]:
import pandas as pd

In [93]:
# Create a DataFrame with dummy data.
df = pd.DataFrame(data={'staff_no': [9999] * 5,
                        'name': ['Dean McGrath'] * 5,
                        'year': [2016, 2017, 2018, 2019, 2020],
                        'hours': [349, 231, 876, 679, 976]},
                  index=[2, 6, 4, 8, 9])
df

Unnamed: 0,staff_no,name,year,hours
2,9999,Dean McGrath,2016,349
6,9999,Dean McGrath,2017,231
4,9999,Dean McGrath,2018,876
8,9999,Dean McGrath,2019,679
9,9999,Dean McGrath,2020,976


Use df.loc[ ] and df.iloc[ ] to select only rows, only columns or both.

Use df.at[ ] and df.iat[ ] to access a single value by row and column.

First index selects rows, second index columns.

## General Guidance 

1. To slice both rows and columns in a single call, .loc or .iloc must be used
2. When using .loc or .iloc to select columns, a row selector must also be given in the [ ] (use : for all rows)
3. When using .loc or .iloc, the row selector comes before the column selector, while in normal df notation it is the other way round
4. When slicing by index label or column label with .loc, the results include both boundaries
5. In normal df notation, rows can be sliced (but not looked up by a single position) without using .loc or .iloc: df[['hours','year']][1:3]
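
The guidance above can be checked with a minimal sketch. It rebuilds the same dummy DataFrame so that it runs on its own; the variable names (`by_label`, `by_position`, `cols_only`, `plain`) are only illustrative.

```python
import pandas as pd

# Same dummy data as above, rebuilt so the sketch is self-contained.
df = pd.DataFrame(data={'staff_no': [9999] * 5,
                        'name': ['Dean McGrath'] * 5,
                        'year': [2016, 2017, 2018, 2019, 2020],
                        'hours': [349, 231, 876, 679, 976]},
                  index=[2, 6, 4, 8, 9])

# Points 1 and 3: rows first, columns second, in a single call.
by_label = df.loc[2:4, 'name':'hours']   # label slices include both endpoints
by_position = df.iloc[0:3, 1:4]          # position slices exclude the stop

# Point 2: to pick only columns, pass ':' as the row selector.
cols_only = df.loc[:, ['year', 'hours']]

# Point 5: normal notation takes columns first, then a positional row slice.
plain = df[['hours', 'year']][1:3]

print(by_label.equals(by_position))  # both select the same rows and columns
print(list(plain.index))             # index labels of the rows at positions 1 and 2
```

Here the label slice `2:4` and the position slice `0:3` pick the same three rows only because labels 2, 6 and 4 happen to sit at positions 0, 1 and 2.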

## df Location Notations:
1. df.loc[row space,column space]

2. df.loc[2:4,'name':'hours'] # slicing rows and columns using row and column labels

3. df.iloc[2:4,1:3] # slicing rows and columns using row position and column position

4. df.iloc[2:4,[1,3]] # slicing rows and calling columns by position index

5. Error if column label used in iloc: df.iloc[1:3,'name':'hours']

6. df.loc[df['year']>2018,'name':'hours']  # Row filter with column slicing

7. Error if filtering with iloc: df.iloc[df['year']>2018,0:2]

8. df.loc[2:4] or df.iloc[2:4] displays the selected rows and all columns

9. Error if df.loc[1:3] since index labels 1 and 3 don't exist (and this index is not sorted, so pandas cannot locate the slice boundaries)
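
The three error cases in the list can be reproduced with a small sketch. The helper `error_name` is hypothetical, written only for this illustration, and the exact exception classes may differ between pandas versions, so the sketch only reports them. The last lines show one possible workaround for item 7, assuming you want to keep using iloc: convert the boolean Series to a NumPy array first.

```python
import pandas as pd

# Same dummy data as above, rebuilt so the sketch is self-contained.
df = pd.DataFrame(data={'staff_no': [9999] * 5,
                        'name': ['Dean McGrath'] * 5,
                        'year': [2016, 2017, 2018, 2019, 2020],
                        'hours': [349, 231, 876, 679, 976]},
                  index=[2, 6, 4, 8, 9])

def error_name(expr):
    """Hypothetical helper: return the name of the exception raised by expr(), or None."""
    try:
        expr()
    except Exception as exc:
        return type(exc).__name__
    return None

# Item 5: column labels are not valid in iloc.
err_labels_in_iloc = error_name(lambda: df.iloc[1:3, 'name':'hours'])
# Item 7: iloc does not accept a boolean Series as a row mask.
err_mask_in_iloc = error_name(lambda: df.iloc[df['year'] > 2018, 0:2])
# Item 9: labels 1 and 3 are missing from this unsorted index.
err_missing_labels = error_name(lambda: df.loc[1:3])
print(err_labels_in_iloc, err_mask_in_iloc, err_missing_labels)

# Possible workaround for item 7: a plain boolean array is accepted by iloc.
mask = (df['year'] > 2018).to_numpy()
filtered = df.iloc[mask, 0:2]
print(filtered)
```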

## Normal df notation:
1. df[1:3] # Rows at positions 1 and 2 displayed (index labels 6 and 4)

2. Error if df[1:3][0], because 0 is looked up as a column label

3. df.iloc[1:4][['name','hours']] # Rows at positions 1 to 3 selected with iloc, then normal df notation applied to the result to show only the name and hours columns
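
A short sketch of the three points above, again rebuilding the dummy DataFrame. The final comparison with a single `iloc` call is an addition for illustration, not something the notes above require.

```python
import pandas as pd

# Same dummy data as above, rebuilt so the sketch is self-contained.
df = pd.DataFrame(data={'staff_no': [9999] * 5,
                        'name': ['Dean McGrath'] * 5,
                        'year': [2016, 2017, 2018, 2019, 2020],
                        'hours': [349, 231, 876, 679, 976]},
                  index=[2, 6, 4, 8, 9])

# Item 1: an integer slice in normal notation is positional.
positional = df[1:3]
print(list(positional.index))  # labels of the rows at positions 1 and 2

# Item 2: a single integer in normal notation is treated as a column label.
try:
    df[1:3][0]
    raised_key_error = False
except KeyError:
    raised_key_error = True
print(raised_key_error)

# Item 3: iloc for the rows, then normal notation for the columns.
chained = df.iloc[1:4][['name', 'hours']]
# The same selection in one call, using column positions 1 and 3.
single_call = df.iloc[1:4, [1, 3]]
print(chained.equals(single_call))
```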


# Select specific rows using index

In [94]:
df.iloc[2:4]    # Select rows at positions 2 and 3. Note that this df has its own index labels, which iloc ignores.
                # The right boundary (position 4) is excluded, as usual for Python slices,
                # unlike label slicing with .loc, which includes both endpoints (see example below)

Unnamed: 0,staff_no,name,year,hours
4,9999,Dean McGrath,2018,876
8,9999,Dean McGrath,2019,679


# Select specific columns using column index (position)  

In [95]:
df.iloc[:, [1, 2, 3]] # Select columns in positions 1, 2 and 3 (first column is 0).

Unnamed: 0,name,year,hours
2,Dean McGrath,2016,349
6,Dean McGrath,2017,231
4,Dean McGrath,2018,876
8,Dean McGrath,2019,679
9,Dean McGrath,2020,976


# Select specific columns using their names

In [96]:
df.loc[:, 'name':'hours'] # Select all columns from 'name' to 'hours' (inclusive).

Unnamed: 0,name,year,hours
2,Dean McGrath,2016,349
6,Dean McGrath,2017,231
4,Dean McGrath,2018,876
8,Dean McGrath,2019,679
9,Dean McGrath,2020,976


## Referencing using index label

In [97]:
df.loc[2:4] # Note: rows from index label 2 through index label 4 (inclusive) are selected, not row positions,
            # meaning df.iloc[2:4] will give a different output. See below

Unnamed: 0,staff_no,name,year,hours
2,9999,Dean McGrath,2016,349
6,9999,Dean McGrath,2017,231
4,9999,Dean McGrath,2018,876


In [98]:
df.iloc[2:4]

Unnamed: 0,staff_no,name,year,hours
4,9999,Dean McGrath,2018,876
8,9999,Dean McGrath,2019,679


## Normal df referencing 

In [99]:
df[['name','hours']][1:4]   # Normal df referencing: note that the column names come before the row slice.
                            # The row slice uses positions, not index labels.

Unnamed: 0,name,hours
6,Dean McGrath,231
4,Dean McGrath,876
8,Dean McGrath,679


In [100]:
df.iloc[1:4][['name','hours']]

Unnamed: 0,name,hours
6,Dean McGrath,231
4,Dean McGrath,876
8,Dean McGrath,679


Column name slicing doesn't work using normal DataFrame notation, because a slice inside df[ ] is applied to rows, not columns. The following raises an error:
df['name':'hours'][1:4]
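
As a sketch of the failure and one possible alternative: the slice of column names is rejected in normal notation, while `.loc` accepts it, and a positional row slice can then be applied with `.iloc`. The exception class is caught generically because it may vary by pandas version.

```python
import pandas as pd

# Same dummy data as above, rebuilt so the sketch is self-contained.
df = pd.DataFrame(data={'staff_no': [9999] * 5,
                        'name': ['Dean McGrath'] * 5,
                        'year': [2016, 2017, 2018, 2019, 2020],
                        'hours': [349, 231, 876, 679, 976]},
                  index=[2, 6, 4, 8, 9])

# A slice inside df[ ] is applied to the rows, so column names are rejected here.
try:
    df['name':'hours'][1:4]
    slice_failed = False
except Exception as exc:
    slice_failed = True
    print(type(exc).__name__)

# One alternative: slice the columns by label with loc, then the rows by position.
result = df.loc[:, 'name':'hours'].iloc[1:4]
print(result)
```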

# Select Specific rows and columns using row filter

In [101]:
# Select rows meeting logical condition, and only the specific columns

df.loc[df['year'] > 2018, ['name', 'hours']]

Unnamed: 0,name,hours
8,Dean McGrath,679
9,Dean McGrath,976
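
To make the row filter less opaque, the sketch below splits it into two steps. The condition `df['year'] > 2018` is itself a boolean Series that shares the DataFrame's index, and `.loc` keeps the rows where it is True. The names `mask` and `selected` are illustrative only.

```python
import pandas as pd

# Same dummy data as above, rebuilt so the sketch is self-contained.
df = pd.DataFrame(data={'staff_no': [9999] * 5,
                        'name': ['Dean McGrath'] * 5,
                        'year': [2016, 2017, 2018, 2019, 2020],
                        'hours': [349, 231, 876, 679, 976]},
                  index=[2, 6, 4, 8, 9])

# Step 1: the condition yields one True/False value per row.
mask = df['year'] > 2018
print(mask.tolist())  # [False, False, False, True, True]

# Step 2: loc keeps the rows where the mask is True and the listed columns.
selected = df.loc[mask, ['name', 'hours']]
print(selected)
```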


# Access a single value by position

In [102]:
df.iat[1, 2] # Access single value by integer position (row position 1, column position 2)

2017

# Access a single value by label

In [104]:
df.at[4, 'year'] # Access single value by label

2018
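
The difference between the two accessors can be sketched as follows: `.iat` counts positions and `.at` looks up labels, so the same cell has different coordinates in each. The comparison with `.iloc` and `.loc` is added for illustration; both also return a single value when given two scalars.

```python
import pandas as pd

# Same dummy data as above, rebuilt so the sketch is self-contained.
df = pd.DataFrame(data={'staff_no': [9999] * 5,
                        'name': ['Dean McGrath'] * 5,
                        'year': [2016, 2017, 2018, 2019, 2020],
                        'hours': [349, 231, 876, 679, 976]},
                  index=[2, 6, 4, 8, 9])

by_position = df.iat[1, 2]    # row position 1 (label 6), column position 2 ('year')
by_label = df.at[4, 'year']   # row label 4 (position 2), column 'year'
print(by_position, by_label)  # 2017 2018

# The row labelled 4 sits at position 2, so these address the same cell.
same_cell = df.iat[2, 2] == df.at[4, 'year']
print(same_cell)

# iloc and loc return the same scalars when given a single row and column.
print(df.iloc[1, 2] == by_position, df.loc[4, 'year'] == by_label)
```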