# Data Manipulation: Indexing, Slicing, and Filtering

I reviewed:

- Indexing
- Slicing
- Conditional Filtering
- Conditional Filling of Data with `where()`


In [1]:
import pandas as pd
import numpy as np

In [2]:
data = {
     'name': ['Xavier', 'Ann', 'Jana', 'Yi', 'Robin', 'Amal', 'Nori'],
     'city': ['Mexico City', 'Toronto', 'Prague', 'Shanghai',
              'Manchester', 'Cairo', 'Osaka'],
     'age': [41, 28, 33, 34, 38, 31, 37],
     'py-score': [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0]
 }

row_labels = [101, 102, 103, 104, 105, 106, 107]
df = pd.DataFrame(data=data, index=row_labels)

df

       name         city  age  py-score
101  Xavier  Mexico City   41      88.0
102     Ann      Toronto   28      79.0
103    Jana       Prague   33      81.0
104      Yi     Shanghai   34      80.0
105   Robin   Manchester   38      68.0
106    Amal        Cairo   31      61.0
107    Nori        Osaka   37      84.0


## Indexing

### Entire Columns or Rows

There are different ways to access data:

1. `df[<column_name>]` or `df.column_name` accesses an entire column and returns a Series. The attribute form only works when the column name is a valid Python identifier.
2. `df.loc[<row_label>]` accesses an entire row and returns a Series.

In [5]:
df['name']

101    Xavier
102       Ann
103      Jana
104        Yi
105     Robin
106      Amal
107      Nori
Name: name, dtype: object

In [6]:
df.loc[101]

name             Xavier
city        Mexico City
age                  41
py-score           88.0
Name: 101, dtype: object
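
The two column-access forms are not fully interchangeable. The following is a minimal sketch on a reduced, two-row copy of the DataFrame above (the reduced frame is an assumption made for brevity): attribute access matches bracket access for `city`, while a name such as `py-score` is only reachable through brackets.

```python
import pandas as pd

# Reduced copy of the DataFrame above (two rows are enough here)
df = pd.DataFrame(
    {'city': ['Mexico City', 'Toronto'], 'py-score': [88.0, 79.0]},
    index=[101, 102],
)

# Attribute access and bracket access return the same Series
same = df.city.equals(df['city'])

# 'py-score' is not a valid Python identifier, so only brackets work;
# `df.py-score` would be parsed as the subtraction `df.py - score`
scores = df['py-score']
```

Bracket access is the safer default in scripts, since it also avoids clashes between column names and DataFrame methods.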

### Accessing specific values

You can also access specific values in the DataFrame. To do that,
use `df.loc` or `df.iloc`, which are more flexible than plain brackets.

1. `.loc[]` accepts the labels of rows and columns and returns Series or DataFrames. You can use it to get entire rows or columns, as well as their parts.
2. `.iloc[]` accepts the zero-based indices of rows and columns and returns Series or DataFrames. You can use it to get entire rows or columns, or their parts.

**Note**: the difference between the two is that `.loc[]` uses the row and column labels,
while `.iloc[]` uses the zero-based integer position.

**Note**: When you need only a single value, pandas recommends using the specialized accessors `.at[]` and `.iat[]`. Both work like `.loc[]` and `.iloc[]`, with the difference that `.at[]` and `.iat[]` need
both the row and the column index and **always return a single value**.

In [7]:
df.loc[101,'name']

'Xavier'

In [8]:
df.iloc[0,0]

'Xavier'

In [9]:
df.at[101, 'name']

'Xavier'

In [10]:
df.iat[0,0]

'Xavier'
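
Because the row labels here start at 101 rather than 0, labels and positions never coincide, which makes the difference between the two accessors easy to see. The sketch below uses a reduced three-row copy of the DataFrame (an assumption made for brevity) and also shows how the return type depends on whether a scalar or a list is passed.

```python
import pandas as pd

# Reduced copy of the DataFrame above
df = pd.DataFrame(
    {'name': ['Xavier', 'Ann', 'Jana'], 'age': [41, 28, 33]},
    index=[101, 102, 103],
)

# Label 101 is the row at position 0
by_label = df.loc[101]
by_position = df.iloc[0]

# There is no row labelled 0, so .loc[0] raises KeyError
try:
    df.loc[0]
    label_zero_exists = True
except KeyError:
    label_zero_exists = False

# A scalar row selector gives a Series; a one-element list gives a DataFrame
as_series = df.loc[101]
as_frame = df.loc[[101]]
```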

## Slicing

By combining `.loc[]` and `.iloc[]` with the slice construct (`:`), you can
access different chunks of data in a DataFrame, much as you can with
NumPy arrays.

### Slicing difference between `.loc[]` and `.iloc[]`

However, there is a difference between `.loc[]` and `.iloc[]`.

- `.loc[]` accepts a right-inclusive slice, e.g. `.loc[1:5]` returns the rows
labelled 1 through 5, with 5 included.
- `.iloc[]` accepts a right-exclusive slice, e.g. `.iloc[1:5]` returns the rows
at positions 1 to 4 (5 is not included). This is consistent with NumPy arrays.
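
To make the inclusive/exclusive difference concrete, here is a small sketch. It assumes a hypothetical frame with the default index 0 to 6, so that labels and positions are the same numbers and `1:5` can be passed to both accessors.

```python
import pandas as pd

# Hypothetical frame with the default index 0..6, so labels equal positions
df = pd.DataFrame(
    {'name': ['Xavier', 'Ann', 'Jana', 'Yi', 'Robin', 'Amal', 'Nori']}
)

by_label = df.loc[1:5]      # labels 1, 2, 3, 4, 5 (stop included)
by_position = df.iloc[1:5]  # positions 1, 2, 3, 4 (stop excluded)

label_names = by_label['name'].tolist()
position_names = by_position['name'].tolist()
```

With the same slice, `.loc[]` returns five rows and `.iloc[]` returns four.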

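**Note**: Don’t use tuples instead of lists or integer arrays to get ordinary rows or columns. In pandas indexing, tuples are interpreted as keys spanning several axes or several levels of a hierarchical index.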

**Note**: Instead of using the slicing construct (`:`), you could also use
the built-in Python class `slice()`, as well as `np.s_[]` or `pd.IndexSlice[]`.
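
As a quick sketch of these alternatives, the four expressions below should select the same rows. The frame is a one-column copy of the DataFrame above, reduced for brevity.

```python
import numpy as np
import pandas as pd

# One-column copy of the DataFrame above
df = pd.DataFrame(
    {'name': ['Xavier', 'Ann', 'Jana', 'Yi', 'Robin', 'Amal', 'Nori']},
    index=[101, 102, 103, 104, 105, 106, 107],
)

# Four equivalent ways to write "positions 1 to 5, every second row"
with_colon = df.iloc[1:6:2, 0]
with_slice = df.iloc[slice(1, 6, 2), 0]
with_np = df.iloc[np.s_[1:6:2], 0]
with_pd = df.iloc[pd.IndexSlice[1:6:2], 0]
```

The object forms are mostly useful when the slice has to be stored in a variable or passed to a function.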

In [3]:
# access a column

df.loc[:,'name']

# the alternative could be df.iloc[:,0]

101    Xavier
102       Ann
103      Jana
104        Yi
105     Robin
106      Amal
107      Nori
Name: name, dtype: object

In [12]:
# using slices and lists to return a part of the DataFrame
df.loc[101:105, ['name', 'city']]

       name         city
101  Xavier  Mexico City
102     Ann      Toronto
103    Jana       Prague
104      Yi     Shanghai
105   Robin   Manchester


In [16]:
df.iloc[0:5, [0,1]]

       name         city
101  Xavier  Mexico City
102     Ann      Toronto
103    Jana       Prague
104      Yi     Shanghai
105   Robin   Manchester


### Skipping rows and columns
You can skip rows and columns with `.iloc[]` by adding a step to the slice, in the same
way as you can with NumPy arrays.

In [17]:
# rows at positions 1 to 5 (6 not included) with a step of 2
df.iloc[1:6:2, 0]

102     Ann
104      Yi
106    Amal
Name: name, dtype: object

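**Note**: plain brackets (`df[...]`) also cover two common cases:

- `df[<start>:<stop>]`, with a slice of integer positions, selects rows as if you were using `.iloc[]`. It only selects rows (not rows and columns at the same time), and a single integer such as `df[0]` is looked up as a column name instead.
- `df[<column names>]`, with a list of names, selects columns.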
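
A minimal sketch of plain-bracket selection, using a reduced copy of the DataFrame above (the reduced frame is an assumption made for brevity). It checks that an integer slice behaves like `.iloc[]`, that a list of names selects columns, and that a bare integer is treated as a column key.

```python
import pandas as pd

# Reduced copy of the DataFrame above
df = pd.DataFrame(
    {
        'name': ['Xavier', 'Ann', 'Jana', 'Yi'],
        'city': ['Mexico City', 'Toronto', 'Prague', 'Shanghai'],
        'age': [41, 28, 33, 34],
    },
    index=[101, 102, 103, 104],
)

# An integer slice in plain brackets selects rows by position
rows = df[1:3]

# A list of names selects columns
cols = df[['name', 'city']]

# A bare integer is looked up as a column name, so this fails
try:
    df[0]
    int_key_works = True
except KeyError:
    int_key_works = False
```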

## Setting Data



## Filtering
