##Indexing DataFrames

Import libraries.

In [3]:
import pandas as pd
import numpy as np

Create `sales` dictionary.

In [5]:
sales = {'eggs': [47, 110, 221, 77],
         'salt': [12., 50., 89., 87.],
         'spam': [17, 31, 72, 56]}

In [6]:
sales

Convert the `sales` dict into a pandas DataFrame.

In [8]:
pd.DataFrame(sales)

Add a row index to the DataFrame and name it `sales_df`.

In [10]:
sales_df = pd.DataFrame(sales, index=['Jan', 'Feb', 'Mar', 'Apr'])

In [11]:
sales_df

Now we have a pandas dataframe with `4` rows and `3` columns.

The pandas library offers several ways of indexing data in dataframes:

1. Indexing using square brackets, example: `df['column']['row']`;
1. Using column attribute and row label, example: `df.column['row']`;
1. Using the `.loc` accessor, example: `df.loc['row', 'column']`;
1. Using the `.iloc` accessor, example: `df.iloc[row_number, column_number]`.

Let's try accessing the dataframe in different ways.

In [14]:
sales_df['salt']['Mar']

In [15]:
sales_df.salt['Mar']

In [16]:
sales_df.loc['Mar', 'salt']

In [17]:
sales_df.iloc[2, 1]
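
As a quick check, the sketch below rebuilds the same `sales_df` and confirms that the four access styles return the same cell. The variable names (`by_brackets`, `by_attribute`, and so on) are only illustrative.

```python
import pandas as pd

sales_df = pd.DataFrame(
    {'eggs': [47, 110, 221, 77],
     'salt': [12., 50., 89., 87.],
     'spam': [17, 31, 72, 56]},
    index=['Jan', 'Feb', 'Mar', 'Apr'],
)

# Four ways to reach the same cell (row 'Mar', column 'salt').
by_brackets = sales_df['salt']['Mar']    # column first, then row label
by_attribute = sales_df.salt['Mar']      # column as attribute, then row label
by_label = sales_df.loc['Mar', 'salt']   # row label, column label
by_position = sales_df.iloc[2, 1]        # row position, column position

all_equal = by_brackets == by_attribute == by_label == by_position
print(by_label, all_equal)  # 89.0 True
```

Note that `.loc` and `.iloc` take the row first, while the bracket and attribute styles take the column first.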

##Slicing DataFrames

Slicing and indexing using `.loc` with labels:

- 'Jan' to 'Apr' rows (inclusive);
- 'eggs' to 'salt' columns (inclusive).

In [20]:
sales_df.loc['Jan':'Apr', 'eggs':'salt']

Slicing and indexing using `.iloc` with positions:

- From row 1 up to but not including row 3;
- From column 1 onwards.

In [22]:
sales_df.iloc[1:3, 1:]
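
The difference between the two slicing rules is easy to miss: `.loc` includes the stop label, while `.iloc` excludes the stop position. A minimal sketch on the same data, selecting the same block both ways:

```python
import pandas as pd

sales_df = pd.DataFrame(
    {'eggs': [47, 110, 221, 77],
     'salt': [12., 50., 89., 87.],
     'spam': [17, 31, 72, 56]},
    index=['Jan', 'Feb', 'Mar', 'Apr'],
)

# Label slice: 'Mar' and 'spam' (the stop labels) are included.
by_label = sales_df.loc['Feb':'Mar', 'salt':'spam']

# Position slice: position 3 (the stop position) is excluded on both axes.
by_position = sales_df.iloc[1:3, 1:3]

print(by_label.equals(by_position))  # True
```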

`.loc` and `.iloc` also accept lists of labels or positions, for example:

In [24]:
sales_df.iloc[[0, 2, 3], 0:2]

Selecting a Series vs. a 1-column DataFrame: single brackets return a Series, double brackets (a list of column names) return a DataFrame.

In [26]:
type(sales_df['eggs'])

In [27]:
type(sales_df[['eggs']])
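
The two results also differ in shape, which matters when a later step expects two-dimensional input. A small sketch (the names `as_series` and `as_frame` are illustrative):

```python
import pandas as pd

sales_df = pd.DataFrame(
    {'eggs': [47, 110, 221, 77],
     'salt': [12., 50., 89., 87.],
     'spam': [17, 31, 72, 56]},
    index=['Jan', 'Feb', 'Mar', 'Apr'],
)

as_series = sales_df['eggs']    # single brackets: one-dimensional Series
as_frame = sales_df[['eggs']]   # list of columns: two-dimensional DataFrame

print(type(as_series).__name__, as_series.shape)  # Series (4,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (4, 1)
```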

Slice the row labels 'Feb' to 'Apr'.

In [29]:
sales_df.loc['Feb':'Apr']

Slice the row labels in reverse order, from 'Apr' to 'Feb', using a step of `-1`.

In [31]:
sales_df.loc['Apr':'Feb':-1]

Subselecting DataFrames with lists

In [33]:
rows = ['Feb', 'Mar']
cols = ['salt', 'spam']
sales_df.loc[rows, cols]

##Filtering DataFrames

Create a Boolean Series.

In [36]:
sales_df.salt > 50

Filtering with a Boolean Series.

In [38]:
sales_df[sales_df.salt > 50]

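Combining filters with the element-wise logical operators `&` (and) and `|` (or). Each condition needs its own parentheses, because `&` and `|` bind more tightly than comparisons.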

In [40]:
sales_df[(sales_df.salt > 50) & (sales_df.eggs > 80)] 

In [41]:
sales_df[(sales_df.salt > 50) | (sales_df.eggs > 80)] 

Filtering a column based on another.

In [43]:
sales_df.eggs[sales_df.salt > 25]
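
A Boolean Series can also be stored in a variable, negated with `~`, and passed to `.loc` together with a column label. The sketch below assumes the same `sales_df`; the name `mask` is only illustrative.

```python
import pandas as pd

sales_df = pd.DataFrame(
    {'eggs': [47, 110, 221, 77],
     'salt': [12., 50., 89., 87.],
     'spam': [17, 31, 72, 56]},
    index=['Jan', 'Feb', 'Mar', 'Apr'],
)

mask = sales_df['salt'] > 50            # True for 'Mar' and 'Apr'

# Rows where the mask holds, restricted to the 'eggs' column.
high_salt_eggs = sales_df.loc[mask, 'eggs']

# `~` inverts the mask: rows where salt is 50 or less.
low_salt = sales_df[~mask]

print(high_salt_eggs)
print(low_salt)
```

Selecting rows and a column in a single `.loc` call avoids chaining two indexing steps.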

DataFrames with zeros and NaNs

In [45]:
df2 = sales_df.copy()
df2['bacon'] = [0, 0, 50, 60]
df2

Select columns with all non-zero values.

In [47]:
df2.loc[:, df2.all()]

Select columns with any non-zero values (contain at least one value that is non-zero).

In [49]:
df2.loc[:, df2.any()]
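
To see why these selections work, it can help to look at the Boolean Series that `all()` and `any()` produce, with one entry per column. A sketch that rebuilds `df2` as above:

```python
import pandas as pd

df2 = pd.DataFrame(
    {'eggs': [47, 110, 221, 77],
     'salt': [12., 50., 89., 87.],
     'spam': [17, 31, 72, 56]},
    index=['Jan', 'Feb', 'Mar', 'Apr'],
)
df2['bacon'] = [0, 0, 50, 60]

all_nonzero = df2.all()   # False only for 'bacon', which contains zeros
any_nonzero = df2.any()   # True for every column, including 'bacon'

print(all_nonzero)
print(any_nonzero)
```

Passing either Series as the column selector in `.loc[:, ...]` keeps only the columns marked `True`.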

Select columns without any NaN values by jointly using `notnull()` and `all()` methods.

In [51]:
df2.loc[:, df2.notnull().all()]

Conversely, we can select columns with at least one NaN value by using the `isnull()` and `any()` methods. However, `df2` has no such column, so the result has no columns.

In [53]:
df2.loc[:, df2.isnull().any()]

Drop rows that contain any NaN (`how='any'`) or only NaNs (`how='all'`) with `dropna()`. In our example no column contains NaN values, so nothing is dropped.

In [55]:
df2.dropna(how='any')
df2.dropna(how='all')
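
Because `df2` contains no NaN values, the NaN-related calls above have no visible effect. The sketch below uses a hypothetical frame `df3` (not part of the original notebook) with one NaN inserted in `salt`, so the behavior can be observed.

```python
import numpy as np
import pandas as pd

# Hypothetical data: same shape as df2, with one missing value in 'salt'.
df3 = pd.DataFrame(
    {'eggs': [47, 110, 221, 77],
     'salt': [12., np.nan, 89., 87.],
     'bacon': [0, 0, 50, 60]},
    index=['Jan', 'Feb', 'Mar', 'Apr'],
)

cols_with_nan = df3.loc[:, df3.isnull().any()]       # only 'salt'
cols_without_nan = df3.loc[:, df3.notnull().all()]   # 'eggs' and 'bacon'

rows_any = df3.dropna(how='any')   # drops 'Feb', which has one NaN
rows_all = df3.dropna(how='all')   # drops nothing: no row is entirely NaN

print(list(cols_with_nan.columns), list(rows_any.index), len(rows_all))
```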

##Transforming DataFrames

DataFrame vectorized methods

In [58]:
sales_df.floordiv(12)  # Convert all values to whole dozens

NumPy vectorized functions

In [60]:
np.floor_divide(sales_df, 12)

Plain Python functions with `def`.

In [62]:
def dozens(n):
    return n // 12

In [63]:
sales_df.apply(dozens)

One-time use Python functions with `lambda`.

In [65]:
sales_df.apply(lambda n: n//12)
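
All of the approaches above should produce the same values. The sketch below checks this on a fresh copy of the data and also includes the `//` operator applied directly to the DataFrame, which is not used in the cells above. The helper `same_values` is only illustrative.

```python
import numpy as np
import pandas as pd

sales_df = pd.DataFrame(
    {'eggs': [47, 110, 221, 77],
     'salt': [12., 50., 89., 87.],
     'spam': [17, 31, 72, 56]},
    index=['Jan', 'Feb', 'Mar', 'Apr'],
)

via_method = sales_df.floordiv(12)                # DataFrame method
via_numpy = np.floor_divide(sales_df, 12)         # NumPy function
via_apply = sales_df.apply(lambda n: n // 12)     # function applied per column
via_operator = sales_df // 12                     # plain operator

def same_values(left, right):
    """Compare two frames by value only, ignoring dtype differences."""
    return np.array_equal(left.to_numpy(), right.to_numpy())

print(all(same_values(via_method, other)
          for other in (via_numpy, via_apply, via_operator)))  # True
```

Note that `apply` calls the function once per column here, so `n` is a whole Series rather than a single number.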

Storing a transformation with a new column

In [67]:
sales_df['eggs_in_dozens'] = sales_df.eggs.floordiv(12)

In [68]:
sales_df

Working with string values of the index.

In [70]:
sales_df.index

In [71]:
sales_df.index = sales_df.index.str.upper()  # Convert the index labels to upper case

In [72]:
sales_df.index

An `Index` has no `apply` method; use `map` instead.

In [74]:
sales_df.index = sales_df.index.map(str.lower)  # Convert the index labels to lower case

In [75]:
sales_df.index
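
For string labels, the `.str` accessor and `map` are interchangeable. A minimal sketch on a standalone index (the name `idx` is illustrative):

```python
import pandas as pd

idx = pd.Index(['Jan', 'Feb', 'Mar', 'Apr'])

upper_via_str = idx.str.upper()      # vectorized string accessor
upper_via_map = idx.map(str.upper)   # element-wise function via map

print(list(upper_via_str))                          # ['JAN', 'FEB', 'MAR', 'APR']
print(list(upper_via_str) == list(upper_via_map))   # True
print(hasattr(idx, 'apply'))                        # False
```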

Defining columns using computation of other columns.

In [77]:
sales_df['salt&eggs'] = sales_df['salt'] + sales_df['eggs']

In [78]:
sales_df

Add a `quarter` column to `sales_df`.

In [80]:
sales_df['quarter'] = ['first', 'first', 'first', 'second']

In [81]:
sales_df

Use `.map()` with a dict to translate the quarter names into ordinal labels.

In [83]:
quarter_in_num = {'first': '1st', 'second': '2nd', 'third': '3rd', 'fourth': '4th'}
sales_df['quarter#'] = sales_df.quarter.map(quarter_in_num)

In [84]:
sales_df
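
One detail worth knowing about mapping with a dict: values that have no matching key become NaN rather than raising an error. The sketch below uses a hypothetical Series with an unmapped value, `'fifth'`, to show this.

```python
import pandas as pd

quarter_in_num = {'first': '1st', 'second': '2nd', 'third': '3rd', 'fourth': '4th'}

# Hypothetical input: 'fifth' is not a key in the dict.
quarters = pd.Series(['first', 'second', 'fifth'])
mapped = quarters.map(quarter_in_num)

print(mapped)
print(mapped.isna().tolist())  # [False, False, True]
```

Checking `isna()` after a dict-based `map` is a simple way to catch labels the dict does not cover.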

__The End__