# Slicing DataFrames

In [4]:
import pandas as pd
import numpy as np

In [15]:
# Define the multi-level index
index = pd.MultiIndex.from_tuples([
    ('Chihuahua', 'Tan'),
    ('Chow Chow', 'Brown'),
    ('Labrador', 'Black'),
    ('Labrador', 'Brown'),
    ('Poodle', 'Black'),
    ('Schnauzer', 'Grey'),
    ('St. Bernard', 'White')
], names=['breed', 'color'])

# Create the DataFrame
df = pd.DataFrame({
    'name': ['Stella', 'Lucy', 'Max', 'Bella', 'Charlie', 'Cooper', 'Bernie'],
    'height_cm': [18, 46, 59, 56, 43, 49, 77],
    'weight_kg': [2, 22, 29, 25, 23, 17, 74]
}, index=index)

In [16]:
df

breed,color,name,height_cm,weight_kg
Chihuahua,Tan,Stella,18,2
Chow Chow,Brown,Lucy,46,22
Labrador,Black,Max,59,29
Labrador,Brown,Bella,56,25
Poodle,Black,Charlie,43,23
Schnauzer,Grey,Cooper,49,17
St. Bernard,White,Bernie,77,74


## Slicing a DataFrame using index values

There are two differences compared to slicing lists.
1. Rather than specifying row numbers, you specify index values.
2. The final value is included in the result. Here, Poodle is
included in the results.

In [7]:
df.loc["Chow Chow":"Poodle"]

breed,color,name,height_cm,weight_kg
Chow Chow,Brown,Lucy,46,22
Labrador,Black,Max,59,29
Labrador,Brown,Bella,56,25
Poodle,Black,Charlie,43,23
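
To make the two differences concrete, here is a minimal sketch that slices a plain list and a small DataFrame side by side. The `breeds` list and the `small` frame are made up for this illustration; they are a cut-down version of the data above with a single-level index.

```python
import pandas as pd

breeds = ["Chihuahua", "Chow Chow", "Labrador", "Poodle", "Schnauzer"]

# List slicing uses positions and excludes the final position
by_position = breeds[1:3]
print(by_position)

# .loc slicing uses index labels and includes the final label
small = pd.DataFrame({"height_cm": [18, 46, 59, 43, 49]}, index=breeds)
by_label = small.loc["Chow Chow":"Poodle"]
print(by_label)
```

The list slice stops before position 3, while the `.loc` slice keeps the Poodle row.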


### The same technique doesn't work on inner index levels.

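⚠️ pandas doesn't raise an error to let you know
that there is a problem: it treats "Tan" and "Grey" as outer-level (breed) labels,
finds no rows between them, and returns an empty DataFrame. Be careful when coding.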

In [8]:
df.loc["Tan":"Grey"]

breed,color,name,height_cm,weight_kg
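
The sketch below checks that the result really is empty, and shows one possible way to slice on the inner level alone. It uses `pd.IndexSlice`, which is not covered in this notebook, so treat it as an optional aside rather than the approach taken here.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("Chihuahua", "Tan"), ("Chow Chow", "Brown"), ("Labrador", "Black"),
     ("Labrador", "Brown"), ("Poodle", "Black"), ("Schnauzer", "Grey"),
     ("St. Bernard", "White")],
    names=["breed", "color"],
)
df = pd.DataFrame(
    {"name": ["Stella", "Lucy", "Max", "Bella", "Charlie", "Cooper", "Bernie"],
     "height_cm": [18, 46, 59, 56, 43, 49, 77],
     "weight_kg": [2, 22, 29, 25, 23, 17, 74]},
    index=index,
)

# Strings are matched against the outer level (breed), so nothing is found
outer_attempt = df.loc["Tan":"Grey"]
print(outer_attempt.empty)

# Assumption: pd.IndexSlice is used to slice the inner level (color) only,
# keeping every breed; the index must be sorted for this to work
idx = pd.IndexSlice
by_color = df.loc[idx[:, "Black":"Grey"], :]
print(by_color)
```

Checking `.empty` after a slice is a cheap way to catch this kind of silent mistake.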


### Slicing at inner index levels
The correct approach to slicing at inner index levels
is to pass the first and last positions as tuples.
Here, the first element to include is the tuple of Chihuahua and Tan, and the last is the tuple of Schnauzer and Grey.

In [9]:
df.loc[("Chihuahua", "Tan"):("Schnauzer", "Grey")]

breed,color,name,height_cm,weight_kg
Chihuahua,Tan,Stella,18,2
Chow Chow,Brown,Lucy,46,22
Labrador,Black,Max,59,29
Labrador,Brown,Bella,56,25
Poodle,Black,Charlie,43,23
Schnauzer,Grey,Cooper,49,17
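
As a rough illustration of what the inner value in each tuple changes, the sketch below compares an outer-level slice with a tuple slice over the same breeds. The variable names `outer_only` and `with_tuples` are chosen for this example.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("Chihuahua", "Tan"), ("Chow Chow", "Brown"), ("Labrador", "Black"),
     ("Labrador", "Brown"), ("Poodle", "Black"), ("Schnauzer", "Grey"),
     ("St. Bernard", "White")],
    names=["breed", "color"],
)
df = pd.DataFrame(
    {"name": ["Stella", "Lucy", "Max", "Bella", "Charlie", "Cooper", "Bernie"]},
    index=index,
)

# Outer-level strings keep every Labrador and every Poodle
outer_only = df.loc["Labrador":"Poodle"]
print(outer_only)

# Tuples pin down the exact first and last rows, so the black Labrador is skipped
with_tuples = df.loc[("Labrador", "Brown"):("Poodle", "Black")]
print(with_tuples)
```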


## Slicing columns
Since DataFrames are two-dimensional objects, you can also slice columns. You do this by passing two arguments to .loc. The simplest case involves subsetting columns but keeping all rows. To do this, pass a colon as the first argument to .loc. As with slicing lists, a colon by itself means "keep everything." The second argument takes column names as the first and last positions to slice on; as with rows, the last column named is included.

In [11]:
df.loc[:, "name":"height_cm"]

breed,color,name,height_cm
Chihuahua,Tan,Stella,18
Chow Chow,Brown,Lucy,46
Labrador,Black,Max,59
Labrador,Brown,Bella,56
Poodle,Black,Charlie,43
Schnauzer,Grey,Cooper,49
St. Bernard,White,Bernie,77


## Slice twice
You can slice on rows and columns at the same time: simply pass the appropriate slice to each argument. Here, a tuple-based row slice and the previous column slice are performed in a single call to .loc.

In [12]:
df.loc[
    ("Labrador", "Brown"):("Schnauzer", "Grey"),
    "name":"height_cm"
]


breed,color,name,height_cm
Labrador,Brown,Bella,56
Poodle,Black,Charlie,43
Schnauzer,Grey,Cooper,49
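
One way to convince yourself that the combined call does both jobs is to compare it with two chained calls. This is only a sketch: the `rows` and `cols` names are introduced here, and the built-in `slice()` objects are simply another spelling of the `first:last` syntax.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("Chihuahua", "Tan"), ("Chow Chow", "Brown"), ("Labrador", "Black"),
     ("Labrador", "Brown"), ("Poodle", "Black"), ("Schnauzer", "Grey"),
     ("St. Bernard", "White")],
    names=["breed", "color"],
)
df = pd.DataFrame(
    {"name": ["Stella", "Lucy", "Max", "Bella", "Charlie", "Cooper", "Bernie"],
     "height_cm": [18, 46, 59, 56, 43, 49, 77],
     "weight_kg": [2, 22, 29, 25, 23, 17, 74]},
    index=index,
)

# slice(a, b) is the object that a:b creates inside square brackets
rows = slice(("Labrador", "Brown"), ("Schnauzer", "Grey"))
cols = slice("name", "height_cm")

# Rows and columns in one call
combined = df.loc[rows, cols]

# The same selection done in two steps: rows first, then columns
chained = df.loc[rows].loc[:, cols]
print(combined.equals(chained))
```

The single call is usually easier to read than the chained form.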


## Slicing by dates
You slice dates with the same syntax as other types, provided the index holds datetime values. The first and last dates are passed as strings.
> pandas understands dates and works efficiently with them.

In [24]:
data = [
    {"date_of_birth": "2011-12-11", "name": "Cooper", "breed": "Schnauzer", "color": "Grey", "height_cm": 49, "weight_kg": 17},
    {"date_of_birth": "2013-07-01", "name": "Bella", "breed": "Labrador", "color": "Brown", "height_cm": 56, "weight_kg": 25},
    {"date_of_birth": "2014-08-25", "name": "Lucy", "breed": "Chow Chow", "color": "Brown", "height_cm": 46, "weight_kg": 22},
    {"date_of_birth": "2015-04-20", "name": "Stella", "breed": "Chihuahua", "color": "Tan", "height_cm": 18, "weight_kg": 2},
    {"date_of_birth": "2016-09-16", "name": "Charlie", "breed": "Poodle", "color": "Black", "height_cm": 43, "weight_kg": 23},
    {"date_of_birth": "2017-01-20", "name": "Max", "breed": "Labrador", "color": "Black", "height_cm": 59, "weight_kg": 29},
    {"date_of_birth": "2018-02-27", "name": "Bernie", "breed": "St. Bernard", "color": "White", "height_cm": 77, "weight_kg": 74},
]

dogs = pd.DataFrame(data)
# Parse the date strings into datetimes so the index supports date slicing
dogs['date_of_birth'] = pd.to_datetime(dogs['date_of_birth'])
dogs.set_index('date_of_birth', inplace=True)


In [25]:
dogs

date_of_birth,name,breed,color,height_cm,weight_kg
2011-12-11,Cooper,Schnauzer,Grey,49,17
2013-07-01,Bella,Labrador,Brown,56,25
2014-08-25,Lucy,Chow Chow,Brown,46,22
2015-04-20,Stella,Chihuahua,Tan,18,2
2016-09-16,Charlie,Poodle,Black,43,23
2017-01-20,Max,Labrador,Black,59,29
2018-02-27,Bernie,St. Bernard,White,77,74


In [None]:
dogs.loc["2014-08-25":"2016-09-16"]

date_of_birth,name,breed,color,height_cm,weight_kg
2014-08-25,Lucy,Chow Chow,Brown,46,22
2015-04-20,Stella,Chihuahua,Tan,18,2
2016-09-16,Charlie,Poodle,Black,43,23
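
The sketch below rebuilds a one-column version of `dogs` and shows two details that may be useful: the endpoints can be given as `pd.Timestamp` objects instead of strings, and they do not have to be dates that actually appear in the index. Both points assume the datetime index is sorted, as it is here.

```python
import pandas as pd

dogs = pd.DataFrame(
    {"name": ["Cooper", "Bella", "Lucy", "Stella", "Charlie", "Max", "Bernie"]},
    index=pd.to_datetime(
        ["2011-12-11", "2013-07-01", "2014-08-25", "2015-04-20",
         "2016-09-16", "2017-01-20", "2018-02-27"]
    ),
)
dogs.index.name = "date_of_birth"

# Date strings and Timestamp objects select the same rows
by_string = dogs.loc["2014-08-25":"2016-09-16"]
by_timestamp = dogs.loc[pd.Timestamp("2014-08-25"):pd.Timestamp("2016-09-16")]
print(by_string.equals(by_timestamp))

# Endpoints that are not in the index still work on a sorted datetime index
between = dogs.loc["2014-01-01":"2015-12-31"]
print(between)
```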


## Slicing by partial dates

One helpful feature is that you can slice by partial dates. Here, the first and last positions are only specified as 2014 and 2016, with no month or day parts. pandas interprets this as slicing from the start of 2014 to the end of 2016; that is, all dates in 2014, 2015, and 2016.

In [27]:
dogs.loc["2014":"2016"]

date_of_birth,name,breed,color,height_cm,weight_kg
2014-08-25,Lucy,Chow Chow,Brown,46,22
2015-04-20,Stella,Chihuahua,Tan,18,2
2016-09-16,Charlie,Poodle,Black,43,23
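
To check that reading of partial dates, this sketch compares the year-only slice with the fully spelled-out range, and also tries a year-and-month slice. It uses a one-column stand-in for `dogs`, rebuilt here so the example runs on its own.

```python
import pandas as pd

dogs = pd.DataFrame(
    {"name": ["Cooper", "Bella", "Lucy", "Stella", "Charlie", "Max", "Bernie"]},
    index=pd.to_datetime(
        ["2011-12-11", "2013-07-01", "2014-08-25", "2015-04-20",
         "2016-09-16", "2017-01-20", "2018-02-27"]
    ),
)
dogs.index.name = "date_of_birth"

# Year-only endpoints cover the whole of the first and last years
by_year = dogs.loc["2014":"2016"]
explicit = dogs.loc["2014-01-01":"2016-12-31"]
print(by_year.equals(explicit))

# Year-and-month endpoints cover the whole of the first and last months
by_month = dogs.loc["2014-08":"2015-04"]
print(by_month)
```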


## Subsetting by row/column number
You can also slice DataFrames by row or column number using .iloc. This uses a similar syntax to slicing lists, except that there are two arguments: one for rows and one for columns.

**Notice** that, like list slicing but unlike .loc, the final values aren't included in the slice. Positions start at zero, so in this case the rows at positions 2 to 4 and the columns at positions 1 to 3 are returned; the row at position 5 and the column at position 4 aren't included.

In [29]:
dogs.iloc[2:5, 1:4]

date_of_birth,breed,color,height_cm
2014-08-25,Chow Chow,Brown,46
2015-04-20,Chihuahua,Tan,18
2016-09-16,Poodle,Black,43
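
As a quick check of the half-open behaviour, the sketch below rebuilds `dogs` and compares the `.iloc` result with ordinary list slicing over the column names. The frame is reconstructed inline so the block is self-contained.

```python
import pandas as pd

dogs = pd.DataFrame(
    {"name": ["Cooper", "Bella", "Lucy", "Stella", "Charlie", "Max", "Bernie"],
     "breed": ["Schnauzer", "Labrador", "Chow Chow", "Chihuahua", "Poodle",
               "Labrador", "St. Bernard"],
     "color": ["Grey", "Brown", "Brown", "Tan", "Black", "Black", "White"],
     "height_cm": [49, 56, 46, 18, 43, 59, 77],
     "weight_kg": [17, 25, 22, 2, 23, 29, 74]},
    index=pd.to_datetime(
        ["2011-12-11", "2013-07-01", "2014-08-25", "2015-04-20",
         "2016-09-16", "2017-01-20", "2018-02-27"]
    ),
)
dogs.index.name = "date_of_birth"

# Rows at positions 2, 3, 4 and columns at positions 1, 2, 3
subset = dogs.iloc[2:5, 1:4]
print(subset.shape)

# The selected columns match a plain list slice over the column names
print(list(dogs.columns)[1:4])

# The row at position 5 is not part of the result
print(dogs.index[5] in subset.index)
```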


# Recap

## Slicing index values
Slicing lets you select consecutive elements of an object using first:last syntax. DataFrames can be sliced by index values, using .loc[], or by row/column number, using .iloc[].

Compared to slicing lists, there are a few things to remember when slicing by index values.

1. Sort the index before slicing (using .sort_index()); slicing an unsorted index can raise an error or return unexpected rows.
2. To slice at the outer level, first and last can be strings.
3. To slice at inner levels, first and last should be tuples.
4. If you pass a single slice to .loc[], it will slice the rows.
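
The examples in this notebook start from an index that is already in sorted order, so the first point never comes up in practice. The sketch below uses a deliberately shuffled, made-up frame (`shuffled`) to show the sort-then-slice pattern.

```python
import pandas as pd

shuffled = pd.DataFrame(
    {"name": ["Charlie", "Stella", "Bella", "Max"]},
    index=pd.MultiIndex.from_tuples(
        [("Poodle", "Black"), ("Chihuahua", "Tan"),
         ("Labrador", "Brown"), ("Labrador", "Black")],
        names=["breed", "color"],
    ),
)

# The rows are out of order, so the index is not yet safe to slice
print(shuffled.index.is_monotonic_increasing)

# sort_index() orders by breed first, then by color within each breed
ordered = shuffled.sort_index()
print(ordered.index.is_monotonic_increasing)

# With the index sorted, tuple slicing works as described above
subset = ordered.loc[("Labrador", "Black"):("Poodle", "Black")]
print(subset)
```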