# Pandas Part 77: IntervalIndex Methods and MultiIndex

This notebook explores methods of IntervalIndex and introduces the MultiIndex class.

In [1]:
import pandas as pd
import numpy as np

## 1. IntervalIndex Methods

Let's explore methods available on the IntervalIndex class.

### values Property

The `values` property returns the IntervalIndex's data as an IntervalArray.

In [2]:
# Create an IntervalIndex
interval_idx = pd.IntervalIndex.from_breaks([0, 1, 2, 3])
print(f"IntervalIndex: {interval_idx}")
print(f"Type: {type(interval_idx)}")

# Get values
values = interval_idx.values
print(f"\nValues: {values}")
print(f"Type: {type(values)}")

IntervalIndex: IntervalIndex([(0, 1], (1, 2], (2, 3]], dtype='interval[int64, right]')
Type: <class 'pandas.core.indexes.interval.IntervalIndex'>

Values: <IntervalArray>
[(0, 1], (1, 2], (2, 3]]
Length: 3, dtype: interval[int64, right]
Type: <class 'pandas.core.arrays.interval.IntervalArray'>
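
As a small follow-up sketch (the variable names here are illustrative, not part of the original notebook), the IntervalArray returned by `values` exposes the interval bounds and closedness as attributes, and indexing it yields scalar `pd.Interval` objects:

```python
import pandas as pd

idx = pd.IntervalIndex.from_breaks([0, 1, 2, 3])
arr = idx.values  # the IntervalArray backing the index

# Bounds and closedness are available as attributes
left_bounds = list(arr.left)
right_bounds = list(arr.right)
print(left_bounds, right_bounds, arr.closed)

# Each element is a scalar pd.Interval
first = arr[0]
print(first, first.left, first.right)
```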


### from_arrays Method

The `from_arrays` class method constructs an IntervalIndex from two arrays defining the left and right bounds.

In [3]:
# Create arrays for left and right bounds
left = [0, 1, 2]
right = [1, 2, 3]

# Create IntervalIndex with default closed='right'
interval_idx = pd.IntervalIndex.from_arrays(left, right)
print(f"IntervalIndex (closed='right'): {interval_idx}")

# Create IntervalIndex with closed='left'
interval_idx = pd.IntervalIndex.from_arrays(left, right, closed='left')
print(f"\nIntervalIndex (closed='left'): {interval_idx}")

# Create IntervalIndex with closed='both'
interval_idx = pd.IntervalIndex.from_arrays(left, right, closed='both')
print(f"\nIntervalIndex (closed='both'): {interval_idx}")

# Create IntervalIndex with closed='neither'
interval_idx = pd.IntervalIndex.from_arrays(left, right, closed='neither')
print(f"\nIntervalIndex (closed='neither'): {interval_idx}")

IntervalIndex (closed='right'): IntervalIndex([(0, 1], (1, 2], (2, 3]], dtype='interval[int64, right]')

IntervalIndex (closed='left'): IntervalIndex([[0, 1), [1, 2), [2, 3)], dtype='interval[int64, left]')

IntervalIndex (closed='both'): IntervalIndex([[0, 1], [1, 2], [2, 3]], dtype='interval[int64, both]')

IntervalIndex (closed='neither'): IntervalIndex([(0, 1), (1, 2), (2, 3)], dtype='interval[int64, neither]')


In [4]:
# Create IntervalIndex with a name
interval_idx = pd.IntervalIndex.from_arrays(left, right, name='intervals')
print(f"IntervalIndex with name: {interval_idx}")
print(f"Name: {interval_idx.name}")

IntervalIndex with name: IntervalIndex([(0, 1], (1, 2], (2, 3]], dtype='interval[int64, right]', name='intervals')
Name: intervals
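
The sketch below is an illustrative addition, not part of the original notebook. It suggests how `from_arrays` relates to `from_breaks`: for adjacent intervals both build the same index, but `from_arrays` can also describe intervals with gaps. It also shows that a left bound greater than its right bound is rejected with a `ValueError`.

```python
import pandas as pd

# The same adjacent intervals, built two ways
from_arrays = pd.IntervalIndex.from_arrays([0, 1, 2], [1, 2, 3])
from_breaks = pd.IntervalIndex.from_breaks([0, 1, 2, 3])
same = from_arrays.equals(from_breaks)
print(f"Equal: {same}")

# from_arrays can describe intervals with gaps between them
gapped = pd.IntervalIndex.from_arrays([0, 5], [1, 10])
print(f"Gapped: {gapped}")

# A left bound greater than its right bound is rejected
try:
    pd.IntervalIndex.from_arrays([2], [1])
    rejected = False
except ValueError as exc:
    rejected = True
    print(f"ValueError: {exc}")
```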


### from_tuples Method

The `from_tuples` class method constructs an IntervalIndex from an array-like of tuples.

In [5]:
# Create tuples for intervals
tuples = [(0, 1), (1, 2), (2, 3)]

# Create IntervalIndex from tuples
interval_idx = pd.IntervalIndex.from_tuples(tuples)
print(f"IntervalIndex from tuples: {interval_idx}")

# Create IntervalIndex with closed='both'
interval_idx = pd.IntervalIndex.from_tuples(tuples, closed='both')
print(f"\nIntervalIndex from tuples (closed='both'): {interval_idx}")

IntervalIndex from tuples: IntervalIndex([(0, 1], (1, 2], (2, 3]], dtype='interval[int64, right]')

IntervalIndex from tuples (closed='both'): IntervalIndex([[0, 1], [1, 2], [2, 3]], dtype='interval[int64, both]')
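
Because each tuple is independent, `from_tuples` can also produce overlapping intervals. The following sketch (an illustrative addition) uses the `is_overlapping` property to check for this. Note that adjacent intervals only count as overlapping when the shared endpoint is closed on both sides.

```python
import pandas as pd

# Adjacent right-closed intervals share only a half-open endpoint
disjoint = pd.IntervalIndex.from_tuples([(0, 1), (1, 2), (2, 3)])

# These two intervals share the range (1, 2]
overlapping = pd.IntervalIndex.from_tuples([(0, 2), (1, 3)])

# With closed='both', adjacent intervals share a closed endpoint
touching = pd.IntervalIndex.from_tuples([(0, 1), (1, 2)], closed="both")

print(disjoint.is_overlapping)
print(overlapping.is_overlapping)
print(touching.is_overlapping)
```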


### from_breaks Method

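The `from_breaks` class method constructs an IntervalIndex from an array of break points. Each pair of consecutive breaks becomes one interval, so `n` breaks yield `n - 1` adjacent intervals.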

In [6]:
# Create breaks
breaks = [0, 1, 2, 3]

# Create IntervalIndex from breaks
interval_idx = pd.IntervalIndex.from_breaks(breaks)
print(f"IntervalIndex from breaks: {interval_idx}")

# Create IntervalIndex with closed='left'
interval_idx = pd.IntervalIndex.from_breaks(breaks, closed='left')
print(f"\nIntervalIndex from breaks (closed='left'): {interval_idx}")

IntervalIndex from breaks: IntervalIndex([(0, 1], (1, 2], (2, 3]], dtype='interval[int64, right]')

IntervalIndex from breaks (closed='left'): IntervalIndex([[0, 1), [1, 2), [2, 3)], dtype='interval[int64, left]')
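
One common use for an index built from breaks is as the `bins` argument of `pd.cut`. The sketch below is an illustrative addition with made-up sample values. It bins a few numbers and shows that values outside every interval get the category code `-1` (that is, NaN).

```python
import pandas as pd

bins = pd.IntervalIndex.from_breaks([0, 1, 2, 3])
print(f"4 breaks -> {len(bins)} intervals")

# Assign each value to the interval that contains it
sample = [0.5, 1.5, 2.5, 5.0]  # 5.0 lies outside every bin
binned = pd.cut(sample, bins=bins)
codes = binned.codes.tolist()

print(binned)
print(f"Codes: {codes}")
```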


### contains Method

The `contains` method checks, for each interval in the index, whether it contains the given scalar value, and returns a boolean array with one entry per interval.

In [7]:
# Create an IntervalIndex
interval_idx = pd.IntervalIndex.from_breaks([0, 1, 2, 3])
print(f"IntervalIndex: {interval_idx}")

# Check if intervals contain specific values
contains_0_5 = interval_idx.contains(0.5)
print(f"\nContains 0.5: {contains_0_5}")

contains_1 = interval_idx.contains(1)
print(f"Contains 1: {contains_1}")

contains_3 = interval_idx.contains(3)
print(f"Contains 3: {contains_3}")

IntervalIndex: IntervalIndex([(0, 1], (1, 2], (2, 3]], dtype='interval[int64, right]')

Contains 0.5: [ True False False]
Contains 1: [ True False False]
Contains 3: [False False  True]
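
The boolean mask from `contains` can be turned into positions with NumPy. As a rough sketch (an illustrative addition), this compares that approach with `get_loc`, which returns the position directly when exactly one interval matches:

```python
import numpy as np
import pandas as pd

idx = pd.IntervalIndex.from_breaks([0, 1, 2, 3])

# Boolean mask: one entry per interval
mask = idx.contains(1.5)

# Positions of the intervals that contain the value
positions = np.flatnonzero(mask)
print(f"Mask: {mask}, positions: {positions}")

# For non-overlapping intervals, get_loc gives the single position
pos = idx.get_loc(1.5)
print(f"get_loc(1.5): {pos}")
```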


### overlaps Method

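The `overlaps` method checks, for each interval in the IntervalIndex, whether it overlaps a given Interval. Two intervals overlap if they share at least one point, so intervals that touch only at an open endpoint do not overlap.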

In [8]:
# Create an IntervalIndex
interval_idx = pd.IntervalIndex.from_breaks([0, 1, 2, 3])
print(f"IntervalIndex: {interval_idx}")

# Check if intervals overlap with other intervals
other = pd.Interval(0.5, 1.5)
overlaps = interval_idx.overlaps(other)
print(f"\nOverlaps with {other}: {overlaps}")

other = pd.Interval(1, 2)
overlaps = interval_idx.overlaps(other)
print(f"Overlaps with {other}: {overlaps}")

other = pd.Interval(10, 20)
overlaps = interval_idx.overlaps(other)
print(f"Overlaps with {other}: {overlaps}")

IntervalIndex: IntervalIndex([(0, 1], (1, 2], (2, 3]], dtype='interval[int64, right]')

Overlaps with (0.5, 1.5]: [ True  True False]
Overlaps with (1, 2]: [False  True False]
Overlaps with (10, 20]: [False False False]
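
Whether touching intervals overlap depends on which endpoints are closed. The sketch below is an illustrative addition. It repeats the `(1, 2]` check with differently closed versions of the same interval.

```python
import pandas as pd

idx = pd.IntervalIndex.from_breaks([0, 1, 2, 3])  # (0, 1], (1, 2], (2, 3]

# [1, 2] includes 1, and so does (0, 1], so they share a point
closed_both = pd.Interval(1, 2, closed="both")
result_both = idx.overlaps(closed_both)
print(f"Overlaps with {closed_both}: {result_both}")

# (1, 2) excludes both endpoints, so only (1, 2] overlaps it
closed_neither = pd.Interval(1, 2, closed="neither")
result_neither = idx.overlaps(closed_neither)
print(f"Overlaps with {closed_neither}: {result_neither}")
```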


### set_closed Method

The `set_closed` method returns a new IntervalIndex with the same bounds as the current one, but closed on the specified side. The original index is not modified.

In [9]:
# Create an IntervalIndex
interval_idx = pd.IntervalIndex.from_breaks([0, 1, 2, 3])
print(f"Original IntervalIndex (closed='right'): {interval_idx}")

# Change closed to 'left'
left_closed = interval_idx.set_closed('left')
print(f"\nClosed='left': {left_closed}")

# Change closed to 'both'
both_closed = interval_idx.set_closed('both')
print(f"\nClosed='both': {both_closed}")

# Change closed to 'neither'
neither_closed = interval_idx.set_closed('neither')
print(f"\nClosed='neither': {neither_closed}")

Original IntervalIndex (closed='right'): IntervalIndex([(0, 1], (1, 2], (2, 3]], dtype='interval[int64, right]')

Closed='left': IntervalIndex([[0, 1), [1, 2), [2, 3)], dtype='interval[int64, left]')

Closed='both': IntervalIndex([[0, 1], [1, 2], [2, 3]], dtype='interval[int64, both]')

Closed='neither': IntervalIndex([(0, 1), (1, 2), (2, 3)], dtype='interval[int64, neither]')
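
Changing the closed side changes which interval a boundary value belongs to. As a small sketch (an illustrative addition), compare `contains(1)` before and after `set_closed('left')`:

```python
import pandas as pd

original = pd.IntervalIndex.from_breaks([0, 1, 2, 3])  # closed='right'
left_closed = original.set_closed("left")

# The boundary value 1 belongs to (0, 1] but to [1, 2)
in_original = original.contains(1)
in_left_closed = left_closed.contains(1)
print(f"closed='right': {in_original}")
print(f"closed='left':  {in_left_closed}")

# set_closed returned a new object; the original is unchanged
print(f"Original closed side: {original.closed}")
```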


### to_tuples Method

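The `to_tuples` method returns the intervals as tuples of the form (left, right). On an IntervalIndex the result is an object-dtype Index of tuples; on an IntervalArray it is an ndarray.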

In [10]:
# Create an IntervalIndex
interval_idx = pd.IntervalIndex.from_breaks([0, 1, 2, 3])
print(f"IntervalIndex: {interval_idx}")

# Convert to tuples
tuples = interval_idx.to_tuples()
print(f"\nTuples: {tuples}")
print(f"Type: {type(tuples)}")
print(f"First tuple: {tuples[0]}, type: {type(tuples[0])}")

IntervalIndex: IntervalIndex([(0, 1], (1, 2], (2, 3]], dtype='interval[int64, right]')

Tuples: Index([(0, 1), (1, 2), (2, 3)], dtype='object')
Type: <class 'pandas.core.indexes.base.Index'>
First tuple: (np.int64(0), np.int64(1)), type: <class 'tuple'>
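
The tuples carry only the bounds, not the closed side. The following sketch (an illustrative addition) round-trips a left-closed index through `to_tuples` and `from_tuples` and shows that `closed` has to be passed explicitly to recover the original:

```python
import pandas as pd

idx = pd.IntervalIndex.from_breaks([0, 1, 2, 3], closed="left")

# Plain Python list of (left, right) pairs
pairs = list(idx.to_tuples())
print(f"Pairs: {pairs}")

# Rebuilding with the default closed='right' loses the closed side
default_rebuilt = pd.IntervalIndex.from_tuples(pairs)
print(f"Default rebuild equals original: {default_rebuilt.equals(idx)}")

# Passing closed explicitly restores the original index
rebuilt = pd.IntervalIndex.from_tuples(pairs, closed=idx.closed)
print(f"Explicit rebuild equals original: {rebuilt.equals(idx)}")
```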


## 2. MultiIndex

MultiIndex is a multi-level (hierarchical) index for an axis of a DataFrame or Series. Each entry is a tuple of labels, one label per level.

### Creating MultiIndex

In [11]:
# Create a MultiIndex from arrays
arrays = [["a", "a", "b", "b"], ["one", "two", "one", "two"]]
multi_idx = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
print(f"MultiIndex from arrays: {multi_idx}")

# Create a MultiIndex from tuples
tuples = [("a", "one"), ("a", "two"), ("b", "one"), ("b", "two")]
multi_idx = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
print(f"\nMultiIndex from tuples: {multi_idx}")

# Create a MultiIndex from product
iterables = [["a", "b"], ["one", "two"]]
multi_idx = pd.MultiIndex.from_product(iterables, names=["first", "second"])
print(f"\nMultiIndex from product: {multi_idx}")

MultiIndex from arrays: MultiIndex([('a', 'one'),
            ('a', 'two'),
            ('b', 'one'),
            ('b', 'two')],
           names=['first', 'second'])

MultiIndex from tuples: MultiIndex([('a', 'one'),
            ('a', 'two'),
            ('b', 'one'),
            ('b', 'two')],
           names=['first', 'second'])

MultiIndex from product: MultiIndex([('a', 'one'),
            ('a', 'two'),
            ('b', 'one'),
            ('b', 'two')],
           names=['first', 'second'])
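
A fourth constructor, `MultiIndex.from_frame`, builds the index from the columns of a DataFrame and takes the level names from the column names. The sketch below is an illustrative addition. It checks that it produces the same index as `from_product` for this data.

```python
import pandas as pd

# Each column becomes one level; column names become level names
frame = pd.DataFrame(
    {"first": ["a", "a", "b", "b"], "second": ["one", "two", "one", "two"]}
)
from_frame = pd.MultiIndex.from_frame(frame)

from_product = pd.MultiIndex.from_product(
    [["a", "b"], ["one", "two"]], names=["first", "second"]
)

same = from_frame.equals(from_product)
level_names = list(from_frame.names)
print(f"Levels: {from_frame.nlevels}, names: {level_names}, equal: {same}")
```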


### Using MultiIndex with Series and DataFrame

In [12]:
# Create a Series with MultiIndex
multi_idx = pd.MultiIndex.from_product([["a", "b"], ["one", "two"]], names=["first", "second"])
s = pd.Series(range(4), index=multi_idx)
print("Series with MultiIndex:")
print(s)

# Create a DataFrame with MultiIndex
df = pd.DataFrame({"A": range(4), "B": range(4, 8)}, index=multi_idx)
print("\nDataFrame with MultiIndex:")
print(df)

Series with MultiIndex:
first  second
a      one       0
       two       1
b      one       2
       two       3
dtype: int64

DataFrame with MultiIndex:
              A  B
first second      
a     one     0  4
      two     1  5
b     one     2  6
      two     3  7
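
With a MultiIndex in place, rows can be selected by partial or complete labels. As a rough sketch (an illustrative addition reusing the same data), `.loc` selects by the outer level or by a full tuple, and `.xs` takes a cross-section on an inner level:

```python
import pandas as pd

multi_idx = pd.MultiIndex.from_product(
    [["a", "b"], ["one", "two"]], names=["first", "second"]
)
s = pd.Series(range(4), index=multi_idx)
df = pd.DataFrame({"A": range(4), "B": range(4, 8)}, index=multi_idx)

# Partial label: all rows under first == 'a', indexed by 'second'
inner = s.loc["a"]
print(inner)

# Complete label: a single value
scalar = s.loc[("a", "two")]
print(f"\ns.loc[('a', 'two')] = {scalar}")

# Cross-section on the inner level
cross = df.xs("one", level="second")
print(cross)
```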


### get_loc_level Method

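The `get_loc_level` method finds the location of a label (or sequence of labels) at the specified level. It returns a tuple: the location (an integer, slice, or boolean mask) and the index that remains after the matched level is dropped, or `None` when the key covers every level.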

In [13]:
# Create a MultiIndex
mi = pd.MultiIndex.from_arrays([list('abb'), list('def')], names=['A', 'B'])
print(f"MultiIndex: {mi}")

# Get location for a label at the first level
loc, idx = mi.get_loc_level('b')
print(f"\nLocation of 'b': {loc}")
print(f"Remaining index: {idx}")

# Get location for a label at the second level
loc, idx = mi.get_loc_level('e', level='B')
print(f"\nLocation of 'e' at level 'B': {loc}")
print(f"Remaining index: {idx}")

# Get location for a complete label
loc, idx = mi.get_loc_level(['b', 'e'])
print(f"\nLocation of ['b', 'e']: {loc}")
print(f"Remaining index: {idx}")

MultiIndex: MultiIndex([('a', 'd'),
            ('b', 'e'),
            ('b', 'f')],
           names=['A', 'B'])

Location of 'b': slice(np.int64(1), np.int64(3), None)
Remaining index: Index(['e', 'f'], dtype='object', name='B')

Location of 'e' at level 'B': [False  True False]
Remaining index: Index(['b'], dtype='object', name='A')

Location of ['b', 'e']: 1
Remaining index: None
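
The location returned by `get_loc_level` can be passed to `iloc` to select the matching rows. The sketch below is an illustrative addition; the `value` column is made up for the example. It also shows `drop_level=False`, which keeps the matched level in the returned index.

```python
import pandas as pd

mi = pd.MultiIndex.from_arrays([list("abb"), list("def")], names=["A", "B"])
df = pd.DataFrame({"value": [10, 20, 30]}, index=mi)  # hypothetical data

# Use the returned location to select rows positionally
loc, remaining = mi.get_loc_level("b")
subset = df.iloc[loc]
print(subset)
print(f"Remaining labels: {list(remaining)}")

# Keep the matched level in the returned index
loc_kept, kept = mi.get_loc_level("b", drop_level=False)
print(f"Levels kept: {kept.nlevels}, entries: {list(kept)}")
```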


### get_indexer Method

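The `get_indexer` method returns, for each label in a target, its integer position in the current index. Labels with no match are marked with `-1`. With `method='pad'`, a label without an exact match takes the position of the closest preceding index entry instead.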

In [14]:
# Create a MultiIndex
mi = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 3)])
print(f"MultiIndex: {mi}")

# Get indexer for exact matches
target = [('b', 2), ('c', 3), ('x', 1)]
indexer = mi.get_indexer(target)
print(f"\nIndexer for {target}: {indexer}")

# Get indexer with method='pad'
target = [('a', 3), ('d', 1)]
indexer = mi.get_indexer(target, method='pad')
print(f"\nIndexer for {target} with method='pad': {indexer}")
print(f"Values at these locations: {[mi[i] if i != -1 else None for i in indexer]}")

MultiIndex: MultiIndex([('a', 1),
            ('b', 2),
            ('c', 3)],
           )

Indexer for [('b', 2), ('c', 3), ('x', 1)]: [ 1  2 -1]

Indexer for [('a', 3), ('d', 1)] with method='pad': [0 2]
Values at these locations: [('a', np.int64(1)), ('c', np.int64(3))]
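
Since `-1` is also a valid positional index in Python (it selects the last element), it is safer to mask out missing entries before using the indexer. A minimal sketch of that pattern (an illustrative addition):

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples([("a", 1), ("b", 2), ("c", 3)])
target = [("b", 2), ("c", 3), ("x", 1)]

indexer = mi.get_indexer(target)

# True where the target label exists in the index
found = indexer != -1

# Only index with positions that are real matches
matched = [mi[i] for i in indexer[found]]
missing = [label for label, ok in zip(target, found) if not ok]

print(f"Matched: {matched}")
print(f"Missing: {missing}")
```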


### get_level_values Method

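The `get_level_values` method returns an Index with the label of the requested level for every entry, so its length equals the length of the MultiIndex. The level can be given by position or by name.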

In [15]:
# Create a MultiIndex
mi = pd.MultiIndex.from_arrays((list('abc'), list('def')))
mi.names = ['level_1', 'level_2']
print(f"MultiIndex: {mi}")

# Get level values by level number
level_0_values = mi.get_level_values(0)
print(f"\nLevel 0 values: {level_0_values}")

# Get level values by level name
level_2_values = mi.get_level_values('level_2')
print(f"Level 'level_2' values: {level_2_values}")

MultiIndex: MultiIndex([('a', 'd'),
            ('b', 'e'),
            ('c', 'f')],
           names=['level_1', 'level_2'])

Level 0 values: Index(['a', 'b', 'c'], dtype='object', name='level_1')
Level 'level_2' values: Index(['d', 'e', 'f'], dtype='object', name='level_2')


In [16]:
# Create a DataFrame with MultiIndex
index = pd.MultiIndex.from_product([['A', 'B'], ['one', 'two']], names=['letter', 'number'])
df = pd.DataFrame({'value': [1, 2, 3, 4]}, index=index)
print("DataFrame with MultiIndex:")
print(df)

# Get level values
print("\nLevel 'letter' values:")
print(df.index.get_level_values('letter'))

print("\nLevel 'number' values:")
print(df.index.get_level_values('number'))

DataFrame with MultiIndex:
               value
letter number       
A      one         1
       two         2
B      one         3
       two         4

Level 'letter' values:
Index(['A', 'A', 'B', 'B'], dtype='object', name='letter')

Level 'number' values:
Index(['one', 'two', 'one', 'two'], dtype='object', name='number')
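
Because `get_level_values` returns one label per row, it works well for boolean filtering. The sketch below is an illustrative addition. It filters rows on the inner level, contrasts the result with the `levels` attribute (which holds only the unique labels), and groups by a level.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["A", "B"], ["one", "two"]], names=["letter", "number"]
)
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=index)

# One label per row, so it can be compared elementwise
ones = df[df.index.get_level_values("number") == "one"]
print(ones)

# levels holds the unique labels of each level, not one per row
unique_letters = list(df.index.levels[0])
print(f"\nUnique letters: {unique_letters}")

# Aggregate by a level
totals = df.groupby(level="letter")["value"].sum()
print(totals)
```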
