#### Pandas Part 76: More Index Methods and IntervalIndex

This notebook explores additional Index methods and the IntervalIndex class.

In [1]:
import pandas as pd
import numpy as np

##### 1. Additional Index Methods

Let's explore more methods available on pandas Index objects.

### to_series Method

The `to_series` method creates a Series with both index and values equal to the index keys.

In [2]:
# Create an Index
idx = pd.Index(['a', 'b', 'c', 'd'])
print(f"Index: {idx}")

# Convert to Series
series = idx.to_series()
print("\nSeries with default parameters:")
print(series)

# Convert to Series with custom index
series = idx.to_series(index=[10, 20, 30, 40])
print("\nSeries with custom index:")
print(series)

# Convert to Series with custom name
series = idx.to_series(name='letters')
print("\nSeries with custom name:")
print(series)

Index: Index(['a', 'b', 'c', 'd'], dtype='object')

Series with default parameters:
a    a
b    b
c    c
d    d
dtype: object

Series with custom index:
10    a
20    b
30    c
40    d
dtype: object

Series with custom name:
a    a
b    b
c    c
d    d
Name: letters, dtype: object
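
One place `to_series` comes in handy is when you want to run a Series-only operation on the index values themselves. The sketch below uses hypothetical dates (not from the cells above) and `Series.diff` to find gaps between consecutive index labels; treat it as an illustration rather than the only way to do this.

```python
import pandas as pd

# Hypothetical dates with a gap between the second and third entries
dates = pd.DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-05'])

# to_series() yields a Series of the index values, so Series.diff() applies
gaps = dates.to_series().diff()
print(gaps)

# Labels that follow a gap of more than one day
late = gaps[gaps > pd.Timedelta(days=1)].index
print(late)
```

Because the default result is indexed by the labels themselves, the filtered Series hands the matching labels straight back through `.index`.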


### tolist Method

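The `tolist` method returns a list of the values in the Index. Each element is converted to a scalar type: a built-in Python scalar for strings, integers, and floats, or a pandas scalar such as `Timestamp` for datetime-like values.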

In [3]:
# Create different types of indices
str_idx = pd.Index(['a', 'b', 'c'])
int_idx = pd.Index([1, 2, 3])
float_idx = pd.Index([1.1, 2.2, 3.3])
date_idx = pd.date_range('2023-01-01', periods=3)

# Convert to lists
str_list = str_idx.tolist()
int_list = int_idx.tolist()
float_list = float_idx.tolist()
date_list = date_idx.tolist()

print(f"String index to list: {str_list}, type: {type(str_list[0])}")
print(f"Integer index to list: {int_list}, type: {type(int_list[0])}")
print(f"Float index to list: {float_list}, type: {type(float_list[0])}")
print(f"Date index to list: {date_list}, type: {type(date_list[0])}")

String index to list: ['a', 'b', 'c'], type: <class 'str'>
Integer index to list: [1, 2, 3], type: <class 'int'>
Float index to list: [1.1, 2.2, 3.3], type: <class 'float'>
Date index to list: [Timestamp('2023-01-01 00:00:00'), Timestamp('2023-01-02 00:00:00'), Timestamp('2023-01-03 00:00:00')], type: <class 'pandas._libs.tslibs.timestamps.Timestamp'>
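
To see why the conversion matters, it may help to compare an element taken from the underlying NumPy array with one taken from `tolist()`. The following is a small sketch; using `json.dumps` as the consumer is just an assumed example of code that expects built-in Python types.

```python
import json
import pandas as pd

int_idx = pd.Index([1, 2, 3])

# Elements pulled from the underlying NumPy array are NumPy scalars
from_values = int_idx.values[0]
print(type(from_values))

# tolist() converts each element to a built-in Python int
from_list = int_idx.tolist()[0]
print(type(from_list))

# The plain list can be passed straight to json.dumps
encoded = json.dumps(int_idx.tolist())
print(encoded)
```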


### transpose Method

The `transpose` method returns the transpose, which for an Index is just itself.

In [4]:
# Create an Index
idx = pd.Index(['a', 'b', 'c'])
print(f"Original index: {idx}")

# Get the transpose
transposed = idx.transpose()
print(f"Transposed index: {transposed}")

# Check if they are the same object
print(f"Are they the same object? {idx is transposed}")

Original index: Index(['a', 'b', 'c'], dtype='object')
Transposed index: Index(['a', 'b', 'c'], dtype='object')
Are they the same object? True


### union Method

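The `union` method returns a new Index containing every element that appears in either of two Index objects. By default (`sort=None`) the result is sorted when the values can be compared.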

In [5]:
# Create two indices with matching dtypes
idx1 = pd.Index([1, 2, 3, 4])
idx2 = pd.Index([3, 4, 5, 6])
print(f"idx1: {idx1}")
print(f"idx2: {idx2}")

# Find the union
union = idx1.union(idx2)
print(f"\nUnion: {union}")

idx1: Index([1, 2, 3, 4], dtype='int64')
idx2: Index([3, 4, 5, 6], dtype='int64')

Union: Index([1, 2, 3, 4, 5, 6], dtype='int64')


In [6]:
# Create two indices with mismatched dtypes
idx1 = pd.Index(['a', 'b', 'c', 'd'])
idx2 = pd.Index([1, 2, 3, 4])
print(f"idx1: {idx1}, dtype: {idx1.dtype}")
print(f"idx2: {idx2}, dtype: {idx2.dtype}")

# Find the union
union = idx1.union(idx2)
print(f"\nUnion: {union}")
print(f"Union dtype: {union.dtype}")

idx1: Index(['a', 'b', 'c', 'd'], dtype='object'), dtype: object
idx2: Index([1, 2, 3, 4], dtype='int64'), dtype: int64

Union: Index(['a', 'b', 'c', 'd', 1, 2, 3, 4], dtype='object')
Union dtype: object


In [7]:
# Union with sort parameter
idx1 = pd.Index([5, 3, 1, 4, 2])
idx2 = pd.Index([4, 3, 6, 5])
print(f"idx1: {idx1}")
print(f"idx2: {idx2}")

# Default: sort=None
union = idx1.union(idx2)
print(f"\nUnion (sort=None): {union}")

# With sort=False
union = idx1.union(idx2, sort=False)
print(f"Union (sort=False): {union}")

idx1: Index([5, 3, 1, 4, 2], dtype='int64')
idx2: Index([4, 3, 6, 5], dtype='int64')

Union (sort=None): Index([1, 2, 3, 4, 5, 6], dtype='int64')
Union (sort=False): Index([5, 3, 1, 4, 2, 6], dtype='int64')
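
The default `sort=None` does not always sort. According to the pandas documentation, the result is left unsorted when the two indexes are equal or when one of them is empty. Here is a minimal sketch of those cases, with made-up values:

```python
import pandas as pd

idx = pd.Index([3, 1, 2])

# Equal indexes: the default sort=None leaves the original order untouched
same = idx.union(pd.Index([3, 1, 2]))
print(same)

# Empty other (dtype given explicitly so both sides match): also left unsorted
with_empty = idx.union(pd.Index([], dtype='int64'))
print(with_empty)

# Any other comparable input is sorted
mixed = idx.union(pd.Index([0]))
print(mixed)
```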


### unique Method

The `unique` method returns unique values in the index in order of appearance.

In [8]:
# Create an Index with duplicates
idx = pd.Index(['a', 'b', 'c', 'a', 'b', 'd'])
print(f"Index with duplicates: {idx}")

# Get unique values
unique_idx = idx.unique()
print(f"\nUnique values: {unique_idx}")

Index with duplicates: Index(['a', 'b', 'c', 'a', 'b', 'd'], dtype='object')

Unique values: Index(['a', 'b', 'c', 'd'], dtype='object')
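
`unique` is closely related to two other Index members, `is_unique` and `duplicated`. The sketch below shows one way to reproduce the result of `unique` by masking out repeats; it is meant only to illustrate how the pieces relate.

```python
import pandas as pd

idx = pd.Index(['a', 'b', 'c', 'a', 'b', 'd'])

# is_unique reports whether the index has no repeated labels
print(idx.is_unique)

# duplicated() marks every repeat after the first occurrence (keep='first')
dup_mask = idx.duplicated()
print(dup_mask)

# Dropping the marked positions matches unique(), in order of appearance
deduped = idx[~dup_mask]
print(deduped.equals(idx.unique()))
```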


##### 2. IntervalIndex

An IntervalIndex is an immutable Index of Interval objects, each representing a range between two endpoints. All intervals in one IntervalIndex are closed on the same side.

### Creating IntervalIndex

In [9]:
# Create an IntervalIndex from a list of Interval objects
intervals = [pd.Interval(0, 1), pd.Interval(1, 2), pd.Interval(2, 3)]
interval_idx = pd.IntervalIndex(intervals)
print(f"IntervalIndex from list of Intervals: {interval_idx}")

# Create an IntervalIndex from arrays
left = [0, 1, 2]
right = [1, 2, 3]
interval_idx = pd.IntervalIndex.from_arrays(left, right)
print(f"\nIntervalIndex from arrays: {interval_idx}")

# Create an IntervalIndex from breaks
breaks = [0, 1, 2, 3]
interval_idx = pd.IntervalIndex.from_breaks(breaks)
print(f"\nIntervalIndex from breaks: {interval_idx}")

IntervalIndex from list of Intervals: IntervalIndex([(0, 1], (1, 2], (2, 3]], dtype='interval[int64, right]')

IntervalIndex from arrays: IntervalIndex([(0, 1], (1, 2], (2, 3]], dtype='interval[int64, right]')

IntervalIndex from breaks: IntervalIndex([(0, 1], (1, 2], (2, 3]], dtype='interval[int64, right]')
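
A few other constructors build the same index. The sketch below uses `IntervalIndex.from_tuples`, `pd.interval_range`, and the `closed` argument of `from_breaks`; the variable names are only for illustration.

```python
import pandas as pd

from_breaks = pd.IntervalIndex.from_breaks([0, 1, 2, 3])

# From a list of (left, right) tuples
from_tuples = pd.IntervalIndex.from_tuples([(0, 1), (1, 2), (2, 3)])
print(from_tuples.equals(from_breaks))

# From a start and end point, with a default step of 1
from_range = pd.interval_range(start=0, end=3)
print(from_range.equals(from_breaks))

# The closed argument selects which side of each interval is included
left_closed = pd.IntervalIndex.from_breaks([0, 1, 2, 3], closed='left')
print(left_closed)
```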


### IntervalIndex Properties

IntervalIndex provides various properties to access its components.

In [10]:
# Create an IntervalIndex
interval_idx = pd.IntervalIndex.from_breaks([0, 1, 2, 3])
print(f"IntervalIndex: {interval_idx}")

# Get left endpoints
left = interval_idx.left
print(f"\nLeft endpoints: {left}")

# Get right endpoints
right = interval_idx.right
print(f"Right endpoints: {right}")

# Get closed attribute
closed = interval_idx.closed
print(f"Closed: {closed}")

# Get midpoints
mid = interval_idx.mid
print(f"Midpoints: {mid}")

# Get lengths
length = interval_idx.length
print(f"Lengths: {length}")

IntervalIndex: IntervalIndex([(0, 1], (1, 2], (2, 3]], dtype='interval[int64, right]')

Left endpoints: Index([0, 1, 2], dtype='int64')
Right endpoints: Index([1, 2, 3], dtype='int64')
Closed: right
Midpoints: Index([0.5, 1.5, 2.5], dtype='float64')
Lengths: Index([1, 1, 1], dtype='int64')
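
The `mid` and `length` properties follow directly from the endpoints. As a quick sanity check, this sketch recomputes them by hand from `left` and `right`:

```python
import pandas as pd

interval_idx = pd.IntervalIndex.from_breaks([0, 1, 2, 3])

# Midpoint is the average of the two endpoints
manual_mid = (interval_idx.left + interval_idx.right) / 2
print(manual_mid.equals(interval_idx.mid))

# Length is the distance between the endpoints
manual_length = interval_idx.right - interval_idx.left
print(manual_length.equals(interval_idx.length))
```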


### is_empty Property

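The `is_empty` property returns a boolean array indicating which intervals are empty, meaning they contain no points. An interval is empty only when its endpoints are equal and at least one side is open.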

In [12]:
# Create separate IntervalIndex objects for each closed option
right_intervals = pd.IntervalIndex([pd.Interval(0, 1, closed='right'), pd.Interval(0, 0, closed='right')])
print(f"right_intervals: {right_intervals}")
print(f"is_empty: {right_intervals.is_empty}")

left_intervals = pd.IntervalIndex([pd.Interval(0, 0, closed='left')])
print(f"\nleft_intervals: {left_intervals}")
print(f"is_empty: {left_intervals.is_empty}")

neither_intervals = pd.IntervalIndex([pd.Interval(0, 0, closed='neither')])
print(f"\nneither_intervals: {neither_intervals}")
print(f"is_empty: {neither_intervals.is_empty}")

both_intervals = pd.IntervalIndex([pd.Interval(0, 0, closed='both')])
print(f"\nboth_intervals: {both_intervals}")
print(f"is_empty: {both_intervals.is_empty}")

right_intervals: IntervalIndex([(0, 1], (0, 0]], dtype='interval[int64, right]')
is_empty: [False  True]

left_intervals: IntervalIndex([[0, 0)], dtype='interval[int64, left]')
is_empty: [ True]

neither_intervals: IntervalIndex([(0, 0)], dtype='interval[int64, neither]')
is_empty: [ True]

both_intervals: IntervalIndex([[0, 0]], dtype='interval[int64, both]')
is_empty: [False]


In [13]:
# IntervalIndex with NaN
intervals = [pd.Interval(0, 0, closed='neither'), np.nan]
interval_idx = pd.IntervalIndex(intervals)
print(f"IntervalIndex with NaN: {interval_idx}")
print(f"is_empty: {interval_idx.is_empty}")

IntervalIndex with NaN: IntervalIndex([(0.0, 0.0), nan], dtype='interval[float64, neither]')
is_empty: [ True False]
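
Since `is_empty` returns a boolean array, it can be used as a mask. The sketch below, with made-up intervals, keeps only the non-empty ones:

```python
import pandas as pd

# (0, 1] has points; (0, 0] and (1, 1] are empty
idx = pd.IntervalIndex.from_arrays([0, 0, 1], [1, 0, 1], closed='right')
print(idx.is_empty)

# Invert the mask to drop the empty intervals
non_empty = idx[~idx.is_empty]
print(non_empty)
```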


### is_non_overlapping_monotonic Property

The `is_non_overlapping_monotonic` property returns True if the IntervalIndex is non-overlapping and monotonic (either increasing or decreasing).

In [14]:
# Create a non-overlapping monotonic IntervalIndex
non_overlapping = pd.IntervalIndex.from_breaks([0, 1, 2, 3])
print(f"Non-overlapping monotonic IntervalIndex: {non_overlapping}")
print(f"is_non_overlapping_monotonic: {non_overlapping.is_non_overlapping_monotonic}")

# Create an overlapping IntervalIndex
overlapping = pd.IntervalIndex([
    pd.Interval(0, 2),
    pd.Interval(1, 3),
    pd.Interval(2, 4)
])
print(f"\nOverlapping IntervalIndex: {overlapping}")
print(f"is_non_overlapping_monotonic: {overlapping.is_non_overlapping_monotonic}")

# Create a non-monotonic IntervalIndex
non_monotonic = pd.IntervalIndex([
    pd.Interval(0, 1),
    pd.Interval(2, 3),
    pd.Interval(1, 2)
])
print(f"\nNon-monotonic IntervalIndex: {non_monotonic}")
print(f"is_non_overlapping_monotonic: {non_monotonic.is_non_overlapping_monotonic}")

Non-overlapping monotonic IntervalIndex: IntervalIndex([(0, 1], (1, 2], (2, 3]], dtype='interval[int64, right]')
is_non_overlapping_monotonic: True

Overlapping IntervalIndex: IntervalIndex([(0, 2], (1, 3], (2, 4]], dtype='interval[int64, right]')
is_non_overlapping_monotonic: False

Non-monotonic IntervalIndex: IntervalIndex([(0, 1], (2, 3], (1, 2]], dtype='interval[int64, right]')
is_non_overlapping_monotonic: False
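
Two further cases are worth a look: intervals in decreasing order, and adjacent intervals closed on both sides. This is a small sketch with hypothetical values, relying on the rule that a shared endpoint counts as an overlap only when both intervals include it.

```python
import pandas as pd

# Decreasing order still counts as monotonic: (2, 3], (1, 2], (0, 1]
decreasing = pd.IntervalIndex.from_arrays([2, 1, 0], [3, 2, 1])
print(decreasing.is_non_overlapping_monotonic)

# With closed='both', [0, 1] and [1, 2] both contain 1, so they overlap
both_closed = pd.IntervalIndex.from_breaks([0, 1, 2], closed='both')
print(both_closed.is_non_overlapping_monotonic)
```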


### is_overlapping Property

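The `is_overlapping` property returns True if any two intervals in the IntervalIndex share a common point, including closed endpoints. Intervals that have only an open endpoint in common do not overlap.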

In [15]:
# Create a non-overlapping IntervalIndex
non_overlapping = pd.IntervalIndex.from_breaks([0, 1, 2, 3])
print(f"Non-overlapping IntervalIndex: {non_overlapping}")
print(f"is_overlapping: {non_overlapping.is_overlapping}")

# Create an overlapping IntervalIndex
overlapping = pd.IntervalIndex([
    pd.Interval(0, 2),
    pd.Interval(1, 3),
    pd.Interval(2, 4)
])
print(f"\nOverlapping IntervalIndex: {overlapping}")
print(f"is_overlapping: {overlapping.is_overlapping}")

Non-overlapping IntervalIndex: IntervalIndex([(0, 1], (1, 2], (2, 3]], dtype='interval[int64, right]')
is_overlapping: False

Overlapping IntervalIndex: IntervalIndex([(0, 2], (1, 3], (2, 4]], dtype='interval[int64, right]')
is_overlapping: True


In [17]:
# Create IntervalIndex with all intervals closed on the right
right_closed = pd.IntervalIndex([
    pd.Interval(0, 1, closed='right'),
    pd.Interval(1, 2, closed='right')
])
print(f"IntervalIndex with right-closed intervals: {right_closed}")
print(f"is_overlapping: {right_closed.is_overlapping}")

# Create IntervalIndex with all intervals closed on the left
left_closed = pd.IntervalIndex([
    pd.Interval(0, 1, closed='left'),
    pd.Interval(1, 2, closed='left')
])
print(f"\nIntervalIndex with left-closed intervals: {left_closed}")
print(f"is_overlapping: {left_closed.is_overlapping}")

# Adjacent intervals overlap only if the shared endpoint (1) belongs to both:
print("\nManual check for overlapping with different closed options:")
interval1_right = pd.Interval(0, 1, closed='right')  # (0, 1]
interval2_left = pd.Interval(1, 2, closed='left')    # [1, 2)
print(f"Does {interval1_right} overlap with {interval2_left}? {1 in interval1_right and 1 in interval2_left}")

interval1_left = pd.Interval(0, 1, closed='left')    # [0, 1)
interval2_right = pd.Interval(1, 2, closed='right')  # (1, 2]
print(f"Does {interval1_left} overlap with {interval2_right}? {1 in interval1_left and 1 in interval2_right}")

IntervalIndex with right-closed intervals: IntervalIndex([(0, 1], (1, 2]], dtype='interval[int64, right]')
is_overlapping: False

IntervalIndex with left-closed intervals: IntervalIndex([[0, 1), [1, 2)], dtype='interval[int64, left]')
is_overlapping: False

Manual check for overlapping with different closed options:
Does (0, 1] overlap with [1, 2)? True
Does [0, 1) overlap with (1, 2]? False
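
Instead of testing the shared endpoint by hand, the same question can be asked with the `overlaps` method, which exists on both `Interval` and `IntervalIndex`. The sketch below repeats the two pairs from the manual check and then tests an index against one hypothetical query interval.

```python
import pandas as pd

# (0, 1] and [1, 2) both contain the point 1
a = pd.Interval(0, 1, closed='right')
b = pd.Interval(1, 2, closed='left')
print(a.overlaps(b))

# [0, 1) and (1, 2] both exclude the point 1
c = pd.Interval(0, 1, closed='left')
d = pd.Interval(1, 2, closed='right')
print(c.overlaps(d))

# IntervalIndex.overlaps checks each interval against a single Interval
idx = pd.IntervalIndex.from_breaks([0, 1, 2, 3])
hits = idx.overlaps(pd.Interval(0.5, 1.5))
print(hits)
```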
