#### Pandas Part 74: Boolean Data Types and Index Objects

This notebook explores boolean data types with missing values and various Index object methods.

In [1]:
import pandas as pd
import numpy as np

##### 1. Boolean Data with Missing Values

The nullable boolean dtype (string alias `"boolean"`) stores `True`, `False`, and missing values (`pd.NA`) in the same array, which is not possible with a standard NumPy boolean array.

### Creating BooleanArray

You can create a BooleanArray using `pd.array()` with dtype="boolean".

In [2]:
# Create a BooleanArray
bool_array = pd.array([True, False, None], dtype="boolean")
print(f"BooleanArray: {bool_array}")
print(f"Type: {type(bool_array)}")
print(f"Dtype: {bool_array.dtype}")

BooleanArray: <BooleanArray>
[True, False, <NA>]
Length: 3, dtype: boolean
Type: <class 'pandas.core.arrays.boolean.BooleanArray'>
Dtype: boolean


### BooleanArray vs NumPy Boolean Array

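Let's compare BooleanArray with a standard NumPy boolean array. NumPy has no way to mark a boolean as missing: `None` is either coerced to `False` or forces the array to `object` dtype.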

In [3]:
# Create a NumPy boolean array.
# With dtype=bool this does not raise: None is silently coerced to False.
try:
    np_bool_array = np.array([True, False, None], dtype=bool)
    print(f"NumPy boolean array: {np_bool_array}")
except Exception as e:
    print(f"Error: {e}")

# Without an explicit dtype, NumPy falls back to an object array that keeps None
np_bool_array = np.array([True, False, None])
print(f"NumPy array with None: {np_bool_array}")
print(f"Dtype: {np_bool_array.dtype}")

# Casting the object array to bool also turns None into False
np_bool_array = np_bool_array.astype(bool)
print(f"Converted to boolean: {np_bool_array}")
print(f"Dtype: {np_bool_array.dtype}")

NumPy boolean array: [ True False False]
NumPy array with None: [True False None]
Dtype: object
Converted to boolean: [ True False False]
Dtype: bool
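
To keep the missing value instead of losing it to `False`, the object array can be converted to the nullable dtype. The following is a minimal sketch, assuming pandas 1.0 or later; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Object array that still contains None (NumPy cannot store it as bool)
obj_array = np.array([True, False, None], dtype=object)

# Plain NumPy cast: None silently becomes False
as_numpy_bool = obj_array.astype(bool)

# Nullable cast: None becomes pd.NA and stays distinguishable from False
as_nullable = pd.array(obj_array, dtype="boolean")

print(as_numpy_bool)
print(as_nullable)
print(as_nullable.isna())
```

The `isna()` mask is what separates a genuine `False` from a missing entry after the conversion.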


### Creating Series with Boolean Dtype

In [4]:
# Create a Series with boolean dtype
s = pd.Series([True, False, None], dtype="boolean")
print(s)
print(f"Dtype: {s.dtype}")

0     True
1    False
2     <NA>
dtype: boolean
Dtype: boolean
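
A nullable boolean Series is often used as a filter mask. The sketch below (with hypothetical `flags` and `data` Series) counts the missing flags and then decides explicitly how they should behave before filtering; treating missing as `False` is an assumption made for this example, not a rule.

```python
import pandas as pd

flags = pd.Series([True, False, None], dtype="boolean")
data = pd.Series([10, 20, 30])

# Count how many flags are missing
n_missing = int(flags.isna().sum())

# Treat missing flags as False before using the Series as a mask
selected = data[flags.fillna(False)]

print(f"Missing flags: {n_missing}")
print(f"Selected values: {selected.tolist()}")
```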


### Kleene Logic for Boolean Operations

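BooleanArray implements Kleene logic (three-valued logic) for logical operations. A missing value is treated as "unknown": the result is `<NA>` only when it would depend on the unknown operand, so `True | NA` is `True` and `False & NA` is `False`, while `True & NA` and `False | NA` are `<NA>`.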

In [5]:
# Create two Series with boolean dtype
s1 = pd.Series([True, False, True, None], dtype="boolean")
s2 = pd.Series([True, False, None, None], dtype="boolean")
print("s1:")
print(s1)
print("\ns2:")
print(s2)

s1:
0     True
1    False
2     True
3     <NA>
dtype: boolean

s2:
0     True
1    False
2     <NA>
3     <NA>
dtype: boolean


In [6]:
# Logical operations with Kleene logic
print("s1 & s2 (AND):")
print(s1 & s2)

print("\ns1 | s2 (OR):")
print(s1 | s2)

print("\n~s1 (NOT):")
print(~s1)

s1 & s2 (AND):
0     True
1    False
2     <NA>
3     <NA>
dtype: boolean

s1 | s2 (OR):
0     True
1    False
2     True
3     <NA>
dtype: boolean

~s1 (NOT):
0    False
1     True
2    False
3     <NA>
dtype: boolean
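
The same rules apply to the scalar `pd.NA`, which makes them easy to check one case at a time. This is a small sketch; the variable names are only for illustration.

```python
import pandas as pd

# The result is known whenever the other operand decides it on its own
or_with_true = pd.NA | True      # True, whatever the unknown value is
and_with_false = pd.NA & False   # False, whatever the unknown value is

# The result stays unknown when it depends on the missing operand
and_with_true = pd.NA & True     # NA
or_with_false = pd.NA | False    # NA

print(or_with_true, and_with_false, and_with_true, or_with_false)
```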


### Kleene Logic Truth Tables

Let's demonstrate the truth tables for Kleene logic.

In [7]:
# Create all possible combinations for AND operation
values = [True, False, None]
print("AND Truth Table (Kleene Logic):")
print("      | True  | False | None  |")
print("------+-------+-------+-------+")
for a in values:
    row = f"{str(a):<5} |"
    for b in values:
        result = pd.array([a], dtype="boolean") & pd.array([b], dtype="boolean")
        # str() keeps True/False from being formatted as 1/0
        row += f" {str(result[0]):<5} |"
    print(row)

AND Truth Table (Kleene Logic):
      | True  | False | None  |
------+-------+-------+-------+
True  | True  | False | <NA>  |
False | False | False | False |
None  | <NA>  | False | <NA>  |


In [8]:
# Create all possible combinations for OR operation
print("OR Truth Table (Kleene Logic):")
print("      | True  | False | None  |")
print("------+-------+-------+-------+")
for a in values:
    row = f"{str(a):<5} |"
    for b in values:
        result = pd.array([a], dtype="boolean") | pd.array([b], dtype="boolean")
        # str() keeps True/False from being formatted as 1/0
        row += f" {str(result[0]):<5} |"
    print(row)

OR Truth Table (Kleene Logic):
      | True  | False | None  |
------+-------+-------+-------+
True  | True  | True  | True  |
False | True  | False | <NA>  |
None  | True  | <NA>  | <NA>  |


### Comparison Operations with BooleanArray

In [9]:
# Create a BooleanArray
bool_arr = pd.array([True, False, None], dtype="boolean")
print(f"BooleanArray: {bool_arr}")

# Compare with a boolean
result = bool_arr == True
print(f"Result of bool_arr == True: {result}")
print(f"Result type: {type(result)}")
print(f"Result dtype: {result.dtype}")

BooleanArray: <BooleanArray>
[True, False, <NA>]
Length: 3, dtype: boolean
Result of bool_arr == True: <BooleanArray>
[True, False, <NA>]
Length: 3, dtype: boolean
Result type: <class 'pandas.core.arrays.boolean.BooleanArray'>
Result dtype: boolean
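
Because comparisons propagate `<NA>`, the result usually has to be resolved before it is passed to code that expects plain booleans. The sketch below shows one way to do that; filling missing values with `False` is an assumption chosen for the example.

```python
import pandas as pd

s = pd.Series([True, False, None], dtype="boolean")

# The comparison keeps the missing entry as <NA>
is_true = s == True

# Reductions skip missing values by default
n_true = int(s.sum())

# Resolve <NA> explicitly to get a NumPy-backed bool Series
plain_mask = is_true.fillna(False).astype(bool)

print(is_true.tolist())
print(n_true)
print(plain_mask.tolist(), plain_mask.dtype)
```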


##### 2. Index Objects

An Index is an immutable sequence of labels. It backs the row and column axes of Series and DataFrame objects and provides the methods used for label lookup and alignment.

### Creating Index Objects

In [10]:
# Create an Index
idx = pd.Index(['a', 'b', 'c'])
print(f"Index: {idx}")
print(f"Type: {type(idx)}")
print(f"Dtype: {idx.dtype}")

Index: Index(['a', 'b', 'c'], dtype='object')
Type: <class 'pandas.core.indexes.base.Index'>
Dtype: object
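
Immutability means that item assignment fails and that "modifying" methods return a new Index. A short sketch of both behaviours follows; the variable names are illustrative.

```python
import pandas as pd

idx = pd.Index(["a", "b", "c"])

# Item assignment is not supported on an Index
try:
    idx[0] = "z"
    mutated = True
except TypeError as exc:
    mutated = False
    print(f"TypeError: {exc}")

# Methods such as append and drop return new Index objects
extended = idx.append(pd.Index(["d"]))
without_b = idx.drop("b")

print(extended.tolist())
print(without_b.tolist())
print(idx.tolist())  # the original is unchanged
```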


### get_level_values Method

The `get_level_values` method returns an Index of values for the requested level. This is primarily useful for MultiIndex, but is also available on Index for compatibility.

In [11]:
# Get level values
level_values = idx.get_level_values(0)
print(f"Level values: {level_values}")

Level values: Index(['a', 'b', 'c'], dtype='object')
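
On a MultiIndex the method is more informative, since each level can be requested by position or by name. The example below is a sketch with made-up level names (`letter`, `number`).

```python
import pandas as pd

mi = pd.MultiIndex.from_arrays(
    [["a", "a", "b"], [1, 2, 1]],
    names=["letter", "number"],
)

# Request a level by name or by integer position
letters = mi.get_level_values("letter")
numbers = mi.get_level_values(1)

print(letters.tolist())
print(numbers.tolist())
```

The returned Index has one entry per row of the MultiIndex, so repeated labels are preserved.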


### get_loc Method

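The `get_loc` method returns the position of the requested label. The return type depends on the index: an integer when the label is unique, a `slice` when the label is duplicated in a monotonic index, and a boolean mask otherwise.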

In [12]:
# Get location for a unique index
unique_index = pd.Index(['a', 'b', 'c'])
loc = unique_index.get_loc('b')
print(f"Location of 'b' in unique_index: {loc}")

Location of 'b' in unique_index: 1


In [13]:
# Get location for a monotonic index with duplicates
monotonic_index = pd.Index(['a', 'b', 'b', 'c'])
loc = monotonic_index.get_loc('b')
print(f"Location of 'b' in monotonic_index: {loc}")
print(f"Type of location: {type(loc)}")

Location of 'b' in monotonic_index: slice(1, 3, None)
Type of location: <class 'slice'>


In [14]:
# Get location for a non-monotonic index with duplicates
non_monotonic_index = pd.Index(['a', 'b', 'c', 'b'])
loc = non_monotonic_index.get_loc('b')
print(f"Location of 'b' in non_monotonic_index: {loc}")
print(f"Type of location: {type(loc)}")

Location of 'b' in non_monotonic_index: [False  True False  True]
Type of location: <class 'numpy.ndarray'>
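
Since the return type varies, calling code often normalises it. The helper below, `positions_of`, is a hypothetical function written for this example (it is not part of pandas); it converts any of the three results into an array of integer positions.

```python
import numpy as np
import pandas as pd

def positions_of(index, label):
    """Return the integer positions of `label` as a NumPy array."""
    loc = index.get_loc(label)
    if isinstance(loc, slice):
        # Monotonic index with duplicates: expand the slice
        return np.arange(len(index))[loc]
    if isinstance(loc, np.ndarray):
        # Non-monotonic index with duplicates: boolean mask to positions
        return np.flatnonzero(loc)
    # Unique label: a single integer
    return np.array([loc])

unique_pos = positions_of(pd.Index(["a", "b", "c"]), "b")
monotonic_pos = positions_of(pd.Index(["a", "b", "b", "c"]), "b")
non_monotonic_pos = positions_of(pd.Index(["a", "b", "c", "b"]), "b")

print(unique_pos, monotonic_pos, non_monotonic_pos)
```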


### get_loc with Method Parameter

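Older pandas releases let `get_loc` perform inexact matching through a `method` argument (`'pad'`/`'ffill'`, `'backfill'`/`'bfill'`, `'nearest'`). That argument was deprecated in pandas 1.4 and removed in 2.0, so the cells below first show that an exact lookup fails and then reproduce each matching rule with `searchsorted` and NumPy.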

In [15]:
# Create a numeric index
num_index = pd.Index([1, 3, 5, 7, 9])
print(f"Numeric index: {num_index}")

# Exact match
try:
    loc = num_index.get_loc(4)
    print(f"Location of 4: {loc}")
except KeyError as e:
    print(f"Error: {e}")

Numeric index: Index([1, 3, 5, 7, 9], dtype='int64')
Error: 4


In [17]:
# Create a numeric index for demonstration
num_index = pd.Index([1, 3, 5, 7, 9])
print(f"Numeric index: {num_index}")

# For 'pad'/'ffill' method (find the previous index value)
# Use searchsorted with side='right' and subtract 1
loc = num_index.searchsorted(4, side='right') - 1
print(f"Location of 4 using 'pad' equivalent: {loc}")
print(f"Value at this location: {num_index[loc]}")

# For 'backfill'/'bfill' method (use next index value)
# Use searchsorted with side='left'
loc = num_index.searchsorted(4, side='left')
print(f"\nLocation of 4 using 'backfill' equivalent: {loc}")
print(f"Value at this location: {num_index[loc]}")

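# For 'nearest' method
# Calculate distances and take the first minimum.
# Note: 4 is equidistant from 3 and 5, so this is a tie. argmin returns the
# first (smaller) candidate, whereas pandas' own 'nearest' matching breaks
# ties in favour of the larger index value.
distances = abs(num_index.values - 4)
loc = distances.argmin()
print(f"\nLocation of 4 using 'nearest' equivalent: {loc}")
print(f"Value at this location: {num_index[loc]}")

# Another tie: 6 is equidistant from 5 and 7, and argmin again picks the smaller
distances = abs(num_index.values - 6)
loc = distances.argmin()
print(f"\nLocation of 6 using 'nearest' equivalent: {loc}")
print(f"Value at this location: {num_index[loc]}")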

Numeric index: Index([1, 3, 5, 7, 9], dtype='int64')
Location of 4 using 'pad' equivalent: 1
Value at this location: 3

Location of 4 using 'backfill' equivalent: 2
Value at this location: 5

Location of 4 using 'nearest' equivalent: 1
Value at this location: 3

Location of 6 using 'nearest' equivalent: 2
Value at this location: 5
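
The built-in way to do inexact matching in current pandas is `Index.get_indexer`, which takes a list of targets and returns `-1` where nothing matches. The sketch below repeats the lookups above with it. Note that for ties `'nearest'` prefers the larger index value, so its results differ from the `argmin` version.

```python
import pandas as pd

num_index = pd.Index([1, 3, 5, 7, 9])

# Previous index value at or below the target
pad_pos = num_index.get_indexer([4], method="pad")

# Next index value at or above the target
backfill_pos = num_index.get_indexer([4], method="backfill")

# Closest index value; 4 and 6 are both ties
nearest_pos = num_index.get_indexer([4, 6], method="nearest")

# No index value lies at or below 0, so 'pad' reports -1
below_min = num_index.get_indexer([0], method="pad")

print(pad_pos, backfill_pos, nearest_pos, below_min)
```

Unlike the `searchsorted(...) - 1` approach, a target below the smallest index value is reported as `-1` here and should be checked for, rather than used directly as a position.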


### get_loc with Tolerance Parameter

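The `tolerance` parameter limits how far the matched index value may be from the requested label in an inexact match. It was removed from `get_loc` together with `method` in pandas 2.0, so the cell below implements the check by hand.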

In [19]:
# Create a numeric index for demonstration
num_index = pd.Index([1, 3, 5, 7, 9])
print(f"Numeric index: {num_index}")

# Implement 'pad' method with tolerance
def get_loc_pad_with_tolerance(index, value, tolerance):
    # Find the position where the value would be inserted
    pos = index.searchsorted(value, side='right') - 1
    
    # If pos is valid (not -1) and within tolerance
    if pos >= 0 and abs(index[pos] - value) <= tolerance:
        return pos
    else:
        raise KeyError(f"{value} not found in index with tolerance {tolerance}")

# Using 'pad' method with tolerance 0.5
try:
    loc = get_loc_pad_with_tolerance(num_index, 4, tolerance=0.5)
    print(f"Location of 4 using 'pad' method with tolerance 0.5: {loc}")
except KeyError as e:
    print(f"Error: {e}")

# Using 'pad' method with larger tolerance 1.0
try:
    loc = get_loc_pad_with_tolerance(num_index, 4, tolerance=1.0)
    print(f"Location of 4 using 'pad' method with tolerance 1.0: {loc}")
    print(f"Value at this location: {num_index[loc]}")
except KeyError as e:
    print(f"Error: {e}")

Numeric index: Index([1, 3, 5, 7, 9], dtype='int64')
Error: '4 not found in index with tolerance 0.5'
Location of 4 using 'pad' method with tolerance 1.0: 1
Value at this location: 3
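
`Index.get_indexer` still accepts `tolerance`, so the same check can be written without a helper. This is a sketch under the assumption that a `-1` result should be treated as "not found".

```python
import pandas as pd

num_index = pd.Index([1, 3, 5, 7, 9])

# The nearest earlier value (3) is 1 away from 4, which exceeds 0.5
too_far = num_index.get_indexer([4], method="pad", tolerance=0.5)

# With tolerance 1.0 the match at position 1 is accepted
within = num_index.get_indexer([4], method="pad", tolerance=1.0)

print(f"tolerance=0.5: {too_far}")
print(f"tolerance=1.0: {within}, value {num_index[within[0]]}")
```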


### get_slice_bound Method

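The `get_slice_bound` method returns the integer position to use as a slice boundary for a given label: with `side='left'` the position of the label's first occurrence, and with `side='right'` the position just past its last occurrence. The cell below reproduces both bounds with `searchsorted`, which gives the same result for a sorted index.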

In [21]:
# Create an index
idx = pd.Index(['a', 'b', 'c', 'd', 'e'])
print(f"Index: {idx}")

# For 'left' slice bound (equivalent to get_slice_bound with side='left')
left_bound = idx.searchsorted('c', side='left')
print(f"Left slice bound for 'c': {left_bound}")

# For 'right' slice bound (equivalent to get_slice_bound with side='right')
right_bound = idx.searchsorted('c', side='right')
print(f"Right slice bound for 'c': {right_bound}")

# Create a slice using these bounds
sliced_idx = idx[left_bound:right_bound]
print(f"Sliced index: {sliced_idx}")

Index: Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
Left slice bound for 'c': 2
Right slice bound for 'c': 3
Sliced index: Index(['c'], dtype='object')
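
For comparison, the method itself can be called directly. The sketch below assumes a recent pandas release in which `get_slice_bound` takes only `label` and `side`; it also shows `slice_locs`, which returns both bounds for a label range at once.

```python
import pandas as pd

idx = pd.Index(["a", "b", "c", "d", "e"])

# Bounds for a single label
left_bound = idx.get_slice_bound("c", side="left")
right_bound = idx.get_slice_bound("c", side="right")

# Start and stop positions for the label range 'b' through 'd' (inclusive)
start, stop = idx.slice_locs("b", "d")

print(left_bound, right_bound)
print(idx[start:stop].tolist())
```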


### get_value Method

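The `get_value` method provided a fast lookup of a single value from a 1-dimensional array by label. It was deprecated in pandas 1.0 and removed in 2.0, so the cell below uses standard label indexing instead.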

In [23]:
# Create a Series
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s)

# Get value using standard indexing
value = s['b']  # or s.loc['b'] or s.at['b']
print(f"\nValue for 'b': {value}")

a    10
b    20
c    30
dtype: int64

Value for 'b': 20
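
The alternatives mentioned in the comment can be compared side by side. The last lookup is a sketch of the two steps a label-based scalar lookup involves (resolve the label to a position, then read the underlying array); it is an illustration, not the library's internal implementation.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_at = s.at["b"]     # scalar access by label
by_loc = s.loc["b"]   # general label-based access

# Manual two-step lookup: label -> position -> value
pos = s.index.get_loc("b")
by_position = s.to_numpy()[pos]

print(by_at, by_loc, by_position)
```

`.at` is intended for single-value access, while `.loc` also accepts slices, lists, and masks.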
