#### Pandas Tutorial - Part 41

This notebook covers:
- Creating interval ranges with `interval_range()`
- Series methods including `abs()` and other operations

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

##### Creating Interval Ranges with `interval_range()`

The `interval_range()` function returns a fixed-frequency `IntervalIndex`: a sequence of contiguous intervals between a start and an end point, built from numeric or datetime-like values.

### Basic Usage with Numeric Values

In [2]:
# Create an interval range from 0 to 5
interval_idx = pd.interval_range(start=0, end=5)
print(interval_idx)

IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4], (4, 5]], dtype='interval[int64, right]')


In [3]:
# Examine the properties of the interval index
print(f"Type: {type(interval_idx)}")
print(f"Length: {len(interval_idx)}")
print(f"Closed on: {interval_idx.closed}")
print(f"Data type: {interval_idx.dtype}")

Type: <class 'pandas.core.indexes.interval.IntervalIndex'>
Length: 5
Closed on: right
Data type: interval[int64, right]


### Using Datetime Values

In [4]:
# Create an interval range with datetime values
date_interval = pd.interval_range(
    start=pd.Timestamp('2023-01-01'),
    end=pd.Timestamp('2023-01-04')
)
print(date_interval)

IntervalIndex([(2023-01-01 00:00:00, 2023-01-02 00:00:00],
               (2023-01-02 00:00:00, 2023-01-03 00:00:00],
               (2023-01-03 00:00:00, 2023-01-04 00:00:00]],
              dtype='interval[datetime64[ns], right]')


### Specifying Frequency

In [5]:
# Create an interval range with a specific frequency
freq_interval = pd.interval_range(start=0, periods=4, freq=1.5)
print(freq_interval)

IntervalIndex([(0.0, 1.5], (1.5, 3.0], (3.0, 4.5], (4.5, 6.0]], dtype='interval[float64, right]')


In [6]:
# Date intervals with month frequency
month_interval = pd.interval_range(
    start=pd.Timestamp('2023-01-01'),
    periods=3,
    freq='MS'  # Month start frequency
)
print(month_interval)

IntervalIndex([(2023-01-01 00:00:00, 2023-02-01 00:00:00],
               (2023-02-01 00:00:00, 2023-03-01 00:00:00],
               (2023-03-01 00:00:00, 2023-04-01 00:00:00]],
              dtype='interval[datetime64[ns], right]')
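As a complement to `freq`, the number of intervals can be fixed with `periods` while both endpoints are given; pandas then spaces the breaks evenly. The sketch below is a minimal illustration, and the variable names `even_intervals` and `from_end` are made up for this example.

```python
import pandas as pd

# Split the range from 0 to 6 into 4 intervals of equal width.
# No freq is given, so the width (1.5) follows from start, end and periods.
even_intervals = pd.interval_range(start=0, end=6, periods=4)
print(even_intervals)

# Alternatively, anchor the range at its end and count backwards.
from_end = pd.interval_range(end=6, periods=4, freq=1.5)
print(from_end)
```

Both calls should describe the same four intervals; which combination of `start`, `end`, `periods`, and `freq` to pass depends on which quantities are known up front.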


### Specifying the Closure of Intervals

In [7]:
# Default is 'right' closed
right_closed = pd.interval_range(start=0, end=5)
print("Right closed intervals:")
print(right_closed)

Right closed intervals:
IntervalIndex([(0, 1], (1, 2], (2, 3], (3, 4], (4, 5]], dtype='interval[int64, right]')


In [8]:
# Left closed intervals
left_closed = pd.interval_range(start=0, end=5, closed='left')
print("Left closed intervals:")
print(left_closed)

Left closed intervals:
IntervalIndex([[0, 1), [1, 2), [2, 3), [3, 4), [4, 5)], dtype='interval[int64, left]')


In [9]:
# Both sides closed
both_closed = pd.interval_range(start=0, end=5, closed='both')
print("Both sides closed intervals:")
print(both_closed)

Both sides closed intervals:
IntervalIndex([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]], dtype='interval[int64, both]')


In [10]:
# Neither side closed
neither_closed = pd.interval_range(start=0, end=5, closed='neither')
print("Neither side closed intervals:")
print(neither_closed)

Neither side closed intervals:
IntervalIndex([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)], dtype='interval[int64, neither]')
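The choice of `closed` matters most for values that fall exactly on a boundary. A short sketch using `IntervalIndex.contains()`, which tests one value against every interval (the variable names are illustrative):

```python
import pandas as pd

right = pd.interval_range(start=0, end=3)                    # (0, 1], (1, 2], (2, 3]
left = pd.interval_range(start=0, end=3, closed='left')      # [0, 1), [1, 2), [2, 3)
both = pd.interval_range(start=0, end=3, closed='both')      # [0, 1], [1, 2], [2, 3]
neither = pd.interval_range(start=0, end=3, closed='neither')

# The boundary value 1 belongs to a different interval under each setting.
print(right.contains(1))    # [ True False False]
print(left.contains(1))     # [False  True False]
print(both.contains(1))     # [ True  True False]
print(neither.contains(1))  # [False False False]
```

With `closed='both'` a boundary value belongs to two adjacent intervals, and with `closed='neither'` it belongs to none, which is worth keeping in mind when the intervals are used as bins.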


### Using IntervalIndex with DataFrames

In [11]:
# Create a DataFrame with an IntervalIndex
intervals = pd.interval_range(start=0, end=5)
df = pd.DataFrame({
    'value': range(5),
    'category': ['A', 'B', 'C', 'D', 'E']
}, index=intervals)
df

        value category
(0, 1]      0        A
(1, 2]      1        B
(2, 3]      2        C
(3, 4]      3        D
(4, 5]      4        E


In [12]:
# Check if a value is contained in any interval
value_to_check = 2.5
for interval in df.index:
    if value_to_check in interval:
        print(f"{value_to_check} is in {interval}, which has value {df.loc[interval, 'value']} and category {df.loc[interval, 'category']}")

2.5 is in (2, 3], which has value 2 and category C
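The explicit loop makes the containment check visible, but an `IntervalIndex` can also resolve a scalar directly. The sketch below rebuilds the same DataFrame and assumes non-overlapping intervals, so that each value matches at most one row.

```python
import pandas as pd

intervals = pd.interval_range(start=0, end=5)
df = pd.DataFrame(
    {'value': range(5), 'category': ['A', 'B', 'C', 'D', 'E']},
    index=intervals,
)

# A scalar passed to .loc is matched to the interval that contains it.
row = df.loc[2.5]
print(row)

# get_loc() returns the integer position of the containing interval.
position = df.index.get_loc(2.5)
print(position)
```

A value outside every interval (for example `7`) raises a `KeyError` here, whereas the loop version simply prints nothing.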


##### Series Methods

Pandas Series objects have many methods for data manipulation and analysis. Let's explore some of them, starting with the `abs()` method.

### The `abs()` Method

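The `abs()` method returns a new Series containing the absolute value of each element. It works on numeric data, including complex numbers (where it returns the magnitude), and on timedeltas.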

In [13]:
# Create a Series with negative and positive values
s = pd.Series([-1.10, 2, -3.33, 4])
print("Original Series:")
print(s)

# Get absolute values
abs_s = s.abs()
print("\nAbsolute values:")
print(abs_s)

Original Series:
0   -1.10
1    2.00
2   -3.33
3    4.00
dtype: float64

Absolute values:
0    1.10
1    2.00
2    3.33
3    4.00
dtype: float64
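The same result is available through Python's built-in `abs()` and NumPy's `np.abs()`, both of which return a Series when given one. A quick check of that equivalence, as a sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series([-1.10, 2, -3.33, 4])

builtin_abs = abs(s)    # Python's built-in abs() applied to the Series
numpy_abs = np.abs(s)   # the NumPy ufunc also returns a Series

print(builtin_abs.equals(s.abs()))
print(numpy_abs.equals(s.abs()))
```

The method form `s.abs()` tends to read best inside a chain of Series operations.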


In [14]:
# Absolute values of complex numbers
complex_s = pd.Series([1.2 + 1j, 2.3 - 2.1j, -3.4 + 4.2j])
print("Complex Series:")
print(complex_s)

print("\nAbsolute values:")
print(complex_s.abs())

Complex Series:
0    1.2+1.0j
1    2.3-2.1j
2   -3.4+4.2j
dtype: complex128

Absolute values:
0    1.562050
1    3.114482
2    5.403702
dtype: float64


In [15]:
# Absolute values of timedeltas
td_s = pd.Series([pd.Timedelta('1 days'), pd.Timedelta('-2 days'), pd.Timedelta('3 hours')])
print("Timedelta Series:")
print(td_s)

print("\nAbsolute values:")
print(td_s.abs())

Timedelta Series:
0     1 days 00:00:00
1   -2 days +00:00:00
2     0 days 03:00:00
dtype: timedelta64[ns]

Absolute values:
0   1 days 00:00:00
1   2 days 00:00:00
2   0 days 03:00:00
dtype: timedelta64[ns]


### Practical Example: Finding Values Closest to a Target

In [16]:
# Create a DataFrame
df = pd.DataFrame({
    'a': [4, 5, 6, 7],
    'b': [10, 20, 30, 40],
    'c': [100, 50, -30, -50]
})
print("Original DataFrame:")
print(df)

Original DataFrame:
   a   b    c
0  4  10  100
1  5  20   50
2  6  30  -30
3  7  40  -50


In [17]:
# Sort rows by how close the value in column 'c' is to 43
target_value = 43
distance = (df['c'] - target_value).abs()
# argsort() returns integer positions, so select rows with .iloc rather than .loc
sorted_positions = distance.argsort()

print(f"\nRows sorted by distance of column 'c' from {target_value}:")
print(df.iloc[sorted_positions])


Rows sorted by distance of column 'c' from 43:
   a   b    c
1  5  20   50
0  4  10  100
2  6  30  -30
3  7  40  -50
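When only the nearest row or the few nearest rows are needed, a full sort is unnecessary. A minimal sketch using `idxmin()` and `nsmallest()` on the same distances (the names `closest_label` and `two_closest` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'a': [4, 5, 6, 7],
    'b': [10, 20, 30, 40],
    'c': [100, 50, -30, -50]
})
target_value = 43
distance = (df['c'] - target_value).abs()

# Index label of the single closest row.
closest_label = distance.idxmin()
print(df.loc[closest_label])

# The two closest rows, nearest first.
two_closest = df.loc[distance.nsmallest(2).index]
print(two_closest)
```

Both calls return index labels, so `.loc` is the matching accessor here.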


### Other Series Methods

Let's explore some other useful Series methods.

In [18]:
# Create a sample Series
s = pd.Series([1, 2, 3, 4, 5, 2, 3, 1, np.nan])
print("Sample Series:")
print(s)

Sample Series:
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    2.0
6    3.0
7    1.0
8    NaN
dtype: float64


In [19]:
# Basic statistics
print(f"Mean: {s.mean()}")
print(f"Median: {s.median()}")
print(f"Standard deviation: {s.std()}")
print(f"Minimum: {s.min()}")
print(f"Maximum: {s.max()}")

Mean: 2.625
Median: 2.5
Standard deviation: 1.407885953173359
Minimum: 1.0
Maximum: 5.0
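These statistics ignore the `NaN` at the end of the Series because reductions skip missing values by default. The sketch below shows the effect of turning that off with `skipna=False`, and uses `describe()` to collect the common statistics in one call.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 2, 3, 1, np.nan])

# Default: the NaN is skipped, so the mean is taken over 8 values.
print(s.mean())

# With skipna=False, a single NaN makes the result NaN.
print(s.mean(skipna=False))

# describe() reports count, mean, std, min, quartiles and max together.
summary = s.describe()
print(summary)
```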


In [20]:
# Count non-NA values
print(f"Count of non-NA values: {s.count()}")

# Check for NA values
print("NA values:")
print(s.isna())
print(f"Count of NA values: {s.isna().sum()}")

Count of non-NA values: 8
NA values:
0    False
1    False
2    False
3    False
4    False
5    False
6    False
7    False
8     True
dtype: bool
Count of NA values: 1
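Once missing values have been counted, the usual next step is to drop or fill them. A minimal sketch with `dropna()` and `fillna()`; filling with the mean is only one possible choice and is used here purely for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 2, 3, 1, np.nan])

# Remove missing values; the remaining index labels are unchanged.
dropped = s.dropna()
print(dropped)

# Replace missing values with the mean of the non-missing values.
filled = s.fillna(s.mean())
print(filled)
```

Both methods return a new Series and leave `s` itself untouched.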


In [21]:
# Value counts
print("Value counts:")
print(s.value_counts())

# Normalized value counts (proportions)
print("\nNormalized value counts:")
print(s.value_counts(normalize=True))

Value counts:
1.0    2
2.0    2
3.0    2
4.0    1
5.0    1
Name: count, dtype: int64

Normalized value counts:
1.0    0.250
2.0    0.250
3.0    0.250
4.0    0.125
5.0    0.125
Name: proportion, dtype: float64
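Note that the `NaN` entry does not appear in either result, because `value_counts()` drops missing values by default. A short sketch of including it with `dropna=False`:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 2, 3, 1, np.nan])

# Include missing values as their own category.
counts_with_na = s.value_counts(dropna=False)
print(counts_with_na)

# Select the row for the missing values via the index.
na_count = counts_with_na[counts_with_na.index.isna()]
print(na_count)
```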


In [22]:
# Cumulative operations
print("Cumulative sum:")
print(s.cumsum())

print("\nCumulative product:")
print(s.cumprod())

Cumulative sum:
0     1.0
1     3.0
2     6.0
3    10.0
4    15.0
5    17.0
6    20.0
7    21.0
8     NaN
dtype: float64

Cumulative product:
0      1.0
1      2.0
2      6.0
3     24.0
4    120.0
5    240.0
6    720.0
7    720.0
8      NaN
dtype: float64
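In the output above the `NaN` sits at the end, so its effect on later values is not visible. The sketch below uses a small, made-up Series with a gap in the middle to show how `skipna` changes the result.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a missing value in the middle.
gappy = pd.Series([1.0, np.nan, 2.0])

# Default (skipna=True): the NaN position stays NaN, the running total continues.
default_cumsum = gappy.cumsum()
print(default_cumsum)

# skipna=False: every value after the first NaN becomes NaN.
strict_cumsum = gappy.cumsum(skipna=False)
print(strict_cumsum)
```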


In [23]:
# Filtering
print("Values greater than 2:")
print(s[s > 2])

Values greater than 2:
2    3.0
3    4.0
4    5.0
6    3.0
dtype: float64
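Boolean masks can be combined for more specific filters. A brief sketch showing `&` with parenthesized conditions, plus the `between()` and `isin()` helpers; comparisons against `NaN` evaluate to `False`, so the missing value is excluded in each case.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 2, 3, 1, np.nan])

# Combine conditions with & (and) or | (or); each condition needs parentheses.
in_range = s[(s >= 2) & (s <= 3)]
print(in_range)

# between() expresses the same inclusive range check.
same_range = s[s.between(2, 3)]
print(same_range)

# isin() keeps values that match any entry in a list.
picked = s[s.isin([1, 5])]
print(picked)
```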


In [24]:
# Applying functions
print("Square of each value:")
print(s.apply(lambda x: x**2 if pd.notna(x) else x))

Square of each value:
0     1.0
1     4.0
2     9.0
3    16.0
4    25.0
5     4.0
6     9.0
7     1.0
8     NaN
dtype: float64
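For simple arithmetic like this, a vectorized expression gives the same result without calling a Python function per element, and `NaN` propagates on its own. A sketch comparing the two approaches:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 2, 3, 1, np.nan])

# Element-wise apply, with an explicit NaN check.
applied = s.apply(lambda x: x**2 if pd.notna(x) else x)

# Vectorized equivalent: NaN stays NaN without a special case.
vectorized = s ** 2

print(vectorized.equals(applied))
```

`apply()` remains useful when the per-element logic cannot be written as array operations.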


In [25]:
# Sorting
print("Sorted values:")
print(s.sort_values())

print("\nSorted values (descending):")
print(s.sort_values(ascending=False))

Sorted values:
0    1.0
7    1.0
1    2.0
5    2.0
2    3.0
6    3.0
3    4.0
4    5.0
8    NaN
dtype: float64

Sorted values (descending):
4    5.0
3    4.0
2    3.0
6    3.0
1    2.0
5    2.0
0    1.0
7    1.0
8    NaN
dtype: float64
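In both outputs the `NaN` ends up last, which is the default placement regardless of sort direction. The sketch below shows two optional parameters of `sort_values()`: `na_position` to move missing values to the front, and `kind='stable'` to keep tied values in their original order.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 2, 3, 1, np.nan])

# Put missing values first instead of last.
na_first = s.sort_values(na_position='first')
print(na_first)

# A stable sort keeps equal values in their original relative order.
stable = s.sort_values(kind='stable')
print(stable.index.tolist())
```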


In [26]:
# Replace values
print("Replace 2 with 200:")
print(s.replace(2, 200))

Replace 2 with 200:
0      1.0
1    200.0
2      3.0
3      4.0
4      5.0
5    200.0
6      3.0
7      1.0
8      NaN
dtype: float64
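`replace()` also accepts a dictionary, which allows several substitutions in one call, and it can target `NaN` as well. A minimal sketch; the replacement values are arbitrary.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 2, 3, 1, np.nan])

# Map several old values to new ones at once.
mapped = s.replace({1: 100, 2: 200})
print(mapped)

# Replace the missing value with 0.
no_na = s.replace(np.nan, 0)
print(no_na)
```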


##### Conclusion

In this notebook, we've explored:

1. Creating interval ranges with `interval_range()`:
   - Basic usage with numeric values
   - Using datetime values
   - Specifying frequency
   - Controlling interval closure
   - Using IntervalIndex with DataFrames

2. Series methods:
   - The `abs()` method for getting absolute values
   - Finding values closest to a target
   - Basic statistics methods
   - Counting and handling NA values
   - Value counts
   - Cumulative operations
   - Filtering, applying functions, sorting, and replacing values

Together, these tools cover common steps in a pandas workflow: grouping values into intervals, and summarizing, cleaning, and transforming Series data.