#### Pandas Tutorial - Part 42

This notebook covers several Series methods (most of which also work on DataFrames), including:
- Arithmetic operations with `add()`
- Label manipulation with `add_prefix()` and `add_suffix()`
- Boolean operations with `bool()` and boolean indexing
- Categorical data with `cat` accessor
- Value clipping with `clip()`

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

##### Arithmetic Operations with Series

Pandas Series support element-wise arithmetic through operators (`+`, `-`, `*`, `/`) and through equivalent methods (`add()`, `sub()`, `mul()`, `div()`). The method forms accept a `fill_value` argument for handling missing values.

### The `add()` Method

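The `add()` method performs element-wise addition between two Series, or between a Series and a scalar. Two Series are first aligned on their index labels. The optional `fill_value` argument substitutes a value for an entry that is missing on one side (a `NaN`, or a label present in only one Series) before the addition; if the entry is missing on both sides, the result is still `NaN`.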

In [2]:
# Create two Series with some missing values
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])

print("Series a:")
print(a)
print("\nSeries b:")
print(b)

Series a:
a    1.0
b    1.0
c    1.0
d    NaN
dtype: float64

Series b:
a    1.0
b    NaN
d    1.0
e    NaN
dtype: float64


In [3]:
# Basic addition: NaN values and labels missing from either Series yield NaN
print("a + b:")
print(a + b)

a + b:
a    2.0
b    NaN
c    NaN
d    NaN
e    NaN
dtype: float64


In [4]:
# Using add() with fill_value=0: an operand missing on one side is treated as 0;
# 'e' is missing on both sides, so it stays NaN
print("a.add(b, fill_value=0):")
print(a.add(b, fill_value=0))

a.add(b, fill_value=0):
a    2.0
b    1.0
c    1.0
d    1.0
e    NaN
dtype: float64


In [5]:
# Adding a scalar value
print("a.add(10):")
print(a.add(10))

a.add(10):
a    11.0
b    11.0
c    11.0
d     NaN
dtype: float64

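To make the `fill_value` rule concrete, the following sketch rebuilds `a.add(b, fill_value=0)` by hand. It assumes the same `a` and `b` as above; the names `a_aligned`, `b_aligned`, `both_missing`, and `manual` are illustrative and not part of pandas.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])

# Step 1: align both Series on the union of their labels (outer join)
a_aligned, b_aligned = a.align(b)

# Step 2: remember where BOTH sides are missing; these must stay NaN
both_missing = a_aligned.isna() & b_aligned.isna()

# Step 3: fill the remaining gaps with 0, add, then restore the NaNs
manual = (a_aligned.fillna(0) + b_aligned.fillna(0)).mask(both_missing)

expected = a.add(b, fill_value=0)
print(manual)
print("Matches a.add(b, fill_value=0):", manual.equals(expected))
```

This is only a description of the observable behaviour, not of how pandas implements it internally.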

### Other Arithmetic Operations

Subtraction, multiplication, and division have matching methods: `sub()`, `mul()`, and `div()`. Each accepts `fill_value` in the same way as `add()`.

In [6]:
# Subtraction
print("a.sub(b, fill_value=0):")
print(a.sub(b, fill_value=0))

# Multiplication
print("\na.mul(b, fill_value=1):")
print(a.mul(b, fill_value=1))

# Division
print("\na.div(b, fill_value=1):")
print(a.div(b, fill_value=1))

a.sub(b, fill_value=0):
a    0.0
b    1.0
c    1.0
d   -1.0
e    NaN
dtype: float64

a.mul(b, fill_value=1):
a    1.0
b    1.0
c    1.0
d    1.0
e    NaN
dtype: float64

a.div(b, fill_value=1):
a    1.0
b    1.0
c    1.0
d    1.0
e    NaN
dtype: float64

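Because every non-missing value in `a` and `b` is 1, the results above do not show clearly which operand was filled. The sketch below uses distinct values to make this visible. The Series `x` and `y` are made up for illustration; `rsub()` is the reversed form of `sub()` and computes `other - self`.

```python
import pandas as pd

x = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
y = pd.Series([2, 5], index=['b', 'c'])   # no label 'a'

# Label 'a' is missing from y, so fill_value stands in for y['a']
diff = x.sub(y, fill_value=0)     # a: 10 - 0, b: 20 - 2, c: 30 - 5
ratio = x.div(y, fill_value=1)    # a: 10 / 1, b: 20 / 2, c: 30 / 5

# Reversed subtraction: y - x, again with 0 standing in for y['a']
rdiff = x.rsub(y, fill_value=0)   # a: 0 - 10, b: 2 - 20, c: 5 - 30

print(diff)
print(ratio)
print(rdiff)
```

Choosing the identity element of the operation (0 for addition and subtraction, 1 for multiplication and division) leaves the present operand unchanged, which is often, though not always, what you want.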

##### Label Manipulation

Pandas provides methods to rewrite the index labels of a Series and the column labels of a DataFrame. The two methods below return a new object and leave the original unchanged.

### The `add_prefix()` Method

The `add_prefix()` method adds a prefix to the labels of a Series or the column names of a DataFrame.

In [7]:
# Create a simple Series
s = pd.Series([1, 2, 3, 4])
print("Original Series:")
print(s)

Original Series:
0    1
1    2
2    3
3    4
dtype: int64


In [8]:
# Add prefix to Series labels
s_prefixed = s.add_prefix('item_')
print("Series with prefixed labels:")
print(s_prefixed)

Series with prefixed labels:
item_0    1
item_1    2
item_2    3
item_3    4
dtype: int64


In [9]:
# Create a simple DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]})
print("Original DataFrame:")
print(df)

Original DataFrame:
   A  B
0  1  3
1  2  4
2  3  5
3  4  6


In [10]:
# Add prefix to DataFrame column names
df_prefixed = df.add_prefix('col_')
print("DataFrame with prefixed column names:")
print(df_prefixed)

DataFrame with prefixed column names:
   col_A  col_B
0      1      3
1      2      4
2      3      5
3      4      6


### The `add_suffix()` Method

The `add_suffix()` method adds a suffix to the labels of a Series or the column names of a DataFrame.

In [11]:
# Add suffix to Series labels
s_suffixed = s.add_suffix('_value')
print("Series with suffixed labels:")
print(s_suffixed)

Series with suffixed labels:
0_value    1
1_value    2
2_value    3
3_value    4
dtype: int64


In [12]:
# Add suffix to DataFrame column names
df_suffixed = df.add_suffix('_col')
print("DataFrame with suffixed column names:")
print(df_suffixed)

DataFrame with suffixed column names:
   A_col  B_col
0      1      3
1      2      4
2      3      5
3      4      6

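As a rough mental model, `add_prefix()` and `add_suffix()` behave like `rename()` with a string-formatting function. The sketch below checks that equivalence on a small DataFrame and also shows that integer labels become strings. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# add_prefix() on a DataFrame rewrites the column labels...
via_prefix = df.add_prefix('col_')
# ...which gives the same result as rename() with a function
via_rename = df.rename(columns=lambda name: f'col_{name}')
print(via_prefix.equals(via_rename))

# The original DataFrame is left untouched
print(df.columns.tolist())

# On a Series the index labels are rewritten; integers become strings
s = pd.Series([1, 2])
suffixed_labels = s.add_suffix('_value').index.tolist()
print(suffixed_labels)
```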

##### Boolean Operations

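A Series with more than one element has no single truth value, so using it directly in a boolean context (for example, `if s:`) raises a `ValueError`. Pandas instead offers explicit ways to reduce a Series to a boolean and to filter with boolean masks.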

### The `bool()` Method

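The `bool()` method returns the boolean value of a Series that contains exactly one boolean element. It raises a `ValueError` if the Series has more than one element or if the single element is not boolean. Note that recent pandas versions (2.1 and later) deprecate `Series.bool()` and emit a `FutureWarning` when it is called.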

In [13]:
# Create a single-element boolean Series
s_true = pd.Series([True])
s_false = pd.Series([False])

print(f"s_true.bool(): {s_true.bool()}")
print(f"s_false.bool(): {s_false.bool()}")

s_true.bool(): True
s_false.bool(): False



In [14]:
# Try bool() on a multi-element Series
try:
    s_multi = pd.Series([True, False])
    s_multi.bool()
except ValueError as e:
    print(f"Error: {e}")

Error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().



In [15]:
# Try bool() on a non-boolean Series
try:
    s_nonbool = pd.Series([1])
    s_nonbool.bool()
except ValueError as e:
    print(f"Error: {e}")

Error: bool cannot act on a non-boolean single element Series

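Since `bool()` is deprecated in newer pandas releases, it can help to know the explicit alternatives that the error message above points to. The sketch below is one possible set of replacements, not an official migration recipe; the variable names are illustrative.

```python
import pandas as pd

# Single-element Series: extract the scalar explicitly
s_true = pd.Series([True])
flag = bool(s_true.item())      # item() raises ValueError unless there is exactly one element

# Multi-element Series: state how the values should be reduced
s_multi = pd.Series([True, False])
any_true = bool(s_multi.any())  # True if at least one element is True
all_true = bool(s_multi.all())  # True only if every element is True
is_empty = s_multi.empty        # True if the Series has no elements

print(flag, any_true, all_true, is_empty)
```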


### Boolean Indexing

Boolean indexing filters a Series with a boolean mask of the same length: only the elements where the mask is `True` are kept, and they retain their original index labels.

In [16]:
# Create a Series
s = pd.Series([1, 2, 3, 4, 5])

# Filter values greater than 2
mask = s > 2
print("Boolean mask:")
print(mask)

print("\nFiltered Series:")
print(s[mask])

Boolean mask:
0    False
1    False
2     True
3     True
4     True
dtype: bool

Filtered Series:
2    3
3    4
4    5
dtype: int64


In [17]:
# Multiple conditions: combine masks with & and wrap each comparison in parentheses
mask = (s > 2) & (s < 5)
print("Values between 2 and 5 (exclusive):")
print(s[mask])

Values between 2 and 5 (exclusive):
2    3
3    4
dtype: int64

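Masks can also be combined with `|` (or) and negated with `~` (not). The short sketch below shows both, plus `between()` as a more readable way to express a range check. Note that `between()` is inclusive at both ends by default, unlike the strict comparisons used above.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# OR: values below 2 or above 4
outside = s[(s < 2) | (s > 4)]

# NOT: invert a mask with ~
not_greater = s[~(s > 2)]

# Range check, inclusive on both ends by default
in_range = s[s.between(3, 4)]

print(outside.tolist())
print(not_greater.tolist())
print(in_range.tolist())
```

The parentheses matter because `&`, `|`, and `~` bind more tightly than comparison operators in Python.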

##### Categorical Data with the `cat` Accessor

The `cat` accessor exposes category-specific properties and methods on a Series with `category` dtype. The methods shown below return a new Series; `s` itself is not modified.

In [18]:
# Create a categorical Series
s = pd.Series(['a', 'b', 'c', 'a', 'b', 'c'], dtype='category')
print("Categorical Series:")
print(s)
print("\nData type:", s.dtype)

Categorical Series:
0    a
1    b
2    c
3    a
4    b
5    c
dtype: category
Categories (3, object): ['a', 'b', 'c']

Data type: category


In [19]:
# Get categories
print("Categories:")
print(s.cat.categories)

Categories:
Index(['a', 'b', 'c'], dtype='object')

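Alongside `categories`, the accessor exposes `codes`: the integer position of each value's category. The sketch below, using the same data as above, shows how the two fit together. The `code_to_label` dictionary is only an illustration.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'a', 'b', 'c'], dtype='category')

# Each value is stored as an integer code pointing into s.cat.categories
codes = s.cat.codes
print(codes.tolist())

# Mapping from code to category label
code_to_label = dict(enumerate(s.cat.categories))
print(code_to_label)
```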

In [20]:
# Rename categories
s_renamed = s.cat.rename_categories(['A', 'B', 'C'])
print("Series with renamed categories:")
print(s_renamed)

Series with renamed categories:
0    A
1    B
2    C
3    A
4    B
5    C
dtype: category
Categories (3, object): ['A', 'B', 'C']


In [21]:
# Reorder categories
s_reordered = s.cat.reorder_categories(['c', 'b', 'a'])
print("Series with reordered categories:")
print(s_reordered)

Series with reordered categories:
0    a
1    b
2    c
3    a
4    b
5    c
dtype: category
Categories (3, object): ['c', 'b', 'a']


In [22]:
# Add categories
s_added = s.cat.add_categories(['d', 'e'])
print("Series with added categories:")
print(s_added)
print("\nNew categories:")
print(s_added.cat.categories)

Series with added categories:
0    a
1    b
2    c
3    a
4    b
5    c
dtype: category
Categories (5, object): ['a', 'b', 'c', 'd', 'e']

New categories:
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')


In [23]:
# Remove categories
s_removed = s_added.cat.remove_categories(['d'])
print("Series with removed categories:")
print(s_removed)
print("\nRemaining categories:")
print(s_removed.cat.categories)

Series with removed categories:
0    a
1    b
2    c
3    a
4    b
5    c
dtype: category
Categories (4, object): ['a', 'b', 'c', 'e']

Remaining categories:
Index(['a', 'b', 'c', 'e'], dtype='object')


In [24]:
# Set categories
s_set = s.cat.set_categories(['a', 'b', 'c', 'd', 'e'])
print("Series with set categories:")
print(s_set)
print("\nSet categories:")
print(s_set.cat.categories)

Series with set categories:
0    a
1    b
2    c
3    a
4    b
5    c
dtype: category
Categories (5, object): ['a', 'b', 'c', 'd', 'e']

Set categories:
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

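The examples above only remove or replace categories that no value uses. When a category that is in use is dropped, the affected values become `NaN`. The following sketch illustrates this with the same data; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'a', 'b', 'c'], dtype='category')

# set_categories() without 'c': every 'c' value turns into NaN
trimmed = s.cat.set_categories(['a', 'b'])
print(trimmed.isna().tolist())

# remove_categories() on a used category has the same effect
removed = s.cat.remove_categories(['c'])
print(removed.isna().sum())

# remove_unused_categories() only drops categories that no value uses
padded = s.cat.add_categories(['d'])
cleaned = padded.cat.remove_unused_categories()
print(cleaned.cat.categories.tolist())
```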

In [25]:
# Make categories ordered
s_ordered = s.cat.as_ordered()
print("Series with ordered categories:")
print(s_ordered)
print("\nIs ordered:", s_ordered.cat.ordered)

Series with ordered categories:
0    a
1    b
2    c
3    a
4    b
5    c
dtype: category
Categories (3, object): ['a' < 'b' < 'c']

Is ordered: True


In [26]:
# Make categories unordered
s_unordered = s_ordered.cat.as_unordered()
print("Series with unordered categories:")
print(s_unordered)
print("\nIs ordered:", s_unordered.cat.ordered)

Series with unordered categories:
0    a
1    b
2    c
3    a
4    b
5    c
dtype: category
Categories (3, object): ['a', 'b', 'c']

Is ordered: False

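Ordering matters in practice because it enables comparisons, `min()`/`max()`, and sorting by category order rather than alphabetically. The sketch below uses a made-up `sizes` Series to show this.

```python
import pandas as pd

sizes = pd.Series(['small', 'large', 'medium', 'small'], dtype='category')

# Define a meaningful order and mark the categories as ordered
sizes = sizes.cat.reorder_categories(['small', 'medium', 'large'], ordered=True)

# min() and max() follow the category order, not alphabetical order
smallest = sizes.min()
largest = sizes.max()

# Comparisons against a category label are allowed once ordered
bigger_than_small = (sizes > 'small').tolist()

# Sorting also follows the category order
sorted_sizes = sizes.sort_values().tolist()

print(smallest, largest)
print(bigger_than_small)
print(sorted_sizes)
```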

##### Value Clipping with `clip()`

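The `clip()` method limits values to a given range: values below the lower threshold are replaced with the lower threshold, and values above the upper threshold are replaced with the upper threshold. Values already inside the range are left unchanged.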

In [27]:
# Create a DataFrame
data = {'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Original DataFrame:
   col_0  col_1
0      9     -2
1     -3     -7
2      0      6
3     -1      8
4      5     -5


In [28]:
# Clip values between -4 and 6
df_clipped = df.clip(-4, 6)
print("DataFrame with values clipped between -4 and 6:")
print(df_clipped)

DataFrame with values clipped between -4 and 6:
   col_0  col_1
0      6     -2
1     -3     -4
2      0      6
3     -1      6
4      5     -4


In [29]:
# Clip using different thresholds per row: row i is clipped to [t[i], t[i] + 4]
t = pd.Series([2, -4, -1, 6, 3])
print("Threshold Series:")
print(t)

df_row_clipped = df.clip(t, t + 4, axis=0)
print("\nDataFrame with row-specific clipping:")
print(df_row_clipped)

Threshold Series:
0    2
1   -4
2   -1
3    6
4    3
dtype: int64

DataFrame with row-specific clipping:
   col_0  col_1
0      6      2
1     -3     -4
2      0      3
3      6      8
4      5      3

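One way to read the `axis=0` call is that the threshold Series is matched against the row index, so every column is clipped with the same per-row bounds. The sketch below checks that reading by clipping each column separately with `apply()`. It assumes the same `df` and `t` as above; `per_column` and `same` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]})
t = pd.Series([2, -4, -1, 6, 3])

# Row-wise thresholds applied to the whole DataFrame
row_clipped = df.clip(t, t + 4, axis=0)

# Same bounds applied to each column on its own
per_column = df.apply(lambda col: col.clip(t, t + 4))

# Compare element by element
same = bool((row_clipped == per_column).all().all())
print(same)
```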

In [30]:
# Clip a Series
s = pd.Series([1, 10, -5, 3, -10, 8])
print("Original Series:")
print(s)

s_clipped = s.clip(-3, 7)
print("\nSeries with values clipped between -3 and 7:")
print(s_clipped)

Original Series:
0     1
1    10
2    -5
3     3
4   -10
5     8
dtype: int64

Series with values clipped between -3 and 7:
0    1
1    7
2   -3
3    3
4   -3
5    7
dtype: int64

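Both bounds are optional. The sketch below clips the same Series on one side only, using the `lower` and `upper` keyword arguments.

```python
import pandas as pd

s = pd.Series([1, 10, -5, 3, -10, 8])

# Only a lower bound: negative values are raised to 0
floor_only = s.clip(lower=0)

# Only an upper bound: values above 5 are lowered to 5
cap_only = s.clip(upper=5)

print(floor_only.tolist())
print(cap_only.tolist())
```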

In [31]:
# Clip in-place
s_inplace = s.copy()
s_inplace.clip(-3, 7, inplace=True)
print("Series after in-place clipping:")
print(s_inplace)

Series after in-place clipping:
0    1
1    7
2   -3
3    3
4   -3
5    7
dtype: int64


##### Conclusion

In this notebook, we've explored various Series methods in pandas:

1. Arithmetic operations with `add()` and similar methods, including special handling for missing values.
2. Label manipulation with `add_prefix()` and `add_suffix()` for both Series and DataFrames.
3. Boolean operations, including the `bool()` method for single-element Series and boolean indexing for filtering data.
4. Categorical data operations using the `cat` accessor, including managing categories and their order.
5. Value clipping with `clip()` to trim values at specified thresholds, with options for different thresholds per row and in-place modification.

Together, these methods cover common data-preparation steps: combining Series with missing values, relabeling, filtering, managing categories, and bounding values.