#### Pandas Series Methods - Part 45

This notebook covers important Series methods in pandas, including `interpolate()`, `isin()`, `notna()`, `notnull()`, and `nsmallest()`.

In [1]:
import pandas as pd
import numpy as np

##### Series.interpolate()

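The `interpolate()` method fills NaN values in a Series or DataFrame. The `method` parameter chooses the interpolation strategy. The default, `'linear'`, ignores the index and treats the values as equally spaced. `'time'` and `'index'` use the index values instead. Methods such as `'polynomial'` (which needs an `order`) are delegated to SciPy.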

In [2]:
# Basic example with linear interpolation
s = pd.Series([0, 1, np.nan, 3])
print("Original Series:")
print(s)

print("\nInterpolated Series:")
print(s.interpolate())

Original Series:
0    0.0
1    1.0
2    NaN
3    3.0
dtype: float64

Interpolated Series:
0    0.0
1    1.0
2    2.0
3    3.0
dtype: float64


In [3]:
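# Pad (forward fill) with limit
# Note: newer pandas releases deprecate interpolate() with method='pad' and on
# object-dtype data; s.ffill(limit=2) is the forward-compatible equivalent.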
s = pd.Series([np.nan, "single_one", np.nan,
              "fill_two_more", np.nan, np.nan, np.nan,
              4.71, np.nan])
print("Original Series:")
print(s)

print("\nInterpolated Series with pad method and limit=2:")
print(s.interpolate(method='pad', limit=2))

Original Series:
0              NaN
1       single_one
2              NaN
3    fill_two_more
4              NaN
5              NaN
6              NaN
7             4.71
8              NaN
dtype: object

Interpolated Series with pad method and limit=2:
0              NaN
1       single_one
2       single_one
3    fill_two_more
4    fill_two_more
5    fill_two_more
6              NaN
7             4.71
8             4.71
dtype: object


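Because `method='pad'` in `interpolate()` is on a deprecation path, here is a minimal sketch of the same forward fill written with `Series.ffill()`. It assumes a pandas version where `ffill` accepts `limit`, which has long been the case. At most two consecutive NaNs after each valid value are filled, so position 6 stays missing exactly as in the cell above.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, "single_one", np.nan,
               "fill_two_more", np.nan, np.nan, np.nan,
               4.71, np.nan])

# Forward-fill at most 2 consecutive NaNs after each valid value.
# The leading NaN has nothing before it, so it stays NaN.
filled = s.ffill(limit=2)
print(filled)
```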


In [4]:
# Polynomial interpolation (delegated to SciPy, which must be installed)
s = pd.Series([0, 2, np.nan, 8])
print("Original Series:")
print(s)

print("\nInterpolated Series with polynomial method:")
print(s.interpolate(method='polynomial', order=2))

Original Series:
0    0.0
1    2.0
2    NaN
3    8.0
dtype: float64

Interpolated Series with polynomial method:
0    0.000000
1    2.000000
2    4.666667
3    8.000000
dtype: float64
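Only three points are known, so exactly one quadratic passes through them. A degree-2 fit with NumPy's `np.polyfit` therefore gives an independent check of the 4.666667 value, which is 14/3. This is a cross-check sketch, not how pandas computes the result internally: pandas hands `order` to a SciPy routine. The index labels serve as the x-coordinates in both cases here.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 2, np.nan, 8])

# Use the non-missing points as (x, y) pairs: x = index label, y = value.
known = s.dropna()
coeffs = np.polyfit(known.index, known.values, deg=2)  # highest power first

# Evaluate the fitted quadratic at the missing position (label 2).
estimate = np.polyval(coeffs, 2)
print(round(estimate, 6))  # 4.666667
```

The fitted curve is y = x²/3 + 5x/3, which gives 4/3 + 10/3 = 14/3 at x = 2.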


In [5]:
# Interpolation in a DataFrame
df = pd.DataFrame([(0.0, np.nan, -1.0, 1.0),
                   (np.nan, 2.0, np.nan, np.nan),
                   (2.0, 3.0, np.nan, 9.0),
                   (np.nan, 4.0, -4.0, 16.0)],
                  columns=list('abcd'))
print("Original DataFrame:")
print(df)

print("\nInterpolated DataFrame (linear, forward direction, along axis=0):")
print(df.interpolate(method='linear', limit_direction='forward', axis=0))

Original DataFrame:
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  NaN  2.0  NaN   NaN
2  2.0  3.0  NaN   9.0
3  NaN  4.0 -4.0  16.0

Interpolated DataFrame (linear, forward direction, along axis=0):
     a    b    c     d
0  0.0  NaN -1.0   1.0
1  1.0  2.0 -2.0   5.0
2  2.0  3.0 -3.0   9.0
3  2.0  4.0 -4.0  16.0
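The `limit_direction` argument above controls which edge NaNs get filled. A related parameter, `limit_area`, can restrict filling to gaps that sit between valid values. The sketch below compares both on a small made-up Series. Note that in column `a` above, the trailing NaN was filled with the last valid value: linear interpolation does not extrapolate a trend, it repeats the edge value.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# Default (limit_direction='forward'): the leading NaN stays, and the
# trailing NaN takes the last valid value.
default = s.interpolate()

# Fill in both directions: the leading NaN takes the first valid value.
both = s.interpolate(limit_direction='both')

# Only fill NaNs surrounded by valid values; leave the edges alone.
inside = s.interpolate(limit_area='inside')

print(pd.DataFrame({'original': s, 'default': default,
                    'both': both, 'inside': inside}))
```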


In [6]:
# Polynomial interpolation on a single column
print("Original 'd' column:")
print(df['d'])

print("\nInterpolated 'd' column with polynomial method:")
print(df['d'].interpolate(method='polynomial', order=2))

Original 'd' column:
0     1.0
1     NaN
2     9.0
3    16.0
Name: d, dtype: float64

Interpolated 'd' column with polynomial method:
0     1.0
1     4.0
2     9.0
3    16.0
Name: d, dtype: float64


##### Series.isin()

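The `isin()` method checks whether each element of a Series is contained in a passed list-like of values, such as a list, set, or another Series. It returns a boolean Series of the same length that is `True` wherever the element matches one of those values. Passing a single string instead of a list-like raises a `TypeError`.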

In [7]:
# Basic example
s = pd.Series(['apple', 'banana', 'cherry', 'date', 'elderberry'])
print("Original Series:")
print(s)

print("\nCheck if values are in ['apple', 'cherry', 'fig']:")
print(s.isin(['apple', 'cherry', 'fig']))

Original Series:
0         apple
1        banana
2        cherry
3          date
4    elderberry
dtype: object

Check if values are in ['apple', 'cherry', 'fig']:
0     True
1    False
2     True
3    False
4    False
dtype: bool


In [8]:
# Using isin() to filter a Series
fruits = ['apple', 'cherry']
filtered = s[s.isin(fruits)]
print("Filtered Series:")
print(filtered)

Filtered Series:
0     apple
2    cherry
dtype: object


In [9]:
# isin() accepts any list-like, including a set; the result is the same as with a list
fruits_set = {'apple', 'cherry', 'fig'}
print("Check if values are in a set:")
print(s.isin(fruits_set))

Check if values are in a set:
0     True
1    False
2     True
3    False
4    False
dtype: bool
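Conceptually, `isin()` produces the same mask as OR-ing together one equality test per candidate value. It does this in a single call, however many values you pass. The sketch below builds the chained version with `functools.reduce` and confirms that the two masks agree.

```python
import operator
from functools import reduce

import pandas as pd

s = pd.Series(['apple', 'banana', 'cherry', 'date', 'elderberry'])
values = ['apple', 'cherry', 'fig']

# One call, regardless of how many candidate values there are.
mask_isin = s.isin(values)

# Equivalent: (s == 'apple') | (s == 'cherry') | (s == 'fig')
mask_chained = reduce(operator.or_, (s == v for v in values))

print(mask_isin.equals(mask_chained))  # True
```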


##### Series.notna() and Series.notnull()

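The `notna()` and `notnull()` methods detect existing (non-missing) values. They return a boolean Series (or DataFrame) that is `True` where a value is present, and `notnull()` is an alias of `notna()`. `None`, `NaN`, and `NaT` count as missing. Empty strings and `np.inf` count as present, which is why the empty `name` in the DataFrame example is reported as `True`.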

In [11]:
# Basic example
s = pd.Series([5, 6, np.nan])
print("Original Series:")
print(s)

print("\nUsing notna():")
print(s.notna())

print("\nUsing notnull() (alias of notna()):")
print(s.notnull())

Original Series:
0    5.0
1    6.0
2    NaN
dtype: float64

Using notna():
0     True
1     True
2    False
dtype: bool

Using notnull() (alias of notna()):
0     True
1     True
2    False
dtype: bool


In [12]:
# Using notna() to filter a Series
filtered = s[s.notna()]
print("Filtered Series (non-NA values only):")
print(filtered)

Filtered Series (non-NA values only):
0    5.0
1    6.0
dtype: float64
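Two equivalences are useful to keep in mind here. `notna()` is the element-wise negation of `isna()`. Boolean indexing with `notna()` also keeps the same entries as `dropna()`. A quick check on the same three-element Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 6, np.nan])

# notna() is the element-wise negation of isna().
print(s.notna().equals(~s.isna()))  # True

# Boolean indexing with notna() keeps the same rows as dropna().
print(s[s.notna()].equals(s.dropna()))  # True
```

`dropna()` reads more directly when the goal is simply to discard missing values. The explicit mask is handy when you want to combine it with other conditions, for example `s[s.notna() & (s > 5)]`.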


In [14]:
# Using notna() with a DataFrame
df = pd.DataFrame({
    'age': [5, 6, np.nan],
    'born': [pd.NaT, pd.Timestamp('1939-05-27'), pd.Timestamp('1940-04-25')],
    'name': ['Alfred', 'Batman', ''],
    'toy': [None, 'Batmobile', 'Joker']
})
print("Original DataFrame:")
print(df)

print("\nUsing notna():")
print(df.notna())

Original DataFrame:
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker

Using notna():
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True


##### Series.nsmallest()

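The `nsmallest()` method returns the `n` smallest values (default `n=5`) in ascending order, keeping their original index labels. When duplicates straddle the cutoff, the `keep` parameter decides which ones survive: `'first'` (the default), `'last'`, or `'all'`.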

In [15]:
# Basic example
s = pd.Series([10, 3, 8, 5, 2, 7, 1, 9, 4, 6])
print("Original Series:")
print(s)

print("\nSmallest 3 elements:")
print(s.nsmallest(3))

Original Series:
0    10
1     3
2     8
3     5
4     2
5     7
6     1
7     9
8     4
9     6
dtype: int64

Smallest 3 elements:
6    1
4    2
1    3
dtype: int64
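When there are no ties, `nsmallest(n)` returns the same entries, in the same order, as a full ascending sort followed by `head(n)`. The pandas documentation notes that `nsmallest()` is faster than `sort_values().head(n)` when `n` is small relative to the size of the Series. A quick check on the Series above:

```python
import pandas as pd

s = pd.Series([10, 3, 8, 5, 2, 7, 1, 9, 4, 6])

# Same values and index labels as sorting everything and taking the top 3.
print(s.nsmallest(3).equals(s.sort_values().head(3)))  # True

# n defaults to 5.
print(len(s.nsmallest()))  # 5
```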


In [16]:
# Handling duplicate values with different 'keep' options
s = pd.Series([3, 5, 2, 7, 2, 9, 1, 2, 6])
print("Original Series with duplicates:")
print(s)

print("\nSmallest 3 elements (keep='first'):")
print(s.nsmallest(3, keep='first'))

print("\nSmallest 3 elements (keep='last'):")
print(s.nsmallest(3, keep='last'))

print("\nSmallest 3 elements (keep='all'):")
print(s.nsmallest(3, keep='all'))

Original Series with duplicates:
0    3
1    5
2    2
3    7
4    2
5    9
6    1
7    2
8    6
dtype: int64

Smallest 3 elements (keep='first'):
6    1
2    2
4    2
dtype: int64

Smallest 3 elements (keep='last'):
6    1
7    2
4    2
dtype: int64

Smallest 3 elements (keep='all'):
6    1
2    2
4    2
7    2
dtype: int64
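The `keep='all'` result can be reproduced by hand. Take the n-th smallest value (counting duplicates) as a cutoff and select every element at or below it. That is why four rows come back even though `n=3`. This is a sketch of the selection rule as a cross-check, not the pandas implementation.

```python
import pandas as pd

s = pd.Series([3, 5, 2, 7, 2, 9, 1, 2, 6])
n = 3

all_ties = s.nsmallest(n, keep='all')

# The n-th smallest value, counting duplicates, acts as the cutoff.
cutoff = s.sort_values().iloc[n - 1]  # here: 2
manual = s[s <= cutoff]

# Compare the selected index labels; ordering may differ between the two.
print(sorted(all_ties.index) == sorted(manual.index))  # True
```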


##### Practical Examples

Let's explore some practical examples of these methods.

###### Example 1: Filling Missing Values in Time Series Data

In [17]:
# Create a time series with missing values
dates = pd.date_range('2023-01-01', periods=10, freq='D')
values = [10, 11, np.nan, np.nan, 14, 15, np.nan, 17, np.nan, 19]
ts = pd.Series(values, index=dates)
print("Original Time Series:")
print(ts)

Original Time Series:
2023-01-01    10.0
2023-01-02    11.0
2023-01-03     NaN
2023-01-04     NaN
2023-01-05    14.0
2023-01-06    15.0
2023-01-07     NaN
2023-01-08    17.0
2023-01-09     NaN
2023-01-10    19.0
Freq: D, dtype: float64


In [18]:
# Linear interpolation
ts_linear = ts.interpolate(method='linear')
print("Linear Interpolation:")
print(ts_linear)

Linear Interpolation:
2023-01-01    10.0
2023-01-02    11.0
2023-01-03    12.0
2023-01-04    13.0
2023-01-05    14.0
2023-01-06    15.0
2023-01-07    16.0
2023-01-08    17.0
2023-01-09    18.0
2023-01-10    19.0
Freq: D, dtype: float64


In [19]:
# Polynomial interpolation
ts_poly = ts.interpolate(method='polynomial', order=2)
print("Polynomial Interpolation:")
print(ts_poly)

Polynomial Interpolation:
2023-01-01    10.0
2023-01-02    11.0
2023-01-03    12.0
2023-01-04    13.0
2023-01-05    14.0
2023-01-06    15.0
2023-01-07    16.0
2023-01-08    17.0
2023-01-09    18.0
2023-01-10    19.0
Freq: D, dtype: float64


In [20]:
# Time-based interpolation (specific to time series)
ts_time = ts.interpolate(method='time')
print("Time-based Interpolation:")
print(ts_time)

Time-based Interpolation:
2023-01-01    10.0
2023-01-02    11.0
2023-01-03    12.0
2023-01-04    13.0
2023-01-05    14.0
2023-01-06    15.0
2023-01-07    16.0
2023-01-08    17.0
2023-01-09    18.0
2023-01-10    19.0
Freq: D, dtype: float64
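All three methods agree above for two reasons: the observations are evenly spaced one day apart, and the known values lie on a straight line. The difference between `'linear'` and `'time'` only shows up with irregular timestamps. The sketch below uses three made-up dates with a 1-day gap followed by a 3-day gap. `'linear'` treats the gap as evenly spaced positions, while `'time'` weights by elapsed time.

```python
import numpy as np
import pandas as pd

# Hypothetical irregular timestamps: 1 day, then 3 days between observations.
idx = pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-05'])
irregular = pd.Series([10.0, np.nan, 40.0], index=idx)

# 'linear' ignores the index, so the missing value is the midpoint of 10 and 40.
by_position = irregular.interpolate(method='linear')

# 'time' uses elapsed time: Jan 2 is 1 of the 4 days from Jan 1 to Jan 5,
# so the value is 10 + (40 - 10) * 1/4.
by_time = irregular.interpolate(method='time')

print(by_position.iloc[1])        # 25.0
print(round(by_time.iloc[1], 6))  # 17.5
```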


###### Example 2: Filtering Data Based on Categories

In [21]:
# Create a DataFrame with product data
products = pd.DataFrame({
    'product_id': range(1, 11),
    'product_name': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard', 
                     'Mouse', 'Headphones', 'Speaker', 'Camera', 'Printer'],
    'category': ['Electronics', 'Electronics', 'Electronics', 'Electronics', 'Accessories',
                'Accessories', 'Audio', 'Audio', 'Electronics', 'Office'],
    'price': [1200, 800, 500, 300, 80, 50, 150, 200, 600, 250]
})
print("Product Data:")
print(products)

Product Data:
   product_id product_name     category  price
0           1       Laptop  Electronics   1200
1           2        Phone  Electronics    800
2           3       Tablet  Electronics    500
3           4      Monitor  Electronics    300
4           5     Keyboard  Accessories     80
5           6        Mouse  Accessories     50
6           7   Headphones        Audio    150
7           8      Speaker        Audio    200
8           9       Camera  Electronics    600
9          10      Printer       Office    250


In [22]:
# Filter products by category
selected_categories = ['Electronics', 'Audio']
filtered_products = products[products['category'].isin(selected_categories)]
print("Filtered Products (Electronics and Audio only):")
print(filtered_products)

Filtered Products (Electronics and Audio only):
   product_id product_name     category  price
0           1       Laptop  Electronics   1200
1           2        Phone  Electronics    800
2           3       Tablet  Electronics    500
3           4      Monitor  Electronics    300
6           7   Headphones        Audio    150
7           8      Speaker        Audio    200
8           9       Camera  Electronics    600


In [23]:
# Filter products by name
excluded_products = ['Laptop', 'Phone', 'Tablet']
filtered_products = products[~products['product_name'].isin(excluded_products)]
print("Filtered Products (excluding Laptop, Phone, and Tablet):")
print(filtered_products)

Filtered Products (excluding Laptop, Phone, and Tablet):
   product_id product_name     category  price
3           4      Monitor  Electronics    300
4           5     Keyboard  Accessories     80
5           6        Mouse  Accessories     50
6           7   Headphones        Audio    150
7           8      Speaker        Audio    200
8           9       Camera  Electronics    600
9          10      Printer       Office    250


###### Example 3: Finding Outliers in Data

In [24]:
# Create a Series with random data
np.random.seed(42)
data = pd.Series(np.random.normal(100, 15, 100))
print("Summary Statistics:")
print(data.describe())

Summary Statistics:
count    100.000000
mean      98.442302
std       13.622526
min       60.703823
25%       90.986415
50%       98.095656
75%      106.089281
max      127.784173
dtype: float64


In [25]:
# Find the 5 smallest values (potential lower outliers)
lower_outliers = data.nsmallest(5)
print("5 Smallest Values:")
print(lower_outliers)

5 Smallest Values:
74    60.703823
79    70.186466
37    70.604948
13    71.300796
49    73.554398
dtype: float64


In [26]:
# Find the 5 largest values (potential upper outliers)
upper_outliers = data.nlargest(5)
print("5 Largest Values:")
print(upper_outliers)

5 Largest Values:
31    127.784173
6     123.688192
73    123.469655
71    123.070548
3     122.845448
dtype: float64


In [27]:
# Define outliers using IQR method
Q1 = data.quantile(0.25)
Q3 = data.quantile(0.75)
IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

print(f"Lower bound: {lower_bound}")
print(f"Upper bound: {upper_bound}")

# Find outliers
outliers = data[(data < lower_bound) | (data > upper_bound)]
print("\nOutliers:")
print(outliers)

Lower bound: 68.33211618611159
Upper bound: 128.74357953662252

Outliers:
74    60.703823
dtype: float64
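The cells above use `nsmallest()` and `nlargest()` only to eyeball candidate extremes, then apply the IQR fences separately. Below is a minimal sketch that combines the two steps and keeps only the extreme values that also fall outside the fences. The helper names `iqr_bounds` and `extreme_outliers` are hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def iqr_bounds(series: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Hypothetical helper: return the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def extreme_outliers(series: pd.Series, n: int = 5, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: among the n smallest and n largest values,
    keep only those lying outside the IQR fences."""
    lower, upper = iqr_bounds(series, k)
    candidates = pd.concat([series.nsmallest(n), series.nlargest(n)])
    return candidates[(candidates < lower) | (candidates > upper)]


np.random.seed(42)
data = pd.Series(np.random.normal(100, 15, 100))
print(extreme_outliers(data))
```

This helper only inspects the `n` most extreme values on each side, so it misses outliers when more than `n` values lie beyond a fence. The boolean mask in the previous cell remains the complete check.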


##### Summary

In this notebook, we've explored several important Series methods in pandas:

1. **Series.interpolate()**: Fills NaN values using various interpolation methods.
2. **Series.isin()**: Checks whether each element of a Series is contained in a given list-like of values.
3. **Series.notna()** and **Series.notnull()**: Detect existing (non-missing) values.
4. **Series.nsmallest()**: Returns the smallest n elements in a Series.

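Together these methods cover common data-cleaning and filtering tasks in pandas: filling gaps, selecting rows by membership, masking missing values, and pulling out the most extreme values.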