# Pandas Series Methods - Part 45

This notebook covers important Series methods in pandas, including `interpolate()`, `isin()`, `notna()`, `notnull()`, and `nsmallest()`.

In [None]:
import pandas as pd
import numpy as np

## Series.interpolate()

The `interpolate()` method fills NaN values in a Series or DataFrame by estimating them from neighboring values. The default, `method='linear'`, ignores the index and treats the values as equally spaced.

In [None]:
# Basic example with linear interpolation
s = pd.Series([0, 1, np.nan, 3])
print("Original Series:")
print(s)

print("\nInterpolated Series:")
print(s.interpolate())

In [None]:
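# Forward fill with a limit
# Note: interpolate(method='pad') is deprecated in recent pandas releases,
# and interpolate() is not meant for object-dtype data such as this mixed
# Series, so ffill() is used here instead.
s = pd.Series([np.nan, "single_one", np.nan,
              "fill_two_more", np.nan, np.nan, np.nan,
              4.71, np.nan])
print("Original Series:")
print(s)

print("\nForward-filled Series with limit=2:")
print(s.ffill(limit=2))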

In [None]:
# Polynomial interpolation (method='polynomial' requires SciPy and an explicit order)
s = pd.Series([0, 2, np.nan, 8])
print("Original Series:")
print(s)

print("\nInterpolated Series with polynomial method:")
print(s.interpolate(method='polynomial', order=2))

In [None]:
# Interpolation in a DataFrame
df = pd.DataFrame([(0.0, np.nan, -1.0, 1.0),
                   (np.nan, 2.0, np.nan, np.nan),
                   (2.0, 3.0, np.nan, 9.0),
                   (np.nan, 4.0, -4.0, 16.0)],
                  columns=list('abcd'))
print("Original DataFrame:")
print(df)

print("\nInterpolated DataFrame (linear, forward direction, along axis=0):")
print(df.interpolate(method='linear', limit_direction='forward', axis=0))

In [None]:
# Polynomial interpolation on a single column (requires SciPy)
print("Original 'd' column:")
print(df['d'])

print("\nInterpolated 'd' column with polynomial method:")
print(df['d'].interpolate(method='polynomial', order=2))
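
The examples above leave open which NaN values actually get filled. The following is a minimal sketch of how `limit` and `limit_area` change that, using a small made-up Series (`gaps` and the other variable names are illustrative only, not part of the earlier cells).

```python
import numpy as np
import pandas as pd

# Toy Series with a leading NaN, a two-value interior gap, and a trailing NaN
gaps = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])

# Defaults: interior gaps are interpolated, the trailing NaN repeats the last
# valid value, and the leading NaN is left untouched
default_fill = gaps.interpolate()

# limit_area='inside' fills only NaNs that lie between two valid values
inside_only = gaps.interpolate(limit_area='inside')

# limit=1 fills at most one consecutive NaN per gap
one_per_gap = gaps.interpolate(limit=1)

print(pd.DataFrame({
    'original': gaps,
    'default': default_fill,
    'inside_only': inside_only,
    'limit_1': one_per_gap,
}))
```

Comparing the columns side by side shows that the leading NaN survives in every case, because the default `limit_direction` is forward.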

## Series.isin()

The `isin()` method checks whether each element of a Series is contained in a passed collection of values. It returns a boolean Series of the same length, with `True` wherever the element matches one of those values.

In [None]:
# Basic example
s = pd.Series(['apple', 'banana', 'cherry', 'date', 'elderberry'])
print("Original Series:")
print(s)

print("\nCheck if values are in ['apple', 'cherry', 'fig']:")
print(s.isin(['apple', 'cherry', 'fig']))

In [None]:
# Using isin() to filter a Series
fruits = ['apple', 'cherry']
filtered = s[s.isin(fruits)]
print("Filtered Series:")
print(filtered)

In [None]:
# Using isin() with a set (any list-like collection of values is accepted)
fruits_set = {'apple', 'cherry', 'fig'}
print("Check if values are in a set:")
print(s.isin(fruits_set))
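
Two details of `isin()` are easy to trip over: the argument must be list-like, and matching is exact, so strings and integers never match each other. A small sketch of both, using made-up Series (`fruit_names`, `numbers`, and `raised_type_error` are illustrative names):

```python
import pandas as pd

fruit_names = pd.Series(['apple', 'banana', 'cherry'])

# A single value must still be wrapped in a list
single_match = fruit_names.isin(['apple'])
print(single_match)

# Passing a bare string raises a TypeError instead of matching
raised_type_error = False
try:
    fruit_names.isin('apple')
except TypeError as exc:
    raised_type_error = True
    print(f"TypeError: {exc}")

# Integers and their string representations are distinct values
numbers = pd.Series([1, 2, 3])
string_match = numbers.isin(['1', '2', '3'])
print(string_match)
```

If the data and the lookup values come from different sources, converting both to a common type first (for example with `astype(str)`) avoids silent non-matches.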

## Series.notna() and Series.notnull()

The `notna()` method detects existing (non-missing) values in a Series; `notnull()` is an alias for it. Both return a boolean Series that is `True` wherever the value is not NA. `None`, `NaN`, and `NaT` count as missing, while empty strings do not.

In [None]:
# Basic example
s = pd.Series([5, 6, np.nan])  # np.NaN was removed in NumPy 2.0; use np.nan
print("Original Series:")
print(s)

print("\nUsing notna():")
print(s.notna())

print("\nUsing notnull() (alias of notna()):")
print(s.notnull())

In [None]:
# Using notna() to filter a Series
filtered = s[s.notna()]
print("Filtered Series (non-NA values only):")
print(filtered)

In [None]:
# Using notna() with a DataFrame
df = pd.DataFrame({
    'age': [5, 6, np.nan],
    'born': [pd.NaT, pd.Timestamp('1939-05-27'), pd.Timestamp('1940-04-25')],
    'name': ['Alfred', 'Batman', ''],
    'toy': [None, 'Batmobile', 'Joker']
})
print("Original DataFrame:")
print(df)

print("\nUsing notna():")
print(df.notna())
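
In the DataFrame above, the empty string in the `name` column is reported as present. The sketch below shows one possible way to treat empty strings as missing before calling `notna()`; the Series `names` and the choice of `mask()` are illustrative assumptions, not the only option.

```python
import numpy as np
import pandas as pd

# Object-dtype Series mixing a real value, an empty string, None, and NaN
names = pd.Series(['Alfred', '', None, np.nan], dtype=object)

# The empty string counts as a present value; None and NaN do not
present = names.notna()
print(present)

# notna() is the element-wise negation of isna()
assert present.equals(~names.isna())

# Replace empty strings with NaN so they are treated as missing too
cleaned = names.mask(names == '')
print(cleaned.notna())
```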

## Series.nsmallest()

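The `nsmallest()` method returns the `n` smallest elements of a Series, sorted in ascending order. The `keep` parameter controls which entries are returned when several elements tie.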

In [None]:
# Basic example
s = pd.Series([10, 3, 8, 5, 2, 7, 1, 9, 4, 6])
print("Original Series:")
print(s)

print("\nSmallest 3 elements:")
print(s.nsmallest(3))

In [None]:
# Handling duplicate values with different 'keep' options
s = pd.Series([3, 5, 2, 7, 2, 9, 1, 2, 6])
print("Original Series with duplicates:")
print(s)

print("\nSmallest 3 elements (keep='first'):")
print(s.nsmallest(3, keep='first'))

print("\nSmallest 3 elements (keep='last'):")
print(s.nsmallest(3, keep='last'))

print("\nSmallest 3 elements (keep='all'):")
print(s.nsmallest(3, keep='all'))
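
To make the effect of `keep` easier to see, the sketch below reuses the same values and inspects which index labels each option selects. It also compares the result with `sort_values().head(n)`, which returns the same values; the variable names are illustrative.

```python
import pandas as pd

# Same values as above: the value 2 appears at positions 2, 4, and 7
dupes = pd.Series([3, 5, 2, 7, 2, 9, 1, 2, 6])

first = dupes.nsmallest(3, keep='first')    # ties resolved toward earlier positions
last = dupes.nsmallest(3, keep='last')      # ties resolved toward later positions
all_ties = dupes.nsmallest(3, keep='all')   # every tied element, so more than n rows

print("first:", sorted(first.index))
print("last: ", sorted(last.index))
print("all:  ", sorted(all_ties.index))

# The selected values match a full sort followed by head(n);
# only the choice among tied index labels can differ
by_sorting = dupes.sort_values().head(3)
print(by_sorting.tolist() == first.tolist())
```

For a small `n` on a large Series, `nsmallest()` avoids sorting the whole Series, which is the usual reason to prefer it over `sort_values().head(n)`.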

## Practical Examples

Let's explore some practical examples of these methods.

### Example 1: Filling Missing Values in Time Series Data

In [None]:
# Create a time series with missing values
dates = pd.date_range('2023-01-01', periods=10, freq='D')
values = [10, 11, np.nan, np.nan, 14, 15, np.nan, 17, np.nan, 19]
ts = pd.Series(values, index=dates)
print("Original Time Series:")
print(ts)

In [None]:
# Linear interpolation
ts_linear = ts.interpolate(method='linear')
print("Linear Interpolation:")
print(ts_linear)

In [None]:
# Polynomial interpolation (requires SciPy)
ts_poly = ts.interpolate(method='polynomial', order=2)
print("Polynomial Interpolation:")
print(ts_poly)

In [None]:
# Time-based interpolation (weights by the DatetimeIndex; the dates here are
# evenly spaced, so the result matches linear interpolation)
ts_time = ts.interpolate(method='time')
print("Time-based Interpolation:")
print(ts_time)
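
With evenly spaced daily data, `method='linear'` and `method='time'` agree. The difference only appears when timestamps are irregular, as in this small made-up series (`irregular`, `by_position`, and `by_time` are illustrative names):

```python
import numpy as np
import pandas as pd

# One day between the first two points, three days between the last two
irregular_index = pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-05'])
irregular = pd.Series([10.0, np.nan, 14.0], index=irregular_index)

# 'linear' ignores the index and takes the midpoint by position
by_position = irregular.interpolate(method='linear')

# 'time' weights by elapsed time: one quarter of the way from 10 to 14
by_time = irregular.interpolate(method='time')

print(pd.DataFrame({'linear': by_position, 'time': by_time}))
```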

### Example 2: Filtering Data Based on Categories

In [None]:
# Create a DataFrame with product data
products = pd.DataFrame({
    'product_id': range(1, 11),
    'product_name': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard', 
                     'Mouse', 'Headphones', 'Speaker', 'Camera', 'Printer'],
    'category': ['Electronics', 'Electronics', 'Electronics', 'Electronics', 'Accessories',
                'Accessories', 'Audio', 'Audio', 'Electronics', 'Office'],
    'price': [1200, 800, 500, 300, 80, 50, 150, 200, 600, 250]
})
print("Product Data:")
print(products)

In [None]:
# Filter products by category
selected_categories = ['Electronics', 'Audio']
filtered_products = products[products['category'].isin(selected_categories)]
print("Filtered Products (Electronics and Audio only):")
print(filtered_products)

In [None]:
# Filter products by name
excluded_products = ['Laptop', 'Phone', 'Tablet']
filtered_products = products[~products['product_name'].isin(excluded_products)]
print("Filtered Products (excluding Laptop, Phone, and Tablet):")
print(filtered_products)

### Example 3: Finding Outliers in Data

In [None]:
# Create a Series with random data
np.random.seed(42)
data = pd.Series(np.random.normal(100, 15, 100))
print("Summary Statistics:")
print(data.describe())

In [None]:
# Find the 5 smallest values (potential lower outliers)
lower_outliers = data.nsmallest(5)
print("5 Smallest Values:")
print(lower_outliers)

In [None]:
# Find the 5 largest values (potential upper outliers)
upper_outliers = data.nlargest(5)
print("5 Largest Values:")
print(upper_outliers)

In [None]:
# Define outliers using IQR method
Q1 = data.quantile(0.25)
Q3 = data.quantile(0.75)
IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

print(f"Lower bound: {lower_bound}")
print(f"Upper bound: {upper_bound}")

# Find outliers
outliers = data[(data < lower_bound) | (data > upper_bound)]
print("\nOutliers:")
print(outliers)
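
The IQR steps above can be wrapped in a small reusable helper. This is only a sketch: the function name `iqr_outliers` and its parameter `k` are hypothetical, and it uses `Series.between()` as an alternative way to express the same bounds check.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the non-missing elements lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    # between() is inclusive on both ends and returns False for NaN
    within_bounds = values.between(q1 - k * iqr, q3 + k * iqr)
    # Exclude NaN explicitly so missing values are not reported as outliers
    return values[~within_bounds & values.notna()]

# Tiny hand-checkable sample: Q1 = 2, Q3 = 4, so the bounds are -1 and 7
sample = pd.Series([1, 2, 3, 4, 100])
found = iqr_outliers(sample)
print(found)
```

Calling `iqr_outliers(data)` on the random Series from this example should select the same elements as the boolean mask used above, since that Series contains no missing values.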

## Summary

In this notebook, we've explored several important Series methods in pandas:

1. **Series.interpolate()**: Fills NaN values using various interpolation methods.
2. **Series.isin()**: Checks whether each element of a Series is contained in a given collection of values.
3. **Series.notna()** and **Series.notnull()**: Detect existing (non-missing) values.
4. **Series.nsmallest()**: Returns the smallest n elements in a Series.

These methods are essential for data cleaning, filtering, and analysis in pandas.