#### Pandas Tutorial - Part 46

This notebook covers various Series methods including:
- Finding smallest values with `nsmallest()`
- Counting unique values with `nunique()`
- Calculating percentage change with `pct_change()`
- Reordering index levels with `reorder_levels()`
- Repeating elements with `repeat()`
- Replacing values with `replace()`

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

##### Finding Smallest Values

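The `nsmallest()` method returns the `n` smallest elements of a Series, sorted in ascending order. `n` defaults to 5, and the `keep` parameter (`'first'`, `'last'`, or `'all'`) controls which entries are returned when values are tied.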

In [2]:
# Create a Series with population data
countries_population = {"Italy": 59000000, "France": 65000000,
                       "Brunei": 434000, "Malta": 434000,
                       "Maldives": 434000, "Iceland": 337000,
                       "Nauru": 11300, "Tuvalu": 11300,
                       "Anguilla": 11300, "Monserat": 5200}
s = pd.Series(countries_population)
print("Countries population:")
print(s)

Countries population:
Italy       59000000
France      65000000
Brunei        434000
Malta         434000
Maldives      434000
Iceland       337000
Nauru          11300
Tuvalu         11300
Anguilla       11300
Monserat        5200
dtype: int64


In [3]:
# Get the 5 smallest values (default)
print("The 5 smallest populations:")
print(s.nsmallest())

The 5 smallest populations:
Monserat      5200
Nauru        11300
Tuvalu       11300
Anguilla     11300
Iceland     337000
dtype: int64


In [4]:
# Get the 3 smallest values
print("The 3 smallest populations (keep='first'):")
print(s.nsmallest(3))

The 3 smallest populations (keep='first'):
Monserat     5200
Nauru       11300
Tuvalu      11300
dtype: int64


In [5]:
# Get the 3 smallest values; among tied values, prefer the last occurrences
print("The 3 smallest populations (keep='last'):")
print(s.nsmallest(3, keep='last'))

The 3 smallest populations (keep='last'):
Monserat     5200
Anguilla    11300
Tuvalu      11300
dtype: int64


In [6]:
# Get the 3 smallest values, keeping every tied entry (so more than 3 rows may be returned)
print("The 3 smallest populations (keep='all'):")
print(s.nsmallest(3, keep='all'))

The 3 smallest populations (keep='all'):
Monserat     5200
Nauru       11300
Tuvalu      11300
Anguilla    11300
dtype: int64
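
As a rough cross-check of the default `keep='first'` behaviour, the sketch below compares `nsmallest(3)` with a full stable sort followed by `head(3)`. The Series `pop` is a shortened, illustrative copy of the population data above, and the `mergesort` kind is chosen here only to make tie order deterministic.

```python
import pandas as pd

# Illustrative subset of the population data, with a three-way tie at 11300
pop = pd.Series({"Iceland": 337000, "Nauru": 11300, "Tuvalu": 11300,
                 "Anguilla": 11300, "Malta": 434000, "Monserat": 5200})

# The 3 smallest values; ties are resolved in order of appearance
by_method = pop.nsmallest(3)

# The same selection via a stable sort of the whole Series
by_sort = pop.sort_values(kind="mergesort").head(3)

print(by_method)
print("Same result:", by_method.equals(by_sort))
```

Sorting the whole Series does more work than needed when `n` is small, which is one reason to prefer `nsmallest()` for this task.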


##### Counting Unique Values

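The `nunique()` method returns the number of distinct values in a Series. Missing values (NaN) are excluded by default; pass `dropna=False` to count NaN as one additional value.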

In [7]:
# Create a Series with some duplicates
s = pd.Series([1, 3, 5, 7, 7])
print("Series:")
print(s)

Series:
0    1
1    3
2    5
3    7
4    7
dtype: int64


In [8]:
# Count unique values
print(f"Number of unique values: {s.nunique()}")

Number of unique values: 4


In [9]:
# Create a Series with NaN values
s_with_nan = pd.Series([1, 3, 5, 7, 7, np.nan, np.nan])
print("Series with NaN values:")
print(s_with_nan)

Series with NaN values:
0    1.0
1    3.0
2    5.0
3    7.0
4    7.0
5    NaN
6    NaN
dtype: float64


In [10]:
# Count unique values, excluding NaN (default)
print(f"Number of unique values (excluding NaN): {s_with_nan.nunique()}")

Number of unique values (excluding NaN): 4


In [11]:
# Count unique values, including NaN
print(f"Number of unique values (including NaN): {s_with_nan.nunique(dropna=False)}")

Number of unique values (including NaN): 5
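
One way to read the two counts above is in terms of `unique()`: the default count matches the number of distinct values after dropping NaN, while `dropna=False` matches the length of `unique()` on the raw Series. The variable name `values` below is just for illustration.

```python
import numpy as np
import pandas as pd

values = pd.Series([1, 3, 5, 7, 7, np.nan, np.nan])

# Default: NaN entries are dropped before counting
default_count = values.nunique()
manual_default = len(values.dropna().unique())

# dropna=False: all NaN entries together count as one distinct value
with_nan_count = values.nunique(dropna=False)
manual_with_nan = len(values.unique())

print(default_count, manual_default)
print(with_nan_count, manual_with_nan)
```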


##### Calculating Percentage Change

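The `pct_change()` method computes the change between each element and a prior element (the previous one by default). The result is expressed as a fraction rather than a percentage: 0.02 means a 2% increase, so multiply by 100 if you need percent values.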

In [12]:
# Create a Series with stock prices
stock_prices = pd.Series([100, 102, 99, 101, 105, 110, 108], 
                         index=pd.date_range('2023-01-01', periods=7))
print("Stock prices:")
print(stock_prices)

Stock prices:
2023-01-01    100
2023-01-02    102
2023-01-03     99
2023-01-04    101
2023-01-05    105
2023-01-06    110
2023-01-07    108
Freq: D, dtype: int64


In [13]:
# Calculate percentage change (default: 1 period)
print("Percentage change (1 period):")
print(stock_prices.pct_change())

Percentage change (1 period):
2023-01-01         NaN
2023-01-02    0.020000
2023-01-03   -0.029412
2023-01-04    0.020202
2023-01-05    0.039604
2023-01-06    0.047619
2023-01-07   -0.018182
Freq: D, dtype: float64


In [14]:
# Calculate percentage change over 2 periods
print("Percentage change (2 periods):")
print(stock_prices.pct_change(periods=2))

Percentage change (2 periods):
2023-01-01         NaN
2023-01-02         NaN
2023-01-03   -0.010000
2023-01-04   -0.009804
2023-01-05    0.060606
2023-01-06    0.089109
2023-01-07    0.028571
Freq: D, dtype: float64
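
The results above can be reproduced by hand with `shift()`. The helper `manual_pct_change` below is a hypothetical name used only for this sketch; it is not part of pandas.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100, 102, 99, 101, 105, 110, 108])

def manual_pct_change(series, periods=1):
    """Fractional change relative to the value `periods` rows earlier."""
    return series / series.shift(periods) - 1

for periods in (1, 2):
    builtin = prices.pct_change(periods=periods)
    manual = manual_pct_change(prices, periods)
    # equal_nan=True treats the leading NaN entries as matching
    print(periods, np.allclose(builtin, manual, equal_nan=True))
```

The first `periods` entries are NaN in both versions because there is no earlier value to compare against.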


In [15]:
# Create a Series with missing values
stock_prices_with_nan = pd.Series([100, np.nan, 99, 101, np.nan, 110, 108], 
                                 index=pd.date_range('2023-01-01', periods=7))
print("Stock prices with missing values:")
print(stock_prices_with_nan)

Stock prices with missing values:
2023-01-01    100.0
2023-01-02      NaN
2023-01-03     99.0
2023-01-04    101.0
2023-01-05      NaN
2023-01-06    110.0
2023-01-07    108.0
Freq: D, dtype: float64


In [16]:
# Calculate percentage change after forward-filling NaN (fill_method='pad', the default in older pandas releases)
print("Percentage change with fill_method='pad':")
print(stock_prices_with_nan.pct_change(fill_method='pad'))

Percentage change with fill_method='pad':
2023-01-01         NaN
2023-01-02    0.000000
2023-01-03   -0.010000
2023-01-04    0.020202
2023-01-05    0.000000
2023-01-06    0.089109
2023-01-07   -0.018182
Freq: D, dtype: float64



In [17]:
# 'ffill' is an alias for 'pad', so the result is identical to the previous cell
print("Percentage change with fill_method='ffill':")
print(stock_prices_with_nan.pct_change(fill_method='ffill'))

Percentage change with fill_method='ffill':
2023-01-01         NaN
2023-01-02    0.000000
2023-01-03   -0.010000
2023-01-04    0.020202
2023-01-05    0.000000
2023-01-06    0.089109
2023-01-07   -0.018182
Freq: D, dtype: float64



In [18]:
# Calculate percentage change after backward-filling NaN (fill_method='bfill')
print("Percentage change with fill_method='bfill':")
print(stock_prices_with_nan.pct_change(fill_method='bfill'))

Percentage change with fill_method='bfill':
2023-01-01         NaN
2023-01-02   -0.010000
2023-01-03    0.000000
2023-01-04    0.020202
2023-01-05    0.089109
2023-01-06    0.000000
2023-01-07   -0.018182
Freq: D, dtype: float64
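
Newer pandas releases (2.1 and later, to my knowledge) deprecate passing a fill method to `pct_change()` and suggest filling explicitly first. The sketch below shows that pattern, assuming the same data as above; `fill_method=None` turns off the built-in filling.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100, np.nan, 99, 101, np.nan, 110, 108],
                   index=pd.date_range("2023-01-01", periods=7))

# Fill explicitly, then disable pct_change's own filling
forward = prices.ffill().pct_change(fill_method=None)
backward = prices.bfill().pct_change(fill_method=None)

# No filling at all: any change that involves a NaN is itself NaN
unfilled = prices.pct_change(fill_method=None)

print(pd.DataFrame({"ffill": forward, "bfill": backward, "none": unfilled}))
```

Filling explicitly also makes the choice visible in the code, which matters here because forward and backward filling attribute the same price move to different dates.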



##### Reordering Index Levels

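The `reorder_levels()` method rearranges the levels of a MultiIndex. Only the order of the levels changes; the rows stay in their original order, so the result is not re-sorted.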

In [19]:
# Create a Series with MultiIndex
midx = pd.MultiIndex.from_tuples([('a', 'one'), ('a', 'two'), ('b', 'one'), ('b', 'two')],
                                names=['letter', 'number'])
s = pd.Series([1, 2, 3, 4], index=midx)
print("Series with MultiIndex:")
print(s)

Series with MultiIndex:
letter  number
a       one       1
        two       2
b       one       3
        two       4
dtype: int64


In [20]:
# Reorder levels by position: 'number' (level 1) first, then 'letter' (level 0)
s_reordered = s.reorder_levels([1, 0])
print("Series with reordered levels:")
print(s_reordered)

Series with reordered levels:
number  letter
one     a         1
two     a         2
one     b         3
two     b         4
dtype: int64


In [21]:
# Create a DataFrame with MultiIndex
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]},
                 index=midx)
print("DataFrame with MultiIndex:")
print(df)

DataFrame with MultiIndex:
               A  B
letter number      
a      one     1  5
       two     2  6
b      one     3  7
       two     4  8


In [22]:
# Reorder levels in DataFrame
df_reordered = df.reorder_levels([1, 0])
print("DataFrame with reordered levels:")
print(df_reordered)

DataFrame with reordered levels:
               A  B
number letter      
one    a       1  5
two    a       2  6
one    b       3  7
two    b       4  8
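
Levels can also be given by name instead of position, and for a two-level index `swaplevel()` produces the same reordering. The sketch below rebuilds the Series from this section and adds a `sort_index()` call to group rows by the new outer level, since reordering alone leaves the row order unchanged.

```python
import pandas as pd

midx = pd.MultiIndex.from_tuples(
    [("a", "one"), ("a", "two"), ("b", "one"), ("b", "two")],
    names=["letter", "number"],
)
s = pd.Series([1, 2, 3, 4], index=midx)

by_position = s.reorder_levels([1, 0])
by_name = s.reorder_levels(["number", "letter"])
swapped = s.swaplevel()  # with two levels, swapping them is the same reordering

print(by_position.equals(by_name), by_position.equals(swapped))

# Group rows under the new outer level
print(by_name.sort_index())
```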


##### Repeating Elements

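The `repeat()` method repeats each element of a Series a given number of times, placing the copies consecutively. Index labels are repeated along with the values, so the result can contain duplicate labels.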

In [23]:
# Create a simple Series
s = pd.Series(['a', 'b', 'c'])
print("Original Series:")
print(s)

Original Series:
0    a
1    b
2    c
dtype: object


In [24]:
# Repeat each element twice
s_repeated = s.repeat(2)
print("Series with each element repeated twice:")
print(s_repeated)

Series with each element repeated twice:
0    a
0    a
1    b
1    b
2    c
2    c
dtype: object


In [25]:
# Repeat elements a different number of times
s_varied = s.repeat([1, 2, 3])
print("Series with elements repeated [1, 2, 3] times:")
print(s_varied)

Series with elements repeated [1, 2, 3] times:
0    a
1    b
1    b
2    c
2    c
2    c
dtype: object


In [26]:
# Repeat each element 0 times, which yields an empty Series
s_zero = s.repeat(0)
print("Series with elements repeated 0 times:")
print(s_zero)

Series with elements repeated 0 times:
Series([], dtype: object)
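
Because `repeat()` duplicates index labels, label-based lookups on the result can return several rows. If a fresh 0..n-1 index is preferred, one option is to follow up with `reset_index(drop=True)`, as in this small sketch.

```python
import pandas as pd

s = pd.Series(["a", "b", "c"])

repeated = s.repeat([1, 2, 3])
print(repeated.index.tolist())    # labels are duplicated along with the values

# Discard the duplicated labels and renumber from 0
renumbered = repeated.reset_index(drop=True)
print(renumbered.index.tolist())
print(renumbered.tolist())
```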


##### Replacing Values

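The `replace()` method replaces values in a Series. The values to replace can be given as a scalar, a list, a dictionary, or a regular expression, and a new Series is returned unless `inplace=True` is passed.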

In [27]:
# Create a Series
s = pd.Series(['apple', 'banana', 'carrot', 'apple', 'banana'])
print("Original Series:")
print(s)

Original Series:
0     apple
1    banana
2    carrot
3     apple
4    banana
dtype: object


In [28]:
# Replace a single value
s_replaced = s.replace('apple', 'orange')
print("Series with 'apple' replaced by 'orange':")
print(s_replaced)

Series with 'apple' replaced by 'orange':
0    orange
1    banana
2    carrot
3    orange
4    banana
dtype: object


In [29]:
# Replace multiple values
s_multi_replaced = s.replace(['apple', 'banana'], ['orange', 'grape'])
print("Series with multiple replacements:")
print(s_multi_replaced)

Series with multiple replacements:
0    orange
1     grape
2    carrot
3    orange
4     grape
dtype: object


In [30]:
# Replace using a dictionary
s_dict_replaced = s.replace({'apple': 'orange', 'banana': 'grape'})
print("Series with dictionary replacements:")
print(s_dict_replaced)

Series with dictionary replacements:
0    orange
1     grape
2    carrot
3    orange
4     grape
dtype: object
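
Dictionary-based `replace()` is sometimes confused with `map()`. A short comparison, using an illustrative `fruits` Series: `replace()` leaves values without a matching key untouched, whereas `map()` turns them into NaN.

```python
import pandas as pd

fruits = pd.Series(["apple", "banana", "carrot"])
mapping = {"apple": "orange", "banana": "grape"}

replaced = fruits.replace(mapping)  # 'carrot' has no key and is left unchanged
mapped = fruits.map(mapping)        # 'carrot' has no key and becomes NaN

print(replaced.tolist())
print(mapped.isna().tolist())
```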


In [31]:
# Replace using regex: values that start with 'a' are replaced by 'fruit'
s_regex = pd.Series(['apple', 'banana', 'carrot', 'pineapple', 'strawberry'])
s_regex_replaced = s_regex.replace(r'^a.*', 'fruit', regex=True)
print("Original Series:")
print(s_regex)
print("\nSeries with regex replacement:")
print(s_regex_replaced)

Original Series:
0         apple
1        banana
2        carrot
3     pineapple
4    strawberry
dtype: object

Series with regex replacement:
0         fruit
1        banana
2        carrot
3     pineapple
4    strawberry
dtype: object
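
The `^a.*` pattern above matches the whole string, which hides a detail worth knowing: without `regex=True`, `replace()` only matches entire values, while with `regex=True` the matched portion of each string is substituted. The sketch below contrasts the three cases on a two-element illustrative Series.

```python
import pandas as pd

s = pd.Series(["apple", "pineapple"])

# Plain replace: only values equal to 'apple' are replaced
exact = s.replace("apple", "fruit")

# Regex replace: every match of the pattern is substituted, even inside a longer string
partial = s.replace("apple", "fruit", regex=True)

# Anchoring the pattern restricts the regex to whole-value matches again
anchored = s.replace(r"^apple$", "fruit", regex=True)

print(exact.tolist())
print(partial.tolist())
print(anchored.tolist())
```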


In [32]:
# Replace in-place
s_inplace = s.copy()
s_inplace.replace('apple', 'orange', inplace=True)
print("Series after in-place replacement:")
print(s_inplace)

Series after in-place replacement:
0    orange
1    banana
2    carrot
3    orange
4    banana
dtype: object


In [33]:
# Replace with numeric values
s_num = pd.Series([1, 2, 3, 4, 5])
s_num_replaced = s_num.replace(1, 10)
print("Original numeric Series:")
print(s_num)
print("\nSeries with numeric replacement:")
print(s_num_replaced)

Original numeric Series:
0    1
1    2
2    3
3    4
4    5
dtype: int64

Series with numeric replacement:
0    10
1     2
2     3
3     4
4     5
dtype: int64


##### Conclusion

In this notebook, we've explored various Series methods in pandas:

1. Finding smallest values with `nsmallest()`, which returns the n smallest elements in a Series with options for handling duplicates.
2. Counting unique values with `nunique()`, which returns the number of unique elements in a Series with options for handling NaN values.
3. Calculating percentage change with `pct_change()`, which computes the percentage change between elements with options for handling missing values.
4. Reordering index levels with `reorder_levels()`, which rearranges the levels of a MultiIndex.
5. Repeating elements with `repeat()`, which creates a new Series with repeated elements.
6. Replacing values with `replace()`, which substitutes values in a Series with other values.

Together, these methods cover several common data-manipulation tasks in pandas: selecting extreme values, counting distinct values, comparing a value with an earlier one, restructuring a MultiIndex, and duplicating or substituting values.