# 02.01 - Pandas Series

## Introduction

Pandas is a data manipulation and analysis library in Python. It provides data structures and functions needed to manipulate structured data. The key data structures in Pandas are Series and DataFrame. In this notebook, we'll be focusing on Series, a one-dimensional labeled array capable of holding any data type.

We will be covering the following topics:

1. **Creating a Pandas Series** - Here, we will learn how to create a Pandas Series from various data types like lists, dictionaries, and NumPy arrays.
2. **Accessing Elements in a Series** - This involves retrieving data from a series, either by using the index or label.
3. **Operations on Series** - We will cover how to perform mathematical and boolean operations on a series.
4. **Handling Missing Data in Series** - This section deals with identifying and handling missing or null data in a series.
5. **Series Functions** - We'll learn about various built-in Series functions like .describe(), .count(), and .value_counts().

Understanding these basics of Pandas Series is crucial when dealing with data manipulation and analysis in Python.

## Section 1: Creating a Pandas Series

### 1.1 - From a List

Pandas Series can be created from list data structures in Python. The index of the series will be the default integer index.

**Example 1: Creating a Simple Series from a List**

In [2]:
import pandas as pd

data = [1, 2, 3, 4, 5]
series = pd.Series(data)

print(series)
# Output:
# 0    1
# 1    2
# 2    3
# 3    4
# 4    5
# dtype: int64

0    1
1    2
2    3
3    4
4    5
dtype: int64


**Example 2: Creating a Series from a List with a Custom Index**

In [3]:
import pandas as pd

data = [1, 2, 3, 4, 5]
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index)

print(series)
# Output:
# a    1
# b    2
# c    3
# d    4
# e    5
# dtype: int64

a    1
b    2
c    3
d    4
e    5
dtype: int64


**Example 3: Creating a Series from a List of Different Data Types**

In [4]:
import pandas as pd

data = [1, "two", 3.0, "four", 5]
series = pd.Series(data)

print(series)
# Output:
# 0       1
# 1     two
# 2     3.0
# 3    four
# 4       5
# dtype: object

0       1
1     two
2     3.0
3    four
4       5
dtype: object


**Example 4: Creating a Series from a List of Objects**

In [5]:
import pandas as pd

class CustomObject:
    def __init__(self, name):
        self.name = name

data = [CustomObject("Object1"), CustomObject("Object2"), CustomObject("Object3")]
series = pd.Series(data)

print(series)
# Output:
# 0    <__main__.CustomObject object at 0x10f406cb0>
# 1    <__main__.CustomObject object at 0x12fcfcc10>
# 2    <__main__.CustomObject object at 0x12fcfd000>
# dtype: object

0    <__main__.CustomObject object at 0x10f406cb0>
1    <__main__.CustomObject object at 0x12fcfcc10>
2    <__main__.CustomObject object at 0x12fcfd000>
dtype: object


**Example 5: Creating a Series from a List with Missing Values**

In [6]:
import pandas as pd

data = [1, 2, None, 4, 5]
series = pd.Series(data)

print(series)
# Output:
# 0    1.0
# 1    2.0
# 2    NaN
# 3    4.0
# 4    5.0
# dtype: float64

0    1.0
1    2.0
2    NaN
3    4.0
4    5.0
dtype: float64
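
The examples above show Pandas choosing a dtype from the contents of the list. As a quick check, the following minimal sketch (the variable names are illustrative and not part of the original examples) compares three of those cases side by side:

```python
import pandas as pd

# Illustrative names; each list mirrors one of the examples above.
ints = pd.Series([1, 2, 3])            # all integers
mixed = pd.Series([1, "two", 3.0])     # mixed Python types
with_none = pd.Series([1, 2, None])    # integers plus a missing value

print(ints.dtype)       # an integer dtype
print(mixed.dtype)      # object
print(with_none.dtype)  # float64, because the missing value becomes NaN
```

Checking `.dtype` after construction is a cheap way to confirm that the data was interpreted as intended.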


### 1.2 - From a Dictionary

Pandas Series can also be created from Python dictionaries. The dictionary keys become the index of the series, and the dictionary values become its values.

**Example 1: Creating a Simple Series from a Dictionary**

In [7]:
import pandas as pd

data = {'a': 1, 'b': 2, 'c': 3}
series = pd.Series(data)

print(series)
# Output:
# a    1
# b    2
# c    3
# dtype: int64

a    1
b    2
c    3
dtype: int64


**Example 2: Creating a Series from a Dictionary with Missing Values**

In [8]:
import pandas as pd

data = {'a': 1, 'b': 2, 'c': None}
series = pd.Series(data)

print(series)
# Output:
# a    1.0
# b    2.0
# c    NaN
# dtype: float64

a    1.0
b    2.0
c    NaN
dtype: float64


**Example 3: Creating a Series from a Dictionary with Different Data Types**

In [9]:
import pandas as pd

data = {'a': 1, 'b': "two", 'c': 3.0, 'd': None}
series = pd.Series(data)

print(series)
# Output:
# a       1
# b     two
# c     3.0
# d    None
# dtype: object

a       1
b     two
c     3.0
d    None
dtype: object


**Example 4: Creating a Series from a Dictionary of Objects**

In [10]:
import pandas as pd

class CustomObject:
    def __init__(self, name):
        self.name = name

data = {'obj1': CustomObject("Object1"), 'obj2': CustomObject("Object2"), 'obj3': CustomObject("Object3")}
series = pd.Series(data)

print(series)
# Output:
# obj1    <__main__.CustomObject object at 0x10f406e90>
# obj2    <__main__.CustomObject object at 0x12fcfdf00>
# obj3    <__main__.CustomObject object at 0x12fcfcc10>
# dtype: object

obj1    <__main__.CustomObject object at 0x10f406e90>
obj2    <__main__.CustomObject object at 0x12fcfdf00>
obj3    <__main__.CustomObject object at 0x12fcfcc10>
dtype: object


**Example 5: Creating a Series from a Dictionary with a Custom Index**

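If a custom index is provided together with a dictionary, only the keys that appear in the index are kept. Index labels with no matching key (here `'d'`) are filled with `NaN`, and keys that are absent from the index (here `'c'`) are dropped.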

In [11]:
import pandas as pd

data = {'a': 1, 'b': 2, 'c': 3}
index = ['a', 'b', 'd']
series = pd.Series(data, index=index)

print(series)
# Output:
# a    1.0
# b    2.0
# d    NaN
# dtype: float64

a    1.0
b    2.0
d    NaN
dtype: float64
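
To make the alignment explicit, this short sketch repeats the example and checks which labels survive. It only uses the membership test on `series.index` and `isnull()`:

```python
import pandas as pd

data = {'a': 1, 'b': 2, 'c': 3}
series = pd.Series(data, index=['a', 'b', 'd'])

print('c' in series.index)   # False: 'c' is not in the requested index
print(series.isnull()['d'])  # True: 'd' has no matching dictionary key
```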


### 1.3 - From a NumPy Array

Pandas Series can also be created from NumPy arrays. The index of the series will be the default integer index.

**Example 1: Creating a Simple Series from a NumPy Array**

In [12]:
import pandas as pd
import numpy as np

data = np.array([1, 2, 3, 4, 5])
series = pd.Series(data)

print(series)
# Output:
# 0    1
# 1    2
# 2    3
# 3    4
# 4    5
# dtype: int64  (may be int32 on some platforms)

0    1
1    2
2    3
3    4
4    5
dtype: int64


**Example 2: Creating a Series from a NumPy Array with a Custom Index**

In [13]:
import pandas as pd
import numpy as np

data = np.array([1, 2, 3, 4, 5])
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index)

print(series)
# Output:
# a    1
# b    2
# c    3
# d    4
# e    5
# dtype: int64

a    1
b    2
c    3
d    4
e    5
dtype: int64


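**Example 3: Creating a Series from a NumPy Array of Different Data Types**

A NumPy array holds a single data type, so mixing numbers and strings makes NumPy convert every element to a string before Pandas receives it. The values printed below are therefore the strings `'1'`, `'3.0'`, and `'5'`, not numbers.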

In [14]:
import pandas as pd
import numpy as np

data = np.array([1, "two", 3.0, "four", 5])
series = pd.Series(data)

print(series)
# Output:
# 0       1
# 1     two
# 2     3.0
# 3    four
# 4       5
# dtype: object

0       1
1     two
2     3.0
3    four
4       5
dtype: object
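
The conversion performed by NumPy can be checked by inspecting the element types. The sketch below also shows one possible way to keep the original Python objects, by passing `dtype=object` to `np.array` (this option is not used in the original example):

```python
import numpy as np
import pandas as pd

# Without an explicit dtype, NumPy converts every element to a string.
as_strings = pd.Series(np.array([1, "two", 3.0]))
print(type(as_strings.iloc[0]))  # a string type, not int

# With dtype=object, NumPy stores the Python objects unchanged.
as_objects = pd.Series(np.array([1, "two", 3.0], dtype=object))
print(type(as_objects.iloc[0]))  # <class 'int'>
```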


**Example 4: Creating a Series from a NumPy Array of Objects**

In [15]:
import pandas as pd
import numpy as np

class CustomObject:
    def __init__(self, name):
        self.name = name

data = np.array([CustomObject("Object1"), CustomObject("Object2"), CustomObject("Object3")])
series = pd.Series(data)

print(series)
# Output:
# 0    <__main__.CustomObject object at 0x10f4075e0>
# 1    <__main__.CustomObject object at 0x12fcfe500>
# 2    <__main__.CustomObject object at 0x12fcfd2a0>
# dtype: object

0    <__main__.CustomObject object at 0x10f4075e0>
1    <__main__.CustomObject object at 0x12fcfe500>
2    <__main__.CustomObject object at 0x12fcfd2a0>
dtype: object


**Example 5: Creating a Series from a NumPy Array with Missing Values**

In [16]:
import pandas as pd
import numpy as np

data = np.array([1, 2, np.nan, 4, 5])
series = pd.Series(data)

print(series)
# Output:
# 0    1.0
# 1    2.0
# 2    NaN
# 3    4.0
# 4    5.0
# dtype: float64

0    1.0
1    2.0
2    NaN
3    4.0
4    5.0
dtype: float64


## Section 2: Accessing Elements in a Series

Accessing elements in a Pandas Series can be performed in several ways, primarily through indexing by number (integer index) or label (index label).

### 2.1 - Accessing by Index

When no index is supplied, Pandas assigns a default integer index that starts at 0. These integers can be used to access the corresponding elements.

**Example 1: Accessing a Single Element by Index**

In [17]:
import pandas as pd

data = [1, 2, 3, 4, 5]
series = pd.Series(data)

print(series[3])  # Output: 4

4


**Example 2: Accessing Multiple Elements by Index**

In [18]:
import pandas as pd

data = [1, 2, 3, 4, 5]
series = pd.Series(data)

print(series[[1, 3, 4]])  # Output: 1    2
                          #          3    4
                          #          4    5
                          # dtype: int64

1    2
3    4
4    5
dtype: int64


**Example 3: Accessing a Range of Elements by Index**

In [19]:
import pandas as pd

data = [1, 2, 3, 4, 5]
series = pd.Series(data)

print(series[1:4])  # Output: 1    2
                    #         2    3
                    #         3    4
                    # dtype: int64

1    2
2    3
3    4
dtype: int64
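
The examples in this subsection rely on the default integer index. When a series has a custom index, positions and labels no longer coincide. A minimal sketch of position-based access with the `.iloc` accessor (not used elsewhere in this notebook) could look like this:

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

print(series.iloc[0])        # 10 (first element)
print(series.iloc[[0, 2]])   # elements at positions 0 and 2
print(series.iloc[1:3])      # positions 1 and 2; the stop position is excluded
```

Using `.iloc` makes it unambiguous that the integers are positions rather than labels.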


### 2.2 - Accessing by Label

If a series has a defined index label, this label can be used to access the corresponding element. This is similar to how keys are used to access values in a dictionary.

**Example 1: Accessing a Single Element by Label**

In [21]:
import pandas as pd

data = [1, 2, 3, 4, 5]
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index)

print(series['c'])  # Output: 3

3


**Example 2: Accessing Multiple Elements by Label**

In [22]:
import pandas as pd

data = [1, 2, 3, 4, 5]
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index)

print(series[['b', 'd', 'e']])  # Output: b    2
                                #         d    4
                                #         e    5
                                # dtype: int64

b    2
d    4
e    5
dtype: int64


**Example 3: Accessing a Range of Elements by Label**

In [23]:
import pandas as pd

data = [1, 2, 3, 4, 5]
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index)

print(series['b':'d'])  # Output: b    2
                        #         c    3
                        #         d    4
                        # dtype: int64

b    2
c    3
d    4
dtype: int64
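
Label slices include the end label, whereas position-based slices exclude the stop position. The following sketch contrasts the two on the same series, using `.iloc` for the positional slice:

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

by_label = series['b':'d']       # includes the end label 'd'
by_position = series.iloc[1:3]   # stops before position 3

print(len(by_label))      # 3
print(len(by_position))   # 2
```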


**Example 4: Accessing an Element by Label Using the `.loc` Accessor**

The `.loc` accessor selects elements of a series by a label, a list of labels, a label slice, or a boolean array.

In [24]:
import pandas as pd

data = [1, 2, 3, 4, 5]
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index)

print(series.loc['b'])  # Output: 2

2


**Example 5: Accessing Multiple Elements by Label Using the `.loc` Accessor**

In [25]:
import pandas as pd

data = [1, 2, 3, 4, 5]
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index)

print(series.loc[['a', 'c', 'e']])  # Output: a    1
                                    #         c    3
                                    #         e    5
                                    # dtype: int64

a    1
c    3
e    5
dtype: int64


### 2.3 - Accessing by Condition

You can also access and manipulate elements in a series based on a certain condition or criteria.

**Example 1: Accessing Elements That Satisfy a Condition**

In [26]:
import pandas as pd

data = [1, 2, 3, 4, 5]
series = pd.Series(data)

print(series[series > 3])  # Output: 3    4
                           #         4    5
                           # dtype: int64

3    4
4    5
dtype: int64


**Example 2: Modifying Elements That Satisfy a Condition**

In [27]:
import pandas as pd

data = [1, 2, 3, 4, 5]
series = pd.Series(data)

series[series > 3] = 10

print(series)  # Output: 0     1
               #         1     2
               #         2     3
               #         3    10
               #         4    10
               # dtype: int64

0     1
1     2
2     3
3    10
4    10
dtype: int64


**Example 3: Accessing Elements That Satisfy Multiple Conditions**

In [28]:
import pandas as pd

data = [1, 2, 3, 4, 5]
series = pd.Series(data)

print(series[(series > 2) & (series < 5)])  # Output: 2    3
                                            #         3    4
                                            # dtype: int64

2    3
3    4
dtype: int64


**Example 4: Accessing Elements Using the `.where()` Function**

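The `.where()` function returns a new series of the same length that keeps the values where the condition is `True` and replaces the others with `NaN` by default. Because `NaN` is a float, the result below is converted to `float64`.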

In [29]:
import pandas as pd

data = [1, 2, 3, 4, 5]
series = pd.Series(data)

print(series.where(series > 3))  # Output: 0    NaN
                                 #         1    NaN
                                 #         2    NaN
                                 #         3    4.0
                                 #         4    5.0
                                 # dtype: float64

0    NaN
1    NaN
2    NaN
3    4.0
4    5.0
dtype: float64
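
`.where()` also accepts a replacement value as its second argument, and `.mask()` is its inverse. A brief sketch of both, with `0` chosen arbitrarily as the replacement:

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5])

kept = series.where(series > 3, 0)    # replace non-matching values with 0
masked = series.mask(series > 3, 0)   # replace matching values with 0

print(kept.tolist())    # [0, 0, 0, 4, 5]
print(masked.tolist())  # [1, 2, 3, 0, 0]
```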


## Section 3: Operations on Series

### 3.1 - Mathematical Operations

When working with Pandas Series, various mathematical operations can be performed. Both unary and binary mathematical operators work with series.

**Example 1: Addition of a Constant to a Series**

In [31]:
import pandas as pd

data = [1, 2, 3, 4, 5]
series = pd.Series(data)

print(series + 10)  # Output: 0    11
                    #         1    12
                    #         2    13
                    #         3    14
                    #         4    15
                    # dtype: int64

0    11
1    12
2    13
3    14
4    15
dtype: int64


**Example 2: Subtraction of a Constant from a Series**

In [32]:
import pandas as pd

data = [11, 12, 13, 14, 15]
series = pd.Series(data)

print(series - 10)  # Output: 0    1
                    #         1    2
                    #         2    3
                    #         3    4
                    #         4    5
                    # dtype: int64

0    1
1    2
2    3
3    4
4    5
dtype: int64


**Example 3: Addition of Two Series**

When adding two series together, Pandas aligns them by their index labels. If a label appears in only one of the two series, the result for that label is `NaN`.

In [33]:
import pandas as pd

data1 = [1, 2, 3, 4, 5]
data2 = [10, 20, 30, 40, 50]
series1 = pd.Series(data1)
series2 = pd.Series(data2)

print(series1 + series2)  # Output: 0    11
                          #         1    22
                          #         2    33
                          #         3    44
                          #         4    55
                          # dtype: int64

0    11
1    22
2    33
3    44
4    55
dtype: int64
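
The two series above share the same index, so no `NaN` appears. To illustrate alignment with partially overlapping labels, here is a small sketch. The `add()` method with `fill_value` is one way to treat a missing label as `0` instead:

```python
import pandas as pd

series1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
series2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

total = series1 + series2                  # 'a' and 'd' exist on one side only
filled = series1.add(series2, fill_value=0)

print(total.isnull().tolist())  # [True, False, False, True]
print(filled.tolist())          # [1.0, 12.0, 23.0, 30.0]
```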


**Example 4: Multiplication of Two Series**

In [34]:
import pandas as pd

data1 = [1, 2, 3, 4, 5]
data2 = [10, 20, 30, 40, 50]
series1 = pd.Series(data1)
series2 = pd.Series(data2)

print(series1 * series2)  # Output: 0     10
                          #         1     40
                          #         2     90
                          #         3    160
                          #         4    250
                          # dtype: int64

0     10
1     40
2     90
3    160
4    250
dtype: int64


**Example 5: Division of Two Series**

In [35]:
import pandas as pd

data1 = [10, 40, 90, 160, 250]
data2 = [10, 20, 30, 40, 50]
series1 = pd.Series(data1)
series2 = pd.Series(data2)

print(series1 / series2)  # Output: 0    1.0
                          #         1    2.0
                          #         2    3.0
                          #         3    4.0
                          #         4    5.0
                          # dtype: float64

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64


### 3.2 - Boolean Operations

Boolean operations in Pandas Series are performed element-wise and return a series of `True` or `False` values.

**Example 1: Comparison of a Series with a Constant**

In [36]:
import pandas as pd

data = [1, 2, 3, 4, 5]
series = pd.Series(data)

print(series > 3)  # Output: 0    False
                   #         1    False
                   #         2    False
                   #         3     True
                   #         4     True
                   # dtype: bool

0    False
1    False
2    False
3     True
4     True
dtype: bool


**Example 2: Comparison of Two Series**

In [37]:
import pandas as pd

data1 = [1, 2, 3, 4, 5]
data2 = [5, 4, 3, 2, 1]
series1 = pd.Series(data1)
series2 = pd.Series(data2)

print(series1 > series2)  # Output: 0    False
                          #         1    False
                          #         2    False
                          #         3     True
                          #         4     True
                          # dtype: bool

0    False
1    False
2    False
3     True
4     True
dtype: bool


**Example 3: Logical AND Operation**

In [38]:
import pandas as pd

data1 = [True, False, True, False, True]
data2 = [True, True, False, False, True]
series1 = pd.Series(data1)
series2 = pd.Series(data2)

print(series1 & series2)  # Output: 0     True
                          #         1    False
                          #         2    False
                          #         3    False
                          #         4     True
                          # dtype: bool

0     True
1    False
2    False
3    False
4     True
dtype: bool


**Example 4: Logical OR Operation**

In [39]:
import pandas as pd

data1 = [True, False, True, False, True]
data2 = [True, True, False, False, True]
series1 = pd.Series(data1)
series2 = pd.Series(data2)

print(series1 | series2)  # Output: 0     True
                          #         1     True
                          #         2     True
                          #         3    False
                          #         4     True
                          # dtype: bool

0     True
1     True
2     True
3    False
4     True
dtype: bool


**Example 5: Logical NOT Operation**

In [40]:
import pandas as pd

data = [True, False, True, False, True]
series = pd.Series(data)

print(~series)  # Output: 0    False
                #         1     True
                #         2    False
                #         3     True
                #         4    False
                # dtype: bool

0    False
1     True
2    False
3     True
4    False
dtype: bool
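
Because `True` and `False` behave like 1 and 0, a boolean series can also be aggregated. The sketch below uses `sum()`, `any()`, and `all()` on the result of a comparison:

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5])
mask = series > 3

print(mask.sum())  # 2 (number of True values)
print(mask.any())  # True (at least one value is greater than 3)
print(mask.all())  # False (not every value is greater than 3)
```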


## Section 4: Handling Missing Data in Series

Pandas provides various methods to handle missing data (represented as `NaN`) in a Pandas Series.

### 4.1 - Detecting Missing Values

Pandas provides the `isnull()` and `notnull()` functions to detect missing values in a series. Both return a boolean series of the same size that indicates, for each value, whether it is missing (`isnull()`) or present (`notnull()`).

**Example 1: Using the `isnull()` Function**

In [41]:
import pandas as pd

data = [1, None, 3, None, 5]
series = pd.Series(data)

print(series.isnull())
# Output:
# 0    False
# 1     True
# 2    False
# 3     True
# 4    False
# dtype: bool

0    False
1     True
2    False
3     True
4    False
dtype: bool


**Example 2: Using the `notnull()` Function**

In [42]:
import pandas as pd

data = [1, None, 3, None, 5]
series = pd.Series(data)

print(series.notnull())
# Output:
# 0     True
# 1    False
# 2     True
# 3    False
# 4     True
# dtype: bool

0     True
1    False
2     True
3    False
4     True
dtype: bool


**Example 3: Using `notnull()` to Filter Out Missing Values**

In [43]:
import pandas as pd

data = [1, None, 3, None, 5]
series = pd.Series(data)

filtered_series = series[series.notnull()]

print(filtered_series)
# Output:
# 0    1.0
# 2    3.0
# 4    5.0
# dtype: float64

0    1.0
2    3.0
4    5.0
dtype: float64


**Example 4: Count the Number of Missing Values**

In [44]:
import pandas as pd

data = [1, None, 3, None, 5]
series = pd.Series(data)

num_missing = series.isnull().sum()

print(num_missing)
# Output: 2

2


**Example 5: Using `isnull()` in Conjunction with the `any()` Function**

The `any()` function returns `True` if at least one element of the series is `True`. Chained after `isnull()`, it checks whether the series contains any missing values.

In [45]:
import pandas as pd

data = [1, None, 3, None, 5]
series = pd.Series(data)

print(series.isnull().any())
# Output: True

True


### 4.2 - Filling Missing Values

Pandas provides several methods to fill missing values in a series, such as `fillna()`, `ffill()`, and `bfill()`. You can also fill missing values with statistical methods like mean, median, mode, etc.

**Example 1: Using the `fillna()` Function**

The `fillna()` function replaces NA/NaN values with a specified value, here `0`.

In [46]:
import pandas as pd

data = [1, None, 3, None, 5]
series = pd.Series(data)

print(series.fillna(0))
# Output:
# 0    1.0
# 1    0.0
# 2    3.0
# 3    0.0
# 4    5.0
# dtype: float64

0    1.0
1    0.0
2    3.0
3    0.0
4    5.0
dtype: float64


**Example 2: Filling Missing Values with Mean**

You can use the mean of the non-null values in the series to fill the missing values.

In [47]:
import pandas as pd

data = [1, None, 3, None, 5]
series = pd.Series(data)

mean = series.mean()
print(series.fillna(mean))
# Output:
# 0    1.0
# 1    3.0
# 2    3.0
# 3    3.0
# 4    5.0
# dtype: float64

0    1.0
1    3.0
2    3.0
3    3.0
4    5.0
dtype: float64


**Example 3: Filling Missing Values with Median**

You can use the median of the non-null values in the series to fill the missing values.

In [48]:
import pandas as pd

data = [1, None, 3, None, 5]
series = pd.Series(data)

median = series.median()
print(series.fillna(median))
# Output:
# 0    1.0
# 1    3.0
# 2    3.0
# 3    3.0
# 4    5.0
# dtype: float64

0    1.0
1    3.0
2    3.0
3    3.0
4    5.0
dtype: float64


**Example 4: Filling Missing Values with Mode**

You can use the mode of the non-null values in the series to fill the missing values.

In [49]:
import pandas as pd

data = [1, 1, 3, None, 5]
series = pd.Series(data)

mode = series.mode()[0]
print(series.fillna(mode))
# Output:
# 0    1.0
# 1    1.0
# 2    3.0
# 3    1.0
# 4    5.0
# dtype: float64

0    1.0
1    1.0
2    3.0
3    1.0
4    5.0
dtype: float64


**Example 5: Using the `interpolate()` Function to Fill Missing Values**

The `interpolate()` function fills NA values in a Series or DataFrame by estimating them from the neighboring values instead of using a fixed value. The default method is linear interpolation, which treats the values as equally spaced.

In [50]:
import pandas as pd

data = [1, None, 3, None, 5]
series = pd.Series(data)

print(series.interpolate())
# Output:
# 0    1.0
# 1    2.0
# 2    3.0
# 3    4.0
# 4    5.0
# dtype: float64

0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64
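
The introduction to this subsection mentions `ffill()` and `bfill()`, which propagate the previous or the next valid value into the gaps. A minimal sketch on the same data:

```python
import pandas as pd

series = pd.Series([1, None, 3, None, 5])

forward = series.ffill()   # copy the last valid value forward
backward = series.bfill()  # copy the next valid value backward

print(forward.tolist())   # [1.0, 1.0, 3.0, 3.0, 5.0]
print(backward.tolist())  # [1.0, 3.0, 3.0, 5.0, 5.0]
```

Forward and backward filling are often used for ordered data such as time series, where a neighboring observation is a reasonable stand-in.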


### 4.3 - Dropping Missing Values

The `dropna()` function is used to remove missing values (represented as `NaN`) from a series.

**Example 1: Dropping Missing Values**

In [51]:
import pandas as pd

data = [1, None, 3, None, 5]
series = pd.Series(data)

print(series.dropna())
# Output:
# 0    1.0
# 2    3.0
# 4    5.0
# dtype: float64

0    1.0
2    3.0
4    5.0
dtype: float64


**Example 2: Dropping Missing Values in Place**

When the `inplace` parameter is set to `True`, the original series is modified directly instead of a new series being returned.

In [52]:
import pandas as pd

data = [1, None, 3, None, 5]
series = pd.Series(data)

series.dropna(inplace=True)
print(series)
# Output:
# 0    1.0
# 2    3.0
# 4    5.0
# dtype: float64

0    1.0
2    3.0
4    5.0
dtype: float64


**Example 3: Verifying the Removal of Missing Values**

Once the missing values have been dropped, you can verify their removal by using the `isnull()` function in conjunction with the `any()` function.

In [53]:
import pandas as pd

data = [1, None, 3, None, 5]
series = pd.Series(data)

series.dropna(inplace=True)
print(series.isnull().any())
# Output: False

False


**Example 4: Counting the Number of Values Before and After Dropping Missing Values**

This can be helpful for comparing the size of the series before and after the operation.

In [55]:
import pandas as pd

data = [1, None, 3, None, 5]
series = pd.Series(data)

num_before = series.count() + series.isnull().sum()
series.dropna(inplace=True)
num_after = series.count() + series.isnull().sum()

print("Number of values before: ", num_before)
print("Number of values after: ", num_after)
# Output:
# Number of values before:  5
# Number of values after:  3

Number of values before:  5
Number of values after:  3


**Example 5: Dropping Missing Values from a Series with a Custom Index**

If a series has a custom index, `dropna()` will also drop the corresponding index label for each missing value.

In [56]:
import pandas as pd

data = [1, None, 3, None, 5]
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index)

print(series.dropna())
# Output:
# a    1.0
# c    3.0
# e    5.0
# dtype: float64

a    1.0
c    3.0
e    5.0
dtype: float64


### 4.4 - Replacing Missing Values

In some cases, instead of dropping or filling missing values, you may want to replace specific values, such as placeholders like `'unknown'` that stand in for missing data. The `replace()` function replaces a set of values with another set of values.

**Example 1: Replacing a Single Value**

In [61]:
import pandas as pd

data = [1, 2, 3, 3, 5]
series = pd.Series(data)

print(series.replace(3, 4))
# Output:
# 0    1
# 1    2
# 2    4
# 3    4
# 4    5
# dtype: int64

0    1
1    2
2    4
3    4
4    5
dtype: int64


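**Example 2: Replacing Every Occurrence of a Value**

Every `'unknown'` entry is replaced with `0`. The `int64` dtype shown below relies on Pandas downcasting the result automatically; newer Pandas versions deprecate that behavior, so the result may remain `object`.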

In [62]:
import pandas as pd

data = [1, 'unknown', 3, 'unknown', 5]
series = pd.Series(data)

print(series.replace('unknown', 0))
# Output:
# 0    1
# 1    0
# 2    3
# 3    0
# 4    5
# dtype: int64

0    1
1    0
2    3
3    0
4    5
dtype: int64


**Example 3: Replacing Values with a Dictionary**

In this example, different values are replaced with different replacement values.

In [63]:
import pandas as pd

data = [1, 'unknown', 3, 'missing', 5]
series = pd.Series(data)

print(series.replace({'unknown': 0, 'missing': -1}))
# Output:
# 0    1
# 1    0
# 2    3
# 3   -1
# 4    5
# dtype: int64

0    1
1    0
2    3
3   -1
4    5
dtype: int64


**Example 4: Replacing Values in Place**

When the `inplace` parameter is set to `True`, the original series is modified directly instead of a new series being returned.

In [64]:
import pandas as pd

data = [1, 'unknown', 3, 'missing', 5]
series = pd.Series(data)

series.replace('unknown', 0, inplace=True)
print(series)
# Output:
# 0          1
# 1          0
# 2          3
# 3    missing
# 4          5
# dtype: object

0          1
1          0
2          3
3    missing
4          5
dtype: object


**Example 5: Replacing Values Using Regular Expressions**

In this example, the `regex` parameter is set to `True`, which means that regular expressions can be used in the `to_replace` parameter.

In [65]:
import pandas as pd

data = ['one', 'two', 'three', 'four', 'five']
series = pd.Series(data)

print(series.replace(r'^t.*$', 'match', regex=True))
# Output:
# 0      one
# 1    match
# 2    match
# 3     four
# 4     five
# dtype: object

0      one
1    match
2    match
3     four
4     five
dtype: object


In this example, all values that start with 't' are replaced with 'match'.
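
`replace()` can also connect placeholder values to the missing-data tools from the earlier subsections. In this sketch, `-999` is a hypothetical sentinel that marks missing readings. It is first converted to `NaN` and then filled with the mean:

```python
import numpy as np
import pandas as pd

# -999 is a hypothetical placeholder that marks missing readings.
series = pd.Series([1, -999, 3, -999, 5])

with_nan = series.replace(-999, np.nan)
filled = with_nan.fillna(with_nan.mean())

print(with_nan.isnull().sum())  # 2
print(filled.tolist())          # [1.0, 3.0, 3.0, 3.0, 5.0]
```

Converting sentinels to `NaN` first keeps them from distorting statistics such as the mean.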

## Section 5: Series Functions

### 5.1 - `.describe()`

The `.describe()` function is used to generate descriptive statistics of a Series or the columns of a DataFrame. It provides central tendency, dispersion, and shape of the dataset's distribution, excluding `NaN` values.

**Example 1: Basic Usage of .describe()**

In [66]:
import pandas as pd

data = [1, 2, 3, 4, 5]
series = pd.Series(data)

print(series.describe())
# Output:
# count    5.0
# mean     3.0
# std      1.581139
# min      1.0
# 25%      2.0
# 50%      3.0
# 75%      4.0
# max      5.0
# dtype: float64

count    5.000000
mean     3.000000
std      1.581139
min      1.000000
25%      2.000000
50%      3.000000
75%      4.000000
max      5.000000
dtype: float64


**Example 2: .describe() with Object Data**

When applied to an object Series, `.describe()` returns the count, number of unique values, most frequent value, and frequency of the most frequent value.

In [67]:
import pandas as pd

data = ['a', 'b', 'a', 'a', 'b', 'c', 'b', 'b', 'a']
series = pd.Series(data)

print(series.describe())
# Output:
# count     9
# unique    3
# top       a
# freq      4
# dtype: object

count     9
unique    3
top       a
freq      4
dtype: object


**Example 3: .describe() with datetime Data**

When applied to a datetime Series, `.describe()` treats the values like numeric data: it returns the count, mean, minimum, maximum, and percentiles, but no standard deviation.

In [68]:
import pandas as pd

data = pd.date_range('2020-01-01', periods=5, freq='D')
series = pd.Series(data)

print(series.describe())
# Output:
# count                      5
# mean     2020-01-03 00:00:00
# min      2020-01-01 00:00:00
# 25%      2020-01-02 00:00:00
# 50%      2020-01-03 00:00:00
# 75%      2020-01-04 00:00:00
# max      2020-01-05 00:00:00
# dtype: object

count                      5
mean     2020-01-03 00:00:00
min      2020-01-01 00:00:00
25%      2020-01-02 00:00:00
50%      2020-01-03 00:00:00
75%      2020-01-04 00:00:00
max      2020-01-05 00:00:00
dtype: object


**Example 4: .describe() with Options**

You can pass options to `.describe()` to control what summary statistics are calculated. For example, you can use the `percentiles` option to calculate arbitrary percentiles.

In [69]:
import pandas as pd

data = [1, 2, 3, 4, 5]
series = pd.Series(data)

print(series.describe(percentiles=[.05, .25, .75, .95]))
# Output:
# count    5.0
# mean     3.0
# std      1.581139
# min      1.0
# 5%       1.2
# 25%      2.0
# 50%      3.0
# 75%      4.0
# 95%      4.8
# max      5.0
# dtype: float64

count    5.000000
mean     3.000000
std      1.581139
min      1.000000
5%       1.200000
25%      2.000000
50%      3.000000
75%      4.000000
95%      4.800000
max      5.000000
dtype: float64
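
Since `.describe()` itself returns a Series, individual statistics can be read by label. A short sketch, with `stats` as an illustrative variable name:

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5])
stats = series.describe()

print(stats['mean'])                # 3.0
print(stats['50%'])                 # 3.0 (the median)
print(stats['max'] - stats['min'])  # 4.0 (the range)
```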


### 5.2 - `.count()`

The `.count()` function is used to count the number of non-NA/null values in the Series.

**Example 1: Basic Usage of .count()**

In [72]:
import pandas as pd

data = [1, 2, 3, 4, 5]
series = pd.Series(data)

print(series.count())  # Output: 5

5


**Example 2: .count() with String Values**

`.count()` works the same way for non-numeric data: every non-missing value is counted, whatever its type.

In [73]:
import pandas as pd

data = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data)

print(series.count())  # Output: 5

5
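
Because `.count()` skips missing values, its result can differ from the length of the series. A minimal sketch comparing it with `len()`:

```python
import pandas as pd

series = pd.Series([1, None, 3, None, 5])

print(len(series))     # 5 (all entries, including missing ones)
print(series.count())  # 3 (non-missing entries only)
```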


**Example 3: .count() with Boolean Values**

When used with a Series of boolean values, `.count()` returns the total number of `True` and `False` values combined; `None` values are not counted.

In [74]:
import pandas as pd

data = [True, False, True, None, True]
series = pd.Series(data)

print(series.count())  # Output: 4

4


**Example 4: .count() with Filtering**

You can use `.count()` in conjunction with boolean indexing to count the number of values that satisfy a certain condition.

In [75]:
import pandas as pd

data = [1, 2, 3, 4, 5]
series = pd.Series(data)

print(series[series > 2].count())  # Output: 3

3


In this example, `.count()` is used to count the number of values in the series that are greater than 2.

### 5.3 - `.value_counts()`

The `.value_counts()` function returns a Series containing the count of each unique value. The result is sorted in descending order, so the first element is the most frequently occurring value. NA values are excluded by default.

**Example 1: Basic Usage of .value_counts()**

In [76]:
import pandas as pd

data = ['a', 'b', 'a', 'a', 'b', 'c', 'b', 'b', 'a']
series = pd.Series(data)

print(series.value_counts())
# Output:
# a    4
# b    4
# c    1
# Name: count, dtype: int64

a    4
b    4
c    1
Name: count, dtype: int64


**Example 2: .value_counts() with Normalization**

By setting the `normalize` parameter to `True`, `.value_counts()` will return the relative frequencies of the unique values.

In [77]:
import pandas as pd

data = ['a', 'b', 'a', 'a', 'b', 'c', 'b', 'b', 'a']
series = pd.Series(data)

print(series.value_counts(normalize=True))
# Output:
# a    0.444444
# b    0.444444
# c    0.111111
# Name: proportion, dtype: float64

a    0.444444
b    0.444444
c    0.111111
Name: proportion, dtype: float64


**Example 3: .value_counts() with Binning**

If the Series is a numeric series, we can also bin the values into discrete intervals with the `bins` parameter.

In [78]:
import pandas as pd

data = [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4]
series = pd.Series(data)

print(series.value_counts(bins=2))
# Output:
# (2.5, 4.0]      9
# (0.996, 2.5]    5
# Name: count, dtype: int64

(2.5, 4.0]      9
(0.996, 2.5]    5
Name: count, dtype: int64


**Example 4: .value_counts() with Missing Values**

By default, `.value_counts()` excludes NA values. However, you can include them by passing `dropna=False`.

In [79]:
import pandas as pd
import numpy as np

data = ['a', 'b', 'a', np.nan, 'b', 'c', 'b', np.nan, 'a']
series = pd.Series(data)

print(series.value_counts(dropna=False))
# Output:
# a      3
# b      3
# NaN    2
# c      1
# Name: count, dtype: int64

a      3
b      3
NaN    2
c      1
Name: count, dtype: int64


**Example 5: .value_counts() with Data Sorting**

The `sort` parameter controls whether the result is sorted by count. It defaults to `True`.

In [80]:
import pandas as pd

data = ['a', 'b', 'a', 'a', 'b', 'c', 'b', 'b', 'a']
series = pd.Series(data)

print(series.value_counts(sort=True))
# Output:
# a    4
# b    4
# c    1
# Name: count, dtype: int64

a    4
b    4
c    1
Name: count, dtype: int64


Here, the output is sorted by the count in descending order. If `sort=False`, the counts are not sorted by frequency; they follow the order in which the values appear in the original Series.
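
To compare the ordering options, the following sketch calls `.value_counts()` with `sort=False` and with the related `ascending` parameter. No particular row order is assumed for the unsorted result, so only its contents are inspected:

```python
import pandas as pd

data = ['a', 'b', 'a', 'a', 'b', 'c', 'b', 'b', 'a']
series = pd.Series(data)

unsorted = series.value_counts(sort=False)          # same counts, not ordered by frequency
rarest_first = series.value_counts(ascending=True)  # least frequent value first

print(unsorted.to_dict())
print(rarest_first.index[0])  # c
```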

## Challenge

Following the concept of handling missing data in a Pandas Series as mentioned above, your task is to:

1. Create a Pandas Series from a list that includes some missing values (`None` or `np.nan`).
2. Write a function called `missing_data_handler` that takes this Series as an input and performs the following tasks:
    - Counts and prints the number of missing and non-missing values.
    - Fills the missing values with the mean of the non-missing values in the Series, and prints the updated Series.
    - Finally, it should return the count of missing values after filling them.

### Output Format

- The function should first print the count of missing and non-missing values.
- Then, it should print the Series after filling the missing values.
- Finally, it should return the count of missing values (which should be 0 after filling).

### Explanation

Consider the following code:

```python
import pandas as pd
import numpy as np

# Create a series with missing values
series = pd.Series([1, np.nan, 3, np.nan, 5])

# Handle missing data
missing_data_handler(series)

```

The output should show:

- The count of missing and non-missing values before handling missing data.
- The Series after filling missing values with the mean.
- The count of missing values after handling them (which should be 0).

In [None]:
### WRITE YOUR CODE BELOW THIS LINE ###


### WRITE YOUR CODE ABOVE THIS LINE ###