In [2]:
# import the libraries 
import pandas as pd

## Create a series from a list

In [12]:
data = [10,20,30,40]
series = pd.Series(data=data)

series

0    10
1    20
2    30
3    40
dtype: int64

#### Explanation:
- The data list contains the values [10, 20, 30, 40].
- The index of the Series is automatically generated as 0, 1, 2, 3, corresponding to each value.

#### NOTE
- The index is not part of the values; it forms the single *axis* of the Series.
- The individual entries of the index are called the *axis labels*.

A Series therefore has three main attributes:
- values
- index
- name (optional) - we have not assigned a name to our series yet!
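Each of these attributes can be read directly. A minimal sketch, using an illustrative unnamed Series `s`:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

# .values: the data itself, returned as a NumPy array
print(s.values)   # [10 20 30 40]
# .index: the axis labels (a RangeIndex when no labels are supplied)
print(s.index)    # RangeIndex(start=0, stop=4, step=1)
# .name: None until a name is assigned
print(s.name)     # None
```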

In [14]:
# give the series a name (optional)
series.name = 'Basic Series'

series

0    10
1    20
2    30
3    40
Name: Basic Series, dtype: int64

In [15]:
# let us look at the index of the series
series.index

RangeIndex(start=0, stop=4, step=1)

In [6]:
# use an index label to reference a data point
series[1]

20

### Custom Indexing:
The index is an object in its own right (a pandas `Index`), separate from the values, so you can modify it independently: replace it, reset it, or supply custom labels.
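A minimal sketch of replacing and then resetting the index of an existing Series; the labels `"w"` to `"z"` are illustrative only:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

# Replace the index with custom labels (the length must match the Series)
s.index = ["w", "x", "y", "z"]
print(s["y"])   # 30

# Go back to a default 0..n-1 index; drop=True discards the old labels
# instead of turning them into a column of a new DataFrame
s_reset = s.reset_index(drop=True)
print(s_reset.index)   # RangeIndex(start=0, stop=4, step=1)
```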

In [16]:
# Creating a Series with a custom index
data = [10, 20, 30, 40]
index = ['a', 'b', 'c', 'd']

series = pd.Series(data=data, index=index)

print(series)

a    10
b    20
c    30
d    40
dtype: int64


In [17]:
series['b'] # reference a data point by its label

20

In [18]:
# what's the index now?
series.index

Index(['a', 'b', 'c', 'd'], dtype='object')
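With the default index, labels and positions coincide, so an expression like `series[1]` hides whether it means "label 1" or "position 1". The `.loc` (by label) and `.iloc` (by position) accessors make the intent explicit. A small sketch with a deliberately reversed integer index:

```python
import pandas as pd

# Labels run 3, 2, 1, 0 while positions run 0, 1, 2, 3
s = pd.Series(["a", "b", "c", "d"], index=[3, 2, 1, 0])

# .loc selects by index label: label 0 is the last row
print(s.loc[0])    # d
# .iloc selects by integer position: position 0 is the first row
print(s.iloc[0])   # a
```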

## Create a series with a custom index from a dictionary

In [9]:
# Creating a dictionary
data_dict = {'a': 10, 'b': 20, 'c': 30, 'd': 40}

# Converting the dictionary to a Series
series_from_dict = pd.Series(data_dict)

print(series_from_dict)

a    10
b    20
c    30
d    40
dtype: int64
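The dictionary keys become the index. If an explicit `index` is passed together with a dictionary, pandas looks up each requested label in the dictionary instead: keys that are not requested are dropped, and requested labels with no matching key become `NaN`. A sketch (the label `"z"` is illustrative):

```python
import pandas as pd

data_dict = {"a": 10, "b": 20, "c": 30, "d": 40}

# Only "d" and "b" are found; "z" has no key, so it becomes NaN,
# which also turns the dtype into float64
subset = pd.Series(data_dict, index=["d", "b", "z"])
print(subset)
```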


#### Question: Does the data need to be homogeneous?

In [20]:
# The actual data (or values) for a series does not have to be numeric or homogeneous
data_dict = {'a': 10, 'b': 'Harry Potter', 'c': False, 'd': 'Lionel Messi'}

# Converting the dictionary to a Series
series_from_dict = pd.Series(data_dict)

print(series_from_dict)

a              10
b    Harry Potter
c           False
d    Lionel Messi
dtype: object


#### NOTE

The dtype of the Series is now *object*, i.e. each value is stored as a generic Python object.

- The object data type is also used for a series of string values (at least in the pandas version used here), as well as for values of heterogeneous or mixed types.
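Although the Series as a whole reports `object`, each element keeps its own Python type. One way to see this, sketched with `Series.map` and the built-in `type`:

```python
import pandas as pd

mixed = pd.Series({"a": 10, "b": "Harry Potter", "c": False, "d": "Lionel Messi"})

# The Series-level dtype is object...
print(mixed.dtype)   # object
# ...but each element is still an ordinary Python value of its own type
print(mixed.map(type))
```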

#### Question: How do we represent NULL (missing) values?

In [26]:
import numpy as np

# create a series with null values 
nan_series = pd.Series(data=[12,20,30,np.nan])

nan_series

0    12.0
1    20.0
2    30.0
3     NaN
dtype: float64
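Note that the dtype became `float64` even though 12, 20 and 30 are integers: `np.nan` is a floating-point value, so pandas upcasts the whole Series. If the integers should stay integers, pandas offers the nullable integer dtype `"Int64"` (capital I), which marks missing entries as `<NA>`. A minimal sketch:

```python
import numpy as np
import pandas as pd

# Default behaviour: a NaN forces the Series to float64
as_float = pd.Series([12, 20, 30, np.nan])
print(as_float.dtype)   # float64

# Nullable integer dtype: the values stay integers, the gap becomes <NA>
as_int = pd.Series([12, 20, 30, None], dtype="Int64")
print(as_int.dtype)             # Int64
print(as_int.isna().tolist())   # [False, False, False, True]
```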

#### How do we get the size of a series with missing values?

In [27]:
# get the size of the series, including missing values - use the size attribute
nan_series.size

4

In [29]:
# get the number of non-missing values - use the count method
nan_series.count()

3

NOTE - `np.nan` plays the same role as `NULL` in SQL: it marks a missing value, and methods such as `count()` skip it.
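A few standard Series methods for working with missing values, shown as a short sketch on the same data:

```python
import numpy as np
import pandas as pd

nan_series = pd.Series([12, 20, 30, np.nan])

# isna() flags missing entries; summing the booleans counts them
print(nan_series.isna().sum())         # 1
# dropna() returns a new series without the missing entries
print(nan_series.dropna().tolist())    # [12.0, 20.0, 30.0]
# fillna() replaces missing entries with the given value
print(nan_series.fillna(0).tolist())   # [12.0, 20.0, 30.0, 0.0]
# Like SQL NULL, NaN is not equal to itself, so test with isna(), not ==
print(np.nan == np.nan)                # False
```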

## Similarity with NumPy arrays

In [30]:
# creating a numpy array
numpy_series = np.array([10,20,30,40])
numpy_series

array([10, 20, 30, 40])

In [32]:
numpy_series[0]

10
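Beyond indexing, arithmetic on both types is element-wise, NumPy functions accept a Series directly (and return a Series), and `.to_numpy()` converts a Series back into an array. A short sketch:

```python
import numpy as np
import pandas as pd

numpy_array = np.array([10, 20, 30, 40])
series = pd.Series(numpy_array)

# Arithmetic is applied element by element for both types
print((numpy_array * 2).tolist())   # [20, 40, 60, 80]
print((series * 2).tolist())        # [20, 40, 60, 80]

# A NumPy function applied to a Series returns a Series (index preserved)
roots = np.sqrt(series)
print(type(roots))   # <class 'pandas.core.series.Series'>

# .to_numpy() converts the Series back into a NumPy array
back = series.to_numpy()
print(type(back))    # <class 'numpy.ndarray'>
```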

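Both NumPy arrays and series support *boolean indexing*: comparing them with a value produces a boolean array of the same length, which can then be used as a mask to keep only the matching elements. We can use this to filter the series!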

In [34]:
# consider the following series
series = pd.Series(data=[12,45,67,88,34,54,76,99,7,8,10])

# find the mean of the series 
mean_series = np.mean(series)
print(f"The mean of the series is {mean_series}")

# create the boolean array 
bool_series = series > mean_series
print("Boolean Series (True when value greater than mean): ")
print(bool_series)

# use this boolean series to filter the original series 
print("Using the boolean series to filter the original :")
print(series[bool_series])

The mean of the series is 45.45454545454545
Boolean Series (True when value greater than mean): 
0     False
1     False
2      True
3      True
4     False
5      True
6      True
7      True
8     False
9     False
10    False
dtype: bool
Using the boolean series to filter the original :
2    67
3    88
5    54
6    76
7    99
dtype: int64


As you can see, the filtering has taken place and we have a new series which has only those values that are greater than the mean !
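Several conditions can be combined into one mask. Python's `and`/`or` do not work element-wise on a Series, so pandas uses `&` (and), `|` (or) and `~` (not); each condition needs its own parentheses because these operators bind more tightly than comparisons. A sketch on the same data (the thresholds 80, 10 and 90 are arbitrary):

```python
import pandas as pd

series = pd.Series([12, 45, 67, 88, 34, 54, 76, 99, 7, 8, 10])
mean_value = series.mean()

# Above the mean AND below 80
between = series[(series > mean_value) & (series < 80)]
print(between.tolist())   # [67, 54, 76]

# Below 10 OR above 90
outside = series[(series < 10) | (series > 90)]
print(outside.tolist())   # [99, 7, 8]

# ~ inverts a mask: values at or below the mean
at_or_below = series[~(series > mean_value)]
print(at_or_below.tolist())   # [12, 45, 34, 7, 8, 10]
```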

## Categorical Data

### Creating a Categorical Series Using dtype="category"
You can create a categorical Series directly by specifying the dtype as "category" in the Series constructor.

In [35]:
categories = pd.Series(data=['apple','banana','orange','apple'],
                       dtype='category')

print(categories)
print(f"Categories : {categories.cat.categories}")

0     apple
1    banana
2    orange
3     apple
dtype: category
Categories (3, object): ['apple', 'banana', 'orange']
Categories : Index(['apple', 'banana', 'orange'], dtype='object')


### Converting an Existing Series to a Categorical Series Using .astype("category")
You can also convert an existing Series to categorical by calling the .astype("category") method:

In [36]:
# Creating a normal Series
data = ["dog", "cat", "dog", "bird", "cat"]
series = pd.Series(data)

# Converting the Series to categorical
categorical_series = series.astype('category')

print(categorical_series)

0     dog
1     cat
2     dog
3    bird
4     cat
dtype: category
Categories (3, object): ['bird', 'cat', 'dog']


#### Benefits of Using Categorical Data:
- Less Memory: Each value is stored internally as a small integer code pointing into the list of categories, so categorical data usually uses much less memory than plain strings when there are few distinct values.
- Improved Performance: Some operations, such as grouping and sorting, can be faster than on raw strings.
- Enforced Membership: Categorical data ensures that the values in your Series belong to a predefined set of categories.
- Ordering: You can impose an order on the categories, which is useful for ranked or hierarchical data.
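The first and third points can be checked directly. A sketch using a hypothetical column of 30,000 strings with only three distinct values; the exact byte counts depend on the pandas version, so they are printed rather than stated:

```python
import pandas as pd

# Hypothetical data: many rows, few distinct strings
labels = pd.Series(["dog", "cat", "bird"] * 10_000)
as_category = labels.astype("category")

# Each value is stored as an integer code into the sorted categories
print(as_category.cat.categories.tolist())     # ['bird', 'cat', 'dog']
print(as_category.cat.codes.head(3).tolist())  # [2, 1, 0]

# Compare memory footprints (deep=True includes the string objects)
print(labels.memory_usage(deep=True))
print(as_category.memory_usage(deep=True))

# Membership: a value outside the declared categories becomes missing
fixed = pd.Series(["dog", "fish"], dtype=pd.CategoricalDtype(["cat", "dog"]))
print(fixed.isna().tolist())   # [False, True]
```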

In [37]:
# Creating an ordered categorical data 

lmh = ['low','medium','high','medium','low','high','high','low','medium','low']

ordered_categories = pd.Series(data=lmh,
                               dtype=pd.CategoricalDtype(
                                   categories=['low','medium','high'],
                                   ordered=True
                               ))

# Display the Series with its ordered categories
print(ordered_categories)
print(f"Is ordered : {ordered_categories.cat.ordered}")

0       low
1    medium
2      high
3    medium
4       low
5      high
6      high
7       low
8    medium
9       low
dtype: category
Categories (3, object): ['low' < 'medium' < 'high']
Is ordered : True


In [38]:
# filter using the category order
ordered_categories[ordered_categories>'low']

1    medium
2      high
3    medium
5      high
6      high
8    medium
dtype: category
Categories (3, object): ['low' < 'medium' < 'high']
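The declared order is also what `min()`, `max()` and `sort_values()` use, rather than alphabetical order (alphabetically, `'high' < 'low' < 'medium'`). A short sketch:

```python
import pandas as pd

levels = pd.CategoricalDtype(categories=["low", "medium", "high"], ordered=True)
s = pd.Series(["medium", "low", "high", "low"], dtype=levels)

# min/max follow the declared category order
print(s.min(), s.max())           # low high
# sort_values sorts by category order as well
print(s.sort_values().tolist())   # ['low', 'low', 'medium', 'high']
```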

### Reordering Categories 
To reorder categories in a Pandas categorical Series, use the `cat.reorder_categories()` method. The new list must contain exactly the existing categories, only in a different order. If the categorical data is ordered, the new order also defines how values are sorted and compared.

Here’s how you can reorder categories in a Pandas Series.
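Because `reorder_categories()` only permutes the existing categories, passing a list that adds or drops a category is rejected with a `ValueError`. A sketch:

```python
import pandas as pd

s = pd.Series(["medium", "low", "high"], dtype="category")

# Valid: the same three categories in a new order
reordered = s.cat.reorder_categories(["high", "medium", "low"])
print(reordered.cat.categories.tolist())   # ['high', 'medium', 'low']

# Invalid: "medium" is missing from the new list
try:
    s.cat.reorder_categories(["high", "low"])
    rejected = False
except ValueError as err:
    rejected = True
    print("ValueError:", err)
```

To actually add or remove categories, the `.cat` accessor provides `add_categories()`, `remove_categories()` and `set_categories()`.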

#### Example 1: Reordering Categories
Let's create a categorical Series and reorder its categories:

In [40]:
# Creating a categorical Series
data = pd.Series(data = ["medium", "low", "high", "medium", "low"], 
                 dtype = pd.CategoricalDtype(
                     categories=["low", "medium", "high"], 
                     ordered=True
                     )
                 )

# Reordering the categories
reordered_data = data.cat.reorder_categories(["high", "medium", "low"], ordered=True)

# Display the reordered Series
print(reordered_data)
print("\nReordered categories:", reordered_data.cat.categories)

0    medium
1       low
2      high
3    medium
4       low
dtype: category
Categories (3, object): ['high' < 'medium' < 'low']

Reordered categories: Index(['high', 'medium', 'low'], dtype='object')


#### Example 2: Reordering Categories Without Imposing an Order
If you do not need the categories to be ranked, pass `ordered=False`: the categories are rearranged, but the result is unordered (even though `data` itself was ordered):

In [42]:
# Reordering the categories without imposing an order
reordered_unordered = data.cat.reorder_categories(["high", "medium", "low"], ordered=False)

# Display the reordered Series
print(reordered_unordered)
print("\nReordered categories (unordered):", reordered_unordered.cat.categories)

0    medium
1       low
2      high
3    medium
4       low
dtype: category
Categories (3, object): ['high', 'medium', 'low']

Reordered categories (unordered): Index(['high', 'medium', 'low'], dtype='object')
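The practical difference shows up in comparisons: an unordered categorical supports equality checks but raises a `TypeError` for `<`/`>`. The `cat.as_ordered()` method turns the same categories into an ordered one. A sketch:

```python
import pandas as pd

unordered = pd.CategoricalDtype(categories=["high", "medium", "low"], ordered=False)
s = pd.Series(["medium", "low", "high"], dtype=unordered)

# Equality still works on unordered categoricals
print((s == "low").tolist())   # [False, True, False]

# Ordering comparisons are not defined without an order
try:
    _ = s > "low"
    compared = True
except TypeError as err:
    compared = False
    print("TypeError:", err)

# as_ordered() keeps the category list but treats it as high < medium < low
print((s.cat.as_ordered() > "high").tolist())   # [True, True, False]
```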


#### END
This marks the end of this notebook!