# Lesson 16: Pandas Series


|Particulars|Description|
|-|-|
|**Topics Covered**|Creating Pandas series|
||Measures of central tendency|
||Pandas series operations|
||Comparison between Python lists, NumPy arrays, and Pandas series|
|||
|**Lesson Description**|In this class, a student will create a Pandas series.|
|||
|**Lesson Duration**|45 minutes|
|||
|**Learning Outcomes**|Create a Pandas series.|
||Retrieve an item from a Pandas series.|
||Apply basic statistical operations on a Pandas series.|



---

### Teacher-Student Tasks

Before we go ahead with more concepts on Machine Learning and Artificial Intelligence, let's first learn a bit of Data Analysis, so that you can get a better understanding of data. Every dataset tells you a story if you look at it through the right lenses.

To give you a perspective, imagine that you have data on the monthly sales of every single shop in your city. You would notice that during the festivals, the sales volume of sweets rises sharply. Similarly, the sales volumes of clothes, jewellery, and electronic products also rise significantly in this period.

If you have the tourism data, then you would see that a lot of people in India go on a vacation in May and June, which makes sense because schools are closed in these two months due to summer vacation.

Through data, you can observe a trend, and based on that trend you can draw meaningful insights, helping you in making decisions in your daily life, in business organizations, in medical and engineering applications, etc. 

When it comes to Data Analysis in Python, we use a module called Pandas, which is specifically designed to manipulate, manage, and analyze large amounts of data by creating Pandas Series and Pandas DataFrames.

In this lesson, we will learn about the Pandas Series.

---

#### Pandas Series 

A Pandas series is a one-dimensional labelled array that can hold various data types. Every item in it has an index (its label). It is similar to a Python list and a NumPy array.

Without going too much into the theory, let's get started with the Pandas series right away. At the end of the class, we will learn when to use a Python list, a NumPy array, and a Pandas series.

---

#### Task 1: Python List To Pandas Series Conversion

Let's understand the Pandas series through an example. Suppose there are `30` students in your class and their weights vary in the range of `45` to `60` kg (both inclusive).

We can create a Pandas series containing the weights of the students by first creating a Python list and then converting it to a Pandas series. To create a Pandas series, you have to first import the `pandas` module using the `import` keyword. 

```
import pandas as pd
```

Here, `pd` is an alias (or nickname) for `pandas`.

Then you can call the `Series()` function to convert a Python list or a NumPy array into a Pandas series:

```
import random  # needed for random.randint()
weights = pd.Series([random.randint(45, 60) for i in range(30)])
```

**Note:** Unlike most of the functions you have used so far, the `Series()` function begins with the uppercase letter `S`. Python is case-sensitive, so writing `pd.series()` will give an error.

In [None]:
# S1.1: Create a Pandas series containing 30 random integers between 45 and 60.
import pandas as pd
import random
weights = pd.Series([random.randint(45, 60) for i in range(30)])
weights

0     60
1     47
2     49
3     55
4     50
5     49
6     53
7     52
8     55
9     56
10    50
11    53
12    58
13    60
14    46
15    51
16    48
17    52
18    48
19    56
20    45
21    54
22    46
23    55
24    60
25    46
26    58
27    47
28    51
29    59
dtype: int64

The first column in the output represents the indices of all the items in the `weights` Pandas series. The second column contains the weights of the students. The last line, `dtype: int64`, tells us that every item is stored as an integer (`int`).

**Note:** Ignore the `64` in the `int64` for the time being. 

Using the `Series()` function, you can convert any one-dimensional Python list into a Pandas series. Now, let's verify whether `weights` is a Pandas series or not:

In [None]:
# S1.2: Verify the type of value stored in the 'weights' variable using the 'type()' function.
type(weights)

pandas.core.series.Series

The `type()` function returns `pandas.core.series.Series` as an output, which confirms that `weights` is indeed a Pandas series.
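As mentioned earlier, the `Series()` function can also convert a NumPy array. The following is a minimal sketch of that conversion; the array `weights_array` and its five values are made up for illustration and are not part of the class data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: five weights (in kg) stored in a NumPy array.
weights_array = np.array([60, 47, 49, 55, 50])

# Series() accepts a NumPy array in the same way it accepts a Python list.
weights_from_array = pd.Series(weights_array)

print(type(weights_from_array))
print(weights_from_array.size)
```

The result should behave like the series created from a Python list: it gets the same default indices starting from `0`.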

A Pandas series can also contain items of multiple data types. Recall that in one of the previous classes, we created 4 different variables to store the attributes of a planet (*Learned in Lesson:  Variables and Data Types*).

||Mercury|
|-|-|
|Diameter (km)|4879|
|Gravity ($m/s^2$)|3.7|
|Ring|No|


Let's store the name of a planet, its diameter, gravity, and whether it has a ring or not in a Python list and then convert it into a pandas series:


In [None]:
# S1.3: Create a Python list that contains the planet name, diameter, gravity, and whether the planet has a ring (False for Mercury).
# Convert the list into a Pandas series. Also, verify whether the list is successfully converted to a Pandas series or not.
planets = pd.Series(['Mercury', 4879, 3.7, False])
planets

0    Mercury
1       4879
2        3.7
3      False
dtype: object

Here the data type is `object`. A Pandas series reports one data type for the whole series, not one for every individual item. Since the items here are of different types (a string, an integer, a float, and a boolean), Pandas uses the general `object` data type to represent all of them.
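To see that the individual items still keep their own types even though the series reports `object`, you can inspect them one by one. This is a small sketch; the variable name `item_types` is introduced only for this illustration.

```python
import pandas as pd

planets = pd.Series(['Mercury', 4879, 3.7, False])

# The series as a whole reports a single data type.
print(planets.dtype)

# Looping over the series gives back the original Python objects,
# so we can ask each one for the name of its own type.
item_types = [type(item).__name__ for item in planets]
print(item_types)
```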

You can also use the `size` attribute to find the number of items in a Pandas series. Since `size` is an attribute and not a function, it is written without parentheses:

In [None]:
# S1.4: Find the number of items in the 'weights' Pandas series using the 'size' attribute.
weights.size

30

So, there are `30` items in the `weights` Pandas series. 

You can also use the `shape` attribute to find the dimensions of a Pandas series:

In [None]:
# S1.5: Find the dimensions of the 'weights' Pandas series using the 'shape' attribute.
weights.shape

(30,)

So, there are `30` rows in the `weights` Pandas series. The output contains only one number because a series is one-dimensional; it does not have separate columns.
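The relationship between `size`, `shape`, and the built-in `len()` function can be checked with a short sketch like the one below. The five-item series is a made-up sample, used only to keep the output short.

```python
import pandas as pd

weights = pd.Series([60, 47, 49, 55, 50])  # hypothetical short sample

print(weights.shape)  # a tuple with a single entry
print(weights.ndim)   # number of dimensions of the series
print(len(weights))   # the same count that weights.size reports
```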

---

#### Task 2: The `mean(), min(), max()` Functions 

The `mean()` function does not take any input and returns the average value of all the items as an output.

To apply this function, write the Pandas series whose mean value you need to compute, followed by the dot (`.`) operator and `mean()`:


In [None]:
# S2.1: Calculate the average value of all the numbers in a Pandas series.
weights.mean()

52.3

Thus, the mean value of the `weights` series is `52.3`. Similarly, you can also find the minimum and maximum values in a Pandas series using the `min()` and `max()` functions:

In [None]:
# S2.2: Using the 'min()' and 'max()' functions, print the minimum and maximum values in the 'weights' Pandas series.
print(weights.min())
print(weights.max())

45
60


Thus, the minimum and maximum values in the `weights` series are `45` and `60` respectively.
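The mean is simply the sum of all the items divided by the number of items. As a sketch, the calculation can be repeated by hand on the 30 weights shown in the output of S1.1. Since the weights are random, your own values will differ; the names `total` and `manual_mean` are introduced only for this illustration.

```python
import pandas as pd

# The 30 weights shown in the output of S1.1 above.
weights = pd.Series([60, 47, 49, 55, 50, 49, 53, 52, 55, 56,
                     50, 53, 58, 60, 46, 51, 48, 52, 48, 56,
                     45, 54, 46, 55, 60, 46, 58, 47, 51, 59])

total = weights.sum()               # sum of all the weights
manual_mean = total / weights.size  # sum divided by the number of items

print(total, manual_mean)
print(weights.mean(), weights.min(), weights.max())
```

The hand-computed value should agree with what `mean()` returns.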

---

#### Task 3: The `head()` and `tail()` Functions

Sometimes instead of looking at the full dataset, we just want to look at the first few rows or the last few rows of the dataset. In such cases, we can use the `head()` and `tail()` functions.

The `head()` function shows the first five and the `tail()` function shows the last five items in a Pandas series.

Let us apply `head()` and `tail()` functions on `weights` series.

In [None]:
# S3.1: Print only the first 5 items in a Pandas series using the 'head()' function.
weights.head()

0    60
1    47
2    49
3    55
4    50
dtype: int64

The numbers in the first column in the output are the indices of each item in the Pandas series. Since we printed the first five items, the indices range from `0` to `4`.

In [None]:
# S3.2: Using the 'tail()' function, print the last 5 items in the Pandas series.
weights.tail()

25    46
26    58
27    47
28    51
29    59
dtype: int64

Since we printed the last five items of the series, the indices range from `25` to `29`.

You can also pass a number `n` as an input to the `head()` and `tail()` functions to specify how many items from the start or from the end of a Pandas series you wish to see.

Let us print the first 8 items of the `weights` series using the `head()` function.

In [None]:
# S3.3: Using the 'head()' function, print the first 8 items of the weights series.
weights.head(8)

0    60
1    47
2    49
3    55
4    50
5    49
6    53
7    52
dtype: int64

Similarly, let us print the last 12 items of the `weights` series using `tail()` function.

In [None]:
# S3.4: Using the 'tail()' function, print the last 12 items of the weights series.
weights.tail(12)

18    48
19    56
20    45
21    54
22    46
23    55
24    60
25    46
26    58
27    47
28    51
29    59
dtype: int64

Thus, we can view a small sample of a series using the `head()` and `tail()` functions.

---

#### Task 4: Indexing A Pandas Series

Indexing a Pandas series that has the default indices (`0, 1, 2, ...`) works much like indexing a Python list or a NumPy array.

Suppose we want to get the weights of the students whose indices range from `13` to `21`. To do so, write the variable storing the Pandas series followed by the square brackets `[]`. Inside the square brackets, mention the range of items you wish to retrieve from the series.

**Syntax:** `pandas_series[start_index:end_index]`

For example, to print the weights of the students whose indices range from `13` to `21`, the `start_index` would be `13` and the `end_index` would be `22`, the next number after `21`, because the item at `end_index` itself is not included. Let us print these weights.

In [None]:
# S4.1: Retrieve items from a Pandas series using the indexing method.
weights[13:22]

13    60
14    46
15    51
16    48
17    52
18    48
19    56
20    45
21    54
dtype: int64

In [None]:
# S4.2: Print the items ranging from indices 17 to 27.
weights[17:28]

17    52
18    48
19    56
20    45
21    54
22    46
23    55
24    60
25    46
26    58
27    47
dtype: int64

Hence, we can select a particular range of items from a series using the indexing method.
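To retrieve a single item, or to make it explicit that you are selecting by position, Pandas also provides the `iloc` indexer. The sketch below uses a made-up six-item series; the variable names are chosen only for this illustration.

```python
import pandas as pd

weights = pd.Series([60, 47, 49, 55, 50, 49])  # hypothetical short sample

first_weight = weights.iloc[0]      # item at position 0
last_weight = weights.iloc[-1]      # item at the last position
middle_weights = weights.iloc[2:5]  # positions 2, 3 and 4 (5 is excluded)

print(first_weight, last_weight)
print(middle_weights)
```

With the default indices, `weights.iloc[2:5]` and `weights[2:5]` should select the same items.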

---

#### Task 5: The `mode()` Function

Let's say you want to find out which weight occurs most often among the students in your class. For this, you can use the `mode()` function.

In [None]:
# S5.1: Compute the modal value in the 'weights' series.
weights.mode()

0    46
1    55
2    60
dtype: int64

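**Note**: A dataset can have more than one modal value. Here, `46`, `55`, and `60` occur equally often and more often than any other weight, so the `mode()` function returns all three.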
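One way to see why several modal values can appear is to count how often each value occurs and keep the values with the highest count. The sketch below does this with the `value_counts()` function (covered in Task 8) on a small made-up sample; the names `counts`, `highest_count`, and `modes` are introduced only for this illustration.

```python
import pandas as pd

sample = pd.Series([46, 55, 46, 60, 55, 60, 50])  # hypothetical sample

counts = sample.value_counts()  # how often each value occurs
highest_count = counts.max()    # the largest number of occurrences

# Keep only the values that occur 'highest_count' times.
modes = sorted(counts[counts == highest_count].index.tolist())

print(modes)
print(sample.mode().tolist())
```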


---

#### Task 6: The `sort_values()` Function

We can use the `sort_values()` function to arrange the numbers in a Pandas series either in an ascending order or in descending order.

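To arrange the numbers in a Pandas series in increasing order, use the `sort_values()` function with `ascending=True` as an input. This is also the default behaviour, so calling `sort_values()` without any input gives the same result.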

In [None]:
# S6.1: Arrange the weights in the increasing order using the 'sort_values()' function.
weights.sort_values(ascending=True)

20    45
14    46
22    46
25    46
1     47
27    47
18    48
16    48
5     49
2     49
4     50
10    50
15    51
28    51
17    52
7     52
11    53
6     53
21    54
3     55
8     55
23    55
9     56
19    56
26    58
12    58
29    59
13    60
24    60
0     60
dtype: int64

To arrange the numbers in a Pandas series in decreasing order, use the `sort_values()` function with `ascending=False` as an input.

In [None]:
# Student Action: Using the 'sort_values()' function, arrange the weights in the decreasing order.
weights.sort_values(ascending=False)

0     60
24    60
13    60
29    59
12    58
26    58
19    56
9     56
23    55
8     55
3     55
21    54
6     53
11    53
7     52
17    52
28    51
15    51
10    50
4     50
2     49
5     49
16    48
18    48
27    47
1     47
25    46
22    46
14    46
20    45
dtype: int64

**Note:** The `sort_values()` function does not have a separate parameter such as `descending=True` for arranging numbers in descending order. For descending order, use `ascending=False`.
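Notice in the outputs above that each weight keeps its original index after sorting. The `sort_values()` function also returns a new series and leaves the original one untouched. The following sketch shows both points on a made-up five-item series; `heaviest_first` is a name chosen only for this illustration.

```python
import pandas as pd

weights = pd.Series([60, 47, 49, 55, 50])  # hypothetical short sample

# sort_values() returns a sorted copy of the series.
heaviest_first = weights.sort_values(ascending=False)

print(weights.tolist())         # the original order is unchanged
print(heaviest_first.tolist())  # the sorted copy

# reset_index(drop=True) renumbers the indices of the sorted copy from 0.
print(heaviest_first.reset_index(drop=True))
```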

---

#### Task 7: The `median()` Function

To find the median value in a Pandas series, we can simply use the `median()` function.


In [None]:
# S7.1: Using the 'median()' function, find the median weight in the weights series.
weights.median()

52.0

Thus, the median value of the `weights` series is `52.0`.
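The median is the middle value once the items are arranged in order. When the number of items is even, as with our 30 weights, it is the average of the two middle values. The sketch below repeats this by hand on the weights shown in the output of S1.1 (your own random values will differ); the names `ordered`, `middle`, and `manual_median` are introduced only for this illustration.

```python
import pandas as pd

# The 30 weights shown in the output of S1.1 above.
weights = pd.Series([60, 47, 49, 55, 50, 49, 53, 52, 55, 56,
                     50, 53, 58, 60, 46, 51, 48, 52, 48, 56,
                     45, 54, 46, 55, 60, 46, 58, 47, 51, 59])

ordered = weights.sort_values().tolist()  # weights in increasing order
middle = len(ordered) // 2                # 15 for a series of 30 items

# With an even number of items, average the two middle values.
manual_median = (ordered[middle - 1] + ordered[middle]) / 2

print(manual_median, weights.median())
```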

---

#### Task 8: The `value_counts()` Function

To count the number of occurrences of each item in a Pandas series, you can use the `value_counts()` function. The items that occur most often are listed first:

In [None]:
# S8.1: Count the number of times each item in the 'weights' Pandas series occurs.
weights.value_counts()

60    3
55    3
46    3
58    2
56    2
53    2
52    2
51    2
50    2
49    2
48    2
47    2
59    1
54    1
45    1
dtype: int64

**Note:** The `value_counts()` function is not available for Python lists and NumPy arrays.
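For comparison, similar counts can be obtained for a Python list and a NumPy array, but with different tools. The sketch below uses `Counter` from Python's `collections` module and NumPy's `unique()` function on a made-up sample; it is one possible approach, not the only one.

```python
from collections import Counter

import numpy as np

weights_list = [46, 55, 46, 60, 55, 46]  # hypothetical sample

# Python list: Counter builds a dictionary-like object of item -> count.
list_counts = Counter(weights_list)

# NumPy array: unique() with return_counts=True returns the distinct
# values (sorted) together with the count of each one.
values, counts = np.unique(np.array(weights_list), return_counts=True)

print(list_counts)
print(values, counts)
```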

There is more to the Pandas series. We will learn it in detail along with Pandas DataFrames from the next class onwards. This is just an introductory class to Pandas.

---

#### Python List vs NumPy Array vs Pandas Series

You might now wonder when to use a Python list, a NumPy array, or a Pandas series.

There are no hard rules to decide when to use which one of these data structures, but as a guide you may consider the following:

1. When you just want to store data, retrieve data, and add more data, use a Python list.

2. When you want to store numerical data (one-dimensional or multidimensional) and want to perform a lot of mathematical operations, then use a NumPy array, as it is faster than a Python list and makes it easy to create multidimensional arrays.

3. When you want to import data from an external file such as `TXT, XLSX, CSV, XML`, etc. then use a Pandas series. In the next class, you will learn how to import data from an external file. Additionally, Pandas allows you to interpret data in different ways. It also allows you to do complicated data extraction, manipulation, and data processing operations on a dataset. Throughout this course, we will use the Pandas library to handle data.
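To make the comparison above concrete, here is a small sketch that performs the same operation, converting weights from kilograms to grams, with each of the three data structures. The three sample weights and all variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

weights_kg = [60, 47, 49]  # hypothetical sample

# Python list: go through the items one by one.
list_grams = [w * 1000 for w in weights_kg]

# NumPy array: multiply the whole array in a single step.
array_grams = np.array(weights_kg) * 1000

# Pandas series: the same arithmetic, plus built-in functions such as mean().
series_grams = pd.Series(weights_kg) * 1000

print(list_grams)
print(array_grams)
print(series_grams.mean())
```

Note that multiplying a Python list directly by `1000` would repeat the list instead of multiplying its items, which is why the loop is needed there.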

We will stop here. In the next class, we will learn about the Pandas DataFrame, which is a collection of Pandas series.

---