# [What is a Pandas Series?](#)

A Pandas Series is a **one-dimensional labeled array** capable of holding data of any type (integer, string, float, Python objects, etc.). It is one of the core data structures in the Pandas library, along with DataFrames. A Series can be thought of as a single column of data in a table or spreadsheet, with an associated array of labels called an *index*.


Key characteristics of a Series:
- **Homogeneous Data**: All elements in a Series share a single data type (`dtype`). If you create a Series from mixed values, Pandas upcasts them to a common type: integers mixed with floats become `float64`, while numbers mixed with strings fall back to the generic `object` dtype.

- **Mutable**: Series objects are mutable, meaning you can change their values in place after creation.

- **Mostly Fixed Size**: Most operations that change the length of a Series, such as `drop()` or `pd.concat()`, return a new Series rather than resizing the original. Assigning to a new label can enlarge an existing Series, but growing a Series one element at a time is inefficient.

- **Labeled Index**: Each element in a Series is associated with a label, and together the labels form the index. The index is integer-based by default (0, 1, 2, ...) or can consist of user-defined labels such as strings, dates, or other hashable objects. Labels do not have to be unique, although unique labels keep lookups unambiguous.
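
The characteristics above can be checked directly. The following is a minimal sketch; the variable names are illustrative and not part of the lecture's own examples.

```python
import pandas as pd

# Mixed integers and floats are upcast to a common numeric dtype.
numeric = pd.Series([1, 2.5, 3])
print(numeric.dtype)  # float64

# Mixing numbers and strings falls back to the generic object dtype.
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)  # object

# Values can be changed in place through their labels.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
s["b"] = 99
print(s.tolist())  # [10, 99, 30]
```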


Comparison with NumPy Arrays and Python Lists:

NumPy Arrays:
- Like NumPy arrays, Pandas Series are homogeneous, meaning they contain elements of the same data type.
- Series objects are built on top of NumPy arrays and inherit many of their attributes and methods.
- However, Series have additional functionality, such as an associated index for labeling and alignment, which is not present in NumPy arrays.

Python Lists:
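- Unlike Python lists, which freely mix element types, a Series has a single `dtype`. Mixed values can still be stored, but only under the generic `object` dtype, which gives up most of the performance benefits.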
- Series objects are more memory-efficient and offer better performance for numerical computations compared to Python lists.
- Series provide a wide range of built-in methods and functionalities specifically designed for data manipulation and analysis, which are not available in Python lists.
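
To make the comparison concrete, here is a small sketch that applies the same operation to a list, a NumPy array, and a Series. The names `as_list`, `as_array`, and `as_series` are chosen only for illustration.

```python
import numpy as np
import pandas as pd

values = [1, 2, 3]

as_list = values
as_array = np.array(values)
as_series = pd.Series(values, index=["a", "b", "c"])

# A list treats * as repetition, not arithmetic.
print(as_list * 2)  # [1, 2, 3, 1, 2, 3]

# NumPy arrays and Series both multiply element-wise...
print((as_array * 2).tolist())   # [2, 4, 6]
print((as_series * 2).tolist())  # [2, 4, 6]

# ...but only the Series carries labels alongside the values.
print(as_series["b"])  # 2
```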


In summary, Pandas Series are powerful one-dimensional data structures that combine the best features of NumPy arrays and Python lists, along with additional capabilities for data manipulation and analysis. They are a fundamental building block in the Pandas library and are widely used in data science and machine learning tasks.

## <a id='toc1_'></a>[Creating a Series](#toc0_)

There are several ways to create a Pandas Series. The most common methods involve using a list, NumPy array, or dictionary. You can also specify an index when creating a Series to label each element. The `pd.Series()` constructor is used to create a Series object.


### <a id='toc1_1_'></a>[Creating a Series from a List, NumPy Array, or Dictionary](#toc0_)


1. **From a Python List**:
You can create a Series by passing a Python list to the `pd.Series()` constructor.


In [1]:
import pandas as pd

data = [1, 2, 3, 4, 5]
series = pd.Series(data)
series

0    1
1    2
2    3
3    4
4    5
dtype: int64

2. **From a NumPy Array**:
You can create a Series by passing a NumPy array to the `pd.Series()` constructor.


In [2]:
import numpy as np

data = np.array([1, 2, 3, 4, 5])
series = pd.Series(data)
series

0    1
1    2
2    3
3    4
4    5
dtype: int64

3. **From a Dictionary**:
You can create a Series by passing a dictionary to the `pd.Series()` constructor. The keys of the dictionary become the index labels, and the values become the Series values.


In [3]:
data = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
series = pd.Series(data)
series

a    1
b    2
c    3
d    4
e    5
dtype: int64

### <a id='toc1_2_'></a>[Specifying an Index when Creating a Series](#toc0_)


By default, the `pd.Series()` constructor assigns an integer index starting from 0. However, you can specify a custom index by passing an `index` parameter to the constructor.


In [4]:
data = [1, 2, 3, 4, 5]
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index)
series

a    1
b    2
c    3
d    4
e    5
dtype: int64
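
The `index` parameter behaves a little differently when the data is a dictionary: instead of attaching labels by position, Pandas looks each label up among the dictionary keys. A minimal sketch, where the label `"z"` is deliberately missing from the data:

```python
import pandas as pd

data = {"a": 1, "b": 2, "c": 3}

# Only the labels listed in `index` are kept, in that order.
# "z" has no entry in the dictionary, so it becomes NaN,
# and the dtype is upcast to float64 to hold the missing value.
series = pd.Series(data, index=["c", "a", "z"])
print(series)
```

Here the key `"b"` is dropped because it does not appear in `index`.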

### <a id='toc1_3_'></a>[Using the `pd.Series()` Constructor](#toc0_)


The `pd.Series()` constructor is the primary way to create a Series object. It accepts various data types as input, such as lists, NumPy arrays, dictionaries, and scalar values.


The general syntax for creating a Series using the `pd.Series()` constructor is:

```python
pd.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)
```

- `data`: The input data for the Series, which can be a list, NumPy array, dictionary, or scalar value.
- `index`: (optional) The index labels for the Series. If not provided, an integer index starting from 0 is assigned.
- `dtype`: (optional) The data type of the Series elements. If not provided, Pandas infers the data type automatically.
- `name`: (optional) A name for the Series, which can be useful when combining multiple Series into a DataFrame.
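- `copy`: (optional) Whether to create a copy of the input data. It only affects Series or one-dimensional NumPy array input. By default, it is set to `False`.
- `fastpath`: (optional) An internal performance parameter that is not intended for user code and is deprecated in recent Pandas releases. Leave it unset.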
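
Scalar input is the one case in the list above that the other examples do not show. As a brief sketch (the name `"weight"` and the labels are arbitrary), a scalar is repeated once for every label in `index`:

```python
import pandas as pd

# The scalar 0.5 is broadcast to match the length of the index.
constant = pd.Series(0.5, index=["a", "b", "c"], name="weight")
print(constant)
```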


In [5]:
data = [1, 2, 3, 4, 5]
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index, name='example', dtype='float64')
series

a    1.0
b    2.0
c    3.0
d    4.0
e    5.0
Name: example, dtype: float64

*These are the fundamental ways to create a Pandas Series. By using the `pd.Series()` constructor and specifying the input data and optional parameters, you can easily create Series objects to store and manipulate one-dimensional labeled data.*

## <a id='toc2_'></a>[Series Attributes](#toc0_)

Pandas Series objects have several attributes that provide useful information about the Series. These attributes allow you to access the underlying data, index, and other properties of the Series. Let's explore some of the commonly used Series attributes.


<img src="../images/series-properties.png" width="800">

### <a id='toc2_1_'></a>[`dtype`: Data Type of the Series Elements](#toc0_)


The `dtype` attribute returns the data type of the elements in the Series. It provides information about the type of data stored in the Series, such as integers, floats, strings, or custom data types.


In [6]:
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
series.dtype


dtype('int64')

In this example, the `dtype` attribute indicates that the Series contains integer values of type `int64`.


### <a id='toc2_2_'></a>[`shape`: Tuple of Series Dimensions](#toc0_)


The `shape` attribute returns a tuple representing the dimensions of the Series. For a one-dimensional Series, the `shape` attribute returns a tuple with a single element indicating the length of the Series.


In [7]:
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
series.shape

(5,)

The output `(5,)` indicates that the Series has a single dimension with a length of 5.


### <a id='toc2_3_'></a>[`size`: Number of Elements in the Series](#toc0_)


The `size` attribute returns the total number of elements in the Series. It represents the length of the Series.


In [8]:
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
series.size

5

In this example, the `size` attribute indicates that the Series contains 5 elements.


### <a id='toc2_4_'></a>[`index`: Index Object of the Series](#toc0_)


The `index` attribute returns the index object associated with the Series. The index object contains the labels or keys used to identify each element in the Series.


In [9]:
data = [1, 2, 3, 4, 5]
index = ['a', 'b', 'c', 'd', 'e']
series = pd.Series(data, index=index)
series.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

The output shows that the Series has an index object containing the labels 'a', 'b', 'c', 'd', and 'e'.


### <a id='toc2_5_'></a>[`values`: NumPy Array of Series Values](#toc0_)

The `values` attribute returns the data of the Series as a NumPy array (or an array-like object for extension dtypes such as categoricals). The Pandas documentation recommends `to_numpy()` when you specifically need a NumPy array.


In [10]:
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
series.values

array([1, 2, 3, 4, 5])

The output shows the values of the Series as a NumPy array.
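
As a complement to the attributes above, the sketch below uses the `to_numpy()` method, which returns a NumPy array and can convert the dtype on the way out, and confirms that `len()`, `size`, and `shape` agree for a Series.

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5])

# to_numpy() returns a NumPy array; dtype is an optional conversion.
as_float = series.to_numpy(dtype="float64")
print(type(as_float).__name__)  # ndarray
print(as_float.dtype)           # float64

# For a one-dimensional Series these three always match.
print(len(series), series.size, series.shape[0])  # 5 5 5
```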


*These are some of the important attributes of a Pandas Series. By accessing these attributes, you can gain insights into the data type, dimensions, size, index, and underlying values of the Series. These attributes are useful for inspecting and understanding the structure and content of a Series object.*

## <a id='toc3_'></a>[Vectorized Operations and Label Alignment](#toc0_)

One of the powerful features of Pandas Series is the ability to perform vectorized operations, similar to NumPy arrays. Vectorized operations allow you to perform element-wise computations on the entire Series without the need for explicit looping. Additionally, Series objects automatically align data based on their labels during operations, providing flexibility and convenience in data manipulation.


### <a id='toc3_1_'></a>[Vectorized Operations](#toc0_)


Pandas Series support various vectorized operations, such as arithmetic operations, mathematical functions, and boolean operations. These operations are applied element-wise to the Series.


In [11]:
data = [1, 2, 3, 4, 5]
series = pd.Series(data)

In [12]:
# Arithmetic operations
series + 2

0    3
1    4
2    5
3    6
4    7
dtype: int64

In [13]:
series * 3

0     3
1     6
2     9
3    12
4    15
dtype: int64

In [14]:
series ** 2

0     1
1     4
2     9
3    16
4    25
dtype: int64

In [15]:
# Mathematical functions
np.sin(series)

0    0.841471
1    0.909297
2    0.141120
3   -0.756802
4   -0.958924
dtype: float64

In [16]:
np.log(series)

0    0.000000
1    0.693147
2    1.098612
3    1.386294
4    1.609438
dtype: float64

In [17]:
# Boolean operations
series > 3

0    False
1    False
2    False
3     True
4     True
dtype: bool

In [18]:
series == 4

0    False
1    False
2    False
3     True
4    False
dtype: bool

In the above examples, the arithmetic operations (`+`, `*`, `**`) are applied element-wise to the Series, returning a new Series with the results. Similarly, mathematical functions from NumPy, such as `sin()` and `log()`, can be directly applied to the Series, computing the respective function for each element.


Boolean operations, such as comparisons (`>`, `==`), return a new Series with boolean values indicating the result of the comparison for each element.
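
A common use of such a boolean Series is as a mask that filters the original data. The following sketch assumes the same five-element Series as above; `mask`, `selected`, and `middle` are illustrative names.

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5])

# Keep only the elements where the mask is True.
mask = series > 3
selected = series[mask]
print(selected.tolist())        # [4, 5]
print(selected.index.tolist())  # [3, 4] (original labels are kept)

# Combine conditions with & (and) or | (or); each needs parentheses.
middle = series[(series > 1) & (series < 5)]
print(middle.tolist())  # [2, 3, 4]

# True counts as 1, so summing a mask counts the matches.
print(int(mask.sum()))  # 2
```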


### <a id='toc3_2_'></a>[Label Alignment](#toc0_)


When performing operations between two Series objects, Pandas automatically aligns the data based on their labels. This means that the operation is performed on the corresponding elements with the same label, regardless of their position in the Series.


In [19]:
series1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
series2 = pd.Series([4, 5, 6], index=['b', 'c', 'd'])

series1 + series2

a    NaN
b    6.0
c    8.0
d    NaN
dtype: float64

In this example, `series1` and `series2` have different indexes. When adding these Series together, Pandas aligns the data based on the common labels ('b' and 'c') and performs the addition operation. The resulting Series will have the union of the indexes from both Series.


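For labels that are present in one Series but not the other, the result will be marked as missing (`NaN`). Because `NaN` is a floating-point value, the result is upcast to `float64` even though both inputs contain integers.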


Label alignment is a powerful feature that allows you to perform operations on Series objects without explicitly aligning the data beforehand. It simplifies the process of working with labeled data and reduces the need for manual data alignment.
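
When `NaN` for unmatched labels is not the desired outcome, one option is the `add()` method with its `fill_value` parameter, which substitutes a value for the missing side before adding. A minimal sketch using the same two Series:

```python
import pandas as pd

series1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
series2 = pd.Series([4, 5, 6], index=["b", "c", "d"])

# Labels missing from one side are treated as 0 instead of producing NaN.
total = series1.add(series2, fill_value=0)
print(total.index.tolist())  # ['a', 'b', 'c', 'd']
print(total.tolist())        # [1.0, 6.0, 8.0, 6.0]
```

Whether filling with 0 is appropriate depends on what a missing label means in your data, so treat this as one possible choice rather than a default.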


*Vectorized operations and label alignment in Pandas Series provide a concise and efficient way to perform computations on labeled data. These features, along with the ability to handle missing data, set Pandas apart from other tools for working with labeled data, making it a valuable library for data manipulation and analysis.*

## <a id='toc4_'></a>[Common Series Methods](#toc0_)

Pandas Series provide a wide range of built-in methods for data manipulation and analysis. These methods allow you to perform common operations on Series objects efficiently. Let's explore some of the frequently used Series methods.


### <a id='toc4_1_'></a>[`head()` and `tail()`: Viewing the First or Last n Elements](#toc0_)


The `head()` and `tail()` methods allow you to view the first or last `n` elements of a Series, respectively. By default, these methods return the first or last 5 elements.


In [20]:
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
series = pd.Series(data)

series.head()  # First 5 elements

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [21]:
series.tail(3)  # Last 3 elements

7     8
8     9
9    10
dtype: int64

### <a id='toc4_2_'></a>[`unique()`: Returning Unique Values in a Series](#toc0_)


The `unique()` method returns an array of the unique values in the Series, in order of first appearance, eliminating any duplicates.


In [22]:
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
series = pd.Series(data)

series.unique()

array([1, 2, 3, 4])

### <a id='toc4_3_'></a>[`value_counts()`: Counting Occurrences of Unique Values](#toc0_)


The `value_counts()` method returns a Series containing the counts of each unique value in the original Series. It provides a convenient way to calculate the frequency of each value.


In [23]:
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
series = pd.Series(data)

series.value_counts()

4    4
3    3
2    2
1    1
Name: count, dtype: int64
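
Two closely related calls are often useful here: `nunique()` returns just the number of distinct values, and `value_counts(normalize=True)` returns relative frequencies instead of raw counts. A short sketch on the same data:

```python
import pandas as pd

series = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

# Number of distinct values.
print(series.nunique())  # 4

# Share of each value; the shares sum to 1.
shares = series.value_counts(normalize=True)
print(shares)
```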

### <a id='toc4_4_'></a>[`sort_values()`: Sorting a Series by Values](#toc0_)


The `sort_values()` method sorts the Series by its values in ascending or descending order. By default, it sorts the values in ascending order.


In [24]:
data = [4, 2, 8, 1, 9, 5]
series = pd.Series(data)

series.sort_values()

3    1
1    2
0    4
5    5
2    8
4    9
dtype: int64

In [25]:
series.sort_values(ascending=False)

4    9
2    8
5    5
0    4
1    2
3    1
dtype: int64

To sort the values in descending order, you can pass the parameter `ascending=False` to the `sort_values()` method.


### <a id='toc4_5_'></a>[`sort_index()`: Sorting a Series by Index](#toc0_)


The `sort_index()` method sorts the Series by its index labels in ascending or descending order. By default, it sorts the index labels in ascending order.


In [26]:
data = [4, 2, 8, 1, 9, 5]
index = ['b', 'd', 'a', 'e', 'c', 'f']
series = pd.Series(data, index=index)

series.sort_index()

a    8
b    4
c    9
d    2
e    1
f    5
dtype: int64

To sort the index labels in descending order, you can pass the parameter `ascending=False` to the `sort_index()` method.
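
For completeness, here is the descending case as a sketch on the same data. Note that `sort_index()` returns a new Series and leaves the original order untouched.

```python
import pandas as pd

series = pd.Series([4, 2, 8, 1, 9, 5], index=["b", "d", "a", "e", "c", "f"])

descending = series.sort_index(ascending=False)
print(descending.index.tolist())  # ['f', 'e', 'd', 'c', 'b', 'a']
print(descending.tolist())        # [5, 1, 2, 9, 4, 8]

# The original Series keeps its order.
print(series.index.tolist())      # ['b', 'd', 'a', 'e', 'c', 'f']
```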


*These are just a few examples of the common methods available for Pandas Series. Pandas provides a rich set of methods for data manipulation, analysis, and visualization. By leveraging these methods, you can efficiently perform various operations on Series objects and extract meaningful insights from your data.*

## <a id='toc5_'></a>[Converting Series Data Types](#toc0_)

Data type conversion is a crucial skill in data manipulation with Pandas. Understanding how to change the data type of a Series can help you optimize memory usage, perform specific operations, and ensure data consistency.


### <a id='toc5_1_'></a>[Using `astype()` for Explicit Conversion](#toc0_)


The primary method for converting data types in a Series is the `astype()` method. It allows you to explicitly specify the desired data type.


In [1]:
import pandas as pd
import numpy as np

In [2]:
# Create a sample Series
s = pd.Series(['1', '2', '3', '4', '5'])
s.dtype

dtype('O')

In [3]:
# Convert to integer
s_int = s.astype(int)
s_int.dtype

dtype('int64')

In [4]:
# Convert to float
s_float = s.astype(float)
s_float.dtype

dtype('float64')

### <a id='toc5_2_'></a>[Handling Numeric Conversions](#toc0_)


When converting between numeric types, be aware of potential data loss or unexpected results. In particular, casting floats to integers discards the fractional part.


In [5]:
# Create a Series with mixed numeric types
s_mixed = pd.Series([1, 2.5, 3, 4.7, 5])
s_mixed.dtype

dtype('float64')

In [6]:
# Convert to integer (note the truncation: 2.5 -> 2 and 4.7 -> 4)
s_mixed_int = s_mixed.astype(int)
s_mixed_int

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [7]:
# Convert to float (no data loss)
s_mixed_float = s_mixed.astype(float)
s_mixed_float

0    1.0
1    2.5
2    3.0
3    4.7
4    5.0
dtype: float64
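
In the integer conversion above, `4.7` becomes `4` because the cast drops the fractional part. If rounding to the nearest integer is what you want, one approach is to call `round()` before casting. This is a sketch, not the only option:

```python
import pandas as pd

s_mixed = pd.Series([1, 2.5, 3, 4.7, 5])

# astype(int) alone discards the fractional part.
truncated = s_mixed.astype(int)
print(truncated.tolist())  # [1, 2, 3, 4, 5]

# round() first, then cast. Exact halves round to the nearest even
# integer, so 2.5 becomes 2 while 4.7 becomes 5.
rounded = s_mixed.round().astype(int)
print(rounded.tolist())  # [1, 2, 3, 5, 5]
```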

### <a id='toc5_3_'></a>[Converting to Datetime](#toc0_)


Pandas provides powerful datetime conversion capabilities using `pd.to_datetime()`.


In [9]:
# Create a Series with date strings
dates = pd.Series(['2023-01-01', '2023-02-15', '2023-03-30'])
dates

0    2023-01-01
1    2023-02-15
2    2023-03-30
dtype: object

In [10]:
# Convert to datetime
dates_dt = pd.to_datetime(dates)
dates_dt.dtype

dtype('<M8[ns]')

In [11]:
# Handling different date formats
# (format='mixed', available since pandas 2.0, infers each element's format separately)
mixed_dates = pd.Series(['2023-01-01', '15/02/2023', 'March 30, 2023'])
pd.to_datetime(mixed_dates, format='mixed')

0   2023-01-01
1   2023-02-15
2   2023-03-30
dtype: datetime64[ns]
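
When every string follows the same known layout, passing an explicit `format` avoids any guessing about day and month order. The sketch below uses made-up day-first dates and then reads components back through the `.dt` accessor.

```python
import pandas as pd

# Day-first strings; "01/02/2023" is 1 February, not 2 January.
day_first = pd.Series(["01/02/2023", "15/02/2023"])
parsed = pd.to_datetime(day_first, format="%d/%m/%Y")

# Once the dtype is datetime, .dt exposes the date components.
print(parsed.dt.day.tolist())    # [1, 15]
print(parsed.dt.month.tolist())  # [2, 2]
print(parsed.dt.year.tolist())   # [2023, 2023]
```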

### <a id='toc5_4_'></a>[Converting to Categorical Data](#toc0_)


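Categorical data stores each distinct value once and represents the elements as small integer codes, which can save memory when values repeat and is useful for certain operations. You will learn about categorical data in more detail in later lectures.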


In [12]:
# Create a Series with repeated values
colors = pd.Series(['red', 'blue', 'green', 'red', 'blue', 'red', 'green'])
colors

0      red
1     blue
2    green
3      red
4     blue
5      red
6    green
dtype: object

In [13]:
# Convert to categorical
colors_cat = colors.astype('category')
colors_cat.dtype

CategoricalDtype(categories=['blue', 'green', 'red'], ordered=False, categories_dtype=object)

In [14]:
# Examine categories
colors_cat.cat.categories

Index(['blue', 'green', 'red'], dtype='object')

In [15]:
# Get category codes
colors_cat.cat.codes

0    2
1    0
2    1
3    2
4    0
5    2
6    1
dtype: int8
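
To see the memory effect, the sketch below repeats the three colors many times (the repetition count is arbitrary) and compares `memory_usage(deep=True)` before and after the conversion. The exact byte counts depend on your Pandas version and platform, so only the comparison is shown.

```python
import pandas as pd

# Many repeated strings: the case where categoricals help most.
colors = pd.Series(["red", "blue", "green"] * 10_000)
colors_cat = colors.astype("category")

plain_bytes = colors.memory_usage(deep=True)
category_bytes = colors_cat.memory_usage(deep=True)

print(plain_bytes, category_bytes)
print(category_bytes < plain_bytes)  # True
```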

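For error handling, the conversion functions accept an `errors` parameter that controls what happens when a value cannot be converted. The supported values depend on the function:
- `raise`: This is the default value and raises an exception if a value cannot be converted.
- `ignore`: This suppresses the error and returns the original input unchanged. `astype()` supports only `raise` and `ignore`.
- `coerce`: This replaces unconvertible values with `NaN` (or `NaT` for datetimes). It is available in `pd.to_numeric()` and `pd.to_datetime()`, but not in `astype()`.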

In [18]:
# Series with unconvertible value
s_with_error = pd.Series(['1', '2', '3', 'four', '5'])
s_with_error

0       1
1       2
2       3
3    four
4       5
dtype: object

In [23]:
s_with_error.astype(int, errors='ignore')

0       1
1       2
2       3
3    four
4       5
dtype: object
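
The `coerce` option is not shown above. A minimal sketch with `pd.to_numeric()`, using the same Series, looks like this:

```python
import pandas as pd

s_with_error = pd.Series(["1", "2", "3", "four", "5"])

# 'four' cannot be parsed, so it becomes NaN and the result is float64.
coerced = pd.to_numeric(s_with_error, errors="coerce")
print(coerced)
print(int(coerced.isna().sum()))  # 1

# After dropping the missing value, an integer cast is safe.
cleaned = coerced.dropna().astype(int)
print(cleaned.tolist())  # [1, 2, 3, 5]
```

Coercion silently discards information, so it is worth counting the resulting `NaN` values, as done here, before continuing.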

## <a id='toc6_'></a>[Conclusion](#toc0_)

In this lecture:
- We explored the fundamentals of Pandas Series, a powerful one-dimensional data structure in the Pandas library.

- We learned that Series are similar to NumPy arrays but with additional functionalities and flexibility.

- We covered the key characteristics of Series, including their homogeneous nature, mutable values, size behavior, and labeled index.

- We also compared Series with NumPy arrays and Python lists, highlighting the advantages of using Series for data manipulation and analysis.

- We discussed various ways to create a Series, such as from a Python list, NumPy array, or dictionary, and how to specify custom index labels.

- We explored the `pd.Series()` constructor and its parameters for creating Series objects.

- We dived into the important attributes of Series, including `dtype` for accessing the data type, `shape` for getting the dimensions, `size` for obtaining the number of elements, `index` for retrieving the index object, and `values` for accessing the underlying data as a NumPy array.

- We learned about vectorized operations in Series, which allow for efficient element-wise computations without explicit looping.

- We also explored the concept of label alignment, where Pandas automatically aligns data based on their labels during operations, providing flexibility and convenience.

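- We explored some common Series methods, such as `head()` and `tail()` for viewing the first or last `n` elements, `unique()` for returning unique values, `value_counts()` for counting occurrences of unique values, `sort_values()` for sorting Series by values, and `sort_index()` for sorting Series by index labels.

- Lastly, we covered data type conversion with `astype()`, `pd.to_datetime()`, and categorical data, including how the `errors` parameter handles values that cannot be converted.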

Throughout this lecture, we provided code examples and explanations to illustrate the concepts and techniques related to Pandas Series. We emphasized the importance of understanding Series as a fundamental building block in the Pandas library for data manipulation and analysis.


As we continue our journey in the Pandas Fundamentals chapter, we will build upon this knowledge of Series and explore more advanced topics, such as data indexing, selection, and manipulation using Pandas DataFrames.


By mastering the concepts and techniques covered in this lecture, you will be well-equipped to work with Pandas Series effectively and leverage their capabilities in your data analysis and machine learning projects.