# Chapter 4: Series Introduction

In [31]:
import pandas as pd
import numpy as np

- A ``Series`` models one-dimensional data. The dictionary below mimics its three parts: an index, the data, and a name

In [1]:
series = {
    'index': [0, 1, 2, 3],
    'data': [145, 142, 38, 13],
    'name': "songs"
}

In [2]:
series

{'index': [0, 1, 2, 3], 'data': [145, 142, 38, 13], 'name': 'songs'}

In [6]:
def get(series, idx):
    # find the position of the label in the index,
    # then return the value stored at that position
    value_idx = series['index'].index(idx)
    return series['data'][value_idx]

In [7]:
get(series, 1)

142

## 4.1 The Index Abstraction

- ``Index`` is the core feature of pandas' data structures
- Many of the operations performed on a ``Series`` operate directly on the index

In [8]:
songs = {
    'index': ["Paul", "John", "George", "Ringo"],
    'data': [145, 142, 38, 13],
    'name': 'counts'
}

In [10]:
songs['index'].index('John')

1

In [14]:
get(songs, "John")

142
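The `get` helper scans the index list on every call. One way to picture an index is as a mapping from labels to positions. The sketch below uses two hypothetical helpers, `build_positions` and `get_fast`, which are not part of pandas or of the original notebook. It only illustrates the idea and makes no claim about how pandas implements its index internally.

```python
songs = {
    'index': ["Paul", "John", "George", "Ringo"],
    'data': [145, 142, 38, 13],
    'name': 'counts'
}

def build_positions(series):
    # Hypothetical helper: map each index label to its position in 'data'
    return {label: pos for pos, label in enumerate(series['index'])}

def get_fast(series, positions, idx):
    # Look up the position by label, then fetch the value at that position
    return series['data'][positions[idx]]

positions = build_positions(songs)
john_count = get_fast(songs, positions, "John")
print(john_count)
```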

## 4.2 The Pandas Series

- A ``Series`` is one-dimensional even though its printed form looks two-dimensional: the left column is the index, not data
- The generic name for an index is an ``axis``
- The values are not required to share a type, but keeping them the same type gives the best speed and lets pandas use vectorized operations
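To see the effect of mixing types, the following sketch builds one series of integers and one of mixed values and inspects the resulting dtype. The variable names `same_type` and `mixed_type` are chosen here for illustration.

```python
import pandas as pd

# All values are integers, so pandas picks an integer dtype
same_type = pd.Series([145, 142, 38, 13], name="counts")

# Mixing ints, strings, floats and None forces the generic object dtype
mixed_type = pd.Series([145, "142", 38.0, None], name="counts")

print(same_type.dtype)
print(mixed_type.dtype)
```

An ``object`` series stores references to arbitrary Python objects, so numeric operations on it generally cannot take the fast vectorized path.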

In [15]:
songs2 = pd.Series([145, 142, 38, 13],
                   name="counts")


In [16]:
songs2

0    145
1    142
2     38
3     13
Name: counts, dtype: int64

In [17]:
songs2.index

RangeIndex(start=0, stop=4, step=1)

In [18]:
# The index can also be string based
songs3 = pd.Series([145, 142, 38, 13],
                   name="counts",
                   index=["Paul", "John", "George", "Ringo"])

In [19]:
songs3

Paul      145
John      142
George     38
Ringo      13
Name: counts, dtype: int64

In [20]:
songs3.index

Index(['Paul', 'John', 'George', 'Ringo'], dtype='object')

## 4.3 The NaN value

- When pandas determines that a series holds numeric values but cannot find a number to represent an entry, it uses ``NaN``
- A series that contains ``NaN`` has type float64 rather than int64, because ``NaN`` is a floating-point value that int64 cannot store
- ``None``, ``NaN``, ``nan``, ``<NA>``, and null all refer to empty or missing data in a pandas series or dataframe

In [23]:
nan_series = pd.Series([2, np.nan],
                    index=["Ono", "Clapton"])

In [24]:
nan_series

Ono        2.0
Clapton    NaN
dtype: float64

In [25]:
# count method disregards NaN
nan_series.count()

1

In [28]:
# inspect number of entries (including missing values)
nan_series.size

2
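The relationship between ``count`` and ``size`` can be checked with ``isna``, which flags missing entries. This is a small sketch; the names `missing`, `present` and `none_series` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

nan_series = pd.Series([2, np.nan], index=["Ono", "Clapton"])

missing = nan_series.isna().sum()   # number of missing entries
present = nan_series.count()        # number of non-missing entries
print(missing, present, nan_series.size)

# None in numeric data is also stored as NaN, so the dtype becomes float64
none_series = pd.Series([2, None], index=["Ono", "Clapton"])
print(none_series.dtype)
```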

## 4.4 Optional Integer Support for NaN

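- The int64 type does not support missing data
- Passing ``dtype='Int64'`` (note the capital ``I``) selects the nullable integer type, which stores missing entries as ``<NA>``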

In [29]:
nan_series2 = pd.Series([2, None],
                        index=['Ono', 'Clapton'],
                        dtype='Int64')

In [30]:
nan_series2

Ono           2
Clapton    <NA>
dtype: Int64
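As a sketch of how the nullable type behaves in practice, the example below doubles the series and converts an existing float64 series with ``astype``. The names `doubled`, `float_series` and `as_nullable` are illustrative.

```python
import pandas as pd

nan_series2 = pd.Series([2, None],
                        index=['Ono', 'Clapton'],
                        dtype='Int64')

# Arithmetic keeps the Int64 dtype; the missing entry stays missing
doubled = nan_series2 * 2
print(doubled)

# A float64 series holding NaN can be converted to the nullable type
float_series = pd.Series([2, None], index=['Ono', 'Clapton'])
as_nullable = float_series.astype('Int64')
print(as_nullable.dtype)
```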

## 4.5 Similar to NumPy

- A ``Series`` behaves similarly to a NumPy array

In [37]:
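# Both respond to index operations
numpy_ser = np.array([145, 142, 38, 13])
print(numpy_ser[1])
# songs3 has string labels, so use .iloc for positional access;
# songs3[1] is deprecated in recent pandas versions
print(songs3.iloc[1])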

142
142


In [39]:
# Both have common methods
print(songs3.mean())
print(numpy_ser.mean())

84.5
84.5


In [42]:
# Both have the notion of a boolean array (mask)
mask = songs3 > songs3.median()
print(mask)

Paul       True
John       True
George    False
Ringo     False
Name: counts, dtype: bool


In [43]:
# We can filter using this mask
songs3[mask]

Paul    145
John    142
Name: counts, dtype: int64

In [44]:
songs3[songs3 > songs3.median()]

Paul    145
John    142
Name: counts, dtype: int64
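For comparison, the same masking pattern works on the plain NumPy array. This is a minimal sketch; `numpy_mask` and `above_median` are names introduced here.

```python
import numpy as np

numpy_ser = np.array([145, 142, 38, 13])

# Comparing an array with a scalar yields a boolean array
numpy_mask = numpy_ser > np.median(numpy_ser)
print(numpy_mask)

# Indexing with the boolean array keeps only the True positions
above_median = numpy_ser[numpy_mask]
print(above_median)
```

The result holds the same values as the filtered series but carries no labels, since a NumPy array has no index.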

## 4.6 Categorical Data

When we load data, we can indicate that the data is categorical. Benefits of categorical values:
- Use less memory than strings
- Improve performance
- Can have an ordering
- Can perform operations on categories
- Enforce membership on values
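Two of these benefits, memory use and membership enforcement, can be checked directly. The sketch below assumes a repeated list of sizes as sample data; the names `sizes`, `sizes_cat` and `assignment_rejected` are illustrative. The exact byte counts depend on the pandas version and platform, so none are shown.

```python
import pandas as pd

# Repeat a few labels many times so the saving is visible
sizes = pd.Series(["m", "l", "xs", "s", "xl"] * 1000)
sizes_cat = sizes.astype("category")

# deep=True includes the memory held by the string values themselves
string_bytes = sizes.memory_usage(deep=True)
category_bytes = sizes_cat.memory_usage(deep=True)
print(string_bytes, category_bytes)

# Assigning a value that is not an existing category is rejected
try:
    sizes_cat.iloc[0] = "xxl"
    assignment_rejected = False
except (TypeError, ValueError):
    assignment_rejected = True
print(assignment_rejected)
```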

### Ordering

In [47]:
s = pd.Series(["m", "l", "xs", "s", "xl"], dtype='category')
s

0     m
1     l
2    xs
3     s
4    xl
dtype: category
Categories (5, object): ['l', 'm', 's', 'xl', 'xs']

In [48]:
# check whether the categories have an ordering
s.cat.ordered

False

In [52]:
# convert a non-categorical series to an ordered category;
# values not listed in categories ('xs', 'xl') become NaN
s2 = pd.Series(["m", "l", "xs", "s", "xl"])
size_type = pd.api.types.CategoricalDtype(
    categories=['s', 'm', 'l'], ordered=True)
s3 = s2.astype(size_type)

In [53]:
s3

0      m
1      l
2    NaN
3      s
4    NaN
dtype: category
Categories (3, object): ['s' < 'm' < 'l']

In [54]:
s3.cat.ordered

True
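Once the categories are ordered, comparisons and sorting follow the category order rather than alphabetical order. The sketch below rebuilds ``s3`` so it runs on its own; `larger_than_s` is a name introduced here.

```python
import pandas as pd

s2 = pd.Series(["m", "l", "xs", "s", "xl"])
size_type = pd.api.types.CategoricalDtype(
    categories=['s', 'm', 'l'], ordered=True)
s3 = s2.astype(size_type)

# Comparison uses the declared order s < m < l; missing entries compare as False
larger_than_s = s3 > 's'
print(larger_than_s.tolist())

# min and max respect the category order and skip missing values
print(s3.min(), s3.max())

# Sorting follows the category order, with missing values placed last
print(s3.sort_values().tolist())
```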