# Series

A Series is a one-dimensional labeled array built on top of a NumPy array. It couples a 1D array of data with an array of labels, which together are called the index of the Series. Like a NumPy array, a Series can consist entirely of any one data type.
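
As a quick illustration of the two components, the following sketch builds a small Series and pulls the data and the labels apart. The values and labels are made up for the example.

```python
import numpy as np
import pandas as pd

# A small Series with explicit labels (illustrative values only).
s = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])

values = s.to_numpy()  # the 1D data as a NumPy array
labels = s.index       # the labels, held in a pandas Index

print(type(values).__name__)  # ndarray
print(values.ndim)            # 1
print(labels.tolist())        # ['a', 'b', 'c']
```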

The general construct for creating a series data structure is as follows:

In [1]:
import pandas as pd
import numpy as np

index = np.arange(1,11)
data = np.random.rand(10)
s = pd.Series(data, index=index)


In [3]:
s[1] = 0
data

array([0.        , 0.06614176, 0.25208197, 0.73551032, 0.85779239,
       0.45176961, 0.40112904, 0.99204802, 0.24115998, 0.20428025])

Assigning to the label `1` changes the first element of `s`. In the output above, `data[0]` changes as well, because this Series was built as a view on the NumPy array rather than a copy. Whether the constructor copies its input depends on the pandas version and settings, so pass `copy=True` if the Series and the array must stay independent.

Here, data can be one of the following:

- An array
- Any other iterable
- A Python dictionary
- A scalar value

> **Note:** If an index is not specified, a default index `[0, ..., n-1]` is created, where `n` is the length of the data.
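
A minimal sketch of the default index, assuming a plain Python list as the data (the values are illustrative):

```python
import pandas as pd

# No index argument, so pandas assigns the default labels 0..n-1.
temps = pd.Series([21.5, 19.0, 23.2])

print(temps.index.tolist())  # [0, 1, 2]
print(len(temps))            # 3
```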

In [4]:
from array import array

# iterable: array
v = pd.Series(array('B', [1, 2, 3]))
v.index = [1,2,3]
v

1    1
2    2
3    3
dtype: int64

In [5]:
import calendar as cal

# iterable: list
monthNames = [cal.month_name[i] for i in np.arange(1,6)]
print(monthNames)
months = pd.Series(np.arange(1, 6), index = monthNames)
months

['January', 'February', 'March', 'April', 'May']


January     1
February    2
March       3
April       4
May         5
dtype: int64

When a dictionary is used to create a Series, the keys form the index, and the values form the 1D data of the Series:

In [12]:
currDict = {
    'US' : 'dollar',
    'UK' : 'pound',
    'Germany': 'euro',
    'Mexico':'peso',
    'Nigeria':'naira',
    'China':'yuan',
    'Japan':'yen'
}

x = pd.Series(currDict)
x['UK'::2]

UK        pound
Mexico     peso
China      yuan
dtype: object
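
The slice `x['UK'::2]` above is a label-based slice with a step. One difference from positional slicing is worth keeping in mind: a label-based slice includes its end label, whereas a positional slice excludes its stop position. The sketch below uses a shortened, hypothetical version of the currency dictionary to show this.

```python
import pandas as pd

currencies = pd.Series(
    {"US": "dollar", "UK": "pound", "Germany": "euro", "Mexico": "peso"}
)

by_label = currencies["UK":"Mexico"]  # label slice: both endpoints included
by_position = currencies.iloc[1:3]    # positional slice: stop position excluded

print(by_label.index.tolist())     # ['UK', 'Germany', 'Mexico']
print(by_position.index.tolist())  # ['UK', 'Germany']
```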

The index of a pandas Series is of type `pandas.Index` (exposed as `pandas.core.index.Index` in older releases).

If an index is also specified when creating the Series from a dictionary, the specified index determines the labels and their order, and the values are looked up in the dictionary by key. Any label in the specified index that is not a key in the dictionary is given the value `NaN` in the `Series`:

In [15]:
stock_prices = {
    'GOOG': 1180.97,
    'FB': 62.57,
    'TWTR': 64.50,
    'AMZN': 358.69,
    'AAPL': 500.6
}
stock_price_series = pd.Series(
    stock_prices,
    index = ['GOOG','FB','YHOO','TWTR','AMZN','AAPL'],
    name = 'stockPrices',
)
stock_price_series

GOOG    1180.97
FB        62.57
YHOO        NaN
TWTR      64.50
AMZN     358.69
AAPL     500.60
Name: stockPrices, dtype: float64

Note that a Series also has a name attribute that can be set as shown in the preceding snippet. The name attribute is useful in tasks such as combining Series objects into a DataFrame structure (DataFrame will be introduced later).
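
As a rough sketch of why the name matters, the example below combines two named Series with `pd.concat`; each Series name becomes a column label. The second Series and its numbers are hypothetical and serve only as illustration.

```python
import pandas as pd

prices = pd.Series([1180.97, 62.57], index=["GOOG", "FB"], name="price")
volumes = pd.Series([100, 250], index=["GOOG", "FB"], name="volume")  # made-up numbers

# Each Series name is used as the column label in the combined structure.
frame = pd.concat([prices, volumes], axis=1)

print(frame.columns.tolist())  # ['price', 'volume']
```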

A Series can also be initialized with just a scalar value. For scalar data, an index must be provided, and the value is repeated once for each label in the index. One use of this approach is as a quick way to create a placeholder Series whose values are filled in later.

In [88]:
# scalar
pd.Series(1, index=range(5))

0    1
1    1
2    1
3    1
4    1
dtype: int64
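
The placeholder use case might look like the following sketch, which assumes `NaN` as the initial scalar and fills in two of three labels afterwards:

```python
import numpy as np
import pandas as pd

tickers = ["GOOG", "FB", "TWTR"]

# Every label starts out as NaN; values are assigned later by label.
placeholder = pd.Series(np.nan, index=tickers)
placeholder["GOOG"] = 1180.97
placeholder["FB"] = 62.57

print(placeholder.isna().sum())  # 1 (only TWTR is still missing)
```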

## Operations on Series

The behavior of a Series is very similar to that of NumPy arrays, discussed previously, with one caveat being that an operation such as slicing also slices the index of the series.

### Assignment

Values can be set and accessed using the index label in a dictionary-like manner:

In [90]:
stock_price_series['GOOG']

1180.97

In [17]:
stock_price_series['GOOG'] = 1200.0
stock_price_series

GOOG    1200.00
FB        62.57
YHOO        NaN
TWTR      64.50
AMZN     358.69
AAPL     500.60
Name: stockPrices, dtype: float64

Just as in the case of `dict`, `KeyError` is raised if you try to retrieve a missing label:

In [7]:
stock_price_series['MSFT']

KeyError: 'MSFT'

This error can be avoided by using the `get` method, which returns a default value when the label is missing:

In [8]:
stock_price_series.get('MSFT', np.nan)

nan
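
Two related idioms are sketched below on a small stand-in Series (not the one above): testing for a label with `in`, and calling `get` with and without a default.

```python
import pandas as pd

prices = pd.Series({"GOOG": 1200.0, "FB": 62.57})

print("MSFT" in prices)         # False; `in` checks the index labels, not the values
print(prices.get("MSFT"))       # None when no default is given
print(prices.get("MSFT", 0.0))  # 0.0
print(prices.get("FB", 0.0))    # 62.57
```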

In [97]:
stock_price_series

GOOG    1200.00
FB        62.57
YHOO        NaN
TWTR      64.50
AMZN     358.69
AAPL     500.60
Name: stockPrices, dtype: float64

### Slicing

Slicing a Series behaves the same way as slicing a NumPy array. Slicing can be done using integer positions, as shown in the following code:

In [105]:
stock_price_series[:4]

GOOG    1200.00
FB        62.57
YHOO        NaN
TWTR      64.50
Name: stockPrices, dtype: float64

Values can also be selected with a Boolean condition, which keeps only the entries that satisfy it:

In [19]:
stock_price_series[stock_price_series > 100]

GOOG    1200.00
AMZN     358.69
AAPL     500.60
Name: stockPrices, dtype: float64
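
Boolean selection works by first building a Boolean Series (a mask) and then indexing with it. The sketch below, on a shortened copy of the price data, shows the intermediate mask and how two conditions can be combined with `&`. Note that a comparison against `NaN` evaluates to `False`, so missing values are dropped by the filter.

```python
import numpy as np
import pandas as pd

prices = pd.Series(
    [1200.0, 62.57, np.nan, 64.5],
    index=["GOOG", "FB", "YHOO", "TWTR"],
)

mask = prices > 100                              # Boolean Series, one entry per label
expensive = prices[mask]                         # keeps labels where the mask is True
mid_range = prices[(prices > 60) & (prices < 100)]  # parentheses are required around each condition

print(mask.tolist())             # [True, False, False, False]
print(expensive.index.tolist())  # ['GOOG']
print(mid_range.index.tolist())  # ['FB', 'TWTR']
```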

### Other Operations

Arithmetic and statistical operations can be applied just as they are for a NumPy array. These operations are vectorized in a Series, as they are in NumPy arrays, so no explicit loop is required:

In [20]:
stock_price_series

GOOG    1200.00
FB        62.57
YHOO        NaN
TWTR      64.50
AMZN     358.69
AAPL     500.60
Name: stockPrices, dtype: float64

In [157]:
np.mean(stock_price_series)

437.27200000000005

In [158]:
stock_price_series.mean()

437.27200000000005

In [21]:
np.std(stock_price_series)

417.4446361087899

In [22]:
stock_price_series.std()

466.7172915909588

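The two results differ because `np.std` computes the population standard deviation by default (`ddof=0`), whereas `Series.std` computes the sample standard deviation by default (`ddof=1`). Passing `ddof=1` to `np.std` reproduces the pandas result:

In [11]:
np.std(stock_price_series, ddof=1)

466.7172915909588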
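
The effect of `ddof`, and the way pandas skips missing values in these statistics, can be checked on a small hand-computable example. The numbers below are illustrative: their mean is 5 and their squared deviations sum to 32.

```python
import numpy as np
import pandas as pd

values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_std = values.std(ddof=0)  # divides by n:      sqrt(32 / 8)
sample_std = values.std()            # divides by n - 1:  sqrt(32 / 7)

print(population_std)  # 2.0

# pandas skips NaN by default, while a plain NumPy mean propagates it.
with_gap = pd.Series([1.0, np.nan, 3.0])
print(with_gap.mean())               # 2.0
print(np.mean(with_gap.to_numpy()))  # nan
```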

Elementwise operations can also be performed on a Series:

In [167]:
stock_price_series / stock_price_series

GOOG    1.0
FB      1.0
YHOO    NaN
TWTR    1.0
AMZN    1.0
AAPL    1.0
Name: stockPrices, dtype: float64

An important feature of a Series is that data is automatically aligned based on the label:

In [164]:
stock_price_series[1:]

FB       62.57
YHOO       NaN
TWTR     64.50
AMZN    358.69
AAPL    500.60
Name: stockPrices, dtype: float64

In [165]:
stock_price_series[:-2]

GOOG    1200.00
FB        62.57
YHOO        NaN
TWTR      64.50
Name: stockPrices, dtype: float64

In [166]:
stock_price_series[1:] + stock_price_series[:-2]

AAPL       NaN
AMZN       NaN
FB      125.14
GOOG       NaN
TWTR    129.00
YHOO       NaN
Name: stockPrices, dtype: float64

Thus, `NaN` is inserted for labels that appear in only one of the two operands. By default, the result of an operation on unaligned Series structures takes the union of their indexes. This is preferable because information is preserved rather than lost.
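
When `NaN` for unmatched labels is not the desired outcome, two common options are to supply a fill value for the missing operand or to drop the unmatched labels afterwards. The following sketch uses two small made-up Series whose indexes overlap only partially:

```python
import pandas as pd

left = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
right = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

total = left + right                     # union of labels; unmatched labels give NaN
filled = left.add(right, fill_value=0)   # treat a missing operand as 0 instead
matched = total.dropna()                 # keep only labels present in both

print(total.index.tolist())  # ['a', 'b', 'c', 'd']
print(filled.tolist())       # [1.0, 12.0, 23.0, 30.0]
print(matched.tolist())      # [12.0, 23.0]
```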