In [8]:
import numpy as np
import pandas as pd

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic way to create a Series is to call:

In [9]:
s = pd.Series(data, index=index)

Here data and index are placeholders, so running the line as written raises NameError: name 'data' is not defined.

data can be many different things:

    a Python dict
    an ndarray
    a scalar value

The passed index is a list of axis labels.
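As a minimal sketch of the call above, the placeholders can be filled with concrete values. The values and the variable name example below are made up for illustration and do not come from the notebook:

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for the `data` and `index` placeholders.
data = np.array([10, 20, 30])
index = ['x', 'y', 'z']

example = pd.Series(data, index=index)
print(example)
```

Each element of data is paired with the label at the same position in index.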

From ndarray

If data is an ndarray, index must be the same length as data. If no index is passed, one will be created having values [0, ..., len(data) - 1].

In [None]:
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
s

a    0.564589
b    0.183707
c   -1.070957
d   -0.043442
e   -0.492080
dtype: float64

In [None]:
s.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [None]:
pd.Series(np.random.randn(5))

0   -0.580674
1   -0.711509
2   -0.302701
3    0.314411
4    0.976955
dtype: float64
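The length rule for ndarray input can be checked with fixed values. This is a small sketch with illustrative names (values, default, mismatch_error) that are not part of the notebook:

```python
import numpy as np
import pandas as pd

values = np.arange(3)

# No index passed: pandas builds the default 0..len(data)-1 index.
default = pd.Series(values)
print(list(default.index))

# An index whose length differs from the data is rejected.
mismatch_error = None
try:
    pd.Series(values, index=['a', 'b'])
except ValueError as exc:
    mismatch_error = exc
    print("ValueError:", exc)
```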

Pandas supports non-unique index values. If an operation that does not support duplicate index values is attempted, an exception will be raised at that time.
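A short sketch of this lazy behavior, using reindex as one example of an operation that requires unique labels (the Series dup and the flag reindex_failed are illustrative names):

```python
import pandas as pd

# Constructing a Series with a repeated label is allowed.
dup = pd.Series([1, 2, 3], index=['a', 'a', 'b'])

# Label lookup on a repeated label returns every matching row.
print(dup['a'])

# reindex needs unique labels, so the error surfaces only at this point.
try:
    dup.reindex(['a', 'b'])
    reindex_failed = False
except ValueError as exc:
    reindex_failed = True
    print("ValueError:", exc)
```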

From dict

Series can be created from dicts:

In [None]:
d = {'b': 1, 'a': 0, 'c': 2}
pd.Series(d)

b    1
a    0
c    2
dtype: int64

When data is a dict and no index is passed, the Series index follows the dict's insertion order, provided you are using Python >= 3.6 and pandas >= 0.23. No sorting is applied.
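If an index is passed along with a dict, pandas looks up each label in the dict instead of using insertion order, and labels with no matching key become NaN. A minimal sketch, reusing the dict from above with an illustrative index:

```python
import numpy as np
import pandas as pd

d = {'b': 1, 'a': 0, 'c': 2}

# Labels are looked up in the dict; 'd' has no entry and becomes NaN.
picked = pd.Series(d, index=['b', 'c', 'd', 'a'])
print(picked)
```

Because NaN is a floating point value, the result is stored as float64 rather than int64.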

From scalar value

If data is a scalar value, an index must be provided. The value will be repeated to match the length of the index.


In [10]:
pd.Series(5., index=['a','b','c','d','e'])

a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64

Series is ndarray-like

Series acts very similarly to an ndarray from NumPy and is a valid argument to most NumPy functions. Operations such as slicing will also slice the index.

In [11]:
# select by position explicitly, since s has string labels
s.iloc[0]

0.5645889572756084

In [14]:
s[:3]

a    0.564589
b    0.183707
c   -1.070957
dtype: float64

In [15]:
s[s > s.median()]

a    0.564589
b    0.183707
dtype: float64

In [16]:
# select by position explicitly, since s has string labels
s.iloc[[4, 3, 1]]

e   -0.492080
d   -0.043442
b    0.183707
dtype: float64

In [17]:
np.exp(s)

a    1.758725
b    1.201664
c    0.342681
d    0.957488
e    0.611354
dtype: float64
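Because s holds random numbers, the outputs above change on every run. The same operations can be reproduced on fixed values. This is a sketch with an illustrative Series t, using .iloc for positional access:

```python
import numpy as np
import pandas as pd

t = pd.Series([1.0, 2.0, 3.0, 4.0], index=['a', 'b', 'c', 'd'])

head = t.iloc[:2]          # positional slice keeps labels 'a' and 'b'
above = t[t > t.median()]  # boolean mask keeps labels 'c' and 'd'
picked = t.iloc[[3, 0]]    # positions 3 and 0 give labels 'd' and 'a'
logged = np.log(t)         # NumPy ufunc returns a Series with the same index

print(head, above, picked, logged, sep="\n\n")
```

In every case the labels travel with the values, which is the main difference from slicing a bare ndarray.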

Each series has a dtype.

In [18]:
s.dtype

dtype('float64')

While Series is ndarray-like, if you need an actual ndarray, use Series.to_numpy():

In [19]:
s.to_numpy()

array([ 0.56458896,  0.18370731, -1.07095651, -0.04344204, -0.49207983])

Series is dict-like

A Series is like a fixed-size dict in which you can get and set values by an index label.

In [20]:
s['a']

0.5645889572756084

In [21]:
s['e'] = 12

In [22]:
s

a     0.564589
b     0.183707
c    -1.070957
d    -0.043442
e    12.000000
dtype: float64

In [23]:
'e' in s

True

In [24]:
'f' in s

False

If a label is not contained in the index, a KeyError is raised:

In [25]:
s['f']

KeyError: 'f'
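To avoid the exception, Series.get() can be used in the same way as dict.get(): a missing label returns None or a default you supply. A minimal sketch with an illustrative Series u:

```python
import numpy as np
import pandas as pd

u = pd.Series([1.0, 2.0], index=['a', 'b'])

print(u.get('a'))          # existing label: returns the stored value
print(u.get('f'))          # missing label: returns None instead of raising
print(u.get('f', np.nan))  # missing label with an explicit default
```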

Vectorized operations

When working with raw NumPy arrays, looping through value-by-value is usually not necessary. The same is true when working with Series in Pandas. Series can also be passed into most NumPy methods expecting an ndarray.

In [26]:
s + s

a     1.129178
b     0.367415
c    -2.141913
d    -0.086884
e    24.000000
dtype: float64

In [27]:
s * 2

a     1.129178
b     0.367415
c    -2.141913
d    -0.086884
e    24.000000
dtype: float64

In [28]:
np.exp(s)

a         1.758725
b         1.201664
c         0.342681
d         0.957488
e    162754.791419
dtype: float64

A key difference between Series and ndarray is that operations between Series automatically align data based on the label. Thus, you can write computations without considering whether the Series involved have the same labels.

In [29]:
s1 = s[1:]

In [30]:
s2 = s[:-1]

In [31]:
s1 + s2

a         NaN
b    0.367415
c   -2.141913
d   -0.086884
e         NaN
dtype: float64

The result of an operation between unaligned Series will have the union of the indexes involved. If a label is not found in one Series or the other, the result for that label will be marked as missing (NaN).
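The alignment rule can be sketched with fixed values. The Series left and right are illustrative. The add() method with fill_value and dropna() are two common ways to handle the resulting NaN entries, shown here as options rather than as the notebook's own approach:

```python
import pandas as pd

left = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
right = pd.Series([10.0, 20.0, 30.0], index=['b', 'c', 'd'])

# Union of labels; 'a' and 'd' exist on one side only, so they become NaN.
total = left + right
print(total)

# Treat a label missing from one side as 0 instead of producing NaN.
filled = left.add(right, fill_value=0)
print(filled)

# Or keep only the labels present in both Series.
overlap = total.dropna()
print(overlap)
```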

Name attribute

Series can also have a name attribute.

In [32]:
s = pd.Series(np.random.randn(5), name='something')

In [33]:
s

0   -0.164863
1   -1.097121
2   -2.162601
3    0.789277
4    0.182634
Name: something, dtype: float64

In [34]:
s.name

'something'
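A Series can be given a different name with Series.rename(), which returns a new Series and leaves the original unchanged. A minimal sketch with illustrative variable names:

```python
import pandas as pd

named = pd.Series([1, 2, 3], name='something')

# Passing a scalar to rename() changes the name, not the index labels.
renamed = named.rename('different')

print(named.name)
print(renamed.name)
```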