# **Series**

A Series is a one-dimensional array-like object containing an array of data (of any NumPy data type) and an associated array of data labels, called its index. The simplest Series is formed from only an array of data:

In [23]:
import pandas as pd
from pandas import Series, DataFrame
import numpy as np

In [4]:
obj = Series([4, 7, -9, 3, 1])

obj

0    4
1    7
2   -9
3    3
4    1
dtype: int64

The string representation of a Series displayed interactively shows the index on the left and the values on the right. Since we did not specify an index for the data, a default one consisting of the integers 0 through N - 1 (where N is the length of the data) is created. You can get the array representation and index object of the Series via its values and index attributes, respectively:


In [5]:
obj.values

array([ 4,  7, -9,  3,  1], dtype=int64)

In [6]:
obj.index

RangeIndex(start=0, stop=5, step=1)
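As a small sketch of the same idea, the block below rebuilds `obj` and reads its data and labels back out. It uses `to_numpy()`, a Series method that returns the same NumPy array as `values`; the variable name `arr` is only for illustration.

```python
import numpy as np
import pandas as pd

obj = pd.Series([4, 7, -9, 3, 1])

# The underlying data as a NumPy array (equivalent to obj.values here).
arr = obj.to_numpy()

# The default index runs from 0 to N - 1.
labels = list(obj.index)

print(type(arr).__name__)
print(labels)
```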

Often it will be desirable to create a Series with an index identifying each data point:


In [10]:
obj2 = Series([3, 4, 7, -2, 1, 3], index=['b', 'k', 'c', 'q', 'r', 'q'])

obj2

b    3
k    4
c    7
q   -2
r    1
q    3
dtype: int64

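Unlike a regular NumPy array, a Series lets you use index labels to select a single value or a set of values. Note that the label 'q' appears twice in obj2, so selecting it returns both rows, and that a label-based slice includes its endpoint: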


In [15]:
obj2['q']

q   -2
q    3
dtype: int64

In [17]:
obj2[['k', 'b', 'c']]

k    4
b    3
c    7
dtype: int64

In [20]:
obj2[:'c']

b    3
k    4
c    7
dtype: int64
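The selections above can be compared side by side. This is a minimal sketch that rebuilds `obj2`; it uses `.loc` for the slice to make the label-based lookup explicit, and the names `single`, `repeated`, and `upto_c` are illustrative only.

```python
import pandas as pd

obj2 = pd.Series([3, 4, 7, -2, 1, 3], index=['b', 'k', 'c', 'q', 'r', 'q'])

# A label that occurs once returns a scalar.
single = obj2['k']

# A label that occurs more than once returns a Series with every match.
repeated = obj2['q']

# A label slice includes its endpoint, unlike a positional slice.
upto_c = obj2.loc[:'c']

print(single)
print(list(repeated))
print(list(upto_c.index))
```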

NumPy array operations, such as filtering with a boolean array, scalar multiplication, or applying math functions, will preserve the index-value link:


In [21]:
obj2[obj2 > 0]

b    3
k    4
c    7
r    1
q    3
dtype: int64

In [22]:
obj2 * 2

b     6
k     8
c    14
q    -4
r     2
q     6
dtype: int64

In [24]:
np.exp(obj2)

b      20.085537
k      54.598150
c    1096.633158
q       0.135335
r       2.718282
q      20.085537
dtype: float64
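One way to confirm that the labels travel with the values is to compare the index of each result with the original. The sketch below assumes the same `obj2` as above; the result names are illustrative.

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([3, 4, 7, -2, 1, 3], index=['b', 'k', 'c', 'q', 'r', 'q'])

doubled = obj2 * 2          # scalar multiplication
exped = np.exp(obj2)        # NumPy function applied to a Series
positive = obj2[obj2 > 0]   # boolean filtering

# Element-wise operations keep the full index unchanged.
print(doubled.index.equals(obj2.index))
print(exped.index.equals(obj2.index))

# Filtering keeps only the labels of the rows that passed the test.
print(list(positive.index))
```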

Another way to think about a Series is as a fixed-length, ordered dict, as it is a mapping of index values to data values. It can be substituted into many functions that expect a dict:

In [29]:
'b' in obj2, 'a' in obj2

(True, False)
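A few more dict-style operations are sketched below on a small Series with unique labels. The Series `s` is a hypothetical example, not one of the objects defined above. Note that `in` tests the index labels, not the data values.

```python
import pandas as pd

# Hypothetical Series with unique labels, used only for this illustration.
s = pd.Series([4, 7, -9], index=['a', 'b', 'c'])

has_label = 'b' in s        # membership checks the index
has_value = 7 in s          # 7 is a value, not a label
fallback = s.get('z', 0)    # like dict.get, returns the default if missing
pairs = list(s.items())     # (label, value) pairs
as_dict = s.to_dict()       # back to a plain Python dict

print(has_label, has_value, fallback)
print(pairs)
```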

Should you have data contained in a Python dict, you can create a Series from it by passing the dict:


In [32]:
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}

obj3 = Series(sdata)

obj3

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

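When only a dict is passed, the index of the resulting Series holds the dict's keys. In recent versions of pandas the keys keep the dict's insertion order, as the output above shows (Ohio, Texas, Oregon, Utah); older versions sorted them instead.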
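The key order can be checked directly. This sketch assumes a recent pandas (0.23 or later) running on Python 3.7 or later, where dicts preserve insertion order; `sort_index()` is one way to get sorted labels when that is what you want.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}

# Assumption: recent pandas, so the index follows the dict's insertion order.
by_insertion = pd.Series(sdata)

# Sort explicitly if alphabetical labels are needed.
by_label = by_insertion.sort_index()

print(list(by_insertion.index))
print(list(by_label.index))
```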

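You can also pass an index together with the dict. The values are then matched to the index labels you supply:

In [37]: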
states = ['California', 'Ohio', 'Oregon', 'Texas', 'Loralai', 'Quetta']

obj4 = Series(sdata, index=states)

obj4

California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
Loralai           NaN
Quetta            NaN
dtype: float64

In this case, the three values found in sdata were placed in the appropriate locations. Since no values were found for 'California', 'Loralai', and 'Quetta', they appear as NaN (not a number), which pandas uses to mark missing or NA values. Because NaN is a floating-point value, the dtype of the result changes to float64. 'Utah' is excluded because it is not in states. We use the terms "missing" and "NA" to refer to missing data. The isna and notna functions in pandas (also available under the older names isnull and notnull) should be used to detect missing data:


In [42]:
pd.isna(obj4), pd.notnull(obj4)  # isnull() is an alias of isna(); notnull() is an alias of notna()

(California     True
 Ohio          False
 Oregon        False
 Texas         False
 Loralai        True
 Quetta         True
 dtype: bool,
 California    False
 Ohio           True
 Oregon         True
 Texas          True
 Loralai       False
 Quetta        False
 dtype: bool)
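The same checks are available as Series methods, which is often convenient for counting or discarding missing entries. The following is a minimal sketch that rebuilds `obj4`; the names `n_missing`, `present`, and `filled` are illustrative.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas', 'Loralai', 'Quetta']
obj4 = pd.Series(sdata, index=states)

# Count the missing entries (True counts as 1 when summed).
n_missing = int(obj4.isna().sum())

# Keep only the rows that have a value.
present = obj4.dropna()

# Or replace the missing entries with a chosen value.
filled = obj4.fillna(0)

print(n_missing)
print(list(present.index))
```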

A critical Series feature for many applications is that it automatically aligns differently-indexed data in arithmetic operations:


In [43]:
obj3, obj4

(Ohio      35000
 Texas     71000
 Oregon    16000
 Utah       5000
 dtype: int64,
 California        NaN
 Ohio          35000.0
 Oregon        16000.0
 Texas         71000.0
 Loralai           NaN
 Quetta            NaN
 dtype: float64)

In [44]:
obj3 + obj4

California         NaN
Loralai            NaN
Ohio           70000.0
Oregon         32000.0
Quetta             NaN
Texas         142000.0
Utah               NaN
dtype: float64
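In the sum above, a label present in only one operand (such as Utah) yields NaN. If you would rather treat such a label as zero, the `add` method accepts a `fill_value` argument. The sketch below rebuilds both objects; a label that is missing on both sides (such as California) still comes out as NaN.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas', 'Loralai', 'Quetta']
obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=states)

# Plain addition: any label missing from one side gives NaN.
total = obj3 + obj4

# With fill_value=0, a value missing on one side is treated as 0.
filled_total = obj3.add(obj4, fill_value=0)

print(total['Utah'], filled_total['Utah'])
print(filled_total['California'])
```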

Both the Series object itself and its index have a name attribute, which other parts of pandas make use of. For example, the Series name becomes the column label when the Series is converted to a DataFrame:


In [48]:
obj4.name = 'people'

obj4.index.name = 'state'

obj4

state
California        NaN
Ohio          35000.0
Oregon        16000.0
Texas         71000.0
Loralai           NaN
Quetta            NaN
Name: people, dtype: float64
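To see where the names end up, the sketch below converts a named Series to a DataFrame in two ways. It rebuilds `obj4` with the same names as above; `frame` and `flat` are illustrative variable names.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas', 'Loralai', 'Quetta']
obj4 = pd.Series(sdata, index=states, name='people')
obj4.index.name = 'state'

# to_frame() uses the Series name as the column label and keeps the index name.
frame = obj4.to_frame()

# reset_index() turns the named index into an ordinary column.
flat = obj4.reset_index()

print(list(frame.columns), frame.index.name)
print(list(flat.columns))
```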

A Series's index can be altered in place by assignment, provided the new index has the same length as the data:

In [53]:
obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan', 'Ali']

obj

Bob      4
Steve    7
Jeff    -9
Ryan     3
Ali      1
dtype: int64
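Two related details are sketched below, starting from a fresh copy of the original `obj`: assigning an index of the wrong length raises a `ValueError`, and `rename` returns a relabeled copy instead of modifying the Series in place. The flag `mismatch_raised` and the name `renamed` are illustrative.

```python
import pandas as pd

obj = pd.Series([4, 7, -9, 3, 1])

# Assigning too few (or too many) labels is rejected.
try:
    obj.index = ['Bob', 'Steve']
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# rename() maps old labels to new ones and leaves obj itself untouched.
renamed = obj.rename({0: 'Bob', 1: 'Steve'})

print(mismatch_raised)
print(list(renamed.index))
print(list(obj.index))
```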