# Pandas Series: Walkthrough

In [1]:
import numpy as np
import pandas as pd

`Series` is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.

In [3]:
# general form is pd.Series(data, index=index); placeholder values are used here
s = pd.Series(['data'], index=['index'])

`data` can be different things:
- a Python dict
- an ndarray
- a scalar value

The passed `index` is a list of axis labels.

### From `ndarray`
If `data` is an `ndarray`, the index must be the same length as the data. If no index is passed, one will be created having values `[0, ..., len(data) - 1]`.

In [4]:
# specify the index
s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
s

a   -2.035138
b   -0.057130
c   -0.124906
d    1.067390
e   -1.470917
dtype: float64

In [5]:
s.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [7]:
# Pandas can also create a default index
pd.Series(np.random.rand(5))

0    0.505430
1    0.649689
2    0.141526
3    0.328998
4    0.425865
dtype: float64
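
The length requirement described above can be checked directly. The following is a minimal sketch, not part of the original notebook; the names `values`, `ok`, and `mismatch_raised` are chosen for illustration only. It assumes a reasonably recent pandas, where a length mismatch surfaces as a `ValueError`.

```python
import numpy as np
import pandas as pd

values = np.arange(3)

# Matching lengths: three values, three labels.
ok = pd.Series(values, index=['a', 'b', 'c'])

# Mismatched lengths: three values, two labels.
try:
    pd.Series(values, index=['a', 'b'])
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print(f'ValueError: {exc}')
```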

### From `dict`
A `Series` can be created from a `dict`.

In [8]:
d = {'b': 1, 'a': 0, 'c': 2}
pd.Series(d)

b    1
a    0
c    2
dtype: int64

When data is a `dict` and an `index` is not passed, the `Series` index follows the `dict`'s insertion order, with no sorting. This holds for `Python` version >= 3.6 and `Pandas` version >= 0.23.
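
For contrast, here is a small sketch of what happens when an `index` is passed together with a `dict`. This case is not covered in the notebook above; the variable names `by_insertion` and `by_label` are illustrative. Values are looked up by label, and labels missing from the `dict` come out as `NaN`.

```python
import pandas as pd

d = {'b': 1, 'a': 0, 'c': 2}

# No index: labels follow the dict's insertion order.
by_insertion = pd.Series(d)

# Explicit index: values are pulled out by label, in the order given.
# 'd' is not a key of the dict, so its value is NaN and the dtype becomes float64.
by_label = pd.Series(d, index=['b', 'c', 'd', 'a'])
print(by_label)
```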

### From `scalar` value
If data is a `scalar` value, an `index` must be provided. The value will be repeated to match the length of the `index`.

In [9]:
pd.Series(5, index=['a', 'b', 'c', 'd', 'e'])

a    5
b    5
c    5
d    5
e    5
dtype: int64

## Series is `ndarray`-like
`Series` acts very similarly to `ndarray` from `numpy` and is a valid argument to most `numpy` functions. Operations such as slicing will also slice the index.

In [10]:
s.iloc[0]  # positional access; plain s[0] is deprecated on a label-indexed Series

-2.035137579647807

In [11]:
s[:3]

a   -2.035138
b   -0.057130
c   -0.124906
dtype: float64

In [12]:
s[s > s.median()]

b   -0.05713
d    1.06739
dtype: float64

In [13]:
s.iloc[[4, 3, 1]]  # select by position, in the order given

e   -1.470917
d    1.067390
b   -0.057130
dtype: float64

In [14]:
np.exp(s)

a    0.130663
b    0.944471
c    0.882580
d    2.907780
e    0.229715
dtype: float64

In [15]:
s.dtype

dtype('float64')

In [16]:
s.to_numpy()

array([-2.03513758, -0.05713035, -0.12490551,  1.06738982, -1.47091712])

## Series is `dict`-like
A `Series` is like a fixed-size `dict` in which you can get and set values by an `index` label.

In [17]:
s['a']

-2.035137579647807

In [20]:
s['e'] = 12
s

a    -2.035138
b    -0.057130
c    -0.124906
d     1.067390
e    12.000000
dtype: float64

In [21]:
'e' in s

True

In [22]:
'f' in s

False

In [28]:
try:
    s['f']
except KeyError:
    # looking up a label that is not in the index raises KeyError
    print("KeyError: 'f'")

KeyError: 'f'
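
As an alternative to catching the exception, `Series.get` returns `None` or a supplied default for a missing label, much like `dict.get`. The sketch below uses a small stand-in `Series` rather than the random one above, so the values are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])

missing = s.get('f')            # no such label: returns None
fallback = s.get('f', np.nan)   # no such label: returns the supplied default
present = s.get('a')            # existing label: returns the stored value
```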


## Vectorized operations
When working with raw `numpy` arrays, looping through value-by-value is usually not necessary. The same is true when working with `Series` in `Pandas`. `Series` can also be passed into most `numpy` methods expecting an `ndarray`.

In [29]:
s + s

a    -4.070275
b    -0.114261
c    -0.249811
d     2.134780
e    24.000000
dtype: float64

In [31]:
s * 2

a    -4.070275
b    -0.114261
c    -0.249811
d     2.134780
e    24.000000
dtype: float64

In [32]:
np.exp(s)

a         0.130663
b         0.944471
c         0.882580
d         2.907780
e    162754.791419
dtype: float64

A key difference between `Series` and `ndarray` is that operations between `Series` automatically align data based on the label. Thus, you can write computations without considering whether the `Series` involved have the same labels.

In [33]:
s1 = s[1:]
s2 = s[:-1]
s1 + s2

a         NaN
b   -0.114261
c   -0.249811
d    2.134780
e         NaN
dtype: float64

The result of an operation between unaligned `Series` will have the union of the indexes involved. If a label is not found in one `Series` or the other, the result for that label will be marked as missing (`NaN`).
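
When the `NaN` entries produced by alignment are not wanted, two common options are to treat a missing operand as a fill value or to drop the missing labels afterwards. The sketch below uses fixed stand-in values (not the random data above) so the results are easy to follow; `aligned`, `filled`, and `dropped` are illustrative names.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=['a', 'b', 'c', 'd'])
s1 = s.iloc[1:]    # labels b, c, d
s2 = s.iloc[:-1]   # labels a, b, c

# Plain addition: labels present on only one side ('a' and 'd') become NaN.
aligned = s1 + s2

# Treat a missing operand as 0 instead of propagating NaN.
filled = s1.add(s2, fill_value=0)

# Or keep only the labels present in both operands.
dropped = aligned.dropna()
```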

## Name attribute
`Series` can also have a `name` attribute.

In [34]:
s = pd.Series(np.random.randn(5), name='something')
s

0   -0.946825
1    1.216063
2    0.123337
3    0.878028
4    0.984506
Name: something, dtype: float64

In [35]:
s.name

'something'
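
A `Series` can be given a different name with `Series.rename`, which returns a new object and leaves the original name untouched. This is a short sketch with stand-in values; `s2` is an illustrative name.

```python
import pandas as pd

s = pd.Series([1, 2, 3], name='something')

# rename with a scalar argument sets the name on a new Series
s2 = s.rename('different')

print(s.name)   # original name is unchanged
print(s2.name)
```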