# Working with Pandas Series

## Creating Series

In [63]:
import numpy as np
import pandas as pd

### From array

#### `index`: must be the same length as the `ndarray`/`list`

In [64]:
a = pd.Series([1, 3, 5, 7], index=['a', 'c', 'd', 'b'])
a

a    1
c    3
d    5
b    7
dtype: int64
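A quick sketch of what happens when the lengths disagree. The name `mismatch_error` is only illustrative.

```python
import pandas as pd

mismatch_error = None

# Three values but only two labels: pandas refuses to build the Series
try:
    pd.Series([1, 3, 5], index=['a', 'b'])
except ValueError as err:
    mismatch_error = err
    print(repr(err))
```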

#### No `index`: default integer index starting at 0

In [65]:
b = pd.Series([1, 3, 5, np.nan, 6, 8])
b

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
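The `float64` dtype above comes from `NaN`, which NumPy stores as a float. A small comparison, using illustrative names `ints` and `with_nan`:

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 3, 5])                # no missing values
with_nan = pd.Series([1, 3, 5, np.nan])    # NaN forces a float dtype

print(ints.dtype, with_nan.dtype)          # int64 float64
print(ints.index)                          # RangeIndex(start=0, stop=3, step=1)
```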

### From dict

#### No `index`: use keys

In [66]:
c = pd.Series({"b": 1, "a": 0, "c": 2})
c

b    1
a    0
c    2
dtype: int64
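Note that the labels keep the dict's insertion order rather than being sorted (this assumes Python 3.7+ and a recent pandas). A short check, with `from_dict` as an illustrative name:

```python
import pandas as pd

from_dict = pd.Series({"b": 1, "a": 0, "c": 2})

# Labels follow the dict's insertion order
print(from_dict.index.tolist())                # ['b', 'a', 'c']

# Sorting by label is a separate, explicit step
print(from_dict.sort_index().index.tolist())   # ['a', 'b', 'c']
```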

#### `index`: pull matching values from the dict; labels without a key become `NaN`

In [67]:
d = pd.Series({"b": 1, "a": 0, "c": 2}, index=["b", "d", "c"])
d

b    1.0
d    NaN
c    2.0
dtype: float64
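Two side effects are visible above: the key `"a"` is dropped because it is not in `index`, and the missing `"d"` becomes `NaN`, which upcasts the integers to `float64`. A sketch that checks both, with `subset` as an illustrative name:

```python
import pandas as pd

subset = pd.Series({"b": 1, "a": 0, "c": 2}, index=["b", "d", "c"])

print("a" in subset)             # False: the key was not listed in index
print(subset.isna().tolist())    # [False, True, False]: 'd' has no value
print(subset.dropna())           # keeps only the labels that were found
```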

### From scalar

#### `index`: repeat the scalar for every label

In [68]:
e = pd.Series(2, index=["a", "b", "c"])
e

a    2
b    2
c    2
dtype: int64
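It is the `index` that makes the scalar repeat. Without one, pandas still builds a `Series`, but with a single element at position 0. A sketch, with `repeated` and `single` as illustrative names:

```python
import pandas as pd

# With an index, the scalar is broadcast to every label
repeated = pd.Series(2, index=["a", "b", "c"])
print(len(repeated))                            # 3

# Without an index, a scalar gives a one-element Series
single = pd.Series(2)
print(single.index.tolist(), single.tolist())   # [0] [2]
```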

## Working with Series

### Series is ndarray-like

`Series` behaves much like an `ndarray` and is a valid argument to most NumPy functions.
Slicing and other selections also select the matching index labels.

In [69]:
f = pd.Series(range(5), index=['a', 'b', 'c', 'd', 'e'])
f

a    0
b    1
c    2
d    3
e    4
dtype: int64

In [70]:
f.iloc[0]

np.int64(0)

In [71]:
f[:2]

a    0
b    1
dtype: int64

In [72]:
f > f.median()

a    False
b    False
c    False
d     True
e     True
dtype: bool

In [73]:
f[f > f.median()]

d    3
e    4
dtype: int64
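Boolean masks can be combined. A minimal sketch on a fresh copy of the same data (`s` and `mask` are illustrative names): use `&` (and), `|` (or) and `~` (not), with parentheses around each comparison because `&` and `|` bind more tightly than `>` and `<`.

```python
import pandas as pd

s = pd.Series(range(5), index=['a', 'b', 'c', 'd', 'e'])

# Parentheses are required around each comparison
mask = (s > 0) & (s < 3)
print(s[mask])    # keeps labels 'b' and 'c'
print(s[~mask])   # everything else
```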

In [74]:
f.iloc[[4, 3, 1]]

e    4
d    3
b    1
dtype: int64

In [75]:
np.exp(f)

a     1.000000
b     2.718282
c     7.389056
d    20.085537
e    54.598150
dtype: float64
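One difference from NumPy is worth knowing: positional slices with `iloc` exclude the stop position, while label slices with `loc` include the stop label. A sketch on a fresh copy of the data (`s`, `by_position`, `by_label`, `arr` are illustrative names), also showing `to_numpy()` for dropping the labels:

```python
import numpy as np
import pandas as pd

s = pd.Series(range(5), index=['a', 'b', 'c', 'd', 'e'])

by_position = s.iloc[0:2]     # positions 0 and 1: labels a, b
by_label = s.loc['a':'c']     # the stop label is included: a, b, c

# Drop the labels to get a plain NumPy array
arr = s.to_numpy()
print(type(arr), arr)         # <class 'numpy.ndarray'> [0 1 2 3 4]
```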

### Series is dict-like

You can get and set values by index label.

In [76]:
f['a']

np.int64(0)

In [77]:
f['a'] = 100
f

a    100
b      1
c      2
d      3
e      4
dtype: int64

In [78]:
try:
    f['f']
except KeyError as e:
    print(repr(e))

KeyError('f')


In [79]:
f.get('f', np.nan)

nan

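Labels that are valid Python identifiers can also be read as attributes, unless they clash with an existing method or attribute name.

In [80]: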
f.a

np.int64(100)
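The dict analogy goes further: `in` tests labels (like dict keys), assigning to a new label enlarges the `Series`, and `items()` yields `(label, value)` pairs. A sketch on a separate Series `s` (an illustrative name) so `f` is left unchanged:

```python
import pandas as pd

s = pd.Series(range(5), index=['a', 'b', 'c', 'd', 'e'])

# `in` checks the labels, not the values
print('a' in s)   # True
print(0 in s)     # False: 0 is a value here, not a label

# Assigning to a missing label adds it
s['f'] = 5
print(s.index.tolist())   # ['a', 'b', 'c', 'd', 'e', 'f']

# (label, value) pairs, like dict.items()
pairs = list(s.items())
```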

### Vector operations

Arithmetic on a `Series` is vectorized, just as with an `ndarray`, so there is no need to loop over the values.

In [81]:
f + f

a    200
b      2
c      4
d      6
e      8
dtype: int64

In [82]:
f * 3 + [0, 0.25, 0.5, 0.75, 1] - 1

a    299.00
b      2.25
c      5.50
d      8.75
e     12.00
dtype: float64

When operating with a non-`Series` sequence such as a list, its length must match the length of the `Series`.
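A sketch of both cases (`s` and `length_error` are illustrative names). A list of the right length is applied element-wise; a shorter one raises a `ValueError`.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

print((s + [10, 20, 30]).tolist())   # [11, 22, 33]

length_error = None
try:
    s + [10, 20]                     # two values for three elements
except ValueError as err:
    length_error = err
    print(repr(err))
```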

### Label alignment

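Operations between `Series` automatically align the data by label. The result's index is the union of both indexes, and labels present in only one operand get `NaN`.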

In [83]:
g = pd.Series(.5, index=['b', 'c', 'e', 'f'], name='g')
f + g

a    NaN
b    1.5
c    2.5
d    NaN
e    4.5
f    NaN
dtype: float64
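If a missing label should count as 0 rather than produce `NaN`, the method form `add` accepts a `fill_value`. The result is still `NaN` where a label is missing from both sides. A sketch that rebuilds `f` and `g` under the illustrative names `left` and `right`:

```python
import pandas as pd

left = pd.Series([100, 1, 2, 3, 4], index=['a', 'b', 'c', 'd', 'e'])
right = pd.Series(.5, index=['b', 'c', 'e', 'f'], name='g')

# A label missing from one operand is treated as 0
total = left.add(right, fill_value=0)
print(total)
```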

## `name` attribute

You can label any `Series` with a `name`.

In [84]:
h = pd.Series(np.random.randn(5), name="random series")
h

0   -0.087819
1   -0.130121
2    0.802910
3    0.374418
4   -1.616428
Name: random series, dtype: float64

In [85]:
h.name

'random series'

You can change the name of a `Series` by setting the `name` attribute.

In [86]:
h.name = 'Series 1'
h

0   -0.087819
1   -0.130121
2    0.802910
3    0.374418
4   -1.616428
Name: Series 1, dtype: float64
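Alternatively, `rename` with a scalar argument returns a new `Series` with that name and leaves the original unchanged. A sketch with illustrative names `s` and `renamed`:

```python
import pandas as pd

s = pd.Series([1, 2, 3], name='before')

# A scalar passed to rename() sets the name on a new Series
renamed = s.rename('after')
print(s.name, renamed.name)   # before after
```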

In a `DataFrame`, the `Series` name will be the column label.
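A short sketch of that naming rule, using made-up `height` and `weight` Series. The variable names and data are illustrative.

```python
import pandas as pd

heights = pd.Series([1.62, 1.75], index=['ann', 'bob'], name='height')
weights = pd.Series([55, 72], index=['ann', 'bob'], name='weight')

# One Series becomes a one-column DataFrame named after the Series
print(heights.to_frame().columns.tolist())   # ['height']

# Side-by-side concatenation uses each name as a column label
df = pd.concat([heights, weights], axis=1)
print(df.columns.tolist())                   # ['height', 'weight']
```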

## Methods

You can chain methods that return a `Series`.
Common arithmetic and comparison operators also exist as methods, such as `add`, `sub`, `mul`, `div`, `gt` and `lt`.

In [87]:
i = h.mul(10).round(1)
i

0    -0.9
1    -1.3
2     8.0
3     3.7
4   -16.2
Name: Series 1, dtype: float64
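Because `h` is random, the sketch below hard-codes the rounded values shown above into an illustrative `vals`. It shows a comparison method matching its operator, the inclusive-by-default `between`, and an aggregation chained onto a transformation.

```python
import pandas as pd

vals = pd.Series([-0.9, -1.3, 8.0, 3.7, -16.2], name='Series 1')

# gt() is the method form of >
print(vals.gt(0).equals(vals > 0))       # True

# between() includes both endpoints by default
print(vals.between(-1, 4).tolist())      # [True, False, False, True, False]

# Transform, then aggregate
print(vals.abs().max())                  # 16.2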

You can apply a custom function and do aggregating/grouping.

In [88]:
i.apply(lambda x: x ** 2).groupby(i < 0).mean()

Series 1
False    38.845000
True     88.313333
Name: Series 1, dtype: float64

You can sort by index or values

In [89]:
i = i.sort_values()
i

4   -16.2
1    -1.3
0    -0.9
3     3.7
2     8.0
Name: Series 1, dtype: float64

In [90]:
i.rank()

4    1.0
1    2.0
0    3.0
3    4.0
2    5.0
Name: Series 1, dtype: float64

And there's so much more you can do.

## Accessors

You can also access a lot of type-specific methods for strings, datetimes and some Pandas-specific dtypes.

In [91]:
j = pd.Series(['Yes', 'no', 'y', '', 'NO'], name='votes')
j

0    Yes
1     no
2      y
3       
4     NO
Name: votes, dtype: object

In [92]:
j = j.str.upper()
j

0    YES
1     NO
2      Y
3       
4     NO
Name: votes, dtype: object

In [93]:
j[j.str.contains('Y')] = 'Yes'
j[j.str.contains('N')] = 'No'
j

0    Yes
1     No
2    Yes
3       
4     No
Name: votes, dtype: object

In [94]:
j[~j.isin(['Yes', 'No'])] = 'Unknown'
j

0        Yes
1         No
2        Yes
3    Unknown
4         No
Name: votes, dtype: object