# Working with Pandas Series

## Creating Series

In [1]:
import numpy as np
import pandas as pd

### From array

#### `index`: must be the same length as the `ndarray`/`list`

In [7]:
a = pd.Series([1, 3, 5, 7], index=['a', 'c', 'd', 'b'])
a

a    1
c    3
d    5
b    7
dtype: int64
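If the lengths differ, pandas refuses to build the `Series`. A minimal check, assuming a recent pandas version. The exact error message varies between releases, so only the exception type is relied on here:

```python
import pandas as pd

# Three values but only two labels: construction fails with ValueError
try:
    pd.Series([1, 3, 5], index=['a', 'b'])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print(err)  # e.g. "Length of values (3) does not match length of index (2)"

# With matching lengths the labels keep the order they were given in
ok = pd.Series([1, 3, 5], index=['a', 'c', 'b'])
print(ok.index.tolist())  # ['a', 'c', 'b']
```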

#### No `index`: default integer index starting at 0

In [8]:
b = pd.Series([1, 3, 5, np.nan, 6, 8])
b

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
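Two things happen implicitly here: the default labels form a `RangeIndex`, and the `NaN` upcasts the integers to `float64`, because a NumPy integer array cannot hold `NaN`. A quick check:

```python
import numpy as np
import pandas as pd

b = pd.Series([1, 3, 5, np.nan, 6, 8])

# Default labels are a RangeIndex: 0, 1, ..., len(b) - 1
print(isinstance(b.index, pd.RangeIndex), list(b.index))

# NaN is a float, so the whole Series becomes float64
print(b.dtype)

# Without the NaN the same values stay integers
no_nan = pd.Series([1, 3, 5, 6, 8])
print(no_nan.dtype)
```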

### From dict

#### No `index`: use keys

In [9]:
c = pd.Series({"b": 1, "a": 0, "c": 2})
c

b    1
a    0
c    2
dtype: int64
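The keys become the labels in insertion order (this holds for pandas 0.23+ on Python 3.6+; older versions sorted the keys). Labels and values can be read back separately, and `sort_index` gives alphabetical order if that is what you want:

```python
import pandas as pd

c = pd.Series({"b": 1, "a": 0, "c": 2})

print(c.index.tolist())  # ['b', 'a', 'c']: insertion order, not sorted
print(c.tolist())        # [1, 0, 2]

# Explicitly sort the labels when alphabetical order is needed
print(c.sort_index().index.tolist())  # ['a', 'b', 'c']
```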

#### `index`: pull corresponding values from data

In [10]:
d = pd.Series({"b": 1, "a": 0, "c": 2}, index=["b", "d", "c"])
d

b    1.0
d    NaN
c    2.0
dtype: float64
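Only the labels in `index` are kept: the key `"a"` is dropped, and the label `"d"`, which has no matching key, becomes `NaN`. `isna()` is a convenient way to find such gaps:

```python
import pandas as pd

d = pd.Series({"b": 1, "a": 0, "c": 2}, index=["b", "d", "c"])

# Labels that had no matching key are NaN
missing = d[d.isna()].index.tolist()
print(missing)  # ['d']

# Keys not listed in index ("a") are silently dropped
print("a" in d.index)  # False
```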

### From scalar

#### `index` (required): repeat the scalar value for every label

In [11]:
e = pd.Series(2, index=["a", "b", "c"])
e

a    2
b    2
c    2
dtype: int64
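The scalar is simply broadcast to every label, so the result matches spelling the repeated values out by hand:

```python
import pandas as pd

labels = ["a", "b", "c"]
e = pd.Series(2, index=labels)

# Equivalent to repeating the scalar once per label
explicit = pd.Series([2] * len(labels), index=labels)
print(e.equals(explicit))  # True
```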

## Working with Series

### Series is ndarray-like

`Series` acts similarly to an `ndarray` and is a valid argument to most NumPy functions.
Operations such as slicing also slice the index.

In [12]:
f = pd.Series(range(5), index=['a', 'b', 'c', 'd', 'e'])
f

a    0
b    1
c    2
d    3
e    4
dtype: int64

In [14]:
f.iloc[0]

0

In [15]:
f[:2]

a    0
b    1
dtype: int64

In [17]:
f > f.median()

a    False
b    False
c    False
d     True
e     True
dtype: bool

In [18]:
f[f > f.median()]

d    3
e    4
dtype: int64

In [20]:
f.iloc[[4, 3, 1]]  # select by position; plain f[[4, 3, 1]] is deprecated for a label index

e    4
d    3
b    1
dtype: int64

In [21]:
np.exp(f)

a     1.000000
b     2.718282
c     7.389056
d    20.085537
e    54.598150
dtype: float64
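Because `f` has string labels, it helps to be explicit about positions versus labels: `.iloc` always takes positions, `.loc` always takes labels, and `to_numpy()` drops the labels entirely. A short sketch:

```python
import pandas as pd

f = pd.Series(range(5), index=['a', 'b', 'c', 'd', 'e'])

# .iloc selects by position, .loc by label; both pick the same rows here
by_position = f.iloc[[4, 3, 1]]
by_label = f.loc[['e', 'd', 'b']]
print(by_position.equals(by_label))  # True

# Positional slices exclude the end, label slices include it
print(f.iloc[1:3].index.tolist())    # ['b', 'c']
print(f.loc['b':'d'].tolist())       # [1, 2, 3]

# to_numpy() returns a plain ndarray without the labels
arr = f.to_numpy()
print(arr.tolist())                  # [0, 1, 2, 3, 4]
```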

### Series is dict-like

You can get and set values by index label.

In [22]:
f['a']

0

In [23]:
f['a'] = 100
f

a    100
b      1
c      2
d      3
e      4
dtype: int64

In [24]:
try:
    f['f']
except KeyError as err:  # avoid `e`, which would delete the Series `e` defined earlier
    print(repr(err))

KeyError('f')


In [25]:
f.get('f', np.nan)

nan

Labels that are valid Python identifiers can also be read as attributes.

In [29]:
f.a

100
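Membership tests follow the dict analogy too: `in` checks the index labels, not the values. To search the values, test against `to_numpy()` or use `isin`:

```python
import pandas as pd

f = pd.Series([100, 1, 2, 3, 4], index=['a', 'b', 'c', 'd', 'e'])

print('a' in f)             # True: 'a' is a label
print(100 in f)             # False: 100 is a value, not a label
print(100 in f.to_numpy())  # True: searches the values
print(f.isin([100]).any())  # True
```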


### Vector operations

Arithmetic on a `Series` is vectorized, just as it is with an `ndarray`.

In [30]:
f + f

a    200
b      2
c      4
d      6
e      8
dtype: int64

In [31]:
f * 3 + [0, 0.25, 0.5, 0.75, 1] - 1

a    299.00
b      2.25
c      5.50
d      8.75
e     12.00
dtype: float64

When operating with a non-`Series` sequence such as a list, there are no labels to align on, so its length must match the length of the `Series`.
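For example, adding a plain list works element by element when the lengths agree. When they do not, pandas raises a `ValueError`; the message differs between pandas versions, so this sketch checks only the exception type:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['x', 'y', 'z'])

# Same length: element-wise, and the labels of s are kept
print((s + [10, 20, 30]).tolist())  # [11, 22, 33]

# Different length: a plain list cannot be aligned
try:
    s + [10, 20]
    length_error = False
except ValueError:
    length_error = True
print(length_error)  # True
```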

### Label alignment

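Operations between `Series` automatically align the data by index label. The result's index is the union of both indexes, and labels missing from either side get `NaN`.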

In [32]:
g = pd.Series(.5, index=['b', 'c', 'e', 'f'], name='g')
f + g

a    NaN
b    1.5
c    2.5
d    NaN
e    4.5
f    NaN
dtype: float64
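If a label missing from one side should count as zero instead of producing `NaN`, the method form `add` accepts a `fill_value`. A sketch that rebuilds `f` and `g` with the values used above:

```python
import pandas as pd

f = pd.Series([100, 1, 2, 3, 4], index=['a', 'b', 'c', 'd', 'e'])
g = pd.Series(.5, index=['b', 'c', 'e', 'f'], name='g')

# fill_value substitutes 0 for a label present on only one side
total = f.add(g, fill_value=0)
print(total.to_dict())
# {'a': 100.0, 'b': 1.5, 'c': 2.5, 'd': 3.0, 'e': 4.5, 'f': 0.5}
```

Note that `fill_value` only applies when exactly one side is missing; a label absent from both would still give `NaN`.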

## `name` attribute

You can give any `Series` a `name`.

In [33]:
h = pd.Series(np.random.randn(5), name="random series")
h

0   -1.992051
1    1.220770
2    0.047314
3    1.492991
4    0.032169
Name: random series, dtype: float64

In [34]:
h.name

'random series'

You can change the name of a `Series` by setting the `name` attribute.

In [35]:
h.name = 'Series 1'
h

0   -1.992051
1    1.220770
2    0.047314
3    1.492991
4    0.032169
Name: Series 1, dtype: float64

In a `DataFrame`, the `Series` name will be the column label.
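For example, turning a named `Series` into a one-column `DataFrame`, or combining several with `pd.concat`, uses each name as the column label. `rename` with a scalar argument returns a renamed copy and leaves the original untouched:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], name='Series 1')
s2 = s1.rename('Series 2')  # copy of s1 with a different name

print(s1.to_frame().columns.tolist())                # ['Series 1']
print(pd.concat([s1, s2], axis=1).columns.tolist())  # ['Series 1', 'Series 2']
print(s1.name)                                       # Series 1 (unchanged)
```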

## Methods

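Most methods return a new `Series`, so calls can be chained.
There are method versions of the common arithmetic and comparison operators (such as `add`, `mul` and `gt`), plus rounding and statistics methods.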

In [36]:
i = h.mul(10).round(1)
i

0   -19.9
1    12.2
2     0.5
3    14.9
4     0.3
Name: Series 1, dtype: float64
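The method forms mirror the operators (`mul` is `*`, `gt` is `>`), which keeps chains readable. A small sketch that hard-codes the values of `h` shown above and counts the positive ones:

```python
import pandas as pd

h = pd.Series([-1.992051, 1.220770, 0.047314, 1.492991, 0.032169])

# A method call gives the same result as its operator...
print(h.mul(10).equals(h * 10))  # True

# ...and can be chained: how many values are greater than 0?
n_positive = h.gt(0).sum()
print(n_positive)  # 4
```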

You can apply a custom function element-wise with `apply`, and aggregate by group with `groupby`.

In [37]:
i.apply(lambda x: x ** 2).groupby(i < 0).mean()

Series 1
False     92.7975
True     396.0100
Name: Series 1, dtype: float64
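`apply` calls the lambda once per element. For plain arithmetic the vectorized operator gives the same numbers (up to floating-point rounding) and is usually faster; the grouping step is unchanged. A sketch with the rounded values of `i` hard-coded:

```python
import numpy as np
import pandas as pd

i = pd.Series([-19.9, 12.2, 0.5, 14.9, 0.3], name='Series 1')

# Vectorized square instead of apply(lambda x: x ** 2)
vectorized = (i ** 2).groupby(i < 0).mean()
with_apply = i.apply(lambda x: x ** 2).groupby(i < 0).mean()

print(vectorized)
# Same numbers as the apply() version, up to rounding
print(np.allclose(vectorized, with_apply))  # True
```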

You can sort by values with `sort_values` or by index with `sort_index`. `rank` gives each value's position in sorted order.

In [38]:
i = i.sort_values()
i

0   -19.9
4     0.3
2     0.5
1    12.2
3    14.9
Name: Series 1, dtype: float64

In [39]:
i.rank()

0    1.0
4    2.0
2    3.0
1    4.0
3    5.0
Name: Series 1, dtype: float64
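`sort_index` undoes a value sort, and `rank` averages tied ranks by default; other tie strategies are selected through its `method` parameter:

```python
import pandas as pd

i = pd.Series([-19.9, 12.2, 0.5, 14.9, 0.3], name='Series 1')

by_value = i.sort_values()
print(by_value.index.tolist())               # [0, 4, 2, 1, 3]
print(by_value.sort_index().index.tolist())  # [0, 1, 2, 3, 4]

# Tied values share the average of the ranks they would occupy
ties = pd.Series([3, 1, 3])
print(ties.rank().tolist())              # [2.5, 1.0, 2.5]
print(ties.rank(method='min').tolist())  # [2.0, 1.0, 2.0]
```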

And there's so much more you can do.

## Accessors

Accessors give you type-specific methods for strings (`.str`), datetimes (`.dt`) and some pandas-specific dtypes such as `category` (`.cat`).
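Each accessor works only when the data has a matching dtype. A brief sketch of `.dt` and `.cat` (the dates and categories are made-up sample data), plus what happens on the wrong dtype:

```python
import pandas as pd

# .dt: datetime-specific methods
dates = pd.Series(pd.to_datetime(['2024-01-15', '2024-03-01']))
print(dates.dt.month.tolist())       # [1, 3]
print(dates.dt.day_name().tolist())  # ['Monday', 'Friday']

# .cat: categorical-specific attributes
sizes = pd.Series(['small', 'large', 'small'], dtype='category')
print(sizes.cat.categories.tolist())  # ['large', 'small']
print(sizes.cat.codes.tolist())       # [1, 0, 1]

# Using an accessor on the wrong dtype raises AttributeError
try:
    pd.Series([1, 2]).dt
    wrong_dtype_raised = False
except AttributeError:
    wrong_dtype_raised = True
print(wrong_dtype_raised)  # True
```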

In [40]:
j = pd.Series(['Yes', 'no', 'y', '', 'NO'], name='votes')
j

0    Yes
1     no
2      y
3       
4     NO
Name: votes, dtype: object

In [41]:
j = j.str.upper()
j

0    YES
1     NO
2      Y
3       
4     NO
Name: votes, dtype: object

In [42]:
j[j.str.contains('Y')] = 'Yes'
j[j.str.contains('N')] = 'No'
j

0    Yes
1     No
2    Yes
3       
4     No
Name: votes, dtype: object

In [43]:
j[~j.isin(['Yes', 'No'])] = 'Unknown'
j

0        Yes
1         No
2        Yes
3    Unknown
4         No
Name: votes, dtype: object
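The three masking steps can also be collapsed into one chain. This sketch assumes the first letter alone decides the vote ("Y..." is Yes, "N..." is No), which is slightly different from the `contains` checks above; anything else, including empty strings, becomes `'Unknown'`:

```python
import pandas as pd

raw = pd.Series(['Yes', 'no', 'y', '', 'NO'], name='votes')

votes = (
    raw.str.strip()                   # ignore stray whitespace
       .str.upper()
       .str[:1]                       # first character ('' stays '')
       .map({'Y': 'Yes', 'N': 'No'})  # unmapped values become NaN
       .fillna('Unknown')
)
print(votes.tolist())  # ['Yes', 'No', 'Yes', 'Unknown', 'No']
```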