# Making a Series with Python
   

In [7]:
series = {
    'index':[0, 1, 2, 3],
    'data':[145, 142, 38, 13],
    'name':'songs'
}

In [10]:
def get(series, idx):
    # Find the position of the label in the index, then return the value at that position
    value_idx = series['index'].index(idx)
    return series['data'][value_idx]

In [11]:
get(series, 1)

142

In [12]:
songs = {
    'index':['Paul', 'John', 'George', 'Ringo'],
    'data' :[2, 4, 6, 54],
    'name' :'counts'
}

In [13]:
get(songs, 'John')

4
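`get` finds a label with `list.index`, which scans the index list from the front, so each lookup takes time proportional to the length of the series. A minimal sketch, using a hypothetical `make_series` helper and `get_fast` function (neither is part of pandas), precomputes a dictionary from label to position so lookups take constant time on average. pandas indexes also rely on hashing for label lookups, though the details differ.

```python
def make_series(index, data, name=None):
    # Hypothetical helper: store a label -> position table next to the data
    if len(index) != len(data):
        raise ValueError("index and data must have the same length")
    positions = {}
    for pos, label in enumerate(index):
        positions.setdefault(label, pos)  # keep the first occurrence, like list.index
    return {'index': list(index), 'data': list(data),
            'name': name, 'positions': positions}

def get_fast(series, idx):
    # Dictionary lookup instead of a linear scan with list.index
    return series['data'][series['positions'][idx]]

songs = make_series(['Paul', 'John', 'George', 'Ringo'], [2, 4, 6, 54], name='counts')
print(get_fast(songs, 'John'))  # 4
```

One behavioral difference: a missing label raises `KeyError` here, whereas `get` raises `ValueError` from `list.index`.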

# The pandas Series

In [1]:
import pandas as pd

In [2]:
songs2 = pd.Series([145, 142, 38, 13],
                   name='counts')

In [3]:
songs2

0    145
1    142
2     38
3     13
Name: counts, dtype: int64
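We did not pass an index to `songs2`, so pandas supplied a default one. Inspecting `.index` shows a `RangeIndex` running from 0 to 3, the same labels the dictionary version spelled out by hand:

```python
import pandas as pd

songs2 = pd.Series([145, 142, 38, 13], name='counts')

print(songs2.index)        # RangeIndex(start=0, stop=4, step=1)
print(list(songs2.index))  # [0, 1, 2, 3]
print(songs2.name)         # counts
```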

In [11]:
songs3 = pd.Series([145, 142, 38, 13],
                   name='counts',
                   index=['P', 'J', 'G', 'Ringo'])

In [12]:
songs3

P        145
J        142
G         38
Ringo     13
Name: counts, dtype: int64
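With a custom index, we can look values up by label, mirroring `get(songs, 'John')` from the dictionary version. Both plain square brackets and the `.loc` accessor perform label lookups here:

```python
import pandas as pd

songs3 = pd.Series([145, 142, 38, 13],
                   name='counts',
                   index=['P', 'J', 'G', 'Ringo'])

print(songs3['J'])          # 142 (label lookup with [])
print(songs3.loc['Ringo'])  # 13 (explicit label lookup with .loc)
```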

In [13]:
class Foo:
    pass

In [15]:
ringo = pd.Series(
    ['Richard','Starkey', 13, Foo()],
    name='ringo')

In [16]:
ringo

0                                    Richard
1                                    Starkey
2                                         13
3    <__main__.Foo object at 0x7f4b8b5d87b8>
Name: ringo, dtype: object
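Because the values have mixed types, pandas falls back to the `object` dtype, which holds references to arbitrary Python objects. A quick check against the all-integer series from earlier:

```python
import pandas as pd

class Foo:
    pass

ringo = pd.Series(['Richard', 'Starkey', 13, Foo()], name='ringo')
counts = pd.Series([145, 142, 38, 13], name='counts')

print(ringo.dtype)   # object
print(counts.dtype)  # int64 on most platforms
```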

# Similar to NumPy
A Series object behaves similarly to a NumPy array:

In [17]:
import numpy as np

In [18]:
numpy_ser = np.array([145, 142, 38, 13])

In [19]:
songs3.iloc[1]  # .iloc selects by position; positional songs3[1] is deprecated in recent pandas

142

In [20]:
numpy_ser[1]

142

In [23]:
songs3.mean()

84.5

In [22]:
numpy_ser.mean()

84.5

Both support the notion of a Boolean array. A Boolean array is a series with the same index as the series you are working with, holding Boolean values; it can be used as a mask to filter out items. Normal Python lists do not support such fancy indexing operations, such as passing a list inside the square brackets of an index operation.

Here we make a mask:

In [26]:
mask = songs3 > songs3.median()  # Boolean array

In [27]:
mask

P         True
J         True
G        False
Ringo    False
Name: counts, dtype: bool
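Passing the Boolean array back into an index operation keeps only the rows where the mask is `True`, here the two counts above the median:

```python
import pandas as pd

songs3 = pd.Series([145, 142, 38, 13],
                   name='counts',
                   index=['P', 'J', 'G', 'Ringo'])

mask = songs3 > songs3.median()  # median is 90.0
print(songs3[mask])              # keeps only P (145) and J (142)
```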

NumPy also supports filtering with Boolean arrays, but NumPy arrays lack a `.median` method; the `np.median` function fills that role.
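The NumPy version of the same filter calls the `np.median` function instead of a method, then uses the resulting mask the same way:

```python
import numpy as np

numpy_ser = np.array([145, 142, 38, 13])

np_mask = numpy_ser > np.median(numpy_ser)  # array([ True,  True, False, False])
print(numpy_ser[np_mask])                   # [145 142]
```

Unlike the pandas result, the filtered NumPy array carries no labels, so we lose track of which entries (`P` and `J`) survived.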

# Categorical Data
If we know the data is limited to a few values, we can use categorical data.
## Benefits
- use less memory than strings
- improve performance
- can have ordering
- can perform operations on categories
- enforce membership on values
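To check the memory claim, the sketch below builds a hypothetical series of 3,000 size labels with only three distinct values, stores it once as plain strings and once as a category, and compares `memory_usage(deep=True)`. The categorical version stores small integer codes plus one copy of each distinct label. Exact byte counts vary with the pandas version and platform, so only the comparison is shown:

```python
import pandas as pd

# Hypothetical data: 3,000 labels drawn from only three distinct values
sizes = pd.Series(['s', 'm', 'l'] * 1000)
sizes_cat = sizes.astype('category')

str_bytes = sizes.memory_usage(deep=True)
cat_bytes = sizes_cat.memory_usage(deep=True)
print(str_bytes, cat_bytes)
print(f'categorical uses {cat_bytes / str_bytes:.0%} of the string memory')
print(sizes_cat.cat.codes.dtype)  # int8: one byte per row for the codes
```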

To create a categorical series, we can pass `dtype="category"` into the series constructor or call the `.astype("category")` method on an existing series:

In [28]:
s = pd.Series(['m','l','xs','s','xl'], dtype='category')

In [29]:
s

0     m
1     l
2    xs
3     s
4    xl
dtype: category
Categories (5, object): ['l', 'm', 's', 'xl', 'xs']
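The other route, calling `.astype('category')` on an existing series, produces the same result. When categories are not given explicitly, pandas sorts the unique values, which is why `'xl'` appears before `'xs'` in the categories above:

```python
import pandas as pd

s = pd.Series(['m', 'l', 'xs', 's', 'xl'], dtype='category')
s_from_astype = pd.Series(['m', 'l', 'xs', 's', 'xl']).astype('category')

print(s_from_astype.cat.categories.tolist())  # ['l', 'm', 's', 'xl', 'xs']
print(s.equals(s_from_astype))                # True
```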

If this series represents sizes, there is a natural ordering from smallest to largest. By default, however, categories are unordered. We can verify this with the `.ordered` property of the `.cat` accessor:

In [30]:
s.cat.ordered

False

To convert a non-categorical series to an ordered category, we can create a type with the `CategoricalDtype` constructor and the appropriate parameters, then pass that type into the `.astype` method. Values that are not among the listed categories (`'xs'` and `'xl'` here) become `NaN`:

In [43]:
s2 = pd.Series(['m', 'l', 'xs', 's', 'xl'])

In [44]:
size_type = pd.api.types.CategoricalDtype(
    categories=['s', 'm', 'l'], ordered=True)

In [45]:
s3 = s2.astype(size_type)

In [46]:
s3

0      m
1      l
2    NaN
3      s
4    NaN
dtype: category
Categories (3, object): ['s' < 'm' < 'l']

In [47]:
s3 > 's'

0     True
1     True
2    False
3    False
4    False
dtype: bool
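Because `size_type` is ordered, other order-aware operations also follow the declared `'s' < 'm' < 'l'` order rather than alphabetical order. A short sketch with `sort_values` and `max` (missing values go last when sorting and are skipped by `max`):

```python
import pandas as pd

size_type = pd.api.types.CategoricalDtype(
    categories=['s', 'm', 'l'], ordered=True)
s3 = pd.Series(['m', 'l', 'xs', 's', 'xl']).astype(size_type)

print(s3.sort_values().tolist())  # ['s', 'm', 'l', nan, nan]
print(s3.max())                   # l (alphabetically it would be 's')
```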

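If the series is already categorical, we can instead order it with `.cat.reorder_categories`, passing all of the existing categories in the desired order:

In [48]: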
s.cat.reorder_categories(['xs', 's','m','l','xl'],
                         ordered=True)

0     m
1     l
2    xs
3     s
4    xl
dtype: category
Categories (5, object): ['xs' < 's' < 'm' < 'l' < 'xl']
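Two details of `reorder_categories` are worth checking: it returns a new series rather than changing `s` in place, and the new list must contain exactly the existing categories. A sketch:

```python
import pandas as pd

s = pd.Series(['m', 'l', 'xs', 's', 'xl'], dtype='category')
ordered_s = s.cat.reorder_categories(['xs', 's', 'm', 'l', 'xl'], ordered=True)

print(s.cat.ordered, ordered_s.cat.ordered)  # False True

try:
    # 'xl' is missing from the new list, so pandas refuses to reorder
    s.cat.reorder_categories(['xs', 's', 'm', 'l'], ordered=True)
except ValueError as err:
    print('ValueError:', err)
```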

String and datetime series have `.str` and `.dt` accessors, respectively. If we convert them to categorical types, we can still use the `.str` or `.dt` accessor on them:

In [51]:
s3.str.upper()

0      M
1      L
2    NaN
3      S
4    NaN
dtype: object

In [52]:
s.str.upper()

0     M
1     L
2    XS
3     S
4    XL
dtype: object
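The same holds for the `.dt` accessor. A sketch, assuming a small hypothetical series of dates converted to a category, extracting the year:

```python
import pandas as pd

# Hypothetical dates with a repeated value, stored as a category
dates = pd.Series(pd.to_datetime(['2020-01-01', '2021-06-15', '2020-01-01']))
dates_cat = dates.astype('category')

print(dates_cat.dtype)              # category
print(dates_cat.dt.year.tolist())   # [2020, 2021, 2020]
```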

# Summary 
*Method*|*Description*
:---|:---
`pd.Series(data=None, index=None, dtype=None, name=None, copy=False)`|Create a series from data (sequence, dictionary, or scalar).
`s.index`|Access index of series.
`s.astype(dtype, errors='raise')`|Cast a series to dtype. To ignore errors (and return original object) use errors='ignore'.
`s[boolean_array]`| Return values from s where boolean_array is True.
`s.cat.ordered`|Determine if a categorical series is ordered.
`s.cat.reorder_categories(new_categories, ordered=False)`| Reorder the categories (optionally marking them as ordered) and return a new series. `new_categories` must contain exactly the existing categories.