In [36]:
import pandas as pd
import numpy as np

<h1>Series</h1>
A Series holds one-dimensional data, much like an array. It has a name, values, and an index.
By using the index abstraction, Series supports other index types like strings and dates, as well as arbitrarily ordered indices or duplicate index values.

The values of a Series do not have to be numeric or homogeneous.
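As a small illustration of the two points above, the sketch below builds one Series with mixed value types and one with a repeated index label. The names `mixed` and `dupes` and their values are made up for this example.

```python
import pandas as pd

# Mixed value types: pandas falls back to the generic object dtype.
mixed = pd.Series([145, 'Yesterday', 3.5], name='mixed')
print(mixed.dtype)

# Duplicate index labels are allowed; a label lookup returns every match.
dupes = pd.Series([145, 142, 38], index=['Paul', 'John', 'Paul'], name='counts')
print(dupes['Paul'])
```

Because 'Paul' appears twice, `dupes['Paul']` is itself a Series with two rows rather than a single number.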

In [37]:
songs = pd.Series([145, 142, 38, 13], name='counts')
songs

0    145
1    142
2     38
3     13
Name: counts, dtype: int64

<h1>Index</h1>
The leftmost column of the output is the index; it is not part of the values.
The generic name for an index is an axis, and the values of the index (0,1,2,3) are called axis labels.
The index object is an attribute of the Series and DataFrame.

In [38]:
songs.index

RangeIndex(start=0, stop=4, step=1)

The index can also hold string values.
Pandas reports the dtype of this index as object.

In [39]:
songs3 = pd.Series([145,142,38,13], index=['Paul', 'John', 'George', 'Ringo'], name='counts')
print(songs3)
print(songs3.index)

Paul      145
John      142
George     38
Ringo      13
Name: counts, dtype: int64
Index(['Paul', 'John', 'George', 'Ringo'], dtype='object')
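With a string index, values can be looked up by label. The following is a minimal sketch that rebuilds the same `songs3` Series so it runs on its own; the variable names `john` and `pair` are only for illustration.

```python
import pandas as pd

songs3 = pd.Series([145, 142, 38, 13],
                   index=['Paul', 'John', 'George', 'Ringo'], name='counts')

# A single label returns the matching value.
john = songs3['John']
print(john)

# A list of labels passed to .loc returns a smaller Series.
pair = songs3.loc[['Paul', 'Ringo']]
print(pair)
```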


<h1>NaN</h1>
The NaN value is present when pandas determines that a series holds numeric values but cannot find a number to represent an entry.

When NaN is present, the dtype becomes float64 because float64 can represent NaN while int64 cannot.

In [40]:
nan_series = pd.Series([2, np.nan], index=['Ono', 'Clapton'])
nan_series

Ono        2.0
Clapton    NaN
dtype: float64
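To make the dtype change visible, the sketch below compares the same data with and without a missing entry. The last part uses pandas' nullable `'Int64'` dtype, which this notebook does not otherwise cover; it is shown only as one possible way to keep integers alongside a missing value.

```python
import numpy as np
import pandas as pd

# All entries present: pandas keeps an integer dtype.
ints = pd.Series([2, 3], index=['Ono', 'Clapton'])
print(ints.dtype)

# One entry missing: the whole Series is stored as float64.
with_nan = pd.Series([2, np.nan], index=['Ono', 'Clapton'])
print(with_nan.dtype)

# count() skips NaN, while size includes it.
print(with_nan.count(), with_nan.size)

# The nullable integer dtype (capital "I") stores a missing marker instead of NaN.
nullable = pd.Series([2, None], index=['Ono', 'Clapton'], dtype='Int64')
print(nullable.dtype)
```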

<h2>Similar to NumPy</h2>
The Series object behaves similarly to a NumPy array.
Both share methods such as mean().
Both also return an array of booleans when compared against a value.
Plain Python lists support neither.
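The comparison can be sketched as follows, using the same counts as the earlier cells. The variable names are chosen only for this example.

```python
import numpy as np
import pandas as pd

values = [145, 142, 38, 13]
series = pd.Series(values, name='counts')
array = np.array(values)

# Both objects expose mean().
print(series.mean(), array.mean())

# A comparison is applied to each element and yields booleans.
series_flags = (series > 100).tolist()
array_flags = (array > 100).tolist()
print(series_flags, array_flags)

# A plain list has no mean() method, and comparing it to a number raises TypeError.
try:
    values > 100
    list_compared = True
except TypeError:
    list_compared = False
print(list_compared)
```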

<h3>Boolean Array</h3>
A boolean array can be used as a mask to filter items.
The mask has the same index as the series it was built from, and each boolean says whether the corresponding value should be kept.

<h5>Create a mask of the series</h5>

In [41]:

mask = songs3 > songs3.median()
mask

Paul       True
John       True
George    False
Ringo     False
Name: counts, dtype: bool

<h5>Pass the mask to the series in an index operation</h5>

In [42]:
songs3[mask]

Paul    145
John    142
Name: counts, dtype: int64
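Masks can also be combined or inverted before they are used for filtering. The sketch below assumes the same `songs3` data; the thresholds 20 and 145 are arbitrary values picked for illustration.

```python
import pandas as pd

songs3 = pd.Series([145, 142, 38, 13],
                   index=['Paul', 'John', 'George', 'Ringo'], name='counts')

# Combine conditions with & (and) or | (or); each condition needs parentheses.
both = songs3[(songs3 > 20) & (songs3 < 145)]
print(both)

# ~ inverts a mask, keeping the rows the original mask would drop.
inverted = songs3[~(songs3 > songs3.median())]
print(inverted)
```

The Python keywords `and`, `or`, and `not` do not work element by element on a Series, which is why the operators `&`, `|`, and `~` are used here.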

<h1>Categorical Data</h1>

If you know that your data is limited to a few values, you can indicate that the data is categorical when you load it.
Categorical values have a few benefits:
<ul>
    <li> Use less memory than strings
    <li> Improve performance
    <li> Can have an ordering
    <li> Can perform operations on categories
    <li> Enforce membership on values
</ul>
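Two of these benefits, memory use and membership enforcement, can be checked directly. This is a rough sketch: the repeated size labels are invented data, and the exact byte counts depend on the pandas version, so only the comparison is shown.

```python
import pandas as pd

# 5,000 strings drawn from only five distinct values.
sizes = pd.Series(['m', 'l', 'xs', 's', 'xl'] * 1000)
as_category = sizes.astype('category')

# deep=True includes the memory taken by the string objects themselves.
string_bytes = sizes.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(string_bytes, category_bytes)

# Assigning a value that is not an existing category is rejected.
# Older pandas releases raise ValueError here, newer ones TypeError.
try:
    as_category.iloc[0] = 'xxl'
    rejected = False
except (TypeError, ValueError):
    rejected = True
print(rejected)
```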

To create a categorical Series, pass dtype='category' to the Series constructor:

In [43]:
s = pd.Series(['m', 'l', 'xs', 's', 'xl'], dtype='category')
s

0     m
1     l
2    xs
3     s
4    xl
dtype: category
Categories (5, object): ['l', 'm', 's', 'xl', 'xs']

By default, categories don't have an ordering. You can check this with the .cat.ordered property:

In [44]:
s.cat.ordered

False

<h2>Ordered Category</h2>

To convert a non-categorical series to an ordered category, create a CategoricalDtype object with the appropriate parameters and pass it to the series' astype method. Values that are not listed in categories (here 'xs' and 'xl') become NaN.

In [45]:
s2 = pd.Series(['m', 'l', 'xs', 's', 'xl'])
size_type = pd.api.types.CategoricalDtype(categories=['s', 'm', 'l'], ordered=True)
s3 = s2.astype(size_type)
s3

0      m
1      l
2    NaN
3      s
4    NaN
dtype: category
Categories (3, object): ['s' < 'm' < 'l']
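Once a category is ordered, comparisons and min/max follow the declared order rather than alphabetical order. The sketch below rebuilds `s3` so it runs on its own; `bigger_than_s` is a name chosen for this example.

```python
import pandas as pd

s2 = pd.Series(['m', 'l', 'xs', 's', 'xl'])
size_type = pd.api.types.CategoricalDtype(categories=['s', 'm', 'l'], ordered=True)
s3 = s2.astype(size_type)

# 'xs' and 'xl' were not in the category list, so they are missing.
print(s3.isna().tolist())

# Comparisons use the declared order s < m < l; missing entries compare as False.
bigger_than_s = (s3 > 's').tolist()
print(bigger_than_s)

# min() and max() also follow the category order and skip missing entries.
print(s3.min(), s3.max())
```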

We can also add ordering information to data that is already categorical. Note that reorder_categories returns a new Series and leaves s unchanged.

In [46]:
s.cat.reorder_categories(['xs', 's', 'm', 'l', 'xl'], ordered=True)

0     m
1     l
2    xs
3     s
4    xl
dtype: category
Categories (5, object): ['xs' < 's' < 'm' < 'l' < 'xl']
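One practical effect of the ordering is that sorting follows the category order. This is a minimal sketch that rebuilds `s` and stores the reordered result in a new variable, `ordered`, a name introduced only for this example.

```python
import pandas as pd

s = pd.Series(['m', 'l', 'xs', 's', 'xl'], dtype='category')
ordered = s.cat.reorder_categories(['xs', 's', 'm', 'l', 'xl'], ordered=True)

# The original Series is untouched; only the returned one is ordered.
print(s.cat.ordered, ordered.cat.ordered)

# Sorting follows the category order, which is alphabetical for the default
# categories of s and xs < s < m < l < xl for the reordered Series.
print(s.sort_values().tolist())
print(ordered.sort_values().tolist())
```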