### Series

The Pandas `Series` is basically a 1-dimensional array with an explicit index.

In addition to this explicit index, there is also an implicit index based on position (just like a Python list or NumPy array).

We can create a series object using Python lists to define both the index and the values (think associative arrays!):

In [1]:
import pandas as pd
import numpy as np

In [2]:
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

In [3]:
s

a    10
b    20
c    30
dtype: int64

As you can see from the display above, we have both the index and the values (and the values have the `int64` data type).

We can now reference items using the explicit index:

In [4]:
s['a']

np.int64(10)

In [5]:
s['c'] = 1000
s

a      10
b      20
c    1000
dtype: int64

If you find this similar to Python dictionaries, you're quite right. Python dictionaries are one implementation of associative arrays, and the Pandas `Series` is another implementation.
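A few other dict-style operations also work on a `Series`. In particular, `in` checks the index labels (not the values), and `get()` returns a default instead of raising for a missing label. A small sketch (the `'z'` label and the `-1` default are just illustrations):

```python
import pandas as pd

s = pd.Series([10, 20, 1000], index=['a', 'b', 'c'])

# Membership tests look at the index labels, just like dict keys
print('a' in s)        # True
print(10 in s)         # False: 10 is a value, not a label

# get() returns a default instead of raising KeyError for a missing label
print(s.get('z', -1))  # -1

# keys() is an alias for the index
print(list(s.keys()))  # ['a', 'b', 'c']
```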

We can add more elements to a `Series` object just as we would with a Python `dict`, by simply assigning to a new index label:

In [6]:
s['d'] = 500
s

a      10
b      20
c    1000
d     500
dtype: int64

In fact, we can create a `Series` instance by using a plain Python `dict`:

In [7]:
capitals = {
    'USA': 'Washington D.C.',
    'Canada': 'Ottawa',
    'UK': 'London',
    'France': 'Paris'
}
s = pd.Series(capitals)
s

USA       Washington D.C.
Canada             Ottawa
UK                 London
France              Paris
dtype: object

As you can see, the keys of the dictionary became the indices, and the values in the dictionary became the values in the series.
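If we also pass an `index` when building a series from a dict, pandas keeps only the listed labels, in the listed order; any label that is not a key of the dict gets a missing value (`NaN`). A sketch, where the `'Japan'` label is purely illustrative:

```python
import pandas as pd

capitals = {
    'USA': 'Washington D.C.',
    'Canada': 'Ottawa',
    'UK': 'London',
    'France': 'Paris'
}

# Only the listed labels are kept, in the listed order;
# 'Japan' is not a key of the dict, so its value is missing (NaN)
subset = pd.Series(capitals, index=['France', 'UK', 'Japan'])
print(subset)
```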

We can get the index object from a series by using the `index` property:

In [8]:
s.index

Index(['USA', 'Canada', 'UK', 'France'], dtype='object')

And the values:

In [9]:
s.values

array(['Washington D.C.', 'Ottawa', 'London', 'Paris'], dtype=object)

These values are actually a plain NumPy array (no special index attached, other than the positional index):

In [10]:
type(s.values)

numpy.ndarray
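The pandas documentation recommends the `to_numpy()` method when we specifically want a NumPy array, because `values` can return a pandas-specific array type for some dtypes (categorical data, for example). For a plain series like ours, both give the same result:

```python
import numpy as np
import pandas as pd

s = pd.Series({'USA': 'Washington D.C.', 'Canada': 'Ottawa'})

arr = s.to_numpy()
print(type(arr))                      # <class 'numpy.ndarray'>
print(np.array_equal(arr, s.values))  # True

# With a categorical dtype the two differ
cat = pd.Series(['x', 'y', 'x'], dtype='category')
print(type(cat.values).__name__)      # Categorical
print(type(cat.to_numpy()).__name__)  # ndarray
```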

We can even get the key/value pairs by using `items()`:

In [11]:
s.items()

<zip at 0x2c4224e46c0>

This is a lazy iterator (here a `zip` object), so we need to iterate through it to get at the actual pairs:

In [12]:
list(s.items())

[('USA', 'Washington D.C.'),
 ('Canada', 'Ottawa'),
 ('UK', 'London'),
 ('France', 'Paris')]
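In practice we usually consume `items()` directly in a `for` loop, unpacking each (label, value) pair just as we would with `dict.items()`. Note one difference from a dict: iterating over the series itself yields the values, not the labels.

```python
import pandas as pd

s = pd.Series({
    'USA': 'Washington D.C.',
    'Canada': 'Ottawa',
    'UK': 'London',
    'France': 'Paris'
})

# Each iteration yields one (index label, value) tuple
lines = []
for country, capital in s.items():
    lines.append(f'{capital} is the capital of {country}')
print('\n'.join(lines))

# Unlike a dict, iterating over the series itself yields the values
values = [v for v in s]
print(values)  # ['Washington D.C.', 'Ottawa', 'London', 'Paris']
```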

Unlike a Python dict however, the index of a `Series` can contain repeated elements:

In [13]:
areas = pd.Series(
    ['USA', 'Topeka', 'France', 'Lyon', 'UK', 'Glasgow'],
    index=['country', 'city', 'country', 'city', 'country', 'city']
)

In [14]:
areas

country        USA
city        Topeka
country     France
city          Lyon
country         UK
city       Glasgow
dtype: object

As you can see, our index contains repeated labels, something a Python dictionary does not allow.

When we select by an index label, every item matching that label is returned:

In [15]:
areas['city']

city     Topeka
city       Lyon
city    Glasgow
dtype: object

Notice how a `Series` object was returned.
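So whether we get back a scalar or a `Series` depends on the index: a label that appears once yields a scalar, while a repeated label yields a `Series`. The index's `is_unique` property tells us which situation we are in. A short sketch:

```python
import pandas as pd

areas = pd.Series(
    ['USA', 'Topeka', 'France', 'Lyon', 'UK', 'Glasgow'],
    index=['country', 'city', 'country', 'city', 'country', 'city']
)
unique = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(areas.index.is_unique)   # False
print(unique.index.is_unique)  # True

print(type(areas['city']).__name__)  # Series: the label appears three times
print(unique['b'])                   # 20: the label appears once, so a scalar
```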

So how do we change the value for a single item?

If we try to do an assignment:

In [16]:
areas['city'] = 'London'

In [17]:
areas

country       USA
city       London
country    France
city       London
country        UK
city       London
dtype: object

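Well now, that's maybe not what we wanted! :-) Every item labelled `city` was updated. Let's recreate the original series before trying another approach: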

In [18]:
areas = pd.Series(
    ['USA', 'Topeka', 'France', 'Lyon', 'UK', 'Glasgow'],
    index=['country', 'city', 'country', 'city', 'country', 'city']
)

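Remember I mentioned there was an implicit positional index, basically integers for the positions `0`, `1`, and so on?

It turns out we can use those integer positions to reference elements in the series as well (note that recent pandas versions emit a `FutureWarning` for this positional use of `[]` and recommend `iloc` instead, which we'll look at shortly):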

In [19]:
areas[5]

'Glasgow'

In [20]:
areas[2:]

country     France
city          Lyon
country         UK
city       Glasgow
dtype: object

So we could modify an individual element using a positional index:

In [21]:
areas[5] = 'London'
areas

country       USA
city       Topeka
country    France
city         Lyon
country        UK
city       London
dtype: object
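Because recent pandas versions deprecate this positional fallback of `[]` (it emits a `FutureWarning`), the same update is more robustly written with `iloc`, which always works by position and is covered in more detail further on. A minimal sketch:

```python
import pandas as pd

areas = pd.Series(
    ['USA', 'Topeka', 'France', 'Lyon', 'UK', 'Glasgow'],
    index=['country', 'city', 'country', 'city', 'country', 'city']
)

# iloc always interprets the key as a position, so only the sixth item changes
areas.iloc[5] = 'London'
print(areas)
```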

What's interesting about the explicit index is that it can be used in slicing and fancy indexing too:

In [22]:
s = pd.Series([10, 20, 30, 40, 50], index=list('abcde'))
s

a    10
b    20
c    30
d    40
e    50
dtype: int64

In [23]:
s['a':'d']

a    10
b    20
c    30
d    40
dtype: int64

In [24]:
s[['a', 'c', 'd']]

a    10
c    30
d    40
dtype: int64

One thing that is very important to note is that when using a custom index, the slice **includes** the endpoint.

Why does Pandas do this?

Primarily because labels, unlike positions, have no natural "next" value. An end-exclusive label slice would require us to know which label comes immediately after the last one we want, and that could be tricky to find. It is usually easier to work with end-inclusive ranges, since we already know the start and end labels we want to use.
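To see the difference side by side, here is the same run of elements selected once by label and once by position (using `iloc`, which always slices by position and is covered further on). The label slice names the last element we want, whereas the positional slice names the position one past it:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=list('abcde'))

by_label = s['b':'d']      # labels: the end point 'd' is included
by_position = s.iloc[1:4]  # positions: the end point 4 is excluded

print(by_label.equals(by_position))  # True
```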

So far we have been using `[]` for both the custom index and the positional index:

In [25]:
s['a'], s[0]

(np.int64(10), np.int64(10))

How does Pandas know which index we mean? In this case it's pretty simple: the custom index consists of strings, while the positional index consists of integers. So when Pandas sees `s['a']` it knows this refers to the custom index, but when it sees `s[0]`, since the custom index contains no integers, it falls back to the positional index.

So, what happens if the custom index also consists of integers?

In [26]:
s = pd.Series([100, 200, 300], index=[10, 20, 30])
s

10    100
20    200
30    300
dtype: int64

In [27]:
s[10]

np.int64(100)

As you can see, in this case Pandas will use the explicit index, which means we can no longer use the positional index:

In [28]:
try:
    s[0]
except KeyError as ex:
    print('KeyError: ', ex)

KeyError:  0


But it gets more confusing than that: if we slice, Pandas will actually use the implicit positional index, not the custom index:

In [29]:
s[0:3]

10    100
20    200
30    300
dtype: int64

On the other hand, fancy indexing will use the custom (explicit) index:

In [30]:
try:
    s[[0, 3, 4]]
except KeyError as ex:
    print('KeyError:', ex)

KeyError: "None of [Index([0, 3, 4], dtype='int64')] are in the [index]"


So using square brackets (`[]`) works, but it can quickly get confusing, since what it does with an argument depends on the data type of the index (and on whether we are selecting a single item, slicing, or fancy indexing).

Pandas provides two attributes, `loc` and `iloc`, that we can use instead to access elements explicitly by the explicit index (labels) or by the implicit positional index, respectively.

The `iloc` attribute is used when we want to use the positional index:

In [31]:
s.iloc[0]

np.int64(100)

And the `loc` attribute is used when we want to access the data using the explicit index:

In [32]:
s.loc[10]

np.int64(100)

**Note**: `loc` and `iloc` are attributes, not methods, so we use them with square brackets (`[]`), not parentheses (`()`).

Of course, both of them support slicing and fancy indexing:

In [33]:
s.iloc[0:4]

10    100
20    200
30    300
dtype: int64

In [34]:
s.loc[10:30]

10    100
20    200
30    300
dtype: int64

Again, note how slicing with the explicit index includes the endpoint, unlike slicing with the implicit index.
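The examples above only show slicing, so here is a sketch of fancy indexing with both attributes: `loc` takes a list of labels and `iloc` a list of positions, which removes the ambiguity we saw earlier with `s[[0, 3, 4]]`:

```python
import pandas as pd

s = pd.Series([100, 200, 300], index=[10, 20, 30])

# loc: a list of explicit labels
print(s.loc[[10, 30]].tolist())  # [100, 300]

# iloc: a list of positions
print(s.iloc[[0, 2]].tolist())   # [100, 300]

# Passing labels to iloc fails loudly instead of silently guessing
try:
    s.iloc[[10, 30]]
except IndexError as ex:
    print('IndexError:', ex)
```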

We can also provide a `name` attribute for any `Series` object:

In [35]:
s

10    100
20    200
30    300
dtype: int64

In [36]:
s.name = 'test'
s

10    100
20    200
30    300
Name: test, dtype: int64

We can also specify this name when we create the series:

In [37]:
areas = pd.Series(
    ['USA', 'Topeka', 'France', 'Lyon', 'UK', 'Glasgow'],
    index=['country', 'city', 'country', 'city', 'country', 'city'],
    name='Areas'
)
areas

country        USA
city        Topeka
country     France
city          Lyon
country         UK
city       Glasgow
Name: Areas, dtype: object

We'll see later why this is important when we look at `DataFrames`.
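The name can also be changed after the fact with `rename()`: passing a scalar (rather than a mapping or function) sets the name and, by default, returns a new series, leaving the original untouched. The name is also carried through simple arithmetic. A short sketch (`'scores'` is just an illustrative name):

```python
import pandas as pd

s = pd.Series([100, 200, 300], index=[10, 20, 30], name='test')

renamed = s.rename('scores')  # a scalar argument sets the Series name
print(renamed.name)  # scores
print(s.name)        # test (the original is unchanged)

# The name survives arithmetic with a scalar
print((s * 2).name)  # test
```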

We can also use boolean masking. Remember that the conditional expression is evaluated on the values, not the index, so there is no ambiguity here about label-based versus positional indexing:

In [38]:
areas[areas != 'Glasgow']

country       USA
city       Topeka
country    France
city         Lyon
country        UK
Name: Areas, dtype: object
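Masks can be combined element-wise with `&` (and), `|` (or) and `~` (not), and the `isin()` method is a convenient way to test membership in a list of values. A sketch, with illustrative value lists:

```python
import pandas as pd

areas = pd.Series(
    ['USA', 'Topeka', 'France', 'Lyon', 'UK', 'Glasgow'],
    index=['country', 'city', 'country', 'city', 'country', 'city'],
    name='Areas'
)

# The mask itself is a boolean Series with the same index
mask = areas != 'Glasgow'
print(mask.dtype)  # bool

# Keep values that appear in the given list
europe = areas[areas.isin(['France', 'UK', 'Lyon', 'Glasgow'])]
print(europe.tolist())  # ['France', 'Lyon', 'UK', 'Glasgow']

# Parentheses are needed around comparisons because & binds tighter than !=
others = areas[~areas.isin(['UK', 'USA']) & (areas != 'Lyon')]
print(others.tolist())  # ['Topeka', 'France', 'Glasgow']
```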

We have seen how to select and mutate values in a series. How do we delete an item?

It's not as straightforward as you might think, since we also have an (immutable) explicit index associated with the series.

Instead, we can use the `drop()` method, passing the index labels we want to remove. This returns a **new** series and leaves the original unchanged:

In [39]:
s = pd.Series([10, 20, 30], index=list('abc'), name='test')
s

a    10
b    20
c    30
Name: test, dtype: int64

In [40]:
new = s.drop(['a', 'c'])
new

b    20
Name: test, dtype: int64

In [41]:
s

a    10
b    20
c    30
Name: test, dtype: int64
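If we do want to modify a series in place, pandas offers a few options, sketched below: `drop()` accepts `inplace=True`, `pop()` removes a single label and returns its value, and the `del` statement removes a single label as with a dict. Unlike the plain `drop()` call above, each of these mutates the series it is applied to:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=list('abcd'), name='test')

# drop(..., inplace=True) modifies s and returns None
s.drop(['a'], inplace=True)

# pop() removes a single label and returns its value
value = s.pop('b')
print(value)  # 20

# del removes a single label, as with a dict
del s['c']

print(s.tolist())  # [40]
```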

Can we drop by position? Not directly, no.

We can however recover the explicit index value for a specific location.

Remember how we studied Indexes in a previous set of lectures? An `Index` is an immutable, array-like object (not a series itself) that supports implicit positional indexing, just like a NumPy array.

In [42]:
s.index

Index(['a', 'b', 'c'], dtype='object')

So we can look up the explicit index labels at one or more positions:

In [43]:
s.index[[0, 2]]

Index(['a', 'c'], dtype='object')

And we can now use this in our `drop()` call:

In [44]:
s.drop(s.index[[0, 2]])

b    20
Name: test, dtype: int64
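One caveat with this approach: `drop()` removes every item whose label matches, so if the index contains duplicates (like the `areas` series earlier), dropping the label found at one position removes all items sharing that label. A sketch of a purely positional alternative, assuming we build a NumPy boolean mask and pass it to `iloc`:

```python
import numpy as np
import pandas as pd

areas = pd.Series(
    ['USA', 'Topeka', 'France', 'Lyon', 'UK', 'Glasgow'],
    index=['country', 'city', 'country', 'city', 'country', 'city']
)

# Dropping via the label at position 1 removes *every* 'city' entry
by_label = areas.drop(areas.index[[1]])
print(by_label.tolist())  # ['USA', 'France', 'UK']

# Keep everything except positions 1 and 3, regardless of labels
keep = np.ones(len(areas), dtype=bool)
keep[[1, 3]] = False
by_position = areas.iloc[keep]
print(by_position.tolist())  # ['USA', 'France', 'UK', 'Glasgow']
```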

Again though, the original series is not affected:

In [45]:
s

a    10
b    20
c    30
Name: test, dtype: int64