# Series Data Structure

The Series is one of the core data structures in pandas. It is something of a cross between a list and a dictionary: items are stored in order, and each one has a label (its index) with which you can retrieve it.

A Series is displayed as two columns: the first is the index and the second holds the actual data.
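As a minimal sketch of the list/dictionary analogy (the Series `s` and its labels `'a'` and `'b'` are hypothetical example data), the same Series can be read by position like a list or by label like a dictionary, and its two displayed columns are available separately as the index and the values:

```python
import pandas as pd

# Hypothetical example data with explicit labels.
s = pd.Series(['Satoru Gojo', 'Suguru Geto'], index=['a', 'b'])

print(s.iloc[0])         # list-like, by position: Satoru Gojo
print(s['b'])            # dict-like, by label: Suguru Geto
print('a' in s)          # `in` checks the index labels, like dict keys: True
print(s.index.tolist())  # first displayed column: ['a', 'b']
print(s.tolist())        # second displayed column: ['Satoru Gojo', 'Suguru Geto']
```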

In [1]:
import pandas as pd

In [None]:
# Below is a list of Jujutsu Kaisen characters, all strings.
jjk = ['Satoru Gojo', 'Suguru Geto', 'Toji Fushiguro']

# pandas stores these strings with the 'object' dtype.
pd.Series(jjk)

0       Satoru Gojo
1       Suguru Geto
2    Toji Fushiguro
dtype: object

In [None]:
# It's also possible to access an element by its index label.
pd.Series(jjk)[1]

'Suguru Geto'

Under the hood, pandas stores Series values in a typed array using the NumPy library. This offers a significant speedup over traditional Python lists when processing data.

In [None]:
last_titles = [2012, 2015, 2017]

# pandas infers the dtype of a list of integers as int64.
pd.Series(last_titles)

0    2012
1    2015
2    2017
dtype: int64
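To see the typed NumPy array mentioned above, here is a short sketch: `to_numpy()` exposes the underlying array, and arithmetic on the Series is applied to every element at once. The variable names are illustrative, and the speedup itself is not measured here.

```python
import numpy as np
import pandas as pd

last_titles = pd.Series([2012, 2015, 2017])

# The values live in a NumPy array with a single, fixed dtype.
arr = last_titles.to_numpy()
print(isinstance(arr, np.ndarray))  # True
print(arr.dtype)                    # int64

# Arithmetic is applied to the whole array at once (vectorized)
# instead of looping over Python objects one by one.
next_years = last_titles + 1
print(next_years.tolist())  # [2013, 2016, 2018]

# The plain-Python equivalent needs an explicit loop.
print([year + 1 for year in [2012, 2015, 2017]])  # [2013, 2016, 2018]
```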

## Missing data

In Python, None indicates a lack of data. When a list of strings contains a None element, pandas keeps it as None and uses the 'object' dtype for the underlying array.

In [None]:
jjk = ['Yuji Itadori', 'Megumi Fushiguro', None]

pd.Series(jjk)

0        Yuji Itadori
1    Megumi Fushiguro
2                None
dtype: object
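A minimal sketch using the standard `Series.isna()` method shows that pandas treats the None entry as a missing value:

```python
import pandas as pd

jjk = pd.Series(['Yuji Itadori', 'Megumi Fushiguro', None])

# isna() returns a boolean Series marking the missing entries.
print(jjk.isna().tolist())  # [False, False, True]
```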

Numbers behave differently. When a list of integers and/or floats contains None, pandas automatically converts it to a special floating-point value, NaN, which stands for 'Not a Number'. Because NaN is a float, the integers are stored as floats too, and the dtype becomes float64. Note that NaN and None are not the same thing.

In [None]:
last_titles = [2017, None, None]

print(pd.Series(last_titles))

0    2017.0
1       NaN
2       NaN
dtype: float64
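A minimal sketch comparing a list without gaps to one with gaps shows the dtype change and the type of the missing entry (the names `complete`, `with_gaps` and `missing` are illustrative):

```python
import numpy as np
import pandas as pd

complete = pd.Series([2012, 2015, 2017])
with_gaps = pd.Series([2017, None, None])

print(complete.dtype)   # int64
print(with_gaps.dtype)  # float64: None became NaN, so 2017 became 2017.0

# The missing entry is a floating-point NaN, not None.
missing = with_gaps.iloc[1]
print(type(missing), np.isnan(missing))
```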

In [None]:
import numpy as np

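# NaN never compares equal to anything, not even to itself...
print(f'NaN is equivalent to None = {np.nan == None}')
print(f'NaN is equal to NaN = {np.nan == np.nan}')
# ...so use np.isnan() to test whether a value is NaN.
print(f'NaN is NaN = {np.isnan(np.nan)}')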

NaN is equivalent to None = False
NaN is equal to NaN = False
NaN is NaN = True
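Because `==` cannot detect NaN, pandas provides `pd.isna()`, which recognizes both None and NaN as missing. A minimal sketch, reusing the `last_titles` values from above:

```python
import numpy as np
import pandas as pd

# pd.isna() treats both None and NaN as missing, unlike `==`.
print(pd.isna(None))    # True
print(pd.isna(np.nan))  # True
print(pd.isna(2017))    # False

last_titles = pd.Series([2017, None, None])
print(last_titles.isna().tolist())  # [False, True, True]
print(last_titles.isna().sum())     # 2
```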


## Dictionaries

When a Series is created directly from a dictionary, the keys become the index labels, instead of the default incrementing integers.

In [None]:
jjk = {
    'Protagonist': 'Yuji Itadori',
    'The Strongest': 'Satoru Gojo',
    'King of Curses': 'Ryomen Sukuna'
}

jjk = pd.Series(jjk)

print(jjk)
print(f'\n{jjk.index}')

Protagonist        Yuji Itadori
The Strongest       Satoru Gojo
King of Curses    Ryomen Sukuna
dtype: object

Index(['Protagonist', 'The Strongest', 'King of Curses'], dtype='object')
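Since the dictionary keys are now index labels, a value can be fetched by label with `.loc` or by position with `.iloc`; the positions follow the dictionary's insertion order. A minimal sketch:

```python
import pandas as pd

jjk = pd.Series({
    'Protagonist': 'Yuji Itadori',
    'The Strongest': 'Satoru Gojo',
    'King of Curses': 'Ryomen Sukuna',
})

# .loc retrieves by index label (the former dictionary key)...
print(jjk.loc['The Strongest'])  # Satoru Gojo
# ...while .iloc retrieves by integer position, in insertion order.
print(jjk.iloc[0])               # Yuji Itadori
print(jjk.iloc[-1])              # Ryomen Sukuna
```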


## More complex types of data

Dealing with tuples, separately supplied indexes and missing data.

In [None]:
# A list of tuples is stored the same way as strings, with the 'object' dtype.
jjk = [('Yuta', 'Rika'), ('Satoru Gojo', 'Suguru Geto'), ('Yuji Itadori', 'Megumi Fushiguro')]
pd.Series(jjk)

0                        (Yuta, Rika)
1          (Satoru Gojo, Suguru Geto)
2    (Yuji Itadori, Megumi Fushiguro)
dtype: object
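Storing the tuples as 'object' means each element is kept as an ordinary Python tuple, so its parts can still be indexed. A minimal sketch with a shortened, illustrative `pairs` list:

```python
import pandas as pd

pairs = pd.Series([('Yuta', 'Rika'), ('Satoru Gojo', 'Suguru Geto')])

# Each element is a regular Python tuple inside an 'object' Series.
first = pairs.iloc[0]
print(type(first).__name__)  # tuple
print(first[1])              # Rika
print(pairs.dtype)           # object
```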

In [None]:
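# Passing the index as a list. Pass the list itself: wrapping it in
# another list, as in index=[titles], would build a one-level MultiIndex.
characters = ["Satoru Gojo", "Ryomen Sukuna", "Kenjaku"]
titles = ['Honored One', 'King of Curses', 'Mastermind']

pd.Series(characters, index=titles)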

Honored One         Satoru Gojo
King of Curses    Ryomen Sukuna
Mastermind              Kenjaku
dtype: object
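When the index is passed as a list, it needs one label per value; a list of the wrong length is rejected with a `ValueError` rather than silently padded. A minimal sketch (the shortened `short_titles` list and the `raised` flag are illustrative):

```python
import pandas as pd

characters = ["Satoru Gojo", "Ryomen Sukuna", "Kenjaku"]
short_titles = ['Honored One', 'King of Curses']  # one label short on purpose

raised = False
try:
    pd.Series(characters, index=short_titles)
except ValueError as err:
    raised = True
    print(f'ValueError: {err}')

# With one label per value, the Series builds normally.
titled = pd.Series(characters, index=short_titles + ['Mastermind'])
print(titled['Mastermind'])  # Kenjaku
```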

In [3]:
# Passing an index that includes a key not present in the dictionary

jjk_domains = {
    'Satoru Gojo': 'Infinity',
    'Ryomen Sukuna': 'Malevolent Shrine',
    'Mahito': 'Self-Embodiment of Perfection'
}

pd.Series(jjk_domains, index=["Mahito", 'Ryomen Sukuna', 'Yuta Okkotsu'])

Mahito           Self-Embodiment of Perfection
Ryomen Sukuna                Malevolent Shrine
Yuta Okkotsu                               NaN
dtype: object

Because the dictionary has no domain for Yuta Okkotsu, pandas marks that entry as missing, displayed here as NaN. Conversely, Satoru Gojo and his Infinity, which were in the original dictionary but not in the requested index, are left out of the Series.
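One way to read this behaviour: passing `index=` together with a dictionary selects and orders the entries by those labels, much like building the Series from the full dictionary and then calling `reindex()`. A minimal sketch that checks this equivalence for the example above (the `wanted` list is just the index used in the cell):

```python
import pandas as pd

jjk_domains = {
    'Satoru Gojo': 'Infinity',
    'Ryomen Sukuna': 'Malevolent Shrine',
    'Mahito': 'Self-Embodiment of Perfection',
}
wanted = ['Mahito', 'Ryomen Sukuna', 'Yuta Okkotsu']

from_index = pd.Series(jjk_domains, index=wanted)
# Build from the whole dictionary, then keep only the wanted labels.
reindexed = pd.Series(jjk_domains).reindex(wanted)

print(from_index.equals(reindexed))  # True
print(from_index.isna().tolist())    # [False, False, True]
print('Satoru Gojo' in from_index)   # False
```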