#### Notes on 
# Series Data Structure

##### A Series is like a cross between a list and a dictionary.
The items are stored in order, and each item has a label you can use to retrieve it.
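
As a rough illustration of that list-and-dictionary duality, the sketch below reads the same Series once by position and once by label. The values and labels are borrowed from the examples later in these notes; the variable names `first_by_position` and `jack_by_label` are made up for this example.

```python
import pandas as pd

# A small Series with explicit labels (same data used later in these notes).
s = pd.Series(['Physics', 'Chemistry', 'English'],
              index=['Alice', 'Jack', 'Molly'])

# List-like access: .iloc looks items up by integer position.
first_by_position = s.iloc[0]

# Dictionary-like access: .loc looks items up by label.
jack_by_label = s.loc['Jack']

print(first_by_position)
print(jack_by_label)
```

Using `.iloc` and `.loc` explicitly, rather than plain square brackets, makes it clear whether a lookup is by position or by label.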

In [1]:
import pandas as pd

In [2]:
# Create a series by passing in a list of values. 

students = ['Alice', 'Jack', 'Molly']
pd.Series(students)

0    Alice
1     Jack
2    Molly
dtype: object

In [7]:
# Underneath, pandas stores Series values in a typed array using the NumPy library.
# This offers a significant speedup when processing data compared with traditional Python lists.

numbers = [1, 2, 3]
pd.Series(numbers)
# On this machine, the resulting dtype is int64.

0    1
1    2
2    3
dtype: int64
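
One way to see the typed array underneath is to ask the Series for it directly. This is a minimal sketch using `Series.to_numpy()`; the names `arr` and `total` are chosen only for this example, and no timing claims are made here.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# to_numpy() exposes the values as a NumPy array.
arr = s.to_numpy()
print(type(arr), arr.dtype)

# Operations on the array run over the whole typed buffer at once,
# instead of looping over individual Python objects.
total = arr.sum()
print(total)
```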

###### How does Pandas deal with Missing Values?

In [5]:
# How NumPy, and thus pandas, handles missing data:

# Underneath, pandas does some type conversion. If we create a list of strings and one
# element is None, pandas stores it as None and uses the object dtype for the
# underlying array.

students = ['Alice', 'Jack', None]
pd.Series(students)

0    Alice
1     Jack
2     None
dtype: object

In [8]:
# However, if we create a list of numbers (integers or floats) that includes None,
# pandas automatically converts the None to a special floating-point value called NaN,
# which stands for "Not a Number".

numbers = [1, 2, None]
pd.Series(numbers)

0    1.0
1    2.0
2    NaN
dtype: float64

In [8]:
# Underneath, pandas represents NaN as a floating-point number, and because integers can be
# cast to floats, pandas converted our integers to floats.

# So if you're wondering why the list of integers you put into a Series came out as floats,
# it's probably because there is some missing data.
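
If keeping integers alongside missing values matters, one option (not covered in these notes, so treat it as an aside) is to request pandas' nullable integer dtype with `dtype="Int64"`. The sketch below compares it with the default behaviour; the variable names are made up for this example.

```python
import pandas as pd

# Default behaviour: None becomes NaN and the integers are upcast to float64.
as_float = pd.Series([1, 2, None])
print(as_float.dtype)

# Assumption: the nullable extension dtype "Int64" (capital I) is available,
# which is the case in reasonably recent pandas versions.
as_nullable_int = pd.Series([1, 2, None], dtype="Int64")
print(as_nullable_int.dtype)

# The missing entry is still reported as missing.
print(as_nullable_int.isna().tolist())
```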

##### Properties of NaN

In [9]:
# NaN is *NOT* equivalent to None

import numpy as np
np.nan == None

False

In [10]:
# An equality test of NaN against itself doesn't work either:
# the answer is always False.

np.nan == np.nan

False

In [11]:
# Instead, you need to use special functions to test for NaN,
# such as NumPy's isnan().

np.isnan(np.nan)

True
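
`np.isnan()` works on floating-point values, but it is not meant for `None` or for strings. For a Series, a sketch of the usual alternative is `Series.isna()` (or the top-level `pd.isna()`), which treats both `None` and `NaN` as missing. The variable names below are chosen only for this example.

```python
import numpy as np
import pandas as pd

names = pd.Series(['Alice', 'Jack', None])
numbers = pd.Series([1, 2, None])

# isna() returns a boolean Series marking the missing entries.
names_missing = names.isna().tolist()
numbers_missing = numbers.isna().tolist()
print(names_missing)
print(numbers_missing)

# The top-level function also accepts scalars.
print(pd.isna(None), pd.isna(np.nan))
```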

##### Ways to create Series 

In [11]:
# A Series can be created directly from a dictionary.
# The index is automatically set to the keys of the dictionary.

students_scores = {'Alice': 'Physics',
                   'Jack': 'Chemistry',
                   'Molly': 'English'}
s = pd.Series(students_scores)
s

Alice      Physics
Jack     Chemistry
Molly      English
dtype: object

In [12]:
# Once the series has been created, we can get the index object using the index attribute.

s.index

Index(['Alice', 'Jack', 'Molly'], dtype='object')

In [13]:
# The object dtype is not just for strings, but for arbitrary objects.
# Let's create a list of tuples.

students = [("Alice","Brown"), ("Jack", "White"), ("Molly", "Green")]
pd.Series(students)

# We see that each of the tuples is stored in the series object, and the type is object.

0    (Alice, Brown)
1     (Jack, White)
2    (Molly, Green)
dtype: object

In [14]:
# We can also pass the index explicitly as a list when creating the Series.

s = pd.Series(['Physics', 'Chemistry', 'English'], index=['Alice', 'Jack', 'Molly'])
s

Alice      Physics
Jack     Chemistry
Molly      English
dtype: object

In [15]:
# If you pass both a dictionary and an explicit index, pandas overrides the automatic
# index creation and uses exactly the index values you provided.

# Dictionary keys that are not in your index are ignored, and any index value that is
# not a key in the dictionary gets a missing value (None or NaN).


students_scores = {'Alice': 'Physics',
                   'Jack': 'Chemistry',
                   'Molly': 'English'}

s = pd.Series(students_scores, index=['Alice', 'Molly', 'Sam'])
s

Alice    Physics
Molly    English
Sam          NaN
dtype: object
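
Another way to think about the result above is as a reindexing step: build the Series from the dictionary, then keep only the requested labels. The sketch below compares the two routes; `reindex` is a standard Series method, while the names `direct` and `reindexed` are made up for this example.

```python
import pandas as pd

students_scores = {'Alice': 'Physics',
                   'Jack': 'Chemistry',
                   'Molly': 'English'}
wanted = ['Alice', 'Molly', 'Sam']

# Route 1: pass the index when constructing the Series.
direct = pd.Series(students_scores, index=wanted)

# Route 2: build from the dictionary first, then reindex to the wanted labels.
reindexed = pd.Series(students_scores).reindex(wanted)

# 'Jack' is dropped and 'Sam' is filled with a missing value in both cases.
print(direct.isna().tolist())
print(direct.equals(reindexed))
```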