<br>
We want to explore:

* How to store and manipulate single-dimensional indexed data in the Series object
* How to create a Series from lists and dictionaries
* How indices on data work
* How pandas typecasts data, including missing values

In [1]:
import pandas as pd


<br>
The Series is one of the core data structures in pandas. It is a cross between a list and a dictionary.

The items are all stored in order, and there are labels with which you can retrieve them. An easy way to visualize this is as two columns of data. The first is the special index, a lot like the keys in a dictionary, while the second is your actual data. It's important to note that the data column has a label of its own, which can be retrieved using the .name attribute.

This is different from dictionaries and is useful when it comes to merging multiple columns of data.
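
As a small sketch of the two-column picture, the snippet below builds a Series with an explicit index and a name. The variable `majors` and the name "major" are illustrative choices, not part of the original notebook.

```python
import pandas as pd

# Illustrative data: the labels form the index, the strings are the values.
majors = pd.Series(
    ["Software Engineering", "Nursing"],
    index=["Amir", "Sara"],
    name="major",  # hypothetical label for the data column
)

print(majors.index.tolist())  # the index labels
print(majors.tolist())        # the data values
print(majors.name)            # the label of the data column
```

The name travels with the Series, which is what makes it handy when several Series are later combined as columns.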

<br>
In the following example, pandas automatically identifies the type of data in the Series as "object" and sets the dtype parameter accordingly. The values are indexed with integers, starting at zero.

# Create a Series from lists

In [2]:
students = ['Amir', 'Reza', 'Saleh']

pd.Series(students)

0     Amir
1     Reza
2    Saleh
dtype: object

<br>
Underneath, pandas stores Series values in a typed array using the NumPy library. This offers a significant speedup when processing data compared with traditional Python lists.

In [3]:
numbers = [1,2,3,4]

pd.Series(numbers)

0    1
1    2
2    3
3    4
dtype: int64
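
One way to look at the typed array behind a Series is `to_numpy()`. The sketch below is only an illustration; the names `values` and `doubled` are made up for this example.

```python
import numpy as np
import pandas as pd

numbers = pd.Series([1, 2, 3, 4])

# to_numpy() exposes the values as a NumPy array
values = numbers.to_numpy()
print(type(values))
print(values.dtype)

# Arithmetic is applied to the whole array at once, with no Python loop
doubled = numbers * 2
print(doubled.tolist())
```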

How do NumPy, and thus pandas, ***handle missing data***?

In Python, we have the **None type** to indicate **a lack of data**.

Underneath, pandas does some **type conversion**. If we create **a list of strings** in which one element is a **None type**, **pandas keeps the None as it is and uses the type object for the underlying array**. The same object type is used when a list of strings contains another kind of value, such as an integer.


In [4]:
students = ["Amir", 3, 'Saleh']

pd.Series(students)

0     Amir
1        3
2    Saleh
dtype: object

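The dtype is object because **an object array can hold any Python value**: the integer is stored as it is, next to the strings, rather than being converted to a string.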
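
To check what happens to the integer, one option is to inspect the Python type of each stored element. This is a minimal sketch; `mixed` and `element_types` are illustrative names.

```python
import pandas as pd

mixed = pd.Series(["Amir", 3, "Saleh"])

# Each element keeps its original Python type inside the object array
element_types = [type(value).__name__ for value in mixed]
print(element_types)
print(mixed.dtype)
```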

If we create **a list of numbers, integers or floats,** and put in the **None type**, **pandas automatically converts this to a special floating point value designated as NaN, which stands for "Not a Number"**.


In [6]:
numbers = [1, 2, None, 4]

pd.Series(numbers)

0    1.0
1    2.0
2    NaN
3    4.0
dtype: float64

You'll notice a couple of things:

* First, **NaN is a different value from None**.
* Second, pandas set the dtype of this Series to floating point numbers instead of object or ints.

Why not just leave this as an integer?

Underneath, pandas represents NaN as a floating point number, and because **integers can be typecast to floats**, pandas converted our integers to floats.

So, if you're wondering why a list of integers you put into a Series comes back with a dtype of float64, it's probably because there is some **missing data**.
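
A quick way to confirm this is to compare the dtype of a complete list with one that has a gap. The sketch below uses the illustrative names `complete` and `with_gap`.

```python
import pandas as pd

complete = pd.Series([1, 2, 3, 4])
with_gap = pd.Series([1, 2, None, 4])

print(complete.dtype)   # an integer dtype
print(with_gap.dtype)   # a floating point dtype

# isna() marks the missing positions; sum() counts them
print(with_gap.isna().sum())
```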

<br>
None and NaN might be used by the data scientist in the same way, to denote missing data, but underneath they are not represented by pandas in the same way.

In [7]:
import numpy as np

**NaN is NOT equivalent to None**, and when we try the **equality test**, the result is False.

In [8]:
np.nan == None

False

We actually **can't do an equality test of NaN to itself**. When you do, the answer is always False.

In [9]:
np.nan == np.nan

False

We need to use special functions **to test for the presence of NaN (not a number)**, such as the **NumPy library's isnan()**.

In [10]:
np.isnan(np.nan)

True

So keep in mind that when we see **NaN, its meaning is similar to None,** but it's **a numeric value and treated differently** for efficiency reasons.
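
Besides `np.isnan()`, pandas has its own check, `pd.isna()`, which treats both None and NaN as missing. The snippet below is a small sketch of that behavior; `missing_mask` is an illustrative name.

```python
import numpy as np
import pandas as pd

# pd.isna() reports both None and NaN as missing
print(pd.isna(None))
print(pd.isna(np.nan))

# On a Series it returns one boolean per element
numbers = pd.Series([1, 2, None, 4])
missing_mask = numbers.isna()
print(missing_mask.tolist())
```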

# Create a Series from a dictionary

Oftentimes, we have labeled data that we want to manipulate.

A Series can be created directly from dictionary data. If you do this, **the index is automatically assigned to the keys of the dictionary** that you provided, and not just incrementing integers.


In [11]:
student_major = {"Amir" : "Software Engineering",
                 "Sara" : "Nursing",
                 "Farshad" : "Medical",
                 "Shirin" : "Interior Design"}

serie = pd.Series(student_major)
serie

Amir       Software Engineering
Sara                    Nursing
Farshad                 Medical
Shirin          Interior Design
dtype: object

We see that, since it was string data, pandas set the data type of the Series to "object". We also see that the index, the first column, is a list of strings.

The dtype is also object when **the values mix integers and strings**, because an object array can hold both.

In [12]:
student_major = {'Amir' : 5,
                 'Sara' : 4,
                 'Farhad' : 3,
                 'Shirin' : 'Interior Design'}

pd.Series(student_major)

Amir                    5
Sara                    4
Farhad                  3
Shirin    Interior Design
dtype: object

Once the series has been created, we can get the index object using the **.index** attribute.

In [13]:
serie.index

Index(['Amir', 'Sara', 'Farshad', 'Shirin'], dtype='object')

In [14]:
numbers = [1, 2, None, 4]

pd.Series(numbers).index

RangeIndex(start=0, stop=4, step=1)

<br>
A lot of things in pandas are implemented as NumPy arrays and have the dtype value set. This is true of indices too: for serie.index above, pandas inferred that we were using string objects for the index.
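
The two kinds of index seen so far can be compared side by side. This is a minimal sketch with the illustrative names `by_name` and `by_position`.

```python
import pandas as pd

by_name = pd.Series({"Amir": "Software Engineering", "Sara": "Nursing"})
by_position = pd.Series([1, 2, None, 4])

# Dictionary keys become the index labels
print(by_name.index.tolist())

# Without explicit labels, pandas builds a RangeIndex of positions
print(isinstance(by_position.index, pd.RangeIndex))
print(by_position.index.tolist())

# A label from the index retrieves the matching value
print(by_name["Sara"])
```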

<br>
The dtype of object is not just for strings, but for arbitrary objects.

In [15]:
students = [('Amir Hosein', 'Sedaghati'), ('Reza', 'Avar'), ('Saleh', 'Falah')]

pd.Series(students)

0    (Amir Hosein, Sedaghati)
1                (Reza, Avar)
2              (Saleh, Falah)
dtype: object

<br>
We see that each of the tuples is stored in the Series object, and the type is object.

You can also create the index separately from the data by passing it explicitly as a list.

In [16]:
pd.Series(data=['Software Engineering', 'Nursing', 'Medical', 'Interior Design'],
          index=["Amir", "Sara", "Farshad", "Shirin"])

Amir       Software Engineering
Sara                    Nursing
Farshad                 Medical
Shirin          Interior Design
dtype: object
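
Passing the data and the index as two lists should give the same Series as passing a dictionary. The sketch below checks this with `equals()`; the names `from_lists` and `from_dict` are illustrative.

```python
import pandas as pd

majors = ["Software Engineering", "Nursing", "Medical", "Interior Design"]
names = ["Amir", "Sara", "Farshad", "Shirin"]

from_lists = pd.Series(data=majors, index=names)
from_dict = pd.Series(dict(zip(names, majors)))

# equals() compares both the index labels and the values
print(from_lists.equals(from_dict))
```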

So, what happens if the list of values in your index is not aligned with the keys in your dictionary when creating the Series?

Pandas will ignore all keys in your dictionary that are not in your index, and will add None or NaN values for any index value you provide that is not in your dictionary's keys.

In [17]:
student_major = {"Amir" : "Software Engineering",
                 "Sara" : "Nursing",
                 "Farshad" : "Medical",
                 "Shirin" : "Interior Design"}

pd.Series(student_major, index= ["Amir", "Sara", "Davood", "Reza"])

Amir      Software Engineering
Sara                   Nursing
Davood                     NaN
Reza                       NaN
dtype: object

<br>
The result is that the Series object doesn't have Farshad and Shirin in it, even though they were in our original dataset, but it explicitly does have Davood and Reza in it as missing values.
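
The alignment can also be checked programmatically. The following sketch repeats the example with two illustrative names, `requested` and `aligned`, and lists which labels were dropped or left missing.

```python
import pandas as pd

student_major = {
    "Amir": "Software Engineering",
    "Sara": "Nursing",
    "Farshad": "Medical",
    "Shirin": "Interior Design",
}
requested = ["Amir", "Sara", "Davood", "Reza"]

aligned = pd.Series(student_major, index=requested)

# Keys that were not requested are dropped
print("Farshad" in aligned.index)

# Requested labels without a matching key are marked as missing
print(aligned[aligned.isna()].index.tolist())
```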