# The Series Data Structure

The Series is one of the core data structures in pandas, and it is a cross between a list and a dictionary. The items are all stored in order, and there are labels with which you can retrieve them.

An easy way to visualize this is as two columns of data. The first is the special index, a lot like the keys in a dictionary, while the second is your actual data.

It's important to note that the data column has a label of its own, which can be retrieved using the .name attribute. This is different from dictionaries and is useful when it comes to merging multiple columns of data.
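As a minimal sketch of why the name matters, the snippet below builds two named series and places them side by side with `pd.concat`, which is one possible way of combining columns. The values and the names `subject` and `grade` are made up for illustration.

```python
import pandas as pd

# Two series sharing the same index; names and values are illustrative only
subjects = pd.Series(['Physics', 'Chemistry'], index=['Alice', 'Jack'], name='subject')
grades = pd.Series([85, 72], index=['Alice', 'Jack'], name='grade')

# The label of the data column is available through .name
print(subjects.name)

# When the series are combined column-wise, each name becomes a column label
combined = pd.concat([subjects, grades], axis=1)
print(list(combined.columns))
```

Without names, the combined columns would fall back to plain integer labels, which is harder to work with.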

Let's import pandas to get started.

In [1]:
import pandas as pd

One of the easiest ways to create a series is to use an array-like object, like a list. When you pass in a list of values, pandas automatically assigns an integer index starting at zero and sets the name of the series to None.

In [4]:
#Here I'll make a list of three students, Alice, Jack, and Molly, all as strings
students = ['Alice', 'Jack', 'Molly']

#Now we just call the Series function in pandas and pass in the students
print(pd.Series(students))

print('-----'*3)

#Let's create a little list of numbers
numbers = [1, 2, 3]

#And turn that into a series
print(pd.Series(numbers))

0    Alice
1     Jack
2    Molly
dtype: object
---------------
0    1
1    2
2    3
dtype: int64


The result is a Series object which is nicely rendered to the screen. We see here that pandas has automatically identified the type of data in the first Series as "object" and in the second as "int64", and set the dtype attribute as appropriate. We also see that the values are indexed with integers, starting at zero.
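The same facts can be checked programmatically rather than read off the printed output. The following is a small sketch using the `dtype`, `index`, and `name` attributes of a series; the exact integer dtype reported may depend on your platform and library versions.

```python
import pandas as pd

numbers = pd.Series([1, 2, 3])

# dtype holds the type pandas inferred for the values
print(numbers.dtype)

# The automatically assigned index counts up from zero
print(list(numbers.index))

# No name was given when the series was created, so .name is None
print(numbers.name)
```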

While a list might be a common way to create some play data, often you have labeled data that you want to manipulate. A series can be created directly from a dictionary. If you do this, the index is automatically set to the keys of the dictionary that you provided rather than to incrementing integers.

In [7]:
students_scores = {'Alice': 'Physics',
                   'Jack': 'Chemistry',
                   'Molly': 'English'}
s = pd.Series(students_scores)
print(s)

print('-----'*3)

#Once the series has been created, we can get the index object using the index attribute.
#Note: concatenating a string with s.index would prepend it to every label, so we print the index on its own.
print(s.index)

Alice      Physics
Jack     Chemistry
Molly      English
dtype: object
---------------
Index(['Alice', 'Jack', 'Molly'], dtype='object')
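Because the index now holds the dictionary keys, values can be looked up by label as well as by position. The sketch below uses the `.loc` and `.iloc` accessors, which are not introduced in the text above, to illustrate the "cross between a list and a dictionary" idea.

```python
import pandas as pd

s = pd.Series({'Alice': 'Physics', 'Jack': 'Chemistry', 'Molly': 'English'})

# Dictionary-style lookup by label
print(s.loc['Jack'])

# List-style lookup by position
print(s.iloc[0])

# The index can be converted back to a plain list of the original keys
print(s.index.tolist())
```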


The object dtype is not just for strings; it can be used with arbitrary Python objects. Let's create a more complex type of data, say, a list of tuples.

In [8]:
students = [("Alice","Brown"), ("Jack", "White"), ("Molly", "Green")]
pd.Series(students)

#As we can see, each of the tuples is stored in the series object and the type is object.

0    (Alice, Brown)
1     (Jack, White)
2    (Molly, Green)
dtype: object

You can also separate your index creation from the data by passing in the index as a list explicitly to the series.

In [9]:
s = pd.Series(['Physics', 'Chemistry', 'English'], index=['Alice', 'Jack', 'Molly'])
s

Alice      Physics
Jack     Chemistry
Molly      English
dtype: object
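As a quick check, passing the values and the index separately should give the same series as the dictionary route used earlier. The sketch below compares the two with `Series.equals`, and also shows that a list of values must match the index in length. The variable names are chosen for illustration.

```python
import pandas as pd

values = ['Physics', 'Chemistry', 'English']
labels = ['Alice', 'Jack', 'Molly']

from_lists = pd.Series(values, index=labels)
from_dict = pd.Series(dict(zip(labels, values)))

# Both construction routes give the same values and the same index
print(from_lists.equals(from_dict))

# With a list of values, the index must have the same length as the data
try:
    pd.Series(values, index=['Alice', 'Jack'])
    length_error = None
except ValueError as err:
    length_error = err
print(length_error)
```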

An important point: if you pass both a dictionary and an explicit index, and the index values are not aligned with the dictionary keys, pandas overrides the automatic index creation and uses all of, and only, the index values that you provided. Any dictionary key that is not in your index is ignored, and pandas adds a missing value (NaN) for any index value that is not a key in the dictionary.

In [10]:
students_scores = {'Alice': 'Physics',
                   'Jack': 'Chemistry',
                   'Molly': 'English'}
#When I create the series object though I'll only ask for an index with three students, and I'll exclude Jack
s = pd.Series(students_scores, index=['Alice', 'Molly', 'Sam'])
print(s)

Alice    Physics
Molly    English
Sam          NaN
dtype: object
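The effect of the mismatched index can be inspected directly. The following sketch repeats the construction above and then uses `isna` and `dropna` to find and remove the entry that has no data.

```python
import pandas as pd

students_scores = {'Alice': 'Physics', 'Jack': 'Chemistry', 'Molly': 'English'}
s = pd.Series(students_scores, index=['Alice', 'Molly', 'Sam'])

# Jack was a dictionary key but is not in the requested index, so he is dropped
print('Jack' in s.index)

# Sam is in the index but not in the dictionary, so his value is missing
print(s.isna())

# dropna() keeps only the entries that actually have data
print(s.dropna().index.tolist())
```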
