# Series Data Structure In Pandas

The Series is one of the core data structures in pandas. You can think of it as a cross between a list and a dictionary: the items are stored in order, and each one has a label you can use to retrieve it. An easy way to visualize a Series is as two columns of data. The first is the special index, much like the keys in a dictionary, while the second holds your actual data. Note that the data column has a label of its own, which can be retrieved with the .name attribute. Dictionaries have nothing equivalent, and this label is useful when you combine multiple columns of data.
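The two-column picture can be checked directly. The minimal sketch below shows the labels through `.index`, the data through `.tolist()`, and the data column's own label through `.name`, then uses `pd.concat` to show that the name becomes a column label when Series are combined. The variables `trainers` and `towns`, the labels `t1`/`t2`, and the series names are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the index labels play the role of dictionary keys
trainers = pd.Series(['Ash', 'Misty'], index=['t1', 't2'], name='trainer')
towns = pd.Series(['Pallet', 'Cerulean'], index=['t1', 't2'], name='town')

print(trainers.index.tolist())  # the "first column": the labels
print(trainers.tolist())        # the "second column": the data
print(trainers.name)            # the label attached to the data column itself
print(trainers['t2'])           # dictionary-style lookup by label
print(trainers.iloc[0])         # list-style lookup by position

# When Series are combined side by side, their names become column labels
table = pd.concat([trainers, towns], axis=1)
print(table)
```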

## Creating Series Data Structures In Pandas

### 1. Passing A List To The Series Constructor

Let's see the syntax for this:

###### < Variable > = pd.Series(< List >)

Let's see some examples.

In [1]:
# First of all, let's import pandas
import pandas as pd

In [2]:
# Now let's pass a list to create a series
sampleList=['Ash','May','Brock']

SeriesStructure1=pd.Series(sampleList)

print(SeriesStructure1)

0      Ash
1      May
2    Brock
dtype: object


As shown above, when we pass a list to pd.Series, pandas assigns every element of the list an integer index, starting from zero.
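A short sketch of what these default labels mean in practice: they are the integers 0, 1, 2, and they behave as labels rather than positions, so each value keeps its label after filtering. The variable names are illustrative only.

```python
import pandas as pd

trainers = pd.Series(['Ash', 'May', 'Brock'])

# The default index is a sequence of integers starting at 0
print(trainers.index.tolist())

# Labels stay attached to their values: after filtering,
# 'Brock' still has label 2 even though it is now the second element
not_ash = trainers[trainers != 'Ash']
print(not_ash)
print(not_ash.loc[2])
```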

Another thing to observe in the output is that pandas has automatically inferred and printed the dtype of the newly created Series. Every element of the list was a Python string, and pandas stores strings with the general-purpose object dtype (object is used for arbitrary Python objects, not only strings), so the resulting Series has dtype object.

In [3]:
# Example 2: this time, instead of strings, we pass numbers.
sampleList=[90,3,45]

SeriesStructure2=pd.Series(sampleList)

print(SeriesStructure2)

0    90
1     3
2    45
dtype: int64


This time the dtype of the Series is int64, because pandas inferred a 64-bit integer type from the list of Python integers.

Also note that pandas is built on top of NumPy; dtypes such as int64 and float64 are NumPy dtypes.
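Because pandas is built on NumPy, the dtype a Series reports can be compared directly with a NumPy dtype, and the underlying data can be pulled out as a NumPy array with `.to_numpy()`. A minimal sketch:

```python
import numpy as np
import pandas as pd

scores = pd.Series([90, 3, 45])

# pandas infers a 64-bit integer dtype from plain Python ints
print(scores.dtype)

# The data can be extracted as an ordinary NumPy array
arr = scores.to_numpy()
print(type(arr), arr.dtype)
```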

##### Important

Now let's discuss a few very important things. In Python, when we want to indicate a lack of data, we generally use the None keyword.
Now suppose we pass a list of strings to pd.Series and one of its elements is None. In this case pandas keeps the None as it is and assigns it an index in the resulting Series, just like every other element.

Note that pandas does not convert None into a string or any other type here. Because the Series already has the general object dtype (the common type of the list's elements), None is simply stored as one more Python object.

Let's see an example.

In [4]:
sampleList=['Ashutosh','7',None]

SeriesConstruct3=pd.Series(sampleList)

print(SeriesConstruct3)

0    Ashutosh
1           7
2        None
dtype: object
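Even though None is kept as-is, pandas still recognizes it as a missing value. A minimal sketch using `isna()`; `dtype=object` is passed explicitly only to pin down the object behaviour shown in the output above, since newer pandas versions may infer a dedicated string dtype for all-string data.

```python
import pandas as pd

# dtype=object pinned so None is stored as a plain Python object
names = pd.Series(['Ashutosh', '7', None], dtype=object)

print(names[2] is None)  # the element really is Python's None
print(names.isna())      # ...but pandas still flags it as missing
```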


Now, when we create a list of numbers, whether integers or floats, and add a None element to it, pandas converts the None into a special floating-point value called NaN, which stands for Not a Number. Let's see an example.

In [5]:
sampleList=[4,6,8,None]

SeriesConstruct=pd.Series(sampleList)

print(SeriesConstruct)

0    4.0
1    6.0
2    8.0
3    NaN
dtype: float64


One more important thing to note. We passed in integers plus a None value, but pandas set the dtype of the resulting Series to float64. This is because of NaN: NaN is a floating-point value and an int64 Series cannot hold it, so pandas converts the integer elements to floats as well and the Series gets the common dtype float64.
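The upcast can be confirmed by comparing the dtype of the same numbers with and without the missing value. The integers survive as floats, and `isna()` locates the NaN:

```python
import numpy as np
import pandas as pd

without_missing = pd.Series([4, 6, 8])
with_missing = pd.Series([4, 6, 8, None])

print(without_missing.dtype)  # integers only: stays integer
print(with_missing.dtype)     # None became NaN, so everything is float
print(with_missing.isna().tolist())
```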

It is important to stress that although a data scientist may use None and NaN in the same way, to denote missing data, pandas does not represent them the same way underneath.

NaN is *NOT* equivalent to None, and when we test them for equality the result is False.

Let's bring in NumPy, which lets us generate a NaN value (np.nan).

In [6]:
import numpy as np

In [7]:
np.nan == None

False

It turns out that you can't meaningfully test NaN for equality, not even with itself. When you do, the answer is always False.

In [8]:
np.nan==np.nan

False

Instead, you need to use special functions to test for the presence of not-a-number, such as NumPy's isnan().

In [9]:
np.isnan(np.nan)

True
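One caveat: np.isnan() only accepts numbers, so passing it None raises a TypeError. pandas provides `pd.isna()`, which treats both None and NaN as missing, so it is usually the safer test when data may contain either. A minimal sketch:

```python
import numpy as np
import pandas as pd

print(pd.isna(np.nan))  # NaN counts as missing
print(pd.isna(None))    # so does None

try:
    np.isnan(None)
except TypeError as err:
    # np.isnan does not accept None at all
    print('np.isnan(None) failed:', err)
```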

### 2. Creating A Series Via A Dictionary
So far we have created Series from lists. In that case the labels/indexes were generated automatically, while the data linked to them came from the list.

Another way of creating a Series is from a dictionary. Here the keys of the dictionary become the labels/indexes, and the values become the data linked to them.

This lets us choose the labels ourselves, which we could not do when we passed a list on its own.

Let's understand this with an example.

In [3]:
sampleDictionary={'A':'Amanda','B':'Cole','C':'Dwayne'}

newConstruct=pd.Series(sampleDictionary)

print(newConstruct)

A    Amanda
B      Cole
C    Dwayne
dtype: object
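Since the dictionary keys become labels, elements can be fetched by key just as in the original dictionary, while position-based access still works through `.iloc`. The dictionary's insertion order is kept, and `.to_dict()` converts the Series back. A small sketch:

```python
import pandas as pd

sample_dictionary = {'A': 'Amanda', 'B': 'Cole', 'C': 'Dwayne'}
people = pd.Series(sample_dictionary)

print(people['B'])       # look up by label, like a dict key
print(people.iloc[1])    # or by position, like a list
print(people.to_dict() == sample_dictionary)  # round-trips back to a dict
```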


#### The Index Attribute

The indexes/labels of an existing Series can be accessed via the index attribute.

The general syntax is as follows:

###### < Name Of Series >.index

Let's see an example.

In [5]:
newConstruct.index

Index(['A', 'B', 'C'], dtype='object')

As you can see, the index also has dtype object, because the dictionary keys were strings.
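Because the labels live in the index attribute, they can also be replaced wholesale by assigning a new list of the same length to `.index`; the data keeps its order. The new labels below are hypothetical.

```python
import pandas as pd

people = pd.Series({'A': 'Amanda', 'B': 'Cole', 'C': 'Dwayne'})

# Replace the labels; the values stay in the same order
people.index = ['first', 'second', 'third']
print(people)
print(people['second'])
```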

Now, this is interesting: the object dtype is not just for strings but for arbitrary Python objects. Let's create a more complex type of data, say, a list of tuples.

In [12]:
sampleTuple= [("Alice","Brown"), ("Jack", "White"), ("Molly", "Green")]
newConstruct=pd.Series(sampleTuple)
print(newConstruct)

0    (Alice, Brown)
1     (Jack, White)
2    (Molly, Green)
dtype: object


We see that each of the tuples is stored in the series object, and the type is object.
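Each element keeps its original Python type inside an object Series, so the tuples can still be unpacked after they are retrieved. A minimal sketch:

```python
import pandas as pd

sample_tuples = [("Alice", "Brown"), ("Jack", "White"), ("Molly", "Green")]
full_names = pd.Series(sample_tuples)

print(full_names.dtype)       # object: arbitrary Python objects
first, last = full_names[1]   # each element is still a real tuple
print(first, last)
```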

You can also create the index separately from the data by passing it explicitly as a list through the index parameter.

In [13]:
s = pd.Series(['Physics', 'Chemistry', 'English'], index=['Alice', 'Jack', 'Molly'])
print(s)

Alice      Physics
Jack     Chemistry
Molly      English
dtype: object
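When the index is passed as a list like this, it must contain exactly one label per value; a length mismatch is rejected with a ValueError. A minimal sketch:

```python
import pandas as pd

subjects = pd.Series(['Physics', 'Chemistry', 'English'],
                     index=['Alice', 'Jack', 'Molly'])
print(subjects['Jack'])

try:
    # Three values but only two labels
    pd.Series(['Physics', 'Chemistry', 'English'], index=['Alice', 'Jack'])
except ValueError as err:
    print('Rejected:', err)
```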


In [14]:
# So what happens if the values in your index list are not aligned with the keys
# in the dictionary you use to create the Series? pandas overrides the automatic
# creation and uses exactly the index values you provided: it ignores every
# dictionary key that is not in your index, and it fills in a missing value (NaN)
# for every index value that is not a key in your dictionary.

# Here's an example. I'll pass in a dictionary of three items, in this case students
# and their courses.
students_courses = {'Alice': 'Physics',
                    'Jack': 'Chemistry',
                    'Molly': 'English'}
# When I create the Series, though, I'll ask for an index of three students that
# excludes Jack and adds Sam, who is not in the dictionary.
s = pd.Series(students_courses, index=['Alice', 'Molly', 'Sam'])
s

Alice    Physics
Molly    English
Sam          NaN
dtype: object

In [15]:
# The result is that the Series object doesn't have Jack in it, even though he was in our
# original dataset, but it explicitly does have Sam in it as a missing value.
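For comparison, the same result can be reached in two steps: build the Series from the full dictionary, then call `reindex()` with the labels you want. Labels missing from the new list are dropped, and new labels are filled with NaN. A sketch:

```python
import pandas as pd

students_courses = {'Alice': 'Physics', 'Jack': 'Chemistry', 'Molly': 'English'}

# One step: pass the desired labels directly to the constructor
direct = pd.Series(students_courses, index=['Alice', 'Molly', 'Sam'])
# Two steps: build from the whole dictionary, then select/extend the labels
two_step = pd.Series(students_courses).reindex(['Alice', 'Molly', 'Sam'])

print(direct)
print(two_step)
print('Jack' in direct.index)  # Jack was dropped
```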