Previously we looked at the index of a DataFrame. Here we focus on the index of a Series, which is just as central to how pandas works.

In [1]:
import pandas as pd

In [2]:
drinks = pd.read_csv('http://bit.ly/drinksbycountry')

In [3]:
drinks.head()

,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,Asia
1,Albania,89,132,54,4.9,Europe
2,Algeria,25,0,14,0.7,Africa
3,Andorra,245,138,312,12.4,Europe
4,Angola,217,57,45,5.9,Africa


We saw earlier that a DataFrame has an index. A Series selected from a DataFrame also has an index, and it inherits that index from the DataFrame.

For instance: 

In [4]:
drinks.continent.head()

0      Asia
1    Europe
2    Africa
3    Europe
4    Africa
Name: continent, dtype: object

Here we see 0, 1, 2, 3, 4, which is the index of the 'continent' Series; it came from the DataFrame. The index is on the left and the values are on the right.

Let's say we didn't keep the default index for the DataFrame and instead set one of its columns as the index. For example:

In [5]:
drinks.set_index('country', inplace=True)
drinks.head()

country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
Afghanistan,0,0,0,0.0,Asia
Albania,89,132,54,4.9,Europe
Algeria,25,0,14,0.7,Africa
Andorra,245,138,312,12.4,Europe
Angola,217,57,45,5.9,Africa


Here 'country' has become the index. Let's see what happens when we select the 'continent' Series again:

In [6]:
drinks.continent.head()

country
Afghanistan      Asia
Albania        Europe
Algeria        Africa
Andorra        Europe
Angola         Africa
Name: continent, dtype: object

The result has the same layout as before: the index is on the left and the values are on the right. The values are the actual content of the Series, while the index came from the DataFrame and labels each row.

We have actually seen Series many times before, and they all included an index; we probably just didn't notice it. For example:
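
As a self-contained check of this behaviour, the sketch below builds a tiny stand-in DataFrame (the name `df` and the three rows are assumptions for illustration, copied from the head() output rather than loaded from the CSV) and confirms that a selected Series carries whatever index the DataFrame has at that moment.

```python
import pandas as pd

# Tiny stand-in for the drinks data (hypothetical, three rows only)
df = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Algeria'],
    'beer_servings': [0, 89, 25],
    'continent': ['Asia', 'Europe', 'Africa'],
})

# With the default index, the Series gets the integer labels 0, 1, 2
default_labels = list(df['continent'].index)

# set_index without inplace=True returns a new DataFrame
indexed = df.set_index('country')
continent = indexed['continent']

# The Series now shares the DataFrame's 'country' index
shares_index = continent.index.equals(indexed.index)
index_name = continent.index.name
albania = continent.loc['Albania']

print(default_labels, shares_index, index_name, albania)
```

Using `set_index` without `inplace=True` here is only a choice for the sketch, so the original `df` stays available for comparison.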

In [7]:
drinks.continent.value_counts()

Africa           53
Europe           45
Asia             44
North America    23
Oceania          16
South America    12
Name: continent, dtype: int64

The output above is itself a Series. Appending .index to the same command returns its index, and appending .values returns its values:

In [8]:
drinks.continent.value_counts().index

Index(['Africa', 'Europe', 'Asia', 'North America', 'Oceania',
       'South America'],
      dtype='object')

In [9]:
drinks.continent.value_counts().values

array([53, 45, 44, 23, 16, 12], dtype=int64)

Because value_counts() returns an ordinary Series, not some special value-counts object, we can use the index to select values from it:

In [10]:
drinks.continent.value_counts()['Africa']
# From the value_counts() Series, look up the index label 'Africa' and return its value.

53

This pulls out a value by its index label, much like selecting from a DataFrame by row label and column name. It works because the result of value_counts() is a Series object.

###### Let's talk about sorting:

sort_values() sorts a Series by its values, in ascending order by default:
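
A minimal sketch of the same lookup on made-up data (the six-element `continent` Series is an assumption for illustration, not the real dataset). It also shows `.loc`, which makes the label-based intent explicit.

```python
import pandas as pd

# Hypothetical miniature 'continent' column
continent = pd.Series(
    ['Asia', 'Europe', 'Africa', 'Europe', 'Africa', 'Africa'],
    name='continent',
)

counts = continent.value_counts()   # a Series: labels in the index, counts as values

africa = counts['Africa']           # bracket lookup by index label
europe = counts.loc['Europe']       # .loc does the same, explicitly by label

print(type(counts).__name__, africa, europe)
```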

In [13]:
drinks.continent.value_counts().sort_values()

South America    12
Oceania          16
North America    23
Asia             44
Europe           45
Africa           53
Name: continent, dtype: int64

Here the Series is sorted by value in ascending order. To sort by the index instead, use sort_index():

In [14]:
drinks.continent.value_counts().sort_index()

Africa           53
Asia             44
Europe           45
North America    23
Oceania          16
South America    12
Name: continent, dtype: int64

Now the index is sorted in ascending alphabetical order.

Previously we talked about the three reasons an index exists: identification, selection and alignment.

###### Now let's look at alignment, one of the reasons the index exists:

First we create another Series:
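
The two sorts can be compared side by side in a small sketch. The `counts` Series below is typed in by hand from the value_counts() output shown earlier, and `ascending=False` is included only to show how the order could be reversed.

```python
import pandas as pd

# Counts copied by hand from the value_counts() output above
counts = pd.Series(
    {'Africa': 53, 'Europe': 45, 'Asia': 44,
     'North America': 23, 'Oceania': 16, 'South America': 12},
    name='continent',
)

by_value = counts.sort_values()                      # smallest count first
by_value_desc = counts.sort_values(ascending=False)  # largest count first
by_label = counts.sort_index()                       # alphabetical by index label

print(by_value.index.tolist())
print(by_label.index.tolist())
```

Both methods return a new Series and leave `counts` unchanged.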

In [25]:
people = pd.Series([3000000, 85000], index=['Albania', 'Andorra'], name='population')
# Construct a Series by passing values, an index and a name

In [26]:
people

Albania    3000000
Andorra      85000
Name: population, dtype: int64

Above is a small dataset I created. Let's say we want to use 'people' together with 'drinks' to calculate the total beer servings for each country.

The beer_servings Series holds the average servings per person, so multiplying it by the number of people gives the total beer servings per year for that country.

In [27]:
drinks.beer_servings

country
Afghanistan      0
Albania         89
Algeria         25
Andorra        245
Angola         217
              ... 
Venezuela      333
Vietnam        111
Yemen            6
Zambia          32
Zimbabwe        64
Name: beer_servings, Length: 193, dtype: int64

That is what drinks.beer_servings looks like. Now multiply it by 'people':

In [28]:
drinks.beer_servings*people

Afghanistan            NaN
Albania        267000000.0
Algeria                NaN
Andorra         20825000.0
Angola                 NaN
                  ...     
Venezuela              NaN
Vietnam                NaN
Yemen                  NaN
Zambia                 NaN
Zimbabwe               NaN
Length: 193, dtype: float64

For countries that are not present in the 'people' Series, pandas cannot do the multiplication because it has no population figure, so it marks the result as NaN (Not a Number), which denotes a missing value.

For countries where 'people' does have a population, it performs the multiplication. The key point is that it aligned the two Series by their index.

In brief, alignment lets us combine data and work with it together even when the objects are not the same length, as long as the index tells pandas which rows correspond to which. pandas took the Albania and Andorra values from both 'people' and 'drinks' and knew to multiply them because the two objects share those index labels.
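
The alignment described above can be reproduced without the full dataset. In this sketch the four-row `beer` Series is a hand-typed stand-in for drinks.beer_servings (an assumption for illustration); `people` is the same Series as in the text.

```python
import pandas as pd

# Hand-typed stand-in for drinks.beer_servings (first four countries only)
beer = pd.Series(
    [0, 89, 25, 245],
    index=['Afghanistan', 'Albania', 'Algeria', 'Andorra'],
    name='beer_servings',
)
people = pd.Series([3000000, 85000], index=['Albania', 'Andorra'], name='population')

# Values are matched by index label, not by position;
# labels missing from either side produce NaN
total = beer * people

# Keep only the countries present in both Series
matched = total.dropna()

print(total)
print(matched)
```

Because NaN is a floating-point value, the result is float64 even though both inputs hold integers.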

###### Useful tip: 

How do we add the 'people' Series to the DataFrame?
As below:

concat() concatenates rows on top of other rows, or columns next to other columns; the axis parameter controls which.

In [29]:
pd.concat([drinks, people], axis=1).head()  # axis=1 places the objects side by side

,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent,population
Afghanistan,0,0,0,0.0,Asia,
Albania,89,132,54,4.9,Europe,3000000.0
Algeria,25,0,14,0.7,Africa,
Andorra,245,138,312,12.4,Europe,85000.0
Angola,217,57,45,5.9,Africa,


The result has a new 'population' column alongside the drinks columns. Even though 'people' had data for only two countries, each value landed in the right row because of the Series index. That is the automatic alignment pandas performs using the index.
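
A small runnable sketch of the same idea, using a hand-typed four-row stand-in for `drinks` (an assumption for illustration). It also shows plain column assignment, a different technique from concat() that likewise aligns the Series on the index.

```python
import pandas as pd

# Hand-typed stand-in for the drinks DataFrame, indexed by country
drinks = pd.DataFrame(
    {'beer_servings': [0, 89, 25, 245]},
    index=['Afghanistan', 'Albania', 'Algeria', 'Andorra'],
)
people = pd.Series([3000000, 85000], index=['Albania', 'Andorra'], name='population')

# concat returns a new DataFrame; the Series name becomes the column name
combined = pd.concat([drinks, people], axis=1)

# Alternative: assigning the Series as a column also aligns on the index
with_col = drinks.copy()
with_col['population'] = people

print(combined)
print(with_col)
```

Note that concat() leaves `drinks` itself untouched, so the result has to be assigned to a name if it is needed later.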