# What do I need to know about the pandas index? (Part 2)

🐼 Tutorial on pandas by Data School - Exercise performed by Dorian.H Mekni 🥇 | Tue 08 Dec 2020

In [3]:
import pandas as pd

In [4]:
drinks = pd.read_csv('http://bit.ly/drinksbycountry')

In [5]:
drinks.head()

,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,Asia
1,Albania,89,132,54,4.9,Europe
2,Algeria,25,0,14,0.7,Africa
3,Andorra,245,138,312,12.4,Europe
4,Angola,217,57,45,5.9,Africa


In [6]:
drinks.continent.head()

0      Asia
1    Europe
2    Africa
3    Europe
4    Africa
Name: continent, dtype: object

In [7]:
drinks.set_index('country', inplace=True)
drinks.head()

country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
Afghanistan,0,0,0,0.0,Asia
Albania,89,132,54,4.9,Europe
Algeria,25,0,14,0.7,Africa
Andorra,245,138,312,12.4,Europe
Angola,217,57,45,5.9,Africa


In [8]:
drinks.continent.head()

country
Afghanistan      Asia
Albania        Europe
Algeria        Africa
Andorra        Europe
Angola         Africa
Name: continent, dtype: object

In [9]:
drinks.continent.value_counts()

Africa           53
Europe           45
Asia             44
North America    23
Oceania          16
South America    12
Name: continent, dtype: int64


☝🏻 The result of value_counts() is a Series, and as such it has an index.


In [19]:
drinks.continent.value_counts().index

Index(['Africa', 'Europe', 'Asia', 'North America', 'Oceania',
       'South America'],
      dtype='object')

In [18]:
drinks.continent.value_counts().values

array([53, 45, 44, 23, 16, 12])


☝🏻 A Series holds its labels in the index and its data in an array of values, as seen above. We can therefore use the index to select data from the targeted Series.


In [32]:
drinks.continent.value_counts()['Africa']

53

✅ It worked because value_counts() returns a Series, so a value can be selected by its index label.
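To make the index/values split concrete, here is a minimal sketch. It assumes a hand-typed copy of the counts shown above (the `counts` name is only illustrative) so that it runs without downloading the dataset.

```python
import pandas as pd

# Hand-typed copy of the continent counts shown above (assumption: no download).
counts = pd.Series(
    [53, 45, 44, 23, 16, 12],
    index=['Africa', 'Europe', 'Asia', 'North America', 'Oceania', 'South America'],
    name='continent',
)

labels = list(counts.index)         # the index holds the labels
numbers = counts.to_numpy()         # the values live in a NumPy array

africa = counts['Africa']           # select by index label
africa_loc = counts.loc['Africa']   # same selection, written explicitly with .loc

print(labels)
print(numbers)
print(africa, africa_loc)
```

Using `.loc` is optional here, but it makes it explicit that the selection is done by label rather than by position.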

In [16]:
drinks.continent.value_counts().sort_values()

South America    12
Oceania          16
North America    23
Asia             44
Europe           45
Africa           53
Name: continent, dtype: int64


✅ The Series is now sorted by its values, in ascending order.



⭐️ Let's now sort the index. 


In [12]:
drinks.continent.value_counts().sort_index()

Africa           53
Asia             44
Europe           45
North America    23
Oceania          16
South America    12
Name: continent, dtype: int64


✅ The Series is now sorted by its index, in alphabetical order.
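The two sorts can be compared side by side. This is a small sketch on a hand-typed copy of the counts (an assumption, to avoid the download); the `ascending=False` call is an extra that the notebook itself does not use.

```python
import pandas as pd

# Hand-typed copy of the continent counts (assumption: no download).
counts = pd.Series(
    [53, 45, 44, 23, 16, 12],
    index=['Africa', 'Europe', 'Asia', 'North America', 'Oceania', 'South America'],
)

by_value = counts.sort_values()                       # smallest count first
by_label = counts.sort_index()                        # labels in alphabetical order
by_value_desc = counts.sort_values(ascending=False)   # largest count first

print(by_value)
print(by_label)
print(by_value_desc)
```

In both cases each label stays attached to its own value; only the order of the rows changes.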



⭐️ Let's now see how we create a Series. 


In [23]:
people = pd.Series([3000000, 85000], index=['Albania', 'Andorra'], name='population')
people

Albania    3000000
Andorra      85000
Name: population, dtype: int64


☝🏻 This is how we constructed the Series. We first gave the values, then the index, then the name.
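As a rough illustration of what the index argument changes, the sketch below builds the same two values three ways. The variable names are made up for this example, and the dictionary form is an alternative that the notebook does not use.

```python
import pandas as pd

# Without an index, pandas falls back to default integer labels 0, 1, ...
default_index = pd.Series([3000000, 85000])

# With an explicit index and a name, as in the cell above.
labelled = pd.Series([3000000, 85000], index=['Albania', 'Andorra'], name='population')

# A dict gives the same result: keys become the index, values the data.
from_dict = pd.Series({'Albania': 3000000, 'Andorra': 85000}, name='population')

print(default_index)
print(labelled)
print(from_dict)
```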



⭐️ Let's now calculate the total beer servings for each country. To do this, I'll multiply the population figures by the beer_servings Series, which holds the average servings per person.


In [24]:
drinks.beer_servings * people 

Afghanistan            NaN
Albania        267000000.0
Algeria                NaN
Andorra         20825000.0
Angola                 NaN
                  ...     
Venezuela              NaN
Vietnam                NaN
Yemen                  NaN
Zambia                 NaN
Zimbabwe               NaN
Length: 193, dtype: float64

✅ It worked for the countries that had a population figure, and returned a missing value (NaN) for the countries without a population to multiply.

☝🏻 It also aligned the two Series by index. It didn't just take our two numbers and multiply them by the first two rows.



🧐 In summary, alignment allows us to put data together and work with it jointly even if the objects are not the same length, as long as the index tells pandas which rows correspond to which. It is also worth mentioning that the multiplication only happened for the two countries provided because their labels appear in both indexes.
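A minimal sketch of alignment, assuming a hand-typed copy of the first five beer_servings rows instead of the full dataset. The reversed `people_rev` Series is a hypothetical addition, included only to show that the order of the labels does not matter.

```python
import pandas as pd

# First five rows of beer_servings, typed by hand (assumption: no download).
beer = pd.Series(
    [0, 89, 25, 245, 217],
    index=['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    name='beer_servings',
)
people = pd.Series([3000000, 85000], index=['Albania', 'Andorra'], name='population')

# Multiplication matches rows by label; labels missing on one side give NaN.
total = beer * people
total_known = total.dropna()   # keep only the countries present in both Series

# Same data in the opposite order: the result is unchanged, because pandas
# aligns on the index labels rather than on position.
people_rev = pd.Series([85000, 3000000], index=['Andorra', 'Albania'])
total_rev = (beer * people_rev).dropna()

print(total)
print(total_known)
```

The result is a float Series because NaN cannot be stored in an integer column of this kind, which is why the totals above are printed with a decimal point.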


# 🎩 Bonus tips: IPython | Jupyter Notebook ONLY


🤠 What if I just want to add a new column to my existing DataFrame called drinks?


In [31]:
pd.concat([drinks, people], axis=1, sort=False).head()

,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent,population
Afghanistan,0,0,0,0.0,Asia,
Albania,89,132,54,4.9,Europe,3000000.0
Algeria,25,0,14,0.7,Africa,
Andorra,245,138,312,12.4,Europe,85000.0
Angola,217,57,45,5.9,Africa,



✅ The result has a new 'population' column. Note that concat() returns a new DataFrame: drinks itself is unchanged unless the result is assigned back to it.



➕ concat() can be used to stack rows on top of other rows or to place columns next to other columns. To choose between the two, we set the axis parameter to either 0 (rows) or 1 (columns).
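Both directions can be tried on a small example. This is a sketch under stated assumptions: `drinks_small` and `extra_row` are hypothetical, hand-typed frames that reuse a few values from the tables above, not the real dataset.

```python
import pandas as pd

# Small hand-typed stand-in for drinks (assumption: two columns, four rows).
drinks_small = pd.DataFrame(
    {'beer_servings': [0, 89, 25, 245], 'continent': ['Asia', 'Europe', 'Africa', 'Europe']},
    index=['Afghanistan', 'Albania', 'Algeria', 'Andorra'],
)
people = pd.Series([3000000, 85000], index=['Albania', 'Andorra'], name='population')

# axis=1: place the Series next to the existing columns, aligned on the index.
wider = pd.concat([drinks_small, people], axis=1)

# axis=0: stack another frame with the same columns underneath.
extra_row = pd.DataFrame(
    {'beer_servings': [217], 'continent': ['Africa']},
    index=['Angola'],
)
taller = pd.concat([drinks_small, extra_row], axis=0)

print(wider)
print(taller)
print(drinks_small.columns.tolist())   # the original frame is left untouched
```

To keep the new column, the result would have to be assigned to a name, for example `drinks_small = wider`.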



🙏🏻 Thank you !
👋🏻 See you in the next one !
