### Indexing and Slicing

`loc` selects by row labels and column labels.

`iloc` selects by integer row and column positions (0-based).

In [1]:
import pandas as pd

In [2]:
brics = pd.read_csv("brics.csv",index_col=0)
brics

Code,country,capital,area,population
BR,Brazil,Brasilia,8.516,200.4
RU,Russia,Moscow,17.1,143.5
IN,India,New Delhi,3.286,1252.0
CH,China,Beijing,9.597,1357.0
SA,South Africa,Pretoria,1.221,52.98
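
To make the `loc`/`iloc` distinction concrete, here is a minimal sketch that looks up the same cell both ways. It assumes the table is rebuilt inline from the values shown above rather than read from `brics.csv`, so it runs on its own.

```python
import pandas as pd

# Rebuild the brics table inline (values copied from the output above)
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.1, 3.286, 9.597, 1.221],
        "population": [200.4, 143.5, 1252.0, 1357.0, 52.98],
    },
    index=pd.Index(["BR", "RU", "IN", "CH", "SA"], name="Code"),
)

by_label = brics.loc["RU", "capital"]  # row label "RU", column label "capital"
by_position = brics.iloc[1, 1]         # second row, second column (0-based)

print(by_label, by_position)
```

Both lookups return the same cell; only the way the row and column are addressed differs.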


In [3]:
brics.loc[["RU","IN","CH"]] # rows by label; all columns are returned

Code,country,capital,area,population
RU,Russia,Moscow,17.1,143.5
IN,India,New Delhi,3.286,1252.0
CH,China,Beijing,9.597,1357.0


In [4]:
brics.loc[["BR","SA"]]

Code,country,capital,area,population
BR,Brazil,Brasilia,8.516,200.4
SA,South Africa,Pretoria,1.221,52.98


In [5]:
brics.loc[["RU","IN","CH"],["country","capital"]]

Code,country,capital
RU,Russia,Moscow
IN,India,New Delhi
CH,China,Beijing


In [6]:
brics.loc[:,["country","capital"]]

Code,country,capital
BR,Brazil,Brasilia
RU,Russia,Moscow
IN,India,New Delhi
CH,China,Beijing
SA,South Africa,Pretoria


In [7]:
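# brics.loc[["BR":"CH"]] raises a SyntaxError: a slice cannot be written inside a list literal.
# Without the inner brackets, brics.loc["BR":"CH"] is a valid label slice and includes both endpoints.
# iloc selects by integer position instead of by label: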

brics.iloc[[1,2,3],[1,2]]

Code,capital,area
RU,Moscow,17.1
IN,New Delhi,3.286
CH,Beijing,9.597


In [8]:
brics.iloc[:,[1,2]]

Code,capital,area
BR,Brasilia,8.516
RU,Moscow,17.1
IN,New Delhi,3.286
CH,Beijing,9.597
SA,Pretoria,1.221


In [9]:
brics.iloc[1:3,0:2] # rows 1-2, columns 0-1 (the end position is excluded)

Code,country,capital
RU,Russia,Moscow
IN,India,New Delhi
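
Slices behave differently under `loc` and `iloc`, which is easy to trip over. The sketch below contrasts the two on a cut-down copy of the table (rebuilt inline as an assumption, so the snippet is self-contained).

```python
import pandas as pd

# Two columns are enough to show the slicing behaviour
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=pd.Index(["BR", "RU", "IN", "CH", "SA"], name="Code"),
)

label_slice = brics.loc["RU":"CH"]  # label slice: both endpoints are included
position_slice = brics.iloc[1:3]    # position slice: the end position is excluded

print(list(label_slice.index))     # ['RU', 'IN', 'CH']
print(list(position_slice.index))  # ['RU', 'IN']
```

The label slice returns three rows, while the position slice with the matching start returns two.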


### Duplicates

In [10]:
brics.duplicated()

Code
BR    False
RU    False
IN    False
CH    False
SA    False
dtype: bool

By default, `duplicated()` compares whole rows: a row is flagged only when every column value matches an earlier row, and the first occurrence is not flagged.

In [11]:
brics.drop_duplicates()

Code,country,capital,area,population
BR,Brazil,Brasilia,8.516,200.4
RU,Russia,Moscow,17.1,143.5
IN,India,New Delhi,3.286,1252.0
CH,China,Beijing,9.597,1357.0
SA,South Africa,Pretoria,1.221,52.98


In [12]:
df = pd.DataFrame({
    "brand": ['yum', 'yum', 'ind'],
    "style": ['cup', 'cup', 'bottle']
})
df

,brand,style
0,yum,cup
1,yum,cup
2,ind,bottle


In [13]:
df.duplicated()

0    False
1     True
2    False
dtype: bool
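
Which copy gets flagged depends on the `keep` argument of `duplicated()`. As a small sketch on the same frame, the three accepted values (`"first"`, the default, `"last"`, and `False`) give:

```python
import pandas as pd

df = pd.DataFrame({
    "brand": ["yum", "yum", "ind"],
    "style": ["cup", "cup", "bottle"],
})

first = df.duplicated()                  # default keep="first": later copies are flagged
last = df.duplicated(keep="last")        # earlier copies are flagged instead
all_copies = df.duplicated(keep=False)   # every member of a duplicate group is flagged

print(first.tolist())       # [False, True, False]
print(last.tolist())        # [True, False, False]
print(all_copies.tolist())  # [True, True, False]
```

`keep=False` is handy for inspecting all duplicated rows before deciding which ones to drop.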

In [15]:
# drop_duplicates() keeps the original row labels, so renumber the rows;
# drop=True discards the old index instead of adding it as an "index" column
df.drop_duplicates().reset_index(drop=True)

,brand,style
0,yum,cup
1,ind,bottle
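
A more compact way to get a fresh 0..n-1 index after dropping duplicates is sketched below on the same frame. The `ignore_index=True` argument assumes pandas 1.0 or later; on older versions only the `reset_index(drop=True)` form is available.

```python
import pandas as pd

df = pd.DataFrame({
    "brand": ["yum", "yum", "ind"],
    "style": ["cup", "cup", "bottle"],
})

# Renumber after the fact, discarding the old index
via_reset = df.drop_duplicates().reset_index(drop=True)

# Ask drop_duplicates to renumber directly (pandas >= 1.0)
via_ignore = df.drop_duplicates(ignore_index=True)

print(via_reset.equals(via_ignore))
print(list(via_ignore.index))
```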


In [17]:
df = pd.DataFrame({
    "brand": ['yum', 'yum', 'ind'],
    "style": ['cup', 'bottle', 'bottle']
})
df

,brand,style
0,yum,cup
1,yum,bottle
2,ind,bottle


In [18]:
df.duplicated(subset=["brand"])

0    False
1     True
2    False
dtype: bool

To drop rows that repeat a value in specific columns only, pass those columns as `subset`; the first occurrence of each value is kept:

In [19]:
df.drop_duplicates(subset=["brand"])

,brand,style
0,yum,cup
2,ind,bottle
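
`subset` can be combined with `keep` and can list more than one column. A short sketch on the same frame, as an illustration rather than a recommended recipe:

```python
import pandas as pd

df = pd.DataFrame({
    "brand": ["yum", "yum", "ind"],
    "style": ["cup", "bottle", "bottle"],
})

# Keep the last row seen for each brand instead of the first
last_per_brand = df.drop_duplicates(subset=["brand"], keep="last")

# Compare on both columns: no (brand, style) pair repeats, so nothing is dropped
by_pair = df.drop_duplicates(subset=["brand", "style"])

print(list(last_per_brand.index))  # [1, 2]
print(len(by_pair))                # 3
```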
