**Introduction to Pandas**

**Dataset** - https://drive.google.com/file/d/1E3bwvYGf1ig32RmcYiWc0IXPN-mD_bI_/view?usp=sharing


In [None]:
# Importing Libraries

import numpy as np
import pandas as pd

**Downloading the dataset onto local colab server**

In [None]:
!gdown 1E3bwvYGf1ig32RmcYiWc0IXPN-mD_bI_

Downloading...
From: https://drive.google.com/uc?id=1E3bwvYGf1ig32RmcYiWc0IXPN-mD_bI_
To: /content/gapminder.csv
  0% 0.00/83.8k [00:00<?, ?B/s]100% 83.8k/83.8k [00:00<00:00, 74.7MB/s]


**Importing the dataset using pandas**

In [None]:
# Read the CSV (comma-separated values) file into a DataFrame
df=pd.read_csv("/content/gapminder.csv")
df

Unnamed: 0,country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623


**Checking type of the dataframe**

In [None]:
# df is a dataframe
type(df)

pandas.core.frame.DataFrame

**Accessing individual columns of the dataframe**

In [None]:
# Method 1: bracket notation for accessing a column
df["country"]

0       Afghanistan
1       Afghanistan
2       Afghanistan
3       Afghanistan
4       Afghanistan
           ...     
1699       Zimbabwe
1700       Zimbabwe
1701       Zimbabwe
1702       Zimbabwe
1703       Zimbabwe
Name: country, Length: 1704, dtype: object

In [None]:
df['year']

0       1952
1       1957
2       1962
3       1967
4       1972
        ... 
1699    1987
1700    1992
1701    1997
1702    2002
1703    2007
Name: year, Length: 1704, dtype: int64

**Alternative way of accessing individual columns of the dataframe**

In [None]:
# Method 2: attribute (dot) notation for accessing a column
df.country

0       Afghanistan
1       Afghanistan
2       Afghanistan
3       Afghanistan
4       Afghanistan
           ...     
1699       Zimbabwe
1700       Zimbabwe
1701       Zimbabwe
1702       Zimbabwe
1703       Zimbabwe
Name: country, Length: 1704, dtype: object

**Method 2 does not work for column names that contain spaces**

**For example**, `df.name of the city` is not valid Python, so it is better to use `df['column_name']` for accessing pandas columns.
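The difference between the two access styles can be sketched on a small frame. This is a minimal illustration, not part of the gapminder dataset: the `cities` frame and its column names are made up for the example.

```python
import pandas as pd

# Hypothetical toy frame; the column names are illustrative only
cities = pd.DataFrame({"name of the city": ["Pune", "Lima"], "population": [10, 20]})

# Bracket notation works for any column name, including names with spaces
by_bracket = cities["name of the city"]

# Dot notation works only when the name is a valid Python identifier
by_attribute = cities.population

# The dotted form with spaces cannot even be parsed as Python
try:
    compile("cities.name of the city", "<example>", "eval")
    dotted_form_parses = True
except SyntaxError:
    dotted_form_parses = False

print(by_bracket.tolist(), by_attribute.tolist(), dotted_form_parses)
```

Bracket notation is the safer default because it behaves the same for every column name.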

In [None]:
# each column is a pandas series
type(df["country"])

pandas.core.series.Series

**Getting basic information regarding the pandas dataset:** 

> Type of the dataframe

> RangeIndex

> List all data columns

> Data type of each column

> Number of non-null values in each column

> Memory Usage

In [None]:
# used to get basic information about the data frame
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1704 entries, 0 to 1703
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   country     1704 non-null   object 
 1   year        1704 non-null   int64  
 2   population  1704 non-null   int64  
 3   continent   1704 non-null   object 
 4   life_exp    1704 non-null   float64
 5   gdp_cap     1704 non-null   float64
dtypes: float64(2), int64(2), object(2)
memory usage: 80.0+ KB
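`df.info()` reports non-null counts per column. If the number of missing values is wanted instead, one common approach is `isna().sum()`. The sketch below uses a made-up frame (`toy`) with a few deliberately missing entries, since the gapminder data shown above has none.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value per column
toy = pd.DataFrame({
    "life_exp": [28.8, np.nan, 31.9],
    "country": ["A", "B", None],
})

# isna() marks missing entries as True; sum() counts them per column
null_counts = toy.isna().sum()

# count() gives the non-null counts, matching what info() reports
non_null_counts = toy.count()

print(null_counts)
print(non_null_counts)
```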


**Inspecting first few rows of a pandas dataframe**

In [None]:
# used to return the first 5 rows
df.head()

Unnamed: 0,country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.85303
2,Afghanistan,1962,10267083,Asia,31.997,853.10071
3,Afghanistan,1967,11537966,Asia,34.02,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106


**Inspecting last few rows of a pandas dataframe**

In [None]:
# used to return the last 5 rows
df.tail()

Unnamed: 0,country,year,population,continent,life_exp,gdp_cap
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.44996
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623
1703,Zimbabwe,2007,12311143,Africa,43.487,469.709298


In [None]:
# We can pass a number to head() to get that many rows from the top
df.head(10)

Unnamed: 0,country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.85303
2,Afghanistan,1962,10267083,Asia,31.997,853.10071
3,Afghanistan,1967,11537966,Asia,34.02,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
5,Afghanistan,1977,14880372,Asia,38.438,786.11336
6,Afghanistan,1982,12881816,Asia,39.854,978.011439
7,Afghanistan,1987,13867957,Asia,40.822,852.395945
8,Afghanistan,1992,16317921,Asia,41.674,649.341395
9,Afghanistan,1997,22227415,Asia,41.763,635.341351


In [None]:
# We can pass a number to tail() to get that many rows from the bottom
df.tail(20)

Unnamed: 0,country,year,population,continent,life_exp,gdp_cap
1684,Zambia,1972,4506497,Africa,50.107,1773.498265
1685,Zambia,1977,5216550,Africa,51.386,1588.688299
1686,Zambia,1982,6100407,Africa,51.821,1408.678565
1687,Zambia,1987,7272406,Africa,50.821,1213.315116
1688,Zambia,1992,8381163,Africa,46.1,1210.884633
1689,Zambia,1997,9417789,Africa,40.238,1071.353818
1690,Zambia,2002,10595811,Africa,39.193,1071.613938
1691,Zambia,2007,11746035,Africa,42.384,1271.211593
1692,Zimbabwe,1952,3080907,Africa,48.451,406.884115
1693,Zimbabwe,1957,3646340,Africa,50.469,518.764268


**Inspecting the shape of the dataframe**

In [None]:
# returns the shape of the dataframe
df.shape

(1704, 6)

In [None]:
df.shape[0]

1704

In [None]:
df.shape[1]

6

**Looking at summary statistics of the dataframe.**

**Please note:** By default, it returns summary statistics only for numeric (integer and float) columns.

In [None]:
# Returns summary statistics for the data
# By default, covers only the integer and float columns
df.describe()

Unnamed: 0,year,population,life_exp,gdp_cap
count,1704.0,1704.0,1704.0,1704.0
mean,1979.5,29601210.0,59.474439,7215.327081
std,17.26533,106157900.0,12.917107,9857.454543
min,1952.0,60011.0,23.599,241.165876
25%,1965.75,2793664.0,48.198,1202.060309
50%,1979.5,7023596.0,60.7125,3531.846988
75%,1993.25,19585220.0,70.8455,9325.462346
max,2007.0,1318683000.0,82.603,113523.1329


**However, we can also get summary statistics for object-type columns by using the include parameter of the DataFrame describe function**

In [None]:
# The include parameter selects which data types are summarized
df.describe(include=["object","int64","float64"])

Unnamed: 0,country,year,population,continent,life_exp,gdp_cap
count,1704,1704.0,1704.0,1704,1704.0,1704.0
unique,142,,,5,,
top,Afghanistan,,,Africa,,
freq,12,,,624,,
mean,,1979.5,29601210.0,,59.474439,7215.327081
std,,17.26533,106157900.0,,12.917107,9857.454543
min,,1952.0,60011.0,,23.599,241.165876
25%,,1965.75,2793664.0,,48.198,1202.060309
50%,,1979.5,7023596.0,,60.7125,3531.846988
75%,,1993.25,19585220.0,,70.8455,9325.462346


In [None]:
df.describe(include=["object"])

Unnamed: 0,country,continent
count,1704,1704
unique,142,5
top,Afghanistan,Africa
freq,12,624
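As an alternative to listing every dtype, `describe` also accepts `include="all"`. The following is a small sketch on a made-up frame (`toy`), not the gapminder data.

```python
import pandas as pd

# Hypothetical toy frame with one text column and one numeric column
toy = pd.DataFrame({
    "country": ["A", "A", "B"],
    "year": [1952, 1957, 1952],
})

# include="all" summarizes every column regardless of dtype;
# statistics that do not apply to a column are filled with NaN
summary = toy.describe(include="all")

print(summary)
```

Text columns get `count`, `unique`, `top` and `freq`, while numeric columns get `count`, `mean`, `std` and the quantiles.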


**Viewing all available columns of a dataframe**

In [None]:
# returns the columns
df.columns

Index(['country', 'year', 'population', 'continent', 'life_exp', 'gdp_cap'], dtype='object')

In [None]:
# also returns the columns
df.keys()

Index(['country', 'year', 'population', 'continent', 'life_exp', 'gdp_cap'], dtype='object')

**Inspecting first few rows of a pandas series**

In [None]:
df['country'].head()

0    Afghanistan
1    Afghanistan
2    Afghanistan
3    Afghanistan
4    Afghanistan
Name: country, dtype: object

In [None]:
df['country'].head(20)

0     Afghanistan
1     Afghanistan
2     Afghanistan
3     Afghanistan
4     Afghanistan
5     Afghanistan
6     Afghanistan
7     Afghanistan
8     Afghanistan
9     Afghanistan
10    Afghanistan
11    Afghanistan
12        Albania
13        Albania
14        Albania
15        Albania
16        Albania
17        Albania
18        Albania
19        Albania
Name: country, dtype: object

In [None]:
# Selecting the first 20 rows of the "country", "year" and "gdp_cap" columns
df[["country","year","gdp_cap"]].head(20)

Unnamed: 0,country,year,gdp_cap
0,Afghanistan,1952,779.445314
1,Afghanistan,1957,820.85303
2,Afghanistan,1962,853.10071
3,Afghanistan,1967,836.197138
4,Afghanistan,1972,739.981106
5,Afghanistan,1977,786.11336
6,Afghanistan,1982,978.011439
7,Afghanistan,1987,852.395945
8,Afghanistan,1992,649.341395
9,Afghanistan,1997,635.341351


**Finding out all unique values in a pandas column**

In [None]:
# returns all the unique values
df['country'].unique()

array(['Afghanistan', 'Albania', 'Algeria', 'Angola', 'Argentina',
       'Australia', 'Austria', 'Bahrain', 'Bangladesh', 'Belgium',
       'Benin', 'Bolivia', 'Bosnia and Herzegovina', 'Botswana', 'Brazil',
       'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon',
       'Canada', 'Central African Republic', 'Chad', 'Chile', 'China',
       'Colombia', 'Comoros', 'Congo, Dem. Rep.', 'Congo, Rep.',
       'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Czech Republic',
       'Denmark', 'Djibouti', 'Dominican Republic', 'Ecuador', 'Egypt',
       'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Ethiopia',
       'Finland', 'France', 'Gabon', 'Gambia', 'Germany', 'Ghana',
       'Greece', 'Guatemala', 'Guinea', 'Guinea-Bissau', 'Haiti',
       'Honduras', 'Hong Kong, China', 'Hungary', 'Iceland', 'India',
       'Indonesia', 'Iran', 'Iraq', 'Ireland', 'Israel', 'Italy',
       'Jamaica', 'Japan', 'Jordan', 'Kenya', 'Korea, Dem. Rep.',
       'Korea, Rep.', 'Kuwait', 'Leba

**Number of unique values in a pandas column**

In [None]:
# returns the number of unique values
df["country"].nunique()

142

**Frequency of each unique value in a pandas column**

In [None]:
# returns the frequency of each unique value
a=df["country"].value_counts()

In [None]:
df["continent"].value_counts()

Africa      624
Asia        396
Europe      360
Americas    300
Oceania      24
Name: continent, dtype: int64

In [None]:
# normalize=True returns relative frequencies (proportions) instead of raw counts
df["continent"].value_counts(normalize=True)

Africa      0.366197
Asia        0.232394
Europe      0.211268
Americas    0.176056
Oceania     0.014085
Name: continent, dtype: float64
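The normalized output is simply each count divided by the total number of values. A minimal check of that relationship, using a made-up series (`continents`) rather than the real column:

```python
import pandas as pd

# Hypothetical toy series with four values
continents = pd.Series(["Africa", "Africa", "Asia", "Europe"])

counts = continents.value_counts()                # raw frequencies
shares = continents.value_counts(normalize=True)  # proportions

# Dividing the raw counts by their total gives the same proportions
manual = counts / counts.sum()

print(shares)
print(manual)
```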

In [None]:
df["country"].value_counts(normalize=True)

Afghanistan          0.007042
Pakistan             0.007042
New Zealand          0.007042
Nicaragua            0.007042
Niger                0.007042
                       ...   
Eritrea              0.007042
Equatorial Guinea    0.007042
El Salvador          0.007042
Egypt                0.007042
Zimbabwe             0.007042
Name: country, Length: 142, dtype: float64

In [None]:
df

Unnamed: 0,country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623


**Renaming pandas column/columns in a pandas dataframe**

In [None]:
# Method 1: rename columns with the columns parameter (returns a new DataFrame; df itself is unchanged)
df.rename(columns={"country":"Country"})

Unnamed: 0,Country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623


In [None]:
df

Unnamed: 0,country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623


In [None]:
# inplace=True modifies df itself instead of returning a new DataFrame
df.rename(columns={"country":"Country"},inplace=True)
df

Unnamed: 0,Country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623


In [None]:
# Method 2: rename columns by passing a mapping with axis=1
df.rename({"year":"Year"},axis=1)

Unnamed: 0,Country,Year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623


In [None]:
df

Unnamed: 0,Country,year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623


**Another way of renaming pandas column/columns in a pandas dataframe**

In [None]:
df.rename({"year":"Year"},axis=1,inplace=True)
df

Unnamed: 0,Country,Year,population,continent,life_exp,gdp_cap
0,Afghanistan,1952,8425333,Asia,28.801,779.445314
1,Afghanistan,1957,9240934,Asia,30.332,820.853030
2,Afghanistan,1962,10267083,Asia,31.997,853.100710
3,Afghanistan,1967,11537966,Asia,34.020,836.197138
4,Afghanistan,1972,13079460,Asia,36.088,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,Africa,62.351,706.157306
1700,Zimbabwe,1992,10704340,Africa,60.377,693.420786
1701,Zimbabwe,1997,11404948,Africa,46.809,792.449960
1702,Zimbabwe,2002,11926563,Africa,39.989,672.038623


**Dropping pandas column/columns from a pandas dataframe**

In [None]:
# Drops the "continent" column; axis=1 targets columns and inplace=True modifies df itself
df.drop("continent",axis=1,inplace=True)

In [None]:
df

Unnamed: 0,Country,Year,population,life_exp,gdp_cap
0,Afghanistan,1952,8425333,28.801,779.445314
1,Afghanistan,1957,9240934,30.332,820.853030
2,Afghanistan,1962,10267083,31.997,853.100710
3,Afghanistan,1967,11537966,34.020,836.197138
4,Afghanistan,1972,13079460,36.088,739.981106
...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,62.351,706.157306
1700,Zimbabwe,1992,10704340,60.377,693.420786
1701,Zimbabwe,1997,11404948,46.809,792.449960
1702,Zimbabwe,2002,11926563,39.989,672.038623


**Dropping rows from a pandas dataframe**

In [None]:
# Drops the rows with index labels 1, 2 and 5 (returns a new DataFrame; df itself is unchanged)
df.drop([1,2,5])

Unnamed: 0,Country,Year,population,life_exp,gdp_cap
0,Afghanistan,1952,8425333,28.801,779.445314
3,Afghanistan,1967,11537966,34.020,836.197138
4,Afghanistan,1972,13079460,36.088,739.981106
6,Afghanistan,1982,12881816,39.854,978.011439
7,Afghanistan,1987,13867957,40.822,852.395945
...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,62.351,706.157306
1700,Zimbabwe,1992,10704340,60.377,693.420786
1701,Zimbabwe,1997,11404948,46.809,792.449960
1702,Zimbabwe,2002,11926563,39.989,672.038623
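Without `inplace=True`, `drop` leaves the original frame untouched and returns a new one. A short sketch on a made-up frame (`toy`) shows this:

```python
import pandas as pd

# Hypothetical toy frame with the default 0-based index
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Year": [1952, 1957, 1962, 1967],
})

# drop() matches index labels (not positions) and returns a new DataFrame
kept = toy.drop([1, 2])

print(kept.index.tolist())  # labels that remain in the result
print(len(toy))             # the original frame still has all its rows
```

To keep the change, either assign the result to a variable as above or pass `inplace=True`.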


**Creating a new column from a set of existing columns in a pandas dataframe**

In [None]:
# We can create new columns from existing columns
df["new"]=df['life_exp']+df['gdp_cap']
df

Unnamed: 0,Country,Year,population,life_exp,gdp_cap,new
0,Afghanistan,1952,8425333,28.801,779.445314,808.246315
1,Afghanistan,1957,9240934,30.332,820.853030,851.185030
2,Afghanistan,1962,10267083,31.997,853.100710,885.097710
3,Afghanistan,1967,11537966,34.020,836.197138,870.217138
4,Afghanistan,1972,13079460,36.088,739.981106,776.069106
...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,62.351,706.157306,768.508306
1700,Zimbabwe,1992,10704340,60.377,693.420786,753.797786
1701,Zimbabwe,1997,11404948,46.809,792.449960,839.258960
1702,Zimbabwe,2002,11926563,39.989,672.038623,712.027623


In [None]:
df["sub"]=df['life_exp']-df['gdp_cap']
df

Unnamed: 0,Country,Year,population,life_exp,gdp_cap,new,sub
0,Afghanistan,1952,8425333,28.801,779.445314,808.246315,-750.644314
1,Afghanistan,1957,9240934,30.332,820.853030,851.185030,-790.521030
2,Afghanistan,1962,10267083,31.997,853.100710,885.097710,-821.103710
3,Afghanistan,1967,11537966,34.020,836.197138,870.217138,-802.177138
4,Afghanistan,1972,13079460,36.088,739.981106,776.069106,-703.893106
...,...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,62.351,706.157306,768.508306,-643.806306
1700,Zimbabwe,1992,10704340,60.377,693.420786,753.797786,-633.043786
1701,Zimbabwe,1997,11404948,46.809,792.449960,839.258960,-745.640960
1702,Zimbabwe,2002,11926563,39.989,672.038623,712.027623,-632.049623


In [None]:
# A new column can also be assigned from a list with one value per row
df["own"]=list(range(len(df)))
df

Unnamed: 0,Country,Year,population,life_exp,gdp_cap,new,sub,own
0,Afghanistan,1952,8425333,28.801,779.445314,808.246315,-750.644314,0
1,Afghanistan,1957,9240934,30.332,820.853030,851.185030,-790.521030,1
2,Afghanistan,1962,10267083,31.997,853.100710,885.097710,-821.103710,2
3,Afghanistan,1967,11537966,34.020,836.197138,870.217138,-802.177138,3
4,Afghanistan,1972,13079460,36.088,739.981106,776.069106,-703.893106,4
...,...,...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,62.351,706.157306,768.508306,-643.806306,1699
1700,Zimbabwe,1992,10704340,60.377,693.420786,753.797786,-633.043786,1700
1701,Zimbabwe,1997,11404948,46.809,792.449960,839.258960,-745.640960,1701
1702,Zimbabwe,2002,11926563,39.989,672.038623,712.027623,-632.049623,1702


In [None]:
# Assigning to an existing column name overwrites that column
df["sub"]=list(range(len(df)))
df

Unnamed: 0,Country,Year,population,life_exp,gdp_cap,new,sub,own
0,Afghanistan,1952,8425333,28.801,779.445314,808.246315,0,0
1,Afghanistan,1957,9240934,30.332,820.853030,851.185030,1,1
2,Afghanistan,1962,10267083,31.997,853.100710,885.097710,2,2
3,Afghanistan,1967,11537966,34.020,836.197138,870.217138,3,3
4,Afghanistan,1972,13079460,36.088,739.981106,776.069106,4,4
...,...,...,...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,62.351,706.157306,768.508306,1699,1699
1700,Zimbabwe,1992,10704340,60.377,693.420786,753.797786,1700,1700
1701,Zimbabwe,1997,11404948,46.809,792.449960,839.258960,1701,1701
1702,Zimbabwe,2002,11926563,39.989,672.038623,712.027623,1702,1702


In [None]:
# Drop several columns at once by passing a list to the columns parameter
df.drop(columns=["own","sub","new"],inplace=True)

In [None]:
df

Unnamed: 0,Country,Year,population,life_exp,gdp_cap
0,Afghanistan,1952,8425333,28.801,779.445314
1,Afghanistan,1957,9240934,30.332,820.853030
2,Afghanistan,1962,10267083,31.997,853.100710
3,Afghanistan,1967,11537966,34.020,836.197138
4,Afghanistan,1972,13079460,36.088,739.981106
...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,62.351,706.157306
1700,Zimbabwe,1992,10704340,60.377,693.420786
1701,Zimbabwe,1997,11404948,46.809,792.449960
1702,Zimbabwe,2002,11926563,39.989,672.038623


**Pandas Series**

In [None]:
# Accessing series/column
ser=df['Country']

In [None]:
ser

0       Afghanistan
1       Afghanistan
2       Afghanistan
3       Afghanistan
4       Afghanistan
           ...     
1699       Zimbabwe
1700       Zimbabwe
1701       Zimbabwe
1702       Zimbabwe
1703       Zimbabwe
Name: Country, Length: 1704, dtype: object

**Accessing elements in a pandas series**

In [None]:
ser[12]

'Albania'

In [None]:
ser[11]

'Afghanistan'

**Slicing in pandas series**

In [None]:
ser[6:15]

6     Afghanistan
7     Afghanistan
8     Afghanistan
9     Afghanistan
10    Afghanistan
11    Afghanistan
12        Albania
13        Albania
14        Albania
Name: Country, dtype: object

**Indexing in pandas series**

In [None]:
# returns the index values of the series
ser.index

RangeIndex(start=0, stop=1704, step=1)

In [None]:
ser.keys()

RangeIndex(start=0, stop=1704, step=1)

**Changing default index of a pandas series**

In [None]:
ser.index=np.arange(1,ser.shape[0]+1,step=1,dtype="int64")
ser

1       Afghanistan
2       Afghanistan
3       Afghanistan
4       Afghanistan
5       Afghanistan
           ...     
1700       Zimbabwe
1701       Zimbabwe
1702       Zimbabwe
1703       Zimbabwe
1704       Zimbabwe
Name: Country, Length: 1704, dtype: object

In [None]:
ser.index

Int64Index([   1,    2,    3,    4,    5,    6,    7,    8,    9,   10,
            ...
            1695, 1696, 1697, 1698, 1699, 1700, 1701, 1702, 1703, 1704],
           dtype='int64', length=1704)

In [None]:
ser.index[1]

2

In [None]:
# by default the index starts from 0
data=pd.Series(["a","b","c","d","e","f"])
data

0    a
1    b
2    c
3    d
4    e
5    f
dtype: object

**Defining custom index for a pandas series**

In [None]:
# We can define the index for a series using the index parameter
data1=pd.Series(["a","b","c","d","e","f"],index=[1,2,3,4,5,6])
data1

1    a
2    b
3    c
4    d
5    e
6    f
dtype: object

In [None]:
# Index values can also be strings (object dtype)
data2=pd.Series(["a","b","c","d","e","f"],index=["a","b","c","d","e","f"])
data2

a    a
b    b
c    c
d    d
e    e
f    f
dtype: object

In [None]:
data1

1    a
2    b
3    c
4    d
5    e
6    f
dtype: object

In [None]:
# Indexing with a single value uses the explicitly defined index labels
data1[2]

'b'

In [None]:
data1[3]

'c'

In [None]:
# Slicing with integers uses implicit (0-based) positions, not the labels
data1[2:5]

3    c
4    d
5    e
dtype: object

In [None]:
data1

1    a
2    b
3    c
4    d
5    e
6    f
dtype: object

**Slicing with index values can be tricky in a pandas series.**

To avoid confusion:

**Use data1.loc** for explicit (label-based) indexing

**Use data1.iloc** for implicit (position-based) indexing

In [None]:
data1.loc[3]

'c'

In [None]:
data1.iloc[2]

'c'

In [None]:
# Note: with .loc the end label of a slice is included
data1.loc[2:4]

2    b
3    c
4    d
dtype: object

In [None]:
# With .iloc the end position is excluded, and positions are implicit (0-based)
data1.iloc[2:4]

3    c
4    d
dtype: object

In [None]:
import pandas as pd
# A series with string labels as its index
s1 = pd.Series([1,2,3,4], index = ['a','b','c','d'])

In [None]:
s1

a    1
b    2
c    3
d    4
dtype: int64

In [None]:
s1['c']

3

In [None]:
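# An integer key on a non-integer index falls back to position; newer pandas versions deprecate this, so prefer s1.iloc[2]
s1[2]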

3
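Because plain `[]` can mean either a label or a position depending on the index, the explicit accessors are the less ambiguous choice. A minimal sketch with the same values as `s1`:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

# .loc always looks up by label
by_label = s1.loc["c"]

# .iloc always looks up by 0-based position
by_position = s1.iloc[2]

print(by_label, by_position)
```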

In [None]:
data2=pd.Series(["a","b","c","d","e","f"],index=["g","h","i","j","k","l"])
data2

g    a
h    b
i    c
j    d
k    e
l    f
dtype: object

In [None]:
data2.loc["j"]

'd'

In [None]:
data2.iloc[1:4]

h    b
i    c
j    d
dtype: object

In [None]:
# We can slice using string labels as well (the end label is included)
data2.loc["h":"j"]

h    b
i    c
j    d
dtype: object

**Using explicit and implicit indexing for pandas dataframe**

In [None]:
df

Unnamed: 0,Country,Year,population,life_exp,gdp_cap
0,Afghanistan,1952,8425333,28.801,779.445314
1,Afghanistan,1957,9240934,30.332,820.853030
2,Afghanistan,1962,10267083,31.997,853.100710
3,Afghanistan,1967,11537966,34.020,836.197138
4,Afghanistan,1972,13079460,36.088,739.981106
...,...,...,...,...,...
1699,Zimbabwe,1987,9216418,62.351,706.157306
1700,Zimbabwe,1992,10704340,60.377,693.420786
1701,Zimbabwe,1997,11404948,46.809,792.449960
1702,Zimbabwe,2002,11926563,39.989,672.038623


In [None]:
df.loc[1]

Country       Afghanistan
Year                 1957
population        9240934
life_exp           30.332
gdp_cap         820.85303
Name: 1, dtype: object

In [None]:
# To access specific rows, pass a list of index labels
df.loc[[1,10,100]]

Unnamed: 0,Country,Year,population,life_exp,gdp_cap
1,Afghanistan,1957,9240934,30.332,820.85303
10,Afghanistan,2002,25268405,42.129,726.734055
100,Bangladesh,1972,70759295,45.252,630.233627


In [None]:
# Selecting specific rows and a slice of columns from Year to life_exp (both ends included)
df.loc[[1,10,100],"Year":"life_exp"]

Unnamed: 0,Year,population,life_exp
1,1957,9240934,30.332
10,2002,25268405,42.129
100,1972,70759295,45.252


In [None]:
# accessing specific rows and columns
df.loc[[1,10,100],["Year","life_exp"]]

Unnamed: 0,Year,life_exp
1,1957,30.332
10,2002,42.129
100,1972,45.252


In [None]:
df.iloc[1]

Country       Afghanistan
Year                 1957
population        9240934
life_exp           30.332
gdp_cap         820.85303
Name: 1, dtype: object

In [None]:
df.iloc[0]

Country       Afghanistan
Year                 1952
population        8425333
life_exp           28.801
gdp_cap        779.445314
Name: 0, dtype: object

In [None]:
df.iloc[[1,10,100]]

Unnamed: 0,Country,Year,population,life_exp,gdp_cap
1,Afghanistan,1957,9240934,30.332,820.85303
10,Afghanistan,2002,25268405,42.129,726.734055
100,Bangladesh,1972,70759295,45.252,630.233627


In [None]:
# iloc accepts only integer positions, so a label slice raises a TypeError
df.iloc[[1,10,100],"Year":"life_exp"]

TypeError: ignored
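When only the column labels are known but positional indexing is still wanted, the labels can first be converted to positions with `Index.get_loc`. This is one possible approach, sketched on a made-up frame (`toy`):

```python
import pandas as pd

# Hypothetical toy frame mirroring a few gapminder column names
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Year": [1952, 1957, 1962],
    "population": [10, 20, 30],
    "life_exp": [28.8, 30.3, 32.0],
})

# Translate the column labels into integer positions
start = toy.columns.get_loc("Year")
stop = toy.columns.get_loc("life_exp") + 1  # +1 because iloc excludes the end

subset = toy.iloc[[0, 2], start:stop]

print(subset)
```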

In [None]:
df.iloc[[1,10,100],1:4]

Unnamed: 0,Year,population,life_exp
1,1957,9240934,30.332
10,2002,25268405,42.129
100,1972,70759295,45.252


In [None]:
# Note: iloc uses implicit positions for both rows and columns (the end position is excluded)
df.iloc[10:20,1:4]

Unnamed: 0,Year,population,life_exp
10,2002,25268405,42.129
11,2007,31889923,43.828
12,1952,1282697,55.23
13,1957,1476505,59.28
14,1962,1728137,64.82
15,1967,1984060,66.22
16,1972,2263554,67.69
17,1977,2509048,68.93
18,1982,2780097,70.42
19,1987,3075321,72.0


In [None]:
# iloc accepts negative positions, which count from the end
df.iloc[-20:]

Unnamed: 0,Country,Year,population,life_exp,gdp_cap
1684,Zambia,1972,4506497,50.107,1773.498265
1685,Zambia,1977,5216550,51.386,1588.688299
1686,Zambia,1982,6100407,51.821,1408.678565
1687,Zambia,1987,7272406,50.821,1213.315116
1688,Zambia,1992,8381163,46.1,1210.884633
1689,Zambia,1997,9417789,40.238,1071.353818
1690,Zambia,2002,10595811,39.193,1071.613938
1691,Zambia,2007,11746035,42.384,1271.211593
1692,Zimbabwe,1952,3080907,48.451,406.884115
1693,Zimbabwe,1957,3646340,50.469,518.764268


In [None]:
# loc looks up labels, and -1 is not a label in this index, so this raises a KeyError
df.loc[-1]

KeyError: ignored
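The contrast between the two accessors for negative values can be sketched on a made-up frame (`toy`) with the default 0-based index:

```python
import pandas as pd

# Hypothetical toy frame with index labels 0, 1, 2
toy = pd.DataFrame({"Year": [1952, 1957, 1962]})

# iloc treats -1 as a position counted from the end
last_row = toy.iloc[-1]

# loc treats -1 as a label, which does not exist in this index
try:
    toy.loc[-1]
    label_found = True
except KeyError:
    label_found = False

print(last_row["Year"], label_found)
```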

In [None]:
data1=pd.Series(["a","b","c","d","e","f"],index=[1,2,3,4,5,6])
data1

1    a
2    b
3    c
4    d
5    e
6    f
dtype: object

In [None]:
data1[3]

'c'

In [None]:
data[3:4]

3    d
dtype: object

In [None]:
data

0    a
1    b
2    c
3    d
4    e
5    f
dtype: object