Introduction to Pandas in Python
1. What is Pandas?
Pandas is an open-source data analysis and data manipulation library for Python. It provides powerful data structures such as Series and DataFrame, which allow for efficient handling and analysis of structured data.

Pandas is built on top of NumPy and is widely used for data preprocessing, cleaning, transformation, and analysis in data science and machine learning.
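To make the two structures concrete, here is a minimal sketch that builds a Series and a small DataFrame by hand. The population and capital values are copied from the countries dataset used later; the variable names `population` and `df_small` are illustrative only.

```python
import pandas as pd

# A Series is a one-dimensional labelled array
population = pd.Series([26023100, 2895947, 38700000],
                       index=["Afghanistan", "Albania", "Algeria"],
                       name="population")

# A DataFrame is a two-dimensional table whose columns are Series
df_small = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria"],
    "capital": ["Kabul", "Tirana", "Algiers"],
    "population": [26023100, 2895947, 38700000],
})

print(population["Albania"])      # 2895947
print(df_small.shape)             # (3, 3)
print(type(df_small["capital"]))  # <class 'pandas.core.series.Series'>
```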

Introduction to NumPy
What is NumPy?
NumPy (Numerical Python) is a Python library for numerical computing. It provides multi-dimensional arrays and a large set of mathematical functions that operate on them. Because Pandas is built on top of NumPy, familiarity with NumPy arrays makes Pandas easier to understand.

Why NumPy?
1. Faster than Python lists for numerical operations

2. Efficient array manipulations
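The first point can be illustrated by comparing a list comprehension with the equivalent vectorized NumPy expression. This sketch only shows that both give the same result; the speed advantage usually becomes noticeable for large arrays and depends on the operation, so no timing figures are claimed here.

```python
import numpy as np

values = list(range(1, 6))

# Plain Python: the list comprehension visits each item in the interpreter
squared_list = [v * v for v in values]

# NumPy: the multiplication is applied to the whole array at once (vectorized),
# with the element loop running in compiled code
arr = np.array(values)
squared_arr = arr * arr

print(squared_list)  # [1, 4, 9, 16, 25]
print(squared_arr)   # [ 1  4  9 16 25]
```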

**Installing NumPy**
If you haven’t installed NumPy, you can do so using:

In [1]:
!pip install numpy




Importing NumPy

In [2]:
import numpy as np

**Creating a NumPy Array**

In [3]:
arr = np.array([1, 2, 3, 4, 5])
print("NumPy Array:", arr)

NumPy Array: [1 2 3 4 5]
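NumPy arrays are not limited to one dimension. The sketch below, using an arbitrary 2 x 3 example array, shows the `shape` and `ndim` attributes and how the `axis` argument chooses between column-wise and row-wise sums.

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.shape)        # (2, 3)
print(matrix.ndim)         # 2
print(matrix.sum(axis=0))  # column sums: [5 7 9]
print(matrix.sum(axis=1))  # row sums: [ 6 15]
```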


Basic Operations with NumPy


In [4]:
arr = np.array([10, 20, 30, 40, 50])


In [5]:
print("Array + 4:", arr + 4)  # Adds 4 to each element
print("Array * 2:", arr * 2)  # Multiplies each element by 2
print("Mean of array:", np.mean(arr))  # Calculates mean
print("Standard Deviation:", np.std(arr))  # Calculates standard deviation

Array + 4: [14 24 34 44 54]
Array * 2: [ 20  40  60  80 100]
Mean of array: 30.0
Standard Deviation: 14.142135623730951
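One detail worth knowing before moving to Pandas: `np.std` computes the population standard deviation (dividing by n, `ddof=0`), while Pandas' `Series.std` defaults to the sample standard deviation (dividing by n - 1, `ddof=1`). The sketch below reuses the same array, so the NumPy figure matches the output above; this default also explains the NaN standard deviations in the grouping section later.

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30, 40, 50])

# NumPy divides by n (population standard deviation, ddof=0)
print(np.std(arr))           # 14.142135623730951

# pandas divides by n - 1 (sample standard deviation, ddof=1)
print(pd.Series(arr).std())  # about 15.81

# The ddof argument makes either library match the other
print(np.std(arr, ddof=1))         # about 15.81
print(pd.Series(arr).std(ddof=0))  # about 14.14
```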


2. Installing and Importing Pandas
Installing Pandas (if not already installed)

In [6]:
!pip install pandas



Importing the Pandas Library


In [7]:
import pandas as pd


Checking Pandas Version


In [8]:
print("Pandas Version:", pd.__version__)



Pandas Version: 2.2.2


3. Loading Data into a Pandas DataFrame


Reading a CSV File into a DataFrame

In [9]:
df=pd.read_csv("/content/countries.csv")

In [10]:
df


,country,native_name,iso2,iso3,population,area,capital,capital_lat,capital_lng,region,continent
0,Afghanistan,افغانستان,AF,AFG,26023100.0,652230.0,Kabul,34.526011,69.177684,Southern and Central Asia,Asia
1,Albania,Shqipëria,AL,ALB,2895947.0,28748.0,Tirana,41.326873,19.818791,Southern Europe,Europe
2,Algeria,الجزائر,DZ,DZA,38700000.0,2381741.0,Algiers,36.775361,3.060188,Northern Africa,Africa
3,American Samoa,American Samoa,AS,ASM,55519.0,199.0,Pago Pago,-14.275479,-170.704830,Polynesia,Oceania
4,Angola,Angola,AO,AGO,24383301.0,1246700.0,Luanda,-8.827270,13.243951,Central Africa,Africa
...,...,...,...,...,...,...,...,...,...,...,...
209,Wallis and Futuna,Wallis et Futuna,WF,WLF,13135.0,142.0,Mata-Utu,-13.282042,-176.174022,Polynesia,Oceania
210,Western Sahara,الصحراء الغربية,EH,ESH,586000.0,266000.0,El Aaiún,27.154512,-13.195392,Northern Africa,Africa
211,Yemen,اليَمَن,YE,YEM,25956000.0,527968.0,Sana'a,15.353857,44.205884,Middle East,Asia
212,Zambia,Zambia,ZM,ZMB,15023315.0,752612.0,Lusaka,-15.416449,28.282154,Eastern Africa,Europe
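The path `/content/countries.csv` is specific to a Colab session. To experiment without the file, `read_csv` also accepts a file-like object; this sketch wraps a few rows in `io.StringIO`. The rows and the reduced set of columns are copied from the outputs in this notebook, with Wales included because its population, area and capital are empty in the dataset.

```python
import io

import pandas as pd

# A few rows in the layout of countries.csv (subset of columns)
csv_text = """country,population,area,capital
Afghanistan,26023100.0,652230.0,Kabul
Albania,2895947.0,28748.0,Tirana
Wales,,,
"""

# io.StringIO lets read_csv treat the string like a file on disk;
# empty fields are read as NaN
df_demo = pd.read_csv(io.StringIO(csv_text))

print(df_demo.shape)                       # (3, 4)
print(df_demo["population"].isna().sum())  # 1
```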


4. Exploring the DataFrame

Displaying Data

In [11]:
df.head(100)

,country,native_name,iso2,iso3,population,area,capital,capital_lat,capital_lng,region,continent
0,Afghanistan,افغانستان,AF,AFG,26023100.0,652230.0,Kabul,34.526011,69.177684,Southern and Central Asia,Asia
1,Albania,Shqipëria,AL,ALB,2895947.0,28748.0,Tirana,41.326873,19.818791,Southern Europe,Europe
2,Algeria,الجزائر,DZ,DZA,38700000.0,2381741.0,Algiers,36.775361,3.060188,Northern Africa,Africa
3,American Samoa,American Samoa,AS,ASM,55519.0,199.0,Pago Pago,-14.275479,-170.704830,Polynesia,Oceania
4,Angola,Angola,AO,AGO,24383301.0,1246700.0,Luanda,-8.827270,13.243951,Central Africa,Africa
...,...,...,...,...,...,...,...,...,...,...,...
95,Japan,日本,JP,JPN,127080000.0,377930.0,Tokyo,35.682839,139.759455,Eastern Asia,Asia
96,Jersey,Jersey,JE,JEY,99000.0,116.0,Saint Helier,47.384387,4.683325,,
97,Jordan,الأردن,JO,JOR,6666960.0,89342.0,Amman,31.951569,35.923962,Middle East,Asia
98,Kazakhstan,Қазақстан,KZ,KAZ,17377800.0,2724900.0,Astana,51.128220,71.430668,Southern and Central Asia,Asia


In [12]:
df.tail(100)

,country,native_name,iso2,iso3,population,area,capital,capital_lat,capital_lng,region,continent
114,Madagascar,Madagasikara,MG,MDG,21842167.0,587041.0,Antananarivo,-18.910012,47.525581,Eastern Africa,Africa
115,Malawi,Malawi,MW,MWI,15805239.0,118484.0,Lilongwe,-13.987511,33.768144,Eastern Africa,Africa
116,Malaysia,Malaysia,MY,MYS,30430500.0,330803.0,Kuala Lumpur,3.151696,101.694237,Southeast Asia,Asia
117,Maldives,Maldives,MV,MDV,341256.0,300.0,Malé,16.370036,-2.290024,Southern and Central Asia,Asia
118,Mali,Mali,ML,MLI,15768000.0,1240192.0,Bamako,12.605033,-7.986514,Western Africa,Africa
...,...,...,...,...,...,...,...,...,...,...,...
209,Wallis and Futuna,Wallis et Futuna,WF,WLF,13135.0,142.0,Mata-Utu,-13.282042,-176.174022,Polynesia,Oceania
210,Western Sahara,الصحراء الغربية,EH,ESH,586000.0,266000.0,El Aaiún,27.154512,-13.195392,Northern Africa,Africa
211,Yemen,اليَمَن,YE,YEM,25956000.0,527968.0,Sana'a,15.353857,44.205884,Middle East,Asia
212,Zambia,Zambia,ZM,ZMB,15023315.0,752612.0,Lusaka,-15.416449,28.282154,Eastern Africa,Europe


Checking DataFrame Information

In [13]:
print("DataFrame Info:")
df.info()

DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 214 entries, 0 to 213
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   country      214 non-null    object 
 1   native_name  213 non-null    object 
 2   iso2         213 non-null    object 
 3   iso3         214 non-null    object 
 4   population   210 non-null    float64
 5   area         207 non-null    float64
 6   capital      212 non-null    object 
 7   capital_lat  212 non-null    float64
 8   capital_lng  212 non-null    float64
 9   region       205 non-null    object 
 10  continent    206 non-null    object 
dtypes: float64(4), object(7)
memory usage: 18.5+ KB


Summary Statistics

In [14]:
print("\nSummary Statistics:")
df.describe()


Summary Statistics:


,population,area,capital_lat,capital_lng
count,210.0,207.0,212.0,212.0
mean,33651440.0,653760.4,18.81233,16.545432
std,133046100.0,1866909.0,26.590545,69.625088
min,30.0,2.02,-54.283545,-176.174022
25%,757260.5,13066.0,2.874467,-13.375923
50%,6315500.0,109884.0,18.025227,16.725786
75%,22683780.0,481771.0,41.31597,47.637577
max,1367110000.0,17124440.0,78.223156,179.11865
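`describe()` summarises only the numeric columns by default, which is why `country`, `region` and the other text columns are missing above. When the selection contains only text columns, `describe()` reports `count`, `unique`, `top` (the most frequent value) and `freq` instead. A minimal sketch on a hand-made three-row frame with values from the dataset:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "country": ["Afghanistan", "Armenia", "Albania"],
    "continent": ["Asia", "Asia", "Europe"],
    "area": [652230.0, 29743.0, 28748.0],
})

# Selecting only the text columns switches describe() to categorical statistics
summary = df_demo[["country", "continent"]].describe()
print(summary)
# Rows: count, unique, top, freq
```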


Checking Data Types of Columns

In [15]:
df.dtypes

,0
country,object
native_name,object
iso2,object
iso3,object
population,float64
area,float64
capital,object
capital_lat,float64
capital_lng,float64
region,object


5. Selecting Specific Columns in a DataFrame

Selecting a Single Column

In [16]:
df['country']

,country
0,Afghanistan
1,Albania
2,Algeria
3,American Samoa
4,Angola
...,...
209,Wallis and Futuna
210,Western Sahara
211,Yemen
212,Zambia


Selecting Multiple Columns

In [17]:
df[['capital', 'region']]

,capital,region
0,Kabul,Southern and Central Asia
1,Tirana,Southern Europe
2,Algiers,Northern Africa
3,Pago Pago,Polynesia
4,Luanda,Central Africa
...,...,...
209,Mata-Utu,Polynesia
210,El Aaiún,Northern Africa
211,Sana'a,Middle East
212,Lusaka,Eastern Africa


Selecting Multiple Columns for a Specific Row Range

In [18]:
df[['population', 'area', 'capital']][15:210]

,population,area,capital
15,9475100.0,207600.0,Minsk
16,11225469.0,30528.0,Brussels
17,349728.0,22966.0,Belmopan
18,9988068.0,112622.0,Porto-Novo
19,64237.0,54.0,Hamilton
...,...,...,...
205,264652.0,12189.0,Port Vila
206,30206307.0,916445.0,Caracas
207,89708900.0,331212.0,Hanoi
208,,,


In [19]:
df[['country', 'native_name', 'population']][5:211]

,country,native_name,population
5,Anguilla,Anguilla,13452.0
6,Antigua and Barbuda,Antigua and Barbuda,86295.0
7,Argentina,Argentina,42669500.0
8,Armenia,Հայաստան,3009800.0
9,Aruba,Aruba,101484.0
...,...,...,...
206,Venezuela,Venezuela,30206307.0
207,Vietnam,Việt Nam,89708900.0
208,Wales,,
209,Wallis and Futuna,Wallis et Futuna,13135.0


In [20]:
df[['country', 'iso3', 'iso2']][10:215]

,country,iso3,iso2
10,Australia,AUS,AU
11,Austria,AUT,AT
12,Azerbaijan,AZE,AZ
13,Bahrain,BHR,BH
14,Bangladesh,BGD,BD
...,...,...,...
209,Wallis and Futuna,WLF,WF
210,Western Sahara,ESH,EH
211,Yemen,YEM,YE
212,Zambia,ZMB,ZM
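The cells above combine column selection and row slicing in two chained steps, where the integer slice is interpreted by position. A sketch of single-step alternatives, assuming a five-row frame with values from the dataset: `iloc` takes positions (end excluded), while `loc` takes labels (end included), so `1:4` and `1:3` select the same rows here. Single-step selection is generally preferred because chained indexing can behave unexpectedly when it is used for assignment.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "American Samoa", "Angola"],
    "population": [26023100.0, 2895947.0, 38700000.0, 55519.0, 24383301.0],
    "area": [652230.0, 28748.0, 2381741.0, 199.0, 1246700.0],
})

cols = ["country", "population"]

# Chained form used above: select columns, then slice rows by position
chained = df_demo[cols][1:4]

# iloc: row positions 1, 2, 3 and the column positions of `cols`
by_position = df_demo.iloc[1:4, df_demo.columns.get_indexer(cols)]

# loc: row labels 1 to 3 inclusive and the column labels in `cols`
by_label = df_demo.loc[1:3, cols]

print(chained.equals(by_position), chained.equals(by_label))  # True True
```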


6. Filtering Rows in a DataFrame
Filtering Rows Based on a Condition


In [21]:
df[df['continent'] == 'Asia']

,country,native_name,iso2,iso3,population,area,capital,capital_lat,capital_lng,region,continent
0,Afghanistan,افغانستان,AF,AFG,26023100.0,652230.0,Kabul,34.526011,69.177684,Southern and Central Asia,Asia
8,Armenia,Հայաստան,AM,ARM,3009800.0,29743.0,Yerevan,40.177612,44.512585,Middle East,Asia
12,Azerbaijan,Azərbaycan,AZ,AZE,9552500.0,86600.0,Baku,40.375443,49.832675,Middle East,Asia
13,Bahrain,‏البحرين,BH,BHR,1316500.0,765.0,Manama,26.223504,50.582244,Middle East,Asia
14,Bangladesh,বাংলাদেশ,BD,BGD,157486000.0,147570.0,Dhaka,23.759357,90.378814,Southern and Central Asia,Asia
20,Bhutan,ʼbrug-yul,BT,BTN,755030.0,38394.0,Thimphu,27.472762,89.629548,Southern and Central Asia,Asia
25,Brunei,Negara Brunei Darussalam,BN,BRN,393372.0,5765.0,Bandar Seri Begawan,4.889545,114.941757,Southeast Asia,Asia
29,Cambodia,Kâmpŭchéa,KH,KHM,15184120.0,181035.0,Phnom Penh,11.568271,104.922443,Southeast Asia,Asia
37,China,中国,CN,CHN,1367110000.0,9640011.0,Beijing,39.906217,116.391276,Eastern Asia,Asia
45,Cyprus,Κύπρος,CY,CYP,858000.0,9251.0,Nicosia,35.17393,33.364726,Middle East,Asia
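Conditions can be combined with `&` (and) and `|` (or). Each comparison must be wrapped in parentheses, because these operators bind more tightly than `==` and `>`. A small sketch with areas and continents copied from the dataset:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria", "Bahrain", "China"],
    "continent": ["Asia", "Europe", "Africa", "Asia", "Asia"],
    "area": [652230.0, 28748.0, 2381741.0, 765.0, 9640011.0],
})

# "and": Asian countries larger than 100,000 area units
large_asian = df_demo[(df_demo["continent"] == "Asia") & (df_demo["area"] > 100000)]
print(large_asian["country"].tolist())  # ['Afghanistan', 'China']

# "or": countries in either Europe or Africa
europe_or_africa = df_demo[(df_demo["continent"] == "Europe") | (df_demo["continent"] == "Africa")]
print(europe_or_africa["country"].tolist())  # ['Albania', 'Algeria']
```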


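Filtering Rows to Exclude Specific Continents

In [22]:
# isin() is case-sensitive: "asia" does not match "Asia", which is why Asian
# countries such as Afghanistan and Vietnam still appear in the output below
df[~df['continent'].isin(["asia","Europe","Africa"])]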

,country,native_name,iso2,iso3,population,area,capital,capital_lat,capital_lng,region,continent
0,Afghanistan,افغانستان,AF,AFG,26023100.0,652230.0,Kabul,34.526011,69.177684,Southern and Central Asia,Asia
3,American Samoa,American Samoa,AS,ASM,55519.0,199.0,Pago Pago,-14.275479,-170.704830,Polynesia,Oceania
5,Anguilla,Anguilla,AI,AIA,13452.0,91.0,The Valley,41.559572,-98.980548,Caribbean,North America
6,Antigua and Barbuda,Antigua and Barbuda,AG,ATG,86295.0,442.0,Saint John's,47.561701,-52.715149,Caribbean,North America
7,Argentina,Argentina,AR,ARG,42669500.0,2780400.0,Buenos Aires,-34.607568,-58.437089,South America,South America
...,...,...,...,...,...,...,...,...,...,...,...
205,Vanuatu,Vanuatu,VU,VUT,264652.0,12189.0,Port Vila,-17.741497,168.315016,Melanesia,Oceania
206,Venezuela,Venezuela,VE,VEN,30206307.0,916445.0,Caracas,10.506098,-66.914602,South America,South America
207,Vietnam,Việt Nam,VN,VNM,89708900.0,331212.0,Hanoi,21.029450,105.854444,Southeast Asia,Asia
209,Wallis and Futuna,Wallis et Futuna,WF,WLF,13135.0,142.0,Mata-Utu,-13.282042,-176.174022,Polynesia,Oceania
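Because `isin` matches strings exactly, one way to make the exclusion case-insensitive is to lower-case both the column and the list, as in this sketch on a four-row frame taken from the output above:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "American Samoa", "Argentina"],
    "continent": ["Asia", "Europe", "Oceania", "South America"],
})

excluded = ["asia", "Europe", "Africa"]

# Exact matching: "asia" does not exclude "Asia"
case_sensitive = df_demo[~df_demo["continent"].isin(excluded)]
print(case_sensitive["country"].tolist())
# ['Afghanistan', 'American Samoa', 'Argentina']

# Lower-casing both sides makes the comparison case-insensitive
excluded_lower = [name.lower() for name in excluded]
case_insensitive = df_demo[~df_demo["continent"].str.lower().isin(excluded_lower)]
print(case_insensitive["country"].tolist())
# ['American Samoa', 'Argentina']
```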


Filtering Rows Where 'area' Is Greater Than a Certain Value

In [23]:
df[df['area'] > 12189.0]

,country,native_name,iso2,iso3,population,area,capital,capital_lat,capital_lng,region,continent
0,Afghanistan,افغانستان,AF,AFG,26023100.0,652230.0,Kabul,34.526011,69.177684,Southern and Central Asia,Asia
1,Albania,Shqipëria,AL,ALB,2895947.0,28748.0,Tirana,41.326873,19.818791,Southern Europe,Europe
2,Algeria,الجزائر,DZ,DZA,38700000.0,2381741.0,Algiers,36.775361,3.060188,Northern Africa,Africa
4,Angola,Angola,AO,AGO,24383301.0,1246700.0,Luanda,-8.827270,13.243951,Central Africa,Africa
7,Argentina,Argentina,AR,ARG,42669500.0,2780400.0,Buenos Aires,-34.607568,-58.437089,South America,South America
...,...,...,...,...,...,...,...,...,...,...,...
207,Vietnam,Việt Nam,VN,VNM,89708900.0,331212.0,Hanoi,21.029450,105.854444,Southeast Asia,Asia
210,Western Sahara,الصحراء الغربية,EH,ESH,586000.0,266000.0,El Aaiún,27.154512,-13.195392,Northern Africa,Africa
211,Yemen,اليَمَن,YE,YEM,25956000.0,527968.0,Sana'a,15.353857,44.205884,Middle East,Asia
212,Zambia,Zambia,ZM,ZMB,15023315.0,752612.0,Lusaka,-15.416449,28.282154,Eastern Africa,Europe


7. Counting Values in a DataFrame
Counting Occurrences of a Specific Value in a Column

In [24]:
df[df['continent'] == "Asia"].shape[0]

48
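Filtering and taking `shape[0]` counts a single value. `value_counts()` counts every distinct value at once, sorted from most to least frequent, and skips missing values unless `dropna=False` is passed. A sketch on a short hypothetical Series:

```python
import pandas as pd

continent = pd.Series(["Asia", "Europe", "Asia", "Africa", "Asia", None])

# Counting one value, as in the cell above
print((continent == "Asia").sum())  # 3

# Counting every distinct value; the missing entry is skipped by default
counts = continent.value_counts()
print(counts["Asia"], counts["Europe"])  # 3 1

# dropna=False adds a count for the missing entry, so all 6 values are counted
print(continent.value_counts(dropna=False).sum())  # 6
```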

8. Handling Missing Values
Checking for Missing Values

In [25]:
print("\nMissing Values:")
df.isnull().sum()



Missing Values:


,0
country,0
native_name,1
iso2,1
iso3,0
population,4
area,7
capital,2
capital_lat,2
capital_lng,2
region,9


Detecting Missing Values in the DataFrame

In [26]:
df.isnull()

,country,native_name,iso2,iso3,population,area,capital,capital_lat,capital_lng,region,continent
0,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...
209,False,False,False,False,False,False,False,False,False,False,False
210,False,False,False,False,False,False,False,False,False,False,False
211,False,False,False,False,False,False,False,False,False,False,False
212,False,False,False,False,False,False,False,False,False,False,False
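Because `isnull()` returns booleans and True counts as 1, taking the mean gives the fraction of missing values per column, and `any(axis=1)` flags rows with at least one gap. A sketch on four rows whose values, including the missing ones, are copied from the outputs in this notebook:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Jersey", "Wales"],
    "native_name": ["افغانستان", "Shqipëria", "Jersey", np.nan],
    "population": [26023100.0, 2895947.0, 99000.0, np.nan],
})

# Percentage of missing values in each column
missing_pct = df_demo.isnull().mean() * 100
print(missing_pct)  # country 0.0, native_name 25.0, population 25.0

# Rows with at least one missing value
rows_with_gaps = df_demo[df_demo.isnull().any(axis=1)]
print(rows_with_gaps["country"].tolist())  # ['Wales']
```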


Removing Rows with Missing Values


In [27]:
df.dropna()  # returns a new DataFrame; df itself is unchanged unless the result is reassigned

,country,native_name,iso2,iso3,population,area,capital,capital_lat,capital_lng,region,continent
0,Afghanistan,افغانستان,AF,AFG,26023100.0,652230.0,Kabul,34.526011,69.177684,Southern and Central Asia,Asia
1,Albania,Shqipëria,AL,ALB,2895947.0,28748.0,Tirana,41.326873,19.818791,Southern Europe,Europe
2,Algeria,الجزائر,DZ,DZA,38700000.0,2381741.0,Algiers,36.775361,3.060188,Northern Africa,Africa
3,American Samoa,American Samoa,AS,ASM,55519.0,199.0,Pago Pago,-14.275479,-170.704830,Polynesia,Oceania
4,Angola,Angola,AO,AGO,24383301.0,1246700.0,Luanda,-8.827270,13.243951,Central Africa,Africa
...,...,...,...,...,...,...,...,...,...,...,...
209,Wallis and Futuna,Wallis et Futuna,WF,WLF,13135.0,142.0,Mata-Utu,-13.282042,-176.174022,Polynesia,Oceania
210,Western Sahara,الصحراء الغربية,EH,ESH,586000.0,266000.0,El Aaiún,27.154512,-13.195392,Northern Africa,Africa
211,Yemen,اليَمَن,YE,YEM,25956000.0,527968.0,Sana'a,15.353857,44.205884,Middle East,Asia
212,Zambia,Zambia,ZM,ZMB,15023315.0,752612.0,Lusaka,-15.416449,28.282154,Eastern Africa,Europe


In [28]:
df

,country,native_name,iso2,iso3,population,area,capital,capital_lat,capital_lng,region,continent
0,Afghanistan,افغانستان,AF,AFG,26023100.0,652230.0,Kabul,34.526011,69.177684,Southern and Central Asia,Asia
1,Albania,Shqipëria,AL,ALB,2895947.0,28748.0,Tirana,41.326873,19.818791,Southern Europe,Europe
2,Algeria,الجزائر,DZ,DZA,38700000.0,2381741.0,Algiers,36.775361,3.060188,Northern Africa,Africa
3,American Samoa,American Samoa,AS,ASM,55519.0,199.0,Pago Pago,-14.275479,-170.704830,Polynesia,Oceania
4,Angola,Angola,AO,AGO,24383301.0,1246700.0,Luanda,-8.827270,13.243951,Central Africa,Africa
...,...,...,...,...,...,...,...,...,...,...,...
209,Wallis and Futuna,Wallis et Futuna,WF,WLF,13135.0,142.0,Mata-Utu,-13.282042,-176.174022,Polynesia,Oceania
210,Western Sahara,الصحراء الغربية,EH,ESH,586000.0,266000.0,El Aaiún,27.154512,-13.195392,Northern Africa,Africa
211,Yemen,اليَمَن,YE,YEM,25956000.0,527968.0,Sana'a,15.353857,44.205884,Middle East,Asia
212,Zambia,Zambia,ZM,ZMB,15023315.0,752612.0,Lusaka,-15.416449,28.282154,Eastern Africa,Europe
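`dropna()` returns a new DataFrame and leaves `df` unchanged. The sketch below, on rows copied from the dataset, shows the difference between dropping rows with any missing value and dropping only rows that miss a chosen column via `subset`, then keeps the result by reassigning:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    "country": ["Afghanistan", "Australia", "Jersey", "Wales"],
    "population": [26023100.0, np.nan, 99000.0, np.nan],
    "capital": ["Kabul", "Canberra", "Saint Helier", np.nan],
})

# Drop every row that has a missing value in any column
print(df_demo.dropna()["country"].tolist())  # ['Afghanistan', 'Jersey']

# Drop only rows whose capital is missing; Australia's missing population is tolerated
print(df_demo.dropna(subset=["capital"])["country"].tolist())
# ['Afghanistan', 'Australia', 'Jersey']

# Reassign to keep the cleaned version
df_demo = df_demo.dropna(subset=["capital"])
print(len(df_demo))  # 3
```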


Filling Missing Values

In [29]:
df.fillna(9)  # returns a copy with every missing value, in every column, replaced by 9

,country,native_name,iso2,iso3,population,area,capital,capital_lat,capital_lng,region,continent
0,Afghanistan,افغانستان,AF,AFG,26023100.0,652230.0,Kabul,34.526011,69.177684,Southern and Central Asia,Asia
1,Albania,Shqipëria,AL,ALB,2895947.0,28748.0,Tirana,41.326873,19.818791,Southern Europe,Europe
2,Algeria,الجزائر,DZ,DZA,38700000.0,2381741.0,Algiers,36.775361,3.060188,Northern Africa,Africa
3,American Samoa,American Samoa,AS,ASM,55519.0,199.0,Pago Pago,-14.275479,-170.704830,Polynesia,Oceania
4,Angola,Angola,AO,AGO,24383301.0,1246700.0,Luanda,-8.827270,13.243951,Central Africa,Africa
...,...,...,...,...,...,...,...,...,...,...,...
209,Wallis and Futuna,Wallis et Futuna,WF,WLF,13135.0,142.0,Mata-Utu,-13.282042,-176.174022,Polynesia,Oceania
210,Western Sahara,الصحراء الغربية,EH,ESH,586000.0,266000.0,El Aaiún,27.154512,-13.195392,Northern Africa,Africa
211,Yemen,اليَمَن,YE,YEM,25956000.0,527968.0,Sana'a,15.353857,44.205884,Middle East,Asia
212,Zambia,Zambia,ZM,ZMB,15023315.0,752612.0,Lusaka,-15.416449,28.282154,Eastern Africa,Europe


In [30]:
df['country'] = df['country'].fillna('Unknown')  # text placeholder; country has no missing values here, so nothing changes
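Filling every gap with one constant, as with `fillna(9)` above, mixes numbers into text columns. `fillna` also accepts a dict that maps each column to its own fill value. In this sketch the median is used for `population` and the placeholder "Unknown" for `region`; both are illustrative choices rather than a general recommendation, since median imputation can distort the data. Row values are copied from the dataset.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    "country": ["Afghanistan", "Australia", "Jersey"],
    "population": [26023100.0, np.nan, 99000.0],
    "region": ["Southern and Central Asia", "Australia and New Zealand", np.nan],
})

# Each column gets its own fill value; fillna returns a new DataFrame
filled = df_demo.fillna({
    "population": df_demo["population"].median(),  # median of the known values
    "region": "Unknown",
})
print(filled)
```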



9. Selecting Data Using Indexing
Integer-Based Indexing with iloc[]


In [31]:
df.iloc[10:233]

,country,native_name,iso2,iso3,population,area,capital,capital_lat,capital_lng,region,continent
10,Australia,Australia,AU,AUS,,7692024.0,Canberra,-35.297591,149.101268,Australia and New Zealand,Oceania
11,Austria,Österreich,AT,AUT,8527230.0,83871.0,Vienna,48.208354,16.372504,Western Europe,Europe
12,Azerbaijan,Azərbaycan,AZ,AZE,9552500.0,86600.0,Baku,40.375443,49.832675,Middle East,Asia
13,Bahrain,‏البحرين,BH,BHR,1316500.0,765.0,Manama,26.223504,50.582244,Middle East,Asia
14,Bangladesh,বাংলাদেশ,BD,BGD,157486000.0,147570.0,Dhaka,23.759357,90.378814,Southern and Central Asia,Asia
...,...,...,...,...,...,...,...,...,...,...,...
209,Wallis and Futuna,Wallis et Futuna,WF,WLF,13135.0,142.0,Mata-Utu,-13.282042,-176.174022,Polynesia,Oceania
210,Western Sahara,الصحراء الغربية,EH,ESH,586000.0,266000.0,El Aaiún,27.154512,-13.195392,Northern Africa,Africa
211,Yemen,اليَمَن,YE,YEM,25956000.0,527968.0,Sana'a,15.353857,44.205884,Middle East,Asia
212,Zambia,Zambia,ZM,ZMB,15023315.0,752612.0,Lusaka,-15.416449,28.282154,Eastern Africa,Europe


In [32]:
df.iloc[33, 10]

'North America'

In [33]:
df.iloc[0:2]

,country,native_name,iso2,iso3,population,area,capital,capital_lat,capital_lng,region,continent
0,Afghanistan,افغانستان,AF,AFG,26023100.0,652230.0,Kabul,34.526011,69.177684,Southern and Central Asia,Asia
1,Albania,Shqipëria,AL,ALB,2895947.0,28748.0,Tirana,41.326873,19.818791,Southern Europe,Europe


In [34]:
df.iloc[0:3, 1:3]

,native_name,iso2
0,افغانستان,AF
1,Shqipëria,AL
2,الجزائر,DZ


Label-Based Indexing with loc[]

In [35]:
print(df.loc[60])

country              Fiji
native_name          Fiji
iso2                   FJ
iso3                  FJI
population       859178.0
area              18272.0
capital              Suva
capital_lat    -18.141588
capital_lng    178.442166
region          Melanesia
continent         Oceania
Name: 60, dtype: object


In [36]:
df.loc[[0, 100, 200]]

,country,native_name,iso2,iso3,population,area,capital,capital_lat,capital_lng,region,continent
0,Afghanistan,افغانستان,AF,AFG,26023100.0,652230.0,Kabul,34.526011,69.177684,Southern and Central Asia,Asia
100,Kiribati,Kiribati,KI,KIR,106461.0,811.0,South Tarawa,1.349078,173.038651,Micronesia,Oceania
200,United Arab Emirates,دولة الإمارات العربية المتحدة,AE,ARE,9446000.0,83600.0,Abu Dhabi,24.474796,54.370576,Middle East,Asia


In [37]:
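# Note: with a single argument, loc slices the row labels, not the columns;
# df.loc[:, :"iso2"] would select the columns from the first one up to iso2
print(df.loc[: "iso2"])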

               country       native_name iso2 iso3  population       area  \
0          Afghanistan         افغانستان   AF  AFG  26023100.0   652230.0   
1              Albania         Shqipëria   AL  ALB   2895947.0    28748.0   
2              Algeria           الجزائر   DZ  DZA  38700000.0  2381741.0   
3       American Samoa    American Samoa   AS  ASM     55519.0      199.0   
4               Angola            Angola   AO  AGO  24383301.0  1246700.0   
..                 ...               ...  ...  ...         ...        ...   
209  Wallis and Futuna  Wallis et Futuna   WF  WLF     13135.0      142.0   
210     Western Sahara   الصحراء الغربية   EH  ESH    586000.0   266000.0   
211              Yemen           اليَمَن   YE  YEM  25956000.0   527968.0   
212             Zambia            Zambia   ZM  ZMB  15023315.0   752612.0   
213           Zimbabwe          Zimbabwe   ZW  ZWE  13061239.0   390757.0   

       capital  capital_lat  capital_lng                     region contine

In [38]:
print(df.loc[:, ["iso2", "iso3"]])

    iso2 iso3
0     AF  AFG
1     AL  ALB
2     DZ  DZA
3     AS  ASM
4     AO  AGO
..   ...  ...
209   WF  WLF
210   EH  ESH
211   YE  YEM
212   ZM  ZMB
213   ZW  ZWE

[214 rows x 2 columns]
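To select a range of columns by label, the slice goes in the second position of `loc`. Unlike Python slices, label slices in `loc` include the end label, as this sketch on a two-row frame shows:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "country": ["Afghanistan", "Albania"],
    "native_name": ["افغانستان", "Shqipëria"],
    "iso2": ["AF", "AL"],
    "iso3": ["AFG", "ALB"],
})

# Rows: all (:). Columns: from the first column up to and including "iso2"
subset = df_demo.loc[:, :"iso2"]
print(subset.columns.tolist())  # ['country', 'native_name', 'iso2']
```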


10. Filtering Data Using Conditions

In [39]:
# Exact equality on a float column only matches values stored as exactly 9.0
print(df.loc[df["capital_lng"] == 9])

Empty DataFrame
Columns: [country, native_name, iso2, iso3, population, area, capital, capital_lat, capital_lng, region, continent]
Index: []
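The result is empty because no capital longitude is exactly 9. For floating-point columns, a range test with `between` (inclusive on both ends by default) or a tolerance with `np.isclose` is usually closer to what is intended. A sketch using four longitudes copied from the sorted output later in the notebook:

```python
import numpy as np
import pandas as pd

lng = pd.Series([4.351697, 7.451451, 9.522796, 6.129675],
                index=["Brussels", "Bern", "Vaduz", "Luxembourg"])

# Exact equality finds nothing
print((lng == 9).sum())  # 0

# A range: longitudes from 9 to 10 inclusive
print(lng[lng.between(9, 10)].index.tolist())  # ['Vaduz']

# A tolerance: within 0.01 of 9.52
print(lng[np.isclose(lng, 9.52, atol=0.01)].index.tolist())  # ['Vaduz']
```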


11. Handling Duplicates
Checking for Duplicates


In [40]:
print("\nDuplicate Rows:", df.duplicated().sum())


Duplicate Rows: 0


Removing Duplicate Rows

In [41]:
df.drop_duplicates(inplace=True)  # modifies df directly instead of returning a copy
df

,country,native_name,iso2,iso3,population,area,capital,capital_lat,capital_lng,region,continent
0,Afghanistan,افغانستان,AF,AFG,26023100.0,652230.0,Kabul,34.526011,69.177684,Southern and Central Asia,Asia
1,Albania,Shqipëria,AL,ALB,2895947.0,28748.0,Tirana,41.326873,19.818791,Southern Europe,Europe
2,Algeria,الجزائر,DZ,DZA,38700000.0,2381741.0,Algiers,36.775361,3.060188,Northern Africa,Africa
3,American Samoa,American Samoa,AS,ASM,55519.0,199.0,Pago Pago,-14.275479,-170.704830,Polynesia,Oceania
4,Angola,Angola,AO,AGO,24383301.0,1246700.0,Luanda,-8.827270,13.243951,Central Africa,Africa
...,...,...,...,...,...,...,...,...,...,...,...
209,Wallis and Futuna,Wallis et Futuna,WF,WLF,13135.0,142.0,Mata-Utu,-13.282042,-176.174022,Polynesia,Oceania
210,Western Sahara,الصحراء الغربية,EH,ESH,586000.0,266000.0,El Aaiún,27.154512,-13.195392,Northern Africa,Africa
211,Yemen,اليَمَن,YE,YEM,25956000.0,527968.0,Sana'a,15.353857,44.205884,Middle East,Asia
212,Zambia,Zambia,ZM,ZMB,15023315.0,752612.0,Lusaka,-15.416449,28.282154,Eastern Africa,Europe
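The dataset has no duplicate rows, so the cells above change nothing. This sketch uses a hypothetical frame with one repeated row to show the `keep` and `subset` options of `duplicated` and `drop_duplicates`:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "country": ["Albania", "Albania", "Algeria"],
    "capital": ["Tirana", "Tirana", "Algiers"],
    "population": [2895947.0, 2895947.0, 38700000.0],
})

# By default only the later copies are flagged
print(df_demo.duplicated().sum())  # 1

# keep=False flags every copy of a repeated row
print(df_demo.duplicated(keep=False).tolist())  # [True, True, False]

# subset= compares only the listed columns
deduped = df_demo.drop_duplicates(subset=["country"], keep="first")
print(deduped["country"].tolist())  # ['Albania', 'Algeria']
```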


12. Adding and Removing Columns
Adding a New Column


In [42]:
df['new_column2'] = df['country'] + df['native_name']  # element-wise string concatenation, row by row
df

,country,native_name,iso2,iso3,population,area,capital,capital_lat,capital_lng,region,continent,new_column2
0,Afghanistan,افغانستان,AF,AFG,26023100.0,652230.0,Kabul,34.526011,69.177684,Southern and Central Asia,Asia,Afghanistanافغانستان
1,Albania,Shqipëria,AL,ALB,2895947.0,28748.0,Tirana,41.326873,19.818791,Southern Europe,Europe,AlbaniaShqipëria
2,Algeria,الجزائر,DZ,DZA,38700000.0,2381741.0,Algiers,36.775361,3.060188,Northern Africa,Africa,Algeriaالجزائر
3,American Samoa,American Samoa,AS,ASM,55519.0,199.0,Pago Pago,-14.275479,-170.704830,Polynesia,Oceania,American SamoaAmerican Samoa
4,Angola,Angola,AO,AGO,24383301.0,1246700.0,Luanda,-8.827270,13.243951,Central Africa,Africa,AngolaAngola
...,...,...,...,...,...,...,...,...,...,...,...,...
209,Wallis and Futuna,Wallis et Futuna,WF,WLF,13135.0,142.0,Mata-Utu,-13.282042,-176.174022,Polynesia,Oceania,Wallis and FutunaWallis et Futuna
210,Western Sahara,الصحراء الغربية,EH,ESH,586000.0,266000.0,El Aaiún,27.154512,-13.195392,Northern Africa,Africa,Western Saharaالصحراء الغربية
211,Yemen,اليَمَن,YE,YEM,25956000.0,527968.0,Sana'a,15.353857,44.205884,Middle East,Asia,Yemenاليَمَن
212,Zambia,Zambia,ZM,ZMB,15023315.0,752612.0,Lusaka,-15.416449,28.282154,Eastern Africa,Europe,ZambiaZambia
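Columns can also be derived with arithmetic, which is applied element-wise row by row. The sketch below computes a hypothetical `density` column (population divided by area; its units follow whatever units the `area` column uses, which the dataset does not state) and joins two text columns with `str.cat`, which accepts a separator. Values are copied from the dataset; the column names `density` and `label` are illustrative.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "American Samoa"],
    "capital": ["Kabul", "Tirana", "Pago Pago"],
    "population": [26023100.0, 2895947.0, 55519.0],
    "area": [652230.0, 28748.0, 199.0],
})

# Numeric columns divide element-wise
df_demo["density"] = df_demo["population"] / df_demo["area"]

# Text columns joined with a separator
df_demo["label"] = df_demo["country"].str.cat(df_demo["capital"], sep=" - ")

print(df_demo[["label", "density"]].round(1))
```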


Dropping a Column

In [43]:
df = df.drop('new_column2', axis=1)
df

,country,native_name,iso2,iso3,population,area,capital,capital_lat,capital_lng,region,continent
0,Afghanistan,افغانستان,AF,AFG,26023100.0,652230.0,Kabul,34.526011,69.177684,Southern and Central Asia,Asia
1,Albania,Shqipëria,AL,ALB,2895947.0,28748.0,Tirana,41.326873,19.818791,Southern Europe,Europe
2,Algeria,الجزائر,DZ,DZA,38700000.0,2381741.0,Algiers,36.775361,3.060188,Northern Africa,Africa
3,American Samoa,American Samoa,AS,ASM,55519.0,199.0,Pago Pago,-14.275479,-170.704830,Polynesia,Oceania
4,Angola,Angola,AO,AGO,24383301.0,1246700.0,Luanda,-8.827270,13.243951,Central Africa,Africa
...,...,...,...,...,...,...,...,...,...,...,...
209,Wallis and Futuna,Wallis et Futuna,WF,WLF,13135.0,142.0,Mata-Utu,-13.282042,-176.174022,Polynesia,Oceania
210,Western Sahara,الصحراء الغربية,EH,ESH,586000.0,266000.0,El Aaiún,27.154512,-13.195392,Northern Africa,Africa
211,Yemen,اليَمَن,YE,YEM,25956000.0,527968.0,Sana'a,15.353857,44.205884,Middle East,Asia
212,Zambia,Zambia,ZM,ZMB,15023315.0,752612.0,Lusaka,-15.416449,28.282154,Eastern Africa,Europe




13. Sorting Data
Sorting by a Column

In [45]:
df=df.sort_values(by='region', ascending=False)
print("\nSorted Data by region:")
df


Sorted Data by region:


,country,native_name,iso2,iso3,population,area,capital,capital_lat,capital_lng,region,continent
16,Belgium,België,BE,BEL,11225469.0,30528.0,Brussels,50.846557,4.351697,Western Europe,Europe
183,Switzerland,Schweiz,CH,CHE,8183800.0,41284.0,Bern,46.948271,7.451451,Western Europe,Europe
109,Liechtenstein,Liechtenstein,LI,LIE,37132.0,160.0,Vaduz,47.139286,9.522796,Western Europe,Europe
111,Luxembourg,Luxembourg,LU,LUX,549700.0,2586.0,Luxembourg,49.815868,6.129675,Western Europe,Europe
133,Netherlands,Nederland,NL,NLD,16881000.0,41850.0,Amsterdam,52.372760,4.893604,Western Europe,Europe
...,...,...,...,...,...,...,...,...,...,...,...
93,Ivory Coast,Côte d'Ivoire,CI,CIV,23821000.0,322463.0,Yamoussoukro,6.809107,-5.273263,,
96,Jersey,Jersey,JE,JEY,99000.0,116.0,Saint Helier,47.384387,4.683325,,
164,Serbia,Srbija,RS,SRB,7186862.0,49037.0,Belgrade,44.817813,20.456897,,
186,Taiwan,臺灣,TW,TWN,23424615.0,36193.0,Taipei,25.037520,121.563680,,
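`sort_values` also accepts a list of columns, with a matching list for `ascending`. Rows whose sort key is missing are placed last by default (`na_position="last"`), which is why countries without a region appear at the end above. A sketch with rows from the output above:

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({
    "country": ["Luxembourg", "Jersey", "Belgium", "Switzerland"],
    "region": ["Western Europe", np.nan, "Western Europe", "Western Europe"],
    "population": [549700.0, 99000.0, 11225469.0, 8183800.0],
})

# Sort by region (A to Z), then by population (largest first) within each region
ordered = df_demo.sort_values(by=["region", "population"], ascending=[True, False])
print(ordered["country"].tolist())
# ['Belgium', 'Switzerland', 'Luxembourg', 'Jersey']
```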


14. Grouping and Aggregation
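Grouping by 'country' and Calculating Aggregations

Because every country appears only once, each group below holds a single row: the mean and median equal that country's area, and the variance and standard deviation are NaN because Pandas computes the sample versions (dividing by n - 1), which are undefined for a single value.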

In [47]:
grouped_df = df.groupby('country')['area'].mean()
print("Mean area by country:")
print(grouped_df)

Mean area by country:
country
Afghanistan           652230.0
Albania                28748.0
Algeria              2381741.0
American Samoa           199.0
Angola               1246700.0
                       ...    
Wallis and Futuna        142.0
Western Sahara        266000.0
Yemen                 527968.0
Zambia                752612.0
Zimbabwe              390757.0
Name: area, Length: 214, dtype: float64


In [48]:
grouped_df = df.groupby('country')['area'].median()
print("Median area by country:")
print(grouped_df)

Median area by country:
country
Afghanistan           652230.0
Albania                28748.0
Algeria              2381741.0
American Samoa           199.0
Angola               1246700.0
                       ...    
Wallis and Futuna        142.0
Western Sahara        266000.0
Yemen                 527968.0
Zambia                752612.0
Zimbabwe              390757.0
Name: area, Length: 214, dtype: float64


In [49]:
grouped_df = df.groupby('country')['area'].var()
print("Variance of area by country:")
print(grouped_df)

Variance of area by country:
country
Afghanistan         NaN
Albania             NaN
Algeria             NaN
American Samoa      NaN
Angola              NaN
                     ..
Wallis and Futuna   NaN
Western Sahara      NaN
Yemen               NaN
Zambia              NaN
Zimbabwe            NaN
Name: area, Length: 214, dtype: float64


In [50]:
grouped_df = df.groupby('country')['area'].std()
print("Standard deviation of area by country:")
print(grouped_df)

Standard deviation of area by country:
country
Afghanistan         NaN
Albania             NaN
Algeria             NaN
American Samoa      NaN
Angola              NaN
                     ..
Wallis and Futuna   NaN
Western Sahara      NaN
Yemen               NaN
Zambia              NaN
Zimbabwe            NaN
Name: area, Length: 214, dtype: float64
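Grouping is more informative on a column whose values repeat, such as `continent`. This sketch computes several statistics at once with `agg` on five rows from the dataset and confirms that grouping by the unique `country` column yields NaN standard deviations:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "country": ["Afghanistan", "Armenia", "Bahrain", "Albania", "Belgium"],
    "continent": ["Asia", "Asia", "Asia", "Europe", "Europe"],
    "area": [652230.0, 29743.0, 765.0, 28748.0, 30528.0],
})

# Several statistics per continent in one call
stats = df_demo.groupby("continent")["area"].agg(["count", "mean", "median", "std"])
print(stats)

# One row per country means no sample standard deviation can be computed
single = df_demo.groupby("country")["area"].std()
print(single.isna().all())  # True
```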


15. Renaming a Column

In [51]:
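# Caution: a "capital" column already exists, so this rename leaves two columns
# named "capital" (visible in the outputs below); an unused name avoids the clash
df = df.rename(columns={"area": "capital"})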
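The clash is easy to reproduce on a one-row frame. In this sketch, `area_size` is a hypothetical replacement name; any label not already in use keeps the columns unique.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "country": ["Belgium"],
    "area": [30528.0],
    "capital": ["Brussels"],
})

# rename() does not check for existing labels, so this creates a duplicate
clashing = df_demo.rename(columns={"area": "capital"})
print(clashing.columns.tolist())  # ['country', 'capital', 'capital']

# An unused name keeps every column label unique
renamed = df_demo.rename(columns={"area": "area_size"})
print(renamed.columns.is_unique)  # True
```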

16. Setting and Resetting the Index
Setting a Multi-Level Index

In [52]:
df = df.set_index(["iso2", "country"])

In [53]:
df

iso2,country,native_name,iso3,population,capital,capital,capital_lat,capital_lng,region,continent
BE,Belgium,België,BEL,11225469.0,30528.0,Brussels,50.846557,4.351697,Western Europe,Europe
CH,Switzerland,Schweiz,CHE,8183800.0,41284.0,Bern,46.948271,7.451451,Western Europe,Europe
LI,Liechtenstein,Liechtenstein,LIE,37132.0,160.0,Vaduz,47.139286,9.522796,Western Europe,Europe
LU,Luxembourg,Luxembourg,LUX,549700.0,2586.0,Luxembourg,49.815868,6.129675,Western Europe,Europe
NL,Netherlands,Nederland,NLD,16881000.0,41850.0,Amsterdam,52.372760,4.893604,Western Europe,Europe
...,...,...,...,...,...,...,...,...,...,...
CI,Ivory Coast,Côte d'Ivoire,CIV,23821000.0,322463.0,Yamoussoukro,6.809107,-5.273263,,
JE,Jersey,Jersey,JEY,99000.0,116.0,Saint Helier,47.384387,4.683325,,
RS,Serbia,Srbija,SRB,7186862.0,49037.0,Belgrade,44.817813,20.456897,,
TW,Taiwan,臺灣,TWN,23424615.0,36193.0,Taipei,25.037520,121.563680,,
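With a two-level index, rows are looked up with a tuple holding one value per level, or with a value for the first level alone. A sketch on three rows from the output above:

```python
import pandas as pd

df_demo = pd.DataFrame({
    "iso2": ["BE", "CH", "LU"],
    "country": ["Belgium", "Switzerland", "Luxembourg"],
    "capital": ["Brussels", "Bern", "Luxembourg"],
}).set_index(["iso2", "country"])

# Full key: one value per index level, then the column
print(df_demo.loc[("CH", "Switzerland"), "capital"])  # Bern

# First level only: returns the matching rows, indexed by the remaining level
print(df_demo.loc["BE"])
```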


Resetting the Index

In [58]:
df = df.reset_index()

In [59]:
df

,iso2,country,native_name,iso3,population,capital,capital,capital_lat,capital_lng,region,continent
0,BE,Belgium,België,BEL,11225469.0,30528.0,Brussels,50.846557,4.351697,Western Europe,Europe
1,CH,Switzerland,Schweiz,CHE,8183800.0,41284.0,Bern,46.948271,7.451451,Western Europe,Europe
2,LI,Liechtenstein,Liechtenstein,LIE,37132.0,160.0,Vaduz,47.139286,9.522796,Western Europe,Europe
3,LU,Luxembourg,Luxembourg,LUX,549700.0,2586.0,Luxembourg,49.815868,6.129675,Western Europe,Europe
4,NL,Netherlands,Nederland,NLD,16881000.0,41850.0,Amsterdam,52.372760,4.893604,Western Europe,Europe
...,...,...,...,...,...,...,...,...,...,...,...
209,CI,Ivory Coast,Côte d'Ivoire,CIV,23821000.0,322463.0,Yamoussoukro,6.809107,-5.273263,,
210,JE,Jersey,Jersey,JEY,99000.0,116.0,Saint Helier,47.384387,4.683325,,
211,RS,Serbia,Srbija,SRB,7186862.0,49037.0,Belgrade,44.817813,20.456897,,
212,TW,Taiwan,臺灣,TWN,23424615.0,36193.0,Taipei,25.037520,121.563680,,


17. Getting the Dimensions of the DataFrame

In [54]:
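# Note: this cell ran as In [54], after set_index (In [52]) but before reset_index (In [58]),
# so iso2 and country were still in the index and are not counted as columns
df.shape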

(214, 9)