# World Population by Country
**2023 World Population: 8,005,176,000**

## About Dataset

**CONTENT**

The US Census Bureau's world population clock estimated the global population at 7,922,312,800 people as of September 2022, with 8 billion expected by mid-November 2022. This total far exceeds the 2015 world population of 7.2 billion. The world's population continues to increase by roughly 140 people per minute, with births outnumbering deaths in most countries.

Overall, however, the rate of population growth has been slowing for several decades. This slowdown is expected to continue until the rate of population growth reaches zero (an equal number of births and deaths) around 2080-2100, at a population of approximately 10.4 billion people. After this time, the population growth rate is expected to turn negative, resulting in global population decline.
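
To put the "140 people per minute" figure in perspective, it can be converted into a yearly increase and an implied annual growth rate. The sketch below uses only the two numbers quoted above; the variable names are illustrative, and leap years are ignored.

```python
# Back-of-the-envelope check of the growth figures quoted in the text.
PEOPLE_PER_MINUTE = 140               # net increase quoted above
POPULATION_SEP_2022 = 7_922_312_800   # September 2022 estimate quoted above

MINUTES_PER_YEAR = 60 * 24 * 365      # ignores leap years

net_increase_per_year = PEOPLE_PER_MINUTE * MINUTES_PER_YEAR
annual_growth_rate = net_increase_per_year / POPULATION_SEP_2022

print(f"Net increase per year: {net_increase_per_year:,}")
print(f"Implied annual growth rate: {annual_growth_rate:.2%}")
```

This works out to about 73.6 million people per year, or a little under 1% annual growth.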

*Countries with more than 1 billion people*

China was the most populous country in the world as of September 2022, with a population estimated at more than 1.42 billion. Only one other country has a population of more than 1 billion people: India, whose population was estimated at 1.41 billion and rising. (In the 2023 figures of the dataset analyzed below, India ranks first and China second.)

**Data Collection**

We will work with a dataset of world population by country. It contains the population of each country in 1980, 2000, 2010, 2022 and 2023, projections for 2030 and 2050, and each country's growth rate, density per km², and related fields.\
We will use Python libraries (Pandas, NumPy, Matplotlib and Seaborn) to analyze the dataset.
The dataset can be downloaded [here](https://github.com/AnudipAE/DANLC/blob/master/cleaned.csv).\
The analysis is carried out in a Jupyter Notebook.


**Data Cleaning**

*1. Loading the Dataset*

In [9]:
import pandas as pd
data = pd.read_csv('countries_table.csv')
print(data.head())

         country  rank       area  landAreaKm cca2 cca3  netChange  \
0          India     1  3287590.0   2973190.0   IN  IND     0.4184   
1          China     2  9706961.0   9424702.9   CN  CHN    -0.0113   
2  United States     3  9372610.0   9147420.0   US  USA     0.0581   
3      Indonesia     4  1904569.0   1877519.0   ID  IDN     0.0727   
4       Pakistan     5   881912.0    770880.0   PK  PAK     0.1495   

   growthRate  worldPercentage   density  densityMi  place    pop1980  \
0      0.0081           0.1785  480.5033  1244.5036    356  696828385   
1     -0.0002           0.1781  151.2696   391.7884    156  982372466   
2      0.0050           0.0425   37.1686    96.2666    840  223140018   
3      0.0074           0.0347  147.8196   382.8528    360  148177096   
4      0.0198           0.0300  311.9625   807.9829    586   80624057   

      pop2000     pop2010     pop2022     pop2023     pop2030     pop2050  
0  1059633675  1240613620  1417173173  1428627663  1514994080  1
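
Several columns in the preview above appear to be derived from the raw population and area figures. The sketch below recomputes them for India from the values printed above and the world total quoted in the title. The formulas are assumptions inferred from the numbers, not documented definitions, and the 2.59 km² per square mile factor is likewise a guess that happens to reproduce the printed value.

```python
# Raw figures for India, copied from the preview above.
land_area_km2 = 2_973_190.0
pop_2022 = 1_417_173_173
pop_2023 = 1_428_627_663
world_pop_2023 = 8_005_176_000  # total quoted in the title

# Assumed relationships (inferred from the numbers, not documented).
density = pop_2023 / land_area_km2       # people per km² of land area
density_mi = density * 2.59              # people per square mile (rounded factor)
growth_rate = pop_2023 / pop_2022 - 1    # relative change from 2022 to 2023
world_share = pop_2023 / world_pop_2023  # fraction of the world population

print(round(density, 4), round(density_mi, 4))
print(round(growth_rate, 4), round(world_share, 4))
```

Under these assumptions the results agree with the `density`, `densityMi`, `growthRate` and `worldPercentage` values shown for India to about four decimal places.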

*2. Exploring the Dataset*

In [10]:
import pandas as pd
data = pd.read_csv('countries_table.csv')
print("Dataframe information")
data.info()  # info() prints its report directly and returns None
print("Dataframe Description")
print(data.describe())

Dataframe information
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 234 entries, 0 to 233
Data columns (total 19 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   country          234 non-null    object 
 1   rank             234 non-null    int64  
 2   area             234 non-null    float64
 3   landAreaKm       234 non-null    float64
 4   cca2             233 non-null    object 
 5   cca3             234 non-null    object 
 6   netChange        226 non-null    float64
 7   growthRate       234 non-null    float64
 8   worldPercentage  228 non-null    float64
 9   density          234 non-null    float64
 10  densityMi        234 non-null    float64
 11  place            234 non-null    int64  
 12  pop1980          234 non-null    int64  
 13  pop2000          234 non-null    int64  
 14  pop2010          234 non-null    int64  
 15  pop2022          234 non-null    int64  
 16  pop2023          234 non-null    int64  
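
The `info()` report shows that `cca2`, `netChange` and `worldPercentage` have fewer than 234 non-null entries. A more direct way to see the gaps is to count missing values per column. The sketch below does this on a small hypothetical sample that only mimics the real column names; it is not the actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample; the values are made up for illustration.
sample = pd.DataFrame({
    "country": ["A", "B", "C"],
    "cca2": ["AA", None, "CC"],
    "netChange": [0.41, np.nan, np.nan],
    "pop2023": [100, 200, 300],
})

# Count missing values per column and keep only columns that have any.
missing_counts = sample.isna().sum()
missing_counts = missing_counts[missing_counts > 0].sort_values(ascending=False)
print(missing_counts)

# Select the rows that contain at least one missing value.
rows_with_gaps = sample[sample.isna().any(axis=1)]
print(rows_with_gaps)
```

On the real dataset, the same two expressions can be applied to `data` to see which countries are affected before deciding how to handle them.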

*3. Removing Missing Values*

In [11]:
import pandas as pd
data = pd.read_csv('countries_table.csv')
# Show the non-null counts before removing anything
print("Before Removing nan Values")
data.info()  # info() prints its report directly and returns None
# Drop every row that has at least one NaN value
data.dropna(inplace=True)
print("After Removing nan Values")
data.info()

Before Removing nan Values
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 234 entries, 0 to 233
Data columns (total 19 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   country          234 non-null    object 
 1   rank             234 non-null    int64  
 2   area             234 non-null    float64
 3   landAreaKm       234 non-null    float64
 4   cca2             233 non-null    object 
 5   cca3             234 non-null    object 
 6   netChange        226 non-null    float64
 7   growthRate       234 non-null    float64
 8   worldPercentage  228 non-null    float64
 9   density          234 non-null    float64
 10  densityMi        234 non-null    float64
 11  place            234 non-null    int64  
 12  pop1980          234 non-null    int64  
 13  pop2000          234 non-null    int64  
 14  pop2010          234 non-null    int64  
 15  pop2022          234 non-null    int64  
 16  pop2023          234 non-null    int64  
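
Before dropping rows, it is worth checking whether a missing value is a real gap. One possible explanation for the single missing `cca2` entry (an assumption; the output above does not say which row is affected) is that pandas treats the string `NA` as missing by default, and `NA` is also the two-letter country code for Namibia. The sketch below reproduces that effect on a hypothetical two-row CSV.

```python
import io
import pandas as pd

# Hypothetical two-row CSV; "NA" is the ISO 3166-1 alpha-2 code for Namibia.
csv_text = "country,cca2\nNamibia,NA\nIndia,IN\n"

# Default parsing: the literal string "NA" is converted to NaN.
default_read = pd.read_csv(io.StringIO(csv_text))
print(default_read["cca2"].isna().sum())

# Treat only empty fields as missing, so the string "NA" survives.
strict_read = pd.read_csv(
    io.StringIO(csv_text), keep_default_na=False, na_values=[""]
)
print(strict_read["cca2"].tolist())

# dropna() then removes a different number of rows in each case.
rows_after_default = len(default_read.dropna())
rows_after_strict = len(strict_read.dropna())
print(rows_after_default, rows_after_strict)
```

If this is what happens in the real file, reading it with `keep_default_na=False` would keep that country in the cleaned data instead of silently dropping it.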

*4. Removing Duplicates*

In [13]:
import pandas as pd
# Note: this reloads the raw file, so rows dropped in step 3 are present again
data = pd.read_csv('countries_table.csv')
# Flag rows that are exact copies of an earlier row
duplicate_rows = data.duplicated()
print("Before Removing Number of duplicate rows:", duplicate_rows.sum())
print(data[duplicate_rows])
# Remove the duplicate rows in place
data.drop_duplicates(inplace=True)
# Re-check to confirm that no duplicates remain
duplicate_rows = data.duplicated()
print("After Removing Number of duplicate rows:", duplicate_rows.sum())
print(data[duplicate_rows])
# Save the result for the next step
data.to_csv('after_drop_countries_table.csv', index=False)

Before Removing Number of duplicate rows: 0
Empty DataFrame
Columns: [country, rank, area, landAreaKm, cca2, cca3, netChange, growthRate, worldPercentage, density, densityMi, place, pop1980, pop2000, pop2010, pop2022, pop2023, pop2030, pop2050]
Index: []
After Removing Number of duplicate rows: 0
Empty DataFrame
Columns: [country, rank, area, landAreaKm, cca2, cca3, netChange, growthRate, worldPercentage, density, densityMi, place, pop1980, pop2000, pop2010, pop2022, pop2023, pop2030, pop2050]
Index: []
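
`duplicated()` with no arguments only flags rows that match an earlier row in every column, so a country listed twice with slightly different values would go unnoticed. A stricter check compares a key column only. The sketch below illustrates the difference on a hypothetical sample; using `country` as the key is an assumption about what should be unique.

```python
import pandas as pd

# Hypothetical sample: country "A" appears twice with a different rank.
sample = pd.DataFrame({
    "country": ["A", "A", "B"],
    "rank": [1, 99, 2],
    "pop2023": [100, 100, 200],
})

# Full-row comparison finds nothing, because the rank values differ.
full_row_dupes = int(sample.duplicated().sum())

# Comparing only the key column catches the repeated country.
key_dupes = int(sample.duplicated(subset=["country"]).sum())

# Keep the first occurrence of each country.
deduped = sample.drop_duplicates(subset=["country"], keep="first")

print(full_row_dupes, key_dupes)
print(deduped)
```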


**Dataset (After Cleaning the Data)**

In [14]:
import pandas as pd
data = pd.read_csv('after_drop_countries_table.csv')
print("Dataframe information")
data.info()  # info() prints its report directly and returns None

Dataframe information
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 234 entries, 0 to 233
Data columns (total 19 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   country          234 non-null    object 
 1   rank             234 non-null    int64  
 2   area             234 non-null    float64
 3   landAreaKm       234 non-null    float64
 4   cca2             233 non-null    object 
 5   cca3             234 non-null    object 
 6   netChange        226 non-null    float64
 7   growthRate       234 non-null    float64
 8   worldPercentage  228 non-null    float64
 9   density          234 non-null    float64
 10  densityMi        234 non-null    float64
 11  place            234 non-null    int64  
 12  pop1980          234 non-null    int64  
 13  pop2000          234 non-null    int64  
 14  pop2010          234 non-null    int64  
 15  pop2022          234 non-null    int64  
 16  pop2023          234 non-null    int64  
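
The report above still shows 233, 226 and 228 non-null entries for `cca2`, `netChange` and `worldPercentage`. The saved file was produced in step 4 from a fresh read of the raw CSV, so the `dropna` from step 3 was not carried over. One way to avoid this is to chain both cleaning steps on the same DataFrame before saving. The sketch below shows the idea on a hypothetical in-memory CSV rather than the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for countries_table.csv: one gap and one duplicate row.
raw_csv = (
    "country,netChange,pop2023\n"
    "A,0.1,100\n"
    "B,,200\n"
    "A,0.1,100\n"
    "C,0.3,300\n"
)

data = pd.read_csv(io.StringIO(raw_csv))

# Apply both cleaning steps to the same DataFrame, then renumber the rows.
cleaned = data.dropna().drop_duplicates().reset_index(drop=True)

# Round-trip through CSV (in memory here; a file path works the same way).
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# Verify that no missing values survived the round trip.
remaining_missing = int(reloaded.isna().sum().sum())
print(len(data), "->", len(reloaded), "rows;", remaining_missing, "missing values")
```

Whether dropping rows is the right choice for the real data depends on the analysis; the point here is only that the saved file should reflect every cleaning step that was applied.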