**Exploring Global Population Trends in 2023**

**Introduction**

Greetings, fellow explorers! Today, we embark on a journey into the intricate tapestry of global population trends in the year 2023. This exploration is in response to the enlightening Data Visualization and Exploration lecture and lab conducted as part of AI Saturdays Lagos Cohort 8. If you're curious about the cohort structure, you can find more details [here](https://github.com/AISaturdaysLagos/cohort_structure/tree/main).

Our vessel for this voyage is the 2023 World Population dataset, a treasure trove of information available on Kaggle. Before we set sail, let's understand the significance of comprehending global population trends and why this dataset is a compass guiding us through the vast sea of demographic data.

In a world where change is the only constant, understanding population dynamics is crucial for policymakers, researchers, and anyone keen on unraveling the mysteries of human civilization. The dataset not only provides numbers but tells stories of countries, their growth, challenges, and unique characteristics.

Throughout this exploration, we navigate with the tools and libraries that are the lifeblood of any data enthusiast: pandas for data manipulation; seaborn, plotly, and matplotlib for visualization; and the ever-reliable Jupyter notebook as our analysis canvas.



In [1]:
# Libraries for data handling and visualization
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import plotly.express as px

# Render matplotlib figures inline in the notebook
%matplotlib inline

*Dataset Overview*

Our dataset is a mosaic of countries and territories, each with its own story. The columns weave a narrative of population dynamics, including population size, yearly changes, density, land area, migration patterns, fertility rates, and more. This digital atlas guides us through the diverse landscapes of nations.



In [2]:
# Local path to the Kaggle CSV; adjust it to wherever the file is saved
path = r"C:\Users\Administrator\Documents\AIsat\Dataset\WorldPopulation2023.csv"
# Read the dataset into a DataFrame and display it
df = pd.read_csv(path)
df

| | Rank | Country | Population2023 | YearlyChange | NetChange | Density(P/Km²) | Land Area(Km²) | Migrants(net) | Fert.Rate | MedianAge | UrbanPop% | WorldShare |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 36 | Afghanistan | 42239854 | 2.70 % | 1111083 | 65 | 652860 | -65846 | 4.4 | 17.0 | 26 % | 0.53 % |
| 1 | 138 | Albania | 2832439 | -0.35 % | -9882 | 103 | 27400 | -8000 | 1.4 | 38.0 | 67 % | 0.04 % |
| 2 | 34 | Algeria | 45606480 | 1.57 % | 703255 | 19 | 2381740 | -9999 | 2.8 | 28.0 | 75 % | 0.57 % |
| 3 | 212 | American Samoa | 43914 | -0.81 % | -359 | 220 | 200 | -790 | 2.2 | 29.0 | N.A. | 0.00 % |
| 4 | 202 | Andorra | 80088 | 0.33 % | 264 | 170 | 470 | 200 | 1.1 | 43.0 | 85 % | 0.00 % |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 229 | 225 | Wallis & Futuna | 11502 | -0.60 % | -70 | 82 | 140 | -119 | 1.9 | 37.0 | 0 % | 0.00 % |
| 230 | 172 | Western Sahara | 587259 | 1.96 % | 11273 | 2 | 266000 | 5600 | 2.2 | 32.0 | 95 % | 0.01 % |
| 231 | 44 | Yemen | 34449825 | 2.24 % | 753211 | 65 | 527970 | -29914 | 3.6 | 19.0 | 37 % | 0.43 % |
| 232 | 63 | Zambia | 20569737 | 2.76 % | 552062 | 28 | 743390 | -5000 | 4.2 | 17.0 | 46 % | 0.26 % |



*Exploratory Data Analysis (EDA)*

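Our journey began with a close examination of the dataset using the pandas `info` and `describe` methods. `info` lists each column's data type and non-null count, while `describe` summarizes the numeric columns. Two things stand out in the output: YearlyChange, UrbanPop%, and WorldShare are stored as text (`object`) rather than numbers, and Fert.Rate and MedianAge each have one missing value.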



In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 234 entries, 0 to 233
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Rank            234 non-null    int64  
 1   Country         234 non-null    object 
 2   Population2023  234 non-null    int64  
 3   YearlyChange    234 non-null    object 
 4   NetChange       234 non-null    int64  
 5   Density(P/Km²)  234 non-null    int64  
 6   Land Area(Km²)  234 non-null    int64  
 7   Migrants(net)   234 non-null    int64  
 8   Fert.Rate       233 non-null    float64
 9   MedianAge       233 non-null    float64
 10  UrbanPop%       234 non-null    object 
 11  WorldShare      234 non-null    object 
dtypes: float64(2), int64(6), object(4)
memory usage: 22.1+ KB


In [4]:
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Rank | 234.0 | 117.5 | 67.69417 | 1.0 | 59.25 | 117.5 | 175.75 | 234.0 |
| Population2023 | 234.0 | 34375650.0 | 137386100.0 | 518.0 | 469648.25 | 5643895.0 | 23245367.25 | 1428628000.0 |
| NetChange | 234.0 | 300023.0 | 1001815.0 | -2957105.0 | 236.0 | 28601.5 | 223685.5 | 11454490.0 |
| Density(P/Km²) | 234.0 | 477.4145 | 2320.694 | 0.0 | 38.25 | 96.5 | 242.0 | 24360.0 |
| Land Area(Km²) | 234.0 | 555956.8 | 1691024.0 | 0.0 | 2650.0 | 79720.0 | 407080.0 | 16376870.0 |
| Migrants(net) | 234.0 | 13.01282 | 169833.4 | -910475.0 | -9776.75 | -500.0 | 475.0 | 1784718.0 |
| Fert.Rate | 233.0 | 2.414163 | 1.155913 | 0.8 | 1.6 | 2.0 | 3.0 | 6.7 |
| MedianAge | 233.0 | 31.30901 | 9.628386 | 15.0 | 22.0 | 32.0 | 40.0 | 54.0 |


**Data Preprocessing and Cleaning**

No journey is without its challenges, and our data cleaning adventure was no exception. Armed with the `wrangle_data` function, we stripped the percent signs from the text columns, converted them to numbers, and dealt with the missing values so that the dataset was shipshape for analysis. With not a duplicate in sight, the data stood ready for the next leg of our expedition.


In [5]:
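# Print the unique values in every column to spot formatting problems
for i in df.columns:
    print(i, df[i].unique())

# Strip the percent signs and turn the 'N.A.' placeholder into NaN
columnToClean = ['UrbanPop%', 'WorldShare', 'YearlyChange']
for i in columnToClean:
    df[i] = df[i].replace({'%': '', r'N\.A\.': np.nan}, regex=True).str.strip()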


Rank [ 36 138  34 212 202  42 223 200  33 140 197  55 100  90 177 154   8 187
  97  82 178  77 205 165  80 137 144   7 219 176 110  59  78 171  73  53
  38 220 204 117  67  65   2  28 163 113 222 124 130  85 190 158  89  51
 115 160 203  84  15  68  14 112 152 131 156 159  11 207 231 162 118  23
 184 185 146 141 132  19  47 217  91 206 194 179 192  70  75 148 164  81
 234  88 104  94 180   1   4  17  35 125 201  98  25 139  12  83  66  26
 193 129 109 103 151 122 147 120 107 214 142 168 167  50  62  46 175  58
 174 213 181 126 157 182  10 173 134 215 133 169 230  39  48  27 145 224
  49  72 186 123 106  54   6 232  56 150 208 119 127   5 221 128  92 108
  45  13  37  93 136 143  64   9  76 161 227 229 209 191 218 228 189 216
 188  40  71 105 196 102 114 211 116 149 166  69  24  29  86  32  61 198
 121  31 170  87 101  60  57  95  22  20 155  99 233 195 153  79  18 111
 210 226 199  30  41  96  21   3 135  43 183  52  16 225 172  44  63  74]
Country ['Afghanistan' 'Albania' 'Algeria' 'A

In [6]:
# Convert every column except Country to float
num_col = df.columns.drop('Country')
for i in num_col:
    df[i] = df[i].astype('float')
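
The cleaning steps above could be bundled into a single function such as the `wrangle_data` helper mentioned earlier. Its actual definition is not shown in this article, so what follows is only a sketch of what such a helper might look like. The function body, the `percent_cols` list, and the three-row `sample` frame (values copied from the table above, with one row duplicated on purpose) are assumptions for illustration. Because the article does not say how missing values were handled, the sketch simply leaves them as NaN.

```python
import numpy as np
import pandas as pd


def wrangle_data(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: clean percent columns, fix dtypes, drop duplicates."""
    df = df.copy()
    percent_cols = ["YearlyChange", "UrbanPop%", "WorldShare"]
    for col in percent_cols:
        # Remove the percent sign and surrounding spaces, e.g. "2.70 %" -> "2.70"
        cleaned = df[col].str.replace("%", "", regex=False).str.strip()
        # errors="coerce" turns unparseable text such as "N.A." into NaN
        df[col] = pd.to_numeric(cleaned, errors="coerce")
    # Every column except Country should be numeric
    numeric_cols = df.columns.drop("Country")
    df = df.astype({col: "float" for col in numeric_cols})
    return df.drop_duplicates().reset_index(drop=True)


# Rows copied from the table above; American Samoa is duplicated on purpose
sample = pd.DataFrame(
    {
        "Rank": [36, 212, 212],
        "Country": ["Afghanistan", "American Samoa", "American Samoa"],
        "Population2023": [42239854, 43914, 43914],
        "YearlyChange": ["2.70 %", "-0.81 %", "-0.81 %"],
        "UrbanPop%": ["26 %", "N.A.", "N.A."],
        "WorldShare": ["0.53 %", "0.00 %", "0.00 %"],
    }
)
clean = wrangle_data(sample)
print(clean.dtypes)
print(clean)
```

Using `pd.to_numeric(..., errors="coerce")` is a design choice of this sketch: it handles the `N.A.` placeholder and any other unexpected text in one step, at the cost of silently hiding typos in the raw data.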

**Data Visualization**

With our data prepped, we set sail into the visual seas. Behold, the top 10 most populated countries emerged, their magnitudes visualized in a captivating bar chart. India, China, and the USA stood tall, each with a unique tale to tell.
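
The plotting code for this chart is not included in the article, so here is a minimal sketch of one way to draw it with matplotlib. The `plot_most_populated` helper is a hypothetical name, and the five-row `sample` frame uses values copied from the table shown earlier, so its ranking will not match the full 234-row dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the table above; the full dataset has 234 rows
sample = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Albania", "Algeria", "Yemen", "Zambia"],
        "Population2023": [42239854, 2832439, 45606480, 34449825, 20569737],
    }
)


def plot_most_populated(df, n=10):
    """Bar chart of the n largest populations (hypothetical helper)."""
    top = df.nlargest(n, "Population2023")
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(top["Country"], top["Population2023"] / 1e6)
    ax.set_xlabel("Country")
    ax.set_ylabel("Population (millions)")
    ax.set_title(f"Top {len(top)} most populated countries in the sample")
    ax.tick_params(axis="x", rotation=45)
    fig.tight_layout()
    return top, ax


top, ax = plot_most_populated(sample, n=3)
print(top["Country"].tolist())
```

On the cleaned dataset, calling `plot_most_populated(df)` with the default `n=10` would produce a chart along the lines of the one described here.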



![Bar chart of the 10 most populated countries](attachment:image.png)

On the flip side, we explored the 10 least populated countries and territories, where the populations may be smaller, but their stories are no less intriguing. From Holy See to Wallis & Futuna, each holds a piece of the global puzzle.
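
Selecting the smallest populations is a short pandas operation. The sketch below assumes a cleaned frame with the same column names as the dataset; the five-row `sample` uses values from the table shown earlier, so its result differs from the full dataset's bottom 10.

```python
import pandas as pd

# Sample rows copied from the table above
sample = pd.DataFrame(
    {
        "Country": ["American Samoa", "Andorra", "Wallis & Futuna",
                    "Western Sahara", "Albania"],
        "Population2023": [43914, 80088, 11502, 587259, 2832439],
    }
)

# nsmallest keeps the n rows with the lowest values, sorted ascending
least = sample.nsmallest(3, "Population2023").reset_index(drop=True)
print(least)
```

Passing `10` instead of `3` on the full dataset would give the rows behind the chart that follows.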


![Bar chart of the 10 least populated countries](attachment:image.png)

**Correlation Analysis**

A heatmap became our compass, guiding us through the pairwise correlations between the numeric features. Three relationships stood out on the canvas: the negative dance of density and net migrants, the negative correlation between yearly change and median age, and the positive correlation between fertility rate and yearly change.
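
A heatmap like this can be sketched with `DataFrame.corr` and seaborn's `heatmap`. The exact call used for the article's figure is not shown, so the styling arguments below are assumptions, and the nine-row `sample` (values from the table shown earlier) will not reproduce the coefficients of the full dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sb

# Nine rows copied from the table above, already converted to numbers
sample = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Albania", "Algeria", "American Samoa",
                    "Andorra", "Wallis & Futuna", "Western Sahara",
                    "Yemen", "Zambia"],
        "YearlyChange": [2.70, -0.35, 1.57, -0.81, 0.33, -0.60, 1.96, 2.24, 2.76],
        "Density(P/Km²)": [65, 103, 19, 220, 170, 82, 2, 65, 28],
        "Migrants(net)": [-65846, -8000, -9999, -790, 200, -119, 5600, -29914, -5000],
        "Fert.Rate": [4.4, 1.4, 2.8, 2.2, 1.1, 1.9, 2.2, 3.6, 4.2],
        "MedianAge": [17, 38, 28, 29, 43, 37, 32, 19, 17],
    }
)

# Pearson correlation between every pair of numeric columns
corr = sample.corr(numeric_only=True)

fig, ax = plt.subplots(figsize=(6, 5))
# Fix the colour scale to [-1, 1] so that 0 sits at the neutral midpoint
sb.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation between numeric features (sample rows)")
fig.tight_layout()
print(corr.round(2))
```

Pinning `vmin` and `vmax` keeps positive and negative correlations visually comparable, which is easy to lose when the colour scale is left to adapt to the data.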

![Heatmap of correlations between the numeric features](attachment:image.png)

**Relationships Between Features**

Zooming in on specific relationships, we unraveled the mysteries of population density versus migrants, gaining insights into the ebbs and flows of human movement.
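
One simple way to look at this relationship is a scatter plot with a reference line at zero net migration. The sketch below is an assumption about how such a plot could be drawn, not the code behind the article's figure, and it uses nine rows from the table shown earlier rather than the full dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Nine rows copied from the table above
sample = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Albania", "Algeria", "American Samoa",
                    "Andorra", "Wallis & Futuna", "Western Sahara",
                    "Yemen", "Zambia"],
        "Density(P/Km²)": [65, 103, 19, 220, 170, 82, 2, 65, 28],
        "Migrants(net)": [-65846, -8000, -9999, -790, 200, -119, 5600, -29914, -5000],
    }
)

fig, ax = plt.subplots(figsize=(7, 4))
points = ax.scatter(sample["Density(P/Km²)"], sample["Migrants(net)"])
# Points above this line gained people through migration; points below lost them
ax.axhline(0, color="grey", linewidth=0.8)
ax.set_xlabel("Density (people per km²)")
ax.set_ylabel("Net migrants")
ax.set_title("Population density vs net migration (sample rows)")

# Label the country with the largest net outflow in the sample
outflow = sample.loc[sample["Migrants(net)"].idxmin()]
ax.annotate(outflow["Country"],
            (outflow["Density(P/Km²)"], outflow["Migrants(net)"]))
fig.tight_layout()
```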


![Plot of population density versus net migrants](attachment:image.png)

*Fertility rate and Yearly change*

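Fertility rate and yearly change exhibit a positive correlation: countries with higher fertility rates tend to have faster yearly population growth. This is an association rather than proof that one drives the other, although higher fertility is a plausible contributor to faster growth.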
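
To put a number on a relationship like this, one option is the Pearson coefficient together with a least-squares line. The sketch below uses NumPy on nine value pairs taken from the table shown earlier; it illustrates the calculation only, and the figures it prints describe this small sample, not the full dataset.

```python
import numpy as np

# Fertility rate and yearly change (%) for nine rows from the table above
fert_rate = np.array([4.4, 1.4, 2.8, 2.2, 1.1, 1.9, 2.2, 3.6, 4.2])
yearly_change = np.array([2.70, -0.35, 1.57, -0.81, 0.33, -0.60, 1.96, 2.24, 2.76])

# Pearson correlation coefficient: the off-diagonal entry of the 2x2 matrix
r = np.corrcoef(fert_rate, yearly_change)[0, 1]

# Least-squares straight line: yearly_change ~ slope * fert_rate + intercept
slope, intercept = np.polyfit(fert_rate, yearly_change, deg=1)

print(f"Pearson r = {r:.2f}")
print(f"yearly change = {slope:.2f} * fertility rate + {intercept:.2f}")
```

A positive `r` and a positive `slope` both say the same thing here: in the sample, higher fertility goes together with faster growth.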

![Plot of fertility rate versus yearly change](attachment:image.png)

*Yearly change and Median age*

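Yearly change and median age show a negative correlation: countries with older populations tend to grow more slowly, and some are shrinking. The correlation describes how the two move together and does not, on its own, establish cause.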
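
Another way to see this pattern is to group countries into median-age bands and compare the average yearly change in each band. The band edges and labels below are arbitrary choices made for illustration, and the nine rows come from the table shown earlier, so the averages describe the sample only.

```python
import pandas as pd

# Nine rows copied from the table above
sample = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Albania", "Algeria", "American Samoa",
                    "Andorra", "Wallis & Futuna", "Western Sahara",
                    "Yemen", "Zambia"],
        "MedianAge": [17, 38, 28, 29, 43, 37, 32, 19, 17],
        "YearlyChange": [2.70, -0.35, 1.57, -0.81, 0.33, -0.60, 1.96, 2.24, 2.76],
    }
)

# Assumed band edges; pd.cut includes the right edge of each interval
sample["AgeBand"] = pd.cut(
    sample["MedianAge"],
    bins=[0, 25, 35, 100],
    labels=["25 or younger", "26 to 35", "over 35"],
)

# Average yearly change (%) within each median-age band
means = sample.groupby("AgeBand", observed=True)["YearlyChange"].mean()
print(means.round(2))
```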

![Plot of yearly change versus median age](attachment:image.png)

**Additional Insights**

Our expedition wouldn't be complete without a few detours. The top and bottom 10 fertility rates sparked curiosity, revealing the extremes of population dynamics. Hong Kong and Niger sit worlds apart in fertility, yet both offer unique perspectives on demographic shifts.
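
Both extremes can be pulled from one sorted frame. The sketch below shows the idea on nine rows from the table shown earlier; Hong Kong and Niger are not among them, so the output differs from the article's charts. The `Group` column and the choice of `n = 3` are assumptions for illustration.

```python
import pandas as pd

# Nine rows copied from the table above
sample = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Albania", "Algeria", "American Samoa",
                    "Andorra", "Wallis & Futuna", "Western Sahara",
                    "Yemen", "Zambia"],
        "Fert.Rate": [4.4, 1.4, 2.8, 2.2, 1.1, 1.9, 2.2, 3.6, 4.2],
    }
)

n = 3  # use 10 on the full dataset
ranked = sample.sort_values("Fert.Rate", ascending=False).reset_index(drop=True)

# The first n rows have the highest rates; the last n rows have the lowest
highest = ranked.head(n).assign(Group="highest")
lowest = ranked.tail(n).sort_values("Fert.Rate").assign(Group="lowest")

extremes = pd.concat([highest, lowest], ignore_index=True)
print(extremes)
```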

![image.png](attachment:image.png)

![image.png](attachment:image.png)

**Conclusion**

As our exploration concludes, we stand at the intersection of data and storytelling. This journey reaffirms the importance of data visualization in not just presenting information but in crafting narratives that resonate. Understanding global population trends isn't just about numbers; it's about stories, connections, and the threads that weave us together.

Reflecting on this assignment, the experience was akin to traversing uncharted territories, and the lessons learned are the treasures we bring back. To fellow voyagers, keep exploring, keep visualizing, and let data be your guide.

*Article Resources*

For those eager to dive deeper into the data seas, here are some resources that guided us:

- [Kaggle - 2023 World Population Dataset](https://www.kaggle.com/datasets/joebeachcapital/world-population-by-country-2023)
- [Pandas Documentation](https://pandas.pydata.org/docs/)
- [Seaborn Documentation](https://seaborn.pydata.org/)
- [Matplotlib Documentation](https://matplotlib.org/stable/index.html)
- [Jupyter Notebooks Documentation](https://docs.jupyter.org/en/latest/)



**Acknowledgments**

A heartfelt thank you to our instructor and tutor for the Data Visualization and Exploration class, whose guidance lit the way. Gratitude to the AI Saturdays Lagos Cohort 8 organization team for fostering a community of learners.

**About the Author**

I am Oyelayo Seye, a curious soul navigating the realms of data science. You can stay connected with me on [LinkedIn](https://www.linkedin.com/in/seyeoyelayo/) and explore more of my work on [GitHub](https://github.com/Exwhybaba?tab=repositories). The motivation for this assignment stems from a passion for unraveling the stories hidden in data, and the belief that every dataset is a world waiting to be explored.

*Feedback*

Your thoughts and feedback are the wind in our sails. Share your comments, insights, and questions below. Let's continue this conversation as we embark on future explorations together. Happy exploring!