
# **PROJECT ON WORLD DEVELOPMENT EXPLORER**

## **Introduction**

This project explores world data from the World Development Explorer site, which provides information on many topics that help in analyzing the state of the world. Data can be compared across topics and indicators, and filtered by region, by country, or by a set of countries as required. It can also be filtered by income group (high, middle, or low income) and by lending type, and restricted to a chosen time span. The results can be viewed as a Bubble chart, Bubble and Line chart, 3D Scatter, 2D Scatter, Bar chart, Line chart, Pie chart, Histogram, Boxplot, Choropleth, Sunburst, or Table. There are also options for grouping, chart style, chart height, and chart language. Below the resulting chart, the site shows a description of the selected chart type and some options related to it.



## **Why This Analysis**

Investigations of this kind should be carried out regularly to check whether the world's situation is improving or declining, so that governments can act on the results and look after their people before a situation gets out of hand. They also help identify areas to develop, fields that deserve more attention, and zones to focus on for the benefit of each area, each country, and finally the world. For individuals, such investigations build knowledge about different aspects of the world across time periods and give hands-on experience with this kind of data, which is useful for anyone interested in becoming a Data Analyst or Business Analyst.

## **Parameters for Analysis**

- In my analysis, the main topics are Environment, Climate change and Health, and the indicators are Agricultural land (% of land area), Population growth (annual %) and Total population.
- The selected countries are Brunei Darussalam, Malaysia and Sri Lanka.
- The chosen time period for these indicators is 1988 to 1995.
- I selected "Country" as the group-by option and "plotly_dark" as the style for the resulting chart, with a height of 600 px and English as the language.
- Initially I selected "Bubble Chart" to display the data for the selected time span; other chart types are used later in the analysis to view the data in different ways.

I also downloaded the data for the selected parameters in table format from the website http://www.worlddev.xyz/ and uploaded the CSV file, named **"*wdi_data.csv*"**, into the current Colab notebook.

### **Importing the required libraries/modules to work on data**

In [1]:
import pandas as pd
import plotly.express as px
import plotly.io as pio
pio.templates.default = "plotly_dark"

## **Preparing the data**

In [2]:
df = pd.read_csv("wdi_data.csv")    # reading the csv file using pandas into dataframe
df.shape                            # checking the shape of the dataset

(24, 10)

In [3]:
df.head()                             # showing the first five rows

| | Unnamed: 0 | Year | SP.POP.GROW | AG.LND.AGRI.ZS | SP.POP.TOTL | Country Code | Country Name | Region | Income Group | Lending Type |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1988 | 2.838974 | 2.27704 | 244404 | BRN | Brunei Darussalam | East Asia & Pacific | High income | Not classified |
| 1 | 1 | 1989 | 2.845338 | 2.27704 | 251458 | BRN | Brunei Darussalam | East Asia & Pacific | High income | Not classified |
| 2 | 2 | 1990 | 2.847428 | 2.087287 | 258721 | BRN | Brunei Darussalam | East Asia & Pacific | High income | Not classified |
| 3 | 3 | 1991 | 2.853521 | 2.087287 | 266210 | BRN | Brunei Darussalam | East Asia & Pacific | High income | Not classified |
| 4 | 4 | 1992 | 2.84484 | 2.087287 | 273892 | BRN | Brunei Darussalam | East Asia & Pacific | High income | Not classified |


### There is an extra column named "Unnamed: 0", so we can drop it using drop()




In [4]:
df.drop(columns=["Unnamed: 0"], inplace=True)       # dropping the unnecessary column

In [5]:
df.head()                            # first five rows with required columns

| | Year | SP.POP.GROW | AG.LND.AGRI.ZS | SP.POP.TOTL | Country Code | Country Name | Region | Income Group | Lending Type |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1988 | 2.838974 | 2.27704 | 244404 | BRN | Brunei Darussalam | East Asia & Pacific | High income | Not classified |
| 1 | 1989 | 2.845338 | 2.27704 | 251458 | BRN | Brunei Darussalam | East Asia & Pacific | High income | Not classified |
| 2 | 1990 | 2.847428 | 2.087287 | 258721 | BRN | Brunei Darussalam | East Asia & Pacific | High income | Not classified |
| 3 | 1991 | 2.853521 | 2.087287 | 266210 | BRN | Brunei Darussalam | East Asia & Pacific | High income | Not classified |
| 4 | 1992 | 2.84484 | 2.087287 | 273892 | BRN | Brunei Darussalam | East Asia & Pacific | High income | Not classified |


### Now the dataframe looks good with the required columns
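
An extra `Unnamed: 0` column usually appears when a dataframe was saved together with its index. As a minimal sketch, assuming that is how `wdi_data.csv` was written, the first column can be read as the row index so that nothing has to be dropped afterwards. The inline CSV text below is a stand-in for the file and reuses two rows from the output above.

```python
import io
import pandas as pd

# Stand-in for wdi_data.csv: two rows copied from the output above,
# with the unnamed leading index column that to_csv() writes by default.
csv_text = """,Year,SP.POP.GROW,AG.LND.AGRI.ZS,SP.POP.TOTL,Country Code,Country Name,Region,Income Group,Lending Type
0,1988,2.838974,2.27704,244404,BRN,Brunei Darussalam,East Asia & Pacific,High income,Not classified
1,1989,2.845338,2.27704,251458,BRN,Brunei Darussalam,East Asia & Pacific,High income,Not classified
"""

# Default read: the unnamed column is loaded as data and named "Unnamed: 0".
df_default = pd.read_csv(io.StringIO(csv_text))

# index_col=0 treats the first column as the row index instead.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(df_default.columns)[:2])   # first two column names of the default read
print(df_indexed.shape)               # rows and columns when the index is read as index
```

Either approach leaves the same nine data columns; reading with `index_col=0` simply avoids the separate `drop()` step.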

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24 entries, 0 to 23
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Year            24 non-null     int64  
 1   SP.POP.GROW     24 non-null     float64
 2   AG.LND.AGRI.ZS  24 non-null     float64
 3   SP.POP.TOTL     24 non-null     int64  
 4   Country Code    24 non-null     object 
 5   Country Name    24 non-null     object 
 6   Region          24 non-null     object 
 7   Income Group    24 non-null     object 
 8   Lending Type    24 non-null     object 
dtypes: float64(2), int64(2), object(5)
memory usage: 1.8+ KB
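
The indicator codes are compact but hard to read in tables and chart labels. As a small sketch, assuming the codes correspond to the indicators named in the parameters section, they can be renamed on a copy of the dataframe. The names `indicator_names` and `df_named` are illustrative and not part of the notebook.

```python
import pandas as pd

# One row copied from the output above, limited to the indicator columns.
sample = pd.DataFrame({
    "Year": [1990],
    "SP.POP.GROW": [2.847428],
    "AG.LND.AGRI.ZS": [2.087287],
    "SP.POP.TOTL": [258721],
})

# Hypothetical mapping from indicator codes to the names used in this analysis.
indicator_names = {
    "SP.POP.GROW": "Population growth (annual %)",
    "AG.LND.AGRI.ZS": "Agricultural land (% of land area)",
    "SP.POP.TOTL": "Total population",
}

# rename() returns a new dataframe, so the original column codes stay available.
df_named = sample.rename(columns=indicator_names)
print(df_named.columns.tolist())
```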


In [7]:
df_1990 = df[df["Year"]==1990]         # taking the data from the year 1990 and assigning it to dataframe
df_1990                               # showing the dataset for the year 1990 

| | Year | SP.POP.GROW | AG.LND.AGRI.ZS | SP.POP.TOTL | Country Code | Country Name | Region | Income Group | Lending Type |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 1990 | 2.847428 | 2.087287 | 258721 | BRN | Brunei Darussalam | East Asia & Pacific | High income | Not classified |
| 10 | 1990 | 2.817285 | 20.564906 | 18029824 | MYS | Malaysia | East Asia & Pacific | Upper middle income | IBRD |
| 18 | 1990 | 1.272186 | 37.298676 | 17325773 | LKA | Sri Lanka | South Asia | Lower middle income | IBRD |


In [8]:
df_1990.shape                       # checking the shape of the dataset of the year 1990

(3, 9)

## **Analysis from Line Chart**

To make the analysis clearer, let's start with line charts. There are three line charts, one for each indicator, covering the period 1988 to 1995 for the selected countries: Brunei Darussalam, Malaysia and Sri Lanka.

#### **Making the Line Chart** 

First, let us look at the line chart of population growth over the selected time span, 1988 to 1995.

Below is the code for this chart.

In [9]:
fig = px.line(df, 
              x="Year", 
              y="SP.POP.GROW", 
              color="Country Name",      # one line per country instead of a single line joining all rows
              title="Population growth")
fig.show()

Now let us look at the line chart of agricultural land over the same period, 1988 to 1995.

Below is the code for this chart.

In [10]:
fig = px.line(df, 
              x="Year", 
              y="AG.LND.AGRI.ZS", 
              color="Country Name",      # one line per country instead of a single line joining all rows
              title="Agricultural land")
fig.show()

Finally, here is the line chart of total population over the selected period, 1988 to 1995.

Below is the code for this chart.

In [11]:
fig = px.line(df, 
              x="Year", 
              y="SP.POP.TOTL", 
              color="Country Name",      # one line per country instead of a single line joining all rows
              title="Total Population")
fig.show()
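
The population growth and total population series can be cross-checked against each other. The sketch below assumes that the annual growth rate is the log change in total population between consecutive years, expressed as a percentage. That definition is an assumption here, although the Brunei Darussalam rows shown earlier are consistent with it. Only those five rows are used.

```python
import numpy as np
import pandas as pd

# Brunei Darussalam rows copied from the df.head() output above.
brn = pd.DataFrame({
    "Year": [1988, 1989, 1990, 1991, 1992],
    "SP.POP.GROW": [2.838974, 2.845338, 2.847428, 2.853521, 2.84484],
    "SP.POP.TOTL": [244404, 251458, 258721, 266210, 273892],
})

# Assumed definition: growth_t = 100 * ln(pop_t / pop_{t-1}).
brn["growth_from_totals"] = 100 * np.log(brn["SP.POP.TOTL"] / brn["SP.POP.TOTL"].shift(1))

# The first year has no previous value, so it is dropped before comparing.
check = brn.dropna(subset=["growth_from_totals"])
max_gap = (check["SP.POP.GROW"] - check["growth_from_totals"]).abs().max()

print(check[["Year", "SP.POP.GROW", "growth_from_totals"]])
print(f"largest difference: {max_gap:.6f}")
```

A small `max_gap` suggests the two columns tell the same story, so the growth chart and the total population chart can be read together.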

## **Analysis from Histogram Chart**

It is interesting to look at one dataset from different perspectives. So now let's use histograms to look at the dataset "df_1990" for the three selected indicators: population growth, agricultural land and total population.

A histogram shows the distribution of a numerical variable by counting how many values fall into each bin.

#### **Making a Histogram chart**

We start with the histogram of population growth for the chosen parameters.

Below is the code for this chart.

In [12]:
fig = px.histogram(data_frame=df_1990, 
                   x="SP.POP.GROW",
                   nbins=30,
                   title="Histogram chart for population growth")

fig.show()

- From the above graph we can see that one country, Sri Lanka, has population growth in the range 1.2 to 1.29 (about 1.27% in 1990).
- The other two countries, Brunei Darussalam (about 2.85%) and Malaysia (about 2.82%), have population growth in the range 2.8 to 2.89.
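
A histogram counts how many values fall into each bin, so the reading above can be reproduced without a chart. The sketch assumes bins of width 0.1 that start at multiples of 0.1, which matches the ranges quoted above but is not necessarily the bin size Plotly chooses for `nbins=30`.

```python
import numpy as np
import pandas as pd

# Population growth in 1990, copied from the df_1990 output above.
growth_1990 = pd.Series(
    [2.847428, 2.817285, 1.272186],
    index=["Brunei Darussalam", "Malaysia", "Sri Lanka"],
)

# Lower edge of the 0.1-wide bin that each value falls into (assumed bin width).
bin_start = np.floor(growth_1990 * 10) / 10

# Count how many countries share each bin, ordered by bin start.
counts = bin_start.value_counts().sort_index()
print(counts)
```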

Next is the histogram of agricultural land for the selected parameters.

Below is the code for this chart.

In [13]:
fig = px.histogram(data_frame=df_1990, 
                   x="AG.LND.AGRI.ZS",
                   nbins=30,
                   title="Histogram chart for Agricultural land")         
fig.show()

In the above histogram each country falls into its own bin. From the 1990 data:
- Brunei Darussalam has agricultural land of about 2.09% of its land area.
- Malaysia has agricultural land of about 20.56% of its land area.
- Sri Lanka has agricultural land of about 37.30% of its land area.

Finally, let us look at the histogram of total population from the dataset "df_1990".

Below is the code for this chart.

In [14]:
fig = px.histogram(data_frame=df_1990, 
                   x="SP.POP.TOTL",
                   nbins=30,
                   title="Histogram chart for total population")
                   
fig.show()

From the above chart and the 1990 data, one country (Brunei Darussalam) has a total population of about 258.72K, while the other two are far larger: Malaysia at about 18.03M and Sri Lanka at about 17.33M.

## **Analysis from Bar Chart**

A bar chart is used to compare different observations or groups. Here we look at the same dataset again, this time through bar charts.


### Preparing the data for the Bar chart

In [15]:
df_region = df.groupby("Region").count()      # taking the count for each category
df_region

| Region | Year | SP.POP.GROW | AG.LND.AGRI.ZS | SP.POP.TOTL | Country Code | Country Name | Income Group | Lending Type |
|---|---|---|---|---|---|---|---|---|
| East Asia & Pacific | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 |
| South Asia | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |


In [16]:
df_region = df_region.reset_index()             # resetting the index 
df_region

| | Region | Year | SP.POP.GROW | AG.LND.AGRI.ZS | SP.POP.TOTL | Country Code | Country Name | Income Group | Lending Type |
|---|---|---|---|---|---|---|---|---|---|
| 0 | East Asia & Pacific | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 |
| 1 | South Asia | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |


In [17]:
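fig = px.bar(data_frame=df_region, 
             x="Region", y="Year",               # after count(), "Year" holds the number of rows per region
             color="Region",
             labels={"Year": "Number of rows"},
             title="Bar chart of number of rows per region"
             )
fig.show()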
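
`groupby("Region").count()` counts the non-null rows in each column, so the bar heights above are row counts (countries times years), not numbers of countries. The sketch below uses a stand-in dataframe with the same countries, regions and years as the dataset, with the indicator values left out, to show the difference between `size()` and `nunique()`.

```python
import pandas as pd

# Stand-in with the same countries, regions, and years as the dataset;
# indicator columns are omitted because only the counts matter here.
regions = {
    "Brunei Darussalam": "East Asia & Pacific",
    "Malaysia": "East Asia & Pacific",
    "Sri Lanka": "South Asia",
}
rows = [
    {"Country Name": country, "Region": region, "Year": year}
    for country, region in regions.items()
    for year in range(1988, 1996)
]
sample = pd.DataFrame(rows)

# Rows per region: one row per country per year.
rows_per_region = sample.groupby("Region").size()

# Distinct countries per region.
countries_per_region = sample.groupby("Region")["Country Name"].nunique()

print(rows_per_region)
print(countries_per_region)
```

Here `rows_per_region` reproduces the 16 and 8 shown in the table above, while `countries_per_region` gives the number of countries behind those rows.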

#### Below is the code which shows total population by region for the given parameters.

In [19]:
fig = px.bar(data_frame=df, 
             x="Region", y="SP.POP.TOTL", 
             hover_name="Country Name", 
             color="Region",
             height=500)
             
fig.show()

From the above chart we can see the total population by region. In East Asia & Pacific, the population of Brunei Darussalam reaches up to 281.68K and that of Malaysia up to 20.48M, while Sri Lanka, in South Asia, reaches up to 18.24M. In descending order of population, the countries are Malaysia, Sri Lanka and Brunei Darussalam.

This concludes the project **WORLD DEVELOPMENT EXPLORER**, which gives us a lot of information on various topics and indicators.
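
The same ordering can be checked directly from the data. The sketch below uses the 1990 rows shown earlier, rather than the values read from the chart, and sorts them by total population. The name `df_1990_sample` is illustrative.

```python
import pandas as pd

# 1990 values copied from the df_1990 output above.
df_1990_sample = pd.DataFrame({
    "Country Name": ["Brunei Darussalam", "Malaysia", "Sri Lanka"],
    "Region": ["East Asia & Pacific", "East Asia & Pacific", "South Asia"],
    "SP.POP.TOTL": [258721, 18029824, 17325773],
})

# Largest population first.
ranked = df_1990_sample.sort_values("SP.POP.TOTL", ascending=False)
print(ranked["Country Name"].tolist())
```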