In [None]:
options(jupyter.rich_display = FALSE)

# WRANGLING AN ECONOMIC DATA SET: IMF WORLD ECONOMIC OUTLOOK

**by Serhat Çevikel**

Today we will wrangle and analyze the 2016 data from the IMF's World Economic Outlook dataset.

First, please download the following two data files and four R object files:

[weo_2016_wide_2.csv](../file/weo_2016_wide_2.csv)

[weo_description.csv](../file/weo_description.csv)

[weo_subset2.RData](../file/weo_subset2.RData)

[gdp_agg.RData](../file/gdp_agg.RData)

[weo_merged.RData](../file/weo_merged.RData)

[weo_merged2.RData](../file/weo_merged2.RData)

Then read the data into R as follows:

In [None]:
weo_data <- read.csv("~/file/weo_2016_wide_2.csv")
weo_desc <- read.csv("~/file/weo_description.csv")

Let's take a quick snapshot of the data:

In [None]:
str(weo_data)

In [None]:
str(weo_desc)

In [None]:
head(weo_desc, 11)

There are 45 numeric variables for 194 countries (some of the data might be missing). We will be interested in only a few of those series.

Let's start with real GDP growth. The code of the series is NGDP_RPCH.

First let's plot the series:

In [None]:
plot(weo_data$NGDP_RPCH)

Quite dispersed...

Let's see the ten slowest and the ten fastest growers in 2016:

In [None]:
weo_data[order(weo_data$NGDP_RPCH),c("Country", "NGDP_RPCH")][1:10,]

In [None]:
weo_data[order(weo_data$NGDP_RPCH, decreasing = TRUE),c("Country", "NGDP_RPCH")][1:10,]
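
The ordering idiom used above can be tried out on a small example. The sketch below uses a made-up data frame (toy and its values are hypothetical, not taken from the WEO files) and head() instead of [1:10,] to pick the first rows:

```r
# Hypothetical toy data; not taken from the WEO files
toy <- data.frame(
  Country = c("A", "B", "C", "D", "E"),
  growth  = c(2.1, -3.4, 7.3, NA, 0.5),
  stringsAsFactors = FALSE
)

# order() returns row indices sorted by growth; NAs are placed last by default
slowest <- head(toy[order(toy$growth), ], 2)
fastest <- head(toy[order(toy$growth, decreasing = TRUE), ], 2)

slowest$Country  # "B" "E"
fastest$Country  # "C" "A"
```

Because order() puts missing values last in both directions, countries without a growth figure do not show up among the extremes as long as enough rows have data.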

It is thought-provoking that both the fastest and the slowest growers are natural-resource-rich countries.

Can we consider the 7.3% growth rate of Iceland and the similar growth rate of Bhutan comparable performances?

Now let's include a second variable: Per capita GDP (PPP). The series name is PPPPC. The "Purchasing Power Parity" adjustment accounts for differences in cost of living

Let's first subset the relevant columns:

In [None]:
weo_subset <- weo_data[,c("Country", "NGDP_RPCH", "PPPPC")]
weo_subset

# EXCLUDE MISSING CASES

Now we will exclude rows with missing information. Both complete.cases and na.omit can do that:

In [None]:
missing <- which(!(complete.cases(weo_subset)))
missing

In [None]:
weo_subset[missing,]

In [None]:
weo_subset2 <- na.omit(weo_subset)

In [None]:
which(!(complete.cases(weo_subset2)))
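
To see how the two functions relate, here is a minimal sketch on a made-up data frame (toy and its values are hypothetical): complete.cases returns a logical vector that we can subset with, while na.omit returns the cleaned data frame directly.

```r
# Hypothetical toy data with missing values in two different columns
toy <- data.frame(
  Country = c("A", "B", "C", "D"),
  growth  = c(2.1, NA, 7.3, 0.5),
  income  = c(1000, 2000, NA, 4000),
  stringsAsFactors = FALSE
)

ok <- complete.cases(toy)   # TRUE only where no column is NA
which(!ok)                  # rows 2 and 3 have a missing value

clean_a <- toy[ok, ]        # logical subsetting
clean_b <- na.omit(toy)     # same rows, plus an "na.action" attribute

nrow(clean_a)               # 2
```

Both approaches keep the same rows. The only difference is that na.omit records which rows were dropped in an attribute of the result.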

# SCATTERPLOT ACROSS GDP GROWTH AND GDP PER CAPITA

In [None]:
plot(weo_subset2[,-1])

We see a weak negative relationship between income level and growth. In fact, low- and middle-income countries are expected to grow faster than high-income countries in the long term.

So the growth performances must be benchmarked against respective income categories
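
One way to put a number on the impression from a scatterplot is a rank correlation. The sketch below uses made-up vectors (income and growth are hypothetical and deliberately constructed so that growth falls with income); it says nothing about the actual WEO figures, it only shows the call:

```r
# Hypothetical values, constructed so that growth falls as income rises
income <- c(1500, 3000, 8000, 15000, 30000, 60000)
growth <- c(6.1, 4.8, 5.2, 3.0, 2.4, 1.9)

# Spearman's rho works on ranks, so it is robust to the skewed income scale
rho <- cor(income, growth, method = "spearman", use = "complete.obs")
rho < 0   # TRUE for this toy data
```

With the real data you could pass weo_subset2$PPPPC and weo_subset2$NGDP_RPCH in the same way.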

# DISCRETIZATION OF CONTINUOUS VARIABLES

Now we will create three income categories and add that as an additional variable

First let's get a five-number summary plus the mean:

In [None]:
summary(weo_subset2$PPPPC)

We want the groups to be of the same size, so we use the terciles as break points:

In [None]:
brks <- quantile(weo_subset2$PPPPC, c(0, 1/3, 2/3, 1))
brks

In [None]:
weo_subset2$income <- cut(weo_subset2$PPPPC,
                          breaks = brks,
                          labels = c("low", "middle", "high"))

Let's see whether the groups are divided evenly:

In [None]:
table(weo_subset2$income)
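
A caveat worth checking when the lowest break equals the minimum of the data, as it does with the 0% quantile: by default cut() builds intervals that are open on the left, so the minimum value falls outside the first interval and becomes NA. The sketch below uses made-up numbers (x is hypothetical, not WEO data) to show the effect and how include.lowest = TRUE closes the first interval:

```r
x <- c(10, 20, 30, 40, 50, 60)            # hypothetical values
brks <- quantile(x, c(0, 1/3, 2/3, 1))    # lowest break is min(x)

# Default: intervals are (a, b], so min(x) is not in any interval
default_cut <- cut(x, breaks = brks, labels = c("low", "middle", "high"))

# include.lowest = TRUE turns the first interval into [a, b]
closed_cut  <- cut(x, breaks = brks, labels = c("low", "middle", "high"),
                   include.lowest = TRUE)

table(default_cut, useNA = "ifany")   # one NA, "low" is one short
table(closed_cut,  useNA = "ifany")   # three groups of two
```

If you apply include.lowest = TRUE to the real data, the group counts and medians may shift slightly from those shown in this lesson, since the provided .RData files were built with the call as written above.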

Now let's see which countries are in which group:

In [None]:
with(weo_subset2, split(Country, income))

In fact, this classification does not correspond to the classifications made by the World Bank, the IMF or similar supranational agencies, but that doesn't matter. Simplicity is more important here.

# Exercise 1: Aggregate data

You can load the necessary object if you couldn't follow the steps up to now:

In [None]:
load("~/file/weo_subset2.RData")

Remember the aggregate function:

```R
aggregate(x, by, FUN, ..., simplify = TRUE, drop = TRUE)
```

Now please get the **median** growth rate (NGDP_RPCH) for each income category and save into gdp_agg. The column names should be income and gdpg as such:

```R
  income gdpg 
1 low    3.969
2 middle 2.849
3 high   2.197
```

Note that the by argument takes a list object. You may use the with() and median() functions along with aggregate(), and also the names() function.

**Solution:**

In [None]:
aggregate(weo_subset2$NGDP_RPCH,
          by = list(weo_subset2$income),
          FUN = median)

In [None]:
gdp_agg <- with(weo_subset2,
                aggregate(NGDP_RPCH,
                          by = list(income),
                          FUN = median))
names(gdp_agg) <- c("income", "gdpg")
gdp_agg

As you can see, low-income countries typically grow faster than middle- and high-income countries: the median growth rate falls as income rises.
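
The same kind of aggregation can also be written with the formula interface of aggregate(), which keeps the original column names so only the value column needs renaming. This is an alternative sketch on made-up data (toy and its values are hypothetical), not the solution expected by the exercise:

```r
# Hypothetical toy data; income is a factor so the group order is explicit
toy <- data.frame(
  income = factor(c("low", "low", "low", "high", "high"),
                  levels = c("low", "high")),
  growth = c(4.0, 6.0, 5.0, 1.0, 3.0)
)

# Read as: growth, grouped by income
agg <- aggregate(growth ~ income, data = toy, FUN = median)
names(agg)[2] <- "gdpg"
agg
```

Note that the formula interface drops rows with missing values by default, whereas the by = list(...) form passes them on to FUN.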

# Exercise 2: Merge

You can load the necessary objects if you couldn't follow the steps up to now:

In [None]:
load("~/file/weo_subset2.RData")
load("~/file/gdp_agg.RData")

Now, based on the common column "income", merge the data frames weo_subset2 and gdp_agg into a new data frame, weo_merged, so that the median growth of the respective income group is carried along for every country.

**Solution:**

In [None]:
weo_merged <- merge(weo_subset2, gdp_agg, by = "income")
weo_merged
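
To see what merge() does to the rows and columns, here is a minimal sketch with two made-up data frames (countries and medians are hypothetical stand-ins for weo_subset2 and gdp_agg):

```r
# Hypothetical stand-ins for the country-level and group-level tables
countries <- data.frame(
  Country = c("A", "B", "C", "D"),
  income  = c("low", "high", "low", "high"),
  growth  = c(4.0, 1.0, 6.0, 3.0),
  stringsAsFactors = FALSE
)
medians <- data.frame(
  income = c("low", "high"),
  gdpg   = c(5.0, 2.0),
  stringsAsFactors = FALSE
)

# Each country row picks up the gdpg value of its income group
merged <- merge(countries, medians, by = "income")
merged
```

Two things to keep in mind: merge() moves the key column to the front and sorts the result by it, and rows whose key has no match in the other table are dropped unless you ask for them with all.x = TRUE.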

# Exercise 3: Get the deviation

You can load the necessary objects if you couldn't follow the steps up to now:

In [None]:
load("~/file/weo_merged.RData")

First create a copy of the object as such:

In [None]:
weo_merged2 <- weo_merged

Now create a new column that holds the difference between the GDP growth rate of each country (NGDP_RPCH) and the median growth of its income group (gdpg). Add the new column to weo_merged2 as "dev".

**Solution:**

In [None]:
weo_merged2$dev <- with(weo_merged2, NGDP_RPCH - gdpg)

# Equivalent without with():
# weo_merged2$dev <- weo_merged2$NGDP_RPCH - weo_merged2$gdpg

In [None]:
weo_merged2
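
As a side note, the aggregate, merge and subtract steps can be collapsed with ave(), which returns a group statistic aligned with the original rows. This is a different technique from the one used in the exercises, sketched here on made-up data (toy and its values are hypothetical):

```r
# Hypothetical toy data
toy <- data.frame(
  Country = c("A", "B", "C", "D", "E"),
  income  = c("low", "low", "low", "high", "high"),
  growth  = c(4, 6, 5, 1, 3),
  stringsAsFactors = FALSE
)

# ave() repeats each group's median on every row of that group
toy$gdpg <- ave(toy$growth, toy$income, FUN = median)
toy$dev  <- toy$growth - toy$gdpg
toy
```

Unlike merge(), this keeps the original row order, which can be convenient when the data frame is already sorted in a meaningful way.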

# Exercise 4: Plot the best and worst performances

You can load the necessary objects if you couldn't follow the steps up to now:

In [None]:
load("~/file/weo_merged2.RData")

Now, using weo_merged2, create bar charts for the ten best and the ten worst performers according to the deviation column (dev).

Note that the data should be a numeric vector and the labels should be passed via the names.arg argument. For better display of country names, use las = 2 as the last option to barplot.

**Solution:**

In [None]:
best <- weo_merged2[order(weo_merged2$dev, decreasing = TRUE),
                    c("Country", "dev")][1:10,]

In [None]:
with(best, barplot(dev, names.arg = Country, las = 2))

In [None]:
worst <- weo_merged2[order(weo_merged2$dev),
                     c("Country", "dev")][1:10,]

In [None]:
with(worst, barplot(dev, names.arg = Country, las = 2))
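
If you prefer a single chart, the worst and best performers can be stacked into one data frame before plotting. The sketch below uses made-up deviations (toy, its values and the colour choices are hypothetical) and takes two rows from each end instead of ten:

```r
# Hypothetical deviations from the group median
toy <- data.frame(
  Country = c("A", "B", "C", "D", "E", "F"),
  dev     = c(3.2, -4.1, 0.4, -0.9, 5.6, -2.5),
  stringsAsFactors = FALSE
)

n <- 2
sorted   <- toy[order(toy$dev), ]
extremes <- rbind(head(sorted, n), tail(sorted, n))   # worst first, best last

# barplot() returns the bar midpoints, which we keep for reference
mids <- barplot(extremes$dev,
                names.arg = extremes$Country,
                col = ifelse(extremes$dev < 0, "firebrick", "steelblue"),
                ylab = "Deviation from group median",
                las = 2)
abline(h = 0)   # baseline at zero deviation
```

With weo_merged2 you would set n to 10 and use the dev and Country columns in the same way.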