# Data import, wrangling and analysis

**by Serhat Çevikel**

Today we will import an external data set:

In [None]:
weo <- read.csv("../data/weo_clean.csv")

In [None]:
str(weo)
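
As a rough sketch of what to check right after an import, the cell below reads a small CSV from an inline string instead of a file. The column names and values are made up for illustration and are not taken from `weo_clean.csv`.

```r
# Toy CSV text standing in for a real file (hypothetical columns, made-up values)
csv_text <- "country,gdp_growth,gdp_pc
A,2.5,4000
B,,15000
C,1.2,42000"

# read.csv accepts inline text through the `text` argument
toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(toy)             # column types at a glance
dim(toy)             # number of rows and columns
colSums(is.na(toy))  # count of missing values per column
```

An empty numeric field is read as `NA`, which is why later steps may need `na.rm = TRUE`.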

We have 162 countries and 12 variables:

In [None]:
names(weo)

We keep a copy of the old names, then replace the variable names with more descriptive ones:

In [None]:
old_names <- names(weo)

In [None]:
new_names <- c("Country", "GDP_growth", "GDP", "GDP_per_capita",
               "Output_gap", "Investment", "Saving", "Inflation",
               "Unemployment", "Primary_balance", "Net_debt", "Current_account")

In [None]:
names(weo) <- new_names
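
Since names are assigned by position, it may help to check the lengths and keep an old-to-new lookup table. The following is a minimal sketch on a toy data frame; the columns `c1` and `c2` are hypothetical.

```r
# Toy data frame with terse column names (hypothetical)
toy <- data.frame(c1 = c("A", "B"), c2 = c(1.5, 2.0))

old_names <- names(toy)
new_names <- c("Country", "GDP_growth")

# Renaming is positional, so the two vectors must have the same length
stopifnot(length(old_names) == length(new_names))

# Keep a lookup table so the original names can be traced back later
name_map <- data.frame(old = old_names, new = new_names)

names(toy) <- new_names
name_map
```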

In [None]:
weo

## Add a new variable

Add a new variable, the saving gap, defined as Saving minus Investment:

In [None]:
weo$Saving_gap <- with(weo, Saving - Investment)

## Discretize variables

Now we will create three categories of income level: low, medium and high

In [None]:
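# Bin GDP per capita into three right-closed intervals:
# (0, 5000], (5000, 20000], (20000, max]
weo$Income_level <- with(weo, cut(GDP_per_capita,
              breaks = c(0, 5000, 20000,
                         max(GDP_per_capita, na.rm = TRUE)),
              labels = c("low", "medium", "high")))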
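
To see how `cut` treats values that sit exactly on a break, here is a small sketch with made-up numbers. It uses `Inf` as the upper break instead of the observed maximum, which is an assumption of this example rather than what the cell above does.

```r
# Made-up GDP per capita values, including boundary cases and a missing value
gdp_pc <- c(800, 5000, 5001, 20000, 60000, NA)

# Intervals are right-closed by default: (0, 5000], (5000, 20000], (20000, Inf]
level <- cut(gdp_pc,
             breaks = c(0, 5000, 20000, Inf),
             labels = c("low", "medium", "high"))

data.frame(gdp_pc, level)
```

A value of exactly 5000 falls in "low" and exactly 20000 in "medium"; a missing input stays `NA`.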

See the distribution across levels:

In [None]:
barplot(table(weo$Income_level))

## Get summaries

In [None]:
weo

Aggregate variables for each income level:

In [None]:
# Median of each numeric variable (columns 2 to 13) by income level
t(with(weo, aggregate(weo[,2:13], by = list(Income_level = Income_level), FUN = median, na.rm = TRUE)))
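
The same group-wise median can be sketched on a toy data frame to make the mechanics easier to follow. The variables `growth` and `debt` and their values are invented for this example.

```r
# Toy data: two income levels and two numeric variables (made-up values)
toy <- data.frame(
  level  = factor(c("low", "low", "high", "high", "high"),
                  levels = c("low", "high")),
  growth = c(4, 6, 1, NA, 3),
  debt   = c(30, 50, 80, 60, 100)
)

# Extra arguments such as na.rm are passed on to FUN
med <- aggregate(toy[, c("growth", "debt")],
                 by = list(Income_level = toy$level),
                 FUN = median, na.rm = TRUE)

med     # one row per income level
t(med)  # transposed: one column per income level
```

Note that `t()` on a data frame with a factor column returns a character matrix, so the transposed table is for reading rather than for further computation.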

Please interpret this table ...

Create scatterplots of selected variables:

In [None]:
palette(c("green", "red", "blue"))

In [None]:
plot(weo[,c("GDP_growth", "Primary_balance", "Net_debt", "Current_account", "Saving_gap")],
    col = as.numeric(weo$Income_level))

In [None]:
plot(weo[,c("Current_account", "Saving_gap")],
    col = as.numeric(weo$Income_level))
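
The colouring relies on the fact that `as.numeric()` on a factor returns the level codes, which then index the palette. A minimal sketch with a toy factor shows the mapping explicitly; the vector `cols` mirrors the palette set above.

```r
# Toy factor with the same level order as Income_level
level <- factor(c("low", "high", "medium", "low"),
                levels = c("low", "medium", "high"))

# Same colours, in the same order, as the palette() call
cols <- c("green", "red", "blue")

# Level codes 1, 2, 3 pick green, red, blue respectively
point_col <- cols[as.numeric(level)]
data.frame(level, point_col)

# After a single-panel plot, a legend could be added along these lines:
# legend("topleft", legend = levels(level), col = cols, pch = 1)
```

So low income countries are drawn in green, medium in red and high in blue.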

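We see that the saving gap is nearly identical to the current account balance for many countries. This is expected: by the national accounts identity, the current account balance equals saving minus investment.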

We may look at correlations:

In [None]:
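# Use pairwise complete observations so that missing values do not turn the correlations into NA
cor(weo[,c("GDP_growth", "Primary_balance", "Net_debt", "Current_account", "Saving_gap")],
    use = "pairwise.complete.obs")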
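
By default `cor` returns `NA` whenever an input contains a missing value. The sketch below, on two made-up vectors, illustrates the effect of the `use` argument.

```r
# Made-up vectors; x has one missing value
x <- c(1, 2, 3, 4, NA)
y <- c(2, 4, 6, 8, 10)

r_default  <- cor(x, y)                        # NA because of the missing value
r_complete <- cor(x, y, use = "complete.obs")  # drops the incomplete pair first

r_default
r_complete
```

With a matrix or data frame, `use = "pairwise.complete.obs"` drops incomplete pairs separately for each pair of columns, so different entries of the correlation matrix may rest on different sets of countries.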

We can check the scatter between primary balance and current account:

In [None]:
plot(weo[,c("Primary_balance", "Current_account")],
    col = as.numeric(weo$Income_level))

Although there are outliers, for the majority of low-income countries both the current account balance and the primary balance are lower than in the other income groups.