In [0]:
library(tidyverse) # metapackage with lots of helpful functions
library(magrittr)

# Read the data
Let's load some data! You will find a dataset on Geert Hofstede's ["6-D model of national culture"](https://geerthofstede.com/culture-geert-hofstede-gert-jan-hofstede/6d-model-of-national-culture/). This measure of country-level culture along (by now) six dimensions has become very popular in sociology, economics, and management science for explaining cross-cultural interaction as well as frictions. Exhaustive documentation of the 2013 dataset can be found [here](https://geerthofstede.com/wp-content/uploads/2016/07/Manual-VSM-2013.pdf).

In [0]:
data <- read_csv2("https://www.dropbox.com/s/6can8ofrh1mqukh/vsm13.csv?dl=1", na = "#NULL!")

In [0]:
data %>% head()

In [0]:
data %>% glimpse()

* **`pdi:`**  The power distance index is defined as "the extent to which the less powerful members of organizations and institutions (like the family) accept and expect that power is distributed unequally." In this dimension, inequality and power are perceived from the viewpoint of the followers, or the lower level. A higher degree of the index indicates that hierarchy is clearly established and executed in society, without doubt or reason. A lower degree of the index signifies that people question authority and attempt to distribute power.
* **`idv:`**  This index explores the "degree to which people in a society are integrated into groups." Individualistic societies have loose ties that often relate an individual only to his/her immediate family. They emphasize the "I" versus the "we". Its counterpart, collectivism, describes a society in which tightly integrated relationships tie extended families and others into in-groups. These in-groups are laced with undoubted loyalty and support each other when a conflict arises with another in-group.
* **`mas:`**  In this dimension, masculinity is defined as "a preference in society for achievement, heroism, assertiveness and material rewards for success." Its counterpart represents "a preference for cooperation, modesty, caring for the weak and quality of life." Women in the respective societies tend to display different values. In feminine societies, they share modest and caring views equally with men. In more masculine societies, women are somewhat assertive and competitive, but notably less than men. In other words, they still recognize a gap between male and female values. This dimension is frequently viewed as taboo in highly masculine societies.
* **`uai:`**  The uncertainty avoidance index is defined as "a society's tolerance for ambiguity," in which people embrace or avert an event of something unexpected, unknown, or away from the status quo. Societies that score high on this index opt for stiff codes of behavior, guidelines, and laws, and generally rely on absolute truth, or the belief that one lone truth dictates everything and people know what it is. A lower score on this index shows more acceptance of differing thoughts or ideas. Society tends to impose fewer regulations, people are more accustomed to ambiguity, and the environment is more free-flowing.
* **`ltowvs:`** This dimension relates the connection of the past to current and future actions and challenges. A lower degree of this index (short-term) indicates that traditions are honored and kept, while steadfastness is valued. Societies with a high degree in this index (long-term) view adaptation and circumstantial, pragmatic problem-solving as a necessity. A poor country that is short-term oriented usually has little to no economic development, while long-term oriented countries continue to develop to a point.
* **`ivr:`**  This dimension is essentially a measure of happiness: whether or not simple joys are fulfilled. Indulgence is defined as "a society that allows relatively free gratification of basic and natural human desires related to enjoying life and having fun." Its counterpart is defined as "a society that controls gratification of needs and regulates it by means of strict social norms." Indulgent societies believe themselves to be in control of their own life and emotions; restrained societies believe other factors dictate their life and emotions.

Ok, looks interesting. Let's do the following:

0. The data is not perfect, so some small upfront munging is necessary.
1. Geert Hofstede claims that these dimensions measure orthogonal features of culture. That raises the question of whether they really measure different constructs. To find out, let's run a PCA on them. How do the dimensions load? And how do countries score? Illustrate and visualize the results.
2. Can we form meaningful "cultural clusters" among countries?
3. Let's create a meaningful measure for "cultural distance" between countries. What do we see? Interpret.
4. (Advanced) Does bilateral "cultural distance" or the assignment to a "cultural cluster" help us to explain other interaction between countries we might be interested in, such as trade, migration etc.? Here you will need some skills from M1-1 & 2.

In [0]:
# Load the packages we need
library(FactoMineR)
library(factoextra)

Some preprocessing upfront

In [0]:
# First, let's get rid of NAs (we could impute them, but let's be lazy for now)
data %<>% drop_na()

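# Define row names (for the visualization). Tibbles do not support row names
# (setting them triggers a deprecation warning), so convert to a plain data frame first.
data <- as.data.frame(data)
rownames(data) <- data$country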

Now it's really your turn: let's do a PCA.

In [0]:
res.pca <- PCA(data %>% select(-country, -ctr), scale.unit = TRUE, graph = FALSE)
glimpse(res.pca)

And now, let's visualize it...

In [0]:
fviz_pca_var(res.pca, 
             alpha.var = "cos2",
             col.var = "contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE,
             ggtheme = theme_gray()) 
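
To read off numerically what the variable plot shows, it can help to look at the explained variance and the loadings directly. The following is only a minimal sketch under stated assumptions: it uses base R's `prcomp()` instead of `FactoMineR::PCA()`, and the `toy` data frame holds random placeholder values, not real Hofstede scores. The column names simply mirror the six dimensions described above.

```r
# Synthetic stand-in for the six dimensions (random values, NOT real scores)
set.seed(42)
toy <- as.data.frame(matrix(runif(20 * 6, min = 0, max = 100), nrow = 20))
colnames(toy) <- c("pdi", "idv", "mas", "uai", "ltowvs", "ivr")
rownames(toy) <- paste0("country_", seq_len(nrow(toy)))

# PCA on centered and standardized variables (similar in spirit to scale.unit = TRUE)
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Share of the total variance captured by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Loadings (eigenvectors): weight of each dimension in the first two components
print(round(pca$rotation[, 1:2], 2))

# Scores: position of each (hypothetical) country on the first two components
print(head(pca$x[, 1:2]))
```

If the six dimensions were truly orthogonal, the explained variance would be spread roughly evenly across components; a few dominant components would instead suggest overlapping constructs. Note that `pca$rotation` contains eigenvectors, which differ in scaling from the variable coordinates drawn by `fviz_pca_var()`.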

So, now without help... can you create a distance matrix?

So, which countries are the most, and the least culturally distant?
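
One possible way to approach both questions is sketched below. It is a minimal illustration that assumes Euclidean distance on standardized dimensions as the measure of "cultural distance" (other metrics are equally defensible), and it runs on a small `toy` matrix of random placeholder values rather than on the real dataset.

```r
# Synthetic stand-in: 5 hypothetical countries, 6 dimensions (random values, NOT real scores)
set.seed(42)
toy <- matrix(runif(5 * 6, min = 0, max = 100), nrow = 5,
              dimnames = list(paste0("country_", 1:5),
                              c("pdi", "idv", "mas", "uai", "ltowvs", "ivr")))

# Standardize so every dimension contributes on the same scale,
# then compute pairwise Euclidean distances between countries
d <- dist(scale(toy), method = "euclidean")
d_mat <- as.matrix(d)

# Blank out self-distances and the duplicated lower triangle
d_mat[lower.tri(d_mat, diag = TRUE)] <- NA

# Row/column positions of the smallest and the largest distance
closest  <- which(d_mat == min(d_mat, na.rm = TRUE), arr.ind = TRUE)
farthest <- which(d_mat == max(d_mat, na.rm = TRUE), arr.ind = TRUE)

# Names of the least and the most distant pair
closest_pair  <- c(rownames(d_mat)[closest[1, "row"]],  colnames(d_mat)[closest[1, "col"]])
farthest_pair <- c(rownames(d_mat)[farthest[1, "row"]], colnames(d_mat)[farthest[1, "col"]])
print(closest_pair)
print(farthest_pair)
```

With the real data, the same steps would be applied to the numeric culture columns only, with the country names as row names so that the pairs are readable.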

Let's do a k-means clustering

In [0]:
km <- kmeans(scale(data[***]), centers = *)  
glimpse(km)

And visualize it...

In [0]:
fviz_cluster(***, data = ***,
             ggtheme = theme_gray())  
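
The number of centers is left open above. One common heuristic for choosing it is the "elbow" of the total within-cluster sum of squares. The sketch below is only an illustration on random placeholder data (the `toy` matrix is hypothetical, not the real scores), and the range of `k` values tried is an arbitrary assumption.

```r
# Synthetic stand-in: 30 hypothetical countries, 6 dimensions (random values, NOT real scores)
set.seed(42)
toy <- matrix(runif(30 * 6, min = 0, max = 100), nrow = 30,
              dimnames = list(paste0("country_", 1:30),
                              c("pdi", "idv", "mas", "uai", "ltowvs", "ivr")))
toy_scaled <- scale(toy)

# Total within-cluster sum of squares for k = 1..6;
# nstart = 25 reruns k-means from several random starts and keeps the best fit
ks <- 1:6
wss <- sapply(ks, function(k) {
  kmeans(toy_scaled, centers = k, nstart = 25)$tot.withinss
})

# Look for the k after which the decrease in wss flattens out
print(data.frame(k = ks, wss = round(wss, 1)))
```

On purely random data like this there is no clear elbow; on the real data a visible kink would hint at a sensible number of cultural clusters.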