# Jennifer Payne

## Inquiry theme: Core Dimensions of Democracy

This analysis examines the core dimensions of democracy, in particular electoral, deliberative, participatory, egalitarian, and liberal democracy. The aim of this analysis is to explore balance and relationships across and between these dimensions of democracy, how these dimensions differ across countries and regions, and how these patterns have changed over time.

## Analytic questions

1. **What does the balance across the core dimensions of democracy look like?**
To what extent do countries and regions perform consistently across electoral, deliberative, participatory, egalitarian, and liberal democracy? How has this balance changed over time? Are well-balanced profiles becoming more common, or are profiles growing more uneven? Finally, are balanced democracies more persistent than imbalanced ones (e.g. given the same average level of "democratic-ness")?

2. **How have the relationships between different dimensions of democracy evolved over time, and across countries or regions?**
Which dimensions of democracy tend to move together across countries or regions? Do there exist temporal trends that cut across indices? Do improvements in one dimension of democracy predict later gains in others, i.e. are some dimensions of democracy more foundational? Do regions tend to show distinctive mixes of democratic strengths and weaknesses? 

## Analytic Question 1

**(A) What does the balance across the core dimensions of democracy look like?**
(B) To what extent do countries and regions perform consistently across electoral, deliberative, participatory, egalitarian, and liberal democracy? (C) How has this balance changed over time? (D) Are well-balanced profiles becoming more common, or are profiles growing more uneven? (E) Finally, are balanced democracies more persistent than imbalanced ones?

### Task Abstraction 

Based on Stasko and Amar's framework, this question involves a variety of tasks. 

Sub-questions A and B focus on how each dimension of democracy might **correlate** with each of the other dimensions. The task for sub-question C is similar, in that it also involves **correlation**, examining how each democracy dimension correlates with time. One could also describe these tasks as **characterizing distribution** or variation, if we treat the different dimensions of democracy as attributes within a broader construct and examine the spread of their values within that higher-level concept. Given this framing, determining the difference between the largest and smallest values could be considered a **determine range** task. 

Sub-question D involves **clustering**-- finding groups of countries that are well-balanced, and examining if more countries fit the cluster over time. Lastly, E extends D by examining both 'balanced' and 'imbalanced' **clusters** to see how membership changes over time (i.e. **correlates** with time). 

A **compute derived value** task might be involved, if one were to compute the standard deviation or other variability measure across the different dimensions of democracy. A **sort** task might also be involved, for example, in sorting countries based on the standard deviation across dimensions of democracy (i.e. democracy indices). Based on this sorted data, we might **find extremum(s)**, i.e. countries with the lowest and highest standard deviation across dimensions of democracy.
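As a minimal sketch of the derived-value, range, sort, and extremum tasks described above, the snippet below computes the standard deviation and range across the five dimensions for each row, then sorts by standard deviation. The three countries and their scores are made up for illustration (they are not V-Dem values), the column names `sd` and `range` are my own, and the population standard deviation (`ddof=0`) is just one reasonable choice of variability measure.

```python
import pandas as pd

# Hypothetical scores for three made-up countries (NOT V-Dem data).
dims = ["Electoral", "Liberal", "Participatory", "Deliberative", "Egalitarian"]
toy = pd.DataFrame(
    {
        "Country": ["A", "B", "C"],
        "Electoral": [0.80, 0.60, 0.20],
        "Liberal": [0.78, 0.30, 0.15],
        "Participatory": [0.75, 0.20, 0.10],
        "Deliberative": [0.82, 0.50, 0.25],
        "Egalitarian": [0.79, 0.10, 0.20],
    }
)

# Compute derived value: spread across the five dimensions, per row.
toy["sd"] = toy[dims].std(axis=1, ddof=0)

# Determine range: gap between the strongest and weakest dimension.
toy["range"] = toy[dims].max(axis=1) - toy[dims].min(axis=1)

# Sort, then find extrema: lowest sd = most balanced profile.
ranked = toy.sort_values("sd").reset_index(drop=True)
most_balanced = ranked.loc[0, "Country"]
least_balanced = ranked.loc[len(ranked) - 1, "Country"]

print(ranked[["Country", "sd", "range"]])
```

With the real data, the same steps could presumably be applied to the five high-level index columns for a single year.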

### Preliminary Sketches

#### Sketch 1. Parallel Coordinates

In [53]:
# <img src="parallel.jpg" alt="sketch" width="50%">

<img src="IMG_9598.jpg" alt="High-fidelity sketch" width="50%">

#### Sketch 2. Star Plots

In [54]:
# <img src="star.jpg" alt="star plots" width="50%">

<img src="IMG_9602.jpg" alt="sketch" width="50%">

#### Sketch 3. Standard Deviation Heat Map

In [None]:
# <img src="large.jpg" alt="sketch" width="50%">

<img src="large.jpg" alt="sketch" width="50%">

## Critical Analysis

**Sketch 1** is a parallel coordinates representation of the different dimensions of democracy,
i.e. the high-level indices in the V-Dem dataset. 

One concern with this type of plot
is that too many lines make it very difficult to read. Certainly, ~130 lines
seem like too many to display simultaneously. As a result, we intend to experiment
with using regional averages, rather than lines for each country. We also intend
to experiment with allowing the viewer to **filter** the displayed data, showing only a few
countries at a time. 
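The two mitigations just mentioned (regional averages and filtering to a few countries) could be prepared along the following lines. This is only a sketch on made-up values: the region labels, the country names, and the three-index subset are assumptions for illustration, and the long ("tidy") layout is one common input shape for a parallel coordinates chart, not necessarily the one we will end up using.

```python
import pandas as pd

# Hypothetical scores (NOT V-Dem data); three indices keep the example short.
dims = ["Electoral", "Liberal", "Egalitarian"]
toy = pd.DataFrame(
    {
        "Country": ["A", "B", "C", "D"],
        "Region": ["North", "North", "South", "South"],
        "Electoral": [0.8, 0.6, 0.3, 0.1],
        "Liberal": [0.7, 0.5, 0.2, 0.2],
        "Egalitarian": [0.6, 0.6, 0.3, 0.1],
    }
)

# Option 1: one line per region instead of one line per country.
regional = toy.groupby("Region", as_index=False)[dims].mean()

# Long form: one row per (region, index) pair, ready to draw as one polyline per region.
long_form = regional.melt(
    id_vars="Region", value_vars=dims, var_name="Index", value_name="Value"
)

# Option 2: keep every country, but show only a viewer-selected subset.
selected = ["A", "C"]
subset = toy[toy["Country"].isin(selected)]

print(long_form)
```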

Another concern is that the resultant shape can be
difficult to interpret-- however, in this case, our main question is simply binary. We ask: Is it close to even, or is it jagged? It is also helpful that we only need to represent five attributes of democracy, so the resultant shape will not be too complex. 

Despite these drawbacks, parallel coordinates seem like an appropriate
approach, given that they support straightforward multivariate comparison: all five high-level democracy indices can be explored together in a single view.

**Sketch 2** uses star plots, faceted by country. This type of plot is an appropriate 
choice for showing relative magnitude across variables with identical ranges. However, 
using a slider for time means that a viewer who wants to make comparisons
over time has to keep previous values 'in their head' while sliding to a subsequent
year. As a result, the representation supports comparisons between adjacent years (since 
one can flip back and forth), but is poorly suited to temporal
comparisons that stretch over a longer period of time.

**Sketch 3** uses color to represent a derived numeric attribute, standard deviation.
Using color limits this representation, in that we can only represent a limited 
number of different levels of this attribute, say ~5. This may be sufficient
to differentiate between 'balanced countries' and 'imbalanced' countries! However,
it is difficult to know without seeing the plot created. The provided color scale
is not optimal-- it might be preferable to use e.g. lightness or a more obvious 
rainbow scale, to provide a clear indication of which values are greater than others.
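To make the "~5 levels" idea concrete, one option is to bin the standard deviation into five equal-width classes before mapping it to color. The sketch below does this with `pd.cut` on made-up standard deviation values; the country labels and the choice of equal-width (rather than quantile) bins are assumptions, not a decision we have made.

```python
import pandas as pd

# Hypothetical standard deviations across the five indices (NOT V-Dem data).
sd = pd.Series([0.00, 0.05, 0.10, 0.15, 0.20], index=["A", "B", "C", "D", "E"])

# Five equal-width bins; labels=False returns the bin number (0 = most balanced).
levels = pd.cut(sd, bins=5, labels=False)

print(levels)
```

Each bin number could then be assigned one step of an ordered color scale.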


#### High Fidelity Version of Sketch 1

In [None]:
# <img src="high_fid.png" alt="high_fid" width="50%">

<img src="high_fid.png" alt="high_fid" width="80%">

## Analytic Question 2

**How have the relationships between different dimensions of democracy evolved over time, and across countries or regions?**
(2A) Which dimensions of democracy tend to move together across countries or regions? (2B) Do there exist temporal trends that cut across indices? (2C) Do improvements in one dimension of democracy predict later gains in others, i.e. are some dimensions of democracy more foundational? (2D) Do regions tend to show distinctive mixes of democratic strengths and weaknesses? 

### Task Abstraction 

Based on Stasko and Amar's framework, this question involves a variety of tasks. 

Sub-questions 2A and 2B involve a **correlate** task, since they focus on how the different dimensions of democracy move together, over time (2B) and across countries and regions (2A). Sub-question 2C also centers on **correlate**, but with a predictive emphasis. 
Sub-question 2D involves a **cluster** task: identifying groups of regions that share distinctive combinations of democratic strengths and weaknesses. 
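For the predictive flavor of **correlate** in sub-question 2C, one simple (and far from conclusive) check is a lagged correlation: compare one index in a given year with another index in the following year. The sketch below uses a made-up series for a single hypothetical country, constructed so that the liberal index follows the electoral index with a one-year delay. A high lagged correlation is only suggestive, and does not establish that one dimension is foundational.

```python
import pandas as pd

# Hypothetical series for one made-up country (NOT V-Dem data).
# From 2001 onward, "Liberal" equals the previous year's "Electoral" value.
toy = pd.DataFrame(
    {
        "Year": [2000, 2001, 2002, 2003, 2004, 2005],
        "Electoral": [0.10, 0.20, 0.40, 0.50, 0.70, 0.80],
        "Liberal": [0.05, 0.10, 0.20, 0.40, 0.50, 0.70],
    }
)

# Same-year Pearson correlation between the two indices.
same_year = toy["Electoral"].corr(toy["Liberal"])

# Lagged correlation: last year's electoral score vs. this year's liberal score.
# shift(1) leaves a NaN in the first row, which corr() drops.
lagged = toy["Electoral"].shift(1).corr(toy["Liberal"])

print(f"same-year: {same_year:.3f}, one-year lag: {lagged:.3f}")
```

With several countries in one table, the shift would need to be applied within each country (e.g. via a `groupby` on the country column) so that values do not leak across country boundaries.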

Similar to analytic question 1, one might also use **characterize distribution** to describe tasks here, given that we treat the different dimensions of democracy as attributes within a broader construct and examine the spread of their values within that higher-level construct over time and regions. This question might also involve **finding anomalies**, to identify countries or regions that diverge from broader patterns.

#### Sketch 4. Simple Heat Map

In [None]:
# <img src="small.jpg" alt="sketch" width="50%">

<img src="small.jpg" alt="sketch" width="50%">

#### Sketch 5. Heat Map with Selectable Democracy Dimension

In [55]:
# <img src="heat.jpg" alt="sketch" width="50%">

<img src="IMG_9599.jpg" alt="sketch" width="50%">

#### Sketch 6. Line Graph By Country

In [None]:
# <img src="line.jpg" alt="sketch" width="50%">

<img src="IMG_9603.jpg" alt="sketch" width="50%">

## Critical Analysis

**Sketch 4** is a heat map which shows values for each high-level democratic index.
Conveniently, each of these indices takes a value between 0 and 1, which makes comparison
relatively straightforward. Upon reflection, this representation might actually be better
suited for answering analytic question 1, where the task would be to compare colors
e.g. along a single row. It might also be suited for answering question 2A, allowing
viewers to spot patterns, for example, which countries have similar profiles across
dimensions. The color encoding of index value provides an immediate indicator of differences.
Though color does not represent numerous values as effectively as e.g. position, this
seems fine, given that we do not need exact values, and ~5 values might be sufficient.
Including sorting or interactive reordering by region or average score could make it 
easier to identify clusters and regional distinctiveness.

**Sketch 5** is also a heat map, but in contrast with sketch 4, it extends over time. (Please note that the lo-fi sketch is missing year values along the x-axis.)
This representation allows users to view trends over time. Color variation can be compared across years and across countries. One drawback of this representation is that color alone may not convey the direction of change clearly. Additionally, it is difficult to compare across countries separated by several rows. Options such as collapsing across columns (years) or reordering rows (countries) would better support comparison tasks. 

**Sketch 6** is a line chart, showing values for each high-level index over time (year). The plot is faceted by country. This sketch is effective for showing trends over time for individual democracy indices. With faceting by country, this representation allows viewers to see how the indices move together or diverge within each country, supporting the **correlate** or **characterize distribution** task.

One drawback of the design is that faceting by country produces a very large number of
facets, which may be overwhelming and is certainly too many to consider simultaneously. An alternative approach might be to facet by region (of which there are 17). Another option would be allowing the viewer to filter by country or region-- with a smaller number of facets, comparison across countries is easier. One critical question might be: is cross-country comparison a priority?

Despite these drawbacks, this sketch is the strongest representation for this question. It clearly shows how indices move together and change over time within each country. It supports **correlate** and **characterize distribution** tasks. Unlike heat maps, it uses position along a continuous axis, which is well suited for showing numeric data and more effectively illustrates change over time.

#### High Fidelity Version of Sketch 6

In [61]:
# <img src="high_fid2.jpeg" alt="sketch" width="80%">

<img src="high_fid2.jpeg" alt="sketch" width="80%">

# Exploratory Data Analysis

## Import libraries

In [None]:
import pandas as pd
import altair as alt

## Load data

In [None]:
alt.data_transformers.enable("vegafusion")
full_v_dem_data = pd.read_csv("../../data/raw/V-Dem-CY-Full+Others-v15.csv", low_memory = False)
full_v_dem_data['year'] = pd.to_datetime(full_v_dem_data['year'], format = '%Y')

print(f"Below is a sample of the full V-Dem dataset. There are {full_v_dem_data.shape[0]} rows and {full_v_dem_data.shape[1]} columns! \n")
full_v_dem_data

## Select subset & rename columns

Since the original dataset contains 4607 columns, our group elected to work with a subset:

In [None]:
columns_to_keep = ['country_name',
                   'year',
                   'e_regionpol_7C',
                   'e_regiongeo',

                   'v2x_polyarchy',
                   'v2x_libdem',
                   'v2x_partipdem',
                   'v2x_delibdem',
                   'v2x_egaldem',

                   'v2x_api',
                   'v2x_mpi',
                   'v2x_freexp_altinf',
                   'v2x_frassoc_thick',
                   'v2x_suffr',
                   'v2xel_frefair',
                   'v2x_elecoff',
                   'v2x_liberal',
                   'v2xcl_rol',
                   'v2x_jucon',
                   'v2xlg_legcon',
                   'v2x_partip',
                   'v2x_cspart',
                   'v2xdd_dd',
                   'v2xel_locelec',
                   'v2xel_regelec',
                   'v2xdl_delib',
                   'v2x_egal',
                   'v2xeg_eqprotec',
                   'v2xeg_eqaccess',
                   'v2xeg_eqdr',

                   'v2elgvsuflvl',
                   'v2expathhg',
                   'v2ddlexci',
                   'v2elrstrct',
                   'v2ddcredal',
                   'v2exfemhog',
                   'v2elcomvot',
                   'v2elfemrst',

                   'v2xca_academ',
                   'v2cafexch',
                   'v2cafres',
                   'v2cainsaut',
                   'v2casurv',
                   'v2clacfree'
                   ]

data = full_v_dem_data[columns_to_keep]
print(f"The subset of the dataset selected for the project has {data.shape[1]} columns.")

Next, I updated each attribute name, to ease further analyses.

In [None]:
dict_for_renaming = {

    # General attributes
    "country_name": "Country",
    "year": "Year",
    "e_regionpol_7C": "Politico-Geographical Region",
    "e_regiongeo": "Region",

    # High-level indices
    "v2x_polyarchy": "Electoral Democracy Index",
    "v2x_libdem": "Liberal Democracy Index",
    "v2x_partipdem": "Participatory Democracy Index",
    "v2x_delibdem": "Deliberative Democracy Index",
    "v2x_egaldem": "Egalitarian Democracy Index",

    # Mid-level indices
    "v2x_api": "Additive Polyarchy Index",
    "v2x_mpi": "Multiplicative Polyarchy Index",
    "v2x_freexp_altinf": "Freedom of Expression and Alternative Sources of Information Index",
    "v2x_frassoc_thick": "Freedom of Association Thick Index",
    "v2x_suffr": "Share of Population with Suffrage",
    "v2xel_frefair": "Clean Elections Index",
    "v2x_elecoff": "Elected Officials Index",
    "v2x_liberal": "Liberal Components Index",
    "v2xcl_rol": "Equality Before the Law and Individual Liberty Index",
    "v2x_jucon": "Judicial Constraints on the Executive Index",
    "v2xlg_legcon": "Legislative Constraints on the Executive Index",
    "v2x_partip": "Participatory Component Index",
    "v2x_cspart": "Civil Society Participation Index",
    "v2xdd_dd": "Direct Popular Vote Index",
    "v2xel_locelec": "Local Government Index",
    "v2xel_regelec": "Regional Government Index",
    "v2xdl_delib": "Deliberative Component Index",
    "v2x_egal": "Egalitarian Component Index",
    "v2xeg_eqprotec": "Equal Protection Index",
    "v2xeg_eqaccess": "Equal Access Index",
    "v2xeg_eqdr": "Equal Distribution of Resources Index",

    # Other attributes of interest
    "v2elgvsuflvl": "Suffrage Level",
    "v2expathhg": "Head of Government (HOG) Appointment in Practice",
    "v2ddlexci": "Initiatives Permitted",
    "v2elrstrct": "Candidate restriction by ethnicity, race, religion, or language",
    "v2ddcredal": "Credible Elections", 
    "v2exfemhog": "Female Head of Government (HOG)",
    "v2elcomvot": "Compulsory Voting",
    "v2elfemrst": "Female Suffrage Restricted",

    # Academia-related attributes of interest
    "v2xca_academ": "Academic Freedom Index",
    "v2cafexch": "Freedom of academic exchange and dissemination",
    "v2cafres": "Freedom to research and teach",
    "v2cainsaut": "Institutional autonomy",
    "v2casurv": "Campus integrity",
    "v2clacfree": "Freedom of academic and cultural expression"
}

data = data.rename(columns = dict_for_renaming)

Then I created groups of attributes, to use in subsequent analyses.

In [None]:
high_level_indices = [
    "Electoral Democracy Index",
    "Liberal Democracy Index",
    "Participatory Democracy Index",
    "Deliberative Democracy Index",
    "Egalitarian Democracy Index"
]

mid_level_indices = [
    "Additive Polyarchy Index",
    "Multiplicative Polyarchy Index",
    "Freedom of Expression and Alternative Sources of Information Index",
    "Freedom of Association Thick Index",
    "Share of Population with Suffrage",
    "Clean Elections Index",
    "Elected Officials Index",
    "Liberal Components Index",
    "Equality Before the Law and Individual Liberty Index",
    "Judicial Constraints on the Executive Index",
    "Legislative Constraints on the Executive Index",
    "Participatory Component Index",
    "Civil Society Participation Index",
    "Direct Popular Vote Index",
    "Local Government Index",
    "Regional Government Index",
    "Deliberative Component Index",
    "Egalitarian Component Index",
    "Equal Protection Index",
    "Equal Access Index",
    "Equal Distribution of Resources Index"
]

academic_attributes = [
    "Freedom of academic exchange and dissemination",
    "Freedom to research and teach",
    "Institutional autonomy",
    "Campus integrity",
    "Freedom of academic and cultural expression"
]

## Univariate Numeric Summaries

In [None]:
numeric_summary = data.describe().T
numeric_summary["number_non_null"] = data.count()
numeric_summary["percent_non_null"] = ((data.count()/data.shape[0]) * 100).round(2)

numeric_summary

## Univariate Visual Summaries

### Attribute: Country

I created a histogram of records (i.e. rows) per country, in order to **characterize distributions**, in particular, to understand how observations are distributed across countries, and which countries might be over- or under-represented in the dataset. The number of records (i.e. rows) per country is highly variable, as shown in the figure below.

In [None]:
alt.Chart(data).mark_bar(size = 3).encode(
   x = alt.X("count():Q"),
   y = alt.Y("Country:N", axis = alt.Axis(labelFontSize = 8), sort = "-x"),
   color = alt.value("#238A8D")
).properties(title = "Records per Country", width = 400, height = 1800)

### Attribute: Year

I also created a histogram of records (i.e. rows) per year, in order to **characterize the distribution** of observations across years, and to understand the volume of data by year. The dataset includes more data for more recent years, notably since 1900, as shown below.

In [None]:
alt.Chart(data).mark_bar(size = 3).encode(
   x = alt.X("Year", title = "Year"),
   y = alt.Y("count()"),
   color = alt.value('#D4AF37')
).properties(title = "Records per Year", width = 800)

### Attribute: High-Level Indices

I created boxplots of the five high-level democracy indices in order to **characterize distributions**, **compare groups**, **find extremums** and **find anomalies (outliers)**.

Notably, participatory democracy and liberal democracy indices have a large number of what one might consider outliers. Distributions of each index are generally similar. This is somewhat expected, as each index is constrained to a value between 0 and 1.

In [None]:
alt.Chart(data).transform_fold(high_level_indices, as_ = ["Index", "Value"]
).mark_boxplot(size = 12).encode(
    x = alt.X("Index:N", title = "High-level Democracy Indices"),
    y = alt.Y("Value:Q", title = "Index Score"),
    color = alt.Color("Index:N", legend = None, scale = alt.Scale(scheme = "viridis"))
).properties(title = "Distribution of High-Level Democracy Indices", width = 300, height =  400)

I also created histograms for each of the five high-level indices. The aim of these representations was to **characterize the distributions** of *values* for each index.

Examining the distributions of all types of democracy scores (our five high-level indices), we observe the distributions are right-skewed.

In [None]:
Electoral = alt.Chart(data).mark_bar(size = 2).encode(
   x = alt.X("Electoral Democracy Index", 
             title = "Electoral Democracy Index", 
             bin = alt.Bin(maxbins=100),
             axis = alt.Axis(format=".1f"),
             scale = alt.Scale(domain=[0,1])),
   y = alt.Y("count()"),
   color = alt.value('#7E57C2')
).properties(title = "Distribution of Democracy Scores, by Democracy Type", width = 300, height = 100)

Liberal = alt.Chart(data).mark_bar(size = 2).encode(
   x = alt.X("Liberal Democracy Index",
             title = "Liberal Democracy Index",
             bin = alt.Bin(maxbins = 100),
             axis = alt.Axis(format = ".1f"),
             scale = alt.Scale(domain = [0,1])),
   y = alt.Y("count()"),
   color = alt.value('#FF6347')
).properties(height = 100)

Participatory = alt.Chart(data).mark_bar(size = 2).encode(
   x = alt.X("Participatory Democracy Index", 
             title = "Participatory Democracy Index",
             bin = alt.Bin(maxbins = 100),
             axis = alt.Axis(format = ".1f"),
             scale = alt.Scale(domain = [0,1])),
   y = alt.Y("count()"),
   color = alt.value('#1E90FF')
).properties(height = 100)

Deliberative = alt.Chart(data).mark_bar(size = 2).encode(
   x = alt.X("Deliberative Democracy Index",
             title = "Deliberative Democracy Index",
             bin = alt.Bin(maxbins = 100),
             axis = alt.Axis(format = ".1f"),
             scale = alt.Scale(domain = [0,1])),
   y = alt.Y("count()"),
   color = alt.value('#FFA500')
).properties(height = 100)

Egalitarian = alt.Chart(data).mark_bar(size = 2).encode(
   x = alt.X("Egalitarian Democracy Index",
             title = "Egalitarian Democracy Index",
             bin = alt.Bin(maxbins = 100),
             axis = alt.Axis(format = ".1f"),
             scale = alt.Scale(domain = [0,1])),
   y = alt.Y("count()"),
   color = alt.value('#2ECC71')
).properties(height = 100)

# Note: One could use .repeat() here, but in order to provide the option of breaking 
# out by individual chart, in case this would be useful later, I elect to repeat a base chart.

(Electoral & Liberal & Participatory & Deliberative & Egalitarian).resolve_scale(x='shared')

### Attribute: Mid-Level Indices

Similar to my approach for high-level indices, I created boxplots of the 21 mid-level democracy indices in order to **characterize distributions**, **compare groups**, **find extremums** and **find anomalies (outliers)**.

Across mid-level indices we see much greater variability than we do across high-level indices.

In [None]:
alt.Chart(data).transform_fold(mid_level_indices, as_= ["Index", "Value"]
).mark_boxplot(size=8).encode(
    x = alt.X("Index:N", title = "Mid-level Democracy Indices", axis = alt.Axis(labelAngle = -45)),
    y = alt.Y("Value:Q", title = "Score"),
    color = alt.value("#2A9DF4")
).properties(title = "Distribution of Mid-Level Democracy Indices", width = 600, height = 400)

Next, I use histograms to **characterize the distribution** of values for mid-level indices.

In [None]:
polyarchy_charts = alt.Chart(data).mark_bar(size = 2).encode(
    x = alt.X(alt.repeat("column"), type = "quantitative", bin = alt.Bin(maxbins = 100)),
    y = alt.Y("count()"),
    color = alt.value("#9D4EDD")
).properties(width = 200, height = 100).repeat(column = [
        "Additive Polyarchy Index",
        "Multiplicative Polyarchy Index"])

polyarchy_charts

Examining the leftmost visualization above, we see a range of additive polyarchy values, as one might expect. The distribution is right-skewed, i.e. has a long right tail, indicating that most records (i.e. most countries at most points in time) tend to have lower democracy scores.

Examining the right visualization, we see many values with a score of 0. The y axis scale makes it difficult to see values other than zero. In order to examine non-zero values, we replot using a log scale.

In [None]:
multiplicative_polyarchy_chart = alt.Chart(data).mark_bar(size = 2).encode(
    x = alt.X("Multiplicative Polyarchy Index:Q", bin = alt.Bin(maxbins = 100)),
    y = alt.Y("count()", scale = alt.Scale(type = 'log')),
    color = alt.value("#9D4EDD")
).properties(width = 200, height = 400)

multiplicative_polyarchy_chart

We see that most records have a "Multiplicative Polyarchy Index" of 0. The multiplicative polyarchy index combines several component scores, including e.g. freedom of expression and freedom of association. We see so many 0 values because, as its name suggests, the multiplicative index *multiplies* its component scores together, meaning that if any one component score is 0, a country's score will be 0 (even if other component scores are high).
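A toy calculation makes the effect of multiplying concrete. The component names and values below are hypothetical, and the plain product and plain mean are simplified stand-ins rather than V-Dem's actual aggregation formulas; the point is only that a single zero wipes out a product but not an average.

```python
import math

# Hypothetical component scores for one made-up country-year (NOT V-Dem data).
components = {
    "free_expression": 0.9,
    "free_association": 0.8,
    "suffrage": 1.0,
    "clean_elections": 0.0,  # a single component at zero
    "elected_officials": 0.7,
}

# Simplified multiplicative aggregation: any zero forces the result to zero.
multiplicative = math.prod(components.values())

# Simplified additive aggregation (a plain mean): the zero only lowers the score.
additive = sum(components.values()) / len(components)

print(f"multiplicative: {multiplicative:.2f}, additive: {additive:.2f}")
```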

We continue with histograms of values for each of the remaining mid-level indices.

From the plots below, we observe the following:

- Some indices (e.g. Clean Elections, Direct Popular Vote, Regional Government) have a high number of 0 or very near 0 values. This indicates most countries have weak performance on those indices.

- Share of Population with Suffrage is close to a binary distribution. This may be because, in a given country and year, voting tends to be either broadly allowed or not allowed at all.

- Most indices are right-skewed, i.e. have a long right tail.
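The first and third observations can also be checked numerically rather than by eye. The sketch below computes, for each column, the share of values at or near zero and the sample skewness (positive values indicate a long right tail). The two columns are made up for illustration, and the 0.01 cutoff for "very near 0" is an arbitrary assumption.

```python
import pandas as pd

# Hypothetical index values (NOT V-Dem data).
toy = pd.DataFrame(
    {
        "Index A": [0.0, 0.0, 0.0, 0.1, 0.2, 0.9],  # zero-heavy, long right tail
        "Index B": [0.1, 0.5, 0.6, 0.7, 0.8, 0.9],  # mass on the high end
    }
)

# Share of records at or very near zero (the 0.01 cutoff is an assumption).
share_near_zero = (toy <= 0.01).mean()

# Sample skewness per column: > 0 means right-skewed, < 0 means left-skewed.
skewness = toy.skew()

print(pd.DataFrame({"share_near_zero": share_near_zero, "skewness": skewness}))
```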

In [None]:
mid_level_chart = alt.Chart(data).mark_bar(size = 3).transform_fold([
          "Freedom of Expression and Alternative Sources of Information Index",
          "Freedom of Association Thick Index",
          "Share of Population with Suffrage",
          "Clean Elections Index",
          "Elected Officials Index",
          "Liberal Components Index",
          "Equality Before the Law and Individual Liberty Index",
          "Judicial Constraints on the Executive Index",
          "Legislative Constraints on the Executive Index",
          "Participatory Component Index",
          "Civil Society Participation Index",
          "Direct Popular Vote Index",
          "Local Government Index",
          "Regional Government Index",
          "Deliberative Component Index",
          "Egalitarian Component Index",
          "Equal Protection Index",
          "Equal Access Index",
          "Equal Distribution of Resources Index"
      ], as_ = ["attribute", "value"]
	  ).encode(
          x = alt.X("value:Q", bin = alt.Bin(maxbins = 40), axis = alt.Axis(format = ".1f"), title = None),
          y = alt.Y("count()"),
          color = alt.value("#10B981")
        ).properties(width = 200, height = 100,  
        ).facet(facet = alt.Facet("attribute:N", title = None), columns = 4
		).resolve_scale(y = "independent", x = 'independent')

mid_level_chart

### Attribute: Academic-Freedom Index and Related Attributes

Since our analysis may include questions related to academic attributes, we examined these variables through exploratory data analysis.

We begin by creating a histogram to explore academic freedom index, with the aim of **characterizing the distribution** and **finding extremes and anomalies**.

In [None]:
academic_freedom_index = alt.Chart(data).mark_bar(size = 3).encode(
    x = alt.X('Academic Freedom Index:Q', bin = alt.Bin(maxbins = 50)),
    y = alt.Y("count()"),
    color = alt.value("#9B2C2C")
).properties(width = 350, height = 300)

academic_freedom_index

Using a similar approach with a similar goal, we continue with histograms exploring the attributes that constitute the academic freedom index.

In [None]:
academic_charts_base = alt.Chart(data).mark_bar(size = 3).encode(
    x = alt.X(alt.repeat("column"), type = "quantitative", bin = alt.Bin(maxbins = 50)),
    y = alt.Y("count()"),
    color = alt.value("#9B2C2C")
).properties(width = 150, height = 100)

academic_charts = academic_charts_base.repeat(column = academic_attributes)

academic_charts

## Multivariate Visual Data Analysis

### SPLOM of High-Level Indices

In order to explore how high-level attributes might **correlate**, we create a SPLOM.

In [None]:
high_level_indices = [
    "Electoral Democracy Index",
    "Liberal Democracy Index",
    "Participatory Democracy Index",
    "Deliberative Democracy Index",
    "Egalitarian Democracy Index"
]

splom = alt.Chart(data).mark_point(filled = True, size = 2, opacity = 0.1
).encode(
    x = alt.X(alt.repeat('column'), type = 'quantitative'),
    y = alt.Y(alt.repeat('row'), type = 'quantitative'),
    color = alt.value('purple')
).properties(width = 150, height = 150
).repeat(row = high_level_indices, column = high_level_indices)

splom

Each of the high-level indices shows a roughly logarithmic relationship with the electoral democracy index.

At low levels of electoral democracy, small improvements in related dimensions (like the components of liberal, deliberative, participatory and egalitarian democracy) are correlated with large increases in electoral democracy. At higher levels, progress is associated with smaller visible gains in electoral democracy (the curves flatten).

The Electoral Democracy Index reflects how fair and inclusive elections are, including things like suffrage, vote counting, and opposition freedom. These things may depend on the conditions that make it possible to hold fair elections in the first place, and these conditions might be reflected by some components of other indices (rule of law, equality, free expression, civil society). It may be that the electoral index tends to improve *after* those enabling conditions improve, resulting in the logarithmic curve.
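The "roughly logarithmic" reading is a visual impression from the SPLOM. One way to probe it is to compare the Pearson correlation on the raw values with the correlation after log-transforming the other index, alongside a rank-based (Spearman-style) correlation that only assumes a monotonic relationship. The sketch below uses made-up data constructed to be exactly logarithmic, so it shows the mechanics of the check, not a result about V-Dem; the column names are placeholders.

```python
import numpy as np
import pandas as pd

# Made-up data: "Electoral" is built as an exact logarithmic function of "Other".
toy = pd.DataFrame({"Other": [0.05, 0.1, 0.2, 0.4, 0.8]})
toy["Electoral"] = 0.2 * np.log2(toy["Other"]) + 0.97

# Pearson correlation on the raw scale understates a curved relationship.
pearson_raw = toy["Other"].corr(toy["Electoral"])

# After a log transform of the other index, the relationship becomes linear.
pearson_log = np.log(toy["Other"]).corr(toy["Electoral"])

# Rank-based correlation: Pearson on ranks, which only requires monotonicity.
rank_corr = toy["Other"].rank().corr(toy["Electoral"].rank())

print(f"raw: {pearson_raw:.3f}, log: {pearson_log:.3f}, rank: {rank_corr:.3f}")
```

On real data, a log transform requires strictly positive values, so zero scores would need to be handled first.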


## Academic Freedom Analysis

### SPLOM of Academic Attributes

In order to explore how the attributes that comprise the academic freedom index might **correlate**, we create a SPLOM.

In [None]:
alt.Chart(data).mark_point(opacity = 0.1, size = 1).encode(
    x = alt.X(alt.repeat('column'), type = 'quantitative'),
    y = alt.Y(alt.repeat('row'), type = 'quantitative'),
    color = alt.value('#FA8072')
).properties(width = 200, height = 200).repeat(row = academic_attributes, column = academic_attributes)

### Freedom to Research and Teach by Country Over Time

In order to **correlate**, **compare**, and **find anomalies**, below I create a faceted scatter plot. 
The plots show how this component of academic freedom evolves over time, across countries. 
The plots illustrate countries with stable (e.g. Australia) and volatile (e.g. Bolivia) trajectories.

In [None]:
alt.Chart(data).mark_point(opacity = 0.2).encode(
    y = alt.Y('Freedom to research and teach:Q', title = 'Index Value'),
    x = alt.X('Year:T'),
    color = alt.value('#10B981')
).properties(width = 120, height=100
).facet(facet = alt.Facet('Country:N', title = "Freedom to Research and Teach By Country"), columns = 7).resolve_scale(x = 'independent')

## High-Level Indices Across Countries Over Time

Finally, in order to explore trends over time by country, we create a series of line graphs.

These support the Stasko task of **correlating** one or more attributes, in this case, the democracy index and year.

We make several observations:

- The egalitarian and deliberative democracy index values are not available prior to 1900.

- These plots are very difficult to read, with too much data in a small area, given the 100-plus countries in the dataset. Future work could explore plotting by region instead of country.

In [None]:
alt.Chart(data).mark_line(opacity = 0.7).encode(
    x = alt.X('Year'),
    y = alt.Y('Liberal Democracy Index:Q', title = 'Liberal Democracy Index'),
    color = alt.Color('Country:N'),
).properties(title = 'Change in Liberal Democracy over Time by Country', width = 500, height = 300)

In [None]:
alt.Chart(data).mark_line(opacity = 0.7).encode(
    x = alt.X('Year'),
    y = alt.Y('Egalitarian Democracy Index:Q', title = 'Egalitarian Democracy Index'),
    color = alt.Color('Country:N'),
).properties(title = 'Change in Egalitarian Democracy over Time by Country', width = 500, height = 300)

In [None]:
alt.Chart(data).mark_line(opacity = 0.7).encode(
    x = alt.X('Year'),
    y = alt.Y('Electoral Democracy Index:Q', title = 'Electoral Democracy Index'),
    color = alt.Color('Country:N'),
).properties(title = 'Change in Electoral Democracy over Time by Country', width = 500, height = 300)

In [None]:
alt.Chart(data).mark_line(opacity = 0.7).encode(
    x = alt.X('Year'),
    y = alt.Y('Deliberative Democracy Index:Q', title = 'Deliberative Democracy Index'),
    color = alt.Color('Country:N'),
).properties(title = 'Change in Deliberative Democracy over Time by Country', width = 500, height = 300)

In [None]:
alt.Chart(data).mark_line(opacity = 0.7).encode(
    x = alt.X('Year'),
    y = alt.Y('Participatory Democracy Index:Q', title = 'Participatory Democracy Index'),
    color = alt.Color('Country:N'),
).properties(title = 'Change in Participatory Democracy over Time by Country', width = 500, height = 300)