Complexity-Index

Provided are R (calc_index.R) and Stata (calc_index.do) files to calculate complexity indices from Quantifying Lottery Choice Complexity (Enke & Shubatt 2023):

OPC: Objective Problem Complexity
SPC: Subjective Problem Complexity
OAC: Objective Aggregation Complexity
SAC: Subjective Aggregation Complexity
OLC: Objective Lottery Complexity
SLC: Subjective Lottery Complexity

Depending on the input, the tool calculates the applicable indices and saves them in the output folder, together with the features needed to compute them.

Input

The code automatically recognizes the number of lotteries in the input data. If two lotteries are supplied, it calculates the complexity indices of the choice problem and of each lottery (see subsection Choice Complexity); if only one lottery is supplied, it calculates just the complexity of that lottery (see subsection Lottery Complexity).

Max number of lotteries: 2
Max number of states per lottery: 9
Payout value is ignored if its probability is 0
Payouts should be distinct in each lottery
Problem ID is optional and should be indicated by the column name problem.
CSV format
Missing probabilities and states can be indicated by `,"",`, `,"NA",`, or `,,`
Additional columns may be included in the input data and will not be modified by the code, as long as their names do not match the patterns x_, p_, _a_, _a, _b_, _b, or cor_
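
As an illustration of the naming rules above, the following Python sketch shows one way the number of lotteries and the largest state index could be inferred from a header row. It is not the repository's R or Stata code; the function name `describe_columns` and the regular expression are assumptions made for this example.

```python
import re

# Column patterns described in the Input section: x_a_1, p_b_3 (two-lottery
# naming) or x_1, p_1 (single-lottery naming without the _a_ segment).
# The state index is limited to 1..9, matching the documented maximum.
COLUMN_RE = re.compile(r"^(x|p)_(?:(a|b)_)?([1-9])$")


def describe_columns(columns):
    """Return (sorted lottery labels, largest state index) for a header row."""
    lotteries = set()
    max_state = 0
    for name in columns:
        match = COLUMN_RE.match(name)
        if match is None:
            continue  # e.g. problem, compound, or any additional column
        _, lottery, state = match.groups()
        # Assumption: columns without a lottery label belong to lottery a.
        lotteries.add(lottery or "a")
        max_state = max(max_state, int(state))
    return sorted(lotteries), max_state


header = ["problem", "x_a_1", "x_a_2", "p_a_1", "p_a_2", "x_b_1", "p_b_1", "compound"]
print(describe_columns(header))  # (['a', 'b'], 2)
```

A header with only x_1, p_1, ... columns would yield a single lottery, which corresponds to the Lottery Complexity case.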

Choice Complexity

If two lotteries are supplied, the column names should be as displayed in the table below. The columns indicating payouts should take the form x_{l}_{i}, where $l \in \{a,b\}$ indicates which lottery the payout belongs to, and $i$ indexes the lottery states. Similarly, the column names indicating probabilities should take the form p_{l}_{i}. The state index $i$ should take values between 1 and $k_{l}$, where $k_{l}$ is the maximum number of states that lottery $l$ takes in the data. As written, the code can process a maximum of $k_{l} = 9$ distinct states. If a payout is repeated within a lottery, the repeated entries are treated as separate states. To calculate the complexity indices in Enke & Shubatt (2023), we always first collapsed problems so that identical payouts are treated as the same state. If both lotteries take on a maximum of two states, then you would include the columns x_a_1, x_a_2, p_a_1, p_a_2, and similarly for $b$. See sample_all_indices_calculation_1.csv or sample_all_indices_calculation_2.csv in the sample_data folder for an example of a correct input format.

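| [problem] | x_a_1 | x_a_2 | x_a_3 | p_a_1 | p_a_2 | p_a_3 | x_b_1 | x_b_2 | x_b_3 | p_b_1 | p_b_2 | p_b_3 | [compound] | Any |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 10 | 5 |  | 0.5 | 0.5 |  | 3 |  |  | 1 |  |  | 0 | other |
| 2 | 2 | 4 |  | 0.3 | 0.7 |  | 2 |  |  | 1 |  |  | 1 | column |
| 3 | 1 | 2 | 3 | 0.2 | 0.2 | 0.6 | 1 | 2 | 3 | 0.2 | 0.5 | 0.3 | 1 | possible |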
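
The collapsing step mentioned above (treating identical payouts as one state) can be sketched in Python as follows. This is a minimal illustration, not the authors' preprocessing code; the function name `collapse_states` is hypothetical, and the inputs are assumed to be numeric already, with missing entries removed.

```python
def collapse_states(payouts, probs):
    """Merge states with identical payouts by summing their probabilities.

    Zero-probability states are dropped, since the Input section says that
    a payout is ignored if its probability is 0.
    """
    merged = {}  # dicts keep insertion order, so first-seen order is preserved
    for x, p in zip(payouts, probs):
        if p == 0:
            continue
        merged[x] = merged.get(x, 0.0) + p
    return list(merged.keys()), list(merged.values())


# The payout 10 appears twice; after collapsing it is a single state.
print(collapse_states([10, 5, 10], [0.25, 0.5, 0.25]))  # ([10, 5], [0.5, 0.5])
```

Collapsing before writing the CSV keeps the number of states, and hence the state-count features, in line with the convention described above.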

Compound

An additional optional column can be used to indicate that one of the lotteries is a compound lottery. However, we use a very specific type of compoundness in our estimation procedure. Specifically, we considered two-state lotteries where the probabilities p_{l}_1 and p_{l}_2 were not given explicitly to participants. Instead, we told them that the probability of the first state, $p$, would be drawn randomly from a uniform distribution on the interval $[p_{\min}, p_{\max}]$, and we varied the values of $p_{\min}$ and $p_{\max}$. The probability of the second state is then given by $1-p$. We allowed at most one of the two options to have this type of compoundness. The compound indicator on which the complexity index model is trained uses only this definition of compoundness; we cannot speak to its robustness to alternative definitions. If you do not include compound in your column names, it will automatically be set to False for all problems.
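
To make this definition of compoundness concrete, the following Python sketch resolves one such lottery by drawing $p$ uniformly from $[p_{\min}, p_{\max}]$. It only illustrates the description above; the function name `draw_compound_lottery`, the payouts, and the interval are hypothetical and do not come from the study materials.

```python
import random


def draw_compound_lottery(x1, x2, p_min, p_max, rng):
    """Draw p ~ U[p_min, p_max] and return the resulting two-state lottery."""
    if not 0.0 <= p_min <= p_max <= 1.0:
        raise ValueError("need 0 <= p_min <= p_max <= 1")
    p = rng.uniform(p_min, p_max)
    # The second state receives the remaining probability mass 1 - p.
    return [(x1, p), (x2, 1.0 - p)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
states = draw_compound_lottery(10, 0, p_min=0.25, p_max=0.75, rng=rng)
print(states)
```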

Lottery Complexity

If only one lottery is supplied, the lottery column names can be as displayed above (using only the _a_ columns and no _b_ columns). Alternatively, the _a_ segment may be omitted, in which case payoff columns take the form x_{i} and probability columns take the form p_{i}. Again, $i$ ranges between 1 and $k$, the largest number of distinct states in any lottery; the code can handle a maximum value of $k = 9$. For an example of the correct input format, see sample_just_OLC_SLC_calculation_1.csv or sample_just_OLC_SLC_calculation_2.csv in the sample_data folder.

Output

The results are saved in the output folder as index_calculated_R.csv or index_calculated_stata.csv, depending on which script you run, and include the features needed for the calculations. (An .RData or .dta file is additionally saved, depending on the executed script.)

Choice Complexity

The features for each index are listed below. Please see Section 4 of Quantifying Lottery Choice Complexity for the development of the indices and the appendix Potential Complexity Features for details on the feature definitions.
Consider a choice between two lotteries indexed by $j \in \{A, B\}$. Each lottery is characterized by payout probabilities $(p_1^j, \dots, p^j_{k_j})$ and payouts $(x_1^j, \dots, x^j_{k_j})$, where $k_j$ denotes the number of distinct payout states of lottery $j$.

Features of `OPC` and `SPC`

Log excess dissimilarity (ln_excess_dissimilarity):
Let $F_A(x)$ and $F_B(x)$ be the CDFs of lotteries A and B, and let $EV(\cdot)$ denote the expected value of a lottery. Log excess dissimilarity is then defined as

$$log\Big( 1+\int_\mathbb{R} |F_A(x) - F_B(x)|dx - |EV(A) - EV(B)|\Big) $$

No dominance (no_dominance):
$$\exists x_1 , x_2: F_A(x_1) < F_B(x_1) \land F_A(x_2) >F_B(x_2) $$
Average log payout magnitude (ave_ln_payout_magn):
$$\frac{1}{2} \Big[ log \Big(1 + 1/k_A \sum_{s=1}^{k_A} |x_s^A| \Big) + log \Big(1 + 1/k_B \sum_{s=1}^{k_B} |x_s^B| \Big) \Big]$$
Average log number of states (ave_ln_num_states_a):
$$\frac{log(1 + k_A) + log(1 + k_B)}{2}$$
Fraction of lotteries involving a loss (frac_involves_losses)
Compound (compound): 1 if one lottery in the choice is compound (according to the definition above)
Absolute expected value difference (abs_ev_diff):
$$|EV(A) - EV(B)|$$
Absolute expected value difference squared (abs_ev_diff_sq):
$$|EV(A) - EV(B)|^2$$
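
The CDF-based features above can be sketched in Python for discrete lotteries. This is an illustration of the formulas, not the repository's R or Stata implementation; the function names, the `(payout, probability)` list representation, and the tolerance `tol` are assumptions, and `log` is taken to be the natural logarithm, as the `ln_` prefix suggests.

```python
import math


def cdf(lottery, x):
    """F(x) = P(payout <= x) for a lottery given as [(payout, prob), ...]."""
    return sum(p for payout, p in lottery if payout <= x)


def expected_value(lottery):
    return sum(payout * p for payout, p in lottery)


def cdf_features(lot_a, lot_b, tol=1e-12):
    """Compute the CDF- and EV-based features for a pair of lotteries."""
    # Both CDFs are step functions that only change at payout values, so the
    # integral of |F_A - F_B| is a finite sum over consecutive payout pairs.
    grid = sorted({payout for payout, _ in lot_a + lot_b})
    gaps = [cdf(lot_a, x) - cdf(lot_b, x) for x in grid]
    area = sum(abs(g) * (hi - lo) for g, lo, hi in zip(gaps, grid, grid[1:]))
    abs_ev_diff = abs(expected_value(lot_a) - expected_value(lot_b))
    # The excess is non-negative in theory; clamp tiny rounding errors.
    excess = max(area - abs_ev_diff, 0.0)
    # No dominance: the CDFs cross, i.e. each lies strictly below the other
    # somewhere. Checking the payout grid suffices for step functions.
    crosses = any(g < -tol for g in gaps) and any(g > tol for g in gaps)
    return {
        "ln_excess_dissimilarity": math.log(1 + excess),
        "no_dominance": int(crosses),
        "abs_ev_diff": abs_ev_diff,
        "abs_ev_diff_sq": abs_ev_diff ** 2,
    }


# Problem 1 from the input table: A pays 10 or 5 with equal odds, B pays 3.
print(cdf_features([(10, 0.5), (5, 0.5)], [(3, 1.0)]))
# A 50/50 gamble over 10 and 0 against a sure 5: equal EVs, crossing CDFs.
print(cdf_features([(10, 0.5), (0, 0.5)], [(5, 1.0)]))
```

In the first call lottery A dominates B, so the excess dissimilarity is zero and no_dominance is 0; in the second call the CDFs cross, so no_dominance is 1.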

Features of `OAC` and `SAC`

As above but without abs_ev_diff and abs_ev_diff_sq features.

Lottery Complexity

If just one lottery is supplied, the lottery complexity (OLC/SLC) is calculated. In principle, the ordering of the output is the same as for the Choice Complexity output. However, payouts and probabilities are now named x_1, x_2, ..., p_1, p_2, .... Additionally, the features and indices do not carry the suffix _a, as only one lottery is in the dataset.

Features of `OLC_a/b` and `SLC_a/b`

The following features are defined for each lottery $j \in \{A, B\}$.

Log Variance (ln_variance_a/b):
$$log\Big( 1+ \sum_{s=1}^{k_j} p_s^j(x_s^j)^2 - \Big( \sum_{s=1}^{k_j} p_s^j x_s^j \Big)^2 \Big)$$
Log payout magnitude (ln_payout_magn_a/b):
$$log\Big( 1 + 1/k_j\sum_{s=1}^{k_j}|x_s^j| \Big)$$
Log number of states (ln_num_states_a/b):

$$ log \Big ( 1 + k_j \Big )$$

1 if involves loss (involves_loss_a/b)
1 if involves compound probability (compound)
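
The per-lottery features above can be sketched in Python as follows. This illustrates the formulas only and is not the repository's code; the function name `lottery_features` and the `(payout, probability)` representation are assumptions, `log` is taken to be the natural logarithm, and "involves loss" is assumed to mean that at least one payout is negative.

```python
import math


def lottery_features(lottery, compound=False):
    """Features of one lottery given as [(payout, prob), ...]."""
    payouts = [x for x, _ in lottery]
    k = len(lottery)  # number of states
    mean = sum(p * x for x, p in lottery)
    second_moment = sum(p * x ** 2 for x, p in lottery)
    # Variance as E[x^2] - (E[x])^2; clamp tiny negative rounding errors.
    variance = max(second_moment - mean ** 2, 0.0)
    return {
        "ln_variance": math.log(1 + variance),
        "ln_payout_magn": math.log(1 + sum(abs(x) for x in payouts) / k),
        "ln_num_states": math.log(1 + k),
        # Assumption: a loss is any strictly negative payout.
        "involves_loss": int(any(x < 0 for x in payouts)),
        "compound": int(compound),
    }


# Lottery A of problem 1 in the input table: 10 or 5 with equal odds.
print(lottery_features([(10, 0.5), (5, 0.5)]))
```

For this lottery the variance is 6.25 and the mean absolute payout is 7.5, so the three log features equal ln(7.25), ln(8.5), and ln(3).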

Running the R Script

The R script calc_index.R runs in the isolated environment managed by renv. To activate this environment, first make sure that R and the renv package are installed on your machine. In the R console, navigate to the project directory and run renv::restore() to automatically install all required packages from the renv.lock file. Calling renv::activate() should then be all you need to do before running the R code. The code needed to activate the package environment can be found at the beginning of calc_index.R, inside the first if block.

Run calc_index.R. Which indices are computed is determined automatically by the supplied data (see section Input).

Running the Stata Script

Set a working directory in global root = "".
To the best of our knowledge, we included all necessary ssc install commands at the beginning of the calc_index.do Stata script.
Run calc_index.do. Which indices are computed is determined automatically by the supplied data (see section Input).
