## Introduction

Similar to Ponge et al. (2021) [1], our algorithm can be configured to improve the efficiency of population synthesis. Ponge et al. propose setting certain cells to 0 in the contingency tables used by the Iterative Proportional Fitting (IPF) algorithm.

![image.png](attachment:image.png)

Our algorithm reads its configuration from a JSON file, which is described below.

### JSON config file

The JSON is a dictionary with two obligatory keys, `no_ipf` and `ipf_forced_attributes`:

```
{
    "no_ipf": "SOME VALUE",
    "ipf_forced_attributes": "SOME VALUE"
}
```

The config file works only if both keys are provided.
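The check for both keys could look roughly like the sketch below. This is only an illustration: the function name `parse_config` and the constant `REQUIRED_KEYS` are hypothetical and are not taken from the algorithm's code.

```python
import json

# Hypothetical constant: the two obligatory keys described above.
REQUIRED_KEYS = ("no_ipf", "ipf_forced_attributes")


def parse_config(text):
    """Parse JSON text and check that both obligatory keys are present."""
    config = json.loads(text)
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON dictionary")
    absent = [key for key in REQUIRED_KEYS if key not in config]
    if absent:
        raise ValueError(f"config lacks obligatory keys: {absent}")
    return config


config = parse_config('{"no_ipf": "missing", "ipf_forced_attributes": "missing"}')
```

A config with only one of the two keys raises `ValueError` in this sketch.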

### Key `ipf_forced_attributes`

The key `ipf_forced_attributes` works similarly to setting seed cells to 0 in Ponge et al. (2021) [1]: it enforces certain attribute combinations. It can be configured in two ways:

**Option 1 - missing**

```
{
    "no_ipf": "SOME VALUE",
    "ipf_forced_attributes": "missing"
}
```

If the value is `"missing"`, the configuration is disabled (i.e. no rules are enforced in the algorithm).

**Option 2 - configuration is added**

```
{
    "no_ipf": "SOME VALUE",
    "ipf_forced_attributes": 
    [
        {
            "if": 
            {
                "attribute1": [attribute_values...]
            },
            "then":
            {
                "attribute2": [attribute_values...]
            }
        }
    ]
}
```

The value of `"ipf_forced_attributes"` should be a list of dictionaries, each with the keys `"if"` and `"then"`. The value of each of these keys is itself a dictionary that maps an attribute name to a list of attribute values.

**How does it work?**
This config forces certain attribute combinations: **if** `attribute1` takes one of the listed values, **then** `attribute2` can only take values from the given list.
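As a minimal sketch of this if/then logic, the code below sets the weight of every seed cell that violates a rule to 0, in the spirit of the seed-cell approach mentioned above. It rests on assumptions: the seed is represented as a plain dictionary keyed by tuples of `(attribute, value)` pairs, the helper names `violates` and `apply_forced_attributes` are hypothetical, and the value `"Married"` is made up for the illustration. The actual algorithm may apply the rules differently.

```python
def violates(cell, rule):
    """True if the cell meets every "if" condition but breaks a "then" condition."""
    meets_if = all(cell.get(attr) in values for attr, values in rule["if"].items())
    breaks_then = any(
        attr in cell and cell[attr] not in values
        for attr, values in rule["then"].items()
    )
    return meets_if and breaks_then


def apply_forced_attributes(seed, rules):
    """Return a copy of the seed in which rule-violating cells have weight 0."""
    if rules == "missing":  # configuration disabled
        return dict(seed)
    return {
        cell: 0.0 if any(violates(dict(cell), rule) for rule in rules) else weight
        for cell, weight in seed.items()
    }


rules = [{"if": {"age": [5, 10]}, "then": {"maritial_status": ["Not_married"]}}]
# Assumed seed layout: one entry per attribute combination, all weights 1.0.
seed = {
    (("age", 5), ("maritial_status", "Married")): 1.0,
    (("age", 5), ("maritial_status", "Not_married")): 1.0,
    (("age", 30), ("maritial_status", "Married")): 1.0,
}
forced = apply_forced_attributes(seed, rules)
```

Only the cell that combines age 5 with `"Married"` is zeroed here; the other two cells keep their weight.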

### Key `no_ipf`

The key `no_ipf` addresses the problem of missing marginal distributions for some attributes. For example, if the contingency table `age_sex` covers ages 0-100 but the census table `income` gives the income distribution only for people aged 15-64, we should set the income to *missing* for people aged 0-14 and 65-100.  
This key can be configured in two ways:

**Option 1 - missing**

```
{
    "no_ipf": "missing",
    "ipf_forced_attributes": "SOME VALUE"
}
```

If the value is `"missing"`, the configuration is disabled (i.e. no values are set to missing).

**Option 2 - configuration is added**

```
{
    "ipf_forced_attributes": "SOME VALUE",
    "no_ipf": 
    [
        {
            "if": 
            {
                "attribute1": [attribute_values...]
            },
            "then":
            {
                "attribute2": "missing"
            }
        }
    ]
}
```

The value of `"no_ipf"` should be a list of dictionaries, each with the keys `"if"` and `"then"`. The value of `"if"` is a dictionary that maps an attribute name to a list of attribute values. The value of `"then"` is a dictionary that maps an attribute name to `"missing"`.

**How does it work?**
This config sets certain attributes to missing: **if** `attribute1` takes one of the listed values, **then** `attribute2` is missing.
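The effect of a `no_ipf` rule on a single synthetic person could be sketched as follows. This assumes a person is represented as a dictionary of attribute values; the function name `apply_no_ipf` and the value `"Retail"` are hypothetical and serve only as illustration.

```python
def apply_no_ipf(person, rules):
    """Return a copy of the person with rule-selected attributes set to "missing"."""
    result = dict(person)
    if rules == "missing":  # configuration disabled
        return result
    for rule in rules:
        # The rule applies only if every "if" attribute has one of the listed values.
        if all(person.get(attr) in values for attr, values in rule["if"].items()):
            for attr in rule["then"]:
                result[attr] = "missing"
    return result


rules = [{"if": {"age": [80, 90, 100]}, "then": {"work_industry": "missing"}}]
older = apply_no_ipf({"age": 80, "work_industry": "Retail"}, rules)
younger = apply_no_ipf({"age": 40, "work_industry": "Retail"}, rules)
```

In this sketch only the 80-year-old has `work_industry` replaced by `"missing"`; the 40-year-old is returned unchanged.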

### Example 1

1a. All people aged 5, 10, 15 or 20 will have marital status `"Not_married"`.

1b. All females will have attribute `appearance` set to `"Beautiful"` or `"Very Beautiful"`.  

2a. If the person is 80, 90 or 100 years old, their work industry will be `missing`.

```
{
    "ipf_forced_attributes": 
    [
        {
            "if": 
            {
                "age": [5, 10, 15, 20]
            },
            "then":
            {
                "maritial_status": ["Not_married"]
            }
        },
        {
            "if": 
            {
                "sex": ["F"]
            },
            "then":
            {
                "appearance": ["Beautiful", "Very Beautiful"]
            }
        }
    ],
    "no_ipf": 
    [
        {
            "if": 
            {
                "age": [80, 90, 100]
            },
            "then":
            {
                "work_industry": "missing"
            }
        }
    ]
}
```

### Example 2

No rules are provided, so the algorithm works as if the optional argument `config_file` had not been provided.

```
{
    "no_ipf": "missing",
    "ipf_forced_attributes": "missing"
}
```

### Reference

[1] Ponge, J., Enbergs, M., Schüngel, M., Hellingrath, B., Karch, A., & Ludwig, S. (2021, December). Generating synthetic populations based on German census data. In *2021 Winter Simulation Conference (WSC)* (pp. 1-12). IEEE.