# Tutorial: CSV2JSON

The _CSV2JSON_ class is used to transform metadata from CSV into a JSON format that can be used in further analysis.

Author: Andreas Lüschow

Last updated: 2021/07/28

-----

## Import

Import the appropriate class from __Bibliometa__:

In [1]:
from bibliometa.conversion import CSV2JSON

As you can see from the following output, the _CSV2JSON_ class has a lot of attributes and methods:

In [2]:
dir(CSV2JSON)

['__abstractmethods__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__slots__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_abc_impl',
 '_save_results',
 '_update_config',
 'get_config',
 'set_config',
 'start']

We are only interested in the public methods, so let's have a look at them:

In [3]:
[m for m in dir(CSV2JSON) if not m.startswith('_')]

['get_config', 'set_config', 'start']

Using the _CSV2JSON_ class is quite simple: there are two methods for working with the class configuration and one method that actually starts the conversion process.

-----

## Configuration

Most __Bibliometa__ classes come with a predefined configuration for their class attributes. You can see the default configuration by calling the _get_config()_ function on a new _CSV2JSON_ object. So let's create an object first:

In [4]:
c2j = CSV2JSON()

### Input, output, year range

And now let's have a look at the default configuration values:

In [5]:
c2j.get_config()

('i', None)
('o', None)
('from_', None)
('to', None)
('step', 10)
('fields', [{'content': ('515', 'a'), 'type': ('515', '0'), 'categories': ['actv']}, {'content': ('350', 'a'), 'type': ('350', '0'), 'categories': ['acti']}])
('subfield_sep', '$')
('split_char', ' ### ')
('csv_sep', '\t')
('datefield', ('340', 'x'))
('date_indicator', ['0', '1'])
('interval_lower', 10)
('interval_upper', 10)
('log', None)
('log_level_std', 'INFO')
('log_level_file', 'DEBUG')
('verbose', False)
('encoding', 'utf-8')

As a shortcut, you can also simply evaluate the object itself, whose representation lists the configuration values:

In [6]:
c2j

('i', None)
('o', None)
('from_', None)
('to', None)
('step', 10)
('fields', [{'content': ('515', 'a'), 'type': ('515', '0'), 'categories': ['actv']}, {'content': ('350', 'a'), 'type': ('350', '0'), 'categories': ['acti']}])
('subfield_sep', '$')
('split_char', ' ### ')
('csv_sep', '\t')
('datefield', ('340', 'x'))
('date_indicator', ['0', '1'])
('interval_lower', 10)
('interval_upper', 10)
('log', None)
('log_level_std', 'INFO')
('log_level_file', 'DEBUG')
('verbose', False)
('encoding', 'utf-8')

There are a lot of configuration options, so let's go through them step by step.

__i__ (str): Input CSV file
* Path to the CSV file that will be converted.

__o__ (str): Output JSON file
* Path to the JSON file that will be created. If the path contains folders that do not exist yet, they will be created during the conversion process.

**from_** (int): Year where conversion starts
* Data from the input file are processed year by year. Only those data sets that fall within a certain interval are considered in the conversion process. The *from_* parameter (mind the trailing underscore!) defines the "starting year", i.e., the earliest year that is considered in the conversion process.

__to__ (int): Year where conversion ends
* This is the last year that is considered in the conversion process.

__step__ (int): Interval between two years
* Using this parameter you can define how many "year slices" the conversion will produce. For example, let's assume that *from_* == 1750 and _to_ == 1850. Setting _step_ to 10 would create one JSON file each for 1750, 1760, 1770, ... up to 1850. Setting _step_ to 25 would create JSON files for 1750, 1775, 1800, 1825, and 1850. If you need only a single year, set both *from_* and _to_ to the same year; _step_ has no effect then. However, _step_ must always be greater than zero; otherwise an error is raised.
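
The arithmetic behind the year slices can be sketched with Python's built-in `range`. This is only an illustration, not Bibliometa's actual implementation: the helper name `year_slices` is hypothetical, and the sketch makes no claim about how the library treats a _to_ value that is not reached exactly by _step_.

```python
def year_slices(from_, to, step):
    """Return the conversion years from `from_` up to `to` (inclusive)."""
    if step <= 0:
        # The tutorial states that step must be greater than zero.
        raise ValueError("step must be greater than zero")
    return list(range(from_, to + 1, step))


print(year_slices(1750, 1850, 25))  # [1750, 1775, 1800, 1825, 1850]
print(year_slices(1800, 1800, 10))  # [1800]
```

With *from_* equal to _to_, the result contains a single year regardless of _step_, which matches the remark above.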

At this point, let's try to change a configuration parameter using the _set_config()_ function. After each function call the current configuration is printed out automatically to check if your changes worked as expected:

In [7]:
c2j.set_config(from_=500)

('i', None)
('o', None)
('from_', 500)
('to', None)
('step', 10)
('fields', [{'content': ('515', 'a'), 'type': ('515', '0'), 'categories': ['actv']}, {'content': ('350', 'a'), 'type': ('350', '0'), 'categories': ['acti']}])
('subfield_sep', '$')
('split_char', ' ### ')
('csv_sep', '\t')
('datefield', ('340', 'x'))
('date_indicator', ['0', '1'])
('interval_lower', 10)
('interval_upper', 10)
('log', None)
('log_level_std', 'INFO')
('log_level_file', 'DEBUG')
('verbose', False)
('encoding', 'utf-8')

Calling the _set_config()_ function with keyword arguments allows you to change the configuration parameters according to your needs. Since it is a bit cumbersome to find individual parameters in the output above, you can also pass one or more parameter names as strings to the _get_config()_ function to check specific configuration parameters:

In [8]:
c2j.get_config("from_")

('from_', 500)

In [9]:
c2j.get_config("from_", "to", "step") 

('from_', 500)
('to', None)
('step', 10)

As you can see, working with configuration parameters is quite simple.

In [10]:
c2j.get_config("i")

('i', None)

In [11]:
c2j.set_config(i="../data/my_own_data.csv")
c2j.get_config("i")

('i', '../data/my_own_data.csv')

Actually, if you know the parameter you want to change, you can also set and get configuration parameters using a dot notation. This is the preferred way if you need to change or access only a single parameter value, since the output does not include the parameter key itself:

In [12]:
c2j.config.i = "../data/my_very_own_data.csv"
c2j.config.i

'../data/my_very_own_data.csv'

However, if you need to change or access more than one configuration parameter, using the _set_config()_ and _get_config()_ functions is the way to go.

### Field definitions

But for now let's go back to explaining the remaining configuration parameters.

In [13]:
c2j.get_config()

('i', '../data/my_very_own_data.csv')
('o', None)
('from_', 500)
('to', None)
('step', 10)
('fields', [{'content': ('515', 'a'), 'type': ('515', '0'), 'categories': ['actv']}, {'content': ('350', 'a'), 'type': ('350', '0'), 'categories': ['acti']}])
('subfield_sep', '$')
('split_char', ' ### ')
('csv_sep', '\t')
('datefield', ('340', 'x'))
('date_indicator', ['0', '1'])
('interval_lower', 10)
('interval_upper', 10)
('log', None)
('log_level_std', 'INFO')
('log_level_file', 'DEBUG')
('verbose', False)
('encoding', 'utf-8')

__fields__ (list of dict): Fields and subfields to consider
* This parameter defines which columns from the input data will be converted to JSON. Each field/subfield combination needs to be represented in a single dictionary with the keys _content_, _type_, and _categories_:

    `{'content': ('515', 'a'),
    'type': ('515', '0'),
    'categories': ['actv']}`
    
    These three keys need to have a unique representation in the input CSV. In the example above, this means that we need a column containing values from field 515, subfield a, another column with values from field 515, subfield 0, and that content in field 515, subfield 0, has to be identical to a value from the "categories" list to be considered for conversion.
    
__subfield_sep__ (str): Separator between fields and subfields
* This separator is used to combine field and subfield values to a single string. For example, if your input CSV contains columns such as "515\\$a" and "515\\$0", the dollar sign \\$ is your subfield separator.

__split_char__ (str): Character between values in cells
* The character sequence that separates different values in the same CSV cell.

__csv_sep__ (str): CSV separator
* The separator that is used between single CSV fields (usually something like "\t" or "," or ";").
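
To illustrate how the field/subfield tuples in _fields_ relate to CSV column names via _subfield_sep_, here is a minimal sketch. The helper `column_name` and the variable names are hypothetical; they only mirror the naming scheme described above and are not part of Bibliometa.

```python
default_fields = [
    {"content": ("515", "a"), "type": ("515", "0"), "categories": ["actv"]},
    {"content": ("350", "a"), "type": ("350", "0"), "categories": ["acti"]},
]


def column_name(field_tuple, subfield_sep="$"):
    """Join a (field, subfield) tuple into a column name such as '515$a'."""
    field, subfield = field_tuple
    return f"{field}{subfield_sep}{subfield}"


# One (content column, type column) pair per field definition.
columns = [(column_name(f["content"]), column_name(f["type"])) for f in default_fields]
print(columns)  # [('515$a', '515$0'), ('350$a', '350$0')]
```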

To understand the explanations above, let's have a look at the example input CSV file that comes with the tutorial (using standard Python code and the pandas library):

In [14]:
import pandas as pd

path = "../data/examples/demo.csv"  # This will later be our input file (parameter "i")
df = pd.read_csv(path, sep="\t")  # There's the CSV separator
df.head()

Unnamed: 0,id,name,515$a,515$0,515$z,350$0,350$a,340$x
0,cnp01448516,Laynborgh ### Ewald ### de ###,Schillingen ###,actv ###,,acti ###,Pastor ###,
1,cnp01449196,Rijfrock de Grymalscheit ### Johann ###,Wijs ###,actv ###,,acti ###,Pfarrer ###,
2,cnp01449439,Sauerborn ### Ludwig ###,Koblenz ###,actv ###,,,,
3,cnp01448445,Nikolaus ###,Ehrang ###,actv ###,,acti ###,Pastor ###,
4,cnp01449826,Boppard ### Reinhard ### von ###,Mosbach ###,actv ###,,acti ###,Pfarrer ###,


You can see nine columns:
* index column of the Pandas DataFrame (which has no name)
* id 
* name
* 515\\$a (mind the dollar sign as subfield separator!)
* 515\\$0
* 515\\$z
* 350\\$0
* 350\\$a
* 340\\$x

You can also see that multiple values in a cell are separated by the string " ### ", which is our _split_char_ parameter.

In [15]:
c2j.get_config("split_char")

('split_char', ' ### ')
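
Splitting a cell into its individual values can be sketched with plain string operations. The sketch assumes that raw cell values end with the separator and that the resulting empty fragment is simply dropped; whether Bibliometa handles this in exactly the same way is an assumption, and `split_cell` is a hypothetical helper.

```python
def split_cell(cell, split_char=" ### "):
    """Split a CSV cell into its values and drop empty fragments."""
    return [value.strip() for value in cell.split(split_char) if value.strip()]


print(split_cell("Barcelona ### Cervera ### Vic ### "))  # ['Barcelona', 'Cervera', 'Vic']
print(split_cell("Koblenz ### "))                        # ['Koblenz']
```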

The _fields_ parameter in our configuration looks as follows:

In [16]:
c2j.get_config("fields")

('fields', [{'content': ('515', 'a'), 'type': ('515', '0'), 'categories': ['actv']}, {'content': ('350', 'a'), 'type': ('350', '0'), 'categories': ['acti']}])

Looking at the example CSV, this means that values in column "515\\$a" ("content") are considered during the conversion process only if the corresponding value in column "515\\$0" ("type") has the value "actv" ("categories").

The following row would thus be ignored (it contains only "brth" and "deat" values in column "515\\$0"):

In [17]:
df[df["id"] == "cnp01300387"]

Unnamed: 0,id,name,515$a,515$0,515$z,350$0,350$a,340$x
12,cnp01300387,Randon ### Claudius ###,Pontoise ### Rom ###,brth ### deat ###,,acti ###,Künstler ###,340 01$8ger$aum 1674 - nach 1704$xa1674a1704 ###


However, this row would be converted:

In [18]:
df[df["id"] == "cnp02161976"]

Unnamed: 0,id,name,515$a,515$0,515$z,350$0,350$a,340$x
26,cnp02161976,Vila ### Joan ### 1515?-1597,Barcelona ### Cervera ### Vic ###,actv ### actv ### deat ###,,acti ### prof ### prof ### prof ### prof ### p...,Teologia ### Bisbes ### Canonges ### Catedràti...,340 01$8cat$a1515?-1597$xa1515a1597 ###


To be more precise, only the two values "Barcelona" and "Cervera" from column "515\\$a" would be considered in the conversion, since they are the only ones with a corresponding "actv" value in column "515\\$0".
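
The selection described above can be sketched as follows, under the assumption that content and type values are matched by their position within the cell. The function `select_values` is hypothetical and not Bibliometa's own code.

```python
def select_values(content_cell, type_cell, categories, split_char=" ### "):
    """Keep content values whose type value at the same position is an accepted category."""
    contents = [v.strip() for v in content_cell.split(split_char)]
    types = [v.strip() for v in type_cell.split(split_char)]
    return [c for c, t in zip(contents, types) if c and t in categories]


# Row cnp02161976: only the values paired with "actv" survive.
print(select_values("Barcelona ### Cervera ### Vic ### ",
                    "actv ### actv ### deat ### ", ["actv"]))
# ['Barcelona', 'Cervera']

# Row cnp01300387: no "actv" value, so nothing is selected.
print(select_values("Pontoise ### Rom ### ", "brth ### deat ### ", ["actv"]))
# []
```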

To be even more precise, this row from the CSV file would only be considered during conversion if a year between 1515 and 1597 is used in the conversion. How can we know? For this, we have to look at two other configuration parameters, _datefield_ and _date_indicator_.

### Dates

In [19]:
c2j.get_config()

('i', '../data/my_very_own_data.csv')
('o', None)
('from_', 500)
('to', None)
('step', 10)
('fields', [{'content': ('515', 'a'), 'type': ('515', '0'), 'categories': ['actv']}, {'content': ('350', 'a'), 'type': ('350', '0'), 'categories': ['acti']}])
('subfield_sep', '$')
('split_char', ' ### ')
('csv_sep', '\t')
('datefield', ('340', 'x'))
('date_indicator', ['0', '1'])
('interval_lower', 10)
('interval_upper', 10)
('log', None)
('log_level_std', 'INFO')
('log_level_file', 'DEBUG')
('verbose', False)
('encoding', 'utf-8')

__datefield__ (tuple): Field and subfield of date information
* The example configuration defines that date information for a row can be found in column "340\\$x".

__date_indicator__ (list of str): Indicators in datefield that are accepted
* Two values are possible: "0" means that biographical dates are considered; "1" means that activity dates are considered.

__interval_lower__ (int): Lower interval for single dates
* If only a single year value is available in the date column, this parameter defines how many years below the current conversion year that value may lie for the data set to still be considered for conversion (see below for an example).

__interval_upper__ (int): Upper interval for single dates
* See the explanation for the previous parameter; here, the upper bound is defined.

You can see the date indicator at position 5 in the datefield column after the field number 340 and a space:

In [20]:
df[df["340$x"].notna()]["340$x"]

5                        340 11$8ger$a12. Jahrhundert ### 
7                                   340 11$8ger$a1161 ### 
8                        340 11$8ger$a13. Jahrhundert ### 
11                 340 01$8cat$a1803-1883$xa1803a1883 ### 
12       340 01$8ger$aum 1674 - nach 1704$xa1674a1704 ### 
                               ...                        
73638                   340 11$8ger$a1734$xa1734u     ### 
73639    340 01$8ger$a10.08.1775-14.11.1830$xa1775a1830...
73640    340 01$8ger$a29.09.1803-10.04.1872$xa1803a1872...
73641    340 01$8ger$a25.12.1821-09.05.1874$xa1821a1874...
73642              340 01$8ger$a1487-1558$xa1487a1558 ### 
Name: 340$x, Length: 72915, dtype: object

You can also see the values of subfield "\\$x". For example, in the row with index 11 this subfield has the value "a1803a1883", which means that the person represented in this data set lived from 1803 to 1883. Hence, this row would only be considered during conversion if a year between 1803 and 1883 is used in the configuration. (This would be the case if we set *from_* == 1800 and _to_ == 1825, or *from_* == 1700 and _to_ == 1850, or even *from_* == 1883 and _to_ == 1884, etc.)
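
Reading the date indicator and the years from a single entry of the date column could look roughly like this. The sketch relies only on what is visible in the output above (indicator at position 5, years as digit groups after the last "$x"); the meaning of the letters around the years is not interpreted, and `parse_date_entry` is a hypothetical helper rather than a Bibliometa function.

```python
import re


def parse_date_entry(entry):
    """Return (indicator, years) for one entry of the date column."""
    indicator = entry[4]  # position 5 when counting from 1
    # Everything after the last "$x" holds the normalized years.
    _, sep, x_value = entry.rpartition("$x")
    years = [int(y) for y in re.findall(r"\d+", x_value)] if sep else []
    return indicator, years


print(parse_date_entry("340 01$8cat$a1803-1883$xa1803a1883"))  # ('0', [1803, 1883])
print(parse_date_entry("340 11$8ger$a1734$xa1734u    "))       # ('1', [1734])
print(parse_date_entry("340 11$8ger$a12. Jahrhundert"))        # ('1', [])
```

Entries without a "$x" subfield yield an empty list of years, so they cannot match any conversion year in this sketch.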

In the row with index 73638, only one year is available in subfield "\\$x": 1734, which is the beginning of an activity (because the date indicator is "1" in this row). In such cases, the data set is only considered for conversion if "1" is included in the parameter _date_indicator_ and if the value in subfield "\\$x" is within the interval defined by the current conversion year and the _interval_lower_ and _interval_upper_ parameters.

The row with index 73638 would hence be considered in the following example cases:
* *from_* == 1700, _to_ == 1800, _step_ == 1, _interval_lower_ == 0, _interval_upper_ == 0 (because _step_ == 1, which means that every single year between 1700 and 1800 is considered)
* *from_* == 1700, _to_ == 1800, _step_ == 10,  _interval_lower_ == 5, _interval_upper_ == 5 (because 1734 is within the interval 1730 (+/-5 years))
* *from_* == 1730, _to_ == 1740, _step_ == 2,  _interval_lower_ == 0, _interval_upper_ == 0 (because 1734 is already considered by using a _step_ == 2 from 1730 to 1740)
* *from_* == 1700, _to_ == 1750, _step_ == 25,  _interval_lower_ == 0, _interval_upper_ == 10 (because 1734 is within the upper 10-year-range of year 1725)
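
The four cases above can be checked with a short sketch. It assumes that the interval around each conversion year is inclusive on both ends; `matching_years` is a hypothetical helper and not part of Bibliometa.

```python
def matching_years(single_year, from_, to, step, interval_lower, interval_upper):
    """Return the conversion years whose interval contains `single_year`."""
    return [
        year
        for year in range(from_, to + 1, step)
        if year - interval_lower <= single_year <= year + interval_upper
    ]


print(matching_years(1734, 1700, 1800, 1, 0, 0))    # [1734]
print(matching_years(1734, 1700, 1800, 10, 5, 5))   # [1730]
print(matching_years(1734, 1730, 1740, 2, 0, 0))    # [1734]
print(matching_years(1734, 1700, 1750, 25, 0, 10))  # [1725]
```

By contrast, `matching_years(1734, 1700, 1800, 10, 0, 0)` returns an empty list, because no conversion year falls exactly on 1734 and both intervals are zero.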

### Logging, encoding, verbose

There are only a couple of configuration parameters left, so let's have a look at them.

In [21]:
c2j.get_config()

('i', '../data/my_very_own_data.csv')
('o', None)
('from_', 500)
('to', None)
('step', 10)
('fields', [{'content': ('515', 'a'), 'type': ('515', '0'), 'categories': ['actv']}, {'content': ('350', 'a'), 'type': ('350', '0'), 'categories': ['acti']}])
('subfield_sep', '$')
('split_char', ' ### ')
('csv_sep', '\t')
('datefield', ('340', 'x'))
('date_indicator', ['0', '1'])
('interval_lower', 10)
('interval_upper', 10)
('log', None)
('log_level_std', 'INFO')
('log_level_file', 'DEBUG')
('verbose', False)
('encoding', 'utf-8')

__log__ (str): Path to log file
* The conversion process and its errors are documented in a log file. If _verbose_ == True (see below), the logging information is also shown on standard output if its level is _log_level_std_ or above.

__log_level_std__ (str): Logging level considered for standard output
* Only log messages with this level (or above) are shown on the standard output. This parameter has no effect if _verbose_ == False. Possible severity levels can be found in the documentation of the logging package `loguru`: https://loguru.readthedocs.io/en/stable/api/logger.html

__log_level_file__ (str): Logging level considered for log file
* Only log messages with this level (or above) are shown in the log file.

__verbose__ (bool): Show detailed information on standard output
* Whether logging information is not only written to the log file but also shown on the standard output.

__encoding__ (str): File encoding
* File encoding of input and output files. The default value is "utf-8" and there is usually no need to change this.
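
The idea of separate severity thresholds for standard output and the log file can be illustrated with Python's standard `logging` module. Note that this is a stand-in: according to the description above, Bibliometa uses `loguru`, so the sketch only demonstrates the threshold concept and not the library's actual logging setup. Both targets are in-memory streams so that the example is self-contained.

```python
import io
import logging

std_stream = io.StringIO()   # stands in for standard output
file_stream = io.StringIO()  # stands in for the log file

logger = logging.getLogger("csv2json_threshold_demo")
logger.setLevel(logging.DEBUG)  # let the handlers decide what to keep
logger.propagate = False

std_handler = logging.StreamHandler(std_stream)
std_handler.setLevel("INFO")    # analogous to log_level_std
file_handler = logging.StreamHandler(file_stream)
file_handler.setLevel("DEBUG")  # analogous to log_level_file

logger.addHandler(std_handler)
logger.addHandler(file_handler)

logger.debug("detail message")
logger.info("summary message")

print(std_stream.getvalue().splitlines())   # ['summary message']
print(file_stream.getvalue().splitlines())  # ['detail message', 'summary message']
```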

-----

## The Conversion Process

Configuration parameters can already be passed when a _CSV2JSON_ object is constructed. For some classes in __Bibliometa__ this may be useful when a verbose output of the class initialization is desired. (However, in the case of the class _CSV2JSON_ there is nothing that can be shown, so passing the configuration parameters during initialization or afterwards makes no difference.)

We start with creating a new _CSV2JSON_ object that has the standard configuration values:

In [22]:
c2j = CSV2JSON()
c2j

('i', None)
('o', None)
('from_', None)
('to', None)
('step', 10)
('fields', [{'content': ('515', 'a'), 'type': ('515', '0'), 'categories': ['actv']}, {'content': ('350', 'a'), 'type': ('350', '0'), 'categories': ['acti']}])
('subfield_sep', '$')
('split_char', ' ### ')
('csv_sep', '\t')
('datefield', ('340', 'x'))
('date_indicator', ['0', '1'])
('interval_lower', 10)
('interval_upper', 10)
('log', None)
('log_level_std', 'INFO')
('log_level_file', 'DEBUG')
('verbose', False)
('encoding', 'utf-8')

In the next step we define our custom configuration parameters as Python constants for better code readability and maintenance:

In [23]:
INPUT_FILE = "../data/examples/demo.csv"
OUTPUT_FILE = "../data/output/json/demo.json"
LOGFILE = "../data/logs/csv2json_demo.out"
YEARS = (1700, 1750)
STEP = 10
FIELDS = [
    {"content": ("515", "a"),
     "type": ("515", "0"),
     "categories": ["actv"]}
]
DATE_INDICATOR = ["1"]  # i.e., only activity dates are considered
DATEFIELD = ("340", "x")
INTERVALS = (5, 5)

Now we can update our _CSV2JSON_ object accordingly.

In [24]:
c2j.set_config(i=INPUT_FILE,
               o=OUTPUT_FILE,
               log=LOGFILE,
               from_=YEARS[0],
               to=YEARS[1],
               step=STEP,
               fields=FIELDS,
               date_indicator=DATE_INDICATOR,
               datefield=DATEFIELD,
               interval_lower=INTERVALS[0],
               interval_upper=INTERVALS[1],
              )

('i', '../data/examples/demo.csv')
('o', '../data/output/json/demo.json')
('from_', 1700)
('to', 1750)
('step', 10)
('fields', [{'content': ('515', 'a'), 'type': ('515', '0'), 'categories': ['actv']}])
('subfield_sep', '$')
('split_char', ' ### ')
('csv_sep', '\t')
('datefield', ('340', 'x'))
('date_indicator', ['1'])
('interval_lower', 5)
('interval_upper', 5)
('log', '../data/logs/csv2json_demo.out')
('log_level_std', 'INFO')
('log_level_file', 'DEBUG')
('verbose', False)
('encoding', 'utf-8')

Finally, we start the conversion process by calling the _start()_ function. This function does not take any parameters.

A progress bar for each year will indicate the conversion progress. Since we have defined *from_* == 1700, _to_ == 1750 and _step_ == 10, we will get six progress bars (one for 1700, 1710, 1720, 1730, 1740, and 1750 each).

In [25]:
c2j.start()

  0%|          | 0/73 [00:00<?, ?it/s]

  0%|          | 0/73 [00:00<?, ?it/s]

  0%|          | 0/73 [00:00<?, ?it/s]

  0%|          | 0/73 [00:00<?, ?it/s]

  0%|          | 0/73 [00:00<?, ?it/s]

  0%|          | 0/73 [00:00<?, ?it/s]

Let's have a look at the produced files, starting with the log file.

In [26]:
with open(c2j.config.log, "r", encoding=c2j.config.encoding) as f:
    log_text = f.read().splitlines()

log_text

['2021-07-28T15:39:23.667319+0200 INFO Start conversion from CSV to JSON.',
 '2021-07-28T15:39:24.373324+0200 INFO Size of import data: 73644 data sets',
 '2021-07-28T15:39:35.424939+0200 INFO Size of output data for year 1700 (+5/-5 years): 1164 data sets (1.58 %) ',
 '2021-07-28T15:39:46.224414+0200 INFO Size of output data for year 1710 (+5/-5 years): 1293 data sets (1.76 %) ',
 '2021-07-28T15:39:57.036569+0200 INFO Size of output data for year 1720 (+5/-5 years): 1320 data sets (1.79 %) ',
 '2021-07-28T15:40:07.707243+0200 INFO Size of output data for year 1730 (+5/-5 years): 1244 data sets (1.69 %) ',
 '2021-07-28T15:40:18.479628+0200 INFO Size of output data for year 1740 (+5/-5 years): 1274 data sets (1.73 %) ',
 '2021-07-28T15:40:30.357313+0200 INFO Size of output data for year 1750 (+5/-5 years): 1274 data sets (1.73 %) ']

Note that new content is always appended to the log file; the file is neither deleted nor cleared when you start a new conversion process.
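
Because the log lines shown above follow a regular pattern, the sizes per year can be extracted with a regular expression, for example to recompute the reported percentages. The patterns below are derived only from the lines in this tutorial and may not cover other messages the library writes.

```python
import re

log_lines = [
    "2021-07-28T15:39:24.373324+0200 INFO Size of import data: 73644 data sets",
    "2021-07-28T15:39:35.424939+0200 INFO Size of output data for year 1700 (+5/-5 years): 1164 data sets (1.58 %) ",
    "2021-07-28T15:39:46.224414+0200 INFO Size of output data for year 1710 (+5/-5 years): 1293 data sets (1.76 %) ",
]

total = None
sizes = {}
for line in log_lines:
    match = re.search(r"Size of import data: (\d+) data sets", line)
    if match:
        total = int(match.group(1))
    match = re.search(r"for year (\d+) .*?: (\d+) data sets", line)
    if match:
        sizes[int(match.group(1))] = int(match.group(2))

for year, size in sizes.items():
    print(year, size, f"{100 * size / total:.2f} %")
# 1700 1164 1.58 %
# 1710 1293 1.76 %
```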

Now let's see which JSON files were produced:

In [27]:
import os

os.listdir(os.path.dirname(OUTPUT_FILE))

['demo_1710.json',
 'demo_1720.json',
 'demo_1740.json',
 'demo_1750.json',
 'demo_1730.json',
 'demo_1700.json']

As you can see, the appropriate year was appended to the file name defined in OUTPUT_FILE for each conversion step.
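
The naming scheme visible in the listing can be reproduced with `os.path.splitext`. This sketch only mirrors the observed file names; `yearly_output_path` is a hypothetical helper, and how Bibliometa builds the names internally is not shown in this tutorial.

```python
import os


def yearly_output_path(output_file, year):
    """Insert '_<year>' between the file name and its extension."""
    root, ext = os.path.splitext(output_file)
    return f"{root}_{year}{ext}"


paths = [yearly_output_path("../data/output/json/demo.json", y)
         for y in range(1700, 1751, 10)]
print(paths[0])    # ../data/output/json/demo_1700.json
print(len(paths))  # 6
```

`os.listdir` returns entries in arbitrary order, which is why the listing above is unsorted; wrapping the call in `sorted()` yields chronological order for four-digit years.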

What does the content of the JSON files look like?

In [28]:
import json

with open(os.path.join(os.path.dirname(OUTPUT_FILE), 'demo_1710.json'), "r", encoding=c2j.config.encoding) as f:
    d = json.load(f)
    
for i in list(d.items())[:5]:
    print(i)

('cnp01287518', {'515': {'a': ['Breslau']}})
('cnp01417964', {'515': {'a': ['Altorf']}})
('cnp01418304', {'515': {'a': ['Kiel']}})
('cnp01289912', {'515': {'a': ['Venedig']}})
('cnp01415306', {'515': {'a': ['Halle']}})


Looking at the first 5 entries in the JSON file, you can see that for each data set in the input file (represented by its ID) that fulfills the requirements (i.e., an appropriate date is given in the date column and field types are as defined), a JSON element is created that represents field, subfield and content values.

We can check this by looking at the original input data:

In [29]:
df[df["id"] == "cnp01287518"]["340$x"].values

array(['340 01$8ger$a1678-$xa1678u     ### 340 11$8ger$a1678-1750$xa1678a1750 ### '],
      dtype=object)

There are two dates available for this data set. The first one is ignored in the conversion process because the date indicator is "0" (and we set our configuration to only allow the date indicator "1"). However, the second date for this data set has the correct date indicator and, as can be seen in the subfield \\$x, the person represented in this data set was active between 1678 and 1750. Since the JSON file we looked at was created for the year 1710 and this year is within the 1678--1750 range, the data set was converted correctly.
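
The reasoning above can be replayed on the raw cell value. The following sketch checks whether any entry with an accepted date indicator spans a given year; it deliberately ignores entries with only a single year, and `covers_year` is a hypothetical helper, not the library's implementation.

```python
import re

CELL = "340 01$8ger$a1678-$xa1678u     ### 340 11$8ger$a1678-1750$xa1678a1750 ### "


def covers_year(cell, year, date_indicator=("1",), split_char=" ### "):
    """Return True if an accepted date entry in `cell` spans `year`."""
    for entry in cell.split(split_char):
        # Skip empty fragments, unwanted indicators, and entries without "$x".
        if len(entry) < 5 or entry[4] not in date_indicator or "$x" not in entry:
            continue
        years = [int(y) for y in re.findall(r"\d+", entry.rpartition("$x")[2])]
        if len(years) == 2 and years[0] <= year <= years[1]:
            return True
    return False


print(all(covers_year(CELL, y) for y in range(1700, 1751, 10)))  # True
print(covers_year(CELL, 1751))                                   # False
```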

Moreover, this person should be in ALL our six JSON files created, because all years between 1700 and 1750 are within the activity years of this person. Let's check this:

In [30]:
for file in os.listdir(os.path.dirname(OUTPUT_FILE)):
    with open(os.path.join(os.path.dirname(OUTPUT_FILE), file), "r", encoding=c2j.config.encoding) as f:
        d = json.load(f)
    print(file, d["cnp01287518"])

demo_1710.json {'515': {'a': ['Breslau']}}
demo_1720.json {'515': {'a': ['Breslau']}}
demo_1740.json {'515': {'a': ['Breslau']}}
demo_1750.json {'515': {'a': ['Breslau']}}
demo_1730.json {'515': {'a': ['Breslau']}}
demo_1700.json {'515': {'a': ['Breslau']}}


Indeed, the data for person "cnp01287518" is available in all JSON files.

So let's check if the person is correctly ignored when our conversion years are not within the 1678--1750 range. To test this, we have to run a new conversion with other configuration options. We can, however, reuse our previous configuration and replace only the year parameters. Everything else stays the same.

In [31]:
c2j.config.from_ = 1751
c2j.config.to = 1751

Running the conversion is as simple as before:

In [32]:
c2j.start()

  0%|          | 0/73 [00:00<?, ?it/s]

Again we look at the produced JSON file:

In [33]:
with open(os.path.join(os.path.dirname(OUTPUT_FILE), 'demo_1751.json'), "r", encoding=c2j.config.encoding) as f:
    d = json.load(f)
    
for i in list(d.items())[:5]:
    print(i)    

('cnp01414461', {'515': {'a': ['Dannenbüttel']}})
('cnp01414528', {'515': {'a': ['Salzwedel']}})
('cnp01414634', {'515': {'a': ['Halle', 'Pulsnitz']}})
('cnp01415266', {'515': {'a': ['Rodleben', 'Roßlau']}})
('cnp02047759', {'515': {'a': ['Coswig']}})


It seems the conversion was successful. But is person "cnp01287518" available?

In [34]:
person_id = "cnp01287518"
if person_id not in d.keys():
    raise KeyError(f"Data set with ID {person_id} is not in the data!")

KeyError: 'Data set with ID cnp01287518 is not in the data!'

This shows that the data set with ID "cnp01287518" was ignored in the conversion process because the dates given in column "340\\$x" are not within the years defined in the conversion configuration.

-----