<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Adding-a-Property-to-wikirepo" data-toc-modified-id="Adding-a-Property-to-wikirepo-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Adding a Property to wikirepo</a></span><ul class="toc-item"><li><span><a href="#Adding-single-column-property" data-toc-modified-id="Adding-single-column-property-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Adding single column property</a></span></li><li><span><a href="#Adding-a-single-column-property-that-spans-time" data-toc-modified-id="Adding-a-single-column-property-that-spans-time-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Adding a single column property that spans time</a></span></li><li><span><a href="#Adding-a-multi-column-property" data-toc-modified-id="Adding-a-multi-column-property-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Adding a multi-column property</a></span></li></ul></li><li><span><a href="#Adding-a-Property-to-Wikidata" data-toc-modified-id="Adding-a-Property-to-Wikidata-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Adding a Property to Wikidata</a></span></li></ul></div>

**Adding Properties**

In this example we'll show how to add properties to wikirepo. See [examples/add_data]() for how to leverage Python libraries to add data to Wikidata.

Adding a property to wikirepo can be as simple as finding a wikirepo data module that queries a similar data structure, copying it to the appropriate data directory for the new property (see the note below), renaming the module to the name users should enter to query it, and assigning appropriate values to the variables that make up the module header: `pid`, `sub_pid`, `col_name`, `col_prefix`, `ignore_char` and `span`. To detail this fully, we're going to pretend that the following properties can't already be accessed by wikirepo:

- ['P1082' (population)](https://www.wikidata.org/wiki/Property:P1082)
- ['P6' (head of government)](https://www.wikidata.org/wiki/Property:P6)
- ['P172' (ethnicity)](https://www.wikidata.org/wiki/Property:P172)

The final modules for each of these can be found in [data/demographic/population](https://www.wikidata.org/wiki/Wikidata:Main_Page), [data/political/executive](https://www.wikidata.org/wiki/Wikidata:Main_Page), and [data/demographic/ethnic_div](https://www.wikidata.org/wiki/Wikidata:Main_Page) respectively. The focus will be how to add a property that already exists on [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page) to wikirepo, with later versions covering the process of adding a property to Wikidata as well.

**Note:** by "the appropriate data directory for the new property" we mean that a new module should go into the [wikirepo/data]() directory that matches a Wikidata sub-page. Sometimes data isn't on the location's page itself, but rather on a sub-page. For example, certain economic properties for [Germany](https://www.wikidata.org/wiki/Q183) are found on the page [economy of Germany](https://www.wikidata.org/wiki/Q8046). wikirepo checks for a property on the main page of a location first, and if the property is not found there, the package checks the sub-page associated with the module's directory (the user is notified that the property does not exist for the given location if it is found on neither page). Properties are often moved from main pages to sub-pages, so even modules for properties that currently sit on main pages need to be organized based on where they could be re-indexed. If worst comes to worst, put the module in [data/misc]().
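
The lookup order described in the note can be sketched with plain dictionaries. This is only an illustration of the main-page-then-sub-page fallback, not wikirepo's actual code: `find_prop`, the `Claims` alias, the placeholder PID `P0000` and the placeholder statements are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the claims on a page: PID -> list of statements.
Claims = Dict[str, List[dict]]


def find_prop(main_claims: Claims, sub_claims: Claims, pid: str) -> Optional[List[dict]]:
    """Return the statements for pid from the main page, else the sub-page, else None."""
    if pid in main_claims:
        return main_claims[pid]
    if pid in sub_claims:
        return sub_claims[pid]
    # Found on neither page: tell the user and return nothing.
    print(f"{pid} does not exist for the given location")
    return None


# Placeholder pages: P1082 sits on the main page, the made-up P0000 on the sub-page.
main_page = {"P1082": [{"note": "placeholder statement"}]}
sub_page = {"P0000": [{"note": "placeholder statement"}]}

on_main = find_prop(main_page, sub_page, "P1082")
on_sub = find_prop(main_page, sub_page, "P0000")
missing = find_prop(main_page, sub_page, "P9999")
```

Because the sub-page is only consulted when the main page has no match, a module keeps working in this sketch if its property is later moved from one page to the other.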

In [1]:
import os
import sys

analysis_dir = os.path.dirname(os.path.realpath('add_property.ipynb'))
analysis_dir = analysis_dir.split('wikirepo')[0] + 'wikirepo'
sys.path.insert(0, analysis_dir)
from wikirepo.data import time_utils, wd_utils

from IPython.display import display, HTML  # public API; importing display from IPython.core.display is deprecated
display(HTML("<style>.container { width:99% !important; }</style>"))

We'll use ['Q183' (Germany)](https://www.wikidata.org/wiki/Q183) for this example. First we'll initialize an `EntitiesDict` and the QID, and then we'll load in the entity:

In [2]:
ents_dict = wd_utils.EntitiesDict()
qid = 'Q183'

ent = wd_utils.load_ent(ents_dict=ents_dict, pq_id=qid)
ents_dict.key_lbls()

['Germany']

# Adding a Property to wikirepo

## Adding single column property

['P1082' (population)](https://www.wikidata.org/wiki/Property:P1082) is an example of a property that goes in a single column and has only one value at any given time.

Let's start by defining our property and checking an element of the population data for Germany:

In [3]:
pop_pid = 'P1082'

pop_0_entry = wd_utils.get_prop(ents_dict=ents_dict, qid=qid, pid=pop_pid)[0]
pop_0_entry

{'mainsnak': {'snaktype': 'value',
  'property': 'P1082',
  'datavalue': {'value': {'amount': '+80500000',
    'unit': '1',
    'upperBound': '+80500500',
    'lowerBound': '+80499500'},
   'type': 'quantity'},
  'datatype': 'quantity'},
 'type': 'statement',
 'qualifiers': {'P585': [{'snaktype': 'value',
    'property': 'P585',
    'hash': 'd071256bb4b9260491239bfad2cc561ad8bf870c',
    'datavalue': {'value': {'time': '+2012-12-31T00:00:00Z',
      'timezone': 0,
      'before': 0,
      'after': 0,
      'precision': 11,
      'calendarmodel': 'http://www.wikidata.org/entity/Q1985727'},
     'type': 'time'},
    'datatype': 'time'}]},
 'qualifiers-order': ['P585'],
 'id': 'Q183$3c493715-464a-6bad-ce6c-0f204e655157',
 'rank': 'normal',
 'references': [{'hash': '1112211b516d0ce090dfd3dd197bf7b7a4b88eaf',
   'snaks': {'P143': [{'snaktype': 'value',
      'property': 'P143',
      'datavalue': {'value': {'entity-type': 'item',
        'numeric-id': 764739,
        'id': 'Q764739'},
     

The main thing to notice here is that the qualifier ['P585' (point in time)](https://www.wikidata.org/wiki/Property:P585) is present. That, together with the fact that `pop_0_entry['mainsnak']['datavalue']['value']['amount']` is a single value, tells us that this property should go into a single column.

Let's check this value, as well as get its date:

In [4]:
pop_0_val = pop_0_entry['mainsnak']['datavalue']['value']['amount']
pop_0_val

'+80500000'

In [5]:
pop_0_t = pop_0_entry['qualifiers']['P585'][0]['datavalue']['value']['time']
pop_0_t

'+2012-12-31T00:00:00Z'

From that we see that we could have a character that needs to be ignored - specifically the `+`. We actually don't though, as wikirepo will convert this value to an integer, and Python's `int` accepts a leading sign: `int('+80500000')` gets rid of the `+` for us.
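
As a small illustration of why `ignore_char` can stay empty here, the sketch below converts the signed strings seen in this notebook and, for contrast, a string with a character that would have to be ignored. The helper `to_number`, its character-stripping behaviour and the `'~1200'` input are assumptions made for the example. They are not taken from wikirepo.

```python
def to_number(raw: str, ignore_char: str = ""):
    """Strip each character in ignore_char (assumed behaviour), then convert."""
    for ch in ignore_char:
        raw = raw.replace(ch, "")
    # int() and float() both accept a leading '+' or '-' sign.
    return float(raw) if "." in raw else int(raw)


pop_val = to_number("+80500000")                  # sign handled by int()
share_val = to_number("+0.915")                   # sign handled by float()
approx_val = to_number("~1200", ignore_char="~")  # made-up value that needs stripping
```

Only the last call needs a non-empty `ignore_char`, because `~` is not something `int` can parse.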

**Note:** wikirepo will also take care of the date for us. The package will first format the date, and then it will use a provided `time_lvl` variable's value to truncate this formatted `datetime.date` object to an appropriate level. Here's a quick demo of this assuming that the `time_lvl` of our query is `yearly`:

In [6]:
pop_0_t_formatted = wd_utils.format_t(pop_0_t)
pop_0_t_formatted

datetime.date(2012, 12, 31)

In [7]:
time_utils.truncate_date(d=pop_0_t_formatted, time_lvl='yearly')

'2012'

The value will be included in the results if the above year falls within the `timespan` passed to the query. If no `time_lvl` is passed, the full date is kept, the value is returned if it's the most recent one, and the date is appended as a string to document when the value comes from.
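
For readers who want to see the two date steps without wikirepo installed, here is a rough standard-library equivalent. The function names, the `monthly` and `daily` levels and their output formats are assumptions for illustration. Only the `yearly` result is confirmed by the output above, and the sketch does not handle lower-precision Wikidata dates whose month or day is `00`.

```python
from datetime import date, datetime
from typing import Optional

# Assumed mapping from a time level to a strftime pattern.
_FORMATS = {"yearly": "%Y", "monthly": "%Y-%m", "daily": "%Y-%m-%d"}


def parse_wikidata_time(t: str) -> date:
    """Parse a full-precision Wikidata time string such as '+2012-12-31T00:00:00Z'."""
    # lstrip('+') drops the leading sign before parsing.
    return datetime.strptime(t.lstrip("+"), "%Y-%m-%dT%H:%M:%SZ").date()


def truncate(d: date, time_lvl: Optional[str] = None) -> str:
    """Truncate a date to the given level, or keep the full date if none is given."""
    if time_lvl is None:
        return d.isoformat()
    return d.strftime(_FORMATS[time_lvl])


parsed = parse_wikidata_time("+2012-12-31T00:00:00Z")
year_label = truncate(parsed, "yearly")
full_label = truncate(parsed)
```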

Final notes on the property module: the value in question can be accessed directly instead of through another property, so this tells us that we have no need for the `sub_pid` variable (more on this later); as the value goes into one column, we use the `col_name` variable instead of `col_prefix` (more on this later as well); and the value occurs at only one time, so we keep the `span` variable as `False` (more on this later too).

We now have all the information needed to make the **population** module's header:

In [8]:
pid = 'P1082'
sub_pid = None
col_name = 'population'
col_prefix = None
ignore_char = ''
span = False

The final module can again be found in [data/demographic/population](https://www.wikidata.org/wiki/Wikidata:Main_Page).

## Adding a single column property that spans time

An executive via ['P6' (head of government)](https://www.wikidata.org/wiki/Property:P6) is an example of a property that goes in a single column but whose values hold over a span of time.

Let's start again by defining the pid and loading in an entry:

In [9]:
exec_pid = 'P6'

exec_0_entry = wd_utils.get_prop(ents_dict=ents_dict, qid=qid, pid=exec_pid)[0]
exec_0_entry

{'mainsnak': {'snaktype': 'value',
  'property': 'P6',
  'datavalue': {'value': {'entity-type': 'item',
    'numeric-id': 567,
    'id': 'Q567'},
   'type': 'wikibase-entityid'},
  'datatype': 'wikibase-item'},
 'type': 'statement',
 'qualifiers': {'P580': [{'snaktype': 'value',
    'property': 'P580',
    'hash': 'ad8007db4be39b05f62a2bf5821d32c5464bb183',
    'datavalue': {'value': {'time': '+2005-11-22T00:00:00Z',
      'timezone': 0,
      'before': 0,
      'after': 0,
      'precision': 11,
      'calendarmodel': 'http://www.wikidata.org/entity/Q1985727'},
     'type': 'time'},
    'datatype': 'time'}]},
 'qualifiers-order': ['P580'],
 'id': 'q183$d0db3461-4291-0b36-4092-c40d14699212',
 'rank': 'preferred',
 'references': [{'hash': '7c5619b50f5af5766a660bda2eb09605dee4df72',
   'snaks': {'P143': [{'snaktype': 'value',
      'property': 'P143',
      'datavalue': {'value': {'entity-type': 'item',
        'numeric-id': 317027,
        'id': 'Q317027'},
       'type': 'wikibase-enti

First, we can see that the value in question cannot be read directly from the entry, as it is itself an entity with a QID. wikirepo will access the entity for us and derive its label, but let's find out who it is:

In [10]:
exec_0_qid = exec_0_entry['mainsnak']['datavalue']['value']['id']
exec_0_qid

'Q567'

In [11]:
wd_utils.get_lbl(ents_dict=ents_dict, pq_id=exec_0_qid)

'Angela Merkel'

That this value covers a span can be seen from the fact that it does not have ['P585' (point in time)](https://www.wikidata.org/wiki/Property:P585), but rather ['P580' (start time)](https://www.wikidata.org/wiki/Property:P580). Values of this property can also have the qualifier ['P582' (end time)](https://www.wikidata.org/wiki/Property:P582).

**Note:** wikirepo assumes that an entity that has a start time and lacks an end time is the current subject for the property, so the latest date in the `timespan` argument for query functions will be used as its end time. The opposite holds if an end time is present without a start time: the first date in the `timespan` will be used, on the assumption that this is the first subject of the property.
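
The qualifier check and the open-ended span assumptions can be sketched as follows. Both helpers are hypothetical, the entries are trimmed-down stand-ins for the statements shown in this notebook, and the `timespan` bounds are arbitrary example dates.

```python
from datetime import date
from typing import Optional, Tuple


def is_span(entry: dict) -> bool:
    """A statement is treated as a span if it has a start or end time but no point in time."""
    quals = entry.get("qualifiers", {})
    return "P585" not in quals and ("P580" in quals or "P582" in quals)


def resolve_span(
    start: Optional[date], end: Optional[date], timespan: Tuple[date, date]
) -> Tuple[date, date]:
    """Fill a missing start or end with the first or last date of the timespan."""
    first, last = min(timespan), max(timespan)
    return (start if start is not None else first, end if end is not None else last)


# Trimmed stand-ins: only the qualifier keys matter for the check.
exec_entry = {"qualifiers": {"P580": [{}]}}
pop_entry = {"qualifiers": {"P585": [{}]}}

timespan = (date(2000, 1, 1), date(2020, 1, 1))  # arbitrary example bounds
current = resolve_span(date(2005, 11, 22), None, timespan)
earliest = resolve_span(None, date(2005, 11, 22), timespan)
```

With only a start time, the span runs to the end of the `timespan`. With only an end time, it runs from the beginning.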

Having values or subjects with start and end times implies that the `span` variable for the module header should in this case be `True`. We are still putting our results in a single column, so we use `col_name` instead of `col_prefix` (this is covered in the next section), and we can again ignore the `sub_pid` variable (also covered in the next section).

From this we have all the information we need for the **executive** module's header:

In [12]:
pid = 'P6'
sub_pid = None
col_name = 'executive'
col_prefix = None
ignore_char = ''
span = True

The resulting module can again be found in [data/political/executive](https://www.wikidata.org/wiki/Wikidata:Main_Page).

## Adding a multi-column property 

Ethnic diversity via ['P172' (ethnicity)](https://www.wikidata.org/wiki/Property:P172) is an example of a property that should be split over multiple columns. Rather than put all the information into a single column for the user to then split, wikirepo gives each potential element its own column, named with a shared prefix, to hold that element's data.

Let's look at the first element of German ethnicity:

In [13]:
ethnic_div_pid = 'P172'

ethnic_div_0_entry = wd_utils.get_prop(ents_dict=ents_dict, qid=qid, pid=ethnic_div_pid)[0]
ethnic_div_0_entry

{'mainsnak': {'snaktype': 'value',
  'property': 'P172',
  'datavalue': {'value': {'entity-type': 'item',
    'numeric-id': 42884,
    'id': 'Q42884'},
   'type': 'wikibase-entityid'},
  'datatype': 'wikibase-item'},
 'type': 'statement',
 'qualifiers': {'P1107': [{'snaktype': 'value',
    'property': 'P1107',
    'hash': '17752ff515cf871f1b4e82ae5ee4e1cea61556ff',
    'datavalue': {'value': {'amount': '+0.915', 'unit': '1'},
     'type': 'quantity'},
    'datatype': 'quantity'}]},
 'qualifiers-order': ['P1107'],
 'id': 'Q183$c3db8ed3-4346-945d-75d7-de9ff7181e83',
 'rank': 'normal',
 'references': [{'hash': '35ad938ca5a2b12719ee2b3fbe70f8bf27e77284',
   'snaks': {'P248': [{'snaktype': 'value',
      'property': 'P248',
      'datavalue': {'value': {'entity-type': 'item',
        'numeric-id': 11191,
        'id': 'Q11191'},
       'type': 'wikibase-entityid'},
      'datatype': 'wikibase-item'}],
    'P813': [{'snaktype': 'value',
      'property': 'P813',
      'datavalue': {'value': 

Each entry for this property is an entity, and its numeric value is stored under a sub PID (a qualifier). As before, let's check the QIDs and labels of a couple of entries:

In [14]:
ethnic_div_0_qid = ethnic_div_0_entry['mainsnak']['datavalue']['value']['id']
ethnic_div_0_qid

'Q42884'

In [15]:
ethnic_div_0_lbl = wd_utils.get_lbl(ents_dict=ents_dict, pq_id=ethnic_div_0_qid)
ethnic_div_0_lbl

'Germans'

In [16]:
ethnic_div_1_entry = wd_utils.get_prop(ents_dict=ents_dict, qid=qid, pid=ethnic_div_pid)[1]
ethnic_div_1_qid = ethnic_div_1_entry['mainsnak']['datavalue']['value']['id']
ethnic_div_1_lbl = wd_utils.get_lbl(ents_dict=ents_dict, pq_id=ethnic_div_1_qid)
ethnic_div_1_lbl

'Turks'

The value itself needs to be accessed through the qualifier ['P1107' (proportion)](https://www.wikidata.org/wiki/Property:P1107). wikirepo will do this for us, but let's extract the first value anyway:

In [17]:
ethnic_div_0_val = ethnic_div_0_entry['qualifiers']['P1107'][0]['datavalue']['value']['amount']
ethnic_div_0_val

'+0.915'

For this property we thus need to use a `sub_pid` variable that tells wikirepo where to look for the value. 
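
To make the role of `sub_pid` concrete, here is a minimal sketch of the two access paths used in this notebook: directly through the main snak when `sub_pid` is `None`, and through a qualifier otherwise. `extract_amount` is a hypothetical helper, and the entries are trimmed copies of the statements printed above.

```python
from typing import Optional


def extract_amount(entry: dict, sub_pid: Optional[str] = None) -> str:
    """Read the amount from the main snak, or from the sub_pid qualifier if one is given."""
    if sub_pid is None:
        return entry["mainsnak"]["datavalue"]["value"]["amount"]
    return entry["qualifiers"][sub_pid][0]["datavalue"]["value"]["amount"]


# Trimmed copies of the population and ethnicity statements for Germany.
pop_entry = {"mainsnak": {"datavalue": {"value": {"amount": "+80500000"}}}}
eth_entry = {
    "mainsnak": {"datavalue": {"value": {"id": "Q42884"}}},
    "qualifiers": {"P1107": [{"datavalue": {"value": {"amount": "+0.915"}}}]},
}

pop_amount = extract_amount(pop_entry)           # no sub_pid needed
eth_amount = extract_amount(eth_entry, "P1107")  # value lives in the qualifier
```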

**Note:** another use of `sub_pid` is to set its value to `bool`. This tells wikirepo to assign `True` if the property is present. An example of this is [data/institutional/org_membership]() where a boolean value is assigned to columns based on whether a location is a member of an organization at a given time. Values of `False` need to be filled afterwards, and some values are replaced for organizations that are widely known. This is thus an example of a property that requires a bit more work than simply setting the module header.

Continuing, as we want the values to be put into separate columns where the QID labels for the entries get prefixed, we need to use the `col_prefix` variable and set the `col_name` variable to `None`. Let's choose `eth` for `col_prefix`, meaning that columns produced will be `eth_germans`, `eth_turks`, etc. (an underscore is added automatically). To complete the needed information, the values themselves are only present at individual times, so in this case we can set `span` to `False`.
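
The column naming can be pictured with a small sketch. `make_col_name` is a hypothetical helper. Lowercasing follows from the `eth_germans` example above, while replacing spaces with underscores is an assumption about how multi-word labels might be handled.

```python
def make_col_name(col_prefix: str, label: str) -> str:
    """Join the prefix and a lowercased label with an underscore."""
    return f"{col_prefix}_{label.lower().replace(' ', '_')}"


# Labels retrieved earlier in this notebook.
cols = [make_col_name("eth", lbl) for lbl in ["Germans", "Turks"]]
```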

From here we have the full information for the header of the **ethnic_div** module:

In [18]:
pid = 'P172'
sub_pid = 'P1107'
col_name = None
col_prefix = 'eth'
ignore_char = ''
span = False

The final version of this module can be found in [data/demographic/ethnic_div](https://www.wikidata.org/wiki/Wikidata:Main_Page).
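
As a recap, the three headers built in this notebook can be compared side by side. The `ModuleHeader` dataclass below is purely illustrative: wikirepo modules define these values as plain module-level variables, and the check that exactly one of `col_name` and `col_prefix` is set is an assumption drawn from the examples above. The `bool` use of `sub_pid` mentioned earlier is not covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModuleHeader:
    pid: str
    sub_pid: Optional[str] = None
    col_name: Optional[str] = None
    col_prefix: Optional[str] = None
    ignore_char: str = ""
    span: bool = False

    def __post_init__(self):
        # Assumed rule: a module fills either one named column or several prefixed ones.
        if (self.col_name is None) == (self.col_prefix is None):
            raise ValueError("set exactly one of col_name or col_prefix")


population = ModuleHeader(pid="P1082", col_name="population")
executive = ModuleHeader(pid="P6", col_name="executive", span=True)
ethnic_div = ModuleHeader(pid="P172", sub_pid="P1107", col_prefix="eth")
```

Seen this way, the three cases differ in only a few fields: `span` for the executive, and `sub_pid` plus `col_prefix` for ethnic diversity.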

# Adding a Property to Wikidata