In [1]:
%env KIF_DEBUG=
%env KIF_RESOLVE_ENTITIES=1

env: KIF_DEBUG=
env: KIF_RESOLVE_ENTITIES=1


# Quickstart

KIF is a Wikidata-based framework for integrating knowledge sources.

This quickstart guide presents the basic API of KIF.

----

## Hello world!

We start by importing the `kif_lib` namespace:

In [2]:
from kif_lib import *

We'll also need the Wikidata vocabulary module `wd`:

In [3]:
from kif_lib.vocabulary import wd

Let us now create a KIF store pointing to the official Wikidata query service:

In [4]:
kb = Store('wdqs', 'https://query.wikidata.org/sparql')

The first argument to the `Store` constructor determines the type of store to be created.

(The value `'wdqs'` instructs KIF to create a SPARQL store loaded with Wikidata mappings and optimized for querying the official Wikidata query service.)

In general, a KIF store can be seen as an interface to a knowledge source: it allows us to view the source as a set of [Wikidata-like statements](https://www.mediawiki.org/wiki/Wikibase/DataModel).

The `kb` store we just created is an interface to Wikidata itself.  We can use it, for example, to fetch from Wikidata three statements about Brazil:

In [5]:
it = kb.filter(subject=wd.Brazil, limit=3)
for stmt in it:
    display(stmt)

(**Statement** (**Item** [Brazil](http://www.wikidata.org/entity/Q155)) (**ValueSnak** **LabelProperty** "Бразилиа"@ab))

(**Statement** (**Item** [Brazil](http://www.wikidata.org/entity/Q155)) (**ValueSnak** **LabelProperty** "Brasil"@ace))

(**Statement** (**Item** [Brazil](http://www.wikidata.org/entity/Q155)) (**ValueSnak** **LabelProperty** "Бразилие"@ady))

## Filters

The `kb.filter(...)` call searches for statements in `kb` matching the restrictions `...`.

The result of a filter call is a (lazy) iterator `it` of statements:

In [6]:
it = kb.filter(subject=wd.Brazil)

We can advance `it` to obtain statements:

In [7]:
next(it)

(**Statement** (**Item** [Brazil](http://www.wikidata.org/entity/Q155)) (**ValueSnak** **AliasProperty** "Repubblica Federativa del Brasile"@it))

If no `limit` argument is given to `kb.filter()`, the returned iterator contains *all* matching statements.
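The lazy behavior described above can be illustrated without KIF at all. The following is a minimal sketch using only the Python standard library; `fake_filter` and the `statement-N` strings are hypothetical stand-ins, not part of KIF's API, and the toy source is unbounded whereas a real store yields a finite set of matches.

```python
from itertools import count, islice

def fake_filter(limit=None):
    """Hypothetical stand-in for a lazy filter call (not KIF's API)."""
    # An unbounded lazy source: no item exists until it is requested.
    source = (f'statement-{n}' for n in count())
    # With a limit, the iterator stops after that many items.
    return source if limit is None else islice(source, limit)

it = fake_filter()               # nothing has been produced yet
first = next(it)                 # produces exactly one item
next_two = list(islice(it, 2))   # produces two more, then stops consuming
limited = list(fake_filter(limit=3))

print(first)     # statement-0
print(next_two)  # ['statement-1', 'statement-2']
print(limited)   # ['statement-0', 'statement-1', 'statement-2']
```

The same pattern applies to any iterator: when the number of matches may be large, passing `limit` or wrapping the iterator in `itertools.islice` avoids consuming more results than needed.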

## Basic filters

We can filter statements by any combination of *subject*, *property*, and *value*.

For example:

*match any statement*

In [8]:
next(kb.filter())

(**Statement** (**Lexeme** [wd:L58445](http://www.wikidata.org/entity/L58445)) (**ValueSnak** **LexicalCategoryProperty** (**Item** [verb](http://www.wikidata.org/entity/Q24905))))

*match statements with subject "Brazil" and property "official website"*

In [9]:
next(kb.filter(subject=wd.Brazil, property=wd.official_website))

(**Statement** (**Item** [Brazil](http://www.wikidata.org/entity/Q155)) (**ValueSnak** (**Property** [official website](http://www.wikidata.org/entity/P856)) [https://www.gov.br](https://www.gov.br)))

*match statements with property "official website" and value "https://www.ibm.com/"*

In [10]:
next(kb.filter(property=wd.official_website, value='https://www.ibm.com/'))

(**Statement** (**Item** [IBM](http://www.wikidata.org/entity/Q37156)) (**ValueSnak** (**Property** [official website](http://www.wikidata.org/entity/P856)) [https://www.ibm.com/](https://www.ibm.com/)))

*match statements with value "78.046950192 dalton"*

In [11]:
next(kb.filter(value=Quantity('78.046950192', unit=wd.dalton)))

(**Statement** (**Item** [Cyclopropane, tris(methylene)-](http://www.wikidata.org/entity/Q83057204)) (**ValueSnak** (**Property** [mass](http://www.wikidata.org/entity/P2067)) (**Quantity** 78.046950192 (**Item** [dalton](http://www.wikidata.org/entity/Q483261)))))

We can also match statements having *some* (unknown) value:

In [12]:
next(kb.filter(snak=wd.date_of_birth.some_value()))

(**Statement** (**Item** [Emperor Gengshi of Han](http://www.wikidata.org/entity/Q7262)) (**SomeValueSnak** (**Property** [date of birth](http://www.wikidata.org/entity/P569))))

Or *no* value:

In [13]:
next(kb.filter(snak=wd.date_of_death.no_value()))

(**Statement** (**Item** [wd:Q123038537](http://www.wikidata.org/entity/Q123038537)) (**NoValueSnak** (**Property** [date of death](http://www.wikidata.org/entity/P570))))
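To make the matching rule concrete, here is a small in-memory sketch of filtering by any combination of subject, property, and value, where an omitted component acts as a wildcard. It is an illustration only: `Triple`, `TRIPLES`, and `toy_filter` are hypothetical names, the data is made up for the example, and the sketch ignores some-value and no-value snaks. It says nothing about how KIF evaluates filters internally.

```python
from typing import Iterator, NamedTuple, Optional

class Triple(NamedTuple):
    subject: str
    property: str
    value: str

# Hypothetical toy data; a real store would query its knowledge source.
TRIPLES = [
    Triple('Brazil', 'continent', 'South America'),
    Triple('Brazil', 'official website', 'https://www.gov.br'),
    Triple('IBM', 'official website', 'https://www.ibm.com/'),
]

def toy_filter(
    subject: Optional[str] = None,
    property: Optional[str] = None,
    value: Optional[str] = None,
) -> Iterator[Triple]:
    """Lazily yield triples matching every component that is not None."""
    pattern = (subject, property, value)
    for triple in TRIPLES:
        if all(want is None or want == got
               for want, got in zip(pattern, triple)):
            yield triple

# No restrictions: every triple matches.
any_match = next(toy_filter())
# Restrict subject and property, leave value open.
brazil_site = list(toy_filter(subject='Brazil', property='official website'))
# Restrict property and value, leave subject open.
ibm = list(toy_filter(property='official website',
                      value='https://www.ibm.com/'))
```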

## Fingerprints (indirect ids)

So far, we have been using the symbolic aliases defined in the `wd` module to specify entities in filters:

In [14]:
display(
    wd.Brazil,
    wd.continent)

(**Item** [Brazil](http://www.wikidata.org/entity/Q155))

(**Property** [continent](http://www.wikidata.org/entity/P30))

Alternatively, we can use their numeric Wikidata ids:

*match statements with subject Q155 (Brazil) and property P30 (continent)*

In [15]:
next(kb.filter(subject=wd.Q(155), property=wd.P(30)))

(**Statement** (**Item** [Brazil](http://www.wikidata.org/entity/Q155)) (**ValueSnak** (**Property** [continent](http://www.wikidata.org/entity/P30)) (**Item** [South America](http://www.wikidata.org/entity/Q18))))

Sometimes, however, ids are not enough.  We might need to specify an entity indirectly by giving not its id but a property it satisfies.

In cases like this, we can use a *fingerprint*:

*match statements whose subject "is a cat" and value "is a human"*

In [16]:
next(kb.filter(subject=wd.instance_of(wd.house_cat), value=wd.instance_of(wd.human)))

(**Statement** (**Item** [Johannes Gutenberg](http://www.wikidata.org/entity/Q130704354)) (**ValueSnak** (**Property** [named after](http://www.wikidata.org/entity/P138)) (**Item** [Johannes Gutenberg](http://www.wikidata.org/entity/Q8958))))

Properties themselves can also be specified using fingerprints:

*match statements whose property is "equivalent to Schema.org's 'weight'"*

In [17]:
next(kb.filter(property=wd.equivalent_property('https://schema.org/weight')))

(**Statement** (**Item** [Roger Noble Burnham](http://www.wikidata.org/entity/Q52156214)) (**NoValueSnak** (**Property** [mass](http://www.wikidata.org/entity/P2067))))

The `-` (minus) operator can be used to invert the direction of the property used in the fingerprint:

*match statements whose subject is "the continent of Brazil"*

In [18]:
next(kb.filter(subject=-(wd.continent(wd.Brazil))))

(**Statement** (**Item** [South America](http://www.wikidata.org/entity/Q18)) (**ValueSnak** **AliasProperty** "América meridional"@es))
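One way to think about a fingerprint is as a description that selects a set of entities instead of naming a single id. The sketch below models that idea over a tiny in-memory set of triples. The names `having`, `inverse_having`, and `toy_filter`, as well as the data, are hypothetical and chosen for illustration; this is not how KIF represents or evaluates fingerprints.

```python
# Hypothetical toy data as (subject, property, value) triples.
TRIPLES = {
    ('Brazil', 'continent', 'South America'),
    ('Argentina', 'continent', 'South America'),
    ('South America', 'part of', 'Americas'),
    ('Tom', 'instance of', 'house cat'),
}

def having(prop, value):
    """Forward direction: entities x such that (x, prop, value) holds."""
    return {s for (s, p, v) in TRIPLES if p == prop and v == value}

def inverse_having(prop, subject):
    """Inverted direction: entities x such that (subject, prop, x) holds."""
    return {v for (s, p, v) in TRIPLES if p == prop and s == subject}

def toy_filter(subjects):
    """Triples whose subject belongs to the described set of entities."""
    return sorted(t for t in TRIPLES if t[0] in subjects)

# "Entities whose continent is South America".
in_south_america = having('continent', 'South America')
# "The continent of Brazil": the same property, read in the other direction.
continent_of_brazil = inverse_having('continent', 'Brazil')

print(sorted(in_south_america))         # ['Argentina', 'Brazil']
print(toy_filter(continent_of_brazil))  # [('South America', 'part of', 'Americas')]
```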

## And-ing and or-ing fingerprints

Entity ids and fingerprints can be combined using the operators `&` (and) and `|` (or).

For example:

*match four statements such that:*
- *subject is "Brazil" or "Argentina"*
- *property is "continent" or "highest point"*

In [19]:
it = kb.filter(subject=wd.Brazil | wd.Argentina, property=wd.continent | wd.highest_point, limit=4)
display(*it)

(**Statement** (**Item** [Argentina](http://www.wikidata.org/entity/Q414)) (**ValueSnak** (**Property** [continent](http://www.wikidata.org/entity/P30)) (**Item** [South America](http://www.wikidata.org/entity/Q18))))

(**Statement** (**Item** [Brazil](http://www.wikidata.org/entity/Q155)) (**ValueSnak** (**Property** [highest point](http://www.wikidata.org/entity/P610)) (**Item** [Pico da Neblina](http://www.wikidata.org/entity/Q739484))))

(**Statement** (**Item** [Brazil](http://www.wikidata.org/entity/Q155)) (**ValueSnak** (**Property** [continent](http://www.wikidata.org/entity/P30)) (**Item** [South America](http://www.wikidata.org/entity/Q18))))

(**Statement** (**Item** [Argentina](http://www.wikidata.org/entity/Q414)) (**ValueSnak** (**Property** [highest point](http://www.wikidata.org/entity/P610)) (**Item** [Aconcagua](http://www.wikidata.org/entity/Q39739))))

*match four statements such that:*
- *subject "continent is South America" and "official language is Portuguese"*
- *value "is a river" or "is a mountain"*

In [20]:
it = kb.filter(
    subject=wd.continent(wd.South_America) & wd.official_language(wd.Portuguese),
    value=wd.instance_of(wd.river) | wd.instance_of(wd.mountain),
    limit=4)
display(*it)

(**Statement** (**Item** [Brazil](http://www.wikidata.org/entity/Q155)) (**ValueSnak** (**Property** [located in or next to body of water](http://www.wikidata.org/entity/P206)) (**Item** [Amazon](http://www.wikidata.org/entity/Q3783))))

(**Statement** (**Item** [Brazil](http://www.wikidata.org/entity/Q155)) (**ValueSnak** (**Property** [located in or next to body of water](http://www.wikidata.org/entity/P206)) (**Item** [Paraná River](http://www.wikidata.org/entity/Q127892))))

(**Statement** (**Item** [Brazil](http://www.wikidata.org/entity/Q155)) (**ValueSnak** (**Property** [located in or next to body of water](http://www.wikidata.org/entity/P206)) (**Item** [São Francisco River](http://www.wikidata.org/entity/Q142148))))

(**Statement** (**Item** [Brazil](http://www.wikidata.org/entity/Q155)) (**ValueSnak** (**Property** [highest point](http://www.wikidata.org/entity/P610)) (**Item** [Pico da Neblina](http://www.wikidata.org/entity/Q739484))))

*match four statements such that:*
- *subject "is a female" and ("was born in NYC" or "was born in Rio")*
- *property is "field of work" or "is equivalent to Schema.org's 'hasOccupation'"*

In [21]:
it = kb.filter(
    subject=wd.sex_or_gender(wd.female) & (wd.place_of_birth(wd.New_York_City) | wd.place_of_birth(wd.Rio_de_Janeiro)),
    property=wd.field_of_work | wd.equivalent_property('https://schema.org/hasOccupation'),
    limit=4)
display(*it)

(**Statement** (**Item** [Tanya Lopert](http://www.wikidata.org/entity/Q82746)) (**ValueSnak** (**Property** [occupation](http://www.wikidata.org/entity/P106)) (**Item** [film actor](http://www.wikidata.org/entity/Q10800557))))

(**Statement** (**Item** [Tanya Lopert](http://www.wikidata.org/entity/Q82746)) (**ValueSnak** (**Property** [occupation](http://www.wikidata.org/entity/P106)) (**Item** [actor](http://www.wikidata.org/entity/Q33999))))

(**Statement** (**Item** [Tanya Lopert](http://www.wikidata.org/entity/Q82746)) (**ValueSnak** (**Property** [occupation](http://www.wikidata.org/entity/P106)) (**Item** [stage actor](http://www.wikidata.org/entity/Q2259451))))

(**Statement** (**Item** [Nancy Chodorow](http://www.wikidata.org/entity/Q598648)) (**ValueSnak** (**Property** [field of work](http://www.wikidata.org/entity/P101)) (**Item** [gender studies](http://www.wikidata.org/entity/Q1662673))))
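The combination semantics can be sketched with ordinary Python operator overloading. In the toy model below, a fingerprint is a predicate over entity names, `&` builds a conjunction, and `|` builds a disjunction. `ToyFingerprint`, `has`, and the `FACTS` data are hypothetical and exist only for this illustration; KIF's own fingerprint classes are not shown here.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical toy facts as (subject, property, value) triples.
FACTS = {
    ('Ana', 'sex or gender', 'female'), ('Ana', 'place of birth', 'Rio de Janeiro'),
    ('Beth', 'sex or gender', 'female'), ('Beth', 'place of birth', 'New York City'),
    ('Carl', 'sex or gender', 'male'), ('Carl', 'place of birth', 'New York City'),
    ('Dana', 'sex or gender', 'female'), ('Dana', 'place of birth', 'Paris'),
    ('Edu', 'sex or gender', 'male'), ('Edu', 'place of birth', 'Rio de Janeiro'),
}

@dataclass(frozen=True)
class ToyFingerprint:
    test: Callable[[str], bool]  # True if the entity satisfies the description

    def __and__(self, other: 'ToyFingerprint') -> 'ToyFingerprint':
        return ToyFingerprint(lambda e: self.test(e) and other.test(e))

    def __or__(self, other: 'ToyFingerprint') -> 'ToyFingerprint':
        return ToyFingerprint(lambda e: self.test(e) or other.test(e))

def has(prop: str, value: str) -> ToyFingerprint:
    """Describe the entities e for which (e, prop, value) is a known fact."""
    return ToyFingerprint(lambda e: (e, prop, value) in FACTS)

def select(fp: ToyFingerprint) -> list:
    """All known subjects satisfying the fingerprint, in sorted order."""
    return sorted({s for (s, _, _) in FACTS if fp.test(s)})

female = has('sex or gender', 'female')
nyc = has('place of birth', 'New York City')
rio = has('place of birth', 'Rio de Janeiro')

grouped = select(female & (nyc | rio))  # female and (NYC or Rio)
ungrouped = select(female & nyc | rio)  # parsed as (female and NYC) or Rio

print(grouped)    # ['Ana', 'Beth']
print(ungrouped)  # ['Ana', 'Beth', 'Edu']
```

Because `&` binds more tightly than `|` in Python, the parentheses around the or-ed alternatives matter, which is why the filter on "female" and place of birth groups the two birthplaces explicitly.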

## Count and contains

A variant of the filter call is `kb.count()`, which returns the number of statements matching the given restrictions:

In [22]:
kb.count(subject=wd.Brazil, property=wd.population | wd.official_language)

2

A related call is `kb.contains()`.  It tests whether a given statement occurs in `kb`:

In [23]:
stmt1 = wd.official_language(wd.Brazil, wd.Portuguese)
kb.contains(stmt1)

True

In [24]:
stmt2 = wd.official_language(wd.Brazil, wd.Spanish)
kb.contains(stmt2)

False
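Conceptually, counting tallies the statements a filter would return, and containment checks whether one fully specified statement is present. The sketch below shows that relationship over a small in-memory list. `toy_count`, `toy_contains`, and the data (including the placeholder population value) are hypothetical, and a real store may compute these results quite differently, for example by delegating the work to the underlying query service.

```python
# Hypothetical toy data as (subject, property, value) triples.
TRIPLES = [
    ('Brazil', 'official language', 'Portuguese'),
    ('Brazil', 'population', 'N inhabitants'),  # placeholder, not a real figure
    ('Brazil', 'continent', 'South America'),
]

def toy_count(subject, properties):
    """Number of triples with the given subject and any of the properties."""
    return sum(1 for (s, p, _) in TRIPLES if s == subject and p in properties)

def toy_contains(triple):
    """True if the fully specified triple occurs in the data."""
    return triple in TRIPLES

n = toy_count('Brazil', {'population', 'official language'})
has_portuguese = toy_contains(('Brazil', 'official language', 'Portuguese'))
has_spanish = toy_contains(('Brazil', 'official language', 'Spanish'))

print(n, has_portuguese, has_spanish)  # 2 True False
```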

## Final remarks

This concludes the quickstart guide.

There are many other calls in KIF's Store API.  For more information, see the [API docs](https://ibm.github.io/kif/) and the [examples](https://github.com/IBM/kif/tree/main/examples) directory.