This notebook demonstrates the new Wikidata traversal functions.

The new functions are available through the **WikidataSearch** object, which is imported from exogenous_signals in the news_signals library as shown below.

In [1]:
import news_signals
from news_signals.exogenous_signals import WikidataSearch

First we instantiate the class. Then, using **entity_to_wikidata**, we can get the Wikidata ID for a named entity.


:param entity_name: Name of the entity to search for

:return: Wikidata ID as a string, or None if not found

In [2]:
searcher = WikidataSearch()
entity_id = searcher.entity_to_wikidata("Albert Einstein")
print(entity_id)

Q937
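Because the lookup returns None when nothing is found, it can help to fail early before passing the result on. The following is a minimal sketch of that idea, not part of news_signals: `resolve_or_raise` and `fake_index` are hypothetical names, and the dictionary lookup stands in for `searcher.entity_to_wikidata` so the example runs offline.

```python
from typing import Callable, Optional


def resolve_or_raise(resolve: Callable[[str], Optional[str]], entity_name: str) -> str:
    """Return the Wikidata ID for entity_name, or raise if the lookup found nothing."""
    wikidata_id = resolve(entity_name)
    if wikidata_id is None:
        raise LookupError(f"No Wikidata ID found for {entity_name!r}")
    return wikidata_id


# Stand-in resolver for illustration only; in the notebook the first
# argument would be searcher.entity_to_wikidata.
fake_index = {"Albert Einstein": "Q937"}
print(resolve_or_raise(fake_index.get, "Albert Einstein"))
```

Passing the resolver as a callable keeps the helper independent of how the lookup is implemented.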


**wikidata_related_entities** performs a breadth-first search over the outgoing links of an entity's Wikidata page.

:param wikidata_id: The starting Wikidata ID.

:param depth: The number of hops (levels) to traverse.

:param labels: Whether to return Wikidata IDs or human-readable labels.

:param query: Optional custom SPARQL query for retrieving related entities.

:return: A list of related Wikidata IDs.
        
For this example, let's set labels to True and pass our previous entity_id into the function.

In [3]:
related_ids = searcher.wikidata_related_entities(entity_id, depth=1, labels=True)
print(f"Related entities: {related_ids}")

Related entities: ['Q57193', 'Q543804', 'Q5460604', 'Q21578', 'Q21200226', 'Q76346', 'Q2370801', 'Q635642', 'Q15056034', 'Q463303', 'Q206702', 'Q55594631', 'Q93996', 'Q153238', 'Q42309226', 'Q310794', 'Q1729754', 'Q97154', 'Q72', 'Q3012', 'Q138518', 'Q390003', 'Q25696257', 'Q464344', 'Q152087', 'Q42299', 'Q156478', 'Q355245', 'Q28861731', 'Q4357787', 'Q684415', 'Q87554', 'Q116635', 'Q642074', 'Q468357', 'Q708038', 'Q30', 'Q123885', 'Q39934978', 'Q216738', 'Q200639', 'Q43287', 'Q88665', 'Q8487137', 'Q156598', 'Q466089', 'Q414188', 'Q435651', 'Q37160', 'Q123371', 'Q1085', 'Q48835067', 'Q7213562', 'Q8092556', 'Q19185', 'Q57246', 'Q39', 'Q14708020', 'Q41304', 'Q9095', 'Q270794', 'Q942842', 'Q991', 'Q675617', 'Q68761', 'Q935', 'Q2497232', 'Q1876751', 'Q902624', 'Q819187', 'Q7322195', 'Q38193', 'Q26963166', 'Q533534', 'Q1309294', 'Q103505599', 'Q31519', 'Q64', 'Q4397938', 'Q2095524', 'Q338432', 'Q4175282', 'Q328195', 'Q70', 'Q2945826', 'Q329464', 'Q41688', 'Q6173448', 'Q253439', 'Q188771', '
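To make the role of the depth parameter concrete, here is a minimal sketch of a depth-limited breadth-first search over outgoing links. It is not the library's implementation: `related_within_depth` is a hypothetical helper, and the toy graph with placeholder node names replaces the live Wikidata queries the real function would issue.

```python
from collections import deque


def related_within_depth(graph, start, depth):
    """Collect nodes reachable from start in at most `depth` hops, in BFS order."""
    seen = {start}
    frontier = deque([(start, 0)])
    related = []
    while frontier:
        node, hops = frontier.popleft()
        if hops == depth:
            # Nodes at the depth limit are kept but not expanded further.
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                related.append(neighbor)
                frontier.append((neighbor, hops + 1))
    return related


# Toy adjacency list with placeholder names (not real Wikidata IDs).
toy_graph = {"A": ["B", "C"], "B": ["D"], "C": ["A", "D"], "D": ["E"]}
print(related_within_depth(toy_graph, "A", depth=1))
print(related_within_depth(toy_graph, "A", depth=2))
```

The `seen` set prevents revisiting nodes, which matters on Wikidata because many entities link back to each other.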

Now that we have the IDs, we can use the function **wikidata_to_labels** to map them to their entity names.

:param wikidata_ids: List of Wikidata IDs (can include composite IDs).

:param language: Language code for labels.

:return: Dictionary mapping Wikidata IDs to labels.

In [None]:
labels = searcher.wikidata_to_labels(related_ids, language="en")
print(f"Related entities: {labels}")

Related entities: {'Q57193': 'Moritz Schlick', 'Q543804': 'German Academy of Sciences Leopoldina', 'Q5460604': 'Wikipedia:List of articles all languages should have', 'Q21578': 'Princeton University', 'Q21200226': 'Albert Einstein', 'Q76346': 'Mileva Marić', 'Q2370801': 'Academy of Sciences of the USSR', 'Q635642': 'Institute for Advanced Study', 'Q15056034': 'Pour le Mérite for Sciences and Arts order', 'Q463303': 'American Academy of Arts and Sciences', 'Q206702': 'University of Zurich', 'Q55594631': 'Lina Einstein', 'Q93996': 'Ernst Mach', 'Q153238': 'Leó Szilárd', 'Q42309226': 'honorary doctorate from Princeton University', 'Q310794': 'Karl Pearson', 'Q1729754': 'German University in Prague', 'Q97154': 'Heinrich Burkhardt', 'Q72': 'Zurich', 'Q3012': 'Ulm', 'Q138518': 'Princeton', 'Q390003': 'Einsteinhaus', 'Q25696257': 'Unknown', 'Q464344': 'Caputh', 'Q152087': 'Humboldt University of Berlin', 'Q42299': 'Bernhard Riemann', 'Q156478': 'Pour le Mérite', 'Q355245': 'Henry George', 'Q2
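The mapping above contains entries such as 'Unknown' and a 'Wikipedia:' project page that are probably not useful as related entities. The sketch below shows one way to drop them after the fact. `drop_placeholder_labels` is a hypothetical helper, and treating 'Unknown' as the marker for a missing label is an assumption inferred from the output, not documented behavior.

```python
def drop_placeholder_labels(labels, placeholder="Unknown", meta_prefix="Wikipedia:"):
    """Keep only entries whose label is neither the placeholder nor a project page."""
    return {
        qid: label
        for qid, label in labels.items()
        if label != placeholder and not label.startswith(meta_prefix)
    }


# A few entries copied from the output above.
sample = {
    "Q57193": "Moritz Schlick",
    "Q5460604": "Wikipedia:List of articles all languages should have",
    "Q25696257": "Unknown",
    "Q21578": "Princeton University",
}
print(drop_placeholder_labels(sample))
```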

We can skip the separate steps above by leaving the labels parameter unset (it defaults to False).

In that case we only need to pass in the entity name string.

In [5]:
search2 = WikidataSearch()
entity = "Albert Einstein"
related_ids2 = search2.wikidata_related_entities(entity, depth=1)
print(f"Related entities: {related_ids2}")

Related entities: {'Q57193': 'Moritz Schlick', 'Q543804': 'German Academy of Sciences Leopoldina', 'Q5460604': 'Wikipedia:List of articles all languages should have', 'Q21578': 'Princeton University', 'Q21200226': 'Albert Einstein', 'Q76346': 'Mileva Marić', 'Q2370801': 'Academy of Sciences of the USSR', 'Q635642': 'Institute for Advanced Study', 'Q15056034': 'Pour le Mérite for Sciences and Arts order', 'Q463303': 'American Academy of Arts and Sciences', 'Q206702': 'University of Zurich', 'Q55594631': 'Lina Einstein', 'Q93996': 'Ernst Mach', 'Q153238': 'Leó Szilárd', 'Q42309226': 'honorary doctorate from Princeton University', 'Q310794': 'Karl Pearson', 'Q1729754': 'German University in Prague', 'Q97154': 'Heinrich Burkhardt', 'Q72': 'Zurich', 'Q3012': 'Ulm', 'Q138518': 'Princeton', 'Q390003': 'Einsteinhaus', 'Q25696257': 'Unknown', 'Q464344': 'Caputh', 'Q152087': 'Humboldt University of Berlin', 'Q42299': 'Bernhard Riemann', 'Q156478': 'Pour le Mérite', 'Q355245': 'Henry George', 'Q2

The default query for the function is:

        """
            SELECT ?related ?relatedLabel WHERE {{
                wd:{item} ?prop ?related .
                FILTER(isIRI(?related))
                FILTER EXISTS {{
                    ?related wdt:P31/wdt:P279* ?type .
                    VALUES ?type {{ wd:Q5 wd:Q43229 wd:Q4830453 wd:Q2424752 wd:Q431289 wd:Q732577 wd:Q11424 wd:Q571 }}
                }}
                SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }}
            }}
        """
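The doubled braces and the `{item}` placeholder suggest the query is a Python format-string template. Assuming the library fills it with `str.format` or something equivalent (an inference from the syntax, not confirmed by the source), the substitution works roughly as in this shortened sketch. `QUERY_TEMPLATE` is a hypothetical name.

```python
# Shortened template: {{ and }} are escaped literal braces, {item} is the slot
# that receives the Wikidata ID.
QUERY_TEMPLATE = """
SELECT ?related WHERE {{
    wd:{item} ?prop ?related .
    FILTER(isIRI(?related))
}}
"""

query = QUERY_TEMPLATE.format(item="Q937")
print(query)
```

After formatting, each doubled brace becomes a single brace, so the result is ordinary SPARQL. A custom query passed to the function would need the same escaping for this to work.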

However, the query parameter lets us pass in a custom query instead.

In [6]:
search3 = WikidataSearch()
entity = "Albert Einstein"
# Custom query that keeps only related entities that are people (wd:Q5)
new_query = """
    SELECT ?related ?relatedLabel WHERE {{
        wd:{item} ?prop ?related .
        FILTER(isIRI(?related))
        FILTER EXISTS {{
            ?related wdt:P31/wdt:P279* wd:Q5 .
        }}
        SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }}
    }}
"""
related_ids3 = search3.wikidata_related_entities(entity, depth=1, labels=False, query=new_query)
print(f"Related entities: {related_ids3}")

Related entities: {'Q57193': 'Moritz Schlick', 'Q42299': 'Bernhard Riemann', 'Q4175282': 'Alfred Kleiner', 'Q355245': 'Henry George', 'Q19185': 'George Bernard Shaw', 'Q57246': 'Hermann Minkowski', 'Q76346': 'Mileva Marić', 'Q41688': 'Hendrik Lorentz', 'Q9095': 'James Clerk Maxwell', 'Q4357787': 'Pauline Koch', 'Q87554': 'Ernst G. Straus', 'Q991': 'Fyodor Dostoyevsky', 'Q116635': 'Heinrich Friedrich Weber', 'Q68761': 'Elsa Einstein', 'Q55594631': 'Lina Einstein', 'Q468357': 'Lieserl (Einstein)', 'Q93996': 'Ernst Mach', 'Q153238': 'Leó Szilárd', 'Q935': 'Isaac Newton', 'Q216738': 'Maja Einstein', 'Q25820': 'Thomas Young', 'Q118253': 'Eduard Einstein', 'Q7322195': 'Riazuddin', 'Q35802': 'Benedictus de Spinoza', 'Q38193': 'Arthur Schopenhauer', 'Q200639': 'Paul Valéry', 'Q88665': 'Hermann Einstein', 'Q310794': 'Karl Pearson', 'Q97154': 'Heinrich Burkhardt', 'Q1001': 'Mahatma Gandhi', 'Q37160': 'David Hume', 'Q123371': 'Hans Albert Einstein'}
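Custom queries like the one above differ from the default only in which types they allow, so the template can be generated from a list of type IDs. The helper below is a hypothetical sketch, not part of news_signals. It assumes the query is later filled with `str.format`, so it keeps the `{item}` placeholder and the doubled braces.

```python
import re


def build_type_query(type_ids):
    """Build a query template restricted to the given Wikidata type IDs."""
    for qid in type_ids:
        if not re.fullmatch(r"Q\d+", qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in type_ids)
    return (
        "SELECT ?related ?relatedLabel WHERE {{\n"
        "    wd:{item} ?prop ?related .\n"
        "    FILTER(isIRI(?related))\n"
        "    FILTER EXISTS {{\n"
        "        ?related wdt:P31/wdt:P279* ?type .\n"
        # In this f-string, four braces produce the doubled brace the template needs.
        f"        VALUES ?type {{{{ {values} }}}}\n"
        "    }}\n"
        '    SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }}\n'
        "}}\n"
    )


# Q5 and Q43229 both appear in the default query's type list.
template = build_type_query(["Q5", "Q43229"])
print(template)
```

The returned string could then be passed as the query argument in the same way as new_query above.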
