This notebook demonstrates the new Wikidata traversal functions.

The new functions are available through the **WikidataSearch** class, which is imported from exogenous_signals in the news_signals library as shown below.

In [1]:
import news_signals
from news_signals.exogenous_signals import WikidataSearch

First we instantiate the class. Then, using **entity_to_wikidata**, we can get the Wikidata ID for a named entity.


:param entity_name: Name of the entity to search for

:return: Wikidata ID as a string, or None if not found

In [2]:
searcher = WikidataSearch()
entity_id = searcher.entity_to_wikidata("Albert Einstein")
print(entity_id)

Q937
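Since the documented return value is None when nothing is found, it can be worth checking the result before passing it on. Below is a minimal sketch of one way to do that. The helper name resolve_entity is hypothetical and not part of news_signals, and a plain dictionary stands in for the real lookup so that the example runs without network access.

```python
from typing import Callable, Optional


def resolve_entity(lookup: Callable[[str], Optional[str]], name: str) -> str:
    """Return the ID produced by `lookup`, or raise if the lookup returns None.

    `lookup` is any callable with the same contract as entity_to_wikidata:
    it takes an entity name and returns an ID string or None.
    """
    wikidata_id = lookup(name)
    if wikidata_id is None:
        raise LookupError(f"No Wikidata ID found for {name!r}")
    return wikidata_id


# Stand-in lookup table (hypothetical; replaces the real network call).
fake_lookup = {"Albert Einstein": "Q937"}.get

print(resolve_entity(fake_lookup, "Albert Einstein"))
```

With the real class, the same helper could presumably be called as resolve_entity(searcher.entity_to_wikidata, "Albert Einstein").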


**wikidata_related_entities** performs a breadth-first search over the outgoing links of an entity's Wikidata page.

:param wikidata_id: The starting Wikidata ID.

:param depth: The number of hops (levels) to traverse.

:param labels: Whether to return Wikidata IDs or human-readable labels.

:param query: Optional custom SPARQL query for retrieving related entities.

:return: A list of related Wikidata IDs.
        
For this example, let's set labels to True and pass our previous entity_id into the function.

In [3]:
related_ids = searcher.wikidata_related_entities(entity_id, depth=1, labels=True)
print(f"Related entities: {related_ids}")

Related entities: ['Q4397938', 'Q21578', 'Q7213562', 'Q4357787', 'Q116635', 'Q2497232', 'Q1085', 'Q138518', 'Q76346', 'Q200639', 'Q168756', 'Q464344', 'Q9095', 'Q11942', 'Q39934978', 'Q543804', 'Q57246', 'Q25696257', 'Q685539', 'Q15056034', 'Q1726', 'Q72', 'Q28861731', 'Q183', 'Q156478', 'Q2370801', 'Q123371', 'Q191583', 'Q57193', 'Q6173448', 'Q37160', 'Q123885', 'Q4175282', 'Q42309226', 'Q70', 'Q2095524', 'Q2945826', 'Q533534', 'Q30', 'Q43287', 'Q68761', 'Q152087', 'Q942842', 'Q14708020', 'Q270794', 'Q97154', 'Q8487137', 'Q8092556', 'Q41304', 'Q642074', 'Q468357', 'Q118253', 'Q466089', 'Q26963166', 'Q684415', 'Q463303', 'Q124500735', 'Q21200226', 'Q390003', 'Q902624', 'Q35802', 'Q39', 'Q435651', 'Q675617', 'Q1729754', 'Q1001', 'Q103505599', 'Q991', 'Q708038', 'Q819187', 'Q355245', 'Q635642', 'Q19185', 'Q87554', 'Q659080', 'Q156598', 'Q31519', 'Q5460604', 'Q22095877', 'Q329464', 'Q188771', 'Q310794', 'Q88665', 'Q41688', 'Q42299', 'Q7322195', 'Q338432', 'Q935', 'Q48835067', 'Q9009', 'Q5
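To make the role of the depth parameter concrete, here is a minimal sketch of a depth-limited breadth-first search. It is not the library's implementation: the function bfs_related, the neighbors callback, and the toy graph are all assumptions made for illustration, with the toy graph standing in for the outgoing links that the real function retrieves from Wikidata.

```python
from collections import deque
from typing import Callable, Iterable, List


def bfs_related(
    start: str,
    neighbors: Callable[[str], Iterable[str]],
    depth: int = 1,
) -> List[str]:
    """Return nodes reachable from `start` within `depth` hops, in BFS order."""
    visited = {start}
    related: List[str] = []
    queue = deque([(start, 0)])
    while queue:
        node, level = queue.popleft()
        if level == depth:
            continue  # do not expand nodes that are already `depth` hops away
        for nxt in neighbors(node):
            if nxt not in visited:
                visited.add(nxt)
                related.append(nxt)
                queue.append((nxt, level + 1))
    return related


# Toy graph standing in for Wikidata's outgoing links (hypothetical data).
toy_graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "A"],
    "D": ["E"],
}


def toy_neighbors(node: str) -> List[str]:
    return toy_graph.get(node, [])


one_hop = bfs_related("A", toy_neighbors, depth=1)
two_hops = bfs_related("A", toy_neighbors, depth=2)
print(one_hop)   # ['B', 'C']
print(two_hops)  # ['B', 'C', 'D']
```

Each extra hop expands every node found at the previous level, so the result set can grow quickly on a densely linked graph such as Wikidata.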

Now that we have the IDs, we can use the function **wikidata_to_labels** to map them to their entity names.

:param wikidata_ids: List of Wikidata IDs (can include composite IDs).

:param language: Language code for labels.

:return: Dictionary mapping Wikidata IDs to labels.

In [4]:
labels = searcher.wikidata_to_labels(related_ids, language="en")
print(f"Related entities: {labels}")

Related entities: {'Q4397938': 'Russian Academy of Sciences (1917–1925)', 'Q21578': 'Princeton University', 'Q7213562': 'Category:Albert Einstein', 'Q4357787': 'Pauline Koch', 'Q116635': 'Heinrich Friedrich Weber', 'Q2497232': 'Brazilian Academy of Sciences', 'Q1085': 'Prague', 'Q138518': 'Princeton', 'Q76346': 'Mileva Marić', 'Q200639': 'Paul Valéry', 'Q168756': 'University of California, Berkeley', 'Q464344': 'Caputh', 'Q9095': 'James Clerk Maxwell', 'Q11942': 'ETH Zurich', 'Q39934978': 'ETH Zurich University Archives', 'Q543804': 'German Academy of Sciences Leopoldina', 'Q57246': 'Hermann Minkowski', 'Q25696257': 'Unknown', 'Q685539': 'Swiss Federal Institute of Intellectual Property', 'Q15056034': 'Pour le Mérite for Sciences and Arts order', 'Q1726': 'Munich', 'Q72': 'Zurich', 'Q28861731': 'honorary doctor of the Hebrew University of Jerusalem', 'Q183': 'Germany', 'Q156478': 'Pour le Mérite', 'Q2370801': 'Academy of Sciences of the USSR', 'Q123371': 'Hans Albert Einstein', 'Q19158
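The dictionary above contains a few entries that may not be useful downstream, such as the 'Unknown' label for Q25696257 and the 'Category:Albert Einstein' page. The sketch below shows one way to filter them out. The helper drop_unresolved is hypothetical, and treating 'Unknown' as a placeholder for a missing label is an assumption based only on the output shown here.

```python
from typing import Dict, Tuple

# A few entries copied from the output above.
sample_labels = {
    "Q21578": "Princeton University",
    "Q7213562": "Category:Albert Einstein",
    "Q76346": "Mileva Marić",
    "Q25696257": "Unknown",
}


def drop_unresolved(
    labels: Dict[str, str],
    placeholder: str = "Unknown",
    skip_prefixes: Tuple[str, ...] = ("Category:",),
) -> Dict[str, str]:
    """Keep entries whose label is not the placeholder and has no skipped prefix."""
    return {
        qid: label
        for qid, label in labels.items()
        if label != placeholder and not label.startswith(skip_prefixes)
    }


cleaned = drop_unresolved(sample_labels)
print(cleaned)
```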

Alternatively, we can skip the separate steps above by not setting the labels parameter to True (that is, letting it default to False).

In that case we only need to pass in the entity name as a string, and the result maps each Wikidata ID to its label.

In [5]:
search2 = WikidataSearch()
entity = "Albert Einstein"
related_ids2 = search2.wikidata_related_entities(entity, depth=1)
print(f"Related entities: {related_ids2}")

Related entities: {'Q4397938': 'Russian Academy of Sciences (1917–1925)', 'Q21578': 'Princeton University', 'Q7213562': 'Category:Albert Einstein', 'Q4357787': 'Pauline Koch', 'Q116635': 'Heinrich Friedrich Weber', 'Q2497232': 'Brazilian Academy of Sciences', 'Q1085': 'Prague', 'Q138518': 'Princeton', 'Q76346': 'Mileva Marić', 'Q200639': 'Paul Valéry', 'Q168756': 'University of California, Berkeley', 'Q464344': 'Caputh', 'Q9095': 'James Clerk Maxwell', 'Q11942': 'ETH Zurich', 'Q39934978': 'ETH Zurich University Archives', 'Q543804': 'German Academy of Sciences Leopoldina', 'Q57246': 'Hermann Minkowski', 'Q25696257': 'Unknown', 'Q685539': 'Swiss Federal Institute of Intellectual Property', 'Q15056034': 'Pour le Mérite for Sciences and Arts order', 'Q1726': 'Munich', 'Q72': 'Zurich', 'Q28861731': 'honorary doctor of the Hebrew University of Jerusalem', 'Q183': 'Germany', 'Q156478': 'Pour le Mérite', 'Q2370801': 'Academy of Sciences of the USSR', 'Q123371': 'Hans Albert Einstein', 'Q19158

The default query for the function is:

        """
            SELECT ?related ?relatedLabel WHERE {{
                wd:{item} ?prop ?related .
                FILTER(isIRI(?related))
                FILTER EXISTS {{
                    ?related wdt:P31/wdt:P279* ?type .
                    VALUES ?type {{ wd:Q5 wd:Q43229 wd:Q4830453 wd:Q2424752 wd:Q431289 wd:Q732577 wd:Q11424 wd:Q571 }}
                }}
                SERVICE wikibase:label {{ bd:serviceParam wikibase:language \"[AUTO_LANGUAGE],en\" }}
            }}
        """

However, using the query param, we can pass in our own custom query as well.
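The doubled braces and the {item} placeholder in the default query suggest that it is a template filled in with Python string formatting, so a custom query should probably follow the same conventions. The sketch below illustrates this with a shortened template. That the library uses str.format internally is an assumption inferred from the template's syntax, not something confirmed here.

```python
# Shortened template written in the same style as the default query.
template = """
SELECT ?related ?relatedLabel WHERE {{
    wd:{item} ?prop ?related .
    FILTER(isIRI(?related))
}}
"""

# str.format replaces {item} and turns each doubled brace into a single one,
# which yields valid SPARQL braces in the final query.
filled = template.format(item="Q937")
print(filled)
```

If the braces in a custom query were not doubled, str.format would try to interpret them as placeholders and fail.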

In [6]:
search3 = WikidataSearch()
entity = "Albert Einstein"
# Custom query that keeps only related entities that are people (wd:Q5)
new_query = """
    SELECT ?related ?relatedLabel WHERE {{
        wd:{item} ?prop ?related .
        FILTER(isIRI(?related))
        FILTER EXISTS {{
            ?related wdt:P31/wdt:P279* wd:Q5 .
        }}
        SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }}
    }}
"""
related_ids3 = search3.wikidata_related_entities(entity, depth=1, labels=False, query=new_query)
print(f"Related entities: {related_ids3}")

Related entities: {'Q310794': 'Karl Pearson', 'Q88665': 'Hermann Einstein', 'Q123371': 'Hans Albert Einstein', 'Q41688': 'Hendrik Lorentz', 'Q4357787': 'Pauline Koch', 'Q116635': 'Heinrich Friedrich Weber', 'Q57193': 'Moritz Schlick', 'Q37160': 'David Hume', 'Q35802': 'Benedictus de Spinoza', 'Q42299': 'Bernhard Riemann', 'Q7322195': 'Riazuddin', 'Q76346': 'Mileva Marić', 'Q4175282': 'Alfred Kleiner', 'Q1001': 'Mahatma Gandhi', 'Q200639': 'Paul Valéry', 'Q935': 'Isaac Newton', 'Q991': 'Fyodor Dostoyevsky', 'Q55594631': 'Lina Einstein', 'Q25820': 'Thomas Young', 'Q9095': 'James Clerk Maxwell', 'Q355245': 'Henry George', 'Q57246': 'Hermann Minkowski', 'Q97154': 'Heinrich Burkhardt', 'Q153238': 'Leó Szilárd', 'Q19185': 'George Bernard Shaw', 'Q87554': 'Ernst G. Straus', 'Q93996': 'Ernst Mach', 'Q468357': 'Lieserl (Einstein)', 'Q216738': 'Maja Einstein', 'Q118253': 'Eduard Einstein', 'Q38193': 'Arthur Schopenhauer', 'Q68761': 'Elsa Einstein'}
