This notebook demonstrates the new Wikidata traversal functions.

The new functions are exposed through the **WikidataSearch** object, which is imported from exogenous_signals in the news_signals library as shown below.

In [3]:
import news_signals
from news_signals.exogenous_signals import WikidataSearch

First we instantiate the class. We can then use **entity_to_wikidata** to get the Wikidata ID for a named entity.


:param entity_name: Name of the entity to search for

:return: Wikidata ID as a string, or None if not found

In [4]:
searcher = WikidataSearch()
entity_id = searcher.entity_to_wikidata("Albert Einstein")
print(entity_id)

Q937
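
For readers curious what a name-to-ID lookup can involve, here is a minimal sketch that resolves a name through the public Wikidata `wbsearchentities` API. It is an illustration only and not necessarily how **WikidataSearch** works internally; the helper names `build_search_url`, `first_match_id`, and `lookup_entity_id` and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(entity_name, language="en"):
    """Build a wbsearchentities request URL for the given name."""
    params = {
        "action": "wbsearchentities",
        "search": entity_name,
        "language": language,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urllib.parse.urlencode(params)}"


def first_match_id(response):
    """Return the ID of the first search hit, or None if there are no hits."""
    results = response.get("search", [])
    return results[0]["id"] if results else None


def lookup_entity_id(entity_name, language="en"):
    """Query the live API (needs network access) and return the top hit's ID."""
    request = urllib.request.Request(
        build_search_url(entity_name, language),
        headers={"User-Agent": "wikidata-lookup-sketch/0.1"},  # hypothetical agent string
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return first_match_id(json.load(resp))
```

Taking the first hit is a simplification: ambiguous names may need disambiguation, which this sketch does not attempt.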


**wikidata_related_entities** performs a breadth-first search on the outgoing links of an entity's Wikidata page.

:param wikidata_id: The starting Wikidata ID.

:param depth: The number of hops (levels) to traverse.

:param labels: Whether to return Wikidata IDs or human-readable labels.

:param query: Optional custom SPARQL query for retrieving related entities.

:return: A list of related Wikidata IDs.
        
For this example, let's set labels to True and pass our previous entity_id into the function. As the output below shows, this call returns a plain list of Wikidata IDs.

In [5]:
related_ids = searcher.wikidata_related_entities(entity_id, depth=1, labels=True)
print(f"Related entities: {related_ids}")

Related entities: ['Q153238', 'Q685539', 'Q216738', 'Q21200226', 'Q152087', 'Q19185', 'Q25696257', 'Q2095524', 'Q1001', 'Q55594631', 'Q156478', 'Q57193', 'Q183', 'Q902624', 'Q206702', 'Q5460604', 'Q6173448', 'Q675617', 'Q1085', 'Q543804', 'Q41304', 'Q188771', 'Q1309294', 'Q390003', 'Q76346', 'Q97154', 'Q41688', 'Q22095877', 'Q466089', 'Q7213562', 'Q57246', 'Q15056034', 'Q1876751', 'Q68761', 'Q88665', 'Q463303', 'Q42309226', 'Q70', 'Q72', 'Q191583', 'Q310794', 'Q64', 'Q4357787', 'Q1729754', 'Q31519', 'Q708038', 'Q991', 'Q39934978', 'Q11942', 'Q123371', 'Q168756', 'Q659080', 'Q2370801', 'Q355245', 'Q48835067', 'Q3603946', 'Q642074', 'Q9009', 'Q1726', 'Q635642', 'Q253439', 'Q14708020', 'Q935', 'Q42299', 'Q87554', 'Q93996', 'Q156598', 'Q3012', 'Q30', 'Q533534', 'Q118253', 'Q2497232', 'Q942842', 'Q9095', 'Q43287', 'Q270794', 'Q8092556', 'Q4397938', 'Q200639', 'Q819187', 'Q684415', 'Q26963166', 'Q103505599', 'Q138518', 'Q37160', 'Q464344', 'Q38193', 'Q338432', 'Q2945826', 'Q7322195', 'Q11663
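
To make the role of the depth parameter concrete, the following sketch runs a depth-limited breadth-first search over a small in-memory graph. It illustrates the general technique rather than the library's implementation; the function name `related_within_depth` and the toy adjacency map are made up for this example and do not represent real Wikidata links.

```python
from collections import deque


def related_within_depth(graph, start, depth):
    """Return nodes reachable from `start` in at most `depth` hops, in BFS order."""
    visited = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == depth:
            # Nodes at the depth limit are reported but not expanded further.
            continue
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                order.append(neighbour)
                queue.append((neighbour, hops + 1))
    return order


# Toy adjacency map (hypothetical nodes, including a cycle back to "A").
toy_graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["E"],
    "D": ["A"],
}

print(related_within_depth(toy_graph, "A", 1))  # ['B', 'C']
print(related_within_depth(toy_graph, "A", 2))  # ['B', 'C', 'D', 'E']
```

The visited set keeps cycles from causing repeated visits. Against the live graph, each extra hop can multiply the number of entities fetched.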

Now that we have the IDs, we can use the function **wikidata_to_ids** to map each Wikidata ID to its human-readable label.

:param wikidata_ids: List of Wikidata IDs (can include composite IDs).

:param language: Language code for labels.

:return: Dictionary mapping Wikidata IDs to labels.

In [10]:
labels = searcher.wikidata_to_ids(related_ids, language="en")
print(f"Related entities: {labels}")

Related entities: {'Q153238': 'Leó Szilárd', 'Q685539': 'Swiss Federal Institute of Intellectual Property', 'Q216738': 'Maja Einstein', 'Q152087': 'Humboldt University of Berlin', 'Q21200226': 'Albert Einstein', 'Q19185': 'George Bernard Shaw', 'Q25696257': 'Unknown', 'Q2095524': 'Indian National Science Academy', 'Q1001': 'Mahatma Gandhi', 'Q55594631': 'Lina Einstein', 'Q156478': 'Pour le Mérite', 'Q57193': 'Moritz Schlick', 'Q183': 'Germany', 'Q902624': 'National Museum of Health and Medicine', 'Q206702': 'University of Zurich', 'Q5460604': 'Wikipedia:List of articles all languages should have', 'Q6173448': 'Wikipedia:Vital articles/Level/4', 'Q675617': 'Swiss Literary Archives', 'Q1085': 'Prague', 'Q543804': 'German Academy of Sciences Leopoldina', 'Q41304': 'Weimar Republic', 'Q188771': 'French Academy of Sciences', 'Q1309294': 'Einsteinhaus Caputh', 'Q390003': 'Einsteinhaus', 'Q76346': 'Mileva Marić', 'Q97154': 'Heinrich Burkhardt', 'Q41688': 'Hendrik Lorentz', 'Q22095877': 'Alber
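
The output above includes an 'Unknown' label and some Wikipedia maintenance pages. If those are not useful downstream, one option is to filter the dictionary afterwards. The helper below is a hypothetical post-processing step, not part of the library, and it assumes 'Unknown' is the placeholder for missing labels, as the output suggests.

```python
def drop_unlabelled(labels, placeholder="Unknown", meta_prefix="Wikipedia:"):
    """Remove entries whose label is the placeholder or a Wikipedia meta page."""
    return {
        qid: label
        for qid, label in labels.items()
        if label != placeholder and not label.startswith(meta_prefix)
    }


# A few entries copied from the output above.
sample = {
    "Q153238": "Leó Szilárd",
    "Q25696257": "Unknown",
    "Q5460604": "Wikipedia:List of articles all languages should have",
    "Q183": "Germany",
}

cleaned = drop_unlabelled(sample)
print(cleaned)  # {'Q153238': 'Leó Szilárd', 'Q183': 'Germany'}
```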

We can skip the separate steps above by leaving the labels parameter unset (it defaults to False).

In that case we only need to pass in the entity name string.

In [None]:
search2 = WikidataSearch()
entity = "Albert Einstein"
related_ids2 = search2.wikidata_related_entities(entity, depth=1)
print(f"Related entities: {related_ids2}")

The default query for the function is:

        """
            SELECT ?related ?relatedLabel WHERE {{
                wd:{item} ?prop ?related .
                FILTER(isIRI(?related))
                FILTER EXISTS {{
                    ?related wdt:P31/wdt:P279* ?type .
                    VALUES ?type {{ wd:Q5 wd:Q43229 wd:Q4830453 wd:Q2424752 wd:Q431289 wd:Q732577 wd:Q11424 wd:Q571 }}
                }}
                SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }}
            }}
        """

The query parameter lets us pass in our own custom query instead of the default.
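
The default query uses an `{item}` placeholder and doubled braces, which suggests the query string is filled in with Python's `str.format`. That is an inference from the template's shape, not something confirmed here. Under that assumption, a custom query should follow the same convention, as this shortened template (not the library's default) illustrates:

```python
# Assumption: the query is rendered with str.format(item=...).
# Doubled braces are literal braces; {item} is the substitution slot.
template = """
SELECT ?related WHERE {{
    wd:{item} ?prop ?related .
    FILTER(isIRI(?related))
}}
"""

rendered = template.format(item="Q937")
print(rendered)
```

After formatting, each `{{` or `}}` collapses to a single brace and `{item}` becomes the entity ID. A single unescaped brace in a custom template would make `str.format` raise an error.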

In [12]:
search3 = WikidataSearch()
entity = "Albert Einstein"
# Query that retrieves only related entities that are people (instances of wd:Q5)
new_query = """
    SELECT ?related ?relatedLabel WHERE {{
        wd:{item} ?prop ?related .
        FILTER(isIRI(?related))
        FILTER EXISTS {{
            ?related wdt:P31/wdt:P279* wd:Q5 .
        }}
        SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }}
    }}
"""
related_ids3 = search3.wikidata_related_entities(entity, depth=1, labels=False, query=new_query)
print(f"Related entities: {related_ids3}")

Related entities: {'Q153238': 'Leó Szilárd', 'Q216738': 'Maja Einstein', 'Q200639': 'Paul Valéry', 'Q19185': 'George Bernard Shaw', 'Q1001': 'Mahatma Gandhi', 'Q55594631': 'Lina Einstein', 'Q57193': 'Moritz Schlick', 'Q310794': 'Karl Pearson', 'Q4357787': 'Pauline Koch', 'Q935': 'Isaac Newton', 'Q37160': 'David Hume', 'Q42299': 'Bernhard Riemann', 'Q87554': 'Ernst G. Straus', 'Q93996': 'Ernst Mach', 'Q38193': 'Arthur Schopenhauer', 'Q7322195': 'Riazuddin', 'Q991': 'Fyodor Dostoyevsky', 'Q116635': 'Heinrich Friedrich Weber', 'Q35802': 'Benedictus de Spinoza', 'Q118253': 'Eduard Einstein', 'Q76346': 'Mileva Marić', 'Q97154': 'Heinrich Burkhardt', 'Q468357': 'Lieserl (Einstein)', 'Q25820': 'Thomas Young', 'Q41688': 'Hendrik Lorentz', 'Q4175282': 'Alfred Kleiner', 'Q123371': 'Hans Albert Einstein', 'Q57246': 'Hermann Minkowski', 'Q9095': 'James Clerk Maxwell', 'Q68761': 'Elsa Einstein', 'Q355245': 'Henry George', 'Q88665': 'Hermann Einstein'}
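
If several type filters are needed, writing each query by hand gets repetitive. The sketch below builds a template in the same shape as the default query from a list of type IDs. The helper `build_type_query` is hypothetical, and it assumes the template is later rendered with `str.format(item=...)`, as the `{item}` placeholder and doubled braces suggest.

```python
def build_type_query(type_ids):
    """Return a query template restricted to the given Wikidata type IDs.

    Doubled braces are kept so the result can still be rendered with
    str.format(item=...) (an assumption about how the query is used).
    """
    values = " ".join(f"wd:{qid}" for qid in type_ids)
    return (
        "SELECT ?related ?relatedLabel WHERE {{\n"
        "    wd:{item} ?prop ?related .\n"
        "    FILTER(isIRI(?related))\n"
        "    FILTER EXISTS {{\n"
        "        ?related wdt:P31/wdt:P279* ?type .\n"
        "        VALUES ?type {{ " + values + " }}\n"
        "    }}\n"
        '    SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }}\n'
        "}}\n"
    )


# Two type IDs taken from the default query's VALUES list.
people_and_orgs = build_type_query(["Q5", "Q43229"])
print(people_and_orgs.format(item="Q937"))
```

The resulting string could then be passed through the query parameter in the same way as new_query above.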
