This notebook builds news-volume time series for the entities related to a chosen entity. It first traverses the Wikidata graph, starting from the chosen entity's Wikidata ID, to collect related entities, and then uses news-signals to retrieve the news-volume time series for each of them.

Edit the cell below to set:

**entity** = The surface entity, the root of the graph traversal

**depth** = Depth of the graph search of related entities

In [44]:
entity = "Elon Musk"
depth = 1

Python library imports for this notebook, in case you want to run a particular cell separately (provided it does not depend on variables defined in earlier cells).

In [45]:
import requests
import csv

Here we perform the graph traversal with a simple breadth-first search (BFS), which returns the IDs of the related entities.

The SPARQL **query** is defined inside the function; feel free to modify it to suit your needs.

It is currently set to return entities that are instances of one of these types, or of any of their subclasses:

**Humans (Q5)**

**Organizations (Q43229)**

**Companies (Q4830453)**

**Products (Q2424752)**

**Brands (Q431289)**

**Publications (Q732577)**

**Films (Q11424)**

**Books (Q571)**

You can build your own query with the Wikidata [Query Builder](https://query.wikidata.org/querybuilder/?uselang=en).
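If you want to change the entity types without editing the query string by hand, the `VALUES` clause can be generated from a list of type IDs. The following is a minimal sketch: `ENTITY_TYPES` and `build_values_clause` are hypothetical names that do not appear in the notebook, and the result would still need to be interpolated into the query yourself.

```python
import re

# Type IDs listed above; the dictionary itself is an illustrative assumption.
ENTITY_TYPES = {
    "Q5": "Humans",
    "Q43229": "Organizations",
    "Q4830453": "Companies",
    "Q2424752": "Products",
    "Q431289": "Brands",
    "Q732577": "Publications",
    "Q11424": "Films",
    "Q571": "Books",
}


def build_values_clause(type_ids, variable="?type"):
    """Return a SPARQL VALUES clause such as 'VALUES ?type { wd:Q5 wd:Q571 }'."""
    for type_id in type_ids:
        # Reject anything that is not a plain item ID before it reaches the query.
        if not re.fullmatch(r"Q[1-9][0-9]*", type_id):
            raise ValueError(f"Not a Wikidata item ID: {type_id!r}")
    terms = " ".join(f"wd:{type_id}" for type_id in type_ids)
    return f"VALUES {variable} {{ {terms} }}"


# Restrict the search to humans and companies only.
print(build_values_clause(["Q5", "Q4830453"]))
```

Passing `ENTITY_TYPES` (or any iterable of IDs) produces a clause equivalent to the one hard-coded in the query.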

In [50]:
def get_wikidata_id(entity_name):
    """
    Given an entity name returns its Wikidata ID.
    Uses Wikidata's wbsearchentities API.
    """
    url = "https://www.wikidata.org/w/api.php"
    params = {
        "action": "wbsearchentities",
        "format": "json",
        "language": "en",
        "search": entity_name
    }
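    # Wikimedia asks API clients to identify themselves; replace the contact address with your own.
    headers = {"User-Agent": "GraphTraversalBot/1.0 (your_email@example.com) Python/requests"}
    response = requests.get(url, params=params, headers=headers, timeout=30)
    response.raise_for_status()
    data = response.json()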
    
    if "search" in data and data["search"]:
        # Return the first match
        return data["search"][0]["id"]
    else:
        print(f"No Wikidata ID found for {entity_name}.")
        return None

def get_related_entities(wikidata_id, depth=1):
    """
    Given a Wikidata ID, performs a graph traversal to find related entities up to a specified depth.
    "Related" means any entity connected via outgoing edges.
    
    Parameters:
    - wikidata_id: The starting Wikidata ID (e.g., 'Q937' for Albert Einstein).
    - depth: The number of hops (levels) to traverse.
    
    Returns:
    - A list of Wikidata IDs that are related to the starting entity.
    
    Note: This implementation uses iterative breadth-first search (BFS) and can produce many queries.
    """
    endpoint_url = "https://query.wikidata.org/sparql"
    headers = {
        "User-Agent": "GraphTraversalBot/1.0 (your_email@example.com) Python/requests"
    }
    
    visited = set()
    current_level = {wikidata_id}
    all_related = set()
    
    for d in range(depth):
        next_level = set()
        for item in current_level:
            # This SPARQL query finds entities related to the current item (only outgoing edges).
            query = f"""
            SELECT ?related ?relatedLabel WHERE {{
                wd:{item} ?prop ?related .
                FILTER(isIRI(?related))
                FILTER EXISTS {{
                    ?related wdt:P31/wdt:P279* ?type .
                    VALUES ?type {{ wd:Q5 wd:Q43229 wd:Q4830453 wd:Q2424752 wd:Q431289 wd:Q732577 wd:Q11424 wd:Q571 }}
                }}
                SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }}
                }}
            """

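            response = requests.get(
                endpoint_url,
                params={'query': query, 'format': 'json'},
                headers=headers,
                timeout=60,
            )
            # Surface HTTP errors (e.g. rate limiting) instead of failing later on JSON decoding.
            response.raise_for_status()
            result = response.json()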
            
            for binding in result["results"]["bindings"]:
                related_uri = binding["related"]["value"]
                # Extract the Wikidata ID from the URI (e.g., http://www.wikidata.org/entity/Q42 -> Q42)
                if related_uri.startswith("http://www.wikidata.org/entity/"):
                    related_id = related_uri.split("/")[-1]
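                    # Skip the root and anything already discovered, so no entity is queried twice.
                    if related_id != wikidata_id and related_id not in visited and related_id not in all_related: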
                        next_level.add(related_id)
                        all_related.add(related_id)
            visited.add(item)
        current_level = next_level
        if not current_level:
            break  # no further nodes to traverse
    return list(all_related)

if __name__ == "__main__":
    wikidata_id = get_wikidata_id(entity)
    if wikidata_id:
        print(f"Wikidata ID for '{entity}': {wikidata_id}")
        # Change depth as needed; be cautious with high numbers!
        related_entities = get_related_entities(wikidata_id, depth)
        print(f"Related entities (depth {depth}): {related_entities}")
    else:
        print("Entity conversion failed. Check the input name.")


Wikidata ID for 'Elon Musk': Q317521
Related entities (depth 1): ['Q202973', 'Q122374852', 'Q30686532', 'Q28874479', 'Q18325434', 'Q604444', 'Q112957545', 'Q105538968', 'Q111363577', 'Q918', 'Q111167889', 'Q1329269', 'Q109731214', 'Q46845259', 'Q101674980', 'Q7974160', 'Q30', 'Q122374820', 'Q193701', 'Q104721242', 'Q93418989', 'Q105424537', 'Q97572429', 'Q122374054', 'Q478214', 'Q258', 'Q48817614', 'Q29043471', 'Q269309', 'Q16', 'Q112626243', 'Q101675234', 'Q14590866', 'Q123885', 'Q131158869', 'Q112626244', 'Q67311526', 'Q1420038', 'Q1989', 'Q112626245', 'Q229166', 'Q6708744', 'Q95724881', 'Q10341331', 'Q7242167', 'Q104721244', 'Q41506', 'Q28534056', 'Q210893', 'Q21708200', 'Q49117', 'Q6318376', 'Q483959', 'Q101879184', 'Q671782', 'Q6409751', 'Q35723119', 'Q209896', 'Q1860', 'Q3926', 'Q117970', 'Q104101703', 'Q7555824', 'Q101675036', 'Q111204042', 'Q1427829', 'Q24007468', 'Q7827568', 'Q6173448']
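To see how the `depth` parameter shapes the result without sending any queries, the level-by-level traversal can be tried on a small in-memory graph. This is a sketch under stated assumptions: `bfs_related`, `toy_graph` and `lookup` are hypothetical names, the graph is made up, and nodes are marked as visited when they are discovered, which may differ slightly from the bookkeeping in `get_related_entities`.

```python
def bfs_related(start, neighbors, depth=1):
    """Level-by-level BFS; neighbors(node) returns an iterable of adjacent nodes."""
    visited = {start}
    current_level = {start}
    for _ in range(depth):
        next_level = set()
        for node in current_level:
            for neighbor in neighbors(node):
                # Marking at discovery keeps every node out of later levels.
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_level.add(neighbor)
        if not next_level:
            break  # nothing new was found, so deeper levels are empty too
        current_level = next_level
    visited.discard(start)  # the root is not "related" to itself
    return visited


# Made-up graph: each key has outgoing edges to the nodes in its list.
toy_graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["D"],
    "D": ["E"],
}


def lookup(node):
    return toy_graph.get(node, [])


print(sorted(bfs_related("A", lookup, depth=1)))  # ['B', 'C']
print(sorted(bfs_related("A", lookup, depth=2)))  # ['B', 'C', 'D']
```

In the real traversal each call to `neighbors` corresponds to one SPARQL request, so the number of requests grows with the size of every level; this is why high `depth` values should be used with care.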


Now that we have the related entity IDs from Wikidata, we can convert them to human-readable labels.

In [53]:
def get_labels_for_ids(wikidata_ids, language='en'):
    """
    Convert a list of Wikidata IDs (including composite IDs) to human-readable labels.
    This function extracts the base ID (everything before the first hyphen), removes duplicates,
    and then queries Wikidata's API.
    
    Parameters:
      - wikidata_ids: List of strings, e.g. ["Q317521", "Q317521-XXXX", ...]
      - language: Language code for labels (default is 'en').
      
    Returns:
      A dictionary mapping base Wikidata IDs to their human-readable labels.
    """
    if not wikidata_ids:
        print("No Wikidata IDs provided for label lookup.")
        return {}

    # Extract base IDs (before the first hyphen) and remove duplicates
    base_ids = list(set(wid.split('-')[0] for wid in wikidata_ids))
    #print(f"Extracted unique base IDs: {base_ids}")  # Debugging

    url = "https://www.wikidata.org/w/api.php"
    headers = {"User-Agent": "GraphTraversalBot/1.0"}
    labels = {}

    MAX_IDS = 50  # Process in batches to avoid API limits

    for i in range(0, len(base_ids), MAX_IDS):
        batch_ids = base_ids[i:i + MAX_IDS]
        ids_param = "|".join(batch_ids)

        params = {
            "action": "wbgetentities",
            "ids": ids_param,
            "format": "json",
            "props": "labels",
            "languages": language
        }

        response = requests.get(url, params=params, headers=headers)
        try:
            data = response.json()
            #print("API Response:", data)  # Debugging

            if "error" in data:
                print(f"Error fetching labels: {data['error']}")
                continue

            for entity_id, entity_info in data.get("entities", {}).items():
                label = entity_info.get("labels", {}).get(language, {}).get("value", "Unknown")
                labels[entity_id] = label

        except requests.exceptions.JSONDecodeError:
            print(f"Failed to decode JSON response. Raw response: {response.text}")
            continue

    return labels

labels_dict = get_labels_for_ids(related_entities)

for base_id, label in labels_dict.items():
    print(f"{base_id} -> {label}")


Q202973 -> Kingston
Q122374852 -> Azure Musk
Q30686532 -> X.com
Q28874479 -> The Boring Company
Q18325434 -> Template:Elon Musk
Q604444 -> University of Pretoria
Q112957545 -> Shivon Zilis
Q105538968 -> Alexandra Musk
Q111363577 -> Asha Rose Musk
Q918 -> X
Q111167889 -> Musk family
Q1329269 -> The Wharton School
Q109731214 -> Nevada Musk
Q46845259 -> Elon Musk's Tesla Roadster
Q101674980 -> Unknown
Q7974160 -> Waterkloof House Preparatory School
Q30 -> United States of America
Q122374820 -> Strider Musk
Q193701 -> SpaceX
Q104721242 -> Jana Bezuidenhout
Q93418989 -> X Æ A-Ⅻ Musk
Q105424537 -> Smith School of Business
Q97572429 -> Kai Musk
Q122374054 -> Tau Musk
Q478214 -> Tesla, Inc.
Q258 -> South Africa
Q48817614 -> Bryanston High School
Q29043471 -> Neuralink
Q269309 -> Talulah Riley
Q16 -> Canada
Q112626243 -> X Holdings I, Inc.
Q101675234 -> Damian Musk
Q14590866 -> Category:Elon Musk
Q123885 -> Royal Society
Q131158869 -> Department of Government Efficiency
Q112626244 -> X Holdings
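The output above includes entries such as `Template:Elon Musk`, `Category:Elon Musk` and `Unknown`, which are unlikely to be useful as news search terms. One possible clean-up step is sketched below; `filter_labels` is a hypothetical helper, and the choice of prefixes to skip is an assumption you may want to adjust.

```python
def filter_labels(labels, skip_prefixes=("Template:", "Category:"), placeholder="Unknown"):
    """Drop wiki-internal pages and entities for which no label was found."""
    return {
        qid: label
        for qid, label in labels.items()
        # str.startswith accepts a tuple and matches any of its prefixes.
        if label != placeholder and not label.startswith(skip_prefixes)
    }


# A few entries copied from the output above.
sample = {
    "Q193701": "SpaceX",
    "Q18325434": "Template:Elon Musk",
    "Q14590866": "Category:Elon Musk",
    "Q101674980": "Unknown",
}
print(filter_labels(sample))  # {'Q193701': 'SpaceX'}
```

Note that an entity whose real label happens to be "Unknown" would also be dropped, because the lookup uses that string as its fallback value.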

Now let's save the results to a CSV file.

In [54]:
filename = 'labels_dict.csv'

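# Write as UTF-8 explicitly: some labels (e.g. "X Æ A-Ⅻ Musk") cannot be encoded in every platform's default encoding.
with open(filename, mode='w', newline='', encoding='utf-8') as file: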
    writer = csv.writer(file)
    writer.writerow(['Wikidata ID', 'Label'])
    for key, value in labels_dict.items():
        writer.writerow([key, value])

print(f"Data has been written to {filename}")

Data has been written to labels_dict.csv
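To reuse the saved labels in a later session, the file can be read back into a dictionary. The sketch below assumes the file was written as UTF-8 with the `Wikidata ID` and `Label` header row used above; `load_labels` is a hypothetical helper, and the round trip runs on a temporary file so that it does not touch `labels_dict.csv`.

```python
import csv
import os
import tempfile


def load_labels(path, encoding="utf-8"):
    """Read a two-column 'Wikidata ID' / 'Label' CSV back into a dictionary."""
    with open(path, mode="r", newline="", encoding=encoding) as file:
        return {row["Wikidata ID"]: row["Label"] for row in csv.DictReader(file)}


# Round trip on a temporary file with two labels from the output above.
sample = {"Q193701": "SpaceX", "Q478214": "Tesla, Inc."}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "labels_dict.csv")
    with open(path, mode="w", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        writer.writerow(["Wikidata ID", "Label"])
        writer.writerows(sample.items())
    restored = load_labels(path)

print(restored == sample)  # True
```

The label `Tesla, Inc.` contains a comma, so the `csv` module quotes it on writing and restores it intact on reading.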
