This notebook retrieves news-volume time series for the entities related to a given entity. It first traverses the Wikidata graph, starting from the entity's Wikidata ID, to collect related entities, and then uses news-signals to fetch the news-volume time series of each related entity.

Edit the cell below to set:

**entity** = The surface form (name) of the entity, used as the root of the graph traversal

**depth** = Depth of the graph search for related entities

In [2]:
entity = "Elon Musk"
depth = 1

Python library imports for this notebook, in case you want to run a particular cell separately (provided it does not depend on variables defined in earlier cells).

In [3]:
import requests

Here we perform the graph traversal using a simple BFS approach, which returns the IDs of the related entities.
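The level-by-level idea behind the traversal can be sketched on a small in-memory graph before any network calls are involved. This is only an illustration: `toy_graph` and `bfs_levels` are hypothetical names that do not appear elsewhere in the notebook, and unlike the Wikidata cell this sketch marks a node as visited as soon as it is discovered.

```python
def bfs_levels(graph, start, depth):
    """Return the set of nodes reachable from `start` within `depth` hops (start excluded)."""
    visited = {start}
    current_level = {start}
    found = set()
    for _ in range(depth):
        next_level = set()
        for node in current_level:
            # Nodes without an adjacency entry simply have no outgoing edges
            for neighbour in graph.get(node, ()):
                if neighbour not in visited:
                    visited.add(neighbour)
                    next_level.add(neighbour)
                    found.add(neighbour)
        current_level = next_level
        if not current_level:
            break  # nothing left to expand
    return found


# Hypothetical toy graph: each key maps to the nodes it links to
toy_graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "A"],
    "D": ["E"],
}

print(sorted(bfs_levels(toy_graph, "A", depth=1)))  # ['B', 'C']
print(sorted(bfs_levels(toy_graph, "A", depth=2)))  # ['B', 'C', 'D']
```

Each extra level of `depth` expands every node found on the previous level, so the number of lookups can grow quickly. In the Wikidata version each expansion is one SPARQL request.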

In [11]:

def get_wikidata_id(entity_name):
    """
    Given an entity name (e.g., 'Albert Einstein'), returns its Wikidata ID.
    Uses Wikidata's wbsearchentities API.
    """
    url = "https://www.wikidata.org/w/api.php"
    params = {
        "action": "wbsearchentities",
        "format": "json",
        "language": "en",
        "search": entity_name
    }
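    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()  # Fail early on HTTP errors instead of parsing an error page
    data = response.json()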
    
    if "search" in data and data["search"]:
        # Return the first match
        return data["search"][0]["id"]
    else:
        print(f"No Wikidata ID found for {entity_name}.")
        return None

def get_related_entities(wikidata_id, depth=1):
    """
    Given a Wikidata ID, performs a graph traversal to find related entities up to a specified depth.
    "Related" means any IRI reachable from the entity via any outgoing property.
    The results can include statement nodes, whose IDs look like 'Q317521-<UUID>'.
    
    Parameters:
    - wikidata_id: The starting Wikidata ID (e.g., 'Q937' for Albert Einstein).
    - depth: The number of hops (levels) to traverse.
    
    Returns:
    - A list of Wikidata IDs that are related to the starting entity.
    
    Note: This implementation uses iterative breadth-first search (BFS) and can produce many queries.
    """
    endpoint_url = "https://query.wikidata.org/sparql"
    headers = {
        "User-Agent": "GraphTraversalBot/1.0 (your_email@example.com) Python/requests"
    }
    
    visited = set()
    current_level = {wikidata_id}
    all_related = set()
    
    for d in range(depth):
        next_level = set()
        for item in current_level:
            # This SPARQL query finds every IRI linked from the item via any outgoing property
            query = f"""
            SELECT ?related WHERE {{
              {{
                wd:{item} ?prop ?related .
              }}
              FILTER(isIRI(?related))
            }}
            """
            response = requests.get(endpoint_url, params={'query': query, 'format': 'json'}, headers=headers, timeout=60)
            response.raise_for_status()  # Surface HTTP errors (e.g. rate limiting) before parsing
            result = response.json()
            
            for binding in result["results"]["bindings"]:
                related_uri = binding["related"]["value"]
                # Extract the Wikidata ID from the URI (e.g., http://www.wikidata.org/entity/Q42 -> Q42)
                if related_uri.startswith("http://www.wikidata.org/entity/"):
                    related_id = related_uri.split("/")[-1]
                    if related_id not in visited:
                        next_level.add(related_id)
                        all_related.add(related_id)
            visited.add(item)
        current_level = next_level
        if not current_level:
            break  # no further nodes to traverse
    return list(all_related)

# Example usage:
if __name__ == "__main__":
    wikidata_id = get_wikidata_id(entity)
    if wikidata_id:
        print(f"Wikidata ID for '{entity}': {wikidata_id}")
        # Change depth as needed; be cautious with high numbers!
        related_entities = get_related_entities(wikidata_id, depth)
        print(f"Related entities (depth {depth}): {related_entities}")
    else:
        print("Entity conversion failed. Check the input name.")


Wikidata ID for 'Elon Musk': Q317521
Related entities (depth 1): ['Q317521-E88BE26B-04B8-4598-8D31-C5A1D5BFC258', 'Q317521-f0cabca1-453a-7fd9-95b8-7dc3fd53cf30', 'Q317521-187a7265-4f76-e17d-ff2c-d4c5f02298e9', 'Q101674980', 'Q317521-36c8ec47-46b7-22d6-fd88-0c022a698c66', 'Q317521-D7BE26EE-927C-434F-8070-04132585F593', 'Q97572429', 'Q317521-40d45b83-4c01-ac07-2f87-adc03edb2ab9', 'Q258', 'Q212238', 'Q317521-a4f49af9-16e8-4b56-9475-dd19de53a304', 'Q317521-016F0E9C-FA10-4268-A015-4B076864AA0D', 'Q317521-558a5aec-bd65-494f-9ad4-7abd39fe097b', 'Q317521-0e3f10c8-4341-3129-8a18-46d711e2f9f7', 'Q317521-08eb83da-4853-d7a8-ef3b-c6710c461d15', 'Q317521-cbdb25a0-400d-2345-9c9f-984ca51c44be', 'Q317521-0011c73d-4c99-0479-a292-6dc26999cb73', 'Q7555824', 'Q317521-8A3FB706-BF18-41DE-AA22-0712F3D3C502', 'Q317521-1A41A0F1-8840-4371-83DA-2939672D5633', 'Q317521-74850a75-4d50-bdc7-ce03-41de1ce73a58', 'Q317521-184877ea-45ef-ed6c-14c2-5d16d088c98b', 'Q604444', 'Q317521-384975c2-4188-9bc9-f3c8-5a3ca32d5976', '
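Many of the IDs above have the form `Q317521-<UUID>`. These appear to be statement nodes rather than entities: in Wikidata's RDF model, statement IRIs live under `http://www.wikidata.org/entity/statement/`, which also passes the `startswith("http://www.wikidata.org/entity/")` check in the traversal. As an alternative to trimming them at the hyphen, the list can be filtered down to plain item IDs. The sketch below is a minimal version of that idea; `keep_item_ids` and `ITEM_ID` are hypothetical names, and it assumes that only items (`Q…`) are wanted.

```python
import re

# Assumption: a plain Wikidata item ID is "Q" followed by digits only.
# Statement nodes ("Q317521-<UUID>"), lexemes ("L485") and properties ("P31") do not match.
ITEM_ID = re.compile(r"Q\d+")


def keep_item_ids(raw_ids, exclude=()):
    """Return unique item IDs in numeric order, dropping anything that is not exactly Q<digits>."""
    items = {rid for rid in raw_ids if ITEM_ID.fullmatch(rid)}
    return sorted(items - set(exclude), key=lambda qid: int(qid[1:]))


sample = [
    "Q317521-E88BE26B-04B8-4598-8D31-C5A1D5BFC258",  # statement node
    "Q101674980",
    "Q258",
    "L485",  # lexeme
    "Q258",  # duplicate
]
print(keep_item_ids(sample))  # ['Q258', 'Q101674980']
```

Passing the root ID through `exclude` (for example `keep_item_ids(related_entities, exclude=[wikidata_id])`) would keep the starting entity out of its own list of related entities.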

Now that we have the related entity IDs from Wikidata, we can convert them back to human-readable labels.

In [14]:
import requests

def get_labels_for_ids(wikidata_ids, language='en'):
    """
    Convert a list of Wikidata IDs (including composite IDs) to human-readable labels.
    This function extracts the base ID (everything before the first hyphen), removes duplicates,
    and then queries Wikidata's API.
    
    Parameters:
      - wikidata_ids: List of strings, e.g. ["Q317521", "Q317521-XXXX", ...]
      - language: Language code for labels (default is 'en').
      
    Returns:
      A dictionary mapping base Wikidata IDs to their human-readable labels.
    """
    if not wikidata_ids:
        print("No Wikidata IDs provided for label lookup.")
        return {}

    # Extract base IDs (before the first hyphen) and remove duplicates
    base_ids = list(set(wid.split('-')[0] for wid in wikidata_ids))
    #print(f"Extracted unique base IDs: {base_ids}")  # Debugging

    url = "https://www.wikidata.org/w/api.php"
    headers = {"User-Agent": "GraphTraversalBot/1.0"}
    labels = {}

    MAX_IDS = 50  # Process in batches to avoid API limits

    for i in range(0, len(base_ids), MAX_IDS):
        batch_ids = base_ids[i:i + MAX_IDS]
        ids_param = "|".join(batch_ids)

        params = {
            "action": "wbgetentities",
            "ids": ids_param,
            "format": "json",
            "props": "labels",
            "languages": language
        }

        response = requests.get(url, params=params, headers=headers)
        try:
            data = response.json()
            #print("API Response:", data)  # Debugging

            if "error" in data:
                print(f"Error fetching labels: {data['error']}")
                continue

            for entity_id, entity_info in data.get("entities", {}).items():
                label = entity_info.get("labels", {}).get(language, {}).get("value", "Unknown")
                labels[entity_id] = label

        except requests.exceptions.JSONDecodeError:
            print(f"Failed to decode JSON response. Raw response: {response.text}")
            continue

    return labels

# Get the cleaned labels
labels_dict = get_labels_for_ids(related_entities)

# Print each base ID alongside its human-readable label.
for base_id, label in labels_dict.items():
    print(f"{base_id} -> {label}")


Q28874479 -> The Boring Company
Q18325434 -> Template:Elon Musk
Q317521 -> Elon Musk
Q111363577 -> Asha Rose Musk
Q3908516 -> entrepreneurship
Q101674980 -> Unknown
Q122374820 -> Strider Musk
Q21684304 -> Musk
Q30144811 -> Stephen Hawking Medal For Science Communication
Q82955 -> politician
Q23581190 -> Living Legends of Aviation
Q97572429 -> Kai Musk
Q5482740 -> programmer
Q105424537 -> Smith School of Business
Q122374054 -> Tau Musk
Q258 -> South Africa
Q269309 -> Talulah Riley
Q212238 -> civil servant
Q123885 -> Royal Society
Q112626244 -> X Holdings II, Inc.
Q131158869 -> Department of Government Efficiency
Q1989 -> Saskatchewan
Q112626245 -> X Holdings III, LLC
Q10341331 -> Order of Defence Merit
Q21708200 -> OpenAI
Q81096 -> engineer
Q209896 -> honorary degree
Q3926 -> Pretoria
Q104101703 -> Clubhouse
Q7555824 -> SolarCity
L485 -> Unknown
Q140686 -> chairperson
Q7827568 -> Tosca Musk
Q274490 -> public finance
Q604370 -> Time 100
Q84606034 -> Member of the National Academy of Engi