# GraphRAG API Demo

This notebook is a tutorial that demonstrates how to use the GraphRAG solution accelerator API.

## Existing APIs

| HTTP Method | Endpoint |
|-------------|----------|
| GET         | /data |
| POST        | /data |
| DELETE      | /data/{storage_name} |
| GET         | /index |
| POST        | /index |
| DELETE      | /index/{index_name} |
| GET         | /index/status/{index_name} |
| GET         | /index/config/entity |
| PUT         | /index/config/entity |
| POST        | /index/config/entity |
| GET         | /index/config/entity/{entity_configuration_name} |
| DELETE      | /index/config/entity/{entity_configuration_name} |
| POST        | /query/global |
| POST        | /query/local |
| GET         | /graph/graphml/{index_name} |
| GET         | /graph/stats/{index_name} |
| GET         | /source/report/{index_name}/{report_id} |
| GET         | /source/text/{index_name}/{text_unit_id} |
| GET         | /source/entity/{index_name}/{entity_id} |
| GET         | /source/claim/{index_name}/{claim_id} |
| GET         | /source/relationship/{index_name}/{relationship_id} |

## Prerequisites
Install the third-party packages that are not part of the Python Standard Library.

In [None]:
! pip install devtools pandas python-magic requests tqdm

In [None]:
import getpass
import json
import sys
import time
from pathlib import Path

import magic
import pandas as pd
import requests
from devtools import pprint
from tqdm import tqdm

## Configuration required by User


#### Get API Key for API Management Service
For authentication, the API requires a *subscription key* to be passed in the header of all requests. To find this key, visit the Azure Portal. The API subscription key will be located under `<my_resource_group> --> <API Management service> --> <APIs> --> <Subscriptions> --> <Built-in all-access subscription> Primary Key`.
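
As a rough sketch of how the key travels with each request, the snippet below attaches it to a `requests.Session` so that every call made through that session carries the `Ocp-Apim-Subscription-Key` header. The `make_session` helper and the placeholder key value are illustrative assumptions and are not part of the accelerator.

```python
import requests


def make_session(subscription_key: str) -> requests.Session:
    """Return a session that sends the APIM subscription key on every request.

    `make_session` is a hypothetical helper used for illustration only.
    """
    session = requests.Session()
    session.headers.update({"Ocp-Apim-Subscription-Key": subscription_key})
    return session


# placeholder value; use the Primary Key copied from the Azure Portal
session = make_session("<my-subscription-key>")
print(session.headers["Ocp-Apim-Subscription-Key"])
```

The helper functions later in this notebook pass an equivalent `headers` dictionary to each call instead of using a session; either approach sends the same header.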

In [None]:
ocp_apim_subscription_key = getpass.getpass(
    "Enter the subscription key to the GraphRAG APIM:"
)

#### Setup directories and API endpoint

The following parameters are required to access and use the GraphRAG solution accelerator API:
* file_directory
* storage_name
* index_name
* endpoint

For demonstration purposes, you may use the provided `get-wiki-articles.py` script to download a small set of wikipedia articles or provide your own data.
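
Before filling in the parameters, a quick pre-flight check can catch common mistakes. The sketch below is one possible approach and is not part of the accelerator: `check_parameters` is a hypothetical helper that confirms the directory contains `.txt` files and strips a trailing slash from the endpoint, since the helper functions in this notebook build URLs by concatenating strings such as `endpoint + "/data"`.

```python
from pathlib import Path


def check_parameters(file_directory: str, endpoint: str) -> tuple[list[Path], str]:
    """Hypothetical pre-flight check for the notebook parameters.

    Returns the .txt files found in `file_directory` and the endpoint with any
    trailing slash removed, so that endpoint + "/data" does not contain "//data".
    """
    directory = Path(file_directory)
    if not directory.is_dir():
        raise ValueError(f"{file_directory} is not a directory")
    txt_files = sorted(p for p in directory.iterdir() if p.suffix == ".txt")
    if not txt_files:
        raise ValueError(f"no .txt files found in {file_directory}")
    return txt_files, endpoint.rstrip("/")
```

For example, `check_parameters("./data", "https://example.azure-api.net/")` (both values are placeholders) would return the sorted `.txt` paths in `./data` together with `"https://example.azure-api.net"`.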

In [None]:
"""
These parameters must be defined by the user:

- file_directory: local directory where data files of interest are stored.
- storage_name: unique name for an Azure blob storage container where files will be uploaded.
- index_name: unique name for a single knowledge graph construction. Multiple indexes can be created from the same blob container of data.
- endpoint: the endpoint URL for GraphRAG service (this is the Gateway URL found in the APIM resource).
"""

file_directory = ""
storage_name = ""
index_name = ""
endpoint = ""

In [None]:
assert (
    file_directory != "" and storage_name != "" and index_name != "" and endpoint != ""
)

## Helper Functions
We've provided helper functions below that encapsulate HTTP requests to make API interaction more intuitive.

In [None]:
"""
"Ocp-Apim-Subscription-Key": 
    This is a custom HTTP header used by Azure API Management service (APIM) to 
    authenticate API requests. The value for this key should be set to the subscription 
    key provided by the Azure APIM instance in your GraphRAG resource group.
"""
headers = {"Ocp-Apim-Subscription-Key": ocp_apim_subscription_key}


def upload_files(
    file_directory: str,
    storage_name: str,
    batch_size: int = 100,
    overwrite: bool = True,
    max_retries: int = 5,
) -> requests.Response | None:
    """
    Upload files to a blob storage container.

    Args:
    file_directory - a local directory of .txt files to upload. All files must be in utf-8 encoding.
    storage_name - a unique name for the Azure storage container.
    batch_size - the number of files to upload in a single batch.
    overwrite - whether or not to overwrite files if they already exist in the storage container.
    max_retries - the maximum number of times to retry uploading a batch of files if the API is busy.

    NOTE: Uploading files may sometimes fail if the blob container was recently deleted
    (i.e. a few seconds before). The solution "in practice" is to sleep a few seconds and try again.
    """
    url = endpoint + "/data"

    def upload_batch(
        files: list, storage_name: str, overwrite: bool, max_retries: int
    ) -> requests.Response:
        for _ in range(max_retries):
            response = requests.post(
                url=url,
                files=files,
                params={"storage_name": storage_name, "overwrite": overwrite},
                headers=headers,
            )
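            # API may be busy, retry
            if response.status_code == 500:
                print("API busy. Sleeping and will try again.")
                time.sleep(10)
                # rewind the file handles so the retry re-sends their full contents
                for _, file_handle in files:
                    file_handle.seek(0)
                continue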
            return response
        return response

    batch_files = []
    response = None  # remains None if the directory contains no valid files
    accepted_file_types = ["text/plain"]
    filepaths = list(Path(file_directory).iterdir())
    for file in tqdm(filepaths):
        # validate that file is a regular file, has a .txt extension, and has an accepted MIME type
        if (
            not file.is_file()
            or file.suffix != ".txt"
            or magic.from_file(str(file), mime=True) not in accepted_file_types
        ):
            print(f"Skipping invalid file: {file}")
            continue
        # open and decode file as utf-8, ignore bad characters
        batch_files.append(
            ("files", open(file=file, mode="r", encoding="utf-8", errors="ignore"))
        )
        # upload batch of files
        if len(batch_files) == batch_size:
            response = upload_batch(batch_files, storage_name, overwrite, max_retries)
            # close the file handles now that the batch has been sent
            for _, file_handle in batch_files:
                file_handle.close()
            # if response is not ok, return early
            if not response.ok:
                return response
            batch_files.clear()
    # upload remaining files
    if len(batch_files) > 0:
        response = upload_batch(batch_files, storage_name, overwrite, max_retries)
        for _, file_handle in batch_files:
            file_handle.close()
    return response


def delete_files(storage_name: str) -> requests.Response:
    """Delete a blob storage container."""
    url = endpoint + f"/data/{storage_name}"
    return requests.delete(url=url, headers=headers)


def list_files() -> requests.Response:
    """List all data storage containers."""
    url = endpoint + "/data"
    return requests.get(url=url, headers=headers)


def build_index(
    storage_name: str,
    index_name: str,
    entity_config_name: str | None = None,
    merge_with_index: str | None = None,
) -> requests.Response:
    """Create a search index.
    This function kicks off a job that builds a knowledge graph (KG) index from files located in a blob storage container.
    """
    url = endpoint + "/index"
    request = {
        "storage_name": storage_name,
        "index_name": index_name,
        "entity_config_name": entity_config_name,
        "merge_with_index": merge_with_index,
    }
    return requests.post(url, json=request, headers=headers)


def delete_index(index_name: str) -> requests.Response:
    """Delete a search index."""
    url = endpoint + f"/index/{index_name}"
    return requests.delete(url, headers=headers)


def list_indexes() -> list:
    """List all search indexes."""
    url = endpoint + "/index"
    response = requests.get(url, headers=headers)
    try:
        indexes = json.loads(response.text)
        return indexes["index_name"]
    except json.JSONDecodeError:
        print(response.text)
        return response


def index_status(index_name: str) -> requests.Response:
    """Check the status of an indexing job."""
    url = endpoint + f"/index/status/{index_name}"
    return requests.get(url, headers=headers)


def list_entity_configs() -> list:
    """List all entity configurations."""
    url = endpoint + "/index/config/entity"
    response = requests.get(url, headers=headers)
    try:
        entity_types = json.loads(response.text)
        return entity_types["entity_configuration_name"]
    except json.JSONDecodeError:
        print(response.text)
        return response


def create_entity_config(
    name: str,
    entity_type: list[str],
    examples,
) -> requests.Response:
    """Create a new entity configuration."""
    url = endpoint + "/index/config/entity"
    request = {
        "entity_configuration_name": name,
        "entity_types": entity_type,
        "entity_examples": examples,
    }
    return requests.post(url=url, json=request, headers=headers)


def delete_entity_config(name: str) -> requests.Response:
    """Delete an existing entity configuration."""
    url = endpoint + f"/index/config/entity/{name}"
    return requests.delete(url, headers=headers)


def modify_entity_config(
    name: str,
    entity_type: list[str],
    examples,
) -> requests.Response:
    """Modify an existing entity configuration."""
    url = endpoint + "/index/config/entity"
    request = {
        "entity_configuration_name": name,
        "entity_types": entity_type,
        "entity_examples": examples,
    }
    return requests.put(url=url, json=request, headers=headers)


def get_entity_config(name: str) -> requests.Response:
    """Get an existing entity configuration."""
    url = endpoint + f"/index/config/entity/{name}"
    return requests.get(url, headers=headers)


def global_search(index_name: str | list[str], query: str) -> requests.Response:
    """Run a global query over the knowledge graph(s) associated with one or more indexes"""
    url = endpoint + "/query/global"
    request = {"index_name": index_name, "query": query}
    return requests.post(url, json=request, headers=headers)


def global_search_streaming(
    index_name: str | list[str], query: str
) -> None:
    """Run a global query across one or more indexes and stream back the response"""
    url = endpoint + "/experimental/query/global/streaming"
    request = {"index_name": index_name, "query": query}
    context_list = []
    with requests.post(url, json=request, headers=headers, stream=True) as r:
        r.raise_for_status()
        for chunk in r.iter_lines(chunk_size=256 * 1024, decode_unicode=True):
            try:
                payload = json.loads(chunk)
                token = payload["token"]
                context = payload["context"]
                if token != "<EOM>":
                    print(token, end="")
                elif (token == "<EOM>") and not context:
                    print("\n")  # transition from output message to context
                else:
                    context_list.append(context)
            except json.JSONDecodeError:
                print(type(chunk), len(chunk), sys.getsizeof(chunk), chunk, end="\n")
    display(pd.DataFrame.from_dict(context_list).head(10))


def local_search(index_name: str | list[str], query: str) -> requests.Response:
    """Run a local query over the knowledge graph(s) associated with one or more indexes"""
    url = endpoint + "/query/local"
    request = {"index_name": index_name, "query": query}
    return requests.post(url, json=request, headers=headers)


def get_graph_stats(index_name: str) -> requests.Response:
    """Get basic statistics about the knowledge graph constructed by GraphRAG."""
    url = endpoint + f"/graph/stats/{index_name}"
    return requests.get(url, headers=headers)


def save_graphml_file(index_name: str, graphml_file_name: str) -> None:
    """Retrieve and save a graphml file that represents the knowledge graph.
    The file is downloaded in chunks and saved to the local file system.
    """
    url = endpoint + f"/graph/graphml/{index_name}"
    if Path(graphml_file_name).suffix != ".graphml":
        raise ValueError(f"{graphml_file_name} must have a .graphml file extension")
    with requests.get(url, headers=headers, stream=True) as r:
        r.raise_for_status()
        with open(graphml_file_name, "wb") as f:
            for chunk in r.iter_content(chunk_size=1024):
                f.write(chunk)


def get_report(index_name: str, report_id: str) -> requests.Response:
    """Retrieve a report generated by GraphRAG for a specific index."""
    url = endpoint + f"/source/report/{index_name}/{report_id}"
    return requests.get(url, headers=headers)


def get_entity(index_name: str, entity_id: str) -> requests.Response:
    """Retrieve an entity generated by GraphRAG for a specific index."""
    url = endpoint + f"/source/entity/{index_name}/{entity_id}"
    return requests.get(url, headers=headers)


def get_relationship(index_name: str, relationship_id: str) -> requests.Response:
    """Retrieve a relationship generated by GraphRAG for a specific index."""
    url = endpoint + f"/source/relationship/{index_name}/{relationship_id}"
    return requests.get(url, headers=headers)


def get_claim(index_name: str, claim_id: str) -> requests.Response:
    """Retrieve a claim/covariate generated by GraphRAG for a specific index."""
    url = endpoint + f"/source/claim/{index_name}/{claim_id}"
    return requests.get(url, headers=headers)


def get_text_unit(index_name: str, text_unit_id: str) -> requests.Response:
    """Retrieve a text unit generated by GraphRAG for a specific index."""
    url = endpoint + f"/source/text/{index_name}/{text_unit_id}"
    return requests.get(url, headers=headers)


def parse_query_response(
    response: requests.Response, return_context_data: bool = False
) -> requests.Response | dict[str, list[dict]]:
    """
    Prints response['result'] value and optionally
    returns associated context data.
    """
    if response.ok:
        print(json.loads(response.text)["result"])
        if return_context_data:
            return json.loads(response.text)["context_data"]
        return response
    else:
        print(response.reason)
        print(response.content)
        return response

## Upload files
Use the API to upload a collection of local files. The API will automatically create a new blob storage container to host these files. For a set of large files, consider reducing the batch upload size to avoid overwhelming the API endpoint and to prevent out-of-memory problems.
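
To make the effect of the batch size concrete, here is a minimal sketch of how a list of files splits into batches. `iter_batches` is a hypothetical helper written for this illustration; the `upload_files` function in this notebook accumulates its batches inline instead.

```python
from collections.abc import Iterator
from pathlib import Path


def iter_batches(paths: list[Path], batch_size: int) -> Iterator[list[Path]]:
    """Yield consecutive slices of `paths`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield paths[start : start + batch_size]


# five hypothetical files uploaded two at a time need three requests
paths = [Path(f"doc_{i}.txt") for i in range(5)]
batch_sizes = [len(batch) for batch in iter_batches(paths, batch_size=2)]
print(batch_sizes)  # [2, 2, 1]
```

A smaller batch size means more requests, but each request carries less data and holds fewer open files in memory at once.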

In [None]:
response = upload_files(
    file_directory=file_directory,
    storage_name=storage_name,
    batch_size=100,
    overwrite=True,
)
print(response)

#### To list all existing data storage containers:

In [None]:
response = list_files()
print(response)
pprint(response.json())

#### To remove files from the GraphRAG service:

In [None]:
# # uncomment this cell to delete data container
# response = delete_files(storage_name)
# print(response)
# pprint(response.text)

## Entity Configuration (optional)

GraphRAG builds a knowledge graph (KG) from data based on the ability to first identify entities and the relationships between them. Defining a _good_ schema for entities can be a critical step to constructing a high-quality KG. An example is provided below.

Note: Defining an entity configuration is optional but highly encouraged for better performance in domain-specific scenarios. If no entity configuration is provided, the default entity configuration from the `graphrag` Python package is used.

#### Create a new entity configuration

An entity configuration object consists of a list of entity types that we will ask the LLM to identify, along with a few examples to be used for few-shot prompting.
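
The `output` field of each example follows a delimiter-based record format, which is tedious to write by hand. The sketch below shows one way to assemble it. The helper names (`entity_record`, `relationship_record`, `build_output`) are hypothetical, and the trailing integer on a relationship record is assumed here to be a relationship strength score. The `{tuple_delimiter}`, `{record_delimiter}`, and `{completion_delimiter}` placeholders are kept as literal text, matching the examples in this notebook.

```python
# literal placeholders, kept verbatim in the example output
TD = "{tuple_delimiter}"
RD = "{record_delimiter}"
CD = "{completion_delimiter}"


def entity_record(name: str, entity_type: str, description: str) -> str:
    """Format one entity record."""
    return f'("entity"{TD}{name}{TD}{entity_type}{TD}{description})'


def relationship_record(source: str, target: str, description: str, strength: int) -> str:
    """Format one relationship record (strength is an assumed interpretation)."""
    return f'("relationship"{TD}{source}{TD}{target}{TD}{description}{TD}{strength})'


def build_output(records: list[str]) -> str:
    """Join records with the record delimiter and end with the completion delimiter."""
    return f"\n{RD}\n".join(records) + f"\n{CD}"


output = build_output(
    [
        entity_record("ARM", "ORGANIZATION", "Arm is a chip designer"),
        relationship_record("ARM", "SOFTBANK", "SoftBank took Arm private in 2016", 5),
    ]
)
print(output)
```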

In [None]:
# provide a unique name to refer to the entity configuration by
entity_configuration_name = f"{index_name}_entity_schema"

# provide a list of entity type labels
entity_types = ["ORGANIZATION"]

# provide a self-labeled example of how the entity types will appear in the data
entity_examples = [
    {
        "entity_types": "ORGANIZATION",
        "text": "Arm's (ARM) stock skyrocketed in its opening day on the Nasdaq Thursday. But IPO experts warn that the British chipmaker's debut on the public markets isn't indicative of how other newly listed companies may perform.\n\nArm, a formerly public company, was taken private by SoftBank in 2016. The well-established chip designer says it powers 99% of premium smartphones.",
        "output": '("entity"{tuple_delimiter}ARM{tuple_delimiter}ORGANIZATION{tuple_delimiter}Arm is a stock now listed on the Nasdaq which powers 99% of premium smartphones)\n{record_delimiter}\n("entity"{tuple_delimiter}SOFTBANK{tuple_delimiter}ORGANIZATION{tuple_delimiter}SoftBank is a firm that previously owned Arm)\n{record_delimiter}\n("relationship"{tuple_delimiter}ARM{tuple_delimiter}SOFTBANK{tuple_delimiter}SoftBank formerly owned Arm from 2016 until present{tuple_delimiter}5)\n{completion_delimiter}',
    }
]

# upload and save the entity configuration
response = create_entity_config(
    entity_configuration_name, entity_types, entity_examples
)
print(response)
if response.ok:
    print(response.text)

#### Modify an existing entity configuration
An existing entity configuration object can be modified. Note that the update process is a full replacement, not additive (i.e. updating the entity schema overwrites the existing schema).
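
Because an update replaces the whole configuration, additions have to be merged with the existing values on the client side before they are sent. The following is a minimal sketch of that merge. `merge_entity_config` is a hypothetical helper, and it assumes the retrieved configuration uses the same field names as the request body (`entity_configuration_name`, `entity_types`, `entity_examples`).

```python
def merge_entity_config(
    existing: dict, new_types: list[str], new_examples: list[dict]
) -> dict:
    """Combine an existing configuration with additions into a full replacement payload."""
    merged_types = list(existing.get("entity_types", []))
    for entity_type in new_types:
        if entity_type not in merged_types:  # keep order, avoid duplicates
            merged_types.append(entity_type)
    return {
        "entity_configuration_name": existing["entity_configuration_name"],
        "entity_types": merged_types,
        "entity_examples": list(existing.get("entity_examples", [])) + list(new_examples),
    }


existing = {
    "entity_configuration_name": "demo_entity_schema",
    "entity_types": ["ORGANIZATION"],
    "entity_examples": [{"entity_types": "ORGANIZATION", "text": "...", "output": "..."}],
}
merged = merge_entity_config(existing, ["ORGANIZATION", "GEO", "PERSON"], [])
print(merged["entity_types"])  # ['ORGANIZATION', 'GEO', 'PERSON']
```

The merged types and examples could then be passed to `modify_entity_config` as the complete new configuration.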

In [None]:
# provide a list of entity type labels
entity_types = ["ORGANIZATION", "GEO", "PERSON"]

# provide a few self-labeled examples of how the entity types will appear in the data
entity_examples = [
    {
        "entity_types": "ORGANIZATION, PERSON",
        "text": "The Fed is scheduled to meet on Tuesday and Wednesday, with the central bank planning to release its latest policy decision on Wednesday at 2:00 p.m. ET, followed by a press conference where Fed Chair Jerome Powell will take questions. Investors expect the Federal Open Market Committee to hold its benchmark interest rate steady in a range of 5.25%-5.5%.",
        "output": '("entity"{tuple_delimiter}FED{tuple_delimiter}ORGANIZATION{tuple_delimiter}The Fed is the Federal Reserve, which will set interest rates on Tuesday and Wednesday)\n{record_delimiter}\n("entity"{tuple_delimiter}JEROME POWELL{tuple_delimiter}PERSON{tuple_delimiter}Jerome Powell is the chair of the Federal Reserve)\n{record_delimiter}\n("entity"{tuple_delimiter}FEDERAL OPEN MARKET COMMITTEE{tuple_delimiter}ORGANIZATION{tuple_delimiter}The Federal Reserve committee makes key decisions about interest rates and the growth of the United States money supply)\n{record_delimiter}\n("relationship"{tuple_delimiter}JEROME POWELL{tuple_delimiter}FED{tuple_delimiter}Jerome Powell is the Chair of the Federal Reserve and will answer questions at a press conference{tuple_delimiter}9)\n{completion_delimiter}',
    },
    {
        "entity_types": "ORGANIZATION",
        "text": "Arm's (ARM) stock skyrocketed in its opening day on the Nasdaq Thursday. But IPO experts warn that the British chipmaker's debut on the public markets isn't indicative of how other newly listed companies may perform.\n\nArm, a formerly public company, was taken private by SoftBank in 2016. The well-established chip designer says it powers 99% of premium smartphones.",
        "output": '("entity"{tuple_delimiter}ARM{tuple_delimiter}ORGANIZATION{tuple_delimiter}Arm is a stock now listed on the Nasdaq which powers 99% of premium smartphones)\n{record_delimiter}\n("entity"{tuple_delimiter}SOFTBANK{tuple_delimiter}ORGANIZATION{tuple_delimiter}SoftBank is a firm that previously owned Arm)\n{record_delimiter}\n("relationship"{tuple_delimiter}ARM{tuple_delimiter}SOFTBANK{tuple_delimiter}SoftBank formerly owned Arm from 2016 until present{tuple_delimiter}5)\n{completion_delimiter}',
    },
    {
        "entity_types": "ORGANIZATION,GEO,PERSON",
        "text": "Five Americans jailed for years in Iran and widely regarded as hostages are on their way home to the United States.\n\nThe last pieces in a controversial swap mediated by Qatar fell into place when $6bn (£4.8bn) of Iranian funds held in South Korea reached banks in Doha.\n\nIt triggered the departure of the four men and one woman in Tehran, who are also Iranian citizens, on a chartered flight to Qatar's capital.\n\nThey were met by senior US officials and are now on their way to Washington.\n\nThe Americans include 51-year-old businessman Siamak Namazi, who has spent nearly eight years in Tehran's notorious Evin prison, as well as businessman Emad Shargi, 59, and environmentalist Morad Tahbaz, 67, who also holds British nationality.",
        "output": '("entity"{tuple_delimiter}IRAN{tuple_delimiter}GEO{tuple_delimiter}Iran held American citizens as hostages)\n{record_delimiter}\n("entity"{tuple_delimiter}UNITED STATES{tuple_delimiter}GEO{tuple_delimiter}Country seeking to release hostages)\n{record_delimiter}\n("entity"{tuple_delimiter}QATAR{tuple_delimiter}GEO{tuple_delimiter}Country that negotiated a swap of money in exchange for hostages)\n{record_delimiter}\n("entity"{tuple_delimiter}SOUTH KOREA{tuple_delimiter}GEO{tuple_delimiter}Country holding funds from Iran)\n{record_delimiter}\n("entity"{tuple_delimiter}TEHRAN{tuple_delimiter}GEO{tuple_delimiter}Capital of Iran where the Iranian hostages were being held)\n{record_delimiter}\n("entity"{tuple_delimiter}DOHA{tuple_delimiter}GEO{tuple_delimiter}Capital city in Qatar)\n{record_delimiter}\n("entity"{tuple_delimiter}WASHINGTON{tuple_delimiter}GEO{tuple_delimiter}Capital city in United States)\n{record_delimiter}\n("entity"{tuple_delimiter}SIAMAK NAMAZI{tuple_delimiter}PERSON{tuple_delimiter}Hostage who spent time in Tehran\'s Evin prison)\n{record_delimiter}\n("entity"{tuple_delimiter}EVIN PRISON{tuple_delimiter}GEO{tuple_delimiter}Notorious prison in Tehran)\n{record_delimiter}\n("entity"{tuple_delimiter}EMAD SHARGI{tuple_delimiter}PERSON{tuple_delimiter}Businessman who was held hostage)\n{record_delimiter}\n("entity"{tuple_delimiter}MORAD TAHBAZ{tuple_delimiter}PERSON{tuple_delimiter}British national and environmentalist who was held hostage)\n{record_delimiter}\n("relationship"{tuple_delimiter}IRAN{tuple_delimiter}UNITED STATES{tuple_delimiter}Iran negotiated a hostage exchange with the United States{tuple_delimiter}2)\n{record_delimiter}\n("relationship"{tuple_delimiter}QATAR{tuple_delimiter}UNITED STATES{tuple_delimiter}Qatar brokered the hostage exchange between Iran and the United States{tuple_delimiter}2)\n{record_delimiter}\n("relationship"{tuple_delimiter}QATAR{tuple_delimiter}IRAN{tuple_delimiter}Qatar brokered the hostage exchange between Iran and the United States{tuple_delimiter}2)\n{record_delimiter}\n("relationship"{tuple_delimiter}SIAMAK NAMAZI{tuple_delimiter}EVIN PRISON{tuple_delimiter}Siamak Namazi was a prisoner at Evin prison{tuple_delimiter}8)\n{record_delimiter}\n("relationship"{tuple_delimiter}SIAMAK NAMAZI{tuple_delimiter}MORAD TAHBAZ{tuple_delimiter}Siamak Namazi and Morad Tahbaz were exchanged in the same hostage release{tuple_delimiter}2)\n{record_delimiter}\n("relationship"{tuple_delimiter}SIAMAK NAMAZI{tuple_delimiter}EMAD SHARGI{tuple_delimiter}Siamak Namazi and Emad Shargi were exchanged in the same hostage release{tuple_delimiter}2)\n{record_delimiter}\n("relationship"{tuple_delimiter}MORAD TAHBAZ{tuple_delimiter}EMAD SHARGI{tuple_delimiter}Morad Tahbaz and Emad Shargi were exchanged in the same hostage release{tuple_delimiter}2)\n{record_delimiter}\n("relationship"{tuple_delimiter}SIAMAK NAMAZI{tuple_delimiter}IRAN{tuple_delimiter}Siamak Namazi was a hostage in Iran{tuple_delimiter}2)\n{record_delimiter}\n("relationship"{tuple_delimiter}MORAD TAHBAZ{tuple_delimiter}IRAN{tuple_delimiter}Morad Tahbaz was a hostage in Iran{tuple_delimiter}2)\n{record_delimiter}\n("relationship"{tuple_delimiter}EMAD SHARGI{tuple_delimiter}IRAN{tuple_delimiter}Emad Shargi was a hostage in Iran{tuple_delimiter}2)\n{completion_delimiter}',
    },
]

# upload and overwrite the entity configuration
response = modify_entity_config(
    entity_configuration_name, entity_types, entity_examples
)
print(response)
if response.ok:
    print(response.text)

#### Get entity configuration
To retrieve an entity configuration that has been previously uploaded:

In [None]:
response = get_entity_config(entity_configuration_name)
print(response)
if response.ok:
    for example in json.loads(response.text)["entity_examples"]:
        for k, v in example.items():
            print("{}: {}\n".format(k, v))
        print()
else:
    print(response.text)

#### List entity configuration
To see which entity configurations currently exist in the GraphRAG service, retrieve a list of all previously created entity configurations.

In [None]:
all_entity_configs = list_entity_configs()
pprint(all_entity_configs)

#### Delete an existing entity configuration
If an entity schema is no longer needed, remove it from the GraphRAG service.

In [None]:
# # uncomment this cell to delete entity configuration
# response = delete_entity_config(entity_configuration_name)
# print(response)
# pprint(response.json())

## Indexing

After data files have been uploaded and an (optional) entity configuration has been created in the GraphRAG service, it is now possible to construct a knowledge graph by creating a search index. If an entity configuration is not provided, a default entity configuration will be used that has been shown to generally work well.

#### Start a new indexing job

In [None]:
response = build_index(
    storage_name=storage_name,
    index_name=index_name,
    entity_config_name=entity_configuration_name
    if "entity_configuration_name" in locals()
    else None,
)
print(response)
if response.ok:
    print(response.text)
else:
    print(f"Failed to submit job.\nStatus: {response.text}")

Note: An indexing job may sometimes fail because of insufficient tokens-per-minute (TPM) quota for the GPT-4 Turbo model. In this situation, restart the job by re-running the cell above with the same parameters. `graphrag` caches previous indexing results as a cost-saving measure, so a restarted job will "pick up" where the last job stopped.

#### Check the status of an indexing job
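
Indexing can take a while, so it may be convenient to poll the status endpoint until the job finishes. The sketch below is one possible polling loop, not part of the accelerator. `wait_for_index` is a hypothetical helper, and it assumes the status payload contains a `status` field whose terminal values are `"complete"` and `"failed"`; inspect a real status response before relying on those names.

```python
import time
from typing import Callable


def wait_for_index(
    fetch_status: Callable[[], dict],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
) -> dict:
    """Poll until the job reports a terminal status or the poll budget runs out.

    Assumption: the payload has a "status" field with terminal values
    "complete" and "failed".
    """
    terminal = {"complete", "failed"}
    for _ in range(max_polls):
        payload = fetch_status()
        if payload.get("status") in terminal:
            return payload
        time.sleep(poll_seconds)
    raise TimeoutError("indexing job did not reach a terminal status in time")


# demonstration with canned payloads instead of a live service
fake_payloads = iter([{"status": "running"}, {"status": "running"}, {"status": "complete"}])
final = wait_for_index(lambda: next(fake_payloads), poll_seconds=0)
print(final)
```

Against a live service, the callable could be `lambda: index_status(index_name).json()`, using the helper function defined earlier in this notebook.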

In [None]:
response = index_status(index_name)
print(response)
pprint(response.json())

#### List indexes
To view a list of all indexes that exist in the GraphRAG service:

In [None]:
all_indexes = list_indexes()
pprint(all_indexes)

#### Delete an index
If an index is no longer needed, remove it from the GraphRAG service.

In [None]:
# # uncomment this cell to delete the index
# response = delete_index(index_name)
# print(response)
# pprint(response.json())

## Query

After an indexing job has completed, the knowledge graph is ready to query. Two types of queries (global and local) are currently supported. In addition, you can issue a query over a single index or multiple indexes.

#### Global Search

Global search queries are resource-intensive, but give good responses to questions that require an understanding of the dataset as a whole.

In [None]:
%%time
# pass a single index name as a string; to query across multiple indexes, pass a list, e.g. index_name=["myindex1", "myindex2"]
global_response = global_search(
    index_name=index_name, query="Summarize the main topics of this data"
)
# print the result and save context data in a variable
global_response_data = parse_query_response(global_response, return_context_data=True)
global_response_data

An *experimental* API endpoint has been designed to support streaming back the graphrag response while executing a global query (useful in chatbot applications).
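
The streamed response arrives as one JSON object per line. The sketch below mirrors the parsing logic of the `global_search_streaming` helper in this notebook, but collects the pieces instead of printing them, which may be easier to reuse in a chatbot. `collect_stream` is a hypothetical name, and the `token`, `context`, and `<EOM>` conventions are taken from that helper rather than from a published specification.

```python
import json
from collections.abc import Iterable


def collect_stream(lines: Iterable[str]) -> tuple[str, list]:
    """Split streamed JSON lines into the answer text and the trailing context payloads."""
    tokens, contexts = [], []
    for line in lines:
        if not line:  # skip blank keep-alive lines
            continue
        payload = json.loads(line)
        if payload["token"] != "<EOM>":
            tokens.append(payload["token"])  # part of the answer text
        elif payload["context"]:
            contexts.append(payload["context"])  # context sent after the message ends
    return "".join(tokens), contexts


sample_lines = [
    '{"token": "Hello", "context": null}',
    '{"token": " world", "context": null}',
    "",
    '{"token": "<EOM>", "context": null}',
    '{"token": "<EOM>", "context": {"reports": [{"id": "1"}]}}',
]
message, contexts = collect_stream(sample_lines)
print(message)
```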

In [None]:
global_search_streaming(
    index_name=index_name, query="Summarize the main topics of this data"
)

#### Local Search

Local search queries are best suited for narrowly focused questions that require an understanding of specific entities mentioned in the documents (e.g. What are the healing properties of chamomile?).

In [None]:
%%time
# pass a single index name as a string; to query across multiple indexes, pass a list, e.g. index_name=["myindex1", "myindex2"]
local_response = local_search(
    index_name=index_name, query="Who are the primary actors in these communities?"
)
# print the result and save context data in a variable
local_response_data = parse_query_response(local_response, return_context_data=True)
local_response_data

## Sources

In a query response, citations will often appear that support GraphRAG's response. API endpoints are provided to enable retrieval of the sourced documents, entities, relationships, etc.

Multiple types of sources may be referenced in a query: Reports, Entities, Relationships, Claims, and Text Units. The API provides various endpoints to retrieve these sources for data provenance.
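
To look up the sources behind an answer, the citation IDs first have to be pulled out of the response text. The sketch below assumes citations appear in the form `[Data: Reports (1, 2); Entities (5, +more)]`; that format is an assumption about the response text, so compare it with an actual query result before relying on it. `extract_citations` is a hypothetical helper.

```python
import re


def extract_citations(text: str) -> dict[str, list[str]]:
    """Collect numeric citation IDs per source type from "[Data: ...]" blocks.

    Assumed format: [Data: Reports (1, 2); Entities (5, +more)]
    """
    citations: dict[str, list[str]] = {}
    for block in re.findall(r"\[Data:([^\]]*)\]", text):
        for kind, id_list in re.findall(r"([A-Za-z ]+?)\s*\(([^)]*)\)", block):
            ids = [part.strip() for part in id_list.split(",")]
            numeric_ids = [part for part in ids if part.isdigit()]  # drops "+more"
            citations.setdefault(kind.strip(), []).extend(numeric_ids)
    return citations


answer = "Arm powers most premium smartphones [Data: Reports (1, 2); Entities (5, +more)]."
found = extract_citations(answer)
print(found)
```

Each ID could then be passed to the matching helper, for example `get_report(index_name, report_id)` for IDs listed under `Reports`.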

#### Get a Report

In [None]:
report = get_report(index_name, 0)
print(report.json()["text"]) if report.ok else (report.reason, report.content)

#### Get an Entity

In [None]:
entity = get_entity(index_name, 0)
entity.json() if entity.ok else (entity.reason, entity.content)

#### Get a Relationship

In [None]:
relationship = get_relationship(index_name, 1)
relationship.json() if relationship.ok else (relationship.reason, relationship.content)

#### Get a Claim

In [None]:
claim_response = get_claim(index_name, 1)
if claim_response.ok:
    pprint(claim_response.json())
else:
    print(claim_response)
    print(claim_response.text)

#### Get a Text Unit

In [None]:
text_unit_id = ""  ### Get a text unit id from one of previous Source results
if not text_unit_id:
    raise ValueError(
        "Must provide a text_unit_id from previous source results. Look for 'text_units' in the response."
    )
text_unit = get_text_unit(index_name, text_unit_id)
if text_unit.ok:
    print(text_unit.json()["text"])
else:
    print(text_unit.reason)
    print(text_unit.content)

## Exploring the GraphRAG knowledge graph
The API currently provides some basic functionality to better understand the knowledge graph that was constructed during the indexing process.

In addition, an option is available to export the graph to a graphml file which can be imported by other open source visualization software (we recommend [Gephi](https://gephi.org/)) for deeper exploration.
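
For a quick sanity check of an exported file without opening a visualization tool, the node and edge counts can be read with the Python standard library. The sketch below assumes the export declares the standard GraphML XML namespace; `count_graphml` is a hypothetical helper, and the sample document is made up for the demonstration.

```python
import io
import xml.etree.ElementTree as ET

# standard GraphML namespace (assumed to be declared by the exported file)
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def count_graphml(source) -> tuple[int, int]:
    """Return (node_count, edge_count) for a GraphML file path or binary file object."""
    root = ET.parse(source).getroot()
    nodes = root.findall(".//g:node", GRAPHML_NS)
    edges = root.findall(".//g:edge", GRAPHML_NS)
    return len(nodes), len(edges)


# tiny made-up GraphML document standing in for an exported file
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="undirected">
    <node id="ARM"/>
    <node id="SOFTBANK"/>
    <edge source="ARM" target="SOFTBANK"/>
  </graph>
</graphml>
"""
counts = count_graphml(io.BytesIO(SAMPLE))
print(counts)  # (2, 1)
```

With a real export, the call would be `count_graphml("knowledge_graph.graphml")`, using the file name saved in the last cell of this notebook.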

#### Basic knowledge graph statistics

In [None]:
response = get_graph_stats(index_name)
print(response)
print(response.text)

#### Get a GraphML file

In [None]:
# will save graphml file to the current local directory
save_graphml_file(index_name, "knowledge_graph.graphml")