# Use Case A Overview: Audit Research Assistant

GraphRAG does particularly well as a research assistant over large amounts of data. It can analyze data, draw meaningful connections, and synthesize concepts and patterns into insightful answers. To stand up a proof of concept (POC) for an audit research assistant, the following small set of input files is downloaded from Wikipedia.

- [https://en.wikipedia.org/wiki/Financial_audit](https://en.wikipedia.org/wiki/Financial_audit)
- [https://en.wikipedia.org/wiki/Internal_audit](https://en.wikipedia.org/wiki/Internal_audit)
- [https://en.wikipedia.org/wiki/Money_laundering](https://en.wikipedia.org/wiki/Money_laundering)
- [https://en.wikipedia.org/wiki/Financial_accounting](https://en.wikipedia.org/wiki/Financial_accounting)
- [https://en.wikipedia.org/wiki/Forensic_accounting](https://en.wikipedia.org/wiki/Forensic_accounting)
- [https://en.wikipedia.org/wiki/Embezzlement](https://en.wikipedia.org/wiki/Embezzlement)
- [https://en.wikipedia.org/wiki/Fraud](https://en.wikipedia.org/wiki/Fraud)
- [https://en.wikipedia.org/wiki/Fraud_deterrence](https://en.wikipedia.org/wiki/Fraud_deterrence)
- [https://en.wikipedia.org/wiki/Association_of_Certified_Fraud_Examiners](https://en.wikipedia.org/wiki/Association_of_Certified_Fraud_Examiners)

**Outcome**: This notebook demonstrates the ability to reason across documents when responding to user input.

**Note**: This notebook has already been run in the GitHub repository. To reproduce the use case, empty the following folders and follow this notebook. If you do not want to reproduce the content and just want to run the GraphRAG search, skip to the last section, [Skip the re-creation, just run the Global Search](#skip-the-re-creation-just-run-the-global-search).


### Re-create UseCaseA folder

In [1]:
useCaseDir = "./UseCaseA"
folders_to_recreate = ["cache", "input", "output"]
delete_all = True

import shutil
import os
if delete_all:
    # Remove the whole use-case directory and start from an empty one
    if os.path.exists(useCaseDir):
        shutil.rmtree(useCaseDir)
    os.mkdir(useCaseDir)
else:
    # Reset only the listed subfolders; makedirs also creates useCaseDir if it is missing
    for folder in folders_to_recreate:
        if os.path.exists(f"{useCaseDir}/{folder}"):
            shutil.rmtree(f"{useCaseDir}/{folder}")
        os.makedirs(f"{useCaseDir}/{folder}")


### Set variables

In [2]:
import os, sys
import base64, time
from dotenv import load_dotenv

load_dotenv(override=True)

True

### Define helper functions

In [3]:
import wikipedia

def download_wikipedia_page(url: str, directory: str, filename: str):
    """
    Download a Wikipedia page as a UTF-8 file in the specified directory.

    Args:
        url (str): The Wikipedia page url.
        directory (str): The directory where the file will be saved.
        filename (str): The name of the file to save the content as.
    """
    # Extract the page title from the url
    page_title = url.split("/")[-1]
    
    # Fetch the page content
    try:
        page_content = wikipedia.page(page_title, auto_suggest=False, redirect=True, preload=False).content
    except wikipedia.exceptions.DisambiguationError as e:
        # There is no content to write for an ambiguous title, so report it and stop
        print(f"Disambiguation error: {e}. Please specify a more specific title.")
        return

    # Ensure the directory exists
    os.makedirs(directory, exist_ok=True)
    
    # Write the content to a UTF-8 file
    file_path = os.path.join(directory, filename)
    with open(file_path, "w", encoding="utf-8") as file:
        file.write(page_content)
    
    print(f"Page saved to {file_path}")
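The helper above takes the page title from the last segment of the URL. A minimal sketch of a slightly more defensive extraction follows, assuming a URL may carry percent-encoded characters, a query string, or a fragment. `title_from_url` is a hypothetical name used for illustration and is not part of the `wikipedia` package.

```python
from urllib.parse import unquote, urlparse

def title_from_url(url: str) -> str:
    """Return the page title encoded in a Wikipedia article URL (hypothetical helper)."""
    # urlparse separates the path from any query string or fragment
    path = urlparse(url).path
    # The title is the last path segment; unquote decodes sequences such as %27
    return unquote(path.rstrip("/").split("/")[-1])

print(title_from_url("https://en.wikipedia.org/wiki/Fraud_deterrence"))
```

For the nine URLs used in this notebook the result is the same as `url.split("/")[-1]`; the difference only shows up for titles that contain encoded characters.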


### Get Wiki Articles

In [4]:
wikipedia_pages = [
    {"url": "https://en.wikipedia.org/wiki/Financial_audit", "filename": "financial_audit.txt"},
    {"url": "https://en.wikipedia.org/wiki/Internal_audit", "filename": "internal_audit.txt"},
    {"url": "https://en.wikipedia.org/wiki/Money_laundering", "filename": "money_laundering.txt"},
    {"url": "https://en.wikipedia.org/wiki/Financial_accounting", "filename": "financial_accounting.txt"},
    {"url": "https://en.wikipedia.org/wiki/Forensic_accounting", "filename": "forensic_accounting.txt"},
    {"url": "https://en.wikipedia.org/wiki/Embezzlement", "filename": "embezzlement.txt"},
    {"url": "https://en.wikipedia.org/wiki/Fraud", "filename": "fraud.txt"},
    {"url": "https://en.wikipedia.org/wiki/Fraud_deterrence", "filename": "fraud_deterrence.txt"},
    {"url": "https://en.wikipedia.org/wiki/Association_of_Certified_Fraud_Examiners", "filename": "association_of_fraud_examiners.txt"},
]

for page in wikipedia_pages:
    download_wikipedia_page(page["url"], "./UseCaseA/input", page["filename"])

Page saved to ./UseCaseA/input\financial_audit.txt
Page saved to ./UseCaseA/input\internal_audit.txt
Page saved to ./UseCaseA/input\money_laundering.txt
Page saved to ./UseCaseA/input\financial_accounting.txt
Page saved to ./UseCaseA/input\forensic_accounting.txt
Page saved to ./UseCaseA/input\embezzlement.txt
Page saved to ./UseCaseA/input\fraud.txt
Page saved to ./UseCaseA/input\fraud_deterrence.txt
Page saved to ./UseCaseA/input\association_of_fraud_examiners.txt


### Build the GraphRAG index

In [29]:
from pathlib import Path
from pprint import pprint

import pandas as pd

import graphrag.api as api
from graphrag.config.load_config import load_config
from graphrag.index.typing.pipeline_run_result import PipelineRunResult
module_path = os.path.abspath(os.path.join('.'))
sys.path.append(module_path)

In [30]:
module_path

'c:\\Users\\aprilhazel\\Source\\graphrag_demo_202505\\graphrag_demo_202505'

In [33]:
PROJECT_DIRECTORY = "UseCaseA"
PROJECT_PATH = os.path.join(module_path, PROJECT_DIRECTORY)

In [34]:
! graphrag init --root ./UseCaseA

⠋ GraphRAG Indexer 
Initializing project at 
C:\Users\aprilhazel\Source\graphrag_demo_202505\graphrag_demo_202505\UseCaseA

>
> **STOP** The command above generates a `.env` file and a `settings.yaml` file in the UseCaseA directory.
>
> Configure `./UseCaseA/.env` and `./UseCaseA/settings.yaml` before proceeding!
>
> Sample settings are shown in `./sample_uca.settings.yaml` and `./sample.env`.
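
One way to catch an unconfigured `.env` before the index build is a small pre-flight check. The sketch below assumes the generated file holds a `GRAPHRAG_API_KEY` entry that starts out as an angle-bracket placeholder; confirm the key names against your own `.env`. `unset_env_keys` is a hypothetical helper, not a GraphRAG function.

```python
from pathlib import Path

def unset_env_keys(env_path, required=("GRAPHRAG_API_KEY",)):
    """Return the required keys that are missing, empty, or still a <PLACEHOLDER>."""
    values = {}
    path = Path(env_path)
    if path.exists():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blanks, comments, and lines that are not KEY=VALUE
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip("\"'")
    return [
        key for key in required
        if not values.get(key)
        or (values[key].startswith("<") and values[key].endswith(">"))
    ]

# The key name is an assumption; an empty list means nothing is left to fill in
print(unset_env_keys("./UseCaseA/.env"))
```

A non-empty result suggests the file still needs editing. The check does not validate that the key itself works.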

In [36]:
graphrag_config = load_config(Path(PROJECT_PATH))

In [37]:
index_result: list[PipelineRunResult] = await api.build_index(config=graphrag_config)

# index_result is a list of workflows that make up the indexing pipeline that was run
for workflow_result in index_result:
    status = f"error\n{workflow_result.errors}" if workflow_result.errors else "success"
    print(f"Workflow Name: {workflow_result.workflow}\tStatus: {status}")


Workflow Name: create_base_text_units	Status: success
Workflow Name: create_final_documents	Status: success
Workflow Name: extract_graph	Status: success
Workflow Name: finalize_graph	Status: success
Workflow Name: create_communities	Status: success
Workflow Name: create_final_text_units	Status: success
Workflow Name: create_community_reports	Status: success
Workflow Name: generate_text_embeddings	Status: success


In [38]:
# load the index result into dataframes
entities = pd.read_parquet(f"{PROJECT_DIRECTORY}/output/entities.parquet")
communities = pd.read_parquet(f"{PROJECT_DIRECTORY}/output/communities.parquet")
community_reports = pd.read_parquet(
    f"{PROJECT_DIRECTORY}/output/community_reports.parquet"
)

### Execute a GraphRAG Global Search

In [39]:
# Question 1: What are things I should think about when I am conducting an accounts payable audit?
response, context = await api.global_search(
    config=graphrag_config,
    entities=entities,
    communities=communities,
    community_reports=community_reports,
    community_level=2,
    dynamic_community_selection=False,
    response_type="Multiple Paragraphs",
    query="What are things I should think about when I am conducting an accounts payable audit?",
)
print(response)
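
The printed response cites its supporting community reports with markers such as `[Data: Reports (8, 3, 32)]`. As a rough sketch, assuming the markers always follow that pattern, the cited report numbers could be collected with a regular expression. `cited_report_ids` is a hypothetical helper, and the sample string is made up for illustration.

```python
import re

def cited_report_ids(text: str) -> list[int]:
    """Collect report numbers from markers such as [Data: Reports (8, 3, 32)]."""
    ids = set()
    for group in re.findall(r"Reports \(([^)]*)\)", text):
        # Keep only numeric tokens; any other text inside the parentheses is ignored
        ids.update(int(tok) for tok in re.findall(r"\d+", group))
    return sorted(ids)

# Illustrative input, not actual model output
sample = "Controls matter [Data: Reports (8, 3, 32)]. So does review [Data: Reports (3, 14)]."
print(cited_report_ids(sample))
```

Calling `cited_report_ids(response)` would give a de-duplicated list that can be compared against the `community_reports` dataframe to see which reports informed the answer.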

When conducting an accounts payable audit, there are several critical aspects to consider to ensure a comprehensive and effective audit process. These considerations span compliance, risk management, internal controls, and the audit process itself. Below are key points to guide you through the audit:

### Compliance and Legal Considerations

1. **Sarbanes-Oxley Act Compliance**: Ensure that your audit procedures align with the Sarbanes-Oxley Act, which mandates robust internal control procedures to prevent corporate scandals and financial misstatements. This Act is crucial for improving corporate accountability and preventing financial misrepresentation [Data: Reports (8, 3, 32)].

2. **Anti-Money Laundering Regulations**: Be aware of the potential for money laundering activities and ensure compliance with stringent anti-money laundering regulations. These regulations are vital for maintaining the integrity of the financial system and preventing illicit financial activities [Data: Repo