# Visualizing the knowledge graph with `yfiles-jupyter-graphs`

This notebook is a partial copy of [local_search.ipynb](../../local_search.ipynb). It shows how to use `yfiles-jupyter-graphs` to add interactive graph visualizations of the parquet files and how to visualize the result context of `graphrag` queries (see the end of this notebook).

In [1]:
# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License.

In [1]:
import os

import pandas as pd
import tiktoken

from graphrag.config.enums import ModelType
from graphrag.config.models.language_model_config import LanguageModelConfig
from graphrag.language_model.manager import ModelManager
from graphrag.query.context_builder.entity_extraction import EntityVectorStoreKey
from graphrag.query.indexer_adapters import (
    read_indexer_covariates,
    read_indexer_entities,
    read_indexer_relationships,
    read_indexer_reports,
    read_indexer_text_units,
)
from graphrag.query.structured_search.local_search.mixed_context import (
    LocalSearchMixedContext,
)
from graphrag.query.structured_search.local_search.search import LocalSearch
from graphrag.vector_stores.lancedb import LanceDBVectorStore

## Local Search Example

The local search method generates answers by combining relevant data from the AI-extracted knowledge graph with text chunks of the raw documents. It is suited to questions that require an understanding of specific entities mentioned in the documents (e.g., What are the healing properties of chamomile?).

### Load text units and graph data tables as context for local search

- In this test we first load indexing outputs from parquet files into dataframes, then convert these dataframes into collections of data objects that align with the knowledge model.

### Load tables to dataframes

In [2]:
INPUT_DIR = "/home/chuaxu/projects/graphrag/ragsas/output"
LANCEDB_URI = f"{INPUT_DIR}/lancedb"

COMMUNITY_REPORT_TABLE = "community_reports"
COMMUNITY_TABLE = "communities"
ENTITY_TABLE = "entities"
RELATIONSHIP_TABLE = "relationships"
COVARIATE_TABLE = "covariates"
TEXT_UNIT_TABLE = "text_units"
COMMUNITY_LEVEL = 2

#### Read entities

In [3]:
# read the entities and communities tables to get community and degree data
entity_df = pd.read_parquet(f"{INPUT_DIR}/{ENTITY_TABLE}.parquet")
community_df = pd.read_parquet(f"{INPUT_DIR}/{COMMUNITY_TABLE}.parquet")

#### Read relationships

In [4]:
relationship_df = pd.read_parquet(f"{INPUT_DIR}/{RELATIONSHIP_TABLE}.parquet")
relationships = read_indexer_relationships(relationship_df)

# Visualizing nodes and relationships with `yfiles-jupyter-graphs`

`yfiles-jupyter-graphs` is a graph visualization extension that provides interactive and customizable visualizations for structured node and relationship data.

In this case, we use it to provide an interactive visualization of the knowledge graph from the [local_search.ipynb](../../local_search.ipynb) sample by passing node and relationship lists converted from the given parquet files. The input data requires an `id` attribute for each node and `start`/`end` properties for each relationship that correspond to the node ids. Additional attributes can be added in the `properties` of each node/relationship dict:

In [5]:
%pip install yfiles_jupyter_graphs --quiet
from yfiles_jupyter_graphs import GraphWidget
import numpy as np
import pandas as pd


# shared helper used by both converters below
def clean_value(value):
    """Clean a value to make it JSON serializable."""
    # Handle arrays first (before checking for NaN)
    if isinstance(value, (np.ndarray, list)):
        # Convert arrays to strings or take first element if single value
        if len(value) == 0:
            return None
        elif len(value) == 1:
            return str(value[0])
        else:
            return str(value)
    # Now check for NaN on scalar values
    elif pd.isna(value):
        return None
    elif isinstance(value, (np.integer, np.floating)):
        # Convert numpy numbers to Python numbers
        if np.isnan(value) or np.isinf(value):
            return None
        return value.item()
    elif isinstance(value, float):
        # Handle Python floats that might be NaN or inf
        if pd.isna(value) or np.isinf(value):
            return None
        return value
    else:
        return value


# converts the entities dataframe to a list of dicts for yfiles-jupyter-graphs
def convert_entities_to_dicts(df):
    """Convert the entities dataframe to a list of dicts for yfiles-jupyter-graphs."""
    nodes_dict = {}
    for _, row in df.iterrows():
        # Create a dictionary for each row and collect unique nodes
        node_id = row["title"]
        if node_id not in nodes_dict:
            # Clean all properties to make them JSON serializable
            cleaned_properties = {k: clean_value(v) for k, v in row.to_dict().items()}
            nodes_dict[node_id] = {
                "id": node_id,
                "properties": cleaned_properties,
            }
    return list(nodes_dict.values())


# converts the relationships dataframe to a list of dicts for yfiles-jupyter-graphs
def convert_relationships_to_dicts(df):
    """Convert the relationships dataframe to a list of dicts for yfiles-jupyter-graphs."""
    relationships = []
    for _, row in df.iterrows():
        # Create a dictionary for each row, cleaning all properties
        cleaned_properties = {k: clean_value(v) for k, v in row.to_dict().items()}
        relationships.append({
            "start": row["source"],
            "end": row["target"],
            "properties": cleaned_properties,
        })
    return relationships


w = GraphWidget()
w.directed = True
w.nodes = convert_entities_to_dicts(entity_df)
w.edges = convert_relationships_to_dicts(relationship_df)

Note: you may need to restart the kernel to use updated packages.
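
The input format described above can be sanity-checked before the lists are handed to the widget. The following is a minimal sketch using only the standard library. `find_dangling_edges` is a hypothetical helper, not part of `yfiles-jupyter-graphs`, and the sample nodes and edges are invented for illustration.

```python
def find_dangling_edges(nodes, edges):
    """Return the edges whose start or end does not match any node id."""
    node_ids = {node["id"] for node in nodes}
    return [
        edge
        for edge in edges
        if edge["start"] not in node_ids or edge["end"] not in node_ids
    ]


# Illustrative data in the shape described above (values are made up).
sample_nodes = [
    {"id": "A", "properties": {"title": "A"}},
    {"id": "B", "properties": {"title": "B"}},
]
sample_edges = [
    {"start": "A", "end": "B", "properties": {"weight": 1.0}},
    {"start": "A", "end": "C", "properties": {"weight": 2.0}},  # "C" is not a node
]

dangling = find_dangling_edges(sample_nodes, sample_edges)
print(dangling)
```

In this notebook the same check could be run as `find_dangling_edges(w.nodes, w.edges)` to list relationships whose endpoints are missing from the entities table.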


## Configure data-driven visualization

The additional properties can be used to configure the visualization for different use cases.

In [6]:
# show title on the node
w.node_label_mapping = "title"


# map community to a color
def community_to_color(community):
    """Map a community to a color."""
    colors = [
        "crimson",
        "darkorange",
        "indigo",
        "cornflowerblue",
        "cyan",
        "teal",
        "green",
    ]
    return (
        colors[int(community) % len(colors)] if community is not None else "lightgray"
    )


def edge_to_source_community(edge):
    """Get the community of the source node of an edge."""
    source_node = next(
        (entry for entry in w.nodes if entry["properties"]["title"] == edge["start"]),
        None,
    )
    if source_node is None:
        return None
    # Handle missing community property gracefully (returns None if absent)
    return source_node["properties"].get("community")


w.node_color_mapping = lambda node: community_to_color(node["properties"].get("community", None))
w.edge_color_mapping = lambda edge: community_to_color(edge_to_source_community(edge))
# map size data to a reasonable factor (a missing or empty size counts as 0)
w.node_scale_factor_mapping = lambda node: 0.5 + (node["properties"].get("size") or 0) * 1.5 / 20
# use weight for edge thickness
w.edge_thickness_factor_mapping = "weight"
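
`edge_to_source_community` scans `w.nodes` once per edge, which is fine for small graphs. For larger graphs, one option is to build a dictionary from node id to community once and look each edge up in it. The sketch below works on plain dicts in the shape produced by the converters. `build_community_lookup` and the sample data are hypothetical, and `community_to_color` is repeated only to keep the example self-contained.

```python
def build_community_lookup(nodes):
    """Map each node id to its community (None when the property is missing)."""
    return {node["id"]: node["properties"].get("community") for node in nodes}


def community_to_color(community):
    """Same palette logic as in the cell above."""
    colors = [
        "crimson",
        "darkorange",
        "indigo",
        "cornflowerblue",
        "cyan",
        "teal",
        "green",
    ]
    return colors[int(community) % len(colors)] if community is not None else "lightgray"


# Illustrative data (values are made up).
sample_nodes = [
    {"id": "A", "properties": {"title": "A", "community": 8}},
    {"id": "B", "properties": {"title": "B"}},  # no community assigned
]
sample_edge = {"start": "A", "end": "B", "properties": {}}

lookup = build_community_lookup(sample_nodes)
edge_color = community_to_color(lookup.get(sample_edge["start"]))
print(edge_color)  # 8 % 7 == 1, so "darkorange"
```

With the widget from this notebook, the lookup could be built from `w.nodes` and used as `w.edge_color_mapping = lambda edge: community_to_color(lookup.get(edge["start"]))`.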

## Automatic layouts

The widget provides different automatic layouts that serve different purposes: `Circular`, `Hierarchic`, `Organic (interactive or static)`, `Orthogonal`, `Radial`, `Tree`, `Geo-spatial`.

For the knowledge graph, this sample uses the `Circular` layout, though `Hierarchic` or `Organic` are also suitable choices.

In [7]:
# Use the circular layout for this visualization. For larger graphs, the default organic layout is often preferable.
w.circular_layout()

## Display the graph

In [8]:
display(w)

GraphWidget(layout=Layout(height='800px', width='100%'))

# Visualizing the result context of `graphrag` queries

The result context of `graphrag` queries makes it possible to inspect the context graph of the request. This data can similarly be visualized as a graph with `yfiles-jupyter-graphs`.

## Making the request

The following cell recreates the sample queries from [local_search.ipynb](../../local_search.ipynb).

In [7]:
# setup (see also ../../local_search.ipynb)
entities = read_indexer_entities(entity_df, community_df, COMMUNITY_LEVEL)

description_embedding_store = LanceDBVectorStore(
    collection_name="default-entity-description",
)
description_embedding_store.connect(db_uri=LANCEDB_URI)

# Load covariates if the file exists; otherwise proceed without them
try:
    covariate_df = pd.read_parquet(f"{INPUT_DIR}/{COVARIATE_TABLE}.parquet")
    claims = read_indexer_covariates(covariate_df)
    covariates = {"claims": claims}
except FileNotFoundError:
    print("Covariate file not found, proceeding without covariates")
    covariates = {}

report_df = pd.read_parquet(f"{INPUT_DIR}/{COMMUNITY_REPORT_TABLE}.parquet")
reports = read_indexer_reports(report_df, community_df, COMMUNITY_LEVEL)
text_unit_df = pd.read_parquet(f"{INPUT_DIR}/{TEXT_UNIT_TABLE}.parquet")
text_units = read_indexer_text_units(text_unit_df)

# Load configuration from settings.yaml
from graphrag.config.load_config import load_config
from pathlib import Path

config_path = Path("/home/chuaxu/projects/graphrag/ragsas")
config = load_config(config_path)

# Get model configurations from the loaded config
chat_model_config = config.get_language_model_config("default_chat_model")
embedding_model_config = config.get_language_model_config("default_embedding_model")

chat_model = ModelManager().get_or_create_chat_model(
    name="local_search",
    model_type=chat_model_config.type,
    config=chat_model_config,
)

token_encoder = tiktoken.encoding_for_model(chat_model_config.model)

text_embedder = ModelManager().get_or_create_embedding_model(
    name="local_search_embedding",
    model_type=embedding_model_config.type,
    config=embedding_model_config,
)

context_builder = LocalSearchMixedContext(
    community_reports=reports,
    text_units=text_units,
    entities=entities,
    relationships=relationships,
    covariates=covariates,
    entity_text_embeddings=description_embedding_store,
    embedding_vectorstore_key=EntityVectorStoreKey.ID,  # if the vectorstore uses entity title as ids, set this to EntityVectorStoreKey.TITLE
    text_embedder=text_embedder,
    token_encoder=token_encoder,
)

local_context_params = {
    "text_unit_prop": 0.5,
    "community_prop": 0.1,
    "conversation_history_max_turns": 5,
    "conversation_history_user_turns_only": True,
    "top_k_mapped_entities": 10,
    "top_k_relationships": 10,
    "include_entity_rank": True,
    "include_relationship_weight": True,
    "include_community_rank": False,
    "return_candidate_context": False,
    "embedding_vectorstore_key": EntityVectorStoreKey.ID,  # set this to EntityVectorStoreKey.TITLE if the vectorstore uses entity title as ids
    "max_tokens": 80_000,  # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 5000)
}

model_params = {
    "max_tokens": 16_000,  # change this based on the token limit you have on your model (if you are using a model with 8k limit, a good setting could be 1000-1500; the model used here supports at most 16384 completion tokens)
    "temperature": 0.0,
}

search_engine = LocalSearch(
    model=chat_model,
    context_builder=context_builder,
    token_encoder=token_encoder,
    model_params=model_params,
    context_builder_params=local_context_params,
    response_type="multiple paragraphs",  # free form text describing the response type and format, can be anything, e.g. prioritized list, single paragraph, multiple paragraphs, multiple-page report
)

Covariate file not found, proceeding without covariates
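
`text_unit_prop` and `community_prop` in `local_context_params` are proportions of the context budget set by `max_tokens`. As a rough sketch, assuming the remaining share (`1 - text_unit_prop - community_prop`) is what is left for entities and relationships, the budget for the values above works out as follows. This split reflects a reading of the parameter names and is not a guarantee about `graphrag` internals. `split_context_budget` is a hypothetical helper.

```python
def split_context_budget(max_tokens, text_unit_prop, community_prop):
    """Split a token budget into text-unit, community, and remaining shares.

    Assumption: whatever is not claimed by the two proportions is available
    for the entity/relationship part of the context.
    """
    if text_unit_prop + community_prop > 1:
        msg = "text_unit_prop + community_prop must not exceed 1"
        raise ValueError(msg)
    text_unit_tokens = round(max_tokens * text_unit_prop)
    community_tokens = round(max_tokens * community_prop)
    local_tokens = max_tokens - text_unit_tokens - community_tokens
    return {
        "text_units": text_unit_tokens,
        "community_reports": community_tokens,
        "local": local_tokens,
    }


# Values taken from local_context_params above.
budget = split_context_budget(80_000, text_unit_prop=0.5, community_prop=0.1)
print(budget)
# {'text_units': 40000, 'community_reports': 8000, 'local': 32000}
```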


## Run local search on sample queries

In [8]:
result = await search_engine.search("Tell me about Agent Mercer")
print(result.response)

I'm sorry, but I don't have any information about Agent Mercer in the provided data tables. If you have any other questions or need information on a different topic, feel free to ask!


In [9]:
from IPython.display import Markdown, display

question = "How do different pricing strategies impact the business revenue?"
# question = "Which published studies in our knowledge base used both panel data methods and cointegration analysis on emerging market economies?"
result = await search_engine.search(question)

# Display as formatted Markdown instead of plain text
display(Markdown(result.response))



# Impact of Pricing Strategies on Business Revenue

Pricing strategies are crucial for businesses aiming to optimize revenue and enhance customer engagement. The Online Media Company, specializing in music subscription services, provides a compelling case study of how personalized pricing strategies can significantly impact business revenue. By leveraging advanced analytical tools such as SAS's DEEPPRICE procedure, the company tailors its pricing plans to align with individual user profiles, thereby maximizing profitability [Data: Reports (14); Entities (1729, 1762, 1745, 1754); Relationships (1968, 1992)].

## Personalized Pricing Plans

The Online Media Company employs personalized pricing plans to cater to the unique consumption patterns and willingness to pay of each subscriber. This approach involves analyzing user behavior and characteristics to set user-specific prices, which are adjusted based on the price elasticity of demand. The DEEPPRICE procedure plays a pivotal role in this process by estimating demand curves and accounting for heterogeneous price effects based on user characteristics. This enables the company to offer pricing plans that reflect the individual needs and behaviors of its users, ultimately driving revenue growth [Data: Entities (1729, 1762, 1745); Relationships (1992); Sources (438, 413)].

## Policy Evaluation and Comparison

The company conducts policy evaluation and comparison to identify the most effective pricing strategies. For instance, policies such as S1, which offers discounts to all users, and S2, which targets price-sensitive users, are compared based on their revenue generation capabilities. Output 16.1.7 highlights that policy S1 generates the most revenue, indicating the effectiveness of personalized pricing strategies in maximizing business profitability. Additionally, the company considers offering personalized discounts during the discount season, further enhancing its ability to attract and retain subscribers [Data: Entities (1784, 1778, 1740, 1741); Relationships (1967, 1980); Sources (438)].

## Analytical Tools and Data Integration

The integration of SAS analytical tools, such as DEEPPRICE and MYLIB, into the company's operations underscores the importance of data-driven decision-making in optimizing pricing strategies. The PRICING_SAMPLE dataset, comprising 10,000 simulated observations, is utilized to analyze personalized discount policies, providing valuable insights into customer behavior and business outcomes. This data-driven approach ensures that the company remains responsive to the evolving needs and preferences of its subscribers, allowing it to maintain a competitive edge in the rapidly changing online media landscape [Data: Reports (14); Entities (1733); Relationships (1911); Sources (413, 436)].

In conclusion, the strategic use of personalized pricing plans and advanced analytical tools enables businesses like the Online Media Company to optimize their pricing strategies, thereby enhancing revenue and customer engagement. By tailoring prices to individual user profiles and conducting thorough policy evaluations, the company effectively balances competitive pricing with the goal of maximizing profitability.

## Inspecting the context data used to generate the response

In [10]:
result.context_data["entities"].head()

| | id | entity | description | number of relationships | in_context |
|---|---|---|---|---|---|
| 0 | 1776 | POLICY S1 | An optimal pricing policy that sets user-speci... | 2 | True |
| 1 | 1729 | ONLINE MEDIA COMPANY | The ONLINE MEDIA COMPANY is a dynamic organiza... | 12 | True |
| 2 | 1774 | POLICY OPTIMIZATION | The goal of maximizing the expected utility of... | 1 | True |
| 3 | 1745 | DEEPPRICE PROCEDURE | The DEEPPRICE PROCEDURE is a comprehensive met... | 8 | True |
| 4 | 1785 | OUTPUT 16.1.8 | Output showing policy comparison based on reve... | 4 | True |


In [11]:
result.context_data["relationships"].head()

| | id | source | target | description | weight | links | in_context |
|---|---|---|---|---|---|---|---|
| 0 | 1968 | ONLINE MEDIA COMPANY | POLICY OPTIMIZATION | The online media company aims to optimize pric... | 8.0 | 1 | True |
| 1 | 1967 | ONLINE MEDIA COMPANY | POLICY EVALUATION AND COMPARISON | The online media company is conducting policy ... | 8.0 | 2 | True |
| 2 | 1923 | DEEPPRICE PROCEDURE | POLICY EVALUATION AND COMPARISON | The DEEPPRICE procedure is related to the poli... | 1.0 | 2 | True |
| 3 | 1992 | PROC DEEPPRICE | ONLINE MEDIA COMPANY | The online media company uses PROC DEEPPRICE t... | 8.0 | 1 | True |
| 4 | 1924 | DEEPPRICE PROCEDURE | SAS CLOUD ANALYTIC SERVICES | The DEEPPRICE procedure requires SAS Cloud Ana... | 8.0 | 1 | True |


## Visualizing the result context as a graph

In [13]:
"""
Helper function to visualize the result context with `yfiles-jupyter-graphs`.

The dataframes are converted into supported nodes and relationships lists and then passed to yfiles-jupyter-graphs.
Additionally, some values are mapped to visualization properties.
"""


def show_graph(result):
    """Visualize the result context with yfiles-jupyter-graphs."""
    from yfiles_jupyter_graphs import GraphWidget

    if (
        "entities" not in result.context_data
        or "relationships" not in result.context_data
    ):
        msg = "The passed results do not contain 'entities' or 'relationships'"
        raise ValueError(msg)

    # converts the entities dataframe to a list of dicts for yfiles-jupyter-graphs
    def convert_entities_to_dicts(df):
        """Convert the entities dataframe to a list of dicts for yfiles-jupyter-graphs."""
        nodes_dict = {}
        for _, row in df.iterrows():
            # Create a dictionary for each row and collect unique nodes
            node_id = row["entity"]
            if node_id not in nodes_dict:
                nodes_dict[node_id] = {
                    "id": node_id,
                    "properties": row.to_dict(),
                }
        return list(nodes_dict.values())

    # converts the relationships dataframe to a list of dicts for yfiles-jupyter-graphs
    def convert_relationships_to_dicts(df):
        """Convert the relationships dataframe to a list of dicts for yfiles-jupyter-graphs."""
        relationships = []
        for _, row in df.iterrows():
            # Create a dictionary for each row
            relationships.append({
                "start": row["source"],
                "end": row["target"],
                "properties": row.to_dict(),
            })
        return relationships

    w = GraphWidget()
    # use the converted data to visualize the graph
    w.nodes = convert_entities_to_dicts(result.context_data["entities"])
    w.edges = convert_relationships_to_dicts(result.context_data["relationships"])
    w.directed = True
    # show title on the node
    w.node_label_mapping = "entity"
    # use weight for edge thickness
    w.edge_thickness_factor_mapping = "weight"
    display(w)


show_graph(result)

GraphWidget(layout=Layout(height='700px', width='100%'))
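
The `show_graph` helper draws every node at the same size. One possible extension, sketched here under the assumption that node importance can be approximated by how many context relationships touch an entity, is to count those relationships and reuse the scale formula from the first visualization. `count_degrees` and `degree_to_scale` are hypothetical helpers. The two sample edges are taken from the relationships shown above.

```python
from collections import Counter


def count_degrees(edges):
    """Count how many edges touch each node id (as start or end)."""
    degrees = Counter()
    for edge in edges:
        degrees[edge["start"]] += 1
        degrees[edge["end"]] += 1
    return degrees


def degree_to_scale(degree, max_degree=20):
    """Map a degree to a scale factor between 0.5 and 2.0, capping at max_degree."""
    return 0.5 + min(degree, max_degree) * 1.5 / max_degree


sample_edges = [
    {"start": "ONLINE MEDIA COMPANY", "end": "POLICY OPTIMIZATION"},
    {"start": "PROC DEEPPRICE", "end": "ONLINE MEDIA COMPANY"},
]

degrees = count_degrees(sample_edges)
scale = degree_to_scale(degrees["ONLINE MEDIA COMPANY"])
print(degrees["ONLINE MEDIA COMPANY"], scale)
```

Inside `show_graph`, the result could be wired in with `w.node_scale_factor_mapping = lambda node: degree_to_scale(degrees[node["properties"]["entity"]])`, using the same mapping property as the first visualization.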

In [12]:
# Analyze the context data and token usage that the LLM receives
import tiktoken
from IPython.display import Markdown, display

def analyze_context_and_tokens(result, search_engine):
    """Analyze the context data and count tokens for different message components."""
    
    # Get the token encoder
    encoding = tiktoken.encoding_for_model("gpt-4")
    
    print("=== CONTEXT DATA ANALYSIS ===\n")
    
    # Show context data statistics
    if "entities" in result.context_data:
        entities_df = result.context_data["entities"]
        print(f"📊 Entities in context: {len(entities_df)} entities")
        print(f"   - Columns: {list(entities_df.columns)}")
        print(f"   - Sample entity: {entities_df.iloc[0]['entity'] if len(entities_df) > 0 else 'None'}")
    
    if "relationships" in result.context_data:
        relationships_df = result.context_data["relationships"]
        print(f"📊 Relationships in context: {len(relationships_df)} relationships")
        print(f"   - Columns: {list(relationships_df.columns)}")
        # Guard both endpoints so an empty dataframe does not raise an IndexError
        sample_relationship = (
            f"{relationships_df.iloc[0]['source']} -> {relationships_df.iloc[0]['target']}"
            if len(relationships_df) > 0
            else "None"
        )
        print(f"   - Sample relationship: {sample_relationship}")
    
    if "reports" in result.context_data:
        reports_df = result.context_data["reports"]
        print(f"📊 Community reports in context: {len(reports_df)} reports")
    
    if "sources" in result.context_data:
        sources_df = result.context_data["sources"]
        print(f"📊 Text sources in context: {len(sources_df)} text units")
    
    print("\n=== TOKEN ANALYSIS ===\n")
    
    # Reconstruct the context that was sent to the LLM
    # This is an approximation of what the LocalSearch builds
    
    # 1. System prompt
    with open("/home/chuaxu/projects/graphrag/ragsas/prompts/local_search_system_prompt.txt", "r") as f:
        system_prompt = f.read()

    # 2. Context data formatting (full version of what LocalSearch does)
    context_parts = []
    
    # Add entities context (FULL - no truncation)
    if "entities" in result.context_data and len(result.context_data["entities"]) > 0:
        entities_context = "## Relevant Entities:\n\n"
        for _, entity in result.context_data["entities"].iterrows():  # Show ALL entities
            desc = entity.get('description', 'No description')
            rank = entity.get('rank', 'N/A')
            entities_context += f"**{entity['entity']}** (Rank: {rank})\n"
            entities_context += f"Description: {desc}\n\n"
        context_parts.append(entities_context)
    
    # Add relationships context (FULL - no truncation)
    if "relationships" in result.context_data and len(result.context_data["relationships"]) > 0:
        relationships_context = "## Relevant Relationships:\n\n"
        for _, rel in result.context_data["relationships"].iterrows():  # Show ALL relationships
            desc = rel.get('description', 'No description')
            weight = rel.get('weight', 'N/A')
            relationships_context += f"**{rel['source']} → {rel['target']}** (Weight: {weight})\n"
            relationships_context += f"Description: {desc}\n\n"
        context_parts.append(relationships_context)
    
    # Add community reports context (FULL - no truncation)
    if "reports" in result.context_data and len(result.context_data["reports"]) > 0:
        reports_context = "## Relevant Community Reports:\n\n"
        for _, report in result.context_data["reports"].iterrows():  # Show ALL reports
            title = report.get('title', 'Untitled Report')
            content = report.get('content', report.get('summary', 'No content'))
            rank = report.get('rank', 'N/A')
            reports_context += f"**{title}** (Rank: {rank})\n"
            reports_context += f"{content}\n\n"
        context_parts.append(reports_context)
    
    # Add sources context (FULL - no truncation)
    if "sources" in result.context_data and len(result.context_data["sources"]) > 0:
        sources_context = "## Relevant Text Sources:\n\n"
        for _, source in result.context_data["sources"].iterrows():  # Show ALL sources
            text_content = source.get('text', source.get('content', 'No content'))
            source_id = source.get('id', 'Unknown')
            rank = source.get('rank', 'N/A')
            sources_context += f"**Source {source_id}** (Rank: {rank})\n"
            sources_context += f"{text_content}\n\n"
        context_parts.append(sources_context)
    
    # 3. User question
    user_question = question  # From the previous cell
    
    # Combine all context
    full_context = "\n".join(context_parts)
    
    # 4. Calculate tokens for each component
    system_prompt_tokens = len(encoding.encode(system_prompt))
    context_tokens = len(encoding.encode(full_context))
    user_question_tokens = len(encoding.encode(user_question))
    response_tokens = len(encoding.encode(result.response))
    
    total_input_tokens = system_prompt_tokens + context_tokens + user_question_tokens
    total_tokens = total_input_tokens + response_tokens
    
    print(f"🔢 Token Breakdown:")
    print(f"   - System prompt: {system_prompt_tokens:,} tokens")
    print(f"   - Context data: {context_tokens:,} tokens")
    print(f"   - User question: {user_question_tokens:,} tokens")
    print(f"   - Total INPUT: {total_input_tokens:,} tokens")
    print(f"   - Response: {response_tokens:,} tokens")
    print(f"   - TOTAL MESSAGE: {total_tokens:,} tokens")
    
    # Show model limits
    model_limit = 128_000  # GPT-4o limit
    print(f"\n📏 Model Capacity:")
    print(f"   - Model limit: {model_limit:,} tokens")
    print(f"   - Used: {total_tokens:,} tokens ({total_tokens/model_limit*100:.1f}%)")
    print(f"   - Remaining: {model_limit - total_tokens:,} tokens")
    
    if total_tokens > model_limit:
        print("   ⚠️  WARNING: Token count exceeds model limit!")
    elif total_tokens > model_limit * 0.9:
        print("   ⚠️  WARNING: Token count is near model limit!")
    else:
        print("   ✅ Token count is within safe limits")
    
    print("\n" + "="*80)
    print("=== FULL CONTEXT SENT TO LLM ===")
    print("="*80)
    
    # Display the complete context in a nice formatted way
    print(f"\n🔵 SYSTEM PROMPT (Skipped)")
    print("-" * 40)
    
    print(f"\n🔵 USER QUESTION:")
    print("-" * 40)
    print(user_question)
    
    print(f"\n🔵 CONTEXT DATA ({context_tokens:,} tokens):")
    print("-" * 40)
    
    # Display full context as Markdown for better formatting
    display(Markdown(full_context))
    
    print("="*80)
    print("=== END OF CONTEXT ===")
    print("="*80)
    
    return {
        "system_prompt_tokens": system_prompt_tokens,
        "context_tokens": context_tokens,
        "user_question_tokens": user_question_tokens,
        "response_tokens": response_tokens,
        "total_tokens": total_tokens,
        "context_data": result.context_data,
        "full_context": full_context
    }

# Run the analysis
token_analysis = analyze_context_and_tokens(result, search_engine)

=== CONTEXT DATA ANALYSIS ===

📊 Entities in context: 20 entities
   - Columns: ['id', 'entity', 'description', 'number of relationships', 'in_context']
   - Sample entity: POLICY S1
📊 Relationships in context: 26 relationships
   - Columns: ['id', 'source', 'target', 'description', 'weight', 'links', 'in_context']
   - Sample relationship: ONLINE MEDIA COMPANY -> POLICY OPTIMIZATION
📊 Community reports in context: 1 reports
📊 Text sources in context: 3 text units

=== TOKEN ANALYSIS ===

🔢 Token Breakdown:
   - System prompt: 604 tokens
   - Context data: 7,447 tokens
   - User question: 10 tokens
   - Total INPUT: 8,061 tokens
   - Response: 592 tokens
   - TOTAL MESSAGE: 8,653 tokens

📏 Model Capacity:
   - Model limit: 128,000 tokens
   - Used: 8,653 tokens (6.8%)
   - Remaining: 119,347 tokens
   ✅ Token count is within safe limits
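
The totals and the percentage in the capacity summary follow directly from the individual token counts. As a worked check using the figures printed above (the 128,000 limit is the value hard-coded in `analyze_context_and_tokens`, not something queried from the model):

```python
# Token counts copied from the output above.
system_prompt_tokens = 604
context_tokens = 7_447
user_question_tokens = 10
response_tokens = 592
model_limit = 128_000  # hard-coded in the analysis function

total_input_tokens = system_prompt_tokens + context_tokens + user_question_tokens
total_tokens = total_input_tokens + response_tokens
used_percent = total_tokens / model_limit * 100

print(total_input_tokens)  # 8061
print(total_tokens)  # 8653
print(f"{used_percent:.1f}%")  # 6.8%
print(model_limit - total_tokens)  # 119347
```

Because the context is reconstructed by the analysis function and tokenized with the `gpt-4` encoding, these counts approximate what the model actually received.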

=== FULL CONTEXT SENT TO LLM ===

🔵 SYSTEM PROMPT (Skipped)
----------------------------------------

🔵 USER QUESTION:
-------------------------------

## Relevant Entities:

**POLICY S1** (Rank: N/A)
Description: An optimal pricing policy that sets user-specific prices to maximize revenue

**ONLINE MEDIA COMPANY** (Rank: N/A)
Description: The ONLINE MEDIA COMPANY is a dynamic organization specializing in music subscription services, leveraging advanced pricing strategies to enhance user engagement and maximize revenue. This company offers a comprehensive music subscription service, providing users with access to a vast library of music tailored to their preferences. To attract and retain subscribers, the company employs personalized discount policies and targeted discounts, ensuring that users receive offers that are most relevant to their listening habits and preferences.

A key component of the company's strategy is its innovative use of personalized pricing plans. By analyzing user behavior and characteristics, the company optimizes its pricing policies to align with the price elasticity of demand. This approach allows the company to adjust prices based on individual user profiles, ensuring that each subscriber receives a pricing plan that reflects their unique consumption patterns and willingness to pay.

The company utilizes a sophisticated pricing tool known as PROC DEEPPRICE, which is instrumental in setting user-specific prices. This tool enables the company to achieve maximum revenue by tailoring prices to the specific needs and behaviors of its users. Through this method, the company can effectively balance the need for competitive pricing with the goal of maximizing profitability.

In addition to its focus on pricing, the ONLINE MEDIA COMPANY is committed to understanding and analyzing user behavior. By collecting and interpreting data on how users interact with its services, the company can refine its offerings and improve the overall user experience. This data-driven approach ensures that the company remains responsive to the evolving needs and preferences of its subscribers, allowing it to maintain a competitive edge in the rapidly changing online media landscape.

Overall, the ONLINE MEDIA COMPANY stands out as a leader in the online media industry, combining cutting-edge technology with strategic pricing and user analysis to deliver a superior music subscription service. Its commitment to personalized pricing and targeted discounts not only enhances user satisfaction but also drives revenue growth, positioning the company for continued success in the digital marketplace.

**POLICY OPTIMIZATION** (Rank: N/A)
Description: The goal of maximizing the expected utility of a pricing policy by setting user-specific prices

**DEEPPRICE PROCEDURE** (Rank: N/A)
Description: The DEEPPRICE PROCEDURE is a comprehensive method detailed in a chapter of a document that explores its various applications. This procedure is primarily utilized for creating tables and analyzing data, specifically tailored for personalized pricing plans. It serves as a robust tool in estimating parameters, which is crucial for understanding and implementing effective pricing strategies. Additionally, the DEEPPRICE PROCEDURE is instrumental in performing policy evaluation and comparison, particularly in scenarios where the treatment variable is not endogenous. This aspect of the procedure allows for a more accurate assessment of different pricing policies, ensuring that the most effective strategies are identified and implemented. Overall, the DEEPPRICE PROCEDURE is a versatile and essential method for businesses looking to optimize their pricing plans through detailed data analysis and policy evaluation.

**OUTPUT 16.1.8** (Rank: N/A)
Description: Output showing policy comparison based on revenue, indicating that personalized policies s1, s1d, s2, and s4 generate more revenue than the base policy

**PROC DEEPPRICE** (Rank: N/A)
Description: PROC DEEPPRICE is a versatile procedure designed for deep learning and policy evaluation, particularly in the context of analyzing price and demand data. It is equipped to handle continuous outcome variables by utilizing the identity function for G in the outcome model, which simplifies the process of modeling these types of variables. This procedure is instrumental in estimating demand curves, allowing users to specify the correct functional form to accurately capture the relationship between price and demand. Additionally, PROC DEEPPRICE is adept at estimating optimal revenue per user by accounting for heterogeneous price effects based on user characteristics, thereby enabling more personalized and effective pricing strategies.

The procedure offers several options to enhance its functionality and adaptability. Users can specify the minibatch size, which is crucial for managing computational resources and optimizing the learning process in deep learning applications. Furthermore, PROC DEEPPRICE includes options for random seed generation, ensuring reproducibility and consistency in results across different runs. It also provides mechanisms for handling missing values, which is essential for maintaining the integrity and accuracy of the data analysis.

In the context of price effect estimation, PROC DEEPPRICE saves the estimation details, facilitating a comprehensive analysis of how price changes impact demand. This feature is particularly useful for businesses and researchers aiming to understand and predict consumer behavior in response to pricing strategies. By integrating these capabilities, PROC DEEPPRICE serves as a powerful tool for those seeking to leverage data-driven insights to optimize pricing and maximize revenue.

Overall, PROC DEEPPRICE stands out as a robust procedure that combines deep learning techniques with advanced policy evaluation methods to deliver precise and actionable insights into price and demand dynamics. Its ability to handle continuous outcome variables, estimate demand curves, and account for heterogeneous price effects makes it an invaluable asset for analysts and decision-makers in various industries.

**POLICY S3** (Rank: N/A)
Description: A pricing policy offering a discount to all users

**PRICING_SAMPLE** (Rank: N/A)
Description: The "PRICING_SAMPLE" is a dataset comprising 10,000 simulated observations that capture various customer characteristics and online behaviors. This dataset is specifically designed to facilitate the analysis of personalized discount policies, allowing researchers and analysts to explore how tailored pricing strategies can impact consumer behavior and business outcomes. The "PRICING_SAMPLE" serves as a valuable resource for understanding the dynamics of customer interactions and preferences in an online shopping environment.

Additionally, the "PRICING_SAMPLE" is utilized within analytical sessions associated with the "mylib" libref, indicating its integration into broader data analysis frameworks and libraries. This integration suggests that the dataset is not only a standalone resource but also part of a larger ecosystem of data tools and libraries, enhancing its utility for comprehensive analysis and research purposes.

Overall, the "PRICING_SAMPLE" provides a robust foundation for examining the implications of personalized pricing strategies, offering insights into customer behavior that can inform business decisions and marketing strategies. Its role within the "mylib" libref further underscores its importance in data-driven analysis, making it a critical component for those seeking to leverage data for strategic advantage in the realm of personalized discounts and customer engagement.

**OUTPUT 16.1.7** (Rank: N/A)
Description: Output showing policy evaluation based on revenue, highlighting the optimal personalized policy s1

**TRADING STRATEGIES** (Rank: N/A)
Description: Various strategies compared based on forecasted market states, such as buying in a bull market and selling in a bear market

**DEEPPRICE** (Rank: N/A)
Description: DEEPPRICE is a sophisticated procedure designed to facilitate various tasks within the realm of data analysis. It serves as a tool for model information storage, ensuring that complex data structures and their corresponding parameters are efficiently organized and accessible for further analysis. Additionally, DEEPPRICE plays a crucial role in parameter learning, which involves the process of adjusting model parameters to improve the accuracy and reliability of data predictions and insights.

Beyond its foundational capabilities in data management and learning, DEEPPRICE is also employed to estimate patterns in user characteristics and pricing policies. This aspect of the procedure is particularly valuable for businesses and organizations seeking to understand consumer behavior and optimize their pricing strategies. By analyzing user data, DEEPPRICE can identify trends and correlations that inform more effective pricing decisions, ultimately enhancing the organization's ability to meet market demands and maximize profitability.

In summary, DEEPPRICE is a versatile and powerful procedure that combines model information storage, parameter learning, and pattern estimation to provide comprehensive data analysis solutions. Its application in understanding user characteristics and pricing policies underscores its importance in strategic decision-making processes, making it an indispensable tool for entities aiming to leverage data for competitive advantage.

**MARCH** (Rank: N/A)
Description: March is a month mentioned in the context of varying price sensitivity

**APRIL** (Rank: N/A)
Description: April is a month mentioned in the context of varying price sensitivity

**POLICY S2** (Rank: N/A)
Description: A pricing policy offering discounts to users with income less than 1 and who visit the website less than five days a week

**S1** (Rank: N/A)
Description: Policy of offering a discount to everyone

**S1DSTAR** (Rank: N/A)
Description: S1DSTAR is an optimized discount policy that generates less revenue than S1 and S1STAR

**MAY** (Rank: N/A)
Description: May is a month mentioned in the context of varying price sensitivity

**S1D** (Rank: N/A)
Description: S1D is an optimized discount policy that generates less revenue than S1 and S1STAR

**S2** (Rank: N/A)
Description: S2 is a personalized policy designed to offer discounts specifically to price-sensitive users, thereby improving upon the base policy price. The optimal strategy employed by S2 involves offering discounts exclusively to individuals who exhibit positive Individual Treatment Effect (ITE) values. This approach ensures that discounts are targeted effectively, maximizing the benefit for both the users and the entity implementing the policy. By focusing on users who are most likely to respond positively to price adjustments, S2 enhances the overall efficiency and effectiveness of the discount offering process.

**EXAMPLE 15.2** (Rank: N/A)
Description: Example 15.2 illustrates a personalized discount policy for an online media company


## Relevant Relationships:

**ONLINE MEDIA COMPANY → POLICY OPTIMIZATION** (Weight: 8.0)
Description: The online media company aims to optimize pricing policies to maximize revenue

**ONLINE MEDIA COMPANY → POLICY EVALUATION AND COMPARISON** (Weight: 8.0)
Description: The online media company is conducting policy evaluation and comparison to optimize pricing strategies

**DEEPPRICE PROCEDURE → POLICY EVALUATION AND COMPARISON** (Weight: 1.0)
Description: The DEEPPRICE procedure is related to the policy evaluation and comparison discussed in the document

**PROC DEEPPRICE → ONLINE MEDIA COMPANY** (Weight: 8.0)
Description: The online media company uses PROC DEEPPRICE to optimize pricing strategies

**DEEPPRICE PROCEDURE → SAS CLOUD ANALYTIC SERVICES** (Weight: 8.0)
Description: The DEEPPRICE procedure requires SAS Cloud Analytic Services to run

**DEEPPRICE PROCEDURE → SAS VIYA** (Weight: 7.0)
Description: The DEEPPRICE procedure is supported by SAS Viya's Deep Learning Programming Guide

**DEEPPRICE PROCEDURE → ODS** (Weight: 1.0)
Description: The DEEPPRICE procedure uses the Output Delivery System to create tables

**DEEPPRICE PROCEDURE → DEEPCAUSAL PROCEDURE** (Weight: 1.0)
Description: Both DEEPPRICE and DEEPCAUSAL procedures use deep neural networks for causal inference

**ONLINE MEDIA COMPANY → US** (Weight: 1.0)
Description: Some users of the online media company access its website from the US

**ONLINE MEDIA COMPANY → PRICING_SAMPLE** (Weight: 7.0)
Description: The PRICING_SAMPLE data set is used to analyze the personalized discount policy of the online media company

**ONLINE MEDIA COMPANY → PRICING_SAMPLE.CSV** (Weight: 1.0)
Description: The online media company uses the pricing_sample.csv dataset for personalized customer segmentation

**ONLINE MEDIA COMPANY → OUTPUT 16.1.2** (Weight: 7.0)
Description: Output 16.1.2 is used by the online media company to analyze user behavior

**ONLINE MEDIA COMPANY → OUTPUT 16.1.3** (Weight: 7.0)
Description: Output 16.1.3 is used by the online media company to analyze user behavior

**MICROSOFT → ONLINE MEDIA COMPANY** (Weight: 5.0)
Description: Microsoft provided data for the online media company to offer personalized pricing plans

**EXAMPLE 15.2 → ONLINE MEDIA COMPANY** (Weight: 1.0)
Description: Example 15.2 discusses the personalized discount policy of an online media company

**ONLINE MEDIA COMPANY → MUSIC SUBSCRIPTION SERVICE** (Weight: 8.0)
Description: The online media company offers the music subscription service to its customers

**DEEPPRICE PROCEDURE → DNN** (Weight: 8.0)
Description: The DEEPPRICE procedure uses DNNs for causal inference in a two-step semiparametric framework

**ONLINE MEDIA COMPANY → DISCOUNT SEASON** (Weight: 7.0)
Description: The online media company considers offering personalized discounts during the discount season

**OUTPUT 16.1.8 → DGP** (Weight: 7.0)
Description: Output 16.1.8 compares policies using true DGP revenues

**DEEPPRICE PROCEDURE → CAUSAL INFERENCE** (Weight: 9.0)
Description: The DEEPPRICE procedure is designed to perform causal inference using deep neural networks

**DEEPPRICE PROCEDURE → OBSERVATIONAL STUDY** (Weight: 1.0)
Description: The DEEPPRICE procedure treats experiments as observational studies to estimate causal effects

**OUTPUT 16.1.8 → POLICY S3** (Weight: 6.0)
Description: Policy s3 is worse than price according to Output 16.1.8

**OUTPUT 16.1.7 → POLICY S1** (Weight: 8.0)
Description: Policy s1 is highlighted in Output 16.1.7 as generating the most revenue

**OUTPUT 16.1.8 → POLICY S5** (Weight: 6.0)
Description: Policy s5 is the worst compared to price according to Output 16.1.8

**OUTPUT 16.1.8 → POLICY S0** (Weight: 6.0)
Description: Policy s0 is equivalent to price according to Output 16.1.8

**POLICY S1 → POLICY S1D** (Weight: 9.0)
Description: Policy s1d is a modified version of policy s1 that ensures prices do not exceed the original price


## Relevant Community Reports:

**SAS Analytical Tools and Online Media Company** (Rank: N/A)
# SAS Analytical Tools and Online Media Company

This community is centered around the use of SAS analytical tools, particularly the LIBNAME statement, DEEPPRICE, and MYLIB, in conjunction with an Online Media Company that leverages these tools for advanced data analysis and pricing strategies. The entities are interconnected through various data management and analysis processes, highlighting the integration of SAS capabilities in optimizing business operations and decision-making.

## LIBNAME Statement's Role in Data Management

The LIBNAME statement in SAS is a fundamental command used to assign library references to data sources, facilitating efficient data management and access within the SAS environment. It is integral to the SAS programming language, allowing users to define a libref, which acts as a shortcut or alias for a directory or data source. This capability is crucial for organizing and accessing data, particularly when dealing with large datasets that require distributed processing through SAS's Cloud Analytic Services (CAS) engine [Data: Entities (1681); Relationships (1858)].

## DEEPPRICE's Analytical Capabilities

DEEPPRICE is a sophisticated procedure designed to facilitate various tasks within data analysis, including model information storage, parameter learning, and pattern estimation. It is particularly valuable for businesses seeking to understand consumer behavior and optimize pricing strategies. By analyzing user data, DEEPPRICE can identify trends and correlations that inform more effective pricing decisions, enhancing the organization's ability to meet market demands and maximize profitability [Data: Entities (1754); Relationships (1927, 1934, 1936, 1937, 1938, 1939)].

## MYLIB as a Central Data Repository

MYLIB is a versatile library reference used within the SAS environment, specifically designed to facilitate data storage and management across various analytical and modeling processes. It connects to a CAS session, enabling efficient handling of data tables distributed across machine nodes. MYLIB serves as a central repository for storing a wide array of datasets, including those used in statistical analysis, modeling, and data processing tasks [Data: Entities (288); Relationships (248, 1958, 1964, 1965, 1966, 2003, 2397, 249, 2287, 2289, 3386, 492, 493, 494, 2266, 2267, 2268, 2269, 2270, 2271, 2272, 2273, 2274, 2275, 2276, 3178)].

## Online Media Company's Pricing Strategies

The Online Media Company is a dynamic organization specializing in music subscription services, leveraging advanced pricing strategies to enhance user engagement and maximize revenue. The company employs personalized discount policies and targeted discounts, ensuring that users receive offers most relevant to their listening habits and preferences. This approach allows the company to adjust prices based on individual user profiles, optimizing pricing policies to align with the price elasticity of demand [Data: Entities (1729); Relationships (1911, 1968, 1969)].

## Integration of SAS Tools in Business Operations

The integration of SAS tools such as DEEPPRICE and MYLIB into business operations is evident in the Online Media Company's use of these tools for data analysis and pricing strategy optimization. The company utilizes the PRICING_SAMPLE dataset to analyze personalized discount policies, demonstrating the practical application of SAS capabilities in real-world business scenarios. This integration underscores the importance of advanced analytical tools in driving data-driven decision-making and enhancing business performance [Data: Entities (1733); Relationships (1911, 1970, 1971)].


## Relevant Text Sources:

**Source 438** (Rank: N/A)
The estimated price elasticities can then be plotted against various characteristics of users. The following statements plot the relationship between elasticity and income, which is shown in Output 16.1.2: 
proc sort data=odetails ; 
by income; 
run; 
proc sgplot data=odetails ; 
series x=income y=elasticity /legendlabel='Sales elasticity prediction'; 
xaxis label='Income' values=(0 to 6 by 1); 
yaxis label='Song sales elasticity'; 
run; 

Output 16.1.2 shows a remarkable heterogeneity among users with respect to income for their response to a price increase. Users whose income is less than 1 (51% of all users) are more sensitive to a price increase; if price goes up by 1%, their number of songs purchased falls by 1% to 5%, compared to around 0.15% for higher-income users. 
The following statements plot the relationship between elasticity, income, and days_visited, which is shown in Output 16.1.3: 
proc sort data=odetails ; 
by days_visited income; 
run; 
proc sgpanel data=odetails; 
panelby days_visited/layout=panel columns=4 rows=2 uniscale=column; 
series x=income y=elasticity; 
colaxis label='Income' valueshint values=(0 1 2 3 4 5 6); 
rowaxis label='Song sales elasticity' grid; 
run; 

Output 16.1.3 further shows that low-income users who visited the website less often in the past (less than five days a week) are even more price-sensitive than low-income users who visited the website more often. The discussion next turns to policy evaluation and comparison. 
In policy optimization, the goal is to maximize the expected utility of a policy. For the definition of the expected utility function and how it is estimated, as well as details such as the definition of a policy rule, see the section Policy Evaluation and Comparison on page 984. In this example, the policy decisions are to choose user-specific prices ($\text{price}_i$) in order to maximize the revenue, which is defined as 
$$\text{Revenue}(\text{price}_i; x_i) = \left(\alpha(x_i) + \beta(x_i)\,\text{price}_i\right)\text{price}_i$$
The optimal price for each user is therefore derived as $\text{price}_i^* = -\dfrac{\alpha(x_i)}{2\beta(x_i)}$. Let s1 be the optimal policy that sets each user's price as $\text{price}_i^*$. This optimal policy is highly personalized, because it sets user-specific prices during the discount season, but those prices can exceed the original prices. It is therefore more meaningful to consider an optimized discount personalized policy, s1d, that is equal to s1 if s1 does not exceed the original price, and equal to 1 otherwise. 
Although the optimal pricing policy s1 guarantees the maximum revenue and is highly personalized, the online media company might want to consider a price discount that targets a group of users or all users. Whether the company's revenue will rise or fall depends on the price elasticity of songs demand. Microeconomic theory predicts that lowering prices will increase revenue if demand is price-elastic (elasticity < -1) and decrease revenue if demand is price-inelastic (elasticity > -1). 
Output 16.1.2 and Output 16.1.3 show that price elasticity is heterogeneous among users; the most price-sensitive users are those whose income is less than 1 and who spend on average less than five days a week on the website. This suggests a policy that offers this type of user a price discount. Let s2 be this policy. For comparison, also consider policies s3 (offer everyone a discount) and s0 (offer no one a discount). 
To select the discount rates, revenues were computed for policies s2 and s3 and for each of the six discount values 5%, 10%, 15%, 20%, 25%, and 30%. For policy s2, a 5% discount did not improve the base policy price, and although a 10%, 15%, 20%, 25%, or 30% discount generated more revenue than the base policy, there was no significant difference among the five discount rates. A 10% discount was therefore selected for policy s2. Using a similar analysis, a 5% discount was selected for policy s3. 
Policies s2 and s3 offer the same discount to all users or groups of users. Versions of these policies (s4 and s5) that randomly select with equal probability a discount rate from the set (5%, 10%, 15%, 20%, 25%, and 30%) are also considered. 
The following DATA step creates policies s0, s1, s1d, s2, s3, s4, and s5 in the data table odetails, which also contains the variable elasticity that was previously computed: 
data mylib.discountPolicy; 
array prob[6] _temporary_ (6*0.1666667); 
call streaminit(54321); 
set odetails; 
s0 =1; 
s1 = -_alpha_/(2*_beta_); 
if s1 > 1 then s1d = 1; 
else s1d = s1; 
if (income < 1 and days_visited <5) then s2 = 0.9; 
else s2=1; 
s3 = 0.95; 
if (income < 1 and days_visited <5) then s4

**Source 413** (Rank: N/A)
% level. 

For more information about the estimates of parameters of interest, policy evaluation, and policy comparison, see the sections Full-Population Average Effect Parameters on page 928 and Subpopulation Average Effect Parameters on page 930. 
Example 15.2: Personalized Discount Policy for an Online Media Company 
This example illustrates how a music subscription service from an online media company can offer targeted discounts through a personalized pricing plan based on many features that it observes about its customers to encourage them to buy more songs or become members. The main goal is to construct a policy that raises demand enough to boost overall revenue despite decreasing the price for some customers. 
The data set is provided by the Microsoft research project ALICE and is available at https://msalicedatapublic.z5.web.core.windows.net/datasets/Pricing/pricing_sample.csv. The data set has 10,000 simulated observations that represent customers' personal characteristics, such as age and log income, and online behavior history, such as previous purchase and previous online time per week. The treatment variable, t, is a binary variable that indicates whether or not a discount is applied. This variable is generated according to the values that the variable price takes in the data set. The value of t is 0 if the value of price is 1, indicating that no discount is given, and the value of t is 1 if the value of price is less than 1, indicating that a discount is applied. The outcome variable, revenue, is calculated by multiplying the number of songs purchased during the discount season by the price paid for the songs. Table 15.3 shows the names of the variables that are used in the model, their types, and their definitions. The type can be T (treatment), Y (outcome), $x^{(1)}$ (a covariate in the propensity score model), and/or $x^{(2)}$ (a covariate in the outcome model). 
Table 15.3 Model Variables 

| Name | Type | Details |
|---|---|---|
| account_age | $x^{(1)}$, $x^{(2)}$ | User's account age |
| age | $x^{(1)}$, $x^{(2)}$ | User's age |
| avg_hours | $x^{(1)}$, $x^{(2)}$ | Average number of hours user was online per week in the past |
| days_visited | $x^{(1)}$, $x^{(2)}$ | Average number of days user visited website per week in the past |
| friend_count | $x^{(1)}$, $x^{(2)}$ | Number of friends user connected to in account |
| has_membership | $x^{(1)}$, $x^{(2)}$ | Whether user has membership |
| is_US | $x^{(1)}$, $x^{(2)}$ | Whether user accesses website from US |
| songs_purchased | $x^{(1)}$, $x^{(2)}$ | Average number of songs user purchased per week in the past |
| income | $x^{(1)}$, $x^{(2)}$ | User's income |
| t | T | Whether a discount is applied |
| revenue | Y | Number of songs purchased during discount season times price paid |

Assuming that you have downloaded the data set pricing_sample in your session that is associated with the mylib libref, the following statements create an ID variable that has a unique value for each observation, the treatment variable (t), and the outcome variable (revenue) in the data table new_pricing_sample: 
data mylib.new_pricing_sample; 
set mylib.pricing_sample; 
id = put(_threadid_,8.) || '_' || Put(_n_,8.); * ID variable; 
if price<1 then t=1; else t=0; * treatment variable; 
revenue=price*demand; * outcome variable; 
run; 
The first step in policy evaluation and policy optimization is to estimate the effect of the treatment and to save to a specified output data table the details of the estimation, including $\alpha(\cdot)$, $\beta(\cdot)$, $p(\cdot)$, the residual, and the influence functions for each unit. You can do this by using the following statements: 
/*---Estimate the treatment effect and save the estimation details ---*/ 
proc deepcausal data=mylib.new_pricing_sample; 
id id; 
psmodel t = account_age age avg_hours days_visited friends_count has_membership is_US songs_purchased income / dnn=(nodes=(32 32 32 32) train=(optimizer=(miniBatchSize=500 regL1=0.0001 maxEpochs=32000 algorithm=adam) nthreads=20 seed=12345 recordseed=67890)); 
model revenue = account_age age avg_hours days_visited friends_count has_membership is_US songs_purchased income / dnn=(nodes=(32 32 32 32) train=(optimizer=(miniBatchSize=500 regL1=0.001 maxEpochs=32000 algorithm=adam) nthreads=20 seed=12345 recordseed=67890)); 
infer out=mylib.oest outdetails=mylib.odetails; 
run; 
For this example, the same covariates are used in both the propensity score model and the outcome model. The model estimation details are saved in the output data table odetails. 
The estimation results are shown in Output 15.2.1. The estimate of the average treatment effect, ATE, is negative and statistically significant, suggesting that the discount, on average, causes revenue that is generated by the whole population to decrease. However, the effect of the discount on the revenue among the customers who received a discount, which is measured by the parameter ATT (average treatment effect on the treated), is considerably different from the ATE estimate. It might suggest that identifying the characteristics of customers to whom the discount matters would help construct the optimum policy so that it uses the fewest resources and earns the most profit. 

In policy optimization, the goal is to maximize the expected utility of a policy. For the definition of the expected utility

**Source 436** (Rank: N/A)
 variable Z 
.2/ .2/
_epsy_ value of the residual in the outcome model, " i D yi . ...O x /C ..O x /ti/when
ii .2/ .2/
there is no instrumental variable, and " i D yi . ...O x /C ..O x /tOi/when there is 
ii .1/ .1/
the instrumental variable Z, where tOi D O0.x /C O1.x /zi
ii 
_eta0_ value of O0.x.1//for each observation. This variable is available only when there is 
i 
the instrumental variable Z. 
_eta1_ value of O1.x.1//for each observation. This variable is available only when there is 
i 
the instrumental variable Z. 
_alpha_ value of ..O x.2//for each observation 
i 
_beta_ value of ..x.2//for each observation 
i 
_alphatilde_ value of ..OQ x/for each observation. This variable is available only when there is the instrumental variable Z. 
_betatilde_ value of ..OQ x/for each observation. This variable is available only when there is the instrumental variable Z. .2/ .2/
_if_alpha_ value of ..yi;xi;ti;..O x /;..O x /;.xi/;t/for each observation when there is 
ii OQ .1/ .1/
no instrumental variable, and value of ..yi;xi;ti;zi;.OQi.xi/;.i.xi/;O0.x /;O1.x /;Z;t/
ii 
for each observation when there is the instrumental variable Z 
.2/ .2/
_if_beta_ value of ..yi;xi;ti;..O x /;..O x /;.xi/;t/for each observation when there is 
ii OQ .1/ .1/
no instrumental variable, and value of ..yi;xi;ti;zi;.OQi.xi/;.i.xi/;O0.x /;O1.x /;Z;t/
ii 
for each observation when there is the instrumental variable Z 
.2/ .2/
_policy_policy-var_ value of .s/.yi;xi;ti;..O xi /;..O xi /;.xi/;t/, where s is the value of a policy variable that you specify in the POLICY= option in the INFER statement. The estimate is displayed for each policy variable. .2/ .2/
_policy_comparison_policy-var_base-var_ value of .sc;sb/.yi;xi;ti;..O xi /;..O xi /;.xi/;t/, where sc is the value of a policy variable, POLICY-VAR, that you specify in the COMPARE= suboption of the POLICYCOMPARISON= option in the INFER statement and sb is the value of a policy variable, BASE-VAR, that you specify in the BASE= suboption of the POLICYCOMPARISON= option in the INFER statement. The estimate is displayed for each combination of sc and sb. 


ODS Table Names 
The DEEPPRICE procedure assigns a name to each table that it creates. You can use these names to refer to the tables when you use the Output Delivery System (ODS) to select tables and create output data tables. 
These names are listed in Table 16.2. 
Table 16.2 ODS Tables Produced in the DEEPPRICE Procedure 

| ODS Table Name | Description | Option |
|---|---|---|
| NObs | Number of observations | Default |
| ParameterEstimates | Parameter estimates | Default |
| PolicyComparison | Policy comparison | POLICYCOMPARISON= |
| PolicyEvaluation | Policy evaluation | POLICY= |



Examples: DEEPPRICE Procedure 
Example 16.1: Personalized Customer Segmentation for an Online Media Company 
This example illustrates how a music subscription service from an online media company can offer targeted discounts through a personalized pricing plan based on many features that the company observes about its customers to encourage them to buy more songs. 
The data set is provided by the Microsoft research project ALICE and is available at https://econmldata.azurewebsites.net/datasets/Pricing/pricing_sample.csv. The data set has 10,000 simulated observations that represent users' personal characteristics, such as age, log income, and account age, as well as users' online behavior history, including previous purchases and previous online time per week. The treatment variable, price, is the price that the customer was exposed to during the discount season. A value of 1 indicates that no discount is given, and a value less than 1 indicates that a discount is applied. The outcome variable, demand, is the number of songs that the customer purchased during the discount season. Table 16.3 shows the names of the variables that are used in the model, their types, and their definitions. The type can be T (treatment), Y (outcome), $x^{(1)}$ (a covariate in the treatment model), and/or $x^{(2)}$ (a covariate in the outcome model). 
Table 16.3 Model Variables 

Name  Type  Details  
account_age age avg_hours days_visited friend_count  x.1/, x.2/ x.1/, x.2/ x.1/, x.2/ x.1/, x.2/ x.1/, x.2/  User



=== END OF CONTEXT ===


# Understanding Relationship Weights in GraphRAG

In GraphRAG, the **weight** property on relationships represents the **strength** or **importance** of the connection between two entities. Here's what it means:

## What is Weight?

**Weight** is a numerical value attached to each relationship. Depending on how the index was built, it can reflect:
- **Extraction strength**: How strongly the extraction step rated the connection between the two entities
- **Frequency**: How often the same relationship was found across the source documents
- **Closeness in the graph**: How tightly the two entities are connected relative to other pairs

## How is Weight Calculated?

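The exact recipe depends on the indexing method and the GraphRAG version, so treat the list below as typical ingredients rather than a fixed formula:
1. **Relationship strength**: With LLM-based extraction, the model assigns a strength score to each relationship it extracts from a text unit
2. **Aggregation across text units**: When the same source/target pair is extracted from several text units/chunks, the scores are combined, so frequently mentioned relationships end up with larger weights
3. **Text co-occurrence**: NLP-based extraction instead derives weights from how often the entities appear together in the same text units/chunks
4. **Document frequency**: A relationship that appears across many documents accumulates more weight than one that appears once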
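
To make the accumulation idea concrete, here is a minimal sketch with pandas. It assumes that repeated extractions of the same source/target pair are merged by summing their strength scores; the table `raw_relationships`, its column names, and the scores are hypothetical and are not read from the index.

```python
import pandas as pd

# Hypothetical per-text-unit extractions: one row each time a relationship was found.
raw_relationships = pd.DataFrame(
    {
        "source": ["ONLINE MEDIA COMPANY", "ONLINE MEDIA COMPANY", "DEEPPRICE PROCEDURE"],
        "target": ["PRICING_SAMPLE", "PRICING_SAMPLE", "ODS"],
        "strength": [4.0, 3.0, 1.0],  # illustrative scores, not taken from the index
        "text_unit_id": ["tu-1", "tu-2", "tu-3"],
    }
)

# Assumption: duplicates of the same (source, target) pair are merged by summing.
merged = (
    raw_relationships.groupby(["source", "target"], sort=False)
    .agg(weight=("strength", "sum"), mentions=("text_unit_id", "count"))
    .reset_index()
)
print(merged)
```

Under this assumption a relationship mentioned twice with moderate scores can outweigh one mentioned once with a high score, which is why a weight is best read as a relative signal within one index.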

## Weight Values

- **Higher weights** (e.g., 8.0, 10.0): Strong, frequently occurring relationships
- **Lower weights** (e.g., 1.0, 2.0): Weaker or less frequent relationships
- **Weight = 1.0**: Often the default/minimum weight for detected relationships

## Usage in GraphRAG

Weights are used for:
- **Ranking relationships**: More important relationships get higher priority in search results
- **Graph visualization**: Thicker edges represent stronger relationships (as seen in the yfiles visualization)
- **Context selection**: Higher-weight relationships are more likely to be included in LLM context
- **Graph algorithms**: Centrality and community detection algorithms use weights to identify key entities
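
As an illustration of the visualization use, the sketch below maps weights linearly onto an edge thickness range. The function `edge_thickness` and the pixel bounds are hypothetical choices for this example; they are not part of GraphRAG or `yfiles-jupyter-graphs`, and the actual widget may scale edges differently.

```python
def edge_thickness(weight, min_weight, max_weight, min_px=1.0, max_px=8.0):
    """Linearly map a weight onto a thickness between min_px and max_px."""
    if max_weight == min_weight:
        # All weights are equal, so fall back to the middle of the range.
        return (min_px + max_px) / 2
    fraction = (weight - min_weight) / (max_weight - min_weight)
    return min_px + fraction * (max_px - min_px)


# Illustrative weights spanning the range seen in the context above.
weights = [1.0, 5.0, 9.0]
thicknesses = [edge_thickness(w, min(weights), max(weights)) for w in weights]
print(thicknesses)  # [1.0, 4.5, 8.0]
```

Scaling relative to the minimum and maximum of the current result keeps thin and thick edges distinguishable even when all weights fall in a narrow band.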

Let's examine the weights in our current result:

In [15]:
# Analyze relationship weights in our current search result
print("=== RELATIONSHIP WEIGHT ANALYSIS ===\n")

if "relationships" in result.context_data:
    relationships_df = result.context_data["relationships"]
    
    if 'weight' in relationships_df.columns:
        # Convert weights to numeric, handling any string values
        relationships_df['weight_numeric'] = pd.to_numeric(relationships_df['weight'], errors='coerce')
        weights = relationships_df['weight_numeric'].dropna()
        
        if len(weights) > 0:
            print("📊 Weight Statistics:")
            print(f"   - Total relationships: {len(relationships_df)}")
            print(f"   - Relationships with weights: {len(weights)}")
            print(f"   - Weight range: {weights.min():.2f} to {weights.max():.2f}")
            print(f"   - Average weight: {weights.mean():.2f}")
            print(f"   - Median weight: {weights.median():.2f}")
            
            print(f"\n🔝 Top 5 Strongest Relationships (by weight):")
            top_relationships = relationships_df.nlargest(5, 'weight_numeric')[['source', 'target', 'weight_numeric', 'description']]
            for idx, row in top_relationships.iterrows():
                weight_val = row['weight_numeric']
                if pd.notna(weight_val):
                    print(f"   {row['source']} → {row['target']} (Weight: {weight_val:.2f})")
                else:
                    print(f"   {row['source']} → {row['target']} (Weight: N/A)")
                desc = str(row['description'])[:100] if pd.notna(row['description']) else "No description"
                print(f"      Description: {desc}...")
                print()
            
            print("🔻 Bottom 5 Weakest Relationships (by weight):")
            bottom_relationships = relationships_df.nsmallest(5, 'weight_numeric')[['source', 'target', 'weight_numeric', 'description']]
            for idx, row in bottom_relationships.iterrows():
                weight_val = row['weight_numeric']
                if pd.notna(weight_val):
                    print(f"   {row['source']} → {row['target']} (Weight: {weight_val:.2f})")
                else:
                    print(f"   {row['source']} → {row['target']} (Weight: N/A)")
                desc = str(row['description'])[:100] if pd.notna(row['description']) else "No description"
                print(f"      Description: {desc}...")
                print()
            
            # Weight distribution
            print("📈 Weight Distribution:")
            weight_bins = [0, 1, 2, 5, 10, float('inf')]
            weight_labels = ['0-1', '1-2', '2-5', '5-10', '10+']
            
            # Bins are half-open [low, high); the last bin has no upper bound
            for label, low, high in zip(weight_labels, weight_bins[:-1], weight_bins[1:]):
                in_bin = weights >= low
                if high != float('inf'):
                    in_bin = in_bin & (weights < high)
                count = int(in_bin.sum())

                # len(weights) > 0 is guaranteed by the enclosing check
                percentage = count / len(weights) * 100
                print(f"   - Weight {label}: {count} relationships ({percentage:.1f}%)")
        else:
            print("❌ No valid numeric weights found in relationships data")
            print(f"Sample weight values: {relationships_df['weight'].head().tolist()}")
    else:
        print("❌ No 'weight' column found in relationships data")
        print(f"Available columns: {list(relationships_df.columns)}")
else:
    print("❌ No relationships found in context data")

=== RELATIONSHIP WEIGHT ANALYSIS ===

📊 Weight Statistics:
   - Total relationships: 26
   - Relationships with weights: 26
   - Weight range: 1.00 to 9.00
   - Average weight: 5.62
   - Median weight: 7.00

🔝 Top 5 Strongest Relationships (by weight):
   DEEPPRICE PROCEDURE → CAUSAL INFERENCE (Weight: 9.00)
      Description: The DEEPPRICE procedure is designed to perform causal inference using deep neural networks...

   POLICY S1 → POLICY S1D (Weight: 9.00)
      Description: Policy s1d is a modified version of policy s1 that ensures prices do not exceed the original price...

   ONLINE MEDIA COMPANY → POLICY OPTIMIZATION (Weight: 8.00)
      Description: The online media company aims to optimize pricing policies to maximize revenue...

   ONLINE MEDIA COMPANY → POLICY EVALUATION AND COMPARISON (Weight: 8.00)
      Description: The online media company is conducting policy evaluation and comparison to optimize pricing strategi...

   PROC DEEPPRICE → ONLINE MEDIA COMPANY (Weight: 8.