# Sovereign Triad Corpus Analysis

This notebook analyzes the `triad_corpus.json` file to extract insights about the Sovereign Triad framework. The analysis proceeds in five steps:
1.  **Load and Inspect Data**: Load the JSON corpus and examine its structure.
2.  **Key Term Frequency**: Count the occurrences of core concepts like "Truth," "Wisdom," and "Humanity."
3.  **Visualize Term Usage**: Plot a heatmap showing how term usage varies across the subsections of the document.
4.  **Relationship Graph**: Build and visualize a network graph of the subsections, linking those that share a main section.
5.  **Centrality Analysis**: Identify the most central subsections in that graph using network centrality metrics.

## 1. Load and Inspect the Corpus Data
First, we load the `triad_corpus.json` file and inspect its structure. The later cells assume that each record has at least a `subsection` field and a `paragraph` field.

In [None]:
import json
import pandas as pd

# Load the corpus data
with open('triad_corpus.json', 'r', encoding='utf-8') as f:
    corpus_data = json.load(f)

# Convert to a pandas DataFrame for easier analysis
df = pd.DataFrame(corpus_data)

# Display the first few rows and the structure of the DataFrame
print("Corpus Data Structure:")
df.info()  # prints its summary directly and returns None, so it is not wrapped in print()
print("\nFirst 5 Rows:")
print(df.head())
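
The later cells index `df['paragraph']` and `df['subsection']`, so a quick schema check before building the DataFrame can catch a malformed corpus early. A minimal sketch, assuming the corpus is a list of JSON objects with those two string fields; the `sample_json` records and the `find_bad_records` helper are hypothetical and only illustrate the idea:

```python
import json

REQUIRED_FIELDS = ("subsection", "paragraph")  # the fields the later cells rely on

def find_bad_records(records):
    """Return (index, missing_fields) for every record lacking a required string field."""
    problems = []
    for i, record in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if not isinstance(record.get(f), str)]
        if missing:
            problems.append((i, missing))
    return problems

# Hypothetical sample standing in for the contents of triad_corpus.json
sample_json = """[
  {"subsection": "Part I > Truth", "paragraph": "Truth grounds wisdom."},
  {"subsection": "Part I > Wisdom"},
  {"paragraph": "Humanity binds the triad."}
]"""

sample_records = json.loads(sample_json)
bad_records = find_bad_records(sample_records)
for index, missing in bad_records:
    print(f"record {index} is missing: {', '.join(missing)}")
```

An empty result would mean every record carries both fields and the DataFrame columns used below should exist.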


## 2. Analyze Key Term Frequency
Here, we'll count the occurrences of the core terms "Truth", "Wisdom", and "Humanity".

In [None]:
from collections import Counter
import re

# Define key terms
key_terms = ['truth', 'wisdom', 'humanity']

# Concatenate all text for a global count
all_text = ' '.join(df['paragraph']).lower()

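# Build one whole-word pattern covering every key term (all_text is already lowercased)
pattern = r'\b(' + '|'.join(map(re.escape, key_terms)) + r')\b'
term_counts = Counter(re.findall(pattern, all_text))

print("Key Term Frequencies:")
for term in key_terms:
    # A Counter returns 0 for missing keys, so terms that never occur are still reported
    print(f"- {term.capitalize()}: {term_counts[term]}")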
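
The `\b` anchors restrict matches to whole words, so plural or derived forms are not counted. A small sketch on a made-up sentence (the `sample_text` string is hypothetical, not taken from the corpus) shows what the pattern does and does not match:

```python
import re
from collections import Counter

key_terms = ['truth', 'wisdom', 'humanity']
pattern = r'\b(' + '|'.join(map(re.escape, key_terms)) + r')\b'

# Hypothetical text, lowercased the same way as all_text above
sample_text = "Truth and truths differ; truthful people value truth. Humanity's wisdom.".lower()

sample_counts = Counter(re.findall(pattern, sample_text))
for term in key_terms:
    print(term, sample_counts[term])
```

Here "truths" and "truthful" are skipped, while the possessive "humanity's" still counts because the apostrophe acts as a word boundary. If derived forms should be included, the pattern would have to be widened deliberately.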


## 3. Visualize Term Usage Across Sections
Now, let's visualize how the usage of these terms varies across the subsections of the document.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

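# Calculate term frequency per subsection
def count_terms(text):
    """Count whole-word occurrences of each key term in a lowercased string."""
    return pd.Series({term: len(re.findall(r'\b' + re.escape(term) + r'\b', text)) for term in key_terms})

# Join each subsection's paragraphs into one lowercased string, then count terms in it
section_text = df.groupby('subsection')['paragraph'].apply(lambda x: ' '.join(x).lower())
term_freq_by_section = section_text.apply(count_terms)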

# Plotting the heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(term_freq_by_section, annot=True, cmap='viridis', fmt='g')
plt.title('Key Term Frequency by Subsection')
plt.xlabel('Key Terms')
plt.ylabel('Subsection')
plt.xticks(rotation=45)
plt.show()
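
The heatmap shows raw counts, which tend to favor longer subsections. One possible adjustment, not part of the original analysis, is to express each count per 1,000 words. A minimal sketch, where `rate_per_1000_words` and the two sample texts are hypothetical:

```python
import re

key_terms = ['truth', 'wisdom', 'humanity']

def rate_per_1000_words(text):
    """Return occurrences of each key term per 1,000 words of text."""
    words = re.findall(r"\b\w+\b", text.lower())
    total = len(words)
    if total == 0:
        # Avoid dividing by zero for empty subsections
        return {term: 0.0 for term in key_terms}
    return {term: 1000 * words.count(term) / total for term in key_terms}

# Hypothetical subsection texts of different lengths, each mentioning "truth" once
short_text = "Truth matters."
long_text = "Truth matters. " + "Other words fill this longer passage. " * 3

short_rates = rate_per_1000_words(short_text)
long_rates = rate_per_1000_words(long_text)
print(short_rates['truth'], long_rates['truth'])
```

Both texts contain the term once, but the rate is ten times higher in the short one. The resulting dictionaries can be wrapped in `pd.Series` in the same way as the raw counts above if a normalized heatmap is wanted.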


## 4. Construct and Visualize the Relationship Graph
We'll create a graph where each subsection is a node. An edge connects two nodes if they belong to the same main section, taken here to be the part of the subsection label before the first ` > `.

In [None]:
import networkx as nx
from itertools import combinations

# Create a graph
G = nx.Graph()

# Add nodes
for subsection in df['subsection'].unique():
    G.add_node(subsection)

# Derive the main section as the text before the first ' > ' in the subsection label,
# then connect every pair of subsections that share a main section
df['main_section'] = df['subsection'].apply(lambda x: x.split(' > ')[0])
for section, group in df.groupby('main_section'):
    for u, v in combinations(group['subsection'].unique(), 2):
        G.add_edge(u, v)

# Visualize the graph
plt.figure(figsize=(14, 14))
pos = nx.spring_layout(G, k=0.5, iterations=50)
nx.draw(G, pos, with_labels=True, node_size=50, font_size=8, node_color='skyblue', edge_color='gray')
plt.title('Relationship Graph of Subsections')
plt.show()
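
Because each subsection maps to exactly one main section, this rule produces a set of disconnected cliques: one fully connected cluster per main section, with no edges between clusters. A label with no ` > ` separator becomes its own main section and stays isolated. A small sketch with hypothetical labels (plain Python, no `networkx`) makes the structure explicit:

```python
from itertools import combinations

# Hypothetical subsection labels following the 'Main > Sub' convention
subsections = [
    "Part I > Truth",
    "Part I > Wisdom",
    "Part I > Humanity",
    "Part II > Practice",
    "Part II > Limits",
    "Appendix",
]

# Group subsections by the text before the first ' > '
groups = {}
for label in subsections:
    groups.setdefault(label.split(' > ')[0], []).append(label)

# Same rule as the cell above: connect every pair inside a group
edges = [pair for members in groups.values() for pair in combinations(members, 2)]

print(sorted(groups))
print(len(edges))  # 3 pairs from Part I, 1 from Part II, none from Appendix
```

A main section with k subsections contributes k(k - 1)/2 edges, so large sections dominate the drawing.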


## 5. Identify Central Concepts with Network Analysis
Finally, we'll calculate centrality measures to identify the most influential subsections.

In [None]:
# Calculate centrality measures
degree_centrality = nx.degree_centrality(G)
betweenness_centrality = nx.betweenness_centrality(G)
eigenvector_centrality = nx.eigenvector_centrality(G, max_iter=1000)  # raised from the default of 100 to give the power iteration room to converge

# Create a DataFrame for centrality scores
centrality_df = pd.DataFrame({
    'Degree Centrality': pd.Series(degree_centrality),
    'Betweenness Centrality': pd.Series(betweenness_centrality),
    'Eigenvector Centrality': pd.Series(eigenvector_centrality)
})

# Sort by degree centrality to find the most connected subsections
print("Top 5 Subsections by Degree Centrality:")
print(centrality_df.sort_values(by='Degree Centrality', ascending=False).head())

print("\nTop 5 Subsections by Betweenness Centrality:")
print(centrality_df.sort_values(by='Betweenness Centrality', ascending=False).head())
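
When reading these tables, it helps to remember how the graph was built. Every connected component is a clique, so any two nodes in the same component are directly adjacent: no shortest path passes through a third node, and betweenness centrality is 0 for every subsection. Degree centrality, which `networkx` defines as a node's degree divided by n - 1, then simply reflects how many subsections share the node's main section. A minimal pure-Python sketch on hypothetical labels, recomputing degree centrality by hand:

```python
from itertools import combinations

# Hypothetical main sections and their subsections
groups = {
    "Part I": ["Part I > Truth", "Part I > Wisdom", "Part I > Humanity"],
    "Part II": ["Part II > Practice", "Part II > Limits"],
}

# Build adjacency sets with the same pairwise rule used for G
nodes = [label for members in groups.values() for label in members]
neighbors = {label: set() for label in nodes}
for members in groups.values():
    for u, v in combinations(members, 2):
        neighbors[u].add(v)
        neighbors[v].add(u)

n = len(nodes)
# Same normalization as nx.degree_centrality: degree / (n - 1)
degree_centrality = {label: len(adj) / (n - 1) for label, adj in neighbors.items()}

for label, score in sorted(degree_centrality.items(), key=lambda item: -item[1]):
    print(f"{label}: {score:.2f}")
```

All subsections of the larger main section tie for the top score, so the ranking mostly measures section size. A graph with edges between main sections, for example edges based on shared key terms, would be needed for betweenness to carry information.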
