# GitHub Repository Analyzer Agent - With LangSmith Tracing

**Author**: Kush Sahni (2210110371)  
**Course**: MAT496 - Introduction to LLM  
**Project**: GitHub Repository Analyzer using LangGraph

---

##  What This Notebook Demonstrates

This notebook shows the core concepts of building an AI agent with **full observability**:

1. **State Management** - How agents maintain memory
2. **Tool Design** - GitHub API integration  
3. **ReAct Pattern** - Reasoning + Acting loop
4. **LangSmith Tracing** - Complete visibility into agent execution 
5. **LangGraph Studio Export** - Visual graph editing and testing 

Let's build with full observability! 

---
## Section 1: Setup

First, let's set up our environment with tracing enabled.

In [76]:
# Install required packages (run only once)
# !pip install langchain langchain-openai langgraph python-dotenv PyGithub langsmith

In [77]:
# Load environment variables
import os
from dotenv import load_dotenv

load_dotenv(override=True)

# Suppress warnings
import warnings
warnings.filterwarnings("ignore")

print(" Environment loaded!")

 Environment loaded!


### Enable LangSmith Tracing

 **LangSmith** provides complete observability:
- See every LLM call and response
- Track all tool invocations
- Monitor token usage and costs
- Debug issues easily
- Share traces with team

Get your API key at: [https://smith.langchain.com/settings](https://smith.langchain.com/settings)

In [78]:
# Enable LangSmith tracing
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "GitHub-Analyzer-Simplified"

# Check if LangSmith API key is configured
langsmith_key = os.getenv("LANGCHAIN_API_KEY") or os.getenv("LANGSMITH_API_KEY")

if langsmith_key:
    print(" LangSmith tracing enabled!")
    print(f"   Project: {os.environ['LANGCHAIN_PROJECT']}")
    print("   View traces at: https://smith.langchain.com/")
    print("\n All agent runs will be automatically traced!")
else:
    print("  LangSmith API key not found")
    print("   Tracing is disabled. Add LANGCHAIN_API_KEY to .env to enable")
    print("   Get your key at: https://smith.langchain.com/settings")

 LangSmith tracing enabled!
   Project: GitHub-Analyzer-Simplified
   View traces at: https://smith.langchain.com/

 All agent runs will be automatically traced!


In [79]:
# Verify API keys
print("API Key Status:")
print("=" * 50)
print(f"GitHub Token:  {'' if os.getenv('GITHUB_TOKEN') else ''}")
print(f"OpenAI API:    {'' if os.getenv('OPENAI_API_KEY') else ''}")
print(f"LangSmith API: {'' if langsmith_key else ''}")

API Key Status:
GitHub Token:  
OpenAI API:    
LangSmith API: 


---
## Section 2: Core Concepts

### 2.1 State Management

In [80]:
from typing import TypedDict, Annotated, Sequence
from langchain_core.messages import BaseMessage
from langgraph.graph import add_messages
from langsmith import traceable

class GitHubAgentState(TypedDict):
    """State schema for our GitHub analyzer agent."""
    messages: Annotated[Sequence[BaseMessage], add_messages]
    files: dict[str, str]
    current_repo: str | None
    remaining_steps: int

@traceable(name="get_initial_state")
def get_initial_state() -> GitHubAgentState:
    """Create a fresh state."""
    return {
        "messages": [],
        "files": {},
        "current_repo": None,
        "remaining_steps": 20
    }

print(" State schema defined with tracing!")

 State schema defined with tracing!


In [81]:
from langchain_core.messages import HumanMessage, AIMessage

# Demonstrate add_messages behavior
messages = []
messages = add_messages(messages, [HumanMessage(content="What is LangGraph?")])
messages = add_messages(messages, [AIMessage(content="LangGraph is a framework for building stateful agents.")])

print("Messages after adding:")
for i, msg in enumerate(messages, 1):
    print(f"{i}. {msg.__class__.__name__}: {msg.content}")

print("\n Messages automatically accumulated!")

Messages after adding:
1. HumanMessage: What is LangGraph?
2. AIMessage: LangGraph is a framework for building stateful agents.

 Messages automatically accumulated!


---
## Section 3: Building Tools with Tracing

In [82]:
from langchain_core.tools import tool
from github import Github
import base64

# Initialize GitHub client
github_client = Github(os.getenv("GITHUB_TOKEN"))

print(" GitHub client initialized")

 GitHub client initialized


In [83]:
@tool
@traceable(name="get_repository_info")  # LangSmith will trace this tool
def get_repository_info(repo_name: str) -> str:
    """Get basic information about a GitHub repository.
    
    Args:
        repo_name: Repository name in 'owner/repo' format
    
    Returns:
        Formatted repository information
    """
    try:
        repo = github_client.get_repo(repo_name)
        
        info = f"""# Repository: {repo.full_name}

**Description:** {repo.description or 'No description'}
**Stars:**  {repo.stargazers_count:,}
**Forks:**  {repo.forks_count:,}
**Language:** {repo.language or 'Not specified'}
**Open Issues:**  {repo.open_issues_count}
**Last Updated:** {repo.updated_at}
**URL:** {repo.html_url}
"""
        return info
    except Exception as e:
        return f"Error: {str(e)}"

print(" Tool 1: get_repository_info (with tracing)")

 Tool 1: get_repository_info (with tracing)


In [84]:
@tool
@traceable(name="list_repository_structure")
def list_repository_structure(repo_name: str, path: str = "", max_depth: int = 2) -> str:
    """Get directory tree structure of a GitHub repository."""
    try:
        repo = github_client.get_repo(repo_name)
        
        def build_tree(current_path: str, depth: int = 0, prefix: str = "") -> list[str]:
            if depth > max_depth:
                return []
            
            output = []
            contents = repo.get_contents(current_path)
            
            if isinstance(contents, list):
                dirs = sorted([c for c in contents if c.type == "dir"], key=lambda x: x.name)
                files = sorted([c for c in contents if c.type == "file"], key=lambda x: x.name)
                contents = dirs + files
            else:
                contents = [contents]
            
            for idx, item in enumerate(contents[:20]):
                is_last = idx == min(len(contents), 20) - 1
                connector = " " if is_last else " "
                
                if item.type == "dir":
                    output.append(f"{prefix}{connector} {item.name}/")
                    if depth < max_depth:
                        new_prefix = prefix + ("    " if is_last else "   ")
                        output.extend(build_tree(item.path, depth + 1, new_prefix))
                else:
                    output.append(f"{prefix}{connector} {item.name}")
            
            return output
        
        tree = build_tree(path)
        header = f"Repository: {repo_name}\nPath: /{path or 'root'}\n" + "="*50 + "\n"
        return header + "\n".join(tree)
    except Exception as e:
        return f"Error: {str(e)}"

print(" Tool 2: list_repository_structure (with tracing)")

 Tool 2: list_repository_structure (with tracing)


In [85]:
@tool
@traceable(name="read_file_from_repo")
def read_file_from_repo(repo_name: str, file_path: str, ref: str = "main") -> str:
    """Read a specific file from a GitHub repository."""
    try:
        repo = github_client.get_repo(repo_name)
        
        try:
            file_content = repo.get_contents(file_path, ref=ref)
        except:
            for alt_ref in ["master", "main", "develop"]:
                try:
                    file_content = repo.get_contents(file_path, ref=alt_ref)
                    ref = alt_ref
                    break
                except:
                    continue
            else:
                return f"File not found: {file_path}"
        
        if isinstance(file_content, list):
            return f"Error: {file_path} is a directory"
        
        content = base64.b64decode(file_content.content).decode('utf-8')
        
        return f"""# File: {file_path}
Repository: {repo_name}
Branch: {ref}

{'='*60}

{content}
"""
    except Exception as e:
        return f"Error: {str(e)}"

print(" Tool 3: read_file_from_repo (with tracing)")

 Tool 3: read_file_from_repo (with tracing)


---
## Section 4: Building the Agent

In [86]:
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolNode
from langchain_core.messages import SystemMessage
from datetime import datetime

# Initialize LLM
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Gather all tools
tools = [
    get_repository_info,
    list_repository_structure,
    read_file_from_repo
]

# Bind tools to the model
model_with_tools = model.bind_tools(tools)

# System prompt
SYSTEM_PROMPT = f"""You are a helpful GitHub Repository Analyzer Agent.

**Current Date:** {datetime.now().strftime('%Y-%m-%d')}

**Your Capabilities:**
- Getting repository information (stars, language, description)
- Listing directory structures
- Reading source code files

**Guidelines:**
- Be concise in your analysis
- Focus on answering the user's specific question
- Use tools to gather information before answering
"""

# Define the agent node
@traceable(name="agent_node")
def agent_node(state: GitHubAgentState) -> dict:
    """The agent node that calls the LLM."""
    messages = state["messages"]
    
    # Prepend system message if not already present
    if not messages or not isinstance(messages[0], SystemMessage):
        messages = [SystemMessage(content=SYSTEM_PROMPT)] + list(messages)
    
    response = model_with_tools.invoke(messages)
    return {"messages": [response]}

# Define the conditional edge function
def should_continue(state: GitHubAgentState) -> str:
    """Determine if we should continue to tools or end."""
    messages = state["messages"]
    last_message = messages[-1]
    
    # If the LLM made tool calls, route to tools
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    # Otherwise, end
    return END

# Create the tool node
tool_node = ToolNode(tools)

# Build the graph manually
workflow = StateGraph(GitHubAgentState)

# Add nodes
workflow.add_node("agent", agent_node)
workflow.add_node("tools", tool_node)

# Set entry point
workflow.set_entry_point("agent")

# Add conditional edges
workflow.add_conditional_edges(
    "agent",
    should_continue,
    {
        "tools": "tools",
        END: END
    }
)

# Tools always go back to agent
workflow.add_edge("tools", "agent")

# Compile the graph
agent = workflow.compile()

print("✅ GitHub Analyzer Agent created with full tracing!")
print(f"   Model: {model.model_name}")
print(f"   Tools: {len(tools)}")
print("   Graph: Manually constructed with agent_node and tool_node")

✅ GitHub Analyzer Agent created with full tracing!
   Model: gpt-4o-mini
   Tools: 3
   Graph: Manually constructed with agent_node and tool_node


### Export Agent for LangGraph Studio

This cell automatically creates a `.py` file that can be loaded in LangGraph Studio.

In [87]:
# Export agent code for LangGraph Studio
export_code = '''"""GitHub Analyzer Agent for LangGraph Studio

Auto-generated from notebook. Load this file in LangGraph Studio.
"""
import os
from typing import TypedDict, Annotated, Sequence
from dotenv import load_dotenv

from langchain_core.messages import BaseMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import add_messages
from langgraph.prebuilt import create_react_agent
from github import Github
from langsmith import traceable
import base64
from datetime import datetime

load_dotenv(override=True)
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "GitHub-Analyzer-Studio"

github_client = Github(os.getenv("GITHUB_TOKEN"))

class GitHubAgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]
    files: dict[str, str]
    current_repo: str | None
    remaining_steps: int

@tool
@traceable
def get_repository_info(repo_name: str) -> str:
    """Get basic information about a GitHub repository."""
    try:
        repo = github_client.get_repo(repo_name)
        return f"""# {repo.full_name}
 {repo.stargazers_count:,} stars |  {repo.forks_count:,} forks
{repo.description or 'No description'}
Language: {repo.language}
{repo.html_url}"""
    except Exception as e:
        return f"Error: {e}"

@tool
@traceable
def list_repository_structure(repo_name: str, path: str = "", max_depth: int = 2) -> str:
    """Get directory tree of a repository."""
    try:
        repo = github_client.get_repo(repo_name)
        contents = repo.get_contents(path)
        if isinstance(contents, list):
            items = [f"{'' if c.type=='dir' else ''} {c.name}" for c in contents[:20]]
            return f"Contents of {repo_name}/{path or 'root'}:\n" + "\n".join(items)
        return str(contents)
    except Exception as e:
        return f"Error: {e}"

@tool
@traceable
def read_file_from_repo(repo_name: str, file_path: str, ref: str = "main") -> str:
    """Read a file from a repository."""
    try:
        repo = github_client.get_repo(repo_name)
        try:
            file_content = repo.get_contents(file_path, ref=ref)
        except:
            file_content = repo.get_contents(file_path, ref="master")
        content = base64.b64decode(file_content.content).decode('utf-8')
        return f"# {file_path}\n{'='*60}\n{content}"
    except Exception as e:
        return f"Error: {e}"

model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
tools = [get_repository_info, list_repository_structure, read_file_from_repo]

system_prompt = f"""GitHub Repository Analyzer Agent
Date: {datetime.now().strftime('%Y-%m-%d')}

Analyze GitHub repositories using available tools.
Be concise and use tools before answering.
"""

graph = create_react_agent(
    model=model,
    tools=tools,
    state_schema=GitHubAgentState,
    state_modifier=system_prompt
)

# This 'graph' variable will be detected by LangGraph Studio
'''

# Write to file
with open('github_analyzer_studio.py', 'w', encoding='utf-8') as f:
    f.write(export_code)

print(" Exported agent to: github_analyzer_studio.py")
print("\n To use in LangGraph Studio:")
print("   1. Open LangGraph Studio")
print("   2. Load 'github_analyzer_studio.py'")
print("   3. The 'graph' variable will be auto-detected")
print("   4. You can now visualize and test interactively!")

 Exported agent to: github_analyzer_studio.py

 To use in LangGraph Studio:
   1. Open LangGraph Studio
   2. Load 'github_analyzer_studio.py'
   3. The 'graph' variable will be auto-detected
   4. You can now visualize and test interactively!


---
## Demo: Analyze Repositories

In [88]:
@traceable(name="analyze_repository")  # Trace the entire analysis
def analyze_repository(repo_name: str, question: str):
    """Analyze a GitHub repository with a specific question."""
    
    initial_state = get_initial_state()
    initial_state["current_repo"] = repo_name
    initial_state["messages"] = [
        {"role": "user", "content": f"Repository: {repo_name}\n\nQuestion: {question}"}
    ]
    
    print(f" Analyzing: {repo_name}")
    print(f" Question: {question}")
    print("\n" + "="*70 + "\n")
    
    # Show trace URL if enabled
    if os.getenv("LANGCHAIN_TRACING_V2") == "true":
        print(" LangSmith Trace: https://smith.langchain.com/")
        print(f"   Project: {os.environ.get('LANGCHAIN_PROJECT', 'default')}\n")
    
    step_count = 0
    for step in agent.stream(initial_state, stream_mode="updates"):
        step_count += 1
        
        if "agent" in step:
            messages = step["agent"].get("messages", [])
            for msg in messages:
                if hasattr(msg, 'content') and msg.content:
                    print(f"\n Agent (Step {step_count}):")
                    print(msg.content)
                
                if hasattr(msg, 'tool_calls') and msg.tool_calls:
                    for tc in msg.tool_calls:
                        print(f"\n Calling tool: {tc['name']}")
                        print(f"   Args: {tc['args']}")
        
        if "tools" in step:
            messages = step["tools"].get("messages", [])
            for msg in messages:
                if hasattr(msg, 'content'):
                    content = msg.content[:300]
                    print(f"\n Tool Result:")
                    print(content, "..." if len(msg.content) > 300 else "")
    
    print("\n" + "="*70)
    print(f" Analysis complete! ({step_count} steps)")

print(" analyze_repository function ready (with full tracing)")

 analyze_repository function ready (with full tracing)


### Example 1: Repository Overview

In [92]:
analyze_repository(
    repo_name="MAT496-Monsoon2025-SNU/capstone-template",
    question="What is this repository about? Give me a brief overview from its readme"
)

 Analyzing: MAT496-Monsoon2025-SNU/capstone-template
 Question: What is this repository about? Give me a brief overview from its readme


 LangSmith Trace: https://smith.langchain.com/
   Project: GitHub-Analyzer-Simplified


 Calling tool: get_repository_info
   Args: {'repo_name': 'MAT496-Monsoon2025-SNU/capstone-template'}

 Tool Result:
# Repository: MAT496-Monsoon2025-SNU/capstone-template

**Description:** No description
**Stars:**  0
**Forks:**  45
**Language:** Not specified
**Open Issues:**  0
**Last Updated:** 2025-11-29 07:35:45+00:00
**URL:** https://github.com/MAT496-Monsoon2025-SNU/capstone-template
 

 Calling tool: list_repository_structure
   Args: {'repo_name': 'MAT496-Monsoon2025-SNU/capstone-template', 'max_depth': 1}

 Tool Result:
Repository: MAT496-Monsoon2025-SNU/capstone-template
Path: /root
  .gitignore
  README.md 

 Calling tool: read_file_from_repo
   Args: {'repo_name': 'MAT496-Monsoon2025-SNU/capstone-template', 'file_path': 'README.md'}

 Tool Result:
# 

### Example 2: Directory Structure

In [93]:
analyze_repository(
    repo_name="MAT496-Monsoon2025-SNU/Kush-Sahni-2210110371-Langgraph-MAT496",
    question="What is the main directory structure and what are the main files?"
)

 Analyzing: MAT496-Monsoon2025-SNU/Kush-Sahni-2210110371-Langgraph-MAT496
 Question: What is the main directory structure and what are the main files?


 LangSmith Trace: https://smith.langchain.com/
   Project: GitHub-Analyzer-Simplified


 Calling tool: list_repository_structure
   Args: {'repo_name': 'MAT496-Monsoon2025-SNU/Kush-Sahni-2210110371-Langgraph-MAT496', 'max_depth': 2}

 Tool Result:
Repository: MAT496-Monsoon2025-SNU/Kush-Sahni-2210110371-Langgraph-MAT496
Path: /root
  langchain-academy-my-version/
     module-0/
        basics.ipynb
     module-1/
        studio/
        agent-memory.ipynb
        agent.ipynb
        chain.ipy ...

 Agent (Step 3):
The main directory structure of the repository **MAT496-Monsoon2025-SNU/Kush-Sahni-2210110371-Langgraph-MAT496** is as follows:

```
/root
├── langchain-academy-my-version/
│   ├── module-0/
│   │   └── basics.ipynb
│   ├── module-1/
│   │   ├── agent-memory.ipynb
│   │   ├── agent.ipynb
│   │   ├── chain.ipynb
│   │   ├── de

---
##  Summary

### What We Built

 **Full Agent with Tracing**
- All functions decorated with `@traceable`
- Complete visibility in LangSmith
- Auto-export for LangGraph Studio

 **Observability Features**
- See every LLM call
- Track tool usage
- Monitor costs
- Debug issues

 **Files Created**
- `github_analyzer_studio.py` - Ready for LangGraph Studio

### View Your Traces

1. Visit: https://smith.langchain.com/
2. Navigate to your project: "GitHub-Analyzer-Simplified"
3. See all runs with full details

### Next Steps

- Add more tools
- Test in LangGraph Studio
- Share traces with team
- Monitor production usage

**Repository**: [Full Project](https://github.com/Kushcodingexe/Kush-Sahni-2210110371-GitHub-Repository-Analyzer-Agent-using-LangGraph-Git-MCP-MAT496)

---
*Built with LangGraph, traced with LangSmith* 