Automatic AI Bill of Materials (AIBOM) generator for Python codebases that use LangChain and related AI/ML tooling.
`aibom_generator.py` performs static analysis of a target Python repository and produces a JSON report (`AI_BOM.json`) that inventories AI components found in source code.
It scans all .py files recursively and extracts:
- models
  - LangChain LLM/chat model class usage (for example `OpenAI`, `ChatOpenAI`, `HuggingFaceHub`, `Ollama`)
  - best-effort model identifiers from constructor args like `model`, `model_name`, `model_id`, `checkpoint`
- datasets
  - vector store related usage (for example `FAISS`, `Chroma`, `Pinecone`)
  - best-effort dataset/index references such as `path`, `persist_directory`, `index_name`, `collection_name`
- tools
  - LangChain tool/agent-related calls (for example `initialize_agent`, `load_tools`, `Tool`, `AgentExecutor`)
- frameworks
  - imported AI frameworks (for example `langchain`, `transformers`, `torch`)
  - installed package version when available via `importlib.metadata`
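The model-extraction side of this can be sketched with a small `ast.NodeVisitor`. This is a minimal illustration, not the script's actual implementation; the class and constant names (`ModelVisitor`, `MODEL_CALLS`, `MODEL_KWARGS`) are hypothetical:

```python
import ast

# Illustrative subsets of the patterns the scanner looks for.
MODEL_CALLS = {"OpenAI", "ChatOpenAI", "HuggingFaceHub", "Ollama"}
MODEL_KWARGS = {"model", "model_name", "model_id", "checkpoint"}

class ModelVisitor(ast.NodeVisitor):
    def __init__(self):
        self.models = []

    def visit_Call(self, node):
        # Handle both `ChatOpenAI(...)` and `chat_models.ChatOpenAI(...)`.
        name = getattr(node.func, "id", getattr(node.func, "attr", None))
        if name in MODEL_CALLS:
            # Only literal keyword values can be resolved statically.
            params = {
                kw.arg: kw.value.value
                for kw in node.keywords
                if kw.arg in MODEL_KWARGS and isinstance(kw.value, ast.Constant)
            }
            self.models.append({
                "type": name,
                # Non-literal model names fall back to "unknown".
                "model": next(iter(params.values()), "unknown"),
                "details": {"call": name, "params": params},
            })
        self.generic_visit(node)

visitor = ModelVisitor()
visitor.visit(ast.parse('llm = ChatOpenAI(model="gpt-4", temperature=0)'))
print(visitor.models)  # one entry: type "ChatOpenAI", model "gpt-4"
```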
- Recursively find Python files in the target directory.
- Parse each file into a Python AST (`ast.parse`).
- Visit imports and function/class calls with an AST visitor.
- Match known LangChain/model/vectorstore/tool patterns.
- Build a consolidated dictionary with top-level keys: `models`, `datasets`, `tools`, `frameworks`.
- De-duplicate entries and write to JSON.
- Print a short terminal summary.
- Python 3.8+
- No external dependencies required for core functionality.
```
python aibom_generator.py /path/to/project
```

This writes `AI_BOM.json` in your current working directory. To choose an output path:

```
python aibom_generator.py /path/to/project -o /path/to/output/AI_BOM.json
```

Example output from `python aibom_generator.py .`:

```json
{
  "models": [
    {
      "type": "ChatOpenAI",
      "model": "gpt-4",
      "source_file": "app/pipeline.py",
      "details": {
        "call": "ChatOpenAI",
        "params": {
          "model": "gpt-4"
        }
      }
    }
  ],
  "datasets": [
    {
      "name": "FAISS",
      "type": "FAISS.from_documents",
      "used_for": "Vector store / dataset ingestion",
      "source_file": "app/retrieval.py",
      "details": {
        "persist_directory": "./faiss_index"
      }
    }
  ],
  "tools": [
    {
      "name": "initialize_agent",
      "purpose": "Agent/tool usage detected",
      "source_file": "app/agent.py",
      "details": {
        "call": "initialize_agent",
        "params": {}
      }
    }
  ],
  "frameworks": [
    {
      "name": "langchain",
      "version": "0.2.0"
    }
  ]
}
```

`AI_BOM.json` is intended to support inventory, review, and compliance workflows.
- Identify what models are in use and where (`source_file`).
- Review external dependencies/framework versions for patch and compatibility planning.
- Flag unknown model identifiers (`"model": "unknown"`) for manual follow-up.
- Use `frameworks` to quickly check what AI libraries are present and which versions are installed.
- Cross-check for deprecated APIs or vulnerable versions.
- Inspect `datasets` entries for vector store/index paths and collection names.
- Confirm which files implement ingestion/indexing logic.
- Review `tools` entries to understand where autonomous/tool-enabled logic exists.
- Combine with code review in `source_file` paths for deeper behavior analysis.
- This is static analysis, so results are best-effort.
- Dynamic patterns (runtime imports, indirect wrappers, values built in many steps) may not resolve fully.
- Some fields can be `unknown` when model names or dataset paths are not literal strings.
- Version reporting depends on packages being installed in the environment where the script runs.
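For example, a model name assembled at runtime is not a literal in the AST, so no static scanner of this kind can recover it and the entry falls back to `unknown`:

```python
import ast

# The `model` argument is a Subscript expression, not an ast.Constant,
# so its value cannot be read off the parse tree.
source = 'llm = ChatOpenAI(model=os.environ["LLM_MODEL"])'
call = ast.parse(source).body[0].value
kw = call.keywords[0]
print(isinstance(kw.value, ast.Constant))  # False
```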
```
python aibom_generator.py . -o AI_BOM.json
python -m py_compile aibom_generator.py
```

See LICENSE.