Framework-independent entrypoint detection (JackEE-style finders)
Is your feature request related to a problem? Please describe.
codeanalyzer-python produces a symbol table and call graph but has no
notion of which classes/functions are framework entrypoints — the
methods a web/RPC/CLI/task framework calls into from outside the
application's own call graph. Without this, every downstream consumer
must re-derive it ad hoc.
Concretely, the CLDK Java backend already emits this
(JType.is_entrypoint_class, JCallable.is_entrypoint, populated by
the javaee/ finders), so JavaAnalysis.get_entry_point_classes() /
get_entry_point_methods() are real. The Python façade
(cldk.analysis.python.PythonAnalysis) mirrors the same API surface
but get_entry_point_classes, get_entry_point_methods,
get_service_entry_point_classes, and get_service_entry_point_methods
all raise NotImplementedError purely because the backend doesn't
supply the data. This blocks Java/Python feature parity in CLDK.
The analytical capability is also genuinely missing: entrypoints are
the roots for reachability, dead-code, attack-surface, and call-graph
pruning analyses. A call graph with no identified roots is far less
useful.
Describe the solution you'd like
Port the JackEE (Antoniadis et al., PLDI '20) architecture that the
Java backend already uses at the AST level: abstract framework-
independent concepts (EntrypointClass, EntrypointMethod), plus
per-framework finders that map concrete idioms (decorators, base
classes, external route tables) onto those concepts. CRUD detection
is explicitly out of scope for this issue — entrypoints only.
1. Schema additions — codeanalyzer/schema/py_schema.py (all
defaulted, so existing analysis.json stays loadable):
- PyClass.is_entrypoint: bool = False
- PyClass.entrypoint_framework: Optional[str] = None
- PyCallable.is_entrypoint: bool = False
- PyCallable.entrypoint_framework: Optional[str] = None
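To illustrate why defaulted fields keep older output loadable, here is a minimal sketch. It assumes a plain dataclass stand-in for PyCallable (the real schema class may be built differently), and load_callable is a hypothetical helper, not part of codeanalyzer-python.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class PyCallable:
    # Stand-in for a pre-existing schema field (assumption for illustration).
    name: str
    # Proposed additions; the defaults let older payloads omit them.
    is_entrypoint: bool = False
    entrypoint_framework: Optional[str] = None


def load_callable(payload: Dict[str, Any]) -> PyCallable:
    """Hypothetical loader: build a PyCallable from the keys it declares."""
    known = {f.name for f in fields(PyCallable)}
    return PyCallable(**{k: v for k, v in payload.items() if k in known})


# A record written before this change has neither new key.
old = load_callable({"name": "index"})
# A record written after this change carries both.
new = load_callable(
    {"name": "index", "is_entrypoint": True, "entrypoint_framework": "flask"}
)
```

The old record loads with is_entrypoint=False and entrypoint_framework=None, so consumers can treat a missing flag as "not an entrypoint".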
2. Abstract layer — codeanalyzer/frameworks/_base.py:
- ModuleContext: carries per-project routing facts (resolved
urls.py entries, FastAPI router mounts, Flask blueprint
registrations) so a finder can answer truthfully for handlers that
are not decorated at their definition site. This is the piece
with no Java analog and the main new work.
- AbstractEntrypointFinder:
  - is_entrypoint_class(class_node, module_ctx) -> bool
  - is_entrypoint_function(func_node, module_ctx) -> bool
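One possible shape for this layer is sketched below. It uses the standard-library ast module in place of tree-sitter nodes so the example runs on its own. The ModuleContext fields (routes, module_name) and the RoutedViewFinder class are assumptions made for illustration, not the proposed API.

```python
import ast
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ModuleContext:
    # Qualified name -> route metadata, as a routing pre-pass might emit it.
    routes: Dict[str, dict] = field(default_factory=dict)
    # Dotted name of the module currently being analyzed (assumed field).
    module_name: str = ""


class AbstractEntrypointFinder(ABC):
    framework: str = ""

    @abstractmethod
    def is_entrypoint_class(self, class_node: ast.ClassDef,
                            module_ctx: ModuleContext) -> bool: ...

    @abstractmethod
    def is_entrypoint_function(self, func_node: ast.FunctionDef,
                               module_ctx: ModuleContext) -> bool: ...


class RoutedViewFinder(AbstractEntrypointFinder):
    """Hypothetical finder that consults only the routing table."""
    framework = "django"

    def is_entrypoint_class(self, class_node, module_ctx):
        qualified = f"{module_ctx.module_name}.{class_node.name}"
        return qualified in module_ctx.routes

    def is_entrypoint_function(self, func_node, module_ctx):
        qualified = f"{module_ctx.module_name}.{func_node.name}"
        return qualified in module_ctx.routes


tree = ast.parse("def home(request):\n    return None\n\ndef helper():\n    return 1\n")
ctx = ModuleContext(routes={"shop.views.home": {"pattern": ""}},
                    module_name="shop.views")
finder = RoutedViewFinder()
flags = {node.name: finder.is_entrypoint_function(node, ctx)
         for node in tree.body if isinstance(node, ast.FunctionDef)}
```

Neither function carries a decorator here; home is flagged only because the context says a route points at it, which is the case ModuleContext exists to cover.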
3. Concrete finders + factory — codeanalyzer/frameworks/
(entrypoint_factory.py runs every finder and ORs results, mirroring
Java's EntrypointsFinderFactory):
| Finder | Detection signals |
|---|---|
| flask.py | @app.route / @bp.get… |
| fastapi.py | @app… |
| django.py | CBV bases (View, APIView, ViewSet, …), DRF @api_view/@action, urls.py resolution |
| tornado.py | RequestHandler subclass |
| celery.py | @app.task, @shared_task, @periodic_task |
| aws_lambda.py | def handler(event, context) convention + SAM/serverless template binding |
| cli.py | Click/Typer @click.command, @app.command |
| grpc.py | *Servicer subclass |
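The decorator-driven rows of the table, and the factory that ORs finder results, could look roughly like the sketch below. It parses with the standard-library ast module rather than tree-sitter. DecoratorFinder, decorator_name, and classify are hypothetical names, and matching on the last segment of the decorator name is a deliberately loose heuristic that a real finder would tighten with import resolution.

```python
import ast
from typing import Iterable, Optional, Set


def decorator_name(node: ast.expr) -> str:
    """Return a decorator's dotted name, ignoring any call arguments."""
    if isinstance(node, ast.Call):
        node = node.func
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
    return ".".join(reversed(parts))


class DecoratorFinder:
    """Hypothetical finder: matches the last segment of a decorator name."""

    def __init__(self, framework: str, last_segments: Set[str]):
        self.framework = framework
        self.last_segments = last_segments

    def is_entrypoint_function(self, func_node, module_ctx=None) -> bool:
        names = (decorator_name(d) for d in func_node.decorator_list)
        return any(n.rsplit(".", 1)[-1] in self.last_segments for n in names)


FINDERS = [
    DecoratorFinder("celery", {"task", "shared_task", "periodic_task"}),
    DecoratorFinder("cli", {"command"}),
]


def classify(func_node, finders: Iterable[DecoratorFinder]) -> Optional[str]:
    """OR over all finders: return the first matching framework, else None."""
    for finder in finders:
        if finder.is_entrypoint_function(func_node):
            return finder.framework
    return None


SOURCE = '''
@shared_task
def send_email(to): ...

@cli.command()
def sync(): ...

def helper(): ...
'''

results = {node.name: classify(node, FINDERS)
           for node in ast.parse(SOURCE).body
           if isinstance(node, ast.FunctionDef)}
```

A function is an entrypoint when classify returns anything other than None, and the returned string is what would populate entrypoint_framework.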
4. Routing pre-pass — codeanalyzer/frameworks/routing/. One pass
per project, consumed by ModuleContext. Emits
{qualified_name → route_metadata}:
- django_url_resolver.py: walk every urls.py; evaluate
path()/re_path()/url()/include() chains and .as_view().
- fastapi_router_resolver.py: app.include_router(router, prefix=...), app.mount().
- flask_blueprint_resolver.py: Blueprint + register_blueprint.
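As a rough illustration of the Django resolver, the sketch below extracts view references from a single urls.py source with the standard-library ast module (ast.unparse needs Python 3.9 or later). resolve_urlpatterns is a hypothetical name. The sketch skips include() instead of following it, and it records names as written in urls.py rather than fully qualified names, both of which the real pre-pass would have to handle.

```python
import ast
from typing import Dict

ROUTE_CALLS = {"path", "re_path", "url"}


def resolve_urlpatterns(source: str) -> Dict[str, dict]:
    """Map each view referenced by path()/re_path()/url() to its pattern."""
    routes: Dict[str, dict] = {}
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        if node.func.id not in ROUTE_CALLS or len(node.args) < 2:
            continue
        pattern, view = node.args[0], node.args[1]
        if not (isinstance(pattern, ast.Constant) and isinstance(pattern.value, str)):
            continue
        # Unwrap SomeView.as_view() down to SomeView.
        if (isinstance(view, ast.Call)
                and isinstance(view.func, ast.Attribute)
                and view.func.attr == "as_view"):
            view = view.func.value
        # include(...) and other calls fall through here and are skipped.
        if isinstance(view, (ast.Name, ast.Attribute)):
            routes[ast.unparse(view)] = {"pattern": pattern.value}
    return routes


URLS = '''
from django.urls import path, include
from . import views

urlpatterns = [
    path("", views.home),
    path("orders/", views.OrderList.as_view()),
    path("api/", include("api.urls")),
]
'''

routes = resolve_urlpatterns(URLS)
```

The resolver only builds and returns a mapping; nothing in the schema is touched, which matches the "pure builder" role described for routing resolvers.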
5. Wiring — codeanalyzer/syntactic_analysis/symbol_table_builder.py:
- Run the routing pre-pass before per-file symbol building; thread
ModuleContext through.
- During class/function construction, call the entrypoint factory and
set is_entrypoint / entrypoint_framework.
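Acceptance criteria:
- A Flask @app.route view, a FastAPI @router.get handler, a
Django CBV referenced only from urls.py, a Celery @shared_task,
and a Click @cli.command are each flagged is_entrypoint=True
with the correct entrypoint_framework.
- A view bound only through a path('...', view) entry in urls.py
(including one level of include()) is flagged, proving the
routing pre-pass works.
- Existing analysis.json files load unchanged (defaulted fields).
- is_entrypoint_class is scoped to inheritance-based entrypoints
(Django CBVs, Tornado, gRPC servicers); it is not used as a
coarse "is this class worth analyzing" filter the way the Java
version uses it. In Python the function-level predicate does the
real work.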
Implementation sketch:
- frameworks/_base.py: dataclasses + ABCs above.
- Tree-sitter is sufficient for decorator and base-class detection;
reach for Jedi only to resolve a base class across modules
(class V(BaseView) where BaseView is imported and itself
extends APIView).
- entrypoint_factory.collect(project) -> None mutates the schema
objects in place during symbol-table construction.
- Routing resolvers are pure builders (no schema mutation) feeding
ModuleContext.
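The cross-module base-class case (class V(BaseView), where BaseView itself extends APIView) comes down to a transitive walk once base names are resolved. The sketch below assumes that resolution has already happened and is available as a plain mapping. The BASES data, the shop.* names, and the inherits_from helper are all hypothetical, and no Jedi call is shown.

```python
from typing import Dict, List, Set


def inherits_from(cls: str, targets: Set[str],
                  bases: Dict[str, List[str]]) -> bool:
    """Return True if cls reaches any target through its base-class chain."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:  # guard against cycles in malformed input
            continue
        seen.add(current)
        for base in bases.get(current, []):
            if base in targets:
                return True
            stack.append(base)
    return False


# Resolved class -> direct bases, as a cross-module resolver might supply.
BASES = {
    "shop.views.V": ["shop.base.BaseView"],
    "shop.base.BaseView": ["rest_framework.views.APIView"],
    "shop.models.Order": ["django.db.models.Model"],
}
DRF_VIEW_BASES = {"rest_framework.views.APIView"}

v_is_view = inherits_from("shop.views.V", DRF_VIEW_BASES, BASES)
order_is_view = inherits_from("shop.models.Order", DRF_VIEW_BASES, BASES)
```

Classes whose bases are all declared in the same file never need the mapping to be built by a cross-module resolver, which is why the heavier tool can stay off the common path.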
Describe alternatives you've considered
- Decorator-only detection (no routing pre-pass). Simple, covers
Flask/FastAPI/Celery/Click well, but misses Django entirely —
Django function/class views are bound in urls.py, not at the
definition site. Rejected as primary; Django is too common to skip.
- Datalog/Doop-style fact ingestion (literal JackEE). Maximally
principled and what JackEE does, but codeanalyzer-python is
AST/tree-sitter based, not a Datalog engine. The Java backend
already chose the AST-level port of the same architecture; matching
it keeps the two backends conceptually aligned. Rejected as
over-engineering for this codebase.
- Resolve entrypoints in CLDK instead of the backend. CLDK only
sees the serialized schema; it cannot re-run framework-aware AST
passes without duplicating the analyzer. Detection belongs in the
backend that owns the AST. Rejected.
- Mark every public module-level function an entrypoint. Trivially
cheap but useless — destroys the signal (the whole point is
separating framework-invoked roots from internal helpers).
Rejected.
Additional context
- Architecture reference: the Java backend's
src/main/java/com/ibm/cldk/javaee/ —
EntrypointsFinderFactory + AbstractEntrypointFinder +
per-framework finders (spring, jakarta, jax, struts,
camel). This issue is the Python mirror of only the entrypoint
half of that package.
- JackEE: Antoniadis, Filippakis, Krishnan, Ramesh, Allen, Smaragdakis,
"Static Analysis of Java Enterprise Applications: Frameworks and
Caches, the Elephants in the Room", PLDI 2020 — the framework-
independent-concepts + per-framework-mapping design being ported.
- Downstream unblocker: once is_entrypoint/is_entrypoint_class
are populated, CLDK's PythonAnalysis.get_entry_point_classes,
get_entry_point_methods, and the get_service_entry_point_*
variants become thin readers over these fields, identical to how
JavaAnalysis already reads them, closing a Java/Python parity
gap.
- CRUD detection is intentionally not part of this issue and will
be specced separately.
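To make "thin readers" concrete, the sketch below filters a symbol table on the proposed flag. The nested-dict layout of SYMBOL_TABLE and the function names are assumptions for illustration only; the actual analysis.json structure and the CLDK method signatures may differ.

```python
from typing import Any, Dict, List

# Hypothetical, simplified view of a loaded symbol table:
# module path -> {"classes": {...}, "functions": {...}}.
SYMBOL_TABLE: Dict[str, Dict[str, Dict[str, Dict[str, Any]]]] = {
    "shop/views.py": {
        "classes": {
            "OrderList": {"is_entrypoint": True, "entrypoint_framework": "django"},
            "Paginator": {"is_entrypoint": False, "entrypoint_framework": None},
        },
        "functions": {
            "home": {"is_entrypoint": True, "entrypoint_framework": "django"},
            "format_price": {},  # older record without the new fields
        },
    },
}


def entry_points(symbol_table, kind: str) -> List[str]:
    """Return names under `kind` ('classes' or 'functions') that are flagged."""
    found = []
    for module in symbol_table.values():
        for name, record in module.get(kind, {}).items():
            # A missing flag is read as False, matching the schema default.
            if record.get("is_entrypoint", False):
                found.append(name)
    return sorted(found)


entry_classes = entry_points(SYMBOL_TABLE, "classes")
entry_functions = entry_points(SYMBOL_TABLE, "functions")
```

No framework knowledge lives in the reader; it only reports what the backend already decided.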