Codebase Analyzer

Codebase Analyzer is a tool to analyze and visualize the structure of any codebase. It generates an interactive graph to explore file relationships and provides detailed analysis metrics.

Features

- Supports multiple programming languages.
- Handles large codebases.
- Optional AWS infrastructure analysis.
- Generates a JSON representation of the codebase structure.
- Creates an interactive HTML graph for visualization.
- Provides detailed analysis metrics.

Description

Overview

Codebase Analyzer scans a codebase to analyze its structure, relationships, and dependencies. It generates a JSON representation of the codebase, performs advanced analysis, and optionally creates an interactive visualization of the relationships between files, components, and AWS infrastructure.

Key Features

Codebase Scanning:
- Recursively scans a directory to identify files, components, and functions.
- Categorizes files into types (e.g., frontend, backend, utility, test).
- Extracts dependencies (e.g., imports, references) and builds a graph of relationships.
AWS Infrastructure Analysis:
- Parses serverless.yml files to detect AWS resources (e.g., Lambda functions, DynamoDB tables, API Gateway endpoints).
- Maps AWS resources to code components (e.g., Lambda handlers).
Graph Representation:
- Builds a graph with nodes (files, components, functions, AWS resources) and edges (relationships like "imports", "invokes").
- Prevents duplicate nodes using a cache.
Advanced Analysis:
- Detects dependency cycles.
- Identifies potential code duplication (e.g., JSX/TSX pairs).
- Performs centrality analysis to find the most connected nodes.
- Groups related nodes into communities using modularity-based clustering.
Visualization:
- Creates an interactive graph visualization using Plotly.
- Nodes are color-coded and sized based on type and connectivity.
- Relationships are represented as edges with labels (e.g., "imports", "invokes").
Output:
- Saves the graph as a JSON file.
- Optionally generates a Markdown analysis report.
- Saves the visualization as an HTML file.

How It Works

Configuration:
- The script defines file extensions to include, directories/files to exclude, and node/relationship types for categorization.
Codebase Scanning:
- The scan_codebase function walks through the directory tree, processes each file, and extracts metadata (e.g., size, type, dependencies).
- It uses regex patterns and AST parsing to extract imports, functions, and React components.
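The scanning step can be illustrated with a minimal sketch. It is not the script's actual `scan_codebase` implementation: the names `INCLUDE_EXTENSIONS`, `EXCLUDE_DIRS`, `extract_python_imports`, and `scan_directory` are hypothetical, and only Python files are handled (via the standard `ast` module) to keep the example short.

```python
import ast
import os

# Hypothetical configuration; the real script's lists may differ.
INCLUDE_EXTENSIONS = {".py"}
EXCLUDE_DIRS = {".git", "node_modules", "__pycache__"}


def extract_python_imports(source: str) -> list[str]:
    """Return the sorted, de-duplicated module names imported by Python source."""
    modules = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.append(node.module)
    return sorted(set(modules))


def scan_directory(root_dir: str) -> dict[str, dict]:
    """Walk root_dir and collect size and imports for each included file."""
    results = {}
    for current_dir, dir_names, file_names in os.walk(root_dir):
        # Prune excluded directories in place so os.walk does not descend into them.
        dir_names[:] = [d for d in dir_names if d not in EXCLUDE_DIRS]
        for name in file_names:
            if os.path.splitext(name)[1] not in INCLUDE_EXTENSIONS:
                continue
            path = os.path.join(current_dir, name)
            with open(path, encoding="utf-8", errors="ignore") as handle:
                source = handle.read()
            try:
                imports = extract_python_imports(source)
            except SyntaxError:
                imports = []  # Unparseable file: keep the entry, without imports.
            relative_path = os.path.relpath(path, root_dir)
            results[relative_path] = {
                "size": os.path.getsize(path),
                "imports": imports,
            }
    return results
```

Calling `scan_directory(".")` returns a dictionary keyed by relative file path. Files in other languages would need their own extraction logic, for example regex patterns for JavaScript `import` statements.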
AWS Infrastructure Detection:
- The parse_serverless_config function parses serverless.yml to identify AWS resources and their relationships.
- The find_aws_resource_references function scans code for references to these resources.
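As a rough illustration of how Lambda functions can be tied back to code, the sketch below relies on the Serverless Framework convention that a `handler` value has the form `path/to/module.exportedFunction`. It assumes the `serverless.yml` content has already been loaded into a dictionary (for example with `yaml.safe_load` from pyyaml). The function name `map_lambda_handlers` and the returned structure are hypothetical, not the script's actual API.

```python
def map_lambda_handlers(serverless_config: dict) -> dict[str, dict]:
    """Map each Lambda function name to the module path and export of its handler."""
    mapping = {}
    functions = serverless_config.get("functions") or {}
    for name, spec in functions.items():
        handler = (spec or {}).get("handler")
        if not handler:
            continue  # Functions without a handler string are skipped.
        # Split on the last dot: everything before it is the module path.
        module_path, _, export = handler.rpartition(".")
        mapping[name] = {"module": module_path, "export": export}
    return mapping


# Example input shaped like a parsed serverless.yml (illustrative values only).
config = {
    "functions": {
        "createUser": {"handler": "src/handlers/user.create"},
    }
}
handlers = map_lambda_handlers(config)
```

Each resulting entry could then become an "invokes" edge from the Lambda resource node to the file that matches the module path.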
Graph Construction:
- Nodes and edges are added to the graph dictionary using add_node and add_edge functions.
- Nodes represent files, components, functions, and AWS resources.
- Edges represent relationships like "imports", "contains", "invokes".
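A minimal sketch of the graph dictionary is shown below. The function names `add_node` and `add_edge` come from the description above, but their signatures, the `new_graph` helper, and the dictionary layout are assumptions made for illustration. Keying nodes by ID lets the dictionary itself act as the duplicate-prevention cache.

```python
import json


def new_graph() -> dict:
    """Create an empty graph; nodes are keyed by ID so duplicates are easy to detect."""
    return {"nodes": {}, "edges": []}


def add_node(graph: dict, node_id: str, node_type: str, **attrs) -> dict:
    """Add a node unless one with the same ID already exists; return the stored node."""
    if node_id not in graph["nodes"]:
        graph["nodes"][node_id] = {"id": node_id, "type": node_type, **attrs}
    return graph["nodes"][node_id]


def add_edge(graph: dict, source: str, target: str, relationship: str) -> dict:
    """Add a directed, labeled edge, skipping exact duplicates."""
    edge = {"source": source, "target": target, "type": relationship}
    if edge not in graph["edges"]:
        graph["edges"].append(edge)
    return edge


graph = new_graph()
add_node(graph, "src/app.py", "backend", size=1024)
add_node(graph, "src/utils.py", "utility")
add_node(graph, "src/app.py", "backend")  # Ignored: the ID is already present.
add_edge(graph, "src/app.py", "src/utils.py", "imports")

# The plain-dictionary layout serializes directly to JSON.
serialized = json.dumps(graph, indent=2)
```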
Analysis:
- The analyze_graph function performs advanced analysis on the graph using NetworkX.
- It detects cycles, identifies the most connected nodes, and groups nodes into communities.
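To show the ideas behind cycle detection and connectivity ranking, here is a dependency-free sketch. It is a stand-in for illustration, not the script's `analyze_graph`: the script uses NetworkX, which offers equivalents such as `simple_cycles` and `degree_centrality`. The function names `find_cycle` and `most_connected` are hypothetical, and edges are assumed to be `(source, target)` pairs.

```python
from collections import Counter

UNVISITED, IN_PROGRESS, DONE = 0, 1, 2


def find_cycle(edges):
    """Return one dependency cycle as a list of nodes, or None if the graph is acyclic."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, []).append(target)
        adjacency.setdefault(target, [])
    state = dict.fromkeys(adjacency, UNVISITED)
    path = []  # Nodes on the current depth-first search path.

    def visit(node):
        state[node] = IN_PROGRESS
        path.append(node)
        for neighbor in adjacency[node]:
            if state[neighbor] == IN_PROGRESS:
                # Back edge: the cycle runs from neighbor to node and back.
                return path[path.index(neighbor):] + [neighbor]
            if state[neighbor] == UNVISITED:
                found = visit(neighbor)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in adjacency:
        if state[node] == UNVISITED:
            found = visit(node)
            if found:
                return found
    return None


def most_connected(edges, top_n=3):
    """Rank nodes by degree (incoming plus outgoing edges)."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree.most_common(top_n)
```

The recursive search is fine for a sketch, but very deep dependency chains could hit Python's recursion limit; an iterative traversal or a graph library avoids that.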
Visualization:
- The visualize_graph function uses Plotly to create an interactive graph.
- Nodes are grouped by type (e.g., frontend, backend, shared) and displayed with tooltips and color coding.
Output:
- The graph is saved as a JSON file.
- The analysis report is saved as a Markdown file.
- The visualization is saved as an HTML file.

Files in the Project

1. `requirements.txt`

Specifies the Python dependencies:

- networkx: For graph analysis.
- plotly: For visualization.
- pyyaml: For parsing YAML files (e.g., serverless.yml).

2. `README.md`

Provides an overview of the project, its features, and usage instructions.

3. `codebase_analyzer.py`

The main script that implements all the functionality described above.

Important Notes

Sensitive Information: Ensure no sensitive information (e.g., API keys, credentials) is included in the codebase or output files before uploading to GitHub.
Licensing: This project is licensed under the MIT License. See the LICENSE file for details.

Limitations

- Large codebases may take longer to process.
- Some file types or edge cases may not be fully supported (e.g., uncommon file extensions).

Future Plans

- Add support for additional programming languages.
- Improve performance for large codebases.
- Enhance visualization with more customization options.

Contributing

Contributions are welcome! To contribute:

1. Fork the repository.
2. Create a new branch for your feature or bug fix.
3. Submit a pull request with a clear description of your changes.

Usage

Clone the repository:

```
git clone https://github.com/Nestorovski/codebase-analyzer.git
cd codebase-analyzer
```

Install dependencies:
```
pip install -r requirements.txt
```
Run the analyzer:
```
python codebase_analyzer.py --root-dir <path_to_codebase>
```
- Replace `<path_to_codebase>` with the path to the directory you want to analyze. If omitted, the script analyzes the current working directory.
- Example:
```
python codebase_analyzer.py --root-dir C:\Users\Username\FolderName
```
Optional arguments:
- `--output-dir <path>`: Specify a custom directory for saving output files. Defaults to `<root-dir>/codebasegraphdata`.
- `--no-visualization`: Skip generating the interactive HTML graph visualization.
- `--analyze`: Perform deeper analysis of the codebase structure.
- `--verbose`: Print detailed output during the scan.
Outputs:
- A JSON file representing the codebase graph.
- An optional Markdown analysis report.
- An optional interactive HTML graph visualization.
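For readers curious how these options might be wired together, the following is a sketch using the standard `argparse` module. The flag names and defaults come from the list above; the function names `build_parser` and `parse_args` and the help texts are assumptions, and the actual script may be organized differently.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(description="Analyze and visualize a codebase.")
    parser.add_argument("--root-dir", default=os.getcwd(),
                        help="Directory to analyze (default: current working directory).")
    parser.add_argument("--output-dir", default=None,
                        help="Where to save output files (default: <root-dir>/codebasegraphdata).")
    parser.add_argument("--no-visualization", action="store_true",
                        help="Skip the interactive HTML graph.")
    parser.add_argument("--analyze", action="store_true",
                        help="Perform deeper analysis of the codebase structure.")
    parser.add_argument("--verbose", action="store_true",
                        help="Print detailed output during the scan.")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse arguments and resolve the output directory default."""
    args = build_parser().parse_args(argv)
    if args.output_dir is None:
        # The default depends on --root-dir, so it is resolved after parsing.
        args.output_dir = os.path.join(args.root_dir, "codebasegraphdata")
    return args
```

The output directory default is filled in after parsing because it depends on the value of `--root-dir`.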

License

This project is licensed under the MIT License. See the LICENSE file for details.
