Implement backend parser for UML/SYSML diagrams with database storage #19
Co-authored-by: vinod0m <221896197+vinod0m@users.noreply.github.com>
Pull Request Overview
This PR implements a comprehensive backend parser system for extracting information from UML/SYSML diagrams in multiple formats (PlantUML, Mermaid, DrawIO) and storing them in a SQLite database with tagging support. The implementation provides a modular architecture with format-specific parsers, a unified database layer, and extensive utilities for querying and exporting diagram data.
Key Changes
- Parser Architecture: Implemented abstract base parser interface with concrete implementations for PlantUML, Mermaid, and DrawIO formats
- Database Layer: Added SQLite-based storage system with models for diagrams, elements, and relationships including search and export functionality
- Comprehensive Testing: Created 64 unit tests covering all parser functionality, database operations, and utility functions
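The parser architecture summarized in the list above can be sketched roughly as follows. `ParsedDiagram`, `DiagramType`, and `ParseError` are names that appear in this PR; their fields, the `BaseParser` class name, the `parse` method, and the toy `TinyMermaidParser` are assumptions made for illustration and are not the PR's actual code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from enum import Enum


class DiagramType(Enum):
    PLANTUML = "plantuml"
    MERMAID = "mermaid"
    DRAWIO = "drawio"


class ParseError(Exception):
    """Raised when diagram source cannot be parsed."""


@dataclass
class ParsedDiagram:
    # Field names are hypothetical; the PR's real dataclass may differ.
    diagram_type: DiagramType
    elements: list = field(default_factory=list)
    relationships: list = field(default_factory=list)


class BaseParser(ABC):
    """Common contract that every format-specific parser implements."""

    @abstractmethod
    def parse(self, content: str) -> ParsedDiagram:
        ...


class TinyMermaidParser(BaseParser):
    """Toy parser: treats every 'A --> B' line as a relationship."""

    def parse(self, content: str) -> ParsedDiagram:
        if not content.strip():
            raise ParseError("empty diagram source")
        diagram = ParsedDiagram(diagram_type=DiagramType.MERMAID)
        for line in content.splitlines():
            if "-->" not in line:
                continue
            source, target = (part.strip() for part in line.split("-->", 1))
            diagram.relationships.append((source, target))
            # Record each endpoint once, in first-seen order.
            for name in (source, target):
                if name not in diagram.elements:
                    diagram.elements.append(name)
        return diagram
```

Keeping the contract in one abstract class means callers can treat all formats uniformly and only catch a single exception type.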
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/parsers/base_parser.py | Abstract base parser interface defining common data structures and contracts |
| src/parsers/plantuml_parser.py | PlantUML format parser for class diagrams, relationships, and metadata extraction |
| src/parsers/mermaid_parser.py | Mermaid format parser supporting multiple diagram types (class, flowchart, sequence, ER) |
| src/parsers/drawio_parser.py | DrawIO XML format parser for shapes, connectors, and style properties |
| src/parsers/database/models.py | SQLite database models and schema for storing parsed diagram data |
| src/parsers/database/utils.py | Database utility functions for export, statistics, and integrity validation |
| test/unit/parsers/test_*.py | Comprehensive unit test suite covering all parser and database functionality |
| examples/parser_demo.py | Working demonstration script showing complete parsing workflow |
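As a rough illustration of the database layer listed in the table above, the sketch below stores diagrams, elements, and relationships in SQLite and looks elements up by tag. The table names, column names, and both helper functions are hypothetical; the actual schema in `src/parsers/database/models.py` is not shown on this page.

```python
import sqlite3

# Hypothetical schema: one row per diagram, element, and relationship.
SCHEMA = """
CREATE TABLE diagrams (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    diagram_type TEXT NOT NULL
);
CREATE TABLE elements (
    id INTEGER PRIMARY KEY,
    diagram_id INTEGER NOT NULL REFERENCES diagrams(id),
    name TEXT NOT NULL,
    tag TEXT
);
CREATE TABLE relationships (
    id INTEGER PRIMARY KEY,
    diagram_id INTEGER NOT NULL REFERENCES diagrams(id),
    source_id INTEGER NOT NULL REFERENCES elements(id),
    target_id INTEGER NOT NULL REFERENCES elements(id),
    kind TEXT NOT NULL
);
"""


def open_store(path=":memory:"):
    """Open a SQLite database and create the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def find_elements_by_tag(conn, tag):
    """Return element names carrying the given tag, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM elements WHERE tag = ? ORDER BY name", (tag,)
    )
    return [name for (name,) in rows]


conn = open_store()
cur = conn.execute(
    "INSERT INTO diagrams (name, diagram_type) VALUES (?, ?)",
    ("demo", "plantuml"),
)
diagram_id = cur.lastrowid
conn.executemany(
    "INSERT INTO elements (diagram_id, name, tag) VALUES (?, ?, ?)",
    [
        (diagram_id, "Order", "domain"),
        (diagram_id, "Customer", "domain"),
        (diagram_id, "Logger", "infra"),
    ],
)
print(find_elements_by_tag(conn, "domain"))
```

Parameterized queries (`?` placeholders) keep tag values out of the SQL text, which matters if tags ever come from parsed diagram content.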
| def _clean_content(self, content: str) -> str: | ||
| """Clean content by removing comments and normalizing whitespace.""" | ||
| # Remove multi-line comments first (PlantUML uses /' comment '/ format) | ||
| content = re.sub(r"/\'.*?'/", "", content, flags=re.DOTALL) |
Copilot
AI
Aug 19, 2025
The escaped single quote in the multi-line comment regex is unnecessary. PlantUML uses /' to start and '/ to end multi-line comments, and in a Python regex \' matches a literal ', so the backslash adds nothing and only obscures the intent. Use r"/'.*?'/" for multi-line comment removal.
| content = re.sub(r"/\'.*?'/", "", content, flags=re.DOTALL) | |
| content = re.sub(r"/'.*?'/", "", content, flags=re.DOTALL) |
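A quick way to check the patterns discussed in this thread is to run them against a small PlantUML snippet. In Python's `re` module, `\'` matches a literal single quote, so the escaped and unescaped forms should behave the same; a C-style `/* */` pattern is included only for comparison. The sample source and the `strip_comments` helper are made up for this sketch.

```python
import re

# The pattern from the PR, the suggested pattern, and a C-style pattern.
ESCAPED = re.compile(r"/\'.*?'/", flags=re.DOTALL)
UNESCAPED = re.compile(r"/'.*?'/", flags=re.DOTALL)
C_STYLE = re.compile(r"/\*.*?\*/", flags=re.DOTALL)

# Hypothetical PlantUML source with a multi-line comment.
SOURCE = "@startuml\n/' multi-line\n   comment '/\nclass Order\n@enduml"


def strip_comments(pattern, text):
    """Remove every match of the given comment pattern from text."""
    return pattern.sub("", text)


print(strip_comments(ESCAPED, SOURCE) == strip_comments(UNESCAPED, SOURCE))
print(strip_comments(C_STYLE, SOURCE) == SOURCE)
```

`re.DOTALL` is what lets `.*?` span the line break inside the comment; without it neither quote-based pattern would match this sample.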
| # Entity definition without attributes: ENTITY | ||
| simple_entity_match = re.match(r'^(\w+)$', line) | ||
| if simple_entity_match and not any(rel_pattern in line for rel_pattern in ['||', '}o', 'o{', '--']): |
Copilot
AI
Aug 19, 2025
The regex pattern ^(\w+)$ followed by a complex condition check creates unnecessary overhead. Consider combining the pattern with negative lookahead to exclude relationship patterns: r'^(\w+)(?!.*(?:\|\||\}o|o\{|--))$' to make the matching more efficient.
| if simple_entity_match and not any(rel_pattern in line for rel_pattern in ['||', '}o', 'o{', '--']): | |
| simple_entity_match = re.match(r'^(\w+)(?!.*(?:\|\||\}o|o\{|--))$', line) | |
| if simple_entity_match: |
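Before adopting either form, it may be worth checking how the original check, the suggested lookahead, and a bare `^(\w+)$` behave on sample lines. Because `\w` cannot match `|`, `{`, `}`, or `-`, a line that matches `^(\w+)$` cannot contain any of the relationship tokens, so all three variants would be expected to agree. The sample lines and function names below are assumptions for the sketch.

```python
import re

SIMPLE = re.compile(r'^(\w+)$')
WITH_LOOKAHEAD = re.compile(r'^(\w+)(?!.*(?:\|\||\}o|o\{|--))$')
REL_TOKENS = ['||', '}o', 'o{', '--']


def is_entity_original(line):
    """The check as written in the PR: regex plus substring scan."""
    return bool(SIMPLE.match(line)) and not any(tok in line for tok in REL_TOKENS)


def is_entity_suggested(line):
    """The check as suggested in the review: regex with negative lookahead."""
    return bool(WITH_LOOKAHEAD.match(line))


def is_entity_plain(line):
    """The bare regex with no extra relationship check."""
    return bool(SIMPLE.match(line))


# Hypothetical Mermaid ER diagram lines.
SAMPLES = [
    "CUSTOMER",
    "ORDER_ITEM",
    "CUSTOMER ||--o{ ORDER : places",
    "A--B",
    "o{",
    "",
]

for line in SAMPLES:
    print(repr(line), is_entity_original(line), is_entity_suggested(line), is_entity_plain(line))
```

If the three columns agree on the project's real inputs as well, the simplest option is to keep `^(\w+)$` and drop the extra condition entirely.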
| return '' | ||
| # Remove HTML tags if present | ||
| import re |
Copilot
AI
Aug 19, 2025
Import statements should be at the top of the file, not inside methods. Move import re to the top with other imports to follow Python conventions and improve code readability.
| import re |
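A minimal sketch of what the method could look like once the import moves to module level. The enclosing method is not shown on this page, so the function name, the tag pattern, and the empty-input behavior here are assumptions.

```python
import re  # module-level import, as the review comment recommends

# Compiled once at import time instead of on every call.
_TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_html_tags(text: str) -> str:
    """Return text with anything that looks like an HTML tag removed."""
    if not text:
        return ''
    return _TAG_PATTERN.sub("", text).strip()


print(strip_html_tags("<b>Order</b><br>Service"))
```

A regex like this is only a rough filter for label text; it is not a general HTML parser.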
| # Create new merged diagram | ||
| merged_diagram = ParsedDiagram( | ||
| diagram_type=DiagramType.PLANTUML, # Default type |
Copilot
AI
Aug 19, 2025
Using a hardcoded default diagram type in the merge function is problematic. The merged diagram should either determine the type from the constituent diagrams (if they're all the same type) or use a dedicated 'MERGED' type to accurately represent the nature of the combined diagram.
| diagram_type=DiagramType.PLANTUML, # Default type | |
| # Determine merged diagram type | |
| diagram_types = [db.get_diagram(diagram_id).diagram_type for diagram_id in diagram_ids] | |
| if all(dt == diagram_types[0] for dt in diagram_types): | |
| merged_type = diagram_types[0] | |
| else: | |
| merged_type = DiagramType.MERGED | |
| # Create new merged diagram | |
| merged_diagram = ParsedDiagram( | |
| diagram_type=merged_type, |
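The type-resolution rule in this suggestion can be isolated into a small helper, sketched below. `DiagramType.MERGED` does not exist in the PR as shown and would have to be added; the enum values and the `resolve_merged_type` name are assumptions. One detail worth noting: with an empty list, `all(...)` in the suggested code returns `True` and `diagram_types[0]` then raises `IndexError`, so this sketch rejects empty input explicitly.

```python
from enum import Enum


class DiagramType(Enum):
    PLANTUML = "plantuml"
    MERMAID = "mermaid"
    DRAWIO = "drawio"
    MERGED = "merged"  # hypothetical member proposed in the review


def resolve_merged_type(diagram_types):
    """Pick the type for a merged diagram from its source diagram types."""
    unique = set(diagram_types)
    if not unique:
        raise ValueError("cannot merge an empty list of diagrams")
    if len(unique) == 1:
        # All source diagrams share one type, so the merge keeps it.
        return unique.pop()
    return DiagramType.MERGED


print(resolve_merged_type([DiagramType.MERMAID, DiagramType.MERMAID]))
print(resolve_merged_type([DiagramType.MERMAID, DiagramType.DRAWIO]))
```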
@check-spelling-bot Report
🔴 Please review
See the 📂 files view, the 📜 action log, or 📝 job summary for details.
Unrecognized words (24660)
Truncated, please see the job summary, log, or artifact if available.
These words are not needed and should be removed
Truncated, please see the job summary, log, or artifact if available.
Some files were automatically ignored 🙈
These sample patterns would exclude them:
You should consider excluding directory paths (e.g.
You should consider adding them to:
File matching is via Perl regular expressions.
To check these files, more of their words need to be in the dictionary than not. You can use
Script unavailable
Truncated, please see the job summary, log, or artifact if available.
OR
To have the bot accept them for you, comment in the PR quoting the following line:
Forbidden patterns 🙅 (42)
In order to address this, you could change the content to not match the forbidden patterns (comments before forbidden patterns may help explain why they're forbidden), add patterns for acceptable instances, or adjust the forbidden patterns themselves.
These forbidden patterns matched content:
Should be
| ❌ Errors, Warnings, and Notices | Count |
|---|---|
| | 319 |
| ℹ️ candidate-pattern | 60 |
| ❌ check-file-path | 9800 |
| ❌ forbidden-pattern | 1749 |
| | 1 |
| | 137 |
| ❌ slow-file | 1 |
| | 87 |
See ❌ Event descriptions for more information.
✏️ Contributor please read this
By default the command suggestion will generate a file named based on your commit. That's generally ok as long as you add the file to your commit. Someone can reorganize it later.
If the listed items are:
- ... misspelled, then please correct them instead of using the command.
- ... names, please add them to .github/actions/spelling/allow/names.txt.
- ... APIs, you can add them to a file in .github/actions/spelling/allow/.
- ... just things you're using, please add them to an appropriate file in .github/actions/spelling/expect/.
- ... tokens you only need in one place and shouldn't generally be used, you can add an item in an appropriate file in .github/actions/spelling/patterns/.
See the README.md in each directory for more information.
🔬 You can test your commits without appending to a PR by creating a new branch with that extra change and pushing it to your fork. The check-spelling action will run in response to your push -- it doesn't require an open pull request. By using such a branch, you can limit the number of typos your peers see you make. 😉
If the flagged items are 🤯 false positives
If items relate to a ...
- binary file (or some other file you wouldn't want to check at all).
  Please add a file path to the excludes.txt file matching the containing file.
  File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.
  ^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).
- well-formed pattern.
  If you can write a pattern that would match it, try adding it to the patterns.txt file.
  Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.
  Note that patterns can't match multiline strings.
This PR implements a comprehensive backend parser system for extracting information from different diagram input formats (UML/SYSML) and storing them in a relational database with relevant tags for downstream tool implementations.
Key Features
Multi-Format Support
The parser system supports three major diagram formats: PlantUML, Mermaid, and DrawIO.
Parser Architecture
Database Layer
Structured Data Extraction
The parsers extract:
Implementation Details
Modular Design
Following the repository's organization guidelines:
- src/parsers/base_parser.py
- src/parsers/database/
- test/unit/parsers/
Error Handling
- ParseError exception for parsing failures
Testing & Documentation
Usage Example
The included demo (examples/parser_demo.py) successfully processes sample diagrams.
This implementation provides a solid foundation for any tools in the SDLC_core ecosystem that need to extract and work with diagram information, enabling powerful analysis and transformation capabilities.
Fixes #7.