Releases: docwire/docwire
2024.10.15
The release introduces several significant changes across various files, enhancing configuration, documentation, and functionality. Key updates include improved error handling in various components, providing clearer context for exceptions, and the introduction of new functionality for chaining function calls in C++20, enabling more flexible and composable operations. Other notable changes include modifications to the .gitignore to ignore the .cache directory and updates to the CMake configuration to enhance build process integration with tools like VSCode.
🐇 "In the code where changes abound,
A rabbit hops with joy profound.
New features bloom, errors refined,
Documentation clear, knowledge aligned.
With CMake's might and README's grace,
Our project thrives, a joyful space!" 🐇
- New Features
- Enhanced error handling in various components, improving security, clarity and context for exceptions.
- Introduced new functionality for chaining function calls in C++20, allowing for more flexible and composable operations.
- Improved handling of the w:binData tag in the XML parser.
- Documentation
- Updated README.md to clarify error handling, enhancing user-friendliness.
- Chores
- Updated .gitignore to include the .cache directory.
- Modified CMake configuration to enhance build process integration with tools like VSCode.
2024.08.30
The changes involve enhancements to the XML parsing and reading logic. The parseXmlData function in xml_parser.cpp now includes improved handling for specific XML tags, while xml_stream.cpp has been refactored to encapsulate XML reading logic within a new method. Additionally, a new processing instruction has been added to the test XML document, enriching its structure.
In the meadow, I hop with glee,
New tags and methods, oh what a spree!
Parsing XML, so swift and bright,
With each little change, the code feels just right.
A dance of data, a joyful tune,
Hooray for the updates, we’ll celebrate soon! 🐇✨
- New Features
- Enhanced XML parsing capabilities with specific handling for w:p and w:tab tags.
- Refactor
- Streamlined XML reading logic by encapsulating it in a new method, improving code readability and maintainability.
- Bug Fixes
- Adjusted the XML node-reading logic to ensure that processing instructions are handled correctly.
2024.08.25
The changes encompass updates across various components in the project, including parsing improvements, modifications to build configurations, and adjustments to testing mechanisms. Key revisions include the transition of sanitizer options, simplifications in build scripts, and improvements in resource handling.
🐰 In fields of code, a rabbit hops,
With changes fresh, it never stops.
From sanitizers bright to scripts that gleam,
Oh joyful leaps, let coding dream!
With hyperlinks in documents it now plays,
And encodes binary data in clever ways.
Improved tool management, with latest versions in tow,
It's a codebase that's efficient, and in the flow!
Each build a dance, each test a cheer,
In tech we trust, with love sincere! 🌼
- New Features
- Enhanced installation instructions for the DocWire SDK, emphasizing vcpkg integration.
- Added base64 encoding capabilities to public API to allow for encoding of binary data.
- Introduced support for additional document formats (specialized XML parser) and parsing strategies.
- Implemented a mechanism for handling hyperlinks in RTF and OOXML documents.
- Documentation
- Updated README with clearer integration and setup instructions, including platform support.
- Refactored comments and code snippets to reflect recent changes in the API.
- Refactor
- Streamlined multiple build scripts and configurations for improved maintainability.
- Improved tool management and resource handling.
- Replaced several class definitions with type aliases to enhance modularity.
- Upgraded third-party components to the latest available versions.
- Removed vcpkg pinning to a single version; the "latest" version is now used on every build.
- Split the SDK into more libraries, improving modularity.
- Chores
- Cleaned up unnecessary conditional logic in build scripts, simplifying the setup process.
- Removed obsolete function declarations to enhance code clarity and organization.
- Style
- Refined workflow configurations to improve clarity and organization of environment settings.
2024.07.31
Version 2024.07.31
Recent enhancements to the DocWire SDK boost its natural language processing capabilities by integrating local AI models. Developers can now perform tasks like text classification and sentiment analysis directly within C++ applications, improving performance and safeguarding data privacy. Additionally, refined dependency management and build configurations provide a comprehensive toolkit for developing AI-driven applications.
🐰 In the burrow where ideas sprout,
Local models bring cheer, there's no doubt!
Processing swift, with privacy tight,
DocWire now shines, a true delight!
With text and models, we hop and play,
Celebrating changes—hip, hip, hooray! 🌟
- New Features
- Local execution of AI models for text classification, summarization, translation, sentiment analysis, named entity recognition and more, directly within applications.
- Added command-line options for local AI model processing, enhancing user interaction.
- Introduced a powerful built-in flan-t5-large model for various NLP tasks.
- Added support for fuzzy matching capabilities with a new function for string similarity assessments.
- Streamlined integration of third-party libraries, enhancing dependency management.
- Documentation
- Updated documentation to reflect new local AI capabilities, including detailed usage instructions.
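The notes mention a new string similarity function for fuzzy matching but do not name the algorithm or the function. As a rough picture of what such a function can compute, the sketch below uses a normalized Levenshtein distance. That choice, and the names `edit_distance` and `similarity`, are assumptions rather than the SDK's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string_view>
#include <utility>
#include <vector>

// Levenshtein distance computed with two rolling rows of the DP table.
inline std::size_t edit_distance(std::string_view a, std::string_view b)
{
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    std::iota(prev.begin(), prev.end(), std::size_t{0});  // distance from the empty prefix of a
    for (std::size_t i = 1; i <= a.size(); ++i)
    {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j)
        {
            const std::size_t substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, substitution});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Maps the distance to a score in [0, 1]; 1.0 means the strings are identical.
inline double similarity(std::string_view a, std::string_view b)
{
    const std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0)
        return 1.0;  // two empty strings are treated as identical
    return 1.0 - static_cast<double>(edit_distance(a, b)) / static_cast<double>(longest);
}
```

A caller would typically compare the score against a threshold, for example treating anything above 0.8 as a match; the right threshold depends on the data.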
2024.06.24
Recent updates enhance cross-platform compatibility, introduce new input data handling approaches, and refine testing. OS versions added to the CI configuration include macOS 13, macOS 14, and Ubuntu 24.04. The data_source class now better supports various data types (std::vector, std::span, std::string_view), expanding its flexibility. New tests validate data source processing, while refinements prevent table errors.
In realms of bytes and streams so vast,
Data flows from first to last.
CI matrix now more grand with Mac,
Ubuntu stands, there's no lack.
Tests abound, they check with care,
Ensuring code beyond compare.
A rabbit's cheer for changes new,
With bytes and tests, our world renew!
- New Features
- Added support for new OS versions macos-13 and macos-14 in workflows.
- Enhanced data_source class to handle new data types like std::vector<std::byte>, std::span<const std::byte>, and std::string_view.
- Bug Fixes
- Improved error checks in table structures to prevent invalid nesting in plain_text_writer.
- Tests
- Added comprehensive test cases for various input data sources in api_tests.cpp.
- Chores
- Updated OS configurations in .github/workflows/build.yml to include new OS versions and remove outdated ones.
2024.06.19
This release introduces significant modernizations and optimizations across the codebase. Key changes include the adoption of modern C++ practices such as move semantics and smart pointers, simplification of the code by removing outdated patterns, and structural enhancements that decentralize parsing responsibilities. Performance is notably optimized through new caching mechanisms and memory management strategies. Additionally, several bug fixes and new features enhance the functionality and reliability of data parsing and exporting components.
In the realm of code, where logic reigns,
A modern touch, the old constrains.
Smart pointers weave, with move semantics,
A tapestry free from old panics.

Locks and guards, a vigilant dance,
Safeguarding flow, they advance.
Gone are wrappers, templates flee,
Simplicity wins, the code is free.

From parsers to import, a journey bold,
A narrative of efficiency told.
Obsolete no more, the styles old cast,
Lambdas in light, shadows they outlast.

Bugs in chains, limits set free,
PST's breath deeper, as it should be.
Metadata whispers in HTML's ear,
EML listens close, clarity near.

A cache of memories, seldom forget,
Performance tunes, on better we bet.
Through tests and logs, comparisons drawn,
A script to tell, which faster, which gone.
- Modernization of Parsing Chain Elements: The use of clone() methods was replaced with modern C++ design strategies like move semantics and smart pointers. This enhances both the safety and performance of the code by avoiding unnecessary copying and providing better resource management.
- Enhancements in PDFParser: Calls to the std::mutex lock() and unlock() methods were replaced with std::lock_guard. This change simplifies the code and ensures exception safety by automatically managing the lock's lifecycle.
- Code Simplification: The removal of the ParserWrapper, wrapper_parser_creator, and parser_creator class templates simplifies the codebase, making it easier to maintain and understand.
- Structural Changes in Parsing Logic: Recursive parsing responsibilities were moved from individual parsers to the Importer class. This makes the parsers independent of the Importer class and simplifies the overall parsing architecture.
- Removal of Obsolete Code: The obsolete FormattingStyle class was removed. Additionally, std::bind was replaced with modern C++ lambda expressions, which are more straightforward and performant.
- Bug Fix in PST Parser: An incorrectly hardcoded limit on the number of mails processed was removed, potentially preventing data loss and improving the parser's reliability.
- Enhancements in Exporters: Support for document metadata was added in the HTML exporter, along with a workaround for incorrect date values in DOC format. Metadata support in the EML parser was also fixed, improving the accuracy and usefulness of the exported data.
- Performance Optimization and Refactoring: Several new classes were introduced, such as data_source, lru_memory_cache, memory_buffer, imemorystream, unique_identifier, and file_extension. These changes aim to optimize performance, reduce reliance on temporary files, and improve support for nested documents in exporters.
- Performance Comparison Script: A new script was added to compare the performance between two versions of the SDK using callgrind logs from automatic tests, aiding in performance evaluation and regression testing.
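Two of the modernizations listed for this release, std::lock_guard in place of manual lock()/unlock() calls and a lambda in place of std::bind, can be shown side by side in a short sketch. The `message_log` class and `make_sink` function are hypothetical and unrelated to the SDK's PDFParser; they only demonstrate the pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical shared log, standing in for state that several threads touch.
class message_log
{
public:
    void add(const std::string& message)
    {
        // The guard locks here and unlocks in its destructor,
        // including when the exception below leaves the function.
        const std::lock_guard<std::mutex> guard{m_mutex};
        if (message.empty())
            throw std::invalid_argument{"empty message"};
        m_messages.push_back(message);
    }

    std::size_t size() const
    {
        const std::lock_guard<std::mutex> guard{m_mutex};
        return m_messages.size();
    }

private:
    mutable std::mutex m_mutex;
    std::vector<std::string> m_messages;
};

// Before: std::bind(&message_log::add, &log, std::placeholders::_1)
// After: a lambda that captures the log by reference and names its parameter.
inline std::function<void(const std::string&)> make_sink(message_log& log)
{
    return [&log](const std::string& message) { log.add(message); };
}
```

With manual lock() and unlock(), the throw in `add` would skip the unlock and the next call would block forever; with the guard, the mutex is released during stack unwinding and later calls proceed normally.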
2024.06.17
The recent changes include updates to the libbfio library, adjustments in test files, and fixes related to address sanitizer CI tests. Overall, these modifications improve compatibility, security, and build robustness.
In code's quiet night, the rabbit hops,
URLs shift, new versions pop,
Headers dance in JSON's cheer,
And builds grow strong, without a fear.
With sanitizer's careful might,
We perfect our software’s flight. 🌙
- Updates
- Updated libbfio library to the latest release.
- Bug Fixes
- Fixed headers in HTTP POST request tests.
- Fixed address sanitizer tests.
2024.04.04
The DocWire SDK is embracing a dynamic development approach with its new "Release Early, Release Often" strategy and date-based versioning. This shift aims to enhance user experience by ensuring rapid releases, allowing users to benefit from the latest advancements with greater transparency and efficiency. While this release focuses on adopting this new versioning strategy and updating documentation, it's important to note that the significant improvements in memory management, thread safety, document processing, and build configurations were introduced in earlier versions. These enhancements are documented in the ChangeLog, highlighting the SDK's ongoing evolution.
🐰✨
In the world of code, where changes abound,
A rabbit hopped in, documentation found.
With every leap, a clearer path in sight,
Guiding through versions, making the future bright.
"Hop along," it cheered, with a joyful sound,
For in the realm of DocWire, clarity is crowned.
🌟📚
- Documentation
- Introduced a "Release Early, Release Often" strategy with date-based versioning for the DocWire SDK, enhancing transparency and ease of tracking updates.
- Updated the ChangeLog with details on significant enhancements including memory management, thread safety, document processing capabilities, and build configurations across various versions.
- Refactor
- Modified the project's versioning approach in the vcpkg.json file to use "latest" as the version string, simplifying version management.
- Adjusted the portfile.cmake to reference the master branch directly, removing specific commit references for easier updates.
2024.04.01
This update enhances support for Valgrind tools (memcheck, helgrind, callgrind) across various components, improving memory checking and thread safety. It includes changes to build configurations, scripts, and source code that address memory leaks and data races and simplify XML parsing management. Testing procedures have been updated to incorporate Valgrind tools, and patches have been applied for better compatibility and performance.
"In the realm of code, where the bytes do hop,
A rabbit worked hard, in the garden non-stop.
🌱 With Valgrind in hand, and bugs in its sight,
It leaped through the lines, from morning to night.
🐾 No leak too small, no race too fast,
Ensuring the garden's safety, vast.
🥕 A cheer for the changes, so wisely sown,
For a healthier codebase, robustly grown."
- New Features
- Added support for additional sanitizers (memcheck, helgrind, callgrind) including Valgrind tools in build configurations and automatic tests.
- Introduced mutexes in various parsers for thread safety.
- Enhanced XML handling to prevent memory leaks.
- Bug Fixes
- Fixed threading data-race issues in parsers and logging.
- Addressed memory leaks in XML processing.
- Refactor
- Simplified error logging and XML parsing logic.
- Updated memory management for XML streams to use std::unique_ptr.
- Chores
- Updated build scripts and configurations for improved testing and compatibility.
- Applied patches to address external library issues and Python incompatibilities in the build tool.
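The move to std::unique_ptr for XML streams follows a common pattern for wrapping C-style handles so that they cannot leak. The sketch below shows that pattern under stated assumptions: `raw_reader`, `raw_reader_open` and `raw_reader_close` are a made-up C-style API standing in for a real XML library, and `live_readers` exists only so the example can check that every handle is released.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical C-style API standing in for an XML library's reader handle.
struct raw_reader
{
    std::string content;
};

inline int live_readers = 0;  // counts handles that have not been closed yet

inline raw_reader* raw_reader_open(const char* xml)
{
    ++live_readers;
    return new raw_reader{xml};
}

inline void raw_reader_close(raw_reader* reader)
{
    --live_readers;
    delete reader;
}

// Stateless deleter, so the smart pointer stays as small as a raw pointer.
struct reader_closer
{
    void operator()(raw_reader* reader) const noexcept { raw_reader_close(reader); }
};

using reader_ptr = std::unique_ptr<raw_reader, reader_closer>;

inline reader_ptr open_reader(const char* xml)
{
    return reader_ptr{raw_reader_open(xml)};
}

// The handle is closed on every path, including the early return.
inline bool has_content(const char* xml)
{
    const reader_ptr reader = open_reader(xml);
    if (reader->content.empty())
        return false;
    return true;
}
```

Ownership is now part of the type: a function that returns `reader_ptr` hands the handle over, and no caller needs to remember a matching close call.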
2024.03.26
This comprehensive update streamlines the codebase by improving file path handling, operation chaining, and output processing. It refactors document processing classes, introduces more structured and typed document element handling, and improves stream management. The changes aim to boost code readability, maintainability, and efficiency by leveraging modern C++ features and design patterns, significantly refining the development experience and output quality of the software.
In the realm of code where rabbits dare to hop,
Changes vast and wide, improvements non-stop.
With structured tags and streams so clear,
The code now runs, without a fear.
🐰💻✨
Through paths and chains, it finds its way,
A brighter, sleeker, brand new day.
- New Features
- Introduced structures for document elements to facilitate structured document creation and manipulation.
- Added common logging operators for enhanced debugging and error handling.
- Streamlined stream management in chat functionalities.
- Enhanced document processing with updated class refactoring and tagging system.
- Refactor
- Simplified operation chaining and output handling in core functionalities.
- Updated document processing classes for better maintainability and readability.
- Refactored and renamed Input and Output classes for clearer code semantics.
- Enhanced parsing and output processes with modern C++ features.
- Updated CSS styling support in HTML documents for improved presentation.
- Bug Fixes
- Fixed exception handling during command line argument parsing.
- Tests
- Updated API tests to reflect new parsing and output stream handling methods.
- Modified HTML output tests to test new CSS styling changes.
- Chores
- Removed deprecated code and files, improving codebase cleanliness.
- Added new utility function in build script for robust external command execution.
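The "structured and typed document element handling" described for this release can be pictured as a set of small structs combined in a std::variant and consumed with std::visit. The element types below (`text_run`, `line_break`, `hyperlink`) and the `to_plain_text` function are hypothetical; the SDK's real structures are not shown in these notes.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical element types, one struct per kind of document content.
struct text_run
{
    std::string value;
};
struct line_break
{
};
struct hyperlink
{
    std::string url;
    std::string label;
};

using element = std::variant<text_run, line_break, hyperlink>;

// Helper that merges several lambdas into one visitor.
template <typename... Fs>
struct overloaded : Fs...
{
    using Fs::operator()...;
};
template <typename... Fs>
overloaded(Fs...) -> overloaded<Fs...>;

// Renders a sequence of elements as plain text.
inline std::string to_plain_text(const std::vector<element>& elements)
{
    std::string out;
    for (const element& e : elements)
    {
        std::visit(overloaded{
                       [&out](const text_run& t) { out += t.value; },
                       [&out](const line_break&) { out += '\n'; },
                       [&out](const hyperlink& l) { out += l.label + " <" + l.url + ">"; }},
                   e);
    }
    return out;
}
```

Because every alternative must be handled for std::visit to compile, adding a new element type forces each consumer to decide how to treat it.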