Fix #507: Add data types documentation#528

Merged
javihern98 merged 8 commits into main from cr-507 on Feb 24, 2026

javihern98 (Contributor) commented on Feb 23, 2026

Summary

  • Add docs/data_types.rst with comprehensive data types reference covering input formats, internal representation, output formats, null handling, and type casting rules (implicit and explicit) based on VTL 2.2
  • Verified all documented behaviors against actual engine output using custom scripts
  • Extract sphinx-multiversion whitelists from conf.py into a gitignored docs/_smv_whitelist.json to prevent unnecessary diffs
  • Add --include-current-branch flag to configure_doc_versions.py for local doc previews
  • Add fallback version selector in versioning.html for sphinx-build output (working tree with uncommitted changes)
  • Add light pink (current) label styling for feature branches in the version selector
  • Fix all redirect URLs to use explicit index.html for file:// protocol compatibility
  • Remove orphaned docs/Operators/ RST files (not included in any toctree)
  • Update CLAUDE.md with VTL 2.2 docs link, documentation structure section, callout conventions, and local build instructions
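
The whitelist extraction described above could look roughly like this inside `conf.py`. This is a sketch, not the code from this PR: the JSON key names (`tags`, `branches`), the helper `load_smv_whitelists`, and the fallback patterns are assumptions. Only `smv_tag_whitelist` and `smv_branch_whitelist` are actual sphinx-multiversion settings.

```python
import json
from pathlib import Path

# Fallback patterns used when the gitignored file is absent (assumed values).
DEFAULT_WHITELISTS = {"tags": r"^v\d+\.\d+\.\d+$", "branches": r"^main$"}


def load_smv_whitelists(path):
    """Return whitelist regexes from a JSON file, or defaults if it is missing."""
    config_file = Path(path)
    if not config_file.is_file():
        return dict(DEFAULT_WHITELISTS)
    with config_file.open(encoding="utf-8") as handle:
        loaded = json.load(handle)
    # Keys missing from the file fall back to the defaults.
    return {**DEFAULT_WHITELISTS, **loaded}


# Hypothetical wiring: conf.py reads the generated file instead of hardcoding values.
_whitelists = load_smv_whitelists("docs/_smv_whitelist.json")
smv_tag_whitelist = _whitelists["tags"]
smv_branch_whitelist = _whitelists["branches"]
```

Keeping the values in a gitignored file means regenerating the version list does not touch `conf.py`, which is the "unnecessary diffs" problem the summary mentions.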

Test plan

  • Build docs locally with the full pipeline and verify data_types.html renders correctly
  • Verify version selector shows all versions with correct labels and working links
  • Verify the CI docs workflow (docs.yml) still works without --include-current-branch

Add comprehensive data types reference covering input formats, internal
representation, output formats, and type casting rules based on VTL 2.2.

Also improves the local docs build pipeline:
- Extract sphinx-multiversion whitelists to gitignored JSON config
- Add --include-current-branch flag for local preview
- Add fallback version selector for sphinx-build (working tree preview)
- Fix redirect URLs to use explicit index.html for file:// compatibility
- Remove false claims about Integer/Number accepting "true"/"false"
- Fix cast syntax to use lowercase VTL type names (integer, not Integer)
- Fix Date→Time_Period example to use vtl default format (2020D15)
- Remove unsupported String→Boolean conversion detail
- Add note about implicit promotions not renaming measures
- Add note about lowercase type names in cast operator
- Wrap all lines to stay under 100 characters
- Move data_types below api in toctree
- Remove orphaned Operators/ RST files
- Remove Operators reference from CLAUDE.md
javihern98 marked this pull request as ready for review on February 23, 2026 23:52
javihern98 requested review from a team and albertohernandez1995 on February 23, 2026 23:52
javihern98 merged commit 8d47a19 into main on Feb 24, 2026
20 checks passed
javihern98 deleted the cr-507 branch on February 24, 2026 08:47
javihern98 added a commit that referenced this pull request Feb 25, 2026
* Bump ruff from 0.15.0 to 0.15.1 (#514)

Bumps [ruff](https://github.com/astral-sh/ruff) from 0.15.0 to 0.15.1.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ruff@0.15.0...0.15.1)

---
updated-dependencies:
- dependency-name: ruff
  dependency-version: 0.15.1
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix #492: Refactor DAG classes for maintainability and performance (#493)

* refactor(DAG): Improve maintainability and performance of DAG classes (#492)

- Introduce typed DatasetSchedule dataclass replacing Dict[str, Any]
- Rewrite _ds_usage_analysis() with reverse index for O(n) performance
- Use sets for per-statement accumulators instead of list→set→list
- Extract shared cycle detection into _build_and_sort_graph()
- Fix O(n²) sort_elements with direct index lookup
- Rename camelCase to snake_case throughout DAG module
- Remove 5 unused fields and 1 dead method
- Delete _words.py (constants inlined)
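
The `sort_elements` item above replaces a repeated linear search with a direct index lookup. A minimal sketch of that general technique follows; it assumes nothing about the real DAG code, and `sort_by_order_slow`, `sort_by_order_fast`, and the sample dataset names are hypothetical.

```python
def sort_by_order_slow(elements, order):
    """Quadratic: list.index scans `order` once for every element."""
    return sorted(elements, key=lambda name: order.index(name))


def sort_by_order_fast(elements, order):
    """Build a name -> position map once, then look up each element in O(1)."""
    # Assumes the names in `order` are unique.
    position = {name: idx for idx, name in enumerate(order)}
    return sorted(elements, key=lambda name: position[name])


topological_order = ["DS_1", "DS_2", "DS_r1", "DS_r2"]
statements = ["DS_r2", "DS_1", "DS_r1"]
print(sort_by_order_fast(statements, topological_order))
# ['DS_1', 'DS_r1', 'DS_r2']
```

Both functions return the same ordering; only the cost of computing each sort key changes.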

* refactor(DAG): Replace loose fields with StatementDeps dataclass

Use typed StatementDeps for dependencies dict values and current
statement accumulator, removing string-keyed dict access and 5
redundant per-statement fields.

* Fix #504: Adapt implicit casting to VTL 2.2 (#517)

* Updated Time Period format handler (#518)

* Enhance time period handling: support additional SDMX formats and improve error messages

* Minor fix

* Add tests for TimePeriod input parsing and external representations

* Fix non time period scalar returns in format_time_period_external_representation

* Fixed ruff errors

* Refactor time period regex patterns and optimize check_time_period function

* Added date datatype support for hours, minutes and seconds. (#515)

* Added hours, minutes and seconds handling following ISO8601

* Removed outdated year check.

* Enhance date handling: normalize datetime output format and add year validation. Added new parametrized test.

* Refactor datetime tests by parametrizing new tests. Reorder file so params are read first by the developer.

* Added tests for time_agg, flow_to_stock, fill_time_series and time_shift operators

* Updated null distinction between empty string and null. (#521)

* First approach to solve the issue.

* Amend tests with the new changes

* Fix #512: Distinguish null from empty string in Aggregation and Replace operators

Remove sentinel swap (None ↔ "") in Aggregation._handle_data_types for
String and Date types — DuckDB handles NULL natively. Simplify Replace
by removing _REPLACE_PARAM2_OMITTED sentinel and 4 duplicated evaluation
methods, replacing with a minimal evaluate override that injects an empty
string Scalar when param2 is omitted. Fix generate_series_from_param to
use scalar broadcasting instead of single-element list wrapping.
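
The distinction this commit restores can be illustrated in plain Python. This sketch is not the engine's implementation: `vtl_replace` is a hypothetical stand-in that treats `None` as VTL null and defaults the omitted replacement to an empty string, as the commit describes for `param2`.

```python
from typing import Optional


def vtl_replace(value: Optional[str], pattern: str, replacement: str = "") -> Optional[str]:
    """Replace `pattern` in `value`; null stays null, "" stays a real string."""
    if value is None:
        return None  # null propagates and is never converted to ""
    return value.replace(pattern, replacement)


print(repr(vtl_replace("abc", "b")))  # 'ac'  (replacement omitted, so "" is used)
print(repr(vtl_replace("", "b")))     # ''    (empty string is a value, not null)
print(repr(vtl_replace(None, "b")))   # None  (null in, null out)
```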

---------

Co-authored-by: Javier Hernandez <javier.hernandez@meaningfuldata.eu>

* Fix #511: Remove numpy objects handling in favour of pyarrow data types (#524)

* Bump ruff from 0.15.1 to 0.15.2 (#527)

Bumps [ruff](https://github.com/astral-sh/ruff) from 0.15.1 to 0.15.2.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ruff@0.15.1...0.15.2)

---
updated-dependencies:
- dependency-name: ruff
  dependency-version: 0.15.2
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix #507: Add data types documentation (#528)

* Fix #525: Rewrite fill_time_series for TimePeriod data type (#526)

* Fix #525: Rewrite fill_time_series for TimePeriod data type

Rewrote fill_periods method to correctly handle non-annual TimePeriod
frequencies (quarterly, monthly, semester, weekly) by using
generate_period_range for continuous period sequences instead of the
broken approach that decomposed periods into independent (year, number)
components.
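
One way to get a continuous period sequence is to map each period to a single linear index, so that the year rollover falls out of the arithmetic. The sketch below covers only frequencies with a fixed number of periods per year. `period_range` and the `PERIODS_PER_YEAR` table are hypothetical illustrations, not the engine's `generate_period_range`.

```python
# Periods per year for frequencies whose count does not depend on the year.
PERIODS_PER_YEAR = {"A": 1, "S": 2, "Q": 4, "M": 12}


def period_range(freq, start, end):
    """Return every (year, number) from start to end inclusive, in order."""
    per_year = PERIODS_PER_YEAR[freq]
    # Collapse (year, number) into one linear index so rollover is automatic.
    first = start[0] * per_year + (start[1] - 1)
    last = end[0] * per_year + (end[1] - 1)
    return [(idx // per_year, idx % per_year + 1) for idx in range(first, last + 1)]


print(period_range("Q", (2020, 3), (2021, 2)))
# [(2020, 3), (2020, 4), (2021, 1), (2021, 2)]
```

Ranging over years and period numbers separately cannot produce this result, because the valid numbers for the first and last year differ from those of the years in between.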

* Fix next_period for year-dependent frequencies (daily, weekly)

next_period and previous_period used the static max from
PeriodDuration.periods (366 for D, 53 for W) instead of the
actual max for the current year. This caused failures when
crossing year boundaries for non-leap years (365 days) or
years with 52 ISO weeks.
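
The actual per-year maximum can be derived with the standard library. This is a sketch under the assumption that weeks follow ISO 8601 numbering, as the commit message states. `periods_in_year` and this `next_period` are hypothetical helpers, not the engine's methods.

```python
import calendar
from datetime import date


def periods_in_year(freq, year):
    """Return the number of daily ("D") or ISO weekly ("W") periods in `year`."""
    if freq == "D":
        return 366 if calendar.isleap(year) else 365
    if freq == "W":
        # December 28 always falls in the last ISO week of its year.
        return date(year, 12, 28).isocalendar()[1]
    raise ValueError(f"unsupported frequency: {freq!r}")


def next_period(freq, year, number):
    """Advance one period, rolling over at the real end of the given year."""
    if number < periods_in_year(freq, year):
        return year, number + 1
    return year + 1, 1


print(next_period("D", 2021, 365))  # (2022, 1): 2021 is not a leap year
print(next_period("W", 2020, 52))   # (2020, 53): 2020 has 53 ISO weeks
```

With a static maximum of 366 or 53, the first call would have returned `(2021, 366)`, a period that does not exist.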

* Change 2-X error codes from SemanticError to RuntimeError in TimeHandling

These errors occur at runtime during data processing (invalid dates,
unsupported period formats, etc.) rather than during semantic analysis.
Updated all related test assertions accordingly.

* Address PR review: make max_periods_in_year public, optimize fill_periods, fix docstring

* Fix #530: Auto-trigger docs workflow on documentation PR merge (#531)

* Bump version to 1.6.0rc1 (#532)

* Fix #533: Overhaul issue generation process (#534)

* Fix #533: Overhaul issue generation process

Remove auto-assigned labels from issue templates, add contact links
to config.yml, add Labels section and file sync rules to CLAUDE.md,
sync copilot-instructions.md with CLAUDE.md content.

* Add Documentation and Question issue templates

Add two new issue templates with auto-applied labels:
- Documentation: for reporting missing or incorrect docs
- Question: for usage and behavior questions

* Convert issue templates to yml form format with auto-applied types

Replace all .md issue templates with .yml form-based templates that
auto-set the issue type (Bug, Feature, Task) on creation. Labels are
only auto-applied for documentation and question templates.

* Improve issue templates following open source conventions

Add gating checkboxes (duplicate search, docs check), reproducible
example field with Python syntax highlighting, proper placeholders,
and required field validations.

* Align code placeholders with main.py

Update the reproducible example placeholder in bug_report.yml and
the code snippet in CLAUDE.md/copilot-instructions.md to match the
style and structure of main.py.

* Update PR template and add template conventions to CLAUDE.md

Add checklist section to PR template with code quality and test
checks. Update CLAUDE.md to mandate following issue and PR templates.

* Fix markdown lint issues in CLAUDE.md and copilot-instructions.md

Convert consecutive bold paragraphs to a proper list for the VTL
reference links.

* Update SECURITY.md and add security contact link

Update supported versions to 1.5.x, clarify that vulnerabilities
must be reported privately via email, and add a security policy
link to the issue template chooser.

* Enable private vulnerability reporting and update SECURITY.md

Add GitHub Security Advisories as the primary reporting channel
alongside email. Update the issue template contact link to point
directly to the new advisory form.

* Implemented handler for explicit casting with optional mask (#529)

* Refactor CastOperator: Enhance casting methods and add support for explicit cast with mask

* Add interval_to_period_str function and update explicit_cast methods for TimePeriod and TimeInterval

* Updated cast tests

* Parameterized cast tests

* Updated exception tests

* Simplified Time Period mask generator

* Refactor error handling in Cast operator to use consistent error codes and include mask in RunTimeError

* Enhance cast tests with additional cases for Integer, Number, Date, TimePeriod, and Duration conversions, aligning with VTL 2.2 specifications.

* Fixed ruff and mypy errors

* Updated number regex to accept other separators

* Removed Explicit cast with mask

* Minor fix

* Removed EXPLICIT_WITH_MASK_TYPE_PROMOTION_MAPPING from type promotion mappings

* Minor fix

* Updated poetry lock

* Fixed linting errors

* DuckDB ReferenceManual tests will only be launched when env var VTL_ENGINE_BACKEND is set to "duckdb"

* fix: removed matplotlib dependency to allow versions >=3.9

* Fixed linting errors

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Francisco Javier Hernández del Caño <javier.hernandez@meaningfuldata.eu>
Co-authored-by: Alberto <155883871+albertohernandez1995@users.noreply.github.com>
2 participants