To install the unreleased unihan-etl version, see developmental releases.
pip:
$ pip install --user --upgrade --pre unihan-etl
pipx:
$ pipx install --suffix=@next unihan-etl --pip-args '\--pre' --force
// Usage: unihan-etl@next
- Automatically linkify links that were previously only text.
- poetry: 1.8.1 -> 1.8.2

  See also: https://github.com/python-poetry/poetry/blob/1.8.2/CHANGELOG.md

- Code quality: Use f-strings in more places (#320) via ruff 0.4.2.
- Aggressive automated lint fixes via `ruff` (#317)

  Via ruff v0.3.4, all automated lint fixes, including unsafe and preview fixes, were applied:

      ruff check --select ALL . --fix --unsafe-fixes --preview --show-fixes; ruff format .

  Branches were treated with:

      git rebase \
          --strategy-option=theirs \
          --exec 'poetry run ruff check --select ALL . --fix --unsafe-fixes --preview --show-fixes; poetry run ruff format .; git add src tests; git commit --amend --no-edit' \
          origin/master
- poetry: 1.7.1 -> 1.8.1

  See also: https://github.com/python-poetry/poetry/blob/1.8.1/CHANGELOG.md

- ruff 0.2.2 -> 0.3.0 (#316)

  Related formatting changes. Update CI to use `ruff check .` instead of `ruff .`.

  See also: https://github.com/astral-sh/ruff/blob/v0.3.0/CHANGELOG.md
Maintenance release: No bug fixes or new features.
- README: Rewrite introduction, note updated UNIHAN compatibility information.
- Link to UNIHAN release in v0.31.0's changelog notes.
Maintenance release: No bug fixes or new features.
- `CsvLexer`: Fix quoted items (#314)
- Strengthen linting (#313)

  - Add flake8-commas (COM)
  - Add flake8-builtins (A)
  - Add flake8-errmsg (EM)
- Highlighting for CSV and TSV examples (#253)
- Typing fixes and additional doctest for `kTGH2013` (#312)
- Added `types-pygments` package (#253)
- Added some manual type stubs for `pygments`' `Lexer` (#253)
- pytest-watcher: Silent `*.py.*py` reruns (#312)
Bump UNIHAN compatibility from 11.0.0 to 15.1.0 (released 2023-09-01, revision 35).
Removed fields:

- 15.1.0: kHKSCS, kIRGDaiKanwaZiten, kKPS0, kKPS1, kKSC0, kKSC1, kRSKangXi
- 13.0.0: kRSJapanese, kRSKanWa, kRSKorean
- 12.0.0: kDefaultSortKey (private property)

New fields:

- 15.1.0: kJapanese, kMojiJoho, kSMSZD2003Index, kSMSZD2003Readings, kVietnameseNumeric, kZhuangNumeric
- 15.0.0: kAlternateTotalStrokes
- 14.0.0: kStrange
- 13.0.0: kIRG_SSource, kIRG_UKSource, kSpoofingVariant, kTGHZ2013, kUnihanCore2020
- Quiet pytest tracebacks (#310)
- Relax pytest plugin assertions in regards to zip / export file size (#310)
- Expansions: Fix loading of double apostrophe values via `kRSUnicode` via `kRSGeneric` (#304)
- Move CodeQL from advanced configuration file to GitHub's default
- Typo fixes
Maintenance only, no bug fixes, or new features
- ci: Add pydocstyle rule to ruff (#303)
- Add docstrings to functions, methods, classes, and packages (#303)
Maintenance only, no bug fixes, or new features
- Move pytest configuration to `pyproject.toml` (#299)
- Add Python 3.12 to trove classifiers
- Per Poetry's docs on managing dependencies and `poetry check`, we had it wrong: instead of using extras, we should create dependency groups like this:

      [tool.poetry.group.group-name.dependencies]
      dev-dependency = "1.0.0"

  Which we now do.
- Poetry: 1.6.1 -> 1.7.0

  See also: https://github.com/python-poetry/poetry/blob/1.7.0/CHANGELOG.md

- Move formatting from `black` to `ruff format` (#302)

  This retains the same formatting style of `black` while eliminating a dev dependency by using our existing rust-based `ruff` linter.

- CI: Update action packages to fix warnings

  - dorny/paths-filter: 2.7.0 -> 2.11.1
- `SPACE_DELIMITED_LIST_FIELDS`: Fix for field name `kAccountingNumeric` found during automated sweep for typos.
- Typo fixes

      typos --format brief --write-changes

  One of these typos was for `kAccountingNumeric` in `SPACE_DELIMITED_LIST_FIELDS`.

- ruff: Remove ERA / `eradicate` plugin

  This rule had too many false positives to trust. Other ruff rules have been beneficial.
- All pytest plugin fixtures are now prefixed `unihan_`, e.g.:

  - `quick_unihan_path` -> `unihan_quick_path`
  - `quick_unihan_options` -> `unihan_quick_options`
  - `quick_unihan_packager` -> `unihan_quick_packager`
  - `ensure_quick_unihan` -> `unihan_ensure_quick`
  - `mock_zip` -> `unihan_mock_zip`
  - `columns` -> `unihan_quick_columns`

- `TestPackager` fixture has been removed

  This fixture was made redundant by `unihan_quick_*` and `unihan_full_*` fixtures.
- pytest plugin (`unihan_zshrc`): Fix `skipif` condition to run if shell uses `zsh(1)`
- "quick" fixtures:

  - Data has been moved from `tests/fixtures` to `src/unihan_etl/data_files/quick`
  - Fixtures prefixed by `sample_` in the name have been renamed to `quick_`

- "quick" and "full" fixtures: Fixed ability to access data files from outside `unihan_etl` package
- ruff: Code quality tweaks (#295)
- pytest plugin: Add cached fixtures for `UNIHAN` (#291)

  After the initial download of UNIHAN.zip, an 11 second test run of unihan-etl's tests can go down to 1.5 seconds by eliminating redownloading and extraction.

- pytest plugin: Revert fix of `zshrc` fixture's `skipif` condition (#293)

  It was fine as-is.
Rolled back
- pytest plugin: Fix `zshrc` fixture's `skipif` condition (#292)
Maintenance only, no bug fixes, or new features
- ruff: Add additional linters, apply code fixes automatically and by hand (#290)
- Typings: Extract `LogLevel` and `UnihanFormats`
Maintenance only, no bug fixes, or new features
- zhon: 1.1.5 -> 2.0.0 (#289, fixes #282)

  Fixes pytest warning related to regular expressions.
Maintenance only, no bug fixes, or new features
- {mod}`unihan_etl._internal.app_dirs` improvements (#287)

  - Breaking: `app_dirs` moved

    - Before 0.23.x: `unihan_etl.app_dirs`
    - After 0.23.x: `unihan_etl._internal.app_dirs`

  - New feature: Override directories on a one-off basis
  - New feature: Template replacement of environment variables via {func}`os.path.expandvars` + {func}`os.path.expanduser`
  - {mod}`doctest`: See the above in action thanks to doctests
  - Dedicated tests via pytest
- API docs (#288):

  - Limit depth of table of contents to one
  - Fix section heading
  - Fix comment in `AppDirs`

- Fix for `destination` of files not replacing file extension correctly (#285)
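The template replacement of environment variables noted for `app_dirs` (#287) can be sketched with the standard library alone. This is a minimal illustration of combining `os.path.expandvars` and `os.path.expanduser`, not unihan-etl's actual implementation: the helper `expand_dir` and the `UNIHAN_DEMO_HOME` variable are hypothetical names chosen for the example.

```python
import os
import pathlib


def expand_dir(template: str) -> pathlib.Path:
    """Expand ``$VAR`` / ``${VAR}`` and a leading ``~`` in a directory template.

    Hypothetical helper for illustration only.
    """
    # Environment variables are expanded first, so a variable whose value
    # starts with "~" is still resolved by expanduser() afterwards.
    return pathlib.Path(os.path.expanduser(os.path.expandvars(template)))


# Hypothetical variable, set only for this demonstration.
os.environ["UNIHAN_DEMO_HOME"] = "/tmp/unihan-demo"

cache_dir = expand_dir("${UNIHAN_DEMO_HOME}/cache")
print(cache_dir)

# Unset variables are left untouched by os.path.expandvars().
print(expand_dir("$UNIHAN_DEMO_UNSET_VARIABLE/cache"))
```

Expanding variables before the home directory is a design choice of this sketch; the order in unihan-etl itself may differ.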
This module has been renamed.
Before 0.22.x, unihan_etl's configuration was done through a {class}`dict` object.

From 0.22.0 onward, settings are configured via a {obj}`dataclasses.dataclass` object: {class}`unihan_etl.options.Options`
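The practical difference between dict-based and dataclass-based settings can be sketched as follows. `DemoOptions` is a hypothetical stand-in written for this example; its fields are assumptions and are not claimed to match the fields of `unihan_etl.options.Options`.

```python
import dataclasses
import pathlib
import typing as t


@dataclasses.dataclass
class DemoOptions:
    """Hypothetical settings object; field names are illustrative only."""

    destination: pathlib.Path = pathlib.Path("unihan.csv")
    format: str = "csv"
    fields: t.List[str] = dataclasses.field(default_factory=list)


# Dict-style configuration: a misspelled key is silently accepted.
legacy = {"format": "json", "fromat": "yaml"}

# Dataclass-style configuration: fields are named, typed, and have defaults.
options = DemoOptions(format="json", fields=["kDefinition"])

# An unknown field fails immediately instead of being ignored.
try:
    DemoOptions(fromat="yaml")  # type: ignore[call-arg]
except TypeError as exc:
    print(f"rejected: {exc}")

# Derive a modified copy without mutating the original.
yaml_options = dataclasses.replace(options, format="yaml")
print(dataclasses.asdict(yaml_options))
```

A dataclass also gives static type checkers something to verify, which a plain `dict` of mixed values does not.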
- Add {mod}`doctest` support (#274)

  - Initial doctest example added to README.md, test.py, and util.py.

- Stub out initial pytest plugin (#274)
- Split API docs into multiple files (#283)
- Fix `make start` in `docs/Makefile` by fixing argument positions (#283)
- Fix for `destination` of files not replacing file extension correctly (#286)
Maintenance only, no bug fixes or features
- Move file locations to {mod}`pathlib` internally (#277)
- Improved typing of download `urlretrieve_fn` and `reporthook` via {class}`typing.Protocol` (#277)
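Typing a callback with `typing.Protocol`, as mentioned for `reporthook` in #277, might look like the sketch below. The three-argument callback shape follows `urllib.request.urlretrieve`'s documented `reporthook`; the names `ReportHookFn`, `fetch`, and `record` are illustrative assumptions, and `fetch` is a stand-in that performs no network access.

```python
import typing as t


class ReportHookFn(t.Protocol):
    """Callable shape of a progress callback (illustrative name)."""

    def __call__(self, count: int, block_size: int, total_size: int) -> object:
        ...


def fetch(
    chunks: t.Sequence[bytes],
    reporthook: t.Optional[ReportHookFn] = None,
) -> bytes:
    """Hypothetical downloader stand-in: joins chunks and reports progress."""
    total = sum(len(chunk) for chunk in chunks)
    received: t.List[bytes] = []
    for count, chunk in enumerate(chunks, start=1):
        received.append(chunk)
        if reporthook is not None:
            reporthook(count, len(chunk), total)
    return b"".join(received)


calls: t.List[t.Tuple[int, int, int]] = []


def record(count: int, block_size: int, total_size: int) -> None:
    """Any function with a matching signature satisfies the protocol."""
    calls.append((count, block_size, total_size))


data = fetch([b"abc", b"de"], reporthook=record)
print(data, calls)
```

Because protocols are structural, `record` needs no inheritance or registration; a type checker accepts it wherever `ReportHookFn` is expected.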
Maintenance only, no bug fixes or features
- Python 3.7 Dropped

  Python 3.7 support has been dropped (#272)

  Its end-of-life is June 27th, 2023, and Python 3.8 supports {mod}`typing`'s {class}`typing.TypedDict` and {class}`typing.Protocol` out of the box, without needing {mod}`typing_extensions`.
- Typings:

  - Import {mod}`typing` as a namespace, e.g. `import typing as t` (#276)
  - Use `typing` for {class}`typing.TypedDict` and {class}`typing.Literal` (#276)
  - Use typing_extensions' {py:data}`TypeAlias` for repeated types, such as in test_expansions (#276)
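The namespace-import convention from #276 can be illustrated with a small sketch that needs only Python 3.8+. The names `ExportFormat`, `CharRow`, and `describe` are hypothetical and exist only for this example; they are not taken from unihan-etl's code.

```python
import typing as t

# Literal restricts a string to a fixed set of values (hypothetical alias).
ExportFormat = t.Literal["csv", "json", "yaml"]


class CharRow(t.TypedDict):
    """Dictionary with a known set of typed keys (illustrative shape)."""

    char: str
    ucn: str


def describe(row: CharRow, fmt: ExportFormat) -> str:
    """Format a row; type checkers validate both the keys and ``fmt``."""
    return f"{row['ucn']} ({row['char']}) -> {fmt}"


row: CharRow = {"char": "好", "ucn": "U+597D"}
print(describe(row, "json"))
```

Accessing everything through `t.` keeps the import list short and makes it obvious at a glance which names come from `typing`.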
Maintenance only, no bug fixes or features
- Add back `black` for formatting

  This is still necessary to accompany `ruff`, until it replaces black.
Maintenance only, no bug fixes or features
- Move formatting, import sorting, and linting to ruff.

  This rust-based checker has dramatically improved performance. Linting and formatting can be done almost instantly.

  This change replaces black, isort, flake8 and flake8 plugins.

- poetry: 1.4.0 -> 1.5.0

  See also: https://github.com/python-poetry/poetry/releases/tag/1.5.0

- pytest: Fix invalid escape sequence warning from `zhon`
- `merge_dict`: Improve typing of generic params (#271)
- Add PyYAML dependency
- CI speedups (#267)

  - Split out release to separate job so the PyPI Upload docker image isn't pulled on normal runs
  - Clean up CodeQL

- Bump poetry 1.1.x to 1.2.x
- Move `.coveragerc` -> `pyproject.toml` (#268)
- Move to `src/`-layout structure (#266)
- Add flake8-bugbear (#263)
- Add flake8-comprehensions (#264)
- Render changelog in `linkify_issues` (#261, #265)
- Fix Table of contents rendering with sphinx autodoc with `sphinx_toctree_autodoc_fix` (#265)
- Test doctests in our docs via `pytest_doctest_docutils` (built on `doctest_docutils`) (#265)
- Add vendorized, updated fork of `sphinxcontrib-issuetracker`, via #261.
- Remove sphinx-issues package
Follow ups to #257.
- `merge_dict()`: Fix merging edge case where destination key was missing
- `download()`: Fix edge case when "downloading" file from local path
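The kind of edge case described for dictionary merging, where the destination lacks a key that the source provides as a nested dictionary, can be shown with a generic recursive merge. This is a sketch written for illustration under that assumption; `merge_dict_sketch` is a hypothetical name and is not the code from unihan-etl.

```python
import typing as t


def merge_dict_sketch(
    base: t.Dict[str, t.Any],
    additional: t.Mapping[str, t.Any],
) -> t.Dict[str, t.Any]:
    """Recursively merge ``additional`` into ``base`` in place and return it."""
    for key, value in additional.items():
        if isinstance(value, dict):
            target = base.get(key)
            if not isinstance(target, dict):
                # Edge case: the destination key is missing (or not a dict),
                # so create a fresh dict instead of failing or aliasing.
                target = base[key] = {}
            merge_dict_sketch(target, value)
        else:
            base[key] = value
    return base


source = {"a": {"y": 2}, "b": {"z": 3}}
merged = merge_dict_sketch({"a": {"x": 1}}, source)
print(merged)
```

Creating a new dictionary for the missing key, rather than assigning the source's nested dictionary directly, keeps later changes to the result from leaking back into the source.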
- mypy `--strict` annotations, via #257
- New option: `--no-cache`

  Disregard cached .zip / extracted files, via #259.

- Add python 3.8 and 3.9 to CI

  This is to make way for strict type annotations, as the typings and generic behavior vary dramatically between 3.7 and 3.11.
- Python 2 compatibility module and imports removed. Python 2.x was officially dropped in 0.12.0 (2021-06-15) via #258
- `load_data`: Accept list of `pathlib.Path` in addition to list of `str`
- Add Python 3.10 (#248)
- Dropped Python 3.6 (#248)
Infrastructure updates for static type checking and doctest examples.
- Update poetry to 1.1

  - CI: Use poetry 1.1.12 and `install-poetry.py` installer (#237 + #248)
  - Relock poetry.lock at 1.1 (w/ 1.1.7's fix)

- Run pyupgrade for python 3.7
- Tests: Move from `tmpdir` -> `tmp_path`
- Initial doctests support added, via #255
- Initial mypy validation, via #255
- CI (tests, docs): Improve caching of python dependencies via `actions/setup-python`'s v3/4's new poetry caching, via #255
- CI (docs): Skip if no `PUBLISH` condition triggered, via #255
- Move to `furo` theme
- Add :ref:`quickstart` page
- Link to cihai's developer documentation: https://cihai.git-pull.com/contributing/
- #236: Convert to markdown
- Update `black` to 21.6b0
- Update trove classifiers to 3.9
- #235: Drop python 2.7, 3.5. Remove python 2 modesets and `__future__`
- #230 Move packaging / publishing to poetry
- #229 Self host docs
- #229 Add metadata / icons / etc. for doc site
- #229 Move travis -> github actions
- #229 Overhaul Makefiles
- Update CHANGES headings to produce working links
- Relax `appdirs` version constraint
- #228 Move from Pipfile to poetry
- Fix flicker in download progress bar
- Add `project_urls` to setup.py
- Use plain reStructuredText for CHANGES
- Use `collections` that's compatible with python 2 and 3
- PEP8 tweaks
- Add code links in API
- Add `__version__` to `unihan_etl`
- #91 New fields from UNIHAN Revision 25.

  - kJinmeiyoKanji
  - kJoyoKanji
  - kKoreanEducationHanja
  - kKoreanName
  - kTGH

  UNIHAN Revision 25 was released 2018-05-18 and issued for Unicode 11.0.
- Add tests and example corpus for kCCCII
- Add configuration / make tests for isort, flake8
- Switch tmuxp config to use pipenv
- Add Pipfile
- Add `make sync_pipfile` task to sync requirements/*.txt files with Pipfile
- Update and sync Pipfile
- Developer package updates (linting / docs / testing)

  - isort 4.2.15 to 4.3.4
  - flake8 3.3.0 to 3.5.0
  - vulture 0.14 to 0.27
  - sphinx 1.6.2 to 1.7.6
  - alagitpull 0.0.12 to 0.0.21
  - releases 1.3.1 to 1.6.1
  - sphinx-argparse 0.2.1 to 1.6.2
  - pytest 3.1.2 to 3.6.4

- Move documentation over to numpy-style
- Add sphinxcontrib-napoleon 0.6.1
- Update LICENSE New BSD to MIT
- All future commits and contributions are licensed to the cihai software foundation. This includes commits by Tony Narlock (creator).
- Enhance support for locations on kHDZRadBreak fields.
- Fix kIRG_GSource without location
- Fix kFenn output
- Fix kHanyuPinlu support output for n diacritics
- Add expansion for kIRGKangXi
- Normalize Radical-Stroke expansion for kRSUnicode
- Migrate more fields to regular expressions
- Normalize character field for kDaeJaweon, kHanyuPinyin, and kCheungBauer, kFennIndex, kCheungBauerIndex, kIICore, kIRGHanyuDaZidian
- Support for expanding kGSR
- Convert some field expansions to use regexes
- Fix bug where destination file was made into directory on first run
- Rename from unihan-tabular to unihan-etl
- Support for expanding multi-value fields
- Support for pruning empty fields
- Improve help dialog
- Added a page about UNIHAN and the project to documentation
- Split constant values into their own module
- Split functionality for expanding unstructured values into its own module
- Update to add kJa and adjust source file of kCompatibilityVariant per Unicode 8.0.0.
- Support for configuring logging via options and CLI
- Convert all print statements to use logger
- Allow for local / file system sources for Unihan.zip
- Only extract zip if unextracted
- Update package classifiers
- Add back datapackage
- Fix python 2 CSV output
- Default to CSV output
- Move unicodecsv module to dependency package
- Support for XDG directory specification
- Support for custom destination output, including replacing template variable `{ext}`
- Move about.py to module level
- Fix python package import
- Fix readme bug on pypi
- Support for exporting in YAML and JSON
- More internal factoring and simplification
- Return data as list
- Drop python 3.3 and 3.4 support
- Rename from cihaidata_unihan to unihan_tabular
- Drop datapackages in favor of a universal JSON, YAML and CSV export.
- Only use UnicodeWriter in Python 2; fixes an issue where Python would encode `b` in front of values
- Rename scripts/ to cihaidata_unihan/
- Enable invoking tool via `$ cihaidata_unihan`
- Major internal refactor and simplification
- Convert to pytest `assert` statements
- Convert full test suite to pytest functions and fixtures
- Get CLI documentation up again
- Improve test coverage
- Lint code, remove unused imports
- Switch license BSD -> MIT
- Rebooted
- Modernize Makefile in docs
- Add Makefile to main project
- Modernize package metadata to use about.py
- Update requirements to use requirements/ folder for base, testing and doc dependencies.
- Update sphinx theme to alabaster with new logo.
- Update travis to use coverall
- Update links on README to use https
- Update travis to test up to python 3.6
- Add support for pypy (why not)
- Lock base dependencies
- Add dev dependencies for isort, vulture and flake8