Releases: ilius/pyglossary

PyGlossary 4.6.1

10 Mar 12:48
5f6ebd6

Changes since 4.6.0

Bug fixes

  • Fix a bug causing broken installation if ~/.local/lib is a symbolic link

    • or site-packages or any of its parents are a symbolic link
  • Fix incompatibility with Python 3.9 (despite documentation)

  • Fix scripts/entry-filters-doc.py, scripts/plugin-doc.py and doc/entry-filters.md

  • AppleDict: Fix typos in Chinese language module

Features:

  • Use environment variable VERBOSITY as default (a number from 0 to 5)
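
A minimal sketch of reading a default verbosity from the VERBOSITY environment variable. The helper name, the fallback value of 3, and the clamping to 0..5 are assumptions for illustration, not PyGlossary's actual code:

```python
import os


def default_verbosity(fallback: int = 3) -> int:
    """Return the default verbosity taken from the VERBOSITY variable.

    Hypothetical helper: the fallback and the clamping are assumptions.
    """
    raw = os.environ.get("VERBOSITY", "")
    try:
        level = int(raw)
    except ValueError:
        # unset or non-numeric value: use the fallback
        return fallback
    # keep the result inside the documented range 0..5
    return max(0, min(5, level))
```

With this shape, an explicit command-line verbosity flag would simply override the returned value.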

Improvements

  • AppleDict Binary: set html_full=True by default

  • Update wcwidth to 0.2.6

Refactoring

  • Add glos.stripFullHtml(errorHandler) and use it in 3 plugins

    • Add entry filter StripFullHtml and change entry.stripFullHtml() to return error
  • Refactor entryFiltersRules

  • Remove empty plugin gettext_mo.py

  • Remove glos.titleElement from glossary_v2.Glossary

    • Add to glossary.Glossary for compatibility
    • glossary.Glossary is a wrapper (child class) on top of glossary_v2.Glossary

Documentation

  • Update doc/entry-filters.md to list some entry filters that were enabled conditionally (besides config)

  • Remove sdict.md and sdict_source.md (removed plugins)

Type checking

  • Add missing method in GlossaryType class
  • Fix mypy errors on most of code base and some of plugins
  • Use builtin types list, dict, tuple, set for type annotations
  • Replace Optional[X] with X or None
    • will not affect runtime, but type checking now only works with Python 3.10+
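
A small sketch of the annotation style described above, assuming "X or None" refers to the PEP 604 spelling `X | None`. The function is hypothetical. The return annotation is quoted so it is never evaluated at runtime, which is the same effect `from __future__ import annotations` has for a whole module:

```python
def first_word(text: str) -> "str | None":
    """Return the first whitespace-separated word, or None for blank input."""
    parts = text.split()  # builtin list is also usable in annotations as list[str]
    return parts[0] if parts else None
```

Because the annotation stays a string, the function still runs on interpreters that cannot evaluate `str | None`. Only type checkers need to understand the newer syntax.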

PyGlossary 4.6.0

07 Mar 11:27
4b7ae78

Changes since 4.5.0

Dependency change

We now require Python 3.9 or a later version.

Bug fixes

  • Fix exception in scripts/plugin-index.py: 8a94b8c

  • StarDict: Fix writing to .zip file produced empty zip, and fix bad test

  • dictunformat: fix #367: add option headword_separator, default to ;

  • Fixes in ui_gtk, #380 #382 #403

  • AppleDict source: fix #407 missing quotes for title, and refactor duplicate code

  • DictionaryForMIDs: remove | from word when normalizing, fix punctuation regex, use Unix newlines

  • StarDict: use Unix newline when reading and writing .ifo file on Windows

  • Fix bug of glos.addEntryObj(dataEntry) adding empty file because tmpDataDir is not set until glos.read()

    • Set and create tmpDataDir on glos.tmpDataDir access, and add test, #424
  • Fix scripts/wiki-formats.py, #428

  • Dictd / Dict.org: fix exception on Windows
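
The tmpDataDir fix above (create the directory on first access instead of waiting for glos.read()) can be sketched with a lazy property. The class below is a hypothetical stand-in, not the real Glossary implementation:

```python
import os
import tempfile


class GlossarySketch:
    """Hypothetical stand-in for a glossary object."""

    def __init__(self) -> None:
        self._tmpDataDir = ""  # nothing is created until first access

    @property
    def tmpDataDir(self) -> str:
        # Create the directory on first access, so code that adds data
        # entries before any read() still gets a usable path.
        if not self._tmpDataDir:
            self._tmpDataDir = tempfile.mkdtemp(prefix="glossary-data-")
        return self._tmpDataDir
```

Later accesses return the same path, so data entries added at different times end up in one directory.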

Features

  • Support sorting by an ICU locale, see Sorting section of README

  • Add Gtk4 interface --ui=gtk4 / --gtk4

    • still buggy and not as functional as Gtk3 or Tkinter interfaces
  • Add flag --optimize-memory, config key optimize_memory

    • To enable entry compression on --indirect
    • Not enabled by default (it was previously always compressed)
  • Allow plugin's reader.open() to return an Iterator for progress bar

    • Implement for Tabfile (reading info/metadata)
    • Implement for AppleDict Binary (reading KeyText.data)
  • Add read and write support for StarDict Textual File (.xml), #348

  • Add support for writing Yomichan dictionary files, #395 by @tomtung

  • StarDict reader: support .syn.dz file, #410

  • StarDict writer: add write option large_file, #392 #422

  • StarDict reader: support dxoffsetbits=64 on read, #392 #422

  • JMDict: support examples, #383

  • Add read support for JMnedict, #386

  • Add flag --skip-duplicate-headword, config skip_duplicate_headword, #365

    • Zim reader: remove option skip_duplicate_words, #365
  • Add flag --trim-arabic-diacritics, config trim_arabic_diacritics, #366

  • Add read support for IUPAC goldbook (.xml), #355

  • Add write support for DIKT JSON

  • StarDict writer: limit memory usage by using SQLite for idx and syn data, #409

  • CSV: add newline option, defaulting to Unix-style

  • Aard2 Slob writer: add option file_size_approx_check_num_entries

  • Add scripts/diff-glossary and scripts/view-glossary
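
A rough sketch of how a reader's open() can return an Iterator so the caller can drive a progress bar while the file is loaded. The class name and the (position, total) tuple shape are assumptions, not the plugin API:

```python
from collections.abc import Iterator


class ReaderSketch:
    """Hypothetical reader; the yielded (position, total) shape is assumed."""

    def __init__(self) -> None:
        self.keys: list[str] = []

    def open(self, lines: list[str]) -> Iterator[tuple[int, int]]:
        total = len(lines)
        for index, line in enumerate(lines, start=1):
            self.keys.append(line.strip())
            yield index, total  # the caller can update a progress bar here


reader = ReaderSketch()
# Consuming the iterator performs the loading step by step.
progress = [pos / total for pos, total in reader.open(["a\n", "b\n"])]
```

A caller that does not care about progress can still exhaust the iterator in a plain loop.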

Improvements

  • When removing HTML tags, also replace <div> with \n, #394 by @tomtung

    • Treat <div> the same way <p> is treated.
  • Mobi: add mobi7-forcing switch to kindlegen command, #374 by @holyspiritomb

  • Octopus MDict: ignore directories with same_dir_data_files, #362

  • StarDict reader: handle definitions with mixed types/formats

  • Dictfile: strip whitespaces from word and defi before going through entry filters

  • BGL: strip whitespaces from word and defi before going through entry filters

  • Improvement in glos.write: avoid printing exception for invalid encoding

  • Remove empty logs in glos.convert

  • StarDict reader: fix validating sametypesequence, and add test

  • glos.convert: Allow an existing empty directory as output path

  • TextGlossaryReader: replace nextPair method with nextBlock which returns resource files as third item

  • ui_cmd_interactive: allow converting several times before exiting

  • Change title tag for Greek from <big> to <b>

  • Update language data set (langs.json)

  • ui/main.py: print 1-line error instead of full exception on ImportError

  • ui/main.py: Windows: try Tkinter before Gtk

  • ebook_base.py: avoid shutil.move on Windows, #368

  • TextGlossaryReader: fix loading info and some refactoring, #370 36b9cd8

  • Entry: Allow word to be tuple in Entry(word=...)

  • glos.iterInfo() return Iterator rather than Iterable

  • Zim: change dependency to libzim>=1.0, and some comments

  • Mobi: work with kindlegen executable in PATH directories, #401

  • ui: limit the length of option comments in Format Options dialog

  • ui_gtk: improvement: show (last) critical error on status bar

  • ui_gtk: set initial focus

  • ui_gtk: improvements in About tab

  • ui_tk: revert most ttk widgets to tk because the theme doesn't match

  • Add SVG icon, #414 by @proletarius101

  • Prevent exception/traceback on Ctrl+C

  • Optimize progress bar

  • Aard2 slob: show info log before and after slobWriter.finalize(), #437
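
The <div> handling mentioned in this list (treat <div> like <p> when removing HTML tags) could look roughly like the following. The regular expressions are illustrative assumptions, not the ones used in PyGlossary:

```python
import re

# Assumption: block-level tags become newlines before other tags are dropped.
BLOCK_TAG = re.compile(r"</?(?:div|p)\b[^>]*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")


def strip_html(text: str) -> str:
    text = BLOCK_TAG.sub("\n", text)  # <div>, </div>, <p>, </p> -> newline
    text = ANY_TAG.sub("", text)      # drop every remaining tag
    # collapse the runs of newlines produced by adjacent block tags
    return re.sub(r"\n{2,}", "\n", text).strip()
```

Without the first substitution, consecutive <div> blocks would be glued together on one line.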

Removed features

  • Remove read support for Wiktionary Dump, #48

  • Remove support for Sdictionary Binary and Source

Octopus MDict MDX: features and improvements

  • Support MDict V3 format by updating readmdict, #385 by @xiaoqiangwang

  • Fix files created without UUID in header, #387 by @xiaoqiangwang

    • MdxBuilder 4.0 RC2 and before creates files without UUID header
  • Decode mdict title & description if they're bytes, #393 by @tomtung

  • readmdict: Skip zlib decompress exceptions, #384

  • readmdict: Use __name__ as logger name, and add 2 debug logs, #384

  • readmdict: improve exception msg for xxhash, #385

XDXF: fixes / improvements, issue #376

  • Support <categ>
  • Support embedded tags in <iref>
  • Fix ignoring <mrkd>
  • Fix extra newlines
  • Get rid of warning for <etm>
  • Fix/improve newline and space issues
  • Fix and improve tests
  • Update url for format description
  • Support any tag/string in <ex>, #396
  • Support reading compressed files directly (.xdxf.gz, .xdxf.bz2, .xdxf.lzma)
  • Allow using XSL using --write-options=xsl=True
  • Update XSL
  • Other improvements in XDXF to HTML transformation

AppleDict Binary: features, bug fixes, improvements, refactoring

  • Fix css name on html_full=True

  • Fix using self._encoding when should use utf-8

  • Fix internal links, #343

    • Remove x-dictionary:d: prefix from href
    • First fix for x-dictionary:r:: use title if present
    • Add bword:// prefix to href (unless it points to http/https)
    • Read entry IDs on open and fix links with x-dictionary:r:
  • Add plistlib to dependencies

  • Add tests

  • Replace <entry ...> with <div>

  • Fix bad exception formatting

  • Fixes from PR #436

  • Support morphology (alternates): #434 by @soshial

  • Support different AppleDict offsets, #417 by @soshial

  • Extract AppleDict meta-info (langs, title, author), #418 by @soshial

  • Progress Bar on open() / loading KeyText.data

  • Improve memory usage of loading KeyText.data

  • Replace appledict_bin.py with appledict_bin directory and more refactoring
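
Two of the internal-link rules listed above (drop the x-dictionary:d: prefix, then add bword:// unless the link is http/https) can be sketched as a small helper. The function name is hypothetical, and x-dictionary:r: links, which need the entry IDs, are not handled here:

```python
def fix_href(href: str) -> str:
    """Hypothetical helper mirroring two of the link rules listed above."""
    # str.removeprefix requires Python 3.9+
    href = href.removeprefix("x-dictionary:d:")
    if href.startswith(("http://", "https://")):
        return href  # external links are left untouched
    return "bword://" + href
```

For example, `fix_href("x-dictionary:d:apple")` gives `bword://apple`, while an https link is returned unchanged.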

Glossary class (glossary.py)

  • Lots of refactoring in glossary.py

    • Improve the design and readability
    • Reduce complexity of methods
    • Move some code into new classes that Glossary inherits from
    • Improve error messages
  • Introduce glossary_v2.py, and maintain API backward-compatibility for glossary.py (as far as documented)

Refactoring

  • Fix style errors using ruff based on pyproject.toml configuration

  • Remove all usages of pyglossary.plugins.formats_common

  • Use str.startswith(tuple) and str.endswith(tuple)

  • Reduce complexity of Glossary methods

  • Rename entry filter strip to trim_whitespaces

  • Some refactoring in StarDict reader

  • Use f-string equal syntax added in Python 3.8

  • Use str.removeprefix and str.removesuffix added in Python 3.9

  • langs/writing_system.py:

    • Change iso field to list
    • Add new scripts
    • Add getAllWritingSystemsFromText
    • More refactoring
  • Split up TextGlossaryReader.loadInfo method

  • plugin_manager.py: make some methods private
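
For reference, the string idioms named in this list look like this in isolation. The file name is made up for illustration:

```python
filename = "dict.xdxf.gz"

# str.endswith accepts a tuple of suffixes: one call instead of chained "or"
is_compressed = filename.endswith((".gz", ".bz2", ".lzma"))

# str.removesuffix (Python 3.9+) strips one exact suffix, unlike str.rstrip,
# which strips any of the given characters
base = filename.removesuffix(".gz")

# the f-string "=" specifier (Python 3.8+) shows the expression and its repr
message = f"{base=} {is_compressed=}"
```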

Documentation

  • Update plugins' documentation

  • Glossary: add comments about entryFilters

  • Update config.rst

  • Update doc/entry-filters.md

  • Update README.md

  • Update doc/sort-key.md

  • Update doc/pyicu.md

  • Update plugins/testformat.py

  • Add types for arguments and result of all functions/methods

  • Add types for r/w options in reader/writer classes

  • Fix a few incorrect type annotations

  • README.md: Add document for adding data entries, #412

  • README.md: Fix -> nixos command, #400 by @srghma

  • Update bgl_info.md and move it from pyglossary/plugins/babylon_bgl/ to doc/babylon/

Testing

  • Add test for DSL -> Tabfile conversion

  • dsl_test.py: fix method names not starting with test_

  • StarDict reader: better testing for handling definitions with mixed types

  • StarDict writer: much better testing, coverage of stardict.py: from 62% to 83%

  • Refactoring and improvements in tests of Glossary, along with new tests

  • Add test for dictunformat -> Tabfile

  • AppleDict (source) tests: validate plist file contents

  • Allow forking and branching pyglossary-test repo

  • Fix some failing tests on Windows

  • Slob: test file_size_approx

  • Test Tabfile -> SQL conversion

  • Test StarDict error/warning for sortKeyName with and without locale

  • Print useful messages for unhandled warnings

  • Improve logs

  • Add showDiff=False arg to compareTextFiles and convert

Packaging

  • Update and refactor Dockerfile and run-with-docker.sh

    • Dockerfile: chan...

PyGlossary 4.5.0

04 Feb 23:19
2433ff5

Changes since 4.4.1

Bug fixes

  • Fix 2 log messages in glos._resolveConvertSortParams

  • Fixes and improvements in Dictfile (.df) reader

    • Fix exception: disable loading info (Dictfile does not support info)
    • TextGlossaryReader: prevent producing duplicate data entries
      • This fixes: error in DataEntry.save: [Errno 2] No such file or directory: ... because entry.save() moves the temp file to output path
      • This bug only existed for Dictfile (.df) format.
    • Remove extra colon, #358
    • Remove some extra newline
    • And add test for Dictfile to/from Tabfile
  • Fix not cleaning up temp directory on return with error from glos.convert

Features

  • ui_gtk: add a "General Options" button that opens a dialog for:

    • Settings for sort and sortKey
    • Checkbox for SQLite mode
    • Check boxes for config params: save_info_json, lower, skip_resources, rtl, enable_alts, cleanup, remove_html_all
  • Add support for --sort-key random to shuffle entries

Performance improvements

  • Performance improvement: remove gc.collect() calls in Glossary and *EntryList

    • Not needed since Python 3.8
    • Change minimum python requirement to 3.8 in README.md
  • Do not import all plugin modules (only import two plugins that are used)

    • Load json file plugins-meta/index.json instead
    • In debug mode, all plugin modules are still imported and validated
    • User plugins are still imported

Other improvements

  • Improve detection of languages from glossary name, and add tests
  • Update langs.json: add new 3-letter codes for 25 languages
  • glos.preventDuplicateWords and glos.removeHtmlTagsAll: prevent adding filter twice
  • glos.cleanup: reset path list to avoid (non-critical) error if called again
  • Minor improvements in Glossary.init()
  • DataEntry.save: on FileNotFoundError show a 1-line error instead of log.exception
  • ui_gtk: create a new Glossary object every time Convert button is clicked
  • Add docstring for Glossary.init

Unit testing

  • Update tests/glossary_errors_test.py
  • Add missing cleanup for some temp file
  • add test for LDF to/from Tabfile

Refactoring

  • Plugins: replace import of formats_common from current directory with pyglossary.plugins.formats_common

  • Fix logging.warn method is deprecated, use warning instead, PR #360 by @BoboTiG

  • Fix DeprecationWarning: invalid escape sequence, PR #361 by @BoboTiG

  • Move some functions from glossary_utils.py to compression.py

  • Move some methods from Glossary to new parent classes PluginManager and GlossaryInfo

  • Some refactoring in plugin_prop.py and plugin_manager.py

    • Rename plugin.pluginModule to plugin.module
    • Minimize direct access to plugin.module, plugin.readerClass or plugin.writerClass
    • Add some new properties to PluginProp
    • Remove a log from glossary.py
    • Disable validation of plugins unless in debug mode
    • plugin_prop.py: fix checking debug level
  • sq_entry_list.py: rename sortColumns to sqliteSortKey

  • Some refactoring around setSortKey between Glossary, EntryList and SqEntryList

  • Remove Entry.sqliteSortKeyFrom and related classmethods

  • Some more simplification in glossary.py

  • Remove Entry.defaultSortKey

  • Some style fixes

  • iter_utils.py: remove unused key= argument from unique_everseen

  • Refactor ui_gtk and update config comments

  • extractInlineHtmlImages: avoid writing file within sub func

PyGlossary 4.4.1

25 Jan 10:22
663748c

Changes since 4.4.0

Bug fixes

  • Automatically create cacheDir on Glossary.init()
    • Fixes exception in SQLite mode

Features

  • ui_cmd_interactive: support setting sortKey

Improvements and documentation

  • Wiktionary Dump: remove detect-by-extension
  • glossary.py: update docstrings for sortKeyName
  • sort_keys.py: add desc to NamedSortKey
  • Update doc/sort-key.md

PyGlossary 4.4.0

24 Jan 17:39
cfd61e8

Changes since 4.3.0

Breaking changes

  • Remove partial sorting support (obsolete feature)

    • Remove --sort-cache-size flag in command line
    • (For library users) Remove sortCacheSize argument to glos.write and glos.convert
  • Re-design sorting and sortKey parameters

    • Breaking change for library users, and user plugins that need sorting (sortOnWrite = ALWAYS)

    • Change glos.convert

      • Replace argument sortKey (Callable) with sortKeyName (str)
      • Add argument sortEncoding (str) defaulting to utf-8
    • Change glos.write

      • Replace argument sortKey (Callable) with namedSortKey (sort_keys.NamedSortKey)
      • Add argument sortEncoding (str) defaulting to utf-8
    • Change glos.sortWords

      • Replace argument key (Callable) with sortKeyName (str)
      • Add argument sortEncoding (str) defaulting to utf-8
    • Change API of plugins that use sortOnWrite = ALWAYS

      • Replace writer.sortKey and Writer.sqliteSortKey with sortKeyName in plugin module.
      • See the stardict.py for example.

    Note 1: All sortKey and sortEncoding arguments are optional.

    Note 2: Values of sortKeyName are documented in doc/sort-key.md

  • Rename 2 files in doc/:

    • Rename doc/entry_filters.md to doc/entry-filters.md
    • Rename doc/term_colors.md to doc/term-colors.md
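
The idea behind the sorting re-design above (pass a sort key by name plus an encoding, instead of passing a Callable) can be sketched with a small registry. Everything below is hypothetical: the class, its fields, and the single registered key are illustrations and not the contents of sort_keys.py:

```python
from typing import Callable, NamedTuple


class NamedSortKeySketch(NamedTuple):
    """Hypothetical; the field names are assumptions."""

    name: str
    make_key: Callable[[str], Callable[[str], bytes]]


def _headword_lower(encoding: str) -> Callable[[str], bytes]:
    # build the actual key function for the requested encoding
    return lambda word: word.lower().encode(encoding, errors="replace")


SORT_KEYS = {
    "headword_lower": NamedSortKeySketch("headword_lower", _headword_lower),
}


def sort_words(words, sortKeyName="headword_lower", sortEncoding="utf-8"):
    # look the key up by name, then build it for the given encoding
    key = SORT_KEYS[sortKeyName].make_key(sortEncoding)
    return sorted(words, key=key)
```

A name is easy to expose as a command-line flag or config value, which a Callable argument is not.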

Features

  • --sort-key and --sort-encoding command line flags (as part of above re-design)

  • Now SQLite mode works for all output formats.

Bug fixes

  • Fix lack of Progress Bar while writing in indirect or SQLite mode
  • Fix misleading message log about SQLite mode
  • Fix unclosed files in XDXF and FreeDict plugins

Improvements

  • Show a 1-line log instead of FileNotFoundError traceback in glos.read and glos.write
  • Close readers in glos.convert if write failed
  • Fix some type annotations and comments
  • (For library users) Change Glossary.__str__
  • (For library users) glos.setInfo: convert non-str value to str, and add tests

Unit testing

Add new tests and improve existing tests.

  • Coverage of glossary.py: 89%
  • Overall coverage of codebase + plugins: 58%

Refactoring and design improvements

  • Simplify by passing glos object to EntryList()
  • Replace SqList with SqEntryList
  • Change __iter__ of SqEntryList and EntryList to give entry objects
  • Simplify Glossary by moving gc.collect to EntryList and SqEntryList
  • Remove unused function xml_unescape
  • Remove unused import from FreeDict and JMDict plugins
  • Use operator.itemgetter in stardict.py, dict_cc.py, ebook_kobo.py, reverse.py
  • glossary.py: cleanup, simplify and optimize generators logic
    • Also remove index argument from entryFilter.run method and add some comments
  • Remove redundant check in glos.progress
  • Remove redundant check in _getLangByStr
  • Remove redundant check in Glossary.detectOutputFormat

PyGlossary 4.3.0

15 Jan 12:18
cf4db2b

Changes since 4.2.1

Bug fixes

  • Tabfile writer: fix replacing \ with \\
  • --remove-html flag: fix bad regex
  • ui_cmd_interactive: fix a few bugs
  • Lowercase word/entry links (<a href="bword://...) when --lower flag is passed
  • TextGlossaryWriter: do not skip words that start with #
  • Fix StdLogHandler: was not applying --no-color
  • Fix checking for sys.frozen

New features

  • Add auto_sqlite config parameter

    • to use SQLite mode for StarDict and EPUB-2 (which require sorting) by default
    • also allow overriding it with --no-sqlite flag
  • Add 3 config parameters to allow changing log colors in terminal:

    • color.cmd.critical
    • color.cmd.error
    • color.cmd.warning
  • Add 2 keys to config to enable/disable colors in Unix and Windows separately

    • color.enable.cmd.unix: default true
    • color.enable.cmd.windows: default false

New features for library users

  • Allow glos.setInfo(key, None) to delete the info / metadata key

  • Add glos.alts property as shortcut, and use it internally

Design improvements

Change rawEntry[0] from bytes to List[str] and avoid split/join when converting rawEntry <-> entry.
This fixes some rare edge cases involving | in words, but uses more RAM in indirect mode (converting to StarDict), which can be solved with --sqlite.
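
A short illustration of why joining alternates with | and splitting them again is fragile, and why a list avoids the problem. The tuple layout of the raw entry below is simplified for illustration:

```python
words = ["a|b", "c"]  # the first headword itself contains "|"

# Joining with "|" and splitting again loses the original boundaries
joined = "|".join(words)
round_trip = joined.split("|")

# Keeping the alternates as a list needs no separator, so nothing is lost
raw_entry = (list(words), "definition")
```

Here `round_trip` has three items instead of two, while `raw_entry[0]` still holds the two original words.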

Documentation

Unit testing

Coverage of glossary.py: 75%

There are 2501 lines of test code in tests directory.

Tests for Glossary class include:

  • Basic functionality
  • Error handling
  • Sorting and direct / indirect / SQLite modes
  • Entry filter config/flags (lower, rtl, remove_html, remove_html_all)
  • Resources / data entries
  • Convert: Tabfile <-> Aard2 slob
  • Convert: Tabfile <-> CSV
  • Convert: Tabfile -> EPUB-2
  • Convert: Tabfile -> JSON
  • Convert: Tabfile <-> StarDict

Other improvements:

  • glossary_test.py: check CRC32 of downloaded test files
  • glossary_test.py: use a new temp dir for each test method for isolation.
  • ebook_kobo_test.py: split into several test methods

Improvements

  • Zim: make improvements, #352
  • Aard2 slob: add 2 mime types, #352
  • ui/main.py: do not allow --remove-html and --remove-html-all together
  • Glossary: do not allow glos.config to be set twice
  • Glossary: change some error logs to critical, and more improvements
  • Prevent conflicting config flags together, like --lower --no-lower
  • Disable utf8_check config parameter by default (not needed since 3.0.0)

Refactoring and cleanup

  • Glossary: some refactoring in convert method
  • Rename 3 scripts in scripts/ directory
  • Remove DataEntry.fromFile and improve behavior of DataEntry.__init__
  • Refactoring in ui/
  • rename option.cmdFlag to option.customFlag
  • Glossary: add glos.rawEntryCompress property, and use in entry.py
  • Glossary: minor improvement in loadPlugins
  • XDXF: remove useless argument in Reader.open
  • remove some unused functions from text_utils.py
  • plugin_prop.py: refactor getExtraOptions
  • Avoid assigning protected attrs in text_writer.py and plugins/tabfile.py
  • Fewer protected attr access in entry_filters.py
  • Move sortKey and get_prefix implementations from ebook_base.py to epub and mobi plugins
  • Change name of 2 entry filters to match the config param

PyGlossary 4.2.1

26 Dec 20:01
c0d0eef

Changes since version 4.2.0

Minor bug fixes and improvements:

  • text_utils.py

    • Minor bug: fix legacy function urlToPath using urllib.parse.unquote
    • Minor bug: replacePostSpaceChar: remove trailing space from the output str
    • Cleanup:
      • Remove unused function isControlChar
      • Remove unused function formatByteStr
      • Remove argument exclude from function isASCII
    • Add unit tests
  • ui_cmd_interactive.py: fix a minor bug and some small refactoring

  • Command line: Override input glossary info with --source-lang and --target-lang flags

  • Add unit tests for CSV -> Tabfile conversion

  • CSV plugin: some refactoring, and rename the module to csv_plugin.py

  • Update setup.py: add python_requires=">=3.7.0", update extras_require

  • Update README.md

Features:

  • Command line: Add --name flag for changing glossary name
  • Glossary: convert: add infoOverride optional argument

PyGlossary 4.2.0

20 Dec 08:30
1b1450c

Changes since 4.1.0

  • Breaking changes:

    • Replace glos.getAuthor() with glos.author
      • This looks for "author" and then "publisher" keys in info/metadata
    • Rename option apply_css to css for mobi and epub2
    • glos.getInfo and glos.setInfo only accept str as key (or a subclass of str)
  • Bug fixes:

    • Indirect mode: Fix handling '|' character in words.

      • Escape/unescape | in words when converting entry <-> rawEntry
    • Escape/unescape | in words when writing/reading text-based file formats

    • JSON: Prevent duplicate keys in json output, #344

      • Add new method glos.preventDuplicateWords()
  • Features and improvements

    • Add SQLite mode with --sqlite flag for converting to StarDict.

      • Eliminates the need to load all entries into RAM, limiting RAM usage.
      • You can add --sqlite to your command, even for running GUI.
        • For example: python3 main.py --tk --sqlite
      • See README.md for more details.
    • Add --source-lang and --target-lang flags

    • XDXF: support more tags and improvements

    • Add unit tests for Glossary class, and some functions in text_utils.py

    • Windows: change cache directory to %LOCALAPPDATA%

    • Some refactoring and optimization

    • Update, improve and re-format documentations

PyGlossary 4.1.0

01 Dec 15:44
11b1710

There are a lot of changes since last release, but here is what I could gather and organize!
Please see the commit list for more!

  • Improvements in ui_gtk

  • Improvements in ui_tk

  • Improvements in ui_cmd_interactive

  • Refactoring and improvements in ui-related codebase

  • Fix not loading config with --ui=none

  • Code style fixes and cleanup

  • Documentation

    • Update most documentations.
    • Add comments for read/write options.
    • Generate documentation for all formats
      • Placed in doc/p, linked to in README.md
      • Generating with scripts/plugin-doc-gen.py script
      • Read list of dictionary tools/applications from TOML files in plugins-meta/tools
  • Add Dockerfile and run-with-docker.sh script

  • New command-line flags:

    • --json-read-options and --json-write-options
      • To allow using ; in option values
      • Example: '--json-write-options={"delimiter": ";"}'
    • --gtk, --tk and --cmd as shortcut for --ui=gtk etc
    • --rtl to change direction of definitions, #268, also added to config.json
  • Fix non-working --remove-html flag

  • Changes in Glossary class

    • Rename glos.getPref to glos.getConfig
    • Change formatsReadOptions and formatsWriteOptions to Dict[str, OrderedDict[str, Any]]
      • to include default values
    • remove glos.writeTabfile, replace with a func in pyglossary/text_writer.py
    • Glossary.init: avoid showing error if user plugin directory does not exist
  • Fixes and improvements code base

    • Prevent dataEntry.save() from raising exception because of invalid filename or permission
    • Avoid exception if removing temp file/folder failed
    • Avoid mktemp and more improvements
      • use ~/.cache/pyglossary/ directory instead of /tmp/
    • Fixes and improvements in runDictzip
    • Raise RuntimeError instead of StopIteration when iterating over a non-open reader
    • Avoid exception if no zip command was found, fix #294
    • Remove directory after creating .zip, and some refactoring, #294
    • DataEntry: replace inTmp argument with tmpPath argument
    • Entry: fix html pattern for hyperlinks, #330
    • Fix incorrect virtual env directory detection
    • Refactor dataDir detection, #307 #316
    • Show warning if failed to create user plugins directory
    • fix possible exception in log.emit
    • Add support for Conda in dataDir detection, #321
    • Fix f-string in StdLogHandler.emit
  • Fixes and improvements in Windows

    • Fix bad dataDir on Windows, #307
    • Fix shutil.rmtree exception on Windows
    • Support creating .zip on Windows 10, #294
    • Check zip command before tar on Windows, #294
    • Show graphical error on exceptions on Windows
    • Fix dataDir detection on Windows, #323 #324
  • Changes in Config:

    • Rename config key skipResources to skip_resources
      • Add it to config.json and configDefDict
    • Rename config key utf8Check to utf8_check
      • User should edit ~/.pyglossary/config.json manually
  • Implement direct compression and uncompression, and some refactoring

    • change glos.detectInputFormat to return (filename, format, compression) or None
    • remove Glossary.formatsReadFileObj and Glossary.formatsWriteFileObj
    • remove fileObj= argument from glos.writeTxt
    • use optional 'compressions' list/tuple from Writer or Reader classes for direct compression/uncompression
    • refactoring in glossary_utils.py
  • Update setup.py

  • Show version from 'git describe --always' on --version

  • FileSize option (used in many formats):

    • Switch to metric (powers of 1000) for K, M, G units
    • Add KiB, MiB, GiB for powers of 1024
  • Add extensionCreate variable (str) to plugins and plugin API

    • Use it to improve ui_tk
  • Text-based glossary code-base (affecting Tabfile, Kobo Dictfile, LDF)

    • Optimize TextGlossaryReader
    • Change multi-file text glossary file names from .N.txt to .txt.N (where N>=1)
    • Enable reading pyglossary-written multi-file text glossary by adding file_count=-1 to metadata
      • because the number of files is not known when creating the first txt file
  • Tabfile

    • Rename option writeInfo to enable_info
    • Reader: read resource files from *.txt_res directory if exists
    • Add *.txt_res directory to *.zip file
  • Zim Reader:

    • Migrate to libzim 1.0
    • Add mimetype image/webp, fix #329
  • Slob and Tabfile Writer: add file_size_approx option to allow writing multi-part output

    • support values like: 5500k, 100m, 1.2g
  • Add word_title=False option to some writers

    • Slob Writer: add word_title=False option
    • Tabfile Writer: add word_title=False option
    • CSV Writer: add word_title=False option
    • JSON Writer: add word_title=False option
    • Dict.cc Reader: do not add word title
    • FreeDict Reader: rename keywords_header option to word_title
    • Add glos.wordTitleStr, used in plugins with word_title option
    • Add definition_has_headwords=True info key to avoid adding the title next time we read the glossary
  • Aard2 (slob)

    • Writer: add option separate_alternates=False, #270
    • Writer: fix handling content_type option
    • Writer: use ~/.cache/pyglossary/ instead of /tmp
    • Writer: add mp3 to mime types, #289
    • Writer: add support for .ini data file, #289
    • Writer: support .webp files, #329
    • Writer: support .tiff and .tif files
    • Reader: read glossary name/title and creation time from tags
    • Reader: extract all metadata / tags
    • slob.py library: Refactoring and cleanup
  • StarDict:

    • Reader: add option unicode_errors for invalid UTF-8 data, #309
    • Writer: add bool write-option audio_goldendict, #327
    • Writer: add option audio_icon=True, and add option comment, #327
  • FreeDict Reader

    • Fix two slashes before and after pron
    • Avoid running unescape_unicode by encoding="utf-8" arg to ET.htmlfile
    • Fix exception if edition is missing in header, and few other fixes
    • Support <cit type="example"> with <cit type="trans"> inside it
    • Support <cit type="trans"> inside nested second-level(nested) <sense>
    • Add "lang" attribute to html elements
    • Add option "example_padding"
    • Fix rendering <def>, refactoring and improvement
    • Handle <note> inside <sense>
    • Support <note> in <gramGrp>
    • Mark external refs with <a ... class="external">
    • Support comment in <cit>
    • Support <xr> inside <sense>
    • Implement many tags under <sense>
    • Improvements and refactoring
  • XDXF

    • Fix not finding xdxf.xsl in installed mode

      • Affecting XDXF and StarDict formats
    • xdxf.xsl: generate <font color=...> instead of <span style=...>

    • StarDict Reader: Add xdxf_to_html=True option, #258

    • StarDict Reader: Import xdxf_transform lazily

      • Remove forced dependency to lxml, #261
    • XDXF plugin: fix glos.setDefaultDefiFormat call

    • xdxf_transform.py: remove warnings for , #322
    • Merge PR #317
      • Parse sr, gr, ex_orig, ex_transl tags and audio
      • Remove None attribute from audio tag
      • Use unicode symbols for audio and external link
      • Use another speaker symbol for audio
      • Add audio controls
      • Use plain link without an audio tag
  • Mobi

    • Update ebook_mobi.py and README.md, #299
    • Add PR #335 with some modifications
  • Changes in ebook_base.py (Mobi and EPUB)

    • Avoid exception if removing tmpDir failed
    • Use style.css dataEntry, #299
  • DSL Reader:

    • Strip whitespaces around language names, #264
    • Add progressbar support, #264
    • Run html.escape on text before adding html tags, #265
    • Strip and unquote glossary name
    • Generate <i> and <font color=...> instead of <span style=...>
    • Avoid adding html comment
    • Remove \ufeff from header lines, #306
  • AppleDict Source

    • Change path of Dictionary Development Kit, #300
    • Open all text files with encoding="utf-8"
    • Some refactoring
    • Rename 4 options:
      • cleanHTML -> clean_html
      • defaultPrefs -> default_prefs
      • prefsHTML -> prefs_html
      • frontBackMatter -> front_back_matter
  • AppleDict Binary

    • Improvements, #299
    • Read DefaultStyle.css file, add as style.css, #299
    • Change default value of option: html=True
  • Octopus MDict (MDX)

    • Fix image links
    • Do not set empty title
    • Minor improvement in readmdict.py
    • Handle exception when reading from a corrupt MDD file
    • Add bool flag same_dir_data_files, #289
    • Add read-option: audio=True (default: False), #327
    • audio: remove extra attrs and add comments
  • DICT.org plugin:

    • installToDictd: skip if target directory does not exist
    • Make rendering of dictd files a bit clearer in plain text
    • Fix indentation issue and add bword prefix as URL
  • Fixes and improvements in Dict.cc (SQLite3) plugin:

    • Fix typo, and avoid iterating over cur, use fetchall(), #296
    • Remove gender from headword, add it to definition, #296
    • Avoid running unescape_unicode
  • JMDict

    • Support reading compressed file directly
    • Show pos before gloss (translations)
    • Avoid running unescape_unicode
  • DigitalNK: work around Python's sqlite bug, #282

  • Changes in dict_org.py plugin, By Justin Yang

    • Use <br /> to replace newline
    • Replace words wrapped in {} with real web links
  • CC-CEDICT Reader:

    • Fix import error in conv.py
    • Switch from jinja2 to lxml
      • Fix not escaping <, > and &
      • Note: lxml inserts &#160; instead of &nbsp;
    • Use <font> instead of <span style=...>
    • Add option to use Traditional Chinese for entry name
    • Avoid colorizing if tones count does not match len(syllables), #328
    • Add <font color=""> for each syllable in case of mismatch tones, #328
  • Rename read/write options:

    • DSL: rename option onlyFixMarkUp to only_fix_markup
    • SQL: r...

PyGlossary 4.0.0

24 Oct 11:45
047b747

Changes since 3.3.0

  • Require Python 3.7 or 3.8, drop support for Python 3.4, 3.5 and 3.6

  • Fix / rewrite setup.py

    • Fix python3 setup.py sdist bdist_wheel, and PyPI package
      • Had to move ui/ directory into pyglossary/
    • Switch from distutils to setuptools
    • Remove py2exe
  • Add interactive command line user interface

    • Automatically selected if input & output file arguments are not passed and one of these:
      • On Linux and $DISPLAY is not set
      • On Mac and no tkinter module is found
      • --ui=cmd flag is passed
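The selection rule above can be sketched as a small pure function. This is an illustration under stated assumptions, not PyGlossary's actual code: the function name and all parameters are hypothetical, and the platform strings follow the `sys.platform` convention (`"linux"`, `"darwin"`).

```python
def use_interactive_cmd(file_args, ui_flag, platform, display, has_tkinter):
    """Decide whether the interactive command line UI should be selected.

    Hypothetical helper mirroring the conditions listed above:
    - file_args: input/output file arguments given on the command line
    - ui_flag: value of --ui, or None if not passed
    - platform: a sys.platform-style string
    - display: value of $DISPLAY, or None/"" if unset
    - has_tkinter: whether the tkinter module could be imported
    """
    if file_args:
        # Input and output files were passed, so no interactive prompt.
        return False
    return (
        ui_flag == "cmd"
        or (platform.startswith("linux") and not display)
        or (platform == "darwin" and not has_tkinter)
    )
```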
  • New format support:

    • Add read support for FreeDict, #206
    • Add read support for Zim (Kiwix)
    • Add read and write support for Kobo E-Reader Dictfile (.df)
    • Add write support for DICT.org dictfmt source file
    • Add read support for dictunformat output file
    • Add write support for JSON
    • Add read support for Dict.cc (SQLite3)
    • Add read support for JMDict, #239
    • Add basic read support for Wiktionary Dump (.xml)
    • Add read support for cc-kedict
    • Add read support for DigitalNK (SQLite3)
    • Add read support for Wordset.org JSON directory
  • Remove Omnidic write support (Unmaintained J2ME dictionary)

  • Remove Octopus MDict Source plugin

  • Remove Babylon Source plugin

  • BGL Reader: improvements

  • DictionaryForMIDs Writer: fix non-working code

  • Gettext Source (po) Writer: fix info header

  • MOBI E-Book Writer: fix sort order, fix and test kindlegen codes, add kindlegen_path option, #112

  • EPUB-2 E-Book Writer: fix sort order

  • XDXF Reader: rewrite with etree.iterparse to avoid using too much RAM

  • Lingoes Source (LDF) Reader: fix ignoring info/metadata header

  • dict_org.py: rewrite broken plugin (Reader and Writer)

  • DSL Reader: fix losing metadata/info

  • Aard 2 (slob) Reader:

    • Fix adding css/js files as normal entries
    • Add bword:// prefix to entry links
    • Fix duplicate entries issue by keeping a set of blob IDs, #224
    • Detect and pass defiFormat
  • Aard 2 (slob) Writer:

    • Fix content_type detection
    • Remove bword:// prefix from entry links
    • Add resource files / data entries, #243
    • Fix replacing image paths
    • Show log events from slob.py in debug mode
    • Change default compression to zlib
    • Allow passing empty compression
  • Octopus MDict Reader:

    • Read MDX file twice to load links
    • Count data entries as part of len(reader) for progressbar
  • StarDict Writer:

    • Copy "copyright" and "publisher" values to "description"
    • Add source and target language codes to the end of bookname
    • Add write-option stardict_client: bool
      Set True to make glossary more compatible with StarDict 3.x
    • Fix broken result when sametypesequence option is given and a definition contains |
    • Allow sametypesequence=x for xdxf
    • Add merge_syns option
    • Allow sametypesequence=None option
  • XDXF Reader:

    • Fix/improve xdxf to html transformation
  • Kobo Writer:

    • Fix get_prefix algorithm and sorting order, with tests, #219
    • Replace <img src=... tags with [Image: name.bmp], #219
      • and show a warning about data entries
    • Additional keywords as alternatives, #232
    • Fix support for alternates: duplicate entries based on word prefix, #238
    • Show headword in title of alternate entries, #238, #245
    • Strip full html definition, #246
  • CSV:

    • Add delimiter option to Reader and Writer
    • Read and write info
    • Writer: accept bool option add_defi_format=True (default False)
  • AppleDict Writer:

    • AppleDict Writer: replace fix_sound_link() code with a single line
    • AppleDict Writer should not call glos.setDefaultDefiFormat
  • MDX Reader:

    • Replace entry:// with bword:// in MDX Reader instead of AppleDict Writer
    • Fix internal href="x:" and href="d:" links
    • Fix file:// in images path, fix #243
  • User Interface improvements and fixes:

    • ui_gtk: add About tab and more improvements
    • ui_tk: replace About dialog with About tab and more improvements
    • ui_cmd: improvements in progressbar
    • ui_cmd: allow "=" in value of read/write options
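Allowing "=" inside an option value comes down to splitting on the first "=" only. A minimal sketch follows, assuming options arrive as `key=value` strings; `parse_option` is a hypothetical name and this is not necessarily how ui_cmd implements it.

```python
def parse_option(arg):
    """Split 'key=value' on the first '=' so the value may itself contain '='.

    Hypothetical helper for illustration only.
    """
    key, sep, value = arg.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {arg!r}")
    return key, value
```

For example, `parse_option("delimiter==")` yields the key `delimiter` with the value `=`.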
  • Add a list of 208 languages and ~40 writing systems

    • Detect sourceLang and targetLang from glossary name/title
    • Auto-select between <b> and <big> tags depending on writing system
      • Using glos.titleElement method, used in FreeDict, JMDict and Dict.cc writers
    • glos.sourceLang and glos.targetLang properties (with setters) as Lang objects
    • glos.sourceLangName and glos.targetLangName properties (with setters) as str
      • Used in several plugins
  • Break compatibility of plugins

    • Drop support for read and write functions (outside a class)
    • Now we only support Reader class and Writer class
    • Reader class must have these methods
      • __init__(self, glos)
      • open(self, filename)
        • Here glossary info must be read from file and set with glos.setInfo
      • __len__(self) -> int
        • Should return the number of entries, or zero if it's too costly
      • __iter__(self) -> "Iterator[BaseEntry]"
        • Can be a generator
      • close(self)
    • Writer class must have these methods
      • __init__(self, glos)

      • open(self, filename)

        • Here glossary info must be read from glos.getInfo or glos.iterInfo and written to file
      • write(self) -> "Generator[None, BaseEntry, None]"

        • Entries must be fetched with entry = yield in a while True loop:

           while True:
           	entry = yield
           	if entry is None:
           		break
           	# process and write entry into file(s)
      • finish(self)

    • Read options and write options must be set to their default values as class attributes
      • See pyglossary/plugins/csv_pyg.py plugin for example
    • sortKey must be an instance method of Writer, instead of a function outside any class
      • Only for plugins that need sorting before write
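To make the Writer protocol above concrete, here is a self-contained sketch of a writer whose `write()` is a generator, plus a driver that feeds it with `send()`. Everything outside the method names listed above is an assumption: `FakeEntry`, the `separator` option and the `drive` function are hypothetical stand-ins, and the real caller inside PyGlossary may differ.

```python
import os
import tempfile
from collections import namedtuple

# Stand-in for PyGlossary entry objects; only .word and .defi are used here.
FakeEntry = namedtuple("FakeEntry", ["word", "defi"])


class Writer:
    # Write options live on the class with their default values.
    # "separator" is a hypothetical option used for illustration.
    separator = "\t"

    def __init__(self, glos):
        self._glos = glos
        self._file = None

    def open(self, filename):
        # A real plugin would also write glossary info to the file here.
        self._file = open(filename, "w", encoding="utf-8")

    def write(self):
        # Entries are received through `entry = yield`; None ends the stream.
        while True:
            entry = yield
            if entry is None:
                break
            self._file.write(f"{entry.word}{self.separator}{entry.defi}\n")

    def finish(self):
        if self._file is not None:
            self._file.close()
            self._file = None


def drive(writer, entries):
    """Hypothetical driver: push entries into the writer's generator."""
    gen = writer.write()
    next(gen)  # advance to the first `entry = yield`
    for entry in entries:
        gen.send(entry)
    try:
        gen.send(None)  # signal that there are no more entries
    except StopIteration:
        pass
    writer.finish()
```

The generator form lets the writer keep local state across entries while the caller stays in control of iteration.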
  • Refactor and cleanup Glossary class

    • Removed or replaced most of class/static attributes of Glossary
      • To see the diff, run git diff 3.3.0..master -- pyglossary/glossary.py
    • Removed glos.addEntry method
      • If you use it in your program, replace with glos.addEntryObj(glos.newEntry(word, defi, defiFormat))
    • Removed instance methods:
      • getMostUsedDefiFormats
      • iterEntryBuckets
      • zipOutDir and archiveOutDir
        • Moved to pyglossary/glossary_utils.py
        • archiveOutDir renamed to compressOutDir
      • writeDict
      • iterSqlLines -> moved to pyglossary/plugins/sql.py
      • reverse, takeOutputWords, searchWordInDef -> moved to pyglossary/reverse.py
    • Values of Glossary.plugins are changed to plugin_prop.PluginProp instances
    • Change glos.writeTxt arguments
      • Replace sep1 and sep2 with entryFmt
      • Replace rplList with defiEscapeFunc, wordEscapeFunc and tail
      • Remove iterEntries, entryFilterFunc
      • Method returns Generator[None, BaseEntry, None] instead of bool
      • For usage examples, see:
        • pyglossary/glossary.py -> def writeTabfile
        • pyglossary/plugins/dict_org_source.py
        • pyglossary/plugins/json_plugin.py
        • pyglossary/plugins/lingoes_ldf.py
        • pyglossary/plugins/sdict_source.py
  • Refactor, cleanup and fixes in Entry and DataEntry classes

    • Replace entry.getWord() with entry.word
    • Replace entry.getWords() with entry.l_word
    • Replace entry.getDefi() with entry.defi
    • Remove entry.getDefis()
      • Drop handling alternate definitions in Entry objects
    • Replace entry.getDefiFormat() with entry.defiFormat
    • Add entry.b_word and entry.b_defi shortcuts that give bytes (UTF-8)
    • Replace dataEntry.getData() with dataEntry.data
    • Add __slots__ to Entry and DataEntry classes
    • Fix DataEntry in indirect mode
      • Mistaken for Entry with defi=DATA, and file content discarded
      • Save resource files in user's cache directory when loading input glossary into memory
        • Move file to output glossary on dataEntry.save(...)
    • Fix Entry.getRawEntrySortKey not being alternates-aware, broke StarDict Writer
    • DataEntry: save: use shutil.copy if has _tmpPath, and set _tmpPath
  • New features of Entry

    • entry.stripFullHtml(), remove <html... <head>...</head>...<body>
      • Used in Kobo and Kobo Dictfile writers
      • Add tests
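The idea behind stripping a full HTML document down to its body can be sketched with a regular expression. This is only an illustration and not the implementation of `entry.stripFullHtml()`; the function name `strip_full_html` and the regex-based approach are assumptions, and real-world HTML may need a proper parser.

```python
import re

# Matches the content between <body ...> and </body>, case-insensitively.
_BODY_RE = re.compile(r"<body[^<>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def strip_full_html(defi):
    """Return only the body content of a full HTML document.

    Hypothetical sketch: definitions without a <body> element are
    returned unchanged.
    """
    match = _BODY_RE.search(defi)
    if match is None:
        return defi
    return match.group(1).strip()
```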
  • Fix glos.writeTabfile:

    • Remove \r from definitions and info values
    • Fix not escaping word
  • Fix/improve html detection in definitions

  • Switch to lazy imports of non-standard modules in plugins

  • Optimize RAM usage of indirect conversion

    • To write StarDict, EPUB and DictionaryForMIDs glossaries, we need to load all entries into RAM to sort them
  • Other new features of Glossary class

    • glos.getAuthor() to get "author", or "publisher" (as fallback)
    • glos.removeHtmlTagsAll() method, can be called by plugins' writer
    • glos.collectDefiFormat(maxCount) extract defiFormat counts
      • by reading first maxCount entries. (then iterator will be reset)
      • Used in StarDict Writer
    • Show memory usage in trace mode
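Counting definition formats over the first `maxCount` entries, as `glos.collectDefiFormat(maxCount)` is described above, could look roughly like the sketch below. The function name, the plain-dict return value and the use of `SimpleNamespace` objects as entries are all assumptions; the real method's return type and its iterator reset are not reproduced here.

```python
from collections import Counter
from itertools import islice
from types import SimpleNamespace


def collect_defi_format(entries, max_count):
    """Count defiFormat values among the first max_count entries.

    Hypothetical sketch: `entries` is any iterable of objects that
    have a `defiFormat` attribute.
    """
    counts = Counter(entry.defiFormat for entry in islice(entries, max_count))
    return dict(counts)


# Usage with stand-in entries ("h" for HTML, "m" for plain text, "x" for XDXF).
sample = [SimpleNamespace(defiFormat=fmt) for fmt in ["h", "h", "m", "x"]]
first_three = collect_defi_format(sample, 3)
```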
  • Bug fixes and improvements in code base

    • Apply entry filter when iterating over reader, fix #251

      • Fixes wrong sort order for some glossaries (converting to StarDict or other formats that need sort)
    • Fixes and improvements in TextGlossaryReader class

      • Fix ignoring glossary defaultDefiFormat
    • Fix evaluating None value in read/write options

  • Support reading multi-file Tabfile or other text formats

    • Example: file.txt, file.txt.1, file.txt.2
    • Need to add file_count info key, for example: ##file_count 3
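The naming scheme above implies that a reader can derive every part's path from the first file once it knows `file_count`. A minimal sketch, assuming info lines start with `##` and that key and value are separated by whitespace; both function names are hypothetical and this is not PyGlossary's reader code.

```python
def file_count_from_header(lines):
    """Return the file_count value from leading '##' info lines, default 1."""
    for line in lines:
        if not line.startswith("##"):
            break  # info header ended, entries begin
        fields = line[2:].split(None, 1)
        if len(fields) == 2 and fields[0] == "file_count":
            return int(fields[1])
    return 1


def part_paths(first_path, file_count):
    """Build the list of part paths: file.txt, file.txt.1, file.txt.2, ..."""
    return [first_path] + [f"{first_path}.{i}" for i in range(1, file_count)]
```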
  • Fixes in Tabfile Writer

    • Fix not escaping ""
  • Add/update docume...
