10 Mar 12:48

ilius

5f6ebd6

PyGlossary 4.6.1 Latest

Changes since `4.6.0`

Bug fixes

Fix a bug causing broken installation if ~/.local/lib is a symbolic link
- or site-packages or any of its parents are a symbolic link
Fix incompatibility with Python 3.9 (despite documentation)
Fix scripts/entry-filters-doc.py, scripts/plugin-doc.py and doc/entry-filters.md
AppleDict: Fix typos in Chinese language module

Features:

Use environment variable VERBOSITY as default (a number from 0 to 5)
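
A minimal sketch of how an environment variable could supply a default verbosity in the 0 to 5 range. The helper name `default_verbosity`, the fallback value of 3 and the clamping of out-of-range numbers are assumptions for illustration, not PyGlossary's actual code.

```python
import os


def default_verbosity(environ=None, fallback=3):
    """Return a verbosity level from 0 to 5 read from VERBOSITY.

    Hypothetical helper: name, fallback and clamping are assumptions.
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("VERBOSITY", "")
    try:
        level = int(raw)
    except ValueError:
        # Missing or non-numeric value: keep the built-in default.
        return fallback
    # Clamp out-of-range numbers into the documented 0..5 range.
    return max(0, min(5, level))
```

Because the variable only sets the default, an explicit verbosity flag on the command line would presumably still take precedence.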

Improvements

AppleDict Binary: set html_full=True by default
Update wcwidth to 0.2.6

Refactoring

Add glos.stripFullHtml(errorHandler) and use it in 3 plugins
- Add entry filter StripFullHtml and change entry.stripFullHtml() to return error
Refactor entryFiltersRules
Remove empty plugin gettext_mo.py
Remove glos.titleElement from glossary_v2.Glossary
- Add to glossary.Glossary for compatibility
- glossary.Glossary is a wrapper (child class) on top of glossary_v2.Glossary

Documentation

Update doc/entry-filters.md to list some entry filters that were enabled conditionally (besides config)
Remove sdict.md and sdict_source.md (removed plugins)

Type checking

Add missing method in GlossaryType class
Fix mypy errors on most of code base and some of plugins
Use builtin types list, dict, tuple, set for type annotations
Replace Optional[X] with X or None
- will not affect runtime, but type checking now only works with Python 3.10+

07 Mar 11:27

ilius

4.6.0

4b7ae78

PyGlossary 4.6.0

Changes since `4.5.0`

Dependency change

We now require Python 3.9 or a later version.

Bug fixes

Fix exception in scripts/plugin-index.py: 8a94b8c
StarDict: Fix writing to .zip file produced empty zip, and fix bad test
dictunformat: fix #367: add option headword_separator, default to ;
Fixes in ui_gtk, #380 #382 #403
AppleDict source: fix #407 missing quotes for title, and refactor duplicate codes
DictionaryForMIDs: remove | from word when normalizing, fix punctuation regex, use Unix newlines
StarDict: use Unix newline when reading and writing .ifo file on Windows
Fix bug of glos.addEntryObj(dataEntry) adding empty file because tmpDataDir is not set until glos.read()
- Set and create tmpDataDir on glos.tmpDataDir access, and add test, #424
Fix scripts/wiki-formats.py, #428
Dictd / Dict.org: fix exception on Windows

Features

Support sorting by an ICU locale, see Sorting section of README
Add Gtk4 interface --ui=gtk4 / --gtk4
- still buggy and not as functional as Gtk3 or Tkinter interfaces
Add flag --optimize-memory, config key optimize_memory
- To enable entry compression on --indirect
- Not enabled by default (it was previously always compressed)
Allow plugin's reader.open() to return an Iterator for progress bar
- Implement for Tabfile (reading info/metadata)
- Implement for AppleDict Binary (reading KeyText.data)
Add read and write support for StarDict Textual File (.xml), #348
Add support for writing Yomichan dictionary files, #395 by @tomtung
StarDict reader: support .syn.dz file, #410
StarDict writer: add write option large_file, #392 #422
StarDict reader: support dxoffsetbits=64 on read, #392 #422
JMDict: support examples, #383
Add read support for JMnedict, #386
Add flag --skip-duplicate-headword, config skip_duplicate_headword, #365
- Zim reader: remove option skip_duplicate_words, #365
Add flag --trim-arabic-diacritics, config trim_arabic_diacritics, #366
Add read support for IUPAC goldbook (.xml), #355
Add write support for DIKT JSON
StarDict writer: limit memory usage by using SQLite for idx and syn data, #409
CSV: add newline option, defaulting to Unix-style
Aard2 Slob writer: add option file_size_approx_check_num_entries
Add scripts/diff-glossary and scripts/view-glossary
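
For the CSV newline option listed above, a small sketch of what Unix-style line endings mean in practice with Python's standard `csv` module. The function `write_rows` is a hypothetical helper, and how the plugin passes its option to `csv.writer` is an assumption.

```python
import csv
import io


def write_rows(rows, newline="\n"):
    """Serialize rows to CSV text using the given line terminator."""
    # newline="" stops StringIO from translating line endings itself.
    buf = io.StringIO(newline="")
    # csv.writer defaults to "\r\n"; pass "\n" for Unix-style output.
    writer = csv.writer(buf, lineterminator=newline)
    writer.writerows(rows)
    return buf.getvalue()
```

Calling `write_rows(rows, newline="\r\n")` restores the `csv` module's own default terminator.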

Improvements

When removing HTML tags, also replace <div> with \n, #394 by @tomtung
- Treat <div> the same way <p> is treated.
Mobi: add mobi7-forcing switch to kindlegen command, #374 by @holyspiritomb
Octopus MDict: ignore directories with same_dir_data_files, #362
StarDict reader: handle definitions with mixed types/formats
Dictfile: strip whitespaces from word and defi before going through entry filters
BGL: strip whitespaces from word and defi before going through entry filters
Improvement in glos.write: avoid printing exception for invalid encoding
Remove empty logs in glos.convert
StarDict reader: fix validating sametypesequence, and add test
glos.convert: Allow an existing empty directory as output path
TextGlossaryReader: replace nextPair method with nextBlock which returns resource files as third item
ui_cmd_interactive: allow converting several times before exiting
Change title tag for Greek from <big> to <b>
Update language data set (langs.json)
ui/main.py: print 1-line error instead of full exception on ImportError
ui/main.py: Windows: try Tkinter before Gtk
ebook_base.py: avoid shutil.move on Windows, #368
TextGlossaryReader: fix loading info and some refactoring, #370 36b9cd8
Entry: Allow word to be tuple in Entry(word=...)
glos.iterInfo() return Iterator rather than Iterable
Zim: change dependency to libzim>=1.0, and some comments
Mobi: work with kindlegen executable in PATH directories, #401
ui: limit the length of option comments in Format Options dialog
ui_gtk: improvement: show (last) critical error on status bar
ui_gtk: set initial focus
ui_gtk: improvements in About tab
ui_tk: revert most ttk widgets to tk because the theme doesn't match
Add SVG icon, #414 by @proletarius101
Prevent exception/traceback on Ctrl+C
Optimize progress bar
Aard2 slob: show info log before and after slobWriter.finalize(), #437
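
The <div> handling mentioned at the top of this list (treating <div> like <p> when removing HTML tags) can be sketched with two regular expressions. This is an illustration only: `strip_html`, the patterns and the newline collapsing are assumptions, and the real entry filter may work differently.

```python
import re

# Assumption: <p> and <div> (opening or closing) become line breaks.
_BLOCK_TAG = re.compile(r"</?(?:p|div)\b[^>]*>", re.IGNORECASE)
# Any other tag is simply dropped.
_ANY_TAG = re.compile(r"<[^>]+>")


def strip_html(text):
    """Remove HTML tags, turning <p> and <div> boundaries into newlines."""
    text = _BLOCK_TAG.sub("\n", text)
    text = _ANY_TAG.sub("", text)
    # Adjacent block boundaries would otherwise leave empty lines.
    text = re.sub(r"\n{2,}", "\n", text)
    return text.strip()
```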

Removed features

Remove read support for Wiktionary Dump, #48
Remove support for Sdictionary Binary and Source

Octopus MDict MDX: features and improvements

Support MDict V3 format by updating readmdict, #385 by @xiaoqiangwang
Fix files created without UUID in header, #387 by @xiaoqiangwang
- MdxBuilder 4.0 RC2 and before creates files without UUID header
Decode mdict title & description if they're bytes, #393 by @tomtung
readmdict: Skip zlib decompress exceptions, #384
readmdict: Use __name__ as logger name, and add 2 debug logs, #384
readmdict: improve exception msg for xxhash, #385

XDXF: fixes / improvements, issue #376

Support <categ>
Support embedded tags in <iref>
Fix ignoring <mrkd>
Fix extra newlines
Get rid of warning for <etm>
Fix/improve newline and space issues
Fix and improve tests
Update url for format description
Support any tag/string in <ex>, #396
Support reading compressed files directly (.xdxf.gz, .xdxf.bz2, .xdxf.lzma)
Allow using XSL using --write-options=xsl=True
Update XSL
Other improvements in XDXF to HTML transformation

AppleDict Binary: features, bug fixes, improvements, refactoring

Fix css name on html_full=True
Fix using self._encoding when should use utf-8
Fix internal links, #343
- Remove x-dictionary:d: prefix from href
- First fix for x-dictionary:r:: use title if present
- Add bword:// prefix to href (unless it points to http/https)
- Read entry IDs on open and fix links with x-dictionary:r:
Add plistlib to dependencies
Add tests
Replace <entry ...> with <div>
Fix bad exception formatting
Fixes from PR #436
Support morphology (alternates): #434 by @soshial
Support different AppleDict offsets, #417 by @soshial
Extract AppleDict meta-info (langs, title, author), #418 by @soshial
Progress Bar on open() / loading KeyText.data
Improve memory usage of loading KeyText.data
Replace appledict_bin.py with appledict_bin directory and more refactoring

Glossary class (`glossary.py`)

Lots of refactoring in glossary.py
- Improve the design and readability
- Reduce complexity of methods
- Move some code into new classes that Glossary inherits from
- Improve error messages
Introduce glossary_v2.py, and maintain API backward-compatibility for glossary.py (as far as documented)
- See README.md for sample code.

Refactoring

Fix style errors using ruff based on pyproject.toml configuration
Remove all usages of pyglossary.plugins.formats_common
Use str.startswith(tuple) and str.endswith(tuple)
Reduce complexity of Glossary methods
Rename entry filter strip to trim_whitespaces
Some refactoring in StarDict reader
Use f-string equal syntax added in Python 3.8
Use str.removeprefix and str.removesuffix added in Python 3.9
langs/writing_system.py:
- Change iso field to list
- Add new scripts
- Add getAllWritingSystemsFromText
- More refactoring
Split up TextGlossaryReader.loadInfo method
plugin_manager.py: make some methods private
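
The three Python idioms named in this list, shown on made-up helpers (the function names and the sample values are illustrative, not taken from the code base):

```python
COMPRESSED = (".gz", ".bz2", ".lzma")


def is_compressed(filename):
    # One call with a tuple replaces a chain of or-ed endswith() checks.
    return filename.endswith(COMPRESSED)


def link_target(href):
    # str.removeprefix (Python 3.9+) returns the string unchanged
    # when the prefix is absent.
    return href.removeprefix("bword://")


def debug_line(count):
    # The f-string "=" specifier (Python 3.8+) prints name and value.
    return f"{count=}"
```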

Documentation

Update plugins' documentation
Glossary: add comments about entryFilters
Update config.rst
Update doc/entry-filters.md
Update README.md
Update doc/sort-key.md
Update doc/pyicu.md
Update plugins/testformat.py
Add types for arguments and result of all functions/methods
Add types for r/w options in reader/writer classes
Fix a few incorrect type annotations
README.md: Add document for adding data entries, #412
README.md: Fix -> nixos command, #400 by @srghma
Update bgl_info.md and move it from pyglossary/plugins/babylon_bgl/ to doc/babylon/

Testing

Add test for DSL -> Tabfile conversion
dsl_test.py: fix method names not starting with test_
StarDict reader: better testing for handling definitions with mixed types
StarDict writer: much better testing, coverage of stardict.py: from 62% to 83%
Refactoring and improvements in tests of Glossary, along with new tests
Add test for dictunformat -> Tabfile
AppleDict (source) tests: validate plist file contents
Allow forking and branching pyglossary-test repo
- See tests/glossary_v2_test.py
Fix some failing tests on Windows
Slob: test file_size_approx
Test Tabfile -> SQL conversion
Test StarDict error/warning for sortKeyName with and without locale
Print useful messages for unhandled warnings
Improve logs
Add showDiff=False arg to compareTextFiles and convert

Packaging

Update and refactor Dockerfile and run-with-docker.sh
- Dockerfile: chan...

Contributors

soshial, tomtung, and 4 other contributors

04 Feb 23:19

ilius

4.5.0

2433ff5

PyGlossary 4.5.0

Changes since 4.4.1

Bug fixes

Fix 2 log messages in glos._resolveConvertSortParams
Fixes and improvements in Dictfile (.df) reader
- Fix exception: disable loading info (Dictfile does not support info)
- TextGlossaryReader: prevent producing duplicate data entries
  - This fixes: error in DataEntry.save: [Errno 2] No such file or directory: ... because entry.save() moves the temp file to output path
  - This bug only existed for Dictfile (.df) format.
- Remove extra colon, #358
- Remove some extra newline
- And add test for Dictfile to/from Tabfile
Fix not cleaning up temp directory on return with error from glos.convert

Features

ui_gtk: add a "General Options" button that opens a dialog for:
- Settings for sort and sortKey
- Checkbox for SQLite mode
- Check boxes for config params: save_info_json, lower, skip_resources, rtl, enable_alts, cleanup, remove_html_all
Add support for --sort-key random to shuffle entries
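
Shuffling through a sort key can be sketched as follows: the key function ignores the entry and returns a random number, so sorting yields a random order. The names `random_sort_key` and `shuffle_entries` and the optional seed are assumptions for illustration; the actual implementation behind --sort-key random may differ.

```python
import random


def random_sort_key(rng=None):
    """Build a sort key that ignores its argument and returns a random float."""
    if rng is None:
        rng = random.Random()

    def key(_entry):
        return rng.random()

    return key


def shuffle_entries(entries, seed=None):
    """Return entries in random order by sorting on a random key."""
    return sorted(entries, key=random_sort_key(random.Random(seed)))
```

Expressing the shuffle as a sort key lets it reuse the same sorting path as every other key, at the cost of O(n log n) work instead of a linear-time shuffle.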

Performance improvements

Performance improvement: remove gc.collect() calls in Glossary and *EntryList
- Not needed since Python 3.8
- Change minimum python requirement to 3.8 in README.md
Do not import all plugin modules (only import two plugins that are used)
- Load json file plugins-meta/index.json instead
- In debug mode, all plugin modules are still imported and validated
- User plugins are still imported
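
The idea of reading plugin metadata from a JSON index and importing a plugin module only when it is needed can be sketched like this. The index fields, the `LazyPlugin` class and the use of the standard `json` and `csv` modules as stand-in "plugins" are all assumptions; the real plugins-meta/index.json layout may differ.

```python
import importlib
import json

# Hypothetical index content, standing in for plugins-meta/index.json.
INDEX_JSON = """
[
  {"name": "JsonFormat", "module": "json", "extensions": [".json"]},
  {"name": "CsvFormat", "module": "csv", "extensions": [".csv"]}
]
"""


class LazyPlugin:
    """Plugin properties known from metadata; module imported on demand."""

    def __init__(self, meta):
        self.name = meta["name"]
        self.extensions = tuple(meta["extensions"])
        self._moduleName = meta["module"]
        self._module = None

    @property
    def loaded(self):
        return self._module is not None

    @property
    def module(self):
        # Import only on first access, then cache the module object.
        if self._module is None:
            self._module = importlib.import_module(self._moduleName)
        return self._module


def load_index(text):
    """Build plugin objects from the JSON index without importing modules."""
    return {p.name: p for p in map(LazyPlugin, json.loads(text))}


def plugin_for(plugins, filename):
    """Pick a plugin by file extension using metadata only."""
    for plugin in plugins.values():
        if filename.endswith(plugin.extensions):
            return plugin
    return None
```

In a conversion only the input and output plugins have their `module` property accessed, so only those two modules are imported.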

Other improvements

Improve detection of languages from glossary name, and add tests
Update langs.json: add new 3-letter codes for 25 languages
glos.preventDuplicateWords and glos.removeHtmlTagsAll: prevent adding filter twice
glos.cleanup: reset path list to avoid (non-critical) error if called again
Minor improvements in Glossary.init()
DataEntry.save: on FileNotFoundError show a 1-line error instead of log.exception
ui_gtk: create a new Glossary object every time Convert button is clicked
Add docstring for Glossary.init

Unit testing

Update tests/glossary_errors_test.py
Add missing cleanup for some temp file
add test for LDF to/from Tabfile

Refactoring

Plugins: replace import of formats_common from current directory with pyglossary.plugins.formats_common
Fix logging.warn method is deprecated, use warning instead, PR #360 by @BoboTiG
Fix DeprecationWarning: invalid escape sequence, PR #361 by @BoboTiG
Move some functions from glossary_utils.py to compression.py
Move some methods from Glossary to new parent classes PluginManager and GlossaryInfo
Some refactoring in plugin_prop.py and plugin_manager.py
- Rename plugin.pluginModule to plugin.module
- Minimize direct access to plugin.module, plugin.readerClass or plugin.writerClass
- Add some new properties to PluginProp
- Remove a log from glossary.py
- Disable validation of plugins unless in debug mode
- plugin_prop.py: fix checking debug level
sq_entry_list.py: rename sortColumns to sqliteSortKey
Some refactoring around setSortKey between Glossary, EntryList and SqEntryList
Remove Entry.sqliteSortKeyFrom and related classmethods
Some more simplification in glossary.py
Remove Entry.defaultSortKey
Some style fixes
iter_utils.py: remove unused key= argument from unique_everseen
Refactor ui_gtk and update config comments
extractInlineHtmlImages: avoid writing file within sub func

Contributors

BoboTiG

25 Jan 10:22

ilius

4.4.1

663748c

PyGlossary 4.4.1

Changes since 4.4.0

Bug fixes

Automatically create cacheDir on Glossary.init()
- Fixes exception in SQLite mode

Features

ui_cmd_interactive: support setting sortKey

Improvements and documentation

Wiktionary Dump: remove detect-by-extension
glossary.py: update docstrings for sortKeyName
sort_keys.py: add desc to NamedSortKey
Update doc/sort-key.md

24 Jan 17:39

ilius

4.4.0

cfd61e8

PyGlossary 4.4.0

Changes since 4.3.0

Breaking changes

Remove partial sorting support (obsolete feature)
- Remove --sort-cache-size flag in command line
- (For library users) Remove sortCacheSize argument to glos.write and glos.convert
Re-design sorting and sortKey parameters
- Breaking change for library users, and user plugins that need sorting (sortOnWrite = ALWAYS)
- Change glos.convert
  - Replace argument sortKey (Callable) with sortKeyName (str)
  - Add argument sortEncoding (str) defaulting to utf-8
- Change glos.write
  - Replace argument sortKey (Callable) with namedSortKey (sort_keys.NamedSortKey)
  - Add argument sortEncoding (str) defaulting to utf-8
- Change glos.sortWords
  - Replace argument key (Callable) with sortKeyName (str)
  - Add argument sortEncoding (str) defaulting to utf-8
- Change API of plugins that use sortOnWrite = ALWAYS
  - Replace writer.sortKey and Writer.sqliteSortKey with sortKeyName in plugin module.
  - See the stardict.py for example.
Note 1: All sortKey and sortEncoding arguments are optional.

Note 2: Values of sortKeyName are documented in doc/sort-key.md
Rename 2 files in doc/:
- Rename doc/entry_filters.md to doc/entry-filters.md
- Rename doc/term_colors.md to doc/term-colors.md
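
To illustrate the sorting re-design above, in which a callable sortKey argument gives way to a sortKeyName string plus a sortEncoding, here is a hedged sketch of a name-based registry. `SketchSortKey`, `SORT_KEYS`, `sort_entries` and the two key implementations are hypothetical; the real names and values are the ones documented in doc/sort-key.md.

```python
from typing import Callable, NamedTuple


class SketchSortKey(NamedTuple):
    """Hypothetical stand-in for a named sort key."""

    name: str
    desc: str
    # Given an encoding, returns the actual key function over a word list.
    make: Callable[[str], Callable[[list[str]], object]]


def _headword(sortEncoding):
    return lambda words: words[0].encode(sortEncoding, errors="replace")


def _headword_lower(sortEncoding):
    return lambda words: words[0].lower().encode(sortEncoding, errors="replace")


SORT_KEYS = {
    k.name: k
    for k in (
        SketchSortKey("headword", "Headword", _headword),
        SketchSortKey("headword_lower", "Lowercase Headword", _headword_lower),
    )
}


def sort_entries(entries, sortKeyName="headword_lower", sortEncoding="utf-8"):
    """Sort word lists by a key selected by name instead of by callable."""
    try:
        named = SORT_KEYS[sortKeyName]
    except KeyError:
        raise ValueError(f"invalid sortKeyName={sortKeyName!r}") from None
    return sorted(entries, key=named.make(sortEncoding))
```

A plain string can be supplied from a command-line flag or a config file, which a Python callable cannot.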

Features

--sort-key and --sort-encoding command line flags (as part of above re-design)
- See README.md and doc/sort-key.md.
Now SQLite mode works for all output formats.

Bug fixes

Fix lack of Progress Bar while writing in indirect or SQLite mode
Fix misleading message log about SQLite mode
Fix unclosed files in XDXF and FreeDict plugins

Improvements

Show a 1-line log instead of FileNotFoundError traceback in glos.read and glos.write
Close readers in glos.convert if write failed
Fix some type annotations and comments
(For library users) Change Glossary.__str__
(For library users) glos.setInfo: convert non-str value to str, and add tests

Unit testing

Add new tests and improve existing tests.

Coverage of glossary.py: 89%
Overall coverage of codebase + plugins: 58%

Refactoring and design improvements

Simplify by passing glos object to EntryList()
Replace SqList with SqEntryList
Change __iter__ of SqEntryList and EntryList to give entry objects
Simplify Glossary by moving gc.collect to EntryList and SqEntryList
Remove unused function xml_unescape
Remove unused import from FreeDict and JMDict plugins
Use operator.itemgetter in stardict.py, dict_cc.py, ebook_kobo.py, reverse.py
glossary.py: cleanup, simplify and optimize generators logic
- Also remove index argument from entryFilter.run method and add some comments
Remove redundant check in glos.progress
Remove redundant check in _getLangByStr
Remove redundant check in Glossary.detectOutputFormat

15 Jan 12:18

ilius

4.3.0

cf4db2b

PyGlossary 4.3.0

Changes since 4.2.1

Bug fixes

Tabfile writer: fix replacing \ with \\
--remove-html flag: fix bad regex
ui_cmd_interactive: fix a few bugs
Lowercase word/entry links (<a href="bword://...) when --lower flag is passed
TextGlossaryWriter: do not skip words that start with #
Fix StdLogHandler: was not applying --no-color
Fix checking for sys.frozen

New features

Add auto_sqlite config parameter
- to use SQLite mode for StarDict and EPUB-2 (which require sorting) by default
- also allow overriding it with --no-sqlite flag
Add 3 config parameters to allow changing log colors in terminal:
- color.cmd.critical
- color.cmd.error
- color.cmd.warning
Add 2 keys to config to enable/disable colors in Unix and Windows separately
- color.enable.cmd.unix: default true
- color.enable.cmd.windows: default false

New features for library users

Allow glos.setInfo(key, None) to delete the info / metadata key
Add glos.alts property as shortcut, and use it internally

Design improvements

Change rawEntry[0] from bytes to List[str] and avoid split/join when converting rawEntry <-> entry.
This fixes some very edge cases involving | in words, but uses more RAM in indirect mode (converting to StarDict), which can be solved with --sqlite.
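
A simplified illustration of why a separator-joined representation of headwords is fragile when a word itself contains |, and why keeping a list avoids the problem. It deliberately ignores any escaping, and the tuple layout used for the raw entry is an assumption, not the actual rawEntry format.

```python
def join_words(words):
    # Separator-joined storage: all headwords in one "|"-separated value.
    return "|".join(words).encode("utf-8")


def split_words(raw):
    return raw.decode("utf-8").split("|")


words = ["a|b", "c"]

# Round-tripping through the joined form invents a third headword.
assert split_words(join_words(words)) == ["a", "b", "c"]

# Keeping the list as-is needs no split/join and cannot be ambiguous,
# but a list of str objects takes more memory than one bytes value.
raw_entry = (list(words), b"definition")
assert raw_entry[0] == ["a|b", "c"]
```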

Documentation

Replace doc/config.md with doc/config.rst, update comments and other improvements
Generate doc/entry_filters.md
Update plugins doc
Update README.md

Unit testing

Coverage of glossary.py: 75%

There are 2501 lines of test code in tests directory.

Tests for Glossary class include:

Basic functionality
Error handling
Sorting and direct / indirect / SQLite modes
Entry filter config/flags (lower, rtl, remove_html, remove_html_all)
Resources / data entries
Convert: Tabfile <-> Aard2 slob
Convert: Tabfile <-> CSV
Convert: Tabfile -> EPUB-2
Convert: Tabfile -> JSON
Convert: Tabfile <-> StarDict

Other improvements:

glossary_test.py: check CRC32 of downloaded test files
glossary_test.py: use a new temp dir for each test method for isolation.
ebook_kobo_test.py: split into several test methods

Improvements

Zim: make improvements, #352
Aard2 slob: add 2 mime types, #352
ui/main.py: do not allow --remove-html and --remove-html-all together
Glossary: do not allow glos.config to be set twice
Glossary: change some error logs to critical, and more improvements
Prevent conflicting config flags together, like --lower --no-lower
Disable utf8_check config parameter by default (not needed since 3.0.0)

Refactoring and cleanup

Glossary: some refactoring in convert method
Rename 3 scripts in scripts/ directory
Remove DataEntry.fromFile and improve behavior of DataEntry.__init__
Refactoring in ui/
rename option.cmdFlag to option.customFlag
Glossary: add glos.rawEntryCompress property, and use in entry.py
Glossary: minor improvement in loadPlugins
XDXF: remove useless argument in Reader.open
remove some unused functions from text_utils.py
plugin_prop.py: refactor getExtraOptions
Avoid assigning protected attrs in text_writer.py and plugins/tabfile.py
Fewer protected attr access in entry_filters.py
Move sortKey and get_prefix implementations from ebook_base.py to epub and mobi plugins
Change name of 2 entry filters to match the config param

26 Dec 20:01

ilius

4.2.1

c0d0eef

PyGlossary 4.2.1

Changes since version 4.2.0

Minor bug fixes and improvements:

text_utils.py
- Minor bug: fix legacy function urlToPath using urllib.parse.unquote
- Minor bug: replacePostSpaceChar: remove trailing space from the output str
- Cleanup:
  - Remove unused function isControlChar
  - Remove unused function formatByteStr
  - Remove argument exclude from function isASCII
- Add unit tests
ui_cmd_interactive.py: fix a minor bug and some small refactoring
Command line: Override input glossary info with --source-lang and --target-lang flags
Add unit tests for CSV -> Tabfile conversion
CSV plugin: some refactoring, and rename the module to csv_plugin.py
Update setup.py: add python_requires=">=3.7.0", update extras_require
Update README.md

Features:

Command line: Add --name flag for changing glossary name
Glossary: convert: add infoOverride optional argument

20 Dec 08:30

ilius

4.2.0

1b1450c

PyGlossary 4.2.0

Changes since 4.1.0

Breaking changes:
- Replace glos.getAuthor() with glos.author
  - This looks for "author" and then "publisher" keys in info/metadata
- Rename option apply_css to css for mobi and epub2
- glos.getInfo and glos.setInfo only accept str as key (or a subclass of str)
Bug fixes:
- Indirect mode: Fix handling '|' character in words.
  - Escape/unescape | in words when converting entry <-> rawEntry
- Escape/unescape | in words when writing/reading text-based file formats
- JSON: Prevent duplicate keys in json output, #344
  - Add new method glos.preventDuplicateWords()
Features and improvements
- Add SQLite mode with --sqlite flag for converting to StarDict.
  - Eliminates the need to load all entries into RAM, limiting RAM usage.
  - You can add --sqlite to your command, even for running GUI.
    - For example: python3 main.py --tk --sqlite
  - See README.md for more details.
- Add --source-lang and --target-lang flags
- XDXF: support more tags and improvements
- Add unit tests for Glossary class, and some functions in text_utils.py
- Windows: change cache directory to %LOCALAPPDATA%
- Some refactoring and optimization
- Update, improve and re-format documentations
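
The principle behind the SQLite mode described above, sketched with the standard `sqlite3` module: entries are streamed into a table together with a sort key, and the database returns them in order, so Python never holds a full sorted list. The table layout, column names and the lowercase-bytes sort key are assumptions, not PyGlossary's actual schema.

```python
import sqlite3


def sorted_entries_via_sqlite(entries, db_path=":memory:"):
    """Yield (word, defi) pairs sorted by a key, using SQLite for the sort.

    With a real file path as db_path the data lives on disk;
    ":memory:" is used here only to keep the sketch self-contained.
    """
    con = sqlite3.connect(db_path)
    try:
        con.execute("CREATE TABLE data (sortkey BLOB, word TEXT, defi TEXT)")
        # The generator is consumed row by row; no list of entries is built.
        con.executemany(
            "INSERT INTO data VALUES (?, ?, ?)",
            ((word.lower().encode("utf-8"), word, defi) for word, defi in entries),
        )
        con.commit()
        query = "SELECT word, defi FROM data ORDER BY sortkey"
        for word, defi in con.execute(query):
            yield word, defi
    finally:
        con.close()
```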

01 Dec 15:44

ilius

4.1.0

11b1710

PyGlossary 4.1.0

There are a lot of changes since last release, but here is what I could gather and organize!
Please see the commit list for more!

Improvements in ui_gtk
Improvements in ui_tk
Improvements in ui_cmd_interactive
Refactoring and improvements in ui-related codebase
Fix not loading config with --ui=none
Code style fixes and cleanup
Documentation
- Update most documentations.
- Add comments for read/write options.
- Generate documentation for all formats
  - Placed in doc/p, linked to in README.md
  - Generating with scripts/plugin-doc-gen.py script
  - Read list of dictionary tools/applications from TOML files in plugins-meta/tools
Add Dockerfile and run-with-docker.sh script
New command-line flags:
- --json-read-options and --json-write-options
  - To allow using ; in option values
  - Example: '--json-write-options={"delimiter": ";"}'
- --gtk, --tk and --cmd as shortcut for --ui=gtk etc
- --rtl to change direction of definitions, #268, also added to config.json
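
Why a JSON form of the options helps when a value contains ; can be sketched as below. It assumes that the plain option string separates key=value pairs with ; (inferred from the note above); both parser functions are hypothetical and are not PyGlossary's own parsing code.

```python
import json


def parse_plain_options(text):
    """Parse 'key=value;key=value'. Values cannot contain ';'."""
    options = {}
    for part in text.split(";"):
        if not part:
            continue
        key, _, value = part.partition("=")
        options[key.strip()] = value
    return options


def parse_json_options(text):
    """Parse a JSON object, so any character is allowed in values."""
    options = json.loads(text)
    if not isinstance(options, dict):
        raise ValueError("options must be a JSON object")
    return options


# The separator swallows the value in the plain form...
assert parse_plain_options("delimiter=;") == {"delimiter": ""}
# ...while the JSON form keeps it intact.
assert parse_json_options('{"delimiter": ";"}') == {"delimiter": ";"}
```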
Fix non-working --remove-html flag
Changes in Glossary class
- Rename glos.getPref to glos.getConfig
- Change formatsReadOptions and formatsWriteOptions to Dict[str, OrderedDict[str, Any]]
  - to include default values
- remove glos.writeTabfile, replace with a func in pyglossary/text_writer.py
- Glossary.init: avoid showing error if user plugin directory does not exist
Fixes and improvements in code base
- Prevent dataEntry.save() from raising exception because of invalid filename or permission
- Avoid exception if removing temp file/folder failed
- Avoid mktemp and more improvements
  - use ~/.cache/pyglossary/ directory instead of /tmp/
- Fixes and improvements in runDictzip
- Raise RuntimeError instead of StopIteration when iterating over a non-open reader
- Avoid exception if no zip command was found, fix #294
- Remove directory after creating .zip, and some refactoring, #294
- DataEntry: replace inTmp argument with tmpPath argument
- Entry: fix html pattern for hyperlinks, #330
- Fix incorrect virtual env directory detection
- Refactor dataDir detection, #307 #316
- Show warning if failed to create user plugins directory
- fix possible exception in log.emit
- Add support for Conda in dataDir detection, #321
- Fix f-string in StdLogHandler.emit
Fixes and improvements in Windows
- Fix bad dataDir on Windows, #307
- Fix shutil.rmtree exception on Windows
- Support creating .zip on Windows 10, #294
- Check zip command before tar on Windows, #294
- Show graphical error on exceptions on Windows
- Fix dataDir detection on Windows, #323 #324
Changes in Config:
- Rename config key skipResources to skip_resources
  - Add it to config.json and configDefDict
- Rename config key utf8Check to utf8_check
  - User should edit ~/.pyglossary/config.json manually
Implement direct compression and uncompression, and some refactoring
- change glos.detectInputFormat to return (filename, format, compression) or None
- remove Glossary.formatsReadFileObj and Glossary.formatsWriteFileObj
- remove fileObj= argument from glos.writeTxt
- use optional 'compressions' list/tuple from Writer or Reader classes for direct compression/uncompression
- refactoring in glossary_utils.py
Update setup.py
Show version from 'git describe --always' on --version
FileSize option (used in many formats):
- Switch to metric (powers of 1000) for K, M, G units
- Add KiB, MiB, GiB for powers of 1024
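
The unit convention above (K, M, G as powers of 1000 and KiB, MiB, GiB as powers of 1024) can be sketched as a small parser. `parse_file_size` is a hypothetical helper; case-insensitive units and decimal numbers are assumptions based on example values such as 5500k, 100m and 1.2g.

```python
import re
from decimal import Decimal

_UNITS = {
    "": 1,
    # Metric units: powers of 1000.
    "K": 1000,
    "M": 1000**2,
    "G": 1000**3,
    # Binary units: powers of 1024.
    "KIB": 1024,
    "MIB": 1024**2,
    "GIB": 1024**3,
}

_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*$")


def parse_file_size(text):
    """Convert a size such as '5500k' or '2MiB' to a number of bytes."""
    m = _SIZE_RE.match(text)
    if m is None:
        raise ValueError(f"invalid file size {text!r}")
    number, unit = m.groups()
    try:
        factor = _UNITS[unit.upper()]
    except KeyError:
        raise ValueError(f"unknown unit {unit!r}") from None
    # Decimal avoids float rounding for values like 1.2g.
    return int(Decimal(number) * factor)
```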
Add extensionCreate variable (str) to plugins and plugin API
- Use it to improve ui_tk
Text-based glossary code-base (affecting Tabfile, Kobo Dictfile, LDF)
- Optimize TextGlossaryReader
- Change multi-file text glossary file names from .N.txt to .txt.N (where N>=1)
- Enable reading pyglossary-written multi-file text glossary by adding file_count=-1 to metadata
  - because the number of files is not known when creating the first txt file
Tabfile
- Rename option writeInfo to enable_info
- Reader: read resource files from *.txt_res directory if exists
- Add *.txt_res directory to *.zip file
Zim Reader:
- Migrate to libzim 1.0
- Add mimetype image/webp, fix #329
Slob and Tabfile Writer: add file_size_approx option to allow writing multi-part output
- support values like: 5500k, 100m, 1.2g
Add word_title=False option to some writers
- Slob Writer: add word_title=False option
- Tabfile Writer: add word_title=False option
- CSV Writer: add word_title=False option
- JSON Writer: add word_title=False option
- Dict.cc Reader: do not add word title
- FreeDict Reader: rename keywords_header option to word_title
- Add glos.wordTitleStr, used in plugins with word_title option
- Add definition_has_headwords=True info key to avoid adding the title next time we read the glossary
Aard2 (slob)
- Writer: add option separate_alternates=False, #270
- Writer: fix handling content_type option
- Writer: use ~/.cache/pyglossary/ instead of /tmp
- Writer: add mp3 to mime types, #289
- Writer: add support for .ini data file, #289
- Writer: support .webp files, #329
- Writer: support .tiff and .tif files
- Reader: read glossary name/title and creation time from tags
- Reader: extract all metadata / tags
- slob.py library: Refactoring and cleanup
StarDict:
- Reader: add option unicode_errors for invalid UTF-8 data, #309
- Writer: add bool write-option audio_goldendict, #327
- Writer: add option audio_icon=True, and add option comment, #327
FreeDict Reader
- Fix two slashes before and after pron
- Avoid running unescape_unicode by encoding="utf-8" arg to ET.htmlfile
- Fix exception if edition is missing in header, and few other fixes
- Support <cit type="example"> with <cit type="trans"> inside it
- Support <cit type="trans"> inside a nested (second-level) <sense>
- Add "lang" attribute to html elements
- Add option "example_padding"
- Fix rendering <def>, refactoring and improvement
- Handle <note> inside <sense>
- Support <note> in <gramGrp>
- Mark external refs with <a ... class="external">
- Support comment in <cit>
- Support <xr> inside <sense>
- Implement many tags under <sense>
- Improvements and refactoring
XDXF
- Fix not finding xdxf.xsl in installed mode
  - Affecting XDXF and StarDict formats
- xdxf.xsl: generate <font color=...> instead of <span style=...>
- StarDict Reader: Add xdxf_to_html=True option, #258
- StarDict Reader: Import xdxf_transform lazily
  - Remove forced dependency to lxml, #261
- XDXF plugin: fix glos.setDefaultDefiFormat call
- xdxf_transform.py: remove warnings for , #322
- Merge PR #317
  - Parse sr, gr, ex_orig, ex_transl tags and audio
  - Remove None attribute from audio tag
  - Use unicode symbols for audio and external link
  - Use another speaker symbol for audio
  - Add audio controls
  - Use plain link without an audio tag
Mobi
- Update ebook_mobi.py and README.md, #299
- Add PR #335 with some modifications
Changes in ebook_base.py (Mobi and EPUB)
- Avoid exception if removing tmpDir failed
- Use style.css dataEntry, #299
DSL Reader:
- Strip whitespaces around language names, #264
- Add progressbar support, #264
- Run html.escape on text before adding html tags, #265
- Strip and unquote glossary name
- Generate <i> and <font color=...> instead of <span style=...>
- Avoid adding html comment
- Remove \ufeff from header lines, #306
AppleDict Source
- Change path of Dictionary Development Kit, #300
- Open all text files with encoding="utf-8"
- Some refactoring
- Rename 4 options:
  - cleanHTML -> clean_html
  - defaultPrefs -> default_prefs
  - prefsHTML -> prefs_html
  - frontBackMatter -> front_back_matter
AppleDict Binary
- Improvements, #299
- Read DefaultStyle.css file, add as style.css, #299
- Change default value of option: html=True
Octopus MDict (MDX)
- Fix image links
- Do not set empty title
- Minor improvement in readmdict.py
- Handle exception when reading from a corrupt MDD file
- Add bool flag same_dir_data_files, #289
- Add read-option: audio=True (default: False), #327
- audio: remove extra attrs and add comments
DICT.org plugin:
- installToDictd: skip if target directory does not exist
- Make rendering of dictd files a bit clearer in pure txt
- Fix indentation issue and add bword prefix as url
Fixes and improvements in Dict.cc (SQLite3) plugin:
- Fix typo, and avoid iterating over cur, use fetchall(), #296
- Remove gender from headword, add it to definition, #296
- Avoid running unescape_unicode
JMDict
- Support reading compressed file directly
- Show pos before gloss (translations)
- Avoid running unescape_unicode
DigitalNK: work around Python's sqlite bug, #282
Changes in dict_org.py plugin, By Justin Yang
- Use
  to replace newline
- Replace words with {} around to true web link
CC-CEDICT Reader:
- Fix import error in conv.py
- Switch from jinja2 to lxml
  - Fix not escaping <, > and &
  - Note: lxml inserts   instead of  
- Use <font> instead of <span style=...>
- add option to use Traditional Chinese for entry name
- Avoid colorizing if tones count does not match len(syllables), #328
- Add <font color=""> for each syllable in case of mismatch tones, #328
Rename read/write options:
- DSL: rename option onlyFixMarkUp to only_fix_markup
- SQL: r...

24 Oct 11:45

ilius

4.0.0

047b747

PyGlossary 4.0.0

Changes since 3.3.0

Require Python 3.7 or 3.8, drop support for Python 3.4, 3.5 and 3.6
Fix / rewrite setup.py
- Fix python3 setup.py sdist bdist_wheel, and pypi package
  - Had to move ui/ directory into pyglossary/
- Switch from distutils to setuptools
- Remove py2exe
Add interactive command line user interface
- Automatically selected if input & output file arguments are not passed and one of these holds:
  - On Linux and $DISPLAY is not set
  - On Mac and no tkinter module is found
  - --ui=cmd flag is passed
New format support:
- Add read support for FreeDict, #206
- Add read support for Zim (Kiwix)
- Add read and write support for Kobo E-Reader Dictfile (.df)
- Add write support for DICT.org dictfmt source file
- Add read support for dictunformat output file
- Add write support for JSON
- Add read support for Dict.cc (SQLite3)
- Add read support for JMDict, #239
- Add basic read support for Wiktionary Dump (.xml)
- Add read support for cc-kedict
- Add read support for DigitalNK (SQLite3)
- Add read support for Wordset.org JSON directory
Remove Omnidic write support (Unmaintained J2ME dictionary)
Remove Octopus MDict Source plugin
Remove Babylon Source plugin
BGL Reader: improvements
DictionaryForMIDs Writer: fix non-working code
Gettext Source (po) Writer: fix info header
MOBI E-Book Writer: fix sort order, fix and test kindlegen codes, add kindlegen_path option, #112
EPUB-2 E-Book Writer: fix sort order
XDXF Reader: rewrite with etree.iterparse to avoid using too much RAM
Lingoes Source (LDF) Reader: fix ignoring info/metadata header
dict_org.py: rewrite broken plugin (Reader and Writer)
DSL Reader: fix losing metadata/info
Aard 2 (slob) Reader:
- Fix adding css/js files as normal entries
- Add bword:// prefix to entry links
- Fix duplicate entries issue by keeping a set of blob IDs, #224
- Detect and pass defiFormat
Aard 2 (slob) Writer:
- Fix content_type detection
- Remove bword:// prefix from entry links
- Add resource files / data entries, #243
- Fix replacing image paths
- Show log events from slob.py in debug mode
- Change default compression to zlib
- Allow passing empty compression
Octopus MDict Reader:
- Read MDX file twice to load links
- Count data entries as part of len(reader) for progressbar
StarDict Writer:
- Copy "copyright" and "publisher" values to "description"
- Add source and target language codes to the end of bookname
- Add write-option stardict_client: bool
  Set True to make glossary more compatible with StarDict 3.x
- Fix broken result when sametypesequence option is given and a definition contains |
- Allow sametypesequence=x for xdxf
- Add merge_syns option
- Allow sametypesequence=None option
XDXF Reader:
- Fix/improve xdxf to html transformation
Kobo Writer:
- Fix get_prefix algorithm and sorting order, with tests, #219
- Replace <img src=... tags with [Image: name.bmp], #219
  - and show a warning about data entries
- Additional keywords as alternatives, #232
- Fix support for alternates: duplicate entries based on word prefix, #238
- Show headword in title of alternate entries, #238, #245
- Strip full html definition, #246
CSV:
- Add delimiter option to Reader and Writer
- Read and write info
- Writer: accept bool option add_defi_format=True (default False)
AppleDict Writer:
- AppleDict Writer: replace fix_sound_link() code with a single line
- AppleDict Writer should not call glos.setDefaultDefiFormat
MDX Reader:
- Replace entry:// with bword:// in MDX Reader instead of AppleDict Writer
- Fix internal href="x:" and href="d:" links
- Fix file:// in images path, fix #243
User Interface improvements and fixes:
- ui_gtk: add About tab and more improvements
- ui_tk: replace About dialog with About tab and more improvements
- ui_cmd: improvements in progressbar
- ui_cmd: allow "=" in value of read/write options
Add a list of 208 languages and ~40 writing systems
- Detect sourceLang and targetLang from glossary name/title
- Auto-select between <b> and <big> tags depending on writing system
  - Using glos.titleElement method, used in FreeDict, JMDict and Dict.cc writers
- glos.sourceLang and glos.targetLang properties (with setters) as Lang objects
- glos.sourceLangName and glos.targetLangName properties (with setters) as str
  - Used in several plugins
Break compatibility of plugins
- Drop support for read and write functions (outside a class)
- Now we only support Reader class and Writer class
- Reader class must have these methods
  - __init__(self, glos)
  - open(self, filename)
    - Here glossary info must be read from file and set with glos.setInfo
  - __len__(self) -> int
    - Should return the number of entries, or zero if it's too costly
  - __iter__(self) -> "Iterator[BaseEntry]"
    - Can be a generator
  - close(self)
- Writer class must have these methods
  - __init__(self, glos)
  - open(self, filename)
    - Here glossary info must be read from glos.getInfo or glos.iterInfo and written to file
  - write(self) -> "Generator[None, BaseEntry, None]"
    - Entries must be fetched with entry = yield in a while True loop:
      while True:
          entry = yield
          if entry is None:
              break
          # process and write entry into file(s)
  - finish(self)
- Read options and write options must be set to their default values as class attributes
  - See pyglossary/plugins/csv_pyg.py plugin for example
- sortKey must be an instance method of Writer, instead of a function outside any class
  - Only for plugins that need sorting before write
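As a rough illustration of the Reader and Writer method sets listed above, here is a self-contained sketch. It does not import PyGlossary: `FakeGlossary`, `ListReader`, `ListWriter` and `convert` are hypothetical names, entries are plain `(word, defi)` tuples instead of `BaseEntry` objects, and the driver loop in `convert` is an assumption about how such a generator-based `write()` could be fed.

```python
class FakeGlossary:
    """Hypothetical stand-in for the glos object (not the real Glossary)."""

    def __init__(self):
        self._info = {}

    def setInfo(self, key, value):
        self._info[key] = value

    def getInfo(self, key):
        return self._info.get(key, "")


class ListReader:
    def __init__(self, glos):
        self._glos = glos
        self._rows = []

    def open(self, filename):
        # A real reader would parse the file; this one fakes two entries
        # and sets glossary info, as the notes require.
        self._glos.setInfo("name", filename)
        self._rows = [("apple", "a fruit"), ("book", "bound pages")]

    def __len__(self):
        return len(self._rows)

    def __iter__(self):
        yield from self._rows

    def close(self):
        self._rows = []


class ListWriter:
    def __init__(self, glos):
        self._glos = glos
        self.lines = []

    def open(self, filename):
        # Info is read from the glossary and "written" (here: kept in a list).
        self.lines.append("# " + self._glos.getInfo("name"))

    def write(self):
        while True:
            entry = yield
            if entry is None:
                break
            word, defi = entry
            self.lines.append(f"{word}\t{defi}")

    def finish(self):
        self._glos = None


def convert(in_name, out_name):
    glos = FakeGlossary()
    reader = ListReader(glos)
    reader.open(in_name)
    writer = ListWriter(glos)
    writer.open(out_name)
    gen = writer.write()
    next(gen)  # advance the generator to its first `yield`
    for entry in reader:
        gen.send(entry)
    try:
        gen.send(None)  # None tells the writer that input is finished
    except StopIteration:
        pass
    reader.close()
    writer.finish()
    return writer.lines
```

Calling `convert("in.txt", "out.txt")` returns the header line followed by one tab-separated line per entry.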
Refactor and cleanup Glossary class
- Removed or replaced most of class/static attributes of Glossary
  - To see the diff, run git diff 3.3.0..master -- pyglossary/glossary.py
- Removed glos.addEntry method
  - If you use it in your program, replace with glos.addEntryObj(glos.newEntry(word, defi, defiFormat))
- Removed instance methods:
  - getMostUsedDefiFormats
  - iterEntryBuckets
  - zipOutDir and archiveOutDir
    - Moved to pyglossary/glossary_utils.py
    - archiveOutDir renamed to compressOutDir
  - writeDict
  - iterSqlLines -> moved to pyglossary/plugins/sql.py
  - reverse, takeOutputWords, searchWordInDef -> moved to pyglossary/reverse.py
- Values of Glossary.plugins are changed to plugin_prop.PluginProp instances
- Change glos.writeTxt arguments
  - Replace sep1 and sep2 with entryFmt
  - Replace rplList with defiEscapeFunc, wordEscapeFunc and tail
  - Remove iterEntries, entryFilterFunc
  - Method returns Generator[None, BaseEntry, None] instead of bool
  - For usage examples, see:
    - pyglossary/glossary.py -> def writeTabfile
    - pyglossary/plugins/dict_org_source.py
    - pyglossary/plugins/json_plugin.py
    - pyglossary/plugins/lingoes_ldf.py
    - pyglossary/plugins/sdict_source.py
Refactor, cleanup and fixes in Entry and DataEntry classes
- Replace entry.getWord() with entry.word
- Replace entry.getWords() with entry.l_word
- Replace entry.getDefi() with entry.defi
- Remove entry.getDefis()
  - Drop handling alternate definitions in Entry objects
- Replace entry.getDefiFormat() with entry.defiFormat
- Add entry.b_word and entry.b_defi shortcuts that give bytes (UTF-8)
- Replace dataEntry.getData() with dataEntry.data
- Add __slots__ to Entry and DataEntry classes
- Fix DataEntry in indirect mode
  - Mistaken for Entry with defi=DATA, and file content discarded
  - Save resource files in user's cache directory when loading input glossary into memory
    - Move file to output glossary on dataEntry.save(...)
- Fix Entry.getRawEntrySortKey not being alternates-aware, broke StarDict Writer
- DataEntry: save: use shutil.copy if has _tmpPath, and set _tmpPath
New features of Entry
- entry.stripFullHtml(), remove <html... <head>...</head>...<body>
  - Used in Kobo and Kobo Dictfile writers
  - Add tests
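To give an idea of what stripping a full HTML wrapper means, here is a minimal regex-based sketch. It is not the implementation of entry.stripFullHtml(); the function name `strip_full_html` and the regex approach are assumptions made for illustration only.

```python
import re

# Matches the content between <body ...> and </body>, case-insensitively.
_BODY_RE = re.compile(r"<body[^<>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def strip_full_html(defi: str) -> str:
    """Return only the body content if defi is a full HTML document."""
    match = _BODY_RE.search(defi)
    if match is None:
        # Not a full HTML document (or no body tag): leave it untouched.
        return defi
    return match.group(1).strip()
```

A definition without a body tag is returned unchanged, so the function is safe to apply to every entry.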
Fix glos.writeTabfile:
- Remove \r from definitions and info values
- Fix not escaping word
Fix/improve html detection in definitions
Switch to lazy imports of non-standard modules in plugins
Optimize RAM usage of indirect conversion
- To write StarDict, EPUB and DictionaryForMIDs glossaries, we need to load all entries into RAM to sort them
Other new features of Glossary class
- glos.getAuthor() to get "author", or "publisher" (as fallback)
- glos.removeHtmlTagsAll() method, can be called by plugins' writer
- glos.collectDefiFormat(maxCount) extract defiFormat counts
  - by reading the first maxCount entries (the iterator is then reset)
  - Used in StarDict Writer
- Show memory usage in trace mode
Bug fixes and improvements in code base
- Apply entry filter when iterating over reader, fix #251
  - Fixes wrong sort order for some glossaries (converting to StarDict or other formats that need sort)
- Fixes and improvements in TextGlossaryReader class
  - Fix ignoring glossary defaultDefiFormat
- Fix evaluating None value in read/write options
Support reading multi-file Tabfile or other text formats
- Example: file.txt, file.txt.1, file.txt.2
- Need to add file_count info key, for example: ##file_count 3
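The naming scheme in the example above can be sketched as follows. Both helpers are hypothetical and only illustrate the convention (file.txt, file.txt.1, ...) and the `##file_count` info line; how PyGlossary parses these internally is not shown in these notes, and splitting the info line on whitespace is an assumption.

```python
from typing import List, Optional


def part_filenames(filename: str, file_count: int) -> List[str]:
    """List the first file followed by its numbered continuation files."""
    return [filename] + [f"{filename}.{i}" for i in range(1, file_count)]


def parse_file_count(line: str) -> Optional[int]:
    """Read file_count from an info line such as '##file_count 3'."""
    if not line.startswith("##"):
        return None
    parts = line[2:].split(None, 1)  # split key and value on whitespace
    if len(parts) != 2 or parts[0] != "file_count":
        return None
    return int(parts[1])
```

For instance, `part_filenames("file.txt", parse_file_count("##file_count 3"))` lists the three files from the example.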
Fixes in Tabfile Writer
- Fix not escaping ""
Add/update docume...

Releases: ilius/pyglossary
