Merged
12 changes: 11 additions & 1 deletion CHANGELOG.md
@@ -1,3 +1,13 @@
+## 0.18.14-dev0
+
+### Enhancements
+
+### Features
+
+### Fixes
+
+- **change short text language detection log to debug** reduce warning level log spamming
+
 ## 0.18.13

 ### Enhancements
@@ -6,7 +16,7 @@

 ### Fixes

-- **Parse a wider variety of date formats in email headers** The `partition_email` function is now more robust to non-standard date formats, including ISO-8601 dates with "Z" suffixes. This prevents `ValueError` exceptions when partitioning emails with these date formats.
+- **Parse a wider variety of date formats in email headers** The `partition_email` function is now more robust to non-standard date formats, including ISO-8601 dates with "Z" suffixes. This prevents `ValueError` exceptions when partitioning emails with these date formats.

 ## 0.18.12

2 changes: 1 addition & 1 deletion unstructured/__version__.py
@@ -1 +1 @@
-__version__ = "0.18.13"  # pragma: no cover
+__version__ = "0.18.14-dev0"  # pragma: no cover
2 changes: 1 addition & 1 deletion unstructured/partition/common/lang.py
@@ -403,7 +403,7 @@ def detect_languages(
     # If text contains special characters (like ñ, å, or Korean/Mandarin/etc.) it will NOT default
     # to English. It will default to English if text is only ascii characters and is short.
     if re.match(r"^[\x00-\x7F]+$", text) and len(text.split()) < 5:
-        logger.warning(f'short text: "{text}". Defaulting to English.')
+        logger.debug(f'short text: "{text}". Defaulting to English.')
         return ["eng"]

     # set seed for deterministic langdetect outputs
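The effect of the `lang.py` change can be illustrated with a minimal standalone sketch. It is not the library's code: the logger name `lang_sketch` and the helpers `default_to_english` and `messages_at_level` are hypothetical, introduced only for illustration. The regex and the five-word threshold are copied from the hunk above. The sketch shows that the short-text message is dropped when the logger is set to the `WARNING` level and still appears when the level is lowered to `DEBUG`.

```python
import logging
import re

# Hypothetical logger name; the real module uses its own `logger` object.
logger = logging.getLogger("lang_sketch")


def default_to_english(text: str) -> bool:
    """Return True when text is ASCII-only and has fewer than five words."""
    is_short_ascii = (
        re.match(r"^[\x00-\x7F]+$", text) is not None and len(text.split()) < 5
    )
    if is_short_ascii:
        # Same message as in the diff, logged at DEBUG instead of WARNING.
        logger.debug(f'short text: "{text}". Defaulting to English.')
    return is_short_ascii


class _ListHandler(logging.Handler):
    """Collects formatted log messages in a list for inspection."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def messages_at_level(text: str, level: int) -> list[str]:
    """Run the heuristic with the logger set to `level`; return what was logged."""
    handler = _ListHandler()
    previous_level = logger.level
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        default_to_english(text)
    finally:
        # Restore the logger so repeated calls do not interfere.
        logger.setLevel(previous_level)
        logger.removeHandler(handler)
    return handler.messages


if __name__ == "__main__":
    print(messages_at_level("hello world", logging.WARNING))
    print(messages_at_level("hello world", logging.DEBUG))
```

Under these assumptions, callers who still want to see the message would lower the level of the relevant logger to `DEBUG`; with a `WARNING` threshold the short-text notice no longer appears.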