Version 3.0 #209

Merged: 68 commits, Oct 18, 2022

Commits
e770c99
:bookmark: Bump version to 3.0.0 b1
Ousret Aug 14, 2022
4e9b2d3
:wrench: Add support to build Whl using MYPYC
Ousret Aug 14, 2022
482d2e3
:wrench: remove opt level override
Ousret Aug 14, 2022
e74851a
:fire: remove deprecated
Ousret Aug 14, 2022
cd4be0d
:fire: remove test_normalize_fp as target fn been removed
Ousret Aug 14, 2022
0d89020
:fire: remove extra unicodedata backport support
Ousret Aug 14, 2022
b3c0d5a
:art: reformat models.py and utils.py
Ousret Aug 14, 2022
6f6300a
:art: fix flake8 F401 '.constant.NOT_PRINTABLE_PATTERN' imported but …
Ousret Aug 14, 2022
0262569
:fire: remove NOT_PRINTABLE_PATTERN import in models.py
Ousret Aug 14, 2022
c0283d9
:zap: Only "compile" md.py for whl size sake
Ousret Aug 15, 2022
6328f7c
:sparkle: Add mypyc gha minimum testing
Ousret Aug 15, 2022
31f2673
:sparkle: initial ci update to include building wheels (specific)
Ousret Aug 15, 2022
e8d7405
:pencil: Add CHANGELOG entry for first beta
Ousret Aug 15, 2022
316b5be
:pencil: Update README
Ousret Aug 15, 2022
82fb1b2
:pencil: Add a bit of docs about this
Ousret Aug 15, 2022
09402e6
:wrench: Add py matrix build specific wheels
Ousret Aug 15, 2022
c19faca
Use cibuildwheel action
Ousret Aug 15, 2022
05b7e7e
Update python-publish.yml
Ousret Aug 15, 2022
68f5aff
Update python-publish.yml
Ousret Aug 16, 2022
35f79f6
Update python-publish.yml
Ousret Aug 16, 2022
57a8485
Update python-publish.yml
Ousret Aug 16, 2022
2f5130a
Update python-publish.yml
Ousret Aug 16, 2022
0a0e20b
Update python-publish.yml
Ousret Aug 16, 2022
443ab7d
:fire: remove unicodedata2 import ver in cli
Ousret Aug 19, 2022
b580e97
:sparkle: normalizer --version specify if extra speedup is present
Ousret Aug 19, 2022
eb4577c
:bookmark: bump to beta2
Ousret Aug 19, 2022
03a2599
:pencil: Add changelog entry
Ousret Aug 19, 2022
97b87f0
:pencil: update changelog
Ousret Aug 19, 2022
1755db9
:pencil: update speedup doc
Ousret Aug 19, 2022
1faeed0
:heavy_check_mark: Verify that --version work as intended
Ousret Aug 19, 2022
8e5af12
:art: reformat normalizer.py
Ousret Aug 19, 2022
368d060
:fire: remove method first() and best() from class CharsetMatch
Ousret Aug 19, 2022
6032389
Merge branch 'master' into 3.0
Ousret Aug 19, 2022
f315e4e
Merge branch 'master' into 3.0
Ousret Aug 21, 2022
8e5171a
:fire: :art: remove unused import "warnings"
Ousret Aug 21, 2022
1957898
:art: Fix warnings in Sphinx docs generation process
Ousret Aug 21, 2022
be541de
Merge branch 'master' into 3.0
Ousret Aug 21, 2022
1eeb423
:pencil: update changelog
Ousret Aug 21, 2022
f119e43
:pencil: update docs support section
Ousret Aug 21, 2022
216d1c6
make sure utf-7 is not "detected" without a mark/sig
Ousret Aug 21, 2022
03aa701
:pencil: update changelog
Ousret Aug 21, 2022
18567aa
Merge branch 'master' into 3.0
Ousret Sep 9, 2022
0a74e3d
Merge branch 'master' into 3.0
Ousret Sep 9, 2022
c12a07a
:wrench: switch to static metadata (setup.cfg) and use 'build'
Ousret Oct 1, 2022
b2da4cb
:wrench: Lax on Flask version range (py 3.6)
Ousret Oct 1, 2022
95253c8
:wrench: Lax on pytest version range (py 3.6)
Ousret Oct 1, 2022
a28be6b
:wrench: Lax on requests version range (py 3.6)
Ousret Oct 1, 2022
0296900
:fire: remove codeql action
Ousret Oct 1, 2022
0e91fb6
:bug: Fix CLI --normalize opt using fullpath in args
Ousret Oct 1, 2022
5910d20
:heavy_check_mark: Ensure tests run with cibuildwheel
Ousret Oct 1, 2022
093889b
:art: apply isort on normalizer.py
Ousret Oct 1, 2022
d0df3f4
:sparkle: Extend the capability of explain=True when cp_isolation con…
Ousret Oct 2, 2022
32cbafe
:wrench: run_checks.sh adjust black target lvl py36
Ousret Oct 2, 2022
b5ef798
:ambulance: Fix invalid syntax fstring eq auto format (py 36)
Ousret Oct 2, 2022
2cb15cf
Amend commit d0df3f49377992dd3ec32e83bd2538bd03dae52d
Ousret Oct 2, 2022
9b4a209
:art: reformat file (flake8)
Ousret Oct 2, 2022
5e2368e
:art: reformat api.py
Ousret Oct 2, 2022
c76a83d
:sparkle: Support for alternative language frequency set
Ousret Oct 6, 2022
70c551a
:sparkle: Add parameter `language_threshold` in `from_bytes`, `from_p…
Ousret Oct 18, 2022
14689be
:wrench: Make the language detection stricter
Ousret Oct 18, 2022
8f91aa4
:bug: TooManyAccentuatedPlugin induce false positive on the mess dete…
Ousret Oct 18, 2022
e0010ff
:bookmark: Bump version rc1
Ousret Oct 18, 2022
840a6e0
:wrench: Ensure proper version lock
Ousret Oct 18, 2022
9b8b048
:wrench: set target-version to py36 black autofix script
Ousret Oct 18, 2022
13d9a99
:wrench: mypy ver lock for py 3.6 revised
Ousret Oct 18, 2022
f8e1153
:pencil: Adjust speedup docs section
Ousret Oct 18, 2022
b15f416
:pencil: Update CHANGELOG.md
Ousret Oct 18, 2022
6367d53
:pencil: Missing CHANGELOG entry and add language_threshold to docs::…
Ousret Oct 18, 2022
3 changes: 2 additions & 1 deletion .github/workflows/chardet-bc.yml
@@ -25,7 +25,8 @@ jobs:
         pip uninstall -y charset-normalizer
     - name: Install the package
       run: |
-        python setup.py install
+        python -m build
+        pip install ./dist/*.whl
     - name: Clone the complete dataset
       run: |
         git clone https://github.com/Ousret/char-dataset.git
56 changes: 0 additions & 56 deletions .github/workflows/codeql-analysis.yml

This file was deleted.

3 changes: 2 additions & 1 deletion .github/workflows/detector-coverage.yml
@@ -25,7 +25,8 @@ jobs:
         pip uninstall -y charset-normalizer
     - name: Install the package
       run: |
-        python setup.py install
+        python -m build
+        pip install ./dist/*.whl
     - name: Clone the complete dataset
       run: |
         git clone https://github.com/Ousret/char-dataset.git
3 changes: 2 additions & 1 deletion .github/workflows/integration.yml
@@ -28,7 +28,8 @@ jobs:
         pip uninstall -y charset-normalizer
     - name: Install the package
       run: |
-        python setup.py install
+        python -m build
+        pip install ./dist/*.whl
     - name: Clone the complete dataset
       run: |
         git clone https://github.com/Ousret/char-dataset.git
3 changes: 2 additions & 1 deletion .github/workflows/lint.yml
@@ -25,7 +25,8 @@ jobs:
         pip uninstall -y charset-normalizer
     - name: Install the package
       run: |
-        python setup.py install
+        python -m build
+        pip install ./dist/*.whl
     - name: Type checking (Mypy)
       run: |
         mypy --strict charset_normalizer
40 changes: 40 additions & 0 deletions .github/workflows/mypyc-verify.yml
@@ -0,0 +1,40 @@
+name: MYPYC Run
+
+on: [push, pull_request]
+
+jobs:
+  detection_coverage:
+    runs-on: ${{ matrix.os }}
+
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: [3.6, 3.7, 3.8, 3.9, "3.10"]
+        os: [ubuntu-latest]
+
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v2
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Install dependencies
+      run: |
+        pip install -U pip setuptools
+        pip install -r dev-requirements.txt
+        pip uninstall -y charset-normalizer
+    - name: Install the package
+      env:
+        CHARSET_NORMALIZER_USE_MYPYC: '1'
+      run: |
+        python -m build --no-isolation
+        pip install ./dist/*.whl
+    - name: Clone the complete dataset
+      run: |
+        git clone https://github.com/Ousret/char-dataset.git
+    - name: Coverage WITH preemptive
+      run: |
+        python ./bin/coverage.py --coverage 97 --with-preemptive
+    - name: Coverage WITHOUT preemptive
+      run: |
+        python ./bin/coverage.py --coverage 95
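
The workflow above sets `CHARSET_NORMALIZER_USE_MYPYC: '1'` before running the build. How the build script consumes that variable is not shown in this diff, so the following is only a minimal sketch of one plausible shape. The helper name `mypyc_requested` is hypothetical, and the rule that only the exact value `'1'` enables compilation is an assumption.

```python
import os
from typing import Mapping, Optional


def mypyc_requested(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the build should attempt mypyc compilation.

    Assumption: only the exact string "1" opts in; anything else
    (unset, "0", empty) means a pure-Python build.
    """
    env = os.environ if environ is None else environ
    return env.get("CHARSET_NORMALIZER_USE_MYPYC") == "1"


# Passing an explicit mapping keeps the check easy to exercise in isolation.
print(mypyc_requested({"CHARSET_NORMALIZER_USE_MYPYC": "1"}))  # True
print(mypyc_requested({"CHARSET_NORMALIZER_USE_MYPYC": "0"}))  # False
print(mypyc_requested({}))                                     # False
```

Under this reading, a job that exports `'0'` would produce a plain wheel, and a job that exports `'1'` would produce a compiled one.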
3 changes: 2 additions & 1 deletion .github/workflows/performance.yml
@@ -25,7 +25,8 @@ jobs:
         pip uninstall -y charset-normalizer
     - name: Install the package
       run: |
-        python setup.py install
+        python -m build
+        pip install ./dist/*.whl
     - name: Clone the complete dataset
       run: |
         git clone https://github.com/Ousret/char-dataset.git
108 changes: 100 additions & 8 deletions .github/workflows/python-publish.yml
@@ -29,7 +29,8 @@ jobs:
         pip uninstall -y charset-normalizer
     - name: Install the package
       run: |
-        python setup.py install
+        python -m build
+        pip install ./dist/*.whl
     - name: Type checking (Mypy)
       run: |
         mypy charset_normalizer
@@ -51,7 +52,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: [ 3.6, 3.7, 3.8, 3.9, "3.10" ]
+        python-version: [ 3.6, 3.7, 3.8, 3.9, "3.10", "3.11-dev" ]
         os: [ ubuntu-latest ]

     steps:
@@ -67,7 +68,8 @@ jobs:
         pip uninstall -y charset-normalizer
     - name: Install the package
       run: |
-        python setup.py install
+        python -m build
+        pip install ./dist/*.whl
     - name: Run tests
       run: |
         pytest
@@ -96,7 +98,8 @@ jobs:
         pip uninstall -y charset-normalizer
     - name: Install the package
       run: |
-        python setup.py install
+        python -m build
+        pip install ./dist/*.whl
     - name: Clone the complete dataset
       run: |
         git clone https://github.com/Ousret/char-dataset.git
@@ -136,7 +139,8 @@ jobs:
         pip uninstall -y charset-normalizer
     - name: Install the package
       run: |
-        python setup.py install
+        python -m build
+        pip install ./dist/*.whl
     - name: Clone the complete dataset
       run: |
         git clone https://github.com/Ousret/char-dataset.git
@@ -146,11 +150,92 @@ jobs:
     - name: Integration Tests with Requests
       run: |
         python ./bin/integration.py
+  universal-wheel:
+    runs-on: ubuntu-latest
+    needs:
+      - integration
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up Python
+      uses: actions/setup-python@v2
+      with:
+        python-version: '3.x'
+    - name: Update pip, setuptools, wheel and twine
+      run: |
+        python -m pip install --upgrade pip
+        pip install setuptools wheel twine
+    - name: Build Wheel
+      env:
+        CHARSET_NORMALIZER_USE_MYPYC: '0'
+      run: python -m build
+    - name: Upload artifacts
+      uses: actions/upload-artifact@v3
+      with:
+        name: dist
+        path: dist
+
+  build-wheels:
+    name: Build wheels on ${{ matrix.os }} ${{ matrix.qemu }}
+    runs-on: ${{ matrix.os }}-latest
+    needs: universal-wheel
+    strategy:
+      matrix:
+        os: [ ubuntu, windows, macos ]
+        qemu: [ '' ]
+        include:
+          # Split ubuntu job for the sake of speed-up
+          - os: ubuntu
+            qemu: aarch64
+          - os: ubuntu
+            qemu: ppc64le
+          - os: ubuntu
+            qemu: s390x
+    steps:
+    - name: Checkout
+      uses: actions/checkout@v3
+      with:
+        submodules: true
+    - name: Set up QEMU
+      if: ${{ matrix.qemu }}
+      uses: docker/setup-qemu-action@v2
+      with:
+        platforms: all
+      id: qemu
+    - name: Prepare emulation
+      run: |
+        if [[ -n "${{ matrix.qemu }}" ]]; then
+          # Build emulated architectures only if QEMU is set,
+          # use default "auto" otherwise
+          echo "CIBW_ARCHS_LINUX=${{ matrix.qemu }}" >> $GITHUB_ENV
+        fi
+      shell: bash
+    - name: Setup Python
+      uses: actions/setup-python@v4
+    - name: Update pip, wheel, setuptools, build, twine
+      run: |
+        python -m pip install -U pip wheel setuptools build twine
+    - name: Build wheels
+      uses: pypa/cibuildwheel@2.10.2
+      env:
+        CIBW_BUILD_FRONTEND: "build"
+        CIBW_ARCHS_MACOS: x86_64 arm64 universal2
+        CIBW_ENVIRONMENT: CHARSET_NORMALIZER_USE_MYPYC='1'
+        CIBW_CONFIG_SETTINGS: "--no-isolation"
+        CIBW_BEFORE_BUILD: pip install -r dev-requirements.txt
+        CIBW_TEST_REQUIRES: pytest codecov pytest-cov
+        CIBW_TEST_COMMAND: pytest {package}/tests
+        CIBW_SKIP: pp*
+    - name: Upload artifacts
+      uses: actions/upload-artifact@v3
+      with:
+        name: dist
+        path: ./wheelhouse/*.whl
+
   deploy:

     runs-on: ubuntu-latest
     needs:
-      - integration
+      - build-wheels

     steps:
     - uses: actions/checkout@v2
@@ -162,10 +247,17 @@ jobs:
       run: |
         python -m pip install --upgrade pip
         pip install setuptools wheel twine
-    - name: Build and publish
+    - name: Download distributions
+      uses: actions/download-artifact@v3
+      with:
+        name: dist
+        path: dist
+    - name: Collected dists
+      run: |
+        tree dist
+    - name: Publish
       env:
         TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
         TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
       run: |
-        python setup.py sdist bdist_wheel
         twine upload dist/*
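
The `build-wheels` job in the workflow above combines a base matrix (`os` by `qemu`) with three `include` entries. The sketch below shows how that expands into individual jobs. It assumes the documented GitHub Actions rule that an `include` entry is merged into an existing combination only when it does not overwrite an original matrix value, and otherwise becomes a new combination. `expand_matrix` is a hypothetical helper written for illustration, not part of the repository or of GitHub's tooling.

```python
from itertools import product
from typing import Dict, List


def expand_matrix(
    axes: Dict[str, List[str]], include: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Expand a workflow-style matrix into a flat list of job dicts."""
    keys = list(axes)
    # Cartesian product of the declared axes: the "original" combinations.
    base = [dict(zip(keys, combo)) for combo in product(*(axes[k] for k in keys))]
    jobs = [dict(job) for job in base]
    extra: List[Dict[str, str]] = []
    for entry in include:
        matched = False
        for original, job in zip(base, jobs):
            # The entry fits only if it agrees with every original axis value.
            if all(original[k] == v for k, v in entry.items() if k in original):
                job.update(entry)
                matched = True
        if not matched:
            # It would overwrite an original value, so it becomes a new job.
            extra.append(dict(entry))
    return jobs + extra


jobs = expand_matrix(
    {"os": ["ubuntu", "windows", "macos"], "qemu": [""]},
    [
        {"os": "ubuntu", "qemu": "aarch64"},
        {"os": "ubuntu", "qemu": "ppc64le"},
        {"os": "ubuntu", "qemu": "s390x"},
    ],
)
print(len(jobs))  # 6
for job in jobs:
    print(f"{job['os']}-latest", job["qemu"] or "(native)")
```

Under that assumption the job fans out to three native builds plus three emulated Ubuntu builds, which matches the comment in the workflow about splitting the Ubuntu job for speed.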
3 changes: 2 additions & 1 deletion .github/workflows/run-tests.yml
@@ -25,7 +25,8 @@ jobs:
         pip uninstall -y charset-normalizer
     - name: Install the package
       run: |
-        python setup.py install
+        python -m build --no-isolation
+        pip install ./dist/*.whl
     - name: Run tests
       run: |
         pytest
42 changes: 42 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,48 @@
All notable changes to charset-normalizer will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [3.0.0rc1](https://github.com/Ousret/charset_normalizer/compare/3.0.0b2...3.0.0rc1) (2022-10-18)

### Added
- Extend the capability of `explain=True`: when `cp_isolation` contains one or two entries, the Mess-detector results are logged in detail
- Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
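
The `language_threshold` entry above describes a minimum expected coherence ratio. The toy filter below illustrates that idea only. The `(language, ratio)` pairs and the `filter_languages` helper are hypothetical, the default of `0.1` is an assumption, and none of it reproduces the library's internals.

```python
from typing import List, Tuple


def filter_languages(
    candidates: List[Tuple[str, float]], language_threshold: float = 0.1
) -> List[Tuple[str, float]]:
    """Keep candidates whose coherence ratio meets the threshold, best first."""
    kept = [(lang, ratio) for lang, ratio in candidates if ratio >= language_threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Made-up ratios, purely for illustration.
candidates = [("English", 0.62), ("Dutch", 0.14), ("Indonesian", 0.05)]
print(filter_languages(candidates))       # English and Dutch survive
print(filter_languages(candidates, 0.5))  # only English survives
```

With the library itself, the parameter is passed by keyword to `from_bytes`, `from_path` or `from_fp`, for example `from_bytes(payload, language_threshold=0.2)`, where `payload` stands for any bytes object.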

### Changed
- Build with static metadata using 'build' frontend
- Make the language detection stricter

### Fixed
- CLI option `--normalize` failed when files were given as full paths
- `TooManyAccentuatedPlugin` induced false positives in mess detection when too few alphabetic characters had been fed to it
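
The accentuation fix above concerns a ratio that becomes unreliable on very small samples. A minimal sketch of that kind of guard follows. The helper names, the minimum of 8 alphabetic characters and the 0.35 cut-off are all assumptions chosen for illustration, and this is not the plugin's actual code.

```python
import unicodedata


def is_accentuated(ch: str) -> bool:
    """True when the canonical decomposition of ch contains a combining mark."""
    return any(unicodedata.combining(c) for c in unicodedata.normalize("NFD", ch))


def accentuation_ratio(text: str, min_alpha: int = 8, threshold: float = 0.35) -> float:
    """Share of accentuated letters, reported only when the sample is large enough."""
    alpha = [ch for ch in text if ch.isalpha()]
    if len(alpha) < min_alpha:
        # Too few letters: a single accent would dominate the ratio.
        return 0.0
    ratio = sum(is_accentuated(ch) for ch in alpha) / len(alpha)
    return ratio if ratio >= threshold else 0.0


print(accentuation_ratio("éà"))                # 0.0 (sample too small to judge)
print(accentuation_ratio("éèàùâêîôû"))         # 1.0
print(accentuation_ratio("plain ascii text"))  # 0.0
```

Without the sample-size guard, the two-letter input would score 1.0 and be flagged as suspicious, which is the kind of false positive the entry describes.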

### Removed
- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead

## [3.0.0b2](https://github.com/Ousret/charset_normalizer/compare/3.0.0b1...3.0.0b2) (2022-08-21)

### Added
- `normalizer --version` now specifies whether the current version provides the extra speedup (i.e. a mypyc-compiled wheel)

### Removed
- Breaking: Methods `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)
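
The UTF-7 entry above notes a conflict with ASCII. The short demonstration below uses only Python's built-in codecs, not charset_normalizer, to show why: a UTF-7 payload consists entirely of ASCII bytes, so the same bytes decode without error under both codecs and nothing in the byte stream distinguishes them. The sample payload is made up for illustration.

```python
payload = b"caf+AOk- au lait"

# Both decodings succeed; only the interpretation of "+AOk-" differs.
as_ascii = payload.decode("ascii")
as_utf7 = payload.decode("utf_7")

print(as_ascii)  # caf+AOk- au lait
print(as_utf7)   # café au lait

# Text without any "+...-" sequence is byte-for-byte identical in both.
plain = b"Hello, world"
print(plain.decode("ascii") == plain.decode("utf_7"))  # True
```

Because plain ASCII text is also valid UTF-7, reporting UTF-7 without an explicit signature would amount to a guess.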

### Fixed
- Sphinx warnings when generating the documentation

## [3.0.0b1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...3.0.0b1) (2022-08-15)

### Changed
- Optional: Module `md.py` can be compiled using Mypyc to provide an extra speedup, up to 4x faster than v2.1

### Removed
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function `normalize`
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`

## [2.1.1](https://github.com/Ousret/charset_normalizer/compare/2.1.0...2.1.1) (2022-08-19)

### Deprecated