Fix line endings from CRLF to LF (#67)
frenzymadness committed Jul 23, 2021
1 parent 417d33f commit bdb91cd
Showing 10 changed files with 1,027 additions and 1,027 deletions.
484 changes: 242 additions & 242 deletions README.md

162 changes: 81 additions & 81 deletions docs/advanced_search.rst
@@ -1,81 +1,81 @@
Advanced Search
===============

The Charset Normalizer methods ``from_bytes``, ``from_fp`` and ``from_path`` accept several
optional parameters that can be tweaked.

As follows::

    from charset_normalizer import from_bytes

    my_byte_str = '我没有埋怨,磋砣的只是一些时间。'.encode('gb18030')

    results = from_bytes(
        my_byte_str,
        steps=10,  # Number of steps/blocks to extract from my_byte_str
        chunk_size=512,  # Size of each extracted block
        threshold=0.2,  # Maximum amount of chaos allowed on first pass
        cp_isolation=None,  # Finite list of encodings to use when searching for a match
        cp_exclusion=None,  # Finite list of encodings to avoid when searching for a match
        preemptive_behaviour=True,  # Whether to look into my_byte_str (ASCII mode) for a pre-defined encoding
        explain=False  # Print on screen what is happening while searching for a match
    )
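
To make ``cp_isolation`` and ``cp_exclusion`` more concrete, here is a toy, standard-library-only sketch of how an isolation list and an exclusion list might narrow the candidate encodings before trial decoding. The function ``toy_candidates`` and the ``DEFAULT_CANDIDATES`` pool are hypothetical; this is not charset_normalizer's internal algorithm, which also measures chaos and coherence instead of relying on strict decoding alone.

```python
import codecs

# Hypothetical default pool of candidate encodings (not the library's real list).
DEFAULT_CANDIDATES = ["ascii", "utf_8", "gb18030", "cp1252", "latin_1"]


def toy_candidates(payload, cp_isolation=None, cp_exclusion=None):
    """Return the candidate encodings that survive filtering and decode payload."""

    def norm(name):
        # Normalise aliases so that 'latin_1' and 'iso8859-1' compare equal.
        return codecs.lookup(name).name

    # An isolation list replaces the pool; an exclusion list removes entries from it.
    pool = cp_isolation if cp_isolation else DEFAULT_CANDIDATES
    excluded = {norm(n) for n in (cp_exclusion or [])}

    survivors = []
    for name in pool:
        if norm(name) in excluded:
            continue
        try:
            payload.decode(name)  # strict decoding: any invalid byte rejects the candidate
        except UnicodeDecodeError:
            continue
        survivors.append(name)
    return survivors


payload = "我没有埋怨,磋砣的只是一些时间。".encode("gb18030")
print(toy_candidates(payload))
print(toy_candidates(payload, cp_exclusion=["latin_1"]))
print(toy_candidates(payload, cp_isolation=["utf_8", "gb18030"]))
```

Isolating a short list is the cheapest way to speed up a search when the plausible encodings are already known.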


Using CharsetMatches
--------------------

Here, ``results`` is a ``CharsetMatches`` object. It behaves like a list but does not implement all list methods.
It is sorted from the start, so calling ``best()`` is sufficient to extract the most probable result.

.. autoclass:: charset_normalizer.CharsetMatches
    :members:

List behaviour
--------------

As mentioned earlier, a ``CharsetMatches`` object behaves like a list.

::

    # Calling len() on results also works
    if len(results) == 0:
        print('No match for your sequence')

    # Iterate over results like a list
    for match in results:
        print(match.encoding, 'can properly decode your sequence using', match.alphabets, 'and language', match.language)

    # Use an index to access results
    if len(results) > 0:
        print(str(results[0]))
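
The list-like behaviour described above comes from Python's sequence protocol. The following hypothetical ``ToyMatches`` class (not the real ``CharsetMatches``) shows how implementing ``__len__``, ``__iter__`` and ``__getitem__`` is enough for ``len()``, ``for`` loops and indexing to work, while methods such as ``append`` or ``sort`` remain unavailable. The ``score`` field, where lower means better, is an assumption made only for this illustration.

```python
from typing import Iterator, List, NamedTuple


class ToyMatch(NamedTuple):
    encoding: str
    score: float  # hypothetical: lower means a better match


class ToyMatches:
    """Minimal read-only sequence, sorted once at construction time."""

    def __init__(self, matches: List[ToyMatch]) -> None:
        # Sort on creation so that index 0 is always the most probable match.
        self._items = sorted(matches, key=lambda m: m.score)

    def __len__(self) -> int:
        return len(self._items)

    def __iter__(self) -> Iterator[ToyMatch]:
        return iter(self._items)

    def __getitem__(self, index: int) -> ToyMatch:
        return self._items[index]


results = ToyMatches([ToyMatch("cp1252", 0.4), ToyMatch("gb18030", 0.0)])
if len(results) == 0:
    print("No match for your sequence")
for match in results:
    print(match.encoding, match.score)
print(results[0].encoding)
```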

Using best()
------------

As said above, a ``CharsetMatches`` object behaves like a list, and it is sorted by default once it is returned by
``from_bytes``, ``from_fp`` or ``from_path``.

Calling ``best()`` returns the most probable result, i.e. the first entry of the list (index 0).
It returns a ``CharsetMatch`` object, or ``None`` if there are no results.

::

    result = results.best()

Calling first()
---------------

Identical to calling the method ``best()``.
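
The relationship between ``best()`` and ``first()`` can be pictured as follows: return the element at index 0 of an already-sorted sequence, or ``None`` when it is empty, with the second name bound to the same function. ``ToySortedResults`` is a hypothetical stand-alone sketch, not the library's source.

```python
from typing import List, Optional


class ToySortedResults:
    """Hypothetical container holding encodings already sorted by probability."""

    def __init__(self, encodings: List[str]) -> None:
        self._items = list(encodings)

    def best(self) -> Optional[str]:
        # The first entry (index 0) is the most probable; empty means no result.
        return self._items[0] if self._items else None

    # Same function object under a second name, so both calls behave identically.
    first = best


results = ToySortedResults(["gb18030", "cp1252"])
print(results.best(), results.first())
print(ToySortedResults([]).best())
```

Always check for ``None`` before using the returned value.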

Class aliases
-------------

``CharsetMatches`` is also available as ``CharsetDetector``, ``CharsetDoctor`` and ``CharsetNormalizerMatches``.
This can be useful if you prefer a shorter class name.
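
In Python, an alternative class name can be provided either by plain assignment or by an empty subclass. The sketch below uses hypothetical names and only illustrates the two options; it does not claim which one charset_normalizer uses.

```python
class ToyMatches:
    """Stand-in for a results container (hypothetical)."""

    def __init__(self, items):
        self.items = list(items)


# Option 1: plain assignment; both names refer to the very same class object.
ToyDetector = ToyMatches


# Option 2: an empty subclass; a distinct class that inherits all behaviour.
class ToyDoctor(ToyMatches):
    pass


a = ToyDetector(["utf_8"])
b = ToyDoctor(["utf_8"])
print(ToyDetector is ToyMatches, isinstance(b, ToyMatches), type(b) is ToyMatches)
```

Either way, code written against the original name keeps working with an instance created through the alias.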

Verbose output
--------------

You may want to understand why a specific encoding was not picked by charset_normalizer. To do so, set
``explain`` to ``True`` when calling ``from_bytes``, ``from_fp`` or ``from_path``.
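
The idea behind ``explain`` can be sketched with the standard ``logging`` module: when the flag is on, each rejected candidate is reported along with the reason. The function ``toy_detect`` and its messages are hypothetical; charset_normalizer's real verbose output has its own format and content.

```python
import logging

logger = logging.getLogger("toy_detect")


def toy_detect(payload, candidates, explain=False):
    """Return the first candidate that decodes payload strictly, or None."""
    if explain and not logger.handlers:
        # Attach a console handler only when explanations are requested.
        logger.addHandler(logging.StreamHandler())
    logger.setLevel(logging.DEBUG if explain else logging.WARNING)

    for name in candidates:
        try:
            payload.decode(name)
        except UnicodeDecodeError as exc:
            logger.debug("%s rejected: %s", name, exc.reason)
            continue
        logger.debug("%s accepted", name)
        return name
    return None


payload = "我没有埋怨,磋砣的只是一些时间。".encode("gb18030")
print(toy_detect(payload, ["ascii", "utf_8", "gb18030"], explain=True))
```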
40 changes: 20 additions & 20 deletions docs/miscellaneous.rst
@@ -1,20 +1,20 @@
==============
Miscellaneous
==============

Convert to str
--------------

Any ``CharsetMatch`` object can be converted to a usable ``str`` value.

::

    from charset_normalizer import CharsetNormalizerMatches as CnM

    my_byte_str = '我没有埋怨,磋砣的只是一些时间。'.encode('gb18030')

    # Assign the return value so the result can be fully exploited
    result = CnM.from_bytes(
        my_byte_str
    ).best()

    # This should print '我没有埋怨,磋砣的只是一些时间。'
    print(str(result))
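
Converting a match to ``str`` amounts to decoding the original bytes with the detected encoding. The hypothetical stand-in class below (not the real ``CharsetMatch``) makes that mechanism explicit through ``__str__``.

```python
class ToyMatch:
    """Hypothetical stand-in pairing raw bytes with a detected encoding."""

    def __init__(self, payload: bytes, encoding: str) -> None:
        self.payload = payload
        self.encoding = encoding

    def __str__(self) -> str:
        # str(match) decodes the raw bytes with the detected encoding.
        return self.payload.decode(self.encoding)


text = "我没有埋怨,磋砣的只是一些时间。"
result = ToyMatch(text.encode("gb18030"), "gb18030")
print(str(result))
```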