keycollator
Compares text in a file to a reference, glossary, key-items, or dictionary file.[1][2]
Built by David Rush, fueled by coffee.
keycollator #.#.# Pypi Project Description
- Structure
- Features
- Installation
- Documentation
- Supported File Formats
- Usage
- Example Output
- Todo
- Project Resource Acknowledgements
- Deployment Features
- Releases
- License
- Citation
- Additional Information
.
│
├── assets
│   └── images
│       └── coverage.svg
│
├── docs
│   ├── cli.md
│   └── index.md
│
├── src
│   ├── __init__.py
│   ├── cli.py
│   ├── keycollator.py
│   ├── extractfile.py
│   ├── threadanalysis.py
│   ├── extractonator.py
│   ├── requirements.txt
│   └── data
│       ├── (placeholder)
│       └── (placeholder)
│
├── tests
│   └── test_keycollator
│       ├── __init__.py
│       └── test_keycollator.py
│
├── COD_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── Makefile
├── pyproject.toml
├── README.README
├── README.rst
├── setup.cfg
└── setup.py
- Extract text from file to dictionary
- Extract keys from file to dictionary
- Find matches of keys in text file
- Apply fuzzy matching
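The first three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not keycollator's actual implementation: the names `load_items` and `count_exact_matches` are hypothetical, the inputs are inline strings rather than files, and matching is a plain case-insensitive substring count.

```python
from collections import Counter


def load_items(raw):
    """Map line numbers to normalized (lowercase, stripped) non-empty lines."""
    items = {}
    for number, line in enumerate(raw.splitlines(), start=1):
        cleaned = line.strip().lower()
        if cleaned:
            items[number] = cleaned
    return items


def count_exact_matches(text_items, key_items):
    """Count how many times each key occurs as a substring of the text lines."""
    counts = Counter()
    for key in key_items.values():
        hits = sum(line.count(key) for line in text_items.values())
        if hits:
            counts[key] = hits
    return counts


# Hypothetical sample data standing in for the text file and the key file.
text_items = load_items("Manage the project\n\nDevelop and manage reports\n")
key_items = load_items("manage\nreport\npython\n")
print(count_exact_matches(text_items, key_items))
```

Keys with no occurrences are simply left out of the result, which keeps the output focused on items that actually appear in the text.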
https://pypi.org/project/keycollator/
python3 -m pip install --upgrade keycollator
Official documentation can be found here:
https://github.com/davidprush/keycollator/tree/main/docs
- TXT/CSV files (Mac/Linux/Win)
- Plans to add PDF and JSON
from keycollator.customlogger import CustomLogger as cl
from keycollator.proceduretimer import ProcedureTimer as pt
click >= 8.0.2
datetime >= 4.7
fuzzywuzzy >= 0.18.0
halo >= 0.0.31
nltk >= 3.7
pytest >= 7.1.3
python-Levenshtein >= 0.12.2
termtables >= 0.2.4
joblib >= 1.2.0
keycollator uses command-line options to change its default parameters and behavior.
Usage: keycollator.py [OPTIONS] COMMAND [ARGS]...

  keycollator is an app that finds keys in a text file.

Options:
  -t, --text-file PATH          Path/file name of the text to be searched for
                                against items in the key file
  -k, --key-file PATH           Path/file name of the key file containing a
                                dictionary, key items, glossary, or reference
                                list used to search the text file
  -r, --result-file PATH        Path/file name of the output file that
                                will contain the results (CSV or TXT)
  --limit-result TEXT           Limit the number of results
  --abreviate INTEGER           Limit the text length of the results
                                (default=32)
  --fuzz-ratio INTEGER RANGE    Set the level of fuzzy matching (default=99)
                                to validate matches using approximations/edit
                                distances, uses acceptance ratios with integer
                                values from 0 to 99, where 99 is nearly
                                identical and 0 is not similar  [0<=x<=99]
  --ubound-limit INTEGER RANGE  Ignores items from the results with matches
                                greater than the upper boundary (upper-limit);
                                reduce eroneous matches  [1<=x<=99999]
  --lbound-limit INTEGER RANGE  Ignores items from the results with matches
                                less than the lower boundary (lower-limit);
                                reduce eroneous matches  [0<=x<=99999]
  -v, --verbose                 Turn on verbose
  -l, --logging                 Turn on logging
  -L, --log-file PATH           Path/file name to be used for the log file
  --help                        Show this message and exit.
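For scripted runs, the options listed in the help text can be assembled into a command line from Python. The sketch below is only an illustration: `build_command` and the file names are hypothetical, only flags shown in the help output are used, and the help text does not say whether a subcommand (`COMMAND`) must also be supplied, so none is added here.

```python
import shlex


def build_command(text_file, key_file, result_file, fuzz_ratio=99, verbose=False):
    """Assemble a keycollator argument list from options shown in the help text."""
    if not 0 <= fuzz_ratio <= 99:
        raise ValueError("fuzz_ratio must be between 0 and 99")
    argv = [
        "keycollator",
        "--text-file", text_file,
        "--key-file", key_file,
        "--result-file", result_file,
        "--fuzz-ratio", str(fuzz_ratio),
    ]
    if verbose:
        argv.append("--verbose")
    return argv


# Hypothetical file names; print the command instead of running it.
argv = build_command("text.txt", "keys.txt", "results.csv", fuzz_ratio=90, verbose=True)
print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run(argv, check=True)` would execute the command, provided keycollator is installed and on the `PATH`.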
Currently provides only one verbosity level; future versions will implement multiple levels (DEBUG, INFO, WARN, etc.).
keycollator --verbose
Fuzzy matching uses approximate matches (edit distances): 0 is the least strict and accepts nearly anything as a match, while 99 is the strictest and accepts only nearly identical matches. By default the app uses level 99, and only if regular matching finds no matches.
keycollator --fuzz-ratio=[0-99]
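The fallback behaviour described above (exact matching first, fuzzy matching only when nothing is found) can be sketched as follows. This is an assumption-laden illustration rather than the project's code: `similarity` and `match_count` are hypothetical names, and the standard library's `difflib.SequenceMatcher` stands in for the fuzzywuzzy scorer listed in the requirements, so the exact scores will differ from the real tool.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score (difflib stand-in, not fuzzywuzzy)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def match_count(key, words, fuzz_ratio=99):
    """Count exact matches; fall back to fuzzy matching only if there are none."""
    exact = sum(1 for word in words if word.lower() == key.lower())
    if exact:
        return exact
    return sum(1 for word in words if similarity(word, key) >= fuzz_ratio)


# Hypothetical text containing a misspelling of the key "report".
words = "please reprot the status in the weekly summary".split()
strict = match_count("report", words, fuzz_ratio=99)
lenient = match_count("report", words, fuzz_ratio=80)
print(strict, lenient)
```

With the strict default the misspelled word is rejected, while a lower ratio accepts it; lowering the ratio further also increases the risk of unrelated words being counted.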
Each line of the key file represents a key that is matched against items in the text file.
keycollator --key-file="/path/to/key/file/keys.txt"
Text file in which each line represents an item that is compared with the items in the key file.
keycollator --text-file="/path/to/key/file/text.txt"
Results are currently written as CSV; additional file formats (PDF/JSON/DOCX) will be added in future releases.
keycollator --result-file="/path/to/results/result.csv"
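As a rough idea of what a CSV result file could contain, the sketch below writes item/count pairs with Python's `csv` module. The column names `item` and `matches`, the function name `write_results`, and the sort order are assumptions for illustration; the actual layout of keycollator's output file is not documented here.

```python
import csv
import io


def write_results(counts, handle):
    """Write item/count rows as CSV to an open text handle, highest count first."""
    writer = csv.writer(handle)
    writer.writerow(["item", "matches"])  # hypothetical column names
    for item, n in sorted(counts.items(), key=lambda pair: pair[1], reverse=True):
        writer.writerow([item, n])


# An in-memory buffer keeps the example free of side effects.
buffer = io.StringIO()
write_results({"develop": 62, "manage": 73}, buffer)
print(buffer.getvalue())
```

To write to disk instead, open the target with `open(path, "w", newline="")` and pass that handle to `write_results`.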
Limit the number of results
keycollator --limit-result=30
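Limiting the number of results and shortening long item names (the `--abreviate` option in the help text) could look roughly like this. The helper names are hypothetical, and the truncation rule (cut to the width and mark the cut with `*`) is inferred from the example output further down, not taken from the source code.

```python
from collections import Counter


def abbreviate(text, width=32):
    """Truncate text to at most width characters, marking the cut with '*'."""
    return text if len(text) <= width else text[: width - 1] + "*"


def top_results(counts, limit=None, width=32):
    """Return (item, count) pairs, most frequent first, capped at limit."""
    ranked = Counter(counts).most_common(limit)
    return [(abbreviate(item, width), n) for item, n in ranked]


# Hypothetical counts for illustration.
counts = {"manage": 73, "develop": 62, "report": 58, "business administration": 3}
print(top_results(counts, limit=2))
print(abbreviate("business administration", width=15))
```

`Counter.most_common(None)` returns every item, so leaving `limit` unset keeps the full ranked list.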
Rejects items whose match count exceeds the given integer value; this helps reduce erroneous matches when using fuzzy matching.
keycollator --ubound-limit=[1-99999]
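The effect of the upper and lower bounds can be illustrated with a small filter. This is a sketch, assuming both bounds are inclusive (the help text says items greater than the upper bound or less than the lower bound are ignored); `apply_bounds` is a hypothetical name and the counts are borrowed from the example output.

```python
def apply_bounds(counts, lbound=0, ubound=99999):
    """Keep only items whose match count lies within [lbound, ubound]."""
    return {item: n for item, n in counts.items() if lbound <= n <= ubound}


# A one-letter key such as "r" matches almost everything and inflates the results.
counts = {"r": 536, "manage": 73, "develop": 62, "flexible": 1}
filtered = apply_bounds(counts, lbound=2, ubound=100)
print(filtered)
```

Here the upper bound drops the suspiciously frequent `r`, and the lower bound drops items that matched only once.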
Turns on logging; if no log file is supplied by the user, one is created using the default name log.log.
keycollator --logging
Sets the name of the log file to be used by logging.
keycollator --log-file="/path/to/log/file/log.log"
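The logging behaviour described above, with `log.log` as the fallback file name, can be approximated with the standard `logging` module. This is a sketch only: `configure_logging`, the logger name, and the message format are assumptions, and it does not use keycollator's own `CustomLogger` class, whose interface is not shown in this document.

```python
import logging


def configure_logging(enabled, log_file=None):
    """Return a logger that writes to log_file, or to log.log when none is given."""
    logger = logging.getLogger("keycollator-example")  # hypothetical logger name
    logger.handlers.clear()
    if not enabled:
        logger.addHandler(logging.NullHandler())
        return logger
    # delay=True postpones creating the file until the first record is written.
    handler = logging.FileHandler(log_file or "log.log", delay=True)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


logger = configure_logging(enabled=True)
print(logger.handlers[0].baseFilename)
```

Calling `logger.info("...")` afterwards would create the file and append a timestamped line to it.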
python3 src/keycollator.py
Analyzing text for keys...
100%|██████████| 679/679 [00:51<00:00, 13.31it/s]
1.r [536] 51.conduct [7] 101.connect [3] 151.assist develo*[1]
2.manage [73] 52.establish [7] 102.determine [3] 152.assist tracki*[1]
3.develop [62] 53.execute [7] 103.facilitate [3] 153.capture speci*[1]
4.report [58] 54.follow [7] 104.foster [3] 154.conduct code *[1]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
47.finance [8] 97.business admin*[3] 147.advise sponso*[1] 197.flexible [1]
48.powerpoint [8] 98.attention deta*[3] 148.advocate [1] 198.creative [1]
49.build [7] 99.python [3] 149.align documen*[1] 199.selfmotivated [1]
50.complete [7] 100.collaborate [3] 150.analyze under*[1] 200.difference la*[1]
[0.00]seconds
- Fix pylint errors
- Add command line option to add a stopwords file
- Fix all CLI options
- Add comments
- Refactor code and remove redundancies
- Add proper error handling
- Add CHANGELOG.md
- Create method to KeyKrawler to select and _create missing files_
- Update CODE_OF_CONDUCT.md
- Update CONTRIBUTING.md
- GitHub: issue and PR templates
- Workflow Automation
- Makefile Usage
- Dockerfile
- @dependabot configuration
- Release Drafter (release-drafter.yml)
Feature | Notes |
---|---|
GitHub | Issue and PR templates |
Workflows | Automate your workflow from idea to production |
Makefile-usage | Makefile Usage |
Dockerfile | Docker Library: Python |
@dependabot | Configuring Dependabot version updates |
Release Drafter | release-drafter.yml |
Release | Version | Status |
---|---|---|
Current: | 0.0.5 | Working |
Version | Notes |
---|---|
0.0.1 | Initial prototype |
0.0.2 | Bug fixes |
0.0.4 | Fixed functions/methods |
0.0.5 | Fixed functions/methods |
This project is licensed under the terms of the MIT license. See LICENSE for more details.
@misc{keycollator,
author = {David Rush},
title = {Compares text in a file to reference/glossary/key-items/dictionary file.},
year = {2022},
publisher = {Rush Solutions, LLC},
journal = {GitHub repository},
howpublished = {\url{https://github.com/davidprush/keycollator}}
}
- The latest version of this document can be found here; if you are viewing it there (via HTTPS), you can download the Markdown/reStructuredText source here.
- You can contact the author via e-mail.