New content hashes and default file names #314

Merged: 12 commits on Apr 13, 2023

adbar (Owner) commented Mar 10, 2023

TODO:

  • content to be hashed: title & main text
  • better string tokenization
  • clean up
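
The first two TODO items can be illustrated with a minimal sketch. It assumes a plain word-level tokenizer and uses SHA-256 as a stand-in hash; the function names `tokenize` and `content_fingerprint` are hypothetical and are not taken from this pull request, which may use a different hashing scheme.

```python
import hashlib
import re
from typing import List

# Hypothetical helpers for illustration only; not the code merged in this PR.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return TOKEN_RE.findall(text.lower())


def content_fingerprint(title: str, main_text: str) -> str:
    """Hash the normalized title and main text into a hex digest.

    SHA-256 is an assumption made for this sketch.
    """
    tokens = tokenize(title) + tokenize(main_text)
    normalized = " ".join(tokens)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Because the input is normalized before hashing, differences in case, punctuation, or whitespace do not change the digest, while any change to the words themselves does.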

adbar marked this pull request as draft on March 10, 2023, 16:24.

codecov-commenter commented Mar 14, 2023

Codecov Report

Merging #314 (915a7c0) into master (55237a7) will increase coverage by 0.06%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #314      +/-   ##
==========================================
+ Coverage   96.35%   96.42%   +0.06%     
==========================================
  Files          21       22       +1     
  Lines        3268     3329      +61     
==========================================
+ Hits         3149     3210      +61     
  Misses        119      119              

| Impacted Files | Coverage Δ | |
|---|---|---|
| trafilatura/filters.py | 97.26% <ø> (-0.28%) | ⬇️ |
| trafilatura/cli.py | 94.73% <100.00%> (+0.03%) | ⬆️ |
| trafilatura/cli_utils.py | 90.24% <100.00%> (-0.10%) | ⬇️ |
| trafilatura/core.py | 98.10% <100.00%> (+<0.01%) | ⬆️ |
| trafilatura/hashing.py | 100.00% <100.00%> (ø) | |
| trafilatura/meta.py | 100.00% <100.00%> (ø) | |

adbar marked this pull request as ready for review on April 13, 2023, 12:17.
adbar merged commit 56421cc into master on Apr 13, 2023 (12 of 15 checks passed).
adbar deleted the content_hashes branch on April 13, 2023, 13:21.