Merge branch 'release/v0.1.3'
* release/v0.1.3:
  Bump version from 0.1.2 to 0.1.3
  Fix: don't add empty file extension
  Fix: borrowed value does not live long enough
  Bump indicatif from 0.15.0 to 0.16.0
  Typo
  Add dependabot
  Update dependencies
  Fix links in README.md
  Add help output to readme
  Make the readme example less confusing
MichaelSasser committed May 4, 2021
2 parents dc3bd57 + eee2d83 commit 882b125
Showing 5 changed files with 128 additions and 96 deletions.
32 changes: 32 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,32 @@
# wordlist-dedup
# Copyright (c) 2021 Michael Sasser <Michael@MichaelSasser.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

---

version: 2
updates:

  # Maintain dependencies for GitHub Actions
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "daily"

  # Maintain dependencies for rust
  - package-ecosystem: "cargo"
    directory: "/"
    schedule:
      interval: "daily"
144 changes: 61 additions & 83 deletions Cargo.lock

Some generated files are not rendered by default.

6 changes: 3 additions & 3 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "wordlist-dedup"
version = "0.1.2"
version = "0.1.3"
authors = ["Michael Sasser <Michael@MichaelSasser.org>"]
edition = "2018"
license = "GPL-3.0+"
@@ -10,5 +10,5 @@ publish = false
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
indicatif = "0.15.0"
clap = "2.33.1"
indicatif = "0.16.0"
clap = "2.33.3"
34 changes: 26 additions & 8 deletions README.md
@@ -5,7 +5,7 @@ wordlist-dedup is a program written in rust to deduplicate wordlists. Duh.
I tried to deduplicate lines of a huge wordlist (>80 GB) with GNU/coreutils
`uniq`. First everything seemed to be hunky dory. Before I deleted the original
file I spotted the size of the deduplicated. It was about half of the original.
In the firsthand I suspected about 5 % duplicates duplicates.
In the firsthand I suspected about 5 % duplicates.

To check this, I wrote a program to count the duplicates and Bingo! The
original file had just a smidgen over 3 % of duplicates.
@@ -23,14 +23,32 @@ wordlist-dedup.

## Command line tool

```commandline
wordlist-dedup --help
wordlist-dedup 0.1.2
Michael Sasser <Michael@MichaelSasser.org>
Deduplicate presorted wordlists.
USAGE:
wordlist-dedup <SRC> [DEST]
FLAGS:
-h, --help Prints help information
-V, --version Prints version information
ARGS:
<SRC> The presorted source file, wich may contains duplicated lines
<DEST> The destination file, to write the deduplicated file to
```

wordlist-dedup as a pure commandline tool. Keep in mind, the file must be
sorted before running it. You can use GNU/coreutils `sort`, which does a fine
job, even, when the RAM is limited. This means, the file can be larger then
job, even, when the RAM is limited. This means, the file can be larger than
the available RAM. wordlist-dedup does barely use any RAM.
You can use it to deduplicate a file like:

```
$ wordlist-dedup some_file_with_dups.txt some_file_without_dups.txt
```commandline
$ wordlist-dedup some_file_with_dups.txt new_file_to_write_to.txt
⠏ Done. Found 410 duplicates.
```
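
Why a presorted file lets the tool get away with almost no RAM can be sketched in a few lines of Rust. This is a minimal illustration and not the code from this repository: the function name `dedup_sorted` is hypothetical, and it assumes the input is valid UTF-8 with one word per line. Because duplicates in a sorted file are always adjacent, only the previously written line has to be kept in memory.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper: copies `src` to `dest` and drops every line that is
/// equal to the line directly before it. Returns the number of dropped lines.
fn dedup_sorted<R: BufRead, W: Write>(src: R, mut dest: W) -> io::Result<u64> {
    // Only the last written line is remembered, never the whole file.
    let mut previous: Option<String> = None;
    let mut duplicates = 0;

    for line in src.lines() {
        let line = line?; // assumption: every line is valid UTF-8
        if previous.as_deref() == Some(line.as_str()) {
            duplicates += 1;
        } else {
            writeln!(dest, "{}", line)?;
            previous = Some(line);
        }
    }
    Ok(duplicates)
}

fn main() -> io::Result<()> {
    // A tiny presorted "wordlist" held in memory for demonstration.
    let input = "apple\napple\nbanana\ncherry\ncherry\ncherry\n";
    let mut output = Vec::new();

    let duplicates = dedup_sorted(input.as_bytes(), &mut output)?;

    print!("{}", String::from_utf8_lossy(&output));
    eprintln!("Found {} duplicates.", duplicates);
    Ok(())
}
```

For real files, the same function would take a `BufReader` around the source file and a `BufWriter` around the destination file. On unsorted input it only removes duplicates that happen to be adjacent, which is why the sort step matters.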

@@ -41,8 +59,8 @@ duplicate line.
If you use it with only one argument like `file.ext`, it will name the
outputfile `file_uniq.ext`.
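
The default naming can be sketched with `std::path`. This is an illustration under stated assumptions, not the implementation from this repository: the helper name `default_dest` is hypothetical, and the choice to skip the dot when there is no (or an empty) extension is my reading of the behaviour described above.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: derives `file_uniq.ext` from `file.ext`.
/// The dot is only added when the source has a non-empty extension.
fn default_dest(src: &Path) -> PathBuf {
    let stem = src.file_stem().unwrap_or_default().to_string_lossy();
    let mut name = format!("{}_uniq", stem);

    if let Some(ext) = src.extension().filter(|ext| !ext.is_empty()) {
        name.push('.');
        name.push_str(&ext.to_string_lossy());
    }

    // Keeps the parent directory and swaps only the file name.
    src.with_file_name(name)
}

fn main() {
    for src in ["file.ext", "lists/words.txt", "words"].iter() {
        println!("{} -> {}", src, default_dest(Path::new(src)).display());
    }
}
```

Non-UTF-8 file names are converted lossily in this sketch; a more careful version would build the name from `OsString` parts instead.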

Keep in mind, it was made for one job, to sort wordlists. It might work in
other scenarios.
Keep in mind, it was made for one job, to deduplicate sorted wordlists.
It might work in different scenarios.

If you like to use my scripts to dedup as many files as you like in one folder
check out my
@@ -70,8 +88,8 @@ This repository uses the
branching model by [Vincent Driessen](https://nvie.com/about/).
It has two branches with infinite lifetime:

* [master](https://github.com/MichaelSasser/matrixctl/tree/master)
* [develop](https://github.com/MichaelSasser/matrixctl/tree/develop)
* [master](https://github.com/MichaelSasser/wordlist-dedup/tree/master)
* [develop](https://github.com/MichaelSasser/wordlist-dedup/tree/develop)

The master branch gets updated on every release. The develop branch is the
merging branch.