Commit

Merge pull request #55 from 5p1n/umlauts
Add new --convert-umlauts parameter
digininja committed Nov 28, 2019
2 parents 371ae86 + a575a36 commit 456ccdb
Showing 2 changed files with 10 additions and 0 deletions.
1 change: 1 addition & 0 deletions README
@@ -30,6 +30,7 @@ Change Log
Version 5.4.4
-------------
Added the --lowercase parameter to convert all letters to lower case
Added the --convert-umlauts parameter to convert Latin-1 umlauts (e.g. "ä" to "ae", "ö" to "oe", etc.)

Version 5.4.3
-------------
9 changes: 9 additions & 0 deletions cewl.rb
@@ -476,6 +476,7 @@ def push(value)
['--email_file', GetoptLong::REQUIRED_ARGUMENT],
['--lowercase', GetoptLong::NO_ARGUMENT],
['--with-numbers', GetoptLong::NO_ARGUMENT],
['--convert-umlauts', GetoptLong::NO_ARGUMENT],
['--meta', "-a", GetoptLong::NO_ARGUMENT],
['--email', "-e", GetoptLong::NO_ARGUMENT],
['--count', '-c', GetoptLong::NO_ARGUMENT],
@@ -507,6 +508,7 @@ def usage
-n, --no-words: Don't output the wordlist.
--lowercase: Lowercase all parsed words
--with-numbers: Accept words with numbers in as well as just letters
--convert-umlauts: Convert common ISO-8859-1 (Latin-1) umlauts (ä-ae, ö-oe, ü-ue, ß-ss)
-a, --meta: include meta data.
--meta_file file: Output file for meta data.
-e, --email: Include email addresses.
@@ -554,6 +556,7 @@ def usage
keep = false
lowercase = false
words_with_numbers = false
convert_umlauts = false
show_count = false
auth_type = nil
auth_user = nil
@@ -580,6 +583,8 @@ def usage
lowercase = true
when "--with-numbers"
words_with_numbers = true
when "--convert-umlauts"
convert_umlauts = true
when "--count"
show_count = true
when "--meta-temp-dir"
@@ -982,6 +987,10 @@ def usage
words.gsub!(/[^[[:alpha:]]]/i, " ")
end

if convert_umlauts then
words.gsub!(/[äöüßÄÖÜ]/, "ä" => "ae", "ö" => "oe", "ü" => "ue", "ß" => "ss", "Ä" => "Ae", "Ö" => "Oe", "Ü" => "Ue")
end

# Add to the array
words.split(" ").each do |word|
if word.length >= min_word_length
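
The hash form of `gsub!` added in the last hunk can be tried in isolation. What follows is a minimal sketch rather than code from cewl.rb: the `UMLAUT_MAP` constant and the `convert_umlauts` method are hypothetical names chosen for illustration, and the sketch assumes UTF-8 source and input strings.

```ruby
# Hypothetical standalone sketch of the substitution added in this commit.
# UMLAUT_MAP and convert_umlauts are illustrative names, not part of cewl.rb.
UMLAUT_MAP = {
  "ä" => "ae", "ö" => "oe", "ü" => "ue", "ß" => "ss",
  "Ä" => "Ae", "Ö" => "Oe", "Ü" => "Ue"
}.freeze

# Returns a copy of text in which every character matched by the
# character class is replaced by its value in UMLAUT_MAP.
def convert_umlauts(text)
  text.gsub(/[äöüßÄÖÜ]/, UMLAUT_MAP)
end

puts convert_umlauts("Äpfel größer über Öl")
```

When the second argument to `gsub` is a hash, each match is looked up as a key and replaced by the corresponding value, so a single pass covers all seven characters. Upper-case umlauts map to a capital followed by a lower-case "e", which means an all-caps word such as "ÜBER" would come out as "UeBER" in this sketch.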
