# Word Count

Given a phrase, count the occurrences of each _word_ in that phrase.

For the purposes of this exercise you can expect that a _word_ will always be one of:

1. A _number_ composed of one or more ASCII digits (e.g. "0" or "1234"), OR
2. A _simple word_ composed of one or more ASCII letters (e.g. "a" or "they"), OR
3. A _contraction_ of two _simple words_ joined by a single apostrophe (e.g. "it's" or "they're")
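
The three word forms above can be written as a single regular expression. This is only a sketch under the exercise's ASCII assumption; `WORD_PATTERN` and `iswordform` are hypothetical names used for illustration.

```julia
# Hypothetical pattern: all digits, or letters optionally followed by
# one apostrophe and more letters (a contraction).
const WORD_PATTERN = r"^(?:[0-9]+|[A-Za-z]+(?:'[A-Za-z]+)?)$"

# True when `s` is exactly one word under the three forms listed above.
iswordform(s::AbstractString) = occursin(WORD_PATTERN, s)

iswordform("1234")     # number
iswordform("they")     # simple word
iswordform("they're")  # contraction
iswordform("'large'")  # false: the quotes are punctuation, not part of the word
```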

When counting words you can assume the following rules:

1. The count is _case insensitive_ (e.g. "You", "you", and "YOU" are 3 uses of the same word)
2. The count is _unordered_; the tests will ignore how words and counts are ordered
3. Other than the apostrophe in a _contraction_, all forms of _punctuation_ are ignored
4. The words can be separated by _any_ form of whitespace (e.g. "\t", "\n", " ")
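
Each rule maps onto a small piece of Julia's standard library. The snippet below is a rough illustration rather than a solution; the variable names are made up for this example.

```julia
# Rule 1: fold case before comparing.
same_word = lowercase("YOU") == lowercase("you")

# Rule 2: Dict equality ignores insertion order.
unordered = Dict("a" => 1, "b" => 2) == Dict("b" => 2, "a" => 1)

# Rule 3: strip surrounding punctuation; an apostrophe inside the word stays.
quoted = strip("'large'", ['\'', '.', ',', '!'])
contraction = strip("don't!", ['\'', '.', ',', '!'])

# Rule 4: `split` with no delimiter splits on any run of whitespace.
pieces = split("one\ttwo\nthree  four")
```

Here `quoted` is `"large"`, `contraction` is `"don't"`, and `pieces` has four elements.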

For example, for the phrase `"That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled.` the count would be:

```text
that's: 1
the: 2
password: 2
123: 1
cried: 1
special: 1
agent: 1
so: 1
i: 1
fled: 1
```
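
The example above can be checked with a short single-pass counter. This is one possible sketch, not the reference solution; `countwords` is a hypothetical name, and the pattern assumes ASCII-only words as the exercise states.

```julia
# Hypothetical helper: count words in one pass over the regex matches.
function countwords(phrase::AbstractString)
    counts = Dict{String,Int}()
    # Lowercase first, then match numbers, simple words, or contractions.
    for m in eachmatch(r"[0-9]+|[a-z]+(?:'[a-z]+)?", lowercase(phrase))
        word = String(m.match)
        counts[word] = get(counts, word, 0) + 1
    end
    return counts
end

phrase = "\"That's the password: 'PASSWORD 123'!\", cried the Special Agent.\nSo I fled."
counts = countwords(phrase)
counts["password"]  # 2
counts["the"]       # 2
```

Using `get` with a default of `0` avoids rescanning the word list for every distinct word.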

## Source

This is a classic toy problem, but we were reminded of it by seeing it in the Go Tour.

## Version compatibility
This exercise has been tested on Julia versions >=1.0.

## Submitting Incomplete Solutions
It's possible to submit an incomplete solution so you can see how others have completed the exercise.

## Your solution

In [180]:
# first try
"""
        wordcount(sentence)

Returns a dictionary containing recognized words and their count.
"""
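function wordcount(sentence)
    d = Dict()
    # Split on separator runs: at least one character that is not a letter, digit,
    # underscore, or apostrophe, followed by any further non-word characters.
    words = split(strip(lowercase(sentence)), r"([^a-zA-Z0-9_']+\W*)+")
    # Drop quoting apostrophes around a word; apostrophes inside contractions stay.
    words = map(word -> strip(word,only("'")), words)
    # Count each distinct word by rescanning the whole list.
    foreach(word -> d[word] = count(w -> word == w, words), unique(words))
    # Splitting and stripping can leave empty strings, which are not words.
    return delete!(d,"")
end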

wordcount

In [207]:
# submit
"""
        wordcount(sentence)

Returns a dictionary containing recognized words and their count.
"""
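function wordcount(sentence)
    d = Dict()
    # Match runs of word characters; an apostrophe is kept only when a word character follows it.
    words = [lowercase(word.match) for word ∈ eachmatch(r"(\w+('\w)*)+", sentence)]
    # Count each distinct word by rescanning the list of matches.
    foreach(word -> d[word] = count(w -> word == w, words), unique(words))
    return d
end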

wordcount
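
One detail of the pattern in the submitted cell: `\w` also matches underscores, which the exercise does not list as word characters. The comparison below is a sketch; `tokens` and `strict` are hypothetical names, and `strict` is an assumed ASCII-only alternative rather than part of the submission.

```julia
# Pattern copied from the submitted cell.
submitted = r"(\w+('\w)*)+"
# Assumed stricter alternative: ASCII digits, or ASCII letters with an optional contraction.
strict = r"[0-9]+|[A-Za-z]+(?:'[A-Za-z]+)?"

# Hypothetical helper: lowercase every match of `re` in `s`.
tokens(re, s) = [lowercase(m.match) for m in eachmatch(re, s)]

loose_tokens = tokens(submitted, "snake_case don't")   # keeps "snake_case" as one word
strict_tokens = tokens(strict, "snake_case don't")     # splits at the underscore
```

The test suite never feeds in underscores, so both patterns give the same counts on it.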

In [205]:
# # match(r"([a-zA-Z]+'[a-zA-Z]+)", "First: don't laugh. Then: don't cry.")
# s = "         First: don't laugh. 'lol'  8 000  Then: don't cry.          "
# lowercase(s)
s = " .\n,\t!^&*()~@#Hello,Hello\$%{}Hello[]You Don't:;'/<>"
# ss = strip(s)
# m = eachmatch(r"([a-zA-Z]+)", "First: don't laugh. Then: don't cry.")
# sss = split(ss, r"[^a-zA-Z0-9]*( )+[^a-zA-z0-9]*")
wordcount(s)
# match(r"\n", "\n")
# match(r"red", "leather")
# strip("'", ''')
# only("x")
# split(lowercase(s), r"([^a-zA-Z0-9_']+\W*)+")

Dict{Any,Any} with 3 entries:
  "don't" => 1
  "you"   => 1
  "hello" => 3

## Test suite

In [208]:
using Test

# include("word-count.jl")

@testset "no words" begin
    @test wordcount(" .\n,\t!^&*()~@#\$%{}[]:;'/<>") == Dict()
end

@testset "count one word" begin
    @test wordcount("word") == Dict("word" => 1)
end

@testset "count one of each word" begin
    @test wordcount("one of each") == Dict("one" => 1, "of" => 1, "each" => 1)
end

@testset "multiple occurrences of a word" begin
    @test wordcount("one fish two fish red fish blue fish") == Dict("one" => 1, "fish" => 4, "two" => 1, "red" => 1, "blue" => 1)
end

@testset "handles cramped lists" begin
    @test wordcount("one,two,three") == Dict("one" => 1, "two" => 1, "three" => 1)
end

@testset "handles expanded lists" begin
    @test wordcount("one,\ntwo,\nthree") == Dict("one" => 1, "two" => 1, "three" => 1)
end

@testset "ignore punctuation" begin
    @test wordcount("car: carpet as java: javascript!!&@\$%^&") == Dict("car" => 1, "carpet" => 1, "as" => 1, "java" => 1, "javascript" => 1)
end

@testset "include numbers" begin
    @test wordcount("testing, 1, 2 testing") == Dict("testing" => 2, "1" => 1, "2" => 1)
end

@testset "normalize case" begin
    @test wordcount("go Go GO Stop stop") == Dict("go" => 3, "stop" => 2)
end

@testset "with apostrophes" begin
    @test wordcount("First: don't laugh. Then: don't cry.") == Dict("first" => 1, "don't" => 2, "laugh" => 1, "then" => 1, "cry" => 1)
end

@testset "with quotations" begin
    @test wordcount("Joe can't tell between 'large' and large.") == Dict("joe" => 1, "can't" => 1, "tell" => 1, "between" => 1, "large" => 2, "and" => 1)
end

@testset "substrings from the beginning" begin
    @test wordcount("Joe can't tell between app, apple and a.") == Dict("joe" => 1, "can't" => 1, "tell" => 1, "between" => 1, "app" => 1, "apple" => 1, "and" => 1, "a" => 1)
end

@testset "multiple spaces not detected as a word" begin
    @test wordcount(" multiple   whitespaces") == Dict("multiple" => 1, "whitespaces" => 1)
end

@testset "alternating word separators not detected as a word" begin
    @test wordcount(",\n,one,\n ,two \n 'three'") == Dict("one" => 1, "two" => 1, "three" => 1)
end

Test Summary: | Pass  Total
no words      |    1      1
Test Summary:  | Pass  Total
count one word |    1      1
Test Summary:          | Pass  Total
count one of each word |    1      1
Test Summary:                  | Pass  Total
multiple occurrences of a word |    1      1
Test Summary:         | Pass  Total
handles cramped lists |    1      1
Test Summary:          | Pass  Total
handles expanded lists |    1      1
Test Summary:      | Pass  Total
ignore punctuation | ...

Test.DefaultTestSet("alternating word separators not detected as a word", Any[], 1, false)

## Prepare submission
To submit your exercise, you need to save your solution in a file called `word-count.jl` before using the CLI.
You can either create it manually or use the following functions, which will automatically write every notebook cell that starts with `# submit` to the file `word-count.jl`.


In [209]:
using Pkg; Pkg.add("Exercism")
using Exercism
Exercism.create_submission("word-count")

  Resolving package versions...
No Changes to `~/.julia/environments/v1.5/Project.toml`
No Changes to `~/.julia/environments/v1.5/Manifest.toml`


324