Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treat punctuation as optional for org name conviction checks #565

Merged
merged 4 commits into from
Nov 21, 2019
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
31 changes: 25 additions & 6 deletions app/models/waste_carriers_engine/convictions_check/entity.rb
Original file line number Diff line number Diff line change
Expand Up @@ -88,15 +88,34 @@ def self.matching_organisations(name:, company_no: nil)
private_class_method def self.org_name_search_term(term)
return if term.blank?

# Trim trailing full stops
term_without_trailing_full_stops = term.gsub(/\.$/, "")
# The steps for processing the org name must be done in this order:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could I have all my code written like this, please? I might then finally have a clue what everything is doing! ❤️ 😁


# 1. Remove trailing full stops
without_full_stops = term.gsub(/\.$/, "")
# 2. Remove common org words
without_org_words = term_without_ignorable_org_words(without_full_stops)
# 3. Escape special characters for regex
escaped = ::Regexp.escape(without_org_words)
# 4. Treat punctuation as optional when matching
term_with_optional_punctuation(escaped)
end

# Remove the words we want to ignore
word_array = term_without_trailing_full_stops.downcase.split(" ")
private_class_method def self.term_without_ignorable_org_words(term)
word_array = term.downcase.split(" ")
word_array.reject! { |word| IGNORABLE_ORG_NAME_WORDS.include?(word) }
term_without_ignorable_words = word_array.join(" ")
word_array.join(" ")
end

private_class_method def self.term_with_optional_punctuation(term)
# These are characters we want to treat as optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we reflect this in the variable's name? Like: optional_chars instead of punctuation_chars :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That probably makes sense. I got what was going on without issue so didn't spot this, but now that you highlight it, I see why.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cintamani Good shout! Updated: 4c1852a

punctuation_chars = %w[. , / # ! $ % ^ & * ; : { } = - _ ` ~ ( )]

chars_array = term.scan(/./)
chars_array.each_with_index do |char, index|
chars_array[index] = "#{char}?" if punctuation_chars.include?(char)
end

::Regexp.escape(term_without_ignorable_words)
chars_array.join
end
end
end
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -40,17 +40,17 @@ module ConvictionsCheck
expect(scope).to include(upcased_record)
end

context "when the search term ends with a ." do
let(:term) { "foo." }
context "when the search term contains punctuation" do
let(:term) { "Waste Not (Want Not)." }

it "treats the . as an optional match" do
record_with_trailing_full_stop = described_class.create(name: "foo.")
record_without_trailing_full_stop = described_class.create(name: "foo")
record_with_full_stop_somewhere_else = described_class.create(name: "f.oo")
it "treats them as optional" do
record_with_all_punctuation = described_class.create(name: term)
record_with_some_punctuation = described_class.create(name: "Waste Not (Want Not)")
record_with_no_punctuation = described_class.create(name: "Waste Not Want Not")

expect(scope).to include(record_with_trailing_full_stop)
expect(scope).to include(record_without_trailing_full_stop)
expect(scope).to_not include(record_with_full_stop_somewhere_else)
expect(scope).to include(record_with_all_punctuation)
expect(scope).to include(record_with_some_punctuation)
expect(scope).to include(record_with_no_punctuation)
end
end

Expand Down