Add spam detection engine #11319

Open
Wants to merge 62 commits into base: develop
Changes from 26 commits
Commits (62)
64532b9
Add events for comments
alecslupu Jul 3, 2023
98798ea
Add events for debates
alecslupu Jul 3, 2023
c6a07b0
Add events for meetings
alecslupu Jul 3, 2023
ab8bcbd
Update the proposals commands
alecslupu Jul 4, 2023
0d11ecc
Refactor with_events
alecslupu Jul 12, 2023
cf9a7a0
Apply review recommendations
alecslupu Jul 14, 2023
2df8d59
Merge branch 'develop' of github.com:decidim/decidim into fature/prep…
alecslupu Jul 14, 2023
d204dc7
Create decidim-ai module
alecslupu Jul 15, 2023
99e4c98
change description
alecslupu Jul 15, 2023
94952c1
Add language detection-service
alecslupu Jul 15, 2023
6533af8
Add registry strategy (#253)
alecslupu Jul 16, 2023
d2f3823
Add SpamDetectionService class (#255)
alecslupu Jul 17, 2023
6daeb01
Add BayesStrategy (#256)
alecslupu Jul 18, 2023
b27b5a8
Merge branch 'develop' of github.com:decidim/decidim into ale-add-spa…
alecslupu Jul 20, 2023
8d5ae48
Merge branch 'develop' of github.com:decidim/decidim into fature/prep…
alecslupu Jul 20, 2023
692a2fa
Merge branch 'fature/prepare-analyzer-events' into ale-add-spam-detec…
alecslupu Jul 20, 2023
9d57036
Change the pipeline working dir
alecslupu Jul 20, 2023
c6a772a
Fixing spam suite
alecslupu Jul 20, 2023
e27dc50
Revert event changes
alecslupu Jul 21, 2023
6dc2c88
Revert event changes
alecslupu Jul 21, 2023
7dd8754
Merge branch 'fature/prepare-analyzer-events' into ale-add-spam-detec…
alecslupu Jul 21, 2023
d587f3c
Add BayesStrategy (#256) (#257)
alecslupu Jul 21, 2023
367ca39
Merge branch 'ale-add-spam-detection' of github.com:tremend-cofe/deci…
alecslupu Jul 21, 2023
8f51cc7
Running linters
alecslupu Jul 21, 2023
daec039
Merge branch 'develop' of github.com:decidim/decidim into ale-add-spa…
alecslupu Jul 22, 2023
f6a1dca
Add score calculation (#262)
alecslupu Jul 22, 2023
6583933
Add event handlers and spec data (#263)
alecslupu Jul 23, 2023
4a1cc71
Refactor AI namespaces (#269)
alecslupu Jul 31, 2023
215aa57
Add resources to be indexed (#254)
alecslupu Jul 31, 2023
1046e24
Add action to reset train model (#270)
alecslupu Aug 1, 2023
addc5ef
Merge branch 'develop' into ale-add-spam-detection
alecslupu Aug 1, 2023
9ac577c
Add documentation page (#275)
alecslupu Aug 3, 2023
a0c1707
Merge branch 'develop' of github.com:decidim/decidim into ale-add-spa…
alecslupu Aug 10, 2023
720333b
Add Initiatives to antispam (#271)
alecslupu Aug 10, 2023
08b954d
Merge branch 'ale-add-spam-detection' of github.com:tremend-cofe/deci…
alecslupu Aug 10, 2023
ce424fe
Merge branch 'develop' into ale-add-spam-detection
alecslupu Aug 24, 2023
4d07727
Merge branch 'develop' of github.com:decidim/decidim into ale-add-spa…
alecslupu Oct 8, 2023
834e957
Merge branch 'develop' of github.com:decidim/decidim into ale-add-spa…
alecslupu Oct 25, 2023
69c9ad6
Merge branch 'develop' of github.com:decidim/decidim into ale-add-spa…
alecslupu Oct 26, 2023
a634ff7
Fix rubocop
alecslupu Oct 27, 2023
33b18b4
Merge branch 'develop' of github.com:decidim/decidim into ale-add-spa…
alecslupu Dec 1, 2023
5992620
Running linters
alecslupu Dec 1, 2023
fd6362d
Merge branch 'develop' of github.com:decidim/decidim into ale-add-spa…
alecslupu Jan 14, 2024
3bf57b9
Merge branch 'ale-add-spam-detection' of github.com:tremend-cofe/deci…
alecslupu Jan 14, 2024
f957f13
FIx failing spec
alecslupu Jan 14, 2024
d778220
Merge branch 'develop' of github.com:decidim/decidim into ale-add-spa…
alecslupu Feb 2, 2024
65ba7cf
Merge branch 'develop' of github.com:decidim/decidim into ale-add-spa…
alecslupu Feb 6, 2024
37e896a
Merge branch 'develop' of github.com:decidim/decidim into ale-add-spa…
alecslupu Feb 17, 2024
9240000
Gem revert
alecslupu Feb 17, 2024
b05f0c5
Fix gem dependencies
alecslupu Feb 17, 2024
25cfd4b
Fix test suite
alecslupu Feb 17, 2024
b71691c
Fix spec
alecslupu Feb 17, 2024
ce04e00
Merge branch 'develop' of github.com:decidim/decidim into ale-add-spa…
alecslupu Mar 8, 2024
6c90048
Revert wicked_pdf version change
alecslupu Mar 8, 2024
cc8ab5d
Merge branch 'develop' of github.com:decidim/decidim into ale-add-spa…
alecslupu Mar 11, 2024
06479e3
Add spam csv dictionaries to the list of exceptions
alecslupu Mar 11, 2024
b5ce0db
Additional CSV
alecslupu Mar 11, 2024
51deb6e
Spell checks
alecslupu Mar 11, 2024
4e354d3
Merge branch 'develop' of github.com:decidim/decidim into ale-add-spa…
alecslupu Mar 11, 2024
b5151f3
Spell checks
alecslupu Mar 11, 2024
312a2c5
Merge branch 'develop' of github.com:decidim/decidim into ale-add-spa…
alecslupu May 1, 2024
e55dd3d
Add Autolabeler config for Ai module
alecslupu May 1, 2024
34 changes: 34 additions & 0 deletions .github/workflows/ci_ai.yml
@@ -0,0 +1,34 @@
name: "[CI] Ai"
on:
  push:
    branches:
      - develop
      - release/*
      - "*-stable"
  pull_request:
    branches-ignore:
      - "chore/l10n*"
    paths:
      - "*"
      - ".github/**"
      - "decidim-ai/**"
      - "decidim-core/**"
      - "decidim-dev/**"

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  build_app:
    uses: ./.github/workflows/build_app.yml
    secrets: inherit
    name: Build test application
  main:
    needs: build_app
    name: Tests
    uses: ./.github/workflows/test_app.yml
    secrets: inherit
    with:
      working-directory: "decidim-ai"
      test_command: bundle exec parallel_test --type rspec --pattern spec/
2 changes: 2 additions & 0 deletions .spelling.yml
@@ -5,6 +5,8 @@ exclude_paths:
  - decidim-core/lib/decidim/db/common-passwords.txt
  - decidim-initiatives/spec/types/initiative_type_spec.rb
  - decidim-proposals/app/packs/documents/decidim/proposals/participatory_texts/participatory_text.md
  - decidim-ai/data/(.*).csv
  - decidim-ai/spec/support/test.csv

forbidden:
  arent: are not
1 change: 1 addition & 0 deletions Gemfile
@@ -5,6 +5,7 @@ source "https://rubygems.org"
ruby RUBY_VERSION

gem "decidim", path: "."
gem "decidim-ai", path: "."
gem "decidim-conferences", path: "."
gem "decidim-consultations", path: "."
gem "decidim-elections", path: "."
11 changes: 11 additions & 0 deletions Gemfile.lock
@@ -31,6 +31,10 @@ PATH
      devise (~> 4.7)
      devise-i18n (~> 1.2)
      devise_invitable (~> 2.0)
    decidim-ai (0.28.0.dev)
      classifier-reborn (~> 2.3.0)
      cld (~> 0.11)
      decidim-core (= 0.28.0.dev)
    decidim-api (0.28.0.dev)
      commonmarker (~> 0.23.0, >= 0.23.9)
      graphql (~> 2.0)
@@ -312,6 +316,11 @@ GEM
      actionpack (>= 5.0)
      cells (>= 4.1.6, < 5.0.0)
    charlock_holmes (0.7.7)
    classifier-reborn (2.3.0)
      fast-stemmer (~> 1.0)
      matrix (~> 0.4)
    cld (0.13.0)
      ffi
    cmdparse (3.0.7)
    commonmarker (0.23.9)
    concurrent-ruby (1.2.2)
@@ -399,6 +408,7 @@ GEM
    faraday-retry (1.0.3)
    faraday_middleware (1.2.0)
      faraday (~> 1.0)
    fast-stemmer (1.0.2)
    ffi (1.15.5)
    file_validators (3.0.0)
      activemodel (>= 3.2)
@@ -820,6 +830,7 @@ DEPENDENCIES
  brakeman (~> 5.4)
  byebug (~> 11.0)
  decidim!
  decidim-ai!
  decidim-conferences!
  decidim-consultations!
  decidim-dev!
55 changes: 55 additions & 0 deletions decidim-ai/README.md
@@ -0,0 +1,55 @@
# Decidim::Ai

Decidim::Ai is a library that aims to provide Artificial Intelligence tools for Decidim. This plugin was initially developed to analyze content and provide spam classification using the Naive Bayes algorithm.
All AI-related functionality provided by Decidim should be included in this module.
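
The paragraph above names Naive Bayes as the classification technique. As a rough illustration of the idea only, here is a minimal multinomial Naive Bayes sketch in plain Ruby. It is not the `classifier-reborn` implementation this module depends on; the `TinyBayes` class, its tokenizer, and the sample sentences are all assumptions made up for this example.

```ruby
# Illustrative only: a tiny multinomial Naive Bayes text classifier.
# TinyBayes is a hypothetical class, not part of decidim-ai or classifier-reborn.
class TinyBayes
  def initialize(categories)
    # Per-category word counts, total word counts and document counts.
    @word_counts = categories.to_h { |category| [category, Hash.new(0)] }
    @totals = categories.to_h { |category| [category, 0] }
    @documents = categories.to_h { |category| [category, 0] }
  end

  def train(category, text)
    @documents[category] += 1
    tokenize(text).each do |word|
      @word_counts[category][word] += 1
      @totals[category] += 1
    end
  end

  # Returns the category with the highest log-probability.
  def classify(text)
    scores(text).max_by { |_category, score| score }.first
  end

  # Log prior plus the sum of Laplace-smoothed log likelihoods per category.
  def scores(text)
    vocabulary = @word_counts.values.flat_map(&:keys).uniq.size
    total_documents = @documents.values.sum.to_f

    @word_counts.keys.to_h do |category|
      prior = Math.log((@documents[category] + 1) / (total_documents + @documents.size))
      likelihood = tokenize(text).sum do |word|
        Math.log((@word_counts[category][word] + 1.0) / (@totals[category] + vocabulary))
      end
      [category, prior + likelihood]
    end
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9']+/)
  end
end

classifier = TinyBayes.new(%i(spam ham))
classifier.train(:spam, "Win free money now, click this link")
classifier.train(:spam, "Free prize, claim your money")
classifier.train(:ham, "The council meeting is scheduled for Monday")
classifier.train(:ham, "I support this proposal for the new park")

puts classifier.classify("claim your free money")
puts classifier.classify("meeting about the park proposal")
```

Working in log space avoids numeric underflow on long texts, and the add-one smoothing keeps unseen words from zeroing out a category.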

## Installation

To install this library, you need at least Decidim 0.25.

Add this line to your application's Gemfile:

```ruby
gem "decidim-tools-ai"
# Review comment (Contributor), suggested change:
# - gem "decidim-tools-ai"
# + gem "decidim-ai"
```

And then execute:

```bash
bundle install
```

After that, add an initializer file inside your project, having the following content:

```ruby
# config/initializers/decidim_ai.rb
```

After the configuration is added, run the command below to create the reporting user.

```bash
bundle exec rake decidim:spam:data:create_reporting_user
# Review comment (Contributor), suggested change:
# - bundle exec rake decidim:spam:data:create_reporting_user
# + bundle exec rake decidim:ai:create_reporting_user
```

If you have an existing installation, you can use the command below to train the engine with your existing data:

```bash
bundle exec rake decidim:spam:train:moderation
# Review comment (Contributor), suggested change:
# - bundle exec rake decidim:spam:train:moderation
# + bundle exec rake decidim:ai:load_plugin_dataset
# + bundle exec rake decidim:ai:load_application_dataset
# + bundle exec rake decidim:ai:train_using_database
#
# I would actually prefer this to happen in a single rake task (as in the docs), but I am commenting based on what I see in the rake tasks.
```

Add the queue name to the `config/sidekiq.yml` file:

```yaml
:queues:
- ["default", 1]
- ["spam_analysis", 1]
# The other yaml entries
```

## Contributing

See [Decidim](https://github.com/decidim/decidim).

## License

See [Decidim](https://github.com/decidim/decidim).
3 changes: 3 additions & 0 deletions decidim-ai/Rakefile
@@ -0,0 +1,3 @@
# frozen_string_literal: true

require "decidim/dev/common_rake"
230 changes: 230 additions & 0 deletions decidim-ai/data/blocked_accounts.csv

Large diffs are not rendered by default.

5,574 changes: 5,574 additions & 0 deletions decidim-ai/data/sms-spam.csv

Large diffs are not rendered by default.

104 changes: 104 additions & 0 deletions decidim-ai/data/spam_comments.csv

Large diffs are not rendered by default.

31 changes: 31 additions & 0 deletions decidim-ai/decidim-ai.gemspec
@@ -0,0 +1,31 @@
# frozen_string_literal: true

$LOAD_PATH.push File.expand_path("lib", __dir__)

require "decidim/ai/version"

Gem::Specification.new do |s|
  s.version = Decidim::Ai.version
  s.authors = ["Alexandru-Emil Lupu"]
  s.email = ["contact@alecslupu.ro"]
  s.license = "AGPL-3.0"
  s.homepage = "https://decidim.org"
  s.metadata = {
    "bug_tracker_uri" => "https://github.com/decidim/decidim/issues",
    "documentation_uri" => "https://docs.decidim.org/",
    "funding_uri" => "https://opencollective.com/decidim",
    "homepage_uri" => "https://decidim.org",
    "source_code_uri" => "https://github.com/decidim/decidim"
  }
  s.required_ruby_version = ">= 3.1"

  s.name = "decidim-ai"
  s.summary = "A Decidim module with AI tools"
  s.description = "A Decidim module with AI tools"

  s.files = Dir["{app,config,db,lib,vendor}/**/*", "Rakefile", "README.md"]

  s.add_dependency "classifier-reborn", "~> 2.3.0"
  s.add_dependency "cld", "~> 0.11"
  s.add_dependency "decidim-core", Decidim::Ai.version
end
139 changes: 139 additions & 0 deletions decidim-ai/lib/decidim/ai.rb
@@ -0,0 +1,139 @@
# frozen_string_literal: true

require "decidim/ai/engine"

module Decidim
  module Ai
    autoload :LanguageDetectionService, "decidim/ai/language_detection_service"
    autoload :SpamDetectionService, "decidim/ai/spam_detection_service"
    autoload :StrategyRegistry, "decidim/ai/strategy_registry"
    autoload :LoadDataset, "decidim/ai/load_dataset"

    module SpamContent
      autoload :BaseStrategy, "decidim/ai/spam_content/base_strategy"
      autoload :BayesStrategy, "decidim/ai/spam_content/bayes_strategy"
    end

    include ActiveSupport::Configurable

    # You can configure the spam threshold for the spam detection service.
    # The threshold is a float value between 0 and 1.
    # The default value is 0.5.
    # Any value below the threshold will be considered spam.
    config_accessor :spam_treshold do
      0.5
    end
    # Registered analyzers.
    # You can register your own analyzer by adding a new entry to this array.
    # The entry must be a hash with the following keys:
    # - name: the name of the analyzer
    # - strategy: the class of the strategy to use
    # - options: a hash with the options to pass to the strategy
    # Example:
    # config.registered_analyzers = [
    #   {
    #     name: :bayes,
    #     strategy: Decidim::Ai::SpamContent::BayesStrategy,
    #     options: {
    #       adapter: :redis,
    #       params: {
    #         url: lambda { ENV["REDIS_URL"] },
    #         scheme: "redis",
    #         host: "127.0.0.1",
    #         port: 6379,
    #         path: nil,
    #         timeout: 5.0,
    #         password: nil,
    #         db: 0,
    #         driver: nil,
    #         id: nil,
    #         tcp_keepalive: 0,
    #         reconnect_attempts: 1,
    #         inherit_socket: false
    #       }
    #     }
    #   }
    # ]
    config_accessor :registered_analyzers do
      [
        { name: :bayes, strategy: Decidim::Ai::SpamContent::BayesStrategy, options: { adapter: :memory, params: {} } }
      ]
    end

    # Language detection service class.
    #
    # If you want to autodetect the language of the content, you can use a service class with the following contract:
    #
    # class LanguageDetectionService
    #   def initialize(text)
    #     @text = text
    #   end
    #
    #   def language_code
    #     CLD.detect_language(@text).fetch(:code)
    #   end
    # end
    config_accessor :language_detection_service do
      "Decidim::Ai::LanguageDetectionService"
    end

    # Spam detection service class.
    # If you want to use a different spam detection service, you can use a service class with the following contract:
    #
    # class SpamDetectionService
    #   def initialize
    #     @registry = Decidim::Ai.spam_detection_registry
    #   end
    #
    #   def train(category, text)
    #     # train the strategy
    #   end
    #
    #   def classify(text)
    #     # classify the text
    #   end
    #
    #   def untrain(category, text)
    #     # untrain the strategy
    #   end
    #
    #   def classification_log
    #     # return the classification log
    #   end
    # end
    config_accessor :spam_detection_service do
      "Decidim::Ai::SpamDetectionService"
    end

    # This is the email address used by the spam engine to
    # properly identify the user that will report users and content
    config_accessor :reporting_user_email do
      "reporting.user@domain.tld"
    end

    def self.spam_detection_instance
      @spam_detection_instance ||= spam_detection_service.constantize.new
    end

    def self.spam_detection_registry
      @spam_detection ||= Decidim::Ai::StrategyRegistry.new
    end

    def self.create_reporting_users!
      Decidim::Organization.find_each do |organization|
        user = organization.users.find_or_initialize_by(email: Decidim::Ai.reporting_user_email)
        next if user.persisted?

        password = SecureRandom.hex(10)
        user.password = password
        user.password_confirmation = password

        user.deleted_at = Time.current
        user.tos_agreement = true
        user.name = ""
        user.skip_confirmation!
        user.save!
      end
    end
  end
end
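
The comments in `decidim/ai.rb` describe a contract for replacement spam detection services (`train`, `classify`, `untrain`, `classification_log`), and `spam_detection_instance` resolves the configured class name from a string. The following is a minimal plain-Ruby sketch of that pattern, under stated assumptions: `MyApp::KeywordSpamDetectionService` is a hypothetical class, the return value of `classify` (a hit count) is an assumption because the diff does not specify it, and `Object.const_get` stands in for ActiveSupport's `constantize`.

```ruby
module MyApp
  # Hypothetical service following the documented contract:
  # train, classify, untrain and classification_log.
  class KeywordSpamDetectionService
    def initialize
      @keywords = Hash.new(0)
      @log = []
    end

    # Only spam samples feed the keyword list in this sketch.
    def train(category, text)
      return unless category == :spam

      words(text).each { |word| @keywords[word] += 1 }
    end

    def untrain(category, text)
      return unless category == :spam

      words(text).each do |word|
        @keywords[word] -= 1
        @keywords.delete(word) if @keywords[word] <= 0
      end
    end

    # Assumption: returns the number of known spam words found in the text.
    def classify(text)
      hits = words(text).select { |word| @keywords.key?(word) }
      @log << "#{hits.size} known spam word(s): #{hits.join(", ")}"
      hits.size
    end

    def classification_log
      @log.join("\n")
    end

    private

    def words(text)
      text.downcase.scan(/[a-z]+/).uniq
    end
  end
end

# Plain-Ruby stand-in for `spam_detection_service.constantize.new`.
configured_service = "MyApp::KeywordSpamDetectionService"
service = Object.const_get(configured_service).new
service.train(:spam, "Cheap pills online")
service.train(:ham, "See you at the assembly")
puts service.classify("Buy cheap pills today")
puts service.classification_log
```

Storing the class name as a string, as the accessor does, delays constant lookup until first use, so the configured class does not need to be loaded when the initializer runs.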
22 changes: 22 additions & 0 deletions decidim-ai/lib/decidim/ai/engine.rb
@@ -0,0 +1,22 @@
# frozen_string_literal: true

module Decidim
  module Ai
    class Engine < ::Rails::Engine
      isolate_namespace Decidim::Ai

      paths["db/migrate"] = nil
      # paths["lib/tasks"] = nil

      initializer "decidim_ai.classifiers" do |_app|
        Decidim::Ai.registered_analyzers.each do |analyzer|
          Decidim::Ai.spam_detection_registry.register_analyzer(**analyzer)
        end
      end

      def load_seed
        nil
      end
    end
  end
end
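
The initializer above splats each `registered_analyzers` hash into `register_analyzer`, so the hash keys (`name`, `strategy`, `options`) have to line up with keyword arguments. `Decidim::Ai::StrategyRegistry` is not part of this excerpt, so the sketch below uses hypothetical stand-ins (`SketchRegistry`, `EchoStrategy`, `strategy_for`) to show how such a registration step might behave. The duplicate check and the way options reach the strategy constructor are assumptions.

```ruby
# Hypothetical stand-in for Decidim::Ai::StrategyRegistry (not shown in this diff).
class SketchRegistry
  class DuplicateAnalyzer < StandardError; end

  def initialize
    @strategies = {}
  end

  # Keyword arguments match the keys of each registered_analyzers entry.
  def register_analyzer(name:, strategy:, options: {})
    raise DuplicateAnalyzer, "#{name} is already registered" if @strategies.key?(name)

    @strategies[name] = strategy.new(options)
  end

  def strategy_for(name)
    @strategies.fetch(name)
  end

  def names
    @strategies.keys
  end
end

# Hypothetical strategy; the real BayesStrategy constructor may differ.
class EchoStrategy
  attr_reader :options

  def initialize(options = {})
    @options = options
  end
end

registered_analyzers = [
  { name: :echo, strategy: EchoStrategy, options: { adapter: :memory, params: {} } }
]

registry = SketchRegistry.new
registered_analyzers.each { |analyzer| registry.register_analyzer(**analyzer) }
puts registry.names.inspect
```

With keyword arguments, a misspelled or missing key in a configuration entry raises an `ArgumentError` at boot rather than failing later during classification.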
17 changes: 17 additions & 0 deletions decidim-ai/lib/decidim/ai/language_detection_service.rb
@@ -0,0 +1,17 @@
# frozen_string_literal: true

require "cld"

module Decidim
  module Ai
    class LanguageDetectionService
      def initialize(text)
        @text = text
      end

      def language_code
        CLD.detect_language(@text).fetch(:code)
      end
    end
  end
end
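
Because `language_detection_service` is configurable, any class with the same two-method contract (`new(text)` and `#language_code`) could in principle replace the CLD-backed one above. The sketch below is a deliberately crude stopword counter that shows only the shape of such a replacement. `StopwordLanguageDetectionService`, its word lists, and the `"un"` placeholder for undetected text are all assumptions chosen for this example.

```ruby
# Hypothetical drop-in following the same contract: new(text) and #language_code.
class StopwordLanguageDetectionService
  # Tiny illustrative word lists; far too small for real use.
  STOPWORDS = {
    "en" => %w(the and is of to in),
    "es" => %w(el la y es de en los)
  }.freeze

  def initialize(text)
    @text = text.to_s
  end

  # Returns the language whose stopwords occur most often,
  # or "un" (placeholder chosen for this sketch) when nothing matches.
  def language_code
    tokens = @text.downcase.scan(/[[:alpha:]]+/)
    scores = STOPWORDS.transform_values do |words|
      tokens.count { |token| words.include?(token) }
    end
    code, hits = scores.max_by { |_code, count| count }
    hits.zero? ? "un" : code
  end
end

puts StopwordLanguageDetectionService.new("The meeting is in the town hall").language_code
puts StopwordLanguageDetectionService.new("El mercado de la ciudad es grande").language_code
```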
20 changes: 20 additions & 0 deletions decidim-ai/lib/decidim/ai/load_dataset.rb
@@ -0,0 +1,20 @@
# frozen_string_literal: true

module Decidim
  module Ai
    class LoadDataset
      def self.call(file)
        service = Decidim::Ai.spam_detection_instance

        ext = File.extname(file)[1..-1]
        reader_class = Decidim::Admin::Import::Readers.search_by_file_extension(ext)
        reader_class.new(file).read_rows do |row|
          next unless %w(spam ham).include?(row[0])
          next if row[1].blank?

          service.train(row[0].to_sym, row[1])
        end
      end
    end
  end
end
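
`LoadDataset.call` trains only on rows whose first column is `spam` or `ham` and whose second column is not blank. The sketch below replays those two guards on in-memory rows so the filtering is easy to see. `RecordingService` and `train_from_rows` are hypothetical helpers, the sample rows are made up, and `to_s.strip.empty?` is a plain-Ruby approximation of ActiveSupport's `blank?`.

```ruby
# Hypothetical recorder standing in for Decidim::Ai.spam_detection_instance.
class RecordingService
  attr_reader :trained

  def initialize
    @trained = []
  end

  def train(category, text)
    @trained << [category, text]
  end
end

ALLOWED_CATEGORIES = %w(spam ham).freeze

# Mirrors the two guards in LoadDataset.call without ActiveSupport.
def train_from_rows(rows, service)
  rows.each do |row|
    next unless ALLOWED_CATEGORIES.include?(row[0])
    next if row[1].to_s.strip.empty?

    service.train(row[0].to_sym, row[1])
  end
  service
end

rows = [
  ["category", "text"],              # header row: skipped, label is not spam/ham
  ["spam", "Win a free phone now"],
  ["ham", "See you at the meeting"],
  ["ham", "   "],                    # blank text: skipped
  ["junk", "Unknown label"]          # unknown label: skipped
]

service = train_from_rows(rows, RecordingService.new)
puts service.trained.inspect
```

A side effect of the label guard is that a CSV header row needs no special handling, since its first cell is neither `spam` nor `ham`.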