Add spam detection engine #11319
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
name: "[CI] Ai" | ||
on: | ||
push: | ||
branches: | ||
- develop | ||
- release/* | ||
- "*-stable" | ||
pull_request: | ||
branches-ignore: | ||
- "chore/l10n*" | ||
paths: | ||
- "*" | ||
- ".github/**" | ||
- "decidim-ai/**" | ||
- "decidim-core/**" | ||
- "decidim-dev/**" | ||
|
||
concurrency: | ||
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }} | ||
cancel-in-progress: true | ||
|
||
jobs: | ||
build_app: | ||
uses: ./.github/workflows/build_app.yml | ||
secrets: inherit | ||
name: Build test application | ||
main: | ||
needs: build_app | ||
name: Tests | ||
uses: ./.github/workflows/test_app.yml | ||
secrets: inherit | ||
with: | ||
working-directory: "decidim-ai" | ||
test_command: bundle exec parallel_test --type rspec --pattern spec/ |
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,55 @@ | ||||||||||
# Decidim::Ai | ||||||||||
|
||||||||||
Decidim::Ai is a library that aims to provide Artificial Intelligence tools for Decidim. This plugin was initially developed to analyze content and classify spam using the Naive Bayes algorithm. | ||||||||||
All AI-related functionality provided by Decidim should be included in this same module. | ||||||||||
|
||||||||||
## Installation | ||||||||||
|
||||||||||
To install this library, you need at least Decidim 0.25. | ||||||||||
|
||||||||||
Add this line to your application's Gemfile: | ||||||||||
|
||||||||||
```ruby | ||||||||||
gem "decidim-ai" | ||||||||||
``` | ||||||||||
|
||||||||||
And then execute: | ||||||||||
|
||||||||||
```bash | ||||||||||
bundle install | ||||||||||
``` | ||||||||||
|
||||||||||
After that, add an initializer file to your project with the following content: | ||||||||||
|
||||||||||
```ruby | ||||||||||
# config/initializers/decidim_ai.rb | ||||||||||
``` | ||||||||||
|
||||||||||
After the configuration is added, run the command below to create the reporting user. | ||||||||||
|
||||||||||
```bash | ||||||||||
bundle exec rake decidim:spam:data:create_reporting_user | ||||||||||
``` | ||||||||||
|
||||||||||
If you have an existing installation, you can use the command below to train the engine with your existing data: | ||||||||||
|
||||||||||
```bash | ||||||||||
bundle exec rake decidim:spam:train:moderation | ||||||||||
``` | ||||||||||
Review comment: I would actually like it if this happened in only one rake task (like in the docs), but I am just commenting based on what I see in the rake tasks.
|
||||||||||
Add the queue name to the `config/sidekiq.yml` file: | ||||||||||
|
||||||||||
```yaml | ||||||||||
:queues: | ||||||||||
- ["default", 1] | ||||||||||
- ["spam_analysis", 1] | ||||||||||
# The other yaml entries | ||||||||||
``` | ||||||||||
|
||||||||||
## Contributing | ||||||||||
|
||||||||||
See [Decidim](https://github.com/decidim/decidim). | ||||||||||
|
||||||||||
## License | ||||||||||
|
||||||||||
See [Decidim](https://github.com/decidim/decidim). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# frozen_string_literal: true | ||
|
||
require "decidim/dev/common_rake" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
# frozen_string_literal: true | ||
|
||
$LOAD_PATH.push File.expand_path("lib", __dir__) | ||
|
||
require "decidim/ai/version" | ||
|
||
Gem::Specification.new do |s| | ||
s.version = Decidim::Ai.version | ||
s.authors = ["Alexandru-Emil Lupu"] | ||
s.email = ["contact@alecslupu.ro"] | ||
s.license = "AGPL-3.0" | ||
s.homepage = "https://decidim.org" | ||
s.metadata = { | ||
"bug_tracker_uri" => "https://github.com/decidim/decidim/issues", | ||
"documentation_uri" => "https://docs.decidim.org/", | ||
"funding_uri" => "https://opencollective.com/decidim", | ||
"homepage_uri" => "https://decidim.org", | ||
"source_code_uri" => "https://github.com/decidim/decidim" | ||
} | ||
s.required_ruby_version = ">= 3.1" | ||
|
||
s.name = "decidim-ai" | ||
s.summary = "A Decidim module with AI tools" | ||
s.description = "A Decidim module with AI tools" | ||
|
||
s.files = Dir["{app,config,db,lib,vendor}/**/*", "Rakefile", "README.md"] | ||
|
||
s.add_dependency "classifier-reborn", "~> 2.3.0" | ||
s.add_dependency "cld", "~> 0.11" | ||
s.add_dependency "decidim-core", Decidim::Ai.version | ||
end |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,139 @@ | ||
# frozen_string_literal: true | ||
|
||
require "decidim/ai/engine" | ||
|
||
module Decidim | ||
module Ai | ||
autoload :LanguageDetectionService, "decidim/ai/language_detection_service" | ||
autoload :SpamDetectionService, "decidim/ai/spam_detection_service" | ||
autoload :StrategyRegistry, "decidim/ai/strategy_registry" | ||
autoload :LoadDataset, "decidim/ai/load_dataset" | ||
|
||
module SpamContent | ||
autoload :BaseStrategy, "decidim/ai/spam_content/base_strategy" | ||
autoload :BayesStrategy, "decidim/ai/spam_content/bayes_strategy" | ||
end | ||
|
||
include ActiveSupport::Configurable | ||
|
||
# You can configure the spam threshold for the spam detection service. | ||
# The threshold is a float value between 0 and 1. | ||
# The default value is 0.5. | ||
# Any value below the threshold will be considered spam. | ||
config_accessor :spam_treshold do | ||
0.5 | ||
end | ||
# Registered analyzers. | ||
# You can register your own analyzer by adding a new entry to this array. | ||
# The entry must be a hash with the following keys: | ||
# - name: the name of the analyzer | ||
# - strategy: the class of the strategy to use | ||
# - options: a hash with the options to pass to the strategy | ||
# Example: | ||
# config.registered_analyzers = [ | ||
# { | ||
# name: :bayes, | ||
# strategy: Decidim::Ai::SpamContent::BayesStrategy, | ||
# options: { | ||
# adapter: :redis, | ||
# params: { | ||
# url: lambda { ENV["REDIS_URL"] }, | ||
# scheme: "redis", | ||
# host: "127.0.0.1", | ||
# port: 6379, | ||
# path: nil, | ||
# timeout: 5.0, | ||
# password: nil, | ||
# db: 0, | ||
# driver: nil, | ||
# id: nil, | ||
# tcp_keepalive: 0, | ||
# reconnect_attempts: 1, | ||
# inherit_socket: false | ||
# } | ||
# } | ||
# } | ||
# ] | ||
config_accessor :registered_analyzers do | ||
[ | ||
{ name: :bayes, strategy: Decidim::Ai::SpamContent::BayesStrategy, options: { adapter: :memory, params: {} } } | ||
] | ||
end | ||
|
||
# Language detection service class. | ||
# | ||
# If you want to autodetect the language of the content, you can provide a service class that follows this contract: | ||
# | ||
# class LanguageDetectionService | ||
# def initialize(text) | ||
# @text = text | ||
# end | ||
# | ||
# def language_code | ||
# CLD.detect_language(@text).fetch(:code) | ||
# end | ||
# end | ||
config_accessor :language_detection_service do | ||
"Decidim::Ai::LanguageDetectionService" | ||
end | ||
|
||
# Spam detection service class. | ||
# If you want to use a different spam detection service, you can provide a service class that follows this contract: | ||
# | ||
# class SpamDetectionService | ||
# def initialize | ||
# @registry = Decidim::Ai.spam_detection_registry | ||
# end | ||
# | ||
# def train(category, text) | ||
# # train the strategy | ||
# end | ||
# | ||
# def classify(text) | ||
# # classify the text | ||
# end | ||
# | ||
# def untrain(category, text) | ||
# # untrain the strategy | ||
# end | ||
# | ||
# def classification_log | ||
# # return the classification log | ||
# end | ||
# end | ||
config_accessor :spam_detection_service do | ||
"Decidim::Ai::SpamDetectionService" | ||
end | ||
|
||
# This is the email address used by the spam engine to | ||
# properly identify the user that will report users and content | ||
config_accessor :reporting_user_email do | ||
"reporting.user@domain.tld" | ||
end | ||
|
||
def self.spam_detection_instance | ||
@spam_detection_instance ||= spam_detection_service.constantize.new | ||
end | ||
|
||
def self.spam_detection_registry | ||
@spam_detection ||= Decidim::Ai::StrategyRegistry.new | ||
end | ||
|
||
def self.create_reporting_users! | ||
Decidim::Organization.find_each do |organization| | ||
user = organization.users.find_or_initialize_by(email: Decidim::Ai.reporting_user_email) | ||
next if user.persisted? | ||
|
||
password = SecureRandom.hex(10) | ||
user.password = password | ||
user.password_confirmation = password | ||
|
||
user.deleted_at = Time.current | ||
user.tos_agreement = true | ||
user.name = "" | ||
user.skip_confirmation! | ||
user.save! | ||
end | ||
end | ||
end | ||
end |
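The spam detection contract documented in the comments above (`train`, `classify`, `untrain`, `classification_log`) can be sketched with a small stand-in. This is a minimal illustration under stated assumptions, not the service shipped in this PR: `KeywordSpamDetectionService` is a hypothetical name, the word-counting logic is a deliberately naive substitute for Naive Bayes, and returning the log as an array of strings is an assumption because the diff does not show the expected return type.

```ruby
# Hypothetical service: only the four public method names come from the
# contract in the diff. Everything else is illustrative.
class KeywordSpamDetectionService
  # Assumption: the log is exposed as an array of strings.
  attr_reader :classification_log

  def initialize
    # One word-frequency table per category.
    @counts = { spam: Hash.new(0), ham: Hash.new(0) }
    @classification_log = []
  end

  # Count every word of the text towards the given category.
  def train(category, text)
    bucket = @counts.fetch(category)
    tokenize(text).each { |word| bucket[word] += 1 }
  end

  # Reverse a previous train call, never dropping a count below zero.
  def untrain(category, text)
    bucket = @counts.fetch(category)
    tokenize(text).each do |word|
      bucket[word] -= 1 if bucket[word].positive?
    end
  end

  # Naive vote: the category whose trained words match more often wins.
  # Ties, including texts with no known words, fall back to :ham.
  def classify(text)
    words = tokenize(text)
    spam_hits = words.sum { |word| @counts[:spam][word] }
    ham_hits = words.sum { |word| @counts[:ham][word] }
    verdict = spam_hits > ham_hits ? :spam : :ham
    @classification_log << "#{verdict} (spam=#{spam_hits}, ham=#{ham_hits})"
    verdict
  end

  private

  def tokenize(text)
    text.to_s.downcase.scan(/[a-z0-9]+/)
  end
end

service = KeywordSpamDetectionService.new
service.train(:spam, "Buy cheap pills now")
service.train(:ham, "The council meeting is on Monday")
puts service.classify("cheap pills")
```

Since `spam_detection_instance` builds the service with `spam_detection_service.constantize.new`, a class like this would presumably be wired in by setting `spam_detection_service` to its name as a string. That wiring is inferred from the diff and has not been verified against a running application.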
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
# frozen_string_literal: true | ||
|
||
module Decidim | ||
module Ai | ||
class Engine < ::Rails::Engine | ||
isolate_namespace Decidim::Ai | ||
|
||
paths["db/migrate"] = nil | ||
# paths["lib/tasks"] = nil | ||
|
||
initializer "decidim_ai.classifiers" do |_app| | ||
Decidim::Ai.registered_analyzers.each do |analyzer| | ||
Decidim::Ai.spam_detection_registry.register_analyzer(**analyzer) | ||
end | ||
end | ||
|
||
def load_seed | ||
nil | ||
end | ||
end | ||
end | ||
end |
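The initializer above passes each configured hash to `register_analyzer(**analyzer)`, so the hash keys become keyword arguments. The `StrategyRegistry` implementation is not part of the rendered diff, so the sketch below uses hypothetical stand-ins (`SketchRegistry`, `SketchStrategy`, `fetch_strategy`, `names`) to show how that double-splat call could behave. Only the `name`, `strategy` and `options` keys are taken from the source.

```ruby
# Hypothetical stand-ins: the real StrategyRegistry is not shown in the diff.
class SketchStrategy
  attr_reader :options

  def initialize(options = {})
    @options = options
  end
end

class SketchRegistry
  class DuplicateAnalyzer < StandardError; end

  def initialize
    @strategies = {}
  end

  # The keyword arguments mirror the keys of each registered_analyzers entry.
  def register_analyzer(name:, strategy:, options: {})
    raise DuplicateAnalyzer, "#{name} is already registered" if @strategies.key?(name)

    @strategies[name] = strategy.new(options)
  end

  def fetch_strategy(name)
    @strategies.fetch(name)
  end

  def names
    @strategies.keys
  end
end

registered_analyzers = [
  { name: :bayes, strategy: SketchStrategy, options: { adapter: :memory, params: {} } }
]

registry = SketchRegistry.new
# Same shape as the loop in the engine initializer.
registered_analyzers.each { |analyzer| registry.register_analyzer(**analyzer) }
puts registry.names.inspect
```

One consequence of the double splat is that an entry with an unexpected key raises `ArgumentError` when it is registered, so a typo in the configuration would surface at boot rather than at classification time.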
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# frozen_string_literal: true | ||
|
||
require "cld" | ||
|
||
module Decidim | ||
module Ai | ||
class LanguageDetectionService | ||
def initialize(text) | ||
@text = text | ||
end | ||
|
||
def language_code | ||
CLD.detect_language(@text).fetch(:code) | ||
end | ||
end | ||
end | ||
end |
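Because `language_detection_service` is configurable, any class with the same two-method contract (`initialize(text)` and `language_code`) could replace the CLD-backed one. The following is a rough sketch of such a replacement that depends only on the standard library. The class name, the tiny stopword lists and the `"und"` fallback code are all assumptions made for illustration, and the detection quality is far below what CLD provides.

```ruby
# Hypothetical replacement service: it follows the documented contract but
# uses a crude stopword count instead of CLD.
class StopwordLanguageDetectionService
  # Deliberately tiny, illustrative stopword lists.
  STOPWORDS = {
    "en" => %w(the and is of to in that with),
    "es" => %w(el la los y es de que con),
    "ca" => %w(el la els i és de que amb)
  }.freeze

  # Assumption: code returned when no stopword matches at all.
  UNKNOWN = "und"

  def initialize(text)
    @text = text
  end

  # Returns the code of the language whose stopwords occur most often.
  def language_code
    words = @text.to_s.downcase.scan(/\p{L}+/)
    scores = STOPWORDS.transform_values do |list|
      words.count { |word| list.include?(word) }
    end
    code, hits = scores.max_by { |_code, count| count }
    hits.zero? ? UNKNOWN : code
  end
end

puts StopwordLanguageDetectionService.new("the cat is in the garden with the dog").language_code
```

Languages that share stopwords, such as Spanish and Catalan here, are resolved in favour of whichever list appears first when the counts tie, which is one reason a real detector is preferable.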
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# frozen_string_literal: true | ||
|
||
module Decidim | ||
module Ai | ||
class LoadDataset | ||
def self.call(file) | ||
service = Decidim::Ai.spam_detection_instance | ||
|
||
ext = File.extname(file)[1..-1] | ||
reader_class = Decidim::Admin::Import::Readers.search_by_file_extension(ext) | ||
reader_class.new(file).read_rows do |row| | ||
next unless %w(spam ham).include?(row[0]) | ||
next if row[1].blank? | ||
|
||
service.train(row[0].to_sym, row[1]) | ||
end | ||
end | ||
end | ||
end | ||
end |
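The row filtering in `LoadDataset.call` can be tried in isolation. The sketch below is a simplified stand-in, not the PR's code: it assumes each row is a two-element array of label and text, replaces the Decidim reader with a plain array, replaces `blank?` with a plain-Ruby equivalent, and uses a hypothetical `RecordingService` that only records what it was asked to train.

```ruby
# Hypothetical service that records training calls instead of learning.
class RecordingService
  attr_reader :trained

  def initialize
    @trained = []
  end

  def train(category, text)
    @trained << [category, text]
  end
end

# Mirrors the filtering in LoadDataset.call: only "spam"/"ham" rows with
# non-empty text reach the service, and the label is passed as a symbol.
def load_rows(rows, service)
  rows.each do |label, text|
    next unless %w(spam ham).include?(label)
    next if text.to_s.strip.empty? # plain-Ruby stand-in for blank?

    service.train(label.to_sym, text)
  end
end

rows = [
  ["category", "text"],                # header-like row, skipped by the label check
  ["spam", "Win a free phone"],
  ["ham", ""],                         # blank text, skipped
  ["ham", "Meeting moved to Tuesday"],
  ["other", "Unknown label, skipped"]
]

service = RecordingService.new
load_rows(rows, service)
puts service.trained.inspect

# The extension lookup in the diff strips the leading dot the same way.
puts File.extname("dataset.csv")[1..-1]
```

A header row needs no special handling in this sketch because its first cell is neither `spam` nor `ham`. Whether the real reader yields the header row at all is not visible in the diff.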