This repository has been archived by the owner. It is now read-only.
Automatically tag content on GOV.UK using machine learning (experiment, not live)
bin
data
lib
.gitignore
.rspec
.ruby-version
Gemfile
Gemfile.lock
Guardfile
LICENSE
README.md
Rakefile

AutoTagger

What is it?

This is an experimental automatic content tagger for GOV.UK pages. It is built on the Ankusa gem and classifies pages with the naive Bayes algorithm.

It attempts to determine correct tags for a page by learning from other, manually tagged pages.
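
To make the idea of learning from manually tagged pages concrete, here is a minimal naive Bayes sketch in plain Ruby. It is an illustration only and is not how Ankusa or this repository implements classification: the class name TinyNaiveBayes, the tokenizer, the add-one smoothing and the example tags are all assumptions made for this example.

```ruby
# Minimal multinomial naive Bayes sketch (illustrative; not Ankusa's code).
class TinyNaiveBayes
  def initialize
    @word_counts = Hash.new { |hash, tag| hash[tag] = Hash.new(0) } # tag => word => count
    @doc_counts  = Hash.new(0)                                      # tag => number of training pages
    @vocabulary  = {}                                               # every word seen in training
  end

  # Record the words of one manually tagged page under its tag.
  def train(tag, text)
    @doc_counts[tag] += 1
    tokenize(text).each do |word|
      @word_counts[tag][word] += 1
      @vocabulary[word] = true
    end
  end

  # Return the tag with the highest log-probability for the text,
  # or nil if nothing has been trained yet.
  def classify(text)
    total_docs = @doc_counts.values.sum.to_f
    words = tokenize(text)

    scores = @doc_counts.keys.map do |tag|
      tag_total = @word_counts[tag].values.sum
      # Prior: share of training pages that carry this tag.
      log_prob = Math.log(@doc_counts[tag] / total_docs)
      words.each do |word|
        # Add-one (Laplace) smoothing so unseen words do not zero out the score.
        log_prob += Math.log((@word_counts[tag][word] + 1.0) / (tag_total + @vocabulary.size))
      end
      [tag, log_prob]
    end

    scores.max_by { |_tag, score| score }&.first
  end

  private

  # Assumed tokenizer: lowercase ASCII words only.
  def tokenize(text)
    text.downcase.scan(/[a-z]+/)
  end
end

classifier = TinyNaiveBayes.new
classifier.train("tax", "income tax self assessment return deadline")
classifier.train("driving", "driving licence renewal vehicle test")

predicted = classifier.classify("how to renew a driving licence")
puts predicted
```

With only two training pages the prediction rests on a handful of word overlaps. A real tagger needs many manually tagged pages per tag before its guesses are worth trusting.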

How to use it?

To run the script locally, run ./bin/tag.rb file_name from the command line, where file_name is the path to your input file.

The file you pass to the script should be in CSV format with three columns: URL, tag, and content. For an example, see the sample_content.csv file.
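
As a rough illustration of the three-column layout, the sketch below parses two made-up rows with Ruby's standard CSV library. The rows, the example.com URLs and the assumption that there is no header row are all invented for this example. The real sample_content.csv may differ, and this is not the parsing code used by bin/tag.rb.

```ruby
require "csv"

# Hypothetical rows in URL, tag, content order (no header row assumed).
# Content that contains commas has to be quoted.
sample_csv = <<~ROWS
  https://example.com/tax-page,tax,"Income tax rates, allowances and deadlines"
  https://example.com/driving-page,driving,Renew your driving licence online
ROWS

# Turn each row into a hash with named fields.
pages = CSV.parse(sample_csv).map do |url, tag, content|
  { url: url, tag: tag, content: content }
end

# Group page content by tag, which is the shape a classifier would train on.
content_by_tag = pages
  .group_by { |page| page[:tag] }
  .transform_values { |rows| rows.map { |row| row[:content] } }

pages.each { |page| puts "#{page[:tag]}: #{page[:url]}" }
```

To read from a file instead of a string, CSV.foreach(path) yields one row at a time in the same column order.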

How to run the tests?

Run rspec from the command line. The tests have not been written yet, so this will only be useful once they are.

License

See the LICENSE file.