Experiments to extract taxonomy concepts from historical search queries performed on GOV.UK.
Requirements:
- Python 3
- Google Analytics search report CSV
- Inventory CSV
Clone the repository and install the dependencies in a virtual environment:
$ git clone git@github.com:alphagov/govuk-search-concepts-experiments.git
$ cd govuk-search-concepts-experiments
$ python3 -m venv .venv
$ source .venv/bin/activate
$ pip install -r requirements.txt

Extract search terms applicable to the inventory:
$ ./extract_inventory_searches /path/to/inventory.csv /path/to/search-data.csv | tee /dev/tty > terms.csv

Cluster search terms based on term frequency–inverse document frequency (TF-IDF):
$ ./cluster_search_terms 100 terms.csv | tee /dev/tty > clusters.txt
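
To illustrate what clustering on term frequency–inverse document frequency means, here is a minimal sketch using only the Python standard library. It is not the code in `cluster_search_terms`: the function names, the sample queries, the smoothed IDF formula, and the choice of a small k-means loop over cosine similarity are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(queries):
    """Return one L2-normalised {word: weight} dict per query (hypothetical helper)."""
    docs = [Counter(q.lower().split()) for q in queries]
    n = len(docs)
    # Document frequency: the number of queries that contain each word.
    df = Counter(word for doc in docs for word in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        # Term frequency times a smoothed inverse document frequency (assumed formula).
        vec = {
            w: (c / total) * (math.log((1 + n) / (1 + df[w])) + 1)
            for w, c in doc.items()
        }
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({w: v / norm for w, v in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two sparse vectors; equals cosine similarity when both are normalised."""
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def kmeans(vectors, k, iterations=10):
    """Group vectors into at most k clusters; returns one cluster label per vector."""
    # Deterministic start for the sketch: the first k vectors are the initial centroids.
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its most similar centroid (ties go to the lowest index).
        labels = [
            max(range(len(centroids)), key=lambda i: cosine(v, centroids[i]))
            for v in vectors
        ]
        # Move each centroid to the normalised mean of its members.
        for i in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == i]
            if not members:
                continue
            mean = defaultdict(float)
            for v in members:
                for word, weight in v.items():
                    mean[word] += weight / len(members)
            norm = math.sqrt(sum(w * w for w in mean.values()))
            if norm:
                centroids[i] = {word: w / norm for word, w in mean.items()}
    return labels


# Made-up sample queries, not taken from real GOV.UK search data.
queries = [
    "passport renewal",
    "council tax bands",
    "renew passport",
    "council tax rebate",
]
labels = kmeans(tfidf_vectors(queries), k=2)
for query, label in zip(queries, labels):
    print(label, query)
```

Words that appear in many queries receive a lower weight than rare ones, so queries end up grouped by the distinctive words they share. A real run over thousands of search terms would normally use a library implementation and a better centroid initialisation than "first k vectors".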