# Cotiss Takehome Project

## Overview

In Australia and New Zealand alone there are over 40,000 government contracts posted each year. Most contracts share common metadata, such as the date they were published, their associated category and the region in which the good or service is being procured. In addition, they contain free text that describes the details of the tender.

Whilst filtering tenders by their metadata reduces the volume of contracts SMEs have to evaluate, the number remaining is still infeasible to review. To make these contracts accessible to SMEs, each tender must be searchable via a free text search. This means that the content of the tender can be processed, so that relevant businesses can be notified.

A key issue in searching for listings is that those searching often don't know what phrases to search for. An improvement on plain-text search is an autocomplete feature that suggests what the user could search for as they type. You have probably experienced this when doing a Google search.

## The task

For this task, we have collated a file _keywords.txt_ containing the content of thousands of public tenders posted through Government portals. Your job is to create a model that extracts the common keywords and key-phrases from this text, so it can be used to create search suggestions.

Your approach may:
- Strip nonsensical words such as website links from the text.
- Index each key-phrase with a count of its occurrences.
- Remove stopwords that don't provide any context on the tender itself.
- Filter out all the junk e.g. "morrinsvilletahuna road matamata piako district council boundary bridge".
- Trim key phrases that start with common words e.g. "the", "their", "your", "a".

These are just suggestions. You may choose not to follow any of these approaches or follow them all - it's up to you.
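
As one possible starting point, the suggestions above can be sketched with the Python standard library alone. This is a minimal illustration, not a recommended solution: the stopword list is a tiny placeholder, the names `tokenize` and `count_phrases` are made up for this example, and the sample sentences are invented rather than taken from _keywords.txt_.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list; a real run would need a fuller one.
STOPWORDS = {"the", "their", "your", "a", "an", "of", "for", "and", "to", "in", "on", "with"}
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase the text, drop website links, and return alphabetic tokens."""
    text = URL_RE.sub(" ", text.lower())
    return TOKEN_RE.findall(text)


def count_phrases(lines, max_n=3):
    """Count n-grams of 1..max_n words that neither start nor end with a stopword."""
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue  # skips phrases such as "the district council"
                counts[" ".join(gram)] += 1
    return counts


# Invented sample text, for illustration only.
sample = [
    "Supply of road maintenance services, see https://example.com/tender",
    "Road maintenance services for the district council",
]
counts = count_phrases(sample)

# Dropping phrases seen only once is one crude way to filter out junk.
frequent = {phrase: n for phrase, n in counts.items() if n >= 2}
```

Here `frequent` keeps phrases such as "road maintenance services" that appear in both sample lines, while one-off strings fall away. To run this on the provided file, something like `count_phrases(open("keywords.txt", encoding="utf-8"))` may work, assuming the file holds one tender per line.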

We are not asking you to implement the search component itself. This project is open ended and designed to provide a platform to show your creativity, coding and problem-solving skills. There is also no strict requirement on what the final solution should look like. There are no right or wrong answers so don't overthink it - just state your assumptions and justification for the model you think works best. If you have any questions, please reach out to matthew.oh@cotiss.com and ask away - we're here to help!

Your approach may also include external services in addition to Python. However, if you do so, please state which service was used and how you used it. An example might be using AWS Comprehend to extract key phrases.

## Output

- Once the model has been produced, it should show the output when run on the file provided.
- Graphical analysis to help interpret the results.
- Suggestions on how this data could be combined with the search input to produce autocomplete suggestions. For example: "We recommend using n-grams, weighted by their frequency of occurrence, to produce suggestions".
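
To make the example recommendation concrete, here is a rough sketch of frequency-weighted prefix matching over phrase counts. The function name `suggest` and the counts are hypothetical, made up purely for illustration; it is only one of many ways the extracted data could feed an autocomplete.

```python
from collections import Counter


def suggest(prefix, phrase_counts, limit=5):
    """Return up to `limit` phrases starting with `prefix`, most frequent first."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    matches = [(p, c) for p, c in phrase_counts.items() if p.startswith(prefix)]
    # Sort by descending count, then alphabetically so ties have a stable order.
    matches.sort(key=lambda pc: (-pc[1], pc[0]))
    return [p for p, _ in matches[:limit]]


# Hypothetical counts, not derived from the provided file.
example_counts = Counter({
    "road maintenance": 12,
    "road safety audit": 7,
    "roofing": 3,
    "waste collection": 9,
})

suggestions = suggest("roa", example_counts)
```

This version scans every phrase on each call, which is fine for a prototype. A trie or a sorted list searched with `bisect` would likely scale better for a large phrase index.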
