Retrieve data from a public API, then train the Document Classifier to predict keywords for new text samples
Algorithmia's Document Classifier is trained on a set of documents (blocks of text), each associated with a keyword. Once it has been trained, you can give it a new document and it will return a set of predicted keywords.
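To make the train-then-predict workflow concrete, here is a toy, offline stand-in written in plain Python. It is not Algorithmia's model or API: the class ToyKeywordClassifier and its word-overlap scoring are hypothetical illustrations, assuming training data arrives as (text, keyword) pairs.

```python
from __future__ import annotations

import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters; a deliberately simple tokenizer.
    return re.findall(r"[a-z]+", text.lower())


class ToyKeywordClassifier:
    """Hypothetical toy stand-in for a document classifier (NOT Algorithmia's model)."""

    def __init__(self) -> None:
        # keyword -> word counts aggregated over every document tagged with it
        self.profiles: dict[str, Counter] = defaultdict(Counter)

    def train(self, documents: list[tuple[str, str]]) -> None:
        # Each training example is a (text, keyword) pair.
        for text, keyword in documents:
            self.profiles[keyword].update(tokenize(text))

    def predict(self, text: str, top_n: int = 3) -> list[tuple[str, float]]:
        # Score each keyword by the share of its profile's word count that
        # appears in the new document, then return the highest-scoring keywords.
        words = set(tokenize(text))
        scores = []
        for keyword, counts in self.profiles.items():
            total = sum(counts.values())
            overlap = sum(counts[w] for w in words)
            scores.append((keyword, overlap / total if total else 0.0))
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:top_n]


clf = ToyKeywordClassifier()
clf.train([
    ("insulin resistance and blood glucose levels", "diabetes"),
    ("tumor growth and chemotherapy response", "cancer"),
])
print(clf.predict("glucose and insulin in patients", top_n=1))  # prints [('diabetes', 0.5)]
```

The word-overlap scoring only keeps the sketch short; the hosted classifier uses its own model, so treat this as an illustration of the input and output shapes rather than of how predictions are actually computed.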
For the full blog post related to this recipe, see https://blog.algorithmia.com/acquiring-data-for-document-classification/.
Create a free Algorithmia account, then install the Algorithmia Python client and BeautifulSoup:
pip install algorithmia
pip install beautifulsoup4
Detailed instructions can be found in the blog post.
How To Run the Script
First, edit the script and replace your_api_key with your Algorithmia API key.
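If you would rather not paste the key into the script, a minimal sketch is to read it from an environment variable. The variable name ALGORITHMIA_API_KEY and the helper get_api_key are hypothetical choices, not something the sample script defines.

```python
import os


def get_api_key(env_var: str = "ALGORITHMIA_API_KEY") -> str:
    # Read the key from an environment variable instead of hard-coding it.
    # ALGORITHMIA_API_KEY is a hypothetical name; use whatever suits your setup.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Algorithmia API key"
        )
    return key
```

You would then pass get_api_key() wherever the script currently uses your_api_key, for example when creating the client with Algorithmia.client(...), assuming the script constructs its client that way.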
From the command line, navigate to the folder containing your Python file and run:
This sample uses PubMed data. To go further, modify the script to use a different data source API or a web page scraper such as https://algorithmia.com/algorithms/web/HTMLDataExtractor
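Whatever source you switch to, it needs to yield documents paired with keywords. As a reference point, here is a sketch of a PubMed retrieval step using NCBI's E-utilities (efetch.fcgi returns PubMed records as XML; esearch.fcgi can supply the PMIDs for a query). This is an assumption about one way to do it, not the sample script's actual query or parsing; the helper names are hypothetical, and the tag names reflect the PubMed XML layout as I understand it (AbstractText under the article's Abstract, Keyword under KeywordList).

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(pmids: list) -> str:
    # Build an efetch URL that returns full PubMed records as XML.
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EUTILS}/efetch.fcgi?{urllib.parse.urlencode(params)}"


def fetch_xml(url: str) -> str:
    # Network call; not exercised by the offline usage below.
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_articles(xml_text: str) -> list:
    # Emit one (abstract, keyword) pair per author-supplied keyword, so each
    # training document is associated with a single keyword.
    pairs = []
    root = ET.fromstring(xml_text)
    for article in root.iter("PubmedArticle"):
        abstract = " ".join(
            "".join(node.itertext()) for node in article.iter("AbstractText")
        ).strip()
        if not abstract:
            continue  # skip records without an abstract
        for kw in article.iter("Keyword"):
            keyword = "".join(kw.itertext()).strip()
            if keyword:
                pairs.append((abstract, keyword))
    return pairs


# Offline usage with a tiny hand-written record in PubMed-style XML.
sample = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <Article>
        <Abstract><AbstractText>Insulin lowers blood glucose.</AbstractText></Abstract>
      </Article>
      <KeywordList><Keyword>diabetes</Keyword><Keyword>insulin</Keyword></KeywordList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""
print(parse_articles(sample))
```

Swapping in another API or a scraper then only means replacing the fetch and parse functions, as long as the output stays a list of (text, keyword) pairs.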