
Retrieve data from a public API, then train the Document Classifier to predict keywords for new text samples

Algorithmia's Document Classifier is trained on a set of documents (blocks of text), each associated with a keyword. Once it has been trained, you can give it a new document and it will return a set of predicted keywords.
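To make the "documents paired with keywords" idea concrete, here is a minimal sketch of how training samples could be assembled before they are sent to the classifier. The field names (`mode`, `namespace`, `data`, `text`, `label`), the `build_training_payload` helper, and the sample sentences are all illustrative assumptions, not the algorithm's confirmed input format; check the Document Classifier's page on Algorithmia for the real schema.

```python
import json


def build_training_payload(samples, namespace):
    """Pair each document with its keyword in a JSON-serializable dict.

    NOTE: the keys used here are hypothetical placeholders, not the
    confirmed input schema of the Document Classifier.
    """
    data = []
    for text, keyword in samples:
        text = text.strip()
        if not text:
            continue  # skip documents that are empty after trimming
        data.append({"text": text, "label": keyword})
    return {"mode": "train", "namespace": namespace, "data": data}


# Made-up (document, keyword) pairs, for illustration only.
samples = [
    ("Aspirin lowered the rate of recurrent stroke.", "stroke"),
    ("   ", "empty"),
    ("Insulin therapy improved glycemic control.", "diabetes"),
]

# The namespace value is a hypothetical storage location.
payload = build_training_payload(samples, "data://.my/classifier")
print(json.dumps(payload, indent=2))
```

A payload like this would then be passed to the algorithm through the Algorithmia Python client, for example with `client.algo(...).pipe(payload)`, using the algorithm path given in the blog post.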

For the full blog post related to this recipe, see https://blog.algorithmia.com/acquiring-data-for-document-classification/.

Getting Started

Create a free Algorithmia account, and install the Algorithmia Python client and BeautifulSoup:

pip install algorithmia
pip install beautifulsoup4

Detailed instructions can be found in the blog post.

How To Run the Script

First, edit the script and replace your_api_key with your Algorithmia API key.

Then, from the command line, navigate to the folder containing the script and run:

python document-classification.py

This sample uses PubMed data. To go further, modify the script to use a different data source API or a webpage scraper such as https://algorithmia.com/algorithms/web/HTMLDataExtractor.
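When swapping in another data source, the part that usually changes is how the request URLs are built. The sketch below shows that step for PubMed, assuming the public NCBI E-utilities endpoints (`esearch.fcgi` to find article IDs, `efetch.fcgi` to download records). The script in this repo may build its requests differently; the helper names `esearch_url` and `efetch_url` are hypothetical.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities service (assumed data source).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retmax=20):
    """Build a URL that searches PubMed and returns matching IDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(pmids):
    """Build a URL that fetches the records for a list of PubMed IDs as XML."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


search = esearch_url("heart disease", retmax=5)
fetch = efetch_url(["1", "2"])
print(search)
print(fetch)
```

The XML returned by a fetch URL like this can be parsed with BeautifulSoup to pull out the text and keywords for each record. To target a different source, only these two helpers would need to change.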

Built With
