FastAPI application with full CRUD operations for the 20 Newsgroups dataset, backed by Elasticsearch.
Start the stack:
docker-compose -f docker-compose-elasticsearch.yml up -d
Check that the API is up:
curl http://localhost:8182/health
Interactive API docs: http://localhost:8182/docs
- Full CRUD: Create, Read, Update, Delete documents
- Advanced Search: Multi-field search with category/tag/author filters
- Real Data: Load the 20 Newsgroups dataset via scikit-learn
- Bulk Operations: Bulk document creation
- Analytics: Collection statistics and per-category breakdown
POST /documents - Create document
GET /documents/{id} - Get document
PUT /documents/{id} - Update document
DELETE /documents/{id} - Delete document
POST /documents/bulk - Bulk create
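The CRUD routes presumably translate into calls against Elasticsearch's document APIs. The sketch below shows one plausible mapping; it is an assumption, not the app's actual code. The helper name `es_call_for` is hypothetical, the index name is the `ELASTICSEARCH_INDEX` default, and treating PUT as a partial update through `_update` (rather than a full replace via `PUT /<index>/_doc/<id>`) is a guess.

```python
import json
from typing import Any, Optional

# Assumed index name: the README's ELASTICSEARCH_INDEX default.
INDEX = "newsgroups"


def es_call_for(method: str, doc_id: Optional[str] = None,
                body: Optional[dict[str, Any]] = None) -> tuple[str, str, Optional[str]]:
    """Hypothetical mapping from an API verb to an Elasticsearch REST call.

    Returns (HTTP method, Elasticsearch path, JSON body or None).
    """
    if method == "POST" and doc_id is None:
        # Create: let Elasticsearch generate the document ID.
        return "POST", f"/{INDEX}/_doc", json.dumps(body)
    if doc_id is None:
        raise ValueError(f"{method} requires a document id")
    if method == "GET":
        return "GET", f"/{INDEX}/_doc/{doc_id}", None
    if method == "PUT":
        # Assumption: partial update, so only the supplied fields change.
        return "POST", f"/{INDEX}/_update/{doc_id}", json.dumps({"doc": body})
    if method == "DELETE":
        return "DELETE", f"/{INDEX}/_doc/{doc_id}", None
    raise ValueError(f"unsupported method: {method}")
```

Bulk creation would more likely go through Elasticsearch's `_bulk` endpoint than through repeated single-document calls.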
GET /search?q=term&category=sci.space&limit=10 - Search documents (query text, optional category filter, result limit)
GET /search/categories - List all categories
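A multi-field search with filters usually maps onto an Elasticsearch `bool` query: a scored `multi_match` over the text fields plus non-scoring `term` filters. This is a sketch under assumptions: the field names come from the create-document example, `category`, `author` and `tags` are assumed to be keyword-mapped, the `author` and `tag` parameters and the title boost are hypothetical, and the app's real query may differ.

```python
from typing import Any, Optional


def build_search_query(q: str, category: Optional[str] = None,
                       author: Optional[str] = None, tag: Optional[str] = None,
                       limit: int = 10) -> dict[str, Any]:
    """Build a hypothetical Elasticsearch request body for GET /search."""
    filters: list[dict[str, Any]] = []
    # Exact-match filters narrow the results without affecting relevance scores.
    if category:
        filters.append({"term": {"category": category}})
    if author:
        filters.append({"term": {"author": author}})
    if tag:
        filters.append({"term": {"tags": tag}})
    return {
        "size": limit,
        "query": {
            "bool": {
                # Full-text match across title and body; title weighted 2x (an assumption).
                "must": [{"multi_match": {"query": q, "fields": ["title^2", "body"]}}],
                "filter": filters,
            }
        },
    }
```

Putting the filters in `filter` rather than `must` lets Elasticsearch cache them and keeps them out of the relevance score.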
POST /data/load-20newsgroups?subset=train&max_documents=1000 - Load the 20 Newsgroups dataset (dataset subset, maximum number of documents)
POST /data/load-sample - Load sample data
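scikit-learn's `fetch_20newsgroups` returns parallel arrays (`data`, `target`, `target_names`). A minimal sketch of turning them into an Elasticsearch `_bulk` request body follows. The helper names and the document field mapping are hypothetical; taking the title from the `Subject:` header is an assumption and yields an empty title if headers were stripped with `remove=("headers",)`.

```python
import json
from typing import Sequence


def extract_subject(text: str) -> str:
    """Return the value of the first 'Subject:' header line, or ''."""
    for line in text.splitlines():
        if line.lower().startswith("subject:"):
            return line.split(":", 1)[1].strip()
    return ""


def to_bulk_ndjson(texts: Sequence[str], targets: Sequence[int],
                   target_names: Sequence[str], index: str = "newsgroups") -> str:
    """Build a newline-delimited JSON body for the Elasticsearch _bulk API."""
    lines = []
    for text, target in zip(texts, targets):
        # Action line: index a new document (ID generated by Elasticsearch).
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: hypothetical mapping of a raw post to document fields.
        lines.append(json.dumps({
            "title": extract_subject(text),
            "body": text,
            "category": target_names[target],
        }))
    # The _bulk API requires every line, including the last, to end with "\n".
    return "".join(line + "\n" for line in lines)
```

With scikit-learn installed, something like `data = fetch_20newsgroups(subset="train")` followed by `to_bulk_ndjson(data.data[:1000], data.target[:1000], data.target_names)` would produce a body for up to 1000 posts.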
GET /analytics/stats - Collection statistics
GET /analytics/categories - Category breakdown
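A per-category breakdown is typically computed with a `terms` aggregation. The sketch below assumes `category` is a keyword field (if it is mapped as text with a keyword sub-field, use `category.keyword`); the helper names are hypothetical and the app may compute the breakdown differently.

```python
from typing import Any


def category_breakdown_query(max_categories: int = 20) -> dict[str, Any]:
    """Request body that returns no hits, only a terms aggregation on category."""
    return {
        "size": 0,
        "aggs": {"categories": {"terms": {"field": "category", "size": max_categories}}},
    }


def parse_breakdown(response: dict[str, Any]) -> dict[str, int]:
    """Flatten the aggregation buckets into {category: document count}."""
    buckets = response["aggregations"]["categories"]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}
```

The default `size` of 20 matches the number of newsgroup categories; a `terms` aggregation otherwise returns only its top 10 buckets.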
Load 500 training documents:
curl -X POST "http://localhost:8182/data/load-20newsgroups?subset=train&max_documents=500"
Search within sci.space:
curl "http://localhost:8182/search?q=space&category=sci.space&limit=5"
Create a document:
curl -X POST "http://localhost:8182/documents" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "New Discussion Topic",
    "body": "This is a test post...",
    "category": "sci.space",
    "author": "test_user",
    "tags": ["test", "space"]
  }'
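The same search and create calls can be made from Python with only the standard library. This is a sketch: the helper names are hypothetical, and the functions only build `urllib.request.Request` objects, so nothing is sent until you pass one to `urlopen` with the stack running.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8182"


def search_request(q: str, category: Optional[str] = None,
                   limit: int = 10) -> urllib.request.Request:
    """Build GET /search with the same query parameters as the curl example."""
    params: dict[str, Any] = {"q": q}
    if category:
        params["category"] = category
    params["limit"] = limit
    return urllib.request.Request(f"{BASE_URL}/search?{urllib.parse.urlencode(params)}")


def create_request(doc: dict[str, Any]) -> urllib.request.Request:
    """Build POST /documents with a JSON body."""
    return urllib.request.Request(
        f"{BASE_URL}/documents",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send a request: `with urllib.request.urlopen(search_request("space", "sci.space", 5)) as resp: print(json.load(resp))`.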
| Service | URL | Purpose |
|---|---|---|
| API | http://localhost:8182 | FastAPI app |
| Elasticsearch | http://localhost:9200 | Search engine |
| Kibana | http://localhost:5601 | Data visualization |
Set these in the docker-compose file or a .env file:
ELASTICSEARCH_PROTOCOL=http
ELASTICSEARCH_HOST=elasticsearch
ELASTICSEARCH_PORT=9200
ELASTICSEARCH_INDEX=newsgroups
LOG_LEVEL=INFO
DEFAULT_MAX_DOCUMENTS=1000
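One way an app might read these variables is sketched below. The `Settings` class and `load_settings` helper are hypothetical, and using the values listed above as defaults is an assumption. Note that `elasticsearch` is a Docker Compose service name; if you run the API outside Docker, set `ELASTICSEARCH_HOST=localhost`.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    protocol: str
    host: str
    port: int
    index: str
    log_level: str
    default_max_documents: int

    @property
    def elasticsearch_url(self) -> str:
        # e.g. "http://elasticsearch:9200" with the defaults below
        return f"{self.protocol}://{self.host}:{self.port}"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment (or a given mapping), with assumed defaults."""
    env = os.environ if env is None else env
    return Settings(
        protocol=env.get("ELASTICSEARCH_PROTOCOL", "http"),
        host=env.get("ELASTICSEARCH_HOST", "elasticsearch"),
        port=int(env.get("ELASTICSEARCH_PORT", "9200")),
        index=env.get("ELASTICSEARCH_INDEX", "newsgroups"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        default_max_documents=int(env.get("DEFAULT_MAX_DOCUMENTS", "1000")),
    )
```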
The API supports all 20 original newsgroup categories:
alt.atheism
comp.graphics
comp.os.ms-windows.misc
comp.sys.ibm.pc.hardware
comp.sys.mac.hardware
comp.windows.x
misc.forsale
rec.autos
rec.motorcycles
rec.sport.baseball
rec.sport.hockey
sci.crypt
sci.electronics
sci.med
sci.space
soc.religion.christian
talk.politics.guns
talk.politics.mideast
talk.politics.misc
talk.religion.misc
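If you want to reject unknown categories on create or update, a fixed set of the 20 names is enough. This is a hypothetical helper, not necessarily how the API validates input; the grouping by top-level hierarchy (`comp`, `rec`, `sci`, ...) is only for illustration.

```python
from collections import Counter

# The 20 categories listed above.
NEWSGROUP_CATEGORIES = frozenset({
    "alt.atheism", "comp.graphics", "comp.os.ms-windows.misc",
    "comp.sys.ibm.pc.hardware", "comp.sys.mac.hardware", "comp.windows.x",
    "misc.forsale", "rec.autos", "rec.motorcycles", "rec.sport.baseball",
    "rec.sport.hockey", "sci.crypt", "sci.electronics", "sci.med", "sci.space",
    "soc.religion.christian", "talk.politics.guns", "talk.politics.mideast",
    "talk.politics.misc", "talk.religion.misc",
})


def is_valid_category(category: str) -> bool:
    """True if the name is one of the 20 original newsgroups."""
    return category in NEWSGROUP_CATEGORIES


def top_level_counts() -> Counter:
    """Count categories per top-level hierarchy, e.g. 'comp' -> 5."""
    return Counter(name.split(".", 1)[0] for name in NEWSGROUP_CATEGORIES)
```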
- Docker & Docker Compose
- Python 3.13+ (for local development)
- scikit-learn (automatically installed)
Elasticsearch data is persisted in the Docker volume elasticsearch_data_newsgroups.