A Go application that combines an API service, Apache Kafka, and Elasticsearch to handle news synchronization and news retrieval.
Go/Golang || HTTP Service || Apache Kafka || Redis || Elasticsearch || Linux
1 Connect to different open news sources through the API service.
2 Normalize the data into a unified format.
3 Feed the data into Apache Kafka to push it downstream.
Example open news sources:
1. News API: https://www.juhe.cn/docs/api/id/235
2. 163News API: https://www.jianshu.com/p/c54e25349b77
3. TOPNEWS API: https://www.tianapi.com/apiview/99
4. GENERALNEWS API: https://www.tianapi.com/apiview/87
Unified Data Structure:
type StdNew struct {
	Timestamp string   `json:"timestamp"`
	Source    string   `json:"source"`
	Title     string   `json:"title"`
	Body      string   `json:"body"`
	URL       string   `json:"url"`
	Types     []string `json:"types"`
}
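The normalization step could look roughly like the sketch below, which maps one upstream item into StdNew. The rawItem type, its field names, the "2006-01-02 15:04" time layout, and the choice to treat upstream times as UTC are all assumptions made for illustration; each real source has its own payload and would need its own mapping.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// StdNew mirrors the unified data structure used by this project.
type StdNew struct {
	Timestamp string   `json:"timestamp"`
	Source    string   `json:"source"`
	Title     string   `json:"title"`
	Body      string   `json:"body"`
	URL       string   `json:"url"`
	Types     []string `json:"types"`
}

// rawItem is a hypothetical upstream payload; real field names differ per source.
type rawItem struct {
	Ctime       string `json:"ctime"`
	Title       string `json:"title"`
	Description string `json:"description"`
	URL         string `json:"url"`
	Category    string `json:"category"`
}

// normalize maps one raw item to StdNew. The timestamp is parsed with an
// assumed layout, interpreted as UTC, and re-emitted as RFC 3339.
func normalize(source string, in rawItem) (StdNew, error) {
	t, err := time.Parse("2006-01-02 15:04", in.Ctime)
	if err != nil {
		return StdNew{}, fmt.Errorf("parse ctime %q: %w", in.Ctime, err)
	}
	return StdNew{
		Timestamp: t.UTC().Format(time.RFC3339),
		Source:    source,
		Title:     strings.TrimSpace(in.Title),
		Body:      strings.TrimSpace(in.Description),
		URL:       in.URL,
		Types:     []string{in.Category},
	}, nil
}

func main() {
	// Sample payload invented for this sketch.
	raw := `{"ctime":"2020-03-01 08:30","title":" Sample headline ","description":"Sample body","url":"https://example.com/a","category":"general"}`
	var in rawItem
	if err := json.Unmarshal([]byte(raw), &in); err != nil {
		panic(err)
	}
	n, err := normalize("TOPNEWS", in)
	if err != nil {
		panic(err)
	}
	out, err := json.Marshal(n)
	if err != nil {
		panic(err)
	}
	// The marshalled JSON is the message that would be produced to Kafka.
	fmt.Println(string(out))
}
```

Keeping one normalize function per source behind a common signature would let the Kafka producer stay unaware of where an item came from.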
Kafka repository: https://github.com/confluentinc/confluent-kafka-go
API Crawling: https://www.tianapi.com/apiview/87
1 Consume data from Apache Kafka.
2 Clean the data, using Redis to filter out duplicate news items.
3 Create the index and write the data into Elasticsearch.
Open-source resources:
Content Moderation API:
1. Tencent Cloud: https://cloud.tencent.com/product/tms
2. Baidu Cloud: https://ai.baidu.com/tech/textcensoring
Others:
1. jieba: https://github.com/fxsjy/jieba
2. Sensitive words: https://github.com/jkiss/sensitive-words
3. Trie with Aho-Corasick automaton: https://github.com/ChisBread/TrieNAhoCorasick
4. Tencent Cloud NLP: https://cloud.tencent.com/product/nlp
5. simhash: https://blog.csdn.net/lengye7/article/details/79789206
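For orientation, the simhash technique listed above can be sketched in a few lines of Go. This is a minimal illustration, not the project's implementation: it splits on whitespace, gives every token a weight of 1, and hashes tokens with FNV-1a, all of which are assumptions. Chinese text has no whitespace word boundaries, so it would first need a segmenter such as jieba.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/bits"
	"strings"
)

// simhash computes a 64-bit SimHash over whitespace-separated tokens.
// Each token's hash votes +1 or -1 on every bit position; the sign of
// the total decides the corresponding output bit.
func simhash(text string) uint64 {
	var votes [64]int
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		h := fnv.New64a()
		h.Write([]byte(tok))
		v := h.Sum64()
		for i := 0; i < 64; i++ {
			if v&(uint64(1)<<uint(i)) != 0 {
				votes[i]++
			} else {
				votes[i]--
			}
		}
	}
	var out uint64
	for i, c := range votes {
		if c > 0 {
			out |= uint64(1) << uint(i)
		}
	}
	return out
}

// hamming counts the differing bits between two fingerprints.
func hamming(a, b uint64) int {
	return bits.OnesCount64(a ^ b)
}

func main() {
	// Sample sentences invented for this sketch.
	a := simhash("city council approves new budget for public transport")
	b := simhash("city council approves a new budget for public transport")
	c := simhash("local team wins championship after dramatic final")
	fmt.Println("near-duplicate distance:", hamming(a, b))
	fmt.Println("unrelated distance:", hamming(a, c))
}
```

Two items would be treated as near-duplicates when their Hamming distance falls below a threshold. The threshold has to be tuned on real data; no particular value is implied here.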
Redis repository: https://godoc.org/github.com/gomodule/redigo/redis#pkg-index
Elasticsearch repository: https://github.com/olivere/elastic
Library: https://godoc.org/
1 Implement back-end interfaces for news retrieval, news recommendation, and news timeline counts.
2 The front-end developer is responsible for the UI design.



