This repository has been archived by the owner on Jan 13, 2022. It is now read-only.
Come up with a solution for consuming crawler events #457
Labels
✨ goal: improvement
Improvement to an existing feature
🙅 status: discontinued
Not suitable for work as repo is in maintenance
We're reading metadata from images on a large scale and sticking it into some Kafka topics. We ought to start incorporating this data into the data layer so we can use it in CC Search. The format of the data is documented here. In summary:
- `image_metadata_updates` topic
- `link_rot` topic

This data can be produced continuously by the crawler, so we should prefer building streaming consumers over reading topics in batches.
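A streaming consumer along those lines could look like the sketch below. It is a minimal illustration only: the message field names (`identifier`, `metadata`) and the shape of the upsert payload are assumptions, since the real schema lives in the format docs referenced above. The iterable of raw message values stands in for a Kafka consumer (e.g. a `KafkaConsumer` subscribed to `image_metadata_updates`); here it is simulated in memory so the transform logic is testable without a broker.

```python
import json


def updates_from_stream(messages):
    """Consume an iterable of raw Kafka message values (bytes) and yield
    upsert payloads for the data layer.

    The field names below are hypothetical; the actual schema is defined
    in the crawler's format documentation.
    """
    for raw in messages:
        msg = json.loads(raw)
        yield {
            "identifier": msg["identifier"],
            "metadata": msg.get("metadata", {}),
        }


# In production the iterable would be a live consumer subscribed to the
# image_metadata_updates topic; an in-memory list stands in for it here.
sample = [b'{"identifier": "abc-123", "metadata": {"width": 1024}}']
for payload in updates_from_stream(sample):
    print(payload)
```

Keeping the per-message transform as a pure function like this makes it easy to unit-test independently of whichever consumer library (kafka-python, confluent-kafka, etc.) ends up being used.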
We know from experience now that dumping this into the meta_data column en masse is not a good option, so this is a good time to start thinking about alternatives.