ArticleSnapshot table rotation #81

Closed
pm5 opened this issue Feb 25, 2020 · 3 comments

pm5 commented Feb 25, 2020

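ArticleSnapshot should rotate across multiple tables. Once a table has been fully parsed, dump it as a backup and then drop it outright. That would be easier to manage.

  • Switch tables by article ID
    • discover steps:
      1. Write the new article and get back its article id
      2. Look up the snapshot table name from the article id
      3. Write the snapshot into that snapshot table
    • update steps:
      1. Fetch the article to update
      2. Look up the snapshot table name from the article id
      3. Write the snapshot into that snapshot table
  • Create roughly one year's worth of snapshot tables up front?
  • Need a way to check which tables can be dropped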
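
The discover and update steps above could be sketched roughly as follows. This is only an illustration under assumptions that are not in the issue: the table name prefix `ArticleSnapshot_`, the `ARTICLES_PER_TABLE` range size, and the in-memory `InMemoryStore` class standing in for the real database are all hypothetical.

```python
from collections import defaultdict
from itertools import count

# Hypothetical values, chosen only for illustration.
SNAPSHOT_TABLE_PREFIX = "ArticleSnapshot"
ARTICLES_PER_TABLE = 100_000


def snapshot_table_name(article_id: int) -> str:
    """Map an article id to the snapshot table covering its id range."""
    if article_id < 1:
        raise ValueError("article_id must be a positive integer")
    index = (article_id - 1) // ARTICLES_PER_TABLE
    return f"{SNAPSHOT_TABLE_PREFIX}_{index:04d}"


class InMemoryStore:
    """Stand-in for the database: dicts instead of real tables."""

    def __init__(self) -> None:
        self._next_id = count(1)
        self.articles = {}                        # article_id -> url
        self.snapshot_tables = defaultdict(list)  # table name -> rows

    def discover(self, url: str, snapshot: str) -> int:
        # 1. write the new article and get back its id
        article_id = next(self._next_id)
        self.articles[article_id] = url
        # 2. look up the snapshot table name from the id
        table = snapshot_table_name(article_id)
        # 3. write the snapshot into that table
        self.snapshot_tables[table].append((article_id, snapshot))
        return article_id

    def update(self, article_id: int, snapshot: str) -> str:
        # 1. fetch the article to update
        if article_id not in self.articles:
            raise KeyError(article_id)
        # 2. look up the snapshot table name from the id
        table = snapshot_table_name(article_id)
        # 3. write the snapshot into that table
        self.snapshot_tables[table].append((article_id, snapshot))
        return table
```

Because the table name depends only on the article id, every snapshot of an article lands in the same table, so a table can only be dropped once all articles in its id range are done being updated.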
pm5 self-assigned this Mar 11, 2020

pm5 commented Mar 26, 2020

Wrote a tentative scheme for table rotation over here. Partitioning is nice because it is completely done by the storage layer. The crawler pipeline does not have to know about it so there would be fewer code changes.
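
To make the partitioning idea concrete, here is a minimal sketch that only builds DDL strings. It assumes a MySQL/MariaDB-style `PARTITION BY RANGE` syntax, a partition column named `article_id`, and a partition size of 100,000 ids; none of these are stated in this issue, and the actual scheme linked above may differ. In MySQL the partitioning column also has to be part of every unique key on the table, which the sketch does not check.

```python
def partition_ddl(table: str, column: str, rows_per_partition: int, partitions: int) -> str:
    """Build an ALTER TABLE statement that range-partitions `table` on `column`.

    Partition p<i> holds ids up to (i + 1) * rows_per_partition inclusive;
    a final catch-all partition takes everything above that.
    """
    clauses = [
        f"PARTITION p{i} VALUES LESS THAN ({(i + 1) * rows_per_partition + 1})"
        for i in range(partitions)
    ]
    clauses.append("PARTITION pmax VALUES LESS THAN MAXVALUE")
    body = ",\n  ".join(clauses)
    return f"ALTER TABLE {table}\nPARTITION BY RANGE ({column}) (\n  {body}\n);"


def drop_partition_ddl(table: str, partition: str) -> str:
    """Build the statement that discards one partition and all rows in it."""
    return f"ALTER TABLE {table} DROP PARTITION {partition};"


if __name__ == "__main__":
    print(partition_ddl("ArticleSnapshot", "article_id", 100_000, 3))
    print(drop_partition_ddl("ArticleSnapshot", "p0"))
```

With a layout like this, the crawler keeps writing to a single `ArticleSnapshot` table, and retiring old snapshots becomes one statement against the storage layer after the partition has been dumped.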


pm5 commented Mar 29, 2020

Going to implement this by the end of March.


pm5 commented Apr 7, 2020

Since we do not yet have a mechanism to decide when old snapshots can be truncated, we will truncate them manually for the moment. This closes the issue.

pm5 closed this as completed Apr 7, 2020