Site checkpoints #1053

Open
HelloZeroNet opened this issue Jul 27, 2017 · 0 comments

@HelloZeroNet
Owner

commented Jul 27, 2017

Move older site data to a separate checkpoint file to reduce site size and file count, and to speed up the initial download.

Example

Checkpoint file: data/users/checkpoint-2017-07-27.json.zip

{
    "112GGMvUJbBTCtQu8UUSYpo8UjLdo1B73n/content-2017-07-27.json": {
        "cert_auth_type": "web",
        "cert_user_id": "qu363c@zeroid.bit"
    },
    "112GGMvUJbBTCtQu8UUSYpo8UjLdo1B73n/data-2017-07-27.json": {
        "comment": {
            "3_1CoVAt2jBzV4Na6nX3VG1JSfjS8fwse1Z9": [
                {
                    "comment_id": 1,
                    "body": "This is great!",
                    "added": 1469117689
                }
            ]
        }
    }
}

When the data/users/checkpoint-2017-07-27.json.zip file is downloaded (automatically, or manually triggered if it is an optional file), the DB layer acts as if it had just received the 112GGMvUJbBTCtQu8UUSYpo8UjLdo1B73n/content-2017-07-27.json and 112GGMvUJbBTCtQu8UUSYpo8UjLdo1B73n/data-2017-07-27.json files with the given content.
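A minimal sketch of that unpacking step, assuming the zip holds a single JSON document shaped like the example above. The function names and the `on_file_received` callback are hypothetical stand-ins for whatever hook the DB layer actually exposes; they are not part of ZeroNet.

```python
import io
import json
import zipfile


def iter_checkpoint_entries(zip_bytes):
    """Yield (inner_path, content) pairs from a checkpoint archive.

    Assumption: the archive contains one JSON document whose keys are
    inner paths and whose values are the archived file contents.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Assumption: the first archive member is the checkpoint JSON.
        member = archive.namelist()[0]
        checkpoint = json.loads(archive.read(member).decode("utf-8"))
    for inner_path, content in checkpoint.items():
        yield inner_path, content


def apply_checkpoint(zip_bytes, on_file_received):
    """Hand every archived file to the DB layer as if it had just arrived.

    on_file_received(inner_path, content) is a hypothetical callback.
    Returns the number of files that were replayed.
    """
    count = 0
    for inner_path, content in iter_checkpoint_entries(zip_bytes):
        on_file_received(inner_path, content)
        count += 1
    return count
```

Keeping the replay behind a callback means the same code path could serve both the automatic and the manually triggered download.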

Problem 1: Know which data is archived.

When a new update comes in for a file, the DB layer drops the old entries and inserts the new ones. So we have to know which entries belong to the checkpoint file and which ones are stored in the user's data.json.

Solution A: Change the filename

Archive 112GGMvUJbBTCtQu8UUSYpo8UjLdo1B73n/data.json file as 112GGMvUJbBTCtQu8UUSYpo8UjLdo1B73n/data-2017-07-27.json

Pros:

  • Does not require changes to the json table in the site's sqlite database file.

Cons:

  • Changes to dbschema.json and the site's SQL queries may be required.
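The renaming in Solution A could look roughly like the sketch below. Both helpers are hypothetical, and the sketch assumes the checkpoint date is always appended to the file name as `-YYYY-MM-DD` just before the extension, as in the example paths above.

```python
import posixpath
import re

# Matches a trailing "-YYYY-MM-DD" on a file name without its extension.
CHECKPOINT_SUFFIX = re.compile(r"-(\d{4}-\d{2}-\d{2})$")


def archived_path(inner_path, checkpoint_date):
    """Return the inner path a file would get when archived."""
    base, ext = posixpath.splitext(inner_path)
    return "%s-%s%s" % (base, checkpoint_date, ext)


def split_archived_path(inner_path):
    """Return (original_inner_path, checkpoint_date), date is None for live files."""
    base, ext = posixpath.splitext(inner_path)
    match = CHECKPOINT_SUFFIX.search(base)
    if match is None:
        return inner_path, None
    return base[:match.start()] + ext, match.group(1)
```

Because the archived file has its own inner path, an update to data.json never touches rows loaded from data-2017-07-27.json, which is why the json table itself can stay unchanged.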

Solution B

Add a new checkpoint column to the json table in the site's sqlite database that records whether the data comes from a checkpoint.

json_id  directory                           inner_path    checkpoint
1        12uZs7vLvb1b3LdmR3RyYvoCkEazYn162W  content.json
2        12uZs7vLvb1b3LdmR3RyYvoCkEazYn162W  data.json
3        12uZs7vLvb1b3LdmR3RyYvoCkEazYn162W  data.json     2017-07-27

So when an update comes in for data.json, only the entries with json_id 2 are deleted.

Pros:

  • Probably more compatible with the current data structure of sites.

Cons:

  • Requires re-creating the site's json table by reloading all data files.
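A rough sketch of how Solution B might behave, using an in-memory sqlite database. The comment table, its columns, and the helper functions are assumptions made for illustration; a real site's schema comes from its dbschema.json. A NULL checkpoint value marks a live file here.

```python
import sqlite3


def create_tables(db):
    """Create a simplified json table with the proposed checkpoint column."""
    db.executescript("""
        CREATE TABLE json (
            json_id    INTEGER PRIMARY KEY AUTOINCREMENT,
            directory  TEXT,
            inner_path TEXT,
            checkpoint TEXT  -- NULL for live files, else the checkpoint date
        );
        CREATE TABLE comment (
            comment_id INTEGER,
            body       TEXT,
            json_id    INTEGER REFERENCES json (json_id)
        );
    """)


def get_json_id(db, directory, inner_path, checkpoint=None):
    """Find or create the json row for this file and checkpoint."""
    row = db.execute(
        "SELECT json_id FROM json "
        "WHERE directory = ? AND inner_path = ? AND checkpoint IS ?",
        (directory, inner_path, checkpoint),
    ).fetchone()
    if row:
        return row[0]
    cur = db.execute(
        "INSERT INTO json (directory, inner_path, checkpoint) VALUES (?, ?, ?)",
        (directory, inner_path, checkpoint),
    )
    return cur.lastrowid


def update_comments(db, directory, inner_path, comments, checkpoint=None):
    """Drop the old rows for one json_id only, then insert the new ones."""
    json_id = get_json_id(db, directory, inner_path, checkpoint)
    db.execute("DELETE FROM comment WHERE json_id = ?", (json_id,))
    db.executemany(
        "INSERT INTO comment (comment_id, body, json_id) VALUES (?, ?, ?)",
        [(c["comment_id"], c["body"], json_id) for c in comments],
    )
    return json_id
```

Since the delete is scoped to a single json_id, replaying a live data.json update leaves rows that were loaded from the checkpoint in place.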

TODO

  • Sample files
  • Test parsing
  • Test updating
  • cpschema.json example
  • Generate checkpoint based on cpschema.json