Only store diff in database. #2

Open
junzhengca opened this issue Jan 26, 2017 · 2 comments

@junzhengca

Currently we store the full source for each fork, which wastes a lot of space.
Is there a way we could store only the diff? Since we know each fork's parent, we can apply the diff to the parent to reconstruct the user's code.

The problem with this approach is that if a fork has drifted too far from its base, reconstruction performance will become an issue.
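A minimal sketch of diff-only storage, assuming a file is a list of lines. The delta format and the names make_delta/apply_delta are hypothetical; difflib.SequenceMatcher from the standard library finds the matching runs, and the child is rebuilt from its parent by copying those runs and inserting the lines that are new:

```python
from difflib import SequenceMatcher

# Hypothetical delta format: a list of operations that rebuild the child from its parent.
#   ("copy", i1, i2)  -> reuse parent[i1:i2]
#   ("insert", lines) -> append lines that are new in the child
def make_delta(parent: list[str], child: list[str]) -> list[tuple]:
    """Compute a line-based delta that turns `parent` into `child`."""
    ops = []
    matcher = SequenceMatcher(a=parent, b=child, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", child[j1:j2]))
        # "delete": the parent lines are simply not copied.
    return ops


def apply_delta(parent: list[str], delta: list[tuple]) -> list[str]:
    """Rebuild the child text from the parent text and a stored delta."""
    out: list[str] = []
    for op in delta:
        if op[0] == "copy":
            out.extend(parent[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


base = ["def main():", "    print('hello')", ""]
fork = ["def main():", "    print('hello, fork')", "    return 0", ""]
delta = make_delta(base, fork)  # only this would be stored for the fork
assert apply_delta(base, delta) == fork
```

Only the delta is kept for the fork; the parent plus the delta gives back the user's code. The cost shows up when the parent itself is stored as a delta, because then the parent has to be rebuilt first.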

@junzhengca
Author

junzhengca commented Jan 26, 2017

[Image: graph of repositories and their save points]
In the graph above, bold text indicates a repository identifier and S indicates a commit identifier (save identifier). Each save stores only a diff rather than the full repository.

Let's say we now want to get 128dbece at save 58791a. We have to reconstruct the code by going through every save on the path: 8e0069cba(8a8d81) -> 8e0069cba(ab61f1) -> 87f294ed(5dc5e5) -> 87f294ed(1e5352) -> 93574bfc(7cbe7) -> 93574bfc(8c3ef) -> 93574bfc(7dea1) -> 93574bfc(5d3ea) -> 128dbece(75caeb) -> 128dbece(58791a)

This can be very slow as the tree grows, because every diff on the path has to be applied in order.
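The brute-force reconstruction described above can be sketched as follows. This assumes a hypothetical in-memory SaveStore in which every save keeps only a delta against its parent save (repository identifiers are left out; a fork's first save simply has its parent in the base repository). checkout walks from the requested save back to the root and then replays every delta in order, so its cost grows linearly with the length of the path (the example path above touches ten saves):

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


def make_delta(parent: list[str], child: list[str]) -> list[tuple]:
    """Line-based delta: ("copy", i1, i2) reuses parent lines, ("insert", lines) adds new ones."""
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=parent, b=child, autojunk=False).get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", child[j1:j2]))
    return ops


def apply_delta(parent: list[str], delta: list[tuple]) -> list[str]:
    out: list[str] = []
    for op in delta:
        out.extend(parent[op[1]:op[2]] if op[0] == "copy" else op[1])
    return out


@dataclass
class Save:
    save_id: str
    parent: Optional[str]  # None for the very first save
    delta: list            # delta against the parent's content (root: against an empty file)


class SaveStore:
    """Hypothetical in-memory store in which every save keeps only a delta."""

    def __init__(self) -> None:
        self.saves: dict[str, Save] = {}

    def commit(self, save_id: str, parent: Optional[str], content: list[str]) -> None:
        base = self.checkout(parent)[0] if parent is not None else []
        self.saves[save_id] = Save(save_id, parent, make_delta(base, content))

    def checkout(self, save_id: str) -> tuple[list[str], int]:
        """Brute force: walk back to the root, then replay every delta in order.

        Returns the reconstructed content and the number of deltas applied.
        """
        chain = []
        cur: Optional[str] = save_id
        while cur is not None:
            save = self.saves[cur]
            chain.append(save)
            cur = save.parent
        content: list[str] = []
        for save in reversed(chain):  # oldest save first
            content = apply_delta(content, save.delta)
        return content, len(chain)
```

For example, after committing a1 (root), a2 (parent a1) and a fork save b1 (parent a2), store.checkout("b1") replays three deltas; every additional save on the path adds one more.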

If you have time, could you think about a solution that optimizes this?

Don't suggest Git; it is slow here because it only saves the most recent copy. We must guarantee that every save point can be reconstructed quickly.
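One common way to bound that cost, not something proposed in this thread, is periodic full snapshots (checkpointing): store a complete copy every K saves along a chain and deltas in between, so any save needs at most K - 1 deltas on top of its nearest snapshot. A sketch under the same assumptions as before (files as lists of lines, hypothetical names, K = 4 chosen arbitrarily):

```python
from difflib import SequenceMatcher
from typing import Optional


def make_delta(parent: list[str], child: list[str]) -> list[tuple]:
    """Line-based delta: ("copy", i1, i2) reuses parent lines, ("insert", lines) adds new ones."""
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=parent, b=child, autojunk=False).get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", child[j1:j2]))
    return ops


def apply_delta(parent: list[str], delta: list[tuple]) -> list[str]:
    out: list[str] = []
    for op in delta:
        out.extend(parent[op[1]:op[2]] if op[0] == "copy" else op[1])
    return out


class CheckpointedStore:
    """Hypothetical store: a full snapshot every `interval` saves along a chain, deltas in between."""

    def __init__(self, interval: int = 4) -> None:
        if interval < 1:
            raise ValueError("interval must be at least 1")
        self.interval = interval
        self.parent: dict[str, Optional[str]] = {}
        self.snapshot: dict[str, list[str]] = {}  # save_id -> full content
        self.delta: dict[str, list[tuple]] = {}   # save_id -> delta against its parent
        self.since: dict[str, int] = {}           # deltas since the nearest snapshot

    def commit(self, save_id: str, parent: Optional[str], content: list[str]) -> None:
        self.parent[save_id] = parent
        if parent is None or self.since[parent] + 1 >= self.interval:
            # Chain got too long (or this is a root): store a full copy.
            self.snapshot[save_id] = list(content)
            self.since[save_id] = 0
        else:
            base, _ = self.checkout(parent)
            self.delta[save_id] = make_delta(base, content)
            self.since[save_id] = self.since[parent] + 1

    def checkout(self, save_id: str) -> tuple[list[str], int]:
        """Walk back only to the nearest snapshot, then replay at most interval - 1 deltas."""
        chain = []
        cur: Optional[str] = save_id
        while cur not in self.snapshot:
            chain.append(cur)
            cur = self.parent[cur]
        content = list(self.snapshot[cur])
        for sid in reversed(chain):  # oldest delta first
            content = apply_delta(content, self.delta[sid])
        return content, len(chain)
```

The trade-off is one full copy per K saves in exchange for a fixed upper bound on reconstruction work, no matter how deep the tree of forks becomes; K can be tuned against storage cost.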

@junzhengca
Author

I will implement the brute-force method for now, but we will definitely have to optimize it in the future.

4 participants