Only store diff in database. #2

junzhengca · 2017-01-26T06:09:08Z

Currently we store the full source for each fork, which wastes a lot of storage.
Is there a way we could store only the diff? Since we know the parent, we can apply the diff to it and give the user their code.

The problem is that if a fork is many saves away from the base, reconstructing it will be slow.

junzhengca · 2017-01-26T06:35:58Z

In the graph above, bold text indicates the repository identifier and S indicates a commit identifier (save identifier). Each save stores only a diff rather than the full repository.

Let's say we now want to get 128dbece at save 58791a. We have to reconstruct the code by going through: 8e0069cba(8a8d81) -> 8e0069cba(ab61f1) -> 87f294ed(5dc5e5) -> 87f294ed(1e5352) -> 93574bfc(7cbe7) -> 93574bfc(8c3ef) -> 93574bfc(7dea1) -> 93574bfc(5d3ea) -> 128dbece(75caeb) -> 128dbece(58791a)

This can be very slow as the tree grows, because every save on the path has to be applied in turn.
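A minimal sketch of the brute-force reconstruction described above, written in Python as an assumption (the issue does not say what language the project uses). It assumes each save record holds its parent's save id plus a line-based diff against that parent. `make_diff`, `apply_diff`, `reconstruct`, the save ids, and the `(parent_id, diff)` tuple layout are all hypothetical, and the diff format built on `difflib.SequenceMatcher` is just one possible choice. Reading a save replays every diff from the root, so the cost grows with the length of the chain.

```python
from difflib import SequenceMatcher


def make_diff(old, new):
    """Keep only the changed regions of `old` as (start, end, replacement_lines)."""
    ops = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_diff(old, diff):
    """Rebuild the newer version from `old` and a diff produced by make_diff."""
    out, pos = [], 0
    for i1, i2, replacement in diff:
        out.extend(old[pos:i1])  # unchanged lines before this change
        out.extend(replacement)  # new lines that replace old[i1:i2]
        pos = i2
    out.extend(old[pos:])  # unchanged tail
    return out


def reconstruct(saves, save_id):
    """Brute force: follow parent links to the root, then replay every diff in order."""
    chain = []
    while save_id is not None:
        parent_id, diff = saves[save_id]
        chain.append(diff)
        save_id = parent_id
    lines = []
    for diff in reversed(chain):  # root first, target save last
        lines = apply_diff(lines, diff)
    return lines


# Hypothetical save ids; the root save is stored as a diff against an empty file.
v1 = ["a", "b", "c"]
v2 = ["a", "B", "c", "d"]
saves = {
    "s1": (None, make_diff([], v1)),
    "s2": ("s1", make_diff(v1, v2)),
}
print(reconstruct(saves, "s2"))
```

Reading `s2` here applies two diffs; reading a save ten hops from the root would apply ten, which is the slowdown the path above illustrates.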

If you guys have time, can you think about a solution that can optimize this?

Don't suggest Git: its packfiles keep only the most recent version of a file intact and store older versions as deltas, so reconstructing old saves is slow. We must guarantee that every save point can be reconstructed quickly.
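One common way to bound the cost, sketched here as an assumption rather than as this project's design: write a full snapshot whenever a save would otherwise sit more than `max_chain` diffs away from its nearest snapshot ancestor. Any save, including one deep inside a fork, can then be read by replaying at most `max_chain` diffs, at the price of some extra storage. `SaveStore`, `max_chain`, and the record layout are hypothetical; the diff helpers use a simple line-based format built on `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def make_diff(old, new):
    """Keep only the changed regions of `old` as (start, end, replacement_lines)."""
    ops = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_diff(old, diff):
    """Rebuild the newer version from `old` and a diff produced by make_diff."""
    out, pos = [], 0
    for i1, i2, replacement in diff:
        out.extend(old[pos:i1])
        out.extend(replacement)
        pos = i2
    out.extend(old[pos:])
    return out


class SaveStore:
    """Hypothetical store: each save is a diff against its parent, except that a
    full snapshot is written once `max_chain` diffs have piled up since the last one."""

    def __init__(self, max_chain=4):
        self.max_chain = max_chain
        self.saves = {}  # save_id -> record dict

    def add(self, save_id, parent_id, lines):
        since = 0 if parent_id is None else self.saves[parent_id]["since_snapshot"] + 1
        if parent_id is None or since > self.max_chain:
            # Root save, or the diff chain is already long: store this save in full.
            self.saves[save_id] = {"parent": parent_id, "since_snapshot": 0,
                                   "snapshot": list(lines)}
        else:
            base = self.get(parent_id)
            self.saves[save_id] = {"parent": parent_id, "since_snapshot": since,
                                   "diff": make_diff(base, list(lines))}

    def get(self, save_id):
        """Walk back to the nearest snapshot, then replay at most max_chain diffs."""
        diffs, record = [], self.saves[save_id]
        while "snapshot" not in record:
            diffs.append(record["diff"])
            record = self.saves[record["parent"]]
        lines = list(record["snapshot"])
        for diff in reversed(diffs):
            lines = apply_diff(lines, diff)
        return lines


# Usage: a linear history of 10 saves, then a fork off the middle of it.
store = SaveStore(max_chain=3)
parent, lines = None, ["header"]
for i in range(10):
    lines = lines + [f"line {i}"]
    store.add(f"s{i}", parent, lines)
    parent = f"s{i}"
store.add("fork", "s5", ["forked"])
print(store.get("fork"))
```

With `max_chain=3`, saves `s0` and `s4` land as full snapshots and every other save is at most three diffs away from one. Raising `max_chain` saves storage; lowering it makes reads faster.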

junzhengca · 2017-01-26T06:41:37Z

I will implement the brute-force method for now, but we will definitely have to optimize it in the future.

junzhengca added the help wanted label Jan 26, 2017

junzhengca assigned TommyX12, junzhengca, Spyguy001 and EvW1998 Jan 26, 2017
