Re-implement the stream name to stream number lookups #84

ghost · 2016-09-08T08:43:24Z

The design goal was for Stage 2 of the script's internal processing to execute no Accurev commands, because all the information needed to create the converted Git branches should already be stored under refs/ac2git/depots/ in the Git repository. The intent was that you could re-process the information downloaded from Accurev, either to tailor it to your situation or to apply bug fixes made to the script to your repository.

However, the process that converts Accurev stream names to Accurev stream numbers from the information under refs/ac2git/depots/ is slow and error-prone, so it should be revised.

A fix for this issue could break backwards compatibility with refs/ac2git/depots/ data produced by previous versions of the script, forcing anyone who wants to use the fix to re-run the entire conversion. A migration path should be considered alongside the fix.

ghost · 2016-09-08T08:47:44Z

Currently the process creates a JSON object that maps stream names to stream numbers under the refs/ac2git/cache/... ref.
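
As a rough illustration, such a cache could be a flat JSON object keyed by stream name. The exact layout stored under the cache ref is not shown in this thread, so the shape, the stream names and the `load_name_cache` helper below are all assumptions.

```python
import json

# Hypothetical cache contents: stream name -> stream number.
# The real layout stored under refs/ac2git/cache/... may differ.
cache_text = '{"Dev": 2, "Dev_Feature": 7, "Release": 3}'


def load_name_cache(text):
    """Parse the JSON cache into a dict of stream name -> stream number."""
    raw = json.loads(text)
    return {str(name): int(number) for name, number in raw.items()}


name_to_number = load_name_cache(cache_text)
```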

ghost · 2016-09-23T11:21:12Z

This has been partially addressed in commit dfac3f4.

When an Accurev session is already available, we use Accurev because it is quicker than a first-time lookup. If Accurev is unavailable, we rely purely on our Git structures to figure it out.

ghost · 2016-09-23T11:25:56Z

Looking up the stream name in the cache should be quicker than asking Accurev, and asking Accurev is quicker than figuring it out for the first time using Git, so we should prefer Accurev over the search. The optimal search order is therefore:

1. Cache
2. Accurev
3. Search through Git history

However, currently it is:

1. Accurev
2. Cache
3. Search

So these lines should be moved to here to optimize the current pattern.
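
The preferred order (cache, then Accurev, then the Git history search) could be sketched roughly as follows. This is not the script's actual code: `lookup_stream_number`, `ask_accurev` and `search_git` are hypothetical names, and the two callables stand in for whatever the script really does to query Accurev or walk refs/ac2git/depots/.

```python
def lookup_stream_number(name, cache, ask_accurev=None, search_git=None):
    """Resolve a stream name using the cheapest source first.

    cache       -- dict of stream name -> stream number (updated in place)
    ask_accurev -- hypothetical callable(name) -> int or None; pass None
                   when no Accurev session is available
    search_git  -- hypothetical callable(name) -> int or None that searches
                   the history under refs/ac2git/depots/
    """
    # 1. Cache: a dictionary hit needs no external work at all.
    if name in cache:
        return cache[name]

    # 2. Accurev, then 3. the Git history search, skipping absent sources.
    for source in (ask_accurev, search_git):
        if source is None:
            continue
        number = source(name)
        if number is not None:
            cache[name] = number  # remember the answer for next time
            return number
    return None


# Usage with stand-in sources that record which one was consulted.
calls = []


def fake_accurev(name):
    calls.append("accurev")
    return {"Dev": 2}.get(name)


def fake_git(name):
    calls.append("git")
    return {"Old_Name": 5}.get(name)


cache = {"Release": 3}
release = lookup_stream_number("Release", cache, fake_accurev, fake_git)
dev = lookup_stream_number("Dev", cache, fake_accurev, fake_git)
old = lookup_stream_number("Old_Name", cache, fake_accurev, fake_git)
```

Writing every successful answer back into the cache means that the slow Git search is paid at most once per name.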

ghost · 2016-09-23T11:28:59Z

That said, we should still refactor RetrieveStreamInfo() to generate not only */info refs but also */names refs for each stream, each containing a newline-separated list of all the past names of that stream.

These could then be aggregated into a single lookup table: a simple UTF-8 encoded text file with one line per name in the following format:

<stream number> <stream name>

The stream number and the stream name are delimited by a single space or a single tab character.

This would make stream name to stream number lookups very fast.
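
A minimal sketch of the proposed table, assuming the per-stream name lists are already available as text: `build_lookup_table` and `parse_lookup_table` are hypothetical helpers, and the stream numbers and names are made-up sample data.

```python
def build_lookup_table(names_by_number):
    """Aggregate per-stream name lists into '<stream number> <stream name>' lines.

    names_by_number -- dict of stream number -> newline-separated past names,
                       i.e. what a */names ref is proposed to contain
    """
    lines = []
    for number in sorted(names_by_number):
        for name in names_by_number[number].splitlines():
            if name:
                lines.append("{0} {1}".format(number, name))
    return "\n".join(lines) + "\n"


def parse_lookup_table(text):
    """Build a stream name -> stream number dict from the table text."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at the first run of whitespace (space or tab) only, so the
        # rest of the line is kept whole as the stream name.
        number, name = line.split(None, 1)
        # If the same name appears twice, the later line wins in this sketch.
        table[name] = int(number)
    return table


# Made-up sample: stream 2 was renamed once, stream 7 never.
table_text = build_lookup_table({2: "Dev\nDevelopment\n", 7: "Dev_Feature\n"})
name_to_number = parse_lookup_table(table_text)
```

Once the table is parsed into a dictionary, each lookup is a single dictionary access.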

ghost · 2016-09-23T11:48:42Z

The need for RetrieveStreamInfo() to create */names refs comes from the desire to keep the first stage capable of being run in parallel on multiple machines.

Let's say you have 6 large streams to convert. Currently you can do the following:

1. Make 2 config files that point to different Git repositories.
2. Put 3 streams in one config file and the other 3 in the other config file.
3. Configure both to use skip as the merge-strategy (so that they only download the data and don't generate the branches), with your preferred method, e.g. deep-hist.
4. Run both conversions simultaneously, on the same machine or on different machines.
5. Once complete, combine the downloaded Accurev data into a single Git repository. For example, create a new Git repository and run git fetch repo1 refs/ac2git/depots/*:refs/ac2git/depots/* and git fetch repo2 refs/ac2git/depots/*:refs/ac2git/depots/*, where repo1 and repo2 are your converted repositories.
6. Change one of the config files to point to this combined repository, put all 6 streams back into it, change the method to skip, and change the merge-strategy to normal or orphan. With the method element set to skip, this invocation downloads nothing more from Accurev and simply processes the already downloaded data.

This means that you can distribute the conversion across multiple machines or parallelize it to some extent.
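
The combining step could be scripted along these lines. This is only a sketch: `fetch_commands` and `combine` are hypothetical helpers, repo1 and repo2 are the placeholder repository names from the steps above, and `combine` assumes the combined repository has already been created with git init.

```python
import subprocess

# Refspec from the steps above: copy every downloaded depot ref as-is.
DEPOT_REFSPEC = "refs/ac2git/depots/*:refs/ac2git/depots/*"


def fetch_commands(source_repos, refspec=DEPOT_REFSPEC):
    """Return one 'git fetch' argument list per partial conversion repository."""
    return [["git", "fetch", repo, refspec] for repo in source_repos]


def combine(combined_repo_path, source_repos):
    """Run the fetches inside an existing Git repository at combined_repo_path."""
    for argv in fetch_commands(source_repos):
        # Passing a list (no shell) keeps the '*' in the refspec from being
        # expanded by a shell before git sees it.
        subprocess.check_call(argv, cwd=combined_repo_path)


commands = fetch_commands(["repo1", "repo2"])
```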

This is a feature of the script that would be broken by implementing issue #83, in which case it would be expected that this too is centralized to a single ref.

ghost added enhancement help wanted optimization labels Sep 8, 2016

ghost added this to the Future milestone Sep 8, 2016

orao modified the milestones: v0.7, Future Sep 16, 2016

ghost mentioned this issue Sep 23, 2016

Split the stream info contents #83

Open
