Re-implement the stream name to stream number lookups #84

Open
ghost opened this issue Sep 8, 2016 · 5 comments

Comments

ghost commented Sep 8, 2016

The design goal was for Stage 2 of the script's internal processing not to execute any Accurev commands, because all the information needed to create the converted Git branches should already be stored under refs/ac2git/depots/ in the Git repository. The intent was that you could re-process the information downloaded from Accurev, either to tailor it to your situation or to apply bug fixes made to the script to your repository.

However, the process by which we convert Accurev stream names to Accurev stream numbers from the information under refs/ac2git/depots/ is slow and error-prone, so it should be revised.

A fix for this issue could break backwards compatibility with refs/ac2git/depots/ refs produced by previous versions of the script, forcing those who want to use it to re-run the entire conversion. A migration path should be considered alongside the fix.

ghost commented Sep 8, 2016

Currently the process creates a JSON object that maps stream names to stream numbers under the refs/ac2git/cache/... ref.

ghost added this to the Future milestone Sep 8, 2016
orao modified the milestones: v0.7, Future Sep 16, 2016

ghost commented Sep 23, 2016

This has been partially addressed in commit dfac3f4.

When an Accurev session is already available, we use Accurev because it is quicker than a first-time lookup. If Accurev is unavailable, we rely purely on our Git structures to figure it out.

ghost commented Sep 23, 2016

Looking up the stream name in the cache should be quicker than asking Accurev, and asking Accurev is quicker than working it out for the first time from Git, so we should prefer Accurev over the search. The optimal lookup order is therefore:

  1. Cache
  2. Accurev
  3. Search through Git history

However, the current order is:

  1. Accurev
  2. Cache
  3. Search

So these lines should be moved to here to optimize the current pattern.
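
The preferred order can be sketched as a chain of lookups tried from cheapest to most expensive. This is a minimal sketch and not the script's actual code: `lookup_stream_number` and the three callables are hypothetical stand-ins for the cache, Accurev, and Git-history lookups, and the stream names and numbers are made up.

```python
from typing import Callable, Iterable, Optional

# Hypothetical type: a lookup takes a stream name and returns its stream
# number, or None when that source cannot answer.
Lookup = Callable[[str], Optional[int]]


def lookup_stream_number(name: str, lookups: Iterable[Lookup]) -> Optional[int]:
    """Try each lookup in order and return the first answer found."""
    for lookup in lookups:
        number = lookup(name)
        if number is not None:
            return number
    return None


# Stand-in data for illustration only.
cache = {"Main": 1}


def from_cache(name: str) -> Optional[int]:
    return cache.get(name)


def from_accurev(name: str) -> Optional[int]:
    # Assumption: returns None when no Accurev session is available.
    return None


def from_git_history(name: str) -> Optional[int]:
    # Assumption: the slow scan over refs/ac2git/depots/ would go here.
    return {"Main": 1, "Dev": 2}.get(name)


# Cheapest source first: cache, then Accurev, then the Git search.
ORDER = [from_cache, from_accurev, from_git_history]
```

With this shape, changing the search order is a matter of reordering the list rather than moving code between branches.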

ghost commented Sep 23, 2016

That said, we should still refactor RetrieveStreamInfo() to generate not only */info refs but also */names refs for each stream. A */names ref would contain a newline-separated list of all the past names of that stream.

These could then be aggregated into a single lookup table: a simple UTF-8 encoded text file with one entry per line in the following format:

<stream number> <stream name>

Here the stream number and stream name are delimited by a single space or a single tab character.

This would make stream name to stream number lookups very fast.
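
A reader for such a table could look roughly like the sketch below. It assumes one entry per line, that a stream with several past names appears once per name with the same number, and that everything after the first delimiter is the stream name. `parse_stream_table` is a hypothetical helper, not a function in the script.

```python
import re
from typing import Dict, Iterable


def parse_stream_table(lines: Iterable[str]) -> Dict[str, int]:
    """Build a name -> number map from '<stream number> <stream name>' lines."""
    table: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        # Split on the first space or tab only; a line without a delimiter
        # raises ValueError here, which flags a malformed table.
        number, name = re.split(r"[ \t]", line, maxsplit=1)
        table[name] = int(number)
    return table


# Made-up sample data: stream 2 was renamed from Dev_old to Dev.
sample = "1 Main\n2\tDev\n2 Dev_old\n"
names_to_numbers = parse_stream_table(sample.splitlines())
```

After parsing, each lookup is a single dictionary access, which is where the speed-up would come from.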

ghost commented Sep 23, 2016

The need for RetrieveStreamInfo() to create */names refs comes from the desire to keep the first stage capable of running in parallel on multiple machines.

Let's say you have 6 large streams to convert. Currently you can do the following:

  1. Make 2 config files that point to different Git repositories.
  2. Put 3 streams in one config file and the other 3 in the other config file.
  3. Configure both config files to use skip as the merge-strategy (so that the script only downloads the data and does not generate the branches), with your preferred method, e.g. deep-hist.
  4. Run both conversions simultaneously on the same machine or on different machines.
  5. Once complete, combine the downloaded Accurev data into a single Git repository, e.g. create a new Git repository and run git fetch repo1 refs/ac2git/depots/*:refs/ac2git/depots/* and git fetch repo2 refs/ac2git/depots/*:refs/ac2git/depots/*, where repo1 and repo2 are your converted repositories.
  6. Change one of the config files to point to this combined repository, put all 6 streams back into it, change the method to skip, and change the merge-strategy to normal or orphan. With the method element set to skip in the config file, this invocation won't download anything more from Accurev and will simply process the already downloaded data.

This means that you can distribute the conversion across multiple machines or parallelize it to some extent.
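
The combining step can be scripted. The sketch below only builds the git fetch command lines from that step so they can be inspected before running. `fetch_commands` and the repository paths are hypothetical, and it assumes the combined repository has already been created with git init.

```python
from typing import List

# Refspec taken from the combining step: copy all downloaded depot refs as-is.
DEPOTS_REFSPEC = "refs/ac2git/depots/*:refs/ac2git/depots/*"


def fetch_commands(combined_repo: str, source_repos: List[str]) -> List[List[str]]:
    """Return one 'git fetch' argv per source repository."""
    return [
        ["git", "-C", combined_repo, "fetch", repo, DEPOTS_REFSPEC]
        for repo in source_repos
    ]


# Hypothetical paths; replace with your own repositories.
commands = fetch_commands("combined", ["repo1", "repo2"])
# Each argv could then be passed to subprocess.run(argv, check=True).
```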

This is a feature of the script that would be broken by implementing issue #83, in which case it would be expected that this too is centralized to a single ref.
