Issue5 #6

Merged
merged 19 commits into from Aug 6, 2020
70 changes: 42 additions & 28 deletions README.md
@@ -16,48 +16,62 @@ repository is compatible with the official Fast Downward Git repository.
resulting Git repository will be written to. The optional parameter can be used to
redirect the output of fast-export to a file.

./run-cleanup-and-conversion.sh MERCURIAL_REPOSITORY CONVERTED_GIT_REPOSITORY \
./run-all-steps.sh MERCURIAL_REPOSITORY CONVERTED_GIT_REPOSITORY \
[--redirect-fast-export-stderr FILE]

The conversion is done in two steps that can also be run individually. In this case
CLEANED_MERCURIAL_REPOSITORY is a location where the intermediate cleaned up Mercurial
repository will be written to:

./run-cleanup.sh MERCURIAL_REPOSITORY CLEANED_MERCURIAL_REPOSITORY
ORDERED_REPOSITORY is an intermediate repository that ensures that the history
contains all commits in the same order as the Fast Downward master repository.
The CLEANED_MERCURIAL_REPOSITORY is a location where the intermediate cleaned
up Mercurial repository will be written to:

./run-cleanup.sh MERCURIAL_REPOSITORY ORDERED_REPOSITORY CLEANED_MERCURIAL_REPOSITORY
Member

I think it is strange to have two intermediate directories as parameters to a single step. It seems to disagree with the decision to merge the two steps into one. What I mean is that passing two directories here makes it look like two steps that just happen to be executed by a single script. I'm fine with leaving it like this, but it feels odd to me.

Member Author

I see this as one intermediate repository (ORDERED_REPOSITORY); CLEANED_* is the intended output of this script. I also dislike the interface, but we do not see another way to do what we want. We need to 1. order the commits and 2. clean up the repository, and we do NOT want to modify the MERCURIAL_REPOSITORY.
To order the commits, we have to clone hg.fast-downward.org anew, which requires a directory, and then pull the user's changes into it. The cleanup (hg convert) cannot be done in place. Thus, we have to create a second directory (we do not overwrite the user's repository).

Running this script (instead of run-all-steps.sh) is meant to let the user investigate what went wrong. Thus, I do not create a temporary directory and delete it at the end of the cleanup.

Member

That all sounds like an argument for having two separate steps for ordering and cleanup. What was the reasoning behind merging the steps? If the point was that the ordering is something internal, then we can make it more internal by using a temporary directory for it and not bothering the user with this. But if we want to keep the ordered repository after the script runs, it sounds like we should have two steps again.

Member Author

The ordering is more of an internal detail of the cleanup. We noticed the bad design when Jendrik ran the ordering, which took some time, and then the cleanup failed because its requirement (all changes pulled in) was not satisfied. The user should see this directly. Adding the check that all changes are pulled in to the run-all-steps.sh script would be a hack. Instead, we made the ordering internal to the cleanup.

We could make the ordered repository a temporary one. As I see it, this script is for a user who wants full control, so I kept the ordered repository. The user can inspect it or just add rm DIR to his command. If the majority wants it to be temporary and automatically removed, we can change this.

Member

I don't know, it still sounds to me like these should be separate steps. If there are conditions that the input (MERCURIAL_REPOSITORY) has to satisfy, they should be checked at the start of the first step. If the cloning takes too long just to check whether there are incoming changes, you could run `hg -R MERCURIAL_REPOSITORY incoming hg.fast-downward.org` instead. But again, if you already discussed this, I don't want to restart that discussion.
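
A minimal sketch of such an up-front check, assuming the upstream URL used in this thread. The function name `has_all_upstream_commits` is hypothetical and not part of the scripts in this pull request. `hg incoming` exits with 0 when there are incoming changes and with 1 when there are none, so the sketch inverts that status:

```shell
# Upstream URL as mentioned in this thread; override via the environment.
UPSTREAM_URL="${UPSTREAM_URL:-http://hg.fast-downward.org}"

# Hypothetical helper: succeeds (0) if REPO already contains every upstream
# commit, returns 1 if commits are missing, and 2 if hg itself failed.
has_all_upstream_commits() {
    local repo="$1"
    local status=0
    # `hg incoming`: exit status 0 = incoming changes, 1 = nothing to pull.
    hg -R "${repo}" incoming --quiet "${UPSTREAM_URL}" > /dev/null || status=$?
    case "${status}" in
        0) return 1 ;;  # upstream has commits the user has not pulled
        1) return 0 ;;  # nothing incoming, repository is up to date
        *) return 2 ;;  # any other status: hg reported an error
    esac
}
```

Called before the clone, a check like this would let the script abort within seconds instead of after the full clone.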

Contributor

I also disagree with keeping an intermediate result around, and even more with making it a command-line argument. Like Florian said, it breaks the abstraction. If we want to facilitate debugging when things go wrong, for me the logical thing is simply to not go out of our way to delete intermediate results in cases where things go wrong, but they should not be kept around if things go right.

I also agree it makes no sense to first clone and then test and fail if there are incoming commits. The test about incoming commits should be made right at the start if we consider it a reason to abort.

We also agreed on these points in the discussion with Silvan and Jendrik.

For what it's worth, regarding debugging: unless there are network errors or similar basic issues, there is no way in which the initial clone-and-pull steps could go wrong.

If you're OK with it, I'll push a suggested change -- if you don't like it, we can strip it or back it out.
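
One way to get that behaviour could look like the sketch below. The helper name `run_with_scratch_dir` is hypothetical and does not appear in this pull request. It passes a fresh temporary directory as the last argument to a step, removes the directory when the step succeeds, and leaves it in place for inspection when the step fails:

```shell
# Hypothetical helper: run a step with a temporary working directory.
# Usage: run_with_scratch_dir COMMAND [ARGS...]
# The scratch directory path is appended as the final argument of COMMAND.
run_with_scratch_dir() {
    local scratch
    scratch="$(mktemp -d)"
    local status=0
    "$@" "${scratch}" || status=$?
    if [ "${status}" -eq 0 ]; then
        # Success: nobody needs the intermediate results any more.
        rm -rf "${scratch}"
    else
        # Failure: keep the directory so the user can investigate.
        echo "Step failed; intermediate results kept in ${scratch}" >&2
    fi
    return "${status}"
}
```

With a helper like this, the ordered repository would no longer need to be a command-line argument of the cleanup script.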

./run-conversion.sh CLEANED_MERCURIAL_REPOSITORY CONVERTED_GIT_REPOSITORY \
[--redirect-fast-export-stderr FILE]

The scripts will automatically set up the required tools (a virtual
environment with compatible versions of Mercurial and the fast-export tool
https://github.com/frej/fast-export.git).
The scripts will automatically set up the required tools (a virtual
environment with compatible versions of Mercurial and the fast-export tool
https://github.com/frej/fast-export.git).

## Limitations
- Multiple Mercurial heads with the same branch name are not supported. If your
repository has those, you will see
`Error: repository has at least one unnamed head: hg rXXX`.
- If you have closed and merged a branch "subfeature" into a branch "feature"
and "feature" is not yet merged into "main", you might want to delete "subfeature"
branch from the resulting Git repository by running `git branch -D subfeature`.
- Multiple Mercurial heads with the same branch name are not supported. If your
repository has those, you will see
`Error: repository has at least one unnamed head: hg rXXX`.
- If you have closed and merged a branch "subfeature" into a branch "feature"
and "feature" is not yet merged into "main", you will receive:
`error: The branch 'BRANCH' is not fully merged.`
Don't worry. You might want to delete "subfeature"
branch from the resulting Git repository by running `git branch -D subfeature`.

## Warnings
- Both scripts generate a lot of output on stdout and stderr. If you want
to analyze it, better redirect it into files.
- The cleanup script generates repeated warnings about missing or invalid tags.
These are caused by moved or broken tags and can be ignored.
- The `run-cleanup.sh` and `run-conversion.sh` scripts generate a lot of output
on stdout and stderr. If you want to analyze it, better redirect it into files.
- The cleanup script generates repeated warnings about missing or invalid tags.
These are caused by moved or broken tags and can be ignored.

## Troubleshooting
If you have problems with the `run-all-steps.sh` script, try to run the steps
individually and carefully inspect the output of each step. Depending on your
problems it might help to first pull the changes from
`http://hg.fast-downward.org` in your repository and then start the conversion
process.

## Details of the cleanup process
- fix and unify author names in commit message
- fix typos in branch names
- remove large files from history that should not have been added
- change commit message to follow the new convention which is to start with
"`[BRANCH NAME] `"
- clone the (Mercurial) Fast Downward master repository
- pull your repository in the master repository
- fix and unify author names in commit message
- fix typos in branch names
- remove large files from history that should not have been added
- change commit message to follow the new convention which is to start with
"`[BRANCH NAME] `"
- strip the open branches `issue323` and `ipc-2011-fixes`

## Details of the conversion process
- convert a Mercurial repository to Git with `fast-export`
- delete all Git branches that belong to Mercurial branches which have been
merged and closed
- remove empty commits
- run garbage collections
- convert a Mercurial repository to Git with `fast-export`
- delete all Git branches that belong to Mercurial branches which have been
merged and closed
- remove empty commits
- run garbage collections


Let's rewrite history!
19 changes: 10 additions & 9 deletions run-cleanup-and-conversion.sh → run-all-steps.sh
@@ -25,29 +25,30 @@ fi
TEMP_DIR="$(mktemp -d)"
echo "Storing intermediate repository under ${TEMP_DIR}"
# Generate a path to a non-existing temporary directory.
INTERMEDIATE_REPOSITORY="${TEMP_DIR}/intermediate"
ORDERED_REPOSITORY="${TEMP_DIR}/ordered"
CLEANED_REPOSITORY="${TEMP_DIR}/cleaned"
BASE="$(realpath "$(dirname "$(readlink -f "$0")")")"
SETUP_CLEANUP="${BASE}/setup-cleanup.sh"
SETUP_CONVERSION="${BASE}/setup-conversion.sh"
SETUP_MERCURIAL="${BASE}/setup-mercurial.sh"
SETUP_FAST_EXPORT="${BASE}/setup-fast-export.sh"
RUN_CLEANUP="${BASE}/run-cleanup.sh"
RUN_CONVERSION="${BASE}/run-conversion.sh"

if ! /bin/bash "${SETUP_CLEANUP}"; then
echo "Error during the setup for the cleaning script."
if ! /bin/bash "${SETUP_MERCURIAL}"; then
echo "Error during the Mercurial setup."
exit 2
fi

if ! /bin/bash "${SETUP_CONVERSION}"; then
echo "Error during the setup for the conversion script."
if ! /bin/bash "${SETUP_FAST_EXPORT}"; then
echo "Error during the 'fast-export' setup."
exit 2
fi

if ! "${RUN_CLEANUP}" "${SRC_REPOSITORY}" "${INTERMEDIATE_REPOSITORY}"; then
if ! "${RUN_CLEANUP}" "${SRC_REPOSITORY}" "${ORDERED_REPOSITORY}" "${CLEANED_REPOSITORY}"; then
echo "Cleanup failed."
exit 2
fi

if ! "${RUN_CONVERSION}" "${INTERMEDIATE_REPOSITORY}" "${CONVERTED_REPOSITORY}" $@; then
if ! "${RUN_CONVERSION}" "${CLEANED_REPOSITORY}" "${CONVERTED_REPOSITORY}" $@; then
echo "Conversion failed."
exit 2
fi
44 changes: 33 additions & 11 deletions run-cleanup.sh
@@ -2,49 +2,71 @@

set -euo pipefail

if [[ $# -ne 2 ]]; then
echo "Invalid arguments. Use: $0 SRC DST"
if [[ $# -ne 3 ]]; then
echo "Invalid arguments. Use: $0 SRC TMP DST"
exit 1
fi

SRC_REPOSITORY="$1"
CLEANED_REPOSITORY="$2"
shift 2
ORDERED_REPOSITORY="$2"
CLEANED_REPOSITORY="$3"
shift 3

if [[ ! -d "${SRC_REPOSITORY}" ]]; then
echo "Invalid argument. ${SRC_REPOSITORY} has to be a directory."
exit 1
fi

if [[ -e "${ORDERED_REPOSITORY}" ]]; then
echo "Invalid argument. ${ORDERED_REPOSITORY} may not exist."
exit 1
fi

if [[ -e "${CLEANED_REPOSITORY}" ]]; then
echo "Invalid argument. ${CLEANED_REPOSITORY} may not exist."
exit 1
fi


BASE="$(dirname "$(readlink -f "$0")")"
SETUP_CLEANUP="${BASE}/setup-cleanup.sh"
SETUP_MERCURIAL="${BASE}/setup-mercurial.sh"
VIRTUALENV="${BASE}/data/py3-env"

if ! /bin/bash "${SETUP_CLEANUP}"; then
if ! /bin/bash "${SETUP_MERCURIAL}"; then
echo "Error during setup."
exit 2
fi
source "${VIRTUALENV}/bin/activate"

# Disable all extensions.
# (https://stackoverflow.com/questions/46612210/mercurial-disable-all-the-extensions-from-the-command-line)
HGRCPATH= hg \
export HGRCPATH=
export HGPLAIN=


echo "Cloning official repository"
hg clone "http://hg.fast-downward.org" "${ORDERED_REPOSITORY}"

if hg -R "${SRC_REPOSITORY}" incoming "${ORDERED_REPOSITORY}"; then
Member

Is this test still necessary? After the pull in line 56 the condition will be satisfied in the new repository, even if it wasn't before. I think I suggested this test but this was when we were still removing parts from the official clone before pulling.

I don't see how removing this test would cause any problems.

Member Author

I think this test is still necessary. It enforces that the user has pulled in all current changes from the latest Mercurial master before executing our script. Any "multiple heads on the same branch" issues (or other issues the pull could cause that we do not know about) can then be seen in the user's repository.

Member

It enforces that all current changes from the last Mercurial master are pulled in by the user prior to executing our script.

I don't see why that would be necessary. The next step after this test is a pull and after this pull the condition that we test here is satisfied even if it wasn't before.

If our assumption is that the repository does not have multiple heads on a single branch, we should test that assumption explicitly instead of testing something unrelated that would force the user to do a step that generates a warning (not an error) when the assumption is not satisfied.
(Here is an example for testing this but note the comment about the test in line #19: https://gist.github.com/FernFerret/3178035)
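
A minimal sketch of an explicit test for that assumption (this is not the linked gist, and the function name is hypothetical). It relies on `hg heads` printing one entry per open branch head, so any branch name that appears more than once has multiple heads:

```shell
# Hypothetical helper: print every branch of REPO that has more than one
# open head, one branch name per line. Prints nothing if no branch has.
branches_with_multiple_heads() {
    local repo="$1"
    # One line per open head, containing only the branch name; duplicates
    # after sorting are branches with several heads.
    hg -R "${repo}" heads --template '{branch}\n' | sort | uniq -d
}
```

A script could then abort with a precise message when the output is non-empty, for example `[ -z "$(branches_with_multiple_heads "${SRC_REPOSITORY}")" ]`.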

Member Author

Our problem is that we do not know exactly when fast-export succeeds and when it fails. Malte tested it and said there are cases where multiple heads on the same branch work. We simply do not know what exactly we have to test for (and, additionally, we do not know what else could cause fast-export to fail). We do not want to say "you are not allowed to convert this repository" when it would convert perfectly fine.

Therefore, we kept the check that all changes have to be pulled. If fast-export now tells you "you have multiple heads on branch X" or "fix that", those issues are in the user's repository and he can fix them there.
If we pulled the newest changes ourselves, then he 1. could not find the problems that caused the error in his repository and 2. could not fix them.
(Yes, he could find and fix them by pulling himself, but for me it is more natural to think "I get an error, I look into my repository" and not "I get an error, let's look through all the temporary files".)

Member

I think fast-export doesn't produce an error in this case, only a warning and an incompatible repository. But if you already discussed this, I'm fine with leaving things as they are.

echo 1>&2 "Your repository is missing commits from http://hg.fast-downward.org."
echo 1>&2 "You must pull from http://hg.fast-downward.org first."
exit 3
fi

echo "Enforce commit order"
hg -R "${ORDERED_REPOSITORY}" pull "${SRC_REPOSITORY}"

echo "Clean up repository"
hg \
--config extensions.renaming_mercurial_source="${BASE}/renaming_mercurial_source.py" \
--config extensions.hgext.convert= \
--config format.sparse-revlog=0 \
convert "${SRC_REPOSITORY}" "${CLEANED_REPOSITORY}" \
convert "${ORDERED_REPOSITORY}" "${CLEANED_REPOSITORY}" \
--source-type renaming_mercurial_source \
--authormap "${BASE}/data/downward_authormap.txt" \
--filemap "${BASE}/data/downward_filemap.txt" \
--splicemap "${BASE}/data/downward_splicemap.txt" \
--branchmap "${BASE}/data/downward_branchmap.txt"

cd "${CLEANED_REPOSITORY}"
HGRCPATH= hg --config extensions.strip= strip "branch(issue323)" --nobackup
HGRCPATH= hg --config extensions.strip= strip "branch(ipc-2011-fixes)" --nobackup
hg --config extensions.strip= strip "branch(issue323)" --nobackup
hg --config extensions.strip= strip "branch(ipc-2011-fixes)" --nobackup
8 changes: 5 additions & 3 deletions run-conversion.sh
@@ -7,15 +7,17 @@ CONVERTED_REPOSITORY="$2"
shift 2

BASE="$(dirname "$(readlink -f "$0")")"
SETUP_CONVERSION="${BASE}/setup-conversion.sh"
SETUP_FAST_EXPORT="${BASE}/setup-fast-export.sh"
CONVERT="${BASE}/convert.py"
VIRTUALENV="${BASE}/data/py3-env"

if ! /bin/bash "${SETUP_CONVERSION}"; then
if ! /bin/bash "${SETUP_FAST_EXPORT}"; then
echo "Error during setup."
exit 2
fi

source "${VIRTUALENV}/bin/activate"
export HGRCPATH=
export HGPLAIN=

HGRCPATH= python3 "${CONVERT}" "${INTERMEDIATE_REPOSITORY}" "${CONVERTED_REPOSITORY}" $@
python3 "${CONVERT}" "${INTERMEDIATE_REPOSITORY}" "${CONVERTED_REPOSITORY}" $@
4 changes: 2 additions & 2 deletions setup-conversion.sh → setup-fast-export.sh
@@ -1,12 +1,12 @@
#!/bin/bash

BASE="$(dirname "$(readlink -f "$0")")"
SETUP_CLEANUP="${BASE}/setup-cleanup.sh"
SETUP_MERCURIAL="${BASE}/setup-mercurial.sh"
FAST_EXPORT_REPO="${BASE}/data/fast-export"
FAST_EXPORT_VERSION="v200213-23-g44c50d0"


if ! /bin/bash "${SETUP_CLEANUP}"; then
if ! /bin/bash "${SETUP_MERCURIAL}"; then
echo "Error during Mercurial setup."
fi

File renamed without changes.