Add a shallow clone example for checkout step for 2.0 #2040

Closed
Aghassi opened this issue Feb 20, 2018 · 20 comments

Comments

@Aghassi
Contributor

Aghassi commented Feb 20, 2018

Many users have been asking about shallow clones in 2.0. I've modified the existing checkout command to do exactly that, but I'm not sure where to put it in the docs. It isn't 100% reliable (forks cause issues), but it does handle checkouts for PRs and the like. It cut our checkout time from 7 minutes to 10 seconds.

Put the following at the top of the config file.

aliases:
  # Shallow Clone
  - &checkout-shallow
    name: Checkout
    command: |
      #!/bin/sh
      set -e

      # Workaround old docker images with incorrect $HOME
      # check https://github.com/docker/docker/issues/2968 for details
      if [ "${HOME}" = "/" ]
      then
        export HOME=$(getent passwd $(id -un) | cut -d: -f6)
      fi

      mkdir -p ~/.ssh

      # These host keys were current in 2018; GitHub replaced its RSA host key
      # in March 2023, so check the providers' published keys before use.
      echo 'github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
      bitbucket.org ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAubiN81eDcafrgMeLzaFPsw2kNvEcqTKl/VqLat/MaB33pZy0y3rJZtnqwR2qOOvbwKZYKiEO1O6VqNEBxKvJJelCq0dTXWT5pbO2gDXC6h6QDXCaHo6pOHGPUy+YBaGQRGuSusMEASYiWunYN0vCAI8QaXnWMXNMdFP3jHAJH0eDsoiGnLPBlBp4TNm6rYI74nMzgz3B9IikW4WVK+dc8KZJZWYjAuORU3jc1c/NPskD2ASinf8v3xnfXeukU0sJ5N6m5E8VLjObPEO+mN2t/FZTMZLiFqPWc/ALSqnMnnhwrNi2rbfg/rd/IpL8Le3pSBne8+seeFVBoGqzHM9yXw==
      ' >> ~/.ssh/known_hosts

      # Write the checkout key with owner-only permissions
      (umask 077; touch ~/.ssh/id_rsa)
      chmod 0600 ~/.ssh/id_rsa
      (cat <<EOF > ~/.ssh/id_rsa
      $CHECKOUT_KEY
      EOF
      )

      # use git+ssh instead of https
      git config --global url."ssh://git@github.com".insteadOf "https://github.com" || true

      if [ -e /home/circleci/project/.git ]
      then
        cd /home/circleci/project
        git remote set-url origin "$CIRCLE_REPOSITORY_URL" || true
      else
        mkdir -p /home/circleci/project
        cd /home/circleci/project
        git clone --depth=1 "$CIRCLE_REPOSITORY_URL" .
      fi

      # Fetch only the ref being built. A POSIX case replaces [[ =~ ]],
      # which /bin/sh is not guaranteed to support.
      if [ -n "$CIRCLE_TAG" ]
      then
        git fetch --depth=10 --force origin "refs/tags/${CIRCLE_TAG}"
      else
        case "$CIRCLE_BRANCH" in
          pull/*)
            # For PR from Fork
            git fetch --depth=10 --force origin "$CIRCLE_BRANCH/head:remotes/origin/$CIRCLE_BRANCH"
            ;;
          *)
            git fetch --depth=10 --force origin "$CIRCLE_BRANCH:remotes/origin/$CIRCLE_BRANCH"
            ;;
        esac
      fi

      if [ -n "$CIRCLE_TAG" ]
      then
        git reset --hard "$CIRCLE_SHA1"
        git checkout -q "$CIRCLE_TAG"
      elif [ -n "$CIRCLE_BRANCH" ]
      then
        git reset --hard "$CIRCLE_SHA1"
        git checkout -q -B "$CIRCLE_BRANCH"
      fi

      git reset --hard "$CIRCLE_SHA1"

The user can then check out with this command by running the following in place of the built-in checkout step:

 - run: *checkout-shallow

Sorry for the spacing issues. You get the idea, though. It would be nice to have this option baked in, but at least this gets users who need it off the ground.
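
For readers who want to check the ref selection on its own, the tag / fork PR / branch decision in the script above can be sketched as a small POSIX shell function. The name `shallow_refspec` is made up for this illustration and is not part of CircleCI or of the original script; it only prints the refspec that the script would pass to `git fetch`.

```shell
# Hypothetical helper (not part of CircleCI): print the refspec that the
# checkout script above would fetch.
#   $1 = tag name (may be empty)
#   $2 = branch name (may be empty), e.g. "main" or "pull/42"
shallow_refspec() {
  tag=$1
  branch=$2
  if [ -n "$tag" ]; then
    # Tag builds fetch the tag ref directly.
    printf '%s\n' "refs/tags/$tag"
    return 0
  fi
  case "$branch" in
    pull/*)
      # PRs from forks appear as pull/<n>; the commits live under pull/<n>/head.
      printf '%s\n' "$branch/head:remotes/origin/$branch"
      ;;
    "")
      # Neither a tag nor a branch: nothing to fetch.
      return 1
      ;;
    *)
      printf '%s\n' "$branch:remotes/origin/$branch"
      ;;
  esac
}
```

Something like `git fetch --depth=10 --force origin "$(shallow_refspec "$CIRCLE_TAG" "$CIRCLE_BRANCH")"` would then stand in for the three separate fetch branches.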

@smart-alek
Contributor

smart-alek commented Feb 20, 2018

@Aghassi this is truly fantastic!

As for putting this in the docs, I think you're right to question where it belongs. Because we can't guarantee 100% functionality, I hesitate to commit this to our official docs.

However! Here are some ideas:

  1. Definitely agree that this is useful. 7 minutes to 10 seconds is a beautiful thing. I'd drop this in our Discuss forum for now, and if you send me a link, I or another mod will pin that in the appropriate category.

  2. I'm going to flag this to one of our product teams and see if we can't get this baked in to our official configuration syntax. It's clearly useful, so the sell shouldn't be hard. ;)

Thanks again!

@smart-alek smart-alek self-assigned this Feb 20, 2018
@Aghassi
Contributor Author

Aghassi commented Feb 22, 2018

@smart-alek Done! https://discuss.circleci.com/t/shallow-clone-for-circleci-2-0-builds/20200

@quicksketch

I Googled this same question just today and this result came up. This would be lovely. We have a legacy Git repo with a 500MB checkout due to some accidents in the past (80MB CSS files!). Fixing the repo would be best, but that's hard with a team of devs and multiple environments already using it. Shallow cloning is our temporary solution, and it would be great if CircleCI could support it natively.

> I'm going to flag this to one of our product teams and see if we can't get this baked in to our official configuration syntax. It's clearly useful, so the sell shouldn't be hard. ;)

Could you post here any updates as they happen?

And thanks @Aghassi!

@smart-alek
Contributor

WELL @Aghassi @quicksketch here is the requested update! But it is not happy news!

The stance of our developers is this: GitHub hates it when we shallow clone. Shallow clones are CPU-expensive for GitHub, which doesn't want them to become A Thing, to the extent that it will automatically rate limit any project doing too much of it. On top of that, a shallow clone is not always faster than a full clone.

SO. At the moment, we don't have any real plans to bake this in. Blah!

@Aghassi
Contributor Author

Aghassi commented Feb 23, 2018

@smart-alek Unfortunate, but fair. I work in enterprise, so we don't experience that side of the issue since we run GitHub Enterprise. At least for the enterprise offering, it would be nice. For the public offering, I can't speak much to it.

The script above is fully working for both Forks and normal PRs. People can do with it as they please.

@smart-alek
Contributor

Excellent @Aghassi, thanks for understanding!

@michaelcarrano

It would be great if this can become official on CircleCI.

We currently use Travis-CI but are looking to migrate to CircleCI, and I noticed today that CircleCI does a full clone (our history is 2GB!). Travis-CI does a shallow clone by default with a depth of 50, and you can customize it if needed.

It is not a one-to-one comparison, but on Travis-CI the checkout takes just over 1 second, whereas on CircleCI it takes over a minute.

@smart-alek
Contributor

smart-alek commented Feb 26, 2018

Hi folks, this conversation does seem to be evolving beyond a purely docs issue. Can I ask a motivated participant to write up a summary and add it to our Ideas page?

@Aghassi
Contributor Author

Aghassi commented Feb 27, 2018

@smart-alek Done

Idea #: CCI-I-254
https://circleci.ideas.aha.io/ideas/CCI-I-254

@lra

lra commented Apr 5, 2018

That's not the right "idea"; here's the right one: CCI-I-254

@juanca

juanca commented Nov 14, 2018

For those interested in using a shallow checkout, I just published an orb under our organization @ https://github.com/mavenlink/orbs/blob/926f713b56483c0a3379261e901ae883467ab363/src/git/orb.yml

Pretty easy to use:

orbs:
  git: mavenlink/git@1.0.0

...

jobs:
  build:
    resource_class: small
    steps:
      - restore_cache:
          name: Restoring source code from cache
          keys:
            - source-v1-{{ .Branch }}-{{ .Revision }}
            - source-v1-{{ .Branch }}
            - source-v1-
      - git/checkout
      - save_cache:
          name: Saving source code in cache
          key: source-v1-{{ .Branch }}-{{ .Revision }}
          paths:
            - /home/app/current
      - run:
          name: Reduce the size of the persisted workspace
          command: rm -rf .git
      - persist_to_workspace:
          paths:
            - /home/app/.ssh/known_hosts
            - /home/app/current

@azizhk

azizhk commented Mar 25, 2019

Hey all,
Is there any consensus on the preferred approach?
Cache the repo directory, or hope that GitHub does not rate limit shallow cloning for your repo?

@Aghassi
Contributor Author

Aghassi commented Mar 25, 2019

I’ve been using this method for over a year on GitHub Enterprise and never had an issue. I don’t recommend caching the git repo.

@ricoli

ricoli commented Jun 19, 2020

There are also orbs for shallow cloning, for example https://circleci.com/orbs/registry/orb/guitarrapc/git-shallow-clone

@ricoli

ricoli commented Jun 19, 2020

> WELL @Aghassi @quicksketch here is the requested update! But it is not happy news!
>
> The stance of our developers is this: GitHub hates it when we shallow clone. Shallow clones are CPU-expensive for GitHub, which doesn't want them to become A Thing, to the extent that it will automatically rate limit any project doing too much of it. On top of that, a shallow clone is not always faster than a full clone.
>
> SO. At the moment, we don't have any real plans to bake this in. Blah!

It's been a while since this was posted, and I assume GitHub has changed its mind since then, as the actions/checkout GitHub Action defaults to a depth of 1 when cloning:

    # Number of commits to fetch. 0 indicates all history.
    # Default: 1
    fetch-depth: ''

https://github.com/actions/checkout
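
To make the "0 indicates all history" convention concrete for a hand-written checkout script, here is a rough sketch of mapping such a setting onto `git fetch` arguments. The function name `fetch_depth_args` is invented for this example, and this is not how actions/checkout is implemented; it only mirrors the semantics described in the snippet above.

```shell
# Hypothetical helper: turn a fetch-depth setting into a `git fetch` argument.
#   $1 = depth; 0 means full history, empty/unset falls back to 1
fetch_depth_args() {
  depth=${1:-1}
  case "$depth" in
    *[!0-9]*)
      echo "fetch depth must be a non-negative integer: $depth" >&2
      return 2
      ;;
    0)
      # Full history: emit no --depth flag at all.
      ;;
    *)
      printf '%s\n' "--depth=$depth"
      ;;
  esac
}
```

With this, `git fetch $(fetch_depth_args "$DEPTH") origin "$ref"` would fetch a truncated history unless `DEPTH` is 0 (`DEPTH` and `ref` are placeholders here).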

@Pi-George

@juanca

Your orb doesn't work, unfortunately. There are path issues: it seems to run mkdir -p /home/app/current instead of mkdir -p '/home/circleci/project'.

@juanca

juanca commented May 11, 2021

@Pi-George The path is a parameter (repo_path) and it sources its default from $CIRCLE_WORKING_DIRECTORY var. Ensure your env var is set up or the orb is configured to your needs.

mkdir -p << parameters.repo_path >>

https://github.com/mavenlink/orbs/blob/14c295c3df153a6a34749fcf9a0958d3e42bd668/src/git/orb.yml#L52
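
As a rough illustration of the "parameter with an environment default" pattern described here, the sketch below resolves a target path in plain shell. The function names are made up and this is not the orb's code. It also assumes the variable may hold a literal, unexpanded `~/project` (an assumption worth checking in your own job), in which case `mkdir -p "$CIRCLE_WORKING_DIRECTORY"` would create a directory named `~`.

```shell
# Hypothetical helper: replace a leading "~" or "~/" with $HOME, because a
# tilde stored inside a variable is not expanded by the shell.
expand_tilde() {
  case "$1" in
    "~")
      printf '%s\n' "$HOME"
      ;;
    "~/"*)
      rest=${1#"~/"}
      printf '%s\n' "$HOME/$rest"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}

# Hypothetical helper: use the explicit path ($1) when given, otherwise fall
# back to CIRCLE_WORKING_DIRECTORY.
resolve_repo_path() {
  expand_tilde "${1:-${CIRCLE_WORKING_DIRECTORY:-}}"
}
```

A checkout step could then run `mkdir -p "$(resolve_repo_path)"` and get the same directory whether or not an explicit path was configured.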

@whiskey

whiskey commented May 20, 2021

I've just found this nice article on the GitHub blog. As stated earlier in this thread, it says shallow clones are expensive, BUT it also calls them a valid way to handle repositories that will be deleted after use:

Quick Summary
There are three ways to reduce clone sizes for repositories hosted by GitHub.

  • git clone --filter=blob:none <url> creates a blobless clone. These clones download all reachable commits and trees while fetching blobs on-demand. These clones are best for developers and build environments that span multiple builds.
  • git clone --filter=tree:0 <url> creates a treeless clone. These clones download all reachable commits while fetching trees and blobs on-demand. These clones are best for build environments where the repository will be deleted after a single build, but you still need access to commit history.
  • git clone --depth=1 <url> creates a shallow clone. These clones truncate the commit history to reduce the clone size. This creates some unexpected behavior issues, limiting which Git commands are possible. These clones also put undue stress on later fetches, so they are strongly discouraged for developer use. They are helpful for some build environments where the repository will be deleted after a single build.

I think this opens room for improvement of the standard CircleCI configuration.
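
A small sketch of how a CI script might choose between the three options from the summary above. The strategy names and the function `clone_filter_flag` are invented for this example; only the `git clone` flags themselves come from the quoted article.

```shell
# Hypothetical helper: map a strategy name to the matching `git clone` flag.
#   blobless -> commits and trees up front, blobs on demand
#   treeless -> commits up front, trees and blobs on demand
#   shallow  -> history truncated to the latest commit
#   full     -> ordinary clone, no extra flag
clone_filter_flag() {
  case "$1" in
    blobless) printf '%s\n' '--filter=blob:none' ;;
    treeless) printf '%s\n' '--filter=tree:0' ;;
    shallow)  printf '%s\n' '--depth=1' ;;
    full)     ;;
    *)
      echo "unknown clone strategy: $1" >&2
      return 2
      ;;
  esac
}
```

For a throwaway build that still needs commit history, `git clone $(clone_filter_flag treeless) "$url" "$dir"` would be the variant the article points to (`url` and `dir` are placeholders).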

@liberato-whisper

Has anyone found a configuration of this command that works for Windows VMs? I have shallow checkouts working in all of our jobs except those running in Windows VMs. I've tried converting the script to PowerShell and still have no luck: I get "Host key verification failed."

@pkyeck

pkyeck commented Mar 6, 2023

@juanca if I set working_directory: ~/project/service and then use

- git/checkout:
    repo_path: ~/project

because I want to clone a monorepo and work in a certain subfolder, I always get the error

fatal: destination path '.' already exists and is not an empty directory.

because the ~/project/service folder has already been created (by CircleCI!?). Have you ever run into this?
It works with the normal checkout command, and I don't want to change the working directory, because otherwise I'd need to adjust all the following steps or they would run in the wrong folder (the root of the monorepo).
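
One possible workaround, sketched under assumptions: `git clone` refuses a non-empty destination, but `git init` followed by a shallow `git fetch` works in a directory that already contains something (such as an empty `service` subfolder). The function names and the `DRY_RUN` switch below are invented for this illustration; this is not what the orb does.

```shell
# Hypothetical wrapper: run a command, or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: shallow-checkout REF from URL into DIR, even if DIR
# already exists and is not empty.
#   $1 = target directory, $2 = repository URL, $3 = branch or tag to fetch
shallow_checkout_into() {
  dir=$1
  url=$2
  ref=$3
  run mkdir -p "$dir" &&
  run git -C "$dir" init -q &&
  run git -C "$dir" remote add origin "$url" &&
  run git -C "$dir" fetch --depth=1 origin "$ref" &&
  run git -C "$dir" checkout -q FETCH_HEAD
}
```

Note that this leaves the repository on a detached HEAD, and `git remote add` fails if the function is run twice against the same directory, so a restored cache would need `git remote set-url` instead.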
