Use CDN + jsDelivr to avoid cloning the main repo #8268

Closed
igor-makarov opened this issue Nov 12, 2018 · 20 comments
Labels
t3:discussion These are issues that can be non-issues, and encompass best practices, or plans for the future.

@igor-makarov
Contributor

Hi all,

For the past couple of months, I've been researching ways to make pod install not rely on git as its main spec registry.

I have developed an experimental plugin that does this. It translates pod names to jsDelivr URLs and downloads the files locally. This way, only the pods necessary are fetched and not the entire spec repo.

There is a small caveat, though: jsDelivr CDN and GitHub's CDN do not have directory listings. On the other hand, GitHub's API is severely rate-limited. To solve this, I have created a fork of the spec repo that generates an index file for each pod directory. I'm auto-updating this repo using a Jenkins job.
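A minimal sketch of the name-to-URL translation and the per-pod index lookup described above. The sharding by the first three hex characters of the pod name's MD5 is an assumption about the spec repo layout, and `CDN_BASE`, `shard_path`, `index_url` and `podspec_url` are hypothetical names for illustration, not the plugin's actual API.

```python
import hashlib

# Hypothetical base URL; the real plugin may point somewhere else.
CDN_BASE = "https://cdn.example.com/specs"


def shard_path(pod_name, prefix_len=3):
    """Return 'Specs/<h1>/<h2>/<h3>/<pod_name>' (assumed MD5-based sharding)."""
    digest = hashlib.md5(pod_name.encode("utf-8")).hexdigest()
    return "/".join(["Specs", *digest[:prefix_len], pod_name])


def index_url(pod_name):
    """URL of the per-pod index file that stands in for a directory listing."""
    return f"{CDN_BASE}/{shard_path(pod_name)}/index.txt"


def podspec_url(pod_name, version):
    """URL of one podspec, fetched without cloning the whole spec repo."""
    return f"{CDN_BASE}/{shard_path(pod_name)}/{version}/{pod_name}.podspec.json"
```

With something like this, a client only needs two small HTTP requests per pod: one for the index and one for the chosen podspec.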

I have gotten good performance out of this, both in development and on CI. There's no clone and no pull anymore. I think this can be made available on main, without the use of plugins or repo clones.

Suggested course of action:

  1. Go over the main spec repo and generate per-pod directory index files.
  2. Update trunk server to generate index files on every new commit.
  3. Modify Pod::Source to support CDN-based access and make the main spec repo use it by default.
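Step 1 could look roughly like the sketch below. It assumes pod directories sit four levels below the specs root (three shard levels, then the pod name) and that every subdirectory of a pod directory is a version; `write_pod_index` and `index_all_pods` are hypothetical helper names.

```python
from pathlib import Path

INDEX_NAME = "index.txt"


def write_pod_index(pod_dir):
    """Write index.txt listing the version directories inside one pod directory."""
    pod_dir = Path(pod_dir)
    # Plain lexicographic sort; a real index might want version-aware ordering.
    versions = sorted(p.name for p in pod_dir.iterdir() if p.is_dir())
    (pod_dir / INDEX_NAME).write_text(
        "".join(v + "\n" for v in versions), encoding="utf-8"
    )
    return versions


def index_all_pods(specs_root):
    """Index every pod directory found at <specs_root>/<h1>/<h2>/<h3>/<PodName>."""
    pods = []
    for pod_dir in Path(specs_root).glob("*/*/*/*"):
        if pod_dir.is_dir():
            write_pod_index(pod_dir)
            pods.append(pod_dir.name)
    return sorted(pods)
```

Running `index_all_pods` once over the existing tree would cover the backlog; step 2 then only has to touch the one pod directory affected by each new commit.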

I've been in contact with jsDelivr and they are ok with this. In addition, GitHub has their own CDN, which is not rate-limited. The one jsDelivr have is more performant, though.

Let me know whether you like my idea. I'm more than willing to put in work and make pod install less painful for everyone.

@igor-makarov
Contributor Author

igor-makarov commented Nov 12, 2018

P.S. I've seen the discussion about this last year. I think my solution improves on the one suggested by @dantoml, because in my tests, tar performance on the full master tarball was pretty bad.

@endocrimes
Member

@igor-makarov FWIW, @segiddins and I have chatted about moving to an alternative to the git repo for a while, but our main blocker is $ to support the continued running of the service, and also actually supporting it after it's deployed.

That being said, if we were to build this, we'd probably go down a route similar to bundler's compact index, with a CDN for serving the actual spec files.

@dnkoutso dnkoutso added the t3:discussion These are issues that can be non-issues, and encompass best practices, or plans for the future. label Nov 12, 2018
@igor-makarov
Contributor Author

@dantoml
Bundler's index is indeed very compact. However, the solution I proposed is pretty similar to that. The version index is provided on a per-pod basis.
As for the $ blocker, my solution is already being served via the free jsDelivr CDN and they told me they see no problem with this usage.
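To illustrate how a client might consume the per-pod version index mentioned above, here is a small sketch. It assumes the index is a plain text file with one version per line, and it deliberately ignores pre-release versions; the ordering is a simplification, not CocoaPods' real version comparison. `parse_index` and `latest_version` are hypothetical names.

```python
import re

# Only purely numeric dotted versions such as "1.10.0" are considered.
_NUMERIC_VERSION = re.compile(r"\d+(\.\d+)*")


def parse_index(text):
    """Split an index file into its non-empty, stripped lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def latest_version(index_text):
    """Return the highest numeric version in the index, or None if there is none."""
    candidates = [v for v in parse_index(index_text) if _NUMERIC_VERSION.fullmatch(v)]
    if not candidates:
        return None
    # Compare segment by segment as integers so that 1.10.0 > 1.9.3.
    return max(candidates, key=lambda v: tuple(int(part) for part in v.split(".")))
```

A real resolver would need the full set of versions and their dependencies, which is where the compact-index comparison becomes relevant.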

@endocrimes
Member

@igor-makarov we'd possibly be able to serve the specs over that CDN, but we'd need something like compactindex for efficiency's sake when resolving or implementing various other features. We'd also want an API shim in place that 301s to specs, to avoid baking in a particular provider or domain for the sake of future resiliency.
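For illustration, a redirecting shim of the kind described could be as small as the sketch below. `BACKEND_BASE` is a made-up placeholder, and this is only one possible shape for such a service, not a design anyone in the thread has committed to.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical backend; swapping CDN providers would only mean changing this value.
BACKEND_BASE = "https://cdn.example.com/specs"


def redirect_target(request_path, backend_base=BACKEND_BASE):
    """Map a request path on the stable domain to the current backend URL."""
    return backend_base.rstrip("/") + "/" + request_path.lstrip("/")


class SpecRedirectHandler(BaseHTTPRequestHandler):
    """Answer every GET/HEAD with a 301 pointing at the backend."""

    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", redirect_target(self.path))
        self.end_headers()

    # HEAD gets the same redirect; neither response carries a body.
    do_HEAD = do_GET
```

Passing `SpecRedirectHandler` to `http.server.HTTPServer` would serve it locally. Clients would then only ever know the stable domain, so the provider behind it stays replaceable.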

We have an existing relationship with Heroku for covering a lot of the backend hosting fees, especially if we write the service in Go or something similarly efficient (although that would involve reimplementing compactindex). We'd want to chat with CDN providers before sending all of CocoaPods production load at them tho (there are 10s of millions of installations per day).

@igor-makarov
Contributor Author

igor-makarov commented Nov 13, 2018

@dantoml
My proposed solution does not involve writing any backend servers. It is simple raw file access from a GH repo.

I believe this approach not only saves the trouble of engineering the backend and reimplementing compact index, but is also rather resilient as the incremental changes to the spec repo will be confined to the pod dir, preventing conflicts.

As for future resiliency, I believe that it's not going to be decreased by using a CDN. It's basically accessing CocoaPods/Specs via a proxy. We can also just use raw.githubusercontent.com in place of jsDelivr. It's not rate limited, as opposed to the API.

@MartinKolarik

@dantoml hi, Martin from jsDelivr here. Feel free to ping me here at any time.

As @igor-makarov mentioned we already discussed this a little, 10s of millions of installations per day should be fine. We might also be able to tweak the caching config specifically for your project for even better performance.

@igor-makarov
Contributor Author

@MartinKolarik thanks for weighing in.

I have tested this approach with a rather large iOS app with 33 of the pods currently sourced from the trunk. I am happy to provide my results.

The timings are amazing!

  • The repo takes 9s to pod install (with lockfile and spec repo updated).
  • A clean install starting with an empty local cache took under 20s.
  • A happy install with everything cached took 10s.

The new timings are just that, no setup costs at all.

In addition, this method is very suitable for CI build artefact caching, since the entire directory was only 4k entries as opposed to the 600k+ of the full CP specs (also, only 12MB rather than 1.1GB). @dantoml, you probably recall the terrible tar performance when faced with the repo tarball.

@igor-makarov
Contributor Author

I've submitted #8280 & Core#469 as a way to add a CDN-source as a spec repo.

Pending approval, this leaves us with the trunk grooming mentioned above. I've simulated the desired effect by having it forked, and running a Jenkins job on each commit. The results are awesome, and we've been using it for our test & production iOS builds for a couple of weeks now.

The Jenkins is private, but here's the relevant snippet from the job:

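# For every pod directory (depth 5), list its entries into index.txt.
find . -mindepth 5 -maxdepth 5 -type d -not -wholename '**/.git/**/*' -print0 | \
  xargs -0 -I {} bash -c 'cd "{}"; ls -1 | grep -v "index.txt" > index.txt'
# Collect all pod names: drop the first 14 characters of each path
# (the shard prefix, assuming a layout like ./Specs/x/y/z/PodName) and sort.
find . -mindepth 5 -maxdepth 5 -type d -not -wholename '**/.git/**/*' | \
  cut -c15- | sort > all_pods.txt

# Commit the regenerated indices; "|| true" keeps the job passing when nothing changed.
git add .
git commit -m "indexed" || true
git push origin master --force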

Ultimately, I think that trunk writing should be modified to generate these indices on the fly, but due to the experimental status of the CDN source, we might want to keep it as a recurring sync job for now.
However, I have a bit of a worry due to the recent npm "attractive nuisance" stuff and would like the sync job to run under CP org credentials.
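Generating the indices at write time, as suggested above, would amount to a small idempotent update of one file per published version. A rough sketch, assuming the same one-version-per-line `index.txt` format as the sync job; `add_version_to_index` is a hypothetical helper, not existing trunk code.

```python
from pathlib import Path


def add_version_to_index(pod_dir, version):
    """Add `version` to <pod_dir>/index.txt if missing; return True if the file changed."""
    index_path = Path(pod_dir) / "index.txt"
    if index_path.exists():
        existing = index_path.read_text(encoding="utf-8").split()
    else:
        existing = []
    if version in existing:
        return False  # already indexed, so re-running is harmless
    existing.append(version)
    index_path.write_text(
        "".join(v + "\n" for v in sorted(existing)), encoding="utf-8"
    )
    return True
```

Because each publish only touches the index inside its own pod directory, this keeps the conflict-avoidance property of the current layout.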

@orta
Member

orta commented Nov 27, 2018

Cool. I think we'll first need to get the incremental index updates applied via trunk, otherwise that job will become out of date very quickly. Ideally it should only need to run once to handle the backlog, and maybe I or someone else who currently has commit access to Specs can run it once we're all happy that the incremental work is solid 👍

@igor-makarov
Contributor Author

Allow me to clarify. The job runs on each commit that adds a pod and then force-pushes. The total runtime is about 3 minutes. While not ideal in the long term, I think this will serve better for the first, experimental phase of this CDN endeavor.

The resource usage on our Jenkins server was minimal and hadn't bothered the admins at all. I would like, however, for the job to run under CocoaPods' credentials. I think force-pushing into a separate branch in CocoaPods/Specs won't bother anyone and will allow us to get CDNSource into as many hands as possible.

@orta
Member

orta commented Nov 28, 2018

I'm afraid not. If it's not running on infra we own (and can introspect, debug etc.) then I'm not really happy to give credentials out for the Specs repo; we keep those locked pretty tightly even within the core team (as it's effectively giving admin access to every pod). I can set up a webhook from CocoaPods/Specs which you can use to automate your current fork of Specs?

I think there's value in adding the indexes regardless. I will try to make time this weekend to look at amending the commits that trunk sends to start adding the indexes ( /cc @alloy ), first at the Podspec folder level, and then at the global index level.

@orta
Member

orta commented Nov 28, 2018

That said, maybe we can instead make a CocoaPods/CDNSpecs repo and have it push to that instead?

@igor-makarov
Contributor Author

I wasn't suggesting that I get CP repo credentials! 😊

On the contrary. I wanted to get the current sync job out of my hands. It's currently running on my employer's private Jenkins and while the admins don't have a problem with that now, who says they can't change their mind?

Does CP have a Jenkins server or some other kind of service where you can push commits and not just run CI?

@igor-makarov
Contributor Author

CircleCI allows attaching user keys to builds: link.

@orta
Member

orta commented Dec 11, 2018

I'm struggling to get any free OSS time for Danger, let alone CocoaPods. I think my aim is to have this running as a CircleCI scheduled cron job every 10-15 minutes?

@igor-makarov
Contributor Author

You could run it on-commit, even. It takes 3 minutes to complete.

@ozmium

ozmium commented Apr 13, 2019

I don't know how to get this working. I installed cocoapods 1.7.0.beta.3, using sudo gem install cocoapods --pre. But whenever I run pod setup or pod install on an Xcode project, it starts downloading the huge CocoaPods git repo.

So how do we use the jsDelivr CDN repo? Or how do we specify not to download the git repo locally?


Update: I assume the instructions are in this comment ....

#7046 (comment)

To take advantage of CDN-based spec repo do the following:

  • Use a Gemfile and get the latest cocoapods and cocoapods-core gems from master.
  • On command line: bundle exec pod repo add-cdn jsDelivr-Specs "https://cdn.jsdelivr.net/cocoa/"
  • In your Podfile: source "https://cdn.jsdelivr.net/cocoa/"
  • If you already have a Podfile.lock replace the SPEC REPOS key there as well - there's a bug that prevents it from auto-updating.

@amorde
Member

amorde commented Apr 13, 2019

@ozmium those steps are correct 👍 you can also just add that source to your Podfile and it will automatically be added

@ozmium

ozmium commented Apr 14, 2019

@amorde, Is it possible for someone to add these instructions to both the Release Notes and the Installation Guide? Otherwise it's very difficult to know how to set up the CDN.

@igor-makarov
Contributor Author

While I'm glad @ozmium has gotten it to work, I feel that due to its experimental status (main spec repo not being generated yet), we should go slow with publicizing this feature.


7 participants