act as maven repo supporting http uploads #45

Merged
merged 19 commits into from May 21, 2012

3 participants

@xeqi

I'd like to get a couple more eyes on this and make sure I'm not missing something.


Functionality

This will let people deploy over http using the standard tools. Tests show it working using pomegranate, and for lein

:deploy-repositories {"clojars" {:url "http://www.clojars.org/repo"
                                 :username ..
                                 :password ..}

in a project.clj will let lein deploy clojars upload files (tested on lein master). I expect the functionality will work using mvn's deploy as well.

Background

Repos are mostly just dumb servers. Aether will PUT files, GET the group/artifact's maven-metadata.xml and update the version info and then PUT the updated version.

Both "pom" and "jar" are artifacts. It is possible to ask just for the "pom" extension/type. The ssh setup only allows projects that have both.

Aether/pomegranate/lein will also PUT .shas for files.

Potential Concerns

  1. Since the server is dumb its possible for someone to create a dumb client, not using aether, and miss steps for having a good artifact.

  2. The server has to read in the pom file to try and update the database. I think this would allow a large file to cause an OOM, though ssh might have the same issue.

  3. The server accepts any file that looks like it could be an group/name/version/artifact. It needs to accept lots of them for jars, classifier jars, poms, shas, etc. Perhaps this should be restricted.

Setup

Since clojars.org uses nginx for the /repo files, it needs a change in the config to pass PUTs to the ring app. I think http://forum.nginx.org/read.php?2,2414,47453#msg-47453 has the config settings that will do this.

I included clojars.repo/wrap-file-at middleware so it will work in dev/test/standalone.

@xeqi

This does not accept lein1 or lein2-preview3 deploys because it does not http basic challenge correctly.

I think I'm going to have to split this into two ring-handlers, with different friend authentication and other middleware, and have clojars-app pick the right one based on "/repo" or not.

@ato

Haven't had a chance to read the code yet, but a few things off the top of my head:

The scp upload has a max file size limit (both for the pom and the jars). It would be good to limit the HTTP uploads too.

It'd probably also be important to enforce the file extensions to .jar and .pom. Otherwise it'd be susceptible to people uploading html files with javascript that steal session cookies.

@xeqi

Jetty and nginx both have limits on form size, so this should stop large files. I think they will need to be changed from their default settings.

http://wiki.eclipse.org/Jetty/Howto/Configure_Form_Size#Changing_the_Maximum_Form_Size_for_a_Single_Webapp
http://nginx.org/en/docs/http/ngx_http_core_module.html#client_max_body_size

@xeqi

Updated with some file extension filtering and auth challenge. Tested locally with lein 1.x, lein2-preview3, and lein-master.

@technomancy

I get "Could not find artifact shadout-mapes:shadout-mapes:jar:1.0.2 in local (http://localhost:3000)" when I try to deploy over HTTP on a fresh checkout: http://p.hagelb.org/clojars-http-deploy-error.html

I've seen this as a warning when deploying using s3-wagon-private, but it was never fatal.

@xeqi

Turns out @technomancy left off /repo on the url. Aether seems to ignore 404 responses, but 405 will make it realise the file didn't upload.

@technomancy

Yep, this was totally a pebkac on my side.

It does occur to me that we should block HTTP uploads unless they're over SSL though. (With a config setting so it's easy to disable in dev/test?) We should probably also require it for logins, registrations, and password changes too, but that's a different story.

@technomancy

Oops, somehow I missed it was already enforced for logins/registrations; cool.

@xeqi

I'm holding off on merging this for discussion on #46, and perhaps some new instructions for the nginx proxy. Is there anything else that should get resolved first?

@ato

With regard to SSL upload requests, I guess we should block them completely rather than redirecting in nginx. I'm not sure what the HTTP client behaviour is when a PUT results in a redirect but we don't want it to be seamless or users will leave non-SSL configured.

I suppose we could implement this in nginx along with SSL-everywhere-in-the-app (see #46) using some config like (untested):

server ssl {
  root ...path-to-repository...;
  location / {
      proxy_pass @clojars;
  }

  location /repo {
    limit_except GET HEAD {
      # send writes to webapp
      proxy_pass @clojars;
    }
    # allow caching of jars
    expires 1m;
  }
}

server nonssl {
  root ...path-to-repository...;
  location / {
    # redirect webapp requests to SSL
    return 301 https://clojars.org$request_uri;
  }

  location /repo {
    limit_except GET HEAD {
      # not allowed to upload without SSL
      # TODO: return some explanatory error
      return 403;
    }
    # allow caching of jars
    expires 1m;
  }
}

Alternatively maybe it's best to implement this stuff in the app. I guess in the app it's got less chance of being accidentally turned off. Disadvantage is the app needs to know when the request was SSL.

The usual way of doing that with a java webapp is to pass through an X-Forwarded-Proto header and have Jetty obey it. Something like this in your jetty.xml:

    <Set name="handler">
     <New id="Rewrite" class="org.mortbay.jetty.handler.rewrite.RewriteHandler">

      <Set name="handler"><Ref id="oldhandler"/></Set>
      <Set name="rewriteRequestURI">true</Set>
      <Set name="rewritePathInfo">false</Set>
      <Set name="originalPathAttribute">requestedPath</Set>

      <Set name="rules">
          <Array type="org.mortbay.jetty.handler.rewrite.Rule">
            <Item>
              <New id="forwardedHttps" class="org.mortbay.jetty.handler.rewrite.ForwardedSchemeHeaderRule">
                <Set name="header">X-Forwarded-Proto</Set>
                <Set name="headerValue">https</Set>
                <Set name="scheme">https</Set>
              </New>
            </Item>
          </Array>  
      </Set>
     </New>
    </Set>

Not sure how to do that with ring's embedded Jetty, perhaps cemerick/friend#8 is the answer.

@ato

Here's one problem it's possible to create files outside of the repo directory by including ../ in the path:

curl -iX PUT http://bob:bob@localhost:8080/repo/../foo/bar/x.jar

If you repo is in data/repo/ the above creates data/foo/bar/x.jar.

@ato

Similarly you can overwrite jars owned by someone else using:

curl -i -X PUT http://bob:bob@localhost:8080/repo/x/../org/clojars/ato/app/version/app-version.jar

That creates a new group x....org.clojars.ato which is different in the database to org.clojars.ato but is the same on the filesystem.

@xeqi

.. should be invalid in the PUT url now.

@xeqi

When the PUT responds with a redirect aether/pomegranate/lein spits out

Could not transfer artifact kerodon:kerodon:pom:0.0.5-20120519.185825-1 from/to local (http://localhost:3000/repo): Failed to transfer file: http://localhost:3000/repo/kerodon/kerodon/0.0.5-SNAPSHOT/kerodon-0.0.5-20120519.185825-1.pom. Return code is: 302, ReasonPhrase:Found

which isn't a descriptive error, but at least it doesn't silently forward and use the https url.

@ato

Cool. Looks good to me.

I'm happy to go with nginx doing the SSL enforcement as it seems the quickest way forward.

I've done up a new nginx config and applied it to test.clojars.org. Note that I haven't got a proper certificate for the test domain so clients will give you an SSL warning that the domain does not match.

server {
  listen 443;
  listen [::]:443;
  server_name test.clojars.org;
  access_log /var/log/nginx/test.clojars.access.log;
  root /var/www/clojars;

  ssl on;
  ssl_certificate ssl/clojars.org.crt;
  ssl_certificate_key ssl/clojars.org.key;

  location / {
    try_files $uri @clojars_webapp;
  }

  location /repo {
    root /home/clojars;
    autoindex on;
    expires 1w;
    client_max_body_size 20m;
    limit_except GET HEAD {
      # send uploads to webapp
      proxy_pass http://clojars-web;
    }
  }

  location @clojars_webapp {
    proxy_pass http://clojars-web;
  }
}

server {
  listen 80;
  listen [::]:80;
  server_name test.clojars.org;
  access_log  /var/log/nginx/test.clojars.access.log;
  root /var/www/clojars;

  location / {
    # force SSL on everything but the repo
    rewrite ^ https://test.clojars.org$request_uri permanent;
  }

  location /repo {
    root /home/clojars;
    autoindex on;
    expires 1w;
    error_page 405 /sslonly.txt;
  }

  location = /sslonly.txt {}
}

Trying to PUT a file using non-SSL gets your a 405 with some explanatory text. I'm sure it's too much to hope that Aether will display the message but at least if someone's debugging it with curl or wireshark they'll see it.

$ curl -ik --upload-file foo http://test.clojars.org/repo/foo
HTTP/1.1 405 Not Allowed
Server: nginx
Content-Type: text/plain
Content-Length: 84

Uploads are only allowed over HTTPS.
Point your client at https://clojars.org/repo/

Trying to upload something larger than 20 MB will give you:

$ curl -ik --upload-file foo https://test.clojars.org/repo/foo
HTTP/1.1 100 Continue

HTTP/1.1 413 Request Entity Too Large
Server: nginx
Content-Type: text/html
Content-Length: 192

<html>
<head><title>413 Request Entity Too Large</title></head>
<body bgcolor="white">
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>nginx</center>
</body>
</html>

I haven't deployed your branch so an upload will get a 404 from the app for now.

$ curl -ik --upload-file foo https://test.clojars.org/repo/foo
HTTP/1.1 100 Continue

HTTP/1.1 404 Not Found
Server: nginx
Date: Sun, 20 May 2012 05:57:07 GMT
Transfer-Encoding: chunked
Connection: keep-alive

When you're satisfied with your branch, merge it into master and I'll deploy it.

I think we should probably do a quiet launch of this feature, push it into production and ask a few people to test it out (maybe IRC and the maintainers list) before doing a general announcement.

@ato

I've now configured a proper certificate for test.clojars.org to make testing SSL configuration easier.

@xeqi

Thanks for setting all this up.

I did a rebase against master a few minutes ago, and will be looking it over again in a few hours. I'll plan to push to master after that.

@xeqi xeqi merged commit 3431877 into clojars:master May 21, 2012
@xeqi

Thanks for looking this over. Merged into master now.

@technomancy

I'd like to have SSL enforced in the app itself, but I'll handle that later.

@ato

I've deployed this to production as release 0.8.1. I've also applied the nginx configuration shown above. I tried deploying a jar over both scp and https to clojars.org and it seemed to work fine. I also tried various failure scenarios (bad password, non-ssl) and they seem to be working properly.

Sorry for the delay on this, I've had a hectic week (major disk array failure at work) and wanted to deploy this when I had some time available in case anything went wrong.

@technomancy

Fantastic; thanks so much. Should we give it a little more time as a soft launch before we announce it on the Clojure mailing list?

@xeqi

Agreed, thanks for handling all the deployment stuff.

@technomancy

I'm going to release Leiningen 2.0.0-preview5 this week. Should I replace the references to scp upload with lein deploy? I can do this without making a big announcement out of it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment