[SC-7] Restrict egress traffic from cloud.gov-hosted apps #3755

Closed
3 of 7 tasks
mogul opened this issue Mar 23, 2022 · 25 comments
Assignees
Labels
ATO compliance Relating to security compliance or documentation component/catalog Related to catalog component playbooks/roles component/inventory Inventory playbooks/roles Feature POAM Issues that should also be appearing in POAM lists

Comments

@mogul
Contributor

mogul commented Mar 23, 2022

User Story

In order to minimize the harm a compromised app can do, the data.gov team wants egress traffic from cloud.gov-hosted data.gov applications to be limited to just expected destinations.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN I am logged into cloud.gov
    AND I am targeting the gsa-datagov organization
    • WHEN I run cf space prod
      AND I look at the line that starts with running security groups:
      THEN I do NOT see the security group public_networks_egress listed.
    • WHEN I run cf space prod-egress
      AND I look at the line that starts with running security groups:
      THEN I DO see the security group public_networks_egress listed.
    • WHEN I run cf t -s prod-egress; cf apps
      THEN I see a running instance of the cg-egress-proxy for each app with egress needs in the prod space.
    • WHEN I run cf t -s prod; cf network-policies
      THEN I see rules allowing traffic from each app with egress needs in the prod space to port 443 on the corresponding proxy app in the prod-egress space.
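
The first two criteria above could be checked mechanically. The following is a minimal sketch, assuming `cf space` prints a single line beginning with `running security groups:` followed by a comma-separated list (as the criteria describe); the helper name `has_running_asg` and the sample line are made up for illustration.

```shell
# Succeeds if the named security group appears on the
# "running security groups:" line of `cf space` output read from stdin.
has_running_asg() {
  asg="$1"
  grep -E '^running security groups:' \
    | sed 's/^running security groups:[[:space:]]*//' \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -qx -- "$asg"
}

# Illustrative input only; real output comes from `cf space <space>`.
sample='running security groups:    dns, public_networks_egress'
if printf '%s\n' "$sample" | has_running_asg public_networks_egress; then
  echo "public_networks_egress is bound"
fi
```

Against a live, logged-in cf CLI this would be used as `cf space prod | has_running_asg public_networks_egress`, which should fail for prod and succeed for prod-egress once the criteria are met.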

Background

SC-7 has traditionally been hard to implement for cloud.gov apps. However, the cloud.gov team has now made it possible to drop the ASG allowing public egress traffic for particular spaces. This enables us to run our production space in the "restricted" configuration, where egress traffic is allowed only for bound services (excluding S3). We then set up dedicated egress proxies in a space with external access. Doing this also enables apps in spaces without public_egress to access S3.

cg-egress-proxy was developed for a high-profile GSA project deployed on cloud.gov that had extremely high public visibility, and was required to meet NIST control SC-7 to ensure egress/lateral compromise was not possible. However, that app did not proceed to production despite getting an ATO. As a result, data.gov might be the first team to ship a working SC-7 egress solution for cloud.gov apps. This will set a new precedent for TTS' standard cloud.gov compliance practice, so be sure to make any fixes necessary upstream in cg-egress-proxy!

Security Considerations (required)

This work will ensure egress traffic from cloud.gov-hosted data.gov apps is properly restricted by default, as required by NIST control SC-7.

Sketch

@mogul mogul added POAM Issues that should also be appearing in POAM lists ATO compliance Relating to security compliance or documentation component/catalog Related to catalog component playbooks/roles component/dashboard component/inventory Inventory playbooks/roles labels Mar 23, 2022
@mogul mogul changed the title [SC-7] Limit egress traffic for cloud.gov-hosted apps to the extent possible [SC-7] Restrict egress traffic for cloud.gov-hosted apps Mar 23, 2022
@mogul mogul changed the title [SC-7] Restrict egress traffic for cloud.gov-hosted apps [SC-7] Restrict egress traffic from cloud.gov-hosted apps Mar 23, 2022
@robert-bryson robert-bryson self-assigned this Mar 24, 2022
@robert-bryson
Contributor

Can a user with permissions create the needed prod-egress space for this for me? I do not have permissions, but I think @hkdctol and @mogul should.

The command should be: cf create-space prod-egress -o gsa-datagov, though it might be a good idea to have it in staging or dev as well while I work on it (cf create-space stage-egress -o gsa-datagov or cf create-space dev-egress -o gsa-datagov, depending).

@mogul
Contributor Author

mogul commented Mar 30, 2022

All done... I also applied the public_networks_egress security-group to the new spaces. (You'll want to drop that same security-group from prod and staging once you've finished this issue.)

@robert-bryson
Contributor

All apps created with this gist.

@nickumia-reisys
Contributor

Just a clarification, the catalog apps that are relevant for this issue are catalog-fetch and catalog-gather. (not catalog-harvest)

image.png

Also, noticed a difference between staging and prod. (I know it's just work in progress, just mentioning 😅)

image.png

@robert-bryson
Contributor

OK, easy change. Thanks for letting me know. The sketch has harvest.

@robert-bryson
Contributor

I don't have permissions to remove the security group, but I am at the point where I can test it. @hkdctol or @mogul, can you please do this for me?

cf target -s development
cf unbind-running-security-group public_networks_egress 

and/or perhaps:

cf target -s staging
cf unbind-running-security-group public_networks_egress 

Thank you!

@mogul
Contributor Author

mogul commented Mar 31, 2022

That's the administrator-facing command for setting up defaults on the whole platform.

You want:

cf unbind-security-group public_networks_egress gsa-datagov development --lifecycle running

and

cf unbind-security-group public_networks_egress gsa-datagov staging --lifecycle running

...both of which I've just done. The change applies at the next app restart.
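
For repeating the space-scoped unbind across several spaces, a small wrapper might look like the sketch below. It only reuses the `cf unbind-security-group ... --lifecycle running` form shown above; the function name `unbind_egress` and the `RUN` dry-run switch are assumptions added for illustration, and by default the commands are printed rather than executed.

```shell
ORG="gsa-datagov"
ASG="public_networks_egress"

# Print (or, with RUN=1, execute) the space-scoped unbind for each space given.
# RUN is a hypothetical switch for this sketch, not a cf CLI feature.
unbind_egress() {
  for space in "$@"; do
    cmd="cf unbind-security-group $ASG $ORG $space --lifecycle running"
    if [ "${RUN:-0}" = "1" ]; then
      $cmd
    else
      echo "$cmd"
    fi
  done
}

# Dry run: shows the two commands from the comment above.
unbind_egress development staging
```

Since the change applies only at the next app restart, apps in the affected spaces would still need restarting afterwards.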

@robert-bryson
Contributor

robert-bryson commented Apr 11, 2022

We believe the issue we're hitting in getting inventory/catalog to successfully start has to do with Caddy's auto_https, which is set to off. @mogul, do you recall why that is?

The error we get when trying to run caddy with anything other than auto_https off is:

run: loading initial config: loading new config: http app module: start: tcp: listening on :80: listen tcp :80: bind: permission denied

@mogul
Contributor Author

mogul commented Apr 12, 2022

Auto-https is off because the platform already provisions a TLS certificate with the route name as one of its Subject Alternative Names (SANs), so we don't want Caddy to handle it. The start script watches for the platform to rotate that certificate and restarts Caddy if it needs to. See the code here:
https://github.com/GSA/cg-egress-proxy/blob/main/proxy/start.sh

We could probably turn off TLS entirely for client connections to Caddy just by having it listen on port 61443; see the docs here:
https://docs.cloudfoundry.org/concepts/understand-cf-networking.html#securing-traffic

So I'm guessing something else is wrong. I can take a look at this with you Tuesday morning during your huddle time, I think.

@mogul
Contributor Author

mogul commented Apr 12, 2022

run: loading initial config: loading new config: http app module: start: tcp: listening on :80: listen tcp :80: bind: permission denied

I mean, technically this is correct: trying to listen on a privileged port like 80 is a no-no for non-privileged applications. The question is: Why would CF be passing Caddy port 80 in the PORT environment variable Caddy references here? It typically provides 8080. So you'll want to investigate whether CF is really passing 80 or whether that env var value isn't actually getting used.

@robert-bryson
Contributor

Here is the output of ./caddy run -environ on proxy-gsa-datagov-development-catalog:

Wall of text
 vcap@e754dab0-0b3f-4e79-4886-84ea:~$ ./caddy run -environ
caddy.HomeDir=/home/vcap/app
caddy.AppDataDir=/home/vcap/app/.local/share/caddy
caddy.AppConfigDir=/home/vcap/app/.config/caddy
caddy.ConfigAutosavePath=/home/vcap/app/.config/caddy/autosave.json
caddy.Version=v2.4.6 h1:HGkGICFGvyrodcqOOclHKfvJC0qTU7vny/7FhYp9hNw=
runtime.GOOS=linux
runtime.GOARCH=amd64
runtime.Compiler=gc
runtime.NumCPU=8
runtime.GOMAXPROCS=8
runtime.Version=go1.17.8
os.Getwd=/home/vcap/app

LD_LIBRARY_PATH=/home/vcap/deps/0/lib
CF_INSTANCE_ADDR=10.10.1.13:61064
LANG=en_US.UTF-8
OLDPWD=/home/vcap
CF_INSTANCE_PORT=61064
VCAP_APPLICATION={"application_id":"5d0e9ea9-fa23-46e6-9cd7-a181d54c0be3","application_name":"proxy-gsa-datagov-development-catalog","application_uris":["proxy-gsa-datagov-development-catalog.apps.internal"],"application_version":"88817d35-015d-44db-8bb8-e6440f9485e7","cf_api":"https://api.fr.cloud.gov","host":"0.0.0.0","instance_id":"e754dab0-0b3f-4e79-4886-84ea","instance_index":0,"limits":{"disk":1024,"fds":16384,"mem":64},"name":"proxy-gsa-datagov-development-catalog","organization_id":"90047c5d-337f-4802-bd48-2149a4265040","organization_name":"gsa-datagov","port":8080,"process_id":"b7121e32-2659-4b36-97a4-6d6ae1326292","process_type":"web","space_id":"3f9b2ef3-f688-4547-b06a-46fa7eb47274","space_name":"development-egress","uris":["proxy-gsa-datagov-development-catalog.apps.internal"],"version":"88817d35-015d-44db-8bb8-e6440f9485e7"}
MEMORY_LIMIT=64m
USER=vcap
CF_INSTANCE_INTERNAL_IP=10.255.87.138
PROXY_ALLOW=*.gov
raw.githubusercontent.com
VCAP_APP_PORT=8080
PWD=/home/vcap/app
HOME=/home/vcap/app
CF_INSTANCE_KEY=/etc/cf-instance-credentials/instance.key
https_proxy=https://secretrandomuser:secretrandompassword@proxy-gsa-datagov-development-catalog.apps.internal:8080
PORT=8080
TMPDIR=/home/vcap/tmp
LIBRARY_PATH=/home/vcap/deps/0/lib
PROXY_USERNAME=secretrandomuser
DEPS_DIR=/home/vcap/deps
PROXY_PASSWORD=secretrandompassword
CF_INSTANCE_GUID=e754dab0-0b3f-4e79-4886-84ea
LOG4J_FORMAT_MSG_NO_LOOKUPS=true
CF_INSTANCE_PORTS=[{"external":61064,"internal":8080,"external_tls_proxy":61078,"internal_tls_proxy":61001},{"external":61064,"internal":8080,"external_tls_proxy":61116,"internal_tls_proxy":61443},{"external":61066,"internal":2222,"external_tls_proxy":61118,"internal_tls_proxy":61002}]
TERM=xterm-256color
CF_SYSTEM_CERT_PATH=/etc/cf-system-certificates
CF_INSTANCE_IP=10.10.1.13
INSTANCE_INDEX=0
CF_INSTANCE_INDEX=0
SHLVL=3
INSTANCE_GUID=e754dab0-0b3f-4e79-4886-84ea
VCAP_SERVICES={}
VCAP_APP_HOST=0.0.0.0
PATH=/home/vcap/deps/0/bin:/bin:/usr/bin:/home/vcap/deps/0/apt/usr/bin
CF_INSTANCE_CERT=/etc/cf-instance-credentials/instance.crt
PROXY_DENY=
_=./caddy
2022/04/12 15:07:15.278	INFO	using adjacent Caddyfile
2022/04/12 15:07:15.280	WARN	input is not formatted with 'caddy fmt'	{"adapter": "caddyfile", "file": "Caddyfile", "line": 38}
1.649776035282646e+09	info	admin	admin endpoint started	{"address": "tcp/localhost:2019", "enforce_origin": false, "origins": ["[::1]:2019", "127.0.0.1:2019", "localhost:2019"]}
1.6497760352831585e+09	warn	tls	stapling OCSP	{"error": "no OCSP stapling for [e754dab0-0b3f-4e79-4886-84ea proxy-gsa-datagov-development-catalog.apps.internal]: no OCSP server specified in certificate"}
1.6497760352831914e+09	info	tls.cache.maintenance	started background certificate maintenance	{"cache": "0xc000466af0"}
1.649776035283221e+09	info	http	enabling automatic HTTP->HTTPS redirects	{"server_name": "srv0"}
1.6497760352853866e+09	info	tls.cache.maintenance	stopped background certificate maintenance	{"cache": "0xc000466af0"}
run: loading initial config: loading new config: http app module: start: tcp: listening on :80: listen tcp :80: bind: permission denied

It looks like $PORT is 8080 and that Caddy is reading it as 8080. The dashboard proxy is working and shows virtually the same output, but is able to start. I'm not sure what the difference is on the Caddy side.
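
One way to narrow this down is to pull the port out of the bind error and compare it with `$PORT`. The sketch below does only that; the helper names are hypothetical. The log above also shows `enabling automatic HTTP->HTTPS redirects` just before the failure, which may be related, though nothing in this thread confirms the cause.

```shell
# Extract the port from a "listen tcp :NN: bind: permission denied" message on stdin.
bind_error_port() {
  sed -n 's/.*listen tcp :\([0-9][0-9]*\): bind: permission denied.*/\1/p' | head -n 1
}

# Succeeds when the port is in the privileged range (below 1024).
is_privileged_port() {
  [ "$1" -lt 1024 ]
}

line='run: loading initial config: loading new config: http app module: start: tcp: listening on :80: listen tcp :80: bind: permission denied'
port=$(printf '%s\n' "$line" | bind_error_port)

# PORT falls back to 8080 here only so the sketch runs outside the container.
if is_privileged_port "$port" && [ "$port" != "${PORT:-8080}" ]; then
  echo "Caddy tried to bind :$port, not the platform-provided port ${PORT:-8080}"
fi
```

If the extracted port differs from `$PORT`, the listener that fails is one Caddy opened on its own rather than the one configured from the environment.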

@robert-bryson
Contributor

I've been struggling with getting inventory on dev to work. I think now it's (at least in part) due to the stuff we did in huddle on Monday. @nickumia-reisys, can you help walk me through the kubectl stuff to clean up the old inventory instance in dev, please?

image

@nickumia-reisys
Contributor

nickumia-reisys commented Apr 13, 2022

Yes, but it's going to be blocked by the other things going on (unfortunately).

@jbrown-xentity
Contributor

Next steps:

  • Set up an egress app per space that any app can use to access the world
  • Create 3 tickets to move this work forward: dashboard egress, inventory egress, and catalog egress

The inventory ticket may remain blocked; with inventory using S3 buckets we have hit GSA-TTS/cg-egress-proxy#5, and the app fails to start up. We will treat these tickets as independent.

Once all are resolved, then we can revisit locking down the whole space completely.

@hkdctol
Contributor

hkdctol commented Jun 9, 2022

Blocked by creating a support issue with AWS/Cloud.gov

@nickumia-reisys
Contributor

Updates:

  • Proxy has been enabled for catalog-web and inventory.
  • This ticket is ready for the last step of removing public_networks_egress from the prod space pending a discussion about catalog-admin, catalog-fetch and catalog-gather

@nickumia-reisys
Contributor

One note that applies to this ticket as a whole:

  • The entire egress-proxy pipeline is complex, heavily restricted by custom configuration and admin permissions, and very much a manual process. If it breaks or has issues, it would take a lot of context and mental concentration to fix it.

@mogul
Contributor Author

mogul commented Nov 8, 2022

General note... Seeing that the egress-proxy is fragile and requires more expertise in how it works than it should to make it work, there's some work percolating in the background to turn the egress-proxy into a brokerable service.

@nickumia-reisys
Contributor

nickumia-reisys commented Nov 10, 2022

Barring new bugs (such as #4053), all of our apps have been proven in some regard to work while filtering traffic through the egress-proxy. There do not seem to be any more systemic design flaws or platform issues around how this should work.

Current status:

| App | Working? | Enabled? | Notes |
| --- | --- | --- | --- |
| catalog-web | ✔️ | ✔️ development, ❌ staging, ❌ prod | Buildpack staging issues |
| catalog-admin | ✔️ | ✔️ development, ❌ staging, ❌ prod | Buildpack staging issues, Harvesting Issues |
| catalog-fetch | ✔️ | ✔️ development, ❌ staging, ❌ prod | Buildpack staging issues, Harvesting Issues |
| catalog-gather | ✔️ | ✔️ development, ❌ staging, ❌ prod | Buildpack staging issues, Harvesting Issues |
| inventory | ✔️ | ✔️ development, ✔️ staging, ✔️ prod | Buildpack staging issues |

Script to see if egress proxy is enabled:

cf t -s <space>
apps='catalog-web catalog-admin catalog-fetch catalog-gather inventory'
for app in $apps; do echo -n "$app: "; cf curl "v3/apps/$(cf app $app --guid)/environment_variables" | jq .var.https_proxy; done

Completing the last AC of this ticket, disabling the security group public_networks_egress, is problematic: if a new bug arises and the group needs to be re-enabled to restore functionality to data.gov websites, someone with permissions would need to be available during our support hours. As it currently stands, egress traffic is controlled through the https_proxy environment variable: when it is set, clients that honor it send their traffic through the proxy by default. Setting the variable enables egress traffic control; unsetting it disables it. I propose we change the acceptance criteria to not disable the public_networks_egress security group.

References:

@mogul
Contributor Author

mogul commented Nov 10, 2022

On the staging issues: Try setting the value of https_proxy env var in the .profile at application startup time. You would still pass the value to be set in the .profile via cf set-env or a manifest, just under a different env var name so https_proxy doesn't interfere with anything during staging.

(Props to @rahearn for first bringing this problem to my attention... Something to be documented for sure.)
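
A minimal sketch of that approach, assuming the proxy URL is passed in under a different variable name: `EGRESS_PROXY_URL` and the function name are hypothetical, and this is not a copy of any project's actual .profile.

```shell
# .profile sketch: runs when the app starts, not during buildpack staging.
# EGRESS_PROXY_URL is a hypothetical variable name; set it with
# `cf set-env` or in the manifest instead of setting https_proxy directly.
activate_egress_proxy() {
  if [ -n "${EGRESS_PROXY_URL:-}" ]; then
    export https_proxy="$EGRESS_PROXY_URL"
  fi
}

activate_egress_proxy
```

Because https_proxy is exported only when the alternate variable is present, unsetting that variable and restarting the app would leave the app without a proxy setting.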

@rahearn

rahearn commented Nov 10, 2022

Here's the .profile we used that @mogul just mentioned: https://github.com/GSA/notifications-api/blob/main/.profile

The one gotcha I've found with it is that .profile doesn't get run automatically when you cf ssh into the app, so you have to source it yourself to activate it before messing around with things manually. The running apps get the value just fine though.

@mogul
Contributor Author

mogul commented Nov 11, 2022

(Technically you're supposed to run /tmp/lifecycle/shell before you do that.)

@rahearn

rahearn commented Nov 14, 2022

Yeah, but even after you run /tmp/lifecycle/shell you still need to . .profile to pick up the settings there

@nickumia-reisys
Contributor

egress-proxy has been enabled and deployed to all catalog and inventory instances on cloud.gov 🍾

@nickumia-reisys
Contributor

While there continue to be general egress anomalies and upgrade work (not to mention that our deployment of this needs to be revamped), I believe the initial effort has been complete for a while now. Here's to a final cringe at this ticket 😣 🥂


6 participants