Income Limits Automation: Implement running queries and writing to S3 for VES Data Export #16217

Closed · 2 of 4 tasks · Tracked by #16272
FranECross opened this issue Nov 27, 2023 · 11 comments

FranECross commented Nov 27, 2023

Description

Finalize implementation of VES data exports to S3 by leveraging work on the linked ticket.

User story

AS AN Income Limits owner
I WANT the app to be updated automatically and regularly with new VES zipcode data
SO THAT Veterans receive the most up-to-date information in the UI.

Engineering notes / background

S3 file locations:
https://sitewide-public-websites-income-limits-data.s3-us-gov-west-1.amazonaws.com/std_zipcode.csv
https://sitewide-public-websites-income-limits-data.s3-us-gov-west-1.amazonaws.com/std_state.csv
https://sitewide-public-websites-income-limits-data.s3-us-gov-west-1.amazonaws.com/std_incomethreshold.csv
https://sitewide-public-websites-income-limits-data.s3-us-gov-west-1.amazonaws.com/std_gmtthresholds.csv
https://sitewide-public-websites-income-limits-data.s3-us-gov-west-1.amazonaws.com/std_county.csv

Starter GitHub workflow:
name: Income Limits Data Sync
on: workflow_dispatch

env:
  NODE_EXTRA_CA_CERTS: /etc/ssl/certs/ca-certificates.crt

jobs:
  sync:
    name: Sync Data from VES to S3
    runs-on: self-hosted

    steps:
      - name: Check out the repository
        uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1

      - name: Install Java
        uses: actions/setup-java@0ab4596768b603586c0de567f2430c30f5b0d2b0 # v3.13.0
        with:
          distribution: 'zulu'
          java-version: '8'

      - name: Test Java
        run: java -version

      - name: Set up the Node environment
        uses: actions/setup-node@8f152de45cc393bb48ce5d89d36b731f54556e65 # v4.0.0
        with:
          node-version: '18.13'

      - name: Test Node
        run: npm --version
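
For context, this starter workflow would eventually grow export and upload steps along these lines (a sketch only; export-tables.js is a hypothetical script name, not something specified in this ticket):

      # Sketch only: export-tables.js is hypothetical. The real job queries the
      # five VES tables and writes one CSV per table into income_limits_files/.
      - name: Export VES tables to CSV
        run: node export-tables.js

      # Bucket name comes from the S3 file locations listed above.
      - name: Upload CSVs to S3
        run: aws s3 sync income_limits_files s3://sitewide-public-websites-income-limits-data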

Analytics considerations

Quality / testing notes

We need to consider monitoring / alarms for failed exports.
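
One lightweight pattern (a sketch, not a decision from this ticket; SLACK_WEBHOOK_URL is an assumed secret name) is a final workflow step that runs only when an earlier step has failed:

      - name: Alert on failed export
        if: failure()
        run: |
          # Assumed secret name; any alerting channel would work here.
          curl -sf -X POST -H 'Content-Type: application/json' \
            -d '{"text":"Income Limits data sync failed"}' \
            "${{ secrets.SLACK_WEBHOOK_URL }}"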

Acceptance criteria

  • Current data files are NOT overwritten
  • Place new files in a subfolder in the S3 bucket
  • Point a reviewer at the new files and confirm they look good
  • Create ticket to do final cutover and formalize using the new data
FranECross added the backend, Blocked, Drupal engineering, Needs refining, and Public Websites labels on Nov 27, 2023

FranECross commented Nov 27, 2023

@dsasser I took a swing at editing the other ticket #15203, as well as creating this one. Please feel free to edit/alter as needed. Thanks! cc @jilladams


dsasser commented Dec 7, 2023

Status Update 12/7/2023

Running into a blocking issue:

Executing the new workflow on a self-hosted runner (i.e., behind the TIC) throws an error when trying to download the Node dependency node-oracledb:

From this workflow run:

(node:1408632) Warning: Setting the NODE_TLS_REJECT_UNAUTHORIZED environment variable to '0' makes TLS connections and HTTPS requests insecure by disabling certificate verification.

I also tested a Ruby-based action and received a similar error, ostensibly with the same root cause:

From this workflow run:

Error: Error: write EPROTO 00D8C2A99C7F0000:error:0A000152:SSL routines:final_renegotiate:unsafe legacy renegotiation disabled:../deps/openssl/openssl/ssl/statem/extensions.c:921:

Eric ran into this recently with AP.

Next steps appear to be escalating to VA Gateway Operations.
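
For context, one commonly cited workaround for the "unsafe legacy renegotiation disabled" error is pointing OpenSSL 3 at a config that re-enables legacy renegotiation. A sketch follows; it deliberately weakens TLS guarantees, and escalating to VA Gateway Operations remained the preferred path here:

      - name: Allow legacy TLS renegotiation (insecure; sketch only)
        run: |
          cat > /tmp/openssl-legacy.cnf <<'EOF'
          openssl_conf = openssl_init

          [openssl_init]
          ssl_conf = ssl_sect

          [ssl_sect]
          system_default = system_sect

          [system_sect]
          Options = UnsafeLegacyRenegotiation
          EOF
          # Later steps pick up the config via this environment variable.
          echo "OPENSSL_CONF=/tmp/openssl-legacy.cnf" >> "$GITHUB_ENV"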

jilladams commented:

From scrum:

The TIC (owned by VA Gateway Operations) doesn't allow the Node / Ruby versions that GitHub Actions uses by default, which blocks the HTTPS TLS handshake required to talk to Oracle (the database VES runs on).

A lower-than-latest Node version might work. A lower Ruby version might work. This is experimental. It won't affect the big-picture versions used anywhere else in the code; the versions in question here are isolated to the runner executing the GitHub Actions job.
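
In the workflow above that's a one-line change to the setup-node step, e.g. (the version here is illustrative, not a recommendation):

      - name: Set up the Node environment
        uses: actions/setup-node@8f152de45cc393bb48ce5d89d36b731f54556e65 # v4.0.0
        with:
          node-version: '16'  # illustrative older version; the starter pins 18.13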

If this doesn't work, we're back to not being able to use a GitHub Action. In that case, we would be blocked until VA Gateway Operations can change the TIC rules to allow legacy TLS operations, OR we would need to use a different system than GitHub Actions.

Possibility in that case: we could potentially run this code in Drupal (like the Forms DB migration, or as Police data theoretically will someday), but only to fetch the data and send it to S3; we would not want to store these 200,000 records in Drupal. If we get blocked on GitHub Actions this sprint, we can discuss this idea in more detail during the code freeze sprint. It would almost certainly require CMS Collab Cycle review.


dsasser commented Dec 12, 2023

Status Update 12/12/23

Eric via Slack:

The unofficial date for VA Networking operations to ultimately fix this on their end is end of January.

While VA is working on the issue, Eric has been trying alternatives such as using the RBEV installer (which failed).

Next we will try setting up the runner to execute inside a container.

Update:

Running inside a container was successful!

Next we need to resolve how secrets are stored and used, and permissions for the S3 bucket, but at this moment we are unblocked.
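
For reference, running a job's steps inside a container is built into GitHub Actions. The image below is an assumption, since the actual image isn't named in this thread:

jobs:
  sync:
    runs-on: self-hosted
    # All steps now execute inside this container rather than directly on the host.
    container:
      image: node:18-bullseye  # assumed image
    steps:
      - name: Test Node
        run: npm --version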


dsasser commented Dec 13, 2023

Mid-Sprint Update 12/13/23

🟡 This ticket is at risk of not being completed by sprint close. As mentioned in the comments above, we had several issues getting the self-hosted runner unblocked on installing Ruby/Node. Eric solved that for us yesterday, thankfully, so we are moving forward. A fair bit of work remains, including (see the secrets sketch after this list):

  • Setting up access to S3
  • Storing secrets (passwords, etc.) in SSM
  • Doing the query writing/exporting to S3
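
A sketch of the secrets wiring, reusing the environment variable names that show up in a later workflow log (whether the values live in GitHub secrets or get pulled from SSM at runtime was still being settled):

    env:
      AWS_DEFAULT_REGION: us-gov-west-1
      # Each value is injected from a repository/environment secret of the same name.
      VA_INCOME_LIMITS_VES_DB_USERNAME: ${{ secrets.VA_INCOME_LIMITS_VES_DB_USERNAME }}
      VA_INCOME_LIMITS_VES_DB_PASSWORD: ${{ secrets.VA_INCOME_LIMITS_VES_DB_PASSWORD }}
      VA_INCOME_LIMITS_VES_DB_SID: ${{ secrets.VA_INCOME_LIMITS_VES_DB_SID }}
      VA_INCOME_LIMITS_VES_DB_SERVER: ${{ secrets.VA_INCOME_LIMITS_VES_DB_SERVER }}
      VA_INCOME_LIMITS_VES_DB_PORT: ${{ secrets.VA_INCOME_LIMITS_VES_DB_PORT }}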


dsasser commented Dec 14, 2023

Update 12/14/2023

We have successfully connected to the VES database within the workflow and queried all 5 tables! Eric worked his magic, and we are unblocked and rolling again on this work.

What remains:

  • Getting the AWS CLI (AWS command line tool) installed in the runner (see the install sketch below)
  • Wiring up the already-present authentication to the AWS CLI
  • Copying files to S3
  • Verifying the data and shape of the final CSVs
  • Any troubleshooting along the way

The above represents roughly 3-5 points of work.
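
For the CLI install, one standard approach on a Linux runner is AWS's v2 bundle installer (a sketch using AWS's documented download URL):

      - name: Install AWS CLI
        run: |
          curl -sSL "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o awscliv2.zip
          unzip -q awscliv2.zip
          sudo ./aws/install
          aws --version  # sanity check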


olivereri commented Dec 15, 2023

@dsasser

What remains:

  • Getting the AWS CLI (AWS command line tool) installed in the runner

I checked on a few other GHA runners, and it looks like the base image already comes with the AWS CLI installed:

$ sudo su - runner
$ whoami
runner
$ aws --version
aws-cli/1.22.34 Python/3.10.12 Linux/6.2.0-1016-aws botocore/1.23.34
  • Wiring up the already-present authentication to the AWS CLI

The aws-actions/configure-aws-credentials action claims to make the credentials available to CLI calls:
https://github.com/aws-actions/configure-aws-credentials?tab=readme-ov-file#configure-aws-credentials-for-github-actions
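
A minimal invocation of that action might look like the following (a sketch; the secret names are assumptions):

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-region: us-gov-west-1
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}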

  • Copying files to S3

aws s3 cp or aws s3api should pick up the credentials that the above action stores in environment variables.

If you're hitting some errors using AWS CLI, I can take a look.

Edit: OK, I see where this is breaking down:

Run aws s3 sync income_limits_files s3://sitewide-public-websites-income-limits-data
  aws s3 sync income_limits_files s3://sitewide-public-websites-income-limits-data
  shell: sh -e {0}
  env:
    BUCKET_NAME: sitewide-public-websites-income-limits-data
    AWS_REGION: us-gov-west-1
    TEMP_FOLDER: income_limits_files
    AWS_DEFAULT_REGION: us-gov-west-1
    AWS_ACCESS_KEY_ID: ***
    AWS_SECRET_ACCESS_KEY: ***
    AWS_SESSION_TOKEN: ***
    VA_INCOME_LIMITS_VES_DB_USERNAME: ***
    VA_INCOME_LIMITS_VES_DB_PASSWORD: ***
    VA_INCOME_LIMITS_VES_DB_SID: ***
    VA_INCOME_LIMITS_VES_DB_SERVER: ***
    VA_INCOME_LIMITS_VES_DB_PORT: ***
/__w/_temp/46aee20b-c1ef-48d1-86fc-30b825c6c4e5.sh: 1: aws: not found
Error: Process completed with exit code 127.

Using the absolute path /usr/bin/aws might work.


dsasser commented Dec 18, 2023

@olivereri I tried setting the path to the AWS CLI to /usr/bin/aws but I'm getting: /__w/_temp/bab6df03-4a76-4980-8aee-e0e24596d01e.sh: 1: /usr/bin/aws: not found

I'm not sure if this runner just doesn't have the CLI installed, or it is located somewhere mysterious, but I'm going to move forward by installing the CLI manually, which I have previously tested successfully.


dsasser commented Dec 18, 2023

Update 12/18/23

We have CSVs uploading successfully to S3! Here is what remains:

  • Getting the AWS CLI (AWS command line tool) installed in the runner - ✅ done
  • Wiring up the already-present authentication to the AWS CLI - ✅ done
  • Copying files to S3 - ✅ done
  • Verifying the data and shape of the final CSVs - ongoing
  • Any troubleshooting along the way - ongoing
  • PR approval


dsasser commented Dec 19, 2023

End of Sprint Update 12/19/23

This issue is likely to roll over the sprint boundary. The work is nearly complete but most likely will not be merged in time. My previous update captures the work that is not yet complete.

Update

Shape of the data looks good, with one change that will need to be addressed in the Income Limits API code: the date format changed from 6/23/2004 9:43:48.000000 AM to 2004-06-23 09:43:48 +0000. This should be a relatively light lift to add to the final cutover ticket.

  • Verifying the data and shape of the final CSVs - ✅ done
  • Any troubleshooting along the way - ✅ done
  • PR approval by PW (and Eric) - ✅ done
  • Merged - ✅ done

@FranECross some AC notes

  • Place new files in a subfolder in the S3 bucket
    This didn't end up needing a subfolder. Instead, the files were uploaded with slightly different filenames: I added _temp to each filename so that, for instance, the result is std_state_temp.csv.

  • Point reviewer and confirm they look good
    New, temporary CSVs are here:

https://sitewide-public-websites-income-limits-data.s3.us-gov-west-1.amazonaws.com/std_county_temp.csv
https://sitewide-public-websites-income-limits-data.s3.us-gov-west-1.amazonaws.com/std_gmtthresholds_temp.csv
https://sitewide-public-websites-income-limits-data.s3.us-gov-west-1.amazonaws.com/std_incomethreshold_temp.csv
https://sitewide-public-websites-income-limits-data.s3.us-gov-west-1.amazonaws.com/std_state_temp.csv
https://sitewide-public-websites-income-limits-data.s3.us-gov-west-1.amazonaws.com/std_zipcode_temp.csv
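
A quick way to spot-check any of these from the command line:

$ curl -s https://sitewide-public-websites-income-limits-data.s3.us-gov-west-1.amazonaws.com/std_state_temp.csv | head -n 5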

jilladams commented:

Based on Daniel's notes about final status / ACs, I'm marking this done & closing.

@dsasser you referenced a final cutover ticket, and I didn't see that one exists already, so I stubbed out what I think you meant. It could use a look when you get a chance, from you first and then from @FranECross: #16512
