Match runner overview #4

Open

devonwalshe opened this issue Sep 16, 2020 · 2 comments
devonwalshe commented Sep 16, 2020

I've set up the endpoints that will let the user upload two files, launch a match between them, and then output the results of the match to a CSV. I'm going to discuss the endpoints here so we can work together on tying it all up.

Before you read this - please review #2 to refresh yourself on the user flow (we are dropping item 5 for now).

I imagine you'll have questions so let me know and I'll update here and in Slack so we're on the same page.

Short version:

  • POST to /pipeline/ - make a new pipeline
  • POST to /raw_file/ - upload a file - do this twice, with additional form data (discussed below)
  • POST to /run_match/ - make a new run match with all of its related data
  • POST to /matchrunner/<runmatch_id>/ - launch a data match on the files
  • GET to /matchrunner/<runmatch_id>/export - returns the data as CSV text, which you'll then need to package into a file for the user to download.
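
To make the setup calls concrete, here's a rough sketch of the first three requests from the frontend using fetch and FormData. The response field names (id, inspection_run_id) are placeholders, not the confirmed API - check the actual responses (or fall back to GET /inspection_runs/ as described in item 2 below).

```ts
// Sketch only: response field names (id, inspection_run_id) are assumptions,
// not the confirmed API - adjust to whatever the endpoints actually return.

async function postJson<T>(path: string, body: unknown): Promise<T> {
  const res = await fetch(path, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`POST ${path} failed: ${res.status}`);
  return res.json() as Promise<T>;
}

// 1. Create the pipeline - just needs a name.
function createPipeline(name: string) {
  return postJson<{ id: number }>("/pipeline/", { name });
}

// 2. Upload one raw file plus its extra attributes as multipart/form-data.
//    Call this once per input dataset (so twice per run match).
async function uploadRawFile(fields: {
  file: File;
  source: string;
  data_mapping_id: number;
  pipeline_id: number;
  run_date: string;
  sheet_name: string;
}) {
  const form = new FormData();
  form.append("file", fields.file);
  form.append("source", fields.source);
  form.append("data_mapping_id", String(fields.data_mapping_id));
  form.append("pipeline_id", String(fields.pipeline_id));
  form.append("run_date", fields.run_date);
  form.append("sheet_name", fields.sheet_name);

  const res = await fetch("/raw_file/", { method: "POST", body: form });
  if (!res.ok) throw new Error(`raw_file upload failed: ${res.status}`);
  // The response includes the id of the auto-created InspectionRun
  // (field name assumed; GET /inspection_runs/ is the fallback lookup).
  return res.json() as Promise<{ id: number; inspection_run_id: number }>;
}

// 3. Create the run match from the two inspection runs.
function createRunMatch(name: string, run_a: number, run_b: number, pipeline_id: number) {
  return postJson<{ id: number }>("/run_match/", { name, run_a, run_b, pipeline_id });
}
```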

More detail:

  1. Pipeline

    • This is the top of the food chain for the application; the InspectionRun and RunMatch resources both require a pipeline as their starting reference point.
    • POST to /pipeline/ with a name parameter, that's it.
  2. RawFile

    • Input dataset. We need two of these to set up a run match. I think the page where we start a new run match should begin with a form where you enter both files and all the necessary data, but actually send two POST requests (we can discuss this more).
    • A POST request to /raw_file/ now requires additional data attributes, sent as a multi-part form:
      • file - file upload,
      • source - text input,
      • data_mapping_id - dropdown (endpoint /feature_maps/),
      • pipeline_id - dropdown,
      • run_date - text input,
      • sheet_name - text input
    • You will get the corresponding InspectionRun id (it gets created automatically) in the response from the POST. I can make it available elsewhere if you need, but you could also traverse backwards from GET /inspection_runs/, which lists raw_file_ids.
  3. InspectionRun

    • An Inspection Run is created automatically (this is important because the RunMatch references the inspection runs directly, not the raw files).
    • You need two inspection run ids to create a run match.
  4. RunMatch

    • The parent for all our match data
    • After we upload 2 RawFiles, it will generate 2 corresponding InspectionRuns that we can use to generate a RunMatch.
    • A POST to RunMatch requires just a name, run_a (earlier InspectionRun), run_b (later InspectionRun), and a pipeline_id
    • A RunMatchConf record is created in the background with my standard defaults, but we should allow the user to tweak it before launching the matcher.
  5. MatchRunner

    • endpoint at POST /match_runner/<run_match_id> to launch the match for the run
    • I tried to make it run in the background, but that needs more work; the DB was causing connection issues.
    • This will be a long-running request, matching the data in the background (the launch and export calls are sketched after this list).
    • I've added some narrative info to the GET /run_match/1 endpoint
  6. MatchExporter

    • endpoint at GET /matchrunner/<run_match_id>/export
    • We should have an export CSV button for this in two places - on the matching interface and on the run_match listings.
    • Another long-running process; it assembles all the data from the database for the run match and outputs a CSV as text.
    • The user should receive a file download prompt.
    • I figured there wasn't any point sending a file, which you would have to handle on your end, so I decided on text - let me know if that works.
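
A sketch of how the launch and the CSV download could look on the frontend. Note the path spelling differs between the short version (/matchrunner/...) and item 5 above (/match_runner/...), so check which one the backend actually registers; the sketch uses the short-version spelling.

```ts
// Sketch only, assuming same-origin requests and the paths from the short version.
// The matchrunner call blocks until the match finishes, so the UI should show a
// spinner (or poll GET /run_match/<id> for progress once that exists).

async function launchMatch(runMatchId: number): Promise<void> {
  const res = await fetch(`/matchrunner/${runMatchId}/`, { method: "POST" });
  if (!res.ok) throw new Error(`match launch failed: ${res.status}`);
}

// Fetch the CSV text and hand it to the browser as a file download.
async function exportMatchCsv(runMatchId: number): Promise<void> {
  const res = await fetch(`/matchrunner/${runMatchId}/export`);
  if (!res.ok) throw new Error(`export failed: ${res.status}`);
  const csvText = await res.text();

  // Package the text into a file so the user gets a download prompt.
  const blob = new Blob([csvText], { type: "text/csv" });
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = `run_match_${runMatchId}.csv`; // file name is our choice
  link.click();
  URL.revokeObjectURL(url);
}
```

The Blob + object URL step is just one way to turn the returned CSV text into a browser download prompt.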

sheinin commented Sep 16, 2020

  1. Pipeline

Do you see it as a grid listing + "add new" interface?

  2. RawFile

Sounds like one screen with two identical input sections for files 1 and 2, and the submit button that sends two separate POST requests.

  3. InspectionRun

A display-only grid of "/inspection_runs"?

  4. RunMatch

I understand that this is our main interface, which now opens with the "New" button and the grid, and will have an input form as described:

    A POST to RunMatch requires just a name, run_a (earlier InspectionRun), run_b (later InspectionRun), and a pipeline_id

  5. MatchRunner

Is this a grid of run matches with a "launch" button and job status? Why not add "run" + status to screen 4?

    I've added some narrative info to the GET /run_match/1 endpoint

Is it used in the UI?


devonwalshe commented Sep 16, 2020

  1. Pipeline
    Do you see it as a grid listing + "add new" interface?

Yes please!

  2. RawFile
    Sounds like one screen with two identical input sections for files 1 and 2, and the submit button that sends two separate POST requests.

Yes, with one page, but we also need to POST the other form data for each raw file, as I've discussed above.

  3. InspectionRun
    A display-only grid of "/inspection_runs"?

I was putting it above for reference, not for frontend implementation - the users don't need to see this for now. However, you need the IDs of the two inspection runs (run_a, run_b) in order to create the run_match, as you mentioned below.

  4. RunMatch
    I understand that this is our main interface, which now opens with the "New" button and the grid, and will have an input form as described:
    A POST to RunMatch requires just a name, run_a (earlier InspectionRun), run_b (later InspectionRun), and a pipeline_id

Precisely!

  5. MatchRunner
    Is this a grid of run matches with a "launch" button and job status? Why not add "run" + status to screen 4?

  1. The run match launch should take place on the page after you upload the raw files, where we currently have the configuration form. The default data for the configuration form should be pulled from run_match/<id> or run_matches - the conf is all included there now (a small prefill sketch follows at the end of this comment).

  2. The best place to launch the export, I think, is in the grid panel for the run match listings - just add a button to it saying "export data".

    I've added some narrative info to the GET /run_match/1 endpoint

    Is it used in the UI?

I think you can use the pipe_section_count and sections_checked datapoints.
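
For the configuration form defaults, roughly something like this - the response shape of GET /run_match/<id> (a nested conf object plus the pipe_section_count and sections_checked datapoints) is assumed here, not taken from the actual serializer:

```ts
// Sketch only: the RunMatchDetail shape is an assumption based on the discussion above.

interface RunMatchDetail {
  id: number;
  name: string;
  conf?: Record<string, string | number | boolean>; // RunMatchConf defaults (assumed field)
  pipe_section_count?: number;                      // narrative datapoints mentioned above
  sections_checked?: number;
}

async function fetchRunMatch(runMatchId: number): Promise<RunMatchDetail> {
  const res = await fetch(`/run_match/${runMatchId}`);
  if (!res.ok) throw new Error(`run_match fetch failed: ${res.status}`);
  return res.json() as Promise<RunMatchDetail>;
}

// Prefill the configuration form with the stored defaults before the user launches the matcher.
async function prefillConfigForm(runMatchId: number, form: HTMLFormElement): Promise<void> {
  const detail = await fetchRunMatch(runMatchId);
  for (const [key, value] of Object.entries(detail.conf ?? {})) {
    const input = form.elements.namedItem(key);
    if (input instanceof HTMLInputElement) input.value = String(value);
  }
}
```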
