
Frontend <> Backend communication #20

Closed
cristianberneanu opened this issue Jun 9, 2021 · 6 comments
Comments

@cristianberneanu
Contributor

We need to agree on the way the Frontend communicates with the Backend.

Since transpiling the reference code to JS resulted in poor performance, the anonymization code will stay in dotnet.
Furthermore, I don't think it is a good idea to manually build the query AST in JS land. It couples the Frontend and Backend internals too much. Sending a SQL statement feels cleaner.

As input we send: filename, query statement, anonymization settings.
As output we get: query result or an error.

Option 1: anonymize using the CLI.

We pass the input as command-line arguments and get back the query result (as either CSV or JSON) on the stdout stream, or an error on the stderr stream.

PROs:

  • We don't need to have .NET code in the publisher repository;
  • It keeps the reference code separate from the GUI and free of pollution with Frontend concerns;
  • Allows for easy automation, as all the functionality is easily accessible from the CLI;
  • Makes sure the reference tool works as intended from the CLI (since the Frontend depends on it).

CONs:

  • We won't have live progress reports (unless we get a bit hacky);
  • We pay the CLR startup cost for each anonymization call;
  • Functionality will be limited to what the CLI provides.

Option 2: anonymize using IPC.

We will need an additional .NET project in this repository that loads the core reference library and dispatches anonymization requests to it. We pass the input as a JSON object and get back a JSON object with the result or error. We need to decide whether to use a socket or the process's stdio streams for message exchange.
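If we went with stdio streams, one common framing is newline-delimited JSON. A minimal sketch of the Frontend side (the `type`/`payload` message shape is an assumption, not a settled protocol):

```javascript
// Encode one request as a newline-delimited JSON message.
function encodeMessage(type, payload) {
  return JSON.stringify({ type, payload }) + "\n";
}

// Split a chunk read from the backend's stdout into complete messages.
// A trailing partial line is returned as the remainder, to be prepended
// to the next chunk before decoding again.
function decodeMessages(buffer) {
  const lines = buffer.split("\n");
  const remainder = lines.pop();
  return {
    messages: lines.filter(Boolean).map((line) => JSON.parse(line)),
    remainder,
  };
}
```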

PROs:

  • We can add functionality not supported by the CLI;
  • JSON messages are more expressive than invoking a CLI application;
  • Lower latency, since the CLR is kept loaded.

CONs:

  • Additional .NET code added to this repository;
  • CLI might become stale, since it will be rarely used;
  • Tighter coupling between the publisher and reference repositories;
  • Reference code will get polluted with Frontend concerns (like progress reports).

I am slightly in favor of Option 1 (I don't consider its drawbacks too significant).

@sebastian
Contributor

I don't think it is a good idea to manually build the query AST in JS land. It couples the Frontend and Backend internals too much. Sending a SQL statement feels cleaner.

Yes, building the AST in JS only made sense as long as the AST could immediately be executed there too.

@sebastian
Contributor

sebastian commented Jun 9, 2021

I vote for Option 1 too.

I additionally vote for using JSON as the output format, since it's easier to consume in the Frontend than parsing CSV output.

We can live without progress reports, and if we need it later we can get hacky then.

@edongashi
Member

Do we drop the JS CSV parser? If yes, do we use the backend to figure out the shape when we load a file?
If not, we need to use 2 different CSV libraries where each may have their own tiny differences.

@sebastian
Contributor

Do we drop the JS CSV parser? If yes, do we use the backend to figure out the shape when we load a file?
If not, we need to use 2 different CSV libraries where each may have their own tiny differences.

Good point, @edongashi.

We either need another parser for the GUI or need to extend the Reference with an endpoint that returns a schema...
In either case, as long as we want to support CSV, it seems the CLI interface must be extended to support providing a schema as part of the input too!?

@cristianberneanu
Contributor Author

I say we do the CSV parsing only in the backend/reference tool.
To load the initial raw data (including the schema) the frontend could issue a standard SELECT * FROM 'file_name' query.
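For illustration, assuming the backend returns query results as an array of JSON row objects, the Frontend could derive a simple schema from that initial query's result along these lines (the function and type names are hypothetical):

```javascript
// Given the JSON result of `SELECT * FROM 'file_name'` (assumed to be
// an array of row objects), derive a column-name/type schema from the
// first row. Crude on purpose: just enough shape for the GUI.
function inferSchema(rows) {
  if (rows.length === 0) return [];
  return Object.keys(rows[0]).map((name) => ({
    name,
    type: typeof rows[0][name] === "number" ? "number" : "text",
  }));
}
```

This would keep all CSV parsing in the backend, avoiding the two-parsers problem raised above.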

@cristianberneanu
Contributor Author

cristianberneanu commented Jun 11, 2021

This seems settled (at least for now).
