Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow import of CSV into assumed PG instance #7

Merged
merged 3 commits into from
Sep 28, 2022

Conversation

pdobacz
Copy link
Collaborator

@pdobacz pdobacz commented Sep 27, 2022

Closes #3. Comments:

  1. Note the ability to import > 1 table in separate wizards.
  2. Ideally, I'd like the tab to close when done (until we want editability), but to keep things lightweight I skipped that part, adding a "placeholder" tab invalidation in its place. The exact UX of this part is pending. Also the implementation of the invalidation is clunky, to be ironed out
  3. Mind the connectionConfig - this is my own working setup, I didn't pay much attention to make it work for anythig else, also pending the lifecycle-management part. I guess putting your own values there should work
  4. As mentioned in the TODO in the code - I'm wondering about the SQL injections we're allowing. I feel dumb not knowing how to deal with this, I couldn't find much advice on the internet either. Maybe it's not an issue, but the scenario I'm thinking is handling of CSV files coming from untrusted source. I have a feeling that we should not be vulnerable to trivial SQL injection, even for the ad-hoc PostgreSQL instance.
  5. I'm calling tables by the name of the CSV file, good enough for MVP, but we can fix this quickly in a follow-up

@pdobacz pdobacz force-pushed the piotr/type-preview-import-csv branch from 24ce1ba to 6e427b4 Compare September 27, 2022 14:11
@cristianberneanu
Copy link
Contributor

I'd like the tab to close when done

I think that table import should be done in the main admin panel tab, which gets reset to the initial step at the end of the import (so not closed).

@pdobacz
Copy link
Collaborator Author

pdobacz commented Sep 27, 2022

I'd like the tab to close when done

I think that table import should be done in the main admin panel tab, which gets reset to the initial step at the end of the import (so not closed).

yeah that makes sense too! Maybe best if I do it in a follow-up to not bloat the PR further.

@cristianberneanu
Copy link
Contributor

but the scenario I'm thinking is handling of CSV files coming from untrusted source.

This seem very unlikely to happen. And even if it does, the attacker won't get the response, so no data leakage can occur. And our database will store a copy of the original data, so no loss of data can occur either.

@pdobacz
Copy link
Collaborator Author

pdobacz commented Sep 27, 2022

but the scenario I'm thinking is handling of CSV files coming from untrusted source.

This seem very unlikely to happen. And even if it does, the attacker won't get the response, so no data leakage can occur. And our database will store a copy of the original data, so no loss of data can occur either.

On the "unlikely" part - I don't know... this is for the open data initiative after all. If we succeed, we should expect anything to be thrown at this. And COPY x FROM PROGRAM 'any-command' could theoretically execute pretty arbitrary code on the unsuspecting victims machine. Apart from that, there's the aspect of general hygiene. Seeing these SQLs put together like this hurts my eyes :(. Isn't there an easy fix for this?

Of course, nothing to tackle right now, but the more I think about this, the more I think we should leave an issue behind to revisit.

"AidSelectionHelp": {
"ID Selection": "ID Selection",
"<strong>WARNING:</strong> If this configuration is not done correctly, the data will not be properly anonymized.": "<strong>WARNING:</strong> If this configuration is not done correctly, the data will not be properly anonymized.",
"If the data has one row per person (or other <1>protected entity</1>), then no entity identifier column need be selected. Otherwise, select a column containing a unique ID per protected entity. <4>Click here for details on how to set.</4>": "If the data has one row per person (or other <1>protected entity</1>), then no entity identifier column need be selected. Otherwise, select a column containing a unique ID per protected entity. <4>Click here for details on how to set.</4>"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like at all the amount of duplication and noise the current translation approach creates.
This is especially annoying for long strings. I think we should start using some shorter tags for the keys, i.e. "AidConfigurationWarning".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but let's have a separate issue for this (also to handle uniformly in desktop, once we have the spare cycles)

return true;
} catch (e) {
console.error(e);
message.error({ content: t('Data import failed!'), key: file.path, duration: 10 });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is a good idea for these notifications to have a limited duration.
If the import process takes even a small amount of time, it will be very likely that the user will change focus to another app and will come back later to check the result, missing the notification.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, let's have an issue and revisit later.

const t = useT('AidSelectionStep');
const [aidColumn, setAidColumn] = useState('');
const [buttonState, setButtonState] = useState({
title: `Import into ${file.name} (destructive operation!)`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this destructive? We should ask for confirmation in case the table already exists.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) all comments are good points for follow-ups. I'll just add the issues right away and we decide on when/how etc.

For now it is a desctructive operation, because it deletes the (automatically named) table and reimports. There are I suppose multiple ways to improve here, depending on how we provide the table name change facility.

if (aidColumn == rowIndexColumn) {
await client.query(`ALTER TABLE "${tableName}" ADD COLUMN IF NOT EXISTS "${aidColumn}" SERIAL`);
}
await client.query(`CALL diffix.mark_personal('"${tableName}"', '"${aidColumn}"');`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user should be able to import public tables as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data discovery + preview + import to PG
2 participants