-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/49 dynamic schema generation #69
Conversation
nlp4all/models/DataSource.py
Outdated
id = Column(Integer, primary_key=True) | ||
user = Column(Integer, ForeignKey("user.id")) | ||
data_source_id = Column(String(80), unique=True, nullable=False) | ||
user_id = Column(Integer, nullable=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I def shouldn't need a user id here, that's already in the user relationship, fixing now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one further comment here, that column has been removed, but we might need a column or columns at some point to store information about the data source, like which columns are used in analysis or filtering etc, but that most likely belongs in the controller / model of the specific analysis, not in the data source / data source manager.
all suggestions, comments, improvements, criticisms welcome! :D |
nlp4all/models/datasource_manager.py
Outdated
os.makedirs(ds_dir) | ||
# generate a unique filename from a hash of the data source id | ||
# this is to prevent a user from accessing another user's data | ||
ds_id_hash = hashlib.sha256(self._data_source_name.encode()).hexdigest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I should make the hash not based on the data source name, because the user might change it (if we let them)...
The ID might be better, but it's also predictable, this might not matter too much because the data directory should never be publicly accessible, so I think we should change it to ID rather than name, as ID should be unique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any circumstance why the ID of the user may change? some migration? Docker reasons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not in any way that will affect things, migrations won't change user IDs, and independent installations can have different IDs but they will also have their own data sources and their own users so it won't be an issue
Fixes #49
So, we now have a fluent, SQLAlchemy compatible interface to dynamic schemas per user per datasource. These are stored in unique SQLAlchemy SQLite database files.