New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Airavata 3348 final cut #260
base: AIRAVATA-3315-storage-quotas
Are you sure you want to change the base?
Airavata 3348 final cut #260
Conversation
Airavata 3315 storage quotas
…vata into AIRAVATA-3348_finalCut Merging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR @vivekshresta . I have couple comments:
- As I've mentioned on the mailing list, ideally the storage resource itself enforces the quota. That said, it is possible to have Airavata validate a quota external to the storage resource as you've implemented here. However, one issue I have with adding an API method to check the quota is that this means the quota will only be enforced if the API client chooses to do so. Who calls
validateStorageLimit
and what do they do with it if it throws an exception? The client could just ignore the exception or just not callvalidateStorageLimit
at all. This seems like it would be better not as a public API method but as an internal one that is used by Airavata to check and enforce the quota, say perhaps as part of checks done before launching the experiment. - Also, since the storage quota check involves establishing an SSH connection and running a remote command, I think it would be better suited to being executed as a Helix task and in the background. This way it doesn't tie up API server resources and it will be able to tolerate transient network failures. It could either be incorporated into one of the existing workflows for launching an experiment, or, I think ideally, it could be implemented as a new workflow that gets queued up maybe after an experiment completes to get and store the amount of storage space used by a user (would need a database table to persist this).
Hi @machristie , Thanks for reviewing the code.
I did want the validation to happen internally, but the problem I faced was, during the experiment creation phase in Airavata, the experiment model does not have any data related to the StoragePreference in which the experiment is being created. Changing the createExperiment() method to accept another parameter would mean changes across all the gateways. And in my previous discussions with the team, I came to know that in the future, similar to choosing compute preferences for an experiment during the experiment creation phase, we're gonna develop a new functionality where the user gets to choose the StoragePreference in which he/she is going to create the experiment. With that thought process, I created a new API that can be invoked by any gateway, if they choose to use this feature. Basically, the public API can be changed to an internal API now. Will make those changes soon.
Even if I remove the new public API and integrate it with createExperiment(), this approach would still be consuming APIServer's resources(though we're using pooled resources instead of creating a new SSH connection every time). Does it make sense to just stick with the original approach - the gateway worrying about the storage quotas? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't evaluate the algorithm yet. Need another pass after basic coding standards are met
|
||
GatewayResourceProfile gatewayResourceProfile = regClient.getGatewayResourceProfile(gatewayId); | ||
String token; | ||
if(isValid(storagePreference.getResourceSpecificCredentialStoreToken())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (
token = storagePreference.getResourceSpecificCredentialStoreToken(); | ||
else | ||
token = gatewayResourceProfile.getCredentialStoreToken(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Null check for the token
RegistryService.Client regClient = registryClientPool.getResource(); | ||
try { | ||
StoragePreference storagePreference = regClient.getGatewayStoragePreference(gatewayId, storageResourceId); | ||
if(storagePreference.getUserStorageQuota() == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (
|
||
String groupResourceProfileId = experiment.getUserConfigurationData().getGroupResourceProfileId(); | ||
if (groupResourceProfileId == null) { | ||
logger.error("Experiment not configured with a Group Resource Profile: {}", experiment.getExperimentId()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Experiment {} not configured with a Group Resource Profile
public void validateStorageLimit(AuthzToken authzToken, ExperimentModel experiment, String storageResourceId) | ||
throws InvalidRequestException, AiravataClientException, AiravataSystemException, TException { | ||
String gatewayId = authzToken.getClaimsMap().get(Constants.GATEWAY_ID); | ||
RegistryService.Client regClient = registryClientPool.getResource(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return the regClient back to the pool after using
throw exception; | ||
} | ||
|
||
double userDirectorySize = Double.parseDouble(output.getStdOut().substring(0, output.getStdOut().length() - 2).trim())/1024.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Print output.getStdOut() as a log before doing this. You don't know what's happening here in case of and error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And do you think that the output format is same for all operating systems (Linux)? If not, your algorithm might be wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And do you think that the output format is same for all operating systems (Linux)? If not, your algorithm might be wrong
Did not know about such a use case. Will test it and update it if necessary.
throw exception; | ||
} | ||
} catch (Exception ex) { | ||
logger.error(ex.getMessage(), ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More detailed error message
String[] directories = experimentDataDirectory.split("/"); | ||
StringBuilder userDirectory = new StringBuilder(); | ||
|
||
for(int i = 0; i < directories.length - 2; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible index out of bound error here
StringBuilder userDirectory = new StringBuilder(); | ||
|
||
for(int i = 0; i < directories.length - 2; i++) | ||
if(!directories[i].isEmpty()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (
} | ||
} | ||
|
||
private String getUserDirectory(String experimentDataDirectory) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a blind algorithm. At least provide your assumptions for experimentDataDirectory value as comments
Regarding validating internally, I would add the check to launchExperiment instead of createExperiment.
That's true, but the current approach only checks before the experiment runs and so doesn't account for experiment output files.
Well, my two cents, but I don't think the experiment needs to be deleted. We just need to set a flag in the database that the user is over quote on that storage resource and then prevent further file uploads/experiments on that storage resource.
Sure, Helix's task framework has builtin fault tolerance support, for example retrying in the case of failure: https://helix.apache.org/0.8.0-docs/tutorial_task_framework.html. By transient network failure I mean some sort of transient network failure between Airavata and the SSH host that prevents the SSH connection from establishing or completing successfully. |
This PR includes changes to validate the storage limit for a user. Airavata retrieves the size of a user given an experiment and validates against the UserStorageQuota in StoragePreference