Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented new REST API hpFlows to add high priority flows to Azkaban. #3014

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

djaiswal83
Copy link
Member

These high priority flows when run in containeirzed dispatch mode, will only use ACTIVE version of the docker image of all the jobtypes required.

This ensures that these flows always run on a stable version rather than any version which is being ramped up and is potentially unstable.
There API can be only called by either Azkaban admins or the designated owners.
Currently the ability to add owners is a manual process. An ADMIN only API to add such owners will come in the future.
API to remove a flow or list of flows from the high priority flows will also come in future.

Added unit tests as and where needed.

…ll only use ACTIVE version of the docker image of all the jobtypes required.

This ensures that these flows always run on a stable version rather than any version which is being ramped up and is potentially unstable.
There API can be only called by either Azkaban admins or the designated owners.
Currently the ability to add owners is a manual process. An ADMIN only API to add such owners will come in the future.
API to remove a flow or list of flows from the high priority flows will also come in future.

Added unit tests as and where needed.
*/
@Override
public boolean isHPFlowOwner(final String userId) {
final FetchHPFlowOwnership fetchHPFlowOwnership = new FetchHPFlowOwnership();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to create a new object every time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, I believe I followed the pattern used in the rest of the code. Do you see any harm in doing this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that in general dependencies should be injected. I don't see a specific issue here, but I think it might be better memory management to only create the object as necessary, especially since it seems like we're using this in a mostly static context (only to access the encapsulated query).

log.info("Added the HP flows by user " + userId);
return count;
} catch (final SQLException e) {
log.error("Unable to add HP flows", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we interested in adding the user in this log message, if we're adding the user in the info message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good catch. Will add.

if (e.getErrorCode() == SQL_ERROR_CODE_DUPLICATE_ENTRY) {
errorMessage = "Reason: One or more flows already exists.";
} else {
errorMessage = "Reason: ";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we say "Reason: Unknown" here, or something else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

public class HPFlowDTO extends BaseDTO {
// CSV of flow ids.
@JsonProperty("flowIds")
@NotBlank(message = "flowList cannot be empty.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be flowIds?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. Should be consistent.

* @return list of flow IDs.
*/
public List<String> getFlowIdList() {
if (this.flowIds.isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this possible without an error from the Not Blank annotation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a sanity check. If you think its useless, let me know.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better not to validate twice, but that is just my personal preference.

}

@Override
protected void handleGet(final HttpServletRequest req,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOC: Is this a todo or do we have no plans to implement the Get call?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time when this PR was created, expectation was to have basic functionality implemented hence this one was not implemented. However, things have now changed, it will be implemented in the upcoming commit for this PR.

* @param user
* @return
*/
private boolean hasHPFlowManagementPermission(final User user) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be incorrect as I am going from memory, but wouldn't a direct call to the permission manager provide the same functionality?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API checks if the user is Azakaban admin or not, if yes, then PermissionManager is not called.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct. I was miss-remembering. Thank you.

Copy link
Contributor

@jakhani jakhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will users be added as owners for hp_flows? Will it be manual task and db entry directly? Can you please also add ticket for getHpFlows? It will be useful API. It will also be good to show this information on /status page. Can you please add task for that too?

* @return
* @throws ImageMgmtException
*/
public boolean hasPermission(final String userId) throws ImageMgmtException;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is validating permission for hp flow. Can you please rename this method to convey the same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

}

// Throw exception with FORBIDDEN error.
log.error(String.format("API access permission check failed. The user %s "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please create object for error message and use it for both logging as well as exception?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will do.

*/
public interface HPFlowDao {
/**
* isHPFlowOwner
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please appropriate comment for each method? For example, why is hpFlowOwner only taking userId and no reference for flow? Shouldn't there be a mapping between user and flow in metadata?
Please add appropriate method doc for rest of the methods too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated in reply to main comment, the idea was to have only a handful of owners for a certain jobtype to use this functionality. The logic with mapping will come in next commit.


// Flow ID is a fully qualified flow name with format,
// <project_name>.<flow_ame>
private static String INSERT_HP_FLOW_OWNER =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any mapping between hp_flow_owners and hp_flows tables? Can anyone be owner for any flows and insert anything? Can you please add more detail about who can be the owner?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replied in earlier comment.

final FetchHPFlowOwnership fetchHPFlowOwnership = new FetchHPFlowOwnership();
try {
final String returnedName = this.databaseOperator
.query(FetchHPFlowOwnership.FETCH_HP_OWNER_BY_NAME,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename FETCH_HP_OWNER_BY_NAME to FETCH_HP_FLOW_OWNER_BY_NAME?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense!

-- Definition for hp_flow_owners table. It contains list of owners who have
-- access to add/remove high priority flows.
CREATE TABLE IF NOT EXISTS hp_flow_owners (
owner VARCHAR(64) NOT NULL PRIMARY KEY,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw insert and select query where this column name is "name". This sql script and code in PR will break things. Can you please test this part and change column name appropriately?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I tested it myself but I am admin so it never went to user query, will fix it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please test everything as user on solo-server and verify all the APIs. You will miss many bugs otherwise.

-- Definition for hp_flows. It contains the flow_id in format "project_name.flow_name".

CREATE TABLE IF NOT EXISTS hp_flows (
flow_id VARCHAR(256) NOT NULL PRIMARY KEY,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename this field based on my comments about flowId and flowName.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do.


@VisibleForTesting
List<String> getFlowIdList(final HPFlowDTO hpFlowDTO) {
return hpFlowDTO.getFlowIdList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the input to get FlowIdList? Why is it needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add appropriate scope for this method and documentation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is for testing only, will add comments.


private static final Logger log = Logger.getLogger(HPFlowServlet.class);
private static final String BASE_HP_FLOW_URI = "/hpFlows";
private ConverterUtils converterUtils;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Util should not be an instance. All the methods in utils should be public static. Can you please remove this instance?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 with @jakhani

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not an instance, in the init() method, it is fetched from the server.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Utils should only have static methods.

-- Definition for hp_flows. It contains the flow_id in format "project_name.flow_name".

CREATE TABLE IF NOT EXISTS hp_flows (
flow_id VARCHAR(256) NOT NULL PRIMARY KEY,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add index for flow_id if there is chance of it growing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is primary key, it is indexed by default.

@@ -120,3 +120,20 @@ CREATE UNIQUE INDEX version_set_md5
-- alter table execution_flows add column dispatch_method TINYINT default 1;
-- CREATE INDEX ex_flows_dispatch_method ON execution_flows (dispatch_method);

-- Definition for hp_flow_owners table. It contains list of owners who have
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a design doc for the db tables, dao and service?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see a method to update the hp_flow_owners table, such as insert or delete. Also it might be useful if we add a state (active 1 vs inactive 0) col for the owners table.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As earlier stated, the original intent of this PR was to get basic functionality out and this was supposed to be followed by implementation for ownership mapping. However, this did not go in time due to other factors and the rest of the implementation will be done in this PR only.

* @return
* @throws ImageMgmtException
*/
public boolean hasPermission(final String userId) throws ImageMgmtException;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method name should be hasHPFlowPermission

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept it same as there is already a method with same name.

* @return boolean
*/
@Override
public boolean hasPermission(final String userId) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems all HP owners have permission to update the hp_flows table.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this PR, yes. Eventually no.

-- Definition for hp_flows. It contains the flow_id in format "project_name.flow_name".

CREATE TABLE IF NOT EXISTS hp_flows (
flow_id VARCHAR(256) NOT NULL PRIMARY KEY,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about index col and project_id col?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flow_id is fully qualified with project name.

@djaiswal83
Copy link
Member Author

At this PR creation time, the idea was to keep handful of owners for a jobtype partner who will add the HP flows. However, now the priorities have changed and we can work on implementation to manage ownership mapping.

*/
private boolean hasHPFlowManagementPermission(final User user) {
/**
* Azkaban ADMIN role must have full permission to access image management APIs. Hence, no
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this copy-paste text perhaps?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants