Skip to content

Conversation

@jonavellecuerdo
Copy link
Contributor

@jonavellecuerdo jonavellecuerdo commented Aug 20, 2024

Purpose and background context

The Application Load Balancer (ALB) that manages authentication and authorization provides an access token and user claims (info about user) in the request headers routed to its target, the Flask app. This PR uses Flask-Login to implement a login method to identify users and detect "stale" sessions, refreshing access tokens and user claims JWTs as needed.

Note(s)

  1. A new environment variable SECRET_KEY is introduced. The SECRET_KEY is required by the Flask app to "securely sign[ing] the session cookie and can be used for any other security related needs by extensions or [the] application" and is set for the Flask app.
  2. A request_loader callback method is implemented to set the current_user in the Flask session.
  3. A 'logout' view has been added.
    • Request for input: Curious to get your thoughts here. 🤔 As the docstring describes, logging out is simply clearing the Flask session or cookie, but it doesn't log them out of the ALB or Okta. When I spoke with @cabutlermit , he mentioned that he was planning on setting the ALB auth token to expire after 1 hour (long after an ECS task run would complete -- i.e., user done with the application). It is unclear right now what this means, but essentially, if a user "logs out" and reloads the home page -- they will be automatically "logged in". Does this feel a little "off" in terms of user experience? 🤔
    image * This is what it looks like when the user visits the `/logout` route. * I removed the breadcrumbs / navigation that normally appear at the top: image Because when the user selects "Home" or "Process invoices", I worried it was confusing to see the page reload automatically 🤔. Thinking that removing the breadcrumbs / navigation would nudge users to reload the main URL instead.

How can a reviewer manually see the effects of these changes?

  1. Run make test and verify all tests are passing.
  2. New and updated fixtures and unit tests are introduced in this PR.
  3. It's also worth recalling the demonstration Graham shared with DataEng that explained how the ALB / Okta integration works.
  4. To see the effects of configuring the LOGIN_DISABLED environment variable, try setting the variable to "true" and "false" and running the app with the command: pipenv run sapinvoices_flask_app.

Includes new or updated dependencies?

YES; updated dependencies and added packages (flask_login, pyjwt) and dev-packages (types-requests)

Changes expectations for external applications?

NO

What are the relevant tickets?

https://mitlibraries.atlassian.net/browse/IN-1033

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed and verified
  • New dependencies are appropriate or there were no changes

@jonavellecuerdo jonavellecuerdo self-assigned this Aug 20, 2024
@jonavellecuerdo jonavellecuerdo force-pushed the IN-1033-auth-flask-login branch 2 times, most recently from c827645 to 073e11e Compare August 20, 2024 17:18
@jonavellecuerdo jonavellecuerdo marked this pull request as ready for review August 20, 2024 17:47
@jonavellecuerdo jonavellecuerdo force-pushed the IN-1033-auth-flask-login branch from 073e11e to 46adb59 Compare August 20, 2024 17:49
@jonavellecuerdo jonavellecuerdo changed the title Add login method to identify users Use Flask-Login for user session management Aug 20, 2024
@coveralls
Copy link

coveralls commented Aug 20, 2024

Pull Request Test Coverage Report for Build 10497196660

Details

  • 66 of 82 (80.49%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-1.6%) to 85.205%

Changes Missing Coverage Covered Lines Changed/Added Lines %
webapp/config.py 10 14 71.43%
webapp/utils/init.py 9 14 64.29%
webapp/app.py 47 54 87.04%
Totals Coverage Status
Change from base Build 10425222108: -1.6%
Covered Lines: 311
Relevant Lines: 365

💛 - Coveralls

@ghukill
Copy link
Contributor

ghukill commented Aug 20, 2024

  • Request for input: Curious to get your thoughts here. 🤔 As the docstring describes, logging out is simply clearing the Flask session or cookie, but it doesn't log them out of the ALB or Okta. When I spoke with @cabutlermit , he mentioned that he was planning on setting the ALB auth token to expire after 1 hour (long after an ECS task run would complete -- i.e., user done with the application). It is unclear right now what this means, but essentially, if a user "logs out" and reloads the home page -- they will be automatically "logged in". Does this feel a little "off" in terms of user experience? 🤔

Before I jump into a review, my vote would be to keep the flask logout route, but not show this anywhere in the UI.

This would allow us to add this functionality in the UI later if we want, but keep it from feeling "off" (agreed there) for the user experience.

I do think it's possible our flask app could more meaningfully "logout" (at the ALB and/or Okta level), but this hasn't been requested yet. Keeping this flask route, almost like a stub, I think would be helpful in case we do want to explore that later.

@jonavellecuerdo jonavellecuerdo force-pushed the IN-1033-auth-flask-login branch from dd7f80b to ca48a93 Compare August 20, 2024 19:19
Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this feels like a great approach, and well executed.

My comments and requested changes, many of which are nit-picky or stylistic, come from this being our first implementation of the ALB + lambda authentication, and the very real possibility this could become a template for how we do it in other applications.

I have no doubt this would work as expected, and laid out well at that, but felt this was a good time to offer some slight tweaks and alternate options before we get too baked in.

But again, overall, think it's great. Feels like an elegant approach to managing a "logged in user" when in fact every request that makes it to the flask app is already logged in. Leaning on flask-login feels like it makes sense, as it leans on its strengths of pre-defined patterns, built-in functions (e.g. logout_user, even if not yet really utilizing), and session management.

Nice work all around!

REGION = "us-east-1"


def parse_oidc_data(
Copy link
Contributor

@ghukill ghukill Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this utility function could benefit from a docstring. It's doing a lot of the heavy lifting of parsing authenticated user data from the HTTP headers on the request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add a docstring (wip).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

webapp/app.py Outdated
Comment on lines 75 to 89
# get user data if new session or refresh user data when access token expires
if oidc_access_token != session.get("oidc_access_token"):
oidc_data = parse_oidc_data(oidc_jwt_data, options={"verify_exp": False})
user = User(
id=oidc_data["mit_id"],
name=oidc_data["name"],
email=oidc_data["preferred_username"],
)
session.update(
{
"oidc_access_token": oidc_access_token,
"user": user.__dict__,
}
)
return User(**session["user"])
Copy link
Contributor

@ghukill ghukill Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to get a little complicated, but will try and make a case here.

Am I recalling correctly that we had discussed an approach that would only ever instantiate the User class in one way here? Where instead of storing a User.__dict__ in the session, we could store the parsed OIDC data and then use that? e.g.

  1. if token stale or token never set, parse OIDC data from headers
  2. store parsed OIDC data -- not the User object -- in the session, e.g. session['oidc_data'] = oidc_data
  3. last, always, instantiate a User object by using OIDC data in session:
# last part of function
oidc_data = session['oidc_data']
return User(
  id=oidc_data["mit_id"],
  name=oidc_data["name"],
  email=oidc_data["preferred_username"],
)

I would find this more readable, knowing that a User object is only instantiated in one way: pulling specific pieces of OIDC data already in the session. With that cognitive load understood and gone, I could then focus my attention on understanding when and why we might parse OIDC data from headers and set it in the session.

It may seem trivial, but I think this opens up the option of creating a new() or load() or from_session_data() method on the User class and using that data, e.g.:

class User(UserMixin):
    # ...
    # ...
    @classmethod
    def from_session_data(cls):
        oidc_data = session.get("oidc_data")
        return cls(
            id=oidc_data["mit_id"],
            name=oidc_data["name"],
            email=oidc_data["preferred_username"],
        )

Even if a method like this is not used, I still do think the @login_manager.request_loader function having only one line / one way of instantiating a User object has value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we did do something like this, perhaps we could also add a safeguard handler right before we instantiate and return the User object, something along the lines of:

# rest of method...

if not session.get('oidc_data'):
    # do something meaningful...
return User.from_session_data()  # or if not the method, just instantiating and returning

We shouldn't be in a situation like this where there is not session data to work with... but this ensures if that somehow arises, the app will gracefully handle that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, apologies for missing this! I do remember us talking about replacing the use of user.__dict__. I updated the code to store OIDC data in the session to instantiate the User.

If we did implement a class method like User.from_session_data, how do you feel about returning None if oidc_data is not in the session? 🤔 I think I like it because it would align with load_user_from_request returning None if it is missing data from the request headers --- which is also worth noting (i.e., because we have these if statements, session must have OIDC data before it ever reaches User instantiation. 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started a review, but I'll hold off until all changes are finalized.

I think returning None makes a ton of sense. Like you say, it mirrors the behavior if the headers are missing. No matter how you slice it: if we can't instantiate a User, then return None from this method and let the flask-login library decide how to handle things.

Good thinking!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed ☝️

Copy link
Contributor Author

@jonavellecuerdo jonavellecuerdo Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A User.from_session_data class method has been implemented. The request_loader callback is updated to use this method. Additional tests have been added.

Note: I wasn't sure if the idea was to have the condition if session.get('oidc_data') is None as part of the request_loader callback or part of User.from_session_data. 🤔 What I was proposing was the latter -- adding the condition as part of User.from_session_data. For what it's worth, I did try initially writing the condition in the callback, but I was getting a lot of linting errors -- the update I went with resolved those errors. Let me know what ya'll think!

webapp/app.py Outdated
"""

def __init__(
self, id: str, name: str, email: str | None = None # noqa: A002
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth considering something like user_id instead of id to avoid shadowing the python built-in id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I guess if we're considering not ignoring rule A002, I would propose updating to mit_id instead. What do you think, @ehanson8 ? 🤔

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we shouldn't be ignoring A002, we can always be more specific than id. I like mit_id!

Copy link
Contributor

@ghukill ghukill Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was going to say +1 to mit_id! But...

I was a little nervous that flask-login's UserMixin requires an id field, given that some other flask-login flow is passing around identifiers (almost like it knows which field to use).

I think this line in flask-login suggests that we do need this id field for it to function properly.

We could, of course, override all those related methods... but feels like a situation where it makes sense to ignore convention and linting a bit.

Sorry for starting this thread in the first place!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do folks feel about overriding UserMixin.get_id?

def get_id(self):
    try:
        return str(self.mit_id)
    except AttributeError:
        raise NotImplementedError("No `id` attribute - override `get_id`") from None

The error message seems to invite folks to override get_id method. That said, if MIT Data Warehouse decided to ever update this field name, we'd have to update the code in two places instead of one:

Given this, is everyone set for overriding this A002?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a nice find about the NotImplementedError, and how it suggests this is designed to be overridden. I'm sold!

I'm in full support of:

  • changing id to mit_id
  • overridding this method to treat self.mit_id as the unique, User identifier
  • and then, linting concern A002 no longer an issue

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that seems like an elegant solution for this situation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The User class has been updated:

  • id attribute has been renamed to mit_id
  • get_id method implemented (overriding UserMixin.get_id)
    Note: I chose to ignore ruff error code EM101 to keep the try-except block consistent with parent method.

The line of code instantiating the User has also been updated.

@jonavellecuerdo jonavellecuerdo force-pushed the IN-1033-auth-flask-login branch 2 times, most recently from 3072861 to d3d80ba Compare August 21, 2024 15:54
webapp/app.py Outdated
Comment on lines 75 to 89
# get user data if new session or refresh user data when access token expires
if oidc_access_token != session.get("oidc_access_token"):
oidc_data = parse_oidc_data(oidc_jwt_data, options={"verify_exp": False})
user = User(
id=oidc_data["mit_id"],
name=oidc_data["name"],
email=oidc_data["preferred_username"],
)
session.update(
{
"oidc_access_token": oidc_access_token,
"user": user.__dict__,
}
)
return User(**session["user"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started a review, but I'll hold off until all changes are finalized.

I think returning None makes a ton of sense. Like you say, it mirrors the behavior if the headers are missing. No matter how you slice it: if we can't instantiate a User, then return None from this method and let the flask-login library decide how to handle things.

Good thinking!

@jonavellecuerdo jonavellecuerdo force-pushed the IN-1033-auth-flask-login branch 4 times, most recently from 1289e5b to f94b6d8 Compare August 21, 2024 18:01
Copy link

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions but this is looking very good

Copy link
Contributor

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Submitting an approval!

Thanks for making the changes per PR discussions. And thanks for trailblazing a new strategy for ALB authentication for lambda flask apps; I really do think we'll be able to reuse a lot of this.

I did make one comment / request, to remove the try/except block and NotImplementedError in the User.get_id() method. I'm comfortable approving, whether or not that change is made via a follow-up comit, but it makes sense to me to remove.

Nice work!

Why these changes are being introduced:
* The Application Load Balancer (ALB) that manages
authentication and authorization provides an access token
and user claims (info about user) in the request headers
routed to its target, the Flask app. With Flask-Login,
this data can be used to identify the 'current_user'
on the app and detect when a session may be stale.

How this addresses that need:
* Implement 'request_loader' callback method
* Add utility method to parse OIDC data from request headers
* Add 'User' class
* Add 'logout' view
* Decorate views using 'login_required'
* Add 'AWS_DEFAULT_REGION' to Config
* Add unit tests

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1033
Why these changes are being introduced:
* The application configuration variable LOGIN_DISABLED
is a convenient method to globally turn off the login method
(i.e., parsing OIDC data from header) for purposes of
unit testing and running the app in Dev1 (ALB is turned off for Dev1).

How this addresses that need:
* Add environment variable 'LOGIN_DISABLED' to Config

Side effects of this change:
* Environment variable LOGIN_DISABLED must be set to "true" for Dev1
and "false" for Prod and Stage.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1033
@jonavellecuerdo jonavellecuerdo force-pushed the IN-1033-auth-flask-login branch from 2a9200d to 5e8380f Compare August 21, 2024 20:43
@jonavellecuerdo jonavellecuerdo merged commit 4e07259 into main Aug 21, 2024
@jonavellecuerdo jonavellecuerdo deleted the IN-1033-auth-flask-login branch August 21, 2024 20:49
@jonavellecuerdo jonavellecuerdo mentioned this pull request Aug 22, 2024
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants