# FastHTML OpenID walkthrough

This notebook illustrates using OAuth/OpenID in FastHTML, by using the library to follow Google's own [OpenID Connect](https://developers.google.com/identity/openid-connect/openid-connect#re-consent) guide. It therefore sits at an explanatory point in between short instructions to just get it working, and a deeper explanation of OAuth in general.

If you're looking for quick instructions to get up and running, check out the [FastHTML OAuth explainer docs](https://fastht.ml/docs/explains/oauth.html). Or if you want deeper background resources, check out the page [What is OAuth2](https://www.romaglushko.com/blog/whats-aouth2/#clients) and of course the [OAuth2 spec](https://datatracker.ietf.org/doc/html/rfc6749#section-2).

But if you want something in between, here we go!

### Authentication, OpenID, and OAuth

You might wonder, why are we talking about OpenID rather than OAuth? Or vice versa?

OpenID provides authentication. But it's built on the more general foundation of OAuth, so to use it you end up using OAuth configuration pages, libraries, and jargon. This creates enough moving pieces that, if you want to understand OpenID, you need to bite the bullet and learn the OAuth lingo. In particular, OAuth describes four roles:

- the Resource Owner (aka, the end-user).
   This will be a person who wants to login to our web app.
   
- the Client Application.
  This will be our web app, which will be implemented by `server.py`.
  
- the Authorization Server.
  This will be a server which dispenses _authorization codes_, _refresh tokens_, and _access tokens_. In this example, it will be Google's servers.
  
- the Resource Server.
  This will be a server which provides access to _resources_, when it is given an appropriate _access token_.  In this example this will also be Google's servers.

In this example, we are not accessing a _material resource_ like a Google Doc. We are only using Google as an identity provider, to enable users to login with their gmail account. So in this case, the resource is user identity information, such as a numerical Google Account ID, but also possibly other profile information like the user's name and email address.

This is the minimal use of OAuth, which is at work whenever you use Google just for login ("Sign-in with Google"). This use of OAuth is what is described and refined by OpenID Connect. This is why, although we are only using Google for login (with OpenID), we get started by setting up OAuth configurations.

## Configuring your app's Google Cloud OAuth configurations

First you need to configure a few settings in Google Cloud Console (GCC), so that Google and your app can coordinate.

Once you're at the GCC page for your app, these are the main settings you need to configure. (Real talk: this is the most annoying part of the process.)

### 1. Application type, and other settings

First, define your application type as a "Web Application". (OAuth works a bit differently for other application types, which we will not discuss here.)

There are also settings for a [user consent screen](https://developers.google.com/identity/openid-connect/openid-connect#re-consent), which you may ignore for now.


### 2. Authorized Redirect URI

Second, on the GCC (Google Cloud Console) page, look for the section titled "Authorized Redirect URIs" and for now enter the following value (take note that it is `http` not `https`!): `http://localhost:5001/redirect`.

We will explain the meaning of this value later when we show what it is for.

### 3. ClientID and Client Secret

Finally, we need to get your application's ID. For your app (the Client) to use OAuth2 to access Google's identity server (the Authorization Server), your app needs to declare and prove who it is to Google. In other words, your app, the Client, needs to be able to authenticate _itself_ to Google.

In a webapp, this is done by coordinating on a shared secret ahead of time -- the _Client ID_ and the _Client secret_. The client ID says who the app is, the client secret proves it. It is essentially your app's own password to Google.

After clicking the down arrow to download the credentials, you will see a modal dialog saying "OAuth client created" and showing your Client ID and Client secret. Download them as a JSON file, and rename that file to `creds.json`.
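
For orientation, the downloaded file is a small JSON document. The sketch below parses a stand-in for it with the standard library. The nested `web` key and the field names reflect the usual shape of Google's download for a web application, `read_client_credentials` is a hypothetical helper, and every value is a made-up placeholder, so check them against your own file.

```python
import json

# Placeholder contents standing in for creds.json (all values are invented).
sample_creds = """
{
  "web": {
    "client_id": "1234567890-example.apps.googleusercontent.com",
    "client_secret": "EXAMPLE-SECRET",
    "redirect_uris": ["http://localhost:5001/redirect"]
  }
}
"""

def read_client_credentials(text):
    """Return (client_id, client_secret) from the JSON text of a credentials file."""
    data = json.loads(text)
    # Assumption: web-application credentials are nested under a "web" key.
    web = data["web"]
    return web["client_id"], web["client_secret"]

client_id, client_secret = read_client_credentials(sample_creds)
print(client_id)
```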

## Implementing your server

Now let us build an OAuth2 Client with FastHTML. It is an OAuth Client since it is a client of Google's authorization servers. But it is still a web server from the point of view of browser requests, so we will call it `server.py`.

We will follow the steps of the [Google guidance](https://developers.google.com/identity/openid-connect/openid-connect#re-consent), but using FastHTML and explaining details more fully.

First, create `server.py`, and import `fasthtml`, `fastlite` (to store our user database), and the `GoogleAppClient` class. This last class represents our app as the client for interacting with Google. When we create the `GoogleAppClient`, we give it our app's own credentials so it can authenticate itself (i.e., login) to Google's servers.

In [None]:
#|export
from fasthtml.common import *
from fasthtml.oauth import GoogleAppClient, redir_url
from fastlite import *

cli = GoogleAppClient.from_file('creds.json')
# creds.json should be downloaded directly from Google Cloud Console,
# with a name like client_secret_LONGCODE.apps.googleusercontent.com.json

#### Creating an Authorization Link

Suppose an end-user wants to login to your app using their Google account.

To do this, they'll click your app's "sign-in with Google" link, an authorization link. This link will send the user to a Google page, where Google will ask them if they'd like to login to use your app.

(In OAuth terms, this is the end-user delegating to your app the authority to know their Google account ID. Or it can be thought of as asking Google to attest that a user exists with a stable account ID.)

We need to create that authorization link. The link needs to pack in a few obvious pieces of information to tell Google:

1. that the user is coming from your app (so it needs your Client ID),
2. what permissions your app is asking for (so it needs the so-called "scope"),
3. where Google should send the resulting permissions (so it needs your app's _authorized redirect URL_, where Google will send its reply indirectly by redirecting the user's browser).

Finally, the link also needs a fourth, less obvious piece of information:

4. an anti-CSRF token. This is a token you give Google, just so you can check it later, to ensure that any reply from Google is _actually a reply to your specific request_. This protects against an attacker who tricks a user's browser into delivering a reply to a request that the attacker started. This is known as a cross-site request forgery attack, so this last piece of information is known as the anti-CSRF token. It can be any unpredictable random string.

Putting all the pieces together you can generate the authorization link like so:

In [None]:
import secrets

csrf_state_token = secrets.token_urlsafe(64)
redirect_uri="http://localhost:5001/redirect"
scope=['openid',                                           # for sub (the user ID) and picture
       'https://www.googleapis.com/auth/userinfo.email',   # for hd, email, email_verified
       'https://www.googleapis.com/auth/userinfo.profile'  # for name, family_name, given_name
      ]
print(cli.login_link(redirect_uri,scope=scope, state=csrf_state_token))

https://accounts.google.com/o/oauth2/v2/auth?response_type=code&client_id=67965167765-qbufldjg7hd9q81o3m6g8he5ek036usi.apps.googleusercontent.com&redirect_uri=http%3A%2F%2Flocalhost%3A5001%2Fredirect&scope=openid+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email&state=x5pwYgANxWvdkfOH0MWiKYPJr9l-UTKzUXLOJwzR5sZSKGJLJbJXiC73_ZGzzt6_MM_lZ6XJG0iC9Keg9QDNPQ
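
To make the four pieces visible, a link of the same shape as the one printed above can be assembled by hand with the standard library. This is only an illustrative sketch, not necessarily how `login_link` is implemented. `build_login_link` is a hypothetical helper and the client ID is a placeholder.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

# Base of the link printed above
AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"

def build_login_link(client_id, redirect_uri, scope, state):
    """Assemble an authorization link from the four pieces of information."""
    params = {
        "response_type": "code",       # ask for an authorization code
        "client_id": client_id,        # 1. who is asking
        "redirect_uri": redirect_uri,  # 3. where Google sends the reply
        "scope": " ".join(scope),      # 2. what is being asked for
        "state": state,                # 4. the anti-CSRF token
    }
    return f"{AUTH_ENDPOINT}?{urlencode(params)}"

example_state = secrets.token_urlsafe(64)
link = build_login_link(
    client_id="EXAMPLE_CLIENT_ID",  # placeholder, not a real client ID
    redirect_uri="http://localhost:5001/redirect",
    scope=["openid", "https://www.googleapis.com/auth/userinfo.email"],
    state=example_state,
)

# Parsing the link back recovers each piece unchanged.
query = parse_qs(urlparse(link).query)
print(query["scope"][0])
```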


#### Sending an authorization request to Google

Follow the above sign-in link, by **copying it to your clipboard and opening it in an incognito window**.

This is just as if you had clicked a "Sign-in with Google" button. We use an incognito window to ensure that Google generates a _fresh_ authorization code rather than redirecting you with an expended and nonfunctioning code.

The first time you click this sign-in link, this will take you to a Google login page. After logging in, Google will redirect your browser to the redirect URL, which will be the redirect URI we provided when preparing the link, plus additional query parameters. The second time you click this sign-in link, Google will redirect you immediately.

That redirect URL is the means by which Google's authorization server sends information back in reply, such as the authorization code and the echo of the anti-CSRF state token which you included with the original request. That information is what's in the additional query parameters.

**Copy the redirect URL from your browser's navigation bar**, so we can process it manually, as if we were already hosting a server at that URL's path which would receive the request and process it.


**Paste the redirect URI copied from your nav bar into the definition below**:

In [None]:
captured_redirect_URL = "http://localhost:5001/redirect?state=x5pwYgANxWvdkfOH0MWiKYPJr9l-UTKzUXLOJwzR5sZSKGJLJbJXiC73_ZGzzt6_MM_lZ6XJG0iC9Keg9QDNPQ&code=4%2F0Ab_5qllMspx79AyPdr6tx9-i4xPY01qpl-d91_jLHcOoeqjkVTlL9XRFTWMKRvNw_IR9iA&scope=email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+openid&authuser=0&hd=answer.ai&prompt=consent"
captured_redirect_URL

'http://localhost:5001/redirect?state=x5pwYgANxWvdkfOH0MWiKYPJr9l-UTKzUXLOJwzR5sZSKGJLJbJXiC73_ZGzzt6_MM_lZ6XJG0iC9Keg9QDNPQ&code=4%2F0Ab_5qllMspx79AyPdr6tx9-i4xPY01qpl-d91_jLHcOoeqjkVTlL9XRFTWMKRvNw_IR9iA&scope=email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+openid&authuser=0&hd=answer.ai&prompt=consent'

E.g., you should see something like: `http://localhost:5001/redirect?
 state=bmHlmaseHq_tyg4NnrM-3DHjmzzuad4nFSPl2c0GDFTjnTU9J-5wmWyOKBXS5G7cFUCNpYvfH-k8ZVmTQdqPGw
 &code=4%2F0Ab_5qlnrdI79TvSK4ik8lbhrnG1aQrMl6c9ktt0B9vxquClmmegSZhrwhLKDPjDWdU6LKQ
&scope=email+profile+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.profile+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+openid
 &authuser=0
 &hd=answer.ai
 &prompt=consent`

#### Confirming the anti-forgery state token

Now, let's get the authorization code and state from the redirect URL. When we handle this with a route, we will parse the code from the URL query parameters.  For now, we will parse the redirected URL manually.

In [None]:
from urllib.parse import urlparse, parse_qs

def get_code_and_state(url):
    parsed_url = urlparse(url)
    query_params = parse_qs(parsed_url.query)
    captured_code = query_params.get('code', [None])[0]
    captured_state = query_params.get('state', [None])[0]
    return (captured_code,captured_state)

(captured_code,captured_state) = get_code_and_state(captured_redirect_URL)

print(f"{captured_code=}")
print(f"{captured_state=}")


captured_code='4/0Ab_5qllMspx79AyPdr6tx9-i4xPY01qpl-d91_jLHcOoeqjkVTlL9XRFTWMKRvNw_IR9iA'
captured_state='x5pwYgANxWvdkfOH0MWiKYPJr9l-UTKzUXLOJwzR5sZSKGJLJbJXiC73_ZGzzt6_MM_lZ6XJG0iC9Keg9QDNPQ'


You should verify that the captured state matches the state you passed in:

In [None]:
assert csrf_state_token == captured_state

This confirms no one is CSRF attacking you!
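
In a real handler the state may be missing altogether, and secrets are conventionally compared in constant time. A minimal sketch of a stricter check follows, using only the standard library. `state_matches` is a hypothetical helper, not part of FastHTML.

```python
import hmac

def state_matches(expected, received):
    """Return True only when both tokens are present and identical."""
    if not expected or not received:
        return False
    # compare_digest is designed not to reveal, through timing,
    # where the two values first differ
    return hmac.compare_digest(expected.encode(), received.encode())

print(state_matches("abc123", "abc123"))
print(state_matches("abc123", None))
```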

#### Exchanging the authorization `code` for access token and user ID

The point of the authorization code is to authorize. 

In particular, you use the authorization code only once, in order to get access to other items, such as an access token (for accessing protected resources) and possibly a refresh token (for getting subsequent access tokens).

Let's use our new authorization code to get an access token now.

To request the token, we will need the code and the redirect URL, stripped of query parameters and fragments: 

In [None]:
from urllib.parse import urlparse, parse_qs

def strip_to_scheme_host_port_path(url):
    parsed_url = urlparse(url)
    u =  f"{parsed_url.scheme}://{parsed_url.hostname}"
    if port := parsed_url.port: u += f":{port}"
    u += f"{parsed_url.path}"
    return u

In [None]:
stripped_redirect_url = strip_to_scheme_host_port_path(captured_redirect_URL)
stripped_redirect_url

'http://localhost:5001/redirect'

#### Exchanging the `code` for access token and ID information, manually

In this section, we'll exchange the code for ID information directly with plain HTTP, to show how the mechanism works. But if you just want to get the results with the FastHTML API, skip to the next section, _Exchanging the `code` for ID information, with FastHTML_.

This is the URL for the Google authorization server, where we will send the authorization code to get back an OAuth access token:

In [None]:
cli.token_url

'https://oauth2.googleapis.com/token'

Now we build the payload of our request for an access token, using our auth code, our client credentials, and our redirect URL:

In [None]:
payload = dict(code=captured_code,
               redirect_uri=stripped_redirect_url, 
               client_id=cli.client_id, client_secret=cli.client_secret, 
               grant_type='authorization_code')
payload

Let us launch the request and inspect the response:

In [None]:
import httpx
res = httpx.post(cli.token_url, json=payload)
res

<Response [200 OK]>

In [None]:
import json
json.loads(res.content).keys()

dict_keys(['access_token', 'expires_in', 'scope', 'token_type', 'id_token'])
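
The token request above was posted as JSON, which Google accepted here. RFC 6749 (section 4.1.3) describes the token request parameters as form-encoded, so other providers may expect that encoding instead. A sketch of the form-encoded body follows, using only the standard library and placeholder values.

```python
from urllib.parse import urlencode, parse_qs

# Placeholder values; in the notebook these come from `cli` and the captured redirect.
example_payload = {
    "code": "EXAMPLE_AUTH_CODE",
    "redirect_uri": "http://localhost:5001/redirect",
    "client_id": "EXAMPLE_CLIENT_ID",
    "client_secret": "EXAMPLE_CLIENT_SECRET",
    "grant_type": "authorization_code",
}

form_body = urlencode(example_payload)
form_headers = {"Content-Type": "application/x-www-form-urlencoded"}
print(form_body)

# Decoding the body recovers the original fields.
decoded = {key: values[0] for key, values in parse_qs(form_body).items()}
```

With `httpx`, passing `data=payload` instead of `json=payload` sends the request in this encoding.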

The top-level keys include an OAuth "access token", which we will use to make a follow-up request for the user information from Google's URL for OpenID user information:

In [None]:
goog_info_url = cli.info_url
goog_info_url

'https://openidconnect.googleapis.com/v1/userinfo'

In [None]:
access_token = json.loads(res.content)['access_token']
access_token

In [None]:
user_info = httpx.get(goog_info_url, 
                      headers={'Authorization': f'Bearer {access_token}'}).json()
user_info

{'sub': '107642229056099864018',
 'picture': 'https://lh3.googleusercontent.com/a-/ALV-UjWgEkn5Ft5CN9LguAKv3tjwAV-s7X33jUmNNccScQGw9KDV3Q=s96-c',
 'email': 'ag@answer.ai',
 'email_verified': True,
 'hd': 'answer.ai'}

The most important value is under `sub`, which is short for "subject". This is Google's stable ID for the user. `hd` is a special Google key, indicating which hosted domain (i.e., which Google Cloud organization) the email address belongs to.

This is what we sought. We can now use this value to load and save info from our user database, from our session database, etc.

##### Sidebar: why an "id token"? why not a refresh token?

We got the access token from the response and used it to get user information like their ID (their `sub`) in a follow-up request. This is the classic OAuth flow, where you exchange an authorization code for an access token and a refresh token, and then use the access token to access resources.

However, you might have noticed a few oddities. The original response also included an "id token" and omitted any refresh tokens. Why?

The reason is that we're not using just OAuth but OpenID, since we requested the 'openid' scope and chose Google's OpenID endpoint. The id token is an OpenID-defined value, a JWT which in fact includes the `sub`, the `hd`, and other values, and it arrived in that initial reply. So we didn't strictly need to use the access token for a follow-up request to get those values. We did it to conform to the same flow which we would use for _other_ kinds of grants, like access to mutable resources such as an API or a document.

This is also the reason we did not need and therefore did not request a refresh token. Refresh tokens are for refreshing access tokens, which expire fairly quickly. For instance, this one will expire in an hour, but that does not matter because we use it immediately to get an identifier, which is guaranteed to be accurate indefinitely.

In other words, we get a token which gives us short-lived access to the user's ID. But the ID does not change. In our app, the lifetime of our login session will depend not on the lifetime of that access token, but on a separate session mechanism which the app implements, using the ID simply as an identifier.

If you're curious, you can look inside the JWT ID token which we ignored, and see the familiar values and some other ones, whose meaning is defined by the [OpenID spec](https://openid.net/specs/openid-connect-core-1_0.html#IDToken).

In [None]:
import base64, json, hmac, hashlib

def decode_jwt(token, secret=None, verify=True):
    """
    Decode and parse a JWT token.
    
    Args:
        token (str): The JWT token to decode
        secret (str, optional): Secret key for verification
        verify (bool): Whether to verify the signature
        
    Returns:
        dict: The decoded payload
    """
    # Split the token into header, payload, and signature
    parts = token.split('.')
    if len(parts) != 3:
        raise ValueError("Invalid token format")
    
    header_b64, payload_b64, signature_b64 = parts
    
    # Decode header and payload
    def decode_base64_url(b64string):
        # Restore the padding that JWTs strip (0 to 3 '=' characters)
        b64string += '=' * (-len(b64string) % 4)
        # urlsafe_b64decode handles the URL-safe '-' and '_' characters directly
        return base64.urlsafe_b64decode(b64string)
    
    header_json = decode_base64_url(header_b64)
    payload_json = decode_base64_url(payload_b64)
    
    header = json.loads(header_json)
    payload = json.loads(payload_json)
    
    # Verify signature if requested and secret is provided
    if verify and secret:
        if 'alg' not in header:
            raise ValueError("Algorithm not specified in header")
        
        # Get the algorithm from the header
        alg = header['alg']
        
        if alg == 'none':
            # An unsigned token cannot be verified, so reject it
            raise ValueError("Unsigned token (alg=none) cannot be verified")
        elif alg == 'HS256':
            # HMAC using SHA-256
            message = f"{header_b64}.{payload_b64}"
            signature = hmac.new(
                secret.encode('utf-8'),
                message.encode('utf-8'),
                hashlib.sha256
            ).digest()
            computed_sig_b64 = base64.urlsafe_b64encode(signature).decode('utf-8').rstrip('=')
            
            # Remove padding from the original signature for comparison
            signature_b64 = signature_b64.rstrip('=')
            
            if not hmac.compare_digest(signature_b64, computed_sig_b64):
                raise ValueError("Invalid signature")
        else:
            raise NotImplementedError(f"Algorithm {alg} not implemented")
    
    return {
        'header': header,
        'payload': payload
    }


In [None]:
decode_jwt(json.loads(res.content)['id_token'])['payload'].keys()

dict_keys(['iss', 'azp', 'aud', 'sub', 'hd', 'email', 'email_verified', 'at_hash', 'iat', 'exp'])
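
Decoding is not the same as validating. If an app relied on the ID token itself rather than on the userinfo request, the OpenID Connect spec expects it to check at least the issuer, audience, and expiry claims (and the signature, which this sketch omits). Here is a minimal illustration on a made-up payload. `check_id_token_claims` is a hypothetical helper, and the accepted issuer values are the ones listed in Google's guide.

```python
import time

GOOGLE_ISSUERS = {"https://accounts.google.com", "accounts.google.com"}

def check_id_token_claims(claims, client_id, now=None):
    """Return a list of problems found in a decoded ID token payload."""
    now = time.time() if now is None else now
    problems = []
    if claims.get("iss") not in GOOGLE_ISSUERS:
        problems.append("unexpected issuer")
    # Assumption: `aud` is a single string here (the spec also allows a list)
    if claims.get("aud") != client_id:
        problems.append("token was issued to a different client")
    if claims.get("exp", 0) <= now:
        problems.append("token has expired")
    return problems

# Invented claims, shaped like the keys shown above
example_claims = {"iss": "https://accounts.google.com", "aud": "EXAMPLE_CLIENT_ID",
                  "sub": "1234567890", "iat": 1_700_000_000, "exp": 1_700_003_600}

valid = check_id_token_claims(example_claims, "EXAMPLE_CLIENT_ID", now=1_700_000_100)
invalid = check_id_token_claims(example_claims, "OTHER_CLIENT", now=1_700_010_000)
print(valid, invalid)
```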

#### Exchanging the `code` for ID information, with FastHTML


This section will describe how to get the user ID information in exchange for an authorization code, using FastHTML rather than doing it manually.

(Since authorization codes are single-use, you will need to get a fresh authorization code for this to work.)

In FastHTML, you just call `retr_id`, passing in the code and the redirect url.

Under the hood, this gets the access token and then uses it to request the ID value directly.

In [None]:
ident = cli.retr_id(captured_code, stripped_redirect_url)
ident

'107642229056099864018'

After calling `retr_id`, we can get other granted information by using `get_info`:


In [None]:
info = cli.get_info()
info

{'sub': '107642229056099864018',
 'name': 'Alexis Gallagher',
 'given_name': 'Alexis',
 'family_name': 'Gallagher',
 'picture': 'https://lh3.googleusercontent.com/a/ACg8ocICRqrdeqJnMOmf_5_Mbkb6PXM9Hs9NJMY_9TMJSL20Ig2J9Q=s96-c',
 'email': 'ag@answer.ai',
 'email_verified': True,
 'hd': 'answer.ai'}


> Note: How to use fasthtml.oauth.

Here are some key points to remember on how to use the functions in `fasthtml.oauth`:

1. The `retr_info`, `retr_id`, and `parse_response` functions form a family: for a given authorization code, only one of them may be called, and only once. This is because they take the authorization code as an argument and expend it, so later calls will fail.
2. The `get_info` function only works after you have called one of the functions above.
3. The `get_info` function only works until the access token expires, performs no caching, and does not handle refreshing the access token. So you need to track its call history yourself.
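
One way to do that tracking is to record when the access token was obtained and how long it lasts (the `expires_in` field of the token response seen earlier). The `TokenClock` class below is a hypothetical helper, not part of `fasthtml.oauth`, and the 30-second leeway is an arbitrary choice.

```python
import time
from dataclasses import dataclass

@dataclass
class TokenClock:
    """Remembers when an access token was obtained and for how long it is valid."""
    obtained_at: float
    expires_in: float      # seconds, as reported in the token response
    leeway: float = 30.0   # treat the token as expired slightly early

    def is_fresh(self, now=None):
        now = time.time() if now is None else now
        return now < self.obtained_at + self.expires_in - self.leeway

clock = TokenClock(obtained_at=1_000.0, expires_in=3_600.0)
early = clock.is_fresh(now=1_500.0)  # well within the hour
late = clock.is_fresh(now=4_590.0)   # inside the leeway window before expiry
print(early, late)
```

An app could check `is_fresh()` before calling `get_info`, and send the user back through the login flow when it returns `False`.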


### Implementing the flow with route handlers

The above implements the entire authorization code flow manually. Now let us see how the same flow looks in a FastHTML app.

There are two differences to keep in mind:

First, obviously, in the above example, you manually copied and pasted the authorization URL into a browser to kick off the flow, and then manually copied and pasted the resulting redirect URL into the notebook for processing. In a web app, both those actions will be handled by routes: your app's _login route_, which presents the authorization URL for the user to sign in with Google, and the _redirect route_, where your app receives the redirected browser.

Second, more subtly, in the above example it sufficed to save the anti-csrf token in a global variable. This is because we manually handled parsing the redirect URI so we knew it came from us, and because the notebook acted like a single browser session.

In a web app, which must handle many users at once, we will save the anti-CSRF token in the session dictionary. This binds the reply delivered via the redirect URL to the initial request to the auth servers, ensuring that a user cannot be tricked into opening a redirect URI which was a reply to a request launched from a different browser, and therefore from a different user session.
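
The session binding can be sketched without a web framework. In the illustration below, plain dicts stand in for per-browser sessions, and `start_login` and `finish_login` are hypothetical names rather than FastHTML functions.

```python
import hmac
import secrets

def start_login(session):
    """Create a fresh anti-CSRF token and remember it in this session."""
    token = secrets.token_urlsafe(64)
    session["anti_csrf_token"] = token
    return token  # this value goes into the `state` parameter of the link

def finish_login(session, state):
    """Accept the reply only if it answers a request begun in this session."""
    expected = session.pop("anti_csrf_token", None)  # each token is single use
    if expected is None or state is None:
        return False
    return hmac.compare_digest(expected.encode(), state.encode())

alice_session, bob_session = {}, {}
alice_state = start_login(alice_session)
bob_state = start_login(bob_session)

own_reply = finish_login(alice_session, alice_state)    # reply to Alice's own request
foreign_reply = finish_login(bob_session, alice_state)  # reply to someone else's request
replayed = finish_login(alice_session, alice_state)     # token was already consumed
print(own_reply, foreign_reply, replayed)
```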

Taking all that into account, the complete flow looks like this:

In [None]:
#|export
import secrets
from fasthtml.common import *
from fasthtml.oauth import GoogleAppClient, redir_url
from fastlite import *

cli = GoogleAppClient.from_file('creds.json')
# creds.json should be downloaded directly from Google Cloud Console,
# with a name like client_secret_LONGCODE.apps.googleusercontent.com.json

app,rt = fast_app()
scope=['openid',
       'https://www.googleapis.com/auth/userinfo.email', 
       'https://www.googleapis.com/auth/userinfo.profile']
redirect_path="/redirect"

# when the user visits /login, they will be offered the link to sign-in with Google.
@rt
async def login(request, session):
    # 1. we create the anti-CSRF token, and save it in the user's session
    anti_csrf_token = secrets.token_urlsafe(64)
    session['anti_csrf_token'] = anti_csrf_token
    
    # we build the redirect URL dynamically, to avoid hardcoding the hostname and port.
    full_redirect_url = redir_url(request, redirect_path)
    # 2. when the user clicks the login link, they launch the authentication request to Google
    return A("Sign in with Google",
             href=cli.login_link(full_redirect_url, scope=scope, state=anti_csrf_token))

# The reply arrives thru this redirect path, which we configured Google to know about
@rt 
async def redirect(request, session, code:str, state:str):
    # 3. we verify the anti-CSRF token was the one expected, ensuring this reply was to our own request
    if state != session.get('anti_csrf_token'):
        return "Login failed. CSRF token mismatch, suggesting a CSRF attack."
    else:
        # 4. We exchange the authorization code for an access token, and then for info
        # Retrieving the id expends the auth code, but also retrieves an access token
        ident = cli.retr_id(code, redir_url(request, redirect_path))
        # 5. Once we have the access token, we can fetch the user info until the token expires
        # (get_info takes no arguments; it reuses the token obtained by retr_id)
        info = cli.get_info()
        print(f"{ident=}")
        print(f"{info=}")
        # 6. Here we'd use the IDENT value to lookup the user's records in our user db,
        # or session db
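
Step 6 is left open above. As one possible shape for it, the sketch below keeps a user table keyed by the `sub` value. It uses the standard library's `sqlite3` as a stand-in for `fastlite`, and the table layout and the helper names `open_user_db` and `upsert_user` are assumptions made for illustration.

```python
import sqlite3

def open_user_db(path=":memory:"):
    """Open (or create) a tiny user table keyed by Google's `sub` value."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS users "
               "(sub TEXT PRIMARY KEY, email TEXT, name TEXT)")
    return db

def upsert_user(db, info):
    """Insert the user on first login, or refresh their profile fields afterwards."""
    db.execute(
        "INSERT INTO users (sub, email, name) VALUES (:sub, :email, :name) "
        "ON CONFLICT(sub) DO UPDATE SET email = excluded.email, name = excluded.name",
        {"sub": info["sub"], "email": info.get("email"), "name": info.get("name")},
    )
    db.commit()
    return db.execute("SELECT sub, email, name FROM users WHERE sub = ?",
                      (info["sub"],)).fetchone()

db = open_user_db()
# Invented user info, shaped like the dictionary returned by get_info
row = upsert_user(db, {"sub": "1234567890", "email": "user@example.com", "name": "Example User"})
row_again = upsert_user(db, {"sub": "1234567890", "email": "new@example.com", "name": "Example User"})
user_count = db.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(row_again, user_count)
```

Keying on `sub` rather than on the email address means that a user who changes their email still maps to the same record.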
        
