Merge branch 'master' into berickson/makefile
bradley-erickson committed May 23, 2024
2 parents ea94a54 + 0e27266 commit 3470258
Showing 42 changed files with 1,228 additions and 371 deletions.
27 changes: 27 additions & 0 deletions .github/workflows/versioning.yml
@@ -0,0 +1,27 @@
name: Versioning

on:
push:
branches:
- master

jobs:
build:
runs-on: ubuntu-latest
steps:
- name: GitHub Script
uses: actions/github-script@v7.0.1
with:
# The script to run
script: |
const shortSHA = context.sha.substring(0, 7);
const date = new Date().toISOString().split('T')[0]; // Gets date in YYYY-MM-DD format
const formattedDate = date.replace(/-/g, '.'); // Replaces '-' with '.'
const tagName = `${formattedDate}-${shortSHA}`;
github.rest.git.createRef({
owner: context.repo.owner,
repo: context.repo.repo,
ref: `refs/tags/${tagName}`,
sha: context.sha
})
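
With this script, a push to `master` made on May 23, 2024 whose short SHA is `3470258` would be tagged `2024.05.23-3470258` (the date with dots, a hyphen, then the short commit SHA).
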
1 change: 1 addition & 0 deletions Makefile
@@ -1,6 +1,7 @@
PACKAGES ?= wo,awe

run:
# If you haven't done so yet, run: make install
# we need to make sure we are on the virtual env when we do this
cd learning_observer && python learning_observer

4 changes: 3 additions & 1 deletion autodocs/development.rst
@@ -11,6 +11,8 @@ Development
:parser: myst_parser.sphinx_
.. include:: ../modules/writing_observer/README.md
:parser: myst_parser.sphinx_
.. include:: ../docs/interactive_environments.md
:parser: myst_parser.sphinx_
.. include:: ../docs/privacy.md
:parser: myst_parser.sphinx_
.. include:: ../docs/config.md
@@ -20,4 +22,4 @@ Development
.. include:: ../docs/testing.md
:parser: myst_parser.sphinx_
.. include:: ../docs/technologies.md
:parser: myst_parser.sphinx_
:parser: myst_parser.sphinx_
96 changes: 96 additions & 0 deletions docs/interactive_environments.md
@@ -0,0 +1,96 @@
# Interactive Environments

The Learning Observer can launch itself in a variety of ways. These
include launching stand-alone or from within an IPython kernel. The
latter lets users interact directly with the LO system: users can
connect to the kernel through the command line or through Jupyter
clients such as the `ipython` console, Jupyter Lab, or Jupyter
Notebook. This is useful for debugging or for rapidly prototyping
within the system. When starting a kernel, the Learning Observer
application can be started alongside it.

## IPython Kernel Communications

We will give an overview of the IPython kernel architecture and how
we fit in. First, the IPython kernel architecture:

1. The `IPython` kernel handles event loops for the different
communications that occur within it.
1. These event loops handle code requests from the user and shutdown
requests from system messages.
1. The events are communicated using the [ZMQ Protocol](https://zeromq.org/).

There are five dedicated sockets over which these communications occur:

1. **Shell**: receives code-execution requests from clients
1. **IOPub**: broadcast channel carrying all side effects
1. **stdin**: raw input from the user
1. **Control**: dedicated to shutdown and restart requests
1. **Heartbeat**: ensures the connection stays alive

Upon startup, we create a separate thread to subscribe to, monitor,
and log events on the information-rich IOPub socket.
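
As a rough illustration, that monitoring thread could be built directly on `pyzmq`. This is a minimal sketch, and the IP address and port are placeholder assumptions; the real values come from the connection file described below:

```python
import threading

import zmq


def monitor_iopub(ip='127.0.0.1', iopub_port=5555):
    '''Subscribe to a kernel's IOPub socket and log every message.'''
    context = zmq.Context.instance()
    socket = context.socket(zmq.SUB)
    socket.connect(f'tcp://{ip}:{iopub_port}')
    socket.setsockopt_string(zmq.SUBSCRIBE, '')  # empty filter: receive everything
    while True:
        frames = socket.recv_multipart()  # raw Jupyter wire-protocol frames
        print(frames)  # a real implementation would log these instead of printing


threading.Thread(target=monitor_iopub, daemon=True).start()
```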

## Files

* The **kernel file** is [describe, provide a simplified example or link to one]
* The **connection file** is [describe, provide a simplified example or link to one]
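
Until those descriptions are filled in, here are generic examples of the standard Jupyter formats; the values are illustrative, not LO's actual files. A kernel file (`kernel.json`) tells clients how to launch a kernel:

```json
{
  "argv": ["python", "-m", "example_kernel", "-f", "{connection_file}"],
  "display_name": "Example Kernel",
  "language": "python"
}
```

A connection file lists the ZMQ ports and the HMAC key a client needs to talk to a running kernel:

```json
{
  "shell_port": 53794,
  "iopub_port": 53795,
  "stdin_port": 53796,
  "control_port": 53797,
  "hb_port": 53798,
  "ip": "127.0.0.1",
  "transport": "tcp",
  "signature_scheme": "hmac-sha256",
  "key": "a0436f6c-1916-498b-8eb9-e81ab9368e84"
}
```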

## Launching Learning Observer from an IPython Kernel

We use an
[aiohttp runner](https://docs.aiohttp.org/en/stable/web_reference.html#running-applications)
to serve the LO application through the kernel's internal
`ipykernel.io_loop`, a [Tornado](https://www.tornadoweb.org/en/stable/)
event loop. The runner attaches itself to the provided event loop,
whereas the normal startup method creates a new event loop of its own.
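
A minimal sketch of that runner pattern, assuming an existing aiohttp `app` and access to the kernel's Tornado loop (the names and port here are illustrative, not LO's actual code):

```python
from aiohttp import web


async def start_site(app, port=8888):
    '''Serve `app` on whatever event loop is already running.'''
    runner = web.AppRunner(app)  # runner-based startup; web.run_app would spin up its own loop
    await runner.setup()
    site = web.TCPSite(runner, 'localhost', port)
    await site.start()


# From another thread, schedule the coroutine on the Tornado loop's
# underlying asyncio loop, e.g.:
#     asyncio.run_coroutine_threadsafe(start_site(app), kernel.io_loop.asyncio_loop)
```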

## IPython Shell/Kernel

We can start up the server as:

* A kernel we connect to (e.g. from Jupyter Lab)
* An interactive shell including a kernel

```bash
# Start an interactive shell
python learning_observer/ --loconsole
```

The IPython kernel parses certain command-line arguments itself, which
we should block from reaching it. It also does not accept flags without
values (e.g. `--bar`).

```bash
# Start an ipython kernel
# note: the 1 is needed to make the ipython kernel instance we launch happy
python learning_observer/ --lokernel 1
# this will provide you with the kernel JSON file to use
# Connect to the specified kernel
jupyter console --existing kernel-123456.json
```

## Jupyter Clients

### Connect with Jupyter

Jupyter clients search a set of directories for available kernels.
We need to create the LO kernel files so that a client can offer the LO kernel as a choice.
Running the LO platform once will automatically create the kernel file in the `<virtual_env>/share/jupyter/kernels/` directory.

```bash
# run once to create the kernel file
python learning_observer/
# open jupyter client of your choice
jupyter lab
# select the LO kernel from the kernel dropdown
```

### Helpers

The system offers some helpers for working with the LO platform from a Jupyter client.
The `local_reducer.ipynb` notebook is an example in which we create a simple `event_count` reducer and a corresponding dashboard.
The notebook calls `jupyter_helpers.add_reducer_to_lo`, which handles adding your reducer to all relevant parts of the system, as sketched below.
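
A hypothetical sketch of that flow follows; the reducer signature and the `add_reducer_to_lo` call are assumptions drawn from the description above, not the documented API:

```python
import jupyter_helpers  # assumed import path for the LO Jupyter helpers


def event_count(event, state):
    '''Toy reducer: count every event that passes through.'''
    state = state or {'count': 0}
    state['count'] += 1
    return state


# Register the reducer with the running LO system so queries and
# dashboards can see it (hypothetical signature; see local_reducer.ipynb).
jupyter_helpers.add_reducer_to_lo(event_count)
```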

The goal here is to be able to rapidly prototype reducers, queries, and dashboards. In the longer term, we would like to be able to compile these into a module, and perhaps even inject them into a running Learning Observer system.

A long-term goal in building out this file is to have a smooth pathway from research code to production dashboards, using common tools and frameworks.
2 changes: 1 addition & 1 deletion learning_observer/learning_observer/auth/__init__.py
@@ -111,4 +111,4 @@ def verify_auth_precheck():
"If you are not planning to use Google auth (which is the case for most dev\n" + \
"settings), please disable Google authentication in creds.yaml by\n" + \
"removing the google_auth section under auth."
raise learning_observer.prestartup.StartupCheck(error)
raise learning_observer.prestartup.StartupCheck("Auth: " + error)
2 changes: 1 addition & 1 deletion learning_observer/learning_observer/auth/roles.py
@@ -45,6 +45,6 @@ def validate_user_lists():
paths.data(USER_FILES[k])
)
raise learning_observer.prestartup.StartupCheck(
f"Created a blank {k} file: static_data/{USER_FILES[k]}\n"
f"Created a blank {k} file: {paths.data(USER_FILES[k])}\n"
f"Populate it with {k} accounts."
)
51 changes: 47 additions & 4 deletions learning_observer/learning_observer/auth/social_sso.py
@@ -41,6 +41,35 @@
import learning_observer.constants as constants
import learning_observer.exceptions

import pmss
# TODO the hostname setting currently expects the port
# to be specified within the hostname. We ought to
# remove the port and instead use the port setting.
pmss.register_field(
name="hostname",
type=pmss.pmsstypes.TYPES.hostname,
description="The hostname of the LO webapp. Used to redirect OAuth clients.",
required=True
)
pmss.register_field(
name="protocol",
type=pmss.pmsstypes.TYPES.protocol,
description="The protocol (http / https) of the LO webapp. Used to redirect OAuth clients.",
required=True
)
pmss.register_field(
name="client_id",
type=pmss.pmsstypes.TYPES.string,
description="The Google OAuth client ID",
required=True
)
pmss.register_field(
name="client_secret",
type=pmss.pmsstypes.TYPES.string,
description="The Google OAuth client secret",
required=True
)


DEFAULT_GOOGLE_SCOPES = [
'https://www.googleapis.com/auth/userinfo.profile',
@@ -60,6 +89,20 @@
'https://www.googleapis.com/auth/classroom.announcements.readonly'
]

# TODO Type list is not yet supported by PMSS 4/24/24
# pmss.register_field(
# name='base_scopes',
# type='list',
# description='List of Google URLs to look for.',
# default=DEFAULT_GOOGLE_SCOPES
# )
# pmss.register_field(
# name='additional_scopes',
# type='list',
# description='List of additional URLs to look for.',
# default=[]
# )


async def social_handler(request):
"""Handles Google sign in.
@@ -91,10 +134,10 @@ async def _google(request):
if 'error' in request.query:
return {}

hostname = settings.settings['hostname']
protocol = settings.settings.get('protocol', 'https')
hostname = settings.pmss_settings.hostname()
protocol = settings.pmss_settings.protocol()
common_params = {
'client_id': settings.settings['auth']['google_oauth']['web']['client_id'],
'client_id': settings.pmss_settings.client_id(types=['auth', 'google_oauth', 'web']),
'redirect_uri': f"{protocol}://{hostname}/auth/login/google"
}

@@ -125,7 +168,7 @@ async def _google(request):
url = 'https://accounts.google.com/o/oauth2/token'
params = common_params.copy()
params.update({
'client_secret': settings.settings['auth']['google_oauth']['web']['client_secret'],
'client_secret': settings.pmss_settings.client_secret(types=['auth', 'google_oauth', 'web']),
'code': request.query['code'],
'grant_type': 'authorization_code',
})
2 changes: 1 addition & 1 deletion learning_observer/learning_observer/cache.py
@@ -23,7 +23,7 @@ def connect_to_memoization_kvs():
'key in `creds.yaml`.\n'\
'```\nmemoization:\n type: stub\n```\nOR\n'\
'```\nmemoization:\n type: redis_ephemeral\n expiry: 60\n```'
raise learning_observer.prestartup.StartupCheck(error_text)
raise learning_observer.prestartup.StartupCheck("KVS: "+error_text)


def async_memoization():
13 changes: 12 additions & 1 deletion learning_observer/learning_observer/dashboard.py
@@ -8,6 +8,7 @@
import json
import jsonschema
import numbers
import pmss
import queue
import time

@@ -35,6 +36,16 @@
import learning_observer.communication_protocol.schema
import learning_observer.settings

pmss.register_field(
name='dangerously_allow_insecure_dags',
type=pmss.pmsstypes.TYPES.boolean,
description='Data can be queried either by system defined execution DAGs '\
'(directed acyclic graphs) or user created execution DAGs. '\
'This is useful for developing new system queries, but should not '\
'be used in production.',
default=False
)


def timelist_to_seconds(timelist):
'''
@@ -439,7 +450,7 @@ async def dispatch_named_execution_dag(dag_name, funcs):

async def dispatch_defined_execution_dag(dag, funcs):
query = None
if not learning_observer.settings.settings.get('dangerously_allow_insecure_dags', False):
if not learning_observer.settings.pmss_settings.dangerously_allow_insecure_dags():
debug_log(await dag_submission_not_allowed())
funcs.append(dag_submission_not_allowed())
return query
6 changes: 3 additions & 3 deletions learning_observer/learning_observer/google.py
@@ -129,6 +129,8 @@ async def raw_google_ajax(runtime, target_url, **kwargs):
request = runtime.get_request()
url = target_url.format(**kwargs)
user = await learning_observer.auth.get_active_user(request)
if constants.AUTH_HEADERS not in request:
raise aiohttp.web.HTTPUnauthorized(text="Please log in") # TODO: Consistent way to flag this

cache_key = "raw_google/" + learning_observer.auth.encode_id('session', user[constants.USER_ID]) + '/' + learning_observer.util.url_pathname(url)
if settings.feature_flag('use_google_ajax') is not None:
@@ -139,8 +141,6 @@
GOOGLE_TO_SNAKE
)
async with aiohttp.ClientSession(loop=request.app.loop) as client:
if constants.AUTH_HEADERS not in request:
raise aiohttp.web.HTTPUnauthorized(text="Please log in") # TODO: Consistent way to flag this
async with client.get(url, headers=request[constants.AUTH_HEADERS]) as resp:
response = await resp.json()
learning_observer.log_event.log_ajax(target_url, response, request)
@@ -192,7 +192,7 @@ def connect_to_google_cache():
'```\ngoogle_cache:\n type: filesystem\n path: ./learning_observer/static_data/google\n'\
' subdirs: true\n```\nOR\n'\
'```\ngoogle_cache:\n type: redis_ephemeral\n expiry: 600\n```'
raise learning_observer.prestartup.StartupCheck(error_text)
raise learning_observer.prestartup.StartupCheck("Google KVS: " + error_text)


def initialize_and_register_routes(app):
22 changes: 19 additions & 3 deletions learning_observer/learning_observer/incoming_student_event.py
@@ -16,9 +16,8 @@
import os
import time
import traceback
import urllib.parse
import uuid
import socket
import weakref

import aiohttp

@@ -191,6 +190,10 @@ async def handler(request, client_event):
filename, preencoded=True, timestamp=True)
await pipeline(event)

# when the handler is garbage collected (no more events are being passed through),
# close the log file associated with this connection
weakref.finalize(handler, log_event.close_logfile, filename)

return handler


@@ -282,6 +285,8 @@ async def decode_and_log_event(events):
json_event = json.loads(msg.data)
log_event.log_event(json_event, filename=filename)
yield json_event
# done processing events, can close logfile now
log_event.close_logfile(filename)
return decode_and_log_event


@@ -307,6 +312,7 @@ async def incoming_websocket_handler(request):
await ws.prepare(request)
lock_fields = {}
authenticated = False
reducers_last_updated = None
event_handler = failing_event_handler

decoder_and_logger = event_decoder_and_logger(request)
@@ -334,14 +340,15 @@ async def update_event_handler(event):
if not authenticated:
return

nonlocal event_handler
nonlocal event_handler, reducers_last_updated
if 'source' in lock_fields:
debug_log('Updating the event_handler()')
metadata = lock_fields.copy()
else:
metadata = event
metadata['auth'] = authenticated
event_handler = await handle_incoming_client_event(metadata=metadata)
reducers_last_updated = learning_observer.stream_analytics.LAST_UPDATED

async def handle_auth_events(events):
'''This method checks a single method for auth and
@@ -406,6 +413,14 @@ async def filter_blacklist_events(events):
await ws.send_json(bl_status)
await ws.close()

async def check_for_reducer_update(events):
'''Check to see if the reducers updated
'''
async for event in events:
if reducers_last_updated != learning_observer.stream_analytics.LAST_UPDATED:
await update_event_handler(event)
yield event

async def pass_through_reducers(events):
'''Pass events through the reducers
'''
@@ -421,6 +436,7 @@ async def process_ws_message_through_pipeline():
events = decode_lock_fields(events)
events = handle_auth_events(events)
events = filter_blacklist_events(events)
events = check_for_reducer_update(events)
events = pass_through_reducers(events)
# empty loop to start the generator pipeline
async for event in events: