Educational Exercises and Activities #153
Comments
Hi Adam, Thank you for your proposal. I wanted to point out that everything you're suggesting can already be implemented using JavaScript without requiring changes to HTML standards or additional browser support.
Accessing Global Objects in iframes
Custom Formats and SPARQL
Fetching Network Resources
Security Considerations
By leveraging JavaScript, you can achieve the functionality you're looking for without necessitating changes to existing HTML standards. If there are specific scenarios where these capabilities seem insufficient, please share more details so we can explore them further. |
It does appear that the initial set of features can be implemented via a set of JavaScript libraries. I have since updated the proposal with a few new ideas: (1) obtaining settings and configurations from operating systems or Web browsers (e.g., education-related service endpoints such as learners' schools' servers), and (2) education-related bidirectional interprocess communication scenarios, e.g., with intelligent tutoring systems. The current conceptual model utilizes nested frames:
Adding considered MIME types:
Any thoughts on the conceptual model? As considered, collections of homework, quiz, and exam items could be gathered together into zip archives along with the multimedia resources utilized by them (e.g., hypertext, fonts, stylesheets, scripts, metadata, images, animations, 3D models, audio clips). If this is the way to go, a feature for Web browsers would be to recognize the packages' MIME type, unzip them, and load their contents for display. Another detail would be to ensure that nested items could be maximized to fully utilize content display areas, and/or that fullscreen was possible, for those HTML5-based items in nested frames. It may be the case that these features are already possible with some scripting logic. I would like a bit more time to brainstorm on your question about Web browser feature ideas. Presently, ideas include: (1) obtaining learners' operating system or Web browser settings with respect to education-related service endpoints (e.g., learners' schools' servers), and (2) education-related bidirectional interprocess communication scenarios, e.g., with intelligent tutoring system applications. |
Right, but what I was getting to was: what does the Web Platform not give you (as a primitive) to meet your requirements? Conceptually, this can't be domain specific. The web generally only deals with generalized use cases, not, say, "education use cases"... those may be covered generally, however.
That would need to be weighed against user privacy. There is little reason to trust such institutions from a user's perspective - or for those institutions to trust themselves with such privacy sensitive data.
As with the first question: what can't you do over fetch, web sockets, or WebRTC or whatever? |
Traditionally, to turn in homework assignments or to hand in completed quizzes or exams, learners have provided their teachers and schools with some educational data. Beyond completed sets of items, more modern, granular forms of educational data include, but are not limited to: timing data (how long did an item or each part of an item take a learner), items' user-interface event logs, and dialogue transcripts or event logs from intelligent tutoring systems. Educational data, e.g., xAPI data, can be stored in learning record stores. Educational data can be processed and analyzed per educational data mining techniques. According to Wikipedia, applications of educational data mining include: (1) the analysis and visualization of data, (2) providing feedback for supporting instructors, (3) recommendations for students, (4) predicting student performance, (5) student modeling, (6) detecting undesirable student behaviors, (7) grouping students, (8) social network analysis, (9) developing concept maps, (10) constructing courseware, and (11) planning and scheduling. With respect to points 3, 4, and 5, there are open learner modeling and analytics to consider. In these approaches, learners can access, view, and benefit from their learner models, using this information to better select and prioritize their practice activities. On these topics, there are preschool, kindergarten, elementary school, middle school, secondary or high school, trade and vocational school, university, and recreational and lifelong learning scenarios to consider. The topics also span sectors. Beyond academia, there are also industry (e.g., business training), public sector (e.g., government personnel training), and military domains to consider.
Brainstorming to your point: there could be user permissions when learners first initialize their educational resources (e.g., websites, digital books, digital textbooks) and when they connect these to any remote services, including servers at their schools?
I have also thought about WebRTC on these topics. The following recent video shows multimodal language models seeing displayed items and learners performing on these items while simultaneously engaging in dialogue and answering questions: https://www.youtube.com/watch?v=IvXZCocyU_M . Based on that video (which shows two desktop windows), I am thinking about client-side interoperability, e.g., interprocess communication, between Web browsers displaying educational resources (e.g., websites, digital books, digital textbooks) and intelligent tutoring systems to enable new features and capabilities. |
One thing that I'm hoping to discuss is enabling content authors and developers to be able to provide data and metadata for items within nested frames in Web browsers to external connected applications, e.g., intelligent tutoring systems, on clients. Here are some more thoughts with respect to bidirectional interprocess and interapplication communication between Web browsers and other software applications. Using the following variables:
var item_description = 'http://www.example.com/2024/#item-description';
var item_instructions = 'http://www.example.com/2024/#item-learner-instructions';
var item_objectives = 'http://www.example.com/2024/#item-educational-objectives';
var item_hints = 'http://www.example.com/2024/#item-hints';
and with something like:
window.exportData(item_description, 'text/plain', 'en', 'data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==');
window.exportData(item_instructions, 'text/plain', 'en', 'data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==');
window.exportData(item_objectives, 'text/plain', 'en', 'data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==');
window.exportData(item_hints, 'text/json', 'en', 'data:text/json;base64,SGVsbG8sIFdvcmxkIQ==');
and/or:
window.exportData(item_description, 'text/plain', 'en', my_js_callback_1);
window.exportData(item_instructions, 'text/plain', 'en', my_js_callback_2);
window.exportData(item_objectives, 'text/plain', 'en', my_js_callback_3);
window.exportData(item_hints, 'text/json', 'en', my_js_callback_4);
external processes, e.g., intelligent tutoring systems, would be able to connect and detect available exported data and functions (per semantic identifiers and other content-negotiation data) and could choose to retrieve or invoke these. From the perspective of external software applications, implementation particulars for obtaining exported data from Web browsers' tabs would depend upon the operating system. There would also be a matter of enabling external software applications to detect changes in exported data or functions of interest to them, e.g., when items were completed by learners and new items were presented to them. With respect to ensuring that multiple exported data could be synchronized, e.g., that all of the available exported data refer to the same item, something like the following could be considered:
window.exportOpen();
window.exportData(item_description, 'text/plain', 'en', 'data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==');
window.exportData(item_instructions, 'text/plain', 'en', 'data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==');
window.exportData(item_objectives, 'text/plain', 'en', 'data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==');
window.exportData(item_hints, 'text/json', 'en', 'data:text/json;base64,SGVsbG8sIFdvcmxkIQ==');
window.exportClose();
and/or:
window.exportOpen();
window.exportData(item_description, 'text/plain', 'en', my_js_callback_1);
window.exportData(item_instructions, 'text/plain', 'en', my_js_callback_2);
window.exportData(item_objectives, 'text/plain', 'en', my_js_callback_3);
window.exportData(item_hints, 'text/json', 'en', my_js_callback_4);
window.exportClose();
Below, the sketches are refactored to show possibilities:
window.interprocess.open();
window.interprocess.setExport(item_description, 'text/plain', 'en', 'data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==');
window.interprocess.setExport(item_instructions, 'text/plain', 'en', 'data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==');
window.interprocess.setExport(item_objectives, 'text/plain', 'en', 'data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==');
window.interprocess.setExport(item_hints, 'text/json', 'en', 'data:text/json;base64,SGVsbG8sIFdvcmxkIQ==');
window.interprocess.close();
and/or:
window.interprocess.open();
window.interprocess.setExport(item_description, 'text/plain', 'en', my_js_callback_1);
window.interprocess.setExport(item_instructions, 'text/plain', 'en', my_js_callback_2);
window.interprocess.setExport(item_objectives, 'text/plain', 'en', my_js_callback_3);
window.interprocess.setExport(item_hints, 'text/json', 'en', my_js_callback_4);
window.interprocess.close();
There are a variety of related technologies (e.g., fetch, cross-document messaging, web sockets, WebRTC) and interprocess and interapplication communication topics have been explored previously (e.g., Web Intents). I have some additional ideas about a JS API with respect to the other direction of communication, i.e., how external processes or applications might provide data properties and functions to scripts running in Web browsers. Any thoughts on a JS API for unidirectional or bidirectional interprocess and interapplication communication? |
A different approach for enabling interprocess and interapplication communication is presented below, one more directly inspired by the Web Platform. In an operating-system-dependent and -mediated manner, other processes (e.g., intelligent tutoring systems) could connect to Web browser processes and provide them with interfaces with which to enable message passing. With respect to browser-side JavaScript, something like the following would enable Web developers to get those processes connected to the Web browser and then to get the window objects within those processes.
var w = window.interprocess.getProcess(/* ? */).getWindow(/* ? */);
w.postMessage(...);
As shown in the following example, some sort of
window.interprocess.addEventListener(
  "message",
  (event) => {
    if (event.origin !== "app://vendor/application/major.minor.build.revision/instanceNumber") return;
    // ...
  },
  false,
);
The following example shows how one could utilize a new
window.interprocess.addEventListener(
  "message",
  (event) => {
    var process = window.interprocess.getProcess(event.origin);
    if(typeof process !== "undefined" && process !== null) {
      if(process.about...) {
        // ...
      }
    }
  },
  false,
);
Any thoughts on these more Web-Platform-inspired ideas for bidirectional interprocess and interapplication communication? |
Right, browsers would never allow that because the web app could directly attack the process. I'd encourage you to take a look at how Web Authn, Payment Request API, or the new Digital Credentials API work... those show specialized, secure, and privacy-preserving approaches to talking to native applications while minimizing the risk of attacks from message passing. |
Ok, I will take a look at how Web Authn, Payment Request API, and new Digital Credentials API work towards enabling specialized, secure, and privacy-preserving bidirectional communication between intelligent tutoring systems and Web browsers presenting learners with educational exercises and activities. In these regards, thus far considered:
|
If I understand correctly, you are indicating that mutual authentication should be required for inter-application communication and that, from the browser-side, interfaces to cryptographic objects should be reused from other, existing APIs (e.g., Web Authn, Payment Request API, or Digital Credentials API). That is, examples illustrating mTLS (e.g., for Node.js) involve providing file-system paths to important resources ( Brainstorming:
|
Right, but now change “intelligent tutoring system” to “application” to generalize the use case. It sounds a little bit like #151 might have some overlap here. At the same time, enabling direct cross-process/cross-application communication is not something either native or web apps really do (for security reasons). If you want to share data to an application, Web Share is a good option. What, specifically, would the tutoring agent do? Like, complete this sequence:
|
That is a good point about generalizing to the broader use case of secure cross-process/cross-application communication. Yes, I also see some overlap with #151.
1. Web Browser ⟶ Tutoring Agent
Web browsers are desired to be able to share data about homework items with tutoring agents. Homework items' data could be used to populate portions of LLMs' prompts or to add to multimodal dialogue contexts. In theory, relaying these data to tutoring agents could be accomplished using Web Share. Homework items might be displayed in a frame, however, and "transient activation" would, as I understand it, be needed per homework item. That is, using Web Share would mean that end-users would have to click on a
2. Tutoring Agent ⟶ Web Browser
In theory, homework items' data sent to tutoring agents could include sets of described "landmarks" in items' accompanying multimedia resources. Capable tutoring agents could make use of these described "landmarks" to select, show, and highlight content in items' multimedia resources. For example, a mathematics homework item might have an accompanying illustration depicting a right triangle with three sides and three angles. A tutoring agent would be able to select, show, and highlight any of these six described "landmarks" in the illustration. For example, a physics homework item might have an accompanying illustration depicting two penguins, a rope, and a pulley. A tutoring agent would, similarly, be able to select, show, and highlight any of these four described "landmarks" in the illustration. As considered, beyond selecting, showing, and highlighting elements in 2D pictures, these approaches would be desired to also work with stateful multimedia, animations, and 3D graphics visualizations.
3. What Would the Tutoring Agent Do?
To your question, a tutoring agent would receive data about homework items to help end-users and would be able to invoke items' interface functions, e.g., to engage in multimodal communication by intelligently selecting, showing, and highlighting described "landmarks" in items' accompanying multimedia resources. |
Sorry, please bear with me as this is new to me. Are there examples of these tutoring agents in the wild? (e.g., browser extension) or couldn't the agent just run directly on the website? (e.g., via something like WebNN) If it's just a software component, then the site could just directly interface with the model through JS. What am I missing? |
According to Wikipedia, examples of intelligent tutoring systems in the wild include: Algebra Tutor, SQL-Tutor, EER-Tutor, COLLECT-UML, StoichTutor, Mathematics Tutor, eTeacher, ZOSMAT, REALP, CIRCSIM-Tutor, Why2-Atlas, SmartTutor, AutoTutor, ActiveMath, ESC101-ITS, AdaptErrEx, GIFT, SHERLOCK, Cardiac Tutor, and CODES. There are also tutoring-related browser extensions (e.g., here). The recent Khanmigo video shows another example of the state of the art. In this video, one can see a learner and tutor agent discussing a right triangle. Generalizing, tutor agents could receive information about homework items (beyond "seeing" them) and they could also refer to, point to, and visually highlight, content in items' accompanying multimedia resources. Also, brainstorming on these "tutor-item interoperation" topics, beyond visually selecting and highlighting "landmarks" in items' accompanying multimedia resources, AI tutor agents could populate items' text input fields with learner-specified content on their behalf. That is, agents, e.g., tutors, could serve as natural-language user interfaces to Web-based content, e.g., homework items. For example, a learner might tell a tutor agent that an item's right triangle's hypotenuse was "5", and the tutor agent could enter that value into the appropriate text input field of that item. In theory, this would occur via the invocation of another natural-language-described function on the item's interface. Alternatively, tutor agents could verbally respond to learners that they were correct and indicate for them to then enter values into the appropriate text input field of the item. To your question, a tutor agent and educational content, e.g., homework items, could be provided in one browser tab. However, these software components (e.g., digital textbooks, tutoring agents) might be from separate vendors. One could also put these software components into two or more interoperating browser tabs (e.g., in separate browser windows). 
Based on that video, however, I thought about interprocess / interapplication communication scenarios. |
@marcoscaceres, on these topics of exploring approaches to enabling secure communications between websites and AI assistants, in this case intelligent tutors, thinking about some of the points that you raised here, I added some ideas to a new issue (#168) in its "Protocols" subsection. The idea is that messages exchanged between websites and AI assistants could be semantic graphs, instead of text strings or byte arrays. Then, per ontologies and shapes constraints (SHACL), developers could define message classes which could be used to validate message instances. In addition to message classes, protocol definitions could specify rules (including time-based), valid sequences of message classes, valid state transitions, and so forth.
var channel = window.assistant.openChannelForProtocol('http://example.org/2024/protocol-123/#');
if(channel != null)
{
  channel.onmessage = (event) => {
    switch(event.class)
    {
      case 'http://example.org/2024/protocol-123/#messageClass1':
        messageHandler1(event.graph);
        break;
      case 'http://example.org/2024/protocol-123/#messageClass2':
        messageHandler2(event.graph);
        break;
      ...
    }
  };
  channel.postMessage(...);
}
Also, in addition to communicating software (e.g., websites, AI assistants) being able to validate received messages and ensure conformance with defined communication protocols, communication-mediating software (e.g., web browsers, operating systems) could, for non-encrypted communications, or for encrypted communications that they were a party to, be configured to use communication protocol definitions to ensure the conformance of communicating software with the defined communication protocols. That is, communication channel objects could ensure conformance with the communication protocols provided to them when they were created or initialized. Showcasing the history of these topics, here are some relevant Wikipedia articles:
Today, gRPC is an example of a popular open-source (interface description) language and framework for generating software for remote procedure calls, a type of request-response communication protocol. From its documentation, here is an example of a service definition:
// The greeter service definition.
service Greeter {
  // Sends a greeting
  rpc SayHello (HelloRequest) returns (HelloReply) {}
}
// The request message containing the user's name.
message HelloRequest {
  string name = 1;
}
// The response message containing the greetings
message HelloReply {
  string message = 1;
}
I am finding these communication protocol, process calculus, and actor model topics to be interesting, e.g., in the context of multi-agent systems (e.g., the AutoGen framework). |
Introduction
Hello. I am pleased to share some brainstorming towards advancing the state of the art with respect to educational exercises and activities, e.g., homework, quiz, and exam items, sequences of such items, and interoperability with tutoring agents.
Experience API
The Experience API (xAPI) is an e-learning software specification that records and tracks various types of learning experiences for learning systems. Learning experiences are recorded in a Learning Record Store (LRS), which can exist within traditional learning management systems (LMSs) or on their own.
See also: xAPI.js.
Items
Items, e.g., homework items, can be HTML5-based resources so as to be able to utilize hypertext, fonts, stylesheets, scripts, metadata, images, animations, 3D graphics, audio and video.
Items may stream xAPI events to one or more learner-configured LRSs as learners interact with them and upon completion, e.g., when learners answer them.
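As a minimal sketch of what such an event might look like, the following builds an xAPI-style statement for a completed item. The learner address, item IRI, and the shape of the `lrsList` configuration (`statementsUrl`, `authorization`) are assumptions made for illustration; only the actor/verb/object structure, the ADL "completed" verb IRI, and the version header come from xAPI itself.

```javascript
// Build a minimal xAPI-style statement for a completed item.
// The e-mail address and item IRI passed in are illustrative assumptions.
function buildCompletionStatement(learnerEmail, itemIri, success, durationSeconds) {
  return {
    actor: { objectType: "Agent", mbox: "mailto:" + learnerEmail },
    verb: {
      id: "http://adlnet.gov/expapi/verbs/completed",
      display: { "en-US": "completed" },
    },
    object: { objectType: "Activity", id: itemIri },
    result: {
      success: success,
      // xAPI durations are ISO 8601 durations, e.g. "PT42S" for 42 seconds.
      duration: "PT" + durationSeconds + "S",
    },
    timestamp: new Date().toISOString(),
  };
}

// Post one statement to every learner-configured LRS.
// `lrsList` is a hypothetical configuration: each entry carries the full URL
// of the LRS statements resource and a ready-made Authorization header value.
function sendToStores(statement, lrsList) {
  return Promise.all(
    lrsList.map((lrs) =>
      fetch(lrs.statementsUrl, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "X-Experience-API-Version": "1.0.3",
          Authorization: lrs.authorization,
        },
        body: JSON.stringify(statement),
      })
    )
  );
}
```

An item could call `buildCompletionStatement(...)` when the learner submits an answer and pass the result to `sendToStores(...)`; how learners configure their stores is left open here.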
Items should also utilize JavaScript to signal upon completion that a next item can be presented to a learner. In this completion signal, an object may be passed as an argument to be returned to the item-sequencing control logic.
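One way this completion signal might work today, assuming items run in frames nested inside the sequencing page, is cross-document messaging. In the sketch below, the message type `"item-complete"` and both function names are hypothetical; `postMessage` and the `origin`/`data` properties of message events are the standard parts.

```javascript
// Inside an item: notify the sequencing logic in the parent frame.
// "item-complete" is a hypothetical message type, not a standardized one.
function signalCompletion(parentWindow, sequencerOrigin, result) {
  parentWindow.postMessage({ type: "item-complete", result: result }, sequencerOrigin);
}

// Inside the sequencer: build a "message" event handler that accepts
// completion signals only from the expected item origin.
function makeCompletionHandler(itemOrigin, onComplete) {
  return function (event) {
    if (event.origin !== itemOrigin) return; // ignore other origins
    const data = event.data;
    if (data && data.type === "item-complete") {
      onComplete(data.result);
    }
  };
}
```

In a browser, an item would call `signalCompletion(window.parent, sequencerOrigin, { correct: true })`, and the sequencer would register the handler with `window.addEventListener("message", makeCompletionHandler(itemOrigin, presentNextItem))`, where `presentNextItem` stands for whatever the sequencing logic does next.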
Sequences of Items
Sequences of items could be bundled into OCF-based archives containing HTML5-based items and their multimedia resources.
JavaScript could be utilized to express both static and dynamic, e.g., adaptive, sequences of items. The following example shows a simple item sequence comprised of four items:
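One possible way to express such sequences is with generator functions, sketched here under stated assumptions: the item paths are hypothetical, and the convention that the value passed to `next()` is the completion object of the item just presented is an illustration, not part of any specification.

```javascript
// Static sequence: four items presented in a fixed order.
function* staticSequence() {
  yield "items/item-1.xhtml";
  yield "items/item-2.xhtml";
  yield "items/item-3.xhtml";
  yield "items/item-4.xhtml";
}

// Adaptive sequence: the value passed to next() is the completion object
// of the item just presented; an incorrect answer triggers a remedial item.
function* adaptiveSequence() {
  const first = yield "items/item-1.xhtml";
  if (first && first.correct === false) {
    yield "items/item-1-remedial.xhtml";
  }
  yield "items/item-2.xhtml";
}
```

The sequencing logic would call `next(result)` each time an item signals completion and load the yielded path into the item frame until the generator is done.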
The presentation of sequences of items may not require any local or remote service to provide readers with adaptive or personalized item sequences, may optionally utilize one or more local or remote services, may require access to one or more local or remote services to function, or may operate while offline, storing educational data locally, while expecting to connect to the Internet at a later point.
See also: APIs related to navigation and session history
See also: Infrastructure for sequences of documents
Tutoring Agents
Bidirectional communication is envisioned between items and tutoring agents.
Item-to-tutor Communication
Web browsers should be able to share items' events (e.g., initialization, finalization) and data with tutoring agents.
These data could be used to populate portions of LLMs' prompts or to enhance multimodal dialogue contexts.
These data could include items' natural-language educational objectives, instructions, descriptions, and hints.
These data could include sets of described "landmarks" in items' accompanying multimedia resources. Capable tutoring agents could make use of these described "landmarks" to select, show, and highlight content in items' multimedia resources.
These data could include sets of described input fields. Capable tutoring agents could make use of these to enter data obtained through dialogue into the input fields on learners' behalf.
These data could include hints for learners.
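The kinds of data listed above could be gathered into a single descriptor per item. The following is only a sketch: the property names (`objectives`, `landmarks`, `inputFields`, and so on) and the sample content are assumptions, and `toPromptContext` simply shows one way a tutoring agent might flatten the descriptor into text for a prompt.

```javascript
// Hypothetical descriptor that an item might share with a tutoring agent.
const itemDescriptor = {
  objectives: "Apply the Pythagorean theorem.",
  instructions: "Find the length of the hypotenuse.",
  description: "A right triangle with legs of length 3 and 4.",
  hints: ["Recall that a^2 + b^2 = c^2."],
  landmarks: [{ id: "side-c", description: "the hypotenuse" }],
  inputFields: [{ id: "answer", description: "length of the hypotenuse" }],
};

// Flatten a descriptor into plain text lines suitable for an LLM prompt.
function toPromptContext(descriptor) {
  const lines = [
    "Objectives: " + descriptor.objectives,
    "Instructions: " + descriptor.instructions,
    "Description: " + descriptor.description,
  ];
  descriptor.hints.forEach((hint, i) => lines.push("Hint " + (i + 1) + ": " + hint));
  descriptor.landmarks.forEach((m) => lines.push("Landmark " + m.id + ": " + m.description));
  descriptor.inputFields.forEach((f) => lines.push("Input field " + f.id + ": " + f.description));
  return lines.join("\n");
}
```

Keeping landmark and input-field identifiers in the descriptor lets the agent refer back to them later when it asks the item to highlight content or fill in a field.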
Tutor-to-item Communication
Selecting and Highlighting Items' Landmarks
A mathematics homework item, for example, might have an accompanying illustration depicting a right triangle with three sides and three angles. A tutoring agent would be able to select, show, and highlight any of these six described "landmarks" in the illustration.
A physics homework item, for example, might have an accompanying illustration depicting two penguins, a rope, and a pulley. A tutoring agent would, similarly, be able to select, show, and highlight any of these four described "landmarks" in the illustration.
These approaches should work with 2D content, stateful multimedia, animations, and 3D graphics visualizations.
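For the simple 2D case, an item could expose a highlight function restricted to the landmarks it has described. This is a sketch under assumptions: landmarks are taken to be elements with known `id`s (for instance, shapes in an inline SVG), and the `tutor-highlight` class name is hypothetical and would be styled by the item's own stylesheet.

```javascript
// Create a highlight function limited to the landmarks an item has described.
// `root` is anything with getElementById, e.g. the item's document.
function createLandmarkHighlighter(root, landmarkIds) {
  const allowed = new Set(landmarkIds);
  return function highlight(id) {
    if (!allowed.has(id)) return false; // not a described landmark
    const element = root.getElementById(id);
    if (!element) return false; // described but not present in the content
    element.classList.add("tutor-highlight"); // hypothetical class name
    return true;
  };
}
```

Restricting the function to described landmarks means a tutoring agent can only point at content the item author chose to expose.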
Entering Data into Items' Input Fields
Tutoring agents could also select multiple-choice answers and populate items' input fields with learner-specified content on learners' behalf. That is, tutoring agents could serve as natural-language user interfaces to homework items.
A learner might, for example, verbally tell a tutoring agent that an item's right triangle's hypotenuse was "5", and that tutoring agent could enter that specified value into an appropriate input field of the item.
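Entering a value on a learner's behalf could follow the same pattern as any script-driven form input. In this sketch, the function names and the restriction to described field `id`s are assumptions; setting `value` and dispatching an `input` event are standard DOM operations.

```javascript
// Create a writer limited to the input fields an item has described.
// `root` is anything with getElementById, e.g. the item's document.
function createFieldWriter(root, describedFieldIds) {
  const allowed = new Set(describedFieldIds);
  return function setField(id, value) {
    if (!allowed.has(id)) return false; // not a described input field
    const input = root.getElementById(id);
    if (!input) return false;
    input.value = String(value);
    // Notify the item's own scripts, as if the learner had typed the value.
    input.dispatchEvent(new Event("input", { bubbles: true }));
    return true;
  };
}
```

With a writer created as `createFieldWriter(document, ["answer"])`, the hypotenuse example above would amount to calling `setField("answer", 5)`.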
Other Technical Discussion Topics
Artificial Intelligence
Artificial intelligence systems could utilize homework items as training data and interact with and solve these items.
Interprocess / Interapplication Communication
A tutoring agent and educational content, e.g., homework items, could be provided in one browser tab. However, these software components (e.g., digital textbooks, tutoring agents) would probably be from separate vendors. These software components could also be provided in two or more interoperating browser tabs (e.g., in separate browser windows).
In particular, if a tutoring agent were not Web-based, interprocess / interapplication communication between tutoring agents' client applications and Web browsers would benefit these educational exercises and activities scenarios.
Services
Services, e.g., one or more xAPI LRSs, could be managed by platforms and subsequently loaded by and utilized by educational resources through interfaces. In this way, learners would not have to repeatedly log on to or connect to services, e.g., LRSs, per educational resource or activity therein.
Preferences, Settings, and Configuration
Educational resources could store preferences, settings, and configuration on platforms, each having one or more (e.g., URI-based) keys, and each being available via one or more hierarchical paths (pages and nested sections of settings) which could be utilized for access control, navigational, and display purposes.
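To make the idea of URI-based keys and hierarchical paths concrete, here is a small in-memory sketch. The class, its method names, and the example key are all hypothetical; an actual platform would add persistence and access control.

```javascript
// In-memory sketch of settings addressed by URI keys and hierarchical paths.
class SettingsStore {
  constructor() {
    this.entries = new Map(); // key (URI string) -> { path, value }
  }

  // `path` is an array of page/section names, e.g. ["Education", "Services"].
  set(key, path, value) {
    this.entries.set(key, { path: path, value: value });
  }

  get(key) {
    const entry = this.entries.get(key);
    return entry ? entry.value : undefined;
  }

  // List the keys whose path starts with the given page/section prefix.
  listUnder(prefix) {
    const keys = [];
    for (const [key, entry] of this.entries) {
      if (prefix.every((segment, i) => entry.path[i] === segment)) {
        keys.push(key);
      }
    }
    return keys;
  }
}
```

For example, `store.set("http://www.example.com/2024/#lrs-endpoint", ["Education", "Services"], "https://lrs.example/xapi/")` would place a setting under an "Education" page and a "Services" section, and `store.listUnder(["Education"])` would find it for display or navigation.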
Learners could search for, retrieve, navigate to, and access (e.g., read, write) extensible preferences, settings, and configuration using their platforms' unified settings areas.
Nested Frames
As considered, sequences of items displayed to learners may make use of nested frames.
Conclusion
Thank you. Per the WICG proposal process, I am looking forward to discussing and improving this preliminary proposal with your feedback and to finding interested collaborators to create fuller documents with which to spur innovation and to seek consensus from the community and stakeholders.