
LearningInIT


Learning and information systems

We create information systems (IS) to represent the state of another, target system (TS) so that we can investigate or control it. This requires regularly "polling" the TS, or even changing it, to keep the two in sync; that dynamism is part of our design.

However, there is another kind of dynamism in this scenario between the IS and the TS. The TS itself, or our understanding of it, may change over time, just like the tools and platforms that we use to implement our IS. This kind of dynamism is generally not part of our design, and following such changes jeopardizes the structure and reliability of the IS. But if we handle those changes properly, it is as if we learn the TS by creating the IS and improve our understanding while maintaining it. The goal of information systems, as set by the pioneers, was to support the cooperative thinking of experts dealing with unseen problems; see Vannevar Bush's article As We May Think, or Howard Rheingold's book Tools For Thought.

The Fish Model

The following "Fish Model" is an attempt to identify the fundamental conceptual steps needed to embrace learning in active information systems. The upper, "D" row represents "Data Structures", the bottom "A" is for "Algorithms", together they together form information systems (aka Programs, according to Niklaus Wirth). The explanation of the middle A, B, C letters comes from Douglas Engelbart: A is an information system that supports an external system as is; B is all activities that examine A and allow optimization of both A and the target system; C is all activities that optimize B, which is, by nature, not related to any particular A and therefore can be used in any environment. The numbers 1-6 represent abstract concepts needed for their respective cells: 1 is the data structure needed for Engelbart A activities, 5 is how algorithms should appear for Engelbart B level, etc., the others have special roles.

D         1 - 2 - 3 
   -1   /           \ 
 -3   0   A   B   C   7
   -2   \           /
A         4 - 5 - 6

Disclaimer.

  1. The following statements are my own, not related to any organization I belong to or work for.
  2. This is not a scientific publication meant to start a discussion. I write flexible information systems, some of them open source. These concepts lie behind the architecture and code I create; they are necessary to understand it but could be hard to "reverse engineer" from it.
  3. I only touch related areas, like the relationship of knowledge, language, and text, or between learning and natural human motivations, without detailed explanations. This is a deliberately oversimplified, "in a nutshell" summary of some abstract goals behind the code.
  4. If at any point it gets too complicated or boring, just jump to the Conclusions at the end.

Level 0: Standard information systems

In "programming as we know it", we have a programming language that allows us to put text labels on memory segments (both code and data) and refer to them from text files that control the access to them.

Text, by nature, is a persistent snapshot of knowledge at a given time. Coders generally ask for proper specifications that they translate into data structures and algorithms; there is no "learning" involved at this point. When our understanding or the external system changes, the text becomes obsolete and must be updated, recompiled, tested, and so on.

Change is inevitable in any live environment, and the standard tools we use to create the information system limit our ability to follow the target system. This is the opposite of learning: the older the system, the more obsolete it is, and the more people want to start from scratch again. That is level 0.

Improving Data Structures

To follow changes in the structure of the data, the first realization is that the natural source code representation is harmful. Of course, the class and function definitions must be updated, but the real problem is that all calling code accesses the data values through the existing constructs. Refactoring is painful even in a monolithic system, but if the updated module is public and used by unknown parties... well, they will have a problem. A fundamentally different approach is needed.
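As a tiny Java illustration of why this hurts (the class names are made up, not taken from any real code base): the moment a structure is exposed directly, every caller is welded to it.

```java
// Hypothetical example: a structure exposed directly to its callers.
class Person {
    public String name; // splitting this into firstName / lastName later...
}

public class DirectAccessProblem {
    public static void main(String[] args) {
        Person p = new Person();
        p.name = "Ada";
        // This line, and every similar line in every module compiled against
        // the old Person, breaks the moment the structure changes.
        System.out.println("Hello " + p.name);
    }
}
```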

Level 1: “Outsourcing” Metadata

The first hint is that when you design the data structures and functions of your information system, you don't need a programming language. You can completely describe those structures on a whiteboard, in a specification, or in any existing system design tool like a UML editor. Then you must translate those structures into commands in often multiple environments (backend and frontend programming languages, database and web API declarations, Swagger files, etc.).

Or… keep all that information in a configuration environment, because it is a quite simple graph: tokens, data types, relations, multiplicity. You can create a runtime that behaves according to a UML description without translating it to another text format. Of course, you will need to use the tokens in source code, but you can change the data structure itself externally (for example, in a JSON file) instead of in source code (write, compile, deploy).

The bonus of this approach is that you can do your “housekeeping”: persistence, serialization, validation, etc. based on that configuration, above the level of the current target language(s) and platform(s). This is what you get from an environment with runtime type information, reflection, or a dynamically typed language.
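Here is a minimal Java sketch of this idea, with made-up names (TypeDef, AttrDef, Instance) that stand for no existing API: the data structure lives in a small metadata graph, generic instances store values by attribute token, and the "housekeeping" (here, validation) is written once against the metadata instead of against compiled classes.

```java
import java.util.*;

// A minimal sketch of Level 1: the data structure lives in a metadata graph
// (types, attributes, multiplicity) instead of compiled classes.
public class MetaRuntimeSketch {

    // One node of the metadata graph: an attribute with a token, a value type
    // and a multiplicity flag.
    record AttrDef(String token, Class<?> valueType, boolean multiple) {}

    // A type is just a named set of attribute definitions.
    record TypeDef(String token, List<AttrDef> attrs) {}

    // A generic instance: values are kept in a map keyed by attribute token,
    // so changing the metadata needs no recompilation of this class.
    static class Instance {
        final TypeDef type;
        final Map<String, Object> values = new HashMap<>();
        Instance(TypeDef type) { this.type = type; }
        void set(String attrToken, Object value) { values.put(attrToken, value); }
        Object get(String attrToken) { return values.get(attrToken); }
    }

    // "Housekeeping" written once, above any concrete type: validation driven
    // purely by the metadata graph.
    static List<String> validate(Instance i) {
        List<String> errors = new ArrayList<>();
        for (AttrDef a : i.type.attrs()) {
            Object v = i.get(a.token());
            if (v == null) continue;
            if (a.multiple() && !(v instanceof List)) {
                errors.add(a.token() + " expects a list of " + a.valueType().getSimpleName());
            } else if (!a.multiple() && !a.valueType().isInstance(v)) {
                errors.add(a.token() + " expects " + a.valueType().getSimpleName());
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // In a real system this graph would be loaded from external
        // configuration (e.g. a JSON file) rather than built in code.
        TypeDef person = new TypeDef("person", List.of(
                new AttrDef("name", String.class, false),
                new AttrDef("email", String.class, true)));

        Instance p = new Instance(person);
        p.set("name", "Ada");
        p.set("email", 42); // wrong shape, caught by the generic housekeeping
        System.out.println(validate(p)); // [email expects a list of String]
    }
}
```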

Level 2: Learning Metadata Runtime

With level 1, the data structure and function definitions (actually, a function parameter list can be considered a type; the function is called with a configured instance of that type) are in configuration.

The next step is for your system to build this configuration on the fly, from the data it handles. There are various existing sources of metadata descriptions: you can query them from a database, read an XSD schema definition, a Swagger file, etc. In other cases, the information is bundled together with the data, as in an XML file: namespaces, tags, attributes. In a JSON file you have the attribute names but no explicit metadata (though there are extensions like JSON:API); in CSV files you may have column names and can guess the data types, and so on.

The point is: your running system can learn the meta information of the data it handles. Of course, this further decreases the significance of coded algorithms, because you don't even know the data structures when you write and compile your code. However, the housekeeping can still be strong; tools like a database manager, spreadsheet editor, or diagram generator work the same way: they don't know the data structures at compile time. With an embedded script executor, you can create a complete data management system at this level.
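A minimal Java sketch of such on-the-fly learning, assuming a CSV source and deliberately crude type guessing (all names are illustrative):

```java
import java.util.*;

// A minimal sketch of Level 2: the running system derives the metadata from
// the data itself. Here the source is a CSV snippet; the same idea applies to
// XSD, database catalogs, or Swagger files.
public class SchemaLearningSketch {

    record LearnedAttr(String name, String guessedType) {}

    // Guess a column type from a sample value; crude on purpose.
    static String guessType(String sample) {
        if (sample.matches("-?\\d+")) return "integer";
        if (sample.matches("-?\\d+\\.\\d+")) return "decimal";
        if (sample.matches("\\d{4}-\\d{2}-\\d{2}")) return "date";
        return "string";
    }

    static List<LearnedAttr> learn(String headerLine, String sampleLine) {
        String[] names = headerLine.split(",");
        String[] samples = sampleLine.split(",");
        List<LearnedAttr> attrs = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            attrs.add(new LearnedAttr(names[i].trim(), guessType(samples[i].trim())));
        }
        return attrs;
    }

    public static void main(String[] args) {
        // This code was compiled without knowing the structure below; the
        // metadata is "learned" at runtime and can feed the generic housekeeping.
        List<LearnedAttr> attrs = learn("id,name,birthday,score",
                                        "1,Ada,1815-12-10,99.5");
        attrs.forEach(a -> System.out.println(a.name() + " : " + a.guessedType()));
        // id : integer, name : string, birthday : date, score : decimal
    }
}
```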

Level 3: Self-contained Metadata

Up to this point, the system consists of a runtime environment that allows explicit loading, or on-the-fly learning, of metadata and handles object instances and attributes according to that metadata. It allows creating housekeeping components that work purely on that on-the-fly knowledge, probably using additional custom configurations.

However, the runtime itself is a kind of housekeeping with its own metadata: the types and attributes that describe types, attributes, etc. This meta-meta information can be handled the same way as any other, saved to and loaded from external configuration. The only problem is that the code of the runtime is necessary to boot the system; therefore, a direct connection must be created between the meta configuration and the token objects that the runtime source refers to. This code must be generated from the current configuration and compiled with the runtime algorithm.
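The bootstrap trick can be sketched in a few lines of Java; the tokens and structures below are invented for illustration, not generated by any real tool. The point is that the type that describes types is itself just another node, and only the token constants must be compiled together with the runtime.

```java
import java.util.*;

// A minimal sketch of Level 3: the metadata that describes types and
// attributes is itself an ordinary instance of those same structures, so it
// can be saved to and loaded from configuration. Only the token constants
// below must be generated from the current configuration and compiled with
// the runtime, because the bootstrap code needs them before anything is loaded.
public class SelfDescribingMetaSketch {

    // Generated tokens: the only hard link between source code and configuration.
    static final String T_TYPE = "meta:type";
    static final String T_ATTRIBUTE = "meta:attribute";
    static final String A_NAME = "meta:name";
    static final String A_ATTRIBUTES = "meta:attributes";

    // A completely generic node: every "thing" (including type definitions)
    // is a map of attribute token -> value.
    static Map<String, Object> node(String name) {
        Map<String, Object> n = new HashMap<>();
        n.put(A_NAME, name);
        return n;
    }

    public static void main(String[] args) {
        // Bootstrap: the type "type" and the type "attribute", described by themselves.
        Map<String, Object> typeType = node(T_TYPE);
        Map<String, Object> attrType = node(T_ATTRIBUTE);
        typeType.put(A_ATTRIBUTES, List.of(node(A_NAME), node(A_ATTRIBUTES)));
        attrType.put(A_ATTRIBUTES, List.of(node(A_NAME)));

        // From here on, everything (domain types included) is just more nodes,
        // loadable from external configuration instead of compiled source.
        System.out.println(typeType.get(A_NAME) + " has " +
                ((List<?>) typeType.get(A_ATTRIBUTES)).size() + " attributes");
    }
}
```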

For this price, the runtime is self-contained; a sufficient meta-meta solution is completely portable between languages and platforms. Any system that can be described by this meta-meta object set can be loaded and executed on any platform that has the runtime implemented. Similarly, the data and function structure of those systems can be changed without compilation and deployment; the side effects may be covered in the configuration of the housekeeping components.

This last limitation leads to the next section...

Improving Algorithms

Algorithms provide a controlled way to change the state of a system represented by data; they are like a recipe executed on some raw materials to create a meal. They can be either monolithic, "spaghetti" code, where one huge recipe contains all activities (like explaining even how to use a knife to chop the meat); or modular (like "bake it in the oven at 120 degrees for 15 minutes"), but in that case you need a lot of extra information and embedded experience (is that degree Fahrenheit or Celsius, what pan can you use in the oven, in my oven 10 minutes of baking is enough, etc.).

Modular systems in general are easier to maintain, but the interaction between the components must be properly managed, hence you have recipes for that as well, called "design patterns". The first level of modularity is creating "global" functions that encapsulate a specific activity (like querying the system clock or giving a warning signal); another is creating "service objects" that contain a set of closely related functions and even the state of a service (like a complex user interface or a live connection with a database).

Using the improvements in the previous sections, the metadata of both the parameter list of any function and the state of a service object can be handled (stored or updated) the same way as "pure data" objects. Source code (and compiled binary executable modules) is only there to access this data and implement the required functionality in the current programming language and chosen tools. The following steps start from here.
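A small, hypothetical Java sketch of that remark: the parameter list of a "bake" operation is kept as metadata, and a call is just a configured instance of that parameter type handed to the function.

```java
import java.util.*;
import java.util.function.Function;

// A minimal sketch: a function's parameter list treated as a type, so a call
// becomes "create a configured instance of that type and hand it to the
// function". All names are illustrative.
public class ParamsAsTypeSketch {

    // Metadata for the parameter "type" of a hypothetical bake() operation.
    static final List<String> BAKE_PARAMS = List.of("temperatureC", "minutes");

    // The operation receives a generic instance instead of a fixed compiled
    // signature, so its parameter structure lives in metadata.
    static final Function<Map<String, Object>, String> BAKE = params ->
            "baking at " + params.get("temperatureC") + " C for " +
            params.get("minutes") + " minutes";

    public static void main(String[] args) {
        Map<String, Object> call = Map.of("temperatureC", 120, "minutes", 15);
        // Generic housekeeping can check the call against the metadata...
        if (!call.keySet().containsAll(BAKE_PARAMS)) {
            throw new IllegalArgumentException("missing parameters");
        }
        // ...before executing it.
        System.out.println(BAKE.apply(call)); // baking at 120 C for 15 minutes
    }
}
```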

Level 4: The Motherboard: Dialog Graph

Following the analogy of moving from one super-recipe to a modular approach, any activity can be modeled as an abstract graph of "materials" and "agents". Each agent encapsulates one specific activity that it can execute when all required input materials are given in the required quantity, and it creates some "artificial material" that is either the result of the whole activity or an input material of other agents.

A truly modular information system consists of a properly configured set of agents and the data objects between them. An agent "listens to" one or more data objects, combines their content with its own state and static configuration to update other data objects, and "commits" those when done. When a data object is committed, its listeners are activated. The important difference from traditional modular programming is that the agents don't "know" each other: they never call each other's functions or change each other's state; they are completely self-contained, defined by their configuration, state, input, and output data. Building a required service means creating their cooperation in the form of a "Dialog", a reactive model of the system behavior, a "white box" from the perspective of the interaction among the agents.

The system builder sees all the affected data and is responsible for matching them, probably using references and plain converters along the way to align the terminology used by the different agents. Consequently, replacing agents, plugging loggers and validators into the process flow, etc., or following changes in the terminologies used, is transparent. The implementation dependency among agents is zero. You only need the same runtime that can execute the dialog, and the implementations of all agents that encapsulate one specific service (a web server, a log collector, a serializer, persistent storage, a GUI environment, etc.). Your actual system is built from these agents, above any platform or programming language.
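To make the mechanism concrete, here is a minimal Java sketch of such a dialog (all names invented): agents are plain listeners on shared data objects, a commit activates the listeners, and a logger can be plugged in without any agent knowing about it.

```java
import java.util.*;
import java.util.function.Consumer;

// A minimal sketch of the Level 4 dialog graph: agents never call each other;
// they only listen to shared data objects and commit their own outputs, which
// activates the listeners of those outputs.
public class DialogGraphSketch {

    // A shared data object: holds a value and the agents listening to it.
    static class DataObject<T> {
        private T value;
        private final List<Consumer<T>> listeners = new ArrayList<>();

        void listen(Consumer<T> agentStep) { listeners.add(agentStep); }

        // Committing publishes the new value to every listener.
        void commit(T newValue) {
            value = newValue;
            listeners.forEach(l -> l.accept(newValue));
        }

        T get() { return value; }
    }

    public static void main(String[] args) {
        DataObject<String> rawInput = new DataObject<>();
        DataObject<String> normalized = new DataObject<>();
        DataObject<String> reply = new DataObject<>();

        // "Normalizer" agent: listens to rawInput, commits to normalized.
        rawInput.listen(text -> normalized.commit(text.trim().toLowerCase()));

        // "Responder" agent: listens to normalized, commits to reply.
        normalized.listen(text -> reply.commit("echo: " + text));

        // A "logger" can be plugged in without touching the other agents.
        reply.listen(text -> System.out.println("[log] " + text));

        rawInput.commit("  Hello Dialog  "); // [log] echo: hello dialog
    }
}
```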

Level 5: The Microchips: Workflows

The previous level relies on a "black box" model of the agents: we just assume that they will perform the required activities on the related data or external services. Technically, programmers write the source code and build dynamic libraries that the runtime can load and use. The runtime solves the integration of these modules, but when the local module metadata (data structures, functions) changes, these modules must be updated and recompiled.

The next step is understanding that the algorithm inside any agent can be fully described by a workflow that consists of data access, calculations, function calls, and flow control operations. The programmer translates this into code text only to send it to a compiler or interpreter that rebuilds the graph from the text to generate the instructions that the current platform can execute. Eliminating this text round trip allows totally transparent system creation: the primary form of any algorithm is its fully configured workflow graph, which the runtime can either execute directly (thus allowing the algorithm to be changed on the fly) or use to generate source code and build the executable binary (for better performance).
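A minimal Java sketch of the idea, assuming an invented set of node kinds and a trivial interpreter: the workflow graph is ordinary data that the runtime walks directly, and the same graph could feed a source code generator instead.

```java
import java.util.*;
import java.util.function.BiFunction;

// A minimal sketch of Level 5: an algorithm stored as a workflow graph that a
// runtime can execute directly, with no source text round trip.
public class WorkflowSketch {

    interface Step {}
    record ReadVar(String from, String into) implements Step {}
    record Compute(String left, String right,
                   BiFunction<Integer, Integer, Integer> op, String into) implements Step {}
    record WriteVar(String from, String output) implements Step {}

    // A generic interpreter: walks the step list against a variable context.
    static Map<String, Integer> run(List<Step> workflow, Map<String, Integer> input) {
        Map<String, Integer> vars = new HashMap<>(input);
        Map<String, Integer> output = new HashMap<>();
        for (Step s : workflow) {
            if (s instanceof ReadVar r) vars.put(r.into(), vars.get(r.from()));
            else if (s instanceof Compute c)
                vars.put(c.into(), c.op().apply(vars.get(c.left()), vars.get(c.right())));
            else if (s instanceof WriteVar w) output.put(w.output(), vars.get(w.from()));
        }
        return output;
    }

    public static void main(String[] args) {
        // "net = gross - gross * rate / 100" as a configured graph, changeable
        // on the fly without recompiling anything.
        List<Step> netPrice = List.of(
                new Compute("gross", "rate", (g, r) -> g * r / 100, "tax"),
                new Compute("gross", "tax", (g, t) -> g - t, "net"),
                new WriteVar("net", "net"));

        System.out.println(run(netPrice, Map.of("gross", 200, "rate", 25))); // {net=150}
    }
}
```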

This level depends on the target environment, as it must activate local service providers like a database connector or web server via custom workflows. However, these workflows isolate the implementation details from the abstract agent definition while maintaining the total transparency of the solution, and you can follow changes in the data structures by updating the data references in the workflow.

Level 6: Self-contained Runtime

Just as with the metadata, the algorithm runtime environment can and should be self-contained. The fundamental idea is that the runtime itself is an algorithm and can be wrapped into a few agents in source code. In that case, they can also be created as a proper set of workflows and dialogs from which the equivalent source code can be generated.

The absolute core is the set of agents that provide (1) the configurative metadata and data management: instance, attribute, and relation management; (2) the listening and activation mechanism behind the dialog graphs; and (3) the loading of binary modules and creation of agent instances. However, a broader environment contains reference implementations of complex agents like serialization (for storing local configuration and data, and for interaction between multiple independent nodes like a backend server and a client-facing frontend), user interface, persistence (like a database), etc. Any actual system can rely on these components or implement compatible ones for better performance or more sophisticated external services.
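Sketched as Java contracts (illustrative names only, not an existing API), that core could look like this:

```java
// A declaration-only sketch of the Level 6 "absolute core" as three agent
// contracts. Everything here is illustrative naming.

// 1: configurative metadata and data management
interface MetaStore {
    Object createInstance(String typeToken);
    void set(Object instance, String attrToken, Object value);
    Object get(Object instance, String attrToken);
}

// 2: the listening and activation mechanism behind the dialog graphs
interface DialogEngine {
    void listen(Object dataObject, Runnable agentStep);
    void commit(Object dataObject);
}

// 3: loading binary modules and creating agent instances from them
interface ModuleLoader {
    Object loadAgent(String moduleId, String agentTypeToken);
}

// Reference implementations of serialization, persistence, UI, etc. would be
// built as agents on top of these contracts, or replaced by compatible native
// ones where performance demands it.
```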

The environment is completed by a native platform/language connector that allows source code generation, compilation, building, and debugging. In this way, the system can generate itself completely on its current platform or, with parallel components, on multiple platforms.

Level 7: A Thinking Machine

I PROPOSE to consider the question, ‘Can machines think?’ This should begin with definitions of the meaning of the terms ‘machine’ and ‘think’. The definitions might be framed so as to reflect so far as possible the normal use of the words, but this attitude is dangerous. If the meaning of the words ‘machine’ and ‘think’ are to be found by examining how they are commonly used it is difficult to escape the conclusion that the meaning and the answer to the question, ‘Can machines think?’ is to be sought in a statistical survey such as a Gallup poll. But this is absurd. Instead of attempting such a definition I shall replace the question by another, which is closely related to it and is expressed in relatively unambiguous words.

The new form of the problem can be described in terms of a game which we call the ‘imitation game’. It is played with three people, a man (A), a woman (B), and an interrogator (C) who may be of either sex. The interrogator stays in a room apart from the other two. The object of the game for the interrogator is to determine which of the other two is the man and which is the woman. He knows them by labels X and Y, and at the end of the game he says either ‘X is A and Y is B’ or ‘X is B and Y is A’.

We now ask the question, ‘What will happen when a machine takes the part of A in this game?’ Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, ‘Can machines think?’

This is the definition of the imitation game from Alan Turing's 1950 article, followed by a meticulous argument that over time, a machine will surely be able to fool the interrogator, who will not be able to tell a human and a machine apart any better than a man from a woman, no matter whether that machine can think or not. The imitation game does not answer the question of whether machines can think (due to the lack of a precise, objective, and agreed definition of those words) but replaces it with one that is at least scientifically meaningful and that Turing could confidently answer with 'yes'.

A personal note here. We are talking about Alan Turing! This man was stubborn enough to keep asking 'what is computation?' until he came up with a mathematical construct, the Turing Machine (TM), to describe and execute any algorithm that has a mathematical definition. He also went on to show that a single, special TM can contain all possible TMs; therefore this Universal Turing Machine is the only one we must create to address any problem mathematics can describe, although it may run forever. See also: singularity. Such a person would not waste his time defining a technical task to solve in a 'quarterly review of psychology and philosophy', but rather the opposite: proving that playing the imitation game is futile and dangerous. The real challenge of his article is hidden in plain sight: give a proper definition of 'machine' and 'think' on par with the UTM, so that you can address the original question instead of a pointless, anthropomorphic or popularity-based approach to science and engineering.

An information system that uses a self-contained metadata structure and a runtime equivalent to any information system we can create using Turing-complete programming languages is a sufficient, constructive definition of 'machine'. The state of this machine is represented by a network of knowledge items that hold information as the values of properly defined attributes; the algorithms can access and change any piece of information in this network, which is a proper description of 'thinking'. The self-contained nature of both the metadata and the algorithms ensures that we can store and manage our knowledge of any system that we can objectively and precisely describe; see also the definition of 'transferrable knowledge' by J. C. R. Licklider. In short, this is a proper, constructive form of Artificial General Intelligence. (Of course, I am not there yet; I only have a good enough level 3 and various partial implementations of the level 4, 5, and 6 components, but I see no blocking issue with any of them. However, this area is dominated by the streetlight effect, so I continue my research in the background, behind paying jobs, as I have for the past 20+ years.)

Bonus: emergence in real life, outside the fairy tale / thriller hype. This system is fully transparent to itself; there is no boundary of human-only source code and activities. The knowledge is represented by graphs that can be tested for purity and performance and optimized by graph algorithms, just as human thinking can be modelled by pruning and optimizing our knowledge structures (though in a much less transparent environment). With the extended version of the runtime that contains code generation and building, this system has no limit to improving and rebuilding itself. Before coming up with the bogeyman hype of Artificial Superintelligence, consider that even a properly selected bookshelf is more "intelligent" than any individual human being: we are not made to reliably store and manage huge amounts of information at any given moment. However, this system can reliably handle and even automatically improve a transparent (therefore explainable) "model of everything", helping us understand the world better and identify root causes beyond our natural motivations, intellectual limitations, idea bubbles, and comfort zones. That is the very reason information systems were invented; see As We May Think by Vannevar Bush, July 1945, detailed by J. C. R. Licklider in Libraries of the Future, 1965.

Below zero

This was a quick journey from today's "state of the art" coding to AGI, moving towards a flexible general software architecture that embraces the learning process all along the way: design, implement, deploy, maintain, refactor, and finally recycle an information system. There are some commonly known tools and experiments for making information system building more efficient that were not mentioned above. That is because they do not point in the right direction according to my understanding and experience, and despite all efforts and apparent benefits, they only make the situation worse. For the sake of clarity, here are the three major ones.

Level -1: Automation

The text files and configurations needed to create an information system are redundant, repetitive representations of the same knowledge and are sensitive to various human errors. Why not automate them? Generate Java sources from XSD schema definitions or SQL table scripts! Use source code annotations or pre-compilers to generate code, scripts, web server configurations! Generate all of these directly from UML diagrams! This is like realizing that we are going in the wrong direction, so let's go faster to get it over with.

This critique may seem to apply to the advanced Level 3 and 6 tooling, which also contains text generation. The difference is that there the primary form of the metadata and algorithms is the network, and the runtime can execute them directly. The text generation (SQL table scripts, source code, etc.) is only necessary to use existing services like an SQL server or a highly capable optimizing compiler. The generated texts are not modified manually, and no custom language tricks are needed to activate a specific implementation of an otherwise general service.

On Level -1, these texts are the primary form of the system (source code, schema definitions, table scripts, server and deployment configurations, etc.), initially generated but then tweaked by gurus. In other scenarios, programmers must learn the extra magic of the selected tools, like annotations, to create the ORM mapping and query capabilities, web service entry points, etc., and hope that they can solve all custom requests with the required performance.

Level -2: Adaptation

The currently very popular area is called Machine Learning, where an information system mimics a target system so complicated that we cannot provide rules for its behavior.

We use a very simple but general information system, a complex neural network, and feed it with the current operation of the target system and the goals to achieve (the desired or expected values of some of the attributes). The neural network adapts to this scenario and is then used to generate the goal attributes. Due to their general nature, neural network-based tools can be used in any scenario, from anomaly detection to labeling, image recognition, or text generation. These areas are extremely hard to handle with traditional information system building.

There are issues with this approach, like that we cannot know when it will fail, how it would behave in a fundamentally new environment, or that it will never look beyond the original evaluations. But in this context, the worst is that this process is not learning but... adaptation. It does not help us understand the target system better, find mistakes in our assumptions and rules, or find new goals. That would be learning.

Level -3: Code Language Models

The combination of levels -1 and -2 leads to the realization that programming is just a perfect translation of a request into running source code in a selected programming language. Let's assume that millions of programmers have already done all possible translations and the results are freely available in public code repositories, discussions, tutorials, and samples. Let's use them to train a neural net; feed it with the proper keywords and it will pop up the most popular solution to anything. Copy-paste these segments into a common project and we have a "tried and true" running system. Forget about programmers, as everything is coded already; train "prompt engineers" to dig out the answer.

Of course, there are quite a few issues with this approach. The quality of the network depends on the size and reliability of the training set, so it can probably generate quite good Java Swing GUIs, but with newer languages where the solutions depend on constantly changing modules linked by dependency managers like Maven, the size (and quality) of the compatible samples gets lower. The quality of the response also decreases with the complexity of the question; the best answers are available to problems that the "programmer" should have learned during a multi-year training at a proper university. That begs the question whether the primary use is passing a tech interview and ruling out a rival who thinks about the question and comes up with an answer outside the comfort zone of the evaluator. Similarly, the most popular, copy-pasted solutions are the mediocre, easy-to-understand ones; those that contain any human ingenuity are filtered out as exceptions. The AI crawlers collect knowledge from tutorials, professional portals, etc. and provide a copy-paste-ready solution, so fewer and fewer people turn to those sources, interact with more experienced professionals, let alone read professional technical materials, pass exams by actual knowledge, or accept failures.

The real problem in this context is that AI can generate code from multiple solutions to the same problem, indicating that we have failed to create a properly modular and reusable environment. A generic algorithm like quicksort is defined by its workflow graph; that graph is the implementation of the algorithm, and nobody should have to rewrite it on any platform. All functions related to dates should exist in one instance, as workflows with some parameters (like the holidays in 2023 in a specified country). If we know the metadata of a system, the complete management interface can be generated on the fly, without coding. And so on... The fact that we use language models to generate code means that our industry has accepted its own failure, concluded that building information systems is too complex for humans, and given up on learning.

Of course, I have no chance of changing this momentum, because it is comfortable for the participants and profitable for the "product people"; this is just a minority report for the record.

Conclusions (much less technical)

In the common discourse about building information systems, we think about mundane and boring things like computers, tablets, and phones, and use terms like "Information Technology", "Computer Science", "Software Engineering", etc. They feel very far from ideas like "knowledge management" or "artificial intelligence", things that we relate to the future and see in science fiction movies. This gap invites people with non-technical backgrounds in philosophy, neuroscience, business, law, international relations, politics, etc. to become well-known experts in the field of AI, make bold statements, and discuss the dangers related to it. To return the favor, here are some notes on the relationship between the history and future of human individual and social behavior and advanced knowledge management and communication infrastructure, from someone without any formal education in those fields.

Evolution and Intelligence

Following Darwin, fundamental attributes do not appear randomly in the process of evolution; they must be beneficial in natural selection. Intelligence is a fundamental human attribute but also very individual and "abstract" in nature: it does not make us stronger than a python, faster than a tiger, etc. The real advantage of intelligence over physical attributes matters when we hunt prey with the same strength and speed. Each other. The other side of this coin is that the chances of even the strongest and smartest individual diminish against an organized group, which makes communication and cooperation selection factors too. The "we against them" pattern is a very simple explanation of how early mankind made thinking and talking stronger selection factors than physical attributes. However, this is a heritage that we must not forget when we talk about "human nature": just as you have a left and a right leg, you have gut-level patterns of the same strength to help or to harm another human being.

That would be the end of the story in the water, but on the ground, with agile hands, we could make tools that allowed us to change our environment beyond any physical human capability: chopping wood, digging the earth, storing water, etc. The collected knowledge of a community ("we") became a defining factor of an individual: you are not just a human but a gatherer, hunter, or toolmaker of your tribe. The tribe acts like an entity built from people; it needs resources to survive, enough new members, and time to train them to fulfill its roles. The size of the community and the level of knowledge are both limited by the efficiency of communication: more reliable and more distant interaction allows organizing larger, and therefore stronger, groups. You move from local tribes to clans, nations, multinational empires, a global control structure. A huge, organized group allows greater stability and has abundant resources to further improve knowledge and its management: you get science, religion, arts; their results are useful for creating stronger bonds among "us" and more efficient weapons against "them". Say hello to prof. Dawkins.

A global civilization inevitably has enough power to destroy itself; its only chance to survive is to think of itself as "we" (including future generations and planetary life in general), leaving the "they" category empty. Unfortunately, it must realize that all the concepts that led to the global stage are optimized to force its members against "them". The ultimate winner of the "Strongest Competitive World Model Prize" has no chance to control a cooperative global civilization; there is no step-by-step process to turn a caterpillar into a butterfly. This is where we are today. The root cause of the existential threat that we face is not a rogue Artificial Superintelligence but the combination of human nature, the powerful infrastructure (communication, transportation, factories, weapons), and the fundamental concepts that we use to understand and describe our world. Our only chance is a total paradigm shift.

Understanding the World

First, it is not possible for any individual or group to understand the world as is, not only because of sensory and cognitive limitations but because we must build a coherent model of the world in our minds using a current terminology that we continuously change as our scientific knowledge improves (see also Gödel). What we really aim for is a satisfactory equilibrium between practical, usable knowledge that we can use to predict and control our environment and some kind of faith that fills the gaps, be it a religion or "believing in science". Until the end of the cold war, each side needed to find and train the best minds with reliable knowledge to build usable weapons. Since then, the feeling of a "global existential struggle between equal forces" has disappeared; the result is explained as the behavioral sink (John B. Calhoun), we are amusing ourselves to death (Neil Postman). The ever-increasing bandwidth, the ever-present connection, and the diminishing distance between real and fake information (text, image, video) put each one of us in a situation that our brain is not prepared for: too much information, too little meaning and trust, no sense of power to change anything.

The 90/9/1 rule

Trust and abuse

Augmented Intelligence

Original goals

Neural nets adapt to current situation

Who is learning?