<h1>CS4619: Artificial Intelligence II</h1>
<h1>Conclusions
</h1>
<h2>
    Derek Bridge<br>
    School of Computer Science and Information Technology<br>
    University College Cork
</h2>

<h1>Initialization</h1>
$\newcommand{\Set}[1]{\{#1\}}$ 
$\newcommand{\Tuple}[1]{\langle#1\rangle}$ 
$\newcommand{\v}[1]{\pmb{#1}}$ 
$\newcommand{\cv}[1]{\begin{bmatrix}#1\end{bmatrix}}$ 
$\newcommand{\rv}[1]{[#1]}$ 
$\DeclareMathOperator{\argmax}{arg\,max}$ 
$\DeclareMathOperator{\argmin}{arg\,min}$ 
$\DeclareMathOperator{\dist}{dist}$
$\DeclareMathOperator{\abs}{abs}$

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

<h1>Introduction</h1>
<ul>
    <li>In two 5-credit modules, we've only scratched the surface of AI.</li>
    <li>But today we'll step back.</li>
</ul>

<h1>Applied AI and Artificial General Intelligence</h1>
<ul>
    <li>Let's start with a quote from Elon Musk (3rd July 2021):
        <figure>
            <img src="images/musk_quote.jpeg" />
        </figure>
        and, in the same vein, <a href="https://www.theguardian.com/commentisfree/2023/dec/06/driverless-cars-future-vehicles-public-transport">a newspaper article that is pessimistic about the deployment of self-driving cars</a> (although the article would be more balanced if it took into account the fact that Waymo's autonomous vehicles completed 700,000 trips in 2023).
    </li>
</ul>

<ul>
    <li>Rodney Brooks (robotics pioneer) gives his <i>Three Laws of AI</i>:
        <ol>
            <li>When an AI system performs a task, human observers immediately estimate its general competence in areas that seem related. Usually that estimate is wildly overinflated.</li>
            <li>Most successful AI deployments have a human somewhere in the loop (perhaps the person they are helping) and their intelligence smooths the edges.</li>
            <li>Without carefully boxing in how an AI system is deployed there is always a long tail of special cases that take decades to discover and fix. Paradoxically all those fixes are AI-complete themselves.</li>
        </ol>
    </li>
</ul>

<h1>Skill-Based, Applied AI</h1>
<ul>
    <li>Everything that passes for "AI" at the moment involves building special-purpose systems capable of
        handling narrow, well-described tasks.
        <ul>
            <li>At the end of the second Reinforcement Learning lecture, we mentioned some work that looks at learning across multiple tasks. But this is in its infancy, and still quite domain-specific (e.g. multiple games).</li>
            <li>There are no AI systems that exhibit anywhere near the range of skills that you exhibit.</li>
            <li>We're still nowhere near having an AI robot that helps us in the kitchen.</li>
        </ul>
    </li>
    <li>We measure success using performance measures that quantify the skill of the system at the given task.
        <ul>
            <li>By these measures, we sometimes achieve human-level or supra-human-level performance.
            </li>
        </ul>
    </li>
    <li>Appropriately deployed, these systems are useful tools.</li>
    <li>But these systems are brittle (examples below), which is not what we expect of "intelligence".</li>
    <li>The main way we are building these systems is supervised learning, so let's focus in on that.</li>
</ul>

<h1>Problems with Supervised Learning</h1>
<ul>
    <li>Objectives:
        <ul>
            <li>Learning seeks to minimise a loss function, usually prediction error.</li>
            <li>But this function is only a proxy for what we care about in the real world.</li>
            <li>E.g. in a recommender system we might minimise star rating prediction error but what we
                care about is satisfaction (relevance, surprise, diversity, &hellip;).
            </li>
        </ul>
    </li>
    <li>Confidence:
        <ul>
            <li>We assume training, validation and test sets are representative of the population.</li>
            <li>But not many practical systems can also output how confident they are in a prediction.
                <ul>
                    <li>(The probabilities produced by an output neuron are often not a good measure of 
                        confidence &mdash; see below.)
                    </li>
                </ul>
            </li>
            <li>And, almost no systems can recognize when an unseen example falls outside the distribution on
                which the system was trained.
            </li>
        </ul>
    </li>
    <li>Robustness:
        <ul>
            <li>How vulnerable is the system to noise, or to deliberate attack?</li>
            <li>Neural networks were supposed to be more robust than more brittle technologies such as
                Decision Trees.
            </li>
            <li>But they turn out to be vulnerable to <i>adversarial examples</i>. An adversarial example is
                formed by applying small but intentional perturbations to an original example, such that
                the model outputs an incorrect answer with high probability, even though the perturbation
                typically does not change the way humans would label the example.
                <figure style="display: flex" >
                    <img src="images/adv1.png" /> <img src="images/adv2.png" />
                </figure>
            </li>
            <li>Relatedly, Hosseini and Poovendran: <a href="https://labs.ece.uw.edu/nsl/papers/hossein_2017.pdf">Deep Neural Networks Do Not Recognize Negative Images</a></li>
        </ul>
    </li>
    <li>Bias:
        <ul>
            <li>There may be intentional or unintentional bias in the choice of:
                <ul>
                    <li>the loss function;</li>
                    <li>the features and how they are preprocessed; and</li>
                    <li>the training examples.</li>
                </ul>
            </li>  
            <li>Some of the bias may be systemic (i.e. societal such as sexism, etc.); some of it may be 
                selection bias (due to deficiencies in
                the data collection process).
            </li>
            <li>Examples:
                <ul>
                    <li><a href="https://www.jefftk.com/p/detecting-tanks">https://www.jefftk.com/p/detecting-tanks</a> (this one is an urban myth)</li>
                    <li><a href="https://www.nytimes.com/2018/02/09/technology/facial-recognition-race-artificial-intelligence.html">https://www.nytimes.com/2018/02/09/technology/facial-recognition-race-artificial-intelligence.html</a></li>
                    <li><a href="https://www.theguardian.com/technology/2015/jul/01/google-sorry-racist-auto-tag-photo-app">https://www.theguardian.com/technology/2015/jul/01/google-sorry-racist-auto-tag-photo-app</a></li>
                    <li><a href="https://www.theguardian.com/us-news/2018/jan/17/software-no-more-accurate-than-untrained-humans-at-judging-reoffending-risk">https://www.theguardian.com/us-news/2018/jan/17/software-no-more-accurate-than-untrained-humans-at-judging-reoffending-risk</a></li>
                    <li>From MIT Technology Review: <a href="https://www.technologyreview.com/2021/07/30/1030329/machine-learning-ai-failed-covid-hospital-diagnosis-pandemic/">Hundreds of AI tools have been built to catch covid. None of them helped.</a></li>
                </ul>
            </li>
        </ul>
    </li>
    <li>Trust (is this system safe?):
        <ul>
            <li>We've implied that once you find a system whose test error is low, you deploy it.</li>
            <li>In reality, decisions to deploy are not, or should not be, so straightforward.
                <ul>
                    <li>Think about the apprenticeship served by medical students and junior doctors.</li>
                    <li>Think about all the ways that Google and others are testing their self-driving vehicles
                        <ul>
                            <li>E.g. <a href="https://cacm.acm.org/magazines/2018/2/224621-a-comprehensive-self-driving-car-test/fulltext?mobile=false">https://cacm.acm.org/magazines/2018/2/224621-a-comprehensive-self-driving-car-test/fulltext</a></li>
                        </ul>
                    </li>
                </ul>
            </li>
        </ul>
    </li>
    <li>Distribution shifts: 
        <ul>
            <li>For error (or accuracy) estimation to make sense, we assumed that our training and test sets were drawn from the same distribution.</li>
            <li>Suppose our model performs well enough on the test set that we decide to deploy it. We will be asked to make predictions for unseen examples. What is there to say that these will also be drawn from the same distribution?</li>
            <li>There are several types of distribution shift that may mean that our model performs worse in practice than it did during its development:
                <ul>
                    <li>Covariate shift: Covariate is a fancy word for feature. So covariate shift is a shift in the distribution of feature values. For example, maybe we developed a classifier using well-centered front-on close-ups of faces but we try to use our classifier on images taken during daily use.</li>
                    <li>Label shift: This is a shift in the distribution of the targets. For example, maybe a certain disease becomes more prevalent.</li>
                    <li>Concept shift: In some domains, the classes themselves undergo redefinition. For example, a classifier that says whether an item of clothing is fashionable or not; or a classifier that says whether a joke is funny or not; or a classifier that says whether a tweet is offensive or not.</li>
                </ul>
            </li>
        </ul>
    </li>
    <li>To what extent do interpretable models and explanations solve the problems listed above?</li>
</ul>
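
<p>
    The points above about confidence and covariate shift can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the synthetic data (the label is 1 exactly when <code>x2 &gt; x1**2</code>), the hand-written logistic regression, and names such as <code>sample</code> and <code>mean_confidence</code>. The relationship between features and label never changes; only the range of <code>x1</code> values differs between development and deployment.
</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, x1_low, x1_high):
    """Draw n examples. p(y | x) never changes: y = 1 exactly when x2 > x1**2."""
    x1 = rng.uniform(x1_low, x1_high, size=n)
    x2 = rng.uniform(0.0, 2.5, size=n)
    X = np.column_stack([x1, x2])
    y = (x2 > x1 ** 2).astype(float)
    return X, y

def add_bias(X):
    """Prepend a column of ones so the first weight acts as the intercept."""
    return np.column_stack([np.ones(len(X)), X])

def fit_logistic_regression(X, y, lr=0.5, epochs=10000):
    """Plain batch gradient descent on the mean cross-entropy loss."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    """Predicted probability of class 1."""
    return 1.0 / (1.0 + np.exp(-add_bias(X) @ w))

def accuracy(w, X, y):
    return np.mean((predict_proba(w, X) > 0.5) == (y == 1))

def mean_confidence(w, X):
    """Average probability that the model assigns to its own predicted class."""
    p = predict_proba(w, X)
    return np.mean(np.maximum(p, 1 - p))

# Development data: x1 lies in [0.5, 1.5], where the curved boundary is nearly straight.
X_train, y_train = sample(2000, 0.5, 1.5)
X_test, y_test = sample(2000, 0.5, 1.5)

# Deployment data: same labelling rule, but x1 now lies in [-1.5, -0.5] (covariate shift).
X_shift, y_shift = sample(2000, -1.5, -0.5)

w = fit_logistic_regression(X_train, y_train)
acc_test = accuracy(w, X_test, y_test)
acc_shift = accuracy(w, X_shift, y_shift)
conf_shift = mean_confidence(w, X_shift)

print(f"accuracy on test set (same distribution): {acc_test:.3f}")
print(f"accuracy after covariate shift:           {acc_shift:.3f}")
print(f"mean confidence after covariate shift:    {conf_shift:.3f}")
```

<p>
    In this toy setting, the linear model looks good on the test set because the test set shares the training distribution. After the shift, its accuracy falls a long way, yet the probabilities it outputs stay close to 1, so they give no warning that the inputs are unlike anything seen in training.
</p>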

<h1>Are we even doing AI?</h1>
<table>
    <tr>
        <td rowspan="4" style="border: 1px solid black;"><img src="images/chollet.jpg" /></td>
        <td style="border: 1px solid black;">"...the problem of cognition has almost no overlap with supervised learning, 
             what we're currently good at -- learning to map space x to space y given a dense 
             sampling of x-cross-y"
             <a href="https://twitter.com/fchollet/status/947119817286438913?s=03">https://twitter.com/fchollet/status/947119817286438913?s=03</a>
        </td>
    </tr>
    <tr>
        <td style="border: 1px solid black;">"... parametric models trained with gradient descent make it easy to automate something, but have little ability to deviate from the patterns they've learned. Meanwhile, the real world is full of surprises, and handling it requires the ability to adapt."
             <a href="https://twitter.com/fchollet/status/1373116543626735624">https://twitter.com/fchollet/status/1373116543626735624</a>
        </td>
    </tr>
    <tr>
        <td style="border: 1px solid black;">"Arguably, the main challenges of cognition are not even represented in classic supervised learning 
             from big data -- exploration, goal-setting, open-endedness, abstraction, extreme generalization, 
             learning from few data points..."
             <a href="https://twitter.com/fchollet/status/947439738008547328?s=03">https://twitter.com/fchollet/status/947439738008547328?s=03</a>
        </td>
    </tr>
    <tr>
        <td style="border: 1px solid black;">"For all the progress made, it seems like almost all important questions in AI remain unanswered. 
             Many have not even been properly asked yet."
             <a href="https://twitter.com/fchollet/status/837188765500071937">https://twitter.com/fchollet/status/837188765500071937</a>
         </td>
     </tr>
</table>
<ul>
    <li>In other words, we have become good at taking large quantities of labeled data and finding patterns in the data. We use the models that we learn in systems that exhibit skill at particular tasks.</li>
    <li>But this has little to do with intelligence.</li>
    <li>Intelligence is the ability to efficiently adapt to new tasks, not skill at specialised tasks. Why? Because the real world is always different from what you've seen so far. Intelligence evolved to cope with this.</li>
    <li>And note the point about efficiency: humans generalise quickly from very few (probably unlabeled) examples.</li>
</ul>

<h1>Artificial General Intelligence (AGI)</h1>
<ul>
    <li>AGI has many definitions &mdash; some people aren't even convinced it's a coherent concept. (AI academic Melanie Mitchell has said that the big tech companies will redefine AGI into existence! In other words, they'll keep watering down the definition until their own systems qualify.)</li>
    <li>One definition (although I doubt he'd want to use the phrase AGI) is from Chollet:
        <quote>
            The intelligence of a system is a measure of its skill-acquisition efficiency over a 
            scope of tasks, with respect to priors, experience, and generalization difficulty.
        </quote>
    </li>
    <li>In <a href="https://arxiv.org/pdf/1911.01547.pdf">https://arxiv.org/pdf/1911.01547.pdf</a>,
        Chollet has proposed ARC (the Abstraction and Reasoning Corpus). This is a task + dataset (now hosted as a <a href="https://www.kaggle.com/c/abstraction-and-reasoning-challenge/overview">Kaggle competition</a>):
        <ul>
            <li>work out what relates some pairs of images; use this to complete a final pair;</li>
            <li>but the relationships require reasoning, abstraction and analogy.</li>
        </ul>
    </li>
    <li>For most of 2024, the best-scoring programs were able to solve about 55% of unseen ARC tasks. If you see people report higher results, then (a) they may be filtering the validation set to contain only a subset of the tasks, or (b) there may be leakage.</li>
    <li>But, at the end of 2024, OpenAI's o3 model scored 85%. Remember, o3 is an LLM Agent. This remarkable level of performance comes with two caveats: (a) this was its performance on something called the public test set and semi-private test set, whereas what we're really interested in is performance on the private test set, and (b) the competition specifies cost constraints, and these were substantially exceeded.</li>
    <li>See <a href="https://aiguide.substack.com/p/did-openai-just-solve-abstract-reasoning">this blog post</a> for a discussion of these results.</li>
    <li>According to Harvard psychologists Spelke &amp; Kinzler, human cognition is founded, in part, on four systems for representing objects, actions, number, and space:
        <ul>
            <li>Objectness: knowledge that the world can be parsed into objects that have certain physical properties, such as traveling in continuous trajectories, being preserved through time, and interacting upon contact;</li>
            <li>Numerosity: knowledge of small quantities and notions of “smallest,” “largest,” “greater than,” “less than,” etc.;</li>
            <li>Basic geometry and topology: knowledge of lines, simple shapes, symmetries, containment, etc.;</li>
            <li>Agents and goal-directed behavior: knowledge that some entities are agents who have their own intentions and act to achieve goals.</li>
        </ul>
    </li>
    <li>You can argue that performing well on the tasks in Chollet's ARC requires knowledge of at least the first three of these, maybe all four, and using that knowledge to generalise. (This is argued in Moskvichev et al.: <a href="https://doi.org/10.48550/arXiv.2305.07141">The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain</a>.)</li>
    <li>This leads us into a discussion of knowledge&hellip;</li>
</ul>
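
<p>
    To make the shape of an ARC task concrete, here is a minimal sketch. ARC tasks are stored as JSON with <code>train</code> and <code>test</code> lists of input/output grids, where each grid is a list of rows of small integers (colours). The toy task below is invented for illustration and is not taken from the dataset, and the tiny menu of candidate transformations is a hypothetical stand-in: real ARC tasks are designed so that no fixed menu like this is enough.
</p>

```python
import numpy as np

# A toy task in the ARC style (invented for illustration, not from the dataset).
# The hidden rule is "mirror the grid left-to-right".
task = {
    "train": [
        {"input": [[1, 0, 0], [0, 2, 0]], "output": [[0, 0, 1], [0, 2, 0]]},
        {"input": [[3, 3, 0], [0, 0, 4]], "output": [[0, 3, 3], [4, 0, 0]]},
    ],
    "test": [{"input": [[5, 0, 0], [0, 0, 6]]}],
}

# Hypothetical menu of whole-grid transformations to try.
candidates = {
    "identity": lambda g: g,
    "flip_left_right": np.fliplr,
    "flip_up_down": np.flipud,
    "transpose": lambda g: g.T,
}

def consistent(fn, pairs):
    """True if fn maps every demonstration input to its demonstration output."""
    return all(
        np.array_equal(fn(np.array(p["input"])), np.array(p["output"]))
        for p in pairs
    )

# Keep only the transformations that explain all of the demonstration pairs.
matches = [name for name, fn in candidates.items() if consistent(fn, task["train"])]

# Apply the first surviving transformation to the test input.
test_grid = np.array(task["test"][0]["input"])
prediction = candidates[matches[0]](test_grid).tolist()

print("consistent transformations:", matches)
print("predicted test output:", prediction)
```

<p>
    Brute-force search over a fixed menu works here only because the toy rule happens to be on the menu. The difficulty of ARC is that each unseen task needs a rule that has to be abstracted from two or three demonstrations.
</p>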

<h1>Common sense &mdash; the least common of all the senses</h1>
<ul>
    <li>As well as reasoning, abstraction and analogy, current AI systems lack common sense.</li>
    <li>What is common sense knowledge?
        <ul>
            <li>E.g. Marvin Minsky asks: Can you push a car with a feather?</li>
            <li>E.g. look at these photos and answer some questions: when, where, who, why, what happened
                just before, what will happen after, &hellip;
                <figure style="display: flex; align-items: flex-start">
                    <img src="images/grad.jpeg" /> <img src="images/coffee.jpeg" /> 
                </figure>
            </li>
            <li>E.g. Winograd sentences (which were the subject of a <a href="http://commonsensereasoning.org/winograd.html">competition</a>):
                <ul>
                    <li>"The town councilors refused to give the demonstrators a permit because they feared violence."<br>
                        Who feared violence? The town councilors or the demonstrators?
                    </li>
                    <li>"The town councilors refused to give the demonstrators a permit because they advocated violence."<br>
                        Who advocated violence?
                    </li>
                </ul>
            </li>
            <li>E.g. study these two stories (from Searle, John. R. (1980) Minds, brains, and programs. 
                Behavioral and Brain Sciences 3 (3): 417-457) and tell me whether the man ate the hamburger:
                <ul>
                    <li>"A man went into a restaurant and ordered a hamburger. When the hamburger arrived it 
                        was burned to a crisp, and the man stormed out of the restaurant angrily, without paying 
                        for the hamburger or leaving a tip."
                    </li>
                    <li>"A man went into a restaurant and ordered a hamburger; when the hamburger came he was 
                        very pleased with it; and as he left the restaurant he gave the waitress a large tip
                        before paying his bill."
                    </li>
                </ul>
            </li>
            <li>Intuitive physics; intuitive biology; intuitive psychology; etc:
                <ul>
                    <li>Our everyday theories of physics (objects, forces, &hellip;), biology (e.g. taxonomies),
                        psychology (beliefs, desires, intentions, sensations, emotions, &hellip;), society 
                        (possessions, crime, marriage, &hellip;) &hellip;
                    </li>
                </ul>
            </li>
        </ul>
    </li>
    <li>We use this knowledge to constrain what we learn and what we predict.</li>
    <li>Can computers acquire this knowledge? How? What are the consequences of being deficient in certain
    parts of this knowledge?
    </li>
</ul>

<p>
    Finally, let's revisit some material that was in lecture 1 of CS4618 AI 1.
</p>

<h1>Is AI even possible?</h1>
<table>
    <tr>
        <td style="border: 1px solid blue; vertical-align: top; text-align: left;">
            <b>No</b>: there's a special and essential ingredient that can't be replicated, e.g. soul, spirit,
            consciousness, free will, creativity, humour, &hellip;
            <p>
            Perhaps we can <b>simulate</b> intelligence:
            </p>
            <ul>
                <li>Outwardly, systems may <em>behave as if</em> intelligent.</li>
                <li>But, because they lack the special ingredient, the way they achieve this behaviour (the internal process) doesn't qualify as true
                    thinking.
                </li>
            </ul>
        </td>
        <td style="border: 1px solid blue; vertical-align: top;">
            <b>Yes</b>, we can build <b>true human-like</b> intelligence.
        </td>
        <td style="border: 1px solid blue; vertical-align: top;">
            <b>Yes</b>, we can build true intelligences but they won't necessarily be like us.<br />
            AI = <b>alien intelligence</b>.
        </td>
    </tr>
</table>
<ul>
    <li>Where do you sit in this table? Or, do you have a different view?</li>
</ul>

<h1>What are the risks?</h1>
<table style="width: 100%;">
            <tr>
                <td style="border-right-width: 0"><img style="width: 100px" src="images/hawking.jpg" /></td>
                <td style="border-left-width: 0; text-align: left;">
                            "The development of full artificial intelligence could spell the end of the human
                            race&hellip;It would take off on its own, and re-design itself at an ever 
                            increasing rate." (Stephen Hawking)
                </td>
            </tr>
            <tr>
                <td style="border-right-width: 0"><img style="width: 100px" src="images/musk.jpg" /></td>
                <td style="border-left-width: 0; text-align: left;">
                            "&hellip;the most serious threat to the survival of the human race&hellip;" (Elon Musk)
                </td>
            </tr>
            <tr>
                <td style="border-right-width: 0"><img style="width: 100px" src="images/altman.jpg" /></td>
                <td style="border-left-width: 0; text-align: left;">
                            “My worst fear is that we—the field, the technology, the industry—cause significant harm to the world.” (Sam Altman)
                </td>
            </tr>
            <!--
            <li>In March 2023, the so-called Future of Life Institute called for a pause of at least 6 months on research into AI systems of at least the complexity of GPT-4. Their <a href="https://futureoflife.org/open-letter/pause-giant-ai-experiments/">letter</a> was signed by many tech leaders and some AI scientists.</li>
            <li>In May 2023, the so-called Center for AI Safety released a statement: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.'' (Question: why no mention of climate change?) The statement was signed by many tech leaders and AI scientists, including Sam Altman, the CEO of OpenAI, the organisation responsible for ChatGPT.</li>
            -->
            <tr>
                <td style="border-right-width: 0"><img style="width: 100px" src="images/hinton.jpg" /></td>
                <td style="border-left-width: 0; text-align: left;">
                            In 2023, Geoff Hinton, great-great-grandson of George Boole, sometimes referred to as one of the "godfathers of AI" for his pioneering work in deep learning, resigned from Google so that he could "freely speak out about the risks of A.I.". He says he partly regrets his life's work, and he has spoken about AI wiping out humanity.
                </td>
            </tr>
        </table>

<h1>So what are the real risks?</h1>
<ul>
    <li>We can analyse the dangers in terms of:
        <ul>
            <li>malevolent goals, and</li>
            <li>destructive methods for achieving benevolent or malevolent goals (e.g. methods that have
                unacceptable externalities)
            </li>
        </ul>
    </li>
    <li>In the near to medium term, we should worry much less about super-intelligences that develop their own
        malevolent goals (e.g. to kill, enslave or displace us)
    </li>
    <li>Rather, we should worry about governments, corporations and individuals intentionally or 
        unintentionally building AI systems that try to achieve their goals using destructive methods<br />
        E.g.
        <ul>
            <li>so-called 'collateral damage' from autonomous weapons</li>
            <li>displacement of employment</li>
            <li>reduction in the economic, military or social value of some classes of human beings</li>
            <li>invasions of privacy</li>
            <li>'filter bubbles' or 'echo chambers'</li>
            <li>adoption or perpetuation of bias and prejudice</li>
            <li>data-intensive AI restricted to a handful of hardware-rich and data-rich corporations</li>
        </ul>
        (See lecture 1 in CS4618 for a fuller list)
    </li>
    <li>Stuart Russell, a major figure in the field, suggests we need to approach AI with a different mindset for dealing with both the shorter-term and longer-term risks. Not this:
        <ul>
            <li>Machines are intelligent to the extent that their actions can be expected to meet their objectives.</li>
        </ul>
       But this:
        <ul>
            <li>Machines are beneficial to the extent that their actions can be expected to meet our objectives.</li>
        </ul>
    </li>   
</ul>

<h1>Where Next?</h1>
<ul>
    <li>Students often ask what to look at next, after these modules. Obviously, the answer depends on what areas interest you. I can perhaps give some recommendations if you ask me. But here I just list some books/web sites that I have found to be useful. The first two have influenced the CS4618 &amp; CS4619 modules a fair bit. But the others are good too.</li>
    <li>François Chollet: <i>Deep Learning with Python (2nd edn)</i>, Manning Publications, 2021. Publisher's web site: <a href="https://www.manning.com/books/deep-learning-with-python-second-edition">https://www.manning.com/books/deep-learning-with-python-second-edition</a>; code: <a href="https://github.com/fchollet/deep-learning-with-python-notebooks">https://github.com/fchollet/deep-learning-with-python-notebooks</a></li>
    <li>Aurélien Géron: <i>Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (3rd edn)</i>, O'Reilly, 2022. Publisher's web site: <a href="https://www.oreilly.com/library/view/hands-on-machine-learning/9781098125967/">https://www.oreilly.com/library/view/hands-on-machine-learning/9781098125967/</a>; code: <a href="https://github.com/ageron/handson-ml3">https://github.com/ageron/handson-ml3</a>.</li>
    <li>Sebastian Raschka, Yuxi (Hayden) Liu, Vahid Mirjalili: <i>Machine Learning with PyTorch and Scikit-Learn</i>, Packt Publishing, 2022. Publisher's web site: <a href="https://www.packtpub.com/product/machine-learning-with-pytorch-and-scikit-learn/9781801819312">https://www.packtpub.com/product/machine-learning-with-pytorch-and-scikit-learn/9781801819312</a>; code: <a href="https://github.com/rasbt/machine-learning-book">https://github.com/rasbt/machine-learning-book</a></li>
    <li>Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola: <i>Dive into Deep Learning</i>. Interactive book: <a href="http://d2l.ai/">http://d2l.ai/</a></li>
    <li>Jeff Heaton: <i>Applications of Deep Neural Networks with Keras</i>. PDF available: <a href="https://www.heatonresearch.com/book/applications-deep-neural-networks-keras.html">https://www.heatonresearch.com/book/applications-deep-neural-networks-keras.html</a>; code: <a href="https://github.com/jeffheaton/t81_558_deep_learning">https://github.com/jeffheaton/t81_558_deep_learning</a></li>
    <li>Hironobu Suzuki: <i>The Engineer's Guide to Deep Learning</i>. Online at: <a href="https://www.interdb.jp/dl/index.html">https://www.interdb.jp/dl/index.html</a>
    </li>
    <li>Simon J.D. Prince: <i>Understanding Deep Learning</i>. Online at <a href="https://udlbook.github.io/udlbook/">https://udlbook.github.io/udlbook/</a></li>
</ul>