We are reading through some of the great papers in the history of artificial intelligence. The reading list is intentionally broad, pulling from computer science, mathematics, philosophy, linguistics, and cognitive science, and we have a preference for older papers (i.e. written before the year 2000). The only other selection criterion is that they have to be shortish papers (< 15 pages), so sorry, no books! The goal is a broad understanding of the research that forms the foundations for present-day AI, and a deeper understanding of the context in which each paper was written.
- Paper: An Inductive Inference Machine
Author(s): Ray Solomonoff
Year: 1956
Date: July 14th, 2020
Presenter: Joe Hakim
Link: http://raysolomonoff.com/publications/indinf56.pdf
Slides: https://docs.google.com/presentation/d/14whQ2ZuEjGtz2hRvwvHKhj6eGbzKWRABUjeFxn7RhKM/edit?usp=sharing
tl;dr: A description of an algorithm that performs prediction on grids of numbers and operations. It works by using prior examples, and specific transformations thereof, to produce predictions favoring 'utility' and 'consistency'. The paper closes with some brief philosophical discussion of the algorithm's theoretical properties and of how to make it more "AI-like".
Important because: Early (earliest?), truly probabilistic treatment of machine learning.
- Paper: Generalization of Pattern Recognition in a Self-Organizing System
Authors: WA Clark, BG Farley
Year: 1955
Date: August 11th, 2020
Presenter: Eric Chen
Link: https://pdfs.semanticscholar.org/616b/9f5b957de2249ed1ae433b9be1bf1d45cdef.pdf
Slides: https://www.dropbox.com/s/6u8qzp0zmodjr71/08112020_journalclub.pptx
tl;dr: Clark and Farley present two experiments on the application of neural nets to the generalization of pattern recognition. The first experiment demonstrates that the net can be successfully trained (“organized”) on input patterns subjected to random variation, while the second experiment demonstrates that a trained net can successfully classify new input sequences into three classes based on observed behavior. The authors use many techniques similar to those of modern machine learning.
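As an aside, the noise-injection idea (training on input patterns subjected to random variation) can be sketched in a few lines of modern code. The function below is our own hypothetical illustration, not code from the paper: it generates randomly perturbed copies of a binary input pattern, the way one might augment training data today.

```python
import random

def augment_with_noise(pattern, flip_prob=0.1, copies=5, seed=0):
    """Generate noisy copies of a binary input pattern by independently
    flipping each bit with probability flip_prob (noise injection)."""
    rng = random.Random(seed)
    noisy = []
    for _ in range(copies):
        noisy.append([bit ^ (rng.random() < flip_prob) for bit in pattern])
    return noisy

# A toy 8-bit "pattern" and its randomly perturbed training copies.
base = [1, 0, 1, 1, 0, 0, 1, 0]
for variant in augment_with_noise(base):
    print(variant)
```

Training on such perturbed copies, rather than the clean pattern alone, is what lets the net generalize to inputs it has never seen exactly.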
Important because: Early application of neural nets to the generalization of pattern recognition/classification; precursor to modern machine learning techniques (noise injection, data augmentation, model ensembles).
- Paper: Minds, brains, and programs
Author(s): John Searle
Year: 1980
Date: September 24th, 2020
Presenter: Matthew Lee
Link: http://cogprints.org/7150/1/10.1.1.83.5248.pdf
Slides: https://drive.google.com/file/d/1h8SESVBIBc7B8I-2fXw62hcyh1rUShQC/view?usp=sharing
tl;dr: John Searle presents his famous thought experiment, the Chinese Room Argument, which argues against "Strong AI", i.e. AI that truly understands. The argument is simple: a man sits in a room with formal rules for correlating Chinese symbols with other Chinese symbols. When he receives an "input" script of Chinese symbols, he is able to map the symbols to the correct "output" answers in Chinese symbols; the man, however, still does not understand Chinese. This philosophical paper explores whether a computer is able to understand or is simply manipulating formal symbols.
Important because: A famous thought experiment, similar to the Turing test, that explores a machine's capacity to achieve human-like intelligence.
- Paper: Prediction and Entropy of Printed English
Author(s): Claude Shannon
Year: 1950
Date: October 22nd, 2020
Presenter: Ben Kompa
Link: http://languagelog.ldc.upenn.edu/myl/Shannon1950.pdf
Slides: https://docs.google.com/presentation/d/1sidRSdmpqW1uGkmEUh5NSy_1UBaXIWYHtwlONJeDulo/edit?usp=sharing
tl;dr: Shannon provides an early analysis of the entropy of the English language that holds up even today. He begins by defining the entropy of an N-gram, which is simply N letters of English. Then, he considers two careful experiments. The first experiment consists of a participant attempting to guess the next letter of a text, with only one chance at guessing the letter before moving on. The second experiment allows the participant to guess as many times as necessary until the correct letter is chosen. After deriving theoretical bounds on the N-gram entropy of English, Shannon uses the results of the second experiment to provide upper and lower bounds on N-gram entropy that are relevant even today.
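The N-gram entropies behind these bounds can be estimated directly from letter counts: the block entropy H_N of N-letter substrings, and the conditional entropy F_N = H_N − H_{N−1} of the next letter given the preceding N−1. The sketch below is our own illustrative reconstruction (the sample text and function names are assumptions, not Shannon's):

```python
from collections import Counter
from math import log2

def ngram_entropy(text, n):
    """Block entropy H_n: entropy (in bits) of the empirical distribution
    of n-letter substrings of `text`."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return -sum((c / total) * log2(c / total) for c in grams.values())

def conditional_entropy(text, n):
    """Shannon's F_n: entropy of the n-th letter given the n-1 preceding
    letters, estimated as H_n - H_{n-1}."""
    if n == 1:
        return ngram_entropy(text, 1)
    return ngram_entropy(text, n) - ngram_entropy(text, n - 1)

# Arbitrary sample text for illustration; Shannon used large text corpora.
sample = "the quick brown fox jumps over the lazy dog " * 50
for n in (1, 2, 3):
    print(n, round(conditional_entropy(sample, n), 3))
```

As expected, F_N shrinks as N grows: the longer the context, the more predictable the next letter, which is exactly the redundancy Shannon was measuring.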
Important because: Early exploration of the entropy of language, relevant to language models today.
- Paper: Maximum Likelihood from Incomplete Data Via the EM Algorithm
Author: A. P. Dempster, N. M. Laird and D. B. Rubin
Year: 1977
Date: June 4, 2021
Presenter: Rudraksh Tuwani
Link: http://www.markirwin.net/stat221/Refs/dlr1977.pdf
Slides: https://www.overleaf.com/read/fzysyqdprtmd
tl;dr: The paper presents EM as a general optimization framework for finding the maximum likelihood estimates in case of missing or incomplete data. The authors give a detailed derivation of the algorithm and sketch out potential applications for missing data, grouped/censored/truncated data, mixture models etc.
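As a concrete illustration of the E-step/M-step alternation, here is a minimal sketch for a two-component 1-D Gaussian mixture (the component labels play the role of the "missing data"). This is our own toy example, not the paper's general exponential-family treatment:

```python
import math
import random

def em_gmm_1d(data, iters=50):
    """Minimal EM for a two-component 1-D Gaussian mixture: alternate an
    E-step (posterior responsibilities for the latent component labels)
    with an M-step (weighted maximum-likelihood parameter updates)."""
    # Crude initialization from the data range.
    mu = [min(data), max(data)]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]

    def normal_pdf(x, m, v):
        return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            w = [pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(var[k], 1e-6)  # guard against variance collapse
    return pi, mu, var

# Two well-separated clusters; EM should recover means near 0 and 5.
random.seed(0)
data = ([random.gauss(0, 1) for _ in range(200)]
        + [random.gauss(5, 1) for _ in range(200)])
pi, mu, var = em_gmm_1d(data)
print(pi, mu, var)
```

Each iteration provably does not decrease the observed-data likelihood, which is the monotonicity property the paper establishes in general.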
Important because: It is the first paper to present the EM algorithm as a general optimization procedure for a suite of problems in Statistics.
- Paper: Statistical modeling: The two cultures
Author: Leo Breiman
Year: 2001
Presenter: Rudraksh Tuwani
Link: https://projecteuclid.org/download/pdf_1/euclid.ss/1009213726
Slides: https://docs.google.com/presentation/d/1I4dabXf_LKnQAZfSfbz5x3nAaiPRh4548Jd2adsfb4Q/edit?usp=sharing
tl;dr: Leo Breiman contrasts the approaches and methods of the two cultures in Statistics. The data modeling culture involves constructing a generative model for the data and subsequent analysis of the constructed model. The algorithmic modeling culture instead seeks to build a black box that can accurately predict the response from the covariates. Breiman primarily advocates for the algorithmic culture, arguing that it is impossible to construct accurate generative models in most real-world scenarios; consequently, any analysis or conclusions drawn from the generative models are likely to be wrong. However, in some situations, it may be possible to construct a reliable generative model. Ultimately, the choice of method should be dictated by the problem at hand and not by which culture the data scientist identifies with most.
Important because: As statistics and machine learning get more intertwined, it is essential to appreciate the perspectives arising from both cultures and not get involved in culture wars.
Papers we hope to cover eventually, listed in a semi-random order:
- Paper: Computing machinery and intelligence
Author: Alan Turing
Year: 1950
Link: http://www.cse.chalmers.se/~aikmitr/papers/Turing.pdf#page=442
- Paper: Learning Representations by Back-propagating Errors
Author: David E. Rumelhart, Geoffrey E. Hinton & Ronald J. Williams
Year: 1986
Link: http://www.cs.toronto.edu/~hinton/absps/naturebp.pdf
- Paper: The perceptron: a probabilistic model for information storage and organization in the brain
Author: Frank Rosenblatt
Year: 1958
Link: https://www.cs.cmu.edu/~epxing/Class/10715-14f/reading/Rosenblatt.perceptron.pdf
- Paper: A learning algorithm for Boltzmann machines
Author: David H Ackley, Geoffrey E Hinton, Terrence J Sejnowski
Year: 1985
Link: https://onlinelibrary.wiley.com/doi/pdf/10.1207/s15516709cog0901_7
- Paper: Support-Vector Networks
Author: Corinna Cortes, Vladimir Vapnik
Year: 1995
Link: https://link.springer.com/content/pdf/10.1007%252FBF00994018.pdf
- Paper: The use of multiple measurements in taxonomic problems
Author: RA Fisher
Year: 1936
Link: https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1469-1809.1936.tb02137.x
- Paper: An essay towards solving a problem in the doctrine of chances
Author: Thomas Bayes
Year: 1763
Link: https://www.ias.ac.in/article/fulltext/reso/008/04/0080-0088
- Paper: Prediction and entropy of printed English
Author: Claude Shannon
Year: 1951
Link: http://languagelog.ldc.upenn.edu/myl/Shannon1950.pdf
- Paper: A theory of the learnable
Author: Leslie Valiant
Year: 1984
Link: http://axon.cs.byu.edu/~dan/678/papers/Learning%20Theory/Valiant.pdf
- Paper: Intelligence without representation
Author: Rodney Brooks
Year: 1991
Link: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.89.5683&rep=rep1&type=pdf
- Paper: Elephants don't play chess
Author: Rodney Brooks
Year: 1990
Link: https://www2.cs.sfu.ca/~vaughan/teaching/894/papers/elephants.pdf
- Paper: Some studies in machine learning using the game of checkers
Author: Arthur L. Samuel
Year: 1959
Link: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.368.2254&rep=rep1&type=pdf
- Paper: Why the future doesn't need us
Author: Bill Joy
Year: 2000
Link: http://science.slc.edu/~jmarshall/courses/2007/fall/singularity/readings/bill_joy_wired.pdf
- Paper: As we may think
Author: Vannevar Bush
Year: 1945
Link: https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/
- Paper: Deep Blue
Author: Murray Campbell, Joseph Hoane Jr., Feng-hsiung Hsu
Year: 2002
Link: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.99.2714&rep=rep1&type=pdf
- Paper: The paradoxical success of fuzzy logic
Author: Charles Elkan
Year: 1993
Link: https://www.aaai.org/Papers/AAAI/1993/AAAI93-104.pdf
- Paper: Programs with common sense
Author: John McCarthy
Year: 1959
Link: http://www-formal.stanford.edu/jmc/mcc59.pdf