Important papers in the history of artificial intelligence

We are reading through some of the great papers in the history of artificial intelligence. The reading list is intentionally broad, pulling from computer science, mathematics, philosophy, linguistics, and cognitive science, and we have a preference for older papers (i.e. written before the year 2000). The only other selection criterion is that they have to be shortish papers (< 15 pages), so sorry, no books! The goal is a broad understanding of the research that forms the foundations of present-day AI, and a deeper understanding of the context in which each paper was written.

Journal club papers to date

  • Paper: An Inductive Inference Machine
    Author(s): Ray Solomonoff
    Year: 1956
    Date: July 14th, 2020
    Presenter: Joe Hakim
    Link: http://raysolomonoff.com/publications/indinf56.pdf
    Slides: https://docs.google.com/presentation/d/14whQ2ZuEjGtz2hRvwvHKhj6eGbzKWRABUjeFxn7RhKM/edit?usp=sharing
    tl;dr: A description of an algorithm that can perform prediction on grids of numbers and operations. It works by using prior examples, and specific transformations thereof, to produce predictions favoring 'utility' and 'consistency'. Includes some brief philosophical discussion of making the algorithm more "AI-like" and of its theoretical properties.
    Important because: An early (perhaps the earliest) truly probabilistic treatment of machine learning.
  • Paper: Generalization of Pattern Recognition in a Self-Organizing System
    Authors: WA Clark, BG Farley
    Year: 1955
    Date: August 11th, 2020
    Presenter: Eric Chen
    Link: https://pdfs.semanticscholar.org/616b/9f5b957de2249ed1ae433b9be1bf1d45cdef.pdf
    Slides: https://www.dropbox.com/s/6u8qzp0zmodjr71/08112020_journalclub.pptx
    tl;dr: Clark and Farley present two experiments on the application of neural nets to the generalization of pattern recognition. The first experiment demonstrates that the net can be successfully trained (“organized”) on input patterns subjected to random variation, while the second experiment demonstrates that a trained net can successfully classify new input sequences into three classes based on observed behavior. The authors use many techniques that are similar to modern machine learning techniques.
    Important because: Early application of neural nets to the generalization of pattern recognition/classification; precursor to modern machine learning techniques (noise injection, data augmentation, model ensembles)
  • Paper: Minds, brains, and programs
    Author(s): John Searle
    Year: 1980
    Date: September 24th, 2020
    Presenter: Matthew Lee
    Link: http://cogprints.org/7150/1/10.1.1.83.5248.pdf
    Slides: https://drive.google.com/file/d/1h8SESVBIBc7B8I-2fXw62hcyh1rUShQC/view?usp=sharing
    tl;dr: John Searle presents his famous thought experiment, the Chinese Room Argument, a position that argues against "Strong AI", i.e. AI that truly understands. The argument is simple: a man is in a room with formal rules for correlating Chinese symbols with other Chinese symbols; when he receives an "input" script of Chinese symbols he is able to map the symbols to the correct "output" answers in Chinese symbols, yet the man still does not understand Chinese. This philosophical paper explores artificial intelligence and whether a computer is able to understand or is simply manipulating formal symbols.
    Important because: A famous thought experiment, similar to the Turing test, that explores a machine's capability to achieve human-like intelligence.
  • Paper: Prediction and Entropy of Printed English
    Author(s): Claude Shannon
    Year: 1950
    Date: October 22nd, 2020
    Presenter: Ben Kompa
    Link: http://languagelog.ldc.upenn.edu/myl/Shannon1950.pdf
    Slides: https://docs.google.com/presentation/d/1sidRSdmpqW1uGkmEUh5NSy_1UBaXIWYHtwlONJeDulo/edit?usp=sharing
    tl;dr: Shannon provides an early analysis of the entropy of the English language that holds true even today. He begins by defining the entropy of an N-gram, which is simply N letters of English. Then, he considers two careful experiments. The first consists of a participant attempting to guess the next letter of a text, with only one chance at guessing the letter before moving on. The second experiment allows the participant to guess as many times as necessary until the correct letter is chosen. After deriving theoretical bounds on the N-gram entropy of English, Shannon is able to use the results of the second experiment to provide upper and lower bounds on N-gram entropy that are relevant even today. (A small numerical sketch of the N-gram entropy estimate appears after this list.)
    Important because: Early exploration of the entropy of language, relevant to language models today.
  • Paper: Maximum Likelihood from Incomplete Data Via the EM Algorithm
    Authors: A. P. Dempster, N. M. Laird, and D. B. Rubin
    Year: 1977
    Date: June 4, 2021
    Presenter: Rudraksh Tuwani
    Link: http://www.markirwin.net/stat221/Refs/dlr1977.pdf
    Slides: https://www.overleaf.com/read/fzysyqdprtmd
    tl;dr: The paper presents EM as a general optimization framework for finding maximum likelihood estimates in the case of missing or incomplete data. The authors give a detailed derivation of the algorithm and sketch out potential applications to missing data, grouped/censored/truncated data, mixture models, etc. (A toy EM sketch for a Gaussian mixture appears after this list.)
    Important because: It is the first paper to present the EM algorithm as a general optimization procedure for a suite of problems in Statistics.
  • Paper: Statistical modeling: The two cultures
    Author: Leo Breiman
    Year: 2001
    Presenter: Rudraksh Tuwani
    Link: https://projecteuclid.org/download/pdf_1/euclid.ss/1009213726
    Slides: https://docs.google.com/presentation/d/1I4dabXf_LKnQAZfSfbz5x3nAaiPRh4548Jd2adsfb4Q/edit?usp=sharing
    tl;dr: Leo Breiman contrasts the approaches and methods of the two cultures in Statistics. The data modeling culture involves constructing a generative model for the data and subsequent analysis of the constructed model. The algorithmic modeling culture instead seeks to build a black box that can accurately predict the response from the covariates. Breiman primarily advocates for the algorithmic culture, arguing that it is impossible to construct accurate generative models in most real-world scenarios; consequently, any analysis or conclusions drawn from the generative models are likely to be wrong. However, in some situations it may be possible to construct a reliable generative model. Ultimately, the choice of method should be dictated by the problem at hand and not by which culture the data scientist identifies with most.
    Important because: As statistics and machine learning get more intertwined, it is essential to appreciate the perspectives arising from both cultures and not get involved in culture wars.
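
To make the N-gram entropy from the Shannon entry concrete, here is a minimal sketch (our own illustration, not Shannon's procedure): it estimates F_N as the difference between the empirical entropies of N-letter and (N-1)-letter blocks. The toy corpus and function names are assumptions for illustration; Shannon's own figures came from published frequency tables and the human guessing experiments on much larger samples.

```python
from collections import Counter
from math import log2

def block_entropy(text, n):
    """Empirical entropy (in bits) of the distribution of n-letter blocks."""
    blocks = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(blocks.values())
    return -sum((c / total) * log2(c / total) for c in blocks.values())

def ngram_entropy(text, n):
    """F_N: bits per letter carried by the N-th letter given the previous
    N-1 letters, estimated as H(N-block) - H((N-1)-block)."""
    if n == 1:
        return block_entropy(text, 1)
    return block_entropy(text, n) - block_entropy(text, n - 1)

if __name__ == "__main__":
    # Toy corpus; Shannon's estimates used a 27-character alphabet
    # (A-Z plus space) and far larger text samples.
    raw = "the quick brown fox jumps over the lazy dog " * 200
    sample = "".join(c for c in raw.lower() if c.isalpha() or c == " ")
    for n in (1, 2, 3):
        print(f"F_{n} ~ {ngram_entropy(sample, n):.3f} bits/letter")
```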

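As a companion to the mixture-model application mentioned in the EM entry, here is a minimal sketch (our own illustration, not code from the paper) of EM for a two-component univariate Gaussian mixture, alternating the E-step (posterior responsibilities) with the M-step (weighted maximum-likelihood updates). The synthetic data, initialization, and fixed iteration count are arbitrary choices for the toy example.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_gaussian_mixture(x, n_iter=100, seed=0):
    """Minimal EM for a two-component 1-D Gaussian mixture."""
    rng = np.random.default_rng(seed)
    # Crude initialisation: random means, pooled variance, equal weights.
    pi = 0.5
    mu = rng.choice(x, size=2, replace=False)
    var = np.array([x.var(), x.var()])

    for _ in range(n_iter):
        # E-step: responsibility r_i = P(z_i = 1 | x_i, current parameters).
        p1 = pi * normal_pdf(x, mu[0], var[0])
        p2 = (1 - pi) * normal_pdf(x, mu[1], var[1])
        r = p1 / (p1 + p2)

        # M-step: weighted maximum-likelihood updates.
        pi = r.mean()
        mu = np.array([np.average(x, weights=r),
                       np.average(x, weights=1 - r)])
        var = np.array([np.average((x - mu[0]) ** 2, weights=r),
                        np.average((x - mu[1]) ** 2, weights=1 - r)])
    return pi, mu, var

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 700)])
    pi, mu, var = em_gaussian_mixture(data)
    print("weight of component 1:", round(pi, 3))
    print("means:", mu.round(3), "variances:", var.round(3))
```
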
Potential paper list

Papers we hope to cover eventually, listed in a semi-random order:
