Branch: master
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
207 lines (176 sloc) 6.99 KB

Who am I

What is Computational Linguistics?

  • what we barely cover

    • probabilistic methods (JurafskyMartin, ManningSchütze, GemanJohnson2003)
  • language as a computational problem

    • how is language computed

      • cognitive
      • applications (learning from the masters)
    • what are its computational properties

    • can we use these properties to make sense of empirical phenomena

    • do linguistic domains exhibit computational differences

    • are linguistic ideas about computability/economy plausible?

    • readings: Penn 2006: Symbolic Computational Linguistics Pullum & Kornai: Mathematical Linguistics Kornai: Mathematical Linguistics, Ch1 & 10 Savitch & Manaster-Ramer: Generative Capacity Matters Krahmer10: Computational Linguistics and Psychology Wilks: Computational Linguistics History RiggleFeatureChart

  • phonology

    • segments and strings

      • formalizing strings
      • how does formalization proceed?
        • set out axioms, base terms
        • define complex concepts in terms of these simpler ones
        • definition must be precise enough that one can tell for any object in the domain of study whether is satisfies the definition or not
        • give examples of bad definitions from literature (e.g. Norvin Richards thesis)
      • why bother with formalization?
        • Chomsky quote; see also my thesis; Müller's 3.7.2
        • circle vs linearly ordered graph; which one is a string?
        • formalization VS implementation
          • python implementation is not in terms of sets with ordering function
          • python makes additional distinctions (list VS string)
    • string languages

      • is phonology infinite?
      • why we assume it nonetheless
        • nonce words follow a system --> generalization
        • succinctness
        • Savitch paper
    • dependencies

      • local
      • non-local (why don't we model it as local?)
      • existence/absence conditions
      • uniqueness conditions (tone?)
      • interval conditions
    • how would we code this up?

      • bigrams

        • k-factor; local interpretation
        • conjunction of negated literals; string as model of formula
        • "if you can't say it in two different ways, then you can't say it at all"
        • closure properties: complementation, intersection, not union, relabeling ( b --> a; (ab)* --> (aa)* )
        • local substring substitution closure
        • boolean algebra of grammars
        • learnability
          • lattice structure
        • adding probabilities
          • inferring probabilities
          • smoothing techniques
          • probabilistic algebra (associativity --> doesn't matter if we scan left to right!)
          • reading: SmithJohnson on WCFGs and PCFGs
        • generalization to n-grams
      • up the ladder

        • strictly piecewise
          • k-factor with precendece interpretation
          • conjunction of negated literals
          • intersection of good tails (where does this belong? check Jeff's thesis)
        • locally testable (at least one)
          • boolean closure
        • locally threshold testable (exactly one; primary stress)
          • existential quantification
        • star-free
          • first-order logic
          • interval conditions
          • counter-free languages
        • reading: Pullum&Rogers () Animal Pattern Learning Experiments
      • finite-state

        • hidden alphabet bigrams
        • automata
          • automaton constructions
            • complementation
            • union
            • intersection
        • Myhill-Nerode
        • regular expressions
        • pumping lemma
        • non-determinism
        • powerset construction (size VS speed trade-off)
        • mso
        • connection between non-determinism and existential quantification
        • phonology: primary stress in Creek and Cairene Arabic
      • finite-state semantics

        • describing event structure
        • generalized quantifiers
      • transductions

        • finite-state
        • subsequential
        • closure properties
        • application to phonology and morphology (2-level morphology)
        • equivalence of SPE and OT Readings: Riggle Thesis, Generating Contenders, Violation Semirings PyPhon
      • Automaton-Grammar connection --> switch to trees --> syntax

    • Literature

      • Heinz survey papers
      • Bird Computational Phonology
      • BirdEllison on Autosegmental Phonology
      • Heinz on Tier-local Phonology
      • Heinz thesis
      • McNaugton & Pappert
      • KeenanMoss
      • Sipser
      • Kozen
      • HopcroftUllman
      • RegMSO equivalence (Morawietz)
      • Kenstowicz06 Phonology survey paper
  • syntax

    • weak generative capacity: syntax is not regular reading: MohriSproat On A Common Fallacy HeinzIdsardi (Science and TopiCS)

    • can probabilities salvage regular models?

      • hidden markov models
      • yes and no
      • do increase performance
      • do not provide right structures for semantic interpretation
      • probabilities conflate many issues
        • colocation/transition probability (I shiveringly admonished his popsicle; colorless green ideas sleep furiously)
        • word frequency (vex VS irritate, erudite VS educated)
        • world-knowledge (I saw [a movie with Heidecker] VS I saw [a movie] [with Tim]))
    • formalizing trees

      • graph
      • Gorn-domains
    • local tree languages/CFGs

      • subtree substitution closure
      • feature grammars/unification
      • head projection/category refinement
      • tree intersection != string intersection
    • recognizable tree languages

      • CFL string yields (easily proved via Thatcher's theorem)
      • reading: Rogers96 Strictly Local: Recognizable
    • weak generative capacity: syntax is not context-free

    • TAG

    • MGs

    • 2-step perspective

    • tree transductions

      • synchronous grammars
      • tree transducers
      • logical tree transductions
      • new perspective of the T-model
    • Literature

      • GecsegSteinby
      • Fülöp book
      • Comon et al
      • TAG anthology
      • Kobele06
      • Trautwein Computational Pitfalls
      • Müller Syntax Textbook
  • parsing

algorithmic concepts bigO binary search

data structures
    linked list
    hash table
    adjacency matrix
    adjacency list
    priority queue

    divide and conquer
    dynamic programming, memoization
    linear programming