Permalink
Branch: master
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
207 lines (176 sloc) 6.99 KB

Who am I

What is Computational Linguistics?

  • what we barely cover

    • probabilistic methods (JurafskyMartin, ManningSchütze, GemanJohnson2003)
  • language as a computational problem

    • how is language computed

      • cognitive
      • applications (learning from the masters)
    • what are its computational properties

    • can we use these properties to make sense of empirical phenomena

    • do linguistic domains exhibit computational differences

    • are linguistic ideas about computability/economy plausible?

    • readings: Penn 2006: Symbolic Computational Linguistics Pullum & Kornai: Mathematical Linguistics Kornai: Mathematical Linguistics, Ch1 & 10 Savitch & Manaster-Ramer: Generative Capacity Matters Krahmer10: Computational Linguistics and Psychology Wilks: Computational Linguistics History RiggleFeatureChart

  • phonology

    • segments and strings

      • formalizing strings
      • how does formalization proceed?
        • set out axioms, base terms
        • define complex concepts in terms of these simpler ones
        • definition must be precise enough that one can tell for any object in the domain of study whether is satisfies the definition or not
        • give examples of bad definitions from literature (e.g. Norvin Richards thesis)
      • why bother with formalization?
        • Chomsky quote; see also my thesis; Müller's 3.7.2
        • circle vs linearly ordered graph; which one is a string?
        • formalization VS implementation
          • python implementation is not in terms of sets with ordering function
          • python makes additional distinctions (list VS string)
    • string languages

      • is phonology infinite?
      • why we assume it nonetheless
        • nonce words follow a system --> generalization
        • succinctness
        • Savitch paper
    • dependencies

      • local
      • non-local (why don't we model it as local?)
      • existence/absence conditions
      • uniqueness conditions (tone?)
      • interval conditions
    • how would we code this up?

      • bigrams

        • k-factor; local interpretation
        • conjunction of negated literals; string as model of formula
        • "if you can't say it in two different ways, then you can't say it at all"
        • closure properties: complementation, intersection, not union, relabeling ( b --> a; (ab)* --> (aa)* )
        • local substring substitution closure
        • boolean algebra of grammars
        • learnability
          • lattice structure
        • adding probabilities
          • inferring probabilities
          • smoothing techniques
          • probabilistic algebra (associativity --> doesn't matter if we scan left to right!)
          • reading: SmithJohnson on WCFGs and PCFGs
        • generalization to n-grams
      • up the ladder

        • strictly piecewise
          • k-factor with precendece interpretation
          • conjunction of negated literals
          • intersection of good tails (where does this belong? check Jeff's thesis)
        • locally testable (at least one)
          • boolean closure
        • locally threshold testable (exactly one; primary stress)
          • existential quantification
        • star-free
          • first-order logic
          • interval conditions
          • counter-free languages
        • reading: Pullum&Rogers () Animal Pattern Learning Experiments
      • finite-state

        • hidden alphabet bigrams
        • automata
          • automaton constructions
            • complementation
            • union
            • intersection
        • Myhill-Nerode
        • regular expressions
        • pumping lemma
        • non-determinism
        • powerset construction (size VS speed trade-off)
        • mso
        • connection between non-determinism and existential quantification
        • phonology: primary stress in Creek and Cairene Arabic
      • finite-state semantics

        • describing event structure
        • generalized quantifiers
      • transductions

        • finite-state
        • subsequential
        • closure properties
        • application to phonology and morphology (2-level morphology)
        • equivalence of SPE and OT Readings: Riggle Thesis, Generating Contenders, Violation Semirings PyPhon
      • Automaton-Grammar connection --> switch to trees --> syntax

    • Literature

      • Heinz survey papers
      • Bird Computational Phonology
      • BirdEllison on Autosegmental Phonology
      • Heinz on Tier-local Phonology
      • Heinz thesis
      • McNaugton & Pappert
      • KeenanMoss
      • Sipser
      • Kozen
      • HopcroftUllman
      • RegMSO equivalence (Morawietz)
      • Kenstowicz06 Phonology survey paper
  • syntax

    • weak generative capacity: syntax is not regular reading: MohriSproat On A Common Fallacy HeinzIdsardi (Science and TopiCS)

    • can probabilities salvage regular models?

      • hidden markov models
      • yes and no
      • do increase performance
      • do not provide right structures for semantic interpretation
      • probabilities conflate many issues
        • colocation/transition probability (I shiveringly admonished his popsicle; colorless green ideas sleep furiously)
        • word frequency (vex VS irritate, erudite VS educated)
        • world-knowledge (I saw [a movie with Heidecker] VS I saw [a movie] [with Tim]))
    • formalizing trees

      • graph
      • Gorn-domains
    • local tree languages/CFGs

      • subtree substitution closure
      • feature grammars/unification
      • head projection/category refinement
      • tree intersection != string intersection
    • recognizable tree languages

      • CFL string yields (easily proved via Thatcher's theorem)
      • reading: Rogers96 Strictly Local: Recognizable
    • weak generative capacity: syntax is not context-free

    • TAG

    • MGs

    • 2-step perspective

    • tree transductions

      • synchronous grammars
      • tree transducers
      • logical tree transductions
      • new perspective of the T-model
    • Literature

      • GecsegSteinby
      • Fülöp book
      • Comon et al
      • TAG anthology
      • Kobele06
      • Trautwein Computational Pitfalls
      • Müller Syntax Textbook
  • parsing

algorithmic concepts bigO binary search

data structures
    string
    list
    linked list
    stack
    array
    hash table
    adjacency matrix
    adjacency list
    priority queue

techniques
    divide and conquer
    dynamic programming, memoization
    linear programming