
initial commit

thomas--graf committed Jan 10, 2015
0 parents commit 8b9db392b31bebe74823fc00d07eed9ae3238cd4
Showing with 931 additions and 0 deletions.
  1. +230 −0 curriculum.mdown
  2. +31 −0 img/dot/ProgramOverview.dot
  3. +45 −0 main.tex
  4. 0 main.tex.latexmain
  5. +255 −0 mycommands.sty
  6. +187 −0 mypackages.sty
  7. +183 −0 tex/syllabus.tex
@@ -0,0 +1,230 @@
## Who am I

## How does the course fit into the program?


    CompLing1          Mathematical Methods
     |     |                    |
     |     |                    |
     |     ----------------------
     |               |
     |               |
    NLP          CompLing2
                     |
                     |
       -----------------------------------------------
       |          |         |          |             |
       |          |         |          |             |
    CompPhon   CompSyn   CompSem   Processing   Learnability



## What is Computational Linguistics?

- it is not about
  - computers as tools for linguistic research
  - programming

- what we barely cover
  - probabilistic methods (JurafskyMartin, ManningSchütze, GemanJohnson2003)

- what we do not cover, but which will profit from what we talk about
  - speech recognition
  - OCR
  - parsing
  - semantic analysis
  - machine translation

- language as a computational problem
  - how is language computed?
    - cognitive
    - applications (learning from the masters)
  - what are its computational properties?
    - can we use these properties to make sense of empirical phenomena?
    - do linguistic domains exhibit computational differences?
    - are linguistic ideas about computability/economy plausible?

- readings:
  - Penn 2006: Symbolic Computational Linguistics
  - Pullum & Kornai: Mathematical Linguistics
  - Kornai: Mathematical Linguistics, Ch. 1 & 10
  - Savitch & Manaster-Ramer: Generative Capacity Matters
  - Krahmer10: Computational Linguistics and Psychology
  - Wilks: Computational Linguistics History

- phonology

  - segments and strings
    - formalizing strings
      - how does formalization proceed?
        - set out axioms, base terms
        - define complex concepts in terms of these simpler ones
        - the definition must be precise enough that for any object in the domain of study one can tell whether it satisfies the definition
        - give examples of bad definitions from the literature (e.g. Norvin Richards's thesis)
      - why bother with formalization?
        - Chomsky quote; see also my thesis; Müller's 3.7.2
      - circle vs linearly ordered graph: which one is a string?
    - formalization VS implementation
      - a Python implementation does not represent strings as sets with an ordering function
      - Python makes additional distinctions (list VS string)
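
The difference between a formalization and an implementation can be made concrete in Python. The sketch below assumes one possible model-theoretic encoding, in which a string is a finite set of positions, a successor relation, and one set of positions per symbol; the function names are hypothetical. A Python `str` merely implements such an object, and `is_string_model` shows why a circle is not a string: every position in a circle has a predecessor, so there is no first position.

```python
from typing import Dict, FrozenSet, Tuple

Positions = FrozenSet[int]
Successor = FrozenSet[Tuple[int, int]]
Labels = Dict[str, FrozenSet[int]]


def string_to_model(s: str) -> Tuple[Positions, Successor, Labels]:
    """Encode a sequence of symbols as (domain, successor relation, label sets)."""
    domain = frozenset(range(len(s)))
    successor = frozenset((i, i + 1) for i in range(len(s) - 1))
    labels = {a: frozenset(i for i, b in enumerate(s) if b == a) for a in set(s)}
    return domain, successor, labels


def is_string_model(domain: Positions, successor: Successor) -> bool:
    """True iff the successor relation arranges the domain into a single line."""
    if not domain:
        return not successor
    succ = dict(successor)
    if len(succ) != len(successor):  # some position has two successors
        return False
    targets = [j for _, j in successor]
    if len(set(targets)) != len(targets):  # some position has two predecessors
        return False
    starts = domain - set(targets)
    if len(starts) != 1:  # a circle has no first position
        return False
    seen, pos = set(), next(iter(starts))
    while pos is not None and pos not in seen:  # walk from the first position
        seen.add(pos)
        pos = succ.get(pos)
    return seen == domain


def model_to_string(domain: Positions, successor: Successor, labels: Labels) -> str:
    """Read a well-formed string model back off as a Python str."""
    succ = dict(successor)
    label_of = {i: a for a, positions in labels.items() for i in positions}
    starts = domain - {j for _, j in successor}
    out, pos = [], next(iter(starts), None)
    while pos is not None:
        out.append(label_of[pos])
        pos = succ.get(pos)
    return "".join(out)


dom, rel, lab = string_to_model("abca")
print(is_string_model(dom, rel))        # True
print(model_to_string(dom, rel, lab))   # abca
print(is_string_model(dom, rel | {(3, 0)}))  # False: closing the line into a circle
```

Because `string_to_model` only iterates over its argument, a list of segment symbols such as `["a", "b"]` yields the same model as the string `"ab"`, which is exactly the list/string distinction the formalization abstracts away.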

  - string languages
    - is phonology infinite?
    - why we assume it nonetheless
      - nonce words follow a system --> generalization
      - succinctness
      - Savitch paper

  - dependencies
    - local
    - non-local (why don't we model it as local?)
    - existence/absence conditions
    - uniqueness conditions (tone?)
    - interval conditions

  - how would we code this up?
    - bigrams
      - k-factor; local interpretation
      - conjunction of negated literals; string as model of formula
        - "if you can't say it in two different ways, then you can't say it at all"
      - closure properties: complementation, union, intersection; not closed under relabeling (b --> a maps (ab)* to (aa)*)
        - local substring substitution closure
      - boolean algebra of grammars
        - learnability
        - lattice structure
      - adding probabilities
        - inferring probabilities
        - smoothing techniques
        - probabilistic algebra (associativity --> doesn't matter if we scan left to right!)
        - reading: SmithJohnson on WCFGs and PCFGs
    - generalization to n-grams
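
The local interpretation of bigrams can be sketched as a strictly k-local grammar: a finite set of forbidden k-factors, checked against the word padded with boundary markers. A word is well-formed iff it contains none of them, which is the "conjunction of negated literals" reading. The boundary symbols `>` and `<`, the toy constraints, and the function names are assumptions for illustration. Pooling two sets of forbidden factors yields the intersection of the two languages, and raising k gives the n-gram generalization.

```python
from typing import Iterable, Set

LEFT, RIGHT = ">", "<"  # word-boundary markers (assumed not to be segments)


def k_factors(word: str, k: int) -> Set[str]:
    """Length-k substrings of the word padded with one boundary marker on each side."""
    padded = LEFT + word + RIGHT
    if len(padded) <= k:
        return {padded}
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}


def sl_accepts(forbidden: Iterable[str], k: int, word: str) -> bool:
    """Strictly k-local: well-formed iff no forbidden k-factor occurs in the word."""
    return k_factors(word, k).isdisjoint(forbidden)


# Hypothetical bigram constraints: no "aa", and no word-final "b"
no_aa = {"aa"}
no_final_b = {"b" + RIGHT}

print(sl_accepts(no_aa, 2, "abab"))  # True
print(sl_accepts(no_aa, 2, "baab"))  # False

# Intersecting the two languages = pooling the forbidden factors
both = no_aa | no_final_b
print([w for w in ["ab", "ba", "aab", "aba"] if sl_accepts(both, 2, w)])  # ['ba', 'aba']

# n-gram generalization: a trigram ban on "aba"
print(sl_accepts({"aba"}, 3, "abba"), sl_accepts({"aba"}, 3, "baba"))  # True False
```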

  - up the ladder
    - strictly piecewise
      - k-factor with precedence interpretation
      - conjunction of negated literals
      - intersection of good tails (where does this belong? check Jeff's thesis)
    - locally testable (at least one)
      - boolean closure
    - locally threshold testable (exactly one; primary stress)
      - existential quantification
    - star-free
      - first-order logic
      - interval conditions
      - counter-free languages
    - reading: Pullum & Rogers () Animal Pattern Learning Experiments
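
The precedence interpretation and threshold counting admit the same kind of sketch. A strictly 2-piecewise grammar forbids subsequences rather than substrings, so it can ban s...ʃ at any distance, while a locally threshold testable constraint counts factor occurrences up to a fixed threshold, as in "exactly one primary stress". The toy alphabet, the stress encoding (`1` for primary stress, `0` otherwise), and the function names are assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set


def k_subsequences(word: str, k: int) -> Set[str]:
    """All length-k subsequences (not necessarily contiguous) of the word."""
    return {"".join(c) for c in combinations(word, k)}


def sp_accepts(forbidden: Iterable[str], k: int, word: str) -> bool:
    """Strictly k-piecewise: no forbidden subsequence may occur, at any distance."""
    return k_subsequences(word, k).isdisjoint(forbidden)


def capped_counts(word: str, k: int, threshold: int) -> Dict[str, int]:
    """Occurrences of each contiguous k-factor, counted only up to the threshold."""
    counts = Counter(word[i:i + k] for i in range(len(word) - k + 1))
    return {factor: min(n, threshold) for factor, n in counts.items()}


def exactly_one_primary_stress(syllables: str) -> bool:
    """LTT-style constraint: the 1-factor '1' occurs exactly once (threshold 2)."""
    return capped_counts(syllables, 1, 2).get("1", 0) == 1


# Toy sibilant harmony as an SP-2 ban ('o' stands for any other segment)
harmony = {"sʃ", "ʃs"}
print(sp_accepts(harmony, 2, "soooos"))   # True
print(sp_accepts(harmony, 2, "soooʃ"))    # False, although s and ʃ are far apart
print(exactly_one_primary_stress("0100"))  # True
print(exactly_one_primary_stress("0101"))  # False
print(exactly_one_primary_stress("0000"))  # False
```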

  - finite-state
    - hidden alphabet bigrams
    - automata
      - automaton constructions
        - complementation
        - union
        - intersection
      - Myhill-Nerode
    - regular expressions
    - pumping lemma
    - non-determinism
      - powerset construction (size VS speed trade-off)
    - MSO
      - connection between non-determinism and existential quantification
    - phonology: primary stress in Creek and Cairene Arabic
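
The automaton constructions can be sketched with a small DFA class, assuming complete transition tables stored as dictionaries. The product construction yields intersection (changing the acceptance condition to "either machine accepts" would yield union), complementation swaps final and non-final states, and the powerset construction turns an NFA into an equivalent DFA whose states are sets of NFA states, which is where the size-versus-speed trade-off comes from. The class, function names, and toy machines are hypothetical.

```python
from collections.abc import Hashable
from dataclasses import dataclass
from itertools import product
from typing import Dict, FrozenSet, Set, Tuple

State = Hashable


@dataclass
class DFA:
    """Deterministic finite automaton with a complete transition table."""
    states: FrozenSet[State]
    alphabet: FrozenSet[str]
    delta: Dict[Tuple[State, str], State]
    start: State
    finals: FrozenSet[State]

    def accepts(self, word: str) -> bool:
        q = self.start
        for a in word:
            q = self.delta[(q, a)]
        return q in self.finals


def complement(m: DFA) -> DFA:
    """Swap final and non-final states (correct only for complete DFAs)."""
    return DFA(m.states, m.alphabet, m.delta, m.start, m.states - m.finals)


def intersect(m1: DFA, m2: DFA) -> DFA:
    """Product construction: run both machines in parallel, accept if both accept."""
    states = frozenset(product(m1.states, m2.states))
    delta = {((p, q), a): (m1.delta[(p, a)], m2.delta[(q, a)])
             for (p, q) in states for a in m1.alphabet}
    finals = frozenset((p, q) for (p, q) in states if p in m1.finals and q in m2.finals)
    return DFA(states, m1.alphabet, delta, (m1.start, m2.start), finals)


def determinize(alphabet: str, nfa_delta: Dict[Tuple[State, str], Set[State]],
                start: State, finals: Set[State]) -> DFA:
    """Powerset construction over reachable subsets (no epsilon transitions)."""
    start_set = frozenset([start])
    seen, todo, delta = {start_set}, [start_set], {}
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = frozenset(r for q in S for r in nfa_delta.get((q, a), ()))
            delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    dfa_finals = frozenset(S for S in seen if S & finals)
    return DFA(frozenset(seen), frozenset(alphabet), delta, start_set, dfa_finals)


AB = frozenset("ab")
even_a = DFA(frozenset({0, 1}), AB,
             {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, frozenset({0}))
ends_b = DFA(frozenset({0, 1}), AB,
             {(0, "a"): 0, (1, "a"): 0, (0, "b"): 1, (1, "b"): 1}, 0, frozenset({1}))
both = intersect(even_a, ends_b)
print(both.accepts("aab"), both.accepts("ab"))  # True False
print(complement(even_a).accepts("a"))          # True

# NFA for "the second-to-last symbol is a": it guesses where that symbol is
nfa = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
dfa = determinize("ab", nfa, 0, {2})
print(dfa.accepts("bab"), dfa.accepts("abb"))   # True False
```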

  - finite-state semantics
    - describing event structure
    - generalized quantifiers
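
Generalized quantifiers can be sketched as relations between a restrictor set A and a scope set B. The automata below assume the semantic-automata encoding from the logical-semantics literature (not spelled out in these notes): list the elements of A in some order and write 1 for each element in B and 0 otherwise. "every" and "some" are then recognized by two-state automata, whereas "most" requires comparing two unbounded counts, so its language over {0, 1} is not regular. The sets and names are hypothetical.

```python
from typing import Callable, Dict, Set

Quantifier = Callable[[Set[str], Set[str]], bool]

# Quantifiers as relations between a restrictor A and a scope B
QUANTIFIERS: Dict[str, Quantifier] = {
    "every": lambda A, B: A <= B,
    "some": lambda A, B: bool(A & B),
    "no": lambda A, B: not (A & B),
    "most": lambda A, B: len(A & B) > len(A - B),
}


def encode(A: Set[str], B: Set[str]) -> str:
    """One bit per element of A (sorted so the output is deterministic)."""
    return "".join("1" if x in B else "0" for x in sorted(A))


def run(delta: Dict[tuple, int], finals: Set[int], bits: str) -> bool:
    q = 0
    for b in bits:
        q = delta[(q, b)]
    return q in finals


# Two-state automata: 'every' fails for good on a 0, 'some' succeeds for good on a 1
EVERY = ({(0, "1"): 0, (0, "0"): 1, (1, "0"): 1, (1, "1"): 1}, {0})
SOME = ({(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 1}, {1})

dogs = {"fido", "rex", "spot"}
barkers = {"fido", "rex", "tom"}
print(encode(dogs, barkers))                  # 110
print(QUANTIFIERS["most"](dogs, barkers))     # True
print(run(*EVERY, encode(dogs, barkers)), QUANTIFIERS["every"](dogs, barkers))  # False False
print(run(*SOME, encode(dogs, barkers)), QUANTIFIERS["some"](dogs, barkers))    # True True
```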

  - transductions
    - finite-state
    - subsequential
    - closure properties
    - application to phonology and morphology (2-level morphology)
    - equivalence of SPE and OT
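
A subsequential transducer for a toy rule of word-final obstruent devoicing (b, d, g become p, t, k before the word boundary) shows why the final output function matters: the machine holds back a voiced obstruent until it knows whether the word ends. The state set is finite (nothing pending, or one pending obstruent), and the transition and output functions are written as ordinary Python functions over it. The rule, alphabet, and names are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

DEVOICE = {"b": "p", "d": "t", "g": "k"}

State = Optional[str]  # None = nothing pending; otherwise the held-back obstruent


@dataclass
class Subsequential:
    start: State
    step: Callable[[State, str], Tuple[State, str]]  # (state, input) -> (next state, output)
    final: Callable[[State], str]                      # output emitted at the end of the input

    def __call__(self, word: str) -> str:
        q, out = self.start, []
        for a in word:
            q, chunk = self.step(q, a)
            out.append(chunk)
        out.append(self.final(q))
        return "".join(out)


def devoice_step(q: State, a: str) -> Tuple[State, str]:
    pending = q or ""  # a held-back obstruent is not word-final, so it surfaces voiced
    if a in DEVOICE:
        return a, pending  # hold the new obstruent back
    return None, pending + a


def devoice_final(q: State) -> str:
    return DEVOICE[q] if q is not None else ""


final_devoicing = Subsequential(None, devoice_step, devoice_final)
for w in ["bad", "dog", "abd", "bda", "abba"]:
    print(w, "->", final_devoicing(w))
# bad -> bat, dog -> dok, abd -> abt, bda -> bda, abba -> abba
```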

  - Automaton-Grammar connection --> switch to trees --> syntax

  - Literature
    - Heinz survey papers
    - Bird Computational Phonology
    - BirdEllison on Autosegmental Phonology
    - Heinz on Tier-local Phonology
    - Heinz thesis
    - McNaughton & Papert
    - KeenanMoss
    - Sipser
    - Kozen
    - HopcroftUllman
    - RegMSO equivalence (Morawietz)

- syntax

  - weak generative capacity: syntax is not regular
    - reading:
      - MohriSproat On A Common Fallacy
      - HeinzIdsardi (Science and TopiCS)

  - can probabilities salvage regular models?
    - hidden Markov models
    - yes and no
      - they do increase performance
      - they do not provide the right structures for semantic interpretation
    - probabilities conflate many issues
      - collocation/transition probability (I shiveringly admonished his popsicle; colorless green ideas sleep furiously)
      - word frequency (vex VS irritate, erudite VS educated)
      - world knowledge (I saw [a movie with Heidecker] VS I saw [a movie] [with Tim])
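
A hidden Markov model assigns a word sequence a probability by summing over all hidden state sequences. The forward algorithm computes that sum by dynamic programming, keeping one partial sum per hidden state and word position instead of enumerating the exponentially many paths. The two-state model and every number in it are invented for illustration; `brute_force` is included only to check the result.

```python
from itertools import product
from typing import Dict, Sequence

# Hypothetical toy HMM: two hidden states emitting words
STATES = ("N", "V")
INIT = {"N": 0.7, "V": 0.3}
TRANS = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
EMIT = {"N": {"ideas": 0.5, "sleep": 0.1, "dogs": 0.4},
        "V": {"ideas": 0.1, "sleep": 0.6, "dogs": 0.3}}


def forward(words: Sequence[str]) -> float:
    """P(words), summed over hidden paths in O(len(words) * |STATES|^2) time."""
    if not words:
        return 1.0
    alpha: Dict[str, float] = {s: INIT[s] * EMIT[s].get(words[0], 0.0) for s in STATES}
    for w in words[1:]:
        alpha = {t: sum(alpha[s] * TRANS[s][t] for s in STATES) * EMIT[t].get(w, 0.0)
                 for t in STATES}
    return sum(alpha.values())


def brute_force(words: Sequence[str]) -> float:
    """Same quantity by enumerating every hidden path (exponential; for checking)."""
    if not words:
        return 1.0
    total = 0.0
    for path in product(STATES, repeat=len(words)):
        p = INIT[path[0]] * EMIT[path[0]].get(words[0], 0.0)
        for prev, cur, w in zip(path, path[1:], words[1:]):
            p *= TRANS[prev][cur] * EMIT[cur].get(w, 0.0)
        total += p
    return total


sentence = ["ideas", "sleep"]
print(forward(sentence))
print(abs(forward(sentence) - brute_force(sentence)) < 1e-12)  # True
```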

  - formalizing trees
    - graph
    - Gorn domains
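
A Gorn domain represents a tree as a set of addresses: the root is the empty address, and the i-th daughter of node u has address u followed by i. The sketch below assumes nested `(label, daughters)` tuples as input and 0-based daughter indices (1-based indexing is equally common); it computes the labeled domain of a tree and checks the two closure conditions that make a set of addresses a tree domain. The names are hypothetical.

```python
from typing import Dict, Set, Tuple

Address = Tuple[int, ...]
Tree = Tuple[str, tuple]  # (label, (daughter, daughter, ...))


def gorn(tree: Tree, address: Address = ()) -> Dict[Address, str]:
    """Map every Gorn address of the tree to its node label."""
    label, daughters = tree
    nodes = {address: label}
    for i, d in enumerate(daughters):
        nodes.update(gorn(d, address + (i,)))
    return nodes


def is_tree_domain(domain: Set[Address]) -> bool:
    """Nonempty, closed under prefixes (mothers) and under left sisters."""
    for u in domain:
        if u and u[:-1] not in domain:  # the mother must exist
            return False
        if u and u[-1] > 0 and u[:-1] + (u[-1] - 1,) not in domain:  # the left sister must exist
            return False
    return bool(domain)  # assumption: the empty set is not a tree domain


s = ("S", (("NP", (("John", ()),)), ("VP", (("sleeps", ()),))))
for addr, lab in sorted(gorn(s).items()):
    print(addr, lab)
print(is_tree_domain(set(gorn(s))))  # True
print(is_tree_domain({(), (1,)}))    # False: daughter 1 without daughter 0
```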

  - local tree languages/CFGs
    - subtree substitution closure
    - feature grammars/unification
    - head projection/category refinement
    - tree intersection != string intersection
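
A local tree language can be specified by listing the licensed local trees, i.e. a mother label together with the sequence of its daughters' labels, which amounts to reading a context-free grammar as a set of tree fragments. The sketch checks a tree against such a list; the toy grammar, the root and terminal sets, and the function names are assumptions. Because the check only ever inspects one mother and her daughters, swapping two subtrees with the same root label cannot change the verdict, which is the intuition behind subtree substitution closure.

```python
from typing import Set, Tuple

Tree = Tuple[str, tuple]  # (label, (daughter, ...)); leaves have no daughters

# Hypothetical toy grammar: licensed local trees as (mother, daughter labels)
LICENSED: Set[Tuple[str, Tuple[str, ...]]] = {
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("NP", ("John",)),
    ("NP", ("Mary",)),
    ("V", ("likes",)),
}
ROOTS = {"S"}
TERMINALS = {"John", "Mary", "likes"}


def licensed(tree: Tree) -> bool:
    """Each node must head a licensed local tree; leaves must be terminals."""
    label, daughters = tree
    if not daughters:
        return label in TERMINALS
    fork = (label, tuple(d[0] for d in daughters))
    return fork in LICENSED and all(licensed(d) for d in daughters)


def in_language(tree: Tree) -> bool:
    return tree[0] in ROOTS and licensed(tree)


def leaf(w: str) -> Tree:
    return (w, ())


good = ("S", (("NP", (leaf("John"),)),
              ("VP", (("V", (leaf("likes"),)), ("NP", (leaf("Mary"),))))))
bad = ("S", (("VP", (("V", (leaf("likes"),)), ("NP", (leaf("Mary"),)))),
             ("NP", (leaf("John"),))))
print(in_language(good), in_language(bad))  # True False
```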

  - recognizable tree languages
    - CFL string yields (easily proved via Thatcher's theorem)
    - reading: Rogers96 Strictly Local: Recognizable
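
Recognizable tree languages are those accepted by finite-state tree automata. A deterministic bottom-up automaton assigns each node a state computed from its label and its daughters' states, and unlike a local grammar its states need not coincide with node labels. The toy automaton below (assumed for illustration: binary nodes labeled `f`, leaves `a` and `b`) accepts trees with an even number of `a` leaves; a local grammar over the same labels cannot enforce this, since swapping two `f`-rooted subtrees with different numbers of `a`s changes the parity. The `tree_yield` helper reads off the string yield, the object Thatcher's theorem relates to context-free languages.

```python
from typing import Dict, Tuple

Tree = Tuple[str, tuple]  # (label, (daughter, ...))

# States are parities: 0 = even number of a's below, 1 = odd
LEAF_STATE: Dict[str, int] = {"a": 1, "b": 0}


def run(tree: Tree) -> int:
    """Deterministic bottom-up run: compute the state assigned to the root."""
    label, daughters = tree
    if not daughters:
        return LEAF_STATE[label]
    if label != "f" or len(daughters) != 2:
        raise ValueError(f"no transition for {label} with {len(daughters)} daughters")
    left, right = (run(d) for d in daughters)
    return (left + right) % 2


def accepts(tree: Tree) -> bool:
    return run(tree) == 0  # final state: an even number of a's


def tree_yield(tree: Tree) -> str:
    """Left-to-right sequence of leaf labels."""
    label, daughters = tree
    return label if not daughters else "".join(tree_yield(d) for d in daughters)


a, b = ("a", ()), ("b", ())
t1 = ("f", (("f", (a, b)), a))  # two a's
t2 = ("f", (("f", (a, b)), b))  # one a
print(accepts(t1), tree_yield(t1))  # True aba
print(accepts(t2), tree_yield(t2))  # False abb
```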

  - weak generative capacity: syntax is not context-free

  - TAG
  - MGs

  - 2-step perspective

  - tree transductions
    - synchronous grammars
    - tree transducers
    - logical tree transductions
    - a new perspective on the T-model
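
A top-down tree transducer rewrites a tree from the root downward: each rule maps a state and a node label to an output fragment whose variables point at the node's daughters and name the state in which each daughter is processed. The sketch assumes a single state `q`, one toy rule that reverses the daughters of VP (turning a head-initial VP into a head-final one), and a default rule that copies every other node; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Tree = Tuple[str, tuple]  # (label, (daughter, ...))


@dataclass(frozen=True)
class Var:
    """Placeholder in a rule's right-hand side: process daughter `index` in `state`."""
    state: str
    index: int


RHS = Union[Var, Tuple[str, tuple]]

# Rules keyed by (state, label); right-hand sides mix output labels and variables
RULES: Dict[Tuple[str, str], RHS] = {
    ("q", "VP"): ("VP", (Var("q", 1), Var("q", 0))),  # put the object before the verb
}


def instantiate(rhs: RHS, daughters: tuple) -> Tree:
    if isinstance(rhs, Var):
        return transduce(rhs.state, daughters[rhs.index])
    label, parts = rhs
    return (label, tuple(instantiate(p, daughters) for p in parts))


def transduce(state: str, tree: Tree) -> Tree:
    label, daughters = tree
    rhs = RULES.get((state, label))
    if rhs is None:  # assumed default: copy the node, same state for every daughter
        rhs = (label, tuple(Var(state, i) for i in range(len(daughters))))
    return instantiate(rhs, daughters)


def words(tree: Tree) -> List[str]:
    label, daughters = tree
    if not daughters:
        return [label]
    return [w for d in daughters for w in words(d)]


def leaf(w: str) -> Tree:
    return (w, ())


svo = ("S", (("NP", (leaf("John"),)),
             ("VP", (("V", (leaf("likes"),)), ("NP", (leaf("Mary"),))))))
sov = transduce("q", svo)
print(" ".join(words(svo)), "=>", " ".join(words(sov)))  # John likes Mary => John Mary likes
```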

  - Literature
    - GecsegSteinby
    - Fülöp book
    - Comon et al.
    - TAG anthology
    - Kobele06
    - Trautwein Computational Pitfalls
    - Müller Syntax Textbook

- parsing

- complexity theory

- algorithmic concepts
  - data structures
    - string
    - list
    - linked list
    - stack
    - array
    - hash table
    - adjacency matrix
    - adjacency list
    - priority queue

  - techniques
    - divide and conquer
    - dynamic programming, memoization
    - linear programming
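
Dynamic programming is the technique behind chart parsing. As a sketch, assuming a toy grammar in Chomsky normal form stored as Python dictionaries (grammar and names are hypothetical), the CKY recognizer fills a table whose cell (i, j) holds every nonterminal that derives words i through j-1. Each cell is computed once from smaller cells, so recognition takes time cubic in sentence length instead of enumerating all trees, even for the ambiguous PP attachment in the second example.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical toy grammar in Chomsky normal form
LEXICAL: Dict[str, Set[str]] = {  # word -> nonterminals that rewrite to it
    "John": {"NP"}, "Mary": {"NP"}, "telescopes": {"NP"},
    "saw": {"V"}, "with": {"P"},
}
BINARY: Dict[Tuple[str, str], Set[str]] = {  # (B, C) -> {A : A -> B C}
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("VP", "PP"): {"VP"},
    ("NP", "PP"): {"NP"},
    ("P", "NP"): {"PP"},
}


def cky(words: List[str], start: str = "S") -> bool:
    n = len(words)
    # chart[i][j] = nonterminals spanning words[i:j]
    chart: List[List[Set[str]]] = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(LEXICAL.get(w, ()))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two daughters
                for B, C in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= BINARY.get((B, C), set())
    return n > 0 and start in chart[0][n]


print(cky("John saw Mary".split()))                  # True
print(cky("John saw Mary with telescopes".split()))  # True
print(cky("saw John Mary".split()))                  # False
```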
@@ -0,0 +1,31 @@
digraph G {
Syntax [label="Syntax 1 (Lin 521)"];
Phon [label="Phonology 1 or Phonetics (Lin 522/523)"];
CompLing1 [label="Computational Linguistics 1 (Lin 537)"];
CompLing2 [label="Computational Linguistics 2 (Lin 637)"];
Methods [label="Statistics or Mathematical Methods in Linguistics (Lin 538/539)"];
CompSem [label="Computational Semantics (Lin 626)"];
CompPhon [label="Computational Phonology (Lin 627)"];
CompSyn [label="Computational Syntax (Lin 628)"];
Learning [label="Learnability (Lin 629)"];
Parsing [label="Parsing and Processing (Lin 630)"];
NLP [label="Introduction to NLP (CSE 628)"];
Speech [label="Speech Processing (CSE 542)"];
Machine [label="Machine Learning (CSE 512)"];

CompLing1 -> NLP;
CompLing1 -> CompLing2;

Syntax -> CompLing2;
Phon -> CompLing2;
Methods -> CompLing2;

CompLing2 -> CompSem;
CompLing2 -> CompPhon;
CompLing2 -> CompSyn;
CompLing2 -> Learning;
CompLing2 -> Parsing;

NLP -> Machine;
NLP -> Speech;
}
@@ -0,0 +1,45 @@
%=================================================================
% preamble
%=================================================================
\documentclass[11pt,letterpaper]{book}

\newcommand{\theauthor}{Thomas Graf}
\newcommand{\university}{Stony Brook University}
\newcommand{\emailaddress}{lin637@thomasgraf.net}
\newcommand{\coursenumber}{Lin637}
\newcommand{\coursename}{Computational Linguistics 2}
\newcommand{\thetitle}{\texorpdfstring{\coursenumber\\ \coursename}{\coursenumber --- \coursename}}
\newcommand{\thekeywords}{graduate level, lecture, computational linguistics, phonology, syntax}
\newcommand{\thedate}{}

\usepackage{mypackages}
\usepackage{mycommands}



%=================================================================
% title format
%=================================================================
\author{\theauthor}
\title{\thetitle}
\date{\thedate}

%=================================================================
% content
%=================================================================
% \includeonly{./tex/ConstituencyTests}

\begin{document}
\raggedbottom
\pagenumbering{Roman}
\maketitle
\tableofcontents
\clearpage

\include{./tex/syllabus}
\pagenumbering{arabic}

\pagestyle{empty}
% \include{./tex/h1}

\end{document}
No changes.
