100644 477 lines (434 sloc) 28.102 kB
7ce3f4d @alopez Added CC license notice and credits
authored
1 ---
2 ---
958f969 @alopez First draft of course page
authored
3 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
4 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
5
7ce3f4d @alopez Added CC license notice and credits
authored
6 <!-- The two dashed lines at the top of this file ensure that it is
7 processed by Jekyll; do not remove them-->
958f969 @alopez First draft of course page
authored
8 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en-us">
9 <head>
10 <meta http-equiv="content-type" content="text/html; charset=utf-8" />
11 <title>JHU MT course</title>
12
13 <!-- Homepage CSS -->
14 <link rel="stylesheet" href="screen.css" type="text/css" media="screen, projection" />
15 </head>
16 <body>
17
18 <div class="site">
19
20 <div class="leftsidebar">
21 <p><img src="img/artsrouni.jpg" width="180"
7ce3f4d @alopez Added CC license notice and credits
authored
22 alt="mechanical brain"/></p>
23 <p><i>Georges Artsrouni's mechanical brain, a translation device patented in France in 1933.
24 (Image from Corbé by way of <a href="http://www.hutchinsweb.me.uk/IJT-2004.pdf">John Hutchins</a>)</i></p>
0118656 @mjpost added assignment 0
mjpost authored
25 <hr/>
26 <p>
8acc110 @alopez Adds sidebar links
authored
27 <ul>
802551c @alopez Adds some clarification, navigation
authored
28 <li><b>Assignments</b> (<a href="leaderboard.html">Leader board</a>)
29 <ul class="real">
30 <li><a href="assignment0.html">Assignment 0</a> (not graded)</li>
31 <li><a href="hw1.html"><b>Assignment 1</b></a> (due Feb 22, midnight EST)</li>
32 </ul>
33 </li>
8acc110 @alopez Adds sidebar links
authored
34 <li><b><a href="#overview">Overview</a></b></li>
35 <li><b><a href="#grading">Grading</a></b></li>
36 <li><b><a href="#schedule">Schedule</a></b></li>
37 <li><b><a href="#software">Software</a></b></li>
38 <li><b><a href="#data">Data</a></b></li>
39 <li><b><a href="#resources">Resources</a></b></li>
40 </ul>
41 </p>
958f969 @alopez First draft of course page
authored
42 </div>
43
44 <div class="content">
8acc110 @alopez Adds sidebar links
authored
45 <a name="overview"></a><h1>Machine Translation <font color="lightgrey">: 600.468 : Spring 2012</font></h1>
958f969 @alopez First draft of course page
authored
46 <div id="course" class="cv">
47
48 <p><b>Instructors:</b>
b03abb4 @alopez Fixed URLs after running link checker
authored
49 <a href="http://www.cs.jhu.edu/~ccb/">Chris Callison-Burch</a>,
50 <a href="http://www.cs.jhu.edu/~alopez/">Adam Lopez</a>, and
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
51 <a href="http://www.cs.jhu.edu/~post/">Matt Post</a>.
7695a60 @alopez Updated course policy and administriva
authored
52 </p>
53
54 <p><b>TAs:</b>
55 <a href="http://www.cs.jhu.edu/~jonny/">Jonny Weese</a> and
56 <a href="http://www.cs.jhu.edu/~juri/">Juri Ganitkevitch</a>.
57 </p>
58
958f969 @alopez First draft of course page
authored
59
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
60 <p><b>Time and Place:</b> Tuesdays and Thursdays from 3:00-4:15, Hackerman 320</p>
7695a60 @alopez Updated course policy and administriva
authored
61 <p><b>Office Hours:</b> Tuesdays and Thursdays from 4:15 (immediately after class) or by appointment.</p>
2988d74 @alopez Updates lecture nots and assignment 1 links
authored
62 <div class="news">
63 <ul class="real">
2b21279 @alopez Add project description
authored
64 <li><b><a href="#project">Project Proposals Due on March 13</a></b></li>
2988d74 @alopez Updates lecture nots and assignment 1 links
authored
65 <li><b><a href="https://piazza.com/jhu/spring2012/en600468">Forum for announcements, questions, discussion, etc.</a></b></li>
66 </ul></div>
bcaab69 @alopez Course proposal following instructor meeting
authored
67 <p><b>Level</b>: Senior undergraduate or first-year graduate.</p>
958f969 @alopez First draft of course page
authored
68 <p><b>Course Catalog Description</b>:
69 <a href="http://translate.google.com">Google Translate</a> can instantly
70 translate between any pair of over fifty human languages (for instance, from
bcaab69 @alopez Course proposal following instructor meeting
authored
71 French to English). How does it do that? Why does it make the errors that it
9cbcf3c @alopez Adds Bing Translator link
authored
72 does? And how can you build something better? Modern translation systems
73 like Google Translate and <a href="http://www.microsofttranslator.com/">Bing Translator</a>
74 <i>learn</i>
fbea1ca @alopez Added course num and book clarification
authored
75 how to translate by reading millions of words of already translated text,
bcaab69 @alopez Course proposal following instructor meeting
authored
76 and this course will show you how they work. The course covers a diverse set
77 of fundamental building blocks from linguistics, machine learning, algorithms,
78 data structures, and formal language theory, along with their application to
79 a real and difficult problem in artificial intelligence.</p>
958f969 @alopez First draft of course page
authored
80
7695a60 @alopez Updated course policy and administriva
authored
81 <p><b>Textbook:</b>
2076bca @alopez Add link to errata page for textbook
authored
82 Some readings will be drawn from <a href="http://www.statmt.org/book/">Statistical Machine Translation</a> (<a href="http://statmt.org/book/errata.html">errata</a>)
7695a60 @alopez Updated course policy and administriva
authored
83 by <a href="http://homepages.inf.ed.ac.uk/pkoehn/">Philipp Koehn</a>.
2076bca @alopez Add link to errata page for textbook
authored
84 You can read it online through the <a href="https://catalyst.library.jhu.edu/catalog/bib_3522360">JHU library</a>
85 or purchase from <a href="http://www.amazon.com/Statistical-Machine-Translation-Philipp-Koehn/dp/0521874157">Amazon</a>.
7695a60 @alopez Updated course policy and administriva
authored
86 A more compact (but therefore less thorough) <a href="http://www.cs.jhu.edu/~alopez/papers/survey.pdf">survey</a> by
87 <a href="http://www.cs.jhu.edu/~alopez">Adam Lopez</a> is available for free. Note that the readings (and therefore the textbook)
88 aren't strictly required; what's required is that you understand the concepts and are able to apply them.</p>
89
90
958f969 @alopez First draft of course page
authored
91 <p><b>Goals</b>:
92 By the end of the course, you should have a good grasp of what goes into
bcaab69 @alopez Course proposal following instructor meeting
authored
93 building a large-scale natural language processing system, and experience
958f969 @alopez First draft of course page
authored
94 selecting and applying diverse techniques from computer science to solve
95 real-world problems.</p>
96
97 <p><b>Requirements</b>:
bcaab69 @alopez Course proposal following instructor meeting
authored
98 You'll need strong programming skills for homeworks and a final project.
99 Natural language processing (<a href="http://www.cs.jhu.edu/~jason/465/">465</a>)
100 is recommended, but not required.</p>
101
958f969 @alopez First draft of course page
authored
102
8acc110 @alopez Adds sidebar links
authored
103 <a name="grading"></a><h2>Course Structure and Grading</h2>
7695a60 @alopez Updated course policy and administriva
authored
104 <h3>Homework (4 assignments, 10 points apiece)</h3>
364f79e @alopez Fixed some broken URLs
authored
105 <p>Class meetings will consist mainly of lectures, but the only way to
958f969 @alopez First draft of course page
authored
106 fully understand all of the problems that you need to solve in building a
bcaab69 @alopez Course proposal following instructor meeting
authored
107 language processing system is to go out and build one. To that end, the course
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
108 will be evaluated on four <i>competitive</i> homework assignments and a final
364f79e @alopez Fixed some broken URLs
authored
109 project.</p>
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
110
111 <p>The goal of each homework assignment will be to build a system that solves
7695a60 @alopez Updated course policy and administriva
authored
112 a well-defined subproblem of machine translation. Students will earn a passing
113 grade (7 points) by correctly implementing a standard algorithm that we specify,
114 and additional credit for building the best system according to an objective
115 metric: 6 points for the best system, 5 points for the second best, and so on.
116 To receive an A in the class, you must compete!
117 For each task we will provide a simple baseline, datasets, and metrics. The tasks
118 are:</p>
364f79e @alopez Fixed some broken URLs
authored
119
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
120 <ul>
2988d74 @alopez Updates lecture nots and assignment 1 links
authored
121 <li><a href="hw1.html"><b>Alignment</b>: given a set of translated documents that are aligned at the sentence level,
122 identify the words that are translations of each other.</a> (<b>Due Feb 22</b>)</li>
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
123 <li><b>Decoding</b>: find the most probable translation of a sentence, given a
124 translation model and a new input sentence.</li>
125 <li><b>Reranking</b>: find the most <i>accurate</i> translation of a sentence,
126 given an input and a list of ranked alternative translations.</li>
127 <li><b>Evaluation</b>: design a metric of translation accuracy that correlates
128 with human judgement.</li>
129 </ul>
130
7695a60 @alopez Updated course policy and administriva
authored
131 <h3>In-Class Presentation: Language in Ten Minutes (10 points)</h3>
132 <p>How are you going to build a machine translation system unless you know at
133 least a little bit about language? You will be required to give
134 a short presentation (~10 minutes) on a particular language <i>that you do
135 not speak natively</i>, e.g., Arabic, Chinese, Czech, Hindi, Italian, or Maltese.</p>
136
137 <p>You should prepare three to six slides for your presentation, covering
138 language facts (demographics, location, etc.), important linguistic
139 characteristics (orthography, morphology, syntax), and computational efforts
140 such as resources, tools, and papers. For instance, how many entries are there
141 about the language in the <a href="http://www.mt-archive.info">MT
142 Archive</a>, and what are they generally about? Be creative and have fun.
143 Asking native speakers or language experts for help is great, but you are
144 ultimately responsible for the presentation.</p>
145
146 <p>This assignment was inspired by <a href="http://www.nizarhabash.com/">Nizar Habash</a>.
147 You might want to browse the
148 <a href="https://sites.google.com/site/comse6998machinetranslation/language-in-10-minutes">examples
149 from his class</a>, and his list of recommended resources:
150 <a href="http://www.ethnologue.com/">Ethnologue</a>,
151 <a href="http://www.omniglot.com/">Omniglot</a>,
152 <a href="http://www.aboutworldlanguages.com/">About World Languages</a>, and the
153 <a href="http://www.mt-archive.info/">Machine Translation Archive</a>.</p>
154
155 <p>Presentations will be graded on thoroughness and clarity. What
156 did you learn from your research that was really interesting? Tell us!</p>
157
2b21279 @alopez Add project description
authored
158 <a name="project"></a>
7695a60 @alopez Updated course policy and administriva
authored
159 <h3>Final Projects (40 points)</h3>
160
364f79e @alopez Fixed some broken URLs
authored
161 <p>The final project will
bcaab69 @alopez Course proposal following instructor meeting
authored
162 be designed by the student or groups of students, with guidance from the
7ce3f4d @alopez Added CC license notice and credits
authored
163 instructors. As with the homework assignments, it should address a well-defined
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
164 problem with clearly identified input, output, and evaluation, and executed
364f79e @alopez Fixed some broken URLs
authored
165 with creativity and depth.</p>
166
7695a60 @alopez Updated course policy and administriva
authored
167 <p>Towards the middle of the term you will be required to turn in a brief
168 project proposal (10 points), laying out the problem, your proposed solution, and a
169 plan for implementation and evaluation. Your final project report (20 points) should explain
170 your implementation, evaluation, and analysis, focusing on a single question:
171 What did you learn? The projects will be presented during an interactive poster
2b21279 @alopez Add project description
authored
172 session during the final exam period (10 points).</p>
173
174 <p>The project proposal should be 1-2 pages (there is no hard limit, but it
175 will take us longer to give you feedback if your proposal is long or
176 unclear) and must clearly identify:</p>
177
178 <ul class="real">
179 <li>A single question or problem related to machine translation. This should be
180 stated in the first paragraph. We strongly advise including some simple examples
181 to illustrate the question or problem.</li>
182 <li>An outline of the work to be done: how will your project answer the question
183 or attempt to solve the problem? What models and algorithms will you implement?
184 What <a href="#software">software</a> will you use?</li>
185 <li>A description of planned experiments: how will you know if the question was
186 answered or the problem was solved? You should clearly identify input, output,
187 and evaluation strategy.</li>
188 </ul>
189
190 <p>The proposal is a contract. If we give you full credit for it, that means
191 we expect you to implement it and do a good analysis of the results, and we
192 will give you full credit for the entire project if you do. If you turn in a
193 weak proposal, we will give you the opportunity to submit a revised one before
194 moving forward, but the longer you take to define your project, the less time
195 you will have to implement it, so it's in your best interest to take advantage
196 of this early checkpoint.</p>
197
198 <p>Before the proposal is due, you should make an appointment with one of the
199 instructors in order to discuss project ideas; this will enable you to submit
200 a proposal with full confidence that it will be well-received. Before meeting
201 with us, you might want to browse the topics that we'll be covering later in
202 the term, since these might suggest ideas to you. We will, however, give you
203 fairly wide latitude to choose a topic as long as it's related in some way to
204 translation and is technically interesting, so you should not feel restricted to
205 these topics. We can suggest topics to you in individual meetings if you're
206 stumped, but it will help us to know what your interests and strengths are, so
207 be prepared to tell us what you're curious about.</p>
208
209 <p>Group projects of any size are permitted, but we will require an amount of
210 work that is linear in group size, so you should take into account the overhead
211 of group coordination when forming groups. Each group should turn in a single
212 proposal identifying all members. All group members will receive the same grade,
213 and you are stuck with your group members once your proposal is finalized: we
214 refuse to adjudicate stories about who did or did not contribute. Choose your
215 partners carefully.</p>
7695a60 @alopez Updated course policy and administriva
authored
216
217 <h3>Quizzes (10 points)</h3>
218
219 <p>These are mainly designed to help us understand how well you're following along.</p>
958f969 @alopez First draft of course page
authored
220
221
8acc110 @alopez Adds sidebar links
authored
222 <a name="schedule"></a><h2>Tentative Schedule</h2>
958f969 @alopez First draft of course page
authored
223
364f79e @alopez Fixed some broken URLs
authored
224 <p>Subject to change as the term progresses.</p>
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
225 <table>
958f969 @alopez First draft of course page
authored
226 <tbody>
227 <tr bgcolor="lightgrey">
228 <td valign="top">
229 <b>Date</b>
230 </td>
231 <td valign="top">
232 <b>Topics</b>
233 </td>
234 <td valign="top">
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
235 <b>Lecturer</b>
958f969 @alopez First draft of course page
authored
236 </td>
237 <td valign="top">
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
238 <b>Readings</b> (starred readings are strongly recommended for graduate students)
958f969 @alopez First draft of course page
authored
239 </td>
240 </tr>
241
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
242 <script type="text/javascript">
243 var dates = [
244 "Jan 31",
245 "Feb 2",
246 "Feb 7",
247 "Feb 9",
248 "Feb 14",
249 "Feb 16",
250 "Feb 21",
251 "Feb 23",
252 "Feb 28",
253 "Mar 1",
254 "Mar 6",
255 "Mar 8",
256 "Mar 13",
257 "Mar 15",
258 "Mar 27",
259 "Mar 29",
260 "Apr 3",
261 "Apr 5",
262 "Apr 10",
263 "Apr 12",
264 "Apr 17",
265 "Apr 19",
266 "Apr 24",
267 "Apr 26",
268 "May 1",
269 "May 3"
270 ];
271 var topics = [
9c12905 @alopez Adds Keynote version of Day 1 slides
authored
272 ['Introduction<br/> <a href="slides/JHU_MT_lecture_2012-01-31.pdf">[pdf]</a> <a href="slides/JHU_MT_lecture_2012-01-31.key">[keynote]</a>',
51b58fb @alopez A small but important difference
authored
273 "All",
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
274 ['Koehn chapter 1',
839a69e @alopez Adds Day 1 slides and link to Knight's '97 AIMag paper
authored
275 'Knight, <a href="http://www.isi.edu/natural-language/mt/aimag97.pdf">Automating Knowledge Acquisition for Machine Translation</a>',
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
276 '* Weaver, <a href="http://www.mt-archive.info/Weaver-1949.pdf">Translation</a>',
277 '* Kay, <a href="http://www.stanford.edu/~mjkay/CurrentState.pdf">Translation</a>',
278 ]],
dfe4991 @alopez Adds Feb 2 lecture slides
authored
279 ['Probability and Language Models<br/> <a href="slides/JHU_MT_lecture_2012-02-02.pdf">[pdf]</a> <a href="slides/JHU_MT_lecture_2012-02-02.key">[keynote]</a>',
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
280 "Lopez",
281 ['Koehn chapters 2 and 7']],
f4490c6 @alopez Fixes typo
authored
282 ['Learning Translation Models: Word Alignment<br/> <a href="slides/JHU_MT_lecture_2012-02-07.pdf">[pdf]</a> <a href="slides/JHU_MT_lecture_2012-02-07.key">[keynote]</a>',
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
283 "Lopez",
284 ['Koehn chapter 3',
d502fae @alopez Added links for IBM Models
authored
285 '<a href="http://www.isi.edu/natural-language/mt/wkbk.rtf">Kevin Knight&rsquo;s tutorial on the IBM Models and EM</a>',
286 '<a href="http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/ibm12.pdf">Michael Collins&rsquo; notes on the IBM Models</a>',
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
287 '* Brown et al., <a href="http://aclweb.org/anthology-new/J/J90/J90-2002.pdf">A Statistical Approach to Machine Translation</a>'
288 ]],
05d074b @alopez Adds Feb 9 slides
authored
289 ['Learning Better Translation Models<br/><a href="slides/JHU_MT_lecture_2012-02-09.pdf">[pdf]</a> <a href="slides/JHU_MT_lecture_2012-02-09.key">[keynote]</a><br/>Language in 10 minutes: Afrikaans<br/><a href="slides/Afrikaans.pdf">[pdf]</a> <a href="slides/Afrikaans.key">[keynote]</a>',
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
290 "Lopez",
291 ['Koehn chapter 4',
36befde @alopez Adds Brown et al. 1993
authored
292 '* Brown et al. <a href="http://aclweb.org/anthology-new/J/J93/J93-2003.pdf">The Mathematics of Statistical Machine Translation: Parameter Estimation</a>',
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
293 '* Vogel and Ney, <a href="http://aclweb.org/anthology-new/C/C96/C96-2141.pdf">HMM-based word alignment in statistical translation</a>',
294 '* Liang et al., <a href="http://aclweb.org/anthology-new/N/N06/N06-1014.pdf">Alignment by Agreement</a>'
295 ]],
f733d8a @mjpost stack-decoder --> word-decoder
mjpost authored
296 ['Decoding: Predicting Translations<br/> <a href="slides/JHU_MT_lecture_2012-02-14.pdf">[pdf]</a> <a href="slides/JHU_MT_lecture_2012-02-14.key">[keynote]</a> <a href="http://github.com/mjpost/word-decoder">[code]</a>',
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
297 "Post",
5e855f3 @mjpost swapped order of decoding readings
mjpost authored
298 ['Koehn chapter 6',
299 '* Germann et al., <a href="http://aclweb.org/anthology-new/P/P01/P01-1030.pdf">Fast Decoding and Optimal Decoding for Machine Translation</a>'
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
300 ]],
cca3b3d @mjpost updated main page and slide PDFs
mjpost authored
301 ['Decoding continued<br/> <a href="slides/JHU_MT_lecture_2012-02-16.pdf">[pdf]</a> <a href="slides/JHU_MT_lecture_2012-02-16.key">[keynote]</a> <a href="http://cs.jhu.edu/~post/mt-class/stack-decoder/">[live demo]</a>',
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
302 "Post",
5e855f3 @mjpost swapped order of decoding readings
mjpost authored
303 ['* Knight, <a href="http://aclweb.org/anthology-new/J/J99/J99-4005.pdf">Decoding Complexity in Word-Replacement Translation Models</a>'
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
304 ]],
c4e745f @alopez Add recent lectures
authored
305 ['Phrase-based Models<br/> <a href="slides/JHU_MT_lecture_2012-02-21.pdf">[pdf]</a> <a href="slides/JHU_MT_lecture_2012-02-21.key">[keynote]</a>', "Lopez",
d15d53e @alopez Update readings for phrase-based models and reorder lectures
authored
306 ['Koehn sections 5.1-5.2',
307 '* Koehn et al., <a href="http://aclweb.org/anthology-new/N/N03/N03-1017.pdf">Statistical Phrase-Based Translation</a>',
308 '* Marcu and Wong, <a href="http://www.aclweb.org/anthology-new/W/W02/W02-1018.pdf">A Phrase-Based, Joint Probability Model for Statistical Machine Translation</a>',
309 '* DeNero et al., <a href="http://aclweb.org/anthology-new/D/D08/D08-1033.pdf">Sampling Alignment Structure under a Bayesian Translation Model</a>'
310 ]],
9c5e9e3 @callison-burch Added link to evaluation lecture slides.
callison-burch authored
311 ['Evaluating Translation Systems<br/><a href="slides/JHU_MT_lecture_2012-02-23.pdf">[pdf]</a> <a href="slides/JHU_MT_lecture_2012-02-23.key">[keynote]</a><br/>Language in 10 minutes: German <a href="slides/German.pdf">[pdf]</a> <a href="slides/German.key">[ppt]</a>',
64aae54 @mjpost added german slides
mjpost authored
312 "Callison-Burch",
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
313 ['Koehn chapter 8',
314 '* Papineni et al., <a href="http://aclweb.org/anthology-new/P/P02/P02-1040.pdf">Bleu: a Method for Automatic Evaluation of Machine Translation</a>',
315 '* Callison-Burch et al., <a href="http://aclweb.org/anthology-new/E/E06/E06-1032.pdf">Re-Evaluating the Role of BLEU in Machine Translation Research</a>'
316 ]],
c4e745f @alopez Add recent lectures
authored
317 ['Feature-Based Models<br/> <a href="slides/JHU_MT_lecture_2012-02-28.pdf">[pdf]</a> <a href="slides/JHU_MT_lecture_2012-02-28.key">[keynote]</a>', "Lopez",
318 ['Koehn chapter 9'
319 ]],
320 ["Loss-Sensitive Training of Feature-Based Models", "Lopez",
321 ['* Och, <a href="http://aclweb.org/anthology-new/P/P03/P03-1021.pdf">Minimum Error Rate Training in Statistical Machine Translation</a>',
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
322 '* Hopkins and May, <a href="http://aclweb.org/anthology-new/D/D11/D11-1125.pdf">Tuning as Ranking</a>'
323 ]],
324 ["Weighted Automata", "Lopez",
325 ['Mohri, <a href="http://www.cs.nyu.edu/~mohri/pub/cl1.pdf">Finite-State Transducers in Language and Speech Processing.</a>'
326 ]],
327 ["Modeling Translation with Weighted Automata", "Lopez",
364f79e @alopez Fixed some broken URLs
authored
328 ['Knight and Al-Onaizan, <a href="http://www.isi.edu/natural-language/mt/mt-wfst.ps">Translation with Finite-State Devices</a>',
329 '* Kumar et al., <a href="http://mi.eng.cam.ac.uk/~wjb31/ppubs/ttmjnle.pdf"> A weighted finite state transducer translation template model for statistical machine translation</a>'
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
330 ]],
331 ["Syntax-based Translation", "Callison-Burch",
332 ['Collins et al., <a href="http://aclweb.org/anthology-new/P/P05/P05-1066.pdf">Clause Restructuring for Statistical Machine Translation</a>',
333 'Chiang, <a href="http://isi.edu/~chiang/papers/synchtut.pdf">An Introduction to Synchronous Grammars</a>',
334 '* Wu, <a href="http://aclweb.org/anthology-new/J/J97/J97-3002.pdf">Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora</a>',
335 ]],
336 ["Syntax-based Translation", "Callison-Burch",
337 ['Yamada and Knight, <a href="http://aclweb.org/anthology-new/P/P01/P01-1067.pdf">Syntax-based Statistical Machine Translation</a>',
338 'Fox, <a href="http://www.aclweb.org/anthology-new/W/W02/W02-1039.pdf">Phrasal Cohesion and Statistical Machine Translation</a>',
339 '* Galley et al., <a href="http://aclweb.org/anthology-new/N/N04/N04-1035.pdf">What&rsquo;s in a Translation Rule?</a>'
340 ]],
341 ["Syntax-based Decoding", "Post",
342 [
343 'Chiang, <a href="http://aclweb.org/anthology-new/J/J07/J07-2003.pdf">Hierarchical Phrase-based Translation</a>',
344 '* Iglesias et al., <a href="http://aclweb.org/anthology-new/N/N09/N09-1049.pdf">Hierarchical Phrase-Based Translation with Weighted Finite State Transducers</a>',
345 '* Iglesias et al., <a href="http://aclweb.org/anthology-new/D/D11/D11-1127.pdf">Hierarchical Phrase-Based Translation Representations</a>'
346 ]],
347 ["Tree Automata", "Lopez",
348 ["TBD"]],
349 ["Morphology and Translation", "Post",
350 ["TBD"]],
351 ["Creative Data Collection", "Callison-Burch",
352 ['Oard et al., <a href="http://aclweb.org/anthology-new/N/N03/N03-2026.pdf">Desperately Seeking Cebuano</a>',
353 'Munteanu and Marcu, <a href="http://aclweb.org/anthology-new/J/J05/J05-4003.pdf">Improving Machine Translation Performance by Exploiting Non-Parallel Corpora</a>'
354 ]],
355 ["More Creative Data Collection", "Callison-Burch",
356 [
357 'Callison-Burch, <a href="http://aclweb.org/anthology-new/D/D09/D09-1030.pdf">Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon’s Mechanical Turk</a>'
358 ]],
359 ["Guest Lecture: Translation without Parallel Corpora", "Irvine and Smith",
360 []],
361 ["Syntax-based Language Models", "Post",
362 ["TBD"
363 ]],
364 ["Applications of Word Alignment", "Post",
365 ["TBD"
366 ]],
367 ["Representing Huge Translation Models", "Lopez",
368 ['Callison-Burch et al., <a href="http://aclweb.org/anthology-new/P/P05/P05-1032.pdf">Scaling Phrase-Based Statistical Machine Translation to Larger Corpora and Longer Phrases</a>',
369 'Lopez, <a href="http://aclweb.org/anthology-new/D/D07/D07-1104.pdf">Hierarchical Phrase-Based Translation with Suffix Arrays</a>'
370 ]],
371 ["Representing Huge Language Models", "Lopez",
372 ['Brants et al., <a href="http://aclweb.org/anthology-new/D/D07/D07-1090.pdf">Large Language Models in Machine Translation</a>',
373 'Talbot and Brants, <a href="http://aclweb.org/anthology-new/P/P08/P08-1058.pdf">Randomized Language Models via Perfect Hash Functions</a>'
374 ]],
375 ["Paraphrasing", "Callison-Burch",
376 ['Callison-Burch et al., <a href="http://aclweb.org/anthology-new/N/N06/N06-1003.pdf">Improved Statistical Machine Translation Using Paraphrases</a>',
377 ]],
378 ["Domain Adaptation", "TBD",
379 ["TBD"]]
380 ];
381 var assignments = {
7695a60 @alopez Updated course policy and administriva
authored
382 "Feb 21" : "Homework 1 due (tentative)"
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
383 }
384 for (var i=0; i<dates.length; i++){
385 document.write('<tr');
386 if (i%2==1){ document.write(' bgcolor="lightblue"'); }
387 document.write('>');
388 document.write('<td>' + dates[i] + '</td>');
389 if (i<topics.length){
390 document.write('<td>' + topics[i][0] + '</td>');
391 document.write('<td>' + topics[i][1] + '</td>');
392 document.write('<td>');
393 document.write('<ul>');
394 if (topics[i].length >2){
395 for (var j=0; j<topics[i][2].length; j++){
396 document.write('<li>' + topics[i][2][j] + '</li>');
397 }
398 }
399 document.write('</ul>');
400 document.write('</td>');
401 }
402 document.write('</tr>');
403 }
404 </script>
405
958f969 @alopez First draft of course page
authored
406 </tbody>
407 </table>
408
8acc110 @alopez Adds sidebar links
authored
409 <a name="software"></a><h2>Software</h2>
958f969 @alopez First draft of course page
authored
410 State-of-the-art translation algorithms are implemented in a number of
411 open-source projects. The most popular of these are listed below.
412 They are all actively maintained and have significant user bases.
413 You are free to use and extend these tools (or others) in devising your
414 final project.
415 <ul>
416 <li><a href="http://cs.jhu.edu/~ccb/joshua/">Joshua</a>: a translation toolkit for syntax-based translation, developed at Johns Hopkins (Java).</li>
417 <li><a href="http://www.statmt.org/moses/">Moses</a>: a widely-used toolkit implementing most major translation algorithms (C++).</li>
418 <li><a href="http://www.cdec-decoder.org">cdec</a>: a fast decoder for a variety of translation models (C++).</li>
419 <li><a href="http://kheafield.com/code/kenlm/">KenLM</a>: a fast language-modeling toolkit that can be used with the above systems (C++).</li>
420 <li><a href="http://www.speech.sri.com/projects/srilm/">SRI-LM</a>: a widely-used language modeling toolkit with many features, used with the above systems (C++).</li>
421 <li><a href="http://code.google.com/p/giza-pp/">Giza++</a>: a widely-used word alignment toolkit, originally developed at a Johns Hopkins summer workshop (C++).</li>
b03abb4 @alopez Fixed URLs after running link checker
authored
422 <li><a href="http://code.google.com/p/berkeleyaligner/">Berkeley Aligner</a>: a robust implementation of several innovative alignment algorithms (Java).</li>
958f969 @alopez First draft of course page
authored
423 </ul>
424
8acc110 @alopez Adds sidebar links
authored
425 <a name="data"></a><h2>Data</h2>
958f969 @alopez First draft of course page
authored
426 Modern machine translation systems work by learning from large amounts
427 of data. Many datasets are freely available. You should use whatever data
428 is appropriate to the problem that you decide to work on for your project.
429 <ul>
430 <li><a href="http://www.statmt.org/wmt11/translation-task.html">Machine Translation workshop 2011 shared task data</a>, used in research evaluations (French-English, Spanish-English, Czech-English, Haitian Creole-English).</li>
431 <li><a href="http://langtech.jrc.it/JRC-Acquis.html">JRC-Acquis</a>, legislative text of the European Union (22 European languages).</li>
432 <li><a href="http://statmt.org/europarl/">Europarl</a>, proceedings of the European Parliament (22 European languages).</li>
433 <li><a href="http://www.isi.edu/natural-language/download/hansard/">Canadian Hansards</a>, proceedings of the Canadian Parliament (French and English).</li>
ffafd98 @alopez Update course page with readings, homework details, etc.
authored
434 <li><a href="http://opus.lingfil.uu.se/">OPUS</a>, a collection of parallel corpora in a variety of languages and domains, including some interesting ones such as film subtitles.</li>
435 </ul>
436
437 <a name="resources"></a><h2>Other Resources and Classes</h2>
438 <ul>
439 <li>The <a href="http://aclweb.org/anthology-new/">ACL Anthology</a> archives papers published by the Association for Computational Linguistics, which covers a wide variety of topics in natural language processing. It includes many of the classic papers on machine translation.</li>
440 <li>The <a href="http://mt-archive.info/">MT Archive</a> holds historical and modern research papers on machine translation. It overlaps somewhat with the ACL Anthology, but it focuses specifically on machine translation and includes many papers from other venues.</li>
441 <li><a href="http://homepages.inf.ed.ac.uk/pkoehn/">Philipp Koehn</a> maintains <a href="http://statmt.org/">statmt.org</a>, with pointers to various resources.</li>
442 <li><a href="http://www.cs.jhu.edu/~jason/">Jason Eisner</a>'s <a href="http://www.cs.jhu.edu/~jason/465/">natural language processing class</a> at JHU.</li>
443 <li><a href="http://www.cs.jhu.edu/~mdredze/">Mark Dredze</a>'s <a href="http://www.cs.jhu.edu/~mdredze/teaching/2011_600_475/">machine learning class</a> at JHU.</li>
444 <li>Chris Callison-Burch taught a one-week machine translation course at ESSLLI 2005 with <a href="http://homepages.inf.ed.ac.uk/pkoehn/">Philipp Koehn</a>:
445 <a href="http://homepages.inf.ed.ac.uk/pkoehn/publications/esslli-slides-day1.pdf">1</a>,
446 <a href="http://homepages.inf.ed.ac.uk/pkoehn/publications/esslli-slides-day2.pdf">2</a>,
447 <a href="http://homepages.inf.ed.ac.uk/pkoehn/publications/esslli-slides-day3.pdf">3</a>,
448 <a href="http://homepages.inf.ed.ac.uk/pkoehn/publications/esslli-slides-day4.pdf">4</a>,
449 <a href="http://homepages.inf.ed.ac.uk/pkoehn/publications/esslli-slides-day5.pdf">5</a>.</li>
450 <li>Adam Lopez taught a one-week <a href="http://www.cs.jhu.edu/~alopez/esslli2010.html">machine translation course</a> at ESSLLI 2010.</li>
451 <li>Machine translation course at <a href="http://nlg.isi.edu/teaching/cs599mt/">University of Southern California</a>.</li>
452 <li>Machine translation course at <a href="http://www.inf.ed.ac.uk/teaching/courses/mt/">University of Edinburgh</a>.</li>
453 <li>Machine translation course at <a href="https://catalyst.uw.edu/workspace/kristout/20547/123745">University of Washington</a>.</li>
454 <li>Machine translation course at <a href="http://www.cs.cmu.edu/afs/cs/project/cmt-55/lti/Courses/731/www/Spring-11/ClassSchedule2011.htm">Carnegie Mellon University</a>.</li>
455 <li>Machine translation course at <a href="https://sites.google.com/site/comse6998machinetranslation/">Columbia University</a>.</li>
456 <li>Machine translation course at <a href="http://www.cs.sfu.ca/~anoop/teaching/CMPT-882-Fall-2011/">Simon Fraser University</a>.</li>
457 <li>A course in advanced topics in machine translation at <a href="http://www.cs.cmu.edu/afs/cs/project/cmt-55/lti/Courses/734/www/">Carnegie Mellon University</a> runs concurrently with this class.</li>
458 </ul>
459 </div>
460
461 <div class="footer">
462 <p>Last updated on {{ site.time | date: "%B %d, %Y" }}. Site created using
463 <a href="http://git-scm.com/">git</a>,
464 <a href="https://github.com/mojombo/jekyll">jekyll</a>,
465 and <a href="http://www.vim.org/">vim</a>, and hosted on <a href="https://github.com/">github</a>.</p>
466 <p><a rel="license" href="http://creativecommons.org/licenses/by/3.0/">
467 <img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by/3.0/80x15.png" /></a><br />
468 Except where noted, the lectures, assignments, and other material hosted on this page were created by
469 <span xmlns:cc="http://creativecommons.org/ns#" property="cc:attributionName">Adam Lopez, Chris Callison-Burch, and Matt Post</span> and are licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">Creative Commons Attribution 3.0 Unported License</a>.
470 That means you're free to reuse the
471 <a href="https://github.com/mt-class/mt-class.github.com">source code</a>, though please acknowledge that you got it from us. Thanks!</p>
472 </div>
473
474 </body>
475 </html>
476