This repo contains materials for Introduction to Computational Linguistics, a course taught by Aaron Steven White at the University of Rochester.
This course covers foundational concepts in computational linguistics. Major focus is placed on the use of formal languages as a tool for understanding natural language as well as on developing students' ability to implement foundational algorithms pertaining to those formal languages. Topics include basic formal language theory, finite-state phonological and morphological parsing, and syntactic parsing for context-free grammars and mildly context-sensitive formalisms.
This course relies on concepts covered in an introductory linguistics course and an introductory programming course. With respect to the latter, it specifically assumes that you can competently write scripts that do non-trivial things and that you can work with Python's object-oriented programming facilities, though not necessarily that you can develop a package on your own.
Aaron Steven White is an Associate Professor of Linguistics and Computer Science at the University of Rochester, where he directs the Formal and Computational Semantics lab (FACTS.lab). His research investigates the relationship between linguistic expressions and conceptual categories that undergird the human ability to convey information about possible past, present, and future configurations of things in the world.
In addition to being a principal investigator on numerous federally funded grants and contracts, White is the recipient of a National Science Foundation Faculty Early Career Development (CAREER) award. His work has appeared in a variety of linguistics, cognitive science, and natural language processing venues, including Semantics & Pragmatics, Glossa, Language Acquisition, Cognitive Science, Cognitive Psychology, Transactions of the Association for Computational Linguistics, and Empirical Methods in Natural Language Processing.
The site itself is built using Quarto. The source files for this site are available on GitHub at aaronstevenwhite/intro-to-cl. You can obtain the files by cloning this repo.
git clone https://github.com/aaronstevenwhite/intro-to-cl.git
All further code on this page assumes that you are inside of this cloned repo.
cd intro-to-cl
To build this site, you will need to install Quarto as well as its include-code-files and line-highlight extensions.
quarto add quarto-ext/include-code-files
quarto add shafayetShafee/line-highlight
These extensions are mainly used for including and highlighting parts of external files.
All pages that have executed code blocks are generated from Jupyter notebooks, which were run within a Docker container built from the Dockerfile contained in this repo.
Assuming you have Docker installed, the image can be built using:
docker build --platform linux/amd64 -t intro-to-cl .
A container based on this image can then be started using:
docker run -it --rm -p 8888:8888 -v "${PWD}":/home/jovyan/work intro-to-cl
To access Jupyter, copy the link printed when you run this command into your browser. The output should look something like this (though your access token will differ):
To access the server, open this file in a browser:
file:///home/jovyan/.local/share/jupyter/runtime/jpserver-8-open.html
Or copy and paste one of these URLs:
http://4738b6192fb0:8888/lab?token=8fc165776e7e99c98ec19883f750071a187e85a0a9253b81
http://127.0.0.1:8888/lab?token=8fc165776e7e99c98ec19883f750071a187e85a0a9253b81
You can change the port on your machine that Docker forwards to the container by changing the first 8888 in the -p 8888:8888 option, e.g. -p 10000:8888 to use port 10000. Just remember to change the port you access in your browser accordingly: even though the message above tells you to access port 8888, that is the Docker container's port 8888, which is reachable from your machine on port 10000.
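The port remapping described above can be sketched in shell as follows. This is only an illustration: local_url is a hypothetical helper that is not part of this repo, the token is a placeholder, and the docker command is the one shown earlier with only the host side of the -p option changed.

```shell
# Forward your machine's port 10000 to the container's port 8888
# (the earlier command with only the first number in -p changed):
#   docker run -it --rm -p 10000:8888 -v "${PWD}":/home/jovyan/work intro-to-cl

# Hypothetical helper (not part of this repo): rewrite the URL printed
# by Jupyter, which refers to the container's port 8888, so that it
# points at the host port chosen in -p HOST:8888.
local_url() {
  # $1: URL printed by Jupyter; $2: host port
  printf '%s\n' "$1" | sed "s|:8888/|:$2/|"
}

# Placeholder token; substitute the one printed by your own container.
local_url 'http://127.0.0.1:8888/lab?token=<your-token>' 10000
```

With the container started using -p 10000:8888, the URL this prints is the one to open in your browser, in place of the port 8888 URL shown in the Jupyter message.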
The development of these materials was supported by the University of Rochester and a National Science Foundation grant: CAREER: Logical Form Induction (BCS/IIS-2237175).
Introduction to Computational Linguistics by Aaron Steven White is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at https://github.com/aaronstevenwhite/intro-to-cl.