A course introducing computational linguistics to advanced undergraduates and early graduate students in linguistics.

aaronstevenwhite/intro-to-cl

Introduction to Computational Linguistics

This repo contains materials for Introduction to Computational Linguistics, a course taught by Aaron Steven White at the University of Rochester.

About the course

This course covers foundational concepts in computational linguistics. Major focus is placed on the use of formal languages as a tool for understanding natural language as well as on developing students' ability to implement foundational algorithms pertaining to those formal languages. Topics include basic formal language theory, finite state phonological and morphological parsing, and syntactic parsing for context free grammars and mildly context sensitive formalisms.

Prerequisites

This course relies on concepts covered in an introductory linguistics course and an introductory programming course. With respect to the latter, it assumes that you can write scripts that do non-trivial things and can work competently with Python's object-oriented programming facilities, though not necessarily that you can develop a package on your own.

About the instructor

Aaron Steven White is an Associate Professor of Linguistics and Computer Science at the University of Rochester, where he directs the Formal and Computational Semantics lab (FACTS.lab). His research investigates the relationship between linguistic expressions and conceptual categories that undergird the human ability to convey information about possible past, present, and future configurations of things in the world.

In addition to being a principal investigator on numerous federally funded grants and contracts, White is the recipient of a National Science Foundation Faculty Early Career Development (CAREER) award. His work has appeared in a variety of linguistics, cognitive science, and natural language processing venues, including Semantics & Pragmatics, Glossa, Language Acquisition, Cognitive Science, Cognitive Psychology, Transactions of the Association for Computational Linguistics, and Empirical Methods in Natural Language Processing.

Installation

The course site is built using Quarto. The source files for the site are available on GitHub at aaronstevenwhite/intro-to-cl. You can obtain the files by cloning this repo.

git clone https://github.com/aaronstevenwhite/intro-to-cl.git

All further code on this page assumes that you are inside of this cloned repo.

cd intro-to-cl

Installing Quarto and extensions

To build this site, you will need to install Quarto as well as its include-code-files and line-highlight extensions.

quarto add quarto-ext/include-code-files
quarto add shafayetShafee/line-highlight

These extensions are mainly used for including and highlighting parts of external files.

Building the Docker container

All pages that have executed code blocks are generated from Jupyter notebooks, which were run within a Docker container built from the Dockerfile contained in this repo.

Assuming you have Docker installed, the image can be built using:

docker build --platform linux/amd64 -t intro-to-cl .

A container based on this image can then be constructed using:

docker run -it --rm -p 8888:8888 -v "${PWD}":/home/jovyan/work intro-to-cl

To access Jupyter, copy the link printed when you run this command. It should look something like this (though your access token will differ):

To access the server, open this file in a browser:
    file:///home/jovyan/.local/share/jupyter/runtime/jpserver-8-open.html
Or copy and paste one of these URLs:
    http://4738b6192fb0:8888/lab?token=8fc165776e7e99c98ec19883f750071a187e85a0a9253b81
    http://127.0.0.1:8888/lab?token=8fc165776e7e99c98ec19883f750071a187e85a0a9253b81

You can change the port on your machine that Docker forwards to the container by changing the first 8888 in the -p 8888:8888 option, e.g. -p 10000:8888 to use port 10000. Just remember to change the port you access in your browser correspondingly: even though the message above has you accessing port 8888, that is the Docker container's port 8888, which is reachable through your machine's port 10000.
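As a rough illustration of the port remapping described above, the sketch below builds the docker run command for a host port of 10000 and rewrites the URL that Jupyter prints so that it points at that host port. The variable names (HOST_PORT, CONTAINER_PORT, PRINTED_URL, BROWSER_URL, DOCKER_CMD) and the placeholder token are assumptions made for this example, not part of the repo. The command is printed rather than executed so you can inspect it first.

```shell
# Hypothetical values: any free port on your machine can serve as HOST_PORT.
HOST_PORT=10000
CONTAINER_PORT=8888

# Build the same docker run command as above, with the host port swapped in.
# \${PWD} is escaped so it is expanded when you actually run the command.
DOCKER_CMD="docker run -it --rm -p ${HOST_PORT}:${CONTAINER_PORT} -v \"\${PWD}\":/home/jovyan/work intro-to-cl"
echo "$DOCKER_CMD"

# Jupyter still reports the container's port in its startup message.
# The token below is a placeholder; use the one printed in your terminal.
PRINTED_URL="http://127.0.0.1:${CONTAINER_PORT}/lab?token=<your-token>"

# Replace the container port with the host port to get the URL for your browser.
BROWSER_URL=$(printf '%s\n' "$PRINTED_URL" | sed "s|:${CONTAINER_PORT}/|:${HOST_PORT}/|")
echo "$BROWSER_URL"
```

Only the port in the URL changes; the path and token stay exactly as Jupyter printed them.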

Acknowledgments

The development of these materials was supported by the University of Rochester and a National Science Foundation grant: CAREER: Logical Form Induction (BCS/IIS-2237175).

License

Introduction to Computational Linguistics by Aaron Steven White is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at https://github.com/aaronstevenwhite/intro-to-cl.
