# List of projects

- As part of this class you have to complete one of the projects below and give a short presentation about it in class (see the syllabus for details on how this factors into your grade).
- Most projects are team efforts, but some can be done on your own.
- Each project has a **difficulty** rating, which also indicates the **suggested minimum group size**.
- Irrespective of the difficulty rating, the **maximum group size is 3**.

## Improving the lecture notebooks (Difficulty: 0)

This project is aimed at students who feel that they could benefit significantly from going over the notebooks a few more times.
Reread the notebooks (including the LIN 120 recap).
Then pick 3 units (which may include the recap units), and for each one of their notebooks add the following:

1. A list of questions you have about the subject matter. This can include requests for clarification, things you're curious about that are not covered there, and so on.
   Any question is a good question.

1. Concrete improvements.
   For instance, if you think you know what a passage means but also think that it is worded poorly, provide a different phrasing.

1. Lots of practice exercises, with solutions.
   The exercises should be short (comparable to LIN 120 exercises).
   The solutions should be comparable to the solutions handed out for assignments in this class.
   That is to say, they shouldn't just contain the answer, but also explain why this is the answer, what alternative solutions are and why they're better or worse, and why some solutions that students might try don't actually work.
   This will take up most of your time and is the main component of your grade for this project.

If multiple students pick this project, they must work on distinct units.

## Calculating arbitrary edit distances (Difficulty: 1)

This project is comparable in difficulty to a challenge task in a homework assignment (but since you have a lot more time to work on it, it isn't quite as challenging).
The notebooks cover how to efficiently calculate Levenshtein distance with dynamic programming techniques, but the lecture notes describe a number of other edit distances.
Generalize the code for the Levenshtein distance so that it can be used to calculate the distance for any one of these metrics.
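To give a sense of what "generalize" could mean, here is a minimal sketch. It assumes that the edit distances from the lecture notes differ only in which operations are allowed and what each operation costs. The function name `edit_distance` and its cost parameters are hypothetical and do not come from the notebooks. The defaults yield Levenshtein distance. Indel distance (no substitutions) and Hamming distance (substitutions only) follow from making some operations infinitely expensive. Distances that add new operations, such as transposition in Damerau-Levenshtein distance, would need an extra case in the recurrence.

```python
import math
from typing import Callable


def edit_distance(
    source: str,
    target: str,
    ins_cost: Callable[[str], float] = lambda c: 1,
    del_cost: Callable[[str], float] = lambda c: 1,
    sub_cost: Callable[[str, str], float] = lambda a, b: 0 if a == b else 1,
) -> float:
    """Dynamic-programming edit distance with pluggable operation costs.

    With the default costs this is Levenshtein distance.
    """
    n, m = len(source), len(target)
    # table[i][j] holds the cheapest way of turning source[:i] into target[:j]
    table = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = table[i - 1][0] + del_cost(source[i - 1])
    for j in range(1, m + 1):
        table[0][j] = table[0][j - 1] + ins_cost(target[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = min(
                table[i - 1][j] + del_cost(source[i - 1]),        # deletion
                table[i][j - 1] + ins_cost(target[j - 1]),        # insertion
                table[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),  # substitution or match
            )
    return table[n][m]


def indel(a: str, b: str) -> float:
    # Substituting one symbol for a different one is forbidden
    return edit_distance(a, b, sub_cost=lambda x, y: 0 if x == y else math.inf)


def hamming(a: str, b: str) -> float:
    # Only substitutions are allowed; strings of different length come out as infinite
    return edit_distance(a, b, ins_cost=lambda c: math.inf, del_cost=lambda c: math.inf)
```

For example, `edit_distance("kitten", "sitting")` gives 3, `indel("kitten", "sitting")` gives 5, and `hamming("karolin", "kathrin")` gives 3. Because the costs are functions of the symbols involved, the same code can also express weighted variants, such as cheaper substitutions between two vowels.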

## Improved spell checker (Difficulty: 2)

Write a spell checker that not only detects misspellings, but also offers suggestions.
The suggestions should be ranked.
It is your job to decide on a useful ranking.
Relevant factors might include Levenshtein distance, frequency of the suggestions, and fit with the preceding words, among others.
Do not try to take phonetic similarity into account: it requires a pronunciation corpus, which you might not be able to get.

Suggested techniques: dynamic programming, n-grams
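The following is a minimal sketch of one possible ranking. Candidates are sorted first by Levenshtein distance and then by corpus frequency. The vocabulary and its counts (`FREQUENCIES`) are made-up toy numbers, and `suggest` is a hypothetical name. The sketch does not score fit with the preceding words, which is where an n-gram model would come in.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming Levenshtein distance, keeping only two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


# Hypothetical toy word counts; a real checker would take these from a corpus
FREQUENCIES = {"the": 500, "they": 120, "then": 80, "than": 60, "hen": 5, "cat": 40}


def suggest(word, frequencies, max_distance=2, limit=5):
    """Rank known words near `word`: smaller distance first, then higher frequency."""
    if word in frequencies:
        return []  # known word, nothing to correct
    candidates = []
    for candidate, freq in frequencies.items():
        distance = levenshtein(word, candidate)
        if distance <= max_distance:
            # Negate the frequency so that sorting puts frequent words first
            candidates.append((distance, -freq, candidate))
    candidates.sort()
    return [candidate for _, _, candidate in candidates[:limit]]


print(suggest("thn", FREQUENCIES))
```

This compares the input against every vocabulary word, which is fine for a sketch but slow for a large lexicon. Adding a context score (for example, bigram probability given the previous word) would mean replacing the sort key with a weighted combination.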

## Poetry generator (Difficulty: 2)

Write a program that automatically creates poems.
The notion of poem is deliberately old-school:

1. there has to be a meter that's obeyed by each line, and
1. there has to be some kind of rhyming pattern

Your solution should be able to generate a virtually infinite number of distinct poems.
So you can't just create a list of fixed poems and have the program choose between them.
You also shouldn't just rely on a prefabricated template with one or two gaps that get filled by arbitrary words.
Basically: if a naive user can easily figure out after a certain number of runs how you're doing it, your solution isn't sophisticated enough.

Ideally, your poems should survive the [bot or not challenge](http://botpoet.com/): when presented as part of a collection of poems from *bot or not*, your poem should be classified as bot-generated less than 50% of the time (try it on your mom).

Suggested techniques: n-grams, POS tags, finite-state automata
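Here is a minimal sketch of the meter-and-rhyme part only. It assumes a small hand-made lexicon in which each word has a stress pattern ("1" for a stressed syllable, "0" for an unstressed one) and a rhyme key. Lines are then filled by backtracking search until their stresses spell out the meter. The lexicon, the rhyme keys, and the function names are all hypothetical. Words are picked at random, so the output is metrical but not grammatical. A real solution would also constrain word choice with n-grams or POS templates, as suggested above.

```python
import random

# Hypothetical hand-made lexicon: word -> (stress pattern, rhyme key).
# "1" = stressed syllable, "0" = unstressed; a rhyme key of None means the word never ends a line.
LEXICON = {
    "the": ("0", None),
    "a": ("0", None),
    "and": ("0", None),
    "is": ("0", None),
    "above": ("01", "ove"),
    "river": ("10", "iver"),
    "silver": ("10", "ilver"),
    "moon": ("1", "oon"),
    "soon": ("1", "oon"),
    "bright": ("1", "ight"),
    "night": ("1", "ight"),
    "cold": ("1", "old"),
    "gold": ("1", "old"),
}


def fill_line(meter, rhyme, avoid, rng):
    """Backtracking search for words whose stress patterns spell out `meter`.

    The last word must have rhyme key `rhyme` and must not be in `avoid`.
    Returns a list of words, or None if no such line exists.
    """
    if meter == "":
        return []
    words = list(LEXICON)
    rng.shuffle(words)
    for word in words:
        stress, key = LEXICON[word]
        if not meter.startswith(stress):
            continue
        rest = meter[len(stress):]
        # The line-final word must carry the required rhyme and not repeat the partner line's word
        if rest == "" and (key != rhyme or word in avoid):
            continue
        tail = fill_line(rest, rhyme, avoid, rng)
        if tail is not None:
            return [word] + tail
    return None


def rhyme_groups(meter):
    """Rhyme keys with at least two words that could end a line in this meter."""
    groups = {}
    for word, (stress, key) in LEXICON.items():
        if key is not None and meter.endswith(stress):
            groups.setdefault(key, []).append(word)
    return sorted(key for key, words in groups.items() if len(words) >= 2)


def couplets(count, meter, seed=None):
    """Generate `count` rhyming couplets (rhyme scheme AABB...) in the given meter."""
    rng = random.Random(seed)
    keys = rhyme_groups(meter)
    lines = []
    for _ in range(count):
        key = rng.choice(keys)
        first = fill_line(meter, key, set(), rng)
        second = fill_line(meter, key, {first[-1]}, rng)
        lines += [" ".join(first), " ".join(second)]
    return lines


for line in couplets(2, "010101", seed=7):  # iambic trimeter
    print(line)
```

Checking a line against the meter amounts to recognizing a fixed string of stress marks, so the same check could also be phrased as a finite-state automaton over stress symbols.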

## Text adventure (Difficulty: 2)

A text adventure is a bit like a Choose-Your-Own-Adventure book, except that the user doesn't choose from a fixed list of options but rather enters text to tell the computer what they want to do.
The key coding aspect of a text adventure is the text parser that interprets the user input and maps it to one of the available actions.
For a concrete example of how text adventures work, check out [this YouTube video](https://www.youtube.com/watch?v=PWQDccL0aXM).

This project has two central challenges.
One is finding a good data structure for storing the text passages, linking them together, and associating them with available actions.
The other is handling user input.
This requires tokenization, normalizing capitalization, simple spelling correction without user feedback (not nearly as sophisticated as the dedicated spellchecker project), and extracting keywords.

Suggested techniques: list of keywords + Levenshtein distance for spell checking, regular expressions for tokenization and keyword extraction
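Below is a minimal sketch of both challenges, and it is only one of many possible designs. The passages are stored as a dictionary that maps a passage id to its text and to the action keywords leading out of it. The input is lowercased and tokenized with a regular expression, and each token is matched against the available keywords with a Levenshtein threshold. The story content, `PASSAGES`, and all function names are hypothetical. The sketch ignores objects, inventory, and multi-word commands.

```python
import re

# Hypothetical story graph: passage id -> text plus a map from action keywords to the next passage id
PASSAGES = {
    "cave": {
        "text": "You are in a dark cave. A narrow tunnel leads north.",
        "actions": {"north": "tunnel", "light": "cave_lit"},
    },
    "cave_lit": {
        "text": "Your torch reveals a rusty key on the floor.",
        "actions": {"north": "tunnel"},
    },
    "tunnel": {
        "text": "The tunnel ends at a locked door.",
        "actions": {"south": "cave"},
    },
}


def tokenize(text):
    """Normalize capitalization and split the input into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def levenshtein(a, b):
    """Standard dynamic-programming Levenshtein distance, keeping only two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def match_action(user_input, keywords, max_distance=1):
    """Return the keyword closest to some input token, if it is within `max_distance` edits."""
    keywords = list(keywords)
    if not keywords:
        return None
    for token in tokenize(user_input):
        best = min(keywords, key=lambda kw: levenshtein(token, kw))
        if levenshtein(token, best) <= max_distance:
            return best
    return None


def step(passage_id, user_input):
    """Move to the passage selected by the user's input; stay put if nothing matches."""
    actions = PASSAGES[passage_id]["actions"]
    keyword = match_action(user_input, actions)
    if keyword is None:
        return passage_id
    return actions[keyword]


print(PASSAGES[step("cave", "Go NOTH!")]["text"])  # misspelled "north" still matches
```

A fixed threshold of one edit is crude. Very short tokens can accidentally match short keywords, so a real parser might scale the threshold with word length or ignore function words.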

## Finite-state transducer implementation of a language's phonology (Difficulty: 1-3; depends on complexity of rewrite rules)

As you know, phonology uses rewrite rules to describe the mapping from underlying forms to surface realizations.
Each rewrite rule can actually be translated to a finite-state transducer.
Hence it is possible to implement a phonological grammar as a collection of finite-state transducers that are run one after the other.
For this project, you would pick a language of your choice and implement a rewrite-rule based description of its phonology in terms of finite-state transducers.

Make sure you consult with me before picking a language.
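To illustrate the idea, here is a minimal sketch that uses toy rules over letters rather than any real language's analysis. It handles only rules whose right context is the word boundary, of the form x → y / _ #. Rules with general left and right contexts need larger state sets. Each rule becomes a deterministic transducer that holds back a potentially affected segment until it knows whether the word ends there. The grammar is a cascade that feeds each rule's output into the next rule. The class name, `word_final_rule`, and both example rules are hypothetical. The devoicing rule is loosely modeled on word-final devoicing as found in, for example, German.

```python
import string


class SubsequentialTransducer:
    """Deterministic transducer that may emit extra output when the input ends."""

    def __init__(self, start, delta, final_output):
        self.start = start
        self.delta = delta                # (state, symbol) -> (next state, output string)
        self.final_output = final_output  # state -> output string emitted at the end of the word

    def transduce(self, word):
        state, output = self.start, []
        for symbol in word:
            state, emitted = self.delta[(state, symbol)]
            output.append(emitted)
        output.append(self.final_output[state])
        return "".join(output)


def word_final_rule(alphabet, mapping):
    """Transducer for the rule  x -> mapping[x] / _ #  for every x in mapping.

    Each state is the (possibly empty) segment that has been read but not yet output,
    because we cannot know whether it is word-final until we see what follows.
    """
    states = [""] + sorted(mapping)
    delta, final_output = {}, {}
    for held in states:
        for symbol in alphabet:
            if symbol in mapping:
                # Something follows the held segment, so it surfaces unchanged; hold the new one
                delta[(held, symbol)] = (symbol, held)
            else:
                delta[(held, symbol)] = ("", held + symbol)
        # At the word boundary the held segment (if any) is rewritten
        final_output[held] = mapping.get(held, "")
    return SubsequentialTransducer("", delta, final_output)


def cascade(transducers, word):
    """Apply the rules in order, feeding each output into the next rule."""
    for transducer in transducers:
        word = transducer.transduce(word)
    return word


ALPHABET = string.ascii_lowercase
# Toy rules over spelling rather than real phonological segments
devoicing = word_final_rule(ALPHABET, {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"})
apocope = word_final_rule(ALPHABET, {"e": ""})

print(cascade([apocope, devoicing], "tage"))  # apocope feeds devoicing
print(cascade([devoicing, apocope], "tage"))  # the other order: devoicing no longer applies
```

The two calls show why rule order matters in a cascade. Deleting the final *e* first exposes *g* to devoicing (giving "tak"), while the reverse order leaves the *g* voiced (giving "tag").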

## Finite-state automata as Boolean matrix multiplication (Difficulty: 3, but can be done by a single person)

This project is for the mathematically inclined only.
Finite-state automata can be represented as a collection of matrices, one per symbol, where all cells are either True or False.
These are called *Boolean matrices*.
The process of determining whether a string is accepted by an automaton is equivalent to a specific sequence of matrix multiplications.

This project involves two components:

1. Read up on the connection between automata and Boolean matrix multiplication.
   The reading materials will be supplied by me, but you should be comfortable reading mathematical notation.
1. Implement code that converts an automaton into its equivalent Boolean matrix representation, as well as an alternative to the `.accepts` method that uses Boolean matrix multiplication.
   As part of this, you will have to learn how matrices are handled in Python.
   
Suggested techniques: memoization for matrix multiplication, the `numpy` package for representing matrices
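The following minimal sketch shows the standard construction, and it is meant as orientation rather than as the reading materials' formulation. Each symbol *a* gets an n × n Boolean matrix in which cell (i, j) is True iff there is an *a*-transition from state i to state j. A row vector of initial states is multiplied by the matrices of the string's symbols in order, and the result is then checked against a column vector of final states. For this sketch, the automaton is a plain Python description (a list of states, `(source, symbol, target)` triples, and sets of initial and final states), not the notebooks' automaton class. The function names are hypothetical.

```python
import numpy as np


def to_boolean_matrices(states, alphabet, transitions):
    """Build one n x n Boolean matrix per symbol from (source, symbol, target) triples."""
    index = {q: i for i, q in enumerate(states)}
    n = len(states)
    matrices = {a: np.zeros((n, n), dtype=bool) for a in alphabet}
    for source, symbol, target in transitions:
        matrices[symbol][index[source], index[target]] = True
    return index, matrices


def bool_matmul(x, y):
    """Boolean matrix product: cell (i, j) is True iff x[i, k] and y[k, j] for some k."""
    # Multiply as integers (this counts paths), then map every positive count back to True
    return (x.astype(int) @ y.astype(int)) > 0


def accepts(word, states, alphabet, transitions, initial, final):
    index, matrices = to_boolean_matrices(states, alphabet, transitions)
    n = len(states)
    # Row vector marking the initial states
    current = np.zeros((1, n), dtype=bool)
    for q in initial:
        current[0, index[q]] = True
    # Each multiplication advances the set of reachable states by one symbol
    for symbol in word:
        current = bool_matmul(current, matrices[symbol])
    # Column vector marking the final states
    final_vector = np.zeros((n, 1), dtype=bool)
    for q in final:
        final_vector[index[q], 0] = True
    return bool(bool_matmul(current, final_vector)[0, 0])


# Example: strings over {a, b} with an even number of a's
STATES = [0, 1]
TRANSITIONS = [(0, "a", 1), (1, "a", 0), (0, "b", 0), (1, "b", 1)]
print(accepts("aba", STATES, "ab", TRANSITIONS, {0}, {0}))
```

Because matrix multiplication is associative, the matrix for a whole string can be assembled from the matrices of its substrings. This is where memoizing frequently used products could save work.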

## Suggest an independent project

Many of you are already engaged in research or coding projects.
Your class project can be something that actively contributes to that.
If you want to do this, create a new thread in this folder and describe there what the project would look like.
Others can chime in too.
I'll let you know whether I think it's a feasible project for this class.