In [10]:
%%html
<script>
  function code_toggle() {
    if (code_shown){
      $('div.input').hide('500');
      $('#toggleButton').val('Show Code')
    } else {
      $('div.input').show('500');
      $('#toggleButton').val('Hide Code')
    }
    code_shown = !code_shown
  }

  $( document ).ready(function(){
    code_shown=false;
    $('div.input').hide()
  });
</script>
<form action="javascript:code_toggle()"><input type="submit" id="toggleButton" value="Show Code"></form>

# NDAK18000U Overview
Content at [github.com/coastalcph/nlp-course](https://github.com/coastalcph/nlp-course).
Click on slides for *Course Logistics*.

### NDAK18000U Details

- **Course Organizer**: [Daniel Hershcovich](https://danielhers.github.io/)
- **Teachers**: Daniel Hershcovich and [Anders Søgaard](https://anderssoegaard.github.io/)
- **Teaching Assistants**:
    - Ruchira Dhar, Zixuan Xu, Christian Jensen, Thomas Brun Lau Christensen

### NDAK18000U Schedule

- Lectures:  
    -  Tuesdays, 13-15 in Aud 05, Universitetsparken 5 (HCØ), Weeks 36-41 + 43-44
    
- Lab Sessions:
    - Group 1: Mondays, 10-12 in the old library (4-0-17), Universitetsparken 1, Weeks 37-41 + 43-44
    - Group 2: Fridays, 10-12 in the old library (4-0-17), Universitetsparken 1, Weeks 36-41 + 43-44

We will assign you to one of two lab session groups based on your answers to the [Getting to Know You survey](https://absalon.ku.dk/courses/76988/assignments/210867).
If you have not filled it in yet, do it as soon as possible.
You will receive an announcement about your assignment before the first lab session.

In [13]:
%cd ..

/home/daniel/nlp-course


In [14]:
import re
from IPython.display import Markdown, display

with open('README.md', 'r') as f:
    content = f.read()

# Regular expression to find all <a href="..."> that don't start with 'http'
pattern = re.compile(r'<a href=([\'"])(?!http)([^\'"]+)\1>')

# Function to prepend '../' to each link
def replacer(match):
    quote = match.group(1)
    link = match.group(2)
    new_link = f"../{link}"
    return f'<a href={quote}{new_link}{quote}>'

# Replace links in content to be relative to the top directory of the repo
display(Markdown(pattern.sub(replacer, content)))
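
The link rewriting in the cell above can be checked on a small input. The following is a minimal sketch: the `sample` string is a hypothetical snippet, not taken from `README.md`, while the pattern and the `replacer` logic mirror the cell above.

```python
import re

# Same pattern as above: <a href="..."> targets that do not start with 'http'
pattern = re.compile(r'<a href=([\'"])(?!http)([^\'"]+)\1>')

def replacer(match):
    # Keep the original quote character and prepend '../' to the target
    quote, link = match.group(1), match.group(2)
    return f'<a href={quote}../{link}{quote}>'

# Hypothetical sample: one relative link and one absolute link
sample = (
    "<a href='chapters/example.ipynb'>slides</a> "
    "<a href='https://example.org/'>external</a>"
)
rewritten = pattern.sub(replacer, sample)
print(rewritten)
```

Only the relative target gains the `../` prefix; the absolute link is left unchanged. Markdown-style links such as `[text](INSTALL.md)` are not matched by this pattern, so they keep their original targets.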

# Natural Language Processing (NDAK18000U)
## Course at the University of Copenhagen

Materials from this interactive book are used throughout the Natural Language Processing course at the Department of Computer Science, University of Copenhagen. The official course description can be found [here](https://kurser.ku.dk/course/ndak18000u). Materials covered each week are listed below. The course schedule and materials are tentative and subject to minor changes. Most reading material is from [Speech and Language Processing by Jurafsky & Martin](https://web.stanford.edu/~jurafsky/slp3).

<table><tr><th>Week</th><th>Reading (before lecture)</th><th>Lecture (Tuesday)</th><th>Lab (Friday &amp; Monday)</th><th>Lab notebook</th></tr>
     <tr><td>36</td><td>
      <a href='https://web.stanford.edu/~jurafsky/slp3/2.pdf'>Chapter 2 up to end of 2.5</a><br>
      <a href='https://web.stanford.edu/~jurafsky/slp3/4.pdf'>Chapter 4 up to end of 4.5</a><br>
      <a href='https://web.stanford.edu/~jurafsky/slp3/5.pdf'>Chapter 5 up to end of 5.6</a><br>
      </td><td>3. Sep. 2024:<br>
      Course Logistics (<a href='../chapters/course_logistics.ipynb'>slides</a>)<br>
      Introduction to NLP (<a href='../chapters/intro_short.ipynb'>slides</a>)<br>
      Tokenisation &amp; Sentence Splitting (<a href='../chapters/tokenization.ipynb'>notes</a>, <a href='../chapters/tokenization_slides.ipynb'>slides</a>, <a href='../exercises/tokenization.ipynb'>exercises</a>)<br>
      Text Classification (<a href='../chapters/doc_classify_slides_short.ipynb'>slides</a>)<br>
      </td><td>6. &amp; 9. Sep. 2024:<br>
      Jupyter notebook setup, introduction to <a href='https://colab.research.google.com/'>Colab</a><br>
      Introduction to <a href='https://pytorch.org/tutorials/'>PyTorch</a><br>
      Project group arrangements<br>
      Questions about the course project<br>
      </td><td><a href='../labs/notebooks_2024/lab_1.ipynb'>lab 1</a></td></tr>
     <tr><td>37</td><td>
      <a href='https://web.stanford.edu/~jurafsky/slp3/3.pdf'>Chapter 3 up to end of 3.5</a><br>
      <a href='https://web.stanford.edu/~jurafsky/slp3/6.pdf'>Chapter 6 up to end of 6.4</a><br>
      <a href='https://web.stanford.edu/~jurafsky/slp3/7.pdf'>Chapter 7 up to end of 7.5</a><br>
      </td><td>10. Sep. 2024:<br>
      Language Modelling (<a href='../chapters/language_models_slides.ipynb'>slides</a>)<br>
      Word Embeddings (<a href='../chapters/dl-representations_simple.ipynb'>slides</a>)<br>
      </td><td>13. &amp; 16. Sep. 2024:<br>
      Word representations and sentiment classification<br>
      Project help<br>
      </td><td><a href='../labs/notebooks_2024/lab_2.ipynb'>lab 2</a></td></tr>
     <tr><td>38</td><td>
      <a href='https://web.stanford.edu/~jurafsky/slp3/7.pdf'>Chapter 7 up to end of 7.6</a><br>
      <a href='https://web.stanford.edu/~jurafsky/slp3/8.pdf'>Chapter 8 up to end of 8.7</a>
     </td><td>17. Sep. 2024:<br>
      Recurrent Neural Networks (<a href='../chapters/rnn_slides_ucph.ipynb'>slides</a>)<br>
      Neural Language Models (<a href='../chapters/dl-representations_contextual.ipynb'>slides</a>)<br>
      </td><td>20. &amp; 23. Sep. 2024:<br>
      Error analysis and explainability<br>
      Project help<br>
      </td><td><a href='../labs/notebooks_2024/lab_3.ipynb'>lab 3</a></td></tr>
    <tr><td>39</td><td>
      <a href='https://web.stanford.edu/~jurafsky/slp3/17.pdf'>Chapter 17 up to end of 17.3</a><br>
      <a href='https://web.stanford.edu/~jurafsky/slp3/19.pdf'>Chapter 19 up to end of 19.2</a>
      </td><td>24. Sep. 2024:<br>
      Sequence Labelling (<a href='../chapters/sequence_labeling_slides.ipynb'>slides</a>, <a href='../chapters/sequence_labeling.ipynb'>notes</a>)<br>
      Parsing (<a href='../chapters/dependency_parsing_slides_active.ipynb'>slides</a>)<br>
      </td><td>27. &amp; 30. Sep. 2024:<br>
      Sequence labelling and beam search<br>
      Project help<br>
      </td><td><a href='../labs/notebooks_2024/lab_4.ipynb'>lab 4</a></td></tr>
     <tr><td>40</td><td>
      <a href='https://web.stanford.edu/~jurafsky/slp3/8.pdf'>Chapter 8 up to end of 8.8</a><br>
      <a href='https://web.stanford.edu/~jurafsky/slp3/9.pdf'>Chapter 9 up to end of 9.2</a><br>
      <a href='https://web.stanford.edu/~jurafsky/slp3/10.pdf'>Chapter 10 up to end of 10.2</a><br>
      <a href='https://web.stanford.edu/~jurafsky/slp3/11.pdf'>Chapter 11</a>
      </td><td>1. Oct. 2024:<br>
      Attention (<a href='../chapters/attention_slides2.ipynb'>slides</a>)<br>
      Transformers (<a href='../chapters/dl-representations_contextual_transformers.ipynb'>slides</a>)<br>
      </td><td>4. &amp; 7. Oct. 2024:<br>
      Language Models with <a href='https://huggingface.co/course/chapter1'>Transformers</a> and RNNs<br>
      Project help<br>
      </td><td><a href='../labs/notebooks_2024/lab_5.ipynb'>lab 5</a></td></tr>
     <tr><td>41</td><td>
      <a href='https://web.stanford.edu/~jurafsky/slp3/14.pdf'>Chapter 14</a><br>
      <a href='https://web.stanford.edu/~jurafsky/slp3/20.pdf'>Chapter 20</a>
      </td><td>8. Oct. 2024:<br>
      Information Extraction (<a href='../chapters/information_extraction_slides.ipynb'>slides</a>)<br>
      Question Answering (<a href='../chapters/question_answering_slides.ipynb'>slides</a>)<br>
      </td><td>11. &amp; 21. Oct. 2024:<br>
      In-depth look at Transformers and Multilingual QA<br>
      Project help<br>
      </td><td><a href='../labs/notebooks_2024/lab_6.ipynb'>lab 6</a></td></tr>
    <tr><td>43</td><td>
      <a href='https://web.stanford.edu/~jurafsky/slp3/12.pdf'>Chapter 12</a><br>
      <a href='https://web.stanford.edu/~jurafsky/slp3/13.pdf'>Chapter 13</a><br>
      </td><td>22. Oct. 2024:<br>
      Machine Translation (<a href='../chapters/nmt_slides_active.ipynb'>slides</a>)<br>
      Transfer Learning (<a href='../chapters/xling_transfer_learning_slides.ipynb'>slides</a>)<br>
      </td><td>25. &amp; 28. Oct. 2024: Project help.</td><td></td></tr>
    <tr><td>44</td><td>
      <a href='https://aclanthology.org/Q19-1004.pdf'>Belinkov and Glass, 2019</a>
      </td><td>29. Oct. 2024:<br>
      Interpretability (<a href='../chapters/interpretability_slides.ipynb'>slides</a>)<br>
      </td><td>1. Nov. 2024: Project help.</td><td></td></tr></table>

The easiest way to view the course content is via the static [nbviewer](https://nbviewer.jupyter.org/github/coastalcph/nlp-course/blob/master/overview.ipynb). 
To be able to make changes to the book and render it dynamically, see the [installation instructions](INSTALL.md).


### Course Requirements
* Familiarity with machine learning (probability theory, linear algebra, classification)
* Knowledge of programming (Python)
* No prior knowledge of natural language processing or linguistics is required

Relevant machine learning competencies can be obtained through one of the following courses: 
* [NDAK22000U Machine Learning A (MLA)](https://kurser.ku.dk/course/ndak22000u) and/or [NDAK22001U Machine Learning B (MLB)](https://kurser.ku.dk/course/ndak22001u)
* [NDAK16003U Introduction to Data Science (IDS)](https://kurser.ku.dk/course/ndak16003u)
* [NDAB23000U Grundlæggende Data Science (GDS)](https://kurser.ku.dk/course/ndab23000u)
* [Machine Learning, Coursera](https://www.coursera.org/learn/machine-learning)

See also the [course description](https://kurser.ku.dk/course/ndak18000u).

### About You: previously taken courses related to NLP?

<img src="../img/survey_q1.png" width=100%>

### About You: previously taken courses in Machine Learning?

<center>
<img src="../img/survey_q2.png" width=60%>
</center>

### About You: experience with using neural network software libraries?

<img src="../img/survey_q3.png" width=100%>

### About You: degree you are enrolled in

<center>
<img src="../img/survey_q4.png" width=70%>
</center>

### About You: what you want to get out of this course

<img src="../img/survey_q5.png" width=100%>

### About You: what you want to get out of the lab sessions

<img src="../img/survey_q6.png" width=100%>

### Course Materials
* We will be using the [nlp-course](../overview.ipynb) repository 
* Contains **interactive** [Jupyter](http://jupyter.org/) notebooks and slides
    * View statically [here](https://nbviewer.jupyter.org/github/coastalcph/nlp-course/blob/master/overview.ipynb)
    * Use interactively after installing; see the [GitHub repo](https://github.com/coastalcph/nlp-course) instructions
* Recordings of 2020 lectures are available on [Absalon](https://absalon.ku.dk/courses/68562/external_tools/14563)
* References to other material are given in context
* This is a work in progress.
    * [Previous iterations of the course at DIKU](https://github.com/copenlu/stat-nlp-book)
    * Use `git pull` regularly for updates
    * *Watch* the repository for updates
    * Please contribute by adding issues on GitHub when you see errors
* For assignment hand-in, announcements, discussion forum, check [Absalon](https://absalon.ku.dk/courses/76988)

### Teaching Methods
* Lectures
* Hands-on lab (TA) sessions
* Group project
* Occasional small exercises during lectures, so bring your laptop
* Background material to read before each lecture

### Assessment Methods

* **[Group project (50%)](https://absalon.ku.dk/courses/76988/assignments/210864)**, can be completed in a group of up to 3 students
    * Released 3 September, **hand-in 1 November 17:00**
    * Joint report, contribution of each student should be stated clearly
    * Code to be uploaded as attachment
    * Individual grade for each group member, based on the quality and quantity of their contributions
    * Submission via Digital Exam
    * Consists of several parts tied to weekly lecture topics
    * AI assistance is allowed **with restrictions**
    * We cannot guarantee responses to queries about the project after 31 October 15:00

### Assessment Methods

* **Group project (50%)**, can be completed in a group of up to 3 students
    * AI assistance is allowed **with restrictions**:
        * As coding tools (e.g., GitHub Copilot): no restrictions.
        * As writing tools: no restrictions.
        * As search tools: no restrictions. Usual citation requirements apply.
        * As generation tools for *new* ideas: generated content must be clearly highlighted. Prompts/transcripts must be included.

See project description for more details.

### Assessment Methods

* **Group project (50%)**, can be completed in a group of up to 3 students
    * Finding a group: 
       * Deadline for group forming: **9 September 17:00**
       * We offer to help you find a group -- fill in the [Getting to Know You survey](https://absalon.ku.dk/courses/76988/assignments/210867) by the end of the *first lecture day*, **3 September 17:00**
       * If you choose this option, you will be informed of your assigned group on **4 September**
       * You can still change groups afterwards by asking other students to swap groups (it's your responsibility to arrange this)
       * Otherwise, we assume you will find a group by yourself in the first course week, e.g. by coordinating with other students in the lab session

### Assessment Methods

* **In-person written exam (50%)**, to be completed individually
    * Date: 8 November
    * Duration: 1.5 hours
    * Theoretical exam, covering the whole course curriculum
    * All aids allowed, but per [UCPH policy](https://kunet.ku.dk/work-areas/teaching/digital-learning/chatgpt-and-ai/guidelines-and-rules-for-chatgpt/Pages/default.aspx), ChatGPT/GPT-4 and similar LLMs/generative AI are **not** permitted for the exam

### Late Hand-In

* Late hand-ins **cannot be accepted**
* Exceptions can be made in rare cases, e.g. due to illness with a doctor's note
    * Get in touch with the course organizer at least one working day in advance

### Plagiarism

* Don't do it
* Don't enable it
* Check [rules and consequences](https://student-ambassador.ku.dk/rights/avoid-plagiarism/) if unclear

### Docker

* The book and tutorials run in a [docker](https://www.docker.com/) container
* Container comes with all dependencies pre-installed
* You can install it on your machine or on Google Colab/Azure/AWS machines
* We recommend you use this container for your project
   * Contains all core software packages for solving the project
   * You may use additional packages if needed

In [6]:
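# Read the file via `filename=`; a bare path string may be rendered as literal text instead.
# Assumes the working directory is the repository root (after `%cd ..` above).
display(Markdown(filename="INSTALL.md"))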

### Python

* Lectures, lab exercises and assignments focus on **Python**
* Python is a leading language for data science, machine learning etc., with many relevant libraries
* We expect you to know Python, or be willing to learn it **on your own**
* Labs and assignments focus on development within [Jupyter notebooks](http://jupyter.org/)

### Lab Sessions

* Some lab sessions are tutorial-style (to introduce you to practical aspects of the course)
* Other lab sessions are open-topic. You can use them as an opportunity to:
   * ask the TAs clarifying questions about the lectures and/or project
   * ask the TAs for informal feedback on your project so far
   * work on your project with your group

### Discussion Forum

* Our Absalon page has a [**discussion forum**](https://absalon.ku.dk/courses/76988/discussion_topics).
* Please post questions there (instead of sending private emails) 
* We give low priority to **questions already answered** in previous lectures, tutorials and posts, and to **purely programming-related issues**
* We expect you to **search online** for answers to your questions before you contact us.
* You are highly encouraged to participate and **help each other** on the forum. 
* The teaching team will check the discussion forum regularly **within normal working hours**
    * do not expect answers late in the evening or on weekends
    * **start working on your project early**
    * come to the lab sessions and ask questions there

### DIKU NLP

* Research Section, UCPH Computer Science Department
* Faculty members: Isabelle Augenstein (head of section), Pepa Atanasova, Daniel Hershcovich, Desmond Elliott, Anders Søgaard
* Official webpage: https://di.ku.dk/english/research/nlp/
* List of group members: https://copenlu.com/; http://coastalcph.github.io/; https://elliottd.github.io/people.html
* Twitter: 
    * @copenlu https://twitter.com/CopeNLU
    * @coastalcph https://twitter.com/coastalcph
* Always looking for strong MSc students
* PhD positions available depending on funding