This page includes all the materials for the course KKLT0030 Automatic text processing (5 credits).

The course Moodle page has private materials, such as announcements and any recordings.

Tue Oct 25

  • Getting started
  • Notebook 1
  • Commands
    • Getting data and printing stuff: wget, echo
    • Printing files: cat, head, tail
    • Copying, renaming, removing: cp, rm, mv
    • Others: wc -w, ls
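
The commands above can be tried out on a small file. The following is a minimal sketch, not the notebook's own example: the file names sample.txt, copy.txt and renamed.txt are made up for illustration, and the file is created with echo instead of being downloaded with wget, so no URL is needed.

```shell
# Create a small practice file with echo (> creates the file, >> appends to it).
echo "the cat sat on the mat" > sample.txt
echo "the dog sat on the log" >> sample.txt
echo "a bird flew over the house" >> sample.txt

cat sample.txt      # print the whole file
head sample.txt     # print the beginning of the file (the first 10 lines by default)
tail sample.txt     # print the end of the file (the last 10 lines by default)

cp sample.txt copy.txt     # copy a file
mv copy.txt renamed.txt    # rename (move) a file
wc -w renamed.txt          # count the words in a file
ls                         # list the files in the current directory
rm renamed.txt             # remove a file (there is no undo)
```

Note that rm deletes the file immediately, so it is worth running ls first to check what is in the directory.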

Wed Oct 26

  • Notebook 2
  • Commands: egrep, sort, uniq
  • Options
    • egrep -v, -i, -w, -c, -B, -A
    • head -n, tail -n
    • wc -l, -w
    • uniq -c, sort -r, -n
  • Pipes, especially frequency counts
    • sort | uniq -c | sort -rn
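
As a rough illustration of the options and the frequency count pipe listed above, here is a small sketch. The file names fruits.txt and freq.txt and their contents are invented for this example. On most systems egrep behaves like grep -E, and some newer versions print a warning saying so.

```shell
# A made-up word list, one item per line.
printf 'apple\nBanana\napple\ncherry\nbanana\napple\n' > fruits.txt

egrep -i 'banana' fruits.txt    # -i ignores case: matches Banana and banana
egrep -c 'apple' fruits.txt     # -c counts the matching lines: 3
egrep -v 'apple' fruits.txt     # -v prints the lines that do NOT match

# Frequency count:
#   sort      puts identical lines next to each other
#   uniq -c   collapses them and adds a count in front
#   sort -rn  orders by that count, largest first
sort fruits.txt | uniq -c | sort -rn > freq.txt
head -n 2 freq.txt              # show the two most frequent lines
```

The first sort is needed because uniq only merges identical lines that are adjacent.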

Tue Nov 1

  • Notebook 3 exercises

Wed Nov 2

  • Notebook 4
  • git clone for cloning GitHub repositories
  • Gzipped files using gzip and zcat
  • Changing characters using tr
    • Combining tr with a frequency list pipeline
    • Using tr to normalize
  • Regular expressions
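
The topics above can be combined into one word frequency pipeline. This is only a sketch under assumptions: the file names text.txt and wordfreq.txt and the two sentences are invented, and the normalization is deliberately simple (lowercasing and removing three punctuation characters). If zcat behaves differently on your system, gzip -dc does the same job.

```shell
# A made-up two-line text.
printf 'The cat, the dog and THE bird.\nA cat is not a dog!\n' > text.txt

gzip -f text.txt       # compress: text.txt is replaced by text.txt.gz
zcat text.txt.gz       # print the contents without unpacking the file on disk

# Word frequency list with tr used for normalization:
#   tr 'A-Z' 'a-z'   lowercases everything
#   tr -d '.,!'      deletes punctuation
#   tr ' ' '\n'      puts every word on its own line
zcat text.txt.gz \
  | tr 'A-Z' 'a-z' \
  | tr -d '.,!' \
  | tr ' ' '\n' \
  | sort | uniq -c | sort -rn > wordfreq.txt
head -n 3 wordfreq.txt

# A regular expression: lines that start with a single capital letter and a space.
# Here this matches only the second line.
zcat text.txt.gz | egrep '^[A-Z] '
```

Without the lowercasing step, The, the and THE would be counted as three different words.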

Tue Nov 8

  • Notebook 5 exercises

Wed Nov 9

  • Notebook 6
  • Dependency syntax analysis pipeline
  • Sentence + token segmentation, lemmatisation, POS, dependencies
  • CoNLL-U format
  • Universal Dependencies treebanks
  • Trankit parser
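
To give an idea of what the CoNLL-U format looks like and how it fits the earlier command line tools, here is a small sketch. The file sample.conllu is written by hand for illustration and is not output from Trankit; the file names are assumptions, and cut is an extra command not listed above. A CoNLL-U file has one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with #, and an empty line after each sentence.

```shell
# Write a tiny hand-made CoNLL-U file (\t is a tab, _ means "no value").
{
  printf '# text = Dogs chase cats.\n'
  printf '1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_\n'
  printf '2\tchase\tchase\tVERB\t_\t_\t0\troot\t_\t_\n'
  printf '3\tcats\tcat\tNOUN\t_\t_\t2\tobj\t_\t_\n'
  printf '4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n'
  printf '\n'
  printf '# text = The cat sleeps.\n'
  printf '1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_\n'
  printf '2\tcat\tcat\tNOUN\t_\t_\t3\tnsubj\t_\t_\n'
  printf '3\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\t_\n'
  printf '4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_\n'
  printf '\n'
} > sample.conllu

# Count the sentences: each one has a "# text" comment line.
egrep -c '^# text' sample.conllu

# POS frequency list: drop comment and empty lines, keep column 4 (UPOS).
egrep -v '^#' sample.conllu | egrep -v '^$' | cut -f 4 \
  | sort | uniq -c | sort -rn > upos_freq.txt
cat upos_freq.txt

# Lemma frequency list for nouns: keep columns 3 and 4, select NOUN, keep the lemma.
egrep -v '^#' sample.conllu | cut -f 3,4 | egrep -w 'NOUN' | cut -f 1 \
  | sort | uniq -c | sort -rn > noun_lemmas.txt
cat noun_lemmas.txt
```

Because every token is on its own line, the familiar sort | uniq -c | sort -rn pipe works on parsed data as well as on plain text.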

Tue Nov 15

  • Notebook 7
  • Running Python scripts

Wed Nov 16

  • Notebook 8
  • Working on the server (Note that the exam will be on the server!)
  • Scripts

Tue Nov 22

  • Notebook 9

Wed Nov 23

  • Class cancelled (announced both in a Moodle message and in class; I hope no one showed up)

Tue Nov 29

  • Notebook 10
  • More exercises on working on the server, regular expressions, handling data

Wed Nov 30

Wed Dec 7

  • Exam, option 1
  • 12.15-13.45 in 420.1 (same as lectures)

Tue Dec 13

  • Exam, option 2
  • 8.30-10.00 in Agora IT-luokka K126A (same as lectures)


