

This page contains all the materials for the course KKLT0030 Automatic text processing (5 credits).

The course Moodle page has the private materials, such as any recordings and announcements.

Tue Oct 25

  • Getting started
  • Notebook 1
  • Commands
    • Getting data and printing stuff: wget, echo
    • Printing files: cat, head, tail
    • Copying, renaming, removing: cp, rm, mv
    • Others: wc -w, ls
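
As a quick reminder of how these commands fit together, here is a minimal sketch. The file names and the sample text are made up for illustration, and `wget` is left out because it needs a real URL to download from.

```shell
# Work in a throwaway directory so nothing important is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create a small sample file (hypothetical name and content).
printf 'first line\nsecond line\nthird line\n' > sample.txt

echo "Contents of sample.txt:"
cat sample.txt     # print the whole file
head sample.txt    # print the beginning (at most 10 lines by default)
tail sample.txt    # print the end (at most 10 lines by default)

cp sample.txt copy.txt       # copy a file
mv copy.txt renamed.txt      # rename (move) a file
words=$(wc -w < renamed.txt) # count the words in a file
echo "Number of words: $words"
rm renamed.txt               # remove a file
ls                           # list what is left in the directory
```

Note that `rm` deletes files permanently, so it is worth checking with `ls` first.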

Wed Oct 26

  • Notebook 2
  • Commands: egrep, sort, uniq
  • Options
    • egrep -v, -i, -w, -c, -B, -A
    • head -n, tail -n
    • wc -l, -w
    • uniq -c, sort -r, -n
  • Pipes, especially frequency counts
    • sort | uniq -c | sort -rn
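
The frequency count pipeline can be sketched as follows. The file `animals.txt` and its content are invented for this example. The code uses `grep -E`, which behaves the same as `egrep` (newer versions of GNU grep print a warning when `egrep` is used).

```shell
workdir=$(mktemp -d)
cd "$workdir"

# One word per line (made-up sample data).
printf 'cat\ndog\nCat\nbird\ncat\ndog\ncat\n' > animals.txt

# sort puts identical lines next to each other,
# uniq -c collapses them and adds a count,
# sort -rn orders the counts numerically, largest first.
freq=$(sort animals.txt | uniq -c | sort -rn)
echo "$freq"

# A few grep options on the same file:
exact=$(grep -E -c 'cat' animals.txt)       # count matching lines, case-sensitive
ignore=$(grep -E -i -c 'cat' animals.txt)   # -i ignores case, so "Cat" matches too
others=$(grep -E -v -c 'cat' animals.txt)   # -v counts the lines that do NOT match
echo "cat: $exact, cat/Cat: $ignore, other lines: $others"
```

The first `sort` is needed because `uniq` only merges lines that are adjacent.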

Tue Nov 1

  • Notebook 3 exercises

Wed Nov 2

  • Notebook 4
  • git clone for cloning GitHub repositories
  • Gzipped files using gzip and zcat
  • Changing characters using tr
    • Combining tr with a frequency list pipeline
    • Using tr to normalize
  • Regular expressions
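
A minimal sketch of reading a gzipped file and normalizing it with `tr` before counting word frequencies. The file name and the sentence are invented, and the normalization steps (lowercasing, deleting a few punctuation marks) are just one possible choice.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up sample text, compressed with gzip (text.txt becomes text.txt.gz).
printf 'The cat saw the dog. The dog ran!\n' > text.txt
gzip text.txt

# zcat prints the uncompressed content without unpacking the file on disk.
freq=$(zcat < text.txt.gz \
  | tr '[:upper:]' '[:lower:]' \
  | tr -d '.,!?' \
  | tr ' ' '\n' \
  | sort | uniq -c | sort -rn)
echo "$freq"
```

Here the first `tr` lowercases everything, the second deletes punctuation, and the third puts each word on its own line so that the frequency pipeline can count them. Without the lowercasing step, "The" and "the" would be counted separately.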

Tue Nov 8

  • Notebook 5 exercises

Wed Nov 9

  • Notebook 6
  • Dependency syntax analysis pipeline
  • Sentence + token segmentation, lemmatisation, POS, dependencies
  • CoNLL-U format
  • Universal Dependencies treebanks
  • Trankit parser
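
To illustrate the CoNLL-U format, here is a small sketch that builds a one-sentence file by hand and queries it with the command-line tools from the earlier lectures. The annotation is written manually for this example and is not Trankit output. `cut -f` is used to pick tab-separated columns and is an addition that does not appear in the command lists above. In CoNLL-U, each token line has ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines start with `#`, and a blank line ends each sentence.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Build a tiny hand-made CoNLL-U file. printf repeats the format
# for every group of six arguments: ID FORM LEMMA UPOS HEAD DEPREL.
{
  printf '# text = The dog chased the cat.\n'
  printf '%s\t%s\t%s\t%s\t_\t_\t%s\t%s\t_\t_\n' \
    1 The the DET 2 det \
    2 dog dog NOUN 3 nsubj \
    3 chased chase VERB 0 root \
    4 the the DET 5 det \
    5 cat cat NOUN 3 obj \
    6 . . PUNCT 3 punct
  printf '\n'
} > sample.conllu

# Keep only token lines: drop comments (#) and empty lines.
tokens=$(grep -E -v '^(#|$)' sample.conllu)

# Column 3 is the lemma: build a lemma frequency list.
lemma_freq=$(echo "$tokens" | cut -f 3 | sort | uniq -c | sort -rn)
echo "$lemma_freq"

# Column 2 is the word form: print the form of the subject (nsubj).
subject=$(echo "$tokens" | grep -E -w 'nsubj' | cut -f 2)
echo "Subject: $subject"
```

Because the counting is done on lemmas, "The" and "the" end up in the same row of the frequency list.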

Tue Nov 15

  • Notebook 7
  • Running Python scripts

Wed Nov 16

  • Notebook 8
  • Working on the server (Note that the exam will be on the server!)
  • Scripts
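
As a rough sketch of what a script can look like, the example below saves the frequency count pipeline in a file and runs it. The script name `freq.sh` and the sample data are hypothetical; the scripts used in class may look different.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up input data, one word per line.
printf 'b\na\nb\n' > words.txt

# Write a three-line script to freq.sh. Inside the script,
# "$1" stands for the first argument given on the command line.
printf '%s\n' \
  '#!/bin/sh' \
  '# Print a frequency list of the lines in the file given as argument.' \
  'sort "$1" | uniq -c | sort -rn' > freq.sh

cat freq.sh                       # check what the script contains

# Run the script with sh and give the input file as argument.
result=$(sh freq.sh words.txt)
echo "$result"
```

Putting a pipeline in a script means it can be rerun on any file without retyping it, which is handy when working over a remote connection.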

Tue Nov 22

  • Notebook 9

Wed Nov 23

  • Class cancelled (announced both in a Moodle message and in class; I hope no one showed up)

Tue Nov 29

  • Notebook 10
  • More exercises on working on the server, regular expressions, handling data
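
A few regular expression patterns of the kind practised here, as a sketch on invented data. `grep -E` is used in place of `egrep`; the two accept the same patterns.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up sample lines.
printf 'walked\nwalking\ntalked\nwalk\n2022\nNov 29\n' > tokens.txt

# $ anchors the match to the end of the line: lines ending in "ed".
grep -E 'ed$' tokens.txt
ed_count=$(grep -E -c 'ed$' tokens.txt)

# ^ anchors to the start, [0-9]+ means one or more digits:
# lines that consist of digits only.
grep -E '^[0-9]+$' tokens.txt
num_count=$(grep -E -c '^[0-9]+$' tokens.txt)

# (ed|ing)? is an optional group with two alternatives:
# matches "walk", "walked" and "walking" but not "talked".
grep -E '^walk(ed|ing)?$' tokens.txt
walk_count=$(grep -E -c '^walk(ed|ing)?$' tokens.txt)

echo "ed: $ed_count, numbers: $num_count, walk forms: $walk_count"
```

Quoting the pattern with single quotes keeps the shell from interpreting characters such as `$`, `|` and `?` itself.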

Wed Nov 30

Wed Dec 7

  • Exam, option 1
  • 12.15-13.45 in 420.1 (same as lectures)

Tue Dec 13

  • Exam, option 2
  • 8.30-10.00 in Agora IT-luokka K126A (same as lectures)

