ATP_kurssi
This page collects all the materials for the course KKLT0030 Automatic text processing (5 credits).
The course Moodle page has private materials, such as possible recordings and announcements: https://moodle.utu.fi/course/view.php?id=26139
Tue Oct 25
- Getting started
- Notebook 1
- Commands
- Getting data and printing stuff: wget, echo
- Printing files: cat, head, tail
- Copying, renaming, removing: cp, rm, mv
- Others: wc -w, ls
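
The commands above can be tried out on a small file. This is only a sketch: the file name `sample.txt` and its contents are made up for illustration, and `wget` is left out because it needs a network connection and a real URL.

```shell
# Work in a throwaway directory so nothing important is touched.
workdir=$(mktemp -d)
cd "$workdir"

# echo prints text; > writes it to a file, >> appends to the file.
echo "the cat sat on the mat" > sample.txt
echo "the dog sat on the log" >> sample.txt
echo "a bird flew away" >> sample.txt

cat sample.txt          # print the whole file
head -n 2 sample.txt    # print the first two lines
tail -n 1 sample.txt    # print the last line

cp sample.txt copy.txt      # copy the file
mv copy.txt renamed.txt     # rename the copy
rm renamed.txt              # remove the renamed copy

# wc -w counts words; reading from < keeps the file name out of the output.
word_count=$(wc -w < sample.txt)
echo "words: $word_count"

ls    # only sample.txt should be left
```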
Wed Oct 26
- Notebook 2
- Commands: egrep, sort, uniq
- Options
- egrep -v, -i, -w, -c, -B, -A
- head -n, tail -n
- wc -l, -w
- uniq -c, sort -r, -n
- Pipes, especially frequency counts
- sort | uniq -c | sort -rn
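
A minimal sketch of the options and the frequency count pipe, using a made-up word list (`words.txt` is a hypothetical file name). The code uses `grep -E`, which behaves the same as `egrep`; recent GNU grep versions print a warning when `egrep` is called directly.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# One word per line, so each line is one token.
printf '%s\n' apple Banana apple cherry banana apple pineapple > words.txt

# -c counts matching lines instead of printing them.
any_apple=$(grep -Ec 'apple' words.txt)      # apple x3 plus pineapple
# -w matches whole words only, so pineapple is excluded.
whole_apple=$(grep -Ewc 'apple' words.txt)
# -i ignores case, so Banana and banana both match.
bananas=$(grep -Eic 'banana' words.txt)
# -v inverts the match: lines that do NOT contain apple.
grep -Ev 'apple' words.txt

# Frequency list: sort groups identical lines together, uniq -c counts
# each group, and sort -rn puts the largest counts first.
freq=$(sort words.txt | uniq -c | sort -rn)
echo "$freq"
```

The first `sort` is needed because `uniq` only merges identical lines that are next to each other.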
Tue Nov 1
- Notebook 3 exercises
Wed Nov 2
- Notebook 4
- git clone for cloning GitHub repos
- Gzipped files using gzip and zcat
- Changing characters using tr
- Combining tr to a frequency list pipeline
- Using tr to normalize
- Regular expressions
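
The steps above can be combined into one pipeline. This is a sketch under assumptions: the text file is made up, the normalization shown (lowercasing, one token per line) is just one possible choice, and `zcat` is assumed to behave as on a typical Linux system. `git clone` is not shown because it needs network access and a repository URL.

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf '%s\n' 'The cat saw the dog.' 'The dog saw THE cat!' > text.txt

# gzip compresses the file in place: text.txt becomes text.txt.gz.
gzip text.txt

# zcat prints the decompressed contents without unpacking the file.
# tr 'A-Z' 'a-z' lowercases (normalization);
# tr -cs 'a-z' '\n' turns every run of non-letters into a single newline,
# which leaves one token per line for the frequency count.
freq=$(zcat text.txt.gz | tr 'A-Z' 'a-z' | tr -cs 'a-z' '\n' | sort | uniq -c | sort -rn)
echo "$freq"

# A regular expression with alternation: count tokens that are exactly cat or dog.
animals=$(zcat text.txt.gz | tr 'A-Z' 'a-z' | tr -cs 'a-z' '\n' | grep -Ec '^(cat|dog)$')
echo "animal tokens: $animals"
```

Without the lowercasing step, `The`, `the` and `THE` would be counted as three different words.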
Tue Nov 8
- Notebook 5 exercises
Wed Nov 9
- Notebook 6
- Dependency syntax analysis pipeline
- Sentence + token segmentation, lemmatisation, POS, dependencies
- CoNLL-U format
- Universal Dependencies treebanks
- Trankit parser
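
As a rough illustration of the CoNLL-U format: each token is one line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines start with `#`, and a blank line ends a sentence. The sentence and its annotation below are hand-written for this sketch, not parser output, and `cut` is used here on the assumption that it is available even if the notebooks pick columns in some other way.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# printf with \t guarantees real tab characters between the columns.
printf '# text = Dogs bark.\n' > sample.conllu
printf '1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n' >> sample.conllu
printf '2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n' >> sample.conllu
printf '3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n' >> sample.conllu
printf '\n' >> sample.conllu

# Drop comment lines and blank lines, then keep column 4 (UPOS)
# and build a part-of-speech frequency list.
grep -Ev '^#|^$' sample.conllu | cut -f 4 | sort | uniq -c | sort -rn

# Column 3 holds the lemmas; tr joins them onto one line.
lemmas=$(grep -Ev '^#|^$' sample.conllu | cut -f 3 | tr '\n' ' ')
echo "$lemmas"

# Count the tokens tagged as NOUN.
nouns=$(grep -Ev '^#|^$' sample.conllu | cut -f 4 | grep -c '^NOUN$')
```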
Tue Nov 15
- Notebook 7
- Running Python scripts
Wed Nov 16
- Notebook 8
- Working on the server (note that the exam will be on the server!)
- Scripts
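
One way a pipeline can be saved as a script, sketched under assumptions: the script name `freq.sh`, its arguments, and the input file are all hypothetical and are not taken from the notebooks.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Write the script to a file. The quoted 'EOF' keeps $1 and $2 literal,
# so they are filled in when the script runs, not when it is written.
cat > freq.sh << 'EOF'
#!/bin/sh
# Usage: sh freq.sh FILE [N]
# Prints the N most frequent lowercased words in FILE (default: 10).
tr 'A-Z' 'a-z' < "$1" | tr -cs 'a-z' '\n' | sort | uniq -c | sort -rn | head -n "${2:-10}"
EOF

printf '%s\n' 'to be or not to be' > input.txt

# Run the script with the file name and the number of rows as arguments.
result=$(sh freq.sh input.txt 2)
echo "$result"
```

After `chmod +x freq.sh`, the same script can also be started as `./freq.sh input.txt 2`.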
Tue Nov 22
- Notebook 9
Wed Nov 23
- Class cancelled (announced both in a Moodle message and in class; I hope no one showed up)
Tue Nov 29
- Notebook 10
- More exercises on working on the server, regular expressions, handling data
Wed Nov 30
- Notebook 11
- If Notebook 11 is not accessible from the GitHub repo, you can use this version: https://colab.research.google.com/drive/16EFdusy496svEkMSxTvP0pHtzTZikHer?usp=sharing
- Recap
Wed Dec 7
- Exam, option 1
- 12.15-13.45 in 420.1 (same as lectures)
Tue Dec 13
- Exam, option 2
- 8.30-10.00 in Agora IT-luokka K126A (same as lectures)