The rise of large language models such as ChatGPT marks a moment that seems to blur the boundary between artificial and human intelligence. These models excel at understanding human language and assist people with many text-based tasks. In this seminar, we will delve into large language models, with a particular focus on diachronic models. Such models must capture how human language develops, covering both past and present language as well as the changes that occur over time. We will begin with the development of human language itself, namely language change and variation over time. We will then explore the machine learning techniques behind diachronic language models. Finally, we will examine the implications of these models for historical linguistics, natural language generation, and the social sciences.
This seminar series takes place at Heidelberg University during the winter semester of 2023, on Tuesdays from 15:15 to 16:45 in INF 325 / SR 24.
Each session features two presentations on the same topic, but each presenter should base their presentation on a different paper or textbook chapter. When booking your time slot, please enter your selected paper or chapter in the "References" column.
- Zoom link: https://kta-email.zoom.us/j/97565882219?pwd=OFhIUW43UFl2a0NUSWtEWURkU1d6QT09
- Meeting ID: 975 6588 2219
- Passcode: 427190
- Zoom link: https://kta-email.zoom.us/j/92709054859?pwd=R3lLa0I0S3JkdkpUWFdQVVdneUhIQT09
- Meeting ID: 927 0905 4859
- Passcode: 412460
- Monday 13:00 - 14:00
- Friday 13:00 - 14:00
- During office hours, you will receive timely feedback from us by email and on Discord.
- Join us at Discord: https://discord.gg/PpwbcYyX
- Fill out time slots for presentations: https://shorturl.at/hRZ01
- Topic voting for term papers: https://forms.gle/PxJeRZGfftZei4qE9
- Course webpage at Uni Heidelberg: https://www.cl.uni-heidelberg.de/courses/ws23/diaclms/
- Survey paper: https://www.overleaf.com/read/sqrzdvrcpypx#d593bf
- Analysis paper: https://www.overleaf.com/read/tpjmjrnnfdvm#adfa83
The templates include writing hints for each section. There is no template for a position paper yet; contact me if you would like to use this format.
The seminar will be held in English.
- Language Change in Dialect Continua: A Survey on Diachronic and Diatopic Variation in NLP
- Exploring Future Work: Using Language Models to Detect Syntactic Change Over Time
- Experimental Methods for Detecting Phonetic Change Over Time: A Comprehensive Survey
Date | Topics | Disciplines | Presenters | References |
---|---|---|---|---|
17/10/2023 | Language change & language models | CL | Wei Zhao | L1 |
24/10/2023 | Speed and types of linguistic change | HL | Wei Zhao | A3 |
31/10/2023 | Grammaticalization: Part 1 Part 2 | HL | Melis Çelikkol, Lydia Körber | C1, C2 |
31/10/2023 | Mini-lecture on Paper Review Writing | - | Wei Zhao | - |
07/11/2023 | Guest lecture on language models: Part 1 Part 2 | NLP | Maxime Peyrard, Jonas Belouadi | No need |
14/11/2023 | No seminar (GaML 2023) | |||
21/11/2023 | Large language model: Part 1 Part 2 | NLP | Ke Ren, Wenzhuo Chen | E2, E3 |
28/11/2023 | Diachronic language model: Part 1 Part 2 | NLP | Siqi He, Chenpei Xie | D4, D5 |
05/12/2023 | Prompt engineering for large language models and guest lecture on language models Part 1 Part 2 Part 3 | NLP | Veerav Chebrolu, Hans Martin Ramsl, Xiran Hu | M1, M4 |
12/12/2023 | Semantic change detection: Part 1 Part 2 | NLP | Katharina Altrichter, Blanca Birn | F3, F5 |
19/12/2023 | Guest lecture on semantic change: Part 1 Part 2 | CL | Dominik Schlechtweg | F8, F9 |
Winter Break | ||||
09/01/2024 | Syntactic change detection Part 1 Part 2 | CL | Maya Arseven, Hiu Lam Choy | G2, G3 |
16/01/2024 | Temporal machine translation Part 1 Part 2 | NLP | Atila Martens, Long Kim | I1, I2 |
23/01/2024 | Temporal text summarization Part 1 Part 2 | NLP | Xinyu Liang, Haofang Fan | H2, H4 |
30/01/2024 | Reconstruction of historical text | NLP | Tim Kolber | J1 |
30/01/2024 | Evaluation for Text Generation | NLP | Wei Zhao | - |
06/02/2024 | Temporal misinformation detection Part 1 Part 2 | CSS | Geng Zhao, Amir Ghadanfar | K1, K2 |
This timetable is updated once a week, on the day after each lecture. Please refer to the spreadsheet for ongoing updates (including the papers to be read) when you write a paper review and prepare questions.
Papers listed without links are freely accessible online. For papers that are not publicly available, we have downloaded them and provide links to them.
- Speed of linguistic change
- A1 Dialect contact and the speed of Jespersen’s cycle in Middle Low German
- A2 The Determinants of Diachronic Stability
- A3 Sociolinguistic typology and the speed of linguistic change
- A4 Is the rate of linguistic change constant?
- Syntactic change
- B1 Complexity as L2-difficulty: Implications for syntactic change
- B2 Syntactic change: A Minimalist Approach to Grammaticalization
- B3 Introduction to syntactic change
- Grammaticalization
- C1 Introduction to Grammaticalization
- C2 What is it then, this Grammaticalization?
- C3 The Formal Semantics of Grammaticalization
- C4 What counts as (an instance of) grammaticalization?
- C5 Information-based Modeling of Diachronic Linguistic Change: from Typicality to Productivity
- Diachronic language model
- D1 Dynamic Word Embeddings
- D2 Contextualized diachronic word representations
- D3 Dynamic Contextualized Word Embeddings
- D4 Are Large Language Models Temporally Grounded?
- D5 Diachronic word embeddings and semantic shifts: a survey
- Large language model
- E1 A survey on evaluation of large language models
- E2 A Survey of Large Language Models
- E3 ChatGPT: A meta-analysis after 2.5 months
- Semantic change detection
- F1 Diachronic word embeddings and semantic shifts: a survey
- F2 Outta Control: Laws of Semantic Change and Inherent Biases in Word Representation Models
- F3 Cultural shift or linguistic drift? Comparing two computational models of semantic change
- F4 Diachronic word embeddings reveal statistical laws of semantic change
- F5 Grammar and meaning: Analysing the topology of diachronic word embeddings
- F6 What about Grammar? Using BERT Embeddings to Explore Functional-Semantic Shifts of Semi-Lexical and Grammatical Constructions
- F7 Linguistic Variation and Change in 250 Years of English Scientific Writing: A Data-Driven Approach
- F8 SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection
- F9 LSCDiscovery: A shared task on semantic change discovery and detection in Spanish
- Syntactic change detection
- G1 Tracing Syntactic Change in the Scientific Genre
- G2 Detecting Syntactic Change Using a Neural Part-of-Speech Tagger
- G3 Stability of Syntactic Dialect Classification over Space and Time
- G4 Exploring morphosyntactic variation and change with Distributional Semantic Models
- G5 Using distributional semantics to study syntactic productivity in diachrony: A case study
- Temporal text summarization
- H1 Incremental temporal summarization in multi-party meetings
- H2 Improving ROUGE for Timeline Summarization
- H3 An Evaluation Corpus For Temporal Summarization
- H4 Context or No Context? A preliminary exploration of human-in-the-loop approach for Incremental Temporal Summarization in meetings
- Temporal machine translation
- I1 A Machine Translation Approach for Modernizing Historical Documents Using Backtranslation
- I2 Neural Machine Translation from Historical Japanese to Contemporary Japanese Using Diachronically Domain-Adapted Word Embeddings
- Reconstruction of historical manuscripts
- J1 Reconstructing ancient literary texts from noisy manuscripts
- J2 Restoring and attributing ancient texts using deep neural networks
- J3 Machine Learning for Ancient Languages: A Survey
- Social Science
- K1 Temporal Graph Analysis of Misinformation Spreaders in Social Media
- K2 Learn over Past, Evolve for Future: Forecasting Temporal Trends for Fake News Detection
- K3 Modeling Conversation Structure and Temporal Dynamics for Jointly Predicting Rumor Stance and Veracity
- K4 Generalizing to the future: Mitigating entity bias in fake news detection
- The theory of language change
- L1 Revisiting the five foundational problems of Weinreich, Labov & Herzog (1968)
- L2 Principles of linguistic change, volume 1: Internal factors
- L3 Principles of Linguistic Change, Volume 2: Social factors
- L4 Principles of linguistic change, volume 3: Cognitive and cultural factors
- ACL Anthology: A freely available database for CL and NLP Publications
- The science of prompt engineering
- M1 Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
- M2 Prompt Engineering
- M3 Prompt Engineering Guide
- M4 Prompt Engineering a Prompt Engineer
- Term paper (preliminary version): 12 March 2024 (optional)
- Term paper (final version): 19 March 2024
- Presentation slides: Tuesdays at 12:00
- Paper review & questions: Tuesdays at 12:00
- Block time slots for presentations: 20 October 2023 (at least for presentations in the first few weeks)
- Topic voting for term papers: 24 October 2023 (can be extended if needed)