andyweizhao/diaclms
Diachronic Language Models

Overview

The rise of large language models such as ChatGPT marks a moment that seems to blur the boundary between artificial and human intelligence. These models excel at understanding human language and assist people with many writing tasks. In this seminar, we delve into the domain of large language models, with a particular focus on diachronic models. Such models must understand the development of human language, both past and present, as well as the changes that occur over time. We begin by looking into the development of human language, namely language change and variation over time. We then explore the machine learning techniques behind diachronic language models. Lastly, we examine the implications of these models for historical linguistics, natural language generation, and the social sciences.

This seminar series takes place at Heidelberg University on Tuesdays from 15:15 to 16:45, in INF 325 / SR 24, during the winter semester of 2023.

Organization

Each lecture features two presentations on the same topic, but each presenter should choose a different paper or textbook chapter for their presentation. Please make sure to mark your selected papers/chapters in "References" when blocking your time slots.

Remote lectures from Jan 9th to Feb 6th

Office Hours

  • Monday 13:00 - 14:00
  • Friday 13:00 - 14:00
  • During office hours, you will receive timely feedback from us via email and Discord.

Useful Links

Templates for Term Papers

The templates include writing hints for each section. There is no template yet for a position paper; contact me if you want to use this format.

Language

The seminar will be held in English.

Best Term Papers

Program Schedule

Date | Topics | Disciplines | Presenters | References
17/10/2023 | Language change & language models | CL | Wei Zhao | L1
24/10/2023 | Speed and types of linguistic change | HL | Wei Zhao | A3
31/10/2023 | Grammaticalization: Part 1, Part 2 | HL | Melis Çelikkol, Lydia Körber | C1, C2
31/10/2023 | Mini-lecture on Paper Review Writing | - | Wei Zhao | -
07/11/2023 | Guest lecture on language models: Part 1, Part 2 | NLP | Maxime Peyrard, Jonas Belouadi | No need
14/11/2023 | No seminar (GaML 2023) | - | - | -
21/11/2023 | Large language model: Part 1, Part 2 | NLP | Ke Ren, Wenzhuo Chen | E2, E3
28/11/2023 | Diachronic language model: Part 1, Part 2 | NLP | Siqi He, Chenpei Xie | D4, D5
05/12/2023 | Prompt engineering for large language model and guest lecture on language model: Part 1, Part 2, Part 3 | NLP | Veerav Chebrolu, Hans Martin Ramsl, Xiran Hu | M1, M4
12/12/2023 | Semantic change detection: Part 1, Part 2 | NLP | Katharina Altrichter, Blanca Birn | F3, F5
19/12/2023 | Guest lecture on semantic change: Part 1, Part 2 | CL | Dominik Schlechtweg | F8, F9
Winter Break
09/01/2024 | Syntactic change detection: Part 1, Part 2 | CL | Maya Arseven, Hiu Lam Choy | G2, G3
16/01/2024 | Temporal machine translation: Part 1, Part 2 | NLP | Atila Martens, Long Kim | L1, L2
23/01/2024 | Temporal text summarization: Part 1, Part 2 | NLP | Xinyu Liang, Haofang Fan | H2, H4
30/01/2024 | Reconstruction of historical text | NLP | Tim Kolber | J1
30/01/2024 | Evaluation for Text Generation | NLP | Wei Zhao | -
06/02/2024 | Temporal misinformation detection: Part 1, Part 2 | CSS | Geng Zhao, Amir Ghadanfar | K1, K2

Note that this timetable is updated weekly, on the day after each lecture. Please refer to the spreadsheet for ongoing updates (incl. papers to be read) when you want to write a paper review and prepare questions.

References

Note that papers without provided links are freely accessible on the Internet. For papers that are not publicly available, we download them and provide links to the copies.

  • Speed of linguistic change
  • Syntactic change
  • Grammaticalization
  • Diachronic language model
    • D1 Dynamic Word Embeddings
    • D2 Contextualized diachronic word representations
    • D3 Dynamic Contextualized Word Embeddings
    • D4 Are Large Language Models Temporally Grounded
    • D5 Diachronic word embeddings and semantic shifts: a survey
  • Large language model
    • E1 A survey on evaluation of large language models
    • E2 A Survey of Large Language Models
    • E3 ChatGPT: A meta-analysis after 2.5 months
  • Semantic change detection
    • F1 Diachronic word embeddings and semantic shifts: a survey
    • F2 Outta Control: Laws of Semantic Change and Inherent Biases in Word Representation Models
    • F3 Cultural shift or linguistic drift? Comparing two computational models of semantic change
    • F4 Diachronic word embeddings reveal statistical laws of semantic change
    • F5 Grammar and meaning: Analysing the topology of diachronic word embeddings
    • F6 What about Grammar? Using BERT Embeddings to Explore Functional-Semantic Shifts of Semi-Lexical and Grammatical Constructions
    • F7 Linguistic Variation and Change in 250 Years of English Scientific Writing: A Data-Driven Approach
    • F8 SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection
    • F9 LSCDiscovery: A shared task on semantic change discovery and detection in Spanish
  • Syntactic change detection
    • G1 Tracing Syntactic Change in the Scientific Genre
    • G2 Detecting Syntactic Change Using a Neural Part-of-Speech Tagger
    • G3 Stability of Syntactic Dialect Classification over Space and Time
    • G4 Exploring morphosyntactic variation and change with Distributional Semantic Models
    • G5 Using distributional semantics to study syntactic productivity in diachrony: A case study
  • Temporal text summarization
    • H1 Incremental temporal summarization in multi-party meetings
    • H2 Improving ROUGE for Timeline Summarization
    • H3 An Evaluation Corpus For Temporal Summarization
    • H4 Context or No Context? A preliminary exploration of human-in-the-loop approach for Incremental Temporal Summarization in meetings
  • Temporal machine translation
    • I1 A Machine Translation Approach for Modernizing Historical Documents Using Backtranslation
    • I2 Neural Machine Translation from Historical Japanese to Contemporary Japanese Using Diachronically Domain-Adapted Word Embeddings
  • Reconstruction of historical manuscripts
    • J1 Reconstructing ancient literary texts from noisy manuscripts
    • J2 Restoring and attributing ancient texts using deep neural networks
    • J3 Machine Learning for Ancient Languages: A Survey
  • Social Science
    • K1 Temporal Graph Analysis of Misinformation Spreaders in Social Media
    • K2 Learn over Past, Evolve for Future: Forecasting Temporal Trends for Fake News Detection
    • K3 Modeling Conversation Structure and Temporal Dynamics for Jointly Predicting Rumor Stance and Veracity
    • K4 Generalizing to the future: Mitigating entity bias in fake news detection
  • The Theory of language change
  • ACL Anthology: A freely available database for CL and NLP Publications
  • The Science of prompt engineering

Submission Deadlines

  • Term paper (preliminary version): 12 March 2024 (not mandatory)
  • Term paper (final version): 19 March 2024
  • Presentation slides: Tuesdays at 12:00
  • Paper review & questions: Tuesdays at 12:00
  • Block time slots for presentations: 20 October 2023 (at least for presentations in the first few weeks)
  • Topic voting for term papers: 24 October 2023 (can be extended if needed)
