andyweizhao/diaclms
Diachronic Language Models

Overview

The rise of large language models such as ChatGPT marks a moment that seems to blur the boundary between artificial and human intelligence. These models excel at understanding human language and assist people with many writing tasks. In this seminar, we delve into the domain of large language models, with a particular focus on diachronic models. Such models must understand the development of human language, both past and present, as well as the changes that occur over time. We begin by looking into the development of human language, namely language change and variation over time. We then explore the machine learning techniques behind diachronic language models. Lastly, we examine the implications of these models for historical linguistics, natural language generation, and the social sciences.

This seminar series takes place at Heidelberg University on Tuesdays from 15:15 to 16:45, in INF 325 / SR 24, during the winter semester of 2023.

Organization

Each lecture features two presentations on the same topic, but each presenter should choose a different paper or textbook chapter for their presentation. Please make sure to mark your selected papers/chapters in "References" when blocking your time slots.

Remote lectures from Jan 9th to Feb 6th

Office Hours

  • Monday 13:00 - 14:00
  • Friday 13:00 - 14:00
  • During office hours, you will receive timely feedback from us via email and Discord.

Useful Links

Templates for Term Papers

The templates include writing hints for each section. There is no template yet for a position paper; contact me if you want to use this format.

Language

The seminar will be held in English.

Best Term Papers

Program Schedule

Date | Topics | Disciplines | Presenters | References
17/10/2023 | Language change & language models | CL | Wei Zhao | L1
24/10/2023 | Speed and types of linguistic change | HL | Wei Zhao | A3
31/10/2023 | Grammaticalization: Part 1, Part 2 | HL | Melis Çelikkol, Lydia Körber | C1, C2
31/10/2023 | Mini-lecture on Paper Review Writing | - | Wei Zhao | -
07/11/2023 | Guest lecture on language models: Part 1, Part 2 | NLP | Maxime Peyrard, Jonas Belouadi | No need
14/11/2023 | No seminar (GaML 2023) | - | - | -
21/11/2023 | Large language model: Part 1, Part 2 | NLP | Ke Ren, Wenzhuo Chen | E2, E3
28/11/2023 | Diachronic language model: Part 1, Part 2 | NLP | Siqi He, Chenpei Xie | D4, D5
05/12/2023 | Prompt engineering for large language model and guest lecture on language model: Part 1, Part 2, Part 3 | NLP | Veerav Chebrolu, Hans Martin Ramsl, Xiran Hu | M1, M4
12/12/2023 | Semantic change detection: Part 1, Part 2 | NLP | Katharina Altrichter, Blanca Birn | F3, F5
19/12/2023 | Guest lecture on semantic change: Part 1, Part 2 | CL | Dominik Schlechtweg | F8, F9
Winter Break
09/01/2024 | Syntactic change detection: Part 1, Part 2 | CL | Maya Arseven, Hiu Lam Choy | G2, G3
16/01/2024 | Temporal machine translation: Part 1, Part 2 | NLP | Atila Martens, Long Kim | L1, L2
23/01/2024 | Temporal text summarization: Part 1, Part 2 | NLP | Xinyu Liang, Haofang Fan | H2, H4
30/01/2024 | Reconstruction of historical text | NLP | Tim Kolber | J1
30/01/2024 | Evaluation for Text Generation | NLP | Wei Zhao | -
06/02/2024 | Temporal misinformation detection: Part 1, Part 2 | CSS | Geng Zhao, Amir Ghadanfar | K1, K2

Note that this timetable is updated weekly, on the day after each lecture. Please refer to the spreadsheet for ongoing updates (incl. papers to be read) when you want to write a paper review and prepare questions.

References

Note that papers without provided links are freely accessible on the Internet. For papers that are not publicly available, we download them and provide links to the copies.

  • Speed of linguistic change
  • Syntactic change
  • Grammaticalization
  • Diachronic language model
    • D1 Dynamic Word Embeddings
    • D2 Contextualized diachronic word representations
    • D3 Dynamic Contextualized Word Embeddings
    • D4 Are Large Language Models Temporally Grounded
    • D5 Diachronic word embeddings and semantic shifts: a survey
  • Large language model
    • E1 A survey on evaluation of large language models
    • E2 A Survey of Large Language Models
    • E3 ChatGPT: A meta-analysis after 2.5 months
  • Semantic change detection
    • F1 Diachronic word embeddings and semantic shifts: a survey
    • F2 Outta Control: Laws of Semantic Change and Inherent Biases in Word Representation Models
    • F3 Cultural shift or linguistic drift? Comparing two computational models of semantic change
    • F4 Diachronic word embeddings reveal statistical laws of semantic change
    • F5 Grammar and meaning: Analysing the topology of diachronic word embeddings
    • F6 What about Grammar? Using BERT Embeddings to Explore Functional-Semantic Shifts of Semi-Lexical and Grammatical Constructions
    • F7 Linguistic Variation and Change in 250 Years of English Scientific Writing: A Data-Driven Approach
    • F8 SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection
    • F9 LSCDiscovery: A shared task on semantic change discovery and detection in Spanish
  • Syntactic change detection
    • G1 Tracing Syntactic Change in the Scientific Genre
    • G2 Detecting Syntactic Change Using a Neural Part-of-Speech Tagger
    • G3 Stability of Syntactic Dialect Classification over Space and Time
    • G4 Exploring morphosyntactic variation and change with Distributional Semantic Models
    • G5 Using distributional semantics to study syntactic productivity in diachrony: A case study
  • Temporal text summarization
    • H1 Incremental temporal summarization in multi-party meetings
    • H2 Improving ROUGE for Timeline Summarization
    • H3 An Evaluation Corpus For Temporal Summarization
    • H4 Context or No Context? A preliminary exploration of human-in-the-loop approach for Incremental Temporal Summarization in meetings
  • Temporal machine translation
    • I1 A Machine Translation Approach for Modernizing Historical Documents Using Backtranslation
    • I2 Neural Machine Translation from Historical Japanese to Contemporary Japanese Using Diachronically Domain-Adapted Word Embeddings
  • Reconstruction of historical manuscripts
    • J1 Reconstructing ancient literary texts from noisy manuscripts
    • J2 Restoring and attributing ancient texts using deep neural networks
    • J3 Machine Learning for Ancient Languages: A Survey
  • Social Science
    • K1 Temporal Graph Analysis of Misinformation Spreaders in Social Media
    • K2 Learn over Past, Evolve for Future: Forecasting Temporal Trends for Fake News Detection
    • K3 Modeling Conversation Structure and Temporal Dynamics for Jointly Predicting Rumor Stance and Veracity
    • K4 Generalizing to the future: Mitigating entity bias in fake news detection
  • The Theory of language change
  • ACL Anthology: A freely available database for CL and NLP Publications
  • The Science of prompt engineering

Submission Deadlines

  • Term paper (preliminary version): 12 March 2024 (not mandatory)
  • Term paper (final version): 19 March 2024
  • Presentation slides: Tuesdays at 12:00
  • Paper review & questions: Tuesdays at 12:00
  • Block time slots for presentations: 20 October 2023 (at least for presentations in the first few weeks)
  • Topic voting for term papers: 24 October 2023 (can be extended if needed)
