CodE Alltag is a German-language email corpus that has been pseudonymized and assessed for formality.
It consists of eight segments:
- CodE Alltag S
- CodE Alltag XL FINANCE
- CodE Alltag XL GERMAN
- CodE Alltag XL MOVIES
- CodE Alltag XL PHILOSOPHY
- CodE Alltag XL TEENS
- CodE Alltag XL TRAVELS
- CodE Alltag XL EVENTS
Additional repositories provide:
- formality scores between +1 (most formal) and -1 (most informal) for each email, sentence and word
- a privacy tagger that recognizes privacy-sensitive information in German emails and similar text genres
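The formality scores share one scale at three levels: email, sentence and word. The sketch below shows one way to hold and check such scores. It is a minimal illustration under stated assumptions. `ScoredEmail` and `formality_label` are hypothetical names, and the file format of the actual score repositories is not described here. The only documented facts it relies on are the range, +1 (most formal) to -1 (most informal), and the three levels of granularity.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical container for one email's scores; the real repositories may
# store them differently. Only the [-1, +1] range comes from the description.
@dataclass
class ScoredEmail:
    email_score: float
    sentence_scores: List[float] = field(default_factory=list)
    word_scores: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject any score outside the documented scale:
        # +1 = most formal, -1 = most informal.
        for s in [self.email_score, *self.sentence_scores, *self.word_scores]:
            if not -1.0 <= s <= 1.0:
                raise ValueError(f"formality score {s} outside [-1, +1]")


def formality_label(score: float) -> str:
    """Map a score to a coarse label (hypothetical helper; the split at 0 is an assumption)."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"formality score {score} outside [-1, +1]")
    if score > 0:
        return "formal"
    if score < 0:
        return "informal"
    return "neutral"
```

For example, `formality_label(0.8)` returns `"formal"`, and constructing `ScoredEmail(1.5)` raises a `ValueError` because the score lies outside the scale.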