# Abgeordnetenwatch Emails

### Augment Abgeordnetenwatch data to include email addresses of Abgeordnete (representatives)

Processes the Bundestag-Abgeordnete.csv from https://www.abgeordnetenwatch.de/ and adds an email address for each representative, as accurately as possible.

## Method

Email addresses are created from the first_name and last_name columns.

This process tries to follow the same convention the Bundestag uses when forming email addresses.

All conventions are based on observation and may not be entirely correct.

## Observed conventions in email formatting

- Hyphenated last names such as Aschenberg-Dugnus

  The hyphen is preserved in the email address
- Hyphenated first names such as Leif-Erik

  The hyphen is preserved in the email address
- Accented and special characters ("äéèöüß") are converted

  See the accent_aliases dictionary below
- If there are multiple first names, there is no consistent rule for whether or how they are included. Most of the time the second name is simply dropped, but not always.

  For these occurrences, the known_emails lookup is used to set the email address explicitly
- If a last name contains "von", there is no consistent rule for whether or how it is included.

  For these occurrences, the known_emails lookup is used to set the email address explicitly
- If a last name starts with "de " as in "de Vries", the "de" is preserved and the space removed
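
The regular conventions above can be sketched as a small function. This is only an illustration of the observed rules, not an official Bundestag scheme: `guess_email` and `ACCENTS` are hypothetical names, and "Jürgen de Vries" is a made-up example. Names with several first names or a "von" are deliberately not handled here, since those go through the known_emails lookup.

```python
# Illustrative sketch of the observed conventions; names and helper are hypothetical.
ACCENTS = str.maketrans({'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'é': 'e', 'è': 'e', 'ß': 'ss'})


def guess_email(first_name: str, last_name: str) -> str:
    """Build an address of the form first.last@bundestag.de from the two name columns."""
    first = first_name.strip().lower()
    last = last_name.strip().lower()
    if last.startswith('de '):
        # "de Vries" -> "devries": keep the "de", drop the space
        last = 'de' + last[3:]
    # Hyphens are left untouched; accented characters are transliterated
    return f'{first}.{last}'.translate(ACCENTS) + '@bundestag.de'


print(guess_email('Leif-Erik', 'Aschenberg-Dugnus'))  # leif-erik.aschenberg-dugnus@bundestag.de
print(guess_email('Jürgen', 'de Vries'))              # juergen.devries@bundestag.de
```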

## Usage

_Free to use, duplicate, whatever._

Unzip the contents of the data zip file into the project directory, then:

```
pip install -r requirements.txt
jupyter notebook
```

Run the notebook!

During the email processing step, a warning is printed for any name that could produce an unconventional email address.


In [34]:
```python
import pandas as pd

# The first data row of each file holds the real column names, so promote it to the header
reps = pd.read_csv('./Daten-abgeordnetenwatch/Bundestag-Abgeordnete.csv', sep=';')
reps.columns = reps.iloc[0]
reps = reps[1:]

constituencies = pd.read_csv('./Daten-abgeordnetenwatch/wahlkreise_de_bundestag.csv', sep=';')
constituencies.columns = constituencies.iloc[0]
constituencies = constituencies[1:]
```
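
The header handling in this cell can be tried out on a tiny in-memory file. The layout below (a title line followed by the real column names) and all its values are assumptions made for illustration; the actual CSV may look different. The sketch also shows `header=1` as a possible alternative that gives the same column names, although it lets pandas infer column types and starts the index at 0.

```python
import io

import pandas as pd

# Hypothetical file: a title line, then the real column names, then data.
raw = "Export;;\nfirst_name;last_name;party\nAnna;Muster;X\nBen;Beispiel;Y\n"

# Approach used in the notebook: read everything, then promote row 0 to the header.
df = pd.read_csv(io.StringIO(raw), sep=';')
df.columns = df.iloc[0]
df = df[1:]

# Possible alternative: tell pandas that the second line is the header.
alt = pd.read_csv(io.StringIO(raw), sep=';', header=1)

# Both print ['first_name', 'last_name', 'party']
print(list(df.columns))
print(list(alt.columns))
```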


In [35]:
```python
# Setup for email generation

import re

# A helper regular expression that matches any character outside the expected set
# (letters, digits, ß, umlauts, è/é and hyphens), for example spaces or periods
non_alphanum_regex = re.compile(r'[^a-z0-9ßäöüèé-]+', re.IGNORECASE)

accent_aliases = {
    'Ä': 'Ae',
    'Ö': 'Oe',
    'Ü': 'Ue',
    'ä': 'ae',
    'ö': 'oe',
    'ü': 'ue',
    'é': 'e',
    'è': 'e',
    'ß': 'ss',
}

# str.translate expects a mapping keyed by Unicode code points
accent_translate_map = {ord(key): val for key, val in accent_aliases.items()}

# If emails don't follow any common rules, they can be set here to ensure accuracy
# Lookup key should be of the form f'{first_name} {last_name}' where first_name and last_name
# are the values taken from the corresponding columns in the dataset
known_emails = {
    'Jan R. Nolte': 'jan.nolte@bundestag.de',
    'Lorenz Gösta Beutin': 'lorenz.beutin@bundestag.de',
    'Ernst Dieter Rossmann': 'ernst-dieter.rossmann@bundestag.de',
    'Michael von Abercron': 'michael.vonabercron@bundestag.de',
    'Konstantin von Notz': 'konstantin.notz@bundestag.de',
    'Amira Mohamed Ali': 'amira.mohamedali@bundestag.de',
    'Armin Paul Hampel': 'armin.paulus.hampel@bundestag.de',
    'Ottmar von Holtz': 'ottmar.vonholtz@bundestag.de',
    'Hans-Georg von der Marwitz': 'hans-georg.vondermarwitz@bundestag.de',
    'Beatrix von Storch': 'beatrix.vonstorch@bundestag.de',
    'Reinhard Arnold Houben': 'reinhard.houben@bundestag.de',
    'Matthias W. Birkwald': 'matthias-w.birkwald@bundestag.de',
    'Alexander Graf Lambsdorff': 'alexander.graflambsdorff@bundestag.de',
    'Alexander S. Neu': 'alexander.neu@bundestag.de',
    'Olaf in der Beek': 'olaf.inderbeek@bundestag.de',
    'Berengar Elsner von Gronow': 'berengar.elsnervongronow@bundestag.de',
    'Hermann Otto Solms': 'hermann.solms@bundestag.de',
    'Tobias Matthias Peterka': 'tobias.peterka@bundestag.de',
    'Christian von Stetten': 'christian.stetten@bundestag.de',
    'Axel Eduard Fischer': 'axel.fischer@bundestag.de',
    'Karl A. Lamers': 'karl-a.lamers@bundestag.de',
    'Matern von Marschall': 'matern.vonmarschall@bundestag.de',
    'Helin Evrim Sommer': 'helin-evrim.sommer@bundestag.de',
    'Wilhelm von Gottberg': 'wilhelm.vongottberg@bundestag.de',
    'Martin E. Renner': 'martin.renner@bundestag.de'
}
```
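
To see which names the helper regular expression flags, it can be run on a few sample strings. This is a minimal sketch: `needs_review` is a hypothetical helper, and "Müller" is a generic example rather than a value from the dataset. Hyphens and umlauts pass, while spaces and periods are reported.

```python
import re

# Same pattern as in the setup cell above
non_alphanum_regex = re.compile(r'[^a-z0-9ßäöüèé-]+', re.IGNORECASE)


def needs_review(name: str) -> bool:
    """Return True if the name contains a character outside the expected set."""
    return non_alphanum_regex.search(name) is not None


for name in ['Leif-Erik', 'Müller', 'Jan R.', 'von Storch']:
    print(f'{name!r}: {needs_review(name)}')
# 'Leif-Erik': False
# 'Müller': False
# 'Jan R.': True
# 'von Storch': True
```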



In [36]:
```python
# Email generation
emails = []

for rep in reps.itertuples():
    try:
        # Is there an explicit email address set for this name?
        email = known_emails[f'{rep.first_name} {rep.last_name}']
    except KeyError:
        # There is not, so we generate it to the best of our knowledge
        first_name = rep.first_name.lower().strip()
        last_name = rep.last_name.lower().strip()

        if last_name.startswith('de '):
            # 'de Vries', 'De Riddler' etc: keep the "de" and drop only the space after it
            last_name = last_name.replace('de ', 'de', 1)
        else:
            # Do a check to see if this name might generate an "unconventional" email
            if non_alphanum_regex.search(rep.first_name + rep.last_name):
                print(
                    f'Warning: "{rep.first_name}", "{rep.last_name}" ("{rep.first_name} {rep.last_name}") may have an unconventional email address format'
                )

        # Transliterate umlauts, accents and ß after the address has been assembled
        email = f'{first_name}.{last_name}@bundestag.de'.translate(accent_translate_map)
    emails.append(email)

print(len(emails), 'emails generated')
```


709 emails generated
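
A possible follow-up check, which is not part of the original notebook, is to scan the generated list for addresses that still look malformed or that occur more than once. The sketch below assumes that a plausible address consists of dot-separated lowercase parts made of letters, digits and hyphens. `audit_emails`, `EMAIL_PATTERN` and the sample list are hypothetical.

```python
import re
from collections import Counter

# Assumed shape of a plausible address: lowercase parts separated by dots.
EMAIL_PATTERN = re.compile(r'^[a-z0-9-]+(\.[a-z0-9-]+)+@bundestag\.de$')


def audit_emails(emails):
    """Return (malformed, duplicated) addresses from a list of generated emails."""
    malformed = [e for e in emails if not EMAIL_PATTERN.match(e)]
    counts = Counter(emails)
    duplicated = sorted(e for e, n in counts.items() if n > 1)
    return malformed, duplicated


sample = [
    'beatrix.vonstorch@bundestag.de',
    'jan.nolte@bundestag.de',
    'jan.nolte@bundestag.de',
    'olaf.in der beek@bundestag.de',  # a space slipped through
]
malformed, duplicated = audit_emails(sample)
print(malformed)   # ['olaf.in der beek@bundestag.de']
print(duplicated)  # ['jan.nolte@bundestag.de']
```

A duplicate is not necessarily an error, since two representatives may share a name, but it is worth checking by hand.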


In [37]:
```python
# Add the emails to the dataframe as a new column
reps['email'] = emails
```



In [38]:
```python
# Save it, using a comma as a separator this time!
reps.to_csv('./Daten-abgeordnetenwatch/Bundestag-Abgeordnete-emails.csv', sep=',')
```