I may be slow to respond


  • 2 discussions answered


@deutschestextarchiv @zentrum-lexikographie

Hi there! 👋


  Web | Blog | 🐦 Twitter | 🎞 YouTube | Coffee


🔭  Currently working on gathering texts on the Web and detecting word trends

Programming experience

🖩  First programs written on a TI-83 Plus in TI-BASIC



  1. trafilatura Public

    Web scraping library and command-line tool for text discovery and extraction (main content, metadata, comments)

    Python · 507 stars · 68 forks

  2. htmldate Public

    Fast and robust date extraction from web pages, with Python or on the command line

    Python · 51 stars · 17 forks

  3. simplemma Public

    Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

    Python · 35 stars · 2 forks

  4. py3langid Public

    Forked from saffsd/

    Faster, modernized fork of the language identification tool

    Python · 9 stars · 2 forks

  5. courlan Public

    Clean, filter and sample URLs to optimize data collection – includes spam, content type and language filters

    Python · 12 stars · 2 forks

  6. German-NLP Public

    Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

    229 stars · 42 forks

683 contributions in the last year


Contribution activity

June 2022

Created an issue in adbar/htmldate that received 4 comments

Memory leak

See issue adbar/trafilatura#216. Extracting the date from the same web page multiple times shows that the module is leaking memory; this doesn't ap…

