News headers

Scrape Swedish news sites to get their headers (headlines). All methods are allowed; the scrapers currently use DOM parsing, GraphQL, parsing <script> tags and more. :)

* Must be initialized with a sha256 hash

Example usage

>>> from scraper import Aftonbladet, SVT
>>> s = SVT()
>>> headers = s.headers()
>>> print(headers[0])
Dödsfall som kopplas till e-cigg ökar – ny studie analyserar skadorna
Forskare: Som att utsättas för senapsgas
https://svt.se/nyheter/utrikes/antal-dodsfall-kopplade-till-e-cigg-okar

>>> a = Aftonbladet()
>>> headers = a.headers()
>>> print(headers[5])
Varför ska vi amma för att rädda klimatet?
Öhagen Britterna kan väl sluta dricka te i stället
https://www.aftonbladet.se/family/a/P9w4Q5/varfor-ska-vi-amma-for-att-radda-klimatet

>>> headers[3].title
Stänger alla butiker – och ger ledigt för fest

>>> headers[3].url
https://www.aftonbladet.se/nyheter/a/vQygkp/jysk-ger-alla-anstallda-ledigt--dagen-efter-personalfest
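
The session above suggests that each header prints as title, text and URL, and exposes at least title and url as attributes. A minimal sketch of such an object follows. It is a hypothetical stand-in, not the actual class in header.py: the field names text and paywall, and the default for paywall, are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    """Hypothetical stand-in for header.Header (field names are assumed)."""

    title: str
    text: str
    url: str
    paywall: bool = False  # assumed meaning: article is behind a paywall

    def __str__(self) -> str:
        # Mirror the three-line output seen in the example session:
        # title, then text, then URL.
        return f"{self.title}\n{self.text}\n{self.url}"


h = Header("A Title", "A text", "https://a-url.se")
print(h)        # three lines: title, text, URL
print(h.title)  # attribute access, as in headers[3].title
```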

Implement new sub class

Just extend Scraper and implement name, url and headers.

import header
from scraper import Scraper  # base class, assumed to be exported by scraper.py

class MySite(Scraper):
    @classmethod
    def name(cls):
        return "My Site"

    @classmethod
    def url(cls):
        """
        The URL for the site.
        """
        return "https://mysite.se"

    def headers(self):
        """
        Return a list of all headers for the site.
        """

        return [
            header.Header(
                "A Title",
                "A text",
                "https://a-url.se",
                False,  # paywall flag: set to True if the article is behind a paywall
            )
        ]
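
The headers method above returns a hard-coded list. A real scraper would fill it from the page. Here is one way the DOM parsing step could look, using only the standard library. It is a sketch under assumptions: the markup (headlines as links inside h2 tags), the names HeadlineParser and parse_headlines, and the sample HTML are all made up for illustration, and fetching the page is left out.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect (title, url) pairs from <a> tags nested inside <h2> tags."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.current_url = None   # href of the link being read, if any
        self.current_text = []    # text fragments of that link
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
        elif tag == "a" and self.in_h2:
            self.current_url = dict(attrs).get("href")
            self.current_text = []

    def handle_data(self, data):
        # Only keep text while inside a headline link.
        if self.current_url is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_url is not None:
            title = "".join(self.current_text).strip()
            self.results.append((title, self.current_url))
            self.current_url = None
        elif tag == "h2":
            self.in_h2 = False


def parse_headlines(html):
    """Return a list of (title, url) tuples found in the given HTML."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.results


# Made-up markup standing in for a downloaded front page.
SAMPLE_HTML = """
<main>
  <h2><a href="https://mysite.se/a/1">First headline</a></h2>
  <p><a href="https://mysite.se/about">Not a headline</a></p>
  <h2><a href="https://mysite.se/a/2">Second <em>headline</em></a></h2>
</main>
"""

headlines = parse_headlines(SAMPLE_HTML)
for title, url in headlines:
    print(title, url)
```

Inside headers, each (title, url) pair could then be wrapped in a header.Header together with a text and a paywall flag.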

Watcher

A simple watcher is bundled with the repository to make it easier to watch for new articles in desired scrapers. Example usage:

from scraper import SVT, DN
from watcher import Watcher

scrapers = [SVT(), DN()]
w = Watcher(scrapers, 60)

for a in w.articles():
    print("New article posted!")
    print(a)
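
To give a feel for what happens behind w.articles(), here is a rough sketch of a polling loop that yields each article the first time it shows up. This is not the code in watcher.py. The function watch, the class FakeScraper, the use of plain strings as articles, and the reading of 60 as a number of seconds between polls are all assumptions made for illustration.

```python
import time
from itertools import islice


class FakeScraper:
    """Hypothetical scraper whose front page changes between polls."""

    def __init__(self, batches):
        self._batches = iter(batches)
        self._latest = []

    def headers(self):
        # Return the next batch, or repeat the last one when exhausted.
        self._latest = next(self._batches, self._latest)
        return self._latest


def watch(scrapers, interval, sleep=time.sleep):
    """Poll all scrapers forever and yield every article not seen before."""
    seen = set()
    while True:
        for scraper in scrapers:
            for article in scraper.headers():
                if article not in seen:
                    seen.add(article)
                    yield article
        # Wait before the next round (interval assumed to be seconds).
        sleep(interval)


# Two polls: the second one contains one article that was not there before.
scraper = FakeScraper([["a", "b"], ["a", "b", "c"]])

# Replace the real sleep with a no-op and stop after three articles,
# since the generator itself never terminates.
new_articles = list(islice(watch([scraper], 60, sleep=lambda _: None), 3))
print(new_articles)
```

A real watcher would probably track something stable such as the article URL in the seen set, so that a changed title does not count as a new article.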
