Merged
19 commits
5 changes: 5 additions & 0 deletions .env.sample
@@ -0,0 +1,5 @@
TEST_DATABASE_URL=
TEST_CONSUMER_KEY=
TEST_CONSUMER_SECRET=
TEST_ACCESS_TOKEN=
TEST_ACCESS_TOKEN_SECRET=
5 changes: 4 additions & 1 deletion .travis.yml
@@ -11,7 +11,10 @@ script: py.test
before_install:
- sudo apt-get -y update
- sudo apt-get install firefox-geckodriver
- sudo apt-get install --upgrade chromium-chromedriver
before_script:
- wget https://chromedriver.storage.googleapis.com/83.0.4103.39/chromedriver_linux64.zip
- unzip chromedriver_linux64.zip -d /home/travis/virtualenv/python3.7.1/bin/
- export CHROME_BIN=chromium-browser
after_failure: cat test/diffengine.log
notifications:
slack:
56 changes: 40 additions & 16 deletions README.md
@@ -88,30 +88,31 @@ Logs can be found in `diffengine.log` in the storage directory, for example
Check out [Ryan Baumann's "diffengine" Twitter list] for a list of known
diffengine Twitter accounts that are out there.

## Tweeting text options
## Config options

By default, the tweeted diff will include the article's title and the archive diff url, [like this](https://twitter.com/mp_diff/status/1255973684994625539).
### Database engine

You change this by tweeting what's changed: the url, the title and/or the summary. For doing so, you need to specify **all** the following `lang` keys:
By default, the database is SQLite, stored in the file `./diffengine.db`. You can change this through the `db` config property:

```yaml
lang:
change_in: "Change in"
the_url: "the URL"
the_title: "the title"
and: "and"
the_summary: "the summary"
db: sqlite:///diffengine.db
```

Only if all the keys are defined, the tweet will include what's changed on its content, followed by the `diff.url`. Some examples:
This value follows the [database URL connection string format](http://docs.peewee-orm.com/en/latest/peewee/playhouse.html#database-url).

- "Change in the title"
- "Change in the summary"
- "Change in the title and the summary"
For instance, you can connect to your PostgreSQL database using something like this:

And so on with all the possible combinations between url, title and summary
```yaml
db: postgresql://postgres:my_password@localhost:5432/my_database
```
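
To make the parts of such a connection string easier to see, here is a minimal sketch that splits a database URL using only the Python standard library. It is an illustration, not how diffengine or peewee parse the value internally; the helper name `describe_db_url` is hypothetical.

```python
from urllib.parse import urlparse


def describe_db_url(db_url):
    """Split a database URL into its parts (hypothetical helper, for illustration only)."""
    parts = urlparse(db_url)
    return {
        "engine": parts.scheme,                  # e.g. "sqlite" or "postgresql"
        "user": parts.username,                  # None when the URL has no credentials
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),      # database name or SQLite file name
    }


# The two URLs used in this section
info = describe_db_url("postgresql://postgres:my_password@localhost:5432/my_database")
sqlite_info = describe_db_url("sqlite:///diffengine.db")

print(info["engine"], info["host"], info["port"], info["database"])
print(sqlite_info["engine"], sqlite_info["database"])
```

The SQLite URL has no host or credentials, so those fields come back as `None` and only the engine and file name are populated.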

If you store your database URL in an environment variable, as on Heroku, you can reference it as follows:

## Multiple Accounts & Feed Implementation Example
```yaml
db: "${DATABASE_URL}"
```

### Multiple Accounts & Feed Implementation Example

If you are setting up multiple accounts and multiple feeds, it may be helpful to set up a
directory for each account. For example:
@@ -155,6 +156,29 @@ twitter:
consumer_secret: CONSUMER_SECRET
```

### Tweet content

By default, the tweeted diff will include the article's title and the archive diff url, [like this](https://twitter.com/mp_diff/status/1255973684994625539).

You can change this so that the tweet states what has changed: the URL, the title and/or the summary. To do so, you need to specify **all** of the following `lang` keys:

```yaml
lang:
  change_in: "Change in"
  the_url: "the URL"
  the_title: "the title"
  and: "and"
  the_summary: "the summary"
```

Only if all the keys are defined will the tweet state what has changed in the content, followed by the `diff.url`. Some examples:

- "Change in the title"
- "Change in the summary"
- "Change in the title and the summary"

And so on, for every possible combination of URL, title and summary.
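
As a rough sketch of how such a text could be assembled from the `lang` keys and the three "changed" flags, consider the following. The function name `build_change_text` is hypothetical, and the comma-joining used when all three parts change is an assumption; this is not diffengine's actual implementation.

```python
REQUIRED_KEYS = ("change_in", "the_url", "the_title", "and", "the_summary")


def build_change_text(lang, url_changed, title_changed, summary_changed):
    """Return a 'Change in ...' text, or None when it cannot be built (hypothetical helper)."""
    # All keys must be present and non-empty, as the section above requires.
    if not all(lang.get(key) for key in REQUIRED_KEYS):
        return None

    changed = []
    if url_changed:
        changed.append(lang["the_url"])
    if title_changed:
        changed.append(lang["the_title"])
    if summary_changed:
        changed.append(lang["the_summary"])
    if not changed:
        return None

    if len(changed) == 1:
        what = changed[0]
    else:
        # Assumption: earlier items are comma-separated, the last one is joined with "and".
        what = "%s %s %s" % (", ".join(changed[:-1]), lang["and"], changed[-1])
    return "%s %s" % (lang["change_in"], what)


lang = {
    "change_in": "Change in",
    "the_url": "the URL",
    "the_title": "the title",
    "and": "and",
    "the_summary": "the summary",
}
print(build_change_text(lang, False, True, True))
```

Returning `None` when a key is missing mirrors the rule that the default tweet text is used unless every `lang` key is defined.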

### Support for environment vars

The configuration file has support for [environment variables](https://medium.com/chingu/an-introduction-to-environment-variables-and-how-to-use-them-f602f66d15fa). This is useful if you want to keep your credentials secure when deploying to Heroku, Vercel (formerly ZEIT Now), AWS, Azure, Google Cloud or any other similar service. The environment variables are defined in the app settings of the platform you use or directly in a [dotenv file](https://12factor.net/config), which is the usual case when coding locally.
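
The idea behind `${NAME}` placeholders can be sketched with the standard library alone. The snippet below is an illustration under stated assumptions, not the mechanism diffengine relies on (the project imports `EnvYAML` for this); the helpers `parse_dotenv` and `expand_placeholders` are hypothetical and handle only simple `KEY=VALUE` lines.

```python
import re

# Matches ${NAME}, where NAME is a typical environment variable name
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def parse_dotenv(text):
    """Parse simple KEY=VALUE lines into a dict (hypothetical helper, no escaping rules)."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env


def expand_placeholders(value, env):
    """Replace each ${NAME} in value with env[NAME]; raises KeyError if NAME is not defined."""
    return PLACEHOLDER.sub(lambda match: env[match.group(1)], value)


env = parse_dotenv('# local credentials\nMY_CONSUMER_SECRET_ENV_VAR="CONSUMER_SECRET"\n')
resolved = expand_placeholders("${MY_CONSUMER_SECRET_ENV_VAR}", env)
print(resolved)
```

Failing loudly on an undefined variable is a deliberate choice in this sketch: a silently empty credential is harder to diagnose than a `KeyError` at startup.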
@@ -176,7 +200,7 @@ MY_CONSUMER_SECRET_ENV_VAR="CONSUMER_SECRET"

Done! You can use diffengine as usual and keep your credentials safe.

## Adding a Twitter account when the configuration file is already created
### Adding a Twitter account when the configuration file is already created

You can use the following command for adding Twitter accounts to the config file.

7 changes: 7 additions & 0 deletions config-test.yaml
@@ -0,0 +1,7 @@
db: ${TEST_DATABASE_URL}
twitter:
consumer_key: ${TEST_CONSUMER_KEY}
consumer_secret: ${TEST_CONSUMER_SECRET}
token:
access_token: ${TEST_ACCESS_TOKEN}
access_token_secret: ${TEST_ACCESS_TOKEN_SECRET}
102 changes: 60 additions & 42 deletions diffengine/__init__.py
@@ -8,7 +8,6 @@
import os
import re
import sys
import json
import time
import yaml
import bleach
@@ -19,46 +18,50 @@
import logging
import argparse
import requests
import selenium
import htmldiff2
import feedparser
import subprocess
import readability
import unicodedata

from peewee import *
from playhouse.migrate import SqliteMigrator, migrate
from datetime import datetime
from selenium import webdriver
from selenium.webdriver.chrome.options import Options as ChromeOptions
from selenium.webdriver.firefox.options import Options as FirefoxOptions
from urllib.parse import urlparse, urlunparse, parse_qs, urlencode
from envyaml import EnvYAML

from diffengine.exceptions.webdriver import UnknownWebdriverError
from diffengine.exceptions.twitter import ConfigNotFoundError, TwitterError
from diffengine.exceptions.sendgrid import SendgridConfigNotFoundError, SendgridError
from diffengine.exceptions.twitter import TwitterConfigNotFoundError, TwitterError
from diffengine.text import to_utf8
from diffengine.sendgrid import SendgridHandler
from diffengine.twitter import TwitterHandler
from diffengine.exceptions.sendgrid import (
ConfigNotFoundError as SGConfigNotFoundError,
SendgridError,
from envyaml import EnvYAML
from peewee import (
DatabaseProxy,
CharField,
DateTimeField,
OperationalError,
ForeignKeyField,
Model,
SqliteDatabase,
TextField,
)
from diffengine.sendgrid import SendgridHandler
from playhouse.db_url import connect
from playhouse.migrate import SqliteMigrator, migrate
from selenium import webdriver
from selenium.webdriver.chrome.options import Options as ChromeOptions
from selenium.webdriver.firefox.options import Options as FirefoxOptions
from urllib.parse import urlparse, urlunparse, parse_qs, urlencode

home = None
config = {}
db = SqliteDatabase(None)
database = DatabaseProxy()
browser = None


class BaseModel(Model):
class Meta:
database = db
database = database


class Feed(BaseModel):
url = CharField(primary_key=True)
name = CharField()
url = TextField(primary_key=True)
name = TextField()
created = DateTimeField(default=datetime.utcnow)

@property
@@ -102,7 +105,7 @@ def get_latest(self):


class Entry(BaseModel):
url = CharField()
url = TextField()
created = DateTimeField(default=datetime.utcnow)
checked = DateTimeField(default=datetime.utcnow)
tweet_status_id_str = CharField(null=False, default="")
@@ -154,9 +157,9 @@ def get_latest(self):
be returned.
"""

# make sure we don't go too fast
# TODO: can we remove this? Why is this here?
time.sleep(1)
time_sleep = config.get("time_sleep", 0)
if time_sleep > 0:
time.sleep(time_sleep)

# fetch the current readability-ized content for the page
logging.info("checking %s", self.url)
@@ -231,11 +234,11 @@ class FeedEntry(BaseModel):


class EntryVersion(BaseModel):
title = CharField()
url = CharField(index=True)
summary = CharField()
title = TextField()
url = TextField(index=True)
summary = TextField()
created = DateTimeField(default=datetime.utcnow)
archive_url = CharField(null=True)
archive_url = TextField(null=True)
entry = ForeignKeyField(Entry, backref="versions")
tweet_status_id_str = CharField(null=False, default="")

@@ -299,6 +302,18 @@ class Diff(BaseModel):
emailed = DateTimeField(null=True)
blogged = DateTimeField(null=True)

@property
def url_changed(self):
return self.old.url != self.new.url

@property
def title_changed(self):
return self.old.title != self.new.title

@property
def summary_changed(self):
return self.old.summary != self.new.summary

@property
def html_path(self):
# use prime number to spread across directories
@@ -485,17 +500,20 @@ def home_path(rel_path):


def setup_db():
global db
db_file = config.get("db", home_path("diffengine.db"))
logging.debug("connecting to db %s", db_file)
db.init(db_file)
db.connect()
db.create_tables([Feed, Entry, FeedEntry, EntryVersion, Diff], safe=True)
try:
migrator = SqliteMigrator(db)
migrate(migrator.add_index("entryversion", ("url",), False))
except OperationalError as e:
logging.debug(e)
global home, database
database_url = config.get("db", "sqlite:///diffengine.db")
logging.debug("connecting to db %s", database_url)
database_handler = connect(database_url)
database.initialize(database_handler)
database.connect()
database.create_tables([Feed, Entry, FeedEntry, EntryVersion, Diff], safe=True)

if isinstance(database_handler, SqliteDatabase):
try:
migrator = SqliteMigrator(database_handler)
migrate(migrator.add_index("entryversion", ("url",), False))
except OperationalError as e:
logging.debug(e)


def chromedriver_browser(executable_path, binary_location):
@@ -532,7 +550,7 @@ def setup_browser(engine="geckodriver", executable_path=None, binary_location=""


def init(new_home, prompt=True):
global home, browser
global home, config, browser
home = new_home
load_config(prompt)
try:
@@ -565,7 +583,7 @@ def main():
twitter_handler = TwitterHandler(
twitter_config["consumer_key"], twitter_config["consumer_secret"]
)
except ConfigNotFoundError as e:
except TwitterConfigNotFoundError as e:
twitter_handler = None
logging.warning("error when creating Twitter Handler. Reason: %s", str(e))
except KeyError as e:
@@ -629,7 +647,7 @@ def process_entry(entry, feed_config, twitter=None, sendgrid=None, lang={}):
version.diff, feed_config.get("sendgrid", {})
)

except SGConfigNotFoundError as e:
except SendgridConfigNotFoundError as e:
logging.error(
"Missing configuration values for publishing entry %s",
entry.url,
4 changes: 2 additions & 2 deletions diffengine/exceptions/sendgrid.py
@@ -2,7 +2,7 @@ class SendgridError(RuntimeError):
pass


class ConfigNotFoundError(SendgridError):
class SendgridConfigNotFoundError(SendgridError):
"""Exception raised if the Sendgrid instance has not the API key"""

def __init__(self):
@@ -14,6 +14,6 @@ def __init__(self, diff_id):
self.message = "diff %s was already emailed with sendgrid " % diff_id


class ArchiveUrlNotFoundError(SendgridError):
class SendgridArchiveUrlNotFoundError(SendgridError):
def __init__(self):
self.message = "not publishing without archive urls"
4 changes: 2 additions & 2 deletions diffengine/exceptions/twitter.py
@@ -2,7 +2,7 @@ class TwitterError(RuntimeError):
pass


class ConfigNotFoundError(TwitterError):
class TwitterConfigNotFoundError(TwitterError):
"""Exception raised if the Twitter instance has not the required key and secret"""

def __init__(self):
@@ -21,7 +21,7 @@ def __init__(self, diff):
self.message = "diff %s has already been tweeted" % diff.id


class AchiveUrlNotFoundError(TwitterError):
class TwitterAchiveUrlNotFoundError(TwitterError):
def __init__(self, diff):
self.message = "not tweeting without archive urls for diff %s" % diff.id

8 changes: 4 additions & 4 deletions diffengine/sendgrid.py
@@ -5,8 +5,8 @@

from diffengine.exceptions.sendgrid import (
AlreadyEmailedError,
ConfigNotFoundError,
ArchiveUrlNotFoundError,
SendgridConfigNotFoundError,
SendgridArchiveUrlNotFoundError,
)


@@ -43,13 +43,13 @@ def publish_diff(self, diff, feed_config):
if diff.emailed:
raise AlreadyEmailedError(diff.id)
elif not (diff.old.archive_url and diff.new.archive_url):
raise ArchiveUrlNotFoundError()
raise SendgridArchiveUrlNotFoundError()

api_token = feed_config.get("api_token", self.api_token)
sender = feed_config.get("sender", self.sender)
receivers = feed_config.get("receivers", self.receivers)
if not all([api_token, sender, receivers]):
raise ConfigNotFoundError
raise SendgridConfigNotFoundError

subject = self.build_subject(diff)
message = Mail(