
Improvements to business-logic and data model for Feed on /planet #685

@m1rm

Description


Follow-up to #680

@ismailarilik noticed that Pierre Schmitz appears twice in the Subscribe list on archlinux.org/planet. His assumption that there is a duplicate entry in the database appears to be correct. I did some digging and would like to propose a refactoring that fixes this properly rather than patching over it.

Status Quo

Where the list is built

Template templates/planet/index.html:

<h4>Subscribe</h4>
<ul class="planet-list">
  {% for feed in official_feeds %}
  <li>
    <a href="{{ feed.website }}" title="{{ feed.title }}">{{ feed.title }}</a>
  </li>
  {% endfor %}
</ul>

View planet/views.py:

from django.shortcuts import render

from .models import Feed, FeedItem, Planet


def index(request):
    context = {
        'official_feeds': Feed.objects.all(),
        'planets': Planet.objects.all(),
        'feed_items': FeedItem.objects.order_by('-publishdate')[:25],
    }
    return render(request, 'planet/index.html', context)

Feed.objects.all() returns every Feed row; there is no distinct() and no grouping by user, so two rows for the same person are rendered as two Subscribe entries.

Database and Model information

The only code that creates Feed rows is the UserProfile pre_save handler in devel/models.py (create_feed_model):

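def create_feed_model(sender, **kwargs):
    ...
    # Nothing to do unless the RSS URL changed on this save.
    if obj.website_rss == dbmodel.website_rss:
        return

    # Display title: the alias, or "First Last" when both names are set.
    title = obj.alias
    if obj.user.first_name and obj.user.last_name:
        title = obj.user.first_name + ' ' + obj.user.last_name

    # Delete rows matching the OLD URL only, then always insert a new row.
    Feed.objects.filter(website_rss=dbmodel.website_rss).all().delete()
    Feed.objects.create(title=title, website=website, website_rss=obj.website_rss)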

Relevant model facts (planet/models.py):

  • Feed has no foreign key to User.
  • website_rss is not unique.
  • Rows are keyed only by RSS URL and display title.
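Because website_rss is not unique, existing duplicate rows can be spotted by grouping on that column. The sketch below is a minimal pure-Python illustration over plain (id, title, website_rss) tuples standing in for Feed rows; the helper name find_duplicate_feeds and the sample data are hypothetical. Against the real table, a values('website_rss') query annotated with Count('id') and filtered on that count would be the ORM counterpart.

```python
from collections import defaultdict


def find_duplicate_feeds(rows):
    """Group feed rows by RSS URL and return only URLs that have several rows.

    rows: iterable of (feed_id, title, website_rss) tuples, a hypothetical
    stand-in for Feed rows.
    Returns {website_rss: [feed_id, ...]} with ids in input order.
    """
    by_url = defaultdict(list)
    for feed_id, _title, website_rss in rows:
        by_url[website_rss].append(feed_id)
    # Keep only URLs that occur more than once.
    return {url: ids for url, ids in by_url.items() if len(ids) > 1}


rows = [
    (1, "Alice Example", "https://example.org/alice.xml"),
    (2, "Bob Example", "https://example.org/bob.xml"),
    (3, "Alice Example", "https://example.org/alice.xml"),
]
print(find_duplicate_feeds(rows))
# {'https://example.org/alice.xml': [1, 3]}
```

This only catches rows that share a URL. Rows for the same person under different URLs (for example when the delete targeted the wrong URL) would need grouping by title or, once a user relation exists, by user.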

Duplication observed in production


Note

I talked to Pierre: he removed the feed URL from his profile (website_rss), saved, and no longer appeared in the list. He then re-added the feed URL, saved again, and shortly afterwards reappeared in the list, this time only once.

How can the duplicate arise?

All Feed creation goes through create_feed_model, the UserProfile pre_save handler in devel/models.py. It only acts when website_rss changes on save, and then it:

  • Deletes rows where website_rss == old profile value (dbmodel.website_rss)
  • Inserts a new row for the new URL

Caution

There is no “one feed per user” rule and no uniqueness on website_rss.
Scenarios like the following could cause duplicates:

  • Profile RSS changes A → B, but a row for B already exists
  • Delete targets the wrong URL (feed row ≠ old profile value)
  • RSS is set for the first time while a row for that URL already exists
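To make these scenarios concrete, here is a minimal in-memory simulation of the visible lines of create_feed_model (early return, delete by old URL, unconditional create). It is an illustration, not archweb code: dicts stand in for Feed rows, the function names simulate_handler and count_url are hypothetical, and whatever the elided part of the handler does is not modelled.

```python
def simulate_handler(feeds, old_rss, new_rss, title):
    """Apply the visible steps of create_feed_model to an in-memory table.

    feeds: list of dicts with 'id', 'title' and 'website_rss' keys, a stand-in
    for Feed rows. Returns a new list; the input list is left untouched.
    """
    if new_rss == old_rss:
        return list(feeds)  # the handler returns early when the URL is unchanged
    # Delete only rows whose URL equals the OLD profile value.
    kept = [f for f in feeds if f["website_rss"] != old_rss]
    # Always insert a new row, whether or not one for new_rss already exists.
    next_id = max((f["id"] for f in feeds), default=0) + 1
    kept.append({"id": next_id, "title": title, "website_rss": new_rss})
    return kept


def count_url(feeds, url):
    """Number of rows whose website_rss equals url."""
    return sum(1 for f in feeds if f["website_rss"] == url)


# Happy path: one row for A, none for B.
happy = simulate_handler(
    [{"id": 1, "title": "Alice", "website_rss": "A"}], "A", "B", "Alice"
)
print(count_url(happy, "B"))  # 1

# Scenario 1: A -> B while a (stale) row for B already exists.
stale = [
    {"id": 1, "title": "Alice", "website_rss": "A"},
    {"id": 2, "title": "Alice", "website_rss": "B"},
]
print(count_url(simulate_handler(stale, "A", "B", "Alice"), "B"))  # 2

# Scenario 3: RSS set for the first time (old value empty) while B has a row.
first_time = simulate_handler(
    [{"id": 1, "title": "Alice", "website_rss": "B"}], "", "B", "Alice"
)
print(count_url(first_time, "B"))  # 2
```

In both failure cases the delete step finds nothing to remove for the new URL, while the create step adds a second row unconditionally.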

The (seemingly) intended happy path:
Profile RSS changes A → B, no row for B exists yet, and every existing row for this person uses A. The delete removes the A rows and the create adds a single B row → one Subscribe entry.

The issues:

  • the delete matches the old URL only
  • the create always inserts a new row
  • there is no per-user deduplication

What I would like to do

My assumption is that a user can have at most one feed. I would therefore like to add a user relation to Feed in planet/models.py, as a OneToOneField (a foreign key with a unique constraint, so uniqueness comes out of the box). It would be nullable at first; we then run a command to backfill the missing data and adjust the business logic to use the new column. Once this is migrated and cleaned up in production, a second migration can make the column non-nullable so that every feed is attached to exactly one user.
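The backfill could work roughly as follows: match each Feed row to the user whose profile currently holds the same website_rss, keep one row per user and flag the rest. This is a minimal pure-Python sketch over plain tuples, not the planned dedupe_planet_feeds command; the function name plan_backfill, the keep-the-highest-id rule and the assumption that each non-empty profile URL belongs to exactly one user are all hypothetical choices.

```python
def plan_backfill(feeds, profiles):
    """Work out which user owns each Feed row and which rows are surplus.

    feeds: list of (feed_id, website_rss) tuples for existing Feed rows.
    profiles: list of (user_id, website_rss) tuples from UserProfile.
    Assumes each non-empty profile URL belongs to exactly one user.

    Returns (assign, delete, orphans):
      assign  -> {feed_id: user_id} for the single row kept per user
      delete  -> sorted ids of extra rows for users that already keep one
      orphans -> sorted ids of rows whose URL matches no current profile
    """
    owner = {rss: user_id for user_id, rss in profiles if rss}
    rows_per_user = {}
    orphans = []
    for feed_id, rss in feeds:
        user_id = owner.get(rss)
        if user_id is None:
            orphans.append(feed_id)
        else:
            rows_per_user.setdefault(user_id, []).append(feed_id)

    assign, delete = {}, []
    for user_id, ids in rows_per_user.items():
        keep = max(ids)  # assumption: the newest row (highest id) wins
        assign[keep] = user_id
        delete.extend(i for i in ids if i != keep)
    return assign, sorted(delete), sorted(orphans)


feeds = [
    (1, "https://example.org/a.xml"),
    (2, "https://example.org/b.xml"),
    (3, "https://example.org/a.xml"),
    (4, "https://example.org/gone.xml"),
]
profiles = [(10, "https://example.org/a.xml"), (11, "https://example.org/b.xml"), (12, "")]
print(plan_backfill(feeds, profiles))
# ({3: 10, 2: 11}, [1], [4])
```

Rows left over from a delete that targeted the wrong URL end up as orphans, since no current profile holds their URL; whether to delete those automatically or review them by hand is a judgment call for the command.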

The rough plan:

| Step | Action |
|------|--------|
| 1 | Deploy migration (nullable user on Feed) |
| 2 | Run backfill + dedupe_planet_feeds on production |
| 3 | Deploy application code: sync_planet_feed, update_planet 301 fix, tests |
| 4 | Optional migration: non-null user, unique website_rss |
| 5 | Verify /planet Subscribe list; spot-check RSS import via update_planet |
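The sync_planet_feed from step 3 could write by user rather than by URL. A minimal sketch, assuming a dict keyed by user id plays the role of the unique user column; only the name comes from the plan, the signature and body are hypothetical. With the real model, the ORM counterpart would be Feed.objects.update_or_create(user=..., defaults={...}) once Feed has the user field.

```python
def sync_planet_feed(feeds_by_user, user_id, title, website, website_rss):
    """Keep at most one feed per user (hypothetical sketch).

    feeds_by_user: dict mapping user_id -> feed dict. The dict key plays the
    role of a unique (OneToOne) user column, so a second row for the same
    user cannot exist. Mutates feeds_by_user in place and returns it.
    """
    if not website_rss:
        # Assumption: clearing the RSS URL removes the user's feed entirely.
        feeds_by_user.pop(user_id, None)
        return feeds_by_user
    # Update-or-create keyed on the user, never on the old URL.
    feeds_by_user[user_id] = {
        "title": title,
        "website": website,
        "website_rss": website_rss,
    }
    return feeds_by_user


feeds = {}
sync_planet_feed(feeds, 42, "Alice Example", "https://example.org", "https://example.org/a.xml")
sync_planet_feed(feeds, 42, "Alice Example", "https://example.org", "https://example.org/b.xml")
print(len(feeds), feeds[42]["website_rss"])  # 1 https://example.org/b.xml
```

Keying on the user addresses both the delete-by-old-URL and the always-create problems. It does not stop two users from registering the same URL; that is what the optional unique website_rss in step 4 would cover.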
