RSS podcast enclosure support #55836

Closed
Dane (dancreee) wants to merge 2 commits into airbytehq:master from Claraity:rss-podcast-enclosure-support

@dancreee

What

Add support for podcast enclosure URLs in the RSS source connector and implement customizable start date filtering. This enhancement allows the connector to extract media file URLs from podcast RSS feeds and filter entries based on a configurable start date, which is critical for podcast data ingestion use cases.

How

  • Extended the RSS source connector to identify and extract podcast enclosure URLs (media files)
  • Added enclosure URL field to the schema in the manifest.yaml
  • Modified components.py to properly extract enclosure URLs from podcast feeds
  • Implemented start date filtering functionality to allow users to specify from which date they want to pull podcast entries
  • Created test files to verify functionality with podcast feeds, including the start date filtering
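
The enclosure extraction step can be illustrated with a minimal sketch. This is not the code from `components.py`, which is not shown in this conversation; it assumes a standard RSS 2.0 `<enclosure url="..." type="..."/>` element and uses only the Python standard library. The names `SAMPLE_FEED`, `extract_enclosure_url`, and `parse_items` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Hypothetical sample feed: one podcast episode with an enclosure, one plain post.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 1</title>
      <pubDate>Mon, 17 Mar 2025 08:00:00 +0000</pubDate>
      <enclosure url="https://example.com/audio/ep1.mp3" length="12345" type="audio/mpeg"/>
    </item>
    <item>
      <title>Text-only post</title>
      <pubDate>Tue, 18 Mar 2025 08:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>
"""


def extract_enclosure_url(item: ET.Element) -> Optional[str]:
    """Return the enclosure's url attribute, or None if the item has no enclosure."""
    enclosure = item.find("enclosure")
    if enclosure is None:
        return None
    return enclosure.get("url")


def parse_items(feed_xml: str) -> List[Dict[str, Optional[str]]]:
    """Turn each <item> into a flat record that includes the enclosure URL."""
    root = ET.fromstring(feed_xml)
    records = []
    for item in root.iter("item"):
        records.append(
            {
                "title": item.findtext("title"),
                "published": item.findtext("pubDate"),
                "enclosure_url": extract_enclosure_url(item),
            }
        )
    return records


if __name__ == "__main__":
    for record in parse_items(SAMPLE_FEED):
        print(record)
```

Items without an enclosure yield `None` for the field, so non-podcast feeds keep working under this approach.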

Review guide

  1. airbyte-integrations/connectors/source-rss/source_rss/components.py - Added enclosure URL extraction logic and start date filtering
  2. airbyte-integrations/connectors/source-rss/source_rss/manifest.yaml - Added enclosure field to schema
  3. airbyte-integrations/connectors/source-rss/integration_tests/test_podcast_feeds.py - Added tests for podcast feed parsing and start date filtering
  4. airbyte-integrations/connectors/source-rss/integration_tests/sample_config.json - Updated sample config to demonstrate start date configuration

User Impact

Users will now be able to:

  1. Extract podcast media file URLs from RSS feeds, enabling podcast data ingestion workflows
  2. Configure a start date to control which podcast entries should be included in the sync, allowing for more efficient data loading and updates
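
As a rough illustration of the start date behaviour described above, the sketch below keeps only entries published on or after a given date. It assumes entries carry an RFC 822 style `published` string, as RSS `pubDate` values usually do; `filter_since` and the sample entries are hypothetical and not taken from the connector.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Iterable, List


def filter_since(entries: Iterable[Dict[str, str]], start_date: datetime) -> List[Dict[str, str]]:
    """Keep entries whose publication date is on or after start_date."""
    kept = []
    for entry in entries:
        published = parsedate_to_datetime(entry["published"])
        # Treat dates without an offset as UTC (an assumption of this sketch).
        if published.tzinfo is None:
            published = published.replace(tzinfo=timezone.utc)
        if published >= start_date:
            kept.append(entry)
    return kept


if __name__ == "__main__":
    entries = [
        {"title": "Episode 1", "published": "Mon, 17 Mar 2025 08:00:00 +0000"},
        {"title": "Episode 2", "published": "Tue, 18 Mar 2025 08:00:00 +0000"},
    ]
    start = datetime(2025, 3, 18, tzinfo=timezone.utc)
    print(filter_since(entries, start))
```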

These enhancements address common requirements for users working with podcast feeds that were previously unsupported.

No negative side effects expected as these are additive changes that don't alter existing functionality.

Can this PR be safely reverted and rolled back?

  • YES 💚
  • NO ❌

@vercel

vercel bot commented Mar 19, 2025

Dane (@dancreee) is attempting to deploy a commit to the Airbyte Growth Team on Vercel.

A member of the Team first needs to authorize it.

@CLAassistant

CLAassistant commented Mar 19, 2025

CLA assistant check
All committers have signed the CLA.

Contributor

@marcosmarxm Marcos Marx (marcosmarxm) left a comment

Thanks for the contribution, Dane (@dancreee). Is it possible to allow maintainers to edit your branch? That is necessary to run the tests and update some files.
Some items are missing:

  • Update the connector version in metadata.yaml and add a changelog entry in docs/connector.md

  start_datetime:
    type: MinMaxDatetime
-   datetime: "{{ (now_utc() - duration('PT23H')).strftime('%Y-%m-%dT%H:%M:%S%z') }}"
+   datetime: "{{ (config['start_date'] if 'start_date' in config else now_utc().strftime('%Y-%m-%dT%H:%M:%S%z')) }}"

Please restore 23 hours ago as the default, not now.
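
To make the requested behaviour concrete, here is a minimal Python sketch of the fallback logic: use `start_date` from the config when present, otherwise fall back to 23 hours before the current UTC time. It mirrors the intent of the manifest expression rather than the CDK's Jinja evaluation; `resolve_start_date` and `DEFAULT_LOOKBACK` are hypothetical names, and the ISO parsing shown is an assumption about the accepted input format.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

# The previous default in the manifest was duration('PT23H'), i.e. 23 hours.
DEFAULT_LOOKBACK = timedelta(hours=23)


def resolve_start_date(config: Mapping[str, str], now: Optional[datetime] = None) -> datetime:
    """Return the configured start date, or 23 hours before `now` (UTC) if unset."""
    now = now or datetime.now(timezone.utc)
    raw = config.get("start_date")
    if not raw:
        return now - DEFAULT_LOOKBACK
    # Accept a trailing "Z" by rewriting it as an explicit UTC offset.
    parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


if __name__ == "__main__":
    print(resolve_start_date({}))
    print(resolve_start_date({"start_date": "2020-01-01T00:00:00Z"}))
```

Passing `now` explicitly keeps the function deterministic in tests.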

    description: RSS Feed URL
  start_date:
    type: string
    description: "Start date for collecting RSS items in ISO format (e.g., 2020-01-01T00:00:00Z). Items published before this date will be ignored. Defaults to 23 hours ago."

23 hours ago UTC

print("---")


if __name__ == "__main__":

EOF

@github-project-automation github-project-automation bot moved this from Backlog to Waiting Contributor in 🧑‍🏭 Community Pull Requests Mar 20, 2025
@marcosmarxm Marcos Marx (marcosmarxm) moved this from Waiting Contributor to Inactive in 🧑‍🏭 Community Pull Requests Mar 26, 2025
@marcosmarxm
Contributor

Closing due to lack of response.
