Add support to extract tags from live posts #680

Closed
wants to merge 5 commits into from

Assilfira

Problem

Currently, tags are not included in the zip file exported from Medium, so this metadata is lost during migration. To solve this, we need to extract the tags from another source and pass them to the Ghost migration tool.

Description

This PR adds a feature that extracts tags from live Medium posts and uses them in the Ghost migration tool, so content can be migrated from Medium to Ghost without losing its tags.

Changes Made

  • Added a new function to extract tags from Medium live posts
  • Modified the Ghost migration tool to include tags in the migrated posts

Target

Medium Migrate Tool

Checklist

  • Code compiles correctly
  • Tests pass

@PaulAdamDavis
Member

Thank you for this! I just wanted to leave a note here to say I will review it soon.

@PaulAdamDavis
Member

Hi @Assilfira! I've taken a good look at this, but I think there are a couple of issues. It seems we can't rely on tags being in the HTML if the post is gated (annoying), and because the tags aren't cached locally, they get scraped again each time a migration is run (which happens a lot for me as I build & test new features).

I needed tag scraping for a current project and ultimately went with a different approach: e11e54b
This looks at the window.__APOLLO_STATE__ value, which has all sorts of data in it & is always available, and loops over it to get the tags. mg-webscraper also stores the response locally as JSON.

Thanks again for this though, you highlighted an issue that needed addressing! 🙌
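
For readers following along, the `window.__APOLLO_STATE__` approach described above could look roughly like the sketch below. This is not the code from the referenced commit. It assumes the page embeds the state as a JSON object assigned inside a `<script>` element, and that tag entries carry `__typename: "Tag"` with an `id` and an optional `displayTitle`; those field names, along with `parseApolloState`, `extractTags`, `tagsForPost`, and the in-memory `Map` (a stand-in for the local JSON cache mentioned above), are illustrative assumptions.

```typescript
// Hypothetical shape of a tag record in the Apollo cache (an assumption,
// not confirmed by the thread).
interface ApolloTag {
  __typename: "Tag";
  id: string;
  displayTitle?: string;
}

type ApolloState = Record<string, unknown>;

// Pull the JSON object assigned to window.__APOLLO_STATE__ out of raw HTML.
// Returns null when the assignment is missing or is not valid JSON.
function parseApolloState(html: string): ApolloState | null {
  const match = html.match(
    /window\.__APOLLO_STATE__\s*=\s*(\{[\s\S]*?\})\s*<\/script>/
  );
  if (!match) return null;
  try {
    return JSON.parse(match[1]) as ApolloState;
  } catch {
    return null;
  }
}

// Type guard: does this cache entry look like a tag record?
function isTag(value: unknown): value is ApolloTag {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return record.__typename === "Tag" && typeof record.id === "string";
}

// Loop over every cache entry and collect the tag names.
function extractTags(state: ApolloState): string[] {
  const tags: string[] = [];
  for (const key of Object.keys(state)) {
    const value = state[key];
    if (isTag(value)) tags.push(value.displayTitle ?? value.id);
  }
  return tags;
}

// In-memory stand-in for a local cache, keyed by post URL, so repeated
// migration runs do not re-parse (or re-fetch) the same post.
const tagCache = new Map<string, string[]>();

function tagsForPost(url: string, html: string): string[] {
  const cached = tagCache.get(url);
  if (cached) return cached;
  const state = parseApolloState(html);
  const tags = state ? extractTags(state) : [];
  tagCache.set(url, tags);
  return tags;
}
```

Calling `tagsForPost(url, html)` with the fetched page HTML returns the tag names found in the embedded state, or an empty array when the state is absent. A real tool would persist the cache to disk rather than keep it in memory.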
