Add CSV source #1

Closed
wants to merge 48 commits into from

hvelarde
Member

No description provided.

hvelarde force-pushed the hvelarde-cvssource branch 7 times, most recently from 4c7586f to 3c151c9 on July 29, 2015 18:50
hvelarde force-pushed the hvelarde-cvssource branch 3 times, most recently from eb66a9d to ea411a8 on July 30, 2015 19:21
davisagli and others added 19 commits July 31, 2015 15:02
… the WXR dump, since the latter contains unprocessed WordPress-specific markup
 * workaround for last-item lxml bug (fixed in lxml 3.2.2)
 * added new 'import-comment' boolean setting. If false (the default), no WordPress comments are imported
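A minimal sketch of how an 'import-comment' switch could gate comment import. It assumes the option arrives as a string (as in transmogrifier-style configuration), that comments are `wp:comment` elements inside each WXR `<item>`, and that the export uses the WXR 1.2 namespace. The helpers `as_bool` and `item_comments` are hypothetical, not the PR's code.

```python
import xml.etree.ElementTree as ET

# Assumed WXR 1.2 namespace; older exports declare 1.0 or 1.1 instead.
WP_NS = "http://wordpress.org/export/1.2/"
NS = {"wp": WP_NS}


def as_bool(value, default=False):
    """Interpret a string option such as 'true' or 'False' (hypothetical helper)."""
    if value is None:
        return default
    return value.strip().lower() in ("true", "yes", "on", "1")


def item_comments(item, options):
    """Yield (author, content) pairs, but only if 'import-comment' is enabled."""
    if not as_bool(options.get("import-comment")):
        return  # default: WordPress comments are skipped entirely
    for comment in item.findall("wp:comment", NS):
        author = comment.findtext("wp:comment_author", default="", namespaces=NS)
        content = comment.findtext("wp:comment_content", default="", namespaces=NS)
        yield author, content


SAMPLE = f"""<item xmlns:wp="{WP_NS}">
  <wp:comment>
    <wp:comment_author>Alice</wp:comment_author>
    <wp:comment_content>Nice post</wp:comment_content>
  </wp:comment>
</item>"""
item = ET.fromstring(SAMPLE)
```

With no option set, `list(item_comments(item, {}))` is empty; passing `{"import-comment": "true"}` yields the single comment.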
…rialize WordPress metadata about images, attachments, and other useful data
…ss it on for later use

 * extract information about 'Image' from post metadata tags (this appears to correspond roughly to the 'lead image' for a post).
 * extract information about WordPress attachments from wp:attachment_url tags and the associated post metadata tags.
 * extract information about Disqus comment threads for posts, which may be useful for re-associating Disqus threads later.
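A rough sketch of reading attachment URLs and their post metadata from a WXR export with the standard library. It assumes the WXR 1.2 namespace and that attachment items carry `wp:post_id`, `wp:attachment_url`, and `wp:postmeta` (`wp:meta_key`/`wp:meta_value`) children. The function names and the returned dict layout are illustrative choices, not the PR's implementation.

```python
import xml.etree.ElementTree as ET

NS = {"wp": "http://wordpress.org/export/1.2/"}  # assumed WXR 1.2 namespace


def postmeta(item):
    """Return an item's wp:postmeta tags as a {meta_key: meta_value} dict."""
    meta = {}
    for pm in item.findall("wp:postmeta", NS):
        key = pm.findtext("wp:meta_key", default="", namespaces=NS)
        meta[key] = pm.findtext("wp:meta_value", default="", namespaces=NS)
    return meta


def attachments(channel):
    """Map post_id -> {'url': ..., 'meta': {...}} for every attachment item."""
    found = {}
    for item in channel.findall("item"):
        url = item.findtext("wp:attachment_url", namespaces=NS)
        if url is None:
            continue  # no wp:attachment_url, so not an attachment
        post_id = item.findtext("wp:post_id", namespaces=NS)
        found[post_id] = {"url": url, "meta": postmeta(item)}
    return found


SAMPLE = """<channel xmlns:wp="http://wordpress.org/export/1.2/">
  <item>
    <wp:post_id>7</wp:post_id>
    <wp:attachment_url>http://example.com/wp-content/uploads/2015/07/a.jpg</wp:attachment_url>
    <wp:postmeta><wp:meta_key>_wp_attached_file</wp:meta_key><wp:meta_value>2015/07/a.jpg</wp:meta_value></wp:postmeta>
  </item>
  <item><wp:post_id>8</wp:post_id></item>
</channel>"""
channel = ET.fromstring(SAMPLE)
```

The same `postmeta` helper would also expose a post's 'Image' key, if present, since it simply collects every meta key/value pair.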

Add a new pipeline section that downloads 'enclosures' from the WordPress site and loads them into Plone as files.
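A hedged sketch of such a section, modelled loosely on the generator pattern of transmogrifier sections (wrap the previous section's items, enrich them, yield them on). It does not import collective.transmogrifier; the class name, the `_enclosure_url`, `_data`, and `_filename` keys, and the injectable `fetch` callable are all hypothetical.

```python
from urllib.request import urlopen


def default_fetch(url):
    """Download url and return its bytes (requires network access)."""
    with urlopen(url) as response:
        return response.read()


class DownloadEnclosures:
    """Hypothetical pipeline section: wraps the previous section's items and
    attaches downloaded file data for a later section to store in Plone."""

    def __init__(self, previous, fetch=default_fetch, url_key="_enclosure_url"):
        self.previous = previous
        self.fetch = fetch
        self.url_key = url_key

    def __iter__(self):
        for item in self.previous:
            url = item.get(self.url_key)
            if url:
                # Keep the payload and a file name derived from the URL.
                item["_data"] = self.fetch(url)
                item["_filename"] = url.rstrip("/").rsplit("/", 1)[-1]
            yield item
```

Passing `fetch` in keeps the section testable without network access, e.g. `DownloadEnclosures(items, fetch=lambda url: b"...")`; items without an enclosure URL pass through unchanged.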

Possible incremental improvements: associate enclosures with posts later via 'related items' or a similar mechanism, and skip downloading any enclosure whose URL is either not in a whitelist of good URLs or is in a blacklist of bad URLs. Should the whitelist and blacklist be set through configuration?
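The whitelist/blacklist idea could look like this minimal sketch. It assumes both lists come from newline-separated configuration options, that entries are URL prefixes (exact matches or regular expressions would also work), and that an empty whitelist allows everything. The helper names are hypothetical.

```python
def parse_url_list(option):
    """Split a newline-separated config option into a tuple of URL prefixes."""
    return tuple(line.strip() for line in (option or "").splitlines() if line.strip())


def should_download(url, whitelist=(), blacklist=()):
    """Return False if url misses a non-empty whitelist or hits the blacklist."""
    whitelist, blacklist = tuple(whitelist), tuple(blacklist)
    if whitelist and not url.startswith(whitelist):
        return False  # not in the list of good URLs
    # str.startswith accepts a tuple and matches any of its prefixes.
    return not (blacklist and url.startswith(blacklist))


white = parse_url_list("""
http://example.com/wp-content/uploads/
""")
black = parse_url_list("http://example.com/wp-content/uploads/private/")
```

Here `should_download("http://example.com/wp-content/uploads/a.pdf", white, black)` is true, while URLs under `.../uploads/private/` or on another host are skipped.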
For now, just do code analysis.
hvelarde and others added 28 commits July 31, 2015 15:02
The import path now uses the same structure as the WordPress site:

- posts and pages are imported according to the permalink_structure
- attachments are imported into the wp-content/uploads folder
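A sketch of mapping content onto WordPress-style paths, assuming a post is a dict with `date`, `post_id`, and `slug` keys (hypothetical) and that attachments use the path stored in `_wp_attached_file`, which is relative to the uploads folder. Only a subset of WordPress permalink structure tags is expanded here; tags such as `%category%` and `%author%` are left out.

```python
from datetime import datetime


def permalink_path(structure, post):
    """Expand a permalink_structure such as '/%year%/%monthnum%/%postname%/'."""
    date = post["date"]
    values = {
        "%year%": f"{date.year:04d}",
        "%monthnum%": f"{date.month:02d}",
        "%day%": f"{date.day:02d}",
        "%hour%": f"{date.hour:02d}",
        "%minute%": f"{date.minute:02d}",
        "%second%": f"{date.second:02d}",
        "%post_id%": str(post["post_id"]),
        "%postname%": post["slug"],
    }
    path = structure
    for tag, value in values.items():
        path = path.replace(tag, value)
    return path.strip("/")  # relative path inside the Plone site


def attachment_path(attached_file):
    """Place an attachment under wp-content/uploads, e.g. '2015/07/a.jpg'."""
    return "wp-content/uploads/" + attached_file.lstrip("/")


post = {"date": datetime(2015, 7, 31, 15, 2), "post_id": 42, "slug": "hello-world"}
```

With the structure `/%year%/%monthnum%/%postname%/`, this post maps to `2015/07/hello-world`.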

The following changes were also made to the code:

- Update documentation
- Split code into more modules
- Get categories and tags
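Getting categories and tags can be sketched with ElementTree, assuming the usual WXR layout where each `<item>` lists its terms as `<category>` elements whose `domain` attribute is `category` or `post_tag`. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def categories_and_tags(item):
    """Split an item's <category> elements by their WXR 'domain' attribute."""
    categories, tags = [], []
    for term in item.findall("category"):
        name = (term.text or "").strip()
        domain = term.get("domain")
        if domain == "category":
            categories.append(name)
        elif domain == "post_tag":
            tags.append(name)
        # other domains (e.g. custom taxonomies) are ignored in this sketch
    return categories, tags


SAMPLE = """<item>
  <title>Hello world</title>
  <category domain="category" nicename="news"><![CDATA[News]]></category>
  <category domain="post_tag" nicename="plone"><![CDATA[Plone]]></category>
  <category domain="post_tag" nicename="python"><![CDATA[Python]]></category>
</item>"""
item = ET.fromstring(SAMPLE)
```

For this sample, `categories_and_tags(item)` returns `(["News"], ["Plone", "Python"])`.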

4 participants