Import a forum thread into Cassandre #1

Open
benel opened this issue Feb 16, 2011 · 0 comments

benel commented Feb 16, 2011

  • The original page URL could be saved into Cassandre as an alternative resource.
  • Neither phpBB nor Doct***o produces well-formed XHTML, so we cannot use XPath or XSLT. Instead, there could be a set of regular expressions for each kind of forum.
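
To illustrate the second point, here is a minimal Python sketch (not taken from Cassandre or from either forum engine) in which a strict XML parser rejects tag soup while a regular expression still extracts a field. The sample markup and the names `is_well_formed`, `extract_author` and `AUTHOR` are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical tag soup: <br> is never closed, so this is not well-formed XML.
SAMPLE = '<td><b class="s2">alice</b><br>Hello</td>'

# Hypothetical pattern, in the spirit of "one regex per field per forum kind".
AUTHOR = re.compile(r'<b class="s2">(.*?)</b>')


def is_well_formed(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def extract_author(markup: str):
    """Return the first author name found by the regex, or None."""
    match = AUTHOR.search(markup)
    return match.group(1) if match else None
```

Because XPath and XSLT tooling generally sits on top of such a parser, it would fail on the first unclosed tag, whereas the regex only depends on the local markup around the field.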

Matthieu implemented forum parsers that might be reused:

    $p_post_seperator = '/class="messagetable"/u',
    $p_thread_title = '/Sujet : <h3>(.*)<\/h3>/u',
    $p_date = '/le (\d\d-\d\d-\d\d\d\d).*(\d\d:\d\d:\d\d)/u',
    $p_author = '/<b class="s2">(.*?)<\/b>/u',
    $p_id = '/<a name="t(\d+)"><\/a>/u',
    $p_msg = '/<div id="para\d+">(.*)<\/div><\/td><\/tr>/u',
    $p_end_thread = '/<script language="javascript" type="text\/javascript">var listenumreponse/u',
    $p_nextpg = '/<div class="pagepresuiv"><a href="(.*)" class="cHeader" accesskey="x">Page Suivante<\/a>/u'
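
As a rough idea of how these patterns could be reused, here is a sketch that ports them to Python's `re` module and applies them to one page. How Matthieu's parser actually combines them is not shown in this issue, so the control flow is an assumption: cut the page at the end-of-thread marker, split on the post separator, then search each chunk. The names `PATTERNS` and `parse_page` are hypothetical. The PHP `/u` flag is dropped because Python `str` patterns already operate on Unicode text.

```python
import re

# The patterns quoted above, ported from PHP delimiters to Python raw strings.
PATTERNS = {
    name: re.compile(pattern)
    for name, pattern in {
        "post_separator": r'class="messagetable"',
        "thread_title": r'Sujet : <h3>(.*)</h3>',
        "date": r'le (\d\d-\d\d-\d\d\d\d).*(\d\d:\d\d:\d\d)',
        "author": r'<b class="s2">(.*?)</b>',
        "id": r'<a name="t(\d+)"></a>',
        "msg": r'<div id="para\d+">(.*)</div></td></tr>',
        "end_thread": r'<script language="javascript" type="text/javascript">'
                      r'var listenumreponse',
        "nextpg": r'<div class="pagepresuiv"><a href="(.*)" class="cHeader" '
                  r'accesskey="x">Page Suivante</a>',
    }.items()
}


def first_group(name, text):
    """Return group 1 of the named pattern's first match, or None."""
    match = PATTERNS[name].search(text)
    return match.group(1) if match else None


def parse_page(html):
    """Extract title, posts and next-page URL from one forum page (assumed flow)."""
    # Ignore everything after the end-of-thread marker, if present.
    end = PATTERNS["end_thread"].search(html)
    body = html[:end.start()] if end else html

    # Text before the first separator is the page header, not a post.
    chunks = PATTERNS["post_separator"].split(body)[1:]

    posts = []
    for chunk in chunks:
        date = PATTERNS["date"].search(chunk)
        posts.append({
            "id": first_group("id", chunk),
            "author": first_group("author", chunk),
            "date": date.group(1) if date else None,
            "time": date.group(2) if date else None,
            "msg": first_group("msg", chunk),
        })

    return {
        "title": first_group("thread_title", body),
        "next_page": first_group("nextpg", html),  # None on the last page
        "posts": posts,
    }
```

A caller could follow `next_page` until it is `None` to collect the whole thread, and keep the URL of the first page to store as the alternative resource mentioned above. Since `.` does not match newlines by default, `msg` only captures messages whose markup sits on a single line; whether that holds for real pages would need checking.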