import the google group posts to the new discourse install #324

Closed
DefProc opened this issue Jul 21, 2016 · 8 comments

Comments

@DefProc

commented Jul 21, 2016

The best process I found seems to be from @pacharanero: https://github.com/pacharanero/google_group.to_discourse

@pacharanero

commented Jul 22, 2016

Hi @DefProc and DOES. 'Tis a small world indeed. I am Marcus, I run Leigh Hackspace, your geographically closest Hackspace in the NW.

I did a google group to discourse scraper about 18 months ago because I was setting up a Discourse instance and had been asked to find a way of scraping from google groups into discourse. It's a ruby script, and it's a bit flaky.

More recently I've been doing a scrape/export of GG data using https://github.com/henryk/gggd to export the GG to .mbox format, and then importing into Discourse using Discourse's own mbox importer script. The first half (export) worked fine, but I haven't had chance to have a go at importing yet.

I'll keep you posted

@pacharanero

commented Sep 6, 2016

I have now updated the original google group code so it is much more of a one-step 'run the script' import process. Full information is at: https://meta.discourse.org/t/migration-of-google-groups-to-discourse/48012

@pacharanero

commented Sep 6, 2016

Let me know if it works for you. I really should extract that script out of the whole forked Discourse thing. In fact I will.

@DefProc

Author

commented Sep 6, 2016

I see that moved to: https://github.com/pacharanero/google_group.to_discourse

I'm just sorting manager access to the group, then I can give this a go.

It looks a lot more manageable than the previous process, so thanks for the upgrade!

@DefProc

Author

commented Oct 25, 2016

@pacharanero Brilliant, the google_group to discourse importer worked perfectly, thank you. It did around 1100 posts over a couple of hours of downloads and then sorted them nicely into posts and replies.

The only problems I had were that I'd incorrectly named the cookies file (it should be cookies.txt and I initially saved it as something else) and got the google group name wrong (it is does-liverpool). In both cases, the error messages gave me the information I needed.

I did have to spin up a server with discourse separately so that I had shell access (because I was working on a hosted install). Although export worked one way (from hosted to new VPS) and I could run the google groups script fine, I've ended up in database revision hell going back, so everything's still on the VPS at discourse.doesliverpool.com.

@DefProc DefProc closed this Oct 25, 2016

@pacharanero

commented Oct 28, 2016

I've ended up in database revision hell for going back, so everything's still on the VPS at discourse.doesliverpool.com.

Sounds like the hosted version was updated while you were doing the import, maybe? Have you tried updating your VPS version to the same Discourse version (go for a specific GH commit) as the hosted version?

Or if not, maybe ask Discourse for some help - they are reasonably amenable to helping people on hosted plans.

@DefProc

Author

commented Oct 28, 2016

Yeah, the discourse-docker image defaults to the tests-passing branch, so I started with a 1.7.?.beta? version.

It looks like the database version doesn't change when you roll back the git version (?) although I expect it would do if I cleared everything and started again. I did find in the forums that the database versions will auto update forwards and they're not interested in writing stuff to do the reverse (which sounds reasonable).

The hosting provider (reasonably) sticks to the stable versions, although we had the option to move up to their top tier hosting package to sort it out.

As the self-hosted package is cheaper, and not much more difficult to keep updated, I'm likely to stick to that now. And could even let the database versions catch up before moving across — unless I get up the enthusiasm to redo the google groups export!

@DefProc

Author

commented Nov 21, 2016

OK, I've had another go at getting the googlegroups stuff across to doesliverpool.discoursehosting.net by creating a new discourse install (using discourse-docker) that was created using discourse v1.6.4, so the database version matched.

As the postgresql versions didn't quite match (9.5.4 in docker vs 9.3.9 in discoursehosting), there were a couple of lines I had to change in the extracted dump.sql file before recompressing and uploading.

row_security wasn't introduced until 9.5.?, so I had to remove the following line, with no noticeable ill effects:

'SET row_security = off;'
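
For anyone who would rather script that removal than edit by hand, here is a minimal Python sketch. It assumes the dump has already been extracted to plain text; the function name `drop_row_security` and the sample lines are made up for illustration and are not part of any Discourse tooling.

```python
def drop_row_security(sql_text: str) -> str:
    """Return sql_text without any line that is exactly `SET row_security = off;`."""
    kept = [
        line
        for line in sql_text.splitlines(keepends=True)
        # Compare the stripped line so trailing whitespace or CRLF does not matter.
        if line.strip() != "SET row_security = off;"
    ]
    return "".join(kept)


# Illustrative sample only; a real dump.sql has many more lines.
sample = (
    "SET lock_timeout = 0;\n"
    "SET row_security = off;\n"
    "SET client_encoding = 'UTF8';\n"
)
cleaned = drop_row_security(sample)
```

To apply it to a real file, read dump.sql into a string, pass it through the function, and write the result back out before recompressing.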

And the namespace declaration is slightly different, so the api_keys table can't be created unless the preceding comment line ends with Tablespace:.

Changed:

-- Name: api_keys; Type: TABLE; Schema: public; Owner: -
--

CREATE TABLE api_keys (

to:

-- Name: api_keys; Type: TABLE; Schema: public; Owner: -; Tablespace: 
--

CREATE TABLE api_keys (
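
The header change above could also be scripted. The following Python sketch appends `; Tablespace: ` to table header comments that lack it. It patches every `Type: TABLE` header, which is an assumption: only api_keys is mentioned above, and it isn't clear whether the other tables need the same change. The names `TABLE_HEADER` and `add_tablespace` are hypothetical.

```python
import re

# Header comments for tables, e.g.
#   -- Name: api_keys; Type: TABLE; Schema: public; Owner: -
# "Type: TABLE;" (with the semicolon) skips "Type: TABLE DATA" entries.
TABLE_HEADER = re.compile(
    r"^(-- Name: .*; Type: TABLE; Schema: .*; Owner: .*?)[ \t]*$",
    re.MULTILINE,
)


def add_tablespace(sql_text: str) -> str:
    """Append '; Tablespace: ' to table header comments that do not have it."""

    def fix(match: re.Match) -> str:
        header = match.group(1)
        if "; Tablespace:" in header:
            # Already in the older format; leave the line untouched.
            return match.group(0)
        return header + "; Tablespace: "

    return TABLE_HEADER.sub(fix, sql_text)


before = (
    "-- Name: api_keys; Type: TABLE; Schema: public; Owner: -\n"
    "--\n"
    "\n"
    "CREATE TABLE api_keys (\n"
)
after = add_tablespace(before)
```

Running the function a second time on its own output leaves the text unchanged, so it is safe to re-run on a partly edited dump.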

So there are now up-to-date google group records on doesliverpool.discoursehosting.net.
