Create a rake task for populating the database #282

Open
gravitystorm opened this issue May 14, 2013 · 20 comments
Labels
dx Developer Experience

Comments

@gravitystorm
Collaborator

Currently the installation notes spend about 90 lines explaining how to populate the database from an extract. A substantial portion of this concerns osmosis options and resetting sequences.

This could be simplified by creating a rake task that calls osmosis with the configured database parameters and takes care of the sequences.
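
A minimal sketch of the core of such a task, assuming a hash shaped like a `database.yml` entry. The helper names `osmosis_db_options` and `osmosis_import_command` and the config keys are hypothetical; the option names (`host`, `database`, `user`, `password`, `validateSchemaVersion`) are the ones used in the osmosis commands posted later in this thread.

```ruby
# Hypothetical helper: turn a database.yml-style hash into osmosis
# key=value options. The config keys used here are an assumption.
def osmosis_db_options(config)
  {
    "host" => config.fetch("host", "localhost"),
    "database" => config.fetch("database"),
    "user" => config.fetch("username"),
    "password" => config.fetch("password", ""),
    "validateSchemaVersion" => "no"
  }.map { |key, value| "#{key}=#{value}" }
end

# Build the argument list for importing an extract. Returning an array
# means it can be passed to Kernel#system without shell quoting problems.
def osmosis_import_command(config, pbf_path)
  ["osmosis", "--read-pbf", "file=#{pbf_path}", "--write-apidb", *osmosis_db_options(config)]
end

config = { "database" => "openstreetmap", "username" => "openstreetmap" }
puts osmosis_import_command(config, "extract.osm.pbf").join(" ")
```

Inside a real rake task the array would presumably be handed to `system(*command)`, with the sequences dealt with as a separate step afterwards.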

@tomhughes
Member

Well, populating the database should very much be relegated to an "optional extras" section of the documentation - most people just looking to install the rails code in order to develop it should never need to do any sort of import.

So I don't object to this idea, but really the easiest solution for most people is not to bother, which also means you don't have to explain how to install osmosis, making things even simpler!

@gravitystorm
Collaborator Author

Oh indeed - I'm just aiming to reduce the length of the 'optional extras' section too!

@pnorman
Contributor

pnorman commented Jul 15, 2013

Once the schema is set up it is "just" one osmosis command to load in data. Perhaps just redirect to the osmosis install instructions and give the command?

@gravitystorm
Collaborator Author

The original instructions have lots of resetting of sequences - does osmosis now handle all of these?

@pnorman
Contributor

pnorman commented Jul 15, 2013

That's for loading data then editing it locally, which is more specialized. If all you need to do is load some data to have something to view, it's pretty easy. Perhaps have a rake task to set sequences to MAX(id) or whatever is appropriate, and run the task after loading data?
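
As a rough sketch of the "set sequences to MAX(id)" idea, the following builds the SQL such a task could run. The table list is an assumption for illustration and should be checked against the real schema. The sketch also assumes each sequence is owned by its `id` column, which is what PostgreSQL's `pg_get_serial_sequence` needs in order to find it.

```ruby
# Tables whose id sequences would need resetting. This list is an
# assumption for illustration; verify it against the actual schema.
SEQUENCE_TABLES = %w[changesets current_nodes current_ways current_relations users].freeze

# Build SQL that moves a table's id sequence to MAX(id). The third setval
# argument (is_called) is false for an empty table, so the next generated
# id is still 1 in that case.
def sequence_reset_sql(table, column = "id")
  [table, column].each do |identifier|
    raise ArgumentError, "unsafe identifier: #{identifier}" unless identifier.match?(/\A[a-z_][a-z0-9_]*\z/)
  end

  "SELECT setval(pg_get_serial_sequence('#{table}', '#{column}'), " \
    "COALESCE(MAX(#{column}), 1), MAX(#{column}) IS NOT NULL) FROM #{table};"
end

SEQUENCE_TABLES.each { |table| puts sequence_reset_sql(table) }
```

In a Rails rake task each statement could presumably be run through `ActiveRecord::Base.connection.execute` once the import has finished.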

@danstowell
Contributor

Just to note that I'm currently hoping to try some tweaks to the /browse/ pages, which do require a data import, or else there are no items to browse. (Especially since, by default, the search box's results come from remote.)

At present there is no documentation (since the doc has gone from wiki into the CONFIG.md, which tells me literally nothing except this issue number), and no script. We are stuck! Please, something... (even if it's just a link to deprecated wiki instructions...)

@tomhughes
Member

@danstowell I usually just jump into the editor and draw something when I want to test something like that - that way I can be sure it will have the tags I need for whatever I'm testing.

But sure, I'm not going to refuse a patch that provides (sensible) instructions on how to load some data. I believe it is a non-trivial thing to do however, so they will need to be well written.

@pnorman
Contributor

pnorman commented Sep 17, 2013

> Just to note that I'm currently hoping to try some tweaks to the /browse/ pages, which do require a data import, or else there are no items to browse. (Especially since, by default, the search box's results come from remote.)

Are you wanting to import data, or import then edit data? The first is relatively easy - create the database then use the osmosis --write-apidb task to import data (keeping in mind that --write-apidb is amazingly slow). To do it properly (not giving superuser to every postgres account involved) can be a bit annoying.

If the latter, then you need to muck about with sequences so when you add a node it gets a suitable ID.

Having set up an apidb database a few times for testing, I can say that it was never properly documented anywhere.

@danstowell
Contributor

I only need to import data. (I need fairly rich data, so drawing a few ways myself isn't quite enough for me.)

Just for the record (for anyone drafting a rake task!), here's what I've done which seems to have given me a usable read copy of the apidb:

  • Of course I had first done the database setup described in INSTALL.md, so I already had the databases and the postgres user set up.
  • Ensure the database is empty (so that the import doesn't clash on IDs):
    osmosis --truncate-apidb host="localhost" database="openstreetmap" user="openstreetmap" password="" validateSchemaVersion="no"
  • Load the dump into the db (for Greater London, this took about 20 minutes on my laptop):
    osmosis --read-pbf file=greater-london-latest.osm.pbf --write-apidb host="localhost" database="openstreetmap" user="openstreetmap" password="" validateSchemaVersion="no"
  • Optional step: postgres tools to reclaim disk space:
    sudo -u postgres vacuumdb -afvz
    sudo -u postgres reindexdb

This does not reset the sequences, as mentioned above in this thread. However it doesn't seem to need any weird privileges in the postgres user accounts. But then, since I don't know the database internals I don't know if the sequences issue is the only quirk of the (development) database I've landed myself with.
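
For anyone drafting the rake task, the steps above could be sequenced roughly like this. It is only a sketch: the method names and the dry-run default are hypothetical, and the connection options are simply copied from the commands above.

```ruby
# Connection options copied from the commands above; adjust for your setup.
IMPORT_DB_OPTIONS = %w[
  host=localhost database=openstreetmap user=openstreetmap password= validateSchemaVersion=no
].freeze

# Return the steps in order as argument arrays, so each one can be passed
# to Kernel#system without going through a shell.
def import_steps(pbf_path, reclaim_space: false)
  steps = [
    ["osmosis", "--truncate-apidb", *IMPORT_DB_OPTIONS],
    ["osmosis", "--read-pbf", "file=#{pbf_path}", "--write-apidb", *IMPORT_DB_OPTIONS]
  ]
  if reclaim_space
    steps << %w[sudo -u postgres vacuumdb -afvz]
    steps << %w[sudo -u postgres reindexdb]
  end
  steps
end

# Print each step and, unless dry_run is set, execute it, stopping at the
# first failure. dry_run defaults to true so the sketch is safe to run.
def run_import(pbf_path, dry_run: true, reclaim_space: false)
  import_steps(pbf_path, reclaim_space: reclaim_space).each do |step|
    puts step.join(" ")
    next if dry_run

    system(*step) or raise "step failed: #{step.first(2).join(' ')}"
  end
end

run_import("greater-london-latest.osm.pbf")
```

Like the manual steps, this leaves the sequences untouched, so it only gives a read copy of the data.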

@gravitystorm gravitystorm mentioned this issue Mar 23, 2014
@msergiu80

This comment was marked as off-topic.

@tomhughes

This comment was marked as off-topic.

@msergiu80

This comment was marked as off-topic.

@hanchao
Contributor

hanchao commented Sep 19, 2016

@danstowell Error occurred. Any help?

[hanchao@map ~]$ osmosis --truncate-apidb host="******" database="******" user="******" password="******" validateSchemaVersion="no"
Sep 19, 2016 5:14:58 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Osmosis Version 0.45
Sep 19, 2016 5:14:58 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Preparing pipeline.
Sep 19, 2016 5:14:58 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Launching pipeline execution.
Sep 19, 2016 5:14:58 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Pipeline executing, waiting for completion.
Sep 19, 2016 5:14:59 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Pipeline complete.
Sep 19, 2016 5:14:59 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Total execution time: 674 milliseconds.

[hanchao@map ~]$ osmosis --read-pbf file=china-latest.osm.pbf --write-apidb host="******" database="******" user="******" password="****" validateSchemaVersion="no"
Sep 19, 2016 5:15:11 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Osmosis Version 0.45
Sep 19, 2016 5:15:11 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Preparing pipeline.
Sep 19, 2016 5:15:11 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Launching pipeline execution.
Sep 19, 2016 5:15:11 PM org.openstreetmap.osmosis.core.Osmosis run
INFO: Pipeline executing, waiting for completion.
Sep 19, 2016 5:16:18 PM org.openstreetmap.osmosis.core.pipeline.common.ActiveTaskManager waitForCompletion
SEVERE: Thread for task 1-read-pbf failed
org.openstreetmap.osmosis.core.OsmosisRuntimeException: Unable to insert user with id 3642735 into the database.
        at org.openstreetmap.osmosis.apidb.v0_6.impl.UserManager.insertUser(UserManager.java:143)
        at org.openstreetmap.osmosis.apidb.v0_6.impl.UserManager.addOrUpdateUser(UserManager.java:191)
        at org.openstreetmap.osmosis.apidb.v0_6.ApidbWriter.process(ApidbWriter.java:1098)
        at crosby.binary.osmosis.OsmosisBinaryParser.parseDense(OsmosisBinaryParser.java:138)
        at org.openstreetmap.osmosis.osmbinary.BinaryParser.parse(BinaryParser.java:124)
        at org.openstreetmap.osmosis.osmbinary.BinaryParser.handleBlock(BinaryParser.java:68)
        at org.openstreetmap.osmosis.osmbinary.file.FileBlock.process(FileBlock.java:135)
        at org.openstreetmap.osmosis.osmbinary.file.BlockInputStream.process(BlockInputStream.java:34)
        at crosby.binary.osmosis.OsmosisReader.run(OsmosisReader.java:45)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "users_display_name_idx"
  Detail: Key (display_name)=(Nodes&Roads) already exists.
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2103)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1836)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:512)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:334)
        at org.openstreetmap.osmosis.apidb.v0_6.impl.UserManager.insertUser(UserManager.java:140)
        ... 9 more

Sep 19, 2016 5:16:18 PM org.openstreetmap.osmosis.core.Osmosis main
SEVERE: Execution aborted.
org.openstreetmap.osmosis.core.OsmosisRuntimeException: One or more tasks failed.
        at org.openstreetmap.osmosis.core.pipeline.common.Pipeline.waitForCompletion(Pipeline.java:146)
        at org.openstreetmap.osmosis.core.Osmosis.run(Osmosis.java:92)
        at org.openstreetmap.osmosis.core.Osmosis.main(Osmosis.java:37)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launchStandard(Launcher.java:330)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:238)
        at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
        at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
        at org.codehaus.classworlds.Launcher.main(Launcher.java:47)

@mojodna

mojodna commented Sep 19, 2016

@hanchao this is the root cause: drolbr/Overpass-API#257 (even if it didn't come from Overpass).

Here's a shell script that will remap user IDs after the extract has been converted to XML: https://github.com/AmericanRedCross/posm-replay-tool/blob/c25d8e1f62af44e0664190723eba51eef7b93adc/remap-userid.sh

My notes in https://github.com/AmericanRedCross/posm-replay-tool/blob/28deac4193859b71af34d845f013194a37871870/LOCAL.md#initialization explain what's going on and steps to fix it (locally).

@hanchao
Contributor

hanchao commented Sep 20, 2016

@mojodna Thanks. This helped greatly.

User IDs 3642735 and 1708958 have the same name (Nodes&Roads) in the extract:

<node id="1546831475" version="2" timestamp="2016-08-31T17:30:44Z" uid="3642735" user="Nodes&amp;Roads" changeset="41831739" lat="34.6455964" lon="110.3129604"/>
<node id="1582851239" version="3" timestamp="2016-02-18T20:41:39Z" uid="1708958" user="Nodes&amp;Roads" changeset="37297109" lat="38.9261979" lon="113.8765729"/>

https://www.openstreetmap.org/api/0.6/node/1582851239

<node id="1582851239" visible="true" version="3" changeset="37297109" timestamp="2016-02-18T20:41:39Z" user="georhoko" uid="1708958" lat="38.9261979" lon="113.8765729"/>

The user name in china-latest.osm.pbf has not been updated:
http://download.geofabrik.de/asia/china-latest.osm.pbf
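
A small sketch of how this kind of conflict could be spotted before an import, assuming the extract has first been converted to OSM XML with one element per line. The method names are hypothetical, and the sample lines are abbreviated from the two nodes above.

```ruby
# The five predefined XML entities, decoded in a single pass.
XML_ENTITIES = { "&amp;" => "&", "&lt;" => "<", "&gt;" => ">", "&quot;" => '"', "&apos;" => "'" }.freeze

def unescape_xml(value)
  value.gsub(/&(?:amp|lt|gt|quot|apos);/) { |entity| XML_ENTITIES.fetch(entity) }
end

# Collect the uids seen for each display name and keep only the names
# that appear with more than one uid.
def conflicting_display_names(xml_lines)
  uids_by_name = Hash.new { |hash, name| hash[name] = [] }
  xml_lines.each do |line|
    uid = line[/\buid="(\d+)"/, 1]
    user = line[/\buser="([^"]*)"/, 1]
    next unless uid && user

    name = unescape_xml(user)
    uids_by_name[name] << uid unless uids_by_name[name].include?(uid)
  end
  uids_by_name.select { |_name, uids| uids.size > 1 }
end

sample = [
  '<node id="1546831475" version="2" uid="3642735" user="Nodes&amp;Roads" changeset="41831739"/>',
  '<node id="1582851239" version="3" uid="1708958" user="Nodes&amp;Roads" changeset="37297109"/>'
]
p conflicting_display_names(sample)
```

For a real file, the lines could be streamed with `File.foreach` rather than loaded into memory. Any name reported here would trip the `users_display_name_idx` constraint during `--write-apidb`.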

@tomhughes
Member

@maxdeepfield Your comment has nothing to do with this thread and in any case is not appropriate here - please ask on the dev or rails-dev mailing lists, or on #osm-dev on IRC if you need help using the code.

@maxdeepfield

Reading "about 90 lines explaining how to populate the database" is not a very hard task if those lines exist and work. Any progress on this?

@maxdeepfield

maxdeepfield commented Dec 2, 2019

Anyway, osmosis --write-apidb works fine with Geofabrik extracts. After the import I can see and edit the data, then get the result via osmosis --read-apidb-current, which together with --write-pbf gives a new dataset.

Also it works with xml/osm files exported directly from openstreetmap website via bounding box, so it is super easy to test on "real" data.

If the processes finish without errors, do I need to worry about anything else?
How would I actually notice not being "able to edit the data I have loaded"?

@gravitystorm gravitystorm added the dx Developer Experience label Dec 18, 2019
@prusswan

Managed to import my regional extract from Geofabrik after resolving various data issues. These can be resolved with some database knowledge and a background understanding of issues #1988, #2449 and #2543 (e.g. data extracts need to satisfy some integrity conditions).

Anyway, I feel the real issue now may be that users don't realize that "populating the database from an OSM extract" requires the data being imported to meet certain conditions for the step to "just work" (and possibly for the proposed rake task to run without getting tripped up). Would it help to have another section covering the kind of data preparation that may be required?

@victorovento

Well, I think this script will never be written.

saerdnaer added a commit to saerdnaer/openstreetmap-website that referenced this issue Sep 25, 2022