Stress test schema #1

eightysteele · 2012-10-31T21:21:21Z

Basically, for each unique name we'll store a MULTIPOINT of all unique points. We'll also store an array of OccurrenceID strings, one entry per point. For points with multiple IDs, the entry will be a comma-separated list of IDs. The maximum number of points to test is 2 million.

Here's how to create the table on CartoDB:

Create polygon table in CartoDB dashboard
SELECT AddGeometryColumn('points', 'the_geom_multipoint', 4326, 'MULTIPOINT', 2)
ALTER TABLE points ADD COLUMN occids text[]

Then we need to load in 2 million points like this:

INSERT INTO points (name, occids, the_geom_multipoint) values ('testname', '{"1","10,11,12,13"}', st_geomfromtext('MULTIPOINT ((0.896666666667 9.93166666667), (19.583334 47.166668))', 4326))
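A minimal Python sketch of how the two values in that INSERT could be assembled from raw occurrence records. The function name `pack_occurrences` and the `(occurrence_id, lon, lat)` record shape are assumptions for illustration, not part of the actual loader. It also assumes IDs contain no commas or double quotes.

```python
def pack_occurrences(records):
    """Group (occurrence_id, lon, lat) records by unique point.

    Returns (wkt, occids_literal): a MULTIPOINT WKT string and a
    PostgreSQL text[] literal with one entry per unique point.
    Hypothetical helper; the record shape is an assumption.
    """
    if not records:
        raise ValueError("need at least one record")

    # dict keeps insertion order, so points and ID entries stay aligned.
    by_point = {}
    for occ_id, lon, lat in records:
        by_point.setdefault((lon, lat), []).append(str(occ_id))

    # One "(lon lat)" member per unique point.
    wkt = "MULTIPOINT (" + ", ".join(
        f"({lon} {lat})" for lon, lat in by_point
    ) + ")"

    # One quoted entry per point; multiple IDs are joined with commas.
    occids = "{" + ",".join(
        '"' + ",".join(ids) + '"' for ids in by_point.values()
    ) + "}"
    return wkt, occids


records = [
    ("1", 0.896666666667, 9.93166666667),
    ("10", 19.583334, 47.166668),
    ("11", 19.583334, 47.166668),
]
wkt, occids = pack_occurrences(records)
print(wkt)
print(occids)
```

In a real load, the two strings would be better passed as query parameters than pasted into the SQL text.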

And finally test the performance of this query:

SELECT (ST_DumpPoints(ST_Transform(t.the_geom_multipoint,3857))).geom as the_geom_webmercator, unnest(t.occids) from points as t WHERE t.name = 'testname'

If the performance isn't great, Vizz thinks we might consider unpacking points to a new table once they are uploaded.
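For illustration, here is a rough Python sketch of what unpacking one packed row could look like. It assumes the i-th point corresponds to the i-th `occids` entry, as described above. The function name `unpack_row` and the output tuple layout are hypothetical, since the thread doesn't say what the unpacked table would contain.

```python
def unpack_row(name, points, occids):
    """Expand one packed row into one tuple per occurrence ID.

    points: list of (lon, lat) pairs from the MULTIPOINT.
    occids: list of strings, each a comma-separated list of IDs.
    Returns (name, occurrence_id, lon, lat) tuples (assumed layout).
    """
    if len(points) != len(occids):
        raise ValueError("points and occids must have the same length")

    rows = []
    for (lon, lat), csv_ids in zip(points, occids):
        # A point shared by several occurrences yields several rows.
        for occ_id in csv_ids.split(","):
            rows.append((name, occ_id, lon, lat))
    return rows


for row in unpack_row("testname", [(0.5, 9.5), (19.5, 47.0)], ["1", "10,11"]):
    print(row)
```

An unpacked table like this would have one row per occurrence, which may be easier to index than arrays inside a single row.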

walterj · 2012-10-31T21:23:25Z

Quick question. We will very soon need to be able to filter (and exclude from visualization, or show in a different color) points by month and year. Is this straightforward in this schema?

eightysteele · 2012-10-31T21:33:53Z

Right now, for our immediate goal, we're looking to quickly support a specific use case: Given a scientific name, quickly map all points. So as it is above, we're not factoring in date information. But I think we can do that another way.

Andrew at Vizzuality created a pretty amazing visualization of the VertNet data in CartoDB. Basically it's an animation that shows specimens collected over time. Click the link below and watch. The dates are in the lower right corner:

http://cartodb.github.com/torque/examples/vertnet.html

walterj · 2012-10-31T21:36:09Z

Aaron, thanks. Do say which other way. I don’t think animations are going to cut it. I mean an interactive ability to filter on key fields. We need that essentially as soon as we put the new points on the app.

eightysteele · 2012-10-31T21:49:25Z

Totally. So in addition to month and year, what are the other key fields to filter on?

walterj · 2012-10-31T22:05:21Z

Accuracy, Institution.

Rob?

Is this all useful, or am I complicating things too much?

robgur · 2012-10-31T22:27:34Z

Hi guys --- I can see a bunch of use cases we need to support re:
filtering and I think it is essential to have this conversation before we
get too far down the road. Walter hit the very most important ones right
up front: 1) Month in order to get seasonality; 2) Year in order to look
at temporal trends. This gives us sciname/date/location which are
crucial for us in terms of mass filtering of points. The other use cases
strike me as less critical and we could provide a way for users to generate
temporary tables from dynamic queries if needed for other kinds of point
data returns. I see the point of filtering by geospatial "uncertainty" as
also potentially important. One nice thing: these are all stored in
formats that are pretty easy to index/small.

I also want us to be able to flag records --- that is to write into
these tables for records where there is some assessment of a problem (zoo
record, etc). So we really need to plan for that use case too for points.
I think this covers it though. Aaron, thanks for keeping us in the loop
and this really exposes how that loop-keeping is super useful to nip issues
before we get too far downstream.

Best, Rob

eightysteele · 2012-10-31T22:48:43Z

Yup, yup, good feedback guys. Let me fold this in and marinate. More thoughts soon.
