Stress test schema #1

Open
eightysteele opened this issue Oct 31, 2012 · 7 comments

Comments

@eightysteele

Basically, for each unique name we'll store a MULTIPOINT of all unique points. We'll also store an array of OccurrenceID strings with one element per point, in the same order as the points. For points with multiple IDs, the element will be a comma-separated list of IDs. The maximum number of points to test is 2 million.

Here's how to create the table on CartoDB:

  1. Create polygon table in CartoDB dashboard
  2. SELECT AddGeometryColumn('points', 'the_geom_multipoint', 4326, 'MULTIPOINT', 2)
  3. ALTER TABLE points ADD COLUMN occids text[]

Then we need to load in 2 million points like this:

INSERT INTO points (name, occids, the_geom_multipoint) values ('testname', '{"1","10,11,12,13"}', st_geomfromtext('MULTIPOINT ((0.896666666667 9.93166666667), (19.583334 47.166668))', 4326))
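To generate those values for the 2 million point load, the packing could be sketched roughly like this. It's a minimal sketch, not the actual loader: the function name `pack_points` and the `(lon, lat, occurrence_id)` record shape are assumptions made up for illustration, and it assumes IDs contain no commas or double quotes.

```python
def pack_points(records):
    """Group occurrence IDs by unique point.

    records: iterable of (lon, lat, occurrence_id) tuples. Coordinates are
    passed as strings here so they are written out exactly as given.
    Returns (wkt, occids): the MULTIPOINT WKT and the text[] array literal,
    with one array element per unique point, in the same order.
    """
    ids_by_point = {}  # dicts keep insertion order in Python 3.7+
    for lon, lat, occ_id in records:
        ids_by_point.setdefault((lon, lat), []).append(str(occ_id))

    points = ", ".join(f"({lon} {lat})" for lon, lat in ids_by_point)
    wkt = f"MULTIPOINT ({points})"

    # One quoted element per point; multiple IDs are joined with commas.
    elements = ",".join('"' + ",".join(ids) + '"' for ids in ids_by_point.values())
    occids = "{" + elements + "}"
    return wkt, occids


# Hypothetical input that reproduces the example row above.
example_records = [
    ("0.896666666667", "9.93166666667", "1"),
    ("19.583334", "47.166668", "10"),
    ("19.583334", "47.166668", "11"),
    ("19.583334", "47.166668", "12"),
    ("19.583334", "47.166668", "13"),
]

wkt, occids = pack_points(example_records)
print(wkt)
print(occids)
```

The two strings would then be passed as the occids and st_geomfromtext arguments of the INSERT, ideally as query parameters rather than by string concatenation.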

And finally test the performance of this query:

SELECT (ST_DumpPoints(ST_Transform(t.the_geom_multipoint,3857))).geom as the_geom_webmercator, unnest(t.occids) from points as t WHERE t.name = 'testname'

If the performance isn't great, Vizz thinks we might consider unpacking points to a new table once they are uploaded.
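For reference, here is one possible reading of "unpacking", sketched in Python rather than SQL. The function name `unpack_points` and the flat `(name, lon, lat, occid)` row shape are assumptions for illustration only; the actual layout of the new table hasn't been decided.

```python
def unpack_points(name, coords, occids):
    """Expand one packed row into one flat row per occurrence ID.

    coords: list of (lon, lat) pairs taken from the MULTIPOINT.
    occids: list of comma-separated ID strings, one per point, same order.
    Returns a list of (name, lon, lat, occid) tuples.
    """
    if len(coords) != len(occids):
        raise ValueError("expected exactly one occids element per point")

    rows = []
    for (lon, lat), csv_ids in zip(coords, occids):
        # A point with several IDs yields several rows at the same location.
        for occ_id in csv_ids.split(","):
            rows.append((name, lon, lat, occ_id))
    return rows


# Hypothetical usage with the example row from above.
rows = unpack_points(
    "testname",
    [(0.896666666667, 9.93166666667), (19.583334, 47.166668)],
    ["1", "10,11,12,13"],
)
for row in rows:
    print(row)
```

The length check matters because the packed schema only works if the points and the ID array stay aligned element by element.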

@walterj
Member

walterj commented Oct 31, 2012

Quick question. We will very soon need to be able to filter points by month and year (and exclude them from the visualization, or show them in a different color). Is this straightforward in this scheme?

@eightysteele
Author

Right now, for our immediate goal, we're looking to quickly support a specific use case: Given a scientific name, quickly map all points. So as it is above, we're not factoring in date information. But I think we can do that another way.

Andrew at Vizzuality created a pretty amazing visualization of the VertNet data in CartoDB. Basically it's an animation that shows specimens collected over time. Click the link below and watch. The dates are in the lower right corner:

http://cartodb.github.com/torque/examples/vertnet.html

@walterj
Member

walterj commented Oct 31, 2012

Aaron, thanks. Do say which other way. I don’t think animations are going to cut it. I mean the interactive ability to filter on key fields. We need that essentially as soon as we put the new points on the app.

@eightysteele
Author

Totally. So in addition to month and year, what are the other key fields to filter on?

@walterj
Member

walterj commented Oct 31, 2012

Accuracy, Institution.

Rob?

Is this all useful, or am I complicating things too much?

@robgur
Member

robgur commented Oct 31, 2012

Hi guys, I can see a bunch of use cases we need to support re: filtering, and I think it is essential to have this conversation before we get too far down the road. Walter hit the most important ones right up front: 1) Month, in order to get seasonality; 2) Year, in order to look at temporal trends. This gives us sciname/date/location, which are crucial for us in terms of mass filtering of points. The other use cases strike me as less critical, and we could provide a way for users to generate temporary tables from dynamic queries if needed for other kinds of point data returns. I see filtering by geospatial "uncertainty" as also potentially important. One nice thing: these are all stored in formats that are small and pretty easy to index.

I also want us to be able to flag records, that is, to write into these tables for records where there is some assessment of a problem (zoo record, etc). So we really need to plan for that use case too for points. I think this covers it though. Aaron, thanks for keeping us in the loop. This really shows how useful that loop-keeping is for nipping issues in the bud before we get too far downstream.

Best, Rob


@eightysteele
Author

Yup, yup, good feedback guys. Let me fold this in and marinate. More thoughts soon.
