Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Requesting the Multi part to Single part support. #35

Closed
dharmeshpurohit opened this issue Sep 24, 2013 · 19 comments
Closed

Requesting the Multi part to Single part support. #35

dharmeshpurohit opened this issue Sep 24, 2013 · 19 comments

Comments

@dharmeshpurohit
Copy link

I am working on dissolve functionality and I used ST_Aggr_Union to achieve the dissolve result. I am getting the result as multi-part result.

I am looking for some function which may provide single-part result or may convert into single-part polygon from multi-part polygon.

Please let me know if further information is needed.

@randallwhitman
Copy link
Contributor

ST_Geometry for Hive in Spatial Framework is designed based on the OGC specification (GIS Simple feature access SQL option); this issue may effectively request extensions to ST_Geometry.

Workaround 1 - ST_NumGeometries and ST_GeometryN: Store the intermediate results. For each multi-part geometry in the intermediate results: for 1 to ST_NumGeometries (perhaps with a generated script), call ST_GeometryN(index).

Workaround 2 - ArcMap: for those who have ArcMap, store intermediate results in JSON, import the JSON to features in ArcMap (with Geoprocessing Tools for Hadoop), then perform further geoprocessing in ArcMap. Sample: Vehicle Trip Discovery with GIS Tools for Hadoop - Import into ArcMap and Visualize

Workaround 3: custom MapReduce job: instead of using Hive, use a custom MapReduce job, and call Geometry-API-java directly.

As an alternative to approaches specific to Spatial Framework for Hadoop, there may be an approach which is generic to hive - such as something like generate_series (as in PostgreSQL - for use with ST_GeometryN and ST_NumGeometries).

As stated here on Github, pull requests are welcome, if someone would like to implement an enhancement to Spatial Framework for Hadoop.

@randallwhitman
Copy link
Contributor

More on: Workaround 1 - ST_NumGeometries and ST_GeometryN:
select ST_AsText(ST_GeometryN(ST_GeomFromText('MultiPoint((10 1),(2 20),(3 3))'),rn)) from (select explode(array(1,2,3)) as rn from onerow) sub;
POINT (10 1)
POINT (2 20)
POINT (3 3)

@dharmeshpurohit
Copy link
Author

Question regarding work-around 1 :

  1. In above query array is being used that can be feasible ones Issue Support Array of Geometry and Geometry as Struct #36 gets closed Is that correct?
  2. In the result some of the Geometry displayed as NULL which is not NULL but thats issue because of ST_AsText generating null but its valid geometry. In that case will above query help?

@randallwhitman
Copy link
Contributor

The Array (or a general generate_series() function for hive) would make it more convenient (workaround 1), possibly a single SQL statement. However, I think it is possible with some resourcefulness to use some scripting and ST_NumGeometries and ST_GeometryN from current ST_Geometry for Hive, to achieve the desired result. Something like:

  1. ST_NumGeometries query to place counts of polygons per multi-polygon into an intermediate table
    (filtering for only those records which are MultiPolygon in the result of the ST_Aggr_Union query);
  2. Script (e.g. perl/python/etc) to generate multiple SQL statements (in one or more SQL scripts) like
    select ST_AsText(ST_GeometryN(shape,rn)) from (select explode(array(1, 2, ..., nthCountOfPolys)) as rn from onerow) sub join testdata;
  3. Run the script[s] that were generated in step (2), to output the single-part polygon results.

@randallwhitman
Copy link
Contributor

Lateral view can be used to select multiple columns when using a UDTF such as explode:

select rn, ST_AsText(ST_GeometryN(shape,rn)) from test1 lateral view explode(array(1,2,3)) sub as rn;

@dharmeshpurohit
Copy link
Author

Is this something I can do using query or I have have implement respective to get one by one geometry from the MULTIPOLYGON geometry?

@randallwhitman
Copy link
Contributor

The sample query posted this morning 2 comments above, if table test1 contains one row with a MultiPolygon of 3 parts in its "shape" column, will output 3 rows of index number and single-part polygon, like:
1 | POLYGON((1 1, ... 1 1))
2 | POLYGON((2 2, ... 2 2))
3 | POLYGON((3 3, ... 3 3))

@randallwhitman
Copy link
Contributor

The report of ST_AsText returning NULL, has been separated out to issue #38.

@dharmeshpurohit
Copy link
Author

Above example is useful, I am able to resolve half portion of my issue.
for example:

select test_dissolve.id, rn, ST_AsText(ST_GeometryN(test_dissolve.shape)) from test_dissolve lateral view explode(array(1,2,3)) sub as rn;

Issue in this query is array parameters are static which should be dynamic.
In order to resolve this issue I should be able to do some like array (startvalue,stopvalue) so that I can say array(1,ST_NumGeometries(test_dissolve.shape)).

In order to make it happen I think I need to write UDTF which have behave like for loop.

@randallwhitman
Copy link
Contributor

Yes, exactly.

Options include:

  • Write a UDTF that yields each polygon of a multi-polygon, in a separate row (like PostGIS ST_Dump);
  • Write a UDF that returns the polygons as an array, and wrap its call with explode();
  • Request or contribute a general Hive enhancement, at hive.apache.org, for the equivalent of PostgreSQL generate_series(), for use such as generate_series(1, ST_NumGeometries(...));
  • Script (in Perl/Python/etc.) to generate and run multiple HQL queries;
  • Maybe there is a way to join on a utility table(int, array) containing (1, array(1)), (2, array(1,2)), (3, array(1,2,3)), (4, array(1,2,3,4)), ... ;
  • Custom Map-Reduce application (in Java, calling geometry-api-java).

@dharmeshpurohit
Copy link
Author

@randallwhitman : I would like to thank you for the nice idea of creating UDTF for generate_series

I have implemented generate_series UDTF which can accepts 3 arguments star,stop and increment.
and I am able to convert multi-part geometries into single-part geometries.

here is the example query:

SELECT test_dissolve.key, rn, ST_AsText(ST_GeometryN(test_dissolve.shape,rn)) FROM test_dissolve lateral view generate_series(1,ST_NumGeometries(test_dissolve.shape),1) sub as rn where ST_NumGeometries(test_dissolve.shape) is not null;

@randallwhitman
Copy link
Contributor

Excellent Dharmesh, great to hear that generate_series is implemented. If you would like to contribute the enhancement to Hive at hive.apache.org, it will be useful to more people and in more situations. (Or if you prefer I guess you could just post it in your own Github repository. Either way, if you choose to, it would be great if you would like to post a link from here in this issue.)

@dharmeshpurohit
Copy link
Author

Yes, I would like to contribute the enchancement to Hive at
http://hive.apache.org

Could you please give me how can to that?

Regards,

Dharmesh

On Tue, Oct 8, 2013 at 11:49 AM, Randall Whitman
notifications@github.comwrote:

Excellent Dharmesh, great to hear that generate_series is implemented. If
you would like to contribute the enhancement to Hive at hive.apache.orghttp://hive.apache.org/issue_tracking.html,
it will be useful to more people and in more situations. (Or if you prefer
I guess you could just post it in your own Github repository. Either way,
if you choose to, it would be great if you would like to post a link from
here in this issue.)


Reply to this email directly or view it on GitHubhttps://github.com//issues/35#issuecomment-25907195
.

Regards,

Dharmesh Purohit

Cell : +1-563-370-1367

@randallwhitman
Copy link
Contributor

Sounds great. Take a look at http://hive.apache.org/issue_tracking.html - that should help get you started - given that you've already implemented generate_series, you could file an enhancement and then immediately attach the source code as a patch.

@dharmeshpurohit
Copy link
Author

Great , I would do so.

Thank you.

On Tue, Oct 8, 2013 at 1:20 PM, Randall Whitman notifications@github.comwrote:

Sounds great. Take a look at http://hive.apache.org/issue_tracking.html -
that should help get you started - given that you've already implemented
generate_series, you could file an enhancement and then immediately
attach the source code as a patch.


Reply to this email directly or view it on GitHubhttps://github.com//issues/35#issuecomment-25914620
.

Regards,

Dharmesh Purohit

Cell : +1-563-370-1367

@randallwhitman
Copy link
Contributor

@dharmeshpurohit - curious if you have had any success in offering generate_series as an enhancement to Apache Hive.

@dharmeshpurohit
Copy link
Author

I am planning to add it may be this weekend I will get chance to work on unit test for the generate_series.

@randallwhitman
Copy link
Contributor

Sounds good :)

@randallwhitman
Copy link
Contributor

Closing as it sounds like the request is resolved.

@dharmeshpurohit: please go ahead and comment any updates on contributing the generate-series enhancement to Apache Hive, even though this issue is closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants