
'Unexpected error: [Errno 11004] getaddrinfo failed' while migrating a GDB feature class to HDFS using ArcGIS tools #40

Open
mahendersg opened this issue Mar 16, 2016 · 4 comments


@mahendersg

We are facing an issue while migrating a GDB feature class to Hadoop HDFS using the GIS Tools for Hadoop geoprocessing tools.
The system environment is as follows:

ArcGIS client: 10.3.1/10.2.2
Hadoop version: 2.4.1
Python version: 2.7.5
ArcSDE: 10.2.2
RDBMS: Oracle 11.2.0.4
Cluster: 1 master node, 1 secondary node, 8 data nodes

The following steps were taken to install and configure GIS Tools for Hadoop:

a) Added the geoprocessing tools for Hadoop, downloaded from https://github.com/Esri/gis-tools-for-hadoop, to the cluster.

b) Enabled WebHDFS by editing /opt/hadoop/etc/hadoop/hdfs-site.xml (see the configuration sketch after this list).

c) Added the jars spatial-sdk-hadoop.jar and esri-geometry-api.jar to /opt/hadoop-2.4.1/share/hadoop/tools/lib on the Hadoop master node.

d) Browsed to the Hadoop geoprocessing toolbox (Python script tools for Hadoop) in ArcCatalog 10.3.1.

e) The step above enables the Hadoop tools in ArcGIS; we then converted the feature class to a JSON file using the 'Features To JSON' tool in the Hadoop toolbox.

f) Used the 'Copy To HDFS' script tool in the Hadoop toolbox to copy the JSON file to HDFS.

g) Got the error message 'Unexpected error: [Errno 11004] getaddrinfo failed'.
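For reference, enabling WebHDFS in step (b) amounts to adding a property like the following to hdfs-site.xml (dfs.webhdfs.enabled is the standard Hadoop 2.x property name; the rest of the file is unchanged):

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>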

Error message after running the tool:

Start Time: Wed Mar 09 18:43:44 2016
Running script CopyToHDFS...
Unexpected error : [Errno 11004] getaddrinfo failed
Traceback (most recent call last):
File "", line 184, in execute
File "D:\GIS tools for hadoop\geoprocessing-tools-for-hadoop-master\geoprocessing-tools-for-hadoop-master\webhdfs\webhdfs.py", line 91, in copyToHDFS
fileUploadClient.request('PUT', redirect_path, open(source_path, "rb"), headers={})
File "C:\Python27\ArcGIS10.2\Lib\httplib.py", line 973, in request
self._send_request(method, url, body, headers)
File "C:\Python27\ArcGIS10.2\Lib\httplib.py", line 1007, in _send_request
self.endheaders(body)
File "C:\Python27\ArcGIS10.2\Lib\httplib.py", line 969, in endheaders
self._send_output(message_body)
File "C:\Python27\ArcGIS10.2\Lib\httplib.py", line 829, in send_output
self.send(msg)
File "C:\Python27\ArcGIS10.2\Lib\httplib.py", line 791, in send
self.connect()
File "C:\Python27\ArcGIS10.2\Lib\httplib.py", line 772, in connect
self.timeout, self.source_address)
File "C:\Python27\ArcGIS10.2\Lib\socket.py", line 553, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno 11004] getaddrinfo failed

We followed the guidelines and steps in the following references:

https://esri.github.io/gis-tools-for-hadoop/
https://github.com/Esri/gis-tools-for-hadoop/wiki

Please advise on a resolution.

@randallwhitman
Contributor

Cross-reference:
#22
Esri/geoprocessing-tools-for-hadoop#14

@climbage
Member

This error is happening during the redirect from the namenode to the datanode that is actually storing the data. You can tell because it has redirect_path in the stack trace.

fileUploadClient.request('PUT', redirect_path, open(source_path, "rb"), headers={})

First, verify that the datanodes are accessible from the client machine running ArcGIS. If they aren't, you will need to make them reachable from the client.

Second, verify that the namenode is not using network addresses that are internal to the cluster. If you browse to http://[namenode hostname]:50070/dfsnodelist.jsp?whatNodes=LIVE, you should see the Transferring Address that the namenode uses in datanode redirects. Make sure that the client is able to connect to those datanodes using the transferring addresses.
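For example, a quick way to reproduce the failing lookup outside ArcGIS is a short Python 2 script like the following (the hostname and port are placeholders; substitute a transferring address from the live-node list):

# Test whether this client can resolve and reach a datanode transfer address.
import socket

host, port = 'datanode1.example.internal', 50075  # placeholder address/port

try:
    # Same call that fails inside httplib's connect in the traceback above
    for res in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM):
        print 'resolved:', res[4]
    s = socket.create_connection((host, port), timeout=5)
    print 'connected to %s:%s' % (host, port)
    s.close()
except socket.gaierror as e:
    print 'name lookup failed (this is Errno 11004):', e
except socket.error as e:
    print 'resolved, but could not connect:', e

If the lookup fails here too, the fix is on the client side (DNS or hosts-file entries for the datanode hostnames), not in the tools.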

Let us know what you figure out.

@mahendersg
Author

We are still getting [Errno 11004] getaddrinfo failed with the GIS tools, so we proceeded with an alternate method to move Building.json (~6.5 GB, exported from a Building feature class with ~48 million records) to HDFS.
The following steps were used to load the JSON into Hadoop HDFS using Hive, following https://github.com/Esri/gis-tools-for-hadoop/wiki/Aggregating-CSV-Data-%28Spatial-Binning%29.
After migrating the Building JSON into the Building table, Hive aggregation queries fail with an error.
Detailed steps below:

Add Jar

add jar /volumes/disk1/tc/gis-tools-for-hadoop-master/gis-tools-for-hadoop-master/samples/lib/esri-geometry-api.jar;
add jar /volumes/disk1/tc/gis-tools-for-hadoop-master/gis-tools-for-hadoop-master/samples/lib/spatial-sdk-hadoop.jar;
create temporary function ST_Point as 'com.esri.hadoop.hive.ST_Point';
create temporary function ST_Contains as 'com.esri.hadoop.hive.ST_Contains';
create temporary function ST_AsText as 'com.esri.hadoop.hive.ST_AsText';
create temporary function ST_Intersection as 'com.esri.hadoop.hive.ST_Intersection';

Create Table

create external table Building(OBJECTID INT,
RILUNIQUEID string,
RILFEATURECODE string,
BLDGNO string,
BLDGNAME string,
BLDGTYPE string,
BLDGSUBTYPE string,
BLDGCLASS string,
BLDGROAD string,
BLDGSUBROAD string,
SUBLOCALITY string,
CITYNAME string,
STATENAME string,
BLDGSIZE string,
TAG string,
PINCODE INT,
NUMBEROFFLATS INT,
NUMBEROFSHOPS INT,
BLDG_TYPE string,
CABLEOPERATORNAME string,
AREA_1 INT,
LBU2 string,
SOCIETYCOMPLEXNAME string,
BLDGCONDITION string,
BLDGCONSTRUCTION string,
AFFLUENCEINDICATOR string,
ROOFTOPANTENNA string,
REMARKS string,
VINTAGE INT,
BOI string,
NETWORKREF string,
NOOFCOMMERCIAL INT,
BUILDING_RJID string,
UPDATESOURCE string,
PLOTSURVEYNO string,
TPY_ID string,
LOCALITYNAME string,
SUBSUBLOCALITY string,
CITYCODE string,
LOCALITYCODE string,
LOCALITY_RJID string,
DATASOURCE string,
CREATED_USER string,
CREATED_DATE string,
LAST_EDITED_USER string,
LAST_EDITED_DATE string,
LTERFS string,
FTTXRFS string,
BLCMSTATUS string,
TALUKCODE string,
TALUKNAME string,
DISTRICTCODE string,
DISTRICTNAME string,
BOICATEGORY string,
LTE_COVERAGE string,
NEIGHBOURHOODCODE string,
JIOCENTERNAME string,
NUMBEROFFLOORS INT,
VILLAGENAME string,
VILLAGE_RJID string,
JIOCENTERCODE string,
BLDG_CATEGORY string,
GLOBALID_1 string,
JIOCENTER_RJID string,
JIOCENTER_SAP_ID string,
INCOME_LEVEL string,boundaryshape binary)
ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
STORED AS INPUTFORMAT 'com.esri.json.hadoop.EnclosedJsonInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

Load data

hadoop fs -put /volumes/disk1/tc/Building.json /volumes;

hadoop fs -ls /volumes;

LOAD DATA INPATH '/volumes/Building.json' OVERWRITE INTO TABLE Building;
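Before running the aggregation, a small sanity read through the SerDe can confirm that rows deserialize (a hypothetical check; it assumes, as in the Esri samples, that the registered ST_* functions accept the SerDe's binary geometry column):

-- Read a few rows through the JsonSerde and print geometry as WKT
select objectid, ST_AsText(boundaryshape) from building limit 5;

Note this can still succeed even if later rows fail, since only a few records are read.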

No errors were observed during the load data process.

hive> describe extended building;
OK
objectid int from deserializer
riluniqueid string from deserializer
rilfeaturecode string from deserializer
bldgno string from deserializer
bldgname string from deserializer
bldgtype string from deserializer
bldgsubtype string from deserializer
bldgclass string from deserializer
bldgroad string from deserializer
bldgsubroad string from deserializer
sublocality string from deserializer
cityname string from deserializer
statename string from deserializer
bldgsize string from deserializer
tag string from deserializer
pincode int from deserializer
numberofflats int from deserializer
numberofshops int from deserializer
bldg_type string from deserializer
cableoperatorname string from deserializer
area_1 int from deserializer
lbu2 string from deserializer
societycomplexname string from deserializer
bldgcondition string from deserializer
bldgconstruction string from deserializer
affluenceindicator string from deserializer
rooftopantenna string from deserializer
remarks string from deserializer
vintage int from deserializer
boi string from deserializer
networkref string from deserializer
noofcommercial int from deserializer
building_rjid string from deserializer
updatesource string from deserializer
plotsurveyno string from deserializer
tpy_id string from deserializer
localityname string from deserializer
subsublocality string from deserializer
citycode string from deserializer
localitycode string from deserializer
locality_rjid string from deserializer
datasource string from deserializer
created_user string from deserializer
created_date string from deserializer
last_edited_user string from deserializer
last_edited_date string from deserializer
lterfs string from deserializer
fttxrfs string from deserializer
blcmstatus string from deserializer
talukcode string from deserializer
talukname string from deserializer
districtcode string from deserializer
districtname string from deserializer
boicategory string from deserializer
lte_coverage string from deserializer
neighbourhoodcode string from deserializer
jiocentername string from deserializer
numberoffloors int from deserializer
villagename string from deserializer
village_rjid string from deserializer
jiocentercode string from deserializer
bldg_category string from deserializer
globalid_1 string from deserializer
jiocenter_rjid string from deserializer
jiocenter_sap_id string from deserializer
income_level string from deserializer
boundaryshape binary from deserializer

Detailed Table Information Table(tableName:building, dbName:landbase, owner:hadoop, createTime:1459342351, lastAccessTime:0, retention:0,
sd:StorageDescriptor(cols:[FieldSchema(name:objectid, type:int, comment:null), FieldSchema(name:riluniqueid, type:string, comment:null), FieldSchema(name:rilfeaturecode, type:string, comment:null), FieldSchema(name:bldgno, type:string, comment:null), FieldSchema(name:bldgname, type:string, comment:null), FieldSchema(name:bldgtype, type:string, comment:null), FieldSchema(name:bldgsubtype, type:string, comment:null), FieldSchema(name:bldgclass, type:string, comment:null), FieldSchema(name:bldgroad, type:string, comment:null), FieldSchema(name:bldgsubroad, type:string, comment:null), FieldSchema(name:sublocality, type:string, comment:null), FieldSchema(name:cityname, type:string, comment:null), FieldSchema(name:statename, type:string, comment:null), FieldSchema(name:bldgsize, type:string, comment:null), FieldSchema(name:tag, type:string, comment:null), FieldSchema(name:pincode, type:int, comment:null), FieldSchema(name:numberofflats, type:int, comment:null), FieldSchema(name:numberofshops, type:int, comment:null), FieldSchema(name:bldg_type, type:string, comment:null), FieldSchema(name:cableoperatorname, type:string, comment:null), FieldSchema(name:area_1, type:int, comment:null), FieldSchema(name:lbu2, type:string, comment:null), FieldSchema(name:societycomplexname, type:string, comment:null), FieldSchema(name:bldgcondition, type:string, comment:null), FieldSchema(name:bldgconstruction, type:string, comment:null), FieldSchema(name:affluenceindicator, type:string, comment:null), FieldSchema(name:rooftopantenna, type:string, comment:null), FieldSchema(name:remarks, type:string, comment:null), FieldSchema(name:vintage, type:int, comment:null), FieldSchema(name:boi, type:string, comment:null), FieldSchema(name:networkref, type:string, comment:null), FieldSchema(name:noofcommercial, type:int, comment:null), FieldSchema(name:building_rjid, type:string, comment:null), FieldSchema(name:updatesource, type:string, comment:null), FieldSchema(name:plotsurveyno, type:string, comment:null), FieldSchema(name:tpy_id, type:string, comment:null), FieldSchema(name:localityname, type:string, comment:null), FieldSchema(name:subsublocality, type:string, comment:null), FieldSchema(name:citycode, type:string, comment:null), FieldSchema(name:localitycode, type:string, comment:null), FieldSchema(name:locality_rjid, type:string, comment:null), FieldSchema(name:datasource, type:string, comment:null), FieldSchema(name:created_user, type:string, comment:null), FieldSchema(name:created_date, type:string, comment:null), FieldSchema(name:last_edited_user, type:string, comment:null), FieldSchema(name:last_edited_date, type:string, comment:null), FieldSchema(name:lterfs, type:string, comment:null), FieldSchema(name:fttxrfs, type:string, comment:null), FieldSchema(name:blcmstatus, type:string, comment:null), FieldSchema(name:talukcode, type:string, comment:null), FieldSchema(name:talukname, type:string, comment:null), FieldSchema(name:districtcode, type:string, comment:null), FieldSchema(name:districtname, type:string, comment:null), FieldSchema(name:boicategory, type:string, comment:null), FieldSchema(name:lte_coverage, type:string, comment:null), FieldSchema(name:neighbourhoodcode, type:string, comment:null), FieldSchema(name:jiocentername, type:string, comment:null), FieldSchema(name:numberoffloors, type:int, comment:null), FieldSchema(name:villagename, type:string, comment:null), FieldSchema(name:village_rjid, type:string, comment:null), FieldSchema(name:jiocentercode, type:string, comment:null), FieldSchema(name:bldg_category, type:string, comment:null), FieldSchema(name:globalid_1, type:string, comment:null), FieldSchema(name:jiocenter_rjid, type:string, comment:null), FieldSchema(name:jiocenter_sap_id, type:string, comment:null), FieldSchema(name:income_level, type:string, comment:null), FieldSchema(name:boundaryshape, type:binary, comment:null)],
location:hdfs://jiogis-cluster-jiogis-master-001:9000/user/hive/warehouse/landbase.db/building,
inputFormat:com.esri.json.hadoop.EnclosedJsonInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false,
numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:com.esri.hadoop.hive.serde.JsonSerde, parameters:{serialization.format=1}),
bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}),
storedAsSubDirectories:false), partitionKeys:[], parameters:{numFiles=1, EXTERNAL=TRUE, transient_lastDdlTime=1459342519, COLUMN_STATS_ACCURATE=true,
totalSize=6665990138, numRows=0, rawDataSize=0}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE)
Time taken: 0.208 seconds, Fetched: 69 row(s)

hive> select count(OBJECTID) from building;
Query ID = hadoop_20160411131717_73f71c12-353a-4119-8ab3-913d978a2dc1
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapreduce.job.reduces=
Starting Job = job_1460354375516_0007, Tracking URL = http://jiogis-cluster-jiogis-master-001:8088/proxy/application_1460354375516_0007/
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1460354375516_0007
Hadoop job information for Stage-1: number of mappers: 25; number of reducers: 1
2016-04-11 13:17:57,568 Stage-1 map = 0%, reduce = 0%
2016-04-11 13:18:51,298 Stage-1 map = 88%, reduce = 100%, Cumulative CPU 33.68 sec
2016-04-11 13:18:52,323 Stage-1 map = 100%, reduce = 100%
MapReduce Total cumulative CPU time: 33 seconds 680 msec
Ended Job = job_1460354375516_0007 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1460354375516_0007_m_000000 (and more) from job job_1460354375516_0007
Examining task ID: task_1460354375516_0007_m_000003 (and more) from job job_1460354375516_0007
Examining task ID: task_1460354375516_0007_m_000001 (and more) from job job_1460354375516_0007
Examining task ID: task_1460354375516_0007_m_000008 (and more) from job job_1460354375516_0007
Examining task ID: task_1460354375516_0007_m_000015 (and more) from job job_1460354375516_0007

Task with the most failures(4):

Task ID:
task_1460354375516_0007_m_000000

URL:

http://jiogis-cluster-jiogis-master-001:8088/taskdetails.jsp?jobid=job_1460354375516_0007&tipid=task_1460354375516_0007_m_000000

Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"attributes":{"OBJECTID":40712,"SUBLOCALITY":"Shakti Nagar 2","CITYNAME":"Bhuj","STATENAME":"Gujarat","TAG":null,"PINCODE":370427,"LBU2":"NEW","VINTAGE":2011,"BOI":null,"BUILDING_RJID":"BHUJBD0031982","LOCALITYNAME":"Sanskar Nagar","SUBSUBLOCALITY":null,"CITYCODE":"BHUJ","LOCALITYCODE":"SNKR","LOCALITY_RJID":"LOY71336","DATASOURCE":null,"FTTXRFS":null,"BLCMSTATUS":null,"TALUKCODE":"BHUJ","TALUKNAME":"Bhuj","DISTRICTCODE":"BHUJ","DISTRICTNAME":"Kachchh","BOICATEGORY":null,"NEIGHBOURHOODCODE":null,"JIOCENTERNAME":"Bhuj","VILLAGENAME":"Mirjhapar (CT)","VILLAGE_RJID":"VIE78276","JIOCENTERCODE":"JC01","GLOBALID_1":"{87ACB15B-BB59-42FB-8737-5111B9A239B6}","JIOCENTER_RJID":"GJ-BHUJ-JC01-0275","JIOCENTER_SAP_ID":"I-GJ-BHUJ-JCO-0001","SHAPE_Length":35.082851836058126,"SHAPE_Area":66.70308817988206},"geometry":{"curveRings":[[[-1293826.0616008043,2638881.98328707],[-1293835.0307057127,2638881.8490332216],[-1293835.104782246,2638888.9112596065],[-1293824.5208404362,2638889.0695212036],[-1293824.4993598238,2638887.027283214],[-1293825.616667755,2638887.010383025],{"c":[[-1293826.1089845225,2638886.5079577304],[-1293825.966182138,2638886.8604469104]]},[-1293826.0616008043,2638881.98328707]]]}}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"attributes":{"OBJECTID":40712,"SUBLOCALITY":"Shakti Nagar 2","CITYNAME":"Bhuj","STATENAME":"Gujarat","TAG":null,"PINCODE":370427,"LBU2":"NEW","VINTAGE":2011,"BOI":null,"BUILDING_RJID":"BHUJBD0031982","LOCALITYNAME":"Sanskar Nagar","SUBSUBLOCALITY":null,"CITYCODE":"BHUJ","LOCALITYCODE":"SNKR","LOCALITY_RJID":"LOY71336","DATASOURCE":null,"FTTXRFS":null,"BLCMSTATUS":null,"TALUKCODE":"BHUJ","TALUKNAME":"Bhuj","DISTRICTCODE":"BHUJ","DISTRICTNAME":"Kachchh","BOICATEGORY":null,"NEIGHBOURHOODCODE":null,"JIOCENTERNAME":"Bhuj","VILLAGENAME":"Mirjhapar (CT)","VILLAGE_RJID":"VIE78276","JIOCENTERCODE":"JC01","GLOBALID_1":"{87ACB15B-BB59-42FB-8737-5111B9A239B6}","JIOCENTER_RJID":"GJ-BHUJ-JC01-0275","JIOCENTER_SAP_ID":"I-GJ-BHUJ-JCO-0001","SHAPE_Length":35.082851836058126,"SHAPE_Area":66.70308817988206},"geometry":{"curveRings":[[[-1293826.0616008043,2638881.98328707],[-1293835.0307057127,2638881.8490332216],[-1293835.104782246,2638888.9112596065],[-1293824.5208404362,2638889.0695212036],[-1293824.4993598238,2638887.027283214],[-1293825.616667755,2638887.010383025],{"c":[[-1293826.1089845225,2638886.5079577304],[-1293825.966182138,2638886.8604469104]]},[-1293826.0616008043,2638881.98328707]]]}}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:501)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
... 8 more
Caused by: java.lang.NullPointerException
at com.esri.hadoop.hive.GeometryUtils.serialize(Unknown Source)
at com.esri.hadoop.hive.GeometryUtils.access$000(Unknown Source)
at com.esri.hadoop.hive.GeometryUtils$CachedGeometryBytesWritable.<init>(Unknown Source)
at com.esri.hadoop.hive.GeometryUtils.geometryToEsriShapeBytesWritable(Unknown Source)
at com.esri.hadoop.hive.serde.JsonSerde.deserialize(Unknown Source)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:136)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:100)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:492)
... 9 more

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 25 Reduce: 1 Cumulative CPU: 33.68 sec HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 33 seconds 680 msec

@climbage
Member

I see in the JSON for the failed record that you have a geometry with curves. Unfortunately, the Java geometry library only supports simple feature types and not curves.

"geometry":{"curveRings":[[[-1293826.0616008043,2638881.98328707],[-1293835.0307057127,2638881.8490332216],[-1293835.104782246,2638888.9112596065],[-1293824.5208404362,2638889.0695212036],[-1293824.4993598238,2638887.027283214],[-1293825.616667755,2638887.010383025],{"c":[[-1293826.1089845225,2638886.5079577304],[-1293825.966182138,2638886.8604469104]]},[-1293826.0616008043,2638881.98328707]]]}}
