HAWQ-1376 - clarify pxf host and port description (closes #99)
lisakowen authored and dyozie committed Mar 10, 2017
1 parent dcb5cad commit 5714ce5b3efb61387e6479907ada58f5aa8f34aa
Showing 9 changed files with 19 additions and 11 deletions.
@@ -240,3 +240,7 @@ For command-line administrators:
$ hawq init standby -n -M fast

```

## <a id="pxfnhdfsnamenode"></a>Using PXF with HDFS NameNode HA

If HDFS NameNode High Availability is enabled, use the HDFS Nameservice ID in the `LOCATION` clause \<host\> field when invoking any PXF `CREATE EXTERNAL TABLE` command. If the \<port\> is omitted from the `LOCATION` URI, PXF connects to the port number designated by the `pxf_service_port` server configuration parameter value (default is 51200).
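As a sketch, a table definition against an HA-enabled cluster might look like the following (the table name, Nameservice ID `mycluster`, and HDFS path are hypothetical):

``` sql
-- "mycluster" is an HDFS Nameservice ID, not a host name; because <port>
-- is omitted, PXF connects on pxf_service_port (default 51200)
CREATE EXTERNAL TABLE sales_ha (id int, amount float8)
LOCATION ('pxf://mycluster/data/sales.txt?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (DELIMITER ',');
```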
@@ -43,7 +43,7 @@ To create an external HBase table, use the following syntax:
``` sql
CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name
( column_name data_type [, ...] | LIKE other_table )
LOCATION ('pxf://namenode[:port]/hbase-table-name?Profile=HBase')
LOCATION ('pxf://host[:port]/hbase-table-name?Profile=HBase')
FORMAT 'CUSTOM' (Formatter='pxfwritable_import');
```
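Filled in with concrete (illustrative) names, a readable table over a hypothetical HBase table `orders` might look like this; `recordkey` maps to the HBase row key and quoted column names map to HBase column descriptors:

``` sql
-- hypothetical example: namenode.example.com runs a PXF agent;
-- "orders" is an existing HBase table with column family "cf"
CREATE EXTERNAL TABLE hbase_orders (recordkey text, "cf:total" float8)
LOCATION ('pxf://namenode.example.com:51200/orders?Profile=HBase')
FORMAT 'CUSTOM' (Formatter='pxfwritable_import');
```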

@@ -100,7 +100,8 @@ HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](..

| Keyword | Value |
|-------|-------------------------------------|
| \<host\>[:\<port\>] | The HDFS NameNode and port. |
| \<host\> | The PXF host. While \<host\> may identify any PXF agent node, use the HDFS NameNode as it is guaranteed to be available in a running HDFS cluster. If HDFS High Availability is enabled, \<host\> must identify the HDFS NameService. |
| \<port\> | The PXF port. If \<port\> is omitted, PXF assumes \<host\> identifies a High Availability HDFS Nameservice and connects to the port number designated by the `pxf_service_port` server configuration parameter value. Default is 51200. |
| \<path-to-hdfs-file\> | The path to the file in the HDFS data store. |
| PROFILE | The `PROFILE` keyword must specify one of the values `HdfsTextSimple`, `HdfsTextMulti`, or `Avro`. |
| \<custom-option\> | \<custom-option\> is profile-specific. Profile-specific options are discussed in the relevant profile topic later in this section.|
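Putting the keywords above together, a readable table over a comma-delimited HDFS file might be defined as follows (host and path are hypothetical):

``` sql
-- hypothetical example: reads a comma-delimited HDFS file with HdfsTextSimple
CREATE EXTERNAL TABLE pxf_hdfs_textfile (location text, month text, num_orders int)
LOCATION ('pxf://namenode.example.com:51200/data/pxf_examples/orders.csv?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (DELIMITER ',');
```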
@@ -54,7 +54,8 @@ HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](..

| Keyword | Value |
|-------|-------------------------------------|
| \<host\>[:\<port\>] | The HDFS NameNode and port. |
| \<host\> | The PXF host. While \<host\> may identify any PXF agent node, use the HDFS NameNode as it is guaranteed to be available in a running HDFS cluster. If HDFS High Availability is enabled, \<host\> must identify the HDFS NameService. |
| \<port\> | The PXF port. If \<port\> is omitted, PXF assumes \<host\> identifies a High Availability HDFS Nameservice and connects to the port number designated by the `pxf_service_port` server configuration parameter value. Default is 51200. |
| \<path-to-hdfs-file\> | The path to the file in the HDFS data store. |
| PROFILE | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
| \<custom-option\> | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
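For the writable case, a minimal sketch using the `HdfsTextSimple` profile might look like this (host and HDFS directory are hypothetical); rows inserted into the table are written as delimited text under the named directory:

``` sql
-- hypothetical example: INSERTs into this table write pipe-delimited text to HDFS
CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writable (location text, num_orders int)
LOCATION ('pxf://namenode.example.com:51200/data/pxf_examples/pxfwritable_out?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (DELIMITER '|');
```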
@@ -332,7 +332,8 @@ Hive-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](..

| Keyword | Value |
|-------|-------------------------------------|
| \<host\>[:<port\>] | The HDFS NameNode and port. |
| \<host\> | The PXF host. While \<host\> may identify any PXF agent node, use the HDFS NameNode as it is guaranteed to be available in a running HDFS cluster. If HDFS High Availability is enabled, \<host\> must identify the HDFS NameService. |
| \<port\> | The PXF port. If \<port\> is omitted, PXF assumes \<host\> identifies a High Availability HDFS Nameservice and connects to the port number designated by the `pxf_service_port` server configuration parameter value. Default is 51200. |
| \<hive-db-name\> | The name of the Hive database. If omitted, defaults to the Hive database named `default`. |
| \<hive-table-name\> | The name of the Hive table. |
| PROFILE | The `PROFILE` keyword must specify one of the values `Hive`, `HiveText`, or `HiveRC`. |
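As an illustration of these keywords, a table over a hypothetical Hive table `sales_info` in the `default` Hive database might be defined as:

``` sql
-- hypothetical example: the Hive profile uses the custom pxfwritable_import formatter
CREATE EXTERNAL TABLE hive_sales (location text, month text, num_orders int)
LOCATION ('pxf://namenode.example.com:51200/default.sales_info?PROFILE=Hive')
FORMAT 'CUSTOM' (Formatter='pxfwritable_import');
```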
@@ -169,7 +169,8 @@ JSON-plug-in-specific keywords and values used in the `CREATE EXTERNAL TABLE` ca

| Keyword | Value |
|-------|-------------------------------------|
| \<host\> | Specify the HDFS NameNode in the \<host\> field. |
| \<host\> | The PXF host. While \<host\> may identify any PXF agent node, use the HDFS NameNode as it is guaranteed to be available in a running HDFS cluster. If HDFS High Availability is enabled, \<host\> must identify the HDFS NameService. |
| \<port\> | The PXF port. If \<port\> is omitted, PXF assumes \<host\> identifies a High Availability HDFS Nameservice and connects to the port number designated by the `pxf_service_port` server configuration parameter value. Default is 51200. |
| PROFILE | The `PROFILE` keyword must specify the value `Json`. |
| IDENTIFIER | Include the `IDENTIFIER` keyword and \<value\> in the `LOCATION` string only when accessing a JSON file with multi-line records. \<value\> should identify the member name used to determine the encapsulating JSON object to return. (If the JSON file is the multi-line record Example 2 above, `&IDENTIFIER=created_at` would be specified.) |
| FORMAT | The `FORMAT` clause must specify `CUSTOM`. |
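A table definition for multi-line JSON records keyed on a `created_at` member might look like the following sketch (host, HDFS path, and column list are hypothetical):

``` sql
-- hypothetical example: IDENTIFIER names the member that delimits each JSON record
CREATE EXTERNAL TABLE sample_json_multiline_tbl (created_at text, id_str text)
LOCATION ('pxf://namenode.example.com:51200/data/tweets.json?PROFILE=Json&IDENTIFIER=created_at')
FORMAT 'CUSTOM' (Formatter='pxfwritable_import');
```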
@@ -213,4 +214,4 @@ To query this external table populated with JSON data:

``` sql
SELECT * FROM sample_json_multiline_tbl;
```
```
@@ -53,8 +53,8 @@ FORMAT 'custom' (formatter='pxfwritable_import|pxfwritable_export');

| Parameter | Value and description |
|-------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| host | The HDFS NameNode. |
| port  | Connection port for the PXF service. If the port is omitted, PXF assumes that High Availability (HA) is enabled and connects to the HA name service port, 51200, by default. The HA name service port can be changed by setting the `pxf_service_port` configuration parameter. |
| \<host\> | The PXF host. While \<host\> may identify any PXF agent node, use the HDFS NameNode as it is guaranteed to be available in a running HDFS cluster. If HDFS High Availability is enabled, \<host\> must identify the HDFS NameService. |
| \<port\> | The PXF port. If \<port\> is omitted, PXF assumes \<host\> identifies a High Availability HDFS Nameservice and connects to the port number designated by the `pxf_service_port` server configuration parameter value. Default is 51200. |
| \<path\-to\-data\> | A directory, file name, wildcard pattern, table name, etc. |
| PROFILE | The profile PXF uses to access the data. PXF supports multiple plug-ins that currently expose profiles named `HBase`, `Hive`, `HiveRC`, `HiveText`, `HiveORC`, `HdfsTextSimple`, `HdfsTextMulti`, `Avro`, `SequenceWritable`, and `Json`. |
| FRAGMENTER | The Java class the plug-in uses for fragmenting data. Used for READABLE external tables only. |
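The `FRAGMENTER`/`ACCESSOR`/`RESOLVER` form can be sketched as below; the fully qualified class names shown are assumed HAWQ PXF HDFS plug-in classes and together approximate what the `HdfsTextSimple` profile bundles:

``` sql
-- hypothetical example: spelling out the plug-in classes instead of a PROFILE
CREATE EXTERNAL TABLE pxf_hdfs_classes (id int, name text)
LOCATION ('pxf://namenode.example.com:51200/data/example.txt?FRAGMENTER=org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter&ACCESSOR=org.apache.hawq.pxf.plugins.hdfs.LineBreakAccessor&RESOLVER=org.apache.hawq.pxf.plugins.hdfs.StringPassResolver')
FORMAT 'TEXT' (DELIMITER ',');
```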
@@ -81,7 +81,7 @@ The following table lists some common errors encountered while using PXF:
</tr>
<tr class="odd">
<td>ERROR: fail to get filesystem credential for uri hdfs://&lt;namenode&gt;:8020/</td>
<td>Secure PXF: Wrong HDFS host or port is not 8020 (this is a limitation that will be removed in the next release)</td>
<td>Secure PXF: Wrong HDFS host or port is not 8020</td>
</tr>
<tr class="even">
<td>ERROR: remote component error (413) from '&lt;x&gt;': HTTP status code is 413 but HTTP response string is empty</td>
@@ -165,7 +165,7 @@ The `FORMAT` clause is used to describe how external table files are formatted.
<dd>The data type of the column.</dd>

<dt>LOCATION ('\<protocol\>://\<host\>\[:\<port\>\]/\<path\>/\<file\>' \[, ...\]) </dt>
<dd>For readable external tables, specifies the URI of the external data source(s) to be used to populate the external table or web table. Regular readable external tables allow the `file`, `gpfdist`, and `pxf` protocols. Web external tables allow the `http` protocol. If \<port\> is omitted, the `http` and `gpfdist` protocols assume port `8080` and the `pxf` protocol assumes the \<host\> is a high availability nameservice string. If using the `gpfdist` protocol, the \<path\> is relative to the directory from which `gpfdist` is serving files (the directory specified when you started the `gpfdist` program). Also, the \<path\> can use wildcards (or other C-style pattern matching) in the \<file\> name part of the location to denote multiple files in a directory. For example:
<dd>For readable external tables, specifies the URI of the external data source(s) to be used to populate the external table or web table. Regular readable external tables allow the `file`, `gpfdist`, and `pxf` protocols. Web external tables allow the `http` protocol. If \<port\> is omitted, the `http` and `gpfdist` protocols assume port `8080` and the `pxf` protocol assumes the \<host\> specifies a high availability Nameservice ID. If using the `gpfdist` protocol, the \<path\> is relative to the directory from which `gpfdist` is serving files (the directory specified when you started the `gpfdist` program). Also, the \<path\> can use wildcards (or other C-style pattern matching) in the \<file\> name part of the location to denote multiple files in a directory. For example:

``` pre
'gpfdist://filehost:8081/*'
@@ -183,7 +183,7 @@ For writable external tables, specifies the URI location of the `gpfdist` proces

With two `gpfdist` locations listed as in the above example, half of the segments would send their output data to the `data1.out` file and the other half to the `data2.out` file.

For the `pxf` protocol, the `LOCATION` string specifies the \<host\> and \<port\> of the PXF service, the location of the data, and the PXF plug-ins (Java classes) used to convert the data between storage format and HAWQ format. If the \<port\> is omitted, the \<host\> is taken to be the logical name for the high availability name service and the \<port\> is the value of the `pxf_service_port` configuration variable, 51200 by default. The URL parameters `FRAGMENTER`, `ACCESSOR`, and `RESOLVER` are the names of PXF plug-ins (Java classes) that convert between the external data format and HAWQ data format. The `FRAGMENTER` parameter is only used with readable external tables. PXF allows combinations of these parameters to be configured as profiles so that a single `PROFILE` parameter can be specified to access external data, for example `?PROFILE=Hive`. Additional \<custom-options\>` can be added to the LOCATION URI to further describe the external data format or storage options. For details about the plug-ins and profiles provided with PXF and information about creating custom plug-ins for other data sources see [Using PXF with Unmanaged Data](../../pxf/HawqExtensionFrameworkPXF.html).</dd>
For the `pxf` protocol, the `LOCATION` string specifies the HDFS NameNode \<host\> and the \<port\> of the PXF service, the location of the data, and the PXF profile or Java classes used to convert the data between storage format and HAWQ format. If the \<port\> is omitted, the \<host\> is taken to be the logical name for the high availability Nameservice, and the \<port\> is the value of the `pxf_service_port` configuration parameter, 51200 by default. The URL parameters `FRAGMENTER`, `ACCESSOR`, and `RESOLVER` are the names of PXF plug-ins (Java classes) that convert between the external data format and HAWQ data format. The `FRAGMENTER` parameter is only used with readable external tables. PXF allows combinations of these parameters to be configured as profiles so that a single `PROFILE` parameter can be specified to access external data, for example `?PROFILE=Hive`. Additional \<custom-options\> can be added to the `LOCATION` URI to further describe the external data format or storage options. For details about the plug-ins and profiles provided with PXF and information about creating custom plug-ins for other data sources, see [Using PXF with Unmanaged Data](../../pxf/HawqExtensionFrameworkPXF.html).</dd>

<dt>EXECUTE '\<command\>' ON ... </dt>
<dd>Allowed for readable web external tables or writable external tables only. For readable web external tables, specifies the OS command to be executed by the segment instances. The \<command\> can be a single OS command or a script. If \<command\> executes a script, that script must reside in the same location on all of the segment hosts and be executable by the HAWQ superuser (`gpadmin`).
