minor fixes in README (#216)
beltran committed May 2, 2023
1 parent 5ef3a3e commit eac395f
[![Build Status](https://travis-ci.com/beltran/gohive.svg?branch=master)](https://travis-ci.com/beltran/gohive) [![Coverage Status](https://coveralls.io/repos/github/beltran/gohive/badge.svg?branch=master)](https://coveralls.io/github/beltran/gohive?branch=master)


GoHive is a driver for Hive and the [Spark Distributed SQL Engine](https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html) in Go. It supports the connection mechanisms KERBEROS (GSSAPI SASL), NONE (plain SASL), LDAP, CUSTOM and NOSASL, over both binary and HTTP transport, with and without SSL. The KERBEROS mechanism will pick a different authentication level depending on `hive.server2.thrift.sasl.qop`.

## Installation
GoHive can be installed with:
```
go get github.com/beltran/gohive
```

To add Kerberos support, GoHive requires header files to build against the GSSAPI C library. They can be installed with:
- Ubuntu: `sudo apt-get install libkrb5-dev`
- MacOS: `brew install homebrew/dupes/heimdal --without-x11`
- CentOS/RHEL: `yum install -y krb5-devel`
Then GoHive can be installed with:
```
go get -tags kerberos github.com/beltran/gohive
```
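A minimal end-to-end connection and query might look like the following sketch (the host, port, auth mechanism, table and column names are placeholders):
``` go
package main

import (
	"context"
	"log"

	"github.com/beltran/gohive"
)

func main() {
	ctx := context.Background()

	configuration := gohive.NewConnectConfiguration()
	configuration.Service = "hive"

	// "hs2.example.com", the port and the auth mechanism are placeholders.
	connection, errConn := gohive.Connect("hs2.example.com", 10000, "NONE", configuration)
	if errConn != nil {
		log.Fatal(errConn)
	}
	cursor := connection.Cursor()

	cursor.Exec(ctx, "SELECT name, age FROM myTable")
	if cursor.Err != nil {
		log.Fatal(cursor.Err)
	}

	var name string
	var age int32
	for cursor.HasMore(ctx) {
		cursor.FetchOne(ctx, &name, &age)
		if cursor.Err != nil {
			log.Fatal(cursor.Err)
		}
		log.Println(name, age)
	}

	cursor.Close()
	connection.Close()
}
```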

`cursor.HasMore` may query Hive for more rows if not all of them have been received. Once a row is read it is discarded from memory, so as long as the fetch size is not too big there is no limit to how much data can be queried.

Binary transport mode is supported for these three options (PLAIN, KERBEROS and NOSASL). HTTP transport is supported for PLAIN and KERBEROS:
``` go
configuration := NewConnectConfiguration()
configuration.HttpPath = "cliservice" // this is the default path in Hive configuration.
configuration.TransportMode = "http"
configuration.Service = "hive"

// The host and port below are placeholders.
connection, errConn := Connect("hs2.example.com", 10000, "KERBEROS", configuration)
```

A connection can be made using ZooKeeper:
```go
connection, errConn := ConnectZookeeper("zk1.example.com:2181,zk2.example.com:2181", "NONE", configuration)
```
The last two parameters determine how the connection to Hive will be made once the Hive hosts are retrieved from ZooKeeper.

## NULL values
For example, if a `NULL` value is in a row, the following operations would put `0` into `i`:
``` go
var i int32
cursor.FetchOne(context.Background(), &i)
```
Alternatively, using the rowmap API, `m := cursor.RowMap(context.Background())`, `m` would be `map[string]interface{}{"table_name.column_name": nil}` for a `NULL` value: it returns a map where the keys are `table_name.column_name`. This works fine with Hive, but with the [Spark Thrift SQL server](https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html) `table_name` is not present, so the keys are just `column_name`. This can [lead to problems](https://github.com/beltran/gohive/issues/120) if two tables have the same column name, so the `FetchOne` API should be used in that case.
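As a sketch of the difference (the table and column names are illustrative, and `cursor` is assumed to hold an already executed query):
``` go
m := cursor.RowMap(context.Background())
// Against Hive the keys include the table name:
log.Println(m["my_table.my_col"]) // prints <nil> for a NULL value
// Against the Spark Thrift SQL server the key is just the column name:
log.Println(m["my_col"])
```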

## Running tests
Tests can be run with:
```
./scripts/integration
```
This uses [dhive](https://github.com/beltran/dhive), and it will start two docker instances with Hive and Kerberos. `kinit`, `klist` and `kdestroy` have to be installed locally. `hs2.example.com` will have to be an alias for 127.0.0.1 in `/etc/hosts`. The krb5 configuration file should be created with `bash scripts/create_krbconf.sh`. Overall, the [steps used in the Travis CI](https://github.com/beltran/gohive/blob/ec69b5601829296a56ca0558693ed30c11180a94/.travis.yml#L24-L46) can be followed.
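For example, the alias can be added with a line like the following in `/etc/hosts`:
```
127.0.0.1   hs2.example.com
```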
