HCATALOG-20: Create a package target. New files.
git-svn-id: https://svn.apache.org/repos/asf/incubator/hcatalog/trunk@1130209 13f79535-47bb-0310-9956-ffa450edef68
ashutoshc committed Jun 1, 2011
1 parent 6471584 commit 881bda4
Showing 10 changed files with 3,128 additions and 0 deletions.
404 changes: 404 additions & 0 deletions LICENSE.txt

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions NOTICE.txt
@@ -0,0 +1,10 @@
Apache HCatalog
Copyright 2011 The Apache Software Foundation

This product includes/uses software developed by The Apache Software
Foundation (http://www.apache.org/).
76 changes: 76 additions & 0 deletions README.txt
@@ -0,0 +1,76 @@
Apache HCatalog
===============
HCatalog is a table and storage management service for data created using Apache
Hadoop.

The vision of HCatalog is to provide table management and storage management layers
for Apache Hadoop. This includes:

* Providing a shared schema and data type mechanism.
* Providing a table abstraction so that users need not be concerned with where
or how their data is stored.
* Providing interoperability across data processing tools such as Pig, Map
Reduce, Streaming, and Hive.

Data processors using Apache Hadoop have a common need for table management
services. The goal of this table management service is to track data that exists in
a Hadoop grid and present that data to users in a tabular format. HCatalog
provides a single input and output format to users so that individual users need
not be concerned with the storage formats that are chosen for particular data
sets. Data is described by a schema and shares a datatype system.

Users are free to choose the best tools for their use cases. The Hadoop project
includes Map Reduce, Streaming, Pig, and Hive, and additional tools exist such
as Cascading. Each of these tools has users who prefer it, and there are use
cases best addressed by each of these tools. Two users on the same grid who
share data are not constrained to use the same tool but with HCatalog are free
to choose the best tool for their use case. HCatalog presents data in the same
way to all of the tools, providing interfaces to each of them.

For the latest information about HCatalog, please visit our website at:

http://incubator.apache.org/hcatalog

and our wiki, at:

https://cwiki.apache.org/confluence/display/HCATALOG


70 changes: 70 additions & 0 deletions RELEASE_NOTES.txt
@@ -0,0 +1,70 @@
These notes are for HCatalog 0.1.0 release.

Highlights
==========

This is the initial release of Apache HCatalog. It provides read and write capability for Pig and Hadoop, and read capability for Hive.

System Requirements
===================

1. Java 1.6.x or newer, preferably from Sun. Set JAVA_HOME to the root of your
Java installation.
2. Ant build tool, version 1.8 or higher: http://ant.apache.org (needed only to
build from source).
3. This release is compatible with Hadoop 0.20.x with security. Currently this
is available from Cloudera in their CDH3 release or from the 0.20.203 branch
of Apache Hadoop (not yet released).
4. This release is compatible with Pig 0.8.1.
5. This release is compatible with Hive 0.7.0.

Trying the Release
==================
1. Download hcatalog-0.1.0.tar.gz
2. Unpack the file: tar -xzvf hcatalog-0.1.0.tar.gz
3. Move into the installation directory: cd hcatalog-0.1.0
TODO need install instructions
4. To use with Hadoop MapReduce jobs, use the HCatInputFormat and
HCatOutputFormat classes (a sketch follows this list).
5. To use with Pig, use the HCatLoader and HCatStorer classes.
6. To use the command line interface, set HADOOP_CLASSPATH to the directory
that contains the configuration files for your cluster, and use bin/hcat.sh
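
A rough sketch of step 4 follows, assuming the org.apache.hcatalog.mapreduce package layout used in this source tree; the driver class name is hypothetical and the table-binding calls are omitted, since their exact signatures are covered in the r0.1.0 documentation.

// Minimal sketch of a MapReduce driver that reads from and writes to
// HCatalog-managed tables. Class name and omitted table-binding calls are
// illustrative only; see http://incubator.apache.org/hcatalog/docs/r0.1.0.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hcatalog.mapreduce.HCatInputFormat;
import org.apache.hcatalog.mapreduce.HCatOutputFormat;

public class HCatExampleDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "hcatalog-example");
    job.setJarByClass(HCatExampleDriver.class);

    // Let HCatalog resolve the storage format: the job reads and writes
    // tables rather than raw HDFS paths.
    job.setInputFormatClass(HCatInputFormat.class);
    job.setOutputFormatClass(HCatOutputFormat.class);

    // Mapper/reducer classes and the input/output table bindings (database
    // and table names) would be configured here.

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}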

Relevant Documentation
======================
See http://incubator.apache.org/hcatalog/docs/r0.1.0
36 changes: 36 additions & 0 deletions conf/jndi.properties
@@ -0,0 +1,36 @@
## ---------------------------------------------------------------------------
## Licensed to the Apache Software Foundation (ASF) under one or more
## contributor license agreements. See the NOTICE file distributed with
## this work for additional information regarding copyright ownership.
## The ASF licenses this file to You under the Apache License, Version 2.0
## (the "License"); you may not use this file except in compliance with
## the License. You may obtain a copy of the License at
##
## http://www.apache.org/licenses/LICENSE-2.0
##
## Unless required by applicable law or agreed to in writing, software
## distributed under the License is distributed on an "AS IS" BASIS,
## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
## See the License for the specific language governing permissions and
## limitations under the License.
## ---------------------------------------------------------------------------

# If ActiveMQ is used, uncomment the following properties; otherwise substitute them accordingly.
#java.naming.factory.initial = org.apache.activemq.jndi.ActiveMQInitialContextFactory

# use the following property to provide location of MQ broker.
#java.naming.provider.url = tcp://localhost:61616

# use the following property to specify the JNDI name the connection factory
# should appear as.
#connectionFactoryNames = connectionFactory, queueConnectionFactory, topicConnectionFactry

# register some queues in JNDI using the form
# queue.[jndiName] = [physicalName]
# queue.MyQueue = example.MyQueue


# register some topics in JNDI using the form
# topic.[jndiName] = [physicalName]
# topic.MyTopic = example.MyTopic
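
For context, a JMS client consumes these settings through an ordinary JNDI lookup. The snippet below is a minimal sketch assuming the ActiveMQ lines above are uncommented and this jndi.properties is on the classpath; the class name is hypothetical.

// Sketch: InitialContext reads jndi.properties from the classpath, so the
// connection factory and broker URL defined above are picked up automatically.
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.naming.InitialContext;

public class JndiLookupExample {
  public static void main(String[] args) throws Exception {
    InitialContext ctx = new InitialContext();
    // "connectionFactory" is one of the names listed in connectionFactoryNames.
    ConnectionFactory factory = (ConnectionFactory) ctx.lookup("connectionFactory");
    Connection connection = factory.createConnection();
    connection.start();
    // ... create sessions and consumers against the queues/topics registered above ...
    connection.close();
  }
}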

106 changes: 106 additions & 0 deletions conf/proto-hive-site.xml
@@ -0,0 +1,106 @@
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<configuration>

<property>
<name>hive.metastore.local</name>
<value>false</value>
<description>controls whether to connect to remote metastore server or open a new metastore server in Hive Client JVM</description>
</property>

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://DBHOSTNAME/hivemetastoredb?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>PASSWORD</value>
<description>password to use against metastore database</description>
</property>

<property>
<name>hive.metastore.warehouse.dir</name>
<value>WAREHOUSE_DIR</value>
<description>location of default database for the warehouse</description>
</property>

<property>
<name>hive.metastore.sasl.enabled</name>
<value>true</value>
<description>If true, the metastore thrift interface will be secured with SASL. Clients must authenticate with Kerberos.</description>
</property>

<property>
<name>hive.metastore.kerberos.keytab.file</name>
<value>KEYTAB_PATH</value>
<description>The path to the Kerberos Keytab file containing the metastore thrift server's service principal.</description>
</property>

<property>
<name>hive.metastore.kerberos.principal</name>
<value>KERBEROS_PRINCIPAL</value>
<description>The service principal for the metastore thrift server. The special string _HOST will be replaced automatically with the correct host name.</description>
</property>

<property>
<name>hive.metastore.cache.pinobjtypes</name>
<value>Table,Database,Type,FieldSchema,Order</value>
<description>List of comma separated metastore object types that should be pinned in the cache</description>
</property>

<property>
<name>hive.metastore.uris</name>
<value>thrift://SVRHOST:3306</value>
<description>URI for client to contact metastore server</description>
</property>

<property>
<name>hive.semantic.analyzer.factory.impl</name>
<value>org.apache.hcatalog.cli.HCatSemanticAnalyzerFactory</value>
<description>controls which SemanticAnalyzerFactory implementation class is used by CLI</description>
</property>

<property>
<name>hadoop.clientside.fs.operations</name>
<value>true</value>
<description>FS operations are owned by client</description>
</property>

<property>
<name>hive.metastore.client.socket.timeout</name>
<value>60</value>
<description>MetaStore Client socket timeout in seconds</description>
</property>

</configuration>
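
The template above points clients at a remote, SASL-secured metastore. As a minimal sketch of the client side, the class below (hypothetical name) assumes the generated hive-site.xml is on the classpath so HiveConf picks up hive.metastore.uris and the Kerberos settings.

// Sketch: HiveConf loads hive-site.xml from the classpath, including the
// hive.metastore.uris value configured above; a valid Kerberos ticket is
// assumed since hive.metastore.sasl.enabled is true.
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Table;

public class MetastoreSmokeTest {
  public static void main(String[] args) throws Exception {
    HiveConf conf = new HiveConf(MetastoreSmokeTest.class);
    HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
    // Fetch a table definition to confirm the Thrift connection works.
    Table table = client.getTable("default", args[0]);
    System.out.println("Table location: " + table.getSd().getLocation());
    client.close();
  }
}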
