- Hadoop输入根据InputFormat作为抽象接口,并且提供RecordReader接口从每个划分中读取数据。提供OutputFormat作为输出接口。
- Hadoop输入输出都是物理文件的情况很正常,不过也支持存储在NoSQL等其他方式的存储介质中。
-
HiveStorageHandler是Hive用于连接如Hbase、Cassandra等类似的NoSQL存储的主要接口。
-
定制一个Storage handler需要指定如下条件
- input format
- output format
- serde
- metdata hooks,用于使外部列和表信息与Hive的metastore保持同步
- 在访问该处理程序存储的表的map / reduce作业上设置配置属性的规则
-
在NoSQL中处理Hive时,NoSQL系统的资源消耗、执行效率要比常规的基于HDFS的Hive和MapReduce job要慢,主要因为服务器socket连接资源消耗对底层多个文件的合并过程,而从HDFS中访问时完全顺序IO,顺序IO在现代磁盘时非常快的。
- 使用HiveQL执行Hbase表的Hive表
CREATE TABLE hbase_stocks(key INT, name STRING, price FLOAT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,stock:val")
TBLPROPERTIES ("hbase.table.name" = "stocks");
- 设置
hive.metastore.authorization.storage.checks
为ture,如果用户没有权限删除表底层文件,Hive就会阻止。
- 开启授权模块,
hive.security.authorization.enabled
,默认为false。 hive.security.authorization.createtable.owner.grants
;默认为null,使得用户无法访问自己的表。
- GRANT CREATE ON DATABASE To User username;基于用户创建表的权限。
- Show Grant user username on database default;查看用户在具体的数据库中的权限。
- grant select on table test to group groupname; 赋予组查询权限 。
- create role select_user_table;
- grant role select_user_table to user username;
- grant select on table tablename to role select_user_table;
- 默认情况下,是在表级别授予权限的。分区级别也可以进行权限授予
hive> CREATE TABLE authorize_part (key INT, value STRING)
> PARTITIONED BY (ds STRING);
hive> ALTER TABLE authorization_part
> SET TBLPROPERTIES ("PARTITION_LEVEL_PRIVILEGE"="TRUE");
Authorization failed:No privilege 'Alter' found for inputs
{database:default, table:authorization_part}.
Use show grant to get more details.
hive> GRANT ALTER ON table authorization_part to user edward;
hive> ALTER TABLE authorization_part
> SET TBLPROPERTIES ("PARTITION_LEVEL_PRIVILEGE"="TRUE");
hive> GRANT SELECT ON TABLE authorization_part TO USER edward;
hive> ALTER TABLE authorization_part ADD PARTITION (ds='3');
hive> ALTER TABLE authorization_part ADD PARTITION (ds='4');
hive> SELECT * FROM authorization_part WHERE ds='3';
hive> REVOKE SELECT ON TABLE authorization_part partition (ds='3') FROM USER edward;
hive> SELECT * FROM authorization_part WHERE ds='3';
Authorization failed:No privilege 'Select' found for inputs
{ database:default, table:authorization_part, partitionName:ds=3, columnName:key}.
Use show grant to get more details.
- 通过hive.security.authorization.createtable.owner.grants中定义自动授予用户创建表后拥有的权限的范围,针对全部用户
<property>
<name>hive.security.authorization.createtable.owner.grants</name>
<value>select,drop</value>
</property>
- hive.security.authorization.createtable.user.grants,针对创建表后为单独用户指定权限
<property>
<name>hive.security.authorization.createtable.user.grants</name>
<value>admin1,edward:select;user1:create</value>
</property>
- hive.security.authorization.createtable.role.grants.针对创建表后为单独组指定权限
- hive配置zookeeper
<property>
<name>hive.zookeeper.quorum</name>
<value>zk1.site.pvt,zk1.site.pvt,zk1.site.pvt</value>
<description>The list of zookeeper servers to talk to.
This is only needed for read/write locks.</description>
</property>
<property>
<name>hive.support.concurrency</name>
<value>true</value>
<description>Whether Hive supports concurrency or not.
A Zookeeper instance must be up and running for the default
Hive lock manager to support read-write locks.</description>
</property>
- 创建显式锁
# 上锁
lock table tablename exclusive;
# 解锁
unlock table tablename ;