python_hive_connect

Download Java JDK version 1.8.

Download Hadoop version 2.8.0 from https://hadoop.apache.org/release/2.8.0.html (the tar.gz file). Then we need to do some configuration in the Hadoop files.

Hadoop installation

These files are located in hadoop-2.8.0/etc/hadoop/:

  1. core-site.xml
  2. hdfs-site.xml
  3. yarn-site.xml
  4. mapred-site.xml
  5. hadoop-env.cmd

Configure them as follows:

core-site.xml

<configuration>
   <property>
       <name>fs.defaultFS</name>
       <value>hdfs://localhost:9000</value>
   </property>
</configuration>


hdfs-site.xml

Before editing this file, you need to create a directory named data inside hadoop-2.8.0:

  1. In that folder, create two more folders: namenode and datanode
<configuration>
   <property>
       <name>dfs.replication</name>
       <value>1</value>
   </property>
   <property>
       <name>dfs.namenode.name.dir</name>
       <value>/hadoop-2.8.0/data/namenode</value>
   </property>
   <property>
       <name>dfs.datanode.data.dir</name>
       <value>/hadoop-2.8.0/data/datanode</value>
   </property>
</configuration>


yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
   <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
   </property>
   <property>
       <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>
</configuration>


mapred-site.xml

<configuration>
   <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
   </property>
</configuration>


hadoop-env.cmd

  1. In this file, you need to set the Java path:

set JAVA_HOME=C:\Java\jdk1.8.0_202
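
To confirm the variable is visible to new processes, here is a minimal Python sketch (it only reads the environment; the jdk1.8.0_202 path is just the example above, so adjust it to your install):

import os

# Read JAVA_HOME from the environment and confirm the directory exists.
java_home = os.environ.get("JAVA_HOME")
if java_home and os.path.isdir(java_home):
    print("JAVA_HOME is set to", java_home)
else:
    print("JAVA_HOME is missing or points to a non-existent directory")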


After this, you have to replace some files in hadoop-2.8.0/bin/:

  1. Download the zip from http://backend.onstep.in/hadoopconfig/, extract it, copy the bin folder, and replace the bin folder in your hadoop-2.8.0 with it. A quick way to verify the replacement is sketched below.
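
A minimal sketch to verify the replacement, assuming Hadoop was extracted to C:\hadoop-2.8.0 (adjust the path) and that the replacement bin provides winutils.exe, the Windows helper Hadoop needs:

import os

# Assumption: Hadoop lives at C:\hadoop-2.8.0; change this if yours differs.
hadoop_bin = r"C:\hadoop-2.8.0\bin"
# The replacement bin folder typically ships winutils.exe for Windows.
print("winutils.exe present:", os.path.isfile(os.path.join(hadoop_bin, "winutils.exe")))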

After doing all this, you need to add the Hadoop path to the environment variables.


At last, we check that Hadoop is now on the path:

  1. Open the Command Prompt in administrator mode.
  2. Verify that hadoop is found on the path (for example, run hadoop version).
  3. Next, format the NameNode from the Command Prompt: hdfs namenode -format
  4. Finally, start Hadoop: start-all.cmd
  5. After this command, four command prompts will open: NameNode, DataNode, NodeManager, and YARN ResourceManager. A quick Python check that they are up is sketched below.
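
A minimal sketch to confirm the daemons came up, assuming the default Hadoop 2.x web UI ports (50070 for the NameNode, 8088 for the ResourceManager); if you changed the ports, adjust the URLs:

import urllib.request

# Default Hadoop 2.x web UI ports (assumes they were not overridden).
checks = {
    "NameNode": "http://localhost:50070",
    "YARN ResourceManager": "http://localhost:8088",
}

for name, url in checks.items():
    try:
        urllib.request.urlopen(url, timeout=5)
        print(name, "is up at", url)
    except OSError as exc:
        print(name, "not reachable at", url, "-", exc)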

Screenshots: the NameNode, DataNode, NodeManager, and YARN ResourceManager console windows.

Hive installation

  1. Download Hive 2.1.1 from https://archive.apache.org/dist/hive/hive-2.1.1/ (the tar.gz file).
  2. Download Derby 10.12.1.1 from https://archive.apache.org/dist/db/derby/db-derby-10.12.1.1/ (the tar.gz file).

Next Step

Extract those downloaded files.

Go to the Derby folder, copy the lib folder, and paste it into the Hive folder. Then, in the Hive folder, go to the conf folder and create a new file named hive-site.xml. Paste this data into that file:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
   <property>
       <name>javax.jdo.option.ConnectionURL</name>
       <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
       <description>JDBC connect string for a JDBC metastore</description>
   </property>
   <property>
       <name>javax.jdo.option.ConnectionDriverName</name>
       <value>org.apache.derby.jdbc.ClientDriver</value>
       <description>Driver class name for a JDBC metastore</description>
   </property>
   <property>
       <name>hive.server2.enable.impersonation</name>
       <description>Enable user impersonation for HiveServer2</description>
       <value>false</value>
   </property>
   <property>
       <name>hive.server2.enable.doAs</name>
       <description>Enable user impersonation for HiveServer2</description>
       <value>false</value>
   </property>
   <property>
       <name>hive.server2.authentication</name>
       <value>NOSASL</value>
       <description>Client authentication types. NONE: no authentication check; LDAP: LDAP/AD based authentication; KERBEROS: Kerberos/GSSAPI authentication; CUSTOM: custom authentication provider (use with property hive.server2.custom.authentication.class)</description>
   </property>
   <property>
       <name>datanucleus.autoCreateTables</name>
       <value>true</value>
   </property>
   <property>
       <name>hive.server2.thrift.port</name>
       <value>10000</value>
   </property>
   <property>
       <name>hive.server2.webui.port</name>
       <value>10002</value>
   </property>
</configuration>

Then add the paths to the environment variables:

  1. Add the Derby path and Hive path to the environment variables.


After finishing these steps, we are able to run Hive.

First, ensure that Hadoop is running; if it is not, start it. Second, you need to start the Derby network server using this command: startNetworkServer -h 0.0.0.0
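
A minimal sketch to confirm the Derby network server is listening, assuming the default port 1527 (the same port used in the jdbc:derby connection URL in hive-site.xml above):

import socket

# Derby's network server listens on port 1527 by default.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.settimeout(5)
    result = sock.connect_ex(("localhost", 1527))

print("Derby is listening" if result == 0 else "Derby is not reachable (errno %d)" % result)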

Now you are ready to start Hive.


Connecting the Hive database to Python

While installing some packages with pip you may face errors (Visual Studio build errors). To overcome this, we install one package, bitarray, from a prebuilt wheel.

Go to https://www.lfd.uci.edu/~gohlke/pythonlibs/#bitarray, go to the bitarray section, and download the file that matches your Python version.


  1. Then go to the folder where the file was downloaded, open the Command Prompt, and install it using pip. For example:

pip install bitarray-1.7.1-cp39-cp39-win_amd64.whl

Then install the remaining packages:

pip install impyla

pip install thrift_sasl

All packages should now be installed. A quick sanity check is sketched below.
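
A minimal sketch that only verifies the modules import (note that impyla installs the impala module):

# Verify that each package installed above can be imported.
for module in ("bitarray", "impala", "thrift_sasl"):
    try:
        __import__(module)
        print(module, "OK")
    except ImportError as exc:
        print(module, "FAILED:", exc)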

First, ensure that Hadoop, Derby, and Hive are running; if not, start them.

Next, you need to start the Hive server with the help of this command: hive --service hiveserver2


Finally, use this code to connect to Hive from Python:

from impala.dbapi import connect

# Connect to HiveServer2 on the Thrift port set in hive-site.xml (10000);
# NOSASL matches the hive.server2.authentication setting above.
c = connect(host='localhost', port=10000, auth_mechanism='NOSASL').cursor()

c.execute("show databases")

print(c.fetchall())
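
If pandas is also installed, impyla can hand results back as a DataFrame via impala.util.as_pandas; a minimal sketch:

from impala.dbapi import connect
from impala.util import as_pandas  # requires pandas

conn = connect(host='localhost', port=10000, auth_mechanism='NOSASL')
cursor = conn.cursor()
cursor.execute("show databases")

# Convert the result set into a pandas DataFrame for easier inspection.
df = as_pandas(cursor)
print(df)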


No packages published