python_hive_connect

Download Java JDK version 1.8.

Download Hadoop version 2.8.0 from https://hadoop.apache.org/release/2.8.0.html (the tar.gz file). Then we need to do some configuration in the Hadoop files.

Hadoop installation

These files are located in hadoop-2.8.0/etc/hadoop/:

  1. core-site.xml
  2. hdfs-site.xml
  3. yarn-site.xml
  4. mapred-site.xml
  5. hadoop-env.cmd

Configure them as follows:

core-site.xml

<configuration>
   <property>
       <name>fs.defaultFS</name>
       <value>hdfs://localhost:9000</value>
   </property>
</configuration>


hdfs-site.xml

Before editing this file, you need to create a directory named data inside hadoop-2.8.0:

  1. In that folder, create two more folders: namenode and datanode
<configuration>
   <property>
       <name>dfs.replication</name>
       <value>1</value>
   </property>
   <property>
       <name>dfs.namenode.name.dir</name>
       <value>/hadoop-2.8.0/data/namenode</value>
   </property>
   <property>
       <name>dfs.datanode.data.dir</name>
       <value>/hadoop-2.8.0/data/datanode</value>
   </property>
</configuration>


yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
   <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
   </property>
   <property>
       <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>
</configuration>


mapred-site.xml

<configuration>
   <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
   </property>
</configuration>


hadoop-env.cmd

  1. In this file, you need to set the Java path:

set JAVA_HOME=C:\Java\jdk1.8.0_202
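
To confirm the variable is visible to new processes, here is a minimal Python sketch (it only reads the environment; the jdk1.8.0_202 path is just the example above, so adjust it to your install):

import os

# Read JAVA_HOME from the environment and confirm the directory exists.
java_home = os.environ.get("JAVA_HOME")
if java_home and os.path.isdir(java_home):
    print("JAVA_HOME is set to", java_home)
else:
    print("JAVA_HOME is missing or points to a non-existent directory")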


After this, you have to replace some files in hadoop-2.8.0/bin/:

  1. Download the zip from http://backend.onstep.in/hadoopconfig/, extract it, copy the bin folder, and replace the bin folder in your hadoop-2.8.0 with it. A quick way to verify the replacement is sketched below.
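
A minimal sketch to verify the replacement, assuming Hadoop was extracted to C:\hadoop-2.8.0 (adjust the path) and that the replacement bin provides winutils.exe, the Windows helper Hadoop needs:

import os

# Assumption: Hadoop lives at C:\hadoop-2.8.0; change this if yours differs.
hadoop_bin = r"C:\hadoop-2.8.0\bin"
# The replacement bin folder typically ships winutils.exe for Windows.
print("winutils.exe present:", os.path.isfile(os.path.join(hadoop_bin, "winutils.exe")))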

After doing all this, you need to add the Hadoop path to the environment variables.


At last, we check that Hadoop is now on the path:

  1. Open the Command Prompt in administrator mode.
  2. Verify that hadoop is found on the path (for example, run hadoop version).
  3. Next, format the NameNode from the Command Prompt: hdfs namenode -format
  4. Finally, start Hadoop: start-all.cmd
  5. After this command, four command prompts will open: NameNode, DataNode, NodeManager, and YARN ResourceManager. A quick Python check that they are up is sketched below.
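
A minimal sketch to confirm the daemons came up, assuming the default Hadoop 2.x web UI ports (50070 for the NameNode, 8088 for the ResourceManager); if you changed the ports, adjust the URLs:

import urllib.request

# Default Hadoop 2.x web UI ports (assumes they were not overridden).
checks = {
    "NameNode": "http://localhost:50070",
    "YARN ResourceManager": "http://localhost:8088",
}

for name, url in checks.items():
    try:
        urllib.request.urlopen(url, timeout=5)
        print(name, "is up at", url)
    except OSError as exc:
        print(name, "not reachable at", url, "-", exc)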

Screenshots: the NameNode, DataNode, NodeManager, and YARN ResourceManager console windows.

Hive installation

  1. Download Hive 2.1.1 from https://archive.apache.org/dist/hive/hive-2.1.1/ (the tar.gz file).
  2. Download Derby 10.12.1.1 from https://archive.apache.org/dist/db/derby/db-derby-10.12.1.1/ (the tar.gz file).

Next Step

Extract those downloaded files.

Go to the Derby folder, copy the lib folder, and paste it into the Hive folder. Then, in the Hive folder, go to the conf folder and create a new file named hive-site.xml. Paste this data into that file:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
   <property>
       <name>javax.jdo.option.ConnectionURL</name>
       <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
       <description>JDBC connect string for a JDBC metastore</description>
   </property>
   <property>
       <name>javax.jdo.option.ConnectionDriverName</name>
       <value>org.apache.derby.jdbc.ClientDriver</value>
       <description>Driver class name for a JDBC metastore</description>
   </property>
   <property>
       <name>hive.server2.enable.impersonation</name>
       <description>Enable user impersonation for HiveServer2</description>
       <value>false</value>
   </property>
   <property>
       <name>hive.server2.enable.doAs</name>
       <description>Enable user impersonation for HiveServer2</description>
       <value>false</value>
   </property>
   <property>
       <name>hive.server2.authentication</name>
       <value>NOSASL</value>
       <description>Client authentication types. NONE: no authentication check; LDAP: LDAP/AD based authentication; KERBEROS: Kerberos/GSSAPI authentication; CUSTOM: custom authentication provider (use with property hive.server2.custom.authentication.class)</description>
   </property>
   <property>
       <name>datanucleus.autoCreateTables</name>
       <value>true</value>
   </property>
   <property>
       <name>hive.server2.thrift.port</name>
       <value>10000</value>
   </property>
   <property>
       <name>hive.server2.webui.port</name>
       <value>10002</value>
   </property>
</configuration>

Then add the paths to the environment variables:

  1. Add the Derby path and Hive path to the environment variables.


After finishing these steps, we are able to run Hive.

First, ensure that Hadoop is running; if it is not, start it. Second, you need to start the Derby network server using this command: startNetworkServer -h 0.0.0.0
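
A minimal sketch to confirm the Derby network server is listening, assuming the default port 1527 (the same port used in the jdbc:derby connection URL in hive-site.xml above):

import socket

# Derby's network server listens on port 1527 by default.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.settimeout(5)
    result = sock.connect_ex(("localhost", 1527))

print("Derby is listening" if result == 0 else "Derby is not reachable (errno %d)" % result)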

Now you are ready to start Hive.


Connecting the Hive database to Python

While installing some packages with pip you may face errors (Visual Studio build errors). To overcome this, we install one package, bitarray, from a prebuilt wheel.

Go to https://www.lfd.uci.edu/~gohlke/pythonlibs/#bitarray, go to the bitarray section, and download the file that matches your Python version.


  1. Then go to the folder where the file was downloaded, open the Command Prompt, and install it using pip. For example:

pip install bitarray-1.7.1-cp39-cp39-win_amd64.whl

Then install the remaining packages:

pip install impyla

pip install thrift_sasl

All packages should now be installed. A quick sanity check is sketched below.
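
A minimal sketch that only verifies the modules import (note that impyla installs the impala module):

# Verify that each package installed above can be imported.
for module in ("bitarray", "impala", "thrift_sasl"):
    try:
        __import__(module)
        print(module, "OK")
    except ImportError as exc:
        print(module, "FAILED:", exc)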

First, ensure that Hadoop, Derby, and Hive are running; if not, start them.

Next, you need to start the Hive server with the help of this command: hive --service hiveserver2


Finally, use this code to connect to Hive from Python:

from impala.dbapi import connect

# Connect to HiveServer2 on the Thrift port set in hive-site.xml (10000);
# NOSASL matches the hive.server2.authentication setting above.
c = connect(host='localhost', port=10000, auth_mechanism='NOSASL').cursor()

c.execute("show databases")

print(c.fetchall())
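
If pandas is also installed, impyla can hand results back as a DataFrame via impala.util.as_pandas; a minimal sketch:

from impala.dbapi import connect
from impala.util import as_pandas  # requires pandas

conn = connect(host='localhost', port=10000, auth_mechanism='NOSASL')
cursor = conn.cursor()
cursor.execute("show databases")

# Convert the result set into a pandas DataFrame for easier inspection.
df = as_pandas(cursor)
print(df)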


No packages published