
How to install Spark 2.1.0 in Windows 10 environment

Cheng-Lin Li edited this page Feb 24, 2017 · 9 revisions

Several steps are required to install Spark on your local machine.

0. Pre-requirement: Install JVM in your environment.

You can download the latest version from the official Java download page.

The JRE (Java Runtime Environment) is sufficient for this task. If you are a Java developer, you can select any of the other packages according to your own requirements.

1. Download Spark 2.1.0 installation.

Go to http://spark.apache.org/downloads.html and select the latest build. As of today, that is 2.1.0, pre-built for Hadoop 2.7. Just click the download link to get the package.

2. Unzip and extract your download into a local folder.

The download is packaged with tar and then gzip, so you will need to unpack it twice. First unpack it with any zip tool to get a file named "spark-2.1.0-bin-hadoop2.7", then rename that file to spark-2.1.0-bin-hadoop2.7.zip and unzip it again to get all the files.
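This two-pass unpack is what gunzip followed by tar does on a Unix shell. A sketch of the same idea, using a small dummy archive so the steps can be run anywhere (the real file would be spark-2.1.0-bin-hadoop2.7.tgz):

```shell
cd "$(mktemp -d)"

# Build a small dummy .tgz so the two-pass extraction can be shown end to end.
mkdir -p spark-2.1.0-bin-hadoop2.7/bin
echo "demo" > spark-2.1.0-bin-hadoop2.7/bin/spark-shell
tar -czf spark-demo.tgz spark-2.1.0-bin-hadoop2.7
rm -r spark-2.1.0-bin-hadoop2.7

# Pass 1: strip the gzip layer (the first unzip on Windows does this).
gunzip -k spark-demo.tgz        # produces spark-demo.tar

# Pass 2: unpack the tar archive (the second unzip).
tar -xf spark-demo.tar
ls spark-2.1.0-bin-hadoop2.7/bin
```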

3. Download / Install Hadoop 2.7.1 binary for windows 10

Download the Hadoop binary from http://hadoop.apache.org/releases.html and unzip/extract it into a local folder. Because this release does not include Windows 10 binaries, you can refer to the link below for a pre-built 64-bit version for the Windows environment. Download everything from it and put it into (or replace the contents of) your /somewhere/hadoop-2.7.1/bin/ folder. https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin
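After copying the winutils files in, the Hadoop folder should look roughly like this (a sketch; C:\somewhere stands for wherever you extracted Hadoop):

```
C:\somewhere\hadoop-2.7.1\
└── bin\
    ├── winutils.exe
    ├── hadoop.dll
    └── ... (the other files from the winutils repository)
```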

There is a good article on Hadoop installation for your reference: http://toodey.com/2015/08/10/hadoop-installation-on-windows-without-cygwin-in-10-mints/

4. Configure your environment.

4.1 From the Windows logo, search for and launch "Advanced system settings", then click the "Environment Variables" button.

4.2 Set the environment variables below according to your software versions and locations. This is an example.

JAVA_HOME=C:\Program Files\Java\jre1.8.0_121

SPARK_HOME=C:\somewhere\spark-2.1.0-bin-hadoop2.7

HADOOP_HOME=C:\somewhere\hadoop-2.7.3

4.3 Append the variable below to "Path".

%SPARK_HOME%\bin
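If you work from a Git Bash or WSL shell instead of the Windows dialog, the equivalent of steps 4.2 and 4.3 is a set of exports. A sketch using the sample paths above (adjust to your own versions and locations):

```shell
# Sample locations from step 4.2 -- adjust to your own install paths.
export JAVA_HOME='C:\Program Files\Java\jre1.8.0_121'
export SPARK_HOME='C:\somewhere\spark-2.1.0-bin-hadoop2.7'
export HADOOP_HOME='C:\somewhere\hadoop-2.7.3'

# Step 4.3: put the Spark launch scripts on the search path.
export PATH="$SPARK_HOME\bin:$PATH"

echo "$SPARK_HOME"
```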

5. Grant permission to temp folder

Create the temp folder c:\tmp\hive.

Execute the command below as administrator (winutils.exe is the one you placed in %HADOOP_HOME%\bin in step 3).

winutils.exe chmod 777 C:\tmp\hive
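winutils.exe applies Unix-style permission bits on Windows: 777 means read, write, and execute for the owner, the group, and everyone else. The same operation on a Unix shell, sketched on a throwaway directory since C:\tmp\hive only exists on the Windows box:

```shell
demo_dir="$(mktemp -d)/hive"   # stand-in for C:\tmp\hive
mkdir -p "$demo_dir"
chmod 777 "$demo_dir"          # rwx for owner, group, and others
stat -c '%a' "$demo_dir"       # prints: 777
```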

6. Try it.

Go to c:\somewhere\spark-2.1.0-bin-hadoop2.7\bin\ and execute "spark-shell", "pyspark", or "spark-submit <app_name>" for your program.
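For a first smoke test, you could run the SparkPi example that ships with the distribution (a sketch; the examples jar's exact file name depends on the build you downloaded):

```
cd C:\somewhere\spark-2.1.0-bin-hadoop2.7\bin

REM Interactive Scala shell:
spark-shell

REM Run the bundled SparkPi example on two local cores:
spark-submit --class org.apache.spark.examples.SparkPi --master local[2] ..\examples\jars\spark-examples_2.11-2.1.0.jar 10
```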

Hopefully everything works. If you see error messages related to HiveSessionState, try executing the command as administrator to avoid them.

Reference website:

https://hernandezpaul.wordpress.com/2016/01/24/apache-spark-installation-on-windows-10/