[Documentation]: Get Started Experience on Ubuntu #13

Closed
rapoth opened this issue Apr 24, 2019 · 3 comments

rapoth commented Apr 24, 2019

At the moment, the Get Started section in README.md explains how users can get started on a Windows machine using .NET Core. We need a similar set of instructions for Ubuntu.

Success Criteria
User has clear instructions on:

  • Setting up all the prerequisites
  • Creating a new .NET Core project
  • Obtaining the NuGet package on Ubuntu
  • Compiling the .NET app on Ubuntu
  • Running the app using spark-submit

lqdev commented Apr 25, 2019

Getting Started with Spark.NET on Ubuntu

These instructions help you get started with Spark on Ubuntu 18.04.

Download and Install Prerequisites

Install Java

Navigate to the following link and download jdk-8u211-linux-x64.tar.gz.

Then, extract the contents of the tar.gz archive:

tar -xvzf jdk-8u211-linux-x64.tar.gz

Add Java 1.8 to the list of Java versions on your system:

sudo update-alternatives --install "/usr/bin/java" "java" "/home/$USER/jdk1.8.0_211/bin/java" 1500
sudo update-alternatives --install "/usr/bin/javac" "javac" "/home/$USER/jdk1.8.0_211/bin/javac" 1500
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/home/$USER/jdk1.8.0_211/bin/javaws" 1500

Follow the prompts and select 1.8 as the version (repeat with javac and javaws if needed):

sudo update-alternatives --config java
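To confirm that the selection took effect, the active Java version can be checked from the terminal. The snippet below is a minimal sketch, not part of the original instructions: `is_java8` is a hypothetical helper name, and it assumes the version banner contains `version "1.8.` as Oracle JDK 8 and OpenJDK 8 builds print it.

```shell
# Hypothetical helper: succeed only if a `java -version` banner reports 1.8.
is_java8() {
    case "$1" in
        *'version "1.8.'*) return 0 ;;
        *) return 1 ;;
    esac
}

if command -v java >/dev/null 2>&1; then
    # java prints its version banner to stderr, so redirect it to stdout
    banner=$(java -version 2>&1 | head -n 1)
    if is_java8 "$banner"; then
        echo "Java 8 is active: $banner"
    else
        echo "A different Java is active: $banner"
    fi
else
    echo "java is not on PATH yet"
fi
```

If a different version is reported, rerunning the update-alternatives selection step is the likely fix.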

Install Maven

Enter the following commands into the terminal:

mkdir -p ~/bin/maven
cd ~/bin/maven
wget https://www-us.apache.org/dist/maven/maven-3/3.6.0/binaries/apache-maven-3.6.0-bin.tar.gz
tar -xvzf apache-maven-3.6.0-bin.tar.gz
ln -s apache-maven-3.6.0 current

Install Spark

Download Spark (version 2.4.2 in this example):

wget http://apache.mirrors.ionfish.org/spark/spark-2.4.2/spark-2.4.2-bin-hadoop2.7.tgz

Extract the contents of the archive:

tar -xvzf spark-2.4.2-bin-hadoop2.7.tgz

Install .NET Core SDK

Set up the repositories:

wget -q https://packages.microsoft.com/config/ubuntu/18.04/packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb

Install the SDK:

sudo add-apt-repository universe
sudo apt-get install apt-transport-https
sudo apt-get update
sudo apt-get install dotnet-sdk-2.1

Install Microsoft.Spark.Worker

wget https://github.com/dotnet/spark/releases/download/v0.1.0/Microsoft.Spark.Worker.netcoreapp2.1.linux-x64-0.1.0.tar.gz

Extract the contents of the archive:

tar -xvzf Microsoft.Spark.Worker.netcoreapp2.1.linux-x64-0.1.0.tar.gz

Set Environment Variables

Set up environment variables with the following commands:

echo "export JAVA_HOME=~/jdk1.8.0_211" >> ~/.bashrc
echo "export SPARK_HOME=~/spark-2.4.2-bin-hadoop2.7" >> ~/.bashrc
echo "export DotnetWorkerPath=~/Microsoft.Spark.Worker-0.1.0" >> ~/.bashrc
echo "export M2_HOME=~/bin/maven/current" >> ~/.bashrc
echo 'export PATH=$M2_HOME/bin:$SPARK_HOME/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
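Before moving on, it can help to check that each variable is set and points at a directory that exists. The following is a rough sketch under stated assumptions: `check_spark_env` is a hypothetical helper name, not something shipped with Spark or .NET, and it only checks the four variables named in this step.

```shell
# Hypothetical helper: report whether each variable from this step
# is set and points at an existing directory.
check_spark_env() {
    status=0
    for name in JAVA_HOME SPARK_HOME DotnetWorkerPath M2_HOME; do
        # Look up the value of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "MISSING  $name is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "BAD      $name=$value (no such directory)"
            status=1
        else
            echo "OK       $name=$value"
        fi
    done
    return "$status"
}

check_spark_env || echo "Fix the entries above, then run: source ~/.bashrc"
```

The function returns a non-zero status if any entry is missing or wrong, so it can also gate later steps in a script.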

Create Console Application

dotnet new console -o HelloSpark && cd HelloSpark

Install NuGet Package

dotnet add package Microsoft.Spark

Write The Program

Replace the contents of the Program.cs file with the following code:

using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

namespace HelloSpark
{
    class Program
    {
        static void Main(string[] args)
        {
            var spark = SparkSession.Builder().GetOrCreate();
            var df = spark.Read().Json("people.json");
            df.Show();
        }
    }
}

Add the following content inside the <Project> element of your HelloSpark.csproj file:

<ItemGroup>
    <Content Include="people.json">
        <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </Content>
</ItemGroup>

Create Data

Inside the HelloSpark directory, enter the following command:

cat << EOF > people.json
{"name":"Michael"} 
{"name":"Andy", "age":30} 
{"name":"Justin", "age":19} 
EOF

Build And Publish The Application

Build and publish the application with the following command:

dotnet publish -f netcoreapp2.1 -r linux-x64 ./HelloSpark.csproj

Run The Application

From the HelloSpark directory, enter the following command to run the application:

spark-submit \
--class org.apache.spark.deploy.DotnetRunner \
--master local \
./bin/Debug/netcoreapp2.1/linux-x64/publish/microsoft-spark-2.4.x-0.1.0.jar \
./bin/Debug/netcoreapp2.1/linux-x64/publish/HelloSpark
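If spark-submit cannot find the jar or the executable, a quick pre-flight check of the publish directory may save a round trip. This is only a sketch: `find_spark_jar` is a hypothetical helper, and the publish path is assumed to match the one used in the command above.

```shell
# Hypothetical helper: print the path of the first microsoft-spark jar
# found in the given publish directory, or fail with a message on stderr.
find_spark_jar() {
    for jar in "$1"/microsoft-spark-*.jar; do
        if [ -f "$jar" ]; then
            echo "$jar"
            return 0
        fi
    done
    echo "no microsoft-spark-*.jar in $1" >&2
    return 1
}

# Assumed to match the publish path used in the spark-submit command
publish_dir=./bin/Debug/netcoreapp2.1/linux-x64/publish
if jar=$(find_spark_jar "$publish_dir") && [ -f "$publish_dir/HelloSpark" ]; then
    echo "Ready to submit: $jar"
else
    echo "Publish output incomplete; rerun dotnet publish"
fi
```

Matching on `microsoft-spark-*.jar` rather than a fixed file name keeps the check usable if the package version differs from 0.1.0.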

lqdev commented Apr 25, 2019

Added instructions as part of PR #50

@imback82

Closed by #50
