# Hail on Azure using HDInsight
- Create an Azure Managed Identity
- Create an Azure Data Lake Storage Gen2 instance
- Assign the Managed Identity the "Storage Blob Data Owner" role on the storage instance
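The first three steps can also be scripted with the Azure CLI. A hedged sketch, assuming placeholder names (`my-hail-rg`, `hailidentity`, `mystorageacct`, `eastus`) that you would replace with your own:

```shell
# Sketch only: resource group, names, and region below are assumptions.
RG=my-hail-rg
az identity create --resource-group "$RG" --name hailidentity
# --hns enables the hierarchical namespace, i.e. Data Lake Storage Gen2.
az storage account create --resource-group "$RG" --name mystorageacct \
    --location eastus --kind StorageV2 --hns true
PRINCIPAL_ID=$(az identity show --resource-group "$RG" --name hailidentity \
    --query principalId --output tsv)
SCOPE=$(az storage account show --resource-group "$RG" --name mystorageacct \
    --query id --output tsv)
az role assignment create --assignee "$PRINCIPAL_ID" \
    --role "Storage Blob Data Owner" --scope "$SCOPE"
```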
- Create an HDInsight cluster of type "Spark" with Spark version 2.4
- Add a Script Action on the configuration tab with a link to a copy of `hail-install.sh`, applied to both Head and Worker nodes
- Wait for the cluster to reach a "Running" status
- Open the Apache Ambari Dashboard on the HDInsight Cluster
- Go to the `Spark2` page by clicking it on the left panel
- Open the Configs tab and expand the `Advanced livy2-env` section
- Append the following to the bottom of the configuration content box:

      export PYSPARK_PYTHON=/usr/bin/anaconda/envs/hail/bin/python3.7
      export PYSPARK3_PYTHON=/usr/bin/anaconda/envs/hail/bin/python3.7
      export PYSPARK_DRIVER_PYTHON=/usr/bin/anaconda/envs/hail/bin/python3.7
- Expand the `Advanced spark2-env` section
- Replace the last line exporting `PYSPARK_PYTHON` with the following line:

      export PYSPARK_PYTHON=${PYSPARK_PYTHON:-/usr/bin/anaconda/envs/hail/bin/python3.7}
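The `${PYSPARK_PYTHON:-...}` form uses shell default-value expansion: if the variable is already set (for example by the Livy configuration above), that value wins; otherwise the fallback after `:-` is used. A quick illustration with a stand-in variable:

```shell
# ${VAR:-default} substitutes the default only when VAR is unset or empty.
unset DEMO_PYTHON
first=${DEMO_PYTHON:-/usr/bin/anaconda/envs/hail/bin/python3.7}
DEMO_PYTHON=/usr/local/bin/python3
second=${DEMO_PYTHON:-/usr/bin/anaconda/envs/hail/bin/python3.7}
echo "$first"    # the fallback path, since DEMO_PYTHON was unset
echo "$second"   # the pre-set value, since DEMO_PYTHON was set
```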
- Append the following to the end of the line starting with `export SPARK_DIST_CLASSPATH=...`:

      /usr/bin/anaconda/envs/hail/lib/python3.7/site-packages/hail/hail-all-spark.jar:
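Entries in `SPARK_DIST_CLASSPATH` are colon-separated, so the appended jar simply becomes one more classpath entry. A stand-alone illustration (the starting value here is an assumption for demonstration):

```shell
# Assumed existing value, for illustration only.
SPARK_DIST_CLASSPATH='/usr/hdp/current/hadoop-client/*'
# Appending the Hail jar as an additional colon-separated entry.
SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/bin/anaconda/envs/hail/lib/python3.7/site-packages/hail/hail-all-spark.jar"
echo "$SPARK_DIST_CLASSPATH"
```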
- Expand the `Custom spark2-defaults` section
- Click `Add Property ...` at the bottom of the list
- Input `spark.kryo.registrator` for the `key` property
- Input `is.hail.kryo.HailKryoRegistrator` for the `value` property
- Select `Text` for the property type
- Click `Add` to append the property to the end of the list
- Repeat the previous five steps (from `Add Property ...` through `Add`) for each of the following key/value pairs:
| Key | Value |
|---|---|
| `spark.jars` | `/usr/bin/anaconda/envs/hail/lib/python3.7/site-packages/hail/backend/hail-all-spark.jar` |
| `spark.driver.extraClassPath` | `/usr/bin/anaconda/envs/hail/lib/python3.7/site-packages/hail/backend/hail-all-spark.jar` |
| `spark.yarn.dist.jars` | `/usr/bin/anaconda/envs/hail/lib/python3.7/site-packages/hail/backend/hail-all-spark.jar` |
| `spark.executor.extraClassPath` | `./hail-all-spark.jar` |
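For reference, the same properties could be passed on the command line when starting an ad-hoc Spark session by hand; this is a sketch using the paths from the table above, not part of the Ambari setup itself:

```shell
# Sketch: equivalent --conf flags for a manually launched pyspark session.
HAIL_JAR=/usr/bin/anaconda/envs/hail/lib/python3.7/site-packages/hail/backend/hail-all-spark.jar
pyspark \
  --jars "$HAIL_JAR" \
  --conf spark.driver.extraClassPath="$HAIL_JAR" \
  --conf spark.yarn.dist.jars="$HAIL_JAR" \
  --conf spark.executor.extraClassPath=./hail-all-spark.jar \
  --conf spark.kryo.registrator=is.hail.kryo.HailKryoRegistrator
```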
- Save the changes and restart the affected services. These changes require a restart of the Spark2 service; the Ambari UI will show a restart reminder. Click Restart to restart all affected services.
- To add the new virtual environment to Jupyter notebooks, go to the Azure dashboard for your HDInsight cluster
- Open the Script Actions tab
- Add a new custom Script Action, applied only to the Head nodes, that points the Jupyter notebook instance at the new hail environment, with a link to a copy of `set-jupyter-to-hail.sh`
- Wait for the script to complete
- You have now successfully configured the cluster to run Hail
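As a final check, you can confirm from an SSH session on a head node that the hail environment's interpreter can import the package; a minimal sketch, assuming the install path used throughout this guide:

```shell
# Should print the installed Hail version if the install Script Action succeeded.
/usr/bin/anaconda/envs/hail/bin/python3.7 -c 'import hail as hl; print(hl.version())'
```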