Correct process to download and install libraries offline #6522

@grajee-everest

What is the correct way to download the sparknlp library for offline installation/deployment?

We have a SQL Server Big Data Cluster based on Kubernetes. Unlike Databricks, it has no library management feature, so we have to download the libraries manually, copy them to an HDFS location, and include them in pyspark using sc.addFile.
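For illustration, the manual workflow described above could be sketched roughly as follows. This is only a sketch under assumptions: the helper names `zip_target_dir` and `attach_to_spark` and the HDFS URI are hypothetical, and it uses `sc.addPyFile` instead of `sc.addFile`, because `addPyFile` both ships the archive and puts it on the Python path of the executors.

```python
import zipfile
from pathlib import Path
from typing import List


def zip_target_dir(target_dir: str, zip_path: str) -> List[str]:
    """Zip the contents of a `pip install -t <target_dir>` folder.

    Package folders are stored at the archive root, which is what
    Python's zip import mechanism expects. (Hypothetical helper.)
    """
    root = Path(target_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(root).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names


def attach_to_spark(sc, zip_uri: str) -> None:
    """Distribute the zip and add it to the Python path of all tasks.

    `sc` is an existing SparkContext; `zip_uri` may be a local path or
    an HDFS URI such as "hdfs:///libs/spark-nlp.zip" (hypothetical).
    """
    sc.addPyFile(zip_uri)
```

After copying the zip to HDFS, a call such as `attach_to_spark(sc, "hdfs:///libs/spark-nlp.zip")` would make `import sparknlp` resolvable. Note that the pip package appears to contain only the Python wrapper, so the matching Spark NLP JAR would most likely still have to be supplied separately, for example through `spark.jars`.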

So, the command I used to download sparknlp is "pip install spark-nlp -t spark-nlp". Since spark-nlp is not working, I also tried downloading the files from the link and untarring them. I compared the folders produced by the two methods, and they seem to be different.
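To make "they seem to be different" concrete, the two folders could be compared with the standard library's `filecmp` module. This is a minimal sketch, not part of the original report; the function name `diff_dirs` is hypothetical, and the comparison is the module's default shallow one (file type, size, and modification time are checked before contents).

```python
import filecmp
from typing import Dict, List


def diff_dirs(left: str, right: str) -> Dict[str, List[str]]:
    """Recursively list entries that exist on one side only or differ."""
    result: Dict[str, List[str]] = {
        "left_only": [],
        "right_only": [],
        "different": [],
    }

    def walk(cmp: filecmp.dircmp, prefix: str) -> None:
        # Record differences at this level, then descend into the
        # subdirectories that exist on both sides.
        result["left_only"] += [prefix + n for n in cmp.left_only]
        result["right_only"] += [prefix + n for n in cmp.right_only]
        result["different"] += [prefix + n for n in cmp.diff_files]
        for name, sub in cmp.subdirs.items():
            walk(sub, prefix + name + "/")

    walk(filecmp.dircmp(left, right), "")
    return {key: sorted(values) for key, values in result.items()}
```

Calling `diff_dirs("spark-nlp", "<untarred folder>")` would then show which files or folders one download method produces and the other does not.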

Nevertheless, I tried both methods, and neither works in our environment. Please refer to issue #6506.

I just want to know which is the right way to download libraries for offline use. This is not an issue in Databricks because of its robust library management UI.

Environment:
SQL Server 2019 BDC

All the environment details are here:

Microsoft Spark Runtime 2021.1
Spark: 3.1.2
Delta Lake: 1.0.0
Java: Azul Zulu JRE 1.8.0_275
Scala: 2.12
Python: 3.8 (miniforge 4.9)
R: Microsoft R 3.5.2
Spark SQL Connector: 1.2.0
