use in databricks #236
Hi, thanks for the interesting question. Ideally, it should work like this:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.format('cobol').options(copybook='/path/to/copybook.cob').load('/path/to/data')

The only thing I'm not sure about is how to provide the dependency to spark-cobol. I will take a look at how it can be done on a local Spark instance. Hopefully, setting this up in Databricks is similar.
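For a local Spark instance, one way to supply that dependency is spark.jars.packages (a minimal sketch; the Maven coordinate is the one given later in this thread, and the Scala suffix must match your Spark build):

from pyspark.sql import SparkSession

# Fetch spark-cobol and its transitive dependencies from Maven Central
# at session startup. Adjust the version and the _2.12 suffix as needed.
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "za.co.absa.cobrix:spark-cobol_2.12:2.5.1")
    .getOrCreate()
)

df = (
    spark.read.format("cobol")
    .options(copybook="/path/to/copybook.cob")
    .load("/path/to/data")
)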
This is exactly what I'm doing. I have had no problems with pyspark.
All I did to use pyspark was to add the correct jars (spark-cobol, cobol-parser, and scodec) to my Spark jars. After that, loading as @yruslan suggested should work fine.
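If you download the jars manually, a sketch of pointing a PySpark session at them (paths and version numbers are illustrative; use the jars you actually fetched from Maven):

from pyspark.sql import SparkSession

# Comma-separated list of local jar paths; file names are illustrative.
jars = ",".join([
    "/opt/jars/spark-cobol_2.12-2.5.1.jar",
    "/opt/jars/cobol-parser_2.12-2.5.1.jar",
    "/opt/jars/scodec-core_2.12-1.11.9.jar",
])

spark = SparkSession.builder.config("spark.jars", jars).getOrCreate()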
@tr11 As I'm new to Databricks, can you please guide me on how to add these jars? Where can I get them, and what are the steps for adding them in Databricks? Regards,
I don't use Databricks so I can't try it, but this seems promising: https://docs.databricks.com/libraries.html#upload-a-jar-python-egg-or-python-wheel
You can get the Spark schema if you have a DataFrame, the same way as in Scala:

df.schema or df.schema.treeString or df.printSchema

You can get the COBOL schema as an AST like this:

val copybook = CopybookParser.parseTree(copyBookContents)
copybook.generateRecordLayoutPositions
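The same inspection works from PySpark once the DataFrame is loaded (a minimal sketch; df is assumed to come from the cobol reader shown above):

# Inspect the Spark schema that Cobrix derived from the copybook.
df.printSchema()              # pretty-printed schema tree
schema = df.schema            # StructType object
print(schema.simpleString())  # compact single-line rendering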
For Databricks it's much simpler. All you need to do is open your cluster's Libraries option and install Cobrix, either by passing the Maven coordinate za.co.absa.cobrix:spark-cobol_2.12:2.5.1 or by downloading the jar from Maven and uploading it to the cluster.
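For scripted installs, the same Maven coordinate can be sent to the Databricks Libraries REST API (a sketch; the workspace URL, token, and cluster id are placeholders):

import requests

# POST the coordinate to the Libraries API 2.0 install endpoint.
payload = {
    "cluster_id": "0123-456789-abcdef",
    "libraries": [
        {"maven": {"coordinates": "za.co.absa.cobrix:spark-cobol_2.12:2.5.1"}}
    ],
}
requests.post(
    "https://<workspace>.azuredatabricks.net/api/2.0/libraries/install",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json=payload,
)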
Hi,
This is a question, not an issue.
I need to process a mainframe file in Azure Databricks which contains some COMP-3 (packed-decimal) values. I have the COBOL copybook for the layout of the schema.
I could not find any reference on how to use this in Databricks with PySpark (Python 3).
Can you please provide sample code on how to integrate/use Cobrix in Azure Databricks?
A prompt reply would be appreciated.
Regards,
Divya