
use in databricks #236

Open
DivSaru opened this issue Jan 16, 2020 · 8 comments
Labels
question Further information is requested

Comments

@DivSaru

DivSaru commented Jan 16, 2020

Hi,

It's a question, not an issue.

I need to process a mainframe file in Azure Databricks; the file contains some COMP-3 (packed decimal) values as well. I have the copybook in COBOL describing the layout of the schema.

I could not find any reference on how to use this in Databricks with PySpark (Python 3).
Could you please provide sample code showing how to integrate/use Cobrix in Azure Databricks?

A prompt reply would be appreciated.
Regards,
Divya
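(Background: COMP-3 is IBM packed decimal, which stores two BCD digits per byte with the sign in the final nibble, typically 0xC for positive, 0xD for negative, 0xF for unsigned. Cobrix decodes these fields automatically; the sketch below is a minimal pure-Python decoder purely to illustrate the format, not part of the library's API.)

```python
def decode_comp3(data: bytes, scale: int = 0):
    """Decode an IBM packed-decimal (COMP-3) field.

    Each byte holds two BCD digits, except the last byte, whose low
    nibble is the sign (0xD means negative; anything else positive).
    `scale` is the number of implied decimal places.
    """
    digits = []
    sign = 1
    for i, b in enumerate(data):
        digits.append(b >> 4)          # high nibble is always a digit
        if i == len(data) - 1:
            sign = -1 if (b & 0x0F) == 0x0D else 1  # trailing sign nibble
        else:
            digits.append(b & 0x0F)    # low nibble is a digit, except at the end
    value = 0
    for d in digits:
        value = value * 10 + d
    value *= sign
    return value / (10 ** scale) if scale else value

# Example: 0x12 0x3C encodes +123; 0x12 0x3D encodes -123.
print(decode_comp3(b"\x12\x3C"))             # 123
print(decode_comp3(b"\x01\x23\x4C", scale=2))  # 12.34
```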

@yruslan yruslan added the question Further information is requested label Jan 16, 2020
@yruslan
Collaborator

yruslan commented Jan 16, 2020

Hi, thanks for the interesting question.

Ideally, it should work like this:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

df = spark.read.format('cobol').options(copybook='/path/to/copybook.cob').load('/path/to/data')

The only thing I'm not sure about is how to provide the spark-cobol dependency. I'll take a look at how it can be done on a local Spark instance. Hopefully, setting this up in Databricks is similar.
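One way to provide the dependency is to have Spark resolve it from Maven via `spark.jars.packages` at session startup. A configuration sketch, untested on Databricks; the Maven coordinates are the ones mentioned later in this thread, and the copybook/data paths are placeholders:

```python
from pyspark.sql import SparkSession

# Resolve spark-cobol (and its transitive dependencies) from Maven Central.
# Coordinates as given elsewhere in this thread; pick the version matching
# your Spark/Scala build.
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "za.co.absa.cobrix:spark-cobol_2.12:2.5.1")
    .getOrCreate()
)

# Placeholder paths: point these at your copybook and data files.
df = (
    spark.read.format("cobol")
    .option("copybook", "/path/to/copybook.cob")
    .load("/path/to/data")
)
df.printSchema()
```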

@tr11
Collaborator

tr11 commented Jan 16, 2020 via email

@tr11
Collaborator

tr11 commented Jan 16, 2020

The only thing I'm not sure about is how to provide the spark-cobol dependency. I'll take a look at how it can be done on a local Spark instance. Hopefully, setting this up in Databricks is similar.

All I did to use pyspark was to add the correct jars (spark-cobol, cobol-parser, and scodec) to my Spark jars. After that, loading as @yruslan suggested should work fine.
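Instead of copying the jars into the Spark installation, they can also be passed to the session via `spark.jars`. A sketch of that alternative; the jar file names and paths below are placeholders for wherever you downloaded them:

```python
from pyspark.sql import SparkSession

# Placeholder paths: point these at the downloaded spark-cobol,
# cobol-parser, and scodec jars mentioned above.
jars = ",".join([
    "/path/to/spark-cobol.jar",
    "/path/to/cobol-parser.jar",
    "/path/to/scodec-core.jar",
])

spark = (
    SparkSession.builder
    .config("spark.jars", jars)
    .getOrCreate()
)
```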

@DivSaru
Author

DivSaru commented Jan 17, 2020

@tr11 As I'm new to Databricks, could you please guide me on how to add these jars? Where can I get them, and what are the steps for adding them in Databricks?
I'd really appreciate your help and guidance with this.

Regards,
Divya

@tr11
Collaborator

tr11 commented Jan 17, 2020

I don't use Databricks so I can't try it, but this seems promising:

https://docs.databricks.com/libraries.html#upload-a-jar-python-egg-or-python-wheel

@poornimavithanage

Hi,
I need to read a schema from a COBOL copybook using Python in AWS. Are there any suggestions?

@yruslan
Collaborator

yruslan commented Dec 1, 2021

You can get the Spark schema from a DataFrame the same way as in Scala:

df.schema

or

df.schema.treeString

or

df.printSchema

You can get COBOL schema as an AST like this:

val copybook = CopybookParser.parseTree(copyBookContents)
copybook.generateRecordLayoutPositions

@psb2509

psb2509 commented Sep 24, 2022

For Databricks it's much simpler. All you need to do is open your cluster's library installation option and install Cobrix, either by supplying the Maven coordinates za.co.absa.cobrix:spark-cobol_2.12:2.5.1, or by downloading the jar from Maven and uploading it to the cluster.
