
use in databricks #236

Open
DivSaru opened this issue Jan 16, 2020 · 8 comments
Labels
question Further information is requested

Comments

@DivSaru

DivSaru commented Jan 16, 2020

Hi,

It's a question, not an issue.

I need to process a mainframe file in Azure Databricks; the file contains some COMP-3 (packed decimal) values as well. I have the copybook in COBOL describing the layout of the schema.

I could not find any reference on how to use this in Databricks with PySpark (Python 3).
Could you please provide sample code showing how to integrate/use Cobrix in Azure Databricks?

A prompt reply would be appreciated.
Regards,
Divya
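(Background: COMP-3 is IBM packed decimal, which stores two BCD digits per byte with the sign in the final nibble, typically 0xC for positive, 0xD for negative, 0xF for unsigned. Cobrix decodes these fields automatically; the sketch below is a minimal pure-Python decoder purely to illustrate the format, not part of the library's API.)

```python
def decode_comp3(data: bytes, scale: int = 0):
    """Decode an IBM packed-decimal (COMP-3) field.

    Each byte holds two BCD digits, except the last byte, whose low
    nibble is the sign (0xD means negative; anything else positive).
    `scale` is the number of implied decimal places.
    """
    digits = []
    sign = 1
    for i, b in enumerate(data):
        digits.append(b >> 4)          # high nibble is always a digit
        if i == len(data) - 1:
            sign = -1 if (b & 0x0F) == 0x0D else 1  # trailing sign nibble
        else:
            digits.append(b & 0x0F)    # low nibble is a digit, except at the end
    value = 0
    for d in digits:
        value = value * 10 + d
    value *= sign
    return value / (10 ** scale) if scale else value

# Example: 0x12 0x3C encodes +123; 0x12 0x3D encodes -123.
print(decode_comp3(b"\x12\x3C"))             # 123
print(decode_comp3(b"\x01\x23\x4C", scale=2))  # 12.34
```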

@yruslan yruslan added the question Further information is requested label Jan 16, 2020
@yruslan
Collaborator

yruslan commented Jan 16, 2020

Hi, thanks for the interesting question.

Ideally, it should work like this:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

df = spark.read.format('cobol').options(copybook='/path/to/copybook.cob').load('/path/to/data')

The only thing I'm not sure about is how to provide the spark-cobol dependency. I'll take a look at how it can be done on a local Spark instance. Hopefully, setting this up in Databricks is similar.
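One way to provide the dependency is to have Spark resolve it from Maven via `spark.jars.packages` at session startup. A configuration sketch, untested on Databricks; the Maven coordinates are the ones mentioned later in this thread, and the copybook/data paths are placeholders:

```python
from pyspark.sql import SparkSession

# Resolve spark-cobol (and its transitive dependencies) from Maven Central.
# Coordinates as given elsewhere in this thread; pick the version matching
# your Spark/Scala build.
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "za.co.absa.cobrix:spark-cobol_2.12:2.5.1")
    .getOrCreate()
)

# Placeholder paths: point these at your copybook and data files.
df = (
    spark.read.format("cobol")
    .option("copybook", "/path/to/copybook.cob")
    .load("/path/to/data")
)
df.printSchema()
```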

@tr11
Collaborator

tr11 commented Jan 16, 2020 via email

@tr11
Collaborator

tr11 commented Jan 16, 2020

The only thing I'm not sure about is how to provide the spark-cobol dependency. I'll take a look at how it can be done on a local Spark instance. Hopefully, setting this up in Databricks is similar.

All I did to use pyspark was to add the correct jars (spark-cobol, cobol-parser, and scodec) to my Spark jars. After that, loading as @yruslan suggested should work fine.
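Instead of copying the jars into the Spark installation, they can also be passed to the session via `spark.jars`. A sketch of that alternative; the jar file names and paths below are placeholders for wherever you downloaded them:

```python
from pyspark.sql import SparkSession

# Placeholder paths: point these at the downloaded spark-cobol,
# cobol-parser, and scodec jars mentioned above.
jars = ",".join([
    "/path/to/spark-cobol.jar",
    "/path/to/cobol-parser.jar",
    "/path/to/scodec-core.jar",
])

spark = (
    SparkSession.builder
    .config("spark.jars", jars)
    .getOrCreate()
)
```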

@DivSaru
Author

DivSaru commented Jan 17, 2020

@tr11 As I'm new to Databricks, could you please guide me on how to add these jars? Where can I get them, and what are the steps for adding them in Databricks?
I'd really appreciate your help and guidance with this.

Regards,
Divya

@tr11
Collaborator

tr11 commented Jan 17, 2020

I don't use Databricks so I can't try it, but this seems promising:

https://docs.databricks.com/libraries.html#upload-a-jar-python-egg-or-python-wheel

@poornimavithanage

Hi,
I need to read a schema from a COBOL copybook using Python in AWS. Are there any suggestions?

@yruslan
Collaborator

yruslan commented Dec 1, 2021

You can get the Spark schema from a DataFrame the same way as in Scala:

df.schema

or

df.schema.treeString

or

df.printSchema

You can get COBOL schema as an AST like this:

val copybook = CopybookParser.parseTree(copyBookContents)
copybook.generateRecordLayoutPositions

@psb2509

psb2509 commented Sep 24, 2022

For Databricks it's much simpler. All you need to do is open your cluster's library installation option and install Cobrix, either by supplying the Maven coordinates za.co.absa.cobrix:spark-cobol_2.12:2.5.1, or by downloading the jar from Maven and uploading it to the cluster.
