AshHimself/etl-pipeline-from-pdf

ETL pipeline from a PDF document and a simple data quality test

A quick example of how to ingest a PDF document containing a table into a pandas DataFrame and then run some assertions on it using the Great Expectations package.
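As a rough illustration of the flow described above, here is a minimal sketch of the "load into a DataFrame, then assert on it" step. It makes several assumptions: the sample rows, column names, and the helper functions `rows_to_dataframe` and `check_quality` are hypothetical and not taken from this repository. The PDF extraction step is left out, and plain pandas checks stand in for Great Expectations expectations, so treat it as a shape of the idea rather than the repository's actual code.

```python
import pandas as pd

# Hypothetical rows, shaped like what a PDF table extractor might return:
# the first row is the header and every cell is a string.
RAW_ROWS = [
    ["id", "name", "amount"],
    ["1", "Alice", "10.50"],
    ["2", "Bob", "7.25"],
    ["3", "Carol", "12.00"],
]


def rows_to_dataframe(rows):
    """Build a typed DataFrame from header-first rows of strings."""
    header, *body = rows
    df = pd.DataFrame(body, columns=header)
    # Extracted cells arrive as text, so cast the numeric columns explicitly.
    df["id"] = df["id"].astype(int)
    df["amount"] = df["amount"].astype(float)
    return df


def check_quality(df):
    """Return the names of failed checks; an empty list means all passed."""
    # Plain pandas stand-ins for data quality expectations (assumption:
    # these particular rules are illustrative, not the repository's own).
    checks = {
        "id_not_null": df["id"].notna().all(),
        "id_unique": df["id"].is_unique,
        "amount_non_negative": (df["amount"] >= 0).all(),
    }
    return [name for name, passed in checks.items() if not passed]


df = rows_to_dataframe(RAW_ROWS)
failures = check_quality(df)
assert not failures, f"Data quality checks failed: {failures}"
```

Returning the list of failed check names, rather than raising on the first failure, makes it easier to report every problem in a batch before deciding whether to continue to the upload step.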

What we built

About

How to scrape a table from a PDF in Python, unit test the data for quality, and then upload it to S3.
