Pdf actions #413

Merged
sazid merged 4 commits into dev from pdf_actions on Mar 13, 2024

@zahin178 zahin178 commented Mar 4, 2024

PR Type

Feature

PR Checklist

  • Tests for the changes have been added / updated.
  • Documentation comments have been added / updated.
  • A changelog entry has been made.
  • Version number has been updated.
  • Required modules have been added to respective "requirements*.txt" files.
  • Relevant Test Cases added to this description (below).
  • (Team) Label with affected action categories and semver status.

Overview

Added two new utility actions and one helper function for extracting data from PDF files:

  1. extract_text_pdf()
  2. extract_text_by_page() (helper used by extract_text_pdf())
  3. extract_table_pdf()

Modified the following files

Framework\Built_In_Automation\Built_In_Utility\CrossPlatform\BuiltInUtilityFunction.py

Declared the two new action names in this file: "extract text pdf" and "extract table pdf".

Framework\Utilities\decorators.py

Added a condition so that the "result" variable is checked against the fail message only when it is a string.
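
A minimal sketch of the kind of guard described above, not the actual code in decorators.py. The decorator name track_failures, the FAILED_TAGS tuple and the failures list are hypothetical; only the "zeuz_failed" value and the string check come from this description. One plausible motivation is that a table-extraction action may return a non-string object, for which a direct comparison with the fail message may misbehave.

```python
import functools

# Assumption: failure is signalled by this string (taken from the PR text).
FAILED_TAGS = ("zeuz_failed",)


def track_failures(func):
    """Hypothetical decorator: record a failure only for string results."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # Only inspect the fail message when the result is a string;
        # lists, tables, etc. pass through untouched.
        if isinstance(result, str) and result in FAILED_TAGS:
            wrapper.failures.append(func.__name__)
        return result

    wrapper.failures = []
    return wrapper


@track_failures
def returns_table():
    return [["a", "b"], ["1", "2"]]  # stand-in for extracted tabular data


@track_failures
def returns_failure():
    return "zeuz_failed"
```

With this guard, returns_table() hands its list back without ever being compared to the fail message, while returns_failure() is still recorded as a failure.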

Framework\Built_In_Automation\Built_In_Utility\CrossPlatform\BuiltInUtilityFunction.py

Implemented the functions here

def extract_text_by_page(pdf_path, text, pgn=None):

  • Imports the required libraries.
  • Opens the PDF file.
  • Loads the text of either all pages or a single page, depending on the pgn variable.
  • Runs a regular expression search on the loaded text.
  • Returns the result if the text is matched; otherwise returns "zeuz_failed".
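
The steps above could look roughly like the following sketch. It is not the merged implementation: the helper names load_text and search_text are hypothetical, pgn is assumed to be 1-based, and "the result" is assumed to mean the matched substring. The pdfminer.six import is done lazily so the search logic can be exercised without the library installed.

```python
import re

FAILED = "zeuz_failed"


def load_text(pdf_path, pgn=None):
    """Load text from all pages, or from one page when pgn is given."""
    # Lazy import: pdfminer.six is only needed when a PDF is actually read.
    from pdfminer.high_level import extract_text

    if pgn is None:
        return extract_text(pdf_path)
    # Assumption: pgn is 1-based; pdfminer's page_numbers is zero-indexed.
    return extract_text(pdf_path, page_numbers=[pgn - 1])


def search_text(loaded_text, pattern):
    """Return the first regex match in loaded_text, or the failure tag."""
    match = re.search(pattern, loaded_text)
    return match.group(0) if match else FAILED


def extract_text_by_page_sketch(pdf_path, text, pgn=None):
    """Sketch of the helper: load the text, then search it."""
    return search_text(load_text(pdf_path, pgn), text)
```

Keeping the search separate from the file loading makes the matching behaviour easy to test on plain strings.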

def extract_text_pdf(dataset):

  • Checks whether the pdfminer.six library is installed and installs it if it is missing.
  • Parses the dataset.
  • Calls extract_text_by_page() with the parsed arguments.
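
The install-if-missing step could be sketched as below. This is an illustration under assumptions, not the code from this PR: ensure_installed and pip_install are hypothetical names, and the installer is passed in as a parameter so the check can be tried without touching the network.

```python
import importlib.util
import subprocess
import sys


def pip_install(package):
    """Install a package into the current interpreter's environment."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])


def ensure_installed(module_name, package, installer=pip_install):
    """Install `package` if the top-level `module_name` cannot be found.

    Returns True when an install was triggered, False otherwise.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False
    installer(package)
    return True
```

Note that the import name and the package name differ for both libraries mentioned here: pdfminer.six is imported as pdfminer, and tabula-py as tabula, so a call would look like ensure_installed("pdfminer", "pdfminer.six").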

def extract_table_pdf(dataset):

  • Checks whether the tabula-py library is installed and installs it if it is missing.
  • Imports the required libraries.
  • Parses the dataset.
  • Extracts the tabular data according to the parsed parameters and returns it.
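
A rough sketch of the parse-then-extract flow follows. The dataset layout is an assumption: rows are taken to be (field, sub_field, value) tuples, and the field names "pdf path" and "pages" as well as the "all" default are hypothetical, since this description does not list the actual parameters. tabula.read_pdf is the real tabula-py entry point and needs a Java runtime; it is imported lazily here.

```python
def parse_table_dataset(dataset):
    """Collect hypothetical (field, sub_field, value) rows into parameters."""
    params = {"pdf_path": None, "pages": "all"}  # "all" default is an assumption
    for field, _sub_field, value in dataset:
        key = field.strip().lower()
        if key == "pdf path":
            params["pdf_path"] = value.strip()
        elif key == "pages":
            params["pages"] = value.strip()
    return params


def extract_tables_sketch(dataset):
    """Sketch: return the extracted tables, or the failure tag."""
    params = parse_table_dataset(dataset)
    if params["pdf_path"] is None:
        return "zeuz_failed"

    # Lazy import: tabula-py is only needed once a path is known.
    import tabula

    # read_pdf accepts pages as an int, a string such as "1-2,3" or "all",
    # or an iterable of ints.
    return tabula.read_pdf(params["pdf_path"], pages=params["pages"])
```

Because the return value here is not a string, this is the kind of result that the string check in decorators.py lets pass through.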

Test Cases

@zahin178 zahin178 requested a review from sazid March 4, 2024 06:49
@sazid sazid merged commit da7767d into dev Mar 13, 2024
@sazid sazid deleted the pdf_actions branch March 13, 2024 06:43
sazid commented Mar 13, 2024

@zahin178 please add the actions to controlserver immediately.
