This repository has been archived by the owner on Dec 28, 2018. It is now read-only.

chelmertz/PDFBox-php

# PDFBox-php

Parses text from PDF files using PDFBox, the Java utility from Apache.

## Usage

### Basic

This is a basic use case, found in `demos/demo.php`.

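```php
<?php

// Load the facade and the ExtractText command wrapper (paths relative to demos/).
require_once '..'.DIRECTORY_SEPARATOR.'PDFBox.php';
require_once '..'.DIRECTORY_SEPARATOR.'PDFBox'.DIRECTORY_SEPARATOR.'ExtractText.php';

// Path to the PDFBox .jar file.
$jar = "pdfbox-app-1.4.0.jar";
$pdf_box = new PDFBox($jar);
$extract_text = new PDFBox\ExtractText($pdf_box);

// Extract the text of regular.pdf.
$extract_text->parse('regular.pdf');
```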

Input:

  • Path to jar
  • Path to pdf

Output:

  • The text content of "regular.pdf", written to "regular.txt" in the same folder
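
Since the extracted text ends up in a .txt file next to the PDF, reading it back could look like the sketch below. `text_path_for` and `read_extracted_text` are hypothetical helpers, not part of PDFBox-php, and the path rule only assumes the naming described above ("regular.pdf" becomes "regular.txt").

```php
<?php

// Hypothetical helper (not part of PDFBox-php): swap a trailing ".pdf"
// (any letter case) for ".txt" to get the assumed output path.
function text_path_for($pdf_path)
{
    return preg_replace('/\.pdf$/i', '.txt', $pdf_path);
}

// Hypothetical helper: return the extracted text, or null when the
// expected .txt file is missing or unreadable.
function read_extracted_text($pdf_path)
{
    $txt_path = text_path_for($pdf_path);
    if (!is_readable($txt_path)) {
        return null;
    }
    return file_get_contents($txt_path);
}

// Intended to run after $extract_text->parse('regular.pdf');
$text = read_extracted_text('regular.pdf');
```

A `null` result here means either that parsing has not produced a file yet or that the output ended up somewhere other than the assumed path.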

### Full API

Read through the interface (the public methods) of `PDFBox\ExtractText`. There is a corresponding method for every option available in the .jar.
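
One way to read through the public methods without opening the source is PHP's reflection API. The sketch below is only an illustration: `public_method_names` is a hypothetical helper, and `ExampleFacade` with its methods is a placeholder class so the snippet runs on its own. It says nothing about which methods `PDFBox\ExtractText` actually has.

```php
<?php

// Hypothetical helper: list the public method names of a class, sorted.
function public_method_names($class_name)
{
    $reflection = new ReflectionClass($class_name);
    $names = array();
    foreach ($reflection->getMethods(ReflectionMethod::IS_PUBLIC) as $method) {
        $names[] = $method->getName();
    }
    sort($names);
    return $names;
}

// Placeholder class, used only to keep this sketch self-contained.
class ExampleFacade
{
    public function parse($path) {}
    public function describe() {}
    private function hidden() {}
}

print_r(public_method_names('ExampleFacade'));
```

With PDFBox.php and PDFBox/ExtractText.php loaded as in the basic example, passing `'PDFBox\ExtractText'` instead of `'ExampleFacade'` should list the real interface.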

## Requirements

  • Java
  • PDFBox: a single .jar file is all that is needed, and it must be executable
  • PHP 5.3 or later (for namespaces)
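
These requirements can be checked up front. The following is a minimal sketch, not part of PDFBox-php: `pdfbox_requirement_problems` is a hypothetical function, and it assumes that `java` is expected on the PATH and that PHP is allowed to call `exec`.

```php
<?php

// Hypothetical pre-flight check: returns a list of human-readable
// problems, or an empty array when everything looks fine.
function pdfbox_requirement_problems($jar_path)
{
    $problems = array();

    // Namespaces need PHP 5.3 or newer.
    if (version_compare(PHP_VERSION, '5.3.0', '<')) {
        $problems[] = 'PHP 5.3 or newer is required';
    }

    // The requirements list asks for a single, executable .jar file.
    if (!is_file($jar_path)) {
        $problems[] = 'jar not found: ' . $jar_path;
    } elseif (!is_executable($jar_path)) {
        $problems[] = 'jar is not executable: ' . $jar_path;
    }

    // "java -version" writes to stderr; only the exit status is used here.
    $output = array();
    $status = 1;
    exec('java -version 2>&1', $output, $status);
    if ($status !== 0) {
        $problems[] = 'java could not be run from the PATH';
    }

    return $problems;
}

foreach (pdfbox_requirement_problems('pdfbox-app-1.4.0.jar') as $problem) {
    echo $problem, PHP_EOL;
}
```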

## License

The bundled .jar is licensed under the Apache License, Version 2.0. The same goes for PDFBox-php:

Copyright 2011 Carl Helmertz

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
