TextFragmentAbsorber can't match phone number by pattern #55

Open
liurupeng821 opened this issue Dec 29, 2022 · 1 comment

Comments

@liurupeng821

The text in this PDF uses embedded fonts, so I can't match the phone number with a regex pattern.

https://lagou-zhaopin-fe.lagou.com/activities/20221229/1672295126482.pdf


public static final String PHONE_REG = "(?:(?:1[-\\s]*[3456789][-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1})|(?:0[1-9]\\d{1,2}[-\\s]*\\d{7,8}))(?!\\d)";
public static void main(String[] args) throws Exception {
    byte[] source = FileUtils.readFileToByteArray(new File("/1672295126482.pdf"));
    if (!getLicense()) {
        throw new Exception("com.aspose.pdf lic ERROR!");
    }
    try (ByteArrayInputStream searchInputStream = new ByteArrayInputStream(source)) {
        Document pdfDoc = new Document(searchInputStream);

        // true = treat the search phrase as a regular expression
        TextSearchOptions textSearchOptions = new TextSearchOptions(true);
        TextEditOptions textEditOptions = new TextEditOptions(0, TextEditOptions.LanguageTransformation.class);
        TextFragmentAbsorber phoneTextFragmentAbsorber = new TextFragmentAbsorber(
                PHONE_REG,
                textSearchOptions,
                textEditOptions);

        // Search the first page only (page indexes start at 1)
        PageCollection pages = pdfDoc.getPages();
        Page page = pages.get_Item(1);
        page.accept(phoneTextFragmentAbsorber);

        // Log every fragment the absorber matched
        for (TextFragment textFragment : phoneTextFragmentAbsorber.getTextFragments()) {
            String text = textFragment.getText();
            logger.info("phone: " + text);
        }

    } catch (Exception e) {
        e.printStackTrace();
    }
}
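
To separate a pattern problem from a text-extraction problem, it may help to run the same regex against plain strings first, without Aspose.PDF involved. The sketch below does only that, using `java.util.regex`. The class name `PhonePatternCheck`, the helper `findPhones`, and the sample strings are hypothetical and are not taken from the PDF in question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhonePatternCheck {
    // Same pattern string as PHONE_REG in the snippet above.
    static final String PHONE_REG = "(?:(?:1[-\\s]*[3456789][-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1}[-\\s]*\\d{1})|(?:0[1-9]\\d{1,2}[-\\s]*\\d{7,8}))(?!\\d)";

    // Returns every substring of the input that the pattern matches.
    static List<String> findPhones(String text) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = Pattern.compile(PHONE_REG).matcher(text);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample inputs, for illustration only.
        String[] samples = {
            "Tel: 13812345678",
            "138 1234 5678",
            "010-12345678",
            "12345"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + findPhones(sample));
        }
    }
}
```

If the pattern matches plain strings like these but the absorber still returns nothing, the text extracted from the PDF probably differs from what is displayed, which would be consistent with the embedded-font explanation above. This check only exercises the pattern under `java.util.regex`; it says nothing about how Aspose.PDF applies it internally.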
@asadalikhan90
Collaborator

@liurupeng821

We are unable to download the linked file here. Could you please create a post in our official support forum and attach the sample file? We will test the scenario in our environment and address it accordingly.
