any paper or algorithm description about text extraction? #665

Open
whqwill opened this issue Jan 8, 2019 · 4 comments

Comments


whqwill commented Jan 8, 2019

Is there a paper or algorithm description for the text extraction? I'd like to understand the theory behind it. Thanks.


Ask149 commented Jan 8, 2019

Hi @whqwill, could you please describe what you need in more detail? Thanks :)

Author

whqwill commented Jan 9, 2019

I mean how it selects the important parts of a page as the 'main text' and, if possible, how it compares with other methods. @Ask149

Contributor

bact commented Jan 22, 2019

Not exactly for this newspaper lib, but the slides at this link are a very useful overview of the problem:
Boilerplate Detection using Shallow Text Features
http://www.l3s.de/%7Ekohlschuetter/boilerplate/
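
To give a rough feel for what "shallow text features" means there, here is a minimal sketch, not the newspaper lib's own extraction code. It scores each text block by two features that work discusses, word count and link density (the share of a block's words that sit inside links), and keeps blocks that are long and not link-heavy. The `Block` class, the `is_main_text` helper, and both thresholds are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Block:
    """A hypothetical text block already cut out of an HTML page."""
    text: str          # all visible text in the block
    anchor_text: str   # the part of that text that sits inside <a> tags


def word_count(s: str) -> int:
    # Count runs of word characters as words.
    return len(re.findall(r"\w+", s))


def link_density(block: Block) -> float:
    # Fraction of the block's words that are link text (0.0 for empty blocks).
    total = word_count(block.text)
    if total == 0:
        return 0.0
    return word_count(block.anchor_text) / total


def is_main_text(block: Block, min_words: int = 15,
                 max_link_density: float = 0.3) -> bool:
    # Illustrative thresholds only; they are assumptions, not published values.
    return (word_count(block.text) >= min_words
            and link_density(block) <= max_link_density)


blocks = [
    # Navigation bar: short and made entirely of links.
    Block("Home News Sports Contact", "Home News Sports Contact"),
    # Article paragraph: longer and contains no links.
    Block("The council voted on Tuesday to approve the new budget after a "
          "long debate that lasted well into the evening hours.", ""),
]

main_text = [b.text for b in blocks if is_main_text(b)]
```

With these made-up thresholds the navigation bar is dropped and the paragraph is kept. A real extractor would also look at neighbouring blocks and tune the thresholds on labelled pages.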

Author

whqwill commented Jan 22, 2019 via email


3 participants