New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicitly set BeautifulSoup parser to html.parser to avoid warning #4551

Merged
merged 2 commits into from Apr 19, 2016

Conversation

Projects
None yet
6 participants
@anchitjain1234
Contributor

anchitjain1234 commented Feb 1, 2016

For #4550.

@anchitjain1234

This comment has been minimized.

Show comment
Hide comment
@anchitjain1234

anchitjain1234 Feb 8, 2016

Contributor

@astrofrog please check the changes.

Contributor

anchitjain1234 commented Feb 8, 2016

@astrofrog please check the changes.

@mwcraig

This comment has been minimized.

Show comment
Hide comment
@mwcraig

mwcraig Feb 8, 2016

Contributor

@anchitjain1234 -- thanks for this contribution. Please be aware that it often takes several days (sometimes even a few weeks) for the maintainers to review pull requests.

Contributor

mwcraig commented Feb 8, 2016

@anchitjain1234 -- thanks for this contribution. Please be aware that it often takes several days (sometimes even a few weeks) for the maintainers to review pull requests.

@mwcraig

This comment has been minimized.

Show comment
Hide comment
@mwcraig

mwcraig Feb 8, 2016

Contributor

@anchitjain1234 -- I realized after my first comment that I should have added a caveat....certainly if you haven't received any feedback by the beginning of next week, give the pull request a bump. We do sometimes inadvertently take longer than we'd like to get these reviewed and merged (and this is a really short one).

Contributor

mwcraig commented Feb 8, 2016

@anchitjain1234 -- I realized after my first comment that I should have added a caveat....certainly if you haven't received any feedback by the beginning of next week, give the pull request a bump. We do sometimes inadvertently take longer than we'd like to get these reviewed and merged (and this is a really short one).

@mwcraig mwcraig added the io.ascii label Feb 25, 2016

@hamogu

This comment has been minimized.

Show comment
Hide comment
@hamogu

hamogu Apr 1, 2016

Member

Looks good to me.
@taldcroft
(Remember to close #4550 when this is merged.)

Member

hamogu commented Apr 1, 2016

Looks good to me.
@taldcroft
(Remember to close #4550 when this is merged.)

@eteq

This comment has been minimized.

Show comment
Hide comment
@eteq

eteq Apr 5, 2016

Member

I'm don't think this fix is what we want. The underlying issue in #4550 is that bs4 guesses at the right kind of parser. The trouble is, I think we want it to guess. That is, On my system, for example, it's defaulting to lxml. I want it to do that because lxml is a fair bit faster than html.parser. So I don't think we want to say it must always use 'html.parser'.

I wonder if @taldcroft or @mdmueller can offer some insight into the best way to set this? My tilt would be towards a configuration option (i.e., http://docs.astropy.org/en/stable/config/), as it's sort of site-dependent. The default would be None, which is the current behavior of having beautifulsoup decide. But the configuration option would let you explicitly specify a parser if you definitely care.

Member

eteq commented Apr 5, 2016

I'm don't think this fix is what we want. The underlying issue in #4550 is that bs4 guesses at the right kind of parser. The trouble is, I think we want it to guess. That is, On my system, for example, it's defaulting to lxml. I want it to do that because lxml is a fair bit faster than html.parser. So I don't think we want to say it must always use 'html.parser'.

I wonder if @taldcroft or @mdmueller can offer some insight into the best way to set this? My tilt would be towards a configuration option (i.e., http://docs.astropy.org/en/stable/config/), as it's sort of site-dependent. The default would be None, which is the current behavior of having beautifulsoup decide. But the configuration option would let you explicitly specify a parser if you definitely care.

@taldcroft

This comment has been minimized.

Show comment
Hide comment
@taldcroft

taldcroft Apr 5, 2016

Member

Agree with @eteq that this is not the right solution. This change in BS4 is kind of annoying and not really consistent with the docs:

"If you just need to parse some HTML, you can dump the markup into the BeautifulSoup constructor, and it’ll probably be fine. Beautiful Soup will pick a parser for you and parse the data. But there are a few additional arguments you can pass in to the constructor to change which parser is used. ...

If you don’t specify anything, you’ll get the best HTML parser that’s installed. Beautiful Soup ranks lxml’s parser as being the best, then html5lib’s, then Python’s built-in parser. You can override this by specifying one of the following"

But anyway, I think that the config option is perhaps overkill and not ideal because it is global for your system, while in theory library availability is local (e.g. libraries can be available in one conda env but not another), etc etc.

I would suggest just blocking out this silly new warning from BS4. (I take this as a lesson for astropy, that many "informative/helpful warnings" are actually just annoying in practice).

Member

taldcroft commented Apr 5, 2016

Agree with @eteq that this is not the right solution. This change in BS4 is kind of annoying and not really consistent with the docs:

"If you just need to parse some HTML, you can dump the markup into the BeautifulSoup constructor, and it’ll probably be fine. Beautiful Soup will pick a parser for you and parse the data. But there are a few additional arguments you can pass in to the constructor to change which parser is used. ...

If you don’t specify anything, you’ll get the best HTML parser that’s installed. Beautiful Soup ranks lxml’s parser as being the best, then html5lib’s, then Python’s built-in parser. You can override this by specifying one of the following"

But anyway, I think that the config option is perhaps overkill and not ideal because it is global for your system, while in theory library availability is local (e.g. libraries can be available in one conda env but not another), etc etc.

I would suggest just blocking out this silly new warning from BS4. (I take this as a lesson for astropy, that many "informative/helpful warnings" are actually just annoying in practice).

@taldcroft

This comment has been minimized.

Show comment
Hide comment
@taldcroft

taldcroft Apr 5, 2016

Member

In other words, use an appropriate version of below to catch this particular warning:

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    soup = BeautifulSoup('\n'.join(lines))

BTW there is already an option in the HTML reader to explicitly specify the parser, so @astrofrog could avoid the warning by doing that.

Member

taldcroft commented Apr 5, 2016

In other words, use an appropriate version of below to catch this particular warning:

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    soup = BeautifulSoup('\n'.join(lines))

BTW there is already an option in the HTML reader to explicitly specify the parser, so @astrofrog could avoid the warning by doing that.

@anchitjain1234

This comment has been minimized.

Show comment
Hide comment
@anchitjain1234

anchitjain1234 Apr 12, 2016

Contributor

@taldcroft Could you please check the changes.

Contributor

anchitjain1234 commented Apr 12, 2016

@taldcroft Could you please check the changes.

Show outdated Hide outdated astropy/io/ascii/html.py
soup = BeautifulSoup('\n'.join(lines))
try:
import warnings
except ImportError:

This comment has been minimized.

@pllim

pllim Apr 12, 2016

Member

Is this really necessary? warnings come from standard Python library, so why would it not be there?

@pllim

pllim Apr 12, 2016

Member

Is this really necessary? warnings come from standard Python library, so why would it not be there?

This comment has been minimized.

@taldcroft

taldcroft Apr 12, 2016

Member

Agreed, no need to catch an import error for warnings. This import should be at the top of the package (no performance hit since in all likelihood some other module will have imported it already).

@taldcroft

taldcroft Apr 12, 2016

Member

Agreed, no need to catch an import error for warnings. This import should be at the top of the package (no performance hit since in all likelihood some other module will have imported it already).

Show outdated Hide outdated astropy/io/ascii/html.py
soup = BeautifulSoup('\n'.join(lines))
else:
with warnings.catch_warnings():
warnings.simplefilter("ignore")

This comment has been minimized.

@taldcroft

taldcroft Apr 12, 2016

Member

Catch only the warning in question, not every warning. You might need to use filterwarnings instead of simplefilter, but you can figure that out.

@taldcroft

taldcroft Apr 12, 2016

Member

Catch only the warning in question, not every warning. You might need to use filterwarnings instead of simplefilter, but you can figure that out.

@anchitjain1234

This comment has been minimized.

Show comment
Hide comment
@anchitjain1234

anchitjain1234 Apr 14, 2016

Contributor

@taldcroft @pllim Filtered warnings from UserWarning category, as warning log posted by @astrofrog in #4550 mentions it.

Contributor

anchitjain1234 commented Apr 14, 2016

@taldcroft @pllim Filtered warnings from UserWarning category, as warning log posted by @astrofrog in #4550 mentions it.

@pllim

This comment has been minimized.

Show comment
Hide comment
@pllim

pllim Apr 14, 2016

Member

@astrofrog , does this work for you?

Member

pllim commented Apr 14, 2016

@astrofrog , does this work for you?

@taldcroft

This comment has been minimized.

Show comment
Hide comment
@taldcroft

taldcroft Apr 15, 2016

Member

@anchitjain1234 - can you filter on just the particular message in question, e.g. filter out only messages containing No parser was explicitly specified?

Member

taldcroft commented Apr 15, 2016

@anchitjain1234 - can you filter on just the particular message in question, e.g. filter out only messages containing No parser was explicitly specified?

@anchitjain1234

This comment has been minimized.

Show comment
Hide comment
@anchitjain1234

anchitjain1234 Apr 15, 2016

Contributor

@taldcroft Please check the changes. Also, the travis failure is unrelated.

Contributor

anchitjain1234 commented Apr 15, 2016

@taldcroft Please check the changes. Also, the travis failure is unrelated.

@taldcroft taldcroft added this to the v1.2.0 milestone Apr 18, 2016

@taldcroft taldcroft self-assigned this Apr 18, 2016

@taldcroft

This comment has been minimized.

Show comment
Hide comment
@taldcroft

taldcroft Apr 18, 2016

Member

Almost done @anchitjain1234! Just need a change log entry. Put the following in CHANGES.rst, in the section 1.2 => New Features => astropy.io.ascii

  - Updated to filter out the default parser warning of BeautifulSoup. [#4551]

You can do this with a commit message that includes [skip ci] to avoid running CI again.

Member

taldcroft commented Apr 18, 2016

Almost done @anchitjain1234! Just need a change log entry. Put the following in CHANGES.rst, in the section 1.2 => New Features => astropy.io.ascii

  - Updated to filter out the default parser warning of BeautifulSoup. [#4551]

You can do this with a commit message that includes [skip ci] to avoid running CI again.

@anchitjain1234

This comment has been minimized.

Show comment
Hide comment
@anchitjain1234

anchitjain1234 Apr 19, 2016

Contributor

@taldcroft Added changelog entry.

Contributor

anchitjain1234 commented Apr 19, 2016

@taldcroft Added changelog entry.

@taldcroft taldcroft merged commit 6fbb534 into astropy:master Apr 19, 2016

@anchitjain1234 anchitjain1234 deleted the anchitjain1234:fix-4550 branch Apr 23, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment