Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added HTML reader/writer #2160

Merged
merged 60 commits into from
Apr 4, 2014
Merged

Added HTML reader/writer #2160

merged 60 commits into from
Apr 4, 2014

Conversation

mdmueller
Copy link
Contributor

This is a different implementation than in PR #2122. I can add more user-defined options for input and output in the htmldict parameter of HTML.__init__ if anyone has ideas for what customizations might be useful.

@mdboom
Copy link
Contributor

mdboom commented Mar 1, 2014

I know @taldcroft suggested you start with the writer based on the LaTeX writer, but the writer from #2122 was much better since it properly escaped the XML and ensured tags were closed etc. Is there some way you could use the XMLWriter again here, while still following the general layout of what you have here?

@mdmueller
Copy link
Contributor Author

That should be better. Also, I just noticed the Travis failure (due to lack of the bs4 dependency), so I assumed that it would be best to move from bs4 import BeautifulSoup to process_lines in HTMLInputter.

## LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
## ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
## (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
## SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By having the license here, do you intend this file to be under a different license than the rest of astropy, or is it the license of existing code that was copied and pasted here?

@mdboom
Copy link
Contributor

mdboom commented Mar 3, 2014

Thanks for moving to XMLWriter -- this looks much better!

@mdmueller
Copy link
Contributor Author

Oh, I copied that from latex.py since I noticed that all of the files in io.ascii had the same license and I wasn't sure about how astropy handles licenses. Should I remove it?

@mdboom
Copy link
Contributor

mdboom commented Mar 3, 2014

Ah -- my bad. Our standard is to just include the single-line, so you can probably just remove the verbose one here.

# Licensed under a 3-clause BSD style license - see LICENSE.rst

I didn't realize it was in latex.py -- there might be a specific reason for that (@taldcroft ?).

@taldcroft
Copy link
Member

@mdboom - This is legacy from asciitable. I've opened an "easy" issue #2164. BTW @amras1 you should leave this for other GSoC applicants, too easy for you. 😄

@taldcroft
Copy link
Member

@amras1 - I'll look at this over the next couple of days.

/cc @hamogu.

@taldcroft
Copy link
Member

@amras1 - you should start writing tests. This is also a good way to highlight functionality for people reviewing the code.

@hamogu
Copy link
Member

hamogu commented Mar 3, 2014

I am currently on holiday. I'll have a look as soon as I am back (in 7
days).

@amras1: Thanks for taking this on!
@taldcroft: Thanks for cc'ing me. I had an html reader/writer on my list
of things to do on my flight home, so I am happy now I can attack
something else.

@mdmueller
Copy link
Contributor Author

@hamogu - No problem!
@taldcroft - Sounds good, I'll add a few tests and document them to show functionality.

@taldcroft
Copy link
Member

An HTML page may have multiple tables (or the data table might be embedded in a formatting table), so what about a way to select a specific table. Maybe this could be by id tag or else by index (i.e. the 3rd table).

@taldcroft
Copy link
Member

@amras1 - github doesn't give a notification when you add commits (AFAIK), so it's a good idea to make a comment to let people know.

@mdmueller
Copy link
Contributor Author

@taldcroft - Oops, I'll make sure to comment when I commit in future. I just added that functionality -- now I'll add more tests.

@mdmueller
Copy link
Contributor Author

There are tests for all the HTML reading methods now, let me know if there's anything more I should add. Travis should work now that html.html is listed in setup_package.py.

@taldcroft
Copy link
Member

@amras1 - is this passing tests for you locally? python setup.py test (or any of the other methods discussed in the developers documentation on testing). There are some across-the-board failures on Travis now.

http://astropy.readthedocs.org/en/latest/development/testguide.html#testing-guidelines

@mdmueller
Copy link
Contributor Author

@taldcroft - Hm, that's odd. py.test has been working for me locally on the io.ascii tests, and python setup.py test also worked when I ran it again just now. Wonder why Travis is still failing...

@mdmueller
Copy link
Contributor Author

I think the Travis failures were due to a lack of the bs4 dependency, so I changed test_read.py and test_connect.py to skip HTML reading tests with the dependency absent. Hopefully this will work.

@taldcroft
Copy link
Member

Travis masters @astrofrog @eteq @embray @mdboom - Can we get a BeautifulSoup added in the optional dependencies test to support this HTML reader?

@taldcroft
Copy link
Member

@amras1 - sorry, I ran up against a hard deadline on another project and haven't had time to dig into your code. I'm going on vacation on Saturday for a week, but I may be able to look at it on the plane or during the week at some point.

@eteq
Copy link
Member

eteq commented Mar 7, 2014

@taldcroft and @amras1 - I just issued mdmueller#1, which is a PR against this branch. If you merge that, @amras1, it will do just as @taldcroft asked here and install beautiful soup as an optional dependency for the travis build.

Before: https://travis-ci.org/eteq/astropy/builds/20268261
After: https://travis-ci.org/eteq/astropy/builds/20268338

The install step is fast, negligible relative to other parts of the build.

@mdmueller
Copy link
Contributor Author

@eteq - Thanks for the addition, I just merged.
@taldcroft - No problem!

<th>
PIER </th>
<th>
PERROR </th>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amras1 - I haven't reviewed this in detail, but is there a way to make the HTML output more tidy? i.e.

<tr>
  <th>title1</th>
  <th>title2</th>
...

@astrofrog
Copy link
Member

@amras1 - thanks for this contribution! I haven't reviewed this in detail yet, but have tested it out and noted the issues above.

One issue we might want to think about - how to handle multi-dimensional columns. One could envisage, for columns with 2 dimensions instead of 1, to use column headers spanning many columns and actually showing all the elements (rather than showing e.g. 1.36583e-13 .. 1.36583e-13 at the moment). However, this is low priority and doesn't need to be done as part of this PR.

@mdmueller
Copy link
Contributor Author

@astrofrog - You're welcome, and thanks for testing! The output is somewhat untidy--it seems that XMLWriter defaults to the current spacing, but I'll check it out and try to make the output nicer. Not sure how I forgot to add a <table> tag, thanks for the catch :) I'd also like to implement the multi-dimensional column idea once that's done.

@mdmueller
Copy link
Contributor Author

I realized that it could be useful to have support for input as well as output of multi-dimensional columns, so I added that and put a test in test_html.py. Actually, I forgot to add the new test HTML file so I'll commit that now.

@mdboom
Copy link
Contributor

mdboom commented Mar 11, 2014

@amras1: I'm going to investigate why the XMLWriter is doing that and see if I can come up with a solution.

EDIT: Ah, I see you already found the solution. Thanks.

<th>PERROR</th>
</tr>
<tr>
<td>14 </td>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to strip the whitespace from these values as well.

@mdmueller
Copy link
Contributor Author

@taldcroft - I just rebased on master, hopefully I did that correctly (I could still use some git-fu :D)

I also removed the unnecessary test file, must've forgotten to remove it before. Then I replaced that line in HTMLOutputter with super(HTMLOutputter, self).__call__(new_cols, meta). I tried out super(HTMLOutputter, self)(new_cols, meta), but it didn't work -- I think calling super() just returns a non-callable proxy object.

@taldcroft
Copy link
Member

@amras1 - the commit d797a14 to remove the test file isn't quite enough because that file is still in the repo. If you git checkout ed8a79c then it reappears. You need to follow the procedure outlined here:
https://help.github.com/articles/remove-sensitive-data

@mdmueller
Copy link
Contributor Author

@taldcroft - Oops, I misunderstood what you meant about removing the test file -- I just used git filter-branch to delete it permanently.

taldcroft added a commit that referenced this pull request Apr 4, 2014
Add HTML reader/writer to io.ascii
@taldcroft taldcroft merged commit 0eca03e into astropy:master Apr 4, 2014
@taldcroft
Copy link
Member

@amras1 - merged. Thanks for the hard work on this!!

@mdmueller
Copy link
Contributor Author

@taldcroft - You're welcome! I'm glad I was able to learn more about git as well.

@mdmueller mdmueller deleted the html-reader branch April 4, 2014 00:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants