Added HTML reader/writer #2160
I know @taldcroft suggested you start with the writer based on the LaTeX writer, but the writer from #2122 was much better since it properly escaped the XML and ensured tags were closed, etc. Is there some way you could use XMLWriter again here while still following the general layout of what you have?
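To make the two properties mentioned here concrete (escaped text, and every opened tag closed), a minimal standard-library sketch follows. It is not astropy's XMLWriter and does not reproduce its API; the class name MiniXMLWriter is hypothetical. Closure is guaranteed by a context manager, and cell text goes through html.escape.

```python
import io
from contextlib import contextmanager
from html import escape


class MiniXMLWriter:
    """Hypothetical minimal writer: escapes text and always closes tags."""

    def __init__(self, fh):
        self.fh = fh

    @contextmanager
    def tag(self, name):
        # Open the tag; the closing tag is written even if the body raises.
        self.fh.write(f"<{name}>")
        try:
            yield
        finally:
            self.fh.write(f"</{name}>")

    def element(self, name, text=""):
        # Escape &, <, > (and quotes) so cell values cannot break the markup.
        self.fh.write(f"<{name}>{escape(str(text))}</{name}>")


buf = io.StringIO()
w = MiniXMLWriter(buf)
with w.tag("tr"):
    w.element("td", "a < b & c")
print(buf.getvalue())
```

With string concatenation (as in a LaTeX-style writer), a value like "a < b & c" would be emitted verbatim and produce invalid HTML; routing every value through one escaping call avoids that.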
That should be better. Also, I just noticed the Travis failure (due to the missing bs4 dependency), so I assumed that it would be best to move …
## LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
## ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
## (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
## SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
By having the license here, do you intend this file to be under a different license than the rest of astropy, or is it the license of existing code that was copied and pasted here?
Thanks for moving to XMLWriter; this looks much better!
Oh, I copied that from latex.py since I noticed that all of the files in …
Ah, my bad. Our standard is to include just the single-line license header, so you can probably just remove the verbose one here.
I didn't realize it was in …
@amras1 - I'll look at this over the next couple of days. /cc @hamogu.
@amras1 - you should start writing tests. This is also a good way to highlight functionality for people reviewing the code.
I am currently on holiday. I'll have a look as soon as I am back (in 7 …). @amras1: Thanks for taking this on!
@hamogu - No problem!
An HTML page may have multiple tables (or the data table might be embedded in a formatting table), so what about a way to select a specific table? Maybe this could be by …
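The PR's reader relies on BeautifulSoup, but the selection logic itself is simple enough to sketch with the standard library's html.parser. This assumes a hypothetical table_id argument that is either a 1-based position of the table in the page or the value of its id attribute; TableFinder and table_id are illustrative names, not the PR's API. Tables nested inside the selected one are skipped, and a data table nested in a layout table can be picked directly.

```python
from html.parser import HTMLParser


class TableFinder(HTMLParser):
    """Collect the rows of one <table>, chosen by 1-based index or by id.

    Hypothetical sketch: `table_id` may be an int (position of the table in
    the document, counting from 1) or a str (value of the id attribute).
    """

    def __init__(self, table_id=1):
        super().__init__()
        self.table_id = table_id
        self.n_tables = 0   # <table> tags seen so far
        self.depth = 0      # nesting depth inside the selected table (0 = outside)
        self.rows = []
        self._cell = None   # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.n_tables += 1
            if self.depth:
                self.depth += 1  # a table nested inside the selected one
            elif (self.n_tables == self.table_id
                  or dict(attrs).get("id") == self.table_id):
                self.depth = 1
        elif self.depth == 1 and tag == "tr":
            self.rows.append([])
        elif self.depth == 1 and tag in ("td", "th"):
            if not self.rows:
                self.rows.append([])  # tolerate a cell without an explicit <tr>
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif self.depth == 1 and tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell))
            self._cell = None

    def handle_data(self, data):
        # Only text that belongs directly to the selected table is kept.
        if self._cell is not None and self.depth == 1:
            self._cell.append(data)


page = """
<table id="layout"><tr><td>
  <table id="data"><tr><th>a</th><th>b</th></tr><tr><td>1</td><td>2</td></tr></table>
</td></tr></table>
"""
finder = TableFinder(table_id="data")
finder.feed(page)
print(finder.rows)
```

Selecting by id is more robust than by position when the page layout changes, while a positional index works for pages whose tables carry no id.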
@amras1 - GitHub doesn't send a notification when you add commits (AFAIK), so it's a good idea to make a comment to let people know.
@taldcroft - Oops, I'll make sure to comment when I commit in future. I just added that functionality; now I'll add more tests.
There are tests for all the HTML reading methods now; let me know if there's anything more I should add. Travis should work now that …
@amras1 - is this passing tests for you locally? http://astropy.readthedocs.org/en/latest/development/testguide.html#testing-guidelines
@taldcroft - Hm, that's odd.
I think the Travis failures were due to a lack of the …
Travis masters @astrofrog @eteq @embray @mdboom - can we get BeautifulSoup added to the optional-dependencies test build to support this HTML reader?
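Until bs4 is installed on every build, tests that need it are usually guarded so the suite still passes without it. A sketch of that pattern, written with the standard library's unittest to stay self-contained (a pytest suite would use pytest.mark.skipif the same way); HAS_BS4 and the test names are hypothetical.

```python
import unittest

try:
    import bs4  # noqa: F401  (optional dependency)
    HAS_BS4 = True
except ImportError:
    HAS_BS4 = False


class TestHTMLReader(unittest.TestCase):
    @unittest.skipIf(not HAS_BS4, "requires BeautifulSoup (bs4)")
    def test_parse_simple_table(self):
        # Only runs when bs4 is importable; otherwise reported as skipped.
        from bs4 import BeautifulSoup
        soup = BeautifulSoup("<table><tr><td>1</td></tr></table>", "html.parser")
        self.assertEqual(soup.find("td").text, "1")

    def test_flag_is_bool(self):
        # Runs everywhere, with or without the optional dependency.
        self.assertIsInstance(HAS_BS4, bool)
```

Run it with `python -m unittest`; on a build without bs4 the first test shows up as skipped instead of failing, which is why adding bs4 to the optional-dependencies build matters for actually exercising the reader.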
@amras1 - sorry, I ran up against a hard deadline on another project and haven't had time to dig into your code. I'm going on vacation on Saturday for a week, but I may be able to look at it on the plane or during the week at some point. |
@taldcroft and @amras1 - I just issued mdmueller#1, which is a PR against this branch. If you merge that, @amras1, it will do just what @taldcroft asked for here and install Beautiful Soup as an optional dependency for the Travis build. Before: https://travis-ci.org/eteq/astropy/builds/20268261
The install step is fast and negligible relative to the rest of the build.
@eteq - Thanks for the addition, I just merged. |
<th>
PIER </th>
<th>
PERROR </th>
@amras1 - I haven't reviewed this in detail, but is there a way to make the HTML output tidier? e.g.:
<tr>
<th>title1</th>
<th>title2</th>
...
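The layout asked for above (each cell on a single line, with no stray whitespace inside the cell tags) falls out naturally if the output is built one line per tag. A minimal sketch, with nested tags indented; tidy_html_table is a hypothetical helper for illustration, not the PR's XMLWriter-based writer.

```python
from html import escape


def tidy_html_table(colnames, rows, indent="  "):
    """Render a table with one tag per line and each cell on a single line.

    Hypothetical helper; all values are escaped with html.escape.
    """
    lines = ["<table>"]

    def add_row(cell_tag, values):
        lines.append(indent + "<tr>")
        for value in values:
            lines.append(f"{indent * 2}<{cell_tag}>{escape(str(value))}</{cell_tag}>")
        lines.append(indent + "</tr>")

    add_row("th", colnames)  # header row
    for row in rows:
        add_row("td", row)   # data rows
    lines.append("</table>")
    return "\n".join(lines)


print(tidy_html_table(["title1", "title2"], [[1, 2]]))
```

Because the cell text is written in the same call as its opening and closing tags, the output contains `<th>title1</th>` rather than a tag, newline, value, and closing tag spread over several lines.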
@amras1 - thanks for this contribution! I haven't reviewed this in detail yet, but I have tested it out and noted the issues above. One issue we might want to think about is how to handle multi-dimensional columns. For columns with 2 dimensions instead of 1, one could envisage using column headers that span several columns and actually showing all the elements (rather than showing e.g. …
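The spanning-header idea can be sketched with the standard HTML colspan attribute: a vector-valued column gets one `<th colspan="n">` header and n separate `<td>` cells per row. In this sketch plain tuples stand in for the rows of a 2-D column, and header_and_rows is a hypothetical helper, not code from the PR.

```python
from html import escape


def header_and_rows(columns):
    """Render one header row and the data rows for a list of columns.

    `columns` is a list of (name, values) pairs. A value that is a tuple or
    list is treated as one row of a 2-D column: its header gets
    colspan="n" and each element becomes its own <td>.
    """
    def as_cells(value):
        return list(value) if isinstance(value, (list, tuple)) else [value]

    nrows = len(columns[0][1])
    header = []
    for name, values in columns:
        n = len(as_cells(values[0]))  # column width taken from the first row
        span = f' colspan="{n}"' if n > 1 else ""
        header.append(f"<th{span}>{escape(name)}</th>")

    body = []
    for i in range(nrows):
        cells = [f"<td>{escape(str(x))}</td>"
                 for _, values in columns for x in as_cells(values[i])]
        body.append("<tr>" + "".join(cells) + "</tr>")
    return "<tr>" + "".join(header) + "</tr>", body


head, body = header_and_rows([("name", ["a", "b"]), ("pos", [(1, 2), (3, 4)])])
print(head)
print("\n".join(body))
```

This keeps every element visible in the rendered table, at the cost of assuming that all rows of a column have the same length.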
@astrofrog - You're welcome, and thanks for testing! The output is somewhat untidy; it seems that …
I realized that it could be useful to support input as well as output of multi-dimensional columns, so I added that and put a test in …
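On the input side, the inverse step is to regroup the flat list of cells in a row according to each header's colspan. A minimal sketch, assuming the header has already been parsed into hypothetical (name, colspan) pairs; group_cells is an illustrative name, not the PR's function.

```python
def group_cells(header, row):
    """Regroup a flat row of cell strings using the header colspans.

    `header` is a list of (name, colspan) pairs; `row` is the list of cell
    strings in that row. Cells under a spanning header are gathered back
    into one tuple, so a column with colspan=n reads as a length-n vector.
    """
    if sum(span for _, span in header) != len(row):
        raise ValueError("row length does not match the header colspans")
    out = {}
    i = 0
    for name, span in header:
        cells = row[i:i + span]
        out[name] = cells[0] if span == 1 else tuple(cells)
        i += span
    return out


print(group_cells([("name", 1), ("pos", 2)], ["a", "1", "2"]))
# {'name': 'a', 'pos': ('1', '2')}
```

Raising on a length mismatch, rather than silently padding, makes malformed rows visible early; a real reader might instead choose to mask missing values.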
EDIT: Ah, I see you already found the solution. Thanks. |
<th>PERROR</th>
</tr>
<tr>
<td>14 </td>
It might be nice to strip the whitespace from these values as well.
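A minimal sketch of cleaning cell text while reading. Plain `str.strip()` removes only the leading and trailing whitespace (turning "14 " into "14"); the hypothetical clean_cell below also collapses internal runs, which mirrors how browsers render cell text under default CSS. Note that `str.split()` also treats non-breaking spaces as whitespace, which is a design choice to be aware of.

```python
def clean_cell(text):
    """Strip surrounding whitespace and collapse internal runs to one space.

    Hypothetical helper: under default CSS (white-space: normal) a browser
    displays any run of whitespace in cell text as a single space, so this
    keeps parsed values close to what a reader of the page sees.
    """
    return " ".join(text.split())


print(repr(clean_cell("14 ")))       # '14'
print(repr(clean_cell("\n PIER ")))  # 'PIER'
```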
@taldcroft - I just rebased on master; hopefully I did that correctly (I could still use some git-fu :D). I also removed the unnecessary test file, which I must have forgotten to remove before. Then I replaced that line in …
@amras1 - the commit d797a14 to remove the test file isn't quite enough because that file is still in the repo. If you …
@taldcroft - Oops, I misunderstood what you meant about removing the test file; I just used …
Add HTML reader/writer to io.ascii
@amras1 - merged. Thanks for the hard work on this!!
@taldcroft - You're welcome! I'm glad I was able to learn more about git as well.
This is a different implementation from the one in PR #2122. I can add more user-defined options for input and output in the htmldict parameter of HTML.__init__ if anyone has ideas for what customizations might be useful.
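One common way to expose an options dict like this without losing defaults is to merge the caller's entries over a default mapping. This is a sketch only: the class name and the option names (table_id, strip) are hypothetical placeholders and do not reflect what HTML.__init__ in the PR actually accepts.

```python
class HTMLOptionsSketch:
    """Hypothetical sketch of an options dict merged over defaults.

    The option names below are placeholders, not the keys the PR defines.
    """

    DEFAULTS = {
        "table_id": None,  # which table to read (placeholder option)
        "strip": True,     # strip cell whitespace when reading (placeholder option)
    }

    def __init__(self, htmldict=None):
        # Copy, so the caller's dict is never mutated; None avoids a shared
        # mutable default argument.
        htmldict = dict(htmldict or {})
        unknown = set(htmldict) - set(self.DEFAULTS)
        if unknown:
            # Rejecting unknown keys turns a typo into an error instead of
            # a silently ignored option.
            raise TypeError(f"unknown htmldict keys: {sorted(unknown)}")
        self.html = {**self.DEFAULTS, **htmldict}


print(HTMLOptionsSketch({"table_id": 2}).html)
# {'table_id': 2, 'strip': True}
```

Whether unknown keys should raise or be ignored is a judgment call; raising is stricter but means every new option has to be added to the defaults first.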