Using table tag in HTML Reader produces no output #324

Closed
EK1771 opened this Issue Aug 1, 2014 · 14 comments

EK1771 commented Aug 1, 2014

Sample Code:

$phpWord = new \PhpOffice\PhpWord\PhpWord();

$section = $phpWord->addSection();

$html = '<table><tr><td>test</td></tr></table>';

\PhpOffice\PhpWord\Shared\Html::addHtml($section, $html);

$objWriter = \PhpOffice\PhpWord\IOFactory::createWriter($phpWord, 'Word2007');
$objWriter->save('test.docx');

Expected Output:

Table with one cell containing the word "test".

Actual Output:

Blank

From stepping through the code quickly, the issue seems to be caused by the following if condition in parseChildNodes():

if ($element instanceof AbstractContainer) {
    self::parseNode($cNode, $element, $styles, $data);
}

Commenting out the if condition then allows for the sample code above to produce the expected output.
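To illustrate why that guard could leave the table empty, here is a minimal sketch using stand-in classes rather than PHPWord itself. It assumes that Table and Row do not extend AbstractContainer while Section and Cell do; the empty class bodies and the helper childrenWouldBeParsed() are hypothetical and exist only for this illustration.

```php
<?php
// Stand-in classes (hypothetical), mirroring the hierarchy assumed above.
abstract class AbstractElement {}
abstract class AbstractContainer extends AbstractElement {}
class Section extends AbstractContainer {}
class Table extends AbstractElement {}
class Row extends AbstractElement {}
class Cell extends AbstractContainer {}

// Same test as the guard in parseChildNodes(): children are only
// visited when the current element is an AbstractContainer.
function childrenWouldBeParsed(AbstractElement $element): bool
{
    return $element instanceof AbstractContainer;
}

$results = [];
foreach ([new Section(), new Table(), new Row(), new Cell()] as $element) {
    $results[get_class($element)] = childrenWouldBeParsed($element);
}

var_export($results);
```

Under that assumption the <table> node itself is handled (its parent is a Section), but the <tr> and <td> nodes below it are never visited, which would match the blank output.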


hregis commented Sep 24, 2014

Hello,
I have the same problem (develop branch). Do you have a solution?
Thank you

EK1771 commented Sep 25, 2014

@hregis I do, in src/PhpWord/Shared/Html.php parseChildNodes():

private static function parseChildNodes($node, $element, $styles, $data)
{
    if ($node->nodeName != 'li') {
        $cNodes = $node->childNodes;
        if (count($cNodes) > 0) {
            foreach ($cNodes as $cNode) {
                if ($element instanceof AbstractContainer) {
                    self::parseNode($cNode, $element, $styles, $data);
                }
            }
        }
    }
}

Change this to:

private static function parseChildNodes($node, $element, $styles, $data)
{
    if ($node->nodeName != 'li') {
        $cNodes = $node->childNodes;
        if (count($cNodes) > 0) {
            foreach ($cNodes as $cNode) {
                // if ($element instanceof AbstractContainer) {
                    self::parseNode($cNode, $element, $styles, $data);
                // }
            }
        }
    }
}

Also, could this issue please be labelled as a Bug? It's definitely not a Question.

hregis commented Sep 25, 2014

@EK1771 thank you, but I get this error with the develop branch:
Fatal error: Call to undefined method PhpOffice\PhpWord\Element\Table::addText() in /PhpWord/Shared/Html.php on line 239
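
One possible cause (an assumption, not confirmed in this thread) is whitespace between the table tags. If the markup is loaded as XML with whitespace preserved, that whitespace becomes #text children of <table>, and with the instanceof guard removed those text nodes would be handed to the Table element. The sketch below uses only PHP's DOM extension to show the text nodes; it does not call PHPWord, and the sample markup is hypothetical.

```php
<?php
// Hypothetical input: the same table as the sample code, but pretty-printed.
$html = "<table>\n  <tr><td>test</td></tr>\n</table>";

$dom = new DOMDocument();
$dom->loadXML($html); // whitespace-only text nodes are kept by default

// Collect the node names of the direct children of <table>.
$names = [];
foreach ($dom->documentElement->childNodes as $child) {
    $names[] = $child->nodeName;
}

print_r($names); // '#text', 'tr', '#text'
```

If that is what happens here, skipping whitespace-only text nodes, or removing the whitespace between tags before calling addHtml(), would be worth trying.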

mogilvie commented Nov 9, 2014

I had the same issue where tables were not being parsed from HTML to DOM.

The problem is that the HTML elements <tbody>, <tr> and <td> are not DOMElements as defined by the AbstractContainer class. Because these HTML elements are not DOM Abstract Containers, the parseChildNodes method doesn't check for any child elements.

@EK1771's solution removes the check against AbstractContainer, but it also causes every element to be checked for children, even when some are not containers.

There are three steps to fix this.

  1. Insert a new node into the HTML mapping table to catch <tbody> elements.

/PhpWord/Shared/Html.php:parseNode()

        // Node mapping table
        $nodes = array(
                              // $method        $node   $element    $styles     $data   $argument1      $argument2
            'p'         => array('Paragraph',   $node,  $element,   $styles,    null,   null,           null),
            'h1'        => array('Heading',     null,   $element,   $styles,    null,   'Heading1',     null),
            'h2'        => array('Heading',     null,   $element,   $styles,    null,   'Heading2',     null),
            'h3'        => array('Heading',     null,   $element,   $styles,    null,   'Heading3',     null),
            'h4'        => array('Heading',     null,   $element,   $styles,    null,   'Heading4',     null),
            'h5'        => array('Heading',     null,   $element,   $styles,    null,   'Heading5',     null),
            'h6'        => array('Heading',     null,   $element,   $styles,    null,   'Heading6',     null),
            '#text'     => array('Text',        $node,  $element,   $styles,    null,    null,          null),
            'span'      => array('Span',        $node,  null,       $styles,    null,    null,          null), //to catch inline span style changes
            'strong'    => array('Property',    null,   null,       $styles,    null,   'bold',         true),
            'em'        => array('Property',    null,   null,       $styles,    null,   'italic',       true),
            'sup'       => array('Property',    null,   null,       $styles,    null,   'superScript',  true),
            'sub'       => array('Property',    null,   null,       $styles,    null,   'subScript',    true),
            'table'     => array('Table',       $node,  $element,   $styles,    null,   'addTable',     true),
            'tbody'     => array('Table',       $node,  $element,   $styles,    null,   'skipTbody',    true), //added to catch tbody in html.
            'tr'        => array('Table',       $node,  $element,   $styles,    null,   'addRow',       true),
            'td'        => array('Table',       $node,  $element,   $styles,    null,   'addCell',      true),
            'ul'        => array('List',        null,   null,       $styles,    $data,  3,              null),
            'ol'        => array('List',        null,   null,       $styles,    $data,  7,              null),
            'li'        => array('ListItem',    $node,  $element,   $styles,    $data,  null,           null),
        );

  2. Modify the parseChildNodes method. Define a list of table HTML elements which contain child elements, and check each child's nodeName against it. Any other node type carries on to the original instanceof check.
    private static function parseChildNodes($node, $element, $styles, $data)
    {
        // Added to get tables to work: table tags whose nodes must always be parsed
        $htmlContainers = array(
            'tbody',
            'tr',
            'td',
        );
        if ($node->nodeName != 'li') {
            $cNodes = $node->childNodes;
            if (count($cNodes) > 0) {
                foreach ($cNodes as $cNode) {
                    if (in_array($cNode->nodeName, $htmlContainers)) {
                        self::parseNode($cNode, $element, $styles, $data);
                    } elseif ($element instanceof AbstractContainer) {
                        // All other containers as defined in AbstractContainer.
                        // elseif ensures a node is never parsed twice.
                        self::parseNode($cNode, $element, $styles, $data);
                    }
                }
            }
        }
    }
  3. Modify the parseTable method. The DOM writer adds columns and rows to the Table element directly, so you need to add a switch (or a series of if checks) on $argument1 from the node mapping table.
    private static function parseTable($node, $element, &$styles, $argument1)
    {     
        switch ($argument1) {
            case 'addTable':                        
                $styles['paragraph'] = self::parseInlineStyle($node, $styles['paragraph']); 
                $newElement = $element->addTable('table', array('width' => 90));
                break;
            case 'skipTbody':                        
                $newElement = $element;
                break;
            case 'addRow':                        
                $newElement = $element->addRow();
                break;
            case 'addCell':                        
                $newElement = $element->addCell(1750);
                break;
        }

        // $attributes = $node->attributes;
        // if ($attributes->getNamedItem('width') !== null) {
            // $newElement->setWidth($attributes->getNamedItem('width')->value);
        // }

        // if ($attributes->getNamedItem('height') !== null) {
            // $newElement->setHeight($attributes->getNamedItem('height')->value);
        // }
        // if ($attributes->getNamedItem('width') !== null) {
            // $newElement=$element->addCell($width=$attributes->getNamedItem('width')->value);
        // }

        return $newElement;
    }

This works for me; I hope it helps others. I'm sure there is a more elegant solution that could be incorporated into the develop branch.

It needs to be expanded to deal with <thead> and other HTML table elements.

Mark
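
As a rough sketch of how the same idea could be extended to <thead>, <tfoot> and <th>, the snippet below walks a table with PHP's DOM extension only. It does not call PHPWord; the function collectRows() and the constants WRAPPERS and CELLS are hypothetical names used for illustration. Wrapper sections are passed through, and <th> is treated like <td>.

```php
<?php
const WRAPPERS = ['thead', 'tbody', 'tfoot']; // sections to pass through
const CELLS    = ['td', 'th'];                // both are treated as cells

// Collect the text of every cell, row by row, ignoring wrapper sections.
function collectRows(DOMNode $node, array &$rows): void
{
    foreach ($node->childNodes as $child) {
        $name = $child->nodeName;
        if (in_array($name, WRAPPERS, true)) {
            collectRows($child, $rows); // skip the wrapper, keep descending
        } elseif ($name === 'tr') {
            $cells = [];
            foreach ($child->childNodes as $cell) {
                if (in_array($cell->nodeName, CELLS, true)) {
                    $cells[] = trim($cell->textContent);
                }
            }
            $rows[] = $cells;
        }
        // Anything else (e.g. whitespace-only #text nodes) is ignored.
    }
}

$dom = new DOMDocument();
$dom->loadXML(
    '<table><thead><tr><th>a</th><th>b</th></tr></thead>'
    . '<tbody><tr><td>1</td><td>2</td></tr></tbody></table>'
);

$rows = [];
collectRows($dom->documentElement, $rows);
print_r($rows); // two rows: ['a', 'b'] and ['1', '2']
```

In a real patch, the two branches would map onto the 'skipTbody'-style pass-through and the 'addRow'/'addCell' cases of parseTable.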

matteomoretti commented Jul 17, 2015

Any news? The bug still occurs.

hari-web commented Sep 1, 2015

By using \PhpOffice\PhpWord\Shared\Html::addHtml($section, $html) we can convert HTML to Word. Can we set alignment options for this output (such as align right/left/both)?

garethellis36 commented Feb 4, 2016

@mogilvie are you able to share your complete, working Html class with the modifications? I'm trying it myself, but when I apply it to an updated sample (as below), I get an error when it tries to write because one of the objects is null.

Html class

//node mapping table
            'table'     => array('Table',       $node,  $element,   $styles,    null,   'addTable',     true),
            'thead'     => array('Table',       $node,  $element,   $styles,    null,   'skipThead',    true),
            'tbody'     => array('Table',       $node,  $element,   $styles,    null,   'skipTbody',    true),
            'tr'        => array('Table',       $node,  $element,   $styles,    null,   'addRow',       true),
            'td'        => array('Table',       $node,  $element,   $styles,    null,   'addCell',      true),
            'th'        => array('Table',       $node,  $element,   $styles,    null,   'addCell',      true),

    /**
     * Parse child nodes.
     *
     * @param \DOMNode $node
     * @param \PhpOffice\PhpWord\Element\AbstractContainer $element
     * @param array $styles
     * @param array $data
     * @return void
     */
    private static function parseChildNodes($node, $element, $styles, $data)
    {
        if ('li' != $node->nodeName) {
            $cNodes = $node->childNodes;
            if (count($cNodes) > 0) {
                foreach ($cNodes as $cNode) {
                    // Added to get tables to work
                    $htmlContainers = array(
                        'thead',
                        'tbody',
                        'tr',
                        'td',
                        'th',
                    );
                    if (in_array($cNode->nodeName, $htmlContainers)) {
                        self::parseNode($cNode, $element, $styles, $data);
                    }
                    if ($element instanceof AbstractContainer) {
                        self::parseNode($cNode, $element, $styles, $data);
                    }
                }
            }
        }
    }

    private static function parseTable($node, $element, &$styles, $argument1)
    {
        switch ($argument1) {
            case 'addTable':
                $styles['paragraph'] = self::parseInlineStyle($node, $styles['paragraph']);
                $newElement = $element->addTable('table', array('width' => 90));
                break;
            case 'skipThead':
            case 'skipTbody':
                $newElement = $element;
                break;
            case 'addRow':
                $newElement = $element->addRow();
                break;
            case 'addCell':
                $newElement = $element->addCell(1750);
                break;
        }

Sample:

$html .= '<table><thead><tr><th>Header of column 1</th><th>Header of column 2</th></tr></thead>';
$html .= '<tbody><tr><td>Row 1 for column 1</td><td>Row 1 for column 2</td></tr></tbody></table>';

mogilvie commented Feb 5, 2016

I'll post the full class tonight, but in the meantime, is the error caused by the switch statement not having any code to be executed for case: 'skipThead'?

garethellis36 commented Feb 5, 2016

@mogilvie I doubt that - it should fall through to the 'skipTbody' case because there's no break statement. I don't fully understand how this works but I assumed that thead would need to be handled in the same way as tbody.
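
A small standalone sketch of that fall-through behaviour in PHP, for reference. The function resolveAction() and its return strings are hypothetical; only the case labels come from the code above.

```php
<?php
// An empty case with no break falls through to the next case's body.
function resolveAction(string $argument1): string
{
    switch ($argument1) {
        case 'skipThead': // no body, no break: falls through
        case 'skipTbody':
            return 'reuse parent element';
        case 'addRow':
            return 'add row';
        default:
            return 'unhandled';
    }
}

echo resolveAction('skipThead'), PHP_EOL; // same result as 'skipTbody'
```

So 'skipThead' and 'skipTbody' share one branch, and an empty 'skipThead' case on its own would not produce a null element.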

mogilvie commented Feb 5, 2016

@garethellis36 Agree re skipThead. Is the parseTable function returning the $newElement? It's cut off from the sample html class.
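
For context on why the missing return matters: a PHP function that ends without a return statement yields null, so the caller would receive null instead of the new element. The two functions below are hypothetical stand-ins, not the real parseTable.

```php
<?php
// Builds an element but never returns it, so the caller gets null.
function withoutReturn(string $action)
{
    switch ($action) {
        case 'addRow':
            $newElement = new stdClass();
            break;
    }
    // no return statement here
}

// Same logic, but the element is returned (null for unknown actions).
function withReturn(string $action)
{
    $newElement = null;
    switch ($action) {
        case 'addRow':
            $newElement = new stdClass();
            break;
    }
    return $newElement;
}

var_dump(withoutReturn('addRow')); // NULL
var_dump(withReturn('addRow'));    // an stdClass object
```

Initialising $newElement before the switch also avoids an undefined-variable notice when no case matches.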

mogilvie commented Feb 5, 2016

Html.php uploaded as a text file. It was working as of Jan 2015.

Html.txt

surindersinghva commented Mar 15, 2016

Thank you, this is very helpful. The embedded table is printing like a charm now. I am still stuck with an embedded nested list, if you can help. There are two issues:

1. A list item is not printed if a <strong> tag is used.
2. A nested list is not printed, with or without <strong> tags.

Here's what I have:

<p>The following list has all the information.</p><ol><li><strong>Item 1</strong><ol><li><strong>Nested Item 1</strong></li><li>Nested Item 2</li></ol></li></ol><p>List ends here.</p>

Thank you
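
A possible explanation, offered as an assumption rather than a confirmed diagnosis: the parseChildNodes code quoted earlier in this thread skips the children of 'li' nodes, so if the list-item handler only reads text nodes that sit directly under <li>, anything wrapped in <strong> or in a nested <ol> would be missed. The sketch below uses only PHP's DOM extension to show what the direct children of each <li> are; childNames() is a hypothetical helper.

```php
<?php
// Return the node names of the direct children of a node.
function childNames(DOMNode $node): array
{
    $names = [];
    foreach ($node->childNodes as $child) {
        $names[] = $child->nodeName;
    }
    return $names;
}

$dom = new DOMDocument();
$dom->loadXML(
    '<ol><li><strong>Item 1</strong><ol><li><strong>Nested Item 1</strong></li>'
    . '<li>Nested Item 2</li></ol></li></ol>'
);

$outerLi = $dom->documentElement->firstChild;      // the first <li>
$nested  = $outerLi->getElementsByTagName('li');   // the two nested <li> nodes

print_r(childNames($outerLi));          // 'strong', 'ol' (no direct #text)
print_r(childNames($nested->item(0)));  // 'strong'
print_r(childNames($nested->item(1)));  // '#text'
```

Only "Nested Item 2" has its text as a direct child of <li>; the other items would need the handler to descend into <strong> and <ol>.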

arrabal commented Apr 22, 2016

@mogilvie Your Html.php works OK, thanks for your work. I had to comment out line 61, $dom->save('/var/www/vhosts/specshaper.com/DOM.xml'); //@todo Delete Debug

However, the table is generated without borders, even if I set a large "border" value in the <table> tag.

Will these fixes be integrated in the main branch?

arivanbastos commented Nov 14, 2016

This issue still occurs... The @garethellis36 version doesn't work for me (PHPWord version 0.13), so I did some work on improving the Html class.
My version is able to convert the following HTML:

<table style="width: 50%; border: 6px #0000FF solid;">
    <thead>
        <tr style="background-color: #FF0000; text-align: center; color: #FFFFFF; font-weight: bold; ">
             <th>a</th>
             <th>b</th>
             <th>c</th>
        </tr>
    </thead>
    <tbody>
        <tr><td>1</td><td colspan="2">2</td></tr>
        <tr><td>4</td><td>5</td><td>6</td></tr>
    </tbody>
</table>

For more details, see: http://stackoverflow.com/questions/29275140/html-reader-from-phpword-doest-work-with-tables/40600565#40600565
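
As a rough idea of what reading such a style attribute involves, here is a minimal sketch that splits an inline style string into property/value pairs. It is not the code from the linked answer or from PHPWord; splitInlineStyle() is a hypothetical helper, and mapping the resulting values onto PHPWord table styles is left out.

```php
<?php
// Split "name: value; name: value;" into an associative array.
function splitInlineStyle(string $style): array
{
    $properties = [];
    foreach (explode(';', $style) as $declaration) {
        if (strpos($declaration, ':') === false) {
            continue; // skip empty fragments, e.g. after the last ';'
        }
        list($name, $value) = explode(':', $declaration, 2);
        $properties[strtolower(trim($name))] = trim($value);
    }
    return $properties;
}

$props = splitInlineStyle('width: 50%; border: 6px #0000FF solid;');
print_r($props); // width => 50%, border => 6px #0000FF solid
```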
