<a href="https://colab.research.google.com/github/SCS-Technology-and-Innovation/DSLP/blob/main/DSLP_M03.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Putting things in

Much like XML, HTML is a language for giving structure to document content, specifically webpage content.

[HTML tutorial](https://www.w3schools.com/html/)

The layout and style of the content are not supposed to be controlled from within the HTML document; they are instead handled by a separate document written in a language called CSS.

[Tutorial for cascading style sheets](https://www.w3schools.com/css/default.asp)

One can of course embed the CSS within the HTML if need be, just like we can embed a DTD within an XML document, but if more than one document shares the same CSS (or DTD), it is better to store it as an independent document.

(This is a good time to think about all the pros and cons of keeping such files separate versus embedded.)

In this module, we take XML content and place fields from it within an HTML document that then gets styled with a [simple CSS](https://scs-technology-and-innovation.github.io/courses/DSLP/card.css).

The end result will be a populated HTML file much like [this example](https://scs-technology-and-innovation.github.io/courses/DSLP/example.html).

In [59]:
xmlurl = 'https://raw.githubusercontent.com/SCS-Technology-and-Innovation/DSLP/main/data/card.xml'
htmlurl = 'https://raw.githubusercontent.com/SCS-Technology-and-Innovation/DSLP/main/data/card.html'

import urllib.request

xml = urllib.request.urlopen(xmlurl).read()
html = urllib.request.urlopen(htmlurl).read().decode()

Let's examine this information.

In [60]:
print(xml.decode())

<?xml version="1.0" encoding="UTF-8"?>
<card>
  <name>Elisa</name>
  <title>geek in residence</title>
  <email>fake@email.ca</email>
  <logo>https://satuelisa.github.io/favicon-32x32.png</logo>
</card>



We have a name, a job title, an email, and the URL for a company logo. Four pieces of information.
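
As a quick check on that count, the fields can be listed programmatically. The following is a minimal sketch that uses the standard library's `xml.etree.ElementTree` on an inline copy of the card, so it runs without network access. The `card_xml` string and the `fields` name are illustrative and not part of the notebook.

```python
import xml.etree.ElementTree as ET

# Inline copy of the card shown above, so no download is needed.
card_xml = """<card>
  <name>Elisa</name>
  <title>geek in residence</title>
  <email>fake@email.ca</email>
  <logo>https://satuelisa.github.io/favicon-32x32.png</logo>
</card>"""

root = ET.fromstring(card_xml)

# Map each child tag to its text content.
fields = {child.tag: child.text for child in root}

for tag, text in fields.items():
    print(f"{tag}: {text}")
print(len(fields), "fields")
```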

In [61]:
print(html)

<!DOCTYPE html>
<html lang="en">
  <head>
    <link rel="stylesheet"
	  type="text/css"
	  href="https://scs-technology-and-innovation.github.io/courses/DSLP/card.css" />
    <title>Business card</title>
  </head>
  <body>
    <div class="card">
      <div class="name"></div>
      <div class="title"></div>
      <div class="email"></div>
      <div class="logo"></div>
    </div>
  </body>
</html>



Here we clearly have four slots into which these four pieces of information should go. To put them there, we need to *parse* both documents. Luckily, the `lxml` and `BeautifulSoup` libraries we encountered in Module 2 can both handle this. In the examples below, the former is used for the XML and the latter for the HTML, but both libraries can parse both formats.
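
To illustrate that the roles can be swapped, here is a hedged sketch in which `lxml` handles the HTML and `BeautifulSoup` handles the XML. It works on small inline strings (`html_text` and `xml_text` are stand-ins for the downloaded documents) and assumes `lxml` is installed, since BeautifulSoup's `'xml'` parser relies on it.

```python
from bs4 import BeautifulSoup
from lxml import html as lxml_html

# Stand-ins for the downloaded documents (assumptions for illustration).
html_text = '<html><body><div class="card"><div class="name"></div></div></body></html>'
xml_text = '<card><name>Elisa</name></card>'

# BeautifulSoup reading XML (the 'xml' feature is backed by lxml).
xml_soup = BeautifulSoup(xml_text, 'xml')
name = xml_soup.find('name').text

# lxml reading HTML and filling the slot located with XPath.
page = lxml_html.fromstring(html_text)
slot = page.xpath('//div[@class="name"]')[0]
slot.text = name

result = lxml_html.tostring(slot, encoding='unicode')
print(result)
```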

In [62]:
from bs4 import BeautifulSoup

htmldata = BeautifulSoup(html, 'html.parser')
htmldata.find_all('div') # let's see what's inside just the div part that we need to work with

[<div class="card">
 <div class="name"></div>
 <div class="title"></div>
 <div class="email"></div>
 <div class="logo"></div>
 </div>,
 <div class="name"></div>,
 <div class="title"></div>,
 <div class="email"></div>,
 <div class="logo"></div>]

We can access individual fields.

In [63]:
htmldata.select('.name')

[<div class="name"></div>]

We can also modify their content.

In [64]:
substitution = htmldata.new_tag('div', **{'class':'name'})
substitution.string = 'Test'
substitution

<div class="name">Test</div>

Plug it in and check it out.

In [65]:
htmldata.select('.name')[0].replace_with(substitution)
htmldata.select('.name')

[<div class="name">Test</div>]
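
Building a replacement tag is one option. A possible shortcut, sketched here on a tiny stand-in document (the `snippet` string is an assumption for illustration), is to fetch the single match with `select_one` and assign to its `.string`, which keeps the original tag and its attributes in place.

```python
from bs4 import BeautifulSoup

snippet = '<div class="card"><div class="name"></div></div>'
soup = BeautifulSoup(snippet, 'html.parser')

# select_one returns the first match (or None) instead of a list.
slot = soup.select_one('.name')

# Assigning to .string replaces the tag's contents in place.
slot.string = 'Test'

print(soup)
```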

Now we just need the content from the XML so that we can plug it in the same way.

In [66]:
from lxml import etree
xmldata = etree.XML(xml)
etree.tostring(xmldata, pretty_print = True, encoding = 'UTF-8')

b'<card>\n  <name>Elisa</name>\n  <title>geek in residence</title>\n  <email>fake@email.ca</email>\n  <logo>https://satuelisa.github.io/favicon-32x32.png</logo>\n</card>\n'

In [67]:
xmldata.find('name').text

'Elisa'
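
If a card lacked one of the fields, `find` would return `None` and the `.text` access would raise an `AttributeError`. One hedged way around that is `findtext` with a default, shown here with the standard library's `xml.etree.ElementTree`; `lxml.etree` elements offer the same `find` and `findtext` methods. The `phone` field is hypothetical and deliberately absent.

```python
import xml.etree.ElementTree as ET

card = ET.fromstring('<card><name>Elisa</name><email>fake@email.ca</email></card>')

# find returns None for a missing element, so .text cannot be used blindly.
missing = card.find('phone')

# findtext returns the text directly, or the default when nothing matches.
name = card.findtext('name', default='')
phone = card.findtext('phone', default='(not provided)')

print(missing, name, phone)
```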

All set 😸

In [68]:
for target in [ 'name', 'title', 'email' ]:
  s = htmldata.new_tag('div', **{'class': target })
  s.string = xmldata.find(target).text
  htmldata.select('.' + target)[0].replace_with(s)

That should have taken care of the first three.
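
The same loop can be wrapped in a reusable function. This is only a sketch under stated assumptions: `fill_slots` is a hypothetical helper name, the inputs are small inline strings rather than the downloaded files, the standard library's `ElementTree` stands in for `lxml`, and slots or fields that are missing are skipped rather than treated as errors.

```python
import xml.etree.ElementTree as ET
from bs4 import BeautifulSoup

def fill_slots(soup, card, targets):
    """Copy the text of each XML field into the div whose class matches."""
    filled = []
    for target in targets:
        slot = soup.select_one('div.' + target)
        text = card.findtext(target)
        if slot is None or text is None:
            continue  # skip anything absent on either side
        slot.string = text
        filled.append(target)
    return filled

# Inline stand-ins; the email field is left out of the XML on purpose.
card = ET.fromstring('<card><name>Elisa</name><title>geek in residence</title></card>')
soup = BeautifulSoup(
    '<div class="card"><div class="name"></div>'
    '<div class="title"></div><div class="email"></div></div>',
    'html.parser')

done = fill_slots(soup, card, ['name', 'title', 'email'])
print(done)
print(soup.select_one('.email'))
```

Returning the list of filled targets makes it easy to notice when a slot was left empty.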

In [69]:
htmldata

<!DOCTYPE html>

<html lang="en">
<head>
<link href="https://scs-technology-and-innovation.github.io/courses/DSLP/card.css" rel="stylesheet" type="text/css"/>
<title>Business card</title>
</head>
<body>
<div class="card">
<div class="name">Elisa</div>
<div class="title">geek in residence</div>
<div class="email">fake@email.ca</div>
<div class="logo"></div>
</div>
</body>
</html>

For the image, we have to insert HTML that expresses an `img` with a specific URL as its `src` and the desired `width`. This `img` is a tag of its own that needs to go inside the `div` for `logo`, nested as a *child* tag.

In [70]:
url = xmldata.find('logo').text
url

'https://satuelisa.github.io/favicon-32x32.png'

In [71]:
sdiv = htmldata.new_tag('div', **{'class': 'logo' })
simg = htmldata.new_tag('img', **{'src': url, 'width': 250 })
sdiv.append(simg) # nest
sdiv

<div class="logo"><img src="https://satuelisa.github.io/favicon-32x32.png" width="250"/></div>
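
An alternative to rebuilding the `div`, sketched here under the assumption that the slot already exists in the document, is to append the `img` to it directly. The `alt` attribute is an addition for illustration (it is not present in the notebook), and the `snippet` string is a stand-in for the full page.

```python
from bs4 import BeautifulSoup

snippet = '<div class="card"><div class="logo"></div></div>'
soup = BeautifulSoup(snippet, 'html.parser')
url = 'https://satuelisa.github.io/favicon-32x32.png'

# Create the img tag; attribute values are passed as strings.
img = soup.new_tag('img', src=url, width='250', alt='Company logo')

# Append it as a child of the existing logo div.
logo = soup.select_one('.logo')
logo.append(img)

print(logo.img['src'])
print(len(logo.find_all('img')))
```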

In [72]:
htmldata.select('.logo')[0].replace_with(sdiv)
htmldata

<!DOCTYPE html>

<html lang="en">
<head>
<link href="https://scs-technology-and-innovation.github.io/courses/DSLP/card.css" rel="stylesheet" type="text/css"/>
<title>Business card</title>
</head>
<body>
<div class="card">
<div class="name">Elisa</div>
<div class="title">geek in residence</div>
<div class="email">fake@email.ca</div>
<div class="logo"><img src="https://satuelisa.github.io/favicon-32x32.png" width="250"/></div>
</div>
</body>
</html>

Let's put this in a file that we can download and view in a web browser.

In [73]:
with open('ourcard.html', 'w') as destination:
  print(htmldata, file = destination)
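
When writing the file, it can help to state the encoding explicitly so that non-ASCII names survive regardless of the platform default. A minimal sketch, assuming the page has already been turned into a string with `str(htmldata)` (a short literal `page` stands in for it here) and writing to a temporary directory instead of the notebook's working directory:

```python
import tempfile
from pathlib import Path

# Stand-in for str(htmldata); the accented name is only for illustration.
page = '<div class="name">Élisa</div>'

with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / 'ourcard.html'
    # An explicit encoding avoids relying on the platform default.
    target.write_text(page, encoding='utf-8')
    restored = target.read_text(encoding='utf-8')

print(restored == page)
```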

There's a file on Colab now. How do we get our hands on it without clicking around with a mouse? True professionals just use the keyboard 😉

In [74]:
from google.colab import files

files.download('ourcard.html')

Click the download notification in your browser to view the generated HTML file. If you do not see one, check your downloads folder.
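
Outside Colab the `google.colab` module is not available, so the import would fail. One hedged way to keep the notebook runnable elsewhere is to guard the import; `offer_download` is a hypothetical helper name.

```python
def offer_download(path):
    """Trigger a Colab download if possible; return True when attempted."""
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        print(f'Not running in Colab; the file is at {path}')
        return False
    files.download(path)
    return True

attempted = offer_download('ourcard.html')
```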