<div style="color:#800">Warning: GitHub renders this notebook incorrectly, because the XML tags are interpreted instead of being displayed. The .ipynb file in the GitHub repository is therefore meant to be downloaded, not viewed in place. For an HTML version, use:
http://dmolinarius.github.io/demofiles/mod-32/07_NoSQL/04_PostgreSQL_XML.html
</div>

## PostgreSQL and XML

In [1]:
import psycopg2
import psycopg2.extras

conn = psycopg2.connect(
    host="localhost",
    database="demo",
    user="demo_owner",
    password="OODBMS")

# DictCursor rows can be read by column name (row['id']) as well as by position
c = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)

<h3 id="data-table-from-XML">Populating a table from an XML document</h3>

<div style="font-size:120%">
Source XML document
</div>

In [2]:
document = '''<?xml version="1.0" ?>
<people>
 <person>
   <first_name>Raymond</first_name>
   <last_name>Deubaze</last_name>
   <email>raymond.deubaze@ec-lyon.fr</email>
 </person>
 <person>
   <first_name>Jean</first_name>
   <last_name>Peuplu</last_name>
   <email>jpu@gmail.com</email>
 </person>
 <person>
   <first_name>Alex</first_name>
   <last_name>Terrieur</last_name>
   <email>atr1@gmail.com</email>
 </person>
 <person>
   <first_name>Alain</first_name>
   <last_name>Terrieur</last_name>
   <email>atr2@gmail.com</email>
 </person>
  <person>
   <first_name>Anna</first_name>
   <last_name>Conda</last_name>
   <email>Anna.Conda@ec-lyon.fr</email>
 </person>
 <person>
   <first_name>Ginette</first_name>
   <last_name>Ringard</last_name>
   <email>ginette@wanadoo.fr</email>
 </person>
</people>'''

<p style="font-size:120%">
Creating the destination table
</p>

In [3]:
c.execute('DROP TABLE IF EXISTS people')

sql = '''CREATE TABLE people (
  id         SERIAL PRIMARY KEY,
  first_name TEXT,
  last_name  TEXT,
  email      TEXT,
  centralien BOOLEAN
);'''
c.execute(sql)

conn.commit()

<div style="font-size:120%">
Syntax for creating a value of type XML
</div>

In [4]:
sql = '''SELECT XMLPARSE( DOCUMENT %s ) AS data'''
c.execute(sql, (document,))
print(c.fetchone())

['<people>\n <person>\n   <first_name>Raymond</first_name>\n   <last_name>Deubaze</last_name>\n   <email>raymond.deubaze@ec-lyon.fr</email>\n </person>\n <person>\n   <first_name>Jean</first_name>\n   <last_name>Peuplu</last_name>\n   <email>jpu@gmail.com</email>\n </person>\n <person>\n   <first_name>Alex</first_name>\n   <last_name>Terrieur</last_name>\n   <email>atr1@gmail.com</email>\n </person>\n <person>\n   <first_name>Alain</first_name>\n   <last_name>Terrieur</last_name>\n   <email>atr2@gmail.com</email>\n </person>\n  <person>\n   <first_name>Anna</first_name>\n   <last_name>Conda</last_name>\n   <email>Anna.Conda@ec-lyon.fr</email>\n </person>\n <person>\n   <first_name>Ginette</first_name>\n   <last_name>Ringard</last_name>\n   <email>ginette@wanadoo.fr</email>\n </person>\n</people>']


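<div style="font-size:120%;margin-bottom:0.66em">
Building a table from an XML value<br>
Note the XPath expressions: <code>'//people/person'</code> produces one row per person, and <code>contains(email,"ec-lyon")</code> computes the <code>centralien</code> column. Columns declared without a PATH use their own name as the path.
</div>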
https://www.postgresql.org/docs/current/functions-xml.html#FUNCTIONS-XML-PROCESSING-XMLTABLE

In [5]:
sql = '''
SELECT xmltable.* FROM
  (SELECT xml %s AS data) AS data,
  XMLTABLE( '//people/person'
    PASSING data
    COLUMNS
      id FOR ORDINALITY,
      "first_name" TEXT,
      "last_name" TEXT,
      "email" TEXT,
      centralien BOOLEAN PATH 'contains(email,"ec-lyon")'
  );'''
c.execute(sql, (document,))

print(c.fetchall())

[[1, 'Raymond', 'Deubaze', 'raymond.deubaze@ec-lyon.fr', True], [2, 'Jean', 'Peuplu', 'jpu@gmail.com', False], [3, 'Alex', 'Terrieur', 'atr1@gmail.com', False], [4, 'Alain', 'Terrieur', 'atr2@gmail.com', False], [5, 'Anna', 'Conda', 'Anna.Conda@ec-lyon.fr', True], [6, 'Ginette', 'Ringard', 'ginette@wanadoo.fr', False]]
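To see what the XMLTABLE call does row by row without a PostgreSQL server, here is a rough client-side analogue written with Python's standard `xml.etree.ElementTree`. It is only a sketch. It assumes a trimmed two-person copy of the document, and `DOCUMENT` and `xmltable_people` are hypothetical names. `enumerate(..., start=1)` plays the role of `FOR ORDINALITY`, and a substring test stands in for the XPath `contains()` call. ElementTree supports only a subset of XPath, so this is not what PostgreSQL runs internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed copy of the notebook's <people> document
DOCUMENT = """<?xml version="1.0" ?>
<people>
 <person>
   <first_name>Raymond</first_name>
   <last_name>Deubaze</last_name>
   <email>raymond.deubaze@ec-lyon.fr</email>
 </person>
 <person>
   <first_name>Jean</first_name>
   <last_name>Peuplu</last_name>
   <email>jpu@gmail.com</email>
 </person>
</people>"""


def xmltable_people(document):
    """Hypothetical helper: one (id, first_name, last_name, email, centralien) tuple per <person>."""
    root = ET.fromstring(document)  # root is the <people> element
    rows = []
    # findall("person") on <people> selects what '//people/person' selects here;
    # enumerate(start=1) mimics the 1-based FOR ORDINALITY column
    for ordinality, person in enumerate(root.findall("person"), start=1):
        email = person.findtext("email") or ""
        rows.append((
            ordinality,
            person.findtext("first_name"),
            person.findtext("last_name"),
            email,
            "ec-lyon" in email,  # stands in for contains(email, "ec-lyon")
        ))
    return rows


for row in xmltable_people(DOCUMENT):
    print(row)
```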


<div id="feed-table" style="font-size:120%">
Populating the people table with the data from the XML document:
</div>

In [6]:
sql = '''
INSERT INTO people (id, first_name, last_name, email, centralien) 
  SELECT xmltable.* FROM
    (SELECT xml %s AS data) AS data,
    XMLTABLE( '//people/person'
      PASSING data
      COLUMNS
        id FOR ORDINALITY,
        "first_name" TEXT,
        "last_name" TEXT,
        "email" TEXT,
        centralien BOOLEAN PATH 'contains(email,"ec-lyon")'
    )
;
'''
c.execute(sql, (document,))
conn.commit()

In [7]:
c.execute('SELECT * FROM people')
for p in c.fetchall():
    print('[{}] {:<7} {:<8} {:<26} {}'.format(*p))

[1] Raymond Deubaze  raymond.deubaze@ec-lyon.fr True
[2] Jean    Peuplu   jpu@gmail.com              False
[3] Alex    Terrieur atr1@gmail.com             False
[4] Alain   Terrieur atr2@gmail.com             False
[5] Anna    Conda    Anna.Conda@ec-lyon.fr      True
[6] Ginette Ringard  ginette@wanadoo.fr         False
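The INSERT ... SELECT above does all the XML work on the server. An alternative is to parse the document in Python and send plain rows. The sketch below takes this client-side approach. It uses the standard-library `sqlite3` module with an in-memory database only so that it runs without a PostgreSQL server. With psycopg2 the same `executemany` call applies, but the placeholders are written `%s` rather than `?`. The sample data and the `person_rows` helper are hypothetical.

The sketch also lets the database assign `id`. In PostgreSQL, inserting explicit values into a SERIAL column, as the cell above does with `FOR ORDINALITY`, does not advance the underlying sequence. A later insert without an explicit `id` could therefore collide with an existing row.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document (two people from the notebook's data)
DOCUMENT = """<people>
 <person><first_name>Anna</first_name><last_name>Conda</last_name><email>Anna.Conda@ec-lyon.fr</email></person>
 <person><first_name>Ginette</first_name><last_name>Ringard</last_name><email>ginette@wanadoo.fr</email></person>
</people>"""


def person_rows(document):
    """Hypothetical helper: yield (first_name, last_name, email, centralien) per <person>."""
    root = ET.fromstring(document)
    for person in root.findall("person"):
        email = person.findtext("email") or ""
        yield (person.findtext("first_name"), person.findtext("last_name"),
               email, "ec-lyon" in email)


# Separate name so the notebook's psycopg2 `conn` is not overwritten
sqlite_conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY is SQLite's auto-assigned key, standing in for SERIAL
sqlite_conn.execute("""CREATE TABLE people (
    id INTEGER PRIMARY KEY,
    first_name TEXT, last_name TEXT, email TEXT, centralien BOOLEAN)""")
sqlite_conn.executemany(
    "INSERT INTO people (first_name, last_name, email, centralien) VALUES (?, ?, ?, ?)",
    person_rows(DOCUMENT))
sqlite_conn.commit()

rows = sqlite_conn.execute("SELECT * FROM people ORDER BY id").fetchall()
for row in rows:
    print(row)  # SQLite stores the booleans as 1 / 0
```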


<div id="XML-data-from-XML"></div>
<h3>Populating a table with XML data from an XML document</h3>

In [8]:
c.execute('DROP TABLE IF EXISTS xml_people')

sql = '''CREATE TABLE xml_people (
  id         SERIAL PRIMARY KEY,
  data       XML,
  centralien BOOLEAN
);'''
c.execute(sql)

conn.commit()

<div id="feed-XML-table" style="font-size:120%">
Populating the table
</div>

In [9]:
sql = '''
INSERT INTO xml_people (data, centralien) 
  SELECT xmltable.* FROM
    (SELECT xml %s AS data) AS data,
    XMLTABLE( '//people/person'
      PASSING data
      COLUMNS
        data XML PATH '.',
        centralien BOOLEAN PATH 'contains(email,"ec-lyon")'
    )
;
'''
c.execute(sql, (document,))
conn.commit()

In [10]:
import re

c.execute('SELECT * FROM xml_people')
for p in c.fetchall():
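    # drop newlines and indentation so each record prints on one line
    # (this also removes spaces inside values, harmless for this data set)
    print('[{}] {:<5} {}'.format(p['id'],str(p['centralien']),re.sub(r'[\n ]+','',p['data'])))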

[1] True  <person><first_name>Raymond</first_name><last_name>Deubaze</last_name><email>raymond.deubaze@ec-lyon.fr</email></person>
[2] False <person><first_name>Jean</first_name><last_name>Peuplu</last_name><email>jpu@gmail.com</email></person>
[3] False <person><first_name>Alex</first_name><last_name>Terrieur</last_name><email>atr1@gmail.com</email></person>
[4] False <person><first_name>Alain</first_name><last_name>Terrieur</last_name><email>atr2@gmail.com</email></person>
[5] True  <person><first_name>Anna</first_name><last_name>Conda</last_name><email>Anna.Conda@ec-lyon.fr</email></person>
[6] False <person><first_name>Ginette</first_name><last_name>Ringard</last_name><email>ginette@wanadoo.fr</email></person>
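The `data XML PATH '.'` column keeps each matched `<person>` node as an XML fragment. The following is a rough Python analogue. It uses `xml.etree.ElementTree` and a hypothetical two-person sample, and serializes each element on its own through a hypothetical `person_fragments` helper. One ElementTree detail matters here: `ET.tostring` also writes the text that follows an element (its `tail`), so the sketch clears it first.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, written without indentation inside <person>
DOCUMENT = """<people>
 <person><first_name>Raymond</first_name><last_name>Deubaze</last_name><email>raymond.deubaze@ec-lyon.fr</email></person>
 <person><first_name>Jean</first_name><last_name>Peuplu</last_name><email>jpu@gmail.com</email></person>
</people>"""


def person_fragments(document):
    """Hypothetical helper: (xml_fragment, centralien) pairs, one per <person>."""
    root = ET.fromstring(document)
    result = []
    for person in root.findall("person"):
        person.tail = None  # do not serialize the whitespace that follows </person>
        email = person.findtext("email") or ""
        result.append((ET.tostring(person, encoding="unicode"), "ec-lyon" in email))
    return result


for fragment, centralien in person_fragments(DOCUMENT):
    print(centralien, fragment)
```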


<div id="insert-XML" style="font-size:120%">
Inserting an XML content value (XMLPARSE with CONTENT instead of DOCUMENT)
</div>

In [11]:
sql = 'INSERT INTO xml_people (data) VALUES (XMLPARSE( CONTENT %s ))'
args = ('<person><first_name>Ella</first_name><last_name>Ducran</last_name><email>edn@gmail.com</email></person>',)
c.execute(sql,args)
conn.commit()
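XMLPARSE(DOCUMENT ...) expects a complete document with a single root element. XMLPARSE(CONTENT ...) also accepts a fragment, such as several top-level elements or bare text. The Python sketch below approximates this difference with `xml.etree.ElementTree`. It assumes that wrapping a fragment in a dummy root element is a fair proxy for the content check. PostgreSQL's actual rules (for instance around an XML declaration inside content) may differ. `is_document` and `is_content` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


# Hypothetical helpers sketching the DOCUMENT / CONTENT distinction
def is_document(text):
    """True if text parses as a well-formed XML document with a single root."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_content(text):
    """Approximate content check: accept a forest of nodes by adding a dummy root."""
    return is_document("<wrapper>" + text + "</wrapper>")


single = "<person><first_name>Ella</first_name></person>"
forest = "<person/><person/>"
print(is_document(single), is_content(single))  # True True
print(is_document(forest), is_content(forest))  # False True
```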

<div id="XML-Xpath-request"></div>
<h3>Querying an XML field</h3>

<div style="font-size:120%">
In SQL, the result of an XPath query is a sequence of elements or atomic values, of type XML[] (an array of xml values)
</div>

In [12]:
c.execute('''SELECT XPATH( '//email' , data), centralien FROM xml_people''')
for r in c.fetchall():
    print(r)

['{<email>raymond.deubaze@ec-lyon.fr</email>}', True]
['{<email>jpu@gmail.com</email>}', False]
['{<email>atr1@gmail.com</email>}', False]
['{<email>atr2@gmail.com</email>}', False]
['{<email>Anna.Conda@ec-lyon.fr</email>}', True]
['{<email>ginette@wanadoo.fr</email>}', False]
['{<email>edn@gmail.com</email>}', None]
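As the output shows, XPATH returns an array even when there is exactly one match, and psycopg2 returns that xml[] here in its text form `'{...}'`. By analogy, the closest ElementTree counterpart is `findall`, which always returns a list, possibly an empty one. ElementTree implements only a subset of XPath (for example `.//email` rather than `//email` relative to the parsed element), so this is an illustration, not a substitute. `xpath_list` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hypothetical row value, shaped like the xml_people.data column
ROW = ("<person><first_name>Ella</first_name><last_name>Ducran</last_name>"
       "<email>edn@gmail.com</email></person>")


def xpath_list(fragment, path):
    """Hypothetical helper: every match of an ElementTree path, serialized as XML."""
    matches = ET.fromstring(fragment).findall(path)
    for element in matches:
        element.tail = None  # serialize the node only, not any text after it
    return [ET.tostring(element, encoding="unicode") for element in matches]


print(xpath_list(ROW, ".//email"))  # one match, still a list
print(xpath_list(ROW, ".//phone"))  # no match: empty list
```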


<div id="XMLSERIALIZE" style="font-size:120%">
To get plain text, take the first element of the sequence (SQL arrays are 1-based, hence <code>[1]</code>) and convert it with XMLSERIALIZE:
</div>

In [13]:
c.execute('''SELECT
  XMLSERIALIZE( CONTENT (XPATH( '//email/text()' , data))[1] AS TEXT),
  centralien FROM xml_people
''')
for r in c.fetchall():
    print(r)

['raymond.deubaze@ec-lyon.fr', True]
['jpu@gmail.com', False]
['atr1@gmail.com', False]
['atr2@gmail.com', False]
['Anna.Conda@ec-lyon.fr', True]
['ginette@wanadoo.fr', False]
['edn@gmail.com', None]
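Two details of `(XPATH(...))[1]` are easy to miss from Python. SQL arrays are indexed from 1, not 0. A subscript past the end of an array yields NULL rather than raising an error. The hypothetical `first_text` helper below mirrors both points with ElementTree by returning the text of the first match, or `None`. It is a sketch of the behaviour, not of how PostgreSQL evaluates the query.

```python
import xml.etree.ElementTree as ET

# Hypothetical row value, shaped like the xml_people.data column
ROW = ("<person><first_name>Ella</first_name><last_name>Ducran</last_name>"
       "<email>edn@gmail.com</email></person>")


def first_text(fragment, path):
    """Hypothetical helper: text of the first match, or None when nothing matches.

    SQL's (XPATH(...))[1] corresponds to Python's matches[0]; an out-of-range
    SQL subscript gives NULL, represented here by None.
    """
    matches = ET.fromstring(fragment).findall(path)
    return matches[0].text if matches else None


print(first_text(ROW, ".//email"))  # edn@gmail.com
print(first_text(ROW, ".//phone"))  # None
```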


<div style="font-size:120%">
Even when the XPath expression returns a number or a boolean, SQL sees the result as a sequence of type XML[]:
</div>

In [14]:
c.execute('''SELECT XPATH('//first_name="Ella"', data) FROM xml_people''')
for r in c.fetchall():
    print(r)

['{false}']
['{false}']
['{false}']
['{false}']
['{false}']
['{false}']
['{true}']


<div id="convert-boolean" style="font-size:120%">
To get a number or a boolean, first convert the result to text, then cast it to the desired type:
</div>

In [15]:
c.execute('''SELECT XMLSERIALIZE(CONTENT (XPATH('//first_name="Ella"', data))[1] AS TEXT)::BOOLEAN FROM xml_people''')
for r in c.fetchall():
    print(r)

[False]
[False]
[False]
[False]
[False]
[False]
[True]
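The test `//first_name="Ella"` relies on an XPath 1.0 rule: comparing a node-set with a string is true if at least one node in the set has that string value. The sketch below reproduces that existential comparison with ElementTree. It then turns the serialized result (`'true'` or `'false'`) into a Python bool, as the `::BOOLEAN` cast does for those two spellings. Both helpers are hypothetical names, and the sketch handles only the two spellings XPath produces.

```python
import xml.etree.ElementTree as ET


def xpath_equals(fragment, tag, value):
    """Hypothetical helper for //tag="value": true if ANY matching node has that string value."""
    root = ET.fromstring(fragment)
    matched = any("".join(e.itertext()) == value for e in root.iter(tag))
    return "true" if matched else "false"  # XPath booleans serialize as true/false


def to_sql_boolean(text):
    """Hypothetical minimal stand-in for ::BOOLEAN, limited to the two XPath spellings."""
    return {"true": True, "false": False}[text]


ella = "<person><first_name>Ella</first_name></person>"
jean = "<person><first_name>Jean</first_name></person>"
pair = "<pair><first_name>Jean</first_name><first_name>Ella</first_name></pair>"
for fragment in (ella, jean, pair):
    print(to_sql_boolean(xpath_equals(fragment, "first_name", "Ella")))
```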


<div id="where-XML-boolean" style="font-size:120%">
This makes the result usable in a WHERE clause:
</div>

In [16]:
c.execute('''
SELECT * FROM xml_people WHERE
  XMLSERIALIZE (
    CONTENT
      (
        XPATH('//first_name="Ella"', data)
      )[1]
    AS TEXT
  )::BOOLEAN = true
;
''')
for r in c.fetchall():
    print(r)

[7, '<person><first_name>Ella</first_name><last_name>Ducran</last_name><email>edn@gmail.com</email></person>', None]


In [17]:
c.execute('''
UPDATE xml_people SET centralien=False WHERE
  XMLSERIALIZE (
    CONTENT
      (
        XPATH('//first_name="Ella"', data)
      )[1]
    AS TEXT
  )::BOOLEAN = true
;
''')
conn.commit()

In [18]:
c.execute('''SELECT
  XMLSERIALIZE( CONTENT (XPATH( '//first_name/text()' , data))[1] AS TEXT),
  XMLSERIALIZE( CONTENT (XPATH( '//last_name/text()' , data))[1] AS TEXT),
  XMLSERIALIZE( CONTENT (XPATH( '//email/text()' , data))[1] AS TEXT),
  centralien FROM xml_people
''')
for r in c.fetchall():
    print(r)

['Raymond', 'Deubaze', 'raymond.deubaze@ec-lyon.fr', True]
['Jean', 'Peuplu', 'jpu@gmail.com', False]
['Alex', 'Terrieur', 'atr1@gmail.com', False]
['Alain', 'Terrieur', 'atr2@gmail.com', False]
['Anna', 'Conda', 'Anna.Conda@ec-lyon.fr', True]
['Ginette', 'Ringard', 'ginette@wanadoo.fr', False]
['Ella', 'Ducran', 'edn@gmail.com', False]


<div id="generate-XML"></div>
<h3 style="margin-bottom:1em">Generating XML content</h3>
<div style="font-size:120%">
Generating a sequence of elements from a table row
</div>

In [19]:
c.execute('SELECT XMLFOREST(first_name, last_name, email) FROM people')
for r in c.fetchall():
    print(r[0])

<first_name>Raymond</first_name><last_name>Deubaze</last_name><email>raymond.deubaze@ec-lyon.fr</email>
<first_name>Jean</first_name><last_name>Peuplu</last_name><email>jpu@gmail.com</email>
<first_name>Alex</first_name><last_name>Terrieur</last_name><email>atr1@gmail.com</email>
<first_name>Alain</first_name><last_name>Terrieur</last_name><email>atr2@gmail.com</email>
<first_name>Anna</first_name><last_name>Conda</last_name><email>Anna.Conda@ec-lyon.fr</email>
<first_name>Ginette</first_name><last_name>Ringard</last_name><email>ginette@wanadoo.fr</email>
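XMLFOREST turns each listed column into an element named after that column and places the results side by side, with no enclosing root. Below is a rough Python analogue using ElementTree. The `xmlforest` helper is hypothetical, and keyword arguments stand in for the column list. Skipping `None` values reflects an assumption about how XMLFOREST treats NULL columns; check the PostgreSQL documentation. ElementTree also escapes characters such as `&`, as PostgreSQL does for content that is not already of type xml.

```python
import xml.etree.ElementTree as ET


def xmlforest(**columns):
    """Hypothetical helper: serialize each keyword argument as <name>value</name>, in order."""
    parts = []
    for name, value in columns.items():
        if value is None:  # assumption: NULL columns produce no element
            continue
        element = ET.Element(name)
        element.text = str(value)
        parts.append(ET.tostring(element, encoding="unicode"))
    return "".join(parts)


print(xmlforest(first_name="Jean", last_name="Peuplu", email="jpu@gmail.com"))
print(xmlforest(first_name="Tom & Jerry", email=None))
```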


<div style="font-size:120%">
Wrapping each row in an element with attributes
</div>

In [20]:
sql = '''
SELECT XMLELEMENT(
  NAME person,
  XMLATTRIBUTES(centralien AS ecl),
  XMLFOREST(first_name, last_name, email)
) FROM people
'''
c.execute(sql)
for r in c.fetchall():
    print(r[0])

<person ecl="true"><first_name>Raymond</first_name><last_name>Deubaze</last_name><email>raymond.deubaze@ec-lyon.fr</email></person>
<person ecl="false"><first_name>Jean</first_name><last_name>Peuplu</last_name><email>jpu@gmail.com</email></person>
<person ecl="false"><first_name>Alex</first_name><last_name>Terrieur</last_name><email>atr1@gmail.com</email></person>
<person ecl="false"><first_name>Alain</first_name><last_name>Terrieur</last_name><email>atr2@gmail.com</email></person>
<person ecl="true"><first_name>Anna</first_name><last_name>Conda</last_name><email>Anna.Conda@ec-lyon.fr</email></person>
<person ecl="false"><first_name>Ginette</first_name><last_name>Ringard</last_name><email>ginette@wanadoo.fr</email></person>
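XMLELEMENT creates the enclosing element, XMLATTRIBUTES adds attributes to it, and the XMLFOREST result becomes its children. As the output shows, the boolean column is rendered as `"true"` or `"false"`. The ElementTree sketch below builds the same shape for one row. `person_element` is a hypothetical helper. Omitting the attribute when the value is `None` is an assumption about how a NULL attribute value is handled.

```python
import xml.etree.ElementTree as ET


def person_element(first_name, last_name, email, centralien):
    """Hypothetical helper: build <person ecl="..."> with one child element per column."""
    attributes = {}
    if centralien is not None:  # assumption: a NULL attribute value is left out
        attributes["ecl"] = "true" if centralien else "false"
    person = ET.Element("person", attributes)
    for name, value in (("first_name", first_name),
                        ("last_name", last_name),
                        ("email", email)):
        ET.SubElement(person, name).text = value
    return ET.tostring(person, encoding="unicode")


print(person_element("Jean", "Peuplu", "jpu@gmail.com", False))
```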


<div style="font-size:120%">
Aggregating the table rows
</div>

In [21]:
sql = '''
SELECT XMLAGG(
  XMLELEMENT(
    NAME person,
    XMLATTRIBUTES(centralien AS ecl),
    XMLFOREST(first_name, last_name, email)
  )
) FROM people
'''
c.execute(sql)
for r in c.fetchall():
    print(r[0])

<person ecl="true"><first_name>Raymond</first_name><last_name>Deubaze</last_name><email>raymond.deubaze@ec-lyon.fr</email></person><person ecl="false"><first_name>Jean</first_name><last_name>Peuplu</last_name><email>jpu@gmail.com</email></person><person ecl="false"><first_name>Alex</first_name><last_name>Terrieur</last_name><email>atr1@gmail.com</email></person><person ecl="false"><first_name>Alain</first_name><last_name>Terrieur</last_name><email>atr2@gmail.com</email></person><person ecl="true"><first_name>Anna</first_name><last_name>Conda</last_name><email>Anna.Conda@ec-lyon.fr</email></person><person ecl="false"><first_name>Ginette</first_name><last_name>Ringard</last_name><email>ginette@wanadoo.fr</email></person>
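XMLAGG is an aggregate: it concatenates the per-row XML values into a single value. The result above is therefore a forest of `<person>` elements, not yet a well-formed document. Without an ORDER BY, the row order is not guaranteed. PostgreSQL accepts an ORDER BY inside the aggregate call, as in `XMLAGG(... ORDER BY id)`. The sketch below mimics this with a hypothetical `xmlagg` helper over hypothetical rows, using ElementTree to build each row's element.

```python
import xml.etree.ElementTree as ET

# Hypothetical (id, first_name, centralien) rows, deliberately out of order
ROWS = [(2, "Jean", False), (1, "Raymond", True)]


def row_to_xml(row):
    """Hypothetical helper: one <person> element per row."""
    _id, first_name, centralien = row
    person = ET.Element("person", ecl="true" if centralien else "false")
    ET.SubElement(person, "first_name").text = first_name
    return ET.tostring(person, encoding="unicode")


def xmlagg(rows, order_by=None):
    """Hypothetical helper: concatenate one XML fragment per row, optionally sorted first."""
    if order_by is not None:
        rows = sorted(rows, key=order_by)
    return "".join(row_to_xml(row) for row in rows)


forest = xmlagg(ROWS, order_by=lambda row: row[0])  # like XMLAGG(... ORDER BY id)
print(forest)
```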


<div style="font-size:120%">
Wrapping in a root element
</div>

In [22]:
sql = '''
SELECT XMLELEMENT(
  NAME people,
  XMLAGG(
    XMLELEMENT(
      NAME person,
      XMLATTRIBUTES(centralien AS ecl),
      XMLFOREST(first_name, last_name, email)
    )
  )
) FROM people
'''
c.execute(sql)
for r in c.fetchall():
    print(r[0])

<people><person ecl="true"><first_name>Raymond</first_name><last_name>Deubaze</last_name><email>raymond.deubaze@ec-lyon.fr</email></person><person ecl="false"><first_name>Jean</first_name><last_name>Peuplu</last_name><email>jpu@gmail.com</email></person><person ecl="false"><first_name>Alex</first_name><last_name>Terrieur</last_name><email>atr1@gmail.com</email></person><person ecl="false"><first_name>Alain</first_name><last_name>Terrieur</last_name><email>atr2@gmail.com</email></person><person ecl="true"><first_name>Anna</first_name><last_name>Conda</last_name><email>Anna.Conda@ec-lyon.fr</email></person><person ecl="false"><first_name>Ginette</first_name><last_name>Ringard</last_name><email>ginette@wanadoo.fr</email></person></people>
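Wrapping the aggregate in one more XMLELEMENT gives the forest a single root, which turns it into a well-formed document. The hypothetical `wrap_in_root` helper below does the same on a string. It checks the result with ElementTree, which raises `ParseError` on a forest with several top-level elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical forest, shaped like the XMLAGG output above
FOREST = ('<person ecl="true"><first_name>Raymond</first_name></person>'
          '<person ecl="false"><first_name>Jean</first_name></person>')


def wrap_in_root(forest, root_name="people"):
    """Hypothetical helper: enclose a forest of elements in a single root element."""
    wrapped_doc = f"<{root_name}>{forest}</{root_name}>"
    ET.fromstring(wrapped_doc)  # raises ET.ParseError if the result is not well-formed
    return wrapped_doc


wrapped = wrap_in_root(FOREST)
root = ET.fromstring(wrapped)
print(root.tag, len(root))  # people 2
```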


<div style="font-size:120%">
Adding the XML declaration
</div>

In [23]:
sql = '''
SELECT XMLROOT(
  XMLELEMENT(
    NAME people,
    XMLAGG(
      XMLELEMENT(
        NAME person,
        XMLATTRIBUTES(centralien AS ecl),
        XMLFOREST(first_name, last_name, email)
      )
    )
  ),
  VERSION '1.0',
  STANDALONE yes
) FROM people
'''
c.execute(sql)
for r in c.fetchall():
    print(r[0])

<?xml version="1.0" standalone="yes"?><people><person ecl="true"><first_name>Raymond</first_name><last_name>Deubaze</last_name><email>raymond.deubaze@ec-lyon.fr</email></person><person ecl="false"><first_name>Jean</first_name><last_name>Peuplu</last_name><email>jpu@gmail.com</email></person><person ecl="false"><first_name>Alex</first_name><last_name>Terrieur</last_name><email>atr1@gmail.com</email></person><person ecl="false"><first_name>Alain</first_name><last_name>Terrieur</last_name><email>atr2@gmail.com</email></person><person ecl="true"><first_name>Anna</first_name><last_name>Conda</last_name><email>Anna.Conda@ec-lyon.fr</email></person><person ecl="false"><first_name>Ginette</first_name><last_name>Ringard</last_name><email>ginette@wanadoo.fr</email></person></people>
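XMLROOT alters the version and standalone properties of the root node of an XML value, and these appear as the `<?xml ...?>` declaration in the output. The hypothetical `xmlroot` helper below replaces any existing declaration and prefixes a new one to a serialized document. It builds the string by hand rather than through ElementTree, so that the quoting matches the output above.

```python
def xmlroot(document, version="1.0", standalone=None):
    """Hypothetical helper: (re)write the XML declaration with version and optional standalone."""
    if document.startswith("<?xml"):
        # drop an existing declaration, since XMLROOT replaces its values
        document = document[document.index("?>") + 2:]
    declaration = f'<?xml version="{version}"'
    if standalone is not None:
        value = "yes" if standalone else "no"
        declaration += f' standalone="{value}"'
    return declaration + "?>" + document


print(xmlroot("<people><person/></people>", standalone=True))
```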
