# All for the music

You have a bilingual music glossary in TSV (*tab-separated values*) format that you want to convert to TBX.

First, load the file into a variable named `rows`:

In [None]:
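import csv

# newline='' is what the csv module recommends; UTF-8 is assumed for this file
with open('../files/music-glossary.tsv', newline='', encoding='utf-8') as csvfile:
    # column names are supplied explicitly, so the first line is read as data
    fieldnames = ['term', 'preferred', 'alternates']
    reader = csv.DictReader(csvfile, delimiter='\t', fieldnames=fieldnames)

    # one dictionary per record
    rows = list(reader)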

Each record contains three cells: `term`, `preferred` and `alternates`. The `preferred` cell gives the translation to favor, while the `alternates` cell offers variants.
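
To make the shape of a record concrete, here is a minimal sketch that runs the same `csv.DictReader` call on an in-memory sample. The three sample lines and the names `sample` and `sample_rows` are made up for illustration; the real glossary's content may differ.

```python
import csv
import io

# Hypothetical sample, not taken from the real glossary:
# line 1 has an empty third cell, line 2 has all three cells,
# line 3 stops after the second cell.
sample = "guitare\tguitar\t\nbatterie\tdrums\tdrum kit\npiano\tpiano\n"

fieldnames = ['term', 'preferred', 'alternates']
reader = csv.DictReader(io.StringIO(sample), delimiter='\t', fieldnames=fieldnames)
sample_rows = list(reader)

for row in sample_rows:
    # repr() makes the difference between '' and None visible
    print(row['term'], repr(row['preferred']), repr(row['alternates']))
```

An empty cell comes back as an empty string, while a cell that is missing altogether comes back as `None`. Both are falsy in Python, so a simple truthiness test is enough to skip them when building the TBX file.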

Once you have selected [the appropriate validation schema](https://github.com/LTAC-Global) for your TBX file, write the program that converts the file:

In [None]:
# your code here

# a more suitable structure to analyse

glossary = list()

for row in rows:
    glossary.append({
        'fr': {
            'preferred': row['term']
        },
        'en': {
            'preferred': row['preferred'],
            'admitted': row['alternates']
        }
    })

In [None]:
# XML module
import xml.etree.ElementTree as ET

# open new file
with open('../files/glossary.tbx', 'wb') as xmlfile:
    
    # prologue
    xmlfile.write(b'<?xml version="1.0" encoding="utf-8"?>\n')
    
    # schema association: a RELAX NG schema referenced through xml-model (not a DTD)
    xmlfile.write(b'<?xml-model href="./TBXcoreStructV03_TBX-Min_integrated.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>\n')

    # root element to tree structure
    root = ET.Element('tbx', {
        'type': 'TBX-Min',
        'style': 'dca',
        'xml:lang': 'en',
        'xmlns': 'urn:iso:std:iso:30042:ed-2'
    })
    tree = ET.ElementTree(root)
    
    # header
    tbxHeader = ET.SubElement(root, 'tbxHeader')
    fileDesc = ET.SubElement(tbxHeader, 'fileDesc')
    sourceDesc = ET.SubElement(fileDesc, 'sourceDesc')
    p = ET.SubElement(sourceDesc, 'p')
    p.text = 'A very faulty glossary about music'
    
    # text
    text = ET.SubElement(root, 'text')
    
    # body
    body = ET.SubElement(text, 'body')

    # one conceptEntry per glossary entry
    for idx, concept in enumerate(glossary):

        # conceptEntry with a custom id
        conceptEntry = ET.SubElement(body, 'conceptEntry', { 'id': f"c{idx}"})

        # two languages
        for lang, variants in concept.items():

            # langSec
            langSec = ET.SubElement(conceptEntry, 'langSec', { 'xml:lang': lang })

            # sometimes, more than just one term
            for status, variant in variants.items():

                # skip missing (None) or empty variants
                if variant:

                    # termSec
                    termSec = ET.SubElement(langSec, 'termSec')

                    # term
                    term = ET.SubElement(termSec, 'term')
                    term.text = variant

                    # termNote
                    termNote = ET.SubElement(termSec, 'termNote', { 'type': 'administrativeStatus'})
                    termNote.text = f"{status}Term-admn-sts"
    
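    # serialize the tree as UTF-8, matching the declaration written above
    # (the default, us-ascii, would turn accented letters into character references)
    tree.write(xmlfile, encoding='utf-8')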
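
One way to sanity-check the result is to parse the TBX back and list the terms per language. The sketch below works on a hypothetical, trimmed-down document held in a string (no header, one concept) so that it runs on its own; `sample_tbx`, `ns`, `XML_LANG` and `terms_by_lang` are names introduced here for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document shaped like the converter's output
# (header omitted for brevity, so it is not a complete TBX file).
sample_tbx = """<tbx type="TBX-Min" style="dca" xml:lang="en" xmlns="urn:iso:std:iso:30042:ed-2">
  <text><body>
    <conceptEntry id="c0">
      <langSec xml:lang="fr"><termSec><term>batterie</term></termSec></langSec>
      <langSec xml:lang="en">
        <termSec><term>drums</term></termSec>
        <termSec><term>drum kit</term></termSec>
      </langSec>
    </conceptEntry>
  </body></text>
</tbx>"""

# Once parsed, every element belongs to the default namespace declared on the root
ns = {'tbx': 'urn:iso:std:iso:30042:ed-2'}
# ElementTree expands the xml: prefix to its full namespace URI
XML_LANG = '{http://www.w3.org/XML/1998/namespace}lang'

root = ET.fromstring(sample_tbx)

terms_by_lang = {}
for lang_sec in root.iterfind('.//tbx:langSec', ns):
    lang = lang_sec.get(XML_LANG)
    terms = [term.text for term in lang_sec.iterfind('tbx:termSec/tbx:term', ns)]
    terms_by_lang.setdefault(lang, []).extend(terms)

print(terms_by_lang)
```

To run the same check on the generated file, `ET.parse('../files/glossary.tbx').getroot()` can stand in for `ET.fromstring(sample_tbx)`. Note the asymmetry: the converter creates elements with plain names and an `xmlns` attribute, but after parsing the names are namespace-qualified, which is why the queries need the `ns` mapping.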