Sentence

A sentence label is the transcribed text of a piece of audio, and is often used for automatic speech recognition.

Each audio file can be assigned multiple sentence labels.

The structure of one sentence label is as follows:

{
    "sentence": [
        {
            "text":  <str>
            "begin": <float>
            "end":   <float>
        }
        ...
        ...
    ],
    "spell": [
        {
            "text":  <str>
            "begin": <float>
            "end":   <float>
        }
        ...
        ...
    ],
    "phone": [
        {
            "text":  <str>
            "begin": <float>
            "end":   <float>
        }
        ...
        ...
    ],
    "attributes": {
        <key>: <value>
        ...
        ...
    }
}
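As a rough illustration of the structure above, the following sketch builds one sentence label as a plain Python dict and checks each segment. It does not use TensorBay itself: the helper `check_segments`, the sample values, and the rule that "begin" should not be later than "end" are assumptions made for this example only.

```python
from typing import Any, Dict, List

# Hypothetical sample values; only the key names come from the structure above.
sentence_label: Dict[str, Any] = {
    "sentence": [{"text": "你好", "begin": 1.1, "end": 1.6}],
    "spell": [{"text": "ni", "begin": 1.1, "end": 1.3}],
    "phone": [{"text": "n", "begin": 1.1, "end": 1.2}],
    "attributes": {"attribute_name": "attribute_value"},
}


def check_segments(label: Dict[str, Any]) -> List[str]:
    """Collect problems found in the "sentence", "spell" and "phone" lists."""
    problems = []
    for key in ("sentence", "spell", "phone"):
        for index, segment in enumerate(label.get(key, [])):
            where = f"{key}[{index}]"
            if not isinstance(segment.get("text"), str):
                problems.append(f"{where}: 'text' must be a string")
            begin, end = segment.get("begin"), segment.get("end")
            if not all(isinstance(v, (int, float)) for v in (begin, end)):
                problems.append(f"{where}: 'begin' and 'end' must be numbers")
            elif begin > end:
                # Assumption: a segment should not end before it begins.
                problems.append(f"{where}: 'begin' is later than 'end'")
    return problems


print(check_segments(sentence_label))  # prints []
```

An empty list means every segment has a string "text" and numeric "begin" and "end" values in a consistent order.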

To create a ~tensorbay.label.label_sentence.LabeledSentence label:

>>> from tensorbay.label import LabeledSentence
>>> from tensorbay.label import Word
>>> sentence_label = LabeledSentence(
...     sentence=[Word("text", 1.1, 1.6)],
...     spell=[Word("spell", 1.1, 1.6)],
...     phone=[Word("phone", 1.1, 1.6)],
...     attributes={"attribute_name": "attribute_value"}
... )
>>> sentence_label
LabeledSentence(
  (sentence): [
    Word(
      (text): 'text',
      (begin): 1.1,
      (end): 1.6
    )
  ],
  (spell): [
    Word(
      (text): 'spell',
      (begin): 1.1,
      (end): 1.6
    )
  ],
  (phone): [
    Word(
      (text): 'phone',
      (begin): 1.1,
      (end): 1.6
    )
  ],
  (attributes): {
    'attribute_name': 'attribute_value'
  }
)

Sentence.sentence

The ~tensorbay.label.label_sentence.LabeledSentence.sentence of a ~tensorbay.label.label_sentence.LabeledSentence is a list of ~tensorbay.label.label_sentence.Word, representing the transcribed sentence of the audio.

Sentence.spell

The ~tensorbay.label.label_sentence.LabeledSentence.spell of a ~tensorbay.label.label_sentence.LabeledSentence is a list of ~tensorbay.label.label_sentence.Word, representing the spelling within the sentence.

It applies only to Chinese.

Sentence.phone

The ~tensorbay.label.label_sentence.LabeledSentence.phone of a ~tensorbay.label.label_sentence.LabeledSentence is a list of ~tensorbay.label.label_sentence.Word, representing the phone of the sentence label.

Word

~tensorbay.label.label_sentence.Word is the basic component of a phonetic transcription sentence. It contains the text of the word and its start and end times in the audio.

>>> from tensorbay.label import Word
>>> Word("text", 1.1, 1.6)
Word(
  (text): 'text',
  (begin): 1.1,
  (end): 1.6
)

The ~tensorbay.label.label_sentence.LabeledSentence.sentence, ~tensorbay.label.label_sentence.LabeledSentence.spell, and ~tensorbay.label.label_sentence.LabeledSentence.phone of a sentence label are all composed of ~tensorbay.label.label_sentence.Word.

Sentence.attributes

The attributes of the transcribed sentence. See reference/label_format/CommonLabelProperties:attributes for details.

SentenceSubcatalog

Before adding sentence labels to the dataset, ~tensorbay.label.label_sentence.SentenceSubcatalog should be defined.

Besides the reference/label_format/CommonSubcatalogProperties:attributes information, ~tensorbay.label.label_sentence.SentenceSubcatalog also has ~tensorbay.label.label_sentence.SentenceSubcatalog.is_sample, ~tensorbay.label.label_sentence.SentenceSubcatalog.sample_rate and ~tensorbay.label.label_sentence.SentenceSubcatalog.lexicon to describe the transcribed sentences of the audio.

A catalog with only a Sentence subcatalog is typically stored in a JSON file as follows:

{
    "SENTENCE": {                                     <object>*
        "isSample":                                  <boolean>! -- Whether "begin" and "end" in Sentence labels are given as
                                                                   sample counts. The default value is false, in which case
                                                                   they are given in seconds.
        "sampleRate":                                 <number>  -- Audio sampling frequency whose unit is Hz. It is required
                                                                   when "isSample" is true.
        "description":                                <string>! -- Subcatalog description, (default: "").
        "attributes": [                                <array>  -- Attribute list, which contains all attribute information.
            {
                "name":                               <string>* -- Attribute name.
                "enum": [...],                         <array>  -- All possible options for the attribute.
                "type":                      <string or array>  -- Type of the attribute including "boolean", "integer",
                                                                   "number", "string", "array" and "null". And it is not
                                                                   required when "enum" is provided.
                "minimum":                            <number>  -- Minimum value of the attribute when type is "number".
                "maximum":                            <number>  -- Maximum value of the attribute when type is "number".
                "items": {                            <object>  -- Used only if the attribute type is "array".
                    "enum": [...],                     <array>  -- All possible options for elements in the attribute array.
                    "type":                  <string or array>  -- Type of elements in the attribute array.
                    "minimum":                        <number>  -- Minimum value of elements in the attribute array when type is
                                                                   "number".
                    "maximum":                        <number>  -- Maximum value of elements in the attribute array when type is
                                                                   "number".
                },
                "description":                        <string>! -- Attribute description, (default: "").
            },
            ...
            ...
        ],
        "lexicon": [                                   <array>  -- A list of all texts and their phones.
            [
                text,                                 <string>  -- Word.
                phone,                                <string>  -- Corresponding phonemes.
                phone,                                <string>  -- Corresponding phonemes (A word can correspond to more than
                                                                   one phoneme).
                ...
            ],
            ...
        ]
    }
}

Note

* indicates that the field is required. ! indicates that the field has a default value.
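The relationship between "isSample" and "sampleRate" in the schema above can be sketched as a small conversion. This is only an illustration under the assumption that a sample count divided by the sampling frequency in Hz gives seconds; `to_seconds` is a hypothetical helper, not a TensorBay function.

```python
from typing import Optional


def to_seconds(
    value: float, is_sample: bool = False, sample_rate: Optional[float] = None
) -> float:
    """Convert a "begin" or "end" value to seconds (hypothetical helper)."""
    if not is_sample:
        # Default case: the value is already in seconds.
        return float(value)
    if sample_rate is None or sample_rate <= 0:
        raise ValueError('"sampleRate" must be a positive number when "isSample" is true')
    # A sample count divided by samples per second gives seconds.
    return value / sample_rate


print(to_seconds(1.5))                 # 1.5
print(to_seconds(24000, True, 16000))  # 1.5
```

The explicit error mirrors the schema's statement that "sampleRate" is required when "isSample" is true.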

Besides giving the parameters while initializing ~tensorbay.label.label_sentence.SentenceSubcatalog, it's also feasible to set them after initialization.

>>> from tensorbay.label import SentenceSubcatalog
>>> sentence_subcatalog = SentenceSubcatalog()
>>> sentence_subcatalog.is_sample = True
>>> sentence_subcatalog.sample_rate = 5
>>> sentence_subcatalog.append_lexicon(["text", "spell", "phone"])
>>> sentence_subcatalog
SentenceSubcatalog(
  (is_sample): True,
  (sample_rate): 5,
  (lexicon): [...]
)
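To show how a "lexicon" in the [text, phone, phone, ...] layout from the catalog schema might be consumed, here is a minimal sketch that turns it into a lookup table. It works on plain lists rather than on a SentenceSubcatalog object; `build_lexicon_index` and the sample entries are hypothetical.

```python
from typing import Dict, List

# Hypothetical lexicon entries in the [text, phone, phone, ...] layout.
lexicon = [
    ["hello", "HH", "AH", "L", "OW"],
    ["hi", "HH", "AY"],
]


def build_lexicon_index(entries: List[List[str]]) -> Dict[str, List[List[str]]]:
    """Map each word to the list of phoneme sequences recorded for it."""
    index: Dict[str, List[List[str]]] = {}
    for entry in entries:
        if not entry:
            continue  # skip empty rows rather than fail
        word, *phones = entry
        # Keep every sequence, so a word listed twice keeps both entries.
        index.setdefault(word, []).append(phones)
    return index


index = build_lexicon_index(lexicon)
print(index["hi"])  # [['HH', 'AY']]
```

Storing a list of sequences per word is a design choice for this sketch; it avoids silently overwriting a word that appears in more than one lexicon entry.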

To add a ~tensorbay.label.label_sentence.LabeledSentence label to one data:

>>> from tensorbay.dataset import Data
>>> data = Data("local_path")
>>> data.label.sentence = []
>>> data.label.sentence.append(sentence_label)

Note

One data may contain multiple Sentence labels, so Data.label.sentence must be a list.