Commit

Cleaning up Response documentation to reflect newer code
Jerin Philip committed Mar 19, 2021
1 parent a8dbb00 commit 317433a
Showing 1 changed file with 27 additions and 33 deletions.
60 changes: 27 additions & 33 deletions src/translator/response.h
@@ -14,44 +14,31 @@
namespace marian {
namespace bergamot {

typedef marian::data::SoftAlignment SoftAlignment;
typedef marian::data::WordAlignment WordAlignment;

/// Alignment is stored as a sparse matrix. This closely follows marian
/// internals, but is redefined here to maintain translator
/// agnosticism/independence.
struct Point {
size_t src; // Index pointing to source string_view.
size_t tgt; // Index pointing to target string_view.
size_t src; // Index pointing to source ByteRange
size_t tgt; // Index pointing to target ByteRange
float prob; // Score in [0, 1] indicating degree of alignment.
};

typedef std::vector<Point> Alignment;
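
The sparse representation above lists only the (src, tgt) pairs that carry a score. As a minimal sketch of how such a list could be read, the block below expands it into a dense matrix. `toDense`, `numSrc` and `numTgt` are hypothetical names introduced for illustration and are not part of response.h; `Point` is redeclared only so the sketch compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the Point struct from the diff so this sketch is self-contained.
struct Point {
  size_t src;  // Index pointing to source ByteRange
  size_t tgt;  // Index pointing to target ByteRange
  float prob;  // Score in [0, 1]
};

typedef std::vector<Point> Alignment;

// Hypothetical helper (an assumption, not the project's API): expands the
// sparse list into a dense numTgt x numSrc matrix. Unlisted cells stay 0.
std::vector<std::vector<float>> toDense(const Alignment &alignment,
                                        size_t numSrc, size_t numTgt) {
  std::vector<std::vector<float>> dense(numTgt,
                                        std::vector<float>(numSrc, 0.0f));
  for (const Point &p : alignment) {
    // Skip out-of-range entries instead of writing past the matrix.
    if (p.src < numSrc && p.tgt < numTgt) {
      dense[p.tgt][p.src] = p.prob;
    }
  }
  return dense;
}
```

Keeping only the scored pairs avoids storing a full source-by-target matrix per sentence; the dense form is only useful when a consumer needs random access by index.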

/// Negative log-likelihoods of the sequence components, used as a proxy for quality.
struct Quality {
float sequence; /// Certainty/uncertainty score for sequence.
/// Certainty/uncertainty score for sequence.
float sequence;
/// Certainty/uncertainty for each word in the sequence.
std::vector<float> word;
};
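
If the scores in `Quality` are negative log-probabilities as the comment describes, a consumer could map them back to probabilities. The sketch below assumes natural logarithms, which the source does not state; `wordProbabilities` is a hypothetical helper, and `Quality` is redeclared only so the block compiles on its own.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Mirrors the Quality struct from the diff so this sketch is self-contained.
struct Quality {
  float sequence;           // Score for the whole sequence.
  std::vector<float> word;  // Score for each word in the sequence.
};

// Hypothetical helper (an assumption, not the project's API). Assumes each
// score is a natural-log negative log-probability, so p = exp(-score).
std::vector<float> wordProbabilities(const Quality &quality) {
  std::vector<float> probs;
  probs.reserve(quality.word.size());
  for (float score : quality.word) {
    probs.push_back(std::exp(-score));
  }
  return probs;
}
```

Under that assumption a score of 0 corresponds to full certainty, and larger scores correspond to lower confidence.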

class Response {
// Response is a marian internal class (not a bergamot-translator class)
// holding source blob of text, vector of TokenRanges corresponding to each
// sentence in the source text blob and histories obtained from translating
// these sentences.
//
// This class provides an API at a higher level in comparison to History to
// access translations and additionally use string_view manipulations to
// recover structure in translation from source-text's structure known through
// reference string and string_view. As many of these computations are not
// required until invoked, they are computed as required and stored in data
// members where it makes sense to do so (translation,translationTokenRanges).
//
// Examples of such use-cases are:
// translation()
// translationInSourceStructure() TODO(@jerinphilip)
// alignment(idx) TODO(@jerinphilip)
// sentenceMappings (for bergamot-translator)
// holding an annotated source blob of text, a translated blob of text, and
// alignment information between source and target words and sentences.
// Annotations mark word and sentence boundaries, represented using
// ByteRanges.

public:
Response(AnnotatedBlob &&source, Histories &&histories,
@@ -64,23 +51,30 @@ class Response {
qualityScores(std::move(other.qualityScores)),
histories_(std::move(other.histories_)){};

// Prevents CopyConstruction and CopyAssignment. sourceRanges_ is constituted
// by string_view and copying invalidates the data member.
// The following copy bans are not strictly required anymore, since Annotation
// is composed of the ByteRange primitive (which was previously string_view
// and had to stay bound to the string). Banning copies still keeps movement
// efficient by letting the compiler complain about accidental copies.
Response(const Response &) = delete;
Response &operator=(const Response &) = delete;

const size_t size() const { return source.numSentences(); }
const Histories &histories() const { return histories_; }

AnnotatedBlob source;
AnnotatedBlob target;
std::vector<Quality> qualityScores;
std::vector<Alignment> alignments;
AnnotatedBlob source; /// source-text, source.blob contains source
/// text. source.annotation holds the annotation.
AnnotatedBlob target; /// translated-text, target.blob contains translated
/// text. target.annotation holds the annotation.
std::vector<Quality>
qualityScores; /// -logProb of each word and negative log likelihood
/// of sequence (sentence), for each sentence processed by
/// the translator. Indices correspond to ranges accessible
/// through annotation
std::vector<Alignment>
alignments; /// Alignments between source and target. Each Alignment is a
/// sparse matrix representation with indices corresponding
/// to ranges accessible through Annotations.

const std::string &translation() {
LOG(info, "translation() will be deprecated now that target is public.");
return target.blob;
}
const Histories &histories() const { return histories_; }

private:
Histories histories_;
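
The comments in this diff describe two ideas: annotations expressed as byte offsets into a text blob, and a move-only type whose copies are rejected at compile time. A minimal sketch of both follows. `ByteRange` is given an assumed `{begin, end}` layout here and `MiniBlob` is a hypothetical stand-in; neither is the repository's actual definition of `ByteRange` or `AnnotatedBlob`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Assumed layout for illustration only: half-open byte interval [begin, end).
struct ByteRange {
  size_t begin;
  size_t end;
};

// Hypothetical stand-in for an annotated blob: text plus sentence ranges.
struct MiniBlob {
  std::string blob;
  std::vector<ByteRange> sentences;

  MiniBlob(std::string text, std::vector<ByteRange> ranges)
      : blob(std::move(text)), sentences(std::move(ranges)) {}

  // Moves are allowed; copies are banned so the compiler flags them.
  MiniBlob(MiniBlob &&) = default;
  MiniBlob &operator=(MiniBlob &&) = default;
  MiniBlob(const MiniBlob &) = delete;
  MiniBlob &operator=(const MiniBlob &) = delete;

  // Offsets are relative to blob, so they remain valid after a move.
  std::string sentence(size_t idx) const {
    const ByteRange &r = sentences.at(idx);
    return blob.substr(r.begin, r.end - r.begin);
  }
};

static_assert(!std::is_copy_constructible<MiniBlob>::value,
              "copies are banned");
static_assert(std::is_move_constructible<MiniBlob>::value,
              "moves are allowed");
```

Because the ranges are plain offsets rather than pointers into the string, moving or even copying the object cannot leave them dangling, which is why the copy ban is described as an efficiency choice rather than a correctness requirement.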
