# Author

After the dates, we detect the span that covers the authors of each publication. The cell setup is the same as before: one stage each for the training, validation and test sets.

We use a simple paradigm for detecting Author: boundary matching. Instead of detecting the content of the author span, we detect only its boundaries and then connect them. The authors are normally listed at the beginning of a reference, so the start boundary is the first token of the reference (FirstInRef). For the end boundary (AuthorStop), we combine several indicators: the token directly before a Date annotation, periods that are not part of an initial, and colons. Additional wordlists help to discard spans that actually refer to something else, such as editors.
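
To make the boundary idea concrete, here is a minimal sketch in plain Python rather than Ruta. It is an illustration under simplifying assumptions, not the notebook's implementation: the regular expressions `TOKEN`, `YEAR` and `INITIAL` and the function `author_span` are hypothetical stand-ins for the tokenizer, the Date annotation and the Initial annotation, and the wordlist filtering for editors is left out.

```python
import re
from typing import Optional

# Hypothetical, simplified stand-ins for the annotations used in the Ruta script.
TOKEN = re.compile(r"\(?\d{4}\)?|\w+|[^\w\s]")        # crude tokenizer
YEAR = re.compile(r"\(?\d{4}\)?")                      # crude substitute for Date
INITIAL = re.compile(r"[A-Z][a-z]?|[A-Z]{2,3}")        # "J", "Th", "JK", "ABC"


def author_span(reference: str) -> Optional[str]:
    """Return the text from the first token up to and including the first stop."""
    tokens = [(m.group(), m.start(), m.end()) for m in TOKEN.finditer(reference)]
    if not tokens:
        return None
    start = tokens[0][1]  # start boundary: first token of the reference

    # Look for the end boundary after the first token.
    for i in range(1, len(tokens)):
        text, _, end = tokens[i]
        prev_text = tokens[i - 1][0]
        next_text = tokens[i + 1][0] if i + 1 < len(tokens) else ""
        is_stop = (
            YEAR.fullmatch(next_text) is not None       # token right before a date
            or text == ":"                              # colon
            or (text == "." and INITIAL.fullmatch(prev_text) is None)  # period outside an initial
        )
        if is_stop:
            # Connect the two boundaries; the stop token is part of the span.
            return reference[start:end]
    return None


print(author_span("Smith, J. and Doe, A.B. (2001). A title. Journal."))
```

Under these assumptions the periods after `J`, `A` and `B` are skipped because they follow an initial, and the span ends at the token directly before `(2001)`. The stop token is included in the result, which mirrors a rule that marks the whole matched sequence from the first token through the stop.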

In [None]:
%inputDir data/train
%outputDir out_author_train
%displayMode EVALUATION
%evalTypes Author
%writescript ./Author.ruta
%saveTypeSystem ./AuthorTypeSystem.xml

TYPESYSTEM ReferencesTypeSystem;
SCRIPT Date;

// only try to find dates if there aren't any yet
Document{-CONTAINS(Date)-> CALL(Date)};

DECLARE FirstInRef, AuthorStopInd, AuthorStop;
DECLARE Initial, EditorInd, NoAuthorInd;

WORDLIST EditorList = "editor_ind.txt";
MARKFAST(EditorInd, EditorList, true);
WORDLIST NoAuthorList = "no_author.txt";
MARKFAST(NoAuthorInd, NoAuthorList, true);

BLOCK(utils) Document{}{
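    // start boundary: the first token of each reference
    Reference{-> MARKFIRST(FirstInRef)};
    
    // initials: a single capital letter, a two-letter capitalized word
    // followed by a period, or two to three uppercase letters
    CW{REGEXP(".")-> Initial};
    (CW{REGEXP("..")} PERIOD){-> Initial};
    CAP{REGEXP(".{2,3}")-> Initial};
    // extend an initial over a directly following period
    i:Initial{->i.end=p.end} p:PERIOD; 
    
    // candidate end boundaries: the token before a date,
    // periods outside of initials, and colons
    ANY{-> AuthorStopInd} @Date;
    PERIOD{-PARTOF(Initial)-> AuthorStopInd};
    COLON{-> AuthorStopInd};
    // a candidate directly followed by an initial with a period is discarded
    as:AuthorStopInd{-> UNMARK(as)} Initial{ENDSWITH(PERIOD)};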
}

BLOCK(Author) Reference{}{
   
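    // '#' is a wildcard: the first stop indicator in the reference becomes the end boundary
    # AuthorStopInd{-> AuthorStop};
    
    // connect the two boundaries, then drop spans that contain editor or other non-author indicators
    (FirstInRef # AuthorStop){-> Author};
    a:Author{CONTAINS(EditorInd)-> UNMARK(a)};
    a:Author{CONTAINS(NoAuthorInd)-> UNMARK(a)};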
}




In [None]:
%inputDir out_author_train
%outputDir trash
%displayMode CSV
%csvConfig BadReference

DECLARE BadReference;
Reference{OR(CONTAINS(FalsePositive),CONTAINS(FalseNegative)),-PARTOF(BadReference)-> BadReference};

COLOR(AuthorStop, "red");
COLOR(TruePositive, "lightgreen");
COLOR(FalsePositive, "lightblue");
COLOR(FalseNegative, "pink");


In [None]:
%inputDir data/validation
%outputDir out_author_validation
%displayMode EVALUATION
%evalTypes Author

SCRIPT Author;
CALL(Author);

In [None]:
%inputDir out_author_validation
%outputDir trash
%displayMode CSV
%csvConfig BadReference

DECLARE BadReference;
Reference{OR(CONTAINS(FalsePositive),CONTAINS(FalseNegative)),-PARTOF(BadReference)-> BadReference};

COLOR(AuthorStop, "red");
COLOR(TruePositive, "lightgreen");
COLOR(FalsePositive, "lightblue");
COLOR(FalseNegative, "pink");

In [None]:
%inputDir data/test
%displayMode EVALUATION
%evalTypes Author

SCRIPT Author;
CALL(Author);