## Pigeon and PigeonXT libraries

PigeonXT is an extension of the original Pigeon, created by Anastasis Germanidis. Like Pigeon, it is a simple widget that lets you quickly annotate a dataset of unlabeled examples from the comfort of your Jupyter notebook.

PigeonXT currently supports the following annotation tasks:

1. binary / multi-class classification
2. multi-label classification
3. regression tasks
4. captioning tasks

Anything that can be displayed in Jupyter (text, images, audio, graphs, etc.) can be displayed by PigeonXT by providing the appropriate `display_fn` argument.
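`display_fn` is called with one example at a time and is responsible for rendering it, so any one-argument callable can be used. As a minimal sketch (the factory `make_truncating_display_fn` and its `max_chars` parameter are hypothetical helpers, not part of PigeonXT), this one shortens long texts and prints them, which also lets it run outside Jupyter:

```python
def make_truncating_display_fn(max_chars=40):
    """Return a hypothetical display_fn that prints a shortened version of each example."""
    def display_fn(example):
        text = str(example)
        if len(text) > max_chars:
            # Keep the first max_chars characters and mark the cut with "..."
            text = text[:max_chars] + "..."
        print(text)
    return display_fn


show = make_truncating_display_fn(max_chars=10)
show("I was really disappointed by the book")  # prints: I was real...
```

In a notebook it would be passed as `annotate(texts, options=..., display_fn=make_truncating_display_fn(80))`.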



In [0]:
%pip install pigeon-jupyter
%pip install pigeonXT-jupyter


In [10]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


## Binary or multi-class text classification


In [16]:
from pigeonXT import annotate
annotations = annotate(
    ['I love this movie', 'I was really disappointed by the book'],
    options=['positive', 'negative']
)

Annotation done.


In [17]:
annotations

[('I love this movie', 'positive'),
 ('I was really disappointed by the book', 'negative')]
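The result is a plain list of `(example, label)` tuples, so standard Python tools apply directly. A minimal sketch, copying the output above as a literal, that turns it into a lookup dict and counts how often each label was used (handy for spotting class imbalance):

```python
from collections import Counter

# The list returned by annotate() in the cell above: (example, label) pairs
annotations = [('I love this movie', 'positive'),
               ('I was really disappointed by the book', 'negative')]

# Map each example to its label for quick lookup
label_by_text = dict(annotations)

# Count how many examples received each label
label_counts = Counter(label for _, label in annotations)

print(label_by_text['I love this movie'])  # positive
print(label_counts)
```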

## Multi-label text classification

In [20]:
from pigeonXT import annotate
import pandas as pd

df = pd.DataFrame([
    {'title': 'Star wars'},
    {'title': 'The Positively True Adventures of the Alleged Texas Cheerleader-Murdering Mom'},
    {'title': 'Eternal Sunshine of the Spotless Mind'},
    {'title': 'Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb'},
    {'title': 'Killer klowns from outer space'},
])

labels = ['Adventure', 'Romance', 'Fantasy', 'Science fiction', 'Horror', 'Thriller']

annotations = annotate(df.title,
                       options=labels,
                       task_type='multilabel-classification',
                       buttons_in_a_row=3,
                       reset_buttons_after_click=True,
                       include_skip=True)

Annotation done.


In [22]:
annotations

[('Star wars', ['Adventure']),
 ('The Positively True Adventures of the Alleged Texas Cheerleader-Murdering Mom',
  ['Horror']),
 ('Eternal Sunshine of the Spotless Mind', ['Fantasy']),
 ('Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb',
  ['Thriller']),
 ('Killer klowns from outer space', ['Science fiction'])]
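For training a multi-label model, each `(title, labels)` pair is usually turned into a 0/1 (multi-hot) row with one position per label, the same idea `setLabels` implements with NumPy in the custom-hooks section. A minimal pure-Python sketch (the helper `to_multi_hot` is hypothetical), using the output above as a literal:

```python
labels = ['Adventure', 'Romance', 'Fantasy', 'Science fiction', 'Horror', 'Thriller']

# Output of the multi-label cell above: (title, list_of_labels) pairs
annotations = [
    ('Star wars', ['Adventure']),
    ('The Positively True Adventures of the Alleged Texas Cheerleader-Murdering Mom', ['Horror']),
    ('Eternal Sunshine of the Spotless Mind', ['Fantasy']),
    ('Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb', ['Thriller']),
    ('Killer klowns from outer space', ['Science fiction']),
]


def to_multi_hot(selected, all_labels):
    """Return a 0/1 list with a 1 at the position of every selected label."""
    chosen = set(selected)
    return [1 if label in chosen else 0 for label in all_labels]


matrix = [to_multi_hot(selected, labels) for _, selected in annotations]
print(matrix[0])  # [1, 0, 0, 0, 0, 0]
```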

## Image labeling

In [23]:
from pigeonXT import annotate
from IPython.display import display, Image

annotations = annotate(
    ['/content/gdrive/My Drive/Kaggel Projects/pigeon/assets/img_example1.jpg', '/content/gdrive/My Drive/Kaggel Projects/pigeon/assets/img_example2.jpg'],
    options=['cat', 'dog', 'horse'],
    display_fn=lambda filename: display(Image(filename))
)

Annotation done.


In [24]:
annotations

[('/content/gdrive/My Drive/Kaggel Projects/pigeon/assets/img_example1.jpg',
  'cat'),
 ('/content/gdrive/My Drive/Kaggel Projects/pigeon/assets/img_example2.jpg',
  'dog')]
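Because the image labels come back as `(path, label)` pairs, they can be saved with the standard `csv` module. A minimal sketch, assuming a two-column layout with hypothetical `path` and `label` headers and writing to an in-memory buffer so it runs anywhere:

```python
import csv
import io

# (image_path, label) pairs as returned by annotate() above
annotations = [
    ('/content/gdrive/My Drive/Kaggel Projects/pigeon/assets/img_example1.jpg', 'cat'),
    ('/content/gdrive/My Drive/Kaggel Projects/pigeon/assets/img_example2.jpg', 'dog'),
]


def annotations_to_csv(pairs, file_obj):
    """Write (path, label) pairs as a two-column CSV with a header row."""
    writer = csv.writer(file_obj)
    writer.writerow(['path', 'label'])
    writer.writerows(pairs)


buffer = io.StringIO()
annotations_to_csv(annotations, buffer)
print(buffer.getvalue())
```

In the notebook, passing `open('image_labels.csv', 'w', newline='')` instead of the `StringIO` buffer would write the same content to a file (the file name is arbitrary).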

## Multi-label text classification with custom hooks

In [25]:
from pathlib import Path

import numpy as np
import pandas as pd
from pigeonXT import annotate

df = pd.DataFrame([
    {'title': 'Star wars'},
    {'title': 'The Positively True Adventures of the Alleged Texas Cheerleader-Murdering Mom'},
    {'title': 'Eternal Sunshine of the Spotless Mind'},
    {'title': 'Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb'},
    {'title': 'Killer klowns from outer space'},
])

labels = ['Adventure', 'Romance', 'Fantasy', 'Science fiction', 'Horror', 'Thriller']
shortLabels = ['A', 'R', 'F', 'SF', 'H', 'T']

# Write the unlabeled examples to disk so they can be labeled in portions
df.to_csv('inputtestdata.csv', index=False)


def setLabels(labels, numClasses):
    """Turn a list of label indices into a 0/1 (multi-hot) row."""
    row = np.zeros([numClasses], dtype=np.uint8)
    row[labels] = 1
    return row


def labelPortion(inputFile,
                 labels=['yes', 'no'],
                 outputFile='output.csv',
                 portionSize=2,
                 textColumn='title',
                 shortLabels=None):
    """Annotate the next `portionSize` rows of `inputFile` and append them to `outputFile`."""
    if shortLabels is None:
        shortLabels = labels
    out = Path(outputFile)
    # Resume after the rows already saved in the output file
    if out.exists():
        currentId = len(pd.read_csv(out))
    else:
        currentId = 0
    indf = pd.read_csv(inputFile)
    examplesInFile = len(indf)
    # .loc slicing is inclusive; .copy() avoids a SettingWithCopyWarning below
    indf = indf.loc[currentId:currentId + portionSize - 1].copy()
    actualPortionSize = len(indf)
    print(f'{currentId + 1} - {currentId + actualPortionSize} of {examplesInFile}')
    sentences = indf[textColumn].tolist()

    # One column per label, filled in as examples are annotated
    for label in shortLabels:
        indf[label] = None

    def updateRow(ix, selectedLabels):
        # Called after each example: ix is its position within this portion
        print(ix, selectedLabels)
        labs = setLabels([labels.index(y) for y in selectedLabels], len(labels))
        indf.loc[indf.index.min() + ix, shortLabels] = labs

    def finalProcessing(annotations):
        # Called once at the end: append this portion to the output file
        if out.exists():
            prevdata = pd.read_csv(out)
            outdata = pd.concat([prevdata, indf]).reset_index(drop=True)
        else:
            outdata = indf.copy()
        outdata.to_csv(out, index=False)

    annotate(sentences,
             options=labels,
             task_type='multilabel-classification',
             buttons_in_a_row=3,
             reset_buttons_after_click=True,
             include_skip=False,
             example_process_fn=updateRow,
             final_process_fn=finalProcessing)
    return indf


annotations = labelPortion('inputtestdata.csv',
                           labels=labels,
                           shortLabels=shortLabels)

1 - 2 of 5


0 ['Science fiction']
1 ['Romance']
Annotation done.
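To see the order in which the two hooks fire without opening a widget, here is a hypothetical stand-in. `simulate_annotation` is not part of PigeonXT; it only mimics the sequence the output above suggests: `example_process_fn` once per example with its position and selected labels, then `final_process_fn` once with all annotations. Button clicks are replaced by a fixed list of choices.

```python
def simulate_annotation(examples, choices, example_process_fn, final_process_fn):
    """Hypothetical stand-in for the widget: feed pre-chosen labels through the two hooks."""
    annotations = []
    for ix, (example, selected) in enumerate(zip(examples, choices)):
        annotations.append((example, selected))
        # Mirrors the per-example callback seen in the output: (position, labels)
        example_process_fn(ix, selected)
    # Called once after the last example, with everything collected
    final_process_fn(annotations)
    return annotations


events = []
simulate_annotation(
    ['Star wars', 'The Positively True Adventures of the Alleged Texas Cheerleader-Murdering Mom'],
    [['Science fiction'], ['Romance']],
    example_process_fn=lambda ix, labs: events.append(('example', ix, labs)),
    final_process_fn=lambda anns: events.append(('final', len(anns))),
)
print(events)
```

Passing `updateRow` and `finalProcessing` from `labelPortion` into a stand-in like this is one way to test the hook logic before using the real widget.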


In [26]:
annotations

Unnamed: 0,title,A,R,F,SF,H,T
0,Star wars,0,0,0,1,0,0
1,The Positively True Adventures of the Alleged ...,0,1,0,0,0,0
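Reading saved rows back into label names is the inverse of `setLabels`. A minimal pure-Python sketch (the helper `decode_row` is hypothetical), using the label columns of the two rows displayed above:

```python
short_labels = ['A', 'R', 'F', 'SF', 'H', 'T']


def decode_row(flags, names):
    """Return the names whose flag is 1 in a multi-hot row."""
    return [name for name, flag in zip(names, flags) if flag == 1]


# Label columns (A, R, F, SF, H, T) of the two annotated rows
rows = {
    'Star wars': [0, 0, 0, 1, 0, 0],
    'The Positively True Adventures of the Alleged Texas Cheerleader-Murdering Mom': [0, 1, 0, 0, 0, 0],
}

decoded = {title: decode_row(flags, short_labels) for title, flags in rows.items()}
print(decoded['Star wars'])  # ['SF']
```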
