University of Zagreb\
Faculty of Electrical Engineering and Computing

## Text Analysis and Retrieval 2021/2022
https://www.fer.unizg.hr/predmet/apt/

------------------------------

### Basics of NLP

*Version: 1.1*

(c) 2022 Josip Jukić, Jan Šnajder

Submission deadline: **April 6, 2022, 23:59 CET** 

------------------------------

### Instructions

Hello visitor, this lab assignment consists of three parts. Your task boils down to filling out the missing parts of code and evaluating the cells. These parts are indicated by the "YOUR CODE HERE" template.

Each subtask is supplemented by several tests that you can run. Apart from that, there are additional tests that will be executed after submission. If your solution is valid and it passes all of the visible tests, there shouldn't be any problems with the additional tests.

**IMPORTANT: Don't change the names of the predefined methods or random seeds**, because the tests won't be executed properly.

You're required to do this assignment **on your own**.

If you run into problems, please contact josip.jukic@fer.hr to arrange office hours.

## Tasks

### 1. Preprocessing

In [670]:
import spacy
import numpy as np
import pandas as pd

We will use [spaCy](https://spacy.io/) extensively in this assignment. You are advised to study the main aspects of this tool. You can go through the basics [here](https://spacy.io/usage/spacy-101). We recommend that you go through the procedures that we covered in the lectures: tokenization, lemmatization, part-of-speech (POS) tagging, and named entity recognition (NER).

Furthermore, we will rely on [NumPy](https://numpy.org/) and [pandas](https://pandas.pydata.org/) libraries. If you are not familiar with those libraries, we advise you to go through [this tutorial](https://www.hackerearth.com/practice/machine-learning/data-manipulation-visualisation-r-python/tutorial-data-manipulation-numpy-pandas-python/tutorial/).

In [671]:
# Load spacy model
nlp = spacy.load("en_core_web_sm")

#### (a)
Process the example below with spaCy. Tokenize the document and gather the tokens in a list. Finally, print the tokens.

In [672]:
ex1_a1 = (
    "A wizard is never late, Frodo Baggins. "
    "Nor is he early; he arrives precisely when he means to."
)

In [673]:
tokens_a1 = nlp.tokenizer(ex1_a1)
for token in tokens_a1:
    print(token)

A
wizard
is
never
late
,
Frodo
Baggins
.
Nor
is
he
early
;
he
arrives
precisely
when
he
means
to
.


#### (b)
Implement `sentencizer` using [spaCy](https://spacy.io/usage/linguistic-features).

In [674]:
def sentencizer(text):
    doc = nlp(text)
    return [str(x) for x in list(doc.sents)]

In [675]:
assert sentencizer("Sentence no. 1. Sentence no. 2.") == [
    "Sentence no. 1.",
    "Sentence no. 2.",
]

#### (c)

Implement `lemmatizer` using [spaCy](https://spacy.io/usage/linguistic-features).

In [676]:
def lemmatizer(text):
    doc = nlp(text)
    return [token.lemma_ for token in doc]

In [677]:
assert lemmatizer(ex1_a1) == [
    "a",
    "wizard",
    "be",
    "never",
    "late",
    ",",
    "Frodo",
    "Baggins",
    ".",
    "nor",
    "be",
    "he",
    "early",
    ";",
    "he",
    "arrive",
    "precisely",
    "when",
    "he",
    "mean",
    "to",
    ".",
]

#### (d)

Implement the `ngrams` method. You might find the [`tee`](https://www.geeksforgeeks.org/python-itertools-tee/) method from the `itertools` package useful, but you're not obliged to use it. The method should return a generator. Please refer to the [link](https://wiki.python.org/moin/Generators) if you aren't familiar with Python generators.
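
As a rough illustration of how `tee` can drive a sliding window, the sketch below lazily builds pairs from any iterable. The helper name `pairwise_sketch` is hypothetical and not part of the assignment; it only shows the idea for `n = 2`, not the required `ngrams` solution.

```python
from itertools import tee


def pairwise_sketch(iterable):
    """Lazily yield consecutive pairs from `iterable` (hypothetical helper)."""
    first, second = tee(iterable)  # two independent iterators over the same items
    next(second, None)             # advance the second iterator by one item
    # zip stops as soon as the shorter (second) iterator is exhausted
    yield from zip(first, second)


pairs = pairwise_sketch(["a", "wizard", "be", "never"])
print(next(pairs))  # the first pair is computed only when requested
print(list(pairs))  # the remaining pairs
```

Because the function uses `yield from`, calling it returns a generator, so nothing is computed until the caller iterates over the result.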

In [678]:
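from itertools import islice, tee


def ngrams(sequence, n, **kwargs):
    # Optionally truncate the input to its first `max_len` items.
    max_len = kwargs.get("max_len")
    if max_len is not None:
        sequence = islice(sequence, max_len)

    # Make n copies of the iterator and advance the k-th copy by k items,
    # so that zipping the copies yields consecutive windows of length n.
    iterators = tee(sequence, n)
    for offset, iterator in enumerate(iterators):
        for _ in range(offset):
            next(iterator, None)
    # `yield from` makes this function a generator, as the task requires.
    yield from zip(*iterators)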
    

In [679]:
assert list(ngrams(lemmatizer(ex1_a1), 2)) == [
    ("a", "wizard"),
    ("wizard", "be"),
    ("be", "never"),
    ("never", "late"),
    ("late", ","),
    (",", "Frodo"),
    ("Frodo", "Baggins"),
    ("Baggins", "."),
    (".", "nor"),
    ("nor", "be"),
    ("be", "he"),
    ("he", "early"),
    ("early", ";"),
    (";", "he"),
    ("he", "arrive"),
    ("arrive", "precisely"),
    ("precisely", "when"),
    ("when", "he"),
    ("he", "mean"),
    ("mean", "to"),
    ("to", "."),
]


### 2. News classification

#### (a)
Load the prepared BBC news data to a `pandas` dataframe named `df_bbc`. Explore the dataset structure.

In [680]:
import pandas as pd
df_bbc = pd.read_csv("bbc.csv")
print(df_bbc)

                                                  news           type
0    New 'yob' targets to be unveiled\n \n Fifty ne...       politics
1    Newcastle line up Babayaro\n \n Newcastle mana...          sport
2    Europe backs digital TV lifestyle\n \n How peo...           tech
3    Fears raised over ballet future\n \n Fewer chi...  entertainment
4    Barkley fit for match in Ireland\n \n England ...          sport
..                                                 ...            ...
195  Wales 'must learn health lessons'\n \n The new...       politics
196  Clarke to press on with ID cards\n \n New Home...       politics
197  Artists' secret postcards on sale\n \n Postcar...  entertainment
198  Lopez misses UK charity premiere\n \n Jennifer...  entertainment
199  February poll claim 'speculation'\n \n Reports...       politics

[200 rows x 2 columns]


#### (b)
To make the classification task a bit more challenging, we want to remove the news title from the text.\
Additionally, we will replace all whitespaces with single spaces. Implement title removal and whitespace replacement in `clean_text`.\
E.g., "This \n is  \t an &nbsp;&nbsp;&nbsp;&nbsp; example. " -> "This is an example."
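
The two steps can also be sketched separately with the standard library. The helper names `drop_first_line` and `normalize_whitespace` are hypothetical, and the sketch assumes the title is exactly the first line of the text, as in the test below.

```python
import re


def drop_first_line(text):
    """Return everything after the first newline ('' if there is none)."""
    _, _, rest = text.partition("\n")
    return rest


def normalize_whitespace(text):
    """Collapse every run of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


example = "Breaking news\nClever Hans \t learns  to integrate."
print(normalize_whitespace(drop_first_line(example)))
```

Keeping the two steps apart makes it easy to test title removal and whitespace handling independently.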

In [681]:
def clean_text(text):
    """
    Removes news title and replaces all whitespaces with single spaces.
    Returns preprocessed text.
    """
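    # Drop the first line (the title); keep the remaining lines intact so that
    # words on either side of a line break are not glued together.
    _, _, body = text.partition("\n")
    return " ".join(body.split())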


In [682]:
assert (
    clean_text("Breaking news\nClever Hans \t learns  to integrate.")
    == "Clever Hans learns to integrate."
)


In [683]:
df_bbc["text"] = df_bbc.news.apply(clean_text)

#### (c)
(1) Implement an abstract pipeline in `preprocess_pipe`. The method receives a sequence of texts and a pipe function, which is used to preprocess documents in combination with the spaCy model `nlp` that we loaded at the beginning. We recommend using [`pipe`](https://spacy.io/usage/processing-pipelines).\
(2) Implement `lemmatize_pipe` that collects lemmas and returns a list of n-grams ranging from `ngram_min` to `ngram_max`. Additionally, **truncate** the documents to `max_len` tokens and **remove the stop words**. Refer to the tests below to see how this method should behave.
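
The pattern of binding configuration with `functools.partial` and then applying a one-argument function to every document can be sketched without spaCy. Everything below is a stand-in: `keep_long_tokens`, `apply_pipe`, and the toy token lists are hypothetical and only mimic the shape of `lemmatize_pipe` and `preprocess_pipe`.

```python
from functools import partial


def keep_long_tokens(tokens, min_len, max_len):
    """Hypothetical pipe function: truncate, then keep tokens of at least `min_len` characters."""
    return [t for t in tokens[:max_len] if len(t) >= min_len]


def apply_pipe(docs, pipe_fn):
    """Apply a one-argument `pipe_fn` to every document in `docs`."""
    return [pipe_fn(doc) for doc in docs]


# Stand-ins for processed documents: plain token lists instead of spaCy Doc objects.
toy_docs = [
    ["One", "does", "not", "simply", "walk"],
    ["What", "about", "second", "breakfast"],
]

# Bind the configuration once; the resulting callable takes only the document.
toy_pipe_fn = partial(keep_long_tokens, min_len=4, max_len=3)
print(apply_pipe(toy_docs, toy_pipe_fn))
```

The benefit of this design is that the pipeline function never needs to know which parameters a particular pipe function takes.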

In [684]:
def lemmatize_pipe(doc, max_len, ngram_min, ngram_max):
    """
    Removes stop words, truncates the document to `max_len` tokens,
    and returns lemma n-grams in range [`ngram_min`, `ngram_max`].
    """
    result = []
    lemmas_no_sw = [token.lemma_ for token in doc if not token.is_stop]

    for n in range(ngram_min, ngram_max+1):
        n_grams = list(ngrams(lemmas_no_sw, n, max_len=max_len))
        result.extend(n_grams)
    return result

def preprocess_pipe(texts, pipe_fn):
    docs = nlp.pipe(texts)
    results = []
    for doc in docs:
        results.append(pipe_fn(doc))
    return results

In [685]:
from functools import partial


pipe_fn = partial(lemmatize_pipe, max_len=100, ngram_min=1, ngram_max=2)

ex2_c1 = ["Text no. 1", "Text no. 2"]
sol2_c1 = [
    [("text",), (".",), ("1",), ("text", "."), (".", "1")],
    [("text",), (".",), ("2",), ("text", "."), (".", "2")],
]

assert preprocess_pipe(ex2_c1, pipe_fn) == sol2_c1

ex2_c2 = [
    "It’s a dangerous business, Frodo, going out your door.",
    "You step onto the road, and if you don’t keep your feet, there’s no knowing where you might be swept off to.",
]
sol2_c2 = [
    [
        ("dangerous",),
        ("business",),
        (",",),
        ("Frodo",),
        (",",),
        ("go",),
        ("door",),
        (".",),
        ("dangerous", "business"),
        ("business", ","),
        (",", "Frodo"),
        ("Frodo", ","),
        (",", "go"),
        ("go", "door"),
        ("door", "."),
    ],
    [
        ("step",),
        ("road",),
        (",",),
        ("foot",),
        (",",),
        ("know",),
        ("sweep",),
        (".",),
        ("step", "road"),
        ("road", ","),
        (",", "foot"),
        ("foot", ","),
        (",", "know"),
        ("know", "sweep"),
        ("sweep", "."),
    ],
]

assert preprocess_pipe(ex2_c2, pipe_fn) == sol2_c2

In [686]:
from functools import partial
from sklearn.model_selection import train_test_split


pipe_fn = partial(lemmatize_pipe, max_len=100, ngram_min=1, ngram_max=2)

df_bbc["lemmas"] = preprocess_pipe(df_bbc.text, pipe_fn)
df_bbc_train, df_bbc_test = train_test_split(
    df_bbc[["lemmas", "type"]], test_size=0.2, random_state=42
)

In [687]:
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer

# Load vectorizers
count_vectorizer = CountVectorizer(tokenizer=lambda doc: doc, lowercase=False, min_df=3)
tfidf_vectorizer = TfidfVectorizer(tokenizer=lambda doc: doc, lowercase=False, min_df=3)

#### (d)
Implement `train_lr`. Run `test_performance` with count and TF-IDF vectorizer. Compare the results.
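
To see how the two representations differ, here is a from-scratch sketch on three toy documents. It assumes the smoothed IDF variant, $\ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1$, followed by L2 normalization, which is what scikit-learn documents as the `TfidfVectorizer` default; the toy data and helper names are hypothetical.

```python
import math
from collections import Counter

toy_docs = [
    ["match", "goal", "team"],
    ["team", "election", "vote"],
    ["team", "goal", "goal"],
]

n_docs = len(toy_docs)
# Document frequency: the number of documents in which each term occurs.
doc_freq = Counter(term for doc in toy_docs for term in set(doc))


def smoothed_idf(term):
    """Assumed variant: ln((1 + n) / (1 + df)) + 1."""
    return math.log((1 + n_docs) / (1 + doc_freq[term])) + 1


def tfidf_vector(doc):
    """Return L2-normalized TF-IDF weights for one tokenized document."""
    counts = Counter(doc)
    weights = {term: tf * smoothed_idf(term) for term, tf in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()}


print(Counter(toy_docs[0]))       # raw counts: every term weighs the same
print(tfidf_vector(toy_docs[0]))  # TF-IDF: rarer terms receive larger weights
```

In the first document all three terms have a count of 1, but after TF-IDF weighting the term that appears in every document ("team") contributes least.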

In [688]:
from sklearn.linear_model import LogisticRegression as LR


def train_lr(df_train, vectorizer, lr_kwargs={"max_iter": 1000, "solver": "lbfgs"}):
    """
    Receives the train set `df_train` as pd.DataFrame and extracts lemma n-grams
    with their corresponding labels (news type).
    The text is vectorized and used to train a logistic regression with
    training arguments passed as `lr_kwargs`.
    Returns the fitted model.
    """
    X_train, y_train = df_train.lemmas, df_train.type
    # Fit the vectorizer on the training data only.
    X_vec = vectorizer.fit_transform(X_train)
    model = LR(**lr_kwargs)
    return model.fit(X_vec, y_train)

In [689]:
from sklearn.metrics import classification_report
from sklearn.metrics import f1_score


def test_performance(model, df_test, vectorizer):
    X_test, y_test = df_test.lemmas, df_test.type
    X_vec = vectorizer.transform(X_test)
    y_pred = model.predict(X_vec)
    print(classification_report(y_pred=y_pred, y_true=y_test))
    return f1_score(y_pred=y_pred, y_true=y_test, average="macro")

In [690]:
## Count vectorizer scenario
lr = train_lr(df_bbc_train, count_vectorizer)
f1 = test_performance(lr, df_bbc_test, count_vectorizer)
print(f"f1 = {f1:.3f}")

               precision    recall  f1-score   support

     business       0.92      1.00      0.96        11
entertainment       0.83      0.83      0.83         6
     politics       1.00      0.88      0.93         8
        sport       0.92      1.00      0.96        12
         tech       1.00      0.67      0.80         3

     accuracy                           0.93        40
    macro avg       0.93      0.88      0.90        40
 weighted avg       0.93      0.93      0.92        40

f1 = 0.897


In [691]:
## TF-IDF vectorizer scenario
lr = train_lr(df_bbc_train, tfidf_vectorizer)
f1 = test_performance(lr, df_bbc_test, tfidf_vectorizer)
print(f"f1 = {f1:.3f}")

               precision    recall  f1-score   support

     business       0.79      1.00      0.88        11
entertainment       1.00      0.67      0.80         6
     politics       1.00      0.88      0.93         8
        sport       0.92      1.00      0.96        12
         tech       1.00      0.67      0.80         3

     accuracy                           0.90        40
    macro avg       0.94      0.84      0.87        40
 weighted avg       0.92      0.90      0.90        40

f1 = 0.875


### 3. Named entity recognition

Named entity recognition (NER) is an NLP task that seeks to classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, quantities, monetary values, percentages, etc. Refer to [Jurafsky \& Martin, Speech and Language Processing, Chapter 17](https://web.stanford.edu/~jurafsky/slp3/17.pdf) for additional information.

In this task, we will try out two approaches:
1. **classification**, where we classify named entities for each word in a document,
2. and **sequence labeling**, a more natural way to solve NER.

First, let's see spaCy's visualization tool `displacy` in action. We will take the first document from our data frame and render named entities with spaCy's default NER model. Although there are some minor inaccuracies, spaCy's NER model generally performs very well (~90% accuracy).

In [692]:
from spacy import displacy


doc = nlp(df_bbc.news.iloc[0])
displacy.render(doc, style="ent", jupyter=True)

#### (a)
We want to use spaCy's default model to produce silver-standard NER labels for our BBC news dataset. The first step is to implement `entity_pipe`, a method that extracts POS tags and NER labels, which we will pass as an argument to `preprocess_pipe`. `entity_pipe` receives a spaCy document, extracts triplets in the form of (token, POS tag, named entity label), and returns the list of collected triplets. Refer to [spaCy's documentation for NER](https://spacy.io/usage/linguistic-features#named-entities).
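
The labels in the tests below follow the IOB scheme: `B-` marks the first token of an entity, `I-` a token inside it, and `O` a token outside any entity. A minimal sketch of how such labels are assembled is given here; the helper `iob_labels`, the sentence, and the hand-written spans are hypothetical and are not spaCy output.

```python
def iob_labels(tokens, spans):
    """Label each token with an IOB tag (hypothetical helper).

    `spans` maps (start, end) token offsets, end exclusive, to an entity type.
    """
    labels = ["O"] * len(tokens)
    for (start, end), ent_type in spans.items():
        labels[start] = f"B-{ent_type}"      # first token of the entity
        for i in range(start + 1, end):
            labels[i] = f"I-{ent_type}"      # tokens inside the entity
    return list(zip(tokens, labels))


toy_tokens = ["Frodo", "Baggins", "left", "the", "Shire", "."]
toy_spans = {(0, 2): "PERSON", (4, 5): "LOC"}  # hand-annotated for illustration
print(iob_labels(toy_tokens, toy_spans))
```

In spaCy, the same two pieces of information are available per token as `token.ent_iob_` and `token.ent_type_`.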

In [693]:
def entity_pipe(doc):
    ners = []
    for token in doc:
        if token.ent_type != 0:
            ners.append((token.text, token.tag_, f'{token.ent_iob_}-{token.ent_type_}'))
        else:
            ners.append((token.text, token.tag_, token.ent_iob_))
    return ners

In [694]:
from functools import partial


ex3_a1 = [
    "One does not simply walk into Mordor.",
    "What about second breakfast?",
]
sol3_a1 = [
    [
        ("One", "PRP", "O"),
        ("does", "VBZ", "O"),
        ("not", "RB", "O"),
        ("simply", "RB", "O"),
        ("walk", "VB", "O"),
        ("into", "IN", "O"),
        ("Mordor", "NNP", "B-ORG"),
        (".", ".", "O"),
    ],
    [
        ("What", "WP", "O"),
        ("about", "IN", "O"),
        ("second", "JJ", "B-ORDINAL"),
        ("breakfast", "NN", "O"),
        ("?", ".", "O"),
    ],
]
assert preprocess_pipe(ex3_a1, entity_pipe) == sol3_a1

We will use only the first 50 documents to reduce the computational cost.

In [695]:
df_bbc_trunc = df_bbc[:50].copy()

df_bbc_trunc["tags"] = preprocess_pipe(df_bbc_trunc["text"], entity_pipe)
data = sum(df_bbc_trunc["tags"], [])
tokens, pos, tags = zip(*data)
df_iob = pd.DataFrame({"token": tokens, "POS": pos, "tag": tags})
df_iob.head()

     token  POS         tag
0    Fifty   CD  B-CARDINAL
1      new   JJ           O
2    areas  NNS           O
3  getting  VBG           O
4  special   JJ           O


#### (b)
Vectorize the data in `df_iob` with [`DictVectorizer`](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.DictVectorizer.html). You can transform the dataframe to a dictionary with [`to_dict`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_dict.html). The structure of the dictionary should look like so: [{column -> value}, … , {column -> value}]. Refer to the linked documentation to see how to utilize the `orient` argument.
After vectorization, split the data using [`train_test_split`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html), with `test_size=0.5` and `shuffle=False` to preserve the sentence structure. We are trying to classify named entities, so you can simply use the `tag` column from `df_iob` to extract labels. You can keep them in the string format.
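
To make the vectorization step concrete, the sketch below one-hot encodes a few records by hand: every distinct `column=value` pair becomes one binary feature, which is how `DictVectorizer` treats string values. The records and the helper `one_hot` are hypothetical, and the dense lists stand in for the sparse matrix that `DictVectorizer` returns by default.

```python
toy_records = [
    {"token": "Fifty", "POS": "CD"},
    {"token": "new", "POS": "JJ"},
    {"token": "areas", "POS": "NNS"},
    {"token": "new", "POS": "JJ"},
]

# Each distinct "column=value" pair becomes one binary feature.
feature_names = sorted({f"{k}={v}" for rec in toy_records for k, v in rec.items()})
feature_index = {name: i for i, name in enumerate(feature_names)}


def one_hot(record):
    """Encode one record as a dense 0/1 list (hypothetical helper)."""
    row = [0] * len(feature_names)
    for key, value in record.items():
        row[feature_index[f"{key}={value}"]] = 1
    return row


toy_matrix = [one_hot(rec) for rec in toy_records]

# Split without shuffling: the first half trains, the second half tests,
# so tokens of the same sentence stay next to each other.
half = len(toy_matrix) // 2
toy_train, toy_test = toy_matrix[:half], toy_matrix[half:]
print(feature_names)
print(toy_train, toy_test)
```

Note that identical records map to identical rows, so a classifier trained on these features sees each token in isolation, without its neighbours.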

In [696]:
from sklearn.feature_extraction import DictVectorizer

vec = DictVectorizer().fit_transform(df_iob.to_dict(orient='records'))
X_train, X_test, y_train, y_test = train_test_split(vec, df_iob.tag.values, test_size=0.5, shuffle=False)

You can train your classifier now. For this purpose, let's choose Multinomial Naïve Bayes (MNB). Since MNB can learn incrementally, notice that we train our model with [`partial_fit`](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html#sklearn.naive_bayes.MultinomialNB.partial_fit) to reduce the computational complexity.

In [697]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

classes = np.unique(df_iob.tag.values).tolist()
nb = MultinomialNB()
nb.partial_fit(X_train, y_train, classes)

print(classification_report(y_pred=nb.predict(X_test), y_true=y_test, labels=classes))

               precision    recall  f1-score   support

   B-CARDINAL       0.84      0.95      0.89        83
       B-DATE       0.99      0.96      0.97       157
      B-EVENT       0.00      0.00      0.00         4
        B-FAC       0.00      0.00      0.00         2
        B-GPE       0.99      1.00      0.99       205
   B-LANGUAGE       0.00      0.00      0.00         0
        B-LAW       0.00      0.00      0.00         1
        B-LOC       0.00      0.00      0.00        25
      B-MONEY       1.00      0.64      0.78        44
       B-NORP       1.00      0.93      0.96        56
    B-ORDINAL       1.00      0.79      0.88        14
        B-ORG       0.74      1.00      0.85       217
    B-PERCENT       1.00      0.70      0.82        33
     B-PERSON       0.99      0.98      0.99       189
    B-PRODUCT       0.00      0.00      0.00         4
   B-QUANTITY       0.00      0.00      0.00         4
       B-TIME       0.00      0.00      0.00         8
B-WORK_OF



For non-sparse classes, the $F_1$ score should be close to $1$. Remember that we used spaCy to produce the silver labels: a possible explanation is that these automatically generated labels are largely predictable from the token and its POS tag, which makes them easy to learn. To check how the classifier performs on human-annotated data, let's explore the next dataset, "ner.csv".

In [698]:
df_ner = pd.read_csv("ner.csv", encoding="ISO-8859-1")
# Fill NaNs with preceding values (for the "Sentence #" column).
df_ner = df_ner.ffill()

Repeat the same procedure as in **(b)** with [`DictVectorizer`](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.DictVectorizer.html) on `df_clf`. Use [`train_test_split`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html), with `test_size=0.5` and `shuffle=False`.

In [699]:
df_clf = df_ner[["Word", "POS", "Tag"]]

vec = DictVectorizer().fit_transform(df_clf.to_dict(orient='records'))
X_train, X_test, y_train, y_test = train_test_split(vec, df_clf.Tag.values, test_size=0.5, shuffle=False)
classes = np.unique(df_clf.Tag.values).tolist()

In [700]:
nb = MultinomialNB()
nb.partial_fit(X_train, y_train, classes)

MultinomialNB()

Let's drop the `O` tag: it is by far the most frequent tag, and it is hard to interpret the performance when it is included. This will give us a more realistic $F_1$ score. If you wish, you can compare the results by setting `labels=classes` instead of `labels=new_classes`. If your classifier performs terribly, that is expected, so don't worry.
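
The effect of a dominant class can be seen on a tiny hand-made example. The sketch below computes per-class $F_1$ from scratch for a degenerate model that predicts `O` everywhere; the data and the helper `f1_for` are hypothetical.

```python
def f1_for(label, y_true, y_pred):
    """Per-class F1 from scratch (0.0 when there are no true positives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: 8 of 10 tokens are "O", and the model predicts "O" everywhere.
toy_true = ["O"] * 8 + ["B-per", "B-geo"]
toy_pred = ["O"] * 10

accuracy = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)
entity_labels = ["B-per", "B-geo"]
macro_f1 = sum(f1_for(label, toy_true, toy_pred) for label in entity_labels) / len(entity_labels)
print(f"accuracy = {accuracy:.2f}, macro F1 without O = {macro_f1:.2f}")
```

The model finds no entities at all, yet its accuracy looks respectable; restricting the report to the entity labels exposes the failure.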

In [701]:
# Evaluate on every tag except "O" (independent of where "O" sits in the list).
new_classes = [label for label in classes if label != "O"]
print(classification_report(y_pred=nb.predict(X_test), y_true=y_test, labels=new_classes))



              precision    recall  f1-score   support

       B-art       0.00      0.00      0.00        27
       B-eve       0.00      0.00      0.00        14
       B-geo       0.96      0.99      0.98      1813
       B-gpe       0.98      1.00      0.99       772
       B-nat       0.00      0.00      0.00        12
       B-org       0.99      0.99      0.99       917
       B-per       1.00      1.00      1.00       879
       B-tim       0.98      0.99      0.98       943
       I-art       0.00      0.00      0.00        16
       I-eve       0.00      0.00      0.00        14
       I-geo       0.99      0.95      0.97       387
       I-gpe       0.00      0.00      0.00        20
       I-nat       0.00      0.00      0.00         2
       I-org       0.99      0.99      0.99       781
       I-per       0.99      1.00      1.00       915
       I-tim       1.00      0.66      0.80       310

   micro avg       0.98      0.97      0.97      7822
   macro avg       0.56   

Let's try to improve the performance with the sequence labeling approach. Specifically, we're going to use a conditional random field (CRF). First, we have to prepare the sentence-level dataset.

In [702]:
from collections import Counter

import sklearn_crfsuite
from sklearn_crfsuite import scorers
from sklearn_crfsuite import metrics


sentences = df_ner.groupby("Sentence #").Word.agg(lambda s: " ".join(s)).values.tolist()
processed = preprocess_pipe(sentences, entity_pipe)

#### (c)
Implement missing features in `token2features`:
- -1:token.lower() = preceding token in lowercase
- -1:token.istitle() = is the preceding token title-cased
- -1:token.isupper() = is the preceding token in uppercase
- -1:postag = POS tag of the preceding token

Analogously, add the same features for succeeding tokens.

In [703]:
def token2features(sent, i):
    token = sent[i][0]
    postag = sent[i][1]

    features = {
        "bias": 1.0,
        "token.lower()": token.lower(),
        "token[-3:]": token[-3:],
        "token[-2:]": token[-2:],
        "token.isupper()": token.isupper(),
        "token.istitle()": token.istitle(),
        "token.isdigit()": token.isdigit(),
        "postag": postag,
        "postag[:2]": postag[:2],
    }
    if i > 0:
        features.update(
            {
                "-1:token.lower()": sent[i-1][0].lower(),
                "-1:token.istitle()": sent[i-1][0].istitle(),
                "-1:token.isupper()": sent[i-1][0].isupper(),
                "-1:postag": sent[i-1][1],
            }
        )
    else:
        features["BOS"] = True
    if i < len(sent) - 1:
        features.update(
            {
                "+1:token.lower()":  sent[i+1][0].lower(),
                "+1:token.istitle()": sent[i+1][0].istitle(),
                "+1:token.isupper()": sent[i+1][0].isupper(),
                "+1:postag":  sent[i+1][1],
            }
        )
    else:
        features["EOS"] = True
    return features


def sent2features(sent):
    return [token2features(sent, i) for i in range(len(sent))]


def sent2labels(sent):
    return [label for _, _, label in sent]


def sent2tokens(sent):
    return [token for token, _, _ in sent]

In [704]:
ex3_b1 = [
    ("Thousands", "NNS", "B-CARDINAL"),
    ("of", "IN", "O"),
    ("demonstrators", "NNS", "O"),
    ("have", "VBP", "O"),
    ("marched", "VBN", "O"),
    ("through", "IN", "O"),
    ("London", "NNP", "B-GPE"),
    ("to", "TO", "O"),
    ("protest", "VB", "O"),
    ("the", "DT", "O"),
    ("war", "NN", "O"),
    ("in", "IN", "O"),
    ("Iraq", "NNP", "B-GPE"),
    ("and", "CC", "O"),
    ("demand", "VB", "O"),
    ("the", "DT", "O"),
    ("withdrawal", "NN", "O"),
    ("of", "IN", "O"),
    ("British", "JJ", "B-NORP"),
    ("troops", "NNS", "O"),
    ("from", "IN", "O"),
    ("that", "DT", "O"),
    ("country", "NN", "O"),
    (".", ".", "O"),
]

sol3_b1 = {
    "bias": 1.0,
    "token.lower()": "through",
    "token[-3:]": "ugh",
    "token[-2:]": "gh",
    "token.isupper()": False,
    "token.istitle()": False,
    "token.isdigit()": False,
    "postag": "IN",
    "postag[:2]": "IN",
    "-1:token.lower()": "marched",
    "-1:token.istitle()": False,
    "-1:token.isupper()": False,
    "-1:postag": "VBN",
    "+1:token.lower()": "london",
    "+1:token.istitle()": True,
    "+1:token.isupper()": False,
    "+1:postag": "NNP",
}

assert sent2features(ex3_b1)[5] == sol3_b1

In [705]:
X = [sent2features(s) for s in processed]
y = [sent2labels(s) for s in processed]
new_classes = list(set(i for j in y for i in j))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, shuffle=False)

If the training lasts longer than ~10 minutes, you can reduce `max_iterations`.

In [706]:
crf = sklearn_crfsuite.CRF(
    algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100, all_possible_transitions=True
)
crf.fit(X_train, y_train)



CRF(algorithm='lbfgs', all_possible_transitions=True, c1=0.1, c2=0.1,
    keep_tempfiles=None, max_iterations=100)

CRF should heavily outperform our previous attempt with the classifier. Check the performance without the `O` tag. If you wish, you can see how $F_1$ changes if you include the `O` tag, simply by setting `labels=classes` in `flat_classification_report`. The benefits of solving NER as a sequence labeling task should be obvious after you inspect the margin of improvement.
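
The CRF predicts one list of labels per sentence, whereas the report scores individual tokens. As far as we understand `flat_classification_report`, the nested lists are flattened into one long sequence before the usual metrics are computed. A minimal sketch of that flattening on hypothetical toy data:

```python
from itertools import chain

# One inner list per sentence (toy data, not model output).
toy_y_true = [["B-GPE", "O"], ["O", "B-PERSON", "I-PERSON"]]
toy_y_pred = [["B-GPE", "O"], ["O", "B-PERSON", "O"]]

# Flatten sentence-level sequences into token-level lists.
flat_true = list(chain.from_iterable(toy_y_true))
flat_pred = list(chain.from_iterable(toy_y_pred))

token_accuracy = sum(t == p for t, p in zip(flat_true, flat_pred)) / len(flat_true)
print(flat_true)
print(flat_pred)
print(f"token-level accuracy = {token_accuracy:.2f}")
```

Token-level scoring is lenient: the second sentence's `PERSON` entity is only partially recovered, but it still earns credit for its first token.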

In [707]:
y_pred = crf.predict(X_test)
print(metrics.flat_classification_report(y_test, y_pred, labels=new_classes))



               precision    recall  f1-score   support

        I-GPE       0.86      0.87      0.86       370
       I-NORP       0.83      0.69      0.75        75
    B-ORDINAL       0.85      0.93      0.89       107
        B-GPE       0.88      0.94      0.91      1749
     B-PERSON       0.83      0.79      0.81       796
       B-NORP       0.91      0.92      0.91       905
   B-CARDINAL       0.88      0.91      0.89       664
B-WORK_OF_ART       0.00      0.00      0.00        10
     I-PERSON       0.81      0.88      0.84       664
        B-LAW       0.00      0.00      0.00         3
   I-CARDINAL       0.91      0.92      0.92       375
      I-EVENT       0.77      0.67      0.71        60
       I-DATE       0.89      0.82      0.85       882
I-WORK_OF_ART       0.00      0.00      0.00        14
            O       0.98      0.99      0.99     40783
       B-DATE       0.93      0.89      0.91      1146
    B-PERCENT       0.80      0.78      0.79        46
        B

Let's explore the top (un)likely transitions. Can you spot any expected patterns?

In [708]:
top_n_trans = 20


def print_transitions(trans_features):
    for (label_from, label_to), weight in trans_features:
        print("%-14s -> %-14s: %0.5f" % (label_from, label_to, weight))


print("Top likely transitions:")
print_transitions(Counter(crf.transition_features_).most_common(top_n_trans))
print("\nTop unlikely transitions:")
print_transitions(Counter(crf.transition_features_).most_common()[-top_n_trans:])

Top likely transitions:
I-FAC          -> I-FAC         : 6.83986
I-CARDINAL     -> I-CARDINAL    : 6.62248
I-EVENT        -> I-EVENT       : 6.57318
B-PERSON       -> I-PERSON      : 6.32764
B-TIME         -> I-TIME        : 6.06083
I-GPE          -> I-GPE         : 6.05518
I-ORG          -> I-ORG         : 6.02829
B-PERCENT      -> I-PERCENT     : 5.97228
B-CARDINAL     -> I-CARDINAL    : 5.93175
B-LOC          -> I-LOC         : 5.89972
B-EVENT        -> I-EVENT       : 5.83768
I-PERSON       -> I-PERSON      : 5.82728
I-MONEY        -> I-MONEY       : 5.74150
B-QUANTITY     -> I-QUANTITY    : 5.57306
B-MONEY        -> I-MONEY       : 5.56706
I-DATE         -> I-DATE        : 5.51564
B-FAC          -> I-FAC         : 5.50932
B-WORK_OF_ART  -> I-WORK_OF_ART : 5.47687
B-DATE         -> I-DATE        : 5.40929
I-TIME         -> I-TIME        : 5.40436

Top unlikely transitions:
B-GPE          -> I-ORG         : -1.91419
B-NORP         -> B-ORG         : -1.95154
O              -> I-PER
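
The large positive weights for `B-X -> I-X` and `I-X -> I-X`, and the negative weight for `B-GPE -> I-ORG`, are what the IOB scheme would predict: an `I-` tag may only continue an entity of the same type. A minimal sketch of that constraint follows; the helper `is_valid_transition` is hypothetical and not part of `sklearn_crfsuite`.

```python
def is_valid_transition(label_from, label_to):
    """Check whether `label_from -> label_to` is allowed under the IOB scheme."""
    if not label_to.startswith("I-"):
        return True  # "O" and any "B-X" may follow any label
    ent_type = label_to[2:]
    # "I-X" may only continue an entity of the same type X.
    return label_from in (f"B-{ent_type}", f"I-{ent_type}")


toy_transitions = [
    ("B-PERSON", "I-PERSON"),
    ("I-DATE", "I-DATE"),
    ("O", "I-PERSON"),
    ("B-GPE", "I-ORG"),
]
for label_from, label_to in toy_transitions:
    print(f"{label_from:<10} -> {label_to:<10} {is_valid_transition(label_from, label_to)}")
```

The CRF is never given this rule explicitly; it learns an approximation of it from the label sequences in the training data.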

Additionally, let's take a look at the most important features for specific tags.

In [709]:
top_n_feat = 30


def print_state_features(state_features):
    for (attr, label), weight in state_features:
        print("%0.5f %-14s %s" % (weight, label, attr))


print("Top positive:")
print_state_features(Counter(crf.state_features_).most_common(top_n_feat))

print()

print("Top negative:")
print_state_features(Counter(crf.state_features_).most_common()[-top_n_feat:])

Top positive:
5.61942 B-PERSON       -1:token.lower():mr.
4.99355 O              bias
4.96132 B-DATE         token[-3:]:day
4.45379 B-LOC          token.lower():asia
4.37708 B-CARDINAL     token.lower():millions
4.36302 O              BOS
4.22766 B-ORDINAL      token[-2:]:th
4.21113 I-DATE         token[-2:]:0s
4.19905 B-NORP         token.istitle()
4.09768 O              token.lower():president
3.73460 B-NORP         token.lower():shi'ite
3.68081 B-ORG          token.lower():taliban
3.67184 B-GPE          token.lower():ukrainian
3.56562 O              token.lower():minister
3.49082 B-ORG          token.lower():cholera
3.46878 B-LOC          token.lower():siberia
3.43886 B-PERSON       -1:token.lower():minister
3.43404 O              +1:token.lower():pacific
3.42720 B-NORP         token.lower():baluchistan
3.41009 B-CARDINAL     token.lower():dozens
3.39178 O              -1:token.lower():late
3.38578 B-ORG          token.lower():commonwealth
3.34223 I-DATE         -1:token.lower():las

Let's conclude this assignment with an overview of CRF feature importance using the `eli5` library.

In [710]:
import eli5

eli5.show_weights(crf, top=10)

From \ To,O,B-CARDINAL,I-CARDINAL,B-DATE,I-DATE,B-EVENT,I-EVENT,B-FAC,I-FAC,B-GPE,I-GPE,B-LANGUAGE,B-LAW,I-LAW,B-LOC,I-LOC,B-MONEY,I-MONEY,B-NORP,I-NORP,B-ORDINAL,B-ORG,I-ORG,B-PERCENT,I-PERCENT,B-PERSON,I-PERSON,B-PRODUCT,I-PRODUCT,B-QUANTITY,I-QUANTITY,B-TIME,I-TIME,B-WORK_OF_ART,I-WORK_OF_ART
O,3.655,1.707,-3.268,1.7,-4.522,0.638,-2.896,0.609,-2.369,1.898,-4.226,0.25,0.0,-2.255,0.929,-3.497,2.065,-2.746,1.454,-2.105,1.889,1.0,-5.029,1.014,-1.88,2.784,-2.009,1.14,-0.42,1.471,-2.052,1.361,-2.035,-0.028,-2.106
B-CARDINAL,1.263,-1.069,5.932,-1.084,-2.452,0.0,0.0,0.0,0.0,1.259,-0.007,0.0,0.0,0.0,0.0,0.0,0.0,-0.494,0.831,-0.0,0.0,0.673,-0.396,0.0,-0.82,0.138,0.0,0.0,0.0,0.0,-1.063,0.0,-0.623,0.0,0.0
I-CARDINAL,0.979,-0.827,6.622,-0.458,-0.879,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.042,0.526,0.0,0.0,-0.412,0.0,0.0,-0.371,0.634,0.0,0.0,0.0,0.0,-0.676,0.0,-0.23,0.0,0.0
B-DATE,0.462,-0.014,-0.375,-1.809,5.409,0.941,-0.254,0.0,0.0,0.61,-0.099,0.0,0.0,0.0,0.0,-0.008,0.0,-0.041,-0.468,0.0,-0.648,0.0,-0.922,0.0,-0.405,-0.09,-0.069,0.0,0.0,0.0,-0.096,1.027,-0.424,1.186,-0.279
I-DATE,-0.318,-1.023,-0.533,-1.069,5.516,0.0,-0.0,0.0,0.0,-0.133,0.0,0.0,0.0,0.0,0.0,0.0,-0.006,-0.537,-0.83,0.0,-0.786,-0.412,-0.205,0.187,-0.443,0.94,0.0,0.0,0.0,0.0,-0.03,0.0,-0.077,0.0,0.0
B-EVENT,-0.506,0.0,0.0,0.0,0.0,0.0,5.838,0.0,0.0,0.0,-0.112,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.289,0.0,0.0,-0.023,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
I-EVENT,-0.211,0.0,0.0,0.081,-0.03,0.0,6.573,0.0,0.0,-0.127,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.803,0.0,0.0,0.0,-0.089,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B-FAC,0.0,0.0,0.0,0.029,0.0,0.0,0.0,0.0,5.509,0.0,-0.398,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.003,-0.485,0.0,0.0,-0.356,-0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
I-FAC,-1.179,0.0,0.0,0.858,0.0,0.0,0.0,0.0,6.84,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.106,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
B-GPE,0.822,0.0,0.0,1.921,-0.742,0.0,-0.831,0.0,-0.538,0.0,4.979,0.0,0.0,-0.052,0.285,-0.969,1.151,-0.162,0.0,0.0,0.0,-1.059,-1.914,0.0,0.0,-1.124,-1.23,0.0,0.0,0.0,0.0,-0.686,-0.068,0.0,-0.354

Weight?,Feature
+4.994,bias
+4.363,BOS
+4.098,token.lower():president
+3.566,token.lower():minister
+3.434,+1:token.lower():pacific
+3.392,-1:token.lower():late
+3.291,token.lower():secretary
… 1775 more positive …,… 1775 more positive …
… 931 more negative …,… 931 more negative …
-3.128,postag[:2]:CD

Weight?,Feature
+4.377,token.lower():millions
+3.410,token.lower():dozens
+2.707,postag[:2]:CD
+2.707,postag:CD
+2.703,token.lower():hundreds
+2.444,token.lower():thousands
+2.360,BOS
+2.222,token[-3:]:eds
+2.051,token[-2:]:ns
+2.018,token[-3:]:/09

Weight?,Feature
+2.292,+1:token.lower():of
+1.751,-1:token.lower():least
+1.738,token.lower():dozen
+1.737,token[-3:]:zen
+1.650,+1:token.lower():thousands
+1.625,-1:token.lower():several
+1.603,-1:token.lower():number
+1.506,postag[:2]:CD
… 180 more positive …,… 180 more positive …
… 34 more negative …,… 34 more negative …

Weight?,Feature
+4.961,token[-3:]:day
+2.860,token.lower():january
+2.812,token[-3:]:ber
+2.797,token[-2:]:ay
+2.747,token.lower():winter
+2.708,token.lower():annual
+2.667,token.lower():august
+2.655,token.lower():weeks
+2.655,token[-3:]:eks
+2.593,-1:token.lower():this

Weight?,Feature
+4.211,token[-2:]:0s
+3.342,-1:token.lower():last
+2.899,token.lower():months
+2.815,token.lower():years
+2.781,token.lower():year
+2.763,token[-3:]:day
+2.537,token[-2:]:ay
+2.527,token.lower():day
+2.423,-1:token.lower():next
+2.119,token[-3:]:eks

Weight?,Feature
+1.889,token.lower():katrina
+1.818,+1:token.lower():had
+1.815,token.lower():olympics
+1.643,token.lower():hurricane
+1.640,token[-3:]:ane
+1.608,token[-2:]:cs
+1.608,token[-3:]:ics
+1.498,+1:token.lower():australian
+1.225,token[-3:]:ina
+1.218,postag:NNPS

Weight?,Feature
+2.019,token[-2:]:ar
+1.983,-1:token.lower():year
+1.772,-1:token.lower():the
+1.764,-1:token.lower():hurricane
+1.724,token.lower():war
+1.505,-1:postag:NNP
+1.502,+1:token.isupper()
+1.358,+1:token.lower():ii
+1.330,+1:token.lower():war
+1.199,-1:token.istitle()

Weight?,Feature
+2.675,token.lower():mosques
+2.448,token.lower():vatican
+2.356,token[-3:]:ues
+1.909,-1:token.lower():at
+1.728,token[-3:]:can
+1.592,token[-3:]:amo
+1.592,token.lower():guantanamo
+1.591,token[-2:]:mo
+1.583,+1:postag:NNP
+1.442,token[-2:]:es

Weight?,Feature
+1.264,-1:postag:NNP
+1.201,postag:NNP
+1.157,-1:token.lower():the
+1.132,postag[:2]:NN
+1.024,-1:postag:DT
+0.955,-1:token.lower():dharmeratnam
+0.954,token.lower():sivaram
+0.946,token.lower():moment
+0.903,+1:token.lower():when
+0.899,+1:postag:WRB

Weight?,Feature
+3.672,token.lower():ukrainian
+3.114,token.lower():washington
+3.101,token[-3:]:ona
+3.030,token.lower():krona
+2.850,+1:token.lower():united
+2.822,token.lower():china
+2.789,token.lower():iraq
+2.672,token.lower():sudan
+2.469,token.lower():philippines
+2.443,token.lower():israel

Weight?,Feature
+2.849,token.lower():province
+2.607,-1:token.lower():west
+2.420,-1:token.lower():new
+2.289,-1:token.lower():south
+2.169,-1:token.lower():north
+2.150,token.lower():city
+2.093,-1:token.lower():tangshan
+2.071,-1:token.lower():himalayan
+2.000,-1:token.lower():lanka
… 297 more positive …,… 297 more positive …

Weight?,Feature
+1.871,+1:token.lower():language
+1.792,token.lower():english
+1.663,token[-3:]:ish
+1.656,+1:postag:HYPH
+1.627,token[-2:]:sh
+1.584,token.istitle()
+1.492,postag:JJ
+1.413,+1:token.lower():-
+1.311,token[-3:]:bic
+1.311,token.lower():arabic

Weight?,Feature
+1.232,+1:postag:NNP
+1.100,+1:token.lower():universal
+0.921,+1:token.lower():european
+0.901,-1:token.lower():on
+0.790,+1:token.lower():official
+0.772,+1:token.istitle()
+0.747,-1:token.lower():under
+0.736,-1:postag:IN
+0.658,token[-3:]:the
+0.593,-1:token.lower():from

Weight?,Feature
+1.348,token[-3:]:ion
+1.075,token.istitle()
+0.949,-1:postag:DT
+0.915,token[-2:]:ts
+0.883,-1:token.lower():the
+0.863,token[-2:]:al
+0.792,+1:token.lower():constitution
+0.791,token.lower():constitution
+0.753,token[-2:]:on
+0.747,+1:token.lower():would

Weight?,Feature
+4.454,token.lower():asia
+3.469,token.lower():siberia
+3.035,token.lower():africa
+3.027,token.lower():europe
+2.528,token.lower():jupiter
+2.009,token.istitle()
+1.989,token[-3:]:ope
+1.889,token[-2:]:pe
+1.880,token[-2:]:th
+1.860,token[-3:]:sia

Weight?,Feature
+2.026,token.istitle()
+1.982,token.lower():america
+1.772,-1:token.lower():middle
+1.391,token.lower():anbar
+1.380,+1:token.lower():earthquake
+1.364,+1:token.lower():following
+1.326,+1:token.lower():anbar
+1.222,-1:token.lower():the
+1.222,-1:token.lower():central
+1.130,token.lower():valley

Weight?,Feature
+2.347,-1:postag:$
+2.347,-1:token.lower():$
+1.193,token.lower():millions
+1.192,-1:token.lower():pay
+1.123,+1:postag:DT
+0.972,token.lower():billion
+0.942,-1:postag:VB
+0.909,postag:RB
+0.908,+1:token.lower():750
… 95 more positive …,… 95 more positive …

Weight?,Feature
+1.814,-1:token.lower():$
+1.814,-1:postag:$
+1.700,+1:token.lower():dollars
+1.695,token.lower():dollars
+1.280,postag:NNS
+1.256,+1:postag:CD
+1.181,token.lower():cents
+1.165,-1:postag:RB
+1.165,token[-3:]:ars
+1.076,postag[:2]:CD

Weight?,Feature
+4.199,token.istitle()
+3.735,token.lower():shi'ite
+3.427,token.lower():baluchistan
+3.148,token.lower():tajikistan
+2.755,token[-3:]:ite
+2.534,token[-2:]:an
+2.447,token.lower():democrat
+2.402,token[-2:]:ni
+2.389,token[-3:]:ese
… 333 more positive …,… 333 more positive …

Weight?,Feature
+2.170,-1:token.lower():-
+1.812,postag:NNPS
+1.756,token[-3:]:can
+1.717,token.istitle()
+1.480,token.lower():democrats
+1.462,-1:token.lower():east
+1.430,-1:token.lower():social
+1.407,-1:token.lower():sunni
+1.377,-1:token.lower():roman
+1.286,token[-2:]:an

Weight?,Feature
+4.228,token[-2:]:th
+3.198,token.lower():first
+2.828,token[-3:]:rst
+2.682,token.lower():third
+2.321,token[-3:]:ird
+2.263,token.lower():second
+2.259,token[-3:]:ond
+2.069,token.lower():fourth
+2.025,token[-2:]:rd
+1.941,token[-2:]:nd

Weight?,Feature
+3.681,token.lower():taliban
+3.491,token.lower():cholera
+3.386,token.lower():commonwealth
+3.251,token[-3:]:era
+3.129,token.lower():hezbollah
+3.085,token.lower():reuters
+2.966,token.isupper()
+2.842,token.lower():parliament
+2.776,token.lower():congress
+2.629,token.lower():hamas

Weight?,Feature
+2.728,token[-3:]:ion
+2.503,+1:token.lower():security
+2.480,-1:token.lower():news
+2.162,token.lower():nations
+2.093,+1:token.lower():geological
+1.931,-1:token.lower():xinhua
+1.918,token.lower():department
+1.883,-1:token.lower():the
+1.823,+1:token.lower():navy
+1.821,token[-2:]:ly

Weight?,Feature
+1.405,+1:token.lower():percent
+1.218,postag:RB
+1.218,+1:token.lower():%
+1.077,postag[:2]:RB
+1.051,+1:postag:NN
+0.945,postag:CD
+0.945,postag[:2]:CD
+0.830,+1:postag:IN
+0.582,-1:token.lower():by
+0.544,postag:JJR

Weight?,Feature
+2.336,token.lower():percent
+2.076,token.lower():%
+2.076,token[-3:]:%
+2.076,token[-2:]:%
+2.039,-1:postag:CD
+1.746,+1:token.lower():percent
+1.708,token[-3:]:ent
+1.703,token[-2:]:nt
+1.702,-1:postag:RB
+1.521,postag:NN

Weight?,Feature
+5.619,-1:token.lower():mr.
+3.439,-1:token.lower():minister
+3.222,-1:token.lower():president
+2.847,-1:token.lower():ms.
+2.779,+1:token.lower():administration
+2.648,token.lower():gotovina
+2.565,token.lower():astypaleia
+2.538,token.lower():rice
+2.514,token.lower():orakzai
+2.502,-1:token.lower():old

Weight?,Feature
+1.940,-1:postag:HYPH
+1.929,+1:token.lower():staffers
+1.896,-1:token.lower():hussein
+1.824,-1:token.lower():gagarin
+1.653,+1:postag::
+1.626,+1:postag:VBZ
+1.619,+1:token.lower():reputed
+1.614,-1:token.lower():delp
+1.610,+1:token.lower():family
+1.563,token[-3:]:son

Weight?,Feature
+2.798,token.lower():twitter
+2.334,token.lower():discovery
+2.141,token[-3:]:ery
+1.926,+1:token.lower():never
+1.671,token[-3:]:ter
+1.588,token[-2:]:ry
+1.408,+1:token.lower():katif
+1.408,token.lower():gush
+1.386,token[-2:]:47
+1.350,token[-3:]:rsk

Weight?,Feature
+1.231,token[-2:]:if
+1.196,-1:token.lower():gush
+1.196,token[-3:]:tif
+1.196,token.lower():katif
+1.195,+1:token.lower():bloc
+0.696,+1:postag:NN
+0.584,-1:postag:NNP
+0.565,-1:token.istitle()
+0.374,postag:NNP
+0.257,token.istitle()

Weight?,Feature
+1.103,+1:postag:CD
+1.075,-1:postag:VB
+0.805,+1:token.lower():barrels
+0.801,"token.lower():2,00,000"
+0.792,-1:token.lower():process
+0.766,+1:postag:HYPH
+0.749,-1:token.lower():a
+0.710,token.lower():185
+0.710,token[-3:]:185
+0.707,-1:token.lower():had

Weight?,Feature
+2.466,token.lower():kilometers
+2.217,token.lower():kilometer
+2.082,-1:postag:CD
+1.764,+1:token.lower():kilometers
+1.633,token[-3:]:ter
+1.564,token.lower():barrels
+1.530,postag[:2]:NN
+1.527,token[-2:]:00
+1.430,token.lower():tons
+1.340,+1:token.lower():of

Weight?,Feature
+3.076,token.lower():morning
+2.482,-1:token.lower():within
+2.422,-1:token.lower():tense
+2.001,token.lower():afternoon
+1.958,+1:token.lower():wednesday
+1.916,+1:token.lower():departure
+1.870,token.lower():seconds
+1.839,-1:postag:NNP
+1.801,token.lower():saturday
+1.739,token.lower():meters

Weight?,Feature
+2.131,-1:token.lower():saturday
+1.898,token.lower():wednesday
+1.869,-1:postag:RBR
+1.781,token.lower():evening
+1.763,token[-3:]:urs
+1.763,token.lower():hours
+1.745,-1:token.lower():late
+1.507,-1:postag:CD
+1.506,token.lower():night
+1.491,-1:token.lower():earlier

Weight?,Feature
+1.865,-1:token.lower():under
+1.070,+1:token.lower():nobel
+0.953,token.istitle()
+0.926,-1:token.lower():rockers
+0.926,+1:token.lower():sex
+0.913,+1:postag:NNP
+0.882,token[-3:]:day
+0.863,+1:token.lower():show
+0.841,token.lower():today
+0.841,+1:token.istitle()

Weight?,Feature
+1.106,-1:token.istitle()
+1.067,-1:postag:DT
+0.911,-1:postag:NNP
+0.787,postag[:2]:IN
+0.787,postag:IN
+0.734,token.lower():prize
+0.733,token[-3:]:ize
+0.723,bias
+0.702,token.istitle()
+0.696,+1:token.lower():rebel
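
The feature names in the weight tables above follow a regular pattern: an optional `-1:` or `+1:` prefix for the neighbouring token, a feature name such as `token.lower()`, `token[-3:]`, or `postag[:2]`, and, for string-valued features, the value after a colon. Boolean features such as `token.istitle()` or `BOS` appear without a value. The code below is a minimal sketch of a feature extractor that would yield names of this shape when its dictionaries are passed to a CRF implementation that joins keys and string values with `:`. The function name `word2features`, the `(token, postag)` input format, and the `EOS` feature are assumptions made for illustration; they are not taken from the assignment code, and the actual extractor may differ.

```python
def word2features(sent, i):
    """Build a feature dict for the i-th (token, postag) pair of a sentence.

    Hypothetical helper: the feature set mirrors the names visible in the
    weight tables, not the assignment's actual implementation.
    """
    token, postag = sent[i]
    features = {
        "bias": 1.0,
        "token.lower()": token.lower(),
        "token[-3:]": token[-3:],
        "token[-2:]": token[-2:],
        "token.isupper()": token.isupper(),
        "token.istitle()": token.istitle(),
        "postag": postag,
        "postag[:2]": postag[:2],
    }

    # Context features for the previous (-1) and next (+1) token.
    for offset, prefix in ((-1, "-1:"), (1, "+1:")):
        j = i + offset
        if 0 <= j < len(sent):
            ctx_token, ctx_postag = sent[j]
            features[prefix + "token.lower()"] = ctx_token.lower()
            features[prefix + "token.istitle()"] = ctx_token.istitle()
            features[prefix + "token.isupper()"] = ctx_token.isupper()
            features[prefix + "postag"] = ctx_postag
        elif offset < 0:
            features["BOS"] = True  # beginning of sentence
        else:
            features["EOS"] = True  # end of sentence (assumed; not shown in the tables)

    return features


def sent2features(sent):
    """Apply word2features to every position of a sentence."""
    return [word2features(sent, i) for i in range(len(sent))]


# Illustrative input: a made-up sentence with Penn Treebank style tags.
example = [("President", "NNP"), ("Obama", "NNP"), ("spoke", "VBD")]
for feats in sent2features(example):
    print(feats)
```

Reading the tables with this pattern in mind, a row such as `-1:token.lower():mr.` says that the weight applies when the previous token, lowercased, is `mr.`, while `token[-2:]:th` refers to the last two characters of the current token.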
