# Lab3 - Assignment Sentiment

Copyright: Vrije Universiteit Amsterdam, Faculty of Humanities, CLTL

This notebook describes the Lab 3 assignment of the Text Mining course. It is about sentiment analysis.

The aims of the assignment are:
* Learn how to run a rule-based sentiment analysis module (VADER)
* Learn how to run a machine learning sentiment analysis module (Scikit-Learn/ Naive Bayes)
* Learn how to run scikit-learn metrics for the quantitative evaluation
* Learn how to perform and interpret a quantitative evaluation of the outcomes of the tools (in terms of Precision, Recall, and F<sub>1</sub>)
* Learn how to evaluate the results qualitatively (by examining the data) 
* Get insight into differences between the two applied methods
* Get insight into the effects of using linguistic preprocessing
* Be able to describe differences between the two methods in terms of their results
* Get insight into issues when applying these methods across different domains

In this assignment, you are going to create your own gold standard set from 50 tweets. You will apply the VADER and scikit-learn classifiers to these tweets and evaluate the results by using evaluation metrics and by inspecting the data.

We recommend you go through the notebooks in the following order:
* **Read the assignment (see below)**
* **Lab3.2-Sentiment-analysis-with-VADER.ipynb**
* **Lab3.3-Sentiment-analysis.with-scikit-learn.ipynb**
* **Answer the questions of the assignment (see below) using the provided notebooks and submit**

In this assignment you are asked to perform both quantitative evaluations and error analyses:
* a quantitative evaluation concerns the scores (Precision, Recall, and F<sub>1</sub>) provided by scikit-learn's classification_report. It includes the scores per category, as well as micro and macro averages. Discuss whether the scores are balanced between the different categories (positive, negative, neutral) and between precision and recall, and discuss the shortcomings (if any) of the classifier based on these scores.
* an error analysis concerns the misclassifications of the classifier. It involves going through the texts and trying to understand what has gone wrong, and it serves to give insight into what could be done to improve the performance of the classifier. Do you observe patterns in the misclassifications? Discuss why these errors are made and propose ways to solve them.
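To make the numbers in the classification report concrete, the sketch below computes per-class precision, recall and F<sub>1</sub>, plus macro and micro averages, from scratch. `per_class_scores` is a hypothetical helper for illustration, not part of scikit-learn. For single-label multi-class data the micro average equals accuracy, which is why recent scikit-learn versions print an "accuracy" row instead of a "micro avg" row.

```python
from collections import Counter

def per_class_scores(gold, predicted):
    """Per-class precision, recall and F1, plus macro and micro averages.

    A from-scratch sketch of the numbers in scikit-learn's classification_report;
    a division by zero yields 0.0.
    """
    labels = sorted(set(gold) | set(predicted))
    true_pos = Counter(g for g, p in zip(gold, predicted) if g == p)
    n_predicted = Counter(predicted)  # true positives + false positives per label
    n_gold = Counter(gold)            # true positives + false negatives per label

    def prf(tp, n_pred, n_true):
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true if n_true else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    scores = {label: prf(true_pos[label], n_predicted[label], n_gold[label]) for label in labels}
    per_class = list(scores.values())
    # Macro average: unweighted mean over classes, so rare classes weigh as much as frequent ones.
    scores['macro avg'] = tuple(sum(s[i] for s in per_class) / len(per_class) for i in range(3))
    # Micro average: pool all decisions; with one label per item this equals accuracy.
    scores['micro avg'] = prf(sum(true_pos.values()), len(predicted), len(gold))
    return scores

gold = ['positive', 'positive', 'negative', 'neutral']
predicted = ['positive', 'negative', 'negative', 'positive']
for label, (p, r, f) in per_class_scores(gold, predicted).items():
    print(f'{label:>10}  P={p:.2f}  R={r:.2f}  F1={f:.2f}')
```

In this toy example the neutral class is never predicted, so its scores are 0.0 and pull the macro average down, while the micro average (0.5) is unaffected by how the errors are spread over the classes.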

## Credits
The notebooks in this block have been originally created by [Marten Postma](https://martenpostma.github.io) and [Isa Maks](https://research.vu.nl/en/persons/e-maks). Adaptations were made by [Filip Ilievski](http://ilievski.nl).

## Part I: VADER assignments


### Preparation (nothing to submit):
To be able to answer the VADER questions you need to know how the tool works. 
* Read more about the VADER tool in [this blog](http://t-redactyl.io/blog/2017/04/using-vader-to-handle-sentiment-analysis-with-social-media-text.html).  
* VADER provides 4 scores (positive, negative, neutral, compound). Be sure to understand what they mean and how they are calculated.
* VADER uses rules to handle linguistic phenomena such as negation and intensification. Be sure to understand which rules are used, how they work, and why they are important.
* VADER makes use of a sentiment lexicon. Have a look at the [lexicon](https://github.com/cjhutto/vaderSentiment/blob/master/vaderSentiment/vader_lexicon.txt). Be sure to understand which information can be found there (lemma? word form? part-of-speech? polarity value? word meaning?) and what all the scores mean.
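Each line of vader_lexicon.txt holds four tab-separated fields: the token, its mean sentiment rating, the standard deviation, and the list of ten raw human ratings (on a scale from -4 to +4). The parsing sketch below runs on a made-up line; the token `exampleword` and its numbers are illustrative only and do not come from the real file, and `parse_lexicon_line` is a hypothetical helper.

```python
import ast
import statistics

def parse_lexicon_line(line):
    """Split one vader_lexicon.txt line into (token, mean, std, raw ratings)."""
    token, mean, std, raw = line.rstrip('\n').split('\t')
    ratings = ast.literal_eval(raw)  # the raw ratings are stored as a Python-style list
    return token, float(mean), float(std), ratings

# made-up example line, not an entry from the real lexicon
example = 'exampleword\t1.5\t0.5\t[1, 2, 2, 1, 2, 1, 2, 1, 2, 1]\n'
token, mean, std, ratings = parse_lexicon_line(example)
print(token, mean, std, statistics.mean(ratings))
```

Note that the four fields contain no lemma, part-of-speech or word-sense information: the lexicon is keyed on surface tokens only.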


### [3.5 points] Question 1:

Consider the following sentences and their output as given by VADER. For sentences 1 to 7, explain the outcome **for each sentence**. Take into account both the rules applied by VADER and the lexicon that is used. You will find that some of the results are reasonable, but others are not. Explain what goes right when a correct result is produced and what goes wrong when an incorrect result is produced.

```
INPUT SENTENCE 1 I love apples
VADER OUTPUT {'neg': 0.0, 'neu': 0.192, 'pos': 0.808, 'compound': 0.6369}

INPUT SENTENCE 2 I don't love apples
VADER OUTPUT {'neg': 0.627, 'neu': 0.373, 'pos': 0.0, 'compound': -0.5216}

INPUT SENTENCE 3 I love apples :-)
VADER OUTPUT {'neg': 0.0, 'neu': 0.133, 'pos': 0.867, 'compound': 0.7579}

INPUT SENTENCE 4 These houses are ruins
VADER OUTPUT {'neg': 0.492, 'neu': 0.508, 'pos': 0.0, 'compound': -0.4404}

INPUT SENTENCE 5 These houses are certainly not considered ruins
VADER OUTPUT {'neg': 0.0, 'neu': 0.51, 'pos': 0.49, 'compound': 0.5867}

INPUT SENTENCE 6 He lies in the chair in the garden
VADER OUTPUT {'neg': 0.286, 'neu': 0.714, 'pos': 0.0, 'compound': -0.4215}

INPUT SENTENCE 7 This house is like any house
VADER OUTPUT {'neg': 0.0, 'neu': 0.667, 'pos': 0.333, 'compound': 0.3612}
```
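The compound scores of sentences 1 to 3 can be reproduced by hand. To our understanding of VADER's implementation, it sums the (rule-adjusted) valences of the words it finds and normalises the sum x with x / sqrt(x² + α), where α = 15; a negation word within the three preceding tokens multiplies a word's valence by -0.74. The sketch assumes lexicon valences of 3.2 for "love" and 1.3 for ":-)" (values that reproduce the outputs above; check them against vader_lexicon.txt) and ignores all other rules as well as the neg/neu/pos proportions.

```python
import math

ALPHA = 15               # normalisation constant for the compound score
NEGATION_SCALAR = -0.74  # applied to a sentiment word preceded by a negation such as "don't"

def normalize(score, alpha=ALPHA):
    """Map a summed valence onto the open interval (-1, 1), as for VADER's 'compound'."""
    return score / math.sqrt(score * score + alpha)

# assumed lexicon valences, consistent with the outputs above
LOVE = 3.2
SMILEY = 1.3  # ':-)'

print(round(normalize(LOVE), 4))                    # 0.6369 (sentence 1)
print(round(normalize(LOVE * NEGATION_SCALAR), 4))  # -0.5216 (sentence 2)
print(round(normalize(LOVE + SMILEY), 4))           # 0.7579 (sentence 3)
```

Because the normalisation is non-linear, adding the smiley raises the compound score by far less than its own valence would suggest: each additional sentiment word pushes the score closer to ±1 with diminishing effect.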

#### Answer:
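VADER uses a lexicon of sentiment-bearing words, each with a valence score, together with rules that modify these scores (for negation, intensification, punctuation, capitalisation, etc.).

Sentence 1: VADER picks up the word 'love', which has a positive lexicon entry, and rates the sentence positive.

Sentence 2: VADER again picks up 'love', but it is preceded by 'don't'. VADER's negation rule multiplies the valence of a sentiment word by a negative factor when a negation word occurs within the three preceding words, so the sentence is rated negative.

Sentence 3: VADER also recognises emoticons: ':-)' has its own positive lexicon entry, which is added to the score of 'love', so the sentence is rated more positive than without the smiley.

Sentence 4: VADER picks up the word 'ruins', which has a negative lexicon entry reflecting the sense of destruction (as in the verb 'to ruin'). In this context, however, 'ruins' merely describes the state of the buildings.

Sentence 5: VADER again picks up 'ruins', but also 'not', which occurs within the three preceding words; the negation rule flips the negative valence of 'ruins' into a positive one. The sentence should rather be neutral, as it is just a statement.

Sentence 6: Similar to sentence 4, VADER picks up the word 'lies', whose negative lexicon entry reflects the sense of telling lies, whereas here it means 'reclines'. The lexicon contains no part-of-speech or word-sense information, so VADER cannot distinguish the two senses and rates the sentence somewhat negative. This sentence should be classified as neutral.

Sentence 7: Similar to sentence 6, VADER picks up the word 'like', which has a positive lexicon entry (as in the verb 'to like'), so it rates the sentence positive. In this context, however, 'like' is used to compare the house to other houses, so the sentence should be neutral.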

### [2.5 points] Exercise 2: Collecting 50 tweets for evaluation
Collect 50 tweets. Try to find tweets that are interesting for sentiment analysis, e.g., very positive, neutral, and negative tweets. These could be your own tweets (typed in) or collected from the Twitter stream.

We will store the tweets in the file **my_tweets.json** (use a text editor to edit).
For each tweet, you should insert:
* sentiment analysis label: negative | neutral | positive (this you determine yourself, this is not done by a computer)
* the text of the tweet
* the Tweet-URL

from:
```
    "1": {
        "sentiment_label": "",
        "text_of_tweet": "",
        "tweet_url": ""
    },
```
to:
```
    "1": {
        "sentiment_label": "positive",
        "text_of_tweet": "All across America people chose to get involved, get engaged and stand up. Each of us can make a difference, and all of us ought to try. So go keep changing the world in 2018.",
        "tweet_url": "https://twitter.com/BarackObama/status/946775615893655552"
    },
```
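Because classification_report compares labels as exact strings, a typo such as "Positive" would silently become a separate class. The sketch below is a hypothetical `check_tweets` helper that verifies each entry follows the template above; the allowed labels are the three listed in this exercise.

```python
ALLOWED_LABELS = {'negative', 'neutral', 'positive'}
REQUIRED_KEYS = {'sentiment_label', 'text_of_tweet', 'tweet_url'}

def check_tweets(tweets):
    """Return a list of problems found in a my_tweets-style dict (empty if none)."""
    problems = []
    for id_, info in tweets.items():
        missing = REQUIRED_KEYS - set(info)
        if missing:
            problems.append(f'{id_}: missing {sorted(missing)}')
        label = info.get('sentiment_label', '')
        if label not in ALLOWED_LABELS:  # labels are case-sensitive
            problems.append(f'{id_}: unexpected label {label!r}')
        if not info.get('text_of_tweet', '').strip():
            problems.append(f'{id_}: empty text')
    return problems

example = {'1': {'sentiment_label': 'Positive', 'text_of_tweet': 'Great flight!', 'tweet_url': ''}}
print(check_tweets(example))
```

Once the file is loaded as shown below, `check_tweets(my_tweets)` should return an empty list.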

You can load your tweets with human annotation in the following way.

```python
import json

with open('my_tweets.json', encoding='utf-8') as infile:
    my_tweets = json.load(infile)

for id_, tweet_info in my_tweets.items():
    print(id_, tweet_info)
    break  # print only the first tweet
```

```
1 {'sentiment_label': 'positive', 'text_of_tweet': 'The Northern Lights, an atmospheric phenomenon rarely seen in the Netherlands, were visible over large parts of the country on Sunday night.', 'tweet_url': 'https://twitter.com/DutchNewsNL/status/1630281109274599425'}
```


### [5 points] Question 3:

Run VADER on your own tweets (see function **run_vader** from notebook **Lab3.2-Sentiment-analysis-with-VADER.ipynb**). You can use the code snippet below this explanation as a starting point.
* [2.5 points] a. Perform a quantitative evaluation. Explain the different scores, and explain which scores are most relevant and why.
* [2.5 points] b. Perform an error analysis: select 10 positive, 10 negative and 10 neutral tweets that are not correctly classified and try to understand why. Refer to the VADER rules and the VADER lexicon. If there are fewer than 10 errors for a category, you only have to check those; for example, if only 5 positive tweets are misclassified, you just describe those.
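For part b, the misclassified tweets can be collected per gold label. The sketch below assumes three parallel lists like `tweets`, `gold` and `all_vader_output` built in the evaluation cell below; `errors_per_class` is a hypothetical helper.

```python
from collections import defaultdict

def errors_per_class(texts, gold, predicted, limit=10):
    """Group misclassified items by gold label, keeping at most `limit` per label."""
    errors = defaultdict(list)
    for text, g, p in zip(texts, gold, predicted):
        if g != p and len(errors[g]) < limit:
            errors[g].append((text, g, p))
    return dict(errors)

example = errors_per_class(['great trip', 'lost my bag'], ['positive', 'negative'], ['neutral', 'negative'])
for label, items in example.items():
    print(label, items)
```

Usage in this notebook would be `errors_per_class(tweets, gold, all_vader_output)`, after which each group can be inspected against the VADER rules and lexicon.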

```python
# copied from Lab3.2
import nltk
import spacy
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from sklearn.metrics import classification_report

nltk.download('vader_lexicon', quiet=True)  # VADER needs its lexicon file
vader_model = SentimentIntensityAnalyzer()
nlp = spacy.load('en_core_web_sm')

def run_vader(textual_unit,
              lemmatize=False,
              parts_of_speech_to_consider=None,
              verbose=1):
    """Run VADER on a text, optionally lemmatizing and/or keeping only some POS tags."""
    doc = nlp(textual_unit)

    input_to_vader = []

    for sent in doc.sents:
        for token in sent:

            to_add = token.text

            if lemmatize:
                to_add = token.lemma_

                # spaCy 2.x lemmatizes pronouns to '-PRON-'; keep the surface form instead
                if to_add == '-PRON-':
                    to_add = token.text

            # keep only the requested POS tags (e.g. {'ADJ'}); keep all tokens if none given
            if not parts_of_speech_to_consider or token.pos_ in parts_of_speech_to_consider:
                input_to_vader.append(to_add)

    scores = vader_model.polarity_scores(' '.join(input_to_vader))

    if verbose >= 1:
        print()
        print('INPUT SENTENCE', textual_unit)  # the whole text, not only its last sentence
        print('INPUT TO VADER', input_to_vader)
        print('VADER OUTPUT', scores)

    return scores
```

```python
def vader_output_to_label(vader_output):
    """
    map vader output e.g.,
    {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.4215}
    to one of the following values:
    a) positive float -> 'positive'
    b) 0.0 -> 'neutral'
    c) negative float -> 'negative'
    
    :param dict vader_output: output dict from vader
    
    :rtype: str
    :return: 'negative' | 'neutral' | 'positive'
    """
    compound = vader_output['compound']
    
    if compound < 0:
        return 'negative'
    elif compound == 0.0:
        return 'neutral'
    elif compound > 0.0:
        return 'positive'
    
assert vader_output_to_label( {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.0}) == 'neutral'
assert vader_output_to_label( {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': 0.01}) == 'positive'
assert vader_output_to_label( {'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound': -0.01}) == 'negative'
```
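vader_output_to_label treats only an exact 0.0 as neutral, so a near-zero score such as -0.0129 (sentence 2 in the error analysis below) counts as negative. The vaderSentiment README suggests a neutral band instead: positive for compound ≥ 0.05, negative for compound ≤ -0.05, neutral in between. A sketch of that variant; the function name and the default margin are our choices for illustration.

```python
def vader_output_to_label_with_margin(vader_output, threshold=0.05):
    """Map VADER output to a label, treating |compound| < threshold as neutral."""
    compound = vader_output['compound']
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'

print(vader_output_to_label_with_margin({'compound': -0.0129}))
```

With this mapping, a tweet whose positive and negative words roughly cancel out is labelled neutral rather than being assigned a polarity by a tiny margin; whether that helps depends on how many neutral tweets are in the gold standard.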

```python
tweets = []
all_vader_output = []
gold = []

# settings (to change for different experiments)
to_lemmatize = True
pos = set()  # e.g. {'ADJ'} keeps only adjectives; an empty set keeps all tokens

for id_, tweet_info in my_tweets.items():
    the_tweet = tweet_info['text_of_tweet']
    vader_output = run_vader(the_tweet, lemmatize=to_lemmatize,
                             parts_of_speech_to_consider=pos)  # run VADER
    vader_label = vader_output_to_label(vader_output)  # convert VADER output to a category

    tweets.append(the_tweet)
    all_vader_output.append(vader_label)
    gold.append(tweet_info['sentiment_label'])
    print('Tweet:', id_, the_tweet)
    print('Vader label:', vader_label)
    print('Gold label:', tweet_info['sentiment_label'])
    print()

# use scikit-learn's classification report
report = classification_report(gold, all_vader_output, digits=2)
print(report)
```


```
INPUT SENTENCE The Northern Lights, an atmospheric phenomenon rarely seen in the Netherlands, were visible over large parts of the country on Sunday night.
INPUT TO VADER ['the', 'Northern', 'Lights', ',', 'an', 'atmospheric', 'phenomenon', 'rarely', 'see', 'in', 'the', 'Netherlands', ',', 'be', 'visible', 'over', 'large', 'part', 'of', 'the', 'country', 'on', 'Sunday', 'night', '.']
VADER OUTPUT {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
Tweet: 1 The Northern Lights, an atmospheric phenomenon rarely seen in the Netherlands, were visible over large parts of the country on Sunday night.
Vader label: neutral
Gold label: positive


INPUT SENTENCE actually cannot breathe from how wholesome this is, not a single bad thing happens in it & I was real scared for a minute there
INPUT TO VADER ['actually', 'can', 'not', 'breathe', 'from', 'how', 'wholesome', 'this', 'be', ',', 'not', 'a', 'single', 'bad', 'thing', 'happen', 'in', 'it', '&', 'I', 'be', 'real', 'scared', 'for', 'a', 'm


INPUT SENTENCE Myspace Tom is a SAVAGE
INPUT TO VADER ['Myspace', 'Tom', 'be', 'a', 'savage']
VADER OUTPUT {'neg': 0.5, 'neu': 0.5, 'pos': 0.0, 'compound': -0.4588}
Tweet: 35 Myspace Tom is a SAVAGE
Vader label: negative
Gold label: positive


INPUT SENTENCE To Secede From Florida
INPUT TO VADER ['Disney', 'World', 'Fortifies', 'Borders', 'with', 'Armed', 'Characters', 'as', 'Park', 'Announces', 'Plan', 'to', 'secede', 'from', 'Florida']
VADER OUTPUT {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
Tweet: 36 Disney World Fortifies Borders With Armed Characters As Park Announces Plan To Secede From Florida
Vader label: neutral
Gold label: negative


INPUT SENTENCE What is carbon capture and how does it fight climate change?
INPUT TO VADER ['what', 'be', 'carbon', 'capture', 'and', 'how', 'do', 'it', 'fight', 'climate', 'change', '?']
VADER OUTPUT {'neg': 0.206, 'neu': 0.794, 'pos': 0.0, 'compound': -0.3818}
Tweet: 37 What is carbon capture and how does it fight climate change?
Vad
```

#### Answer:

b) **We analysed all the incorrectly classified tweets.**

**SENTENCE 1:** The Northern Lights, an atmospheric phenomenon rarely seen in the Netherlands, were visible over large parts of the country on Sunday night.
* ACTUAL(GOLD):  positive
* VADER OUTPUT:  neutral
* {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

None of the words receives a score from the VADER lexicon, so VADER classifies the tweet as neutral. We labelled it positive because we read ‘phenomenon’ (a rare sighting of the Northern Lights) as positive, but VADER has no entry for it.
No difference when not lemmatizing.

**SENTENCE 2:** actually cannot breathe from how wholesome this is, not a single bad thing happens in it & I was real scared for a minute there
* ACTUAL(GOLD):  positive
* VADER OUTPUT:  negative
* {'neg': 0.113, 'neu': 0.777, 'pos': 0.111, 'compound': -0.0129}

VADER picks up the word ‘scared’ as well as ‘not…bad’. Because ‘not’ negates the strongly negative word ‘bad’, the subpart ‘not…bad’ has a significant positive impact on the compound score. This roughly balances the negative words, and the compound score ends up close to zero (-0.0129), which our threshold of 0 still maps to negative.
We checked this by removing that part of the sentence: the compound score then dropped sharply towards negative.
No difference when not lemmatizing.

**SENTENCE 3:** there are like 4 ppl that hacked into my netflix from different countries so I just left them a message to try and organize a movie night with all of them
* ACTUAL(GOLD):  neutral
* VADER OUTPUT:  positive when lemmatized, otherwise negative
* Lemmatize = true {'neg': 0.051, 'neu': 0.863, 'pos': 0.086, 'compound': 0.2551}
* Lemmatize = false {'neg': 0.089, 'neu': 0.828, 'pos': 0.083, 'compound': -0.0516}

With lemmatization, the word “like” is picked up by VADER. In this context “like” is not used in a positive sense but as a preposition (‘like 4 ppl’, i.e. ‘about 4 people’), which VADER does not recognise, so the tweet is rated positive. Without lemmatization, the word “hacked” is picked up as well, which makes the tweet negative.

**SENTENCE 18:** Thousands took to Lisbon’s streets on Saturday to demand better living conditions at a time high inflation is making it even tougher for people to make ends meet. | @Reuters
* ACTUAL(GOLD):  neutral
* VADER OUTPUT:  positive
* Lemmatize {'neg': 0.096, 'neu': 0.836, 'pos': 0.068, 'compound': 0.0258}
* Not lemmatize {'neg': 0.047, 'neu': 0.81, 'pos': 0.143, 'compound': 0.4767}

The gold label is neutral because this is an objective news report. When lemmatized, “better” becomes “well”, which has a lower positive rating than “better”. Additionally, “tougher” becomes “tough”, which has a negative rating, while “tougher” has a positive rating. Thus, VADER gives a higher compound score when the tweet is not lemmatized (“better” and “tougher” are kept), and it rates the tweet positive in both settings.


**SENTENCE 20:** Last night, the northern lights were observed by residents of those countries where it is almost never seen.  An unusual phenomenon was seen by residents of Britain, Denmark, the Netherlands and the United States.Scientists explain everything by a recent solar flare.
* ACTUAL(GOLD):  neutral
* VADER OUTPUT:  positive
* {'neg': 0.0, 'neu': 0.935, 'pos': 0.065, 'compound': 0.4215}

The gold label is neutral because this is an objective news report. VADER picks up the word “United” as positive, although here it is part of the country name “United States”. No difference when not lemmatized.


**SENTENCE 22:** And she can demean the President of the United States at the SOTU speech....two faced POS 🙄
* ACTUAL(GOLD):  negative
* VADER OUTPUT:  positive
* {'neg': 0.0, 'neu': 0.859, 'pos': 0.141, 'compound': 0.4215}

The gold label is negative because of the name-calling “two faced POS 🙄”. VADER picks up “United” as positive, and it does not pick up “POS” or “🙄”, which are the main reasons for the gold label. If the emoji were replaced by the emoticon “:/”, the tweet would be classified as negative. The same happens if “POS” is spelled out as “piece of shit”, since VADER then picks up “shit”. No difference when not lemmatized.

**SENTENCE 27:** So i tested positive for COVID. Continued prayers please.
* ACTUAL(GOLD):  negative
* VADER OUTPUT:  positive
* {'neg': 0.0, 'neu': 0.493, 'pos': 0.507, 'compound': 0.7334}

The gold label is negative because the tweet is about testing positive for a disease. VADER picks up “positive” and “please”, which are both positively rated words, making the overall rating positive. No difference when not lemmatized.


**SENTENCE 31:** After a Marilyn Manson accuser claimed Evan Rachel Wood "manipulated" her, the actress provided a voicemail from the accuser saying she believed the rocker's attorney wanted the accuser to “turn on the other girls and say that it was all a ruse”
* ACTUAL(GOLD):  negative
* VADER OUTPUT:  positive
* Lemmatize {'neg': 0.0, 'neu': 0.968, 'pos': 0.032, 'compound': 0.0772}
* Not Lemmatize {'neg': 0.062, 'neu': 0.938, 'pos': 0.0, 'compound': -0.3818}

The gold label is negative because the tweet involves manipulation and arguments. When lemmatized, “manipulated” becomes “manipulate”, which is not in the lexicon, and “wanted” becomes “want”, which is slightly positive, while “wanted” is not in the lexicon. Without lemmatization, the VADER output is negative, matching the gold label, because VADER picks up “manipulated” and does not pick up “wanted”.


**SENTENCE 35:** Myspace Tom is a SAVAGE
* ACTUAL(GOLD):  positive
* VADER OUTPUT:  negative
* {'neg': 0.5, 'neu': 0.5, 'pos': 0.0, 'compound': -0.4588}

The gold label is positive because in online slang, calling someone a “savage” is a compliment. VADER picks up the word “savage”, which is rated negatively. Lemmatization lowercases “SAVAGE” to “savage”, which makes the compound score less negative than without lemmatization, because VADER increases the intensity of sentiment words written in capitals.

**SENTENCE 36:** Disney World Fortifies Borders With Armed Characters As Park Announces Plan To Secede From Florida
* ACTUAL(GOLD):  negative
* VADER OUTPUT:  neutral
* {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

The gold label is negative because the word “fortifies” has a military connotation and the word “secede” has a connotation of separation. VADER does not pick up any words. No difference when not lemmatized.


**SENTENCE 37:** What is carbon capture and how does it fight climate change?
* ACTUAL(GOLD):  neutral
* VADER OUTPUT:  negative
* {'neg': 0.206, 'neu': 0.794, 'pos': 0.0, 'compound': -0.3818}

The gold label is neutral because it is an informative question. VADER picks up “fight”, which is rated negatively. No difference when not lemmatized.


**SENTENCE 48:** safety is our number one priority
* ACTUAL(GOLD):  neutral
* VADER OUTPUT:  positive
* {'neg': 0.0, 'neu': 0.494, 'pos': 0.506, 'compound': 0.4767}

The gold label is neutral because the tweet is an objective statement. VADER picks up “safety”, which is very positive, and “number”, which is slightly positive, so it rates the tweet positive. No difference when not lemmatized.

**SENTENCE 50:** russian boats need to be removed from the Black Sea.  Give Ukraine the ability to free the Black Sea Corridor.
* ACTUAL(GOLD):  neutral
* VADER OUTPUT:  positive
* {'neg': 0.0, 'neu': 0.763, 'pos': 0.237, 'compound': 0.6808}

The gold label is neutral. VADER picks up “ability” and “free”, which are both rated positively, so it rates the tweet positive. No difference when not lemmatized.

### [4 points] Question 4:
Run VADER on the set of airline tweets with the following settings:

* Run VADER (as it is) on the set of airline tweets 
* Run VADER on the set of airline tweets after having lemmatized the text
* Run VADER on the set of airline tweets with only adjectives
* Run VADER on the set of airline tweets with only adjectives and after having lemmatized the text
* Run VADER on the set of airline tweets with only nouns
* Run VADER on the set of airline tweets with only nouns and after having lemmatized the text
* Run VADER on the set of airline tweets with only verbs
* Run VADER on the set of airline tweets with only verbs and after having lemmatized the text

* [1 point] a. Generate for all separate experiments the classification report, i.e., Precision, Recall, and F<sub>1</sub> scores per category as well as micro and macro averages. **Use a different code cell (or multiple code cells) for each experiment.**
* [3 points] b. Compare the scores and explain what they tell you.
    + Does lemmatisation help? Explain why or why not.
    + Are all parts of speech equally important for sentiment analysis? Explain why or why not.
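The eight runs in Question 4 combine two lemmatisation choices with four part-of-speech filters. The sketch below enumerates them, assuming spaCy's coarse universal tags ('ADJ', 'NOUN', 'VERB'), which run_vader compares against token.pos_; whether proper nouns ('PROPN') or auxiliaries ('AUX') should also count is your choice. `POS_FILTERS` and `experiment_settings` are hypothetical names.

```python
from itertools import product

# spaCy universal POS tags; an empty set means "keep all tokens"
POS_FILTERS = {'all': set(), 'adjectives': {'ADJ'}, 'nouns': {'NOUN'}, 'verbs': {'VERB'}}

def experiment_settings():
    """Yield (name, lemmatize, pos_filter) for the eight Question 4 runs."""
    for (pos_name, pos_filter), lemmatize in product(POS_FILTERS.items(), (False, True)):
        name = pos_name + (' + lemmatized' if lemmatize else '')
        yield name, lemmatize, pos_filter

for name, lemmatize, pos_filter in experiment_settings():
    print(name, lemmatize, pos_filter)
```

Since the assignment asks for one code cell per experiment, each tuple can be copied into its own cell, for example as `run_vader_airline(airline_tweets, lemmatize=True, pos={'ADJ'})`.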

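```python
import pathlib

from sklearn.datasets import load_files
from sklearn.metrics import classification_report

cwd = pathlib.Path.cwd()
airline_tweets_folder = cwd.joinpath('airlinetweets')
# one subfolder per class; decode the files to str instead of returning bytes
airline_tweets = load_files(str(airline_tweets_folder), encoding='utf-8', decode_error='replace')

def run_vader_airline(tweets, lemmatize=False, pos=None):
    """Run VADER on every tweet of a load_files bunch; return (gold_labels, vader_labels)."""
    vader_labels = []
    gold_labels = []

    for the_tweet, label in zip(tweets.data, tweets.target):
        vader_output = run_vader(the_tweet, lemmatize=lemmatize,
                                 parts_of_speech_to_consider=pos or set(),
                                 verbose=0)  # no per-tweet printing
        vader_labels.append(vader_output_to_label(vader_output))
        gold_labels.append(tweets.target_names[label])

    return gold_labels, vader_labels

# example: VADER as it is
gold_labels, vader_labels = run_vader_airline(airline_tweets)
print(classification_report(gold_labels, vader_labels, digits=2))
```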


## Part II: scikit-learn assignments
### [4 points] Question 5
Train the scikit-learn classifier (Naive Bayes) using the airline tweets.

+ Train the model on the airline tweets with 80% training and 20% test set and default settings (TF-IDF representation, min_df=2)
+ Train with different settings:
    + with respect to vectorizing: TF-IDF ('airline_tfidf') vs. Bag of words representation ('airline_count') 
    + with respect to the frequency threshold (min_df). Carry out experiments with increasing values for document frequency (min_df = 2; min_df = 5; min_df = 10)
* [1 point] a. Generate a classification_report for all experiments
* [3 points] b. Look at the results of the experiments with the different settings and try to explain why they differ: 
    + which category performs best, and is this the case for every setting?
    + does the frequency threshold affect the scores? Why or why not, according to you?
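With an integer min_df, scikit-learn's CountVectorizer and TfidfVectorizer ignore terms whose document frequency is strictly lower than the threshold. The stand-alone sketch below mimics that pruning with naive lowercase whitespace tokenisation (scikit-learn's default token_pattern differs), to show how the vocabulary shrinks as min_df grows; `vocabulary_with_min_df` is a hypothetical helper.

```python
from collections import Counter

def vocabulary_with_min_df(documents, min_df):
    """Return the sorted terms that occur in at least `min_df` documents."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc.lower().split()))  # count each term once per document
    return sorted(term for term, df in doc_freq.items() if df >= min_df)

docs = ['flight delayed again', 'great flight', 'delayed bags', 'thanks great crew']
for min_df in (1, 2, 3):
    print(min_df, vocabulary_with_min_df(docs, min_df))
```

A higher threshold removes rare words, which reduces noise but can also remove rare yet strongly opinionated words, so the effect on the scores depends on the data.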

```python
# Your code here
```


### [4 points] Question 6: Inspecting the best scoring features 

+ Train the scikit-learn classifier (Naive Bayes) model with the following settings (airline tweets 80% training and 20% test; Bag of words representation ('airline_count'), min_df=2)
* [1 point] a. Generate the list of best scoring features per class (see function **important_features_per_class** below)
* [3 points] b. Look at the lists and consider the following issues: 
    + [1 point] Which features did you expect for each separate class and why?
    + [1 point] Which features did you not expect and why?
    + [1 point] The list contains all kinds of words such as names of airlines, punctuation, numbers and content words (e.g., 'delay' and 'bad'). Which words would you remove or keep when trying to improve the model and why? 

```python
def important_features_per_class(vectorizer, classifier, n=80):
    """Print the n features with the highest counts (feature_count_) in each class."""
    class_labels = classifier.classes_
    # get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
    feature_names = vectorizer.get_feature_names_out()
    for class_index, class_label in enumerate(class_labels):
        top_n = sorted(zip(classifier.feature_count_[class_index], feature_names),
                       reverse=True)[:n]
        print(f"Important words in {class_label} documents")
        for count, feature in top_n:
            print(class_label, count, feature)
        print("-----------------------------------------")

# example of how to call from notebook:
# important_features_per_class(airline_vec, clf)
```
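important_features_per_class ranks features by raw counts, so frequent words that occur in every class (function words, airline names) tend to top every list. One alternative way to inspect the features, not required by the assignment, is a smoothed log-ratio of a feature's relative frequency inside a class versus in all other classes. The sketch works on plain dictionaries; `log_ratio_ranking` is a hypothetical helper and add-one smoothing is our assumption.

```python
import math
from collections import Counter

def log_ratio_ranking(counts_per_class, n=10, alpha=1.0):
    """Rank features per class by a smoothed log-ratio of in-class vs. out-of-class frequency.

    counts_per_class maps each class label to a {feature: count} dict.
    """
    vocab = set().union(*counts_per_class.values())
    ranking = {}
    for label, counts in counts_per_class.items():
        rest = Counter()
        for other_label, other_counts in counts_per_class.items():
            if other_label != label:
                rest.update(other_counts)
        # add-alpha smoothing so unseen features do not produce log(0)
        in_total = sum(counts.values()) + alpha * len(vocab)
        out_total = sum(rest.values()) + alpha * len(vocab)
        score = {
            feature: math.log((counts.get(feature, 0) + alpha) / in_total)
            - math.log((rest[feature] + alpha) / out_total)
            for feature in vocab
        }
        # highest score first; ties broken alphabetically
        ranking[label] = sorted(vocab, key=lambda f: (-score[f], f))[:n]
    return ranking

toy = {'negative': {'delay': 8, 'the': 20, 'great': 1},
       'positive': {'great': 9, 'the': 20, 'delay': 1}}
print(log_ratio_ranking(toy, n=2))
```

With a fitted model, the input could be built as `{label: dict(zip(airline_vec.get_feature_names_out(), row)) for label, row in zip(clf.classes_, clf.feature_count_)}`. In the toy data, "the" is frequent in both classes and therefore no longer dominates.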

### [Optional! (will not be graded)] Question 7
Train the model on airline tweets and test it on your own set of tweets
+ Train the model with the following settings (airline tweets 80% training and 20% test; Bag of words representation ('airline_count'), min_df=2)
+ Apply the model on your own set of tweets and generate the classification report
* [1 point] a. Carry out a quantitative analysis.
* [1 point] b. Carry out an error analysis on 10 correctly and 10 incorrectly classified tweets and discuss them
* [2 points] c. Compare the results (cf. classification report) with the results obtained by VADER on the same tweets and discuss the differences.

### [Optional! (will not be graded)] Question 8: trying to improve the model
* [2 points] a. Think of some ways to improve the scikit-learn Naive Bayes model by playing with the settings or applying linguistic preprocessing (e.g., by filtering on part-of-speech, or removing punctuation). Do not change the classifier but continue using the Naive Bayes classifier. Explain what the effects of these other settings might be.
* [1 point] b. Apply the model with at least one new setting (train on the airline tweets using 80% training, 20% test) and generate the scores
* [1 point] c. Discuss whether the model achieved what you expected.
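One possible preprocessing step for Question 8 is removing @mentions (assuming the airline names appear as mentions such as "@united" in these tweets), URLs and punctuation before vectorising. A sketch follows; `preprocess_tweet` is a hypothetical helper. Note that stripping punctuation also deletes emoticons such as ":(" that may carry sentiment, so it is worth comparing runs with and without that step.

```python
import re
import string

MENTION = re.compile(r'@\w+')
URL = re.compile(r'https?://\S+')
PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def preprocess_tweet(text, drop_mentions=True, drop_urls=True, drop_punctuation=True):
    """Lowercase a tweet and optionally strip @mentions, URLs and ASCII punctuation."""
    if drop_urls:  # before punctuation removal, otherwise the URL is broken into words
        text = URL.sub(' ', text)
    if drop_mentions:  # before punctuation removal, otherwise '@' disappears first
        text = MENTION.sub(' ', text)
    if drop_punctuation:
        text = text.translate(PUNCT_TABLE)
    return ' '.join(text.lower().split())

print(preprocess_tweet('@united Flight delayed AGAIN!!! http://t.co/xyz :('))
print(preprocess_tweet('@united great crew :)', drop_punctuation=False))
```

The function could be passed to the vectorizer, e.g. `CountVectorizer(preprocessor=preprocess_tweet, min_df=2)`, so the Naive Bayes classifier itself stays unchanged.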

## End of this notebook