# News Headlines Sentiment

Use the News API to pull the latest news articles for Bitcoin and Ethereum, and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest compound score?
3. Which coin had the highest positive score?

In [1]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline



In [2]:
# Read your api key environment variable
load_dotenv()
api_key = os.getenv("NEWS_API_KEY2")
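If the `.env` file or the variable name is wrong, `os.getenv` returns `None` without complaint and the failure only surfaces later, when the client is used. A minimal sketch of an early check, assuming a hypothetical helper named `require_env` (it is not part of `dotenv` or of this notebook):

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set; check your .env file")
    return value


# Hypothetical usage in the notebook, after load_dotenv():
# api_key = require_env("NEWS_API_KEY2")
```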

In [3]:
# Create a newsapi client
from newsapi import NewsApiClient
newsapi = NewsApiClient(api_key=api_key)

In [4]:
# Fetch the Bitcoin news articles
bitcoin_news = newsapi.get_everything(q="bitcoin", language="en")
bitcoin_articles = bitcoin_news["articles"]

In [5]:
# Fetch the Ethereum news articles
ethereum_news = newsapi.get_everything(q="ethereum", language="en")
ethereum_articles = ethereum_news["articles"]
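The cells that follow index each article's `content` field directly. A minimal sketch of a more defensive extraction, assuming that `content` can be missing or `None` for some articles (an assumption worth verifying against the actual response). `extract_content` is a hypothetical helper, not part of the `newsapi` client, and the sample articles are made up:

```python
def extract_content(articles):
    """Return the non-empty "content" strings from a list of article dicts."""
    texts = []
    for article in articles:
        content = article.get("content")
        # Skip articles whose content is missing, None, or blank
        if content and content.strip():
            texts.append(content)
    return texts


# Made-up articles for illustration only
sample_articles = [
    {"title": "A", "content": "Bitcoin rallies."},
    {"title": "B", "content": None},
    {"title": "C"},
]
print(extract_content(sample_articles))
```

Dropping such articles before scoring keeps `analyzer.polarity_scores` from receiving a non-string value.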

In [6]:
# Create the Bitcoin sentiment scores DataFrame
bitcoin_content = [article["content"] for article in bitcoin_articles]

bitcoin_content_df = pd.DataFrame({
    "text": bitcoin_content
})


# Create dictionary to hold sentiment scores
bitcoin_sentiment = {
    "Compound": [],
    "Negative": [],
    "Neutral": [],
    "Positive": []
}

# Perform sentiment analysis on Bitcoin articles
for text in bitcoin_content_df["text"]:
    bitcoin_analysis = analyzer.polarity_scores(text)
    bitcoin_sentiment["Compound"].append(bitcoin_analysis["compound"])
    bitcoin_sentiment["Negative"].append(bitcoin_analysis["neg"])
    bitcoin_sentiment["Neutral"].append(bitcoin_analysis["neu"])
    bitcoin_sentiment["Positive"].append(bitcoin_analysis["pos"])

# Join the sentiment scores with the article text
bitcoin_analysis_df = pd.DataFrame.from_dict(bitcoin_sentiment)
bitcoin_df = pd.concat([bitcoin_analysis_df, bitcoin_content_df], axis=1)
bitcoin_df.head()


| | Compound | Negative | Neutral | Positive | text |
|---|---|---|---|---|---|
| 0 | 0.0772 | 0.0 | 0.961 | 0.039 | Whether youre looking to make a larger investm... |
| 1 | 0.5859 | 0.0 | 0.873 | 0.127 | As it promised earlier this year, Tesla now ac... |
| 2 | 0.3182 | 0.0 | 0.935 | 0.065 | The inevitable has happened: You can now purch... |
| 3 | 0.2023 | 0.0 | 0.95 | 0.05 | Tesla made headlines earlier this year when it... |
| 4 | 0.6075 | 0.102 | 0.719 | 0.178 | National Burrito Day lands on April Fools Day ... |


In [7]:
# Create the Ethereum sentiment scores DataFrame
ethereum_content = [article["content"] for article in ethereum_articles]

ethereum_content_df = pd.DataFrame({
    "text": ethereum_content
})

# Create dictionary to hold sentiment scores
ethereum_sentiment = {
    "Compound": [],
    "Negative": [],
    "Neutral": [],
    "Positive": []
}

# Perform sentiment analysis on Ethereum articles
for text in ethereum_content_df["text"]:
    ethereum_analysis = analyzer.polarity_scores(text)
    ethereum_sentiment["Compound"].append(ethereum_analysis["compound"])
    ethereum_sentiment["Negative"].append(ethereum_analysis["neg"])
    ethereum_sentiment["Neutral"].append(ethereum_analysis["neu"])
    ethereum_sentiment["Positive"].append(ethereum_analysis["pos"])

# Join the sentiment scores with the article text
ethereum_analysis_df = pd.DataFrame(ethereum_sentiment)
ethereum_df = pd.concat([ethereum_analysis_df, ethereum_content_df], axis=1)
ethereum_df.head()

| | Compound | Negative | Neutral | Positive | text |
|---|---|---|---|---|---|
| 0 | -0.5574 | 0.11 | 0.89 | 0.0 | One of the strictest crackdowns worldwide\r\nP... |
| 1 | 0.0772 | 0.0 | 0.961 | 0.039 | Whether youre looking to make a larger investm... |
| 2 | 0.0 | 0.0 | 1.0 | 0.0 | Famed auction house Christies just sold its fi... |
| 3 | 0.0 | 0.0 | 1.0 | 0.0 | Payment card network Visa has announced that t... |
| 4 | 0.565 | 0.093 | 0.735 | 0.172 | The NFT craze has been an intriguing moment fo... |
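Cells 6 and 7 repeat the same scoring loop for each coin. One way to factor it out is sketched below, assuming a hypothetical helper `score_texts` that accepts any scoring callable shaped like `analyzer.polarity_scores` (a function returning a dict with `compound`, `neg`, `neu`, and `pos` keys). The stub scorer is made up so that the example runs without NLTK:

```python
def score_texts(texts, scorer):
    """Score each text and return one row dict per text, using the notebook's column names."""
    rows = []
    for text in texts:
        scores = scorer(text)
        rows.append({
            "Compound": scores["compound"],
            "Negative": scores["neg"],
            "Neutral": scores["neu"],
            "Positive": scores["pos"],
            "text": text,
        })
    return rows


def stub_scorer(text):
    """Made-up stand-in for analyzer.polarity_scores, for illustration only."""
    return {"compound": 0.0, "neg": 0.0, "neu": 1.0, "pos": 0.0}


rows = score_texts(["first article", "second article"], stub_scorer)
```

In the notebook, `pd.DataFrame(score_texts(bitcoin_content, analyzer.polarity_scores))` should produce the same columns as `bitcoin_df`, and the same call with `ethereum_content` would cover the second coin.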


In [8]:
# Describe the Bitcoin Sentiment
bitcoin_df.describe()

| | Compound | Negative | Neutral | Positive |
|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | 0.20819 | 0.02825 | 0.89805 | 0.0737 |
| std | 0.366353 | 0.047327 | 0.082335 | 0.067464 |
| min | -0.5574 | 0.0 | 0.709 | 0.0 |
| 25% | 0.0 | 0.0 | 0.86325 | 0.02925 |
| 50% | 0.3182 | 0.0 | 0.925 | 0.066 |
| 75% | 0.36635 | 0.0545 | 0.94025 | 0.0855 |
| max | 0.7717 | 0.142 | 1.0 | 0.24 |


In [9]:
# Describe the Ethereum Sentiment
ethereum_df.describe()

| | Compound | Negative | Neutral | Positive |
|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | 0.020075 | 0.03985 | 0.91755 | 0.04265 |
| std | 0.321552 | 0.048404 | 0.077935 | 0.050402 |
| min | -0.5574 | 0.0 | 0.735 | 0.0 |
| 25% | -0.25 | 0.0 | 0.9095 | 0.0 |
| 50% | 0.0 | 0.0 | 0.934 | 0.0195 |
| 75% | 0.307 | 0.08075 | 0.97075 | 0.07775 |
| max | 0.565 | 0.145 | 1.0 | 0.172 |


### Questions:

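Q: Which coin had the highest mean positive score?

A: Bitcoin (mean positive score of 0.0737, versus 0.04265 for Ethereum)

Q: Which coin had the highest compound score?

A: Bitcoin (maximum compound score of 0.7717, versus 0.565 for Ethereum)

Q: Which coin had the highest positive score?

A: Bitcoin (maximum positive score of 0.24, versus 0.172 for Ethereum)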
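The answers can also be checked programmatically. A small sketch that hard-codes the `mean` and `max` values from the two `describe()` outputs above; the `summary` dictionary and the `leader` function are hypothetical names introduced for illustration:

```python
# Values copied from the describe() outputs for each coin
summary = {
    "Bitcoin": {"mean_positive": 0.0737, "max_compound": 0.7717, "max_positive": 0.24},
    "Ethereum": {"mean_positive": 0.04265, "max_compound": 0.565, "max_positive": 0.172},
}


def leader(stat):
    """Return the coin with the largest value for the given statistic."""
    return max(summary, key=lambda coin: summary[coin][stat])


for stat in ("mean_positive", "max_compound", "max_positive"):
    print(stat, "->", leader(stat))
```

With the DataFrames still in memory, the same comparison can be made directly, for example by comparing `bitcoin_df["Positive"].mean()` with `ethereum_df["Positive"].mean()`.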

---

# Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word
2. Remove Punctuation
3. Remove Stopwords

In [10]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [53]:
# Expand the default stopwords list if necessary
sw_addon = {'youre', 'whether', 'said', 'also', 'like', 'seen', 'chars', 'arent', 'according'}

In [54]:
# Complete the tokenizer function
def tokenizer(text):
    """Return the lowercased, lemmatized tokens of text, minus punctuation and stopwords."""
    # Combine the default English stopwords with the custom additions once
    sw = set(stopwords.words('english')).union(sw_addon)
    lemmatizer = WordNetLemmatizer()
    regex = re.compile("[^a-zA-Z ]")
    # Remove punctuation, digits, and line breaks (characters are deleted, not replaced by spaces)
    content_clean = regex.sub('', text)
    # Tokenize words
    words = word_tokenize(content_clean)
    # Convert the words to lowercase and remove stop words
    lower_tokenized = [word.lower() for word in words if word.lower() not in sw]
    # Debug output: show the tokens before lemmatization
    print(lower_tokenized)
    # Lemmatize words into root words
    tokens = [lemmatizer.lemmatize(word) for word in lower_tokenized]
    return tokens
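Because the regular expression deletes every non-letter character, words separated only by a line break or a slash end up fused (the printed tokens further down include examples such as 'worldwidephoto' and 'bitcointhis'). A minimal sketch of one alternative, which replaces those characters with a space before splitting. It uses only the standard library: `str.split` stands in for `word_tokenize`, a tiny made-up stopword set stands in for the NLTK list, lemmatization is omitted, and the sample text is invented for illustration:

```python
import re

# Tiny stand-in for the NLTK stopword list (assumption, for illustration)
STOPWORDS = {"of", "the", "by", "is", "one"}
NON_LETTERS = re.compile(r"[^a-zA-Z]+")


def simple_tokens(text):
    """Lowercase, replace runs of non-letters with a space, split, and drop stopwords."""
    cleaned = NON_LETTERS.sub(" ", text).lower()
    return [word for word in cleaned.split() if word not in STOPWORDS]


# Made-up sample with a line break and a slash between words
sample = "Sweeping ban worldwide\r\nPhoto by The Verge/India"
print(simple_tokens(sample))
```

The trade-off is that contractions and hyphenated words are also split at the replaced character, so which behavior is preferable depends on the text.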

In [55]:
# Create a new tokens column for bitcoin
bitcoin_tokens = [tokenizer(article) for article in bitcoin_content]
bitcoin_df["tokens"] = bitcoin_tokens

['looking', 'make', 'larger', 'investment', 'want', 'dabble', 'cryptocurrencies', 'purchase', 'bitcoin', 'ethereum', 'bitcoin', 'cash', 'litecoin', 'paypal', 'soon', 'youll', 'ab']
['promised', 'earlier', 'year', 'tesla', 'accepts', 'payment', 'bitcoin', 'teslas', 'website', 'tweet', 'ceo', 'elon', 'musk', 'subsequent', 'tweet', 'musk', 'bitcoin', 'paid', 'tesla']
['inevitable', 'happened', 'purchase', 'tesla', 'vehicle', 'bitcointhis', 'tesla', 'ceo', 'pardon', 'technoking', 'elon', 'musk', 'tweeted', 'wednesdayyou', 'buy', 'tesla']
['tesla', 'made', 'headlines', 'earlier', 'year', 'took', 'significant', 'holdings', 'bitcoin', 'acquiring', 'roughly', 'billion', 'stake', 'thenprices', 'early', 'february', 'time', 'noted', 'sec']
['national', 'burrito', 'day', 'lands', 'april', 'fools', 'day', 'year', 'thankfully', 'restaurants', 'playing', 'around', 'deals', 'starting', 'tomorrow', 'restaurants', 'pollo', 'loco', 'moes', 'southwest', 'grill']
['one', 'strictest', 'crackdowns', 'worldwi

In [56]:
# Create a new tokens column for ethereum
ethereum_tokens = [tokenizer(article) for article in ethereum_content]
ethereum_df["tokens"] = ethereum_tokens

['one', 'strictest', 'crackdowns', 'worldwidephoto', 'michele', 'doying', 'vergeindia', 'reportedly', 'moving', 'forward', 'sweeping', 'ban', 'cryptocurrencies', 'reuters', 'countrys', 'legislat']
['looking', 'make', 'larger', 'investment', 'want', 'dabble', 'cryptocurrencies', 'purchase', 'bitcoin', 'ethereum', 'bitcoin', 'cash', 'litecoin', 'paypal', 'soon', 'youll', 'ab']
['famed', 'auction', 'house', 'christies', 'sold', 'first', 'purely', 'digital', 'piece', 'art', 'whopping', 'million', 'price', 'buyer', 'got', 'digital', 'file', 'collage', 'images', 'complex', 'legac']
['payment', 'card', 'network', 'visa', 'announced', 'transactions', 'settled', 'using', 'usd', 'coin', 'usdc', 'stablecoin', 'powered', 'ethereum', 'blockchain', 'cryptocom', 'first', 'company', 'test', 'new', 'capabi']
['nft', 'craze', 'intriguing', 'moment', 'digital', 'artists', 'great', 'leaps', 'tech', 'allowed', 'create', 'work', 'much', 'progress', 'shifting', 'profit']
['move', 'fast', 'break', 'things', '

---

# NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequencies for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 

In [57]:
from collections import Counter
from nltk import ngrams

In [58]:
# Generate the Bitcoin N-grams where N=2
# Flatten the per-article token lists into a single list of words
bitcoin_words = []
for tokens in bitcoin_tokens:
    bitcoin_words.extend(tokens)
bitcoin_ngrams = Counter(ngrams(bitcoin_words, n=2))
dict(bitcoin_ngrams)

{('looking', 'make'): 1,
 ('make', 'larger'): 1,
 ('larger', 'investment'): 1,
 ('investment', 'want'): 1,
 ('want', 'dabble'): 1,
 ('dabble', 'cryptocurrencies'): 1,
 ('cryptocurrencies', 'purchase'): 1,
 ('purchase', 'bitcoin'): 1,
 ('bitcoin', 'ethereum'): 1,
 ('ethereum', 'bitcoin'): 1,
 ('bitcoin', 'cash'): 1,
 ('cash', 'litecoin'): 1,
 ('litecoin', 'paypal'): 1,
 ('paypal', 'soon'): 1,
 ('soon', 'youll'): 1,
 ('youll', 'ab'): 1,
 ('ab', 'promised'): 1,
 ('promised', 'earlier'): 1,
 ('earlier', 'year'): 2,
 ('year', 'tesla'): 1,
 ('tesla', 'accepts'): 1,
 ('accepts', 'payment'): 1,
 ('payment', 'bitcoin'): 1,
 ('bitcoin', 'tesla'): 1,
 ('tesla', 'website'): 1,
 ('website', 'tweet'): 1,
 ('tweet', 'ceo'): 1,
 ('ceo', 'elon'): 1,
 ('elon', 'musk'): 2,
 ('musk', 'subsequent'): 1,
 ('subsequent', 'tweet'): 1,
 ('tweet', 'musk'): 1,
 ('musk', 'bitcoin'): 1,
 ('bitcoin', 'paid'): 1,
 ('paid', 'tesla'): 1,
 ('tesla', 'inevitable'): 1,
 ('inevitable', 'happened'): 1,
 ('happened', 'purcha

In [59]:
# Generate the Ethereum N-grams where N=2
# Flatten the per-article token lists into a single list of words
ethereum_words = []
for tokens in ethereum_tokens:
    ethereum_words.extend(tokens)
ethereum_ngrams = Counter(ngrams(ethereum_words, n=2))
dict(ethereum_ngrams)

{('one', 'strictest'): 1,
 ('strictest', 'crackdown'): 1,
 ('crackdown', 'worldwidephoto'): 1,
 ('worldwidephoto', 'michele'): 1,
 ('michele', 'doying'): 1,
 ('doying', 'vergeindia'): 1,
 ('vergeindia', 'reportedly'): 1,
 ('reportedly', 'moving'): 1,
 ('moving', 'forward'): 1,
 ('forward', 'sweeping'): 1,
 ('sweeping', 'ban'): 1,
 ('ban', 'cryptocurrencies'): 1,
 ('cryptocurrencies', 'reuters'): 1,
 ('reuters', 'country'): 1,
 ('country', 'legislat'): 1,
 ('legislat', 'looking'): 1,
 ('looking', 'make'): 1,
 ('make', 'larger'): 1,
 ('larger', 'investment'): 1,
 ('investment', 'want'): 1,
 ('want', 'dabble'): 1,
 ('dabble', 'cryptocurrencies'): 1,
 ('cryptocurrencies', 'purchase'): 1,
 ('purchase', 'bitcoin'): 1,
 ('bitcoin', 'ethereum'): 1,
 ('ethereum', 'bitcoin'): 1,
 ('bitcoin', 'cash'): 1,
 ('cash', 'litecoin'): 1,
 ('litecoin', 'paypal'): 1,
 ('paypal', 'soon'): 1,
 ('soon', 'youll'): 1,
 ('youll', 'ab'): 1,
 ('ab', 'famed'): 1,
 ('famed', 'auction'): 1,
 ('auction', 'house'): 1,
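Since the token lists are flattened before `ngrams` is called, some bigrams above pair the last word of one article with the first word of the next, such as ('ab', 'promised'). A sketch of counting bigrams per article instead, assuming a hypothetical helper `bigram_counts`. Here `zip` stands in for `nltk.ngrams` so that the example runs with the standard library only, and the two short token lists are abbreviated from the printed tokens:

```python
from collections import Counter


def bigram_counts(token_lists):
    """Count adjacent word pairs within each token list, never across lists."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Abbreviated token lists for two articles
articles = [
    ["paypal", "soon", "youll", "ab"],
    ["promised", "earlier", "year", "tesla"],
]
counts = bigram_counts(articles)
print(counts.most_common(3))
print(("ab", "promised") in counts)
```

Calling `most_common(10)` on the resulting counter also gives a more readable summary than dumping the full dictionary.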


In [60]:
# Define the token_count function used to generate the top 10 words for each coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [61]:
# Get the top 10 words for Bitcoin
bitcoin_top_words = token_count(bitcoin_words, N=10)
bitcoin_top_words

[('bitcoin', 14),
 ('tesla', 7),
 ('year', 5),
 ('cryptocurrencies', 4),
 ('reuters', 4),
 ('new', 4),
 ('paypal', 3),
 ('earlier', 3),
 ('musk', 3),
 ('crypto', 3)]

In [63]:
# Get the top 10 words for Ethereum
ethereum_top_words = token_count(ethereum_words, N=10)
ethereum_top_words

[('cryptocurrency', 5),
 ('reuters', 4),
 ('ethereum', 4),
 ('new', 4),
 ('one', 3),
 ('cryptocurrencies', 3),
 ('country', 3),
 ('bitcoin', 3),
 ('digital', 3),
 ('million', 3)]

# Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news.

In [64]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [69]:
# Generate the Bitcoin word cloud
wc = WordCloud()
bitcoin_string = ' '.join(bitcoin_words)
bitcoin_wordcloud = wc.generate(bitcoin_string)
plt.imshow(bitcoin_wordcloud)
plt.axis("off")

In [23]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!
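A possible way to fill in this cell, mirroring the Bitcoin cell above. This is a sketch rather than the author's solution: `cloud_text` and `show_wordcloud` are hypothetical helper names, and the `width`, `height`, and `background_color` settings are arbitrary choices. The plotting imports sit inside the function so that the text helper can be used on its own:

```python
def cloud_text(token_lists):
    """Flatten per-article token lists into one space-separated string."""
    return " ".join(token for tokens in token_lists for token in tokens)


def show_wordcloud(text, title):
    """Render a word cloud for `text` with the given plot title."""
    # Imported here so cloud_text works without the plotting stack installed
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt

    cloud = WordCloud(width=800, height=400, background_color="white").generate(text)
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()


# Hypothetical usage in the notebook:
# show_wordcloud(cloud_text(ethereum_tokens), "Ethereum Word Cloud")
```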

# Named Entity Recognition

In this section, you will run named entity recognition on the text for both coins and visualize the tags using spaCy.

In [24]:
import spacy
from spacy import displacy

In [25]:
# Optional - download a language model for spaCy
# !python -m spacy download en_core_web_sm

In [26]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

## Bitcoin NER

In [27]:
# Concatenate all of the bitcoin text together
# YOUR CODE HERE!

In [28]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [29]:
# Render the visualization
# YOUR CODE HERE!

In [30]:
# List all Entities
# YOUR CODE HERE!

---

## Ethereum NER

In [31]:
# Concatenate all of the ethereum text together
# YOUR CODE HERE!

In [32]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [33]:
# Render the visualization
# YOUR CODE HERE!

In [34]:
# List all Entities
# YOUR CODE HERE!
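The placeholder cells in both NER sections follow the same four steps: concatenate the text, run the pipeline, add a title, then render and list the entities. One way they might be completed is sketched below. This is an assumption-laden outline, not the author's solution: `ner_report` is a hypothetical helper, and it takes the loaded pipeline (`nlp` from the cell above) as an argument instead of loading a model itself.

```python
def ner_report(nlp, texts, title):
    """Run an NER pipeline over the joined texts and return (doc, entities)."""
    # Concatenate all of the text together, skipping missing entries
    full_text = " ".join(text for text in texts if text)
    # Run the NER processor on all of the text
    doc = nlp(full_text)
    # Add a title to the document; displaCy reads it from user_data
    doc.user_data["title"] = title
    # List all entities as (text, label) pairs
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return doc, entities


# Hypothetical usage in the notebook:
# doc, entities = ner_report(nlp, bitcoin_content, "Bitcoin NER")
# displacy.render(doc, style="ent")
# print(entities)
```

The same call with `ethereum_content` and a different title would cover the Ethereum section.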