# News Headlines Sentiment

Use the news api to pull the latest news articles for bitcoin and ethereum and create a DataFrame of sentiment scores for each coin. 

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?

In [6]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
from newsapi import NewsApiClient
load_dotenv()
%matplotlib inline


Bad key "text.kerning_factor" on line 4 in
C:\Users\annap\Anaconda3\envs\pyvizenv\lib\site-packages\matplotlib\mpl-data\stylelib\_classic_test_patch.mplstyle.
You probably need to get an updated matplotlibrc file from
http://github.com/matplotlib/matplotlib/blob/master/matplotlibrc.template
or from the matplotlib source distribution


In [7]:
# Read your api key environment variable
api_key = os.getenv("NEWS_API_KEY")


In [8]:
# Create a newsapi client
newsapi = NewsApiClient(api_key=api_key)


In [9]:
# Fetch the Bitcoin news articles
bitcoin_news = newsapi.get_everything(
    q="bitcoin",
    language="en",
    page_size=100,
    sort_by="relevancy"
)

bitcoin_news["articles"][10]

{'source': {'id': 'mashable', 'name': 'Mashable'},
 'author': 'Stan Schroeder',
 'title': 'Maisie Williams asks about Bitcoin. Memefest ensues, and Elon Musk joins the party.',
 'description': 'Maisie Williams – also known as Arya Stark, daughter of Lady Catelyn and Lord Eddard, acolyte of the Faceless Men, you know the one – is considering buying some bitcoins.\xa0\nOn Monday, she asked her Twitter followers whether she should "go long on bitcoin" (goi…',
 'url': 'https://mashable.com/article/maisie-williams-elon-musk-bitcoin/',
 'urlToImage': 'https://mondrian.mashable.com/2020%252F11%252F17%252F34%252F1c2c71334a1d4432989b9b2086e24340.ca47d.jpg%252F1200x630.jpg?signature=HiGyzLYrPuetz6-sUkoLXncJHNY=',
 'publishedAt': '2020-11-17T08:26:55Z',
 'content': 'Maisie Williams also known as Arya Stark, daughter of Lady Catelyn and Lord Eddard, acolyte of the Faceless Men, you know the one is considering buying some bitcoins.\xa0\r\nOn Monday, she asked her Twitt… [+1512 chars]'}

In [10]:
# Fetch the Ethereum news articles
ethereum_news = newsapi.get_everything(
    q="ethereum",
    language="en",
    page_size=100,
    sort_by="relevancy"
)

ethereum_news["articles"][10]

{'source': {'id': 'techcrunch', 'name': 'TechCrunch'},
 'author': 'Mike Butcher',
 'title': 'As Crypto comes back, Binance-backed Injective Protocol launches Testnet for its DeFi trading platform',
 'description': 'Decentralized exchange protocols that allow crypto traders and investors to trade across different blockсhains have been in development for a while. A significant new development now comes with the launch of the ‘Testnet’ from Injective Protocol. Injective ha…',
 'url': 'http://techcrunch.com/2020/12/03/as-crypto-comes-back-binance-backed-injective-protocol-launches-testnet-for-its-defi-trading-platform/',
 'urlToImage': 'https://techcrunch.com/wp-content/uploads/2020/12/2560px-Decentralization_diagram.svg_.png?w=711',
 'publishedAt': '2020-12-03T18:18:48Z',
 'content': 'Decentralized exchange protocols that allow crypto traders and investors to trade across different blockhains have been in development for a while. A significant new development now comes with the la… [+3079

In [11]:
# Create the Bitcoin sentiment scores DataFrame
bitcoin_sentiments = []

for article in bitcoin_news["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        bitcoin_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
            
        })
        
    except AttributeError:
        pass
    
bitcoin_df = pd.DataFrame(bitcoin_sentiments)

cols = ["text", "compound", "positive", "negative", "neutral"]
bitcoin_df = bitcoin_df[cols]

bitcoin_df.head()

Unnamed: 0,text,compound,positive,negative,neutral
0,A former Microsoft software engineer from Ukra...,-0.6705,0.064,0.199,0.737
1,Visa has partnered with cryptocurrency startup...,0.6369,0.162,0.0,0.838
2,PayPal is bringing its newly-announced support...,0.2144,0.053,0.0,0.947
3,"In November 2017, after an absolutely massive,...",0.2023,0.05,0.0,0.95
4,"Unlike ‘conventional’ cryptocurrencies, a cent...",0.0,0.0,0.0,1.0


In [12]:
# Create the ethereum sentiment scores DataFrame
ethereum_sentiments = []

for article in ethereum_news["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        ethereum_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
            
        })
        
    except AttributeError:
        pass
    
ethereum_df = pd.DataFrame(ethereum_sentiments)

cols = ["text", "compound", "positive", "negative", "neutral"]
ethereum_df = ethereum_df[cols]

ethereum_df.head()

Unnamed: 0,text,compound,positive,negative,neutral
0,PayPal is bringing its newly-announced support...,0.2144,0.053,0.0,0.947
1,FILE PHOTO: Representation of the Ethereum vir...,0.0,0.0,0.0,1.0
2,FILE PHOTO: Representation of the Ethereum vir...,0.0,0.0,0.0,1.0
3,LONDON (Reuters) - Digital currencies Ethereum...,0.4215,0.088,0.0,0.912
4,NEW YORK (Reuters) - Institutional investors p...,0.1779,0.052,0.0,0.948


In [13]:
# Describe the Bitcoin Sentiment
bitcoin_df.describe()

Unnamed: 0,compound,positive,negative,neutral
count,98.0,98.0,98.0,98.0
mean,0.151223,0.05699,0.020847,0.922173
std,0.338619,0.065976,0.053264,0.083086
min,-0.9468,0.0,0.0,0.637
25%,0.0,0.0,0.0,0.858
50%,0.0,0.05,0.0,0.948
75%,0.4166,0.12,0.0,1.0
max,0.8779,0.318,0.363,1.0


In [14]:
# Describe the Ethereum Sentiment
ethereum_df.describe()

Unnamed: 0,compound,positive,negative,neutral
count,97.0,97.0,97.0,97.0
mean,0.223294,0.075629,0.021722,0.90266
std,0.360931,0.077265,0.045489,0.087585
min,-0.6705,0.0,0.0,0.653
25%,0.0,0.0,0.0,0.849
50%,0.2144,0.074,0.0,0.912
75%,0.5106,0.132,0.0,1.0
max,0.8834,0.347,0.196,1.0


### Questions:

Q: Which coin had the highest mean positive score?

A: 

Q: Which coin had the highest compound score?

A: 

Q. Which coin had the highest positive score?

A: 

---

# Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word
2. Remove Punctuation
3. Remove Stopwords

In [15]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re



In [16]:
lemmatizer = WordNetLemmatizer()

In [17]:
# Expand the default stopwords list if necessary
sw_addon = {'yes', 'no'}

In [60]:
# Complete the tokenizer function
def tokenizer(text):
    sw = set(stopwords.words('english'))
    regex = re.compile("[^a-zA-Z]")
    re_clean = regex.sub('', text)
    words = word_tokenize(re_clean)
    lemmatizer = WordNetLemmatizer()
    lem = [lemmatizer.lemmatize(word) for word in words]
    tokens = [word.lower() for word in lem if word.lower() not in sw.union(sw_addon)]
    
    return tokens

In [61]:
# Create a new tokens column for bitcoin
bitcoin_df["tokens"] = bitcoin_df.apply(lambda x:
                                       tokenizer(x.text), axis=1
                                       )
bitcoin_df.head()

Unnamed: 0,text,compound,positive,negative,neutral,tokens
0,A former Microsoft software engineer from Ukra...,-0.6705,0.064,0.199,0.737,[aformermicrosoftsoftwareengineerfromukraineha...
1,Visa has partnered with cryptocurrency startup...,0.6369,0.162,0.0,0.838,[visahaspartneredwithcryptocurrencystartupbloc...
2,PayPal is bringing its newly-announced support...,0.2144,0.053,0.0,0.947,[paypalisbringingitsnewlyannouncedsupportforcr...
3,"In November 2017, after an absolutely massive,...",0.2023,0.05,0.0,0.95,[innovemberafteranabsolutelymassivetwomonthral...
4,"Unlike ‘conventional’ cryptocurrencies, a cent...",0.0,0.0,0.0,1.0,[unlikeconventionalcryptocurrenciesacentralban...


In [74]:
# Create a new tokens column for ethereum
ethereum_df["tokens"] = bitcoin_df.apply(lambda x:
                                       tokenizer(x.text), axis=1
                                       )
ethereum_df.head()

Unnamed: 0,text,compound,positive,negative,neutral,tokens
0,PayPal is bringing its newly-announced support...,0.2144,0.053,0.0,0.947,[aformermicrosoftsoftwareengineerfromukraineha...
1,FILE PHOTO: Representation of the Ethereum vir...,0.0,0.0,0.0,1.0,[visahaspartneredwithcryptocurrencystartupbloc...
2,FILE PHOTO: Representation of the Ethereum vir...,0.0,0.0,0.0,1.0,[paypalisbringingitsnewlyannouncedsupportforcr...
3,LONDON (Reuters) - Digital currencies Ethereum...,0.4215,0.088,0.0,0.912,[innovemberafteranabsolutelymassivetwomonthral...
4,NEW YORK (Reuters) - Institutional investors p...,0.1779,0.052,0.0,0.948,[unlikeconventionalcryptocurrenciesacentralban...


---

# NGrams and Frequency Analysis

In this section you will look at the ngrams and word frequency for each coin. 

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 

In [20]:
from collections import Counter
from nltk import ngrams

In [33]:
from nltk.corpus import reuters, stopwords
from nltk.util import ngrams
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
import re

# Code to download corpora
import nltk
nltk.download('reuters')
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('wordnet')

[nltk_data] Downloading package reuters to
[nltk_data]     C:\Users\annap\AppData\Roaming\nltk_data...
[nltk_data]   Package reuters is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\annap\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\annap\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\annap\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [75]:
eth_article = ethereum_news["articles"][0]['content']
print(eth_article)

PayPal is bringing its newly-announced support for cryptocurrency to all US accounts. It first announced plans to open cryptocurrency trading to US-based users in October, but until now it was only a… [+589 chars]


In [65]:
btc_article = bitcoin_news["articles"][0]['content']
print(btc_article)

A former Microsoft software engineer from Ukraine has been sentenced to nine years in prison for stealing more than $10 million in store credit from Microsoft's online store. From 2016 to 2018, Volod… [+3307 chars]


In [66]:
def process_text(doc):
    sw = set(stopwords.words('english'))
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', doc)
    words = word_tokenize(re_clean)
    lem = [lemmatizer.lemmatize(word) for word in words]
    output = [word.lower() for word in lem if word.lower() not in sw]
    return output

In [76]:
processed_eth = process_text(eth_article)
print(processed_eth)

['paypal', 'bringing', 'newlyannounced', 'support', 'cryptocurrency', 'us', 'account', 'first', 'announced', 'plan', 'open', 'cryptocurrency', 'trading', 'usbased', 'user', 'october', 'wa', 'char']


In [73]:
processed_btc = process_text(btc_article)
print(processed_btc)

['former', 'microsoft', 'software', 'engineer', 'ukraine', 'ha', 'sentenced', 'nine', 'year', 'prison', 'stealing', 'million', 'store', 'credit', 'microsofts', 'online', 'store', 'volod', 'char']


In [68]:
from collections import Counter

In [77]:
word_counts = Counter(processed_btc)
print(dict(word_counts))

{'former': 1, 'microsoft': 1, 'software': 1, 'engineer': 1, 'ukraine': 1, 'ha': 1, 'sentenced': 1, 'nine': 1, 'year': 1, 'prison': 1, 'stealing': 1, 'million': 1, 'store': 2, 'credit': 1, 'microsofts': 1, 'online': 1, 'volod': 1, 'char': 1}


In [78]:
word_counts = Counter(processed_eth)
print(dict(word_counts))

{'paypal': 1, 'bringing': 1, 'newlyannounced': 1, 'support': 1, 'cryptocurrency': 2, 'us': 1, 'account': 1, 'first': 1, 'announced': 1, 'plan': 1, 'open': 1, 'trading': 1, 'usbased': 1, 'user': 1, 'october': 1, 'wa': 1, 'char': 1}


In [82]:
bigram_btc = Counter(ngrams(processed_eth, n=2))
print(dict(bigram_counts))

{('paypal', 'bringing'): 1, ('bringing', 'newlyannounced'): 1, ('newlyannounced', 'support'): 1, ('support', 'cryptocurrency'): 1, ('cryptocurrency', 'us'): 1, ('us', 'account'): 1, ('account', 'first'): 1, ('first', 'announced'): 1, ('announced', 'plan'): 1, ('plan', 'open'): 1, ('open', 'cryptocurrency'): 1, ('cryptocurrency', 'trading'): 1, ('trading', 'usbased'): 1, ('usbased', 'user'): 1, ('user', 'october'): 1, ('october', 'wa'): 1, ('wa', 'char'): 1}


In [83]:
# Generate the Ethereum N-grams where N=2
bigram_eth = Counter(ngrams(processed_btc, n=2))
print(dict(bigram_counts))

{('paypal', 'bringing'): 1, ('bringing', 'newlyannounced'): 1, ('newlyannounced', 'support'): 1, ('support', 'cryptocurrency'): 1, ('cryptocurrency', 'us'): 1, ('us', 'account'): 1, ('account', 'first'): 1, ('first', 'announced'): 1, ('announced', 'plan'): 1, ('plan', 'open'): 1, ('open', 'cryptocurrency'): 1, ('cryptocurrency', 'trading'): 1, ('trading', 'usbased'): 1, ('usbased', 'user'): 1, ('user', 'october'): 1, ('october', 'wa'): 1, ('wa', 'char'): 1}


In [71]:
# Use the token_count function to generate the top 10 words from each coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [90]:
# Get the top 10 words for Bitcoin
# Get the top 10 words for Ethereum
btc_10 = dict(bigram_btc.most_common(10))
print(btc_10)

{('paypal', 'bringing'): 1, ('bringing', 'newlyannounced'): 1, ('newlyannounced', 'support'): 1, ('support', 'cryptocurrency'): 1, ('cryptocurrency', 'us'): 1, ('us', 'account'): 1, ('account', 'first'): 1, ('first', 'announced'): 1, ('announced', 'plan'): 1, ('plan', 'open'): 1}


In [89]:
# Get the top 10 words for Ethereum
eth_10 = dict(bigram_eth.most_common(10))
print(eth_10)

{('former', 'microsoft'): 1, ('microsoft', 'software'): 1, ('software', 'engineer'): 1, ('engineer', 'ukraine'): 1, ('ukraine', 'ha'): 1, ('ha', 'sentenced'): 1, ('sentenced', 'nine'): 1, ('nine', 'year'): 1, ('year', 'prison'): 1, ('prison', 'stealing'): 1}


# Word Clouds

In this section, you will generate word clouds for each coin to summarize the news for each coin

In [87]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [96]:
def wordcloud(text, title=""):
    df_cloud = WordCloud(width=500).generate(text)
    plt.imshow(df_cloud)
    plt.axis("off")
    Fontdict = {"fontsize": 48, "fontweught" : "bold"}
    plt.title(title, fontdict=fontdict)
    plt.show()

In [98]:
wordcloud(eth_10.text.str.cat(), title="Bitcoin Word Cloud")

AttributeError: 'dict' object has no attribute 'text'

In [99]:
# Generate the Bitcoin word cloud
wc = WordCloud().generate(btc_10)
plt.imshow(wc)

TypeError: expected string or bytes-like object

In [None]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!

# Named Entity Recognition

In this section, you will build a named entity recognition model for both coins and visualize the tags using SpaCy.

In [100]:
import spacy
from spacy import displacy

In [None]:
# Optional - download a language model for SpaCy
# !python -m spacy download en_core_web_sm

In [101]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

## Bitcoin NER

In [None]:
# Concatenate all of the bitcoin text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!

---

## Ethereum NER

In [None]:
# Concatenate all of the bitcoin text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!