# News Headlines Sentiment

Use the News API to pull the latest news articles for Bitcoin and Ethereum, and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?

In [1]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# The analyzer is instantiated further down, once the VADER lexicon has been downloaded

%matplotlib inline

In [2]:
# Read your API key environment variable
load_dotenv()

apinews_key_read = os.getenv("NEWSAPI_KEY")
type(apinews_key_read)

str

In [3]:
# Create a newsapi client
from newsapi.newsapi_client import NewsApiClient
newsapi = NewsApiClient(api_key=apinews_key_read)

In [4]:
# Fetch the Bitcoin news articles
fetch_bitcoin_news = newsapi.get_everything(
    q="Bitcoin",
    language="en",
    sort_by="relevancy"
)
fetch_bitcoin_news["totalResults"]

4419

In [5]:
# Fetch the Ethereum news articles
fetch_ethereum_news = newsapi.get_everything(
    q="Ethereum",
    language="en",
    sort_by="relevancy"
)
fetch_ethereum_news["totalResults"]

1217

In [6]:
# Create the Bitcoin sentiment scores DataFrame
import nltk
nltk.download('vader_lexicon')  # download or update the VADER lexicon

# Initialize VADER sentiment analyzer
analyzer = SentimentIntensityAnalyzer()
bitcoin_sentiments = []



[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\Trader\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [7]:
for article in fetch_bitcoin_news["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]

        bitcoin_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
        })
    except AttributeError:
        # Skip articles with missing (None) content, which VADER cannot score
        pass

In [8]:
bitcoin_df = pd.DataFrame(bitcoin_sentiments)

# Reorder columns
cols = ["date", "text", "compound", "positive", "negative", "neutral"]
bitcoin_df = bitcoin_df[cols]
bitcoin_df.head()

| | date | text | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|
| 0 | 2020-11-12 | A former Microsoft software engineer from Ukra... | -0.6705 | 0.064 | 0.199 | 0.737 |
| 1 | 2020-12-03 | Visa has partnered with cryptocurrency startup... | 0.6369 | 0.162 | 0.0 | 0.838 |
| 2 | 2020-11-12 | PayPal is bringing its newly-announced support... | 0.2144 | 0.053 | 0.0 | 0.947 |
| 3 | 2020-11-20 | In November 2017, after an absolutely massive,... | 0.2023 | 0.05 | 0.0 | 0.95 |
| 4 | 2020-12-06 | Unlike ‘conventional’ cryptocurrencies, a cent... | 0.0 | 0.0 | 0.0 | 1.0 |


In [9]:
bitcoin_df.describe()

| | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 19.0 | 19.0 | 19.0 | 19.0 |
| mean | 0.119611 | 0.060526 | 0.029 | 0.910421 |
| std | 0.391383 | 0.065557 | 0.066381 | 0.09657 |
| min | -0.6705 | 0.0 | 0.0 | 0.716 |
| 25% | 0.0 | 0.0 | 0.0 | 0.8335 |
| 50% | 0.0 | 0.05 | 0.0 | 0.947 |
| 75% | 0.4117 | 0.1025 | 0.0 | 1.0 |
| max | 0.765 | 0.174 | 0.215 | 1.0 |


In [10]:
articles_bitcoin_df = pd.DataFrame.from_dict(fetch_bitcoin_news["articles"])
articles_bitcoin_df.head()

| | source | author | title | description | url | urlToImage | publishedAt | content |
|---|---|---|---|---|---|---|---|---|
| 0 | {'id': 'wired', 'name': 'Wired'} | Timothy B. Lee, Ars Technica | An Engineer Gets 9 Years for Stealing $10M Fro... | The defendant tried—and failed—to use bitcoin ... | https://www.wired.com/story/an-engineer-gets-9... | https://media.wired.com/photos/5fac6afb446b463... | 2020-11-12T14:00:00Z | A former Microsoft software engineer from Ukra... |
| 1 | {'id': None, 'name': 'Lifehacker.com'} | Mike Winters on Two Cents, shared by Mike Wint... | Is the New Visa Bitcoin Rewards Card Worth It? | Visa has partnered with cryptocurrency startup... | https://twocents.lifehacker.com/is-the-new-vis... | https://i.kinja-img.com/gawker-media/image/upl... | 2020-12-03T22:00:00Z | Visa has partnered with cryptocurrency startup... |
| 2 | {'id': 'engadget', 'name': 'Engadget'} | Karissa Bell | PayPal now lets all US users buy, sell and hol... | PayPal is bringing its newly-announced support... | https://www.engadget.com/paypal-opens-cryptocu... | https://o.aolcdn.com/images/dims?resize=1200%2... | 2020-11-12T21:05:41Z | PayPal is bringing its newly-announced support... |
| 3 | {'id': 'mashable', 'name': 'Mashable'} | Stan Schroeder | Bitcoin is flirting with $20,000 again. How hi... | In November 2017, after an absolutely massive,... | https://mashable.com/article/bitcoin-20000/ | https://mondrian.mashable.com/2020%252F11%252F... | 2020-11-20T20:02:17Z | In November 2017, after an absolutely massive,... |
| 4 | {'id': 'engadget', 'name': 'Engadget'} | Jon Fingas | You can now spend China's digital currency at ... | China’s official digital currency is now usabl... | https://www.engadget.com/jd-com-supports-china... | https://o.aolcdn.com/images/dims?resize=1200%2... | 2020-12-06T22:37:18Z | Unlike ‘conventional’ cryptocurrencies, a cent... |


In [11]:
# Create the Ethereum sentiment scores DataFrame
ethereum_sentiments = []
for article in fetch_ethereum_news["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]

        ethereum_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
        })
    except AttributeError:
        # Skip articles with missing (None) content, which VADER cannot score
        pass

ethe_df = pd.DataFrame(ethereum_sentiments)

# Reorder columns
cols = ["date", "text", "compound", "positive", "negative", "neutral"]
ethe_df = ethe_df[cols]
ethe_df.head()

| | date | text | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|
| 0 | 2020-11-12 | PayPal is bringing its newly-announced support... | 0.2144 | 0.053 | 0.0 | 0.947 |
| 1 | 2020-11-23 | FILE PHOTO: Representation of the Ethereum vir... | 0.0 | 0.0 | 0.0 | 1.0 |
| 2 | 2020-11-23 | FILE PHOTO: Representation of the Ethereum vir... | 0.0 | 0.0 | 0.0 | 1.0 |
| 3 | 2020-11-23 | LONDON (Reuters) - Digital currencies Ethereum... | 0.4215 | 0.088 | 0.0 | 0.912 |
| 4 | 2020-11-19 | PayPal has launched the Generosity Network, a ... | 0.8779 | 0.318 | 0.0 | 0.682 |


In [12]:
ethe_df.describe()

| | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | 0.153245 | 0.0643 | 0.02105 | 0.91465 |
| std | 0.339247 | 0.07896 | 0.054473 | 0.104806 |
| min | -0.4939 | 0.0 | 0.0 | 0.672 |
| 25% | 0.0 | 0.0 | 0.0 | 0.876 |
| 50% | 0.0 | 0.0615 | 0.0 | 0.932 |
| 75% | 0.430825 | 0.09525 | 0.0 | 1.0 |
| max | 0.8779 | 0.318 | 0.196 | 1.0 |


In [13]:
# Describe the Bitcoin Sentiment
# YOUR CODE HERE!

In [14]:
# Describe the Ethereum Sentiment
# YOUR CODE HERE!

### Questions:

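Q: Which coin had the highest mean positive score?

A: Ethereum, with a mean positive score of 0.0643 versus 0.0605 for Bitcoin.

Q: Which coin had the highest compound score?

A: Ethereum, with a maximum compound score of 0.8779 versus 0.765 for Bitcoin. Its mean compound score is also higher (0.1532 versus 0.1196).

Q: Which coin had the highest positive score?

A: Ethereum, with a maximum positive score of 0.318 versus 0.174 for Bitcoin.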
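
The three questions can be settled by reading the two `describe()` tables side by side. As a rough sketch, the snippet below copies the relevant `mean` and `max` values from those tables into plain dictionaries and picks the larger value for each question. The names `stats` and `winner` and the dictionary layout are illustrative assumptions, not part of the notebook.

```python
# Values copied from the describe() outputs shown earlier in this notebook.
# The dictionary layout and the names `stats` / `winner` are illustrative only.
stats = {
    "bitcoin": {"mean_positive": 0.060526, "max_compound": 0.765, "max_positive": 0.174},
    "ethereum": {"mean_positive": 0.0643, "max_compound": 0.8779, "max_positive": 0.318},
}


def winner(metric):
    """Return the coin with the largest value for the given metric."""
    return max(stats, key=lambda coin: stats[coin][metric])


for metric in ("mean_positive", "max_compound", "max_positive"):
    print(f"{metric}: {winner(metric)}")
```

With the DataFrames in memory, the same comparison could be made directly on `bitcoin_df.describe()` and `ethe_df.describe()` instead of on hand-copied numbers.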

---

# Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word
2. Remove Punctuation
3. Remove Stopwords
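
The three steps above can be sketched without NLTK, which may help when checking the logic before wiring in `word_tokenize` and the full stopword list. This is only an illustration: `simple_tokenize` and the tiny `STOPWORDS` set are hypothetical stand-ins, and whitespace splitting replaces NLTK's tokenizer.

```python
import re

# Tiny hypothetical stopword list, used only so the sketch runs without NLTK data.
STOPWORDS = {"the", "a", "an", "is", "its", "to", "for", "in", "it", "was", "but", "all"}


def simple_tokenize(text):
    """Lowercase, strip non-letters, split on whitespace, drop stopwords."""
    lowered = text.lower()                     # 1. lowercase each word
    cleaned = re.sub(r"[^a-z ]", "", lowered)  # 2. remove punctuation and digits
    # 3. remove stopwords
    return [word for word in cleaned.split() if word not in STOPWORDS]


print(simple_tokenize("PayPal is bringing its support for cryptocurrency to all US accounts."))
```

Note that deleting punctuation outright also joins hyphenated words, which is why tokens such as `newlyannounced` show up in the outputs further down.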

In [15]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [16]:
# Download the NLTK stopwords list and pick one article to test the code with
nltk.download('stopwords')
article_bitcoin = articles_bitcoin_df['content'][1]
article_bitcoin

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Trader\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


'Visa\xa0has partnered with cryptocurrency startup BlockFi to offer the first rewards credit card that pays out in Bitcoin rather than cash, but is it worth applying for? Unless youre extremely bullish o… [+2239 chars]'

In [17]:
token_article_bitcoin = word_tokenize(article_bitcoin)
#token_article_bitcoin

In [18]:
sw = set(stopwords.words('english'))
first_result = [token.lower() for token in token_article_bitcoin if token.lower() not in sw] 
print(first_result)

['visa', 'partnered', 'cryptocurrency', 'startup', 'blockfi', 'offer', 'first', 'rewards', 'credit', 'card', 'pays', 'bitcoin', 'rather', 'cash', ',', 'worth', 'applying', '?', 'unless', 'youre', 'extremely', 'bullish', 'o…', '[', '+2239', 'chars', ']']


In [27]:
# 1) Remove non-alphabetic characters with a regex
regex = re.compile("[^a-zA-Z ]")
re_bit_clean = regex.sub('', article_bitcoin)
print(re_bit_clean)

# 2) Tokenize the cleaned text, lowercase it, and remove stopwords
token_article_from_string = word_tokenize(re_bit_clean)
stopword_removed_bit = [token.lower() for token in token_article_from_string if token.lower() not in sw]
print(stopword_removed_bit)

PayPal is bringing its newlyannounced support for cryptocurrency to all US accounts It first announced plans to open cryptocurrency trading to USbased users in October but until now it was only a  chars
['paypal', 'bringing', 'newlyannounced', 'support', 'cryptocurrency', 'us', 'accounts', 'first', 'announced', 'plans', 'open', 'cryptocurrency', 'trading', 'usbased', 'users', 'october', 'chars']


In [28]:
# 3) Lemmatize the words
lemmatizer = WordNetLemmatizer()  # instantiate the lemmatizer
final_root = [lemmatizer.lemmatize(word) for word in stopword_removed_bit]
print(final_root)

['paypal', 'bringing', 'newlyannounced', 'support', 'cryptocurrency', 'u', 'account', 'first', 'announced', 'plan', 'open', 'cryptocurrency', 'trading', 'usbased', 'user', 'october', 'char']


In [30]:
# Apply the regex word by word (iterating over the string itself would yield single characters)
new_clean = [regex.sub('', word) for word in article_bitcoin.split()]

In [21]:
# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text: strips non-letters, lowercases, removes stopwords, lemmatizes."""
    sw = set(stopwords.words('english'))
    regex = re.compile("[^a-zA-Z ]")
    lemmatizer = WordNetLemmatizer()

    # Remove punctuation and other non-alphabetic characters
    re_clean = regex.sub('', text)

    # Create a list of the words
    token_article = word_tokenize(re_clean)

    # Convert the words to lowercase and remove the stop words
    stopword_removed = [token.lower() for token in token_article if token.lower() not in sw]

    # Lemmatize words into root words
    tokens = [lemmatizer.lemmatize(word) for word in stopword_removed]

    return tokens


In [31]:
#test function
article_bitcoin = articles_bitcoin_df['content'][4]
token_list = tokenizer(article_bitcoin)
token_list

['unlike',
 'conventional',
 'cryptocurrencies',
 'central',
 'bank',
 'control',
 'digital',
 'yuan',
 'case',
 'people',
 'bank',
 'china',
 'move',
 'give',
 'country',
 'power',
 'theory',
 'stability',
 'freq',
 'char']

In [23]:
# Create a new tokens column for bitcoin
"""for index, row in articles_bitcoin_df.iterrows():
    try:
        article_bitcoin = articles_bitcoin_df['content'][row]
        token_list = tokenizer(article_bitcoin)

    except AttributeError:
        pass
#token_list_df = pd.DataFrame(token_list)
articles_bitcoin_df['tokens']=articles_bitcoin_df.apply(lambda row: tokenizer(articles_bitcoin_df['content'][row]), axis=1)
articles_bitcoin_df.head()"""

"for index, row in articles_bitcoin_df.iterrows():\n    try:\n        article_bitcoin = articles_bitcoin_df['content'][row]\n        token_list = tokenizer(article_bitcoin)\n\n    except AttributeError:\n        pass\n#token_list_df = pd.DataFrame(token_list)\narticles_bitcoin_df['tokens']=articles_bitcoin_df.apply(lambda row: tokenizer(articles_bitcoin_df['content'][row]), axis=1)\narticles_bitcoin_df.head()"

In [24]:
article_bitcoin = articles_bitcoin_df['content'][2]
token_list = tokenizer(article_bitcoin)
token_list

['paypal',
 'bringing',
 'newlyannounced',
 'support',
 'cryptocurrency',
 'u',
 'account',
 'first',
 'announced',
 'plan',
 'open',
 'cryptocurrency',
 'trading',
 'usbased',
 'user',
 'october',
 'char']

In [35]:
# Fails with a TypeError if any article's content is not a string (for example None)
tokens_2_bit = [tokenizer(content) for content in articles_bitcoin_df['content']]
print(tokens_2_bit[:5])

TypeError: expected string or bytes-like object

In [26]:
# Same failure: the regex inside tokenizer() only accepts strings
take_1 = [tokenizer(row) for row in articles_bitcoin_df['content']]
take_1

TypeError: expected string or bytes-like object
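
A likely cause of the `TypeError` above is that at least one article has no `content`, so the column holds `None` for that row, and `re.sub` only accepts strings. This is an inference rather than something the notebook confirms, although it is consistent with `bitcoin_df` ending up with 19 rows. The sketch below reproduces the failure with the same regex on made-up data and shows one possible guard. `safe_clean` is a hypothetical helper, not part of the notebook.

```python
import re

regex = re.compile("[^a-zA-Z ]")  # same pattern the notebook's tokenizer uses


def safe_clean(text):
    """Hypothetical helper: return cleaned text, or '' when text is not a string."""
    if not isinstance(text, str):
        return ""
    return regex.sub("", text)


contents = ["Bitcoin hits $20,000!", None]  # made-up sample with a missing value

try:
    [regex.sub("", c) for c in contents]  # unguarded: fails on the None entry
except TypeError as err:
    print("unguarded:", err)

print([safe_clean(c) for c in contents])  # guarded: the None entry becomes ''
```

The same idea can be applied either inside the tokenizer or at the call site, for example by replacing missing values with empty strings before applying the function.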

In [None]:
for row in articles_bitcoin_df['content']:
    print(row)
    print("new line")

In [None]:
# Replace missing content with an empty string so the tokenizer always receives a str
articles_bitcoin_df['tokens'] = articles_bitcoin_df['content'].fillna('').apply(tokenizer)

In [None]:
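def apply_tokenizer(x):
    # Return an empty token list for rows whose content is missing
    content = x['content']
    return tokenizer(content) if isinstance(content, str) else []

articles_bitcoin_df['tokens'] = articles_bitcoin_df.apply(apply_tokenizer, axis=1)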

In [None]:
# Create a new tokens column for ethereum
# YOUR CODE HERE!

---

# NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequencies for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 
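
Both steps can be illustrated on a small token list. The sketch below builds bigrams with `zip` instead of `nltk.ngrams` so that it runs with the standard library alone (calling `ngrams(tokens, 2)` from NLTK should yield the same pairs), then counts them with `Counter`. The `bigrams` helper and the sample tokens are illustrative assumptions, not the notebook's data.

```python
from collections import Counter


def bigrams(tokens):
    """Pair each token with its successor (n-grams for N = 2)."""
    return list(zip(tokens, tokens[1:]))


# Illustrative token list, not taken from the fetched articles
tokens = ["paypal", "bringing", "support", "cryptocurrency", "trading",
          "open", "cryptocurrency", "trading"]

bigram_counts = Counter(bigrams(tokens))
print(bigram_counts.most_common(1))    # most frequent bigram
print(Counter(tokens).most_common(2))  # two most frequent single words
```

For the real data, the token lists of all articles for a coin would first be flattened into one list before counting.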

In [None]:
from collections import Counter
from nltk import ngrams

In [None]:
# Generate the Bitcoin N-grams where N=2
# YOUR CODE HERE!

In [None]:
# Generate the Ethereum N-grams where N=2
# YOUR CODE HERE!

In [None]:
# Use the token_count function to generate the top 10 words from each coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [None]:
# Get the top 10 words for Bitcoin
# YOUR CODE HERE!

In [None]:
# Get the top 10 words for Ethereum
# YOUR CODE HERE!

# Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.

In [None]:
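from wordcloud import WordCloud
import matplotlib.pyplot as plt
import matplotlib as mpl

# 'seaborn-whitegrid' was renamed to 'seaborn-v0_8-whitegrid' in newer matplotlib releases
style = 'seaborn-v0_8-whitegrid' if 'seaborn-v0_8-whitegrid' in plt.style.available else 'seaborn-whitegrid'
plt.style.use(style)
mpl.rcParams['figure.figsize'] = [20.0, 10.0]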

In [None]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [None]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!

# Named Entity Recognition

In this section, you will build a named entity recognition model for both coins and visualize the tags using spaCy.

In [None]:
import spacy
from spacy import displacy

In [None]:
# Optional - download a language model for spaCy
# !python -m spacy download en_core_web_sm

In [None]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

## Bitcoin NER

In [None]:
# Concatenate all of the bitcoin text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!

---

## Ethereum NER

In [None]:
# Concatenate all of the ethereum text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!