# News Headlines Sentiment

Use the News API to pull the latest news articles for Bitcoin and Ethereum, and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?

In [71]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
from nltk.sentiment.vader import SentimentIntensityAnalyzer

from newsapi.newsapi_client import NewsApiClient

analyzer = SentimentIntensityAnalyzer()


%matplotlib inline

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

In [63]:
# Read your api key environment variable
load_dotenv()

news_api_key = os.getenv("news_api_key")

#help(NewsApiClient)



In [44]:
# Create a newsapi client
#help(NewsApiClient)

newsapi = NewsApiClient(news_api_key)

#help(newsapi)

test = newsapi.get_everything(q='bitcoin')

#type(test)

help(newsapi.get_everything)



Help on method get_everything in module newsapi.newsapi_client:

get_everything(q=None, sources=None, domains=None, exclude_domains=None, from_param=None, to=None, language=None, sort_by=None, page=None, page_size=None) method of newsapi.newsapi_client.NewsApiClient instance
        Search through millions of articles from over 5,000 large and small news sources and blogs.
    
        Optional parameters:
            (str) q - return headlines w/ specified coin! Valid values are:
                        'bitcoin', 'trump', 'tesla', 'ethereum', etc
    
            (str) sources - return headlines of news sources! some Valid values are:
                        'bbc-news', 'the-verge', 'abc-news', 'crypto coins news',
                        'ary news','associated press','wired','aftenposten','australian financial review','axios',
                        'bbc news','bild','blasting news','bloomberg','business insider','engadget','google news',
                        'hacker news','info

In [53]:
# Fetch the Bitcoin news articles
bitcoin_headlines = newsapi.get_everything(q='bitcoin',
                                           language="en",
                                           page_size=100,
                                           sort_by="relevancy")



bitcoin_headlines['articles'][0]

{'source': {'id': None, 'name': 'Gizmodo.com'},
 'author': 'John Biggs',
 'title': 'Crypto Traders Cut Out the Middleman, Simply Rob Victim',
 'description': 'Two alleged crypto traders in Singapore apparently came up with a fool-proof plan: rather than convert a customer’s 365,000 Singapore dollars to bitcoin, they would simply rob the victim when he came in to do the trade.Read more...',
 'url': 'https://gizmodo.com/crypto-traders-cut-out-the-middleman-simply-rob-victim-1845011301',
 'urlToImage': 'https://i.kinja-img.com/gawker-media/image/upload/c_fill,f_auto,fl_progressive,g_center,h_675,pg_1,q_80,w_1200/li0fkkejdmaugm8v1fkw.jpg',
 'publishedAt': '2020-09-10T14:28:00Z',
 'content': 'Two alleged crypto traders in Singapore apparently came up with a fool-proof plan: rather than convert a customers 365,000 Singapore dollars to bitcoin, they would simply rob the victim when he came … [+1735 chars]'}

In [70]:
# Fetch the Ethereum news articles
ethereum_headlines = newsapi.get_everything(q='ethereum',
                                            language="en",
                                            page_size=100,
                                            sort_by="relevancy")


# Confirm output type of sentiment analyzer
test_sentiment = analyzer.polarity_scores(ethereum_headlines['articles'][0]['content'])

test_sentiment

type(ethereum_headlines['articles'][0]['content'])

str

In [87]:
# Create the Bitcoin sentiment scores DataFrame

bitcoin_sentiment = []

for article in bitcoin_headlines["articles"]:
    
    try:
    
        text = article["content"]
        date = article["publishedAt"][:10]
    
        sentiment = analyzer.polarity_scores(text)
    
        compound = sentiment["compound"]
        positive = sentiment["pos"]
        negative = sentiment["neg"]
        neutral = sentiment["neu"]


        bitcoin_sentiment.append({
            "date": date,
            "text": text,
            "compound": compound,
            "positive": positive,
            "negative": negative,
            "neutral": neutral
            })
        
    except AttributeError:
        
        pass

type(bitcoin_sentiment[0]['text'])

# Note: without the 'try...except...pass' structure, this loop raised
# "AttributeError: 'NoneType' object has no attribute 'encode'".
# That happens when an article's "content" field is None instead of a string:
# VADER calls .encode() on any input that is not a str. The 'except' clause
# simply skips those articles. (Used in class exercise.)


bitcoin_sentiment

bitcoin_df = pd.DataFrame(bitcoin_sentiment)

columns = ["date", "text", "compound", "positive", "negative", "neutral"]

# Rearrange columns
bitcoin_df = bitcoin_df[columns]

bitcoin_df.head()

| | date | text | compound | positive | negative | neutral |
|---|---|---|---|---|---|---|
| 0 | 2020-09-10 | Two alleged crypto traders in Singapore appare... | -0.6908 | 0.0 | 0.16 | 0.84 |
| 1 | 2020-09-08 | By Alexis Akwagyiram, Tom Wilson\r\n* Monthly ... | 0.0 | 0.0 | 0.0 | 1.0 |
| 2 | 2020-08-23 | “The COVID-19 pandemic has resulted in a mass ... | 0.2732 | 0.063 | 0.0 | 0.937 |
| 3 | 2020-09-08 | LAGOS/LONDON (Reuters) - Four months ago, Abol... | 0.0 | 0.0 | 0.0 | 1.0 |
| 4 | 2020-09-08 | LAGOS/LONDON (Reuters) - Four months ago, Abol... | 0.0 | 0.0 | 0.0 | 1.0 |
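The loop above relies on catching `AttributeError` to skip articles whose `content` is missing. A minimal sketch of a more explicit alternative, assuming such articles should simply be skipped, is shown below. The helper name `build_sentiment_rows`, the stand-in scorer `fake_scores`, and the sample articles are hypothetical; in the notebook the scorer would be `analyzer.polarity_scores`.

```python
def build_sentiment_rows(articles, score_fn):
    """Return one dict per article that has text, skipping None/empty content.

    `score_fn` is any callable returning a dict with the keys
    "compound", "pos", "neg" and "neu" (the format VADER returns).
    """
    rows = []
    for article in articles:
        text = article.get("content")
        if not text:  # content can be None (or empty) for some articles
            continue
        scores = score_fn(text)
        rows.append({
            "date": article["publishedAt"][:10],
            "text": text,
            "compound": scores["compound"],
            "positive": scores["pos"],
            "negative": scores["neg"],
            "neutral": scores["neu"],
        })
    return rows


# Stand-in scorer used only to keep this sketch self-contained (hypothetical);
# in the notebook, pass analyzer.polarity_scores instead.
def fake_scores(text):
    return {"compound": 0.0, "pos": 0.0, "neg": 0.0, "neu": 1.0}


# Made-up articles: the second one has no content and should be skipped
sample_articles = [
    {"publishedAt": "2020-09-10T14:28:00Z", "content": "Some article text"},
    {"publishedAt": "2020-09-08T09:00:00Z", "content": None},
]

rows = build_sentiment_rows(sample_articles, fake_scores)
print(len(rows), rows[0]["date"])
```

Because the helper takes the article list as an argument, the same function could be reused for the Ethereum articles instead of repeating the loop.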


In [92]:
# Create the Ethereum sentiment scores DataFrame

ethereum_sentiment = []

for article in ethereum_headlines["articles"]:
    
    try:
    
        text = article["content"]
        date = article["publishedAt"][:10]
    
        sentiment = analyzer.polarity_scores(text)
    
        compound = sentiment["compound"]
        positive = sentiment["pos"]
        negative = sentiment["neg"]
        neutral = sentiment["neu"]


        ethereum_sentiment.append({
            "date": date,
            "text": text,
            "compound": compound,
            "positive": positive,
            "negative": negative,
            "neutral": neutral
            })
        
    except AttributeError:
        
        pass

type(ethereum_sentiment[0]['text'])


ethereum_sentiment

ethereum_df = pd.DataFrame(ethereum_sentiment)

columns = ["date", "text", "compound", "positive", "negative", "neutral"]

# Rearrange columns
ethereum_df = ethereum_df[columns]

ethereum_df.head()

In [8]:
# Describe the Bitcoin Sentiment
# YOUR CODE HERE!

In [9]:
# Describe the Ethereum Sentiment
# YOUR CODE HERE!
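One way to approach the two "describe" cells is `DataFrame.describe()`, which reports the mean and maximum of each score column. The sketch below uses two small made-up DataFrames (`bitcoin_demo` and `ethereum_demo`, hypothetical names with illustrative values only, not results from the News API) that share the column names of `bitcoin_df` and `ethereum_df`.

```python
import pandas as pd

# Illustrative values only; the real frames come from the cells above.
bitcoin_demo = pd.DataFrame({
    "compound": [-0.5, 0.0, 0.5],
    "positive": [0.0, 0.1, 0.2],
    "negative": [0.2, 0.0, 0.0],
    "neutral": [0.8, 0.9, 0.8],
})
ethereum_demo = pd.DataFrame({
    "compound": [0.0, 0.25, 0.75],
    "positive": [0.0, 0.1, 0.5],
    "negative": [0.0, 0.0, 0.1],
    "neutral": [1.0, 0.9, 0.4],
})

# describe() reports count, mean, std, min, quartiles and max per numeric column
stats = {
    "bitcoin": bitcoin_demo.describe(),
    "ethereum": ethereum_demo.describe(),
}

# Put the statistics the questions ask about side by side, one column per coin
comparison = pd.DataFrame({
    coin: {
        "mean positive": s.loc["mean", "positive"],
        "max compound": s.loc["max", "compound"],
        "max positive": s.loc["max", "positive"],
        "max negative": s.loc["max", "negative"],
    }
    for coin, s in stats.items()
})
print(comparison)

# For each statistic, the coin with the larger value
print(comparison.idxmax(axis=1))
```

With the real data, replacing the demo frames with `bitcoin_df` and `ethereum_df` should give the figures needed for the questions below.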

### Questions:

Q: Which coin had the highest mean positive score?

A: 

Q: Which coin had the highest compound score?

A: 

Q: Which coin had the highest positive score?

A: 

---

# Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word
2. Remove Punctuation
3. Remove Stopwords

In [10]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [11]:
# Expand the default stopwords list if necessary
# YOUR CODE HERE!

In [12]:
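# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text."""
    # Requires the NLTK tokenizer, stopwords and WordNet data to be downloaded
    sw = set(stopwords.words('english'))
    lemmatizer = WordNetLemmatizer()

    # Create a list of the words
    words = word_tokenize(text)

    # Convert the words to lowercase
    words = [word.lower() for word in words]

    # Remove the punctuation (and any other non-letter characters)
    words = [re.sub(r"[^a-z]", "", word) for word in words]

    # Remove the stop words (and tokens left empty by the previous step)
    words = [word for word in words if word and word not in sw]

    # Lemmatize Words into root words
    tokens = [lemmatizer.lemmatize(word) for word in words]

    return tokens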


In [13]:
# Create a new tokens column for bitcoin
# YOUR CODE HERE!

In [14]:
# Create a new tokens column for ethereum
# YOUR CODE HERE!

---

# NGrams and Frequency Analysis

In this section, you will look at the n-grams and word frequencies for each coin.

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 
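The two steps above can be sketched with `nltk.ngrams` and `collections.Counter`. The token list here is hypothetical and stands in for one coin's tokenized articles; it is not output from the notebook.

```python
from collections import Counter

from nltk import ngrams

# Hypothetical token list standing in for one coin's tokenized articles
tokens = ["bitcoin", "price", "rise", "bitcoin", "price", "fall", "bitcoin", "trader"]

# ngrams() yields tuples of n consecutive tokens; n=2 gives bigrams
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(3))

# Counting the tokens themselves gives the most frequent words
word_counts = Counter(tokens)
print(word_counts.most_common(10))
```

In the notebook, the token list would typically be built by flattening the per-article token lists into a single list before counting.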

In [15]:
from collections import Counter
from nltk import ngrams

In [16]:
# Generate the Bitcoin N-grams where N=2
# YOUR CODE HERE!

In [17]:
# Generate the Ethereum N-grams where N=2
# YOUR CODE HERE!

In [18]:
# Use the token_count function to generate the top 10 words from each coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [19]:
# Get the top 10 words for Bitcoin
# YOUR CODE HERE!

In [20]:
# Get the top 10 words for Ethereum
# YOUR CODE HERE!

# Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.

In [21]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Note: on Matplotlib 3.6+ this style is named 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [22]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [23]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!

# Named Entity Recognition

In this section, you will run named entity recognition on the text for both coins and visualize the tags using spaCy.
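The general flow (join the text, run it through an `nlp` pipeline, read `doc.ents`) can be sketched as follows, assuming spaCy v3. To keep the sketch runnable without downloading a model, it swaps the trained model for a blank English pipeline with a rule-based `entity_ruler` and two hypothetical patterns; the article snippets are also made up. With the `en_core_web_sm` model, the entities come from its trained NER component, and the same `doc.ents` loop applies.

```python
import spacy

# Assumption: a blank pipeline plus a rule-based entity_ruler replaces the
# trained model here, so no model download is needed for this sketch.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Reuters"},      # hypothetical pattern
    {"label": "GPE", "pattern": "Singapore"},    # hypothetical pattern
])

# Hypothetical article snippets; joining them gives one document to process
texts = ["Reuters reported on bitcoin traders.", "The trade took place in Singapore."]
doc = nlp(" ".join(texts))

# displacy reads the title from doc.user_data when rendering
doc.user_data["title"] = "Bitcoin NER"

# Each entity exposes its text span and its label
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)

# In a notebook, the tags can be rendered with:
# from spacy import displacy
# displacy.render(doc, style="ent")
```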

In [24]:
import spacy
from spacy import displacy

In [25]:
# Optional - download a language model for spaCy
# !python -m spacy download en_core_web_sm

In [26]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

## Bitcoin NER

In [27]:
# Concatenate all of the bitcoin text together
# YOUR CODE HERE!

In [28]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [29]:
# Render the visualization
# YOUR CODE HERE!

In [30]:
# List all Entities
# YOUR CODE HERE!

---

## Ethereum NER

In [31]:
# Concatenate all of the Ethereum text together
# YOUR CODE HERE!

In [32]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [33]:
# Render the visualization
# YOUR CODE HERE!

In [34]:
# List all Entities
# YOUR CODE HERE!