# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [newsapi](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?

In [7]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
import nltk as nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
from newsapi import NewsApiClient
load_dotenv('../../Python Works/DOTENV/Keys.env')
%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\brand\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
Python-dotenv could not parse statement starting at line 8


In [8]:
# Read your api key environment variable
api_key = os.getenv("NEWS_API")
type(api_key)

str

In [9]:
# Create a newsapi client
newsapi=NewsApiClient(api_key=api_key)

In [20]:
# Fetch the Bitcoin news articles
btc_headlines = newsapi.get_everything(
    q="bitcoin",
    language="en",
    page_size=100,
    sort_by="relevancy")
btc_headlines

{'status': 'ok',
 'totalResults': 7047,
 'articles': [{'source': {'id': 'bbc-news', 'name': 'BBC News'},
   'author': 'https://www.facebook.com/bbcnews',
   'title': "Indian PM Modi's Twitter hacked with bitcoin tweet",
   'description': "The Indian prime minister's account had a message stating that bitcoin would be distributed to citizens.",
   'url': 'https://www.bbc.co.uk/news/world-asia-india-59627124',
   'urlToImage': 'https://ichef.bbci.co.uk/news/1024/branded_news/5998/production/_122063922_mediaitem122063921.jpg',
   'publishedAt': '2021-12-12T10:59:57Z',
   'content': "Image source, AFP via Getty Images\r\nImage caption, Modi has has more than 70 million Twitter followers\r\nIndian Prime Minister Narendra Modi's Twitter account was hacked with a message saying India ha… [+854 chars]"},
  {'source': {'id': None, 'name': 'New York Times'},
   'author': 'Corey Kilgannon',
   'title': 'Why New York State Is Experiencing a Bitcoin Boom',
   'description': 'Cryptocurrency miners a

In [21]:
# Fetch the Ethereum news articles
eth_headlines = newsapi.get_everything(
    q="etherium",
    language="en",
    page_size=100,
    sort_by="relevancy")
eth_headlines

{'status': 'ok',
 'totalResults': 41,
 'articles': [{'source': {'id': None, 'name': 'Boing Boing'},
   'author': 'Jason Weisberger',
   'title': 'The group that failed to buy a copy of the US Constitution lost a lot of crypto in transaction fees',
   'description': 'The ConstitutionDAO group did not win its bid to buy a copy of the US Constitution, however, amassing the $40 million USD in Etherium cost them $1 million in transaction fees, and assumedly will cost similar to return the, now-lesser valued, coin. — Read the …',
   'url': 'https://boingboing.net/2021/11/23/the-group-who-failed-to-buy-a-copy-of-the-us-constitution-lost-a-lot-crypto-transaction-in-fees.html',
   'urlToImage': 'https://i2.wp.com/boingboing.net/wp-content/uploads/2021/11/shutterstock_687484867.jpg?fit=1000%2C342&ssl=1',
   'publishedAt': '2021-11-23T16:46:29Z',
   'content': 'The ConstitutionDAO group did not win its bid to buy a copy of the US Constitution, however, amassing the $40 million USD in Etherium cos

In [28]:
# Create the Bitcoin sentiment scores
btc_sentiment = []

for article in btc_headlines["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        btc_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu})
        
    except AttributeError:
        pass

In [29]:
#Create the Bitcoin sentiment DataFrame
btc_df = pd.DataFrame(btc_sentiments)

In [30]:
# Create the Ethereum sentiment scores
eth_sentiment = []

for article in eth_headlines["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        eth_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu})
        
    except AttributeError:
        pass

In [31]:
#Create the Bitcoin sentiment DataFrame
eth_df = pd.DataFrame(eth_sentiments)

In [32]:
# Describe the Bitcoin Sentiment
btc_df.describe()

Unnamed: 0,compound,positive,negative,neutral
count,400.0,400.0,400.0,400.0
mean,0.106245,0.06303,0.0385,0.89845
std,0.36333,0.064198,0.058083,0.089
min,-0.802,0.0,0.0,0.66
25%,-0.00645,0.0,0.0,0.84575
50%,0.0,0.054,0.0,0.9195
75%,0.386225,0.114,0.06925,1.0
max,0.7906,0.23,0.246,1.0


In [33]:
# Describe the Ethereum Sentiment
eth_df.describe()

Unnamed: 0,compound,positive,negative,neutral
count,164.0,164.0,164.0,164.0
mean,-0.006315,0.037512,0.04339,0.919098
std,0.369288,0.054753,0.062705,0.082139
min,-0.891,0.0,0.0,0.707
25%,-0.1027,0.0,0.0,0.867
50%,0.0,0.0,0.0,0.95
75%,0.25,0.088,0.054,1.0
max,0.6588,0.189,0.293,1.0


### Questions:

Q: Which coin had the highest mean positive score?

A: Bitcoin has the higher positive mean score of .063030

Q: Which coin had the highest negative score?

A: Etherium has the highest negative score of 0.293000

Q. Which coin had the highest positive score?

A: Bitcoin has the highest positive score of 0.230000

---

## 2. Natural Language Processing
---
###   Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove Punctuation.
3. Remove Stopwords.

In [10]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [11]:
# Instantiate the lemmatizer
# YOUR CODE HERE!

# Create a list of stopwords
# YOUR CODE HERE!

# Expand the default stopwords list if necessary
# YOUR CODE HERE!

In [12]:
# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text."""
    
    # Remove the punctuation from text

   
    # Create a tokenized list of the words
    
    
    # Lemmatize words into root words

   
    # Convert the words to lowercase
    
    
    # Remove the stop words
    
    
    return tokens

In [13]:
# Create a new tokens column for Bitcoin
# YOUR CODE HERE!

In [14]:
# Create a new tokens column for Ethereum
# YOUR CODE HERE!

---

### NGrams and Frequency Analysis

In this section you will look at the ngrams and word frequency for each coin. 

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 

In [15]:
from collections import Counter
from nltk import ngrams

In [16]:
# Generate the Bitcoin N-grams where N=2
# YOUR CODE HERE!

In [17]:
# Generate the Ethereum N-grams where N=2
# YOUR CODE HERE!

In [18]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N=3):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [19]:
# Use token_count to get the top 10 words for Bitcoin
# YOUR CODE HERE!

In [20]:
# Use token_count to get the top 10 words for Ethereum
# YOUR CODE HERE!

---

### Word Clouds

In this section, you will generate word clouds for each coin to summarize the news for each coin

In [21]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [22]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [23]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!

---
## 3. Named Entity Recognition

In this section, you will build a named entity recognition model for both Bitcoin and Ethereum, then visualize the tags using SpaCy.

In [24]:
import spacy
from spacy import displacy

In [25]:
# Download the language model for SpaCy
# !python -m spacy download en_core_web_sm

In [26]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [27]:
# Concatenate all of the Bitcoin text together
# YOUR CODE HERE!

In [28]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [29]:
# Render the visualization
# YOUR CODE HERE!

In [30]:
# List all Entities
# YOUR CODE HERE!

---

### Ethereum NER

In [31]:
# Concatenate all of the Ethereum text together
# YOUR CODE HERE!

In [32]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [33]:
# Render the visualization
# YOUR CODE HERE!

In [34]:
# List all Entities
# YOUR CODE HERE!

---