# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [News API](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum, then create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest compound score?
3. Which coin had the highest positive score?

In [1]:
# Initial imports
import os
import pandas as pd
import nltk
from dotenv import load_dotenv
from newsapi import NewsApiClient
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Download the VADER lexicon and create the sentiment analyzer
nltk.download('vader_lexicon')
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/dariamerkulenko/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [2]:
# Read your api key environment variable
load_dotenv()
api_key = os.getenv("NEWS_API_KEY")

In [3]:
# Confirm the key was loaded without printing the secret itself
print(f"API key loaded: {api_key is not None}")


In [4]:
# Create a newsapi client
newsapi = NewsApiClient(api_key=api_key)

In [5]:
# Fetch the Bitcoin news articles
bitcoin_news = newsapi.get_everything(q="bitcoin", language="en", page_size=100, sort_by="relevancy")

# Show the total number of articles
print(f"Total number of articles about Bitcoin: {bitcoin_news['totalResults']}")

# Show a sample article
bitcoin_news["articles"][0]

Total number of articles about Bitcoin: 8519


{'source': {'id': None, 'name': 'Lifehacker.com'},
 'author': 'Jeff Somers',
 'title': 'Is the Crypto Bubble Going to Burst?',
 'description': 'Even if you aren’t paying attention to Bitcoin and other cryptocurrencies, you might have noticed that their value plummeted last week, with the total value of the market tumbling from a high of $3 trillion last year to about $1.5 trillion in a matter of days…',
 'url': 'https://lifehacker.com/is-the-crypto-bubble-going-to-burst-1848475768',
 'urlToImage': 'https://i.kinja-img.com/gawker-media/image/upload/c_fill,f_auto,fl_progressive,g_center,h_675,pg_1,q_80,w_1200/976a59b09e0e681e692bd7517498e3f2.jpg',
 'publishedAt': '2022-02-09T16:00:00Z',
 'content': 'Even if you arent paying attention to Bitcoin and other cryptocurrencies, you might have noticed that their value plummeted last week, with the total value of the market tumbling from a high of $3 tr… [+4782 chars]'}

In [6]:
# Fetch the Ethereum news articles
ethereum_news = newsapi.get_everything(q="ethereum", language="en", page_size=100, sort_by="relevancy")

# Show the total number of articles
print(f"Total number of articles about Ethereum: {ethereum_news['totalResults']}")

# Show a sample article
ethereum_news["articles"][0]

Total number of articles about Ethereum: 3888


{'source': {'id': 'the-verge', 'name': 'The Verge'},
 'author': 'Corin Faife',
 'title': 'Crypto.com admits over $30 million stolen by hackers',
 'description': 'Cryptocurrency exchange Crypto.com has said that $15 million in ethereum and $18 million in bitcoin were stolen by hackers in a security breach',
 'url': 'https://www.theverge.com/2022/1/20/22892958/crypto-com-exchange-hack-bitcoin-ethereum-security',
 'urlToImage': 'https://cdn.vox-cdn.com/thumbor/mde_l3lUC4muDPEFG7LYrUz0O3g=/0x146:2040x1214/fit-in/1200x630/cdn.vox-cdn.com/uploads/chorus_asset/file/8921023/acastro_bitcoin_2.jpg',
 'publishedAt': '2022-01-20T13:23:31Z',
 'content': 'In a new blog post the company said that 4,836 ETH and 443 bitcoin were taken\r\nIllustration by Alex Castro / The Verge\r\nIn a blog post published in the early hours of Thursday morning, cryptocurrency… [+2004 chars]'}

In [7]:
# Create the Bitcoin sentiment scores DataFrame
bitcoin_sentiments = []

for article in bitcoin_news["articles"]:
    try:
        text = article["content"]
        sentiment = analyzer.polarity_scores(text)

        bitcoin_sentiments.append({
            "text": text,
            "Compound": sentiment["compound"],
            "Positive": sentiment["pos"],
            "Negative": sentiment["neg"],
            "Neutral": sentiment["neu"],
        })
    except AttributeError:
        # Skip articles whose content cannot be scored
        pass

# Create the DataFrame
bitcoin_df = pd.DataFrame(bitcoin_sentiments)

# Reorder the columns
cols = ["Compound", "Negative", "Neutral", "Positive", "text"]
bitcoin_df = bitcoin_df[cols]

bitcoin_df.head()

| | Compound | Negative | Neutral | Positive | text |
|---|---|---|---|---|---|
| 0 | 0.5859 | 0.0 | 0.876 | 0.124 | Even if you arent paying attention to Bitcoin ... |
| 1 | 0.0 | 0.0 | 1.0 | 0.0 | When Denis Rusinovich set up cryptocurrency mi... |
| 2 | 0.3182 | 0.0 | 0.895 | 0.105 | El Salvador introduced Bitcoin as a legal tend... |
| 3 | -0.4404 | 0.083 | 0.917 | 0.0 | Were officially building an open Bitcoin minin... |
| 4 | -0.3182 | 0.084 | 0.871 | 0.045 | Israeli national Tal Prihar pled guilty to rou... |


In [8]:
# Create the Ethereum sentiment scores DataFrame
ethereum_sentiments = []

for article in ethereum_news["articles"]:
    try:
        text = article["content"]
        sentiment = analyzer.polarity_scores(text)

        ethereum_sentiments.append({
            "text": text,
            "Compound": sentiment["compound"],
            "Positive": sentiment["pos"],
            "Negative": sentiment["neg"],
            "Neutral": sentiment["neu"],
        })
    except AttributeError:
        # Skip articles whose content cannot be scored
        pass

# Create the DataFrame
ethereum_df = pd.DataFrame(ethereum_sentiments)

# Reorder the columns
cols = ["Compound", "Negative", "Neutral", "Positive", "text"]
ethereum_df = ethereum_df[cols]

ethereum_df.head()

| | Compound | Negative | Neutral | Positive | text |
|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 1.0 | 0.0 | In a new blog post the company said that 4,836... |
| 1 | 0.0 | 0.0 | 1.0 | 0.0 | Hackers who made off with roughly $15 million ... |
| 2 | 0.1779 | 0.0 | 0.948 | 0.052 | On some level, the new mayor is simply employi... |
| 3 | 0.0772 | 0.0 | 0.962 | 0.038 | Back in September\r\n, Robinhood announced pla... |
| 4 | -0.2023 | 0.062 | 0.899 | 0.039 | If people who buy cryptocurrencies intended on... |
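
The Bitcoin and Ethereum cells above run the same loop, so the scoring logic could be factored into a single helper. The following is a minimal sketch rather than the notebook's own code: `score_articles` and the stand-in `fake_scorer` are hypothetical names, and in the notebook the scorer argument would be `analyzer.polarity_scores`.

```python
def score_articles(articles, scorer):
    """Return one row of sentiment scores per article that has content.

    `scorer` is any callable returning a dict with the keys
    "compound", "pos", "neu" and "neg" (the shape VADER returns).
    """
    rows = []
    for article in articles:
        text = article.get("content")
        if not text:
            # Skip articles with missing or empty content
            continue
        scores = scorer(text)
        rows.append({
            "Compound": scores["compound"],
            "Negative": scores["neg"],
            "Neutral": scores["neu"],
            "Positive": scores["pos"],
            "text": text,
        })
    return rows


# Hypothetical stand-in scorer so the sketch runs without NLTK
def fake_scorer(text):
    return {"compound": 0.0, "pos": 0.0, "neu": 1.0, "neg": 0.0}


sample_articles = [{"content": "Bitcoin news"}, {"content": None}]
rows = score_articles(sample_articles, fake_scorer)
print(len(rows))  # the article without content is skipped
```

Under these assumptions, each DataFrame could be built with a call such as `pd.DataFrame(score_articles(bitcoin_news["articles"], analyzer.polarity_scores))`. Because the rows are created in the desired key order, the separate column-reordering step would no longer be needed.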


In [9]:
# Describe the Bitcoin Sentiment
bitcoin_df.describe()

| | Compound | Negative | Neutral | Positive |
|---|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 0.073527 | 0.04795 | 0.88259 | 0.06943 |
| std | 0.428141 | 0.064763 | 0.083741 | 0.062734 |
| min | -0.8176 | 0.0 | 0.662 | 0.0 |
| 25% | -0.288875 | 0.0 | 0.831 | 0.0 |
| 50% | 0.0644 | 0.0 | 0.895 | 0.07 |
| 75% | 0.445 | 0.088 | 0.93725 | 0.11275 |
| max | 0.8341 | 0.258 | 1.0 | 0.234 |


In [10]:
# Describe the Ethereum Sentiment
ethereum_df.describe()

| | Compound | Negative | Neutral | Positive |
|---|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 | 100.0 |
| mean | 0.223738 | 0.02307 | 0.89843 | 0.07851 |
| std | 0.371354 | 0.041149 | 0.082296 | 0.074122 |
| min | -0.7096 | 0.0 | 0.716 | 0.0 |
| 25% | 0.0 | 0.0 | 0.842 | 0.0 |
| 50% | 0.1779 | 0.0 | 0.9045 | 0.066 |
| 75% | 0.5106 | 0.046 | 0.9715 | 0.12925 |
| max | 0.8807 | 0.174 | 1.0 | 0.265 |


### Questions:

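Q: Which coin had the highest mean positive score?

A: Ethereum, with a mean positive score of 0.07851 versus 0.06943 for Bitcoin.

Q: Which coin had the highest compound score?

A: Ethereum, with a maximum compound score of 0.8807 versus 0.8341 for Bitcoin.

Q: Which coin had the highest positive score?

A: Ethereum, with a maximum positive score of 0.265 versus 0.234 for Bitcoin.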
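
The three questions can be checked by comparing the relevant rows of the two `describe()` outputs. The sketch below copies those figures into a plain dictionary and picks the larger value per metric. It assumes that "highest compound score" and "highest positive score" refer to the `max` row; the names `summary` and `leader` are hypothetical.

```python
# Summary figures copied from the two describe() outputs above
summary = {
    "Bitcoin": {"mean_positive": 0.06943, "max_compound": 0.8341, "max_positive": 0.234},
    "Ethereum": {"mean_positive": 0.07851, "max_compound": 0.8807, "max_positive": 0.265},
}


def leader(metric):
    """Return the coin with the largest value for the given metric."""
    return max(summary, key=lambda coin: summary[coin][metric])


for metric in ("mean_positive", "max_compound", "max_positive"):
    print(f"{metric}: {leader(metric)}")
```

With the DataFrames in memory, the same comparison could be made directly, for example `bitcoin_df["Positive"].mean()` against `ethereum_df["Positive"].mean()`.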

---

## 2. Natural Language Processing
---
### Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove punctuation.
3. Remove stopwords.
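
To make the three steps concrete, here is a dependency-free sketch of the same pipeline. It is an illustration only: `simple_tokenize` is a hypothetical name, the stopword set is a tiny stand-in for NLTK's English list, and lemmatization is left out.

```python
import re

# Tiny hypothetical stopword list; the notebook uses NLTK's English list instead
STOPWORDS = {"the", "a", "an", "of", "to", "is", "and", "in"}


def simple_tokenize(text):
    """Lowercase the text, strip non-letters, and drop stopwords."""
    lowered = text.lower()
    letters_only = re.sub(r"[^a-z ]", "", lowered)
    return [word for word in letters_only.split() if word not in STOPWORDS]


tokens = simple_tokenize("The price of Bitcoin is rising, and fast!")
print(tokens)  # ['price', 'bitcoin', 'rising', 'fast']
```

Because non-letter characters are deleted rather than replaced with spaces, digits disappear entirely and hyphenated words are fused (for example, "age-old" becomes "ageold").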

In [11]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [12]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/dariamerkulenko/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [13]:
# Instantiate the lemmatizer
lemmatizer = WordNetLemmatizer()

# Create a list of stopwords
sw = set(stopwords.words('english'))


In [14]:
# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text."""

    # Remove punctuation and any other non-letter characters from the text
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', text)

    # Create a tokenized list of the words
    words = word_tokenize(re_clean)

    # Lemmatize words into root words
    lem = [lemmatizer.lemmatize(word) for word in words]

    # Convert the words to lowercase and remove the stop words
    tokens = [word.lower() for word in lem if word.lower() not in sw]

    return tokens

In [15]:
# Create a new tokens column for Bitcoin

bitcoin_df["Tokens"] = bitcoin_df.text.apply(tokenizer)
bitcoin_df.head()

| | Compound | Negative | Neutral | Positive | text | Tokens |
|---|---|---|---|---|---|---|
| 0 | 0.5859 | 0.0 | 0.876 | 0.124 | Even if you arent paying attention to Bitcoin ... | [even, arent, paying, attention, bitcoin, cryp... |
| 1 | 0.0 | 0.0 | 1.0 | 0.0 | When Denis Rusinovich set up cryptocurrency mi... | [denis, rusinovich, set, cryptocurrency, minin... |
| 2 | 0.3182 | 0.0 | 0.895 | 0.105 | El Salvador introduced Bitcoin as a legal tend... | [el, salvador, introduced, bitcoin, legal, ten... |
| 3 | -0.4404 | 0.083 | 0.917 | 0.0 | Were officially building an open Bitcoin minin... | [officially, building, open, bitcoin, mining, ... |
| 4 | -0.3182 | 0.084 | 0.871 | 0.045 | Israeli national Tal Prihar pled guilty to rou... | [israeli, national, tal, prihar, pled, guilty,... |


In [16]:
# Create a new tokens column for Ethereum
ethereum_df["Tokens"] = ethereum_df.text.apply(tokenizer)
ethereum_df.head()

| | Compound | Negative | Neutral | Positive | text | Tokens |
|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 1.0 | 0.0 | In a new blog post the company said that 4,836... | [new, blog, post, company, said, eth, bitcoin,... |
| 1 | 0.0 | 0.0 | 1.0 | 0.0 | Hackers who made off with roughly $15 million ... | [hackers, made, roughly, million, ethereum, cr... |
| 2 | 0.1779 | 0.0 | 0.948 | 0.052 | On some level, the new mayor is simply employi... | [level, new, mayor, simply, employing, ageold,... |
| 3 | 0.0772 | 0.0 | 0.962 | 0.038 | Back in September\r\n, Robinhood announced pla... | [back, september, robinhood, announced, plan, ... |
| 4 | -0.2023 | 0.062 | 0.899 | 0.039 | If people who buy cryptocurrencies intended on... | [people, buy, cryptocurrencies, intended, hold... |


---

### N-grams and Frequency Analysis

In this section, you will look at the n-grams and word frequencies for each coin.

1. Use NLTK to produce the n-grams for N = 2.
2. List the top 10 words for each coin.
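
As an illustration of what these two tasks compute, the sketch below counts bigrams and single words with the standard library only. The token lists are hypothetical stand-ins for the `Tokens` column, and `bigrams` is a hypothetical helper; `nltk.ngrams(tokens, 2)` yields the same tuples.

```python
from collections import Counter


def bigrams(tokens):
    """Yield each pair of adjacent tokens as a tuple."""
    return zip(tokens, tokens[1:])


# Hypothetical token lists, one per article
articles = [
    ["bitcoin", "price", "falls"],
    ["bitcoin", "price", "rises"],
]

# Count bigrams article by article so no pair spans two articles
bigram_counts = Counter(pair for tokens in articles for pair in bigrams(tokens))
print(bigram_counts.most_common(1))  # [(('bitcoin', 'price'), 2)]

# Flatten the same lists to count single words
word_counts = Counter(token for tokens in articles for token in tokens)
print(word_counts.most_common(2))
```

Note that `Counter` needs hashable items, which is why it is fed tuples and strings here rather than the token lists themselves.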

In [17]:
from collections import Counter
from nltk import ngrams

In [23]:
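# Generate the Bitcoin N-grams where N=2
# Each row of "Tokens" is a list, which Counter cannot hash, so build the
# bigram tuples article by article and count those instead
bitcoin_bigrams = [bigram for tokens in bitcoin_df["Tokens"] for bigram in ngrams(tokens, 2)]
bitcoin_ngram_counts = Counter(bitcoin_bigrams)
bitcoin_ngram_counts.most_common(10)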

In [None]:
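# Generate the Ethereum N-grams where N=2
ethereum_bigrams = [bigram for tokens in ethereum_df["Tokens"] for bigram in ngrams(tokens, 2)]
Counter(ethereum_bigrams).most_common(10)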

In [None]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [None]:
# Use token_count to get the top 10 words for Bitcoin
bitcoin_tokens = [token for tokens in bitcoin_df["Tokens"] for token in tokens]
token_count(bitcoin_tokens, N=10)

In [None]:
# Use token_count to get the top 10 words for Ethereum
ethereum_tokens = [token for tokens in ethereum_df["Tokens"] for token in tokens]
token_count(ethereum_tokens, N=10)

---

### Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.

In [None]:
from wordcloud import WordCloud
import matplotlib as mpl
import matplotlib.pyplot as plt

# On Matplotlib 3.6 and later this style is named 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [None]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [None]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!
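
One possible way to fill in the word cloud cells is sketched below. It is a sketch under assumptions, not the assignment's reference solution: `token_frequencies` and `plot_word_cloud` are hypothetical helper names, the image size is arbitrary, and the sample token lists stand in for the `Tokens` column. Only the frequency helper runs at import time; the plotting function needs `wordcloud` and `matplotlib` installed.

```python
from collections import Counter


def token_frequencies(token_lists):
    """Flatten per-article token lists into a word -> count mapping."""
    return Counter(token for tokens in token_lists for token in tokens)


def plot_word_cloud(token_lists, title):
    """Draw a word cloud sized by token frequency."""
    # Imported here so the frequency helper stays usable without these packages
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400).generate_from_frequencies(
        token_frequencies(token_lists)
    )
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()


# Hypothetical token lists standing in for bitcoin_df["Tokens"]
sample = [["bitcoin", "mining"], ["bitcoin", "price"]]
frequencies = token_frequencies(sample)
print(frequencies["bitcoin"])  # 2
```

In the notebook, the calls would look like `plot_word_cloud(bitcoin_df["Tokens"], "Bitcoin Word Cloud")` and `plot_word_cloud(ethereum_df["Tokens"], "Ethereum Word Cloud")`. Feeding frequencies rather than raw text keeps the cloud consistent with the tokenizer's lowercasing and stopword removal.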

---
## 3. Named Entity Recognition

In this section, you will run named entity recognition on the Bitcoin and Ethereum text, then visualize the tags using spaCy.

In [None]:
import spacy
from spacy import displacy

In [None]:
# Download the language model for spaCy
# !python -m spacy download en_core_web_sm

In [None]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [None]:
# Concatenate all of the Bitcoin text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!
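
The NER steps above (concatenate, process, title, list entities) could be sketched as follows. This is a sketch under assumptions, not a reference solution: `join_texts`, `run_ner` and `list_entities` are hypothetical helper names, and the sample texts stand in for the `text` column. Only `join_texts` runs at import time; `run_ner` needs spaCy and the `en_core_web_sm` model installed.

```python
def join_texts(texts):
    """Concatenate article texts, skipping missing entries."""
    return " ".join(text for text in texts if text)


def run_ner(texts, title, model="en_core_web_sm"):
    """Run spaCy NER over the combined text and return the parsed document."""
    # Imported here so join_texts works even where spaCy is not installed
    import spacy

    nlp = spacy.load(model)
    doc = nlp(join_texts(texts))
    # displaCy shows this title above the rendered entities
    doc.user_data["title"] = title
    return doc


def list_entities(doc):
    """Return (text, label) pairs for every entity spaCy found."""
    return [(ent.text, ent.label_) for ent in doc.ents]


# Hypothetical texts standing in for bitcoin_df["text"]
combined = join_texts(["El Salvador adopted Bitcoin.", None, "Prices fell."])
print(combined)  # El Salvador adopted Bitcoin. Prices fell.
```

In the notebook, `doc = run_ner(bitcoin_df["text"], "Bitcoin NER")` followed by `displacy.render(doc, style="ent")` would render the visualization, and `list_entities(doc)` would list the entities. The same helpers apply unchanged to `ethereum_df["text"]`.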

---

### Ethereum NER

In [None]:
# Concatenate all of the Ethereum text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!

---