# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [newsapi](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest negative score?
3. Which coin had the highest positive score?

In [28]:
# Initial imports
import os
import pandas as pd
from dotenv import load_dotenv
import nltk as nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

In [29]:
!pip install python-dotenv
!pip install newsapi-python



You should consider upgrading via the 'C:\Users\markp\anaconda3\python.exe -m pip install --upgrade pip' command.




You should consider upgrading via the 'C:\Users\markp\anaconda3\python.exe -m pip install --upgrade pip' command.


In [40]:
# Read your api key environment variable
load_dotenv()
from newsapi import NewsApiClient
api_key = os.getenv('NEWS_API')
type(api_key)

str

In [41]:
# Create a newsapi client
from newsapi import NewsApiClient
newsapi = NewsApiClient(api_key=api_key)

In [42]:
# Fetch the Bitcoin news articles
bitcoin_headlines = newsapi.get_everything(
    q="bitcoin",
    language="en",
    page_size=100,
    sort_by="relevancy"
)

# Print total articles
print(f"bitcoin: {bitcoin_headlines['totalResults']}")

# Show sample article
bitcoin_headlines["articles"][0]

bitcoin: 7339


{'source': {'id': None, 'name': 'Lifehacker.com'},
 'author': 'Jeff Somers',
 'title': 'Is the Crypto Bubble Going to Burst?',
 'description': 'Even if you aren’t paying attention to Bitcoin and other cryptocurrencies, you might have noticed that their value plummeted last week, with the total value of the market tumbling from a high of $3 trillion last year to about $1.5 trillion in a matter of days…',
 'url': 'https://lifehacker.com/is-the-crypto-bubble-going-to-burst-1848475768',
 'urlToImage': 'https://i.kinja-img.com/gawker-media/image/upload/c_fill,f_auto,fl_progressive,g_center,h_675,pg_1,q_80,w_1200/976a59b09e0e681e692bd7517498e3f2.jpg',
 'publishedAt': '2022-02-09T16:00:00Z',
 'content': 'Even if you arent paying attention to Bitcoin and other cryptocurrencies, you might have noticed that their value plummeted last week, with the total value of the market tumbling from a high of $3 tr… [+4782 chars]'}

In [50]:
# Fetch the Ethereum news articles
# YOUR CODE HERE!
eth_articles = newsapi.get_everything(q='ethereum',
                                      language='en',
                                      sort_by='relevancy',
                                     page_size=100)
eth_articles['totalResults']

# Print total articles
print(f"ethereum: {ethereum_headlines['totalResults']}")

# Show sample article
ethereum_headlines["articles"][0]

ethereum: 3047


{'source': {'id': 'wired', 'name': 'Wired'},
 'author': 'Omar L. Gallaga',
 'title': 'Playing With Crypto? You’ll Need a Wallet (or Several)',
 'description': 'Buying and selling NFTs or transferring digital currency is going to require a little leap of faith. Here’s how to get started.',
 'url': 'https://www.wired.com/story/how-to-choose-set-up-crypto-wallet/',
 'urlToImage': 'https://media.wired.com/photos/620415899266d5d11c07b346/191:100/w_1280,c_limit/Gear-Coinbase-App-Screens.jpg',
 'publishedAt': '2022-02-10T14:00:00Z',
 'content': "If people who buy cryptocurrencies intended only to hold on to them as speculative investments, there'd be no real need for crypto wallets. Exchanges and online brokerages that convert dollars to, sa… [+3031 chars]"}

In [66]:
# Create the Bitcoin sentiment scores DataFrame
# YOUR CODE HERE!
bitcoin_sentiments = []

for article in bitcoin_headlines["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        bitcoin_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
            
        })
        
    except AttributeError:
        pass
    
# Create DataFrame
bitcoin_df = pd.DataFrame(bitcoin_sentiments)

# Reorder DataFrame columns
cols = ["date", "text", "compound", "positive", "negative", "neutral"]
bitcoin_df = bitcoin_df[cols]

bitcoin_df.head()


Unnamed: 0,date,text,compound,positive,negative,neutral
0,2022-02-09,Even if you arent paying attention to Bitcoin ...,0.5859,0.124,0.0,0.876
1,2022-02-11,Netflix\r\n is making a docuseries about one o...,-0.7096,0.0,0.169,0.831
2,2022-02-08,"Over the last five years, about 25,000 of thos...",-0.4939,0.0,0.091,0.909
3,2022-02-17,"Even in cyberspace, the Department of Justice ...",0.7351,0.147,0.0,0.853
4,2022-02-13,The couple would never flee from the country a...,-0.34,0.057,0.118,0.825


In [67]:
# Create the Ethereum sentiment scores DataFrame
# YOUR CODE HERE!
ethereum_sentiments = []

for article in ethereum_headlines["articles"]:
    try:
        text = article["content"]
        date = article["publishedAt"][:10]
        sentiment = analyzer.polarity_scores(text)
        compound = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        ethereum_sentiments.append({
            "text": text,
            "date": date,
            "compound": compound,
            "positive": pos,
            "negative": neg,
            "neutral": neu
            
        })
        
    except AttributeError:
        pass
    
# Create DataFrame
ethereum_df = pd.DataFrame(ethereum_sentiments)

# Reorder DataFrame columns
cols = ["date", "text", "compound", "positive", "negative", "neutral"]
ethereum_df = ethereum_df[cols]

ethereum_df.head()

Unnamed: 0,date,text,compound,positive,negative,neutral
0,2022-02-10,If people who buy cryptocurrencies intended on...,-0.2023,0.039,0.062,0.899
1,2022-03-01,"In February, shit hit the fan in the usual way...",-0.3182,0.059,0.093,0.848
2,2022-02-17,"Technical analysis isnt a perfect tool, but it...",-0.2498,0.0,0.059,0.941
3,2022-02-09,This enables an L1 platform to bootstrap its n...,0.0,0.0,0.0,1.0
4,2022-02-25,Coinbase reported that the share of trading vo...,0.6705,0.188,0.0,0.812


In [68]:
# Describe the Bitcoin Sentiment
bitcoin_df.describe()

Unnamed: 0,compound,positive,negative,neutral
count,100.0,100.0,100.0,100.0
mean,0.054023,0.06928,0.05666,0.87407
std,0.470294,0.06223,0.073662,0.085674
min,-0.8957,0.0,0.0,0.627
25%,-0.30155,0.0,0.0,0.81825
50%,0.0386,0.066,0.0,0.884
75%,0.4767,0.11925,0.0915,0.938
max,0.8341,0.234,0.269,1.0


In [69]:
# Describe the Ethereum Sentiment
eth_df.describe()

Unnamed: 0,Compound,Negative,Neutral,Positive
count,100.0,100.0,100.0,100.0
mean,0.210204,0.03499,0.8766,0.08839
std,0.417609,0.060181,0.085037,0.07208
min,-0.9136,0.0,0.627,0.0
25%,0.0,0.0,0.83525,0.0405
50%,0.2143,0.0,0.875,0.074
75%,0.5106,0.059,0.9415,0.127
max,0.9186,0.312,1.0,0.29


### Questions:

Q: Which coin had the highest mean positive score?

A: Ethereum (85%) 

Q: Which coin had the highest compound score?

A: Bitcoin

Q. Which coin had the highest positive score?

A: Bitcoin

---

## 2. Natural Language Processing
---
###   Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove Punctuation.
3. Remove Stopwords.

In [89]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [90]:
# Instantiate the lemmatizer

lemmatizer = WordNetLemmatizer()

# Create a list of stopwords
def process_text(article):
    sw = set(stopwords.words('english'))
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', article)
    words = word_tokenize(re_clean)
    lem = [lemmatizer.lemmatize(word) for word in words]
    output = [word.lower() for word in lem if word.lower() not in sw]
    return output

# Expand the default stopwords list if necessary
# YOUR CODE HERE!

In [91]:
def get(api_key):
    global token

In [92]:
# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text."""
    
    # Remove the punctuation from text
    words = [t.translate(str.maketrans('','',string.punctuation)) for t in tokens]

   
    # Create a tokenized list of the words
    tokens = word_tokenize(re_clean)
    
    
    # Lemmatize words into root words
    tokens = ', '.join([lemmatizer.lemmatize(word) for word in words])

   
    # Convert the words to lowercase
    tokens = [t.lower() for t in tokens]
    
    # Remove the stop words
    stop_words = set(stopwords.words('english'))
    words = [w for w in words if not w in stop_words]
    
    return tokens

In [None]:
# Create a new tokens column for Bitcoin
global tokens
bitcoin_df["tokens"]= bitcoin_df['text'].apply(tokenizer)
bitcoin_df.head()

In [None]:
# Create a new tokens column for Ethereum
eth_df["tokens"] = eth_df['text'].apply(tokenizer)
eth_df.head()

---

### NGrams and Frequency Analysis

In this section you will look at the ngrams and word frequency for each coin. 

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 

In [103]:
from collections import Counter
from nltk import ngrams

In [None]:
# Generate the Bitcoin N-grams where N=2
# YOUR CODE HERE!
tokens_btc = []
tokens_btc_all = []
for x in range(0,100):
    tokens_btc = bitcoin_df["tokens"].values[x]
    tokens_btc_all.append(tokens_btc)

In [None]:
# Generate the Ethereum N-grams where N=2
tokens_eth = []
tokens_eth_all = []
for x in range(0,100):
    tokens_eth = ethereum_df["tokens"].values[x]
    tokens_eth_all.append(tokens_eth)
flattened_eth = [val for sublist in tokens_eth_all for val in sublist]
bigram_counts = Counter(ngrams(flattened_eth, n=2))
print(dict(bigram_counts))

In [84]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N=3):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [19]:
# Use token_count to get the top 10 words for Bitcoin
N=2
bigram_counts_B = Counter(ngrams(bitcoin_p, N))
print(dict(bigram_counts_B))

In [20]:
# Use token_count to get the top 10 words for Ethereum
N= 2
bigram_counts_E = Counter(ngrams(ethereum_p, N))
print(dict(bigram_counts_E))

---

### Word Clouds

In this section, you will generate word clouds for each coin to summarize the news for each coin

In [21]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [None]:
# Generate the Bitcoin word cloud
def process_text(doc):
    sw = set(stopwords.words('english'))
    regex = re.compile("[^a-zA-Z ]")
    re_clean = regex.sub('', doc)
    words = word_tokenize(re_clean)
    lem = [lemmatizer.lemmatize(word) for word in words]
    output = [word.lower() for word in lem if word.lower() not in sw]
    return ' '.join(output)
big_string = ' '.join(flattened_btc)
input_text = process_text(big_string)
wc = WordCloud().generate(input_text)
plt.imshow(wc)

In [None]:
# Generate the Ethereum word cloud
big_string = ' '.join(flattened_eth)
input_text = process_text(big_string)
wc = WordCloud().generate(input_text)
plt.imshow(wc)

---
## 3. Named Entity Recognition

In this section, you will build a named entity recognition model for both Bitcoin and Ethereum, then visualize the tags using SpaCy.

In [105]:
import spacy
from spacy import displacy

In [106]:
# Download the language model for SpaCy
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.2.0

2022-03-06 05:43:16.653771: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-03-06 05:43:16.653832: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
You should consider upgrading via the 'C:\Users\markp\anaconda3\python.exe -m pip install --upgrade pip' command.



  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.2.0/en_core_web_sm-3.2.0-py3-none-any.whl (13.9 MB)
[+] Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [107]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

---
### Bitcoin NER

In [109]:
# Concatenate all of the Bitcoin text together
all_concat_bitcoin = bitcoin_df.text.str.cat()
all_concat_bitcoin

'Even if you arent paying attention to Bitcoin and other cryptocurrencies, you might have noticed that their value plummeted last week, with the total value of the market tumbling from a high of $3 tr… [+4782 chars]Netflix\r\n is making a docuseries about one of the worst rappers of all time\r\n. Coincidentally, Heather "Razzlekhan" Morgan and her husband, Ilya Lichtenstein, were charged this week with conspiring t… [+1432 chars]Over the last five years, about 25,000 of those Bitcoin were transferred out of Mr. Lichtensteins wallet using a complicated series of transactions meant to obscure that the currency had been stolen … [+2506 chars]Even in cyberspace, the Department of Justice is able to use a tried and true investigative technique, following the money, Ms. Monaco said. Its what led us to Al Capone in the 30s. It helped us dest… [+1176 chars]The couple would never flee from the country at the risk of losing access to their ability to have children, the lawyer wrote.\r\nAt the he

In [110]:
# Run the NER processor on all of the text
bitcoin_doc = nlp(all_concat_bitcoin)
bitcoin_doc

# Add a title to the document
bitcoin_doc.user_data["Title"] = "Bitcoin NER"

In [111]:
# Render the visualization
displacy.render(bitcoin_doc, style = 'ent')

In [112]:
# List all Entities
for i in bitcoin_doc.ents:
    print(i.text, i.label_)

last week DATE
3 MONEY
about one CARDINAL
Ilya Lichtenstein PERSON
this week DATE
the last five years DATE
about 25,000 CARDINAL
Lichtensteins PERSON
the Department of Justice ORG
Monaco PERSON
Al Capone LOC
Margaret Lynaugh PERSON
Super Bowl EVENT
Larry David PERSON
LeBron James PERSON
+3454 ORG
one CARDINAL
Tuesday DATE
$4.5 billion MONEY
Lichtensteins PERSON
Feb. 1 DATE
roughly $3.6 billion MONEY
one 2020 DATE
4 DATE
Reuters ORG
8.82% PERCENT
40,611.4 MONEY
2202 DATE
Friday DATE
3,291.29 MONEY
Bitcoin PERSON
23.2% PERCENT
22 CARDINAL
Reuters ORG
2021 DATE
thousands CARDINAL
Reuters ORG
Russia GPE
Ukraine GPE
Nonfungible Tidbits PERSON
this week DATE
Russia GPE
Ukraine GPE
Ukrainians NORP
Russian NORP
YouTube ORG
Alex Castro PERSON
Verge ORG
BitConnect ORG
Getty GPE
Russia GPE
last Thursday DATE
Ukranian NORP
Mexico City GPE
Telegr ORG
Bitcoin WORK_OF_ART
Bloombergs Lorcan Roche Kelly ORG
first ORDINAL
the 21st cen DATE
15 CARDINAL
Reuters ORG
Europe LOC
two CARDINAL
Russia GPE
March

---

### Ethereum NER

In [113]:
# Concatenate all of the Ethereum text together
all_concat_ethereum = eth_df.text.str.cat()
all_concat_ethereum

'If people who buy cryptocurrencies intended only to hold on to them as speculative investments, there\'d be no real need for crypto wallets. Exchanges and online brokerages that convert dollars to, sa… [+3031 chars]In February, shit hit the fan in the usual way: An old tweet resurfaced. Brantly Millegan, director of operations at Ethereum Name Service (ENS), a web3 business, had written the following in May 201… [+3096 chars]Technical analysis isnt a perfect tool, but it may point the way for Ethereum\r\nEthereum\xa0(ETH-USD\r\n) continues to be a volatile crypto investment. Crypto is volatile by nature Im not setting it apart … [+3612 chars]This enables an L1 platform to bootstrap its national economy over time through a flywheel between financial speculation around its native token and actual building of applications and activities in … [+4057 chars]Coinbase reported that the share of trading volume for ethereum and other altcoins increased last year, while bitcoin\'s share dropped 

In [114]:
# Run the NER processor on all of the text
ethereum_doc = nlp(all_concat_ethereum)
ethereum_doc

# Add a title to the document
ethereum_doc.user_data["Title"] = "Ethereum NER"

In [115]:
# Render the visualization
displacy.render(ethereum_doc, style = 'ent')

In [116]:
# List all Entities
for i in ethereum_doc.ents:
    print(i.text, i.label_)

February DATE
Ethereum Name Service ORG
ENS ORG
May 201 DATE
Ethereum ORG
ETH-USD ORG
Crypto PERSON
last year DATE
Between 2020 and 2021 DATE
Colorado GPE
the middle of this year DATE
US GPE
Jared Polis PERSON
James Bareham PERSON
More than $15 million MONEY
More than $15 million MONEY
Ukrainian GPE
Russia GPE
2022 DATE
Facebook ORG
Microsoft ORG
Twitter PRODUCT
Ethereum ORG
second ORDINAL
two months DATE
NFT ORG
7 CARDINAL
Reuters ORG
Monday DATE
$450 million MONEY
Sequoia Capital India ORG
two hours TIME
YouTube ORG
Waka Flacka Fla ORG
the past few years DATE
NFT ORG
$23 billion MONEY
4 CARDINAL
Reuters ORG
8.82% PERCENT
40,611.4 MONEY
2202 DATE
Friday DATE
3,291.29 MONEY
Bitcoin PERSON
23.2% PERCENT
TIME ORG
weekly DATE
TIME ORG
weekly DATE
10 CARDINAL
Reuters ORG
Thursday DATE
UK GPE
today DATE
Brett Harrison PERSON
Bitcoin PERSON
2017 DATE
+5776 ORG
Finance Insider PERSON
American Express ORG
Twitter PRODUCT
Russia GPE
Ukraine GPE
Opera ORG
Keshas PERSON
Opera ORG
Monday DATE
Fina

---