# Unit 12 - Tales from the Crypto

---


## 1. Sentiment Analysis

Use the [newsapi](https://newsapi.org/) to pull the latest news articles for Bitcoin and Ethereum and create a DataFrame of sentiment scores for each coin.

Use descriptive statistics to answer the following questions:
1. Which coin had the highest mean positive score?
2. Which coin had the highest compound score?
3. Which coin had the highest positive score?

In [1]:
## I just put this into everything now so I don't waste time reading the warnings.

import warnings
warnings.filterwarnings('ignore')
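
A global `filterwarnings('ignore')` also hides warnings that might matter later. As a hedged alternative, the sketch below suppresses warnings only around one call. `quiet_call` and `noisy_double` are hypothetical names used for illustration, not part of the starter code.

```python
import warnings


def quiet_call(func, *args, **kwargs):
    """Call func with all warnings suppressed, then restore the previous filters."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return func(*args, **kwargs)


def noisy_double(x):
    # Stand-in for a library call that emits a warning.
    warnings.warn("this API is deprecated", DeprecationWarning)
    return 2 * x


print(quiet_call(noisy_double, 21))
```

Outside of `quiet_call`, warnings behave as they did before, so unrelated problems stay visible.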

In [2]:
# Initial imports
## This was all in the starter code.

import os
import pandas as pd
from dotenv import load_dotenv
import nltk as nltk

In [3]:
## Having some issues with Vader, so splitting for troubleshooting.

nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

%matplotlib inline

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\rotar\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [4]:
## Having trouble getting my base environment to read my API keys.
## Trying pip install python-dotenv to make sure it can see the .env file.

!pip install python-dotenv



In [5]:
%load_ext dotenv
%dotenv

In [6]:
# Read your api key environment variable

api_key = os.getenv("NEWSAPI_ORG_KEY")

In [7]:
## I ran "print(api_key)" to see if it worked.

In [8]:
## It worked!
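
Printing the key confirms it loaded, but it also leaves the secret in the notebook output. A minimal sketch of a check that fails loudly when the variable is missing and only shows the last few characters; `require_env`, `mask`, and the `DEMO_NEWSAPI_KEY` variable are hypothetical names introduced here.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check that the .env file was loaded")
    return value


def mask(secret, visible=4):
    """Show only the last few characters of a secret."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Placeholder value under a demo name so the sketch runs on its own;
# in the notebook the real variable would be NEWSAPI_ORG_KEY.
os.environ["DEMO_NEWSAPI_KEY"] = "0123456789abcdef"
print(mask(require_env("DEMO_NEWSAPI_KEY")))
```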

In [9]:
# Create a newsapi client
## Ran into some issues with my base environment again.
## Need to do a pip install for newsapi, then import NewsApiClient (from: https://newsapi.org/docs/client-libraries/python)

!pip install newsapi-python



In [10]:
## So far, so good.

from newsapi import NewsApiClient

In [11]:
newsapi = NewsApiClient(api_key=api_key)

In [12]:
## It worked!

In [13]:
# Fetch the Bitcoin news articles
## Study group says we're sorting by "relevancy"; sounds right.
## Naming this one btc_articles, because it's asking for articles about BTC.

btc_articles = newsapi.get_everything(
    q="bitcoin", language="en", sort_by="relevancy"
)

In [14]:
## Let's see if that worked.

## Show some sample articles.
btc_articles["articles"][:2]

[{'source': {'id': 'wired', 'name': 'Wired'},
  'author': 'Khari Johnson',
  'title': 'Why Not Use Self-Driving Cars as Supercomputers?',
  'description': 'Autonomous vehicles use the equivalent of 200 laptops to get around. Some want to tap that computing power to decode viruses or mine bitcoin.',
  'url': 'https://www.wired.com/story/use-self-driving-cars-supercomputers/',
  'urlToImage': 'https://media.wired.com/photos/60f081b4c147fe7a1a367362/191:100/w_1280,c_limit/Business-Autonomous-Vehicles-Supercomputers-1201885684.jpg',
  'publishedAt': '2021-07-19T11:00:00Z',
  'content': 'Like Dogecoin devotees, the mayor of Reno, and the leaders of El Salvador, Aldo Baoicchi is convinced cryptocurrency is the future. The CEO and founder of Canadian scooter maker Daymak believes this … [+4116 chars]'},
 {'source': {'id': 'the-verge', 'name': 'The Verge'},
  'author': 'Richard Lawler',
  'title': 'Kaseya ransomware attackers demand $70 million, claim they infected over a million devices',
  '

In [15]:
## It worked!
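
The raw article dictionaries are long, and each `content` value ends with a `[+N chars]` marker. The sketch below is one possible way to preview a response more compactly and strip that marker. `clean_content` and `preview` are hypothetical helpers, and `sample` is a hand-built dictionary shaped like the output above, not a live API response.

```python
import re


def clean_content(content):
    """Strip the trailing '[+N chars]' marker seen in the article content."""
    if content is None:
        return ""
    return re.sub(r"\s*\[\+\d+ chars\]\s*$", "", content)


def preview(response, limit=2):
    """Return (source name, title, cleaned content) for the first few articles."""
    rows = []
    for article in response["articles"][:limit]:
        rows.append((
            article["source"]["name"],
            article["title"],
            clean_content(article.get("content")),
        ))
    return rows


# Hand-built sample that reuses fields from the first Bitcoin article above.
sample = {
    "articles": [
        {
            "source": {"id": "wired", "name": "Wired"},
            "title": "Why Not Use Self-Driving Cars as Supercomputers?",
            "content": "The CEO and founder of Canadian scooter maker Daymak believes this … [+4116 chars]",
        }
    ]
}

for row in preview(sample):
    print(row)
```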

In [16]:
# Fetch the Ethereum news articles
## Naming this one eth_articles, because it's asking for articles about ETH.

eth_articles = newsapi.get_everything(
    q="ethereum", language="en", sort_by="relevancy"
)

In [17]:
## Let's see if that worked.

## Show some sample articles.
eth_articles["articles"][:2]

[{'source': {'id': 'techcrunch', 'name': 'TechCrunch'},
  'author': 'Connie Loizos',
  'title': 'Crypto investors like Terraform Labs so much, they’re committing $150 million to its ‘ecosystem’',
  'description': 'There are many blockchain platforms competing for investors’ and developers’ attention right now, from the big daddy of them all, Ethereum, to so-called “Ethereum Killers” like Solana, which we wrote about in May. Often, these technologies are seen as so prom…',
  'url': 'http://techcrunch.com/2021/07/16/crypto-investors-like-terraform-labs-so-much-theyre-committing-150-million-to-its-ecosystem/',
  'urlToImage': 'https://techcrunch.com/wp-content/uploads/2020/06/GettyImages-1174590894.jpg?w=667',
  'publishedAt': '2021-07-16T16:00:55Z',
  'content': 'There are many blockchain platforms competing for investors’ and developers’ attention right now, from the big daddy of them all, Ethereum, to so-called “Ethereum Killers” like Solana, which we wrote… [+2563 chars]'},
 {'source'

In [18]:
# Create the Bitcoin sentiment scores DataFrame
## This came with a *TON* of help from the study group and copy/pasting from class work.

btc_sentiment = []

for article in btc_articles["articles"]:
    try:
        text = article["content"]
        sentiment = analyzer.polarity_scores(text)
        ## Forgot compound. Adding that now and re-running on both sentiments.
        comp = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        btc_sentiment.append({
            "compound": comp,
            "positive": pos,
            "negative": neg,
            "neutral": neu,
            "text": text
        })
        
    except AttributeError:
        pass

In [19]:
# Create a DataFrame.
## Split this for troubleshooting.

btc_sentiment_df = pd.DataFrame(btc_sentiment)
btc_sentiment_df.head()

|   | compound | positive | negative | neutral | text |
|---|---|---|---|---|---|
| 0 | 0.6908 | 0.178 | 0.0 | 0.822 | Like Dogecoin devotees, the mayor of Reno, and... |
| 1 | -0.5719 | 0.111 | 0.184 | 0.705 | Filed under:\r\nThe supply chain attack has re... |
| 2 | -0.6124 | 0.0 | 0.143 | 0.857 | image copyrightGetty Images\r\nThe gang behind... |
| 3 | 0.624 | 0.127 | 0.0 | 0.873 | To get a roundup of TechCrunchs biggest and mo... |
| 4 | 0.7264 | 0.164 | 0.0 | 0.836 | While retail investors grew more comfortable b... |


In [20]:
# Create the Ethereum sentiment scores DataFrame
## Copied in from above.

eth_sentiment = []

for article in eth_articles["articles"]:
    try:
        text = article["content"]
        sentiment = analyzer.polarity_scores(text)
        ## Forgot compound. Adding that now and re-running on both sentiments.
        comp = sentiment["compound"]
        pos = sentiment["pos"]
        neu = sentiment["neu"]
        neg = sentiment["neg"]
        
        eth_sentiment.append({
            "compound": comp,
            "positive": pos,
            "negative": neg,
            "neutral": neu,
            "text": text
        })
        
    except AttributeError:
        pass
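
The Bitcoin and Ethereum loops differ only in variable names, so they could share one function. A minimal sketch, assuming the analyzer exposes `polarity_scores` as VADER does. `score_articles` and `FakeAnalyzer` are hypothetical names; the fake analyzer exists only so the sketch runs without the VADER lexicon. Unlike the loops above, this version also skips empty strings instead of relying on `AttributeError`.

```python
def score_articles(articles, analyzer):
    """Build one row of sentiment scores per article that has text content."""
    rows = []
    for article in articles:
        text = article.get("content")
        if not text:
            # Skip articles with missing or empty content.
            continue
        scores = analyzer.polarity_scores(text)
        rows.append({
            "compound": scores["compound"],
            "positive": scores["pos"],
            "negative": scores["neg"],
            "neutral": scores["neu"],
            "text": text,
        })
    return rows


class FakeAnalyzer:
    """Stand-in used only to exercise the helper; returns fixed scores."""

    def polarity_scores(self, text):
        return {"compound": 0.5, "pos": 0.2, "neg": 0.0, "neu": 0.8}


demo_articles = [{"content": "Bitcoin rallies"}, {"content": None}]
rows = score_articles(demo_articles, FakeAnalyzer())
print(len(rows))
```

In the notebook, something like `pd.DataFrame(score_articles(btc_articles["articles"], analyzer))` would then build each DataFrame with the real analyzer.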

In [21]:
# Create a DataFrame.

eth_sentiment_df = pd.DataFrame(eth_sentiment)
eth_sentiment_df.head()

|   | compound | positive | negative | neutral | text |
|---|---|---|---|---|---|
| 0 | 0.3612 | 0.075 | 0.0 | 0.925 | There are many blockchain platforms competing ... |
| 1 | 0.7264 | 0.164 | 0.0 | 0.836 | While retail investors grew more comfortable b... |
| 2 | 0.3612 | 0.11 | 0.041 | 0.849 | Bitcoin and Ethereum\r\nYuriko Nakao\r\nEther ... |
| 3 | 0.6369 | 0.157 | 0.0 | 0.843 | "Anthony Di Iorio, a co-founder of the Ethereu... |
| 4 | 0.7717 | 0.194 | 0.0 | 0.806 | Ether holders have "staked" more than $13 bill... |


In [22]:
# Describe the Bitcoin Sentiment
## So, I ordered the df by highest-to-lowest, made confusing plots, outlined an essay and generally just forgot about the .describe function.
## I wasted a *LOT* of sanity on this one, you guys.

btc_sentiment_df.describe()

|   | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | -0.034365 | 0.0414 | 0.0431 | 0.9155 |
| std | 0.423814 | 0.062425 | 0.079494 | 0.102089 |
| min | -0.8271 | 0.0 | 0.0 | 0.653 |
| 25% | -0.3818 | 0.0 | 0.0 | 0.869 |
| 50% | 0.0 | 0.0 | 0.0 | 0.92 |
| 75% | 0.0625 | 0.0925 | 0.08 | 1.0 |
| max | 0.7264 | 0.178 | 0.287 | 1.0 |


In [23]:
## So upset.

In [24]:
# Describe the Ethereum Sentiment

eth_sentiment_df.describe()

|   | compound | positive | negative | neutral |
|---|---|---|---|---|
| count | 20.0 | 20.0 | 20.0 | 20.0 |
| mean | 0.221595 | 0.068 | 0.01685 | 0.91515 |
| std | 0.346178 | 0.074558 | 0.030824 | 0.080755 |
| min | -0.3818 | 0.0 | 0.0 | 0.782 |
| 25% | 0.0 | 0.0 | 0.0 | 0.84275 |
| 50% | 0.125 | 0.0375 | 0.0 | 0.92 |
| 75% | 0.528675 | 0.14925 | 0.01025 | 1.0 |
| max | 0.7717 | 0.194 | 0.08 | 1.0 |


### Questions:

Q: Which coin had the highest mean positive score?

In [25]:
## A: eth showed .068 positive on my end, with btc at .041.

Q: Which coin had the highest compound score?

In [26]:
## A: eth had a 0.772 max, compared to btc's 0.726; so eth again.

Q: Which coin had the highest positive score?

In [27]:
## A: eth, again, had the highest positive sentiment at 0.194 to btc's 0.178.
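
The three answers can be cross-checked with a few lines of code. The sketch below uses plain dictionaries holding values copied from the `.describe()` output above; `stats` and `winner` are hypothetical names introduced for illustration.

```python
# Values copied from the .describe() output for each coin.
stats = {
    "btc": {"mean_positive": 0.0414, "max_compound": 0.7264, "max_positive": 0.178},
    "eth": {"mean_positive": 0.068, "max_compound": 0.7717, "max_positive": 0.194},
}


def winner(metric):
    """Return the coin with the largest value for the given metric."""
    return max(stats, key=lambda coin: stats[coin][metric])


for metric in ("mean_positive", "max_compound", "max_positive"):
    print(metric, "->", winner(metric))
```

With the DataFrames in memory, the same numbers come from calls such as `eth_sentiment_df["positive"].mean()` and `eth_sentiment_df["compound"].max()`.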

---

## 2. Natural Language Processing
---
###   Tokenizer

In this section, you will use NLTK and Python to tokenize the text for each coin. Be sure to:
1. Lowercase each word.
2. Remove punctuation.
3. Remove stopwords.

In [28]:
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer
from string import punctuation
import re

In [29]:
# Instantiate the lemmatizer
# YOUR CODE HERE!

# Create a list of stopwords
# YOUR CODE HERE!

# Expand the default stopwords list if necessary
# YOUR CODE HERE!

In [30]:
# Complete the tokenizer function
def tokenizer(text):
    """Tokenizes text."""
    
    # Remove the punctuation from text

   
    # Create a tokenized list of the words
    
    
    # Lemmatize words into root words

   
    # Convert the words to lowercase
    
    
    # Remove the stop words
    
    
    return tokens
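
The skeleton above returns `tokens` without ever defining it. As a rough illustration of the lowercase, punctuation, and stopword steps, here is a sketch that uses only the standard library. It is not the NLTK implementation the assignment asks for: `tokenizer_sketch` is a hypothetical name, `STOPWORDS` is a tiny stand-in for `stopwords.words("english")`, whitespace splitting stands in for `word_tokenize`, and lemmatization is left out because it needs the WordNet data.

```python
import re

# Small illustrative stopword set; a stand-in for stopwords.words("english").
STOPWORDS = {"a", "an", "and", "are", "is", "of", "the", "to"}


def tokenizer_sketch(text):
    """Lowercase, strip punctuation, split on whitespace, and drop stopwords."""
    # Replace anything that is not a letter or whitespace with a space.
    letters_only = re.sub(r"[^a-zA-Z\s]", " ", text)
    # Lowercase and split into words.
    words = letters_only.lower().split()
    # Remove the stopwords.
    return [word for word in words if word not in STOPWORDS]


print(tokenizer_sketch("The miners are happy: Bitcoin is up 5%!"))
```

In the NLTK version, `WordNetLemmatizer().lemmatize(word)` could be applied to each word before the stopword filter.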

In [31]:
# Create a new tokens column for Bitcoin
# YOUR CODE HERE!

In [32]:
# Create a new tokens column for Ethereum
# YOUR CODE HERE!

---

### NGrams and Frequency Analysis

In this section you will look at the ngrams and word frequency for each coin. 

1. Use NLTK to produce the n-grams for N = 2. 
2. List the top 10 words for each coin. 

In [33]:
from collections import Counter
from nltk import ngrams

In [34]:
# Generate the Bitcoin N-grams where N=2
# YOUR CODE HERE!

In [35]:
# Generate the Ethereum N-grams where N=2
# YOUR CODE HERE!
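
For N = 2, an n-gram is just each token paired with the one after it. A minimal sketch with the standard library, assuming a flat list of tokens; `bigram_counts` and `demo_tokens` are hypothetical names. `nltk.ngrams(tokens, n=2)` yields the same tuples and could be passed to `Counter` in the same way.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent token pairs; zip pairs each token with its successor."""
    return Counter(zip(tokens, tokens[1:]))


demo_tokens = ["bitcoin", "price", "rises", "bitcoin", "price", "falls"]
print(bigram_counts(demo_tokens).most_common(1))
```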

In [36]:
# Function token_count generates the top 10 words for a given coin
def token_count(tokens, N=10):
    """Returns the top N tokens from the frequency count"""
    return Counter(tokens).most_common(N)

In [37]:
# Use token_count to get the top 10 words for Bitcoin
# YOUR CODE HERE!

In [38]:
# Use token_count to get the top 10 words for Ethereum
# YOUR CODE HERE!
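
If the tokens live in a DataFrame column, each row holds its own list, so the lists need to be flattened before counting. A hedged sketch of that step; `top_words` and `demo_token_lists` are hypothetical names, and the demo data is made up.

```python
from collections import Counter
from itertools import chain


def top_words(token_lists, n=10):
    """Flatten per-article token lists and return the n most common tokens."""
    all_tokens = chain.from_iterable(token_lists)
    return Counter(all_tokens).most_common(n)


demo_token_lists = [
    ["bitcoin", "price", "bitcoin"],
    ["mining", "price", "bitcoin"],
]
print(top_words(demo_token_lists, n=2))
```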

---

### Word Clouds

In this section, you will generate a word cloud for each coin to summarize its news coverage.

In [39]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# On Matplotlib 3.6+ this style is named 'seaborn-v0_8-whitegrid'.
plt.style.use('seaborn-whitegrid')
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = [20.0, 10.0]

In [40]:
# Generate the Bitcoin word cloud
# YOUR CODE HERE!

In [41]:
# Generate the Ethereum word cloud
# YOUR CODE HERE!
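
One possible approach is to count token frequencies first and hand them to `WordCloud.generate_from_frequencies`. The sketch below is an assumption about how that could look, not the assignment's solution. `word_frequencies` and `render_cloud` are hypothetical helpers, the image size is arbitrary, and the `wordcloud` import sits inside the function so the counting step runs even where the package is missing.

```python
from collections import Counter


def word_frequencies(tokens):
    """Map each token to its count, the input shape generate_from_frequencies expects."""
    return dict(Counter(tokens))


def render_cloud(frequencies):
    """Build a WordCloud from a frequency dict (requires the wordcloud package)."""
    from wordcloud import WordCloud  # imported here so the rest runs without it

    return WordCloud(width=800, height=400).generate_from_frequencies(frequencies)


demo_tokens = ["bitcoin", "price", "bitcoin", "mining"]
print(word_frequencies(demo_tokens))
```

In the notebook, `plt.imshow(render_cloud(frequencies))` followed by `plt.axis("off")` would display the result.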

---
## 3. Named Entity Recognition

In this section, you will build a named entity recognition model for both Bitcoin and Ethereum, then visualize the tags using spaCy.

In [42]:
import spacy
from spacy import displacy

In [43]:
# Download the language model for spaCy
# Uncomment and run once; spacy.load() below raises OSError [E050] until the model is installed.
# !python -m spacy download en_core_web_sm

In [44]:
# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.

---
### Bitcoin NER

In [None]:
# Concatenate all of the Bitcoin text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!
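
The four Bitcoin steps could be sketched roughly as follows, assuming the `en_core_web_sm` model is installed. `join_texts` and `extract_entities` are hypothetical helpers, and the demo sentences are made up. The spaCy import sits inside the function so the concatenation step runs even without spaCy.

```python
def join_texts(texts):
    """Concatenate article texts into one string, skipping missing values."""
    return " ".join(text for text in texts if isinstance(text, str) and text)


def extract_entities(text, title, model_name="en_core_web_sm"):
    """Run spaCy NER on text and return the doc plus (entity, label) pairs."""
    import spacy  # imported here so join_texts runs without spaCy installed

    nlp = spacy.load(model_name)  # raises OSError if the model is not installed
    doc = nlp(text)
    doc.user_data["title"] = title  # displacy shows this title when rendering
    return doc, [(ent.text, ent.label_) for ent in doc.ents]


# Made-up sentences, used only to show the concatenation step.
print(join_texts(["Bitcoin hit a new high.", None, "Miners moved to Texas."]))
```

With the model available, `doc, entities = extract_entities(join_texts(btc_sentiment_df["text"]), "Bitcoin NER")` followed by `displacy.render(doc, style="ent")` would cover the render and list steps. The Ethereum cells below follow the same pattern with `eth_sentiment_df`.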

---

### Ethereum NER

In [None]:
# Concatenate all of the Ethereum text together
# YOUR CODE HERE!

In [None]:
# Run the NER processor on all of the text
# YOUR CODE HERE!

# Add a title to the document
# YOUR CODE HERE!

In [None]:
# Render the visualization
# YOUR CODE HERE!

In [None]:
# List all Entities
# YOUR CODE HERE!

---