# _Explore: January 13, 2019_

**Objective**: To be determined; however, the following articles/links might provide some inspiration!
- [Generating A Twitter Ego-Network & Detecting Communities](https://towardsdatascience.com/generating-twitter-ego-networks-detecting-ego-communities-93897883d255)
- [Deploy your side-projects at scale for basically nothing - Google Cloud Run](https://alexolivier.me/posts/deploy-container-stateless-cheap-google-cloud-run-serverless)
- [Awesome Streamlit](http://awesome-streamlit.org/)

I'm also thinking about modeling the text of verified users, collecting more data, and possibly starting development of an app that analyzes RT data.

```python
%reload_ext autoreload
%autoreload 2
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
```

```python
# import libraries
import pandas as pd
pd.options.display.max_columns = None  # show every column when displaying DataFrames
import numpy as np
import random
import os

# Matplotlib
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight')
```

## _Load in Data_

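```python
# load in data; read id_str as a string so tweet IDs are kept exactly as stored
verified = pd.read_json("json-data/verified_train.json", orient="split", dtype={"id_str": str})

# downsample the IRA tweets to the size of the verified set so the two classes are balanced
ira = (pd.read_json("json-data/ira_train.json", orient="split", dtype={"id_str": str})
       .sample(n=len(verified), random_state=5))

# combine the two DataFrames; ignore_index avoids duplicate index labels
df = pd.concat([verified, ira], ignore_index=True)
```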

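```python
# one-hot encode the label column (creates label_real and one column per other label);
# dtype=int keeps 0/1 integers, since recent pandas versions return booleans by default
df = pd.get_dummies(df, columns=["label"], dtype=int)
```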

```python
# take a random 50% subset of the data to experiment on
sample = (df[["id_str", "screen_name", "created_at", "full_text", "label_real"]]
          .sample(frac=0.50, random_state=5))
```

```python
# renumber rows 0..n-1, discarding the shuffled original index instead of keeping it as a column
sample.reset_index(drop=True, inplace=True)
```
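The preparation steps above balance the two classes by downsampling, one-hot encode the label, and then take a 50% sample. Here is a minimal, self-contained sketch of that pipeline on a tiny hypothetical DataFrame. All rows are made up, and the label strings `"real"` and `"troll"` are assumptions (the notebook only implies that a `"real"` label exists, via `label_real`), so this shows the mechanics rather than the real data.

```python
import pandas as pd

# Hypothetical stand-ins for verified_train.json / ira_train.json;
# the label values "real" and "troll" are assumptions for illustration.
verified = pd.DataFrame({
    "id_str": ["1", "2", "3"],
    "full_text": ["a", "b", "c"],
    "label": ["real"] * 3,
})
ira = pd.DataFrame({
    "id_str": [str(i) for i in range(10, 20)],
    "full_text": list("defghijklm"),
    "label": ["troll"] * 10,
})

# Downsample the larger class so both classes have the same number of rows
ira_balanced = ira.sample(n=len(verified), random_state=5)
df = pd.concat([verified, ira_balanced], ignore_index=True)

# One-hot encode the label: yields label_real and label_troll as 0/1 integers
df = pd.get_dummies(df, columns=["label"], dtype=int)

# Take half the rows, then renumber them 0..n-1 without keeping the old index
sample = df.sample(frac=0.50, random_state=5).reset_index(drop=True)

print(df["label_real"].value_counts().to_dict())
print(sample)
```

Passing `drop=True` matters here: without it, `reset_index` would insert the old shuffled index as a new `index` column.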

## _Beyond n-grams: word embeddings_

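- map words into an n-dimensional vector space
- learned from huge amounts of text, typically with neural models such as word2vec or with methods such as GloVe
- let us measure how similar two words are, usually with cosine similarity
- place words that appear in similar contexts close together; this includes antonyms as well as synonyms, so similarity alone does not reliably separate them (e.g. `happy`/`sad` scores higher than `happy`/`joyous` below)
- capture more complex relationships, such as analogies (roughly, king - man + woman ≈ queen)
- depend on the spaCy model: only models that ship with word vectors (e.g. `en_core_web_md`, `en_core_web_lg`) provide them; the small `en_core_web_sm` model does not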
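To make "how similar two words are" concrete: the usual measure, and spaCy's default for its `similarity` methods, is the cosine of the angle between two vectors, cos(u, v) = u·v / (‖u‖‖v‖). The sketch below computes it with NumPy on made-up 3-dimensional vectors; the values of `happy`, `joyous`, and `table` are hypothetical, whereas real spaCy vectors have hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between u and v: 1.0 = same direction, 0.0 = orthogonal."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy "embeddings", made up for illustration only
happy = np.array([0.9, 0.8, 0.1])
joyous = np.array([0.8, 0.9, 0.2])
table = np.array([0.1, 0.0, 0.95])

print(cosine_similarity(happy, joyous))  # close to 1: vectors point the same way
print(cosine_similarity(happy, table))   # much smaller: vectors point in different directions
```

Because cosine similarity ignores vector length, it compares only direction. This is why a word scores exactly 1.0 against itself in the output below.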

```python
# run once to download the medium English model, which includes word vectors
#!python -m spacy download en_core_web_md
```

```python
import spacy

# load the medium English model (it ships with word vectors) and create a Doc object
nlp = spacy.load("en_core_web_md")
doc = nlp("I am happy.")

# each token has a fixed-length word vector; uncomment to inspect
#for token in doc:
#    print(token.text, token.vector)
```

```python
# word similarities
doc = nlp("happy joyous sad")

for token1 in doc:
    for token2 in doc:
        print(token1.text, token2.text, token1.similarity(token2))
```

```
happy happy 1.0
happy joyous 0.60518235
happy sad 0.64389884
joyous happy 0.60518235
joyous joyous 1.0
joyous sad 0.4639511
sad happy 0.64389884
sad joyous 0.4639511
sad sad 1.0
```
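The nested loop above prints every pair twice (similarity is symmetric) and also compares each word with itself. Here is a minimal vectorized sketch: normalize each vector to unit length, take one matrix product to get all pairwise cosine similarities, and report only the unique pairs. The vectors are hypothetical toy values. With spaCy, one could instead stack `[token.vector for token in doc]`, but that variant is not run here.

```python
from itertools import combinations

import numpy as np

# Hypothetical word vectors, one row per word (made up for illustration)
words = ["happy", "joyous", "sad"]
vectors = np.array([
    [0.9, 0.8, 0.1],
    [0.8, 0.9, 0.2],
    [0.7, 0.6, 0.5],
])

# Scale each row to unit length; a single matrix product then yields
# the cosine similarity between every pair of rows.
unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
sim_matrix = unit @ unit.T

# The matrix is symmetric with 1.0 on the diagonal, so the unique,
# non-identical pairs are exactly the entries above the diagonal.
for i, j in combinations(range(len(words)), 2):
    print(words[i], words[j], round(float(sim_matrix[i, j]), 4))
```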


```python
# document similarities
sent1 = nlp("I am happy")
sent2 = nlp("I am sad")
sent3 = nlp("I am joyous")
```

```python
# compute similarity between sent1 and sent2
sent1.similarity(sent2)
```

```
0.9492464724721577
```

```python
# compute similarity between sent1 and sent3
sent1.similarity(sent3)
```

```
0.9383192352389133
```
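Both sentence scores are above 0.9, even though one pair differs in sentiment. A likely explanation is that, by default, spaCy represents a `Doc` by the average of its token vectors and `Doc.similarity` takes the cosine of those averages, so the shared tokens "I am" dominate. The sketch below reproduces the effect with made-up 3-dimensional vectors. The helper names `doc_vector` and `cosine` and the values in `table` are hypothetical; they are not spaCy's.

```python
import numpy as np

def doc_vector(tokens, table):
    """Average the tokens' word vectors (mirrors spaCy's default Doc.vector)."""
    return np.mean([table[t] for t in tokens], axis=0)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy word vectors, made up for illustration
table = {
    "I": np.array([0.1, 0.9, 0.3]),
    "am": np.array([0.2, 0.8, 0.4]),
    "happy": np.array([0.9, 0.1, 0.2]),
    "sad": np.array([0.1, 0.2, 0.9]),
}

word_sim = cosine(table["happy"], table["sad"])
sent_sim = cosine(doc_vector(["I", "am", "happy"], table),
                  doc_vector(["I", "am", "sad"], table))
print(f"word: {word_sim:.3f}, sentence: {sent_sim:.3f}")
```

With these toy values, the two words are far apart, yet the averaged sentence vectors are close. Averaging dilutes the one word that carries the sentiment, so raw document similarity is a weak signal for tasks like telling verified and IRA tweets apart.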