## [Scattertext](https://github.com/JasonKessler/scattertext)
A tool for finding distinguishing terms in small-to-medium-sized corpora, and presenting them in a sexy, interactive scatter plot with non-overlapping term labels. Exploratory data analysis just got more fun.
![](https://github.com/apjanco/dashboard/raw/master/poetry.png)
![](https://github.com/apjanco/dashboard/raw/master/example.gif)


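Scattertext compares how terms are used across categories, so we need some text and a category label for each document. The Brown corpus, available through NLTK, provides both.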

In [6]:

# On first run, uncomment the next two lines to download the Brown corpus.
#import nltk
#nltk.download('brown')

from nltk.corpus import brown
brown.categories()


['adventure',
 'belles_lettres',
 'editorial',
 'fiction',
 'government',
 'hobbies',
 'humor',
 'learned',
 'lore',
 'mystery',
 'news',
 'religion',
 'reviews',
 'romance',
 'science_fiction']

In [7]:
from nltk.corpus import brown

# Read the corpus once. Tokens within a sentence are joined with spaces;
# sentences are concatenated directly, with no separator between them.
scifi = ''.join(
    ' '.join(sent) for sent in brown.sents(categories=['science_fiction'])
)

len(scifi)

71206

In [8]:
# Same approach for the religion category: one pass over the sentences.
religion = ''.join(
    ' '.join(sent) for sent in brown.sents(categories=['religion'])
)

len(religion)

206264
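
The two cells above follow the same pattern, so a small helper could cover both. The sketch below is an illustration, not part of the original notebook: `join_sentences` is a hypothetical name, and it assumes its input has the shape `brown.sents(...)` yields (an iterable of token lists). Unlike the cells above, it puts a space between sentences, so the resulting lengths would differ from the values shown. It runs on toy data so it does not need the corpus.

```python
def join_sentences(sentences):
    """Join tokenized sentences into one whitespace-separated string.

    `sentences` is any iterable of token lists, the shape that
    brown.sents(categories=[...]) yields. (Hypothetical helper.)
    """
    return ' '.join(' '.join(tokens) for tokens in sentences)


# Toy input standing in for brown.sents(categories=['science_fiction']).
toy_sentences = [['He', 'was', 'free', '.'], ['Now', 'he', 'knew', '.']]
toy_text = join_sentences(toy_sentences)
print(toy_text)
```

With the real corpus, the call would look like `join_sentences(brown.sents(categories=['religion']))`.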

In [9]:
import pandas as pd

# One row per category: the label and its concatenated text.
data = {'category': ['scifi', 'religion'], 'text': [scifi, religion]}

df = pd.DataFrame(data=data)

In [10]:
df.head()

| | category | text |
|---|---|---|
| 0 | scifi | Now that he knew himself to be self he was fre... |
| 1 | religion | As a result , although we still make use of th... |


In [12]:
from IPython.display import IFrame
import scattertext as st
import spacy

nlp = spacy.load('en_core_web_sm')
#nlp.add_pipe(nlp.create_pipe('sentencizer'))

# Build a Scattertext corpus from the texts and their category labels.
corpus = st.CorpusFromPandas(
    df, category_col='category', text_col='text', nlp=nlp
).build()

# Generate the D3 visualization
html = st.produce_scattertext_explorer(
    corpus,
    category='scifi',
    category_name='Science Fiction',
    not_category_name='Religion',
    width_in_pixels=1000,
)

# Write the HTML to disk, then embed it in the notebook.
file_name = 'scifi.html'
with open(file_name, 'wb') as f:
    f.write(html.encode('utf-8'))
IFrame(src=file_name, width=1200, height=700)
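
To build some intuition for what "distinguishing terms" means, here is a minimal standard-library sketch that scores each term by the smoothed log ratio of its relative frequency in one category versus the other. This is a simplified stand-in chosen for illustration, not the scoring Scattertext itself applies; the function name `log_ratio_scores`, the `smoothing` parameter, and the toy token lists are all assumptions.

```python
import math
from collections import Counter


def log_ratio_scores(tokens_a, tokens_b, smoothing=1.0):
    """Score terms by the smoothed log ratio of their relative frequencies.

    Positive scores lean toward category A, negative toward category B.
    Illustrative only; this is not Scattertext's scoring function.
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    vocab = set(counts_a) | set(counts_b)
    # Additive smoothing keeps every probability above zero.
    total_a = len(tokens_a) + smoothing * len(vocab)
    total_b = len(tokens_b) + smoothing * len(vocab)
    scores = {}
    for term in vocab:
        p_a = (counts_a[term] + smoothing) / total_a
        p_b = (counts_b[term] + smoothing) / total_b
        scores[term] = math.log(p_a / p_b)
    return scores


# Toy token lists standing in for the two categories.
toy_scifi = 'the ship left the planet'.split()
toy_religion = 'the church held the faith'.split()
scores = log_ratio_scores(toy_scifi, toy_religion)
for term in ('the', 'ship', 'church'):
    print(term, round(scores[term], 3))
```

A term used equally often in both categories, such as "the" here, scores zero, while terms that appear in only one category move toward the extremes. The plot conveys a similar idea visually: shared terms sit near the diagonal and distinguishing terms sit near the corners.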