# Per-tweet topic detection in threads

## Library imports and function definitions

The required libraries are imported, and functions are defined to tokenize, lemmatize, and prepare the text for LDA.

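During tokenization, URLs and tokens that start with `#` or `@` are discarded. The topic output further down still contains hashtag words such as `greatawakening` and `thestorm`, which suggests that the tokenizer separates the `#` from the tag text, so only the symbol itself is dropped.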
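
As a rough illustration of the filtering rules described above, the following sketch applies them to whitespace-separated tokens. It does not use spaCy: `simple_tokenize` and its URL check are simplifications invented for this example, so its output can differ from that of the notebook's `tokenize` function, which relies on spaCy's tokenizer and its `like_url` flag.

```python
def simple_tokenize(text):
    """Lowercase whitespace-separated tokens, skipping URLs, hashtags and mentions."""
    kept = []
    for raw in text.split():
        # Hypothetical URL check; spaCy's `like_url` is more thorough.
        if raw.startswith(("http://", "https://", "www.")):
            continue
        # Drop hashtags and user mentions as whole tokens.
        if raw.startswith("#") or raw.startswith("@"):
            continue
        kept.append(raw.lower())
    return kept


sample = "Thanks @someone for the report #Brexit https://t.co/DJhIQhmVwJ"
print(simple_tokenize(sample))
```

Because this sketch splits on whitespace only, a hashtag is removed together with its text, which is the behaviour the description above aims for.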

In [45]:
import os
import pymongo

In [46]:
# MongoDB connection
myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["twitter-memoria"]

csv_all = mydb["csv_all"]

In [47]:
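import spacy

# Only the rule-based English tokenizer is needed here, so no trained
# pipeline is loaded (the 'en' shortcut is not available in spaCy 3 and later).
from spacy.lang.en import English
parser = English()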

def tokenize(text):
    lda_tokens = []
    tokens = parser(text)
    for token in tokens:
        if token.orth_.isspace():
            continue
        elif token.like_url:
            continue
        elif token.orth_.startswith('#'):
            continue
        elif token.orth_.startswith('@'):
            continue
        else:
            lda_tokens.append(token.lower_)
    return lda_tokens

import nltk

nltk.download('wordnet')
from nltk.corpus import wordnet as wn
def get_lemma(word):
    lemma = wn.morphy(word)
    if lemma is None:
        return word
    else:
        return lemma
    
from nltk.stem.wordnet import WordNetLemmatizer
def get_lemma2(word):
    return WordNetLemmatizer().lemmatize(word)

nltk.download('stopwords')
en_stop = set(nltk.corpus.stopwords.words('english'))

def prepare_text_for_lda(text):
    tokens = tokenize(text)
    tokens = [token for token in tokens if len(token) > 3]
    tokens = [token for token in tokens if token not in en_stop]
    tokens = [get_lemma(token) for token in tokens]
    return tokens


[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\carlo\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\carlo\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


## Loading the files

The CSV files are loaded and the tweets are grouped by thread, in order to build a dictionary of tweets for each thread (thread 1: tweet1, tweet2, ...).
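
The thread dictionary described above can be pictured with a small sketch. The records are invented for illustration. The field names `hilo` and `text` follow the ones used in the MongoDB cell further down, and `group_by_thread` is a hypothetical helper, not part of the notebook.

```python
from collections import defaultdict


def group_by_thread(records):
    """Map each thread name to the list of its tweet texts, in input order."""
    threads = defaultdict(list)
    for record in records:
        threads[record["hilo"]].append(record["text"])
    return dict(threads)


# Invented sample records, for illustration only.
records = [
    {"hilo": "Thread 1", "text": "first tweet"},
    {"hilo": "Thread 2", "text": "another thread"},
    {"hilo": "Thread 1", "text": "second tweet"},
]
threads = group_by_thread(records)
print(threads["Thread 1"])
```

Joining one of these lists with `"\n"` gives the single-string form of a thread that the notebook stores in its `kthreads` dictionaries.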

In [48]:
import random
import pandas as pd


In [49]:

csv1 = pd.read_csv('five_ten.csv', encoding='iso-8859-1')
csv1_grouped_by_thread = csv1.groupby(['thread_number'])
threads1 = {}
documentos1 = []

csv2 = pd.read_csv('ten_fifteen.csv', encoding='iso-8859-1')
csv2_grouped_by_thread = csv2.groupby(['thread_number'])
threads2 = {}
documentos2 = []

csv3 = pd.read_csv('fifteen_twenty.csv', encoding='iso-8859-1')
csv3_grouped_by_thread = csv3.groupby(['thread_number'])
threads3 = {}
documentos3 = []

csv4 = pd.read_csv('twenty_twentyfive.csv', encoding='iso-8859-1')
csv4_grouped_by_thread = csv4.groupby(['thread_number'])
threads4 = {}
documentos4 = []

csv5 = pd.read_csv('twentyfive_thirty.csv', encoding='iso-8859-1')
csv5_grouped_by_thread = csv5.groupby(['thread_number'])
threads5 = {}
documentos5 = []

In [50]:
string = '\n'

hilos_ref = []
all_hilos = []
tweets = csv_all.find({})

threads1 = {}
threads2 = {}
threads3 = {}
threads4 = {}
threads5 = {}
kthreads1 = {}
kthreads2 = {}
kthreads3 = {}
kthreads4 = {}
kthreads5 = {}

# Collect the distinct thread references, preserving first-seen order
for tweet in tweets:
    if tweet["hilo_ref"] not in hilos_ref:
        hilos_ref.append(tweet["hilo_ref"])

# For every thread, gather its tweets and note which CSV they came from
for hilo in hilos_ref:
    tweets_hilo = csv_all.find({"hilo_ref": hilo})
    lista_aux_hilos = []
    csv = ""
    for tweet_aux in tweets_hilo:
        lista_aux_hilos.append(tweet_aux["text"])
        csv = tweet_aux["csv"]
        hilo_csv = tweet_aux["hilo"]
    
    if csv == "csv1":
        threads1[hilo_csv] = lista_aux_hilos
        kthreads1[hilo_csv] = string.join(lista_aux_hilos)
    elif csv == "csv2":
        threads2[hilo_csv] = lista_aux_hilos
        kthreads2[hilo_csv] = string.join(lista_aux_hilos)
    elif csv == "csv3":
        threads3[hilo_csv] = lista_aux_hilos
        kthreads3[hilo_csv] = string.join(lista_aux_hilos)
    elif csv == "csv4":
        threads4[hilo_csv] = lista_aux_hilos
        kthreads4[hilo_csv] = string.join(lista_aux_hilos)
    elif csv == "csv5":
        threads5[hilo_csv] = lista_aux_hilos
        kthreads5[hilo_csv] = string.join(lista_aux_hilos)
        
Tthreads1 = list(kthreads1.values())
Tthreads2 = list(kthreads2.values())
Tthreads3 = list(kthreads3.values())
Tthreads4 = list(kthreads4.values())
Tthreads5 = list(kthreads5.values())


## Building the dictionary of tweets per thread

The tweets of each thread are grouped into a dictionary, one per file.

In [51]:
threads1

{'Thread 1': ['Extraordinary evidence at Treasury committee from Jon Thompson, CEO of HMRC on customs and Brexit today https://t.co/DJhIQhmVwJ',
  "The Brexiter favourite Max Fac - would cost business between £17 and £20bn a year\n\n- that's almost 1% of GDP\n\n- just for filling in forms\n\nThanks #Brexit",
  '"We think we can manage the risk - we think we can" he said. He didn\'t sound so sure. \n\nAnd "the potential backdoor risk applies to both models" he added\n\nDidn\'t sound like officials think either is sensible',
  'Mr Thompson said he did not expect the EU to reciprocate over the customs partnership. \n\nWhat that means is UK collects tariffs for EU and hands it over when a ship lands in Felixtowe and drives to Calais, but if ship first lands in Rotterdam, EU keeps the import tariffs.',
  'Both would not be ready by 2021. Max Fac needs 3 years. Customs Partnership requires 5, Mr Thompson said.\n\nThe border would be "functioning", but if technology not ready ministers would 

## LDA for each thread of each CSV

The number of topics to detect is defined, together with the number of words shown when the detected topics are printed.

Topic detection is run on every thread of all the CSV files, so each tweet in a thread is treated as one document.
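
To make the "one tweet, one document" setup concrete, here is a plain-Python sketch of the bag-of-words step that precedes LDA. It only imitates the idea behind gensim's `corpora.Dictionary` and `doc2bow` (token ids and per-document counts). The helper names `build_vocabulary` and `to_bow` are invented for this example, and gensim may assign ids in a different order.

```python
from collections import Counter


def build_vocabulary(documents):
    """Assign an integer id to every distinct token, in order of first appearance."""
    vocabulary = {}
    for tokens in documents:
        for token in tokens:
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


def to_bow(tokens, vocabulary):
    """Return sorted (token_id, count) pairs for one document."""
    counts = Counter(vocabulary[token] for token in tokens if token in vocabulary)
    return sorted(counts.items())


# Each inner list stands for one already-tokenized tweet of a thread.
thread_tokens = [
    ["custom", "brexit", "cost"],
    ["brexit", "tariff", "brexit"],
]
vocabulary = build_vocabulary(thread_tokens)
corpus = [to_bow(tokens, vocabulary) for tokens in thread_tokens]
print(corpus)
```

With short threads the corpus holds only a handful of such documents, which is worth keeping in mind when reading the topic weights printed below.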

In [52]:
import gensim
from gensim import corpora
NUM_TOPICS = 5
NUM_WORDS = 5
import pickle
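
The cells below print each topic as a string of weighted words joined by `+`. A small parser such as the following can turn such a string into pairs that are easier to sort or compare. The name `parse_topic` and the regular expression are assumptions of this sketch, not part of gensim.

```python
import re

# Matches one term of the printed form: weight*"word"
TERM_PATTERN = re.compile(r'([0-9.]+)\*"([^"]+)"')


def parse_topic(topic_string):
    """Split a printed topic into (word, weight) pairs."""
    return [(word, float(weight)) for weight, word in TERM_PATTERN.findall(topic_string)]


# Topic string copied from the Thread 19 output of the five_ten CSV.
example = '0.072*"pray" + 0.031*"american" + 0.017*"potus" + 0.017*"marines" + 0.017*"military"'
print(parse_topic(example))
```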

### CSV five_ten

In [53]:
THIS_FOLDER = os.getcwd()
threads_leer = threads1
carpeta_guardar = "tpcsv1"

# Populate text_data

for hilos in threads_leer:
    camino = os.path.join(THIS_FOLDER, carpeta_guardar)
    text_data = []
    documentos = []
    dictionary = []
    corpus = []
    print(hilos)
    documentos = threads_leer[hilos]

    #print(documentos)

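    for line in documentos:
        tokens = prepare_text_for_lda(line)
        # Each tweet is kept with probability ~0.991, so a few are dropped at random
        if random.random() > .009:
            text_data.append(tokens)

    # Build the output paths with os.path.join instead of a hard-coded "\\"
    NDIC = os.path.join(camino, hilos + "_t_dictionary1.gensim")
    NMOD = os.path.join(camino, hilos + "_t_model1.gensim")
    NCOR = os.path.join(camino, hilos + "_t_corpus1.pkl")
    dictionary = corpora.Dictionary(text_data)
    corpus = [dictionary.doc2bow(text) for text in text_data]
    # A context manager makes sure the pickle file is closed after writing
    with open(NCOR, 'wb') as corpus_file:
        pickle.dump(corpus, corpus_file)
    dictionary.save(NDIC)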

    ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics = NUM_TOPICS, id2word=dictionary, passes=15)
    ldamodel.save(NMOD)
    topics = ldamodel.print_topics(num_words=NUM_WORDS)
    for topic in topics:
        print(topic)

Thread 10
(0, '0.021*"group" + 0.021*"give" + 0.021*"immig" + 0.021*"judges" + 0.021*"china"')
(1, '0.006*"free" + 0.006*"make" + 0.006*"longterm" + 0.006*"state" + 0.006*"create"')
(2, '0.039*"green" + 0.021*"provisional" + 0.021*"correct" + 0.021*"bring" + 0.021*"penalty"')
(3, '0.028*"expensive" + 0.028*"invest" + 0.028*"money" + 0.028*"resource" + 0.015*"stop"')
(4, '0.040*"create" + 0.032*"immigration" + 0.025*"free" + 0.017*"make" + 0.017*"fine"')
Thread 14
(0, '0.069*"thestorm" + 0.069*"internetbillofrights" + 0.069*"oigreport" + 0.069*"internetbillofrightsnow" + 0.069*"greatawakening"')
(1, '0.067*"kong" + 0.067*"moment" + 0.067*"waldo" + 0.067*"hong" + 0.067*"qanon"')
(2, '0.060*"well" + 0.060*"look" + 0.060*"hold" + 0.060*"future" + 0.060*"/end"')
(3, '0.069*"qanon" + 0.069*"internetbillofrightsnow" + 0.069*"greatawakening" + 0.069*"oigreport" + 0.069*"internetbillofrights"')
(4, '0.025*"greatawakening" + 0.025*"internetbillofrightsnow" + 0.025*"internetbillofrights" + 0.025*

(4, '0.043*"bootpruitt" + 0.043*"change" + 0.043*"could" + 0.043*"family" + 0.043*"mean"')
Thread 19
(0, '0.035*"october" + 0.035*"destroy" + 0.035*"ramius" + 0.035*"try" + 0.019*"going"')
(1, '0.044*"government" + 0.044*"people" + 0.044*"place" + 0.024*"state" + 0.024*"deepstate"')
(2, '0.032*"influence" + 0.032*"trump" + 0.032*"need" + 0.032*"help" + 0.032*"vote"')
(3, '0.033*"deepstate" + 0.033*"government" + 0.033*"design" + 0.033*"could" + 0.018*"deep"')
(4, '0.072*"pray" + 0.031*"american" + 0.017*"potus" + 0.017*"marines" + 0.017*"military"')
Thread 22
(0, '0.038*"would" + 0.026*"answer" + 0.026*"complicate" + 0.026*"mkultra" + 0.026*"segment"')
(1, '0.028*"mkultra" + 0.028*"feel" + 0.028*"unresolved" + 0.028*"segment" + 0.028*"expert"')
(2, '0.027*"look" + 0.027*"pump" + 0.027*"every" + 0.027*"back" + 0.027*"formula"')
(3, '0.027*"expert" + 0.027*"qanon" + 0.027*"extreme" + 0.027*"segment" + 0.027*"pretending"')
(4, '0.028*"world" + 0.028*"time" + 0.028*"expert" + 0.028*"unreso

(1, '0.030*"work" + 0.030*"advocate" + 0.016*"mainstream" + 0.016*"diversity" + 0.016*"hard"')
(2, '0.030*"talent" + 0.030*"year" + 0.030*"felt" + 0.030*"meaningful" + 0.030*"times"')
(3, '0.025*"oscar" + 0.025*"like" + 0.025*"would" + 0.025*"inclusion" + 0.025*"fight"')
(4, '0.044*"minority" + 0.024*"struggle" + 0.024*"latino" + 0.024*"work" + 0.024*"even"')
Thread 43
(0, '0.050*"need" + 0.050*"patriot" + 0.050*"kind" + 0.050*"guidance" + 0.050*"guide"')
(1, '0.046*"tweet" + 0.046*"election" + 0.046*"year" + 0.046*"shut" + 0.046*"alter"')
(2, '0.015*"undermine" + 0.015*"info" + 0.015*"try" + 0.015*"point" + 0.015*"president"')
(3, '0.015*"color" + 0.015*"antifa" + 0.015*"need" + 0.015*"eyes" + 0.015*"matter"')
(4, '0.037*"like" + 0.020*"child" + 0.020*"light" + 0.020*"violent" + 0.020*"seizure"')
Thread 41
(0, '0.067*"trump" + 0.046*"russia" + 0.046*"million" + 0.046*"might" + 0.046*"spend"')
(1, '0.013*"partially" + 0.013*"russia" + 0.013*"money" + 0.013*"group" + 0.013*"spokesperson

(0, '0.058*"pathway" + 0.031*"alternate" + 0.031*"tech" + 0.031*"could" + 0.031*"effects"')
(1, '0.037*"country" + 0.037*"article" + 0.037*"woman" + 0.020*"problematic" + 0.020*"metaphor"')
(2, '0.046*"west" + 0.046*"know" + 0.046*"female" + 0.046*"dominate" + 0.046*"first"')
(3, '0.077*"woman" + 0.063*"tech" + 0.048*"need" + 0.033*"idea" + 0.018*"also"')
(4, '0.038*"tech" + 0.038*"article" + 0.038*"job" + 0.038*"woman" + 0.021*"pipeline"')
Thread 72
(0, '0.091*"atenció" + 0.091*"republicana" + 0.091*"teixint" + 0.091*"xarxa" + 0.091*"gran"')
(1, '0.091*"twitter" + 0.091*"gran" + 0.091*"gràcies" + 0.091*"estem" + 0.091*"republicana"')
(2, '0.091*"xarxa" + 0.091*"atenció" + 0.091*"teixint" + 0.091*"censura" + 0.091*"gran"')
(3, '0.091*"seguiu" + 0.091*"xarxa" + 0.091*"història" + 0.091*"atenció" + 0.091*"censura"')
(4, '0.091*"republicana" + 0.091*"història" + 0.091*"xarxa" + 0.091*"atenció" + 0.091*"gran"')
Thread 67
(0, '0.033*"deepstate" + 0.018*"involve" + 0.018*"need" + 0.018*"coup

(0, '0.044*"greatawakening" + 0.044*"qanon" + 0.044*"thestorm" + 0.044*"uneasy" + 0.044*"around"')
(1, '0.156*"know" + 0.037*"thestorm" + 0.037*"greatawakening" + 0.037*"qanon" + 0.037*"friend"')
(2, '0.034*"patriot" + 0.034*"want" + 0.034*"prosecution" + 0.034*"internetbillofrights" + 0.034*"attitude"')
(3, '0.012*"qanon" + 0.012*"know" + 0.012*"greatawakening" + 0.012*"thestorm" + 0.012*"think"')
(4, '0.061*"think" + 0.034*"check" + 0.034*"intel" + 0.034*"level" + 0.034*"believe"')
Thread 94
(0, '0.008*"microtargeting" + 0.008*"campaigning" + 0.008*"principle" + 0.008*"even" + 0.008*"important"')
(1, '0.026*"campaign" + 0.026*"cambridgeanalytica" + 0.026*"last" + 0.026*"night" + 0.026*"information"')
(2, '0.040*"population" + 0.028*"information" + 0.028*"cambridgeanalytica" + 0.028*"create" + 0.028*"invest"')
(3, '0.037*"military" + 0.020*"fake" + 0.020*"campaign" + 0.020*"misinformation" + 0.020*"false"')
(4, '0.037*"microtargeting" + 0.037*"even" + 0.025*"dark" + 0.025*"cambridgean

### CSV ten_fifteen

In [54]:
THIS_FOLDER = os.getcwd()
threads_leer = threads2
carpeta_guardar = "tpcsv2"

# Populate text_data

for hilos in threads_leer:
    camino = os.path.join(THIS_FOLDER, carpeta_guardar)
    text_data = []
    documentos = []
    dictionary = []
    corpus = []
    print(hilos)
    documentos = threads_leer[hilos]

    #print(documentos)

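    for line in documentos:
        tokens = prepare_text_for_lda(line)
        # Each tweet is kept with probability ~0.991, so a few are dropped at random
        if random.random() > .009:
            text_data.append(tokens)

    # Build the output paths with os.path.join instead of a hard-coded "\\"
    NDIC = os.path.join(camino, hilos + "_t_dictionary1.gensim")
    NMOD = os.path.join(camino, hilos + "_t_model1.gensim")
    NCOR = os.path.join(camino, hilos + "_t_corpus1.pkl")
    dictionary = corpora.Dictionary(text_data)
    corpus = [dictionary.doc2bow(text) for text in text_data]
    # A context manager makes sure the pickle file is closed after writing
    with open(NCOR, 'wb') as corpus_file:
        pickle.dump(corpus, corpus_file)
    dictionary.save(NDIC)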

    ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics = NUM_TOPICS, id2word=dictionary, passes=15)
    ldamodel.save(NMOD)
    topics = ldamodel.print_topics(num_words=NUM_WORDS)
    for topic in topics:
        print(topic)

Thread 3
(0, '0.009*"saturdaynightthoughts" + 0.009*"wall" + 0.009*"particle" + 0.009*"human" + 0.009*"quantum"')
(1, '0.045*"particle" + 0.045*"barrier" + 0.045*"energy" + 0.045*"constant" + 0.025*"quantum"')
(2, '0.049*"electron" + 0.049*"probability" + 0.037*"like" + 0.025*"ball" + 0.025*"back"')
(3, '0.033*"exponential" + 0.033*"note" + 0.033*"mass" + 0.033*"look" + 0.033*"wall"')
(4, '0.066*"tunnel" + 0.045*"probability" + 0.034*"human" + 0.034*"barrier" + 0.034*"wall"')
Thread 2
(0, '0.021*"say" + 0.021*"really" + 0.021*"root" + 0.021*"dangerous" + 0.021*"bannon"')
(1, '0.037*"racism" + 0.030*"folks" + 0.016*"need" + 0.016*"racial" + 0.016*"moral"')
(2, '0.006*"away" + 0.006*"dangerous" + 0.006*"protection" + 0.006*"say" + 0.006*"lose"')
(3, '0.031*"stuff" + 0.031*"racial" + 0.017*"people" + 0.017*"mean" + 0.017*"happening"')
(4, '0.032*"racist" + 0.032*"solution" + 0.018*"lose" + 0.018*"4/13" + 0.018*"nation"')
Thread 6
(0, '0.075*"pay" + 0.057*"source" + 0.039*"verify" + 0.039*

(3, '0.063*"trump" + 0.033*"mueller" + 0.033*"pageant" + 0.018*"well" + 0.018*"muellertime"')
(4, '0.047*"putin" + 0.047*"trump" + 0.026*"say" + 0.026*"agalarov" + 0.026*"would"')
Thread 17
(0, '0.009*"unroll" + 0.009*"greatawakening" + 0.009*"please" + 0.009*"thestorm" + 0.009*"releasethememo"')
(1, '0.073*"qanon" + 0.073*"releasethememo" + 0.073*"memoday" + 0.073*"greatawakening" + 0.073*"thestorm"')
(2, '0.045*"know" + 0.024*"mueller" + 0.024*"please" + 0.024*"unroll" + 0.024*"greatawakening"')
(3, '0.030*"many" + 0.030*"mueller" + 0.030*"memoday" + 0.030*"releasethememo" + 0.030*"thestorm"')
(4, '0.053*"thestorm" + 0.040*"information" + 0.040*"greatawakening" + 0.040*"qanon" + 0.040*"memoday"')
Thread 23
(0, '0.021*"nader" + 0.021*"grand" + 0.021*"shortly" + 0.021*"electronics‼️" + 0.021*"warrant"')
(1, '0.046*"meeting" + 0.035*"seychelles" + 0.024*"btwn" + 0.024*"nader" + 0.024*"backchannel"')
(2, '0.033*"nader" + 0.033*"kush" + 0.022*"seychelles" + 0.022*"george" + 0.022*"russian

(4, '0.041*"globalist" + 0.021*"corrupt" + 0.021*"look" + 0.021*"rocket" + 0.021*"already"')
Thread 33
(0, '0.016*"democracy" + 0.016*"high" + 0.016*"fail" + 0.016*"priority" + 0.016*"view"')
(1, '0.024*"news" + 0.024*"ban" + 0.024*"election" + 0.017*"normal" + 0.017*"terms"')
(2, '0.020*"woman" + 0.011*"2016" + 0.011*"investigation" + 0.011*"behavior" + 0.011*"rope"')
(3, '0.013*"country" + 0.013*"look" + 0.013*"ethical" + 0.013*"know" + 0.013*"hold"')
(4, '0.030*"america" + 0.016*"tear" + 0.016*"good" + 0.016*"democracy" + 0.016*"trump"')
Thread 38
(0, '0.065*"flynn" + 0.065*"pay" + 0.044*"security" + 0.024*"also" + 0.024*"expense"')
(1, '0.068*"partner" + 0.067*"flynn" + 0.037*"violate" + 0.037*"letter" + 0.037*"sent"')
(2, '0.062*"flynn" + 0.034*"deal" + 0.034*"say" + 0.034*"trump" + 0.034*"disclose"')
(3, '0.063*"flynn" + 0.043*"military" + 0.043*"constitution" + 0.024*"republican" + 0.024*"strip"')
(4, '0.045*"pension" + 0.045*"clearance" + 0.045*"security" + 0.045*"trumprussia" 

(0, '0.162*"️repub" + 0.150*"party" + 0.094*"cmte" + 0.063*"repub" + 0.032*"rebekahmercer"')
(1, '0.032*"campaign" + 0.032*"mercer" + 0.032*"pac" + 0.018*"center" + 0.018*"free"')
(2, '0.027*"individual" + 0.027*"party" + 0.019*"donation" + 0.019*"️club" + 0.019*"️make"')
(3, '0.039*"american" + 0.021*"state" + 0.021*"pac" + 0.021*"tie" + 0.021*"conservative"')
(4, '0.036*"cycle" + 0.036*"election" + 0.036*"mercer" + 0.020*"receive" + 0.020*"list"')
Thread 62
(0, '0.041*"candidate" + 0.022*"cambridgeanalytica" + 0.022*"election" + 0.022*"work" + 0.022*"state"')
(1, '0.043*"employee" + 0.030*"election" + 0.016*"would" + 0.016*"provide" + 0.016*"official"')
(2, '0.047*"campaign" + 0.035*"legal" + 0.024*"cambridge" + 0.024*"make" + 0.024*"illegal"')
(3, '0.007*"cambridgeanalytica" + 0.007*"election" + 0.007*"campaign" + 0.007*"violate" + 0.007*"laws"')
(4, '0.025*"campaign" + 0.025*"election" + 0.025*"state" + 0.025*"memo" + 0.025*"say"')
Thread 60
(0, '0.041*"hrclegitimate45" + 0.025*"qu

(1, '0.033*"people" + 0.023*"challenge" + 0.023*"first" + 0.013*"resistinghate" + 0.013*"association"')
(2, '0.020*"regressiveleft" + 0.020*"pressure" + 0.020*"victim" + 0.020*"medium" + 0.020*"people"')
(3, '0.027*"swamp" + 0.020*"destroy" + 0.020*"company" + 0.020*"culture" + 0.020*"technology"')
(4, '0.036*"thread" + 0.019*"draintheswampuk" + 0.019*"need" + 0.019*"intelligence" + 0.019*"medium"')
Thread 82
(0, '0.022*"qanon" + 0.022*"memo" + 0.022*"greatawakening" + 0.022*"democrat" + 0.022*"like"')
(1, '0.020*"move" + 0.014*"next" + 0.014*"already" + 0.014*"direction" + 0.014*"rest"')
(2, '0.020*"memo" + 0.020*"would" + 0.020*"greatawakening" + 0.020*"elaborate" + 0.020*"qanon"')
(3, '0.022*"plan" + 0.022*"document" + 0.022*"deepstate" + 0.022*"step" + 0.022*"vision"')
(4, '0.029*"exactly" + 0.016*"know" + 0.016*"take" + 0.016*"charade" + 0.016*"power"')
Thread 83
(0, '0.037*"many" + 0.020*"island" + 0.020*"another" + 0.020*"camp" + 0.020*"refuse"')
(1, '0.036*"enemy" + 0.024*"neve

### CSV fifteen_twenty

In [55]:
THIS_FOLDER = os.getcwd()
threads_leer = threads3
carpeta_guardar = "tpcsv3"

# Populate text_data

for hilos in threads_leer:
    camino = os.path.join(THIS_FOLDER, carpeta_guardar)
    text_data = []
    documentos = []
    dictionary = []
    corpus = []
    print(hilos)
    documentos = threads_leer[hilos]

    #print(documentos)

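    for line in documentos:
        tokens = prepare_text_for_lda(line)
        # Each tweet is kept with probability ~0.991, so a few are dropped at random
        if random.random() > .009:
            text_data.append(tokens)

    # Build the output paths with os.path.join instead of a hard-coded "\\"
    NDIC = os.path.join(camino, hilos + "_t_dictionary1.gensim")
    NMOD = os.path.join(camino, hilos + "_t_model1.gensim")
    NCOR = os.path.join(camino, hilos + "_t_corpus1.pkl")
    dictionary = corpora.Dictionary(text_data)
    corpus = [dictionary.doc2bow(text) for text in text_data]
    # A context manager makes sure the pickle file is closed after writing
    with open(NCOR, 'wb') as corpus_file:
        pickle.dump(corpus, corpus_file)
    dictionary.save(NDIC)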

    ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics = NUM_TOPICS, id2word=dictionary, passes=15)
    ldamodel.save(NMOD)
    topics = ldamodel.print_topics(num_words=NUM_WORDS)
    for topic in topics:
        print(topic)

Thread 6
(0, '0.046*"2017" + 0.046*"bonus" + 0.046*"system" + 0.046*"decision" + 0.046*"honesty"')
(1, '0.063*"crypto" + 0.063*"podcast" + 0.063*"episode" + 0.063*"best" + 0.034*"blockchain"')
(2, '0.145*"host" + 0.054*"crypto" + 0.054*"matter" + 0.054*"squeezing" + 0.054*"regulation"')
(3, '0.069*"host" + 0.048*"mention" + 0.048*"bitcoin" + 0.026*"cryptoharuspex" + 0.026*"casper"')
(4, '0.135*"host" + 0.071*"bitcoin" + 0.038*"debate" + 0.038*"cash" + 0.038*"cryptocurrency"')
Thread 4
(0, '0.038*"thestorm" + 0.038*"internetbillofrights" + 0.038*"greatawakening" + 0.038*"qanon" + 0.037*"relate"')
(1, '0.059*"seal" + 0.022*"qanon" + 0.022*"greatawakening" + 0.022*"thestorm" + 0.022*"internetbillofrights"')
(2, '0.033*"gitmo" + 0.033*"internetbillofrights" + 0.033*"greatawakening" + 0.033*"thestorm" + 0.033*"enjoytheshow"')
(3, '0.044*"qanon" + 0.044*"thestorm" + 0.044*"greatawakening" + 0.035*"enjoytheshow" + 0.035*"internetbillofrights"')
(4, '0.048*"enjoytheshow" + 0.047*"internetbillo

(1, '0.042*"mind" + 0.042*"control" + 0.042*"outside" + 0.035*"qanon" + 0.029*"conflict"')
(2, '0.046*"qanon" + 0.034*"truth" + 0.026*"ask" + 0.018*"google" + 0.018*"bridge"')
(3, '0.042*"qanon" + 0.029*"access" + 0.016*"tonight" + 0.016*"suggest" + 0.016*"remind"')
(4, '0.048*"qanon" + 0.029*"podesta" + 0.029*"sessions" + 0.020*"deepstate" + 0.020*"tony"')
Thread 18
(0, '0.035*"qanon" + 0.035*"bible" + 0.035*"fakenewsawards" + 0.035*"student" + 0.019*"say"')
(1, '0.030*"shithole" + 0.030*"bible" + 0.030*"qanon" + 0.030*"show" + 0.030*"student"')
(2, '0.033*"student" + 0.033*"fakenewsawards" + 0.033*"bible" + 0.033*"qanon" + 0.012*"prove"')
(3, '0.032*"student" + 0.032*"bible" + 0.032*"qanon" + 0.032*"fakenewsawards" + 0.022*"say"')
(4, '0.031*"fakenewsawards" + 0.031*"bible" + 0.031*"qanon" + 0.031*"student" + 0.016*"medium"')
Thread 23
(0, '0.037*"trump" + 0.029*"group" + 0.022*"people" + 0.022*"immigrant" + 0.015*"say"')
(1, '0.032*"racism" + 0.032*"immigrant" + 0.032*"affect" + 0.0

(3, '0.004*"come" + 0.004*"wethepeople" + 0.004*"grant" + 0.004*"might" + 0.004*"dream"')
(4, '0.017*"government" + 0.017*"know" + 0.017*"people" + 0.017*"worth" + 0.017*"self"')
Thread 35
(0, '0.075*"vote" + 0.066*"leave" + 0.038*"brexit" + 0.031*"campaign" + 0.026*"total"')
(1, '0.033*"cambridge" + 0.033*"analytica" + 0.033*"brexit" + 0.017*"russia" + 0.017*"canadian"')
(2, '0.026*"leave" + 0.022*"search" + 0.022*"give" + 0.012*"🏼paid" + 0.012*"unionist"')
(3, '0.043*"group" + 0.023*"use" + 0.023*"pro-#brexit" + 0.023*"own" + 0.023*"allow"')
(4, '0.053*"campaign" + 0.033*"group" + 0.017*"could" + 0.016*"limit" + 0.015*"work"')
Thread 40
(0, '0.042*"company" + 0.022*"work" + 0.022*"show" + 0.022*"outside" + 0.015*"things"')
(1, '0.034*"learn" + 0.018*"time" + 0.018*"need" + 0.018*"something" + 0.018*"microsoft"')
(2, '0.028*"outside" + 0.027*"google" + 0.027*"redis" + 0.015*"show" + 0.015*"company"')
(3, '0.039*"startup" + 0.020*"make" + 0.014*"want" + 0.014*"take" + 0.014*"home"')
(4

(1, '0.034*"former" + 0.034*"gina" + 0.034*"mccarthy" + 0.034*"write" + 0.019*"quite"')
(2, '0.025*"scottpruitt" + 0.025*"mccarthy" + 0.025*"gina" + 0.014*"mondaymotivaton" + 0.014*"job"')
(3, '0.031*"change" + 0.031*"agency" + 0.017*"former" + 0.017*"totally" + 0.017*"record"')
(4, '0.026*"gina" + 0.026*"mccarthy" + 0.026*"right" + 0.026*"business" + 0.014*"much"')
Thread 62
(0, '0.030*"qanon" + 0.023*"post" + 0.023*"clown" + 0.023*"use" + 0.016*"last"')
(1, '0.026*"qanon" + 0.026*"salzman" + 0.026*"attorney" + 0.026*"clinton" + 0.014*"reveal"')
(2, '0.022*"post" + 0.022*"think" + 0.022*"timing" + 0.022*"release" + 0.022*"outside"')
(3, '0.021*"qanon" + 0.012*"tell" + 0.012*"cage" + 0.012*"10,000/wk" + 0.012*"bilk"')
(4, '0.029*"qanon" + 0.020*"today" + 0.020*"nancy" + 0.020*"attack" + 0.020*"post"')
Thread 60
(0, '0.018*"المشاركة" + 0.010*"للسلطة" + 0.010*"التصويت" + 0.010*"فيها" + 0.010*"سنوات"')
(1, '0.012*"ولكن" + 0.012*"الرئيس" + 0.012*"المصريين" + 0.012*"للانتخابات" + 0.012*"علي

(0, '0.039*"learning" + 0.015*"change" + 0.015*"root" + 0.015*"require" + 0.015*"acting"')
(1, '0.018*"people" + 0.018*"professor" + 0.018*"access" + 0.018*"group" + 0.018*"learning"')
(2, '0.023*"important" + 0.023*"patience" + 0.023*"ignorance" + 0.023*"credit" + 0.023*"patient"')
(3, '0.051*"understanding" + 0.035*"newfound" + 0.035*"helpful" + 0.019*"things" + 0.019*"people"')
(4, '0.032*"someone" + 0.022*"process" + 0.022*"would" + 0.022*"take" + 0.012*"hear"')
Thread 84
(0, '0.025*"people" + 0.025*"stop" + 0.013*"oathing" + 0.013*"design" + 0.013*"even"')
(1, '0.023*"oath" + 0.023*"peasant" + 0.016*"colonial" + 0.016*"take" + 0.016*"thread"')
(2, '0.021*"even" + 0.021*"oathing" + 0.011*"land" + 0.011*"one" + 0.011*"downtrodden"')
(3, '0.015*"people" + 0.015*"subjugate" + 0.015*"colonial" + 0.015*"support" + 0.015*"jomo"')
(4, '0.032*"laws" + 0.022*"oath" + 0.022*"jomo" + 0.022*"ever" + 0.012*"tool"')
Thread 83
(0, '0.026*"answer" + 0.014*"keep" + 0.014*"busy" + 0.014*"give" + 0.0

### CSV twenty_twentyfive

In [56]:
THIS_FOLDER = os.getcwd()
threads_leer = threads4
carpeta_guardar = "tpcsv4"

# Populate text_data

for hilos in threads_leer:
    camino = os.path.join(THIS_FOLDER, carpeta_guardar)
    text_data = []
    documentos = []
    dictionary = []
    corpus = []
    print(hilos)
    documentos = threads_leer[hilos]

    #print(documentos)

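    for line in documentos:
        tokens = prepare_text_for_lda(line)
        # Each tweet is kept with probability ~0.991, so a few are dropped at random
        if random.random() > .009:
            text_data.append(tokens)

    # Build the output paths with os.path.join instead of a hard-coded "\\"
    NDIC = os.path.join(camino, hilos + "_t_dictionary1.gensim")
    NMOD = os.path.join(camino, hilos + "_t_model1.gensim")
    NCOR = os.path.join(camino, hilos + "_t_corpus1.pkl")
    dictionary = corpora.Dictionary(text_data)
    corpus = [dictionary.doc2bow(text) for text in text_data]
    # A context manager makes sure the pickle file is closed after writing
    with open(NCOR, 'wb') as corpus_file:
        pickle.dump(corpus, corpus_file)
    dictionary.save(NDIC)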

    ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics = NUM_TOPICS, id2word=dictionary, passes=15)
    ldamodel.save(NMOD)
    topics = ldamodel.print_topics(num_words=NUM_WORDS)
    for topic in topics:
        print(topic)

Thread 2
(0, '0.024*"power" + 0.024*"meaningful" + 0.016*"member" + 0.016*"state" + 0.016*"integration"')
(1, '0.017*"european" + 0.017*"parliament" + 0.017*"rule" + 0.017*"ngo" + 0.017*"member"')
(2, '0.024*"trade" + 0.016*"create" + 0.016*"vote" + 0.009*"lobby" + 0.009*"foreigner"')
(3, '0.023*"policy" + 0.023*"elite" + 0.023*"narcissistic" + 0.023*"unintended" + 0.023*"groupthink"')
(4, '0.039*"vote" + 0.027*"referendum" + 0.015*"public" + 0.015*"leave" + 0.015*"idea"')
Thread 3
(0, '0.025*"leave" + 0.025*"never" + 0.010*"know" + 0.010*"every" + 0.010*"course"')
(1, '0.017*"market" + 0.017*"brexit" + 0.017*"debate" + 0.009*"particpation" + 0.009*"tory"')
(2, '0.029*"trade" + 0.025*"would" + 0.020*"brexit" + 0.015*"leaver" + 0.015*"tory"')
(3, '0.018*"right" + 0.010*"london" + 0.010*"ignorance" + 0.010*"month" + 0.010*"bubble"')
(4, '0.021*"job" + 0.021*"brexit" + 0.021*"trade" + 0.021*"remain" + 0.021*"depend"')
Thread 5
(0, '0.016*"trump" + 0.016*"stand" + 0.016*"everyone" + 0.016*

(1, '0.064*"2018" + 0.048*"candidate" + 0.040*"democratic" + 0.033*"voting" + 0.033*"georgia"')
(2, '0.042*"democratic" + 0.042*"2018" + 0.036*"candidate" + 0.030*"vote" + 0.024*"register"')
(3, '0.042*"democratic" + 0.023*"incumbent" + 0.023*"ga02" + 0.023*"clip" + 0.023*"november"')
(4, '0.033*"ga03" + 0.033*"ga13" + 0.031*"democratic" + 0.031*"candidate" + 0.028*"april"')
Thread 19
(0, '0.023*"onoda" + 0.022*"même" + 0.022*"quelques" + 0.017*"cette" + 0.015*"major"')
(1, '0.018*"dans" + 0.018*"c’est" + 0.018*"après" + 0.013*"carrément" + 0.013*"faux"')
(2, '0.019*"dans" + 0.019*"fait" + 0.013*"1974" + 0.013*"c’est" + 0.013*"qu’il"')
(3, '0.022*"onoda" + 0.022*"vous" + 0.015*"cette" + 0.015*"suzuki" + 0.015*"d’une"')
(4, '0.018*"pour" + 0.014*"c’est" + 0.014*"donc" + 0.010*"arrière" + 0.010*"retour"')
Thread 25
(0, '0.031*"say" + 0.022*"know" + 0.022*"mouth" + 0.022*"face" + 0.012*"like"')
(1, '0.046*"school" + 0.031*"shoot" + 0.024*"bully" + 0.024*"people" + 0.016*"football"')
(2, '

(4, '0.021*"vote" + 0.016*"know" + 0.016*"sure" + 0.016*"first" + 0.016*"move"')
Thread 43
(0, '0.029*"food" + 0.029*"dprk" + 0.022*"security" + 0.015*"lack" + 0.015*"among"')
(1, '0.032*"dprk" + 0.022*"security" + 0.022*"food" + 0.022*"decision" + 0.012*"government"')
(2, '0.052*"food" + 0.030*"dprk" + 0.030*"security" + 0.019*"citizen" + 0.018*"would"')
(3, '0.023*"furthermore" + 0.023*"famine" + 0.023*"world" + 0.016*"country" + 0.016*"1990s"')
(4, '0.042*"dprk" + 0.032*"military" + 0.016*"food" + 0.011*"threat" + 0.011*"point"')
Thread 45
(0, '0.033*"qanon" + 0.033*"master" + 0.023*"military" + 0.023*"chair" + 0.023*"border"')
(1, '0.049*"qanon" + 0.018*"aghdam" + 0.018*"youtube" + 0.014*"release" + 0.014*"trace"')
(2, '0.044*"pope" + 0.023*"drop" + 0.023*"qanon" + 0.023*"tell" + 0.023*"kill"')
(3, '0.033*"qanon" + 0.020*"epstein" + 0.014*"refer" + 0.014*"learn" + 0.014*"manafort"')
(4, '0.022*"unroll" + 0.022*"please" + 0.004*"release" + 0.004*"manafort" + 0.004*"qanon"')
Thread 4

(3, '0.043*"corn" + 0.031*"mother" + 0.031*"david" + 0.025*"jones" + 0.019*"dossier"')
(4, '0.024*"buzzfeed" + 0.017*"gubarev" + 0.017*"source" + 0.017*"dossier" + 0.017*"trump"')
Thread 65
(0, '0.024*"fusion" + 0.016*"dossier" + 0.016*"go" + 0.016*"million" + 0.016*"people"')
(1, '0.035*"fusion" + 0.028*"hire" + 0.022*"trump" + 0.015*"lawyer" + 0.015*"information"')
(2, '0.022*"fusion" + 0.022*"register" + 0.022*"radio" + 0.012*"politics" + 0.012*"nellie"')
(3, '0.026*"fusion" + 0.020*"dossier" + 0.020*"know" + 0.014*"mother" + 0.014*"jones"')
(4, '0.029*"fusion" + 0.015*"president" + 0.015*"smear" + 0.015*"bruce" + 0.015*"whose"')
Thread 66
(0, '0.061*"investigation" + 0.061*"mueller" + 0.046*"answer" + 0.046*"many" + 0.032*"stem"')
(1, '0.070*"thanks" + 0.070*"playing" + 0.012*"source" + 0.012*"answer" + 0.012*"mueller"')
(2, '0.103*"investigation" + 0.074*"mueller" + 0.046*"long" + 0.046*"days" + 0.031*"modern"')
(3, '0.073*"mueller" + 0.073*"robert" + 0.073*"investigation" + 0.073

(1, '0.052*"wray" + 0.029*"mueller" + 0.029*"say" + 0.029*"recuse" + 0.029*"sessions"')
(2, '0.103*"qanon" + 0.088*"justice" + 0.088*"anon" + 0.081*"fulldisclosure" + 0.074*"comment"')
(3, '0.052*"please" + 0.052*"unroll" + 0.052*"05/21/18" + 0.009*"justice" + 0.009*"qanon"')
(4, '0.051*"justice" + 0.051*"qanon" + 0.051*"rachel" + 0.051*"brand" + 0.019*"clock"')
Thread 93
(0, '0.027*"think" + 0.027*"putin" + 0.027*"deal" + 0.027*"focus" + 0.015*"nuclear"')
(1, '0.057*"uranium" + 0.033*"would" + 0.025*"rosatom" + 0.025*"license" + 0.017*"sure"')
(2, '0.033*"uranium" + 0.033*"need" + 0.033*"plenty" + 0.018*"assume" + 0.018*"strategic"')
(3, '0.045*"uranium" + 0.036*"kazakhstan" + 0.028*"...." + 0.019*"sure" + 0.019*"right"')
(4, '0.025*"rosatom" + 0.025*"mine" + 0.025*"deal" + 0.025*"uranium" + 0.017*"little"')
Thread 100
(0, '0.030*"become" + 0.030*"1915" + 0.030*"song" + 0.017*"country" + 0.017*"slavery"')
(1, '0.028*"anthem" + 0.028*"wilson" + 0.028*"order" + 0.028*"pretty" + 0.028*"m

### CSV twentyfive_thirty

In [57]:
THIS_FOLDER = os.getcwd()
threads_leer = threads5
carpeta_guardar = "tpcsv5"

# Populate text_data

for hilos in threads_leer:
    camino = os.path.join(THIS_FOLDER, carpeta_guardar)
    text_data = []
    documentos = []
    dictionary = []
    corpus = []
    print(hilos)
    documentos = threads_leer[hilos]

    #print(documentos)

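    for line in documentos:
        tokens = prepare_text_for_lda(line)
        # Each tweet is kept with probability ~0.991, so a few are dropped at random
        if random.random() > .009:
            text_data.append(tokens)

    # Build the output paths with os.path.join instead of a hard-coded "\\"
    NDIC = os.path.join(camino, hilos + "_t_dictionary1.gensim")
    NMOD = os.path.join(camino, hilos + "_t_model1.gensim")
    NCOR = os.path.join(camino, hilos + "_t_corpus1.pkl")
    dictionary = corpora.Dictionary(text_data)
    corpus = [dictionary.doc2bow(text) for text in text_data]
    # A context manager makes sure the pickle file is closed after writing
    with open(NCOR, 'wb') as corpus_file:
        pickle.dump(corpus, corpus_file)
    dictionary.save(NDIC)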

    ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics = NUM_TOPICS, id2word=dictionary, passes=15)
    ldamodel.save(NMOD)
    topics = ldamodel.print_topics(num_words=NUM_WORDS)
    for topic in topics:
        print(topic)

Thread 2
(0, '0.040*"data" + 0.024*"eunoia" + 0.024*"time" + 0.024*"accord" + 0.012*"delete"')
(1, '0.019*"say" + 0.019*"eunoia" + 0.019*"block" + 0.019*"wylie" + 0.010*"million"')
(2, '0.041*"data" + 0.036*"eunoia" + 0.021*"wylie" + 0.016*"company" + 0.016*"facebook"')
(3, '0.038*"wylie" + 0.022*"trump" + 0.022*"palantir" + 0.014*"lewandowski" + 0.014*"eunoia"')
(4, '0.030*"wylie" + 0.026*"eunoia" + 0.018*"source" + 0.013*"2014" + 0.009*"user"')
Thread 5
(0, '0.062*"boomer" + 0.020*"think" + 0.015*"world" + 0.015*"parent" + 0.015*"tell"')
(1, '0.022*"make" + 0.015*"even" + 0.015*"die" + 0.015*"could" + 0.015*"like"')
(2, '0.019*"people" + 0.019*"make" + 0.011*"west" + 0.011*"remember" + 0.011*"pioneer"')
(3, '0.023*"boomer" + 0.017*"problem" + 0.017*"people" + 0.012*"want" + 0.012*"opportunity"')
(4, '0.027*"boomer" + 0.017*"going" + 0.017*"generation" + 0.011*"society" + 0.011*"might"')
Thread 6
(0, '0.037*"disneyprincess" + 0.029*"stormy" + 0.017*"take" + 0.013*"demagod" + 0.013*"pr

(4, '0.033*"russia" + 0.033*"trump" + 0.025*"steele" + 0.017*"kremlin" + 0.017*"cambridge"')
Thread 28
(0, '0.013*"want" + 0.013*"support" + 0.013*"white" + 0.013*"le" + 0.009*"inequality"')
(1, '0.028*"white" + 0.016*"black" + 0.012*"household" + 0.012*"world" + 0.012*"group"')
(2, '0.020*"white" + 0.020*"supremacy" + 0.014*"whiteness" + 0.014*"people" + 0.014*"true"')
(3, '0.019*"create" + 0.019*"job" + 0.013*"group" + 0.013*"woman" + 0.013*"away"')
(4, '0.020*"supremacy" + 0.020*"white" + 0.015*"male" + 0.015*"dominance" + 0.010*"traditional"')
Thread 29
(0, '0.035*"humint" + 0.023*"steele" + 0.023*"dossier" + 0.018*"things" + 0.018*"know"')
(1, '0.020*"weird" + 0.020*"verify" + 0.020*"around" + 0.011*"later" + 0.011*"give"')
(2, '0.019*"know" + 0.019*"use" + 0.019*"dude" + 0.019*"book" + 0.019*"dossier"')
(3, '0.041*"people" + 0.031*"intelligence" + 0.021*"information" + 0.021*"image" + 0.021*"humint"')
(4, '0.042*"steele" + 0.030*"dossier" + 0.024*"humint" + 0.019*"going" + 0.019*

(2, '0.031*"elite" + 0.025*"woman" + 0.013*"garden" + 0.013*"qanon" + 0.013*"program"')
(3, '0.021*"nxivm" + 0.021*"qanon" + 0.011*"cult" + 0.011*"bronfman" + 0.011*"elect"')
(4, '0.018*"albany" + 0.012*"slave" + 0.012*"master" + 0.012*"member" + 0.012*"remember"')
Thread 52
(0, '0.020*"make" + 0.020*"need" + 0.013*"still" + 0.013*"able" + 0.013*"case"')
(1, '0.033*"people" + 0.033*"working" + 0.023*"thing" + 0.023*"class" + 0.012*"relevant"')
(2, '0.020*"things" + 0.020*"brexit" + 0.020*"know" + 0.020*"knowing" + 0.020*"find"')
(3, '0.038*"people" + 0.023*"reach" + 0.016*"things" + 0.016*"need" + 0.016*"information"')
(4, '0.016*"horizon" + 0.016*"wrong" + 0.011*"good" + 0.011*"live" + 0.011*"shopping"')
Thread 56
(0, '0.057*"atheist" + 0.039*"belief" + 0.039*"lack" + 0.029*"empathy" + 0.020*"sjws"')
(1, '0.027*"man" + 0.027*"justice" + 0.027*"atheist" + 0.019*"bring" + 0.019*"comfort"')
(2, '0.026*"authority" + 0.026*"christian" + 0.026*"hope" + 0.026*"meaning" + 0.014*"lawful"')
(3,

(4, '0.033*"smith" + 0.018*"poster" + 0.018*"illustrate" + 0.018*"book" + 0.018*"james"')
Thread 72
(0, '0.056*"location" + 0.056*"county" + 0.056*"arizona" + 0.056*"check" + 0.056*"idaho"')
(1, '0.070*"find" + 0.068*"drop" + 0.068*"voting" + 0.037*"right" + 0.037*"colorado"')
(2, '0.049*"poll" + 0.049*"info" + 0.043*"location" + 0.043*"always" + 0.043*"ballot"')
(3, '0.227*"find" + 0.127*"vote" + 0.092*"info" + 0.092*"poll" + 0.048*"place"')
(4, '0.059*"find" + 0.053*"call" + 0.053*"8683" + 0.053*"site" + 0.053*"345-vote"')
Thread 70
(0, '0.017*"tweet" + 0.012*"trump" + 0.012*"treason" + 0.012*"potus" + 0.012*"corruption"')
(1, '0.015*"corrupt" + 0.015*"using" + 0.015*"lie" + 0.008*"report" + 0.008*"russia"')
(2, '0.018*"hack" + 0.014*"russia" + 0.009*"draintheswamp" + 0.009*"could" + 0.009*"establishment"')
(3, '0.017*"russia" + 0.011*"deal" + 0.011*"russian" + 0.011*"server" + 0.011*"meeting"')
(4, '0.020*"hillary" + 0.020*"even" + 0.016*"include" + 0.016*"destroy" + 0.011*"evidence

(3, '0.016*"go" + 0.016*"parent" + 0.016*"toronto" + 0.016*"dead" + 0.016*"home"')
(4, '0.018*"last" + 0.018*"father" + 0.018*"never" + 0.012*"husband" + 0.012*"good"')
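
The topic tuples printed above are strings, which makes them awkward to compare or plot. The following is a minimal sketch, not part of the original notebook, that parses one printed topic into `(word, weight)` pairs. The names `TERM_PATTERN` and `parse_topic` are hypothetical, and the regular expression assumes the `weight*"word"` layout seen in the output.

```python
import re

# Matches one term of the printed form: 0.057*"uranium"
TERM_PATTERN = re.compile(r'([0-9.]+)\*"([^"]*)"')


def parse_topic(topic):
    """Turn (topic_id, 'w*"word" + ...') into (topic_id, [(word, weight), ...])."""
    topic_id, description = topic
    terms = [(word, float(weight))
             for weight, word in TERM_PATTERN.findall(description)]
    return topic_id, terms


# One of the topics printed above, copied verbatim
example = (1, '0.057*"uranium" + 0.033*"would" + 0.025*"rosatom" '
              '+ 0.025*"license" + 0.017*"sure"')
topic_id, terms = parse_topic(example)
print(topic_id, terms[:2])
```

When the model object is still in memory, gensim's `show_topics(formatted=False)` returns the word and weight pairs directly, so parsing is only needed for output that was already printed or saved as text.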


# Topic detection per thread

Unlike the previous section, topics are now searched for across each complete file, so every thread is treated as a single document. To build these documents, the tweets of a thread are joined as paragraphs separated by line breaks ("\n").
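
A minimal sketch of that joining step, assuming each file's threads are held in a dictionary that maps a thread name to its list of tweet texts (the structure the per-tweet loop above iterates over). The function name `threads_to_documents` and the sample data are hypothetical; the code that actually builds `Tthreads1` to `Tthreads5` is not shown in this section.

```python
def threads_to_documents(threads):
    """Join the tweets of each thread into one newline-separated document.

    `threads` is assumed to map a thread name to its list of tweet texts.
    """
    return ["\n".join(tweets) for tweets in threads.values()]


# Hypothetical sample data, for illustration only
sample_threads = {
    "Thread 1": ["first tweet", "second tweet"],
    "Thread 2": ["only tweet"],
}
documents = threads_to_documents(sample_threads)
print(documents)
```

Each element of the resulting list is one document for LDA, so a file with N threads yields N documents regardless of how many tweets each thread holds.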


In [58]:
Tthreads1

['5. Create a path to a green card for E-2 investors. Include any children brought here before age 21.\n6. Let\'s really make a 10 year law: provisional green cards.\n7. Direct DHS to allow people to take the steps to correct their immigration status.\n8. Penalties besides deportation.\n24. Vote out elected officials with close ties to nativist, white nationalist, or fearmongering groups such as FAIR, CIS, NumbersUSA, US Inc., KKK, VDare, etc.\n25. Give immig judges true independence, more support staff.\n26. Backlog relief for India, China, Philippines, Mexico.\n15. Expand ESL instruction.\n16. Create state Offices of New Americans.\n17. Make real use of S visas to take down cartels.\n18. Make filing for citizenship free or almost free.\n19. Create path to green card for longterm TPS holders.\n20. Recognize "deportees" as a group for asylum.\nSo much can be done. Yet we\'re stuck between a border wall and DACA. And gutting protections for 90% of the currently undocumented. \n\nThey\'v

In [59]:
import pickle

import gensim
from gensim import corpora

NUM_TOPICS = 20
NUM_WORDS = 10

### CSV five_ten

In [60]:
THIS_FOLDER = os.getcwd()
threads_leer = Tthreads1
carpeta_guardar = "Ttpcsv1"

# Populate text_data


camino = os.path.join(THIS_FOLDER, carpeta_guardar)
text_data = []
documentos = []
dictionary = []
corpus = []
documentos = threads_leer

#print(documentos)

for line in documentos:
    #print(line)
    tokens = prepare_text_for_lda(line)
    if random.random() > .009:
        #print(tokens)
        text_data.append(tokens)

#print(text_data) 
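# Build the output paths with os.path.join instead of a hard-coded "\\" separator
NDIC = os.path.join(camino, "t_dictionary1.gensim")
NMOD = os.path.join(camino, "t_model1.gensim")
NCOR = os.path.join(camino, "t_corpus1.pkl")
dictionary = corpora.Dictionary(text_data)
corpus = [dictionary.doc2bow(text) for text in text_data]
# Write the corpus inside a with-block so the file handle is closed
with open(NCOR, 'wb') as f:
    pickle.dump(corpus, f)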
dictionary.save(NDIC)

ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics = NUM_TOPICS, id2word=dictionary, passes=15)
ldamodel.save(NMOD)
topics = ldamodel.print_topics(num_words=NUM_WORDS)
for topic in topics:
    print(topic)

(0, '0.019*"cambridge" + 0.016*"analytica" + 0.015*"trump" + 0.014*"campaign" + 0.013*"cambridgeanalytica" + 0.012*"qanon" + 0.011*"candidate" + 0.011*"hillary" + 0.009*"voter" + 0.009*"ask"')
(1, '0.020*"north" + 0.020*"dakota" + 0.014*"candidate" + 0.014*"northdakota" + 0.014*"democratic" + 0.011*"state" + 0.009*"2018" + 0.009*"attorney" + 0.007*"voter" + 0.007*"vote"')
(2, '0.015*"know" + 0.007*"startup" + 0.006*"deny" + 0.006*"investor" + 0.006*"question" + 0.006*"2016" + 0.006*"election" + 0.005*"american" + 0.005*"involve" + 0.005*"public"')
(3, '0.014*"zuck" + 0.014*"julian" + 0.011*"government" + 0.011*"still" + 0.011*"reconnectjulian" + 0.008*"interview" + 0.008*"single" + 0.008*"investigation" + 0.008*"waiting" + 0.008*"ever"')
(4, '0.032*"child" + 0.013*"could" + 0.013*"chemical" + 0.013*"health" + 0.010*"change" + 0.010*"safety" + 0.010*"corporation" + 0.010*"bootpruitt" + 0.010*"recent" + 0.006*"protect"')
(5, '0.015*"child" + 0.011*"laws" + 0.009*"woman" + 0.009*"state" +

### CSV Ten_fifteen

In [61]:
THIS_FOLDER = os.getcwd()
threads_leer = Tthreads2
carpeta_guardar = "Ttpcsv2"

# Populate text_data


camino = os.path.join(THIS_FOLDER, carpeta_guardar)
text_data = []
documentos = []
dictionary = []
corpus = []
documentos = threads_leer

#print(documentos)

for line in documentos:
    #print(line)
    tokens = prepare_text_for_lda(line)
    if random.random() > .009:
        #print(tokens)
        text_data.append(tokens)

#print(text_data) 
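# Build the output paths with os.path.join instead of a hard-coded "\\" separator
NDIC = os.path.join(camino, "t_dictionary1.gensim")
NMOD = os.path.join(camino, "t_model1.gensim")
NCOR = os.path.join(camino, "t_corpus1.pkl")
dictionary = corpora.Dictionary(text_data)
corpus = [dictionary.doc2bow(text) for text in text_data]
# Write the corpus inside a with-block so the file handle is closed
with open(NCOR, 'wb') as f:
    pickle.dump(corpus, f)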
dictionary.save(NDIC)

ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics = NUM_TOPICS, id2word=dictionary, passes=15)
ldamodel.save(NMOD)
topics = ldamodel.print_topics(num_words=NUM_WORDS)
for topic in topics:
    print(topic)

(1, '0.042*"qanon" + 0.033*"greatawakening" + 0.032*"thestorm" + 0.016*"gitmo" + 0.009*"internetbillofrights" + 0.008*"oigreport" + 0.008*"releasethevideo" + 0.008*"sethrich" + 0.008*"releasethememo" + 0.008*"memoday"')
(2, '0.019*"corbyn" + 0.011*"single" + 0.010*"outside" + 0.008*"want" + 0.008*"access" + 0.008*"rule" + 0.008*"future" + 0.008*"learning" + 0.007*"full" + 0.007*"brexit"')
(3, '0.008*"grassleymemo" + 0.008*"page" + 0.008*"büyük" + 0.007*"white" + 0.007*"libertarian" + 0.006*"declassify" + 0.006*"saint" + 0.005*"portion" + 0.005*"privatize" + 0.005*"believe"')
(4, '0.011*"people" + 0.009*"nevergetourguns" + 0.008*"qanon" + 0.007*"control" + 0.007*"information" + 0.006*"make" + 0.006*"know" + 0.005*"right" + 0.005*"want" + 0.005*"school"')
(5, '0.010*"back" + 0.010*"woman" + 0.008*"france" + 0.007*"people" + 0.007*"reconciliation" + 0.007*"qanon" + 0.006*"time" + 0.006*"would" + 0.006*"election" + 0.006*"campaign"')
(6, '0.009*"trump" + 0.008*"people" + 0.007*"train" + 0.

### CSV fifteen_twenty

In [62]:
THIS_FOLDER = os.getcwd()
threads_leer = Tthreads3
carpeta_guardar = "Ttpcsv3"

# Populate text_data


camino = os.path.join(THIS_FOLDER, carpeta_guardar)
text_data = []
documentos = []
dictionary = []
corpus = []
documentos = threads_leer

#print(documentos)

for line in documentos:
    #print(line)
    tokens = prepare_text_for_lda(line)
    if random.random() > .009:
        #print(tokens)
        text_data.append(tokens)

#print(text_data) 
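# Build the output paths with os.path.join instead of a hard-coded "\\" separator
NDIC = os.path.join(camino, "t_dictionary1.gensim")
NMOD = os.path.join(camino, "t_model1.gensim")
NCOR = os.path.join(camino, "t_corpus1.pkl")
dictionary = corpora.Dictionary(text_data)
corpus = [dictionary.doc2bow(text) for text in text_data]
# Write the corpus inside a with-block so the file handle is closed
with open(NCOR, 'wb') as f:
    pickle.dump(corpus, f)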
dictionary.save(NDIC)

ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics = NUM_TOPICS, id2word=dictionary, passes=15)
ldamodel.save(NMOD)
topics = ldamodel.print_topics(num_words=NUM_WORDS)
for topic in topics:
    print(topic)

(0, '0.011*"people" + 0.005*"make" + 0.005*"anglais" + 0.005*"want" + 0.005*"guerre" + 0.004*"state" + 0.004*"moses" + 0.004*"français" + 0.004*"allemands" + 0.004*"like"')
(1, '0.015*"startup" + 0.013*"company" + 0.011*"liberty" + 0.007*"assange" + 0.007*"learn" + 0.007*"outside" + 0.006*"show" + 0.006*"work" + 0.006*"years" + 0.005*"right"')
(2, '0.010*"kramer" + 0.008*"dossier" + 0.007*"mccain" + 0.007*"vote" + 0.007*"know" + 0.006*"oscar" + 0.006*"russia" + 0.006*"qanon" + 0.005*"leave" + 0.005*"plan"')
(3, '0.017*"trump" + 0.011*"private" + 0.008*"idea" + 0.008*"intelligence" + 0.008*"writing" + 0.008*"contractor" + 0.007*"government" + 0.007*"company" + 0.007*"président" + 0.006*"justice"')
(4, '0.007*"clinton" + 0.005*"story" + 0.005*"bill" + 0.005*"hillary" + 0.005*"begin" + 0.005*"henley" + 0.002*"people" + 0.002*"involve" + 0.002*"time" + 0.002*"greatawakening"')
(5, '0.000*"trump" + 0.000*"qanon" + 0.000*"make" + 0.000*"know" + 0.000*"pruitt" + 0.000*"kramer" + 0.000*"greata

### CSV twenty_twentyfive

In [63]:
THIS_FOLDER = os.getcwd()
threads_leer = Tthreads4
carpeta_guardar = "Ttpcsv4"

# Populate text_data


camino = os.path.join(THIS_FOLDER, carpeta_guardar)
text_data = []
documentos = []
dictionary = []
corpus = []
documentos = threads_leer

#print(documentos)

for line in documentos:
    #print(line)
    tokens = prepare_text_for_lda(line)
    if random.random() > .009:
        #print(tokens)
        text_data.append(tokens)

#print(text_data) 
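# Build the output paths with os.path.join instead of a hard-coded "\\" separator
NDIC = os.path.join(camino, "t_dictionary1.gensim")
NMOD = os.path.join(camino, "t_model1.gensim")
NCOR = os.path.join(camino, "t_corpus1.pkl")
dictionary = corpora.Dictionary(text_data)
corpus = [dictionary.doc2bow(text) for text in text_data]
# Write the corpus inside a with-block so the file handle is closed
with open(NCOR, 'wb') as f:
    pickle.dump(corpus, f)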
dictionary.save(NDIC)

ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics = NUM_TOPICS, id2word=dictionary, passes=15)
ldamodel.save(NMOD)
topics = ldamodel.print_topics(num_words=NUM_WORDS)
for topic in topics:
    print(topic)

(0, '0.016*"trump" + 0.010*"know" + 0.009*"tory" + 0.009*"would" + 0.008*"russia" + 0.008*"kurd" + 0.008*"stone" + 0.007*"mueller" + 0.007*"turkey" + 0.007*"iran"')
(1, '0.026*"trump" + 0.012*"contre" + 0.011*"pour" + 0.011*"campagne" + 0.011*"justice" + 0.010*"président" + 0.009*"l\'informateur" + 0.009*"nunes" + 0.008*"cette" + 0.008*"breaking"')
(2, '0.000*"trump" + 0.000*"know" + 0.000*"cohen" + 0.000*"would" + 0.000*"state" + 0.000*"russian" + 0.000*"people" + 0.000*"manafort" + 0.000*"russia" + 0.000*"also"')
(3, '0.039*"maga" + 0.038*"qanon" + 0.036*"wwg1wga" + 0.021*"trump" + 0.019*"potus" + 0.015*"patriotsfight" + 0.014*"post" + 0.014*"witchhunt" + 0.013*"wethepeople" + 0.012*"tuesdaythoughts"')
(4, '0.014*"criminal" + 0.013*"trump" + 0.012*"black" + 0.009*"morgan" + 0.008*"fusion" + 0.008*"know" + 0.007*"congressrun2018" + 0.007*"bluewave2018" + 0.007*"bluewaveil" + 0.006*"would"')
(5, '0.012*"profit" + 0.010*"shearer" + 0.009*"clinton" + 0.006*"money" + 0.006*"would" + 0.005

### CSV twentyfive_thirty

In [64]:
THIS_FOLDER = os.getcwd()
threads_leer = Tthreads5
carpeta_guardar = "Ttpcsv5"

# Populate text_data


camino = os.path.join(THIS_FOLDER, carpeta_guardar)
text_data = []
documentos = []
dictionary = []
corpus = []
documentos = threads_leer

#print(documentos)

for line in documentos:
    #print(line)
    tokens = prepare_text_for_lda(line)
    if random.random() > .009:
        #print(tokens)
        text_data.append(tokens)

#print(text_data) 
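# Build the output paths with os.path.join instead of a hard-coded "\\" separator
NDIC = os.path.join(camino, "t_dictionary1.gensim")
NMOD = os.path.join(camino, "t_model1.gensim")
NCOR = os.path.join(camino, "t_corpus1.pkl")
dictionary = corpora.Dictionary(text_data)
corpus = [dictionary.doc2bow(text) for text in text_data]
# Write the corpus inside a with-block so the file handle is closed
with open(NCOR, 'wb') as f:
    pickle.dump(corpus, f)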
dictionary.save(NDIC)

ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics = NUM_TOPICS, id2word=dictionary, passes=15)
ldamodel.save(NMOD)
topics = ldamodel.print_topics(num_words=NUM_WORDS)
for topic in topics:
    print(topic)

(0, '0.017*"system" + 0.013*"head" + 0.008*"balance" + 0.008*"canal" + 0.006*"movement" + 0.006*"around" + 0.006*"vestibular" + 0.006*"detect" + 0.005*"know" + 0.005*"things"')
(1, '0.015*"mulehead" + 0.012*"contd" + 0.011*"trump" + 0.010*"would" + 0.008*"view" + 0.007*"say" + 0.007*"russia" + 0.007*"russian" + 0.007*"evidence" + 0.006*"director"')
(2, '0.013*"find" + 0.008*"also" + 0.007*"like" + 0.007*"content" + 0.007*"make" + 0.006*"even" + 0.005*"would" + 0.005*"vote" + 0.005*"want" + 0.005*"people"')
(3, '0.013*"trump" + 0.008*"comeymemos" + 0.008*"comey" + 0.007*"right" + 0.007*"world" + 0.007*"chinese" + 0.006*"people" + 0.006*"mail" + 0.005*"could" + 0.005*"even"')
(4, '0.026*"qanon" + 0.023*"internetbillofrights" + 0.016*"file" + 0.008*"nunes" + 0.008*"website" + 0.007*"campaign" + 0.006*"post" + 0.005*"virus" + 0.005*"site" + 0.005*"code"')
(5, '0.014*"page" + 0.006*"road" + 0.006*"come" + 0.006*"nunes" + 0.006*"know" + 0.005*"2016" + 0.005*"carter" + 0.005*"meeting" + 0.005

## Megacorpus

As a third analysis alternative, all available threads are merged into a single megacorpus: every thread from every file is used as one document, and topics are then detected across the resulting roughly 500 documents.
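
Plain list concatenation loses track of which file each document came from, which matters when threads of a single file are later classified against the megacorpus topics. The sketch below is one possible way to keep that information, not the notebook's method. The function `build_megacorpus`, the variable names, and the sample data are all hypothetical.

```python
def build_megacorpus(files):
    """Concatenate per-file thread lists, remembering each document's source file.

    `files` is assumed to map a file label to its list of thread documents.
    """
    documents = []
    sources = []
    for name, thread_docs in files.items():
        documents.extend(thread_docs)
        sources.extend([name] * len(thread_docs))
    return documents, sources


# Hypothetical sample data, for illustration only
sample_files = {
    "five_ten": ["thread a", "thread b"],
    "ten_fifteen": ["thread c"],
}
megatexto_demo, origen_demo = build_megacorpus(sample_files)
print(len(megatexto_demo), origen_demo)
```

The two lists stay aligned by position, so `origen_demo[i]` names the file that contributed `megatexto_demo[i]`.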

In [65]:
megatexto = Tthreads1+Tthreads2+Tthreads3+Tthreads4+Tthreads5

In [66]:
megatexto

['5. Create a path to a green card for E-2 investors. Include any children brought here before age 21.\n6. Let\'s really make a 10 year law: provisional green cards.\n7. Direct DHS to allow people to take the steps to correct their immigration status.\n8. Penalties besides deportation.\n24. Vote out elected officials with close ties to nativist, white nationalist, or fearmongering groups such as FAIR, CIS, NumbersUSA, US Inc., KKK, VDare, etc.\n25. Give immig judges true independence, more support staff.\n26. Backlog relief for India, China, Philippines, Mexico.\n15. Expand ESL instruction.\n16. Create state Offices of New Americans.\n17. Make real use of S visas to take down cartels.\n18. Make filing for citizenship free or almost free.\n19. Create path to green card for longterm TPS holders.\n20. Recognize "deportees" as a group for asylum.\nSo much can be done. Yet we\'re stuck between a border wall and DACA. And gutting protections for 90% of the currently undocumented. \n\nThey\'v

In [67]:
import pickle

import gensim
from gensim import corpora

NUM_TOPICS = 20
NUM_WORDS = 10

In [68]:
THIS_FOLDER = os.getcwd()
threads_leer = megatexto
carpeta_guardar = "mega"

# Populate text_data

camino = os.path.join(THIS_FOLDER, carpeta_guardar)
text_data = []
documentos = []
dictionary = []
corpus = []
documentos = threads_leer

#print(documentos)

for line in documentos:
    #print(line)
    tokens = prepare_text_for_lda(line)
    if random.random() > .009:
        #print(tokens)
        text_data.append(tokens)

print(text_data) 
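# Build the output paths with os.path.join instead of a hard-coded "\\" separator
NDIC = os.path.join(camino, "t_dictionary1.gensim")
NMOD = os.path.join(camino, "t_model1.gensim")
NCOR = os.path.join(camino, "t_corpus1.pkl")
dictionary = corpora.Dictionary(text_data)
corpus = [dictionary.doc2bow(text) for text in text_data]
# Write the corpus inside a with-block so the file handle is closed
with open(NCOR, 'wb') as f:
    pickle.dump(corpus, f)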
dictionary.save(NDIC)

ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics = NUM_TOPICS, id2word=dictionary, passes=15)
ldamodel.save(NMOD)
topics = ldamodel.print_topics(num_words=NUM_WORDS)
for topic in topics:
    print(topic)




(0, '0.006*"trump" + 0.006*"libertyrising" + 0.006*"yang" + 0.006*"wethepeople" + 0.006*"noconcon" + 0.005*"page" + 0.004*"theresistance" + 0.004*"mémo" + 0.004*"nunes" + 0.004*"2018mmm"')
(1, '0.006*"people" + 0.005*"would" + 0.005*"state" + 0.005*"know" + 0.005*"like" + 0.003*"make" + 0.003*"time" + 0.003*"even" + 0.003*"going" + 0.003*"last"')
(2, '0.006*"like" + 0.006*"yang" + 0.005*"royalwedding" + 0.004*"party" + 0.004*"harryandmeghan" + 0.003*"time" + 0.003*"️repub" + 0.003*"tidak" + 0.003*"antum" + 0.003*"black"')
(3, '0.013*"trump" + 0.007*"mcgahn" + 0.007*"mueller" + 0.006*"bluewave2018" + 0.006*"grassleymemo" + 0.006*"congressrun2018" + 0.006*"bluewaveil" + 0.004*"system" + 0.004*"president" + 0.004*"page"')
(4, '0.008*"manafort" + 0.007*"clinton" + 0.007*"trump" + 0.006*"democratic" + 0.005*"candidate" + 0.004*"make" + 0.004*"state" + 0.004*"congress" + 0.004*"incumbent" + 0.003*"hillary"')
(5, '0.009*"statement" + 0.008*"sessions" + 0.007*"false" + 0.007*"hyten" + 0.005*"

### Analysis of megacorpus results

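After the topics are detected, the threads of one file are classified according to the topics obtained: each thread is converted to a bag of words with the megacorpus dictionary, and the model returns the topics it assigns to that thread together with their probabilities.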

In [69]:
for hilo in Tthreads1:
    hilito = prepare_text_for_lda(hilo)
    hilito_bow = dictionary.doc2bow(hilito)
    print(hilo)
    print(ldamodel.get_document_topics(hilito_bow))

5. Create a path to a green card for E-2 investors. Include any children brought here before age 21.
6. Let's really make a 10 year law: provisional green cards.
7. Direct DHS to allow people to take the steps to correct their immigration status.
8. Penalties besides deportation.
24. Vote out elected officials with close ties to nativist, white nationalist, or fearmongering groups such as FAIR, CIS, NumbersUSA, US Inc., KKK, VDare, etc.
25. Give immig judges true independence, more support staff.
26. Backlog relief for India, China, Philippines, Mexico.
15. Expand ESL instruction.
16. Create state Offices of New Americans.
17. Make real use of S visas to take down cartels.
18. Make filing for citizenship free or almost free.
19. Create path to green card for longterm TPS holders.
20. Recognize "deportees" as a group for asylum.
So much can be done. Yet we're stuck between a border wall and DACA. And gutting protections for 90% of the currently undocumented. 

They've been gutting it fo

I'm of the belief that the most successful politicians are the ones who will listen to the experts in whatever the respective field may be. Want to craft the best policy to deal with the opioid epidemic? Talk to the experts in that field. Don't shut them down.
[(8, 0.3599363), (17, 0.26814985), (19, 0.36522093)]
Pruitt is a danger to the public, yet is losing in the courts while conning @realDonaldTrump into saving his job...all while defying &amp; insulting his boss. Perhaps Trump is the empty vessel Scott Pruitt said he is. That would explain how Pruitt so easily conned him. #BootPruitt
REALITY CHECK: While Pruitt is a dangerous threat to the health of our families, his sloppy and careless legal work is undermining his ability to even implement his extreme pro-polluter agenda in several key cases. #BootPruitt https://t.co/lZKNC2kUjf
REALITY CHECK: Trump is also defending the guy who called him an "empty vessel." https://t.co/ovfQIUUssn #BootPruitt
REALITY CHECK: Pruitt has spent $3 M

2/ Papadopoulos failing to live by his agreements with the feds/the court could lead to charges being brought forward that were previously withheld and/or contempt charges and/or reopening investigations otherwise on hold. None of the conditions Joyce said could be present exist.
[(2, 0.6315876), (17, 0.36058632)]
The real issue is, if the #CIA was doing these behavior and drug experiments in the 50s, #WTF makes us think they just decided to stop? 

They think we are sheep. 

They are wrong. @POTUS gave us back our voice.

#Qanon #TheStormHasArrived #MKUltra #RedPill https://t.co/rpHmPmIdW6
Stated by Senator Kennedy: “...there are perhaps any number of Americans who are walking around today ... who were given drugs with all the kinds of physical and psychological damage than can be caused.”[page 16]

#MKUltra #Qanon #TheStormHasArrived #CORRUPTION @POTUS https://t.co/qAkqYbEt98
Part of #MKUltra that led to its exposure was the death of Frank Olsen who was drugged with LSD by the CIA wi

5. It seems pretty clear CLOWNS control social media. Eric Schmidt (former Google CEO) likely wrote the code to censor across platforms. I'm guessing Russia, China, &amp; HK probably were sold this code by ES (or via HRC selling access to her unclass server?). https://t.co/rHiuN6p8JZ
[(19, 0.9934932)]
Startups should assume they'll get a ton of NOs from investors before a Yes. Identify partners with expertise &amp; track record in your industry, not just big names, and schedule meetings 3-4 weeks out. VCs are always on "vacation". 5/
If you don't hear back from an investor in 1-2 days, they don't like you. If they don't ask questions, they don't like you. Find one VC partner to be your champion and find out what their colleagues will want to know. 6/
Startup burn rates balloon to the size of your bank account if you're not careful. Be thrifty. Cash should last 12-18 months until you've hit milestones that de-risk your business for a new round of funding. 9/
The unintuitive secret to st

(7) Put yourself in the chair. Masculinity is toxic. Republicans are uncompassionate. Trump is a Russian rape menace. How would you react to a world where #LoveTrumpsHate is compassion? Would you be confused? Would you snap? #mkultra https://t.co/dHi78r0ASZ
[(19, 0.9934483)]
+3DC/+3VA Eastern = 13,682 🔥SEALED Indictments
⚖️Dockets➡️ https://t.co/AIWErHTRGh 
🇷🇺#skolkovo➡️  https://t.co/tNKB1x6FPf
Btw 🇨🇳SD/WY/CO💰➡️https://t.co/NkAtbGbzFF  
 #QAnon #MemoWar @avery1776 @connieketchup @damartin32 @BabeReflex_8 @BasedBasterd @littlecarrotq https://t.co/QAcy2PD5q2
+8 DC/+8 VA Eastern = 16 NEW Sealed Indictments #FollowTheWhiteRabbit  #ReleaseTheMemo #SealedIndictments #TickTock  @connieketchup @almostjingo @damartin32 @littlecarrotq @BabeReflex_8 @passion_4truth 
🧐👇 
https://t.co/r4C3ern9Xd
Nationwide Tallies 
https://t.co/3xiTaFYEFi https://t.co/43xH9tbPMW
+1 DC/+0 VA Eastern =13,676 SEALED Indictments 
https://t.co/b1V490Y9MO
#FollowTheWhiteRabbit  #MemoWar #QAnon @connieketchup @damartin32

/4 https://t.co/Sl1OaEOAZr
[(12, 0.7750324), (19, 0.22035222)]
@RepAdamSchiff @rgoodlaw 6. Schiff: House Intelligence Committee internally divided on investigating obstruction of justice https://t.co/oAmPqw3YaB
@RepAdamSchiff @rgoodlaw 7. Schiff: Social media companies “noncommittal”/stonewalling on proposal to identify potential Trump-Russia coordination

#Facebook https://t.co/sNgduU8SrS
@RepAdamSchiff @AP @rgoodlaw 4. Schiff: note to #TrumpRussia observers:

Pattern and timing of Russians' approach to Carter Page and George Papadopoulos are revealing https://t.co/MUswrYcRYO
@RepAdamSchiff @AP 2. Here are all top Ten Highlights followed by excerpts of a few of them (picked by @rgoodlaw): https://t.co/bwOqBmcIt6
Ten Highlights of @RepAdamSchiff @AP interview

Topics: Manafort's Kremlin links, Facebook stonewalling, limits on Mueller and more nuggets

After Tues election results, Schiff is increasingly likely to be next Chair of House Intel Committee.
https://t.co/RgcqSvXi3i

&lt;THREA

Car si on reprend ces lettres, qu'y lit-on vraiment ? D'abord, qu'il n'y a pas de coup de foudre : Abélard, grand séducteur, décide de conquérir Héloïse. Dans ses lettres, il se décrit lui-même comme un « loup affamé convoitant une tendre brebis ». https://t.co/FZKjqbExa1
[(16, 0.9962)]
Ce #thread n'a pas pour objectif de donner raison à une partie ou à l'autre. Il n'est pas non plus exhaustif... 

Il s'agit juste de montrer l'importance du journaliste : il peut réagir à une émotion, mais doit la pondérer pour apporter une information la plus complète possible.
Colère et indignation sont des émotions légitimes. Une information pondérée peut les susciter. Quand elles sont légitimes, il n'y a pas besoin de biaiser une information pour y parvenir. Tout simplement. C'est la fin de ce #thread à la @samuellaurent, merci de m'avoir lu, bye.
[(5, 0.8810851), (15, 0.101607196)]
2) Spot the bias - are these search terms going to show you the whole scope of research? (3/n) https://t.co/jS0FpRBNMJ

Remezcla has spent over 10 years uplifting &amp; advocating for Latinx creators, and we deserve a seat at the table. We've earned the opportunity to tell our community's success stories. Instead, we often find ourselves sidelined from the very moments our work has helped to build.
[(1, 0.99480873)]
3. The FBI hid this info for a year. They tried to alter an election and undermine a President. We are at a point that tweeting is not enough. They shut down every trend
4 we need to make a show of force like BLM and Antifa but in our own way and nonviolant. Any ppl color or religion is welcome. It does not matter we are all deplorable. In their eyes.
2. A violent seizure of power was attempted upon @realDonaldTrump. The guilty are now exposed and in the light. But they are fighting like cornered rats. What kind of world do you want your children to grow up in?
1. #myfellowarericans. My friends, patriots and hero’s. Please share this far and wide. #qanon #releasethememo #memorelease #mynamei

https://t.co/eOd0MrKbr8
[(19, 0.98698634)]
Their father gets interviewed regarding his 2 children!
Check out ⬇➡ @EllaaaCruzzz
For further details!
https://t.co/sXzwfHHXzN
@POTUS @realDonaldTrump #SaveTheChildren 
😢😢😢😢
🙏🙏🙏🙏
I AM GOING TO THROW UP!!!!!!
🤢🤢🤢🤢😢😢🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏🙏
https://t.co/V3QIAQLXRE
🤢🤢😢😢🙏🙏🙏🙏🙏
https://t.co/Nlrs23DLbg
UK Police Questioning the little Girl!
As he changes the subject regarding babies being eaten🤐🤐🤢🤢🤢🤢😢😢😢😢🙏🙏🙏🙏🙏
@EllaaaCruzzz
https://t.co/NQMnuLHTgz
https://t.co/Zuqe2tHrrH
Damn Straight!!!
THESE PEOPLE ARE SICK!!!
#MAGA #KAG #DrainingTheSwamp
#DrainTheDeepStateSwamp #GreatAwakening #CrookedHillary
#LockThemAllUp #PizzaGate #PedoGate #ObamaGate
#MichelleObamaIsAMan
⬇#WINNING ⬇
@POTUS 🇺🇸 @realDonaldTrump 🇺🇸 https://t.co/dCXHl5ETeu
ABSOLUTELY EFFING SICK!!!!!!
https://t.co/H4FIfA2uR6
😢🙏😢🙏😢🙏😢🤢
https://t.co/ffb6AZfNMy
[(18, 0.5861723), (19, 0.39014348)]
💦🌊WATCH THE WATER💧🌧

"Biggest Intel drop in our known HISTORY!"
Why is this event BIG
What does it signify
Y 

The company seems to have realistic expectations, and to be taking ownership of their own past failings. That said, @Facebook's conduct or stock value is the least important part of the story.
[(8, 0.19896439), (13, 0.5999388), (17, 0.19085583)]
pt 1) This is a survey by gun control advocates that I trust about as far as I can throw it.

Pt. 2 see https://t.co/u922SDgDOL

pt. 3) “red flag laws” -- If people "really" believe that these individuals are a danger to themselves or others, confine them to a mental...
health facility.  Simply saying that someone can't legally buy a gun isn't a serious response. People can get guns in other ways just about as easily as they can buy illegal drugs. In addition, if you really people someone is a danger, why only take away their guns?...
Why not also take away their cars?

pt 4) The intimate relationship numbers are useless because they also include crimes committed against prostitutes by Johns and Pimps. Women shouldn't be concerned about all men

@CDRtgneixample @CDRsLleida @cdrbalsareny
[(6, 0.92083335)]
5. @realDonaldTrump I vote top-down, starting with the #LSM to expose and remove the #MockingBird. This way, #WeThePeople can hear #TRUTH from more places than what they denigrate as "alt media." FYI, alt simply means TRUTHFUL, so just substitute that when this term is used.
3. @realDonaldTrump Before the full #TRUTH can be disclosed, we must round up, arrest, and try for #Treason and other relevant crimes those involved with the #Cabal. I'm unclear on sequence here due to need to manage the public (civil unrest will be pushed by #Cabal).
2. @realDonaldTrump Next, we will need to expose, remove, and try for #Treason those involved in the #DeepState. Their nefarious work doesn't stop with the current #Coup but is widespread, including more recent scandals (#Benghazi, #IRS, #FastandFurious) but goes back decades.
8. @realDonaldTrump #UnsealTheIndictments Once we have all those involved...and it's a large number (~10K sealed indi

(7) The strongest argument for abortion on demand is the woman's bodily autonomy. Unfortunately for the left, that too fails, as soon as it's pointed out that half of babies are female. The argument for abortion after the age of viability (20 weeks) immediately fails, too.
[(11, 0.14869334), (18, 0.8464418)]
PS: just to confuse you guys even more, I have zero time for Jeff Sessions, but the man was pictured dining with Rosenstein for a reason, gang.

Steady as she goes, chaps. Hold your fire. We're getting to the end of this terrible period of US history now. Stay frosty. ;) https://t.co/nPehD1UiFb
If Andy McCabe opened a criminal investigation into Jeff Sessions but Mueller told Sessions lawyer that it had been closed, it was closed ***for a reason*** folks. And the reason wasn't "McCabe got it wrong, Sessions was transparent with Congress"
Jeff Sessions had no choice in the matter of firing McCabe, because of the Inspector General's report. 

I would caution everybody into reading to

3/ If Bitcoin is to become a foundation for human and machine peer-to-peer interaction that nobody can control or censor, each user needs to easily be able to run a full node
[(4, 0.9912037)]
5. Anons are quick with the memes...lol. https://t.co/l1WWTU3Rpz
9. Love it when anons generate these graphics showing connections between the news and Q's drops. The  DS doesn't care who they kill to achieve their goal. They want to shut Q comms down. Must be over the target.

(h/t @M2Madness) https://t.co/Y56OtRw46T
2. Looks like Zuckerberg and RT (Rex Tillerson??) met w/8 others incl 1 fmr IC Dir (Brennan?) to strategize plan &amp; 4am Clown Media narrative/talking points. Q team still monitoring thanks to algorithm provided by @Snowden. Checkmate time? FB on last legs? Nobody gets free pass. https://t.co/5wkZGCswac
7. I agree w/these anons. The bombings were DS FF to highlight Q as dangerous "fake news". I had never heard the term "fake news" until right after @wikileaks published the Podesta 

So 'microtargeting' is just a means. This is the crucial difference people aren't getting. Even the data for the microtargeting (derived illicitly it seems) isn't as important as operating principle of #CambridgeAnalytica - it's not campaigning, it's a military grade weapon
[(8, 0.1685253), (17, 0.13818404), (19, 0.6883199)]
Cambridge Analytica's intensive psy ops campign "directly targeted Bernie Sanders voters to prevent them from voting" for Hillary Clinton -- and overran search engines, making real information harder to find. #FakeNews #Propaganda
In addition to the Cambridge Analytica campaign, Russia targeted the same voters with similar messaging that reached 126 million Americans #TrumpRussia #TrumpColluded https://t.co/9Te7xGoGtX
Undercover reporter: "So the candidate is the puppet?"

#CambridgeAnalytica CEO Alexander Nix: "Always." https://t.co/jYrZrZfek5
"In June 2016, the Trump campaign was flailing... the Mercers... offered Trump a huge cash injection, but they insisted he

https://t.co/NzeX5VC6bX
[(17, 0.99515307)]
A corrupt media establishment is more dangerous —IMO— than having corrupt government. Hell, they ARE the government. Their silence on Julian’s torture tells all. 

#ReconnectJulian
You don’t have to like Julian. But know this: No one has come close to publishing what he has published, and NO ONE has EVER protected their sources like #Wikileaks has. Have you ever blown the whistle? If so, support him now. 

#ReconnectJulian
Thinking of @JulianAssange and how badly Hillary still wants him dead. 

Write to him.
“I was so ticked at him. Until I realized where he was headed..” #ReconnectJulian https://t.co/4bmNsbgz2G
Julian Assange has never had to retract a single story as #FakeNews. 

#Wikileaks has won every single court case that challenged their information. 

Every. Single. One. 

NO mainstream media network can ever say that. 

#ReconnectJulian
The publications Julian Assange has given us (undeservedly) have lifted the veil of corruption so 

7/ The other variable I mentioned, stability of the network, got a massive boost in 2017, when users fought off corporate and political attacks. As a network, #bitcoin showed reliability and built trust that it won't be altered by insiders or govts.
[(6, 0.111966826), (11, 0.42259702), (13, 0.12522286), (17, 0.33627245)]
Par défaut, les personnes, pages et listes que vous suivez/likez SONT PUBLIQUES.
C'est à partir de ces informations que Cambridge Analytica semble avoir si bien fonctionné. (4/5)
Facebook explique que « vous pouvez contrôler la plupart des informations que d’autres [vos amis] peuvent communiquer à des applications » mais que « ces contrôles ne vous permettent pas de limiter l’accès à vos informations publiques et à la liste de vos amis ». (3/5)
https://t.co/amBueh5LWf :
« si vous publiez quelque chose sur Facebook, toute personne qui peut y accéder [vos amis] peut permettre à d’autres (comme des jeux, des applications ou des sites web qu’ils utilisent) d’y accéder » (2

### Topic analysis

The relationship between the resulting topics can be analyzed with the pyLDAvis library, which plots the topics on a 2D map where the distance between two topics reflects how different they are.

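In [None]:
import pickle
import gensim
import pyLDAvis.gensim  # named pyLDAvis.gensim_models in pyLDAvis 3.x

# Load the saved dictionary, corpus and LDA model (paths in NDIC, NCOR, NMOD)
dictionary = gensim.corpora.Dictionary.load(NDIC)
with open(NCOR, 'rb') as f:
    corpus = pickle.load(f)
lda = gensim.models.ldamodel.LdaModel.load(NMOD)

# sort_topics=False keeps the topic order used by the model
lda_display = pyLDAvis.gensim.prepare(lda, corpus, dictionary, sort_topics=False)
pyLDAvis.display(lda_display)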
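
To give an intuition for what "distance between topics" means in the plot, here is a minimal sketch that computes the pairwise Jensen-Shannon divergence between topic-word distributions. As far as I know this is the measure pyLDAvis uses by default before projecting topics to 2D, but treat that as an assumption. The matrix `toy_topics` and the helper names `jensen_shannon` and `topic_distance_matrix` are made up for illustration and are not part of the notebook.

```python
import numpy as np


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # mixture of the two distributions

    def kl(a, b):
        mask = a > 0  # terms where a == 0 contribute nothing
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topic_distance_matrix(topic_term):
    """Pairwise Jensen-Shannon divergence between the rows of a topic-term matrix."""
    topic_term = np.asarray(topic_term, dtype=float)
    # Normalize each row so it is a probability distribution over terms
    topic_term = topic_term / topic_term.sum(axis=1, keepdims=True)
    n = topic_term.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = jensen_shannon(topic_term[i], topic_term[j])
    return dist


# Hypothetical toy data: 3 topics over a 4-word vocabulary.
# Topics 0 and 1 are similar; topic 2 puts its weight on different words.
toy_topics = [
    [0.70, 0.20, 0.05, 0.05],
    [0.65, 0.25, 0.05, 0.05],
    [0.05, 0.05, 0.20, 0.70],
]
distances = topic_distance_matrix(toy_topics)
print(np.round(distances, 3))
```

With the trained model, the same function could presumably be applied to `lda.get_topics()`, which returns gensim's topic-term matrix. Topics with a small divergence should then appear close together in the pyLDAvis map.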
