Table of contents: TODO

## Removal of giveaway posts

a) __Naive Bayes classification__ of FB posts to detect viral marketing.  
b) __remove whole threads__ that started with a giveaway post. 

Before removal: 114,826 documents  
After removal: 59,207 documents
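
To illustrate step a), here is a minimal multinomial Naive Bayes sketch in plain Python. It is not the `GiveawayClassifier` implementation, whose internals are not shown in this notebook; the class name `TinyNaiveBayes`, the whitespace tokenizer and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[label][word]
                    score += math.log((count + 1) / (total + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# toy training data, invented for illustration (1 = giveaway, 0 = other)
texts = [
    "vind en gratis rejse",
    "deltag og vind billetter",
    "vaccinen er sikker",
    "læs om bivirkninger her",
]
labels = [1, 1, 0, 0]

model = TinyNaiveBayes().fit(texts, labels)
prediction = model.predict("vind gratis billetter")
```

In this toy setup the words "vind", "gratis" and "billetter" only occur in giveaway documents, so the new text is assigned to the giveaway class.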


In [1]:
import pandas as pd

from src.giveaway.GiveawayClassifier import GiveawayClassifier
from src.utility.general import export_serialized

In [2]:
# load the dataset you wish to work on
df = pd.read_csv(
    'data/hpv_data_reactions_copy.csv',
    parse_dates = ['time']
)



Load training data for the classifier (494 documents).  

POST-level content found to contain Marie Louise's stopwords.  
Hand labeled by one person.

In [3]:
labeled = (pd.read_csv('data/200414_giveaway_training.csv')
           # drops 2 rows with a missing label (496 rows in original file)
           .dropna(subset=['giveaway']))

X = labeled['text']
y = labeled['giveaway']

Train the Giveaway Classifier.

In [4]:
gc = GiveawayClassifier(X=X, y=y)
gc.train()
gc.report

| | accuracy | brier_n | brier_giveaway | recall_n | recall_giveaway | precision_n | precision_giveaway |
|---|---|---|---|---|---|---|---|
| train | 0.973913 | 0.973913 | 0.026087 | 0.992832 | 0.893939 | 0.975352 | 0.967213 |
| test | 0.973154 | 0.973154 | 0.026846 | 0.984252 | 0.909091 | 0.984252 | 0.909091 |
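
For reference, the recall and precision columns in the report follow the usual definitions based on true positives, false positives and false negatives. A minimal sketch with made-up labels; the helper `precision_recall` is hypothetical and not part of `GiveawayClassifier`.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# toy labels, invented for illustration (1 = giveaway, 0 = not a giveaway)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

precision, recall = precision_recall(y_true, y_pred)
```

Here three giveaways are found, one is missed and one ordinary post is flagged by mistake, so both precision and recall come out at 3/4.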


Classify only POST-level content in the loaded dataset.  
The model classifies short comments unreliably.

In [5]:
df_post = df.query('content_type == "POST"')

giveawas_df = (gc
               .predict_new(df_post.text, negative_for_url=True)
               .query('predicted == 1')
               .rename(columns={'index': 'id_orig'})
              )

giveawas_df

| | id_orig | text | predicted |
|---|---|---|---|
| 21 | 109826 | `Et GODT svar :)\n\nhttps://www.facebook.com/g...` | 1.0 |
| 608 | 110413 | `VIND 2 PLADSER TIL VORES OVERDÅDIGE SKALDYRSBU...` | 1.0 |
| 636 | 110441 | `*** TILLYKKE TIL DEN HELDIGE VINDER : Christin...` | 1.0 |
| 668 | 110473 | `Velkommen til Ærø 😊\nhttps://www.facebook.com/...` | 1.0 |
| 705 | 110510 | `Konkurrence! I vores nye elektronikbutik, Capi...` | 1.0 |
| ... | ... | ... | ... |
| 4932 | 114737 | `Konkurrence: Vind et valgfrit ur fra Wooden Wo...` | 1.0 |
| 4990 | 114795 | `Stadig ledige pladser til årets julegave-works...` | 1.0 |
| 5008 | 114813 | `Yoga i bjergtagende landskaber. Et alternativt...` | 1.0 |
| 5010 | 114815 | `Nu er det snart jul - og det vil vi gerne fejr...` | 1.0 |


Remove the detected threads from the original dataset  
a) find the `post_id`s of posts that were classified as giveaways  
b) filter out every thread with such a `post_id`  
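
The two steps can be sketched on a toy dataset in plain Python. The records and IDs are invented for illustration; the notebook itself performs the same filtering on the pandas dataframe.

```python
# toy thread data: each row belongs to a thread identified by post_id
rows = [
    {"id": 0, "post_id": "p1", "content_type": "POST"},
    {"id": 1, "post_id": "p1", "content_type": "COMMENT"},
    {"id": 2, "post_id": "p2", "content_type": "POST"},
    {"id": 3, "post_id": "p2", "content_type": "COMMENT"},
    {"id": 4, "post_id": "p1", "content_type": "COMMENT"},
]

# row ids of posts the classifier flagged as giveaways (assumed here)
giveaway_ids = {0}

# a) collect the post_id of every flagged post
bad_threads = {row["post_id"] for row in rows if row["id"] in giveaway_ids}

# b) drop every row, post or comment, that belongs to a flagged thread
kept = [row for row in rows if row["post_id"] not in bad_threads]
```

Removing whole threads rather than single posts is what explains the large drop in document count: the comments under a giveaway post go with it.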

In [6]:
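# look up the post_id of every post classified as a giveaway
# (id_orig holds the original index labels of df)
bad_threads = df.loc[giveawas_df.id_orig, 'post_id'].tolist()

# remove bad threads
# (in query(), `!=` against a list behaves like `not in`)
S1_giveaway_removed = df.query('post_id != @bad_threads')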

# save whole dataframe
S1_giveaway_removed.to_csv('data/S1_giveaway_removed.csv')

# save texts with ID
export_serialized(
    df=S1_giveaway_removed,
    column='text',
    path='data/S2_text_id.ndjson'
)

<br>

## Preprocessing
_[text_to_x](https://github.com/centre-for-humanities-computing/text_to_x)_

a) __tokens__, __lemmas__, __POS__ & __dependency parsing__ using [Stanza](https://github.com/stanfordnlp/stanza)  
b) __NER__ using [Flair](https://github.com/flairNLP/flair)

This step takes a long time to run, so it is recommended that you run it from the terminal.

```bash
cd hpv-vaccine
python3 src/preprocessing.py -p data/S2_text_id.ndjson -o data/S3_prep.ndjson --lang 'da' --jobs 4 --bugstring True
```


<br>

## Feature selection 

a) __Filter out non-meaningful Parts of Speech from all texts__.   
Only nouns, proper nouns, adjectives, verbs and adverbs are kept.


b) __Neural detection of phrases__.  
If two tokens appear together often, they will be concatenated into a single token.
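
Both steps can be sketched in plain Python. This is a simplified, count-based stand-in and not the `phraser` module used in this notebook; the function names, the `min_count` threshold, the underscore separator and the use of Universal POS tag names (`PROPN`, `ADV`) are assumptions.

```python
from collections import Counter

# POS tags to keep (Universal POS names, assumed here)
KEEP_POS = {"NOUN", "PROPN", "ADJ", "VERB", "ADV"}


def filter_pos(doc):
    """Keep tokens whose tag is in KEEP_POS; `doc` is a list of (token, pos)."""
    return [tok for tok, pos in doc if pos in KEEP_POS]


def join_frequent_bigrams(docs, min_count=2):
    """Concatenate adjacent token pairs seen at least `min_count` times."""
    bigrams = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    joined_docs = []
    for doc in docs:
        out, i = [], 0
        while i < len(doc):
            if i + 1 < len(doc) and bigrams[(doc[i], doc[i + 1])] >= min_count:
                out.append(doc[i] + "_" + doc[i + 1])
                i += 2  # skip the second half of the joined pair
            else:
                out.append(doc[i])
                i += 1
        joined_docs.append(out)
    return joined_docs


# toy tagged documents, invented for illustration
tagged = [
    [("hpv", "NOUN"), ("vaccine", "NOUN"), ("er", "AUX"), ("sikker", "ADJ")],
    [("min", "DET"), ("datter", "NOUN"), ("fik", "VERB"),
     ("hpv", "NOUN"), ("vaccine", "NOUN")],
]

filtered = [filter_pos(doc) for doc in tagged]
phrased = join_frequent_bigrams(filtered)
```

The pair "hpv vaccine" occurs twice in the filtered toy corpus, so it is merged into the single token `hpv_vaccine`, while pairs seen only once are left alone.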

In [6]:
import ndjson

from src.utility import phraser
from src.utility.general import load_data

In [None]:
# import preprocessed data
texts_id = load_data('data/S3_prep.ndjson')

# phraser has both a) & b) functionality
texts_phrased = phraser.train(
    texts_id,
    lang='da',
    out_path='data/S4_fb_phrase.ndjson'
)

# texts only
texts = [doc['text'] for doc in texts_phrased]
# ids only
ids = [doc['id'] for doc in texts_phrased]

In [7]:
# ### in case you don't want to run the phraser each time
# # text data
# with open('data/S4_fb_phrase.ndjson') as f:
#     texts_phrased = ndjson.load(f)

# # texts only
# texts = [doc['text'] for doc in texts_phrased]
# # ids only
# ids = [doc['id'] for doc in texts_phrased]

<br>

## Seed selection

a) __Train a CBOW model__  
It is used to find words related to the query terms.  
Intentions behind the parameters:
- words that appear together in the whole FB post (window=20)
- frequent words, so that the seeds are generalizable (min_count=100)

_comment: potentially this could be taken care of by PmiSvdEmbeddings._

b) __Enhance phrase list__  
Add synonyms and related words to a given phrase list. This list will be used to guide the topic model.

In [None]:
from gensim.models import Word2Vec, KeyedVectors

# from src.embeddings.pmisvd import PmiSvdEmbeddings
from src.embeddings.query_ops import import_query, get_related

Import desired seeds in a long csv format.  
The seeds to be enhanced are in a single column (col).

In [None]:
# import phrase list
query_list = import_query(
    ordlist_path='data/200818_hpv_query.csv',
    lang='da',
    col='term'
)

Train the CBOW model and get {topn} related words to each term.  
A related word must appear at least {cutoff} times in the dataset (50 in the call below).
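
The roles of `topn` and `cutoff` can be illustrated with a small cosine-similarity sketch in plain Python. This is not the `get_related` implementation, whose internals are not shown here; the helper `related_words`, the toy vectors and the word counts are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_words(term, vectors, counts, topn=10, cutoff=50):
    """Return up to `topn` nearest words that occur at least `cutoff` times."""
    candidates = [
        (word, cosine(vectors[term], vec))
        for word, vec in vectors.items()
        if word != term and counts[word] >= cutoff
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in candidates[:topn]]


# toy embeddings and corpus frequencies, invented for illustration
vectors = {
    "vaccine": [1.0, 0.1],
    "vaccination": [0.9, 0.2],
    "stik": [0.8, 0.3],
    "ferie": [0.0, 1.0],
}
counts = {"vaccine": 500, "vaccination": 120, "stik": 30, "ferie": 200}

related = related_words("vaccine", vectors, counts, topn=2, cutoff=50)
```

In this toy example "stik" is close to "vaccine" but is dropped because it falls below the frequency cutoff, which is the intended effect: rare words make poor, non-generalizable seeds.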

In [None]:
# train a cbow model
# (`size` and `iter` are gensim 3.x names; gensim 4 renamed them to `vector_size` and `epochs`)
cbow_texts = Word2Vec(
    texts,
    size=100, window=20, min_count=100,
    sg=0, hs=0,
    iter=500, workers=4
)

# get a list of words similar to those in the phrase list
query_related = get_related(cbow_texts.wv, query_list, topn=10, cutoff=50)

The model can also be browsed from here.

In [None]:
get_related(cbow_texts.wv, ['køn'], topn=10, cutoff=50)

Add topic labels & export

In [None]:
# add topic labels to the enhanced list
topic = pd.read_csv('data/200818_hpv_query.csv')
enhanced_topic = pd.merge(query_related, topic, on='term')

# save
(enhanced_topic
 .to_csv('data/S5_query_related.csv'))

Now the seeds have to be __manually redacted__.

<br>

## Topic modeling

In [1]:
from itertools import product

import pandas as pd

from src.lda.asymmetric import grid_search_lda_ASM
from src.lda.seeded import grid_search_lda_SED
from src.utility.general import compile_report

In [2]:
# extract topic seeds
S6_query_redacted = pd.read_csv('data/S6_query_redacted.csv')
# one list of seed words per topic
seeds = (S6_query_redacted
         .dropna(subset=['related'])
         .groupby('topic')['related']
         .apply(list)
         .tolist())

In [13]:
len(seeds)

12
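
The seed extraction above turns the redacted table into one list of seed words per topic, which is why `len(seeds)` equals the number of seeded topics. A plain-Python sketch of the same grouping, with invented topic names and seed words:

```python
from collections import defaultdict

# toy rows standing in for the redacted query table (invented values)
rows = [
    {"topic": "side_effects", "related": "bivirkning"},
    {"topic": "side_effects", "related": "smerte"},
    {"topic": "cancer", "related": "kræft"},
    {"topic": "cancer", "related": None},  # dropped, like dropna(subset=['related'])
]

grouped = defaultdict(list)
for row in rows:
    if row["related"] is not None:
        grouped[row["topic"]].append(row["related"])

# pandas groupby sorts the group keys by default, so sort here as well
seeds = [grouped[topic] for topic in sorted(grouped)]
```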

### Seeded LDA

a) pick a folder to save the results to (`batch_sed`)  
b) pick priors (`priors_range`). Each tuple is a pair of alpha and eta.  
c) train using `grid_search_lda_SED()`  
d) evaluate models by topic coherence using `compile_report()`  

In [3]:
# please change destination folder here
batch_sed = 'models/200820_seed_tfidf_2000iter/'

In [4]:
# pick priors
alpha_range = [0.05, 0.1, 0.5, 1]
eta_range = [0.05, 0.1, 0.5, 1]

priors_range = list(product(alpha_range, eta_range))

In [9]:
# train
grid_search_lda_SED(
    texts=texts,
    seed_topic_list=seeds,
    n_topics_range=range(12, 30),
    priors_range=priors_range,
    out_dir=batch_sed,
    n_top_words=10,
    vectorizer_type='count',
    iterations=2000,
    save_doc_top=True,
    verbose=False
)
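
To get a feel for the size of this search, the sketch below counts the combinations of topic number and prior pair. It assumes that `grid_search_lda_SED()` trains one model per combination, which is not confirmed by anything shown in this notebook.

```python
from itertools import product

# same values as in the cells above
alpha_range = [0.05, 0.1, 0.5, 1]
eta_range = [0.05, 0.1, 0.5, 1]
priors_range = list(product(alpha_range, eta_range))  # (alpha, eta) pairs
n_topics_range = range(12, 30)                        # 12, 13, ..., 29

# every (n_topics, (alpha, eta)) combination in the grid
grid = list(product(n_topics_range, priors_range))
n_models = len(grid)
```

With 16 prior pairs and 18 topic counts this gives 288 combinations, each run for 2000 iterations under the settings above, so the search is worth running in the background.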

  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype

  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype

  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype

  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype

  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype

  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype, int):
  if sparse and not np.issubdtype(doc_word.dtype

In [10]:
# evaluate: compile the per-model report lines of this batch into one table
compile_report(batch_sed + 'report_lines/')

Unnamed: 0,model,n_top,alpha,eta,training_time,coh_score,coh_topic
0,14T_13I_seed,14,0.05,0.05,93.37142,-2.207671,"[-2.297570107772246, -2.2970567761455105, -3.8..."
1,12T_1I_seed,12,0.05,0.05,86.365941,-2.209302,"[-3.113918331194481, -2.5184405998387027, -3.0..."
2,13T_12I_seed,13,0.05,0.05,89.366744,-2.229409,"[-2.245147496876341, -1.7135920109868719, -2.6..."
3,19T_18I_seed,19,0.05,0.05,115.826513,-2.326914,"[-4.308687519908912, -2.179636696544568, -4.01..."
4,22T_21I_seed,22,0.05,0.05,128.095669,-2.347583,"[-5.026304693073706, -1.8060117125169737, -2.7..."
5,20T_19I_seed,20,0.05,0.05,117.725147,-2.357374,"[-2.3067563108322173, -1.8376075695404843, -4...."
6,25T_24I_seed,25,0.05,0.05,137.921025,-2.376776,"[-6.211970302400729, -1.7350799582227023, -2.0..."
7,24T_23I_seed,24,0.05,0.05,137.43387,-2.386515,"[-3.18861277283595, -1.83961449123678, -3.4533..."
8,18T_17I_seed,18,0.05,0.05,112.326309,-2.396821,"[-6.380083429608378, -1.8596222070079023, -2.5..."
9,15T_14I_seed,15,0.05,0.05,100.304887,-2.417185,"[-2.4301566304406843, -1.7371794643674494, -2...."


### "Asymmetric" LDA

In [11]:
# please change destination folder here
batch_asm = 'models/200822_asm/'

In [12]:
grid_search_lda_ASM(
    texts=texts,
    n_topics_range=range(5, 31, 1),
    iterations=2000,
    passes=2,
    out_dir=batch_asm,
    verbose=False,
    save_doc_top=True,
)
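The internals of `grid_search_lda_ASM` are not shown in this notebook. As a rough sketch of the general pattern only (one model per topic count, one report line per model), the loop below uses a hypothetical `train_and_score` callable as a stand-in for the real training and coherence step. The `<n>T_ASM` naming and the report fields follow the report table in this section; everything else is an assumption.

```python
import time


def grid_search(n_topics_range, train_and_score):
    """Train one model per topic count and collect a report line for each.

    `train_and_score` is a hypothetical callable: it takes the number of
    topics and returns a coherence score for the fitted model.
    """
    report = []
    for n_top in n_topics_range:
        start = time.perf_counter()
        coh_score = train_and_score(n_top)
        report.append({
            'model': f'{n_top}T_ASM',  # naming pattern seen in the report table
            'n_top': n_top,
            'training_time': time.perf_counter() - start,
            'coh_score': coh_score,
        })
    return report


def placeholder_score(n_top):
    """Placeholder so the sketch runs without a topic-model library."""
    return 0.0


report = grid_search(range(5, 31, 1), placeholder_score)
print(len(report), report[0]['model'], report[-1]['model'])
```

With `range(5, 31, 1)` the loop covers 26 topic counts, from 5 to 30 inclusive.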

In [13]:
compile_report(batch_asm + 'report_lines/')

Unnamed: 0,model,n_top,alpha,eta,training_time,coh_score,coh_topic
0,6T_ASM,6,"[0.3298124074935913, 0.3872484266757965, 0.220...","[1.3929634094238281, 0.7351519465446472, 0.678...",21.485971,0.540152,"[0.529720051062926, 0.639857440303258, 0.43173..."
1,10T_ASM,10,"[0.19344855844974518, 0.08479581773281097, 0.1...","[0.6174671649932861, 0.2936669588088989, 0.155...",22.365427,0.505155,"[0.4423418663228607, 0.5418130223127049, 0.336..."
2,5T_ASM,5,"[0.1921791136264801, 0.45144397020339966, 0.19...","[5.437136650085449, 0.8440811038017273, 0.5494...",19.564418,0.504196,"[0.42421821713678937, 0.45770508588816633, 0.4..."
3,9T_ASM,9,"[0.34198513627052307, 0.11371507495641708, 0.1...","[0.23478837311267853, 0.6338014602661133, 0.21...",22.179205,0.496829,"[0.6196989500481827, 0.3581051427695182, 0.417..."
4,7T_ASM,7,"[0.3161607086658478, 0.8285216689109802, 0.150...","[0.6112836003303528, 0.6879484057426453, 0.418...",20.464174,0.462062,"[0.3742076181319628, 0.614157467406206, 0.4480..."
5,8T_ASM,8,"[0.10406914353370667, 0.18141095340251923, 1.7...","[0.9625723958015442, 0.49314406514167786, 0.26...",21.081144,0.456428,"[0.5529821105695587, 0.21823768580601452, 0.56..."
6,13T_ASM,13,"[0.11990565061569214, 0.13796326518058777, 0.0...","[0.16414712369441986, 0.3473307192325592, 0.26...",24.648457,0.456253,"[0.3034172640072536, 0.632230480456957, 0.3679..."
7,16T_ASM,16,"[0.1193760335445404, 0.09221818298101425, 0.08...","[0.11086179316043854, 0.11109703779220581, 0.0...",24.837307,0.434935,"[0.3945412616231427, 0.3390824009437571, 0.378..."
8,14T_ASM,14,"[0.24182115495204926, 0.08691311627626419, 0.1...","[0.16693948209285736, 0.12412422895431519, 0.1...",25.475473,0.433006,"[0.6068167704118306, 0.4011406086447936, 0.537..."
9,11T_ASM,11,"[0.11412285268306732, 0.12344231456518173, 0.1...","[0.36364489793777466, 0.1969403475522995, 0.14...",22.686989,0.430342,"[0.34666188624751293, 0.36374067770087903, 0.3..."
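Both report tables are listed in descending order of `coh_score`. How `compile_report` produces that order is not shown here, so the following is only a minimal sketch of ranking report lines by coherence, using three rows copied from the table above. The helper name `rank_by_coherence` is hypothetical. The seeded and asymmetric batches report scores on visibly different scales (negative versus positive), so the sketch ranks within a single batch only.

```python
# Three report lines copied from the asymmetric batch above (other columns omitted).
rows = [
    {'model': '10T_ASM', 'n_top': 10, 'coh_score': 0.505155},
    {'model': '6T_ASM', 'n_top': 6, 'coh_score': 0.540152},
    {'model': '5T_ASM', 'n_top': 5, 'coh_score': 0.504196},
]


def rank_by_coherence(rows, top_n=10):
    """Return the top_n report lines, highest coherence first."""
    return sorted(rows, key=lambda r: r['coh_score'], reverse=True)[:top_n]


ranked = rank_by_coherence(rows)
best = ranked[0]
print(best['model'], best['coh_score'])
```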


<br>

## Model evolution

In [None]:
import src.topicevolution.run_ntr as ntr 

In [None]:
# TODO: is there a cleaner way to locate this file?
# Could the path be derived from batch_asm instead of being hard-coded?
import ndjson

with open('models/200811_asm/doctop_mats/10T_ASM_mat.ndjson') as f:
    doctop = ndjson.load(f)
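One possible answer to the question in the cell above, as a sketch rather than a drop-in replacement: build the path from the batch folder and the model name. The helper names `doctop_path` and `load_ndjson` are hypothetical, and the folder layout (`doctop_mats/<model>_mat.ndjson`) is inferred from the hard-coded path. Note that `batch_asm` is set to `models/200822_asm/` earlier, while this cell reads from `models/200811_asm/`, so the folder is left as a parameter. The loader uses the standard `json` module and assumes the file holds one JSON value per line.

```python
import json
import os
import tempfile


def doctop_path(batch_dir, model_name):
    """Build the path to a model's document-topic matrix inside a batch folder."""
    return os.path.join(batch_dir, 'doctop_mats', f'{model_name}_mat.ndjson')


def load_ndjson(path):
    """Read newline-delimited JSON: one JSON value per non-empty line."""
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]


# Demonstration on a throwaway folder with two made-up rows.
with tempfile.TemporaryDirectory() as batch_dir:
    path = doctop_path(batch_dir, '10T_ASM')
    os.makedirs(os.path.dirname(path))
    with open(path, 'w', encoding='utf-8') as f:
        f.write('[0.7, 0.3]\n[0.2, 0.8]\n')
    demo = load_ndjson(path)

print(demo)
```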

In [None]:
# sanity check: expect one document-topic row per document ID
len(doctop) == len(ids)

In [None]:
ntr.process_windows(
    doc_top_prob=doctop,
    ID=ids,
    window=[50, 100, 200],
    out_dir='models/200811_asm/ntr/10T_ASM/'
)
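The code of `run_ntr` is not shown here. Assuming `ntr` stands for novelty, transience and resonance computed over document-topic distributions, a minimal sketch of those measures for a single window size could look like the following. Novelty is taken as the mean KL divergence of a document from the `window` documents before it, transience as the mean divergence from the `window` documents after it, and resonance as their difference. The function names and the handling of edge documents (returned as `None`) are assumptions, not the project's implementation.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in bits for two discrete distributions of equal length."""
    return sum(
        pi * math.log2((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def novelty_transience_resonance(doc_top, window):
    """Return (novelty, transience, resonance) per document.

    Documents without a full window on both sides get None.
    """
    n = len(doc_top)
    results = []
    for i in range(n):
        if i < window or i + window >= n:
            results.append(None)
            continue
        past = doc_top[i - window:i]
        future = doc_top[i + 1:i + 1 + window]
        novelty = sum(kl_divergence(doc_top[i], d) for d in past) / window
        transience = sum(kl_divergence(doc_top[i], d) for d in future) / window
        results.append((novelty, transience, novelty - transience))
    return results


# Toy input: the topic mix shifts at the third document and stays shifted.
toy = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
scores = novelty_transience_resonance(toy, window=2)
print(scores[2])
```

In the toy input the third document differs from its past but matches its future, so its novelty is positive, its transience is zero and its resonance is positive. To cover several window sizes, as in the `window=[50, 100, 200]` argument above, the function would be called once per size.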

<br>

## Topic usage