# Gensim `Doc2Vec` Tutorial on the IMDB Sentiment Dataset  <--- an edit by F_D

## Load corpus 
This part is heavily edited to allow me to import an CSV file 
The file is two columns, the first column containing the class (Binary) and the second column containing the text

In [1]:
import locale
import glob
import os.path
import requests
import tarfile
import sys
import codecs
from smart_open import smart_open
import re
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import gensim
import os
import collections
import smart_open
import random    



In [2]:
from random import randint
def normalize_text(text):
    norm_text = text.lower()
    # Replace breaks with spaces
    norm_text = norm_text.replace('<br />', ' ')
    # Pad punctuation with spaces on both sides
    norm_text = re.sub(r"([\.\",\(\)!\?;:])", " \\1 ", norm_text)
    return norm_text
from collections import namedtuple

alldocs = []
SentimentDocument = namedtuple('SentimentDocument', 'words tags split Class')

#Note that split isn't really required, and tags is random gen because I was to tried to actually fix it :(


fname = 'mbti_2.csv'
with smart_open.smart_open(fname, encoding="iso-8859-1") as f:        
    for line_no, line in enumerate(f):
            if line_no==0:
                pass
            
            else:
                words = gensim.utils.simple_preprocess(line)[1:]
                tags = [randint(0,9)]
                split = ['train', 'test', ][line_no//1200]
                
                if gensim.utils.simple_preprocess(line)[0] == 'depression':
                    
                    Class = 1.0 
                else:
                    Class = 0.0 
                alldocs.append(SentimentDocument(words, tags, split , Class))
         
                 
train_docs = [doc for doc in alldocs if doc.split == 'train']
test_docs = [doc for doc in alldocs if doc.split == 'test']

print('%d docs: %d train-sentiment, %d test-sentiment' % (len(alldocs), len(train_docs), len(test_docs)))
print("Import done, total len is",len(alldocs))
total_num_obs = len(alldocs)
from random import shuffle
doc_list = alldocs[:]  
shuffle(doc_list)

# create train and test by just doing a 75/25% split
from math import floor, ceil
train_docs = alldocs[0:floor(3*total_num_obs/4)]
test_docs = alldocs[floor(3*total_num_obs/4):]
print("Lenght of train is:",len(train_docs))
print("Lenght of test is:",len(test_docs))

1547 docs: 1199 train-sentiment, 348 test-sentiment
Import done, total len is 1547
Lenght of train is: 1160
Lenght of test is: 387


The remaining part of the document should be as normal. 

In [5]:
from random import shuffle
doc_list = alldocs[:]  
shuffle(doc_list)

## Set-up Doc2Vec Training & Evaluation Models

We approximate the experiment of Le & Mikolov ["Distributed Representations of Sentences and Documents"](http://cs.stanford.edu/~quocle/paragraph_vector.pdf) with guidance from Mikolov's [example go.sh](https://groups.google.com/d/msg/word2vec-toolkit/Q49FIrNOQRo/J6KG8mUj45sJ):

`./word2vec -train ../alldata-id.txt -output vectors.txt -cbow 0 -size 100 -window 10 -negative 5 -hs 0 -sample 1e-4 -threads 40 -binary 0 -iter 20 -min-count 1 -sentence-vectors 1`

We vary the following parameter choices:
* 100-dimensional vectors, as the 400-d vectors of the paper take a lot of memory and, in our tests of this task, don't seem to offer much benefit
* Similarly, frequent word subsampling seems to decrease sentiment-prediction accuracy, so it's left out
* `cbow=0` means skip-gram which is equivalent to the paper's 'PV-DBOW' mode, matched in gensim with `dm=0`
* Added to that DBOW model are two DM models, one which averages context vectors (`dm_mean`) and one which concatenates them (`dm_concat`, resulting in a much larger, slower, more data-hungry model)
* A `min_count=2` saves quite a bit of model memory, discarding only words that appear in a single doc (and are thus no more expressive than the unique-to-each doc vectors themselves)

In [6]:
%%time
from gensim.models import Doc2Vec
import gensim.models.doc2vec
from collections import OrderedDict
import multiprocessing

cores = multiprocessing.cpu_count()
assert gensim.models.doc2vec.FAST_VERSION > -1, "This will be painfully slow otherwise"

simple_models = [
    # PV-DBOW plain
    Doc2Vec(dm=0, vector_size=100, negative=5, hs=0, min_count=2, sample=0, 
            epochs=20, workers=cores),
    # PV-DM w/ default averaging; a higher starting alpha may improve CBOW/PV-DM modes
    Doc2Vec(dm=1, vector_size=100, window=10, negative=5, hs=0, min_count=2, sample=0, 
            epochs=20, workers=cores, alpha=0.05, comment='alpha=0.05'),
    # PV-DM w/ concatenation - big, slow, experimental mode
    # window=5 (both sides) approximates paper's apparent 10-word total window size
    Doc2Vec(dm=1, dm_concat=1, vector_size=100, window=5, negative=5, hs=0, min_count=2, sample=0, 
            epochs=20, workers=cores),
]

for model in simple_models:
    model.build_vocab(alldocs)
    print("%s vocabulary scanned & state initialized" % model)

models_by_name = OrderedDict((str(model), model) for model in simple_models)

Doc2Vec(dbow,d100,n5,mc2,t8) vocabulary scanned & state initialized
Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t8) vocabulary scanned & state initialized
Doc2Vec(dm/c,d100,n5,w5,mc2,t8) vocabulary scanned & state initialized
Wall time: 369 ms


Le and Mikolov notes that combining a paragraph vector from Distributed Bag of Words (DBOW) and Distributed Memory (DM) improves performance. We will follow, pairing the models together for evaluation. Here, we concatenate the paragraph vectors obtained from each model with the help of a thin wrapper class included in a gensim test module. (Note that this a separate, later concatenation of output-vectors than the kind of input-window-concatenation enabled by the `dm_concat=1` mode above.)

In [7]:
from gensim.test.test_doc2vec import ConcatenatedDoc2Vec
models_by_name['dbow+dmm'] = ConcatenatedDoc2Vec([simple_models[0], simple_models[1]])
models_by_name['dbow+dmc'] = ConcatenatedDoc2Vec([simple_models[0], simple_models[2]])

## Predictive Evaluation Methods

Let's define some helper methods for evaluating the performance of our Doc2vec using paragraph vectors. We will classify document sentiments using a logistic regression model based on our paragraph embeddings. We will compare the error rates based on word embeddings from our various Doc2vec models.

In [8]:
import numpy as np
import statsmodels.api as sm
from random import sample
    
def logistic_predictor_from_data(train_targets, train_regressors):
    """Fit a statsmodel logistic predictor on supplied data"""
    logit = sm.Logit(train_targets, train_regressors)
    predictor = logit.fit(disp=0)
    # print(predictor.summary())
    return predictor

def error_rate_for_model(test_model, train_set, test_set, 
                         reinfer_train=False, reinfer_test=False, 
                         infer_steps=None, infer_alpha=None, infer_subsample=0.2):
    """Report error rate on test_doc sentiments, using supplied model and train_docs"""

    train_targets = [doc.Class for doc in train_set]
    if reinfer_train:
        train_regressors = [test_model.infer_vector(doc.words, steps=infer_steps, alpha=infer_alpha) for doc in train_set]
    else:
        train_regressors = [test_model.docvecs[doc.tags[0]] for doc in train_set]
    train_regressors = sm.add_constant(train_regressors)
    predictor = logistic_predictor_from_data(train_targets, train_regressors)

    test_data = test_set
    if reinfer_test:
        if infer_subsample < 1.0:
            test_data = sample(test_data, int(infer_subsample * len(test_data)))
        test_regressors = [test_model.infer_vector(doc.words, steps=infer_steps, alpha=infer_alpha) for doc in test_data]
    else:
        test_regressors = [test_model.docvecs[doc.tags[0]] for doc in test_docs]
    test_regressors = sm.add_constant(test_regressors)
    
    # Predict & evaluate
    test_predictions = predictor.predict(test_regressors)
    corrects = sum(np.rint(test_predictions) == [doc.Class for doc in test_data])
    errors = len(test_predictions) - corrects
    error_rate = float(errors) / len(test_predictions)
    return (error_rate, errors, len(test_predictions), predictor)

  from pandas.core import datetools


## Bulk Training & Per-Model Evaluation

Note that doc-vector training is occurring on *all* documents of the dataset, which includes all TRAIN/TEST/DEV docs.

We evaluate each model's sentiment predictive power based on error rate, and the evaluation is done for each model. 

(On a 4-core 2.6Ghz Intel Core i7, these 20 passes training and evaluating 3 main models takes about an hour.)

In [9]:
from collections import defaultdict
error_rates = defaultdict(lambda: 1.0)  # To selectively print only best errors achieved

In [11]:
for model in simple_models: 
    print("Training %s" % model)
    %time model.train(doc_list, total_examples=len(doc_list), epochs=model.epochs)
    
    print("\nEvaluating %s" % model)
    err_rate, err_count, test_count, predictor = error_rate_for_model(model, train_docs, test_docs)
    error_rates[str(model)] = err_rate
    print("\n%f %s\n" % (err_rate, model))

Training Doc2Vec(dbow,d100,n5,mc2,t8)
Wall time: 1.32 s

Evaluating Doc2Vec(dbow,d100,n5,mc2,t8)

1.000000 Doc2Vec(dbow,d100,n5,mc2,t8)

Training Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t8)




Wall time: 2.22 s

Evaluating Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t8)

1.000000 Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t8)

Training Doc2Vec(dm/c,d100,n5,w5,mc2,t8)




Wall time: 2.82 s

Evaluating Doc2Vec(dm/c,d100,n5,w5,mc2,t8)

1.000000 Doc2Vec(dm/c,d100,n5,w5,mc2,t8)





In [18]:
for model in [models_by_name['dbow+dmm'], models_by_name['dbow+dmc']]: 
    print("\nEvaluating %s" % model)
    %time err_rate, err_count, test_count, predictor = error_rate_for_model(model, train_docs, test_docs)
    error_rates[str(model)] = err_rate
    print("\n%f %s\n" % (err_rate, model))


Evaluating <gensim.test.test_doc2vec.ConcatenatedDoc2Vec object at 0x000001A60BEB29E8>
Wall time: 169 ms

1.000000 <gensim.test.test_doc2vec.ConcatenatedDoc2Vec object at 0x000001A60BEB29E8>


Evaluating <gensim.test.test_doc2vec.ConcatenatedDoc2Vec object at 0x000001A60BEB2E10>




Wall time: 178 ms

1.000000 <gensim.test.test_doc2vec.ConcatenatedDoc2Vec object at 0x000001A60BEB2E10>



## Achieved Sentiment-Prediction Accuracy

In [19]:
# Compare error rates achieved, best-to-worst
print("Err_rate Model")
for rate, name in sorted((rate, name) for name, rate in error_rates.items()):
    print("%f %s" % (rate, name))

Err_rate Model
1.000000 <gensim.test.test_doc2vec.ConcatenatedDoc2Vec object at 0x000001A60BEB29E8>
1.000000 <gensim.test.test_doc2vec.ConcatenatedDoc2Vec object at 0x000001A60BEB2E10>
1.000000 Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t8)
1.000000 Doc2Vec(dbow,d100,n5,mc2,t8)
1.000000 Doc2Vec(dm/c,d100,n5,w5,mc2,t8)


In our testing, contrary to the results of the paper, on this problem, PV-DBOW alone performs as good as anything else. Concatenating vectors from different models only sometimes offers a tiny predictive improvement – and stays generally close to the best-performing solo model included. 

The best results achieved here are just around 10% error rate, still a long way from the paper's reported 7.42% error rate. 

(Other trials not shown, with larger vectors and other changes, also don't come close to the paper's reported value. Others around the net have reported a similar inability to reproduce the paper's best numbers. The PV-DM/C mode improves a bit with many more training epochs – but doesn't reach parity with PV-DBOW.)

## Examining Results

### Are inferred vectors close to the precalculated ones?

In [20]:
doc_id = np.random.randint(simple_models[0].docvecs.count)  # Pick random doc; re-run cell for more examples
print('for doc %d...' % doc_id)
for model in simple_models:
    inferred_docvec = model.infer_vector(alldocs[doc_id].words)
    print('%s:\n %s' % (model, model.docvecs.most_similar([inferred_docvec], topn=3)))

for doc 7...
Doc2Vec(dbow,d100,n5,mc2,t8):
 [(4, 0.7876366972923279), (6, 0.6072877645492554), (1, 0.567745566368103)]
Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t8):
 [(4, 0.47035813331604004), (6, 0.3715808689594269), (0, 0.3621639013290405)]
Doc2Vec(dm/c,d100,n5,w5,mc2,t8):
 [(6, 0.9173181653022766), (9, 0.9059411287307739), (4, 0.8942681550979614)]


(Yes, here the stored vector from 20 epochs of training is usually one of the closest to a freshly-inferred vector for the same words. Defaults for inference may benefit from tuning for each dataset or model parameters.)

### Do close documents seem more related than distant ones?

In [21]:
import random

doc_id = np.random.randint(simple_models[0].docvecs.count)  # pick random doc, re-run cell for more examples
model = random.choice(simple_models)  # and a random model
sims = model.docvecs.most_similar(doc_id, topn=model.docvecs.count)  # get *all* similar documents
print(u'TARGET (%d): «%s»\n' % (doc_id, ' '.join(alldocs[doc_id].words)))
print(u'SIMILAR/DISSIMILAR DOCS PER MODEL %s:\n' % model)
for label, index in [('MOST', 0), ('MEDIAN', len(sims)//2), ('LEAST', len(sims) - 1)]:
    print(u'%s %s: «%s»\n' % (label, sims[index], ' '.join(alldocs[sims[index][0]].words)))

TARGET (8): «so sick of everything nothing goes right for me no one likes me don have friends and think my parents are starting to hate me slowly have done nothing with my life so useless wish could give my life to someone that deserves it have no clue how will kill my self but will do it before september ends hope it not painful goodbye»

SIMILAR/DISSIMILAR DOCS PER MODEL Doc2Vec(dbow,d100,n5,mc2,t8):

MOST (1, 0.3715978264808655): «since my first moments of conscious thought believed that was different to others while others were naturally inclined towards happiness and contentedness was not like them always thought was simply in bad stage that once was finished in this period be it school work etc would be happy and have purpose however finally realized that it was in myself that some are not made for happiness have finally reached that conclusion that hate myself and that anything else is an excuse and pathetic one at that the meaningless of life is cruel but while most simply igno

Somewhat, in terms of reviewer tone, movie genre, etc... the MOST cosine-similar docs usually seem more like the TARGET than the MEDIAN or LEAST... especially if the MOST has a cosine-similarity > 0.5. Re-run the cell to try another random target document.

### Do the word vectors show useful similarities?

In [22]:
word_models = simple_models[:]

In [23]:
import random
from IPython.display import HTML
# pick a random word with a suitable number of occurences
while True:
    word = random.choice(word_models[0].wv.index2word)
    if word_models[0].wv.vocab[word].count > 10:
        break
# or uncomment below line, to just pick a word from the relevant domain:
#word = 'comedy/drama'
similars_per_model = [str(model.wv.most_similar(word, topn=20)).replace('), ','),<br>\n') for model in word_models]
similar_table = ("<table><tr><th>" +
    "</th><th>".join([str(model) for model in word_models]) + 
    "</th></tr><tr><td>" +
    "</td><td>".join(similars_per_model) +
    "</td></tr></table>")
print("most similar words for '%s' (%d occurences)" % (word, simple_models[0].wv.vocab[word].count))
HTML(similar_table)

most similar words for 'seeing' (24 occurences)


"Doc2Vec(dbow,d100,n5,mc2,t8)","Doc2Vec(""alpha=0.05"",dm/m,d100,n5,w10,mc2,t8)","Doc2Vec(dm/c,d100,n5,w5,mc2,t8)"
"[('ml', 0.38908714056015015), ('professional', 0.370185911655426), ('popular', 0.3606008291244507), ('contacted', 0.3296450972557068), ('paleontological', 0.32092174887657166), ('hiiva', 0.31072068214416504), ('impose', 0.3054892122745514), ('drop', 0.2976997494697571), ('short', 0.29242533445358276), ('badge', 0.28632062673568726), ('christain', 0.28462910652160645), ('blunted', 0.28410691022872925), ('there', 0.2801179885864258), ('urgent', 0.2781597375869751), ('ignored', 0.27601686120033264), ('expressing', 0.2700083255767822), ('host', 0.265575647354126), ('idolised', 0.2648528516292572), ('milf', 0.2639293074607849), ('interest', 0.2637418210506439)]","[('contracts', 0.5608643889427185), ('points', 0.4747487008571625), ('festering', 0.45549091696739197), ('doubts', 0.4549620747566223), ('hundred', 0.45324772596359253), ('weekends', 0.44597485661506653), ('holy', 0.44336381554603577), ('helping', 0.4431059956550598), ('mountains', 0.43379127979278564), ('nobles', 0.43192771077156067), ('imminent', 0.43129244446754456), ('claimed', 0.4307110905647278), ('pays', 0.42425063252449036), ('exams', 0.41793036460876465), ('thousandfold', 0.4142264127731323), ('irrelevant', 0.4120589792728424), ('men', 0.4109652638435364), ('south', 0.4087587594985962), ('psalm', 0.40834206342697144), ('hadn', 0.4070690870285034)]","[('dragging', 0.630577027797699), ('doing', 0.5840270519256592), ('forget', 0.5701508522033691), ('invite', 0.5681229829788208), ('avoiding', 0.5661417245864868), ('deleting', 0.5540841817855835), ('replaced', 0.5507333278656006), ('explaining', 0.5500693917274475), ('suppose', 0.5471111536026001), ('tune', 0.5420584678649902), ('remove', 0.5410057306289673), ('witness', 0.5406255125999451), ('using', 0.5396437048912048), ('asking', 0.5383777618408203), ('conquer', 0.5302994251251221), ('enjoying', 0.5284969210624695), ('ending', 0.523793637752533), ('reading', 0.5235559940338135), ('keeping', 0.5213981866836548), ('transmissions', 0.5180156230926514)]"


Do the DBOW words look meaningless? That's because the gensim DBOW model doesn't train word vectors – they remain at their random initialized values – unless you ask with the `dbow_words=1` initialization parameter. Concurrent word-training slows DBOW mode significantly, and offers little improvement (and sometimes a little worsening) of the error rate on this IMDB sentiment-prediction task, but may be appropriate on other tasks, or if you also need word-vectors. 

Words from DM models tend to show meaningfully similar words when there are many examples in the training data (as with 'plot' or 'actor'). (All DM modes inherently involve word-vector training concurrent with doc-vector training.)

### Are the word vectors from this dataset any good at analogies?

In [24]:
# grab the file if not already local
questions_filename = 'questions-words.txt'
if not os.path.isfile(questions_filename):
    # Download IMDB archive
    print("Downloading analogy questions file...")
    url = u'https://raw.githubusercontent.com/tmikolov/word2vec/master/questions-words.txt'
    r = requests.get(url)
    with smart_open(questions_filename, 'wb') as f:
        f.write(r.content)
assert os.path.isfile(questions_filename), "questions-words.txt unavailable"
print("Success, questions-words.txt is available for next steps.")

Downloading analogy questions file...


TypeError: 'module' object is not callable

In [25]:
# Note: this analysis takes many minutes
for model in word_models:
    score, sections = model.wv.evaluate_word_analogies('questions-words.txt')
    correct, incorrect = len(sections[-1]['correct']), len(sections[-1]['incorrect'])
    print('%s: %0.2f%% correct (%d of %d)' % (model, float(correct*100)/(correct+incorrect), correct, correct+incorrect))

AttributeError: 'Word2VecKeyedVectors' object has no attribute 'evaluate_word_analogies'

Even though this is a tiny, domain-specific dataset, it shows some meager capability on the general word analogies – at least for the DM/mean and DM/concat models which actually train word vectors. (The untrained random-initialized words of the DBOW model of course fail miserably.)

## Slop

In [None]:
This cell left intentionally erroneous.

### Advanced technique: re-inferring doc-vectors

Because the bulk-trained vectors had much of their training early, when the model itself was still settling, it is *sometimes* the case that rather than using the bulk-trained vectors, new vectors re-inferred from the final state of the model serve better as the input/test data for downstream tasks. 

Our `error_rate_for_model()` function already had a non-default option to re-infer vectors before training/testing the classifier, so here we test that option. (This takes as long or longer than initial bulk training, as inference is only single-threaded.)

In [26]:
for model in simple_models + [models_by_name['dbow+dmm'], models_by_name['dbow+dmc']]: 
    print("Evaluating %s re-inferred" % str(model))
    pseudomodel_name = str(model)+"_reinferred"
    %time err_rate, err_count, test_count, predictor = error_rate_for_model(model, train_docs, test_docs, reinfer_train=True, reinfer_test=True, infer_subsample=1.0)
    error_rates[pseudomodel_name] = err_rate
    print("\n%f %s\n" % (err_rate, pseudomodel_name))

Evaluating Doc2Vec(dbow,d100,n5,mc2,t8) re-inferred


TypeError: 'NoneType' object cannot be interpreted as an integer


1.000000 Doc2Vec(dbow,d100,n5,mc2,t8)_reinferred

Evaluating Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t8) re-inferred


TypeError: 'NoneType' object cannot be interpreted as an integer


1.000000 Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t8)_reinferred

Evaluating Doc2Vec(dm/c,d100,n5,w5,mc2,t8) re-inferred


TypeError: 'NoneType' object cannot be interpreted as an integer


1.000000 Doc2Vec(dm/c,d100,n5,w5,mc2,t8)_reinferred

Evaluating <gensim.test.test_doc2vec.ConcatenatedDoc2Vec object at 0x000001A60BEB29E8> re-inferred


TypeError: 'NoneType' object cannot be interpreted as an integer


1.000000 <gensim.test.test_doc2vec.ConcatenatedDoc2Vec object at 0x000001A60BEB29E8>_reinferred

Evaluating <gensim.test.test_doc2vec.ConcatenatedDoc2Vec object at 0x000001A60BEB2E10> re-inferred


TypeError: 'NoneType' object cannot be interpreted as an integer


1.000000 <gensim.test.test_doc2vec.ConcatenatedDoc2Vec object at 0x000001A60BEB2E10>_reinferred



In [25]:
# Compare error rates achieved, best-to-worst
print("Err_rate Model")
for rate, name in sorted((rate, name) for name, rate in error_rates.items()):
    print("%f %s" % (rate, name))

Err_rate Model
0.102120 Doc2Vec(dbow,d100,n5,mc2,t4)+Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t4)_reinferred
0.102240 Doc2Vec(dbow,d100,n5,mc2,t4)_reinferred
0.102600 Doc2Vec(dbow,d100,n5,mc2,t4)
0.103360 Doc2Vec(dbow,d100,n5,mc2,t4)+Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t4)
0.104320 Doc2Vec(dbow,d100,n5,mc2,t4)+Doc2Vec(dm/c,d100,n5,w5,mc2,t4)_reinferred
0.105080 Doc2Vec(dbow,d100,n5,mc2,t4)+Doc2Vec(dm/c,d100,n5,w5,mc2,t4)
0.146200 Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t4)_reinferred
0.154280 Doc2Vec("alpha=0.05",dm/m,d100,n5,w10,mc2,t4)
0.218120 Doc2Vec(dm/c,d100,n5,w5,mc2,t4)_reinferred
0.225760 Doc2Vec(dm/c,d100,n5,w5,mc2,t4)


Here, we do *not* see much benefit of re-inference. It's more likely to help if the initial training used fewer epochs (10 is also a common value in the literature for larger datasets), or perhaps in larger datasets. 

### To get copious logging output from above steps...

In [None]:
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
rootLogger = logging.getLogger()
rootLogger.setLevel(logging.INFO)

### To auto-reload python code while developing...

In [None]:
%load_ext autoreload
%autoreload 2