# Using Doc2Vec Model

Follow these steps to use the trained and saved Doc2Vec model file for further tasks. First, make sure the following files are in your **current working directory**. You can download them [here](https://drive.google.com/open?id=0BzN3cL-RAt8TSFZJMkZsY2o0WlE):
* delta.cooper
* delta.cooper.docvecs.doctag_syn0.npy
* delta.cooper.syn0.npy
* delta.cooper.syn1neg.npy
* delta.cooper.syn1.npy
* delta-vectors.pkl.gz
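A minimal sketch for checking that the files are in place before loading anything. It uses only the standard library. `missing_files` and `REQUIRED_FILES` are hypothetical names, and the pickle entry follows the `delta-vectors.pkl.gz` name that the loading code opens.

```python
import os

# File names from the list above. The pickle name matches the one the
# loading code opens ('delta-vectors.pkl.gz').
REQUIRED_FILES = [
    "delta.cooper",
    "delta.cooper.docvecs.doctag_syn0.npy",
    "delta.cooper.syn0.npy",
    "delta.cooper.syn1neg.npy",
    "delta.cooper.syn1.npy",
    "delta-vectors.pkl.gz",
]


def missing_files(directory=".", required=REQUIRED_FILES):
    """Return the required file names that are not present in `directory`."""
    return [name for name in required
            if not os.path.isfile(os.path.join(directory, name))]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files: " + ", ".join(missing))
    else:
        print("All model files found.")
```

An empty result means every file in the list was found in the current working directory.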

Once you are sure you have all these files, install the **gensim** module as you would any other Python module:
<pre>
pip install gensim
</pre>
Or, on OS X:
<pre>
easy_install gensim
</pre>

Then run the code snippet below (it is written for Python 2) to get up and running.

<pre>
import gzip
import cPickle
from gensim.models import Doc2Vec

model_file_name = 'delta.cooper'
vectors_file_name = 'delta-vectors.pkl.gz'

try:
    # Doc2Vec.load also reads the delta.cooper.*.npy arrays saved next to the model file
    model = Doc2Vec.load(model_file_name)
    print "Model Loaded"
    print "This model has ",len(model.docvecs)," document (tweet) vectors"

    # The gzipped pickle holds a (vectors, labels) pair
    with gzip.open(vectors_file_name, 'rb') as vectors_file:
        vectors, labels = cPickle.load(vectors_file)
    print "Vectors and Labels loaded from pickle file"
    print "Vectors:"
    print "Type of object: ",type(vectors)," ; Shape: ",vectors.shape
    print ""
    print "Labels:"
    print "Type of object: ",type(labels)," ; Shape: ",labels.shape
except IOError:
    # A file from the list above is missing or unreadable
    print "Some of the files have not been found or loaded. Check the above list and try again!"
    raise
except Exception:
    print "Unknown error!"
    raise
</pre>

<pre>
Model Loaded
This model has  271644  document (tweet) vectors
Vectors and Labels loaded from pickle file
Vectors:
Type of object:  &lt;type 'numpy.ndarray'&gt;  ; Shape:  (271644, 100)

Labels:
Type of object:  &lt;type 'numpy.ndarray'&gt;  ; Shape:  (271644,)
</pre>


Now you might be thinking: **"Why the hell would I need the model file/object loaded when I can get the vectors from the pickle file?"**

Because there is immense power in that model object. Period.
Beyond the raw vectors, it gives you built-in similarity functions, for example to calculate distances for clustering, and much more.
Read the gensim Doc2Vec documentation [here](https://radimrehurek.com/gensim/models/doc2vec.html).
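To get a feel for what those similarity functions compute: gensim's `most_similar` ranks documents by the cosine similarity between their vectors. The sketch below is a brute-force, pure-Python illustration of the same idea on a few made-up 3-dimensional vectors (the tags `tweet_a` and so on are hypothetical). It also turns similarity into a distance (1 - cosine) of the kind you might feed to a clustering algorithm. For the real 271644 tweet vectors, use the model's own methods instead; this is only a sketch of the technique.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """A distance derived from cosine similarity, usable for clustering."""
    return 1.0 - cosine_similarity(a, b)


def most_similar(query_tag, vectors, topn=10):
    """Brute-force top-n neighbours of `query_tag` by cosine similarity.

    `vectors` maps a document tag to its vector. The query itself is
    excluded, and results are (tag, similarity) pairs sorted best-first.
    """
    query = vectors[query_tag]
    scored = [(tag, cosine_similarity(query, vec))
              for tag, vec in vectors.items() if tag != query_tag]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical 3-dimensional "document vectors", for illustration only.
toy_vectors = {
    "tweet_a": [1.0, 0.0, 0.0],
    "tweet_b": [0.9, 0.1, 0.0],
    "tweet_c": [0.0, 1.0, 0.0],
}
print(most_similar("tweet_a", toy_vectors, topn=2))
```

Here `tweet_b` points in almost the same direction as `tweet_a`, so it ranks first, while `tweet_c` is orthogonal and gets a similarity of 0.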

## Let me show you what I mean
Here's an example of using this model to find tweets similar to a given tweet:

<pre>
# Load another file that holds the tweet text, needed only for this example
import json

with open('delta-en-index-ready.txt', 'r') as f:
    tweets = json.load(f)

print "Loaded tweets for demo"
</pre>

<pre>
Loaded tweets for demo
</pre>
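Judging by how the demo code uses it, the tweets file holds a JSON list of objects with at least `tweet_id` and `tweet_text` keys. Building a dictionary from id to text once lets you look up any tweet, including the ids returned by `most_similar`, without scanning the whole list. A minimal sketch under that assumption; `index_tweets` is a hypothetical helper and the sample JSON is made up.

```python
import json

# Made-up sample in the assumed format of delta-en-index-ready.txt:
# a JSON list of objects with "tweet_id" and "tweet_text" keys.
sample = '''[
  {"tweet_id": "t1", "tweet_text": "Christmas music all day"},
  {"tweet_id": "t2", "tweet_text": "Reading a holiday book"}
]'''


def index_tweets(tweets):
    """Map each tweet_id to its tweet_text for constant-time lookups."""
    return {t["tweet_id"]: t["tweet_text"] for t in tweets}


sample_tweets = json.loads(sample)
text_by_id = index_tweets(sample_tweets)
print(text_by_id["t2"])
```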


**Now pick a tweet (here, the one at index 8) and check its text**

<pre>
rand_tweet = tweets[8]
print rand_tweet['tweet_text']
</pre>

<pre>
My ultimate favorite Christmas cds are my James Taylor and Shawn Colvin holiday collections. #repeat
</pre>
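The cell above always uses the tweet at index 8. To try a different tweet on each run, a small sketch using the standard library's `random` module; passing a seed makes the choice reproducible. `pick_random_tweet` is a hypothetical helper and `demo_tweets` is stand-in data in the same shape as the loaded tweets.

```python
import random


def pick_random_tweet(tweets, seed=None):
    """Return one tweet chosen uniformly at random from a non-empty list.

    Passing the same seed returns the same tweet, which makes demos repeatable.
    """
    if not tweets:
        raise ValueError("no tweets to choose from")
    rng = random.Random(seed)
    return rng.choice(tweets)


# Stand-in list in the same shape as the loaded tweets (hypothetical data).
demo_tweets = [
    {"tweet_id": "t1", "tweet_text": "first"},
    {"tweet_id": "t2", "tweet_text": "second"},
    {"tweet_id": "t3", "tweet_text": "third"},
]
rand_tweet = pick_random_tweet(demo_tweets)
print(rand_tweet["tweet_text"])
```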


**Now find similar tweets and display their content**

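<pre>
# most_similar returns the 10 closest (tweet_id, similarity) pairs by default
matches = model.docvecs.most_similar(rand_tweet['tweet_id'])
id_set = set(m[0] for m in matches)

# Scan the demo tweets and print the text of every match
for t in tweets:
    if t['tweet_id'] in id_set:
        print "Match::   ", t["tweet_text"]
</pre>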

<pre>
Match::    #Best Taylor Swift Holiday Collection CHRISTMAS #CD LIMITED EDITION  #Beautiful #News 
Match::    Playing Love came down at Christmas by Shawn Colvin from the album "Holiday Songs and Lullabies" - iTunes: 
Match::    #StorySnugAdvent Alphabet: E is for Elmer  Do you have any more Xmas or wintery book sugges 
Match::    @C_Hardman Eee Lovalee Aall yerz needz a copy uv mei Xmas CD an tharll be the pawfect Christmas.
Match::    Me, day after thanksgiving: "I forgot how much I like Christmas music"
Me, 2 days after T giving: "Ill skullfuck anyone playing Xmas music"
Match::    @AuthorSAMcAuley @magicandarchery That will be my ultimate Christmas pressie You are my SANTA Love you sooooooooo much 
Match::    @SethStokesTSE Christmas is your second favorite holiday.... 
Match::    The 25 Days of Christmas is my favorite holiday
Match::    It is only 24 days until @jscari24 favorite holiday, Christmas
Match::    #Best TAYLOR SWIFT - The Holiday Collection - Christmas #CD Target Exclu
</pre>

**Observe that the word 'cds' is what gives our tweet its meaning. Using it, the model found tweets describing other music (and books) that people are enjoying this Christmas.**
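`most_similar` returns `(tag, similarity)` pairs sorted from most to least similar, but the matching loop prints tweets in the order they appear in the file and drops the scores. A minimal sketch that keeps both, assuming a dictionary from tweet id to text. `describe_matches` is a hypothetical helper, and the ids, scores and texts below are made up rather than real model output.

```python
def describe_matches(matches, text_by_id):
    """Pair each (tweet_id, score) from most_similar with its tweet text.

    Keeps the best-first order of `matches`; ids missing from
    `text_by_id` are skipped.
    """
    return [(score, text_by_id[tweet_id])
            for tweet_id, score in matches
            if tweet_id in text_by_id]


# Hypothetical inputs in the shapes described above.
matches = [("t7", 0.91), ("t3", 0.88), ("t9", 0.80)]
text_by_id = {"t3": "Holiday songs on repeat", "t7": "Best Christmas CD ever"}

for score, text in describe_matches(matches, text_by_id):
    print("Match (%.2f)::   %s" % (score, text))
```

With the real data you would build `text_by_id` from the loaded `tweets` list, for example `{t['tweet_id']: t['tweet_text'] for t in tweets}`, and pass in the result of `model.docvecs.most_similar(rand_tweet['tweet_id'])`.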