# pAIper for Google Colab

**If you have any questions, don't hesitate to contact Noah at noah@blacksheepfoods.com or 9728166228.**


**Instructions:** 
1. This is a notebook. It lets you run small blocks ("cells") of code from your browser.
2. You must run the code cells in order, as in a Jupyter Notebook.
3. For each cell, you can either click the play button to the left of the cell or press Shift + Enter.





# Setting Up
Work through the steps below, running each cell in turn, to set up the Black Sheep Foods pAIper repository and the packages it needs.


First, clone (copy) the Black Sheep Foods repository onto this Colab runtime. If you get a "fatal" error saying the destination path already exists, ignore it: this just means the repository was already copied.

In [None]:
!git clone https://github.com/bcatoto/bsf.git

We also need to retrieve the existing models, which are too large to be stored on GitHub. Run the cell below:


In [None]:
!wget https://paiper-test-1.s3.us-east-2.amazonaws.com/models/dataset1
!wget https://paiper-test-1.s3.us-east-2.amazonaws.com/models/dataset1.trainables.syn1neg.npy
!wget https://paiper-test-1.s3.us-east-2.amazonaws.com/models/dataset1.wv.vectors.npy

Now copy the models to the corresponding `bsf` directories for later use.

In [None]:
!cp dataset1 '/content/bsf/paiper/food2vec/wv'
!cp dataset1.trainables.syn1neg.npy '/content/bsf/paiper/food2vec/wv'
!cp dataset1.wv.vectors.npy '/content/bsf/paiper/food2vec/wv'
print('Files copied.')
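
If you want to confirm that the copy worked before moving on, a quick check like the one below can help. This is only a minimal sketch: the directory and filenames are taken from the cells above, while `find_missing` is a hypothetical helper name and not part of the pAIper repository.

```python
import os

# Paths taken from the cells above; adjust them if your runtime differs.
WV_DIR = '/content/bsf/paiper/food2vec/wv'
MODEL_FILES = [
    'dataset1',
    'dataset1.trainables.syn1neg.npy',
    'dataset1.wv.vectors.npy',
]

def find_missing(directory, filenames):
    """Return the filenames that are not present in `directory`."""
    return [name for name in filenames
            if not os.path.isfile(os.path.join(directory, name))]

missing = find_missing(WV_DIR, MODEL_FILES)
if missing:
    print('Missing model files:', ', '.join(missing))
else:
    print('All model files are in place.')
```

If any file is reported missing, re-run the download and copy cells above before loading the model.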

Next, we need to set the working directory to the `bsf` folder. We will also install all of the necessary packages and data.



In [None]:
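import os
# Run everything else from the repository root.
os.chdir("bsf")
# Install the Python packages the repository depends on.
!pip install -r requirements.txt
# Download the data files used by ChemDataExtractor (the `cde` command).
!cde data download
# Download spaCy's small English model.
!python -m spacy download en_core_web_sm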

# Food2Vec

The Food2Vec class uses gensim's [Phrases](https://radimrehurek.com/gensim/models/phrases.html) model to extract phrases from the corpus and gensim's [Word2Vec](https://radimrehurek.com/gensim/models/word2vec.html) model to form word embeddings from the data. The Food2Vec constructor takes one positional argument, `tag`, which is the label the corresponding Classifier applied to articles when storing them in the MongoDB database.
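
To give a feel for what phrase extraction does, here is a simplified, pure-Python sketch that merges word pairs that appear together more often than their individual counts would suggest. It is not gensim's implementation and not the Food2Vec code. The toy corpus, the function names `learn_phrases` and `apply_phrases`, and the `min_count` and `threshold` values are all assumptions made for illustration.

```python
from collections import Counter

def learn_phrases(sentences, min_count=1, threshold=1.0):
    """Return the set of adjacent word pairs that score above `threshold`."""
    unigrams = Counter(word for sent in sentences for word in sent)
    bigrams = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))
    vocab_size = len(unigrams)
    phrases = set()
    for (a, b), n_ab in bigrams.items():
        # Pairs seen often, relative to how common each word is, score high.
        score = (n_ab - min_count) / (unigrams[a] * unigrams[b]) * vocab_size
        if score > threshold:
            phrases.add((a, b))
    return phrases

def apply_phrases(sentence, phrases, delimiter='_'):
    """Join adjacent words that form a learned phrase into a single token."""
    merged, i = [], 0
    while i < len(sentence):
        if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) in phrases:
            merged.append(sentence[i] + delimiter + sentence[i + 1])
            i += 2
        else:
            merged.append(sentence[i])
            i += 1
    return merged

# Made-up corpus: each sentence is already split into lowercase tokens.
corpus = [
    ['pea', 'protein', 'improves', 'texture'],
    ['pea', 'protein', 'is', 'plant', 'based'],
    ['soy', 'sauce', 'adds', 'salt'],
    ['pea', 'protein', 'adds', 'flavor'],
]
phrases = learn_phrases(corpus)
print(apply_phrases(['pea', 'protein', 'adds', 'flavor'], phrases))
```

In this toy corpus only "pea protein" occurs often enough to be merged, so it becomes the single token `pea_protein`. Treating a phrase as one token is what allows a word-embedding model to learn a vector for the phrase as a whole.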

The pretrained phrasers and models should already be in the `paiper/food2vec/phrasers` and `paiper/food2vec/wv` folders. Load `dataset1` by running the code below.

In [None]:
from paiper.food2vec import Food2Vec

model = Food2Vec('dataset1')
model.load_phraser()
model.load_wv()
print('Model loaded.')

Model loaded.


The `most_similar()` function prints a list of words most similar to the queried word based on the corpus the Word2Vec model is trained on. The function takes two arguments:

*   `term`: The query term for which the most similar terms are returned
*   `topn`: The number of results to return; defaults to 1

There are three optional arguments if you want to add a math-based filter to the results:

* `vector_math`: Boolean flag that turns on the post-processing step; defaults to False
* `closer`: One additional term to move the results closer to (a positive term); defaults to an empty string
* `farther`: One additional term to move the results farther from (a negative term); defaults to an empty string

You can add vectors to, and subtract vectors from, the initial query vector to steer the results. For instance, to find words that are similar to "flavor" but close to "plant" and far from "meat", you could write:
```
model.most_similar('flavor', vector_math=True, topn=5, closer='plant', farther='meat')
```

We recommend experimenting with different filter words. Let us know if you get any interesting results!

In [None]:
model.most_similar('flavor', vector_math=True, topn=5, closer='plant', farther='meat')


In [None]:
model.most_similar('dog', topn=5)
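
The idea behind the `closer` and `farther` filters can be sketched with a few made-up vectors. The example below is an illustration under stated assumptions, not the repository's implementation: the three-dimensional vectors are invented, `toy_most_similar` is a hypothetical function, and the real `most_similar()` may normalize or weight vectors differently.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
vectors = {
    'flavor': [0.9, 0.1, 0.1],
    'plant':  [0.1, 0.9, 0.0],
    'meat':   [0.1, 0.0, 0.9],
    'herbal': [0.7, 0.6, 0.0],
    'smoky':  [0.7, 0.0, 0.6],
    'umami':  [0.8, 0.2, 0.3],
}

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def toy_most_similar(term, closer='', farther='', topn=1):
    """Rank words by similarity to term + closer - farther."""
    query = list(vectors[term])
    exclude = {term}
    if closer:
        query = [q + c for q, c in zip(query, vectors[closer])]
        exclude.add(closer)
    if farther:
        query = [q - f for q, f in zip(query, vectors[farther])]
        exclude.add(farther)
    ranked = sorted(
        ((word, cosine(query, vec)) for word, vec in vectors.items()
         if word not in exclude),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:topn]

print(toy_most_similar('flavor', topn=1))
print(toy_most_similar('flavor', closer='plant', farther='meat', topn=3))
```

With these toy vectors, the unfiltered query ranks "umami" first, while adding "plant" and subtracting "meat" moves "herbal" to the top and pushes "smoky" to the bottom.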

The `analogy()` function prints a list of words that complete a given analogy based on the corpus the Word2Vec model is trained on. The format of the analogy is as follows:

> `same` is to `opp` as `term` is to `analogy()`
>
> Example: cow is to beef as pig is to what?

The function takes three positional arguments and one optional argument:

*   `term`: The term to find the corresponding analogy for
*   `same`: The term in the given analogy that corresponds to `term`
*   `opp`: The term in the given analogy that corresponds to the resulting term
*   `topn`: The number of results to return; defaults to 1

The order of the arguments is `model.analogy(term, same, opp, topn=<number of results you want>)`. You can think of the order as "pig is to what, as cow is to beef?", which is written `model.analogy('pig', 'cow', 'beef', topn=5)`.

In [None]:
model.analogy('pig', 'cow', 'beef', topn=5)
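
Analogies of this kind are commonly computed with word vectors as `opp - same + term`, followed by a search for the nearest remaining words. The sketch below shows that arithmetic on invented four-dimensional vectors. `toy_analogy` is a hypothetical function that mirrors the argument order described above, and it is an assumption that the repository's `analogy()` works the same way internally.

```python
import math

# Toy embeddings, invented for illustration only.
# Dimensions: [cow-related, pig-related, live animal, meat].
vectors = {
    'cow':    [1.0, 0.0, 1.0, 0.0],
    'beef':   [1.0, 0.0, 0.0, 1.0],
    'pig':    [0.0, 1.0, 1.0, 0.0],
    'pork':   [0.0, 1.0, 0.0, 1.0],
    'sheep':  [0.0, 0.0, 1.0, 0.0],
    'mutton': [0.0, 0.0, 0.0, 1.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def toy_analogy(term, same, opp, topn=1):
    """Answer '`same` is to `opp` as `term` is to ?' using opp - same + term."""
    query = [o - s + t
             for o, s, t in zip(vectors[opp], vectors[same], vectors[term])]
    ranked = sorted(
        ((word, cosine(query, vec)) for word, vec in vectors.items()
         if word not in {term, same, opp}),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:topn]

# cow is to beef as pig is to what?
print(toy_analogy('pig', 'cow', 'beef', topn=2))
```

Here `beef - cow + pig` lands exactly on the toy vector for "pork", so "pork" ranks first. Real embeddings are noisier, which is why it is worth asking for several results with `topn`.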