Harin329/Digest

Digest

Welcome to Digest, an exploration of the Word2Vec algorithm. The goal of this project is an API that accepts a block of recipe text and returns a parsed list of steps. The API is used by the app Umami. The pipeline works as follows: download a recipe dataset (RecipeNLG), train word vector mappings with Google's Word2Vec algorithm, import the mappings into spaCy, calculate average vectors for a set of ingredients and a set of actions, and finally use these averages to decide whether a given step extracted from a block of text is a valid recipe step.
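The averaging and validation idea described above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the toy 3-dimensional vectors, the function names, the threshold of 0.5, and the decision rule (the best cosine similarity against either average must reach the threshold) are all assumptions made for the example. In the real pipeline the vectors would come from the spaCy model.

```python
import math


def average_vector(vectors):
    """Return the element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def looks_like_recipe_step(step_vector, ingredient_avg, action_avg, threshold=0.5):
    """Hypothetical rule: accept a step if it is close to either average vector."""
    score = max(
        cosine_similarity(step_vector, ingredient_avg),
        cosine_similarity(step_vector, action_avg),
    )
    return score >= threshold


# Toy vectors standing in for real word vectors (assumed values).
INGREDIENT_VECTORS = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
ACTION_VECTORS = [[0.0, 1.0, 0.0], [0.2, 0.8, 0.0]]

ingredient_avg = average_vector(INGREDIENT_VECTORS)
action_avg = average_vector(ACTION_VECTORS)
```

A step vector that points in roughly the same direction as either average is accepted, while one orthogonal to both is rejected. The threshold would need tuning against real recipe text.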

Prerequisites

Local Development

  1. Clone the repo
  2. Create a virtual environment
  3. Run pip install -r requirements.txt
  4. Run uvicorn main:app --reload
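For readers unfamiliar with the `main:app` argument: uvicorn imports the module `main` and serves the ASGI application object named `app` inside it. The sketch below shows the smallest possible such object using only the standard library. It is an illustration of the convention, not the project's main.py, which most likely uses a web framework and exposes its own routes. The JSON status body is an assumption.

```python
import json


async def app(scope, receive, send):
    """Bare ASGI callable: answers every HTTP request with a small JSON body."""
    if scope["type"] != "http":
        # This sketch does not handle lifespan or websocket scopes.
        return
    body = json.dumps({"status": "ok"}).encode("utf-8")
    await send(
        {
            "type": "http.response.start",
            "status": 200,
            "headers": [(b"content-type", b"application/json")],
        }
    )
    await send({"type": "http.response.body", "body": body})
```

Saved as main.py, this object is what the uvicorn command in step 4 would load. The `--reload` flag restarts the server whenever a source file changes, which is useful during local development only.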

Usage

  1. Clone the latest version of the repo
  2. Download the RecipeNLG dataset into the data/ folder and unzip it to CSV format
  3. Run python3 process.py to process the CSV into a corpus text file
  4. Build the word2vec tool with make and train the vectors according to its instructions, for example: ./word2vec -train ../data/directions.txt -output vectors.txt -cbow 0 -size 300 -window 8 -negative 25 -hs 0 -sample 1e-4 -threads 24 -binary 0 -iter 15
  5. Use the generated .txt file to initialize a spaCy model with the vectors: python -m spacy init vectors en ./word2vec/vectors.txt ./model
  6. Load the model into main.py and run GET token
  7. Deploy & Enjoy!
    • Run zip -r Digest.zip . -x '.??*' to zip the project
    • Upload the zip to Azure with:
    curl -X POST \
    -H 'Content-Type: application/zip' \
    -u '$DigestAPI' \
    -T Digest.zip \
    https://digestapi.scm.azurewebsites.net/api/zipdeploy
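Step 3 above turns the dataset CSV into a plain-text corpus for word2vec. A minimal sketch of that conversion follows. It is not the project's process.py: the function name is hypothetical, and it assumes the CSV has a `directions` column whose cells hold a JSON list of step strings, which should be checked against the downloaded file.

```python
import csv
import io
import json


def directions_to_corpus(csv_file, out_file, column="directions"):
    """Write each recipe step in `column` to out_file, one lowercased step per line.

    Returns the number of steps written. Rows whose cell is missing or is not
    a valid JSON list are skipped.
    """
    written = 0
    for row in csv.DictReader(csv_file):
        try:
            steps = json.loads(row[column])
        except (KeyError, TypeError, json.JSONDecodeError):
            continue  # missing column, short row, or malformed cell
        if not isinstance(steps, list):
            continue
        for step in steps:
            # Lowercase and collapse runs of whitespace into single spaces.
            text = " ".join(str(step).lower().split())
            if text:
                out_file.write(text + "\n")
                written += 1
    return written


# In-memory example with one made-up recipe row.
sample = io.StringIO(
    'title,directions\n'
    'Toast,"[""Slice the bread."", ""Toast until golden.""]"\n'
)
corpus = io.StringIO()
step_count = directions_to_corpus(sample, corpus)
```

With real files, open the CSV with `newline=""` as the csv module recommends, and write to the path that the word2vec `-train` argument points at (data/directions.txt in the command above).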
    

About

Digest a block of text into recipe steps!
