<a href="https://colab.research.google.com/github/azhelyazkova/demo2/blob/main/nltk101_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install all required dependencies

Why is this necessary?
To keep the code clean and modular, I have split the main functions into two files, search_words.py and tagging_utils.py. These are plain Python scripts that can be imported as modules. They expect the following dependency to be present in our Python environment, so we need to install it before importing them.

💡 Always be aware of your dependencies and your environment! This will save you lots of troubleshooting time in the future.

Normally we would package and install the whole environment as a first step, so installing a single package ad hoc like this is not best practice; it is done here only for the purpose of the demo.

In [None]:
!pip install sklearn_crfsuite

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sklearn_crfsuite
  Downloading sklearn_crfsuite-0.3.6-py2.py3-none-any.whl (12 kB)
Collecting python-crfsuite>=0.8.3
  Downloading python_crfsuite-0.9.9-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
Installing collected packages: python-crfsuite, sklearn_crfsuite
Successfully installed python-crfsuite-0.9.9 sklearn_crfsuite-0.3.6
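
One way to act on the tip above is to check, before importing the helper scripts, that the package they need can actually be found. The sketch below is a minimal example and not part of the original demo: `missing_packages` is a hypothetical helper, and `sklearn_crfsuite` is the module name of the package installed above.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical usage: report the problem early instead of hitting an ImportError later.
missing = missing_packages(["sklearn_crfsuite"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

The check only looks up top-level module names; it does not verify versions.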


## Mount Google Drive
To have access to the scripts and input data, mount your drive.
The scripts expect the colab_demo_2023-main directory to be in the top level of your Google Drive (MyDrive)!

In [None]:
from google.colab import drive
drive.mount('/content/gdrive/', force_remount=True)

Mounted at /content/gdrive/


We are simulating a deployment here: copying the required files from our source repo into the environment where the code is executed.

In [None]:
!cp /content/gdrive/MyDrive/colab_demo_2023-main/utils/tagging_utils.py .
!cp /content/gdrive/MyDrive/colab_demo_2023-main/utils/search_words.py .
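
The same copy step can also be written in Python, which makes it easier to report a missing file clearly. This is a sketch rather than the demo's own code: `copy_scripts` is a hypothetical helper, and the source directory is the one used in the `cp` commands above.

```python
import shutil
from pathlib import Path


def copy_scripts(source_dir, names, target_dir="."):
    """Copy each file in `names` from `source_dir` to `target_dir`.

    Returns the names that were not found, so the caller can decide what to do.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    not_found = []
    for name in names:
        source = source_dir / name
        if source.is_file():
            shutil.copy(source, target_dir / name)
        else:
            not_found.append(name)
    return not_found


# Path taken from the cp commands above; it only exists once Drive is mounted.
utils_dir = "/content/gdrive/MyDrive/colab_demo_2023-main/utils"
not_found = copy_scripts(utils_dir, ["tagging_utils.py", "search_words.py"])
if not_found:
    print("Not found in", utils_dir + ":", ", ".join(not_found))
```

If Drive is not mounted or the directory has a different name, the message lists both scripts instead of failing silently.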

In [None]:
import tagging_utils
import search_words

[nltk_data] Downloading collection 'all'
[nltk_data]    | 
[nltk_data]    | Downloading package abc to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/abc.zip.
[nltk_data]    | Downloading package alpino to /root/nltk_data...
[nltk_data]    |   Unzipping corpora/alpino.zip.
[nltk_data]    | Downloading package averaged_perceptron_tagger to
[nltk_data]    |     /root/nltk_data...
[nltk_data]    |   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data]    | Downloading package averaged_perceptron_tagger_ru to
[nltk_data]    |     /root/nltk_data...
[nltk_data]    |   Unzipping
[nltk_data]    |       taggers/averaged_perceptron_tagger_ru.zip.
[nltk_data]    | Downloading package basque_grammars to
[nltk_data]    |     /root/nltk_data...
[nltk_data]    |   Unzipping grammars/basque_grammars.zip.
[nltk_data]    | Downloading package bcp47 to /root/nltk_data...
[nltk_data]    | Downloading package biocreative_ppi to
[nltk_data]    |     /root/nltk_data...
[nltk_data]    |   U

What does the find_noun_in_phrase function in search_words do?

In [None]:
phrase = "black tea"

In [None]:
search_words.find_noun_in_phrase(phrase)

'tea'
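
The source of `find_noun_in_phrase` is not shown in this notebook. As a rough illustration of how such a function could work, the sketch below tags each word with its part of speech and returns the last word tagged as a noun. The name `last_noun`, the injectable `tagger` parameter, and the "last noun wins" rule are all assumptions made for illustration. The default tagger uses NLTK's `word_tokenize` and `pos_tag`, which need the NLTK data downloaded above.

```python
def nltk_tagger(phrase):
    """Tag a phrase with NLTK's default tagger (requires the NLTK data packages)."""
    import nltk  # imported lazily so the rest of the sketch runs without NLTK

    return nltk.pos_tag(nltk.word_tokenize(phrase))


def last_noun(phrase, tagger=nltk_tagger):
    """Return the last word whose Penn Treebank tag starts with 'NN', or None."""
    nouns = [word for word, tag in tagger(phrase) if tag.startswith("NN")]
    return nouns[-1] if nouns else None


# A hand-written tagger stands in for NLTK here, to keep the example self-contained.
def toy_tagger(phrase):
    tags = {"black": "JJ", "tea": "NN"}
    return [(word, tags.get(word, "NN")) for word in phrase.split()]


print(last_noun("black tea", tagger=toy_tagger))  # prints: tea
```

Calling `last_noun("black tea")` without the `tagger` argument would use NLTK instead of the toy tagger.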

In [None]:
!apt-get install openjdk-8-jdk-headless -qq > /dev/null

In [None]:
!wget -q https://archive.apache.org/dist/spark/spark-3.2.3/spark-3.2.3-bin-hadoop3.2.tgz

In [None]:
!tar xf spark-3.2.3-bin-hadoop3.2.tgz

In [None]:
!pip install -q findspark

In [None]:
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.2.3-bin-hadoop3.2"
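
The steps that follow depend on the two environment variables set above, so a quick check that both point at existing directories can save a confusing error later. A minimal sketch: `missing_dirs` is a hypothetical helper, not part of findspark or Spark.

```python
import os


def missing_dirs(variable_names, environ=os.environ):
    """Return the variables that are unset or do not point at an existing directory."""
    return [
        name
        for name in variable_names
        if not os.path.isdir(environ.get(name, ""))
    ]


problems = missing_dirs(["JAVA_HOME", "SPARK_HOME"])
if problems:
    print("Check these variables:", ", ".join(problems))
```

An empty result means both directories exist; it does not prove that the Java or Spark installation inside them is complete.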

In [None]:
import findspark
findspark.init()

In [None]:
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local")
    .appName("Colab")
    .config("spark.ui.port", "4050")
    .getOrCreate()
)

Find the products in the Grocery dataset which most closely match the search phrase.
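
How `run_calculations` scores a product is not shown in this notebook. The scores of 2.0 in the output further down, for a two-word phrase, are consistent with a simple count of how many words of the phrase appear in the product name, so the sketch below illustrates that idea in plain Python. Both the scoring rule and the name `overlap_score` are assumptions, not the actual implementation, which runs on Spark.

```python
import re


def overlap_score(phrase, product_name):
    """Count how many distinct words of `phrase` occur in `product_name` (case-insensitive)."""
    phrase_words = set(re.findall(r"[a-z0-9']+", phrase.lower()))
    name_words = set(re.findall(r"[a-z0-9']+", product_name.lower()))
    return float(len(phrase_words & name_words))


# Product names copied from the notebook output.
products = [
    "Stash Black Tea Earl Grey",
    "Honest Tea Organic Black Forest Berry",
    "Bigelow Black Tea Pomegranate",
]
for name in products:
    print(name, overlap_score("black tea", name))
```

Under this rule word order and adjacency do not matter, which would explain why "Honest Tea Organic Black Forest Berry" ranks alongside actual black teas. The next cell runs the real implementation.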

In [None]:
search_words.run_calculations(spark, phrase)



Fetching data.
Stash Organic Black & Green Tea Earl Grey  2.0
Suave Professionals Black Raspberry + White Tea Color Care Conditioner With Bonus Touchable Finish Hairspray 2.0
Twinings Of London Premium Black Tea Blackcurrant Breeze  2.0
Stash Green & Black Tea Bags Fusion Breakfast With Matcha  2.0
Lipton Natural Energy Premium Black Tea Bags  2.0
Choice Organic Teas Black Tea Earl Grey  2.0
Tea Iced Blushbry Black 2.0
Harney & Sons Black Tea Blends English Breakfast  2.0
Bigelow Black Tea Pomegranate  2.0
Harris Decaffeinated Black Tea Bags  2.0
Twinings Of London Pure Black Iced Tea Keurig K 2.0
Honest Tea Organic Black Forest Berry 2.0
Guaranteed Value Decaffeinated Black Tea  2.0
Nature's Promise Black Tea With Plum 2.0
Stash Black Tea Earl Grey  2.0
Bigelow Black Tea Spiced Chai  2.0
Tazo Chai Spiced Black Tea Latte Concentrate 2.0
Newman's Own Organics 100ct Organic Black Tea 2.0
Lipton Blackberry Flavored Black Tea Bags 2.0
Simply Enjoy Earl Grey Creme Organic Black Tea Bags  2.