# Overview
This tutorial uses the pre-trained `word2vec_udemy.model` model to predict the price of a Udemy course from a list of keywords.
It also relies on the `raw_udemy_databse.csv` file.

# Steps

1. Import the libraries
2. Import the database and the model
    * `word2vec_udemy.model`
    * `raw_udemy_databse.csv`
3. Define some necessary functions
    * `findTitle()`
    * `Average()`
4. Define the `predictPrice()` function, where the actual price prediction happens
5. Test the `predictPrice()` function

# Import the libraries

In [9]:
import pandas as pd
import numpy as np
import re
import nltk
from gensim.models import Word2Vec
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Set the Database & Model path
Note: Make sure both files are present in your directory.

In [10]:
DATABSE_PATH = "/content/cleaned_udemy_database.csv"
MODEL_PATH = "/content/word2vec_udemy.model.model"
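
A quick way to confirm that both files are in place before loading them is sketched below. It uses only the standard library. The helper name `missing_files` is hypothetical, and the paths are copied from the cell above, so adjust them to your own environment.

```python
from pathlib import Path

def missing_files(paths):
    """Return the subset of `paths` that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]

# Paths copied from the cell above (assumed to match your setup).
paths = [
    "/content/cleaned_udemy_database.csv",
    "/content/word2vec_udemy.model.model",
]

missing = missing_files(paths)
if missing:
    print("Missing files:", missing)
else:
    print("All files found.")
```

If anything is reported as missing, fix the path before running the cells that follow.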

In [11]:
# creating data frame object from csv file
dataframe = pd.read_csv(DATABSE_PATH)

# Filter the dataframe by substring
Find the rows of the dataframe whose `title` column contains a given substring.
<br>
Note: the function returns a dataframe.

In [12]:
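def findTitle(dfObj, value, column='title'):
  # Keep rows whose `column` contains `value` as a literal substring;
  # rows with a missing value in `column` are treated as non-matches.
  contain_values = dfObj[dfObj[column].str.contains(value, regex=False, na=False)]
  return contain_values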
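
As a quick illustration, the snippet below applies the same kind of substring filter to a small made-up dataframe. The titles and prices are invented for the example and are not taken from the Udemy database.

```python
import pandas as pd

# Made-up rows for illustration only (not from the Udemy database).
toy = pd.DataFrame({
    "title": ["python for beginners", "advanced excel", "python data science"],
    "price": [20.0, 15.0, 40.0],
})

# Same kind of filter that findTitle() applies to the `title` column.
matches = toy[toy["title"].str.contains("python")]

print(matches)
print(matches["price"].mean())  # mean of 20.0 and 40.0
```

Matching is case-sensitive by default, so a keyword such as "Python" would not match titles stored in lower case.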

# Calculate the mean
This helper is used inside the `predictPrice()` function.

In [2]:
def Average(lst):
    return sum(lst) / len(lst)

# Predict Price
This is the main function, where the price is predicted from the given keywords. It works as follows:

1. Find the `topn` words most similar to the given keywords.
2. For each similar word, find the courses whose titles contain it (using the `findTitle()` function) and compute the mean price of those courses. The predicted price is the average of these per-word means (using the `Average()` function).
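
The two steps can be sketched on toy data without loading the model. In this sketch the list `similar` is a hand-written stand-in for the output of `model.wv.most_similar()`, which returns `(word, similarity)` pairs; the words, scores, titles, and prices are all invented for illustration.

```python
import pandas as pd

# Made-up courses for illustration only.
courses = pd.DataFrame({
    "title": ["python basics", "django web apps", "python and pandas", "excel tips"],
    "price": [20.0, 30.0, 40.0, 10.0],
})

# Stand-in for model.wv.most_similar(...): invented (word, similarity) pairs.
similar = [("django", 0.81), ("pandas", 0.77), ("rust", 0.60)]

means = []
for word, _score in similar:
    # Prices of the courses whose title contains the word as a literal substring.
    prices = courses.loc[courses["title"].str.contains(word, regex=False), "price"]
    if not prices.empty:  # skip words that match no title
        means.append(prices.mean())

# Average of the per-word mean prices, or None if nothing matched.
predicted = sum(means) / len(means) if means else None
print(predicted)
```

Here "rust" matches no title and is skipped, so the prediction is the average of the means for "django" and "pandas".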



In [18]:
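def predictPrice(keyWords, topn=6):
  similarKeyWords_list = []
  mean_list = []
  try:
    model = Word2Vec.load(MODEL_PATH)
    # most_similar() returns a list of (word, similarity) tuples
    similarKeyWords = model.wv.most_similar(keyWords, topn=topn)
  except Exception as e:
    # e.g. the model file is missing or a keyword is not in the vocabulary
    print(e)
    return

  # Keep the words only, dropping empty strings and purely numeric tokens
  for keyword in similarKeyWords:
    if not keyword[0].isdigit() and keyword[0] != '':
      similarKeyWords_list.append(keyword[0])

  # Mean price of the courses whose title contains each similar word
  for item in similarKeyWords_list:
    myDataframe = findTitle(dataframe, str(item))
    mean = myDataframe['price'].mean()
    if not pd.isna(mean):  # skip words that match no course title
      mean_list.append(mean)

  print(f"Top {topn} key words with the highest similarity:  {similarKeyWords_list}")
  if not mean_list:
    print("No matching courses found; cannot predict a price.")
    return
  print(f"Predicted price:  {round(Average(mean_list))} Euro")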

  



# Test the predictPrice() function

In [None]:
predictPrice(keyWords='python',topn=4)