
## **`Importing packages`**

In [6]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col

# Start a Spark session, or reuse one that is already running
spark = SparkSession.builder.appName('Creating and using UDF in Pyspark').getOrCreate()

# **`Data`**

In [2]:
# Sample reviews for the DataFrame; the None row exercises null handling
data = [("The product is great",),
        ("Worst service ever",),
        ("Okay experience",),
        ("Not good at all",),
        ("Happy with the support",),
        (None,),
        ("This is average",)]

# **`Creating DataFrame`**

In [3]:
feedbacks_df = spark.createDataFrame(data=data, schema=['review'])
feedbacks_df.show()

+--------------------+
|              review|
+--------------------+
|The product is great|
|  Worst service ever|
|     Okay experience|
|     Not good at all|
|Happy with the su...|
|                NULL|
|     This is average|
+--------------------+



### **Python function that classifies each feedback as positive, negative, or neutral**

In [10]:
def classify_feedback(feedback):
  # Null reviews reach the function as None, so handle them first
  if feedback is None:
    return "Unknown"

  # Lowercase once so the keyword checks are case-insensitive
  feedback = feedback.lower()
  if "great" in feedback or "excellent" in feedback or "happy" in feedback:
    return "Positive"
  elif "bad" in feedback or "not good" in feedback or "worst" in feedback:
    return "Negative"
  elif "fine" in feedback or "okay" in feedback:
    return "Neutral"
  else:
    # No keyword matched
    return "Uncategorized"
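
Because `classify_feedback` is an ordinary Python function, it can be checked without a Spark session before it is wrapped as a UDF. The sketch below repeats the function so that it runs on its own and compares its output with the labels expected for the sample reviews. The names `expected` and `mismatches` are illustrative and not part of the original notebook.

```python
def classify_feedback(feedback):
    """Label a review by keyword matching (same rules as the notebook cell)."""
    if feedback is None:
        return "Unknown"
    feedback = feedback.lower()
    if "great" in feedback or "excellent" in feedback or "happy" in feedback:
        return "Positive"
    elif "bad" in feedback or "not good" in feedback or "worst" in feedback:
        return "Negative"
    elif "fine" in feedback or "okay" in feedback:
        return "Neutral"
    return "Uncategorized"


# Hypothetical lookup: each sample review mapped to the label we expect
expected = {
    "The product is great": "Positive",
    "Worst service ever": "Negative",
    "Okay experience": "Neutral",
    "Not good at all": "Negative",
    "Happy with the support": "Positive",
    None: "Unknown",
    "This is average": "Uncategorized",
}

# Collect every review whose actual label differs from the expected one
mismatches = {
    review: classify_feedback(review)
    for review, label in expected.items()
    if classify_feedback(review) != label
}
print(mismatches)
```

Note that the checks are plain substring tests evaluated in order, so a review such as "Not great" is labelled "Positive" because "great" is matched before any negative keyword is considered.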

# **`Register UDF in PySpark`**

In [12]:
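# Wrap the Python function as a UDF; the return type defaults to string, stated here explicitly
udf_classify_feedback = udf(classify_feedback, returnType="string")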

# **`Using UDF in PySpark`**

In [13]:
# Apply the UDF to every row of the review column and store the label in a new column
feedbacks_df = feedbacks_df.withColumn('sentiment', udf_classify_feedback(col('review')))
feedbacks_df.show()

+--------------------+-------------+
|              review|    sentiment|
+--------------------+-------------+
|The product is great|     Positive|
|  Worst service ever|     Negative|
|     Okay experience|      Neutral|
|     Not good at all|     Negative|
|Happy with the su...|     Positive|
|                NULL|      Unknown|
|     This is average|Uncategorized|
+--------------------+-------------+

