
Prediction of Customer Interest in Products Sold Online Using Turkish Reviews
1. Identify the top 5 reasons for products receiving low user reviews.
2. Count the products with similar ratings and list the products with ratings > 4.
3. Find the maximum and minimum ratings of products based on reviews.
4. Name the product that was most reviewed by customers.
5. List the reviews given for electronic products and edible products.
6. Apply either a classification model or a clustering model to evaluate the dataset.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Load the dataset
file_path = '/content/Turkish_Reviews.csv'
reviews = pd.read_csv(file_path)

# 1. Identify the Top 5 Reasons for Low Reviews (Ratings <= 2)
# Approximated by the five most frequent whitespace-separated tokens in low-rated reviews.
low_reviews = reviews[reviews['Rating'] <= 2]
low_review_reasons = low_reviews['Review_Text'].str.split(expand=True).stack().value_counts().head(5)
print("Top 5 Reasons for Low Reviews:\n", low_review_reasons)

# 2. Count Distinct Products per Rating & List Products with >4 Ratings
rating_counts = reviews.groupby('Rating')['Product_Name'].nunique()
products_above_4 = reviews[reviews['Rating'] > 4]['Product_Name'].unique()
print("\nProducts Count by Rating:\n", rating_counts)
print("\nProducts with Ratings > 4:\n", products_above_4)

# 3. Find the Max and Min Ratings for Each Product
max_min_ratings = reviews.groupby('Product_Name')['Rating'].agg(['max', 'min'])
print("\nMax and Min Ratings for Each Product:\n", max_min_ratings)

# 4. Name the Most Reviewed Product
most_reviewed_product = reviews['Product_Name'].value_counts().idxmax()
print("\nMost Reviewed Product:", most_reviewed_product)

# 5. Extract Reviews for Electronics and Edibles
electronics_reviews = reviews[reviews['Category'] == 'Electronics']['Review_Text']
edibles_reviews = reviews[reviews['Category'] == 'Edibles']['Review_Text']
print("\nReviews for Electronics:\n", electronics_reviews.tolist())
print("\nReviews for Edibles:\n", edibles_reviews.tolist())

# 6. Apply Clustering Model to Evaluate the Dataset
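# Preprocess the review text for clustering by converting it to TF-IDF features.
# Note: stop_words='english' filters English words only; Turkish text would need its own stop-word list.
vectorizer = TfidfVectorizer(stop_words='english')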
X = vectorizer.fit_transform(reviews['Review_Text'])

# Apply K-Means Clustering
num_clusters = 3
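# n_init is set explicitly so the result does not depend on the scikit-learn version's default.
kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=42)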
reviews['Cluster'] = kmeans.fit_predict(X)

# Inspect the clusters by printing the reviews assigned to each one
print("\nClustered Reviews:")
for cluster in range(num_clusters):
    print(f"\nCluster {cluster}:")
    print(reviews[reviews['Cluster'] == cluster][['Product_Name', 'Review_Text']])
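
The loop above lists cluster members but does not score the clustering. A minimal sketch of one common check, the silhouette score, is shown here on made-up review texts; the `texts` list and the candidate range of `k` are assumptions for illustration, not part of the original dataset.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Hypothetical review texts, used only to make the sketch runnable.
texts = [
    "Excellent screen and fast delivery",
    "Great screen, fast and reliable",
    "Stale bread with a sour taste",
    "Sour taste, bread was stale",
    "Battery died and charger failed",
    "Charger failed, battery drains quickly",
]

X = TfidfVectorizer(stop_words="english").fit_transform(texts)

# Score several cluster counts; the silhouette lies in [-1, 1], higher is better.
scores = {}
for k in range(2, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On a dataset as small as the one in this notebook, such scores are noisy and should be read as a rough guide for choosing `num_clusters`, not as a definitive evaluation.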

Top 5 Reasons for Low Reviews:
 Poor        1
watery      1
Too         1
Overripe    1
food        1
Name: count, dtype: int64

Products Count by Rating:
 Rating
1    4
2    4
3    1
4    5
5    6
Name: Product_Name, dtype: int64

Products with Ratings > 4:
 ['Laptop' 'Cookies' 'Bread' 'TV' 'Apple' 'Orange']

Max and Min Ratings for Each Product:
                  max  min
Product_Name             
Apple              5    5
Banana             2    2
Bread              5    5
Cereal             4    4
Charger            1    1
Chips              1    1
Chocolate          4    4
Cookies            5    5
Headphones         2    2
Juice              2    2
Keyboard           3    3
Laptop             5    5
Microwave          4    4
Milk               1    1
Orange             5    5
Oven               2    2
Refrigerator       1    1
Smartphone         4    4
TV                 5    5
Washing Machine    4    4

Most Reviewed Product: Laptop

Reviews for Electronics:
 ['Excellent product