In [None]:
import pandas as pd
import numpy as np
# matplotlib is used for the bar chart and the clustering plots below
import matplotlib.pyplot as plt

In [None]:
bigbasket_data = pd.read_excel('BigBasket.xlsx')

In [None]:
bigbasket_data

In [None]:
# Task for Lab Exercise 1
# Calculate the average rating for each product category
average_ratings_by_category = bigbasket_data.groupby('category')['rating'].mean().sort_values(ascending=False)

# Visualize the average rating per category using a bar chart
plt.figure(figsize=(10, 6))
average_ratings_by_category.plot(kind='bar', color='skyblue')
plt.title('Average Customer Rating by Product Category')
plt.xlabel('Product Category')
plt.ylabel('Average Rating')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

# Display the ranking of categories
average_ratings_by_category

The average ratings for each product category have been calculated and ranked. Here are some insights from the analysis:

	•	Top-rated categories:
		•	“Baby Care” (4.08)
		•	“Foodgrains, Oil & Masala” (4.08)
		•	“Bakery, Cakes & Dairy” (4.06)
	•	Lowest-rated categories:
		•	“Kitchen, Garden & Pets” (3.57)
		•	“Cleaning & Household” (3.78)

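Categories like “Eggs, Meat & Fish” and “Fruits & Vegetables” have no average rating, most likely because every product in them is missing a rating value; pandas skips missing values when averaging, so such a category's mean comes out as NaN.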
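One way to check whether a missing average really comes from missing ratings is to count the rated products per category next to the mean. The sketch below runs on a small made-up frame (`sample` and its values are hypothetical, not taken from BigBasket.xlsx); with the real data, `bigbasket_data` would take its place.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for bigbasket_data (values are made up)
sample = pd.DataFrame({
    'category': ['Baby Care', 'Baby Care', 'Fruits & Vegetables',
                 'Fruits & Vegetables', 'Cleaning & Household'],
    'rating': [4.0, 4.2, np.nan, np.nan, 3.8],
})

# mean skips NaN, count is the number of rated products, size is all products
rating_summary = sample.groupby('category')['rating'].agg(['mean', 'count', 'size'])
rating_summary['share_rated'] = rating_summary['count'] / rating_summary['size']

# Categories in which no product has a rating; their mean is NaN
unrated_categories = rating_summary.index[rating_summary['count'] == 0].tolist()

print(rating_summary)
print(unrated_categories)
```

A low `share_rated` is also worth noting for categories that do have an average, since a mean built on a handful of ratings is less reliable than one built on hundreds.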

Marketing Strategy Suggestions:

	•	Promote high-rated categories:
		•	Emphasize categories like “Baby Care” and “Foodgrains, Oil & Masala” in marketing campaigns, leveraging their positive reviews to attract customers.
	•	Improve lower-rated categories:
		•	For “Kitchen, Garden & Pets” and “Cleaning & Household,” consider targeted promotions such as discounts, product-quality improvements, or customer feedback initiatives.

Q4

In [None]:
# Display the first few rows of the dataset to understand its structure
bigbasket_data.head()

In [None]:
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

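# Selecting the relevant columns for clustering; .copy() makes this an
# independent frame, so adding the Cluster column later does not touch bigbasket_data
data_for_clustering = bigbasket_data[['sale_price', 'rating']].dropna().copy()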

# Standardizing the data for clustering
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data_for_clustering)

# Using the Elbow Method to find the optimal number of clusters
inertia = []
range_n_clusters = range(1, 11)

for k in range_n_clusters:
    kmeans = KMeans(n_clusters=k, random_state=0)
    kmeans.fit(data_scaled)
    inertia.append(kmeans.inertia_)

# Plotting the Elbow curve
plt.figure(figsize=(8, 5))
plt.plot(range_n_clusters, inertia, marker='o')
plt.title("Elbow Method for Optimal k")
plt.xlabel("Number of clusters (k)")
plt.ylabel("Inertia")
plt.show()

The Elbow Method plot shows how inertia (the within-cluster sum of squared distances) falls as k increases. The “elbow”, the point after which adding another cluster yields only a small further drop, suggests a good balance between model simplicity and fit. The elbow is expected at around 3 or 4 clusters; the next step uses k = 4.
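Because the elbow is read by eye, a second criterion can help when the bend is ambiguous. The sketch below uses the silhouette score, which is a swapped-in complementary check and not part of the original analysis. It runs on synthetic points (`demo_points` is a made-up stand-in for `data_scaled`), so the result says nothing about the BigBasket data itself.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for data_scaled: three well-separated 2-D groups
demo_points, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

# The silhouette score needs at least 2 clusters, so k starts at 2
silhouette_by_k = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(demo_points)
    silhouette_by_k[k] = silhouette_score(demo_points, labels)

# Higher is better: points sit close to their own cluster and far from others
best_k = max(silhouette_by_k, key=silhouette_by_k.get)
print(silhouette_by_k)
print("k with the highest silhouette score:", best_k)
```

If the silhouette score and the elbow disagree on the real data, that is usually a sign that the cluster structure is weak and the choice of k should be justified by interpretability instead.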

In [None]:
# Choosing 4 clusters based on the elbow method and applying K-Means
kmeans = KMeans(n_clusters=4, random_state=0)
clusters = kmeans.fit_predict(data_scaled)

# Adding cluster labels to the original dataset for analysis
data_for_clustering['Cluster'] = clusters

# sale_price and rating in data_for_clustering are still in their original
# units (only data_scaled was standardized), so no inverse transform is needed

# Visualizing the clusters
plt.figure(figsize=(10, 7))
plt.scatter(data_for_clustering['sale_price'], data_for_clustering['rating'], c=data_for_clustering['Cluster'], cmap='viridis', s=50)
plt.colorbar(label='Cluster')
plt.xlabel("Sale Price")
plt.ylabel("Rating")
plt.title("K-Means Clustering of Products by Sale Price and Rating")
plt.show()

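The scatter plot shows the segmentation of products into four clusters based on sale price and rating. K-Means numbers its clusters arbitrarily (0 to 3 here), so the cluster numbers below are placeholders: match each description to an actual cluster by checking where that cluster sits on the plot. Assuming the clusters fall roughly into the four price/rating quadrants, here’s an interpretation of each and potential marketing insights: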

	1.	Low Price / High Rating (e.g., Cluster 1): These products are affordable and highly rated, indicating strong customer satisfaction. Marketing could focus on highlighting value for money, possibly bundling these items with other popular products to encourage cross-selling.
	2.	Low Price / Low Rating (e.g., Cluster 2): These products are budget-friendly but have lower ratings. Marketing strategies could focus on improving product perception, possibly by encouraging reviews or highlighting positive aspects. Promoting discounts or limited-time offers could also attract attention.
	3.	High Price / High Rating (e.g., Cluster 3): High-cost, high-rated products signify premium quality. These can be marketed as luxury or premium items, targeting customers who prioritize quality over price. Emphasizing features and unique selling points in advertisements would be effective.
	4.	High Price / Low Rating (e.g., Cluster 4): These products are expensive yet not highly rated, suggesting they may not meet customer expectations. Marketing could focus on highlighting improvements, collecting feedback for adjustments, or providing assurance with money-back guarantees to rebuild customer trust.

These segments offer tailored approaches to enhance customer satisfaction and drive sales across different price and quality perceptions.
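To tie a segment description to a cluster label, the centroids can be converted back to original units and compared with the overall medians. This is a minimal sketch on made-up products (`products`, `demo_scaler`, and `demo_kmeans` are hypothetical names, and the median-based naming rule is an assumption); in the notebook, `scaler`, `kmeans`, and `data_for_clustering` would play the same roles.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical products (made-up values), three per price/rating quadrant
products = pd.DataFrame({
    'sale_price': [10, 11, 12, 10, 11, 12, 500, 510, 520, 500, 510, 520],
    'rating':     [2.0, 2.1, 2.2, 4.0, 4.1, 4.2, 2.0, 2.1, 2.2, 4.0, 4.1, 4.2],
})

demo_scaler = StandardScaler()
scaled = demo_scaler.fit_transform(products)
demo_kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(scaled)

# Centroids live in standardized space; convert them back to original units
centroids = pd.DataFrame(
    demo_scaler.inverse_transform(demo_kmeans.cluster_centers_),
    columns=['sale_price', 'rating'],
)
centroids.index.name = 'Cluster'

# Name each cluster by where its centroid sits relative to the overall medians
price_mid = products['sale_price'].median()
rating_mid = products['rating'].median()
centroids['segment'] = [
    f"{'High' if p > price_mid else 'Low'} Price / {'High' if r > rating_mid else 'Low'} Rating"
    for p, r in zip(centroids['sale_price'], centroids['rating'])
]
print(centroids)
```

On real data the clusters may not split cleanly into quadrants (for example, two clusters can both be low-priced), in which case the centroid table shows that directly and the segment descriptions should be adjusted.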

Lab Exercise 5:

Background:
SocialConnect, a leading social media platform, is seeking to enhance its advertising revenue by better understanding the connections between users and their interactions on the platform. They have collected extensive data on user demographics, engagement metrics, and social connections. The marketing team believes that employing link analysis and online analytical processing (OLAP) techniques will enable them to uncover valuable insights for targeted advertising and content personalization.

Question:
As a data analyst at SocialConnect, develop a comprehensive plan for utilizing both link analysis and OLAP techniques to extract actionable insights from the platform's data. Explain how you would integrate these two methods, detailing the specific steps involved in conducting link analysis and OLAP. Provide examples of the types of insights that could be gained through each approach and discuss how these insights could be applied to improve SocialConnect's advertising revenue and user experience. Justify your recommendations and address any potential challenges in implementing these techniques.


Q5

Utilizing Link Analysis and OLAP for SocialConnect

Task:

As a data analyst at SocialConnect, develop a plan for using both link analysis and OLAP (Online Analytical Processing) techniques to extract actionable insights from user demographics, engagement metrics, and social connections. These insights will be used to improve advertising revenue and content personalization.

Solution Plan:

	1.	Data Sources:
		•	User Demographics: Age, gender, location, occupation.
		•	Engagement Metrics: Likes, comments, shares, time spent on platform, click-through rates.
		•	Social Connections: Friend networks, interaction frequency between users.
	2.	Link Analysis:
		•	Purpose: Analyze the relationships between users to discover influencers, communities, and engagement patterns.
		•	Techniques:
			•	Network Graphs: Represent users as nodes and interactions as edges to identify clusters of highly connected users.
			•	Centrality Measures: Identify key influencers based on their position in the network (e.g., users with the highest betweenness or degree centrality).
			•	Community Detection: Detect groups of users with common interests or behaviors.
		•	Application: Target these influential users and communities with personalized ads to increase engagement and revenue.
	3.	OLAP:
		•	Purpose: Perform multidimensional analysis on user data, breaking it down by demographics, engagement levels, and activity types.
		•	OLAP Cube: Create a cube that aggregates metrics like total engagement (e.g., total likes, shares) based on dimensions such as age group, region, and social group.
		•	Drill-Down and Roll-Up: Allow granular exploration of data (e.g., exploring engagement by city, then drilling down to age groups).
	4.	Applications:
		•	Targeted Advertising: Use OLAP to identify high-engagement demographics and locations for targeted ad campaigns.
		•	Content Personalization: Leverage engagement patterns from different user groups to recommend content.
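
The link analysis step in the plan can be sketched with the networkx library, which is an assumed tool choice since the plan does not name one. The user ids and interactions below are made up; greedy modularity maximization is one of several possible community detection methods.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical interactions (made-up user ids): each pair becomes one edge
interactions = [
    ('ana', 'ben'), ('ben', 'cara'), ('ana', 'cara'),  # first friend group
    ('dev', 'eli'), ('eli', 'fay'), ('dev', 'fay'),    # second friend group
    ('cara', 'dev'),                                   # bridge between the groups
]
graph = nx.Graph()
graph.add_edges_from(interactions)

# Degree centrality: share of other users each user is directly connected to
degree = nx.degree_centrality(graph)
# Betweenness centrality: how often a user lies on shortest paths between others
betweenness = nx.betweenness_centrality(graph)
top_influencers = sorted(betweenness, key=betweenness.get, reverse=True)[:2]

# Community detection via greedy modularity maximization
communities = [set(c) for c in greedy_modularity_communities(graph)]

print("Top influencers by betweenness:", top_influencers)
print("Communities:", communities)
```

Users who bridge communities score high on betweenness even with a modest number of direct connections, which is why the plan lists both centrality measures.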

Challenges:

	•	Data Integration: Combining link analysis and OLAP insights from different data sources can be complex and require sophisticated data models.
	•	Scalability: Handling large amounts of user interaction data may need advanced computing resources.
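
The data integration challenge is easier to reason about once both methods share a key. A minimal sketch, assuming pandas as a lightweight stand-in for an OLAP engine: the `community` column is assumed to hold labels produced by a prior community detection step, and every value in the `engagement` table is made up.

```python
import pandas as pd

# Hypothetical fact table (made-up values), one row per user;
# 'community' is assumed to come from a prior community-detection step
engagement = pd.DataFrame({
    'region':    ['North', 'North', 'North', 'South', 'South', 'South'],
    'city':      ['Delhi', 'Delhi', 'Jaipur', 'Chennai', 'Chennai', 'Kochi'],
    'age_group': ['18-24', '25-34', '18-24', '18-24', '25-34', '25-34'],
    'community': ['A', 'A', 'B', 'B', 'A', 'B'],
    'likes':     [10, 20, 5, 8, 12, 7],
    'shares':    [1, 4, 0, 2, 3, 1],
})

# Roll-up: total engagement per region
by_region = engagement.groupby('region')[['likes', 'shares']].sum()

# Drill-down: region, then city, then age group
by_city_age = engagement.groupby(['region', 'city', 'age_group'])[['likes', 'shares']].sum()

# Cube slice: age group by community, with link analysis supplying a dimension
cube_slice = engagement.pivot_table(
    index='age_group', columns='community', values='likes',
    aggfunc='sum', fill_value=0,
)

print(by_region)
print(by_city_age)
print(cube_slice)
```

At platform scale the same roll-up and drill-down queries would run in a dedicated OLAP store, with the community labels loaded as one more dimension table.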