#  INTERNSHIP PROJECT - COGNIFYZ TECHNOLOGIES <hr>
# RESTAURANTS ANALYSIS<hr>
# LEVEL- 2


# Task-1  Restaurant Reviews


# <hr>Problem Statement : 

## Objective: 
## 1. Analyze the distribution of aggregate ratings and determine the most common rating range. 
## 2. Calculate the average number of votes received by restaurants.

## Overview
### The task is to analyze the restaurant ratings dataset to gain insights into the distribution of aggregate ratings and the engagement level of customers based on the number of votes received by restaurants. <hr>

# IMPORT LIBRARIES

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# LOAD DATASET

In [None]:
df=pd.read_csv('Restaurent_data.csv')
df

# Handling Dataset

In [None]:
df.dropna(subset=['Aggregate rating', 'Votes'], inplace=True)
df = df.dropna(subset=['Cuisines'])

# SOLVE PROBLEM-1

In [None]:
# Distribution of Aggregate Ratings
plt.figure(figsize=(10, 6))
sns.histplot(df['Aggregate rating'], bins=20, kde=False, color='lightgreen')
plt.title('Rating Distribution')
plt.xlabel('Aggregate Rating')
plt.ylabel('Number of Restaurants')
plt.show()

In [None]:
# Most common rating value (the mode); this is a single rating, not a binned range
common_range = df['Aggregate rating'].mode()[0]
print("Most common rating:", common_range)

In [None]:
plt.figure(figsize=(10, 6))
sns.histplot(df['Aggregate rating'], bins=20, kde=False, color='darkgreen')
plt.title('Rating Distribution')
plt.xlabel('Aggregate Rating')
plt.ylabel('Number of Restaurants')
# Dashed vertical line at the most common rating (an empty linestyle would draw nothing)
plt.axvline(x=common_range, color='red', linestyle='--', label=f'Most Common Rating: {common_range}')
plt.legend()
plt.show()
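
The mode above is a single rating value, while the objective asks for a rating range. A minimal sketch of one way to get a range is to bin the ratings and count each bin. The `ratings` values and the half-point bin width are assumptions made for illustration; in the notebook the input would be `df['Aggregate rating']`.

```python
import pandas as pd

# Hypothetical sample; in the notebook this would be df['Aggregate rating'].
ratings = pd.Series([0.0, 3.1, 3.2, 3.4, 3.6, 3.8, 4.1, 4.5, 2.9, 3.3])

# Half-point bin edges from 0.0 to 5.0 (the bin width is an assumption).
bin_edges = [i * 0.5 for i in range(11)]

# include_lowest=True keeps ratings of exactly 0.0 in the first bin.
rating_bins = pd.cut(ratings, bins=bin_edges, include_lowest=True)

# Count restaurants per bin and pick the fullest one.
range_counts = rating_bins.value_counts()
most_common_range = range_counts.idxmax()

print("Most common rating range:", most_common_range)
print("Restaurants in that range:", range_counts.max())
```

The result is a pandas `Interval`, so its `left` and `right` edges can be used to shade the matching bars in the histogram.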

# Solve Problem -2

In [None]:
# Average Number of Votes Calculation
average_votes = df['Votes'].mean()
print("Average number of votes received by restaurants: {:.2f}".format(average_votes))
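
Vote counts are often skewed by a few very popular restaurants, so the mean alone can overstate typical engagement. The sketch below compares the mean with the median on a hypothetical `votes` series; in the notebook the input would be `df['Votes']`.

```python
import pandas as pd

# Hypothetical vote counts with one very popular restaurant.
votes = pd.Series([0, 2, 5, 8, 12, 20, 35, 60, 150, 2500])

mean_votes = votes.mean()      # pulled upward by the largest value
median_votes = votes.median()  # the middle of the sorted values

print("Mean votes:   {:.2f}".format(mean_votes))
print("Median votes: {:.2f}".format(median_votes))
```

A mean far above the median suggests that a small number of restaurants receive most of the votes.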

# Findings and Insights<hr>
## 1. Distribution of Ratings:
### The most common rating range was identified, indicating the general level of customer satisfaction with the restaurants.<br>

## 2. Average Number of Votes:
### The average number of votes received by restaurants was calculated, providing insight into the engagement and popularity of the restaurants among customers.<hr>

# Task-2  Cuisine Combination<hr>

# <hr>Problem Statement : 

## Objective: 
## 1. Identify the most common combinations of cuisines in the dataset.
## 2. Determine if certain cuisine combinations tend to have higher ratings.<hr>

## Overview
### The task is to analyze the cuisine combinations offered by restaurants in the dataset, identify which combinations occur most often, and check whether particular combinations tend to receive higher aggregate ratings. <hr>

# Solve Problem -1

In [None]:
# Split each Cuisines string into a list, sort it so the order does not matter, and rejoin it into one combination label
cuisine_combinations = df['Cuisines'].astype(str).str.split(', ').apply(sorted).apply(', '.join)

In [None]:
# Find the 10 most common cuisine combinations
common_combinations = cuisine_combinations.value_counts().head(10)
print("Most Common Cuisine Combinations:\n",common_combinations)

# Data Visualization- PROBLEM-1

In [None]:
#visualize most common cuisines
plt.figure(figsize=(10, 6))
plt.barh(common_combinations.index, common_combinations.values, color='skyblue')
plt.xlabel('Number of Restaurants')
plt.ylabel('Cuisine Combination')
plt.title('Top 10 Most Common Cuisine Combinations')
plt.gca().invert_yaxis()  # Invert y-axis to show the most common combination at the top
plt.show()

# Solve Problem-2

In [None]:
# Calculate the average rating for each cuisine combination
average_ratings_by_combination = df.groupby(cuisine_combinations)['Aggregate rating'].mean()
print("\nAverage Ratings by Cuisine Combination:")
print(average_ratings_by_combination)
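
An average taken over one or two restaurants can be misleading, so it may help to keep only combinations with enough restaurants behind them. This is a sketch under assumptions: the `sample` data and the `min_restaurants` threshold are hypothetical, and the combination labels are built the same way as `cuisine_combinations` above.

```python
import pandas as pd

# Hypothetical sample with the same column names as the dataset.
sample = pd.DataFrame({
    'Cuisines': ['North Indian, Chinese', 'Chinese, North Indian',
                 'Cafe', 'Cafe', 'Cafe', 'Sushi, Japanese'],
    'Aggregate rating': [3.0, 4.0, 3.5, 3.7, 3.9, 4.9],
})

# Order-independent combination labels.
combos = sample['Cuisines'].str.split(', ').apply(sorted).apply(', '.join)

# Mean rating and number of restaurants per combination.
summary = sample.groupby(combos)['Aggregate rating'].agg(['mean', 'count'])

# Keep combinations with at least min_restaurants entries (assumed threshold).
min_restaurants = 2
reliable = summary[summary['count'] >= min_restaurants]
reliable = reliable.sort_values('mean', ascending=False)

print(reliable)
```

Here the single sushi restaurant has the highest rating but is dropped, because one restaurant says little about the combination as a whole.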

# Data Visualization PROBLEM-2<hr>

In [None]:
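# Average rating for the 10 most common combinations, in the same order as the first plot
avg_rating_combinations = average_ratings_by_combination.loc[common_combinations.index]
plt.figure(figsize=(10, 6))
plt.barh(avg_rating_combinations.index, avg_rating_combinations.values, color='pink')
plt.xlabel('Average Rating')
plt.ylabel('Cuisine Combination')
plt.title('Average Rating for Top 10 Cuisine Combinations')
plt.gca().invert_yaxis()  # Ensure the order matches the first plot
plt.show()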

# Findings and Insights
## 1. The most common cuisine combinations are visualized, providing insights into popular pairings.
## 2. The average ratings for different cuisine combinations reveal customer preferences and the quality of these combinations.
## 3. The bar chart of average ratings helps identify if certain cuisine combinations tend to have higher ratings, aiding in understanding customer satisfaction trends.

# Task-3  Geographic Analysis
<hr>

# <hr>Problem Statement : 

## Objective: 
## 1. Plot the locations of restaurants on a map using longitude and latitude coordinates.
## 2. Identify any patterns or clusters of restaurants in specific areas

## Overview
### The goal of this task is to visualize the distribution of restaurants based on their geographical coordinates (longitude and latitude). By plotting the locations on a map, we aim to identify patterns or clusters where restaurants are concentrated. <hr>

In [None]:
%pip install folium

In [None]:
import folium
from folium.plugins import MarkerCluster

# Solve Problem -1 (Plot location of Restaurant)

In [None]:
# Initialize the map centered around the mean coordinates
map_center = [df['Latitude'].mean(), df['Longitude'].mean()]
# Named restaurant_map so the built-in map() is not shadowed
restaurant_map = folium.Map(location=map_center, zoom_start=12)

In [None]:
# Add a marker cluster layer to the map
marker_cluster = MarkerCluster().add_to(restaurant_map)

# Add one marker per restaurant to the cluster layer
for idx, row in df.iterrows():
    folium.Marker(
        location=[row['Latitude'], row['Longitude']],
        popup=f"Restaurant: {row['Restaurant Name']}",
        icon=folium.Icon(color='blue', icon='info-sign')
    ).add_to(marker_cluster)

# Display the map
restaurant_map
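
Missing coordinates cannot be placed as markers, and placeholder values can pull the mean map center away from the real locations. The sketch below shows one possible filter to apply before plotting. It assumes that a (0, 0) pair is a placeholder rather than a real location, which this notebook does not verify, and the `sample` rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows with the same column names as the dataset.
sample = pd.DataFrame({
    'Restaurant Name': ['A', 'B', 'C', 'D'],
    'Latitude': [28.61, 0.0, None, 14.55],
    'Longitude': [77.21, 0.0, 77.10, 121.02],
})

# Both coordinates must be present.
has_coords = sample[['Latitude', 'Longitude']].notna().all(axis=1)

# Assumption: a (0, 0) pair is a placeholder, not a real location.
not_placeholder = ~((sample['Latitude'] == 0) & (sample['Longitude'] == 0))

# Coordinates must fall inside the valid latitude and longitude ranges.
in_range = (sample['Latitude'].between(-90, 90)
            & sample['Longitude'].between(-180, 180))

mappable = sample[has_coords & not_placeholder & in_range]
print(mappable)
```

Only the filtered rows would then be used for the map center and the marker loop.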

# Solve Problem -2

In [None]:
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster
from sklearn.preprocessing import StandardScaler

In [None]:
coordinates = df[['Longitude', 'Latitude']].values

# Standardize coordinates
scaler = StandardScaler()
scaled = scaler.fit_transform(coordinates)

In [None]:
# Perform hierarchical clustering
Z = linkage(scaled, method='ward')  # 'scaled' is the standardized array from the previous cell

# Define number of clusters
n = 5  # You can adjust the number of clusters

# Assign clusters
df['Cluster'] = fcluster(Z, n, criterion='maxclust')

In [None]:
# Plot the clusters as a scatter plot
plt.figure(figsize=(12, 8))

# Plot each cluster with a different color
for cluster in range(1, n + 1):
    cd = df[df['Cluster'] == cluster]
    plt.scatter(cd['Longitude'], cd['Latitude'], 
                label=f'Cluster {cluster}', alpha=0.5)

# Add titles and labels
plt.title('Restaurant Locations by Cluster')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.legend()
plt.grid(True)
plt.show()
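
To read the clusters as numbers as well as colors, each cluster can be summarized by its size and its average position. This is a minimal sketch on a hypothetical `sample` frame that already has a `Cluster` column, as `df` does after the `fcluster` step above.

```python
import pandas as pd

# Hypothetical coordinates with cluster labels already assigned.
sample = pd.DataFrame({
    'Latitude': [28.60, 28.62, 28.64, 14.55, 14.57, 40.71],
    'Longitude': [77.20, 77.22, 77.24, 121.02, 121.04, -74.00],
    'Cluster': [1, 1, 1, 2, 2, 3],
})

# Size and mean position of each cluster, largest cluster first.
cluster_summary = sample.groupby('Cluster').agg(
    restaurants=('Latitude', 'size'),
    mean_latitude=('Latitude', 'mean'),
    mean_longitude=('Longitude', 'mean'),
).sort_values('restaurants', ascending=False)

print(cluster_summary)
```

Large clusters point to areas with many restaurants, and the mean coordinates show roughly where each group is located.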

# Findings and Insights<hr>
## 1. Restaurant Locations: The map visualization shows dense areas with high restaurant concentrations, revealing popular dining districts.
## 2. Clusters: Hierarchical clustering identifies distinct regions with high or low restaurant densities, highlighting potential business opportunities.<br><hr>
# Recommendations:
## Expansion: Target high-density areas for new ventures.
## Marketing: Focus promotions in popular zones.
## Development: Explore low-density areas for growth potential.<hr><br>

# Task-4  Restaurant Chains
<hr>

# <hr>Problem Statement : 

## Objective: 
## 1. Identify if there are any restaurant chains present in the dataset.
## 2. Analyze the ratings and popularity of different restaurant chains.

## Overview
### The task involves analyzing restaurant chains within a dataset to understand their performance. Specifically, we aim to identify if any restaurant chains exist, evaluate their average ratings, and assess their popularity. This will provide insights into how different chains perform relative to one another and help identify any trends or patterns in their ratings and popularity. <hr>

# Solve Problem -1

In [None]:
# Count occurrences of each restaurant name
countc = df['Restaurant Name'].value_counts()

# Filter to find names that appear more than once (indicating chains)
chains = countc[countc > 1].index.tolist()
print("Restaurant Chains Identified:",chains)

# Solve Problem -2

In [None]:
# Create a DataFrame for only chains( filter data)
chain_data = df[df['Restaurant Name'].isin(chains)]

In [None]:
# Calculate average ratings and popularity (total votes) for each chain
chainratings = chain_data.groupby('Restaurant Name')['Aggregate rating'].mean()
chain_votes = chain_data.groupby('Restaurant Name')['Votes'].sum()

In [None]:
# ANALYZE THE RESULT 
Result = pd.DataFrame({'Average Rating': chainratings, 'Total Votes': chain_votes})
print(Result)
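
One way to check whether higher-rated chains also attract more votes is a rank correlation between the two columns. The sketch below uses a hypothetical `sample` frame with the dataset's column names, and computes the correlation on ranks, which is equivalent to Spearman correlation and needs only pandas.

```python
import pandas as pd

# Hypothetical rows; 'Solo' appears once and is therefore not a chain.
sample = pd.DataFrame({
    'Restaurant Name': ['Alpha', 'Alpha', 'Beta', 'Beta',
                        'Gamma', 'Gamma', 'Solo'],
    'Aggregate rating': [4.0, 4.2, 3.0, 3.2, 2.0, 2.4, 4.8],
    'Votes': [300, 500, 100, 140, 20, 40, 900],
})

# Names that appear more than once are treated as chains.
outlets = sample['Restaurant Name'].value_counts()
chain_names = outlets[outlets > 1].index
chain_rows = sample[sample['Restaurant Name'].isin(chain_names)]

# Outlet count, average rating and total votes per chain.
chain_summary = chain_rows.groupby('Restaurant Name').agg(
    outlets=('Votes', 'size'),
    average_rating=('Aggregate rating', 'mean'),
    total_votes=('Votes', 'sum'),
)

# Pearson correlation of the ranks, equivalent to Spearman correlation.
rank_corr = chain_summary['average_rating'].rank().corr(
    chain_summary['total_votes'].rank())

print(chain_summary.sort_values('average_rating', ascending=False))
print("Rank correlation (rating vs votes): {:.2f}".format(rank_corr))
```

A value near 1 would support the link between rating and popularity, while a value near 0 would not. Total votes also grow with the number of outlets, so the `outlets` column is worth reading alongside the correlation.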

# Findings and Insights
## 1. Identification of Chains:
### Multiple restaurants with the same name were identified, indicating the presence of several restaurant chains in the dataset.

## 2. Performance Analysis:
### Average Ratings: The average ratings for each restaurant chain were calculated, providing insights into customer satisfaction.
### Popularity: The total votes for each chain were summed, indicating the level of customer engagement and popularity.
### Patterns: Chains with higher average ratings tend to be more popular, indicating that customer satisfaction is linked to higher engagement and possibly better business performance.<hr> 

# Recommendations
### Focus on High-Rated Chains: Chains with higher average ratings should be given priority for marketing campaigns and promotional activities to leverage their positive customer perception.

### Improve Low-Rated Chains: Chains with lower average ratings should undergo service and quality improvements to boost customer satisfaction and overall performance.

### Monitor Trends: Continuously track the ratings and popularity of these chains to detect any changes or trends over time, and take proactive measures based on the insights gathered. <hr>

# Conclusion of Level-2<hr>

## 1. The majority of restaurants received moderate to high ratings, and restaurants with higher ratings tended to have more customer engagement.
## 2. Certain cuisine combinations were more common and also received higher ratings, indicating a preference for specific cuisine pairings among customers.
## 3. Certain areas had a high concentration of restaurants, indicating popular dining locations and potential target areas for new restaurant openings.
## 4. Higher-rated chains were more popular, suggesting that customer satisfaction significantly impacts restaurant popularity.<hr>

## Overall Skills Learned
### 1. Data Analysis: Proficient in handling, cleaning, and analyzing datasets to extract meaningful insights.
### 2. Visualization: Developed skills in creating various plots to visualize data distribution, patterns, and relationships.
### 3. Statistical Analysis: Enhanced ability to calculate averages, distributions, and perform comparative analysis.
### 4. Geospatial Analysis: Gained experience in plotting geospatial data to identify geographical patterns and clusters.