# Problem Statement

## Identifying which venue attributes most influence popularity (measured by user reviews, ratings, prices, or amenities)

- In the growing Indian wedding industry, valued at over $50 billion (roughly Rs. 4 lakh crore), choosing the right venue is one of the most critical and expensive decisions.
- Platforms like WedMeGood list hundreds of venues, but users often struggle to identify the most suitable options due to information overload and lack of personalized insights.
- By analyzing what makes certain venues more popular (e.g., ratings, location, amenities), this project aims to bridge that gap using data science.
- The insights can help platforms optimize their recommendations, empower venue owners to enhance offerings, and improve the overall user decision-making experience.

## Main Problem Statement: Marketing strategies to increase customer engagement in finding halls for weddings, parties, and other events

# Data Collection

In [5]:
import requests
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import re
import warnings
warnings.filterwarnings("ignore")

# city slugs as they appear in the listing URLs (all lowercase)
cities = ["delhi-ncr", "mumbai", "chennai", "pune", "lucknow", "jaipur"]

NAME=[]
RATINGS=[]
REVIEWS=[]
TYPE=[]
LOCATION=[]
PAX=[]
ROOMS=[]
MENU_PRICE=[]
AMENITIES=[]



for page_no in range(1,31):
    for city in cities:
        url= f"https://www.wedmegood.com/vendors/{city}/wedding-venues/?page={page_no}"
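        # browser-like request headers
        request_header = {
            'Content-Type': 'text/html; charset=UTF-8',
            'User-Agent': 'Chrome/101.0.0.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/119.0',
            'Accept-Encoding': 'gzip, deflate, br',
        }
        page = requests.get(url, headers=request_header)
        # name the parser explicitly so parsing does not depend on which parsers are installed
        soup = BeautifulSoup(page.text, "html.parser")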
        section = soup.find_all("div" , class_ = "vendor-info")
    
        for i in section:
            #name
            name = i.find("a" , class_ = "vendor-detail text-bold h6")
            if name:
                NAME.append(name.text)
            else:
                NAME.append(np.nan)
        
            #ratings
            ratings = i.find("span" , class_ = "StarRatingNew fw-500 center rating-new-5 regular")
            if ratings:
                RATINGS.append(ratings.text)
            else:
                RATINGS.append(np.nan)
        
            #reviews
            rev = i.find("span" , class_ = "review-cnt regular nowrap margin-l-5")
            # guard against cards with no review-count span before reading .text
            reviews = re.findall(r"\d+ reviews?", rev.text) if rev else []
            if reviews:
                REVIEWS.append(reviews[0])
            else:
                REVIEWS.append(np.nan)
        
            #wedding location type
            type1 = i.find("p" , class_ = "margin-l-5")
            if type1:
                TYPE.append(type1.text)
            else:
                TYPE.append(np.nan)
        
            #location
            location = i.find("p" , class_ = "vendor-detail")
            if location:
                LOCATION.append(location.text)
            else:
                LOCATION.append(np.nan)


            #menu price
            menu_price = i.find("div" , class_ = "frow f-wrap")
            #menu_price2= re.findall("")
            if menu_price:
                MENU_PRICE.append(menu_price.text)
            else:
                MENU_PRICE.append(np.nan)


            # pax
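            pax_and_rooms= i.find("div", class_="vendor-price frow margin-10 f-space-between protected-content jsEvent-onCopy")
            # fall back to an empty string when the block is missing, so the regexes simply find nothing
            text2 = pax_and_rooms.text if pax_and_rooms else ""
            pax = re.findall(r"^\d+-\d+", text2)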
            if pax:
                PAX.append(pax[0])
            else:
                PAX.append(np.nan)
        
        
        
            # rooms
            rooms = re.findall(r" pax(\d+)", text2)
            if rooms:
                ROOMS.append(rooms[0])
            else:
                ROOMS.append(np.nan)

            # amenities
            amenities = i.find("p" , class_ = "pointer")
            if amenities:
                AMENITIES.append(amenities.text)
            else:
                AMENITIES.append(np.nan)
            

print(NAME)
print(len(NAME))
print()
print(RATINGS)
print(len(RATINGS))
print()
print(REVIEWS)
print(len(REVIEWS))
print()
print(TYPE)
print(len(TYPE))
print(LOCATION)
print(len(LOCATION))
print()
print(MENU_PRICE)
print(len(MENU_PRICE))
print(PAX)
print(len(PAX))
print()
print(ROOMS)
print(len(ROOMS))
print()
print(AMENITIES)
print(len(AMENITIES))

['Hyatt Regency Delhi', 'The Gracious Banquets, Naraina, Delhi', 'Hotel The Royal Plaza', 'Grand Mantram', 'Park Boulevard Hotel, New Delhi', 'Radiance Tania Farms', 'Watercrest- Venue Luxe', 'Club Riviera by FNP Venues', 'Krishna Greens By Mapple Gold', 'Tivoli Bijwasan', 'Calista Resort', 'The Tivoli', 'The Ritz by Ferns N Petals', 'The Leela Ambience Convention Hotel Delhi', 'Myst by The Zora - Delhi Convention Center', 'The Grand New Delhi', 'SK Klyde Grand', 'Radisson Gurugram Udyog Vihar', 'Noormahal Palace', 'Four Points by Sheraton New Delhi, Airport Highway', 'The Royal Palms', 'Ushodhaya CSP Gardens', 'Rajan Gardens', 'The Leela Palace Chennai- Seaside Luxury', 'RITAM', "Taj Fisherman's Cove Resort & Spa Chennai", "Hotel Chandra's Inn", 'Blue Bay Beach Resort', 'Aiyavoo Mahal', 'Asirvatham Mahal', 'Sugam Resort & Convention Center', 'Shelter Beach Resort', 'The Trident', 'Annalakshmi', 'Radisson Blu Resort, Temple Bay Mamallapuram', "Rina's Venue", 'The Palace House', 'Anand 
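
The `# pax` and `# rooms` steps in the scraping loop pull two values out of one text block with two regular expressions. The sketch below isolates that parsing so it can be checked without hitting the site. The helper name `parse_pax_and_rooms` and the `sample` string are hypothetical; the real block text may look different, and the empty `Pax` and `Rooms` columns in the displayed DataFrame suggest the patterns may not match the live markup.

```python
import re

# Same patterns as in the scraping loop, written as raw strings
PAX_PATTERN = re.compile(r"^\d+-\d+")
ROOMS_PATTERN = re.compile(r" pax(\d+)")


def parse_pax_and_rooms(text):
    """Return (pax_range, rooms) as strings; None for any part not found."""
    if not text:
        return None, None
    pax = PAX_PATTERN.search(text)
    rooms = ROOMS_PATTERN.search(text)
    return (
        pax.group(0) if pax else None,
        rooms.group(1) if rooms else None,
    )


# Hypothetical example of the combined block text (an assumption, not scraped data)
sample = "100-500 pax25 Rooms"
print(parse_pax_and_rooms(sample))
```

Testing the patterns against a few strings copied from a real page is a quick way to find out why a column comes back empty.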

# Data Frame Creation

In [6]:
data= {
    "Name": NAME,
    "Type": TYPE,
    "Location": LOCATION,
    "Ratings": RATINGS,
    "Reviews": REVIEWS,
    "Menu_Price": MENU_PRICE,
    "Pax":PAX,
    "Rooms": ROOMS,
    "Amenities": AMENITIES
}
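
`pd.DataFrame` raises a `ValueError` when the lists in a dict like this have different lengths, which is why the scraping cell prints each length. A minimal sketch of the same check done in code follows. The helper name `check_column_lengths` is hypothetical, and `sample` is a toy stand-in built from values shown in this notebook's output.

```python
def check_column_lengths(columns):
    """Return the shared length of all lists, or raise if they differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return next(iter(lengths.values()), 0)


# Toy stand-in for the scraped lists (two rows only)
sample = {
    "Name": ["Hyatt Regency Delhi", "Hotel The Royal Plaza"],
    "Ratings": ["4.8", "4.7"],
    "Reviews": ["13 reviews", "3 reviews"],
}
n_rows = check_column_lengths(sample)
print(n_rows)
```

Calling `check_column_lengths(data)` before building the DataFrame would name the offending column, which is easier than comparing the printed lengths by eye.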

In [7]:
Halls = pd.DataFrame(data)

In [8]:
Halls

| | Name | Type | Location | Ratings | Reviews | Menu_Price | Pax | Rooms | Amenities |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Hyatt Regency Delhi | 4 Star & Above Wedding Hotels, Banquet Halls | R K Puram, Delhi NCR | 4.8 | 13 reviews | ₹5,000per plate | | | +4 more |
| 1 | The Gracious Banquets, Naraina, Delhi | Banquet Halls, Small Function / Party Halls | West Delhi, Delhi NCR | 4.9 | 95 reviews | ₹1,800per plate | | | +5 more |
| 2 | Hotel The Royal Plaza | 4 Star & Above Wedding Hotels, Banquet Halls | New Delhi, Delhi NCR | 4.7 | 3 reviews | ₹4,500per plate | | | +3 more |
| 3 | Grand Mantram | Banquet Halls, Marriage Garden / Lawns | Bandhwari, Delhi NCR | 5.0 | 30 reviews | ₹1,850per plate | | | +7 more |
| 4 | Park Boulevard Hotel, New Delhi | 4 Star & Above Wedding Hotels, Banquet Halls | Chattarpur, Delhi NCR | 5.0 | 55 reviews | ₹4,200per plate | | | +7 more |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2975 | Hotel Anju Shree Inn | Banquet Halls, 3 Star Hotels with Banquets | Jaipur | 5.0 | 4 reviews | ₹700per plate | | | +3 more |
| 2976 | Moti Palace Marriage Garden | Banquet Halls, Marriage Garden / Lawns | Durgapura, Jaipur | | 1 review | ₹500per plate | | | +1 more |
| 2977 | Hotel Indiana Pride | Banquet Halls, 3 Star Hotels with Banquets | Lal Kothi, Jaipur | | | ₹850per plate | | | +3 more |
| 2978 | Haveli Garden | Marriage Garden / Lawns | Jaipur | | | ₹300per plate | | | +2 more |


In [6]:
# creating csv file of data frame
Halls.to_csv('New_More_Halls_Data.csv', index = False)
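
The saved columns still hold raw scraped strings such as `13 reviews` and `₹5,000per plate`, so they need converting to numbers before any analysis of popularity. The sketch below shows one way to do that with the standard library. The helper names `parse_review_count` and `parse_plate_price` are hypothetical, and the patterns assume the string shapes visible in the DataFrame output.

```python
import re


def parse_review_count(text):
    """Convert e.g. '13 reviews' or '1 review' to an int; None if missing."""
    if not isinstance(text, str):
        return None  # covers NaN placeholders, which are floats
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


def parse_plate_price(text):
    """Convert e.g. '₹5,000per plate' to an int number of rupees; None if missing."""
    if not isinstance(text, str):
        return None
    match = re.search(r"₹\s*([\d,]+)", text)
    return int(match.group(1).replace(",", "")) if match else None


# Strings copied from the DataFrame output shown in this notebook
print(parse_review_count("95 reviews"))
print(parse_plate_price("₹1,800per plate"))
```

With pandas these could be applied per column, for example `Halls["Reviews"].map(parse_review_count)`. The resulting column would then need a nullable or float dtype to hold the missing values.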