# Exploratory Data Analysis - Top Indian Places to Visit

This notebook performs exploratory analysis on the dataset of top places to visit in India.

In [3]:
# Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
df = pd.read_csv('../data/Top Indian Places to Visit.csv')
print("Dataset loaded successfully!")
print(f"Dataset shape: {df.shape}")

Dataset loaded successfully!
Dataset shape: (325, 16)
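
The `Unnamed: 0` column runs from 0 to 324 in the statistical summary below. That suggests it is a row index that was written out with the CSV, though this is an assumption. If so, passing `index_col=0` to `pd.read_csv` makes it the DataFrame index instead of a data column. Here is a minimal sketch on a toy two-row CSV with hypothetical values:

```python
import io

import pandas as pd

# Toy CSV (hypothetical values) mimicking a file saved together with its index,
# which leaves the first header cell empty.
csv_text = """,Zone,State,Name
0,Northern,Delhi,India Gate
1,Northern,Delhi,Jantar Mantar
"""

# Without index_col, pandas names the empty-headed column "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))

# index_col=0 turns that leading column into the row index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(raw.columns.tolist())
print(clean.columns.tolist())
```

For this notebook's file the call would be `pd.read_csv('../data/Top Indian Places to Visit.csv', index_col=0)`, and `df.shape` would then report 15 columns rather than 16.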


## Dataset Overview
Let's examine the structure and content of the dataset.

In [4]:
# Display first few rows
print("First 5 rows of the dataset:")
print(df.head())
print("\nDataset Info:")
df.info()  # info() prints its report directly and returns None, so it is not wrapped in print()

First 5 rows of the dataset:
   Unnamed: 0      Zone  State   City                  Name          Type  \
0           0  Northern  Delhi  Delhi            India Gate  War Memorial   
1           1  Northern  Delhi  Delhi        Humayun's Tomb          Tomb   
2           2  Northern  Delhi  Delhi     Akshardham Temple        Temple   
3           3  Northern  Delhi  Delhi  Waste to Wonder Park    Theme Park   
4           4  Northern  Delhi  Delhi         Jantar Mantar   Observatory   

  Establishment Year  time needed to visit in hrs  Google review rating  \
0               1921                          0.5                   4.6   
1               1572                          2.0                   4.5   
2               2005                          5.0                   4.6   
3               2019                          2.0                   4.1   
4               1724                          2.0                   4.2   

   Entrance Fee in INR Airport with 50km Radius Weekly Of

In [5]:
# Check for missing values
print("Missing Values:")
print(df.isnull().sum())
print("\nMissing Value Percentage:")
print((df.isnull().sum() / len(df) * 100).round(2))

Missing Values:
Unnamed: 0                            0
Zone                                  0
State                                 0
City                                  0
Name                                  0
Type                                  0
Establishment Year                    0
time needed to visit in hrs           0
Google review rating                  0
Entrance Fee in INR                   0
Airport with 50km Radius              0
Weekly Off                          293
Significance                          0
DSLR Allowed                          0
Number of google review in lakhs      0
Best Time to visit                    0
dtype: int64

Missing Value Percentage:
Unnamed: 0                           0.00
Zone                                 0.00
State                                0.00
City                                 0.00
Name                                 0.00
Type                                 0.00
Establishment Year                   0.00
time neede
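
Only *Weekly Off* has gaps: 293 of 325 rows, about 90.15%. The following is a minimal sketch of a more compact report that lists only the columns with missing values. It then fills the gaps with an explicit label instead of dropping rows. The helper name `missing_report` and the label `"Not listed"` are hypothetical choices. A blank *Weekly Off* may or may not mean the place has no weekly closure.

```python
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values, only for columns that have any."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns with gaps, largest first.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame (hypothetical values) where 'Weekly Off' is mostly empty.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Weekly Off": [None, "Monday", None, None],
})
report = missing_report(toy)
print(report)

# Hypothetical label for the gaps; it records "unknown", not "no closure".
toy["Weekly Off"] = toy["Weekly Off"].fillna("Not listed")
```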

## Statistical Summary
Summary statistics for the numerical features.

In [6]:
# Statistical summary of numeric columns
print("\nDescriptive Statistics:")
print(df.describe().round(2))


Descriptive Statistics:
       Unnamed: 0  time needed to visit in hrs  Google review rating  \
count      325.00                       325.00                325.00   
mean       162.00                         1.81                  4.49   
std         93.96                         0.97                  0.27   
min          0.00                         0.50                  1.40   
25%         81.00                         1.00                  4.40   
50%        162.00                         1.50                  4.50   
75%        243.00                         2.00                  4.60   
max        324.00                         7.00                  4.90   

       Entrance Fee in INR  Number of google review in lakhs  
count               325.00                            325.00  
mean                115.81                              0.41  
std                 530.86                              0.65  
min                   0.00                              0.01  
25%        
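
The summary hints at outliers. The entrance-fee standard deviation (530.86) is several times its mean (115.81), and the minimum rating (1.40) sits far below the 25th percentile (4.40). One common check is Tukey's IQR rule, which flags values outside [Q1 - 1.5·IQR, Q3 + 1.5·IQR]. Below is a minimal sketch on toy ratings with hypothetical values. The helper name `iqr_outliers` is illustrative.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]


# Toy ratings (hypothetical values) with one very low score.
ratings = pd.Series([4.6, 4.5, 4.6, 4.1, 4.2, 4.4, 1.4, 4.7])
print(iqr_outliers(ratings))
```

On the full column, the quartiles above (4.40 and 4.60) put the lower fence at 4.10. So `iqr_outliers(df['Google review rating'])` would flag at least the 1.40 minimum.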

## Categorical Features Distribution
Number of unique values, and the most frequent values, in each categorical column.

In [7]:
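# Unique counts for non-numeric (categorical) columns; exclude='number' avoids the
# object-vs-str dtype deprecation warning that include='object' triggers in newer pandas
categorical_cols = df.select_dtypes(exclude='number').columns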
print("Unique value counts in categorical columns:")
for col in categorical_cols:
    print(f"\n{col}: {df[col].nunique()} unique values")
    print(df[col].value_counts().head())

Unique value counts in categorical columns:

Zone: 6 unique values
Zone
Southern    98
Northern    89
Eastern     45
Western     40
Central     39
Name: count, dtype: int64

State: 33 unique values
State
Uttar Pradesh    23
Maharastra       20
West Bengal      20
Delhi            19
Karnataka        19
Name: count, dtype: int64

City: 214 unique values
City
Delhi        16
Goa          14
Hyderabad    11
Mumbai       10
Kolkata      10
Name: count, dtype: int64

Name: 321 unique values
Name
City Palace                2
Wonderla Amusement Park    2
Thiksey Monastery          2
Ramanathaswamy Temple      2
India Gate                 1
Name: count, dtype: int64

Type: 78 unique values
Type
Temple           59
Beach            25
Fort             22
Lake             16
National Park    14
Name: count, dtype: int64

Establishment Year: 162 unique values
Establishment Year
Unknown         111
1950              5
1600              4
2013              4
12th century      4
Name: count, dtype: 


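*Establishment Year* shows up among the categorical columns because it mixes plain years with text such as `Unknown` (111 rows) and `12th century`. That is also why `describe()` omitted it. Here is a minimal sketch of turning it into a numeric column, using toy values in the same style. With `errors="coerce"`, every non-numeric entry becomes NaN, so century-style values are lost rather than approximated.

```python
import pandas as pd

# Toy values (hypothetical mix) in the style of the 'Establishment Year' column.
years = pd.Series(["1921", "1572", "Unknown", "12th century", "2005"])

# errors="coerce" turns anything that is not a plain number into NaN,
# so "Unknown" and "12th century" both become missing.
numeric_years = pd.to_numeric(years, errors="coerce")
print(numeric_years)
print(f"Parsed: {numeric_years.notna().sum()} of {len(years)}")
```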

## Geographic Distribution
Distribution of places by zone, state, and city.

In [8]:
# Geographic distribution
print("Places by Zone:")
print(df['Zone'].value_counts())
print("\n" + "="*50)
print("\nPlaces by State (Top 10):")
print(df['State'].value_counts().head(10))
print("\n" + "="*50)
print("\nPlaces by City (Top 10):")
print(df['City'].value_counts().head(10))

Places by Zone:
Zone
Southern         98
Northern         89
Eastern          45
Western          40
Central          39
North Eastern    14
Name: count, dtype: int64


Places by State (Top 10):
State
Uttar Pradesh       23
Maharastra          20
West Bengal         20
Delhi               19
Karnataka           19
Himachal Pradesh    18
Andhra Pradesh      18
Kerala              16
Rajasthan           15
Madhya Pradesh      15
Name: count, dtype: int64


Places by City (Top 10):
City
Delhi            16
Goa              14
Hyderabad        11
Mumbai           10
Kolkata          10
Visakhapatnam     8
Bangalore         5
Ahmedabad         5
Jaipur            5
Leh               5
Name: count, dtype: int64
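
Raw counts are easier to compare as shares of the total. `value_counts(normalize=True)` returns proportions instead of counts. Here is a minimal sketch on toy zone labels with hypothetical counts:

```python
import pandas as pd

# Toy zone labels (hypothetical counts) standing in for df['Zone'].
zones = pd.Series(["Southern"] * 3 + ["Northern"] * 2 + ["Eastern"])

# normalize=True returns proportions; scale to percent and round for display.
share = (zones.value_counts(normalize=True) * 100).round(1)
print(share)
```

On the full dataset, Southern's 98 of 325 places would come out as about 30.2%.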


## Place Types Analysis
Distribution of the different place types in the dataset.

In [9]:
# Place types distribution
print("Distribution of Place Types:")
print(df['Type'].value_counts())
print(f"\nTotal unique types: {df['Type'].nunique()}")

Distribution of Place Types:
Type
Temple                59
Beach                 25
Fort                  22
Lake                  16
National Park         14
                      ..
Township               1
Entertainment          1
Commercial Complex     1
Mosque                 1
Race Track             1
Name: count, Length: 78, dtype: int64

Total unique types: 78
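
With 78 types and a long tail of single-place categories, plots and group-bys read more easily when rare types are pooled. Below is a minimal sketch that keeps the `n` most frequent labels and maps the rest to `"Other"`. The helper name `top_n_or_other` and the toy labels are hypothetical.

```python
import pandas as pd


def top_n_or_other(values: pd.Series, n: int) -> pd.Series:
    """Keep the n most frequent labels and replace every other label with 'Other'."""
    keep = values.value_counts().nlargest(n).index
    return values.where(values.isin(keep), "Other")


# Toy place types (hypothetical counts).
types = pd.Series(["Temple", "Temple", "Temple", "Beach", "Beach", "Fort", "Mosque", "Race Track"])
grouped = top_n_or_other(types, n=2)
print(grouped.value_counts())
```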


## Ratings & Reviews Analysis
Google review ratings and the number of reviews (in lakhs; 1 lakh = 100,000).

In [10]:
# Ratings and reviews analysis
print("Google Review Ratings Statistics:")
print(f"Mean Rating: {df['Google review rating'].mean():.2f}")
print(f"Min Rating: {df['Google review rating'].min():.2f}")
print(f"Max Rating: {df['Google review rating'].max():.2f}")
print(f"Median Rating: {df['Google review rating'].median():.2f}")

print("\n" + "="*50)
print("\nNumber of Google Reviews (in lakhs) Statistics:")
print(f"Mean Reviews: {df['Number of google review in lakhs'].mean():.2f}")
print(f"Min Reviews: {df['Number of google review in lakhs'].min():.2f}")
print(f"Max Reviews: {df['Number of google review in lakhs'].max():.2f}")
print(f"Median Reviews: {df['Number of google review in lakhs'].median():.2f}")

Google Review Ratings Statistics:
Mean Rating: 4.49
Min Rating: 1.40
Max Rating: 4.90
Median Rating: 4.50


Number of Google Reviews (in lakhs) Statistics:
Mean Reviews: 0.41
Min Reviews: 0.01
Max Reviews: 7.40
Median Reviews: 0.17
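
The review counts are right-skewed: the mean (0.41 lakh) is well above the median (0.17 lakh). So a rank-based measure is a reasonable way to ask whether heavily reviewed places also rate higher. The sketch below converts lakhs to absolute counts. It then computes Spearman's rho as the Pearson correlation of the ranks, which avoids needing SciPy. The toy values are hypothetical.

```python
import pandas as pd

LAKH = 100_000  # 1 lakh = 100,000

# Toy values (hypothetical) shaped like the two review columns.
toy = pd.DataFrame({
    "Google review rating": [4.6, 4.5, 4.6, 4.1, 4.2],
    "Number of google review in lakhs": [2.60, 0.40, 0.40, 0.27, 0.31],
})

# Convert review counts from lakhs to absolute numbers.
toy["review_count"] = (toy["Number of google review in lakhs"] * LAKH).round().astype(int)

# Spearman correlation = Pearson correlation of the (average-tied) ranks,
# which limits the pull of a few very heavily reviewed places.
rho = toy["Google review rating"].rank().corr(toy["review_count"].rank())
print(toy["review_count"].tolist())
print(f"Spearman rho: {rho:.3f}")
```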


## Entrance Fee & Duration Analysis
Entrance fees (INR) and the time needed to visit (hours).

In [None]:
# Entrance fee analysis
print("Entrance Fee Statistics (in INR):")
print(f"Mean Fee: ₹{df['Entrance Fee in INR'].mean():.2f}")
print(f"Median Fee: ₹{df['Entrance Fee in INR'].median():.2f}")
print(f"Min Fee: ₹{df['Entrance Fee in INR'].min():.2f}")
print(f"Max Fee: ₹{df['Entrance Fee in INR'].max():.2f}")
print(f"Free Places: {(df['Entrance Fee in INR'] == 0).sum()}")

print("\n" + "="*50)
print("\nTime Needed to Visit (hours):")
print(f"Mean Time: {df['time needed to visit in hrs'].mean():.2f} hours")
print(f"Median Time: {df['time needed to visit in hrs'].median():.2f} hours")
print(f"Min Time: {df['time needed to visit in hrs'].min():.2f} hours")
print(f"Max Time: {df['time needed to visit in hrs'].max():.2f} hours")
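
Because the fee distribution is dominated by a few expensive entries (mean 115.81 INR against a standard deviation of 530.86), fee bands can summarize it more readably than a mean. The following is a minimal sketch with `pd.cut`. The band edges and labels are illustrative choices, not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy fees in INR (hypothetical values).
fees = pd.Series([0, 0, 30, 50, 200, 500, 1500])

# Right-closed intervals: (-1, 0], (0, 50], (50, 500], (500, inf).
bands = pd.cut(
    fees,
    bins=[-1, 0, 50, 500, np.inf],
    labels=["Free", "Up to 50", "51 to 500", "Above 500"],
)
print(bands.value_counts(sort=False))
```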

## Significance Categories & Best Time Analysis
Place significance, recommended visit times, and DSLR photography policy.

In [11]:
# Significance distribution
print("Places by Significance:")
print(df['Significance'].value_counts())

print("\n" + "="*50)
print("\nBest Time to Visit Distribution:")
print(df['Best Time to visit'].value_counts())

print("\n" + "="*50)
print("\nDSLR Photography Allowed:")
print(df['DSLR Allowed'].value_counts())

Places by Significance:
Significance
Historical            78
Religious             75
Nature                47
Recreational          30
Wildlife              29
Cultural              13
Scenic                10
Shopping               7
Entertainment          5
Adventure              5
Architectural          4
Botanical              3
Environmental          2
Scientific             2
Artistic               2
Sports                 2
Educational            2
Natural Wonder         2
Market                 1
Food                   1
Spiritual              1
Archaeological         1
Agricultural           1
Engineering Marvel     1
Trekking               1
Name: count, dtype: int64


Best Time to Visit Distribution:
Best Time to visit
All          164
Morning       88
Afternoon     44
Evening       26
All            1
Anytime        1
Night          1
Name: count, dtype: int64


DSLR Photography Allowed:
DSLR Allowed
Yes    265
No      60
Name: count, dtype: int64
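
`All` appears twice in the *Best Time to visit* counts (164 and 1), so the column must hold two distinct strings that print the same. A stray leading or trailing space is the likely cause, though that is an assumption. Below is a minimal cleanup sketch that strips whitespace. As a further hypothetical choice, it also folds `Anytime` into `All`.

```python
import pandas as pd

# Toy labels (hypothetical): "All " carries a trailing space.
best_time = pd.Series(["All", "Morning", "All ", "Anytime", "Evening"])

# Strip surrounding whitespace so visually identical labels merge.
cleaned = best_time.str.strip()

# Treat "Anytime" as a synonym of "All" (an assumption about the data's meaning).
cleaned = cleaned.replace({"Anytime": "All"})
print(cleaned.value_counts())
```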
