# Understanding Tourism in India

### Introduction

Tourism in India is important for the country's economy and is growing rapidly. The Ministry of Tourism designs national policies for the development and promotion of tourism. The Ministry consults and collaborates with other stakeholders in the sector, including various central ministries and agencies, state governments, union territories and private sector representatives. Concerted efforts are being made to promote niche tourism products such as rural, cruise, medical and eco-tourism. The Ministry of Tourism also maintains the Incredible India campaign, which promotes tourism in India.

### Business Problem

One important benefit of tourism is the employment it creates for the people of the country. The purpose of this project is to analyze the tourist places of a given state in India and recommend the best locations where stakeholders can open a restaurant or lodging to make the most of that opportunity.

This project is aimed at people who are interested in opening a restaurant, a lodging or a transport service.

### Data Source

The Wikipedia page listing the districts of India, https://en.wikipedia.org/wiki/List_of_districts_in_India, is the main source used to obtain the districts of India.

Here, we consider one of the states of India, __Tamil Nadu__, whose district table is scraped from https://en.wikipedia.org/wiki/List_of_districts_of_Tamil_Nadu.

We use the beautifulsoup4 package to scrape information from the web page and load it into a pandas DataFrame. We then use the Python geopy package to get the latitude and longitude of each district. The Foursquare API is used to explore the neighbourhoods in the districts and give stakeholders a clear picture.

In [2]:
import numpy as np # library to handle data in a vectorized manner
from bs4 import BeautifulSoup
import pandas as pd # library for data analysis
import requests # library to handle requests
from geopy.geocoders import Nominatim

Wikipedia URL

In [14]:
URL = 'https://en.wikipedia.org/wiki/List_of_districts_of_Tamil_Nadu'

Requesting the page and getting the response

In [15]:
s = requests.Session()
response = s.get(URL, timeout=10)
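The request above does not check whether the server actually returned the page. A minimal sketch of one way to add that check follows; `fetch_html` is a hypothetical helper name, and `session` is assumed to be a `requests.Session()` (or any object with a compatible `get` method).

```python
def fetch_html(url, session, timeout=10):
    """Fetch a page and return its body as bytes, raising on HTTP error statuses."""
    response = session.get(url, timeout=timeout)
    # On a requests response, raise_for_status() raises for 4xx/5xx status codes
    response.raise_for_status()
    return response.content

# Possible usage with the session created above (performs a network call):
# html = fetch_html(URL, s)
```

Failing early here avoids parsing an error page as if it were the district table.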

Using HTML Parser to parse through the page's source code

In [16]:
soup = BeautifulSoup(response.content, 'html.parser')
pretty_soup = soup.prettify()

Using the table's CSS class to locate the table to scrape

In [19]:
table_ = soup.find('table', {"class":'wikitable sortable'})

Extracting each row of the HTML table and storing it in a list

In [20]:
# find_all returns every <tr> element of the table, header row included
list_row = table_.find_all("tr")

Creating a Dataframe to store the table values

In [52]:
data = pd.DataFrame(columns = ['District', 'Code', 'Area', 'Population', 'Population_Density'])

Filtering out only the table values and storing it in the DataFrame

In [53]:
for i in range(1, len(list_row)):  # start at 1 to skip the header row
    # Split the row's HTML into lines; the indices below depend on the page's markup
    lines = str(list_row[i]).split('\n')
    data.loc[i, 'District'] = lines[3].split('>')[2].split('<')[0]
    data.loc[i, 'Code'] = lines[5].split('>')[1]
    # Keep only the text before any trailing markup in the Area cell
    data.loc[i, 'Area'] = lines[13].split('>')[1].split('<')[0]
    data.loc[i, 'Population'] = lines[15].split('>')[1]
    data.loc[i, 'Population_Density'] = lines[17].split('>')[1]
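Splitting the raw HTML string on `'\n'`, `'>'` and `'<'` ties the extraction to the exact line layout of the page. As a hedged alternative, the sketch below reads the cells of each row through BeautifulSoup instead. `SAMPLE_HTML` is a made-up miniature of the table and `table_to_records` is a hypothetical helper name; the column names of the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of a 'wikitable sortable' table; the real page has more columns
SAMPLE_HTML = """
<table class="wikitable sortable">
<tr><th>No.</th><th>District</th><th>Code</th><th>Area</th></tr>
<tr><td>1</td><td><a href="/wiki/Ariyalur_district">Ariyalur</a></td><td>AR</td><td>1,949.31<sup>[1]</sup></td></tr>
<tr><td>2</td><td><a href="/wiki/Chennai_district">Chennai</a></td><td>CH</td><td>426</td></tr>
</table>
"""

def table_to_records(table):
    """Turn an HTML table into a list of dicts keyed by the header row's text."""
    rows = table.find_all("tr")
    headers = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        # Drop footnote markers such as [1] before reading the cell text
        for sup in row.find_all("sup"):
            sup.decompose()
        values = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        if len(values) == len(headers):  # skip malformed or spanning rows
            records.append(dict(zip(headers, values)))
    return records

soup_sample = BeautifulSoup(SAMPLE_HTML, "html.parser")
records = table_to_records(soup_sample.find("table", class_="wikitable"))
```

The resulting list of dicts can be passed directly to `pd.DataFrame(records)`, after which the columns of interest can be selected and renamed.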

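Adding the latitude and longitude of each district using geopy

In [59]:
data["Latitude"] = ""
data["Longitude"] = ""
# Create the geocoder once instead of once per row
geolocator = Nominatim(user_agent="my_user_agent")
# Iterate over the index itself so the last district is not skipped
for i in data.index:
    loc = geolocator.geocode(data.loc[i, 'District'])
    if loc is None:  # no match found: leave the coordinates empty
        continue
    data.loc[i, 'Latitude'] = loc.latitude
    data.loc[i, 'Longitude'] = loc.longitude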
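The geocoding step sends one query per district, and a bare district name can match a place with the same name elsewhere. The sketch below shows one possible way to qualify each query and to tolerate missing results. `geocode_districts` is a hypothetical helper, and the `", Tamil Nadu, India"` suffix is an assumption about how the queries could be disambiguated, not something the original notebook does.

```python
import time

def geocode_districts(names, geocode, suffix=", Tamil Nadu, India", pause=0.0):
    """Map each district name to (latitude, longitude), or None when no match is found.

    `geocode` is any callable that takes a query string and returns an object
    with `latitude` and `longitude` attributes, or None.
    """
    coords = {}
    for name in names:
        # Qualifying the query reduces the risk of matching a same-named place elsewhere
        location = geocode(name + suffix)
        coords[name] = (location.latitude, location.longitude) if location else None
        time.sleep(pause)  # optional spacing between calls to a rate-limited service
    return coords
```

With geopy, the callable could be `geolocator.geocode`, or that method wrapped in `RateLimiter(geolocator.geocode, min_delay_seconds=1)` from `geopy.extra.rate_limiter`, which spaces out the calls to stay within the public Nominatim service's limit of one request per second.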

In [71]:
data.head()

| | District | Code | Area | Population | Population_Density | Latitude | Longitude |
|---|---|---|---|---|---|---|---|
| 1 | Ariyalur | AR | 1949.31 | 754894 | 390 | 11.076 | 79.117455 |
| 2 | Chengalpattu | CGL | 2944.96 | 2556244 | 868 | 12.6841 | 79.983637 |
| 3 | Chennai | CH | 426.0 | 4646732 | 26076 | 13.0837 | 80.270186 |
| 4 | Coimbatore | CO | 4723.0 | 3458045 | 732 | 11.0018 | 76.962842 |
| 5 | Cuddalore | CU | 3703.0 | 2605914 | 709 | 11.7564 | 79.763464 |
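With coordinates available for each district, the Foursquare step mentioned under Data Source could start from a request like the one sketched below. This is only an illustration under stated assumptions: it targets the legacy v2 `venues/explore` endpoint, whereas newer Foursquare API versions use a different URL and authentication scheme. `build_explore_url`, the credentials, and the default `version`, `radius` and `limit` values are all hypothetical placeholders.

```python
from urllib.parse import urlencode

# Assumed legacy endpoint; newer Foursquare API versions differ
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=1000, limit=50):
    """Build a venue-exploration URL around one latitude/longitude pair."""
    params = {
        "client_id": client_id,          # placeholder credential
        "client_secret": client_secret,  # placeholder credential
        "v": version,                    # API version date (placeholder value)
        "ll": f"{lat},{lng}",            # centre of the search
        "radius": radius,                # search radius in metres
        "limit": limit,                  # maximum number of venues to return
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"

# Example for the Chennai row of the table above
url = build_explore_url(13.0837, 80.270186, "MY_ID", "MY_SECRET")
```

The URL can then be fetched with `requests.get(url, timeout=10)` and the JSON response inspected for venue categories near each district.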
