<h1 align="center"><font size="5">LOCATION RECOMMENDER SYSTEM FOR A TOURISM-BASED HOTEL</font></h1>
<h1 align="center"><font size="5">CASE STUDY: LOS ANGELES</font></h1>

In this notebook, the best locations for situating a hotel in Los Angeles are recommended based on the kinds of venues in their proximity. The list of neighborhoods is scraped from a Wikipedia page, and their coordinates are obtained with the geopy library, supplemented by manual entry where geopy returns nothing. The neighborhoods are then explored using Foursquare location data.

## Table of contents

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ol>
        <li><a href="#ref1">Data Acquisition and Cleaning</a></li>
        <li><a href="#ref2">Explore the Neighborhoods</a></li>
        <li><a href="#ref3">Designing the Recommender System</a></li>
    </ol>
</div>
<br>

### Import necessary dependencies

In [1]:
from bs4 import BeautifulSoup as bs # for scraping

import requests # for sending web requests

# data manipulation
import pandas as pd
from pandas import json_normalize
import numpy as np

# for obtaining the coordinates of a location
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

# visualization tools
import folium
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline


### This cell hides warning output (stderr) and adds a link to toggle it

In [2]:
%%javascript
(function(on) {
const e=$( "<a>Setup failed</a>" );
const ns="js_jupyter_suppress_warnings";
var cssrules=$("#"+ns);
if(!cssrules.length) cssrules = $("<style id='"+ns+"' type='text/css'>div.output_stderr { } </style>").appendTo("head");
e.click(function() {
    var s='Showing';  
    cssrules.empty()
    if(on) {
        s='Hiding';
        cssrules.append("div.output_stderr, div[data-mime-type*='.stderr'] { display:none; }");
    }
    e.text(s+' warnings (click to toggle)');
    on=!on;
}).click();
$(element).append(e);
})(true);


<a id="ref1"></a>
# 1. Data Acquisition and Cleaning

**Send a GET request to the URL of the Wikipedia page containing the data**

In [3]:
url = "https://en.wikipedia.org/wiki/List_of_districts_and_neighborhoods_of_Los_Angeles"
html_content = requests.get(url).text

**Parse the HTML content using BeautifulSoup**

In [4]:
soup = bs(html_content, "html5lib")

**Retrieve the multi-column list blocks (`div-col`) that hold the neighborhood names**

In [5]:
raw_data = soup.find_all("div", attrs={"class": "div-col"})

**Clean the raw data and store it in a list**

In [6]:
neighborhoods = []
for data in raw_data:
    for li in data.ul.find_all("li"):
        neighborhoods.append(li.text.split("[")[0].strip())
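The cleaning loop above takes the text of each `<li>` inside the page's `div-col` blocks and cuts it at the first `[`, which removes Wikipedia footnote markers such as `[1]`. As a dependency-free sketch of the same idea, the standard library's `html.parser` can stand in for BeautifulSoup; the class `ListItemCollector`, the helper `clean_name` and any sample HTML are hypothetical, and unlike the bs4 cell (which reads only the first `<ul>` of each block) this version collects every `<li>` in the block.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collect the text of every <li> that sits inside a <div class="div-col">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0   # > 0 while inside a div-col block (counts nested divs)
        self.in_li = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.div_depth or "div-col" in classes:
                self.div_depth += 1
        elif tag == "li" and self.div_depth:
            self.in_li = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "li":
            self.in_li = False

    def handle_data(self, data):
        # text from nested tags (<a>, <sup>, ...) is appended to the current item
        if self.in_li:
            self.items[-1] += data


def clean_name(text):
    # drop everything from the first "[" (a footnote marker like "[1]") and trim spaces
    return text.split("[")[0].strip()
```

Feeding the parser a page with `parser.feed(html)` and mapping `clean_name` over `parser.items` yields the neighborhood names.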

How many neighborhoods were retrieved?

In [7]:
len(neighborhoods)

199

View the first five neighborhoods

In [8]:
neighborhoods[:5]

['Angelino Heights',
 'Angeles Mesa',
 'Angelus Vista',
 'Arleta',
 'Arlington Heights']

**Create a dataframe containing the neighborhoods retrieved**

In [9]:
neighborhoods_df = pd.DataFrame(neighborhoods, columns=["Neighborhood"])
neighborhoods_df.head()

|   | Neighborhood |
|---|---|
| 0 | Angelino Heights |
| 1 | Angeles Mesa |
| 2 | Angelus Vista |
| 3 | Arleta |
| 4 | Arlington Heights |


Rename the "Bel Air, Bel-Air or Bel Air Estates" entry to "Bel Air" for simplicity

In [10]:
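# assign the result back; calling replace(..., inplace=True) on a single column is deprecated in recent pandas
neighborhoods_df["Neighborhood"] = neighborhoods_df["Neighborhood"].replace(
    "Bel Air, Bel-Air or Bel Air Estates", "Bel Air")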

Warehouse District and Wholesale District are the same neighborhood, so one of them (Wholesale District) is deleted

In [11]:
index = neighborhoods_df[neighborhoods_df["Neighborhood"] == "Wholesale District"].index
neighborhoods_df.drop(index=index, inplace=True)
neighborhoods_df.reset_index(drop=True, inplace=True)

**Obtain the coordinates (latitudes and longitudes) of the neighborhoods**

In [12]:
locator = Nominatim(user_agent="myGeocoder")

# Nominatim's usage policy allows at most one request per second
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)

address = neighborhoods_df["Neighborhood"].apply(lambda x: x + ", Los Angeles, USA")
location = address.apply(geocode)

# keep (latitude, longitude); neighborhoods that could not be geocoded get NaN
coordinates = location.apply(lambda loc: (loc.latitude, loc.longitude) if loc else (np.nan, np.nan))

neighborhoods_df[["Latitude", "Longitude"]] = pd.DataFrame(
    coordinates.tolist(), index=neighborhoods_df.index)

neighborhoods_df.head()

RateLimiter caught an error, retrying (0/2 tries). Called with (*('Lincoln Heights, Los Angeles, USA',), **{}).
RateLimiter caught an error, retrying (0/2 tries). Called with (*('Rose Hills, Los Angeles, USA',), **{}).
RateLimiter caught an error, retrying (0/2 tries). Called with (*('Skid Row, Los Angeles, USA',), **{}).
RateLimiter caught an error, retrying (0/2 tries). Called with (*('Spaulding Square, Los Angeles, USA',), **{}).
RateLimiter caught an error, retrying (0/2 tries). Called with (*('Windsor Square, Los Angeles, USA',), **{}).

|   | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Angelino Heights | 34.070289 | -118.254796 |
| 1 | Angeles Mesa | 33.991402 | -118.31952 |
| 2 | Angelus Vista | NaN | NaN |
| 3 | Arleta | 34.241327 | -118.432205 |
| 4 | Arlington Heights | 34.043494 | -118.321374 |
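The retry messages above come from geopy's `RateLimiter`, which spaces out calls by `min_delay_seconds` and, by default, retries a failed call before swallowing the error and returning `None`, the same value `geocode` returns when Nominatim finds no match; either way the neighborhood ends up with missing coordinates. The hypothetical `SimpleRateLimiter` below is a simplified sketch of that behaviour, not geopy's implementation (it omits, for example, geopy's separate wait after an error); the clock and sleep functions are injectable only so the timing can be tested.

```python
import time


class SimpleRateLimiter:
    """Hypothetical helper: keep a minimum delay between calls and retry failures.

    If every attempt raises, the exception is swallowed and None is returned.
    """

    def __init__(self, func, min_delay_seconds=1.0, max_retries=2,
                 clock=time.monotonic, sleep=time.sleep):
        self.func = func
        self.min_delay_seconds = min_delay_seconds
        self.max_retries = max_retries
        self.clock = clock
        self.sleep = sleep
        self._last_call = None

    def _wait(self):
        # sleep just long enough to keep min_delay_seconds between consecutive calls
        if self._last_call is not None:
            remaining = self._last_call + self.min_delay_seconds - self.clock()
            if remaining > 0:
                self.sleep(remaining)
        self._last_call = self.clock()

    def __call__(self, *args, **kwargs):
        # one initial attempt plus up to max_retries retries
        for attempt in range(self.max_retries + 1):
            self._wait()
            try:
                return self.func(*args, **kwargs)
            except Exception as err:
                print(f"attempt {attempt + 1} failed: {err!r}")
        return None  # all attempts failed: swallow the error
```

Wrapping a function such as `locator.geocode` in it gives a callable that can be passed to `Series.apply`, just like the real `RateLimiter`.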


**Replace NaN values with 0 so the neighborhoods with missing coordinates are easy to filter**

In [13]:
neighborhoods_df.fillna(0, inplace=True)
neighborhoods_df.head()

|   | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Angelino Heights | 34.070289 | -118.254796 |
| 1 | Angeles Mesa | 33.991402 | -118.31952 |
| 2 | Angelus Vista | 0.0 | 0.0 |
| 3 | Arleta | 34.241327 | -118.432205 |
| 4 | Arlington Heights | 34.043494 | -118.321374 |


**Geopy returned no coordinates for some neighborhoods, so these will be sourced manually and saved in a CSV file**

**First, let's get a dataframe of the neighborhoods with missing coordinates**

In [14]:
missing = neighborhoods_df[(neighborhoods_df.Latitude == 0) & (neighborhoods_df.Longitude == 0)]
missing.head()

|    | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 2 | Angelus Vista | 0.0 | 0.0 |
| 10 | Baldwin Vista | 0.0 | 0.0 |
| 23 | Brentwood Glen | 0.0 | 0.0 |
| 31 | Castle Heights | 0.0 | 0.0 |
| 46 | East Gate Bel Air | 0.0 | 0.0 |


Save the **missing** dataframe to a csv file

In [15]:
missing.to_csv("Missing.csv")

**Having manually filled in the missing coordinates and saved the result as Missing_fill.csv, we now read that file into a dataframe**

In [16]:
missing = pd.read_csv("Missing_fill.csv", index_col=0)
missing.head()

|    | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 2 | Angelus Vista | 34.046954 | -118.317488 |
| 10 | Baldwin Vista | 34.0135 | -118.3627 |
| 23 | Brentwood Glen | 34.0655 | -118.4627 |
| 31 | Castle Heights | 34.0314 | -118.3999 |
| 46 | East Gate Bel Air | 34.080833 | -118.435556 |


**Get a dataframe of the neighborhoods whose coordinates were returned by the geopy library**

In [17]:
available = neighborhoods_df[(neighborhoods_df.Latitude != 0) & (neighborhoods_df.Longitude != 0)]
available.head()

|   | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Angelino Heights | 34.070289 | -118.254796 |
| 1 | Angeles Mesa | 33.991402 | -118.31952 |
| 3 | Arleta | 34.241327 | -118.432205 |
| 4 | Arlington Heights | 34.043494 | -118.321374 |
| 5 | Arts District | 34.041239 | -118.23445 |


**Merge the available and missing dataframes into one**

In [18]:
neighborhoods = pd.concat([available, missing], axis=0)

# restore the original (alphabetical) order of the Wikipedia list using the original index
neighborhoods = neighborhoods.sort_index().reset_index(drop=True)

# round all coordinate values to 4 d.p.
neighborhoods[["Latitude", "Longitude"]] = neighborhoods[["Latitude", "Longitude"]].round(4)

neighborhoods.head()

|   | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Angelino Heights | 34.0703 | -118.2548 |
| 1 | Angeles Mesa | 33.9914 | -118.3195 |
| 2 | Angelus Vista | 34.047 | -118.3175 |
| 3 | Arleta | 34.2413 | -118.4322 |
| 4 | Arlington Heights | 34.0435 | -118.3214 |


**Check dataframe for duplicates**

In [19]:
neighborhoods[neighborhoods.duplicated(keep=False)]

|   | Neighborhood | Latitude | Longitude |
|---|---|---|---|


There are no duplicates
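`duplicated(keep=False)` only flags rows that are identical in all three columns, so two differently named neighborhoods that were geocoded to the same point would not be reported. In pandas, `neighborhoods[neighborhoods.duplicated(subset=["Latitude", "Longitude"], keep=False)]` checks the coordinates alone. The hypothetical helper below sketches the same check in plain Python, grouping names by their coordinates rounded to 4 decimal places (the precision used above).

```python
from collections import defaultdict


def shared_coordinates(rows, ndigits=4):
    """Group neighborhood names that share the same (rounded) coordinates.

    `rows` is an iterable of (name, latitude, longitude) tuples; only groups
    containing more than one name are returned.
    """
    groups = defaultdict(list)
    for name, lat, lng in rows:
        groups[(round(lat, ndigits), round(lng, ndigits))].append(name)
    return {coords: names for coords, names in groups.items() if len(names) > 1}
```

An empty result means no two neighborhoods share a location at that precision.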

### Save it to a csv file for future reference

In [20]:
neighborhoods.to_csv("LA_geospatial_data.csv")

### Now that we have a complete dataframe, let's visualize the neighborhoods on a map centered on LA

**Obtain the coordinates of Los Angeles**

In [21]:
address = "Los Angeles, USA"
locator = Nominatim(user_agent="foursquare_agent")
location = locator.geocode(address)
latitude = location.latitude
longitude = location.longitude

**Create and show map of LA highlighting its neighborhoods**

In [22]:
la_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# add a circle marker with a popup label for each neighborhood
for lat, lng, label in zip(neighborhoods["Latitude"], neighborhoods["Longitude"], neighborhoods["Neighborhood"]):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(la_map)
la_map

<a id="ref2"></a>
# 2. Explore the Neighborhoods

**Define Foursquare Credentials and Version**

In [23]:
import os

# Read the Foursquare credentials from environment variables instead of hard-coding
# (and printing) them; the variable names below are placeholders
CLIENT_ID = os.environ["FOURSQUARE_CLIENT_ID"] # Foursquare ID
CLIENT_SECRET = os.environ["FOURSQUARE_CLIENT_SECRET"] # Foursquare Secret
VERSION = "20180605" # Foursquare API version


### Let's explore one of the neighborhoods

**Get the neighborhood's name and coordinates**

In [24]:
neighborhood_latitude = neighborhoods.loc[0, "Latitude"] # neighborhood latitude value
neighborhood_longitude = neighborhoods.loc[0, "Longitude"] # neighborhood longitude value

neighborhood_name = neighborhoods.loc[0, "Neighborhood"] # neighborhood name

print(f"Latitude and longitude values of {neighborhood_name} are {neighborhood_latitude}, {neighborhood_longitude}")

Latitude and longitude values of Angelino Heights are 34.0703, -118.2548


### Let's get the top 100 venues within a 1000 m radius of the neighborhood

**Make an explore call to Foursquare API and get the results**

In [25]:
LIMIT = 100   # maximum number of venues to return
RADIUS = 1000 # search radius in metres

url = "https://api.foursquare.com/v2/venues/explore"
params = {
    "client_id": CLIENT_ID,
    "client_secret": CLIENT_SECRET,
    "v": VERSION,
    "ll": f"{neighborhood_latitude},{neighborhood_longitude}",
    "radius": RADIUS,
    "limit": LIMIT,
}

results = requests.get(url, params=params).json()

**View the results**

In [26]:
results

{'meta': {'code': 200, 'requestId': '5ef6da8cf1ed7d287630a183'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': '$-$$$$', 'key': 'price'},
    {'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'East LA',
  'headerFullLocation': 'East LA, Los Angeles',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 94,
  'suggestedBounds': {'ne': {'lat': 34.079300009000015,
    'lng': -118.24395531373428},
   'sw': {'lat': 34.06129999099999, 'lng': -118.26564468626573}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4f75a626e5e8f16c87566797',
       'name': 'Halliwell Manor',
       'location': {'address': '1329 Carroll Ave',
        'lat': 34.069328534140894,
        'lng': -118.25416524263122,
        'labeledLatLngs'

The raw JSON above is hard to read, so let's clean it and process it into a tabular form

**Let's define a function to extract the category of each venue**

In [27]:
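def get_category_type(row):
    """Return the name of a venue's first category, or None if it has none."""
    # the column is "categories" or "venue.categories" depending on how the JSON was flattened
    try:
        categories_list = row["categories"]
    except KeyError:
        categories_list = row["venue.categories"]

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]["name"]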

**Clean and process the json file into a dataframe**

In [28]:
venues = results["response"]["groups"][0]["items"]
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ["venue.name", "venue.categories", "venue.location.lat", "venue.location.lng"]
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues["venue.categories"] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
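`json_normalize` flattens nested JSON into columns whose names join the nested keys with a dot (its default separator), which is why the columns are called `venue.name`, `venue.location.lat` and so on, and why `col.split(".")[-1]` recovers the short names. Lists such as `venue.categories` are left as-is, which is why `get_category_type` indexes `[0]["name"]`. The hypothetical `flatten_record` below is a simplified stand-in for a single record, not a replacement for `json_normalize` (which also handles lists of records and more options).

```python
def flatten_record(record, prefix=""):
    """Flatten nested dicts into a single level with dotted keys; lists are kept as-is."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # recurse into the nested dict, extending the dotted prefix
            flat.update(flatten_record(value, name))
        else:
            flat[name] = value
    return flat
```

For example, a venue item `{"venue": {"name": "Guisados", "location": {"lat": 34.070262}}}` becomes `{"venue.name": "Guisados", "venue.location.lat": 34.070262}`.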


**View the first few rows of the resulting dataframe**

In [29]:
nearby_venues.head()

|   | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Halliwell Manor | Performing Arts Venue | 34.069329 | -118.254165 |
| 1 | Guisados | Taco Place | 34.070262 | -118.250437 |
| 2 | Eightfold Coffee | Coffee Shop | 34.071245 | -118.250698 |
| 3 | Ototo | Sake Bar | 34.072659 | -118.25174 |
| 4 | Subliminal Projects | Art Gallery | 34.07229 | -118.250737 |


How many venues were returned for Angelino Heights?

In [30]:
nearby_venues.shape[0]

94
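To check how far each returned venue actually lies from the neighborhood center (for example, against the 1000 m `RADIUS`), the great-circle distance can be computed from the coordinates in the table above. A minimal haversine sketch, assuming a spherical Earth with a mean radius of 6,371 km; the name `haversine_m` is hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))
```

For instance, Halliwell Manor lies roughly 120 m from the Angelino Heights center used in the query. The same function could be applied row-wise to the neighborhood and venue coordinate columns of `la_venues`.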

### Now, let's create and apply a function that performs the above exploration for all neighborhoods

In [31]:
def getNearbyVenues(names, latitudes, longitudes):

    RADIUS = 1000 # search radius in metres
    LIMIT = 100   # maximum number of venues per neighborhood
    url = "https://api.foursquare.com/v2/venues/explore"
    venues_list = []

    for name, lat, lng in zip(names, latitudes, longitudes):

        # build the query parameters; requests URL-encodes them, so no stray
        # whitespace from a line continuation ends up inside the URL
        params = {
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
            "v": VERSION,
            "ll": f"{lat},{lng}",
            "radius": RADIUS,
            "limit": LIMIT,
        }

        # make the GET request
        results = requests.get(url, params=params).json()["response"]["groups"][0]["items"]

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v["venue"]["name"],
            v["venue"]["location"]["lat"],
            v["venue"]["location"]["lng"],
            v["venue"]["categories"][0]["name"]) for v in results])

    # flatten the per-neighborhood lists into a single table
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ["Neighborhood",
                  "Neighborhood Latitude",
                  "Neighborhood Longitude",
                  "Venue",
                  "Venue Latitude",
                  "Venue Longitude",
                  "Venue Category"]

    return nearby_venues

In [35]:
la_venues = getNearbyVenues(neighborhoods["Neighborhood"], neighborhoods["Latitude"], neighborhoods["Longitude"])

**View the first few rows of the dataframe**

In [36]:
la_venues.head()

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Angelino Heights | 34.0703 | -118.2548 | Halliwell Manor | 34.069329 | -118.254165 | Performing Arts Venue |
| 1 | Angelino Heights | 34.0703 | -118.2548 | Guisados | 34.070262 | -118.250437 | Taco Place |
| 2 | Angelino Heights | 34.0703 | -118.2548 | Eightfold Coffee | 34.071245 | -118.250698 | Coffee Shop |
| 3 | Angelino Heights | 34.0703 | -118.2548 | Ototo | 34.072659 | -118.25174 | Sake Bar |
| 4 | Angelino Heights | 34.0703 | -118.2548 | Subliminal Projects | 34.07229 | -118.250737 | Art Gallery |


**Examine the shape of the dataframe**

In [37]:
la_venues.shape

(10961, 7)

A total of 10,961 venues were returned across all neighborhoods

**How many unique venue categories are there?**

In [None]:
la_venues["Venue Category"].value_counts()

In [38]:
len(la_venues["Venue Category"].unique())

418

There are 418 unique venue categories

**How many venues were returned for each neighborhood?**

In [39]:
la_venues.groupby("Neighborhood")["Venue"].count()

Neighborhood
Angeles Mesa          36
Angelino Heights      94
Angelus Vista         94
Arleta                10
Arlington Heights     42
                    ... 
Wilshire Park         28
Windsor Square        56
Winnetka              29
Woodland Hills        75
Yucca Corridor       100
Name: Venue, Length: 198, dtype: int64

**Let's check for the top 10 neighborhoods with the most venues**

In [40]:
la_venues.groupby("Neighborhood")["Venue"].count().sort_values(ascending=False)[:10]

Neighborhood
Yucca Corridor        100
NoHo Arts District    100
Jewelry District      100
Koreatown             100
Larchmont             100
Little Ethiopia       100
Little Tokyo          100
Mid-Wilshire          100
Miracle Mile          100
North Hollywood       100
Name: Venue, dtype: int64

All ten neighborhoods above returned exactly 100 venues, which is the `LIMIT` set in `getNearbyVenues`, so their true venue counts may be higher and the order among them is not meaningful.

**What are the top 10 most common venue categories?**

In [44]:
la_venues.groupby("Venue Category")["Venue"].count().sort_values(ascending=False)[:10]

Venue Category
Coffee Shop             507
Mexican Restaurant      395
Pizza Place             274
Fast Food Restaurant    232
Grocery Store           218
Café                    210
Bar                     207
Sandwich Place          202
Sushi Restaurant        189
Italian Restaurant      179
Name: Venue, dtype: int64

**Print the top 5 venues in each neighborhood**
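No cell accompanies this step, so here is one possible sketch. It interprets "top venues" as the most frequent venue categories in each neighborhood (an assumption) and uses only the standard library; the function name `top_categories` is hypothetical.

```python
from collections import Counter, defaultdict


def top_categories(rows, n=5):
    """Return the n most common venue categories for each neighborhood.

    `rows` is an iterable of (neighborhood, venue_category) pairs.
    """
    per_neighborhood = defaultdict(Counter)
    for neighborhood, category in rows:
        per_neighborhood[neighborhood][category] += 1
    # most_common(n) returns (category, count) pairs sorted by count, highest first
    return {name: counter.most_common(n) for name, counter in per_neighborhood.items()}
```

In the notebook it could be called as `top_categories(zip(la_venues["Neighborhood"], la_venues["Venue Category"]))` and the resulting dictionary printed item by item.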

<a id="ref3"></a>
# 3. Designing the Recommender System

In choosing a location for establishing a tourism-based hotel, the most important factor is the presence of venues that attract tourists. For design purposes, the venue categories are grouped into the following broad categories and are assigned weighting factors based on their relative importance.

| Category                         |  Weighting Factor  |
|----------------------------------|--------------------|
| Attractions/Entertainments       |        0.50        |
| Food Services                    |        0.15        |
| Medical Services                 |        0.15        |
| Transportation                   |        0.20        |

Attractions/Entertainments include venues such as art galleries, theme parks, theaters, golf ranges, historic sites, scenic lookouts, museums and landmarks. Food services include restaurants, bars, joints, etc. Airports, trains, bus stations, roads, etc. are all classified under transportation, while hospitals, pharmacies, drugstores and clinics are classified under medical services.

**Note: Venues that do not fall under any of the above categories are grouped under "Others".**
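Written out, the recommendation score computed later in this section is a weighted average of category counts, where $w_c$ is the weighting factor of general category $c$ from the table above (the profile gives "Others" a weight of 0) and $n_{h,c}$ is the number of venues of category $c$ found around neighborhood $h$:

$$
\text{score}(h) = \frac{\sum_{c} w_c \, n_{h,c}}{\sum_{c} w_c},
\qquad \sum_{c} w_c = 0.50 + 0.15 + 0.15 + 0.20 + 0 = 1
$$

Since the weights sum to 1, the score is simply the weighted sum of the counts. For example, a neighborhood with 20 attraction, 51 food-service, 1 medical and 1 transportation venue scores $0.50 \cdot 20 + 0.15 \cdot 51 + 0.15 \cdot 1 + 0.20 \cdot 1 = 10 + 7.65 + 0.15 + 0.2 = 18.0$.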

## Get the general category of each venue

**Inspect the various venue categories to know the keywords to use to categorize them under the general categories stated above**

In [54]:
pd.set_option("display.max_rows", 500)
pd.DataFrame(la_venues["Venue Category"].unique())

|   | 0 |
|---|---|
| 0 | Performing Arts Venue |
| 1 | Taco Place |
| 2 | Coffee Shop |
| 3 | Sake Bar |
| 4 | Art Gallery |
| 5 | Arcade |
| 6 | Japanese Restaurant |
| 7 | BBQ Joint |
| 8 | Beer Store |
| 9 | American Restaurant |


**Define keywords for the general categories and categorize the venues based on the keywords**

In [55]:
# Define a list of keywords for each category
attractions = ["Gallery", "Historic", "Park", "Recreation", "Scenic", "Garden", "Museum", "Aquarium",
               "Beach", "Public Art", "Resort", "Exhibit", "Marina", "Lake", "Monument", "Landmark",
               "Mountain", "Arcade", "Cultural", "Fountain", "Tour", "Waterfront", "Massage",
               "Performing", "club", "Pool", "Circus", "Skate", "Playground", "Yoga", "Spa",
               "Gun Range", "Gym", "Gaming", "Theater", "Golf", "Entertainment", "Lounge",
               "Stadium", "Amphitheater", "Bowling", "Casino", "Nightlife", "Rink", "Dive",
               "Laser Tag", "Shopping"]

food_services = ["Restaurant", "Bar", "Coffee Shop", "BBQ", "Beer", "Breakfast", "Taco", "Sandwich", "Food", "Café",
                 "Pizza", "Joint", "Donut", "Diner", "Ice Cream", "Creperie", "Smoothie", "Pie", "Gastropub", "Tea",
                 "Cupcake", "Yogurt", "Dessert", "Salad", "Bagel", "Steak", "Chocolate", "Cheese", "Pub", "Snack",
                 "Cafeteria", "Burrito", "Chips", "Pastry", "Noodle", "Bistro", "Soup"]

medical = ["Doctor", "Pharmacy", "Drugstore", "Clinic", "Hospital", "Dispensary", "Medical Center"]

transport = ["Airport", "Train", "Road", "Bus Stop", "Bus Station", "Rail", "Parking", "Car", "Garage", "Plane",
             "Bus Line", "Bike", "Boat"]

# Matching is a case-sensitive substring test, and the lists are checked in order, so a
# later match overrides an earlier one (Transportation > Medical Services > Food Services
# > Attractions/Entertainments)
for index, row in la_venues.iterrows():
    for keyword in attractions:
        if keyword in row["Venue Category"]:
            la_venues.loc[index, "General Category"] = "Attractions/Entertainments"
    for keyword in food_services:
        if keyword in row["Venue Category"]:
            la_venues.loc[index, "General Category"] = "Food Services"
    for keyword in medical:
        if keyword in row["Venue Category"]:
            la_venues.loc[index, "General Category"] = "Medical Services"
    for keyword in transport:
        if keyword in row["Venue Category"]:
            la_venues.loc[index, "General Category"] = "Transportation"

# venues that matched no keyword are grouped under "Others"
la_venues["General Category"] = la_venues["General Category"].fillna("Others")

la_venues.head()

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category | General Category |
|---|---|---|---|---|---|---|---|---|
| 0 | Angelino Heights | 34.0703 | -118.2548 | Halliwell Manor | 34.069329 | -118.254165 | Performing Arts Venue | Attractions/Entertainments |
| 1 | Angelino Heights | 34.0703 | -118.2548 | Guisados | 34.070262 | -118.250437 | Taco Place | Food Services |
| 2 | Angelino Heights | 34.0703 | -118.2548 | Eightfold Coffee | 34.071245 | -118.250698 | Coffee Shop | Food Services |
| 3 | Angelino Heights | 34.0703 | -118.2548 | Ototo | 34.072659 | -118.25174 | Sake Bar | Food Services |
| 4 | Angelino Heights | 34.0703 | -118.2548 | Subliminal Projects | 34.07229 | -118.250737 | Art Gallery | Attractions/Entertainments |
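Because the categorization loop uses a case-sensitive substring test and lets a later list overwrite an earlier match, short keywords can misfire: "Car" (transportation) is a substring of a category such as "Caribbean Restaurant", so such a venue would be labeled Transportation rather than Food Services. The sketch below, with hypothetical trimmed-down keyword lists, reproduces that behavior and contrasts it with whole-word matching under the same precedence. Whole-word matching has its own trade-off: "club" would no longer match "Nightclub", for example.

```python
import re

# Hypothetical, trimmed-down keyword lists (the full lists above would be used in practice),
# in the same order as the notebook loop
KEYWORDS = {
    "Attractions/Entertainments": ["Gallery", "Park", "Museum", "Performing"],
    "Food Services": ["Restaurant", "Bar", "Coffee Shop", "Taco"],
    "Medical Services": ["Doctor", "Pharmacy", "Hospital"],
    "Transportation": ["Airport", "Train", "Bus Stop", "Car"],
}

# In the notebook loop later lists win, so check them first here
PRECEDENCE = ["Transportation", "Medical Services", "Food Services", "Attractions/Entertainments"]


def substring_category(venue_category):
    """Mirror the notebook loop: case-sensitive substring test, last matching list wins."""
    result = "Others"
    for general, keywords in KEYWORDS.items():
        if any(keyword in venue_category for keyword in keywords):
            result = general
    return result


def whole_word_category(venue_category):
    """Same precedence, but a keyword must match whole words (regex word boundaries)."""
    for general in PRECEDENCE:
        for keyword in KEYWORDS[general]:
            if re.search(rf"\b{re.escape(keyword)}\b", venue_category):
                return general
    return "Others"
```

Here `substring_category("Caribbean Restaurant")` returns `"Transportation"`, while `whole_word_category("Caribbean Restaurant")` returns `"Food Services"`.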


### Create a profile for recommendation

In [56]:
profile = { "Attractions/Entertainments": 0.50, "Food Services": 0.15, "Medical Services": 0.15,
           "Transportation": 0.20, "Others": 0 }
profile = pd.Series(profile)
profile

Attractions/Entertainments    0.50
Food Services                 0.15
Medical Services              0.15
Transportation                0.20
Others                        0.00
dtype: float64

**Create dummy variables for each General Category**

In [57]:
# dtype=int gives 0/1 columns (pandas 2.x defaults to booleans)
categories = pd.get_dummies(la_venues["General Category"], dtype=int)
categories.head()

|   | Attractions/Entertainments | Food Services | Medical Services | Others | Transportation |
|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 0 | 0 |
| 2 | 0 | 1 | 0 | 0 | 0 |
| 3 | 0 | 1 | 0 | 0 | 0 |
| 4 | 1 | 0 | 0 | 0 | 0 |


Add a **Neighborhood** column to the **categories** dataframe

In [58]:
current_columns = categories.columns.to_list()
categories["Neighborhood"] = la_venues["Neighborhood"]
categories = categories[["Neighborhood"] + current_columns]
categories.head()

|   | Neighborhood | Attractions/Entertainments | Food Services | Medical Services | Others | Transportation |
|---|---|---|---|---|---|---|
| 0 | Angelino Heights | 1 | 0 | 0 | 0 | 0 |
| 1 | Angelino Heights | 0 | 1 | 0 | 0 | 0 |
| 2 | Angelino Heights | 0 | 1 | 0 | 0 | 0 |
| 3 | Angelino Heights | 0 | 1 | 0 | 0 | 0 |
| 4 | Angelino Heights | 1 | 0 | 0 | 0 | 0 |


**Group the categories dataframe by Neighborhood**

In [59]:
categories = categories.groupby("Neighborhood").sum()
categories.head()

| Neighborhood | Attractions/Entertainments | Food Services | Medical Services | Others | Transportation |
|---|---|---|---|---|---|
| Angeles Mesa | 4 | 16 | 3 | 13 | 0 |
| Angelino Heights | 20 | 51 | 1 | 21 | 1 |
| Angelus Vista | 7 | 61 | 2 | 22 | 2 |
| Arleta | 1 | 4 | 0 | 5 | 0 |
| Arlington Heights | 3 | 23 | 2 | 11 | 3 |


### Create a recommendation table

**Next, we compute a weighted average of each neighborhood's category counts using the weights in the profile, and rank the neighborhoods by this score**

In [60]:
recommendation_df = ((categories * profile).sum(axis=1)) / (profile.sum())
recommendation_df.head()

Neighborhood
Angeles Mesa          4.85
Angelino Heights     18.00
Angelus Vista        13.35
Arleta                1.10
Arlington Heights     5.85
dtype: float64
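To check these numbers by hand, the cell above amounts to the following plain-Python computation for each row of `categories`: multiply each category count by its weight, add the products and divide by the total weight. The names `PROFILE`, `weighted_score` and `rank_neighborhoods` are hypothetical; the weights are copied from the profile defined earlier.

```python
# Weights copied from the profile defined earlier
PROFILE = {
    "Attractions/Entertainments": 0.50,
    "Food Services": 0.15,
    "Medical Services": 0.15,
    "Transportation": 0.20,
    "Others": 0.0,
}


def weighted_score(counts, profile=PROFILE):
    """Weighted average of one neighborhood's category counts (missing categories count as 0)."""
    total_weight = sum(profile.values())
    return sum(weight * counts.get(category, 0) for category, weight in profile.items()) / total_weight


def rank_neighborhoods(counts_by_neighborhood, profile=PROFILE, top=None):
    """Return (neighborhood, score) pairs sorted from the highest score down."""
    scores = {name: weighted_score(counts, profile) for name, counts in counts_by_neighborhood.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top] if top else ranked
```

With the notebook's data, `categories.to_dict(orient="index")` would produce a mapping of the shape `rank_neighborhoods` expects; the Angeles Mesa counts from the grouped table (4, 16, 3, 13, 0) give 4.85, matching the output above.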

**Let's see the top 5 recommended neighborhoods**

In [61]:
recommendation_df.sort_values(ascending=False, inplace=True)
recommendation_df.head()

Neighborhood
Carthay            21.85
Miracle Mile       21.00
Little Ethiopia    20.45
Whitley Heights    19.85
Exposition Park    19.50
dtype: float64

### Get a summary of the top 10 recommended neighborhoods

In [65]:
recommendations = categories.loc[recommendation_df.head(10).index, :].reset_index()
recommendations.index = range(1, 11)
recommendations

|    | Neighborhood | Attractions/Entertainments | Food Services | Medical Services | Others | Transportation |
|---|---|---|---|---|---|---|
| 1 | Carthay | 30 | 43 | 0 | 25 | 2 |
| 2 | Miracle Mile | 28 | 44 | 0 | 26 | 2 |
| 3 | Little Ethiopia | 27 | 44 | 1 | 27 | 1 |
| 4 | Whitley Heights | 28 | 39 | 0 | 33 | 0 |
| 5 | Exposition Park | 24 | 49 | 1 | 16 | 0 |
| 6 | Yucca Corridor | 22 | 53 | 0 | 25 | 0 |
| 7 | Park La Brea | 25 | 42 | 0 | 33 | 0 |
| 8 | Cahuenga Pass | 19 | 57 | 1 | 23 | 0 |
| 9 | Angelino Heights | 20 | 51 | 1 | 21 | 1 |
| 10 | Chinatown | 16 | 62 | 0 | 19 | 3 |


Now that we have recommended the best locations for establishing a tourism-based hotel in Los Angeles, other factors reflecting the prospective owner's preferences can be analyzed to narrow down the list of potential locations.

### Thanks for viewing this notebook.
### Author: OLALEKE Moshood A.