## Table of contents
* [Introduction](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction  <a name="introduction"></a>

Hanami (花見, "flower viewing") is the traditional Japanese custom of enjoying the beauty of flowers, mainly cherry blossoms, in springtime. Each year the "cherry blossom front", the advance of cherry blossom blooming across Japan, is tracked as it slowly moves northward. The front generally indicates the opening of the first blossoms (kaika) rather than the arrival of full bloom (mankai). Forecasts of the cherry blossom front from weather services across Japan are closely followed by those who wish to enjoy hanami, as the blossoms last only one to two weeks.

There are many ways to enjoy hanami: admiring the blossoms from a distance, where they have been described as looking like beautiful clouds; having a picnic under the blooming trees; or taking a stroll in a park after enjoying a bowl of ramen.

While the exact origin of ramen noodles is up for debate (Japan or China), what is not up for debate is how ramen shops across Japan have made ramen their own. As the name suggests, ramen shops specialize in ramen dishes. In simple terms, ramen is wheat-flour noodles served in broth. But ramen is anything but simple. With more than 10,000 ramen shops in Japan[@1], each shop provides a different take on the dish.

In this project, we will try to find the best city in Japan in which to enjoy both hanami and ramen during **the month of April**. "Best" in this instance refers to how easily a person can travel to the city (i.e. **proximity to a train station or ferry**), the available **Hanami spots**, and the **Ramen shops**.

## Data  <a name="data"></a>

Given the criteria mentioned in the Introduction, the factors that will influence our decision are:
* Timing of cherry blossoming in the city
* Proximity to a train station or ferry
* Cherry blossom viewing (Hanami) spots, if any
* Ramen shops located within the city

The following data sources will be needed to extract/generate the required information:
* Cherry blossom forecast data and average cherry blossom blooming dates, gathered from [https://japan-guide.com/sakura](https://japan-guide.com/sakura)
* Cherry blossom viewing spots, gathered from [https://www.japan-guide.com/e/e2011_where.html](https://www.japan-guide.com/e/e2011_where.html)
* Train station and ferry locations, obtained using the **Foursquare API**
* Number of ramen shops and their locations in every city, obtained using the **Foursquare API**

With the data collected, we hope to find a city or cities that will satisfy a person's need to enjoy the beauty of cherry blossoms and a bowl of delicious ramen, all within easy reach by train, or by ferry and then train.
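As a rough sketch of what a Foursquare venue search could look like, the snippet below only builds the request URL and does not call the service. It assumes the legacy v2 `venues/search` endpoint with `client_id`/`client_secret` credentials. The helper name `build_search_url`, the 2 km radius, the result limit, and the placeholder credentials are illustrative assumptions, not values taken from this project.

```python
from urllib.parse import urlencode

# Assumption: legacy Foursquare v2 venue search endpoint
FOURSQUARE_SEARCH_URL = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(lat, lon, query, client_id, client_secret,
                     radius_m=2000, limit=50, version='20210405'):
    '''
    Hypothetical helper: builds a venue search URL around (lat, lon).
    radius_m, limit and version are illustrative defaults.
    '''
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,           # API version date in YYYYMMDD form
        'll': f'{lat},{lon}',   # centre of the search
        'query': query,         # free-text search term, e.g. 'ramen'
        'radius': radius_m,     # search radius in metres
        'limit': limit,         # maximum number of venues returned
    }
    return f'{FOURSQUARE_SEARCH_URL}?{urlencode(params)}'

# placeholder credentials; the coordinates are those of Shinjuku Gyoen
ramen_url = build_search_url(35.685176, 139.710052, 'ramen', 'MY_ID', 'MY_SECRET')
print(ramen_url)
```

The same helper could be reused with a different `query` (for example `'train station'`) to cover the travel criterion.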

### Average Cherry Blossoms blooming dates

Let's begin by obtaining the typical timing for the blooming of cherry blossoms in cities across Japan. 

We can obtain the average cherry blossom blooming dates [`averageBlooming_df`] from [https://www.japan-guide.com/e/e2011_when.html](https://www.japan-guide.com/e/e2011_when.html).

The data is contained within a table on the webpage, meaning the data can be read directly into a pandas dataframe.

In [1]:
import pandas as pd 
pd.set_option('display.max_colwidth', None)

averageBlooming_df = pd.read_html('https://www.japan-guide.com/e/e2011_when.html', header=0)[0]
print(f'Number of cities in table: {averageBlooming_df.shape[0]}')
averageBlooming_df.head()

Number of cities in table: 20


Unnamed: 0,City,Average Opening,Average Full Bloom
0,Sapporo,May 3,May 7
1,Hakodate,April 30,May 4
2,Hirosaki,April 23,April 28
3,Sendai,April 11,April 16
4,Tokyo,March 26,April 3
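The dates in this table are plain strings such as `May 3`. To compare them against the April travel window, they can be converted to real dates. The sketch below uses a small hand-copied sample of the rows above instead of the scraped dataframe, and it assumes the year 2021 (the table itself carries no year). The helper name `to_date` is illustrative.

```python
import pandas as pd

# a few rows copied from the table above, for illustration only
sample = pd.DataFrame({
    'City': ['Sapporo', 'Hirosaki', 'Tokyo'],
    'Average Opening': ['May 3', 'April 23', 'March 26'],
    'Average Full Bloom': ['May 7', 'April 28', 'April 3'],
})

def to_date(series, year=2021):
    '''Parses strings such as "April 23" into timestamps for the assumed year.'''
    return pd.to_datetime(series + f' {year}', format='%B %d %Y')

sample['Opening_Date'] = to_date(sample['Average Opening'])
sample['Full_Bloom_Date'] = to_date(sample['Average Full Bloom'])

# cities whose average full bloom falls in April
april_cities = sample.loc[sample['Full_Bloom_Date'].dt.month == 4, 'City'].tolist()
print(april_cities)
```

The same conversion could be applied to the corresponding columns of `averageBlooming_df`.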


### Cherry Blossoms Forecast across Japan

Next, let's grab the cherry blossom forecast for cities across Japan. Forecast data can be found on [japan-guide.com/sakura](https://www.japan-guide.com/sakura/). Forecast data on this site is provided by the [Japan Weather Association](https://www.jwa.or.jp/english/). 

The forecast data for the cities is spread across five different tables. Each table covers a different region of Japan. To scrape the name of each table (Region), a BeautifulSoup object will be created. [Data for this project was scraped on April 05, 2021]

In [2]:
import os
import re
from bs4 import BeautifulSoup as BS
import requests

def get_html_soup(html_page):
    '''
    Returns a BeautifulSoup object for html_page, caching the raw HTML in a local file
    '''
    # build the cache file name from the URL without its scheme, "www." prefix, or trailing slash
    page_name = re.sub(r'^https?://(www\.)?', '', html_page).rstrip('/')
    html_file = "{}.html".format(page_name.replace('/','-'))
    if os.path.exists(html_file):
        with open(html_file, 'rb') as html:
            html_content = html.read()
    else:
        html_content = requests.get(html_page).text
        with open(html_file, 'wb') as file: # writes as binary file
            file.write(html_content.encode('utf-8'))
    soup = BS(html_content, 'lxml')
    return soup

In [3]:
cherryForecast_soup = get_html_soup("https://www.japan-guide.com/sakura/")
cherryForecast_table_titles = [title.text for title in cherryForecast_soup.select(".season_forecast__region_name")]
cherryForecast_table_titles

['Major Cities',
 'Kyushu, Shikoku and Chugoku',
 'Kansai',
 'Kanto and Chubu',
 'Tohoku and Hokkaido']

As before, pandas will be used to scrape the tables. Because the page contains more than one table, `pd.read_html` returns a list of dataframes.

In [4]:
cherryForecast_data = pd.read_html(str(cherryForecast_soup))
print(f'Number of dataframes within cherryForecast_data list: {len(cherryForecast_data)}')

Number of dataframes within cherryForecast_data list: 5


#### Data Cleaning

The first forecast table, `Major Cities`, will be ignored because it is redundant: its cities also appear in the regional forecast tables.

The corresponding region name will be associated with each city. 

All the regional forecast tables (everything except the `Major Cities` table) will be combined into one dataframe, [`cherryForecast_df`].

The column titles will also be cleaned up a bit. 

In [5]:
cherryForecast_data = pd.read_html(str(cherryForecast_soup))

for itx, df in enumerate(cherryForecast_data):
    df["Region"] = cherryForecast_table_titles[itx]
    
cherryForecast_df = pd.concat(cherryForecast_data[1:]).rename(columns ={'Unnamed: 0': "City", 'Est. Opening*': 'Est_Opening','Est. Best Viewing*': 'Est_Best_Viewing', 'Current State': 'Current_State'})
print(f'The number of cities with forecasts: {cherryForecast_df.shape[0]}.')
cherryForecast_df.head()

The number of cities with forecasts: 23.


Unnamed: 0,City,Est_Opening,Est_Best_Viewing,Current_State,Region
0,Fukuoka,Opened: March 12,March 20 to 28,End of Season,"Kyushu, Shikoku and Chugoku"
1,Kumamoto,Opened: March 17,March 22 to 28,End of Season,"Kyushu, Shikoku and Chugoku"
2,Hiroshima,Opened: March 11,March 24 to April 1,End of Season,"Kyushu, Shikoku and Chugoku"
3,Matsuyama,Opened: March 15,March 24 to 31,End of Season,"Kyushu, Shikoku and Chugoku"
4,Takamatsu,Opened: March 15,March 24 to April 1,End of Season,"Kyushu, Shikoku and Chugoku"
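The `Est_Best_Viewing` column holds ranges such as `March 20 to 28` or `March 24 to April 1`. A minimal sketch of turning such a string into a start and end date is shown below. The function name `parse_viewing_window` and the assumed year 2021 are illustrative, and only the two formats visible in the table above are handled.

```python
import re
from datetime import date, datetime

def parse_viewing_window(text, year=2021):
    '''
    Parses "March 20 to 28" or "March 24 to April 1" into (start, end) dates.
    Returns None when the text does not match either pattern.
    '''
    match = re.match(r'([A-Za-z]+) (\d+) to (?:([A-Za-z]+) )?(\d+)$', text.strip())
    if match is None:
        return None
    start_month, start_day, end_month, end_day = match.groups()
    end_month = end_month or start_month  # same month when none is given
    start = datetime.strptime(f'{start_month} {start_day} {year}', '%B %d %Y').date()
    end = datetime.strptime(f'{end_month} {end_day} {year}', '%B %d %Y').date()
    return start, end

print(parse_viewing_window('March 20 to 28'))
print(parse_viewing_window('March 24 to April 1'))
```

Applied to the dataframe, this could look like `cherryForecast_df['Est_Best_Viewing'].apply(parse_viewing_window)`.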


### Cherry Blossom Viewing Spots

Suggestions for the best cherry blossom viewing spots can be found at [https://www.japan-guide.com/e/e2011_where.html](https://www.japan-guide.com/e/e2011_where.html).

The information for the different cherry blossom viewing spots will be scraped using a BeautifulSoup object. The information that will be obtained for each spot is:

* City
* Viewing Spot
* Description
* Ratings
* Ratings Description

In [6]:
import re

viewing_soup = get_html_soup("https://www.japan-guide.com/e/e2011_where.html")
viewing_cities = [city.text for city in viewing_soup.select('h3')]
viewing_spots = []

for city, section in zip(viewing_cities, viewing_soup.select('section.spot_list')):
    spot_descriptions = iter(section.select('.spot_list__spot__desc'))
    for spot in section.select('.spot_list__spot__name'):
        cell = {}
        cell['City'] = city
        cell['Viewing_Spot'] = re.sub(r'[^\w\s]+', '', spot.text) # removes punctuation, including the rating dots after the name
        cell['Description'] = next(spot_descriptions).text.replace('\n','').replace('\r','')
        if spot.find('span', class_='dot_rating has-tooltip'):
            cell['Rating'] = spot.find('span', class_='dot_rating has-tooltip')['data-dots']
            cell['Rating_Description'] = spot.find('span', class_='dot_rating has-tooltip')['data-tooltip-label']
        else:
            cell['Rating'] = ''
            cell['Rating_Description'] = 'No Rating'
        viewing_spots.append(cell)
        
viewingSpot_df = pd.DataFrame(viewing_spots)
print(f'Number of "Best" viewing spots across Japan : {viewingSpot_df.shape[0]}.')
viewingSpot_df.head()

Number of "Best" viewing spots across Japan : 48.


Unnamed: 0,City,Viewing_Spot,Description,Rating,Rating_Description
0,Tokyo,Shinjuku Gyoen,"Shinjuku Gyoen features more than one thousand cherry trees of over a dozen varieties, including numerous early and late blooming trees. There are spacious lawn areas, and the atmosphere is calm and peaceful. Thanks to the early and late blooming trees, Shinjuku Gyoen is a good hanami destination for those who miss the main season by a week or two. Alcoholic drinks are prohibited.",3,Best of Japan
1,Tokyo,Ueno Park,"One of Japan's most crowded, lively and popular spots for cherry blossom parties, Ueno Park features more than 1000 trees along the street leading towards the National Museum and around Shinobazu Pond. Ueno Park's blossoms typically open a couple of days ahead of the blossoms in many other spots in the city.",2,Highly Recommended
2,Tokyo,Chidorigafuchi,"Hundreds of cherry trees decorate the moats of former Edo Castle around Kitanomaru Park, creating one of Tokyo's most outstanding cherry blossom sights. Boats are available for rent, but picnics are not allowed. Trees are lit up in the evenings. Many food stands can be found at nearby Yasukuni Shrine where another thousand cherry trees are planted.",2,Highly Recommended
3,Tokyo,Sumida Park,The park stretches for a few hundred meters along both sides of Sumida River with views of the Tokyo Sky Tree. Some food stands are available. Cherry blossom viewing can also be enjoyed from boats that cruise the river. Trees are lit up in the evenings.,1,Recommended
4,Yokohama,Sankeien Garden,"Sankeien is a Japanese landscape garden with a central pond, various historical buildings moved there from across the country, and several hundred cherry trees.",1,Recommended


#### Obtaining coordinates for Viewing Spots

We will use the Google Places API to look up the coordinates of each viewing spot by name (geocoding). The coordinates determine the prefecture, and thus the region, in which each viewing spot lies. The coordinates are saved to a local CSV file to avoid repeated calls to the Google API.

In [7]:
import csv
import creds
import numpy as np

csvName = 'viewingspots_coords.csv'
def get_coords(name, details):
    '''
    Looks up a place by name with the Google Places "Find Place" endpoint.
    Returns (lat, lon), or (nan, nan) when no candidate is found.
    '''
    url = 'https://maps.googleapis.com/maps/api/place/findplacefromtext/json'
    # try the name qualified with the country first, then the bare name
    for query in (f'{name}, Japan', name):
        print(f'Making call to googleapi for {query}....')
        params = {'key': creds.GoogleKey, 'input': query, 'inputtype': 'textquery', 'fields': details}
        response = requests.get(url, params=params).json() # params are URL-encoded by requests
        if response['status'] == 'OK':
            print(f'-----{query} Valid location found...')
            location = response['candidates'][0]['geometry']['location']
            return location['lat'], location['lng']
        print(f'-----{query} No valid location found...')
    return np.nan, np.nan

if not os.path.exists(csvName): # checks to see if local csv with coordinates is already available in local directory
    return_details = 'geometry'
    coordinates_list = []
    for viewingSpot in viewingSpot_df['Viewing_Spot']:
        lat, lon = get_coords(viewingSpot, return_details)
        coordinates_list.append([viewingSpot, lat, lon])
        
    cols = ['Viewing_Spot','Latitude','Longitude']    
    viewingSpot_coords = pd.DataFrame(coordinates_list,columns = cols)
    
    # saves the coordinates to local csvfile to stop future calls to Google
    viewingSpot_coords.to_csv(csvName,index=False)
    
viewingSpot_coords = pd.read_csv(csvName)
viewingSpot_coords.head()

Unnamed: 0,Viewing_Spot,Latitude,Longitude
0,Shinjuku Gyoen,35.685176,139.710052
1,Ueno Park,35.71548,139.774145
2,Chidorigafuchi,35.690159,139.74854
3,Sumida Park,35.712487,139.803965
4,Sankeien Garden,35.417051,139.658827
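Because the lookup is a free-text search, a name can occasionally match a place outside Japan or return nothing at all. A quick sanity check is to flag coordinates that fall outside a rough bounding box around Japan. The bounds below are an approximate assumption, and the third sample row is a made-up bad match for illustration.

```python
# Assumption: a deliberately generous bounding box around Japan
JAPAN_BOUNDS = {'lat': (20.0, 46.0), 'lon': (122.0, 154.0)}

def in_japan(lat, lon, bounds=JAPAN_BOUNDS):
    '''Returns True when (lat, lon) lies inside the rough bounding box.'''
    lat_min, lat_max = bounds['lat']
    lon_min, lon_max = bounds['lon']
    # comparisons with NaN are False, so missing coordinates are flagged too
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max

rows = [
    ('Shinjuku Gyoen', 35.685176, 139.710052),
    ('Ueno Park', 35.71548, 139.774145),
    ('Hypothetical bad match', 43.6532, -79.3832),  # illustrative only
]
suspect = [name for name, lat, lon in rows if not in_japan(lat, lon)]
print(suspect)
```

Any spot flagged this way would be worth checking by hand before the coordinates are used further.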


#### Finding the prefecture and region for each of the locations within `viewingSpot_df`. 

We will now use the coordinates to find the prefecture and region in which each viewing spot lies. This will be accomplished with a geojson file found [here](https://dl.dropboxusercontent.com/s/luj2iy5przp90k5/jp_prefs.geojson) as well as a table obtained from [wikipedia](https://simple.wikipedia.org/wiki/Prefectures_of_Japan) which lists each prefecture with its corresponding region. All accents must be stripped from the wikipedia table entries so that the prefecture names match those in the geojson file. The following function accomplishes this.

In [8]:
import unicodedata

def strip_accents(text_list):
    '''
    Returns a copy of the provided list with accents and non-letter characters removed
    '''
    lst = []
    for txt in text_list:
        # decompose accented characters, then drop the non-ASCII combining marks
        text = unicodedata.normalize('NFD', txt)\
               .encode('ascii', 'ignore')\
               .decode("utf-8")
        
        text = re.sub(r'[\W0-9]+','', text) # removes whitespace, punctuation and digits
        lst.append(text)
    
    return lst
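To make the effect of the normalization concrete, here is a small standalone sketch of the same steps applied to single strings. The helper name `strip_accents_one` and the sample strings (including the footnote-style `[1]` suffix) are illustrative and not taken from the scraped table.

```python
import re
import unicodedata

def strip_accents_one(text):
    '''Illustrative single-string version of the accent stripping steps.'''
    # NFD splits "ō" into "o" plus a combining macron
    decomposed = unicodedata.normalize('NFD', text)
    # encoding to ASCII with errors ignored drops the combining mark
    ascii_only = decomposed.encode('ascii', 'ignore').decode('utf-8')
    # remove whitespace, punctuation and digits
    return re.sub(r'[\W0-9]+', '', ascii_only)

examples = ['Hokkaidō', 'Tōhoku', 'Kyūshū', 'Ōsaka', 'Tokyo [1]']
cleaned = [strip_accents_one(text) for text in examples]
print(cleaned)
```

Note that the last step also removes spaces, so a multi-word entry would be collapsed into a single word.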

The next step will be to pull in the table from wikipedia with the prefecture and region data of Japan. The data is cleaned and placed into a dict, `japan_dict`.

In [9]:
japan_pref=pd.read_html('https://simple.wikipedia.org/wiki/Prefectures_of_Japan')[1]
japan_pref=japan_pref[['Prefecture','Region']]
pref = strip_accents(japan_pref['Prefecture'])
region = strip_accents(japan_pref['Region'])

japan_dict = dict(zip(pref,region))

Using geospatial information obtained from [here](https://dl.dropboxusercontent.com/s/luj2iy5przp90k5/jp_prefs.geojson) and the coordinates of each of the viewing spots, the prefecture and region for each viewing spot will be obtained. The coordinates will be converted into `Point` objects and placed within `viewingSpot_coords` (now a geopandas dataframe). The `Point` objects will be used to find the corresponding prefecture from the geopandas dataframe of the geojson file, `pref_gdf`.

Finally, the `viewingSpot_df` will be updated to now have columns for Prefecture, Region, Latitude, and Longitude for each of the locations in the dataframe.

In [10]:
import geopandas as gpd
from shapely.geometry import Point

pref_gdf = gpd.read_file('jp_prefs.geojson')
geometry = gpd.points_from_xy(viewingSpot_coords.Longitude, viewingSpot_coords.Latitude)
viewingSpot_coords = gpd.GeoDataFrame(viewingSpot_coords, geometry=geometry, crs= 4326) 
pref_col = []
region_col = []

for ispot, row in viewingSpot_coords.iterrows():
    point = row.geometry
    try:
        row_pref = pref_gdf[pref_gdf.contains(point)].name.tolist()[0]
        row_region = japan_dict[row_pref]
    except (IndexError, KeyError): # no containing prefecture, or prefecture missing from japan_dict
        print(f'No prefecture or region data found for:\n{row}\n')
        row_pref, row_region = np.nan, np.nan
    # append once per spot so both columns stay aligned with viewingSpot_df
    pref_col.append(row_pref)
    region_col.append(row_region)

viewingSpot_df['Prefecture'] = pref_col
viewingSpot_df['Region'] = region_col
viewingSpot_df['Latitude'] = viewingSpot_coords['Latitude']
viewingSpot_df['Longitude'] = viewingSpot_coords['Longitude']

col_order = ['Viewing_Spot', 'City', 'Prefecture','Region', 'Latitude','Longitude','Description','Rating','Rating_Description']
viewingSpot_df = viewingSpot_df[col_order]

viewingSpot_df.head()

No prefecture or region data found for:
Viewing_Spot              Hiroshima Peace Park
Latitude                             34.392586
Longitude                           132.452306
geometry        POINT (132.4523059 34.3925863)
Name: 33, dtype: object



Unnamed: 0,Viewing_Spot,City,Prefecture,Region,Latitude,Longitude,Description,Rating,Rating_Description
0,Shinjuku Gyoen,Tokyo,Tokyo,Kanto,35.685176,139.710052,"Shinjuku Gyoen features more than one thousand cherry trees of over a dozen varieties, including numerous early and late blooming trees. There are spacious lawn areas, and the atmosphere is calm and peaceful. Thanks to the early and late blooming trees, Shinjuku Gyoen is a good hanami destination for those who miss the main season by a week or two. Alcoholic drinks are prohibited.",3,Best of Japan
1,Ueno Park,Tokyo,Tokyo,Kanto,35.71548,139.774145,"One of Japan's most crowded, lively and popular spots for cherry blossom parties, Ueno Park features more than 1000 trees along the street leading towards the National Museum and around Shinobazu Pond. Ueno Park's blossoms typically open a couple of days ahead of the blossoms in many other spots in the city.",2,Highly Recommended
2,Chidorigafuchi,Tokyo,Tokyo,Kanto,35.690159,139.74854,"Hundreds of cherry trees decorate the moats of former Edo Castle around Kitanomaru Park, creating one of Tokyo's most outstanding cherry blossom sights. Boats are available for rent, but picnics are not allowed. Trees are lit up in the evenings. Many food stands can be found at nearby Yasukuni Shrine where another thousand cherry trees are planted.",2,Highly Recommended
3,Sumida Park,Tokyo,Tokyo,Kanto,35.712487,139.803965,The park stretches for a few hundred meters along both sides of Sumida River with views of the Tokyo Sky Tree. Some food stands are available. Cherry blossom viewing can also be enjoyed from boats that cruise the river. Trees are lit up in the evenings.,1,Recommended
4,Sankeien Garden,Yokohama,Kanagawa,Kanto,35.417051,139.658827,"Sankeien is a Japanese landscape garden with a central pond, various historical buildings moved there from across the country, and several hundred cherry trees.",1,Recommended


Hiroshima Peace Park was not found to be within any of the prefectures of Japan using the geojson file, but it is evident that it is located within Hiroshima Prefecture. We can update the row for Hiroshima Peace Park (index = 33) with the correct prefecture and region (using `japan_dict`) to have a complete dataset.

In [11]:
viewingSpot_df.loc[33, ['Prefecture', 'Region']] = ['Hiroshima', japan_dict['Hiroshima']]

Now, we have three datasets that will be used during the Analysis section: `averageBlooming_df`, `cherryForecast_df`, and `viewingSpot_df`. Foursquare data will be obtained in the Analysis section as well.

## Methodology <a name="methodology"></a>

In this project, we hope to find a city or cities that will satisfy a person's need to enjoy the beauty of cherry blossoms and a bowl of delicious ramen, all within easy reach by train, or by ferry and then train.

The first step was to obtain the data on cherry blossom forecasts, recommended Hanami spots, and average blooming dates.

The next step will be to disregard cities that have already passed the peak of their cherry blossom bloom. That is, we want to find the cities [`potentialCities`] whose blossoms have yet to open.

The next step will be to identify whether there are any recommended Hanami spots within `viewingSpot_df` for any of the cities in `potentialCities`.

The following step will be to use **Foursquare** to find the proximity of both ramen shops and train stations within each of the cities in `potentialCities`.
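One way to quantify proximity is the great-circle (haversine) distance between two coordinate pairs. This is a sketch of that idea only. The text above does not specify how proximity is measured, so the choice of metric, the function name `haversine_km`, and the mean Earth radius of 6371 km are assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    '''Great-circle distance in kilometres between two (lat, lon) points in degrees.'''
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# distance between two viewing spots from the coordinates table: Ueno Park and Sumida Park
distance = haversine_km(35.71548, 139.774145, 35.712487, 139.803965)
print(f'{distance:.2f} km')
```

The same function could measure the distance from a viewing spot to each ramen shop or train station returned by Foursquare.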

Finally, ...



In [None]:
# import matplotlib.pyplot as plt
# data = pref_gdf
# fig, axes = plt.subplots(figsize=(10,8))
# data.plot(ax = axes, edgecolor = "black")
# gdf.plot(ax=axes, marker='o', color='r', markersize=5)

In [None]:
import folium # map rendering library
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

address = 'Tokyo, Japan'
geolocator = Nominatim(user_agent="on_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Tokyo are {}, {}.'.format(latitude, longitude))

In [None]:
# create map centered on Tokyo using latitude and longitude values
viewing_map = folium.Map(location=[latitude, longitude], zoom_start=15) # avoids shadowing the built-in map()
count=0
# add a marker to the map for each viewing spot
for lat, lng, label in zip(viewingSpot_coords['Latitude'], viewingSpot_coords['Longitude'], viewingSpot_coords['Viewing_Spot']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(viewing_map)  
    count+=1
print("Number of viewing spot markers: {}".format(count))
viewing_map

## Analysis  <a name="analysis"></a>

Let's begin by weeding out the cities that have already passed their peak in the Hanami season. The target time range for travel is mid to late April. The `cherryForecast_df` has a column which gives the current state of the cherry blossoms in each of the cities. 

In [None]:
print(f"List of cherry blossoms blooming states: \n {cherryForecast_df['Current_State'].unique()}")

We will keep only the cities whose blossoms are `Not Open Yet`. This is because blossoms last only one to two weeks and the target time range is mid to late April.

This reveals that the northern regions of Japan, Tohoku and Hokkaido, will be of interest for Hanami during mid to late April. This is expected because the cherry blossom front travels northward as the season progresses.

In [None]:
potentialCities = cherryForecast_df.loc[cherryForecast_df['Current_State'] == 'Not Open Yet']
potentialCities

Next, we need to determine which of the recommended viewing spots are located in either the Tohoku or Hokkaido regions.
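A minimal sketch of this filter is shown below on a small illustrative dataframe, not the scraped `viewingSpot_df`. It assumes the `Region` values are the accent-stripped names produced earlier (for example `Tohoku` and `Hokkaido`), and it derives the two region names from the forecast table title `Tohoku and Hokkaido`. The sample rows are for demonstration only.

```python
import pandas as pd

# illustrative rows only; the real data lives in viewingSpot_df
spots = pd.DataFrame({
    'Viewing_Spot': ['Ueno Park', 'Hirosaki Castle', 'Goryokaku Fort', 'Maruyama Park'],
    'Region': ['Kanto', 'Tohoku', 'Hokkaido', 'Kansai'],
})

# the forecast table title names both regions of interest
target_regions = 'Tohoku and Hokkaido'.split(' and ')

# keep only the spots whose region is one of the targets
northern_spots = spots[spots['Region'].isin(target_regions)]
print(northern_spots)
```

The same `isin` filter could be applied to `viewingSpot_df['Region']` to list the candidate viewing spots.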

## Results and Discussion  <a name="results"></a>

## Conclusion  <a name="conclusion"></a>