# IBM:DSP Capstone Project

## Territory Market Development

### Introduction / Business Problem

   
A new technical sales representative for a winery products company has been assigned a new territory: she is to contact and develop sales relationships with wineries in the Pacific Northwestern United States. The territory includes both Washington State and Oregon State. It covers a very large area, and much of that area contains no viticultural areas or vineyards. How can she best expand her territory to include new clients?

### Data  

We will use an existing winery list from data world as a stand-in for an actual client list. We can gather this list with a web scrape, an SQL query, or pandas. Then we'll use the Foursquare API to build a second list of wineries, geocode the locations with geopy, and overlay them on a map. Finally, we'll separate the territory into regions using k-means clustering.
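The plan above calls for a second winery list from Foursquare and, implicitly, a comparison with the client list to find prospects. The sketch below is a minimal version under stated assumptions: it assumes the Foursquare v2 `venues/search` response shape used in the IBM course labs (`response` → `venues`, each venue with `name` and `location.lat`/`location.lng`), which may not match newer Foursquare APIs. The helper names `venues_to_frame` and `find_prospects`, and the sample payload, are hypothetical.

```python
import pandas as pd


def venues_to_frame(payload: dict) -> pd.DataFrame:
    """Flatten a venues/search payload (assumed Foursquare v2 shape) into Winery/lat/long."""
    venues = payload.get("response", {}).get("venues", [])
    if not venues:
        return pd.DataFrame(columns=["Winery", "lat", "long"])
    flat = pd.json_normalize(venues)  # nested keys become 'location.lat', 'location.lng'
    flat = flat.rename(columns={"name": "Winery", "location.lat": "lat", "location.lng": "long"})
    return flat[["Winery", "lat", "long"]]


def find_prospects(found: pd.DataFrame, clients: pd.DataFrame) -> pd.DataFrame:
    """Return the wineries in `found` whose names are not already on the client list."""
    def normalize(names: pd.Series) -> pd.Series:
        return names.str.lower().str.strip()  # ignore case and stray spaces

    known = set(normalize(clients["Winery"]))
    return found[~normalize(found["Winery"]).isin(known)].reset_index(drop=True)


# Hypothetical payload and client list; the names and coordinates are made up.
sample = {
    "response": {
        "venues": [
            {"name": "Example Estate", "location": {"lat": 45.5, "lng": -123.0}},
            {"name": "Sample Cellars", "location": {"lat": 46.0, "lng": -119.0}},
        ]
    }
}
clients = pd.DataFrame({"Winery": ["example estate ", "Placeholder Winery"], "State": ["OR", "WA"]})

prospects = find_prospects(venues_to_frame(sample), clients)
print(prospects)  # one row: Sample Cellars
```

In the notebook, `clients` would be the geocoded `df` and the payload would come from the Foursquare request (for example `requests.get(url).json()`). Matching on normalized names is a deliberately simple choice; names that differ slightly ("Abeja" vs. "Abeja Winery") would need fuzzier matching or a coordinate-distance check.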

#### Data World Data Set

The first data set resides at <a href="https://data.world/arthur/wineries">data world</a>. I first created an account and downloaded the file as an Excel sheet, which was easy enough. After looking through the files and the data, I downloaded the file for the <a href="https://query.data.world/s/nib6nc7kfdk7vhbaypnipvdogzcqz2">USA</a>, since we are examining the Pacific Northwest territory.

Now that we have downloaded the file from the website, let's look at an alternative: loading it directly with pandas.

In [1]:
import pandas as pd
df = pd.read_csv('https://query.data.world/s/r4ahdp3vbrclyyim5siydrszdc6rrx')
df = df.drop(columns=['Unnamed: 3', 'Web Site'])  # Drop the unnamed and web site columns
df['long'] = ""  # Add an empty longitude column
df['lat'] = ""   # Add an empty latitude column
df = df.rename(columns={'Winery Name': 'Winery'})  # Shorter column name
df.head()

| | Winery | State | long | lat |
|---|---|---|---|---|
| 0 | 14 Hands | WA | | |
| 1 | Abacela Vineyards & Winery | OR | | |
| 2 | Abarbanel Wine Co. | NY | | |
| 3 | Abbott Winery | CA | | |
| 4 | Abeja | WA | | |


In [2]:
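# Keep only rows whose State is WA or OR
states = ['WA', 'OR']
df = df[df.State.isin(states)].copy()  # copy() avoids SettingWithCopyWarning when we fill coordinates later
df.reset_index(drop=True)  # Displays a fresh 0..n-1 index; df itself keeps its original labels (0, 1, 4, 8, ...)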

| | Winery | State | long | lat |
|---|---|---|---|---|
| 0 | 14 Hands | WA | | |
| 1 | Abacela Vineyards & Winery | OR | | |
| 2 | Abeja | WA | | |
| 3 | Academy Wines | OR | | |
| 4 | Acme Wineworks | OR | | |
| ... | ... | ... | ... | ... |
| 414 | Yamhill Valley Vineyards | OR | | |
| 415 | Yellow Hawk Cellar | WA | | |
| 416 | Youngberg Hill Vineyards | OR | | |
| 417 | Zefina Winery | WA | | |
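Filtering with `isin` keeps the original row labels, and `reset_index(drop=True)` returns a new frame rather than changing the one it is called on. A positional loop such as `for x in range(len(df))` with `df.Winery[x]` therefore does label lookups against an index with gaps and raises `KeyError` for the missing labels, which matches the bare numbers (2, 3, 5, ...) in the geocoding log below. A small sketch with a made-up frame:

```python
import pandas as pd

wineries = pd.DataFrame({
    "Winery": ["A", "B", "C", "D"],
    "State": ["WA", "NY", "OR", "CA"],
})

pnw = wineries[wineries.State.isin(["WA", "OR"])]
print(list(pnw.index))  # → [0, 2]: the original labels survive the filter

pnw.reset_index(drop=True)  # returns a new frame; pnw itself is unchanged
print(list(pnw.index))  # → [0, 2]

try:
    pnw.Winery[1]  # label-based lookup; label 1 ("B") was filtered out
except KeyError as err:
    print("KeyError:", err)  # → KeyError: 1

pnw = pnw.reset_index(drop=True)  # assign the result to keep it
print(list(pnw.index))  # → [0, 1]
print(pnw.Winery[1])  # → C
```

Iterating with `df.iterrows()` or `df.items()` on a column avoids the problem entirely, because each row arrives with its own label.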


In [3]:
# Install geopy
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim  # converts an address into latitude and longitude values

# Test the geocoder on a single winery name before looping over the whole list.
address = '14 Hands'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

-14.694524099999999 -75.11405389565218


#### The geocoder returns coordinates, so let's loop over the list and add them to our data frame.

There are some issues to consider here. The test above placed "14 Hands" at latitude -14.69, longitude -75.11, which is in South America, so a bare winery name is not always enough for the geocoder. A real client list would likely be more complete, with actual addresses. This is only a list of winery names: some of them are not registered with the geocoding service, and others may match more than one location. We could demonstrate more data wrangling techniques or perform a table scrape here, but since this is an artificial client list, let's move on.
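One way to give the geocoder more context is to send the winery name together with its state and country, and to treat a `None` result as a miss instead of letting it raise. A minimal sketch, assuming the query format `"<Winery>, <State>, USA"` (a guess, not a tested format); `geocode_frame` is a hypothetical helper, and the geocoder is passed in as a function so the sketch runs without network access.

```python
from types import SimpleNamespace
from typing import Callable, Optional

import pandas as pd


def geocode_frame(df: pd.DataFrame, geocode: Callable[[str], Optional[object]]) -> pd.DataFrame:
    """Return a copy of df with 'lat'/'long' filled wherever geocode finds a match.

    geocode takes a query string and returns an object with .latitude and
    .longitude, or None when nothing matches (as geopy's Nominatim.geocode does).
    """
    out = df.copy()
    out["lat"] = float("nan")
    out["long"] = float("nan")
    for idx, row in df.iterrows():  # idx is the row label, so index gaps are fine
        query = f"{row.Winery}, {row.State}, USA"  # assumed query format
        result = geocode(query)
        if result is None:
            print("No match:", query)
            continue
        out.at[idx, "lat"] = result.latitude
        out.at[idx, "long"] = result.longitude
    return out


# Offline stand-in for the geocoder; the name and coordinates are made up.
fake_results = {"Example Estate, OR, USA": SimpleNamespace(latitude=45.0, longitude=-123.0)}
demo = pd.DataFrame(
    {"Winery": ["Example Estate", "Unknown Cellars"], "State": ["OR", "WA"]},
    index=[0, 4],  # gap in the index, like the filtered winery list
)
geocoded = geocode_frame(demo, fake_results.get)
print(geocoded)
```

In the notebook, the real geocoder could be passed as `RateLimiter(geolocator.geocode, min_delay_seconds=1)` (from `geopy.extra.rate_limiter`), which keeps requests within Nominatim's one-request-per-second usage policy.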

In [4]:
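import time

for idx, name in df['Winery'].items():  # idx is the row label, so gaps left by the State filter are fine
    try:
        time.sleep(1)  # Nominatim's usage policy allows at most one request per second
        geocode_result = geolocator.geocode(name)
        if geocode_result is None:  # No match for this name
            print("No match found for", name)
            continue
        df.at[idx, 'lat'] = geocode_result.latitude
        df.at[idx, 'long'] = geocode_result.longitude
    except Exception as e:
        print("Unexpected error occurred for", name, e)
df.head()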

Unexpected error occured. 'NoneType' object has no attribute 'latitude'
Unexpected error occured. 2
Unexpected error occured. 3
Unexpected error occured. 5
Unexpected error occured. 6
Unexpected error occured. 7
Unexpected error occured. 9
Unexpected error occured. 10
Unexpected error occured. 11
Unexpected error occured. 'NoneType' object has no attribute 'latitude'
Unexpected error occured. 13
Unexpected error occured. 14
Unexpected error occured. 15
Unexpected error occured. 16
Unexpected error occured. 17
Unexpected error occured. 'NoneType' object has no attribute 'latitude'
Unexpected error occured. 19
Unexpected error occured. 'NoneType' object has no attribute 'latitude'
Unexpected error occured. 21
Unexpected error occured. 22
Unexpected error occured. 23
Unexpected error occured. 'NoneType' object has no attribute 'latitude'
Unexpected error occured. 25
Unexpected error occured. 26
Unexpected error occured. 27
Unexpected error occured. 28
Unexpected error occured. 29
...

Unexpected error occured. 'NoneType' object has no attribute 'latitude'
Unexpected error occured. 262
Unexpected error occured. 263
Unexpected error occured. 264
Unexpected error occured. 265
Unexpected error occured. 266
Unexpected error occured. 267
Unexpected error occured. 268
Unexpected error occured. 269
Unexpected error occured. 'NoneType' object has no attribute 'latitude'
Unexpected error occured. 271
Unexpected error occured. 272
Unexpected error occured. 273
Unexpected error occured. 274
Unexpected error occured. 275
Unexpected error occured. 276
Unexpected error occured. 277
Unexpected error occured. 278
Unexpected error occured. 279
Unexpected error occured. 280
Unexpected error occured. 281
Unexpected error occured. 282
Unexpected error occured. 283
Unexpected error occured. 284
Unexpected error occured. 285
Unexpected error occured. 'NoneType' object has no attribute 'latitude'
Unexpected error occured. 287
Unexpected error occured. 288
Unexpected error occured. 289
...

| | Winery | State | long | lat |
|---|---|---|---|---|
| 0 | 14 Hands | WA | -75.1141 | -14.6945 |
| 1 | Abacela Vineyards & Winery | OR | | |
| 4 | Abeja | WA | 32.8167 | 1.76667 |
| 8 | Academy Wines | OR | -74.2499 | 40.745 |
| 12 | Acme Wineworks | OR | | |


OK, we got some coordinates but not all of them; that's fine, so I'll keep going. Remember to move any API credentials into a hidden config file before sharing the notebook, so you don't show them to everyone. The empty values caused problems later on, so I came back here to drop them.
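One common way to keep credentials out of a shared notebook is a small INI file that is listed in `.gitignore` and read with Python's standard `configparser`. A minimal sketch; the file name `config.ini`, the `[foursquare]` section, and the key names are hypothetical.

```python
import configparser
import os
import tempfile
from typing import Tuple


def load_foursquare_credentials(path: str) -> Tuple[str, str]:
    """Read client_id and client_secret from the [foursquare] section of an INI file."""
    config = configparser.ConfigParser()
    if not config.read(path):  # read() returns the list of files it parsed
        raise FileNotFoundError(path)
    section = config["foursquare"]
    return section["client_id"], section["client_secret"]


# Demo: write a throwaway config file with placeholder values, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.ini")
    with open(path, "w") as fh:
        fh.write("[foursquare]\nclient_id = YOUR_ID\nclient_secret = YOUR_SECRET\n")
    client_id, client_secret = load_foursquare_credentials(path)

print(client_id)  # prints YOUR_ID; the secret is deliberately not printed
```

Adding `config.ini` to `.gitignore` keeps the real values out of the repository, while the notebook only ever shows the call to `load_foursquare_credentials`.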

In [18]:
# Drop rows without coordinates so they don't cause problems later.
nan_value = float("NaN")
df.replace("", nan_value, inplace=True)  # Treat empty strings as missing values
df.dropna(axis=0, how='any', inplace=True)  # Drop every row that has any missing value
df = df.astype({'lat': float, 'long': float})  # Store the coordinates as numbers
print(df)

                       Winery State        long        lat
0                    14 Hands    WA  -75.114054 -14.694524
4                       Abeja    WA   32.816667   1.766667
8               Academy Wines    OR  -74.249891  40.744971
53              Amavi Cellars    WA -122.141754  47.733092
60             Amity Vineyard    OR -123.174372  45.117032
82                    Animale    WA  -43.110436 -22.907309
90               Antica Terra    OR -123.174372  45.117032
103            Archery Summit    OR -123.047919  45.257348
122         Ashland Vineyards    OR -122.633782  42.179024
146  Badger Mountain Vineyard    WA -119.339468  46.224048
147               Baer Winery    WA -122.152370  47.769629
151  Bainbridge Island Winery    WA -122.518522  47.624707
174           Barnard Griffin    WA  153.017488 -27.280558
197      Bear Creek Vineyards    OR  -65.637609  44.580333
200              Beaux Freres    OR    2.275383  48.818054
229          Benson Vineyards    WA  139.776598 -37.0507
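Several of these coordinates are nowhere near Washington or Oregon: Barnard Griffin lands at latitude -27.3, longitude 153.0, and Beaux Freres at latitude 48.8, longitude 2.3. A quick sanity check is to keep only points inside a rough bounding box around the two states. The box below (latitude 41.9 to 49.1, longitude -124.9 to -116.4) is an approximation chosen for illustration, not an official boundary, so it will also admit points just over the Idaho, California, or British Columbia lines. `in_pnw` is a hypothetical helper.

```python
import pandas as pd

# Approximate box around Washington and Oregon (assumed values, not official borders).
PNW_BOUNDS = {"lat_min": 41.9, "lat_max": 49.1, "long_min": -124.9, "long_max": -116.4}


def in_pnw(df: pd.DataFrame, bounds: dict = PNW_BOUNDS) -> pd.Series:
    """Boolean mask: True where the row's lat/long fall inside the bounding box."""
    lat = pd.to_numeric(df["lat"], errors="coerce")  # non-numeric values become NaN
    lon = pd.to_numeric(df["long"], errors="coerce")
    return (
        lat.between(bounds["lat_min"], bounds["lat_max"])  # NaN compares as False
        & lon.between(bounds["long_min"], bounds["long_max"])
    )


# A few rows copied from the output above.
sample = pd.DataFrame(
    {
        "Winery": ["14 Hands", "Amavi Cellars", "Barnard Griffin", "Ashland Vineyards"],
        "State": ["WA", "WA", "WA", "OR"],
        "long": [-75.114054, -122.141754, 153.017488, -122.633782],
        "lat": [-14.694524, 47.733092, -27.280558, 42.179024],
    },
    index=[0, 53, 174, 122],
)

mask = in_pnw(sample)
print(sample[mask])   # rows that look plausible
print(sample[~mask])  # rows to re-geocode or drop
```

In the notebook this would be `df = df[in_pnw(df)]` after the `dropna` cell. A point inside the box can still be wrong: Amity Vineyard and Antica Terra share identical coordinates above, which suggests both matched the same place name rather than the wineries themselves.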

#### Ok, let's see it on the map. 

In [19]:
# Libraries for displaying images
from IPython.display import Image, HTML

import matplotlib.pyplot as plt  # plotting library
# Backend for rendering plots within the browser
%matplotlib inline

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs  # sklearn.datasets.samples_generator was removed in scikit-learn 0.24

# Library for transforming a JSON file into a pandas dataframe
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0

!conda install -c conda-forge folium=0.5.0 --yes
import folium  # map plotting library

address = 'Portland OR, USA'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

45.5202471 -122.6741949


In [7]:
# Create a map using folium
map_PNW = folium.Map(
    location=[location.latitude, location.longitude],
    zoom_start=6)
map_PNW

Now let's put the wineries on the map.


In [21]:
df.head()

| | Winery | State | long | lat |
|---|---|---|---|---|
| 0 | 14 Hands | WA | -75.114054 | -14.694524 |
| 4 | Abeja | WA | 32.816667 | 1.766667 |
| 8 | Academy Wines | OR | -74.249891 | 40.744971 |
| 53 | Amavi Cellars | WA | -122.141754 | 47.733092 |
| 60 | Amity Vineyard | OR | -123.174372 | 45.117032 |


In [22]:
# Add one circle marker per winery, with the winery name as a popup
for _, row in df.iterrows():
    label = folium.Popup(str(row.Winery), parse_html=True)
    folium.CircleMarker(
        [row.lat, row.long],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_PNW)
map_PNW

In [None]:
# Next step: separate the wineries into regions with k-means clustering
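The Data section plans to separate regions with k-means clustering. The notebook already imports `sklearn.cluster.KMeans`, where `KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(df[['lat', 'long']])` returns one cluster label per winery. To show what that call does, here is a minimal NumPy sketch of Lloyd's algorithm (the standard k-means iteration, without scikit-learn's k-means++ initialization). The choice of k and the fixed seed are arbitrary, and clustering raw latitude/longitude treats a degree of longitude as the same distance as a degree of latitude, which is only a rough approximation at 45°N.

```python
import numpy as np


def kmeans(points: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: return (labels, centroids) for an (n, 2) array of points."""
    rng = np.random.default_rng(seed)
    # Start from k distinct points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it in place if it has none.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # assignments have stopped changing
            break
        centroids = new_centroids
    return labels, centroids


# (lat, long) of a few wineries from the output above.
coords = np.array([
    [47.733092, -122.141754],  # Amavi Cellars
    [47.769629, -122.152370],  # Baer Winery
    [47.624707, -122.518522],  # Bainbridge Island Winery
    [45.117032, -123.174372],  # Amity Vineyard
    [45.257348, -123.047919],  # Archery Summit
    [42.179024, -122.633782],  # Ashland Vineyards
    [46.224048, -119.339468],  # Badger Mountain Vineyard
])
labels, centroids = kmeans(coords, k=3)
print(labels)
print(centroids)
```

Because the result depends on the starting centroids, scikit-learn's `n_init` runs several starts and keeps the one with the lowest within-cluster sum of squares; a single-start sketch like this one can land in a worse local optimum.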

### Methodology     

   * This is the main component of the report, where I describe the exploratory data analysis, any inferential statistical testing, and which machine learning methods were used and why.


### Results

   * This is where I discuss the results.

### Discussion

   * This is where I discuss any observations I noted and any recommendations I can make based on the results.

### Conclusion 

   * This is where I conclude the report.