# IBM:DSP Capstone Project

## Territory Market Development

### Introduction / Business Problem

   
A new technical sales representative for a winery products company has been assigned a new territory: she is to contact and develop sales relationships with wineries in the Pacific Northwestern United States. This region includes both Washington State and Oregon State and covers a vast area, much of which contains no viticultural areas or vineyards. How can she best expand her territory to include new clients?

### Data  

As a stand-in for an actual client list, we will use an existing winery list from data.world. We can gather this list with a web scrape, an SQL query, or pandas. Then we'll use the Foursquare API to create a second list of wineries, look up each location's coordinates with geopy, and overlay both lists on a map. Finally, we'll separate the territory into regions using k-means clustering.
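The k-means step can be sketched with scikit-learn's `KMeans`, the same class imported later in this notebook. This is a minimal sketch, not the project's final method: the coordinates are synthetic points generated with `make_blobs` around two arbitrary centers in the Pacific Northwest, not real winery locations, and choosing two clusters is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic (latitude, longitude) points around two arbitrary centers;
# they stand in for geocoded winery coordinates.
centers = [(45.2, -123.0), (46.3, -119.5)]
coords, _ = make_blobs(n_samples=60, centers=centers, cluster_std=0.3, random_state=0)

# Fit k-means on raw latitude/longitude. Euclidean distance in degrees is a
# simplification: at these latitudes a degree of longitude spans less ground
# than a degree of latitude.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(coords)

# Each point gets a region label; the cluster centers summarize each region.
for region, center in enumerate(kmeans.cluster_centers_):
    count = int(np.sum(labels == region))
    print(f"Region {region}: {count} wineries, center at {center.round(2)}")
```

With real data, the number of clusters would need to be chosen deliberately (for example, by comparing inertia across several values of `n_clusters`).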

#### Data World Data Set

The first data set resides at <a href="https://data.world/arthur/wineries">data world</a>. I first created an account and downloaded the file as an Excel sheet, which was easy enough. After looking through the available files, I downloaded the one for the <a href="https://query.data.world/s/nib6nc7kfdk7vhbaypnipvdogzcqz2">USA</a>, since we are examining the Pacific Northwest territory.

Now that we have downloaded the file from the website, let's look at an alternative: loading it directly with pandas.

```python
import pandas as pd

# Load the data.world winery list directly from its CSV query URL
df = pd.read_csv('https://query.data.world/s/r4ahdp3vbrclyyim5siydrszdc6rrx')

# Examine the first five rows
df.head(5)
```

| Unnamed: 0 | Winery Name | State | Web Site | Unnamed: 3 |
|---|---|---|---|---|
| 0 | 14 Hands | WA | www.14handswine.com |  |
| 1 | Abacela Vineyards & Winery | OR | www.abacela.com |  |
| 2 | Abarbanel Wine Co. | NY | www.kosher-wine.com |  |
| 3 | Abbott Winery | CA | www.abbottwinery.com |  |
| 4 | Abeja | WA | www.abeja.net |  |


The data needs a little cleanup: we remove every state except those in the Pacific Northwest, which for this project means only Washington (WA) and Oregon (OR).

```python
# Keep only rows whose State is Washington (WA) or Oregon (OR)
states = ['WA', 'OR']
df_pnw = df[df.State.isin(states)]  # assign so the filtered list can be reused
df_pnw
```

| Unnamed: 0 | Winery Name | State | Web Site | Unnamed: 3 |
|---|---|---|---|---|
| 0 | 14 Hands | WA | www.14handswine.com |  |
| 1 | Abacela Vineyards & Winery | OR | www.abacela.com |  |
| 4 | Abeja | WA | www.abeja.net |  |
| 8 | Academy Wines | OR |  |  |
| 12 | Acme Wineworks | OR |  |  |
| ... | ... | ... | ... | ... |
| 2882 | Yamhill Valley Vineyards | OR | www.yamhill.com |  |
| 2884 | Yellow Hawk Cellar | WA |  | closed |
| 2888 | Youngberg Hill Vineyards | OR | www.youngberghill.com |  |
| 2895 | Zefina Winery | WA | www.zefina.com |  |
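The `Unnamed: 3` column appears to hold notes such as `closed` (see Yellow Hawk Cellar). A closed winery is not a sales prospect, so one possible extra cleaning step is to drop those rows together with the state filter. This sketch uses a small hand-built DataFrame with rows copied from the outputs above (web sites omitted); the helper name `pnw_prospects` is hypothetical, and treating only the exact note `closed` as an exclusion is an assumption, since the column is undocumented.

```python
import pandas as pd

# A few rows copied from the outputs above; 'Unnamed: 3' appears to be a notes column.
sample = pd.DataFrame({
    'Winery Name': ['14 Hands', 'Abacela Vineyards & Winery', 'Abarbanel Wine Co.',
                    'Yellow Hawk Cellar', 'Zefina Winery'],
    'State': ['WA', 'OR', 'NY', 'WA', 'WA'],
    'Unnamed: 3': [None, None, None, 'closed', None],
})

def pnw_prospects(frame, states=('WA', 'OR')):
    """Keep Pacific Northwest wineries that are not marked 'closed'."""
    in_pnw = frame['State'].isin(states)
    # Missing notes become '' so the string comparison never sees NaN
    notes = frame['Unnamed: 3'].fillna('').astype(str).str.strip().str.lower()
    is_closed = notes.eq('closed')
    return frame[in_pnw & ~is_closed].reset_index(drop=True)

prospects = pnw_prospects(sample)
print(prospects['Winery Name'].tolist())
# ['14 Hands', 'Abacela Vineyards & Winery', 'Zefina Winery']
```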


So, we have 419 wineries in this list, which will serve as the dummy client list. Now, how do we compare this data with the data retrieved from Foursquare? To do this, we'll use geopy to get the latitude and longitude of each winery. However, this doesn't appear to work very well, because the list contains only winery names and states, not addresses. Next, I'll try geocoding with the Google API, following a walkthrough, and access the data.world set via CSV.

Then I'll concatenate the results with the latitude and longitude values, and then use Foursquare.
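One way to organize the geocoding and concatenation steps is to pass the geocoder in as a function, so that Nominatim, a Google geocoder, or a test stub can be swapped without changing the rest of the code. This is a sketch under assumptions: the helper `add_coordinates`, the query format `"<Winery Name>, <State>, USA"`, and the stub geocoder with its dummy coordinates are all hypothetical, and name-only queries may still fail without street addresses, as noted above.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for a geopy Location: only latitude/longitude are used.
Point = namedtuple('Point', ['latitude', 'longitude'])

def add_coordinates(frame, geocode):
    """Return a copy of `frame` with 'Latitude' and 'Longitude' columns.

    `geocode` is any callable that takes a query string and returns an object
    with .latitude and .longitude, or None when nothing is found.
    """
    lats, lons = [], []
    for name, state in zip(frame['Winery Name'], frame['State']):
        result = geocode(f"{name}, {state}, USA")
        lats.append(result.latitude if result is not None else None)
        lons.append(result.longitude if result is not None else None)
    # Concatenate the coordinate columns with the original columns.
    coords = pd.DataFrame({'Latitude': lats, 'Longitude': lons}, index=frame.index)
    return pd.concat([frame, coords], axis=1)

def fake_geocode(query):
    """Stub geocoder: knows one query and returns dummy (not real) coordinates."""
    known = {'14 Hands, WA, USA': Point(1.0, 2.0)}
    return known.get(query)

sample = pd.DataFrame({'Winery Name': ['14 Hands', 'Academy Wines'], 'State': ['WA', 'OR']})
result = add_coordinates(sample, fake_geocode)
print(result)
```

With geopy, `geocode` could be `RateLimiter(Nominatim(user_agent="foursquare_agent").geocode, min_delay_seconds=1)`, with `RateLimiter` imported from `geopy.extra.rate_limiter`; Nominatim's usage policy allows at most one request per second. geopy's `GoogleV3` geocoder (which requires an API key) can be plugged in the same way.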

```python
# Import the libraries and set up Foursquare
import requests  # library to handle requests
import pandas as pd  # library for data analysis
import numpy as np  # library to handle data in a vectorized manner
import random  # library for random number generation

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image, HTML

import matplotlib.pyplot as plt  # plotting library
# backend for rendering plots within the notebook
%matplotlib inline

from sklearn.cluster import KMeans
# the old sklearn.datasets.samples_generator path no longer exists in current scikit-learn
from sklearn.datasets import make_blobs

# transform JSON into a pandas DataFrame (pandas.io.json.json_normalize is deprecated)
from pandas import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium  # map plotting library

# Geocode Portland, OR to use as the map center
address = 'Portland OR, USA'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)
```

45.5202471 -122.6741949


```python
# Create a map of the Pacific Northwest centered on Portland using folium
map_PNW = folium.Map(
    location=[latitude, longitude],
    zoom_start=6)
map_PNW
```
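To build the overlay described in the Data section, one option is to convert winery coordinates into a GeoJSON `FeatureCollection`, which `folium.GeoJson` can draw on `map_PNW`. This is a minimal pure-Python sketch; the helper name `to_geojson` and the `name` property are assumptions. Note that GeoJSON orders coordinates as `[longitude, latitude]`, the reverse of folium's `location=[latitude, longitude]`.

```python
import json

def to_geojson(wineries):
    """Convert (name, latitude, longitude) tuples into a GeoJSON FeatureCollection."""
    features = []
    for name, lat, lon in wineries:
        if lat is None or lon is None:
            continue  # skip wineries that could not be geocoded
        features.append({
            'type': 'Feature',
            # GeoJSON (RFC 7946) positions are [longitude, latitude]
            'geometry': {'type': 'Point', 'coordinates': [lon, lat]},
            'properties': {'name': name},
        })
    return {'type': 'FeatureCollection', 'features': features}

# Illustration: the Portland coordinates geocoded above, plus one ungeocoded entry.
overlay = to_geojson([('Portland, OR', 45.5202471, -122.6741949),
                      ('Hypothetical winery', None, None)])
print(json.dumps(overlay, indent=2))
```

In the notebook, `folium.GeoJson(overlay).add_to(map_PNW)` would then draw the points on the map.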

Let's see if we can get geopy to integrate with the Foursquare data.
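One way to connect geopy and Foursquare is to use the geocoded coordinates as the center of a Foursquare venue search, then compare the returned names with the client list to surface wineries that are not yet clients. This sketch assumes the legacy Foursquare v2 `venues/search` endpoint used in the IBM course; Foursquare has since introduced a newer Places API, so check the current documentation. The credentials, version date, radius, helper names, and the hand-made response (with made-up coordinates) are placeholders, and the name normalization is a simple heuristic, not a reliable matcher.

```python
import re
from urllib.parse import urlencode

SEARCH_URL = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(lat, lon, client_id, client_secret,
                     version='20180605', query='winery', radius=50000, limit=50):
    """Build a Foursquare v2 venues/search URL centered on (lat, lon)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,          # API version date (YYYYMMDD); placeholder value
        'll': f'{lat},{lon}',  # search center, e.g. from geopy
        'query': query,
        'radius': radius,      # meters
        'limit': limit,
    }
    return f'{SEARCH_URL}?{urlencode(params)}'

def parse_venues(payload):
    """Extract (name, lat, lon) tuples from a v2 venues/search JSON payload."""
    venues = payload.get('response', {}).get('venues', [])
    return [(v['name'], v['location']['lat'], v['location']['lng']) for v in venues]

def normalize(name):
    """Heuristic match key: lowercase, strip punctuation and common winery words."""
    words = re.sub(r'[^a-z0-9 ]', ' ', name.lower()).split()
    stop = {'winery', 'wines', 'wine', 'vineyards', 'vineyard', 'cellars', 'cellar', 'co'}
    return ' '.join(w for w in words if w not in stop)

def new_prospects(venues, client_names):
    """Return venues whose normalized name does not appear in the client list."""
    known = {normalize(n) for n in client_names}
    return [v for v in venues if normalize(v[0]) not in known]

# Placeholder credentials and a hand-made payload (not real API output).
url = build_search_url(45.5202471, -122.6741949, 'CLIENT_ID', 'CLIENT_SECRET')
sample_payload = {'response': {'venues': [
    {'name': 'Abeja Winery', 'location': {'lat': 46.1, 'lng': -118.3}},
    {'name': 'Example Cellars', 'location': {'lat': 45.3, 'lng': -123.0}},
]}}
leads = new_prospects(parse_venues(sample_payload), ['Abeja', '14 Hands'])
print(url)
print(leads)
```

In the notebook, the real payload would come from `requests.get(url).json()`, and the client names from the filtered `Winery Name` column.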

### Methodology     

   * This is the main component of the report, where I discuss and describe any exploratory data analysis, any inferential statistical testing, and what machine learning was used and why.


### Results

   * This is where I discuss the results.

### Discussion

   * This is where I discuss any observations I noted and any recommendations I can make based on the results.

### Conclusion 

   * This is where I conclude the report.