# Scott Breitbach
## Milestone 4: Connecting to an API / Pulling in the Data and Cleaning / Formatting
## 20-Feb-2021
## DSC540, Weeks 9-10

In [1]:
import pandas as pd
import requests
# import json
import os
from dotenv import load_dotenv
import urllib.parse  # import the submodule explicitly; urllib.parse.unquote is used below

In [2]:
%load_ext dotenv
%dotenv
load_dotenv()  # loads variables from the .env file; the argument is a file path, not a variable name

True

### Perform at least 5 data transformation and/or cleansing steps to your website data.
For Example:
* Replace Headers
* Format data into a more readable format
* Identify outliers and bad data
* Find duplicates
* Fix casing or inconsistent values
* Conduct Fuzzy Matching

# Nebraska Breweries

### My struggle with getting API keys:

Initially I tried to get an API key to Untappd.com, but they are restrictive in who they offer API keys to and my purposes did not qualify.  
I tried twice in January to request an API key from BreweryDB.com, and then learned that they are temporarily not offering API keys. I did manage to get a key to their sandbox, which contains a subset of their database.  
My next backup plan was BeerMapping.com. I was able to obtain an API key and spent a fair amount of time submitting brewery updates to their database. This week their website went down, so I went back to BreweryDB's sandbox. Once I had worked out the code for retrieving the data I wanted, I searched for Nebraska, which returned 0 results. The same code returned a handful of breweries from Colorado, so I know it was working; there were simply no Nebraska breweries in the sandbox. [See code here](https://github.com/ScottBreitbach/DSC540/blob/main/Final%20Project/BrewerydbAPI.ipynb)  
It is Sunday morning now and, in a last-ditch effort, I checked BeerMapping.com again and it appears to be back up!

## Use BeerMapping.com API to retrieve data on Nebraska breweries

In [3]:
apiKey = os.environ.get("BEERMAPPING_KEY")
baseURL = "http://beermapping.com/webservice/"
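
`os.environ.get` returns `None` when the variable is missing, which would quietly put the text `None` into every request URL. A minimal sketch of a guard, assuming the key is stored in an environment variable named `BEERMAPPING_KEY` as above; the helper name `require_env` is hypothetical and not part of the original notebook:

```python
import os


def require_env(name):
    """Return the value of an environment variable, or fail loudly if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set; check the .env file.")
    return value
```

With this in place, the key could be loaded as `apiKey = require_env("BEERMAPPING_KEY")`, so a missing `.env` entry surfaces immediately instead of as a failed request later.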

#### Retrieve Brewery Data as JSON:

In [4]:
def getStateBreweries(state):
    '''
    Function to retrieve data about breweries within a US State
    '''
    # Assign URL Service to search by State:
    service = "locstate"
    # Assign search term:
    query = state
    # Assemble URL ('&s=json' is appended to request JSON-formatted data):
    requestURL = f"{baseURL}{service}/{apiKey}/{query}&s=json"
    
    response = requests.request("GET", requestURL)
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    jsonResponse = response.json()
    
    return jsonResponse

#### Search by State (2-letter abbr):

In [5]:
state = str(input("Enter 2-letter US State abbreviation:"))
breweryJSON = getStateBreweries(state)

Enter 2-letter US State abbreviation: NE


200
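
The prompt above passes whatever is typed straight into the request URL. A small sketch of validating the input first, assuming the upper-case two-letter form used in this run (`NE`) is what the service expects; `normalize_state` is a hypothetical helper:

```python
def normalize_state(raw):
    """Strip whitespace and upper-case a 2-letter state abbreviation; raise ValueError otherwise."""
    code = raw.strip().upper()
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"Expected a 2-letter state abbreviation, got {raw!r}")
    return code


print(normalize_state(" ne "))  # prints NE
```

This only checks the shape of the input; it does not confirm that the two letters are a real state.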


## Build a DataFrame from the returned JSON data:

#### Check the keys from one of the returned JSON dictionaries:

In [6]:
breweryJSON[0].keys()

dict_keys(['id', 'name', 'status', 'reviewlink', 'proxylink', 'blogmap', 'street', 'city', 'state', 'zip', 'country', 'phone', 'url', 'overall', 'imagecount'])

#### Use the keys to build a dictionary from the data and return a DataFrame:  
##### [Replace Headers / Format data into a more readable format]

In [7]:
def buildBreweryDB(listBreweries):
    '''
    Take a list of Brewery data in JSON format.
    Output a DF with key information about those breweries.
    '''
    # Define an empty dictionary, with keys:
    # [Replace Headers]
    breweryDict = {'BreweryID': [], 'BreweryName': [], 'Type': [], 'ReviewLink': [],
                   'ProxyLink': [], 'Map': [], 'StreetAddress': [], 'City': [],
                   'State': [], 'Zip': [], 'Country': [], 'PhoneNum': [],
                   'Website': [], 'Rating': [], 'ImageCount': []}
    
    for b in listBreweries:
        breweryDict['BreweryID'].append(b['id'])
        breweryDict['BreweryName'].append(b['name'])
        breweryDict['Type'].append(b['status'])
        breweryDict['ReviewLink'].append(b['reviewlink'])
        breweryDict['ProxyLink'].append(b['proxylink'])
        breweryDict['Map'].append(b['blogmap'])
        breweryDict['StreetAddress'].append(b['street'])
        breweryDict['City'].append(b['city'])
        breweryDict['State'].append(b['state'])
        breweryDict['Zip'].append(b['zip'])
        breweryDict['Country'].append(b['country'])
        breweryDict['PhoneNum'].append(b['phone'])
        # [Format data into a more readable format]
        # Decode percent-encoded characters (e.g. %3A -> ':') in the website URL:
        breweryDict['Website'].append(urllib.parse.unquote(b['url']))
        breweryDict['Rating'].append(b['overall'])
        breweryDict['ImageCount'].append(b['imagecount'])
        
    return pd.DataFrame(breweryDict)
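
The same header replacement can also be written more compactly by letting pandas build the frame from the list of dictionaries and then renaming the columns. This is a sketch of an alternative to `buildBreweryDB`, not the approach used in this notebook; `COLUMN_NAMES` and `build_brewery_df` are names introduced here for illustration:

```python
import urllib.parse

import pandas as pd

# Mapping from the API's keys to the column names used in this notebook.
COLUMN_NAMES = {
    'id': 'BreweryID', 'name': 'BreweryName', 'status': 'Type',
    'reviewlink': 'ReviewLink', 'proxylink': 'ProxyLink', 'blogmap': 'Map',
    'street': 'StreetAddress', 'city': 'City', 'state': 'State', 'zip': 'Zip',
    'country': 'Country', 'phone': 'PhoneNum', 'url': 'Website',
    'overall': 'Rating', 'imagecount': 'ImageCount',
}


def build_brewery_df(list_breweries):
    """Build a DataFrame with renamed columns from a list of brewery dictionaries."""
    # Select the API keys in a fixed order, then rename them in one step.
    df = pd.DataFrame(list_breweries, columns=list(COLUMN_NAMES))
    df = df.rename(columns=COLUMN_NAMES)
    # Decode percent-encoded characters in the website URLs.
    df['Website'] = df['Website'].map(urllib.parse.unquote)
    return df
```

Keeping the old-to-new names in a single dictionary means a new field only has to be added in one place.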

In [8]:
beerDF = buildBreweryDB(breweryJSON)
beerDF.head()

Unnamed: 0,BreweryID,BreweryName,Type,ReviewLink,ProxyLink,Map,StreetAddress,City,State,Zip,Country,PhoneNum,Website,Rating,ImageCount
0,21938,5168 Brewing,Brewery,https://beermapping.com/location/21938,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,5730 Hidcote Drive,Lincoln,NE,68516,United States,402-875-5588,,0,0
1,22335,5168 Taproom,Beer Bar,https://beermapping.com/location/22335,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,3201 Farnam Street,Omaha,NE,68131,United States,402-934-5168,http://www.5168brewing.com/,0,0
2,21215,Alamo Draft House,Beer Bar,https://beermapping.com/location/21215,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,12750 Westport Pkwy,La Vista,NE,68138,United States,402-505-9979,https://drafthouse.com/omaha,0,0
3,19644,Backswing Brewing,Brewery,https://beermapping.com/location/19644,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,500 W South St #8,Lincoln,NE,68522,United States,(402) 515-4263,facebook.com/BackswingBrewingCo/timeline,0,0
4,21194,Backswing Brewing Company,Brewery,https://beermapping.com/location/21194,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,500 W South St #8,Lincoln,NE,68522,United States,402-515-4263,,0,0


#### Check what types of beer venues we're dealing with:

In [9]:
beerDF.Type.unique()

array(['Brewery', 'Beer Bar', 'Beer Store', 'Brewpub', 'Homebrew'],
      dtype=object)

#### A quick scan through each type to look for anything concerning:
##### [Identify outliers and bad data]

In [10]:
# beerDF[beerDF['Type'] == 'Brewery']

In [11]:
# beerDF[beerDF['Type'] == 'Beer Bar']

In [12]:
# beerDF[beerDF['Type'] == 'Beer Store']

In [13]:
# beerDF[beerDF['Type'] == 'Brewpub']

In [14]:
# beerDF[beerDF['Type'] == 'Homebrew']

Lazlo's in the Lincoln Haymarket is the location for Empyrean Brewing Co and was listed under the 'Beer Bar' category and not under 'Brewery'. I used `beerDF.at[55, 'Type'] = 'Brewery'` to fix the issue, but later located Empyrean Brewing under the 'Brewpub' category, so this code was subsequently removed.  

With one exception, which is corrected in the next cell, the locations appear to be categorized correctly, so I will create a new DataFrame containing just the 'Brewery' and 'Brewpub' categories.

In [15]:
beerDF.at[36, 'Type'] = 'Beer Bar'

I later found that another restaurant owned by the Empyrean Brewery people was categorized as 'Brewery', so I reclassified it as 'Beer Bar' to match the Lazlo's restaurants.
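
Row label 36 depends on the order in which the API returns records, so the same cell could change a different venue if the data is pulled again. A sketch of selecting the row by its `BreweryID` instead; the frame, IDs, and names below are made up for illustration and do not come from the BeerMapping data:

```python
import pandas as pd

# Hypothetical stand-in for beerDF; the IDs and names are invented.
demo = pd.DataFrame({
    'BreweryID': [101, 102, 103],
    'BreweryName': ['Demo Brewing', 'Demo Restaurant', 'Demo Pub'],
    'Type': ['Brewery', 'Brewery', 'Brewpub'],
})

# Select the row by its stable BreweryID rather than by its row label.
demo.loc[demo['BreweryID'] == 102, 'Type'] = 'Beer Bar'
print(demo['Type'].tolist())  # ['Brewery', 'Beer Bar', 'Brewpub']
```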

#### Create a DataFrame containing only breweries:

In [16]:
breweryDF = beerDF[beerDF['Type'].isin(['Brewery', 'Brewpub'])].copy()  # independent copy, so later edits avoid SettingWithCopyWarning
breweryDF.head()

Unnamed: 0,BreweryID,BreweryName,Type,ReviewLink,ProxyLink,Map,StreetAddress,City,State,Zip,Country,PhoneNum,Website,Rating,ImageCount
0,21938,5168 Brewing,Brewery,https://beermapping.com/location/21938,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,5730 Hidcote Drive,Lincoln,NE,68516,United States,402-875-5588,,0,0
3,19644,Backswing Brewing,Brewery,https://beermapping.com/location/19644,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,500 W South St #8,Lincoln,NE,68522,United States,(402) 515-4263,facebook.com/BackswingBrewingCo/timeline,0,0
4,21194,Backswing Brewing Company,Brewery,https://beermapping.com/location/21194,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,500 W South St #8,Lincoln,NE,68522,United States,402-515-4263,,0,0
5,14326,Beaver View Brewing,Brewery,https://beermapping.com/location/14326,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,524 S. 4th Street,Albion,NE,68620,United States,,,0,0
9,17706,Benson Brewery,Brewpub,https://beermapping.com/location/17706,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,6059 Maple Street,Omaha,NE,68104,United States,(402) 937-1892,bensonbrewery.com,0,0


#### Another look over just the brewery names for any outliers:
##### [Identify outliers and bad data]

In [17]:
breweryDF['BreweryName'].unique()

array(['5168 Brewing', 'Backswing Brewing', 'Backswing Brewing Company',
       'Beaver View Brewing', 'Benson Brewery',
       'Blue Blood Brewing Company', 'Boiler Brewing Company',
       'Bolo Beer Co', 'Bootleg Brewers',
       'Bootleg Brewers / Sandhills Brewing Company',
       'Bottle Rocket Brewing Company', 'Brickway Brewery and Distillery',
       'Broken Arrow Cellars and Brewery', 'Brush Creek Brewing Company',
       'Code Beer Company', 'Cosmic Eye Brewing Company',
       'Empyrean Ales - Lazlos',
       'Fairfield Opera House Brewery and Grill',
       'Farnam House Brewing Company', 'First Street Brewing Company',
       'Flyover Brewing Company', 'Gottberg Brewpub',
       'Granite City Food and Brewery - Lincoln',
       'Granite City Food and Brewery - Omaha',
       'Infusion Brewing Company', 'Jaipur Restaurant and Brewpub',
       'Johnnie Byrd Brewing Company', 'Kinkaider Brewing Company',
       'Lazy Horse Vineyard and Brewery', 'Loop Brewing',
       'Lost 

Data Fixes:  
* [Rename] **5168 Brewing** changed their name to **Catalyst Brewing Co** in 2020.  
* [Remove] **Beaver View**, **Blue Blood**, **Misty's**, and **Spilker** have all closed and should be removed.  
* [No change] **Granite City** is a chain and not a local Nebraska brewery, but it appears I kept it in the previous Milestones so I'll leave it in.  
* [Rename] **Broken Arrow** is the parent winery of **Hanging Horseshoe**, so the entry will be renamed to the brewery's name.  
* [Rename] Remove 'Lazlos' from **Empyrean**, as Lazlo's is the restaurant portion.

Duplicates:  
* Backswing  
* Bootleg
* Granite City (two locations; keeping both to stay consistent with the previous Milestones)
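
I spotted these duplicates by reading through the list, but a rough fuzzy match can flag candidates automatically. A minimal sketch using the standard library's `difflib`, under two assumptions of mine: a similarity threshold of 0.8, and treating one name being a prefix of another as a match. `find_similar_names` is a hypothetical helper:

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_similar_names(names, threshold=0.8):
    """Return pairs of names that look like the same brewery.

    A pair is flagged when the case-insensitive similarity ratio is at least
    `threshold`, or when one name is a prefix of the other.
    """
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        la, lb = a.lower(), b.lower()
        ratio = SequenceMatcher(None, la, lb).ratio()
        if ratio >= threshold or la.startswith(lb) or lb.startswith(la):
            pairs.append((a, b))
    return pairs


names = ['Backswing Brewing', 'Backswing Brewing Company', 'Benson Brewery',
         'Bootleg Brewers', 'Bootleg Brewers / Sandhills Brewing Company']
for pair in find_similar_names(names):
    print(pair)
```

Flagged pairs still need a manual check: separate locations of one chain, such as the two Granite City entries, may score as similar without being true duplicates.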

In [18]:
breweryDF['BreweryName'].unique()

array(['5168 Brewing', 'Backswing Brewing', 'Backswing Brewing Company',
       'Beaver View Brewing', 'Benson Brewery',
       'Blue Blood Brewing Company', 'Boiler Brewing Company',
       'Bolo Beer Co', 'Bootleg Brewers',
       'Bootleg Brewers / Sandhills Brewing Company',
       'Bottle Rocket Brewing Company', 'Brickway Brewery and Distillery',
       'Broken Arrow Cellars and Brewery', 'Brush Creek Brewing Company',
       'Code Beer Company', 'Cosmic Eye Brewing Company',
       'Empyrean Ales - Lazlos',
       'Fairfield Opera House Brewery and Grill',
       'Farnam House Brewing Company', 'First Street Brewing Company',
       'Flyover Brewing Company', 'Gottberg Brewpub',
       'Granite City Food and Brewery - Lincoln',
       'Granite City Food and Brewery - Omaha',
       'Infusion Brewing Company', 'Jaipur Restaurant and Brewpub',
       'Johnnie Byrd Brewing Company', 'Kinkaider Brewing Company',
       'Lazy Horse Vineyard and Brewery', 'Loop Brewing',
       'Lost 

#### Remove closed breweries:

In [19]:
def removeBreweries(df, col, values):
    '''
    Remove rows whose value in the specified column
    matches any entry in a list of passed values.
    '''
    # ~ negates the boolean mask, keeping only the non-matching rows:
    return df[~df[col].isin(values)]

In [20]:
closed = ['Beaver View Brewing', 'Blue Blood Brewing Company',
         "Misty's Steakhouse And Brewery", 'Spilker Ales']

In [21]:
len(breweryDF)

53

In [22]:
breweryDF = removeBreweries(breweryDF, 'BreweryName', closed)

In [23]:
len(breweryDF)

49

#### Rename some breweries:

In [24]:
nameChanges = {
    '5168 Brewing': 'Catalyst Brewing Company',
    'Broken Arrow Cellars and Brewery': 'Hanging Horseshoe Brewing Company',
    'Empyrean Ales - Lazlos': 'Empyrean Brewing Company',
}

In [25]:
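def updateNames(df, col, replaceDict):
    '''
    Rename values in a specified column, using a 
    dictionary of old names and new names.
    Modifies the passed DataFrame in place.
    '''
    # Assign the result back to the column; calling replace(inplace=True) on
    # df[col] may act on a temporary copy and leave df unchanged.
    df[col] = df[col].replace(replaceDict)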

In [26]:
updateNames(breweryDF, 'BreweryName', nameChanges)

In [27]:
breweryDF.head()

Unnamed: 0,BreweryID,BreweryName,Type,ReviewLink,ProxyLink,Map,StreetAddress,City,State,Zip,Country,PhoneNum,Website,Rating,ImageCount
0,21938,Catalyst Brewing Company,Brewery,https://beermapping.com/location/21938,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,5730 Hidcote Drive,Lincoln,NE,68516,United States,402-875-5588,,0,0
3,19644,Backswing Brewing,Brewery,https://beermapping.com/location/19644,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,500 W South St #8,Lincoln,NE,68522,United States,(402) 515-4263,facebook.com/BackswingBrewingCo/timeline,0,0
4,21194,Backswing Brewing Company,Brewery,https://beermapping.com/location/21194,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,500 W South St #8,Lincoln,NE,68522,United States,402-515-4263,,0,0
9,17706,Benson Brewery,Brewpub,https://beermapping.com/location/17706,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,6059 Maple Street,Omaha,NE,68104,United States,(402) 937-1892,bensonbrewery.com,0,0
14,20376,Boiler Brewing Company,Brewery,https://beermapping.com/location/20376,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,"129 N 10th St, Suite 8",Lincoln,NE,68508,United States,402-261-8775,boilerbrewingcompany.com,0,0


#### Check for duplicates:

In [28]:
breweryDF[breweryDF['BreweryName'].duplicated()]

Unnamed: 0,BreweryID,BreweryName,Type,ReviewLink,ProxyLink,Map,StreetAddress,City,State,Zip,Country,PhoneNum,Website,Rating,ImageCount
47,20515,Infusion Brewing Company,Brewery,https://beermapping.com/location/20515,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,6271 South 118th Street,Omaha,NE,68137,United States,402-934-2064,infusionbrewing.com,0,0
69,18889,Nebraska Brewing Company,Brewery,https://beermapping.com/location/18889,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,6950 S 108th St,La Vista,NE,68128,United States,,nebraskabrewingco.com,0,0
95,16272,Thunderhead Brewing Company,Brewery,https://beermapping.com/location/16272,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,201 Avenue F,Axtell,NE,68924,United States,(308) 237-1558,thunderheadbrewing.com,0,0
104,21370,Zipline Brewing Company,Brewery,https://beermapping.com/location/21370,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,721 N 14th St,Omaha,NE,68102,United States,,http://ziplinebrewing.com/,0,0
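
By default `duplicated()` marks only the second and later occurrences, so the first copy of each name is missing from the table above. A sketch of listing every member of each duplicated group with `keep=False`; the frame below is a made-up stand-in, not the real brewery data:

```python
import pandas as pd

# Hypothetical stand-in for breweryDF with invented IDs and names.
demo = pd.DataFrame({
    'BreweryID': [1, 2, 3, 4],
    'BreweryName': ['Alpha Brewing', 'Beta Brewing', 'Alpha Brewing', 'Gamma Ales'],
    'City': ['Omaha', 'Lincoln', 'La Vista', 'Kearney'],
})

# keep=False marks every member of a duplicated group, not just the repeats.
mask = demo['BreweryName'].duplicated(keep=False)
dupes = demo[mask].sort_values(['BreweryName', 'BreweryID'])
print(dupes)
```

Seeing both rows side by side makes it easier to tell a true duplicate from a second location of the same brewery.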


In [29]:
breweryDF[breweryDF['BreweryName'] == 'Nebraska Brewing Company']

Unnamed: 0,BreweryID,BreweryName,Type,ReviewLink,ProxyLink,Map,StreetAddress,City,State,Zip,Country,PhoneNum,Website,Rating,ImageCount
68,7136,Nebraska Brewing Company,Brewpub,https://beermapping.com/location/7136,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,7474 Towne Center Pkwy. Suite 101,Papillion,NE,68046,United States,(402) 934-7100,nebraskabrewingco.com,91.666675,8
69,18889,Nebraska Brewing Company,Brewery,https://beermapping.com/location/18889,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,6950 S 108th St,La Vista,NE,68128,United States,,nebraskabrewingco.com,0.0,0


In [30]:
breweryDF.iloc[1:3]

Unnamed: 0,BreweryID,BreweryName,Type,ReviewLink,ProxyLink,Map,StreetAddress,City,State,Zip,Country,PhoneNum,Website,Rating,ImageCount
3,19644,Backswing Brewing,Brewery,https://beermapping.com/location/19644,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,500 W South St #8,Lincoln,NE,68522,United States,(402) 515-4263,facebook.com/BackswingBrewingCo/timeline,0,0
4,21194,Backswing Brewing Company,Brewery,https://beermapping.com/location/21194,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,500 W South St #8,Lincoln,NE,68522,United States,402-515-4263,,0,0


Checking the duplicates, I found that one of Nebraska Brewing Company's locations (Papillion) has closed and that Backswing Brewing is listed twice under slightly different names.  
I will delete the closed Nebraska Brewing location as well as the second Backswing entry, because it has less information available, and then rename the remaining Backswing entry to 'Backswing Brewing Co.'.

In [31]:
removeID = [7136, 21194]
breweryDF = removeBreweries(breweryDF, 'BreweryID', removeID)

In [32]:
rename = {'Backswing Brewing': 'Backswing Brewing Co.'}
updateNames(breweryDF, 'BreweryName', rename)

In [33]:
print(breweryDF[['BreweryID', "BreweryName"]])

     BreweryID                                  BreweryName
0        21938                     Catalyst Brewing Company
3        19644                        Backswing Brewing Co.
9        17706                               Benson Brewery
14       20376                       Boiler Brewing Company
15       20287                                 Bolo Beer Co
16       20835                              Bootleg Brewers
17       21191  Bootleg Brewers / Sandhills Brewing Company
18       20099                Bottle Rocket Brewing Company
20       18888              Brickway Brewery and Distillery
21       19643            Hanging Horseshoe Brewing Company
22       20832                  Brush Creek Brewing Company
25       21721                            Code Beer Company
26       22356                   Cosmic Eye Brewing Company
32         417                     Empyrean Brewing Company
33       20101      Fairfield Opera House Brewery and Grill
34       19001                 Farnam Ho

I discovered another duplicate (Bootleg). After some research, I removed the shorter 'Bootleg Brewers' entry (ID 20835) and kept 'Bootleg Brewers / Sandhills Brewing Company'.

In [34]:
removeID = [20835]
breweryDF = removeBreweries(breweryDF, 'BreweryID', removeID)

In [35]:
breweryDF

Unnamed: 0,BreweryID,BreweryName,Type,ReviewLink,ProxyLink,Map,StreetAddress,City,State,Zip,Country,PhoneNum,Website,Rating,ImageCount
0,21938,Catalyst Brewing Company,Brewery,https://beermapping.com/location/21938,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,5730 Hidcote Drive,Lincoln,NE,68516,United States,402-875-5588,,0.0,0
3,19644,Backswing Brewing Co.,Brewery,https://beermapping.com/location/19644,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,500 W South St #8,Lincoln,NE,68522,United States,(402) 515-4263,facebook.com/BackswingBrewingCo/timeline,0.0,0
9,17706,Benson Brewery,Brewpub,https://beermapping.com/location/17706,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,6059 Maple Street,Omaha,NE,68104,United States,(402) 937-1892,bensonbrewery.com,0.0,0
14,20376,Boiler Brewing Company,Brewery,https://beermapping.com/location/20376,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,"129 N 10th St, Suite 8",Lincoln,NE,68508,United States,402-261-8775,boilerbrewingcompany.com,0.0,0
15,20287,Bolo Beer Co,Brewery,https://beermapping.com/location/20287,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,420 East 1st St,Valentine,NE,69201,United States,,facebook.com/BoloBeerCo,0.0,0
17,21191,Bootleg Brewers / Sandhills Brewing Company,Brewpub,https://beermapping.com/location/21191,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,45145 829th Rd,Taylor,NE,68879,United States,308-942-3440,bootlegbrewers.com,0.0,0
18,20099,Bottle Rocket Brewing Company,Brewery,https://beermapping.com/location/20099,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,230 S. 5th Street,Seward,NE,68434,United States,,bottlerocketbrewing.com,0.0,0
20,18888,Brickway Brewery and Distillery,Brewery,https://beermapping.com/location/18888,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,1116 Jackson Street,Omaha,NE,68102,United States,(402) 933-2613,http://www.drinkbrickway.com/,93.3334,1
21,19643,Hanging Horseshoe Brewing Company,Brewery,https://beermapping.com/location/19643,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,73892 332nd ave,Imperial,NE,69033,United States,﻿308-882-7772,facebook.com/pages/Broken-Arrow-Cellars/793194...,0.0,0
22,20832,Brush Creek Brewing Company,Brewpub,https://beermapping.com/location/20832,http://beermapping.com/maps/proxymaps.php?loci...,http://beermapping.com/maps/blogproxy.php?loci...,102 N. Main Street,Atkinson,NE,68713,United States,,facebook.com/BrushCreekBrewingCompany/,0.0,0


#### Reset the index:

In [36]:
breweryDF.reset_index(drop=True, inplace=True)

#### Save as CSV:

In [37]:
breweryDF.to_csv('API_beerMapping.csv', index=False)
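
One thing to watch when this file is read back in a later step: pandas infers column types from the CSV, so a ZIP code column can come back as integers. A sketch of a round trip that keeps `Zip` as text, assuming the column holds strings as returned by the API; the two-row frame and the file name `demo.csv` are made up:

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row stand-in for breweryDF.
demo = pd.DataFrame({'BreweryName': ['Alpha Brewing', 'Beta Brewing'],
                     'Zip': ['68516', '02134']})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.csv')
    demo.to_csv(path, index=False)
    # Without dtype, pandas would parse Zip as an integer and drop leading zeros.
    restored = pd.read_csv(path, dtype={'Zip': str})

print(restored['Zip'].tolist())  # ['68516', '02134']
```

Nebraska ZIP codes do not start with zero, so this matters mainly if the same code is reused for other states.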