### Project 1
#### Summer 2021
**Authors:** GOAT Team (Estaban Aramayo, Ethan Haley, Claire Meyer, and Tyler Frankenburg)

The [Cocktail DB](https://www.thecocktaildb.com/api.php) is a database of cocktails and ingredients. In this assignment, we describe how the Cocktail DB's API can be used to generate a network of cocktails and ingredients. We then use example data from the API to explore how centrality metrics might help predict outcomes from this network.

##### Loading the Data

Without digging too deeply into the intricacies of the Cocktail DB API, we can use [this code](https://holypython.com/api-12-cocktail-database/) as a starting point for grabbing some example output from the API. It relies on two libraries: `requests` to make the API request, and `json` to parse the JSON response. We can then iterate over each cocktail in the response to grab the relevant fields.

We can build a search query to pull all cocktail names (`strDrink`), the ingredients for each, and the drink categories (`strCategory`) for each cocktail.

First, we fetch all drinks, one request per first letter of the drink name.

In [61]:
import networkx as net
import requests
import json
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")

In [44]:
baseUrl = "https://www.thecocktaildb.com/api/json/v1/1/"
letterEndpoint = "search.php?f="

def searchLetter(letter):
    data = requests.get(baseUrl + letterEndpoint + letter)
    return json.loads(data.text)

In [45]:
# Build a dataframe
name = []  # drink name
ids = []   # drink ID
cat = []   # drink category
pic = []   # thumbnail url
ingr = []  # ingredients

In [46]:
# helper function to parse ingredients
def ingreds(drinkDict):
    ing = []
    for i in range(1,16):  # covers strIngredient1..strIngredient15, most of them empty/None
        s = "strIngredient" + str(i)
        if not drinkDict[s]:  # use the function argument, not a global loop variable
            break
        ing.append(drinkDict[s])
    return ing
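
To illustrate what the helper above does, here is a minimal sketch run against a hand-made record. The `sample_drink` dictionary is hypothetical (it is not real API output, only shaped like one entry of the `drinks` list), and `parse_ingredients` is an illustrative stand-in for `ingreds`.

```python
# Hypothetical record shaped like one entry of the API's 'drinks' list.
sample_drink = {"strDrink": "Example Drink"}
sample_drink.update({f"strIngredient{i}": None for i in range(1, 16)})
sample_drink["strIngredient1"] = "Gin"
sample_drink["strIngredient2"] = "Tonic water"


def parse_ingredients(drink_dict):
    """Collect non-empty strIngredient1..strIngredient15 values, in order."""
    found = []
    for i in range(1, 16):
        value = drink_dict.get(f"strIngredient{i}")
        if not value:  # None or an empty string ends the ingredient list
            break
        found.append(value)
    return found


print(parse_ingredients(sample_drink))
```

Using `.get` here is a design choice for the sketch: a record that lacks a field is treated the same way as one where the field is empty.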

In [47]:
for l in 'abcdefghijklmnopqrstuvwxyz':
    
    drinks = searchLetter(l)['drinks']
    if not drinks: continue   #(some letters have no drinks)
    for d in drinks:
        name.append(d['strDrink'])
        ids.append(d['idDrink'])
        cat.append(d['strCategory'])
        pic.append(d['strDrinkThumb'])
        ingr.append(ingreds(d))

In [48]:
drinkDF = pd.DataFrame({'name': name,
                       'id': ids,
                       'category': cat,
                       'photoURL': pic,
                       'ingredients': ingr})
drinkDF

|  | name | id | category | photoURL | ingredients |
|---|---|---|---|---|---|
| 0 | A1 | 17222 | Cocktail | https://www.thecocktaildb.com/images/media/dri... | [Gin, Grand Marnier, Lemon Juice, Grenadine] |
| 1 | ABC | 13501 | Shot | https://www.thecocktaildb.com/images/media/dri... | [Amaretto, Baileys irish cream, Cognac] |
| 2 | Ace | 17225 | Cocktail | https://www.thecocktaildb.com/images/media/dri... | [Gin, Grenadine, Heavy cream, Milk, Egg White] |
| 3 | Adam | 17837 | Ordinary Drink | https://www.thecocktaildb.com/images/media/dri... | [Dark rum, Lemon juice, Grenadine] |
| 4 | AT&T | 13938 | Ordinary Drink | https://www.thecocktaildb.com/images/media/dri... | [Absolut Vodka, Gin, Tonic water] |
| ... | ... | ... | ... | ... | ... |
| 412 | Zima Blaster | 17027 | Ordinary Drink | https://www.thecocktaildb.com/images/media/dri... | [Zima, Chambord raspberry liqueur] |
| 413 | Zizi Coin-coin | 14594 | Punch / Party Drink | https://www.thecocktaildb.com/images/media/dri... | [Cointreau, Lemon juice, Ice, Lemon] |
| 414 | Zippy's Revenge | 14065 | Cocktail | https://www.thecocktaildb.com/images/media/dri... | [Amaretto, Rum, Kool-Aid] |
| 415 | Zimadori Zinger | 15801 | Punch / Party Drink | https://www.thecocktaildb.com/images/media/dri... | [Midori melon liqueur, Zima] |


Save and reload the dataframe so that we don't have to make 26 API calls every time the notebook opens.

In [49]:
#drinkDF.to_csv('drinkDF.csv')
drinkDF = pd.read_csv('drinkDF.csv', index_col=0)

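One more step is needed after loading, because pandas writes each list to the CSV as its string representation. Without this step, iterating over an `ingredients` entry yields single characters instead of ingredient names.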

In [None]:
import ast
drinkDF['ingredients'] = drinkDF.ingredients.apply(ast.literal_eval)
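
The round trip described above can be sketched on a toy frame. The rows below are made up for illustration, and an in-memory buffer stands in for the `drinkDF.csv` file.

```python
import ast
import io

import pandas as pd

# Toy frame standing in for drinkDF (rows are invented for illustration).
toy = pd.DataFrame({"name": ["Drink A", "Drink B"],
                    "ingredients": [["Gin", "Lime"], ["Rum"]]})

# Round-trip through CSV in memory instead of writing a file.
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)
loaded = pd.read_csv(buffer, index_col=0)

# After loading, each cell is a string such as "['Gin', 'Lime']".
raw_cell = loaded.loc[0, "ingredients"]
chars = set(raw_cell)  # iterating a string yields single characters

# ast.literal_eval turns the string back into a real list.
loaded["ingredients"] = loaded["ingredients"].apply(ast.literal_eval)
parsed_cell = loaded.loc[0, "ingredients"]

print(type(raw_cell).__name__, type(parsed_cell).__name__)
```

If the parsing step is skipped, any loop of the form `for ing in ingredients` walks over characters such as `[`, `'` and `,`, and those characters end up as graph nodes.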

##### Make a bipartite graph, with cocktails as one type and ingredients as the other.

In [50]:
from networkx.algorithms import bipartite

name = drinkDF.name.values
cat = drinkDF.category.values
ingr = drinkDF.ingredients.values

drinks = set(name)
ingreds = set(i for iList in ingr for i in iList)

B = net.Graph()
B.add_nodes_from(drinks, bipartite='Cocktail')
B.add_nodes_from(ingreds, bipartite='Ingredient')

for d in range(len(drinkDF)):
    B.add_node(name[d], category=cat[d])
    for ing in ingr[d]:
        B.add_edge(name[d], ing)

The two bipartite projection graphs (drinks linked by shared ingredients, and ingredients linked by shared drinks) are then:

In [51]:
D = bipartite.weighted_projected_graph(B, drinks)
I = bipartite.weighted_projected_graph(B, ingreds)
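
As a rough illustration of what the weighted projection produces, the sketch below builds a tiny bipartite graph from three made-up recipes (the drink and ingredient names are hypothetical) and projects it onto the drink nodes. In the projection, the edge weight between two drinks is the number of ingredients they share.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Made-up drinks and ingredients, for illustration only.
recipes = {
    "Drink A": ["Gin", "Lime"],
    "Drink B": ["Gin", "Tonic"],
    "Drink C": ["Rum", "Lime"],
}

toy_B = nx.Graph()
toy_B.add_nodes_from(recipes, bipartite="Cocktail")
for drink, ingredients in recipes.items():
    for ingredient in ingredients:
        toy_B.add_node(ingredient, bipartite="Ingredient")
        toy_B.add_edge(drink, ingredient)

# Project onto the drink nodes: weight = number of shared ingredients.
toy_D = bipartite.weighted_projected_graph(toy_B, list(recipes))

for u, v, w in toy_D.edges(data="weight"):
    print(u, v, w)
```

Here "Drink A" is linked to both other drinks with weight 1, while "Drink B" and "Drink C" share no ingredient and so have no edge between them.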

In [52]:
D.nodes()

['Orange Crush',
 'Thai Coffee',
 'Coffee Liqueur',
 'John Collins',
 'Mai Tai',
 'Porto flip',
 'Rum Toddy',
 'Affair',
 'French 75',
 'Pink Penocha',
 "Planter's Punch",
 'Lassi - Sweet',
 'Havana Cocktail',
 'Afternoon',
 'Orgasm',
 'Grand Blue',
 'Nuked Hot Chocolate',
 "Mother's Milk",
 'Texas Rattlesnake',
 'Zombie',
 'Gin Sour',
 'Algonquin',
 "Empellón Cocina's Fat-Washed Mezcal",
 'Orange Scented Hot Chocolate',
 'Balmoral',
 'Jam Donut',
 'Flying Dutchman',
 'Gimlet',
 'Kool-Aid Slammer',
 'Gagliardo',
 'Sidecar',
 'Tipperary',
 'Lazy Coconut Paloma',
 'Boxcar',
 'Zoksel',
 'Cafe Savoy',
 'Vodka And Tonic',
 'Lassi Khara',
 'Dragonfly',
 'Gin Squirt',
 'Tequila Sour',
 'Lassi - Mango',
 'Vampiro',
 'Shot-gun',
 'Frozen Pineapple Daiquiri',
 'Grizzly Bear',
 'Sweet Tooth',
 'Gin Daisy',
 'Bible Belt',
 'Irish Cream',
 'Zinger',
 'Abilene',
 'Jack Rose Cocktail',
 'Orange Rosemary Collins',
 'Lone Tree Cooler',
 'Harvey Wallbanger',
 'City Slicker',
 'Queen Elizabeth',
 'Penici

#### Calculate Centrality by Category

From the project doc: For each of the nodes in the dataset, calculate degree centrality and eigenvector centrality.
Compare your centrality measures across your categorical groups.
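
Before applying the two measures to the cocktail data, a small sketch on a made-up graph may help show what they report. The node names are hypothetical, and the sketch uses `nx.eigenvector_centrality` (power iteration) rather than the `eigenvector_centrality_numpy` variant used in the cells below, purely to keep the example dependency-light.

```python
import networkx as nx

# Small made-up graph: two drinks that share one ingredient.
G = nx.Graph()
G.add_edges_from([
    ("Drink A", "Gin"),
    ("Drink A", "Lime"),
    ("Drink B", "Gin"),
])

# Degree centrality: a node's degree divided by (number of nodes - 1).
degree = nx.degree_centrality(G)

# Eigenvector centrality: a node scores highly when its neighbors do.
eigen = nx.eigenvector_centrality(G, max_iter=1000)

for node in G:
    print(f"{node:8s} degree={degree[node]:.3f} eigenvector={eigen[node]:.3f}")
```

In this toy graph "Drink A" and "Gin" each have two neighbors out of three possible, so their degree centrality is 2/3, and they also receive the highest eigenvector scores.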

In [53]:
drinks_cocktails = drinkDF[(drinkDF['category']=="Cocktail")]
drinks_cocktails.head()

|  | name | id | category | photoURL | ingredients |
|---|---|---|---|---|---|
| 0 | A1 | 17222 | Cocktail | https://www.thecocktaildb.com/images/media/dri... | ['Gin', 'Grand Marnier', 'Lemon Juice', 'Grena... |
| 2 | Ace | 17225 | Cocktail | https://www.thecocktaildb.com/images/media/dri... | ['Gin', 'Grenadine', 'Heavy cream', 'Milk', 'E... |
| 12 | Addison | 17228 | Cocktail | https://www.thecocktaildb.com/images/media/dri... | ['Gin', 'Vermouth'] |
| 16 | Aviation | 17180 | Cocktail | https://www.thecocktaildb.com/images/media/dri... | ['Gin', 'lemon juice', 'maraschino liqueur'] |
| 21 | Afterglow | 12560 | Cocktail | https://www.thecocktaildb.com/images/media/dri... | ['Grenadine', 'Orange juice', 'Pineapple juice'] |


In [54]:
drinks_shots = drinkDF[(drinkDF['category']=="Shot")]
drinks_shots.head()

|  | name | id | category | photoURL | ingredients |
|---|---|---|---|---|---|
| 1 | ABC | 13501 | Shot | https://www.thecocktaildb.com/images/media/dri... | ['Amaretto', 'Baileys irish cream', 'Cognac'] |
| 5 | ACID | 14610 | Shot | https://www.thecocktaildb.com/images/media/dri... | ['151 proof rum', 'Wild Turkey'] |
| 25 | B-53 | 13332 | Shot | https://www.thecocktaildb.com/images/media/dri... | ['Kahlua', 'Sambuca', 'Grand Marnier'] |
| 26 | B-52 | 15853 | Shot | https://www.thecocktaildb.com/images/media/dri... | ['Baileys irish cream', 'Grand Marnier', 'Kahl... |
| 29 | Big Red | 13222 | Shot | https://www.thecocktaildb.com/images/media/dri... | ['Irish cream', 'Goldschlager'] |


In [55]:
drinks_c = set(drinks_cocktails.name.values)
ingreds_c = set(i for iList in drinks_cocktails.ingredients.values for i in iList)

dc_graph = net.Graph()

dc_graph.add_nodes_from(drinks_c, bipartite='Cocktail')
dc_graph.add_nodes_from(ingreds_c,bipartite='Ingredient')

for d in drinks_cocktails.index:
    dc_graph.add_node(name[d], category=cat[d])
    for ing in ingr[d]:
        dc_graph.add_edge(name[d], ing)

In [56]:
cocktail_degree = net.degree_centrality(dc_graph)
cocktail_degree = pd.DataFrame.from_dict(cocktail_degree, orient='index').reset_index()

In [57]:
drinks_s = set(drinks_shots.name.values)
ingreds_s = set(i for iList in drinks_shots.ingredients.values for i in iList)

ds_graph = net.Graph()

ds_graph.add_nodes_from(drinks_s, bipartite='Cocktail')
ds_graph.add_nodes_from(ingreds_s,bipartite='Ingredient')

for d in drinks_shots.index:
    ds_graph.add_node(name[d], category=cat[d])
    for ing in ingr[d]:
        ds_graph.add_edge(name[d], ing)

In [58]:
shots_degree = net.degree_centrality(ds_graph)
shots_degree = pd.DataFrame.from_dict(shots_degree, orient='index').reset_index()

In [59]:
shots_degree.head()

|  | index | 0 |
|---|---|---|
| 0 | Zipperhead | 0.298851 |
| 1 | Kool-Aid Shot | 0.333333 |
| 2 | Orange Crush | 0.264368 |
| 3 | Freddy Kruger | 0.264368 |
| 4 | Turkeyball | 0.287356 |


In [60]:
cocktail_eig = net.eigenvector_centrality_numpy(dc_graph)
cocktail_eig = pd.DataFrame.from_dict(cocktail_eig, orient='index').reset_index()

In [61]:
shots_eig = net.eigenvector_centrality_numpy(ds_graph)
shots_eig = pd.DataFrame.from_dict(shots_eig, orient='index').reset_index()

##### Compare Centrality Measures across Categories
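We can then create summary dataframes holding both centrality measures, and sort by each to see the top 5 cocktails and shots by each measure (the ordinary drink category is still to be added). Note that the top entries in the sorted outputs below are punctuation characters and a blank rather than drink or ingredient names: these graphs were built while the `ingredients` column still held unparsed strings, so each character of the string became a node. Rerunning after the `ast.literal_eval` step should replace them with real ingredient names.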
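
One possible way to build such a summary with named columns from the start, and to restrict the comparison to drink nodes so that ingredient nodes do not enter the averages, is sketched below. The scores and node names are invented for illustration, and filtering to drink nodes is an assumption about what the comparison should cover, not something the cells below do.

```python
import pandas as pd

# Made-up centrality scores, for illustration only.
toy_degree = {"Drink A": 0.50, "Drink B": 0.25, "Gin": 0.75}
toy_eigen = {"Drink A": 0.40, "Drink B": 0.20, "Gin": 0.60}

# Build the summary with descriptive column names directly.
summary = pd.DataFrame({
    "Degree Centrality": pd.Series(toy_degree),
    "Eigenvector Centrality": pd.Series(toy_eigen),
}).rename_axis("Name").reset_index()

# Keep only drink nodes so ingredient nodes do not skew the comparison.
drink_names = {"Drink A", "Drink B"}
drinks_only = summary[summary["Name"].isin(drink_names)]

top = drinks_only.sort_values("Degree Centrality", ascending=False)
print(top)
print(drinks_only["Degree Centrality"].mean())
```

Building the frame from two named `Series` avoids the `0_x` / `0_y` column names that a merge on unnamed columns produces.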

In [62]:
summary_shots = pd.merge(shots_degree, shots_eig, how = "inner", on = "index")
summary_shots = summary_shots.rename(columns = 
        {"index":"Name","0_x":"Degree Centrality","0_y":"Eigenvector Centrality"})
summary_shots.head()

|  | Name | Degree Centrality | Eigenvector Centrality |
|---|---|---|---|
| 0 | Zipperhead | 0.298851 | 0.139237 |
| 1 | Kool-Aid Shot | 0.333333 | 0.150909 |
| 2 | Orange Crush | 0.264368 | 0.126544 |
| 3 | Freddy Kruger | 0.264368 | 0.124334 |
| 4 | Turkeyball | 0.287356 | 0.134488 |


In [63]:
summary_cocktails = pd.merge(cocktail_degree, cocktail_eig, how = "inner", on = "index")
summary_cocktails = summary_cocktails.rename(columns = 
        {"index":"Name","0_x":"Degree Centrality","0_y":"Eigenvector Centrality"})
summary_cocktails.head()

|  | Name | Degree Centrality | Eigenvector Centrality |
|---|---|---|---|
| 0 | Hot Toddy | 0.141844 | 0.063169 |
| 1 | Tequila Sunrise | 0.141844 | 0.065833 |
| 2 | Bora Bora | 0.170213 | 0.082281 |
| 3 | Manhattan | 0.212766 | 0.088857 |
| 4 | Martini | 0.148936 | 0.06875 |


In [64]:
summary_cocktails.sort_values(by=['Degree Centrality'],ascending=False).head()

|  | Name | Degree Centrality | Eigenvector Centrality |
|---|---|---|---|
| 129 | , | 0.602837 | 0.172508 |
| 137 | ' | 0.602837 | 0.172508 |
| 98 | [ | 0.602837 | 0.172508 |
| 130 |  | 0.602837 | 0.172508 |
| 111 | ] | 0.602837 | 0.172508 |


In [65]:
summary_cocktails.sort_values(by=['Eigenvector Centrality'],ascending=False).head()

|  | Name | Degree Centrality | Eigenvector Centrality |
|---|---|---|---|
| 129 | , | 0.602837 | 0.172508 |
| 130 |  | 0.602837 | 0.172508 |
| 98 | [ | 0.602837 | 0.172508 |
| 111 | ] | 0.602837 | 0.172508 |
| 137 | ' | 0.602837 | 0.172508 |


In [66]:
summary_shots.sort_values(by=['Degree Centrality'],ascending=False).head()

|  | Name | Degree Centrality | Eigenvector Centrality |
|---|---|---|---|
| 77 |  | 0.390805 | 0.180026 |
| 76 | , | 0.390805 | 0.180026 |
| 62 | ] | 0.390805 | 0.180026 |
| 49 | [ | 0.390805 | 0.180026 |
| 37 | ' | 0.390805 | 0.180026 |


In [67]:
summary_shots.sort_values(by=['Eigenvector Centrality'],ascending=False).head()

|  | Name | Degree Centrality | Eigenvector Centrality |
|---|---|---|---|
| 76 | , | 0.390805 | 0.180026 |
| 49 | [ | 0.390805 | 0.180026 |
| 37 | ' | 0.390805 | 0.180026 |
| 62 | ] | 0.390805 | 0.180026 |
| 77 |  | 0.390805 | 0.180026 |


In [30]:
cocktail_degree_mean = summary_cocktails["Degree Centrality"].mean()
cocktail_eig_mean = summary_cocktails["Eigenvector Centrality"].mean()
cocktail_degree_median = summary_cocktails["Degree Centrality"].median()
cocktail_eig_median = summary_cocktails["Eigenvector Centrality"].median()

In [31]:
shot_degree_mean = summary_shots["Degree Centrality"].mean()
shot_eig_mean = summary_shots["Eigenvector Centrality"].mean()
shot_degree_median = summary_shots["Degree Centrality"].median()
shot_eig_median = summary_shots["Eigenvector Centrality"].median()

In [32]:
print("cocktail_degree_mean: ",cocktail_degree_mean)
print("cocktail_eig_mean: ",cocktail_eig_mean)
print("cocktail_degree_median: ",cocktail_degree_median)
print("cocktail_eig_median: ",cocktail_eig_median)
print("shot_degree_mean: ",shot_degree_mean)
print("shot_eig_mean: ",shot_eig_mean)
print("shot_degree_median: ",shot_degree_median)
print("shot_eig_median: ",shot_eig_median)

cocktail_degree_mean:  0.1989811207671561
cocktail_eig_mean:  0.07456404129425308
cocktail_degree_median:  0.1702127659574468
cocktail_eig_median:  0.07674675467362752
shot_degree_mean:  0.1948798328108673
shot_eig_mean:  0.09220281262450505
shot_degree_median:  0.21839080459770116
shot_eig_median:  0.1057503262300857


In [33]:
# to add - ordinary drink category, clean up, and interpretation

{'C', 'Kool-Aid Shot', 'Orange Crush', 'w', 'ä', 'r', 'Royal Bitch', 'Dirty Nipple', 'K', "Mother's Milk", 'Texas Rattlesnake', 'I', 'A', 'Big Red', 'm', 'p', 'Jam Donut', 'l', 'Kool-Aid Slammer', 'B-53', 'n', 't', 'u', 'Kool First Aid', '7', 'h', 'e', 'O', 'y', 'Shot-gun', 'R', 'Y', 'T', 'P', '[', 'i', 'Royal Flush', 'Bubble Gum', 'j', 'Red Snapper', '-', 'Chocolate Milk', 'M', 'J', 'Quick F**K', 'k', 'Bumble Bee', ' ', 'S', 'Tequila Surprise', 'Jelly Bean', 'q', 'Lemon Shot', 'W', 'H', 'o', 'Freddy Kruger', 'Bob Marley', 'Moranguito', 'g', '.', 'V', 'f', 'Tequila Slammer', 'G', 's', 'Jello shots', 'L', 'd', 'B', 'Zipperhead', 'Turkeyball', 'Fahrenheit 5000', 'ABC', 'c', 'a', 'D', 'B-52', '5', ']', 'F', 'Flaming Dr. Pepper', 'ACID', '1', 'b', ',', "'", 'Damned if you do'}
