# Capstone Proposals:
    
As Americans, we have an enormous variety of foods available at grocery stores, corner shops, and, increasingly, online. For one project, I'd like to look at the fundamental ingredients and nutrients that make up this variety. I'd hope to predict the number of nutrients found in certain food items, and to look holistically at the ingredients most common in our foods. I would then relate this to the general dietary habits of the US population to get a granular picture of the nutrient composition of those diets. The stretch goal would be to predict changes in nutrient use as the vegan population grows, estimating the increase in required nutrients relative to 'normal' diets. I'm unsure whether this would be kosher, since I obviously can't check predictions about the future against real outcomes. I could, however, see whether I can predict correlations using historical data on the growth of the vegan diet in the US, and then use that model to predict the effect of future growth of vegan diets.
       
*** (I'm still pondering the best way to go about this, but at least you can see where I'm hoping to take this) ***
    
'Nutrient Database from 2012' - https://catalog.data.gov/dataset/usda-national-nutrient-database-for-standard-reference
'Nutrient Database from 2009' - https://catalog.data.gov/dataset/usda-national-nutrient-database-for-standard-reference-release-22
    
Partially pre-cleaned nutrient data - https://github.com/mhess126/usda_national_nutrients
USDA API (not sure if this could be useful yet?) - https://ndb.nal.usda.gov/ndb/api/doc
    
    
I'm still looking for better data to work with that could give me some idea of dietary habits, but here's where I've been looking - https://catalog.data.gov/dataset?q=bureauCode:%22005:13%22 ; https://catalog.data.gov/dataset?q=usda+consumption+national+nutrient&sort=views_recent+desc&ext_location=&ext_bbox=&ext_prev_extent=-142.03125%2C2.4601811810210052%2C-59.0625%2C58.63121664342478
    
    
Alternatively, I could look at pricing these ingredients based on the food types we find them in. Take a chili sauce, for example: look at its unique ingredients and each one's share of the sauce. If black beans make up 20% of the chili and the chili costs 5 dollars per unit, then the black bean ingredient would be priced at 1 dollar (20% of $5).

From there, I would want to observe how expensive these ingredients can get and how their prices change depending on what products they may be found in. 
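The pricing arithmetic above can be sketched as a proportional allocation, plus a min/max per ingredient across products for the follow-up question about how prices vary by product. This assumes a product's price splits linearly by ingredient share, which ignores that ingredients differ in cost per unit; the function names and the burrito numbers are purely illustrative.

```python
def allocate_price(product_price, ingredient_shares):
    """Split a product's unit price across its ingredients in proportion to their shares.

    ingredient_shares maps ingredient name -> fraction of the product (0 to 1).
    If the shares sum to less than 1, the remainder of the price is left unallocated.
    """
    if sum(ingredient_shares.values()) > 1.0 + 1e-9:
        raise ValueError("ingredient shares sum to more than 100%")
    return {name: product_price * share for name, share in ingredient_shares.items()}


def price_range_by_ingredient(products):
    """For {product: (unit_price, shares)}, return {ingredient: (min_price, max_price)}."""
    ranges = {}
    for unit_price, shares in products.values():
        for name, price in allocate_price(unit_price, shares).items():
            low, high = ranges.get(name, (price, price))
            ranges[name] = (min(low, price), max(high, price))
    return ranges


# Made-up example: black beans are 20% of a $5 chili and 25% of an $8 burrito.
chili = allocate_price(5.00, {"black beans": 0.20})
print(round(chili["black beans"], 2))  # 1.0
print(price_range_by_ingredient({
    "chili": (5.00, {"black beans": 0.20}),
    "burrito": (8.00, {"black beans": 0.25}),
}))
```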
    
'Food Price Outlook, current' - https://catalog.data.gov/dataset/food-price-outlook
    

In [1]:
# pip install scrapy
# pip install --upgrade zope2

import foursquare
import json
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import random
import requests
from scrapy import Selector
from scrapy.http import HtmlResponse
import seaborn as sns
import time
import unicodedata

In [2]:
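import os

# Read the API credentials from environment variables instead of hard-coding them in the notebook
# (the variable names are my own choice)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
client = foursquare.Foursquare(client_id=CLIENT_ID, client_secret=CLIENT_SECRET)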

In [3]:
#Let's define a bounding box whose corners enclose roughly all of SF (about 7 mi north-south x 9 mi east-west)
ne = {'ne_lat': 37.8127675576, 'ne_lng': -122.3550796509}
sw = {'sw_lat': 37.7078622611, 'sw_lng': -122.5164413452}

# east/west (longitude)
lng_bounds = [ne['ne_lng'], sw['sw_lng']]
print lng_bounds[0]
# north/south (latitude)
lat_bounds = [ne['ne_lat'], sw['sw_lat']]
print lat_bounds

# increment of 0.017 degrees: ~1.2 mi of latitude and ~0.9 mi of longitude at SF's latitude
increment = 0.017

# The gridding below moves north, starting from the SW corner, then steps one increment east,
# and repeats the process until stopping at the NE corner. Each pair is stored as [lng, lat], so the
# API loops below (`for x, y in grid_pairs`) pass 'll' as (y, x), i.e. latitude first.
grid_pairs = []
for lng in np.arange(lng_bounds[1], lng_bounds[0], increment):
    for lat in np.arange(lat_bounds[1], lat_bounds[0], increment):
        grid_pairs.append([lng, lat])
        
print len(grid_pairs)

-122.355079651
[37.8127675576, 37.7078622611]
70
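A quick check on the grid's geometry, as a sketch assuming a flat-earth approximation (about 69 miles per degree of latitude and 69 x cos(latitude) miles per degree of longitude), which is close enough at city scale. It reuses the corner values above and reproduces the 70 grid points, using the fact that `np.arange(start, stop, step)` yields ceil((stop - start) / step) values.

```python
import math

# Corner coordinates from the cell above (latitude ~37.7-37.8, longitude ~-122.4).
NE_LAT, NE_LNG = 37.8127675576, -122.3550796509
SW_LAT, SW_LNG = 37.7078622611, -122.5164413452
STEP_DEG = 0.017

# Assumption: equirectangular approximation, fine over a few miles.
MILES_PER_DEG_LAT = 69.0


def miles_per_deg_lng(lat_deg):
    """Miles spanned by one degree of longitude at the given latitude."""
    return MILES_PER_DEG_LAT * math.cos(math.radians(lat_deg))


def arange_len(start, stop, step):
    """How many values np.arange(start, stop, step) produces for a positive step."""
    return max(0, math.ceil((stop - start) / step))


mid_lat = (NE_LAT + SW_LAT) / 2
box_ns_miles = (NE_LAT - SW_LAT) * MILES_PER_DEG_LAT
box_ew_miles = (NE_LNG - SW_LNG) * miles_per_deg_lng(mid_lat)
step_ns_miles = STEP_DEG * MILES_PER_DEG_LAT
step_ew_miles = STEP_DEG * miles_per_deg_lng(mid_lat)
n_points = arange_len(SW_LNG, NE_LNG, STEP_DEG) * arange_len(SW_LAT, NE_LAT, STEP_DEG)

print("box: %.1f mi N-S x %.1f mi E-W" % (box_ns_miles, box_ew_miles))
print("step: %.2f mi N-S, %.2f mi E-W" % (step_ns_miles, step_ew_miles))
print("grid points:", n_points)  # 70, matching len(grid_pairs) above
```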


In [14]:
print len(df_ready_rows)

43


In [None]:
# NOTES:
# Should I make the loop above this cell more efficient by checking and storing which URLs don't work?
# That way I could exclude them the next time I run my loops. But it's probably better to keep track of them,
# because while some won't even be eateries, many will be eateries that simply don't have Foursquare menus.

# In such cases, knowing the venue information could still be valuable, because we can surface those venues to
# users who wish to add items manually (possibly by taking pictures of the menu where the item is listed)?
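Following up on the notes above: a sketch of a scraping loop that records which URLs fail, and why, instead of discarding them with `except: pass`, and that pauses between requests. `scrape_with_log` and its one-second default delay are my own choices (a suitable delay for Foursquare's pages is an assumption); `parse` would be `parse_url` from the cell below. Written for Python 3.

```python
import time


def scrape_with_log(urls, parse, delay_seconds=1.0, sleep=time.sleep):
    """Call parse(url) for each URL, pausing between requests.

    Returns (succeeded, failed), where failed maps url -> "ErrorType: message",
    so venues without a menu can be kept for later (e.g. manual entry)
    instead of being silently dropped.
    """
    succeeded, failed = [], {}
    for i, url in enumerate(urls):
        if i and delay_seconds:
            sleep(delay_seconds)  # no pause before the first request
        try:
            parse(url)
        except Exception as exc:  # still broad, but the failure is recorded
            failed[url] = "%s: %s" % (type(exc).__name__, exc)
        else:
            succeeded.append(url)
    return succeeded, failed
```

In this notebook, `ok, failed = scrape_with_log(unique_venues_from_explore[:30], parse_url)` would leave the no-menu venues in `failed` for the manual-entry idea; the `sleep` parameter exists only so the delay can be swapped out in tests.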

In [5]:
# Works with the Explore Endpoints
unique_venues_from_explore = []
unique_venue_names_from_explore = []
start_time = time.time()
for x, y in grid_pairs:
    try:
        # 'radius' is in meters around the given 'll': 3000 m is about 1.9 miles (800 m would be ~0.5 miles),
        # but I may adjust the radius going forward.
        # NOTE: with sortByDistance, 'offset': '50' skips the 50 nearest results; '0' would start from the first page.
        explore = client.venues.explore(params={'ll': '%.2f, %.2f' % (y, x), 'llAcc':'100.0','radius': '3000',
                                               'section': 'food','limit':'50','offset':'50','sortByDistance':'1'})
        for item in explore['groups'][0]['items']:
            venue = item['venue']
            # Keep only venues whose menu is hosted on foursquare.com; venues without a menu have no 'menu' key
            menu_url = venue.get('menu', {}).get('url')
            if menu_url and 'foursquare.com' in menu_url:
                # Append the URL and name together so the two lists stay aligned
                unique_venues_from_explore.append(menu_url)
                unique_venue_names_from_explore.append(venue.get('name', 'name_not_available'))
        print("--- %s seconds ---" % (time.time() - start_time))
    except Exception:
        # e.g. a failed API call or a response without 'groups': skip this grid point
        pass

--- 0.548406124115 seconds ---
--- 0.796556949615 seconds ---
--- 1.04234695435 seconds ---
--- 1.28093504906 seconds ---
--- 1.53449201584 seconds ---
--- 1.72589015961 seconds ---
--- 2.25262594223 seconds ---
--- 2.71405816078 seconds ---
--- 2.92296695709 seconds ---
--- 3.83194899559 seconds ---
--- 5.04638600349 seconds ---
--- 5.55070400238 seconds ---
--- 6.0943801403 seconds ---
--- 6.31870102882 seconds ---
--- 6.53303909302 seconds ---
--- 7.23715806007 seconds ---
--- 7.76065611839 seconds ---
--- 8.36559414864 seconds ---
--- 9.02214312553 seconds ---
--- 9.74282217026 seconds ---
--- 9.96415996552 seconds ---
--- 10.2667839527 seconds ---
--- 11.6931929588 seconds ---
--- 13.0062270164 seconds ---
--- 13.9695091248 seconds ---
--- 14.7413480282 seconds ---
--- 15.5843451023 seconds ---
--- 15.9563419819 seconds ---
--- 16.22458601 seconds ---
--- 16.6350030899 seconds ---
--- 17.3421201706 seconds ---
--- 18.0700571537 seconds ---
--- 18.8109660149 seconds ---
--- 19.4613
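The menu-link extraction from the Explore loop can also be pulled out into a standalone function, which makes it testable without calling the API. A sketch assuming the response layout used above (`groups[0]['items'][i]['venue']`, with an optional `menu` -> `url`); it reads optional keys with `dict.get`, so venues without menus are skipped without relying on exceptions. The function name and the sample response are made up for illustration.

```python
def menu_links_from_explore(explore_response):
    """Return (menu_url, venue_name) pairs from one venues/explore response.

    Only the first group is read, as in the loop above, and only menu URLs
    hosted on foursquare.com are kept.
    """
    links = []
    groups = explore_response.get("groups") or []
    items = groups[0].get("items", []) if groups else []
    for item in items:
        venue = item.get("venue", {})
        url = (venue.get("menu") or {}).get("url")
        name = venue.get("name")
        if url and name and "foursquare.com" in url:
            links.append((url, name))
    return links


# Made-up response for illustration only.
sample = {"groups": [{"items": [
    {"venue": {"name": "A", "menu": {"url": "https://foursquare.com/v/a/1/menu"}}},
    {"venue": {"name": "B"}},                                                # no menu
    {"venue": {"name": "C", "menu": {"url": "https://example.com/menu"}}},  # not on foursquare.com
]}]}
print(menu_links_from_explore(sample))
```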

In [6]:
print len(unique_venue_names_from_explore), len(unique_venues_from_explore)

1362 1362
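Despite its name, `unique_venues_from_explore` is never de-duplicated. With a 3000 m radius around grid points roughly a mile apart, the search circles overlap heavily, so the same menu URL likely appears several times among these 1362 (the search-endpoint run later in the notebook drops from 358 to 280 after de-duplication). A sketch of an order-preserving de-duplication that keeps the URL and name lists aligned; `dedupe_pairs` is a hypothetical helper name.

```python
def dedupe_pairs(urls, names):
    """Drop repeated URLs, keeping URL/name pairs aligned and in first-seen order."""
    seen = set()
    kept_urls, kept_names = [], []
    for url, name in zip(urls, names):
        if url not in seen:
            seen.add(url)
            kept_urls.append(url)
            kept_names.append(name)
    return kept_urls, kept_names
```

Unlike `list(set(...))`, this keeps the original (distance-sorted) order and does not separate a URL from its venue name.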


In [4]:
# Now that I have plenty of urls, let's plug them into a scraper so I can populate my df. Here are the headers
# I'm looking to get as well...

column_headers = ['venue_name', 'venue_desc_list', 'venue_menu_url', 'venue_rated', 'meta_menu_n', 'depth_menus_n',
                  'menu_item_name', 'menu_item_price', 'menu_item_desc']

df_ready_rows = []
def parse_url(url, data=False):

    response               =  requests.get(url)
    
    #Steps:
    #1) get the unicode objects
    #2) change objects from unicode to string
    
    venue_name_uni         = Selector(text=response.text).xpath('//h1[@class="venueName"]/text()').extract()
    venue_name             = unicodedata.normalize('NFKD', venue_name_uni[0]).encode('ascii','ignore')
    
    # There can be multiple category descriptors, so I'll store them all in a list
    # to capture the entire description:
    venue_desc_uni         = Selector(text=response.text).xpath('//span[@class="unlinkedCategory"]/text()').extract()
    venue_desc_list        = [unicodedata.normalize('NFKD', phrase).encode('ascii', 'ignore')
                              for phrase in venue_desc_uni]

    #The url too, right?
    venue_menu_url         = url

    # Grab the venue rating. The score span carries a positive/neutral/negative class, so try each one;
    # I'm only keeping the 1-10 number itself.
    venueScore_options     = ['positive','neutral','negative']
    venue_rated = None
    for vs_option in venueScore_options:
        venue_rating_uni = Selector(text=response.text).xpath('//div[@class="venueRateBlock  "]/'
                                                               'span[@class="venueScore ' + vs_option + '"]/span/text()'
                                                               ).extract()
        if venue_rating_uni:
            try:
                venue_rated = float(unicodedata.normalize('NFKD', venue_rating_uni[0]).encode('ascii', 'ignore'))
            except ValueError:
                pass

    # Even if there is no rating, I'd still like to keep track of that. Test for None rather than
    # comparing to np.nan: NaN never compares equal to anything, so `== np.nan` is always False.
    if venue_rated is None:
        venue_rated = 'rating_not_available'
    
    # NOTE: do I also need to account for menus without titles? In that case meta_menu_list could
    # come back empty. If so, perhaps wrap the title lookup in a try/except and still grab the menu items,
    # or insert a "null title" for meta_menu_n to get around this.
    # I no longer think this is an issue, but it may be worth an appendix note later.
    
    meta_menu_list      =  Selector(text=response.text).xpath('//h2[@class="categoryName"]/text()').extract()
        
    for meta_menu_item in range(len(meta_menu_list)):
        
        meta_menu_n         = unicodedata.normalize('NFKD', meta_menu_list[meta_menu_item]).encode('ascii',
                                                                                                   'ignore')
        
#         print "meta menu title %d:" %(meta_menu_item+1), meta_menu_n, "# of meta menus:", len(meta_menu_list)
        
        depth_menus_n_uni = Selector(text=response.text).xpath('//div[@class="menu"]['+str(meta_menu_item+1)+']/\
        div[@class="menuItems"]/div[@class="section"]/div[@class="sectionHeader"]/\
        div[@class="sectionName"]/text()').extract()
        
        for meta_depth_nn in range(len(depth_menus_n_uni)):
            
            depth_menus_n     = unicodedata.normalize('NFKD', depth_menus_n_uni[meta_depth_nn]
                                                     ).encode('ascii','ignore')

            # get the name of the depth menu, and record its position as 'n_level'
            n_level = meta_depth_nn+1
#             print "depth menu title %d:" %(n_level), depth_menus_n
            
            # Count the section headers found at this position (normally 1); the header text itself
            # is already in depth_menus_n, so it doesn't need to be decoded again here.
            depth_menu_id_uni = Selector(text=response.text).xpath('//div[@class="menu"]\
            ['+str(meta_menu_item+1)+']/div[@class="menuItems"]/div[@class="section"]['+str(n_level)+']/\
            div[@class="sectionHeader"]/div[@class="sectionName"]/text()').extract()
            depth_menu_id = len(depth_menu_id_uni)
#             print "#id of depth menu:", depth_menu_id
            
            # loop through the left and right side of each container:
            left_or_right_list = ['left','right']
            
            for left_or_right in left_or_right_list:
    
                #need the length of the [left/right] container, to iterate through:
                container_len_uni = Selector(text=response.text).xpath('//div[@class="menu"]\
                ['+str(meta_menu_item+1)+']/div[@class="menuItems"]/div[@class="section"]['+str(n_level)+']/div\
                [@class="entryContainer"]/div[@class="'+left_or_right+'Column"]/\
                div[@class="entry"]/node()[1]//text()').extract()
#                 print "left_check:", left_or_right, "contain len:", len(container_len_uni)
            
                for section_n in range(len(container_len_uni)):                    
                    
                    #now we can get the name of that menu item...
                    menu_item_name_uni = Selector(text=response.text).xpath('//div[@class="menu"]\
                    ['+str(meta_menu_item+1)+']/div[@class="menuItems"]/div[@class="section"]\
                    ['+str(n_level)+']/div\
                    [@class="entryContainer"]/div[@class="'+left_or_right+'Column"]/div[@class="entry"]\
                    ['+str(section_n+1)+']/node()[1]//text()').extract()
                    menu_item_name     = unicodedata.normalize('NFKD', menu_item_name_uni[0]
                                                                    ).encode('ascii','ignore')
#                     print "menu_item_name:", menu_item_name
                    
                    #and then we can get the price (if there is one...)
                    try:
                        menu_item_price_uni = Selector(text=response.text).xpath('//div[@class="menu"]\
                    ['+str(meta_menu_item+1)+']/div[@class="menuItems"]/div[@class="section"]\
                    ['+str(n_level)+']/div\
                    [@class="entryContainer"]/div[@class="'+left_or_right+'Column"]/div[@class="entry"]\
                    ['+str(section_n+1)+']/node()[2]//text()').extract()
                        menu_item_price     = unicodedata.normalize('NFKD', menu_item_price_uni[0]
                                                                    ).encode('ascii','ignore')
                        menu_item_price = float(menu_item_price)
#                         print "menu_item_price:", menu_item_price
                    except:
#                         print "menu_item_price:", "price_not_available"
                        menu_item_price = 'price_not_available'
                    
                    #and finally the description (if there is one...)
                    try:
                        menu_item_desc_uni = Selector(text=response.text).xpath('//div[@class="menu"]\
                    ['+str(meta_menu_item+1)+']/div[@class="menuItems"]/div[@class="section"]\
                    ['+str(n_level)+']/div\
                    [@class="entryContainer"]/div[@class="'+left_or_right+'Column"]/div[@class="entry"]\
                    ['+str(section_n+1)+']/node()[3]//text()').extract()
                        menu_item_desc     = unicodedata.normalize('NFKD', menu_item_desc_uni[0]
                                                                    ).encode('ascii','ignore')
#                         print "menu_item_desc:", menu_item_desc
                    except:
#                         print "menu_item_desc:", "desc_not_available"
                        menu_item_desc = 'desc_not_available'

                    # Finally, append the results so that when the function wraps up, I have a prepared,
                    # DataFrame-ready set of rows.
                    df_ready_rows.append([venue_name,
                                       venue_desc_list,
                                       venue_menu_url,
                                       venue_rated,
                                       meta_menu_n,
                                       depth_menus_n,
                                       menu_item_name,
                                       menu_item_price,
                                       menu_item_desc])
    return df_ready_rows
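`parse_url` rebuilds a full absolute XPath with positional indices for every field, re-parses `response.text` each time, and sizes its inner loop from `container_len_uni`, a count of text nodes that can exceed the number of entries when an item name spans several text nodes (the extra iterations then hit an `IndexError` and abandon the rest of the page). An alternative is to select each `entry` element once and read its children relative to it. The sketch below uses the standard library's `xml.etree.ElementTree` on a made-up, well-formed snippet that only mimics the class names in the XPaths above; real Foursquare HTML is not guaranteed to be valid XML, so on live pages the same pattern would go through Scrapy's `Selector` (calling `.xpath('./...')` on each selected element). The price regex is an assumption about price formats.

```python
import re
import xml.etree.ElementTree as ET

# Made-up markup that only mimics the class names in parse_url's XPaths.
SAMPLE = """
<div class="menu"><div class="menuItems">
  <div class="section">
    <div class="sectionHeader"><div class="sectionName">Appetizers</div></div>
    <div class="entryContainer">
      <div class="leftColumn">
        <div class="entry"><div>Fried Won Ton (10)</div><div>3.95</div></div>
      </div>
      <div class="rightColumn">
        <div class="entry"><div>Combination Hot Appetizers</div><div>$7.95</div>
          <div>2 egg rolls and 4 pot stickers.</div></div>
      </div>
    </div>
  </div>
</div></div>
"""

# Assumption: a price is the first decimal number in the text (e.g. "4.95" or "$7.95").
PRICE_RE = re.compile(r"\d+(?:\.\d+)?")


def parse_price(text):
    match = PRICE_RE.search(text or "")
    return float(match.group()) if match else "price_not_available"


def text_of(element):
    """All text inside an element, concatenated and stripped."""
    return "".join(element.itertext()).strip()


def menu_rows(root):
    """Yield (section_name, item_name, price, desc), reading each entry relative to itself."""
    for section in root.iter("div"):
        if section.get("class") != "section":
            continue
        header = section.find("./div[@class='sectionHeader']/div[@class='sectionName']")
        section_name = text_of(header) if header is not None else "section_not_available"
        for column in ("leftColumn", "rightColumn"):
            path = "./div[@class='entryContainer']/div[@class='%s']/div[@class='entry']" % column
            for entry in section.findall(path):
                parts = [text_of(child) for child in entry]  # one string per child element
                name = parts[0] if parts else "name_not_available"
                price = parse_price(parts[1]) if len(parts) > 1 else "price_not_available"
                desc = parts[2] if len(parts) > 2 and parts[2] else "desc_not_available"
                yield section_name, name, price, desc


rows = list(menu_rows(ET.fromstring(SAMPLE.strip())))
for row in rows:
    print(row)
```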

In [7]:
# Through the Explore endpoint I was able to grab the menu URL directly, so there's no need to build the
# URL by hand this time...
start_time = time.time()
for menu_url in unique_venues_from_explore[:30]:
    try:
        parse_url(menu_url)
    except:
        pass
    print("--- %s seconds ---" % (time.time() - start_time))
# Takes about 4 minutes for 30 URLs

--- 4.59175300598 seconds ---
--- 20.7690000534 seconds ---
--- 49.7427229881 seconds ---
--- 59.6149230003 seconds ---
--- 63.0322151184 seconds ---
--- 67.710144043 seconds ---
--- 82.1016180515 seconds ---
--- 88.5862419605 seconds ---
--- 90.4797620773 seconds ---
--- 92.4712779522 seconds ---
--- 106.917865038 seconds ---
--- 118.13925004 seconds ---
--- 125.899214029 seconds ---
--- 136.419843912 seconds ---
--- 146.922915936 seconds ---
--- 150.837317944 seconds ---
--- 165.162094116 seconds ---
--- 168.844830036 seconds ---
--- 173.622781038 seconds ---
--- 183.326632977 seconds ---
--- 186.305504084 seconds ---
--- 195.626914024 seconds ---
--- 205.212553978 seconds ---
--- 208.961025 seconds ---
--- 211.175385952 seconds ---
--- 224.969084978 seconds ---
--- 227.076222897 seconds ---
--- 233.700560093 seconds ---
--- 246.751344919 seconds ---
--- 252.437146902 seconds ---
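Extrapolating the timing above gives a feel for a full run. A back-of-the-envelope sketch, assuming every URL takes about as long as these 30 did (~8.4 s each) and that requests run one at a time:

```python
def estimate_hours(elapsed_seconds, urls_done, urls_total):
    """Linear extrapolation: average seconds per URL so far, times the total number of URLs."""
    return elapsed_seconds / urls_done * urls_total / 3600.0


# Elapsed time for the 30 URLs above, scaled to all 1362 collected explore URLs.
# 1362 is an upper bound if that list contains repeated venues.
hours = estimate_hours(252.437146902, 30, 1362)
print("~%.1f hours" % hours)  # ~3.2 hours
```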


In [8]:
explore_else = pd.DataFrame(df_ready_rows, columns=column_headers)
explore_else.shape

(3103, 9)

In [9]:
explore_else.head()

,venue_name,venue_desc_list,venue_menu_url,venue_rated,meta_menu_n,depth_menus_n,menu_item_name,menu_item_price,menu_item_desc
0,Peking Restaurant,[Chinese Restaurant],https://foursquare.com/v/peking-restaurant/4b9...,6.9,Main Menu,Appetizers,Vegetable Pot Stickers (6),4.95,desc_not_available
1,Peking Restaurant,[Chinese Restaurant],https://foursquare.com/v/peking-restaurant/4b9...,6.9,Main Menu,Appetizers,Fried Won Ton (10),3.95,desc_not_available
2,Peking Restaurant,[Chinese Restaurant],https://foursquare.com/v/peking-restaurant/4b9...,6.9,Main Menu,Appetizers,Boiled Shrimp Dumplings (10),5.5,desc_not_available
3,Peking Restaurant,[Chinese Restaurant],https://foursquare.com/v/peking-restaurant/4b9...,6.9,Main Menu,Appetizers,Combination Hot Appetizers (for Two),7.95,"2 egg rolls, 4 fried shrimps and 4 pot stickers."
4,Peking Restaurant,[Chinese Restaurant],https://foursquare.com/v/peking-restaurant/4b9...,6.9,Main Menu,Appetizers,Vegetarian Egg Rolls (4) *,3.95,desc_not_available


In [96]:
# Here I'm just testing out a prototype search function
count = 0
loc_list = []
for i in range(explore_else.shape[0]):
    flat_desc = ' '.join(explore_else.venue_desc_list[i])
    if 'american' in flat_desc.lower():
        count += 1
        loc_list.append(i)
print count, loc_list

37 [2768, 2769, 2770, 2771, 2772, 2773, 2774, 2775, 2776, 2777, 2778, 2779, 2780, 2781, 2782, 2783, 2784, 2785, 2786, 2787, 2788, 2789, 2790, 2791, 2792, 2793, 2794, 2795, 2796, 2797, 2798, 2799, 2800, 2801, 2802, 2803, 2804]


In [97]:
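# NOTE: dffff is overwritten on every pass, so only the last matching row (2804) is displayed below;
# explore_else.loc[loc_list] would show all 37 matches at once.
for i in loc_list:
    dffff = explore_else.loc[[i]]
dffff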

,venue_name,venue_desc_list,venue_menu_url,venue_rated,meta_menu_n,depth_menus_n,menu_item_name,menu_item_price,menu_item_desc
2804,Underdog,"[Hot Dog Joint, American Restaurant, Vegetaria...",https://foursquare.com/v/underdog/49d00adcf964...,8.0,Catering Menu,Organic Condiments,Dijon Mustard,5,desc_not_available
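The prototype search and row display above can also be written as a boolean mask in pandas. A sketch on a tiny made-up frame with the same `venue_desc_list` shape (the `has_category` helper is my own); matching is case-insensitive across all of a venue's category labels, and `df[mask]` returns every matching row at once rather than only the last one, as the `for i in loc_list` loop does.

```python
import pandas as pd


def has_category(desc_list, keyword):
    """True if keyword appears (case-insensitively) in any of a venue's category labels."""
    return any(keyword.lower() in desc.lower() for desc in desc_list)


# Tiny made-up frame with the same venue_desc_list column shape as explore_else.
df = pd.DataFrame({
    "venue_name": ["Peking Restaurant", "Underdog", "Underdog"],
    "venue_desc_list": [["Chinese Restaurant"],
                        ["Hot Dog Joint", "American Restaurant"],
                        ["Hot Dog Joint", "American Restaurant"]],
})

mask = df["venue_desc_list"].apply(has_category, keyword="american")
matches = df[mask]  # all matching rows, with their original index labels
print(len(matches), matches.index.tolist())  # 2 [1, 2]
```

On the real data, `explore_else[explore_else.venue_desc_list.apply(has_category, keyword='american')]` would play the role of `count`, `loc_list`, and `dffff` together.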


In [None]:
# Below is an alternate approach, using the Search Endpoints

In [4]:
# This independently pulls the venue names and IDs for the geographic points I paired off earlier in
# grid_pairs. The names and IDs are then used to construct menu URLs, which I intend to scrape.
unique_venues_from_search = []
unique_venue_names_from_search = []
start_time = time.time()

for x, y in grid_pairs:
    try:
        search = client.venues.search(params={'ll': '%.2f, %.2f' % (y, x),'query': 'food', 'limit':'50',
                                      'intent':'browse','radius':'800'})
        searched_venue_ids = [venue['id'] for venue in search['venues']]
        searched_venue_names = [venue['name'] for venue in search['venues']]
        for next_id, next_name in zip(searched_venue_ids, searched_venue_names):
            unique_venues_from_search.append(next_id)
            unique_venue_names_from_search.append(next_name)
#         print('--- %s loop-active seconds ---' % (time.time() - start_time))
    except:
        print('Sleeping...')
#         time.sleep(random.randint(115,140))
print('--- %s active seconds ---' % (time.time() - start_time))

--- 11.4930729866 active seconds ---


In [5]:
print len(unique_venue_names_from_search), len(unique_venues_from_search)

358 358


In [6]:
unique_venue_names_from_search[:5], unique_venues_from_search[:5]

([u'Food Fair',
  u'Food Fair',
  u'Asian American Food Company',
  u'Other Avenues Food Store',
  u'7-Eleven'],
 [u'4e3d8765ae60454236667cc4',
  u'4e3d8765ae60454236667cc4',
  u'463bfdccf964a52026461fe3',
  u'4a90954ff964a520a61820e3',
  u'4afba001f964a520d51e22e3'])

In [7]:
# This will only work for the Search Endpoint
menu_urls_from_search = []
base_url = 'https://foursquare.com/v/'
for venue_id, venue_name in zip(unique_venues_from_search, unique_venue_names_from_search):
    dat_id = unicodedata.normalize('NFKD', venue_id).encode('ascii','ignore')
    dat_name = unicodedata.normalize('NFKD', venue_name).encode('ascii','ignore')
    dat_name = dat_name.lower().replace('/','-').replace(' ','-')
    transformed_url = base_url+dat_name+'/'+dat_id+'/menu'
    menu_urls_from_search.append(transformed_url)
len(menu_urls_from_search)

358
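The slug above is built by lowercasing the name and replacing '/' and spaces, so characters such as '#', '?' or '&' pass straight into the URL; a '#' would start a fragment and cut off `/<id>/menu`. A sketch that percent-encodes the slug with `urllib.parse.quote` (Python 3). Whether Foursquare accepts an encoded slug is an assumption; the unencoded `victor's` URL above did work.

```python
from urllib.parse import quote

BASE_URL = "https://foursquare.com/v/"


def build_menu_url(venue_name, venue_id):
    """Build '<base><slug>/<id>/menu' with the slug percent-encoded.

    The slug rules (lowercase, '/' and ' ' -> '-') copy the cell above; the
    only new step is quote(..., safe=""), which encodes every reserved character.
    """
    slug = venue_name.lower().replace("/", "-").replace(" ", "-")
    return BASE_URL + quote(slug, safe="") + "/" + venue_id + "/menu"


print(build_menu_url("Victor's", "4a898fc8f964a520660820e3"))
# https://foursquare.com/v/victor%27s/4a898fc8f964a520660820e3/menu
```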

In [8]:
# Remove duplicate URLs: overlapping search radii return some venues more than once (e.g. 'Food Fair' above)
unique_urls_from_search = list(set(menu_urls_from_search))

In [10]:
len(unique_urls_from_search)
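# 280 unique URLs: 8% of the 3500 results the searches could have returned at most (70 grid points x the
# 50-result limit), and about 78% of the 358 results they actually returned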

280
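The 8% figure is measured against the ceiling of 70 grid points x a 50-result limit = 3500, but the searches actually returned only 358 results in total. Both ratios, as a small worked check using the counts printed above:

```python
grid_points = 70      # len(grid_pairs)
limit_per_call = 50   # the 'limit' search parameter
returned = 358        # len(unique_venues_from_search), before de-duplication
unique = 280          # len(unique_urls_from_search)

max_possible = grid_points * limit_per_call
print("unique / max possible: %.1f%%" % (100.0 * unique / max_possible))  # 8.0%
print("unique / returned:     %.1f%%" % (100.0 * unique / returned))      # 78.2%
print("duplicates removed:    %d" % (returned - unique))                   # 78
```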

In [12]:
# A URL can fail either because the venue has no menu or because it isn't a restaurant at all, so it could be
# useful later to determine whether venues are restaurants to begin with
start_time = time.time()
for menu_url in unique_urls_from_search[:30]:
    try:
        parse_url(menu_url)
    except:
        pass
    print("--- %s seconds ---" % (time.time() - start_time))

--- 0.49055480957 seconds ---
--- 12.5686228275 seconds ---
--- 12.7575688362 seconds ---
--- 12.9671459198 seconds ---
--- 13.1560468674 seconds ---
--- 13.3051497936 seconds ---
--- 25.0519609451 seconds ---
--- 25.3167629242 seconds ---
--- 25.493188858 seconds ---
--- 37.2827179432 seconds ---
--- 37.6100490093 seconds ---
--- 52.5052340031 seconds ---
--- 52.6898667812 seconds ---
--- 53.4898679256 seconds ---
--- 62.1172599792 seconds ---
--- 64.9277229309 seconds ---
--- 65.1554119587 seconds ---
--- 77.1680719852 seconds ---
--- 77.3154718876 seconds ---
--- 77.5070137978 seconds ---
--- 89.2520778179 seconds ---
--- 89.5217859745 seconds ---
--- 89.6632127762 seconds ---
--- 89.8353009224 seconds ---
--- 90.3072829247 seconds ---
--- 90.7108669281 seconds ---
--- 101.565645933 seconds ---
--- 101.976698875 seconds ---
--- 113.615211964 seconds ---
--- 113.790491819 seconds ---


In [16]:
search_else = pd.DataFrame(df_ready_rows, columns=column_headers)
search_else.shape

(43, 9)

In [17]:
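# Check every category label (case-insensitively) for 'vegan', not just the first one
for j in search_else.venue_desc_list:
    if any('vegan' in desc.lower() for desc in j):
        print "vegans"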

In [18]:
search_else.loc[:50]

,venue_name,venue_desc_list,venue_menu_url,venue_rated,meta_menu_n,depth_menus_n,menu_item_name,menu_item_price,menu_item_desc
0,Victor's,[Mexican Restaurant],https://foursquare.com/v/victor's/4a898fc8f964...,7.5,Main Menu,Tacos,Beef Or Grilled Chicken,2.65,desc_not_available
1,Victor's,[Mexican Restaurant],https://foursquare.com/v/victor's/4a898fc8f964...,7.5,Main Menu,Tacos,Chile Verde Or Pork,2.65,desc_not_available
2,Victor's,[Mexican Restaurant],https://foursquare.com/v/victor's/4a898fc8f964...,7.5,Main Menu,Tacos,Chicken Mole,2.65,desc_not_available
3,Victor's,[Mexican Restaurant],https://foursquare.com/v/victor's/4a898fc8f964...,7.5,Main Menu,Tacos,Nachos,1.5,"Cheese, salsa, sour cream, refried beans, grou..."
4,Victor's,[Mexican Restaurant],https://foursquare.com/v/victor's/4a898fc8f964...,7.5,Main Menu,Tortas,Beef Or Grilled Chicken,5.7,desc_not_available


In [19]:
search_else.venue_menu_url.unique()

array(["https://foursquare.com/v/victor's/4a898fc8f964a520660820e3/menu",
       'https://foursquare.com/v/artesano/51e0ad55498eb7f2b6ed10e2/menu'], dtype=object)

In [None]:
#BELOW IS POTENTIALLY USEFUL, BUT UNUSED, MATERIAL:

In [72]:
# This will be the func to add the next offset to my completed venue list...
def extend_unique_venues(unique_venues, proposed_venue):
    if proposed_venue not in unique_venues:
        unique_venues.append(proposed_venue)
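`extend_unique_venues` checks membership against a list, which scans the whole list on every call, and its comment mentions adding the next offset. A hypothetical sketch combining the two ideas: page through results with an offset, de-duplicating with a set (constant-time average lookups). `fetch_page` is a stand-in for whatever wraps the API call (for example `client.venues.explore` with 'offset' and 'limit' params); it is an assumption, not part of the foursquare library.

```python
def collect_paged(fetch_page, page_size=50, max_pages=10):
    """Gather items page by page using an offset, skipping items already seen.

    fetch_page(offset, limit) is a hypothetical callable returning a list of
    hashable items (e.g. menu URLs). Paging stops at the first short page
    or after max_pages requests.
    """
    seen, collected = set(), []
    for page in range(max_pages):
        items = fetch_page(offset=page * page_size, limit=page_size)
        for item in items:
            if item not in seen:  # set lookup instead of scanning a list
                seen.add(item)
                collected.append(item)
        if len(items) < page_size:
            break
    return collected
```

Starting at offset 0 also picks up the nearest venues, which the Explore call above skips with `'offset': '50'`.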

In [None]:
# # Category/column titles, in order:
# # [venue_name, venue_desc_list, venue_menu_url, venue_rated], [meta_menu_n], [depth_menus_n], [menu_item_name,
# # menu_item_price, menu_item_desc]

# venue_rows = []
# for [venue_name, venue_desc_list, venue_menu_url, venue_rated] in venues: 
#     for meta_menu in meta_menu_n:
#         for depth_menu in depth_menus_n:
#             venue_rows.append([venue_name,
#                                venue_desc_list,
#                                venue_rated,
#                                meta_menu,
#                                depth_menu,
#                                menu_item_name,
#                                menu_item_price,
#                                menu_item_desc])

In [9]:
# # Category/column titles, in order:
# # [venue_name, venue_desc_list, venue_menu_url, venue_rated], [meta_menu_n], [depth_menus_n], [menu_item_name,
# # menu_item_price, menu_item_desc]

# venue_dict = {}
# for [venue_name, venue_desc_list, venue_menu_url, venue_rated] in venues:
#     venue_dict[venue_name] = {'desc_list':venue_desc_list,
#                               'menu_url':venue_menu_url,
#                               'rating':venue_rated}
    
#     for meta_menu in meta_menu_n:
#         venue_dict[venue_name][meta_menu] = {}
            
#         for depth_menu in depth_menus_n:
#             venue_dict[venue_name][meta_menu][depth_menu] = {'menu_item_name':menu_item_name,
#                                                              'menu_item_price':menu_item_price,
#                                                              'menu_item_desc':menu_item_desc}
            
