### Overview of Project
This project explores the cuisine diversity of 11 North American metros using the publicly accessible Yelp dataset. You can download the dataset here.

Diversity is something we often strive for; diverse systems allow for more inclusion and are more robust to change. Diverse areas may also be more enticing to visit as they allow a wider range of experiences. In order to investigate diversity, we can use <b>[diversity indices](https://en.wikipedia.org/wiki/Diversity_index)</b>, which measure diversity through the number and abundance of different types in a dataset. In this project, I examine cuisine diversity among a group of North American metros by using the cuisine categories of their restaurants. I calculate three measures of diversity for each metro, as well as look deeper into the most and least common categories.

Measures of diversity are founded in [information theory](https://en.wikipedia.org/wiki/Information_theory), but most of their use and development comes from the field of ecology. In ecology, these measures are used to compare species diversity among communities, based on the number of unique species sampled and how many individuals of each species are sampled. They're also commonly used in population genetics. They are equally well suited to comparing cuisine diversity among different metros. They can help us understand which metros have the highest diversity overall and also which metros have the highest diversity <b>for their size</b>, since larger metros are expected to have higher diversity: each additional restaurant gives another chance for a new cuisine type to appear.

Measures of diversity are based on two basic ideas, richness and evenness:

#### Richness
Richness is a very simple measure: the number of unique categories sampled. In ecology, this would be the number of unique species found in a community. In this project, richness is the number of unique cuisine categories found in a metro. This measure can tell you: if I go to a particular metro, how many different types of cuisines can I find (even if I have to travel far)? A higher number means more cuisines.
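As a minimal sketch of the idea, richness can be computed with a grouped unique count. The toy table below is made up for illustration (the metro names `A` and `B` and the column names are assumptions, not the Yelp data):

```python
import pandas as pd

# Toy data (made up for illustration): one row per restaurant,
# already labeled with a single cuisine category.
toy = pd.DataFrame({
    "metro": ["A", "A", "A", "B", "B", "B"],
    "cuisine": ["Italian", "Italian", "Thai", "Italian", "Thai", "Cuban"],
})

# Richness = number of unique cuisine categories observed in each metro.
richness = toy.groupby("metro")["cuisine"].nunique()
print(richness)
```

Here metro `A` has two unique cuisines and metro `B` has three, even though both have the same number of restaurants.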

#### Evenness
Evenness refers to how evenly the individual items (species, restaurants) are distributed across the different categories. Consider city A which has 10 restaurants which are all Italian, vs. city B which has 10 restaurants which are each a different cuisine. City A would have low evenness, whereas city B would have high evenness.

#### Shannon's H and Simpson's D
Shannon's H and Simpson's D (sometimes called the Shannon Index and the Simpson Index) are metrics that incorporate both the number of unique categories (richness) and the relative abundance of those categories (evenness). In this project, these metrics relate to how many different cuisines are present in a metro and how common they are. They can help answer the question: if I go to a particular metro, how likely am I to find different types of cuisines nearby? Metros with low diversity (low values of these metrics) will have fewer types of cuisines and/or be dominated by certain common cuisines. Metros with higher diversity will have more types of cuisines and/or a more similar percentage of each cuisine type. Shannon's H is weighted more towards richness: it gives more weight to each additional cuisine type. Simpson's D is weighted more towards evenness: it gives more weight to having an even distribution of cuisine types. Thus a metro with a lower Shannon's H but a higher Simpson's D will have fewer cuisine types but a more even distribution of restaurants among the cuisine types it has. In contrast, a metro with a higher Shannon's H but a lower Simpson's D will have more cuisine types but will be dominated by the more common cuisines.
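To make the two metrics concrete, here is a small sketch that computes them for city A and city B from the evenness example. It rests on two assumptions that the text above does not pin down: Shannon's H uses the natural logarithm, and Simpson's D is taken in its Gini-Simpson form, 1 - Σ pᵢ², so that higher values mean higher diversity. The function names are hypothetical.

```python
import numpy as np

def relative_frequencies(counts):
    """Convert per-category counts to proportions, dropping empty categories."""
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts > 0]
    return counts / counts.sum()

def shannon_h(counts):
    """Shannon's H = sum of p_i * ln(1 / p_i) (natural log is an assumption)."""
    p = relative_frequencies(counts)
    return float(np.sum(p * np.log(1.0 / p)))

def simpson_d(counts):
    """Simpson's D in Gini-Simpson form, 1 - sum of p_i^2 (an assumption)."""
    p = relative_frequencies(counts)
    return float(1.0 - np.sum(p ** 2))

city_a = [10]       # 10 restaurants, all Italian
city_b = [1] * 10   # 10 restaurants, each a different cuisine

for name, counts in [("A", city_a), ("B", city_b)]:
    print(name, round(shannon_h(counts), 3), round(simpson_d(counts), 3))
```

Under these assumptions city A scores zero on both metrics, while city B reaches the maximum possible for ten categories: H = ln(10) and D = 0.9.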


### Data & Methods
I used the Yelp public dataset, which samples from 10 US metros (Boise, Indianapolis, Nashville, New Orleans, Philadelphia, Reno, Santa Barbara, St. Louis, Tampa/St. Petersburg, and Tucson) and one Canadian metro (Edmonton). I removed all businesses that weren't from one of these metros and limited the businesses to those in the Restaurants category with at least one national/regional cuisine category. Diversity metrics require a single category for each item, so I labeled each restaurant with the rarest of its listed categories, judged by how many restaurants in the whole sample carry that category. For example, if a restaurant was categorized as Chinese and Szechuan, I labeled it as Szechuan, since there are fewer Szechuan restaurants than Chinese.

For each metro, I counted the number of unique cuisines (richness) and the number of restaurants with each type of cuisine, then divided the latter by the metro's total number of restaurants to get the relative frequency of each cuisine. I used the relative frequencies to calculate Shannon's H and Simpson's D, and I plotted each metric against the number of restaurants in the metro, since I expect diversity to go up as metros gain more restaurants. I also plotted a regression line to compare each diversity metric to the expected value for a metro of that size.
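The rarest-category labeling can be sketched as follows. This is an illustration on a made-up sample, not the notebook's actual implementation; the names `restaurants`, `rarest_category`, and the alphabetical tie-break are assumptions.

```python
from collections import Counter

import pandas as pd

# Hypothetical toy sample: each restaurant lists one or more cuisine categories.
restaurants = pd.Series({
    "r1": ["Chinese", "Szechuan"],
    "r2": ["Chinese"],
    "r3": ["Chinese", "Cantonese"],
    "r4": ["Italian"],
    "r5": ["Chinese", "Szechuan"],
    "r6": ["Italian", "Sicilian"],
})

# Count how many restaurants in the whole sample list each category.
category_counts = Counter(cat for cats in restaurants for cat in cats)

def rarest_category(cats):
    """Pick the listed category with the fewest restaurants overall.

    Ties are broken alphabetically (an assumption made for this sketch).
    """
    return min(cats, key=lambda c: (category_counts[c], c))

# One label per restaurant, then the relative frequency of each label.
labels = restaurants.apply(rarest_category)
rel_freq = labels.value_counts(normalize=True)
print(labels)
print(rel_freq)
```

In this sample `r1` is labeled `Szechuan` because two restaurants list Szechuan while four list Chinese; `r2` keeps `Chinese` because it has no rarer category.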

 


In [1]:
import altair as alt
import numpy as np
import pandas as pd

pd.options.display.max_rows = 1000
pd.options.display.max_columns = 1000

### Read in data

In [2]:
raw_df = pd.read_json(
    "/Users/amyshapiro/Documents/yelp_dataset/yelp_academic_dataset_business.json",
    lines=True
)
# see data structure
raw_df.head()

,business_id,name,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,attributes,categories,hours
0,Pns2l4eNsfO8kk83dixA6A,"Abby Rappoport, LAC, CMQ","1616 Chapala St, Ste 2",Santa Barbara,CA,93101,34.426679,-119.711197,5.0,7,0,{'ByAppointmentOnly': 'True'},"Doctors, Traditional Chinese Medicine, Naturop...",
1,mpf3x-BjTdTEA3yCZrAYPw,The UPS Store,87 Grasso Plaza Shopping Center,Affton,MO,63123,38.551126,-90.335695,3.0,15,1,{'BusinessAcceptsCreditCards': 'True'},"Shipping Centers, Local Services, Notaries, Ma...","{'Monday': '0:0-0:0', 'Tuesday': '8:0-18:30', ..."
2,tUFrWirKiKi_TAnsVWINQQ,Target,5255 E Broadway Blvd,Tucson,AZ,85711,32.223236,-110.880452,3.5,22,0,"{'BikeParking': 'True', 'BusinessAcceptsCredit...","Department Stores, Shopping, Fashion, Home & G...","{'Monday': '8:0-22:0', 'Tuesday': '8:0-22:0', ..."
3,MTSW4McQd7CbVtyjqoe9mw,St Honore Pastries,935 Race St,Philadelphia,PA,19107,39.955505,-75.155564,4.0,80,1,"{'RestaurantsDelivery': 'False', 'OutdoorSeati...","Restaurants, Food, Bubble Tea, Coffee & Tea, B...","{'Monday': '7:0-20:0', 'Tuesday': '7:0-20:0', ..."
4,mWMc6_wTdE0EUBKIGXDVfA,Perkiomen Valley Brewery,101 Walnut St,Green Lane,PA,18054,40.338183,-75.471659,4.5,13,1,"{'BusinessAcceptsCreditCards': 'True', 'Wheelc...","Brewpubs, Breweries, Food","{'Wednesday': '14:0-22:0', 'Thursday': '16:0-2..."


In [3]:
# set index to business id; this helps us when doing further analysis
raw_df.set_index('business_id', inplace=True)

### Clean/prepare data
Cleaning and preparing data for analysis is an important step where the `pandas` package can be very helpful.

Yelp businesses can have multiple category types, some of which are parent categories (e.g. `Restaurants` or `Home Services`) and others of which are child categories (e.g. `Pizza`, `Italian`, `Plumbers`, `Home Cleaners`).

For this analysis, I want to limit the data to businesses within the `Restaurants` parent category, and then to restaurants that have a national or regional cuisine categorization (e.g. `Italian` or `Southern`, as opposed to `Pizza` or `Barbeque`).

I also want to assign each business to a single metro area. The Yelp dataset samples from 10 metro areas in the US and one in Canada. Many businesses are located in the metro's central city itself, but many others are in the surrounding area or list misspelled or alternate city names. To assign each business to a metro, I'll use the state and assume the business is part of the metro area if it's in that state (each state falls into only one metro area). There are also some businesses from outlier states, and I'll remove these.

In [4]:
raw_df.groupby(['state']).count().sort_values('name', ascending=False)[['name']]

state,name
PA,34039
FL,26330
TN,12056
IN,11247
MO,10913
LA,9924
AZ,9912
NJ,8536
NV,7715
AB,5573


In [5]:
state_dict = {
    'Boise': ['ID'],
    'Edmonton': ['AB'],
    'Indianapolis': ['IN'],
    'Nashville': ['TN'],
    'New Orleans': ['LA'],
    'Philadelphia': ['DE', 'NJ', 'PA'],
    'Reno': ['NV'],
    'Santa Barbara': ['CA'],
    'St. Louis': ['IL', 'MO'],
    'Tampa-St. Petersburg': ['FL'],
    'Tucson': ['AZ'],
}
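# keep only businesses in states that belong to one of the 11 metros;
# .copy() avoids a SettingWithCopyWarning when adding columns below
metro_states = [st for st_list in state_dict.values() for st in st_list]
raw_df = raw_df.loc[raw_df['state'].isin(metro_states)].copy()

def apply_state(state, state_dict=state_dict):
    """Return the metro whose state list contains `state` (None if no match)."""
    for k, v in state_dict.items():
        if state in v:
            return k

# label each business with its metro and count businesses per metro
raw_df['metro'] = raw_df['state'].apply(apply_state)
raw_df.groupby('metro').count()['name']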

metro
Boise                    4467
Edmonton                 5573
Indianapolis            11247
Nashville               12056
New Orleans              9924
Philadelphia            44840
Reno                     7715
Santa Barbara            5203
St. Louis               13058
Tampa-St. Petersburg    26330
Tucson                   9912
Name: name, dtype: int64

In [6]:
# limit to restaurants only (substring match on the categories string;
# str(x) guards against missing values)
raw_df['is_restaurant'] = raw_df['categories'].apply(lambda x: 'Restaurants' in str(x))
raw_df = raw_df.loc[raw_df['is_restaurant']].copy()

# split each restaurant's categories string into a list of categories
raw_df['cat'] = raw_df['categories'].apply(lambda x: str(x).split(', '))

# one-hot encode: one row per business, one 0/1 column per category
# (get_dummies splits on '|' by default, so join the lists with '|' first)
df_cats = raw_df['cat'].str.join('|').str.get_dummies()
df_cats.head()

[Output abridged: a one-hot encoded DataFrame indexed by business_id with 728 category columns in alphabetical order, from Acai Bowls to Zoos. Each cell is 1 if the business lists that category and 0 otherwise. Because businesses in the Restaurants category can also list non-restaurant categories, the columns include entries such as Accountants and Yoga alongside cuisines such as Afghan, Italian, and Szechuan. The first row, MTSW4McQd7CbVtyjqoe9mw, has a 1 only in the columns for its own categories (Bakeries and Restaurants among them).]
CF33F8-E6oudUQ46HnavjQ,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
k0hlBqXX-Bt0vf1op7Jr1w,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
bBDDEgkFA1Otx9Lfe7BZUQ,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
eEOYSgkmpB90uNA7lDOMRA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Data cleaning and categorizing can also take some manual effort. To limit the data to cuisine categories, I looked through all the categories by hand and kept only the ones I judged to be cuisine types. These include international cuisines (e.g. `Afghan`, `Thai`) as well as American cuisines (e.g. `American (New)`, `Southern`, `Cajun/Creole`). I also wanted to see whether a generative AI could do this task for me. Here are some results.

In [7]:
# All the categories
list(df_cats.columns)

['Acai Bowls',
 'Accessories',
 'Accountants',
 'Active Life',
 'Acupuncture',
 'Adult',
 'Adult Education',
 'Adult Entertainment',
 'Advertising',
 'Afghan',
 'African',
 'Air Duct Cleaning',
 'Airlines',
 'Airport Lounges',
 'Airport Shuttles',
 'Airport Terminals',
 'Airports',
 'Amateur Sports Teams',
 'American (New)',
 'American (Traditional)',
 'Amusement Parks',
 'Animal Shelters',
 'Antiques',
 'Apartments',
 'Appliances',
 'Appliances & Repair',
 'Aquariums',
 'Arabic',
 'Arcades',
 'Argentine',
 'Armenian',
 'Art Classes',
 'Art Galleries',
 'Art Museums',
 'Art Tours',
 'Arts & Crafts',
 'Arts & Entertainment',
 'Asian Fusion',
 'Auction Houses',
 'Audio/Visual Equipment Rental',
 'Australian',
 'Austrian',
 'Auto Customization',
 'Auto Detailing',
 'Auto Glass Services',
 'Auto Loan Providers',
 'Auto Parts & Supplies',
 'Auto Repair',
 'Automotive',
 'Axe Throwing',
 'Bagels',
 'Bakeries',
 'Bangladeshi',
 'Banks & Credit Unions',
 'Bar Crawl',
 'Barbeque',
 'Barbers',
 

First, I gave ChatGPT the list above with this prompt:
`this is a list of string values surrounded by single quotes and separated by commas each on a single line. return the list in the same format only selecting the values that are national or regional cuisine types`

This is the list it returned:

'Acai Bowls',
'Afghan',
'African',
'Air Duct Cleaning',
'Airport Lounges',
'Airport Shuttles',
'Airport Terminals',
'Airports',
'American (New)',
'American (Traditional)',
'Amusement Parks',
'Antiques',
'Appliances',
'Appliances & Repair',
'Aquariums',
'Arabic',
'Argentine',
'Armenian',
'Asian Fusion',
'Australian',
'Austrian',
'Bagels',
'Bakeries',
'Bangladeshi',
'Barbeque',
'Basque',
'Belgian',
'Beverage Store',
'Bistros',
'Brazilian',
'British',
'Bubble Tea',
'Buffets',
'Burgers',
'Burmese',
'Cafes',
'Cajun/Creole',
'Calabrian',
'Cambodian',
'Canadian (New)',
'Caribbean',
'Caterers',
'Champagne Bars',
'Cheese Shops',
'Cheesesteaks',
'Chicken Shop',
'Chinese',
'Colombian',
'Comfort Food',
'Cooking Classes',
'Crepes',
'Cuban',
'Cupcakes',
'Czech',
'Delis',
'Desserts',
'Dim Sum',
'Diners',
'Donairs',
'Donburi',
'Doughnuts',
'Eastern European',
'Egyptian',
'Empanadas',
'Ethiopian',
'European',
'Falafel',
'Farmers Market',
'Fast Food',
'Filipino',
'Fish & Chips',
'Food',
'Food Banks',
'Food Court',
'Food Delivery Services',
'Food Stands',
'Food Tours',
'Food Trucks',
'French',
'Fruits & Veggies',
'German',
'Gluten-Free',
'Greek',
'Hakka',
'Halal',
'Hawaiian',
'Himalayan/Nepalese',
'Honduran',
'Hot Dogs',
'Hot Pot',
'Hungarian',
'Indian',
'Indonesian',
'Irish',
'Italian',
'Izakaya',
'Japanese',
'Japanese Curry',
'Juice Bars & Smoothies',
'Kebab',
'Korean',
'Kosher',
'Laotian',
'Latin American',
'Lebanese',
'Malaysian',
'Mediterranean',
'Mexican',
'Middle Eastern',
'Modern European',
'Mongolian',
'Moroccan',
'New Mexican Cuisine',
'Nicaraguan',
'Noodles',
'Pakistani',
'Pan Asian',
'Pancakes',
'Peruvian',
'Pizza',
'Polish',
'Portuguese',
'Poutineries',
'Ramen',
'Russian',
'Salad',
'Salvadoran',
'Sandwiches',
'Scandinavian',
'Seafood',
'Senegalese',
'Serbo Croatian',
'Singaporean',
'Slovakian',
'Soul Food',
'Soup',
'South African',
'Southern',
'Spanish',
'Sri Lankan',
'Sushi Bars',
'Syrian',
'Szechuan',
'Tacos',
'Taiwanese',
'Tapas Bars',
'Tapas/Small Plates',
'Tea Rooms',
'Tex-Mex',
'Thai',
'Trattorie',
'Turkish',
'Ukrainian',
'Vegan',
'Vegetarian',
'Venezuelan',
'Vietnamese',
'Waffles'

And it gave me this response:

`These are the cuisine-related categories from the original list. If you have any specific criteria for further filtering or adjustments, please let me know!`

You can see that it filtered out some non-cuisine items but kept others (e.g. 'Air Duct Cleaning'), and it didn't restrict the list to national and regional cuisines as I had asked.

So I asked it to refine the list with this prompt:
`please return the list with only national or regional cuisine types, not with a type of food; for example, 'Thai' and 'Senegalese' are national cuisines, 'Southern' and 'Tex-Mex' are regional cuisines, but 'Waffles', 'Pizza' and 'Vegan' are types of food, not a national or regional cuisine.`

This improved the response. The refined list is below; let's compare it to the one I made manually:

In [8]:
chatgpt_output = [
    'Afghan',
    'African',
    'Argentine',
    'Armenian',
    'Asian Fusion',
    'Australian',
    'Austrian',
    'Bangladeshi',
    'Belgian',
    'Brazilian',
    'British',
    'Cajun/Creole',
    'Calabrian',
    'Cambodian',
    'Canadian (New)',
    'Caribbean',
    'Colombian',
    'Cuban',
    'Czech',
    'Dominican',
    'Eastern European',
    'Egyptian',
    'Ethiopian',
    'Filipino',
    'French',
    'German',
    'Greek',
    'Hakka',
    'Hawaiian',
    'Himalayan/Nepalese',
    'Honduran',
    'Indian',
    'Indonesian',
    'Irish',
    'Italian',
    'Jamaican',
    'Japanese',
    'Kebab',
    'Korean',
    'Laotian',
    'Latin American',
    'Lebanese',
    'Malaysian',
    'Mediterranean',
    'Mexican',
    'Middle Eastern',
    'Mongolian',
    'Moroccan',
    'New Mexican Cuisine',
    'Nicaraguan',
    'Pakistani',
    'Pan Asian',
    'Peruvian',
    'Polish',
    'Portuguese',
    'Puerto Rican',
    'Scandinavian',
    'Senegalese',
    'Serbo Croatian',
    'Singaporean',
    'Slovakian',
    'Southern',
    'Spanish',
    'Sri Lankan',
    'Syrian',
    'Taiwanese',
    'Tex-Mex',
    'Thai',
    'Turkish',
    'Ukrainian',
    'Venezuelan',
    'Vietnamese'
]

In [9]:
# This is the list I manually made of all the cuisine categories
nat_cat = [
    'Afghan',
    'African',
    'American (New)',
    'American (Traditional)',
    'Arabic',
    'Argentine',
    'Armenian',
    'Asian Fusion',
    'Australian',
    'Austrian',
    'Bangladeshi',
    'Basque',
    'Belgian',
    'Brazilian',
    'British',
    'Burmese',
    'Cajun/Creole',
    'Calabrian',
    'Cambodian',
    'Canadian (New)',
    'Cantonese',
    'Caribbean',
    'Chinese',
    'Colombian',
    'Cuban',
    'Czech',
    'Dominican',
    'Eastern European',
    'Egyptian',
    'Ethiopian',
    'Filipino',
    'French',
    'Fuzhou',
    'Georgian',
    'German',
    'Greek',
    'Guamanian',
    'Hainan',
    'Haitian',
    'Hakka',
    'Hawaiian',
    'Himalayan/Nepalese',
    'Honduran',
    'Hungarian',
    'Iberian',
    'Indian',
    'Indonesian',
    'Irish',
    'Israeli',
    'Italian',
    'Japanese',
    'Korean',
    'Laotian',
    'Latin American',
    'Lebanese',
    'Malaysian',
    'Mexican',
    'Middle Eastern',
    'Modern European',
    'Mongolian',
    'Moroccan',
    'New Mexican Cuisine',
    'Nicaraguan',
    'Pakistani',
    'Pan Asian',
    'Persian/Iranian',
    'Peruvian',
    'Polish',
    'Portuguese',
    'Puerto Rican',
    'Roman',
    'Russian',
    'Salvadoran',
    'Sardinian',
    'Scandinavian',
    'Scottish',
    'Senegalese',
    'Serbo Croatian',
    'Shanghainese',
    'Sicilian',
    'Singaporean',
    'Somali',
    'South African',
    'Southern',
    'Spanish',
    'Sri Lankan',
    'Syrian',
    'Szechuan',
    'Taiwanese',
    'Tex-Mex',
    'Thai',
    'Trinidadian',
    'Tuscan',
    'Ukrainian',
    'Uzbek',
    'Venezuelan',
    'Vietnamese',
]

In [10]:
# These are the items that ChatGPT missed but I had in my list
missing_items = [x for x in nat_cat if x not in chatgpt_output]
print(missing_items)

['American (New)', 'American (Traditional)', 'Arabic', 'Basque', 'Burmese', 'Cantonese', 'Chinese', 'Fuzhou', 'Georgian', 'Guamanian', 'Hainan', 'Haitian', 'Hungarian', 'Iberian', 'Israeli', 'Modern European', 'Persian/Iranian', 'Roman', 'Russian', 'Salvadoran', 'Sardinian', 'Scottish', 'Shanghainese', 'Sicilian', 'Somali', 'South African', 'Szechuan', 'Trinidadian', 'Tuscan', 'Uzbek']


In [11]:
# These are the items that were in ChatGPT's list but not in my list
missing_items2 = [x for x in chatgpt_output if x not in nat_cat]
print(missing_items2)

['Jamaican', 'Kebab', 'Mediterranean', 'Slovakian', 'Turkish']


So, ChatGPT did help me with a few cuisines I had missed when I manually made my list ... or did it? Two of these types (Jamaican and Slovakian) weren't even in the original list I gave it! It also missed 30 cuisines that were in my list. So take care when using it for tasks like these.
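
One way to catch this kind of problem automatically is to compare the lists as sets and to check the model's output against the categories that were actually supplied. The sketch below is a minimal illustration on toy data; `audit_selection` and the three toy lists are hypothetical names introduced here, not part of the notebook.

```python
def audit_selection(source_cats, manual_cats, llm_cats):
    """Compare an LLM-filtered list against a manual list and the source list."""
    source, manual, llm = set(source_cats), set(manual_cats), set(llm_cats)
    return {
        # in the manual list but dropped by the model
        "missed_by_llm": sorted(manual - llm),
        # returned by the model and present in the source, but not in the manual list
        "extra_from_llm": sorted((llm & source) - manual),
        # returned by the model but never present in the source list
        "not_in_source": sorted(llm - source),
    }


# Toy data for illustration only (not the real category list)
source_cats = ["Afghan", "Bakeries", "Kebab", "Thai", "Turkish"]
manual_cats = ["Afghan", "Thai"]
llm_cats = ["Afghan", "Jamaican", "Kebab", "Turkish"]

report = audit_selection(source_cats, manual_cats, llm_cats)
print(report)
```

In this notebook, the source list would presumably be the full set of category columns before filtering, with `nat_cat` and `chatgpt_output` as the other two arguments. Anything that lands in `not_in_source` was invented by the model rather than selected from the input.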

In [12]:
# This is the list I manually made, with the two missing types ('Mediterranean' and 'Turkish') added
nat_cat = [
    'Afghan',
    'African',
    'American (New)',
    'American (Traditional)',
    'Arabic',
    'Argentine',
    'Armenian',
    'Asian Fusion',
    'Australian',
    'Austrian',
    'Bangladeshi',
    'Basque',
    'Belgian',
    'Brazilian',
    'British',
    'Burmese',
    'Cajun/Creole',
    'Calabrian',
    'Cambodian',
    'Canadian (New)',
    'Cantonese',
    'Caribbean',
    'Chinese',
    'Colombian',
    'Cuban',
    'Czech',
    'Dominican',
    'Eastern European',
    'Egyptian',
    'Ethiopian',
    'Filipino',
    'French',
    'Fuzhou',
    'Georgian',
    'German',
    'Greek',
    'Guamanian',
    'Hainan',
    'Haitian',
    'Hakka',
    'Hawaiian',
    'Himalayan/Nepalese',
    'Honduran',
    'Hungarian',
    'Iberian',
    'Indian',
    'Indonesian',
    'Irish',
    'Israeli',
    'Italian',
    'Japanese',
    'Korean',
    'Laotian',
    'Latin American',
    'Lebanese',
    'Malaysian',
    'Mediterranean',
    'Mexican',
    'Middle Eastern',
    'Modern European',
    'Mongolian',
    'Moroccan',
    'New Mexican Cuisine',
    'Nicaraguan',
    'Pakistani',
    'Pan Asian',
    'Persian/Iranian',
    'Peruvian',
    'Polish',
    'Portuguese',
    'Puerto Rican',
    'Roman',
    'Russian',
    'Salvadoran',
    'Sardinian',
    'Scandinavian',
    'Scottish',
    'Senegalese',
    'Serbo Croatian',
    'Shanghainese',
    'Sicilian',
    'Singaporean',
    'Somali',
    'South African',
    'Southern',
    'Spanish',
    'Sri Lankan',
    'Syrian',
    'Szechuan',
    'Taiwanese',
    'Tex-Mex',
    'Thai',
    'Trinidadian',
    'Turkish',
    'Tuscan',
    'Ukrainian',
    'Uzbek',
    'Venezuelan',
    'Vietnamese',
]

In [13]:
df_cats = df_cats[nat_cat].copy()

In [14]:
# combine two different American categories into one
df_cats['American'] = 0
df_cats.loc[df_cats['American (New)'] + df_cats['American (Traditional)'] > 0, 'American'] = 1
df_cats.drop(['American (New)', 'American (Traditional)'], axis=1, inplace=True)
df_cats['sum'] = df_cats.sum(axis=1)

In [16]:
# create a dictionary of the number of each cuisine type across all restaurants
# and locations; this is used to rank the categories for each business and select the least
# common category
frequencies_dict = df_cats.sum().to_dict()
frequencies_dict

{'Afghan': 44,
 'African': 115,
 'Arabic': 36,
 'Argentine': 34,
 'Armenian': 15,
 'Asian Fusion': 1547,
 'Australian': 13,
 'Austrian': 5,
 'Bangladeshi': 14,
 'Basque': 39,
 'Belgian': 24,
 'Brazilian': 77,
 'British': 75,
 'Burmese': 18,
 'Cajun/Creole': 923,
 'Calabrian': 1,
 'Cambodian': 18,
 'Canadian (New)': 261,
 'Cantonese': 93,
 'Caribbean': 590,
 'Chinese': 3169,
 'Colombian': 67,
 'Cuban': 283,
 'Czech': 6,
 'Dominican': 44,
 'Eastern European': 2,
 'Egyptian': 16,
 'Ethiopian': 67,
 'Filipino': 108,
 'French': 503,
 'Fuzhou': 1,
 'Georgian': 4,
 'German': 125,
 'Greek': 779,
 'Guamanian': 1,
 'Hainan': 1,
 'Haitian': 20,
 'Hakka': 2,
 'Hawaiian': 185,
 'Himalayan/Nepalese': 21,
 'Honduran': 24,
 'Hungarian': 6,
 'Iberian': 10,
 'Indian': 838,
 'Indonesian': 23,
 'Irish': 215,
 'Israeli': 2,
 'Italian': 4573,
 'Japanese': 1830,
 'Korean': 497,
 'Laotian': 48,
 'Latin American': 810,
 'Lebanese': 83,
 'Malaysian': 30,
 'Mediterranean': 1263,
 'Mexican': 4600,
 'Middle Easter

In [17]:
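def rank_cat(x):
    """Take all categories of a restaurant and return the
    category with the lowest overall frequency ('' if none match)."""
    y = []
    for cat in x:
        # map 'American (New)' and 'American (Traditional)' to 'American';
        # startswith avoids also matching 'Latin American'
        if cat.startswith("American"):
            y.append(("American", frequencies_dict["American"]))
        if cat in frequencies_dict:
            y.append((cat, frequencies_dict[cat]))
    if not y:
        return ''
    # the least common category is the one with the smallest count
    return min(y, key=lambda pair: pair[1])[0]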

In [18]:
# create a field called single_cat that is the least common category for that business
raw_df['single_cat'] = raw_df['cat'].apply(rank_cat)
raw_df = raw_df.loc[raw_df['single_cat'] != '']
raw_df.shape

(32532, 17)
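
To make the selection rule concrete, here is a small self-contained sketch with made-up frequencies. The counts and the helper name `least_common` are illustrative assumptions, not the notebook's values or functions.

```python
def least_common(categories, frequencies):
    """Return the category with the lowest overall frequency, or '' if none match."""
    known = [c for c in categories if c in frequencies]
    if not known:
        return ''
    return min(known, key=lambda c: frequencies[c])


# Toy counts for illustration only
toy_frequencies = {"Italian": 100, "Sicilian": 3, "Mexican": 120}

first = least_common(["Italian", "Sicilian", "Pizza"], toy_frequencies)
second = least_common(["Pizza", "Bars"], toy_frequencies)
print(repr(first), repr(second))
```

One effect of this rule is that a restaurant tagged both `Italian` and `Sicilian` is counted as `Sicilian`, so rarer, more specific cuisines are not absorbed by broad ones. Restaurants with no cuisine category at all get an empty string and are filtered out.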

In [19]:
# drop columns we don't need
df = raw_df.drop(
    ['name', 'address', 'city', 'state','postal_code',
     'latitude', 'longitude', 'stars', 'review_count',
    'is_open', 'attributes', 'categories',
     'hours', 'is_restaurant', 'cat'], axis=1)

In [20]:
df.head()

| business_id | metro | single_cat |
|---|---|---|
| k0hlBqXX-Bt0vf1op7Jr1w | St. Louis | Greek |
| eEOYSgkmpB90uNA7lDOMRA | Tampa-St. Petersburg | Vietnamese |
| il_Ro8jwPlHresjw9EGmBg | Indianapolis | American |
| 0bPLkL0QhhPO5kt1_EXmNQ | Tampa-St. Petersburg | Italian |
| MUTTqe8uqyMdBl186RmNeA | Philadelphia | Japanese |


### Richness
To calculate richness, we can just count the number of unique `single_cat` types in each metro.

In [21]:
richness = df.groupby('metro').nunique().reset_index()
richness.columns = ['Metro', 'Richness']

In [22]:
richness

| | Metro | Richness |
|---|---|---|
| 0 | Boise | 45 |
| 1 | Edmonton | 62 |
| 2 | Indianapolis | 62 |
| 3 | Nashville | 65 |
| 4 | New Orleans | 62 |
| 5 | Philadelphia | 88 |
| 6 | Reno | 44 |
| 7 | Santa Barbara | 50 |
| 8 | St. Louis | 69 |
| 9 | Tampa-St. Petersburg | 77 |


### Shannon's H and Simpson's D

To calculate the diversity indices, we first need to create DataFrames of the frequency of each cuisine in each metro.

In [23]:
# Create a df with each cuisine as columns, each metro as rows,
# and the number of restaurants as the values
df = df.merge(pd.get_dummies(df['single_cat']), left_index=True, right_index=True)
df.drop('single_cat', axis=1, inplace=True)
df_gb = df.groupby('metro').sum()
df_gb

metro,Afghan,African,American,Arabic,Argentine,Armenian,Asian Fusion,Australian,Austrian,Bangladeshi,Basque,Belgian,Brazilian,British,Burmese,Cajun/Creole,Calabrian,Cambodian,Canadian (New),Cantonese,Caribbean,Chinese,Colombian,Cuban,Czech,Dominican,Eastern European,Egyptian,Ethiopian,Filipino,French,Fuzhou,Georgian,German,Greek,Guamanian,Hainan,Haitian,Hakka,Hawaiian,Himalayan/Nepalese,Honduran,Hungarian,Iberian,Indian,Indonesian,Irish,Israeli,Italian,Japanese,Korean,Laotian,Latin American,Lebanese,Malaysian,Mediterranean,Mexican,Middle Eastern,Modern European,Mongolian,Moroccan,New Mexican Cuisine,Nicaraguan,Pakistani,Pan Asian,Persian/Iranian,Peruvian,Polish,Portuguese,Puerto Rican,Roman,Russian,Salvadoran,Sardinian,Scandinavian,Scottish,Senegalese,Serbo Croatian,Shanghainese,Sicilian,Singaporean,Somali,South African,Southern,Spanish,Sri Lankan,Syrian,Szechuan,Taiwanese,Tex-Mex,Thai,Trinidadian,Turkish,Tuscan,Ukrainian,Uzbek,Venezuelan,Vietnamese
Boise,3.0,1.0,270.0,3.0,3.0,0.0,25.0,0.0,0.0,0.0,6.0,0.0,3.0,1.0,0.0,5.0,0.0,0.0,0.0,1.0,4.0,40.0,0.0,1.0,0.0,0.0,1.0,1.0,2.0,0.0,8.0,0.0,0.0,4.0,12.0,0.0,0.0,0.0,0.0,21.0,2.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,63.0,35.0,16.0,0.0,3.0,0.0,0.0,13.0,120.0,5.0,4.0,9.0,0.0,2.0,0.0,3.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,12.0,1.0,0.0,0.0,0.0,0.0,30.0,16.0,0.0,1.0,0.0,0.0,0.0,0.0,21.0
Edmonton,3.0,5.0,135.0,4.0,1.0,0.0,50.0,0.0,0.0,0.0,0.0,0.0,4.0,9.0,0.0,10.0,0.0,1.0,253.0,4.0,12.0,137.0,0.0,3.0,1.0,0.0,0.0,0.0,10.0,22.0,22.0,0.0,0.0,6.0,25.0,0.0,0.0,0.0,2.0,4.0,2.0,0.0,0.0,1.0,78.0,3.0,6.0,0.0,161.0,104.0,55.0,4.0,10.0,16.0,2.0,22.0,42.0,20.0,7.0,4.0,0.0,0.0,0.0,16.0,2.0,6.0,0.0,3.0,7.0,0.0,0.0,1.0,3.0,0.0,4.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,2.0,7.0,1.0,2.0,0.0,1.0,15.0,4.0,37.0,0.0,5.0,0.0,10.0,0.0,2.0,133.0
Indianapolis,3.0,10.0,893.0,2.0,2.0,0.0,65.0,0.0,0.0,0.0,2.0,2.0,6.0,9.0,7.0,50.0,0.0,0.0,0.0,5.0,16.0,154.0,0.0,5.0,0.0,3.0,1.0,1.0,5.0,5.0,15.0,0.0,0.0,7.0,41.0,0.0,0.0,2.0,0.0,10.0,1.0,2.0,0.0,0.0,51.0,1.0,18.0,0.0,226.0,75.0,23.0,2.0,30.0,0.0,1.0,29.0,359.0,28.0,2.0,2.0,4.0,10.0,1.0,14.0,5.0,0.0,7.0,0.0,0.0,3.0,0.0,1.0,7.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,56.0,4.0,0.0,0.0,0.0,5.0,102.0,53.0,0.0,4.0,0.0,0.0,0.0,5.0,29.0
Nashville,0.0,5.0,883.0,0.0,3.0,2.0,57.0,4.0,1.0,2.0,0.0,0.0,6.0,2.0,0.0,66.0,0.0,1.0,0.0,4.0,25.0,118.0,4.0,5.0,0.0,2.0,0.0,3.0,10.0,2.0,36.0,0.0,0.0,9.0,89.0,0.0,0.0,0.0,0.0,15.0,2.0,3.0,0.0,0.0,34.0,0.0,11.0,0.0,233.0,97.0,30.0,12.0,24.0,4.0,1.0,36.0,364.0,36.0,6.0,4.0,1.0,15.0,0.0,7.0,3.0,5.0,5.0,0.0,1.0,3.0,0.0,0.0,12.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,2.0,2.0,280.0,6.0,0.0,1.0,6.0,2.0,116.0,75.0,0.0,9.0,0.0,0.0,1.0,5.0,31.0
New Orleans,0.0,4.0,541.0,4.0,1.0,1.0,51.0,0.0,0.0,0.0,5.0,1.0,9.0,1.0,2.0,470.0,0.0,0.0,0.0,6.0,24.0,94.0,4.0,17.0,0.0,2.0,0.0,2.0,3.0,7.0,86.0,0.0,0.0,3.0,14.0,0.0,0.0,4.0,0.0,11.0,0.0,6.0,0.0,0.0,15.0,0.0,5.0,0.0,186.0,65.0,16.0,2.0,38.0,15.0,0.0,27.0,146.0,41.0,8.0,0.0,3.0,7.0,4.0,9.0,1.0,2.0,6.0,0.0,1.0,0.0,1.0,1.0,4.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,96.0,10.0,0.0,0.0,1.0,3.0,43.0,36.0,3.0,6.0,0.0,1.0,0.0,4.0,103.0
Philadelphia,19.0,37.0,2990.0,6.0,11.0,2.0,291.0,4.0,2.0,8.0,4.0,14.0,17.0,17.0,7.0,88.0,0.0,16.0,0.0,43.0,144.0,786.0,14.0,20.0,1.0,17.0,0.0,6.0,19.0,22.0,156.0,1.0,4.0,24.0,144.0,0.0,1.0,7.0,0.0,37.0,5.0,6.0,4.0,9.0,255.0,16.0,83.0,2.0,1885.0,427.0,185.0,17.0,83.0,12.0,11.0,189.0,682.0,140.0,32.0,3.0,27.0,25.0,0.0,96.0,28.0,11.0,18.0,22.0,12.0,26.0,0.0,11.0,16.0,2.0,1.0,1.0,3.0,0.0,12.0,8.0,5.0,0.0,1.0,118.0,59.0,0.0,2.0,46.0,29.0,187.0,255.0,9.0,32.0,2.0,3.0,7.0,7.0,224.0
Reno,1.0,0.0,306.0,0.0,0.0,0.0,36.0,0.0,0.0,1.0,3.0,0.0,1.0,2.0,0.0,4.0,0.0,0.0,0.0,3.0,0.0,62.0,0.0,0.0,1.0,0.0,0.0,0.0,3.0,10.0,8.0,0.0,0.0,6.0,16.0,0.0,0.0,0.0,0.0,18.0,1.0,0.0,0.0,0.0,11.0,0.0,3.0,0.0,94.0,34.0,20.0,0.0,7.0,0.0,0.0,7.0,226.0,5.0,2.0,3.0,2.0,4.0,0.0,5.0,3.0,2.0,2.0,0.0,0.0,0.0,0.0,2.0,8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,10.0,0.0,0.0,0.0,1.0,3.0,21.0,27.0,0.0,0.0,0.0,0.0,0.0,0.0,31.0
Santa Barbara,0.0,0.0,226.0,1.0,4.0,0.0,25.0,2.0,0.0,0.0,1.0,2.0,1.0,2.0,0.0,8.0,0.0,0.0,0.0,2.0,1.0,26.0,0.0,3.0,0.0,0.0,0.0,0.0,2.0,0.0,26.0,0.0,0.0,3.0,6.0,0.0,0.0,0.0,0.0,7.0,1.0,0.0,0.0,0.0,6.0,1.0,1.0,0.0,72.0,32.0,13.0,0.0,6.0,1.0,0.0,12.0,163.0,3.0,2.0,1.0,2.0,4.0,0.0,5.0,1.0,1.0,3.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,5.0,0.0,0.0,7.0,5.0,17.0,17.0,0.0,1.0,1.0,0.0,0.0,0.0,10.0
St. Louis,8.0,7.0,1237.0,5.0,3.0,5.0,62.0,0.0,0.0,1.0,4.0,2.0,6.0,7.0,0.0,58.0,1.0,0.0,0.0,4.0,13.0,278.0,1.0,5.0,2.0,3.0,0.0,0.0,8.0,13.0,33.0,0.0,0.0,16.0,59.0,0.0,0.0,0.0,0.0,5.0,4.0,3.0,0.0,0.0,29.0,1.0,30.0,0.0,403.0,77.0,55.0,2.0,18.0,7.0,1.0,46.0,327.0,26.0,9.0,3.0,2.0,8.0,1.0,23.0,6.0,4.0,4.0,1.0,1.0,0.0,0.0,4.0,0.0,0.0,1.0,1.0,1.0,0.0,2.0,1.0,0.0,0.0,0.0,61.0,1.0,0.0,3.0,7.0,4.0,109.0,58.0,0.0,7.0,2.0,0.0,0.0,0.0,50.0
Tampa-St. Petersburg,2.0,6.0,1731.0,6.0,6.0,4.0,144.0,3.0,2.0,2.0,9.0,3.0,11.0,23.0,2.0,68.0,0.0,0.0,0.0,11.0,98.0,289.0,37.0,191.0,1.0,17.0,0.0,2.0,3.0,17.0,39.0,0.0,0.0,28.0,219.0,0.0,0.0,7.0,0.0,40.0,0.0,1.0,1.0,0.0,75.0,0.0,43.0,0.0,679.0,164.0,46.0,9.0,77.0,16.0,3.0,79.0,444.0,49.0,18.0,4.0,5.0,28.0,0.0,23.0,13.0,3.0,25.0,9.0,2.0,55.0,0.0,5.0,3.0,1.0,2.0,1.0,2.0,0.0,1.0,4.0,0.0,0.0,0.0,137.0,43.0,0.0,2.0,9.0,4.0,171.0,189.0,5.0,12.0,1.0,2.0,0.0,22.0,104.0
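
The same metro-by-cuisine count table, and its row-normalized version, can also be built in one step with `pd.crosstab`, which may be easier to read than the `get_dummies` plus `groupby` route. This is a minimal sketch on toy data (the rows below are made up for illustration), not the method used in this notebook.

```python
import pandas as pd

# Toy data: one row per restaurant, mirroring the metro / single_cat layout
toy = pd.DataFrame({
    "metro": ["Boise", "Boise", "Boise", "Reno", "Reno"],
    "single_cat": ["Thai", "Thai", "Greek", "Thai", "Italian"],
})

# number of restaurants of each cuisine per metro
counts = pd.crosstab(toy["metro"], toy["single_cat"])

# proportion of each cuisine within a metro (each row sums to 1)
props = pd.crosstab(toy["metro"], toy["single_cat"], normalize="index")

print(counts)
print(props)
```

Cuisines that never occur in a metro appear as 0 in both tables, which matches the shape of the table above.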


In [24]:
# divide by the total number of restaurants in each metro to get the relative frequency (proportion)
df_freq = df_gb.div(df_gb.sum(axis=1), axis=0)
df_freq

metro,Afghan,African,American,Arabic,Argentine,Armenian,Asian Fusion,Australian,Austrian,Bangladeshi,Basque,Belgian,Brazilian,British,Burmese,Cajun/Creole,Calabrian,Cambodian,Canadian (New),Cantonese,Caribbean,Chinese,Colombian,Cuban,Czech,Dominican,Eastern European,Egyptian,Ethiopian,Filipino,French,Fuzhou,Georgian,German,Greek,Guamanian,Hainan,Haitian,Hakka,Hawaiian,Himalayan/Nepalese,Honduran,Hungarian,Iberian,Indian,Indonesian,Irish,Israeli,Italian,Japanese,Korean,Laotian,Latin American,Lebanese,Malaysian,Mediterranean,Mexican,Middle Eastern,Modern European,Mongolian,Moroccan,New Mexican Cuisine,Nicaraguan,Pakistani,Pan Asian,Persian/Iranian,Peruvian,Polish,Portuguese,Puerto Rican,Roman,Russian,Salvadoran,Sardinian,Scandinavian,Scottish,Senegalese,Serbo Croatian,Shanghainese,Sicilian,Singaporean,Somali,South African,Southern,Spanish,Sri Lankan,Syrian,Szechuan,Taiwanese,Tex-Mex,Thai,Trinidadian,Turkish,Tuscan,Ukrainian,Uzbek,Venezuelan,Vietnamese
Boise,0.003836,0.001279,0.345269,0.003836,0.003836,0.0,0.031969,0.0,0.0,0.0,0.007673,0.0,0.003836,0.001279,0.0,0.006394,0.0,0.0,0.0,0.001279,0.005115,0.051151,0.0,0.001279,0.0,0.0,0.001279,0.001279,0.002558,0.0,0.01023,0.0,0.0,0.005115,0.015345,0.0,0.0,0.0,0.0,0.026854,0.002558,0.0,0.0,0.0,0.006394,0.0,0.0,0.0,0.080563,0.044757,0.02046,0.0,0.003836,0.0,0.0,0.016624,0.153453,0.006394,0.005115,0.011509,0.0,0.002558,0.0,0.003836,0.001279,0.001279,0.0,0.0,0.0,0.001279,0.0,0.002558,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001279,0.0,0.0,0.0,0.015345,0.001279,0.0,0.0,0.0,0.0,0.038363,0.02046,0.0,0.001279,0.0,0.0,0.0,0.0,0.026854
Edmonton,0.00197,0.003283,0.088641,0.002626,0.000657,0.0,0.03283,0.0,0.0,0.0,0.0,0.0,0.002626,0.005909,0.0,0.006566,0.0,0.000657,0.16612,0.002626,0.007879,0.089954,0.0,0.00197,0.000657,0.0,0.0,0.0,0.006566,0.014445,0.014445,0.0,0.0,0.00394,0.016415,0.0,0.0,0.0,0.001313,0.002626,0.001313,0.0,0.0,0.000657,0.051215,0.00197,0.00394,0.0,0.105712,0.068286,0.036113,0.002626,0.006566,0.010506,0.001313,0.014445,0.027577,0.013132,0.004596,0.002626,0.0,0.0,0.0,0.010506,0.001313,0.00394,0.0,0.00197,0.004596,0.0,0.0,0.000657,0.00197,0.0,0.002626,0.000657,0.0,0.0,0.0,0.0,0.000657,0.0,0.001313,0.004596,0.000657,0.001313,0.0,0.000657,0.009849,0.002626,0.024294,0.0,0.003283,0.0,0.006566,0.0,0.001313,0.087328
Indianapolis,0.001208,0.004026,0.359501,0.000805,0.000805,0.0,0.026167,0.0,0.0,0.0,0.000805,0.000805,0.002415,0.003623,0.002818,0.020129,0.0,0.0,0.0,0.002013,0.006441,0.061997,0.0,0.002013,0.0,0.001208,0.000403,0.000403,0.002013,0.002013,0.006039,0.0,0.0,0.002818,0.016506,0.0,0.0,0.000805,0.0,0.004026,0.000403,0.000805,0.0,0.0,0.020531,0.000403,0.007246,0.0,0.090982,0.030193,0.009259,0.000805,0.012077,0.0,0.000403,0.011675,0.144525,0.011272,0.000805,0.000805,0.00161,0.004026,0.000403,0.005636,0.002013,0.0,0.002818,0.0,0.0,0.001208,0.0,0.000403,0.002818,0.0,0.000403,0.000403,0.0,0.0,0.0,0.0,0.0,0.0,0.000403,0.022544,0.00161,0.0,0.0,0.0,0.002013,0.041063,0.021337,0.0,0.00161,0.0,0.0,0.0,0.002013,0.011675
Nashville,0.0,0.001759,0.310697,0.0,0.001056,0.000704,0.020056,0.001407,0.000352,0.000704,0.0,0.0,0.002111,0.000704,0.0,0.023223,0.0,0.000352,0.0,0.001407,0.008797,0.04152,0.001407,0.001759,0.0,0.000704,0.0,0.001056,0.003519,0.000704,0.012667,0.0,0.0,0.003167,0.031316,0.0,0.0,0.0,0.0,0.005278,0.000704,0.001056,0.0,0.0,0.011963,0.0,0.003871,0.0,0.081985,0.034131,0.010556,0.004222,0.008445,0.001407,0.000352,0.012667,0.128079,0.012667,0.002111,0.001407,0.000352,0.005278,0.0,0.002463,0.001056,0.001759,0.001759,0.0,0.000352,0.001056,0.0,0.0,0.004222,0.0,0.0,0.0,0.0,0.0,0.0,0.000704,0.000352,0.000704,0.000704,0.098522,0.002111,0.0,0.000352,0.002111,0.000704,0.040816,0.02639,0.0,0.003167,0.0,0.0,0.000352,0.001759,0.010908
New Orleans,0.0,0.001752,0.236969,0.001752,0.000438,0.000438,0.022339,0.0,0.0,0.0,0.00219,0.000438,0.003942,0.000438,0.000876,0.205869,0.0,0.0,0.0,0.002628,0.010512,0.041174,0.001752,0.007446,0.0,0.000876,0.0,0.000876,0.001314,0.003066,0.03767,0.0,0.0,0.001314,0.006132,0.0,0.0,0.001752,0.0,0.004818,0.0,0.002628,0.0,0.0,0.00657,0.0,0.00219,0.0,0.081472,0.028471,0.007008,0.000876,0.016645,0.00657,0.0,0.011827,0.063951,0.017959,0.003504,0.0,0.001314,0.003066,0.001752,0.003942,0.000438,0.000876,0.002628,0.0,0.000438,0.0,0.000438,0.000438,0.001752,0.0,0.0,0.0,0.0,0.0,0.000438,0.0,0.0,0.0,0.0,0.04205,0.00438,0.0,0.0,0.000438,0.001314,0.018835,0.015769,0.001314,0.002628,0.0,0.000438,0.0,0.001752,0.045116
Philadelphia,0.001839,0.003582,0.289448,0.000581,0.001065,0.000194,0.02817,0.000387,0.000194,0.000774,0.000387,0.001355,0.001646,0.001646,0.000678,0.008519,0.0,0.001549,0.0,0.004163,0.01394,0.076089,0.001355,0.001936,9.7e-05,0.001646,0.0,0.000581,0.001839,0.00213,0.015102,9.7e-05,0.000387,0.002323,0.01394,0.0,9.7e-05,0.000678,0.0,0.003582,0.000484,0.000581,0.000387,0.000871,0.024685,0.001549,0.008035,0.000194,0.182478,0.041336,0.017909,0.001646,0.008035,0.001162,0.001065,0.018296,0.066021,0.013553,0.003098,0.00029,0.002614,0.00242,0.0,0.009293,0.002711,0.001065,0.001742,0.00213,0.001162,0.002517,0.0,0.001065,0.001549,0.000194,9.7e-05,9.7e-05,0.00029,0.0,0.001162,0.000774,0.000484,0.0,9.7e-05,0.011423,0.005712,0.0,0.000194,0.004453,0.002807,0.018103,0.024685,0.000871,0.003098,0.000194,0.00029,0.000678,0.000678,0.021684
Reno,0.000984,0.0,0.301181,0.0,0.0,0.0,0.035433,0.0,0.0,0.000984,0.002953,0.0,0.000984,0.001969,0.0,0.003937,0.0,0.0,0.0,0.002953,0.0,0.061024,0.0,0.0,0.000984,0.0,0.0,0.0,0.002953,0.009843,0.007874,0.0,0.0,0.005906,0.015748,0.0,0.0,0.0,0.0,0.017717,0.000984,0.0,0.0,0.0,0.010827,0.0,0.002953,0.0,0.09252,0.033465,0.019685,0.0,0.00689,0.0,0.0,0.00689,0.222441,0.004921,0.001969,0.002953,0.001969,0.003937,0.0,0.004921,0.002953,0.001969,0.001969,0.0,0.0,0.0,0.0,0.001969,0.007874,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000984,0.0,0.0,0.009843,0.0,0.0,0.0,0.000984,0.002953,0.020669,0.026575,0.0,0.0,0.0,0.0,0.0,0.0,0.030512
Santa Barbara,0.0,0.0,0.303356,0.001342,0.005369,0.0,0.033557,0.002685,0.0,0.0,0.001342,0.002685,0.001342,0.002685,0.0,0.010738,0.0,0.0,0.0,0.002685,0.001342,0.034899,0.0,0.004027,0.0,0.0,0.0,0.0,0.002685,0.0,0.034899,0.0,0.0,0.004027,0.008054,0.0,0.0,0.0,0.0,0.009396,0.001342,0.0,0.0,0.0,0.008054,0.001342,0.001342,0.0,0.096644,0.042953,0.01745,0.0,0.008054,0.001342,0.0,0.016107,0.218792,0.004027,0.002685,0.001342,0.002685,0.005369,0.0,0.006711,0.001342,0.001342,0.004027,0.0,0.0,0.0,0.0,0.0,0.001342,0.0,0.001342,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.002685,0.006711,0.0,0.0,0.009396,0.006711,0.022819,0.022819,0.0,0.001342,0.001342,0.0,0.0,0.0,0.013423
St. Louis,0.002462,0.002155,0.380733,0.001539,0.000923,0.001539,0.019083,0.0,0.0,0.000308,0.001231,0.000616,0.001847,0.002155,0.0,0.017852,0.000308,0.0,0.0,0.001231,0.004001,0.085565,0.000308,0.001539,0.000616,0.000923,0.0,0.0,0.002462,0.004001,0.010157,0.0,0.0,0.004925,0.018159,0.0,0.0,0.0,0.0,0.001539,0.001231,0.000923,0.0,0.0,0.008926,0.000308,0.009234,0.0,0.124038,0.0237,0.016928,0.000616,0.00554,0.002155,0.000308,0.014158,0.100646,0.008002,0.00277,0.000923,0.000616,0.002462,0.000308,0.007079,0.001847,0.001231,0.001231,0.000308,0.000308,0.0,0.0,0.001231,0.0,0.0,0.000308,0.000308,0.000308,0.0,0.000616,0.000308,0.0,0.0,0.0,0.018775,0.000308,0.0,0.000923,0.002155,0.001231,0.033549,0.017852,0.0,0.002155,0.000616,0.0,0.0,0.0,0.015389
Tampa-St. Petersburg,0.000356,0.001069,0.308446,0.001069,0.001069,0.000713,0.025659,0.000535,0.000356,0.000356,0.001604,0.000535,0.00196,0.004098,0.000356,0.012117,0.0,0.0,0.0,0.00196,0.017463,0.051497,0.006593,0.034034,0.000178,0.003029,0.0,0.000356,0.000535,0.003029,0.006949,0.0,0.0,0.004989,0.039024,0.0,0.0,0.001247,0.0,0.007128,0.0,0.000178,0.000178,0.0,0.013364,0.0,0.007662,0.0,0.120991,0.029223,0.008197,0.001604,0.013721,0.002851,0.000535,0.014077,0.079116,0.008731,0.003207,0.000713,0.000891,0.004989,0.0,0.004098,0.002316,0.000535,0.004455,0.001604,0.000356,0.0098,0.0,0.000891,0.000535,0.000178,0.000356,0.000178,0.000356,0.0,0.000178,0.000713,0.0,0.0,0.0,0.024412,0.007662,0.0,0.000356,0.001604,0.000713,0.03047,0.033678,0.000891,0.002138,0.000178,0.000356,0.0,0.00392,0.018532


In [164]:
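def get_cum_freq_metro(df_freq, metro):
    """Return one metro's cumulative cuisine share (%), most common cuisine first."""
    # Sort the metro's single row from most to least common cuisine,
    # transpose so cuisines become rows, and convert fractions to percentages.
    df = df_freq.loc[[metro]].sort_values(
        axis=1,
        by=metro,
        ascending=False).T * 100
    df['cum_sum'] = df[metro].cumsum()
    # Drop the cuisine names so metros can later be merged on rank position.
    df.reset_index(drop=True, inplace=True)
    df = df.drop(metro, axis=1)
    df.columns = [metro]
    # Expose the rank position (0 = most common cuisine) as an 'index' column.
    df = df.reset_index()
    return df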

In [145]:
comp = df_freq.loc[['Santa Barbara', 'Philadelphia']].sort_values(
    axis=1,
    by='Philadelphia',
    ascending=False) * 100
comp

metro,American,Italian,Chinese,Mexican,Japanese,Asian Fusion,Thai,Indian,Vietnamese,Mediterranean,Tex-Mex,Korean,French,Caribbean,Greek,Middle Eastern,Southern,Pakistani,Cajun/Creole,Irish,Latin American,Spanish,Szechuan,Cantonese,Hawaiian,African,Modern European,Turkish,Taiwanese,Pan Asian,Moroccan,Puerto Rican,New Mexican Cuisine,German,Filipino,Polish,Cuban,Afghan,Ethiopian,Peruvian,Laotian,Dominican,Brazilian,British,Salvadoran,Indonesian,Cambodian,Colombian,Belgian,Portuguese,Shanghainese,Lebanese,Persian/Iranian,Malaysian,Russian,Argentine,Trinidadian,Iberian,Sicilian,Bangladeshi,Venezuelan,Uzbek,Haitian,Burmese,Egyptian,Arabic,Honduran,Himalayan/Nepalese,Singaporean,Basque,Hungarian,Australian,Georgian,Mongolian,Senegalese,Ukrainian,Tuscan,Austrian,Syrian,Armenian,Israeli,Sardinian,Scottish,South African,Czech,Scandinavian,Hainan,Fuzhou,Guamanian,Nicaraguan,Hakka,Serbo Croatian,Roman,Somali,Canadian (New),Sri Lankan,Eastern European,Calabrian
Santa Barbara,30.33557,9.66443,3.489933,21.879195,4.295302,3.355705,2.281879,0.805369,1.342282,1.610738,2.281879,1.744966,3.489933,0.134228,0.805369,0.402685,0.268456,0.671141,1.073826,0.134228,0.805369,0.671141,0.939597,0.268456,0.939597,0.0,0.268456,0.134228,0.671141,0.134228,0.268456,0.0,0.536913,0.402685,0.0,0.0,0.402685,0.0,0.268456,0.402685,0.0,0.0,0.134228,0.268456,0.134228,0.134228,0.0,0.0,0.268456,0.0,0.0,0.134228,0.134228,0.0,0.0,0.536913,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.134228,0.0,0.134228,0.0,0.134228,0.0,0.268456,0.0,0.134228,0.0,0.0,0.134228,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.134228,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Philadelphia,28.944821,18.247822,7.608906,6.60213,4.133591,2.817038,2.468538,2.468538,2.168441,1.829622,1.810261,1.7909,1.510165,1.393998,1.393998,1.355276,1.142304,0.929332,0.851888,0.803485,0.803485,0.571152,0.445305,0.416263,0.35818,0.35818,0.309777,0.309777,0.280736,0.271055,0.261375,0.251694,0.242014,0.232333,0.212972,0.212972,0.193611,0.18393,0.18393,0.17425,0.164569,0.164569,0.164569,0.164569,0.154889,0.154889,0.154889,0.135528,0.135528,0.116167,0.116167,0.116167,0.106486,0.106486,0.106486,0.106486,0.087125,0.087125,0.077444,0.077444,0.067764,0.067764,0.067764,0.067764,0.058083,0.058083,0.058083,0.048403,0.048403,0.038722,0.038722,0.038722,0.038722,0.029042,0.029042,0.029042,0.019361,0.019361,0.019361,0.019361,0.019361,0.019361,0.009681,0.009681,0.009681,0.009681,0.009681,0.009681,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [168]:
dfList = [get_cum_freq_metro(df_freq, metro) for metro in list(richness['Metro'])]

In [170]:
from functools import reduce
df_t = reduce(lambda x, y: x.merge(y, on='index'), dfList)
df_t

Unnamed: 0,index,Boise,Edmonton,Indianapolis,Nashville,New Orleans,Philadelphia,Reno,Santa Barbara,St. Louis,Tampa-St. Petersburg,Tucson
0,0,34.526854,16.61195,35.950081,31.069669,23.69689,28.944821,30.11811,30.33557,38.073253,30.844619,27.911164
1,1,49.872123,27.183191,50.402576,43.877551,44.283837,47.192643,52.362205,52.214765,50.47707,42.943692,55.762305
2,2,57.928389,36.178595,59.500805,53.729768,52.431012,54.801549,61.614173,61.879195,60.541705,50.85531,65.666267
3,3,63.043478,45.042679,65.700483,61.92822,58.826106,61.403679,67.716535,66.174497,69.098184,56.004989,71.308523
4,4,67.519182,53.775443,69.806763,66.080225,63.337714,65.53727,71.259843,69.66443,72.453062,59.907341,74.789916
5,5,71.355499,60.604071,72.826087,70.161858,67.542707,68.354308,74.606299,73.154362,74.823022,63.310763,78.031212
6,6,74.55243,65.725542,75.442834,73.574947,71.660096,70.822846,77.65748,76.510067,76.731302,66.678546,80.372149
7,7,77.237852,69.336835,77.697262,76.706545,75.42707,73.291384,80.314961,78.791946,78.608803,69.725588,82.292917
8,8,79.923274,72.619829,79.830918,79.345531,78.274201,75.459826,82.38189,81.073826,80.424746,72.647897,83.793517
9,9,81.969309,75.377544,81.884058,81.66784,80.508103,77.289448,84.350394,82.818792,82.209911,75.213828,85.294118
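
The cumulative table above reads as "what share of a metro's restaurants do its top *k* cuisines cover?" A minimal sketch of the same idea on toy data follows. The metro names `A` and `B`, the three cuisines, and the helper `cumulative_share` are all made up for illustration, and `pd.concat` is swapped in for the `reduce`/`merge` step used above.

```python
import pandas as pd

# Toy relative frequencies (each row sums to 1); not real Yelp data.
freq = pd.DataFrame(
    {'American': [0.5, 0.2], 'Thai': [0.1, 0.5], 'Greek': [0.4, 0.3]},
    index=['A', 'B'],
)

def cumulative_share(freq, metro):
    """Cumulative % of restaurants covered by the k most common cuisines."""
    shares = freq.loc[metro].sort_values(ascending=False) * 100
    # Drop cuisine names so every metro is indexed by rank position 0, 1, 2, ...
    return shares.cumsum().reset_index(drop=True).rename(metro)

# One column per metro, aligned on rank position.
curves = pd.concat([cumulative_share(freq, m) for m in freq.index], axis=1)
print(curves)
```

A metro whose curve climbs quickly is dominated by a few cuisines; a slowly rising curve indicates a more even mix.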


In [25]:
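import numpy as np

def shannon_H(columns):
    """Shannon index H = -sum(p * ln p) over the nonzero proportions."""
    row_sum = 0
    for col in columns:
        if col > 0:  # skip absent cuisines; ln(0) is undefined
            row_sum += -(col*np.log(col))
    return row_sum

def simpson_D(columns):
    """Inverse Simpson index D = 1 / sum(p ** 2)."""
    row_sum = 0
    for col in columns:
        if col > 0:
            row_sum += col ** 2
    return 1/row_sum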
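
To build intuition for the two indices, here is a small vectorized sketch evaluated on two toy distributions. The function names `shannon_entropy` and `inverse_simpson` and the example proportions are assumptions for illustration; they are meant to mirror the loop-based functions above, not replace them.

```python
import numpy as np

def shannon_entropy(p):
    """H = -sum(p * ln p), ignoring zero proportions."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def inverse_simpson(p):
    """D = 1 / sum(p ** 2)."""
    p = np.asarray(p, dtype=float)
    return float(1.0 / (p ** 2).sum())

# Four cuisines in both cases, so richness is identical (4).
even = [0.25, 0.25, 0.25, 0.25]    # perfectly even
skewed = [0.85, 0.05, 0.05, 0.05]  # one dominant cuisine

for name, p in [('even', even), ('skewed', skewed)]:
    print(name, round(shannon_entropy(p), 3), round(inverse_simpson(p), 3))
```

With perfectly even proportions over S categories, H equals ln(S) and D equals S; both values fall as one category comes to dominate, even though richness is unchanged.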

In [26]:
df_total = df_gb.sum(axis=1).reset_index()
df_total.columns = ['Metro', 'Total Restaurants']

In [27]:
df_h = df_freq.apply(shannon_H, axis=1).reset_index()
df_h.columns = ['Metro', "Shannon H"]

In [28]:
df_d = df_freq.apply(simpson_D, axis=1).reset_index()
df_d.columns = ['Metro', "Simpson D"]

In [29]:
df_total = df_total.merge(df_h, on='Metro').merge(df_d, on='Metro')

In [30]:
df_total

Unnamed: 0,Metro,Total Restaurants,Shannon H,Simpson D
0,Boise,782.0,2.538551,6.249223
1,Edmonton,1523.0,3.054412,13.290526
2,Indianapolis,2484.0,2.536763,5.933463
3,Nashville,2842.0,2.66396,7.259984
4,New Orleans,2283.0,2.737613,8.378635
5,Philadelphia,10330.0,2.771906,7.471911
6,Reno,1016.0,2.46865,6.30524
7,Santa Barbara,745.0,2.529904,6.359059
8,St. Louis,3249.0,2.475266,5.481208
9,Tampa-St. Petersburg,5612.0,2.839989,7.837189


In [31]:
df_total = df_total.merge(richness, on='Metro')

In [32]:
df_total['Abbr'] = ['Boi', 'Edm', 'Ind', 'Nash', 'NO', 'Phi',
                    'Reno', 'SB', 'STL', 'T-SP', 'Tuc']

In [98]:
df_total['Abbr'] = ['B', 'E', 'I', 'Na', 'NO', 'P',
                    'R', 'SB', 'STL', 'T-SP', 'Tu']

In [33]:
df_total

Unnamed: 0,Metro,Total Restaurants,Shannon H,Simpson D,Richness,Abbr
0,Boise,782.0,2.538551,6.249223,45,Boi
1,Edmonton,1523.0,3.054412,13.290526,62,Edm
2,Indianapolis,2484.0,2.536763,5.933463,62,Ind
3,Nashville,2842.0,2.66396,7.259984,65,Nash
4,New Orleans,2283.0,2.737613,8.378635,62,NO
5,Philadelphia,10330.0,2.771906,7.471911,88,Phi
6,Reno,1016.0,2.46865,6.30524,44,Reno
7,Santa Barbara,745.0,2.529904,6.359059,50,SB
8,St. Louis,3249.0,2.475266,5.481208,69,STL
9,Tampa-St. Petersburg,5612.0,2.839989,7.837189,77,T-SP


In [99]:
colorscheme = [
'#003f5c',
'#2f4b7c',
'#665191',
'#a05195',
'#d45087',
'#f95d6a',
'#ff7c43',
'#ffa600',
'#4CAF50',
'#2196F3',
'#4C6EF5',

]

base = alt.Chart(df_total).mark_point(filled=True, size=90).encode(
    x=alt.X('Total Restaurants', 
            scale=alt.Scale(domain=[0, 10500]),
            axis=alt.Axis(grid=False, 
                          tickCount = 3, 
                          labelExpr='datum.value / 1000 + "K"',
                         labelFontSize=14,
                         titleFontSize=15),
           ),
    y=alt.Y("Shannon H", scale=alt.Scale(domain=[2.3, 3.1]),
            axis=alt.Axis(grid=False,
                          tickCount = 3,
                          labelFontSize=14,
                          titleFontSize=15)),
)

scatter = base.encode(
    color=alt.Color('Metro', 
                    scale=alt.Scale(range=colorscheme)),
                    #scale=alt.Scale(scheme='pinkyellowgreen')),
    tooltip=['Metro', 'Total Restaurants', "Shannon H"]
).interactive()

text = base.mark_text(
    align='left',
    baseline='middle',
    dx=7,
    fontSize=14
).encode(
    text='Abbr'
)

line = base.mark_line(color='grey', strokeDash=(4,4)
                     ).transform_regression('Total Restaurants', "Shannon H")
  
scatter + text + line
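
The dashed regression line gives a rough way to ask which metros are more diverse than expected for their size: metros above the line have a positive residual. The sketch below assumes an ordinary least-squares fit via `np.polyfit` and uses only the ten rows displayed in the `df_total` output above; the column names `Expected H` and `Residual` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Values copied from the ten rows shown in the df_total output.
df = pd.DataFrame({
    'Metro': ['Boise', 'Edmonton', 'Indianapolis', 'Nashville', 'New Orleans',
              'Philadelphia', 'Reno', 'Santa Barbara', 'St. Louis',
              'Tampa-St. Petersburg'],
    'Total Restaurants': [782, 1523, 2484, 2842, 2283,
                          10330, 1016, 745, 3249, 5612],
    'Shannon H': [2.538551, 3.054412, 2.536763, 2.66396, 2.737613,
                  2.771906, 2.46865, 2.529904, 2.475266, 2.839989],
}).set_index('Metro')

# Least-squares line: H = slope * restaurants + intercept.
slope, intercept = np.polyfit(df['Total Restaurants'], df['Shannon H'], deg=1)

df['Expected H'] = slope * df['Total Restaurants'] + intercept
df['Residual'] = df['Shannon H'] - df['Expected H']

# Positive residual: more diverse than the line predicts for that size.
print(df['Residual'].sort_values(ascending=False).round(3))
```

In these ten rows Edmonton sits furthest above the fitted line. With so few points the fit is only a rough guide, not a formal test.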

In [105]:
colorscheme = [
'#003f5c',
'#2f4b7c',
'#665191',
'#a05195',
'#d45087',
'#f95d6a',
'#ff7c43',
'#ffa600',
'#4CAF50',
'#2196F3',
'#4C6EF5',

]

base = alt.Chart(df_total).mark_point(filled=True, size=90).encode(
    x=alt.X('Total Restaurants', 
            scale=alt.Scale(domain=[0, 10500]),
            axis=alt.Axis(grid=False, 
                          tickCount = 3, 
                          labelExpr='datum.value / 1000 + "K"',
                         labelFontSize=14,
                         titleFontSize=15),
           ),
    y=alt.Y("Simpson D", scale=alt.Scale(domain=[5, 15]),
            axis=alt.Axis(grid=False,
                          tickCount = 4,
                          labelFontSize=14,
                          titleFontSize=15)),
)

scatter = base.encode(
    color=alt.Color('Metro', 
                    scale=alt.Scale(range=colorscheme)),
                    #scale=alt.Scale(scheme='pinkyellowgreen')),
    tooltip=['Metro', 'Total Restaurants', "Simpson D"]
).interactive()

text = base.mark_text(
    align='left',
    baseline='middle',
    dx=7,
    fontSize=14
).encode(
    text='Abbr'
)

line = base.mark_line(color='grey', strokeDash=(4,4)
                     ).transform_regression('Total Restaurants', "Simpson D")
  
scatter + text + line

In [100]:
colorscheme = [
'#003f5c',
'#2f4b7c',
'#665191',
'#a05195',
'#d45087',
'#f95d6a',
'#ff7c43',
'#ffa600',
'#4CAF50',
'#2196F3',
'#4C6EF5',

]

base = alt.Chart(df_total).mark_point(filled=True, size=90).encode(
    x=alt.X('Total Restaurants', 
            scale=alt.Scale(domain=[0, 10500]),
            axis=alt.Axis(grid=False, 
                          tickCount = 3, 
                          labelExpr='datum.value / 1000 + "K"',
                         labelFontSize=14,
                         titleFontSize=15),
           ),
    y=alt.Y("Richness", scale=alt.Scale(domain=[40, 90]),
            axis=alt.Axis(grid=False,
                          tickCount = 3,
                          labelFontSize=14,
                          titleFontSize=15)),
)

scatter = base.encode(
    color=alt.Color('Metro', 
                    scale=alt.Scale(range=colorscheme)),
    tooltip=['Metro', 'Total Restaurants', "Richness"]
).interactive()

text = base.mark_text(
    align='left',
    baseline='middle',
    dx=7,
    fontSize=14
).encode(
    text='Abbr'
)

line = base.mark_line(color='grey', strokeDash=(4,4)
                     ).transform_regression('Total Restaurants', "Richness")
  
scatter + text + line

In [117]:
# for poster; labels created ad hoc


base = alt.Chart(df_total).mark_point(filled=True, size=90).encode(
    x=alt.X('Total Restaurants', 
            scale=alt.Scale(domain=[0, 10500]),
            axis=alt.Axis(grid=False, 
                          tickCount = 3, 
                          labelExpr='datum.value / 1000 + "K"',
                         labelFontSize=14,
                         titleFontSize=15),
           ),
    y=alt.Y("Shannon H", scale=alt.Scale(domain=[2.3, 3.1]),
            axis=alt.Axis(grid=False,
                          tickCount = 3,
                          labelFontSize=14,
                          titleFontSize=15)),
).properties(
    width=200,
    height=200
)


scatter = base.encode(
    color=alt.Color('Metro', 
                    scale=alt.Scale(range=colorscheme)),
                    #scale=alt.Scale(scheme='pinkyellowgreen')),
    tooltip=['Metro', 'Total Restaurants', "Shannon H"]
).interactive()

line = base.mark_line(color='grey', strokeDash=(4,4)
                     ).transform_regression('Total Restaurants', "Shannon H")
  
shannon_h = scatter + line

In [123]:
# for poster; labels created ad hoc

base2 = alt.Chart(df_total).mark_point(filled=True, size=90).encode(
    x=alt.X('Total Restaurants', 
            scale=alt.Scale(domain=[0, 10500]),
            axis=alt.Axis(grid=False, 
                          tickCount = 3, 
                          labelExpr='datum.value / 1000 + "K"',
                         labelFontSize=14,
                         titleFontSize=15),
           ),
    y=alt.Y("Simpson D", scale=alt.Scale(domain=[5, 15]),
            axis=alt.Axis(grid=False,
                          tickCount = 4,
                          labelFontSize=14,
                          titleFontSize=15)),
).properties(
    width=200,
    height=200
)


scatter2 = base2.encode(
    color=alt.Color('Metro', 
                    scale=alt.Scale(range=colorscheme)),
                    #scale=alt.Scale(scheme='pinkyellowgreen')),
    tooltip=['Metro', 'Total Restaurants', "Simpson D"]
).interactive()
  
scatter2

In [126]:
# for poster; labels created ad hoc

base3 = alt.Chart(df_total).mark_point(filled=True, size=90).encode(
    x=alt.X('Total Restaurants', 
            scale=alt.Scale(domain=[0, 10500]),
            axis=alt.Axis(grid=False, 
                          tickCount = 3, 
                          labelExpr='datum.value / 1000 + "K"',
                         labelFontSize=14,
                         titleFontSize=15),
           ),
    y=alt.Y("Richness", scale=alt.Scale(domain=[40, 95]),
            axis=alt.Axis(grid=False,
                          tickCount = 3,
                          labelFontSize=14,
                          titleFontSize=15)),
).properties(
    width=200,
    height=200
)

scatter3 = base3.encode(
    color=alt.Color('Metro', 
                    scale=alt.Scale(range=colorscheme)),
    tooltip=['Metro', 'Total Restaurants', "Richness"]
).interactive()


line3 = base3.mark_line(color='grey', strokeDash=(4,4)
                     ).transform_regression('Total Restaurants', "Richness")
  
richness_plot = scatter3 + line3

In [171]:
# `&` stacks vertically and binds tighter than `|`, which places charts side by side.
(richness_plot & scatter2) | shannon_h

### Qualitative investigation

In [129]:
# Which cuisines is each metro relatively strongest in? Rank the metros within each cuisine by its share of restaurants (1 = highest share).
df_freq.rank(ascending=False).T.sort_values('Edmonton')

metro,Boise,Edmonton,Indianapolis,Nashville,New Orleans,Philadelphia,Reno,Santa Barbara,St. Louis,Tampa-St. Petersburg,Tucson
Vietnamese,4.0,1.0,10.0,11.0,2.0,5.0,3.0,9.0,8.0,7.0,6.0
Indian,11.0,1.0,3.0,5.0,10.0,2.0,6.0,8.0,7.0,4.0,9.0
Filipino,10.5,1.0,7.0,9.0,4.0,6.0,2.0,10.5,3.0,5.0,8.0
Ethiopian,5.0,1.0,7.0,2.0,9.0,8.0,3.0,4.0,6.0,11.0,10.0
Indonesian,8.5,1.0,4.0,8.5,8.5,2.0,8.5,3.0,5.0,8.5,8.5
Japanese,2.0,1.0,8.0,6.0,10.0,4.0,7.0,3.0,11.0,9.0,5.0
Korean,2.0,1.0,9.0,8.0,11.0,4.0,3.0,5.0,6.0,10.0,7.0
Lebanese,10.0,1.0,10.0,5.0,2.0,7.0,10.0,6.0,4.0,3.0,8.0
Chinese,8.0,1.0,4.0,9.0,10.0,3.0,5.0,11.0,2.0,7.0,6.0
Malaysian,9.5,1.0,5.0,6.0,9.5,2.0,9.5,9.5,7.0,4.0,3.0


In [36]:
# How common is each cuisine in each metro?
df_freq.rank(ascending=False, axis=1, method='average')

metro,Afghan,African,American,Arabic,Argentine,Armenian,Asian Fusion,Australian,Austrian,Bangladeshi,Basque,Belgian,Brazilian,British,Burmese,Cajun/Creole,Calabrian,Cambodian,Canadian (New),Cantonese,Caribbean,Chinese,Colombian,Cuban,Czech,Dominican,Eastern European,Egyptian,Ethiopian,Filipino,French,Fuzhou,Georgian,German,Greek,Guamanian,Hainan,Haitian,Hakka,Hawaiian,Himalayan/Nepalese,Honduran,Hungarian,Iberian,Indian,Indonesian,Irish,Israeli,Italian,Japanese,Korean,Laotian,Latin American,Lebanese,Malaysian,Mediterranean,Mexican,Middle Eastern,Modern European,Mongolian,Moroccan,New Mexican Cuisine,Nicaraguan,Pakistani,Pan Asian,Persian/Iranian,Peruvian,Polish,Portuguese,Puerto Rican,Roman,Russian,Salvadoran,Sardinian,Scandinavian,Scottish,Senegalese,Serbo Croatian,Shanghainese,Sicilian,Singaporean,Somali,South African,Southern,Spanish,Sri Lankan,Syrian,Szechuan,Taiwanese,Tex-Mex,Thai,Trinidadian,Turkish,Tuscan,Ukrainian,Uzbek,Venezuelan,Vietnamese
Boise,26.5,39.5,1.0,26.5,26.5,72.0,7.0,72.0,72.0,72.0,17.0,72.0,26.5,39.5,72.0,19.0,72.0,72.0,72.0,39.5,22.0,4.0,72.0,39.5,72.0,72.0,39.5,39.5,31.5,72.0,16.0,72.0,72.0,22.0,13.5,72.0,72.0,72.0,72.0,8.5,31.5,72.0,72.0,72.0,19.0,72.0,72.0,72.0,3.0,5.0,10.5,72.0,26.5,72.0,72.0,12.0,2.0,19.0,22.0,15.0,72.0,31.5,72.0,26.5,39.5,39.5,72.0,72.0,72.0,39.5,72.0,31.5,72.0,72.0,72.0,72.0,72.0,72.0,72.0,39.5,72.0,72.0,72.0,13.5,39.5,72.0,72.0,72.0,72.0,6.0,10.5,72.0,39.5,72.0,72.0,72.0,72.0,8.5
Edmonton,44.0,32.5,4.0,37.5,58.0,80.5,9.0,80.5,80.5,80.5,80.5,80.5,37.5,25.0,80.5,22.5,80.5,58.0,1.0,37.5,20.0,3.0,80.5,44.0,58.0,80.5,80.5,80.5,22.5,14.0,14.0,80.5,80.5,30.0,12.0,80.5,80.5,80.5,50.0,37.5,50.0,80.5,80.5,58.0,7.0,44.0,30.0,80.5,2.0,6.0,8.0,37.5,22.5,17.5,50.0,14.0,10.0,16.0,27.0,37.5,80.5,80.5,80.5,17.5,50.0,30.0,80.5,44.0,27.0,80.5,80.5,58.0,44.0,80.5,37.5,58.0,80.5,80.5,80.5,80.5,58.0,80.5,50.0,27.0,58.0,50.0,80.5,58.0,19.0,37.5,11.0,80.5,32.5,80.5,22.5,80.5,50.0,5.0
Indianapolis,42.0,23.0,1.0,48.0,48.0,80.5,7.0,80.5,80.5,80.5,48.0,48.0,30.0,25.0,27.5,11.0,80.5,80.5,80.5,34.0,19.0,4.0,80.5,34.0,80.5,42.0,57.5,57.5,34.0,34.0,20.0,80.5,80.5,27.5,12.0,80.5,80.5,48.0,80.5,23.0,57.5,48.0,80.5,80.5,10.0,57.5,18.0,80.5,3.0,6.0,17.0,48.0,13.0,80.5,57.5,14.5,2.0,16.0,48.0,48.0,39.0,23.0,57.5,21.0,34.0,80.5,27.5,80.5,80.5,42.0,80.5,57.5,27.5,80.5,57.5,57.5,80.5,80.5,80.5,80.5,80.5,80.5,57.5,8.0,39.0,80.5,80.5,80.5,34.0,5.0,9.0,80.5,39.0,80.5,80.5,80.5,34.0,14.5
Nashville,82.0,35.0,1.0,82.0,45.0,52.5,11.0,40.0,61.5,52.5,82.0,82.0,30.5,52.5,82.0,10.0,82.0,61.5,82.0,40.0,18.0,5.0,40.0,35.0,82.0,52.5,82.0,45.0,25.0,52.5,13.0,82.0,82.0,26.5,8.0,82.0,82.0,82.0,82.0,20.5,52.5,45.0,82.0,82.0,15.0,82.0,24.0,82.0,4.0,7.0,17.0,22.5,19.0,40.0,61.5,13.0,2.0,13.0,30.5,40.0,61.5,20.5,82.0,28.0,45.0,35.0,35.0,82.0,61.5,45.0,82.0,82.0,22.5,82.0,82.0,82.0,82.0,82.0,82.0,52.5,61.5,52.5,52.5,3.0,30.5,82.0,61.5,30.5,52.5,6.0,9.0,82.0,26.5,82.0,82.0,61.5,35.0,16.0
New Orleans,80.5,38.0,1.0,38.0,57.0,57.0,10.0,80.5,80.5,80.5,33.5,57.0,24.5,57.0,49.0,2.0,80.5,80.5,80.5,30.5,16.0,7.0,38.0,17.0,80.5,49.0,80.5,49.0,44.0,27.5,8.0,80.5,80.5,44.0,21.0,80.5,80.5,38.0,80.5,22.0,80.5,30.5,80.5,80.5,19.5,80.5,33.5,80.5,3.0,9.0,18.0,49.0,13.0,19.5,80.5,15.0,4.0,12.0,26.0,80.5,44.0,27.5,38.0,24.5,57.0,49.0,30.5,80.5,57.0,80.5,57.0,57.0,38.0,80.5,80.5,80.5,80.5,80.5,57.0,80.5,80.5,80.5,80.5,6.0,23.0,80.5,80.5,57.0,44.0,11.0,14.0,44.0,30.5,80.5,57.0,80.5,38.0,5.0
Philadelphia,38.5,25.5,1.0,66.0,54.5,79.5,6.0,71.5,79.5,59.5,71.5,48.5,42.5,42.5,62.5,19.0,93.5,46.0,93.5,24.0,14.5,3.0,48.5,37.0,85.5,42.5,93.5,66.0,38.5,35.5,13.0,85.5,71.5,34.0,14.5,93.5,85.5,62.5,93.5,25.5,68.5,66.0,71.5,57.5,7.5,46.0,20.5,79.5,2.0,5.0,12.0,42.5,20.5,51.0,54.5,10.0,4.0,16.0,27.5,75.0,31.0,33.0,93.5,18.0,30.0,54.5,40.0,35.5,51.0,32.0,93.5,54.5,46.0,79.5,85.5,85.5,75.0,93.5,51.0,59.5,68.5,93.5,85.5,17.0,22.0,93.5,79.5,23.0,29.0,11.0,7.5,57.5,27.5,79.5,75.0,62.5,62.5,9.0
Reno,41.0,71.5,1.0,71.5,71.5,71.5,5.0,71.5,71.5,41.0,28.0,71.5,41.0,34.5,71.5,23.5,71.5,71.5,71.5,28.0,71.5,4.0,71.5,71.5,41.0,71.5,71.5,71.5,28.0,14.5,16.5,71.5,71.5,20.0,12.0,71.5,71.5,71.5,71.5,11.0,41.0,71.5,71.5,71.5,13.0,71.5,28.0,71.5,3.0,6.0,10.0,71.5,18.5,71.5,71.5,18.5,2.0,21.5,34.5,28.0,34.5,23.5,71.5,21.5,28.0,34.5,34.5,71.5,71.5,71.5,71.5,34.5,16.5,71.5,71.5,71.5,71.5,71.5,71.5,71.5,41.0,71.5,71.5,14.5,71.5,71.5,71.5,41.0,28.0,9.0,8.0,71.5,71.5,71.5,71.5,71.5,71.5,7.0
Santa Barbara,74.5,74.5,1.0,43.0,22.5,74.5,7.0,31.5,74.5,74.5,43.0,31.5,43.0,31.5,74.5,13.0,74.5,74.5,74.5,31.5,43.0,5.5,74.5,25.5,74.5,74.5,74.5,74.5,31.5,74.5,5.5,74.5,74.5,25.5,17.0,74.5,74.5,74.5,74.5,14.5,43.0,74.5,74.5,74.5,17.0,43.0,43.0,74.5,3.0,4.0,10.0,74.5,17.0,43.0,74.5,11.0,2.0,25.5,31.5,43.0,31.5,22.5,74.5,20.0,43.0,43.0,25.5,74.5,74.5,74.5,74.5,74.5,43.0,74.5,43.0,74.5,74.5,74.5,74.5,74.5,74.5,74.5,74.5,31.5,20.0,74.5,74.5,14.5,20.0,8.5,8.5,74.5,43.0,43.0,74.5,74.5,74.5,12.0
St. Louis,26.0,30.0,1.0,36.5,48.0,36.5,7.0,84.0,84.0,63.0,42.0,53.5,33.5,30.0,84.0,10.5,63.0,84.0,84.0,42.0,22.5,4.0,63.0,36.5,53.5,48.0,84.0,84.0,26.0,22.5,15.0,84.0,84.0,21.0,9.0,84.0,84.0,84.0,84.0,36.5,42.0,48.0,84.0,84.0,17.0,63.0,16.0,84.0,2.0,6.0,12.0,53.5,20.0,30.0,63.0,14.0,3.0,18.0,24.0,48.0,53.5,26.0,63.0,19.0,33.5,42.0,42.0,63.0,63.0,84.0,84.0,42.0,84.0,84.0,63.0,63.0,63.0,84.0,53.5,63.0,84.0,84.0,84.0,8.0,63.0,84.0,48.0,30.0,42.0,5.0,10.5,84.0,30.0,53.5,84.0,84.0,84.0,13.0
Tampa-St. Petersburg,65.5,46.0,1.0,46.0,46.0,52.5,10.0,57.5,65.5,65.5,41.5,57.5,38.5,29.5,65.5,17.0,88.0,88.0,88.0,38.5,13.0,4.0,25.0,6.0,74.0,33.5,88.0,65.5,57.5,33.5,24.0,88.0,88.0,26.5,5.0,88.0,88.0,44.0,88.0,23.0,88.0,74.0,74.0,88.0,16.0,88.0,21.5,88.0,2.0,9.0,20.0,41.5,15.0,35.0,57.5,14.0,3.0,19.0,32.0,52.5,49.0,26.5,88.0,29.5,36.0,57.5,28.0,41.5,65.5,18.0,88.0,49.0,57.5,74.0,65.5,74.0,65.5,88.0,74.0,52.5,88.0,88.0,88.0,11.0,21.5,88.0,65.5,41.5,52.5,8.0,7.0,49.0,37.0,74.0,65.5,88.0,31.0,12.0
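
The two ranking tables answer different questions, and the only difference in code is the `axis` argument. The following sketch on toy data (metros `A` and `B` and four cuisines, all invented for illustration) shows both directions, and also why fractional ranks such as 10.5 or 72.0 appear: tied values, typically cuisines with zero share, receive the average of the rank positions they occupy.

```python
import pandas as pd

# Toy relative frequencies; not real Yelp data.
freq = pd.DataFrame(
    {'Thai': [0.0, 0.3], 'Greek': [0.3, 0.1],
     'Cuban': [0.7, 0.6], 'Uzbek': [0.0, 0.0]},
    index=['A', 'B'],
)

# Default axis=0: within each cuisine (column), rank the metros.
across_metros = freq.rank(ascending=False)

# axis=1: within each metro (row), rank the cuisines.
within_metro = freq.rank(ascending=False, axis=1, method='average')

print(across_metros)
print(within_metro)
```

In metro `A`, Thai and Uzbek are tied at zero and share positions 3 and 4, so each is ranked 3.5.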


In [156]:
n = 5

# finds the names of the top-n highest-value columns in a row
def top_n_cols(row):
    top_n = row.nlargest(n)
    return list(top_n.index)

# apply the function to each row
top_n_cols_df = df_gb.apply(top_n_cols, axis=1, result_type='expand')

top_n_cols_df

metro,0,1,2,3,4
Boise,American,Mexican,Italian,Chinese,Japanese
Edmonton,Canadian (New),Italian,Chinese,American,Vietnamese
Indianapolis,American,Mexican,Italian,Chinese,Tex-Mex
Nashville,American,Mexican,Southern,Italian,Chinese
New Orleans,American,Cajun/Creole,Italian,Mexican,Vietnamese
Philadelphia,American,Italian,Chinese,Mexican,Japanese
Reno,American,Mexican,Italian,Chinese,Asian Fusion
Santa Barbara,American,Mexican,Italian,Japanese,Chinese
St. Louis,American,Italian,Mexican,Chinese,Tex-Mex
Tampa-St. Petersburg,American,Italian,Mexican,Chinese,Greek
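
The cell above reads `n` from the notebook's global scope, so re-running cells out of order can silently change how many columns come back. As a minimal sketch (not the code used to produce the table above), the same idea can be written with `n` passed explicitly. The helper name `extreme_n_cols` and the `toy` DataFrame are hypothetical stand-ins; the real `df_gb` is assumed to hold one row per metro and one column per cuisine category.

```python
import pandas as pd


def extreme_n_cols(row, n=5, largest=True):
    """Return the column labels of the n largest (or smallest) values in a row."""
    picked = row.nlargest(n) if largest else row.nsmallest(n)
    return list(picked.index)


# Toy stand-in for df_gb with made-up counts (not taken from the Yelp data)
toy = pd.DataFrame(
    {
        "American": [50, 40],
        "Mexican": [30, 10],
        "Italian": [20, 25],
        "Thai": [5, 8],
    },
    index=pd.Index(["Metro A", "Metro B"], name="metro"),
)

# Extra keyword arguments given to DataFrame.apply are forwarded to the function
top2 = toy.apply(extreme_n_cols, axis=1, result_type="expand", n=2)
bottom2 = toy.apply(extreme_n_cols, axis=1, result_type="expand", n=2, largest=False)
```

With this form, the top-5 and bottom-10 tables could come from one function called with different `n` and `largest` values, rather than two near-identical functions that share a global.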


In [131]:
n = 10

# Define a function to find the names of the bottom-n lowest-value columns in a row
def bottom_n_cols(row):
    bottom_n = row.nsmallest(n)
    return list(bottom_n.index)

# Replace zeros with NaN so cuisines absent from a metro are skipped, then apply the function to each row
bot_n_cols_df = df_gb.replace(0, np.nan).apply(bottom_n_cols, axis=1, result_type='expand')

# Display the resulting DataFrame
bot_n_cols_df

metro,0,1,2,3,4,5,6,7,8,9
Boise,African,British,Cantonese,Cuban,Eastern European,Egyptian,Pan Asian,Persian/Iranian,Puerto Rican,Sicilian
Edmonton,Argentine,Cambodian,Czech,Iberian,Russian,Scottish,Singaporean,Spanish,Szechuan,Hakka
Indianapolis,Eastern European,Egyptian,Himalayan/Nepalese,Indonesian,Malaysian,Nicaraguan,Russian,Scandinavian,Scottish,South African
Nashville,Austrian,Cambodian,Malaysian,Moroccan,Portuguese,Singaporean,Syrian,Uzbek,Armenian,Bangladeshi
New Orleans,Argentine,Armenian,Belgian,British,Pan Asian,Portuguese,Roman,Russian,Shanghainese,Szechuan
Philadelphia,Czech,Fuzhou,Hainan,Scandinavian,Scottish,South African,Armenian,Austrian,Israeli,Sardinian
Reno,Afghan,Bangladeshi,Brazilian,Czech,Himalayan/Nepalese,Singaporean,Szechuan,British,Modern European,Moroccan
Santa Barbara,Arabic,Basque,Brazilian,Caribbean,Himalayan/Nepalese,Indonesian,Irish,Lebanese,Mongolian,Pan Asian
St. Louis,Bangladeshi,Calabrian,Colombian,Indonesian,Malaysian,Nicaraguan,Polish,Portuguese,Scandinavian,Scottish
Tampa-St. Petersburg,Czech,Honduran,Hungarian,Sardinian,Scottish,Shanghainese,Tuscan,Afghan,Austrian,Bangladeshi
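
Several rows above are partly or wholly in alphabetical order (Boise's is alphabetical throughout). This may mean that many categories are tied at the lowest count and that `nsmallest` is breaking the ties by column order. By default, `Series.nsmallest` uses `keep='first'`, which cuts a group of tied values off at `n`. A small sketch with made-up counts (the `counts` Series is hypothetical, not drawn from `df_gb`) shows how `keep='all'` returns the whole tied group instead:

```python
import pandas as pd

# Hypothetical counts for a single metro; three categories are tied at 1
counts = pd.Series({"African": 1, "British": 1, "Cuban": 1, "Thai": 4, "Uzbek": 0})

# Drop cuisines the metro does not have at all
present = counts[counts > 0]

# Default keep='first': ties are cut off at n, in index order
first_two = present.nsmallest(2)

# keep='all': every category tied with the n-th smallest value is returned
all_tied = present.nsmallest(2, keep="all")
```

Here `first_two` holds two of the three tied categories, while `all_tied` holds all three. If the real data has the same pattern, the bottom-10 lists are better read as a sample of the rarest cuisines than as a strict ranking.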
