***

# Jupyter Notebook for "Can AI-tools help me code data?"

This Notebook allows you to replicate the analysis of "Can AI-tools help me code data? Guidelines for using LLM-assisted text annotation in cultural sociology and an illustrative example," hereafter referred to as "The paper." 

The Python code can easily be adapted for other LLM-assisted text annotation projects. It provides a step-by-step illustration of how to go from setting up the API to initial testing and validating a model.

**Please note:** before proceeding, you need to [create an OpenAI API key](https://platform.openai.com/api-keys). Additional help with installing the OpenAI package can be found [here](https://platform.openai.com/docs/quickstart) in the Quickstart guide.

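The following Python packages need to be installed for this Notebook to work (`os` and `json` are part of the Python standard library and need no installation):
- openai
- pandas
- tiktoken
- tqdm
- an Excel writer engine for pandas, such as openpyxl (needed to write the XLSX files in Steps 6 and 7)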

***

## 1. Import packages

In [1]:
import os
import json
from openai import OpenAI
import pandas as pd
import tiktoken
from tqdm import tqdm

## 2. Set up OpenAI API

Enter your personal API key in the cell below.

In [2]:
client = OpenAI(
    api_key = 'your_API_key' # Use your personal OpenAI API key, see the notes above
)
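
Hard-coding a key in a shared Notebook risks leaking it. As a minimal sketch, assuming the key has been stored in an environment variable named `OPENAI_API_KEY`, it can be read at run time instead. The helper name `load_api_key` is hypothetical and not part of the openai package.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Illustrative helper (an assumption, not part of the openai package).
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {env_var} is not set; "
            "set it before starting Jupyter."
        )
    return key


# Usage (commented out so the sketch runs without a key):
# client = OpenAI(api_key=load_api_key())
```

This keeps the key out of the Notebook file itself, so the Notebook can be shared or committed without editing the cell first.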

#### Send a first request

Test the API by sending a first request to the GPT-4o model.

In [3]:
completion = client.chat.completions.create(
  model="gpt-4o",
  temperature=1,
  messages=[
    {"role": "system", "content": "You are a Michelin star chef."},
    {"role": "user", "content": "Write a one sentence text to welcome a group of curious cultural sociologists into your restaurant."},
  ]
)
print(completion.choices[0].message.content)

Welcome, esteemed cultural sociologists, to our culinary haven where every dish is a narrative and every flavor a chapter in an evolving gastronomic story.


## 3. Load the full dataset

In the code cell below, we load the dataset of restaurant reviews (N = 1047) stored on GitHub.

In [4]:
url_1 = 'https://raw.githubusercontent.com/account-for-paper-submissions/LLM-assisted-text-annotation/refs/heads/main/restaurant_reviews.csv'
reviews_df = pd.read_csv(url_1, header=0)
print('Number of reviews:', reviews_df.shape[0])
reviews_df.head()    

Number of reviews: 1047


Unnamed: 0,review
0,***A dining experience that takes you around t...
1,Diner at Ciel Bleu is a truly fantastic experi...
2,Amazing experience.\n\nMany thanks to the Cie...
3,Fantastic experience! I booked a dinner here f...
4,@ciel_bleu_restaurant Perched on the 23rd floo...


#### Input length considerations

Most LLMs have a context window, which defines the maximum number of tokens they can process at once. For the GPT-4 Turbo and GPT-4o models used in this Notebook, the context window is 128,000 tokens. If the input text exceeds this length, the model cannot process it in one request, and the input needs to be split into smaller chunks. Note that the instruction and the model's response also count toward this limit.

Although the context window of the newer models is quite large, it is still important to verify that your input text fits within this limit. Below, we check whether any review exceeds the 128,000-token context window.

In [5]:
# Get the encoding for the GPT-4 model (also used by GPT-4 Turbo); for GPT-4o, pass 'gpt-4o' instead
encoding = tiktoken.encoding_for_model('gpt-4')

# Create a list of the token lengths of each review
token_lengths = [len(encoding.encode(review)) for review in reviews_df['review']]

# Print max token length
print('Maximum token length:', max(token_lengths))
if max(token_lengths) <= 128000:
    print('The maximum token length of the given texts is within the 128,000 context window for GPT-4-Turbo and GPT-4o models. \nYou are good to go.')
else:
    print('One of the texts exceeds the 128,000 context window. Please reconsider your analytical strategy.')

Maximum token length: 1366
The maximum token length of the given texts is within the 128,000 context window for GPT-4-Turbo and GPT-4o models. 
You are good to go.
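
If a text did exceed the limit, it would have to be split. The sketch below shows one simple way to do this on a list of token IDs. `chunk_tokens` and `max_tokens` are illustrative names, not part of tiktoken, and the fixed-size split ignores sentence boundaries, which a real project may want to respect.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[int], max_tokens: int) -> List[List[int]]:
    """Split a sequence of token IDs into consecutive chunks of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [list(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]


# Toy example with made-up token IDs
chunks = chunk_tokens(list(range(10)), max_tokens=4)
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

With tiktoken, the token IDs would come from `encoding.encode(text)`, and each chunk could be turned back into text with `encoding.decode(chunk)`. The budget per chunk should leave room for the instruction, which is sent with every request.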


## 4. The prompt 

Below, we use the eighth prompt iteration, as discussed in The paper.

In [6]:
instruction = """Your task is to classify whether reviews contain a reference to a personalized service provided by, or a personalized relationship with, a chef or staff member of a Michelin-star restaurant. To classify reviews as containing such a reference one out of the following four criteria needs to be identified:

1. Personalized interactions: mentions of personalized interactions with a chef or staff member, such as a guided tour through the kitchen, self-introductions, having a conversation or chat, sharing stories, the use of jokes, or a chef visiting the table. Not included are explanations of dishes, the accommodation of changes in the menu, or complementary dishes, such as an amuse bouche.

2. Chef’s and staff members’ first names: any reference to a chef or staff member from the reviewed restaurant by their first name, irrespective of the context in which it is used. So it does not matter whether or not the first name is associated with a personalized service. The use of a first name by itself is seen as a sign of an experienced personalized relationship. Not included are references to the first name of a chef working at another restaurant (e.g. “Our chef cooked much better than Gordon Ramsey”)

3. Assistance with special occasions: mentions of guests celebrating special occasions (e.g. anniversaries or birthdays) in the restaurant, where a chef or staff member provided assistance or special attention. This includes guests receiving a present, such as a card, or treat for the occasion, talking with a chef or staff member about the occasion, or receiving help organizing the occasion.  

4. Literal mentions of personalized service and its equivalents: any use of terms such as “personal service”, “personal approach”, “personalized attention,” “personal warmth” or being “personable,” even if no further details are provided about the nature of the service. Not included are references to personal service in a negative way, e.g.: “We did not receive any personal service.”

Exclude references to general good or exceptional service, including descriptions like "phenomenal," "friendly," or "attentive service." Focus instead on instances where there is a reference to a personalized service or a personalized relationship according to one out of the four criteria outlined above. 

Please label the review as containing a reference to a personalized service or personalized relationship as 1 if it meets any of the above criteria, and label it as 0 if it does not contain such a reference. Answer with the number and a brief motivation in JSON format. For example: {"answer": 1, "motivation": "The review mentions the chef visiting the table."}
"""

## 5. Evaluate the prompt with a test sample

In this step, we use a previously hand-coded sample of 20 reviews to test the prompt. 

Step 7, at the bottom of this Notebook, shows how to create your own test and validation samples.

In [7]:
# Load the previously hand-coded test sample from GitHub
url_2 = 'https://raw.githubusercontent.com/account-for-paper-submissions/LLM-assisted-text-annotation/refs/heads/main/test_sample.csv'
test_sample = pd.read_csv(url_2, header=0)
print('Number of reviews:', test_sample.shape[0])
test_sample.head()    

Number of reviews: 20


Unnamed: 0,original_index,review,human_code
0,352,Hands down best food I’ve had in London. Amazi...,1
1,560,Still one of the best meals we've had in the U...,0
2,874,A great place with 3 stars.the parking is limi...,0
3,980,Absolutely incredible. A meal you'll be talkin...,0
4,31,"Good restaurant, but the service died when we ...",0


For each review, we will make an API request to classify whether it contains a reference to personal service. The results will be stored in separate columns in the `test_sample` dataframe.

In [8]:
for index, row in tqdm(test_sample.iterrows(), total=test_sample.shape[0], leave=False):
    response = client.chat.completions.create(
        model='gpt-4-turbo-2024-04-09', # use gpt-4-turbo-2024-04-09 or gpt-4o-2024-05-13
        temperature=0.0, # temperature 0 makes the output as deterministic as possible
        response_format={ 'type': 'json_object' }, # return the response in JSON format
        messages=[
            {'role': 'system', 'content': instruction}, # provide the instruction from above
            {'role': 'user', 'content': row['review']} # provide the review to be classified
        ],
    )
    json_response = json.loads(response.choices[0].message.content) # parse the JSON response
    test_sample.at[index, 'llm_code'] = json_response['answer'] # store the answer in the DataFrame
    test_sample.at[index, 'motivation'] = json_response['motivation'] # store the motivation in the DataFrame
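
The loop above assumes every response is valid JSON with an `answer` of 0 or 1. As a hedged sketch, the parsing step could be made defensive so that one malformed response does not stop the run. The helper `parse_label` is hypothetical and was not part of the original analysis.

```python
import json
from typing import Optional, Tuple


def parse_label(raw: str) -> Tuple[Optional[int], str]:
    """Parse a model response into (answer, motivation).

    Returns (None, <error message>) when the response is not valid JSON
    or the answer is not 0 or 1, so the row can be inspected by hand.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        return None, f"invalid JSON: {err.msg}"
    if not isinstance(data, dict):
        return None, "response is not a JSON object"
    answer = data.get("answer")
    # Accept 0/1 given as integers or as strings such as "1"; reject booleans
    if isinstance(answer, bool) or str(answer) not in ("0", "1"):
        return None, f"unexpected answer: {answer!r}"
    return int(answer), str(data.get("motivation", ""))


print(parse_label('{"answer": 1, "motivation": "The review mentions the chef visiting the table."}'))
print(parse_label("not json"))
```

Inside the loop, `parse_label(response.choices[0].message.content)` would replace the direct `json.loads` call, and rows with a `None` answer could be re-run or coded by hand.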

Now we can display the dataframe and check the alignment between the LLM and a human coder.

In [9]:
# Set options to display the full text in the DataFrame
pd.set_option('display.max_colwidth', None) 
test_sample['llm_code'] = test_sample['llm_code'].astype(int)

# calculate the alignment between the manual and LLM labels
alignment = (test_sample['human_code'] == test_sample['llm_code']).mean()
print('Accuracy:', alignment)

# display the sample with the LLM labels and motivations
test_sample

Accuracy: 1.0


Unnamed: 0,original_index,review,human_code,llm_code,motivation
0,352,Hands down best food I’ve had in London. Amazing flavours in every dish that made me go WOW. Inventive use of meat parts(tongue etc). Received a hand written card from the team to celebrate our birthday which was a nice touch! Vibe is quite formal. I am dreaming of the day I can come back for another visit!,1,1,"The review mentions receiving a hand-written card from the team to celebrate a birthday, indicating assistance with a special occasion."
1,560,Still one of the best meals we've had in the UK. It did help the sun shone and we sat on the terrace. The wine selection was one of the most memorable pairings we've ever had.,0,0,"The review does not mention any personalized interactions, use of first names, assistance with special occasions, or literal mentions of personalized service. It only discusses the general experience and quality of the meal and wine selection."
2,874,A great place with 3 stars.the parking is limited and difficult so take a taxi\n.,0,0,"The review does not mention any personalized interactions, use of first names, assistance with special occasions, or literal mentions of personalized service."
3,980,Absolutely incredible. A meal you'll be talking about for years.,0,0,"The review does not mention any personalized interactions, use of first names, assistance with special occasions, or literal mentions of personalized service."
4,31,"Good restaurant, but the service died when we started with desserts (no more water, had to ask for the bill...)\n\nBread and butter was excellent.\n\nThe chocolate dessert was way to powerful.",0,0,"The review does not mention any personalized interactions, use of first names, assistance with special occasions, or literal mentions of personalized service. It only discusses general aspects of the service and food quality."
5,448,"Very pleasant experience with great company. Good friends birthday dinner celebrations. The 8 course set menu was yummy, but, we were not able to finish the last meal.",0,0,The review mentions a birthday dinner but does not specify any personalized interaction or special attention from the chef or staff related to the occasion.
6,198,"Dame De Pic is a two Michelin starred French restaurant situated by Tower Bridge tube station in central London. The restaurant is situated within a top hotel and specialises in Southern/South East French cuisine. Which is understandable as Chef Pic is from that heritage. My wife and I visited in July 2021 for our wedding anniversary.\n\nPros:\nThe food is stupendous. Tasty. Creative. And a sight and wonder to behold. For the taste buds, the eyes, and for all the senses indeed. Furthermore, service is impeccable. The staff are friendly, professional, courteous yet impeccably trained and oh so knowledgeable on the restaurant's various dishes. Most impressive. I was most impressed by Stephanie, Art and Walter especially. Job well done! I especially liked the attention to detail by the team. Stephanie's team noticed I was on my wedding anniversary meal and took appropriate action. Most impressive! I have decided I will spend my anniversaries here every year from henceforth! Imagine if you came here to propose marriage to your loved one!\n\nCons: None really. Prices are fair in view of the quality of the food and establishment. Lovely high ceilings. Staff did not rush me at all. The soap dispenser on the left hand side ran out of soap in the men's toilet, but really, I can't find any flaw or fault in the restaurant. Well done!\n\nIn short, Dame De Pic is the London outpost of Chef Pic's international Michelin starred gastronomic empire and serves superb French food. I had a great time there, would recommend it unreservedly and look forward to visiting them again soon for great tasting French food. Well done guys!",1,1,"The review mentions assistance with a special occasion, specifically noting that the staff noticed the reviewer was celebrating their wedding anniversary and took appropriate action."
7,424,"Amazing dinner at the Ledbury - everything was fantastic from the service to the ambiance to the wine list and of course, the food. I’m so happy to have dined here on my birthday and look forward to coming back.",0,0,"The review does not mention any personalized interactions, use of first names, assistance with special occasions in a personalized manner, or literal mentions of personalized service."
8,107,"I so wanted to give this full marks, as we had very high expectation for a 2 star restaurant.\n\nSadly, I have to mark it down for one fundamental failure.\n\nWe opted for the Journey tasting menu with pairing (@ £120 per person), which they delivered by the chef with each dish. HOWEVER, the timings and delivery of each pairing was wrong, and it ruined the experience of each dish ( which is the point of paring option). The pairings were served 10-15 mins before each course, and with many of the pairings needing to be serviced chilled, they were warm by the time the food arrived. This completely ruined our experience.\n\nAfter getting the attention of our waiter, he continued to time his pour wrong. Finally we spoke to one of the senior managers, who informed me that the kitchen failed to communicate with front of house, and offered our cocktails on the house as an apology.\n\nSadly, it doesn’t rectify the failure, and at a cost in excess of £600.",0,0,"The review discusses issues with service timing and communication between the kitchen and front of house, but does not mention any personalized interactions, use of first names, assistance with special occasions, or literal mentions of personalized service."
9,711,The steak was excellent and the decor was refreshing. I would recommend.,0,0,"The review does not mention any personalized interactions, use of first names, assistance with special occasions, or literal mentions of personalized service."


## 6. Validation

In the code cell below, we load the previously hand-coded validation sample as used for the third validation round in The paper.  

Step 7, at the bottom of this Notebook, shows how to generate your own test and validation samples.

In [10]:
url_3 = 'https://raw.githubusercontent.com/account-for-paper-submissions/LLM-assisted-text-annotation/refs/heads/main/validation_sample_20240724.csv'
validation_sample = pd.read_csv(url_3)
print('Number of reviews:', validation_sample.shape[0])
validation_sample.head()

Number of reviews: 282


Unnamed: 0,original_index,review,human_code
0,515,"The best dining experience I have ever had. The food was incredible, great produce fabulously executed. The staff were fantastic. Impeccable service, i really can not fault anything.",0
1,120,"Amazing!! Food was so delicious. Superb presentation. High cuisine techniques and flavour combinations. Incredible value for money - 3 dish set lunch at a 2 Michelin star restaurant in central London for £45 - an absolute bargain! Complementary starters (3 different bite size eats), complementary bread and flavoured butter, and complementary tarts. Super nice and profesional staff. Great ambience and lots of natural light and space. I cannot fault it. If I could give more than 5 stars, I would.",0
2,47,"Delicious, balanced taste. Not cheap (you can go on a small holiday from this) but one of the best meals I ever had! You close your eyes if you taste this. Although this establishment might seem a bit chique/stuck-up (has been around for lots of years), the staff is really nice and you get great views also.",0
3,894,Nice to see Mr Arzak come out and talked to each table.,1
4,208,"Absolutely fantastic food, very good staff, better than most 3 Michelin restaurants in London.",0


In [11]:
for index, row in tqdm(validation_sample.iterrows(), total=validation_sample.shape[0], leave=False):
    response = client.chat.completions.create(
        model='gpt-4-turbo-2024-04-09', # use gpt-4-turbo-2024-04-09 or gpt-4o-2024-05-13
        temperature=0.0, # temperature 0 makes the output as deterministic as possible
        response_format={ 'type': 'json_object' }, # return the response in JSON format
        messages=[
            {'role': 'system', 'content': instruction}, # provide the instruction from above
            {'role': 'user', 'content': row['review']} # provide the review to be classified
        ],
    )
    
    json_response = json.loads(response.choices[0].message.content) # parse the JSON response
    validation_sample.at[index, 'llm_code'] = json_response['answer'] # store the answer in the DataFrame
    validation_sample.at[index, 'motivation'] = json_response['motivation'] # store the motivation in the DataFrame
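
With 282 sequential API requests, a single transient error (for example a timeout or a rate limit) would abort the loop. A generic retry wrapper is one way to guard against that. This is a sketch under stated assumptions: `with_retries`, `max_attempts`, and `base_delay` are illustrative names, and the wrapper catches every `Exception` for simplicity, whereas in practice one would likely restrict it to transient error types.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying with exponential backoff when it raises."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # wait base_delay, then 2x, then 4x, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

In the loop, the request could then be wrapped as `response = with_retries(lambda: client.chat.completions.create(...))`, leaving the rest of the cell unchanged.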

In [14]:
# Save the validation sample with the LLM labels and motivations as an XLSX file
prompt = '8th_prompt_iteration'
model = 'gpt-4-turbo-2024-04-09'
validation_sample['llm_code'] = validation_sample['llm_code'].astype(int)
validation_sample.to_excel(f'validation_sample_20240724_{prompt}_{model}_llm_annotated.xlsx')

In [15]:
# Set options to display the full text in the DataFrame
pd.set_option('display.max_colwidth', None)

# calculate the alignment between the manual and LLM labels
alignment = (validation_sample['human_code'] == validation_sample['llm_code']).mean()

print('Accuracy:', alignment, round(alignment, 2)) 

# display the sample with the LLM labels and motivations
validation_sample

Accuracy: 0.9432624113475178 0.94


Unnamed: 0,original_index,review,human_code,llm_code,motivation
0,515,"The best dining experience I have ever had. The food was incredible, great produce fabulously executed. The staff were fantastic. Impeccable service, i really can not fault anything.",0,0,"The review does not mention any personalized interactions, use of first names, assistance with special occasions, or literal mentions of personalized service. It only describes the general quality of food and service."
1,120,"Amazing!! Food was so delicious. Superb presentation. High cuisine techniques and flavour combinations. Incredible value for money - 3 dish set lunch at a 2 Michelin star restaurant in central London for £45 - an absolute bargain! Complementary starters (3 different bite size eats), complementary bread and flavoured butter, and complementary tarts. Super nice and profesional staff. Great ambience and lots of natural light and space. I cannot fault it. If I could give more than 5 stars, I would.",0,0,"The review praises the food, value, and general service quality but does not mention any personalized interactions, use of first names, assistance with special occasions, or literal mentions of personalized service."
2,47,"Delicious, balanced taste. Not cheap (you can go on a small holiday from this) but one of the best meals I ever had! You close your eyes if you taste this. Although this establishment might seem a bit chique/stuck-up (has been around for lots of years), the staff is really nice and you get great views also.",0,0,"The review does not mention any personalized interactions, use of first names, assistance with special occasions, or literal mentions of personalized service. It only comments on the quality of the meal and the general niceness of the staff."
3,894,Nice to see Mr Arzak come out and talked to each table.,1,1,The review mentions the chef visiting the table.
4,208,"Absolutely fantastic food, very good staff, better than most 3 Michelin restaurants in London.",0,0,"The review does not mention any personalized interactions, use of first names, assistance with special occasions, or literal mentions of personalized service. It only comments on the quality of food and staff in general terms."
...,...,...,...,...,...
277,214,"My wife and I had the most wonderful and memorable meal here last night. We really couldn't fault anything- service, attention to detail , and most importantly the food!. Highest compliments to the chef and the team here who treated us superbly. They deserve more than a Michelin star for what they have set out and created here. We had some dietary requirements which was accommodated so so well. Very impressed and for sure our new favourite restaurant in London. Cannot wait to come back. 🤗🥇",0,0,"The review praises the service and food but does not mention any personalized interactions, use of first names, assistance with special occasions, or literal mentions of personalized service."
278,429,Excellent restaurant. The food is good and unique. The service great,0,0,"The review does not mention any personalized interactions, use of first names, assistance with special occasions, or literal mentions of personalized service. It only mentions general good service."
279,407,"We had an extra night to spend in London before flying home and decided.to have a nice dinner. We chose The Ledbury based on reviews and we were not disappointed!\n\nWe had the vegetarian tasting menu (with one course modified for me since I don't like beets) along with the paired wines.\n\nThe food was great. The wines were a treat and matched each course perfectly. The staff was fabulous. The meal was an amazing experience that served as an exclamation point on our adventure in Europe.\n\nWe particularly enjoyed the very creative vegetarian courses. All were delicious and beautifully presented.\n\nThe meal was not inexpensive, but worth every penny! We are looking forward to the time when we can visit again.",0,0,"The review does not mention any personalized interactions, use of first names, assistance with special occasions, or literal mentions of personalized service. It only describes the general quality of the food, wine, and service."
280,24,I've waited 8 months to book this experience and wow it was like a dream. 5 courses of heaven and the staff and chefs are the nicest I've ever met. Pure happiness the whole three hours.,0,0,"The review describes the staff and chefs as nice, but does not mention any specific personalized interactions, use of first names, assistance with special occasions, or literal mentions of personalized service."
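
In the rows shown above, most reviews are coded 0, so raw accuracy can look high even for a classifier that rarely assigns 1. A chance-corrected measure such as Cohen's kappa is a common complement. The sketch below computes it in plain Python. It is added for illustration and is not a statistic reported in this Notebook; `cohens_kappa` is a hypothetical helper name and the labels are toy values.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(coder_a: Sequence[int], coder_b: Sequence[int]) -> float:
    """Cohen's kappa for two coders labelling the same items."""
    n = len(coder_a)
    if n == 0 or n != len(coder_b):
        raise ValueError("both coders need the same, non-zero number of labels")
    # Share of items on which the two coders agree
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Agreement expected by chance, from each coder's marginal label frequencies
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy labels, not the paper's data
human = [1, 0, 0, 1, 0, 0, 0, 1]
llm = [1, 0, 0, 0, 0, 0, 0, 1]
print(round(cohens_kappa(human, llm), 3))  # 0.714
```

For the validation sample, the inputs would be `validation_sample['human_code'].tolist()` and `validation_sample['llm_code'].tolist()`.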


In [17]:
# Identify cases of misalignment and save as an XLSX file
misalignment = validation_sample[validation_sample['human_code'] != validation_sample['llm_code']]
misalignment.to_excel(f'misalignment_validation_sample_20240724_{prompt}_{model}_llm_annotated.xlsx')
misalignment

Unnamed: 0,original_index,review,human_code,llm_code,motivation
31,340,"Amazing experience!\nWait staff operate like clockwork, just impressive to sit and watch them work. But they're also super friendly to talk to.\nPlace has a formal yet modern vibe to it, and this is reflected also in the food.\nGreat flavours, great ingredients, great drinks.\nMake sure you visit the bathroom during yoir sitting, to see the mushroom fridge!",1,0,"The review does not meet any of the criteria for personalized service or relationship. It mentions friendly staff but does not specify any personalized interactions, use of first names, assistance with special occasions, or literal mentions of personalized service."
44,341,"This restaurant definitely lives up to its fame - a Michelin two starred. Upon entry, you’d already be attracted by its fancy decorations and elegant ambience. Every dish served was a piece of artwork carefully cooked and presented. Though some of the taste were not my cup of tea. There were some in the Christmas set which I was truly in love with, such as the lobster soup(served cold), cod, and the lamb shoulder. Some food is best to be served cold so you can taste the most flavor out of it - perfectly demonstrated by the lobster soup. The cod was the finest cod I’ve had in London, very tender which made a great combination with the sauce. However, surprise never stopped in the Ledbury, the milk-fed lamb shoulder was super tender and flavorsome. I really hope that they had more of it on my plate(there was also lamb legs which was also tender but not as good as the shoulder). The Christmas tart was a nice combination of ginger and cinnamon, a good seasonal treat for the end of a nice Christmas meal. If you were to try the normal set, the brown sugar tart is a must try!!!\nPS. One of my friend really appreciated their bread, and the restaurant gave us a whole loaf to take home for free!!! This pushed my expectation of ‘service’ to a whole new level. Well done!!!",1,0,"The review describes the quality of the food and the general service level, but does not mention any personalized interactions, use of first names, assistance with special occasions, or literal mentions of personalized service."
64,930,"Arzak has lived in my imagination for years. To have now eaten the food created by a man that was like a father to Anthony Bourdain, one of my heroes, is an experience I won't soon forget. Elena's creativity is a revelation and truly life-affirming. I feel incredibly lucky to have eaten here. World class cuisine is an understatement.",1,0,"The review does not mention any personalized interactions, use of first names in a context of a personalized relationship, assistance with special occasions, or literal mentions of personalized service. It only mentions the chefs in a general context of admiration."
81,867,"Arzak is a renowned restaurant located in San Sebastian, Spain. It was founded in 1897 and is currently run by chef Juan Mari Arzak and his daughter Elena Arzak, who is also a highly acclaimed chef. Arzak has been instrumental in shaping the culinary landscape of San Sebastian, known for its gastronomic excellence.\n\nThe restaurant holds three Michelin stars and has consistently ranked among the best restaurants in the world. Arzak is celebrated for its avant-garde approach to Basque cuisine, combining tradition with innovation.\n\nArzak's menu reflects the culinary heritage of the Basque Country while incorporating modern techniques and creative presentations. The restaurant emphasizes the use of locally sourced ingredients, often featuring seafood and fresh produce from the region. Dishes at Arzak showcase a wide array of flavors, textures, and artistic plating.\n\nJuan Mari Arzak is considered a pioneer in the world of molecular gastronomy and has been at the forefront of culinary experimentation. His innovative techniques and flavor combinations have inspired many chefs around the globe.\n\nArzak's dining experience is known for its impeccable service, elegant ambiance, and attention to detail. The restaurant's staff is highly knowledgeable about the menu and can guide guests through their culinary journey.",1,0,"The review provides general information about Arzak restaurant and its chefs, but does not mention any personalized interactions, use of first names, assistance with special occasions, or literal mentions of personalized service."
83,351,"Cannot fault this restaurant at all. The service, the food, everything is on point. The staff are nice and very chatty which is nice and it’s not too stuffy that you don’t feel comfortable which I find with a lot of fine dining restaurants in UK. Not having a dress code yet the atmosphere is warm and very cosy. The presentation of food is what is expected of a fine dining restaurant, the taste is great. Loved executed dish expect for Deer. But it could be that I was already too full by the time I got there and I personally am not a fan of anything too Gamey in taste. The restaurant catered for my partners onion and garlic minor allergy by excluding it in his meal which is very nice as it’s sometimes quite hard to cater for that. Overall a very nice restaurant & would def go back when the menu is updated as it’s a degustation only so not much to choose from really but they do cater if there is a dish you don’t like and more than happy to swap it out.",1,0,"The review does not mention any personalized interactions with the chef or staff beyond general service, no first names are mentioned, there is no mention of assistance with special occasions, and no literal mentions of personalized service or its equivalents."
104,436,"The Ledbury is a highly acclaimed restaurant located in Notting Hill, London, England. It was founded by chef Brett Graham in 2005 and has since become one of London's most renowned dining establishments.\n\nThe Ledbury is known for its contemporary and inventive approach to modern European cuisine. The restaurant's menu showcases seasonal and locally sourced ingredients, with an emphasis on quality and flavor. Chef Brett Graham and his team create dishes that combine classic techniques with innovative flavor combinations.\n\nThe restaurant has been awarded two Michelin stars and has consistently been ranked among the top restaurants in the UK. The Ledbury offers a refined and elegant dining experience with a focus on impeccable service and attention to detail.\n\nThe menu at The Ledbury features a selection of à la carte options as well as a tasting menu. Each dish is meticulously crafted and presented with an artistic flair. The restaurant also has an extensive wine list that complements the culinary offerings.\n\nThe ambiance at The Ledbury is elegant yet relaxed, making it a popular choice for special occasions and fine dining experiences. The restaurant's stylish decor and warm atmosphere contribute to the overall dining experience.",1,0,"The review does not mention any personalized interactions, use of first names, assistance with special occasions, or literal mentions of personalized service. It focuses on general information about the restaurant and its offerings."
105,494,"Beautiful menu, great staff, the wine recommendations were on point, food was delicious and delicate. They also gave me a second desert as I was celebrating my 30th. Would definitely go back for a nice evening treat",0,1,"The review mentions receiving a second dessert while celebrating a special occasion (30th birthday), indicating personalized attention from the staff."
124,485,Update 25.09.17:\n\nIt has been few years since I have been to the Ledbury and I have to say it is still top. The food I feel is even better than on my last visit and the service as attentive as always.\n\nThis restaurant is one of the top restaurants in the world for a reason! Anybody who is lucky to get a table should definitely come and try it out for yourself you will not regret it (well not at least until you get the bill ^^)\n\nPs: and if you do get a chance try to say hello to the kitchen team and chef graham they deserve a big thank you too!\n\nOld review 2015:\n\nBooked for my birthday and it was the first time I ever visited a 2 star michelin star restaurant.\n\nI have to say I do really enjoyed the whole experience. The service was definitely one of the best I have ever experienced. The presentation of the food was also very nice.\n\nWe had the tasting menu and the lamb was probably the best I ever had in my life.\n\nThe only nick picking would be that a fruit fly landed into my wine glass but a staff member was helpful enough to fish it out for me.\n\nFor sure a restaurant worth visiting and recommending.\n\nCheers!,0,1,"The review mentions saying hello to the kitchen team and Chef Graham, indicating a personalized interaction."
125,581,World class tasting menu with excellent service.\nTry to meet chef graham and his team if you are lucky and got a table here. This restaurant deserves to be in the top 50 restaurants on the planet,0,1,"The review suggests trying to meet Chef Graham, indicating a reference to a chef by first name."
177,767,"Better late than never, I celebrated my birthday on Oct 29, 2021 during the peak of Covid-19 at this fine establishment. Arzak is a true culinary masterpiece, a gastronomic journey that goes beyond borders and pleases the senses. Located in the heart of San Sebastián, this restaurant run by the talented Chef Elena Arzak and awarded three Michelin stars will take you on a journey through the art of Basque cuisine. The excellent service and perfectly matched wine enhance\nthe dining experience. My birthday celebration at Arzak was the highlight of this unforgettable journey.",1,0,"The review does not mention any personalized interactions, use of first names, assistance with special occasions beyond the general celebration, or literal mentions of personalized service. It focuses on the general experience and service quality."


## 7. Create your own test and validation samples

In the code below, we demonstrate how to create test and validation samples. Reviews used in an earlier test or validation sample should not be reused in later samples, so the code removes previously sampled reviews before each new draw. This ensures that the samples do not overlap.
 
**Steps:**
1. We draw a test sample of 20 reviews. These data are written to an XLSX file so they can be hand-coded (see _Step 5_ in this Notebook on how to analyze the hand-coded test sample).

2. We draw a validation sample of 282 reviews from the remaining reviews. These data are also written to an XLSX file so they can be hand-coded (see _Step 6_ in this Notebook on how to analyze the hand-coded validation sample).

In [18]:
# 1. Draw a test sample of 20 reviews
test_sample_1 = reviews_df.sample(20, random_state=42)

# Create an empty column for the human codes
test_sample_1['human_code'] = ''

# Write to XLSX for convenient hand-coding
test_sample_1.to_excel('test_sample.xlsx')
test_sample_1.head()

Unnamed: 0,review,human_code
352,Hands down best food I’ve had in London. Amazing flavours in every dish that made me go WOW. Inventive use of meat parts(tongue etc). Received a hand written card from the team to celebrate our birthday which was a nice touch! Vibe is quite formal. I am dreaming of the day I can come back for another visit!,
560,Still one of the best meals we've had in the UK. It did help the sun shone and we sat on the terrace. The wine selection was one of the most memorable pairings we've ever had.,
874,A great place with 3 stars.the parking is limited and difficult so take a taxi\n.,
980,Absolutely incredible. A meal you'll be talking about for years.,
31,"Good restaurant, but the service died when we started with desserts (no more water, had to ask for the bill...)\n\nBread and butter was excellent.\n\nThe chocolate dessert was way to powerful.",


In [20]:
# 2. Draw a validation sample of 282 reviews

# Create a set of indices to remove from the new dataframe
indices_to_remove = set(test_sample_1.index)

# Optional: if multiple test and/or validation samples have already been drawn, combine their indices
# Uncomment the line of code below and modify it accordingly
# indices_to_remove = set(test_sample_1.index).union(set(test_sample_2.index))

# Drop the reviews that were used for earlier samples (drop() returns a new dataframe)
df_copy = reviews_df.drop(index=list(indices_to_remove))
print('Number of reviews left:', df_copy.shape[0])

# Draw the sample of needed reviews
validation_sample = df_copy.sample(282, random_state=42)
# Create an empty column for the human codes
validation_sample['human_code'] = ''

# Write to XLSX for convenient hand-coding
validation_sample.to_excel('validation_sample.xlsx')
validation_sample.head() # Note that this is not the same validation sample as used in Step 6 of this Notebook

Number of reviews left: 1027


Unnamed: 0,review,human_code
436,"The Ledbury is a highly acclaimed restaurant located in Notting Hill, London, England. It was founded by chef Brett Graham in 2005 and has since become one of London's most renowned dining establishments.\n\nThe Ledbury is known for its contemporary and inventive approach to modern European cuisine. The restaurant's menu showcases seasonal and locally sourced ingredients, with an emphasis on quality and flavor. Chef Brett Graham and his team create dishes that combine classic techniques with innovative flavor combinations.\n\nThe restaurant has been awarded two Michelin stars and has consistently been ranked among the top restaurants in the UK. The Ledbury offers a refined and elegant dining experience with a focus on impeccable service and attention to detail.\n\nThe menu at The Ledbury features a selection of à la carte options as well as a tasting menu. Each dish is meticulously crafted and presented with an artistic flair. The restaurant also has an extensive wine list that complements the culinary offerings.\n\nThe ambiance at The Ledbury is elegant yet relaxed, making it a popular choice for special occasions and fine dining experiences. The restaurant's stylish decor and warm atmosphere contribute to the overall dining experience.",
542,"Quite remarkable. Where else to have your wedding meal with mother and father in law(it was a very small affair)? A great meal is a mix of wonderful food and tastes, the people you’re with with and the place. The Ledbury had it all.",
395,I loved the experience. The only complaint is that there are too many pre-apps so the 8 course menu ended up being a bit too much food! We finished everything and it was delicious.,
109,"This was my third time eating here, having been totally blown away by the food on my first and second visits. And once again I left feeling the same this time.\n\nCanapés were beautiful and delicious as always. The uniqueness of the vodka and cocoa butter canapé always amazes me. The bread was freshly baked, with a great consistency; crisp and hard on the outside and warm and soft on the inside. Served with it was a terrific celeriac amuse-bouche. The mackerel was fresh and was enhanced further by the sweet and sour flavours of the vinaigrette. Anne Sophie Pic's signature pasta dish Les Berlingots was impressive as predicted, filled with a warm, rich, creamy cheese sauce, accompanied by a delicately sweet pea broth.\n\nThe lobster with monks beard had the perfect texture but I felt this dish was slightly ruined by the fact there was a very bitter sauce at the bottom, which I believe was made from Pomelo Hirado. Unfortunately, it was so bitter that it became dominant throughout. The monkfish meunière was excellent; the fish was cooked beautifully with the right level of seasoning, served alongside a selection of tender but crunchy vegetables. An ideal combination! The meat course consisted of guinea fowl with tonka. It was incredibly tender and soft. The tonka added a nice aroma to the dish, though the meat itself could have seen a little more seasoning, it was slightly bland. Nevertheless, this was one of the best guinea fowl dishes I have eaten, though I personally prefer other birds like duck and chicken.\n\nThe Gariguette strawberry dessert was a work of art and could not be more representative of the spring season; vibrant, refreshing, simple and light. Desserts here have always been a firm favourite of mine and continue to be, for good reason.\n\nThis was again a fabulous evening. The dishes were light and elegant, demonstrating Anne Sophie Pic's distinctive and creative cuisine. 
Many of the dishes had pleasant aromas, owing to the use of ingredients such as pepper and tea. La Dame de Pic continues to be one of my favourite restaurants and I always thoroughly look forward to dining here ;)\n\nA fourth visit???....... Oh yes",
431,Love the food and the wine pairing. I brought my boyfriend here for his birthday. It was sensational! One of my favourite restaurants in London.,
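
As a quick sanity check on the claim that the samples do not overlap, the index sets of the two samples can be compared directly. The snippet below is a minimal sketch, not part of the original analysis: `toy_df` is a hypothetical stand-in with the same number of rows as the full dataset (N = 1047), so the snippet runs without downloading the reviews. In the Notebook itself, the same check can be applied to `test_sample_1` and `validation_sample`.

```python
import pandas as pd

# Hypothetical stand-in for reviews_df (same number of rows as the full dataset)
toy_df = pd.DataFrame({'review': [f'review {i}' for i in range(1047)]})

# Draw the test sample, drop it, then draw the validation sample from what is left
test_sample = toy_df.sample(20, random_state=42)
remaining = toy_df.drop(index=test_sample.index)
validation_sample = remaining.sample(282, random_state=42)

# The intersection of the two index sets should be empty
overlap = test_sample.index.intersection(validation_sample.index)
print('Number of reviews left:', remaining.shape[0])
print('Number of overlapping reviews:', len(overlap))
assert overlap.empty, 'Test and validation samples share reviews'
```

If further samples are drawn later, the same comparison can be repeated for each pair of samples, or each new sample can be checked against the union of all previously used indices.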
