# Extracting restaurant information from journal entries 📝

## Introduction 
In this notebook, we will explore `how to use AI to extract specific information from journal entries written by food critics`. Using a Large Language Model (LLM), we can automatically identify and highlight important details such as restaurant names and their signature dishes. This helps organize and analyze unstructured text, making it easier to gain insights from large volumes of it. We will also show how to produce this information in different formats: highlighted within the original text and listed in CSV format.

### Importing required functions 

- Importing functions from the _helper_functions_ module:
  - The _print_llm_response_ function prints the LLM's response to a prompt.
  - The _get_llm_response_ function returns the LLM's response to a prompt so that it can be stored in a variable.
  - The _read_journal_ function reads a journal text file and returns its contents.

- Importing specific functions from the _IPython.display_ module:  
  -  _display_ is a function that allows you to display rich representations of Python objects directly in Jupyter Notebook cells, such as HTML, images, or Markdown text.  
  -  _HTML_ is a class that wraps raw HTML code into a displayable object. It lets you render HTML content directly in a Jupyter Notebook cell. 

In [1]:
# Import all helper functions (including get_llm_response, print_llm_response, and read_journal)
from helper_functions import *

# Import display and HTML from the IPython.display module
from IPython.display import display, HTML

### Using AI to highlight important information

First, load the journal entry for Rio de Janeiro, stored in the `rio_de_janeiro.txt` file. You'll use a new helper function called `read_journal`. 

In [2]:
# Load the journal entry for Rio de Janeiro
journal_rio_de_janeiro = read_journal("rio_de_janeiro.txt")
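The implementation of `read_journal` is not shown in this notebook. As a rough idea of what such a helper could look like, here is a minimal sketch that assumes it simply reads a text file and returns its contents as a string. The name `read_journal_sketch` is hypothetical, chosen so it does not replace the real helper.

```python
from pathlib import Path


def read_journal_sketch(path):
    """Return the full text of a journal file, decoded as UTF-8.

    Hypothetical stand-in: the real read_journal helper may work differently.
    """
    return Path(path).read_text(encoding="utf-8")
```

Passing the encoding explicitly is a deliberate choice here: accented characters (such as the é in café) can come out garbled when a UTF-8 file is decoded with a different default encoding.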

Next, write and print out a prompt that asks the LLM to highlight the restaurants and their best dishes in the journal entry:

In [4]:
prompt = f"""
Given the following journal entry from a food critic, identify the restaurants and their best dishes.
Highlight and bold each restaurant (in orange) and best dish (in blue) within the original text. 

Provide the output as HTML suitable for display in a Jupyter notebook. 

Journal entry:
{journal_rio_de_janeiro}
"""

print(prompt)


Given the following journal entry from a food critic, identify the restaurants and their best dishes.
Highlight and bold each restaurant (in orange) and best dish (in blue) within the original text. 

Provide the output as HTML suitable for display in a Jupyter notebook. 

Journal entry:
First up was Confeitaria Colombo, a legendary and picturesque café in central Rio. Known for its pastel de nata (custard tart), Colombo serves a delightful treat that is hard to beat. The crispy, flaky pastry filled with creamy, sweet custard was an excellent start to the day. The café's historic Belle Époque ambiance added an extra layer of charm.

Next, I visited Fogo de Chão, a quintessential Brazilian steakhouse in Botafogo. Famous for its picanha (top sirloin), this churrascaria impressed with its perfectly grilled meat. The picanha was juicy, tender, and bursting with flavor, showcasing the high quality of Brazilian beef. The endless array of grilled meats served tableside made for a hearty 

Pass this prompt to an LLM and store the response in a variable called _html_response_. Then print the result:

In [5]:
# Get the LLM response, then print it
html_response = get_llm_response(prompt)

print(html_response)

```html
<p>First up was <span style="color: orange; font-weight: bold;">Confeitaria Colombo</span>, a legendary and picturesque café in central Rio. Known for its <span style="color: blue; font-weight: bold;">pastel de nata</span> (custard tart), Colombo serves a delightful treat that is hard to beat. The crispy, flaky pastry filled with creamy, sweet custard was an excellent start to the day. The café's historic Belle Époque ambiance added an extra layer of charm.</p>

<p>Next, I visited <span style="color: orange; font-weight: bold;">Fogo de Chão</span>, a quintessential Brazilian steakhouse in Botafogo. Famous for its <span style="color: blue; font-weight: bold;">picanha</span> (top sirloin), this churrascaria impressed with its perfectly grilled meat. The picanha was juicy, tender, and bursting with flavor, showcasing the high quality of Brazilian beef. The endless array of grilled meats served tableside made for a hearty and satisfying meal.</p>

<p>For a more modern dining experi

The `print` function displays the raw text, including all of the HTML tags that a web browser uses to format it. To see the formatted result instead, pass the response to `display(HTML(...))`:

In [6]:
# Display the HTML formatted output properly
display(HTML(html_response))
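The raw response printed earlier begins with a Markdown code-fence marker (three backticks followed by `html`). If markers like that show up as stray text in the rendered output, one option is to strip them before calling `display`. The sketch below assumes the fence, when present, sits on the first and last lines of the response; `strip_code_fence` is a hypothetical helper, not part of `helper_functions`.

```python
FENCE = "`" * 3  # three backticks


def strip_code_fence(text):
    """Drop a leading fence line (e.g. the fence plus 'html') and a trailing fence line."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines)


# Small stand-in for an LLM response wrapped in a fence
sample = FENCE + "html\n<p>Hello</p>\n" + FENCE
print(strip_code_fence(sample))  # prints <p>Hello</p>
```

With the real response, this would be used as `display(HTML(strip_code_fence(html_response)))`. Text without a fence passes through unchanged.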

Now, let's load the journal entry for Tokyo, stored in the `tokyo.txt` file, and use AI to highlight the important information.

In [7]:
# Load Tokyo journal entry
journal_tokyo = read_journal("tokyo.txt") 

prompt = f"""
Given the following journal entry from a food critic, identify the restaurants and their best dishes.
Highlight and bold each restaurant (in orange) and best dish (in blue) within the original text. 

Provide the output as HTML suitable for display in a Jupyter notebook. 

Journal entry:
{journal_tokyo}
"""

html_response = get_llm_response(prompt)
# Display the HTML formatted output properly
display(HTML(html_response))

**Note:** Even though the structure of this text is very different from the previous one, the LLM is still able to identify and highlight the correct items.

Now, let's load the journal entry for New York, stored in the `new_york.txt` file, and use AI to highlight the important information.

In [8]:
# Load New York journal entry
journal_newyork = read_journal("new_york.txt") 

prompt = f"""
Given the following journal entry from a food critic, identify the restaurants and their best dishes.
Highlight and bold each restaurant (in orange) and best dish (in blue) within the original text. 

Provide the output as HTML suitable for display in a Jupyter notebook. 

Journal entry:
{journal_newyork}
"""

html_response = get_llm_response(prompt)

# Display the HTML formatted output properly
display(HTML(html_response))
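The same prompt text is repeated for Rio de Janeiro, Tokyo, and New York, with only the journal entry changing. One way to avoid the repetition is to wrap the prompt in a function. This is a minimal sketch; `build_highlight_prompt` is a hypothetical name, and the sample text is a made-up one-line entry used only to show the call.

```python
def build_highlight_prompt(journal_text):
    """Return the highlighting prompt for a single journal entry."""
    return f"""
Given the following journal entry from a food critic, identify the restaurants and their best dishes.
Highlight and bold each restaurant (in orange) and best dish (in blue) within the original text.

Provide the output as HTML suitable for display in a Jupyter notebook.

Journal entry:
{journal_text}
"""


# Made-up sample entry, standing in for the contents of a journal file
prompt = build_highlight_prompt("Lunch at Katz's Delicatessen: pastrami on rye.")
print(prompt)
```

In the notebook, the call would become `get_llm_response(build_highlight_prompt(journal_newyork))`.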

### Try for yourself!

If you like, pause the video here and try modifying the prompt above to do other things, for example:
- Have the LLM highlight any desserts in green
- Have the LLM add a relevant emoji beside any ingredients

Have the LLM highlight any desserts in green.
Recall from the previous notebook that not all of the text files mention a dessert. Let's first check which files do:

In [9]:
files = ["cape_town.txt", "paris.txt", "rio_de_janeiro.txt", 
         "sydney.txt", "tokyo.txt", "istanbul.txt", "new_york.txt"]

for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # TRY CHANGING THIS PROMPT TO ASK DIFFERENT QUESTIONS
    prompt = f"""Respond with "Yes" or "No": 
    the journal mentions a dessert. 

    Journal:
    {journal}"""

    # Use the LLM to determine whether the journal entry mentions a dessert
    print(f"{file} -> {get_llm_response(prompt)}")
  
    

cape_town.txt -> No
paris.txt -> Yes
rio_de_janeiro.txt -> Yes
sydney.txt -> No
tokyo.txt -> No
istanbul.txt -> No
new_york.txt -> No


From the output above, only two files mention a dessert: `paris.txt` and `rio_de_janeiro.txt`. Let's use both and have the LLM highlight any desserts in green.
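Instead of reading the Yes/No answers by eye, the responses can be filtered in code. The sketch below assumes each response starts with "Yes" or "No", possibly with extra spaces, quotes, or a trailing period. The `responses` dictionary copies the output shown above, and `is_yes` is a hypothetical helper.

```python
def is_yes(response):
    """Return True if the response starts with 'yes', ignoring case, spaces, quotes, and periods."""
    return response.strip().strip(".\"'").lower().startswith("yes")


# Answers copied from the output above
responses = {
    "cape_town.txt": "No",
    "paris.txt": "Yes",
    "rio_de_janeiro.txt": "Yes",
    "sydney.txt": "No",
    "tokyo.txt": "No",
    "istanbul.txt": "No",
    "new_york.txt": "No",
}

dessert_files = [name for name, answer in responses.items() if is_yes(answer)]
print(dessert_files)  # ['paris.txt', 'rio_de_janeiro.txt']
```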

In [10]:
# Load Rio de Janeiro journal entry
journal_riodejaneiro = read_journal("rio_de_janeiro.txt") 

prompt = f"""
Given the following journal entry from a food critic, identify and highlight any desserts (in green) within the original text. 

Provide the output as HTML suitable for display in a Jupyter notebook. 

Journal entry:
{journal_riodejaneiro}
"""

html_response = get_llm_response(prompt)
display(HTML(html_response))

In [11]:
# Load Paris journal entry
journal_paris = read_journal("paris.txt") 

prompt = f"""
Given the following journal entry from a food critic, identify and highlight any desserts (in green) within the original text. 

Provide the output as HTML suitable for display in a Jupyter notebook. 

Journal entry:
{journal_paris}
"""

html_response = get_llm_response(prompt)
display(HTML(html_response))

Next, have the LLM add a relevant emoji beside any ingredients:

In [12]:
# Read the Paris journal entry
journal_paris = read_journal("paris.txt") 

prompt = f"""
Given the following journal entry from a food critic, add a relevant emoji beside any ingredients within the original text. 

Provide the output as HTML suitable for display in a Jupyter notebook. 

Journal entry:
{journal_paris}
"""

html_response = get_llm_response(prompt)
display(HTML(html_response))

In [13]:
journal_riodejaneiro = read_journal("rio_de_janeiro.txt") 

prompt = f"""
Given the following journal entry from a food critic, add a relevant emoji beside any ingredients within the original text. 

Provide the output as HTML suitable for display in a Jupyter notebook. 

Journal entry:
{journal_riodejaneiro}
"""

html_response = get_llm_response(prompt)
display(HTML(html_response))

### Extracting restaurants and their best dishes

Next, modify the prompt to extract the information from the text and list it out instead of highlighting it. Here is the modified prompt with the new instructions to save the data in **Comma Separated Value** (CSV) format:

<p style="background-color:#F5C780; padding:15px"> 🤖 <b>Use the Chatbot</b>:
    <br><br>
    What is a CSV file?
</p>

**Response Chatbot** : A `CSV (Comma-Separated Values) file` is a plain text file that contains data separated by commas. It's commonly used for storing tabular data (such as in a spreadsheet or database) in a simple text format. Each row in a CSV file represents a record, and each column represents a field in the record.

In [14]:
# Load the Rio de Janeiro journal entry
journal_rio_de_janeiro = read_journal("rio_de_janeiro.txt") 

In [15]:
prompt = f"""Please extract a comprehensive list of the restaurants 
and their respective best dishes mentioned in the following journal entry. 
Ensure that each restaurant name is accurately identified and listed. 

Provide your answer in CSV format, ready to save. 
Exclude the "```csv" declaration, don't add spaces after the comma, include column headers.

Format:
Restaurant, Dish
Res_1, Dsh_1
...

Journal entry:
{journal_rio_de_janeiro}
"""

restaurants_csv_ready_string = get_llm_response(prompt)

print(restaurants_csv_ready_string)

Restaurant,Dish  
Confeitaria Colombo,pastel de nata  
Fogo de Chão,picanha  
Olympe,moqueca de caju  
Aprazível,galinhada  


**Explanation output** Notice how the output now only contains the restaurants and names of dishes. The first line indicates what information each row contains, in this case the name of the restaurant, then a comma, then the name of the dish.
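Because the response is plain CSV text, Python's built-in `csv` module can turn it into a list of dictionaries for further processing. This is a minimal sketch: `csv_text` copies the output above, standing in for `restaurants_csv_ready_string`, and `parse_csv_response` is a hypothetical helper.

```python
import csv
import io

# Copied from the output above
csv_text = """Restaurant,Dish
Confeitaria Colombo,pastel de nata
Fogo de Chão,picanha
Olympe,moqueca de caju
Aprazível,galinhada
"""


def parse_csv_response(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.reader(io.StringIO(text.strip()))
    # Strip stray spaces around each cell and skip empty lines
    rows = [[cell.strip() for cell in row] for row in reader if row]
    header, records = rows[0], rows[1:]
    return [dict(zip(header, record)) for record in records]


for row in parse_csv_response(csv_text):
    print(row["Restaurant"], "->", row["Dish"])
```

One caveat: a dish name that itself contains a comma would be split into two cells unless the LLM wraps it in double quotes.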

### Looping through multiple journals

In this section, you'll iterate through all the journal entries using a `for` loop and extract the restaurants and best dishes from each file:

In [16]:
files = ["cape_town.txt", "istanbul.txt", "new_york.txt", "paris.txt", 
          "rio_de_janeiro.txt", "sydney.txt", "tokyo.txt"]

for file in files:
    #Open file and read contents
    journal_entry = read_journal(file)

    #Extract restaurants and display csv
    prompt =  f"""Please extract a comprehensive list of the restaurants 
    and their respective best dishes mentioned in the following journal entry. 
    
    Ensure that each restaurant name is accurately identified and listed. 
    Provide your answer in CSV format, ready to save.

    Exclude the "```csv" declaration, don't add spaces after the 
    comma, include column headers.

    Format:
    Restaurant, Dish
    Res_1, Dsh_1
    ...

    Journal entry:
    {journal_entry}
    """
    
    print(file)
    print_llm_response(prompt)
    print("") # Prints a blank line!

cape_town.txt
Restaurant, Dish  
The Test Kitchen, Pickled Fish Tacos  
La Colombe, Tuna La Colombe  
Harbour House, Grilled Kingklip  
The Pot Luck Club, Beef Tataki  

istanbul.txt
Restaurant,Dish
Çiya Sofrası,Kuzu Tandir
Karaköy Lokantası,Midye Dolma
Asitane,Mutancana
Mikla,Lamb Rump

new_york.txt
Restaurant, Dish  
Katz's Delicatessen, pastrami on rye  
Peter Luger Steak House, porterhouse steak  
Lobster Place, lobster rolls  
Gramercy Tavern, roasted chicken with seasonal vegetables  

paris.txt
Restaurant,Dish  
Le Comptoir du Relais,Coq au Vin  
Le Jules Verne,Filet de Boeuf  
Pierre Hermé,Ispahan  
L'Ambroisie,Turbot with Artichokes and Truffle  

rio_de_janeiro.txt
Restaurant, Dish
Confeitaria Colombo, pastel de nata
Fogo de Chão, picanha
Olympe, moqueca de caju
Aprazível, galinhada

sydney.txt
Restaurant, Dish
Saint Peter, Murray Cod
Billy Kwong, Crispy Skin Duck with Davidson’s Plum Sauce
The Lord Nelson Brewery Hotel, Roast Lamb
Vic's Meat Market, BBQ Beef Brisket
Bennelon

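**Explanation output**: Each journal entry produces its own small CSV table with the same two columns. The formatting is not fully consistent across files, though: some responses put a space after each comma and others don't, even though the prompt asks for no spaces, and the capitalization of dish names varies. LLM output can differ from one call to the next, so check it before using it elsewhere.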
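The loop above prints each response separately. To work with all cities together, the responses could be merged into one table with an extra column for the city. The sketch below assumes the responses have been collected in a dictionary keyed by file name; the two sample entries are shortened from the output above, and `combine_responses` is a hypothetical helper.

```python
import csv
import io
from pathlib import Path


def combine_responses(responses):
    """Merge {filename: csv_text} into rows of [city, restaurant, dish]."""
    combined = [["City", "Restaurant", "Dish"]]
    for filename, text in responses.items():
        city = Path(filename).stem  # "istanbul.txt" -> "istanbul"
        rows = list(csv.reader(io.StringIO(text.strip()), skipinitialspace=True))
        for row in rows[1:]:  # skip each response's own header row
            if len(row) == 2:  # ignore incomplete rows
                combined.append([city] + [cell.strip() for cell in row])
    return combined


# Shortened samples taken from the output above
responses = {
    "istanbul.txt": "Restaurant,Dish\nÇiya Sofrası,Kuzu Tandir\nMikla,Lamb Rump",
    "new_york.txt": "Restaurant, Dish\nKatz's Delicatessen, pastrami on rye",
}

for row in combine_responses(responses):
    print(row)
```

Stripping each cell and using `skipinitialspace=True` smooths over the differences in spacing between responses. The combined rows could then be written to a single file with `csv.writer`.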

### Try for yourself! 
Try modifying the prompt inside the `for` loop above to extract different information. For example:
* Extract the restaurant name and the neighborhood it is located in
* Extract each dish and its main ingredient

- `Extract the restaurant name and the neighborhood it is located in`.


In [17]:
# Example: Extract restaurant name and neighborhood
files = ["cape_town.txt", "istanbul.txt", "new_york.txt", "paris.txt", 
          "rio_de_janeiro.txt", "sydney.txt", "tokyo.txt"]

for file in files:
    # Open file and read contents
    journal_entry = read_journal(file)

    # Extract restaurants and neighborhoods
    prompt =  f"""Please extract a comprehensive list of the restaurants 
    and their respective neighborhood mentioned in the following journal entry. 
    
    Ensure that each restaurant name is accurately identified and listed. 
    Provide your answer in CSV format, ready to save.

    Exclude the "```csv" declaration, don't add spaces after the 
    comma, include column headers.

    Format:
    Restaurant, Neib
    Res_1, Neib_1
    ...

    Journal entry:
    {journal_entry}
    """
    
    print(file)
    print_llm_response(prompt)
    print("") # Prints a blank line!

cape_town.txt
Restaurant, Neib  
The Test Kitchen, Woodstock  
La Colombe, Constantia  
Harbour House, V&A Waterfront  
The Pot Luck Club, Woodstock  

istanbul.txt
Restaurant, Neib
Çiya Sofrası, Kadıköy
Karaköy Lokantası, Karaköy
Asitane, Edirnekapı
Mikla, Marmara Pera Hotel

new_york.txt
Restaurant, Neib  
Katz's Delicatessen, Lower East Side  
Peter Luger Steak House, Williamsburg  
Lobster Place, Meatpacking District  
Gramercy Tavern, Gramercy  

paris.txt
Restaurant,Neib  
Le Comptoir du Relais,Saint-Germain-des-Prés  
Le Jules Verne,Eiffel Tower  
Pierre Hermé,Unknown  
L'Ambroisie,Place des Vosges  

rio_de_janeiro.txt
Restaurant, Neib  
Confeitaria Colombo, Central Rio  
Fogo de Chão, Botafogo  
Olympe, Lagoa  
Aprazível, Not specified  

sydney.txt
Restaurant, Neib  
Saint Peter, Paddington  
Billy Kwong, Potts Point  
The Lord Nelson Brewery Hotel, The Rocks  
Vic's Meat Market, Sydney Fish Market  
Bennelong, Sydney Opera House  

tokyo.txt
Restaurant, Neib
Sukiyabashi Jiro

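**Explanation output** The LLM returns a neighborhood for most restaurants, but the results need checking. When it can't find one, the response contains a placeholder such as `Unknown` or `Not specified`, and some rows name a building or landmark (for example, `Sydney Opera House` or `Marmara Pera Hotel`) rather than a neighborhood.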
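In the output above, missing neighborhoods appear under different labels (`Unknown` for Pierre Hermé, `Not specified` for Aprazível). If the data is going to be processed further, it can help to map such placeholders to a single value. A minimal sketch follows; `clean_neighborhood` is a hypothetical helper, and `n/a` is an assumed extra placeholder that does not appear in the output above.

```python
# Placeholder labels treated as "no neighborhood given" (lowercase)
MISSING_MARKERS = {"unknown", "not specified", "n/a", ""}


def clean_neighborhood(value):
    """Return None for placeholder text, otherwise the value without surrounding spaces."""
    value = value.strip()
    return None if value.lower() in MISSING_MARKERS else value


print(clean_neighborhood("Unknown"))        # None
print(clean_neighborhood("Not specified"))  # None
print(clean_neighborhood(" Botafogo "))     # Botafogo
```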

- `Extract each dish and its main ingredient`

In [18]:
files = ["cape_town.txt", "istanbul.txt", "new_york.txt", "paris.txt", 
          "rio_de_janeiro.txt", "sydney.txt", "tokyo.txt"]

for file in files:
    #Open file and read contents
    journal_entry = read_journal(file)

    # Extract dishes and their main ingredients, and display them as CSV
    prompt =  f"""Please extract a comprehensive list of the dishes 
    and their respective main ingredient mentioned in the following journal entry. 
    
    Ensure that each dish name is accurately identified and listed. 
    Provide your answer in CSV format, ready to save.

    Exclude the "```csv" declaration, don't add spaces after the 
    comma, include column headers.

    Format:
    Dish, Mingr
    Dish_1, Mingr_1
    ...

    Journal entry:
    {journal_entry}
    """
    
    print(file)
    print_llm_response(prompt)
    print("") # Prints a blank line!

cape_town.txt
Dish, Mingr  
Pickled Fish Tacos, Fish  
Tuna La Colombe, Tuna  
Grilled Kingklip, Kingklip  
Beef Tataki, Beef  

istanbul.txt
Dish, Mingr
Kuzu Tandir, lamb
Midye Dolma, mussels
Mutancana, lamb
Lamb Rump, lamb

new_york.txt
Dish, Mingr  
pastrami on rye, pastrami  
porterhouse steak, steak  
lobster rolls, lobster  
roasted chicken with seasonal vegetables, chicken  

paris.txt
Dish, Mingr  
Coq au Vin, chicken  
Filet de Boeuf, beef  
Ispahan, macaron  
Turbot with Artichokes and Truffle, turbot  

rio_de_janeiro.txt
Dish, Mingr
pastel de nata, custard
picanha, top sirloin
moqueca de caju, cashew nut
galinhada, chicken and rice

sydney.txt
Dish, Mingr
Murray Cod, Fish
Crispy Skin Duck with Davidson’s Plum Sauce, Duck
Roast Lamb, Lamb
BBQ Beef Brisket, Beef
Sydney Rock Oysters, Oysters

tokyo.txt
Dish, Mingr
Omakase sushi, sushi
Tonkotsu ramen, ramen
Fresh sashimi and street food, sashimi
Innovative tasting menu, tasting menu
Kaiseki (traditional multi-course meal), kais

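**Explanation output** For most dishes the LLM names a plausible main ingredient, such as lamb for Kuzu Tandir. For the Tokyo entry, however, it mostly repeats part of the dish name (for example, `sushi` for `Omakase sushi`), so this kind of extraction is worth reviewing by hand.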

### Writing Files

Here, you will learn how to save the data you have created with Python and an LLM to a file.

In [19]:
# Display the html_response variable to see its contents
display(HTML(html_response))

Next, save the data in html_response to a file:

f = open("highlighted_text.html", 'w') 
f.write(html_response) 
f.close()

Note that you use `'w'` instead of `'r'` and `f.write` instead of `f.read` here, in contrast to when you read in a file.

<p style="background-color:#F5C780; padding:15px"> 🤖 <b>Use the Chatbot</b>:
    <br><br>
    Explain this code line by line:
    <br><br>f = open("highlighted_text.html", 'w')
    <br>f.write(html_response)
    <br>f.close()
</p>

**Response Chatbot**  This code snippet performs basic file operations in Python: opening, writing to, and closing a file. To sum up:  
_Line 1_: The code opens (or creates) a file called highlighted_text.html in write mode.  
_Line 2_: It writes the contents of the variable html_response to the file.  
_Line 3_: Finally, it closes the file to ensure the changes are saved and resources are freed.  
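The same three steps can also be written with a `with` statement, which closes the file automatically when the block ends, even if an error occurs while writing. This is a sketch of that alternative, not a replacement for the code above. The helper `save_text`, the file name, and the sample HTML are all made up for illustration, and the file is written to the system's temporary folder.

```python
import os
import tempfile


def save_text(path, text):
    """Write text to path; the with block closes the file automatically."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Hypothetical file name and content, written to the temporary folder
example_path = os.path.join(tempfile.gettempdir(), "highlighted_example.html")
save_text(example_path, "<p><b>Confeitaria Colombo</b></p>")

# Read the file back to confirm what was saved
with open(example_path, "r", encoding="utf-8") as f:
    print(f.read())  # prints <p><b>Confeitaria Colombo</b></p>
```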

You can use the following button to download the file you just wrote above.
* Make sure to provide the right file name: 'highlighted_text.html' when asked!

download_file()

**Note:** The issue with downloading the file is still to be resolved.

## Conclusion 
In this notebook, we demonstrated how to `use an LLM to extract and highlight important information from journal entries about restaurants and their signature dishes`. By iterating over multiple journal entries, we could efficiently organize and extract relevant information. This approach can be extended to various other classification and extraction tasks, making it a `versatile tool for working with unstructured text data`.

### Extra practice

### Exercise 1

Modify the prompt below to create an HTML file that highlights all the **restaurant names in green** and the **neighborhoods in pink** in the Sydney journal entry.

In [22]:
journal_sydney = read_journal("sydney.txt") 

# Modify the prompt below
prompt = f"""
Given the following journal entry from a food critic, identify the 
restaurants and their best dishes. Highlight and bold each restaurant 
(in orange) and best dish (in blue) within the original text. 

Provide the output as HTML suitable for display in a Jupyter notebook. 

Journal entry:
{journal_sydney}
"""

html_sydney = get_llm_response(prompt)
display(HTML(html_sydney))

Solution: the prompt below is modified so that the resulting HTML highlights all the **restaurant names in green** and the **neighborhoods in pink** in the Sydney journal entry.

In [23]:
journal_sydney = read_journal("sydney.txt") 

# Modified prompt: restaurants in green, neighborhoods in pink
prompt = f"""
Given the following journal entry from a food critic, identify the 
restaurants and their neighborhoods. Highlight and bold each restaurant 
(in green) and neighborhood (in pink) within the original text. 

Provide the output as HTML suitable for display in a Jupyter notebook. 

Journal entry:
{journal_sydney}
"""

html_sydney = get_llm_response(prompt)
display(HTML(html_sydney))

### Exercise 2

Modify the code below to save the output of the LLM to an HTML file. The file should be called `highlighted_sydney.html`.

In [None]:

f = open() 
f.write() 
f.close()

In [27]:
f = open("highlighted_sydney.html", 'w') 
f.write(html_sydney) 
f.close()

In [28]:
def download_file(filename, content):
    with open(filename, 'w') as f:
        f.write(content)


In [29]:
download_file("highlighted_sydney.html", html_sydney)
