# Reading journals from food critics

In this lesson, you'll use AI to decide whether the contents of a file are about food and restaurants.

Text data like emails, journal entries, and social media posts often have no predefined structure. Additionally, each person writes in their own style: some use bullet points, while others prefer long paragraphs. For this reason, text data is known as **unstructured data**. 

Chatbot response: Unstructured data is information that has no predefined data model and does not fit an easily organized framework. It lacks the structured format typically seen in relational databases and may instead consist of a wide variety of data types that are not easily stored or categorized in a table-like format (i.e., rows and columns). This makes unstructured data more challenging to process, analyze, and store in traditional systems.

Characteristics of unstructured data:
- No fixed schema or format: Unlike structured data (e.g., database tables), unstructured data is free-form, meaning it doesn't conform to a rigid structure.
- Variety: It can include text, images, audio, video, social media posts, and more.
- Large volume: It typically represents a significant portion of the data generated by modern systems and is often stored in huge volumes.
- Difficult to analyze: Since there is no standard schema, specialized tools like natural language processing (NLP), image recognition, and machine learning are often required to extract meaningful insights from unstructured data.

Examples of unstructured data:
- Text: Emails, word processing documents (e.g., .doc files), PDFs, log files, social media posts (tweets, Facebook comments).
- Media files: Audio files, video files, and image files (e.g., .mp3, .mp4, .jpeg).
- Web content: HTML content from websites, blogs, wikis.
- Sensor data: Data from IoT devices or machine-generated data that does not follow a clear format.
- Customer feedback: Freeform responses in surveys or reviews.

Let's start by loading some helper functions to use in the notebook:

In [19]:
from helper_functions import get_llm_response, print_llm_response

## Working with text data

You'll take a look at journal entries in the working directory. The journals are stored as plain text files with the extension `.txt`.

Start by opening and reading the Cape Town journal:

In [20]:
f = open("cape_town.txt", "r")
journal_cape_town = f.read()
f.close()
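
The open, read, and close steps can also be written with a `with` statement, which closes the file automatically even if reading fails. The following is a minimal sketch rather than the course's own method: `read_text_file` is a hypothetical helper name, and the sample file is created in a temporary folder only so that the example runs on its own.

```python
import os
import tempfile


def read_text_file(path):
    """Return the full contents of a text file as a string."""
    # The with statement closes the file when the block ends,
    # even if f.read() raises an exception.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Create a small sample file so the sketch is self-contained
# (in the notebook you would pass "cape_town.txt" instead).
sample_dir = tempfile.mkdtemp()
sample_path = os.path.join(sample_dir, "sample_journal.txt")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("My first destination was The Test Kitchen.")

journal_sample = read_text_file(sample_path)
print(journal_sample)
```

Both styles give the same string; the `with` form just removes the need to remember `f.close()`.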

Print the contents of the journal:

In [21]:
print(journal_cape_town)

My first destination was The Test Kitchen, a restaurant that has earned its place among the world's best. Situated in the trendy Woodstock area, this dining spot is celebrated for its innovative dishes. I was particularly taken by their signature dish, the "Pickled Fish Tacos." The tangy, flavorful fish wrapped in a soft taco, paired with a zesty salsa, was a delightful start to my culinary adventure. The industrial-chic ambiance added a modern edge to the dining experience.

Next, I made my way to La Colombe, perched on the slopes of Constantia. Known for its refined and artistic approach to cuisine, La Colombe's "Tuna La Colombe" is a must-try. This dish features perfectly seared tuna, complemented by a delicate ponzu dressing and bursts of citrus. The presentation was as exquisite as the flavors, making it a memorable highlight of the day.

At the bustling V&A Waterfront, I visited Harbour House for some of the freshest seafood in town. The "Grilled Kingklip" was a revelation. The s

Let's render the journal as HTML to get a better view of the contents of the selected file:

In [22]:
from IPython.display import display, HTML

display(HTML(journal_cape_town))

As you can see, the file is about restaurants and food.
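
One caveat when passing plain text to `HTML(...)`: HTML collapses line breaks, and characters such as `&` or `<` may be interpreted as markup. A possible workaround, sketched below with a hypothetical helper named `text_to_html`, is to escape the text and turn newlines into explicit `<br>` tags before displaying it. The sample string is made up for illustration.

```python
import html


def text_to_html(text):
    """Convert plain text to HTML that keeps its line breaks."""
    # Escape characters such as < and & so they are shown literally
    # instead of being interpreted as markup.
    escaped = html.escape(text)
    # HTML collapses newlines, so turn each one into an explicit break.
    return escaped.replace("\n", "<br>\n")


sample = "Sukiyabashi Jiro\n    Location: Ginza\nV&A Waterfront"
print(text_to_html(sample))
```

In the notebook, the result could be passed to `display(HTML(text_to_html(journal_cape_town)))`. Leading spaces are still collapsed by the browser, so indentation is not preserved by this sketch.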

Next, open the Tokyo journal entry file and read its contents:

In [23]:
f = open("tokyo.txt", "r")
journal_tokyo = f.read() 
f.close()

Print the contents of the journal:

In [24]:
print(journal_tokyo)

Tokyo's culinary landscape is nothing short of extraordinary. Each spot offers a unique taste of the city's diverse food culture. Here's a quick guide to some must-try places and dishes.

    Sukiyabashi Jiro
        Location: Ginza
        Dish: Omakase sushi
        Highlight: Impeccably crafted sushi made by the legendary Jiro Ono. Each piece is a masterclass in balance and flavor.

    Ichiran Ramen
        Location: Shibuya
        Dish: Tonkotsu ramen
        Highlight: A personal ramen booth for focused, uninterrupted enjoyment. Rich, creamy broth with perfectly cooked noodles.

    Tsukiji Outer Market
        Location: Tsukiji
        Dish: Fresh sashimi and street food
        Highlight: Vibrant market atmosphere. Indulge in ultra-fresh sashimi, grilled seafood, and other Japanese street food delights.

    Narisawa
        Location: Minato
        Dish: Innovative tasting menu
        Highlight: A fusion of French and Japanese techniques. Creative dishes with an emphasis on 

In [25]:
display(HTML(journal_tokyo))

This entry is also about restaurants and food - but notice how different the format of the journal is from the Cape Town example!

## Determining if text files are relevant using LLMs

In this section, you'll write a prompt that instructs an LLM to determine whether a file's content is about food and restaurants or about some other topic.

Define the prompt and include the Tokyo journal entry as the input data to check:

In [26]:
prompt = f"""Respond with "Relevant" or "Not relevant": 
the journal describes restaurants and their specialties. 

Journal:
{journal_tokyo}"""

Print the LLM response to see if the file is relevant for our purpose or not:

In [27]:
print_llm_response(prompt)

Relevant


In [28]:
# Now check whether the first journal, journal_cape_town, is also relevant
prompt = f"""Respond with "Relevant" or "Not relevant": 
the journal describes restaurants and their specialties. 

Journal:
{journal_cape_town}"""

In [29]:
print_llm_response(prompt)

Relevant
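
If you want to use the answer in later code rather than just print it, it may help to turn the reply into a Boolean. The sketch below assumes the reply is available as a string (for example from `get_llm_response`, which this notebook uses inside an f-string). The name `is_relevant` is hypothetical, and the handling of stray quotes or a trailing period is a precaution, not something observed in the outputs here.

```python
def is_relevant(response):
    """Return True for "Relevant", False for "Not relevant"."""
    # Normalize case, surrounding whitespace, quotes, and a trailing period.
    label = response.strip().strip('."\'').lower()
    if label == "relevant":
        return True
    if label == "not relevant":
        return False
    # Anything else is unexpected; fail loudly rather than guess.
    raise ValueError(f"Unexpected LLM response: {response!r}")


print(is_relevant("Relevant"))         # True
print(is_relevant(" not relevant.\n"))  # False
```

Raising an error on any other reply makes it obvious when the model did not follow the instruction, instead of silently treating the file as not relevant.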


Since we are dealing with multiple text files, let's check them using a `for` loop.

## Checking all files using a `for` loop

Using Python and an LLM together allows you to quickly iterate over multiple files and check the relevance of the content for your tasks.

Start by creating a list of all the files you want to check:

In [30]:
# List of the journal files
files = ["cape_town.txt", "madrid.txt", "rio_de_janeiro.txt", "sydney.txt", "tokyo.txt"]
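
Typing the list by hand works well for five files. For a larger folder, one option is to let Python find the `.txt` files with `pathlib`. This is only a sketch: `list_journal_files` is a hypothetical helper, and the demo folder and its files are created on the fly so that the example runs anywhere.

```python
import tempfile
from pathlib import Path


def list_journal_files(folder):
    """Return the names of all .txt files in folder, sorted alphabetically."""
    return sorted(p.name for p in Path(folder).glob("*.txt"))


# Build a throwaway folder so the sketch runs anywhere.
demo_dir = Path(tempfile.mkdtemp())
for name in ["tokyo.txt", "cape_town.txt", "notes.md"]:
    (demo_dir / name).write_text("placeholder", encoding="utf-8")

files_found = list_journal_files(demo_dir)
print(files_found)  # ['cape_town.txt', 'tokyo.txt']
```

In the notebook, `list_journal_files(".")` would pick up every `.txt` file in the working directory, including any that are not journals, so the hand-written list remains the safer choice when you know exactly which files you want.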

Next, use a `for` loop to open each file and have an LLM check if the content from that file is relevant to food and restaurants.
* *If you need a refresher on `for` loops, please revisit Course 2!*

In [31]:
for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # Create prompt
    prompt = f"""Respond with "Relevant" or "Not relevant": 
    the journal describes restaurants and their specialties. 

    Journal:
    {journal}"""

    # Use LLM to determine if the journal entry is useful
    print(f"{file} -> {get_llm_response(prompt)}")

cape_town.txt -> Relevant
madrid.txt -> Not relevant
rio_de_janeiro.txt -> Relevant
sydney.txt -> Relevant
tokyo.txt -> Relevant
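
Printing the labels is enough for a quick look, but you could also store them so that later code works only with the relevant files. The sketch below shows the idea under stated assumptions: `build_prompt` is a hypothetical helper that reproduces this lesson's prompt, `fake_llm` is a keyword-based stand-in for `get_llm_response` so the example runs offline, and the two journal texts are shortened placeholders.

```python
def build_prompt(journal):
    """Build the relevance prompt used in this lesson for one journal."""
    return f"""Respond with "Relevant" or "Not relevant":
the journal describes restaurants and their specialties.

Journal:
{journal}"""


def fake_llm(prompt):
    """Hypothetical offline stand-in for get_llm_response (keyword check only)."""
    journal = prompt.split("Journal:", 1)[1].lower()
    return "Relevant" if "restaurant" in journal else "Not relevant"


# Stand-in texts; in the notebook these would be read from the .txt files.
journals = {
    "cape_town.txt": "The Test Kitchen is a restaurant in Woodstock.",
    "madrid.txt": "Madrid is a major financial hub.",
}

# Store one label per file instead of only printing it.
results = {}
for file, journal in journals.items():
    results[file] = fake_llm(build_prompt(journal))

relevant_files = [file for file, label in results.items() if label == "Relevant"]
print(relevant_files)  # ['cape_town.txt']
```

Swapping `fake_llm` for `get_llm_response` would use the real model; the dictionary and the filtering step stay the same.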


It seems that the Madrid journal entry is not relevant. Let's print its contents to see why the LLM flagged it as "not relevant":

In [32]:
# Here you can check the content from any journal entry
f = open("madrid.txt", "r") 
print(f.read()) 
f.close()

Madrid, as Spain's capital and largest city, is a key player in the nation's economy. Historically centered around its administrative functions, Madrid has evolved into a major financial hub, hosting the Madrid Stock Exchange and the headquarters of numerous national and international companies.

The service sector, especially tourism, is vital to Madrid's economy. Millions of tourists visit annually, attracted by the city's cultural landmarks, museums, and vibrant nightlife. Additionally, trade fairs and conferences at venues like IFEMA (Feria de Madrid) bring significant business traffic.

Innovation and technology are also growing sectors in Madrid. The city boasts a thriving startup ecosystem and hosts many tech companies, supported by a highly educated workforce from its universities and research institutions. This has spurred growth in IT, biotechnology, and renewable energy.

Madrid's well-developed transportation network, including a comprehensive metro system, high-speed rail,

The Madrid journal entry doesn't contain information about restaurants to try. Instead, it is a description of the economy of the city.

<p style="background-color:#F5C780; padding:15px"> 🤖 <b>Use the Chatbot</b>:
    <br><br>
    I am using AI to determine whether different texts are "relevant" or "not relevant" using an LLM. Does this task have a specific name in AI?
</p>

Chatbot response: Yes. Using a language model to determine whether different texts are "relevant" or "not relevant" is commonly known as text classification, or more specifically binary text classification, in machine learning.

- Text classification: the task of assigning predefined categories or labels to a given text based on its content. In your case, the labels would be "relevant" and "not relevant."
- Binary classification: since you are only dealing with two possible outcomes ("relevant" or "not relevant"), this specific task is called binary classification.

## Extra practice

Experiment with different prompts to check whether files are of interest to you or not. Below is the example suggested in the video - try running it first. Then, try each exercise.

### Exercise 1

Change the prompt to classify the text for different topics, for example "mentions a dessert" or "describes the restaurant design."

In [33]:
files = ["cape_town.txt", "madrid.txt", "rio_de_janeiro.txt", 
         "sydney.txt", "tokyo.txt"]

for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # TRY CHANGING THIS PROMPT TO ASK DIFFERENT QUESTIONS
    prompt = f"""Respond with "Yes" or "No": 
    the journal describes restaurants and food dishes. 

    Journal:
    {journal}"""

    # Use LLM to determine if the journal entry is useful
    print(f"{file} -> {get_llm_response(prompt)}")

cape_town.txt -> Yes
madrid.txt -> No
rio_de_janeiro.txt -> Yes
sydney.txt -> Yes
tokyo.txt -> Yes


In [None]:
#Change the prompt to classify the text for different topics, for example "mentions a dessert" 

In [34]:
files = ["cape_town.txt", "madrid.txt", "rio_de_janeiro.txt", 
         "sydney.txt", "tokyo.txt"]

for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # TRY CHANGING THIS PROMPT TO ASK DIFFERENT QUESTIONS
    prompt = f"""Respond with "Yes" or "No": 
    the journal mentions a dessert. 

    Journal:
    {journal}"""

    # Use LLM to determine if the journal entry is useful
    print(f"{file} -> {get_llm_response(prompt)}")
  
    

cape_town.txt -> No
madrid.txt -> No
rio_de_janeiro.txt -> Yes
sydney.txt -> No
tokyo.txt -> No


As the output shows, only `rio_de_janeiro.txt` mentions a dessert.

In [None]:
#Change the prompt to classify the text for different topics, for example "describes the restaurant design."

In [36]:
files = ["cape_town.txt", "madrid.txt", "rio_de_janeiro.txt", 
         "sydney.txt", "tokyo.txt"]

for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # TRY CHANGING THIS PROMPT TO ASK DIFFERENT QUESTIONS
    prompt = f"""Respond with "Yes" or "No": 
    the journal describes the restaurant design. 

    Journal:
    {journal}"""

    # Use LLM to determine if the journal entry is useful
    print(f"{file} -> {get_llm_response(prompt)}")
  

cape_town.txt -> Yes
madrid.txt -> No
rio_de_janeiro.txt -> Yes
sydney.txt -> No
tokyo.txt -> No


As the output shows, `cape_town.txt` and `rio_de_janeiro.txt` both describe the restaurant design. Let's verify this for the Rio de Janeiro journal by reviewing the file:

In [37]:
f = open("rio_de_janeiro.txt", "r")
journal_rio = f.read() 
f.close()

In [38]:
display(HTML(journal_rio))

In [39]:
prompt = f""" Summarize the restaurant design from the following text 
in at most two sentences.

Journal:
{journal_rio}"""

In [40]:
print_llm_response(prompt)

Confeitaria Colombo features a historic Belle Époque ambiance, enhancing its charm as a café known for its custard tarts. In contrast, Olympe offers a modern, elegant setting for contemporary Brazilian cuisine, while Aprazível boasts breathtaking views and a farm-to-table approach in a lush garden-like environment.


### Exercise 2

Using the code below, change the prompt to classify the journals into more than two categories.

**Example:**
- mentions a **vegetarian** dish
- mentions a **vegan** dish
- mentions both
- mentions neither

In [None]:
files = ["cape_town.txt", "madrid.txt", "rio_de_janeiro.txt", 
         "sydney.txt", "tokyo.txt"]

for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # TRY CHANGING THIS PROMPT TO ASK DIFFERENT QUESTIONS
    prompt = f"""Respond with "Yes" or "No": 
    the journal describes restaurants and food dishes. 

    Journal:
    {journal}"""

    # Use LLM to determine if the journal entry is useful
    print(f"{file} -> {get_llm_response(prompt)}")

In [None]:
#change the prompt to classify into more than two categories.
#  mentions a **vegetarian** dish


In [42]:
files = ["cape_town.txt", "madrid.txt", "rio_de_janeiro.txt", 
         "sydney.txt", "tokyo.txt"]

for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # TRY CHANGING THIS PROMPT TO ASK DIFFERENT QUESTIONS
    prompt = f"""Respond with "Yes" or "No": 
    the journal mentions a **vegetarian** dish. 

    Journal:
    {journal}"""

    # Use LLM to determine if the journal entry is useful
    print(f"{file} -> {get_llm_response(prompt)}")

cape_town.txt -> No
madrid.txt -> No
rio_de_janeiro.txt -> No
sydney.txt -> No
tokyo.txt -> No


In [46]:
files = ["cape_town.txt", "madrid.txt", "rio_de_janeiro.txt", 
         "sydney.txt", "tokyo.txt"]

for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # TRY CHANGING THIS PROMPT TO ASK DIFFERENT QUESTIONS
    prompt = f"""Respond with "Yes" or "No": 
    the journal mentions a vegetarian dish. 

    Journal:
    {journal}"""

    # Use LLM to determine if the journal entry is useful
    print(f"{file} -> {get_llm_response(prompt)}")

cape_town.txt -> No
madrid.txt -> No
rio_de_janeiro.txt -> No
sydney.txt -> No
tokyo.txt -> No


In [None]:
#change the prompt to classify into more than two categories.
#  mentions a **vegan** dish


In [43]:
files = ["cape_town.txt", "madrid.txt", "rio_de_janeiro.txt", 
         "sydney.txt", "tokyo.txt"]

for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # TRY CHANGING THIS PROMPT TO ASK DIFFERENT QUESTIONS
    prompt = f"""Respond with "Yes" or "No": 
    the journal mentions a **vegan** dish. 

    Journal:
    {journal}"""

    # Use LLM to determine if the journal entry is useful
    print(f"{file} -> {get_llm_response(prompt)}")

cape_town.txt -> No
madrid.txt -> No
rio_de_janeiro.txt -> No
sydney.txt -> No
tokyo.txt -> No


In [None]:
#change the prompt to classify into more than two categories.
# mentions both


In [44]:
files = ["cape_town.txt", "madrid.txt", "rio_de_janeiro.txt", 
         "sydney.txt", "tokyo.txt"]

for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # TRY CHANGING THIS PROMPT TO ASK DIFFERENT QUESTIONS
    prompt = f"""Respond with "Yes" or "No": 
    the journal mentions a **vegetarian** dish and a **vegan** dish. 

    Journal:
    {journal}"""

    # Use LLM to determine if the journal entry is useful
    print(f"{file} -> {get_llm_response(prompt)}")

cape_town.txt -> No
madrid.txt -> No
rio_de_janeiro.txt -> No
sydney.txt -> No
tokyo.txt -> No


In [None]:
#change the prompt to classify into more than two categories.
#  mentions neither

In [45]:
files = ["cape_town.txt", "madrid.txt", "rio_de_janeiro.txt", 
         "sydney.txt", "tokyo.txt"]

for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # TRY CHANGING THIS PROMPT TO ASK DIFFERENT QUESTIONS
    prompt = f"""Respond with "Yes" or "No": 
    the journal not mentions a **vegetarian** dish and a **vegan** dish. 

    Journal:
    {journal}"""

    # Use LLM to determine if the journal entry is useful
    print(f"{file} -> {get_llm_response(prompt)}")

cape_town.txt -> No
madrid.txt -> No
rio_de_janeiro.txt -> No
sydney.txt -> No
tokyo.txt -> No
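
The four Yes/No prompts above each answer one question at a time. To classify into more than two categories in a single call, the prompt can list every allowed label and ask for exactly one of them. The following is a sketch under assumptions, not a tested prompt: the label names and their definitions are one possible choice, `build_diet_prompt` and `parse_label` are hypothetical helpers, and `fake_llm` is a keyword-based stand-in for `get_llm_response` so the example runs offline.

```python
LABELS = ["Vegetarian", "Vegan", "Both", "Neither"]


def build_diet_prompt(journal):
    """Ask for exactly one of four labels instead of Yes/No."""
    return f"""Respond with exactly one word: "Vegetarian", "Vegan", "Both", or "Neither".
Vegetarian: the journal mentions a vegetarian dish but no vegan dish.
Vegan: the journal mentions a vegan dish but no vegetarian dish.
Both: the journal mentions both kinds of dish.
Neither: the journal mentions neither.

Journal:
{journal}"""


def parse_label(response):
    """Map the raw reply onto one of LABELS, or raise if it matches none."""
    cleaned = response.strip().strip('."\'').lower()
    for label in LABELS:
        if cleaned == label.lower():
            return label
    raise ValueError(f"Unexpected LLM response: {response!r}")


def fake_llm(prompt):
    """Hypothetical offline stand-in for get_llm_response (keyword check only)."""
    journal = prompt.split("Journal:", 1)[1].lower()
    has_vegetarian = "vegetarian" in journal
    has_vegan = "vegan" in journal
    if has_vegetarian and has_vegan:
        return "Both"
    if has_vegetarian:
        return "Vegetarian"
    if has_vegan:
        return "Vegan"
    return "Neither."


# Made-up sample texts, used only to exercise the four-way prompt.
samples = [
    "A vegan curry and a vegetarian lasagna.",
    "Grilled Kingklip with lemon butter.",
]
for text in samples:
    print(parse_label(fake_llm(build_diet_prompt(text))))
```

Inside the `for file in files:` loop, replacing `fake_llm` with `get_llm_response` would give one label per journal. Checking the reply against the list of allowed labels catches cases where the model answers with a full sentence instead of a single label.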
