# Reading journals from food critics 📖

## Introduction
In this notebook, we will explore how to use AI to analyze and classify journal entries written by various food critics. The objective is to determine whether the content of each journal entry is about food and restaurants. By using a Large Language Model (LLM), we can process unstructured text efficiently and classify it by its relevance to our topic of interest. The same approach is useful for organizing and filtering large volumes of text in many other domains.


### Unstructured data
Text data like emails, journal entries, and social media posts often have no predefined structure. Additionally, each person writes in their own style: some use bullet points, while others prefer long paragraphs. For this reason, text data is known as **unstructured data**. 

<p style="background-color:#F5C780; padding:15px"> 🤖 <b>Use the Chatbot</b>: What is unstructured data?
</p>  

**Response Chatbot** : `Unstructured data` refers to information that has no predefined data model or schema, which makes it harder to process and store in traditional systems such as relational databases. Its characteristics include the lack of a fixed schema, a variety of formats (e.g., text, media, sensor data), large volumes, and the need for specialized tools such as natural language processing (NLP) or machine learning for analysis. Examples include emails, media files, web content, and customer feedback.

### Importing required functions 

- Import specific functions from the _helper_functions_ module:  
  - The _print_llm_response_ function likely prints the response from a large language model.  
  - The _get_llm_response_ function likely returns the response from a large language model, so that it can be used in code.  

- Import specific functions from the _IPython.display_ module:  
  - _display_ is a function that renders rich representations of Python objects, such as HTML, images, or Markdown text, directly in Jupyter Notebook cells.  
  - _HTML_ is a class that wraps raw HTML code into a displayable object, so that the HTML content can be rendered in a Jupyter Notebook cell.

In [1]:
# Importing specific functions from the _helper_functions_ module
from helper_functions import get_llm_response, print_llm_response


# Importing specific functions from the IPython.display module
from IPython.display import display, HTML

### Working with text data

We'll start by examining journal entries stored as plain text files with a `.txt` extension. Let's open and read the Cape Town journal:

In [3]:
# Open and read Cape Town journal
f = open("cape_town.txt", "r")
journal_cape_town = f.read()
f.close()

# Print the contents of the cape_town.txt file 
#print(journal_cape_town)

# Display the contents as HTML to get a better view of the selected file
display(HTML(journal_cape_town))

**Explanation output** : As you can see, the file is about restaurants and food. 

Next, let's open the `Tokyo journal` entry and read its contents:

In [5]:
f = open("tokyo.txt", "r")
journal_tokyo = f.read() 
f.close()

# Print the contents of the tokyo.txt file 
# print(journal_tokyo)

# Display the contents as HTML to get a better view of the selected file
display(HTML(journal_tokyo))

**Explanation output**: This entry is also about restaurants and food. However, notice how different its format is from the Cape Town journal.
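Every cell in this notebook repeats the same open, read, and close steps. A minimal sketch of a reusable helper follows; `read_journal` is a hypothetical name (it is not part of `helper_functions`), and the `utf-8` encoding is an assumption about how the journal files are saved. The `with` block closes the file automatically, even if reading raises an error, so no explicit `f.close()` is needed.

```python
def read_journal(path):
    """Return the full text of a journal file.

    The `with` block closes the file automatically, even if reading fails.
    UTF-8 is an assumed encoding for the journal files.
    """
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Usage (assumes tokyo.txt is in the working directory, as in the cells above):
# journal_tokyo = read_journal("tokyo.txt")
```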

### Determining if text files are relevant using LLMs

In this section, you'll write a prompt that instructs an LLM to determine whether the content of a file is about food and restaurants or some other topic.

In [6]:
# Define the prompt and include the Tokyo journal entry as the input data to check
prompt = f"""Respond with "Relevant" or "Not relevant": 
the journal describes restaurants and their specialties. 

Journal:
{journal_tokyo}"""

# Print the LLM response to see if the file is relevant for our purpose or not
print_llm_response(prompt)

Relevant


**Explanation output**: As the output shows, the LLM classified the Tokyo journal as relevant.
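`print_llm_response` only displays the reply, and the model is not guaranteed to answer with the exact label: it may add punctuation, quotes, or an explanation. Before acting on the answer in code, it can help to normalize it. A minimal sketch, where `parse_relevance` is a hypothetical helper applied to the text returned by `get_llm_response`:

```python
def parse_relevance(response):
    """Map a raw LLM reply to "Relevant", "Not relevant", or "Unknown".

    The prompt asks for the label alone, so only the start of the reply is
    checked. A naive `"relevant" in reply` test would wrongly accept
    "Not relevant", because that label contains the word "relevant".
    """
    text = response.strip().strip('"').lower()
    if text.startswith("not relevant"):
        return "Not relevant"
    if text.startswith("relevant"):
        return "Relevant"
    # Anything else (an explanation, a refusal, ...) is flagged for review.
    return "Unknown"


print(parse_relevance("Relevant"))          # Relevant
print(parse_relevance(" Not relevant.\n"))  # Not relevant
print(parse_relevance("I cannot tell."))    # Unknown
```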

In [7]:
# Define the prompt and include the Cape Town journal entry as the input data to check
prompt = f"""Respond with "Relevant" or "Not relevant": 
the journal describes restaurants and their specialties. 

Journal:
{journal_cape_town}"""


# Print the LLM response to see if the file is relevant for our purpose or not
print_llm_response(prompt)

Relevant


**Explanation output**: As the output shows, the LLM also classified the Cape Town journal as relevant.

Since we have several journal files to check, let's process them with a `for` loop.

### Checking all files using a `for` loop

Using Python and an LLM together allows you to quickly iterate over multiple files and check the relevance of the content for your tasks. Start by creating a list of all the files you want to check:

In [8]:
# List of the journal files
files = ["cape_town.txt", "madrid.txt", "rio_de_janeiro.txt", "sydney.txt", "tokyo.txt"]
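Typing the list by hand works for five files. For a larger folder, the list can be built automatically. A minimal sketch using the standard-library `pathlib` module; `list_journal_files` is a hypothetical helper, and it assumes that every `.txt` file in the folder is a journal entry (any unrelated `.txt` file would be picked up too):

```python
from pathlib import Path


def list_journal_files(folder="."):
    """Return the names of all .txt files in `folder`, sorted alphabetically."""
    return sorted(path.name for path in Path(folder).glob("*.txt"))


# files = list_journal_files()  # would replace the hand-typed list above
```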

In [9]:
# Use a for loop to open each file and have an LLM check if the content from that file is relevant to food and restaurants
for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # Create prompt
    prompt = f"""Respond with "Relevant" or "Not relevant": 
    the journal describes restaurants and their specialties. 

    Journal:
    {journal}"""

    # Use LLM to determine if the journal entry is useful
    print(f"{file} -> {get_llm_response(prompt)}")

cape_town.txt -> Relevant
madrid.txt -> Not relevant
rio_de_janeiro.txt -> Relevant
sydney.txt -> Relevant
tokyo.txt -> Relevant


**Explanation output**: As the output shows, only the Madrid journal entry is classified as not relevant.
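Printing each result is fine for reading, but to act on the results (for example, to keep only the relevant journals) it helps to store them. A minimal sketch; `classify_files` and `relevant_files` are hypothetical helpers, and the classifier is passed in as a function so the example runs without an LLM. In the notebook, that function would build the "Relevant"/"Not relevant" prompt and return `get_llm_response(prompt)`; the keyword check in the demo only stands in for it and is not a substitute for an LLM.

```python
def classify_files(files, read_text, classify):
    """Return a dict mapping each file name to the label returned by `classify`.

    `read_text` turns a file name into its contents; `classify` turns the
    contents into a label such as "Relevant" or "Not relevant".
    """
    return {name: classify(read_text(name)) for name in files}


def relevant_files(results):
    """Keep only the file names labeled "Relevant"."""
    return [name for name, label in results.items() if label == "Relevant"]


# Offline demo with a stand-in classifier (a keyword check, NOT an LLM):
texts = {"tokyo.txt": "Ramen shop in Shinjuku...", "madrid.txt": "Madrid's economy..."}
fake_classify = lambda text: "Relevant" if "shop" in text else "Not relevant"

results = classify_files(texts, texts.get, fake_classify)
print(results)                  # {'tokyo.txt': 'Relevant', 'madrid.txt': 'Not relevant'}
print(relevant_files(results))  # ['tokyo.txt']
```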

Let's print its contents to see why the LLM flagged it as "not relevant":

In [10]:
# Print the contents of the Madrid journal entry
f = open("madrid.txt", "r") 
print(f.read()) 
f.close()

Madrid, as Spain's capital and largest city, is a key player in the nation's economy. Historically centered around its administrative functions, Madrid has evolved into a major financial hub, hosting the Madrid Stock Exchange and the headquarters of numerous national and international companies.

The service sector, especially tourism, is vital to Madrid's economy. Millions of tourists visit annually, attracted by the city's cultural landmarks, museums, and vibrant nightlife. Additionally, trade fairs and conferences at venues like IFEMA (Feria de Madrid) bring significant business traffic.

Innovation and technology are also growing sectors in Madrid. The city boasts a thriving startup ecosystem and hosts many tech companies, supported by a highly educated workforce from its universities and research institutions. This has spurred growth in IT, biotechnology, and renewable energy.

Madrid's well-developed transportation network, including a comprehensive metro system, high-speed rail,

The Madrid journal entry doesn't contain information about restaurants to try. Instead, it is a description of the economy of the city.

<p style="background-color:#F5C780; padding:15px"> 🤖 <b>Use the Chatbot</b>:
    <br><br>
    I am using AI to determine whether different texts are "relevant" or "not relevant" using an LLM. Does this task have a specific name in AI?
</p>

**Response Chatbot** : Yes. The task you are describing, using a language model to decide whether texts are "relevant" or "not relevant", is commonly known as `text classification`, or more specifically `binary text classification`, in machine learning.
- **Text classification** is the task of assigning predefined categories or labels to a text based on its content. In your case, the labels are "relevant" and "not relevant."
- **Binary classification**: since there are only two possible outcomes ("relevant" or "not relevant"), this specific task is called binary classification.

## Conclusions 
In this notebook, we demonstrated how to use an LLM to classify unstructured text data. By iterating over multiple journal entries, we could determine the relevance of the content to food and restaurants. This approach can be extended to various other classification tasks, making it a versatile tool for working with unstructured data.

## Extra practice

Experiment with different prompts to check whether files are of interest to you or not. Try each exercise.

### Exercise 1

Change the prompt to classify the text for different topics, for example `mentions a dessert` or `describes the restaurant design`.
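The cells in this exercise differ only in the criterion that follows "the journal". A minimal sketch of a prompt template that keeps the rest of the wording identical across criteria; `yes_no_prompt` is a hypothetical helper, and in the notebook its result would be passed to `get_llm_response`:

```python
def yes_no_prompt(criterion, journal):
    """Build a Yes/No classification prompt for any criterion."""
    return (
        'Respond with "Yes" or "No": '
        f"the journal {criterion}.\n\n"
        f"Journal:\n{journal}"
    )


criteria = ["describes restaurants and food dishes",
            "mentions a dessert",
            "describes the restaurant design"]

# Preview the prompts with placeholder text instead of a real journal.
for criterion in criteria:
    print(yes_no_prompt(criterion, "<journal text>"))
    print("---")
```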

In [11]:
# Change the prompt to classify the text for different topics
files = ["cape_town.txt", "madrid.txt", "rio_de_janeiro.txt", 
         "sydney.txt", "tokyo.txt"]

for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # Create prompt
    prompt = f"""Respond with "Yes" or "No": 
    the journal describes restaurants and food dishes. 

    Journal:
    {journal}"""

    # Use LLM to determine if the journal entry is useful
    print(f"{file} -> {get_llm_response(prompt)}")

cape_town.txt -> Yes
madrid.txt -> No
rio_de_janeiro.txt -> Yes
sydney.txt -> Yes
tokyo.txt -> Yes


In [12]:
files = ["cape_town.txt", "madrid.txt", "rio_de_janeiro.txt", 
         "sydney.txt", "tokyo.txt"]

for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # Create a prompt that asks whether the journal mentions a dessert
    prompt = f"""Respond with "Yes" or "No": 
    the journal mentions a dessert. 

    Journal:
    {journal}"""

    # Use the LLM to check whether the journal mentions a dessert
    print(f"{file} -> {get_llm_response(prompt)}")

cape_town.txt -> No
madrid.txt -> No
rio_de_janeiro.txt -> Yes
sydney.txt -> No
tokyo.txt -> No


**Explanation output**: As the output shows, only the Rio de Janeiro journal (rio_de_janeiro.txt) mentions a dessert.

In [13]:
files = ["cape_town.txt", "madrid.txt", "rio_de_janeiro.txt", 
         "sydney.txt", "tokyo.txt"]

for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # Create a prompt that asks whether the journal describes the restaurant design
    prompt = f"""Respond with "Yes" or "No": 
    the journal describes the restaurant design. 

    Journal:
    {journal}"""

    # Use the LLM to check whether the journal describes the restaurant design
    print(f"{file} -> {get_llm_response(prompt)}")

cape_town.txt -> Yes
madrid.txt -> No
rio_de_janeiro.txt -> Yes
sydney.txt -> No
tokyo.txt -> No


**Explanation output**: As the output shows, the LLM classified two journals, cape_town.txt and rio_de_janeiro.txt, as describing the restaurant design.

Let's verify this for the Rio de Janeiro journal by reviewing rio_de_janeiro.txt:

In [14]:
f = open("rio_de_janeiro.txt", "r")
journal_rio = f.read() 
f.close()

# Display the contents as HTML to get a better view of the selected file
display(HTML(journal_rio))

In [15]:
# Create a prompt that asks the LLM to summarize the restaurant design
prompt = f"""Summarize the restaurant design from the following text
in at most two sentences.

Journal:
{journal_rio}"""

# Print the LLM's summary of the restaurant design
print_llm_response(prompt)

Confeitaria Colombo features a historic Belle Époque ambiance, enhancing its charm as a café known for its custard tarts. In contrast, Olympe offers a modern, elegant setting for contemporary Brazilian cuisine, while Aprazível boasts breathtaking views and a farm-to-table approach in a lush garden-like environment.


### Exercise 2

Using the code below as a starting point, change the prompt to classify the text into more than two categories.

**Example:**
- mentions a **vegetarian** dish
- mentions a **vegan** dish
- mentions both
- mentions neither
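Later in this exercise, the notebook answers these categories with four separate Yes/No prompts. An alternative is a single prompt that lists all four categories and asks for exactly one. A minimal sketch; `LABELS`, `diet_prompt`, and `parse_diet_label` are hypothetical names, and the label definitions are assumptions you may want to tune:

```python
LABELS = ["vegetarian", "vegan", "both", "neither"]


def diet_prompt(journal):
    """Ask the LLM for exactly one of the four labels in a single prompt."""
    options = ", ".join(f'"{label}"' for label in LABELS)
    return (
        f"Respond with exactly one of {options}, and nothing else.\n"
        '"vegetarian": the journal mentions a vegetarian dish only.\n'
        '"vegan": the journal mentions a vegan dish only.\n'
        '"both": the journal mentions a vegetarian dish and a vegan dish.\n'
        '"neither": the journal mentions no vegetarian dish and no vegan dish.\n\n'
        f"Journal:\n{journal}"
    )


def parse_diet_label(response):
    """Return the label if the reply is one of LABELS, otherwise "unknown"."""
    label = response.strip().strip('".').lower()
    return label if label in LABELS else "unknown"


# In the notebook: parse_diet_label(get_llm_response(diet_prompt(journal)))
```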

In [16]:
# Change the prompt to classify the text for different topics
files = ["cape_town.txt", "madrid.txt", "rio_de_janeiro.txt", "sydney.txt", "tokyo.txt"]

for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # Create prompt
    prompt = f"""Respond with "Yes" or "No": 
    the journal describes restaurants and food dishes. 

    Journal:
    {journal}"""

    # Use LLM to determine if the journal entry is useful
    print(f"{file} -> {get_llm_response(prompt)}")

cape_town.txt -> Yes
madrid.txt -> No
rio_de_janeiro.txt -> Yes
sydney.txt -> Yes
tokyo.txt -> Yes


Rather than a single prompt with four categories, the cells below ask one Yes/No question per category. First, does the journal mention a **vegetarian** dish?


In [17]:
files = ["cape_town.txt", "madrid.txt", "rio_de_janeiro.txt", 
         "sydney.txt", "tokyo.txt"]

for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # Create prompt
    prompt = f"""Respond with "Yes" or "No": 
    the journal mentions a **vegetarian** dish. 

    Journal:
    {journal}"""

    # Use LLM to determine if the journal entry is useful
    print(f"{file} -> {get_llm_response(prompt)}")

cape_town.txt -> No
madrid.txt -> No
rio_de_janeiro.txt -> No
sydney.txt -> No
tokyo.txt -> No



Next, does the journal mention a **vegan** dish?


In [18]:
files = ["cape_town.txt", "madrid.txt", "rio_de_janeiro.txt", 
         "sydney.txt", "tokyo.txt"]

for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # Create prompt
    prompt = f"""Respond with "Yes" or "No": 
    the journal mentions a **vegan** dish. 

    Journal:
    {journal}"""

    # Use LLM to determine if the journal entry is useful
    print(f"{file} -> {get_llm_response(prompt)}")

cape_town.txt -> No
madrid.txt -> No
rio_de_janeiro.txt -> No
sydney.txt -> No
tokyo.txt -> No


Next, does the journal mention both a **vegetarian** dish and a **vegan** dish?

In [19]:
files = ["cape_town.txt", "madrid.txt", "rio_de_janeiro.txt", 
         "sydney.txt", "tokyo.txt"]

for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # Create prompt
    prompt = f"""Respond with "Yes" or "No": 
    the journal mentions a **vegetarian** dish and a **vegan** dish. 

    Journal:
    {journal}"""

    # Use LLM to determine if the journal entry is useful
    print(f"{file} -> {get_llm_response(prompt)}")

cape_town.txt -> No
madrid.txt -> No
rio_de_janeiro.txt -> No
sydney.txt -> No
tokyo.txt -> No


Finally, does the journal mention neither a **vegetarian** dish nor a **vegan** dish?

In [20]:
files = ["cape_town.txt", "madrid.txt", "rio_de_janeiro.txt", 
         "sydney.txt", "tokyo.txt"]

for file in files:
    # Read journal file for the city
    f = open(file, "r")
    journal = f.read()
    f.close()

    # Create prompt
    prompt = f"""Respond with "Yes" or "No": 
    the journal not mentions a **vegetarian** dish and a **vegan** dish. 

    Journal:
    {journal}"""

    # Use LLM to determine if the journal entry is useful
    print(f"{file} -> {get_llm_response(prompt)}")

cape_town.txt -> No
madrid.txt -> No
rio_de_janeiro.txt -> No
sydney.txt -> No
tokyo.txt -> No


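**Explanation output**: Each cell lists every journal with its Yes/No answer. Note that the answers are not consistent with each other: every journal is classified "No" for a vegetarian dish and "No" for a vegan dish, yet also "No" for the last prompt, which should then be "Yes". A likely cause is the ungrammatical wording of that prompt ("the journal not mentions a ... and a ..."); phrasing it as "the journal mentions neither a vegetarian dish nor a vegan dish" should make the question clearer to the LLM.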
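Because "both" and "neither" follow from the first two answers, they do not need their own prompts. A minimal sketch that derives the four categories from the vegetarian and vegan answers, so the results cannot contradict each other; `combine_answers` is a hypothetical helper that takes the raw "Yes"/"No" replies:

```python
def combine_answers(vegetarian, vegan):
    """Combine two Yes/No replies into one of four mutually exclusive categories."""
    is_vegetarian = vegetarian.strip().lower().startswith("yes")
    is_vegan = vegan.strip().lower().startswith("yes")
    if is_vegetarian and is_vegan:
        return "both"
    if is_vegetarian:
        return "vegetarian"
    if is_vegan:
        return "vegan"
    return "neither"


print(combine_answers("No", "No"))     # neither
print(combine_answers("Yes", "No"))    # vegetarian
print(combine_answers("Yes", "Yes."))  # both
```

Applied to the answers recorded above ("No" and "No" for every journal), this places all five journals in "neither".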

## Conclusion 