
## **1. Data Collection**
### **1.1. Get the list of Michelin restaurants**

In [3]:
import requests
from bs4 import BeautifulSoup
import os
import pandas as pd
from IPython.display import display

In [2]:
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
} # user agent is used to simulate that the http request comes from a real web browser, this prevent the server from blocking requests

def guide_michelin(): # 2037
        links = []
        for i in range(1,101): #100
            link = "https://guide.michelin.com/en/it/restaurants/page/{}".format(i)
            try:
                response = requests.get(link, headers=headers)
            except Exception as e:
                print(f"{e} \n {link}")
                continue
            if response.status_code == 200:
                soup = BeautifulSoup(response.text, 'html.parser')
                section = soup.find('div', class_="row restaurant__list-row js-restaurant__list_items")
                if section:  
                    for a_tag in section.find_all('a', href=True):
                        href = 'https://guide.michelin.com' + a_tag['href']
                        if href not in links and "/restaurant/" in href: 
                            links.append(href)
            else:
                print(f"Failed to retrieve page {i}")    
        return links

url_set = guide_michelin()
print(len(url_set))

1981


In [3]:
with open('links.txt', 'w') as f:
    for url in url_set:
        f.write(url + '\n')

### **1.2. Crawl Michelin restaurant pages**

In [4]:
if not os.path.exists('pages'):
    os.makedirs('pages')

with open('links.txt', 'r') as f:
    urls = f.read().splitlines()

# Create directories and save HTML documents
for index, url in enumerate(urls):
    page_number = index // 20 + 1
    directory = os.path.join('pages', f'page_{page_number}')
    if not os.path.exists(directory):
        os.makedirs(directory)
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            file_path = os.path.join(directory, f'document_{index}.html')
            with open(file_path, 'w', encoding='utf-8') as file:
                file.write(response.text)
        else:
            print(f"Failed to retrieve {url}")
    except Exception as e:
        print(f"Error fetching {url}: {e}")

print("HTML documents saved successfully.")

Error fetching https://guide.michelin.com/en/campania/gragnano/restaurant/o-me-o-il-mare: name 'headers' is not defined
Error fetching https://guide.michelin.com/en/abruzzo/popoli_1845563/restaurant/donevandro: name 'headers' is not defined
Error fetching https://guide.michelin.com/en/piemonte/alba/restaurant/ape-vino-e-cucina: name 'headers' is not defined
Error fetching https://guide.michelin.com/en/campania/sorrento/restaurant/da-bob-cook-fish: name 'headers' is not defined
Error fetching https://guide.michelin.com/en/basilicata/matera/restaurant/da-mo: name 'headers' is not defined
Error fetching https://guide.michelin.com/en/sardegna/cagliari/restaurant/sa-domu-sarda: name 'headers' is not defined
Error fetching https://guide.michelin.com/en/sicilia/palermo/restaurant/charleston: name 'headers' is not defined
Error fetching https://guide.michelin.com/en/toscana/bibbiena/restaurant/il-tirabuscio262517: name 'headers' is not defined
Error fetching https://guide.michelin.com/en/emili

In [5]:
dir_paths = [os.path.join('pages', dir) for dir in os.listdir('pages')]
len(dir_paths)

100

### **1.3. Parse downloaded pages**

In [5]:
# Function to extract restaurant details from HTML content
def extract_restaurant_details(content):
    
    # Extract the restaurant name
    name = content.find('h1', class_='data-sheet__title').get_text(strip=True) if content.find('h1', class_='data-sheet__title') else ""
    
    # Extract the first row of basic information
    firstRow = content.find_all("div", class_="data-sheet__block--text")[0].get_text(strip=True)
    #firstRow = content.find("div", class_="data-sheet__block--text").get_text(strip=True)
    firstRow_list = [info.strip() for info in firstRow.split(",")]

    address = " ".join(firstRow_list[:-3]) if len(firstRow_list) > 3 else ""
    city = firstRow_list[-3] if len(firstRow_list) > 2 else ""
    postalCode = firstRow_list[-2] if len(firstRow_list) > 1 else ""
    country = firstRow_list[-1] if firstRow_list else ""

    # Extract the second row of basic information
    secondRow = content.find_all("div", class_="data-sheet__block--text")[1].get_text(strip=True)
    #secondRow = content.find("div", class_="data-sheet__block--text").get_text(strip=True)
    secondRow_list = [info.strip() for info in secondRow.split("·")]

    priceRange = secondRow_list[0] if secondRow_list else ""
    cuisineType = secondRow_list[1] if len(secondRow_list) > 1 else ""

    # Extract the description
    description = content.find("div", class_="data-sheet__description").get_text(strip=True) if content.find("div", class_="data-sheet__description") else ""

    # Extract facilities and services
    facilitiesServices_div = content.find_all("div", class_="col col-12 col-lg-6")
    # facilitiesServices_div = content.find("div", class_="col col-12 col-lg-6")
    facilitiesServices = [li.get_text(strip=True) for li in facilitiesServices_div[0].find_all("li")] if facilitiesServices_div else []
    # facilitiesServices = [li.get_text(strip=True) for li in facilitiesServices_div.find("li")] if facilitiesServices_div else []

    # Extract credit card information
    creditCards_div = content.find("div", class_="restaurant-details__services--info")
    creditCards = [os.path.basename(img["data-src"]).split("-")[0] for img in creditCards_div.find_all("img")] if creditCards_div else []

    # Extract phone number
    phoneNumber = content.find("span", attrs={"x-ms-format-detection": "none"}).get_text(strip=True) if content.find("span", attrs={"x-ms-format-detection": "none"}) else ""

    # Extract website
    website_div = content.find("div", class_="collapse__block-item link-item")
    website = website_div.find("a", class_="link js-dtm-link")["href"] if website_div and website_div.find("a", class_="link js-dtm-link") else ""

    # Return the extracted data as a dictionary
    return {
        "restaurantName": name,
        "address": address,
        "city": city,
        "postalCode": postalCode,
        "country": country,
        "priceRange": priceRange,
        "cuisineType": cuisineType,
        "description": description,
        "facilitiesServices": facilitiesServices,
        "creditCards": creditCards,
        "phoneNumber": phoneNumber,
        "website": website
    }

# Collecting data from all HTML files
#folder_paths = [d for d in os.listdir('pages') if os.path.isdir(d) and d.startswith("page_")]
dir_paths = [os.path.join('pages', dir) for dir in os.listdir('pages')]

data = []
for dir in dir_paths:
    for html_file in os.listdir(dir):
        if html_file.endswith(".html"):
            with open(os.path.join(dir, html_file), "r", encoding="utf-8") as file:
                soup = BeautifulSoup(file, "html.parser")
                restaurant_details = extract_restaurant_details(soup)
                data.append(restaurant_details)

# Create a DataFrame from the data list
df = pd.DataFrame(data)

df.columns = ["restaurantName", "address", "city", "postalCode", "country", "priceRange", "cuisineType", "description", "facilitiesServices", "creditCards", "phoneNumber", "website"]


In [9]:
# Display the DataFrame
display(df)

Unnamed: 0,restaurantName,address,city,postalCode,country,priceRange,cuisineType,description,facilitiesServices,creditCards,phoneNumber,website
0,Hydra,via Antonio Mazza 30,Salerno,84121,Italy,€€,"Campanian, Contemporary",Situated in the picturesque historic centre of...,"[Air conditioning, Restaurant offering vegetar...","[amex, dinersclub, mastercard, visa]",+39 089 995 8437,http://www.ristorantehydra.com
1,Gimmi Restaurant,via San Pietro in Lama 23,Lecce,73100,Italy,€€€,Contemporary,Despite its location in a Dominican monastery ...,"[Air conditioning, Terrace, Wheelchair access]","[amex, maestrocard, mastercard, visa]",+39 0832 700920,https://www.chiostrodeidomenicani.it/ristorante/
2,Felix Lo Basso home & restaurant,via Carlo Goldoni 36,Milan,20129,Italy,€€€€,"Italian Contemporary, Creative",Brilliant chef Felix Lo Basso’s menu is inspir...,"[Air conditioning, Counter dining, Wheelchair ...","[amex, mastercard, visa]",+39 02 4540 9759,https://www.felixlobassorestaurant.it/
3,L'Acciuga,via Settevalli 217,Perugia,06128,Italy,€€€,"Contemporary, International",You would never guess that there was a gourmet...,"[Air conditioning, Interesting wine list, Terr...","[amex, unionpay, dinersclub, discover, jcb, ma...",+39 339 263 2591,https://www.lacciuga.net/
4,Antiche Sere,via Cenischia 9,Turin,10139,Italy,€,"Piedmontese, Classic Cuisine",This renowned osteria situated in a district o...,"[Air conditioning, Terrace]","[dinersclub, mastercard, visa]",+39 011 385 4347,
...,...,...,...,...,...,...,...,...,...,...,...,...
1978,Vintage 1997,piazza Solferino 16/h,Turin,10121,Italy,€€€,"Italian, Classic Cuisine",The several tasting menus at this restaurant i...,"[Air conditioning, Interesting wine list, Rest...","[amex, mastercard, visa]",+39 011 535948,https://www.vintage1997.com/
1979,Locanda Margon,via Margone 15,Ravina,38123,Italy,€€€€,"Creative, Contemporary",This restaurant with views of Trento and the A...,"[Air conditioning, Car park, Garden or park, G...","[amex, dinersclub, mastercard, visa]",+39 0461 349401,https://www.locandamargon.it/
1980,Bon Wei,via Castelvetro 16/18,Milan,20154,Italy,€€,"Chinese, Asian",China on a plate! This attractive restaurant w...,"[Air conditioning, Wheelchair access]","[amex, mastercard, visa]",+39 02 341308,https://www.bon-wei.it/
1981,Le Lampare al Fortino,via Tiepolo molo Sant'Antonio,Trani,76125,Italy,€€€,"Mediterranean Cuisine, Modern Cuisine","Built over a medieval church, this old fort th...","[Air conditioning, Great view, Interesting win...","[amex, dinersclub, mastercard, visa]",+39 0883 480308,https://www.lelamparealfortino.it/it/home/


In [5]:
%pip install nltk

Collecting nltk
  Obtaining dependency information for nltk from https://files.pythonhosted.org/packages/4d/66/7d9e26593edda06e8cb531874633f7c2372279c3b0f46235539fe546df8b/nltk-3.9.1-py3-none-any.whl.metadata
  Using cached nltk-3.9.1-py3-none-any.whl.metadata (2.9 kB)
Collecting click (from nltk)
  Obtaining dependency information for click from https://files.pythonhosted.org/packages/00/2e/d53fa4befbf2cfa713304affc7ca780ce4fc1fd8710527771b58311a3229/click-8.1.7-py3-none-any.whl.metadata
  Using cached click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting regex>=2021.8.3 (from nltk)
  Obtaining dependency information for regex>=2021.8.3 from https://files.pythonhosted.org/packages/01/e8/00008ad4ff4be8b1844786ba6636035f7ef926db5686e4c0f98093612add/regex-2024.11.6-cp312-cp312-macosx_10_13_x86_64.whl.metadata
  Downloading regex-2024.11.6-cp312-cp312-macosx_10_13_x86_64.whl.metadata (40 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m40.5/40.5 kB[0m [31m462.1 kB/

# 2  Search Engine

### 2.0 Preprocessing the Text

In [6]:

import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
import string


[nltk_data] Error loading stopwords: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1000)>


In [10]:
stop_words = set(stopwords.words('english'))
stemmer = PorterStemmer()

def preprocess_text(text):
    # Lowercase the text
    text = text.lower()
    # Remove punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))
    # Tokenize and remove stopwords, then apply stemming
    tokens = [stemmer.stem(word) for word in text.split() if word not in stop_words]
    return ' '.join(tokens)

# Apply to the description field
df['processed_description'] = df['description'].apply(preprocess_text)


### 2.1 Conjunctive Query

### 2.1.1 Create the Index!

In [11]:
from collections import defaultdict
import pandas as pd

vocabulary = {}
inverted_index = defaultdict(list)
term_id_counter = 0

for doc_id, description in enumerate(df['processed_description']):
    for word in description.split():
        # Map each unique word to a term_id
        if word not in vocabulary:
            vocabulary[word] = term_id_counter
            term_id_counter += 1
        term_id = vocabulary[word]
        inverted_index[term_id].append(doc_id)

# Save the vocabulary to a CSV file
pd.DataFrame(list(vocabulary.items()), columns=['term', 'term_id']).to_csv('vocabulary.csv', index=False)


In [12]:
import json

with open('inverted_index.json', 'w') as f:
    json.dump(inverted_index, f)


### 2.1.2 Execute the Query

In [14]:
def preprocess_query(query):
    query = query.lower()
    query = query.translate(str.maketrans('', '', string.punctuation))
    tokens = [stemmer.stem(word) for word in query.split() if word not in stop_words]
    return tokens

def conjunctive_query(query):
    query_terms = preprocess_query(query)
    term_ids = [vocabulary.get(term) for term in query_terms if term in vocabulary]

    if not term_ids:
        return pd.DataFrame(columns=["restaurantName", "address", "description", "website"])

    # Start with the document list for the first term, then intersect with others
    matching_docs = set(inverted_index[term_ids[0]])
    for term_id in term_ids[1:]:
        matching_docs &= set(inverted_index[term_id])

    results = df.loc[list(matching_docs), ["restaurantName", "address", "description", "website"]]
    return results



In [16]:
%pip install scikit-learn



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip3 install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [17]:
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(df['processed_description'])


In [18]:
tfidf_index = defaultdict(list)
feature_names = tfidf_vectorizer.get_feature_names_out()

# Loop over each term (feature) in the TF-IDF matrix
for term_id, term in enumerate(feature_names):
    # Get non-zero document indices and the corresponding scores for this term
    doc_indices = tfidf_matrix[:, term_id].nonzero()[0]
    scores = tfidf_matrix[:, term_id].data
    
    # Append each document ID and score to the tfidf_index dictionary 
    for doc_id, score in zip(doc_indices, scores):
        tfidf_index[term].append((doc_id, score))



### 2.2 Ranked Search Engine with TF-IDF and Cosine Similarity

In [19]:
from sklearn.metrics.pairwise import cosine_similarity

def ranked_query(query, top_k=5):
    query_vec = tfidf_vectorizer.transform([preprocess_text(query)])
    cosine_similarities = cosine_similarity(query_vec, tfidf_matrix).flatten()
    top_doc_indices = cosine_similarities.argsort()[-top_k:][::-1]

    results = df.loc[top_doc_indices, ['restaurantName', 'address', 'description', 'website']]
    results['similarity_score'] = cosine_similarities[top_doc_indices]
    return results



### Testing

In [1]:
# Test the conjunctive query
query = "This pleasant, warmly decorated restaurant is ..."
conjunctive_results = conjunctive_query(query)
print(conjunctive_results)
display(conjunctive_results)
# Test the ranked query
ranked_results = ranked_query(query, top_k=5)
print(ranked_results)
display(ranked_results)


NameError: name 'conjunctive_query' is not defined

# 3. Define a New Score!


In [21]:
import heapq
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd

In [22]:
def calculate_cosine_similarity(query, tfidf_vectorizer, tfidf_matrix):
    # Converti la query in un vettore TF-IDF
    query_tfidf = tfidf_vectorizer.transform([query])
    
    # Calcola la similarità coseno tra la query e tutte le descrizioni
    cosine_similarities = cosine_similarity(query_tfidf, tfidf_matrix).flatten()
    
    return cosine_similarities

In [25]:
import heapq
import pandas as pd
def custom_scoring(query, df, tfidf_vectorizer, tfidf_matrix, k=10, facilities_preferences= None, cuisine_preferences=None, price_preferences=None):
    # Definisci i pesi per ciascun attributo in modo uniforme
    DESCRIPTION_WEIGHT = 0.5
    CUISINE_WEIGHT = 0.2
    FACILITIES_WEIGHT = 0.2
    PRICE_WEIGHT = 0.2
     
    # Ottieni le similarità coseno tra la query e le descrizioni
    cosine_similarities = calculate_cosine_similarity(query, tfidf_vectorizer, tfidf_matrix)
    
    # Heap per mantenere i top-k risultati
    top_k_restaurants = []
    
    # Itera su tutti i documenti e calcola il punteggio personalizzato
    for doc_id, cosine_score in enumerate(cosine_similarities):
        # Calcola il punteggio della descrizione (similarità coseno tra query e descrizione)
        description_score = cosine_score
        
        # Incremento per corrispondenza nel `cuisineType`
        cuisine_score = 0
        if 'cuisineType' in df.columns and cuisine_preferences:
            cuisine_score = CUISINE_WEIGHT if any(pref in df.loc[doc_id, 'cuisineType'] for pref in cuisine_preferences) else 0

        # Incremento per corrispondenza nei `facilitiesServices`
        facilities_score = 0
        if 'facilitiesServices' in df.columns and facilities_preferences: 
            facilities_score = FACILITIES_WEIGHT if any(pref in df.loc[doc_id, 'facilitiesServices'] for pref in facilities_preferences) else 0

        # Incremento per `priceRange`
        price_score = 0
        if 'priceRange' in df.columns and price_preferences:
            price_score = PRICE_WEIGHT if df.loc[doc_id, 'priceRange'] in price_preferences else 0
        
        # Calcolo del punteggio finale utilizzando i pesi uniformi
        final_score = (DESCRIPTION_WEIGHT * description_score) + cuisine_score + facilities_score + price_score
        
        # Aggiungi all'heap i top-k risultati
        if len(top_k_restaurants) < k:
            heapq.heappush(top_k_restaurants, (final_score, doc_id))
        else:
            heapq.heappushpop(top_k_restaurants, (final_score, doc_id))
    
    # Ordina i risultati in ordine decrescente di punteggio
    top_k_restaurants = sorted(top_k_restaurants, key=lambda x: x[0], reverse=True)
    
    # Prepare the output
    results = []
    for score, doc_id in top_k_restaurants:
        results.append({
            "restaurantName": df.loc[doc_id, "restaurantName"],
            "address": df.loc[doc_id, "address"],
            "description": df.loc[doc_id, "description"],
            "website": df.loc[doc_id, "website"],
            "custom_score": round(score, 3)
        })
    
    return pd.DataFrame(results)

### Testing 

In [31]:
query = "modern seasonal cusine"
cuisine_preferences = ["Italian"]
service_preferences = ["Terrace", "Air conditioning"] 
price_preferences = ["€", "€€"]
top_k = 5

# Chiama la funzione custom_scoring e visualizza i risultati
results_df = custom_scoring(query, df, tfidf_vectorizer, tfidf_matrix, top_k, cuisine_preferences, price_preferences)
display(results_df)

Unnamed: 0,restaurantName,address,description,website,custom_score
0,La Botte,via Giuseppe Garibaldi 8,A modern and welcoming contemporary bistro sit...,http://www.trattorialabottestresa.it,0.124
1,Castello,via Cagna 4,This restaurant offers several different optio...,https://www.ristorantecastellodisantavittoria.it/,0.12
2,Braunwirt,piazza Chiesa 3,A modern and welcoming restaurant in the heart...,https://www.braunwirt.it/,0.115
3,Vicolo Colombina,vicolo Colombina 5/b,Situated right in the heart of the historic ce...,https://www.vicolocolombina.it/,0.115
4,Agorà,via Rossini 178,Michele Rizzo is the owner-chef of this intere...,https://www.agorarende.com,0.111


# Algorithmic Question (AQ)

In [1]:
def collect_packages(num_tests, test_cases):
    outcomes = []

    for case_index in range(num_tests):
        num_packages, package_coords = test_cases[case_index]

        # Sort packages by coordinates (x, y) to ensure the smallest lexicographical path
        package_coords.sort()

        path_steps = []
        reachable = True
        curr_x, curr_y = 0, 0  # Start at (0, 0)

        for target_x, target_y in package_coords:
            # Check if the package is reachable from the current position
            if target_x < curr_x or target_y < curr_y:
                # If any package requires moving left or down, mark as unreachable
                reachable = False
                break

            # Append necessary moves to reach the target package
            path_steps.append('R' * (target_x - curr_x))  # Move right
            path_steps.append('U' * (target_y - curr_y))  # Move up

            # Update the current position to the target package's coordinates
            curr_x, curr_y = target_x, target_y

        if reachable:
            # If all packages are reachable, append "YES" and the path
            outcomes.append("YES\n" + ''.join(path_steps))
        else:
            # If any package is unreachable, append "NO"
            outcomes.append("NO")

    return outcomes

# Sample input data
num_tests = 3
test_cases = [
    (5, [(1, 3), (1, 2), (3, 3), (5, 5), (4, 3)]),
    (2, [(1, 0), (0, 1)]),
    (1, [(4, 3)])
]

# Execute function and print results
results = collect_packages(num_tests, test_cases)
for result in results:
    print(result)


YES
RUUURRRRUU
NO
YES
RRRRUUU


### Pseudocode for the Algorithm

Given a list of packages located on a grid, where each package is represented by coordinates \((x_i, y_i)\), and a robot starting at \((0, 0)\) that can only move right (`R`) and up (`U`):

1. **Input**:
   - \( t \): the number of test cases.
   - For each test case:
      - \( n \): the number of packages.
      - A list of \( n \) packages with coordinates \((x_i, y_i)\).

2. **Algorithm**:
Start

    Function collect_packages(t, test_cases):
        Results = []   // List to store results for each test case
        
        For each test_case in test_cases:
            (n, packages) = test_case     // Extract number of packages and their coordinates
            Sort packages in ascending order by x, then by y    // Lexicographical sorting

            current_x = 0   // Initial robot position (0, 0)
            current_y = 0
            path = ""    // String to build the path
            possible = True

            For each (x, y) in packages:
                If x < current_x or y < current_y:
                    possible = False   // If the package is unreachable (backward movement)
                    Break the loop

                // Add the necessary moves to reach the package (x, y)
                path = path + "R" * (x - current_x)  // Add right movements
                path = path + "U" * (y - current_y)  // Add upward movements

                // Update the robot's position
                current_x = x
                current_y = y

            If possible:
                Append "YES" and path to Results
            Else:
                Append "NO" to Results

        Return Results   // Return results for all test cases

    // Input reading
    t = input()   // Number of test cases
    test_cases = []   // List to store test cases

    For i = 1 to t:
        n = input()   // Number of packages
        packages = []   // List to store package coordinates

        For j = 1 to n:
            x, y = input()   // Package coordinates
            packages.append((x, y))   // Add the package to the list

        Append (n, packages) to test_cases   // Add the test case to the list

    // Call the function and print the results
    results = collect_packages(t, test_cases)
    For each result in results:
        Print result

End


### Proof of Correctness

1. **Sorting Ensures Lexicographical Order**:
   - Sorting the packages by \( x \)- and then \( y \)-coordinates ensures that we attempt to collect packages in the lexicographically smallest way.
   - If there is an accessible path for all packages after sorting, this path will be the smallest lexicographical path because it minimizes right and upward movements in order.

2. **Reachability Check**:
   - By moving only right or up from each position, we guarantee that any unreachable package (one that would require moving left or down) will be detected and skipped. The algorithm returns "NO" in such cases.

3. **Path Construction**:
   - For each reachable package, the algorithm appends the correct number of `R` and `U` moves, ensuring that the robot reaches each package in the required order.

4. **Conclusion**:
   - The algorithm is correct because it verifies reachability and constructs the smallest lexicographical path if possible.

### Time Complexity Analysis

1. **Sorting the Packages**:
   - Sorting the list of \( n \) packages takes \( O(n \log n) \) time.

2. **Constructing the Path**:
   - The path is constructed by iterating over each package, which takes \( O(n) \) time. For each package, we calculate the difference in coordinates and append the required moves.

3. **Overall Complexity**:
   - The sorting step, \( O(n \log n) \), is the most time-consuming operation in the algorithm, making the overall complexity:
     \[
     O(n \log n)
     \]

### Verification with a Language Model's Analysis

If you asked a language model to evaluate this code, it would likely arrive at the same time complexity, \( O(n \log n) \), because sorting is the dominant step in this approach. If there were any discrepancies, they would likely stem from misunderstanding the nature of appending moves as an \( O(n) \) operation. However, given that sorting \( n \) packages is indeed \( O(n \log n) \) and dominates the time complexity, our analysis is accurate.

### Extending the Problem: Robot Can Move Left or Down

With the new rule allowing movement in all directions (right, left, up, down), we consider a **greedy algorithm** where the robot collects the closest package from its current location.

#### Is the Greedy Algorithm Optimal?

1. **Greedy Approach**:
   - At each step, the robot moves to the closest package, defined by the Euclidean distance or Manhattan distance from the current position.
   - The robot continues selecting the nearest uncollected package until all packages are collected.

2. **Counterexample to Greedy Optimality**:
   - The greedy approach does not guarantee the minimum path length for the following reason:
   - **Example**:
     - Suppose there are packages at \((0, 1)\), \((1, 0)\), and \((2, 2)\), and the robot starts at \((0, 0)\).
     - The greedy approach would choose either \((0, 1)\) or \((1, 0)\) first, then the remaining one, and finally \((2, 2)\).
     - However, the optimal path would be to go directly to \((2, 2)\) first, then collect \((0, 1)\) and \((1, 0)\), which would result in a shorter total path.

3. **Conclusion**:
   - The greedy algorithm is **not optimal** for minimizing the total distance. While it may be intuitive and yield a quick solution, it does not guarantee the shortest path because the closest package does not necessarily contribute to the globally shortest route for all packages.

