

# **Analyzing Archaeological Gifts to the UN**

## This notebook demonstrates a pipeline for analyzing the cultural, political, and historical factors influencing the selection of archaeological artifacts as gifts to the UN.

# **Artifact Search Flow:**

User inputs a search term.

Requests and parses search results from the UN Gifts site.

Extracts the selected artifact's description and prints an LLM-generated explanation of it.
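
The first and last steps of the search flow can be sketched without any network access. This is a minimal sketch under stated assumptions, not the notebook's implementation: `build_search_url` and `pick_result` are hypothetical helper names, and the only detail taken from the notebook is the search URL prefix. It percent-encodes the whole search term (not only spaces) and matches the user's choice against the result titles while ignoring case and surrounding whitespace.

```python
from urllib.parse import quote

# Same search URL prefix the notebook queries
SEARCH_BASE = "https://www.un.org/ungifts/search/node/"


def build_search_url(term: str) -> str:
    """Percent-encode the whole search term and append it to the search URL."""
    return SEARCH_BASE + quote(term.strip(), safe="")


def pick_result(titles: list[str], choice: str) -> int:
    """Return the index of the chosen title, ignoring case and outer whitespace."""
    wanted = choice.strip().casefold()
    for index, title in enumerate(titles):
        if title.strip().casefold() == wanted:
            return index
    raise ValueError(f"{choice!r} is not one of the listed results")


print(build_search_url("goddess of love"))
# https://www.un.org/ungifts/search/node/goddess%20of%20love
```

The index returned by `pick_result` could then be used to look up the matching link, so a small typing difference in the user's answer does not raise an error.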

# **CSV Input Flow:**

Loads the CSV file.

Extracts each artifact's description from its URL and generates a response.

Prints each artifact's name and response.
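
The CSV flow above can be sketched with the standard library alone. This is an illustrative sketch, not the notebook's code: `process_gifts`, `fake_extract`, and `fake_respond` are hypothetical names, the stand-in functions replace the real scraper and LLM call, and the sample URL is made up. Only the column names `Name` and `Link to Museum` come from the notebook.

```python
import csv
import io
from typing import Callable, Iterable


def process_gifts(
    csv_file: Iterable[str],
    extract: Callable[[str], str],
    respond: Callable[[str], str],
    url_column: str = "Link to Museum",
    name_column: str = "Name",
) -> list[tuple[str, str]]:
    """Run extract -> respond for every row and return (name, response) pairs."""
    rows = csv.DictReader(csv_file)
    return [(row[name_column], respond(extract(row[url_column]))) for row in rows]


# Stand-ins for the real scraper and LLM call (assumptions for illustration only)
def fake_extract(url: str) -> str:
    return f"description fetched from {url}"


def fake_respond(description: str) -> str:
    return f"explanation based on: {description}"


sample = io.StringIO("Name,Link to Museum\nAmphora,https://example.org/amphora\n")
for name, response in process_gifts(sample, fake_extract, fake_respond):
    print(f"{name} : {response}")
```

Passing the extractor and the responder as arguments keeps the loop identical regardless of which LLM backend generates the response.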


# Using Google's Gemini to make inferences.

In [None]:
# import serpapi
import spacy
from urllib.request import Request, urlopen
from time import sleep
from bs4 import BeautifulSoup
import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold
import chromadb
from chromadb.db.base import UniqueConstraintError
from chromadb.utils import embedding_functions





GOOGLE_API_KEY = "Insert_your_own_API_Key_here"
SERPAPI_KEY = "Insert_your_own_API_Key_here"

def getGeminiResponse(description):
  client = chromadb.PersistentClient(path="db")  # data stored in 'db' folder
  em = embedding_functions.SentenceTransformerEmbeddingFunction()
  collection = client.create_collection(name='langchain', embedding_function=em)

  genai.configure(api_key=GOOGLE_API_KEY)

  # Create the model
  # See https://ai.google.dev/api/python/google/generativeai/GenerativeModel
  generation_config = {
    "temperature": 1,
    "top_p": 0.95,
    "top_k": 64,
    "max_output_tokens": 8192,
    "response_mime_type": "text/plain",
  }

  model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    generation_config=generation_config,
    safety_settings = {
          HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_ONLY_HIGH, #given we are asking a question about goddess of love,
          #unfortunately, we need to allow some material that is considered sexually explicit (even though it's not really)
      }
    # See https://ai.google.dev/gemini-api/docs/safety-settings
  )

  chat_session = model.start_chat(
    history=[
    ] #currently, no chat history
  )

  question = "Why was this artifact created by past humans?"

  # Store the description so it can be retrieved as context for the question
  collection.add(
      documents=[description],
      ids=['description']
  )

  results = collection.query(
      query_texts=[question],
      n_results=1
  )

  # print(results["documents"][0])

  # results["documents"] holds one list of matches per query, so [0][0] is the top document's text
  response = chat_session.send_message(f'''{results["documents"][0][0]}
  Above is a document explaining a cultural artifact of the past given to the UN. Using mainly your general knowledge, explain "{question}"''')

  # Reuse the existing client to drop the temporary collection
  client.delete_collection("langchain")

  sleep(4)

  return response.text


def extract_info(url):
    try:
        response = urlopen(Request(
                                url=url,
                                headers={'User-Agent': 'Mozilla/5.0'}
                                  )).read()
        webpage = BeautifulSoup(response, 'html.parser')
        results = webpage.find_all("div", class_="panel-panel-inner")
        description = results[1].find_all("div", class_="field-type-text-with-summary")
        cleantext = BeautifulSoup(description[0].text, "lxml").text.replace("\n", "").replace("\xa0", "")
        return cleantext
    except Exception as e:
        return str(e)


SearchOrCSV = input('''Would you like to search for an artifact or input a csv?
Type "Search" or "From CSV"
''')
if SearchOrCSV == "Search":
  from urllib.parse import quote

  # Percent-encode the whole search term, not only the spaces
  artifact = quote(input("What UN artifact would you like to search for? "), safe="")

  request = f"https://www.un.org/ungifts/search/node/{artifact}"

  req = Request(
      url=request,
      headers={'User-Agent': 'Mozilla/5.0'}
  )
  webpage = urlopen(req).read()

  webpage = BeautifulSoup(webpage, 'html.parser')

  results = webpage.find_all("li", class_="search-result")

  search_links = list(map(lambda x: x.find("a"), results))

  search_results = list(map(lambda x: x.contents[0], search_links))

  artifact = input(f'''Which of the following matches what you are looking for?
  {search_results}
                  ''')


  # Look up the link that belongs to the chosen search result
  link = search_links[search_results.index(artifact)]['href']

  description = extract_info(link)

  print(getGeminiResponse(description))

elif SearchOrCSV == "From CSV":
  import pandas as pd

  # Load the CSV file
  filepath = input("What is the filepath of your csv?")
  data = pd.read_csv(filepath)

  # Name of the CSV column that holds each artifact's URL
  url_column_name = 'Link to Museum'

  # Apply the function to each URL in the CSV file
  data['Extracted_Info'] = data[url_column_name].apply(extract_info)

  data['Gemini_Response'] = data['Extracted_Info'].apply(getGeminiResponse)

  # Print each artifact's name with its Gemini response

  for index, row in data.iterrows():
    print(f"\n\n{row['Name']} : {row['Gemini_Response']}\n\n")


Would you like to search for an artifact or input a csv?
Type "Search" or "From CSV"
From CSV
What is the filepath of your csv?/content/drive/MyDrive/AI-ML/List_gifts_for UN - List_gifts_for UN.csv


Amphora : The amphora was created by past humans for several reasons:

* **Storage:** The amphora's shape, with its large belly and narrow neck, was ideal for storing and transporting various commodities like olive oil, wine, and grains. This was crucial for ancient societies that relied on agriculture and trade.
* **Transportation:** Amphorae were durable and easy to handle, making them perfect for transporting goods both within and between communities.
* **Trade:**  The amphora was a key component of ancient trade networks, allowing for the exchange of valuable goods like olive oil and wine. This facilitated economic growth and cultural exchange.
* **Art and Decoration:**  Amphorae were often decorated with intricate designs and paintings, reflecting the artistic skills and cultural valu

# Alternatively, we can use Cohere for the same purpose.
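
Since only the generation call differs between providers, the prompt construction can be shared. The following is a minimal sketch, not the notebook's code: `build_prompt`, `explain_artifact`, and `echo_backend` are hypothetical names, and the fake backend stands in for a real Gemini or Cohere call so the example runs offline. The question and the prompt wording are taken from the notebook.

```python
from typing import Callable

QUESTION = "Why was this artifact created by past humans?"


def build_prompt(description: str, question: str = QUESTION) -> str:
    """Combine the artifact description with the question, as the notebook's prompts do."""
    return (
        f"{description}\n\n"
        f"Using mainly your general knowledge, explain '{question}'"
    )


def explain_artifact(description: str, generate: Callable[[str], str]) -> str:
    """Send the shared prompt to any backend that maps prompt text to response text."""
    return generate(build_prompt(description)).strip()


# A fake backend so the sketch runs offline (assumption for illustration only)
def echo_backend(prompt: str) -> str:
    return f"[{len(prompt)} characters received]"


print(explain_artifact("A ceramic amphora from Cyprus.", echo_backend))
```

With this shape, switching providers means writing one small wrapper function per provider rather than copying the whole cell.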

In [None]:
!pip install cohere

# !pip install beautifulsoup4
!pip install chromadb
!pip install langchain
!pip install -U langchain-community
!pip install sentence_transformers
!pip install serpapi
!pip install spacy
import cohere
import spacy
from urllib.request import Request, urlopen
from time import sleep
from bs4 import BeautifulSoup
import chromadb
from chromadb.db.base import UniqueConstraintError
from chromadb.utils import embedding_functions

COHERE_API_KEY = "Insert_your_own_API_Key_here"
SERPAPI_KEY = "Insert_your_own_API_Key_here"

def getCohereResponse(description):
    client = chromadb.PersistentClient(path="db")  # data stored in 'db' folder
    em = embedding_functions.SentenceTransformerEmbeddingFunction()
    collection = client.create_collection(name='langchain', embedding_function=em)

    co = cohere.Client(COHERE_API_KEY)

    question = "Why was this artifact created by past humans?"

    # Store the description so it can be retrieved as context for the question
    collection.add(
        documents=[description],
        ids=['description']
    )

    results = collection.query(
        query_texts=[question],
        n_results=1
    )

    # results['documents'] holds one list of matches per query, so [0][0] is the top document's text
    response = co.generate(
        model='command-xlarge-nightly',
        prompt=f"{results['documents'][0][0]}\n\nUsing mainly your general knowledge, explain '{question}'",
        max_tokens=300
    )

    # Reuse the existing client to drop the temporary collection
    client.delete_collection("langchain")

    sleep(4)

    return response.generations[0].text

def extract_info(url):
    try:
        response = urlopen(Request(
            url=url,
            headers={'User-Agent': 'Mozilla/5.0'}
        )).read()
        webpage = BeautifulSoup(response, 'html.parser')
        results = webpage.find_all("div", class_="panel-panel-inner")
        description = results[1].find_all("div", class_="field-type-text-with-summary")
        cleantext = BeautifulSoup(description[0].text, "lxml").text.replace("\n", "").replace("\xa0", "")
        return cleantext
    except Exception as e:
        return str(e)

SearchOrCSV = input('''Would you like to search for an artifact or input a csv?
Type "Search" or "From CSV"
''')
if SearchOrCSV == "Search":
    artifact = input("What UN artifact would you like to search for? ").replace(" ", "%20")

    request = f"https://www.un.org/ungifts/search/node/{artifact}"

    req = Request(
        url=request,
        headers={'User-Agent': 'Mozilla/5.0'}
    )
    webpage = urlopen(req).read()

    webpage = BeautifulSoup(webpage, 'html.parser')

    results = webpage.find_all("li", class_="search-result")

    search_links = list(map(lambda x: x.find("a"), results))

    search_results = list(map(lambda x: x.contents[0], search_links))

    artifact = input(f'''Which of the following matches what you are looking for?
  {search_results}
                  ''')

    link = search_links[search_results.index(artifact)]['href']

    description = extract_info(link)

    print(getCohereResponse(description))

elif SearchOrCSV == "From CSV":
    import pandas as pd
    # Load the CSV file
    filepath = input("What is the filepath of your csv?")
    data = pd.read_csv(filepath)

    # Update the column name for URLs based on the inspection
    url_column_name = 'Link to Museum'  # Correct column name for URLs

    # Apply the function to each URL in the CSV file
    data['Extracted_Info'] = data[url_column_name].apply(extract_info)

    data['Cohere_Response'] = data['Extracted_Info'].apply(getCohereResponse)

    # Print each artifact's name with its Cohere response
    for index, row in data.iterrows():
        print(f"\n\n{row['Name']} : {row['Cohere_Response']}\n\n")


Would you like to search for an artifact or input a csv?
Type "Search" or "From CSV"
From CSV
What is the filepath of your csv?/content/drive/MyDrive/AI-ML/List_gifts_for UN - List_gifts_for UN.csv


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.




Amphora : The amphora was created by ancient humans as a functional vessel for storing and transporting various commodities, such as olive oil, wine, grains, and other perishable liquids and dry food stuffs. It was also used for ancient vase painting and illustrations, showcasing the artistic skills of the time. Additionally, olive oil had multiple purposes beyond food, including fuel for lamps, cosmetics, and medicinal ointments. The amphora's specific shape, with its narrow neck and "big-bellied" design, made it ideal for these purposes and facilitated the transport and storage of goods in the ancient Mediterranean world.




Sculptural Relief Depicting the Goddess Ishtar : Past humans likely created this artifact as a representation of Ishtar, the Mesopotamian goddess of love and war, to honor her power and influence. The sculpture may have been created as a form of worship or as a symbol of protection and justice. It could also have been created to inspire fear or respect for the

# Using Hugging Face's transformers with GPT-2:
**(What didn't work quite well)**
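
One likely reason the GPT-2 output is hard to use is that GPT-2 is a base model rather than an instruction-tuned one, and decoder-only models in `transformers` return the prompt tokens followed by the continuation, so decoding the whole sequence repeats the artifact description. A minimal sketch of trimming the prompt, using plain lists in place of tensors; `strip_prompt` is a hypothetical helper and the token ids are made up.

```python
def strip_prompt(output_ids: list[int], prompt_length: int) -> list[int]:
    """Return only the tokens that were generated after the prompt."""
    if prompt_length > len(output_ids):
        raise ValueError("prompt is longer than the generated sequence")
    return output_ids[prompt_length:]


prompt_ids = [101, 102, 103]        # made-up token ids for the prompt
generated = prompt_ids + [7, 8, 9]  # generate() output: prompt followed by new tokens
print(strip_prompt(generated, len(prompt_ids)))  # [7, 8, 9]
```

With real tensors, the same idea would be to slice `outputs[0]` from the length of the prompt's `input_ids` onward before calling `tokenizer.decode`.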

In [None]:
!pip install torch transformers chromadb beautifulsoup4 pandas
!pip install sentence_transformers
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import spacy
from urllib.request import Request, urlopen
from time import sleep
from bs4 import BeautifulSoup
import chromadb
from chromadb.utils import embedding_functions

def getHuggingFaceResponse(description):
    client = chromadb.PersistentClient(path="db")  # data stored in 'db' folder
    em = embedding_functions.SentenceTransformerEmbeddingFunction()
    collection = client.create_collection(name='langchain', embedding_function=em)

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    question = "Why was this artifact created by past humans?"

    # Store the description so it can be retrieved as context for the question
    collection.add(
        documents=[description],
        ids=['description']
    )

    results = collection.query(
        query_texts=[question],
        n_results=1
    )

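    # results['documents'] holds one list of matches per query, so [0][0] is the top document's text
    prompt = f"{results['documents'][0][0]}\n\nUsing mainly your general knowledge, explain '{question}'"

    # Tokenize with an attention mask so generate() does not have to infer it
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=True,
        top_k=50,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token
    )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Reuse the existing client to drop the temporary collection
    client.delete_collection("langchain")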

    sleep(4)

    return response

def extract_info(url):
    try:
        response = urlopen(Request(
            url=url,
            headers={'User-Agent': 'Mozilla/5.0'}
        )).read()
        webpage = BeautifulSoup(response, 'html.parser')
        results = webpage.find_all("div", class_="panel-panel-inner")
        description = results[1].find_all("div", class_="field-type-text-with-summary")
        cleantext = BeautifulSoup(description[0].text, "lxml").text.replace("\n", "").replace("\xa0", "")
        return cleantext
    except Exception as e:
        return str(e)

SearchOrCSV = input('''Would you like to search for an artifact or input a csv?
Type "Search" or "From CSV"
''')
if SearchOrCSV == "Search":
    artifact = input("What UN artifact would you like to search for? ").replace(" ", "%20")

    request = f"https://www.un.org/ungifts/search/node/{artifact}"

    req = Request(
        url=request,
        headers={'User-Agent': 'Mozilla/5.0'}
    )
    webpage = urlopen(req).read()

    webpage = BeautifulSoup(webpage, 'html.parser')

    results = webpage.find_all("li", class_="search-result")

    search_links = list(map(lambda x: x.find("a"), results))

    search_results = list(map(lambda x: x.contents[0], search_links))

    artifact = input(f'''Which of the following matches what you are looking for?
  {search_results}
                  ''')

    link = search_links[search_results.index(artifact)]['href']

    description = extract_info(link)

    print(getHuggingFaceResponse(description))

elif SearchOrCSV == "From CSV":
    import pandas as pd
    # Load the CSV file
    filepath = input("What is the filepath of your csv?")
    data = pd.read_csv(filepath)

    # Update the column name for URLs based on the inspection
    url_column_name = 'Link to Museum'  # Correct column name for URLs

    # Apply the function to each URL in the CSV file
    data['Extracted_Info'] = data[url_column_name].apply(extract_info)

    data['HuggingFace_Response'] = data['Extracted_Info'].apply(getHuggingFaceResponse)

    # Print each artifact's name with its Hugging Face response
    for index, row in data.iterrows():
        print(f"\n\n{row['Name']} : {row['HuggingFace_Response']}\n\n")


Would you like to search for an artifact or input a csv?
Type "Search" or "From CSV"
From CSV
What is the filepath of your csv?/content/drive/MyDrive/AI-ML/List_gifts_for UN - List_gifts_for UN.csv


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Amphora : ['This gift of an Amphora, a vase with a specific characteristic shape, is from Cyprus. This ceramic vase is of white painted ware from the Geometric period ca. 700–600 B.C. \u202fThe vessel has two handles and a narrow neck. The design is commonly ""big bellied"" and narrowed at the base. The ancient Mediterranean world commonly used the amphora for transporting and storing many commodities such as olive oil, grapes, wine, olives, grain, fish and many others perishable liquid and dry food stuffs. Many shape variations were used for ancient vase painting and illustrations. Ancient Cyprus was acknowledged for both its wine and olive oil production. Olive oil was important for food, fuel for household lamps, basic ingredients for bath oils, soaps, perfumes, cosmetics and medicinal ointments. Cypriot oil was imported into Egypt, Syria and other large coastal cities and networks. Today, archaeologists use these ancient vases to identify shipwrecks, the site age, and the volumes

# Other suggested LLMs:

Google's BERT model through TensorFlow Hub (doesn't require API keys for basic usage).

PyTorch Hub models such as RoBERTa and BART.

IBM Watson.