
In [None]:
!pip install llama_index
!pip install llama_parse




In [None]:
import nest_asyncio
import json
import os
from google.colab import userdata
from llama_index.llms.openai import OpenAI
from llama_index.core import VectorStoreIndex
from llama_parse import LlamaParse
from llama_index.core.node_parser import MarkdownElementNodeParser
from IPython.display import Markdown, display

nest_asyncio.apply()

# Set the API keys
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
llm_gpt4o_mini = OpenAI(model="gpt-4o-mini")

# Parser for the CSV file of material emissions
parser = LlamaParse(
    api_key=userdata.get('LLAMA_CLOUD'),
    result_type="markdown",
)

# Load the CO2 emissions dataset
co2_file_path = "/content/sample_data/materials_co2_emission_gpt.csv"
documents = parser.load_data(co2_file_path)
node_parser = MarkdownElementNodeParser(llm=llm_gpt4o_mini, num_workers=4)
nodes = node_parser.get_nodes_from_documents(documents)
base_nodes, objects = node_parser.get_nodes_and_objects(nodes)
recursive_index = VectorStoreIndex(nodes=base_nodes + objects, llm=llm_gpt4o_mini)
recursive_query_engine = recursive_index.as_query_engine(similarity_top_k=5, llm=llm_gpt4o_mini)

# Load the electronics products dataset
product_file_path = "/content/sample_data/meta_Electronics_SMALL.jsonl"
products = []

with open(product_file_path, "r", encoding="utf-8") as file:
    for line in file:
        product = json.loads(line)
        title = product.get("title", "Unknown Product")
        material = product.get("details", {}).get("Material", None)
        if material:
            products.append({"title": title, "material": material})

# Build and run a query for each product
for product in products[:10]:  # Limit to 10 products for testing
    query = f"How much CO2 is emitted to produce a {product['title']} made from {product['material']}?"
    response = recursive_query_engine.query(query)
    print(f"\n---------------------- CO2 Emission for {product['title']} ----------------------")
    display(Markdown(f"{response}"))


Started parsing the file under job_id 24542e0a-a101-4b5a-bffb-ab9917ed1721





---------------------- CO2 Emission for Digi-Tatoo Decal Skin Compatible With MacBook Pro 13 inch (Model A2338/ A2289/ A2251) - Protective and Decorative Full Body Laptop Skin Decal Sticker, Anti-Scratch Vinly Skin Sticker Wrap [Fresh Marble] ----------------------


The carbon dioxide emissions associated with vinyl are not specified in the provided information. Therefore, it is not possible to determine the CO2 emissions for producing the Digi-Tatoo Decal Skin made from vinyl.


---------------------- CO2 Emission for eForCity Leather Case with Stand for 7-Inch Samsung Galaxy Tab 2, White/Black Zebra (PSAMGLXTLC26) ----------------------


The CO2 emissions for producing a synthetic leather item, such as the eForCity Leather Case, would be associated with the material used. In this case, synthetic rubber, which has a CO2 emission of 3.5 kg per unit, could be a relevant comparison for synthetic leather. Therefore, the CO2 emissions for producing the case would be approximately 3.5 kg.


---------------------- CO2 Emission for Outer Space Planets Stickers(50Pcs),Planetary Systems Waterproof Vinyl Water Bottle Stickers for Teens Girls Adults Kids,Laptop Car Cup Computer Guitar Skateboard Luggage Phone Aesthetic Stickers Pack ----------------------


The carbon dioxide emissions associated with producing vinyl are not specified in the provided information. Therefore, it is not possible to determine the CO2 emissions for the Outer Space Planets Stickers made from vinyl based on the available data.


---------------------- CO2 Emission for Stanton STR8-100 Direct-Drive Digital Turntable with Straight Tone Arm (Discontinued by Manufacturer) ----------------------


The production of a Stanton STR8-100 Direct-Drive Digital Turntable made from aluminum emits 8.6 kilograms of CO2.


---------------------- CO2 Emission for luo Cherry MX RGB Switch Mute Pink Switch Mute Gray Switch Genuine German Cherry Switch (10PCS, Cherry RGB Mute Gray Switch) ----------------------


The carbon dioxide emissions for producing a Cherry material are not specified in the provided information. Therefore, it is not possible to determine the CO2 emissions for the Cherry MX RGB Switch made from Cherry.


---------------------- CO2 Emission for VFENG Premium 4-in-1 3M Full Body Skin Sticker Decals, Decorative Protective Skin for Microsoft Surface Book 3 13.5 Inch 13'' 2020+ (Core i5 Version) - Gray ----------------------


The CO2 emissions associated with producing a product made from Polyester is 6.0 kilograms per unit.


---------------------- CO2 Emission for IU3D Low Noise DC 24V 0.12A 4010 Brushless Fan Cooler 40mm x 40mm x 10mm Cooling Fan. (2PCS -24V) ----------------------


The CO2 emissions for producing plastic materials vary depending on the specific type of plastic used. Common plastics and their associated emissions include:

- Polycarbonate (PC): 5.2 kg
- Polyethylene (PE): 2.0 kg
- Polypropylene (PP): 1.8 kg
- Polystyrene (PS): 3.2 kg
- Polyethylene terephthalate (PET): 4.0 kg
- Polyvinyl chloride (PVC): 2.4 kg
- Acrylonitrile butadiene styrene (ABS): 4.2 kg
- Nylon: 5.4 kg
- Polymethylmethacrylate (PMMA): 6.0 kg
- Thermoplastic elastomers (TPE): 4.5 kg

To determine the total CO2 emissions for the fan, the specific type of plastic used in its construction would be needed, as well as the quantity of plastic in the fan. If you can provide that information, I can help calculate the total emissions.


---------------------- CO2 Emission for Cable Raceways - Floor Cord Protector Cable Shield Cord Cover PVC Floor Wire Protect ----------------------


The CO2 emissions associated with producing a product made from Polyvinyl chloride (PVC) is 2.4 kilograms per unit.


---------------------- CO2 Emission for Nurse Stickers for Water Bottles and Laptop, Nursing Stickers for Nurse Students, Nurses, and Healthcare Workers, Waterproof, Reusable, No Residue Vinyl Decal Perfect Nurse Gift (108 Designs) ----------------------


The carbon dioxide emissions associated with vinyl are not specified in the provided information. Therefore, it is not possible to determine the CO2 emissions for producing the Nurse Stickers made from vinyl based on the available data.


---------------------- CO2 Emission for AUGUR Travel Laptop Backpack ----------------------


The CO2 emissions associated with producing an AUGUR Travel Laptop Backpack made from Oxford are not specified in the provided information. Therefore, it is not possible to determine the CO2 emissions for that specific material.
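
Several answers above come back empty because the retriever finds no row for "Vinyl", while others report a per-unit factor as if it were the product's total footprint. The following is a minimal sketch, not part of the original notebook, of a deterministic fallback: it looks up the factors quoted in the output above and scales them by product mass. The names `EMISSION_FACTORS`, `MATERIAL_ALIASES`, `normalize_material`, and `estimate_emission` are hypothetical. Treating the figures as kg CO2 per kg of material, and mapping "vinyl" to PVC, are assumptions that should be checked against the CSV.

```python
from typing import Optional

# Factors copied from the query output above. Reading them as
# kg CO2 per kg of material is an ASSUMPTION; the output only says "per unit".
EMISSION_FACTORS = {
    "polyvinyl chloride (pvc)": 2.4,
    "polycarbonate (pc)": 5.2,
    "acrylonitrile butadiene styrene (abs)": 4.2,
    "polyester": 6.0,
    "aluminum": 8.6,
}

# Hypothetical alias table: maps the short names found in product
# metadata to the longer names used as keys in EMISSION_FACTORS.
MATERIAL_ALIASES = {
    "vinyl": "polyvinyl chloride (pvc)",  # assumption: vinyl means PVC here
    "pvc": "polyvinyl chloride (pvc)",
    "abs": "acrylonitrile butadiene styrene (abs)",
    "polycarbonate": "polycarbonate (pc)",
    "aluminium": "aluminum",
}


def normalize_material(material: str) -> str:
    """Lower-case and trim a material name, then resolve known aliases."""
    key = material.strip().lower()
    return MATERIAL_ALIASES.get(key, key)


def estimate_emission(material: str, mass_kg: float) -> Optional[float]:
    """Return factor * mass in kg CO2, or None when no factor is known."""
    factor = EMISSION_FACTORS.get(normalize_material(material))
    if factor is None:
        return None
    return factor * mass_kg


# 0.05 kg is an illustrative mass, not a value taken from the dataset.
print(estimate_emission("Vinyl", 0.05))
print(estimate_emission("Oxford", 0.4))  # None: no factor listed for Oxford
```

A lookup like this could run before the query engine, so the LLM is only consulted for materials that have no direct match. Returning `None` rather than a guessed value keeps unknown materials such as "Oxford" or "Cherry" visible instead of silently substituting a neighbor.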

In [None]:
import nest_asyncio
import json
import os
from google.colab import userdata
from llama_index.llms.openai import OpenAI
from llama_index.core import VectorStoreIndex
from llama_parse import LlamaParse
from llama_index.core.node_parser import MarkdownElementNodeParser

nest_asyncio.apply()

# Set the API keys
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
llm_gpt4o_mini = OpenAI(model="gpt-4o-mini")

# Parser for the CSV file of material emissions
parser = LlamaParse(
    api_key=userdata.get('LLAMA_CLOUD'),
    result_type="markdown",
)

# Load the CO2 emissions dataset
co2_file_path = "/content/sample_data/materials_co2_emission_gpt.csv"
documents = parser.load_data(co2_file_path)
node_parser = MarkdownElementNodeParser(llm=llm_gpt4o_mini, num_workers=4)
nodes = node_parser.get_nodes_from_documents(documents)
base_nodes, objects = node_parser.get_nodes_and_objects(nodes)
recursive_index = VectorStoreIndex(nodes=base_nodes + objects, llm=llm_gpt4o_mini)
recursive_query_engine = recursive_index.as_query_engine(similarity_top_k=5, llm=llm_gpt4o_mini)

# Typical materials for some product categories
common_materials = {
    "fan": "Polycarbonate (PC)",
    "cable": "Polyvinyl chloride (PVC)",
    "case": "Acrylonitrile butadiene styrene (ABS)",
    "battery": "Lithium-ion",
    "screen": "Glass"
}

# Load the electronics products dataset
product_file_path = "/content/sample_data/meta_Electronics_SMALL.jsonl"
products = []

with open(product_file_path, "r") as file:
    for line in file:
        product = json.loads(line)
        title = product.get("title", "Unknown Product")
        material = product.get("details", {}).get("Material", None)

        # If no material is specified, fall back to the typical material
        # for the first category keyword found in the title
        if not material:
            for key, common_material in common_materials.items():
                if key in title.lower():
                    material = common_material
                    break

        if material:
            products.append({"title": title, "material": material})

# Build and run a query for each product
results = []
for product in products[:10]:  # Limit to 10 products for testing
    query = f"How much CO2 is emitted to produce a {product['title']} made from {product['material']}, give me only value number with measurement unit else 'no value'?"
    response = recursive_query_engine.query(query)

    results.append({
        "product": product["title"],
        "material": product["material"],
        "co2_emission": str(response)  # Convert to a string so json.dumps can serialize it
    })

# Print the results as JSON
print(json.dumps(results, indent=4))


Started parsing the file under job_id 961aebc0-13da-4d37-bfa9-75b1b0faf6c6




[
    {
        "product": "Digi-Tatoo Decal Skin Compatible With MacBook Pro 13 inch (Model A2338/ A2289/ A2251) - Protective and Decorative Full Body Laptop Skin Decal Sticker, Anti-Scratch Vinly Skin Sticker Wrap [Fresh Marble]",
        "material": "Vinyl",
        "co2_emission": "no value"
    },
    {
        "product": "MOSISO Plastic Hard Shell Case & Keyboard Cover Compatible MacBook Air 11 Inch (Models: A1370 & A1465), Transparent Black",
        "material": "Acrylonitrile butadiene styrene (ABS)",
        "co2_emission": "4.2 kg"
    },
    {
        "product": "Fishfinder, Depth Finder Poly Sun Cover for 3\" - 4\" Models - Protects Your Screen from Sun / Weather Damage",
        "material": "Glass",
        "co2_emission": "1.2 kg"
    },
    {
        "product": "Dock Audio Extender Adapter Converter Cable for iPod iPhone 4 4S iPhone 5 5S iPad, Samsung & Other Smart Phones",
        "material": "Polyvinyl chloride (PVC)",
        "co2_emission": "2.4 kg"
    },
    {
    