# Urban Air Quality KG: Q&A with Local RAG

This notebook shows how to answer natural-language questions with a local Retrieval-Augmented Generation (RAG) pipeline built on a Neo4j knowledge graph and a locally run LLM.
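The core RAG idea is to retrieve the graph content most similar to the question and hand it to the LLM as context. It can be sketched without Neo4j or a model. This is a toy that assumes a bag-of-words cosine similarity in place of a real embedding model. `NODES`, `embed`, `retrieve` and `build_prompt` are hypothetical names and are not part of `neo4j_local_rag`. The node strings only mimic the style of the retrieved sources printed later in this notebook.

```python
import math
import re
from collections import Counter

# Hypothetical node texts, in the same "name / category" style as the Neo4j sources shown later
NODES = [
    "name: Cars and vans (petrol and diesel) category: MobileSource",
    "name: Windblown dust category: NaturalSource",
    "name: Evaporative emissions (fuel storage and handling) category: AreaSource",
]

def embed(text):
    """Toy 'embedding': lowercase word counts (a stand-in for a sentence-transformer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, docs, k=2):
    """Return the k documents most similar to the question."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(question, context_docs):
    """Place the retrieved context ahead of the question, as a RAG prompt would."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"

question = "Which emissions come from cars and vans?"
print(build_prompt(question, retrieve(question, NODES)))
```

In the real pipeline, the similarity search runs against graph content and the assembled prompt goes to the local Mistral model instead of being printed.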

## 🔧 Environment Setup

Make sure the dependencies are installed (prefix the command with `!` to run it from a notebook cell):
```bash
pip install -r ../requirements.txt
```
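To confirm that the installation succeeded before running the rest of the notebook, you can check which modules are importable. This is a minimal sketch. The module names in `REQUIRED` are assumptions about what `../requirements.txt` contains (for example, `llama_cpp` is the import name of the `llama-cpp-python` package), so adjust the list to match the actual file.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Hypothetical list: edit to mirror ../requirements.txt
REQUIRED = ["neo4j", "langchain", "llama_cpp", "sentence_transformers"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```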

In [1]:
import sys
from pathlib import Path

# Add the src directory to the Python path so the local module can be imported
sys.path.append(str(Path("../src").resolve()))

# Import the helper functions from the local RAG module
from neo4j_local_rag import setup_graph, setup_llm, setup_rag_pipeline, query_kg

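Re-running the import cell appends `../src` to `sys.path` again each time. A small helper can avoid the duplicates. This is a hypothetical sketch, not part of the original module. It prepends the directory instead of appending it so that the local `neo4j_local_rag` takes precedence over any installed package with the same name.

```python
import sys
from pathlib import Path

def add_to_sys_path(directory):
    """Resolve `directory` and prepend it to sys.path only if it is not already there."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved

src_dir = add_to_sys_path("../src")
```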


## Local Large Language Model (LLM) Used for the Q&A

The LLM used here is `mistral-7b-instruct-v0.2.Q4_K_M.gguf`, a Q4_K_M-quantized GGUF build of Mistral-7B-Instruct-v0.2.

If the model is already on your machine, set `model_path` in the setup cell to its location. Otherwise, use `download_mistral_model()` below to fetch it. The function prints the path where the model is saved and returns it.
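To avoid editing a hard-coded path in the notebook, you could read the model location from an environment variable and fall back to a default. This is a sketch under an assumption: the variable name `MISTRAL_GGUF_PATH` and the helper `resolve_model_path` are hypothetical and are not used by `neo4j_local_rag`.

```python
import os
from pathlib import Path

def resolve_model_path(default_path, env_var="MISTRAL_GGUF_PATH"):
    """Prefer the path in `env_var` if it is set, otherwise use `default_path`.

    Raises FileNotFoundError if the chosen file does not exist.
    """
    candidate = Path(os.environ.get(env_var, default_path)).expanduser()
    if not candidate.is_file():
        raise FileNotFoundError(
            f"GGUF model not found at {candidate}; set {env_var} or download the model first."
        )
    return str(candidate)
```

Usage: `model_path = resolve_model_path("models/mistral-7b-instruct-v0.2.Q4_K_M.gguf")`.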

In [None]:
# Only use this if you do not already have the model stored locally

from pathlib import Path

import requests

def download_mistral_model(model_url=None, save_dir="models"):
    """Download a GGUF model (Mistral-7B-Instruct-v0.2, Q4_K_M by default) and return its local path."""
    # Default to the Q4_K_M build hosted by TheBloke on Hugging Face
    if model_url is None:
        model_url = (
            "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/"
            "resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf"
        )

    # Derive the file name from the last segment of the URL
    model_name = model_url.split("/")[-1]

    # Create the save directory if it does not exist
    save_path = Path(save_dir)
    save_path.mkdir(parents=True, exist_ok=True)
    model_path = save_path / model_name

    if model_path.exists():
        print(f"✅ Model already exists at: {model_path}")
        return str(model_path)

    print(f"⬇️ Downloading the model from {model_url} ...")
    # Stream into a ".part" file so an interrupted download is never mistaken for a complete model
    tmp_path = model_path.with_name(model_path.name + ".part")
    with requests.get(model_url, stream=True, timeout=60) as response:
        if response.status_code != 200:
            raise RuntimeError(f"❌ Failed to download model. HTTP status: {response.status_code}")
        with open(tmp_path, "wb") as file:
            for chunk in response.iter_content(chunk_size=8192):
                file.write(chunk)
    tmp_path.rename(model_path)
    print(f"✅ Model downloaded and saved at: {model_path}")
    return str(model_path)


# Run the download (skipped if the file is already present); the path is printed above
model_path = download_mistral_model()
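A failed or truncated download (for example, an HTML error page saved under the `.gguf` name) produces a file that llama.cpp cannot load. According to the GGUF format specification, every GGUF file begins with the four ASCII bytes `GGUF`, which makes a cheap sanity check possible before loading. `looks_like_gguf` is a hypothetical helper. It only inspects the header and does not validate the rest of the file.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file

def looks_like_gguf(path):
    """Return True if `path` is a file whose first four bytes are the GGUF magic."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size < len(GGUF_MAGIC):
        return False
    with p.open("rb") as f:
        return f.read(len(GGUF_MAGIC)) == GGUF_MAGIC
```

Usage: call `looks_like_gguf(model_path)` before passing the path to `setup_llm`.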

In [3]:
# Connect to the Neo4j graph (prompts securely for the password)
graph = setup_graph()

# Path to the local GGUF model: adjust as needed, or reuse the path returned by download_mistral_model()
model_path = "/Users/nxx20/Library/Application Support/nomic.ai/GPT4All/mistral-7b-instruct-v0.2.Q4_K_M.gguf"

# Load the local LLM
llm = setup_llm(model_path)

# Build the RAG pipeline from the graph and the LLM
qa_chain = setup_rag_pipeline(graph, llm)

Enter Neo4j password explicitly (leave empty for default):  ········


llama_model_load_from_file_impl: using device Metal (Apple M3 Pro) - 11255 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from /Users/nxx20/Library/Application Support/nomic.ai/GPT4All/mistral-7b-instruct-v0.2.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.2
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
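`setup_graph()` asks for the Neo4j password interactively, which blocks unattended runs such as scheduled notebook executions. One option is to read the password from an environment variable and prompt only when it is unset. This is a hypothetical sketch: `neo4j_password` and the `NEO4J_PASSWORD` variable name are assumptions, and how the value would reach `setup_graph()` depends on `neo4j_local_rag`, whose signature is not shown here.

```python
import os
from getpass import getpass

def neo4j_password(env_var="NEO4J_PASSWORD", prompt="Neo4j password: "):
    """Return the password from `env_var`, falling back to a hidden interactive prompt."""
    value = os.environ.get(env_var)
    return value if value is not None else getpass(prompt)
```

Usage: `password = neo4j_password()`.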

## ❓ Natural-Language Q&A Example

Use the local RAG pipeline to answer questions about urban air quality.
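If you want the answer and the retrieved documents as values rather than printed text, a thin wrapper around the chain can return them. This is a hypothetical helper (`ask` is not part of `neo4j_local_rag`). It assumes that `qa_chain` behaves like LangChain's `RetrievalQA`: it takes `{"query": ...}` and returns a dict with `"result"` and, when configured, `"source_documents"`. Recent LangChain releases deprecate calling a chain directly as `chain({...})` in favour of `chain.invoke({...})`, so the helper uses `.invoke` when it exists.

```python
def ask(chain, question):
    """Send one question to a RetrievalQA-style chain and return (answer, sources).

    Assumption: the chain accepts {"query": ...} and returns a dict containing
    "result" and, optionally, "source_documents".
    """
    invoke = getattr(chain, "invoke", None)
    if callable(invoke):
        output = invoke({"query": question})  # current LangChain interface
    else:
        output = chain({"query": question})   # legacy call style for older chains
    return output["result"].strip(), output.get("source_documents", [])
```

Usage: `answer, sources = ask(qa_chain, "What pollutants primarily come from vehicles?")`.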

In [5]:
questions = [
    "What pollutants primarily come from vehicles?",
    "How does wind speed influence pollutant dispersion?",
    "What mitigation measures effectively reduce NOx emissions?"
]

for question in questions:
    print("\n" + "="*80)
    query_kg(qa_chain, question)




llama_perf_context_print:        load time =    7375.54 ms
llama_perf_context_print: prompt eval time =    7375.26 ms /   150 tokens (   49.17 ms per token,    20.34 tokens per second)
llama_perf_context_print:        eval time =    2669.98 ms /    64 runs   (   41.72 ms per token,    23.97 tokens per second)
llama_perf_context_print:       total time =   10058.95 ms /   214 tokens



🔍 Question explicitly asked: What pollutants primarily come from vehicles?

📖 Answer from local LLM:
  The primary pollutants that come from vehicles are exhaust gases, primarily carbon monoxide (CO), nitrogen oxides (NOx), and particulate matter (PM). Additionally, evaporative emissions from fuel storage and handling can contribute to volatile organic compounds (VOCs) in the air.



Llama.generate: 48 prefix-match hit, remaining 94 prompt tokens to eval
llama_perf_context_print:        load time =    7375.54 ms
llama_perf_context_print: prompt eval time =    3235.32 ms /    94 tokens (   34.42 ms per token,    29.05 tokens per second)
llama_perf_context_print:        eval time =    4133.89 ms /    90 runs   (   45.93 ms per token,    21.77 tokens per second)
llama_perf_context_print:       total time =    7388.16 ms /   184 tokens



🔍 Question explicitly asked: How does wind speed influence pollutant dispersion?

📖 Answer from local LLM:
  Wind speed plays a significant role in pollutant dispersion. Faster wind speeds can lead to more efficient mixing and dilution of pollutants, resulting in reduced concentrations near the source. However, very strong winds can also transport pollutants over longer distances, potentially impacting areas far from the source. Overall, understanding the relationship between wind speed and pollutant dispersion is crucial for developing effective strategies to mitigate air pollution.



Llama.generate: 48 prefix-match hit, remaining 109 prompt tokens to eval
llama_perf_context_print:        load time =    7375.54 ms
llama_perf_context_print: prompt eval time =    5554.50 ms /   109 tokens (   50.96 ms per token,    19.62 tokens per second)
llama_perf_context_print:        eval time =   22383.38 ms /   511 runs   (   43.80 ms per token,    22.83 tokens per second)
llama_perf_context_print:       total time =   28121.38 ms /   620 tokens



🔍 Question explicitly asked: What mitigation measures effectively reduce NOx emissions?

📖 Answer from local LLM:
  Some effective measures for reducing NOx emissions include:
1. Selective Catalytic Reduction (SCR) systems, which use a catalyst to convert NOx into harmless nitrogen and water.
2. Lean-burn engines, which operate at lean air-fuel ratios, reducing the amount of excess fuel that can contribute to NOx emissions.
3. Exhaust Gas Recirculation (EGRR) systems, which recirculate a portion of the exhaust gas back into the engine intake, reducing the amount of fresh oxygen entering the engine and thus reducing the potential for NOx emissions.
4. Use of low-NOx fuels, such as natural gas or biodiesel, which produce lower amounts of NOx when burned compared to conventional diesel fuel.
5. Implementation of engine idling reduction strategies, such as use of auxiliary power units (APUs) or implementation of predictive maintenance schedules that minimize the need for prolonged idling 
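Each `llama_perf_context_print` line reports an elapsed time in milliseconds and a token count ("runs" counts generated tokens). Throughput is therefore tokens / (ms / 1000). For example, the first prompt evaluation above processed 150 tokens in 7375.26 ms, and 150 / 7.37526 ≈ 20.34 tokens per second, which matches the logged value. The parser below is a hypothetical sketch that assumes the log format shown in this notebook.

```python
import re

# Matches e.g. "prompt eval time =    7375.26 ms /   150 tokens" or "eval time = ... /    64 runs"
PERF_RE = re.compile(r"([\w ]+?) time =\s*([\d.]+) ms /\s*(\d+) (tokens|runs)")

def tokens_per_second(line):
    """Parse one llama_perf_context_print line and return (label, tokens per second)."""
    m = PERF_RE.search(line)
    if m is None:
        raise ValueError(f"not a timed token line: {line!r}")
    label, ms, count = m.group(1).strip(), float(m.group(2)), int(m.group(3))
    return label, count / (ms / 1000.0)
```

Lines without a token count, such as `load time`, raise `ValueError` instead of returning a misleading rate.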

In [7]:
# Same questions as above, this time also printing the sources retrieved from Neo4j
questions = [
    "What pollutants primarily come from vehicles?",
    "How does wind speed influence pollutant dispersion?",
    "What mitigation measures effectively reduce NOx emissions?"
]

for question in questions:
    print("\n" + "="*80)
    query_kg(qa_chain, question, source_shown=1)




Llama.generate: 48 prefix-match hit, remaining 102 prompt tokens to eval
llama_perf_context_print:        load time =    7375.54 ms
llama_perf_context_print: prompt eval time =    4967.11 ms /   102 tokens (   48.70 ms per token,    20.54 tokens per second)
llama_perf_context_print:        eval time =    2667.13 ms /    64 runs   (   41.67 ms per token,    24.00 tokens per second)
llama_perf_context_print:       total time =    7647.56 ms /   166 tokens



🔍 Question explicitly asked: What pollutants primarily come from vehicles?

📖 Answer from local LLM:
  The primary pollutants that come from vehicles are exhaust gases, primarily carbon monoxide (CO), nitrogen oxides (NOx), and particulate matter (PM). Additionally, evaporative emissions from fuel storage and handling can contribute to volatile organic compounds (VOCs) in the air.

📌 Explicitly retrieved sources from Neo4j:
- 
name: Cars and vans (petrol and diesel)
category: MobileSource
- 
name: Vehicle exhaust
category: Uncategorized
- 
name: Evaporative emissions (fuel storage and handling)
category: AreaSource
- 
name: Biogenic emissions (vegetation)
category: NaturalSource
- 
name: Gasoline vehicles
category: MobileSource



Llama.generate: 48 prefix-match hit, remaining 94 prompt tokens to eval
llama_perf_context_print:        load time =    7375.54 ms
llama_perf_context_print: prompt eval time =    3524.11 ms /    94 tokens (   37.49 ms per token,    26.67 tokens per second)
llama_perf_context_print:        eval time =    3879.08 ms /    85 runs   (   45.64 ms per token,    21.91 tokens per second)
llama_perf_context_print:       total time =    7423.51 ms /   179 tokens



🔍 Question explicitly asked: How does wind speed influence pollutant dispersion?

📖 Answer from local LLM:
  Wind speed plays a significant role in pollutant dispersion. Faster wind speeds can lead to more efficient mixing and dilution of pollutants, resulting in reduced concentrations near the source. Conversely, slower wind speeds can result in less efficient mixing and dilution, leading to higher concentrations near the source. Overall, wind speed is an important factor influencing pollutant dispersion and air quality.

📌 Explicitly retrieved sources from Neo4j:
- 
name: Windblown dust
category: NaturalSource
- 
name: Biogenic emissions (vegetation)
category: NaturalSource
- 
name: Dust
category: NaturalSource
- 
name: Evaporative emissions (fuel storage and handling)
category: AreaSource
- 
name: Aircraft emissions
category: MobileSource



Llama.generate: 48 prefix-match hit, remaining 109 prompt tokens to eval
llama_perf_context_print:        load time =    7375.54 ms
llama_perf_context_print: prompt eval time =    5478.87 ms /   109 tokens (   50.26 ms per token,    19.89 tokens per second)
llama_perf_context_print:        eval time =   11645.85 ms /   273 runs   (   42.66 ms per token,    23.44 tokens per second)
llama_perf_context_print:       total time =   17200.26 ms /   382 tokens



🔍 Question explicitly asked: What mitigation measures effectively reduce NOx emissions?

📖 Answer from local LLM:
  Some effective measures to reduce NOx emissions include:
1. Selective Catalytic Reduction (SCR) technology, which uses a catalyst to convert NOx into harmless nitrogen and water.
2. Lean-burn engines, which operate at lean air-fuel ratios, reducing the amount of fuel that is burned and therefore reducing NOx emissions.
3. Exhaust Gas Recirculation (EGRR) systems, which recirculate a portion of the exhaust gas back into the engine intake, reducing the amount of fresh air that enters the engine and therefore reducing NOx emissions.
4. Use of low-NOx fuels, such as natural gas or biodiesel, which produce lower amounts of NOx when burned compared to conventional diesel fuel.
5. Implementation of emission standards and regulations, such as the European Union's (EU) Euro 6 emissions standard for new passenger cars and light commercial vehicles, which set strict limits on the a
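The retrieved sources above print as `- ` followed by a line break, apparently because each node's text begins with a newline. A small formatter can collapse each node into one readable line. This is a hypothetical sketch: `Doc` is a minimal stand-in for a LangChain `Document`, and only its `page_content` attribute is used.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (only page_content is used)."""
    page_content: str

def format_source(doc):
    """Join the non-empty 'key: value' lines of a node's text into one line."""
    fields = [line.strip() for line in doc.page_content.splitlines() if line.strip()]
    return "; ".join(fields)

def format_sources(docs, limit=None):
    """Render up to `limit` documents (all of them if None) as a bullet list."""
    shown = docs if limit is None else docs[:limit]
    return "\n".join(f"- {format_source(d)}" for d in shown)
```

For example, a node whose text is `"\nname: Windblown dust\ncategory: NaturalSource"` becomes `- name: Windblown dust; category: NaturalSource`.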