## Pull features from Mark's feature-engineered data (aka setupdataset_mod.ipynb)

Source file: `./resources/processed/processed_for_kg_v2.pkl`

The goal is to populate the knowledge graph (KG) with those features.

In [1]:
import getpass
import os
import pickle

## Battery Dataset Features

This dataset contains time-series data related to battery degradation. Each battery is identified by a unique `battery_id`.
The dataset is divided into:
- `b1` (train set)
- `b2` (validation set)
- `b3` (test set, not included)
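
The split can be recovered from the ID prefix alone. The following is a minimal sketch, assuming every ID follows the `b<batch>c<cell>` pattern seen in `b1c1`; the `split_by_batch` helper and the `SPLIT_NAMES` mapping are illustrative names, not part of the dataset.

```python
from collections import defaultdict

# Mapping taken from the list above; b3 (test) is not included in this dataset.
SPLIT_NAMES = {"b1": "train", "b2": "validation", "b3": "test"}

def split_by_batch(battery_ids):
    """Group battery IDs by split, using the batch prefix before the 'c'."""
    splits = defaultdict(list)
    for battery_id in battery_ids:
        batch = battery_id.split("c", 1)[0]  # "b1c11" -> "b1"
        splits[SPLIT_NAMES.get(batch, "unknown")].append(battery_id)
    return dict(splits)

splits = split_by_batch(["b1c1", "b1c3", "b2c0", "b2c11"])
print(splits)
```

On the loaded data this would be called as `split_by_batch(battery_data.keys())`.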

Each battery ID (e.g., `b1c1`) contains the following features:

Attributes:
    
    - cycle (int): 
        The total number of charge-discharge cycles completed by the battery.
        Example: 2161

    - charging_policy (str): 
        The charging protocol applied to the battery.
        Example: "3.6C(80%)-3.6C"

    - q_d_n (list[float]): 
        List of normalized discharge capacity values over cycles.
        Example: [1.0499999523162842, ..., 1.075301170349121, 0.0, 0.0]

    - trimmed_q_d_n (list[float]): 
        A trimmed version of `q_d_n`, typically containing the relevant capacity data after preprocessing.
        Example: [1.0499999523162842, ..., 0.8800023198127747]

    - slope_all_cycles (float): 
        Slope of discharge capacity over all cycles.
        Example: -7.866618810898173e-05

    - slope_last_{N}_cycles (float): 
        Slope of discharge capacity over the last N cycles, where N can be 
        10, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000.
        Example: slope_last_100_cycles = -0.0003626042604446411

    - mean_grad_last_{N}_cycles (numpy.float64): 
        Mean gradient of discharge capacity over the last N cycles, where N can be 
        10, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000.
        Example: mean_grad_last_100_cycles = -0.00036493629217147826
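
The notebook that produced these features is not shown here, so the exact formulas are unknown. As a hedged sketch of how the two windowed features could be computed, the code below assumes the slope is the coefficient of a least-squares line fit (`np.polyfit`) and the mean gradient is the average of `np.gradient` over the window. The `tail_features` helper and the synthetic curve are hypothetical.

```python
import numpy as np

def tail_features(q_d_n, n):
    """Return (slope, mean gradient) over the last n points of a capacity curve."""
    tail = np.asarray(q_d_n[-n:], dtype=float)
    cycles = np.arange(len(tail))
    # Assumption: slope = coefficient of a least-squares straight-line fit.
    slope = float(np.polyfit(cycles, tail, 1)[0])
    # Assumption: mean gradient = average of numpy's finite-difference gradient.
    mean_grad = float(np.mean(np.gradient(tail)))
    return slope, mean_grad

# Synthetic curve that loses 0.001 normalized capacity per cycle (not real data).
capacity = [1.05 - 0.001 * i for i in range(200)]
slope, mean_grad = tail_features(capacity, 100)
print(slope, mean_grad)
```

On a perfectly linear curve the two values coincide; on real, noisy capacity curves they differ slightly, as the two `*_last_100_cycles` examples above do.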

---

## NODE & RELATION IDEA

### **Nodes**
1. **Battery**
   - `battery_id` (string, e.g., `"b1c1"`) → Unique identifier for each battery.
   - `total_cycles` (int) → Total charge-discharge cycles completed. **(Renamed for clarity)**
   - `slope_all_cycles` (float) → Degradation trend over all cycles.
   - `slope_last_{k}_cycles` (float) → Degradation trend over the last `k` cycles, where `k` ∈ {10, 50, …, 1000}.
   - `mean_grad_last_{k}_cycles` (float) → Mean degradation rate over the last `k` cycles.
   - `trimmed_q_d_n` (list[float]) → Normalized discharge capacity values (⚠️ **Open question: consider storing summary statistics instead of the full list in Neo4j for efficiency.**)

2. **ChargingPolicy**
   - `policy_id` (auto-generated, optional) → Unique identifier for each policy (useful for indexing). **(Can be omitted if Neo4j auto-generates it)**
   - `charging_policy` (string, e.g., `"3.6C(80%)-3.6C"`) → The applied charging protocol.



### **Relationships**
1. **`USED_BY`** (`(:ChargingPolicy)-[:USED_BY]->(:Battery)`)  
   - **Links each charging policy to the batteries that follow it.**  
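
Once the graph is populated, this relationship can be traversed to list the batteries that share a policy. The sketch below assumes the relationship points from the policy to the battery, as the `MERGE` in `insert_battery_data` creates it. The `batteries_for_policy` helper is a hypothetical name, and `FakeSession` is only a stand-in that returns canned rows so the example runs without a database.

```python
def batteries_for_policy(session, policy):
    """Return the IDs of all batteries linked to one charging policy."""
    result = session.run(
        """
        MATCH (cp:ChargingPolicy {charging_policy: $policy})-[:USED_BY]->(b:Battery)
        RETURN b.battery_id AS battery_id
        ORDER BY battery_id
        """,
        policy=policy,
    )
    return [record["battery_id"] for record in result]

class FakeSession:
    """Stand-in for a neo4j session; returns made-up rows, not real query results."""
    def run(self, query, **params):
        self.last_params = params
        return [{"battery_id": "b1c1"}, {"battery_id": "b1c3"}]

fake = FakeSession()
ids = batteries_for_policy(fake, "3.6C(80%)-3.6C")
print(ids)
```

With a live connection, the same function would be called inside `with driver.session() as session:`.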


## Load the feature-engineered dataset into a dict

In [2]:

# Load battery dataset
DATASET_PATH = "/home/jeans/nvaitc/battery_timeseries/resources/processed/processed_for_kg_v2.pkl"

with open(DATASET_PATH, "rb") as f:
    battery_data = pickle.load(f)
print("Battery data loaded successfully.")
print(battery_data.keys())
print(battery_data["b1c1"].keys())

Battery data loaded successfully.
dict_keys(['b1c1', 'b1c3', 'b1c5', 'b1c7', 'b1c11', 'b1c15', 'b1c17', 'b1c19', 'b1c21', 'b1c24', 'b1c26', 'b1c28', 'b1c30', 'b1c32', 'b1c34', 'b1c36', 'b1c38', 'b1c40', 'b1c42', 'b1c44', 'b2c0', 'b2c2', 'b2c4', 'b2c6', 'b2c11', 'b2c13', 'b2c17', 'b2c19', 'b2c21', 'b2c23', 'b2c25', 'b2c27', 'b2c29', 'b2c31', 'b2c33', 'b2c35', 'b2c37', 'b2c39', 'b2c41', 'b2c43', 'b2c45', 'b1c0', 'b1c2', 'b1c4', 'b1c6', 'b1c9', 'b1c14', 'b1c16', 'b1c18', 'b1c20', 'b1c23', 'b1c25', 'b1c27', 'b1c29', 'b1c31', 'b1c33', 'b1c35', 'b1c37', 'b1c39', 'b1c41', 'b1c43', 'b1c45', 'b2c1', 'b2c3', 'b2c5', 'b2c10', 'b2c12', 'b2c14', 'b2c18', 'b2c20', 'b2c22', 'b2c24', 'b2c26', 'b2c28', 'b2c30', 'b2c32', 'b2c34', 'b2c36', 'b2c38', 'b2c40', 'b2c42', 'b2c44', 'b2c46'])
dict_keys(['q_d_n', 'cycle', 'charging_policy', 'trimmed_q_d_n', 'slope_all_cycles', 'slope_last_10_cycles', 'slope_last_50_cycles', 'slope_last_100_cycles', 'slope_last_200_cycles', 'slope_last_300_cycles', 'slope_last_400

## NEO4J

In [3]:
# %pip install --upgrade --quiet  langchain langchain-community langchain-openai langchain-experimental neo4j

In [4]:
from neo4j import GraphDatabase

class BatteryGraph:
    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def insert_battery_data(self, battery_id, battery_data):
        with self.driver.session() as session:
            # Insert ChargingPolicy Node (if not exists)
            policy = battery_data["charging_policy"]
            session.run(
                """
                MERGE (cp:ChargingPolicy {charging_policy: $policy})
                """,
                policy=policy
            )

            # Insert Battery Node
            session.run(
                """
                MERGE (b:Battery {battery_id: $battery_id})
                SET b.total_cycles = $total_cycles,
                    b.slope_all_cycles = $slope_all_cycles,
                    b.trimmed_q_d_n_avg = $trimmed_q_d_n_avg,
                    b.name = $battery_id  // Ensures battery_id is displayed as the node name in Neo4j
                """,
                battery_id=battery_id,
                total_cycles=battery_data["cycle"],
                slope_all_cycles=battery_data["slope_all_cycles"],
                trimmed_q_d_n_avg=sum(battery_data["trimmed_q_d_n"]) / len(battery_data["trimmed_q_d_n"])
            )

            # Insert slope_last_{N}_cycles and mean_grad_last_{N}_cycles attributes
            for key, value in battery_data.items():
                if key.startswith("slope_last_") or key.startswith("mean_grad_last_"):
                    session.run(
                        f"""
                        MATCH (b:Battery {{battery_id: $battery_id}})
                        SET b.{key} = $value
                        """,
                        battery_id=battery_id,
                        value=value
                    )

            # Create USED_BY relationship in the new direction: (cp)-[:USED_BY]->(b)
            session.run(
                """
                MATCH (b:Battery {battery_id: $battery_id})
                MATCH (cp:ChargingPolicy {charging_policy: $policy})
                MERGE (cp)-[:USED_BY]->(b)
                """,
                battery_id=battery_id,
                policy=policy
            )
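
The method above issues one query per windowed feature and interpolates the property name into the Cypher text. A possible alternative, sketched here under the assumption that all scalar features can be written at once, is to collect them into a single map and apply it with `SET b += $props`. The `battery_properties` helper, the `UPSERT_BATTERY` constant, and the shortened `example` record are illustrative and not part of the class.

```python
def battery_properties(battery_id, record):
    """Collect every scalar property for one Battery node into a single dict."""
    capacity = record["trimmed_q_d_n"]
    props = {
        "name": battery_id,
        "total_cycles": int(record["cycle"]),
        "slope_all_cycles": float(record["slope_all_cycles"]),
        # Guard against an empty list instead of dividing by zero.
        "trimmed_q_d_n_avg": sum(capacity) / len(capacity) if capacity else None,
    }
    for key, value in record.items():
        if key.startswith(("slope_last_", "mean_grad_last_")):
            props[key] = float(value)  # also turns numpy.float64 into a plain float
    return props

# One parameterized statement replaces the per-key f-string queries.
UPSERT_BATTERY = """
MERGE (b:Battery {battery_id: $battery_id})
SET b += $props
"""

# Shortened record built from the example values listed earlier.
example = {
    "cycle": 2161,
    "charging_policy": "3.6C(80%)-3.6C",
    "trimmed_q_d_n": [1.05, 0.88],
    "slope_all_cycles": -7.866618810898173e-05,
    "slope_last_100_cycles": -0.0003626042604446411,
    "mean_grad_last_100_cycles": -0.00036493629217147826,
}
props = battery_properties("b1c1", example)
# With a live session: session.run(UPSERT_BATTERY, battery_id="b1c1", props=props)
print(sorted(props))
```

Passing the property names as map keys keeps them out of the query string, so no key from the pickle file is ever spliced into Cypher.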



## Set up Neo4J

Populate the KG with the battery dataset.

In [5]:
# Neo4j connection settings, read from environment variables so credentials stay out of the notebook.
# NEO4J_URI, NEO4J_USERNAME and NEO4J_PASSWORD are assumed variable names; adjust them to your setup.
URI = os.environ.get("NEO4J_URI", "neo4j+s://3b31837b.databases.neo4j.io")  # Change this if your Neo4j instance runs elsewhere
USERNAME = os.environ.get("NEO4J_USERNAME", "neo4j")
PASSWORD = os.environ.get("NEO4J_PASSWORD") or getpass.getpass("Neo4j password: ")

# Initialize Neo4j connection
graph = BatteryGraph(URI, USERNAME, PASSWORD)

# Insert each battery's data into Neo4j; close the connection even if an insert fails
try:
    for battery_id, battery_info in battery_data.items():
        graph.insert_battery_data(battery_id, battery_info)
finally:
    graph.close()

print("Battery data successfully inserted into Neo4j.")

Battery data successfully inserted into Neo4j.


In [4]:
if "NVIDIA_API_KEY" not in os.environ:
    os.environ["NVIDIA_API_KEY"] = getpass.getpass()
    
# from langchain_nvidia_ai_endpoints import ChatNVIDIA
# client = ChatNVIDIA(
#   model="mistralai/mixtral-8x22b-instruct-v0.1",
#   api_key=os.environ["NVIDIA_API_KEY"], 
#   temperature=0.5,
#   top_p=1,
#   max_tokens=1024,
# )
# #test llm
# for chunk in client.stream([{"role":"user","content":"Write a limerick about the wonders of GPU computing."}]): 
#   print(chunk.content, end="")

A GPU so swift and so clever,
In computations it's quite the endeavor.
With its thousands of cores,
On complex tasks it roars,
Solving problems like never, forever!