## Populate the KG with features from Mark's feature-rich data (aka setupdataset_mod.ipynb)

Source file: "./resources/processed/processed_for_kg_v2.pkl"
<br>
<br>
This notebook pulls features from that file and tries to populate the knowledge graph (KG) with them.

# Battery Dataset Features

This dataset contains time-series data related to battery degradation. Each battery is identified by a unique `battery_id`.
The dataset is divided into:
- `b1` (train set)
- `b2` (validation set)
- `b3` (test set, not included)
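
Because the split is encoded in the ID prefix, train and validation IDs can be separated without extra metadata. The following is a minimal sketch, assuming every ID follows the `b<batch>c<cell>` pattern seen in this notebook; `group_by_batch` is a hypothetical helper and not part of the dataset code.

```python
from collections import defaultdict


def group_by_batch(battery_ids):
    """Group IDs such as 'b1c1' by their batch prefix ('b1', 'b2', 'b3')."""
    groups = defaultdict(list)
    for battery_id in battery_ids:
        # 'b2c10'.partition('c') -> ('b2', 'c', '10')
        batch, _, _cell = battery_id.partition("c")
        groups[batch].append(battery_id)
    return dict(groups)


# Small hand-picked sample of IDs, not the full key list.
splits = group_by_batch(["b1c1", "b1c3", "b2c0", "b2c10"])
train_ids = splits.get("b1", [])  # train set
val_ids = splits.get("b2", [])    # validation set
```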

Each battery ID (e.g., `b1c1`) contains the following features:

Attributes:
    
    - cycle (int): 
        The total number of charge-discharge cycles completed by the battery.
        Example: 2161

    - charging_policy (str): 
        The charging protocol applied to the battery.
        Example: "3.6C(80%)-3.6C"

    - q_d_n (list[float]): 
        List of normalized discharge capacity values over cycles.
        Example: [1.0499999523162842, ..., 1.075301170349121, 0.0, 0.0]

    - trimmed_q_d_n (list[float]): 
        A trimmed version of `q_d_n`, typically containing the relevant capacity data after preprocessing.
        Example: [1.0499999523162842, ..., 0.8800023198127747]

    - slope_all_cycles (float): 
        Slope of discharge capacity over all cycles.
        Example: -7.866618810898173e-05

    - slope_last_{N}_cycles (float): 
        Slope of discharge capacity over the last N cycles, where N can be 
        10, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000.
        Example: slope_last_100_cycles = -0.0003626042604446411

    - mean_grad_last_{N}_cycles (numpy.float64): 
        Mean gradient of discharge capacity over the last N cycles, where N can be 
        10, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000.
        Example: mean_grad_last_100_cycles = -0.00036493629217147826

Usage:
    This dataset can be used for analyzing battery degradation trends, estimating battery health, 
    and predicting remaining useful life (RUL).
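
This notebook does not show how `slope_last_{N}_cycles` and `mean_grad_last_{N}_cycles` were computed upstream. The sketch below is one plausible reading, under two assumptions: the slope is a least-squares line fit of capacity against cycle index, and the mean gradient is the average first difference between consecutive cycles. The function names are hypothetical, and the input is synthetic rather than taken from the dataset.

```python
def slope_last_n(capacity, n):
    """Least-squares slope of capacity vs. cycle index over the last n cycles."""
    tail = list(capacity[-n:])
    m = len(tail)
    if m < 2:
        raise ValueError("need at least two cycles to fit a slope")
    x_mean = (m - 1) / 2
    y_mean = sum(tail) / m
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(tail))
    den = sum((i - x_mean) ** 2 for i in range(m))
    return num / den


def mean_grad_last_n(capacity, n):
    """Mean first difference of capacity over the last n cycles (assumed definition)."""
    tail = list(capacity[-n:])
    if len(tail) < 2:
        raise ValueError("need at least two cycles to take differences")
    diffs = [b - a for a, b in zip(tail, tail[1:])]
    return sum(diffs) / len(diffs)


# Synthetic, perfectly linear fade of 0.001 per cycle (not real data).
capacity = [1.05 - 0.001 * i for i in range(200)]
slope_100 = slope_last_n(capacity, 100)
mean_grad_100 = mean_grad_last_n(capacity, 100)
```

On a perfectly linear curve both values coincide with the per-cycle fade. On real capacity curves they differ, which is presumably why the dataset stores both.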

## NODE & RELATION IDEA

NODE:

1. Battery:
    {
    - batteryID: (e.g., "b1c1").
    - mean_backhalf_grad (mean gradient over the back half of trimmed_q_d_n).
    - degradation_cycle (from cycle).
    }

2. Cycle:
    {
    - cycle_number: (e.g., 1, 2, ...).
    - discharge_capacity (q_d_n at [cycle_number]).
    - gradient (gradient_q_d_n at [cycle_number]).
    }

<br>
RELATIONSHIPS:
1. HAS_CYCLE: Links each Battery node to its Cycle nodes.
2. NEXT_CYCLE: Links consecutive Cycle nodes of the same battery; the gradient of the transition between the two cycles is stored on this relationship too.

<br>

- (:ChargingProtocol) -[:USED_BY]-> (:Battery)  # the ChargingProtocol node is added later; ignore for now
- (cycle_t:Cycle) -[:NEXT_CYCLE]-> (cycle_t_plus_one:Cycle)  # one timestep
- (:Battery) -[:HAS_CYCLE]-> (:Cycle)
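
To make the node and relationship idea concrete, here is a minimal sketch that turns one battery's capacity and gradient lists into the property rows for its Cycle nodes and NEXT_CYCLE relationships. `build_cycle_rows` is a hypothetical helper. It assumes cycles are numbered from 1 and that the gradient stored on a NEXT_CYCLE relationship is the gradient of the later cycle, and the input values are made up.

```python
def build_cycle_rows(battery_id, capacities, gradients):
    """Return (cycle_rows, next_rows) for one battery.

    cycle_rows holds one dict per Cycle node.
    next_rows holds one dict per NEXT_CYCLE relationship.
    """
    cycle_rows = [
        {
            "batteryID": battery_id,
            "cycle_number": number,
            "discharge_capacity": capacity,
            "gradient": gradient,
        }
        for number, (capacity, gradient) in enumerate(
            zip(capacities, gradients), start=1
        )
    ]
    # Every cycle except the first has an incoming NEXT_CYCLE relationship.
    next_rows = [
        {
            "batteryID": battery_id,
            "prev_cycle": row["cycle_number"] - 1,
            "current_cycle": row["cycle_number"],
            "gradient": row["gradient"],
        }
        for row in cycle_rows[1:]
    ]
    return cycle_rows, next_rows


# Made-up values for illustration only.
cycle_rows, next_rows = build_cycle_rows(
    "b1c1", [1.05, 1.04, 1.03], [-0.01, -0.01, -0.01]
)
```

A battery with N cycles therefore yields N Cycle nodes, N HAS_CYCLE relationships, and N - 1 NEXT_CYCLE relationships.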


## NEO4J

In [13]:
import getpass
import os

if "NVIDIA_API_KEY" not in os.environ:
    os.environ["NVIDIA_API_KEY"] = getpass.getpass()

from typing import Any, Dict, List, Optional, Tuple, Type

from langchain_community.graphs import Neo4jGraph
from langchain_community.vectorstores.neo4j_vector import remove_lucene_chars
from langchain_core.tools import tool
from langchain_nvidia_ai_endpoints import ChatNVIDIA

from langchain.agents import AgentExecutor
from langchain.agents.format_scratchpad import \
    format_to_openai_function_messages
from langchain.agents.output_parsers.openai_tools import \
    OpenAIToolsAgentOutputParser
from langchain.callbacks.manager import (AsyncCallbackManagerForToolRun,
                                         CallbackManagerForToolRun)
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.pydantic_v1 import BaseModel, Field
from langchain.schema import AIMessage, HumanMessage
from langchain.tools import BaseTool
from langchain.tools.render import format_tool_to_openai_function

In [14]:
%pip install --upgrade --quiet  langchain langchain-community langchain-openai langchain-experimental neo4j

Note: you may need to restart the kernel to use updated packages.


In [None]:
# os.environ["NEO4J_URI"] = "neo4j+s://3b31837b.databases.neo4j.io"
# os.environ["NEO4J_USERNAME"] = "neo4j"
# os.environ["NEO4J_PASSWORD"] = getpass.getpass("Neo4j password: ")

# from langchain_community.graphs import Neo4jGraph
# graph = Neo4jGraph(refresh_schema=False)

# graph.refresh_schema()
# print(graph.schema)


In [12]:
#load the pkl from resources/processed/bat_data_for_kg.pkl
import pickle
import pandas as pd
import numpy as np

path_to_pkl = "/home/jeans/nvaitc/battery_timeseries/resources/processed/bat_data_for_kg.pkl"
with open(path_to_pkl, "rb") as f:
    bat_data = pickle.load(f)
print(bat_data.keys())
print(len(bat_data.keys()))

dict_keys(['b1c1', 'b1c3', 'b1c5', 'b1c7', 'b1c11', 'b1c15', 'b1c17', 'b1c19', 'b1c21', 'b1c24', 'b1c26', 'b1c28', 'b1c30', 'b1c32', 'b1c34', 'b1c36', 'b1c38', 'b1c40', 'b1c42', 'b1c44', 'b2c0', 'b2c2', 'b2c4', 'b2c6', 'b2c11', 'b2c13', 'b2c17', 'b2c19', 'b2c21', 'b2c23', 'b2c25', 'b2c27', 'b2c29', 'b2c31', 'b2c33', 'b2c35', 'b2c37', 'b2c39', 'b2c41', 'b2c43', 'b2c45', 'b1c0', 'b1c2', 'b1c4', 'b1c6', 'b1c9', 'b1c14', 'b1c16', 'b1c18', 'b1c20', 'b1c23', 'b1c25', 'b1c27', 'b1c29', 'b1c31', 'b1c33', 'b1c35', 'b1c37', 'b1c39', 'b1c41', 'b1c43', 'b1c45', 'b2c1', 'b2c3', 'b2c5', 'b2c10', 'b2c12', 'b2c14', 'b2c18', 'b2c20', 'b2c22', 'b2c24', 'b2c26', 'b2c28', 'b2c30', 'b2c32', 'b2c34', 'b2c36', 'b2c38', 'b2c40', 'b2c42', 'b2c44', 'b2c46'])
83


In [15]:
# There might be a better way to do this; for now we use this per-cycle approach.
# Takes about 2 hours to populate the data for 35 batteries (out of 83), then an error occurs.
import getpass
import pickle
import os
from langchain_community.graphs import Neo4jGraph
from neo4j import GraphDatabase

# Neo4j connection details (the password is read from the environment or prompted for, not hardcoded)
os.environ["NEO4J_URI"] = "neo4j+s://3b31837b.databases.neo4j.io"
os.environ["NEO4J_USERNAME"] = "neo4j"
if "NEO4J_PASSWORD" not in os.environ:
    os.environ["NEO4J_PASSWORD"] = getpass.getpass("Neo4j password: ")

# Initialize Neo4j connection
neo4j_uri = os.environ["NEO4J_URI"]
neo4j_user = os.environ["NEO4J_USERNAME"]
neo4j_password = os.environ["NEO4J_PASSWORD"]

driver = GraphDatabase.driver(neo4j_uri, auth=(neo4j_user, neo4j_password))

def populate_kg(data):
    with driver.session() as session:
        for battery_id, battery_info in data.items():
            # Create Battery node
            mean_backhalf_grad = battery_info.get('mean_backhalf_grad', None)
            degradation_cycle = battery_info.get('cycle', None)

            session.run(
                """
                MERGE (b:Battery {batteryID: $battery_id})
                SET b.mean_backhalf_grad = $mean_backhalf_grad,
                    b.degradation_cycle = $degradation_cycle
                """,
                battery_id=battery_id,
                mean_backhalf_grad=mean_backhalf_grad,
                degradation_cycle=degradation_cycle,
            )

            # Iterate over cycles
            trimmed_q_d_n = battery_info.get("trimmed_q_d_n", [])
            gradient_q_d_n = battery_info.get("gradient_q_d_n", [])

            previous_cycle_node = None
            for cycle_number, (discharge_capacity, gradient) in enumerate(zip(trimmed_q_d_n, gradient_q_d_n), start=1):
                # Create Cycle node
                cycle_node = session.run(
                    """
                    MERGE (c:Cycle {cycle_number: $cycle_number, batteryID: $battery_id})
                    SET c.discharge_capacity = $discharge_capacity,
                        c.gradient = $gradient
                    RETURN c
                    """,
                    cycle_number=cycle_number,
                    battery_id=battery_id,
                    discharge_capacity=discharge_capacity,
                    gradient=gradient,
                )

                # Create HAS_CYCLE relationship
                session.run(
                    """
                    MATCH (b:Battery {batteryID: $battery_id}), (c:Cycle {cycle_number: $cycle_number, batteryID: $battery_id})
                    MERGE (b)-[:HAS_CYCLE]->(c)
                    """,
                    battery_id=battery_id,
                    cycle_number=cycle_number,
                )

                # Create NEXT_CYCLE relationship
                if previous_cycle_node is not None:
                    session.run(
                        """
                        MATCH (c1:Cycle {cycle_number: $prev_cycle, batteryID: $battery_id}),
                              (c2:Cycle {cycle_number: $current_cycle, batteryID: $battery_id})
                        MERGE (c1)-[:NEXT_CYCLE {gradient: $gradient}]->(c2)
                        """,
                        prev_cycle=cycle_number - 1,
                        current_cycle=cycle_number,
                        battery_id=battery_id,
                        gradient=gradient,
                    )
                previous_cycle_node = cycle_node

# Populate the KG
# populate_kg(bat_data)

print("Knowledge Graph populated successfully.")

Knowledge Graph populated successfully.


In [None]:
# def populate_kg_optimized(data):
#     with driver.session() as session:
#         # Begin a transaction
#         with session.begin_transaction() as tx:
#             for battery_id, battery_info in data.items():
#                 # Create Battery node
#                 mean_backhalf_grad = battery_info.get('mean_backhalf_grad', None)
#                 degradation_cycle = battery_info.get('cycle', None)
#                 tx.run(
#                     """
#                     MERGE (b:Battery {batteryID: $battery_id})
#                     SET b.mean_backhalf_grad = $mean_backhalf_grad,
#                         b.degradation_cycle = $degradation_cycle
#                     """,
#                     battery_id=battery_id,
#                     mean_backhalf_grad=mean_backhalf_grad,
#                     degradation_cycle=degradation_cycle,
#                 )

#                 # Prepare Cycles and Relationships
#                 trimmed_q_d_n = battery_info.get("trimmed_q_d_n", [])
#                 gradient_q_d_n = battery_info.get("gradient_q_d_n", [])
                
#                 cycle_queries = []
#                 relationship_queries = []

#                 for cycle_number, (discharge_capacity, gradient) in enumerate(zip(trimmed_q_d_n, gradient_q_d_n), start=1):
#                     # Create Cycle node
#                     cycle_queries.append(
#                         {
#                             "cycle_number": cycle_number,
#                             "battery_id": battery_id,
#                             "discharge_capacity": discharge_capacity,
#                             "gradient": gradient,
#                         }
#                     )
#                     # Create NEXT_CYCLE relationship
#                     if cycle_number > 1:
#                         relationship_queries.append(
#                             {
#                                 "prev_cycle": cycle_number - 1,
#                                 "current_cycle": cycle_number,
#                                 "battery_id": battery_id,
#                                 "gradient": gradient,
#                             }
#                         )

#                 # Execute batch cycle node creation
#                 if cycle_queries:
#                     tx.run(
#                         """
#                         UNWIND $cycles AS cycle
#                         MERGE (c:Cycle {cycle_number: cycle.cycle_number, batteryID: cycle.battery_id})
#                         SET c.discharge_capacity = cycle.discharge_capacity,
#                             c.gradient = cycle.gradient
#                         """,
#                         cycles=cycle_queries,
#                     )

#                 # Create HAS_CYCLE relationships
#                 tx.run(
#                     """
#                     UNWIND $cycles AS cycle
#                     MATCH (b:Battery {batteryID: cycle.battery_id}),
#                           (c:Cycle {cycle_number: cycle.cycle_number, batteryID: cycle.battery_id})
#                     MERGE (b)-[:HAS_CYCLE]->(c)
#                     """,
#                     cycles=cycle_queries,
#                 )

#                 # Create NEXT_CYCLE relationships
#                 if relationship_queries:
#                     tx.run(
#                         """
#                         UNWIND $relationships AS rel
#                         MATCH (c1:Cycle {cycle_number: rel.prev_cycle, batteryID: rel.battery_id}),
#                               (c2:Cycle {cycle_number: rel.current_cycle, batteryID: rel.battery_id})
#                         MERGE (c1)-[:NEXT_CYCLE {gradient: rel.gradient}]->(c2)
#                         """,
#                         relationships=relationship_queries,
#                     )

# # Populate the KG
# populate_kg_optimized(bat_data)

# print("Knowledge Graph populated successfully.")
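
The per-cycle loop in `populate_kg` issues several round-trips for every cycle, which is a likely contributor to the multi-hour runtime noted above. One possible alternative, sketched here and not benchmarked, is to send rows in fixed-size chunks through `UNWIND` and to create an index on the properties used by `MERGE`. The helper names, the constraint and index names, and the chunk size of 500 are assumptions. The schema statements use Neo4j 5 syntax, and the query assumes the Battery nodes already exist.

```python
# Hypothetical schema statements (Neo4j 5 syntax); run each once before loading.
SCHEMA_QUERIES = [
    "CREATE CONSTRAINT battery_id_unique IF NOT EXISTS "
    "FOR (b:Battery) REQUIRE b.batteryID IS UNIQUE",
    "CREATE INDEX cycle_key IF NOT EXISTS "
    "FOR (c:Cycle) ON (c.batteryID, c.cycle_number)",
]

# Creates Cycle nodes and HAS_CYCLE relationships for a whole chunk of rows.
CYCLE_QUERY = """
UNWIND $rows AS row
MATCH (b:Battery {batteryID: row.batteryID})
MERGE (c:Cycle {cycle_number: row.cycle_number, batteryID: row.batteryID})
SET c.discharge_capacity = row.discharge_capacity,
    c.gradient = row.gradient
MERGE (b)-[:HAS_CYCLE]->(c)
"""


def chunked(rows, size):
    """Yield consecutive slices of `rows`, each with at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def write_in_chunks(session, query, rows, size=500):
    """Run `query` once per chunk and return the number of chunks sent.

    `session` is anything with a `run(query, **params)` method,
    for example a session from `driver.session()`.
    """
    sent = 0
    for chunk in chunked(rows, size):
        session.run(query, rows=chunk)
        sent += 1
    return sent
```

With a live driver this would be called as `write_in_chunks(session, CYCLE_QUERY, rows)` inside `with driver.session() as session:`. Smaller chunks also mean that a failure partway through loses less work than a single long transaction would.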


In [4]:

# from langchain_nvidia_ai_endpoints import ChatNVIDIA
# client = ChatNVIDIA(
#   model="mistralai/mixtral-8x22b-instruct-v0.1",
#   api_key=os.environ["NVIDIA_API_KEY"], 
#   temperature=0.5,
#   top_p=1,
#   max_tokens=1024,
# )
# #test llm
# for chunk in client.stream([{"role":"user","content":"Write a limerick about the wonders of GPU computing."}]): 
#   print(chunk.content, end="")

A GPU so swift and so clever,
In computations it's quite the endeavor.
With its thousands of cores,
On complex tasks it roars,
Solving problems like never, forever!

# ADD CHARGING PROTOCOL node
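
The protocol mapping used in this section covers more cells than the 83 batteries loaded earlier (for example the `b3` cells). For an ID with no Battery node, the `MATCH` in the linking query finds nothing, so no `USED_BY` relationship is created even though the ChargingProtocol node itself is merged. A minimal sketch for checking this up front follows; `split_by_known_batteries` is a hypothetical helper and the sample values are a small excerpt.

```python
def split_by_known_batteries(mapping, known_ids):
    """Split a battery -> protocol mapping by whether the battery is known.

    Returns (linkable, unmatched): `linkable` keeps only entries whose
    battery ID is in `known_ids`; `unmatched` lists the remaining IDs.
    """
    known = set(known_ids)
    linkable = {b: p for b, p in mapping.items() if b in known}
    unmatched = sorted(b for b in mapping if b not in known)
    return linkable, unmatched


# Excerpt of the mapping and of the loaded battery IDs, for illustration.
sample_mapping = {
    "b1c0": "3.6C(80%)-3.6C",
    "b1c1": "3.6C(80%)-3.6C",
    "b3c0": "5C(67%)-4C-newstructure",
}
sample_known = ["b1c0", "b1c1", "b2c0"]
linkable, unmatched = split_by_known_batteries(sample_mapping, sample_known)
```

Passing `mapping_dict` and `bat_data.keys()` would list the cells for which the "linked" message in the cell below does not correspond to an actual relationship.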

In [17]:
# Define the mapping dictionary
mapping_dict = {'b1c0': '3.6C(80%)-3.6C',
 'b1c1': '3.6C(80%)-3.6C',
 'b1c2': '3.6C(80%)-3.6C',
 'b1c3': '4C(80%)-4C',
 'b1c4': '4C(80%)-4C',
 'b1c5': '4.4C(80%)-4.4C',
 'b1c6': '4.8C(80%)-4.8C',
 'b1c7': '4.8C(80%)-4.8C',
 'b1c9': '5.4C(40%)-3.6C',
 'b1c11': '5.4C(50%)-3C',
 'b1c14': '5.4C(60%)-3C',
 'b1c15': '5.4C(60%)-3C',
 'b1c16': '5.4C(60%)-3.6C',
 'b1c17': '5.4C(60%)-3.6C',
 'b1c18': '5.4C(70%)-3C',
 'b1c19': '5.4C(70%)-3C',
 'b1c20': '5.4C(80%)-5.4C',
 'b1c21': '5.4C(80%)-5.4C',
 'b1c23': '6C(30%)-3.6C',
 'b1c24': '6C(40%)-3C',
 'b1c25': '6C(40%)-3C',
 'b1c26': '6C(40%)-3.6C',
 'b1c27': '6C(40%)-3.6C',
 'b1c28': '6C(50%)-3C',
 'b1c29': '6C(50%)-3C',
 'b1c30': '6C(50%)-3.6C',
 'b1c31': '6C(50%)-3.6C',
 'b1c32': '6C(60%)-3C',
 'b1c33': '6C(60%)-3C',
 'b1c34': '7C(30%)-3.6C',
 'b1c35': '7C(30%)-3.6C',
 'b1c36': '7C(40%)-3C',
 'b1c37': '7C(40%)-3C',
 'b1c38': '7C(40%)-3.6C',
 'b1c39': '7C(40%)-3.6C',
 'b1c40': '8C(15%)-3.6C',
 'b1c41': '8C(15%)-3.6C',
 'b1c42': '8C(25%)-3.6C',
 'b1c43': '8C(25%)-3.6C',
 'b1c44': '8C(35%)-3.6C',
 'b1c45': '8C(35%)-3.6C',
 'b2c0': '1C(4%)-6C',
 'b2c1': '2C(10%)-6C',
 'b2c2': '2C(2%)-5C',
 'b2c3': '2C(7%)-5.5C',
 'b2c4': '3.6C(22%)-5.5C',
 'b2c5': '3.6C(2%)-4.85C',
 'b2c6': '3.6C(30%)-6C',
 'b2c10': '3.6C(9%)-5C',
 'b2c11': '4C(13%)-5C',
 'b2c12': '4C(31%)-5C',
 'b2c13': '4C(40%)-6C',
 'b2c14': '4C(4%)-4.85C',
 'b2c17': '4.4C(24%)-5C',
 'b2c18': '4.4C(47%)-5.5C',
 'b2c19': '4.4C(55%)-6C',
 'b2c20': '4.4C(8%)-4.85C',
 'b2c21': '4.65C(19%)-4.85C',
 'b2c22': '4.65C(44%)-5C',
 'b2c23': '4.65C(69%)-6C',
 'b2c24': '4.8C(80%)-4.8C',
 'b2c25': '4.8C(80%)-4.8C',
 'b2c26': '4.8C(80%)-4.8C',
 'b2c27': '4.9C(27%)-4.75C',
 'b2c28': '4.9C(61%)-4.5C',
 'b2c29': '4.9C(69%)-4.25C',
 'b2c30': '5.2C(10%)-4.75C',
 'b2c31': '5.2C(37%)-4.5C',
 'b2c32': '5.2C(50%)-4.25C',
 'b2c33': '5.2C(58%)-4C',
 'b2c34': '5.2C(66%)-3.5C',
 'b2c35': '5.2C(71%)-3C',
 'b2c36': '5.6C(25%)-4.5C',
 'b2c37': '5.6C(38%)-4.25C',
 'b2c38': '5.6C(47%)-4C',
 'b2c39': '5.6C(58%)-3.5C',
 'b2c40': '5.6C(5%)-4.75C',
 'b2c41': '5.6C(65%)-3C',
 'b2c42': '6C(20%)-4.5C',
 'b2c43': '6C(31%)-4.25C',
 'b2c44': '6C(40%)-4C',
 'b2c45': '6C(4%)-4.75C',
 'b2c46': '6C(52%)-3.5C',
 'b2c47': '6C(60%)-3C',
 'b3c0': '5C(67%)-4C-newstructure',
 'b3c1': '5.3C(54%)-4C-newstructure',
 'b3c3': '5.6C(36%)-4.3C-newstructure',
 'b3c4': '5.6C(19%)-4.6C-newstructure',
 'b3c5': '5.6C(36%)-4.3C-newstructure',
 'b3c6': '3.7C(31%)-5.9C-newstructure',
 'b3c7': '4.8C(80%)-4.8C-newstructure',
 'b3c8': '5C(67%)-4C-newstructure',
 'b3c9': '5.3C(54%)-4C-newstructure',
 'b3c10': '4.8C(80%)-4.8C-newstructure',
 'b3c11': '5.6C(19%)-4.6C-newstructure',
 'b3c12': '5.6C(36%)-4.3C-newstructure',
 'b3c13': '5.6C(19%)-4.6C-newstructure',
 'b3c14': '5.6C(36%)-4.3C-newstructure',
 'b3c15': '5.9C(15%)-4.6C-newstructure',
 'b3c16': '4.8C(80%)-4.8C-newstructure',
 'b3c17': '5.3C(54%)-4C-newstructure',
 'b3c18': '5.6C(19%)-4.6C-newstructure',
 'b3c19': '5.6C(36%)-4.3C-newstructure',
 'b3c20': '5C(67%)-4C-newstructure',
 'b3c21': '3.7C(31%)-5.9C-newstructure',
 'b3c22': '5.9C(60%)-3.1C-newstructure',
 'b3c24': '5C(67%)-4C-newstructure',
 'b3c25': '5.3C(54%)-4C-newstructure',
 'b3c26': '5.6C(19%)-4.6C-newstructure',
 'b3c27': '5.6C(36%)-4.3C-newstructure',
 'b3c28': '3.7C(31%)-5.9C-newstructure',
 'b3c29': '5.9C(15%)-4.6C-newstructure',
 'b3c30': '5.3C(54%)-4C-newstructure',
 'b3c31': '5.9C(60%)-3.1C-newstructure',
 'b3c33': '5C(67%)-4C-newstructure',
 'b3c34': '5.3C(54%)-4C-newstructure',
 'b3c35': '5.6C(19%)-4.6C-newstructure',
 'b3c36': '5.6C(36%)-4.3C-newstructure',
 'b3c38': '5C(67%)-4C-newstructure',
 'b3c39': '5.3C(54%)-4C-newstructure',
 'b3c40': '5.6C(19%)-4.6C-newstructure',
 'b3c41': '5.6C(36%)-4.3C-newstructure',
 'b3c44': '5.3C(54%)-4C-newstructure',
 'b3c45': '4.8C(80%)-4.8C-newstructure'}

# Function to create ChargingProtocol nodes and relationships
def create_charging_protocol(tx, battery_id, protocol_name):
    query = """
    MERGE (cp:ChargingProtocol {protocolName: $protocol_name})
    WITH cp
    MATCH (b:Battery {batteryID: $battery_id})
    MERGE (cp)-[:USED_BY]->(b)
    """
    tx.run(query, protocol_name=protocol_name, battery_id=battery_id)

# Populate the knowledge graph
with driver.session() as session:
    for battery_id, protocol_name in mapping_dict.items():
        session.execute_write(create_charging_protocol, battery_id, protocol_name)
        print(f"Created ChargingProtocol '{protocol_name}' and linked it to Battery '{battery_id}'.")

print("Knowledge graph successfully updated with ChargingProtocol nodes and USED_BY relationships.")

# Close the driver connection
driver.close()


Created ChargingProtocol '3.6C(80%)-3.6C' and linked it to Battery 'b1c0'.
Created ChargingProtocol '3.6C(80%)-3.6C' and linked it to Battery 'b1c1'.
Created ChargingProtocol '3.6C(80%)-3.6C' and linked it to Battery 'b1c2'.
Created ChargingProtocol '4C(80%)-4C' and linked it to Battery 'b1c3'.
Created ChargingProtocol '4C(80%)-4C' and linked it to Battery 'b1c4'.
Created ChargingProtocol '4.4C(80%)-4.4C' and linked it to Battery 'b1c5'.
Created ChargingProtocol '4.8C(80%)-4.8C' and linked it to Battery 'b1c6'.
Created ChargingProtocol '4.8C(80%)-4.8C' and linked it to Battery 'b1c7'.
Created ChargingProtocol '5.4C(40%)-3.6C' and linked it to Battery 'b1c9'.
Created ChargingProtocol '5.4C(50%)-3C' and linked it to Battery 'b1c11'.
Created ChargingProtocol '5.4C(60%)-3C' and linked it to Battery 'b1c14'.
Created ChargingProtocol '5.4C(60%)-3C' and linked it to Battery 'b1c15'.
Created ChargingProtocol '5.4C(60%)-3.6C' and linked it to Battery 'b1c16'.
Created ChargingProtocol '5.4C(60%)