In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
!pip install -U langchain-huggingface sentence_transformers faiss-cpu PyPDF2 pypdf langchain-community Groq langchain-groq

Collecting langchain-huggingface
  Downloading langchain_huggingface-0.3.0-py3-none-any.whl.metadata (996 bytes)
Collecting sentence_transformers
  Downloading sentence_transformers-5.0.0-py3-none-any.whl.metadata (16 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.11.0.post1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (5.0 kB)
Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl.metadata (6.8 kB)
Collecting pypdf
  Downloading pypdf-5.8.0-py3-none-any.whl.metadata (7.1 kB)
Collecting langchain-community
  Downloading langchain_community-0.3.27-py3-none-any.whl.metadata (2.9 kB)
Collecting Groq
  Downloading groq-0.30.0-py3-none-any.whl.metadata (16 kB)
Collecting langchain-groq
  Downloading langchain_groq-0.3.6-py3-none-any.whl.metadata (2.6 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-communit

In [3]:
from pypdf import PdfReader
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from groq import Groq
from langchain_groq import ChatGroq
from google.colab import userdata
import os

In [7]:
paragraphs = [
    """Consequently, deep learning (DL) methods, exemplified by the convolutional neural network (CNN), have been employed to extract subtle fault features that are not easily
    detectable, utilizing vast amounts of data to provide more accurate diagnostic results [10], [11]. Sun et al. [12] proposed a hybrid fault diagnosis method that leverages the capabilities
    of CNN to integrate diverse frequency and sequence features. They combined a gap-gated recurrent unit network with adaptive batch normalization to refine feature extraction
    and enhance robustness. Zhang et al. [13] enhance the extraction of fault-related features from nonstationary bearing signals by incorporating a cascaded multiscale information
    fusion layer into the 2-D CNN architecture. Among these, researchers have discovered that stacking layers in a CNN model enables the extraction of more complex patterns and
    features, leading to improved model performance. Deeper CNN network models consequently exhibit higher accuracy and stronger generalization capabilities compared to shallow
    network models. However, layer stacking also introduces several new challenges. The increase in the number of layers in a CNN significantly inflates the model parameters, directly leading to
    high computational and time costs as well as increased memory requirements. As a result, the limited computational power and storage capacity of onboard equipment or
    wearable devices typically prevent the practical deployment of deep CNN models [14]"""]

raw_text = "\n".join(paragraphs)

text_splitter = RecursiveCharacterTextSplitter(separators=["\n\n", "\n", ". ", " ", ""], chunk_size=200, chunk_overlap=20)
text_chunks = text_splitter.split_text(raw_text)
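
The `chunk_size` and `chunk_overlap` arguments above control how long each chunk may be and how many characters neighbouring chunks share. The sketch below illustrates that idea with a plain sliding window. It is a simplification, not the algorithm `RecursiveCharacterTextSplitter` uses (which first tries to split on the listed separators); `sliding_window_chunks` is a hypothetical helper written for this illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int = 200, chunk_overlap: int = 20) -> list[str]:
    """Cut text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks

# Toy input so the overlap is easy to see by eye
sample = "abcdefghijklmnopqrstuvwxyz"
demo = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
for piece in demo:
    print(repr(piece))
```

A larger overlap lowers the chance that a sentence relevant to a query is cut in half at a chunk boundary, at the cost of storing more, partly redundant, chunks.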

In [None]:
# Embedding
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

# FAISS vectorstore
vectorstore_path = "./vectorstore"
store = FAISS.from_texts(text_chunks, embeddings)
store.save_local(vectorstore_path)

vectorstore = FAISS.load_local(vectorstore_path, embeddings, allow_dangerous_deserialization=True)

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
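
The retriever above returns the `k` stored chunks whose embeddings lie closest to the query embedding. The following is a minimal pure-Python sketch of that nearest-neighbour step, assuming Euclidean (L2) distance. The three-dimensional vectors and the `top_k` helper are made up for illustration; embeddings from `all-mpnet-base-v2` have 768 dimensions, and FAISS performs this search far more efficiently.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query_vec, stored, k=3):
    """Return the k (distance, text) pairs closest to query_vec, nearest first."""
    scored = [(l2_distance(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]

# Hypothetical toy "index": (chunk text, embedding) pairs
toy_store = [
    ("CNN layer stacking", [0.9, 0.1, 0.0]),
    ("bearing vibration signals", [0.1, 0.9, 0.1]),
    ("knowledge distillation", [0.0, 0.2, 0.9]),
    ("deep CNN memory cost", [0.8, 0.2, 0.1]),
]
toy_query = [1.0, 0.0, 0.0]

nearest = top_k(toy_query, toy_store, k=2)
for dist, text in nearest:
    print(f"{dist:.3f}  {text}")
```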

In [None]:
# LLM setup: adjust for whichever integration is available.
import os
from langchain_groq import ChatGroq

# Replace the placeholder with a real key before running this cell.
os.environ["GROQ_API_KEY"] = "Your_API_Key"

# Load a Llama 3.1 model served by Groq
llm = ChatGroq(model_name="llama-3.1-8b-instant", api_key=os.getenv("GROQ_API_KEY"))

def retrieve_docs(query):
    return vectorstore.similarity_search(query, k=3)

def answer_with_context(query):
    docs = retrieve_docs(query)
    context_text = "\n\n".join([doc.page_content for doc in docs])
    prompt = f"Context:\n{context_text}\n\nQuestion: {query}\nAnswer:"
    response = llm.invoke(prompt)
    return response.content  # or response['text'] if using another LLM wrapper
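
The prompt in `answer_with_context` passes the retrieved chunks and the question to the model without telling it to stay within that context. One possible variant is sketched below: it numbers the chunks, caps the context length, and asks the model to say when the context does not contain the answer. `build_prompt` and its `max_chars` parameter are hypothetical names introduced here, and the instruction wording is only one option.

```python
def build_prompt(chunks, query, max_chars=4000):
    """Assemble a context-then-question prompt, numbering each chunk."""
    parts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        block = f"[{i}] {chunk.strip()}"
        if used + len(block) > max_chars:
            break  # stop before the context exceeds the character budget
        parts.append(block)
        used += len(block)
    context_text = "\n\n".join(parts)
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context_text}\n\nQuestion: {query}\nAnswer:"
    )

# Toy chunks standing in for doc.page_content values
example_prompt = build_prompt(
    ["Deep CNNs need more memory.", "Layer stacking adds parameters."],
    "Why are deep CNNs costly?",
)
print(example_prompt)
```

Inside `answer_with_context`, the string returned by such a helper could be passed to `llm.invoke` in place of the inline f-string.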

In [None]:
user_question = "What is non stationary bearing signals?"
answer = answer_with_context(user_question)
print(answer)

Non-stationary bearing signals refer to audio or vibration signals that originate from rotating machinery, such as bearings, and have characteristics that change over time or frequency. These signals are often irregular, unpredictable, and can be affected by various factors like speed, load, and temperature, making them challenging to analyze and process.


The term "non-stationary" in this context means that the signal's statistical properties, such as mean, variance, or power spectral density, change over time or across different frequency bands. This non-stationarity can be due to various factors, including:

1. Time-varying operating conditions
2. Changes in bearing speed or load
3. Presence of faults or anomalies
4. Environmental factors like temperature or vibration

To analyze and process non-stationary bearing signals effectively, advanced signal processing techniques, such as those mentioned in the text (cascaded multiscale information and convolutional neural networks), are o

In [None]:
def extract_texts_from_pdfs(pdf_paths):
    all_text = ""
    for pdf_path in pdf_paths:
        reader = PdfReader(pdf_path)
        for page in reader.pages:
            page_text = page.extract_text()
            if page_text:
                all_text += page_text + "\n"
    return all_text

pdf_files = ["/content/1-s2.0-S0957417423035856-main.pdf", "/content/A_Fault_Diagnosis_Method_for_Variable_Condition_Equipment_Based_on_Knowledge_Transfer_and_Improved_Residual_Neural_Networks.pdf", "/content/A_Lightweight_Fault_Diagnosis_Method_Based_on_Knowledge_Distillation_Under_Time-Varying_Rotational_Speeds.pdf"]
combined_text = extract_texts_from_pdfs(pdf_files)

In [None]:
text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", ". ", " ", ""],
    chunk_size=1000,
    chunk_overlap=100
)
text_chunks = text_splitter.split_text(combined_text)

In [None]:
len(text_chunks)

204

# Create embeddings and FAISS vectorstore

In [None]:
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vectorstore = FAISS.from_texts(text_chunks, embeddings)
vectorstore.save_local("./vectorstore")

# Load embeddings and existing FAISS vectorstore

In [5]:
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vectorstore = FAISS.load_local('/content/drive/MyDrive/vectorstore', embeddings, allow_dangerous_deserialization=True)

# 2. Chunk the new PDF
def pdf_to_chunks(pdf_path, chunk_size=1000, chunk_overlap=100):
    reader = PdfReader(pdf_path)
    text = "\n".join([page.extract_text() or "" for page in reader.pages])
    splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    return splitter.split_text(text)

new_pdf = "/content/An_Integrated_Framework_for_Bearing_Fault_Diagnosis_Convolutional_Neural_Network_Model_Compression_Through_Knowledge_Distillation.pdf"
new_chunks = pdf_to_chunks(new_pdf)

# 3. Add new chunks to vectorstore
vectorstore.add_texts(new_chunks)

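# 4. Save the updated vectorstore (relative paths resolve against /content, Colab's default working directory)
vectorstore_path = "/content/vectorstore"
vectorstore.save_local(vectorstore_path)

# 5. Reload the saved index from the same path
vectorstore = FAISS.load_local(vectorstore_path, embeddings, allow_dangerous_deserialization=True)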
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
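
The cell above loads the index from Google Drive but writes the updated copy to a local `vectorstore` directory, which Colab discards when the runtime is recycled. Assuming the goal is to keep the updated index across sessions, a sketch of copying the saved folder back to Drive might look like this. `copy_index` is a hypothetical helper; `save_local` writes `index.faiss` and `index.pkl` by default.

```python
import shutil
from pathlib import Path

def copy_index(src_dir, dst_dir):
    """Copy a saved FAISS folder to dst_dir and return the file names now present there."""
    src = Path(src_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"No saved vectorstore at {src}")
    # dirs_exist_ok=True overwrites files from an earlier copy (Python 3.8+)
    shutil.copytree(src, dst_dir, dirs_exist_ok=True)
    return sorted(p.name for p in Path(dst_dir).iterdir())

# Example call, using the paths from this notebook (requires Drive to be mounted):
# copy_index("/content/vectorstore", "/content/drive/MyDrive/vectorstore")
```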

In [None]:
# Retrieval and LLM setup (adjust for your LLM integration)
os.environ["GROQ_API_KEY"] = userdata.get('sec_key')
llm = ChatGroq(model_name="llama-3.1-8b-instant", api_key=os.getenv("GROQ_API_KEY"))

def retrieve_docs(query, k=3):
    return vectorstore.similarity_search(query, k=k)

def answer_with_context(query, k=3):
    docs = retrieve_docs(query, k)
    context_text = "\n\n".join([doc.page_content for doc in docs])
    prompt = f"Context:\n{context_text}\n\nQuestion: {query}\nAnswer:"
    response = llm.invoke(prompt)
    return response.content

In [None]:
user_question = "What is Improved probability-based knowledge distillation?"
answer = answer_with_context(user_question)
print(answer)

Improved probability-based knowledge distillation (PKD) is a method that learns the geometry of the feature space of the teacher network, which can significantly increase the quality of the learned model. It models the affinity between samples as a proximity distribution (PD) and takes into account both the PDs of samples and class labels to improve the quality and efficiency of knowledge distillation.


In [6]:
from langchain_groq import ChatGroq
from google.colab import userdata
import os

# Set API key securely
os.environ["GROQ_API_KEY"] = userdata.get('sec_key')
llm = ChatGroq(model_name="llama-3.1-8b-instant", api_key=os.getenv("GROQ_API_KEY"))

# Assume `vectorstore` is already loaded as per previous steps

def retrieve_docs(query, k=3):
    return vectorstore.similarity_search(query, k=k)

def answer_with_context(query, k=3):
    docs = retrieve_docs(query, k)
    context_text = "\n\n".join([doc.page_content for doc in docs])
    prompt = f"Context:\n{context_text}\n\nQuestion: {query}\nAnswer:"
    response = llm.invoke(prompt)
    return response.content

def chat_loop():
    print("Start chatting with your document's AI! (type 'exit' or 'quit' to stop)")
    while True:
        user_question = input("\nYou: ")
        if user_question.strip().lower() in ["exit", "quit"]:
            print("Goodbye!")
            break
        answer = answer_with_context(user_question)
        print("\nAssistant:", answer)

# Start the chat
chat_loop()


Start chatting with your document's AI! (type 'exit' or 'quit' to stop)

You: What is knowledge distillation?

Assistant: Knowledge distillation is a machine learning technique used to transfer knowledge from a pre-trained teacher network to a student network. The basic idea is to use the soft labels generated by the teacher network to guide the training process of the student network. This involves making the student network learn from the teacher network's output, which is typically a soft probability distribution of the class labels, rather than just the hard labels themselves. The goal of knowledge distillation is to enable the student network to learn from the teacher network's knowledge and make predictions similar to the teacher network, while also reducing the size and computational complexity of the student network.

You: What is non stationary signals?

Assistant: Non-stationary signals are time-varying signals whose statistical properties, such as mean, variance, or power sp