<a href="https://colab.research.google.com/github/adharshrj/llmsinedu/blob/main/LLM_in_Education.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Installation

In [1]:
# Pip installation LangChain and Hugginface API
!pip install langchain
!pip install huggingface_hub

# Pip installation of additional needed libraries
!pip install sentence_transformers
!pip install faiss-cpu
!pip install "unstructured[all-docs]"




Env Setup

In [18]:
import os
import requests
os.environ["HUGGINGFACEHUB_API_TOKEN"] = ""
os.environ["HF_TOKEN"] = ""

Connect Google Drive

In [3]:
from google.colab import drive
drive.mount('/content/drive/')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


Setup Loaders

In [4]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.document_loaders import OnlinePDFLoader

In [25]:
def loadPDFFromLocal(pdf_file_path="/content/drive/MyDrive/LLM_Testing_Docs/clrs.pdf"):
    loader = PyPDFLoader(pdf_file_path)
    pages = loader.load_and_split()

    # Adding progress tracking
    total_pages = len(pages)
    for i, page in enumerate(pages):
        print(f"Processing page {i+1} of {total_pages}")

    return pages


In [40]:
def loadFromUrl(url="https://www.nrel.gov/docs/fy12osti/55871.pdf"):
  onlineLoader = OnlinePDFLoader(url)
  newPg = onlineLoader.load_and_split()

  print(newPg)
  return newPg

Split Documents (LLMS cannot read large amounts of data)

In [9]:
from langchain.text_splitter import CharacterTextSplitter

In [10]:
def splitDocument(loaded_docs):
    # Splitting documents into chunks
    splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=10)
    chunked_docs = splitter.split_documents(loaded_docs)
    return chunked_docs

Create Embeddings

In [11]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

In [12]:
def createEmbeddings(chunked_docs):
    # Create embeddings and store them in a FAISS vector store
    embedder = HuggingFaceEmbeddings()
    vector_store = FAISS.from_documents(chunked_docs, embedder)
    return vector_store

Use those embeddings to feed the LLM model and Answer Questions

In [13]:
from langchain.chains.question_answering import load_qa_chain
from langchain import HuggingFaceHub

In [34]:
def loadLLMModel():
    llm=HuggingFaceHub(repo_id="mistralai/Mixtral-8x7B-Instruct-v0.1", model_kwargs={"temperature":0, "max_length":2048})
    chain = load_qa_chain(llm, chain_type="stuff")
    return chain

def askQuestions(vector_store, chain, question):
    # Ask a question using the QA chain
    similar_docs = vector_store.similarity_search(question)
    response = chain.run(input_documents=similar_docs, question=question)
    return response

In [35]:
chain = loadLLMModel()

Testing

In [None]:
PDF_loaded_docs = loadPDFFromLocal()
PDF_chunked_docs = splitDocument(PDF_loaded_docs)
PDF_vector_store = createEmbeddings(PDF_loaded_docs)

In [41]:
PDF_loaded_docs = loadFromUrl()
PDF_vector_store = createEmbeddings(PDF_loaded_docs)

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


[Document(page_content='Wind Power Plant Prediction by Using Neural Networks\n\nPreprint\n\nZ. Liu and W. Gao University of Denver\n\nY.-H. Wan and E. Muljadi National Renewable Energy Laboratory\n\nTo be presented at the IEEE Energy Conversion Conference and Exposition Raleigh, North Carolina September 15–20, 2012\n\nNREL is a national laboratory of the U.S. Department of Energy, Office of Energy Efficiency & Renewable Energy, operated by the Alliance for Sustainable Energy, LLC.\n\nConference Paper NREL/CP-5500-55871 August 2012\n\nContract No. DE-AC36-08GO28308\n\nNOTICE\n\nThe submitted manuscript has been offered by an employee of the Alliance for Sustainable Energy, LLC (Alliance), a contractor of the US Government under Contract No. DE-AC36-08GO28308. Accordingly, the US Government and Alliance retain a nonexclusive royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for US Government purposes.\n\nThis report was prepar

In [43]:
PDF_response = askQuestions(PDF_vector_store, chain, "Summarize the content of this paper please")
print(PDF_response)

Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

MAE (cid:3404)

1 n

(cid:2924)

(cid:3533) |x(cid:2919) (cid:3398) x(cid:3112)(cid:3541) |

(cid:2919)(cid:2880)(cid:2869)

(2)

RMSE (cid:3404) (cid:3497)

1 n

(cid:2924) (cid:3533)(cid:4666)x(cid:2919) (cid:3398) x(cid:3112)(cid:3557) (cid:4667)(cid:2870) (cid:2919)(cid:2880)(cid:2869)

(3)

MAPE (cid:3404)

1 n

(cid:2924) (cid:3533) (cid:3628) (cid:2919)(cid:2880)(cid:2869)

x(cid:2919) (cid:3398) x(cid:3112)(cid:3557) x(cid:2919)

(cid:3628) (cid:3400) 100

%

(4)

IV. COMPARISON AND ANALYSIS OF PREDICTION RESULTS

Based on section II, data as shown in Table III was selected to finish the WPP's power prediction model. Data of each group consists of wind speed, wind direction and wind power generation.

TABLE III DATA DESCRIPTION

Data Group A

B

Start time

4/1/2010 0:00

4/1/2011 0:10

End time

5/8/2010 23:50 5/8/2