## Part A: Build a code understanding model. Upload your own custom code files to the model and ask questions based on the code file as context. 


## Homework 3

https://www.muratkoklu.com/datasets/


Use your own data and complete the following steps. Complete the assignment within a .ipynb notebook. Submit a .zip file containing your data and the results.
Refer to demo_03-classification for examples of setting up a custom dataset.
Step 1. Create your own custom dataset featuring 3 custom categories of at least 100 images each.
Step 2. Split this data into 80% training and 20% test.
Step 3. Preprocess the data as you see fit.
Step 4. Create a Convolutional Neural Network model to learn from your training set.
Step 5. Make predictions on the test data and compare them to the expected categories.
Step 6. Use GoogLeNet (Inception) and add a linear layer on top of it.
Step 7. Train the GoogLeNet model and compare its accuracy with the first model.
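
As a rough illustration of the 80/20 split in Step 2, the sketch below shuffles a list of file paths and cuts it at 80%. The function name `split_train_test`, the fixed seed, and the file names are hypothetical, chosen only for this example; the homework itself may use a framework utility instead.

```python
import random

def split_train_test(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names: 3 categories x 100 images, as in Step 1.
image_paths = [f"class_{c}/img_{i:03d}.jpg" for c in range(3) for i in range(100)]
train_paths, test_paths = split_train_test(image_paths)
print(len(train_paths), len(test_paths))
```

A plain shuffle does not guarantee that each category is split exactly 80/20; a per-category (stratified) split would be needed for that.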

In [1]:
# Part of the langchain.chains module; facilitates building a retrieval-based QA system.
# RetrievalQA first retrieves the text relevant to the query, then generates an answer
# from the retrieved documents.
from langchain.chains import RetrievalQA
# Generate and manage embeddings on HF, useful in searches, to understand
# meanings of texts and find similar ones
from langchain.embeddings import HuggingFaceEmbeddings
# LangChain wrapper around llama.cpp (a C++ inference engine) for running GGUF models locally
from langchain.llms import LlamaCpp
# split texts into smaller chunks or segments
from langchain.text_splitter import RecursiveCharacterTextSplitter
# FAISS (Facebook AI Similarity Search): efficient similarity search
# and clustering of dense vectors. Useful for quickly retrieving the documents or passages
# most relevant to a query, which requires fast nearest-neighbor search in high dimensions
from langchain.vectorstores import FAISS
# Load and process pdf documents
from langchain.document_loaders import PyPDFDirectoryLoader
# Download and cache files such as model weights from HF
from huggingface_hub import hf_hub_download

# To show progress bars
from tqdm.auto import tqdm as notebook_tqdm

# To profile how long and how often each part of the code is executed
import cProfile
# Regular expressions
import re

# For bleu score calculation
from nltk.translate.bleu_score import sentence_bleu
from nltk.tokenize import word_tokenize

In [2]:
# Load the PDF files that contain the code (all PDFs in the HW12/ directory)
loader = PyPDFDirectoryLoader("HW12/")
data = loader.load()

In [3]:
print(data)



In [4]:
# Split the extracted data into text chunks
# chunk_overlap refers to how many characters from the end of one
# chunk of text are repeated at the beginning of the next chunk

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=15)

text_chunks = text_splitter.split_documents(data)
#text_chunks = text_splitter.split_text(data)
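
To make the overlap idea concrete, here is a minimal sketch using fixed-size character windows. It is a simplification, not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as newlines); the helper `chunk_with_overlap` and the toy string are hypothetical.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into character windows; consecutive windows share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

With `chunk_size=2000` and `chunk_overlap=15`, as in the cell above, only a short tail of each chunk is repeated, so a statement that straddles a chunk boundary may still be cut in two.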

In [5]:
len(text_chunks)

16

In [6]:
# Show the second chunk (index 1)
text_chunks[1]

Document(page_content='Ala_Idris\nAfter downloading the dataset ther e are 500 images within dataset.5/1/24, 9:36 PM Data 255 - Homework 3.ipynb - Colab\nhttps://colab.research.google.com/drive/1OvU2efzoIUoCPVijwbEyyg4ZGvoAcT-g#printMode=true 1/11', metadata={'source': 'HW12/Data 255 - Homework 3.pdf', 'page': 0})

In [7]:
# Download the embedding model
# Alternatives: allenai/specter is designed for scientific papers and might be more
# effective if your documents are academic or highly technical;
# sentence-transformers/paraphrase-MiniLM-L6-v2 is another option.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

In [8]:
# Create an embedding for each text chunk and index them in a FAISS vector store
vector_store = FAISS.from_documents(text_chunks, embedding=embeddings)
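
The retrieval step that the vector store enables can be sketched as a brute-force nearest-neighbor search. This is an illustration only: the three-dimensional vectors are made up, cosine similarity is used for simplicity, and a real FAISS index may use a different metric (for example L2 distance) and far more efficient search.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k vectors most similar to query_vec, best first."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy embeddings (hypothetical); real sentence embeddings have hundreds of dimensions.
toy_vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_query = [1.0, 0.05, 0.0]
nearest = top_k(toy_query, toy_vectors, k=2)
print(nearest)
```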

In [9]:
# Takes about 3 minutes to complete
# Download the model so it can run locally
# The model is cached in the .cache/huggingface/hub folder under the home directory

model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.1-GGUF"
model_basename = "mistral-7b-instruct-v0.1.Q5_K_M.gguf"
model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)

# Initialize an instance of the LlamaCpp class, LangChain's wrapper
# around the llama.cpp C++ inference library
llm = LlamaCpp(
    # stream output tokens as they are generated instead of waiting for the full response
    streaming=True,
    # location of the model file that was just downloaded
    model_path=model_path,
    # sampling temperature: controls randomness. Values closer to 0 make the output
    # more deterministic; higher values increase diversity but can reduce coherence
    temperature=0.25,
    # nucleus sampling: 1 means the entire set of possible next tokens is considered at
    # each generation step; lower values make the output more focused and less random
    top_p=1,
    # print additional messages such as debugging output and logs
    verbose=True,
    # context window size in tokens; a larger value allows lengthier inputs
    n_ctx=4096
)
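
The effect of `temperature` and `top_p` can be illustrated on a toy distribution. The sketch below is a simplified model of temperature scaling and nucleus (top-p) filtering, not llama.cpp's actual sampler; the logits and helper names are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus(probs, top_p):
    """Indices of the smallest set of tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

toy_logits = [2.0, 1.0, 0.5, 0.1]
sharp = softmax_with_temperature(toy_logits, 0.25)  # low temperature: mass concentrates on the top token
flat = softmax_with_temperature(toy_logits, 1.0)    # temperature 1: the distribution is left unscaled
print(nucleus(flat, 1.0), nucleus(flat, 0.9), nucleus(sharp, 0.9))
```

With `top_p=1`, as configured above, no tokens are filtered out, so `temperature=0.25` is the main setting that limits randomness in this notebook.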

llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from /Users/vanikancherlapalli/.cache/huggingface/hub/models--TheBloke--Mistral-7B-Instruct-v0.1-GGUF/snapshots/731a9fc8f06f5f5e2db8a0cf9d256197eb6e05d1/mistral-7b-instruct-v0.1.Q5_K_M.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.1
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llam

In [10]:
# Initialize the LangChain QA chain
# RetrievalQA - question answering against an index; llm is the base model
# chain_type - the type of chain to use. The "stuff" documents chain ("stuff" as in "to stuff" or "to fill")
# takes a list of documents, inserts them all into a prompt, and passes that prompt to the LLM.
# k=4 means the four closest vectors are returned; their chunks serve as the most
# relevant context for answering the query.
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=vector_store.as_retriever(search_kwargs={"k": 4}))
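
Conceptually, the "stuff" strategy concatenates the retrieved chunks into one prompt. The sketch below shows the idea with a hypothetical template and a hypothetical helper `build_stuff_prompt`; LangChain's real prompt wording differs, and the second sample chunk is invented for illustration.

```python
def build_stuff_prompt(question, documents):
    """Insert every retrieved document into a single prompt string."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

retrieved = [
    "After downloading the dataset there are 500 images within dataset.",
    "The model is trained for several epochs.",  # invented chunk, for illustration only
]
prompt = build_stuff_prompt("How many images are in the program?", retrieved)
print(prompt)
```

Because every retrieved chunk goes into the same prompt, `k` multiplied by the chunk size has to fit inside the model's context window (`n_ctx`), together with the question and the generated answer.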

In [11]:
def process_query(query, chat_history):
    """
    Processes a single query using a QA system and updates the chat history.
    
    Parameters:
    - query (str): The query string to be processed by the QA system.
    - chat_history (list of dict): The chat history that maintains context across queries.
    
    Returns:
    - str: The answer corresponding to the query.
    """
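    # Prepare the input as a single dictionary.
    # Note: RetrievalQA is expected to read only the 'query' key, so 'chat_history'
    # is recorded here but likely does not influence the generated answer.
    input_data = {
        'query': query,
        'chat_history': chat_history
    }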

    # Update chat history with the current query
    chat_history.append({'role': 'user', 'content': query})
    
    # 'invoke' method that can accept structured input
    answer = qa.invoke(input_data)  # Pass a single dictionary as an argument
    
    # Update chat history with the provided answer
    chat_history.append({'role': 'assistant', 'content': answer['result']})
    
    # Return only the result part of the answer
    return answer['result']

In [12]:
# Initialize chat history
chat_history = []

In [13]:
## Answers to the queries

In [14]:
# Define the query (directly as a string)
query1 = "Summarize what the program is doing"

# Process the query and print the results
answer = process_query(query1, chat_history)
print(f"Query: {query1}\nAnswer: {answer}\n")


llama_print_timings:        load time =    7619.93 ms
llama_print_timings:      sample time =      12.85 ms /   106 runs   (    0.12 ms per token,  8249.03 tokens per second)
llama_print_timings: prompt eval time =  143342.05 ms /  1643 tokens (   87.24 ms per token,    11.46 tokens per second)
llama_print_timings:        eval time =    9792.93 ms /   105 runs   (   93.27 ms per token,    10.72 tokens per second)
llama_print_timings:       total time =  153626.02 ms /  1748 tokens


Query: Summarize what the program is doing
Answer:  The program is loading and preparing a dataset for training a machine learning model using TensorFlow. It first loads the data from disk and counts the number of images in the dataset. It then manually iterates over the dataset to view the shape of the image and label tensors. Next, it configures the dataset for performance by enabling buffered prefetching and caching the training data in memory. Finally, it trains the model using the configured dataset and visualizes the accuracy and loss on the training and validation sets.
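
The `llama_print_timings` figures logged for this query can be checked with simple arithmetic. The sketch below recomputes the prompt-evaluation throughput from the logged total (143342.05 ms for 1643 tokens); the helper name `throughput` is hypothetical.

```python
def throughput(total_ms, n_tokens):
    """Return (milliseconds per token, tokens per second) for a timed run."""
    ms_per_token = total_ms / n_tokens
    tokens_per_second = n_tokens / (total_ms / 1000.0)
    return ms_per_token, tokens_per_second

# Values taken from the "prompt eval time" line of the log for this query.
ms_per_token, tokens_per_second = throughput(143342.05, 1643)
print(f"{ms_per_token:.2f} ms per token, {tokens_per_second:.2f} tokens per second")
```

Prompt evaluation accounts for most of the roughly 154-second total, which suggests that the number of retrieved chunks and the chunk size drive the response time more than the length of the answer does.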



In [15]:
# Define the query (directly as a string)
query2 = "How many images are in the program?"

# Process the query and print the results
answer = process_query(query2, chat_history)
print(f"Query: {query2}\nAnswer: {answer}\n")

Llama.generate: prefix-match hit

llama_print_timings:        load time =    7619.93 ms
llama_print_timings:      sample time =       1.32 ms /    12 runs   (    0.11 ms per token,  9070.29 tokens per second)
llama_print_timings: prompt eval time =  127292.27 ms /  1511 tokens (   84.24 ms per token,    11.87 tokens per second)
llama_print_timings:        eval time =    1010.09 ms /    11 runs   (   91.83 ms per token,    10.89 tokens per second)
llama_print_timings:       total time =  128592.49 ms /  1522 tokens


Query: How many images are in the program?
Answer:  There are 500 images in the dataset.



In [16]:
# Define the query (directly as a string)
query3 = "Are there any visualizations in the program?"

# Process the query and print the results
answer = process_query(query3, chat_history)
print(f"Query: {query3}\nAnswer: {answer}\n")

Llama.generate: prefix-match hit

llama_print_timings:        load time =    7619.93 ms
llama_print_timings:      sample time =       4.44 ms /    37 runs   (    0.12 ms per token,  8342.73 tokens per second)
llama_print_timings: prompt eval time =   53117.18 ms /   648 tokens (   81.97 ms per token,    12.20 tokens per second)
llama_print_timings:        eval time =    3290.55 ms /    37 runs   (   88.93 ms per token,    11.24 tokens per second)
llama_print_timings:       total time =   56605.93 ms /   685 tokens


Query: Are there any visualizations in the program?
Answer:  Yes, there are two visualizations in the program. The first one shows the training and validation loss over time, while the second one shows the training and validation accuracy over time.



In [17]:
# Define the query (directly as a string)
query4 = "How many lines of code in the program?"

# Process the query and print the results
answer = process_query(query4, chat_history)
print(f"Query: {query4}\nAnswer: {answer}\n")

Llama.generate: prefix-match hit

llama_print_timings:        load time =    7619.93 ms
llama_print_timings:      sample time =       1.99 ms /    17 runs   (    0.12 ms per token,  8542.71 tokens per second)
llama_print_timings: prompt eval time =   98919.77 ms /  1170 tokens (   84.55 ms per token,    11.83 tokens per second)
llama_print_timings:        eval time =    1494.51 ms /    16 runs   (   93.41 ms per token,    10.71 tokens per second)
llama_print_timings:       total time =  100663.26 ms /  1186 tokens


Query: How many lines of code in the program?
Answer:  The number of lines of code in a program is not relevant to the question.



In [18]:
# Define the query (directly as a string)
query5 = "any suggestions to make the program better?"

# Process the query and print the results
answer = process_query(query5, chat_history)
print(f"Query: {query5}\nAnswer: {answer}\n")

Llama.generate: prefix-match hit

llama_print_timings:        load time =    7619.93 ms
llama_print_timings:      sample time =      13.50 ms /   119 runs   (    0.11 ms per token,  8815.47 tokens per second)
llama_print_timings: prompt eval time =  203872.39 ms /  2390 tokens (   85.30 ms per token,    11.72 tokens per second)
llama_print_timings:        eval time =   11819.96 ms /   118 runs   (  100.17 ms per token,     9.98 tokens per second)
llama_print_timings:       total time =  216370.20 ms /  2508 tokens


Query: any suggestions to make the program better?
Answer: 

To improve the performance of the model, you can try using data augmentation and dropout techniques. Data augmentation can generate additional data from the training samples by applying random transformations such as rotation, flipping, and zooming. Dropout can randomly drop out some neurons during training to prevent overfitting. You can also try increasing the number of epochs or changing the learning rate to see if that improves the performance. Additionally, you may want to consider using a different model architecture or hyperparameters such as batch size, optimizer, and regularization techniques.

