# **Foundational NLP**

## **Pre-training and Key Concepts**

# **Table of Contents**

1.   [Introduction](#Introduction)
2.   [Prerequisites](#Prerequisites)
3.   [Step-by-Step-Guide](#Step-by-Step-Guide)
4.   [Code Examples](#Code-Examples)
5.   [Troubleshooting](#Troubleshooting)
6.   [Conclusion](#Conclusion)
7.   [References](#References)

## PRE-TRAINING

### DEFINITION
Pre-training is the initial phase of training a machine learning model, such as a neural network, on a large, diverse,
general-purpose dataset before fine-tuning it for a specific task. This approach helps the model learn broad patterns
that can later be adapted, saving time and resources.


### Process of Pre-Training


The pre-training process can be broken down into several key steps, as outlined in recent literature:

1. Training on a Large Dataset: The model is trained on a vast, diverse dataset, such as internet text for LLMs or large image datasets for computer vision tasks. For example, datasets like FineWeb, with 15 trillion tokens, are used for LLMs so that the model learns general language patterns.
  > - Data Sources: Common sources include web crawls such as CommonCrawl, which has archived billions of web pages since 2007; its April 2024 crawl, for example, contains 2.7 billion pages and 386 TiB of uncompressed HTML.
  
  > - Data Processing: This involves filtering (e.g., URL blocklists for adult content and spam), text extraction (removing HTML and JavaScript), language filtering (e.g., keeping text that a fastText classifier scores as English with confidence ≥ 0.65), and deduplication (e.g., MinHash), resulting in high-quality, diverse data.


2. Tokenization: For LLMs, text is converted into tokens, the discrete units processed by the neural network. Byte Pair Encoding (BPE) is commonly used: it shortens sequences by merging frequent symbol pairs into new tokens, at the cost of a larger vocabulary (e.g., GPT-4's tokenizer has 100,277 tokens). Tools like Tiktokenizer make it easy to inspect how text is split into tokens.



3. Neural Network Training: The model, typically Transformer-based (e.g., a nano-GPT with about 85,000 parameters), is trained to predict the next token or perform a general task, using backpropagation and optimization techniques like gradient descent. Training involves computing a loss (e.g., cross-entropy) and iteratively refining the parameters.


4. Saving the Model: Once trained, the model's parameters (weights) are saved, creating a base model that can be used for subsequent tasks.


5. Fine-Tuning for Specific Tasks: For a new task, the pre-trained model's parameters initialize a new model, which is then trained on the specific task's dataset.  
This fine-tuning adjusts the model to the new task, often requiring less data and computational resources than training from scratch.
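
To make the tokenization step more concrete, here is a minimal sketch of a single BPE merge on a toy string. The helper names `most_frequent_pair` and `merge_pair` are illustrative, not taken from any library, and the sketch starts from characters for readability, whereas production tokenizers usually start from bytes and learn many thousands of merges.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Return the most common adjacent token pair in the sequence."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]

def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with a single merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Toy corpus split into characters (an assumption made for readability).
tokens = list("low lower lowest")
pair = most_frequent_pair(tokens)
tokens_after = merge_pair(tokens, pair)
print(pair, len(tokens), "->", len(tokens_after))
```

Each merge adds one entry to the vocabulary and shortens the sequence, which is the trade-off described in step 2.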

### Theoretical Underpinnings of Pre-training

Recent research, such as a 2023 Nature Machine Intelligence article on Structure-Inducing Pre-Training (SIPT), provides insight into why pre-training improves downstream task performance.
It suggests that pre-training induces a structured latent space within the model that captures task-relevant relationships. The SIPT
framework explicitly imposes deep, structural constraints during pre-training, shaping the geometry of the latent space. The framework comes with theoretical guarantees of
improved fine-tuning performance, and empirical results show SIPT outperforming baselines in 10 of 15 cases across modalities such as proteins,
abstracts, and networks, with gains such as a 17% improvement on ACL-ARC and 6% on SciERC relation extraction.

## **Introduction**
The principle of distributional semantics is encapsulated in J.R. Firth’s famous quote:

> “You shall know a word by the company it keeps.”

The quote highlights the importance of contextual information in determining word meaning.
This principle is a cornerstone in the development of word embeddings.

Word embeddings, also known as word vectors, provide a dense, continuous, and compact representation of words,
encapsulating their semantic and syntactic attributes.
They are essentially real-valued vectors, and the proximity of these vectors in a multidimensional
space is indicative of the linguistic relationships between words.

The term “embedding” in this context refers to the transformation of discrete words into
continuous vectors, achieved through word embedding algorithms. These algorithms are designed to convert
words into vectors that encapsulate a significant portion of their semantic content.
An example of the effectiveness of these embeddings is vector arithmetic that yields meaningful analogies, such as `"uncle" - "man" + "woman" ≈ "aunt"`.





## **Prerequisites**

- Programming fundamentals (Python is the standard language for NLP)

- Basic probability and statistics as well as linear algebra concepts

- Machine learning concepts

- Text preprocessing techniques

- Linguistic Terminology

<a id='guide'></a>
## **Step-by-Step Guide**

## Word Embedding Techniques

- Count-Based Techniques (TF-IDF and BM25)  
- Co-occurrence Based/Static Embedding Techniques  
- Contextualized/Dynamic Representation Techniques (BERT, ELMo) 


### Bag of Words (BoW) 

Tokenization:
 - Split the text into words (tokens).  

Vocabulary Building:
 - Create a vocabulary list of all unique words in the corpus.

Vector Representation:
   - For each document, create a vector where each element corresponds to a word in the vocabulary. 
     The value is the count of occurrences of that word in the document.



**Example** 

Consider a corpus with the following two documents:
1. “The cat sat on the mat.”
2. “The dog sat on the log.”

Steps:

1. Tokenization:
   - Document 1: ["the", "cat", "sat", "on", "the", "mat"]
   - Document 2: ["the", "dog", "sat", "on", "the", "log"]


2. Vocabulary Building:
    - Vocabulary: ["the", "cat", "sat", "on", "mat", "dog", "log"]


3. Vector Representation:
   - Document 1: [2, 1, 1, 1, 1, 0, 0]
   - Document 2: [2, 0, 1, 1, 0, 1, 1]

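
The worked example above can be reproduced with a few lines of plain Python. This is a minimal sketch: the regex-based `tokenize` helper is an assumption made for illustration (it lowercases and drops punctuation), and the vocabulary is built in order of first appearance to match the ordering used in the example.

```python
from collections import Counter
import re

docs = ["The cat sat on the mat.", "The dog sat on the log."]

def tokenize(text):
    """Lowercase the text and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

tokenized = [tokenize(d) for d in docs]

# Build the vocabulary in order of first appearance.
vocab = []
for tokens in tokenized:
    for tok in tokens:
        if tok not in vocab:
            vocab.append(tok)

# Count how often each vocabulary word occurs in each document.
bow = [[Counter(tokens)[word] for word in vocab] for tokens in tokenized]

print(vocab)
for vec in bow:
    print(vec)
```

Note that the two vectors differ only in the positions for "cat"/"mat" and "dog"/"log"; word order within each sentence is lost, which is the main limitation of BoW.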



###  Term Frequency-Inverse Document Frequency (TF-IDF)  

Term Frequency-Inverse Document Frequency (TF-IDF) is a statistical measure used to evaluate the importance of a 
word to a document in a collection or corpus. 
It is a fundamental technique in text processing that ranks the 
relevance of documents to a specific query, commonly applied in tasks such as document classification, search engine ranking, 
information retrieval, and text mining.

# Exercise


## Term Frequency (TF)

- Term Frequency measures how frequently a term occurs in a document.
Since documents differ in length, a term may appear many more times in long documents than in shorter ones.
Thus, the term frequency is often divided by the document length (the total number of terms in the document) as a way of normalization:

$$
\mathrm{TF}(t) = \frac{\text{Number of times term } t \text{ appears in a document}}{\text{Total number of terms in the document}}
$$



## Inverse Document Frequency (IDF)

- Inverse Document Frequency measures how important a term is. 
 While computing TF, all terms are considered equally important.   
 However, certain terms, like “is”, “of”, and “that”, may appear a lot of times but have little importance.   
 Thus, we need to weigh down the frequent terms while scaling up the rare ones, by computing the following:

$$
\mathrm{IDF}(t) = \log\left(\frac{\text{Total number of documents}}{\text{Number of documents with term } t \text{ in it}}\right)
$$




## Example

Steps to Calculate TF-IDF

Step 1: TF (Term Frequency): Number of times a word appears in a document divided by the total number of words in that document.  
Step 2: IDF (Inverse Document Frequency): Calculated as log(N / df), where:
- N is the total number of documents in the collection.
- df is the number of documents containing the word.

Step 3: TF-IDF: The product of TF and IDF.

Document Collection
- Doc 1: “The sky is blue.”
- Doc 2: “The sun is bright.”
- Total documents (N): 2
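
The three steps can be applied to this two-document collection with a short script. This is a sketch under two assumptions that the text above does not fix: the natural logarithm is used for IDF, and tokens are lowercased with punctuation removed. The names `tf`, `idf`, and `tfidf` are illustrative.

```python
import math
import re

docs = {"Doc 1": "The sky is blue.", "Doc 2": "The sun is bright."}

# Lowercase and strip punctuation (an assumed preprocessing choice).
tokenized = {name: re.findall(r"[a-z]+", text.lower()) for name, text in docs.items()}
n_docs = len(tokenized)

def tf(term, tokens):
    """Step 1: term count divided by document length."""
    return tokens.count(term) / len(tokens)

def idf(term):
    """Step 2: log(N / df), using the natural log (assumption)."""
    df = sum(term in tokens for tokens in tokenized.values())
    return math.log(n_docs / df)

# Step 3: TF-IDF is the product of TF and IDF.
tfidf = {
    name: {term: tf(term, tokens) * idf(term) for term in sorted(set(tokens))}
    for name, tokens in tokenized.items()
}

for name, scores in tfidf.items():
    print(name, {term: round(score, 3) for term, score in scores.items()})
```

Words that occur in both documents ("the", "is") get an IDF of log(2/2) = 0 and therefore a TF-IDF of 0, while words unique to one document ("sky", "blue", "sun", "bright") receive a positive weight.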










  

## WORD2VEC



### Motivation Behind Word2Vec: the Need for Context-based Semantic Understanding


- TF-IDF and BM25 are methods used in information retrieval to rank documents based on their relevance to a query.  
 While they provide useful measures for text analysis, they do not offer context-based “semantic” embeddings   
 (in the same way that Word2Vec or BERT embeddings do). Here’s why:

- TF-IDF: This method calculates a weight for each word in a document, which increases with the number of times  
the word appears in the document but decreases with the frequency of the word across all documents. TF-IDF is  
good at identifying important words in a document but doesn’t capture the meaning of the words or their  
relationships with each other.  
It’s more about word importance than word meaning.

- In contrast, semantic embeddings (like those from Word2Vec, BERT, etc.) are designed to capture the  
 meanings of words and their relationships to each other. These embeddings represent words as vectors in a  
  way that words with similar meanings are located close to each other in the vector space, enabling the capture   
  of semantic relationships and nuances in language.  


- Therefore, while TF-IDF and BM25 are valuable tools for information retrieval and determining document  
relevance, they do not provide semantic embeddings of words or phrases. They focus on word occurrence and  
frequency rather than on capturing the underlying meanings and relationships of words.

### CORE IDEA

- Word2Vec employs a shallow neural network, trained on a large text corpus, to predict the context surrounding a given word.  
The essence of Word2Vec lies in its ability to convert words into dense, real-valued vectors (typically a few hundred dimensions). This representation allows the  
algorithm to capture meaning, semantic similarity, and relationships with surrounding text. A notable feature of Word2Vec  
is that arithmetic on these vectors reveals linguistic patterns, such as the famous analogy king - man + woman ≈ queen.
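
The analogy arithmetic can be illustrated with a toy example. The three-dimensional vectors below are hand-made for illustration only and are NOT trained Word2Vec embeddings; the `analogy` helper is likewise hypothetical. Following common practice, the three input words are excluded from the candidate answers.

```python
import math

# Hand-made toy vectors (illustrative only; not trained embeddings).
vectors = {
    "king":  (0.9, 0.8, 0.1),
    "queen": (0.9, 0.1, 0.8),
    "man":   (0.1, 0.9, 0.1),
    "woman": (0.1, 0.1, 0.9),
    "apple": (0.0, 0.3, 0.3),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = tuple(x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman"))
```

With real embeddings the result is only approximately equal to the expected word, which is why the nearest neighbour by cosine similarity is used rather than an exact match.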

### Word2Vec Architectures

- Word2Vec offers two distinct architectures for training:

  - Continuous Bag-of-Words (CBOW):
    This model predicts a target word based on its context words.
    CBOW computes the conditional probability of a target word given the context words within a window of size k around it.
    The input is the sum (or average) of the word vectors of the surrounding context words, and the output is the current word.

  - Skip-gram:
    This model does the reverse: given the current word, it predicts the context words that surround it within the window.
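
To show what CBOW actually trains on, the sketch below generates (context, target) pairs from a tokenized sentence. It assumes the window size k means k words on each side of the target, which the text above leaves open; `cbow_pairs` is an illustrative helper, not a library function.

```python
def cbow_pairs(tokens, k=2):
    """Yield (context_words, target_word) pairs using k words on each side."""
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - k):i] + tokens[i + 1:i + 1 + k]
        if context:
            yield context, target

sentence = "the cat sat on the mat".split()
pairs = list(cbow_pairs(sentence, k=1))
for context, target in pairs:
    print(context, "->", target)
```

In a full model, the vectors of the context words would be summed or averaged and passed through a softmax layer to predict the target word.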

## **CODE EXAMPLE**
## **RAG APPLICATION**

In many real-world scenarios, organizations maintain extensive collections of proprietary documents, such as technical manuals, from which precise information must be extracted. The task is often like finding a needle in a haystack, given the sheer volume and complexity of the content.

Long-context models, such as OpenAI’s GPT-4 Turbo, are better at processing lengthy documents, but they are not without limitations. Notably, they exhibit the “Lost in the Middle” phenomenon, in which information positioned near the center of the context window is more likely to be overlooked. It is akin to reading a long text such as the Bible and then struggling to recall specific content from its middle chapters.

Retrieval-Augmented Generation (RAG) was introduced to address this shortcoming. Documents are split into discrete units, typically paragraphs, and each unit is indexed. When a query arrives, the system retrieves the most relevant segments and supplies only those to the language model. Narrowing the input to the most pertinent information keeps the context short and substantially improves the relevance and accuracy of the responses.
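
Before turning to the LangChain pipeline, the retrieve-then-prompt idea can be sketched without any external dependencies. This is a simplified illustration, not the method used in the cells that follow: the `embed` function is a toy bag-of-words stand-in for a learned embedding model, and `retrieve` is a hypothetical helper that scans every segment instead of using a vector index.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, segments, k=1):
    """Return the k segments most similar to the query."""
    q = embed(query)
    return sorted(segments, key=lambda s: cosine(q, embed(s)), reverse=True)[:k]

segments = [
    "Should time travelers be allowed to invest in the stock market? "
    "No, it creates temporal imbalance and unfair economic advantage.",
    "Can altering a minor historical event avoid paradoxes? "
    "Even small changes can ripple unpredictably, risking causality collapse.",
    "Why do people procrastinate despite knowing the consequences? "
    "It's often due to fear of failure or task aversion.",
]

question = "Should time travelers invest in the stock market?"
top = retrieve(question, segments, k=1)

# Only the retrieved segment is placed in the prompt sent to the language model.
prompt = f"Context: {top[0]}\nQuestion: {question}\nAnswer:"
print(prompt)
```

A production system replaces `embed` with a neural embedding model and the linear scan with a vector store such as FAISS, as the following cells do.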

<img src="/Users/nanakwame/Downloads/indaba/IndabaX251/Foundational NLP/rag.png" width="400" alt="Rag Pipe">

In [2]:
import os
import pandas as pd
from typing import Iterator, AsyncIterator, List
from langchain.schema import Document
from langchain.document_loaders.base import BaseLoader

class KNUSTCsvDataLoader(BaseLoader):
    """A document loader that loads CSV documents."""

    def __init__(self, directory: str, encoding: str = 'latin1') -> None:
        """Initialize the loader with a directory.

        Args:
            directory: The path to the directory containing CSV files.
            encoding: The encoding to use for reading CSV files (default: 'latin1').
        """
        self.directory = directory
        self.encoding = encoding

    def load(self) -> List[Document]:
        return list(self.lazy_load())

    def lazy_load(self) -> Iterator[Document]:
        """A lazy loader that reads CSV files row by row."""
        for filename in os.listdir(self.directory):
            if filename.endswith('.csv'):
                file_path = os.path.join(self.directory, filename)
                try:
                    # Load CSV file with specified encoding
                    df = pd.read_csv(file_path, encoding=self.encoding)

                    # Validate required columns
                    required_columns = {"Subject", "Question", "Response"}
                    if not required_columns.issubset(df.columns):
                        raise ValueError(f"Missing required columns in {file_path}")

                    # Iterate over rows in chunks
                    for chunk in pd.read_csv(file_path, chunksize=1000, encoding=self.encoding):
                        for row in chunk.itertuples():
                            yield Document(
                                page_content=f"Subject: {row.Subject}\nQuestion: {row.Question}\nResponse: {row.Response}",
                                metadata={"subject": row.Subject}
                            )
                except UnicodeDecodeError as e:
                    print(f"Encoding error in {file_path}: {e}")
                    continue
                except Exception as e:
                    print(f"Error processing {file_path}: {e}")
                    continue

    async def alazy_load(self) -> AsyncIterator[Document]:
        """An async lazy loader that reads CSV files row by row."""
        for filename in os.listdir(self.directory):
            if filename.endswith('.csv'):
                file_path = os.path.join(self.directory, filename)
                try:
                    # Read CSV file synchronously with specified encoding
                    df = pd.read_csv(file_path, encoding=self.encoding)

                    # Validate required columns
                    required_columns = {"Subject", "Question", "Response"}
                    if not required_columns.issubset(df.columns):
                        raise ValueError(f"Missing required columns in {file_path}")

                    # Yield documents asynchronously
                    for chunk in pd.read_csv(file_path, chunksize=1000, encoding=self.encoding):
                        for row in chunk.itertuples():
                            yield Document(
                                page_content=f"Subject: {row.Subject}\nQuestion: {row.Question}\nResponse: {row.Response}",
                                metadata={"subject": row.Subject}
                            )
                except UnicodeDecodeError as e:
                    print(f"Encoding error in {file_path}: {e}")
                    continue
                except Exception as e:
                    print(f"Error processing {file_path}: {e}")
                    continue

# Usage
directory = "/Users/nanakwame/Downloads/indaba/IndabaX251/Foundational NLP/data"
loader = KNUSTCsvDataLoader(directory, encoding='latin1')
documents = list(loader.lazy_load())
print(f"Loaded {len(documents)} documents")

Loaded 6 documents


In [3]:
documents

[Document(metadata={'subject': 'Time Travel Ethics'}, page_content='Subject: Time Travel Ethics\nQuestion: Should time travelers be allowed to invest in the stock market?\nResponse: No, it creates temporal imbalance and unfair economic advantage, violating ChronoCode 42B.'),
 Document(metadata={'subject': 'Time Travel Ethics'}, page_content='Subject: Time Travel Ethics\nQuestion: Can altering a minor historical event avoid paradoxes?\nResponse: Even small changes can ripple unpredictably, risking causality collapse.'),
 Document(metadata={'subject': 'Human Psychology'}, page_content='Subject: Human Psychology\nQuestion: Why do people procrastinate despite knowing the consequences?\nResponse: ItÕs often due to fear of failure, task aversion, or dopamine-driven preference for instant rewards.'),
 Document(metadata={'subject': 'Human Psychology'}, page_content='Subject: Human Psychology\nQuestion: How does social media affect self-esteem?\nResponse: Frequent use can lead to social compari

## SPLITTING DOCUMENT

In [9]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Step 2: Split documents into small overlapping chunks
# (optional here, since each document is already short)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=50,      # maximum number of characters per chunk
    chunk_overlap=10,   # characters shared between consecutive chunks
    length_function=len,
)
split_docs = text_splitter.split_documents(documents)
print(f"Split into {len(split_docs)} chunks")

Split into 35 chunks


In [10]:
split_docs

[Document(metadata={'subject': 'Time Travel Ethics'}, page_content='Subject: Time Travel Ethics'),
 Document(metadata={'subject': 'Time Travel Ethics'}, page_content='Question: Should time travelers be allowed to'),
 Document(metadata={'subject': 'Time Travel Ethics'}, page_content='to invest in the stock market?'),
 Document(metadata={'subject': 'Time Travel Ethics'}, page_content='Response: No, it creates temporal imbalance and'),
 Document(metadata={'subject': 'Time Travel Ethics'}, page_content='and unfair economic advantage, violating'),
 Document(metadata={'subject': 'Time Travel Ethics'}, page_content='violating ChronoCode 42B.'),
 Document(metadata={'subject': 'Time Travel Ethics'}, page_content='Subject: Time Travel Ethics'),
 Document(metadata={'subject': 'Time Travel Ethics'}, page_content='Question: Can altering a minor historical event'),
 Document(metadata={'subject': 'Time Travel Ethics'}, page_content='event avoid paradoxes?'),
 Document(metadata={'subject': 'Time Trave

 ## CREATE EMBEDDINGS AND VECTOR STORE

In [12]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from dotenv import load_dotenv
load_dotenv()

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vector_store_file = FAISS.from_documents(split_docs, embeddings)
print("Vector store created")

Vector store created


In [14]:
embeddings 

OpenAIEmbeddings(client=<openai.resources.embeddings.Embeddings object at 0x121475550>, async_client=<openai.resources.embeddings.AsyncEmbeddings object at 0x121475e80>, model='text-embedding-3-large', dimensions=None, deployment='text-embedding-ada-002', openai_api_version=None, openai_api_base=None, openai_api_type=None, openai_proxy=None, embedding_ctx_length=8191, openai_api_key=SecretStr('**********'), openai_organization=None, allowed_special=None, disallowed_special=None, chunk_size=1000, max_retries=2, request_timeout=None, headers=None, tiktoken_enabled=True, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={}, skip_empty=False, default_headers=None, default_query=None, retry_min_seconds=4, retry_max_seconds=20, http_client=None, http_async_client=None, check_embedding_ctx_length=True)

In [13]:
vector_store_file

<langchain_community.vectorstores.faiss.FAISS at 0x121671400>

## SETUP RETRIEVER AND LANGUAGE MODEL



In [15]:
from langchain_openai import ChatOpenAI

retriever = vector_store_file.as_retriever(search_kwargs={"k": 3})
llm = ChatOpenAI(
        model="gpt-4o-mini", 
        temperature=0.7,
    )

## SETUP PROMPT TEMPLATE 

In [None]:
from langchain.prompts import PromptTemplate
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

prompt_template = """Use the following pieces of context to answer the question. If you don't know the answer, say so.
    Context: {context}
    Question: {input}
    Answer: """
prompt = PromptTemplate(template=prompt_template, input_variables=["context", "input"])




##  CREATE DOCUMENT CHAIN AND RAG CHAIN
document_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, document_chain)

In [18]:
def query_rag(rag_chain, question: str):
    """Query the RAG pipeline and return the answer with sources."""
    result = rag_chain.invoke({"input": question})
    answer = result["answer"]
    sources = result["context"]
    return answer, sources



In [19]:
# Example queries
questions = [
        "WHO IS THE PRESIDENT OF GHANA",
    ]

for question in questions:
        print(f"\nQuestion: {question}")
        answer, sources = query_rag(rag_chain, question)
        print(f"Answer: {answer}")
        print("Sources:")
        for i, doc in enumerate(sources, 1):
            print(f"{i}. {doc.page_content[:100]}...")


Question: WHO IS THE PRESIDENT OF GHANA
Answer: I don't know the answer.
Sources:
1. Subject: Human Psychology...
2. Subject: Human Psychology...
3. Subject: Programming...


In [20]:
questions = [
        "SHOULD TIME TRAVELERS BE ALLOWED TO INVEST IN STOCK MARKET",
    ]

for question in questions:
        print(f"\nQuestion: {question}")
        answer, sources = query_rag(rag_chain, question)
        print(f"Answer: {answer}")
        print("Sources:")
        for i, doc in enumerate(sources, 1):
            print(f"{i}. {doc.page_content[:100]}...")


Question: SHOULD TIME TRAVELERS BE ALLOWED TO INVEST IN STOCK MARKET
Answer: The question of whether time travelers should be allowed to invest in the stock market raises significant ethical and practical considerations. On one hand, allowing time travelers to invest could lead to market manipulation, as they would have access to future information that could give them an unfair advantage over regular investors. This could destabilize financial markets and undermine the principles of fairness and equality in investing.

On the other hand, proponents might argue that time travelers, like any other individuals, should have the right to participate in the market. However, the potential consequences of their investments could have far-reaching effects on the economy and society.

Ultimately, the decision would depend on the ethical frameworks and regulations established in a society that accommodates time travel. It may be prudent to implement strict guidelines or prohibitions to prevent 

## **CODE EXAMPLE**
## **NAMED-ENTITY RECOGNITION**

Named-entity recognition (NER) is a subtask of natural language processing (NLP) that focuses on identifying and categorizing entities within a text or document.  
Entities represent specific objects or names such as persons, organizations, dates and times, countries, drugs, and other unique pieces of information within a document.

## Types-of-NER-models


NER models generally fall into three categories:

- Rule based NER models
- Machine learning (ML) models
- Deep learning models

Rule-based-model
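A rule-based NER model identifies entities by matching text against hand-written rules, such as token patterns, regular expressions, and dictionaries of known names, instead of learning from annotated examples.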

Examples of rule based approaches

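a. spaCy `EntityRuler`

b. NLTK (note that `ne_chunk`, used below, relies on a pre-trained maximum-entropy chunker rather than hand-written rules)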

In [None]:
import nltk
import svgling
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('maxent_ne_chunker_tab')
nltk.download('words')


sentence = "At eight o'clock on Thursday morning Arthur didn't feel very good. can arthur go to the Ghana"
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)


entities = nltk.chunk.ne_chunk(tagged)
entities

## Code-Examples-For-Spacy

In [None]:
%%capture
# installations (the %%capture cell magic must be the first line of the cell)
!pip install spacy
!pip install nltk
!python -m spacy download en_core_web_sm

import spacy
from spacy.pipeline import EntityRuler

In [None]:
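def spacy_rb_ner(patterns, text, model_name='en'):
  """Run rule-based NER over `text` with the given patterns and print each entity."""

  # create a blank pipeline for the given language code
  nlp = spacy.blank(model_name)

  # add an entity ruler component to handle NER
  ruler = nlp.add_pipe("entity_ruler")

  # register the token patterns with the ruler
  ruler.add_patterns(patterns)

  # run the pipeline and print each matched entity with its label
  doc = nlp(text)
  for ent in doc.ents:
      print(ent.text, ent.label_)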

In [None]:
patterns = [{"label": "AGE", "pattern": [{"like_num": True}, {"lower": "years"}, {"lower": "old"}]}]
text = "John is 25 years old"
spacy_rb_ner(patterns,text)

## Exercises 1  

Please list some advantages and disadvantages as you try out these rule-based named-entity recognition models.

Advantages 1. 2.

Disadvantages 1. 2.


In [None]:
# Easy
#customize your own pattern and provide your text for testing
pattern =[]
text = ""
spacy_rb_ner(pattern,text)

In [None]:
#Hard

#1. dataset extraction from Kaggle (via kagglehub)
import kagglehub

path = kagglehub.dataset_download("remakia/drugs-dictionary")
print("Path to dataset files:", path)

#2. load the json dictionary
import json
def read_json(json_file):

  return 0

#3. convert the drug dict into patterns
def generate_patern(drug_dict):

  return 0

json_file = "drug.json"
drug_dict = read_json(json_file)
pattern =generate_patern(drug_dict)
text = "Perfusion d'une ampoule de prexidine de lithium et introduction d'un antihistaminique par Cétirizine 10 mg x 2 par jour, avec diminution puis disparition de l'oedème."
text = text.lower()
spacy_rb_ner(pattern,text)

## Machine Learning-model
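Conditional Random Fields (CRF) - a discriminative sequence-labelling model that assigns a label to each token while taking the labels of neighbouring tokens into account.  
SVM - a Support Vector Machine classifier that typically labels each token from hand-crafted features such as capitalization, affixes, and surrounding words.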

Example-code
Below is an example using spaCy's pre-trained `en_core_web_sm` pipeline. Note that its NER component is a statistical, transition-based neural model rather than a CRF.

In [None]:
import pandas as pd
import spacy
import requests

nlp = spacy.load("en_core_web_sm")
pd.set_option("display.max_rows", 200)

content ="Esi is a 27-years-old individual who came back home from school. she is meant to go back to school soon to see her friends. Have you heard from Kwame because the last time i spoke to him, he said he was going to the Ghana, Kigali"

doc = nlp(content)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

## Exercise 2
We will train our own spaCy model.

In [None]:
def processed_data():
     return 0 

def normalization():
    return 0

def dataset_preprocessing(TRAIN_DATA,ner):

    for _, annotations in TRAIN_DATA:
        for ent in annotations.get('entities'):
                ner.add_label(ent[2])
    return ner

In [None]:
import random

import spacy
from spacy.training import Example
from tqdm import tqdm

n_iter = 30  # number of passes over the training data
nlp = spacy.blank("en")  # start from a blank English pipeline


# create the NER component and add it to the pipeline (spaCy v3 API)
ner = nlp.add_pipe("ner")

# data preprocessing: TRAIN_DATA is expected to be a list of
# (text, {"entities": [(start, end, label), ...]}) tuples
TRAIN_DATA = processed_data()
dataset_preprocessing(TRAIN_DATA, ner)

# training
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
with nlp.select_pipes(disable=other_pipes):  # only train NER
    optimizer = nlp.initialize()
    for itn in range(n_iter):
        random.shuffle(TRAIN_DATA)
        losses = {}
        for text, annotations in tqdm(TRAIN_DATA):
            # spaCy v3 expects Example objects instead of raw text/annotation lists
            example = Example.from_dict(nlp.make_doc(text), annotations)
            nlp.update([example], drop=0.5, sgd=optimizer, losses=losses)
        print(losses)

## **Conclusion and Comments From Participants**



# **Facilitator(s) Details**

**Facilitator(s):**

*   Name: FELIX TETTEH AKWERH
*   Email: felix.akwerh@knust.edu.gh



*   Name: ADWOA ASANTEWAA BREMANG
*   Email: adwoabremang@gmail.com

