<a href="https://colab.research.google.com/github/jeffheaton/app_deep_learning/blob/main/assignments/assignment_yourname_class6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-558: Applications of Deep Neural Networks
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/index.html)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

**Module 6 Assignment: Extract Text with LLM**

**Student Name: Your Name**

# Assignment Instructions

A [file](https://s3.amazonaws.com/data.heatonresearch.com/data/t81-558/sentences.csv) is provided that contains 100 English sentences. Sample sentences from this file include:

|id|sentence|
|---|---|
|1|Sarah found an old photograph in the attic.|
|2|By the window, Jake noticed a sparkling diamond necklace.|
|3|The antique clock was expertly fixed by Robert.|
|4|At the beach, Maria stumbled upon a washed-up bottle.|
|...|...|

For each sentence, extract the name of the person it mentions. For the sample input above, the results would look like the following:

|id|name|
|---|---|
|1|Sarah|
|2|Jake|
|3|Robert|
|4|Maria|
|...|...|

Use a large language model (LLM) to extract the person's name from each of these sentences.



# Google CoLab Instructions

If you are using Google CoLab, it will be necessary to mount your GDrive so that you can send your notebook during the submit process. Running the following code will map your GDrive to ```/content/drive```.

In [None]:
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)
    COLAB = True
    print("Note: using Google CoLab")
except:
    print("Note: not using Google CoLab")
    COLAB = False

# LangChain Setup

We must first install LangChain; refer to Module 6.2 for more information.

In [None]:
!pip install langchain langchain_openai

You will need an OpenAI API key for this assignment. WUSTL students should look at Assignment 6 in Canvas.

In [None]:
from langchain_openai import OpenAI, ChatOpenAI

# Your OpenAI API key
# If you are in my class at WUSTL, get this key from the Assignment 6 description in Canvas.
OPENAI_KEY = '[Insert your API key]'

# This is the model you will generally use for this class
LLM_MODEL = 'gpt-3.5-turbo-instruct'

# Initialize the OpenAI LLM (Large Language Model) with your API key
llm = OpenAI(openai_api_key=OPENAI_KEY, model=LLM_MODEL, temperature=0)

# Assignment Submit Function

You will submit the 10 programming assignments electronically.  The following submit function can be used to do this.  My server will perform a basic check of each assignment and let you know if it sees any basic problems.

**It is unlikely that you will need to modify this function.**

In [None]:
import base64
import os
import numpy as np
import pandas as pd
import requests
import PIL
import PIL.Image
import io

# This function submits an assignment.  You can submit an assignment as many times as you like; only the final
# submission counts.  The parameters are as follows:
# data - List of pandas dataframes or images.
# key - Your student key that was emailed to you.
# no - The assignment class number, should be 1 through 10.
# source_file - The full path to your Python or IPYNB file.  This must have "_class" followed by the
#               assignment number as part of its name.  For example "_class2" for class assignment #2.
def submit(data,key,no,source_file=None):
    if source_file is None and '__file__' not in globals(): raise Exception('Must specify a filename when a Jupyter notebook.')
    if source_file is None: source_file = __file__
    suffix = '_class{}'.format(no)
    if suffix not in source_file: raise Exception('{} must be part of the filename.'.format(suffix))
    with open(source_file, "rb") as image_file:
        encoded_python = base64.b64encode(image_file.read()).decode('ascii')
    ext = os.path.splitext(source_file)[-1].lower()
    if ext not in ['.ipynb','.py']: raise Exception("Source file is {} must be .py or .ipynb".format(ext))
    payload = []
    for item in data:
        if type(item) is PIL.Image.Image:
            buffered = io.BytesIO()  # BytesIO lives in the io module imported above
            item.save(buffered, format="PNG")
            payload.append({'PNG':base64.b64encode(buffered.getvalue()).decode('ascii')})
        elif type(item) is pd.core.frame.DataFrame:
            payload.append({'CSV':base64.b64encode(item.to_csv(index=False).encode('ascii')).decode("ascii")})
    r= requests.post("https://api.heatonresearch.com/assignment-submit",
        headers={'x-api-key':key}, json={ 'payload': payload,'assignment': no, 'ext':ext, 'py':encoded_python})
    if r.status_code==200:
        print("Success: {}".format(r.text))
    else: print("Failure: {}".format(r.text))
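
Before calling `submit`, it can help to see what its local checks and its encoding step do. The following is a minimal sketch that mirrors those two parts without contacting the server. The helper names `check_source_file` and `encode_csv` are hypothetical and not part of the course code, and the CSV text is sample data taken from the table in the instructions.

```python
import base64
import os


def check_source_file(source_file, no):
    """Mirror the filename checks in submit() without sending anything.

    Returns None when the path passes, otherwise a message describing the problem.
    """
    suffix = "_class{}".format(no)
    if suffix not in source_file:
        return "{} must be part of the filename.".format(suffix)
    ext = os.path.splitext(source_file)[-1].lower()
    if ext not in [".ipynb", ".py"]:
        return "Source file is {} must be .py or .ipynb".format(ext)
    return None


def encode_csv(csv_text):
    """Base64-encode CSV text the same way submit() encodes a dataframe."""
    return base64.b64encode(csv_text.encode("ascii")).decode("ascii")


# Sample rows in the submission format (id, name).
csv_text = "id,name\n1,Sarah\n2,Jake\n"
encoded = encode_csv(csv_text)

print(check_source_file("assignment_yourname_class6.ipynb", 6))
print(base64.b64decode(encoded).decode("ascii") == csv_text)
```

Because the CSV is encoded as ASCII, any non-ASCII character in a dataframe would raise an error at this step, so it is worth keeping the extracted names plain.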

# Assignment #6 Sample Code

The following code provides a starting point for this assignment.

In [None]:
import os
import pandas as pd
from scipy.stats import zscore
import string
from langchain.prompts import ChatPromptTemplate


# This is your student key that I emailed to you at the beginning of the semester.
key = "uTtH5yNbPs9tZdRWsBf9V9FaQA9RU2iP5cL7F3zH" #"Gx5en9cEVvaZnjut6vfLm1HG4ZO4PsI32sgldAXj"  # This is an example key and will not work.

# You must also identify your source file.  (modify for your local setup)
file='/content/drive/My Drive/Colab Notebooks/assignment_yourname_class6.ipynb'  # Google CoLab
# file='C:\\Users\\jeffh\\projects\\t81_558_deep_learning\\assignments\\assignment_yourname_class6.ipynb'  # Windows
# file='/Users/jheaton/projects/t81_558_deep_learning/assignments/assignment_yourname_class6.ipynb'  # Mac/Linux

# Begin assignment


# Submit
submit(source_file=file,data=[df_submit],key=key,no=6)

# Assignment #6 MyCode


In [37]:
import base64
import os
import numpy as np
import pandas as pd
import requests
import PIL
import PIL.Image
import io

# This function submits an assignment.  You can submit an assignment as many times as you like; only the final
# submission counts.  The parameters are as follows:
# data - List of pandas dataframes or images.
# key - Your student key that was emailed to you.
# no - The assignment class number, should be 1 through 10.
# source_file - The full path to your Python or IPYNB file.  This must have "_class" followed by the
#               assignment number as part of its name.  For example "_class2" for class assignment #2.
def submit(data,key,no,source_file=None):
    if source_file is None and '__file__' not in globals(): raise Exception('Must specify a filename when a Jupyter notebook.')
    if source_file is None: source_file = __file__
    suffix = '_class{}'.format(no)
    if suffix not in source_file: raise Exception('{} must be part of the filename.'.format(suffix))
    with open(source_file, "rb") as image_file:
        encoded_python = base64.b64encode(image_file.read()).decode('ascii')
    ext = os.path.splitext(source_file)[-1].lower()
    if ext not in ['.ipynb','.py']: raise Exception("Source file is {} must be .py or .ipynb".format(ext))
    payload = []
    for item in data:
        if type(item) is PIL.Image.Image:
            buffered = io.BytesIO()  # BytesIO lives in the io module imported above
            item.save(buffered, format="PNG")
            payload.append({'PNG':base64.b64encode(buffered.getvalue()).decode('ascii')})
        elif type(item) is pd.core.frame.DataFrame:
            payload.append({'CSV':base64.b64encode(item.to_csv(index=False).encode('ascii')).decode("ascii")})
    r= requests.post("https://api.heatonresearch.com/assignment-submit",
        headers={'x-api-key':key}, json={ 'payload': payload,'assignment': no, 'ext':ext, 'py':encoded_python})
    if r.status_code==200:
        print("Success: {}".format(r.text))
    else: print("Failure: {}".format(r.text))

In [39]:
import os
import pandas as pd
from scipy.stats import zscore
import io
import requests
import numpy as np
from sklearn import metrics

key = "QGOMi9jY948rtuqknQ9Wb20gQ7BaRlg369Q6fiSX" 
file='E:\\WUSTL\\2024 SPRING\\INFO.558 Applications of Deep Neural Networks\\jheaton\\projects\\t81_558_deep_learning\\assignments\\assignment_ZihanLuo_class6.ipynb'


In [None]:
!pip install langchain langchain_openai

In [5]:
from langchain_openai import OpenAI, ChatOpenAI

# Your OpenAI API key
# If you are in my class at WUSTL, get this key from the Assignment 6 description in Canvas.

OPENAI_KEY = ''  # Key removed before committing this notebook to a public GitHub repository.

# This is the model you will generally use for this class
LLM_MODEL = 'gpt-3.5-turbo-instruct'

# Initialize the OpenAI LLM (Large Language Model) with your API key
llm = OpenAI(openai_api_key=OPENAI_KEY, model=LLM_MODEL, temperature=0)



In [21]:
df_sentences = pd.read_csv('sentences.csv')

In [44]:
results = []

In [17]:
from langchain.prompts import ChatPromptTemplate


In [26]:
PROMPT = """
Extract the name of the person from the provided sentence.

text: {text}"""

prompt_template = ChatPromptTemplate.from_template(PROMPT)

chain = prompt_template | llm
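
A model may wrap its answer in extra text such as a label or a trailing period, so it can be worth telling it exactly what to return. The sketch below is a hypothetical, stricter variant of `PROMPT` (the name `STRICT_PROMPT` and its wording are assumptions, not part of the assignment). It uses the same `{text}` placeholder, so plain `str.format` is enough to preview what the model would receive without spending an API call.

```python
# Hypothetical stricter prompt; the wording is an assumption, not from the assignment.
STRICT_PROMPT = """Extract the first name of the person mentioned in the sentence.
Respond with the name only: one word, no label, no punctuation.

text: {text}"""

# Preview the filled-in prompt for one of the sample sentences.
sample = "The antique clock was expertly fixed by Robert."
preview = STRICT_PROMPT.format(text=sample)
print(preview)
```

If the preview reads well, the same string could be passed to `ChatPromptTemplate.from_template` in place of `PROMPT`.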

In [30]:
sentence_list = df_sentences['sentence'].tolist()
sentence_list

['Sarah found an old photograph in the attic.',
 'By the window, Jake noticed a sparkling diamond necklace.',
 'The antique clock was expertly fixed by Robert.',
 'At the beach, Maria stumbled upon a washed-up bottle.',
 'Ethan kept his diary locked at all times.',
 'Hidden under the bed, Emily found a dusty guitar.',
 'The vintage car was driven to the show by William.',
 'Between the pages of her book, Lily discovered a pressed flower.',
 'In his pocket, Tom always carried a silver coin.',
 "Jennifer's favorite painting was one of a serene lakeside.",
 'Up on the rooftop, Kevin set up a telescope.',
 'Hannah loved the porcelain doll she got for her birthday.',
 'Behind the couch, Brian discovered a lost remote control.',
 'At the flea market, Claire bought a ceramic vase.',
 "Derek's leather jacket was admired by everyone at the party.",
 'On her desk, Megan kept a crystal paperweight.',
 "Chris often played his grandfather's vintage violin.",
 'Beneath the autumn leaves, Alicia foun

In [45]:
for sentence_id, sentence in enumerate(sentence_list, start=1):
    response = chain.invoke({'text': sentence})

    name = response.strip()
    if name.startswith("Name:"):
        name = name.split(":")[1].strip()

    results.append({'id': sentence_id, 'name': name})
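
The loop above only handles a response that starts with `Name:`. A slightly more defensive clean-up step might look like the following sketch. The helper `clean_name` is hypothetical, and the response shapes it handles (leading blank lines, a label, quotes, a trailing period) are assumptions about what the model could return, not observed output.

```python
import string


def clean_name(response):
    """Reduce a raw LLM response to a bare name (hypothetical helper)."""
    # Keep only the first non-empty line of the response.
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    if not lines:
        return ""
    name = lines[0]
    # Drop a leading label such as "Name:" by keeping the text after the first colon.
    if ":" in name:
        name = name.split(":", 1)[1]
    # Strip surrounding whitespace, quotes, and punctuation.
    return name.strip(string.punctuation + string.whitespace)


print(clean_name("\n\nSarah"))
print(clean_name("Name: Jake."))
print(clean_name(' "Robert" '))
```

This does not rescue a full-sentence answer such as "The person is Sarah", so a prompt that asks for the name alone is still the first line of defense.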


In [46]:
df_submit = pd.DataFrame(results)

In [47]:
df_submit.head()

| |id|name|
|---|---|---|
|0|1|Sarah|
|1|2|Jake|
|2|3|Robert|
|3|4|Maria|
|4|5|Ethan|
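
Looking at `head()` only covers the first five rows. A quick local check over all rows before submitting might look like the sketch below. It assumes `results` is the list of `{'id': ..., 'name': ...}` dictionaries built in the loop above and that the file has 100 sentences, as the instructions state. The function `check_results` is a hypothetical helper, not part of the course code.

```python
def check_results(results, expected_rows=100):
    """Return a list of human-readable problems found in the results."""
    problems = []
    if len(results) != expected_rows:
        problems.append(
            "expected {} rows, found {}".format(expected_rows, len(results))
        )
    for position, row in enumerate(results, start=1):
        # Ids should run 1, 2, 3, ... in the same order as the sentences.
        if row["id"] != position:
            problems.append("row {} has id {}".format(position, row["id"]))
        # Each name should be exactly one word.
        name = row["name"]
        if not name or len(name.split()) != 1:
            problems.append(
                "id {} has a suspicious name: {!r}".format(row["id"], name)
            )
    return problems


# Small demonstration with two rows, one of them malformed.
demo = [{"id": 1, "name": "Sarah"}, {"id": 2, "name": "Name: Jake"}]
for problem in check_results(demo, expected_rows=2):
    print(problem)
```

An empty list means the row count, the id sequence, and the one-word rule all hold. It does not confirm that the names themselves are correct.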


In [48]:
submit(source_file=file,data=[df_submit],key=key,no=6)


Success: Submitted Assignment 6 for luozihan:
You have submitted this assignment 2 times. (this is fine)
Error: For a row of column 'name', you have 'Owen', the solution is 'Oscar', note: if there are several differences, these values many not align exactly.
Error: For a row of column 'name', you have 'Penny', the solution is 'Owen', note: if there are several differences, these values many not align exactly.
Error: For a row of column 'name', you have 'Peter', the solution is 'Penny', note: if there are several differences, these values many not align exactly.
Error: For a row of column 'name', you have 'Rachel', the solution is 'Peter', note: if there are several differences, these values many not align exactly.
Error: For a row of column 'name', you have 'Rebecca', the solution is 'Rachel', note: if there are several differences, these values many not align exactly.
Error: For a row of column 'name', you have 'Robert', the solution is 'Rebecca', note: if there are several difference