In [1]:
# autoreload is an IPython extension that reloads modules before executing user code.
%load_ext autoreload

# Reload all modules (except those excluded by %aimport) every time before executing the Python code typed.
%autoreload 2

In [2]:
import iep_goal_generator

from rag_utils import StudentProfile

### Setting Variables

In [3]:
### Set the OpenAI API key

open_ai_key = "MY_OPEN_AI_API_KEY"  # Replace with your actual API key
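Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the key is stored in an environment variable named `OPENAI_API_KEY`; the helper `load_api_key` is hypothetical and not part of `iep_goal_generator`:

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise a clear error.

    Hypothetical helper: the variable name is an assumption.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With the variable set, the assignment above could become `open_ai_key = load_api_key()`.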


### Create a New Agent
This process creates a new **IEP Goal Generator system**. If no vector store path is provided:
- All available data are collected and processed.
- The data are then used to create a vector store, which the system subsequently uses to generate goals.

In [4]:
my_iep_assistant =  iep_goal_generator.My_IEP_Goal_Generator(open_ai_key=open_ai_key, model = "gpt-4")
my_iep_assistant.vectorstore

FAISS Vector database created successfully


<langchain_community.vectorstores.faiss.FAISS at 0x7a1d7e383c70>

### Create IEP Goals for a Student

**Student Information**
- 15-year-old sophomore with a behavior disorder
- Completed the O*Net Interest Profiler assessment
- Shows strong interest in the "Enterprising" category
- Career interests: retail sales, driver/sales worker
- Prefers hands-on learning over academic instruction


**Assessment Results**

- O*Net Interest Profiler indicates strength in Enterprising activities
- Career suggestions include retail salesperson or driver/sales worker
- Student interview ("Vision for the Future") indicates interest in working at Walmart



### How does the assistant generate goals?

1. The assistant retrieves relevant documents. Optionally, the user can provide:
  1. A number (k) of most relevant documents. The default is set to 5.
  2. The minimum similarity score for a document to be kept. This makes the assistant more or less selective. The default is set to None.

2. Once the most relevant documents have been retrieved, the assistant uses the specified student profile to fill the prompt template described [here](rag_utils), which is then used to generate goals.
   1. The template instructs the agent to stick to factual data provided in the documents.
   2. It provides a format for the goals description, and instructs the agent to explain how the goals align to career and education standards.
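The retrieval step described above (keep the `k` best matches, then optionally drop those below a minimum score) can be sketched as follows. This is an illustration on toy vectors, not the code inside `generate_iep_goals`; the function names are hypothetical, and it assumes a higher score means a closer match.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    k: int = 5,
    min_sim_score: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """Return up to k (doc_id, score) pairs, best first.

    If min_sim_score is given, documents scoring below it are dropped,
    so fewer than k documents may come back.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    top_k = scored[:k]
    if min_sim_score is not None:
        top_k = [(doc_id, s) for doc_id, s in top_k if s >= min_sim_score]
    return top_k


# Toy data: three 2-D "embeddings" and a query.
toy_docs = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0]}
loose = retrieve([1.0, 0.0], toy_docs, k=2)
strict = retrieve([1.0, 0.0], toy_docs, k=3, min_sim_score=0.8)
```

Because the threshold is applied after the top-`k` cut, a strict threshold can only shrink the result set, never enlarge it.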

In [5]:
%%time

clarences_profile = StudentProfile(
    name="Clarence",
    age='15',
    grade="10",
    career_interest_or_category="retail sales, driver/sales worker",
    learning_preferences="prefers hands-on learning over academic instruction",
    onnet_results="Enterprising",
    career_suggestions="retail salesperson or driver/sales worker",
    preferred_employers="Walmart"
)


iep_goals, relevant_docs = my_iep_assistant.generate_iep_goals(student_profile=clarences_profile
                                                , k=5
                                                , min_sim_score=None
                                                )


print("\nIEP GOALS:\n----------")
for line in iep_goals.content.split('\n'):
    print(line)

query = IEP goals, IEP transition plan, disabilities act, academic standards, career profiles for retail salesperson or driver/sales worker.
doc_info_categories =          {'career_profile', 'state_standards', 'idea'}
Number of retrieved documents: 5



IEP GOALS:
----------
**Postsecondary Goal:**  

1. Employment
    - Clarence will gain part-time employment in a retail sales position at Walmart or comparable retail outlet, working at least 20 hours per week upon graduation.

2. Education/Training
    - Clarence will complete a retail sales training program or specialized training for driver/sales workers within one year of high school graduation.

**Annual IEP Goal:**  
Clarence will develop his customer service and selling skills, including effective communication and understanding the products he is selling to be able to explain the features to customers. He will also make progress towards a proficiency in handling and identifying security risks by the end of the school year.

**S

In the example below, we restrict the similarity search to return at most five documents, with the additional constraint that the similarity score must be 0.35 or higher. This returns only four documents, compared to five earlier.

In [6]:
clarences_profile = StudentProfile(
    name="Clarence",
    age="15-year-old sophomore",
    grade="Sophomore / 10th grade",
    career_interest_or_category=" retail sales, driver/sales worker",
    learning_preferences="prefers hands-on learning over academic instruction",
    onnet_results="Enterprising",
    career_suggestions="retail salesperson or driver/sales worker",
    preferred_employers="Walmart"
)


iep_goals_strict, relevant_docs_strict = my_iep_assistant.generate_iep_goals(student_profile=clarences_profile
                                                , k=5
                                                , min_sim_score=0.35
                                                )


print("\nIEP GOALS:\n----------")
for line in iep_goals_strict.content.split('\n'):
    print(line)

query = IEP goals, IEP transition plan, disabilities act, academic standards, career profiles for retail salesperson or driver/sales worker.
doc_info_categories =          {'career_profile', 'state_standards'}
Number of retrieved documents: 4



IEP GOALS:
----------
**Postsecondary Goal:**  

1. Employment
Upon graduation, Clarence will independently maintain a part-time job within the retail sales or driving industry, for instance, at a preferred employer such as Walmart.

2. Education/Training
Upon graduation, Clarence will participate in job-specific training for retail sales and/or driver sales, focusing specifically on hands-on, practical learning experiences.

**Annual IEP Goal:**  
By the end of the school year, Clarence will demonstrate improved employability skills pertinent to the retail and driver sales industry, such as effective communication, understanding and applying sales techniques, and recognizing security risks with at least 80% accuracy, as measured by role-play s

#### Example of a profile that is not applicable
Here, we test whether the generator produces goals for a career profile for which it has no documentation.
As you can see, the agent explains why no IEP goals can be generated:
 - No documentation was found in the 'career_profile' category for the 'cinematographer' profession.

We have classified the documents into several categories, including 'career_profile', 'state_standards', and 'idea'. We constrain the assistant to proceed with generation only if at least one relevant document is found in each of the 'career_profile' and 'state_standards' categories. This reduces the likelihood that the agent will make things up.
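A minimal sketch of such a category gate, assuming each retrieved document carries its category in a metadata dictionary. The key name `info_category` and the function `missing_categories` are hypothetical; the actual check inside `iep_goal_generator` may differ.

```python
from typing import FrozenSet, Iterable, Mapping, Set

# Categories that must each be represented before goals are generated.
REQUIRED_CATEGORIES: FrozenSet[str] = frozenset({"career_profile", "state_standards"})


def missing_categories(
    docs_metadata: Iterable[Mapping[str, str]],
    required: FrozenSet[str] = REQUIRED_CATEGORIES,
    key: str = "info_category",  # hypothetical metadata key
) -> Set[str]:
    """Return the required categories that no retrieved document belongs to."""
    found = {meta.get(key) for meta in docs_metadata}
    return set(required) - found


# Toy metadata mirroring a case with no career profile among the results.
retrieved = [
    {"info_category": "state_standards"},
    {"info_category": "idea"},
]
gaps = missing_categories(retrieved)
if gaps:
    message = "No relevant document could be found in the categories: " + ", ".join(sorted(gaps))
```

Returning the set of missing categories, rather than a bare boolean, lets the caller explain to the user why generation was skipped.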


In [7]:

daniellas_profile = StudentProfile(
    name="Daniella",
    age=13,
    grade= "freshman",
    career_interest_or_category="cinematographer",
    learning_preferences="prefers academic instruction",
    onnet_results="Enterprising",
    career_suggestions="cinematographer",
    preferred_employers="Paramount; Amazon Pictures"
)

iep_goals_3, relevant_docs_3 = my_iep_assistant.generate_iep_goals(student_profile=daniellas_profile
                                                , k=5
                                                , min_sim_score=None
                                                )


print("\nIEP GOALS:\n----------")
for line in iep_goals_3.content.split('\n'):
    print(line)

query = IEP goals, IEP transition plan, disabilities act, academic standards, career profiles for cinematographer.
doc_info_categories =          {'state_standards', 'idea'}
Number of retrieved documents: 5



IEP GOALS:
----------
No relevant document could be found for:
- the specific career interest(s) or suggestion(s): cinematographer 
- in the categories: career_profile



### Assessing IEP Goal Generator

Expected results have been provided for the sample profile (Clarence). We will focus only on the 'Measurable postsecondary goals' and the 'Measurable annual goal aligned with standards'.

1. Measurable postsecondary goals:
 - Employment: "After high school, Clarence will obtain a full-time job at Walmart as a sales associate."
 - Education/Training: "After high school, Clarence will complete on-the-job training provided by Walmart and participate in employer-sponsored customer service workshops."

2. Measurable annual goal aligned with standards:

 - "In 36 weeks, Clarence will demonstrate effective workplace communication and customer service skills in role-play and community-based instruction settings by appropriately greeting customers, maintaining eye contact, listening actively, and responding to customer questions in 4 out of 5 observed opportunities."

#### Using Semantic Similarity
An advantage of using semantic similarity over other methods (e.g., BLEU, ROUGE) is that it evaluates how similar the meanings of two texts are, not just how similar the words are.
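To see why word-level metrics can mislead, consider a toy word-overlap score. The function below is a plain Jaccard overlap on lowercase tokens, used here as a crude stand-in for n-gram metrics; it is not BLEU or ROUGE, and the two sentences are made-up examples.

```python
def unigram_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the sets of lowercase words in a and b."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


reference = "Clarence will obtain a full-time job at Walmart"
paraphrase = "The student gets hired by a large retailer"

# The two sentences describe a similar outcome but share only the word "a".
overlap = unigram_overlap(reference, paraphrase)
```

An embedding-based comparison is intended to score such paraphrases by meaning rather than by shared tokens.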

In [8]:
from langchain.embeddings import OpenAIEmbeddings
from sklearn.metrics.pairwise import cosine_similarity

reference_goals = '1. Measurable postsecondary goals: \
 - Employment: "After high school, Clarence will obtain a full-time job at Walmart as a sales associate."\
 - Education/Training: "After high school, Clarence will complete on-the-job training provided by Walmart and participate in employer-sponsored customer service workshops."\
2. Measurable annual goal aligned with standards: \
 - "In 36 weeks, Clarence will demonstrate effective workplace communication and customer service skills in role-play and community-based instruction settings by appropriately greeting customers, maintaining eye contact, listening actively, and responding to customer questions in 4 out of 5 observed opportunities."'


## Goals generated with more documents
iep_generated_goals_1 = iep_goals.content.split('IEP GOALS')[-1].split('Short-Term Objectives')[0]

## Goals generated with fewer documents
iep_generated_goals_2 = iep_goals_strict.content.split('IEP GOALS')[-1].split('Short-Term Objectives')[0]


## We use the same embedding model that was used to embed the documents
model = OpenAIEmbeddings()

# Embedding the expected output
reference_vec = model.embed_query(reference_goals)

# Embedding the goals generated with more documents
generated_vec_1 = model.embed_query(iep_generated_goals_1)

# Embedding the goals generated with fewer documents
generated_vec_2 = model.embed_query(iep_generated_goals_2)

similarity_1 = cosine_similarity([generated_vec_1], [reference_vec])[0][0]
similarity_2 = cosine_similarity([generated_vec_2], [reference_vec])[0][0]
print(f"Semantic similarity (1): {similarity_1:.3f}")
print(f"Semantic similarity (2): {similarity_2:.3f}")





Semantic similarity (1): 0.937
Semantic similarity (2): 0.921


The semantic similarity between the expected and generated goals is above 0.9, which indicates that they are very close in meaning.

The higher similarity score (0.937) suggests that the first set of goals (generated with more retrieved documents) is semantically closer to the reference than the second set, which was generated from fewer relevant documents. While this makes sense, it is not always the case, due to the stochastic nature of text generation processes (unless controlled).

#### Assessing the quality, relevance and alignment of the goals using a Custom Function

It is important to test the system and assess the quality and relevance of the proposed goals. We have defined a scoring function that assesses the goals with respect to the following categories:
1. The 'SMART'ness of the goals. Are they 1) specific to the student; 2) measurable, based on academic performance assessment; 3) achievable; 4) relevant to the career suggestion or interest; and 5) time-bound?
2. The alignment with career interests and state standards.

We also make sure that relevant information is specified in the expected sections of the goals. For instance, the short-term objectives must include time elements so that the student's progress toward the goals can be assessed correctly. The Annual IEP Goal must also state what is to be achieved and how this can be measured.
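As an illustration of how keyword-based checks of this kind can work, here is a minimal sketch covering only two of the seven criteria. The patterns and the function `check_goal` are hypothetical examples and are not the rules used by `GoalAssessment.evaluate_iep_goal`.

```python
import re
from typing import Dict

# Hypothetical cue patterns; a real rubric would need a much richer list.
TIME_PATTERNS = [
    r"\bby the end of\b",
    r"\bwithin \w+ (weeks?|months?|years?)\b",
    r"\bin \d+ weeks?\b",
]
MEASURE_PATTERNS = [
    r"\d+\s*%",
    r"\b\d+ out of \d+\b",
    r"\bas measured by\b",
]


def check_goal(text: str) -> Dict[str, bool]:
    """Flag whether a goal contains time-bound and measurable cues."""
    lowered = text.lower()
    return {
        "Time-bound": any(re.search(p, lowered) for p in TIME_PATTERNS),
        "Measurable": any(re.search(p, lowered) for p in MEASURE_PATTERNS),
    }


sample = (
    "In 36 weeks, Clarence will respond to customer questions "
    "in 4 out of 5 observed opportunities."
)
result = check_goal(sample)
```

Pattern matching like this is cheap and transparent, but it only detects surface cues, so a goal can pass the check without being measurable in practice.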



--

In the first goal assessment below, the proposed goals meet six of the seven criteria: they satisfy every SMART criterion except 'Achievable', and they align with both career interests and educational standards. The documents retrieved were relevant and informative enough to generate such goals.

In [9]:
## First goal (more documents retrieved)

iep_goal_generator.GoalAssessment.evaluate_iep_goal(goal_as_text=iep_goals.content, student_profile=clarences_profile, retrieved_docs=relevant_docs)  

Total Score: 6/7


{'Specific': True,
 'Measurable': True,
 'Achievable': False,
 'Relevant': True,
 'Time-bound': True,
 'Aligned with Career Interest': True,
 'Aligned with Standards': True}

In the second goal assessment below, we used the goals generated with fewer documents. As you can see, the 'Time-bound' criterion is not met.

In [10]:
## Second goal (fewer documents retrieved)

iep_goal_generator.GoalAssessment.evaluate_iep_goal(goal_as_text=iep_goals_strict.content
                                                    , student_profile=clarences_profile
                                                    , retrieved_docs=relevant_docs_strict
                                                    )  

Total Score: 6/7


{'Specific': True,
 'Measurable': True,
 'Achievable': True,
 'Relevant': True,
 'Time-bound': False,
 'Aligned with Career Interest': True,
 'Aligned with Standards': True}

### Just for Fun

We have added a RAG pipeline that can be used to question the agent about IEP goals and different career profiles.

In [11]:
my_iep_assistant.create_rag_pipeline(k=5)


# query = "What are the professional skills required for a salesperson? Return the answer in bullet points."
query = "What is the typical salary for a truck driver? "

response = my_iep_assistant.qa_chain.invoke({"query": query})

print("\nQuestion:", query)
print("\nAnswer:")

for line in response["result"].split(".")[:-1]:
    print(f"\t{line.strip()}.")


Question: What is the typical salary for a truck driver? 

Answer:
	The median annual wage for light truck drivers was $44,140 in May 2024.


#### Launch an interactive demo

In [20]:
my_iep_assistant.launch_interactive_convo()


--- RAG Interactive Demo ---
Type 'exit' to end the demo


Question: How long does one need to study to become a data scientist?

Answer:
--------
- Data scientists typically need at least a bachelor’s degree in mathematics, statistics, computer science, or a related field. 
- Some employers require or prefer that candidates have a master’s or doctoral degree.
- Specific time to study or duration of these degrees is not mentioned in the context.


Question: What is the median salary range for data sciensts?

Answer:
--------
- The median annual wage for data scientists was $112,590 in May 2024.
- The lowest 10 percent earned less than $63,650.
- The highest 10 percent earned more than $194,410.


Question: what core/soft skills and attributes are required of data scientists?

Answer:
--------
- Computer skills: Ability to write code, analyze data, develop or improve algorithms, and use data visualization tools.
- Communication skills: Ability to convey the results of their analysis to

### Challenges

1. Certain government webpages (e.g., BLS) could not be scraped automatically with a web crawler, so the documents must be downloaded manually.
   1. The documents must also be manually updated and re-parsed regularly so that any changes are taken into account.
2. In some seemingly trivial cases, the state educational standards documents could only be retrieved if the k-limit was increased.
   1. One way to solve this would be to enforce the inclusion of documents for the relevant state educational standards.
3. The system can be sensitive to the exact words used to describe the career interests (e.g., "retail salesperson" or "truck driver" vs. "Driver/sales worker").
4. The assessment score could be better. We tweaked it a few times to add keywords that can indicate time-boundedness, measurability of goals, etc.
   1. Therefore, one would have to parse several documents to extract such words.
5. With the limited system infrastructure, it is apparent that adding many more documents (for career profiles, different states, etc.) could significantly increase the time requirements.
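The fix suggested under challenge 2 (always including documents from required categories) could be sketched as below. It reserves one slot for the best-scoring document of each required category and fills the remaining slots by score. The tuple layout and the function name are hypothetical, and this is not something the current system implements.

```python
from typing import List, Set, Tuple

Doc = Tuple[str, str, float]  # (doc_id, category, similarity score)


def top_k_with_required(scored_docs: List[Doc], k: int, required: Set[str]) -> List[Doc]:
    """Pick k documents, guaranteeing one per required category when available."""
    ranked = sorted(scored_docs, key=lambda d: d[2], reverse=True)
    selected: List[Doc] = []
    # Reserve a slot for the best document of each required category.
    for category in sorted(required):
        best = next((d for d in ranked if d[1] == category), None)
        if best is not None:
            selected.append(best)
    # Fill the remaining slots with the highest-scoring leftovers.
    for doc in ranked:
        if len(selected) >= k:
            break
        if doc not in selected:
            selected.append(doc)
    return sorted(selected[:k], key=lambda d: d[2], reverse=True)


toy = [
    ("c1", "career_profile", 0.9),
    ("c2", "career_profile", 0.8),
    ("i1", "idea", 0.7),
    ("s1", "state_standards", 0.2),
]
picked = top_k_with_required(toy, k=2, required={"career_profile", "state_standards"})
```

In this toy case a plain top-2 would return two career profiles; the reserved slot brings in the state standards document without raising `k`.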
