### __Reading Data__

In [None]:
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_core.documents import Document

PATH = './data/in_use'

def load_documents(path: str) -> list[Document]:
    """Load every PDF in `path` and return one Document per page."""
    loader = PyPDFDirectoryLoader(path)
    return loader.load()

docs = load_documents(PATH)

In [2]:
print(docs[0])

page_content='' metadata={'producer': 'calibre (5.21.0) [https://calibre-ebook.com]', 'creator': 'calibre (5.21.0) [https://calibre-ebook.com]', 'creationdate': '2021-06-28T05:28:50+00:00', 'author': 'Desconocido', 'moddate': '2021-06-28T21:02:08-06:00', 'title': 'Sincerely, Your Autistic Child', 'source': 'data\\in_use\\source (1).pdf', 'total_pages': 201, 'page': 0, 'page_label': '1'}


### __Semantic Chunking w/ OpenAI Embeddings__

In [3]:
from dotenv import load_dotenv

load_dotenv()

True

In [4]:
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai.embeddings import OpenAIEmbeddings

# TODO: review the other parameters and check whether "percentile"
# is the best breakpoint_threshold_type for these documents.
text_splitter = SemanticChunker(
    OpenAIEmbeddings(), breakpoint_threshold_type="percentile"
)
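
As I understand it, the `"percentile"` setting splits the text wherever the embedding distance between neighbouring sentences exceeds a chosen percentile of all such distances (95 by default, if I recall correctly). The sketch below illustrates that idea in plain Python. It is not the library's code: `percentile`, `split_points`, and the distance values are hypothetical names and data made up for illustration.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def split_points(distances, q=95):
    """Indices i where the gap between sentence i and i + 1 exceeds the q-th percentile."""
    threshold = percentile(distances, q)
    return [i for i, d in enumerate(distances) if d > threshold]


# Hypothetical cosine distances between consecutive sentences.
distances = [0.05, 0.07, 0.40, 0.06, 0.04, 0.55, 0.08]

print(split_points(distances, q=80))  # fewer, larger chunks
print(split_points(distances, q=50))  # more, smaller chunks
```

Lowering the percentile produces more breakpoints and therefore smaller chunks, which is one knob to try before switching to another threshold type such as `"standard_deviation"` or `"interquartile"`.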

In [5]:
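# Split each page independently, so `pages` holds one list of chunks per page
# and every chunk keeps the metadata of the page it came from.
pages = []
for doc in docs:
    pages.append(text_splitter.create_documents(texts=[doc.page_content], metadatas=[doc.metadata]))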

In [6]:
len(pages)

1895

In [7]:
import itertools
pages_chunks = list(itertools.chain.from_iterable(pages))

In [8]:
len(pages_chunks)

4197

In [9]:
type(pages_chunks)

list

In [10]:
type(pages)

list

### __Store in DB__

In [11]:
print(pages_chunks[0])

page_content='' metadata={'producer': 'calibre (5.21.0) [https://calibre-ebook.com]', 'creator': 'calibre (5.21.0) [https://calibre-ebook.com]', 'creationdate': '2021-06-28T05:28:50+00:00', 'author': 'Desconocido', 'moddate': '2021-06-28T21:02:08-06:00', 'title': 'Sincerely, Your Autistic Child', 'source': 'data\\in_use\\source (1).pdf', 'total_pages': 201, 'page': 0, 'page_label': '1'}
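
The chunk printed above has an empty `page_content`, so embedding and storing it adds nothing useful to the index. A minimal sketch of filtering such chunks before they go into the database, assuming only that each chunk exposes a `page_content` string. `Chunk` is a hypothetical stand-in for the LangChain document class and `drop_empty` is a helper name invented here.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def drop_empty(chunks):
    """Keep only chunks whose text is not empty or whitespace-only."""
    return [c for c in chunks if c.page_content.strip()]


sample = [
    Chunk("", {"page": 0}),
    Chunk("   \n", {"page": 1}),
    Chunk("Some real text.", {"page": 2}),
]
kept = drop_empty(sample)
print(len(sample), "->", len(kept))
```

In the notebook the same helper could be applied as `pages_chunks = drop_empty(pages_chunks)` before the database is built.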


In [None]:
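from langchain_community.vectorstores import Chroma

def create_db(pages_chunks: list) -> Chroma:
    """Embed the chunks and store them in a Chroma DB under ./data/db."""
    db = Chroma.from_documents(
        documents=pages_chunks,
        embedding=OpenAIEmbeddings(),
        persist_directory='./data/db'
    )
    # Recent Chroma versions persist automatically; this call only matters on older ones.
    db.persist()
    return db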

db = create_db(pages_chunks)

In [19]:
type(db)

langchain_community.vectorstores.chroma.Chroma

### __Semantic Search Params__

In [22]:
QUERY = "My son was recently diagnosed with autism. I am his dad and I don't get what this means. Could you explain from an autistic person's point of view?"

In [None]:
# The score is not used yet. TODO: tune k and find a better search strategy.
search_results = db.similarity_search_with_score(QUERY, k=5)
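
One way to start using the score is to drop weak matches before building the prompt. The sketch below assumes the score is a distance, so lower means closer, which is my understanding of Chroma's `similarity_search_with_score`. The helper name `filter_by_distance`, the sample pairs, and the `0.35` cutoff are hypothetical and would need tuning against real results.

```python
def filter_by_distance(results, max_distance):
    """Keep (document, score) pairs at or below max_distance, closest first."""
    kept = [(doc, score) for doc, score in results if score <= max_distance]
    return sorted(kept, key=lambda pair: pair[1])


# Hypothetical (document, distance) pairs standing in for search_results.
sample_results = [("chunk A", 0.31), ("chunk B", 0.48), ("chunk C", 0.27)]

close_enough = filter_by_distance(sample_results, max_distance=0.35)
print(close_enough)
```

If the filter removes every result, falling back to the unfiltered top-k is probably safer than sending the model an empty context.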

### __Prompting the Model__

In [None]:
PROMPT = '''
Answer the question primarily based on the following context:
{context}

------------------------------------- v ------------------------------------
Answer the question primarily based on the above context:
{question}
'''

In [25]:
from langchain_core.prompts import ChatPromptTemplate

context = '\n\n------------------------------------- v ------------------------------------\n\n'.join([piece.page_content for piece, _score in search_results])
final_prompt = ChatPromptTemplate.from_template(PROMPT).format(context=context, question=QUERY)

In [26]:
from langchain_community.llms.ollama import Ollama

model = Ollama(model='aya-expanse:latest')
response = model.invoke(final_prompt)
print(response)


Based on the provided text, here’s an explanation about autism from the perspective of an autistic individual, particularly tailored to your situation as a newly informed parent:

**First, remember that your son is still the same child he was before the diagnosis.** The autism spectrum is a part of who he is, but it doesn't define *all* of him. He's still your son with his unique personality, interests, and abilities.

**Autism means that his brain processes information differently than typically developing individuals.** This can manifest in various ways:

* **Sensitivity to sensory input:**  Things like noises, lights, textures, or even smells might feel overwhelming or incomprehensible to him. Think of it as his senses being more intense.
* **Challenges with communication:** He may have difficulty expressing himself verbally or non-verbally. This could mean he struggles with words, gestures, or understanding social cues. 
* **Difficulties with social interactions:** He might find it

In [27]:
response = model.invoke(QUERY)
print(response)

As an AI language model, I cannot fully replicate the experience of being autistic, but I can provide you with some insights from an autistic individual's perspective to help you understand your son's diagnosis better. Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition that affects each person uniquely, so every autistic individual will have their own set of strengths and challenges. Here's what it might be like from an autistic person's point of view:

1. **Sensory Perception:** Autistic individuals often experience the world through their senses differently. This can mean heightened or reduced sensitivity to sounds, lights, textures, tastes, and smells. For example, certain fabrics might feel uncomfortable, specific noises (like vacuum cleaners or loud machinery) could be unbearable, or bright lights may cause discomfort or even pain. Understanding that this is not normal for neurotypical people (those without autism) can help you recognize your son's sensory ex

In [28]:
print(search_results)



### __Test the Outputs__

In [None]:
# TODO: print the sources alongside the answer
# TODO: experiment with the search and chunking parameters
# TODO: compare more outputs
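
For the idea of printing the sources, a minimal sketch that turns chunk metadata into a de-duplicated citation list. It assumes the `source` and `page_label` keys seen in the metadata printed earlier. `format_sources` is a hypothetical helper and the sample metadata is invented for illustration.

```python
def format_sources(metadatas):
    """Return unique 'file, page N' labels in retrieval order."""
    labels = []
    for meta in metadatas:
        page = meta.get("page_label", meta.get("page", "?"))
        label = f"{meta.get('source', 'unknown')}, page {page}"
        if label not in labels:
            labels.append(label)
    return labels


# Hypothetical metadata standing in for [piece.metadata for piece, _ in search_results].
sample_metadata = [
    {"source": "data/in_use/source (1).pdf", "page_label": "12"},
    {"source": "data/in_use/source (1).pdf", "page_label": "12"},
    {"source": "data/in_use/example.pdf", "page": 3},
]

for line in format_sources(sample_metadata):
    print(line)
```

In the notebook this could be called as `format_sources([piece.metadata for piece, _score in search_results])` and printed below the model's answer.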