### Before you start
- Create a virtual environment by running this command:
 python -m venv .venv

In [None]:
# Libs to install
!pip install langchain
!pip install python-dotenv
# openai is pinned below 1.0 because chatWithGPT uses the pre-1.0 openai.ChatCompletion API
!pip install "openai<1"
!pip install pypdf
!pip install bs4


### Libraries & GPT Settings

In [25]:
# Libraries
import datetime
import os
import openai

from dotenv import load_dotenv, find_dotenv
from langchain.document_loaders import PyPDFLoader
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import OpenAIWhisperParser
from langchain.document_loaders import WebBaseLoader
from langchain.text_splitter import CharacterTextSplitter


In [4]:
# GPT API settings
# The API key is read from a local .env file (a line of the form OPENAI_API_KEY=...)
# instead of being hard-coded in the notebook.
_ = load_dotenv(find_dotenv()) # read local .env file
openai.api_key = os.environ['OPENAI_API_KEY']

# Pick the model by date: the pinned snapshot up to the target date, the unpinned alias afterwards
current_date = datetime.datetime.now().date()
target_date = datetime.date(2024, 6, 12)

if current_date > target_date:
    llm_model = "gpt-3.5-turbo"
else:
    llm_model = "gpt-3.5-turbo-0301"
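
Because the key is looked up in the environment, a missing `.env` entry only surfaces as a bare `KeyError`. A minimal sketch of a clearer check follows; `get_api_key` is a hypothetical helper that is not part of the notebook, and it takes the environment mapping as a parameter so it can be tried without touching real settings.

```python
import os


def get_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key stored under `name`, or raise a readable error."""
    # Treat a missing variable and a blank value the same way
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key
```

With this helper, the assignment in the cell above could become `openai.api_key = get_api_key()`.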

## Functions

In [51]:
def chatWithGPT(prompt, model=llm_model):
    """
    chatWithGPT sends the message to the ChatGPT API and returns its answer
        :prompt: is the user prompt
        :model: (optional) indicates the GPT model
        :return: returns the answer from ChatGPT
    """
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message["content"]

def readPDF(file_path):
    """
    readPDF loads the text of a PDF file
        :file_path: path of the PDF file to load
        :return: returns the PDF text pages
    """
    loader = PyPDFLoader(file_path)
    pages = loader.load()
    return pages

def readWebURL(web_url):
    """
    readWebURL loads the text of a web page
        :web_url: URL of the web page to load
        :return: returns the loaded web page documents
    """
    loader = WebBaseLoader(web_url)
    pages = loader.load()
    return pages

def getAllData(data_dirpath):
    """
    getAllData loads every PDF and every URL list (.txt) in a directory using the read functions
        :data_dirpath: Directory path of all files to load
        :return: returns all data in a string
    """
    all_texts = ""
    dir_list = os.listdir(data_dirpath)

    for file in dir_list:
        file_path = os.path.join(data_dirpath, file)
        if file.endswith(".pdf"):
            # Read PDFs
            for page in readPDF(file_path):
                all_texts += page.page_content
        elif file.endswith(".txt"):
            # Read web URLs listed one per line in a .txt file
            with open(file_path) as f:
                lines = f.readlines()
            for line in lines:
                web_url = line.strip()  # drop the trailing newline
                if not web_url:
                    continue  # skip blank lines
                for page in readWebURL(web_url):
                    all_texts += page.page_content

    return all_texts
    
    
def getChunkText(text):
    """
    getChunkText splits the text into overlapping chunks
        :text: text data
        :return: list of text chunks
    """
    text_splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len,
    )

    chunks = text_splitter.split_text(text)
    return chunks
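
`getAllData` reads each `.txt` file as a list of URLs, one per line. Lines returned by `readlines()` keep their trailing newline, so it is worth cleaning them before handing them to a loader. A minimal sketch, using a hypothetical helper `parse_url_lines`; treating lines that start with `#` as comments is an assumption of this sketch, not something the notebook does.

```python
def parse_url_lines(lines):
    """Return the non-empty, non-comment URLs found in an iterable of lines."""
    urls = []
    for line in lines:
        url = line.strip()  # remove the newline and surrounding spaces
        if not url or url.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        urls.append(url)
    return urls


sample = ["https://example.com/a\n", "\n", "# note\n", "  https://example.com/b  \n"]
print(parse_url_lines(sample))
# ['https://example.com/a', 'https://example.com/b']
```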

## Main

### Loading data

In [23]:

all_texts = getAllData("data")


#### all_texts in the output

In [24]:
print(all_texts)

An eBook byWomen’s Health
EXERCISE & 
EXERCISE & WOMEN’S HEALTH ESSAFOREWORD
ESME SOAN
From puberty to pregnancy, menarche to menopause, exercise is a form of medicine and a hugely important 
modality to support women’s health. One in two women in Australia are not sufficiently physically active, a 
statistic that is contributing to the rates of chronic disease burden for women.
‘Women’s Health’ is an umbrella term, used to describe all manner of health conditions and life stages for 
women, from puberty, pregnancy, postpartum and menopause. Through each of these life stages, exercise has 
an important therapeutic application for preventative health. Exercise really is a form of medicine – research 
has shown that it begins even before birth! In utero, exposures to physical activity can even change phenotype 
expressions for children of exercising mums, showing that exercise in pregnancy has both short and long term 
positive health effects for mum and child!
Exercising right through m

### Splitting

In [52]:
chunks = getChunkText(all_texts)

Created a chunk of size 7911, which is longer than the specified 1000
Created a chunk of size 1978, which is longer than the specified 1000
Created a chunk of size 1050, which is longer than the specified 1000
Created a chunk of size 1718, which is longer than the specified 1000
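
These warnings most likely come from stretches of text that contain no `\n` for more than 1000 characters: a splitter that only cuts at the separator cannot make such a piece any smaller. The sketch below is a simplified pure-Python illustration of that behaviour, not LangChain's implementation; `split_on_separator` is a hypothetical name, and chunk overlap is left out for brevity.

```python
def split_on_separator(text, separator="\n", chunk_size=1000):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # the piece still fits in the current chunk
        else:
            if current:
                chunks.append(current)
            current = piece  # a single piece may itself exceed chunk_size
    if current:
        chunks.append(current)
    return chunks


text = "short line\n" + "x" * 1500 + "\nanother short line"
print([len(c) for c in split_on_separator(text)])
# [10, 1500, 18]
```

If hard size limits matter, one option is `RecursiveCharacterTextSplitter` from `langchain.text_splitter`, which tries a list of separators in turn instead of relying on a single one.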


#### chunks in the output

In [57]:
print(chunks[0])

An eBook byWomen’s Health
EXERCISE & 
EXERCISE & WOMEN’S HEALTH ESSAFOREWORD
ESME SOAN
From puberty to pregnancy, menarche to menopause, exercise is a form of medicine and a hugely important 
modality to support women’s health. One in two women in Australia are not sufficiently physically active, a 
statistic that is contributing to the rates of chronic disease burden for women.
‘Women’s Health’ is an umbrella term, used to describe all manner of health conditions and life stages for 
women, from puberty, pregnancy, postpartum and menopause. Through each of these life stages, exercise has 
an important therapeutic application for preventative health. Exercise really is a form of medicine – research 
has shown that it begins even before birth! In utero, exposures to physical activity can even change phenotype 
expressions for children of exercising mums, showing that exercise in pregnancy has both short and long term 
positive health effects for mum and child!
