<a href="https://colab.research.google.com/github/NiekVerhoeff/workshop/blob/main/chat_with_zip_openai.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Chat with your files - OpenAI**

This notebook demonstrates a Retrieval-Augmented Generation (RAG) system that answers questions about the contents of multiple files bundled in a single ZIP file.
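
RAG answers a question in two steps. First it retrieves the pieces of text most similar to the question. Then it passes those pieces to the language model together with the question. The sketch below shows both steps with no dependencies. It uses a toy word-count "embedding" in place of the OpenAI embedding model that LlamaIndex uses by default. The helpers `embed`, `cosine`, `retrieve` and `build_prompt`, and the example chunks, are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Retrieval step: return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context: list[str]) -> str:
    """Augmentation step: put the retrieved context in front of the question."""
    context_block = "\n---\n".join(context)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {question}"


chunks = [
    "The council meets every Tuesday.",
    "The Gemeenteblad was digitised and is now published online.",
    "Parking permits can be requested at the service desk.",
]
question = "What happened to the Gemeenteblad?"
print(build_prompt(question, retrieve(question, chunks, top_k=1)))
```

In the notebook, the generation step is a call to the OpenAI model. Here the resulting prompt is only printed.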

This notebook requires an OpenAI API key. If you're using Colab, store it as a secret named `OPENAI_API_KEY` in the Secrets menu. To open the menu, click the key icon in the left sidebar.
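
Here is a minimal sketch for running the notebook outside Colab as well. The helper `get_openai_key` is hypothetical. It first tries Colab's `google.colab.userdata.get`. If that fails, it falls back to an `OPENAI_API_KEY` environment variable. By default, the OpenAI Python client and LlamaIndex's OpenAI integrations read the key from that environment variable.

```python
import os


def get_openai_key() -> str:
    """Return the OpenAI API key from Colab secrets or the environment (hypothetical helper)."""
    try:
        from google.colab import userdata  # only importable inside Colab
        try:
            return userdata.get("OPENAI_API_KEY")
        except Exception:
            # e.g. the secret does not exist or notebook access was not granted
            pass
    except ImportError:
        pass  # not running in Colab
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY as a Colab secret or environment variable.")
    return key
```

In a setup cell you would then run `os.environ["OPENAI_API_KEY"] = get_openai_key()`.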

# Setting things up

First we install the required packages and import the libraries and secrets. If you haven't given this notebook access to your secrets yet, Colab asks for it when you run the *Initialize things* cell. Next we upload a ZIP file to Colab and extract it. Finally, we read the extracted files into documents. When the index is built in the last cell, LlamaIndex splits these documents into chunks (nodes) and embeds them.
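
The splitting step can be sketched with a sliding window. This toy version measures chunks in words rather than tokens. LlamaIndex's `SentenceSplitter`, the default node parser, counts tokens and tries to respect sentence boundaries. The hypothetical `split_into_chunks` moves a window of `chunk_size` words forward by `chunk_size - chunk_overlap` words at a time, so neighbouring chunks share some context. Each chunk would then be embedded and stored as a node.

```python
def split_into_chunks(text: str, chunk_size: int = 8, chunk_overlap: int = 2) -> list[str]:
    """Split text into overlapping word windows (simplified stand-in for a node parser)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


text = "one two three four five six seven eight nine ten eleven twelve"
for chunk in split_into_chunks(text, chunk_size=5, chunk_overlap=2):
    print(chunk)
```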

In [None]:
#@title Initialize things

%pip install llama-index docx2txt
# %pip install torch transformers python-pptx Pillow  # optional: needed to read some formats such as .pptx
%pip install llama-index-readers-web llama-index-program-openai llama-index-llms-openai

import os

import nest_asyncio
import openai
from google.colab import userdata

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI

# The imports below are not used in the cells that follow.
from typing import Any, Dict, List
from pydantic import BaseModel, Field
from llama_index.core.extractors import PydanticProgramExtractor
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.program.openai import OpenAIPydanticProgram

# Colab already runs an event loop; nest_asyncio lets LlamaIndex's async code run inside it.
nest_asyncio.apply()

# Read the key from Colab secrets and expose it to the openai module and as an
# environment variable, which LlamaIndex's OpenAI classes read by default.
api_key = userdata.get('OPENAI_API_KEY')
openai.api_key = api_key
os.environ["OPENAI_API_KEY"] = api_key

In [None]:
#@title Upload a zip-file with documents
#@markdown supported extensions are listed here: https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/

from google.colab import files
import zipfile
import os

# Upload the ZIP file
uploaded = files.upload()  # Select and upload the ZIP file

# Assuming there's only one ZIP file uploaded, get its filename
zip_filename = next(iter(uploaded.keys()))

# Extract the ZIP file
with zipfile.ZipFile(zip_filename, 'r') as zip_ref:
    zip_ref.extractall()

print("Folder structure has been extracted.")
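
Without arguments, `extractall()` unpacks the archive into the current working directory, which is `/content` in Colab. The folder you paste into the next cell therefore depends on how the ZIP was created. The sketch below uses a dedicated target folder instead. The hypothetical `extract_zip` unpacks into `target_dir`; the default `/content/data` is an arbitrary choice. It also skips the `__MACOSX/` metadata entries that archives created with macOS Finder often contain. It returns the files under the target folder so you can see which path to use.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path: str, target_dir: str = "/content/data") -> list[Path]:
    """Extract zip_path into target_dir and return all files under target_dir (hypothetical helper)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as zf:
        # Skip macOS Finder metadata entries.
        members = [m for m in zf.namelist() if not m.startswith("__MACOSX/")]
        zf.extractall(target, members=members)
    # Return regular files only, not the directories created along the way.
    return sorted(p for p in target.rglob("*") if p.is_file())
```

Usage in the upload cell could be `for path in extract_zip(zip_filename): print(path)`.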


In [None]:
#@title Load data

# @markdown If you're working in Colab, open the Files panel by clicking the folder icon on the left. Right-click the extracted folder (or click its three dots), choose Copy path, and paste the path below. Tick the recursive box if the folder has subfolders whose files should also be loaded.

directory = '/content/bronnen_in_bytes'#@param {type:"string"}

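reader = SimpleDirectoryReader(
    input_dir=directory,
    recursive=False # @param {type:"boolean"}
)

all_docs = []

# iter_data() yields the documents of one file at a time.
for docs in reader.iter_data():
    for doc in docs:
        # Example per-document transformation: convert the text to uppercase.
        # Remove this line to keep the original casing.
        doc.text = doc.text.upper()
        print(doc.text)
        all_docs.append(doc)

print(f"Loaded {len(all_docs)} documents.")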
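
The sketch below approximates which files the reader picks up, using a `pathlib` walk. Without `recursive`, only files directly inside `directory` are considered. With it, every subfolder is searched too. `SimpleDirectoryReader` skips hidden files by default and can restrict file types through its `required_exts` parameter. The hypothetical `list_input_files` mimics both behaviours.

```python
from pathlib import Path
from typing import List, Optional


def list_input_files(directory: str, recursive: bool = False,
                     required_exts: Optional[List[str]] = None) -> List[Path]:
    """Approximate which files a directory reader would load (hypothetical helper).

    required_exts should contain lowercase extensions such as ".pdf".
    """
    root = Path(directory)
    candidates = root.rglob("*") if recursive else root.glob("*")
    files = []
    for path in candidates:
        rel_parts = path.relative_to(root).parts
        if not path.is_file() or any(part.startswith(".") for part in rel_parts):
            continue  # skip folders and hidden files or folders
        if required_exts and path.suffix.lower() not in required_exts:
            continue  # skip file types that were not requested
        files.append(path)
    return sorted(files)
```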

In [27]:
#@title Chat Away!

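llm = OpenAI(model="gpt-4")
# Building the index is where the documents are split into nodes and embedded.
index = VectorStoreIndex.from_documents(all_docs)

# "best" wraps the index in an OpenAI or ReAct agent, depending on what the LLM supports.
chat_engine = index.as_chat_engine(chat_mode="best", llm=llm, verbose=True)

prompt = "What happened to the Gemeenteblad?" #@param {type:"string"}

response = chat_engine.chat(prompt)
print(response)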
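
A chat engine differs from a single query because it keeps the conversation history. Follow-up questions can therefore refer to earlier turns. The class below is a hypothetical illustration of a bounded history buffer. It counts words, whereas a real memory buffer such as LlamaIndex's `ChatMemoryBuffer` uses a token limit.

```python
from collections import deque


class ChatHistory:
    """Keep the most recent chat turns within a word budget (hypothetical illustration)."""

    def __init__(self, word_limit: int = 50):
        self.word_limit = word_limit
        self.turns: deque[tuple[str, str]] = deque()

    def _word_count(self) -> int:
        return sum(len(text.split()) for _, text in self.turns)

    def add(self, role: str, text: str) -> None:
        self.turns.append((role, text))
        # Drop the oldest turns until the history fits the budget again,
        # always keeping at least the newest turn.
        while len(self.turns) > 1 and self._word_count() > self.word_limit:
            self.turns.popleft()

    def reset(self) -> None:
        self.turns.clear()
```

In the notebook, `chat_engine.reset()` clears the engine's conversation history before you start a new topic.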