# Automated article writing about your project with ChatGPT in fewer than 100 lines of code!

This Jupyter Notebook showcases an efficient method to automate article writing for your project using ChatGPT and PyDoxTools in fewer than 100 lines of code. The script demonstrates the following key functionalities:

- Indexing a directory containing files with PyDoxTools
- Employing an agent for information retrieval within those files
- Auto-generating a text based on a set objective

You can execute this notebook or simply refer to our concise script, which covers these steps in fewer than 100 lines of code:

https://github.com/Xyntopia/pydoxtools/blob/main/examples/automatic_project_writing.py


Or open this notebook in Colab:

https://colab.research.google.com/github/Xyntopia/pydoxtools/blob/main/examples/automated_blog_writing.ipynb

## Costs

ChatGPT is a paid service. Running this script once will cost you about 2-5 cents. We are working on an implementation that uses Alpaca/GPT4All and similar models, which can do the same for free, locally on your computer. Pydoxtools automatically caches all calls to ChatGPT, so subsequent runs usually turn out to be a little cheaper.
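Caching of this kind is commonly done by keying stored responses on the request. The following is a minimal sketch of that idea, not pydoxtools' actual implementation; `cached_call`, the in-memory `_cache` dictionary, and the `backend` callable are hypothetical names chosen for illustration.

```python
import hashlib
import json
from typing import Callable

# Hypothetical in-memory cache; a real one would usually live on disk.
_cache: dict[str, str] = {}


def cached_call(prompt: str, backend: Callable[[str], str], **params) -> str:
    """Return a stored response for an identical request, else call the backend."""
    key_source = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
    key = hashlib.sha256(key_source.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = backend(prompt)  # only this branch would cost money
    return _cache[key]


calls = []


def fake_backend(prompt: str) -> str:
    """Stand-in for a paid API; records every call it receives."""
    calls.append(prompt)
    return prompt.upper()


cached_call("hello", fake_backend, max_tokens=10)
cached_call("hello", fake_backend, max_tokens=10)
print(len(calls))  # the backend ran only once
```

A repeated request with identical parameters is served from the cache, while any change to the prompt or parameters produces a new key and a new paid call.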

### API key for ChatGPT

Generate an OpenAI API key for ChatGPT here: https://platform.openai.com/account/api-keys. You need to register an account for this.

!!! Important !!!  Do not share the API key with anybody. To be on the safe side, save the API key in a file here in Colab. This way you can share the notebook without sharing the API key.

Execute the cell below to create the file. The notebook will later open this file to access the API key. When the colab runtime gets automatically deleted, this file will also be deleted.


In [None]:
!touch /content/openai_api_key


Then click this link: /content/openai_api_key and copy and paste your key into the file.

## Installation

After installation, go to Runtime -> Restart runtime in order to load the newly installed libraries into Jupyter.


In [None]:
%%capture
!pip install -U pydoxtools[etl,inference,all]==0.6.2

In [None]:
# load the key from the file and expose it as an environment variable
import os

with open('/content/openai_api_key') as f:
    # strip the trailing newline that editors usually add to the file
    os.environ['OPENAI_API_KEY'] = f.read().strip()
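A key file edited by hand often ends with a newline, or may still be empty if the paste step was skipped. The following is a minimal sketch of a more defensive loader; `load_api_key` and `export_api_key` are hypothetical helpers, not part of pydoxtools or the OpenAI client.

```python
import os
from pathlib import Path


def load_api_key(path: str) -> str:
    """Read an API key from a text file and reject an empty file.

    Hypothetical helper for illustration; not part of pydoxtools.
    """
    key = Path(path).read_text().strip()  # drop trailing newline/whitespace
    if not key:
        raise ValueError(f"No API key found in {path}")
    return key


def export_api_key(path: str, env_var: str = "OPENAI_API_KEY") -> None:
    """Store the key read from `path` in the given environment variable."""
    os.environ[env_var] = load_api_key(path)
```

In this notebook it would be called as `export_api_key('/content/openai_api_key')`, failing early with a clear message instead of later with an authentication error.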

Now we can initialize pydoxtools, which automatically loads our API key from the environment variable.

In [None]:
import logging

import dask
from chromadb.config import Settings

import pydoxtools as pdx
from pydoxtools import agent as ag
from pydoxtools.settings import settings

logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
logging.getLogger("pydoxtools.document").setLevel(logging.INFO)

## Configuration

Set the dask scheduler to "synchronous" so that we can see everything that's happening locally, and turn on caching for pydoxtools for faster repeated execution.

In [None]:
# pydoxtools.DocumentBag uses a dask scheduler for parallel computing
# in the background. For easier debugging, we set this to "synchronous"
dask.config.set(scheduler='synchronous')
# dask.config.set(scheduler='multiprocessing') # can also be used...

settings.PDX_ENABLE_DISK_CACHE = True  # turn on caching for pydoxtools

## Download our project

You could also mount a Google Drive here, or simply load a folder on your computer if you're running this notebook locally.

In [None]:
# "!cd" runs in a subshell and does not persist; "%cd" changes the notebook's working directory
%cd /content
!git clone https://github.com/Xyntopia/pydoxtools.git

## Index initialization

Set our vectorstore settings. We are using chromadb here.

In [None]:
##### Use chromadb as a vectorstore #####
chroma_settings = Settings(
    chroma_db_impl="duckdb+parquet",
    persist_directory=str(settings.PDX_CACHE_DIR_BASE / "chromadb"),
    anonymized_telemetry=False
)
collection_name = "blog_index"

# Create our source of information: a "pydoxtools.DocumentBag", which holds
# a list of pydoxtools.Document objects. Here we choose to use
# pydoxtools itself as the information source!
root_dir = "/content/pydoxtools"
ds = pdx.DocumentBag(
    source=root_dir,
    exclude=[  # ignore some files which make the indexing rather inefficient
        '.git/', '.idea/', '/node_modules', '/dist',
        '/__pycache__/', '.pytest_cache/', '.chroma', '.svg', '.lock',
        "/site/"
    ],
    forgiving_extracts=True
)
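To see what such an exclusion list does, here is a minimal sketch that filters file paths by plain substring matching. How `DocumentBag` matches its `exclude` entries internally is not shown in this notebook, so substring matching is an assumption, and `is_excluded` as well as the example paths are hypothetical.

```python
EXCLUDE = [
    ".git/", ".idea/", "/node_modules", "/dist",
    "/__pycache__/", ".pytest_cache/", ".chroma", ".svg", ".lock",
    "/site/",
]


def is_excluded(path: str, patterns: list[str] = EXCLUDE) -> bool:
    """Return True if any pattern occurs anywhere in the path (assumed semantics)."""
    return any(pattern in path for pattern in patterns)


# Hypothetical example paths, for illustration only.
paths = [
    "/content/pydoxtools/README.md",
    "/content/pydoxtools/.git/config",
    "/content/pydoxtools/docs/images/logo.svg",
    "/content/pydoxtools/pydoxtools/document.py",
    "/content/pydoxtools/poetry.lock",
]
kept = [p for p in paths if not is_excluded(p)]
print(kept)
```

Under this assumption, version-control internals, images, and lock files are skipped, which keeps the index small and focused on text that is useful for writing.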


## Initialize agent, give it a writing objective and compute the index

For pydoxtools, as in this example, this will take about 5-10 minutes and load about 4000 text snippets into our vector index for the [pydoxtools](https://github.com/Xyntopia/pydoxtools) project.

In [None]:
final_result = []

agent = ag.Agent(
    vector_store=chroma_settings,
    objective="Write a blog post, introducing a new library (which was developed by us, "
              "the company 'Xyntopia') to "
              "visitors of our corporate webpage, which might want to use the pydoxtools library but "
              "have no idea about programming. Make sure, the text is about half a page long.",
    data_source=ds
)
agent.pre_compute_index()
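The index computed here is what lets the agent find snippets relevant to a question. As a toy illustration of that idea, the sketch below ranks snippets by the cosine similarity of word-count vectors. It is only a stand-in under stated assumptions: the notebook's real index uses chromadb, and `embed`, `cosine`, and `search` are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lower-cased word counts (real systems use learned vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, snippets: list[str], k: int = 2) -> list[str]:
    """Return the k snippets most similar to the query."""
    q = embed(query)
    ranked = sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:k]


snippets = [
    "pydoxtools is a python library for document analysis",
    "install the package with pip",
    "the agent answers questions using a vector index",
]
print(search("which python library analyzes documents", snippets, k=1))
```

Word counts only match exact words, which is why production vector stores rely on learned embeddings that also capture related terms.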

## Search for relevant Information

First answer a basic question, to get the algorithm started more quickly...

In [None]:

# first, add a basic answer, to get the algorithm started a bit more quickly :) 
# we could gather this information from a user in a "real app"
agent.add_question(question="Can you please provide the main topic of the project or some primary "
                            "keywords related to the project, "
                            "to help with identifying the relevant files in the directory?",
                    answer="python library, AI, pipelines")


Then search for more answers in our index:

In [None]:
# first, gather some basic information...
questions = agent.execute_task(
  task="What additional information do you need to create a first, very short outline as a draft? " \
        "provide it as a ranked list of questions", save_task=True)
# we only use the first 5 questions to make it faster ;).
agent.research_questions(questions[:5], allowed_documents=["text/markdown"])


Now write the text:

In [None]:
txt = agent.execute_task(task="Complete the overall objective, formulate the text "
                              "based on answered questions and format it in markdown.",
                          context_size=20, max_tokens=1000, formatting="txt")
final_result.append(txt)  # add a first draft to the result

# f-strings are needed here so that {txt} is replaced with the current draft
critique = agent.execute_task(task=f"Given this text:\n\n```markdown\n{txt}\n```"
                                    "\n\nlist 5 points of critique about the text",
                              context_size=0, max_tokens=1000)

tasks = agent.execute_task(
    task=f"Given this text:\n\n```markdown\n{txt}\n```\n\n"
          f"and its critique: {critique}\n\n"
          "Generate instructions that would make it better. "
          "Sort them by importance and return it as a list of tasks",
    context_size=0, max_tokens=1000)

for t in tasks:
    task = "Given this text:\n\n" \
            f"```markdown\n{txt}\n```\n\n" \
            f"Make the text better by executing this task: '{t}' " \
            f"and integrate it into the given text, but keep the overall objective in mind."
    txt = agent.execute_task(task, context_size=10, max_tokens=1000, formatting="markdown")
    final_result.append([task, txt])
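The cell above follows a draft, critique, revise pattern. Below is a minimal, library-independent sketch of that control flow. `refine` and the `llm` callable are hypothetical, and the stub `fake_llm` only returns canned strings so that the loop can run without an API key.

```python
from typing import Callable


def refine(draft: str, llm: Callable[[str], str], max_tasks: int = 3) -> list[str]:
    """Run one critique pass, then apply each resulting task to the text.

    Returns every version of the text, starting with the original draft.
    """
    versions = [draft]
    critique = llm(f"Given this text:\n\n{draft}\n\nlist points of critique about the text")
    tasks_raw = llm(
        f"Given this text:\n\n{draft}\n\nand its critique: {critique}\n\n"
        "Return one improvement task per line, most important first"
    )
    tasks = [line.strip() for line in tasks_raw.splitlines() if line.strip()]
    text = draft
    for task in tasks[:max_tasks]:
        # each revision starts from the latest version, not the first draft
        text = llm(f"Given this text:\n\n{text}\n\nMake it better by executing this task: '{task}'")
        versions.append(text)
    return versions


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a chat model, for demonstration only."""
    if "Return one improvement task per line" in prompt:
        return "shorten the intro\nadd an example"
    if "list points of critique" in prompt:
        return "too long; no example"
    return f"revised({len(prompt)} chars)"


versions = refine("first draft", fake_llm)
print(len(versions))
```

Keeping every intermediate version, as `final_result` does in the notebook, makes it easy to inspect how each task changed the text.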


In [None]:
# for debugging, you can see all intermediate results, simply uncomment the variable to check:

#final_result  # for the evolution of the final text
#agent._debug_queue  # in order to check all requests made to llms and vectorstores etc...

## Final text

After all the processing, here is the final text:

In [None]:
print(txt)