# Creating a full project

## Example: Image Scribe

In our example, we will create a Python pipeline that extracts images from PDFs, transcribes them to text using OCR, and stores the text in a database. This application is particularly useful for PDFs that store content as images, such as scans of books or receipts.
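
To make the target concrete, here is a minimal sketch of the pipeline's shape. It is not the code SimpleCoder will generate: `PageText`, `run_pipeline`, and the three stage callables (`render_pages`, `ocr`, `store`) are hypothetical names, and the stages are passed in as plain functions so the sketch runs without any PDF, OCR, or database library.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class PageText:
    """One transcribed page (hypothetical record shape)."""
    pdf_name: str
    page_number: int
    text: str


def run_pipeline(
    pdf_names: Iterable[str],
    render_pages: Callable[[str], Iterable[bytes]],  # PDF name -> page images
    ocr: Callable[[bytes], str],                     # page image -> text
    store: Callable[[PageText], None],               # persist one record
) -> List[PageText]:
    """Run extract -> OCR -> store for every page of every PDF."""
    records = []
    for pdf_name in pdf_names:
        for page_number, image in enumerate(render_pages(pdf_name), start=1):
            record = PageText(pdf_name, page_number, ocr(image))
            store(record)
            records.append(record)
    return records


# Stand-in stages: fake "images" are bytes, fake "OCR" just decodes them.
saved = []
pages = run_pipeline(
    ["scan.pdf"],
    render_pages=lambda name: [b"page one", b"page two"],
    ocr=lambda image: image.decode(),
    store=saved.append,
)
```

In the real project each callable would be backed by a generated module, for example one that writes to MongoDB in place of `saved.append`.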

### Building in stages

We are going to create a whole project using SimpleCoder and this Jupyter notebook. The project consists of multiple interdependent files, which must use classes, function signatures, and so on consistently.

To do this, we are going to direct SimpleCoder to build out the project in several stages. Each stage will take input from the previous stages and add new details and features. 
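
One way to picture the staged build is as a list of stages in which each stage may only read files produced by an earlier one. The sketch below uses the `output_file_name` and `input_file_list` keys that appear in the agendas in this notebook. The `STAGES` list and the `check_stage_order` helper are hypothetical, and the empty `input_file_list` on the first stage is an assumption made for illustration.

```python
# Hypothetical stage plan mirroring the first three steps of this notebook.
STAGES = [
    {"output_file_name": "about.md", "input_file_list": []},
    {"output_file_name": "files.md", "input_file_list": ["about.md"]},
    {
        "output_file_name": "files_details.json",
        "input_file_list": ["about.md", "files.md"],
    },
]


def check_stage_order(stages):
    """Return (stage, input) pairs where an input is used before it exists."""
    produced = set()
    problems = []
    for stage in stages:
        for name in stage["input_file_list"]:
            if name not in produced:
                problems.append((stage["output_file_name"], name))
        produced.add(stage["output_file_name"])
    return problems


ordering_problems = check_stage_order(STAGES)
```

An empty result means every stage only depends on earlier outputs, which is the property the rest of this notebook relies on.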


### Setup

In [None]:
from simplecoder import SimpleCoder

C_WORKING_DIR = "~/Documents/projects/image-scribe"

### Use the git, young Padawan

Git will make it super easy for us to review the code that SimpleCoder generates at each stage.

In [None]:
# Each `!` command runs in its own subshell, so `cd` and `git init` must be chained.
!cd {C_WORKING_DIR} && git init
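
To review each stage as its own diff, it can help to commit after every SimpleCoder run. The following is a sketch only: `git_commands` and `commit_stage` are hypothetical helpers, the commit message format is arbitrary, and `git commit` assumes that `user.name` and `user.email` are already configured.

```python
import os
import subprocess


def git_commands(stage_name):
    """Build the git commands that snapshot the working tree after one stage."""
    return [
        ["git", "add", "--all"],
        # --allow-empty keeps the stage history complete even if nothing changed.
        ["git", "commit", "--allow-empty", "-m", f"SimpleCoder stage: {stage_name}"],
    ]


def commit_stage(working_dir, stage_name):
    """Run the snapshot commands inside working_dir (a leading ~ is expanded)."""
    cwd = os.path.expanduser(working_dir)
    for command in git_commands(stage_name):
        subprocess.run(command, cwd=cwd, check=True)
```

After a stage finishes, something like `commit_stage(C_WORKING_DIR, "about.md")` followed by `git show` in the project directory displays exactly what that stage produced.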

## Step 1: about.md

### Create a project plan

This is our first step. The key here is to expand a brief description of your project into a fuller project plan.

We will then use this about.md file as a backdrop for most of our other commands, to give our agent the proper context for task execution.

### "Prompt Engineering" - how to use words good

This portion of the task instruction is key: **for a developer to implement**

It gives the LLM a very specific perspective and goal to follow when writing the plan: a developer will read it and will need certain information to do the work.
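
A small sketch of how the audience phrase can be treated as a parameter of the prompt. `build_requirements` is a hypothetical helper, not part of SimpleCoder; the instruction wording is taken from this notebook.

```python
def build_requirements(description, audience="a developer to implement"):
    """Join a project description and an audience-specific task instruction."""
    instruction = (
        f"Write a description of the pipeline for {audience}. "
        "Use markdown, diagrams, text, or whatever you need to communicate the pipeline."
    )
    # A single space keeps the two sentences from running together.
    return f"{description.strip()} {instruction}"


developer_prompt = build_requirements("I need to build a python pipeline.")
manager_prompt = build_requirements(
    "I need to build a python pipeline.", audience="a project manager to review"
)
```

Swapping the audience is a cheap experiment: the same description can yield a noticeably different document, which is the effect this section describes.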

In [None]:
# Breaking these two parts out for clarity:

project_description="I need to build a python pipeline to process all PDFs in a directory, extract all pages from the pdf as images, then do ocr on all the images to find text, and then save the text to a mongodb database."

task_instruction="Write a description of the pipeline for a developer to implement. Use markdown, diagrams, text, or whatever you need to communicate the pipeline."

agenda = {
    # Join with a space so the two sentences do not run together.
    "requirements": project_description + " " + task_instruction,
    "output_file_name": "about.md",
    "working_dir": C_WORKING_DIR
}

# Create and run the agent
coder = SimpleCoder(**agenda)
print(await coder.run())

### Read before you do the deed

General rule of thumb: Don't blindly trust AI output.

Let's take a look at this output before we start creating files. If we want, we can edit our instructions and try again.



In [None]:
!cat {C_WORKING_DIR}/"about.md"
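
Reading the output yourself is the main check, but a quick automated sanity test can catch an obviously off-topic plan. This is a rough sketch: `missing_terms` is a hypothetical helper, and the list of required terms is an assumption based on the project description above.

```python
def missing_terms(text, required_terms):
    """Return the required terms that never appear in text (case-insensitive)."""
    lowered = text.lower()
    return [term for term in required_terms if term.lower() not in lowered]


# Sample text standing in for the contents of about.md.
sample_plan = "The pipeline extracts PDF pages as images and runs OCR on them."
gaps = missing_terms(sample_plan, ["pdf", "ocr", "mongodb"])
```

Here `gaps` contains `"mongodb"`, a hint that the plan may have dropped the storage step and the instructions are worth rerunning or editing.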

## Step 2: files.md

### Create list of project resources

In [None]:

agenda = {
    "requirements": "You are a project manager. Read the file about.md and create a list of files that need to be created by a developer, in chronological order.",
    "output_file_name": "files.md",
    "input_file_list": ["about.md"],
    "working_dir": C_WORKING_DIR,
    "force_code": False
}

# Create a coder agent using the agenda
coder = SimpleCoder(**agenda)

# Run the agent
await coder.run()

### Review

Check the list of files we are going to be creating and remove anything extra like placeholders.

In [None]:
!cat {C_WORKING_DIR}/"files.md"
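
If the list is long, the cleanup can be partly scripted. The sketch below assumes `files.md` is a markdown list with one file name at the start of each item, which is not guaranteed since the format depends on what the LLM produced. `list_file_names` and the `skip_words` default are hypothetical.

```python
import re

# A bullet or numbered list item whose first token looks like a file name.
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+`?([\w./-]+\.\w+)`?")


def list_file_names(markdown_text, skip_words=("placeholder", "example")):
    """Collect unique file names from list items, skipping unwanted entries."""
    names = []
    for line in markdown_text.splitlines():
        match = LIST_ITEM.match(line)
        if not match:
            continue  # headings and prose are ignored
        name = match.group(1)
        if any(word in name.lower() for word in skip_words):
            continue
        if name not in names:
            names.append(name)
    return names


sample = (
    "# Files\n"
    "1. `config.py`\n"
    "2. ocr.py: runs OCR\n"
    "- placeholder.txt\n"
    "3. ocr.py\n"
    "Some prose about main.py\n"
)
cleaned = list_file_names(sample)
```

This only assists the manual review; the resulting list should still be compared against `about.md` by eye.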

## Step 3: files_details.json

### Feed forward loop

An important aspect of our construction mechanism is to include previous outputs in our new request. 

Note the use of `input_file_list` for this purpose.

The metadata we create in this step will be fed into the following steps in the same way.

In [None]:
agenda = {
    "requirements": """
    Read the file about.md with the project description, then the file files.md with the list of files we will need, and create a file with a detailed description of each file that needs to be created.
    Include all necessary function signatures and docstrings in the description.
    List them in the order in which they need to be created, with the dependencies first.
    Use json format like this {"files":[{"name":"filename.ext","description":"description of file"}]}.""",
    "output_file_name": "files_details.json",
    "input_file_list": ["about.md", "files.md"],
    "working_dir": C_WORKING_DIR,
    "force_code": False
}

# Create a coder agent using the agenda
coder = SimpleCoder(**agenda)

# Run the agent
await coder.run()

In [None]:
!cat {C_WORKING_DIR}/files_details.json
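
Because the next step parses this file, it is worth confirming that it matches the requested shape, `{"files":[{"name": ..., "description": ...}]}`, before moving on. A minimal sketch, with `validate_file_details` as a hypothetical helper:

```python
import json


def validate_file_details(raw_json):
    """Return a list of problems found in the files_details.json content."""
    data = json.loads(raw_json)  # raises json.JSONDecodeError on invalid JSON
    files = data.get("files") if isinstance(data, dict) else None
    if not isinstance(files, list) or not files:
        return ["top-level 'files' must be a non-empty list"]
    problems = []
    for index, entry in enumerate(files):
        for key in ("name", "description"):
            if not isinstance(entry, dict) or not entry.get(key):
                problems.append(f"files[{index}] is missing '{key}'")
    return problems


good = '{"files":[{"name":"ocr.py","description":"Runs OCR on page images"}]}'
bad = '{"files":[{"name":"ocr.py"}]}'
good_problems = validate_file_details(good)
bad_problems = validate_file_details(bad)
```

If the model wrapped its answer in markdown fences or added commentary, `json.loads` will raise instead; in that case edit the file by hand or rerun the step.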

## Step 4: Create All Project Files

We are going to add some extra `voodoo` here to iterate over the list of files we created before and run SimpleCoder for each one.

In [None]:
# read the json file
import json
import os

file_path = os.path.expanduser(os.path.join(C_WORKING_DIR,"files_details.json"))

with open(file_path, "r") as f:
    data = json.load(f)

# Stop early if the "files" list is missing or empty
if not data.get('files'):
    raise ValueError("Could not read files from json file")

# Make sure to include our project description
file_names_all=["about.md"]

# for each file instance, create a file with the name and description
for file in data['files']:
    print(f"working on file: {file['name']}")
    description = json.dumps(file, indent=4)

    agenda = {
        "requirements": f"{description}",
        "output_file_name": file['name'],
        "input_file_list": file_names_all,
        "working_dir": C_WORKING_DIR,
        "force_code": True
    }

    # Create a coder agent using the agenda
    coder = SimpleCoder(**agenda)

    # Run the agent
    await coder.run()

    # Add the file to our list for the next input stage
    file_names_all.append(file['name'])

print("DONE")
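
Once the loop finishes, a cheap first check is whether the generated Python files parse at all. The sketch below uses the standard library's `ast.parse`; `syntax_errors` is a hypothetical helper, and it takes file contents as a dictionary so it can be tried without touching the project directory.

```python
import ast


def syntax_errors(sources):
    """Map file name -> error message for every source that fails to parse."""
    errors = {}
    for name, source in sources.items():
        try:
            ast.parse(source, filename=name)
        except SyntaxError as exc:
            errors[name] = f"line {exc.lineno}: {exc.msg}"
    return errors


# Two tiny stand-in files: one valid, one with a broken signature.
report = syntax_errors({
    "ok.py": "x = 1\n",
    "bad.py": "def f(:\n    pass\n",
})
```

Parsing says nothing about whether the code is correct or whether the files agree on function signatures, so this complements the git review and does not replace it.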

## Step 5: Iterating and Refactoring

It is inevitable that the generated code will need fixes, changes in behavior, or new features. Refactoring with an LLM makes it easy to apply a lot of changes very quickly.

### Save that brain energy for the hard stuff!

We can use a refactoring routine as follows:

In [None]:
agenda = {
    "requirements": "Refactor this file to accept all paths as parameters.",
    "output_file_name": "ocr.py",
    "working_dir": C_WORKING_DIR,
    "force_code": True
}

# Create a coder agent using the agenda
coder = SimpleCoder(**agenda)

# Run the agent
await coder.run()
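
The same routine extends to several files by building one agenda per file. This sketch only constructs the dictionaries, using the keys from the refactoring cell above. `refactor_agendas` is a hypothetical helper, and `pdf_to_images.py` is a made-up file name for illustration.

```python
def refactor_agendas(file_names, instruction, working_dir):
    """Build one refactoring agenda per file, all sharing the same instruction."""
    return [
        {
            "requirements": instruction,
            "output_file_name": name,
            "working_dir": working_dir,
            "force_code": True,
        }
        for name in file_names
    ]


agendas = refactor_agendas(
    ["ocr.py", "pdf_to_images.py"],  # second name is hypothetical
    "Refactor this file to accept all paths as parameters.",
    "~/Documents/projects/image-scribe",
)
```

Each agenda can then be run as before, with `await SimpleCoder(**agenda).run()` inside a loop, ideally committing between runs so each refactor can be reviewed on its own.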