# formalyzer

> Analyze PDF and web forms and fill in the forms

## Description
Formalyzer scrapes the text from a PDF recommendation letter and, for each URL in `url_list`, it will:

- launch a browser tab for that URL
- fill in the form using what the LLM has gleaned from the recommendation letter
- attach the PDF via the form's upload/attachment button

...and do no more. 

The user will need to review the page and press the Submit button manually.
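
The workflow above could be sketched roughly as follows. This is a minimal illustration, not formalyzer's actual implementation: the function names (`extract_letter_text`, `plan_actions`, `fill_in_tab`), the use of element `id` attributes as selectors, and the `field_values` mapping (standing in for whatever the LLM returns) are all assumptions. It reads the letter with `pypdf` and attaches to an already-running Chrome through Playwright's `connect_over_cdp`.

```python
def extract_letter_text(pdf_path):
    """Return the text of every page in the PDF, joined by newlines."""
    from pypdf import PdfReader  # imported lazily so the sketch loads without pypdf

    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def plan_actions(field_values, upload_id, pdf_path):
    """Turn a {field id: value} mapping into an ordered list of browser actions.

    `field_values` and `upload_id` are hypothetical inputs; ids containing
    double quotes are not escaped in this sketch.
    """
    actions = [("fill", f'[id="{fid}"]', str(val)) for fid, val in field_values.items()]
    actions.append(("upload", f'[id="{upload_id}"]', str(pdf_path)))
    return actions  # deliberately no "submit" action: the user presses Submit


def fill_in_tab(url, actions, cdp_url="http://localhost:9222"):
    """Open `url` in a new tab of the running Chrome and apply the actions."""
    from playwright.sync_api import sync_playwright  # lazy import, as above

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(cdp_url)
        context = browser.contexts[0]  # the default context of the running Chrome
        page = context.new_page()
        page.goto(url)
        for kind, selector, value in actions:
            if kind == "fill":
                page.fill(selector, value)
            elif kind == "upload":
                page.set_input_files(selector, value)
```

Keeping the planning step separate from the browser step means the list of actions can be inspected, or printed in a debug mode, before anything touches the page.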


### Requirements
- Either `ollama` installed locally or the `ANTHROPIC_API_KEY` environment variable set
- `beautifulsoup4`, `playwright`, `claudette`, `lisette`, `pypdf`, `fastcore`

## Usage

On macOS, start Chrome with remote debugging enabled on port 9222 by running this command in the terminal:
```bash
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug
```
Then you can run this command: 
```bash
formalyzer --debug <recc_info.txt> <recc_letter.pdf> <url_list.txt>
```
where `recc_info.txt` contains information about the recommender (name, title, address, phone number, and email), `recc_letter.pdf` is the recommendation letter, and `url_list.txt` is a file containing one URL per line.
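
As an illustration of the URL-list format, a file with one URL per line could be read like this. The helper name `read_url_list` is hypothetical, and skipping blank lines and trimming surrounding whitespace are assumptions, not documented formalyzer behavior.

```python
from pathlib import Path


def read_url_list(path):
    """Return the non-empty lines of `path`, stripped of surrounding whitespace."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

Running a helper like this on your own list before launching formalyzer is a quick way to confirm that each entry is on its own line.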

### Installation

Install latest from the GitHub [repository][repo]:

```sh
$ pip install git+https://github.com/drscotthawley/formalyzer.git
```

or from [conda][conda]

```sh
$ conda install -c drscotthawley formalyzer
```

or from [pypi][pypi]


```sh
$ pip install formalyzer
```


[repo]: https://github.com/drscotthawley/formalyzer
[docs]: https://drscotthawley.github.io/formalyzer/
[pypi]: https://pypi.org/project/formalyzer/
[conda]: https://anaconda.org/drscotthawley/formalyzer


After installing, run `playwright install chromium` to download the browser binaries.

## Demo
This demo uses the data in `example/`. On macOS, from the main `formalyzer` package directory:

1. Start up Chrome: `/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug`
1. Launch a local web server: `python -m http.server 8000 --directory example/`
1. Set your `ANTHROPIC_API_KEY` shell environment variable.
1. Run the script: `formalyzer --debug example/recc_info.txt example/sample_letter.pdf example/sample_urls.txt`

## Local LLM Execution
For FERPA compliance, running a local model is preferable so that student data is not sent to a third party. I recommend using `ollama` and starting with something medium-small like `qwen2.5:14b` (9 GB). Start up ollama:
```bash
ollama serve & 
ollama pull qwen2.5:14b 
```
Then you can use the `--model` CLI flag, e.g. 
```bash
formalyzer --debug --model 'ollama/qwen2.5:14b' example/recc_info.txt example/sample_letter.pdf example/sample_urls.txt
```
The quality of the form-filling will vary with the quality and size of the model you choose. Smaller models like `mistral` (4 GB) may hallucinate many of the form field IDs, resulting in a mostly blank form. For a huge (41 GB) model, try `ollama/qwen2:72b`.
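
One way to see how much a given model is hallucinating is to compare the field IDs it proposes against the IDs actually present in the page. The sketch below is an illustration only, not formalyzer's own code: `split_known_fields` is a hypothetical helper, and it uses the standard library's `html.parser` (rather than `beautifulsoup4`) so that it runs without extra dependencies.

```python
from html.parser import HTMLParser


class _FieldIdCollector(HTMLParser):
    """Collect the id attribute of every input, select, and textarea element."""

    FORM_TAGS = {"input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        if tag in self.FORM_TAGS:
            field_id = dict(attrs).get("id")
            if field_id:
                self.ids.add(field_id)


def split_known_fields(html, proposed):
    """Split a {field id: value} mapping into (ids found in html, ids not found)."""
    collector = _FieldIdCollector()
    collector.feed(html)
    known = {fid: val for fid, val in proposed.items() if fid in collector.ids}
    unknown = sorted(set(proposed) - collector.ids)
    return known, unknown
```

A long `unknown` list relative to `known` suggests the model is inventing IDs, which is a sign that a larger model may be needed.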

## Developer Guide

### Install formalyzer in Development mode

```sh
# make sure formalyzer package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# compile to have changes apply to formalyzer
$ nbdev_prepare
```

### Documentation

Documentation is hosted on this GitHub [repository][repo]'s [pages][docs]. Package-manager-specific guidelines are available on [conda][conda] and [pypi][pypi].


## TODO
- Test with a less-than-superlative recommendation letter, to make sure it's not just always selecting the top rating(s).