
# Alhazen

> An intelligent agent to help read and understand scientific research based on extant knowledge (i.e., what is already known and reported in the scientific literature, online databases, wikipedia, or any other sources that we can find).

This is an early proof-of-concept prototype developed within CZI's Research Science Team (RST). It is intended to be used as a downloadable library that runs on a high-end local machine (an Apple M2 MacBook with 32 GB or more of memory; Windows and Linux are not yet supported).

## Installation

### Install from source

```bash
git clone https://github.com/chanzuckerberg/alhazen
conda create -n alhazen python=3.11
conda activate alhazen
cd alhazen
pip install -e .
```

## Other dependencies

### Databricks

You will need to run remote queries on CZI's Databricks general prod instance: <https://czi-shared-infra-czi-sci-general-prod-databricks.cloud.databricks.com/>

You will need to have a Databricks token in your environment variables. You can generate one by following the instructions here: <https://docs.databricks.com/dev-tools/api/latest/authentication.html#generate-a-token>

Set this token as an environment variable called `DB_TOKEN` in your shell:

```bash
export DB_TOKEN=<your token>
```
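Before running remote queries, a quick check that the token is actually available can save a confusing failure later. The sketch below is hypothetical (these helper names are not part of Alhazen): `require_db_token` only checks that `DB_TOKEN` is non-empty, while `check_db_token_remote` assumes that the workspace accepts personal access tokens as Bearer tokens and exposes Databricks' SCIM `Me` endpoint (`/api/2.0/preview/scim/v2/Me`), which returns the identity of the calling user.

```bash
# Hypothetical helpers (not part of Alhazen) for checking your Databricks setup.

# Fail early with a clear message if DB_TOKEN is unset or empty.
require_db_token() {
  if [ -z "${DB_TOKEN:-}" ]; then
    echo "DB_TOKEN is not set; generate a Databricks personal access token and export it." >&2
    return 1
  fi
}

# Optionally confirm that the workspace accepts the token.
# Assumption: the SCIM "Me" endpoint is available on this workspace.
check_db_token_remote() {
  local host="https://czi-shared-infra-czi-sci-general-prod-databricks.cloud.databricks.com"
  require_db_token || return 1
  # -f makes curl exit non-zero on HTTP errors such as 401 or 403
  curl -fsS -o /dev/null \
    -H "Authorization: Bearer ${DB_TOKEN}" \
    "${host}/api/2.0/preview/scim/v2/Me"
}
```

For example, `require_db_token && check_db_token_remote && echo "Databricks token OK"` checks both the local environment and the remote workspace in one step.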

### GGUF Files from HuggingFace (TheBloke)

The tool downloads quantized model files from HuggingFace and places them in a temporary location on disk (`/tmp/alhazen/` by default):

* [Llama-2-70B](https://huggingface.co/TheBloke/Llama-2-70B-chat-GGUF) (recommended [file](https://huggingface.co/TheBloke/Llama-2-70B-chat-GGUF/blob/main/llama-2-70b-chat.Q5_K_M.gguf), requires 51.25 GB of RAM)
* [Llama-2-13B](https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF) (recommended [file](https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/blob/main/llama-2-13b-chat.Q5_K_M.gguf), requires 11.73 GB of RAM)
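The sizes quoted above correspond to the maximum RAM figures listed on TheBloke's model cards, so it can help to confirm that your machine has enough memory before choosing a model. The helper below is a hypothetical sketch (not part of Alhazen): it reads total physical memory with `sysctl -n hw.memsize` on macOS, falls back to `/proc/meminfo` elsewhere, and compares the result against a requirement given in GB.

```bash
# Hypothetical helper (not part of Alhazen): succeed only if this machine has at
# least the given amount of physical RAM, in GB (1 GB treated as 1024^3 bytes).
has_enough_ram_gb() {
  local required_gb="$1"
  local have_kb
  if [ "$(uname -s)" = "Darwin" ]; then
    # macOS reports total physical memory in bytes
    have_kb=$(( $(sysctl -n hw.memsize) / 1024 ))
  else
    # Fallback for Linux: MemTotal in /proc/meminfo is reported in kB
    have_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  fi
  # awk handles fractional requirements such as 11.73; exit status 0 means "enough"
  awk -v have_kb="$have_kb" -v need_gb="$required_gb" \
    'BEGIN { exit !(have_kb >= need_gb * 1024 * 1024) }'
}
```

For example, `has_enough_ram_gb 11.73 && echo "Llama-2-13B should fit in memory"` prints the message only when the 13B requirement is met. The check is deliberately conservative: it ignores memory already in use by other processes.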

## How to use

We use the [Python Fire](https://github.com/google/python-fire) library to create a modular command line interface (CLI) for Alhazen.

The general form of a command is:

```bash
python -m fire alhazen.apps <tool_name> <tool_args>
```

For example, to chat with the single-paper QA chatbot, run:

```bash
python -m fire alhazen.apps single_paper_chatbot '/path/to/pdf/or/nxml/files/' 'mistral-7b-instruct'
```

## Code Status and Capabilities

This project is at an early stage, but we aim to provide access to the full range of its capabilities as we develop them. We will expose each capability through [Gradio](https://gradio.app/) as it becomes available, and will eventually combine them into a single agent-driven interface.

## Where does the Name 'Alhazen' come from?

One thousand years ago, Ḥasan Ibn al-Haytham (965-1039 AD) studied optics through experimentation and observation. He argued that a hypothesis must be supported by experiments based on confirmable procedures or by mathematical reasoning, making him an early pioneer of the scientific method _five centuries_ before Renaissance scientists adopted the same paradigm ([Website](https://www.ibnalhaytham.com/), [Wikipedia](https://en.wikipedia.org/wiki/Ibn_al-Haytham), [Tbakhi & Amir 2007](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6074172/)).

We use the latinized form of his name ('Alhazen') to honor his contribution, which remains largely unrecognized outside Islamic communities.

Famously, he was quoted as saying:

> The duty of the man who investigates the writings of scientists, if learning the truth is his goal, is to make himself an enemy of all that he reads, and, applying his mind to the core and margins of its content, attack it from every side. He should also suspect himself as he performs his critical examination of it, so that he may avoid falling into either prejudice or leniency.

Here, we aim to develop an AI capable of applying scientific knowledge engineering to support CZI's mission. By creating an AI-powered system for scientific discovery, we seek to honor Ibn al-Haytham's critical view of published knowledge.

Note: when describing our agent, we use non-gendered pronouns (they/them/it).