
# Home - Alhazen 

> An intelligent agent to help read and understand scientific research based on extant knowledge (i.e., what is already known and reported in the scientific literature, online databases, wikipedia, or any other sources that we can find).

Alhazen is a framework for scientists to perform local studies of the literature. You can use it to build a local library of `scientific knowledge expressions` (papers, webpages, database records, etc.), use web-robots and other available tools to locate and download full text, and then use generative AIs to process the content of your library. 

The goal of this work is threefold: 

1. To provide a pragmatic AI tool that helps read, summarize, and synthesize available scientific knowledge.
2. To provide a platform for development of AI tools in the community. 
3. To actively develop working systems for high-value tasks within the Chan Zuckerberg Initiative's programs and partnerships. 

The system uses available tools within the rapidly expanding ecosystem of generative AI models, including open models that can be run locally (such as Llama-2, Mixtral, Smaug, and Olmo) as well as state-of-the-art commercial APIs (such as OpenAI, Gemini, and Mistral).

To use local models, we recommend running Alhazen on a large, high-end machine such as an M2 Apple MacBook with 48+ GB of memory. We do not yet actively support Windows or Linux.

> Note: Alhazen could conceivably be run on a lightweight machine that calls external Large Language Model (LLM) APIs such as GPT-4. However, the goal is to investigate the use of long-running agents, which would likely be prohibitively expensive with high-performing commercial endpoints.

## Caution + Caveats

* This toolkit provides functionality to use agents to download information from the web. Users and developers should take care to abide by data licensing requirements and third-party websites' terms and conditions, and should make sure that they don't otherwise infringe upon third-party privacy or intellectual property rights.
* All data generated by Large Language Models (LLMs) should be reviewed for accuracy. 

## Installation

### Install dependencies

#### Postgresql

Alhazen requires [postgresql@14](https://www.postgresql.org/download/macosx/) to run. Homebrew provides an installer:

```bash
$ brew install postgresql@14
```

which can be run as a service: 

```bash
$ brew services start postgresql@14
$ brew services list
```

If you install PostgreSQL via Homebrew, you will need to create a `postgres` superuser to run the `psql` command:

```bash
$ createuser -s postgres
```

Note that [`Postgres.app`](https://postgresapp.com/) also provides a nice GUI for Postgres, but installing the [`pgvector`](https://github.com/pgvector/pgvector) package with it is a little more involved.

#### Ollama

The tool uses the [Ollama](https://ollama.ai/) library to execute large language models locally on your machine. Note that to be able to run the best-performing models on an Apple Mac M1 or M2 machine, you will need at least 48GB of memory.
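
Before launching a workflow, it can help to confirm that Ollama is running and that the model you want has been pulled. The following is a minimal sketch rather than part of Alhazen: it queries the `/api/tags` endpoint of a local Ollama server, and it assumes the server listens on Ollama's default address `http://localhost:11434`. The helper names (`model_names`, `has_model`, `list_local_models`) are hypothetical.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen

# Assumption: Ollama's default local address; adjust if your server differs.
OLLAMA_URL = "http://localhost:11434"


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON returned by Ollama's /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]


def has_model(names: list[str], wanted: str) -> bool:
    """True if `wanted` matches a pulled model, with or without its ':tag' suffix."""
    return any(n == wanted or n.split(":", 1)[0] == wanted for n in names)


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[str]:
    """Ask a running Ollama server which models it has; [] if it is unreachable."""
    try:
        with urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return model_names(json.load(resp))
    except (URLError, OSError):
        return []
```

Calling `has_model(list_local_models(), "mixtral")` returns `False` both when the model has not been pulled and when the server is not running, so an empty list from `list_local_models()` is worth checking first.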

### Install Alhazen source code

```bash
git clone https://github.com/chanzuckerberg/alhazen
conda create -n alhazen python=3.11
conda activate alhazen
cd alhazen
pip install -e .
```
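
The installation steps above rely on several command-line tools being available on your `PATH`. As a rough pre-flight check (a sketch using only the Python standard library, not something shipped with Alhazen), the hypothetical helper below reports which of those tools cannot be found. The tool list is an assumption drawn from the commands in this README, plus the `ollama` CLI.

```python
import shutil
from typing import Callable, Iterable, Optional

# Assumption: command-line tools referenced by the installation steps in this README.
REQUIRED_TOOLS = ("brew", "psql", "createuser", "ollama", "git", "conda", "pip")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
```

A tool reported as missing is not necessarily uninstalled. For example, `psql` may be installed by Homebrew but not linked onto your `PATH`.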

## How to use

As an experimental development platform that is at an early stage of development, Alhazen is still a little rough around the edges. We provide some 'cookbook' examples of how to use the system's capabilities in notebooks as well as tools that can be run from the command line. We also provide a RAG-enabled interface to execute chat-based question answering over a local literature corpus.

### Notebooks

We have developed numerous worked examples of corpora that can be generated by running queries on public sources and then processing the results with LLM-enabled workflows. See the `nb_scratchpad/cookbook` subdirectory for examples.

### Command Line Interface

We use the `fire` library to create a modular command line interface (CLI) for Alhazen. Use the following command structure to execute specific demo applications:

```bash
python -m fire alhazen.apps <tool_name> <tool_args>
```

For example, run the following command to chat with the single paper QA chatbot:

```bash
python -m fire alhazen.apps single_paper_chatbot '/path/to/pdf/or/nxml/files/' 'mistral-7b-instruct'
```
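
To illustrate how this command structure works in general: `python -m fire <module> <function> <args>` imports the named module and calls the named function with the remaining arguments, so any plain Python function can serve as a tool. The sketch below is not taken from `alhazen.apps`. The module name `demo_tools` and the function `count_papers` are hypothetical and only show the pattern.

```python
from pathlib import Path


def count_papers(folder: str, extension: str = "pdf") -> int:
    """Count files under `folder` (recursively) that have the given extension."""
    suffix = "." + extension.lstrip(".").lower()
    return sum(
        1
        for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() == suffix
    )


# Saved as demo_tools.py (hypothetical), this could be invoked as:
#   python -m fire demo_tools count_papers /path/to/files --extension=nxml
# fire passes positional and --flag arguments to the function and prints its result.
```

Because the function has no dependency on `fire` itself, it can be imported and called from a notebook just as easily as from the command line.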

## Code Status and Capabilities

This project is still very early, but we are attempting to provide access to 
the full range of capabilities of the project as we develop them. 

The system is built using the excellent [nbdev](https://nbdev.fast.ai/) environment. Jupyter notebooks in the `nbs` directory are processed based on directive comments contained within notebook cells (see [guide](https://nbdev.fast.ai/explanations/directives.html)) to generate the source code of the library, as well as accompanying documentation. 

Examples of how to use the library to address the research and landscaping questions specified in the [use cases](docnb1_use_cases.html) can be found in the `nb_scratchpad/cookbook` subdirectory of this GitHub repo.

### Contributing

We warmly welcome contributions from the community! Please see our [contributing guide](https://github.com/chanzuckerberg/alhazen/blob/main/CONTRIBUTING.md) and don't hesitate to open an issue or send a pull request to improve Alhazen.  

This project adheres to the Contributor Covenant [code of conduct](https://github.com/chanzuckerberg/.github/blob/master/CODE_OF_CONDUCT.md). By participating, you are expected to uphold this code. Please report unacceptable behavior to opensource@chanzuckerberg.com.


## Where does the Name 'Alhazen' come from?

One thousand years ago, Ḥasan Ibn al-Haytham (965-1039 AD) studied optics through experimentation and observation. He advocated that a hypothesis must be supported by experiments based on confirmable procedures or mathematical reasoning, which made him an early pioneer of the scientific method _five centuries_ before Renaissance scientists started following the same paradigm ([Website](https://www.ibnalhaytham.com/), [Wikipedia](https://en.wikipedia.org/wiki/Ibn_al-Haytham), [Tbakhi & Amir 2007](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6074172/)).

We use the Latinized form of his name ('Alhazen') to honor his contribution (which goes largely unrecognized within non-Islamic communities).

Famously, he was quoted as saying:

>The duty of the man who investigates the writings of scientists, if learning the truth is his goal, is to make himself an enemy of all that he reads, and, applying his mind to the core and margins of its content, attack it from every side. He should also suspect himself as he performs his critical examination of it, so that he may avoid falling into either prejudice or leniency.

Here, we seek to develop an AI capable of applying scientific knowledge engineering to support CZI's mission. We seek to honor Ibn al-Haytham's critical view of published knowledge by creating an AI-powered system for scientific discovery.

Note - when describing our agent, we will use non-gendered pronouns (they/them/it) to refer to the agent.