# Step 3: Set Up a Neo4j Graph Database
We have the hospital data stored in CSV files, but we need to load it into a graph database, with vector embeddings for the reviews.

You’ll use [Neo4j AuraDB](https://neo4j.com/cloud/aura-free/) for this. Once you create a free instance, you will receive a document with the credentials.

```env
OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
NEO4J_URI=<YOUR_NEO4J_URI>
NEO4J_USERNAME=<YOUR_NEO4J_USERNAME>
NEO4J_PASSWORD=<YOUR_NEO4J_PASSWORD>
```
Paste the credentials into your `.env` file, following the template above.
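
Before running anything, it can help to confirm that the `.env` file actually defines every variable. The following is a minimal sketch using only the standard library. The helper names `parse_env` and `missing_keys` are hypothetical, and the parser only handles simple `KEY=VALUE` lines, not the full syntax that tools such as Docker accept.

```python
from pathlib import Path

# Keys taken from the .env template above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still a <PLACEHOLDER>."""
    return [
        key
        for key in REQUIRED_KEYS
        if not values.get(key) or values[key].startswith("<")
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    print("Missing or unset:", missing_keys(parse_env(text)))
```

An empty list means all four values are filled in; it does not prove that the credentials are valid.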

Here is a [guide](https://neo4j.com/developer/cypher/guide-sql-to-cypher/) comparing Cypher queries in Neo4j with their SQL equivalents.

## The Hospital System Graph Database
These are the nodes:

![db graphs](DBGraphs.png)

The majority of these properties come directly from the fields you explored in step 2. One notable difference is that Review nodes have an embedding property, which is a vector representation of the patient_name, physician_name, and text properties. This allows you to do vector searches over review nodes like you did with ChromaDB.
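
To make the idea of a vector search concrete, here is a toy sketch that ranks reviews by cosine similarity to a query vector. It is not how Neo4j does it internally (Neo4j uses a vector index), and the three-dimensional vectors are made up for illustration; real embeddings have far more dimensions. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], reviews: list[dict], k: int = 2) -> list[int]:
    """Return the ids of the k reviews whose embedding is closest to query."""
    ranked = sorted(
        reviews,
        key=lambda review: cosine_similarity(query, review["embedding"]),
        reverse=True,
    )
    return [review["id"] for review in ranked[:k]]


# Made-up stand-ins for Review nodes and their embedding property.
reviews = [
    {"id": 1, "embedding": [1.0, 0.0, 0.0]},
    {"id": 2, "embedding": [0.0, 1.0, 0.0]},
    {"id": 3, "embedding": [0.9, 0.1, 0.0]},
]

print(top_k([1.0, 0.0, 0.0], reviews))
```

Reviews 1 and 3 point in nearly the same direction as the query, so they rank ahead of review 2.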

Here are the relationship properties:

![relationship](relationship.png)

As you can see, COVERED_BY is the only relationship with more than an id property. The service_date is the date the patient was discharged from a visit, and billing_amount is the amount charged to the payer for the visit.

## Upload Data to Neo4j
This will be an ETL (extract, transform, and load) project, self-contained in a Docker image. The CSV locations are set through these environment variables in `.env`:

```env
HOSPITALS_CSV_PATH=https://raw.githubusercontent.com/hfhoffman1144/langchain_neo4j_rag_app/main/data/hospitals.csv
PAYERS_CSV_PATH=https://raw.githubusercontent.com/hfhoffman1144/langchain_neo4j_rag_app/main/data/payers.csv
PHYSICIANS_CSV_PATH=https://raw.githubusercontent.com/hfhoffman1144/langchain_neo4j_rag_app/main/data/physicians.csv
PATIENTS_CSV_PATH=https://raw.githubusercontent.com/hfhoffman1144/langchain_neo4j_rag_app/main/data/patients.csv
VISITS_CSV_PATH=https://raw.githubusercontent.com/hfhoffman1144/langchain_neo4j_rag_app/main/data/visits.csv
REVIEWS_CSV_PATH=https://raw.githubusercontent.com/hfhoffman1144/langchain_neo4j_rag_app/main/data/reviews.csv
```

Notice that all of the CSV files are stored in a public location on GitHub. Because your Neo4j AuraDB instance is running in the cloud, it can’t access files on your local machine, so you have to serve the files over HTTP or upload them directly to your instance. For this example, you can either use the links above or upload the data to another location.
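
As a rough illustration of why the files need a URL, the sketch below builds a Cypher `LOAD CSV` statement that the database itself would fetch over HTTP. This is an assumption about the general shape of such a query, not the contents of `hospital_bulk_csv_write.py`; the function name `build_load_csv_query`, the `Hospital` label, the `hospital_id` column, and the fallback URL are all hypothetical.

```python
import os


def build_load_csv_query(csv_url: str, label: str, id_column: str) -> str:
    """Build a Cypher statement that creates one node per CSV row.

    Assumption: label and id_column are illustrative; check them against
    the real CSV headers before using anything like this.
    """
    return (
        f"LOAD CSV WITH HEADERS FROM '{csv_url}' AS row\n"
        f"MERGE (n:{label} {{id: toInteger(row.{id_column})}})"
    )


# Falls back to a placeholder URL when the variable is not set.
url = os.getenv("HOSPITALS_CSV_PATH", "https://example.com/hospitals.csv")
print(build_load_csv_query(url, "Hospital", "hospital_id"))
```

With the `neo4j` Python driver, a string like this could be passed to `session.run(...)`. The URL is resolved by the Neo4j server, which is why a path on your local disk would not work with AuraDB.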

Once you have your .env file populated, open pyproject.toml, which provides configuration, metadata, and dependencies defined in the TOML format:
```toml
[project]
name = "hospital_neo4j_etl"
version = "0.1"
dependencies = [
   "neo4j==5.14.1",
   "retry==0.9.2"
]

[project.optional-dependencies]
dev = ["black", "flake8"]
```
The loader has these files:
```bash
├── docker-compose.yml
├── hospital_neo4j_etl
│   ├── Dockerfile
│   ├── pyproject.toml
│   └── src
│       ├── entrypoint.sh
│       └── hospital_bulk_csv_write.py

```
- `hospital_bulk_csv_write.py`: loads the CSV files from the data folder into Neo4j. It also creates the unique keys and retries if the connection is lost.
- `entrypoint.sh`: runs the Python script.
- `Dockerfile`: builds a Docker image capable of running the script. It installs the libraries listed in `pyproject.toml`.
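
The retry and unique-key behavior described above can be sketched as follows. The project depends on the `retry` package, but this example uses a small standard-library stand-in so it runs anywhere; the decorator, the `unique_constraint_query` helper, and the simulated `flaky_load` function are all hypothetical, and the constraint statement assumes Neo4j 5 syntax with an `id` property.

```python
import functools
import time


def retry(tries: int = 3, delay: float = 0.0):
    """Re-run the wrapped function until it succeeds or tries run out."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == tries:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)
        return wrapper
    return decorator


def unique_constraint_query(label: str) -> str:
    """Cypher that enforces a unique id per node of the given label."""
    return f"CREATE CONSTRAINT IF NOT EXISTS FOR (n:{label}) REQUIRE n.id IS UNIQUE"


attempts = {"count": 0}


@retry(tries=3, delay=0)
def flaky_load() -> str:
    """Simulate a load that fails twice with a dropped connection."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated dropped connection")
    return "loaded"


print(flaky_load(), "after", attempts["count"], "attempts")
print(unique_constraint_query("Hospital"))
```

Creating the constraint before loading means a retried load cannot produce duplicate nodes for the same id when it is combined with `MERGE`.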

The ETL will run as a service called `hospital_neo4j_etl`, built from the Dockerfile in `./hospital_neo4j_etl` and configured with the environment variables from `.env`. Since you only have one container, you don’t need docker-compose yet. However, you’ll add more containers to orchestrate with your ETL in the next section, so it’s helpful to get started on `docker-compose.yml`.

To install Docker (on a Debian or Ubuntu system with `apt`):
```bash
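# Update the system packages and install Docker Engine
sudo apt update && sudo apt upgrade -y
sudo apt install docker.io -y

# Download the standalone docker-compose binary and make it executable
sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

# Allow the current user to run docker without sudo
# (newgrp opens a new shell with the docker group applied)
sudo usermod -aG docker $USER
newgrp docker

# Check that the Docker daemon is running
sudo systemctl status docker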
```

To run the ETL:
```bash
docker-compose up --build
```