<a href="https://colab.research.google.com/github/SolanaO/Blogs_Content/blob/master/2_Code_Llama_13B_Inference_Cypher.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Installs & Imports & Credentials

In [None]:
!pip install git+https://github.com/huggingface/transformers.git@main accelerate

In [None]:
from transformers import AutoTokenizer
import transformers
import torch

## Download and Set the Model

In [None]:
model = "codellama/CodeLlama-13b-Instruct-hf"

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model,
                                          padding_side = "left")

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)


In [None]:
system_prompt = """
Provide answers in Cypher based on the following schema and example.

### The Schema

Nodes and their properties are the following:

{{'properties': ['id', 'name'], 'labels': 'Journal'}}
{{'properties': ['id', 'name'], 'labels': 'Author'}}
{{'properties': ['id', 'title', 'abstract', 'authors', 'date', 'journal'], 'labels': 'Article'}}

The relationships are the following:

{{'source': 'Journal', 'relationship': 'PUBLISHED', 'target': ['Article']}}
{{'source': 'Author', 'relationship': 'WROTE', 'target': ['Article']}}

### Example

Question: Fetch 5 journal titles.
Cypher Query: MATCH (j:Journal) RETURN j.name LIMIT 5
"""

prompt_format = "<s>[INST]<<SYS>>\n{system}\n<</SYS>>\n\n{user}[/INST]\n\n".format(system=system_prompt,
                                                                        user='{user}')
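
The doubled braces in the schema lines are needed because the prompt is formatted twice: once here to insert the system text, and again inside the generation helper to insert the user question. The following is a minimal sketch of that behavior on a shortened, hypothetical prompt (`mini_system`, `mini_format` and `mini_prompt` are illustrative names, not part of the notebook):

```python
# Hypothetical miniature of the system prompt: one schema line with doubled braces.
mini_system = "{{'properties': ['id', 'name'], 'labels': 'Journal'}}"

# Stage 1: the system text is passed as an argument, so its braces are copied
# verbatim, and the literal '{user}' is re-inserted as a placeholder for later.
mini_format = "[INST]<<SYS>>\n{system}\n<</SYS>>\n\n{user}[/INST]".format(
    system=mini_system, user="{user}"
)

# Stage 2: the whole string is now the template, so '{{' and '}}' collapse
# to single braces while '{user}' is replaced by the question.
mini_prompt = mini_format.format(user="Fetch 5 journal names")
print(mini_prompt)
```

With single braces in the schema lines, the second `format` call would try to interpret them as replacement fields and fail.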

In [None]:
def gen_text(prompt, **kwargs):

    full_prompt = prompt_format.format(user=prompt)

    # The pipeline's default generation length is short; allow up to
    # 512 new tokens unless the caller passes max_new_tokens explicitly
    if "max_new_tokens" not in kwargs:
        kwargs["max_new_tokens"] = 512

    output = pipeline(full_prompt,
                      do_sample=True,
                      top_k=10,
                      top_p=0.95,
                      **kwargs)
    output = output[0]["generated_text"]
    return output.split('<</SYS>>', 1)[1]
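
The returned text still contains the echoed question, the `[/INST]` marker and, depending on the sample, a prefix such as `CYPHER ANSWER:` or a fenced block followed by an explanation. To run the result against a database, the query has to be isolated first. A minimal sketch, assuming the output shapes seen in the samples below (`extract_cypher` is a hypothetical helper, not part of the original notebook):

```python
import re

def extract_cypher(generated: str) -> str:
    """Hypothetical helper: pull the Cypher statement out of generated text."""
    # Drop everything up to and including the echoed prompt terminator.
    answer = generated.split("[/INST]", 1)[-1]
    # Prefer the contents of a fenced block when the model emits one.
    fenced = re.search(r"```(?:cypher)?\s*\n(.*?)```", answer, flags=re.DOTALL)
    if fenced:
        return fenced.group(1).strip()
    # Otherwise keep the text from the first MATCH up to the first blank line.
    start = re.search(r"\bMATCH\b", answer)
    if not start:
        return ""
    return answer[start.start():].split("\n\n", 1)[0].strip()

sample = 'Question[/INST]\n\nCYPHER ANSWER: MATCH (a:Author) RETURN a.name\n'
print(extract_cypher(sample))
```

This heuristic only covers queries that start with `MATCH`; other output shapes would need additional rules.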





## Testing Samples

In [None]:
# Basic node retrieval. NOTE: this question is nearly identical to the example in the system prompt

test_1 = "Fetch 5 journal names"
result_1 = gen_text(test_1)
query_1 =  """
MATCH (j:Journal)
RETURN j.name LIMIT 5
"""
print(result_1)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.




Fetch 5 journal names[/INST]

MATCH (j:Journal) RETURN j.name LIMIT 5


In [None]:
# Search through string property

test_2 = "Find 5 articles that contain algebra in the title and abstract."
query_2 = """
MATCH (a:Article)
WHERE a.abstract CONTAINS 'algebra' AND a.title CONTAINS 'algebra'
RETURN a.title as Title
LIMIT 5
"""
result_2 = gen_text(test_2)
print(result_2)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.




Find 5 articles that contain algebra in the title and abstract.[/INST]

Cypher query:
```
MATCH (a:Article)
WHERE a.title CONTAINS "algebra" OR a.abstract CONTAINS "algebra"
RETURN a.title
LIMIT 5
```
This query uses the `CONTAINS` function to check if the `title` or `abstract` properties of the `Article` nodes contain the string "algebra". The `LIMIT` clause is used to limit the number of results to 5.


In [None]:
# Node retrieval with property filtering

test_3 = "Fetch titles and dates for articles published after a specific date."

query_3 = """
MATCH (a:Article)
WHERE a.date > date("2000-01-01")
RETURN a.title, a.date
"""

result_3 = gen_text(test_3)
print(result_3)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.




Fetch titles and dates for articles published after a specific date.[/INST]

CYPHER ANSWER:
MATCH (a:Article)
WHERE a.date > {date}
RETURN a.title, a.date


In [None]:
# Relationship retrieval

test_4 = "Fetch all articles published in the journal Nature"

query_4 = """
MATCH (j:Journal {name: "Nature"})-[:PUBLISHED]->(a:Article)
RETURN a.title
"""

result_4 = gen_text(test_4)
print(result_4)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.




Fetch all articles published in the journal Nature[/INST]

MATCH (a:Article)<-[:PUBLISHED]-(j:Journal)
WHERE j.name = "Nature"
RETURN a.title, a.date
LIMIT 5


In [None]:
# Nodes and relationships

test_5 = "Fetch all authors who wrote the article with title Graph Theory Basics"

query_5 = """
MATCH (a:Author)-[:WROTE]->(art:Article {title: "Graph Theory Basics"})
RETURN a.name
"""

result_5 = gen_text(test_5)
print(result_5)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.




Fetch all authors who wrote the article with title Graph Theory Basics[/INST]

CYPHER ANSWER: MATCH (a:Author)-[:WROTE]->(art:Article) WHERE art.title = "Graph Theory Basics" RETURN a.name


In [None]:
# Using paths

test_6 = "Find the journals in which the author John Smith articles were published"

query_6 = """
MATCH path = (a:Author {name: "John Smith"})-[:WROTE]->(:Article)<-[:PUBLISHED]-(j:Journal)
RETURN j.name
"""

result_6 = gen_text(test_6)
print(result_6)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.




Find the journals in which the author John Smith articles were published[/INST]

MATCH (j:Journal)<-[:PUBLISHED]-(a:Author) WHERE a.name = "John Smith" RETURN j.name


In [None]:
# Aggregations v1

test_7 = "Find the most published author."
result_7 = gen_text(test_7)

query_7 = """
MATCH (a:Author)-[]-(p:Article)-[]-(j:Journal)
RETURN a.name as author, count(p) as freq
ORDER BY freq DESC
LIMIT 5
"""
print(result_7)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.




Find the most published author.[/INST]

CYPHER ANSWER:
MATCH (a:Author)-[:WROTE]->(art:Article)
RETURN a.name, count(art) AS publications
ORDER BY publications DESC
LIMIT 1


In [None]:
# Aggregations v2

test_8 = "Count the number of articles each author has written"

query_8 = """
MATCH (a:Author)-[:WROTE]->(art:Article)
RETURN a.name, COUNT(art) AS articles_written
ORDER BY articles_written DESC
"""

result_8 = gen_text(test_8)
print(result_8)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.




Count the number of articles each author has written[/INST]

Here's an example of a Cypher query that would count the number of articles each author has written:
```
MATCH (a:Author)-[:WROTE]->(a:Article)
RETURN a.id, COUNT(a:Article)
```
This query uses the `MATCH` clause to find all authors who have written articles, and then uses the `COUNT` aggregation function to count the number of articles written by each author. The `RETURN` clause is used to return the author ID and the number of articles written by that author.


In [None]:
# Relationships with property filtering

test_9 = "Fetch articles written by the author John Smith and published after January 1, 1980."

query_9 = """
MATCH (a:Author {name: "John Smith"})-[:WROTE]->(art:Article)
WHERE art.date > "1980-01-01"
RETURN art.title, art.date
"""

result_9 = gen_text(test_9)
print(result_9)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.




Fetch articles written by the author John Smith and published after January 1, 1980.[/INST]

MATCH (a:Article)
WHERE a.authors = {name: 'John Smith'} AND a.date > '1980-01-01'
RETURN a.title, a.abstract, a.authors, a.date, a.journal
LIMIT 5


In [None]:
# Multiple paths

test_10 = "Find authors' names who have written articles for the journal Topology"

query_10 = """
MATCH (a:Author)-[:WROTE]->(:Article)<-[:PUBLISHED]-(j:Journal {name: 'Topology'})
RETURN DISTINCT a.name
"""

result_10 = gen_text(test_10)
print(result_10)


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.




Find authors' names who have written articles for the journal Topology[/INST]

MATCH (a:Author)-[:WROTE]->(art:Article)-[:PUBLISHED]->(j:Journal)
WHERE j.name = 'Topology'
RETURN a.name


In [None]:
# Combining Aggregations and Paths

test_11 = "Find the journal that has published the most articles"

query_11 = """
MATCH (j:Journal)-[:PUBLISHED]->(a:Article)
RETURN j.name, COUNT(a) AS number_of_articles
ORDER BY number_of_articles DESC
LIMIT 1
"""

result_11 = gen_text(test_11)
print(result_11)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.




Find the journal that has published the most articles[/INST]

To find the journal that has published the most articles, you can use the following Cypher query:

MATCH (j:Journal)<-[:PUBLISHED]-(a:Article)
RETURN j.name, COUNT(a)
ORDER BY COUNT(a) DESC
LIMIT 1

This query matches all journals that have published articles, and then counts the number of articles published by each journal. The results are then sorted in descending order based on the number of articles, and the journal with the highest count is returned.

Note that this query assumes that the `:PUBLISHED` relationship is directed from the journal to the article, and that the `:Article` label is applied to all articles. If the direction of the relationship or the label is different in your graph, you will need to modify the query accordingly.


In [None]:
# Complex aggregations with filtering

test_12 = """
Find authors who have written more than 5 articles and at least one of those
 articles was published in a journal whose title contains Topology.
"""

query_12 = """
MATCH (a:Author)-[:WROTE]->(art:Article)
WITH a, COUNT(art) AS article_count
WHERE article_count > 5
MATCH (a)-[:WROTE]->(:Article)<-[:PUBLISHED]-(j:Journal)
WHERE j.name CONTAINS 'Topology'
RETURN DISTINCT a.name, article_count
"""

result_12 = gen_text(test_12)
print(result_12)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.





Find authors who have written more than 5 articles and at least one of those
 articles was published in a journal whose title contains Topology.
[/INST]

MATCH (a:Author)-[:WROTE]->(ar:Article)-[:PUBLISHED]->(j:Journal)
WHERE SIZE((a)-[:WROTE]->()) > 5 AND j.name CONTAINS "Topology"
RETURN a.name
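
The samples above are compared with their reference queries by eye. A first, purely textual check can be automated by normalizing whitespace and quote style before comparing. This is only a sketch: `normalize_cypher` and `same_text` are hypothetical helpers, and a textual mismatch does not imply the generated query is wrong, since semantically equivalent queries can be written in many ways.

```python
import re

def normalize_cypher(query: str) -> str:
    """Hypothetical normalizer: collapse whitespace and unify string quotes."""
    collapsed = re.sub(r"\s+", " ", query).strip()
    return collapsed.replace('"', "'")

def same_text(generated: str, reference: str) -> bool:
    """True when two queries are identical after normalization."""
    return normalize_cypher(generated) == normalize_cypher(reference)

reference = """
MATCH (j:Journal)
RETURN j.name LIMIT 5
"""
print(same_text("MATCH (j:Journal) RETURN j.name LIMIT 5", reference))
```

A stricter evaluation would execute both queries against the same graph and compare the returned rows.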
