<h1><center>WEAVIATE SEARCH REQUEST EXAMPLES</center></h1>

As described on their [website](https://weaviate.io/): "Weaviate is an open source vector search engine that stores both objects and vectors, allowing for combining vector search with structured filtering with the fault-tolerance and scalability of a cloud-native database, all accessible through GraphQL, REST, and various language clients."

In other words, Weaviate is a search engine built with vector search in mind, so it includes some nifty vector search optimizations and search filters.

For data, the CNBC news dataset from [data.world](https://data.world/crawlfeeds/cnbc-news-dataset) is used. It is relatively small (~600 articles), but sufficient for demo purposes.

# Preparations

Required imports.

In [1]:
import re
from datetime import datetime, timezone
from string import Template
from typing import Optional

import weaviate
from termcolor import colored


Helper utils.

In [2]:
def pprint_response(response: dict, description_size_limit: Optional[int] = 500) -> None:
    """Prints response in a prettier way.

    Parameters
    ----------
    response : dict
    description_size_limit : Optional[int], optional
        if set, "description" and "short_description" are truncated to that many characters, by default 500
    """
    response = response["data"]["Get"]

    for class_name in response:
        print(colored(class_name.upper(), "red", "on_white", attrs=["bold"]))

        for idx, resp in enumerate(response[class_name]):
            print(f"Response # {idx}")
            for key, value in resp.items():
                if key in {"description", "short_description"} and description_size_limit:
                    value = value[:description_size_limit]
                print(f"{colored(key, 'magenta', 'on_grey')}: {value}")
            print("=" * 120)
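
`pprint_response` assumes the response always contains a `data` key. Below is a minimal sketch of a more defensive accessor. It assumes that failed queries come back as a dictionary with an `errors` list, which is the usual GraphQL convention but is not verified in this notebook; the helper name `extract_objects` is hypothetical.

```python
from typing import Any, Dict, List


def extract_objects(response: Dict[str, Any], class_name: str) -> List[dict]:
    """Return the objects found for `class_name`, or raise if the query failed."""
    # Assumption: errors are reported under an "errors" key, as is usual for GraphQL.
    errors = response.get("errors")
    if errors:
        messages = "; ".join(
            str(err.get("message", err)) if isinstance(err, dict) else str(err)
            for err in errors
        )
        raise ValueError(f"Query failed: {messages}")

    # A missing key, a null value and an empty list are all treated as "no results".
    data = response.get("data") or {}
    get_section = data.get("Get") or {}
    return get_section.get(class_name) or []


ok_response = {"data": {"Get": {"Article": [{"title": "Some title"}]}}}
print(extract_objects(ok_response, "Article"))
```

Raising early keeps a malformed query from surfacing later as a confusing `KeyError` inside the printing code.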


Establishing connection to Weaviate.

In [3]:
client = weaviate.Client("http://localhost:8080")
print(f"Weaviate:\n\tis live: {client.is_live()}\n\tis ready: {client.is_ready()}")

Weaviate:
	is live: True
	is ready: True


In [4]:
for weaviate_class in client.schema.get()["classes"]:
    class_name = weaviate_class["class"]
    response = client.query.aggregate(class_name).with_meta_count().do()
    class_object_count = response["data"]["Aggregate"][class_name][0]["meta"]["count"]
    print(f"Class name: {class_name}\n\tObjects count: {class_object_count}")

Class name: Article
	Objects count: 593
Class name: Author
	Objects count: 187


Now that the Weaviate instance is live, ready, and contains article and author data, we can start writing queries.

# Weaviate query

A Weaviate query can be formed in two ways:
- using **GetBuilder** (basically calling the query methods from `client.query`)
- using **GraphQL** (passed as a raw string to the `client.query.raw` method)

A typical GetBuilder query looks like this (the string arguments are descriptions, not real values):
```python
(
    client.query.get(
        "name of a class",
        "what properties to return",
    )
    .with_near_text("dictionary with text for vector search")
    .with_where("here are described filters like title/keyword to match")
    .with_limit("return top N responses")
    .do()  # runs query
)
```


GraphQL looks like this:
```text
{
  Get {
    Article(
      where: {operator: GreaterThanEqual, path: ["descriptionWordCount"], valueInt: 100}
      limit: 5
    ) {
      descriptionWordCount
      title
      url
    }
  }
}
```

The main limitation is that GraphQL has to be passed as a raw string, which makes it a bit inconvenient to fill in variables.

<div class="alert alert-block alert-warning"> <b>IMPORTANT</b>: the above is true at the time this notebook was created.<br>
To see whether the desired type of request is supported by GetBuilder, check the <a href="https://weaviate.io/developers/weaviate/current/graphql-references/filters.html">Weaviate filter docs</a>.</div>

In my opinion the query builder approach looks better, but unfortunately not all queries can be formed with the builder, whereas every request can be formed with GraphQL syntax.<br>
That's why I separated the examples into two categories: those that can be formed without GraphQL syntax and those that cannot.

# 1. Requests with filters that are supported by GetBuilder 

## 1.1. Near text filter (vector search) <a class="anchor" id="near-text-filer"></a>

For vector search, the [nearText](https://weaviate.io/developers/weaviate/current/graphql-references/filters.html#nearvector-filter) filter can be used.

It accepts four properties:
- **concepts** - text of the query
- **certainty** (optional) - "determine which data results to return. The value is a float between 0.0 (return all data objects, regardless similarity) and 1.0 (only return data objects that are matching completely, without any uncertainty). The certainty of a query result is computed by normalized distance of the fuzzy query and the data object in the vector space."
- **moveTo** (optional) - reranks the response so that results closer to the desired concepts move toward the top
- **moveAwayFrom** (optional) - reranks the response so that results closer to the undesired concepts move toward the bottom
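
The four properties can be assembled by a small helper so that optional keys are only included when set. This is a sketch: the function name `build_near_text` and its parameter names are hypothetical, and the default `force` of 0.5 is an arbitrary choice made for illustration.

```python
from typing import Any, Dict, List, Optional


def build_near_text(
    concepts: List[str],
    certainty: Optional[float] = None,
    move_to: Optional[List[str]] = None,
    move_away_from: Optional[List[str]] = None,
    force: float = 0.5,  # arbitrary illustrative default
) -> Dict[str, Any]:
    """Assemble a nearText dictionary, leaving out optional keys that are not set."""
    near_text: Dict[str, Any] = {"concepts": concepts}
    if certainty is not None:
        if not 0.0 <= certainty <= 1.0:
            raise ValueError("certainty must be between 0.0 and 1.0")
        near_text["certainty"] = certainty
    if move_to:
        near_text["moveTo"] = {"concepts": move_to, "force": force}
    if move_away_from:
        near_text["moveAwayFrom"] = {"concepts": move_away_from, "force": force}
    return near_text


print(build_near_text(["retail in fashion industry"], certainty=0.5))
```

The resulting dictionary has the same shape as the `nearText` dictionaries passed to `with_near_text` in the cells of this section.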

<div class="alert alert-block alert-info">As the dataset the queries run against is relatively small, some results (especially those outside the top 2) might look somewhat off. Basically, the more data there is, the more relevant articles can be found.</div>

Imagine that we are looking for articles about the state of retail in fashion. Anything with a certainty lower than 0.5 is dropped, and the titles of the top 5 results are returned.

In [5]:
# for this section class_name and properties are shared between queries
class_name = "Article"
properties = [
    "title",
    "_additional {certainty}",
]

nearText = {
    "concepts": ["retail in fashion industry"],
    "certainty": 0.5,
}

response = client.query.get(class_name, properties).with_near_text(nearText).with_limit(5).do()

pprint_response(response)


ARTICLE
Response # 0
_additional: {'certainty': 0.6842943}
title: No More Retail 'Nirvana': Hedge Fund Manager
Response # 1
_additional: {'certainty': 0.68259203}
title: Chinese Retail Sales Surge, Partly on Inflation
Response # 2
_additional: {'certainty': 0.6756055}
title: UK retail sales growth muted as food sales dip
Response # 3
_additional: {'certainty': 0.6677345}
title: Nadja Swarovski: The luxury sector has been ‘incredibly resilient’ amid the pandemic
Response # 4
_additional: {'certainty': 0.6403848}
title: How teen created a profitable sneaker pawn shop


As we can see, the higher the certainty, the more closely the article text is related to retail in fashion.

Now suppose we want to run the same query about fashion but don't want to include articles that describe fashion retail in the European market:
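
To get a feeling for what a certainty value means, here is a rough sketch of how it could relate to vector similarity. It assumes that vectors are compared with cosine similarity and that certainty is this similarity rescaled from [-1, 1] to [0, 1]. Both the metric and the mapping are assumptions about the setup, not something this notebook verifies, and the function names are illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def certainty_from_cosine(similarity: float) -> float:
    """Assumed mapping: rescale cosine similarity from [-1, 1] to [0, 1]."""
    return (1.0 + similarity) / 2.0


same_direction = certainty_from_cosine(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
orthogonal = certainty_from_cosine(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
opposite = certainty_from_cosine(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))
print(same_direction, orthogonal, opposite)
```

Under these assumptions, the `certainty: 0.5` threshold used in the queries keeps only objects whose cosine similarity to the query is non-negative.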

In [6]:
nearText = {
    "concepts": ["retail in fashion industry"],
    "certainty": 0.5,
    "moveAwayFrom": {
        "concepts": ["european market"],
        "force": 0.9,
    },
}

response = client.query.get(class_name, properties).with_near_text(nearText).with_limit(5).do()

pprint_response(response)


ARTICLE
Response # 0
_additional: {'certainty': 0.6714165}
title: No More Retail 'Nirvana': Hedge Fund Manager
Response # 1
_additional: {'certainty': 0.6694423}
title: Chinese Retail Sales Surge, Partly on Inflation
Response # 2
_additional: {'certainty': 0.6566581}
title: UK retail sales growth muted as food sales dip
Response # 3
_additional: {'certainty': 0.6343906}
title: Nadja Swarovski: The luxury sector has been ‘incredibly resilient’ amid the pandemic
Response # 4
_additional: {'certainty': 0.63371277}
title: How teen created a profitable sneaker pawn shop


If we want articles about fashion retail in Asia rather than in Europe:

In [7]:
nearText = {
    "concepts": ["what is the state of retail in fashion industry"],
    "certainty": 0.5,
    "moveAwayFrom": {
        "concepts": ["european market"],
        "force": 0.9,
    },
    "moveTo": {
        "concepts": ["asian market"],
        "force": 1.0,
    }
}

response = client.query.get(class_name, properties).with_near_text(nearText).with_limit(5).do()

pprint_response(response)


ARTICLE
Response # 0
_additional: {'certainty': 0.70898855}
title: Chinese Retail Sales Surge, Partly on Inflation
Response # 1
_additional: {'certainty': 0.6663642}
title: Global Markets Up, but Where Is the Retail Investor?
Response # 2
_additional: {'certainty': 0.6534552}
title: UK retail sales growth muted as food sales dip
Response # 3
_additional: {'certainty': 0.64692867}
title: JD.com wants a network of 5 million stores as e-commerce battle heats up
Response # 4
_additional: {'certainty': 0.6449816}
title: Ping An Not Concerned about Pru-AIG deal


## 1.2. Filter by date

Weaviate can not only do vector search but also combine it with additional filters. Let's take a look at how to [filter articles by date](https://github.com/semi-technologies/weaviate/issues/1271).

The date has to be provided in a supported [date format](https://weaviate.io/developers/weaviate/current/data-schema/datatypes.html#datatype-date).

In [8]:
date = datetime(year=2020, month=1, day=1, tzinfo=timezone.utc).isoformat()
print(date)


2020-01-01T00:00:00+00:00
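
Note that `isoformat()` on a naive `datetime` (one without `tzinfo`) produces a string without a UTC offset. The sketch below always emits an offset by normalizing to UTC first. The helper name `to_rfc3339` is hypothetical, and treating naive datetimes as UTC is an assumption that may not fit every dataset.

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339(moment: datetime) -> str:
    """Format a datetime as a UTC timestamp such as 2020-01-01T00:00:00Z."""
    if moment.tzinfo is None:
        # Assumption: naive datetimes are meant as UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_rfc3339(datetime(2020, 1, 1)))
print(to_rfc3339(datetime(2020, 1, 1, 2, 0, tzinfo=timezone(timedelta(hours=2)))))
```

Both calls describe the same instant, so they print the same timestamp.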


To find articles related to computer risks in the US published before January 1, 2020:

In [9]:
nearText = {
    "concepts": ["computer risks in US"],
    "certainty": 0.5,
}

response = (
    client.query.get(
        "Article",
        [
            "title",
            "short_description",
            "published_at",
            "_additional {certainty}",
        ],
    )
    .with_near_text(nearText)
    .with_where(
        {
            "operator": "LessThan",
            "path": ["published_at"],
            "valueDate": date,
        }
    )
    .with_limit(3)
    .do()
)

pprint_response(response)


ARTICLE
Response # 0
_additional: {'certainty': 0.7508904}
published_at: 2017-09-20T10:05:17Z
short_description: Cybersecurity, health of the global economy, energy price shock and terrorist attacks are some of the top risks that concern businesses and may threaten their ability to operate, according to results from a new survey by the World Economic Forum (WEF) published Wednesday.The survey, conducted annually by the WEF's strategic partners Marsh & McLennan Companies and Zurich Insurance Group, highlighted the following ten risks that businesses are presently concerned about:While concerns about the ove
title: Here are the top 10 major worries for global business leaders right now, according to WEF
Response # 1
_additional: {'certainty': 0.64123785}
published_at: 2014-10-02T19:56:18Z
short_description: For the second time in roughly three  months, JPMorgan Chase is  sc

## 1.3. Filter by keywords

As we have a list of keywords for each article, we can use it as a filter.

Let's say we want to find articles about predictions from banks and hedge funds but keep only articles with the "bonds" keyword.

In [10]:
nearText = {
    "concepts": ["banks hedge fonds predictions"],
    "certainty": 0.5,
}

class_name = "Article"
properties = [
    "title",
    "keywords",
    "short_description",
    "_additional {certainty}",
]

response = (
    client.query.get(class_name, properties)
    .with_near_text(nearText)
    .with_where(
        {
            "operator": "Equal",
            "path": ["keywords"],
            "valueText": ["bonds"],  # keep only articles that have this keyword
        }
    )
    .with_limit(2)
    .do()
)

pprint_response(response)


ARTICLE
Response # 0
_additional: {'certainty': 0.7086749}
keywords: ['cnbc', 'Articles', 'Commodity markets', 'Currency markets', 'Bonds', 'SK Telecom Co Ltd', 'Microsoft Corp', 'Boeing Co', 'Currencies', 'Futures & Commodities', 'Markets', 'stocks', 'Stock Blog', 'source:tagname:CNBC US Source']
short_description: In a shorter holiday week, U.S. stocks ended flat Friday, as positives struggled with the Dubai debt news. How should investors prepare for next week? Alan Valdes, vice president at Kabrik Trading, and Doug Kreps, principal and managing director at Fort Pitt Capital Group, offered CNBC their portfolio advice. (See Kreps' stock picks, below.)"Sometimes your best trade is no trade at all," Valdes said.
title: Portfolio Prep for Next Week: 'Don't Get Crazy'
Response # 1
_additional: {'certainty': 0.66487396}
keywords: ['cnbc', 'Articles', 'Commodity markets', 'Currency markets

The same query but with "fast money" keyword:

In [11]:
response = (
    client.query.get(class_name, properties)
    .with_near_text(nearText)
    .with_where(
        {
            "operator": "Equal",
            "path": ["keywords"],
            "valueText": ["fast money"],  # keep only articles that have this keyword
        }
    )
    .with_limit(2)
    .do()
)

pprint_response(response)


ARTICLE
Response # 0
_additional: {'certainty': 0.6835275}
keywords: ['cnbc', 'Articles', 'S&P 500 Index', 'iShares 20+ Year Treasury Bond ETF', 'Fed Should Raise Interest Rates to 2-3 Percent: Einhorn', 'Bank Crisis Strikes Europe', 'CNBC TV', 'Fast Money', 'Fast Money: Behind The Money', 'source:tagname:CNBC US Source']
short_description: They are supposed to be the smart money—the best of the best—yet they can’t even beat a basic Treasury bond fund.
title: Dumb Money: Hedge Funds Can't Even Beat Bond Funds
Response # 1
_additional: {'certainty': 0.66039157}
keywords: ['cnbc', 'Articles', 'S&P 500 Index', 'Alexion Pharmaceuticals Inc', 'Capital One Financial Corp', 'Goldman Sachs Group Inc', 'HP Inc', '3M Co', 'Microsoft Corp', 'Travelers Companies Inc', 'Wells Fargo & Co', "McDonald's Corp", 'Fast Money', 'CNBC TV', 'Fast Money Halftime Report', 'source:tagname:CNBC US Source']

If we want to repeat the query but now filter for articles that include both "bonds" and "fast money" keywords the response will be empty. That's because there are no such articles containing both keywords at the same time.

In [12]:
response = (
    client.query.get(class_name, properties)
    .with_near_text(nearText)
    .with_where(
        {
            "operator": "Equal",
            "path": ["keywords"],
            "valueText": ["bonds", "fast money"],  # there are no articles with these keywords at the same time
        }
    )
    .with_limit(5)
    .do()
)

pprint_response(response)


ARTICLE


This is where operators come in. We can modify our query about market predictions so that articles must contain either the "bonds" or the "fast money" keyword.

The list of operators can be found on [this page](https://weaviate.io/developers/weaviate/current/graphql-references/filters.html).
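
An `Or` filter over several keywords has a regular shape, so it can be generated instead of written by hand. The sketch below builds such a dictionary; the function name `any_of` is hypothetical, and it only covers `Equal` comparisons on a single text property.

```python
from typing import Any, Dict, List


def any_of(path: str, values: List[str]) -> Dict[str, Any]:
    """Build a where filter that matches objects whose `path` equals any of `values`."""
    if not values:
        raise ValueError("at least one value is required")
    operands = [
        {"operator": "Equal", "path": [path], "valueText": value}
        for value in values
    ]
    # A single condition does not need to be wrapped in an Or.
    if len(operands) == 1:
        return operands[0]
    return {"operator": "Or", "operands": operands}


print(any_of("keywords", ["bonds", "fast money"]))
```

For the two keywords above this produces the same structure as the hand-written `where_filter` used in the next cell.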

In [13]:
where_filter = {
    "operator": "Or",
    "operands": [
        {
            "operator": "Equal",
            "path": ["keywords"],
            "valueText": "bonds",
        },
        {
            "operator": "Equal",
            "path": ["keywords"],
            "valueText": "fast money",
        },
    ],
}

response = client.query.get(class_name, properties).with_near_text(nearText).with_where(where_filter).with_limit(5).do()

pprint_response(response)


ARTICLE
Response # 0
_additional: {'certainty': 0.7086749}
keywords: ['cnbc', 'Articles', 'Commodity markets', 'Currency markets', 'Bonds', 'SK Telecom Co Ltd', 'Microsoft Corp', 'Boeing Co', 'Currencies', 'Futures & Commodities', 'Markets', 'stocks', 'Stock Blog', 'source:tagname:CNBC US Source']
short_description: In a shorter holiday week, U.S. stocks ended flat Friday, as positives struggled with the Dubai debt news. How should investors prepare for next week? Alan Valdes, vice president at Kabrik Trading, and Doug Kreps, principal and managing director at Fort Pitt Capital Group, offered CNBC their portfolio advice. (See Kreps' stock picks, below.)"Sometimes your best trade is no trade at all," Valdes said.
title: Portfolio Prep for Next Week: 'Don't Get Crazy'
Response # 1
_additional: {'certainty': 0.6835275}
keywords: ['cnbc', 'Articles', 'S&P 500 Index', 'iShares 20+ Year Trea

## 1.4. Filter by minimum description length

In our Weaviate schema we have the property "descriptionWordCount", which stores the description length in words. We can return articles filtered by that property. Let's say we are looking for articles that are longer than 100 words and related to the pandemic's effect on the stock market.

In [14]:
nearText = {
    "concepts": ["pandemic covid affect on stock market"],
    "certainty": 0.5,
}

response = (
    client.query.get(
        "Article",
        [
            "title",
            "description",
            "published_at",
            "descriptionWordCount",
            "_additional {certainty}",
        ],
    )
    .with_where(
        {
            "operator": "GreaterThan",
            "path": ["descriptionWordCount"],
            "valueInt": 100,
        }
    )
    .with_near_text(nearText)
    .with_limit(5)
    .do()
)

pprint_response(response)


ARTICLE
Response # 0
_additional: {'certainty': 0.7150651}
description: Germany is considering how to implement a gradual recovery from the coronavirus pandemic, the country's health minister, Jens Spahn, told CNBC on Monday. "We are thinking about step by step, that is important ... going back to a new normal," Spahn said on "Closing Bell."Spahn, who was speaking from Berlin, stressed that it will indeed be a new normal because "all the measures we have taken like keeping distance, wearing masks, no parties ... are definitely measures that need to be there in place
descriptionWordCount: 364
published_at: 2020-04-13T21:08:49Z
title: German health minister says the country is considering steps to return to a 'new normal'
Response # 1
_additional: {'certainty': 0.71365166}
description: Here are the most important news, trends and analysis that investors need to start their 

## 1.5. Filter by title (text search)

Although the title property is vectorized, we can still do text search against it. That's possible because Weaviate stores both the text and the vector representation (if that is specified in the schema file).

Suppose we are looking for articles about an upcoming acquisition whose title includes the words "google" and "motorola":

In [15]:
nearText = {
    "concepts": ["future acquisition"],
    "certainty": 0.5,
}

where_filter = {
    "operator": "Equal",
    "path": ["title"],
    "valueText": ["google motorola"],
}

response = (
    client.query.get(
        "Article",
        [
            "title",
            "description",
            "_additional {certainty}",
        ],
    )
    .with_where(where_filter)
    .with_near_text(nearText)
    .with_limit(5)
    .do()
)

pprint_response(response)


ARTICLE
Response # 0
_additional: {'certainty': 0.629853}
description: Google's $12.5 billion acquisition of Motorola Mobilitygives the Internet search company patent protection while putting it squarely into the smartphone hardware business, Executive Chairman Eric Schmidt told CNBC Monday. But the acquisition won't affect how Google  does business with the other phone makers that use its Android operating system, including HTC, he added."We are going into the hardware business, but we’re going to keep it separate and we’re going to treat everybody else on a fair 
title: Motorola Deal Buys Google Patent Protection: Schmidt


## 1.6. "Like" filter (fuzzy match)

Weaviate also provides the ability to run fuzzy (wildcard) text search with the ["Like" operator](https://weaviate.io/developers/weaviate/current/graphql-references/filters.html#like-operator).

A note from the official docs:

> Each query using the Like operator iterates over the entire inverted index for that property. The search time will go up linearly with the dataset size. Be aware that there might be a point where this query is too expensive and will not work anymore.
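
To preview which values a pattern would accept before sending a query, the wildcards can be approximated locally with a regular expression. This sketch assumes the wildcard semantics described in the Weaviate docs (`?` matches exactly one character, `*` matches zero or more). It ignores tokenization and case handling on the Weaviate side, so real results may differ; the function name `like_to_regex` is hypothetical.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a Like-style pattern into a compiled regular expression."""
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")  # zero or more characters
        elif char == "?":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(char))  # everything else is matched literally
    return re.compile("".join(parts))


names = ["Katie Holliday", "Katie Kramer", "Kate Hudson"]
matcher = like_to_regex("Katie H*")
matching = [name for name in names if matcher.fullmatch(name)]
print(matching)
```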

So if we are looking for an author whose first name is Katie and whose last name starts with H:

In [16]:
where_filter = {
    "operator": "Like",
    "path": ["name"],
    "valueString": "Katie H*",
}

response = (
    client.query.get(
        "Author",
        [
            "name",
        ],
    )
    .with_where(where_filter)
    .with_limit(5)
    .do()
)

pprint_response(response)


AUTHOR
Response # 0
name: Katie Holliday


# 2. Filters that accept only a raw string (GraphQL syntax)

As mentioned at the beginning of the notebook, to use GraphQL syntax we need to pass the query as a raw string to the `client.query.raw` method.

That is inconvenient when variables have to be provided as values for the GraphQL search.

It can be tackled with any of these options: 
- with regex
- with the `.format` method
- with the [Template class](https://docs.python.org/3.6/library/string.html#template-strings) from the built-in `string` module (a safer variant, as you can see in an [example](https://realpython.com/python-string-formatting/#4-template-strings-standard-library))

Let's say we have the following query for filtering articles by keywords.

The GraphQL query will look like this:

```text
{
  Get {
    Article(where: {
        path: ["keywords"],
        operator: Equal,
        valueText: "bonds",
      }, limit: 5) {
      title
      hasAuthors {
        ... on Author {
          name
        }
      }
    }
  }
}
```
The goal is to create a raw string where `path` and `valueText` are filled with the provided variables.

Let's take a look at each option with an example.

### 2.0.1. Regex approach of formatting raw string

Create a raw string with an arbitrary placeholder, which is then replaced using regex. 

In [17]:
query = """
{
  Get {
    Article(where: {
        path: ["{placeholder}"],
        operator: Equal,  # operator
        valueText: "{placeholder}",
      }, limit: 5) {
      title
      hasAuthors {
        ... on Author {
          name
        }
      }
    }
  }
}
"""


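def format_raw_request(query: str, *args: str, template: str = "{placeholder}") -> str:
    """Replaces occurrences of `template` in `query`, in order, with the values from `args`."""
    for arg in args:
        # escape the placeholder and use a callable so that neither side is read as regex syntax
        query = re.sub(re.escape(template), lambda _match: arg, query, count=1)
    return query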


query = format_raw_request(query, "keywords", "bonds")
print(query)

response = client.query.raw(query)
pprint_response(response)



{
  Get {
    Article(where: {
        path: ["keywords"],
        operator: Equal,  # operator
        valueText: "bonds",
      }, limit: 5) {
      title
      hasAuthors {
        ... on Author {
          name
        }
      }
    }
  }
}

ARTICLE
Response # 0
hasAuthors: [{'name': 'Tom DiChristopher'}]
title: Instead of shunning Saudi Arabia after Khashoggi killing, investors flock to $7.5 billion bond sale
Response # 1
hasAuthors: [{'name': 'Silvia Amaro'}]
title: US Treasurys lower as investors focus on data, monitor Russia-US relations
Response # 2
hasAuthors: [{'name': 'Unknown'}]
title: 5 Stocks Insiders Love Right Now
Response # 3
hasAuthors: [{'name': 'Elliot Smith'}]
title: 10-year Treasury yield falls to 0.8% as investors return to safety amid pause in stock rally
Response # 4
hasAuthors: [{'name': 'Herb Greenberg'}]

### 2.0.2. .format approach of formatting raw string

For me this is the most inconvenient approach, as it requires escaping curly braces by doubling them wherever we don't want a variable to be substituted.

In [18]:
query = """
{{
  Get {{
    Article(where: {{
        path: ["{path}"],    # Path to the property that should be used
        operator: Equal,  # operator
        valueText: "{valueText}"         # value (which is always = to the type of the path property)
      }}, limit: 5) {{
      title
      hasAuthors {{
        ... on Author {{
          name
        }}
      }}
    }}
  }}
}}
"""

query = query.format(path="keywords", valueText="bonds")
print(query)

response = client.query.raw(query)
pprint_response(response)



{
  Get {
    Article(where: {
        path: ["keywords"],    # Path to the property that should be used
        operator: Equal,  # operator
        valueText: "bonds"         # value (which is always = to the type of the path property)
      }, limit: 5) {
      title
      hasAuthors {
        ... on Author {
          name
        }
      }
    }
  }
}

ARTICLE
Response # 0
hasAuthors: [{'name': 'Tom DiChristopher'}]
title: Instead of shunning Saudi Arabia after Khashoggi killing, investors flock to $7.5 billion bond sale
Response # 1
hasAuthors: [{'name': 'Silvia Amaro'}]
title: US Treasurys lower as investors focus on data, monitor Russia-US relations
Response # 2
hasAuthors: [{'name': 'Unknown'}]
title: 5 Stocks Insiders Love Right Now
Response # 3
hasAuthors: [{'name': 'Elliot Smith'}]
title: 10-year Treasury yield falls to 0.8% as investors return 

### 2.0.3. Template approach of formatting raw string

For me this is the easiest way: mark the places where variables should go with a dollar sign, wrap the raw string in the Template class, and call `substitute`.

In [19]:
query = """
{
  Get {
    Article(where: {
        path: ["${path}"],    # Path to the property that should be used
        operator: Equal,  # operator
        valueText: "${valueText}"         # value (which is always = to the type of the path property)
      }, limit: 5) {
      title
      hasAuthors {
        ... on Author {
          name
        }
      }
    }
  }
}
"""

query = Template(query)
query = query.substitute(path="keywords", valueText="bonds")
print(query)

response = client.query.raw(query)
pprint_response(response)



{
  Get {
    Article(where: {
        path: ["keywords"],    # Path to the property that should be used
        operator: Equal,  # operator
        valueText: "bonds"         # value (which is always = to the type of the path property)
      }, limit: 5) {
      title
      hasAuthors {
        ... on Author {
          name
        }
      }
    }
  }
}

ARTICLE
Response # 0
hasAuthors: [{'name': 'Tom DiChristopher'}]
title: Instead of shunning Saudi Arabia after Khashoggi killing, investors flock to $7.5 billion bond sale
Response # 1
hasAuthors: [{'name': 'Silvia Amaro'}]
title: US Treasurys lower as investors focus on data, monitor Russia-US relations
Response # 2
hasAuthors: [{'name': 'Unknown'}]
title: 5 Stocks Insiders Love Right Now
Response # 3
hasAuthors: [{'name': 'Elliot Smith'}]
title: 10-year Treasury yield falls to 0.8% as investors return 

<ins>From here on the Template approach is used.</ins>
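
One caveat applies to all three approaches: a value that contains a double quote or a backslash will break the query if it is pasted between quotes as-is. A possible safeguard, sketched below, is to let `json.dumps` produce the quoted and escaped literal and to leave the quotes out of the template. This assumes that JSON string escaping is acceptable for GraphQL string values; the helper name `graphql_string` and the constant `QUERY_TEMPLATE` are hypothetical.

```python
import json
from string import Template

# The placeholders are not wrapped in quotes: graphql_string() adds them.
QUERY_TEMPLATE = Template("""
{
  Get {
    Article(where: {
        path: [${path}],
        operator: Equal,
        valueText: ${value}
      }, limit: 5) {
      title
    }
  }
}
""")


def graphql_string(value: str) -> str:
    """Return `value` as a double-quoted string literal with special characters escaped."""
    # Assumption: JSON escaping rules are compatible with GraphQL string literals.
    return json.dumps(value)


query = QUERY_TEMPLATE.substitute(
    path=graphql_string("keywords"),
    value=graphql_string('the "smart" money'),
)
print(query)
```

`Template.substitute` raises a `KeyError` when a placeholder is left without a value, so an incomplete query fails in Python before it is sent.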

## 2.1. Filter authors by number of written articles

In the schema two classes are defined:
- Article 
- Author

Each article has a reference to its author objects, and each author to their article objects. With that in mind we can find authors who wrote at least 2 articles and take the first 5. Weaviate will do the [counting automatically](https://weaviate.io/developers/weaviate/current/graphql-references/filters.html#filter-objects-by-count-of-reference).

In [20]:
query = """
{
  Get {
    Author(
      where:{
        valueInt: ${limit},
        operator: GreaterThanEqual,
        path: ["wroteArticles"]
      }, limit: 5
    ) {
      name
      wroteArticles {
        ... on Article {
          title
        }
      }
    }
  }
}
"""

query = Template(query).substitute(limit=2)
response = client.query.raw(query)
pprint_response(response)


AUTHOR
Response # 0
name: Leslie Josephs
wroteArticles: [{'title': "Georgia's lieutenant governor says he will 'kill' Delta tax break unless airline reinstates relationship with NRA"}, {'title': 'Raytheon and United Technologies agree to all-stock merger that would create aerospace behemoth'}]
Response # 1
name: Tyler Clifford
wroteArticles: [{'title': 'Cramer adds new stocks, recommends buying 12 laggards in his Covid-19 index'}, {'title': "'I come to bury Bitcoin, not to praise it': UBS\xa0"}]
Response # 2
name: Sharon Epperson
wroteArticles: [{'title': "Energy Falls Despite 'Above Average' Hurricane Forecast"}, {'title': "Want to start a business? Here's what you need to know"}]
Response # 3
name: Pippa Stevens
wroteArticles: [{'title': 'Guggenheim says solar sell-off is a buying opportunity and has an unusual favorite stock'}, {'title': 'Should CEO pay b

## 2.2. Filter articles by author

Using the Article-Author cross-references we can also filter articles that [are written by specific authors](https://weaviate.io/developers/weaviate/current/graphql-references/filters.html#beacon-reference-filters).

In [21]:
query = """
{
  Get {
    Article(
      where:{
        valueString: "${authorName}",
        operator: Equal,
        path: ["hasAuthors", "Author", "name"]
      }, limit: 5
    ) {
      title
      hasAuthors {
        ... on Author {
          name
        }
      }
    }
  }
 }
"""

query = Template(query).substitute(authorName="Tyler")
response = client.query.raw(query)
pprint_response(response)


ARTICLE
Response # 0
hasAuthors: [{'name': 'Tyler Clifford'}]
title: Cramer adds new stocks, recommends buying 12 laggards in his Covid-19 index
Response # 1
hasAuthors: [{'name': 'Tyler Clifford'}]
title: 'I come to bury Bitcoin, not to praise it': UBS 
Response # 2
hasAuthors: [{'name': 'Tyler Bailey'}]
title: Your first trade for Wednesday, January 15


# 3. All combined

As a final step, here is a query that combines:
- vector search
- filter by:
    - publish date
    - keywords
    - title
    - author
    - description word count
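
Since the combined filter is just a nested structure, it can also be kept as a Python dictionary (the same shape used with `with_where`) and rendered into GraphQL text when a raw query is needed. The renderer below is a sketch: the function name `render_where` is hypothetical, and it only handles the filter shapes that appear in this notebook.

```python
import json
from typing import Any, Dict


def render_where(where: Dict[str, Any]) -> str:
    """Render a where-filter dictionary as GraphQL text."""
    # Operators are written without quotes in the GraphQL examples of this notebook.
    parts = [f"operator: {where['operator']}"]
    if "operands" in where:
        rendered = ", ".join(render_where(operand) for operand in where["operands"])
        parts.append(f"operands: [{rendered}]")
    else:
        parts.append(f"path: {json.dumps(where['path'])}")
        # Assumption: every key starting with "value" (valueText, valueInt, ...) holds the value.
        for key, value in where.items():
            if key.startswith("value"):
                parts.append(f"{key}: {json.dumps(value)}")
    return "{" + ", ".join(parts) + "}"


combined = {
    "operator": "And",
    "operands": [
        {"operator": "Equal", "path": ["keywords"], "valueText": "bonds"},
        {"operator": "GreaterThan", "path": ["descriptionWordCount"], "valueInt": 100},
    ],
}
print(render_where(combined))
```

The rendered text can then be substituted into a Template placeholder after `where:` in the raw query.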

In [22]:
query = """
{
  Get {
    Article(
      limit: 5
      nearText: 
      {
        concepts: ["debt of european banks"],
        moveTo:
        {
          concepts: ["deflation"],
          force: 0.9
        }
      }
      where: 
      {
        operator: And,
        operands: 
        [
          {
            operator: GreaterThan,
            path: ["published_at"],
            valueDate: "2015-01-01T00:00:00.52Z",
          },
          {
            operator: Equal,
            path: ["keywords"],
            valueText: "bonds",
          },
          {
            operator: Equal,
            path: ["title"],
            valueText: "Europe Japan",
          },
          {
            operator: Equal,
            path: ["hasAuthors", "Author", "name"],
            valueString: "Tom DiChristopher",
          },
          {
            operator: GreaterThan,
            path: ["descriptionWordCount"],
            valueInt: 100,
          }
        ]
      }
    ) 
    {
      title
      description
      keywords
      published_at
      descriptionWordCount
      hasAuthors {
        ... on Author {
          name
        }
      }
      _additional {
        certainty
      }
    }
  }
}
"""

response = client.query.raw(query)
pprint_response(response)


ARTICLE
Response # 0
_additional: {'certainty': 0.7344097}
description: Europe could be looking at a Japan-style deflationary environment  for the next five years, investor Marc Lasry told CNBC on  Wednesday. Read MoreEuro tests low last  seen at its birth in1999  Lasry's Avenue Capital is continuing to buy credit-side debt at a  discount in Europe. Over the last three or four years, the amount  of debt that European banks have sold has increased by 100  percent, he said in a "Squawk  Box" interview.   "The way that the banks were able to sell this debt is, they kee
descriptionWordCount: 344
hasAuthors: [{'name': 'Tom DiChristopher'}]
keywords: ['cnbc', 'Articles', 'World Markets', 'Bonds', 'Markets', 'Corporate bonds', 'Investment strategy', 'Corporate Debt', 'Investing', 'Credit and Debt', 'Squawk on the Street', 'Market Outlook', 'Europe Economy', 'source:tagname:CNBC US Source']
publis