# Elasticsearch queries

Now that we have created our first index, we can run some queries on it. To learn the basics of Elasticsearch queries, we advise you to have a look at [this document](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html).

As before, we will use the Python client and start by connecting it to our Elasticsearch instance.

In [None]:
from elasticsearch import Elasticsearch
from pprint import pprint
client = Elasticsearch("http://localhost:9200")

## Basic queries

Let's start with a simple query: find the articles whose headline contains the word `football` and count the results.

In [None]:
query = {
    "query_string": {
        "query": "football",
        "default_field": "headline"
    }
}

result = client.search(index="articles", query=query)

# Print the number of results
print(f"Nb articles: {result['hits']['total']['value']}")

# Print the first article
for hit in result['hits']['hits'][:1]:
  pprint(hit)
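
Printing a whole hit can be verbose. The sketch below shows one way to pull out only the relevance score and the headline of each hit. It assumes the usual search response layout (`hits.hits[]._score` and `hits.hits[]._source`) and that each document has a `headline` field. `summarize_hits` and `sample_result` are hypothetical names introduced for illustration, and the sample response is hand-written so that the cell runs without a cluster.

```python
def summarize_hits(result, limit=3):
    """Return (score, headline) pairs for the first `limit` hits of a search response."""
    summary = []
    for hit in result["hits"]["hits"][:limit]:
        source = hit.get("_source", {})
        summary.append((hit.get("_score"), source.get("headline")))
    return summary


# Hand-written stand-in for a search response (illustrative values, not real data)
sample_result = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_id": "1", "_score": 2.5, "_source": {"headline": "Football season opens"}},
            {"_id": "2", "_score": 1.2, "_source": {"headline": "A football story"}},
        ],
    }
}

for score, headline in summarize_hits(sample_result):
    print(f"{score:.2f}  {headline}")
```

The same function should work on the object returned by `client.search`, since it is indexed the same way as the dictionary above.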

Some texts will contain the word `soccer` instead of `football`. Elasticsearch allows us to use boolean operators (`AND`, `OR`, `NOT`) to make our queries a little more expressive. Here is an example:

In [None]:
query = {
    "query_string": {
        "query": "soccer OR football",
        "default_field": "headline"
    }
}

result = client.search(index="articles", query=query)

# Print the number of results
print(f"Nb articles: {result['hits']['total']['value']}")

# Print the first article
for hit in result['hits']['hits'][:1]:
  pprint(hit)
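
When the list of terms comes from a Python variable, the query string can be assembled programmatically. The following is a minimal sketch, not part of the Elasticsearch client: `build_query_string` is a hypothetical helper that joins terms with a boolean operator and wraps multi-word terms in double quotes so that they are treated as phrases.

```python
def build_query_string(terms, field, operator="OR"):
    """Build a `query_string` query that combines `terms` with AND or OR."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    # Quote multi-word terms so that they are searched as phrases
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return {
        "query_string": {
            "query": f" {operator} ".join(quoted),
            "default_field": field,
        }
    }


query = build_query_string(["soccer", "football"], "headline")
print(query)
```

The resulting dictionary can be passed to `client.search(index="articles", query=query)` exactly like the hand-written query above.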

**Exercise:** Create a query that gets the articles whose `short_description` contains `football` or `soccer` but not `player`, and display the first 10 results.

## Filters

We can refine queries by adding filters. For example, we could get the articles about Madonna that were published in 2017.

In [None]:
query = {
    "bool": {
        "must": {
            "query_string": {
                "query": "Madonna",
                "default_field": "headline"
            }
        },
        "filter": {
            "range": {
                "date": {
                    "gte": "2017-01-01",
                    "lt": "2018-01-01"
                }
            }
        }
    }
}

result = client.search(index="articles", query=query)


# Print the number of results
print(f"Nb articles: {result['hits']['total']['value']}")

# Print the first article
for hit in result['hits']['hits'][:1]:
  pprint(hit)
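
In a `bool` query, clauses under `must` contribute to the relevance score, while clauses under `filter` only include or exclude documents. The sketch below wraps the pattern above in a small function so that the text and the date range become parameters. `build_filtered_query` is a hypothetical helper introduced for illustration, and the field names are the ones assumed in this notebook.

```python
def build_filtered_query(text, text_field, date_field, start, end):
    """Build a bool query: full-text match on `text_field`, restricted to start <= date < end."""
    return {
        "bool": {
            # Scored clause: decides how relevant each document is
            "must": {
                "query_string": {"query": text, "default_field": text_field}
            },
            # Non-scored clause: only keeps documents inside the date range
            "filter": {
                "range": {date_field: {"gte": start, "lt": end}}
            },
        }
    }


query = build_filtered_query("Madonna", "headline", "date", "2017-01-01", "2018-01-01")
print(query)
```

Using `gte` for the start and `lt` for the end gives a half-open interval, so consecutive periods can be queried without counting any boundary date twice.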

## Your Turn

Using what you have learned and a bit of online searching, try to write the following queries:

- Search for the articles about Donald Trump published during his term in office (Jan 2017 - Jan 2021)
- Count the total number of articles in the category `WORLD NEWS`
- Search for the articles that contain the word `computer` in both `headline` and `short_description`
- Search for the articles that exactly match the phrase `computer science`
- Query all articles that have the phrase `Barack Obama` in the headline. Then, use aggregations to count the number of those articles per `category`. As with MongoDB, you can have a look at [aggregation pipelines](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline.html)

In [None]:
## Your code here (feel free to add more code cells!)