# Python Wrapping Elastic

First we have to create a connection to our Elasticsearch deployment. This step is similar to the one in part 1.


In [1]:
from getpass import getpass  # For securely getting user input
from elasticsearch import Elasticsearch

# Prompt the user to enter their Elastic Cloud ID and API Key securely
ELASTIC_CLOUD_ID = getpass("Elastic Cloud ID: ")
ELASTIC_API_KEY = getpass("Elastic API Key: ")

# Create an Elasticsearch client using the provided credentials
client = Elasticsearch(
    cloud_id=ELASTIC_CLOUD_ID,  # Cloud ID can be found under deployment management
    api_key=ELASTIC_API_KEY,  # API keys can be generated under management / security
)
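
Before running any queries, it can help to confirm that the client actually reaches the deployment. The sketch below is one possible way to do that, not part of the original notebook: `describe_cluster` is a hypothetical helper name, and it relies only on the client's `info()` call, which returns basic cluster metadata.

```python
def describe_cluster(client):
    """Return a short description of the cluster the client is connected to."""
    # info() queries the root endpoint of the deployment and returns cluster metadata
    info = client.info()
    return "{name} (Elasticsearch {version})".format(
        name=info["cluster_name"],
        version=info["version"]["number"],
    )
```

With the client created above, `print(describe_cluster(client))` should print the cluster name and version when the credentials are valid, and raise an exception otherwise.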

### Doing a simple query in Python

In [2]:
response = client.search(index="hp", query={
    "match": {
        "column12": "Dumbledores Army"
    }
})

In [3]:
print(response)

{'took': 1, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 31, 'relation': 'eq'}, 'max_score': 3.8230174, 'hits': [{'_index': 'hp', '_id': 'iFZNLZABzx_Mddi1a7YD', '_score': 3.8230174, '_ignored': ['column15', 'column14'], '_source': {'column1': 84, 'column12': 'Dumbledores Army', 'column11': 'Unknown', 'column10': 'Red', 'column5': 'Hufflepuff', 'column4': 'Student', 'column3': 'Male', 'column2': 'Justin FinchFletchley', 'column15': 'Unknown', 'column14': 'Unknown', 'column13': 'Unknown', 'column9': 'muggleborn', 'column8': 'Human', 'column7': 'Noncorporeal', 'column6': 'Unknown'}}, {'_index': 'hp', '_id': 'iVZNLZABzx_Mddi1a7YD', '_score': 3.8230174, '_ignored': ['column15', 'column14'], '_source': {'column1': 85, 'column12': 'Dumbledores Army', 'column11': 'Unknown', 'column10': 'Blonde', 'column5': 'Hufflepuff', 'column4': 'Student', 'column3': 'Male', 'column2': 'Zacharias Smith', 'column15': 'Unknown', 'column14'

We see the same JSON response that we got from the direct console calls. However, since we're already working in Python, we can also clean up the response and make it easier to read:

In [5]:
print("We get back {total} results, here are the top ones:".format(total=response["hits"]['total']['value']))
for hit in response["hits"]["hits"]:
    print(hit['_source']['column2'])

We get back 31 results, here are the top ones:
Justin FinchFletchley
Zacharias Smith
Hannah Abbott
Ernest Macmillan
Susan Bones
Dennis Creevey
Dean Thomas
Seamus Finnigan
Angelina Johnson
Katie Bell
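
Since the same unpacking pattern recurs for every search, it could be wrapped in a small function. This is only a sketch: `hit_values` is a hypothetical helper, and `sample_response` is a hand-built stand-in with the same shape as the real response, included so the block runs without a deployment.

```python
def hit_values(response, field):
    """Return (total, values), where values holds `field` from each returned hit."""
    total = response["hits"]["total"]["value"]
    # .get() keeps the loop from failing on documents that lack the field
    values = [hit["_source"].get(field) for hit in response["hits"]["hits"]]
    return total, values


# Hand-built stand-in for a search response, for illustration only
sample_response = {
    "hits": {
        "total": {"value": 31, "relation": "eq"},
        "hits": [
            {"_source": {"column2": "Justin FinchFletchley"}},
            {"_source": {"column2": "Zacharias Smith"}},
        ],
    }
}

total, names = hit_values(sample_response, "column2")
print("We get back {} results, here are the top ones:".format(total))
for name in names:
    print(name)
```

Only ten names are printed in the output above even though 31 documents match, because a search returns ten hits by default; the `size` argument of `client.search` raises that limit.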


### Working with our Harry Potter Script to create improved searches

We will now reintroduce our script from the first Harry Potter movie and prepare the data for search.

import

In [6]:
import pandas as pd
hp_script = pd.read_csv("../Data/Harry_Potter_1.csv", sep = ";" )

clean

In [7]:
import re
unique_chars = hp_script["Character"].unique()
print("There are {} unique characters: {}".format(len(unique_chars), unique_chars))
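# DataFrame.applymap is deprecated since pandas 2.1; DataFrame.map replaces it (keep applymap on older versions)
hp_script = hp_script.map(lambda x: re.sub(r'[^ \w+]', '', str(x).strip()))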
unique_chars = hp_script["Character"].unique()
print("There are {} unique characters: {}".format(len(unique_chars), unique_chars))

There are 91 unique characters: ['Dumbledore' 'McGonagall' 'Hagrid' 'Petunia' 'Dudley' 'Vernon' 'Harry'
 'Snake' 'Someone' 'Barkeep\xa0Tom' 'Man' 'Witch' 'Quirrell' 'Boy'
 'Goblin' 'Griphook' 'Ollivander' 'Trainmaster' 'Mrs. Weasley' 'George'
 'Fred' 'Ginny' 'Ron' 'Woman' 'Hermione' 'Neville' 'Malfoy' 'Whispers'
 'Sorting Hat' 'Seamus' 'Percy' 'Sir Nicholas' 'Girl' 'Man in paint'
 'Fat Lady' 'Snape' 'Dean' 'Madam Hooch' 'Class' 'Harry ' 'Fred  ' 'Ron  '
 'George  ' 'Harry  ' 'Hermione  ' 'Ron ' 'Hermione ' 'Filch' 'All  '
 'Oliver ' 'Oliver  ' 'Flitwick' 'Draco  ' 'Flitwick  ' 'Seamus  '
 'Girl  ' 'Boy  ' 'Percy  ' 'McGonagall ' 'Ron and Harry' 'McGonagall  '
 'Quirrell  ' 'Snape  ' 'OIiver  ' 'Lee Jordan' 'Hagrid ' 'Gryffindors  '
 'Flint  ' 'Crowd  ' 'Flint' 'Hagrid  ' 'Man  ' 'Lee  Jordan'
 'Madam Hooch ' 'Quirrell ' 'Filch  ' 'Dumbledore  ' 'Hermoine'
 'Ron and Harry  ' 'All 3  ' 'Filch ' 'Firenze  ' 'Firenze ' 'Snape '
 'Neville  ' 'Ron   ' 'Voldemort ' 'Voldemort' 'Voldemort  ' '

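
The list of unique characters shows why cleaning matters: 'Harry', 'Harry ' and 'Harry  ' count as three different characters. The cleaning step above handles trailing spaces and punctuation, but it would leave 'Lee  Jordan' (two inner spaces) distinct from 'Lee Jordan', and it removes the non-breaking space in 'Barkeep\xa0Tom' instead of turning it into a normal space. The following is a sketch of a slightly stricter variant, with `clean_text` as a hypothetical helper name; it is an alternative to consider, not the cleaning used in this notebook.

```python
import re


def clean_text(value):
    """Remove punctuation and normalise whitespace in a single cell value."""
    # Turn every whitespace character (including non-breaking spaces) into a plain space
    text = re.sub(r"\s", " ", str(value))
    # Drop everything that is neither a word character nor a space
    text = re.sub(r"[^ \w]", "", text)
    # Collapse runs of spaces and trim both ends
    return re.sub(r" +", " ", text).strip()


for raw in ["Harry  ", "Lee  Jordan", "Barkeep\xa0Tom", "Mrs. Weasley"]:
    print(repr(raw), "->", repr(clean_text(raw)))
```

With pandas 2.1 or later this could be applied to the whole dataframe as `hp_script.map(clean_text)`. Misspellings such as 'Hermoine' would still need a manual correction.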


index and insert

In [9]:
hp_script["Line_number"] = hp_script.index

index = "hp_script_1"
settings = {}
mappings = {
    "_meta" : {
        "created_by" : "Iulia Feroli"
    },
    "properties" : {
        "Line_number" : {
            "type" : "long"
        },
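        "Character" : {
            # A dict cannot hold "type" twice (the last value wins), so keep
            # "text" and expose the exact value as a "keyword" sub-field
            "type" : "text",
            "fields" : {
                "keyword" : {"type" : "keyword"}
            }
        },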
        "Sentence" : {
            "type" : "text"
        }
    }
}

client.indices.create(index=index, settings=settings, mappings=mappings)


BadRequestError: BadRequestError(400, 'resource_already_exists_exception', 'index [hp_script_1/MrJfFxveTFiNOqwVi12UoA] already exists')
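
The error above simply means the index was already created by an earlier run of this cell. One way to make the cell safe to re-run is to check for the index first. The sketch below assumes the client's `indices.exists` and `indices.create` calls; `ensure_index` is a hypothetical helper name.

```python
def ensure_index(client, index, mappings, settings=None):
    """Create `index` only if it does not exist yet; return True if it was created."""
    # indices.exists() returns a truthy response when the index is present
    if client.indices.exists(index=index):
        return False
    client.indices.create(index=index, settings=settings or {}, mappings=mappings)
    return True
```

Calling `ensure_index(client, index, mappings)` would then create the index on the first run and do nothing afterwards. To start from a clean slate instead, `client.indices.delete(index=index)` removes the index and all of its documents.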

Now we can add our documents to the index. We can easily convert our dataframe into a list of dictionaries to see the format each of our documents will take in the index.

In [10]:
from json import loads
docs = hp_script.to_json(orient = "records")
hp_script_docs = loads(docs)
hp_script_docs[0:5]

[{'Character': 'Dumbledore',
  'Sentence': 'I shouldve known that you would be here Professor McGonagall',
  'Line_number': 0},
 {'Character': 'McGonagall',
  'Sentence': 'Good evening Professor Dumbledore',
  'Line_number': 1},
 {'Character': 'McGonagall',
  'Sentence': 'Are the rumors true Albus',
  'Line_number': 2},
 {'Character': 'Dumbledore',
  'Sentence': 'Im afraid so professor',
  'Line_number': 3},
 {'Character': 'Dumbledore',
  'Sentence': 'The good and the bad',
  'Line_number': 4}]

We can add documents to our new index either one by one with the index function or, more conveniently for large numbers of documents, with the bulk helper. Let's run a test to see how index and delete work. See the Elasticsearch Python client documentation for more info.

In [11]:
doc_test = {
    'Character': 'Iulia Feroli',
    'Sentence': "Wow, I've just added myself to the Harry Potter Books, I have so much to say!",
    'Line_number': 0
}

response = client.index(index = index, id = 1, document = doc_test)
print(response['result'])

response = client.search(index = index)
for hit in response["hits"]["hits"]:
    print(hit['_source'])

created
{'Character': 'Dumbledore', 'Sentence': 'I shouldve known that you would be here Professor McGonagall', 'Line_number': 0}
{'Character': 'McGonagall', 'Sentence': 'Good evening Professor Dumbledore', 'Line_number': 1}
{'Character': 'McGonagall', 'Sentence': 'Are the rumors true Albus', 'Line_number': 2}
{'Character': 'Dumbledore', 'Sentence': 'Im afraid so professor', 'Line_number': 3}
{'Character': 'Dumbledore', 'Sentence': 'The good and the bad', 'Line_number': 4}
{'Character': 'McGonagall', 'Sentence': 'And the boy', 'Line_number': 5}
{'Character': 'Dumbledore', 'Sentence': 'Hagrid is bringing him', 'Line_number': 6}
{'Character': 'McGonagall', 'Sentence': 'Do you think it wise to trust Hagrid with something as important as this', 'Line_number': 7}
{'Character': 'Dumbledore', 'Sentence': 'Ah Professor I would trust Hagrid with my life', 'Line_number': 8}
{'Character': 'Hagrid', 'Sentence': 'Professor Dumbledore sir', 'Line_number': 9}
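
The test document does not appear in the output above: a search without a query returns only ten hits by default, and a newly indexed document becomes searchable only after the next index refresh. To confirm that the document was stored, it can be fetched directly by ID. This is a minimal sketch using the client's `get` call; `fetch_source` is a hypothetical helper name.

```python
def fetch_source(client, index, doc_id):
    """Return the stored document body for `doc_id`."""
    # get() looks a document up by its ID instead of running a search
    response = client.get(index=index, id=doc_id)
    return response["_source"]
```

`fetch_source(client, index, 1)` would return the test document while it exists; once the document is deleted, the same call raises a `NotFoundError`.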


In [12]:
response = client.delete(index = index, id = 1)
print(response["result"])

deleted


And now let's index all our Harry Potter script lines.

In [13]:

from elasticsearch.helpers import bulk

response = bulk(client = client, index = index, actions = iter(hp_script_docs), stats_only = True )
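
Without an explicit ID, Elasticsearch assigns a fresh one to every document, so running the bulk cell twice stores every line twice. The search output later in this section lists several lines twice, which would be consistent with that. A sketch of one way to avoid it is to derive the ID from `Line_number`; `generate_actions` is a hypothetical helper, and the sample documents are hand-written stand-ins for `hp_script_docs`.

```python
def generate_actions(docs, index):
    """Yield one bulk action per document, using Line_number as the document ID."""
    for doc in docs:
        yield {
            "_index": index,
            "_id": doc["Line_number"],  # same ID on every run, so re-runs overwrite
            "_source": doc,
        }


# Two hand-written documents in the same shape as hp_script_docs
sample_docs = [
    {"Character": "Dumbledore", "Sentence": "The good and the bad", "Line_number": 4},
    {"Character": "McGonagall", "Sentence": "And the boy", "Line_number": 5},
]
actions = list(generate_actions(sample_docs, "hp_script_1"))
print(actions[0]["_id"], actions[0]["_source"]["Character"])
```

Passing `generate_actions(hp_script_docs, index)` as the `actions` argument of `bulk` would then replace existing documents on a re-run instead of adding copies.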

And that's it! Let's see if the bulk ingest worked by doing a general search of all index documents:

### Index Searches

In [14]:
response = client.search(index = index)

print("We get back {total} results, here are the top ones:".format(total=response["hits"]['total']['value']))
for hit in response["hits"]["hits"]:
    print(hit['_source'])

We get back 3174 results, here are the top ones:
{'Character': 'Dumbledore', 'Sentence': 'I shouldve known that you would be here Professor McGonagall', 'Line_number': 0}
{'Character': 'McGonagall', 'Sentence': 'Good evening Professor Dumbledore', 'Line_number': 1}
{'Character': 'McGonagall', 'Sentence': 'Are the rumors true Albus', 'Line_number': 2}
{'Character': 'Dumbledore', 'Sentence': 'Im afraid so professor', 'Line_number': 3}
{'Character': 'Dumbledore', 'Sentence': 'The good and the bad', 'Line_number': 4}
{'Character': 'McGonagall', 'Sentence': 'And the boy', 'Line_number': 5}
{'Character': 'Dumbledore', 'Sentence': 'Hagrid is bringing him', 'Line_number': 6}
{'Character': 'McGonagall', 'Sentence': 'Do you think it wise to trust Hagrid with something as important as this', 'Line_number': 7}
{'Character': 'Dumbledore', 'Sentence': 'Ah Professor I would trust Hagrid with my life', 'Line_number': 8}
{'Character': 'Hagrid', 'Sentence': 'Professor Dumbledore sir', 'Line_number': 9}
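
The search above prints only the first ten hits, so it does not show on its own whether every row arrived exactly once. A quick check is to compare the number of stored documents with the number of rows that were sent. The sketch below assumes the client's `count` call; `ingest_is_complete` is a hypothetical helper name.

```python
def ingest_is_complete(client, index, expected):
    """Return (matches, stored): whether the index holds exactly `expected` documents."""
    # count() returns the number of documents in the index without fetching them
    stored = client.count(index=index)["count"]
    return stored == expected, stored
```

For this notebook the call would be `ingest_is_complete(client, index, len(hp_script_docs))`; a stored count larger than the number of rows points to documents that were indexed more than once.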


### Using NLP to drive searches
Natural language search, also known as “conversational search” or natural language processing (NLP) search, lets users search in everyday language. For example, instead of searching for “vitamin b complex” and then adjusting filters to show results under $40, a user can type or say “I want vitamin b complex for under $40” and get relevant results back.

We can already see this in our searches, where NLP techniques retrieve similar results:
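
What makes this work is text analysis: a match query runs the search text through the same analyzer as the indexed field and, by default, returns documents that contain any of the resulting terms. A sketch for inspecting those terms follows, assuming the client's `indices.analyze` call with its `field` and `text` parameters; `analyzed_tokens` is a hypothetical helper name.

```python
def analyzed_tokens(client, index, field, text):
    """Return the terms that `text` is split into by the analyzer of `field`."""
    # indices.analyze() runs the analyzer configured for the field on the given text
    response = client.indices.analyze(index=index, field=field, text=text)
    return [token["token"] for token in response["tokens"]]
```

For example, `analyzed_tokens(client, index, "Sentence", "shouldn't have said that")` would list the terms that the query in the next cell is matched on.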

In [15]:
response = client.search(index = index, query={
    "match" : {
        "Sentence" : "shouldn't have said that"
    }
})

print("We get back {total} results, here are the top ones:".format(total=response["hits"]['total']['value']))
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit['_source'])

We get back 314 results, here are the top ones:
11.80299 {'Character': 'Hagrid', 'Sentence': 'I shouldnt have said that', 'Line_number': 961}
11.80299 {'Character': 'Hagrid', 'Sentence': 'I shouldnt have said that', 'Line_number': 961}
10.997591 {'Character': 'Hagrid', 'Sentence': 'I should not have said that', 'Line_number': 962}
10.997591 {'Character': 'Hagrid', 'Sentence': 'I should not have said that', 'Line_number': 963}
10.997591 {'Character': 'Hagrid', 'Sentence': 'I should not have said that', 'Line_number': 962}
10.997591 {'Character': 'Hagrid', 'Sentence': 'I should not have said that', 'Line_number': 963}
7.5856233 {'Character': 'Hagrid', 'Sentence': 'Shouldnta said that  No more questions', 'Line_number': 945}
7.5856233 {'Character': 'Hagrid', 'Sentence': 'Shouldnta said that  No more questions', 'Line_number': 945}
6.296635 {'Character': 'Neville', 'Sentence': 'She said that shed been in there all afternooncrying', 'Line_number': 800}
6.296635 {'Character': 'Neville', 'Sen