# OpenSearch + Python 101

## 0️⃣ Prerequisites

Before you begin, complete these steps:

1. Install Docker.
2. Download the movie plots dataset (`wiki_movie_plots_deduped.csv`) into your working directory.
3. Create a virtual environment and install the required packages. For example, with `venv`, run these commands in a terminal:
```shell
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install pandas==1.4.3 notebook==6.3.0 opensearch-py==2.0.0
```

## 1️⃣ Run a Local OpenSearch Cluster

Docker is the simplest way to run OpenSearch locally. Run the following command in a terminal to start a single-node cluster (port 9200 serves the REST API; port 9600 is used by the Performance Analyzer plugin):

```shell
docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" opensearchproject/opensearch:2.2.0
```
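
The container can take several seconds to start accepting requests. The following is a minimal polling sketch that waits until a zero-argument check succeeds, for example the `ping` method of the client created in the next step. `wait_until_ready` is a hypothetical helper, not part of opensearch-py.

```python
import time
from typing import Callable


def wait_until_ready(check: Callable[[], bool], timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Call `check` until it returns True or `timeout` seconds elapse.

    Hypothetical helper: `check` is any zero-argument callable, e.g. `client.ping`.
    Exceptions raised by `check` (such as connection errors while the node
    is still booting) are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            pass  # the node is probably still starting up
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Once the client exists, you could call `wait_until_ready(client.ping)` before sending the first request.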


## 2️⃣ Connect to Your Cluster

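```python
from opensearchpy import OpenSearch

# Connect to the local Docker cluster. The demo image serves HTTPS with a
# self-signed certificate and the default admin/admin credentials.
client = OpenSearch(
    hosts=[{"host": "localhost", "port": 9200}],
    http_auth=("admin", "admin"),
    use_ssl=True,
    verify_certs=False,
    ssl_assert_hostname=False,
    ssl_show_warn=False,
)
```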
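
Hard-coding `admin`/`admin` is fine for this local demo container, but you may prefer to read connection settings from the environment. The following is a minimal sketch; the variable names `OPENSEARCH_HOST`, `OPENSEARCH_PORT`, `OPENSEARCH_USER`, and `OPENSEARCH_PASSWORD` are a hypothetical convention, and the defaults mirror the values used above.

```python
import os
from typing import Any, Dict, Mapping, Optional


def client_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build keyword arguments for `OpenSearch(...)` from environment variables.

    The variable names are a hypothetical convention; the defaults match the
    local Docker setup used in this tutorial.
    """
    env = os.environ if env is None else env
    return {
        "hosts": [
            {
                "host": env.get("OPENSEARCH_HOST", "localhost"),
                "port": int(env.get("OPENSEARCH_PORT", "9200")),
            }
        ],
        "http_auth": (
            env.get("OPENSEARCH_USER", "admin"),
            env.get("OPENSEARCH_PASSWORD", "admin"),
        ),
        "use_ssl": True,
        # The demo container uses a self-signed certificate, so verification is off.
        "verify_certs": False,
        "ssl_assert_hostname": False,
        "ssl_show_warn": False,
    }
```

You would then create the client with `client = OpenSearch(**client_kwargs())`.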

Check that the connection works by requesting basic cluster information:

```python
client.info()
```

```
{'name': 'b4a9c514a1a6',
 'cluster_name': 'docker-cluster',
 'cluster_uuid': 'o0AejnfGRdOgIrBmesXuJQ',
 'version': {'distribution': 'opensearch',
  'number': '2.2.0',
  'build_type': 'tar',
  'build_hash': 'b1017fa3b9a1c781d4f34ecee411e0cdf930a515',
  'build_date': '2022-08-09T02:28:05.169390805Z',
  'build_snapshot': False,
  'lucene_version': '9.3.0',
  'minimum_wire_compatibility_version': '7.10.0',
  'minimum_index_compatibility_version': '7.0.0'},
 'tagline': 'The OpenSearch Project: https://opensearch.org/'}
```
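
The response is a plain Python dict, so you can inspect it directly. For example, here is a small sketch that extracts the server version as a tuple of integers, assuming the version string is a plain `MAJOR.MINOR.PATCH` value like the `'2.2.0'` shown above. `server_version` is a hypothetical helper.

```python
def server_version(info: dict) -> tuple:
    """Return the version from a `client.info()` response as a tuple of ints.

    Assumes a plain "MAJOR.MINOR.PATCH" string such as "2.2.0".
    """
    number = info["version"]["number"]
    return tuple(int(part) for part in number.split(".")[:3])


# Excerpt of the response shown above.
info = {"version": {"distribution": "opensearch", "number": "2.2.0"}}
print(server_version(info))  # (2, 2, 0)
```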

## 3️⃣ Read the data

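```python
import pandas as pd

# Load the CSV, drop rows with any missing values,
# and keep a reproducible random sample of 5,000 movies.
df = (
    pd.read_csv("wiki_movie_plots_deduped.csv")
    .dropna()
    .sample(5000, random_state=42)
)
```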
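
Both indexing approaches below convert a CSV row into a document with the same field names. The following is a minimal sketch of factoring that conversion out; `COLUMN_TO_FIELD` and `row_to_doc` are hypothetical names, and the column names are the ones the indexing code reads.

```python
# Map CSV column names to index field names (hypothetical helper constant).
COLUMN_TO_FIELD = {
    "Title": "title",
    "Origin/Ethnicity": "ethnicity",
    "Director": "director",
    "Cast": "cast",
    "Genre": "genre",
    "Plot": "plot",
    "Release Year": "year",
    "Wiki Page": "wiki_page",
}


def row_to_doc(row) -> dict:
    """Build an index document from one row (a pandas Series or any mapping)."""
    return {field: row[column] for column, field in COLUMN_TO_FIELD.items()}
```

With this helper, the indexing loop body could become `client.index(index="movies", id=i, body=row_to_doc(row))`.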

## 4️⃣ Create an index

```python
# `english` stems words and removes common English stop words;
# `standard` only splits text into tokens and lowercases them.
body = {
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "english"},
            "ethnicity": {"type": "text", "analyzer": "standard"},
            "director": {"type": "text", "analyzer": "standard"},
            "cast": {"type": "text", "analyzer": "standard"},
            "genre": {"type": "text", "analyzer": "standard"},
            "plot": {"type": "text", "analyzer": "english"},
            "year": {"type": "integer"},
            "wiki_page": {"type": "keyword"},
        }
    }
}
response = client.indices.create(index="movies", body=body)
```
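
If you add more fields later, a small helper that generates the same request body from lists of field names can reduce repetition. The following is a sketch; `build_index_body` is a hypothetical helper, not part of opensearch-py.

```python
def build_index_body(english, standard, extra=None) -> dict:
    """Build an index-creation body with `text` fields grouped by analyzer.

    `extra` holds any non-text field definitions, copied in as-is.
    """
    properties = {}
    for name in english:
        properties[name] = {"type": "text", "analyzer": "english"}
    for name in standard:
        properties[name] = {"type": "text", "analyzer": "standard"}
    properties.update(extra or {})
    return {"mappings": {"properties": properties}}


# Produces the same mapping as the body used above.
body = build_index_body(
    english=["title", "plot"],
    standard=["ethnicity", "director", "cast", "genre"],
    extra={"year": {"type": "integer"}, "wiki_page": {"type": "keyword"}},
)
```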

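## 5️⃣ Add data to your index

The two approaches below are alternatives. Both use the DataFrame index as the document ID, so running both simply overwrites the same 5,000 documents.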

### Using `client.index()`

```python
# Index one document per request: simple, but one HTTP round trip per document.
for i, row in df.iterrows():
    body = {
        "title": row["Title"],
        "ethnicity": row["Origin/Ethnicity"],
        "director": row["Director"],
        "cast": row["Cast"],
        "genre": row["Genre"],
        "plot": row["Plot"],
        "year": row["Release Year"],
        "wiki_page": row["Wiki Page"],
    }
    client.index(index="movies", id=i, body=body)
```

### Using `bulk()`

```python
from opensearchpy.helpers import bulk

# Build one action per document; `_source` holds the document body.
bulk_data = []
for i, row in df.iterrows():
    bulk_data.append(
        {
            "_index": "movies",
            "_id": i,
            "_source": {
                "title": row["Title"],
                "ethnicity": row["Origin/Ethnicity"],
                "director": row["Director"],
                "cast": row["Cast"],
                "genre": row["Genre"],
                "plot": row["Plot"],
                "year": row["Release Year"],
                "wiki_page": row["Wiki Page"],
            },
        }
    )
# Returns a tuple: (number of successful actions, list of errors).
bulk(client, bulk_data)
```

```
(5000, [])
```
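
Under the hood, the `bulk()` helper groups actions into chunks (500 per request by default, controlled by its `chunk_size` argument) and sends each chunk to the `_bulk` REST endpoint as newline-delimited JSON: an action line followed by the document source. The following is a simplified sketch of that serialization for `index` actions only, for illustration; `to_bulk_ndjson` is hypothetical, and the real helper also handles other operation types and error reporting.

```python
import json


def to_bulk_ndjson(actions) -> str:
    """Serialize index actions into a `_bulk` request body (illustrative only)."""
    lines = []
    for action in actions:
        # Action/metadata line, then the document itself on the next line.
        meta = {"_index": action["_index"], "_id": action["_id"]}
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(action["_source"]))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

Because `bulk()` accepts any iterable of actions, you could also pass it a generator instead of building the whole `bulk_data` list in memory.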

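Newly indexed documents become visible to search only after the index is refreshed, which OpenSearch does periodically. Refresh explicitly, then count the documents:

```python
client.indices.refresh(index="movies")
client.cat.count(index="movies", format="json")
```

```
[{'epoch': '1661959644', 'timestamp': '15:27:24', 'count': '5000'}]
```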
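
The `cat` APIs return values as strings even with `format="json"`, as the `'5000'` above shows, so convert them before comparing. The following is a small sketch; `doc_count` is a hypothetical helper, and the sample is the output shown above.

```python
def doc_count(cat_count_response: list) -> int:
    """Sum the string `count` values returned by `client.cat.count(..., format="json")`."""
    return sum(int(entry["count"]) for entry in cat_count_response)


sample = [{"epoch": "1661959644", "timestamp": "15:27:24", "count": "5000"}]
print(doc_count(sample))  # 5000
```

Alternatively, `client.count(index="movies")` returns a dict whose `count` value is already an integer.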

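## 6️⃣ Search your data

Find movies with Jack Nicholson in the cast that were not directed by Tim Burton:

```python
resp = client.search(
    index="movies",
    body={
        "query": {
            "bool": {
                # Scored clause: the cast must contain this exact phrase.
                "must": {"match_phrase": {"cast": "jack nicholson"}},
                # Non-scoring clause: exclude movies directed by Tim Burton.
                "filter": {"bool": {"must_not": {"match_phrase": {"director": "tim burton"}}}},
            },
        },
    },
)
resp
```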

```
{'took': 4,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 7, 'relation': 'eq'},
  'max_score': 10.95239,
  'hits': [{'_index': 'movies',
    '_id': '8812',
    '_score': 10.95239,
    '_source': {'title': 'The King of Marvin Gardens',
     'ethnicity': 'American',
     'director': 'Bob Rafelson',
     'cast': 'Jack Nicholson, Ellen Burstyn, Bruce Dern',
     'genre': 'drama',
     'plot': "David and Jason are estranged brothers, the former a depressive living with his grandfather in Philadelphia where he runs a late-night radio talk show and the latter an extrovert con man working for gang boss Lewis in Atlantic City, where he lives with the manic-depressive Sally, former beauty queen and prostitute, and her stepdaughter Jessica. Begging David to come to Atlantic City and bail him out of jail, Jason once freed persuades him to stay on in his hotel suite with the two women.\r\nTensions grow between the four as Jas
```
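
The query combines a scored `must` clause with a non-scoring `filter`. Because the filter wraps only a `must_not`, the same hits can also be expressed with a top-level `must_not`, since `must_not` clauses run in filter context and do not affect scores. The following sketch builds that equivalent body and pulls the titles out of a response; `cast_query` and `hit_titles` are hypothetical helpers.

```python
def cast_query(actor: str, excluded_director: str) -> dict:
    """Match an exact phrase in `cast` and exclude one director (no effect on scores)."""
    return {
        "query": {
            "bool": {
                "must": {"match_phrase": {"cast": actor}},
                "must_not": {"match_phrase": {"director": excluded_director}},
            }
        }
    }


def hit_titles(response: dict) -> list:
    """Return the `title` of every hit in a search response, in rank order."""
    return [hit["_source"]["title"] for hit in response["hits"]["hits"]]
```

Usage against the cluster would look like `hit_titles(client.search(index="movies", body=cast_query("jack nicholson", "tim burton")))`.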

## 7️⃣ Delete documents from the index

Delete a single document by its ID:

```python
client.delete(index="movies", id="9140")
```

```
{'_index': 'movies',
 '_id': '9140',
 '_version': 3,
 'result': 'deleted',
 '_shards': {'total': 2, 'successful': 1, 'failed': 0},
 '_seq_no': 10000,
 '_primary_term': 1}
```
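
`client.delete()` removes one document per request. To remove many documents by ID, the `bulk()` helper also accepts actions with `"_op_type": "delete"`. The following sketch builds such actions; `delete_actions` is a hypothetical helper.

```python
def delete_actions(index: str, ids):
    """Yield bulk delete actions, one per document ID."""
    for doc_id in ids:
        yield {"_op_type": "delete", "_index": index, "_id": doc_id}
```

You could then call `bulk(client, delete_actions("movies", ids_to_remove))`, where `ids_to_remove` is your own list of IDs. Note that by default the helper raises an error if any ID does not exist.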

## 8️⃣ Delete an index

Deleting an index removes it together with all of its documents:

```python
client.indices.delete(index="movies")
```

```
{'acknowledged': True}
```