# OpenSearch + Python 101

## 0️⃣ Prerequisites

Before you begin, make sure the following are in place:
1. Install Docker.
2. Download the data: the `wiki_movie_plots_deduped.csv` file that is read later in this guide. Place it in your working directory.
3. Create a virtual environment and install the required packages. You can create one with venv by running these commands in the terminal:
```shell
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install pandas==1.4.3 notebook==6.3.0 opensearch-py==2.0.0
```

## 1️⃣ Run a Local OpenSearch Cluster

Using Docker is the simplest method for running OpenSearch locally. Run the following command in a terminal to launch a single-node cluster:

```shell
docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" opensearchproject/opensearch:2.2.0
```


## 2️⃣ Connect to Your Cluster

In [None]:
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts = [{"host": "localhost", "port": 9200}],
    http_auth = ("admin", "admin"),
    use_ssl = True,
    verify_certs = False,
    ssl_assert_hostname = False,
    ssl_show_warn = False,
)

In [None]:
client.info()
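
The container can take a little while to start accepting requests, so `client.info()` may fail if it runs too early. A minimal sketch of a readiness check, assuming the `client` created above: it polls `client.ping()`, which returns `True` once the cluster answers. The helper name `wait_for_cluster` and its `retries` and `delay` parameters are illustrative, not part of opensearch-py.

```python
import time


def wait_for_cluster(client, retries=30, delay=2.0):
    """Poll client.ping() until it returns True or the retries run out.

    `retries` and `delay` are hypothetical defaults; tune them for your machine.
    """
    for _ in range(retries):
        if client.ping():
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_cluster(client)` before `client.info()` would then return `False` instead of raising if the cluster never comes up.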

## 3️⃣ Read the data

In [None]:
import pandas as pd

df = (
    pd.read_csv("wiki_movie_plots_deduped.csv")
    .dropna()
    .sample(5000, random_state=42)
    .reset_index(drop=True)
)

## 4️⃣ Create an index

In [None]:
body = {
    "mappings":{
        "properties": {
            "title": {"type": "text", "analyzer": "english"},
            "ethnicity": {"type": "text", "analyzer": "standard"},
            "director": {"type": "text", "analyzer": "standard"},
            "cast": {"type": "text", "analyzer": "standard"},
            "genre": {"type": "text", "analyzer": "standard"},
            "plot": {"type": "text", "analyzer": "english"},
            "year": {"type": "integer"},
            "wiki_page": {"type": "keyword"}
        }
    }
}
response = client.indices.create(index="movies", body=body)
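
Running the create call a second time fails because the index already exists. One way to make the cell safe to re-run is to check first with `client.indices.exists()`. This is a sketch under the assumption that skipping creation is acceptable when the index is already there; the helper name `ensure_index` is hypothetical.

```python
def ensure_index(client, name, body):
    """Create the index only if it does not exist yet.

    Returns True when the index was created, False when it was already present.
    """
    if client.indices.exists(index=name):
        return False
    client.indices.create(index=name, body=body)
    return True
```

With the mapping defined above, `ensure_index(client, "movies", body)` could replace the direct `create` call. Note that it does not update the mapping of an existing index.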

## 5️⃣ Add data to your index

### Using `client.index()`

In [None]:
# Index the documents one at a time: one HTTP request per row.
for i, row in df.iterrows():
    body = {
        "title": row["Title"],
        "ethnicity": row["Origin/Ethnicity"],
        "director": row["Director"],
        "cast": row["Cast"],
        "genre": row["Genre"],
        "plot": row["Plot"],
        "year": row["Release Year"],
        "wiki_page": row["Wiki Page"],
    }
    # Use the DataFrame row index as the document ID.
    client.index(index="movies", id=i, body=body)

### Using `bulk()`

In [None]:
from opensearchpy.helpers import bulk

# Build one action per row, then send them all through the bulk helper.
bulk_data = []
for i, row in df.iterrows():
    bulk_data.append(
        {
            "_index": "movies",
            "_id": i,
            "_source": {
                "title": row["Title"],
                "ethnicity": row["Origin/Ethnicity"],
                "director": row["Director"],
                "cast": row["Cast"],
                "genre": row["Genre"],
                "plot": row["Plot"],
                "year": row["Release Year"],
                "wiki_page": row["Wiki Page"],
            }
        }
    )
bulk(client, bulk_data)
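
The list above holds every action in memory before sending. Since `bulk()` accepts any iterable of actions, a generator can produce them lazily instead. The sketch below assumes the same column names as the DataFrame used here; `FIELD_MAP` and `generate_actions` are illustrative names, not part of opensearch-py.

```python
# Maps index field names to the CSV column names used in this guide.
FIELD_MAP = {
    "title": "Title",
    "ethnicity": "Origin/Ethnicity",
    "director": "Director",
    "cast": "Cast",
    "genre": "Genre",
    "plot": "Plot",
    "year": "Release Year",
    "wiki_page": "Wiki Page",
}


def generate_actions(rows, index_name="movies"):
    """Yield one bulk action per (doc_id, row) pair.

    `rows` can be any iterable of pairs, for example df.iterrows().
    """
    for doc_id, row in rows:
        yield {
            "_index": index_name,
            "_id": doc_id,
            "_source": {field: row[column] for field, column in FIELD_MAP.items()},
        }
```

It could then be used as `bulk(client, generate_actions(df.iterrows()))`.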

In [None]:
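# Refresh so the newly indexed documents are visible to search and count.
client.indices.refresh(index="movies")
# Count the documents in the index.
client.cat.count(index="movies", format="json")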

## 6️⃣ Search your data


In [None]:
resp = client.search(
    index="movies",
    body={
        "query": {
            "bool": {
                "must": {
                    "match_phrase": {
                        "cast": "jack nicholson",
                    }
                },
                "filter": {
                    "bool": {
                        "must_not": {"match_phrase": {"director": "tim burton"}}
                    }
                },
            },
        },
    }
)
resp
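
The raw response is a nested dictionary: the matching documents sit under `resp["hits"]["hits"]`, each with a relevance `_score` and the stored document under `_source`. A minimal sketch for flattening that into something easier to read follows; the helper name `summarize_hits` and its default `fields` are assumptions chosen to match the mapping above.

```python
def summarize_hits(resp, fields=("title", "year", "director")):
    """Return one small dict per hit with its score and the selected fields."""
    rows = []
    for hit in resp["hits"]["hits"]:
        source = hit["_source"]
        row = {"score": hit["_score"]}
        # Missing fields come back as None instead of raising KeyError.
        row.update({field: source.get(field) for field in fields})
        rows.append(row)
    return rows
```

Passing the result to `pd.DataFrame(summarize_hits(resp))` gives a compact table of the matches.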

## 7️⃣ Delete documents from the index

In [None]:
client.delete(index="movies", id="2500")

## 8️⃣ Delete an index

In [None]:
client.indices.delete(index="movies")