# Document

> `Document` is a class in `elasticsearch_dsl` that represents a document in Elasticsearch. A subclass declares the document's fields, their types, and related attributes.

In [None]:
from elasticsearch_dsl import (
    connections, Index, Document, Text, Integer, Boolean, Keyword)

In [None]:
# Connect to Elasticsearch
connection = connections.create_connection(hosts=["localhost"])

# Creating a `Document`

The following example shows how to define an Elasticsearch document with the `Document` class:

In [None]:
# Define the index and its mapping
class Books(Document):
    title = Text()
    description = Text()
    category = Keyword()
    price = Integer()
    in_stock = Boolean()

    class Index:
        name = "books"

In [None]:
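# Delete the index if it already exists
if Index("books").exists():
    Index("books").delete()

# Recreate the index so that the mapping declared in `Books` is applied
# before any document is saved (otherwise Elasticsearch infers a dynamic mapping)
Books.init()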

In this example we define a `Document` class named `Books` with five fields: `title`, `description`, `category`, `price`, and `in_stock`. Their types are:

1. `Text`: `title`, `description`
1. `Integer`: `price`
1. `Keyword`: `category`
1. `Boolean`: `in_stock`
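As a reference, the field declarations above correspond to an index mapping of roughly the following shape. This is a hand-written sketch using the standard Elasticsearch type names, not output captured from a cluster; the variable names are only illustrative.

```python
# Hand-written sketch of the mapping implied by the Books class.
expected_mapping = {
    "properties": {
        "title": {"type": "text"},
        "description": {"type": "text"},
        "category": {"type": "keyword"},
        "price": {"type": "integer"},
        "in_stock": {"type": "boolean"},
    }
}

# Group the field names by type, mirroring the list above.
fields_by_type = {}
for field_name, spec in expected_mapping["properties"].items():
    fields_by_type.setdefault(spec["type"], []).append(field_name)

print(fields_by_type)
```

`text` fields are analyzed for full-text search, while `keyword` fields are stored as a single exact value, which is why `category` suits filtering and aggregation.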

# Loading data to build documents

In [None]:
import pandas as pd

df_books = pd.read_csv('./data/books.csv', encoding='gbk')
df_books

## Building a document

Instantiate `Books` and set the values of its five fields: `title`, `description`, `category`, `price`, and `in_stock`.

In [None]:
data = df_books.iloc[0, :]

book = Books(
    title=data.title, description=data.description, 
    category=data.category, price=data.price, in_stock=data.in_stock)

## Saving documents in a batch

Calling `.save()` writes a document to Elasticsearch. The `for` loop below saves every row of `df_books` to the `books` index, one document per request.

In [None]:
for row_no, row in df_books.iterrows():
    book = Books(
        title=row.title, description=row.description, 
        category=row.category, price=row.price, in_stock=row.in_stock)
    book.save()
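The loop above sends one request per row. For larger datasets, a common alternative is the `bulk` helper from `elasticsearch.helpers`, which sends many documents in one request. The sketch below is a minimal illustration under stated assumptions: `book_actions`, `bulk_index`, and `sample_rows` are hypothetical names, the rows are plain dicts (for example from `df_books.to_dict("records")`), and the `books` index is assumed to exist with the desired mapping, since this path does not go through the `Books` class.

```python
def book_actions(rows, index_name="books"):
    """Yield one bulk action per row; each row is a dict of field values."""
    fields = ("title", "description", "category", "price", "in_stock")
    for row in rows:
        yield {
            "_index": index_name,
            "_source": {name: row[name] for name in fields},
        }


def bulk_index(client, rows):
    """Send all rows through the bulk helper (needs a reachable cluster)."""
    # Imported lazily so the action builder can be tried without elasticsearch-py.
    from elasticsearch.helpers import bulk
    return bulk(client, book_actions(rows))


# Hypothetical sample rows standing in for the CSV data.
sample_rows = [
    {"title": "Sample A", "description": "First sample", "category": "demo",
     "price": 10, "in_stock": True},
    {"title": "Sample B", "description": "Second sample", "category": "demo",
     "price": 20, "in_stock": False},
]

actions = list(book_actions(sample_rows))
print(actions[0])
```

With a live connection, the call would look like `bulk_index(connection, df_books.to_dict("records"))`.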

# Supplement

## Setting an analyzer when defining a `Document`

In [1]:
from elasticsearch_dsl import connections, Index, Text, analyzer, tokenizer, Document

# Connect to Elasticsearch
connection = connections.create_connection(hosts=["localhost"])

In [8]:
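# Define the tokenizer and the analyzer that uses it
# (`ik_smart` is provided by the IK analysis plugin, which must be installed on the cluster)
tokenizer_ik = tokenizer('ik_smart')
analyzer_ik = analyzer('ik_smart', tokenizer=tokenizer_ik)

# Define the index
index_name = 'ik_index'
ik_index = Index(index_name)

# Delete the index if it already exists
if ik_index.exists():
    ik_index.delete()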

In [9]:
# Define the mapping
@ik_index.document
class MyDocument(Document):
    title = Text(analyzer=analyzer_ik, search_analyzer=analyzer_ik)
    content = Text(analyzer=analyzer_ik, search_analyzer=analyzer_ik)

    class Index:
        name = "ik_index"
        settings = {
            "analysis": {
                "analyzer": {
                    "ik_smart": {
                        "tokenizer": "ik_smart"
                    }
                },
                "search_analyzer": {
                    "ik_smart": {
                        "tokenizer": "ik_smart"
                    }
                },
            }
        }

# Create the index together with the mapping registered on it
ik_index.create()

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'ik_index'}

In [4]:
# Inspect the mapping
ik_index.get_mapping()

{'ik_index': {'mappings': {'properties': {'content': {'type': 'text',
     'analyzer': 'ik_smart'},
    'title': {'type': 'text', 'analyzer': 'ik_smart'}}}}}

In [5]:
# Inspect the settings
ik_index.get_settings()

{'ik_index': {'settings': {'index': {'routing': {'allocation': {'include': {'_tier_preference': 'data_content'}}},
    'number_of_shards': '1',
    'provided_name': 'ik_index',
    'creation_date': '1694076692690',
    'analysis': {'analyzer': {'ik_smart': {'type': 'custom',
       'tokenizer': 'ik_smart'}}},
    'number_of_replicas': '1',
    'uuid': 'e78ga8soQB2XlfZrIOG_QQ',
    'version': {'created': '8070099'}}}}}
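To check how the configured analyzer splits a piece of text, the index's analyze endpoint can be queried. The helper below is a minimal sketch: `tokens_for` is a hypothetical name, and it assumes the index object exposes `analyze(body=...)` the way `elasticsearch_dsl.Index` does. The stub class exists only so the helper can be tried offline; it splits on whitespace and does not reflect how `ik_smart` tokenizes.

```python
def tokens_for(index, text, analyzer_name="ik_smart"):
    """Return the token strings the named analyzer produces for `text`."""
    response = index.analyze(body={"analyzer": analyzer_name, "text": text})
    return [item["token"] for item in response["tokens"]]


class StubIndex:
    """Offline stand-in for an index; splits on whitespace only."""

    def analyze(self, body):
        return {"tokens": [{"token": word} for word in body["text"].split()]}


print(tokens_for(StubIndex(), "hello world"))
```

Against the real cluster, the call would be `tokens_for(ik_index, "...")` with a Chinese sentence as the text.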

# Viewing the data in Kibana

## Index Management

**The `books` index is listed under Index Management**

<center>
    <img src='./img/doc_01.png'>
</center>

## Creating a data view

**Create a data view for the index under Data Views**

<center>
    <img src='./img/doc_02.png'>
</center>

## Discover

**In Discover, switch to `books` to browse and search the data**

<center>
    <img src='./img/doc_03.png'>
</center>

------