# LlamaIndex Doc-Metadata: LlamaIndex Intro Tutorial
Alejandro Ricciardi (Omegapy)  
created date: 12/26/2023 
GitHub: https://github.com/Omegapy

Project Description:
Load data and create Document objects in LlamaIndex (only markdown files for now), then read their metadata.

Initialization
- Markdown docs reader: see llama_docs_bot/markdown_docs_reader.py (it extracts text from markdown files into Document objects)
- A function that loads markdown docs from a directory
- Load project documents from each folder

Read Metadata
- Use MetadataMode to make printed documents more readable
- Improve the metadata print formatting

Metadata Printing Advanced Customization
- Hide the File Name from the LLM
- Hide the File Name from the Embedding Model 

credit: LlamaIndex https://www.youtube.com/watch?v=nGNoacku0YY

## Initialization

### Markdown Docs Reader (`MarkdownDocsReader` class)
LlamaIndex provides a custom markdown loader, `MarkdownDocsReader`, in the `llama_docs_bot` source code (`llama_docs_bot/markdown_docs_reader.py`). Let's test it out to see how it works!

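```python
# Custom markdown reader from the llama_docs_bot project
from llama_docs_bot.markdown_docs_reader import MarkdownDocsReader
# Generic directory loader (import path used by llama_index versions before 0.10)
from llama_index import SimpleDirectoryReader
```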
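The outputs later in this tutorial suggest that `MarkdownDocsReader` splits one markdown file into several Documents, each tagged with a `Header Path` such as `Data Agents/Concept/Tool Abstractions`. Below is a simplified pure-Python sketch of that header-path idea. It is not the reader's actual implementation. `split_by_headers` is a hypothetical function, and the real reader also records a content type and links and may split text differently.

```python
import re
from typing import List, Tuple

# A markdown ATX header: 1-6 '#' characters, whitespace, then the title.
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headers(markdown: str) -> List[Tuple[str, str]]:
    """Split markdown into (header_path, section_text) pairs.

    Hypothetical illustration only. Limitation: a '#' line inside a fenced
    code block would be treated as a header here.
    """
    stack: List[Tuple[int, str]] = []  # (level, title) of the open headers
    sections: List[Tuple[str, str]] = []
    buffer: List[str] = []

    def flush() -> None:
        # Emit the text collected under the current header path, if any.
        text = "\n".join(buffer).strip()
        if text:
            path = "/".join(title for _, title in stack)
            sections.append((path, text))
        buffer.clear()

    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()  # close the section belonging to the previous header
            level = len(match.group(1))
            # Headers at the same or a deeper level are no longer ancestors.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, match.group(2).strip()))
        else:
            buffer.append(line)
    flush()
    return sections


example = "# Data Agents\nIntro text.\n## Concept\n### Tool Abstractions\nYou can learn more."
for path, text in split_by_headers(example):
    print(f"{path!r}: {text!r}")
# prints:
# 'Data Agents': 'Intro text.'
# 'Data Agents/Concept/Tool Abstractions': 'You can learn more.'
```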

#### Load markdown docs from a directory, excluding other common file types

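```python
def load_markdown_docs(filepath):
    """Load every markdown file under `filepath` (recursively) as Document objects."""
    loader = SimpleDirectoryReader(
        input_dir=filepath,
        # Skip the other file types found in the docs folders
        exclude=["*.rst", "*.ipynb", "*.py", "*.bat", "*.txt", "*.png", "*.jpg", "*.jpeg", "*.csv", "*.html", "*.js", "*.css", "*.pdf", "*.json"],
        # Parse .md files with the custom markdown reader
        file_extractor={".md": MarkdownDocsReader()},
        # Include subfolders
        recursive=True,
    )

    return loader.load_data()
```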
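`exclude` works as a deny-list, so a file type that is not listed (for example `.yaml` or a `Makefile`) would still be picked up by the loader. The sketch below compares a deny-list with an allow-list on plain file names. It uses Python's `fnmatch` as a stand-in and does not reproduce how `SimpleDirectoryReader` matches patterns internally. Both function names are hypothetical.

```python
from fnmatch import fnmatch
from typing import Iterable, List

# Deny-list copied from load_markdown_docs above.
EXCLUDE_PATTERNS: List[str] = [
    "*.rst", "*.ipynb", "*.py", "*.bat", "*.txt", "*.png", "*.jpg",
    "*.jpeg", "*.csv", "*.html", "*.js", "*.css", "*.pdf", "*.json",
]


def kept_by_denylist(names: Iterable[str], patterns: List[str]) -> List[str]:
    """Keep every file name that matches none of the exclude patterns."""
    return [name for name in names if not any(fnmatch(name, p) for p in patterns)]


def kept_by_allowlist(names: Iterable[str], extensions: List[str]) -> List[str]:
    """Keep only file names that end with one of the allowed extensions."""
    return [name for name in names if name.lower().endswith(tuple(extensions))]


names = ["root.md", "modules.md", "conf.py", "logo.png", "config.yaml", "Makefile"]
print(kept_by_denylist(names, EXCLUDE_PATTERNS))  # ['root.md', 'modules.md', 'config.yaml', 'Makefile']
print(kept_by_allowlist(names, [".md"]))          # ['root.md', 'modules.md']
```

If an allow-list suits your case better, recent versions of `SimpleDirectoryReader` also accept a `required_exts` argument (for example `required_exts=[".md"]`). Check the signature in the version you have installed.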

### Load project documents from each folder
Each folder is loaded separately for now, so that separate indexes can be created later.

```python
getting_started_docs = load_markdown_docs("docs/getting_started")
community_docs = load_markdown_docs("docs/community")
data_docs = load_markdown_docs("docs/core_modules/data_modules")
agent_docs = load_markdown_docs("docs/core_modules/agent_modules")
model_docs = load_markdown_docs("docs/core_modules/model_modules")
query_docs = load_markdown_docs("docs/core_modules/query_modules")
supporting_docs = load_markdown_docs("docs/core_modules/supporting_modules")
tutorials_docs = load_markdown_docs("docs/end_to_end_tutorials")
contributing_docs = load_markdown_docs("docs/development")
```
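The nine calls above all follow the same pattern. A dictionary keyed by section name is one way to keep the sections separate (for the separate indexes mentioned above) without repeating the call. `DOC_FOLDERS` and `load_sections` are hypothetical names, and the section keys are illustrative labels. A stand-in loader keeps the example runnable without any docs on disk.

```python
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")

# Section name -> folder, with the folders copied from the cell above.
DOC_FOLDERS: Dict[str, str] = {
    "getting_started": "docs/getting_started",
    "community": "docs/community",
    "data": "docs/core_modules/data_modules",
    "agent": "docs/core_modules/agent_modules",
    "model": "docs/core_modules/model_modules",
    "query": "docs/core_modules/query_modules",
    "supporting": "docs/core_modules/supporting_modules",
    "tutorials": "docs/end_to_end_tutorials",
    "contributing": "docs/development",
}


def load_sections(loader: Callable[[str], List[T]], folders: Dict[str, str]) -> Dict[str, List[T]]:
    """Call `loader` once per folder and key the resulting lists by section name."""
    return {name: loader(path) for name, path in folders.items()}


# With the tutorial's loader this would be:
#   docs_by_section = load_sections(load_markdown_docs, DOC_FOLDERS)
#   agent_docs = docs_by_section["agent"]
# Stand-in loader so this example runs on its own:
fake = load_sections(lambda path: [path.upper()], DOC_FOLDERS)
print(fake["agent"])  # ['DOCS/CORE_MODULES/AGENT_MODULES']
```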

## Read Metadata

### Use MetadataMode to make printed documents more readable

```python
# Not formatted with MetadataMode
print(agent_docs[5])
```

```
Doc ID: f527eab6-b9bb-4b45-b7d9-ddc48b1b1ce3
Text: You can learn more about our Tool abstractions in our Tools
section.
```


```python
from llama_index.schema import MetadataMode

# Formatted with MetadataMode
print(agent_docs[5].get_content(metadata_mode=MetadataMode.ALL))
```

```
File Name: docs\core_modules\agent_modules\agents\root.md
Content Type: text
Header Path: Data Agents/Concept/Tool Abstractions
Links: 
file_path: docs\core_modules\agent_modules\agents\root.md
file_name: root.md
file_type: None
file_size: 2409
creation_date: 2023-12-26
last_modified_date: 2023-12-23
last_accessed_date: 2023-12-26

You can learn more about our Tool abstractions in our Tools section.
```
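Printing a Document directly shows only its ID and a wrapped preview of the text. `get_content(metadata_mode=...)` also renders the metadata. The sketch below is a simplified pure-Python model of how the four `MetadataMode` values choose which metadata to include. It is inferred from the behavior shown in this tutorial, not copied from LlamaIndex. `Mode` and `render_content` are hypothetical stand-ins, and LlamaIndex's real default templates may differ in whitespace.

```python
from enum import Enum
from typing import Iterable, Mapping


class Mode(str, Enum):
    """Stand-in with the same four member names as llama_index's MetadataMode."""
    ALL = "all"
    EMBED = "embed"
    LLM = "llm"
    NONE = "none"


def render_content(text: str, metadata: Mapping[str, object], mode: Mode,
                   excluded_llm: Iterable[str] = (),
                   excluded_embed: Iterable[str] = ()) -> str:
    """Simplified model of get_content(metadata_mode=...) with default templates."""
    if mode is Mode.NONE:
        return text  # NONE: text only, no metadata
    if mode is Mode.LLM:
        hidden = set(excluded_llm)  # LLM: drop the keys in the LLM exclusion list
    elif mode is Mode.EMBED:
        hidden = set(excluded_embed)  # EMBED: drop the keys in the embedding exclusion list
    else:
        hidden = set()  # ALL: every key, ignoring both exclusion lists
    metadata_str = "\n".join(
        f"{key}: {value}" for key, value in metadata.items() if key not in hidden
    )
    return f"{metadata_str}\n\n{text}" if metadata_str else text


meta = {"file_name": "root.md", "file_size": 2409}
print(render_content("Body.", meta, Mode.LLM, excluded_llm=["file_name"]))
# file_size: 2409
#
# Body.
```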


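### Improve the metadata print formatting

```python
# Outer template: where the metadata string and the text are placed
text_template = "Content Metadata:\n{metadata_str}\n\nContent:\n{content}"

# How each key/value pair is rendered, and what goes between pairs
metadata_template = "{key}: {value},"
metadata_seperator = " "  # LlamaIndex spells this attribute "seperator"

# Apply the templates to every agent doc
for doc in agent_docs:
    doc.text_template = text_template
    doc.metadata_template = metadata_template
    doc.metadata_seperator = metadata_seperator
```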

```python
print(agent_docs[0].get_content(metadata_mode=MetadataMode.ALL))
```

```
Content Metadata:
File Name: docs\core_modules\agent_modules\agents\modules.md, Content Type: text, Header Path: Module Guides, Links: , file_path: docs\core_modules\agent_modules\agents\modules.md, file_name: modules.md, file_type: None, file_size: 646, creation_date: 2023-12-26, last_modified_date: 2023-12-23, last_accessed_date: 2023-12-26,

Content:
These guide provide an overview of how to use our agent classes.

For more detailed guides on how to use specific tools, check out our tools module guides.
```
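The three attributes nest. `metadata_template` renders each key/value pair, `metadata_seperator` joins the rendered pairs into `{metadata_str}`, and `text_template` places that string alongside the text. Below is a hypothetical pure-Python helper that reproduces this composition. It is a sketch of the idea, not LlamaIndex's code.

```python
from typing import Mapping


def render_with_templates(text: str, metadata: Mapping[str, object],
                          text_template: str, metadata_template: str,
                          metadata_seperator: str) -> str:
    """Hypothetical helper showing how the three template attributes combine."""
    # 1. Render each key/value pair with metadata_template.
    pairs = [metadata_template.format(key=key, value=value) for key, value in metadata.items()]
    # 2. Join the rendered pairs with the separator.
    metadata_str = metadata_seperator.join(pairs)
    # 3. Insert the metadata string and the text into text_template.
    return text_template.format(metadata_str=metadata_str, content=text)


TEXT_TEMPLATE = "Content Metadata:\n{metadata_str}\n\nContent:\n{content}"
meta = {"file_name": "modules.md", "file_size": 646}

# Tutorial settings: the comma lives in the template, so the last pair keeps it.
print(render_with_templates("Body.", meta, TEXT_TEMPLATE, "{key}: {value},", " "))
# Alternative: put the comma in the separator instead.
print(render_with_templates("Body.", meta, TEXT_TEMPLATE, "{key}: {value}", ", "))
```

Moving the comma from `metadata_template` into the separator (`", "`) is one way to drop the trailing comma seen in the output above.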


## Metadata Printing Advanced Customization

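### Hide the File Name from the LLM

`excluded_llm_metadata_keys` lists the metadata keys to leave out when the content is rendered with `MetadataMode.LLM`. Keys are matched exactly. Excluding `file_name` therefore removes only the `file_name: modules.md` entry, and the separate `File Name` key (which holds the full path) still appears in the output below. Add `"File Name"` to the list as well to hide it completely.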

```python
agent_docs[0].excluded_llm_metadata_keys = ["file_name"]
print(agent_docs[0].get_content(metadata_mode=MetadataMode.LLM))
```

```
Content Metadata:
File Name: docs\core_modules\agent_modules\agents\modules.md, Content Type: text, Header Path: Module Guides, Links: , file_path: docs\core_modules\agent_modules\agents\modules.md, file_type: None, file_size: 646, creation_date: 2023-12-26, last_modified_date: 2023-12-23, last_accessed_date: 2023-12-26,

Content:
These guide provide an overview of how to use our agent classes.

For more detailed guides on how to use specific tools, check out our tools module guides.
```
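Exclusion works on exact key names, which is why `File Name` still appears in the output above. The tiny pure-Python sketch below shows that matching rule, using the keys printed in this tutorial. `visible_keys` is a hypothetical helper that mimics the filtering step, not LlamaIndex code.

```python
from typing import Iterable, List

# Metadata keys as printed above. The title-case keys appear to come from
# MarkdownDocsReader and the lowercase ones from SimpleDirectoryReader.
DOC_KEYS: List[str] = [
    "File Name", "Content Type", "Header Path", "Links",
    "file_path", "file_name", "file_type", "file_size",
    "creation_date", "last_modified_date", "last_accessed_date",
]


def visible_keys(keys: Iterable[str], excluded: Iterable[str]) -> List[str]:
    """Return the keys not listed in `excluded`; matching is exact and case-sensitive."""
    hidden = set(excluded)
    return [key for key in keys if key not in hidden]


print("File Name" in visible_keys(DOC_KEYS, ["file_name"]))               # True
print("File Name" in visible_keys(DOC_KEYS, ["file_name", "File Name"]))  # False
```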


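### Hide the File Name from the embedding model

`excluded_embed_metadata_keys` works the same way for `MetadataMode.EMBED`, which controls the text passed to the embedding model. The LLM and embedding exclusion lists are independent, so a key can be hidden from one model and still shown to the other. Only the `file_name` entry disappears from the output below. `File Name` is a different key and stays.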

```python
agent_docs[0].excluded_embed_metadata_keys = ["file_name"]
print(agent_docs[0].get_content(metadata_mode=MetadataMode.EMBED))
```

```
Content Metadata:
File Name: docs\core_modules\agent_modules\agents\modules.md, Content Type: text, Header Path: Module Guides, Links: , file_path: docs\core_modules\agent_modules\agents\modules.md, file_type: None, file_size: 646, creation_date: 2023-12-26, last_modified_date: 2023-12-23, last_accessed_date: 2023-12-26,

Content:
These guide provide an overview of how to use our agent classes.

For more detailed guides on how to use specific tools, check out our tools module guides.
```
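The cells above set the exclusion lists on a single document by assignment, which replaces any list that was already there. Below is a minimal sketch of a hypothetical helper that instead extends both lists on every document in a collection. It assumes the attributes are ordinary Python lists, as these assignments suggest. `SimpleNamespace` objects stand in for Documents so the example runs without LlamaIndex.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def hide_metadata_keys(docs: Iterable[Any], keys: List[str],
                       llm: bool = True, embed: bool = True) -> None:
    """Hypothetical helper: add `keys` to each doc's exclusion lists in place.

    Existing entries are kept, and keys that are already excluded are not
    added twice, so calling it repeatedly is harmless.
    """
    for doc in docs:
        targets = []
        if llm:
            targets.append(doc.excluded_llm_metadata_keys)
        if embed:
            targets.append(doc.excluded_embed_metadata_keys)
        for excluded in targets:
            for key in keys:
                if key not in excluded:
                    excluded.append(key)


# Stand-in documents; with LlamaIndex you would pass e.g. agent_docs instead.
docs = [
    SimpleNamespace(excluded_llm_metadata_keys=["file_name"], excluded_embed_metadata_keys=[]),
    SimpleNamespace(excluded_llm_metadata_keys=[], excluded_embed_metadata_keys=[]),
]
hide_metadata_keys(docs, ["File Name", "file_name"])
print(docs[0].excluded_llm_metadata_keys)    # ['file_name', 'File Name']
print(docs[1].excluded_embed_metadata_keys)  # ['File Name', 'file_name']
```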
