- [Get Started](https://python.langchain.com/docs/expression_language/get_started)
- [Retrieval](https://python.langchain.com/docs/expression_language/cookbook/retrieval)

### GitHub Token from File 
*⚠️ Add `TOKEN.txt` to `.gitignore` so the token is never committed.*

Create a GitHub token using [this link](https://github.com/settings/tokens/new) and grant it the following scopes:
* `read:packages`
* `read:org`
* `read:discussion`
* `read:project`

In [1]:
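# Read the token and strip surrounding whitespace; a trailing newline would corrupt the Authorization header
with open('TOKEN.txt', 'r') as token_file:
    access_token = token_file.read().strip()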

## Import Markdowns

In [2]:
org_name = 'dev-launchers'

import requests

def get_organization_repositories(org_name, access_token):
    headers = {
        'Authorization': f'token {access_token}',
        'Accept': 'application/vnd.github.v3+json'
    }

    params = {
        'sort': 'updated',
        'direction': 'desc',
        'per_page': 100
    }

    # Retrieve the list of repositories for the organization
    response = requests.get(f'https://api.github.com/orgs/{org_name}/repos', headers=headers, params=params)

    if response.status_code == 200:
        repositories = response.json()
        return repositories
    else:
        print(f'Error {response.status_code}: Unable to retrieve organization repositories.')
        return None

repositories = get_organization_repositories(org_name, access_token)

repo_readme_list = []

if repositories:
    for repo in repositories:
        # Build the raw README URL (assumes the default branch is named "main")
        repo_readme = repo['html_url'].replace("https://github.com", "https://raw.githubusercontent.com") + "/main/README.md"
        # Collect the URL for the download step
        repo_readme_list.append(repo_readme)

import os
import requests

def download_files(urls, folder_name="Folder"):
    # Create the folder if it does not exist
    if not os.path.exists(folder_name):
        os.makedirs(folder_name)

    for url in urls:
        # Use the repository name (third-from-last URL segment) as the file name
        file_name = url.split("/")[-3]
        
        # Concatenate the full file path
        file_path = os.path.join(folder_name, file_name)

        # Download the file
        response = requests.get(url)
        if response.status_code == 200:
            # Write the content into the local file
            with open(f"{file_path}.md", 'wb') as f:
                f.write(response.content)
            print(f"The file {file_name} has been successfully downloaded.")
        else:
            print(f"Failed to download the file {file_name}.")


download_files(repo_readme_list)

The file strapi has been successfully downloaded.
The file strapiv4 has been successfully downloaded.
Failed to download the file dev-launchers-platform.
Failed to download the file react-course-finals.
Failed to download the file discord-gateway.
Failed to download the file auth-proxy.
The file onboarding-bot-model has been successfully downloaded.
The file webhook-workers has been successfully downloaded.
Failed to download the file VictorDiniz89.
The file onboarding-bot has been successfully downloaded.
The file platform__dl-edu has been successfully downloaded.
The file minecraft__dev-launchers-library has been successfully downloaded.
The file community-minecraft has been successfully downloaded.
Failed to download the file stories.
Failed to download the file monorepo.
Failed to download the file platform__dl-ideas.
The file project__mhw-guides has been successfully downloaded.
The file devbots__general has been successfully downloaded.
Failed to download the file Dev-Recruiters.
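
Some of the failures above may come from the hard-coded `main` branch in the README URL, although the output does not confirm the cause (a missing README or a private repository would fail the same way). Below is a minimal sketch, assuming each repository object returned by the GitHub REST API carries `full_name` and `default_branch` fields. `build_readme_url` is a hypothetical helper that is not part of the original notebook.

```python
def build_readme_url(repo):
    """Build a raw README URL using the repository's default branch.

    `repo` is assumed to be one item from the GitHub REST API
    repository list; falls back to "main" if the field is missing.
    """
    branch = repo.get("default_branch") or "main"
    return f"https://raw.githubusercontent.com/{repo['full_name']}/{branch}/README.md"


# Hypothetical repository entry, for illustration only
example_repo = {"full_name": "example-org/example-repo", "default_branch": "master"}
print(build_readme_url(example_repo))
```

Because the repository name is still the third-from-last URL segment, `download_files` would keep naming files the same way, provided the branch name contains no slash.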

## File Directory

In [3]:
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import MarkdownHeaderTextSplitter

# Load every Markdown file under the parent directory (this includes the downloaded READMEs)
loader = DirectoryLoader('../', glob="**/*.md", loader_cls=TextLoader, show_progress=True, use_multithreading=True)
documents = loader.load()


# Split each Markdown file on its level 1 to 3 headers
headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
]

markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
docs = []

for document in documents:
    docs.extend(markdown_splitter.split_text(document.page_content))

100%|██████████| 27/27 [00:00<00:00, 2666.75it/s]
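
To make the splitting step concrete, here is a simplified, standalone illustration of header-based splitting. It is not the implementation of `MarkdownHeaderTextSplitter` (for example, it does not treat fenced code blocks specially), and `split_by_headers` is a hypothetical name used only for this sketch.

```python
def split_by_headers(text, headers_to_split_on):
    """Split markdown text into chunks, tracking the active headers.

    Returns a list of (metadata, content) pairs, where metadata maps
    header names (e.g. "Header 1") to the header text in effect.
    """
    # Check longer prefixes first so "##" is not mistaken for "#"
    prefixes = sorted(headers_to_split_on, key=lambda h: len(h[0]), reverse=True)
    chunks, active, buffer = [], {}, []

    def flush():
        # Emit the buffered lines as one chunk, skipping empty ones
        content = "\n".join(buffer).strip()
        if content:
            chunks.append((dict(active), content))
        buffer.clear()

    for line in text.splitlines():
        for prefix, name in prefixes:
            if line.startswith(prefix + " "):
                flush()
                # A new header invalidates headers at the same or deeper level
                for other_prefix, other_name in headers_to_split_on:
                    if len(other_prefix) >= len(prefix):
                        active.pop(other_name, None)
                active[name] = line[len(prefix):].strip()
                break
        else:
            # Not a header line: keep it as chunk content
            buffer.append(line)
    flush()
    return chunks


sample = "# Install\npip install foo\n## Usage\nimport foo\n"
for metadata, content in split_by_headers(sample, [("#", "Header 1"), ("##", "Header 2")]):
    print(metadata, content)
```

Each chunk carries the headers it sits under, which is what later lets a retrieved chunk be traced back to its section.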


## Embeddings

In [4]:
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings()



## Vectorstore

In [11]:
from langchain_community.vectorstores import FAISS

vectorstore = FAISS.from_documents(docs, embeddings)

#retriever = vectorstore.as_retriever()
retriever = vectorstore.as_retriever(search_kwargs={"k": 1}) # Return only the single most similar document
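
The `k` value controls how many nearest neighbours the retriever returns. The sketch below illustrates the idea with plain cosine similarity over toy vectors. It is an assumption-laden illustration, not how FAISS works internally (FAISS uses its own optimized index structures and distance metric); `top_k` and the vectors are made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=1):
    """Return the indices of the k vectors most similar to the query."""
    scored = sorted(
        enumerate(doc_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [index for index, _ in scored[:k]]


# Toy 2-D "embeddings", for illustration only
toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k([0.9, 0.1], toy_docs, k=1))
```

With `k=1` only the closest chunk reaches the prompt, which keeps the prompt short but gives the model no fallback if the best match is off-topic.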

## Model

- https://python.langchain.com/docs/integrations/chat/huggingface
- https://python.langchain.com/docs/integrations/llms/huggingface_pipelines#create-chain

In [1]:
# from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
# from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# model_id = "databricks/dolly-v2-3b"
# tokenizer = AutoTokenizer.from_pretrained(model_id)
# model = AutoModelForCausalLM.from_pretrained(model_id)
# pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=10)
# hf = HuggingFacePipeline(pipeline=pipe)

from langchain_community.llms import HuggingFaceHub
import dotenv

dotenv.load_dotenv()

hf = HuggingFaceHub(repo_id="databricks/dolly-v2-3b", 
                    model_kwargs={"temperature": 0.5, 
                                  "max_new_tokens": 64},
                    )



## Prompt

In [12]:
# https://python.langchain.com/docs/expression_language/get_started

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
output_parser = StrOutputParser()

In [13]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

setup_and_retrieval = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
)
chain = setup_and_retrieval | prompt | hf | output_parser

## Input 

In [14]:
query = "How install huggingface hub"

truc = chain.invoke(query)

In [15]:
truc

"Human: Answer the question based only on the following context:\n[Document(page_content='```shell\\npip install huggingface_hub\\npip install transformers\\n```', metadata={'Header 1': 'Install'})]\n\nQuestion: How install huggingface hub\n\nAnswer: pip install huggingface_hub\n\nDocument 2:\n[Document(page_content='```shell\\npip install huggingface_hub\\npip install transformers\\n```', metadata={'Header 1': 'Install'})]\n\nQuestion: How install huggingface"
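
The output above shows that the retrieved context reaches the prompt as the raw `Document` list representation. A common pattern is to join the `page_content` of each document into plain text first. Below is a minimal sketch: `format_docs` is a hypothetical helper, and `FakeDocument` is a stand-in for LangChain's `Document` so the example runs without extra dependencies.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for a retrieved document (illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join the text of each document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


sample_docs = [
    FakeDocument("pip install huggingface_hub", {"Header 1": "Install"}),
    FakeDocument("pip install transformers", {"Header 1": "Install"}),
]
print(format_docs(sample_docs))
```

In the chain this would likely be wired in as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`, relying on LCEL coercing a plain function into a runnable when it is piped.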