# **Simple Chain**

In [None]:
# Import required modules for using HuggingFace LLMs, environment variables, prompt templates, and output parsers.
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
import os
# Load environment variables (such as API keys) from a .env file.
load_dotenv()

True
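
The `True` above only indicates that `load_dotenv()` loaded something from a `.env` file, not that the key the endpoint needs is present. A minimal pre-flight check might look like the sketch below. The variable names `HUGGINGFACEHUB_API_TOKEN` and `HF_TOKEN` are assumptions about how the `.env` file is laid out, and `find_token` is a hypothetical helper, not part of any library.

```python
import os

# Candidate variable names: assumptions, adjust them to match your own .env file.
TOKEN_VARS = ("HUGGINGFACEHUB_API_TOKEN", "HF_TOKEN")


def find_token(env=None):
    """Return the name of the first candidate variable with a non-empty value, or None."""
    env = os.environ if env is None else env
    for name in TOKEN_VARS:
        if env.get(name, "").strip():
            return name
    return None


found = find_token()
print(f"token variable: {found}" if found else "no Hugging Face token found in the environment")
```

Running this right after `load_dotenv()` surfaces a missing key before the first model call fails.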

In [None]:
# Initialize the HuggingFaceEndpoint with the Meta-Llama model for text generation.

llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    task="text-generation",
)

# Wrap the endpoint in a ChatHuggingFace object for chat-style interaction.
model = ChatHuggingFace(llm=llm)


In [None]:
# Create a prompt template to generate 5 interesting facts about a given topic.

prompt = PromptTemplate(
    template="Generate 5 interesting facts about {topic}",
    input_variables=["topic"],
)
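
To make the role of `input_variables` concrete, here is a standard-library sketch of what filling a template involves: find the placeholders, check that each one has a value, then substitute. It is only an illustration, not how `PromptTemplate` is implemented; `template_variables` and `render` are hypothetical helper names.

```python
from string import Formatter


def template_variables(template):
    """Return the placeholder names that appear in a format-style template."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def render(template, **values):
    """Fill the template, raising a clear error when a placeholder has no value."""
    missing = [name for name in template_variables(template) if name not in values]
    if missing:
        raise KeyError(f"missing template variables: {missing}")
    return template.format(**values)


toy_template = "Generate 5 interesting facts about {topic}"
print(template_variables(toy_template))
print(render(toy_template, topic="Python programming language"))
```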

In [None]:
# Create a string output parser to extract plain text from the model's response.

parser = StrOutputParser()

In [None]:
# Build a chain that sequentially applies the prompt, model, and parser.

chain = prompt | model | parser
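
The `|` operator reads as "feed the output of the left step into the right step". The sketch below reproduces that idea with plain Python so it can run without a model or network access. `Step`, `FakeMessage`, and the `fake_*` objects are hypothetical stand-ins, not LangChain classes, and the real implementation does considerably more.

```python
from dataclasses import dataclass


class Step:
    """Wrap a one-argument function so that steps can be joined with `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b builds a new step that runs a, then feeds its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


@dataclass
class FakeMessage:
    """Stand-in for a chat model reply that carries its text in `content`."""
    content: str


fake_prompt = Step(lambda inputs: f"Generate 5 interesting facts about {inputs['topic']}")
fake_model = Step(lambda text: FakeMessage(content=f"[model reply to: {text}]"))
fake_parser = Step(lambda message: message.content)

toy_chain = fake_prompt | fake_model | fake_parser
print(toy_chain.invoke({"topic": "Python"}))
```

The parser step is what turns the message object back into a plain string, which is why the printed result contains no wrapper.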

In [None]:
# Invoke the chain with the topic "Python programming language" and print the generated facts.

result = chain.invoke({"topic": "Python programming language"})

print(result)

Here are 5 interesting facts about the Python programming language:

**1. Python was named after a British comedy group**
The Python programming language was named after the British comedy group Monty Python's Flying Circus, which was known for its surreal and often absurd sense of humor. Guido van Rossum, the creator of Python, was a fan of the group and chose the name as a nod to their creative and unconventional approach to comedy.

**2. Python was created in just 12 weeks**
Guido van Rossum started working on Python in December 1989 and released the first version, Python 0.9.1, in February 1991. He spent only 12 weeks developing the language, which is incredibly fast considering the complexity of the task. Van Rossum has said that he wanted to create a language that was easy to learn and use, and he achieved this goal with remarkable speed.

**3. Python is a dynamically-typed language**
Unlike languages like Java and C++, which are statically-typed, Python is dynamically-typed. Thi

In [None]:
# Install the grandalf package, which print_ascii() needs to draw the chain graph.
# Uncomment the line below if it is not already installed.
#!pip install grandalf



In [None]:
# Visualize the structure of the simple chain as an ASCII graph.

chain.get_graph().print_ascii()

     +-------------+       
     | PromptInput |       
     +-------------+       
            *              
            *              
            *              
    +----------------+     
    | PromptTemplate |     
    +----------------+     
            *              
            *              
            *              
   +-----------------+     
   | ChatHuggingFace |     
   +-----------------+     
            *              
            *              
            *              
   +-----------------+     
   | StrOutputParser |     
   +-----------------+     
            *              
            *              
            *              
+-----------------------+  
| StrOutputParserOutput |  
+-----------------------+  


# **Sequential Chain**

In [None]:
# Import required modules for sequential chain construction and environment setup.

from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from dotenv import load_dotenv

# Load environment variables (such as API keys) from a .env file.

load_dotenv()

True

In [None]:
# Initialize the HuggingFaceEndpoint with the Meta-Llama model for text generation.

llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    task="text-generation",
)

# Wrap the endpoint in a ChatHuggingFace object for chat-style interaction.
model = ChatHuggingFace(llm=llm)

In [None]:
# Create the first prompt template to generate a detailed report on a given topic.

prompt1 = PromptTemplate(
    template="Generate a detailed report on {topic}",
    input_variables=["topic"],
)


In [None]:
# Create the second prompt template to generate a 5-point summary from a given text.

prompt2 = PromptTemplate(
    template="Generate a 5-point summary from the following text \n {text}",
    input_variables=["text"],
)

In [None]:
# Create a string output parser to extract plain text from the model's response.

parser = StrOutputParser()

In [None]:
# Build a sequential chain that generates a report, then summarizes it in 5 points.

chain = prompt1 | model | parser | prompt2 | model | parser
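
In this chain the plain string produced by the first parser becomes the value of the second prompt's only variable, `text`, and only the final summary is returned to the caller. The sketch below spells out that handoff with ordinary functions. `fake_model`, `report_stage`, `summary_stage`, and `toy_sequential` are hypothetical names, and the prompts are shortened for illustration.

```python
def fake_model(prompt_text):
    """Stand-in for the chat model: returns a string that echoes the prompt."""
    return f"<reply to: {prompt_text}>"


def report_stage(inputs):
    # First half of the chain: fill the first prompt, call the model, keep plain text.
    return fake_model(f"Generate a detailed report on {inputs['topic']}")


def summary_stage(text):
    # Second half: the previous string is bound to the second prompt's variable `text`.
    return fake_model(f"Summarize in 5 points: {text}")


def toy_sequential(inputs):
    report = report_stage(inputs)  # intermediate result, not returned to the caller
    return summary_stage(report)


print(toy_sequential({"topic": "SQL"}))
```

If the intermediate report is also needed, it has to be captured separately, because the composed chain exposes only its last output.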

In [None]:
# Invoke the sequential chain with the topic "importance of SQL in data science" and print the summary.

result = chain.invoke({"topic": "importance of SQL in data science"})

print(result)

Here is a 5-point summary of the report:

**Point 1: Importance of SQL in Data Science**
SQL is a fundamental tool in data science, enabling data scientists to manipulate, analyze, and integrate large datasets. Its importance lies in its ability to extract insights from data, inform business decisions, and gain a competitive advantage.

**Point 2: Applications of SQL in Data Science**
SQL is used in various applications, including business intelligence, data mining, data warehousing, machine learning, and data science tools. Its applications are diverse and essential for data scientists to work efficiently and accurately.

**Point 3: Benefits of SQL in Data Science**
The benefits of using SQL in data science include improved efficiency, increased accuracy, enhanced decision making, cost savings, and a competitive advantage. SQL enables data scientists to work more efficiently, reducing the time and effort required to analyze data.

**Point 4: Recommendations for Using SQL in Data Scien

In [None]:
# Visualize the structure of the sequential chain as an ASCII graph.

chain.get_graph().print_ascii()

     +-------------+       
     | PromptInput |       
     +-------------+       
            *              
            *              
            *              
    +----------------+     
    | PromptTemplate |     
    +----------------+     
            *              
            *              
            *              
   +-----------------+     
   | ChatHuggingFace |     
   +-----------------+     
            *              
            *              
            *              
   +-----------------+     
   | StrOutputParser |     
   +-----------------+     
            *              
            *              
            *              
+-----------------------+  
| StrOutputParserOutput |  
+-----------------------+  
            *              
            *              
            *              
    +----------------+     
    | PromptTemplate |     
    +----------------+     
            *              
            *              
            *       

# **Parallel Chain (RunnableParallel)**

In [None]:
# Import required modules for parallel chain construction, prompt templates, and output parsing.

from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel
from dotenv import load_dotenv

# Load environment variables (such as API keys) from a .env file.

load_dotenv()

True

In [None]:
# Initialize the first HuggingFaceEndpoint and model for generating notes.
llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    task="text-generation",
)

model1 = ChatHuggingFace(llm=llm)


In [None]:
# Initialize the second HuggingFaceEndpoint and model for generating quiz questions.

llm = HuggingFaceEndpoint(
    repo_id="mistralai/Mixtral-8x7B-Instruct-v0.1",
    task="text-generation",
)

model2 = ChatHuggingFace(llm=llm)

In [None]:
# Create a prompt template for generating short and simple notes from a given text.

prompt1 = PromptTemplate(
    template="Generate short and simple notes from the following text \n {text}",
    input_variables=['text'],
)

In [None]:
# Create a prompt template for generating 5 short question-answers from a given text.

prompt2 = PromptTemplate(
    template='Generate 5 short question-answer pairs from the following text \n {text}',
    input_variables=['text']
)

In [None]:
# Create a prompt template for merging notes and quiz into a single document.

prompt3 = PromptTemplate(
    template="Merge the provided notes and quiz into a single document \n notes -> {notes} and quiz -> {quiz}",
    input_variables=['notes', 'quiz']
)

In [None]:
# Create a string output parser to extract plain text from the model's response.

parser = StrOutputParser()

In [None]:
# Build a parallel chain that generates notes and quiz in parallel using two different models.

parallel_chain = RunnableParallel({
    'notes': prompt1 | model1 | parser,
    'quiz': prompt2 | model2 | parser,
})
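
The parallel step hands the same input to every branch and collects the results in a dict keyed by branch name. A rough standard-library sketch of that fan-out uses a thread pool. `run_parallel` and `toy_branches` are hypothetical, and this is not a claim about how `RunnableParallel` is implemented internally.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(branches, inputs):
    """Run every branch on the same inputs and return {branch_name: result}."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        # Submit all branches first so they can overlap, then wait for each result.
        futures = {name: pool.submit(func, inputs) for name, func in branches.items()}
        return {name: future.result() for name, future in futures.items()}


toy_branches = {
    "notes": lambda inputs: f"notes on: {inputs['text']}",
    "quiz": lambda inputs: f"quiz on: {inputs['text']}",
}

print(run_parallel(toy_branches, {"text": "decision trees"}))
```

Because the branches here would be waiting on remote model calls, running them concurrently can shorten the total wall-clock time compared with calling them one after another.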

In [None]:
# Build a chain to merge the notes and quiz into a single document.

merge_chain = prompt3 | model1 | parser

In [None]:
# Combine the parallel and merge chains into a single workflow.

chain = parallel_chain | merge_chain
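
The two chains fit together because the keys of the dict produced by the parallel step (`notes`, `quiz`) match the variables of the merge prompt. The sketch below shows that contract with plain string formatting. `MERGE_TEMPLATE` and `merge_prompt` are hypothetical names used only for illustration.

```python
MERGE_TEMPLATE = (
    "Merge the provided notes and quiz into a single document \n"
    " notes -> {notes} and quiz -> {quiz}"
)


def merge_prompt(branch_results):
    """Fill the merge template from the dict produced by the parallel step."""
    return MERGE_TEMPLATE.format(**branch_results)


print(merge_prompt({"notes": "short notes", "quiz": "five questions"}))

# A missing key fails at the formatting step, before any model would be called.
try:
    merge_prompt({"notes": "short notes"})
except KeyError as err:
    print(f"missing key: {err}")
```

Renaming a branch in the parallel step therefore also requires renaming the matching placeholder in the merge prompt.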

In [None]:
# Define a sample text about decision trees to be used as input for the chain.

text = """
Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. A tree can be seen as a piecewise constant approximation.

For instance, in the example below, decision trees learn from data to approximate a sine curve with a set of if-then-else decision rules. The deeper the tree, the more complex the decision rules and the fitter the model.


Some advantages of decision trees are:

Simple to understand and to interpret. Trees can be visualized.

Requires little data preparation. Other techniques often require data normalization, dummy variables need to be created and blank values to be removed. Some tree and algorithm combinations support missing values.

The cost of using the tree (i.e., predicting data) is logarithmic in the number of data points used to train the tree.

Able to handle both numerical and categorical data. However, the scikit-learn implementation does not support categorical variables for now. Other techniques are usually specialized in analyzing datasets that have only one type of variable. See algorithms for more information.

Able to handle multi-output problems.

Uses a white box model. If a given situation is observable in a model, the explanation for the condition is easily explained by boolean logic. By contrast, in a black box model (e.g., in an artificial neural network), results may be more difficult to interpret.

Possible to validate a model using statistical tests. That makes it possible to account for the reliability of the model.

Performs well even if its assumptions are somewhat violated by the true model from which the data were generated.

The disadvantages of decision trees include:

Decision-tree learners can create over-complex trees that do not generalize the data well. This is called overfitting. Mechanisms such as pruning, setting the minimum number of samples required at a leaf node or setting the maximum depth of the tree are necessary to avoid this problem.

Decision trees can be unstable because small variations in the data might result in a completely different tree being generated. This problem is mitigated by using decision trees within an ensemble.

Predictions of decision trees are neither smooth nor continuous, but piecewise constant approximations as seen in the above figure. Therefore, they are not good at extrapolation.

The problem of learning an optimal decision tree is known to be NP-complete under several aspects of optimality and even for simple concepts. Consequently, practical decision-tree learning algorithms are based on heuristic algorithms such as the greedy algorithm where locally optimal decisions are made at each node. Such algorithms cannot guarantee to return the globally optimal decision tree. This can be mitigated by training multiple trees in an ensemble learner, where the features and samples are randomly sampled with replacement.

There are concepts that are hard to learn because decision trees do not express them easily, such as XOR, parity or multiplexer problems.

Decision tree learners create biased trees if some classes dominate. It is therefore recommended to balance the dataset prior to fitting with the decision tree.
"""

In [None]:
# Invoke the combined chain with the sample text to generate notes and quiz, then merge them, and print the result.

result = chain.invoke({'text': text})

print(result)

Here is the merged document:

**Advantages and Disadvantages of Decision Trees**

**Advantages of Decision Trees:**

1. Simple to understand and interpret.
2. Requires little data preparation.
3. Can handle both numerical and categorical data.
4. Can handle multi-output problems.
5. Uses a white box model, making it easy to explain decisions.
6. Can be validated using statistical tests.
7. Performs well even with assumptions violated.

**Disadvantages of Decision Trees:**

1. Can create over-complex trees that don't generalize well (overfitting).
2. Can be unstable due to small variations in data.
3. Predictions are piecewise constant, not smooth or continuous.
4. Not good at extrapolation.
5. Problem of learning optimal decision tree is NP-complete.
6. Can create biased trees if some classes dominate.
7. Some concepts are hard to learn with decision trees (e.g., XOR, parity, multiplexer problems).

**Quiz**

1. What is the goal of Decision Trees in supervised learning?
The goal of Dec

In [None]:
# Visualize the structure of the parallel and merge chain as an ASCII graph.

chain.get_graph().print_ascii()

              +---------------------------+              
              | Parallel<notes,quiz>Input |              
              +---------------------------+              
                  ***               ***                  
               ***                     ***               
             **                           **             
+----------------+                    +----------------+ 
| PromptTemplate |                    | PromptTemplate | 
+----------------+                    +----------------+ 
          *                                   *          
          *                                   *          
          *                                   *          
+-----------------+                  +-----------------+ 
| ChatHuggingFace |                  | ChatHuggingFace | 
+-----------------+                  +-----------------+ 
          *                                   *          
          *                                   *          
          *   