In [1]:
!ollama list

NAME               ID              SIZE      MODIFIED     
llama3.2:latest    a80c4f17acd5    2.0 GB    2 weeks ago     
phi3:latest        a2c89ceaed85    2.3 GB    5 months ago    


In [2]:
from langchain.llms import Ollama
from langchain.prompts import StringPromptTemplate, PromptTemplate

# Initialize the LLM (a local llama3.2 model served by Ollama)
llm = Ollama(model='llama3.2',temperature=0.7)

# Define a prompt template to generate the book structure in JSON
structure_prompt_template = """
Book Context: {context}
Book Genre: {genre}

Based on the book context and above provided information, create a book structure in JSON format with the following details:
1. Book title (at least 4 titles).
2. Chapters, each with headings and subheadings.



Return the structure in JSON format like this:
{{
  "title": ["Book Title1", "Book Title2", ...],
  "chapters": [
    {{
      "chapter_title": "Chapter 1",
      "headings": [
        {{
          "heading": "Heading 1",
          "subheadings": ["Subheading 1", "Subheading 2"]
        }}
      ]
    }},
    ...
  ]
}}

Do not return any text other than the JSON structure, so that the response can be parsed with the json.loads() function.
"""

structure_prompt_template = """
Book Context: {context}
Book Genre: {genre}

Using the provided book context and genre, create a complete book structure in JSON format with the following details:

1. Generate at least 4 potential book titles.
2. Organize the book into chapters, where each chapter includes:
   - A chapter title
   - Headings within the chapter
   - Subheadings under each heading

Return the output in the following JSON format:

{{
  "title": ["Book Title1", "Book Title2", "Book Title3", "Book Title4"],
  "chapters": [
    {{
      "chapter_title": "Chapter 1",
      "headings": [
        {{
          "heading": "Heading 1",
          "subheadings": ["Subheading 1", "Subheading 2"]
        }},
        {{
          "heading": "Heading 2",
          "subheadings": ["Subheading 1", "Subheading 2"]
        }}
      ]
    }},
    {{
      "chapter_title": "Chapter 2",
      "headings": [
        {{
          "heading": "Heading 1",
          "subheadings": ["Subheading 1", "Subheading 2"]
        }}
      ]
    }}
  ]
}}

Only return the JSON structure. No other text should be included so it can be easily parsed using the `json.loads()` function.

"""
# Function to generate book structure
def generate_book_structure(context, genre):
    prompt = PromptTemplate(input_variables=["context", "genre"], template=structure_prompt_template).format(context=context, genre=genre)
    # Call the model via invoke(); calling llm(prompt) directly is deprecated in recent LangChain releases
    structure_response = llm.invoke(prompt)
    return structure_response
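
The doubled braces in the template matter because the template goes through Python-style format substitution: `{context}` is a placeholder, while `{{` and `}}` come out as literal braces. A minimal sketch using plain `str.format`, which, as far as I know, follows the same escaping rule as the default template format; the `demo_template` string is made up for illustration:

```python
# Hypothetical miniature template: one placeholder plus a literal JSON object.
demo_template = 'Topic: {context}\nReturn: {{"title": ["..."]}}'

# Single braces are substituted, doubled braces collapse to literal braces.
rendered = demo_template.format(context="Techniques used in NLP")
print(rendered)
```

Forgetting to double a brace in the JSON example would make the formatter treat it as a placeholder and raise an error when the prompt is built.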




In [3]:

# prompt = PromptTemplate(input_variables=["context"], template=structure_prompt_template)
# prompt.invoke({"context":"The book is about a detective who solves crimes in a sci-fi world."})
BOOK_CONTEXT = "Techniques used in NLP"
GENRE = "Artificial Intelligence"

p = generate_book_structure(context=BOOK_CONTEXT, genre=GENRE)
print(p)



```
{
  "title": [
    "Understanding Natural Language Processing",
    "Techniques and Applications of NLP",
    "Advanced Methods in NLP",
    "Real-world Implementations of NLP"
  ],
  "chapters": [
    {
      "chapter_title": "Introduction to NLP",
      "headings": [
        {
          "heading": "What is NLP?",
          "subheadings": ["Definition", "Purpose"]
        },
        {
          "heading": "History of NLP",
          "subheadings": ["Early Developments", "Current Trends"]
        }
      ]
    },
    {
      "chapter_title": "Text Preprocessing",
      "headings": [
        {
          "heading": "Tokenization",
          "subheadings": ["Types of Tokenization", "Common Challenges"]
        },
        {
          "heading": "Stopword Removal",
          "subheadings": ["Why Remove Stopwords?", "Methods and Tools"]
        }
      ]
    },
    {
      "chapter_title": "Sentiment Analysis",
      "headings": [
        {
          "heading": "Basic Sentiment Analysis 
```

In [4]:
# Convert the model's text output to JSON
import re
import json

# def extract_and_parse_json(llm_output):
#     try:
#         # Use regex to find the first occurrence of a JSON object
#         json_match = re.search(r'\{.*?\}', llm_output, re.DOTALL)
        
#         if json_match:
#             # Extract the JSON string
#             json_str = json_match.group(0)
            
#             # Parse the JSON string into a Python dictionary
#             parsed_json = json.loads(json_str)
#             return parsed_json
#         else:
#             print("No valid JSON found in the LLM output.")
#             return None
#     except json.JSONDecodeError as e:
#         print(f"Error parsing JSON: {e}")
#         return None


# def extract_and_parse_json(llm_output):
#     try:
#         # Use regex to find JSON content between triple backticks and curly braces
#         json_match = re.search(r'```.*?(\{.*?\})```', llm_output, re.DOTALL)
        
#         if json_match:
#             # Extract the JSON string
#             json_str = json_match.group(1)  # Group 1 contains the actual JSON
            
#             # Parse the JSON string into a Python dictionary
#             parsed_json = json.loads(json_str)
#             return parsed_json
#         else:
#             print("No valid JSON found in the LLM output.")
#             return None
#     except json.JSONDecodeError as e:
#         print(f"Error parsing JSON: {e}")
#         return None

def extract_and_parse_json(llm_output):
    try:
        # Find a JSON object inside a fenced block (with or without a "json" tag).
        # The greedy \{.*\} spans nested braces up to the last "}" before the closing fence.
        json_match = re.search(r'```(?:json)?\s*(\{.*\})\s*```', llm_output, re.DOTALL)

        if json_match:
            # Group 1 contains the actual JSON; json.loads tolerates newlines and indentation
            json_str = json_match.group(1)

            # Parse the JSON string into a Python dictionary
            parsed_json = json.loads(json_str)
            return parsed_json
        else:
            print("No valid JSON found in the LLM output.")
            return None
    except json.JSONDecodeError as e:
        print(f"Error parsing JSON: {e}")
        return None
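
Fence-based regexes are brittle when the model wraps the JSON in extra prose or omits the fence. One alternative, sketched here under the assumption that the first JSON object in the text is the one wanted, is to let the standard library's `json.JSONDecoder.raw_decode` find where the object ends. The name `first_json_object` and the `sample` string are hypothetical:

```python
import json

def first_json_object(text):
    """Return the first JSON object embedded in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores what follows
            obj, _end = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # this "{" did not start valid JSON; try the next one
        start = text.find("{", start + 1)
    return None

# Made-up model reply with prose before and after a fenced JSON block.
sample = 'Here is the structure:\n```json\n{"title": ["A"], "chapters": [{"chapter_title": "One"}]}\n```\nHope it helps!'
print(first_json_object(sample))
```

Because the decoder tracks nesting itself, this avoids the greedy versus non-greedy trade-off of matching braces with a regular expression.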




In [5]:
def parse_json(input_string):
    # Remove leading and trailing whitespace
    input_string = input_string.strip()

    if input_string.startswith('```'):
        # Extract the content of a fenced block, with or without a 'json' language tag
        json_content = re.search(r'```(?:json)?\s*(.*?)\s*```', input_string, re.DOTALL)
        if json_content:
            json_string = json_content.group(1)
        else:
            raise ValueError("Invalid JSON format")
    else:
        # Treat the input as plain JSON
        json_string = input_string

    # Attempt to parse the JSON
    try:
        return json.loads(json_string)
    except json.JSONDecodeError as e:
        raise ValueError(f"Failed to decode JSON: {e}") from e
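
Successful parsing only guarantees valid JSON, not the shape the prompt asked for. A minimal sketch of a shape check, assuming the `title` / `chapters` / `headings` / `subheadings` keys from the prompt above; the function name `validate_book_structure` and the sample dictionaries are hypothetical:

```python
def validate_book_structure(data):
    """Return a list of problems found in a parsed book structure (empty if none)."""
    if not isinstance(data, dict):
        return ["top level is not an object"]
    problems = []
    titles = data.get("title")
    if not (isinstance(titles, list) and titles and all(isinstance(t, str) for t in titles)):
        problems.append("'title' should be a non-empty list of strings")
    chapters = data.get("chapters")
    if not isinstance(chapters, list) or not chapters:
        problems.append("'chapters' should be a non-empty list")
        return problems
    for i, chapter in enumerate(chapters):
        if not isinstance(chapter, dict) or not isinstance(chapter.get("chapter_title"), str):
            problems.append(f"chapter {i}: missing 'chapter_title'")
            continue
        headings = chapter.get("headings")
        if not isinstance(headings, list):
            problems.append(f"chapter {i}: 'headings' should be a list")
            continue
        for j, heading in enumerate(headings):
            if not isinstance(heading, dict) or not isinstance(heading.get("heading"), str):
                problems.append(f"chapter {i}, heading {j}: missing 'heading'")
            elif not isinstance(heading.get("subheadings", []), list):
                problems.append(f"chapter {i}, heading {j}: 'subheadings' should be a list")
    return problems

good = {"title": ["T"], "chapters": [{"chapter_title": "C", "headings": [{"heading": "H", "subheadings": ["S"]}]}]}
bad = {"title": "T", "chapters": [{"headings": []}]}
print(validate_book_structure(good))
print(validate_book_structure(bad))
```

Running a check like this right after parsing surfaces a malformed response early, before any chapter content is generated from it.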


In [6]:

data = parse_json(p)
print(data)

{'title': ['Understanding Natural Language Processing', 'Techniques and Applications of NLP', 'Advanced Methods in NLP', 'Real-world Implementations of NLP'], 'chapters': [{'chapter_title': 'Introduction to NLP', 'headings': [{'heading': 'What is NLP?', 'subheadings': ['Definition', 'Purpose']}, {'heading': 'History of NLP', 'subheadings': ['Early Developments', 'Current Trends']}]}, {'chapter_title': 'Text Preprocessing', 'headings': [{'heading': 'Tokenization', 'subheadings': ['Types of Tokenization', 'Common Challenges']}, {'heading': 'Stopword Removal', 'subheadings': ['Why Remove Stopwords?', 'Methods and Tools']}]}, {'chapter_title': 'Sentiment Analysis', 'headings': [{'heading': 'Basic Sentiment Analysis Techniques', 'subheadings': ['Rule-based Methods', 'Machine Learning Approaches']}, {'heading': 'Advanced Sentiment Analysis Models', 'subheadings': ['Deep Learning Architectures', 'Ensemble Methods']}]}, {'chapter_title': 'Named Entity Recognition', 'headings': [{'heading': 'Ba

In [7]:
for chapter in data['chapters']:
    print(chapter['chapter_title'])
    

Introduction to NLP
Text Preprocessing
Sentiment Analysis
Named Entity Recognition
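
The same loop can be extended to walk headings and subheadings as well. A small sketch that renders the whole outline as nested markdown headings; `outline_to_markdown` is a hypothetical helper, and `sample` copies one chapter from the structure shown earlier:

```python
def outline_to_markdown(data):
    """Render chapters, headings and subheadings as nested markdown headings."""
    lines = []
    for chapter in data["chapters"]:
        lines.append(f"# {chapter['chapter_title']}")
        for heading in chapter.get("headings", []):
            lines.append(f"## {heading['heading']}")
            for subheading in heading.get("subheadings", []):
                lines.append(f"### {subheading}")
    return "\n".join(lines)

sample = {
    "chapters": [
        {
            "chapter_title": "Introduction to NLP",
            "headings": [
                {"heading": "What is NLP?", "subheadings": ["Definition", "Purpose"]},
            ],
        }
    ]
}
print(outline_to_markdown(sample))
```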


In [69]:
# # Define a prompt template to generate chapter content in markdown
# chapter_content_prompt_template = """
# Based on the following chapter structure, generate a detailed content in markdown format.

# Chapter Title: {chapter_title}
# Agenda: A brief description of the agenda for this chapter.

# Headings and subheadings:
# {headings}

# Return the content in markdown format, structured by headings and subheadings.
# """

# # Function to generate chapter content
# def generate_chapter_content(chapter_title, headings):
#     prompt = PromptTemplate(input_variables=["chapter_title", "headings"], template=chapter_content_prompt_template).format(
#         chapter_title=chapter_title, headings=headings)
#     chapter_content = llm(prompt)
#     return chapter_content


In [None]:
# import docx

# # Function to save book content to Word
# def save_to_word(book_title, chapters):
#     doc = docx.Document()
    
#     # Set the title of the book
#     doc.add_heading(book_title, 0)
    
#     for chapter in chapters:
#         doc.add_heading(chapter['chapter_title'], level=1)
        
#         for heading in chapter['headings']:
#             doc.add_heading(heading['heading'], level=2)
            
#             if 'subheadings' in heading:
#                 for subheading in heading['subheadings']:
#                     doc.add_heading(subheading, level=3)
#                     # Insert content (in real use case, this would be the generated content)
#                     doc.add_paragraph("This is content for subheading: " + subheading)
    
#     # Save the document
#     doc.save("generated_book.docx")

# # Example usage with a JSON structure (replace with actual structure)
# save_to_word(book_structure['title'], book_structure['chapters'])


In [None]:
# for chapter in data['chapters']:
#     print(chapter['chapter_title'])
#     chapter_data = generate_chapter_content(chapter['chapter_title'], chapter['headings'])
    

In [8]:
import pypandoc

# Function to convert markdown to a .docx file
def markdown_to_docx(markdown_content, output_file):
    # With outputfile set, pypandoc writes the file and returns an empty string;
    # a failed conversion raises RuntimeError instead of returning a status code.
    try:
        pypandoc.convert_text(markdown_content, 'docx', format='md', outputfile=output_file)
    except RuntimeError as e:
        print(f"Failed to create file '{output_file}': {e}")
        return
    print(f"File '{output_file}' created successfully!")

# # Example usage
# chapter_content = """
# # Chapter 1: Introduction

# ## What is AI?

# Artificial Intelligence (AI) is the simulation of human intelligence in machines.

# ## History of AI

# AI has been around since the mid-20th century, with the development of early neural networks and expert systems.
# <div style="page-break-before:always">&nbsp;</div>
# <p></p>
# ### Types of AI

# - Narrow AI
# - General AI
# - Superintelligence
# """

# # Save the markdown content as a .docx file
# markdown_to_docx(chapter_content, "Chapter_1_Introduction.docx")
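
Both the pypandoc call above and the `subprocess` calls below depend on a pandoc executable. A minimal pre-flight check, assuming pandoc is expected on `PATH`; a pandoc bundled with a Python package would not be detected by this check, and `pandoc_available` is a hypothetical helper name:

```python
import shutil

def pandoc_available():
    """Return True if a `pandoc` executable can be found on PATH."""
    return shutil.which("pandoc") is not None

if not pandoc_available():
    print("pandoc not found on PATH; install it before converting markdown to .docx")
```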


In [9]:
import subprocess

def append_markdown_to_docx(large_docx_path, markdown_path, output_path):
    # Step 1: Convert markdown to docx using Pandoc (check=True raises if pandoc fails)
    temp_docx = "temp_small_file.docx"
    subprocess.run(['pandoc', markdown_path, '-o', temp_docx], check=True)
    
    # Step 2: Append the converted docx to the large docx
    subprocess.run(['pandoc', large_docx_path, temp_docx, '-o', output_path], check=True)

# Example usage
# append_markdown_to_docx('large_file.docx', 'small_file.md', 'merged_output.docx')


In [10]:
import subprocess
import tempfile
import os

def append_markdown_to_docx(large_docx_path, markdown_content, output_docx_path):
    # Work inside a temporary directory so the intermediate files are removed
    # automatically, even if pandoc fails (this also avoids the deprecated tempfile.mktemp).
    with tempfile.TemporaryDirectory() as tmp_dir:
        temp_md_file_path = os.path.join(tmp_dir, "chunk.md")
        temp_docx_file_path = os.path.join(tmp_dir, "chunk.docx")

        # Step 1: Write the in-memory markdown content to a temporary file
        with open(temp_md_file_path, "w", encoding="utf-8") as temp_md_file:
            temp_md_file.write(markdown_content)

        # Step 2: Convert the temporary markdown file to a temporary docx file using Pandoc
        subprocess.run(['pandoc', temp_md_file_path, '-o', temp_docx_file_path], check=True)

        # Step 3: Append the converted docx to the large docx file
        subprocess.run(['pandoc', large_docx_path, temp_docx_file_path, '-o', output_docx_path], check=True)

# Example usage
large_docx_path = 'Chapter_1_Introduction.docx'  # Path to the large docx file
markdown_content = "# Chapter 2\nThis is the content of chapter 2 in markdown format."  # Your in-memory markdown
output_docx_path = 'merged_output.docx'  # Output docx file after merging

append_markdown_to_docx(large_docx_path, markdown_content, output_docx_path)


In [11]:
def generate_doc_file(file_path,content):
    if os.path.exists(file_path):
        append_markdown_to_docx(file_path, content, file_path)
    else:
        markdown_to_docx(content, file_path)
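
Appending to the .docx after every section means one pandoc round trip per section. A possible alternative, sketched here as an assumption rather than as the notebook's approach, is to collect the markdown fragments in memory and convert the joined text once at the end. `join_markdown_sections` and the `sections` list are hypothetical:

```python
def join_markdown_sections(sections):
    """Join markdown fragments with one blank line between them."""
    # Drop empty fragments and trim stray whitespace around each one.
    cleaned = [s.strip() for s in sections if s and s.strip()]
    return "\n\n".join(cleaned) + "\n"

sections = ["# Chapter 1\nIntro text.", "", "## What is NLP?\nBody text.\n\n"]
book_markdown = join_markdown_sections(sections)
print(book_markdown)
```

The joined string could then be passed to a single markdown-to-docx conversion, which keeps heading styles consistent across the whole document.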

In [12]:
# generate_doc_file("1.docx",markdown_content)

In [22]:
data

{'title': ['Understanding Natural Language Processing',
  'Techniques and Applications of NLP',
  'Advanced Methods in NLP',
  'Real-world Implementations of NLP'],
 'chapters': [{'chapter_title': 'Introduction to NLP',
   'headings': [{'heading': 'What is NLP?',
     'subheadings': ['Definition', 'Purpose']},
    {'heading': 'History of NLP',
     'subheadings': ['Early Developments', 'Current Trends']}]},
  {'chapter_title': 'Text Preprocessing',
   'headings': [{'heading': 'Tokenization',
     'subheadings': ['Types of Tokenization', 'Common Challenges']},
    {'heading': 'Stopword Removal',
     'subheadings': ['Why Remove Stopwords?', 'Methods and Tools']}]},
  {'chapter_title': 'Sentiment Analysis',
   'headings': [{'heading': 'Basic Sentiment Analysis Techniques',
     'subheadings': ['Rule-based Methods', 'Machine Learning Approaches']},
    {'heading': 'Advanced Sentiment Analysis Models',
     'subheadings': ['Deep Learning Architectures', 'Ensemble Methods']}]},
  {'chapter_

In [23]:
# chapter_brief_template = """
# Suppose you are a book author and you are writing a book. You have a chapter in your book that you want to write a brief about. You have the following information:
# Book context: {context}
# Book genre: {genre}

# Chapter: {chapter}
# with these headings: {headings}

# return the content in markdown format for the chapter chapter introduction and basic agenda about what is going to be covered in the chapter. Keep it in mind that headings and sub headings will be covered in the next steps.
# """

chapter_brief_template = """
Imagine you are an author crafting a book. You need to write a compelling introduction for a specific chapter, emphasizing its overall theme and significance. You have the following details:

Book Context: {context}
Book Genre: {genre}

Chapter Title: {chapter}
Headings to be Covered: {headings}

Please provide a markdown-formatted introduction that outlines the main ideas and themes of the chapter. The introduction should capture the essence of what will be discussed, including relevant examples or a narrative, while avoiding any specific mention of the headings or subheadings. Focus on setting the stage for the reader, highlighting the chapter's importance and intriguing aspects without breaking it down into bullet points, as those will be addressed later in detail.

"""

chapter_prompt = PromptTemplate(input_variables=["context","genre","chapter","headings"], template=chapter_brief_template)

t = chapter_prompt.format(context=BOOK_CONTEXT, genre=GENRE, chapter=data['chapters'][0]['chapter_title'], headings=data['chapters'][0]['headings'],)
print(t)


Imagine you are an author crafting a book. You need to write a compelling introduction for a specific chapter, emphasizing its overall theme and significance. You have the following details:

Book Context: Techniques used in NLP
Book Genre: Artificial Intelligence

Chapter Title: Introduction to NLP
Headings to be Covered: [{'heading': 'What is NLP?', 'subheadings': ['Definition', 'Purpose']}, {'heading': 'History of NLP', 'subheadings': ['Early Developments', 'Current Trends']}]

Please provide a markdown-formatted introduction that outlines the main ideas and themes of the chapter. The introduction should capture the essence of what will be discussed, including relevant examples or a narrative, while avoiding any specific mention of the headings or subheadings. Focus on setting the stage for the reader, highlighting the chapter's importance and intriguing aspects without breaking it down into bullet points, as those will be addressed later in detail.
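
The rendered prompt above embeds the headings as a raw Python list of dictionaries. A small sketch of turning them into an indented bullet list first, on the assumption that plain text is easier for the model to read; `format_headings` is a hypothetical helper:

```python
def format_headings(headings):
    """Render a list of heading dicts as an indented bullet list for a prompt."""
    lines = []
    for heading in headings:
        lines.append(f"- {heading['heading']}")
        for subheading in heading.get("subheadings", []):
            lines.append(f"  - {subheading}")
    return "\n".join(lines)

headings = [
    {"heading": "What is NLP?", "subheadings": ["Definition", "Purpose"]},
    {"heading": "History of NLP", "subheadings": ["Early Developments", "Current Trends"]},
]
print(format_headings(headings))
```

The result could be passed as the `headings` value when formatting the chapter prompt in place of the raw list.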




In [24]:
res = llm.invoke(t)

In [25]:
print(res)

**Unlocking the Power of Human Language: The Foundations of NLP**

As artificial intelligence continues to revolutionize industries and transform the way we interact with technology, one crucial aspect has emerged as a linchpin in the development of intelligent machines: Natural Language Processing (NLP). The ability to understand, interpret, and generate human language is no longer a nicety, but a necessity for creating truly conversational AI systems. In this chapter, we will delve into the heart of NLP, exploring its definition, purpose, and evolution.

Imagine a world where machines can not only process vast amounts of data but also comprehend the nuances of human communication. A world where language barriers are bridged, and machines can converse with humans in their own language. This is the promise of NLP, and it's an area that has gained significant traction in recent years, driven by advancements in machine learning, deep learning, and cognitive computing.

From its humble be

In [26]:
heading_brief_template = """
You are an author crafting a book. You have written the chapter introduction and are now going to write about a heading. You have the following information:

Book Context: {context}
Book Genre: {genre}
Chapter Title: {chapter}
Heading: {heading}
Heading Index: {index}
Sub-Headings: {sub_headings}

Return the content in markdown format for the heading {heading} in the chapter {chapter}. The content should provide a detailed overview of the heading, including key points, examples, and any relevant information that will help readers understand the topic. Avoid going into subheadings at this stage and focus on creating a comprehensive narrative for the heading.
"""

heading_prompt = PromptTemplate(input_variables=["context","genre","chapter","heading","index","sub_headings"], template=heading_brief_template)
p = heading_prompt.format(context=BOOK_CONTEXT, genre=GENRE, chapter=data['chapters'][0]['chapter_title'], heading=data['chapters'][0]['headings'][0]['heading'], index=1, sub_headings=data['chapters'][0]['headings'][0]['subheadings'])
print(p)


You are an author crafting a book. You have written the chapter introduction and are now going to write about a heading. You have the following information:

Book Context: Techniques used in NLP
Book Genre: Artificial Intelligence
Chapter Title: Introduction to NLP
Heading: What is NLP?
Heading Index: 1
Sub-Headings: ['Definition', 'Purpose']

Return the content in markdown format for the heading What is NLP? in the chapter Introduction to NLP. The content should provide a detailed overview of the heading, including key points, examples, and any relevant information that will help readers understand the topic. Avoid going into subheadings at this stage and focus on creating a comprehensive narrative for the heading.



In [27]:
res = llm.invoke(p)
print(res)

# What is NLP?

NLP stands for Natural Language Processing, a subfield of artificial intelligence (AI) that deals with the interaction between computers and humans in natural language. At its core, NLP aims to enable machines to understand, interpret, and generate human language.

In essence, NLP is about developing algorithms and statistical models that can process and make sense of human language, allowing computers to analyze, classify, categorize, and extract insights from vast amounts of text data. This enables machines to perform tasks such as speech recognition, sentiment analysis, machine translation, text summarization, and more.

NLP has numerous applications across various industries, including but not limited to:

* Sentiment analysis in customer service
* Speech recognition in voice assistants
* Machine translation in international business
* Text classification in spam detection

The field of NLP is rapidly evolving, with advancements in deep learning techniques, neural n

In [31]:
sub_heading_brief_template = """
You are an author crafting a book. You have written the chapter introduction and the heading. Now you are going to write about a sub-heading. You have the following information:

Book Context: {context}
Book Genre: {genre}
Chapter Title: {chapter}
Heading: {heading}
Sub-Heading Index: {index}

Return the content in markdown format for the sub-heading {sub_heading} under the heading {heading} in the chapter {chapter}. The content should provide a detailed overview of the sub-heading, including key points, examples, and any relevant information that will help readers understand the topic. Focus on creating a narrative that complements the heading and adds depth to the chapter's structure.
"""

sub_heading_prompt = PromptTemplate(input_variables=["context","genre","chapter","heading","sub_heading","index"], template=sub_heading_brief_template)
p = sub_heading_prompt.format(context=BOOK_CONTEXT, genre=GENRE, chapter=data['chapters'][0]['chapter_title'], heading=data['chapters'][0]['headings'][0]['heading'], sub_heading=data['chapters'][0]['headings'][0]['subheadings'][0], index=1)
print(p)


You are an author crafting a book. You have written the chapter introduction and the heading. Now you are going to write about a sub-heading. You have the following information:

Book Context: Techniques used in NLP
Book Genre: Artificial Intelligence
Chapter Title: Introduction to NLP
Heading: What is NLP?
Sub-Heading Index: 1

Return the content in markdown format for the sub-heading Definition under the heading What is NLP? in the chapter Introduction to NLP. The content should provide a detailed overview of the sub-heading, including key points, examples, and any relevant information that will help readers understand the topic. Focus on creating a narrative that complements the heading and adds depth to the chapter's structure.
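
The three prompt levels (chapter, heading, sub-heading) suggest a nested loop over the parsed structure. A minimal sketch of that traversal, with the model call abstracted behind a hypothetical `generate` callable so it runs without Ollama; `write_book`, `fake_generate` and `sample` are illustrative names, not part of the notebook:

```python
def write_book(data, generate):
    """Walk the structure and collect one markdown fragment per section.

    `generate(kind, chapter, heading, sub_heading)` is a hypothetical callable
    returning markdown text; in practice it would format the matching prompt
    and call the model.
    """
    parts = []
    for chapter in data["chapters"]:
        title = chapter["chapter_title"]
        parts.append(generate("chapter", title, None, None))
        for heading in chapter.get("headings", []):
            parts.append(generate("heading", title, heading["heading"], None))
            for sub_heading in heading.get("subheadings", []):
                parts.append(generate("sub_heading", title, heading["heading"], sub_heading))
    return parts

def fake_generate(kind, chapter, heading, sub_heading):
    # Stand-in for the model call: label each fragment with its most specific name.
    label = sub_heading or heading or chapter
    return f"[{kind}] {label}"

sample = {
    "chapters": [
        {
            "chapter_title": "Introduction to NLP",
            "headings": [{"heading": "What is NLP?", "subheadings": ["Definition", "Purpose"]}],
        }
    ]
}
parts = write_book(sample, fake_generate)
print(parts)
```

Keeping the traversal separate from the model call makes it possible to test the ordering of sections without waiting on generation.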



In [32]:
res = llm.invoke(p)

In [33]:
print(res)

**What is NLP?**

### Definition
#### **1. Understanding the Foundations of NLP**

NLP (Natural Language Processing) is a subfield of artificial intelligence (AI) that deals with the interaction between computers and humans in natural language. It encompasses a range of techniques and methods used to enable computers to understand, interpret, and generate human language.

At its core, NLP aims to bridge the gap between human communication and machine processing, allowing computers to comprehend and respond to human input in a more effective and intuitive manner. This is achieved through a combination of natural language understanding (NLU), speech recognition, text analysis, and machine learning algorithms.

To illustrate this concept, consider a simple example: imagine a virtual assistant like Siri or Alexa. When you ask Siri a question or provide feedback, it uses NLP to understand your voice, intent, and context, allowing it to respond accordingly. This process involves multiple sta