<img src="https://notebooklm.google.com/_/static/branding/v5/light_mode/notebook-logo.svg" alt="image" width="30%">

# Augment NotebookLM with Data about Norwegian Agriculture

We retrieve [data about Norwegian agriculture](https://www.sciencedirect.com/science/article/pii/S2352340925000587) and convert it into PDF files small enough to use with NotebookLM. We then add the files to NotebookLM so that you can test it by asking questions about Norwegian agriculture.

<div style="display: flex; justify-content: space-between; align-items: flex-start;">
    <div style="text-align: left;">
        <p style="color:#FFD700; font-size: 15px; font-weight: bold; margin-bottom: 1px; text-align: left;">Published on  September 16, 2025</p>
        <h4 style="color:#4B0082; font-weight: bold; text-align: left; margin-top: 6px;">Author: Olena Bugaiova</h4>
        <p style="font-size: 17px; line-height: 1.7; color: #333; text-align: center; margin-top: 20px;"></p>
        <a href="https://www.linkedin.com/in/olenabugaiova/" target="_blank" style="display: inline-block; background-color: #003f88; color: #fff; text-decoration: none; padding: 5px 10px; border-radius: 10px; margin: 15px;">LinkedIn</a>
        <a href="https://github.com/OlenaBugaiova" target="_blank" style="display: inline-block; background-color: transparent; color: #059c99; text-decoration: none; padding: 5px 10px; border-radius: 10px; margin: 15px; border: 2px solid #007bff;">GitHub</a>
        <a href="https://leetcode.com/u/olenabugaiova/" target="_blank" style="display: inline-block; background-color: #ff0054; color: #fff; text-decoration: none; padding: 5px 10px; border-radius: 10px; margin: 15px;">LeetCode</a>
        <a href="https://www.kaggle.com/bugaiovaolena" target="_blank" style="display: inline-block; background-color: #3a86ff; color: #fff; text-decoration: none; padding: 5px 10px; border-radius: 10px; margin: 15px;">Kaggle</a>
    </div>
</div>

# <p style="padding:15px;background-color:#fef88c;font-family:newtimeroman;font-size:100%;text-align:center;border-radius:3px;font-weight:200;border: 1px outset #77a528;">Import Libraries</p>

In [25]:
import kagglehub
bugaiovaolena_about_agriculture_in_norwegian_language_path = kagglehub.dataset_download('bugaiovaolena/about-agriculture-in-norwegian-language')

print('Data source import complete.')


Using Colab cache for faster access to the 'about-agriculture-in-norwegian-language' dataset.
Data source import complete.


In [26]:
%pip install reportlab



In [27]:
import json
import kagglehub
from IPython.display import FileLink
from typing import Callable

from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, PageBreak
from reportlab.lib.styles import getSampleStyleSheet, StyleSheet1
from reportlab.lib.units import inch

import warnings
warnings.filterwarnings("ignore")

# <p style="padding:15px;background-color:#fef88c;font-family:newtimeroman;font-size:100%;text-align:center;border-radius:3px;font-weight:200;border: 1px outset #77a528;">Load Data</p>

In [28]:
DIRECTORY_NAME = 'bugaiovaolena/about-agriculture-in-norwegian-language'

In [29]:
NIBIO_FILE_NAME = 'nibio_text_data.json'
PLANTEVERNLEKSIKONET_FILE_NAME = 'plantevernleksikonet_text_data.json'
NLR_FILE_NAME = 'nlr_text_data.json'

In [30]:
NLR_URLS_FILE_NAME = 'nlr urls.json'

Load the text collected from the agriculture-related websites.

In [31]:
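def load_json_file_content(file_name):
    # Download (or reuse the cached copy of) a single file from the Kaggle dataset
    file_path = kagglehub.dataset_download(DIRECTORY_NAME, path=file_name)
    # Read explicitly as UTF-8 (the JSON default) because the text is Norwegian
    with open(file_path, 'r', encoding='utf-8') as file:
        text_file_content_json = json.load(file)

    return text_file_content_json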

In [32]:
nibio_text_file_content_json = load_json_file_content(NIBIO_FILE_NAME)
plantevernleksikonet_text_file_content_json = load_json_file_content(PLANTEVERNLEKSIKONET_FILE_NAME)
nlr_text_file_content_json = load_json_file_content(NLR_FILE_NAME)

nlr_urls_file_content_json = load_json_file_content(NLR_URLS_FILE_NAME)

Using Colab cache for faster access to the 'about-agriculture-in-norwegian-language' dataset.
Using Colab cache for faster access to the 'about-agriculture-in-norwegian-language' dataset.
Using Colab cache for faster access to the 'about-agriculture-in-norwegian-language' dataset.
Using Colab cache for faster access to the 'about-agriculture-in-norwegian-language' dataset.


# <p style="padding:15px;background-color:#fef88c;font-family:newtimeroman;font-size:100%;text-align:center;border-radius:3px;font-weight:200;border: 1px outset #77a528;">Create PDF files</p>

We generate PDF files from the text data so that they can be added to NotebookLM later.

In [33]:
Content_Generating_Callable = Callable[[list, list, StyleSheet1], tuple[list, list]]

def create_pdf_file(output_file_name: str, sections: list, content_generating_function: Content_Generating_Callable):

    doc = SimpleDocTemplate(output_file_name, pagesize=letter)
    styles = getSampleStyleSheet()

    table_of_contents = []
    table_of_contents.append(Paragraph('Table of Contents<br/><br/>', styles['h1']))
    table_of_contents, contents = content_generating_function(sections, table_of_contents, styles)

    story = table_of_contents
    story.append(PageBreak())
    story.extend(contents)

    doc.build(story)
    print(output_file_name, 'generated successfully')

**Create NIBIO PDF file**


> The text data collected from nibio.no has the following hierarchical structure:
>
> [{"title": title of an article, "text": content of an article, "children": [a list of narrower articles under this title, each with the same structure as above]}]
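
The hierarchical numbering used for the table of contents can be illustrated without ReportLab. The sketch below is only an illustration under stated assumptions: `sample_sections` is made-up data that follows the structure described above, and `numbered_titles` is a hypothetical helper, not part of the notebook.

```python
# Hypothetical sample following the structure described above (not real NIBIO data)
sample_sections = [
    {"title": "Korn", "text": "Text about grain.", "children": [
        {"title": "Hvete", "text": "Text about wheat."},
        {"title": "Bygg", "text": "Text about barley."},
    ]},
    {"title": "Frukt", "text": "Text about fruit."},
]


def numbered_titles(sections, prefix=""):
    """Return (depth, numbered title) pairs in depth-first order."""
    result = []
    for number, section in enumerate(sections, start=1):
        label = f"{prefix}{number}. "
        # Each nesting level adds one "N. " part, so counting dots gives the depth
        depth = label.count(".")
        result.append((depth, label + section["title"]))
        # 'children' is optional, so fall back to an empty list
        result.extend(numbered_titles(section.get("children", []), label))
    return result


toc = numbered_titles(sample_sections)
for depth, title in toc:
    # Indent nested entries, similar to the growing left indent in the PDF
    print("  " * (depth - 1) + title)
```

A nested article such as "Hvete" gets the label "1. 1. Hvete", which is the same numbering scheme the PDF-generating function below applies to both the table of contents and the article headings.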

In [34]:
def add_nibio_file_content(sections: list, table_of_contents: list, styles: StyleSheet1) -> tuple[list, list]:

    toc_header_level = 0
    contents = []
    def prepare_sections_for_print(sections: list, toc_header_level: int, including_sections_numbers: str):

        toc_header_level += 1
        section_nummer = 1

        # Iterate without mutating the input (pop(0) would empty the loaded data's 'children' lists)
        for section in sections:

            title = section['title']
            article = section['text']

            toc_paragraph_style = styles['h' + str(toc_header_level)]
            toc_paragraph_style.leftIndent = 0.5 * toc_header_level * inch
            toc_title = including_sections_numbers + str(section_nummer) + '. ' + title
            table_of_contents.append(Paragraph(toc_title, toc_paragraph_style))

            contents.append(Paragraph(toc_title, styles['h1']))
            contents.append(Paragraph('<br/>', styles['h1']))

            for paragraph in article.split('\n '):
                contents.append(Paragraph(paragraph, styles['Normal']))

            contents.append(Paragraph('<br/><br/>', styles['h1']))

            if 'children' in section:
                contents.append(PageBreak())
                including_sections_numbers_next_level = including_sections_numbers + str(section_nummer) + '. '
                prepare_sections_for_print(section['children'], toc_header_level, including_sections_numbers_next_level)

            section_nummer += 1

    prepare_sections_for_print(sections, toc_header_level, '')

    return table_of_contents, contents

**Create NLR PDF file**


> The text data collected from nlr.no has the following structure: {title: content of an article}



> The NLR URLs file (`nlr urls.json`) was used for web scraping the text data from nlr.no. It helps to create URLs of the form https://www.nlr.no/fagartikler/kategori/region/title and has the following structure: {URL prefix: [corresponding titles]}. Each URL prefix encodes a category and a region in the format https://www.nlr.no/fagartikler/kategori/region. The corresponding titles are the set of article titles under that category and region.

In [35]:
URL_PREFIX = 'https://www.nlr.no/fagartikler/'
nlr_topic_regions_map = {}

for url in nlr_urls_file_content_json:

    url_parts = url.split('/')

    topic = url_parts[-2]
    region = url_parts[-1]

    if topic not in nlr_topic_regions_map:
        nlr_topic_regions_map[topic] = []

    nlr_topic_regions_map[topic].append(region)

In [36]:
def add_nlr_file_content(topics: list, table_of_contents: list, styles: StyleSheet1) -> tuple[list, list]:

    contents = []
    topic_count = 1

    for topic in topics:

        # adding topic
        toc_paragraph_style = styles['h2']
        toc_paragraph_style.leftIndent = 0

        toc_title_topic = str(topic_count) + '. ' + topic.capitalize()
        table_of_contents.append(Paragraph(toc_title_topic, toc_paragraph_style))
        contents.append(Paragraph(toc_title_topic, styles['h1']))

        regions = nlr_topic_regions_map[topic]
        region_count = 1

        for region in regions:

            # adding region
            toc_paragraph_style = styles['h3']
            toc_paragraph_style.leftIndent = 0.5 * inch

            toc_title_region = str(topic_count) + '. ' + str(region_count) + '. ' + region.capitalize()
            table_of_contents.append(Paragraph(toc_title_region, toc_paragraph_style))
            contents.append(Paragraph(toc_title_region, styles['h1']))

            url = URL_PREFIX + topic + '/' + region
            titles = nlr_urls_file_content_json[url]
            title_count = 1

            for title in titles:

                # adding title
                if title and title in nlr_text_file_content_json and title != 'Økologisk sortsprøvning':

                    toc_paragraph_style = styles['h4']
                    toc_paragraph_style.leftIndent = 1.0 * inch

                    toc_title_count = str(topic_count) + '. ' + str(region_count) + '. ' + str(title_count) + '. '
                    toc_title = toc_title_count + title.lower().capitalize()
                    table_of_contents.append(Paragraph(toc_title, toc_paragraph_style))

                    contents.append(Paragraph(toc_title, styles['h2']))
                    contents.append(Paragraph('<br/>', styles['h1']))

                    # adding article
                    article = nlr_text_file_content_json[title]

                    for paragraph in article.split('\n '):
                        contents.append(Paragraph(paragraph, styles['Normal']))

                    contents.append(Paragraph('<br/><br/>', styles['h1']))
                    title_count += 1
            region_count += 1
        topic_count += 1

    return table_of_contents, contents

**Create plantevernleksikonet PDF file**


> Text data, collected from Plantevernleksikonet.no, has the following structure:
>
> {title: {"page_number": number in a URL of an article, "headers": a Latin name and a group found under the title of an article, "summary": a brief description of an article that can be found at the top of a webpage, "content": the main text of an article} }
>
> where the URL of an article is in the following form https://www.plantevernleksikonet.no/l/oppslag/page_number/
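
As a small illustration of this structure, the sketch below takes one made-up entry, splits its `headers` field into a Latin name and a group, and rebuilds the article URL. The entry values and the `ARTICLE_URL_TEMPLATE` name are assumptions for illustration only. Splitting at the last space assumes that the group is a single word at the end of `headers`, which is how the notebook's code treats it.

```python
# Hypothetical entry following the structure described above (values are made up)
sample_entry = {
    "page_number": 123,
    "headers": "Myzus persicae Insekter",
    "summary": "Short description.",
    "content": "Main text.",
}

# URL pattern taken from the description above
ARTICLE_URL_TEMPLATE = "https://www.plantevernleksikonet.no/l/oppslag/{page_number}/"

# Assumption: the group is the last space-separated word of 'headers'
latin_name, _, group = sample_entry["headers"].rpartition(" ")
article_url = ARTICLE_URL_TEMPLATE.format(page_number=sample_entry["page_number"])

print(latin_name)
print(group)
print(article_url)
```

If a group name ever consisted of more than one word, this split would put part of it into the Latin name, so the result is worth spot-checking against the real data.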


In [37]:
def add_plantevernleksikonet_file_content(sections: list, table_of_contents: list, styles: StyleSheet1) -> tuple[list, list]:

    contents = []
    title_count = 1

    for title in sections:

        if title:
            article_info = plantevernleksikonet_text_file_content_json[title]

            latin_name_and_group = article_info['headers']
            latin_name, _, group = latin_name_and_group.rpartition(' ')
            article = article_info['summary'] + '\n' + article_info['content']

            # adding title
            toc_paragraph_style = styles['h3']
            toc_paragraph_style.leftIndent = 0

            toc_title = str(title_count) + '. ' + title.lower().capitalize() + ' (' + group.lower().capitalize() + ')'
            contents_title = str(title_count) + '. ' + title.lower().capitalize()

            table_of_contents.append(Paragraph(toc_title, toc_paragraph_style))
            contents.append(Paragraph(contents_title, styles['h1']))

            # adding headers
            contents.append(Paragraph(latin_name, styles['h2']))
            contents.append(Paragraph(group.lower().capitalize(), styles['h3']))
            contents.append(Paragraph('<br/>', styles['h3']))

            # adding article
            for paragraph in article.split('\n '):
                contents.append(Paragraph(paragraph, styles['Normal']))

            contents.append(Paragraph('<br/><br/>', styles['h1']))
            title_count += 1

    return table_of_contents, contents

Generate PDF files

In [38]:
output_files_info_with_partitioning_per_website = {
    'NIBIO': {
        'sections': nibio_text_file_content_json.copy(), 'partitioning': [3]
    },
    'NLR': {
        'sections': list(nlr_topic_regions_map.keys()), 'partitioning': [4, 10]
    },
    'Plantevernleksikonet': {
        'sections': list(plantevernleksikonet_text_file_content_json.keys()), 'partitioning': [225, 544, 850]
    }
}

content_generating_functions = {
    'NIBIO': add_nibio_file_content,
    'NLR': add_nlr_file_content,
    'Plantevernleksikonet': add_plantevernleksikonet_file_content
}

In [39]:
output_files = []
for website_name, files_info in output_files_info_with_partitioning_per_website.items():

    output_file_name = website_name
    content_generating_function = content_generating_functions[output_file_name]

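    sections = files_info['sections']
    # Build the end indices without mutating the configuration, so that
    # re-running this cell does not append the final boundary a second time
    partition_ends = files_info['partitioning'] + [len(sections)]

    partition_start = 0
    for i, partition_end in enumerate(partition_ends):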

        partition_file_name = output_file_name + str(i + 1) + '.pdf'
        create_pdf_file(
            partition_file_name, sections[partition_start: partition_end], content_generating_function
        )

        output_files.append(partition_file_name)
        partition_start = partition_end

NIBIO1.pdf generated successfully
NIBIO2.pdf generated successfully
NLR1.pdf generated successfully
NLR2.pdf generated successfully
NLR3.pdf generated successfully
Plantevernleksikonet1.pdf generated successfully
Plantevernleksikonet2.pdf generated successfully
Plantevernleksikonet3.pdf generated successfully
Plantevernleksikonet4.pdf generated successfully
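
The partitioning step can be illustrated on its own: each number in a `partitioning` list is the index at which one file ends and the next begins, and the end of the section list closes the last file. The following is a minimal sketch with made-up data. `split_by_boundaries` is a hypothetical helper, not a function from the notebook.

```python
def split_by_boundaries(items, boundaries):
    """Split items into consecutive chunks ending at each boundary index."""
    # Build a new list so the caller's boundaries are left untouched
    ends = list(boundaries) + [len(items)]
    chunks = []
    start = 0
    for end in ends:
        chunks.append(items[start:end])
        start = end
    return chunks


# Hypothetical data: ten section titles split after the 3rd and 7th item
sections = [f"section {i}" for i in range(1, 11)]
chunks = split_by_boundaries(sections, [3, 7])

for i, chunk in enumerate(chunks, start=1):
    print(f"file {i}: {len(chunk)} sections")
```

With `n` boundaries this yields `n + 1` chunks, which matches the output above: one boundary for NIBIO gives two files, two for NLR give three, and three for Plantevernleksikonet give four.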


We have added these files to a NotebookLM notebook, which is now available for [chatting about Norwegian agriculture in the Norwegian language](https://notebooklm.google.com/notebook/ac22921f-e4a1-4444-88b4-49ab23c9d387).