# Using Jupyter Notebooks
:label:`sec_jupyter`


This section describes how to edit and run the code
in each section of this book
using the Jupyter Notebook. Make sure you have
installed Jupyter and downloaded the
code as described in
:ref:`chap_installation`.
If you want to know more about Jupyter, see the excellent tutorial in
their [documentation](https://jupyter.readthedocs.io/en/latest/).


## Editing and Running the Code Locally

Suppose that the local path of the book's code is `xx/yy/d2l-en/`. Use the shell to change the directory to this path (`cd xx/yy/d2l-en`) and run the command `jupyter notebook`. If your browser does not open the page automatically, open http://localhost:8888 and you will see the interface of Jupyter and all the folders containing the code of the book, as shown in :numref:`fig_jupyter00`.

![The folders containing the code of this book.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter00.png?raw=1)
:width:`600px`
:label:`fig_jupyter00`


You can access the notebook files by clicking on the folder displayed on the webpage.
They usually have the suffix ".ipynb".
For the sake of brevity, we create a temporary "test.ipynb" file.
The content displayed after you click it is
shown in :numref:`fig_jupyter01`.
This notebook includes a markdown cell and a code cell. The markdown cell contains the texts "This Is a Title" and "This is text."
The code cell contains two lines of Python code.

![Markdown and code cells in the "test.ipynb" file.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter01.png?raw=1)
:width:`600px`
:label:`fig_jupyter01`


Double click on the markdown cell to enter edit mode.
Add a new text string "Hello world." at the end of the cell, as shown in :numref:`fig_jupyter02`.

![Edit the markdown cell.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter02.png?raw=1)
:width:`600px`
:label:`fig_jupyter02`


As demonstrated in :numref:`fig_jupyter03`,
click "Cell" $\rightarrow$ "Run Cells" in the menu bar to run the edited cell.

![Run the cell.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter03.png?raw=1)
:width:`600px`
:label:`fig_jupyter03`

After running, the markdown cell is shown in :numref:`fig_jupyter04`.

![The markdown cell after running.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter04.png?raw=1)
:width:`600px`
:label:`fig_jupyter04`


Next, click on the code cell. Multiply the elements by 2 after the last line of code, as shown in :numref:`fig_jupyter05`.

![Edit the code cell.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter05.png?raw=1)
:width:`600px`
:label:`fig_jupyter05`


You can also run the cell with a shortcut ("Ctrl + Enter" by default) and obtain the output shown in :numref:`fig_jupyter06`.

![Run the code cell to obtain the output.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter06.png?raw=1)
:width:`600px`
:label:`fig_jupyter06`


When a notebook contains more cells, we can click "Kernel" $\rightarrow$ "Restart & Run All" in the menu bar to run all the cells in the entire notebook. By clicking "Help" $\rightarrow$ "Edit Keyboard Shortcuts" in the menu bar, you can edit the shortcuts according to your preferences.

## Advanced Options

Beyond local editing, two things are quite important: editing the notebooks in the markdown format and running Jupyter remotely.
The latter matters when we want to run the code on a faster server.
The former matters since Jupyter's native ipynb format stores a lot of auxiliary data that is
irrelevant to the content,
mostly related to how and where the code is run.
This confuses Git and makes
reviewing contributions very difficult.
Fortunately, there is an alternative: native editing in the markdown format.
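To make the point about auxiliary data concrete, here is a minimal sketch using only the standard library. The `notebook` dictionary is a hand-written, simplified stand-in for an ipynb file (real files carry more fields), and `strip_run_state` is a hypothetical helper written for this illustration, not part of Jupyter.

```python
import json

# A hand-written, minimal notebook in the ipynb (JSON) layout.
# Real notebooks store more metadata than this simplified example.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# This Is a Title\n", "This is text."]},
        {"cell_type": "code", "execution_count": 7, "metadata": {},
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["2\n"]}],
         "source": ["print(1 + 1)"]},
    ],
    "metadata": {"kernelspec": {"display_name": "Python 3",
                                "language": "python", "name": "python3"}},
    "nbformat": 4,
    "nbformat_minor": 5,
}


def strip_run_state(nb):
    """Return a copy of nb without outputs and execution counts (hypothetical helper)."""
    clean = json.loads(json.dumps(nb))  # deep copy via a JSON round trip
    for cell in clean["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return clean


before = json.dumps(notebook, indent=1)
after = json.dumps(strip_run_state(notebook), indent=1)
print(len(before), "characters with run state;", len(after), "without")
```

The output and the execution counter change every time a cell is rerun, even when the source text does not, which is why diffs of ipynb files tend to be noisy.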

### Markdown Files in Jupyter

If you wish to contribute to the content of this book, you need to modify the
source file (md file, not ipynb file) on GitHub.
Using the notedown plugin, we
can modify notebooks in the md format directly in Jupyter.


First, install the notedown plugin, run the Jupyter Notebook, and load the plugin:

```
pip install d2l-notedown  # You may need to uninstall the original notedown.
jupyter notebook --NotebookApp.contents_manager_class='notedown.NotedownContentsManager'
```

You may also turn on the notedown plugin by default whenever you run the Jupyter Notebook.
First, generate a Jupyter Notebook configuration file (if it has already been generated, you can skip this step).

```
jupyter notebook --generate-config
```

Then, add the following line to the end of the Jupyter Notebook configuration file (for Linux or macOS, usually in the path `~/.jupyter/jupyter_notebook_config.py`):

```
c.NotebookApp.contents_manager_class = 'notedown.NotedownContentsManager'
```

After that, you only need to run the `jupyter notebook` command to turn on the notedown plugin by default.

### Running Jupyter Notebooks on a Remote Server

Sometimes, you may want to run Jupyter notebooks on a remote server and access them through a browser on your local computer. If Linux or macOS is installed on your local machine (Windows can also support this function through third-party software such as PuTTY), you can use port forwarding:

```
ssh myserver -L 8888:localhost:8888
```

Here, `myserver` is the address of the remote server.
Then we can use http://localhost:8888 to access the remote server `myserver` that runs Jupyter notebooks. We will detail how to run Jupyter notebooks on AWS instances
later in this appendix.
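Before opening the browser, it can help to confirm that something is actually listening on the forwarded port. The following is a minimal sketch using Python's standard `socket` module. The helper name `is_port_open` and the one-second timeout are choices made for this illustration. The check only tells us that a TCP connection succeeds, not that the service behind it is Jupyter.

```python
import socket


def is_port_open(host="localhost", port=8888, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


print("Port 8888 reachable:", is_port_open())
```

If this prints `False` while the `ssh` session is open, the tunnel or the remote Jupyter server is probably not running on the expected port.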

### Timing

We can use the `ExecuteTime` plugin to time the execution of each code cell in Jupyter notebooks.
Use the following commands to install the plugin:

```
pip install jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
jupyter nbextension enable execute_time/ExecuteTime
```
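If the plugin is not available, a cell can also be timed by hand. The sketch below uses `time.perf_counter` from the standard library. The helper `time_call` and the choice of reporting the best of three runs are assumptions made for this example and are unrelated to how `ExecuteTime` works internally.

```python
import time


def time_call(fn, *args, repeats=3):
    """Return the best wall-clock time in seconds over several runs of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


elapsed = time_call(sum, range(1_000_000))
print(f"sum over 1,000,000 integers: {elapsed:.4f} s")
```

Taking the minimum over a few repetitions reduces the influence of unrelated background activity on the measurement.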

## Summary

* Using the Jupyter Notebook tool, we can edit, run, and contribute to each section of the book.
* We can run Jupyter notebooks on remote servers using port forwarding.


## Exercises

1. Edit and run the code in this book with the Jupyter Notebook on your local machine.
1. Edit and run the code in this book with the Jupyter Notebook *remotely* via port forwarding.
1. Compare the running time of the operations $\mathbf{A}^\top \mathbf{B}$ and $\mathbf{A} \mathbf{B}$ for two square matrices in $\mathbb{R}^{1024 \times 1024}$. Which one is faster?


[Discussions](https://discuss.d2l.ai/t/421)


In [1]:
pip install langchain



In [2]:
pip install langchain_community


Collecting langchain_community
  Downloading langchain_community-0.3.17-py3-none-any.whl.metadata (2.4 kB)
Collecting langchain-core<1.0.0,>=0.3.34 (from langchain_community)
  Downloading langchain_core-0.3.35-py3-none-any.whl.metadata (5.9 kB)
Collecting langchain<1.0.0,>=0.3.18 (from langchain_community)
  Downloading langchain-0.3.18-py3-none-any.whl.metadata (7.8 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain_community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain_community)
  Downloading pydantic_settings-2.7.1-py3-none-any.whl.metadata (3.5 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain_community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain_community)
  Downloading marshmallow-3.26.1-py3-none-any.whl.metadata (7.3 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-

In [13]:
# Load a CSV file with CSVLoader; the 'species' column goes into the metadata
from langchain_community.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(file_path='/content/iris.csv', metadata_columns=['species'], csv_args={"delimiter":","})
data = loader.load()
len(data)

150

In [17]:
data

[Document(metadata={'source': '/content/iris.csv', 'row': 0, 'species': 'setosa'}, page_content='sepal_length: 5.1\nsepal_width: 3.5\npetal_length: 1.4\npetal_width: 0.2'),
 Document(metadata={'source': '/content/iris.csv', 'row': 1, 'species': 'setosa'}, page_content='sepal_length: 4.9\nsepal_width: 3.0\npetal_length: 1.4\npetal_width: 0.2'),
 Document(metadata={'source': '/content/iris.csv', 'row': 2, 'species': 'setosa'}, page_content='sepal_length: 4.7\nsepal_width: 3.2\npetal_length: 1.3\npetal_width: 0.2'),
 Document(metadata={'source': '/content/iris.csv', 'row': 3, 'species': 'setosa'}, page_content='sepal_length: 4.6\nsepal_width: 3.1\npetal_length: 1.5\npetal_width: 0.2'),
 Document(metadata={'source': '/content/iris.csv', 'row': 4, 'species': 'setosa'}, page_content='sepal_length: 5.0\nsepal_width: 3.6\npetal_length: 1.4\npetal_width: 0.2'),
 Document(metadata={'source': '/content/iris.csv', 'row': 5, 'species': 'setosa'}, page_content='sepal_length: 5.4\nsepal_width: 3.9\np

In [18]:
data[0].metadata

{'source': '/content/iris.csv', 'row': 0, 'species': 'setosa'}

In [19]:
data[0].page_content

'sepal_length: 5.1\nsepal_width: 3.5\npetal_length: 1.4\npetal_width: 0.2'

In [22]:
for record in data[:1]:
  print(record)

page_content='sepal_length: 5.1
sepal_width: 3.5
petal_length: 1.4
petal_width: 0.2' metadata={'source': '/content/iris.csv', 'row': 0, 'species': 'setosa'}
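To see how one CSV row ends up as the `page_content` and `metadata` shown above, here is a rough standard-library sketch. It is not LangChain's implementation: `rows_to_records` is a hypothetical helper, plain dictionaries stand in for `Document` objects, and the two sample rows are copied from the output above.

```python
import csv
import io

# A two-row stand-in for iris.csv; the values are taken from the output above.
SAMPLE = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
"""


def rows_to_records(text, metadata_columns, source="<memory>"):
    """Turn each CSV row into a dict with 'page_content' and 'metadata' keys."""
    records = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        # Columns not listed as metadata become "name: value" lines of content.
        content = "\n".join(
            f"{key}: {value}" for key, value in row.items()
            if key not in metadata_columns
        )
        metadata = {"source": source, "row": i}
        metadata.update({key: row[key] for key in metadata_columns})
        records.append({"page_content": content, "metadata": metadata})
    return records


records = rows_to_records(SAMPLE, metadata_columns=["species"])
print(records[0]["page_content"])
print(records[0]["metadata"])
```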


In [2]:
# Load a PDF file with UnstructuredPDFLoader, one Document per detected element
from langchain_community.document_loaders import UnstructuredPDFLoader

loader = UnstructuredPDFLoader('/content/resume.pdf', mode='elements', strategy='auto')

data = loader.load()
len(data)

23

In [3]:
data

[Document(metadata={'source': '/content/resume.pdf', 'coordinates': {'points': ((56.69291338582678, 42.41891338582673), (56.69291338582678, 60.41891338582673), (123.70691338582678, 60.41891338582673), (123.70691338582678, 42.41891338582673)), 'system': 'PixelSpace', 'layout_width': 595.28, 'layout_height': 841.89}, 'file_directory': '/content', 'filename': 'resume.pdf', 'languages': ['eng'], 'last_modified': '2025-02-14T08:09:32', 'page_number': 1, 'filetype': 'application/pdf', 'category': 'Header', 'element_id': 'ee60bf6d8d31be4a4323851ce8dc35f7'}, page_content='Resume'),
 Document(metadata={'source': '/content/resume.pdf', 'coordinates': {'points': ((56.69291338582678, 70.76537007874015), (56.69291338582678, 88.76537007874015), (278.7949133858268, 88.76537007874015), (278.7949133858268, 70.76537007874015)), 'system': 'PixelSpace', 'layout_width': 595.28, 'layout_height': 841.89}, 'file_directory': '/content', 'filename': 'resume.pdf', 'languages': ['eng'], 'last_modified': '2025-02-

In [8]:
# pip install unstructured/
# !pip install PDFMiner


In [9]:
# pip install "unstructured[pdf]"


In [12]:
# Alternative: load the PDF page by page with PyPDFLoader
# !pip install pypdf

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader('/content/resume.pdf')
pages = []
async for page in loader.alazy_load():
    pages.append(page)

In [13]:
print(f"{pages[0].metadata}\n")

{'producer': 'jsPDF 2.5.2', 'creator': 'PyPDF', 'creationdate': '2025-01-30T15:27:33+05:30', 'source': '/content/resume.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}



In [17]:
print(pages[0].page_content)

Resume
Name: Surbhi Singh Baghel
Position: Developer
Experience: 1 year Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
Technologies: react-The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32., mongoDB-The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32., NextJs-The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32., NodeJs-The f

In [18]:
print(f"{pages[0].metadata}\n")
print(pages[0].page_content)

{'producer': 'jsPDF 2.5.2', 'creator': 'PyPDF', 'creationdate': '2025-01-30T15:27:33+05:30', 'source': '/content/resume.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}

Resume
Name: Surbhi Singh Baghel
Position: Developer
Experience: 1 year Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
Technologies: react-The first line of Lorem Ipsum, "Lorem ipsum dolor sit amet..", comes from a line in section 1.10.32., mongoDB-The first line of Lorem Ipsum, "Lorem ipsum dolo

In [23]:
# Now load a JSON file; first read it with the standard library to inspect its structure
from langchain_community.document_loaders import JSONLoader
from pathlib import Path
from pprint import pprint
import json

file_path = '/content/AllDestination.json'
data = json.loads(Path(file_path).read_text())


In [30]:
pprint(data)

Pretty printing has been turned OFF


In [25]:
data[0]

{'id': 1,
 'name': 'Nubra Valley in Ladakh',
 'description': 'The barren rugged view of this valley, which lies between Kashmir and Tibet, will take your breath away. You can feast your eyes on charming monasteries, the Nubra and Shyok rivers, Bactrian camels, and sand dunes. A photographer’s delight, the valley is home to people of the Balti culture in Turtuk.',
 'imageUrl': ['https://media.istockphoto.com/id/619727964/photo/double-hump-camel-walking-in-the-desert.jpg?s=612x612&w=0&k=20&c=HrruNt1Jk-ogdVOpKfb2hRa3D8BUcYw9XwG9ERAzU-Y=',
  'https://media.istockphoto.com/id/528198351/photo/the-buddha-maitreya-statue-in-nubra-valley.jpg?s=612x612&w=0&k=20&c=A4HhaOfyUgdD7Cd6IigQ-8DsTOmFwmS_03kxpOJQCuY=',
  'https://media.istockphoto.com/id/1383220272/photo/landscape-sandstone-mountains-with-river-and-green-valley-in-himalayas-nubra-valley-jammu-and.jpg?s=1024x1024&w=is&k=20&c=tJjRDT7h7yLi-QL03W4KVxrLog6PG6hey5UBiIbAvSU=',
  'https://media.istockphoto.com/id/476448777/photo/buddhist-monks-ma

In [43]:
# Now load the same JSON file with JSONLoader, extracting the description of each entry
loader = JSONLoader(file_path=file_path, jq_schema = '.[].description', text_content=True)

data = loader.load()

In [38]:
# pip install jq

In [44]:
pprint(data)

[Document(metadata={'source': '/content/AllDestination.json', 'seq_num': 1}, page_content='The barren rugged view of this valley, which lies between Kashmir and Tibet, will take your breath away. You can feast your eyes on charming monasteries, the Nubra and Shyok rivers, Bactrian camels, and sand dunes. A photographer’s delight, the valley is home to people of the Balti culture in Turtuk.'),
 Document(metadata={'source': '/content/AllDestination.json', 'seq_num': 2}, page_content='Known as Mini Switzerland, Khajjiar offers lush meadows, snow-capped Himalayas, and dense forests. Adventure activities like trekking, zorbing, jungle safari, and paragliding are popular here.'),
 Document(metadata={'source': '/content/AllDestination.json', 'seq_num': 3}, page_content='A UNESCO World Heritage site known for its vibrant flowers, attracting trekkers and photographers.'),
 Document(metadata={'source': '/content/AllDestination.json', 'seq_num': 4}, page_content='The serene Dal Lake, also known a

In [46]:
loader = JSONLoader(file_path=file_path, jq_schema = '.[].name', text_content=True)

data = loader.load()

In [47]:
pprint(data)

[Document(metadata={'source': '/content/AllDestination.json', 'seq_num': 1}, page_content='Nubra Valley in Ladakh'),
 Document(metadata={'source': '/content/AllDestination.json', 'seq_num': 2}, page_content='Khajjiar in Himachal Pradesh'),
 Document(metadata={'source': '/content/AllDestination.json', 'seq_num': 3}, page_content='Valley of Flowers in Uttarakhand'),
 Document(metadata={'source': '/content/AllDestination.json', 'seq_num': 4}, page_content='Dal Lake in Srinagar'),
 Document(metadata={'source': '/content/AllDestination.json', 'seq_num': 5}, page_content='Munnar in Kerala'),
 Document(metadata={'source': '/content/AllDestination.json', 'seq_num': 6}, page_content='Dudhsagar Falls in Goa'),
 Document(metadata={'source': '/content/AllDestination.json', 'seq_num': 7}, page_content='Yumthang Valley in Sikkim'),
 Document(metadata={'source': '/content/AllDestination.json', 'seq_num': 8}, page_content='Tawang in Arunachal Pradesh'),
 Document(metadata={'source': '/content/AllDesti

In [1]:
# loader = JSONLoader(file_path=file_path, jq_schema = '.', content_key='', text_content=True)

# data = loader.load()
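The `jq_schema` expressions used above, `.[].description` and `.[].name`, select one field from every element of the top-level array. A rough plain-Python equivalent is sketched below. The helper `select_field` is hypothetical, dictionaries stand in for `Document` objects, the two names come from the output above, and the `id` of the second entry and both `description` values are placeholders rather than the file's real contents.

```python
import json

# Two entries shaped like those in AllDestination.json.
# The descriptions are placeholders, not the real text from the file.
raw = """[
  {"id": 1, "name": "Nubra Valley in Ladakh", "description": "placeholder A"},
  {"id": 2, "name": "Khajjiar in Himachal Pradesh", "description": "placeholder B"}
]"""


def select_field(items, key):
    """Mimic the effect of the jq expression '.[].<key>' on a list of objects."""
    return [
        {"page_content": item[key], "metadata": {"seq_num": n}}
        for n, item in enumerate(items, start=1)
    ]


docs = select_field(json.loads(raw), "name")
for doc in docs:
    print(doc["metadata"]["seq_num"], doc["page_content"])
```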

Next, I am going to learn how to split the data of the loaded document into chunks.

In [2]:
%pip install -qU langchain-text-splitters

How to: 1. Recursively split text

In [29]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

with open('/content/lang_data.txt') as f:
  data = f.read()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 100,
    chunk_overlap = 30,
    length_function = len,
    is_separator_regex=False
)

texts = text_splitter.create_documents([data])
# texts = text_splitter.split_text([data])
print("here is my index 1 data:\n", texts[0])
print("here is my index 2 data:\n", texts[1])
print("here is my index 3 data:\n", texts[2])
print("here is my index 4 data:\n", texts[3])
print("here is my index 5 data:\n", texts[4])
print(data)

here is my index 1 data:
 page_content='This is a excellent start to the film career of Mickey Rooney. His talents here shows that a long'
here is my index 2 data:
 page_content='here shows that a long career is ahead for him. The car and truck chase is exciting for the 1937'
here is my index 3 data:
 page_content='is exciting for the 1937 era. This start of the Andy Hardy series is an American treasure in my'
here is my index 4 data:
 page_content='is an American treasure in my book. Spring Byington performance is excellent as usual. Please Mr'
here is my index 5 data:
 page_content='excellent as usual. Please Mr Rooney or owners of the film rights, take a chance and get this'
This is a excellent start to the film career of Mickey Rooney. His talents here shows that a long career is ahead for him. The car and truck chase is exciting for the 1937 era. This start of the Andy Hardy series is an American treasure in my book. Spring Byington performance is excellent as usual. Please Mr Roo

In [19]:
text_splitter.split_text(data)[:]

['This is a excellent start to the film career of Mickey Rooney. His talents here shows that a long',
 'here shows that a long career is ahead for him. The car and truck chase is exciting for the 1937',
 'is exciting for the 1937 era. This start of the Andy Hardy series is an American treasure in my',
 'is an American treasure in my book. Spring Byington performance is excellent as usual. Please Mr',
 'excellent as usual. Please Mr Rooney or owners of the film rights, take a chance and get this',
 'take a chance and get this produced on DVD. I think it would be a winner.']

In [20]:
text_splitter.split_text(data)[:3]

['This is a excellent start to the film career of Mickey Rooney. His talents here shows that a long',
 'here shows that a long career is ahead for him. The car and truck chase is exciting for the 1937',
 'is exciting for the 1937 era. This start of the Andy Hardy series is an American treasure in my']

In [21]:
text_splitter.split_text(data)[:2]

['This is a excellent start to the film career of Mickey Rooney. His talents here shows that a long',
 'here shows that a long career is ahead for him. The car and truck chase is exciting for the 1937']

In [23]:
text_splitter.split_text(data)[:1]

['This is a excellent start to the film career of Mickey Rooney. His talents here shows that a long']
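The role of `chunk_size` and `chunk_overlap` can be illustrated with a much simpler sliding window over characters. This is only a sketch of the overlap idea: unlike `RecursiveCharacterTextSplitter`, which prefers to break at separators such as paragraph breaks, newlines, and spaces, the hypothetical helper `sliding_chunks` below cuts at fixed positions and may split words.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


sample = "This is a excellent start to the film career of Mickey Rooney."
for chunk in sliding_chunks(sample, chunk_size=30, chunk_overlap=10):
    print(repr(chunk))
```

Each chunk starts with the last `chunk_overlap` characters of the previous one, so context that straddles a boundary appears in both neighbors.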

How to: 2. Split by character

In [14]:
from langchain_text_splitters import CharacterTextSplitter

with open('/content/lang_data.txt') as f:
  data = f.read()

text_splitter = CharacterTextSplitter(
    separator=" ",
    chunk_size = 10,
    chunk_overlap = 2,
    length_function = len,
    is_separator_regex=False
)

texts = text_splitter.create_documents([data])
# texts = text_splitter.split_text([data])
print("here is my index 1 data:\n", texts)
# print("here is my index 2 data:\n", texts[1])
# print("here is my index 3 data:\n", texts[2])
# print("here is my index 4 data:\n", texts[3])
# print("here is my index 5 data:\n", texts[4])
# print(data)



here is my index 1 data:
 [Document(metadata={}, page_content='This is a'), Document(metadata={}, page_content='excellent'), Document(metadata={}, page_content='start to'), Document(metadata={}, page_content='to the'), Document(metadata={}, page_content='film'), Document(metadata={}, page_content='career of'), Document(metadata={}, page_content='of Mickey'), Document(metadata={}, page_content='Rooney.'), Document(metadata={}, page_content='His'), Document(metadata={}, page_content='talents'), Document(metadata={}, page_content='here shows'), Document(metadata={}, page_content='that a'), Document(metadata={}, page_content='a long'), Document(metadata={}, page_content='career is'), Document(metadata={}, page_content='is ahead'), Document(metadata={}, page_content='for him.'), Document(metadata={}, page_content='The car'), Document(metadata={}, page_content='and truck'), Document(metadata={}, page_content='chase is'), Document(metadata={}, page_content='exciting'), Document(metadata={}, p

In [10]:
from langchain_text_splitters import CharacterTextSplitter

text = "This is a paragraph with multiple sentences. It is a good example of how to split text by character."

# Define a smaller chunk size to force splitting
splitter = CharacterTextSplitter(
    separator=" ",  # Split by spaces
    chunk_size=10,  # Force smaller chunks
    chunk_overlap=2,  # Overlap words between chunks
    length_function=len
)

chunks = splitter.split_text(text)

# Print the resulting chunks
print(chunks)


['This is a', 'paragraph', 'with', 'multiple', 'sentences.', 'It is a', 'a good', 'example of', 'of how to', 'to split', 'text by', 'character.']
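The chunks above come from splitting on spaces and then merging the pieces back together up to `chunk_size`. A simplified sketch of that merging step, without any overlap handling, is given below. The helper `pack_words` is hypothetical and is not how `CharacterTextSplitter` is implemented. It only shows the greedy idea of adding words until the next one would exceed the limit.

```python
def pack_words(text, chunk_size):
    """Greedily pack whitespace-separated words into chunks of at most chunk_size characters.

    A single word longer than chunk_size is kept whole as its own chunk.
    """
    chunks = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if not current or len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


print(pack_words("This is a paragraph with multiple sentences.", chunk_size=10))
```

With overlap enabled, as in the cell above, the real splitter additionally carries some trailing words into the next chunk when they fit, which is why pairs such as 'It is a' and 'a good' share a word.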
