# Using Jupyter Notebooks
:label:`sec_jupyter`


This section describes how to edit and run the code
in each section of this book
using the Jupyter Notebook. Make sure you have
installed Jupyter and downloaded the
code as described in
:ref:`chap_installation`.
If you want to know more about Jupyter, see the excellent tutorial in
their [documentation](https://jupyter.readthedocs.io/en/latest/).


## Editing and Running the Code Locally

Suppose that the local path of the book's code is `xx/yy/d2l-en/`. Use the shell to change the directory to this path (`cd xx/yy/d2l-en`) and run the command `jupyter notebook`. If your browser does not open the page automatically, open http://localhost:8888 and you will see the Jupyter interface and all the folders containing the code of the book, as shown in :numref:`fig_jupyter00`.

![The folders containing the code of this book.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter00.png?raw=1)
:width:`600px`
:label:`fig_jupyter00`


You can access the notebook files by clicking on the folder displayed on the webpage.
They usually have the suffix ".ipynb".
For the sake of brevity, we create a temporary "test.ipynb" file.
The content displayed after you click it is
shown in :numref:`fig_jupyter01`.
This notebook includes a markdown cell and a code cell. The content in the markdown cell includes "This Is a Title" and "This is text.".
The code cell contains two lines of Python code.
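
To make the cell structure concrete: an ".ipynb" file is a JSON document whose `cells` list holds the markdown and code cells. The following is a minimal sketch, assuming the nbformat 4 layout and using only the standard library, of a notebook shaped like "test.ipynb". The two lines of code in the code cell are hypothetical placeholders, not the code shown in the figure.

```python
import json

# Minimal notebook in the nbformat 4 JSON layout (minor version 4 is used
# here so that per-cell "id" fields are not required).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# This Is a Title\n", "\n", "This is text."],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # None means the cell has not been run
            "metadata": {},
            "outputs": [],
            # Hypothetical placeholder code, not the code from the figure.
            "source": ["x = [1, 2, 3]\n", "x"],
        },
    ],
}

# Serialize to the text that would be stored in "test.ipynb".
notebook_text = json.dumps(notebook, indent=1)

# Parse it back and list the cell types in order.
cell_types = [cell["cell_type"] for cell in json.loads(notebook_text)["cells"]]
print(cell_types)
```

Writing `notebook_text` to a file ending in ".ipynb" should give a file that Jupyter can open, although in practice notebooks are created from the Jupyter interface rather than by hand.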

![Markdown and code cells in the "test.ipynb" file.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter01.png?raw=1)
:width:`600px`
:label:`fig_jupyter01`


Double click on the markdown cell to enter edit mode.
Add a new text string "Hello world." at the end of the cell, as shown in :numref:`fig_jupyter02`.

![Edit the markdown cell.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter02.png?raw=1)
:width:`600px`
:label:`fig_jupyter02`


As demonstrated in :numref:`fig_jupyter03`,
click "Cell" $\rightarrow$ "Run Cells" in the menu bar to run the edited cell.

![Run the cell.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter03.png?raw=1)
:width:`600px`
:label:`fig_jupyter03`

After running, the markdown cell is shown in :numref:`fig_jupyter04`.

![The markdown cell after running.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter04.png?raw=1)
:width:`600px`
:label:`fig_jupyter04`


Next, click on the code cell. After the last line of code, add a line that multiplies the elements by 2, as shown in :numref:`fig_jupyter05`.

![Edit the code cell.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter05.png?raw=1)
:width:`600px`
:label:`fig_jupyter05`


You can also run the cell with a shortcut ("Ctrl + Enter" by default) and obtain the output shown in :numref:`fig_jupyter06`.

![Run the code cell to obtain the output.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter06.png?raw=1)
:width:`600px`
:label:`fig_jupyter06`


When a notebook contains more cells, we can click "Kernel" $\rightarrow$ "Restart & Run All" in the menu bar to run all the cells in the entire notebook. By clicking "Help" $\rightarrow$ "Edit Keyboard Shortcuts" in the menu bar, you can edit the shortcuts according to your preferences.

## Advanced Options

Beyond local editing, two things are quite important: editing the notebooks in the markdown format and running Jupyter remotely.
The latter matters when we want to run the code on a faster server.
The former matters since Jupyter's native ipynb format stores a lot of auxiliary data that is
irrelevant to the content,
mostly related to how and where the code is run.
This is confusing for Git, making
reviewing contributions very difficult.
Fortunately there is an alternative---native editing in the markdown format.
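
To see what kind of auxiliary data is meant, consider the sketch below. It assumes the nbformat 4 layout, in which each code cell stores its `outputs` and an `execution_count`. Both change whenever the cell is rerun, even if the code itself does not. The helper `strip_run_data` and the sample notebook are hypothetical and only illustrate why diffs of ipynb files are noisy; they are not part of the book's tooling.

```python
import copy


def strip_run_data(notebook):
    """Return a copy of a notebook dict without outputs and execution counts."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Hypothetical notebook as it might look after its code cell has been run.
executed = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Some text."]},
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
            "source": ["print(1 + 1)"],
        },
    ],
}

cleaned = strip_run_data(executed)

# The source code survives; only the run-specific data is gone.
print(cleaned["cells"][1]["source"], cleaned["cells"][1]["outputs"])
```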

### Markdown Files in Jupyter

If you wish to contribute to the content of this book, you need to modify the
source file (md file, not ipynb file) on GitHub.
Using the notedown plugin we
can modify notebooks in the md format directly in Jupyter.


First, install the notedown plugin, run the Jupyter Notebook, and load the plugin:

```
pip install d2l-notedown  # You may need to uninstall the original notedown.
jupyter notebook --NotebookApp.contents_manager_class='notedown.NotedownContentsManager'
```

You may also turn on the notedown plugin by default whenever you run the Jupyter Notebook.
First, generate a Jupyter Notebook configuration file (if it has already been generated, you can skip this step).

```
jupyter notebook --generate-config
```

Then, add the following line to the end of the Jupyter Notebook configuration file (for Linux or macOS, usually in the path `~/.jupyter/jupyter_notebook_config.py`):

```
c.NotebookApp.contents_manager_class = 'notedown.NotedownContentsManager'
```

After that, you only need to run the `jupyter notebook` command to turn on the notedown plugin by default.

### Running Jupyter Notebooks on a Remote Server

Sometimes, you may want to run Jupyter notebooks on a remote server and access them through a browser on your local computer. If Linux or macOS is installed on your local machine (Windows can also support this function through third-party software such as PuTTY), you can use port forwarding:

```
ssh myserver -L 8888:localhost:8888
```

Here, `myserver` is the address of the remote server.
Then we can use http://localhost:8888 to access the Jupyter server running on the remote machine `myserver`. We will detail how to run Jupyter notebooks on AWS instances
later in this appendix.
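
Before opening the browser, it can be useful to check that the forwarded port actually accepts connections. The following is a minimal sketch using only the standard library. The helper name `port_is_open` is hypothetical, and the demonstration uses a throwaway local listener as a stand-in for the forwarded port, so it runs without an SSH tunnel.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Stand-in for the forwarded port: a local listener on a free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

open_while_listening = port_is_open("127.0.0.1", demo_port)
listener.close()
open_after_close = port_is_open("127.0.0.1", demo_port)

print(open_while_listening, open_after_close)
```

With the SSH tunnel from above in place, calling `port_is_open("localhost", 8888)` should return `True` once Jupyter is running on the remote server. Note that this only checks that something is listening on the port, not that it is Jupyter.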

### Timing

We can use the `ExecuteTime` plugin to time the execution of each code cell in Jupyter notebooks.
Use the following commands to install the plugin:

```
pip install jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
jupyter nbextension enable execute_time/ExecuteTime
```
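
If installing the plugin is not an option, a rough alternative is to time code by hand. IPython's `%%time` cell magic does this for a whole cell. The sketch below shows the same idea in plain Python with `time.perf_counter`. The names `timed` and `timings` are hypothetical helpers introduced here for illustration and are not part of the `ExecuteTime` plugin.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("sum of squares", timings):
    total = sum(i * i for i in range(100_000))

print(f"sum of squares took {timings['sum of squares']:.4f} s")
```

Unlike the plugin, this approach requires wrapping each piece of code explicitly, but it works in any Python environment.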

## Summary

* Using the Jupyter Notebook tool, we can edit, run, and contribute to each section of the book.
* We can run Jupyter notebooks on remote servers using port forwarding.


## Exercises

1. Edit and run the code in this book with the Jupyter Notebook on your local machine.
1. Edit and run the code in this book with the Jupyter Notebook *remotely* via port forwarding.
1. Compare the running time of the operations $\mathbf{A}^\top \mathbf{B}$ and $\mathbf{A} \mathbf{B}$ for two square matrices in $\mathbb{R}^{1024 \times 1024}$. Which one is faster?


[Discussions](https://discuss.d2l.ai/t/421)


```python
# Solution 1
numbers = [34, 1, 12, 34, 65, 23, 20]

print("The list elements are:")
for n in numbers:
    print(n)
```

```
The list elements are:
34
1
12
34
65
23
20
```


```python
# Solution 2
jobs = {"Sam": "Plumber",
        "George": "Physical Trainer",
        "Nancy": "Accountant",
        "Maya": "Teacher",
        "Alice": "Data Analyst",
        "Bob": "Chef",
        "Ibrahim": "Electrician"}

print("The dictionary key-value pairs are:")
for name, job in jobs.items():
    print(f"{name}: {job}")
```

```
The dictionary key-value pairs are:
Sam: Plumber
George: Physical Trainer
Nancy: Accountant
Maya: Teacher
Alice: Data Analyst
Bob: Chef
Ibrahim: Electrician
```


```python
# Solution 3

file_path = "adult-1.data"

with open(file_path, "r", encoding="utf-8", errors="ignore") as f:
    for i, line in enumerate(f):
        if i >= 20:
            break
        print(line.strip())
```

```
39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K
50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K
38, Private, 215646, HS-grad, 9, Divorced, Handlers-cleaners, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K
53, Private, 234721, 11th, 7, Married-civ-spouse, Handlers-cleaners, Husband, Black, Male, 0, 0, 40, United-States, <=50K
28, Private, 338409, Bachelors, 13, Married-civ-spouse, Prof-specialty, Wife, Black, Female, 0, 0, 40, Cuba, <=50K
37, Private, 284582, Masters, 14, Married-civ-spouse, Exec-managerial, Wife, White, Female, 0, 0, 40, United-States, <=50K
49, Private, 160187, 9th, 5, Married-spouse-absent, Other-service, Not-in-family, Black, Female, 0, 0, 16, Jamaica, <=50K
52, Self-emp-not-inc, 209642, HS-grad, 9, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 45, United-States, >50K
31, 
```

```python
# Problem 4
import pandas as pd
file_path = "adult-1.data"
df = pd.read_csv(file_path, header=None)
print("Number of rows in the DataFrame:", df[0].count())
```

```
Number of rows in the DataFrame: 32561
```


```python
# Problem 5
import pandas as pd

# Load dataset (no headers provided)
file_path = "adult-1.data"
df = pd.read_csv(file_path, header=None, skipinitialspace=True)

# Column mapping (from UCI dataset description)
columns = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race",
    "sex", "capital-gain", "capital-loss", "hours-per-week",
    "native-country", "income"
]
df.columns = columns

# 1. Find min, max, avg ages
min_age = df["age"].min()
max_age = df["age"].max()
avg_age = df["age"].mean()

print(f"Min age: {min_age}, Max age: {max_age}, Avg age: {avg_age:.2f}")

# 2. Number of Master's with income >50K and <=50K
masters_df = df[df["education"] == "Masters"]
masters_high = masters_df[masters_df["income"] == ">50K"].shape[0]
masters_low = masters_df[masters_df["income"] == "<=50K"].shape[0]

print(f"Masters >50K: {masters_high}, Masters <=50K: {masters_low}")

# 3. Percentage of Master's making more vs less than 50K
total_masters = masters_high + masters_low
masters_high_pct = (masters_high / total_masters) * 100 if total_masters > 0 else 0
masters_low_pct = (masters_low / total_masters) * 100 if total_masters > 0 else 0

print(f"Masters >50K: {masters_high_pct:.2f}%, Masters <=50K: {masters_low_pct:.2f}%")

# 4. Percentage of people making >50K vs <=50K for each education level
education_income_pct = df.groupby(["education", "income"]).size().unstack(fill_value=0)
education_income_pct["Total"] = education_income_pct.sum(axis=1)
education_income_pct[">50K_pct"] = (education_income_pct[">50K"] / education_income_pct["Total"]) * 100
education_income_pct["<=50K_pct"] = (education_income_pct["<=50K"] / education_income_pct["Total"]) * 100

print("\nPercentage of income levels by education:")
print(education_income_pct[[">50K_pct", "<=50K_pct"]])

# 5. Percentage of people making >50K vs <=50K for each occupation
occupation_income_pct = df.groupby(["occupation", "income"]).size().unstack(fill_value=0)
occupation_income_pct["Total"] = occupation_income_pct.sum(axis=1)
occupation_income_pct[">50K_pct"] = (occupation_income_pct[">50K"] / occupation_income_pct["Total"]) * 100
occupation_income_pct["<=50K_pct"] = (occupation_income_pct["<=50K"] / occupation_income_pct["Total"]) * 100

print("\nPercentage of income levels by occupation:")
print(occupation_income_pct[[">50K_pct", "<=50K_pct"]])
```

```
Min age: 17, Max age: 90, Avg age: 38.58
Masters >50K: 959, Masters <=50K: 764
Masters >50K: 55.66%, Masters <=50K: 44.34%

Percentage of income levels by education:
income         >50K_pct   <=50K_pct
education
10th           6.645230   93.354770
11th           5.106383   94.893617
12th           7.621247   92.378753
1st-4th        3.571429   96.428571
5th-6th        4.804805   95.195195
7th-8th        6.191950   93.808050
9th            5.252918   94.747082
Assoc-acdm    24.835989   75.164011
Assoc-voc     26.121563   73.878437
Bachelors     41.475257   58.524743
Doctorate     74.092010   25.907990
HS-grad       15.950862   84.049138
Masters       55.658735   44.341265
Preschool      0.000000  100.000000
Prof-school   73.437500   26.562500
Some-college  19.023454   80.976546

Percentage of income levels by occupation:
income              >50K_pct  <=50K_pct
occupation
?                  10.363538  89.636462
Adm-clerical       13.
```

```python
# Problem 6
import pandas as pd

# --- 1. Create Sample Data (or replace with your CSV path) ---
data = {
    "App": ["App1", "App2", "App3", "App4", "App5", "App6"],
    "Category": ["Business", "Education", "Business", "Games", "Education", "Games"],
    "Rating": [4.5, 4.2, 4.7, 4.8, 4.1, 4.3],
    "Type": ["Free", "Paid", "Free", "Free", "Paid", "Free"]
}

df = pd.DataFrame(data)

# --- 2. Print DataFrame ---
print("DataFrame:\n", df, "\n")

# --- 3. Count Apps Per Category ---
category_counts = df["Category"].value_counts()
print("Number of apps in each category:\n", category_counts, "\n")

# --- 4. Find Category with Most Apps ---
max_category = category_counts.idxmax()
max_count = category_counts.max()
print(f"Category with most apps: {max_category} ({max_count} apps)\n")

# --- 5. Frequency of Distinct Values in Each Non-Numeric Column ---
print("Frequency of values in each non-numeric column:\n")
for column in df.select_dtypes(exclude=['number']).columns:
    print(f"Column: {column}")
    print(df[column].value_counts(), "\n")
```

```
DataFrame:
     App   Category  Rating  Type
0  App1   Business     4.5  Free
1  App2  Education     4.2  Paid
2  App3   Business     4.7  Free
3  App4      Games     4.8  Free
4  App5  Education     4.1  Paid
5  App6      Games     4.3  Free

Number of apps in each category:
 Category
Business     2
Education    2
Games        2
Name: count, dtype: int64

Category with most apps: Business (2 apps)

Frequency of values in each non-numeric column:

Column: App
App
App1    1
App2    1
App3    1
App4    1
App5    1
App6    1
Name: count, dtype: int64

Column: Category
Category
Business     2
Education    2
Games        2
Name: count, dtype: int64

Column: Type
Type
Free    4
Paid    2
Name: count, dtype: int64
```

