# Using Jupyter Notebooks
:label:`sec_jupyter`


This section describes how to edit and run the code
in each section of this book
using the Jupyter Notebook. Make sure you have
installed Jupyter and downloaded the
code as described in
:ref:`chap_installation`.
If you want to know more about Jupyter, see the excellent tutorial in
their [documentation](https://jupyter.readthedocs.io/en/latest/).


## Editing and Running the Code Locally

Suppose that the local path of the book's code is `xx/yy/d2l-en/`. Use the shell to change the directory to this path (`cd xx/yy/d2l-en`) and run the command `jupyter notebook`. If your browser does not open automatically, go to http://localhost:8888 and you will see the interface of Jupyter and all the folders containing the code of the book, as shown in :numref:`fig_jupyter00`.

![The folders containing the code of this book.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter00.png?raw=1)
:width:`600px`
:label:`fig_jupyter00`


You can access the notebook files by clicking on the folder displayed on the webpage.
They usually have the suffix ".ipynb".
For the sake of brevity, we create a temporary "test.ipynb" file.
The content displayed after you click it is
shown in :numref:`fig_jupyter01`.
This notebook includes a markdown cell and a code cell. The markdown cell contains the heading "This Is a Title" and the sentence "This is text."
The code cell contains two lines of Python code.

![Markdown and code cells in the "test.ipynb" file.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter01.png?raw=1)
:width:`600px`
:label:`fig_jupyter01`


Double click on the markdown cell to enter edit mode.
Add a new text string "Hello world." at the end of the cell, as shown in :numref:`fig_jupyter02`.

![Edit the markdown cell.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter02.png?raw=1)
:width:`600px`
:label:`fig_jupyter02`


As demonstrated in :numref:`fig_jupyter03`,
click "Cell" $\rightarrow$ "Run Cells" in the menu bar to run the edited cell.

![Run the cell.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter03.png?raw=1)
:width:`600px`
:label:`fig_jupyter03`

After running, the markdown cell is shown in :numref:`fig_jupyter04`.

![The markdown cell after running.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter04.png?raw=1)
:width:`600px`
:label:`fig_jupyter04`


Next, click on the code cell. Multiply the elements by 2 after the last line of code, as shown in :numref:`fig_jupyter05`.

![Edit the code cell.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter05.png?raw=1)
:width:`600px`
:label:`fig_jupyter05`


You can also run the cell with a shortcut ("Ctrl + Enter" by default) and obtain the output shown in :numref:`fig_jupyter06`.

![Run the code cell to obtain the output.](https://github.com/d2l-ai/d2l-en-colab/blob/master/img/jupyter06.png?raw=1)
:width:`600px`
:label:`fig_jupyter06`


When a notebook contains more cells, we can click "Kernel" $\rightarrow$ "Restart & Run All" in the menu bar to run all the cells in the entire notebook. By clicking "Help" $\rightarrow$ "Edit Keyboard Shortcuts" in the menu bar, you can edit the shortcuts according to your preferences.

## Advanced Options

Beyond local editing, two things are quite important: editing the notebooks in the markdown format and running Jupyter remotely.
The latter matters when we want to run the code on a faster server.
The former matters since Jupyter's native ipynb format stores a lot of auxiliary data that is
irrelevant to the content,
mostly related to how and where the code is run.
This confuses Git and makes
reviewing contributions very difficult.
Fortunately, there is an alternative: native editing in the markdown format.
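
To make the point about auxiliary data concrete, the following sketch builds a tiny notebook as a Python dictionary in the nbformat 4 JSON layout and compares it before and after a simulated run. The cell contents and the helpers `simulate_run` and `sources` are hypothetical, and a real notebook typically carries more metadata than is shown here.

```python
import json

# A hand-written notebook in the nbformat 4 JSON layout (illustrative only).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# This Is a Title\n", "This is text."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print(1 + 1)"],
        },
    ],
}


def simulate_run(nb, count):
    """Hypothetical helper: mimic what running the code cells records."""
    ran = json.loads(json.dumps(nb))  # deep copy via a JSON round trip
    for cell in ran["cells"]:
        if cell["cell_type"] == "code":
            cell["execution_count"] = count
            cell["outputs"] = [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ]
    return ran


def sources(nb):
    """Return only the text that an author actually wrote."""
    return ["".join(cell["source"]) for cell in nb["cells"]]


executed = simulate_run(notebook, count=7)
before = json.dumps(notebook, indent=1)
after = json.dumps(executed, indent=1)

# The authored content is identical, yet the serialized files differ.
same_sources = sources(notebook) == sources(executed)
files_differ = before != after
print(same_sources, files_differ)
```

Since a version control system compares the serialized files, a diff of the two versions would report changes even though no one edited the text or the code.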

### Markdown Files in Jupyter

If you wish to contribute to the content of this book, you need to modify the
source file (md file, not ipynb file) on GitHub.
Using the notedown plugin, we
can modify notebooks in the md format directly in Jupyter.


First, install the notedown plugin, run the Jupyter Notebook, and load the plugin:

```
pip install d2l-notedown  # You may need to uninstall the original notedown.
jupyter notebook --NotebookApp.contents_manager_class='notedown.NotedownContentsManager'
```

You may also turn on the notedown plugin by default whenever you run the Jupyter Notebook.
First, generate a Jupyter Notebook configuration file (if it has already been generated, you can skip this step).

```
jupyter notebook --generate-config
```

Then, add the following line to the end of the Jupyter Notebook configuration file (for Linux or macOS, usually in the path `~/.jupyter/jupyter_notebook_config.py`):

```
c.NotebookApp.contents_manager_class = 'notedown.NotedownContentsManager'
```

After that, you only need to run the `jupyter notebook` command to turn on the notedown plugin by default.
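
If you prefer to script this step, a minimal sketch follows. It appends the setting only when it is not already present, so running it twice is harmless. The helper name `enable_notedown` is hypothetical, and the demonstration writes to a temporary directory rather than to your real configuration file.

```python
import tempfile
from pathlib import Path

SETTING = "c.NotebookApp.contents_manager_class = 'notedown.NotedownContentsManager'"


def enable_notedown(config_path):
    """Append the notedown setting to config_path unless it is already there.

    Returns True if the file was modified, False otherwise.
    """
    config_path = Path(config_path)
    text = config_path.read_text() if config_path.exists() else ""
    if SETTING in text.splitlines():
        return False
    # Make sure the new setting starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    config_path.write_text(text + SETTING + "\n")
    return True


# Demonstrate on a throwaway file instead of the real configuration file.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "jupyter_notebook_config.py"
    demo.write_text("# Configuration file for jupyter-notebook.")
    first = enable_notedown(demo)
    second = enable_notedown(demo)
    lines = demo.read_text().splitlines()

print(first, second, lines[-1] == SETTING)
```

To apply it to an actual setup, you would pass the path of the generated configuration file, for example `Path.home() / ".jupyter" / "jupyter_notebook_config.py"` on Linux or macOS.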

### Running Jupyter Notebooks on a Remote Server

Sometimes, you may want to run Jupyter notebooks on a remote server and access them through a browser on your local computer. If Linux or macOS is installed on your local machine (Windows can also support this function through third-party software such as PuTTY), you can use port forwarding:

```
ssh myserver -L 8888:localhost:8888
```

Here, `myserver` is the address of the remote server.
Then we can use http://localhost:8888 to access the remote server `myserver` that runs Jupyter notebooks. We will detail how to run Jupyter notebooks on AWS instances
later in this appendix.
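
Before opening the browser, it can help to confirm that something is actually listening on the forwarded port. The sketch below is one way to check this from Python using only the standard library. The helper name `port_is_open` is hypothetical, and a successful connection only shows that the port accepts TCP connections, not that Jupyter in particular is behind it.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Self-check against a listener that we control on an OS-assigned free port.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    free_port = server.getsockname()[1]
    reachable = port_is_open("127.0.0.1", free_port)

print("listener reachable:", reachable)
```

With the tunnel from the `ssh` command above in place, `port_is_open("localhost", 8888)` should return `True`.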

### Timing

We can use the `ExecuteTime` plugin to time the execution of each code cell in Jupyter notebooks.
Use the following commands to install the plugin:

```
pip install jupyter_contrib_nbextensions
jupyter contrib nbextension install --user
jupyter nbextension enable execute_time/ExecuteTime
```
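
If installing an extension is not an option, a rough equivalent can be written by hand. The following is a minimal sketch of a timing context manager built on `time.perf_counter`. The name `cell_timer` is hypothetical, and this is not how `ExecuteTime` itself is implemented.

```python
import time
from contextlib import contextmanager


@contextmanager
def cell_timer(label, results):
    """Record the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with cell_timer("sum of squares", timings):
    total = sum(i * i for i in range(100_000))

for label, seconds in timings.items():
    print(f"{label}: {seconds:.6f} s")
```

Inside a notebook, IPython's built-in `%%time` cell magic offers similar one-off measurements without any extra code.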

## Summary

* Using the Jupyter Notebook tool, we can edit, run, and contribute to each section of the book.
* We can run Jupyter notebooks on remote servers using port forwarding.


## Exercises

1. Edit and run the code in this book with the Jupyter Notebook on your local machine.
1. Edit and run the code in this book with the Jupyter Notebook *remotely* via port forwarding.
1. Compare the running time of the operations $\mathbf{A}^\top \mathbf{B}$ and $\mathbf{A} \mathbf{B}$ for two square matrices in $\mathbb{R}^{1024 \times 1024}$. Which one is faster?


[Discussions](https://discuss.d2l.ai/t/421)


In [5]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("prasad22/healthcare-dataset")

print("Path to dataset files:", path)

Using Colab cache for faster access to the 'healthcare-dataset' dataset.
Path to dataset files: /kaggle/input/healthcare-dataset


In [2]:
import pandas as pd

In [6]:
import os

# Construct the full path to the CSV file
csv_file_path = os.path.join(path, "healthcare_dataset.csv")

# Read the CSV file into a pandas DataFrame
df = pd.read_csv(csv_file_path)

# Display the first few rows of the DataFrame
display(df.head())

Unnamed: 0,Name,Age,Gender,Blood Type,Medical Condition,Date of Admission,Doctor,Hospital,Insurance Provider,Billing Amount,Room Number,Admission Type,Discharge Date,Medication,Test Results
0,Bobby JacksOn,30,Male,B-,Cancer,2024-01-31,Matthew Smith,Sons and Miller,Blue Cross,18856.281306,328,Urgent,2024-02-02,Paracetamol,Normal
1,LesLie TErRy,62,Male,A+,Obesity,2019-08-20,Samantha Davies,Kim Inc,Medicare,33643.327287,265,Emergency,2019-08-26,Ibuprofen,Inconclusive
2,DaNnY sMitH,76,Female,A-,Obesity,2022-09-22,Tiffany Mitchell,Cook PLC,Aetna,27955.096079,205,Emergency,2022-10-07,Aspirin,Normal
3,andrEw waTtS,28,Female,O+,Diabetes,2020-11-18,Kevin Wells,"Hernandez Rogers and Vang,",Medicare,37909.78241,450,Elective,2020-12-18,Ibuprofen,Abnormal
4,adrIENNE bEll,43,Female,AB+,Cancer,2022-09-19,Kathleen Hanna,White-White,Aetna,14238.317814,458,Urgent,2022-10-09,Penicillin,Abnormal


In [7]:
import os

# List files in the downloaded directory to find the CSV file
file_list = os.listdir(path)
print("Files in the dataset directory:", file_list)

Files in the dataset directory: ['healthcare_dataset.csv']


In [8]:
df.sample(5)

Unnamed: 0,Name,Age,Gender,Blood Type,Medical Condition,Date of Admission,Doctor,Hospital,Insurance Provider,Billing Amount,Room Number,Admission Type,Discharge Date,Medication,Test Results
23623,rOBert baKER,52,Female,B+,Asthma,2023-07-29,Kimberly Mcdonald,Cook Group,Medicare,43893.268434,223,Elective,2023-08-05,Ibuprofen,Inconclusive
33888,SarAh goLdEn,69,Male,O+,Diabetes,2021-02-18,Kimberly Schmitt,Wiley LLC,Aetna,13405.333426,302,Urgent,2021-02-25,Aspirin,Abnormal
1097,JasmInE luNA,27,Male,A-,Obesity,2020-04-10,Dr. William Santiago,Horton-Boone,Cigna,11023.287474,142,Urgent,2020-05-10,Paracetamol,Abnormal
23603,Sheena cAstILLo,85,Male,O-,Obesity,2020-03-20,Debbie Johnson,Gentry and Sons,Medicare,6164.706766,316,Emergency,2020-04-13,Lipitor,Inconclusive
41240,GreGoRy Ramirez,50,Female,AB+,Asthma,2021-06-26,Jackie Garcia,Smith-Henry,UnitedHealthcare,17688.485633,114,Emergency,2021-07-19,Lipitor,Inconclusive


In [9]:
df.shape

(55500, 15)

In [10]:
# Unpack the (rows, columns) tuple returned by df.shape
num_rows, num_cols = df.shape
print("Number of rows and columns:", num_rows, num_cols)

Number of rows and columns: 55500 15


In [12]:
df[["Medical Condition"]]

Unnamed: 0,Medical Condition
0,Cancer
1,Obesity
2,Obesity
3,Diabetes
4,Cancer
...,...
55495,Asthma
55496,Obesity
55497,Hypertension
55498,Arthritis


In [14]:
selected_columns = df[["Age", "Gender", "Medical Condition"]]
selected_columns

Unnamed: 0,Age,Gender,Medical Condition
0,30,Male,Cancer
1,62,Male,Obesity
2,76,Female,Obesity
3,28,Female,Diabetes
4,43,Female,Cancer
...,...,...,...
55495,42,Female,Asthma
55496,61,Female,Obesity
55497,38,Female,Hypertension
55498,43,Male,Arthritis






In [19]:
filtered_df = df[df["Medical Condition"] == "Cancer"]
filtered_df

Unnamed: 0,Name,Age,Gender,Blood Type,Medical Condition,Date of Admission,Doctor,Hospital,Insurance Provider,Billing Amount,Room Number,Admission Type,Discharge Date,Medication,Test Results
0,Bobby JacksOn,30,Male,B-,Cancer,2024-01-31,Matthew Smith,Sons and Miller,Blue Cross,18856.281306,328,Urgent,2024-02-02,Paracetamol,Normal
4,adrIENNE bEll,43,Female,AB+,Cancer,2022-09-19,Kathleen Hanna,White-White,Aetna,14238.317814,458,Urgent,2022-10-09,Penicillin,Abnormal
7,CHrisTInA MARtinez,20,Female,A+,Cancer,2021-12-28,Suzanne Thomas,"Powell Robinson and Valdez,",Cigna,45820.462722,277,Emergency,2022-01-07,Paracetamol,Inconclusive
9,ChRISTopher BerG,58,Female,AB-,Cancer,2021-05-23,Heather Day,Padilla-Walker,UnitedHealthcare,19784.631062,249,Elective,2021-06-22,Paracetamol,Inconclusive
10,mIchElLe daniELs,72,Male,O+,Cancer,2020-04-19,John Duncan,Schaefer-Porter,Medicare,12576.795609,394,Urgent,2020-04-22,Paracetamol,Normal
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
55448,tIFfANy miller,78,Male,B-,Cancer,2023-06-22,Jaime Valdez,Sons Williams and,UnitedHealthcare,17217.325440,287,Urgent,2023-06-30,Ibuprofen,Inconclusive
55456,DEBRa MIller,17,Male,AB-,Cancer,2021-07-29,Samantha Russell,Collier-Jordan,UnitedHealthcare,43230.028453,125,Urgent,2021-08-01,Ibuprofen,Abnormal
55479,mIcHAeL SanTiAgo,58,Female,B-,Cancer,2022-01-05,Andrea Fields,Berry-Nguyen,Medicare,45767.175201,299,Elective,2022-01-29,Ibuprofen,Normal
55484,keNNEtH alvarez,80,Male,O+,Cancer,2022-05-05,Andrew Conner,Sons Mayo and,Cigna,45653.802310,114,Elective,2022-05-17,Aspirin,Normal


In [22]:
filter_df = df[(df["Age"] <= 20) & (df["Medical Condition"] == "Cancer")]
filter_df

Unnamed: 0,Name,Age,Gender,Blood Type,Medical Condition,Date of Admission,Doctor,Hospital,Insurance Provider,Billing Amount,Room Number,Admission Type,Discharge Date,Medication,Test Results
7,CHrisTInA MARtinez,20,Female,A+,Cancer,2021-12-28,Suzanne Thomas,"Powell Robinson and Valdez,",Cigna,45820.462722,277,Emergency,2022-01-07,Paracetamol,Inconclusive
62,tRAvIs carTeR,18,Male,A+,Cancer,2022-07-06,Megan Hahn,"Moss and Ferguson, Baker",UnitedHealthcare,48407.386291,325,Emergency,2022-07-18,Aspirin,Normal
121,RENEE bAilEY,19,Female,B+,Cancer,2021-07-03,Sarah Shaffer,"Johnson, and Ross Harris",Medicare,33681.572644,177,Elective,2021-07-31,Ibuprofen,Abnormal
123,DR. LaUreN ClaRk DDs,19,Male,B+,Cancer,2020-10-26,Brian Wagner,PLC Jimenez,UnitedHealthcare,49833.707718,302,Elective,2020-11-17,Lipitor,Inconclusive
172,rOBert WAlsH,20,Female,B+,Cancer,2020-11-22,Sabrina Rogers,"and Anderson Smith Sanchez,",Cigna,40598.422571,113,Elective,2020-12-17,Paracetamol,Normal
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
55200,RacHEL MOonEy,18,Male,O-,Cancer,2019-10-04,Jonathan Mcdonald,Daniel PLC,Blue Cross,46262.929016,295,Elective,2019-10-18,Paracetamol,Normal
55250,cheLSeA FrENCh,20,Male,A+,Cancer,2020-11-10,Rachel Rose,Johnson-Chavez,UnitedHealthcare,5877.265250,248,Emergency,2020-11-19,Ibuprofen,Inconclusive
55270,DeReK holMeS,19,Male,B-,Cancer,2021-10-07,Jesse Holland,Ltd Burke,Medicare,37247.769333,437,Elective,2021-11-01,Paracetamol,Abnormal
55406,mEGaN pHiLliPS,19,Female,O-,Cancer,2021-02-25,Valerie Christensen,Williams and Sons,Medicare,6896.013990,375,Emergency,2021-03-14,Penicillin,Abnormal


In [23]:
df_sorted = df.sort_values(by="Age")
df_sorted

Unnamed: 0,Name,Age,Gender,Blood Type,Medical Condition,Date of Admission,Doctor,Hospital,Insurance Provider,Billing Amount,Room Number,Admission Type,Discharge Date,Medication,Test Results
53528,eLIzABETh cAsTILlO,13,Male,AB-,Hypertension,2021-01-03,Dawn Williams,Thomas-Pierce,Aetna,32900.044478,176,Elective,2021-01-27,Aspirin,Abnormal
50377,jamES BasS phD,13,Male,O+,Asthma,2020-12-30,Jennifer Hammond,"Pollard Wallace, Sims and",Blue Cross,22901.022707,476,Emergency,2021-01-17,Ibuprofen,Inconclusive
50823,DEaNnA pALMeR,13,Male,AB-,Obesity,2020-09-20,Barbara Butler,"and Sanchez Phillips, Brown",Medicare,23941.759486,163,Emergency,2020-09-23,Penicillin,Inconclusive
52172,CHArlEs joRDAN,13,Male,A+,Arthritis,2021-05-13,Brian Baker,"Arellano and Armstrong Hensley,",UnitedHealthcare,10597.383799,451,Emergency,2021-06-12,Penicillin,Normal
51095,cAthY BARNES,13,Male,O-,Obesity,2023-03-30,Kevin Ellison,Wood-Johnson,Medicare,29170.617795,348,Emergency,2023-04-29,Lipitor,Abnormal
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
53825,mIchaeL POtTs,89,Male,A+,Hypertension,2023-01-19,Mary Vaughn,"and Perez, Bennett Townsend",Cigna,9542.739709,289,Emergency,2023-01-28,Paracetamol,Abnormal
55385,hEAthEr dawsOn,89,Male,O-,Arthritis,2022-07-18,Ann Hall,Moore-Gray,Aetna,17172.115995,455,Elective,2022-08-13,Ibuprofen,Normal
52857,THomAs PHIllIpS,89,Female,A+,Hypertension,2022-05-22,Mark White,"and Martin, Davidson Cox",Cigna,48274.348627,332,Urgent,2022-05-26,Penicillin,Inconclusive
52828,doNALD aViLA,89,Female,AB-,Asthma,2022-09-17,Christopher Allen,"Holmes and Howard Castro,",Aetna,14042.748908,199,Emergency,2022-09-27,Aspirin,Abnormal


In [25]:
df_sorted = df.sort_values(by="Age", ascending=False)
display(df_sorted)

Unnamed: 0,Name,Age,Gender,Blood Type,Medical Condition,Date of Admission,Doctor,Hospital,Insurance Provider,Billing Amount,Room Number,Admission Type,Discharge Date,Medication,Test Results
54813,JerEmY hArdIN JR.,89,Male,A+,Diabetes,2019-11-05,Willie Stevens,Gray-Solomon,Cigna,7242.641277,113,Elective,2019-11-28,Penicillin,Normal
54044,MiChAEL DOmINGuEz,89,Male,O+,Cancer,2021-09-07,Bridget Irwin,PLC White,Aetna,7628.951322,363,Elective,2021-09-24,Aspirin,Abnormal
53825,mIchaeL POtTs,89,Male,A+,Hypertension,2023-01-19,Mary Vaughn,"and Perez, Bennett Townsend",Cigna,9542.739709,289,Emergency,2023-01-28,Paracetamol,Abnormal
52857,THomAs PHIllIpS,89,Female,A+,Hypertension,2022-05-22,Mark White,"and Martin, Davidson Cox",Cigna,48274.348627,332,Urgent,2022-05-26,Penicillin,Inconclusive
52043,DAVId NeWTOn,89,Female,O+,Arthritis,2021-02-08,Jerry Hopkins,"Cooper Brown Parks, and",Aetna,34500.016817,242,Elective,2021-02-20,Penicillin,Inconclusive
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
50908,rOnalD daVis,13,Male,A+,Obesity,2023-12-06,Shannon Butler,"Kelly, and Gomez Williams",Cigna,3014.565852,241,Emergency,2024-01-04,Paracetamol,Normal
50823,DEaNnA pALMeR,13,Male,AB-,Obesity,2020-09-20,Barbara Butler,"and Sanchez Phillips, Brown",Medicare,23941.759486,163,Emergency,2020-09-23,Penicillin,Inconclusive
54302,dAWn chAveZ,13,Male,AB+,Cancer,2023-03-13,David Mcclure,"Taylor, Boyle Dalton and",UnitedHealthcare,43935.226905,478,Emergency,2023-03-25,Ibuprofen,Abnormal
53528,eLIzABETh cAsTILlO,13,Male,AB-,Hypertension,2021-01-03,Dawn Williams,Thomas-Pierce,Aetna,32900.044478,176,Elective,2021-01-27,Aspirin,Abnormal


In [27]:
# Mean of the "Age" column
avg_age = df["Age"].mean()
display(avg_age)

np.float64(51.53945945945946)

In [28]:
max_age = df["Age"].max()
display(max_age)

89

In [29]:
diag_count = df["Medical Condition"].value_counts()
display(diag_count)

Medical Condition,count
Arthritis,9308
Diabetes,9304
Hypertension,9245
Obesity,9231
Cancer,9227
Asthma,9185


In [32]:
grouped_df = df.groupby('Medication').agg({'Medication': 'count'})
display(grouped_df)

Medication,Medication
Aspirin,11094
Ibuprofen,11127
Lipitor,11140
Paracetamol,11071
Penicillin,11068
