# Jupyter Tips and Tricks

Jupyter notebooks have become a popular tool for data scientists and researchers. They are great for prototyping and sharing code. In this notebook, let's look at some tips and tricks that can make your Jupyter experience more enjoyable and productive.

In [3]:
# you can run any terminal command from a Jupyter code cell
# prefix the command with ! and it runs as it would in the terminal
# (dir lists files on Windows; on Linux or macOS you would use ls)
!dir

 Volume in drive D is Data
 Volume Serial Number is 72A3-8E69

 Directory of d:\Github\Digital_Discourse_ETH713_Fall_2020

11/08/2024  07:27 PM    <DIR>          .
11/08/2024  07:27 PM    <DIR>          ..
11/30/2022  06:12 PM             1,356 .gitignore
01/29/2023  07:32 PM    <DIR>          .idea
01/29/2023  07:32 PM    <DIR>          .ipynb_checkpoints
11/08/2024  06:35 PM    <DIR>          analysis
05/13/2021  03:41 PM               277 Cool_Python_Libraries.md
01/29/2023  07:32 PM    <DIR>          data
11/02/2021  06:00 PM             5,206 Digital Discourse Final Project.ipynb
11/03/2021  05:23 PM                21 hello.py
11/08/2024  07:14 PM    <DIR>          img
11/08/2024  06:35 PM    <DIR>          JSON_APIs
11/08/2024  07:27 PM                 0 Jupyter_tips.ipynb
01/29/2023  07:32 PM    <DIR>          kaggle
11/02/2021  06:00 PM             1,084 LICENSE
11/08/2024  06:35 PM    <DIR>          OpenAI
05/13/2021  03:41 PM             4,119 Python Learning Resources.ipynb


In [None]:
# we can use some Jupyter magic commands such as timeit to measure the time it takes to run a cell
# this is useful when you want to compare the performance of different code snippets

In [None]:
%%timeit 
2**1000 # so Jupyter will run this code snippet multiple times and measure the time it takes to run it

754 ns ± 14.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
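
For code that runs outside a notebook, a similar measurement can be sketched with the standard `timeit` module. The `repeat` and `number` values below are illustrative assumptions; `%%timeit` chooses the loop count automatically.

```python
import timeit

loops = 10_000  # illustrative loop count per run

# Run the snippet `loops` times, and repeat that measurement 7 times
timings = timeit.repeat("2**1000", repeat=7, number=loops)

# Each entry is the total time for all loops, so divide to get per-loop time
per_loop = [t / loops for t in timings]
print(f"best of 7 runs: {min(per_loop) * 1e9:.0f} ns per loop")
```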


In [None]:
# it could be your own function or any code snippet whose performance you want to measure
# let's make a function that calculates the most common element in a list
def most_common(lst):
    return max(set(lst), key=lst.count)

# first let's test if this function works
words = ['apple', 'banana', 'apple', 'banana', 'apple', 'apple', 'apple', 'banana', 'apple', 'banana', "grape", "orange"]
most_common(words) # it should return 'apple'
# alternative would be to use the Counter class from the collections module
from collections import Counter
def most_common_counter(lst):
    return Counter(lst).most_common(1)[0][0] # [0][1] would be the count of the most common element
# [0][0] is used to get the most common element from the list of tuples returned by most_common method
# test that it works
most_common_counter(words) # it should return 'apple'

'apple'

In [9]:
# now let's test which function is better in terms of performance
# first let's generate 1000 random words from a list of 24 fruit names
import random
fruits = ["apple", "banana", "citrus", "date", "elderberry", "fig", "grape", "honeydew", "kiwi", "lemon", "mango", "nectarine", "orange", "papaya", "quince", "raspberry", "strawberry", "tangerine", "ugli", "vanilla", "watermelon", "ximenia", "yuzu", "zucchini"]
random_words = [random.choice(fruits) for _ in range(1000)]
# we have 1000 random words in the list random_words
# now let's measure the performance of our functions
# first check that they work
most_common(random_words), most_common_counter(random_words) # they should return the same result

('vanilla', 'vanilla')

In [10]:
%%timeit
most_common(random_words)

259 µs ± 13 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [11]:
%%timeit
most_common_counter(random_words)

31.6 µs ± 884 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
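
One plausible reason for the gap: `max(set(lst), key=lst.count)` calls `lst.count` once per distinct value, and each call walks the whole list, while `Counter(lst)` walks the list only once. The helper below, `scan_cost`, is a hypothetical name introduced here to make that rough estimate concrete. It counts visited elements and does not measure time.

```python
def scan_cost(lst):
    """Rough number of list elements visited by each approach."""
    distinct = len(set(lst))
    # lst.count is called once per distinct value and scans the full list each time
    count_based = distinct * len(lst)
    # Counter(lst) scans the list a single time
    counter_based = len(lst)
    return count_based, counter_based

sample = ["apple", "banana", "apple", "grape"] * 250  # 1000 items, 3 distinct values
print(scan_cost(sample))  # (3000, 1000)
```

With 24 distinct words instead of 3, the count-based version would visit roughly 24 times as many elements as the `Counter` version.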


## Takeaway from timeit

%%timeit is useful for measuring performance.

However, if one function takes about 260 µs and the other about 30 µs, as measured above, you will not notice the difference for a single call. It starts to matter only when the code runs many times or on much larger inputs.

Another trick is to look at the time taken by a whole cell. Some notebook front ends display the execution time next to a cell once it has finished running.



In [12]:
# let's run a simple loop and measure the time it takes to run it
for _ in range(100_000_000):
    pass  # an empty loop
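
Where the front end does not show a cell's run time, it can be measured by hand. A minimal sketch using the standard `time.perf_counter`; the loop is made smaller here (an assumption for the sake of a quick example) so that it finishes fast.

```python
import time

start = time.perf_counter()  # high-resolution clock meant for measuring intervals
for _ in range(1_000_000):
    pass  # an empty loop, smaller than the one in the cell above
elapsed = time.perf_counter() - start
print(f"loop took {elapsed:.3f} s")
```

Inside a notebook, the `%%time` cell magic reports the time for a single run of a cell without any extra code.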

## Output and print

By default, Jupyter displays the value of the last expression in a cell. If you want to see the output of several lines, use the print function.



In [None]:
# let's see the difference between print and the output of the last line of a cell
my_text = "Hello, World!\n"
my_text # this displays the text; technically it shows the repr of the object, not the str

'Hello, World!\n'

In [None]:
# compare with regular print
print(my_text) # this will print the text as it is

# the last line of a cell shows the repr of the object, which is more technical
# print shows the str of the object, which is more user friendly

Hello, World!



In [17]:
# again, for each code cell execution, the value of the last line is displayed
"Valdis".upper() # this will display 'VALDIS'

'VALDIS'
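
The difference between the two kinds of output comes down to `repr()` versus `str()`. A small sketch that reproduces both forms explicitly, reusing the `my_text` value from the earlier cell:

```python
my_text = "Hello, World!\n"

# what a notebook shows for the last expression of a cell
print(repr(my_text))  # 'Hello, World!\n'

# what print() shows: the newline is rendered instead of escaped
print(str(my_text), end="")  # Hello, World!

# print() is also the way to show several values from one cell
for word in ["Valdis".upper(), "Valdis".lower()]:
    print(word)
```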

## Running Python code from a different file

Usually we have all our code in a single notebook. However, if you want to run code from a different file, you can use the magic command %run followed by the file name.

Let's run the hello.py file, which contains the following code:

```python
print("Hello world!")
my_greeting = "Hello, Discourse!"
```

```python
%run hello.py
```

In [19]:
# this will run the hello.py script in the same directory as this notebook
%run hello.py

Hello world!


In [None]:
my_greeting # so the variable from the script is available here!

'Hello, Discourse!'
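
Outside of IPython, a roughly similar effect can be sketched with the standard `runpy` module, which runs a script and returns the names it defined. This is only an analogy, not a description of how `%run` is implemented. The temporary script `hello_demo.py` and its contents are made up for this example, reusing the `my_greeting` name from above.

```python
import os
import runpy
import tempfile

# Write a small throwaway script (a hypothetical stand-in for hello.py)
with tempfile.TemporaryDirectory() as tmp_dir:
    script_path = os.path.join(tmp_dir, "hello_demo.py")
    with open(script_path, "w", encoding="utf-8") as f:
        f.write('my_greeting = "Hello, Discourse!"\nprint("Hello world!")\n')

    # run_path executes the file and returns its global namespace as a dict
    script_globals = runpy.run_path(script_path)

print(script_globals["my_greeting"])
```

Unlike `%run`, which places the script's variables directly into the notebook namespace, `run_path` hands them back in a dictionary, so nothing is overwritten by accident.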