# BLU04 - Learning Notebook - Part 1 of 3 - Notebook magics

## 1. Introduction

Welcome to data wrangling!

Up to this point in the academy, you have already handled several datasets. Some were big, some were small. Some were very clean, some were a little messier (maybe with some missing values).

But in all cases, they were pretty easy to handle (thanks, pd.read_csv!) and conveniently accessible.

Well, in real life, that never happens.

![title](../media/all_the_data.png)

When dealing with a data science problem, you'll find that the data you need is probably scattered across multiple sources, stored in several different formats, and in need of deep cleaning!

But worry not, that's what we're here for :)
In this specialization you'll learn many tools that will turn you into a professional data wrangler.

Let's start with some handy Jupyter notebook tools.

## 2. ! (system shell access)

In a Jupyter notebook, any statement that starts with an exclamation mark (!) is sent to the underlying operating system's shell.

In practice, this means that you can run shell commands in the notebook, in the same way as you do in your computer's terminal.

Let's see some examples. The first is to list the files in the current directory.

In [1]:
# list the current directory
# in Windows: ! dir
! ls

BLU04 - Learning Notebook - Part 1 of 3 - Notebook magics.ipynb
BLU04 - Learning Notebook - Part 2 of 3 - Reading files.ipynb
BLU04 - Learning Notebook - Part 3 of 3 - Data cleaning.ipynb
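The ! syntax only exists inside IPython and Jupyter. As a rough point of comparison, here is a minimal sketch of how a plain Python script could run an external command and capture its output with the standard library's subprocess module. The child process used here is the Python interpreter itself, an arbitrary choice made only so that the example runs on any operating system.

```python
import subprocess
import sys

# Run a child process and capture what it prints, which is roughly what `!`
# does for a shell command. No shell is involved here: the command is passed
# as a list of arguments.
completed = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,  # collect stdout and stderr instead of printing them
    text=True,            # decode the captured bytes into str
    check=True,           # raise CalledProcessError on a non-zero exit code
)

output = completed.stdout.strip()
print(output)
```

In a notebook the ! shortcut is far more convenient; this form is mostly useful once the code moves into a script.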


To see the contents of a file, we just have to use the command **cat** (Unix) or **type** (Windows), followed by the file path.

In [2]:
# print the contents of a file
# in Windows: ! type ../data/lorem/lorem_ipsum_short.txt
! cat ../data/lorem/lorem_ipsum_short.txt

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis et tristique enim. Sed eget venenatis eros, quis suscipit lorem. Ut malesuada, erat a scelerisque cursus, odio mi sodales ligula, at elementum ipsum dui condimentum ipsum. Nam imperdiet viverra dictum. Aenean commodo accumsan iaculis. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Maecenas sed suscipit odio, ornare ullamcorper felis. Donec sit amet egestas sem, ac semper sem. Integer mattis purus sed orci volutpat porta. Donec cursus dapibus interdum. Donec ornare mattis dolor. Sed quis scelerisque nisl, in efficitur odio. Phasellus posuere libero eget orci feugiat scelerisque. Phasellus ullamcorper tortor nec facilisis blandit.
Vestibulum tempus, purus id ultrices eleifend, lorem nunc malesuada odio, vitae aliquam elit ipsum a nulla. Vivamus sed neque arcu. Vivamus commodo nunc a est hendrerit tempor. Nullam viverra sit amet augue in sagittis. Pellentesque molestie porttitor volutpa

Sometimes files are very big. In that case, printing the whole content of a file can be too expensive and not very useful.

Thus, it's important to know how big a file is before opening it.
A good way to find out is to count the number of lines in the file.

In [3]:
# counting the number of lines in a file
# in Windows: ! type ../data/lorem/lorem_ipsum_long.txt | find /c /v ""

! wc -l ../data/lorem/lorem_ipsum_long.txt

     150 ../data/lorem/lorem_ipsum_long.txt
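If you'd rather not depend on the operating system, the same count can be obtained in plain Python. The sketch below is one possible approach, not the only one: `count_lines` is a hypothetical helper name, and the demo runs on a throwaway temporary file so that it works anywhere.

```python
import os
import tempfile


def count_lines(path):
    """Count the newline characters in a file, reading it in chunks."""
    total = 0
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so memory use stays flat even for huge files.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            total += chunk.count(b"\n")
    return total


# Demo on a throwaway file; in this notebook you would pass
# "../data/lorem/lorem_ipsum_long.txt" instead.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    demo_path = tmp.name

line_count = count_lines(demo_path)
print(line_count)
os.remove(demo_path)
```

Like `wc -l`, this counts newline characters, so a last line without a trailing newline is not counted.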


Even when a file is very big, we can still preview it. Instead of printing all its content, we'll just print its first lines.

For this we can use the command **head** (Unix) or **more** (Windows).

In [4]:
# print the content of the first two lines of the file
# in Windows: ! more /e ../data/lorem/lorem_ipsum_long.txt P 2
! head -2 ../data/lorem/lorem_ipsum_long.txt

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed pharetra pulvinar nulla, nec ultricies velit posuere nec. Nulla rhoncus convallis lectus in dignissim. Nulla aliquet justo risus, ac dapibus urna faucibus aliquet. Cras in pretium leo. Etiam id neque a erat feugiat vehicula. Curabitur accumsan volutpat ante, vitae vestibulum lorem congue et. Vivamus fringilla massa id dictum bibendum. Maecenas iaculis arcu ut tellus varius, at imperdiet metus lacinia. Maecenas eget turpis metus. Maecenas est mauris, venenatis cursus nulla at, gravida posuere sapien. Fusce a ex purus. Pellentesque eleifend, lorem sed pulvinar scelerisque, orci eros maximus purus, id sagittis dolor est non nulla. Vestibulum ex metus, porttitor a leo ornare, pharetra pharetra libero. Suspendisse cursus ligula in ante tincidunt rhoncus malesuada eu justo.
Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Nam nec sem nisl. Aliquam at velit neque. Cras nec sem fringilla, h
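A portable alternative is to read only the first lines from Python. This is a minimal sketch, assuming a UTF-8 text file; `head` is a hypothetical helper name, and the demo uses a temporary file rather than the notebook's data.

```python
import os
import tempfile
from itertools import islice


def head(path, n=2):
    """Return the first n lines of a text file without reading the rest."""
    with open(path, encoding="utf-8") as handle:
        # A file object yields lines lazily; islice stops after n of them.
        return [line.rstrip("\n") for line in islice(handle, n)]


# Demo on a throwaway file; in this notebook you would pass
# "../data/lorem/lorem_ipsum_long.txt" instead.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("line 1\nline 2\nline 3\nline 4\n")
    demo_path = tmp.name

first_lines = head(demo_path, n=2)
print(first_lines)
os.remove(demo_path)
```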

Similarly, we can preview the last lines of the file with the **tail** command on Unix.

Unfortunately, the Windows command prompt doesn't have a built-in equivalent to tail, but there are some packages that can be installed for this purpose (check this [Stack Overflow post](https://stackoverflow.com/questions/187587/a-windows-equivalent-of-the-unix-tail-command)). Alternatively, you can always **type** the whole file (if it isn't too big) and read the last lines...

In [5]:
# print the content of the last three lines of the file
# in Windows :(
! tail -3 ../data/lorem/lorem_ipsum_long.txt

Morbi sodales et felis in bibendum. Nullam sollicitudin dapibus tellus, at molestie dui sagittis sed. Proin a tellus ac mi pharetra ullamcorper et in diam. Donec at posuere massa. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Quisque faucibus justo sollicitudin tincidunt tincidunt. Mauris magna massa, convallis quis erat sit amet, varius consequat nibh. In ac elit eu purus tempor lobortis quis at ante. Aliquam erat volutpat. Etiam sed arcu ut ex venenatis suscipit. Quisque ac porta diam. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce tristique malesuada diam, at hendrerit neque euismod sit amet.
Aenean commodo posuere elit, et ornare mi bibendum eget. Mauris scelerisque elit neque, vel eleifend leo pellentesque et. Aliquam erat volutpat. Vestibulum porttitor, neque in eleifend tincidunt, dolor tortor sagittis velit, a vehicula odio lectus et neque. Etiam odio diam, congue nec neque quis, hendrerit convallis nisi. Nullam leo lig
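Since a tail command may not be available everywhere, here is a minimal Python sketch that returns the last lines of a text file on any operating system. It assumes a UTF-8 file; `tail` is a hypothetical helper name, and the demo runs on a temporary file.

```python
import os
import tempfile
from collections import deque


def tail(path, n=3):
    """Return the last n lines of a text file."""
    with open(path, encoding="utf-8") as handle:
        # A deque with maxlen=n discards older lines as new ones arrive,
        # so only n lines are held in memory at any time.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


# Demo on a throwaway file; in this notebook you would pass
# "../data/lorem/lorem_ipsum_long.txt" instead.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("line 1\nline 2\nline 3\nline 4\nline 5\n")
    demo_path = tmp.name

last_lines = tail(demo_path, n=3)
print(last_lines)
os.remove(demo_path)
```

This still reads the file from start to end, so it saves memory rather than time.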

## 3. % (magic commands)

In IPython, there are some commands which are called magic commands. Quoting IPython's [documentation](https://ipython.org/ipython-doc/3/interactive/reference.html#magic-command-system): "These [commands] allow you to control the behavior of IPython itself, plus a lot of system-type features. They are all prefixed with a % character, but parameters are given without parentheses or quotes."

There are two types of magic commands: line magics and cell magics.

Line magics are invoked like:

```
%command_name the_rest_of_the_line_is_interpreted_as_command_arguments
```

While cell magics are invoked like:

```
%%command_name the_rest_of_the_line
and_all_the_other_lines
in_the_cell 
are_interpreted_as_command_arguments
```

[This page](http://ipython.readthedocs.io/en/stable/interactive/magics.html) lists all of IPython's built-in magic commands. In this notebook we'll just look at a few examples.

The command **magic** is a magic itself. When invoked without any additional arguments, it shows the documentation of the magic command system.

In [6]:
# print info about magic system
%magic

A very useful magic, which you've probably already seen, is the **matplotlib inline** magic.

It enables the matplotlib inline backend for use with the IPython Notebook. This means that in every cell where you write matplotlib plotting commands, the plots appear as the output of the cell, without the need to explicitly call the show method.

You can invoke it as follows, and it will apply to all the cells in the notebook.

In [7]:
%matplotlib inline

Another very useful magic is **timeit**, which allows you to measure the execution time of a line or a cell.

Let's see an example of timeit as a line magic.

In [8]:
# the -n argument sets how many times the statement is executed in each run (the "loops")
# the -r argument sets how many times that run is repeated (the "runs")
%timeit -n3 -r5 [i for i in range(1000)]

37.6 µs ± 2.69 µs per loop (mean ± std. dev. of 5 runs, 3 loops each)
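To make the meaning of -n and -r more concrete, here is a rough equivalent using the standard library's timeit module, which also works outside a notebook. This is a sketch for comparison, not IPython's actual implementation, and the summary line only imitates the spirit of the magic's output.

```python
import statistics
import timeit

LOOPS = 3  # plays the role of -n3: executions of the statement per run
RUNS = 5   # plays the role of -r5: number of timed runs

# timeit.repeat returns one total time (in seconds) per run.
timings = timeit.repeat(
    stmt="[i for i in range(1000)]",
    number=LOOPS,
    repeat=RUNS,
)

# Divide each run's total by the loop count to get a per-loop time.
per_loop = [total / LOOPS for total in timings]
mean = statistics.mean(per_loop)
spread = statistics.pstdev(per_loop)
print(f"{mean * 1e6:.1f} µs ± {spread * 1e6:.1f} µs per loop "
      f"({RUNS} runs, {LOOPS} loops each)")
```

The numbers will differ from machine to machine and from run to run.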


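We can also use timeit as a cell magic to measure the execution time of a whole cell. In this mode, the code on the same line as the magic (here, x = 3) is setup code: it is executed, but not included in the measurement.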

In [9]:
%%timeit x = 3
result = x**100
[i for i in range(1000)]

34.9 µs ± 1.16 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
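Note that no -n was given in this cell, so the number of loops (10000) was chosen automatically. A comparable workflow can be sketched with the standard library's `timeit.Timer`, whose `setup` argument corresponds to the code on the magic's first line and whose `autorange()` method picks a loop count on its own. Treat this as an approximation: the loop count it selects is not guaranteed to match the one IPython picks.

```python
import timeit

timer = timeit.Timer(
    # The timed statements, one per line, as in the body of the cell.
    stmt="result = x**100\n[i for i in range(1000)]",
    # Setup code runs before timing starts and is not measured.
    setup="x = 3",
)

# autorange() increases the loop count until a run takes long enough
# to be measured reliably, then returns (loops, total_seconds).
loops, elapsed = timer.autorange()
print(f"{loops} loops, {elapsed / loops * 1e6:.1f} µs per loop")
```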


Another important magic is **load_ext**, which allows us to load IPython extensions.

In this example, we're loading the **line_profiler** extension, which allows us to profile functions by measuring how much time is spent on each line of the function. The extension is a separate package, so it has to be installed first (for instance with pip install line_profiler).

In [10]:
%load_ext line_profiler

Let's see an example of how to use the line profiler. First, we define a (very dummy) function.

In [11]:
def very_dummy_function(r=1000):
    slow_execution_statement = [i for i in range(r)]
    fast_execution_statement = 2
    
    return 'hello!'

Then, we invoke the line profiler using the **lprun** magic like this.

In [12]:
%lprun -f very_dummy_function very_dummy_function(r=2000)

As expected, the function spends about 99% of its execution time on the slow_execution_statement line (the list comprehension), and the remaining time is split between the fast_execution_statement assignment and the return statement.
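If line_profiler is not available, the standard library's cProfile module offers a coarser alternative. Note that this is a different technique: it reports time per function call rather than per line, so it will not reproduce the line-by-line breakdown above. The sketch below profiles the same dummy function and captures the report as text.

```python
import cProfile
import io
import pstats


def very_dummy_function(r=1000):
    slow_execution_statement = [i for i in range(r)]
    fast_execution_statement = 2

    return 'hello!'


# Profile a single call; runcall returns whatever the function returns.
profiler = cProfile.Profile()
result = profiler.runcall(very_dummy_function, r=2000)

# Write the report to a string buffer, sorted by cumulative time,
# and keep only the first few entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```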

## 4. Optional

You can read more about profiling code [here](https://jakevdp.github.io/PythonDataScienceHandbook/01.07-timing-and-profiling.html).