# Analyzing Data with Jupyter Notebook

This notebook is based in part on the Python Data Science Handbook by Jake VanderPlas, which you can find [here](https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/Index.ipynb).



## IPython Tips

Here are a few tips that will help you get started using a Jupyter Notebook.

### Tip 1 - Help

You can use the ? character at the end of a function or type to access the help for that function or type.

In [1]:
# Let's create a list and see how to get the length
v = [1.0, 2.0, 3.0]
len?

Signature: len(obj, /)
Docstring: Return the number of items in a container.
Type:      builtin_function_or_method

In [2]:
len(v)

3

In [3]:
# We can even get information about the list itself:
v?

Type:        list
String form: [1.0, 2.0, 3.0]
Length:      3
Docstring:
Built-in mutable sequence.

If no argument is given, the constructor creates a new empty list.
The argument must be an iterable if specified.
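
The `?` syntax only works inside IPython and Jupyter. As a rough sketch of how to get similar information from plain Python (for example, in a script), the standard `inspect` module and the built-in `help()` can be used. The `describe` helper below is a hypothetical name introduced for illustration; it is not part of IPython.

```python
import inspect


def describe(obj):
    """Collect a few of the fields that `obj?` displays (hypothetical helper)."""
    info = {
        "Type": type(obj).__name__,
        "Docstring": inspect.getdoc(obj),
    }
    # Only sized containers have a length, so add it when len() is supported.
    if hasattr(obj, "__len__"):
        info["Length"] = len(obj)
    return info


v = [1.0, 2.0, 3.0]
list_info = describe(v)
len_info = describe(len)

for label, value in list_info.items():
    print(f"{label}: {value}")

# help(len) would print the full help page for len to standard output.
```
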

In [4]:
# Let's create a function with a helpful description
def empty_function():
  """This is just an empty function. Please don't call it."""
  return 1

In [5]:
# Now the description is available by asking for help
empty_function?

Signature: empty_function()
Docstring: This is just an empty function. Please don't call it.
File:      /tmp/ipykernel_4971/3378048660.py
Type:      function

In [6]:
# Two question marks will display the source for the function
empty_function??

Signature: empty_function()
Source:   
def empty_function():
  """This is just an empty function. Please don't call it."""
  return 1
File:      /tmp/ipykernel_4971/3378048660.py
Type:      function

### Tip 2 - Tab Completion

Well, I was going to add a section about how to use the \<TAB\> key to autocomplete, but it appears that Colab already has that feature built into the editor.

Just in case, try typing the following in a code cell and pressing \<TAB\> after the dot:

> v.\<TAB\>
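
If completion is not available in your editor, a rough fallback is the built-in `dir()`, which lists the attribute names of an object. The sketch below filters out names that start with an underscore; that filter is a choice made here for readability, not something tab completion necessarily does.

```python
v = [1.0, 2.0, 3.0]

# Keep only the public attribute names of the list.
public_names = [name for name in dir(v) if not name.startswith("_")]
print(public_names)
```
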

### Tip 3 - Magic Commands

No really, they are called magic commands. Line magics start with a single '%' and apply to one line; cell magics start with '%%' and apply to the whole cell.

One of the most useful commands for splitting up a large notebook is the %run magic command. Using it, you can run external Python scripts or even IPython notebooks inside the context of the current notebook.

An example of this can be found in the HSV-AI Bug Analysis notebook [here](https://colab.research.google.com/github/HSV-AI/bug-analysis/blob/master/Doc2Vec.ipynb).
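
To give a feel for what %run does, here is a minimal plain-Python sketch using the standard `runpy` module: it executes a script file and hands back the variables the script defined. The file name `helper_script.py` and the variable `squares` are made up for this illustration. In a notebook, `%run helper_script.py` would additionally copy those variables into the notebook's own namespace.

```python
import runpy
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny stand-in script (hypothetical content for this sketch).
    script = Path(tmp) / "helper_script.py"
    script.write_text("squares = [n ** 2 for n in range(3)]\n")

    # Execute the script as if it were run directly; the result is a dict
    # of the global names the script defined.
    namespace = runpy.run_path(str(script), run_name="__main__")

squares = namespace["squares"]
print(squares)
```
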

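Two other very useful magic commands are **%time** and **%timeit**. **%time** runs a statement once and reports CPU and wall-clock time, while **%timeit** runs it many times and reports the mean and standard deviation per loop.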


In [7]:
# Using an example directly from VanderPlas:
print("Here's the output of the %timeit command:")
%timeit L = [n ** 2 for n in range(1000)]

print("\nHere's the output of the %time command:")
%time  L = [n ** 2 for n in range(1000)]

Here's the output of the %timeit command:
42.5 µs ± 651 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Here's the output of the %time command:
CPU times: user 41 µs, sys: 0 ns, total: 41 µs
Wall time: 47.4 µs
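
Outside a notebook, a similar measurement can be sketched with the standard `timeit` module. The loop count of 1000 and the use of the sample standard deviation are choices made for this sketch; they are not claimed to match what %timeit does internally, so the numbers will only be roughly comparable.

```python
import statistics
import timeit

stmt = "L = [n ** 2 for n in range(1000)]"
loops = 1000

# Seven runs, each executing the statement `loops` times; every entry is
# the total time in seconds for one run.
totals = timeit.repeat(stmt, repeat=7, number=loops)

per_loop = [total / loops for total in totals]
mean_us = statistics.mean(per_loop) * 1e6
stdev_us = statistics.stdev(per_loop) * 1e6

print(f"{mean_us:.1f} µs ± {stdev_us:.2f} µs per loop (7 runs, {loops} loops each)")
```
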


### Tip 4 - Suppressing Output

Jupyter displays the value of the last expression in a cell as that cell's output. Sometimes you just want it to stay quiet though - especially if you are generating plots and other items that do not need the output text inserted.

In order to suppress the output, just end the last line with a ';'

In [8]:
# This code has output
p1 = 'This little piggy had roast beef'
p1

'This little piggy had roast beef'

In [9]:
# This code does not have output
p2 = 'This little piggy had none'
p2;

### Tip 5 - %history Magic

This particular piece of magic is very helpful when you have run cells out of order and need to work out what was actually executed, and in what sequence.

In [10]:
%history

# Let's create a list and see how to get the length
v = [1.0, 2.0, 3.0]
len?
len(v)
# We can even get information about the list itself:
v?
# Let's create a function with a helpful description
def empty_function():
  """This is just an empty function. Please don't call it."""
  return 1
# Now the description is available by asking for help
empty_function?
# Two question marks will display the source for the function
empty_function??
# Using an example directly from VanderPlas:
print("Here's the output of the %timeit command:")
%timeit L = [n ** 2 for n in range(1000)]

print("\nHere's the output of the %time command:")
%time  L = [n ** 2 for n in range(1000)]
# This code has output
p1 = 'This little piggy had roast beef'
p1
# This code does not have output
p2 = 'This little piggy had none'
p2;
%history


### Tip 6 - Shell Commands

Most Linux shell commands are available from the Jupyter notebook as well: prefix the command with a '!' and it runs in the system shell. The example below downloads a CSV file from data.nasa.gov, which sets us up to start doing some data analysis.

In [11]:
!wget https://data.nasa.gov/api/views/gh4g-9sfh/rows.csv

--2024-01-19 17:05:14--  https://data.nasa.gov/api/views/gh4g-9sfh/rows.csv
Resolving data.nasa.gov (data.nasa.gov)... 128.102.186.77, 2001:4d0:6311:2c05:60b0:5ad8:1210:ea07
Connecting to data.nasa.gov (data.nasa.gov)|128.102.186.77|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/csv]
Saving to: ‘rows.csv.3’

rows.csv.3              [      <=>           ]   3.77M  2.86MB/s    in 1.3s    

2024-01-19 17:05:16 (2.86 MB/s) - ‘rows.csv.3’ saved [3952161]
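
Note that the file was saved as `rows.csv.3`: wget keeps an existing file and adds a numeric suffix each time the cell is re-run. One way to avoid piling up copies is to download from Python and skip the request when the file is already there. The `fetch` function below is a hypothetical helper written for this sketch, using only the standard library.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch(url, dest):
    """Download url to dest unless dest already exists; return the path."""
    dest = Path(dest)
    if not dest.exists():
        # Stream the response to disk instead of holding it all in memory.
        with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
    return dest


# Usage in the notebook (needs network access, so it is left commented out):
# fetch("https://data.nasa.gov/api/views/gh4g-9sfh/rows.csv", "rows.csv")
```

Re-running the cell then reuses `rows.csv` instead of creating `rows.csv.4`, `rows.csv.5`, and so on. Delete the file first if you want a fresh copy.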

