# Introduction to Jupyter Notebook


Welcome to the wonderful world of Jupyter Notebook. It is a "neat-o-rific" way of performing calculations and presenting results, used by scientists the world over!


Click on this "cell" to select it, then press Ctrl+Enter to "run" this cell.

- - -


# Types of Cells

## 1. Markdown 

### 1.1 The Basics

+ Markdown is a lightweight markup language, similar in principle to HTML

+ Basic syntax (and only basic formatting) 

    - **Bold** text is enclosed between a pair of \*\*

    - *Italic* text is enclosed between a pair of \*
    
    - Unordered lists start with a + or - (and can be nested, just use 4 or more spaces in front)
    
    - Links put the link text in square brackets [ ] followed by the URL in parentheses ( )
    
    
For more details see the [Markdown Cheatsheet on GitHub](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)

<!-- This is a markdown cell comment. It will not show when this cell is run -->   
   




### 1.2 Tables 

A simple table


|   Input size    |  Comparison Count  |
| :-------------: |:------------------:| 
|  100            |   205   |
|  200            |  1800 |
|  300            |  3200 |
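
Typing tables by hand gets tedious once there are more than a few rows. As a minimal sketch (the helper name `markdown_table` is made up for this example and is not part of Jupyter), the same table text could be generated from Python data:

```python
def markdown_table(headers, rows):
    """Return a Markdown table with centred columns as a single string."""
    header_line = "| " + " | ".join(headers) + " |"
    align_line = "| " + " | ".join(":---:" for _ in headers) + " |"
    row_lines = ["| " + " | ".join(str(cell) for cell in row) + " |" for row in rows]
    return "\n".join([header_line, align_line] + row_lines)


# Same values as the table above
table = markdown_table(
    ["Input size", "Comparison Count"],
    [(100, 205), (200, 1800), (300, 3200)],
)
print(table)
```

The printed text can be pasted into a Markdown cell. Inside a code cell, passing the string to `IPython.display.Markdown` should render it directly.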



### 1.3 Source code


Code may be written "inline" by enclosing it within a pair of backticks \`

This is useful for small segments, like referring to the function `int SortAnalysis(auto)`.


- - -

Longer code segments should be "fenced" above and below with three backticks \`\`\`

For C++ we put the name of the language after the first fence (like \`\`\`c++)


 
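```c++
#include <vector>

class Heap
{
  private:
    std::vector<int> data = {};

  public:
    void insert(int);
    void heapifyDown(int);
    int deleteMin();
};

// The member functions are only declared here; their definitions are
// omitted, so this snippet compiles but needs them in order to link.

int main()
{
    Heap pqueue;

    auto x = 13.75;   // deduced as double

    for (auto z : {10, 14, 8, 12, 7, 20, 17})
    {
        pqueue.insert(z);
    }
}
```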

### 1.4 Math

   
Inline math is written between a pair of \$ symbols. For example:

Algorithm `XYZ` is **clearly** $\Theta(n^2)$ in the worst case.

+ `\leq`, `\geq` for $\leq$, $\geq$

<br> <!-- Yup, regular HTML also works -->

+ `\log n` for $\log n$

<br>

+ `\sqrt{n+5}` for $\sqrt{n+5}$

<br>

+ `^{}` for superscript (powers)
    - As an example $2^{2n}$

<br>

+ `_{}` for subscript
    - As an example $\log_{2}n$

<br>

+ `\Theta`, `\Omega`, `O` for $\Theta$, $\Omega$, and $O$

- - -

For math to be "set apart", it is enclosed in double \$\$.   

For example \$\$ 2x \over \log(n^2) \$\$ will render as ...
   
   $$  2x \over \log(n^2)$$
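
The `\over` form works, but `\frac{numerator}{denominator}` is the spelling commonly preferred in LaTeX, because the braces make it explicit where the numerator and denominator start and end. As a sketch, the same expression could be written as:

```latex
$$ \frac{2x}{\log(n^2)} $$
```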
   
- - -

Try writing some of your solutions to **§ 2.1 Q9** below

1. The function ... has a larger/smaller order of growth than ...
2. The function ... has a larger/smaller order of growth than ...



## 2. Code cells

These cells have the text **In[ ]** next to them and run *Python* code directly.

When a cell is executed, anything it prints appears immediately below it. If the last line of the cell is an expression, its value is shown next to the text **Out[ ]**.

In [1]:
# Code comments start with a '#'
# python code

a = 5
b = 6

print('Hello jupyter\n\n-from python-\n\n') 

print(a+b)

Hello jupyter

-from python-


11


### 2.1 Code Table

If your data is in a `csv` file, loading it into a table is a more practical way of showing it than typing a Markdown table by hand.

First, we need to import required libraries


In [1]:
# For plotting and other good stuff

import matplotlib.pyplot as plt  # Used for plotting
import numpy as np               # Numerical arrays and math functions
import pandas as pd              # DataFrames (tables) and csv reading

**NEXT**: Read the `csv` data from a file called `data.csv` (in the current folder)

In [2]:
df = pd.read_csv('data.csv', index_col='Size') # This cell does not output anything when run (see below)
                                               # The data is stored in the object called df
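
To see what `index_col` does without needing `data.csv`, here is a small sketch that reads a CSV from an in-memory string. The sample values and the names `csv_text`, `df_default` and `df_indexed` are made up for illustration. Only the column layout (`Size`, `Run1`, ...) mirrors the real file.

```python
import io

import pandas as pd

# Made-up sample in the same layout as data.csv (values are illustrative only)
csv_text = """Size,Run1,Run2
1000,10,12
1500,20,26
"""

# Without index_col, Size is an ordinary column and rows are numbered 0, 1, ...
df_default = pd.read_csv(io.StringIO(csv_text))

# With index_col='Size', the Size values become the row labels
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col="Size")

print(list(df_default.columns))
print(list(df_indexed.columns))
print(df_indexed.loc[1500, "Run2"])   # look up a value by row label and column name
```

Using the input size as the index is what later allows rows to be selected with `df.loc[size]`.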

In [3]:
print(df)  # print df as plain text table

          Run1      Run2      Run3      Run4      Run5      Run6      Run7  \
Size                                                                         
1000    256352    257361    252662    245293    252518    245538    259505   
1500    565044    554801    568838    554934    567905    567314    544232   
2000   1013630    989099    963620   1016873   1012845    990815   1022071   
2500   1549473   1561043   1586564   1539539   1582122   1554209   1543806   
3000   2254895   2241892   2305853   2279715   2259956   2242343   2237505   
3500   3025413   3067987   3035571   3048835   3065339   3092628   3147162   
4000   3952266   3995087   3982366   3974685   3983351   3965461   4086375   
4500   5151517   5136970   5088039   5093245   5128897   5130397   5043579   
5000   6199968   6242346   6263048   6262625   6176489   6302697   6178821   
5500   7491488   7562594   7722383   7661364   7498564   7553316   7611781   
6000   8953671   9060798   9054023   9136892   9030042   8967691

In [4]:
df  # as the last expression in the cell, df is displayed as a formatted table

| Size | Run1 | Run2 | Run3 | Run4 | Run5 | Run6 | Run7 | Run8 | Run9 | Run10 | Run11 | Run12 | Run13 | Run14 | Run15 | Run16 | Run17 | Run18 | Run19 | Run20 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1000 | 256352 | 257361 | 252662 | 245293 | 252518 | 245538 | 259505 | 232459 | 261136 | 260299 | 245246 | 249187 | 246952 | 255838 | 246323 | 254402 | 248522 | 240087 | 248070 | 251976 |
| 1500 | 565044 | 554801 | 568838 | 554934 | 567905 | 567314 | 544232 | 560299 | 569025 | 572175 | 570501 | 578305 | 555254 | 553336 | 579708 | 554968 | 568795 | 565043 | 565069 | 567883 |
| 2000 | 1013630 | 989099 | 963620 | 1016873 | 1012845 | 990815 | 1022071 | 985823 | 1005862 | 1013151 | 992813 | 975332 | 996424 | 1003165 | 1031619 | 1001227 | 998299 | 984761 | 1034466 | 992777 |
| 2500 | 1549473 | 1561043 | 1586564 | 1539539 | 1582122 | 1554209 | 1543806 | 1576520 | 1562952 | 1580750 | 1564463 | 1502955 | 1586266 | 1529918 | 1545066 | 1580201 | 1578766 | 1550296 | 1548359 | 1575709 |
| 3000 | 2254895 | 2241892 | 2305853 | 2279715 | 2259956 | 2242343 | 2237505 | 2238234 | 2249177 | 2256887 | 2286659 | 2274245 | 2244137 | 2236330 | 2282204 | 2222765 | 2239486 | 2291336 | 2251709 | 2227801 |
| 3500 | 3025413 | 3067987 | 3035571 | 3048835 | 3065339 | 3092628 | 3147162 | 3112133 | 3016089 | 3032746 | 3080374 | 3054468 | 3100280 | 3085196 | 3041457 | 3097042 | 2985728 | 3001885 | 3022243 | 3006223 |
| 4000 | 3952266 | 3995087 | 3982366 | 3974685 | 3983351 | 3965461 | 4086375 | 3941233 | 3946875 | 3970963 | 4090850 | 4037171 | 3987701 | 3954194 | 4024706 | 4016903 | 3975804 | 3998759 | 4023232 | 4027457 |
| 4500 | 5151517 | 5136970 | 5088039 | 5093245 | 5128897 | 5130397 | 5043579 | 5156320 | 5058966 | 5188055 | 4950935 | 5001742 | 5049639 | 4993806 | 5089082 | 5018214 | 5041194 | 5072477 | 5096376 | 4994444 |
| 5000 | 6199968 | 6242346 | 6263048 | 6262625 | 6176489 | 6302697 | 6178821 | 6222286 | 6182946 | 6342704 | 6256800 | 6275356 | 6089236 | 6206645 | 6259170 | 6290689 | 6263227 | 6203311 | 6269004 | 6237778 |
| 5500 | 7491488 | 7562594 | 7722383 | 7661364 | 7498564 | 7553316 | 7611781 | 7501820 | 7674705 | 7580695 | 7559301 | 7605485 | 7659740 | 7647246 | 7548572 | 7599778 | 7668441 | 7639935 | 7497906 | 7607327 |


### 2.2 Plotting graphs

In [5]:
plt_x = list(df.index)                              # x axis: the input sizes (the row labels of df)
plt_y = [df.loc[y].mean(axis=0) for y in plt_x]     # y axis: the mean count over all runs for each size
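
The list comprehension above averages one row at a time. As a sketch of an alternative, pandas can compute all the row means in one call with `mean(axis=1)`. The sample data and the names `sample`, `loop_means` and `row_means` below are made up for illustration.

```python
import io

import pandas as pd

# Made-up sample in the same layout as data.csv (values are illustrative only)
csv_text = """Size,Run1,Run2,Run3
1000,10,12,14
1500,20,26,29
"""
sample = pd.read_csv(io.StringIO(csv_text), index_col="Size")

# Row by row, in the same style as the cell above
loop_means = [sample.loc[size].mean() for size in sample.index]

# All rows at once: axis=1 averages across the columns of each row
row_means = sample.mean(axis=1)

print(row_means.tolist())
```

Both approaches give the same numbers. The second returns a pandas `Series` indexed by `Size`, which can also be passed straight to a plotting call.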

In [6]:
print('X axis data')
print(plt_x)
print('\n\nY axis data')
print(plt_y)

X axis data
[1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500]


Y axis data
[250486.29999999999, 564171.44999999995, 1001233.6, 1559948.8500000001, 2256156.4500000002, 3055939.9500000002, 3996771.9500000002, 5074194.7000000002, 6236257.2999999998, 7594622.0499999998, 9008247.5999999996, 10570438.75, 12264175.75, 14087619.800000001, 16025893.65, 18085152.350000001, 20284184.100000001, 22503492.0]
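
A quick numerical check can hint at the growth rate before anything is plotted. The sketch below copies the values printed above and divides each mean count by $n^2$. If the count grows roughly like $c \cdot n^2$, the ratio should stay close to a constant $c$. The variable names are chosen for this example only.

```python
# Values copied from the X axis and Y axis output above
sizes = [1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000,
         5500, 6000, 6500, 7000, 7500, 8000, 8500, 9000, 9500]
mean_counts = [250486.3, 564171.45, 1001233.6, 1559948.85, 2256156.45,
               3055939.95, 3996771.95, 5074194.7, 6236257.3, 7594622.05,
               9008247.6, 10570438.75, 12264175.75, 14087619.8, 16025893.65,
               18085152.35, 20284184.1, 22503492.0]

# If count is roughly c * n^2, then count / n^2 should stay close to c
ratios = [count / (n * n) for n, count in zip(sizes, mean_counts)]

for n, ratio in zip(sizes, ratios):
    print(f"n = {n:5d}   count / n^2 = {ratio:.4f}")
```

For this data every ratio stays close to 0.25, which is consistent with quadratic growth. It is an informal check, not a proof.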


In [7]:
fig = plt.figure(figsize=(12,8))
axes = fig.add_subplot(111) # 111 = 1 row, 1 column, subplot number 1 (a single chart)

axes.plot(plt_x,plt_y)
axes.set_title('SortAnalysis Count Data')
axes.set_xlabel('Input size $n$')
axes.set_ylabel('Count')
#plt.show()

<matplotlib.text.Text at 0x7f800619bdd8>


### Comparing plots 


1. Generate linear, quadratic, and cubic y values (based on the original `plt_x` data)
2. Plot all data on the same figure

In [8]:
linear_y = plt_x                       # y = n: every y value equals the corresponding x value
quadratic_y = [x*x for x in plt_x]     # y = n^2: the square of every x value in plt_x
cubic_y = [x*x*x for x in plt_x]       # y = n^3: the cube of every x value in plt_x

In [9]:
fig2 = plt.figure(figsize=(12,8))
axes = fig2.add_subplot(111) # 111 = 1 row, 1 column, subplot number 1 (a single chart)

axes.plot(plt_x, linear_y, label = '$n$')  # the label of each line is shown in the legend
axes.plot(plt_x, quadratic_y, label = '$n^2$')
axes.plot(plt_x, cubic_y,  label = '$n^3$')
# Place the legend just outside the plot, to its right (the legend's upper-left corner is pinned to the anchor point).
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)

axes.set_title('Comparison Plot')
axes.set_xlabel('Input size $n$')
axes.set_ylabel('Count')
#plt.show()              # Not needed here: with inline plotting, the notebook draws the figure when the cell finishes


<matplotlib.text.Text at 0x7f8005ee93c8>
