## Jupyter Notebook and Linux

### Jupyter Notebooks

The platform you are working on (Colab) is based on Jupyter notebooks. These notebooks integrate code and its output into a single document. All the code blocks in a notebook are run by the same Python kernel, so variables, function definitions, etc. persist across code blocks (i.e. the whole notebook is like a single code file, but we execute parts of it at a time).

Basic commands:



*   Shift + Enter = Execute the code block and go to the next block
*   Ctrl + Enter = Execute the code block
*   Ctrl + / = Comment out the selected lines
*   Tab = suggest auto-complete
*   Tab inside function braces = suggest docstring/parameters




[Jupyter magic commands](https://jakevdp.github.io/PythonDataScienceHandbook/01.07-timing-and-profiling.html):

Note: % is a line magic and applies only to that particular line, while %% is a cell magic and applies to the whole code block.

*   %time - measure the [time taken](https://stackoverflow.com/a/47478852) to run the particular line of code
*   %%time - measure the time taken to run the particular block of code
*   %prun - run [code profiling](https://en.wikipedia.org/wiki/Profiling_(computer_programming)) on the particular line of code (use %%prun to profile the whole block)
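
Magic commands only work inside a notebook (IPython) session. As a rough sketch of what the timing magics do, the plain-Python snippet below times one call with `time.perf_counter` and then averages over repeated calls with the standard `timeit` module. The function is redefined here only to keep the example self-contained, and the repeat count of 10,000 is an arbitrary choice.

```python
import time
import timeit


def simple_interest(principal, rate, years):
    """Return the simple interest for a principal, a rate in percent, and a time in years."""
    return principal * rate * years / 100


# Rough equivalent of `%time`: time a single call.
start = time.perf_counter()
result = simple_interest(100, 1, 5)
elapsed = time.perf_counter() - start
print(f"result={result}, elapsed={elapsed:.6f} s")

# Averaging over many calls gives a more stable number than a single run.
repeats = 10_000  # arbitrary repeat count chosen for this sketch
total = timeit.timeit(lambda: simple_interest(100, 1, 5), number=repeats)
print(f"average per call: {total / repeats:.2e} s")
```

A single timed call of such a small function is dominated by measurement noise, which is why the averaged figure is usually the more useful one.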

You can access commonly used code snippets and files by clicking on the arrow at the left margin of the screen.

See the [docs here](https://jupyter.readthedocs.io/en/latest/)

In [0]:
# Press shift+enter or ctrl+enter to execute this block

def simple_interest(principal, rate, time):
  '''
  Returns the simple interest.
  
  interest = principal*rate*time/100
  '''
  
  interest = principal*rate*time/100
  return interest

In [0]:
%time simple_interest(100, 1, 5)

simple_interest(100, 1, 5)
simple_interest(100, 1, 5)
simple_interest(100, 1, 5)
simple_interest(100, 1, 5)

In [0]:
%%time 

simple_interest(100, 1, 5)

a = 5**10000

In [0]:
%prun simple_interest(100, 1, 5)

### Linux



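Prefix these commands with ! when running them in a code block. (Some of them also run without the prefix, but using it is good practice because it distinguishes Linux commands from Python code.) Each ! command runs in its own temporary shell, so a directory change made with !cd does not carry over to the next command unless the commands are chained on the same line.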

Basic commands:



*   cd \<directory path> = change directory (go to directory path)
*   ls = list contents of current directory
*   rm \<file path> = remove file
*   rm -r \<dir path> = remove the directory and its contents
*   mkdir \<name of dir> = make a new directory
*   \<command1> && \<command2> = execute command1, then execute command2 only if command1 succeeded
*   \<command1> | \<command2> = [Pipe](https://www.guru99.com/linux-pipe-grep.html#1) the output of command1 into command2

[The Internet is your friend](http://cheatsheetworld.com/programming/unix-linux-cheat-sheet/)
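
For comparison, the same kind of file operations can also be done from Python itself with the standard library. The sketch below is only an illustration: it works inside a temporary directory, and the file name `notes.txt` is a hypothetical example, not something used elsewhere in this tutorial.

```python
import shutil
import tempfile
from pathlib import Path

# Work inside a throwaway directory so nothing real is touched.
base = Path(tempfile.mkdtemp())

new_dir = base / "checkthisout"
new_dir.mkdir()                               # like: mkdir checkthisout
(new_dir / "notes.txt").write_text("hello")   # hypothetical file, created so there is something to list

listing = sorted(p.name for p in new_dir.iterdir())   # like: ls checkthisout
print(listing)

(new_dir / "notes.txt").unlink()              # like: rm checkthisout/notes.txt
shutil.rmtree(new_dir)                        # like: rm -r checkthisout
print(new_dir.exists())

shutil.rmtree(base)                           # clean up the temporary directory
```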



In [0]:
!ls

In [0]:
!mkdir checkthisout && ls

In [0]:
!cd checkthisout && mkdir new && ls

In [0]:
!rm -r checkthisout/ && ls

## Python 3

A very basic tutorial. Check out [this tutorial](https://jakevdp.github.io/PythonDataScienceHandbook/) for an in-depth view.

#### Basics

In [0]:
# This is a comment

In [0]:
"""
This
is a
multiline comment (technically a string literal that is never used)
"""

In [0]:
# Print function
print("Hello")
print("World!")

In [0]:
# `end` argument in print function
print("Hello ",end='') # end='\n' by default
print("World!")

In [0]:
# Take input from user
inp = input('Enter some value: ')
print('Input:',inp)
print('Type: ',type(inp))

#### If-else

In [0]:
if 5<4:
    print('yes1')
elif 5<6:
    print('yes2')
else:
    print('no')

#### Loops

In [0]:
for i in range(5):
    if i<3:
        continue
    else:
        print(i)
        break

In [0]:
# while loop
i=0
while i<5:
    print(i)
    i += 1

#### Lists, Indexing, enumerate

In [0]:
l1 = [1,'hi',2.0]
print(l1)

In [0]:
l2 = list()
l2.append(1)
l2.append('hi')
l2.append(2.0)
print(l2)

In [0]:
l2[0]

In [0]:
l2[0] = 15

In [0]:
l2

In [0]:
l2[0:]

In [0]:
l2[:2]

In [0]:
l2[-1]

In [0]:
len(l2)

In [0]:
l1+l2

In [0]:
# Iterate list
for i in l1:
    print(i)

In [0]:
# Iterate list index
for i in range(len(l1)):
    print(i)

In [0]:
# enumerate
for key,value in enumerate(l1):
    print(key,value)

#### Tuples
- Similar to lists but immutable (elements cannot be changed after creation)

In [0]:
t1 = (1,2,'hi')

In [0]:
t1[0]

In [0]:
t1[0:]

In [0]:
t1[-1]

In [0]:
try:
  t1[0] = 15 # ERROR: tuples do not support item assignment
except TypeError:
  print("Error occurred")

In [0]:
# iterating tuple
for i in t1:
    print(i)

In [0]:
# enumerate
for key,value in enumerate(t1):
    print(key,value)

#### Dictionaries

In [0]:
d1 = {
    'a':1,
    'b':2
}

In [0]:
d1

In [0]:
d1['a']

In [0]:
d1['a'] = 3

In [0]:
d1

In [0]:
d2 = {}
d2['a'] = 1
d2['b'] = 2

In [0]:
d2

In [0]:
d3 = dict()
d3['a'] = 1
d3['b'] = 2

In [0]:
d3

In [0]:
d1.keys()

In [0]:
list(d1.keys())

In [0]:
d1.values()

In [0]:
list(d1.values())

In [0]:
d1.items()

In [0]:
list(d1.items())

In [0]:
# Iterate dictionary
for key,value in d1.items():
    print(key, value)

In [0]:
if 'a' in d1:
    print('yes')

In [0]:
len(d1)

#### Functions

In [0]:
def func_name(arg1, arg2):
    print(arg1,arg2)

In [0]:
func_name(1,'hi')

In [0]:
func_name([1,2,3,4],'hi')

In [0]:
# Variable arguments
def func_name(*args):
    for arg in args:
        print(arg)

In [0]:
func_name(1,6.0)

In [0]:
func_name(1,2,'hi')

In [0]:
# keyword arguments
def func_name(**kwargs):
    for key,value in kwargs.items():
        print(key,value)

In [0]:
func_name(a=1,b=6.0)

In [0]:
# Variable and keyword arguments together
def func_name(*args,**kwargs):
    for arg in args:
        print(arg)
    for key,value in kwargs.items():
        print(key,value)

In [0]:
func_name('hi','there',a=1,b=6.0)

#### String formatting

In [0]:
# String formatting
print('Hello {} from {}'.format(1,2))

In [0]:
# String formatting
def dummy():
    return 4.0
print('My lucky number is {}'.format(dummy()))

In [0]:
x = 'My lucky number is {}'.format(dummy())
print(x)

#### List comprehension

In [0]:
[x ** 2 for x in range(1, 11) if x % 2 == 1]

#### map(function, iterable)
- returns an iterator (a map object) that yields the result of applying the given function to each item of a given iterable (list, tuple etc.); wrap it in list() to get a list
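
In Python 3, `map` is lazy: it produces results only when iterated, and it can be consumed only once. A minimal sketch of that behaviour, using a hypothetical helper `double` that is not part of this tutorial:

```python
def double(x):
    """Hypothetical helper used only for this illustration."""
    return 2 * x


numbers = [1, 2, 3]
mapped = map(double, numbers)   # nothing is computed yet

first_pass = list(mapped)       # consumes the iterator
second_pass = list(mapped)      # the iterator is already exhausted

print(first_pass)   # [2, 4, 6]
print(second_pass)  # []
```

If the results are needed more than once, store them in a list first.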

In [0]:
def check(x):
    if type(x) is str:
        return x
    else:
        return '-'
a = [1,2,4,'hi',4.5]
res = map(check,a)

In [0]:
list(res)

In [0]:
for res in map(check,a):
    print(res)

In [0]:
[x for x in map(check,a)]

In [0]:
[x for x in map(check,a) if x!='-']

#### zip

In [0]:
a = [1,2,3]
b = ['Me','You','Someone else']
c = [1.0,2.0,3.0]
res = zip(a, b, c)

In [0]:
list(res)

In [0]:
for res in zip(a, b, c):
    print(res)

In [0]:
[x for x in zip(a, b, c)]

In [0]:
[x for x in zip(a, b, c) if x[0]<3]

#### Importing Modules

Modules are like libraries in C++. You can access the functions using the module names.

See [docs here](https://docs.python.org/3/tutorial/modules.html)

In [0]:
# import modules
import numpy

In [0]:
numpy.__version__

In [0]:
# change alias while importing 

import numpy as np

In [0]:
np.__version__

In [0]:
numpy==np

In [0]:
from numpy import array

In [0]:
array([1,2,3,4])

## Numpy

The numpy package (module) is used in almost all numerical computation using Python. It provides high-performance vector, matrix and higher-dimensional data structures for Python. It is implemented in C and Fortran, so when calculations are vectorized (formulated with vectors and matrices), performance is very good.

To use numpy you need to import the module, using for example:


In [0]:
import numpy as np
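
To make "vectorized" concrete, the sketch below computes the same result twice: once with an explicit Python loop and once with a single whole-array expression. The array size and the formula `2*x + 1` are arbitrary choices for illustration; only the second form lets numpy run the loop in compiled code.

```python
import numpy as np

values = np.arange(1_000, dtype=np.float64)

# Element-by-element Python loop: every iteration goes through the interpreter.
loop_result = np.empty_like(values)
for i in range(len(values)):
    loop_result[i] = values[i] * 2.0 + 1.0

# Vectorized: one expression over the whole array.
vector_result = values * 2.0 + 1.0

print(np.array_equal(loop_result, vector_result))  # True
```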

In [0]:
test = np.array([1,2,3,4,5])
test2 = np.array([[1,2],[3,4]])

print(type(test), type(test2), test.shape, test2.shape)

In [0]:
print(test*2)

In [0]:
print(test + 4*test)

In [0]:
# dtype is the datatype

test3 = np.array([1,2,3,4,5], dtype=complex)

print("test3 =", test3)

print('test2 dtype =', test2.dtype)

test2 = test2.astype(np.uint8)

print('test2 dtype =', test2.dtype)

In [0]:
# slices are written as start:stop:step

print('test3', test3)
print('test3[::2]', test3[::2])
print('test3[1::2]', test3[1::2])

In [0]:
print('test2[1,1]', test2[1,1])
print('test2[1,:]', test2[1,:])
print('test2[:,1]', test2[:,1])

In [0]:
row_indices = [1, 2, 3]
print(test3[row_indices])

In [0]:
row_mask = np.array([1,0,1,0,0], dtype=bool)
print(test3[row_mask])

In [0]:
A = np.array([[n+m*10 for n in range(5)] for m in range(5)])

print(A)

In [0]:
print('Dot \n', np.dot(A,A), '\n')

# Numpy arrays (and scalars) are broadcast to fit the shape
# of the other arrays in the operation
print('Example of broadcasting (1-A): \n', 1-A)
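
Broadcasting also applies between arrays of different shapes, not only between a scalar and an array. A small sketch, rebuilding the same `A` as above and using two hypothetical helper arrays `row` and `col`:

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])  # shape (5, 5)
row = np.array([0, 1, 2, 3, 4])   # shape (5,)
col = row.reshape(5, 1)           # shape (5, 1)

# (5, 5) - (5,): `row` is subtracted from every row of A.
minus_row = A - row
# (5, 5) - (5, 1): col[i] is subtracted from every element of row i.
minus_col = A - col

print(minus_row[:, 0])  # [ 0 10 20 30 40]
print(minus_col[1])     # [ 9 10 11 12 13]
```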

In [0]:
print(np.mean(A))

In [0]:
# in an array of shape (a,b,c,..), a is axis 0, b is axis 1 etc.
print(np.mean(A, axis=1))

In [0]:
print(np.sum(A, axis=0))
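
The axis that is passed to a reduction is the one that disappears from the result's shape. A quick sketch on a hypothetical 3-dimensional array `B`:

```python
import numpy as np

B = np.arange(24).reshape(2, 3, 4)   # shape (2, 3, 4)

print(B.sum(axis=0).shape)  # (3, 4)
print(B.sum(axis=1).shape)  # (2, 4)
print(B.sum(axis=2).shape)  # (2, 3)
print(B.sum())              # 276, no axis given: sum over all elements
```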

## Matplotlib

Plotting library, see [docs](https://matplotlib.org/3.1.1/contents.html).

In [0]:
import matplotlib.pyplot as plt

In [0]:
x = np.linspace(0, 5, 10)
y = x ** 2

In [0]:
# figure()
plt.plot(x, y, 'r', label='label')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.title('title')
plt.show()

In [0]:
# random examples

xx = np.linspace(-0.75, 1., 100)
n = np.array([0,1,2,3,4,5])

fig, axes = plt.subplots(1, 4, figsize=(12,3))

axes[0].scatter(xx, xx + 0.25*np.random.randn(len(xx)))
axes[0].set_title("scatter")

axes[1].step(n, n**2, lw=2)
axes[1].set_title("step")

axes[2].bar(n, n**2, align="center", width=0.5, alpha=0.5)
axes[2].set_title("bar")

axes[3].fill_between(x, x**2, x**3, color="green", alpha=0.5);
axes[3].set_title("fill_between");

## Pandas

Library used to manipulate dataframes. A dataframe is a structured data format used to represent tabular data (or any other data that can be represented in tabular form).

As always, Google or check the [docs](https://pandas.pydata.org/pandas-docs/stable/reference/index.html) in case of queries.

Check out these excellent [exercises](https://github.com/guipsamora/pandas_exercises).
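
As a minimal sketch of what a dataframe looks like, the snippet below builds one by hand from a dictionary of columns. The values and the `label` column are made up for illustration and are not taken from the dataset used below.

```python
import pandas as pd

# Hypothetical values, for illustration only.
df = pd.DataFrame({
    "age": [63, 37, 41],
    "chol": [233, 250, 204],
    "label": ["a", "b", "c"],
})

print(df.shape)            # (3, 3): three rows, three columns
print(list(df.columns))    # ['age', 'chol', 'label']
print(df["chol"].mean())   # 229.0
```

Each column has a single data type, while different columns may have different types.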

In [0]:
import pandas as pd

URL = 'https://raw.githubusercontent.com/HariharasudhanAS/ML-Lab/master/Lab_1/heart.csv'

df_raw = pd.read_csv(URL)

# display the first 5 rows

df_raw.head(5)

In [0]:
df_raw.tail(4)

In [0]:
print(df_raw.shape)

In [0]:
# dtype is the datatype

print(df_raw.dtypes)

In [0]:
print(df_raw.columns)

In [0]:
print(df_raw.describe())

In [0]:
# use display() (a jupyter specific function) if you want to pretty print the output

display(df_raw.describe())

In [0]:
# grouping like in SQL

grouped = df_raw.groupby('age')
grouped_sum = grouped.sum()

In [0]:
grouped_sum.head()

In [0]:
# not recommended: attribute access breaks for column names that clash with DataFrame methods
grouped_sum.thalach.mean()

In [0]:
# recommended
grouped_sum['thalach'].mean()

In [0]:
# Methods can be chained: each call operates on the output of the previous one

df_raw.groupby('age').sum()['thalach'].mean()

In [0]:
# .agg() - Aggregate each group using one or more operations.

df_raw.groupby('age')['thalach'].agg(['mean', 'min', 'max']).head(4)

In [0]:
# Access elements in dataframe by index 

df_raw.iloc[:,0:10].head()

[Difference between iloc, loc and ix](https://stackoverflow.com/questions/31593201/how-are-iloc-ix-and-loc-different) (note that ix has been removed from recent pandas versions; use loc or iloc instead)
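
A short sketch of the two indexers that remain in current pandas: `iloc` selects by integer position, `loc` selects by label. The index labels 10, 20, 30 are hypothetical and chosen so that positions and labels differ.

```python
import pandas as pd

df = pd.DataFrame(
    {"chol": [233, 250, 204]},
    index=[10, 20, 30],   # hypothetical row labels
)

by_position = df.iloc[0, 0]       # first row, first column, by position
by_label = df.loc[20, "chol"]     # row labelled 20, column 'chol'
print(by_position, by_label)      # 233 250

# Slices differ: iloc excludes the end position, loc includes the end label.
print(len(df.iloc[0:2]))   # 2
print(len(df.loc[10:30]))  # 3
```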

In [0]:
# Apply a function to columns or rows

df_raw.apply(np.sum, axis=0).head()

In [0]:
# Using lambda functions

display(df_raw.apply(lambda x: x*2, axis=0).head())

display(df_raw.head())

In [0]:
# Understanding how apply() works
# With the default axis=0, it passes each column to the function as a Series

def own_function(x):
  print(x)
  return pd.Series([2]*4)

df_raw[:2].apply(own_function)
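
The `axis` argument decides what `apply()` hands to the function. A minimal sketch on a small made-up dataframe:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0: the function receives each column, giving one result per column.
col_sums = df.apply(lambda col: col.sum(), axis=0)
# axis=1: the function receives each row, giving one result per row.
row_sums = df.apply(lambda row: row.sum(), axis=1)

print(col_sums.to_dict())  # {'a': 6, 'b': 60}
print(row_sums.tolist())   # [11, 22, 33]
```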

In [0]:
# Slicing is allowed 

df_raw[0:2]

In [0]:
# Plotting 

age_wise = df_raw.groupby('age').mean()

# inplace=True modifies the dataframe itself instead of returning a sorted copy

age_wise.sort_values(by = 'chol', ascending = True, inplace=True)

# create the plot
age_wise['chol'][:10].plot(kind='bar')

# Set the title and labels
plt.xlabel('age')
plt.ylabel('cholesterol')

# show the plot
plt.show()
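
The effect of `inplace=True` in the sorting step above can be seen on a small made-up dataframe. This is only a sketch; the values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"age": [50, 40, 60], "chol": [250, 200, 230]})

# Without inplace, a sorted copy is returned and df is left unchanged.
sorted_copy = df.sort_values(by="chol")
print(df["chol"].tolist())           # [250, 200, 230]
print(sorted_copy["chol"].tolist())  # [200, 230, 250]

# With inplace=True, df itself is reordered and the call returns None.
result = df.sort_values(by="chol", inplace=True)
print(result)                        # None
print(df["chol"].tolist())           # [200, 230, 250]
```

Assigning the returned copy to a new name is often easier to follow than modifying in place, since the original dataframe stays available.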

## Congrats!

On completing this tutorial, please visit the Kaggle in-class competition to compete.