This chapter is a very brief introduction to Python and Jupyter notebooks. We only discuss the content relevant for applying Python to data analysis.

## Installation

If you are new to Python, we recommend downloading the [Anaconda installer](https://docs.anaconda.com/anaconda/install/) and following the installation instructions. Once it is installed, we'll use the Jupyter Notebook interface to write code.

## Jupyter notebook

### Introduction

Jupyter Notebook is an interactive platform where you can write code and text, and create visualizations. You can launch Jupyter Notebook from the Anaconda Navigator, or open the Jupyter Notebook application directly. It should open automatically in your default browser. The figure below shows Jupyter Notebook opened in Google Chrome. This page is called the *landing page* of Jupyter Notebook.

In [16]:
#| echo: false

# import image module
from IPython.display import Image

# get the image
Image(url="./Datasets/jupyter.jpg", width=700, height=400)

To create a new notebook, click on the `New` button and select the `Python 3` option. You should see a blank notebook as in the figure below.

In [22]:
#| echo: false

# import image module
from IPython.display import Image

# get the image
Image(url="./Datasets/jupyter_newbook.jpg", width=600, height=150)

### Writing and executing code

**Code cell:**
By default, a cell is of type *Code*, i.e., it is meant for typing code; this is the default choice in the dropdown menu below the *Widgets* tab. Try typing a line of Python code (say, `2+3`) in an empty code cell and execute it by pressing *Shift+Enter*. This executes the code and moves to the next cell, creating a new code cell if the current cell is the last one. Pressing *Ctrl+Enter* (Windows) or *Cmd+Enter* (Mac) executes the code without moving to or creating a new cell.

**Commenting code in a code cell:** Comments should be written alongside the code to explain its purpose or to briefly describe the tasks it performs. A comment is added in a code cell by preceding it with a `#` sign. For example, see the comment in the code below.

Writing comments helps other users understand your code. It also helps the coder keep track of the tasks their code performs.

In [51]:
#This code adds 3 and 5
3+5

8
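Comments can also follow code on the same line, and a line that is commented out is skipped entirely when the cell runs. A minimal sketch (the variable name `x` is just for illustration):

```python
x = 10
# x = 20   <- this line is commented out, so it never runs
x = x + 1  # an inline comment: increment x by one
print(x)   # prints 11
```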

**Markdown cell:** Although a comment can be written in a code cell, a code cell cannot be used for writing headings/sub-headings, and is not appropriate for writing lengthy chunks of text. In such cases, change the cell type to *Markdown* from the dropdown menu below the *Widgets* tab. Any Markdown cheat sheet found online, for example, [this one](https://www.markdownguide.org/cheat-sheet/), can be used as a guide to format text in Markdown cells.

Give the notebook a name by clicking on the text that says 'Untitled' at the top of the page.

### Saving and loading notebooks

Save the notebook by clicking on `File` and selecting `Save as`, or by clicking on the `Save and Checkpoint` icon (below the `File` tab). The notebook is saved as a file with the extension *.ipynb*. This file contains all the code as well as the outputs, and can be loaded and edited by any Jupyter user. To load an existing notebook, navigate to its folder on the *landing page*, and then click on the file to open it.

## Python language basics

### Object Oriented Programming

Python is an object-oriented programming language. In layman's terms, this means that every number, string, data structure, function, class, module, etc., exists in the Python interpreter as a Python object. An object may have attributes and methods associated with it. For example, let us define a variable that stores an integer:

In [52]:
var = 2

The variable `var` is an object that has attributes and methods associated with it. For example, two of its attributes are `real` and `imag`, which store the real and imaginary parts, respectively, of the object `var`:

In [57]:
print("Real part of 'var': ", var.real)
print("Imaginary part of 'var': ", var.imag)

Real part of 'var':  2
Imaginary part of 'var':  0


**Attribute:** An attribute is a value associated with an object, defined within the class of the object. 

**Method:** A method is a function associated with an object, defined within the class of the object, and has access to the attributes associated with the object.

To see the attributes and methods associated with an object, say `obj`, type `obj.` and press the *Tab* key.
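Outside Jupyter, where Tab completion may not be available, the built-in `dir()` function returns the names of an object's attributes and methods. The sketch below filters out the special double-underscore names to keep the output short; the exact list printed depends on the Python version.

```python
var = 2

# dir() returns a list with the names of all attributes and methods of an object
names = dir(var)

# Drop the special "dunder" names (those starting with an underscore)
public_names = [name for name in names if not name.startswith('_')]
print(public_names)

# Attributes hold values; methods are functions that must be called with ()
print(var.real)          # attribute: 2
print(var.bit_length())  # method: bits needed to represent 2 in binary, i.e. 2
```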

Consider the example below of a class *example_class*:

In [1]:
class example_class:
    class_name = 'My Class'
    def my_method(self):
        print('Hello World!')

e = example_class()

In the above class, `class_name` is an attribute, while `my_method` is a method.
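A sketch of how the attribute and the method are used on the object `e`. The `describe` method is a hypothetical addition, not part of the original class; it illustrates how a method reads the object's attributes through `self`.

```python
class example_class:
    class_name = 'My Class'

    def my_method(self):
        print('Hello World!')

    # Hypothetical extra method (not in the original class):
    # it reads the class_name attribute through self
    def describe(self):
        return 'This object belongs to ' + self.class_name

e = example_class()
print(e.class_name)    # attribute access, no parentheses: My Class
e.my_method()          # method call, parentheses required: Hello World!
print(e.describe())    # This object belongs to My Class
```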

### Assigning variable name to object

When an object is assigned to a variable name, the variable name serves as a reference to the object. For example, consider the following assignment:

In [57]:
x = [5,3]

The variable name `x` is a reference to the memory location where the object `[5, 3]` is stored. Now, suppose we assign `x` to a new variable `y`:

In [58]:
y = x

In the above statement, the variable name `y` now refers to the same object `[5,3]`. The object `[5,3]` does **not** get copied to a new memory location referred to by `y`. To see this, let us add an element to `y`:

In [59]:
y.append(4)
print(y)

[5, 3, 4]


In [60]:
print(x)

[5, 3, 4]


Changing `y` also changed `x`, which shows that `x` and `y` refer to the same object rather than to two separate copies of it.
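When an independent copy is needed, the list's `copy()` method creates a new object. The `is` operator checks whether two names refer to the same object, while `==` compares contents. A minimal sketch (note that `copy()` makes a shallow copy, so objects nested inside the list would still be shared):

```python
x = [5, 3]
y = x            # y refers to the same list object as x
z = x.copy()     # z refers to a new list with the same elements

print(x is y)    # True: same object
print(x is z)    # False: different objects
print(x == z)    # True: equal contents

z.append(4)      # modifying the copy...
print(x)         # ...leaves x unchanged: [5, 3]
print(z)         # [5, 3, 4]
```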

### Importing libraries

There are several [built-in functions](https://docs.python.org/3/library/functions.html) in Python, like `print()`, `abs()`, `max()`, `sum()`, etc., which do not require importing any library. However, these functions are typically insufficient for analyzing data. Some of the popular libraries and their primary purposes are as follows:

1. NumPy: Performing numerical operations and efficiently storing numerical data.
2. Pandas: Reading, cleaning and manipulating data.
3. Matplotlib, Seaborn: Visualizing data.
4. SciPy: Performing scientific computing such as solving differential equations, optimization, statistical tests, etc.
5. Scikit-learn: Data pre-processing and machine learning, with a focus on prediction.
6. Statsmodels: Developing statistical models with a focus on inference.

A library can be imported using the `import` keyword. For example, the NumPy library can be imported as:

In [62]:
import numpy as np

Using the `as` keyword, the NumPy library has been given the name `np`. All the functions and attributes of the library can then be accessed using the *'np.'* prefix. For example, let us generate the sequence of whole numbers less than `8` using the NumPy function [arange()](https://numpy.org/doc/stable/reference/generated/numpy.arange.html):

In [64]:
np.arange(8)

array([0, 1, 2, 3, 4, 5, 6, 7])
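Besides `import ... as ...`, Python offers a couple of other import forms. The sketch below uses the standard-library `math` module purely for illustration, since it needs no installation:

```python
# Import a whole module and access its contents with the module prefix
import math
print(math.sqrt(16))          # 4.0

# Import a module under a shorter alias, as done with NumPy above
import math as m
print(m.pi)

# Import specific names so they can be used without any prefix
from math import sqrt, floor
print(sqrt(25), floor(2.7))   # 5.0 2
```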

### Built-in objects

There are several [built-in objects, modules and functions in python](https://docs.python.org/3/library/stdtypes.html). Below are a few examples:

**Scalar objects:** Python has built-in data types for handling scalar objects such as numbers, strings, and Boolean values; date/time values are handled by the standard-library `datetime` module. The built-in function `type()` can be used to determine the data type of an object:

In [68]:
var = 2.2
type(var)

float
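A few more examples of `type()` on scalar values, plus the built-in constructors `int()`, `float()`, and `str()` that convert between these types:

```python
# type() reports the data type of any object
print(type(2))        # <class 'int'>
print(type(2.2))      # <class 'float'>
print(type('hello'))  # <class 'str'>
print(type(True))     # <class 'bool'>

# Built-in constructors convert between scalar types
n = int(2.7)          # truncates toward zero: 2
f = float('3.5')      # parses a string: 3.5
s = str(10) + '!'     # number to string: '10!'
print(n, f, s)        # 2 3.5 10!
```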

**Date time:** Python has a [datetime](https://docs.python.org/3/library/datetime.html) module in its standard library for handling date/time objects:

In [70]:
import datetime as dt

In [72]:
# Define a datetime object: 20 September 2022, 11:30:00
dt_object = dt.datetime(2022, 9, 20, 11, 30, 0)
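The `datetime` module also supports date arithmetic through `timedelta` objects, which represent durations. A short sketch, reusing the same date as `dt_object`:

```python
import datetime as dt

dt_object = dt.datetime(2022, 9, 20, 11, 30, 0)

# Adding a timedelta shifts a datetime by a fixed duration
one_week_later = dt_object + dt.timedelta(days=7)
print(one_week_later)   # 2022-09-27 11:30:00

# Subtracting two datetimes gives a timedelta
gap = dt.datetime(2022, 12, 25) - dt_object
print(gap.days)         # 95 (whole days; the remaining 12.5 hours are in gap.seconds)
```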

Information about the date and time can be accessed through the relevant attributes of the `datetime` object, such as `day` and `year`:

In [74]:
dt_object.day

20

In [79]:
dt_object.year

2022

The `strftime` method of a `datetime` object formats it as a string. Different format codes give different string representations of the same date:

In [80]:
dt_object.strftime('%m/%d/%Y')

'09/20/2022'

In [100]:
dt_object.strftime('%m/%d/%y %H:%M')

'09/20/22 11:30'

In [99]:
dt_object.strftime('%b-%d-%Y')

'Sep-20-2022'

**range():** The `range()` function returns a sequence of evenly spaced integers. It is commonly used in `for` loops to define the sequence of elements over which the iterations are performed.

Below is an example where the `range()` function is used to create the sequence of natural numbers from 1 to 9 (the stop value, `10`, is excluded):

In [108]:
print(list(range(1,10)))

[1, 2, 3, 4, 5, 6, 7, 8, 9]
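The `range()` function also accepts a start value and an optional step, as `range(start, stop, step)`; the stop value is always excluded. A few illustrative calls:

```python
# range(stop): 0 up to, but not including, stop
first = list(range(5))
print(first)        # [0, 1, 2, 3, 4]

# range(start, stop, step): every step-th integer from start, stop excluded
odds = list(range(1, 10, 2))
print(odds)         # [1, 3, 5, 7, 9]

# A negative step counts down
countdown = list(range(10, 0, -3))
print(countdown)    # [10, 7, 4, 1]
```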

### Control flow

As in other languages, Python has [built-in keywords](https://docs.python.org/3/tutorial/controlflow.html) that control the flow of execution in the code.

**If-elif-else:** The `if-elif-else` statement checks several conditions in order, and executes the code corresponding to the first condition that is true. Note that there can be as many `elif` statements as required.

In [102]:
# Example of if-elif-else
x = 5
if x > 0:
    print("x is positive")
elif x == 0:
    print("x is zero")
else:
    print("x is negative")
    print("This was the last condition checked")

x is positive
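Because the conditions are checked from top to bottom, only the branch of the first true condition runs, even if later conditions are also true. The sketch below uses several `elif` branches inside a function; the function name and the grade cut-offs are hypothetical and chosen only for illustration.

```python
def letter_grade(score):
    """Map a numeric score to a letter grade (hypothetical cut-offs)."""
    if score >= 90:
        return 'A'
    elif score >= 80:      # only reached when score < 90
        return 'B'
    elif score >= 70:      # only reached when score < 80
        return 'C'
    else:                  # runs when none of the conditions above is true
        return 'F'

print(letter_grade(85))    # B (85 >= 70 is also true, but that branch is never reached)
print(letter_grade(42))    # F
```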


**for loop:** A `for` loop iterates over the elements of an object, and executes the statements within the loop in each iteration. For example, below is a `for` loop that prints the odd natural numbers up to 10:

In [103]:
for i in range(10):
    if i%2!=0:
        print(i)

1
3
5
7
9
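A `for` loop can iterate directly over the elements of a list, and the built-in `enumerate()` also provides the index of each element. The `prices` list below is made-up data for illustration:

```python
prices = [12.5, 8.0, 3.25]    # hypothetical data

# Accumulate a running total by iterating over the list elements
total = 0
for price in prices:
    total = total + price
print(total)                  # 23.75

# enumerate() yields (index, element) pairs when the position is also needed
for index, price in enumerate(prices):
    print(index, price)
```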


**while loop:** A `while` loop repeatedly executes a set of statements *while* a condition is satisfied. For example, below is a `while` loop that prints the odd numbers up to 10:

In [106]:
i=0
while i<10:
    if i%2!=0:
        print(i)
    i=i+1

1
3
5
7
9
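A `while` loop is most useful when the number of iterations is not known in advance. The sketch below counts how many times `100` must be halved before it drops below `1`:

```python
value = 100
steps = 0

# Keep halving until the condition value >= 1 is no longer satisfied
while value >= 1:
    value = value / 2
    steps += 1

print(steps)   # 7 (100 -> 50 -> 25 -> 12.5 -> 6.25 -> 3.125 -> 1.5625 -> 0.78125)
```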
