# SMAI Tutorial-1: Introduction to Python (and some other relevant stuff)

## Basic Setup to get started

### Version Control using Git

- Git and GitHub are often confused, but they are not the same thing

**Git**:
- Git is a version control system used in software development
- Tracks the changes that you make to files in your project
- You can revert to an older version if you ever need to
- It makes collaboration easier, allowing changes by multiple people to be merged into one source.

<!-- ![Git](https://git-scm.com/book/en/v2/images/local.png) -->
<img src="https://git-scm.com/book/en/v2/images/local.png" alt="local" width="350" height="300"/>

**GitHub**:
- GitHub is a hosting service built around version control with Git
- It allows multiple people across the world to contribute to the same project

<!-- ![Git](https://git-scm.com/book/en/v2/images/centralized.png) -->
<img src="https://git-scm.com/book/en/v2/images/centralized.png" alt="central" width="400" height="300"/>

> This is more of a **self-study** topic, since most people get confused when introduced to git and this tutorial is not about that. But you are highly recommended to get used to git since it will help you in the future.

Tutorials:
- Git for Dummies: https://youtu.be/mJ-qvsxPHpY
- Basics of using git: https://rogerdudler.github.io/git-guide/

Make sure you know these commands:
- `git pull`
- `git clone <repository url>`
- `git add <filename>`
- `git commit -m <commit message>`
- `git push`

Other useful commands to store credential information:

- `git config --global user.name <github username>`
- `git config --global user.email <github associated email>`
- GitHub introduced Personal Access Tokens (PATs) as an alternative to passwords for authenticating to GitHub through the GitHub API or the command line. Check this documentation for usage.
- To avoid entering the token every time, cache your Git credentials or use the GitHub CLI command `gh auth login`
- Download github cli : https://cli.github.com/
- Git credential storage: https://git-scm.com/book/en/v2/Git-Tools-Credential-Storage

> Make sure to learn about `branches`, `merge/pull requests`, `diff`

- It will save you a lot of headaches when collaborating on a project and help you avoid conflicts
- It's not as confusing as it might initially seem
- Looking up documentation is never a bad idea: https://git-scm.com/doc

### Python Environment

- An environment manager is a tool that keeps the dependencies required by different projects separate by creating isolated Python virtual environments for them.
- This is very useful when working locally. For starting out, Google Colab is recommended; you can set up local environments later on, when working on projects or bigger assignments.
- Popular Python Environment Managers: `virtualenv`, `conda`


<img src="https://www.dataquest.io/wp-content/uploads/2022/01/python-virtual-envs1.webp" alt="central" width="550" height="350"/>

#### Virtualenv Setup
<img src="https://avatars.githubusercontent.com/u/647025" alt="central" width="150" height="150"/>

**virtualenv**: https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/#creating-a-virtual-environment

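- Install package: `python3 -m pip install --user virtualenv` (optional: the `venv` module used below ships with Python 3)
- Create Environment: `python3 -m venv /path/env_name`
- Activate Environment: `source /path/env_name/bin/activate`
- Install packages after activating an environment: `python3 -m pip install package_name` (without `--user`, which does not apply inside a virtual environment)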
- Deactivate Environment: `deactivate`
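
To confirm from Python that an environment is active, one option is to compare `sys.prefix` with `sys.base_prefix`. The sketch below assumes a `venv` or recent `virtualenv` environment (it does not detect conda environments), and the helper name `in_virtualenv` is made up for this example.

```python
import sys

def in_virtualenv():
    """Return True when the interpreter runs inside a venv/virtualenv environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

Running this before and after `source /path/env_name/bin/activate` should show the interpreter path change.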

#### Conda Setup

<img src="https://upload.wikimedia.org/wikipedia/commons/e/ea/Conda_logo.svg" alt="logo" width="150" height="150"/>

**conda**: https://conda.io/projects/conda/en/latest/user-guide/install/index.html

(It's recommended to use `miniconda` instead of `anaconda`)

- Install Miniconda: https://docs.conda.io/en/latest/miniconda.html
- Create Environment: `conda create -n env_name python`
- Activate Environment: `conda activate env_name`
- Install Packages: `conda install -n env_name package_name`, or use pip after activating the environment: `python3 -m pip install package_name`
- Deactivate Environment: `conda deactivate`

### Jupyter Notebook

<img src="https://jupyter.org/assets/homepage/main-logo.svg" alt="logo" width="150" height="150"/>

- The Jupyter Notebook is an open source web application that you can use to create and share documents that contain live code, equations, visualizations, and text.
- Install: https://jupyter.org/install
- You could also use notebooks on [Google Colab](https://colab.research.google.com/) (elaborated below) or [Kaggle](https://www.kaggle.com/).

#### Using Markdown

<img src="https://upload.wikimedia.org/wikipedia/commons/4/48/Markdown-mark.svg" alt="logo" width="150" height="150"/>


- Markdown is a language that was created with the goal of enabling people "to write using an easy-to-read and easy-to-write plain text format, optionally convert it to structurally valid XHTML (or HTML)."

Doc: https://www.markdownguide.org/basic-syntax/

**This line is written in bold.**

*~~This line in italics is struck through~~*


[Inline-style link](https://www.google.com)


Display image: ![Test Image](https://picsum.photos/200 "Random Image")


Show your code:
```python
# This is Markdown syntax highlighting; the code is not executed inside a Markdown cell
print("Hello World")
```


Horizontal Rule:

---


Tables:

Tables | In | Markdown
--- | --- | ---
*Can* | `renders` | **nicely**
Zebras | stripes | are neat

#### Using Inline Math in Markdown

Doc: https://latex-tutorial.com/tutorials/amsmath/

***LaTeX Equations***: $e^{i\pi} + 1 = 0$

***Expressions***:

$$ e^x=\sum_{i=0}^\infty \frac{1}{i!}x^i $$


#### Using Code Block

In [None]:
# There will be an output cell after every code block
print('Hello World 🤡')

Hello World 🤡


### Google Colab

<img src="https://upload.wikimedia.org/wikipedia/commons/d/d0/Google_Colaboratory_SVG_Logo.svg" alt="logo" width="150" height="150"/>

- Colab is a free Jupyter notebook environment that runs entirely in the cloud
- It does not require any setup, and the notebooks that you create can be edited simultaneously by your team members, just the way you edit documents in Google Docs.
- It also supports many popular machine learning frameworks like PyTorch and TensorFlow

#### Mounting Google Drive and using it as an extended filesystem

In [None]:
from google.colab import drive
drive.mount("/content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


#### Access files from the mounted drive

In [None]:
# Access a folder in your drive
# You can copy the images folder to your drive, so that you can access it
IMAGES_FOLDER_PATH_1 = "/content/drive/My Drive/SMAI_Tutorial_1"

# Or you can directly add the file to your Colab notebook
# But the data will disappear when opening a new session
# IMAGES_FOLDER_PATH_2 = "."
IMAGES_FOLDER_PATH_2 = "/usr/local/share/jupyter/nbextensions"

#### Access files from Current Colab Storage Session

In [None]:
# `-nc` : skip downloads that would download to existing files.
# `-N` : will download and overwrite the file only if the server has a newer version
# `-N` doesn't work properly if server isn't configured for it

## Download data
# !wget -N https://chip.unc.edu/wp-content/uploads/2020/12/you-mean-to-tellme-datascience-is-more-than-a-buzzword-49117718-e1607112463478.png -O buzzword.png
!wget -nc https://chip.unc.edu/wp-content/uploads/2020/12/you-mean-to-tellme-datascience-is-more-than-a-buzzword-49117718-e1607112463478.png -O /usr/local/share/jupyter/nbextensions/buzzword.png

File ‘/usr/local/share/jupyter/nbextensions/buzzword.png’ already there; not retrieving.


In [None]:
!pwd

/content


In [None]:
!ls -la

total 396
drwxr-xr-x 1 root root   4096 Jan 10 05:54 .
drwxr-xr-x 1 root root   4096 Jan 10 05:52 ..
-rw-r--r-- 1 root root 378953 Dec  4  2020 buzzword.png
drwxr-xr-x 4 root root   4096 Jan  5 14:33 .config
drwx------ 6 root root   4096 Jan 10 05:54 drive
drwxr-xr-x 2 root root   4096 Jan 10 05:54 Outputs
drwxr-xr-x 1 root root   4096 Jan  5 14:34 sample_data


In [None]:
%%html
<!-- Cell magics such as %%html let a code cell hold another language; the magic must be the first line of the cell -->
<!-- <img src="/usr/local/share/jupyter/nbextensions/buzzword.png" alt="meme" width="200" height="200"/> -->
<img src="/nbextensions/buzzword.png" alt="meme" width="200" height="200"/>

## Intro to Python

<img src="https://upload.wikimedia.org/wikipedia/commons/c/c3/Python-logo-notext.svg" alt="python" width="150" height="150"/>

- For those who are not so familiar with python, check this out: https://www.edureka.co/blog/data-structures-in-python/
- Python has built-in data structures such as lists, dictionaries, tuples and sets. Look them up in the resources below as and when necessary.

**Resources**:
- https://ocw.mit.edu/courses/6-0001-introduction-to-computer-science-and-programming-in-python-fall-2016/pages/lecture-slides-code/
- https://www.edx.org/professional-certificate/introduction-to-python-programming
- https://cs229.stanford.edu/section/cs229_python_tutorial/cs229_python_friday.pdf

### Basic Data structures in Python

- This section covers only two built-in data structures, `list` and `dictionary` (`dict`), so that we can understand some of the things going forward in this tutorial
- You are highly recommended to check out the other built-in data structures using the links given above

#### List

- Store data in a contiguous sequential manner
- Data of different data types can exist in a python list

In [None]:
# Creation

l = [3, 2, 1, 4]

print(l)
type(l[0])

[3, 2, 1, 4]


int

In [None]:
# Built-in sort method: sorts the list in place (a stable Timsort-based algorithm, not quicksort)

l.sort()
l

[1, 2, 3, 4]

In [None]:
# Creating another list l2 using a list comprehension (a compact loop syntax)

l2 = [(x+10) for x in l]


# Concatenation of lists can be simply done by '+' operator

l1 = l + l2
l1

[1, 2, 3, 4, 11, 12, 13, 14]

In [None]:
# Mixing different data types in one list is valid

l = [1, 1.0, '1', True]

[type(x) for x in l]

[int, float, str, bool]

In [None]:
# Range function

l = list(range(10))

# Normal for loop syntax in python

for x in l:
    print(x, end=' ')

0 1 2 3 4 5 6 7 8 9 
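
Because a list is an ordered sequence, its items can be reached by position. The following is a small illustrative sketch of indexing, slicing and appending; the variable names are made up for this example.

```python
nums = list(range(10))   # [0, 1, ..., 9]

first = nums[0]          # indexing starts at 0
last = nums[-1]          # negative indices count from the end
middle = nums[2:5]       # slice: items at positions 2, 3 and 4
evens = nums[::2]        # every second item

nums.append(10)          # lists are mutable: add an item at the end

print(first, last)       # 0 9
print(middle)            # [2, 3, 4]
print(evens)             # [0, 2, 4, 6, 8]
print(len(nums))         # 11
```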

#### Dictionary

- Stores a collection of key -> value mappings (insertion order is preserved)
- Keys are unique in a dictionary

In [None]:
# Creation

my_dict = {2: 'Java', 1: 'Python'} #dictionary with elements
print(my_dict)

{2: 'Java', 1: 'Python'}


In [None]:
print(my_dict.keys()) #get keys
print(my_dict.values()) #get values
print(my_dict.items()) #get key-value pairs

dict_keys([2, 1])
dict_values(['Java', 'Python'])
dict_items([(2, 'Java'), (1, 'Python')])


In [None]:
for key, value in my_dict.items():
    print(f'({key}, {value})', end= ' ')

(2, Java) (1, Python) 

In [None]:
# Adding / Changing (Key, Value) Pair

my_dict[3] = 'Rust'
print(my_dict)

my_dict[2] = 'Haskell'
print(my_dict)

{2: 'Java', 1: 'Python', 3: 'Rust'}
{2: 'Haskell', 1: 'Python', 3: 'Rust'}
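
A few other dictionary operations come up often: testing for a key, looking up with a fallback, and removing an entry. The sketch below uses a fresh dictionary named `langs`, chosen only for this illustration.

```python
langs = {2: 'Haskell', 1: 'Python', 3: 'Rust'}

has_python = 1 in langs               # membership test looks at the keys
fallback = langs.get(4, 'unknown')    # .get() returns a default instead of raising KeyError
removed = langs.pop(3)                # remove a key and return its value

print(has_python)      # True
print(fallback)        # unknown
print(removed)         # Rust
print(langs)           # {2: 'Haskell', 1: 'Python'}
print(sorted(langs))   # [1, 2]
```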


### Function

- A function is a block of code which only runs when it is called.
- You can pass data, known as parameters, into a function.
- A function can return data as a result.

In [None]:
def fun(x = 0):
    # Recursively computes x + (x-1) + ... + 1; returns x unchanged when x <= 0
    if x <= 0:
        return x

    return x + fun(x-1)

In [None]:
fun(x = 10)

55
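
Parameters can be passed by position or by keyword, and a function can return several values at once as a tuple. A minimal sketch, with a function name (`describe`) and parameters invented for this example:

```python
def describe(values, scale=1.0):
    """Return the (minimum, maximum, mean) of values, each multiplied by scale."""
    scaled = [v * scale for v in values]
    return min(scaled), max(scaled), sum(scaled) / len(scaled)

# Positional argument, default scale; the returned tuple is unpacked into three names
low, high, mean = describe([1, 2, 3, 4])
print(low, high, mean)   # 1.0 4.0 2.5

# Keyword arguments make the call site self-explanatory
print(describe(values=[1, 2, 3, 4], scale=2))   # (2, 8, 5.0)
```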

### Class

- Python is an object oriented programming language.

- Almost everything in Python is an object, with its properties and methods.

- A Class is like an object constructor, or a "blueprint" for creating objects.

In [None]:
# Defining Class

class Person:
    def __init__(self, _name, _age):
        self.name = _name
        self.age = _age

    def sayHi(self):
        print(f'Hello, my name is {self.name} and I am {self.age} years old!')

In [None]:
# Creating Object instance of class

p1 = Person('Bob', 25)
p1.sayHi() # Prints: Hello, my name is Bob and I am 25 years old!

Hello, my name is Bob and I am 25 years old!
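
To see what "almost everything is an object" means in practice, the short sketch below checks a few ordinary values; the function `square` is defined only for this illustration.

```python
def square(x):
    return x * x

# Integers, strings, functions and even classes are all objects.
for thing in (42, 'text', square, int):
    print(type(thing).__name__, isinstance(thing, object))

bits = (255).bit_length()   # integers have methods too
name = square.__name__      # functions carry attributes

print(bits)   # 8
print(name)   # square
```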


### File Handling

- It is strongly advised to open files with a context manager (the `with` statement).
- The advantage is that the file is always closed when the block ends, even if an error occurs.
- We use Python's `os` module to interact with the OS, which includes creating directories, listing files in a directory, etc.
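
The guarantee that a context manager closes the file can be checked directly. The sketch below works in a temporary directory (an assumption made so that it does not touch your real files) and simulates a failure with a deliberately raised `ValueError`.

```python
import os
import tempfile

# Work in a throwaway directory so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.txt')

    # Normal case: the file is closed as soon as the with-block ends.
    with open(demo_path, 'w') as f:
        f.write('hello\n')
    closed_after_write = f.closed

    # Error case: the file is still closed, even though the block raised.
    try:
        with open(demo_path) as f:
            raise ValueError('simulated failure while reading')
    except ValueError:
        closed_after_error = f.closed

print(closed_after_write)   # True
print(closed_after_error)   # True
```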

In [None]:
import os

In [None]:
dir_name = 'Outputs'
dir_path = dir_name  # relative to the current working directory

# Create the directory; exist_ok=True avoids an error if it already exists
os.makedirs(dir_path, exist_ok=True)

file_name = 'tut_1.txt'
file_path = os.path.join(dir_path, file_name)

#### Writing to a file

In [None]:
temp = list(range(10))

# 'w' mode truncates the file when it is opened, so any existing data is overwritten
# It will create a new file if it doesn't exist

with open(file_path, "w") as text_file:
    for x in temp:
        print(f"Roll No.: {x}", file=text_file)

#### Appending a file

In [None]:
# In 'a' mode, the file pointer starts at the end of the file, so new data is appended

with open(file_path, "a") as text_file:
    print("End of class.", file=text_file)

#### Reading from a file

You can use:
- `f.readlines()`: reads and returns a list of lines without any change (each line keeps its trailing `\n`)
- `f.read().splitlines()`: same as above but removes `\n` from the end of each line
- `line.rstrip()`: removes all whitespace characters from the end of a line

In [None]:
with open(file_path, "r") as text_file:
    for line in text_file:
        print(line.rstrip())

Roll No.: 0
Roll No.: 1
Roll No.: 2
Roll No.: 3
Roll No.: 4
Roll No.: 5
Roll No.: 6
Roll No.: 7
Roll No.: 8
Roll No.: 9
End of class.
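
The difference between the reading options listed above is easiest to see side by side. This sketch writes a small two-line sample file into a temporary directory (file name and contents are made up for the example) and reads it back in three ways.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample.txt')
    with open(sample_path, 'w') as f:
        f.write('first line\nsecond line  \n')

    with open(sample_path) as f:
        raw_lines = f.readlines()            # keeps the trailing '\n'
    with open(sample_path) as f:
        split_lines = f.read().splitlines()  # drops only the line breaks
    stripped = [line.rstrip() for line in raw_lines]  # drops all trailing whitespace

print(raw_lines)     # ['first line\n', 'second line  \n']
print(split_lines)   # ['first line', 'second line  ']
print(stripped)      # ['first line', 'second line']
```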


## Numpy

<img src="https://numpy.org/images/logo.svg" alt="python" width="150" height="150"/>

- Install: https://numpy.org/install/
- Doc: https://numpy.org/doc/stable/

Uses:
- **NumPy**: Package for vector and matrix manipulation. Broadcasting and vectorization save time and reduce the amount of code.
- **Broadcasting**: NumPy compares the shapes of the operands dimension by dimension, starting from the last one. Two dimensions are compatible when they are equal or when one of them is 1 (missing leading dimensions are treated as 1), and the smaller array is stretched to match. Be careful with DIMENSIONS when performing NumPy array operations.
- **Vectorization**: Implementing array operations using highly optimized modules instead of using loops in order to save time
- All these features result in ***shorter* code** and ***faster* execution time** as compared to using built-in data structures.
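
As a rough illustration of the two ideas above, the sketch below computes squares with and without a loop and then adds a column to a row so that broadcasting produces a full table. The array names and values are chosen only for this example.

```python
import numpy as np

# Vectorization: one array expression replaces an explicit Python loop.
x = np.arange(5)                       # array([0, 1, 2, 3, 4])
squares_loop = [v * v for v in x]      # element by element in Python
squares_vec = x * x                    # same result, computed inside NumPy
print(squares_vec)                     # [ 0  1  4  9 16]

# Broadcasting: shapes (3, 1) and (4,) are stretched to a common shape (3, 4).
col = np.array([[0], [10], [20]])      # shape (3, 1)
row = np.array([1, 2, 3, 4])           # shape (4,)
table = col + row
print(table.shape)                     # (3, 4)
print(table)

# Incompatible shapes raise an error instead of being guessed.
try:
    np.ones((3, 2)) + np.ones(4)
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print('broadcast error:', err)
```

Printing `.shape` before combining arrays is a cheap way to catch dimension mistakes early.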

In [None]:
# Installing Library via Notebook

!python3 -m pip install numpy
import numpy as np

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


### > Numpy Tutorial Notebook

**Link**: https://www.kaggle.com/flugeltomar/smai-tutorial-1-numpy

**Link**: https://colab.research.google.com/drive/16K8Yq2RaqroTYFDzSuRCO4tQry3CmfPx?usp=sharing

## Pandas

<img src="https://upload.wikimedia.org/wikipedia/commons/e/ed/Pandas_logo.svg" alt="python" width="150" height="150"/>

- Install: https://pandas.pydata.org/getting_started.html
- Doc: https://pandas.pydata.org/docs/

Uses:
- Pandas is a Python library for data analysis, but we will mainly use it for processing `csv` (comma-separated values) files and `json` files and converting them into usable NumPy arrays.
- Pandas has two main types of *objects for storing data* that make analytical tasks easier and eliminate the need to switch tools: **Series**, which have a list-like structure, and **DataFrames**, which have a tabular structure.
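
A minimal sketch of the two object types and of the conversion to a NumPy array follows. The column names and values are invented for illustration; in practice the DataFrame would typically come from a file via `pd.read_csv`.

```python
import pandas as pd

# Series: a one-dimensional, list-like structure with an index
s = pd.Series([10, 20, 30], name='marks')

# DataFrame: a table whose columns are Series
df = pd.DataFrame({
    'roll_no': [1, 2, 3],
    'marks': [71.5, 88.0, 64.0],
})

print(s.mean())             # 20.0
print(df.shape)             # (3, 2)
print(df['marks'].max())    # 88.0

# Convert the table into a plain NumPy array
arr = df.to_numpy()
print(type(arr).__name__, arr.shape)   # ndarray (3, 2)
```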

In [None]:
# Installing Library via Notebook

!python3 -m pip install pandas
import pandas as pd

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


### > Pandas Tutorial Notebook


**Link**: https://www.kaggle.com/code/flugeltomar/smai-tutorial-1

**Link**: https://colab.research.google.com/drive/1tM4hJe3LZlVoo63la1sm0ng0Wr8Z5ps1?usp=sharing

## Matplotlib

<img src="https://upload.wikimedia.org/wikipedia/en/5/56/Matplotlib_logo.svg" alt="python" width="150" height="150"/>

- Install: https://matplotlib.org/stable/users/getting_started/
- Doc: https://matplotlib.org/stable/index.html

Uses:
- One of the most popular and easy-to-use data visualization libraries
- It is designed to work with NumPy
- Large number of options and configurations to display various types of plots

In [None]:
# Installing Library via Notebook

!python3 -m pip install matplotlib
import matplotlib.pyplot as plt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


### > Matplotlib Tutorial Notebooks

- **Link_1**: https://colab.research.google.com/drive/16MUiGXlAjmk_fH3pVFTfgae-uV38RDJQ?usp=sharing
- **Link_2**: https://colab.research.google.com/drive/16eZrlGjZdTf50QdKsQKmxN3fMN7WOwxC?usp=sharing

## Scikit-learn

<img src="https://upload.wikimedia.org/wikipedia/commons/0/05/Scikit_learn_logo_small.svg" alt="python" width="150" height="150"/>

- Install: https://scikit-learn.org/stable/install.html
- Doc: https://scikit-learn.org/stable/user_guide.html

Uses:
- It is a simple and robust library for classical machine learning in Python.
- Efficient implementations of machine learning and statistical methods for tasks such as classification, regression, clustering and dimensionality reduction.
- Built upon NumPy, SciPy and Matplotlib, thus providing high-performance computation.

In [None]:
# Installing Library via Notebook

!python3 -m pip install scikit-learn
import sklearn as sk

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


### > Scikit-learn Tutorial Notebook

**Link**: https://colab.research.google.com/drive/1DjXPHJV-ji5rQ_hJGBJaQR2MOC_qFsEz?usp=sharing

## SciPy

<img src="https://scipy.org/images/logo.svg" alt="python" width="150" height="150"/>

- Install: https://scipy.org/install/

- Doc: https://docs.scipy.org/doc/scipy/

Uses:
- SciPy provides algorithms for optimization, integration, interpolation, eigenvalue problems, algebraic equations, differential equations, statistics and many other classes of problems.
- Extends NumPy, providing additional tools for array computing as well as specialized data structures, such as sparse matrices and k-dimensional trees.

In [None]:
# Installing Library via Notebook

!python3 -m pip install scipy
import scipy as sp

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


### > SciPy Tutorial Notebook

- Everything in this course can be done using the libraries mentioned before SciPy (except Deep Learning Tasks which require ML Frameworks)
- And thus we believe that it's not required to introduce this library, as it would be a waste of time for everyone.
- Popular ML Frameworks required for some course work would be introduced in future tutorials.