# A Brief Introduction to Python (for ST 563)

This notebook provides a **basic introduction to Python** for statistical learning, parallel to the structure of the original R handout from Dr. Arnab Maity. We use core Python along with **NumPy** (arrays, vector/matrix math) and **pandas** (data frames).


## Contents
1. [Python, Jupyter, and Packages](#python-jupyter)
2. [Basic Data Types in Python](#basic-types)
3. [Variable Names and Assignment](#assignment)
4. [Basic Operations](#basic-ops)
5. [Relational Operations](#rel-ops)
6. [Lists, Tuples, and Dictionaries](#lists-tuples-dicts)
7. [NumPy Arrays ("Vectors")](#numpy-vectors)
8. [Vector & Matrix Operations with NumPy](#vector-matrix)
9. [pandas DataFrames](#dataframes)
10. [Control Flow](#control-flow)
11. [Functions](#functions)


## 1. Python, Jupyter, and Packages <a id='python-jupyter'></a>


- **Python** is an open-source programming language widely used in data science.
- **Jupyter Notebook** lets you mix code, results, and narrative text (like this file).
- We'll primarily use:
  - `numpy` for efficient numerical arrays and linear algebra;
  - `pandas` for tabular data (data frames);
  - `math` and `statistics` from the standard library for basic math and stats helpers.

> Tip: In most environments, install packages via `pip install numpy pandas` in a terminal.


In [None]:
import numpy as np
import pandas as pd
import math, statistics
np.__version__, pd.__version__


### Basic Use of Python

- Jupyter notebooks contain **code cells** (for execution) and **markdown cells** (for formatted text).
- Run code with `Shift+Enter`.
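- Python assignment `=` binds a name to an object in memory; it does not copy the object. (R, by contrast, behaves as if assignment copies, through copy-on-modify.)
- If two names are bound to the same list, a change made through one name is visible through the other. Use `.copy()` to get an independent (shallow) copy.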
- Example:
```python
x = [1, 2, 3]
y = x
x[0] = 100
print(x, y)  # both change!
```
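To keep the two names independent, copy the list before modifying it. A minimal sketch of the difference (the names `original`, `alias`, and `independent` are illustrative and not part of the handout):

```python
original = [1, 2, 3]
alias = original                 # second name for the same list object
independent = original.copy()    # new list with the same elements (shallow copy)

original[0] = 100

print(alias)         # [100, 2, 3]
print(independent)   # [1, 2, 3]
print(alias is original, independent is original)   # True False
```

The `is` operator tests object identity, which makes it a quick way to check whether two names refer to the same object.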


### Asking for Help <a id='help'></a>


- Use the built-ins `help(obj)` and `dir(obj)`. In IPython and Jupyter you can usually also type `obj?` to display the docstring.


In [None]:
help(len)      # show docstring for built-in len
dir(math)      # attributes in the math module


## 2. Basic Data Types in Python <a id='basic-types'></a>


### Numeric Types and Booleans

- Python automatically distinguishes `int` and `float`:
  - `10` is an `int`, `10.0` is a `float`.
- Type casting: `int(10.9)` returns `10` (it truncates toward zero rather than rounding), and `float(3)` returns `3.0`.
- Augmented assignment: `x += 1` is shorthand for `x = x + 1`.
- `bool` is a subclass of `int`, so `True == 1` and `False == 0`. This is useful for counting how many conditions hold:
```python
sum([True, False, True])  # returns 2
```
- Floats are stored in binary, so some decimal fractions are not represented exactly: `0.1 + 0.2` evaluates to `0.30000000000000004`.
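Because of this rounding, it is usually safer to compare floats with a tolerance than with `==`. The sketch below uses `math.isclose` from the standard library with its default tolerances, and also illustrates casting and Boolean counting. The variable names and the cutoff of 70 are illustrative.

```python
import math

total = 0.1 + 0.2
print(total == 0.3)               # False: exact comparison fails
print(math.isclose(total, 0.3))   # True: equal within a relative tolerance

# int() truncates toward zero; round() rounds to the nearest integer
print(int(10.9), int(-10.9), round(10.9))   # 10 -10 11

# Booleans count as 1/0, so sum() counts how many conditions hold
scores = [55, 82, 91, 67]
n_passing = sum(s >= 70 for s in scores)
print(n_passing)                  # 2
```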


Common built-in scalar types:
- `int` (e.g., `1`, `-3`), `float` (e.g., `2.0`, `-3.9`), `bool` (`True`, `False`), `str` (`"Arnab"`), and `complex` (e.g., `1+2j`).  
(There is no separate "raw" type as in R; Python has `bytes` for raw byte sequences.)


In [None]:
number = 2              # int
flt = 2.5               # float
name = "Arnab"          # str
truth = True            # bool
z = 1 + 2j              # complex

print(type(number), type(flt), type(name), type(truth), type(z))


## 3. Variable Names and Assignment <a id='assignment'></a>


Use `=` to assign values to names. Names must start with a letter or underscore and may contain letters, digits, and underscores. They are case-sensitive and cannot be reserved keywords such as `for` or `class`.


In [None]:
x = 2
y = 3
print("x =", x, " y =", y)


## 4. Basic Operations <a id='basic-ops'></a>


Arithmetic with numbers:
- `+`, `-`, `*`, `/` (true division), `//` (floor division), `%` (modulus), `**` (exponent).  
Math functions: `math.exp`, `math.log`, `math.sin`, etc.


In [None]:
x, y = 2, 3
print(x + y, x**2, math.cos(math.pi * x / 2))
print("mod:", 7 % 3, "int-div:", 7 // 3)
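The difference between `/` and `//` matters most for negative numbers, where floor division rounds toward negative infinity. A short illustration using only built-in operators and `divmod`:

```python
print(7 / 2)      # 3.5   (true division always returns a float)
print(7 // 2)     # 3     (floor division)
print(-7 // 2)    # -4    (rounds toward negative infinity, not toward zero)
print(-7 % 2)     # 1     (result takes the sign of the divisor)
print(2 ** 10)    # 1024

q, r = divmod(17, 5)   # quotient and remainder in one call
print(q, r)       # 3 2
```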


## 5. Relational Operations <a id='rel-ops'></a>


- `<`, `<=`, `>`, `>=`, `==`, `!=` return Booleans (`True`/`False`).  
Also logical operators: `and`, `or`, `not`.


In [None]:
x, y = 2, 3
print(x == y, x != y, x <= y)
print("strings equal?", "Arnab" == "Maity")
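The logical operators combine comparisons, and Python also allows comparisons to be chained as in mathematical notation. A brief sketch with the same `x` and `y` (the name `in_range` is illustrative):

```python
x, y = 2, 3

print(x < y and y < 10)   # True
print(x > y or y == 3)    # True
print(not x == y)         # True

# Chained comparison: equivalent to (0 < x) and (x < y)
print(0 < x < y)          # True

in_range = 0 <= x <= 1
print(in_range)           # False
```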


### Lists and Strings

- Strings and lists are both sequence types: they support indexing, slicing, and iteration.
- Create lists by literals `[ ]`, with `list()`, by appending, or with comprehensions:
```python
[x**2 for x in range(5) if x % 2 == 0]
```
- Negative indexing: `lst[-1]` returns the last element.
- Strings index like lists of characters (`"hello"[1]` is `'e'`), but unlike lists they are immutable.
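The indexing and slicing rules above can be tried out on a small list and a string. This is a minimal sketch; the values and the names `lst`, `even_squares`, and `word` are illustrative.

```python
lst = [10, 20, 30, 40, 50]

print(lst[0], lst[-1])    # 10 50
print(lst[1:3])           # [20, 30]  (the stop index is excluded)
print(lst[:2], lst[3:])   # [10, 20] [40, 50]

lst.append(60)            # lists are mutable
print(len(lst))           # 6

even_squares = [x**2 for x in range(5) if x % 2 == 0]
print(even_squares)       # [0, 4, 16]

word = "hello"
print(word[1], word[-1], word[1:4])   # e o ell
# word[0] = "H" would raise TypeError, because strings cannot be modified in place
```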


## 6. Lists, Tuples, and Dictionaries <a id='lists-tuples-dicts'></a>


- **List:** ordered, mutable sequence: `[1, 2, 3]`  
- **Tuple:** ordered, immutable sequence: `(1, 2, 3)`  
- **Dict:** key–value mapping: `{'name': 'Arnab', 'is_student': False}`

Indexing starts at 0 (not 1 as in R). Slicing uses `start:stop` with half-open intervals, so `lst[0:2]` returns the first two elements.


### Dictionaries

- A dictionary stores key–value pairs. Keys must be hashable, which in practice means immutable types such as `str`, `int`, or a `tuple` of immutable items. Values can be any object.
- Create with `{}` or `dict()`. Also from zipped lists:
```python
keys = ["a","b","c"]; vals = [1,2,3]
dict(zip(keys, vals))
```
- Methods: `.get()`, `.keys()`, `.values()`, `.items()`, `.update()`, `.pop()`.
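- Merge dictionaries with `{**dict1, **dict2}` (or `dict1 | dict2` in Python 3.9+). On duplicate keys, the value from the right-hand dictionary wins.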
- Dictionary comprehensions: `{k: k**2 for k in range(5)}`.
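The methods listed above can be seen together in a short sketch. The dictionary contents and the names `grades`, `extra`, and `merged` are illustrative.

```python
grades = dict(zip(["math", "english"], [91, 85]))

print(grades.get("math"))         # 91
print(grades.get("art", 0))       # 0: default returned for a missing key

grades.update({"history": 99})    # add or overwrite entries in place
removed = grades.pop("english")   # remove a key and return its value
print(removed)                    # 85

for subject, score in grades.items():   # iterate over key-value pairs
    print(subject, score)

extra = {"math": 95, "art": 70}
merged = {**grades, **extra}      # on duplicate keys, the right-hand dict wins
print(merged)                     # {'math': 95, 'history': 99, 'art': 70}
```

Using `.get()` with a default avoids the `KeyError` that `grades["art"]` would raise for a missing key.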


In [None]:
lst = [1, 2, 3]
tpl = (1, 2, 3)
d = {'number': 23, 'name': 'Arnab', 'is_student': False}

print(lst[0], tpl[1], d['name'])
lst[1] = 31              # mutate list
d['new_data'] = -23      # add new key
lst, d


## 7. NumPy Arrays ("Vectors") <a id='numpy-vectors'></a>


### NumPy Arrays

- NumPy arrays are typically much faster than lists for numerical work and require homogeneous data (a single `dtype`).
- Attributes: `.shape` (dimensions), `.dtype` (data type), `.ndim` (number of dimensions).
- Creation:
  - `np.zeros((m,n))`, `np.ones((m,n))`, `np.full((m,n), value)`, `np.eye(n)` (identity), `np.random.random((m,n))`.
- Reshape with `.reshape()`. Note: this returns a **view** of the same data whenever possible (otherwise a copy), so call `.copy()` if you need an independent array.
- Vectorized operations are elementwise by default.
- Linear algebra: `@` or `np.matmul` for matrix multiply, `np.linalg.inv`, `np.linalg.det`, `np.linalg.eig`.
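The creation functions and the view behaviour of `.reshape()` can be checked directly. In the sketch below the reshape of a contiguous 1-D array yields a view; `np.shares_memory` is used to confirm this. Array names are illustrative.

```python
import numpy as np

Z = np.zeros((2, 3))
print(Z.shape, Z.ndim, Z.dtype)   # (2, 3) 2 float64

I3 = np.eye(3)                    # 3x3 identity matrix
F = np.full((2, 2), 7.0)          # 2x2 array filled with 7.0

v = np.arange(6)                  # 0, 1, ..., 5
V = v.reshape(2, 3)               # a view here: shares memory with v
V[0, 0] = 99
print(v[0])                       # 99: the original array changed too

W = v.reshape(2, 3).copy()        # independent copy
W[0, 0] = -1
print(v[0])                       # 99: unaffected by the change to W
print(np.shares_memory(v, V), np.shares_memory(v, W))   # True False
```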


Because they are homogeneous n-dimensional arrays, NumPy arrays are well suited to vectorized math.  
They behave like R atomic vectors for many operations.


In [None]:
a = np.array([5.1, 4.9, 4.7, 4.6, 5.0])
b = np.array([3.5, 3.0, 3.2, 3.1, 3.6])
a + b, a - b, 2 * a


In [None]:
# elementwise operations
vec_one = np.array([1, 2, 3])
np.log(vec_one)  # natural log elementwise


In [None]:
# indexing (0-based) and assignment
print(vec_one[0])
vec_one[1] = 31
vec_one


> Note: NumPy arrays do not have per-element names like R named vectors.  
For name→value mappings use `dict` or `pandas.Series`.


In [None]:
import pandas as pd
vec_named = pd.Series([91, 85, 99], index=['math', 'english', 'history'])
vec_named['math'], vec_named.iloc[0]


## 8. Vector & Matrix Operations with NumPy <a id='vector-matrix'></a>


- Column vectors are 2D arrays with shape `(p, 1)`; row vectors have shape `(1, p)`.
- **Inner (dot) product**: `a.T @ b` for column vectors (the result is a 1×1 array), or `np.dot(a, b)` / `a @ b` for 1-D arrays of equal length.
- **Matrix multiplication**: `A @ B` (uses linear algebra rules, *not* elementwise).
- **Transpose**: `A.T`.
- **Elementwise** multiply: `A * B` (same shapes) or broadcasting.


In [None]:
a = np.array([[1], [0], [2], [5]])  # column vector (4x1)
b = np.array([[2], [3], [1], [6]])  # column vector (4x1)
inner = (a.T @ b).item()            # scalar
inner


In [None]:
# elementwise product (Hadamard)
(a.flatten() * b.flatten())
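Broadcasting lets arrays of different but compatible shapes be combined elementwise. A common use in statistics is centering the columns of a data matrix. The sketch below uses a made-up 3×2 matrix `X`, and repeats the elementwise and inner products for 1-D versions of the vectors above (named `u` and `w` here to avoid overwriting `a` and `b`).

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])        # shape (3, 2)

col_means = X.mean(axis=0)         # shape (2,): one mean per column
X_centered = X - col_means         # (3, 2) minus (2,) broadcasts across rows

print(col_means)                   # column means are 2 and 20
print(X_centered.mean(axis=0))     # both column means are now 0

# 1-D arrays: elementwise product vs. inner product
u = np.array([1, 0, 2, 5])
w = np.array([2, 3, 1, 6])
print(u * w)                       # elementwise: 2, 0, 2, 30
print(u @ w)                       # 34
```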


In [None]:
# matrices
A = np.c_[ [0.71, 0.61, 0.72, 0.83, 0.92],
           [0.63, 0.69, 0.77, 0.80, 1.00] ]
B = np.array([[1,2,3,4,5],
              [6,7,8,9,10]])

A_plus_B = A + B.T      # shapes (5,2) and (5,2)
A_minus_B = A - B.T
C = A @ B               # (5,2) @ (2,5) -> (5,5)

A.shape, B.shape, C.shape, C[1,2]


In [None]:
# Check a specific entry like in the handout (2nd row, 3rd col of C)
# C[1,2] = (row 2 of A) . (column 3 of B) = 0.61*3 + 0.69*8
calc = 0.61*3 + 0.69*8
C[1,2], calc


In [None]:
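# Other linear algebra: transpose, inverse, determinant, eigen, SVD
M = np.array([[1., 2., 3.],
              [1., 5., 8.]])
MT = M.T                 # transpose works for any shape: (2,3) -> (3,2)
# Inverse, determinant, and eigendecomposition require a square matrix: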
S = np.array([[2., 1.],
              [1., 3.]])
Sinv = np.linalg.inv(S)
detS = np.linalg.det(S)
eigvals, eigvecs = np.linalg.eig(S)
U, s, VT = np.linalg.svd(S)
detS, eigvals, s
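It is good practice to sanity-check these decompositions numerically. The following sketch recomputes the same quantities for `S` and verifies them with `np.allclose`, which compares floating-point arrays up to a small tolerance.

```python
import numpy as np

S = np.array([[2., 1.],
              [1., 3.]])

Sinv = np.linalg.inv(S)
print(np.allclose(S @ Sinv, np.eye(2)))        # True

eigvals, eigvecs = np.linalg.eig(S)
# Each COLUMN of eigvecs is an eigenvector v satisfying S v = lambda v
for lam, v in zip(eigvals, eigvecs.T):
    print(np.allclose(S @ v, lam * v))         # True

# det(S) = 2*3 - 1*1 = 5, which also equals the product of the eigenvalues
print(np.isclose(np.linalg.det(S), 5.0))       # True
print(np.isclose(np.prod(eigvals), 5.0))       # True

U, s, VT = np.linalg.svd(S)
print(np.allclose(U @ np.diag(s) @ VT, S))     # True: S = U diag(s) V^T
```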


## 9. pandas DataFrames <a id='dataframes'></a>


A DataFrame stores columns that can have different types (numeric, string, bool), analogous to R data frames.


In [None]:
df = pd.DataFrame({
    'name': ['Arnab', 'Ana'],
    'grade': [80, 93],
    'is_graduate': [False, True]
})
df


In [None]:
df.shape, df.columns.tolist(), df.index.tolist()


In [None]:
# selection by label / position
df.loc[:, ['name', 'grade']]    # columns by name


In [None]:
# row/column by integer position
df.iloc[[0,1], [0,1]]
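Beyond selecting by label or position, rows are often filtered with a Boolean condition. The sketch below rebuilds the same small table under the hypothetical name `df_demo` (so the `df` above is left unchanged) and shows a column summary, a Boolean filter, and a derived column. The cutoffs of 90 and 70 are illustrative.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'name': ['Arnab', 'Ana'],
    'grade': [80, 93],
    'is_graduate': [False, True]
})

print(df_demo['grade'].mean())                      # 86.5

# Boolean mask selects rows; the label selects the column
high = df_demo.loc[df_demo['grade'] > 90, 'name']
print(high.tolist())                                # ['Ana']

df_demo['passed'] = df_demo['grade'] >= 70          # add a derived column
print(df_demo.dtypes)                               # one dtype per column

first_grade = df_demo.iloc[0, 1]                    # row 0, column 1 by position
print(first_grade)                                  # 80
```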


## 10. Control Flow <a id='control-flow'></a>


### If statements and Loops

- `if` / `elif` / `else` check boolean conditions.
- `for` loops iterate over any iterable (lists, strings, ranges).
```python
for i in range(5):
    print(i)
```
- Use `enumerate(iterable)` to loop with index and value.
- Use `while condition:` to repeat a block until the condition becomes false.
- `break` exits the loop; `continue` skips to the next iteration.
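The loop constructs above can be combined as in the following sketch. The list of names and the numeric limits are made up for illustration.

```python
names = ["Arnab", "Ana", "Lee"]

for i, name in enumerate(names):   # index and value together
    print(i, name)

# while loop: halve n until it drops below 1
n, steps = 20, 0
while n >= 1:
    n = n / 2
    steps += 1
print(steps)                       # 5

# continue skips odd numbers; break stops at the first even value above 6
evens = []
for k in range(100):
    if k % 2 == 1:
        continue
    if k > 6:
        break
    evens.append(k)
print(evens)                       # [0, 2, 4, 6]
```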


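The condition of an `if` must reduce to a single truth value. A NumPy array with more than one element raises a `ValueError` when used as a condition, so reduce it with `.any()` or `.all()` first.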


In [None]:
numeric_grade = 85
letter_grade = None

if numeric_grade > 70:
    letter_grade = "S"
else:
    letter_grade = "U"

letter_grade


## 11. Functions <a id='functions'></a>


### User-Defined Functions

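- Functions are defined with `def`. Use a docstring (`"""..."""`) as the first statement to describe the purpose and arguments.
- Default arguments: `def f(x, power=2): return x**power`. Avoid mutable defaults such as `[]`: the default object is created once and shared across calls.
- Names assigned inside a function are local to it unless declared `global`.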
- Arguments can be passed by name: `f(x=3, power=5)`.
- A useful exercise is a trimmed-mean function with an optional parameter `p` that controls how many extreme values (potential outliers) are removed before averaging.
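One possible version of the trimmed-mean function mentioned above is sketched below. The handout's own implementation is not shown here, so this is an assumption: `p` is taken to be the fraction of observations dropped from each tail, and the name `trimmed_mean` is illustrative.

```python
import numpy as np

def trimmed_mean(x, p=0.1):
    """Mean of x after dropping a fraction p of the values from each tail.

    Assumption: p is the proportion trimmed from EACH end, with 0 <= p < 0.5.
    """
    if not 0 <= p < 0.5:
        raise ValueError("p must satisfy 0 <= p < 0.5")
    x = np.sort(np.asarray(x, dtype=float))
    k = int(p * len(x))               # number of values to drop per tail
    return x[k:len(x) - k].mean()

data = [1, 2, 3, 4, 100]
print(trimmed_mean(data, p=0))        # 22.0 (ordinary mean)
print(trimmed_mean(data, p=0.2))      # 3.0  (drops 1 and 100)
print(trimmed_mean(data))             # 22.0: with 5 values, p=0.1 drops none
```

Calling `trimmed_mean(data)` uses the default `p=0.1`, while `trimmed_mean(data, p=0.2)` passes the argument by name.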

### Setting a Seed
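- For reproducible random numbers, seed the global generator with `np.random.seed(123)` or, as NumPy now recommends for new code, create a generator with `rng = np.random.default_rng(123)` and draw from `rng`.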
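A quick way to see what seeding does is to create two generators with the same seed and compare their output. The seed `123` follows the text; the variable names are illustrative.

```python
import numpy as np

rng1 = np.random.default_rng(123)
rng2 = np.random.default_rng(123)

draws1 = rng1.normal(size=3)
draws2 = rng2.normal(size=3)
print(np.array_equal(draws1, draws2))   # True: same seed, same stream

# Continuing to draw from the same generator gives new values
draws3 = rng1.normal(size=3)
print(np.array_equal(draws1, draws3))   # False

# The legacy global seed is reproducible in the same way
np.random.seed(123)
a = np.random.random(2)
np.random.seed(123)
b = np.random.random(2)
print(np.array_equal(a, b))             # True
```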


Define functions with `def` and return values with `return`. Because NumPy operations are elementwise, the same function works on scalars and on arrays.


In [None]:
def my_fun(x):
    x = np.asarray(x)
    return np.sin(1 / (x**2))

y = 2
my_fun(y), my_fun([1,2,3])


In [None]:
# Practice: given matrix X and vector y, compute (X^T X)^{-1} X^T y
def ols_closed_form(X, y):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    beta = np.linalg.inv(X.T @ X) @ X.T @ y
    return beta

X = np.c_[np.ones(5), np.arange(5)]
y = np.array([1, 2, 1.5, 3, 2.7])
ols_closed_form(X, y)
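Forming an explicit inverse is fine for a small example like this, but in practice it is usually preferable to solve the normal equations or call a least-squares routine. The sketch below, using the same `X` and `y`, compares `np.linalg.solve` with `np.linalg.lstsq`. It is offered as a cross-check, not as the handout's method.

```python
import numpy as np

X = np.c_[np.ones(5), np.arange(5)]
y = np.array([1, 2, 1.5, 3, 2.7])

# Normal equations (X^T X) beta = X^T y, solved without an explicit inverse
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver; rcond=None selects the default singular-value cutoff
beta_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_solve, beta_lstsq))   # True
print(beta_solve)                            # intercept about 1.16, slope about 0.44

fitted = X @ beta_solve
# At the least-squares solution the residuals are orthogonal to the columns of X
print(np.allclose(X.T @ (y - fitted), 0))    # True
```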
