<p style='text-align: right;'> Birkan Emrem </p>
<p style='text-align: right;'> 16.10.2025 </p>
<p style='text-align: right;'> AI Training Series: Python Refresher: Session IV </p>

## Introduction to Scientific Python Ecosystem
#### Why Scientific Python?

In [None]:
# core scientific packages
import numpy as np                    # Numerical computing
import pandas as pd                   # Data analysis
import matplotlib.pyplot as plt       # Plotting
import seaborn as sns                 # Statistical visualization
import scipy.stats as stats           # Statistics
import sympy as sp                    # Symbolic math
import sklearn                        # Machine learning algorithms
import torch                          # Deep learning with PyTorch

### Key Points
- Built on efficient, compiled low-level code (mainly C, C++ and Fortran)
- Used in data science, engineering and ML

#### Key libraries overview

In [None]:
# NumPy array creation
arr = np.array([[1, 2], [3, 4]])
arr.shape

In [None]:
# Pandas DataFrame
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df

In [None]:
# simple plot
plt.plot([1, 2, 3], [4, 5, 6])

### Key Points
- `NumPy` - array math & broadcasting
- `pandas` - tables & time series
- `Matplotlib` & `Seaborn` - visualization
- `SciPy` - statistics & scientific math
- `scikit-learn` & `PyTorch` - machine learning & deep learning

<hr style="border:1.3px solid gray">

## NumPy Basics: Arrays & Operations
#### Creating Arrays & Basic Properties

In [None]:
# code block

In [None]:
print(a.shape)
print(a.dtype)
print(a.ndim)
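The cell that creates `a` is elided above. A minimal sketch, assuming `a` is a small 2-D integer array (the values are illustrative, not the original's):

```python
import numpy as np

# Illustrative 2x3 integer array (values are an assumption)
a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.shape)  # (rows, columns)
print(a.dtype)  # element type, inferred from the Python ints
print(a.ndim)   # number of dimensions
```

The integer `dtype` is platform-dependent (commonly `int64`), so it is not hard-coded here.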

In [None]:
# helpers
z = np.zeros((2, 3))
o = np.ones((2, 3))
r = np.random.rand(2, 3)

### Key Points
- Arrays are homogeneous (one `dtype`), fixed-size and N-dimensional
- `shape`, `dtype` and `ndim` describe the structure

#### Array Math & Broadcasting

In [None]:
x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

In [None]:
# Element-wise math
print(x+y)
print(x*2)

In [None]:
# Broadcasting
m = np.array([[1], [2], [3]])
print(m+x)

In [None]:
# Universal function (element-wise) and aggregations (reductions)
print(np.sqrt(x))
print(np.mean(x))
print(np.std(x))

### Key Points
- Operations are element-wise by default
- Broadcasting stretches size-1 dimensions so that shapes match
- Avoid loops by using vectorized math
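A short sketch of the last two points: the broadcast shape of `m + x` from the cells above, and a Python loop next to its vectorized equivalent. Both produce the same result; the vectorized form runs in compiled code instead of one Python step per element.

```python
import numpy as np

x = np.array([1, 2, 3])           # shape (3,)
m = np.array([[1], [2], [3]])     # shape (3, 1)

# Broadcasting: (3, 1) and (3,) are stretched to a common (3, 3) shape
print((m + x).shape)

# Loop version: one Python-level operation per element
values = np.arange(1_000)
squares_loop = np.empty_like(values)
for i, v in enumerate(values):
    squares_loop[i] = v * v

# Vectorized version: a single expression evaluated in compiled code
squares_vec = values ** 2

print(np.array_equal(squares_loop, squares_vec))
```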

<hr style="border:1.3px solid gray">

## NumPy Indexing, Reshaping & Aggregation
#### Indexing, Slicing & Boolean Masks

In [None]:
a = np.array([[5, 6, 7], [13, 14, 15]])

In [None]:
# code block

In [None]:
# Boolean filtering
mask = a > 10
print(a[mask])

In [None]:
# modify with mask
a[a < 10] = 0
a

### Key Points
- Basic slices are views (no copy); boolean masks return copies
- Use boolean masks for filtering
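The slicing cell is elided above. A minimal sketch, reusing the array from the first cell of this section, of indexing and of the view-versus-copy behaviour in the key points:

```python
import numpy as np

a = np.array([[5, 6, 7], [13, 14, 15]])

# Indexing: row 1, column 2
print(a[1, 2])     # 15

# Slicing: first row, columns 1 onward
row_part = a[0, 1:]

# A basic slice is a view: writing to it changes `a`
row_part[0] = 99
print(a[0, 1])     # 99

# Boolean indexing returns a copy: writing to it leaves `a` unchanged
big = a[a > 10]
big[0] = -1
print(a[1, 0])     # 13

# Use .copy() when you need an independent slice
col = a[:, 0].copy()
col[0] = 0
print(a[0, 0])     # 5
```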

#### Reshaping & Aggregation

In [None]:
b = np.arange(6)
c = b.reshape(2, 3)
c

In [None]:
# code block

In [None]:
# Flatten
c.ravel()

In [None]:
# code block

### Key Points
- Use `reshape`, `ravel`, and `transpose`
- Aggregate over axes with the `axis` argument (e.g. `sum(axis=0)`)
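The transpose and aggregation cells are elided above. A sketch of both, assuming the same `c = np.arange(6).reshape(2, 3)`:

```python
import numpy as np

c = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# Transpose swaps the axes: (2, 3) -> (3, 2)
print(c.T.shape)

# Whole array, down the rows (axis=0), across the columns (axis=1)
print(c.sum())          # 15
print(c.sum(axis=0))    # [3 5 7]
print(c.sum(axis=1))    # [ 3 12]
print(c.mean(axis=1))   # [1. 4.]
```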

<hr style="border:1.3px solid gray">

## Pandas: Series & DataFrame
#### Series: 1D Labeled Data

In [None]:
# code block

In [None]:
# Access by label or position
print(s["b"])
print(s.iloc[0])

In [None]:
# code block

### Key Points
- Series = NumPy array + index
- Great for time series, labeled data
- Supports vectorized operations
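The cells that create `s` and operate on it are elided above. A minimal sketch, assuming `s` has the labels `"a"`, `"b"`, `"c"` (the cell above reads `s["b"]`); the values are illustrative:

```python
import pandas as pd

# Illustrative Series: values plus a string index
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s["b"])       # by label: 20
print(s.iloc[0])    # by position: 10

# Vectorized operations keep the index
doubled = s * 2
print(doubled)

# Arithmetic between Series aligns on labels, not positions
other = pd.Series([1, 2, 3], index=["c", "b", "a"])
print(s + other)
```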

#### DataFrame: 2D Tabular Data

In [None]:
# create DataFrame
df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [27, 28],
    "score": [86, 93]
})

df

In [None]:
# code block

In [None]:
# code block

### Key Points
- Like a spreadsheet in Python: labeled rows (the index) and named columns, each with its own dtype
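The two DataFrame cells above are elided. A sketch of common operations on the same `df`, assuming they covered column selection and a derived column; the `passed` column and its threshold are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [27, 28],
    "score": [86, 93]
})

# One column gives a Series; a list of columns gives a DataFrame
ages = df["age"]
subset = df[["name", "score"]]

# Derived column (the 90-point threshold is illustrative)
df["passed"] = df["score"] >= 90

# Row access by position, column types, summary statistics
print(df.iloc[0])
print(df.dtypes)
print(df.describe())
```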

<hr style="border:1.3px solid gray">

## Pandas: Filtering, Grouping, and Pivoting
#### Filtering

In [None]:
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [27, 32],
    "score": [86, 93]
})

In [None]:
df

In [None]:
# code block

In [None]:
# Multiple conditions
df[(df["age"] <= 29) & (df["score"] < 90)]

### Key Points
- Use conditions to filter rows
- Combine filters with `&` (and) and `|` (or), wrapping each condition in parentheses
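The single-condition cell is elided above. A sketch of one-condition filtering on the same `df`, plus `isin()` and `query()` as alternatives to longer chains of conditions:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [27, 32],
    "score": [86, 93]
})

# Single condition: the boolean Series keeps rows where it is True
older = df[df["age"] > 30]

# OR-combination, each condition in parentheses
either = df[(df["age"] > 30) | (df["score"] < 90)]

# isin() tests membership in a list of values
named = df[df["name"].isin(["Alice"])]

# query() expresses a filter as a string
same_as_older = df.query("age > 30")
print(older, either, named, same_as_older, sep="\n\n")
```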

#### GroupBy & Pivot Tables

In [None]:
data = pd.DataFrame({
    "dept": ["HR", "HR", "IT", "IT"],
    "salary": [5e3, 5.6e3, 7.3e3, 7.7e3]
})
data

In [None]:
# Group & aggregate
data.groupby("dept").agg(["mean", "max"])

In [None]:
# Pivot table
pd.pivot_table(data, values="salary", index="dept", aggfunc="mean")
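Both cells above compute per-department statistics. A sketch of named aggregation, which gives the output columns explicit names (`mean_salary` and `max_salary` are our choice), checked against the pivot table:

```python
import pandas as pd

data = pd.DataFrame({
    "dept": ["HR", "HR", "IT", "IT"],
    "salary": [5e3, 5.6e3, 7.3e3, 7.7e3]
})

# Named aggregation: new_column=(source_column, function)
summary = data.groupby("dept").agg(
    mean_salary=("salary", "mean"),
    max_salary=("salary", "max"),
)
print(summary)

# The pivot table gives the same per-department means
pivot = pd.pivot_table(data, values="salary", index="dept", aggfunc="mean")
print(pivot)
```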

<hr style="border:1.3px solid gray">

## Matplotlib: Line, Bar and Scatter Plots
#### Line & Bar Plots

In [None]:
# code block

In [None]:
# Bar Plot
plt.bar(["A", "B", "C"], [5, 7, 3])

# Titles and labels
plt.title("Sample Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
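The line-plot cell is elided above. A minimal sketch using the object-oriented interface (`fig, ax = plt.subplots()`), which sets the same title and labels as the `plt.*` calls; the data are illustrative:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [2, 4, 3, 5]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="series 1")   # line with point markers
ax.set_title("Line Plot")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.legend()
```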

#### Scatter Plots & Quick Customization

In [None]:
import numpy as np

# Random scatter data
x = np.random.rand(50)
y = np.random.rand(50)
szs = np.random.randint(20, 200, 50)

plt.scatter(x, y, s=szs, c="red", alpha=0.5)
plt.title("Scatter Plot")
plt.grid(True)
plt.xlabel("Feature X")
plt.ylabel("Feature Y")
plt.show()

### Key Points:
- Control color, size and markers
- Use alpha, grid and style for readability

<hr style="border:1.3px solid gray">

## Matplotlib: Subplots, Styling, Annotations
#### Creating Subplots

In [None]:
import matplotlib.pyplot as plt

x = [1, 2, 3]
y1 = [1, 4, 9]
y2 = [1, 2, 3]

In [None]:
# code block

### Key Points:
- Use `plt.subplots()` for multiple plots

#### Styling & Annotations

In [None]:
plt.figure(figsize=(6, 4))
plt.plot(x, y1, color="tab:blue", marker="o", linestyle="-", linewidth=0.6)

# Annotate point
plt.annotate("Peak", xy=(3, 9),
             xytext=(2.5, 10),
             arrowprops=dict(facecolor="black"))

plt.title("Styled Plot with Annotation")
plt.xlabel("X")
plt.ylabel("Y")
plt.grid(True)
plt.show()

### Key Points:
- Change colors, markers, linestyles
- Add text annotations
- Use figsize for presentation control

<hr style="border:1.3px solid gray">

## Seaborn: Statistical Visualization
#### Why Use Seaborn?

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
# Sample Data
tips = sns.load_dataset("tips")
tips

In [None]:
# code block

#### Pair, Box & Heatmaps

In [None]:
# Pairplot for numeric comparisons
sns.pairplot(tips, hue="sex")
plt.show()

In [None]:
# codeblock

In [None]:
# Heatmap of correlation
corr = tips.corr(numeric_only=True)
sns.heatmap(corr, annot=True, cmap="magma")
plt.title("Correlation Matrix")
plt.show()

### Key Points:
- `pairplot()` shows variable relationships
- `boxplot()` shows distributions & outliers
- `heatmap()` visualizes correlation matrices
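The box-plot cell is elided above. A sketch on a small hand-made table, so it runs without downloading `tips`; the columns mimic `tips` (`day`, `total_bill`) but the values are invented for illustration:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative data shaped like the tips dataset (values are made up)
df = pd.DataFrame({
    "day": ["Thur"] * 5 + ["Fri"] * 5,
    "total_bill": [12.5, 15.0, 14.2, 40.0, 13.1,
                   18.3, 20.1, 17.5, 19.0, 21.4],
})

# One box per day: median, quartiles and whiskers;
# points beyond the whiskers are drawn individually as outliers
ax = sns.boxplot(data=df, x="day", y="total_bill")
ax.set_title("Total bill by day")
```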

<hr style="border:1.3px solid gray">

## SciPy: Stats & Probability
#### Distributions and Sampling

In [None]:
from scipy import stats
import numpy as np

In [None]:
# code block

In [None]:
# Random samples
smp = stats.norm.rvs(loc=0,scale=1,size=5)
print("Samples:", smp)

In [None]:
# Mean and variance
print(stats.norm.stats(moments="mv"))

### Key Points:
- `scipy.stats` provides common distributions with a shared interface (`pdf`, `cdf`, `rvs`, `stats`)
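The cell creating the distribution is elided above. A sketch of the methods shared by `scipy.stats` distributions, using a "frozen" standard normal as the example:

```python
from scipy import stats

dist = stats.norm(loc=0, scale=1)   # frozen standard normal

print(dist.pdf(0))      # density at 0, 1/sqrt(2*pi), about 0.3989
print(dist.cdf(0))      # P(X <= 0) = 0.5
print(dist.ppf(0.975))  # inverse CDF, about 1.96
print(dist.mean(), dist.var())

samples = dist.rvs(size=1000, random_state=0)  # reproducible draws
print(samples.mean())
```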

#### Statistical Tests

In [None]:
# t-test: compare two groups
group1 = [14, 15, 16, 15]
group2 = [13, 14, 14, 13]

t_stat,p_val = stats.ttest_ind(group1, group2)
print(f"t={t_stat:.2f}, p={p_val:.3f}")

In [None]:
# chi-square goodness-of-fit test
observed = [10, 20, 30]
expected = [15, 15, 30]
chi2, p = stats.chisquare(f_obs=observed,
                          f_exp=expected)
print(f"chi^2={chi2:.2f}, p={p:.3f}")

### Key Points:
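- Use t-tests to compare the means of two groups
- Use `chisquare` for goodness of fit (observed vs. expected counts) and `chi2_contingency` for independence between two categorical variables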
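`chisquare` above compares observed with expected counts for a single variable (goodness of fit). For independence between two categorical variables, `scipy.stats.chi2_contingency` takes a contingency table; the counts below are illustrative. For a 2x2 table SciPy applies Yates' continuity correction by default.

```python
import numpy as np
from scipy import stats

# Rows: group A / group B; columns: outcome yes / no (illustrative counts)
table = np.array([[10, 20],
                  [20, 10]])

chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi^2={chi2:.2f}, p={p:.3f}, dof={dof}")
print(expected)   # counts expected if rows and columns were independent
```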

<hr style="border:1.3px solid gray">

## SciPy: Optimization & Interpolation
#### Function Minimization

In [None]:
from scipy.optimize import minimize
import numpy as np

In [None]:
# Function to minimize
def f(x):
    return (x-3)**2 + 10

In [None]:
# code block

### Key Points:
- Use `scipy.optimize.minimize()` to minimize a scalar-valued function from a starting guess `x0`
- Works for custom defined functions
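The call to the optimizer is elided above. A sketch, assuming it minimizes the same function from a starting guess; `minimize` passes `x` as a 1-D NumPy array, so this version reads `x[0]`. `minimize_scalar` is shown as the one-variable alternative.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar


def f(x):
    # x arrives as a 1-D array of length 1; return a plain scalar
    return (x[0] - 3) ** 2 + 10


res = minimize(f, x0=np.array([0.0]))
print(res.x, res.fun, res.success)   # minimum near x = 3, f = 10

# For a function of one scalar variable, minimize_scalar fits directly
res_1d = minimize_scalar(lambda t: (t - 3) ** 2 + 10)
print(res_1d.x, res_1d.fun)
```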

#### 1D Interpolation

In [None]:
from scipy.interpolate import interp1d
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Known data
x = np.array([0, 1, 2, 3])
y = np.array([0, 2, 1, 3])

In [None]:
# Interpolate
f = interp1d(x, y, kind="cubic")
x_new = np.linspace(0, 3, 100)
y_new = f(x_new)

In [None]:
plt.plot(x, y, "o", label="data")
plt.plot(x_new, y_new, label="cubic")
plt.legend()
plt.show()
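Recent SciPy documentation marks `interp1d` as a legacy interface. A sketch of two alternatives on the same data: `np.interp` for piecewise-linear interpolation and `scipy.interpolate.CubicSpline` for a smooth cubic curve through the points:

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0, 1, 2, 3])
y = np.array([0, 2, 1, 3])
x_new = np.linspace(0, 3, 100)

# Piecewise-linear: straight segments between neighbouring points
y_lin = np.interp(x_new, x, y)

# Cubic spline through all data points (default "not-a-knot" ends)
spline = CubicSpline(x, y)
y_cub = spline(x_new)

print(np.interp(1.5, x, y))   # halfway between 2 and 1: 1.5
print(spline(x))              # reproduces the data points (up to rounding)
```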

<hr style="border:1.3px solid gray">

## SymPy: Symbolic Math
#### Expressions

In [None]:
import sympy as sp

In [None]:
# Define symbols
x, y = sp.symbols("x y")

# Build expression
expr = (x + y)**2

In [None]:
# Expand and simplify

### Key Points:
- Use symbols for algebraic expressions
- Perform expansion, factoring and simplification
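The expand-and-simplify cell above is empty. A sketch of the three operations in the key points, on the same `expr = (x + y)**2`:

```python
import sympy as sp

x, y = sp.symbols("x y")
expr = (x + y)**2

expanded = sp.expand(expr)        # x**2 + 2*x*y + y**2
factored = sp.factor(expanded)    # back to (x + y)**2
print(expanded, factored)

# simplify() applies heuristics to find a simpler equivalent form
identity = sp.simplify(sp.sin(x)**2 + sp.cos(x)**2)
print(identity)                   # 1
```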

#### Calculus & Equation Solving

In [None]:
# Derivative and integral
f = x**3 + 2*x
df = sp.diff(f, x)
F = sp.integrate(f, x)

In [None]:
# Solve equations
sol = sp.solve(x**2 - 4, x)

# Limits
lim = sp.limit(sp.sin(x)/x, x, 0)

In [None]:
print("Derivative:", df)
print("Integral:", F)
print("Roots:", sol)
print("Limit:", lim)

<hr style="border:1.3px solid gray">

## `scikit-learn`: ML Workflow & Classification
#### Build your first ML model

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

In [None]:
# Load data
X, y = load_iris(return_X_y=True)

In [None]:
# Split into train/test (random_state fixes the shuffle for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

In [None]:
# Set model and train
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

In [None]:
# Accuracy
accuracy = model.score(X_test, y_test)
print("Accuracy:", accuracy)

#### Pipelines and Preprocessing

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

In [None]:
# pipeline
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression())
])

In [None]:
pipe.fit(X_train, y_train)
accuracy = pipe.score(X_test, y_test)
print("Pipeline score:", accuracy)

### Key Points:
- Split > Train > Predict > Evaluate
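- Use structured data (NumPy arrays, pandas DataFrames)
- A `Pipeline` chains preprocessing and model, so the scaler is fit on the training data only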
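The key points list a Predict step that the cells above fold into `score()`. A sketch that makes it explicit, assuming the same iris data and pipeline; `random_state=42` and `stratify=y` are our choices for a reproducible, class-balanced split:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Train: the scaler is fit on the training data only
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])
pipe.fit(X_train, y_train)

# Predict
y_pred = pipe.predict(X_test)

# Evaluate: for classifiers, score() reports this same accuracy
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)
```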

<hr style="border:1.3px solid gray">

## PyTorch and Tensors
#### Working with Tensors

In [None]:
import torch

In [None]:
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
c = a + b

In [None]:
print("Sum:", c)
print("Shape:", c.shape)
print("Data type:", c.dtype)
print("Device:", c.device)

### Key Points:
- Tensors = NumPy-like arrays
- Basis of all PyTorch operations
- Good starting point for deep learning
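Since tensors are NumPy-like, a sketch of converting between the two: `torch.from_numpy` shares memory with the source array on the CPU, while `torch.tensor` makes a copy.

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])

shared = torch.from_numpy(arr)   # shares memory with `arr`
copied = torch.tensor(arr)       # independent copy

arr[0] = 100.0
print(shared[0].item())   # 100.0: the change is visible
print(copied[0].item())   # 1.0: the copy is unaffected

# Back to NumPy (CPU tensors only)
back = (shared * 2).numpy()
print(back)
```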

#### Tensors on GPU

In [None]:
x = torch.randn(2, 3)
print("CPU Tensor:", x)

In [None]:
if torch.cuda.is_available():
    x_gpu = x.to("cuda")
    print("Moved to GPU")
    print("New device:", x_gpu.device)
else:
    print("CUDA not available")

### Key Points:
- `.to("cuda")` moves tensors to the GPU
- Same syntax for CPU and GPU tensors, but the operands of one operation must be on the same device
- Critical for high-performance training
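A common device-agnostic pattern picks the device once and creates or moves tensors with it, so the same code runs with or without a GPU. A sketch:

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.randn(2, 3, device=device)   # created directly on `device`
b = torch.ones(2, 3).to(device)        # created on the CPU, then moved

c = a + b                              # both operands on the same device
print(c.device)

# Move results back to the CPU before converting to NumPy
c_np = c.cpu().numpy()
print(c_np.shape)
```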

<hr style="border:1.3px solid gray">