<a href="https://colab.research.google.com/github/aryamanpathak2022/Statistics-DSAI-2026/blob/main/numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Python for Data Science
**Duration:** 1.5 Hours

**Emails:**
- Aryaman.Pathak@iiitb.ac.in
- Shreyas.Biradar@iiitb.ac.in

**Whatsapp Group:**

<img width="500" src="https://i.postimg.cc/jScf2Yv6/Whats-App-Image-2026-01-14-at-15-48-43.jpg" />

## Agenda
1.  **NumPy:** Efficient numerical arrays and mathematical operations.
2.  **Pandas:** Data manipulation, cleaning, and analysis.
3.  **Matplotlib:** Basic data visualization.

---
### Prerequisite Check
Open Google Colab: https://colab.research.google.com

Ensure you have the libraries installed:
`!pip install numpy pandas matplotlib`

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Check versions
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

NumPy version: 2.0.2
Pandas version: 2.2.2


## 1. NumPy: The Foundation
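For numerical work, NumPy arrays are typically faster and more memory-efficient than standard Python lists, because they store values of a single data type in one contiguous block of memory. They also let us apply a mathematical operation to an entire array at once, without writing an explicit loop (vectorization).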
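A quick way to check both claims yourself is the minimal sketch below. The array size (100,000), the repeat count, and the helper names `squares_loop` and `squares_vectorized` are arbitrary choices for illustration. Timings depend on your machine, so no numbers are shown, but the vectorized version is usually much faster. The sketch also shows that `*` means something different for lists and arrays.

```python
import timeit

import numpy as np

py_list = list(range(100_000))
arr = np.arange(100_000)

# The same operator behaves differently: lists repeat, arrays multiply element-wise
print([1, 2] * 2)            # [1, 2, 1, 2]
print(np.array([1, 2]) * 2)  # [2 4]

# Hypothetical helpers: square every element with a Python loop vs. one array expression
def squares_loop(values):
    return [v * v for v in values]

def squares_vectorized(values):
    return values * values

loop_time = timeit.timeit(lambda: squares_loop(py_list), number=20)
vec_time = timeit.timeit(lambda: squares_vectorized(arr), number=20)
print(f"List comprehension: {loop_time:.4f} s")
print(f"NumPy vectorized:   {vec_time:.4f} s")
```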

In [3]:
# Creating Arrays
# A standard Python list
py_list = [1, 2, 3, 4, 5]

# Converting to a NumPy array
arr = np.array(py_list)

print("Python List:", py_list)
print("NumPy Array:", arr)
print("Shape of array:", arr.shape) # (5,) means a 1D array with 5 elements

# Creating a 2D array (Matrix)
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print("\n2D Matrix:\n", matrix)
print("Shape:", matrix.shape) # (2, 3) -> 2 rows, 3 columns

Python List: [1, 2, 3, 4, 5]
NumPy Array: [1 2 3 4 5]
Shape of array: (5,)

2D Matrix:
 [[1 2 3]
 [4 5 6]]
Shape: (2, 3)
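Because `shape` is an ordinary tuple, it can be unpacked directly, and `.tolist()` converts an array back to nested Python lists. A small sketch reusing the matrix above (the names `n_rows`, `n_cols`, and `back_to_list` are illustrative):

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

# shape is a plain tuple, so it unpacks into rows and columns
n_rows, n_cols = matrix.shape
print("Rows:", n_rows, "Columns:", n_cols)  # Rows: 2 Columns: 3

# .tolist() turns the array back into (nested) Python lists
back_to_list = matrix.tolist()
print(back_to_list)        # [[1, 2, 3], [4, 5, 6]]
print(type(back_to_list))  # <class 'list'>
```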


In [4]:
# Vectorized Operations (The Power of NumPy)
# With a plain list you would need a loop, e.g. [x + 10 for x in py_list]; py_list + 10 raises a TypeError
print("Original:", arr)
print("Add 10:", arr + 10)
print("Squared:", arr ** 2)

# Basic Statistics
data = np.random.randint(10, 100, size=20) # 20 random integers from 10 to 99 (the upper bound 100 is excluded)
print("\nRandom Data:", data)

print(f"Mean: {np.mean(data)}")       # Average
print(f"Median: {np.median(data)}")   # Middle value
print(f"Std Dev: {np.std(data):.2f}") # Population standard deviation (ddof=0 by default)
print(f"Max Value: {np.max(data)}")
print(f"Argmax (Index of max): {np.argmax(data)}")

Original: [1 2 3 4 5]
Add 10: [11 12 13 14 15]
Squared: [ 1  4  9 16 25]

Random Data: [17 54 13 62 62 55 16 49 37 71 32 88 20 62 42 92 16 42 97 31]
Mean: 47.9
Median: 45.5
Std Dev: 25.39
Max Value: 97
Argmax (Index of max): 18
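Two details of the cell above matter in statistics work: the random data changes on every run, and `np.std` uses the population formula (divide by n) unless you pass `ddof=1` for the sample formula (divide by n - 1). The sketch below uses NumPy's `default_rng` Generator with an arbitrary seed of 42, so its numbers will not match the `Random Data` output above. The small `values` array is a made-up example chosen so the population standard deviation is exactly 2.

```python
import numpy as np

# A seeded Generator makes "random" data reproducible across runs
rng = np.random.default_rng(seed=42)
data = rng.integers(10, 100, size=20)  # high=100 is excluded, as with randint
print("Data:", data)
print("Range:", data.min(), "to", data.max())

# ddof=0 (default): divide by n   -> population standard deviation
# ddof=1:           divide by n-1 -> sample standard deviation
values = np.array([2, 4, 4, 4, 5, 5, 7, 9])
print("Population std (ddof=0):", np.std(values))  # 2.0
print("Sample std (ddof=1):", round(float(np.std(values, ddof=1)), 4))
```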


### Indexing & Slicing

In [5]:
arr = np.array([10, 20, 30, 40, 50])

print(arr[0])      # First element
print(arr[-1])     # Last element
print(arr[1:4])    # Slicing: indices 1, 2, 3 (the stop index 4 is excluded)

10
50
[20 30 40]
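One behavior that often surprises newcomers: unlike slicing a Python list, basic slicing of a NumPy array returns a *view* that shares memory with the original, so writing into the slice changes the original. A minimal sketch (variable names are illustrative):

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Basic slicing returns a view that shares memory with arr
view = arr[1:4]
view[0] = 99
print(arr)  # [10 99 30 40 50]  <- the original changed too

# .copy() creates independent data
arr = np.array([10, 20, 30, 40, 50])
independent = arr[1:4].copy()
independent[0] = 99
print(arr)  # [10 20 30 40 50]  <- unchanged

# Steps work as with Python lists
print(arr[::-1])  # [50 40 30 20 10]
print(arr[::2])   # [10 30 50]
```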


In [6]:
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix[0, 1])   # Row 0, Column 1
print(matrix[:, 1])   # All rows, column 1

2
[2 5]
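The same `[rows, columns]` syntax also accepts whole rows, sub-blocks, and boolean masks. A short sketch on the same matrix (the name `mask` is illustrative):

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix[1])      # Whole row 1: [4 5 6]
print(matrix[:, 1:])  # All rows, columns 1 onward: [[2 3] [5 6]]

# Boolean mask: a True/False array of the same shape, used to filter values
mask = matrix > 3
print(mask)
print(matrix[mask])   # [4 5 6]  (the selected values come back as a 1D array)
```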


### Shape vs Size vs ndim

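* `shape` → the length of each dimension, as a tuple (e.g. `(3, 4)`)
* `size` → the total number of elements (the product of the `shape` entries)
* `ndim` → the number of dimensions (the length of `shape`)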

In [7]:
arr = np.arange(12).reshape(3, 4)

print("Shape:", arr.shape)
print("Size:", arr.size)
print("Dimensions:", arr.ndim)

Shape: (3, 4)
Size: 12
Dimensions: 2
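The three attributes are linked: `size` is the product of `shape`, and `ndim` is its length. That is also why `reshape` must keep the element count unchanged, and why you can pass `-1` for one dimension and let NumPy infer it. A minimal sketch:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# size is the product of the shape; ndim is the length of the shape tuple
print(arr.size == np.prod(arr.shape))  # True
print(arr.ndim == len(arr.shape))      # True

# -1 tells reshape to infer that dimension from the total size
print(arr.reshape(2, -1).shape)  # (2, 6)
print(arr.reshape(-1).shape)     # (12,)  flattened to 1D

# 12 elements cannot be arranged as 5 x 3 = 15, so this raises ValueError
try:
    arr.reshape(5, 3)
except ValueError as err:
    print("ValueError:", err)
```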


### Data Type

In [8]:
arr = np.array([1, 2, 3])
print(arr.dtype)

arr_float = np.array([1, 2, 3], dtype=float)
print(arr_float.dtype)

int64
float64
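Every element of an array shares one `dtype`, so NumPy upcasts mixed input to a common type, and converting floats to integers truncates toward zero rather than rounding. A short sketch (variable names are illustrative):

```python
import numpy as np

# Mixing ints and floats: NumPy picks one common dtype for the whole array
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64
print(mixed)        # [1.  2.  3.5]

# .astype() returns a converted copy; float -> int truncates toward zero
truncated = np.array([1.7, -1.7, 2.2]).astype(int)
print(truncated)    # [ 1 -1  2]

# Assigning a float into an integer array also truncates, silently
ints = np.array([1, 2, 3])
ints[0] = 9.9
print(ints)         # [9 2 3]
```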


### Extra Functions

In [9]:
# Creates a 2×3 array filled with zeros
zeros_arr = np.zeros((2, 3))
print("Zeros Array:\n", zeros_arr)

# Creates a 3×3 array filled with ones
ones_arr = np.ones((3, 3))
print("\nOnes Array:\n", ones_arr)

# Creates a 3×3 identity matrix
identity_arr = np.eye(3)
print("\nIdentity Matrix:\n", identity_arr)

# Generates 5 evenly spaced numbers between 0 and 1 (inclusive)
linspace_arr = np.linspace(0, 1, 5)
print("\nLinspace Array:\n", linspace_arr)

Zeros Array:
 [[0. 0. 0.]
 [0. 0. 0.]]

Ones Array:
 [[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]

Identity Matrix:
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

Linspace Array:
 [0.   0.25 0.5  0.75 1.  ]
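A few related details: `zeros` and `ones` produce `float64` unless you pass `dtype`, `np.full` fills an array with any constant, and `np.arange` differs from `np.linspace` in that it takes a step size and excludes the stop value. A minimal sketch:

```python
import numpy as np

# zeros/ones default to float64; pass dtype to get integers instead
int_zeros = np.zeros((2, 3), dtype=int)
print(int_zeros.dtype)  # an integer dtype such as int64

# full fills an array with any constant value
sevens = np.full((2, 2), 7)
print(sevens)

# arange takes a STEP and excludes the stop value;
# linspace takes a COUNT and includes the stop value by default
print(np.arange(0, 1, 0.25))  # [0.   0.25 0.5  0.75]
print(np.linspace(0, 1, 5))   # [0.   0.25 0.5  0.75 1.  ]
```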


### Mini-Exercise 1
1. Create a NumPy array of numbers from 0 to 9 using `np.arange(10)`.
2. Reshape it into a $2 \times 5$ matrix using `.reshape()`.
3. Calculate the sum of all elements.

In [10]:
# Solution
ex_arr = np.arange(10)
reshaped_arr = ex_arr.reshape(2, 5)
total_sum = np.sum(reshaped_arr)

print("Reshaped:\n", reshaped_arr)
print("Sum:", total_sum)

Reshaped:
 [[0 1 2 3 4]
 [5 6 7 8 9]]
Sum: 45
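As an optional extension of the exercise (not part of the original task), `sum` and most other reductions accept an `axis` argument, which is how you get per-column or per-row totals from the same matrix:

```python
import numpy as np

reshaped_arr = np.arange(10).reshape(2, 5)

# axis=0 collapses the rows: one sum per column
print(reshaped_arr.sum(axis=0))  # [ 5  7  9 11 13]
# axis=1 collapses the columns: one sum per row
print(reshaped_arr.sum(axis=1))  # [10 35]
# No axis: sum over every element, same as np.sum(reshaped_arr)
print(reshaped_arr.sum())        # 45
```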


### NumPy Resources
* **[Official NumPy Quickstart](https://numpy.org/doc/stable/user/quickstart.html):** The best place to look up documentation.
* **[Visual Intro to NumPy](https://jalammar.github.io/visual-numpy/):** Excellent visual guide to understanding arrays and dimensions.
* **[NumPy Cheat Sheet](https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf):** A printable PDF reference for syntax.

### How to Explore NumPy (Self-Help)
Don't memorize every function! Use these tools to find what you need:

1.  **`np.info(function)`**: Prints the documentation for a specific function.
2.  **`?` and `??`**: Jupyter shortcuts and the quickest option: `np.mean?` shows the docstring, and `np.mean??` also shows the source code when it is available.
3.  **`dir(np)`**: Lists the names of **everything** inside NumPy (variables, functions, classes).
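Since `dir(np)` returns several hundred names, it is easier to filter that list with ordinary Python. The sketch below searches for names containing `"sort"`; the keyword is just an example, and the exact list printed depends on your NumPy version. Outside Jupyter, the built-in `help()` prints the same docstring that `?` shows.

```python
import numpy as np

# dir(np) returns a list of strings, so a list comprehension can filter it
sort_names = [name for name in dir(np) if "sort" in name]
print(sort_names)
print("Total names in numpy:", len(dir(np)))

# In a plain Python script or REPL, help() plays the role of `?`:
# help(np.argsort)
```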

In [15]:
np.info(np.array)

array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0,
      like=None)

Create an array.

Parameters
----------
object : array_like
    An array, any object exposing the array interface, an object whose
    ``__array__`` method returns an array, or any (nested) sequence.
    If object is a scalar, a 0-dimensional array containing object is
    returned.
dtype : data-type, optional
    The desired data-type for the array. If not given, NumPy will try to use
    a default ``dtype`` that can represent the values (by applying promotion
    rules when necessary.)
copy : bool, optional
    If ``True`` (default), then the array data is copied. If ``None``,
    a copy will only be made if ``__array__`` returns a copy, if obj is
    a nested sequence, or if a copy is needed to satisfy any of the other
    requirements (``dtype``, ``order``, etc.). Note that any copy of
    the data is shallow, i.e., for arrays with object dtype, the new
    array will point to the same object

In [16]:
dir(np)

['False_',
 'ScalarType',
 'True_',
 '_CopyMode',
 '_NoValue',
 '__NUMPY_SETUP__',
 '__all__',
 '__array_api_version__',
 '__builtins__',
 '__cached__',
 '__config__',
 '__dir__',
 '__doc__',
 '__expired_attributes__',
 '__file__',
 '__former_attrs__',
 '__future_scalars__',
 '__getattr__',
 '__loader__',
 '__name__',
 '__numpy_submodules__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_core',
 '_distributor_init',
 '_expired_attrs_2_0',
 '_get_promotion_state',
 '_globals',
 '_int_extended_msg',
 '_mat',
 '_msg',
 '_pyinstaller_hooks_dir',
 '_pytesttester',
 '_set_promotion_state',
 '_specific_msg',
 '_type_info',
 '_typing',
 '_utils',
 'abs',
 'absolute',
 'acos',
 'acosh',
 'add',
 'all',
 'allclose',
 'amax',
 'amin',
 'angle',
 'any',
 'append',
 'apply_along_axis',
 'apply_over_axes',
 'arange',
 'arccos',
 'arccosh',
 'arcsin',
 'arcsinh',
 'arctan',
 'arctan2',
 'arctanh',
 'argmax',
 'argmin',
 'argpartition',
 'argsort',
 'argwhere',
 'around',
 'array',
 'ar