# NumPy

__NumPy__ (short for ___Numerical Python___) is a _library_ for the _Python_ programming language. It adds support for large, _multi-dimensional arrays_ and _matrices_, along with a _large collection_ of high-level mathematical functions that operate on these _arrays_.

__NumPy__ provides a powerful _N-dimensional_ array object, useful for performing mathematical and logical operations on arrays. It also has _functions_ for working in the domains of _linear algebra_, _Fourier transforms_, and _matrices_.

By importing `numpy as np`, we can access all the functions and methods provided by the _numpy_ package using the `np` alias.

In [2]:
import numpy as np

# create an array
array1 = np.array([1, 2, 3, 4, 5])
print("Part1:", array1, type(array1))

# create an array with range
array = np.arange(10, 51, 2)
print("Part 2", array)

# create an array with linspace
array = np.linspace(0,1,11)
print("Part 3:", array)

# create a matrix of zeros
zeros = np.zeros((3,4))
print("Part 4:\n", zeros)
ones = np.ones((1,10))
print("Part 4.1:\n", ones)

# get array data type
print("Step 5: ")
print(array1.dtype)
print(array.dtype)

Part1: [1 2 3 4 5] <class 'numpy.ndarray'>
Part 2 [10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50]
Part 3: [0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
Part 4:
 [[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
Part 4.1:
 [[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
Step 5: 
int64
float64
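
The integer dtype printed above depends on the platform and the NumPy version, so it can help to set it explicitly. A minimal sketch, with illustrative variable names that are not part of the notebook:

```python
import numpy as np

# request a specific dtype at creation time
small_ints = np.array([1, 2, 3], dtype=np.int32)

# convert an existing array; astype returns a new array
as_float = small_ints.astype(np.float64)

print(small_ints.dtype, as_float.dtype)
```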


In [3]:
# get shape of array
print("Step 1:", array.shape)

# get shape of matrix
print("Step 2:", zeros.shape)

# get number of dimensions of array
print("Step 3:", array1.ndim)

# get number of dimensions of matrix
print("Step 4:", ones.ndim)

Step 1: (11,)
Step 2: (3, 4)
Step 3: 1
Step 4: 2


In [4]:
# get number of elements in array
length = array.size
print("Step 1:", length)

# get element by index
element_array = array[4]
element_matrix = zeros[1, 2]
print("Step 2:", element_array, element_matrix)

Step 1: 11
Step 2: 0.4 0.0


In [5]:
# descriptive statistics
# (names such as total/minimum/maximum avoid shadowing the built-ins sum, min and max)

# get sum of array
total = array.sum()
print("Sum:", total)

# get mean of array
mean = array.mean()
print("Mean:", mean)

# get standard deviation of array
std = array.std()
print("Standard Deviation:", std)

# get variance of array
var = array.var()
print("Variance:", var)

# get min of array
minimum = array.min()
print("Minimum:", minimum)

# get max of array
maximum = array.max()
print("Maximum:", maximum)

Sum: 5.500000000000001
Mean: 0.5000000000000001
Standard Deviation: 0.31622776601683794
Variance: 0.1
Minimum: 0.0
Maximum: 1.0
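
Two details of the output above are worth a quick check: the sum is not exactly 5.5 because of floating-point rounding, and `std()`/`var()` divide by N by default (population statistics). A small sketch, assuming the same `linspace` data; `ddof=1` switches to the sample estimate that divides by N - 1:

```python
import numpy as np

values = np.linspace(0, 1, 11)

population_std = values.std()      # divides by N (ddof=0, the default)
sample_std = values.std(ddof=1)    # divides by N - 1

# floating-point sums carry rounding error, so compare with a tolerance
print(np.isclose(values.sum(), 5.5))
print(population_std, sample_std)
```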


In [6]:
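# slice a matrix
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Matrix:\n", matrix)
sub_matrix = matrix[:2, 1:]  # avoid the name "slice", which shadows a built-in
print("Slice:\n", sub_matrix)

# is the slice by reference or by value?
# it is a view (by reference): writing to it changes the original matrix
sub_matrix[0, 0] = 100
print("Matrix 2:\n", matrix)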

Matrix:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
Slice:
 [[2 3]
 [5 6]]
Matrix 2:
 [[  1 100   3]
 [  4   5   6]
 [  7   8   9]]
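
Since basic slicing returns a view, an explicit `.copy()` is needed when the original must stay untouched. A minimal sketch of the difference, using `np.shares_memory` to check whether two arrays overlap in memory:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

view = matrix[:2, 1:]                 # basic slicing returns a view
independent = matrix[:2, 1:].copy()   # explicit copy owns its data

independent[0, 0] = 100               # does not touch matrix
print(np.shares_memory(matrix, view))
print(np.shares_memory(matrix, independent))
print(matrix[0, 1])
```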


## Linear Algebra with Numpy

_Numpy_ is a powerful _Python_ package that provides support for _linear algebra_ operations. It allows us to perform various mathematical operations on _arrays_ and _matrices_ efficiently.

With __Numpy__, we can easily solve _linear algebra_ problems such as finding solutions to systems of linear equations, calculating matrix determinants, eigenvalues, eigenvectors, and much more.

In [7]:
# solve a system of linear equations
equations = np.array([[2, 3], [3, 7]])
answers = np.array([5, 12])
solution = np.linalg.solve(equations, answers)
print(solution)

[-0.2  1.8]
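
A quick way to gain confidence in a solution is to substitute it back into the system. The sketch below repeats the same system under illustrative names (`coefficients`, `rhs`) and checks that `A @ x` reproduces `b` within floating-point tolerance:

```python
import numpy as np

coefficients = np.array([[2, 3], [3, 7]])
rhs = np.array([5, 12])

solution = np.linalg.solve(coefficients, rhs)

# substitute back: A @ x should equal b up to rounding error
print(np.allclose(coefficients @ solution, rhs))
```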


In [8]:
# get the inverse of a matrix
inverse = np.linalg.inv(equations)
print(inverse)

[[ 1.4 -0.6]
 [-0.6  0.4]]


In [9]:
# get the determinant of a matrix
determinant = np.linalg.det(equations)
print(determinant)

4.999999999999998
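
The exact determinant is 2*7 - 3*3 = 5; the printed value differs only by floating-point rounding. A small sketch of how such results can be compared with a tolerance rather than with `==`:

```python
import numpy as np

matrix = np.array([[2, 3], [3, 7]])
inverse = np.linalg.inv(matrix)
determinant = np.linalg.det(matrix)

# compare with a tolerance instead of exact equality
print(np.isclose(determinant, 5.0))

# a matrix times its inverse should be (close to) the identity
print(np.allclose(matrix @ inverse, np.eye(2)))
```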


In [10]:
# get the dot product of two arrays
array_1 = np.array([1, 2, 3])
array_2 = np.array([4, 5, 6])
print("Dot:", np.dot(array_1, array_2))

Dot: 32


In [11]:
# get the cross product of two arrays
# cross = [2*6-3*5 3*4-1*6 1*5-2*4]
print("Cross:",np.cross(array_1, array_2))

Cross: [-3  6 -3]


In [12]:
# get the norm of an array
# norm(array_1) = sqrt(1² + 2² + 3²)
print(np.linalg.norm(array_1))

3.7416573867739413


In [13]:
# get the eigenvalues and eigenvectors of a matrix
eigenvalues, eigenvectors = np.linalg.eig(equations)
print("Eigenvalues:", eigenvalues)
print("Eigenvector:\n", eigenvectors)

Eigenvalues: [0.59487516 8.40512484]
Eigenvector:
 [[-0.90558942 -0.4241554 ]
 [ 0.4241554  -0.90558942]]
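
`np.linalg.eig` returns the eigenvectors as the columns of its second result. A minimal sketch that checks the defining relation A·v = λ·v for every column at once:

```python
import numpy as np

matrix = np.array([[2, 3], [3, 7]])
eigenvalues, eigenvectors = np.linalg.eig(matrix)

# column j of eigenvectors belongs to eigenvalues[j];
# broadcasting scales each column by its own eigenvalue
print(np.allclose(matrix @ eigenvectors, eigenvectors * eigenvalues))
```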


## Vectorization

__Vectorization__ is a technique in computer programming that allows us to perform operations on _entire arrays or matrices_ instead of looping through each element individually. This approach leverages the power of optimized, low-level operations provided by libraries like __NumPy__.

By using _vectorized operations_, we can significantly improve the _performance_ of our code: the loop runs inside NumPy's compiled routines over contiguous memory instead of in the Python interpreter, and those routines can use the SIMD instructions of modern _CPUs_. Instead of writing explicit loops, we express our computations as _mathematical expressions_ on arrays, making our code more concise and readable.

For example, instead of iterating over each element of an array to calculate the square root, we can simply apply the `np.sqrt()` function to the _entire array_. This not only simplifies the code but also improves its efficiency.
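
The square-root example can be sketched on a tiny array, where both approaches are easy to compare by eye (the values are arbitrary):

```python
import math
import numpy as np

values = np.array([1.0, 4.0, 9.0, 16.0])

# element-by-element loop in pure Python
roots_loop = [math.sqrt(v) for v in values]

# one vectorized call over the whole array
roots_vectorized = np.sqrt(values)

print(roots_loop)
print(roots_vectorized)
```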

In addition to arithmetic operations, __vectorization__ also enables us to perform logical operations, mathematical functions, and other operations on arrays and matrices. This makes it a powerful tool for _scientific computing_, _data analysis_, and _machine learning_ tasks.
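
As a rough illustration of vectorized logical operations, the sketch below builds boolean arrays from comparisons and uses them for selection and labeling. The data and names are made up for the example:

```python
import numpy as np

temperatures = np.array([12.5, 30.1, 25.0, 8.3])

is_warm = temperatures > 20                  # element-wise comparison
warm_values = temperatures[is_warm]          # boolean-mask selection
labels = np.where(is_warm, "warm", "cold")   # element-wise choice
in_range = (temperatures > 10) & (temperatures < 28)  # element-wise AND

print(is_warm, warm_values, labels, in_range)
```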

Overall, __vectorization__ is a fundamental concept in array programming that allows us to write efficient and concise code by operating on entire arrays or matrices at once. It is a key technique to leverage the full potential of libraries like __NumPy__ and optimize our computations.

In [14]:
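import functools
import time

import memory_profiler

# decorator to time a function
def time_function(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        end = time.perf_counter()
        print(f"Time taken: {end - start}")
        return result
    return wrapper

# decorator to get memory usage of a function
def memory_function(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # retval=True also returns the value produced by func,
        # so the decorated function keeps its original return value
        usage, result = memory_profiler.memory_usage((func, args, kwargs), retval=True)
        # usage is a list of samples in MiB; usage[0] is the first sample taken
        print(f"Memory used: {usage[0]}")
        return result
    return wrapper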

In [15]:
# calculate the square root of a matrix with vectorization
@time_function
@memory_function
def sqrt_vectorized(matrix):
    return np.sqrt(matrix)

# calculate the square root of a matrix with a loop
@time_function
@memory_function
def sqrt_loop(matrix):
    result = np.zeros_like(matrix)
    for i in range(matrix.shape[0]):
        for j in range(matrix.shape[1]):
            result[i, j] = np.sqrt(matrix[i, j])
    return result

# ------- Test --------
size = 5000
matrix = np.random.rand(size, size)
print("Shape:", matrix.shape)
print("Vectorized")
t = sqrt_vectorized(matrix.copy())
print("\nLoop")
t = sqrt_loop(matrix)

Shape: (5000, 5000)
Vectorized
Memory used: 659.91796875
Time taken: 0.13384652137756348
Loop
Memory used: 278.4765625
Time taken: 29.520958423614502


In [16]:
# calculate the sum of two arrays with vectorization
@time_function
@memory_function
def sum_vectorized(array_1: np.ndarray, array_2: np.ndarray):
    return array_1 + array_2

# calculate the sum of two arrays with a loop
@time_function
@memory_function
def sum_loops(array_1, array_2):
    result = np.zeros_like(array_1)
    for i in range(array_1.size):
        result[i] = array_1[i] + array_2[i]
    return result

#---------- Test -----------
size = 100000000
array_1 = np.random.rand(size)
array_2 = np.random.rand(size)
print("Shape:", array_1.shape)
print("Vectorized")
t = sum_vectorized(array_1, array_2)
print("\nLoops")
t = sum_loops(array_1, array_2)

Shape: (100000000,)
Vectorized
Memory used: 1804.4921875
Time taken: 0.49916863441467285
Loops
Memory used: 1804.57421875
Time taken: 31.070863246917725


In [17]:
# calculate broadcasted multiplication of an array and a scalar with vectorization
@time_function
@memory_function
def broadcasted_vectorized(array: np.ndarray, scalar: int):
    return array * scalar

# calculate broadcasted multiplication of an array and a scalar with a loop
@time_function
@memory_function
def broadcasted_loop(array: np.ndarray, scalar: int):
    result = np.zeros_like(array)
    for i in range(array.size):
        result[i] = array[i] * scalar
    return result

# ------------ Test ------------------
size = 100000000
array = np.random.rand(size)
scalar = np.random.randint(10)
print("Shape:", array.shape, "  -- Scalar:", scalar)
print("Vectorized:")
t = broadcasted_vectorized(array, scalar)
print("\nLoops:")
t = broadcasted_loop(array, scalar)

Shape: (100000000,)   -- Scalar: 1
Vectorized:
Memory used: 2567.65625
Time taken: 0.6349039077758789
Loops:
Memory used: 2567.67578125
Time taken: 22.94003701210022


In [18]:
# filter an array with vectorization (boolean mask)
@time_function
@memory_function
def is_even_vectorized(array):
    return array[array % 2 == 0]

# filter an array with a lambda function
@time_function
@memory_function
def is_even_lambda(array):
    return list( filter(lambda x: x%2==0, array) )


# filter an array with a loop
@time_function
@memory_function
def is_even_loop(array):
    result = []
    for x in array:
        if x % 2 == 0:
            result.append(x)
    return result

# ------------- Test -----------
size = 100000000
array = np.random.rand(size)
print("Vectorized:")
t = is_even_vectorized(array)
print("\nLambda:")
t = is_even_lambda(array)
print("\nLoop:")
t = is_even_loop(array)

Vectorized:
Memory used: 2567.82421875
Time taken: 0.8963615894317627
Lambda:
Memory used: 2567.82421875
Time taken: 19.725606441497803
Loop:
Memory used: 2567.82421875
Time taken: 15.469332218170166
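
Note that the benchmark input comes from `np.random.rand`, which yields floats in [0, 1), so only an exact 0.0 would pass `x % 2 == 0`. The sketch below, on a much smaller integer array, checks that the boolean mask and the explicit loop select the same elements. The seed and size are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(seed=0)          # arbitrary seed for repeatability
numbers = rng.integers(0, 100, size=1_000)   # integers, so "even" is meaningful

evens_mask = numbers[numbers % 2 == 0]            # vectorized filter
evens_loop = [x for x in numbers if x % 2 == 0]   # element-by-element filter

print(np.array_equal(evens_mask, evens_loop))
```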


# Strings

A __string__ is a _sequence of characters_ enclosed in single quotes (`''`) or double quotes (`""`).

In __Python__, strings are _immutable_, which means they cannot be changed once created.
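
A short sketch of what immutability means in practice: item assignment raises a `TypeError`, and "changing" a string really means building a new one (the variable names are illustrative):

```python
greeting = "hello"

try:
    greeting[0] = "J"          # item assignment is not allowed on str
except TypeError as error:
    print("TypeError:", error)

# build a new string instead; the original is left untouched
changed = "J" + greeting[1:]
print(greeting, changed)
```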

__Strings__ can be _concatenated_ using the `+` operator, and _repeated_ using the `*` operator.

Various _string_ methods are available to perform operations like finding the _length_, _converting case_, _splitting_, and _joining_ strings.

In [34]:
# compare strings using conditionals
s_1 = "hello"
s_2 = "ud"

if s_1 == s_2:
    print("Equal")

if s_1 != s_2:
    print("Different")

if s_1 < s_2:
    print(s_1, "less than", s_2)

Different
hello less than ud
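
The `<` comparison above is lexicographic: strings are compared character by character using Unicode code points, so length plays no role until one string is a prefix of the other. A small sketch of the consequences, including the case-sensitivity pitfall:

```python
first_h, first_u = ord("h"), ord("u")   # code points of the first characters
print(first_h, first_u)

by_first_char = "hello" < "ud"          # decided by 'h' < 'u'
upper_first = "Zebra" < "apple"         # uppercase letters have smaller code points
ignoring_case = "Zebra".casefold() < "apple".casefold()

print(by_first_char, upper_first, ignoring_case)
```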


In [38]:
# check if a substring is in a string
sub = "ell"
if sub in s_1:
    print("Substring found")
else:
    print("Substring not found")

Substring found


In [39]:
# traverse a string with a loop
for char in s_1:
    print(char)

h
e
l
l
o


In [42]:
# get length of a string
print("Length s_1:", len(s_1))
print("Length s_2:", len(s_2))

# concatenate strings
print("Concatenate 1:", s_1 + " " + s_2)
print("Concatenate 2 (Yoda):", s_2 + " " + s_1)

# repeat a string
print("Repeat:", s_1 * 6)

Length s_1: 5
Length s_2: 2
Concatenate 1: hello ud
Concatenate 2 (Yoda): ud hello
Repeat: hellohellohellohellohellohello


In [46]:
name = "Lupita"

# format using format method
print("What is the matter with {}? {}".format(name, s_1))

# format using f-strings
print(f'What is the matter with {name}? {s_1} {s_2}')

What is the matter with Lupita? hello
What is the matter with Lupita? hello ud


In [51]:
# split a string
names_s = "Ana-Bob-Claire-Dennis"
names = names_s.split('-')
print(names, type(names))

# join a list of strings
names_joined = ", ".join(names)
print(names_joined, type(names_joined))

['Ana', 'Bob', 'Claire', 'Dennis'] <class 'list'>
Ana, Bob, Claire, Dennis <class 'str'>


In [69]:
# replace a substring
print("Replace:", names_joined.replace("na", "NA"))

# find the index of a substring
s_3 = s_1 * 3
print(s_3)
print("Index:", s_3.find("el"))

# count occurrences of a substring
print("Count a:", names_joined.count('a'))
print("Count nn:", names_joined.count('nn'))

# Cases
print("Lower: ", names_joined.lower())
print("Upper:", names_joined.upper())
print("Title:", names_joined.title())
print("Inverse:", names_joined.swapcase())

Replace: ANA, Bob, Claire, Dennis
hellohellohello
Index: 1
Count a: 2
Count nn: 1
Lower:  ana, bob, claire, dennis
Upper: ANA, BOB, CLAIRE, DENNIS
Title: Ana, Bob, Claire, Dennis
Inverse: aNA, bOB, cLAIRE, dENNIS


# DateTimes

__Datetimes__ in _Python_ are objects that represent _dates_ and _times_. They are used to perform various operations related to dates and times, such as _calculating time differences_, _formatting dates_, and _parsing strings_ into datetime objects.


In [80]:
from datetime import datetime

# Create a datetime object representing the current date and time
current = datetime.now()
print("Now:", current)

# Create a datetime object representing a specific date and time
date_1 = datetime(year=2015, month=10, day=25, hour=17)
print("Custom date:", date_1)

# Extract the time part of a datetime as a time object
print("Time:", current.time())

# Get the current datetime with a specific timezone
from pytz import timezone
colombia_tz = timezone("America/Bogota")
current_colombia = datetime.now(colombia_tz)
print("Now in Colombia:", current_colombia)


Now: 2024-06-26 10:27:22.291523
Custom date: 2015-10-25 17:00:00
Time: 10:27:22.291523
Now in Colombia: 2024-06-26 10:27:22.321485-05:00


In [86]:
# Use strftime to format a date and time
print("Format y-m(letter)-d h-m-s:", current_colombia.strftime("%Y-%b-%d %H:%M:%S"))
print("Format d(letter), m(letter), d(num), y:", current_colombia.strftime("%A, %b, %d, %Y"))

# Use strptime to parse a date and time string
date_string = "2023-02-21"
date_2 = datetime.strptime(date_string, "%Y-%m-%d")
print("Date 2:", type(date_2), date_2)

Format y-m(letter)-d h-m-s: 2024-Jun-26 10:27:22
Format d(letter), m(letter), d(num), y: Wednesday, Jun, 26, 2024
Date 2: <class 'datetime.datetime'> 2023-02-21 00:00:00


In [89]:
# Timedelta object representing the difference between two dates and times
from datetime import timedelta
print("Tomorrow:", current_colombia + timedelta(days=1))
print("Yesterday:", current_colombia - timedelta(days=1))
print("Deadline:", current_colombia + timedelta(minutes=30))
print("Close seconds:", current_colombia + timedelta(seconds=8))

Tomorrow: 2024-06-27 10:27:22.321485-05:00
Yesterday: 2024-06-25 10:27:22.321485-05:00
Deadline: 2024-06-26 10:57:22.321485-05:00
Close seconds: 2024-06-26 10:27:30.321485-05:00
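
Subtracting two datetimes goes the other way and produces a `timedelta`. A minimal sketch with two arbitrary, hard-coded dates:

```python
from datetime import datetime

start = datetime(2024, 6, 26, 10, 0, 0)
end = datetime(2024, 6, 28, 12, 30, 0)

elapsed = end - start                    # a timedelta object
hours = elapsed.total_seconds() / 3600   # whole difference expressed in hours

print(elapsed)
print(elapsed.days, elapsed.seconds, hours)
```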


# Regular Expressions

__Regular expressions__, also known as __regex__, are powerful tools for _pattern matching_ and _text manipulation_ in Python. They allow you to search, match, and manipulate strings based on specific __patterns__.

In _Python_, regular expressions are supported through the `re` module. This module provides functions and methods for working with regular expressions.

To use __regular expressions__ in _Python_, you need to import the `re` module. Once imported, you can use various functions and methods provided by the module to perform operations such as _pattern matching_, _searching_, _replacing_, and _splitting strings_.

__Regular expressions__ in _Python_ are defined using a combination of special characters and metacharacters that represent _patterns_. 
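
As a rough illustration of how metacharacters combine, the sketch below uses `\d` (a digit), `+` and `{n}` (repetition), `[A-Z]` (a character class) and parentheses (capturing groups) on a made-up sentence:

```python
import re

text = "Order A-17 shipped on 2024-06-26 to ana@example.com"

all_numbers = re.findall(r"\d+", text)               # every run of digits
order_code = re.search(r"[A-Z]-\d+", text).group()   # capital letter, dash, digits
date_parts = re.search(r"(\d{4})-(\d{2})-(\d{2})", text).groups()  # three groups

print(all_numbers, order_code, date_parts)
```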

In [99]:
import re

# use regex to find a substring
string = "hello world, ud summer vacations sooner"
substring = "world,"
if re.search(substring, string):
    print("substring found")
else:
    print("substring not found")

# use regex to find all words
words = re.findall(r'\w+', string)
print("Words:", words)

substring found
Words: ['hello', 'world', 'ud', 'summer', 'vacations', 'sooner']


In [100]:
# use regex to find all words that start with a specific letter
words_s = re.findall(r'\bs\w+', string)
print("Words s:", words_s)

# find numbers using regex
string_num = "abc 123 def 456"
numbers = re.findall(r'\d+', string_num)
print("Numbers:", numbers)

Words s: ['summer', 'sooner']
Numbers: ['123', '456']


In [105]:
# use regex to replace a substring
print("Replace:", re.sub(r'ud', "Universidad Distrital", string))
print("Replace numbers:", re.sub(r'\d+', "XX", string_num))

# use regex to split a string
print("Words:", re.split(r'\d+', string_num))

Replace: hello world, Universidad Distrital summer vacations sooner
Replace numbers: abc XX def XX
Words: ['abc ', ' def ', '']


In [130]:
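# use regex to match a string
# re.match only tries the pattern at the beginning of the string
print(string)
# '[hello]' is a character class: it matches ONE character out of h, e, l, o
match_ = re.match('[hello]', string)
print(match_)
# 'ell' is not at the start of the string, so nothing is printed here
if re.match(r'ell', string):
    print("Match found")

# use regex to find all possibilities of a pattern
random_text = "dflksjfgjsdaflk;gjsd;lkfgjsh;dlfgjl;sdkjfg"

pattern = '[a-d]'
print("Matches:", re.findall(pattern, random_text))

pattern = '[d][j-l]'
print("Matches:", re.findall(pattern, random_text))

hello world, ud summer vacations sooner
<re.Match object; span=(0, 1), match='h'>
Matches: ['d', 'd', 'a', 'd', 'd', 'd']
Matches: ['dl', 'dk']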
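
`re.match` only tries the pattern at the start of the string, `re.search` scans the whole string, and `re.fullmatch` requires the entire string to match. A small sketch of the three on a made-up text:

```python
import re

text = "hello world"

at_start = re.match(r"ell", text)               # None: "ell" is not at position 0
anywhere = re.search(r"ell", text)              # found at positions 1..4
whole_partial = re.fullmatch(r"hello", text)    # None: pattern leaves " world" unmatched
whole_exact = re.fullmatch(r"hello world", text)

print(at_start, anywhere.span(), whole_partial, whole_exact is not None)
```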


In [142]:
# validate an email address
email = "test.a2_class_223.ssd@udistrital.edu.co"
# escape the dots in the domain: an unescaped "." matches any character
pattern = r'^[a-z][a-z0-9._]*@udistrital\.edu\.co$'
if re.match(pattern, email):
    print("Valid email")
else:
    print("Invalid email")

Valid email
