# NumPy

__NumPy__ (short for ___Numerical Python___) is a _library_ for the _Python_ programming language. It adds support for large, _multi-dimensional arrays_ and _matrices_, along with a _large collection_ of high-level mathematical functions that operate on these _arrays_.

__NumPy__ provides a powerful _N-dimensional_ array object, useful for performing mathematical and logical operations on arrays. It also has _functions_ for working in the domains of _linear algebra_, _Fourier transforms_, and _matrices_.

By importing `numpy as np`, we can access all the functions and methods provided by the _numpy_ package using the `np` alias.

In [2]:
import numpy as np

# create an array
array = np.array([1, 2, 3, 4, 5])


# create an array with range
array = np.arange(10, 51, 2)

# create an array with linspace
array = np.linspace(0, 1, 20)

# create a matrix of zeros
zeros = np.zeros((3, 3))

# data type
dtype = array.dtype

In [None]:
# get shape of array
shape = array.shape

# get shape of matrix
shape = zeros.shape

# get number of dimensions of array
dimensions = array.ndim

# get number of dimensions of matrix
dimensions = zeros.ndim

In [26]:
# get number of elements in array
length = array.size

# get element by index
element_array = array[3]
element_matrix = zeros[1, 1]

In [None]:
# descriptive statistics

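# get sum of array (named `total` to avoid shadowing the built-in sum)
total = array.sum()

# get mean of array
mean = array.mean()

# get standard deviation of array
std = array.std()

# get variance of array
var = array.var()

# get min of array (named `minimum` to avoid shadowing the built-in min)
minimum = array.min()

# get max of array (named `maximum` to avoid shadowing the built-in max)
maximum = array.max()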

In [5]:
# slice a matrix
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
sliced = matrix[:2, 1:]
print(sliced)

# basic slicing returns a view (by reference), so writing to it changes the original
sliced[0, 0] = 100
print(matrix)

[[2 3]
 [5 6]]
[[  1 100   3]
 [  4   5   6]
 [  7   8   9]]
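
When the original matrix must stay intact, the slice can be copied explicitly. The sketch below is illustrative (the names `view` and `independent` are not from the notebook) and uses `np.shares_memory` to show the difference between a view and a copy.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Basic slicing returns a view that shares memory with the original
view = matrix[:2, 1:]
print(np.shares_memory(matrix, view))         # True

# .copy() allocates new memory, so edits stay local to the copy
independent = matrix[:2, 1:].copy()
independent[0, 0] = 100
print(np.shares_memory(matrix, independent))  # False
print(matrix[0, 1])                           # 2, the original is untouched
```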


## Linear Algebra with NumPy

__NumPy__ supports _linear algebra_ operations through its `np.linalg` module, which lets us perform these operations on _arrays_ and _matrices_ efficiently.

With __NumPy__, we can solve _linear algebra_ problems such as finding solutions to systems of linear equations and calculating matrix determinants, inverses, eigenvalues, and eigenvectors.


In [27]:
# solve a system of linear equations
equations = np.array([[2, 1], [3, 5]])
answers = np.array([1, 2])
solution = np.linalg.solve(equations, answers)

# to solve a system of linear equations, the number of equations must equal the number of unknowns
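
A quick way to check a solution is to substitute it back into the system. The sketch below repeats the same coefficients and verifies the result with `np.allclose`, which tolerates floating-point rounding.

```python
import numpy as np

# Same system as above: 2x + y = 1 and 3x + 5y = 2
equations = np.array([[2, 1], [3, 5]])
answers = np.array([1, 2])
solution = np.linalg.solve(equations, answers)

# Substituting the solution back should reproduce the right-hand side
print(solution)                                     # approximately [3/7, 1/7]
print(np.allclose(equations @ solution, answers))   # True
```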

In [28]:
# get the inverse of a matrix
matrix = np.array([[1, 2], [3, 4]])
inverse = np.linalg.inv(matrix)
print(inverse)

# to get the inverse of a matrix:
# the matrix must be square (number of rows = number of columns)
# the matrix must be non-singular (determinant is not zero)
# for a 2x2 matrix [[a, b], [c, d]]:
# inverse = 1 / (ad - bc) * [[d, -b], [-c, a]]

[[-2.   1. ]
 [ 1.5 -0.5]]
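
As a sanity check, the closed-form 2x2 inverse can be compared with `np.linalg.inv`. This is a minimal sketch; `manual_inverse` is an illustrative name, and the formula only applies to 2x2 matrices.

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])
(a, b), (c, d) = matrix

# Closed-form inverse of a 2x2 matrix: 1 / (ad - bc) * [[d, -b], [-c, a]]
det = a * d - b * c
manual_inverse = np.array([[d, -b], [-c, a]]) / det

# Both results should agree, and matrix @ inverse should be the identity
inverse = np.linalg.inv(matrix)
print(np.allclose(manual_inverse, inverse))        # True
print(np.allclose(matrix @ inverse, np.eye(2)))    # True
```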


In [29]:
# get the determinant of a matrix
determinant = np.linalg.det(matrix)
print(determinant)

# the determinant of a matrix is a scalar value; for a 2x2 matrix [[a, b], [c, d]]:
# det(A) = ad - bc
# the result is computed in floating point, so expect small rounding errors

-2.0000000000000004


In [30]:
# get the dot product of two arrays
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
dot = np.dot(array1, array2)
print(dot)
# dot = 1*4 + 2*5 + 3*6

32


In [31]:
# get the cross product of two arrays
cross = np.cross(array1, array2)
print(cross)
# cross = [2*6-3*5, 3*4-1*6, 1*5-2*4]

[-3  6 -3]


In [32]:
# get the norm of an array
norm = np.linalg.norm(array1)
print(norm)
# norm = sqrt(1^2 + 2^2 + 3^2)

3.7416573867739413


In [33]:
# get the eigenvalues and eigenvectors of a matrix
eigenvalues, eigenvectors = np.linalg.eig(matrix)
print(eigenvalues)
print(eigenvectors)
# the eigenvalues of a matrix are the values that satisfy the equation det(A - λI) = 0
# the eigenvectors of a matrix are the vectors that satisfy the equation A*v = λ*v

# in principal component analysis (PCA), the eigenvectors of the data's covariance matrix
# give the directions used to reduce the dimensionality of the data

[-0.37228132  5.37228132]
[[-0.82456484 -0.41597356]
 [ 0.56576746 -0.90937671]]
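
The defining equation `A*v = λ*v` can be checked numerically. In the sketch below, each column of `eigenvectors` is paired with the eigenvalue at the same index, which is how `np.linalg.eig` lays out its result.

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])
eigenvalues, eigenvectors = np.linalg.eig(matrix)

# Each column of `eigenvectors` pairs with the eigenvalue at the same index
for i, value in enumerate(eigenvalues):
    vector = eigenvectors[:, i]
    print(np.allclose(matrix @ vector, value * vector))  # True

# The eigenvalues sum to the trace of the matrix
print(np.isclose(eigenvalues.sum(), np.trace(matrix)))   # True
```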


## Vectorization

__Vectorization__ is a technique in computer programming that allows us to perform operations on _entire arrays or matrices_ instead of looping through each element individually. This approach leverages the power of optimized, low-level operations provided by libraries like __NumPy__.

By using _vectorized operations_, we can significantly improve the _performance_ of our code: the element-by-element loop runs inside NumPy's compiled code instead of the Python interpreter, and it can take advantage of the SIMD instructions of modern _CPUs_. Instead of writing explicit loops, we can express our computations as _mathematical expressions_ on arrays, making our code more concise and readable.

For example, instead of iterating over each element of an array to calculate the square root, we can simply apply the `np.sqrt()` function to the _entire array_. This not only simplifies the code but also improves its efficiency.

In addition to arithmetic operations, __vectorization__ also enables us to perform logical operations, mathematical functions, and other operations on arrays and matrices. This makes it a powerful tool for _scientific computing_, _data analysis_, and _machine learning_ tasks.
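
A small sketch of these element-wise operations, using made-up values; each line works on the whole array at once, with no explicit loop.

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0, 16.0])

roots = np.sqrt(values)                       # mathematical function -> [1. 2. 3. 4.]
mask = (values > 2) & (values < 10)           # logical operation -> [False True True False]
clipped = np.where(values > 5, 5.0, values)   # element-wise choice -> [1. 4. 5. 5.]

print(roots, mask, clipped)
```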

Overall, __vectorization__ is a fundamental concept in array programming that allows us to write efficient and concise code by operating on entire arrays or matrices at once. It is a key technique to leverage the full potential of libraries like __NumPy__ and optimize our computations.

In [34]:
import functools
import time

import memory_profiler

# decorator to time a function
def time_function(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        end = time.perf_counter()
        print(f"Time taken: {end - start:.4f} s")
        return result
    return wrapper

# decorator to report the peak memory usage of a function
def memory_function(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # retval=True returns (memory samples in MiB, the function's own result)
        usage, result = memory_profiler.memory_usage((func, args, kwargs), retval=True)
        print(f"Peak memory: {max(usage):.1f} MiB")
        return result
    return wrapper

In [None]:
# calculate the square root of a matrix with vectorization
@time_function
@memory_function
def sqrt_vectorized(matrix):
    return np.sqrt(matrix)

# calculate the square root of a matrix with a loop
@time_function
@memory_function
def sqrt_loop(matrix):
    result = np.zeros_like(matrix)
    for i in range(matrix.shape[0]):
        for j in range(matrix.shape[1]):
            result[i, j] = np.sqrt(matrix[i, j])
    return result

size = 1000
matrix = np.random.rand(size, size)
print("Shape:", matrix.shape)
print("Vectorized")
t = sqrt_vectorized(matrix)
print("Loop")
t = sqrt_loop(matrix)

In [None]:
# calculate the sum of two arrays with vectorization
@time_function
@memory_function
def sum_vectorized(array1, array2):
    return array1 + array2

# calculate the sum of two arrays with a loop
@time_function
@memory_function
def sum_loop(array1, array2):
    result = np.zeros_like(array1)
    for i in range(array1.shape[0]):
        result[i] = array1[i] + array2[i]
    return result

size = 10000000
array1 = np.random.rand(size)
array2 = np.random.rand(size)
print("Shape:", array1.shape)
print("Vectorized")
t = sum_vectorized(array1, array2)
print("Loop")
t = sum_loop(array1, array2)

In [None]:
# calculate broadcasted multiplication of an array and a scalar with vectorization
@time_function
@memory_function
def broadcasted_vectorized(array, scalar):
    return array * scalar

# calculate broadcasted multiplication of an array and a scalar with a loop
@time_function
@memory_function
def broadcasted_loop(array, scalar):
    result = np.zeros_like(array)
    for i in range(array.shape[0]):
        result[i] = array[i] * scalar
    return result

size = 10000000
array = np.random.rand(size)
scalar = np.random.rand()  # a true scalar; rand(1) would return an array of shape (1,)
print("Shape:", array.shape)
print("Vectorized")
t = broadcasted_vectorized(array, scalar)
print("Loop")
t = broadcasted_loop(array, scalar)
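
Multiplying by a scalar is the simplest case of broadcasting. The same mechanism also combines arrays of different but compatible shapes, as in this small sketch with illustrative values.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)   # shape (2, 3): [[0 1 2], [3 4 5]]
row = np.array([10, 20, 30])          # shape (3,)
column = np.array([[100], [200]])     # shape (2, 1)

# The row is stretched across both rows of the matrix
print(matrix + row)      # [[10 21 32], [13 24 35]]

# The column is stretched across all three columns
print(matrix + column)   # [[100 101 102], [203 204 205]]
```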

In [None]:
# filter an array with vectorization (boolean mask)
@time_function
@memory_function
def is_even(array):
    return array[array % 2 == 0]

# filter an array with a loop
@time_function
@memory_function
def is_even_loop(array):
    result = []
    for x in array:
        if x % 2 == 0:
            result.append(x)
    return result

# filter an array with a lambda function
@time_function
@memory_function
def is_even_lambda(array):
    return list(filter(lambda x: x % 2 == 0, array))

size = 10000000
array = np.random.randint(0, 100, size)
print("Shape:", array.shape)
print("Vectorized")
t = is_even(array)
print("Loop")
t = is_even_loop(array)
print("Lambda")
t = is_even_lambda(array)

# Strings

A __string__ is a _sequence of characters_ enclosed in single quotes (`''`) or double quotes (`""`).

In __Python__, strings are _immutable_, which means they cannot be changed once created.
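
A short sketch of what immutability means in practice: assigning to a character raises a `TypeError`, so a modified string has to be built as a new object. The variable names are illustrative.

```python
greeting = "hello"
assignment_failed = False

try:
    greeting[0] = "J"  # item assignment is not supported on str
except TypeError as error:
    assignment_failed = True
    print("TypeError:", error)

# Build a new string instead; the original is unchanged
capitalized = "J" + greeting[1:]
print(greeting, capitalized)  # hello Jello
```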

__Strings__ can be _concatenated_ using the `+` operator, and _repeated_ using the `*` operator.

Various _string_ methods are available to perform operations like finding the _length_, _converting case_, _splitting_, and _joining_ strings.

In [None]:
# compare strings using conditionals
string1 = "hello"
string2 = "world"

if string1 == string2:
    print("Equal")
if string1 != string2:
    print("Not equal")
if string1 < string2:
    print("Less than")

In [None]:
# check if a substring is in a string
substring = "ell"
if substring in string1:
    print("Substring found")

In [None]:
# traverse a string with a loop
for char in string1:
    print(char)

In [None]:
# get length of a string
length = len(string1)

# concatenate strings
concatenated = string1 + " " + string2

# repeat a string
repeated = string1 * 3

In [None]:
# format using format method
print("Hello, {}!".format("world"))

# format using f-strings
print(f"Hello, {string2}!")

In [None]:
# split a string
words = "hello, world".split(", ")

# join a list of strings
joined = " ".join(words)

In [None]:
# replace a substring (returns a new string; string1 itself is unchanged)
replaced = string1.replace(substring, "ELL")

# find the index of a substring
index = string1.find(substring)

# count occurrences of a substring
count = string1.count(substring)

# DateTimes

__Datetimes__ in _Python_ are objects that represent _dates_ and _times_. They are used to perform various operations related to dates and times, such as _calculating time differences_, _formatting dates_, and _parsing strings_ into datetime objects.


In [None]:
from datetime import datetime

# Create a datetime object representing the current date and time
current = datetime.now()
print(current)

# Create a datetime object representing a specific date (the time defaults to midnight)
date_var = datetime(2020, 12, 31)

# Create a time object from the current time
time_var = datetime.now().time()

# Get the current datetime in a specific timezone (pytz is a third-party package)
from pytz import timezone
colombia = timezone('America/Bogota')
current_colombia = datetime.now(colombia)
print(current_colombia)

In [None]:
# Use strftime to format a date and time
formatted = current.strftime("%Y-%m-%d %H:%M:%S")
print(formatted)
formatted = current.strftime("%A, %B %d, %Y")
print(formatted)

# Use strptime to parse a date and time string
date_string = "2024-06-22"
date = datetime.strptime(date_string, "%Y-%m-%d")
print(date)

In [None]:
from datetime import timedelta

# A timedelta represents a duration; adding it to a datetime gives a new datetime
tomorrow = current + timedelta(days=1)
print(tomorrow)
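
Going the other way, subtracting one datetime from another produces a `timedelta`, which is how time differences are calculated. The dates below are made up for illustration.

```python
from datetime import datetime, timedelta

start = datetime(2024, 6, 22, 8, 30)
end = datetime(2024, 6, 25, 10, 0)

elapsed = end - start            # subtracting datetimes gives a timedelta
print(elapsed)                   # 3 days, 1:30:00
print(elapsed.days)              # 3
print(elapsed.total_seconds())   # 264600.0
```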

# Regular Expressions

__Regular expressions__, also known as __regex__, are powerful tools for _pattern matching_ and _text manipulation_ in Python. They allow you to search, match, and manipulate strings based on specific __patterns__.

In _Python_, regular expressions are supported through the built-in `re` module. Once it is imported, its functions cover operations such as _pattern matching_, _searching_, _replacing_, and _splitting strings_.

__Regular expressions__ are written as a combination of literal characters and metacharacters (such as `\d`, `\w`, `+`, and `[...]`) that together describe a _pattern_.

In [6]:
import re

# use regex to find a substring
string = "hello, world"
substring = "world"
if re.search(substring, string):
    print("Substring found")

# use regex to find all substrings
substrings = re.findall(r"\w+", string)
print(substrings)

# use regex to find all words starting with a specific letter
words = re.findall(r"\b[h]\w+", string)
print(words)

# use regex to find numbers in a string
string = "hello, 123"
numbers = re.findall(r"\d+", string)
print(numbers)

Substring found
['hello', 'world']
['hello']
['123']


In [62]:
# use regex to replace a substring
new_string = re.sub(r"hello", "hi", string)
print(new_string)

# use regex to split a string
words = re.split(r"\W+", string)
print(words)

hi, 123
['hello', '123']


In [None]:
# use regex to match a string
match = re.match(r"hello", string)
if match:
    print("Match found")

# use regex to find every occurrence of a character class
string = "avbdgfkjghjgkjiulkjlkjfgcgfcsrtdtfhjhkkhgfdfd"
pattern = "[a-d]"
matches = re.findall(pattern, string)
print(matches)

pattern = "[j][fg]"
matches = re.findall(pattern, string)
print(matches)
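
The functions `re.match`, `re.search`, and `re.fullmatch` differ in where the pattern is allowed to match. A brief sketch with an illustrative string:

```python
import re

text = "say hello"

# match only looks at the beginning of the string
print(re.match(r"hello", text))       # None

# search scans the whole string for the first occurrence
print(re.search(r"hello", text))      # match object with span (4, 9)

# fullmatch requires the pattern to cover the entire string
print(re.fullmatch(r"hello", text))   # None
print(re.fullmatch(r"say \w+", text) is not None)  # True
```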

In [18]:
# validate an email address
email = "test@udistrital.edu.co"
# escape the dots in the domain; an unescaped "." matches any character
pattern = r"^[a-z][a-zA-Z0-9_.-]*@udistrital\.edu\.co$"
if re.match(pattern, email):
    print("Valid email")

Valid email
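
In a regular expression, an unescaped `.` matches any character, so a domain written without backslashes also accepts look-alike addresses. The sketch below compares both spellings of the pattern; the `lookalike` address is made up for illustration.

```python
import re

unescaped = r"^[a-z][a-zA-Z0-9_.-]*@udistrital.edu.co$"
escaped = r"^[a-z][a-zA-Z0-9_.-]*@udistrital\.edu\.co$"

lookalike = "test@udistritalxeduxco"   # not a real address in the domain
valid = "test@udistrital.edu.co"

print(bool(re.match(unescaped, lookalike)))  # True: each "." matched an "x"
print(bool(re.match(escaped, lookalike)))    # False
print(bool(re.match(escaped, valid)))        # True
```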
