## PyTorch

In [6]:
from torch import Tensor
import torch

### DataTypes

In [7]:
# -- checking if a variable is a sequence

import collections
import numpy as np

isinstance([1], collections.abc.Sequence) # a list is a sequence
isinstance((1,), collections.abc.Sequence) # a tuple is a sequence
isinstance('a', collections.abc.Sequence) # a string is a sequence
isinstance(1, collections.abc.Sequence) # an int is not a sequence
isinstance(1.0, collections.abc.Sequence) # a float is not a sequence
isinstance(np.array([1,2,3]), collections.abc.Sequence) # a numpy array is NOT a sequence

# -- checking if a variable is iterable

isinstance([1], collections.abc.Iterable) # a list is iterable
isinstance((1,), collections.abc.Iterable) # a tuple is iterable
isinstance('a', collections.abc.Iterable) # a string is iterable
isinstance(1, collections.abc.Iterable) # an int is not iterable
isinstance(1.0, collections.abc.Iterable) # a float is not iterable
isinstance(np.array([1,2,3]), collections.abc.Iterable) # a numpy array is iterable

True

So when creating a tensor, we first need to check the data type. We can pass a list (which is recognized as a sequence), a numpy array, or a single number (neither of these two is a `collections.abc.Sequence`, so they need their own checks). Alternatively, to imitate pytorch's legacy `torch.Tensor` constructor, we could accept an object of type `torch.Size` (which we would have to implement ourselves, e.g. as `lib.Size`); in that case the result is an uninitialized ("empty") tensor of those dimensions.

When instantiating a Tensor object, the following arguments can be given as **data**:  

- a list of numbers or a nested list of lists (the innermost items must be numbers, at any depth: a list of lists of lists of numbers, etc.)  
- a numpy array (not explicitly required by the assignment, but useful when importing, transforming and processing data)  
- a numeric value (only int32, int64, float32 and float64 really need to be handled); these numeric types should be defined in the library to imitate the pytorch UI  

These types need to be validated when creating a tensor (either in the `__setattr__` method of the Tensor class, or within the `tensor()` function, since users should create tensors through it, as is the case in pytorch)
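A minimal sketch of the type dispatch described above, in plain Python. The helper name `classify_data` and its return labels are hypothetical; numpy arrays are detected by duck typing (a `shape` attribute plus a `tolist()` method) so the sketch runs without numpy installed, and tuples and strings are rejected because only lists are accepted at the top level.

```python
from numbers import Real


def classify_data(data):
    """Hypothetical helper: classify `data` as one of the accepted input forms.

    Returns "scalar", "nested_list" or "array_like"; raises TypeError otherwise.
    """
    if isinstance(data, Real):
        # int, float, and numpy scalar types (numpy registers them with `numbers`)
        return "scalar"
    if isinstance(data, list):
        # the contents are validated later (numeric leaves, uniform dimensions)
        return "nested_list"
    if hasattr(data, "shape") and hasattr(data, "tolist"):
        # duck-typed numpy.ndarray; .tolist() converts it to a nested list
        return "array_like"
    raise TypeError(f"invalid data type '{type(data).__name__}'")


print(classify_data(7))              # scalar
print(classify_data([[1, 2], [3]]))  # nested_list (ragged, but that is caught by a later check)
```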

In [15]:
t1 = torch.tensor([1,2,3])
t1.dtype #by default it is int64 

t2 = torch.tensor([1.0,2])
t2.dtype #by default it is float32

t1.requires_grad #by default it is False

t1.is_leaf #by default it is True

# t3=torch.tensor() #this will raise an error, need to have arg data

torch.float32

The default behaviour of the `tensor()` function:  

* at least one required argument: the data  
* if elements in the lowest level are not numbers, raise an error  
* if elements in the lowest level are numbers and at least one of them is a float, the tensor should be of type float32, otherwise int64 (if no dtype is provided)  
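The dtype rule in the last bullet can be sketched in plain Python. `infer_default_dtype` and `_leaves` are hypothetical names, dtypes are represented as plain strings to keep the example self-contained, and rejecting `bool` values is an assumption of the sketch (pytorch itself produces a `torch.bool` tensor for booleans).

```python
def _leaves(data):
    """Yield the innermost (non-list) items of a nested list, left to right."""
    if isinstance(data, list):
        for item in data:
            yield from _leaves(item)
    else:
        yield data


def infer_default_dtype(data):
    """Hypothetical helper: the dtype tensor() would default to when no dtype is given.

    Returns "float32" if any leaf is a float, otherwise "int64".
    bool leaves are rejected (an assumption: bool tensors are out of scope here).
    """
    leaves = list(_leaves(data))
    if not leaves:
        raise ValueError("empty list provided")
    for x in leaves:
        if isinstance(x, bool) or not isinstance(x, (int, float)):
            raise TypeError(f"all elements must be numeric, got {x!r}")
    return "float32" if any(isinstance(x, float) for x in leaves) else "int64"
```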

_calling the `torch.Tensor` class directly is discouraged; pytorch recommends creating tensors from data with the `torch.tensor()` function_

In [39]:
type(torch.float32) #torch.dtype
# type(torch.dtype) #type

# issubclass(torch.float32, torch.dtype) #this will raise an error, torch.float32 is not a class
isinstance(torch.float32, torch.dtype)

torch.dtype.__dict__
# torch.float32.__dict__ #AttributeError: 'torch.dtype' object has no attribute '__dict__'

# torch.float32(1) #not callable

mappingproxy({'__module__': 'torch',
              '__repr__': <slot wrapper '__repr__' of 'torch.dtype' objects>,
              '__reduce__': <method '__reduce__' of 'torch.dtype' objects>,
              'to_real': <method 'to_real' of 'torch.dtype' objects>,
              'to_complex': <method 'to_complex' of 'torch.dtype' objects>,
              'is_floating_point': <attribute 'is_floating_point' of 'torch.dtype' objects>,
              'is_complex': <attribute 'is_complex' of 'torch.dtype' objects>,
              'is_signed': <attribute 'is_signed' of 'torch.dtype' objects>,
              'itemsize': <attribute 'itemsize' of 'torch.dtype' objects>,
              '__doc__': None})

In [None]:
t1=torch.tensor([1,2,3])
t1[0] #tensor(1): indexing returns a tensor (here holding the 1st item); indexing a 1D tensor gives a 0D tensor, whose shape is torch.Size([])


torch.Size([])

The tensor is an iterable of tensors, where the lowest level, which isn't iterable, is the 0-dimensional tensor (scalar).
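A minimal pure-Python sketch of that recursive structure, assuming the data is an already validated, non-empty nested list. `MiniTensor` is a hypothetical class (not pytorch's implementation) and its error messages are illustrative: indexing or iterating always yields another `MiniTensor` with one dimension fewer, and a 0D `MiniTensor` can be neither indexed nor iterated.

```python
class MiniTensor:
    """Hypothetical stand-in for a tensor backed by an (assumed validated) nested list."""

    def __init__(self, data):
        self.data = data

    @property
    def ndim(self):
        # count nesting levels by following the first element down to a number
        depth, level = 0, self.data
        while isinstance(level, list):
            depth += 1
            level = level[0] if level else None
        return depth

    def __getitem__(self, i):
        if self.ndim == 0:
            raise IndexError("cannot index a 0-dim tensor")
        # indexing always returns a tensor with one dimension fewer
        return MiniTensor(self.data[i])

    def __iter__(self):
        if self.ndim == 0:
            raise TypeError("iteration over a 0-d tensor")
        # iterating yields sub-tensors, never raw Python numbers
        return (MiniTensor(item) for item in self.data)

    def __repr__(self):
        return f"tensor({self.data})"
```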

### Playground

In [None]:
(Tensor([1,2])==torch.tensor([1.0,2.0])) #tensor([True, True]), this output is of type Tensor TAKE INTO CONSIDERATION WHEN WRITING THE __eq__ method
isinstance(torch.tensor([1,2]), torch.Tensor) # True

# -- torch.tensor(-) is a function that instantiates an object of type torch.Tensor and returns it

torch.Tensor

#### Scalars

A scalar has ndim=0 and an empty shape (`torch.Size([])`), i.e. it is a 0D tensor.  
A scalar is defined when we give a single number (so numeric type) to the tensor constructor.  

In [93]:
scalar = torch.tensor(7)
scalar = torch.tensor(7.)
scalar=torch.tensor(int(7.0))
scalar=torch.tensor(float(7))
scalar=torch.tensor(np.int64(7))
# scalar=torch.tensor('a') #TypeError: new(): invalid data type 'str'
scalar

isnumeric = lambda x: isinstance(x, (int, float, np.int64, np.float64,np.int32, np.float32))
isnumeric(scalar.item()) 
# type(scalar.item()) #int

True

In [88]:
print(f'nb of dim:{scalar.ndim}; dim:{scalar.shape}; size:{scalar.size()}')

# help(Tensor.size) 
# -- difference between size() and shape: size() also accepts an optional dimension and then returns the size of that dimension (an int)

nb of dim:0; dim:torch.Size([]); size:torch.Size([])
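Because `torch.Size` is a subclass of `tuple`, the relationship between the `shape` attribute and the `size()` method can be sketched with a plain tuple. `ShapeDemo` is a hypothetical class, not pytorch's implementation; an out-of-range dimension simply raises the `IndexError` of tuple indexing, and negative dimensions count from the end.

```python
class ShapeDemo:
    """Hypothetical class showing how `shape` (attribute) and `size(dim)` (method) relate."""

    def __init__(self, shape):
        self._shape = tuple(shape)  # a plain tuple plays the role of torch.Size

    @property
    def shape(self):
        return self._shape

    def size(self, dim=None):
        # without an argument, size() is the same as shape
        if dim is None:
            return self._shape
        # with a dimension, return that single entry (negative dims count from the end)
        return self._shape[dim]
```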


#### Vectors

In [91]:
vector=torch.tensor([1,2,3])
vector=torch.tensor([1.,2.,3.])
vector=torch.tensor([1,2,3], dtype=torch.float32)
vector=torch.tensor(np.array([1,2,3])) #works fine :')

print(f'nb of dim:{vector.ndim}; dim:{vector.shape}; size:{vector.size()}')

nb of dim:1; dim:torch.Size([3]); size:torch.Size([3])


In [36]:
# type(vector.shape)
# help(torch.Size())

vec2=torch.tensor([1,2,3], requires_grad=True,dtype=torch.float32) #requires_grad=True only works for floating-point (or complex) dtypes
print(vec2)

tensor([1., 2., 3.], requires_grad=True)
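Without `dtype=torch.float32`, `torch.tensor([1,2,3], requires_grad=True)` would be an int64 tensor and pytorch raises a `RuntimeError`. A minimal sketch of the same guard for our own library, assuming dtypes are identified by name strings; `check_requires_grad` and `FLOATING_DTYPES` are hypothetical names, and complex dtypes are left out.

```python
# Hypothetical set of dtype names that may track gradients in this sketch
# (pytorch also allows complex dtypes, which are left out here).
FLOATING_DTYPES = {"float32", "float64"}


def check_requires_grad(dtype_name, requires_grad):
    """Raise if gradients are requested for a non floating-point dtype."""
    if requires_grad and dtype_name not in FLOATING_DTYPES:
        raise RuntimeError(
            f"only floating-point tensors can require gradients, got dtype {dtype_name}"
        )
```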


In [37]:
vec2.requires_grad #true
vector.requires_grad #false

vector.dtype #int64 by default

vec2==vector 
#returns tensor([True, True, True]) so == compares values element-wise (ignoring dtype, requires_grad, etc.)

tensor([True, True, True])

In [41]:
vec2==scalar
scalar==vec2
# both return tensor([False, False, False]): the 0D tensor is broadcast against every element of the vector

tensor([False, False, False])
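The two observations above (values compared element-wise regardless of dtype, and a 0D operand compared against every element) can be sketched on nested lists. `elementwise_eq` is a hypothetical helper; unlike pytorch's full broadcasting rules, it only broadcasts scalars and raises on any other length mismatch, and it returns plain lists of bools rather than a tensor.

```python
def elementwise_eq(a, b):
    """Hypothetical: element-wise == on nested lists, broadcasting plain scalars only."""
    a_is_list, b_is_list = isinstance(a, list), isinstance(b, list)
    if a_is_list and b_is_list:
        if len(a) != len(b):
            raise ValueError("shape mismatch")
        return [elementwise_eq(x, y) for x, y in zip(a, b)]
    if a_is_list:
        return [elementwise_eq(x, b) for x in a]  # broadcast scalar b over a
    if b_is_list:
        return [elementwise_eq(a, y) for y in b]  # broadcast scalar a over b
    return a == b  # 1 == 1.0 is True in Python, so the dtype is ignored
```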

#### Matrices   

In [97]:
matrix=torch.tensor([[1,2,3],[4,5,6]])
matrix=torch.tensor([[1.,2.,3.],[4.,5.,6.]])
# matrix=torch.tensor([[1,2,3],[4,5]]) #ValueError: expected sequence of length 3 at dim 1 (got 2)
# -- this means that the items at each level of the list must have the same length

print(f'nb of dim:{matrix.ndim}; dim:{matrix.shape}; size:{matrix.size()}')

nb of dim:2; dim:torch.Size([2, 3]); size:torch.Size([2, 3])


#### N-D Tensors

In [99]:
t3d=torch.tensor([
        [
            [1,2],
            [3,4]
        ],
        [
            [5,6],
            [7,8]
        ]
    ])
# <=>
t3d=torch.tensor(np.array([
        [
            [1,2],
            [3,4]
        ],
        [
            [5,6],
            [7,8]
        ]
    ]))

print(f'nb of dim:{t3d.ndim}; dim:{t3d.shape}; size:{t3d.size()}')

nb of dim:3; dim:torch.Size([2, 2, 2]); size:torch.Size([2, 2, 2])


When we give it a list of lists, notice that ndim equals the number of nested opening brackets (before the first number), and each entry of the shape is the number of elements at that nesting level.  
Otherwise, if it is a numpy array, processing the dimensions is easier, as we can use the `shape` attribute of the numpy array.
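Reading the shape off the brackets can be sketched by following the first element at each nesting level. `shape_of` is a hypothetical helper that assumes the list is already rectangular: it does not check that every sublist has the same length (that is what `infer_dimensions` in the validation section below does).

```python
def shape_of(data):
    """Hypothetical: shape of an (assumed rectangular) nested list; ndim is len(shape)."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))              # number of elements at this bracket level
        data = data[0] if data else None     # descend one bracket level
    return shape


t3d = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
print(shape_of(t3d), len(shape_of(t3d)))  # [2, 2, 2] 3
```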

# Scratch

* [x] dtype 
* [ ] validation (raise errors if conditions are not met):  
    * the input data type (has to be list, numpy array at top level; numeric type exceptionally when defining a 0D tensor)   
    * no empty lists  
    * the lowest level of the data (has to be numeric), convert by default to float64 using python's `float()` casting function  
    * the dimensions are uniform at each level  



## dtype

In [68]:
from enum import Enum

class dtype(Enum):
    int64 = "int64"
    float64 = "float64"

    def __repr__(self):
        return self.value

    def __call__(self, x):
        '''make it callable, uses:
        ```
        >>> dtype.int64(1.7)
        1
        >>> dtype.float64(1)
        1.0
        ```
        '''
        if self == dtype.int64:
            return int(x)
        elif self == dtype.float64:
            return float(x)
        else:
            raise ValueError(f"Unknown dtype: {self}")

# -- aliasing
int64 = dtype.int64
float64 = dtype.float64


In [48]:
d=dtype.int64
type(d) #<enum 'dtype'>
isinstance(d, dtype) #True
isinstance(d, Enum) #True
isinstance(d, type) #False
type(d) #<enum 'dtype'>
type(dtype) #enum.EnumType

<enum 'dtype'>

## validation

In [56]:
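def is_emptylist(l):
    '''
    returns True if l is an empty list or contains an empty list at any nesting level
    (non-list inputs, e.g. a plain numeric, return False)
    
    e.g.

    ```
    >>> is_emptylist([])
    True
    >>> is_emptylist([1])
    False
    >>> is_emptylist([[]])
    True
    >>> is_emptylist([[],[]])  
    True
    >>> is_emptylist([[],[1]])
    True
    >>> is_emptylist([[1],[2]])
    False
    ```
    '''
    if not isinstance(l, list):
        return False
    if len(l) == 0:
        return True
    return any(is_emptylist(x) for x in l)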


def is_numeric(x):
    '''
    takes x and returns True if x is a numeric type 
    
    e.g.

    ```
    >>> is_numeric(1)
    True
    >>> is_numeric(1.0)
    True
    >>> is_numeric('a')
    False
    >>> is_numeric([1])
    False
    ```
    '''
    acceptable_numeric_types = (int, float, np.int64, np.float64, np.int32, np.float32)
    return isinstance(x, acceptable_numeric_types)

def is_inner_numeric(l:list):
    '''
    recursive function that takes a list l and returns True if all the inner elements of l are numeric (depends on is_numeric())

    e.g.

    ```
    >>> is_inner_numeric([1,2,3])
    True
    >>> is_inner_numeric([1,2,'a'])
    False
    >>> is_inner_numeric([1,[2,3]])
    True
    >>> is_inner_numeric([1,[2,'a']])
    False
    ```
    '''
    if isinstance(l,list):
        return all(is_inner_numeric(x) for x in l)
    else:
        return is_numeric(l)
    
def check_dlist(l):
    '''makes sure the input is either a numeric or a non empty dlist of numerics, depends on is_inner_numeric() and is_numeric() and is_emptylist()

    THROW ValueError if the input is not a numeric or a dlist of numerics
    
    '''
    if is_emptylist(l):
        raise ValueError('empty list provided')
    if not is_inner_numeric(l):
        raise ValueError('all elements in the list must be numeric')
    return True

In [29]:
# ##### TESTING #####
# # won't use these, will use cast_dtype instead :p


# def make_float64(l):
#     '''
#     recursive function that takes a list l and returns a list with all the numeric elements converted to float64

#     note: this step is after checking if the elements are numeric
    
#     e.g.

#     ```
#     >>> make_float64([1,2,3])
#     [1.0, 2.0, 3.0]
#     >>> make_float64([1,2])  
#     [1.0, 2.0]
#     >>> make_float64([1,2.0])
#     [1.0, 2.0]
#     >>> make_float64([1,[2,3]])
#     [1.0, [2.0, 3.0]]
#     ```
#     '''
#     if isinstance(l,list):
#         return [make_float64(x) for x in l]
#     else:
#         return float(l)

# def make_int64(l):
#     '''
#     recursive function that takes a list l and returns a list with all the numeric elements converted to int64

#     note: this step is after checking if the elements are numeric
    
#     e.g.

#     ```
#     >>> make_int64([1.0,2.0,3.0])
#     [1, 2, 3]
#     >>> make_int64([1.0,2.0])  
#     [1, 2]
#     >>> make_int64([1.0,2])
#     [1, 2]
#     >>> make_int64([1,[2.0,3.0]])
#     [1, [2, 3]]
#     ```
#     '''
#     if isinstance(l,list):
#         return [make_int64(x) for x in l]
#     else:
#         return int(l)

In [31]:
# function to go recursively to the lowest level of the list: we'll use a function decorator


# def memoize_dimensions(func):
#     """decorator to memoize the dimensions of a nested list"""
#     cache = {}

#     def inner(nested_list):
#
#         id_list = id(nested_list)
#         if id_list in cache:
#             return cache[id_list]
#         result = func(nested_list)
#         cache[id_list] = result
#         return result
#
#     return inner

# @memoize_dimensions
def infer_dimensions(nested_list):
    '''
    recursively infer the dimensions of a nested list and validate uniformity  

    THROW ValueError IF NOT UNIFORM

    ```
    >>> infer_dimensions([1,2,3]) #vector
    [3]
    >>> infer_dimensions([1]) #vector
    [1]
    >>> infer_dimensions(1)  #scalar
    []  
    >>> infer_dimensions([[1,2],[3,4]]) #matrix
    [2, 2]  
    >>> infer_dimensions([[[1,2],[3,4]],[[5,6],[7,8]]]) #3D tensor
    [2, 2, 2]  
    >>> infer_dimensions([[[1,2],[3,4]],[[5,6],[7]]]) 
    ValueError: Dimension mismatch detected: [[2], [1]]
    ```
    '''
    if isinstance(nested_list, list):
        if len(nested_list) == 0:
            # empty list: a dimension of size 0 (empty lists are meant to be rejected beforehand)
            return [0]
        sub_shapes = [infer_dimensions(sublist) for sublist in nested_list]
        # turn the shapes of all sublists at this level into tuples and remove duplicates (set());
        # if the sublists are uniform in shape, exactly one distinct shape remains
        if len(set(map(tuple, sub_shapes))) > 1:
            raise ValueError(f"Dimension mismatch detected: {sub_shapes}")

        # combine this level with the sub-dimensions, e.g. [2, 2] for [[1,2],[3,4]]
        # (all entries of sub_shapes are identical, so sub_shapes[0] is representative)
        return [len(nested_list)] + sub_shapes[0]

    # not a list: a scalar has no dimensions, so return an empty shape
    return []



In [107]:

try:
    nested_list = [[1, 2], [3, 4]]
    dimensions = infer_dimensions(nested_list)
    print("Dimensions:", dimensions)

    irregular_list = [[1, 2], [3, 4, 5]]
    infer_dimensions(irregular_list)
except ValueError as e:
    print("Error:", e)

Dimensions: [2, 2]
Error: Dimension mismatch detected: [[2], [3]]


In [32]:
# examples to test

l0=1
l1=[1,2,3,4]
l2=[[1,2],[3,4]]
l3=[[[1,2],[3,4]],[[5,6],[7,8]]]
l4=[[
        [1,2],[3,4],[5,6],[7,8],[9,10],[11,12]
    ],
    [
        [5,6],[7,8],[9,10],[11,12],[13,14],[15,16]
    ], 
    [
        [9,10],[11,12],[13,14],[15,16],[17,18],[19,20]
    ]]

l5=[
    [
        [
            [1,2],[3,4]
        ],
        [
            [5,6],[7,8]
        ]
    ]
]

#empty
bad_l0=[]
#same level but one list and one int
bad_l1=[1,[1]] 
#non uniform length at same level
bad_l2=[[1,2],[3]]
bad_l3=[[[1,2],[3,4]],[[5,6],[7]]]
bad_l4=[[[1,2],[3,4]],[[5,6],[7,8],[9,10]]]
#not enclosed by list at top level
bad_l5=(1,2)
#lowest level not numeric
bad_l6=[[[1,2],[3,4]],[[5,6],[7,'a']]]

# is_numeric(bad_l2)

In [43]:
try:
    d=infer_dimensions(bad_l4)
    print(d)
except ValueError as e:
    print("Error:", e)
# successful test

# -- things that are not validated by infer_dimensions
#   - the top level is a list  (handled in 1st step by checking the input type)
#   - the lowest level is numeric   
#   - the list is empty
# need to add these checks prior to calling infer_dimensions

Error: Dimension mismatch detected: [[2, 2], [3, 2]]


In [72]:
def cast_dtype(nested_list, dt: dtype=dtype.float64):
    '''
    recursively cast the elements of a nested list to a given dtype (default is float64)

    e.g.
    
    ```
    >>> cast_dtype([1,2,3], dtype.float64)
    [1.0, 2.0, 3.0]
    >>> cast_dtype([1,2,3], dtype.int64)
    [1, 2, 3]
    ```
    '''
    if isinstance(nested_list, list):
        return [cast_dtype(sublist, dt) for sublist in nested_list]
    return dt(nested_list)


#successful test

In [73]:
def check_dtype(dt):
    '''
    check that dt names a member of the dtype enumeration and return that member

    ```
    >>> check_dtype('int64')
    int64
    >>> check_dtype('int32')
    ValueError: Invalid dtype given: int32; Valid dtypes are from ['int64', 'float64'] #so far, under development
    ```
    '''
    if dt not in dtype.__members__:
        raise ValueError(f"Invalid dtype given: {dt}; Valid dtypes are from {list(dtype.__members__.keys())}")
    
    return dtype.__members__[dt]


In [74]:
cast_dtype(l4, check_dtype('float64'))

dt=check_dtype('int64')
cast_dtype(l4, dt)

[[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]],
 [[5, 6], [7, 8], [9, 10], [11, 12], [13, 14], [15, 16]],
 [[9, 10], [11, 12], [13, 14], [15, 16], [17, 18], [19, 20]]]
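An alternative to looking names up in `dtype.__members__` is to let `Enum` do the lookup by value: calling the enum class with a value returns the matching member and raises `ValueError` for unknown values, and passing an existing member returns it unchanged. The sketch below uses a hypothetical `DType` stand-in so it runs on its own; since the notebook's `dtype` members have values equal to their names, the same lookup should work there too.

```python
from enum import Enum


class DType(Enum):
    """Hypothetical stand-in for the notebook's `dtype` enum."""
    int64 = "int64"
    float64 = "float64"


def parse_dtype(value):
    """Return the DType member for `value` (a name string or an existing member)."""
    try:
        return DType(value)  # Enum lookup by value; a member passed in is returned as-is
    except ValueError:
        valid = [member.value for member in DType]
        raise ValueError(
            f"Invalid dtype given: {value}; Valid dtypes are from {valid}"
        ) from None
```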

In [None]:
def validate_tensor_input(input_data):
    '''
    The input should be either a numeric or a nested list of numerics (allow for numpy in later versions)

    When validating, the things to check for (in order) are:  

    1. the input is a non-empty (possibly nested) list, or a numeric: raise ValueError if not -> check_dlist(input_data)    
    2. the dimensions of the list are uniform: raise ValueError if not -> infer_dimensions(input_data)  

    in 1 we are checking (when non numeric) that:  
        a. the top level is a list   
        b. the list is non-empty (and contains no empty lists)  
        c. the lowest level is numeric    
    '''