In [30]:
import torch
import numpy as np

#### Creating Tensors.

Let's create a NumPy array and turn it into tensors using four different options:

In [31]:
data = np.array([1,2,3])
print(data,type(data))

[1 2 3] <class 'numpy.ndarray'>


In [32]:
t1 = torch.Tensor(data) # Class constructor.
t2 = torch.tensor(data) # Factory function, allows more dynamic object creation.
t3 = torch.as_tensor(data) # Factory function.
t4 = torch.from_numpy(data) # Factory function.
print(f't1: {t1}')
print(f't2: {t2}')
print(f't3: {t3}')
print(f't4: {t4}')

t1: tensor([1., 2., 3.])
t2: tensor([1, 2, 3], dtype=torch.int32)
t3: tensor([1, 2, 3], dtype=torch.int32)
t4: tensor([1, 2, 3], dtype=torch.int32)


Check the corresponding data types:

In [26]:
print(f't1: {t1.dtype}')
print(f't2: {t2.dtype}')
print(f't3: {t3.dtype}')
print(f't4: {t4.dtype}')


t1: torch.float32
t2: torch.int32
t3: torch.int32
t4: torch.int32


**torch.Tensor()** returns float32 because the constructor uses the global default data type:

In [11]:
torch.get_default_dtype()

torch.float32
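
As a minimal sketch of the point above, the snippet below compares the constructor's result with the global default instead of hard-coding float32. It assumes the default dtype has not been changed elsewhere in the session; the variable names are only illustrative.

```python
import numpy as np
import torch

default = torch.get_default_dtype()

# The class constructor casts its input to the global default dtype.
from_constructor = torch.Tensor(np.array([1, 2, 3]))
print(from_constructor.dtype == default)

# torch.tensor() falls back on the same default for plain Python floats...
from_py_floats = torch.tensor([1., 2., 3.])
print(from_py_floats.dtype == default)

# ...but keeps the dtype of a NumPy array (float64 for this one).
from_np_floats = torch.tensor(np.array([1., 2., 3.]))
print(from_np_floats.dtype)
```

So the default dtype matters for the constructor and for Python floats, while NumPy input brings its own dtype along.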

The other factory functions *infer* the data type.
They choose the dtype based on the incoming data (type inference), so the int32 shown above simply mirrors the dtype of the NumPy array on this machine.

*Integer coming in - integer coming out.*

In [13]:
torch.tensor(np.array([1,2,3]))

tensor([1, 2, 3], dtype=torch.int32)

*Floating-point coming in - floating point coming out*

In [15]:
torch.tensor(np.array([1.,2.,3.]))

tensor([1., 2., 3.], dtype=torch.float64)

We can also set the dtype explicitly:  
(*Notice that, although we pass ints, the dtype turns to float because we explicitly passed it.*)

In [16]:
torch.tensor(np.array([1,2,3]),dtype=torch.float64)

tensor([1., 2., 3.], dtype=torch.float64)
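
The inference rule can be checked for several dtypes at once. This is a small sketch, not an exhaustive test: the `expected` mapping is an assumption written out by hand, and the NumPy dtypes are set explicitly so the result does not depend on the platform's default integer size.

```python
import numpy as np
import torch

# Hand-written mapping from NumPy dtypes to the torch dtypes we expect back.
expected = {
    np.int32: torch.int32,
    np.int64: torch.int64,
    np.float32: torch.float32,
    np.float64: torch.float64,
}

for np_dtype, torch_dtype in expected.items():
    arr = np.array([1, 2, 3], dtype=np_dtype)
    # All three factory functions should preserve the array's dtype.
    for factory in (torch.tensor, torch.as_tensor, torch.from_numpy):
        assert factory(arr).dtype == torch_dtype
    print(f'{arr.dtype} -> {torch.tensor(arr).dtype}')
```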

> So far we have covered the differences we can detect *visually*, just by inspecting the output. There are more differences behind the scenes, however.

#### Memory: Sharing vs Copying

Let's start with the initial NumPy array:

In [17]:
data = np.array([1,2,3])
data

array([1, 2, 3])

In [18]:
t1 = torch.Tensor(data) 
t2 = torch.tensor(data) 
t3 = torch.as_tensor(data)
t4 = torch.from_numpy(data)

Now we leave the tensors alone and modify the NumPy array (*data*) that we used to create them.

In [20]:
data[0] = 0
data[1] = 0
data[2] = 0

We modified the contents of the original array; time to see what happens to the tensors.

In [22]:
print(f't1: {t1}')
print(f't2: {t2}')
print(f't3: {t3}')
print(f't4: {t4}')

t1: tensor([1., 2., 3.])
t2: tensor([1, 2, 3], dtype=torch.int32)
t3: tensor([0, 0, 0], dtype=torch.int32)
t4: tensor([0, 0, 0], dtype=torch.int32)


> **torch.Tensor()** and **torch.tensor()** *haven't changed*. They *still contain the data from the original [1,2,3] NumPy array*; changing the array did not affect the t1 and t2 tensor data.  
> 
> **torch.as_tensor()** and **torch.from_numpy()** *contain the same data that is now in the array, after the change*.
>
> The reason: t1 and t2 create an additional copy of the input data in memory, while the t3 and t4 tensors share data in memory with the NumPy array.
<div style="text-align: center;">

![Share - Copy](Img\Share_copy_data.JPG)

</div>

This sharing just means that the actual data in memory exists in a single place. As a result, any changes that occur in the underlying data will be reflected in both objects, the **torch.Tensor** and the **numpy.ndarray**.
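
One way to confirm the sharing directly, rather than inferring it from printed values, is sketched below. It relies on `Tensor.numpy()` returning a view of a CPU tensor's memory and on `np.shares_memory` to test for overlap; the names `copied` and `shared` are illustrative, and the array dtype is fixed to int64 only for reproducibility.

```python
import numpy as np
import torch

data = np.array([1, 2, 3], dtype=np.int64)

copied = torch.tensor(data)      # makes its own copy of the data
shared = torch.from_numpy(data)  # reuses the array's memory

# Compare the memory behind each tensor with the memory behind `data`.
print(np.shares_memory(copied.numpy(), data))
print(np.shares_memory(shared.numpy(), data))

# Sharing works in both directions: an in-place write through the tensor
# is visible in the array, while the copied tensor is unaffected.
shared[0] = 99
print(data)
print(copied)
```

Under these assumptions the first check should report no overlap and the second should report overlap, matching the behaviour of t2 and t4 above.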

> Moving between NumPy arrays and PyTorch tensors can be very fast because the data is shared, not copied, behind the scenes when creating new PyTorch tensors.

#### ***Best options for creating tensors in PyTorch:***

Given all of these details, these two are the best options:

    torch.tensor()
    torch.as_tensor()

The torch.tensor() call is the go-to option, while torch.as_tensor() should be employed when tuning our code for performance.

Some things to keep in mind about memory sharing (it works where it can):

1. Since numpy.ndarray objects are allocated on the CPU, the as_tensor() function must copy the data from the CPU to the GPU when a GPU is being used.
2. The memory sharing of as_tensor() doesn't work with built-in Python data structures like lists.
3. The as_tensor() call requires developer knowledge of the sharing feature. This is necessary so we don't inadvertently make an unwanted change in the underlying data without realizing the change impacts multiple objects.
4. The as_tensor() performance improvement will be greater if there are a lot of back and forth operations between numpy.ndarray objects and tensor objects. However, if there is just a single load operation, there shouldn't be much impact from a performance perspective.
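
The first two caveats above can be observed with a short sketch. It assumes a CPU-only run is acceptable (the GPU branch is skipped when no CUDA device is available), and it adds one related case not listed above: requesting a dtype different from the array's also forces a copy. Variable names are illustrative.

```python
import numpy as np
import torch

# A Python list has no buffer for the tensor to share, so as_tensor() copies.
values = [1, 2, 3]
from_list = torch.as_tensor(values)
values[0] = 0
print(from_list)   # unaffected by the change to the list

# Same dtype: memory is shared. Different dtype: the data is copied.
arr = np.array([1, 2, 3], dtype=np.int64)
same_dtype = torch.as_tensor(arr)
converted = torch.as_tensor(arr, dtype=torch.float32)
arr[0] = 0
print(same_dtype)  # reflects the change to the array
print(converted)   # keeps the original values

# Placing the tensor on a GPU (if one is available) copies as well.
if torch.cuda.is_available():
    on_gpu = torch.as_tensor(arr, device='cuda')
    arr[1] = 0
    print(on_gpu)  # unaffected by the later change to the array
```

In practice this means as_tensor() only avoids the copy when the input is already a NumPy array (or tensor) with a matching dtype and device.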