## Code Snippets for Natural Language Processing

### 0. Contents

- 1 One-hot representation with scikit-learn

### 1. Creating one-hot representation with scikit-learn

In [21]:
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?"
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
vectorizer.get_feature_names_out()


array(['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third',
       'this'], dtype=object)

In [23]:
print(X.toarray())

[[0 1 1 1 0 0 1 0 1]
 [0 2 0 1 0 1 1 0 1]
 [1 0 0 1 1 0 1 1 1]
 [0 1 1 1 0 0 1 0 1]]


### 2. Creating Tensors

In [6]:
def describe(x):
    """Print the type, shape, and values of the tensor x."""
    print("Type: {}".format(x.type()))
    print("Shape/size: {}".format(x.shape))
    print("Values: \n {}".format(x))

In [10]:
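import torch
# torch.Tensor(2, 3) allocates a 2x3 tensor without initializing it,
# so the values printed are whatever happened to be in memory
describe(torch.Tensor(2, 3))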

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
 tensor([[6.2449e-39, 4.9592e-39, 4.9592e-39],
        [6.0612e-39, 4.2246e-39, 1.0286e-38]])


In [11]:
import torch
describe(torch.rand(2, 3))   # sample from the uniform distribution on [0, 1)
describe(torch.randn(2, 3))  # sample from the standard normal distribution

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
 tensor([[0.6591, 0.1082, 0.0849],
        [0.7537, 0.0412, 0.3315]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values: 
 tensor([[-1.2704,  0.8241,  0.4256],
        [ 0.3054, -0.3068,  0.2641]])


We can also create tensors filled with a single scalar value. Built-in functions create tensors of zeros or ones, and the fill_() method fills an existing tensor with a specific value.

Any PyTorch method whose name ends with an underscore (_) is an in-place operation; that is, it modifies the tensor's contents without creating a new object.

In [19]:
# Creating a filled tensor 

import torch 
# describe(torch.zeros(2,3))
x = torch.ones(2,3)
# describe(x)
x.fill_(5)
# describe(x)

tensor([[5., 5., 5.],
        [5., 5., 5.]])
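
The in-place convention described above can be checked by comparing a method with its underscore counterpart. This is a small sketch using `add()` and `add_()`, chosen here only for illustration; the names `y` and `z` are not from the original notebook.

```python
import torch

x = torch.ones(2, 3)

# Out-of-place: add() returns a new tensor and leaves x unchanged
y = x.add(1)
print(x)  # still all ones
print(y)  # all twos

# In-place: add_() modifies x itself and returns the same object
z = x.add_(1)
print(z is x)  # True
print(x)       # now all twos
```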

In [17]:
# Creating and initializing a tensor from lists

x = torch.tensor([[1,2,3],
                  [4,5,6]])

In [22]:
# We can also create a tensor from a NumPy array. Note that in this case the type
# is torch.DoubleTensor rather than torch.FloatTensor, because the array is float64

import numpy as np
npy = np.random.rand(2, 3)
describe(torch.from_numpy(npy))

Type: torch.DoubleTensor
Shape/size: torch.Size([2, 3])
Values: 
 tensor([[0.9334, 0.9989, 0.0594],
        [0.4688, 0.8958, 0.8736]], dtype=torch.float64)
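
If a `torch.FloatTensor` is needed instead, one option is to cast after conversion. The sketch below also illustrates that `torch.from_numpy()` shares memory with the source array, while the cast to a different type produces a copy. The names `shared` and `as_float` are illustrative.

```python
import numpy as np
import torch

npy = np.random.rand(2, 3)                # NumPy defaults to float64

shared = torch.from_numpy(npy)            # torch.float64, shares memory with npy
as_float = torch.from_numpy(npy).float()  # torch.float32, a separate copy

# Writing to the NumPy array is visible through the shared tensor only
npy[0, 0] = 42.0
print(shared.dtype, shared[0, 0].item())
print(as_float.dtype, as_float[0, 0].item())
```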


You can set a tensor's type (float, long, double, etc.) at initialization 
or change it later using one of the typecasting methods. 
There are two ways to specify the type at initialization: either call 
the constructor of a specific tensor type, such as FloatTensor or LongTensor, or 
pass the dtype argument to torch.tensor().

In [26]:
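x = torch.FloatTensor([[1, 2, 3], [4, 5, 6]])  # type set by the constructor
x = x.long()                                   # cast to a LongTensor (int64)
x = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.int64)  # type set via dtype
x = x.float()                                  # cast to a FloatTensor (float32)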
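
To confirm which type a tensor ended up with, its `dtype` attribute can be inspected. The following sketch assumes PyTorch's default settings (the default floating-point type has not been changed); the variable names are illustrative.

```python
import torch

# When dtype is omitted, torch.tensor() infers it from the data
ints = torch.tensor([[1, 2, 3], [4, 5, 6]])
floats = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(ints.dtype)    # torch.int64
print(floats.dtype)  # torch.float32 under default settings

# Casting returns a new tensor; the original keeps its dtype
as_float = ints.float()
as_double = ints.to(torch.float64)
print(ints.dtype, as_float.dtype, as_double.dtype)
```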

We use the shape property or the size() method of a tensor object to get the 
size of each of its dimensions. 
The two ways of accessing these sizes are mostly synonymous. 
Inspecting the shape of a tensor is an indispensable tool in debugging PyTorch code.
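
As a quick illustration of the two access patterns, the sketch below compares `shape` and `size()` on a small tensor. The names `rows`, `cols`, `n_rows`, and `n_cols` are illustrative.

```python
import torch

x = torch.zeros(2, 3)

print(x.shape)              # torch.Size([2, 3])
print(x.size())             # torch.Size([2, 3])
print(x.shape == x.size())  # True

# A single dimension can be read through either interface
rows = x.size(0)
cols = x.shape[1]

# torch.Size behaves like a tuple, so it can be unpacked
n_rows, n_cols = x.shape
print(rows, cols, n_rows, n_cols)
```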