In [1]:
import torch
from torch import nn 

In [2]:
vocab_size = 10

In [103]:
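class RNN(nn.Module):
    """Elman-style recurrent cell.

    h_t      = sigmoid(Wx(x_t) + Wh(h_{t-1}))
    logits_t = Wy(h_t)
    (tanh is the more common hidden nonlinearity; sigmoid is kept here.)
    """

    def __init__(self, dim, output_size):
        super().__init__()
        self.dim = dim
        self.Wx = nn.Linear(dim, dim)           # input-to-hidden
        self.Wh = nn.Linear(dim, dim)           # hidden-to-hidden
        self.sigmoid = nn.Sigmoid()
        self.Wy = nn.Linear(dim, output_size)   # hidden-to-logits

    def hidden_state(self):
        # Initial hidden state h_0 = 0, shape (1, dim)
        return torch.zeros(1, self.dim)

    def forward(self, x, h):
        h = self.sigmoid(self.Wx(x) + self.Wh(h))
        logits = self.Wy(h)
        return h, logits

In [104]:
class Seq2Seq(nn.Module):
    """Character-level language model: embeds each token and runs the RNN
    cell over the sequence one step at a time (despite the name, there is
    no separate encoder and decoder)."""

    def __init__(self, dim, output_size):
        super().__init__()
        self.rnn = RNN(dim, output_size)
        self.embedding = nn.Embedding(output_size, dim)   # one vector per character

    def forward(self, x):
        # x: 1-D tensor of token ids, shape (seq_len,)
        h = self.rnn.hidden_state()
        h_seq = []
        logits_seq = []
        for xi in x:
            xi = self.embedding(xi)          # (dim,)
            h, logits = self.rnn(xi, h)      # h: (1, dim), logits: (1, output_size)
            h_seq.append(h)
            logits_seq.append(logits)
        h_seq = torch.cat(h_seq)             # (seq_len, dim)
        logits_seq = torch.cat(logits_seq)   # (seq_len, output_size)
        return h_seq, logits_seq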
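Seq2Seq.forward handles one 1-D sequence per call, so the training loop runs it once per window. A minimal sketch of a batched variant, assuming the same sigmoid recurrence and linear readout: it embeds a (batch, seq_len) tensor of ids, steps through time once for all rows, and returns (batch, seq_len, vocab) logits. The class name `BatchedCharRNN` and the random ids are illustrative stand-ins, not part of the original notebook.

```python
import torch
from torch import nn


class BatchedCharRNN(nn.Module):
    """Hypothetical batched variant of RNN + Seq2Seq: same sigmoid
    recurrence and linear readout, but one time loop for a whole batch."""

    def __init__(self, dim, vocab_size):
        super().__init__()
        self.dim = dim
        self.embedding = nn.Embedding(vocab_size, dim)
        self.Wx = nn.Linear(dim, dim)
        self.Wh = nn.Linear(dim, dim)
        self.Wy = nn.Linear(dim, vocab_size)

    def forward(self, x):
        # x: (batch, seq_len) LongTensor of token ids
        batch, seq_len = x.shape
        emb = self.embedding(x)                              # (batch, seq_len, dim)
        h = torch.zeros(batch, self.dim, device=x.device)    # h_0 = 0
        hs = []
        for t in range(seq_len):
            h = torch.sigmoid(self.Wx(emb[:, t]) + self.Wh(h))  # (batch, dim)
            hs.append(h)
        h_seq = torch.stack(hs, dim=1)                       # (batch, seq_len, dim)
        return h_seq, self.Wy(h_seq)                         # logits: (batch, seq_len, vocab)


n_chars = 73
batched_model = BatchedCharRNN(dim=64, vocab_size=n_chars)
xb = torch.randint(0, n_chars, (8, 64))   # 8 random stand-in windows of 64 ids
yb = torch.randint(0, n_chars, (8, 64))
h_seq, logits = batched_model(xb)
# CrossEntropyLoss wants (N, C) logits and (N,) targets, so flatten batch and time.
loss = nn.CrossEntropyLoss()(logits.reshape(-1, n_chars), yb.reshape(-1))
```

Real inputs and targets could be built by stacking the `x` and `y` lists from several `get_random_x_y` calls into two (batch, seq_len) tensors. Flattening batch and time is what lets a single `nn.CrossEntropyLoss` call score every position at once.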

In [105]:

text = """First, should you want to get a PhD? I was in a fortunate position of knowing since young age that I really wanted a PhD. Unfortunately it wasn’t for any very well-thought-through considerations: First, I really liked school and learning things and I wanted to learn as much as possible, and second, I really wanted to be like Gordon Freeman from the game Half-Life (who has a PhD from MIT in theoretical physics). I loved that game. But what if you’re more sensible in making your life’s decisions? Should you want to do a PhD? There’s a very nice Quora thread and in the summary of considerations that follows I’ll borrow/restate several from Justin/Ben/others there. I’ll assume that the second option you are considering is joining a medium-large company (which is likely most common). Ask yourself if you find the following properties appealing:

Freedom. A PhD will offer you a lot of freedom in the topics you wish to pursue and learn about. You’re in charge. Of course, you’ll have an adviser who will impose some constraints but in general you’ll have much more freedom than you might find elsewhere.

Ownership. The research you produce will be yours as an individual. Your accomplishments will have your name attached to them. In contrast, it is much more common to “blend in” inside a larger company. A common feeling here is becoming a “cog in a wheel”.

Exclusivity. There are very few people who make it to the top PhD programs. You’d be joining a group of a few hundred distinguished individuals in contrast to a few tens of thousands (?) that will join some company.

Status. Regardless of whether it should be or not, working towards and eventually getting a PhD degree is culturally revered and recognized as an impressive achievement. You also get to be a Doctor; that’s awesome.

Personal freedom. As a PhD student you’re your own boss. Want to sleep in today? Sure. Want to skip a day and go on a vacation? Sure. All that matters is your final output and no one will force you to clock in from 9am to 5pm. Of course, some advisers might be more or less flexible about it and some companies might be as well, but it’s a true first order statement.

Maximizing future choice. Joining a PhD program doesn’t close any doors or eliminate future employment/lifestyle options. You can go one way (PhD -> anywhere else) but not the other (anywhere else -> PhD -> academia/research; it is statistically less likely). Additionally (although this might be quite specific to applied ML), you’re strictly more hirable as a PhD graduate or even as a PhD dropout and many companies might be willing to put you in a more interesting position or with a higher starting salary. More generally, maximizing choice for the future you is a good heuristic to follow.

Maximizing variance. You’re young and there’s really no need to rush. Once you graduate from a PhD you can spend the next ~50 years of your life in some company. Opt for more variance in your experiences.

Personal growth. PhD is an intense experience of rapid growth (you learn a lot) and personal self-discovery (you’ll become a master of managing your own psychology). PhD programs (especially if you can make it into a good one) also offer a high density of exceptionally bright people who will become your best friends forever.

Expertise. PhD is probably your only opportunity in life to really drill deep into a topic and become a recognized leading expert in the world at something. You’re exploring the edge of our knowledge as a species, without the burden of lesser distractions or constraints. There’s something beautiful about that and if you disagree, it could be a sign that PhD is not for you.

The disclaimer. I wanted to also add a few words on some of the potential downsides and failure modes. The PhD is a very specific kind of experience that deserves a large disclaimer. You will inevitably find yourself working very hard (especially before paper deadlines). You need to be okay with the suffering and have enough mental stamina and determination to deal with the pressure. At some points you will lose track of what day of the week it is and go on a diet of leftover food from the microkitchens. You’ll sit exhausted and alone in the lab on a beautiful, sunny Saturday scrolling through Facebook pictures of your friends having fun on exotic trips, paid for by their 5-10x larger salaries. You will have to throw away 3 months of your work while somehow keeping your mental health intact. You’ll struggle with the realization that months of your work were spent on a paper with a few citations while your friends do exciting startups with TechCrunch articles or push products to millions of people. You’ll experience identity crises during which you’ll question your life decisions and wonder what you’re doing with some of the best years of your life. As a result, you should be quite certain that you can thrive in an unstructured environment in the pursuit research and discovery for science. If you’re unsure you should lean slightly negative by default. Ideally you should consider getting a taste of research as an undergraduate on a summer research program before before you decide to commit. In fact, one of the primary reasons that research experience is so desirable during the PhD hiring process is not the research itself, but the fact that the student is more likely to know what they’re getting themselves into.

I should clarify explicitly that this post is not about convincing anyone to do a PhD, I’ve merely tried to enumerate some of the common considerations above. The majority of this post focuses on some tips/tricks for navigating the experience once if you decide to go for it (which we’ll see shortly, below).

Lastly, as a random thought I heard it said that you should only do a PhD if you want to go into academia. In light of all of the above I’d argue that a PhD has strong intrinsic value - it’s an end by itself, not just a means to some end (e.g. academic job).

Getting into a PhD program: references, references, references. Great, you’ve decided to go for it. Now how do you get into a good PhD program? The first order approximation is quite simple - by far most important component are strong reference letters. The ideal scenario is that a well-known professor writes you a letter along the lines of: “Blah is in top 5 of students I’ve ever worked with. She takes initiative, comes up with her own ideas, and gets them to work.” The worst letter is along the lines of: “Blah took my class. She did well.” A research publication under your belt from a summer research program is a very strong bonus, but not absolutely required provided you have strong letters. In particular note: grades are quite irrelevant but you generally don’t want them to be too low. This was not obvious to me as an undergrad and I spent a lot of energy on getting good grades. This time should have instead been directed towards research (or at the very least personal projects), as much and as early as possible, and if possible under supervision of multiple people (you’ll need 3+ letters!). As a last point, what won’t help you too much is pestering your potential advisers out of the blue. They are often incredibly busy people and if you try to approach them too aggressively in an effort to impress them somehow in conferences or over email this may agitate them.

Picking the school. Once you get into some PhD programs, how do you pick the school? It’s easy, join Stanford! Just kidding. More seriously, your dream school should 1) be a top school (not because it looks good on your resume/CV but because of feedback loops; top schools attract other top people, many of whom you will get to know and work with) 2) have a few potential advisers you would want to work with. I really do mean the “few” part - this is very important and provides a safety cushion for you if things don’t work out with your top choice for any one of hundreds of reasons - things in many cases outside of your control, e.g. your dream professor leaves, moves, or spontaneously disappears, and 3) be in a good environment physically. I don’t think new admits appreciate this enough: you will spend 5+ years of your really good years living near the school campus. Trust me, this is a long time and your life will consist of much more than just research."""

In [106]:
chars = sorted(set(text))   # sorted, so the id assignment is the same in every session

In [107]:
char2id = {c:i for i,c in enumerate(chars)}
id2char = {i:c for i,c in enumerate(chars)}
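The two dictionaries are inverses of each other, so encoding and decoding can be wrapped in small helpers. A minimal sketch, assuming a sorted character order (iteration order of a `set` of strings depends on hash randomization and can change between Python sessions). The helper names and the toy corpus `"hello world"` are hypothetical; they are chosen so they do not overwrite the notebook's `chars`, `char2id`, or `tokens`.

```python
def build_vocab(text):
    """Return (char2id, id2char) with a deterministic, sorted character order."""
    chars = sorted(set(text))
    char2id = {c: i for i, c in enumerate(chars)}
    id2char = {i: c for i, c in enumerate(chars)}
    return char2id, id2char


def encode(s, char2id):
    """Map a string to a list of integer token ids."""
    return [char2id[c] for c in s]


def decode(ids, id2char):
    """Map a list of token ids back to a string."""
    return ''.join(id2char[i] for i in ids)


demo_c2i, demo_i2c = build_vocab("hello world")
demo_ids = encode("hello", demo_c2i)
```

Because the ids depend only on the sorted characters, a checkpoint trained in one session can be decoded with the same mapping in another.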

In [108]:
tokens = [char2id[c] for c in text]

In [109]:
import random

In [110]:
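def get_random_x_y(tokens, seq_len):
    # Pick a window of seq_len + 1 tokens; randint is inclusive, so the last
    # valid start index is len(tokens) - seq_len - 1.
    i = random.randint(0, len(tokens) - seq_len - 1)
    seq = tokens[i:i + seq_len + 1]
    x = seq[:-1]   # inputs
    y = seq[1:]    # targets: the same window shifted one character ahead
    return x, y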

In [112]:
x = torch.zeros((1,1,3))

In [113]:
y = torch.tensor([5], dtype=torch.long)

In [114]:
y.shape

torch.Size([1])

In [115]:
vocab_size = len(id2char)

In [116]:
vocab_size

73

In [117]:
criterion = nn.CrossEntropyLoss()
model = Seq2Seq(64, vocab_size)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
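`nn.CrossEntropyLoss` takes raw logits of shape (N, C) and integer targets of shape (N,), applies log-softmax, and averages the negative log-likelihood over the N rows; here N is the window length and C is `vocab_size`. A pure-Python sketch of the per-row computation (the function name is illustrative) also gives a reference point for the logged values: a model whose logits are all equal scores ln 73 ≈ 4.29 nats per character.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one example: -log softmax(logits)[target], in nats."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# All-equal logits over 73 characters: every character gets probability 1/73.
uniform_loss = cross_entropy([0.0] * 73, target=5)
```

A confident, correct prediction drives the loss toward 0, for example `cross_entropy([10.0, 0.0, 0.0], 0)` is well under 0.001.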

In [132]:
x,y = get_random_x_y(tokens,15)
x = torch.tensor(x)
y = torch.tensor(y)

In [119]:
x,y

(tensor([13, 18,  9, 42, 21, 55, 16, 18, 59, 42, 25, 53, 42, 58, 23]),
 tensor([18,  9, 42, 21, 55, 16, 18, 59, 42, 25, 53, 42, 58, 23, 15]))

In [145]:
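seq_len = 64
for epoch in range(5000):
    count = 0
    loss = 0
    # Accumulate the loss over len(text) // seq_len random windows,
    # then take a single optimizer step on their average.
    for _ in range(len(text) // seq_len):
        x, y = get_random_x_y(tokens, seq_len)
        x = torch.tensor(x)
        y = torch.tensor(y)
        h, logits = model(x)               # logits: (seq_len, vocab_size)
        loss += criterion(logits, y)
        count += 1
    loss = loss / count
    print(loss)
    optimizer.zero_grad()
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), 1)   # rescale so the total gradient norm is at most 1
    optimizer.step()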

tensor(2.0763, grad_fn=<DivBackward0>)
tensor(2.0773, grad_fn=<DivBackward0>)
tensor(2.0701, grad_fn=<DivBackward0>)
tensor(2.0827, grad_fn=<DivBackward0>)
tensor(2.0901, grad_fn=<DivBackward0>)
tensor(2.0812, grad_fn=<DivBackward0>)
tensor(2.0719, grad_fn=<DivBackward0>)
tensor(2.0828, grad_fn=<DivBackward0>)
tensor(2.0753, grad_fn=<DivBackward0>)
tensor(2.0783, grad_fn=<DivBackward0>)
tensor(2.0897, grad_fn=<DivBackward0>)
tensor(2.0801, grad_fn=<DivBackward0>)
tensor(2.0717, grad_fn=<DivBackward0>)
tensor(2.0616, grad_fn=<DivBackward0>)
tensor(2.0470, grad_fn=<DivBackward0>)
tensor(2.0663, grad_fn=<DivBackward0>)
tensor(2.0518, grad_fn=<DivBackward0>)
tensor(2.0627, grad_fn=<DivBackward0>)
tensor(2.0564, grad_fn=<DivBackward0>)
tensor(2.0542, grad_fn=<DivBackward0>)
tensor(2.0791, grad_fn=<DivBackward0>)
tensor(2.0812, grad_fn=<DivBackward0>)
tensor(2.0645, grad_fn=<DivBackward0>)
tensor(2.0915, grad_fn=<DivBackward0>)
tensor(2.0955, grad_fn=<DivBackward0>)
tensor(2.0571, grad_fn=<DivBackward0>)

tensor(2.0332, grad_fn=<DivBackward0>)
tensor(2.0477, grad_fn=<DivBackward0>)
tensor(2.0011, grad_fn=<DivBackward0>)
tensor(2.0240, grad_fn=<DivBackward0>)
tensor(2.0286, grad_fn=<DivBackward0>)
tensor(2.0290, grad_fn=<DivBackward0>)
tensor(2.0607, grad_fn=<DivBackward0>)
tensor(2.0205, grad_fn=<DivBackward0>)
tensor(2.0292, grad_fn=<DivBackward0>)
tensor(2.0700, grad_fn=<DivBackward0>)
tensor(2.0489, grad_fn=<DivBackward0>)
tensor(2.0304, grad_fn=<DivBackward0>)
tensor(2.0477, grad_fn=<DivBackward0>)
tensor(2.0241, grad_fn=<DivBackward0>)
tensor(2.0223, grad_fn=<DivBackward0>)
tensor(2.0420, grad_fn=<DivBackward0>)
tensor(2.0364, grad_fn=<DivBackward0>)
tensor(2.0405, grad_fn=<DivBackward0>)
tensor(2.0508, grad_fn=<DivBackward0>)
tensor(2.0391, grad_fn=<DivBackward0>)
tensor(2.0619, grad_fn=<DivBackward0>)
tensor(2.0318, grad_fn=<DivBackward0>)
tensor(2.0142, grad_fn=<DivBackward0>)
tensor(2.0498, grad_fn=<DivBackward0>)
tensor(2.0471, grad_fn=<DivBackward0>)
tensor(2.0353, grad_fn=<DivBackward0>)

tensor(2.0079, grad_fn=<DivBackward0>)
tensor(1.9914, grad_fn=<DivBackward0>)
tensor(2.0256, grad_fn=<DivBackward0>)
tensor(1.9777, grad_fn=<DivBackward0>)
tensor(2.0098, grad_fn=<DivBackward0>)
tensor(2.0180, grad_fn=<DivBackward0>)
tensor(1.9920, grad_fn=<DivBackward0>)
tensor(2.0073, grad_fn=<DivBackward0>)
tensor(2.0018, grad_fn=<DivBackward0>)
tensor(1.9806, grad_fn=<DivBackward0>)
tensor(1.9977, grad_fn=<DivBackward0>)
tensor(2.0199, grad_fn=<DivBackward0>)
tensor(2.0021, grad_fn=<DivBackward0>)
tensor(1.9845, grad_fn=<DivBackward0>)
tensor(1.9796, grad_fn=<DivBackward0>)
tensor(2.0065, grad_fn=<DivBackward0>)
tensor(1.9934, grad_fn=<DivBackward0>)
tensor(2.0153, grad_fn=<DivBackward0>)
tensor(1.9880, grad_fn=<DivBackward0>)
tensor(1.9935, grad_fn=<DivBackward0>)
tensor(1.9859, grad_fn=<DivBackward0>)
tensor(2.0095, grad_fn=<DivBackward0>)
tensor(2.0078, grad_fn=<DivBackward0>)
tensor(1.9934, grad_fn=<DivBackward0>)
tensor(1.9867, grad_fn=<DivBackward0>)
tensor(2.0008, grad_fn=<DivBackward0>)

tensor(1.9674, grad_fn=<DivBackward0>)
tensor(1.9764, grad_fn=<DivBackward0>)
tensor(1.9583, grad_fn=<DivBackward0>)
tensor(1.9539, grad_fn=<DivBackward0>)
tensor(1.9570, grad_fn=<DivBackward0>)
tensor(1.9652, grad_fn=<DivBackward0>)
tensor(1.9581, grad_fn=<DivBackward0>)
tensor(1.9541, grad_fn=<DivBackward0>)
tensor(1.9682, grad_fn=<DivBackward0>)
tensor(1.9417, grad_fn=<DivBackward0>)
tensor(1.9538, grad_fn=<DivBackward0>)
tensor(1.9377, grad_fn=<DivBackward0>)
tensor(1.9720, grad_fn=<DivBackward0>)
tensor(1.9738, grad_fn=<DivBackward0>)
tensor(1.9383, grad_fn=<DivBackward0>)
tensor(1.9456, grad_fn=<DivBackward0>)
tensor(1.9455, grad_fn=<DivBackward0>)
tensor(1.9533, grad_fn=<DivBackward0>)
tensor(1.9920, grad_fn=<DivBackward0>)
tensor(1.9683, grad_fn=<DivBackward0>)
tensor(1.9465, grad_fn=<DivBackward0>)
tensor(1.9589, grad_fn=<DivBackward0>)
tensor(1.9812, grad_fn=<DivBackward0>)
tensor(1.9739, grad_fn=<DivBackward0>)
tensor(1.9688, grad_fn=<DivBackward0>)
tensor(1.9396, grad_fn=<DivBackward0>)

tensor(1.9167, grad_fn=<DivBackward0>)
tensor(1.9365, grad_fn=<DivBackward0>)
tensor(1.9203, grad_fn=<DivBackward0>)
tensor(1.9238, grad_fn=<DivBackward0>)
tensor(1.9517, grad_fn=<DivBackward0>)
tensor(1.9321, grad_fn=<DivBackward0>)
tensor(1.9257, grad_fn=<DivBackward0>)
tensor(1.9356, grad_fn=<DivBackward0>)
tensor(1.9562, grad_fn=<DivBackward0>)
tensor(1.9083, grad_fn=<DivBackward0>)
tensor(1.9503, grad_fn=<DivBackward0>)
tensor(1.9103, grad_fn=<DivBackward0>)
tensor(1.9137, grad_fn=<DivBackward0>)
tensor(1.9156, grad_fn=<DivBackward0>)
tensor(1.9303, grad_fn=<DivBackward0>)
tensor(1.9397, grad_fn=<DivBackward0>)
tensor(1.9081, grad_fn=<DivBackward0>)
tensor(1.9424, grad_fn=<DivBackward0>)
tensor(1.9395, grad_fn=<DivBackward0>)
tensor(1.9092, grad_fn=<DivBackward0>)
tensor(1.9210, grad_fn=<DivBackward0>)
tensor(1.9406, grad_fn=<DivBackward0>)
tensor(1.9184, grad_fn=<DivBackward0>)
tensor(1.9405, grad_fn=<DivBackward0>)
tensor(1.9414, grad_fn=<DivBackward0>)
tensor(1.9332, grad_fn=<DivBackward0>)

tensor(1.9013, grad_fn=<DivBackward0>)
tensor(1.9058, grad_fn=<DivBackward0>)
tensor(1.8921, grad_fn=<DivBackward0>)
tensor(1.8991, grad_fn=<DivBackward0>)
tensor(1.8993, grad_fn=<DivBackward0>)
tensor(1.9161, grad_fn=<DivBackward0>)
tensor(1.8957, grad_fn=<DivBackward0>)
tensor(1.8857, grad_fn=<DivBackward0>)
tensor(1.9113, grad_fn=<DivBackward0>)
tensor(1.9067, grad_fn=<DivBackward0>)
tensor(1.8942, grad_fn=<DivBackward0>)
tensor(1.9180, grad_fn=<DivBackward0>)
tensor(1.9067, grad_fn=<DivBackward0>)
tensor(1.8991, grad_fn=<DivBackward0>)
tensor(1.8951, grad_fn=<DivBackward0>)
tensor(1.8865, grad_fn=<DivBackward0>)
tensor(1.9007, grad_fn=<DivBackward0>)
tensor(1.8976, grad_fn=<DivBackward0>)
tensor(1.8992, grad_fn=<DivBackward0>)
tensor(1.8838, grad_fn=<DivBackward0>)
tensor(1.8915, grad_fn=<DivBackward0>)
tensor(1.8970, grad_fn=<DivBackward0>)
tensor(1.8703, grad_fn=<DivBackward0>)
tensor(1.9032, grad_fn=<DivBackward0>)
tensor(1.8988, grad_fn=<DivBackward0>)
tensor(1.9014, grad_fn=<DivBackward0>)

tensor(1.8441, grad_fn=<DivBackward0>)
tensor(1.8552, grad_fn=<DivBackward0>)
tensor(1.8713, grad_fn=<DivBackward0>)
tensor(1.8630, grad_fn=<DivBackward0>)
tensor(1.8621, grad_fn=<DivBackward0>)
tensor(1.8664, grad_fn=<DivBackward0>)
tensor(1.8584, grad_fn=<DivBackward0>)
tensor(1.8934, grad_fn=<DivBackward0>)
tensor(1.8730, grad_fn=<DivBackward0>)
tensor(1.8673, grad_fn=<DivBackward0>)
tensor(1.8691, grad_fn=<DivBackward0>)
tensor(1.8766, grad_fn=<DivBackward0>)
tensor(1.8638, grad_fn=<DivBackward0>)
tensor(1.8491, grad_fn=<DivBackward0>)
tensor(1.8742, grad_fn=<DivBackward0>)
tensor(1.8956, grad_fn=<DivBackward0>)
tensor(1.8587, grad_fn=<DivBackward0>)
tensor(1.8373, grad_fn=<DivBackward0>)
tensor(1.8600, grad_fn=<DivBackward0>)
tensor(1.8640, grad_fn=<DivBackward0>)
tensor(1.8719, grad_fn=<DivBackward0>)
tensor(1.8457, grad_fn=<DivBackward0>)
tensor(1.8498, grad_fn=<DivBackward0>)
tensor(1.8523, grad_fn=<DivBackward0>)
tensor(1.8383, grad_fn=<DivBackward0>)
tensor(1.8449, grad_fn=<DivBackward0>)

tensor(1.8340, grad_fn=<DivBackward0>)
tensor(1.8566, grad_fn=<DivBackward0>)
tensor(1.8357, grad_fn=<DivBackward0>)
tensor(1.8542, grad_fn=<DivBackward0>)
tensor(1.8544, grad_fn=<DivBackward0>)
tensor(1.8352, grad_fn=<DivBackward0>)
tensor(1.8395, grad_fn=<DivBackward0>)
tensor(1.8409, grad_fn=<DivBackward0>)
tensor(1.8422, grad_fn=<DivBackward0>)
tensor(1.8294, grad_fn=<DivBackward0>)
tensor(1.8397, grad_fn=<DivBackward0>)
tensor(1.8235, grad_fn=<DivBackward0>)
tensor(1.8319, grad_fn=<DivBackward0>)
tensor(1.8644, grad_fn=<DivBackward0>)
tensor(1.8465, grad_fn=<DivBackward0>)
tensor(1.8679, grad_fn=<DivBackward0>)
tensor(1.8395, grad_fn=<DivBackward0>)
tensor(1.8343, grad_fn=<DivBackward0>)
tensor(1.8268, grad_fn=<DivBackward0>)
tensor(1.8135, grad_fn=<DivBackward0>)
tensor(1.8387, grad_fn=<DivBackward0>)
tensor(1.8474, grad_fn=<DivBackward0>)
tensor(1.8308, grad_fn=<DivBackward0>)
tensor(1.8347, grad_fn=<DivBackward0>)
tensor(1.8232, grad_fn=<DivBackward0>)
tensor(1.8250, grad_fn=<DivBackward0>)

tensor(1.8245, grad_fn=<DivBackward0>)
tensor(1.7938, grad_fn=<DivBackward0>)
tensor(1.8099, grad_fn=<DivBackward0>)
tensor(1.8041, grad_fn=<DivBackward0>)
tensor(1.8170, grad_fn=<DivBackward0>)
tensor(1.8319, grad_fn=<DivBackward0>)
tensor(1.8172, grad_fn=<DivBackward0>)
tensor(1.8234, grad_fn=<DivBackward0>)
tensor(1.7941, grad_fn=<DivBackward0>)
tensor(1.7911, grad_fn=<DivBackward0>)
tensor(1.8236, grad_fn=<DivBackward0>)
tensor(1.7810, grad_fn=<DivBackward0>)
tensor(1.8016, grad_fn=<DivBackward0>)
tensor(1.8109, grad_fn=<DivBackward0>)
tensor(1.8010, grad_fn=<DivBackward0>)
tensor(1.8586, grad_fn=<DivBackward0>)
tensor(1.7746, grad_fn=<DivBackward0>)
tensor(1.7942, grad_fn=<DivBackward0>)
tensor(1.8228, grad_fn=<DivBackward0>)
tensor(1.8127, grad_fn=<DivBackward0>)
tensor(1.8015, grad_fn=<DivBackward0>)
tensor(1.8230, grad_fn=<DivBackward0>)
tensor(1.8085, grad_fn=<DivBackward0>)
tensor(1.7957, grad_fn=<DivBackward0>)
tensor(1.8278, grad_fn=<DivBackward0>)
tensor(1.7859, grad_fn=<DivBackward0>)

tensor(1.7936, grad_fn=<DivBackward0>)
tensor(1.7886, grad_fn=<DivBackward0>)
tensor(1.7849, grad_fn=<DivBackward0>)
tensor(1.8279, grad_fn=<DivBackward0>)
tensor(1.7893, grad_fn=<DivBackward0>)
tensor(1.7402, grad_fn=<DivBackward0>)
tensor(1.7642, grad_fn=<DivBackward0>)
tensor(1.8055, grad_fn=<DivBackward0>)
tensor(1.7719, grad_fn=<DivBackward0>)
tensor(1.8028, grad_fn=<DivBackward0>)
tensor(1.7799, grad_fn=<DivBackward0>)
tensor(1.7820, grad_fn=<DivBackward0>)
tensor(1.8175, grad_fn=<DivBackward0>)
tensor(1.7972, grad_fn=<DivBackward0>)
tensor(1.8161, grad_fn=<DivBackward0>)
tensor(1.7783, grad_fn=<DivBackward0>)
tensor(1.8114, grad_fn=<DivBackward0>)
tensor(1.7863, grad_fn=<DivBackward0>)
tensor(1.7779, grad_fn=<DivBackward0>)
tensor(1.7567, grad_fn=<DivBackward0>)
tensor(1.7951, grad_fn=<DivBackward0>)
tensor(1.7891, grad_fn=<DivBackward0>)
tensor(1.7480, grad_fn=<DivBackward0>)
tensor(1.7651, grad_fn=<DivBackward0>)
tensor(1.7841, grad_fn=<DivBackward0>)
tensor(1.7914, grad_fn=<DivBackward0>)

tensor(1.7716, grad_fn=<DivBackward0>)
tensor(1.7969, grad_fn=<DivBackward0>)
tensor(1.7724, grad_fn=<DivBackward0>)
tensor(1.7504, grad_fn=<DivBackward0>)
tensor(1.7537, grad_fn=<DivBackward0>)
tensor(1.7717, grad_fn=<DivBackward0>)
tensor(1.7524, grad_fn=<DivBackward0>)
tensor(1.7620, grad_fn=<DivBackward0>)
tensor(1.7718, grad_fn=<DivBackward0>)
tensor(1.7643, grad_fn=<DivBackward0>)
tensor(1.7760, grad_fn=<DivBackward0>)
tensor(1.7606, grad_fn=<DivBackward0>)
tensor(1.7527, grad_fn=<DivBackward0>)
tensor(1.7497, grad_fn=<DivBackward0>)
tensor(1.7346, grad_fn=<DivBackward0>)
tensor(1.7346, grad_fn=<DivBackward0>)
tensor(1.7681, grad_fn=<DivBackward0>)
tensor(1.7436, grad_fn=<DivBackward0>)
tensor(1.7701, grad_fn=<DivBackward0>)
tensor(1.7557, grad_fn=<DivBackward0>)
tensor(1.7784, grad_fn=<DivBackward0>)
tensor(1.7691, grad_fn=<DivBackward0>)
tensor(1.7538, grad_fn=<DivBackward0>)
tensor(1.7724, grad_fn=<DivBackward0>)
tensor(1.7573, grad_fn=<DivBackward0>)
tensor(1.7561, grad_fn=<DivBackward0>)

tensor(1.7205, grad_fn=<DivBackward0>)
tensor(1.7380, grad_fn=<DivBackward0>)
tensor(1.7140, grad_fn=<DivBackward0>)
tensor(1.7418, grad_fn=<DivBackward0>)
tensor(1.7494, grad_fn=<DivBackward0>)
tensor(1.7430, grad_fn=<DivBackward0>)
tensor(1.7510, grad_fn=<DivBackward0>)
tensor(1.7401, grad_fn=<DivBackward0>)
tensor(1.7324, grad_fn=<DivBackward0>)
tensor(1.7555, grad_fn=<DivBackward0>)
tensor(1.7126, grad_fn=<DivBackward0>)
tensor(1.7092, grad_fn=<DivBackward0>)
tensor(1.7391, grad_fn=<DivBackward0>)
tensor(1.7189, grad_fn=<DivBackward0>)
tensor(1.7103, grad_fn=<DivBackward0>)
tensor(1.7287, grad_fn=<DivBackward0>)
tensor(1.7496, grad_fn=<DivBackward0>)
tensor(1.7145, grad_fn=<DivBackward0>)
tensor(1.7322, grad_fn=<DivBackward0>)
tensor(1.7246, grad_fn=<DivBackward0>)
tensor(1.7104, grad_fn=<DivBackward0>)
tensor(1.7119, grad_fn=<DivBackward0>)
tensor(1.7078, grad_fn=<DivBackward0>)
tensor(1.7182, grad_fn=<DivBackward0>)
tensor(1.7221, grad_fn=<DivBackward0>)
tensor(1.7254, grad_fn=<DivBackward0>)

tensor(1.7214, grad_fn=<DivBackward0>)
tensor(1.6954, grad_fn=<DivBackward0>)
tensor(1.7120, grad_fn=<DivBackward0>)
tensor(1.7078, grad_fn=<DivBackward0>)
tensor(1.6846, grad_fn=<DivBackward0>)
tensor(1.7087, grad_fn=<DivBackward0>)
tensor(1.7042, grad_fn=<DivBackward0>)
tensor(1.7240, grad_fn=<DivBackward0>)
tensor(1.7316, grad_fn=<DivBackward0>)
tensor(1.7028, grad_fn=<DivBackward0>)
tensor(1.7001, grad_fn=<DivBackward0>)
tensor(1.6794, grad_fn=<DivBackward0>)
tensor(1.6974, grad_fn=<DivBackward0>)
tensor(1.6976, grad_fn=<DivBackward0>)
tensor(1.7120, grad_fn=<DivBackward0>)
tensor(1.7191, grad_fn=<DivBackward0>)
tensor(1.6946, grad_fn=<DivBackward0>)
tensor(1.6975, grad_fn=<DivBackward0>)
tensor(1.6906, grad_fn=<DivBackward0>)
tensor(1.7280, grad_fn=<DivBackward0>)
tensor(1.7100, grad_fn=<DivBackward0>)
tensor(1.7072, grad_fn=<DivBackward0>)
tensor(1.6820, grad_fn=<DivBackward0>)
tensor(1.6924, grad_fn=<DivBackward0>)
tensor(1.7036, grad_fn=<DivBackward0>)
tensor(1.7207, grad_fn=<DivBackward0>)

tensor(1.6632, grad_fn=<DivBackward0>)
tensor(1.6969, grad_fn=<DivBackward0>)
tensor(1.6850, grad_fn=<DivBackward0>)
tensor(1.6968, grad_fn=<DivBackward0>)
tensor(1.6965, grad_fn=<DivBackward0>)
tensor(1.6662, grad_fn=<DivBackward0>)
tensor(1.6773, grad_fn=<DivBackward0>)
tensor(1.6927, grad_fn=<DivBackward0>)
tensor(1.6871, grad_fn=<DivBackward0>)
tensor(1.6630, grad_fn=<DivBackward0>)
tensor(1.6795, grad_fn=<DivBackward0>)
tensor(1.6559, grad_fn=<DivBackward0>)
tensor(1.6810, grad_fn=<DivBackward0>)
tensor(1.7002, grad_fn=<DivBackward0>)
tensor(1.6600, grad_fn=<DivBackward0>)
tensor(1.7018, grad_fn=<DivBackward0>)
tensor(1.6850, grad_fn=<DivBackward0>)
tensor(1.6776, grad_fn=<DivBackward0>)
tensor(1.7002, grad_fn=<DivBackward0>)
tensor(1.6991, grad_fn=<DivBackward0>)
tensor(1.6903, grad_fn=<DivBackward0>)
tensor(1.7154, grad_fn=<DivBackward0>)
tensor(1.7059, grad_fn=<DivBackward0>)
tensor(1.6849, grad_fn=<DivBackward0>)
tensor(1.6993, grad_fn=<DivBackward0>)
tensor(1.6614, grad_fn=<DivBackward0>)

tensor(1.6871, grad_fn=<DivBackward0>)
tensor(1.6423, grad_fn=<DivBackward0>)
tensor(1.6510, grad_fn=<DivBackward0>)
tensor(1.6260, grad_fn=<DivBackward0>)
tensor(1.6669, grad_fn=<DivBackward0>)
tensor(1.6651, grad_fn=<DivBackward0>)
tensor(1.6321, grad_fn=<DivBackward0>)
tensor(1.6523, grad_fn=<DivBackward0>)
tensor(1.6706, grad_fn=<DivBackward0>)
tensor(1.6862, grad_fn=<DivBackward0>)
tensor(1.6707, grad_fn=<DivBackward0>)
tensor(1.6631, grad_fn=<DivBackward0>)
tensor(1.6794, grad_fn=<DivBackward0>)
tensor(1.6662, grad_fn=<DivBackward0>)
tensor(1.6623, grad_fn=<DivBackward0>)
tensor(1.6639, grad_fn=<DivBackward0>)
tensor(1.6315, grad_fn=<DivBackward0>)
tensor(1.6769, grad_fn=<DivBackward0>)
tensor(1.6495, grad_fn=<DivBackward0>)
tensor(1.6989, grad_fn=<DivBackward0>)
tensor(1.6675, grad_fn=<DivBackward0>)
tensor(1.6701, grad_fn=<DivBackward0>)
tensor(1.6259, grad_fn=<DivBackward0>)
tensor(1.6369, grad_fn=<DivBackward0>)
tensor(1.6786, grad_fn=<DivBackward0>)
tensor(1.6844, grad_fn=<DivBackward0>)

tensor(1.6370, grad_fn=<DivBackward0>)
tensor(1.6481, grad_fn=<DivBackward0>)
tensor(1.6269, grad_fn=<DivBackward0>)
tensor(1.6274, grad_fn=<DivBackward0>)
tensor(1.6409, grad_fn=<DivBackward0>)
tensor(1.6575, grad_fn=<DivBackward0>)
tensor(1.6611, grad_fn=<DivBackward0>)
tensor(1.6435, grad_fn=<DivBackward0>)
tensor(1.6398, grad_fn=<DivBackward0>)
tensor(1.6494, grad_fn=<DivBackward0>)
tensor(1.6641, grad_fn=<DivBackward0>)
tensor(1.6655, grad_fn=<DivBackward0>)
tensor(1.6486, grad_fn=<DivBackward0>)
tensor(1.5907, grad_fn=<DivBackward0>)
tensor(1.6112, grad_fn=<DivBackward0>)
tensor(1.6457, grad_fn=<DivBackward0>)
tensor(1.6583, grad_fn=<DivBackward0>)
tensor(1.6252, grad_fn=<DivBackward0>)
tensor(1.6166, grad_fn=<DivBackward0>)
tensor(1.6521, grad_fn=<DivBackward0>)
tensor(1.6649, grad_fn=<DivBackward0>)
tensor(1.6687, grad_fn=<DivBackward0>)
tensor(1.6303, grad_fn=<DivBackward0>)
tensor(1.6327, grad_fn=<DivBackward0>)
tensor(1.6089, grad_fn=<DivBackward0>)
tensor(1.6495, grad_fn=<DivBackward0>)

tensor(1.6456, grad_fn=<DivBackward0>)
tensor(1.6224, grad_fn=<DivBackward0>)
tensor(1.6107, grad_fn=<DivBackward0>)
tensor(1.6422, grad_fn=<DivBackward0>)
tensor(1.6401, grad_fn=<DivBackward0>)
tensor(1.6040, grad_fn=<DivBackward0>)
tensor(1.5982, grad_fn=<DivBackward0>)
tensor(1.6128, grad_fn=<DivBackward0>)
tensor(1.6033, grad_fn=<DivBackward0>)
tensor(1.6250, grad_fn=<DivBackward0>)
tensor(1.6010, grad_fn=<DivBackward0>)
tensor(1.6244, grad_fn=<DivBackward0>)
tensor(1.6329, grad_fn=<DivBackward0>)
tensor(1.6140, grad_fn=<DivBackward0>)
tensor(1.5975, grad_fn=<DivBackward0>)
tensor(1.5948, grad_fn=<DivBackward0>)
tensor(1.6108, grad_fn=<DivBackward0>)
tensor(1.6265, grad_fn=<DivBackward0>)
tensor(1.6004, grad_fn=<DivBackward0>)
tensor(1.6164, grad_fn=<DivBackward0>)
tensor(1.5867, grad_fn=<DivBackward0>)
tensor(1.6284, grad_fn=<DivBackward0>)
tensor(1.6261, grad_fn=<DivBackward0>)
tensor(1.6415, grad_fn=<DivBackward0>)
tensor(1.6161, grad_fn=<DivBackward0>)
tensor(1.6075, grad_fn=<DivBackward0>)

tensor(1.6011, grad_fn=<DivBackward0>)
tensor(1.6030, grad_fn=<DivBackward0>)
tensor(1.6103, grad_fn=<DivBackward0>)
tensor(1.6033, grad_fn=<DivBackward0>)
tensor(1.5865, grad_fn=<DivBackward0>)
tensor(1.5925, grad_fn=<DivBackward0>)
tensor(1.6119, grad_fn=<DivBackward0>)
tensor(1.5911, grad_fn=<DivBackward0>)
tensor(1.5896, grad_fn=<DivBackward0>)
tensor(1.6220, grad_fn=<DivBackward0>)
tensor(1.6116, grad_fn=<DivBackward0>)
tensor(1.5759, grad_fn=<DivBackward0>)
tensor(1.5815, grad_fn=<DivBackward0>)
tensor(1.5974, grad_fn=<DivBackward0>)
tensor(1.5592, grad_fn=<DivBackward0>)
tensor(1.5872, grad_fn=<DivBackward0>)
tensor(1.5960, grad_fn=<DivBackward0>)
tensor(1.6016, grad_fn=<DivBackward0>)
tensor(1.6303, grad_fn=<DivBackward0>)
tensor(1.6223, grad_fn=<DivBackward0>)
tensor(1.5878, grad_fn=<DivBackward0>)
tensor(1.5661, grad_fn=<DivBackward0>)
tensor(1.5977, grad_fn=<DivBackward0>)
tensor(1.6040, grad_fn=<DivBackward0>)
tensor(1.6107, grad_fn=<DivBackward0>)
tensor(1.6025, grad_fn=<DivBackward0>)

tensor(1.5980, grad_fn=<DivBackward0>)
tensor(1.5816, grad_fn=<DivBackward0>)
tensor(1.6051, grad_fn=<DivBackward0>)
tensor(1.5869, grad_fn=<DivBackward0>)
tensor(1.5816, grad_fn=<DivBackward0>)
tensor(1.5674, grad_fn=<DivBackward0>)
tensor(1.6003, grad_fn=<DivBackward0>)
tensor(1.5842, grad_fn=<DivBackward0>)
tensor(1.5827, grad_fn=<DivBackward0>)
tensor(1.5802, grad_fn=<DivBackward0>)
tensor(1.5997, grad_fn=<DivBackward0>)
tensor(1.5782, grad_fn=<DivBackward0>)
tensor(1.5938, grad_fn=<DivBackward0>)
tensor(1.5701, grad_fn=<DivBackward0>)
tensor(1.5493, grad_fn=<DivBackward0>)
tensor(1.6181, grad_fn=<DivBackward0>)
tensor(1.5683, grad_fn=<DivBackward0>)
tensor(1.5910, grad_fn=<DivBackward0>)
tensor(1.5753, grad_fn=<DivBackward0>)
tensor(1.5793, grad_fn=<DivBackward0>)
tensor(1.5830, grad_fn=<DivBackward0>)
tensor(1.5758, grad_fn=<DivBackward0>)
tensor(1.6061, grad_fn=<DivBackward0>)
tensor(1.5838, grad_fn=<DivBackward0>)
tensor(1.5897, grad_fn=<DivBackward0>)
tensor(1.5933, grad_fn=<DivBackward0>)

tensor(1.5428, grad_fn=<DivBackward0>)
tensor(1.5673, grad_fn=<DivBackward0>)
tensor(1.5740, grad_fn=<DivBackward0>)
tensor(1.5611, grad_fn=<DivBackward0>)
tensor(1.5480, grad_fn=<DivBackward0>)
tensor(1.5526, grad_fn=<DivBackward0>)
tensor(1.5625, grad_fn=<DivBackward0>)
tensor(1.5737, grad_fn=<DivBackward0>)
tensor(1.5869, grad_fn=<DivBackward0>)
tensor(1.5776, grad_fn=<DivBackward0>)
tensor(1.5716, grad_fn=<DivBackward0>)
tensor(1.5616, grad_fn=<DivBackward0>)
tensor(1.5767, grad_fn=<DivBackward0>)
tensor(1.5505, grad_fn=<DivBackward0>)
tensor(1.5796, grad_fn=<DivBackward0>)
tensor(1.5670, grad_fn=<DivBackward0>)
tensor(1.5533, grad_fn=<DivBackward0>)
tensor(1.5642, grad_fn=<DivBackward0>)
tensor(1.5583, grad_fn=<DivBackward0>)
tensor(1.5727, grad_fn=<DivBackward0>)
tensor(1.5863, grad_fn=<DivBackward0>)
tensor(1.5122, grad_fn=<DivBackward0>)
tensor(1.5840, grad_fn=<DivBackward0>)
tensor(1.5547, grad_fn=<DivBackward0>)
tensor(1.5724, grad_fn=<DivBackward0>)
tensor(1.5588, grad_fn=<DivBackward0>)

tensor(1.5577, grad_fn=<DivBackward0>)
tensor(1.5458, grad_fn=<DivBackward0>)
tensor(1.5306, grad_fn=<DivBackward0>)
tensor(1.5511, grad_fn=<DivBackward0>)
tensor(1.5463, grad_fn=<DivBackward0>)
tensor(1.5303, grad_fn=<DivBackward0>)
tensor(1.5560, grad_fn=<DivBackward0>)
tensor(1.5597, grad_fn=<DivBackward0>)
tensor(1.5800, grad_fn=<DivBackward0>)
tensor(1.5354, grad_fn=<DivBackward0>)
tensor(1.5532, grad_fn=<DivBackward0>)
tensor(1.5420, grad_fn=<DivBackward0>)
tensor(1.5485, grad_fn=<DivBackward0>)
tensor(1.5505, grad_fn=<DivBackward0>)
tensor(1.5437, grad_fn=<DivBackward0>)
tensor(1.5327, grad_fn=<DivBackward0>)
tensor(1.5369, grad_fn=<DivBackward0>)
tensor(1.5317, grad_fn=<DivBackward0>)
tensor(1.5176, grad_fn=<DivBackward0>)
tensor(1.5536, grad_fn=<DivBackward0>)
tensor(1.5683, grad_fn=<DivBackward0>)
tensor(1.5273, grad_fn=<DivBackward0>)
tensor(1.5344, grad_fn=<DivBackward0>)
tensor(1.5386, grad_fn=<DivBackward0>)
tensor(1.5396, grad_fn=<DivBackward0>)
tensor(1.5287, grad_fn=<DivBackward0>)

tensor(1.5601, grad_fn=<DivBackward0>)
tensor(1.5217, grad_fn=<DivBackward0>)
tensor(1.5182, grad_fn=<DivBackward0>)
tensor(1.5328, grad_fn=<DivBackward0>)
tensor(1.5114, grad_fn=<DivBackward0>)
tensor(1.5049, grad_fn=<DivBackward0>)
tensor(1.5181, grad_fn=<DivBackward0>)
tensor(1.5113, grad_fn=<DivBackward0>)
tensor(1.5132, grad_fn=<DivBackward0>)
tensor(1.5374, grad_fn=<DivBackward0>)
tensor(1.5305, grad_fn=<DivBackward0>)
tensor(1.5171, grad_fn=<DivBackward0>)
tensor(1.5105, grad_fn=<DivBackward0>)
tensor(1.5057, grad_fn=<DivBackward0>)
tensor(1.5297, grad_fn=<DivBackward0>)
tensor(1.5284, grad_fn=<DivBackward0>)
tensor(1.5428, grad_fn=<DivBackward0>)
tensor(1.4974, grad_fn=<DivBackward0>)
tensor(1.5320, grad_fn=<DivBackward0>)
tensor(1.5275, grad_fn=<DivBackward0>)
tensor(1.4980, grad_fn=<DivBackward0>)
tensor(1.5416, grad_fn=<DivBackward0>)
tensor(1.5268, grad_fn=<DivBackward0>)
tensor(1.4971, grad_fn=<DivBackward0>)
tensor(1.5077, grad_fn=<DivBackward0>)
tensor(1.5562, grad_fn=<DivBackward0>)

KeyboardInterrupt: 
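The printed values are mean cross-entropy per character in nats (`nn.CrossEntropyLoss` uses the natural log). Converting them to perplexity or bits per character can make the progress easier to read. A small sketch using the first and last values printed above; the helper names are illustrative. These are training losses on windows drawn from the same text the model is fitted to, so they measure fit rather than generalization.

```python
import math


def perplexity(loss_nats):
    """Effective number of equally likely next characters: exp(loss)."""
    return math.exp(loss_nats)


def bits_per_char(loss_nats):
    """Convert a natural-log cross-entropy to bits."""
    return loss_nats / math.log(2)


first_loss, last_loss = 2.0763, 1.5562  # first and last values in the log above
```

By this measure the model goes from choosing among roughly 8 equally likely characters to roughly 4.7, or from about 3.0 to about 2.2 bits per character.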

In [146]:
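# Save the weights together with the character list, so ids can be decoded
# again in a later session (saving the whole module pickles the class itself).
torch.save({'state_dict': model.state_dict(), 'chars': chars}, '1.pt')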
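A checkpoint is only reusable if the model architecture can be rebuilt and the same character list is available for decoding. A minimal sketch of a save/restore round trip, assuming the checkpoint is a dict holding a `state_dict` and the character list; an in-memory buffer and a tiny `nn.Embedding` stand in for `'1.pt'` and the trained `Seq2Seq`, and the names `tiny`, `vocab`, and `restored` are hypothetical.

```python
import io

import torch
from torch import nn

# Stand-ins for the trained model and its character list.
tiny = nn.Embedding(5, 3)
vocab = ['a', 'b', 'c', 'd', 'e']

buffer = io.BytesIO()  # in-memory "file" keeps the example self-contained
torch.save({'state_dict': tiny.state_dict(), 'chars': vocab}, buffer)
buffer.seek(0)

ckpt = torch.load(buffer)
restored = nn.Embedding(5, 3)                 # rebuild the architecture first...
restored.load_state_dict(ckpt['state_dict'])  # ...then copy the saved weights in
restored_char2id = {c: i for i, c in enumerate(ckpt['chars'])}
```

For the notebook's model, the rebuild step would be `Seq2Seq(64, len(chars))` with the restored character list.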

In [147]:
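model.eval()
with torch.no_grad():   # inference only: no autograd graph needed
    h, logits = model(x)   # x is the last window sampled in the training loop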

In [148]:
''.join(id2char[xi.item()] for xi in x)

'ou should consider getting a taste of research as an undergradua'

In [149]:
''.join(id2char[xi.item()] for xi in logits.argmax(dim=-1))

'nrwhould bomsider iot ing t Phnt rsf teaearch i  a dander ram rt'
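Each argmax output is the model's guess for the character that *follows* the corresponding input position, so the predictions line up with the input shifted by one (that is, with `y`), not with `x` itself. A pure-Python sketch that scores the two strings shown above this way; the function name is illustrative, and the last prediction is skipped because its target is not part of the displayed input.

```python
def next_char_accuracy(inputs, predictions):
    """Fraction of positions where predictions[t] equals inputs[t + 1]."""
    pairs = list(zip(predictions[:-1], inputs[1:]))
    return sum(p == c for p, c in pairs) / len(pairs)


# Copied from the two cell outputs above.
shown_input = 'ou should consider getting a taste of research as an undergradua'
shown_pred = 'nrwhould bomsider iot ing t Phnt rsf teaearch i  a dander ram rt'
acc = next_char_accuracy(shown_input, shown_pred)
```

Aligned this way, a little over half of the greedy guesses match, mostly on word continuations like "hould" and "earch" and on spaces.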