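## Dataset class
`torch.utils.data.Dataset` is the dataset base class, used to load data quickly<br>
A custom dataset class needs to inherit from `Dataset` and implement two methods:<br>
1.`__len__`, which returns the number of elements<br>
2.`__getitem__`, which returns the item at the index passed in.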
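
As a minimal sketch of these two methods, the class below wraps an in-memory list of (label, text) pairs. The name `ToyDataset` and the sample data are made up for illustration and are not part of the original notebook.

```python
from torch.utils.data import Dataset


class ToyDataset(Dataset):
    """Hypothetical in-memory dataset of (label, text) pairs."""

    def __init__(self, samples):
        # samples: a list of (label, text) tuples held in memory
        self.samples = list(samples)

    def __len__(self):
        # Number of elements in the dataset
        return len(self.samples)

    def __getitem__(self, index):
        # Return the element stored at the given index
        return self.samples[index]


toy = ToyDataset([("ham", "See you at 5"), ("spam", "You have won a prize")])
print(len(toy))  # 2
print(toy[1])    # ('spam', 'You have won a prize')
```

Once both methods exist, `len(toy)` and `toy[i]` work like they do on a list, which is all a `DataLoader` needs from a map-style dataset.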

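### Data loading example
Each line holds one complete SMS message, and the label at the start of the line, `ham` or `spam`, marks whether the message is spam.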

In [1]:
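from torch.utils.data import Dataset, DataLoader
import pandas as pd

# Windows-style relative path to the SMS data file
data_path = r"data\SMSSpamCollection.txt"

class MyDataSet(Dataset):
    def __init__(self):
        # Read the file inside a context manager so it is closed afterwards
        with open(data_path, "r", encoding="utf-8") as f:
            # The first four characters hold the label, the rest is the message text
            lines = [[i[:4].strip(), i[4:].strip()] for i in f]
        # Convert to a DataFrame
        self.df = pd.DataFrame(lines, columns=["label", "sms"])

    def __getitem__(self, index):
        # Return the (label, sms) pair stored in the given row
        single_item = self.df.iloc[index, :]
        return single_item.values[0], single_item.values[1]

    def __len__(self):
        # Number of rows, i.e. number of messages
        return self.df.shape[0]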

In [2]:
myDataset = MyDataSet()
myDataset[100]

('ham', "Please don't text me anymore. I have nothing else to say.")
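
The fixed-width slice `i[:4]` works because `spam` has four characters and `ham` plus the following separator also occupies four. A sketch of an alternative, assuming the label and the message are separated by a tab (an assumption about the file layout that the notebook does not state), is to split each line once on that separator. The helper name `parse_line` is hypothetical.

```python
def parse_line(line, sep="\t"):
    """Split one raw line into (label, sms); `sep` is an assumed separator."""
    # maxsplit=1 keeps any later separators inside the message text
    label, sms = line.rstrip("\n").split(sep, 1)
    return label.strip(), sms.strip()


print(parse_line("ham\tWat r u doing?\n"))
print(parse_line("spam\tCall now\tto claim\n"))
```

Splitting on the separator does not depend on label length, so it would keep working if a longer label were ever added.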

## DataLoader class
`torch.utils.data.DataLoader`
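
A minimal sketch of how a `DataLoader` groups samples into batches, using a small made-up in-memory dataset (`PairDataset` and its contents are illustrative only). With `shuffle=False` the batches follow dataset order, and with the default settings the last batch may be smaller than `batch_size`.

```python
from torch.utils.data import Dataset, DataLoader


class PairDataset(Dataset):
    """Hypothetical dataset of five (label, text) pairs."""

    def __init__(self):
        self.samples = [
            ("ham", "a"), ("spam", "b"), ("ham", "c"), ("ham", "d"), ("spam", "e"),
        ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.samples[index]


loader = DataLoader(dataset=PairDataset(), batch_size=2, shuffle=False)

# Each batch regroups the samples by field: all labels, then all texts
batches = [(tuple(labels), tuple(texts)) for labels, texts in loader]

print(len(loader))  # number of batches
for index, (labels, texts) in enumerate(batches):
    print(index, labels, texts)
```

The per-field regrouping is why each batch unpacks into one sequence of labels and one sequence of messages instead of a sequence of (label, message) pairs.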

In [3]:
data_loader = DataLoader(dataset=myDataset,batch_size=7,shuffle=True)

# Iterate over the loader and print the contents of each batch
for index,(label, context) in enumerate(data_loader):
    print(index,label,context)
    print("*"*100)

0 ('ham', 'ham', 'ham', 'ham', 'ham', 'spam', 'ham') ('WHORE YOU ARE UNBELIEVABLE.', "I lost 4 pounds since my doc visit last week woot woot! Now I'm gonna celebrate by stuffing my face!", 'I pocked you up there before', 'Captain vijaykanth is doing comedy in captain tv..he is drunken :)', "I'm stuck in da middle of da row on da right hand side of da lt...", 'YOU VE WON! Your 4* Costa Del Sol Holiday or £5000 await collection. Call 09050090044 Now toClaim. SAE, TC s, POBox334, Stockport, SK38xh, Cost£1.50/pm, Max10mins', 'Wat r u doing?')
****************************************************************************************************
...

439 ('ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham') ('No problem. Talk to you later', "alright. Thanks for the advice. Enjoy your night out. I'ma try to get some sleep...", 'Now press conference da:)', 'Really dun bluff me leh... U sleep early too. Nite...', "These won't do. Have to move on to morphine", "I'm in class. Did you get my text.", 'Received, understood n acted upon!')
****************************************************************************************************
440 ('ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham') ("Yup. Izzit still raining heavily cos i'm in e mrt i can't c outside.", 'What time should I tell my friend to be around?', "Sorry, I'll call later in meeting", 'How come guoyang go n tell her? Then u told her?', "I'll text you when I drop x off", "Hurry up, I've been weed-deficient for like three days", 'Yup... How ü noe leh...')
****************************************************************************************************
...

## Built-in PyTorch datasets
1.`torchvision` provides APIs and datasets for image processing<br>
<li>Location: torchvision.datasets<br>
2.`torchtext` provides APIs and datasets for text processing<br>
<li>Location: torchtext.datasets

In [4]:
import torchvision
dataset = torchvision.datasets.MNIST('./data/',train=True,download=True)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST\raw\train-images-idx3-ubyte.gz
Extracting ./data/MNIST\raw\train-images-idx3-ubyte.gz to ./data/MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST\raw\train-labels-idx1-ubyte.gz
Extracting ./data/MNIST\raw\train-labels-idx1-ubyte.gz to ./data/MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST\raw\t10k-images-idx3-ubyte.gz
Extracting ./data/MNIST\raw\t10k-images-idx3-ubyte.gz to ./data/MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST\raw\t10k-labels-idx1-ubyte.gz
Extracting ./data/MNIST\raw\t10k-labels-idx1-ubyte.gz to ./data/MNIST\raw



In [5]:
# dataset[0] is an (image, label) pair; open the image in the default viewer
dataset[0][0].show()