# TweetStream and Dataloader

## Stream of Tweets


This tutorial simulates a text stream with a dataset of unlabeled tweets. Twitter is an excellent source of text streams, given its widespread use and the real-time updates from its users. We draw a set of ten million tweets in English from the Edinburgh corpus, a multilingual collection of tweets gathered for academic purposes and downloaded from November 2009 to February 2010 using the Twitter API. The dataset consists of 1,000,000 tweets in a text file, with one tweet per line.

The file can be downloaded from [here](https://drive.google.com/file/d/1Fay5WRNKjIpa0wCtzGJBaR_W5_lMhaDJ/view?usp=sharing).

## TweetStream Class

To read large text files that may not fit into memory, we implemented `TweetStream` as an extension of the [`IterableDataset`](https://pytorch.org/docs/stable/data.html) class of the PyTorch API. We then used the [`DataLoader`](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html) provided by PyTorch to load the iterable dataset. This lets us read tweets as they are needed instead of storing the whole file in memory.
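The idea behind an iterable-style dataset can be illustrated without PyTorch. The sketch below is not RiverText's `TweetStream`; `LineStream` and `batched` are hypothetical names introduced only for illustration. It assumes one tweet per line, as described above, and shows how a file can be read lazily and grouped into batches, roughly what a batching loader does for plain strings.

```python
import os
import tempfile
from itertools import islice
from typing import Iterable, Iterator, List


class LineStream:
    """Hypothetical stand-in for a file-backed iterable dataset."""

    def __init__(self, path: str) -> None:
        self.path = path

    def __iter__(self) -> Iterator[str]:
        # A file object is a lazy iterator, so only one line is held
        # in memory at a time, regardless of the file size.
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if line:  # skip empty lines
                    yield line


def batched(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group consecutive items into lists of at most batch_size items."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Demo on a small temporary file standing in for the tweet file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("first tweet\nsecond tweet\nthird tweet\n")
    demo_path = tmp.name

batches = list(batched(LineStream(demo_path), batch_size=2))
os.remove(demo_path)
print(batches)
```

With PyTorch, the same pattern is obtained by subclassing `torch.utils.data.IterableDataset` and defining `__iter__`; the `DataLoader` then takes care of the batching.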

### Import Libraries

```python
from rivertext.utils import TweetStream
from torch.utils.data import DataLoader
```

### Load the Text Stream

```python
# Wrap the tweet file in a stream and load it one tweet per batch
ts = TweetStream("tweets.txt")
dataloader = DataLoader(ts, batch_size=1)
```

```python
# With batch_size=1, each iteration yields a list containing a single tweet
for tweet in dataloader:
    print(tweet)
```

['Gluetext is a fresh site aggregating all the news about whatever you want to one place']
['I WANT CHOCOLATE p']
['The WHITE zone is for loading and unloading ONLY If you need to load or unload go to the WHITE zone you ll LOVE it']
['10 Steps to make the best Italian coffee in Paris']
['why what happened boo hoo']
['I m really looking forward to a catch up coffee with and her cool little man']
['Morning world Guess it s time to get work started Gaan ons']
['The Stranger in My House']
['Photo Uberbyte I wanna grow up to look like one of these fuckers']
['yo I got some white rhino in my desk right now that shit is dank']
['Wishing I could sleep Big truck day in the morning']
['Space Invaders Autopsy T Shirt Space Invaders Gizmodo']
['I posted 16 photos on Facebook in the album AFTERNOON TEA']
['AT 6 54pm 35 6°C Humidity 16 Wind S 13km h Pressure 1015 9hPa SO FAR Min 22 9°C Max 38 9°C Rain nil F CAST']
['www parttimepoker com wsop 3m more for Moon']
['Artists Without Mortarboards Should 