**This notebook is inspired by Ben Trevett's solution to storing the IMDB 50000-review train and test datasets in JSON format for easier processing.**

https://github.com/bentrevett/pytorch-sentiment-analysis/issues/6

In [2]:
# Import libraries
import os
import torch
from torchtext import data
from torchtext import datasets
import json

In [3]:
%sh ls -a /dbfs/FileStore

In [4]:
# Download the spaCy English model (used by the 'spacy' tokenizer below)
!python -m spacy download en

In [5]:
# Declare text and label fields, download dataset if required and then tokenize the training and test sets with spacy
TEXT = data.Field(tokenize='spacy', lower=True)
LABEL = data.LabelField()
train_data, test_data = datasets.IMDB.splits(TEXT, LABEL, train='train', test='test')

In [6]:
# Save the training examples as JSON, one JSON object per line
train_examples = [vars(t) for t in train_data]
with open('/dbfs/FileStore/train.json', 'w+') as f:
  for example in train_examples:
    json.dump(example, f)
    f.write('\n')

In [7]:
# Save the test examples as JSON, one JSON object per line
test_examples = [vars(t) for t in test_data]
with open('/dbfs/FileStore/test.json', 'w+') as f:
  for example in test_examples:
    json.dump(example, f)
    f.write('\n')
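
Because each file holds one JSON object per line, it can be read back line by line with the standard `json` module. The sketch below is only an illustration: `write_json_lines` and `read_json_lines` are hypothetical helper names, and the two toy records are made up to mirror the assumed `{"text": [...tokens...], "label": ...}` shape that `vars(example)` produces. It writes to a temporary directory rather than /dbfs/FileStore so it can run anywhere.

```python
import json
import os
import tempfile


def write_json_lines(examples, path):
    """Write each dict in `examples` as one JSON object per line."""
    with open(path, 'w') as f:
        for example in examples:
            json.dump(example, f)
            f.write('\n')


def read_json_lines(path):
    """Read a one-object-per-line JSON file back into a list of dicts."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


# Toy records (made up for illustration, not taken from the IMDB dataset)
sample = [
    {"text": ["a", "great", "film"], "label": "pos"},
    {"text": ["not", "worth", "watching"], "label": "neg"},
]

# Round-trip through a temporary file instead of /dbfs/FileStore
tmp_path = os.path.join(tempfile.mkdtemp(), "sample.json")
write_json_lines(sample, tmp_path)
restored = read_json_lines(tmp_path)
```

To check the real files, calling `read_json_lines('/dbfs/FileStore/train.json')` in a cell should return one dict per saved example.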

# Download instructions

If you followed the example above, your files are stored in **/dbfs/FileStore**. You can confirm that they exist by running the following command in a cell:

**%sh ls -a /dbfs/FileStore**

To download a file, type the following URL into your browser's address bar and hit Enter **(but keep reading first!)**:

https://your_region.azuredatabricks.net/files/train.json

Here, your_region is the Azure region that appears in your Databricks web address. For example, the first part of my URL is https://canadacentral.azuredatabricks.net

In other words, copy and paste the https://....net/ part from your Databricks webpage URL.

If your Databricks URL has a **?o=###########** part after .net, append it to the end of the download URL, so that your final URL looks like this:

**https://canadacentral.azuredatabricks.net/files/test.json?o=1234567890**

Once you hit Enter, your browser will ask you to Open/Save/Download the file. That's it!
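
The URL rules above can also be sketched as a small helper. This is only an illustration under stated assumptions: `build_download_url` is a hypothetical name, and it assumes the download URL is always the workspace host followed by /files/ and the file name, with the `o` query parameter carried over when the workspace URL has one.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def build_download_url(workspace_url, filename):
    """Hypothetical helper: turn a Databricks page URL into a /files/ download URL."""
    parts = urlsplit(workspace_url)
    # Carry over the ?o=... parameter if the workspace URL has one
    org_ids = parse_qs(parts.query).get("o", [])
    query = urlencode({"o": org_ids[0]}) if org_ids else ""
    # Keep the scheme and host, replace the path, and drop any #fragment
    return urlunsplit((parts.scheme, parts.netloc, "/files/" + filename, query, ""))


# Example inputs reuse the placeholder host and ID from the text above
with_org = build_download_url(
    "https://canadacentral.azuredatabricks.net/?o=1234567890#notebook/1", "test.json"
)
without_org = build_download_url(
    "https://canadacentral.azuredatabricks.net/", "train.json"
)
```

Pasting the resulting string into the browser's address bar is then the same as building the URL by hand.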