# 0B. Obtaining the Datasets via Jupyter

The Kaggle API provides a way to download datasets from the command line.
- The steps for that method are covered in the Assignment Report.

This notebook provides an alternative: downloading the datasets from within a Jupyter notebook.

# Table of Contents
1. [Installing kaggle](#installing_kaggle)
2. [Preparing API Credentials](#preparing_api_credentials)
3. [Downloading the Datasets](#downloading_datasets)

### Installing kaggle <a name="installing_kaggle"></a>

First, ensure that kaggle is installed. You can install it from a command prompt or by running the cell below.

Installing from a command prompt (recommended):
- conda install -c conda-forge kaggle

OR
- pip install kaggle

Installing from within Jupyter:

In [1]:
'''
Only run this cell if you want to install kaggle.
Choose one method, uncomment its line, and run the cell; one method is enough.
'''
import sys

# Install via pip
# !{sys.executable} -m pip install kaggle

# Install via conda
# On command prompt use:
# conda install -c conda-forge kaggle
# On notebook use:
# !conda install --prefix {sys.prefix} -c conda-forge kaggle --yes

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: C:\Users\AJL\Anaconda3

  added / updated specs:
    - kaggle


The following NEW packages will be INSTALLED:

  kaggle             conda-forge/win-64::kaggle-1.5.6-py37_1
  python-slugify     conda-forge/noarch::python-slugify-1.2.6-py_0
  unidecode          conda-forge/noarch::unidecode-1.1.1-py_0


Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
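
Before installing, it can be handy to check whether kaggle is already available in the current environment. The following is a minimal sketch, not part of the original notebook; the helper name `is_installed` is hypothetical. It uses `importlib.util.find_spec` rather than `import kaggle`, on the assumption that importing the package may try to read your credentials straight away.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the package can be located, without importing it."""
    return importlib.util.find_spec(package_name) is not None


# Report whether kaggle is available to the kernel running this notebook
print("kaggle installed:", is_installed("kaggle"))
```

If this prints `False`, run one of the install commands above and restart the kernel.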


### Preparing API Credentials <a name="preparing_api_credentials"></a>

From the official Kaggle API GitHub repository: https://github.com/Kaggle/kaggle-api

- To use the Kaggle API, sign up for a Kaggle account at https://www.kaggle.com. 

- Then go to the 'Account' tab of your user profile and select 'Create API Token'.
    - This will trigger the download of kaggle.json, a file containing your API credentials.


- Place this .json file at ~/.kaggle/kaggle.json (on Windows, C:\Users\{your-username}\.kaggle\kaggle.json). Create the .kaggle folder if it doesn't exist.
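
The last step can also be done from Python. The sketch below is only an illustration: the helper `install_token` is hypothetical, and the download location (`~/Downloads/kaggle.json`) is an assumption that depends on your browser settings. On Unix-like systems it also restricts the file to the current user, since the file contains your API key.

```python
import os
import shutil
from pathlib import Path


def install_token(downloaded: Path, kaggle_dir: Path) -> Path:
    """Copy a downloaded token file into kaggle_dir as kaggle.json."""
    # Create the .kaggle folder if it doesn't exist
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    target = kaggle_dir / "kaggle.json"
    shutil.copyfile(downloaded, target)
    if os.name != "nt":
        # On Unix-like systems, make the file readable by the owner only
        target.chmod(0o600)
    return target


# Assumed download location; adjust to wherever your browser saved the file
downloaded_token = Path.home() / "Downloads" / "kaggle.json"
if downloaded_token.exists():
    print("Token copied to", install_token(downloaded_token, Path.home() / ".kaggle"))
else:
    print("No token found at", downloaded_token)
```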

### Important:

Before continuing, ensure you have prepared your API credentials.

Make sure kaggle.json is in the ~/.kaggle folder, as described in the previous section.
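
A quick way to confirm this is to inspect the file from the notebook. This is a minimal sketch with a hypothetical helper, `check_token`; it assumes the token file is JSON with `username` and `key` fields and that it sits in the default ~/.kaggle location.

```python
import json
from pathlib import Path


def check_token(path: Path) -> list:
    """Return a list of problems with the credentials file (empty if none)."""
    if not path.is_file():
        return [f"{path} does not exist"]
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return [f"{path} is not valid JSON"]
    if not isinstance(data, dict):
        return [f"{path} does not contain a JSON object"]
    # Both fields must be present and non-empty
    return [f"missing field: {name}" for name in ("username", "key") if not data.get(name)]


problems = check_token(Path.home() / ".kaggle" / "kaggle.json")
print("Credentials look fine" if not problems else problems)
```

An empty list only means the file is in place and well formed; whether the key is still valid is checked when the API authenticates.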

### Downloading the Datasets <a name="downloading_datasets"></a>

In [2]:
import kaggle as kg
import pandas as pd

In [3]:
kg.api.authenticate()
kg.api.dataset_download_files(dataset="datasnaek/youtube", path='./kaggle', unzip=True)

Now that the datasets have been downloaded, we can take a peek at them.
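
Before loading anything, we can confirm which files actually arrived. The sketch below lists the CSV files in the download folder with their sizes; the helper name `list_csv_files` is hypothetical, and the folder is assumed to be the `./kaggle` path used in the download call.

```python
from pathlib import Path


def list_csv_files(folder: Path) -> list:
    """Return (file name, size in bytes) for each CSV file in folder, sorted by name."""
    if not folder.is_dir():
        return []
    return sorted((p.name, p.stat().st_size) for p in folder.glob("*.csv"))


# Show each downloaded CSV file with its size in megabytes
for name, size in list_csv_files(Path("./kaggle")):
    print(f"{name}: {size / 1e6:.1f} MB")
```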

In [4]:
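# error_bad_lines=False skips malformed rows instead of raising an error.
# In pandas 1.3 and later, use on_bad_lines='warn' instead (error_bad_lines was removed in pandas 2.0).
us_videos = pd.read_csv('./kaggle/USvideos.csv', error_bad_lines=False)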
us_videos.head()

b'Skipping line 2401: expected 11 fields, saw 21\nSkipping line 2800: expected 11 fields, saw 21\nSkipping line 5297: expected 11 fields, saw 12\nSkipping line 5299: expected 11 fields, saw 12\nSkipping line 5300: expected 11 fields, saw 12\nSkipping line 5301: expected 11 fields, saw 12\n'


| | video_id | title | channel_title | category_id | tags | views | likes | dislikes | comment_total | thumbnail_link | date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | XpVt6Z1Gjjo | 1 YEAR OF VLOGGING -- HOW LOGAN PAUL CHANGED Y... | Logan Paul Vlogs | 24 | logan paul vlog\|logan paul\|logan\|paul\|olympics... | 4394029 | 320053 | 5931 | 46245 | https://i.ytimg.com/vi/XpVt6Z1Gjjo/default.jpg | 13.09 |
| 1 | K4wEI5zhHB0 | iPhone X — Introducing iPhone X — Apple | Apple | 28 | Apple\|iPhone 10\|iPhone Ten\|iPhone\|Portrait Lig... | 7860119 | 185853 | 26679 | 0 | https://i.ytimg.com/vi/K4wEI5zhHB0/default.jpg | 13.09 |
| 2 | cLdxuaxaQwc | My Response | PewDiePie | 22 | [none] | 5845909 | 576597 | 39774 | 170708 | https://i.ytimg.com/vi/cLdxuaxaQwc/default.jpg | 13.09 |
| 3 | WYYvHb03Eog | Apple iPhone X first look | The Verge | 28 | apple iphone x hands on\|Apple iPhone X\|iPhone ... | 2642103 | 24975 | 4542 | 12829 | https://i.ytimg.com/vi/WYYvHb03Eog/default.jpg | 13.09 |
| 4 | sjlHnJvXdQs | iPhone X (parody) | jacksfilms | 23 | jacksfilms\|parody\|parodies\|iphone\|iphone x\|iph... | 1168130 | 96666 | 568 | 6666 | https://i.ytimg.com/vi/sjlHnJvXdQs/default.jpg | 13.09 |
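
The "Skipping line" messages above mean that some rows contain more fields than the header declares, so they were dropped. To see which rows are affected before deciding whether that is acceptable, something like the following sketch could be used. It relies on the standard `csv` module rather than pandas, the helper name `find_long_rows` is hypothetical, and its line numbers may not match the ones pandas reports exactly (for example when a quoted field spans several lines).

```python
import csv
import io


def find_long_rows(lines, limit=10):
    """Return the header width and up to `limit` (line number, field count)
    pairs for rows that have more fields than the header."""
    reader = csv.reader(lines)
    expected = len(next(reader))  # number of columns in the header row
    bad = []
    for row in reader:
        if len(row) > expected:
            bad.append((reader.line_num, len(row)))
            if len(bad) >= limit:
                break
    return expected, bad


# Small in-memory example: the third line has an unquoted comma in the title
sample = io.StringIO(
    'video_id,title,views\n'
    'abc,"Hello, world",10\n'
    'def,Broken,title,20\n'
    'ghi,Fine,30\n'
)
print(find_long_rows(sample))  # (3, [(3, 4)])
```

To run it on the real data, pass an open file instead of the sample, for example one opened with `open('./kaggle/USvideos.csv', newline='', encoding='utf-8')`.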


In [5]:
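# As in the previous cell, malformed rows are skipped.
# low_memory=False reads the file in one pass so that column dtypes are inferred consistently.
us_comments = pd.read_csv('./kaggle/UScomments.csv', error_bad_lines=False, low_memory=False)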
us_comments.head()

b'Skipping line 41589: expected 4 fields, saw 11\nSkipping line 51628: expected 4 fields, saw 7\nSkipping line 114465: expected 4 fields, saw 5\nSkipping line 142496: expected 4 fields, saw 8\nSkipping line 189732: expected 4 fields, saw 6\nSkipping line 245218: expected 4 fields, saw 7\nSkipping line 388430: expected 4 fields, saw 5\n'


| | video_id | comment_text | likes | replies |
|---|---|---|---|---|
| 0 | XpVt6Z1Gjjo | Logan Paul it's yo big day ‼️‼️‼️ | 4 | 0 |
| 1 | XpVt6Z1Gjjo | I've been following you from the start of your... | 3 | 0 |
| 2 | XpVt6Z1Gjjo | Say hi to Kong and maverick for me | 3 | 0 |
| 3 | XpVt6Z1Gjjo | MY FAN . attendance | 3 | 0 |
| 4 | XpVt6Z1Gjjo | trending 😉 | 3 | 0 |
