# Sentiment Analysis Dataset Preparation

This notebook demonstrates how to download and prepare a sentiment analysis dataset from Kaggle using the Kaggle API. The process includes:

- Loading Kaggle credentials
- Authenticating with the Kaggle API
- Downloading a sentiment analysis dataset
- Extracting the dataset for further analysis

**Requirements**: A Kaggle account and an API credentials file (`kaggle.json`)

In [1]:
import numpy as np

In [2]:
import json
with open("../artifacts/kaggle.json", "r") as file:
    data = json.load(file)
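
Before the loaded dictionary is used, it can help to confirm that it contains the two fields the rest of the notebook relies on (`username` and `key`). The following is a minimal sketch under that assumption; `load_kaggle_credentials` is a hypothetical helper name and is not part of the Kaggle package.

```python
import json
from pathlib import Path


def load_kaggle_credentials(path):
    """Load a kaggle.json-style file and check that both fields are present.

    Hypothetical helper: assumes the file is a JSON object with non-empty
    "username" and "key" entries, as used later in this notebook.
    """
    creds = json.loads(Path(path).read_text())
    missing = [field for field in ("username", "key") if not creds.get(field)]
    if missing:
        raise ValueError(f"{path} is missing fields: {', '.join(missing)}")
    return creds
```

Calling `load_kaggle_credentials("../artifacts/kaggle.json")` would return the same dictionary as the cell above, but fail early with a readable message if a field is absent.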


### Verify the Loaded Credentials

In [3]:
# Show only the username; avoid printing the API key in notebook output
data['username']

'nimeshakarshana'

In [4]:
!pip install kaggle



In [5]:
import os
# Expose the credentials to the Kaggle client through environment variables
os.environ['KAGGLE_USERNAME'] = data['username']
os.environ['KAGGLE_KEY'] = data['key']

### Authenticate with Kaggle

This step initializes the Kaggle API client and authenticates using the credentials we loaded:

1. The `KaggleApi` class provides programmatic access to Kaggle's features
2. We create an instance of the API client
3. The `authenticate()` method uses our username and API key (stored in environment variables)
4. Authentication must succeed before we can download datasets

If this step fails, check that your `kaggle.json` file contains valid credentials and that the environment variables were set correctly.
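
One way to run the environment variable check described above is a small helper that reports which of the two variables are unset or empty. This is only a sketch; `missing_kaggle_env` is a hypothetical name introduced here for illustration.

```python
import os


def missing_kaggle_env(environ=os.environ):
    """Return the names of the required Kaggle variables that are unset or empty.

    Hypothetical helper: takes any mapping so it can be checked without
    touching the real environment.
    """
    required = ("KAGGLE_USERNAME", "KAGGLE_KEY")
    return [name for name in required if not environ.get(name)]
```

An empty list means both variables are present; a non-empty list names what to fix before calling `authenticate()`. The check says nothing about whether the credentials themselves are valid.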

In [6]:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()

### Download the Sentiment Analysis Dataset

In [7]:
# Download the dataset as a ZIP archive into ../artifacts/
api.dataset_download_files('dineshpiyasamara/sentiment-analysis-dataset', path='../artifacts/')

Dataset URL: https://www.kaggle.com/datasets/dineshpiyasamara/sentiment-analysis-dataset
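
After the download, a quick check that the archive exists and is a readable ZIP file can catch an interrupted or failed download before extraction. The sketch below uses only the standard library; `is_valid_archive` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def is_valid_archive(path):
    """Return True if path is an existing file that looks like a ZIP archive."""
    path = Path(path)
    return path.is_file() and zipfile.is_zipfile(path)
```

In this notebook it would be called as `is_valid_archive("../artifacts/sentiment-analysis-dataset.zip")`, the path used in the extraction step.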


### Extract the Downloaded ZIP File

In [8]:
import zipfile
with zipfile.ZipFile("../artifacts/sentiment-analysis-dataset.zip", 'r') as zip_ref:
    zip_ref.extractall("../artifacts/")
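
To see what the archive contains without assuming any particular file names, its members can be listed with their uncompressed sizes. This is a minimal sketch using the standard `zipfile` module; `list_archive` is a hypothetical helper name.

```python
import zipfile


def list_archive(path):
    """Return (member name, uncompressed size in bytes) for each archive entry."""
    with zipfile.ZipFile(path, "r") as zip_ref:
        return [(info.filename, info.file_size) for info in zip_ref.infolist()]
```

Running `list_archive("../artifacts/sentiment-analysis-dataset.zip")` shows which files were written to `../artifacts/` by the extraction step.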