# Getting the raw dataset

We'll use the `twython` and `cryptocompare` packages to fetch the Twitter and
cryptocurrency data, respectively, that make up the project's dataset.

## Importing packages.

First, we'll import the necessary Python packages.


In [30]:
from datetime import datetime

# Access environment variables.
from os import environ

# Resolving paths in a platform agnostic way.
from os.path import dirname, join, realpath

# Cryptocompare API.
from cryptocompare import get_historical_price_minute, get_price

# Loading environment variables from a `.env` file.
from dotenv import load_dotenv

# Manipulating the raw data to save it in `.csv` files.
from pandas import DataFrame

# Twython API.
from twython import Twython

In [31]:
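def is_interactive():
    """Return True when running in a notebook or REPL rather than as a script."""
    import __main__ as main

    # Interactive sessions do not define `__file__` on the main module.
    return not hasattr(main, "__file__")


if is_interactive():
    # `__file__` is undefined here, so the literal string "__file__" is resolved
    # against the current working directory, whose path `dirname` then returns.
    SCRIPT_DIR = dirname(realpath("__file__"))
else:
    SCRIPT_DIR = dirname(realpath(__file__))

# The data directory sits next to the directory containing this script.
DATA_DIR = join(dirname(SCRIPT_DIR), "data")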

## Loading secrets from environment variables.

Next, we need credentials for the CryptoCompare and Twitter APIs. To avoid
hard-coding the secrets, we'll load them from environment variables, which are
populated from a `.env` file.

In [32]:
# Loads environment variables from a `.env` file.
load_dotenv()

# The variables from the file are now available as if they had been set on
# the command line.
TWITTER_APP_KEY = environ["TWITTER_APP_KEY"]
TWITTER_APP_SECRET = environ["TWITTER_APP_SECRET"]
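
Indexing `environ` directly raises a bare `KeyError` when a variable is missing. As a minimal sketch, a small helper could report which secret is absent instead. The name `require_env` and the stand-in mapping `fake_env` are hypothetical and only serve the illustration.

```python
import os


def require_env(name, env=os.environ):
    """Return the value of `name`, or raise a descriptive error if unset or empty."""
    value = env.get(name)
    if not value:
        raise RuntimeError(
            f"Missing environment variable {name!r}; add it to the .env file."
        )
    return value


# Usage with a stand-in mapping (hypothetical values) instead of the real
# environment, so the example runs without any secrets configured.
fake_env = {"TWITTER_APP_KEY": "example-key"}
print(require_env("TWITTER_APP_KEY", env=fake_env))
```

In the notebook, the same helper could be called after `load_dotenv()` with the default `env` argument.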

## Testing the APIs.

Let's test the CryptoCompare API. `CRYPTOCOMPARE_API_KEY` should be specified
in the `.env` file so that the Python package can detect it automatically as
an environment variable.

In [33]:
get_price("BTC", currency="USD")

{'BTC': {'USD': 38691.61}}

Now let's test accessing Twitter's API through Twython.

In [34]:
twitter = Twython(TWITTER_APP_KEY, TWITTER_APP_SECRET, oauth_version=2)
ACCESS_TOKEN = twitter.obtain_access_token()
twitter = Twython(
    TWITTER_APP_KEY, TWITTER_APP_SECRET, access_token=ACCESS_TOKEN
)

search_results = twitter.search(count=1, q="cryptocurrency")
print(search_results)

{'statuses': [{'created_at': 'Tue Mar 08 16:20:47 +0000 2022', 'id': 1501231464821637126, 'id_str': '1501231464821637126', 'text': 'RT @Rabbit1366: @Kenzo_Ventures @CalvariaP2E The most beautiful things in the world cannot be touched,cannot be seen with the eyes,they hav…', 'truncated': False, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'Rabbit1366', 'name': 'Rabbit', 'id': 1432642655779962881, 'id_str': '1432642655779962881', 'indices': [3, 14]}, {'screen_name': 'Kenzo_Ventures', 'name': 'Kenzo Ventures', 'id': 1421141837310357505, 'id_str': '1421141837310357505', 'indices': [16, 31]}, {'screen_name': 'CalvariaP2E', 'name': 'Calvaria: P2E Game / WL IS OPEN', 'id': 1441073624526426120, 'id_str': '1441073624526426120', 'indices': [32, 44]}], 'urls': []}, 'metadata': {'iso_language_code': 'en', 'result_type': 'recent'}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', 'in_reply_to_status_id': None, 'in_reply
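
The response is a nested dictionary, so a flattening step is useful before saving tweets to a `.csv` file. The following is a sketch only: it assumes the `statuses`, `id_str`, `created_at`, and `text` keys visible in the output above, and the helper name `statuses_to_rows` and the `sample` payload are made up for the illustration.

```python
from datetime import datetime

# Timestamp layout seen in the response, e.g. "Tue Mar 08 16:20:47 +0000 2022".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def statuses_to_rows(search_results):
    """Flatten a search response into one small dict per tweet."""
    rows = []
    for status in search_results.get("statuses", []):
        rows.append(
            {
                "id": status["id_str"],
                "created_at": datetime.strptime(
                    status["created_at"], TWITTER_TIME_FORMAT
                ),
                "text": status["text"],
            }
        )
    return rows


# Hypothetical, trimmed-down payload shaped like the response above.
sample = {
    "statuses": [
        {
            "created_at": "Tue Mar 08 16:20:47 +0000 2022",
            "id_str": "1501231464821637126",
            "text": "example tweet text",
        }
    ]
}
rows = statuses_to_rows(sample)
print(rows[0]["created_at"].isoformat())  # 2022-03-08T16:20:47+00:00
```

The resulting list of dictionaries can be passed straight to `DataFrame(rows)`.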

## Getting cryptocurrency data.

In [36]:
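cryptocurrencies = ["BTC", "ETH", "DOGE", "SOL", "AVAX"]
days_count = 7
for cryptocurrency in cryptocurrencies:
    price_dataset = []
    earliest_timestamp = datetime.now()
    for day in range(days_count):
        # Each request returns up to one day of minute-level prices ending at
        # `toTs`. This assumes each batch is sorted oldest first.
        batch = get_historical_price_minute(
            cryptocurrency, "USD", limit=1440, toTs=earliest_timestamp
        )
        if not batch:
            break

        # Prepend the older batch and continue from its oldest entry, so the
        # next request covers the preceding day instead of the same window.
        price_dataset = batch + price_dataset
        earliest_timestamp = batch[0]["time"]

    # Saving the raw price data to a csv file.
    price_data_frame = DataFrame(price_dataset)
    price_data_frame.to_csv(
        join(
            DATA_DIR,
            "raw",
            "crypto",
            f"{cryptocurrency.lower()}_{days_count}_days_by_minute.csv",
        )
    )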
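
The loop above walks backwards through time one batch at a time. The sketch below isolates that pagination logic so it can be checked without network access. It assumes each batch is a list of dictionaries with a `time` key in ascending order, and that consecutive batches may share their boundary row. `fetch_backwards` and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in for the real API call.

```python
def fetch_backwards(fetch_batch, end_ts, batches):
    """Collect up to `batches` consecutive windows ending at `end_ts`, oldest first."""
    seen = {}
    to_ts = end_ts
    for _ in range(batches):
        batch = fetch_batch(to_ts)
        if not batch:
            break
        for row in batch:
            # Keying on the timestamp drops rows shared by adjacent batches.
            seen[row["time"]] = row
        # Continue from the oldest row of this batch.
        to_ts = min(row["time"] for row in batch)
    return [seen[ts] for ts in sorted(seen)]


def fake_fetch(to_ts, limit=3):
    """Hypothetical stand-in: `limit + 1` one-minute rows ending at `to_ts`."""
    return [{"time": to_ts - 60 * i, "close": 1.0} for i in range(limit, -1, -1)]


rows = fetch_backwards(fake_fetch, end_ts=600, batches=2)
print([row["time"] for row in rows])  # [240, 300, 360, 420, 480, 540, 600]
```

Keying rows by timestamp keeps the result free of duplicates regardless of whether the API includes the boundary minute in both batches.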