This notebook focuses on collecting and preparing data from the Ravelry API.
The main goals are to:
- fetch publicly available knitting pattern data using the Ravelry API,
- build an initial raw dataset containing pattern metadata,
- clean and structure the data for further analysis,
- prepare the dataset for exploratory data analysis (EDA) and later modelling.

This is the first step of the project, so the focus here is on data collection and basic preparation. Exploratory analysis and modelling come in later notebooks.
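The raw-to-clean step in the goals above can be sketched with pandas. This is a minimal illustration on made-up records: the nested field names (such as designer.name) are assumptions for demonstration, not the confirmed Ravelry response schema.

```python
import pandas as pd

# Hypothetical raw records shaped like nested pattern metadata from an API response.
# Field names are illustrative assumptions, not the confirmed Ravelry schema.
raw_patterns = [
    {"id": 1, "name": "Simple Hat", "free": True,
     "designer": {"name": "A. Knitter"}},
    {"id": 2, "name": "Lace Shawl", "free": False,
     "designer": {"name": "B. Purl"}},
    {"id": 2, "name": "Lace Shawl", "free": False,
     "designer": {"name": "B. Purl"}},  # duplicate, e.g. returned on two pages
]

# json_normalize flattens nested dicts into dotted column names, e.g. "designer.name".
df_raw = pd.json_normalize(raw_patterns)

# Basic cleaning: replace dots with underscores and drop duplicate pattern ids.
df_clean = (
    df_raw.rename(columns=lambda c: c.replace(".", "_"))
          .drop_duplicates(subset="id")
          .reset_index(drop=True)
)
print(df_clean)
```

Keeping the flattening and deduplication in one chained expression makes the cleaning step easy to re-run whenever a new raw snapshot is collected.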

1. Import libraries for data handling, API calls, and quick visual checks

In [4]:
# BASIC DATA HANDLING
import pandas as pd
import numpy as np
import os
import json
import time

# API & REQUESTS
import requests
from requests.auth import HTTPBasicAuth

# VISUALISATION (later use)
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Settings for nicer plots
sns.set_theme(style="whitegrid")



2. Define and create folders for raw and processed data.

In [5]:
RAW_DIR = "../data/raw/v1"
PROCESSED_DIR = "../data/processed/v1"

os.makedirs(RAW_DIR, exist_ok=True)
os.makedirs(PROCESSED_DIR, exist_ok=True)

RAW_JSON_PATH = os.path.join(RAW_DIR, "patterns_raw.json")
RAW_CSV_PATH = os.path.join(RAW_DIR, "patterns_raw.csv")

PROCESSED_CSV_PATH = os.path.join(PROCESSED_DIR, "patterns_clean.csv")

RAW_DIR, PROCESSED_DIR


('../data/raw/v1', '../data/processed/v1')
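One way these paths might be used later is sketched below: the untouched API records go to the raw JSON file, and a flattened copy goes to the raw CSV for quick inspection. The helper save_raw is hypothetical, and the demo writes to a temporary folder that mirrors the raw/v1 layout so it does not touch the real data directory.

```python
import json
import os
import tempfile

import pandas as pd


def save_raw(records, json_path, csv_path):
    """Hypothetical helper: store raw API records as JSON plus a flattened CSV copy."""
    # Keep the raw JSON exactly as received so cleaning can be re-run later.
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    # The CSV is a flattened convenience copy for quick inspection.
    pd.json_normalize(records).to_csv(csv_path, index=False)


# Demo in a temporary folder that mirrors the raw/v1 layout (illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = os.path.join(tmp, "raw", "v1")
    os.makedirs(raw_dir, exist_ok=True)
    json_path = os.path.join(raw_dir, "patterns_raw.json")
    csv_path = os.path.join(raw_dir, "patterns_raw.csv")

    save_raw([{"id": 1, "name": "Simple Hat"}], json_path, csv_path)

    # Reload both files to confirm they round-trip.
    with open(json_path, encoding="utf-8") as f:
        reloaded = json.load(f)
    csv_shape = pd.read_csv(csv_path).shape

print(reloaded)
print(csv_shape)
```

In the notebook itself, save_raw(records, RAW_JSON_PATH, RAW_CSV_PATH) would write to the folders created above.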

3. Load API credentials. They are read from environment variables (populated from a local .env file) instead of being hard-coded, which keeps them out of the codebase and prevents accidental exposure.


In [6]:
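from dotenv import load_dotenv  # provided by the python-dotenv package

# Load variables from a local .env file into os.environ (already-set variables are not overridden)
load_dotenv()

# The access key is used as the HTTP Basic Auth username, the personal key as the password
RAVELRY_USER = os.getenv("RAVELRY_ACCESS_KEY")
RAVELRY_PASS = os.getenv("RAVELRY_PERSONAL_KEY")

# Fail early with a clear message if either credential is missing
if not RAVELRY_USER or not RAVELRY_PASS:
    raise ValueError(
        "API credentials not found. "
        "Make sure RAVELRY_ACCESS_KEY and RAVELRY_PERSONAL_KEY are set in the .env file."
    )

print("Credentials loaded ✅")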



Credentials loaded ✅
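With the credentials loaded, an authenticated request can be built using the HTTPBasicAuth class imported earlier. The sketch below prepares a request without sending it, so it runs offline. The base URL, the /patterns/search.json endpoint, and the query, page, and page_size parameters are assumptions based on Ravelry's public API documentation and should be checked there; build_search_request is a hypothetical helper.

```python
import requests
from requests.auth import HTTPBasicAuth

# Assumption: Ravelry API base URL and pattern-search endpoint; verify against the docs.
BASE_URL = "https://api.ravelry.com"


def build_search_request(user, password, query, page=1, page_size=50):
    """Hypothetical helper: prepare (but do not send) an authenticated pattern search."""
    req = requests.Request(
        "GET",
        f"{BASE_URL}/patterns/search.json",
        # Parameter names are assumptions about the search endpoint.
        params={"query": query, "page": page, "page_size": page_size},
        auth=HTTPBasicAuth(user, password),
    )
    # prepare() encodes the query string and adds the Basic Authorization header.
    return req.prepare()


# Dummy credentials for illustration; in the notebook use RAVELRY_USER and RAVELRY_PASS.
prepared = build_search_request("demo_user", "demo_pass", "hat")
print(prepared.url)
print(prepared.headers["Authorization"].startswith("Basic "))
```

To actually send it, requests.Session().send(prepared, timeout=30) would work; when looping over several pages, a short time.sleep between requests keeps the request rate polite.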
