eep — Data Migration README

Purpose

  • Quickly migrate CSV data from the project's data/ folder into a remote API-backed database using Python (pandas + requests).
  • Keep sensitive values out of source control via a .env file.

Quick setup

  1. Create a Python virtual environment and install dependencies:
  • Install virtualenv if not already installed:

    pip install --user virtualenv
    
  • Create the virtual environment using virtualenv:

    python3 -m virtualenv .venv
    
  • Activate the virtual environment:

    • macOS / Linux:
      source .venv/bin/activate
      
    • Windows (PowerShell):
      .venv\Scripts\Activate.ps1
      
    • Windows (cmd):
      .venv\Scripts\activate.bat
      
  • Install dependencies:

    pip install -r requirements.txt
    
  • requirements.txt should include (examples):

    pandas
    requests
    python-dotenv
    
  • Recommendation: pin versions in requirements.txt for reproducibility (e.g., pandas==1.5.3).
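
  • For example, a fully pinned requirements.txt might look like this (version numbers are illustrative; pin whatever your environment actually resolves):

    pandas==1.5.3
    requests==2.31.0
    python-dotenv==1.0.0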

  2. Create the data directory

    • mkdir data
    • Place your CSV files in data/. Example names: users.csv, transactions.csv, products.csv.

    Note: If those CSVs are large, do not commit them directly. Use one of:

    • Add data/ to .gitignore
    • Use Git LFS for large files (git lfs track "data/*")
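
  • A minimal .gitignore covering both cases (ignoring .env itself and .venv/ are added suggestions, in line with keeping secrets and local environments out of source control):

    # .gitignore
    data/
    .env
    .venv/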

Create the .env file

  • Create a file named .env at the repo root. Example contents:

    API_BASE_URL=https://api.example.com
    API_TOKEN=your_bearer_token_here
    OTHER_OPTIONAL_VAR=value
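
  • A minimal sketch of reading these values in Python with python-dotenv:

    # load variables from a .env file if present, then read them from the environment
    import os

    from dotenv import load_dotenv

    load_dotenv()
    API_BASE_URL = os.environ["API_BASE_URL"]  # raises KeyError if unset
    API_TOKEN = os.environ["API_TOKEN"]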
    

How to run the loader

  • The recommended approach is a small script (example: scripts/load_data.py) that:
    • Loads environment variables with python-dotenv
    • Iterates CSVs in data/
    • Uses pandas.read_csv to load chunks or full frames (use chunksize for large files)
    • Transforms/validates rows as needed
    • Sends rows or batches to the API using requests with Authorization: Bearer <API_TOKEN>
    • Implements retries, backoff, and logging

Minimal loader example (conceptual)

    # scripts/load_data.py
    # Loads .env, iterates CSVs, and posts batches to API.
    # Use proper error handling, logging, and chunking for large files.
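
A fleshed-out sketch along those lines is below. It assumes the API accepts a JSON array of records at POST <API_BASE_URL>/<name>, where <name> is the CSV filename stem (so data/users.csv posts to /users); adjust the endpoint mapping, batch size, and payload shape to your actual API:

    # scripts/load_data.py (minimal sketch, not production-hardened)
    import logging
    import os
    from pathlib import Path

    import pandas as pd
    import requests
    from dotenv import load_dotenv

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("load_data")

    load_dotenv()
    BASE_URL = os.environ["API_BASE_URL"].rstrip("/")
    HEADERS = {"Authorization": f"Bearer {os.environ['API_TOKEN']}"}

    DATA_DIR = Path("data")
    CHUNK_SIZE = 1000  # rows per POST; tune for your API's limits

    def post_batch(endpoint: str, records: list) -> None:
        # assumed contract: POST {BASE_URL}/{endpoint} accepts a JSON array of rows
        resp = requests.post(f"{BASE_URL}/{endpoint}", json=records,
                             headers=HEADERS, timeout=30)
        resp.raise_for_status()

    def load_csv(path: Path) -> None:
        endpoint = path.stem  # assumption: data/users.csv maps to /users
        for i, chunk in enumerate(pd.read_csv(path, chunksize=CHUNK_SIZE)):
            post_batch(endpoint, chunk.to_dict(orient="records"))
            log.info("%s: posted chunk %d (%d rows)", path.name, i, len(chunk))

    if __name__ == "__main__":
        for csv_path in sorted(DATA_DIR.glob("*.csv")):
            try:
                load_csv(csv_path)
            except requests.exceptions.RequestException:
                log.exception("failed while loading %s", csv_path.name)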

Logging and errors

  • Use the logging module to record progress and errors.
  • Catch and handle: network errors (requests.exceptions), JSON/parse errors, and data validation issues.
  • Add exponential backoff and retry limits for transient API failures.
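
  • One way to implement the backoff, as a sketch (the retry count and delays are arbitrary; mounting urllib3's Retry on a requests session is a stock alternative):

    import logging
    import time

    import requests

    log = logging.getLogger("load_data")

    def post_with_retries(url, payload, headers, max_retries=5, base_delay=1.0):
        # retry connection errors, timeouts, and 5xx responses with exponential backoff
        for attempt in range(1, max_retries + 1):
            try:
                resp = requests.post(url, json=payload, headers=headers, timeout=30)
            except (requests.exceptions.ConnectionError,
                    requests.exceptions.Timeout) as exc:
                log.warning("attempt %d/%d: %s", attempt, max_retries, exc)
            else:
                if resp.status_code < 500:
                    resp.raise_for_status()  # a 4xx is not transient; fail fast
                    return resp
                log.warning("attempt %d/%d: HTTP %d", attempt, max_retries,
                            resp.status_code)
            time.sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
        raise RuntimeError(f"giving up on {url} after {max_retries} attempts")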

Conversation log requirement

  • For every interaction, create a conversation summary file:
    • .github/conversations/conversation_<unique_id>.md
    • Include: purpose, files added/changed, environment variables required, any manual steps.
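
  • An example skeleton for such a file (the headings simply mirror the required fields above):

    # Conversation <unique_id>

    ## Purpose
    ...

    ## Files added/changed
    ...

    ## Environment variables required
    ...

    ## Manual steps
    ...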

Recommendations & next steps

  • Add requirements.txt with pinned versions for reproducibility.
  • Implement chunked processing (pandas.read_csv with chunksize) for very large CSVs.
  • Consider validating schemas (pydantic or custom validators) before sending data; a sketch follows this list.
  • Add unit/integration tests or a dry-run mode that validates payloads without posting.
  • Store very large raw data outside Git (S3 or other external storage) and reference it via metadata if needed.
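
  • For the schema-validation point above, a sketch with pydantic (the User fields are hypothetical; model them on your real CSV columns):

    import logging

    from pydantic import BaseModel, ValidationError

    log = logging.getLogger("load_data")

    class User(BaseModel):
        # illustrative columns; replace with your real users.csv schema
        id: int
        email: str
        name: str

    def validate_records(records: list) -> list:
        # keep rows that validate; log and skip the rest
        valid = []
        for row in records:
            try:
                valid.append(User(**row).model_dump())  # pydantic v2 API
            except ValidationError as exc:
                log.warning("skipping invalid row %r: %s", row, exc)
        return valid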

Summary

  • Create data/ and put CSVs there (or keep externally and avoid committing).
  • Create .env with API_BASE_URL and API_TOKEN.
  • Implement a modular script using pandas + requests with logging, retries, and error handling.
  • Log this conversation under .github/conversations/ as required.
