GitHub - alicja-januszkiewicz/bidnamic-data-challenge: Bidnamic's Data Engineering Coding Challenge

Requirements:

psycopg2

How to run:

Set up a database to hold the data. See prepare_db.py for an example query to create the required tables on a newly set up database.
Fill out conn_options in main.py as required.
Run main.py as a python script to ingest and aggregate the data.

Performance

Data ingestion takes around 2.5s as measured with timeit() on a Ryzen 1700X. aggregate_roas() takes around 0.8 seconds per alias field. The profile.py module facilitates measuring these values.

Ingest upsert strategy

As it stands the main.py script will simply ignore rows violating any unique or primary key constraints. There is an alternative approach implemented in alternative_ingest.py which instead will update the rows with new values. To use it, uncomment line 3 (# from alternative_ingest import ingest) in main.py.

Notes on extending the functionality

`status` columns

The status columns seem to only ever contain two values, ENABLED and REMOVED. One could implement these sql columns to be of boolean type to save memory. However, this would require editing the ingest() function to map these values from strings to booleans. The speed-memory tradeoff might not be worth it.

`aggregate_roas()`

The aggregate_roas() function can be easily extended to allow aggregating by more fields of the alias column; One only needs to append the field name to valid_aggregate_columns and the name with its position to alias_field_to_position_mapper. For example adding in 'structure_value' and {'structure_value': 4} respectively would allow the following usage: aggregate_roas(cursor, ['country', 'structure_value']).

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
ER Diagram.png		ER Diagram.png
adgroups.csv		adgroups.csv
alternative_ingest.py		alternative_ingest.py
campaigns.csv		campaigns.csv
challenge.md		challenge.md
logo.png		logo.png
main.py		main.py
prepare_db.py		prepare_db.py
profile.py		profile.py
readme.md		readme.md
search_terms.csv		search_terms.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ER Diagram.png

ER Diagram.png

adgroups.csv

adgroups.csv

alternative_ingest.py

alternative_ingest.py

campaigns.csv

campaigns.csv

challenge.md

challenge.md

logo.png

logo.png

main.py

main.py

prepare_db.py

prepare_db.py

profile.py

profile.py

readme.md

readme.md

search_terms.csv

search_terms.csv

Repository files navigation

Requirements:

How to run:

Performance

Ingest upsert strategy

Notes on extending the functionality

`status` columns

`aggregate_roas()`

ER Diagram

About

Releases

Packages

Languages

alicja-januszkiewicz/bidnamic-data-challenge

Folders and files

Latest commit

History

Repository files navigation

Requirements:

How to run:

Performance

Ingest upsert strategy

Notes on extending the functionality

status columns

aggregate_roas()

ER Diagram

About

Resources

Stars

Watchers

Forks

Languages

`status` columns

`aggregate_roas()`