Replace id column data type #59

cduhn17 · 2021-09-14T13:41:35Z

This update will change the id columns data type to uuid of all tables and enables uuid in Postgres.

🗣 Description

The "id" column data type ''text" is replaced with uuid at all table schema.

💭 Motivation and context

The motivation is to future proof the table schema in the event the database is merged into a data warehouse where there may be many of the same schema getting merged together. When operations like this happen the id columns of each database to be merged , would have duplicate id integers. The id column is primary key and as a result duplicated "id" would be filtered out of the data to be merged and the data would be lost. This PR addresses issue #49

🧪 Testing

✅ Checklist

This PR has an informative and human-readable title.
Changes are limited to a single goal - eschew scope creep!
All future TODOs are captured in issues, which are referenced
in code comments.
All relevant type-of-change labels have been added.
I have read the CONTRIBUTING document.
These code changes follow cisagov code standards.
All relevant repo and/or project documentation has been updated
to reflect the changes in this PR.
Tests have been added and/or modified to cover the changes in this PR.
All new and existing tests pass.

This update will change the id columns data type to uuid of all tables and enables uuid in Postgres.

coveralls · 2021-09-14T13:46:44Z

Pull Request Test Coverage Report for Build 1237867968

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 76.987%

Totals
Change from base Build 1233850694:	0.0%
Covered Lines:	174
Relevant Lines:	217

💛 - Coveralls

Get all new changes from develop repo..

schmelz21 · 2021-09-14T17:20:22Z

@cduhn17 I like this idea, just a couple of quick thoughts to run by you:

Do we want to consider having UUID associated to a single/simple column "name" to build continuity. For example use "_id", "id", "key", "pk" or "uuid" or alternatively "THE_TABLE_NAME_uid"? The thought here as we work with various sources, we may find conflict in using something like "organization_id" as it could also be a key to one of our data sources. This "organization_id" would then be "Foreign". Whatever selected would then be "off-limits".
For this PR do we want to consider including 'timestamp' for a column for record creation giving us a source to retrieve data.
Last one for consideration, do we want to leverage PostGres "Schemas" for each of our sources to better handle long-term objectives. Also becasue there will be times where sources will come and go.... For example a data sources may construct "exposed_credentials" natively. This could then become sixgill.exposed_credentials and hibp.exposed_credentials respectively. Secondarily a scheme for P&E Assets.

cduhn17 · 2021-09-14T19:41:43Z

@schmelz-ctr ,

I agree with creating a single identified "id" column. I like the simplicity and ease to write "_uid" that way when reading/writing and execution sql statements i.e. joins. As well as the reasons that you mentioned.
@schmelz-ctr @aloftus23 @DJensen94
In regards to the separate schemas per source. I think separating the schemas by source goes outside of the scope of this PR but its a real consideration and we should make that another issue. A thought on that is to keep the quantity of tables to a minimum and add a column called "source" this would add continuity to the data and reduce the complexity of sql statements(As the data sources grow, execution time may be impacted. In the case of joins).

schmelz21 · 2021-09-14T19:51:38Z

In regards to the separate schemas per source. I think separating the schemas by source goes outside of the scope of this PR but its a real consideration and we should make that another issue. A thought on that is to keep the quantity of tables to a minimum and add a column called "source" this would add continuity to the data and reduce the complexity of sql statements(As the data sources grow, execution time may be impacted. In the case of joins).

Makes sense.. for now agree simpler the better.

DJensen94

Looks good

Sort the Package Requirements in setup.py

Replace id column data type

Sort the Package Requirements in setup.py

Replace id column data type

74dbbf1

This update will change the id columns data type to uuid of all tables and enables uuid in Postgres.

Merge branch 'develop' into CD-working-dbschema

7cab7e1

Get all new changes from develop repo..

schmelz21 assigned cduhn17 Sep 14, 2021

schmelz21 added the improvement This issue or pull request will add or improve functionality, maintainability, or ease of use label Sep 14, 2021

schmelz21 added this to PR Reviews in Sprint 4 (9/15 – 10/1) Sep 15, 2021

Update UUID column names

247e597

cduhn17 requested review from schmelz21, DJensen94 and aloftus23 September 15, 2021 15:08

schmelz21 approved these changes Sep 15, 2021

View reviewed changes

cduhn17 marked this pull request as ready for review September 15, 2021 15:27

cduhn17 requested review from dav3r, elr64, felddy, jsf9k, mcdonnnj and stewartl97 as code owners September 15, 2021 15:27

DJensen94 approved these changes Sep 15, 2021

View reviewed changes

aloftus23 approved these changes Sep 15, 2021

View reviewed changes

schmelz21 merged commit e20fd50 into develop Sep 15, 2021

schmelz21 deleted the CD-working-dbschema branch September 15, 2021 19:29

schmelz21 linked an issue Sep 15, 2021 that may be closed by this pull request

Primary Key integer or UUID #49

Closed

1 task

schmelz21 mentioned this pull request Sep 15, 2021

Primary Key integer or UUID #49

Closed

1 task

schmelz21 moved this from Pull Request to Done in Sprint 4 (9/15 – 10/1) Sep 15, 2021

aloftus23 pushed a commit that referenced this pull request Apr 13, 2022

Merge pull request #59 from cisagov/sort_requirements

e21c5e2

Sort the Package Requirements in setup.py

aloftus23 pushed a commit that referenced this pull request Apr 13, 2022

Merge pull request #59 from cisagov/CD-working-dbschema

b980017

Replace id column data type

aloftus23 pushed a commit that referenced this pull request Apr 15, 2022

Merge pull request #59 from cisagov/sort_requirements

a8bdbe5

Sort the Package Requirements in setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace id column data type #59

Replace id column data type #59

cduhn17 commented Sep 14, 2021 •

edited by schmelz21

coveralls commented Sep 14, 2021 •

edited

schmelz21 commented Sep 14, 2021

cduhn17 commented Sep 14, 2021 •

edited

schmelz21 commented Sep 14, 2021 •

edited

DJensen94 left a comment

Replace id column data type #59

Replace id column data type #59

Conversation

cduhn17 commented Sep 14, 2021 • edited by schmelz21

🗣 Description

💭 Motivation and context

🧪 Testing

✅ Checklist

coveralls commented Sep 14, 2021 • edited

Pull Request Test Coverage Report for Build 1237867968

💛 - Coveralls

schmelz21 commented Sep 14, 2021

cduhn17 commented Sep 14, 2021 • edited

schmelz21 commented Sep 14, 2021 • edited

DJensen94 left a comment

Choose a reason for hiding this comment

cduhn17 commented Sep 14, 2021 •

edited by schmelz21

coveralls commented Sep 14, 2021 •

edited

cduhn17 commented Sep 14, 2021 •

edited

schmelz21 commented Sep 14, 2021 •

edited