Skip to content
This repository has been archived by the owner on Jun 5, 2023. It is now read-only.

Added resource_name column to all the scanners, added schema update handles in scanner dao #1864

Merged
merged 28 commits into from
Sep 18, 2018

Conversation

joecheuk
Copy link
Contributor

@joecheuk joecheuk commented Aug 7, 2018

Resolves #1828 , #109

Copy link
Contributor

@blueandgold blueandgold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this, it's awesome to have a way to make the schema changes. Just a comment on where the update is run.

if callable(schema_update) and subclass.__tablename__ in tables:
LOGGER.info('Updating table %s', subclass.__tablename__)
# schema_update will require the Table object.
schema_update(tables.get(subclass.__tablename__))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, it would be awesome if the running of the schema update can be moved outside of the forseti application, so that the db update operation doesn't have to run everytime the forseti application starts up, and users can have the chance to backup their database before their db schema is actually updated. This will also require a documentation change to our upgrade instruction as well.

Copy link
Contributor

@blueandgold blueandgold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great.... where is the change in the startup script, that will call the db_migrator.py?

table (Table): The table object of this class.
"""
col = Column('resource_name', String(256), default='')
create_column(table, col)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if this is called again when the resource_name column already exists? Should there be error handling or logging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added error handling in db_migrator.py.

@@ -0,0 +1,62 @@
import sys
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about calling this db_migrator.py for a proper noun?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to db_migrator.py

@codecov
Copy link

codecov bot commented Aug 28, 2018

Codecov Report

Merging #1864 into dev will decrease coverage by 0.01%.
The diff coverage is 50%.

@@            Coverage Diff             @@
##              dev    #1864      +/-   ##
==========================================
- Coverage   89.25%   89.24%   -0.02%     
==========================================
  Files         161      161              
  Lines       12004    12008       +4     
==========================================
+ Hits        10714    10716       +2     
- Misses       1290     1292       +2
Impacted Files Coverage Δ
...oud/forseti/scanner/audit/bigquery_rules_engine.py 95.83% <ø> (ø) ⬆️
...forseti/scanner/audit/enabled_apis_rules_engine.py 88.49% <ø> (ø) ⬆️
...r/audit/instance_network_interface_rules_engine.py 92.64% <ø> (ø) ⬆️
...d/forseti/scanner/audit/ke_version_rules_engine.py 88.88% <ø> (ø) ⬆️
...loud/forseti/scanner/audit/buckets_rules_engine.py 81.92% <ø> (ø) ⬆️
...seti/scanner/audit/forwarding_rule_rules_engine.py 92% <ø> (ø) ⬆️
...oud/forseti/scanner/audit/cloudsql_rules_engine.py 79.26% <ø> (ø) ⬆️
...oud/forseti/scanner/audit/log_sink_rules_engine.py 93.7% <ø> (ø) ⬆️
...ner/scanners/instance_network_interface_scanner.py 0% <ø> (ø) ⬆️
...ud/forseti/scanner/audit/blacklist_rules_engine.py 93.18% <ø> (ø) ⬆️
... and 6 more

Copy link
Contributor

@blueandgold blueandgold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great overall, some small comments.


echo "Attempting to upgrade database."
python $USER_HOME/forseti-security/install/gcp/upgrade_tools/db_migrator.py
sleep 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why sleep 5.... can it take more than 5 seconds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The migrator script should block until the upgrade is finished, removed the sleep statement.

@@ -193,6 +193,11 @@ def GenerateConfig(context):
echo "Starting services."
systemctl start cloudsqlproxy
sleep 5

echo "Attempting to upgrade database."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be clear that this updating the schema, and add if necessary at the end.

echo "Attempting to update database schema. if necessary."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

LOGGER.info('Failed to update db schema, table=%s',
subclass.__tablename__)
except Exception as e:
LOGGER.error(e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using LOGGER.exception() here?

LOGGER.exception('Unexpected error happened when attempting to update database schema.')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

try:
schema_update(tables.get(subclass.__tablename__))
except OperationalError:
LOGGER.info('Failed to update db schema, table=%s',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be LOGGER.error(), rather than info()

If this is OperationalError, can we be more specific in the log message what might have happened?

Copy link
Contributor Author

@joecheuk joecheuk Aug 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is expected when the table already has that column/column already deleted, I think info makes more sense so user won't need to worry about seeing the message at error level. Added log message in the update_schema method so we know what is failing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no log message in the update_schema()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an info log in the update_schema method - LOGGER.info('Attempting to create column: %s', col_name)

LOGGER.info('Failed to update db schema, table=%s',
subclass.__tablename__)
except Exception as e:
LOGGER.error(e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add the table name here? Or is that already in the stacktrace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@@ -183,6 +188,16 @@ def __repr__(self):
return string.format(
self.violation_type, self.resource_type, self.rule_name)

@staticmethod
def schema_update(table):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method name should start with verb first. So, update_schema(), which is also consistent with migrate_schema(). Or is there a reason to call this schema_update()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to update_schema

pool_recycle=3600)

# Upgrade Scanner tables.
migrate_schema(sql_engine, scanner_dao.BASE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this? Which will make it easier to add more tables in the future?

declaritive_bases = [scanner_dao.BASE, inventory_dao.BASE]

for base in declaritive_bases:
    migrate_schema(sql_engine, base)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@joecheuk
Copy link
Contributor Author

Hi @ahoying , can you take a look at this PR again? This would be the final version if no new bugs were found.

Copy link
Collaborator

@ahoying ahoying left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review the content of every resource_name field in the scanners to ensure that was the best value for the field, but the code LGTM.

@@ -193,6 +193,10 @@ def GenerateConfig(context):
echo "Starting services."
systemctl start cloudsqlproxy
sleep 5

echo "Attempting to update database schema. if necessary."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nit, use a comma after schema, not period: "schema, if necessary"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out, that should definitely be a comma, updated!

Copy link
Contributor

@blueandgold blueandgold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of quick comments.

Args:
table (Table): The table object of this class.
"""
col_name = 'resource_name'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq, in the future, as we have more columns to add, how would they be handled? This time, we want to add resource_name, next time we want to add xyz. It's hard to see how xyz would fit here.

Copy link
Contributor Author

@joecheuk joecheuk Aug 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will add the column as an attribute to the class and add the column creation logic in this method,

i.e.
class Violation(BASE):
xyz = Column(...)

  def update_schema(table):
         col = Column('resource_name', String(256), default='')
         col.create(table, populate_default=True)

         col = Column('xyz', String(256), default='')
         col.create(table, populate_default=True)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that's what I thought. I think you should make col_name into a list of tuples [(name, type, default), ....]

And then just loop?

Or else just hardcode the name into the Column(), instead of having a variable col_name to track it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to hard code the column name

try:
schema_update(tables.get(subclass.__tablename__))
except OperationalError:
LOGGER.info('Failed to update db schema, table=%s',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no log message in the update_schema()

"""
col = Column('resource_name', String(256), default='')
LOGGER.info('Attempting to create column: %s', col.name)
col.create(table, populate_default=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the pattern is to add the column as you mentioned, then should we surround try/except around each col.create()? Otherwise, when adding xyz in the future, would the execution reach xyz if resource_name already exists?

         col = Column('resource_name', String(256), default='')
         col.create(table, populate_default=True)

         col = Column('xyz', String(256), default='')
         col.create(table, populate_default=True)

Copy link
Contributor

@blueandgold blueandgold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For support purposes, will we ever need to know which columns are updated in which forseti versions?

Returns:
dict: A mapping of Action: Column.
"""
columnMapping = {'CREATE': Column('resource_name',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we want to add more columns in the future? Shouldn't we use a list?:

columnMapping = {'CREATE', [column1, column2, ..]}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Returns:
dict: A mapping of Action: Column.
"""
columnMapping = {'CREATE': Column('resource_name',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why camel casing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

CREATE = 'CREATE'


def create_col(table, col):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid abbreviation here?

def create_column(table, column):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.


Args:
table (Table): The table object.
col (Column): The column object."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be more specific on the Column and Table object's type here? Other than just 'Column` and Tables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

col.create(table, populate_default=True)


def drop_col(table, col):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def drop_column(table, column):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

# The format of tables is: {table_name: Table object}.
tables = base.metadata.tables

schema_update_method_name = 'update_schema'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be consistent with the update_schema() method, so update_schema_method_name would be better.

update_schema_method_name = 'update_schema'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

schema_update_method_name = 'update_schema'

for subclass in base_subclasses:
schema_update = getattr(subclass, schema_update_method_name, None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better as:

update_schema = getattr(subclass, update_schema_method_name, None)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

# schema_update will require the Table object.
try:
table = tables.get(subclass.__tablename__)
columnMapping = schema_update()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, why camel casing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@@ -183,6 +184,18 @@ def __repr__(self):
return string.format(
self.violation_type, self.resource_type, self.rule_name)

@staticmethod
def update_schema():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method does not do any update. It would be better to be called as get_update_actions() or get_migrate_actions().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

for columnAction, column in columnMapping.iteritems():
columnAction = columnAction.upper()
if columnAction in ColumnActionMap:
ColumnActionMap.get(columnAction)(table, column)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a try/except around this line, so that if a column exists, then it can move to the next column?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Copy link
Contributor

@blueandgold blueandgold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more nits here.

Returns:
dict: A mapping of Action: Column.
"""
column_mapping = {'CREATE': [Column('resource_name',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move the list out, so that it's easier to read?

columns_to_create = [
    Column('resource_name',  String(256), default=''),
    Column('abc',  String(256), default=''),
    Column('xyz',  String(256), default=''),
]

column_mapping = {
    'CREATE': columns_to_create,
    DELETE: columns_to_delete,
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Returns:
dict: A mapping of Action: Column.
"""
column_mapping = {'CREATE': [Column('resource_name',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of column_mapping, it would document better if the naming is schema_update_mapping or schema_action_mapping

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

schema_update_actions_method = 'get_schema_update_actions'

for subclass in base_subclasses:
update_actions = getattr(subclass, schema_update_actions_method, None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line here needs to be consistent with the naming comment above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link
Contributor

@blueandgold blueandgold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, just one more nit.

LOGGER.info('Updating table %s', subclass.__tablename__)
# schema_update will require the Table object.
table = tables.get(subclass.__tablename__)
schema_update_mapping = get_schema_update_actions()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one more nit: this line is a bit inconsistent. We should make it read more smoothly as:

schema_update_actions = get_schema_update_actions()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@codecov
Copy link

codecov bot commented Sep 18, 2018

Codecov Report

Merging #1864 into dev will decrease coverage by 0.02%.
The diff coverage is 40%.

@@            Coverage Diff             @@
##              dev    #1864      +/-   ##
==========================================
- Coverage   89.27%   89.25%   -0.03%     
==========================================
  Files         162      162              
  Lines       12177    12182       +5     
==========================================
+ Hits        10871    10873       +2     
- Misses       1306     1309       +3
Impacted Files Coverage Δ
...oud/forseti/scanner/audit/bigquery_rules_engine.py 92.85% <ø> (ø) ⬆️
...forseti/scanner/audit/enabled_apis_rules_engine.py 88.49% <ø> (ø) ⬆️
...r/audit/instance_network_interface_rules_engine.py 92.64% <ø> (ø) ⬆️
...d/forseti/scanner/audit/ke_version_rules_engine.py 88.88% <ø> (ø) ⬆️
...loud/forseti/scanner/audit/buckets_rules_engine.py 81.92% <ø> (ø) ⬆️
...seti/scanner/audit/forwarding_rule_rules_engine.py 92% <ø> (ø) ⬆️
...oud/forseti/scanner/audit/cloudsql_rules_engine.py 79.26% <ø> (ø) ⬆️
...oud/forseti/scanner/audit/log_sink_rules_engine.py 93.7% <ø> (ø) ⬆️
...ner/scanners/instance_network_interface_scanner.py 0% <ø> (ø) ⬆️
...ud/forseti/scanner/audit/blacklist_rules_engine.py 93.18% <ø> (ø) ⬆️
... and 6 more

@codecov
Copy link

codecov bot commented Sep 18, 2018

Codecov Report

Merging #1864 into dev will decrease coverage by 0.02%.
The diff coverage is 40%.

@@            Coverage Diff             @@
##              dev    #1864      +/-   ##
==========================================
- Coverage   89.33%   89.31%   -0.03%     
==========================================
  Files         162      162              
  Lines       12212    12217       +5     
==========================================
+ Hits        10910    10912       +2     
- Misses       1302     1305       +3
Impacted Files Coverage Δ
...oud/forseti/scanner/audit/bigquery_rules_engine.py 92.85% <ø> (ø) ⬆️
...forseti/scanner/audit/enabled_apis_rules_engine.py 88.49% <ø> (ø) ⬆️
...r/audit/instance_network_interface_rules_engine.py 92.64% <ø> (ø) ⬆️
...d/forseti/scanner/audit/ke_version_rules_engine.py 88.88% <ø> (ø) ⬆️
...loud/forseti/scanner/audit/buckets_rules_engine.py 81.92% <ø> (ø) ⬆️
...seti/scanner/audit/forwarding_rule_rules_engine.py 92% <ø> (ø) ⬆️
...oud/forseti/scanner/audit/cloudsql_rules_engine.py 79.26% <ø> (ø) ⬆️
...oud/forseti/scanner/audit/log_sink_rules_engine.py 93.7% <ø> (ø) ⬆️
...ner/scanners/instance_network_interface_scanner.py 0% <ø> (ø) ⬆️
...ud/forseti/scanner/audit/blacklist_rules_engine.py 93.18% <ø> (ø) ⬆️
... and 6 more

1 similar comment
@codecov
Copy link

codecov bot commented Sep 18, 2018

Codecov Report

Merging #1864 into dev will decrease coverage by 0.02%.
The diff coverage is 40%.

@@            Coverage Diff             @@
##              dev    #1864      +/-   ##
==========================================
- Coverage   89.33%   89.31%   -0.03%     
==========================================
  Files         162      162              
  Lines       12212    12217       +5     
==========================================
+ Hits        10910    10912       +2     
- Misses       1302     1305       +3
Impacted Files Coverage Δ
...oud/forseti/scanner/audit/bigquery_rules_engine.py 92.85% <ø> (ø) ⬆️
...forseti/scanner/audit/enabled_apis_rules_engine.py 88.49% <ø> (ø) ⬆️
...r/audit/instance_network_interface_rules_engine.py 92.64% <ø> (ø) ⬆️
...d/forseti/scanner/audit/ke_version_rules_engine.py 88.88% <ø> (ø) ⬆️
...loud/forseti/scanner/audit/buckets_rules_engine.py 81.92% <ø> (ø) ⬆️
...seti/scanner/audit/forwarding_rule_rules_engine.py 92% <ø> (ø) ⬆️
...oud/forseti/scanner/audit/cloudsql_rules_engine.py 79.26% <ø> (ø) ⬆️
...oud/forseti/scanner/audit/log_sink_rules_engine.py 93.7% <ø> (ø) ⬆️
...ner/scanners/instance_network_interface_scanner.py 0% <ø> (ø) ⬆️
...ud/forseti/scanner/audit/blacklist_rules_engine.py 93.18% <ø> (ø) ⬆️
... and 6 more

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants