Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP Generating primary key for raw_data #145

Closed
pgdr opened this issue Feb 19, 2017 · 13 comments
Closed

WIP Generating primary key for raw_data #145

pgdr opened this issue Feb 19, 2017 · 13 comments

Comments

@pgdr
Copy link
Member

pgdr commented Feb 19, 2017

WIP for the first step, adding an integerfield primary key to sensor_sensor:

Two steps:

  1. add auto-increment field (but must manually add unique numbers)
  2. move primary key from sensor_id to s_id (the integer field)

Model file:

class Sensor( Model ):
    IDPattern = "[-_:a-zA-Z0-9]+"

    s_id = IntegerField(  )
    sensor_id = CharField("Sensor ID" , max_length = 60 , primary_key = True ,
                          validators = [RegexValidator(regex = "^%s$" % IDPattern)])
    sensor_type = ForeignKey( SensorType )
    parent_device = ForeignKey( Device )
    data_type = ForeignKey( DataType , default = "TEST" )
    description = CharField("Description" , max_length = 256 )
    on_line = BooleanField( default = True )
    last_value = FloatField( null = True , blank = True)
    last_timestamp = DateTimeField( null = True , blank = True) 
...

Migration file:

# -*- coding: utf-8 -*-
# Generated by Django 1.10.3 on 2017-02-19 21:54
from __future__ import unicode_literals

from django.db import migrations, models


def forwards(apps, schema_editor):
    if not schema_editor.connection.alias == 'default':
        return
    sensor = apps.get_model('sensor', 'Sensor')
    s_id = 0
    for row in sensor.objects.all():
        s_id += 1
        row.s_id = s_id
        row.save()


class Migration(migrations.Migration):

    dependencies = [
        ('sensor', '0048_auto_20170219_1705'),
    ]

    operations = [
        migrations.AddField(
            model_name='sensor',
            name='s_id',
            field=models.IntegerField(default=1),
            preserve_default=False,
        ),
        migrations.RunPython(forwards)
    ]
@pgdr
Copy link
Member Author

pgdr commented Feb 19, 2017

Hm, do you know where this is defined?

CREATE INDEX sensor_sensor_7a592b3b
  ON public.sensor_sensor
  USING btree
  (sensor_type_id);

In any case, we might have to delete any INDEX before performing step 2: moving primary key from sensor_id to s_id (integer key).

@joakim-hove
Copy link
Contributor

joakim-hove commented Feb 19, 2017 via email

@pgdr
Copy link
Member Author

pgdr commented Feb 19, 2017

This is only step 1, yes. I'm still struggling with INDEX issues. Perhaps the migration needs to drop all the indices:

Indexes:
    "filter_filterdata_pkey" PRIMARY KEY, btree (id)
    "filter_filterdata_sensor_id_6482fe566b49ce37_uniq" UNIQUE CONSTRAINT, btree (sensor_id, filter_code_id)
    "filter_filterdata_ts_id_7167e05c2931e0a5_uniq" UNIQUE CONSTRAINT, btree (ts_id)
    "filter_filterdata_4bd0d623" btree (filter_code_id)
    "filter_filterdata_7ad5859e" btree (ts_id)
    "filter_filterdata_d96d866a" btree (sensor_id)
    "filter_filterdata_filter_code_id_cdca74721628e57_like" btree (filter_code_id varchar_pattern_ops)
    "filter_filterdata_sensor_id_31261240c8b18c90_like" btree (sensor_id varchar_pattern_ops)
Foreign-key constraints:
    "filt_ts_id_7167e05c2931e0a5_fk_time_series_regulartimeseries_id" FOREIGN KEY (ts_id) REFERENCES time_series_regulartimeseries(id) DEFERRABLE INITIALLY DEFERRED
    "filter_filte_filter_code_id_cdca74721628e57_fk_filter_filter_id" FOREIGN KEY (filter_code_id) REFERENCES filter_filter(id) DEFERRABLE INITIALLY DEFERRED
    "filter_filterdat_sensor_id_31261240c8b18c90_fk_sensor_sensor_id" FOREIGN KEY (sensor_id) REFERENCES sensor_sensor(sensor_id) DEFERRABLE INITIALLY DEFERRED

@joakim-hove
Copy link
Contributor

[...] I'm still struggling with INDEX issues.

Hehe - this stuff is beyond my SQL pay-grade 😄

@pgdr
Copy link
Member Author

pgdr commented Mar 3, 2017

I've done it now locally.

I had to delete the filter*-tables, and I also deleted time_series* and plot-* tables.

@pgdr
Copy link
Member Author

pgdr commented Mar 4, 2017

def forwards(apps, schema_editor):
    import sys
    if not schema_editor.connection.alias == 'default':
        return
    sensor = apps.get_model('sensor', 'Sensor')
    rawdata = apps.get_model('sensor', 'RawData')

    sensors = {}
    for row in sensor.objects.all():
        sensor_name = row.sensor_id
        sensor_id = row.s_id
        sensors[sensor_name] = row

    for row in rawdata.objects.all():
        sensor_name = row.sensor_id
        if sensor_name not in sensors:
            sys.stderr.write('\nNo sensor for key "%s".\n' % sensor_name)
            row.s_id = None
        else:
            row.s_id = sensors[sensor_name]
        row.save()

With RawData like:

class RawData(Model):
    sensor_id = CharField(max_length=128)
    s_id = ForeignKey( Sensor, null = True )
    timestamp_recieved = DateTimeField(  ) 
    timestamp_data = DateTimeField( )
    value = FloatField( default = -1 )
    processed = BooleanField( default = False )

@pgdr
Copy link
Member Author

pgdr commented Mar 4, 2017

We need to create dummy sensors for

     84 No sensor for key "OAMTempDHT22OJBrochsGate".
    288 No sensor for key "ARUdustTest".
   2055 No sensor for key "ARUtest".
   7973 No sensor for key "OAMLysOJBrochsGate".
   7974 No sensor for key "OAMFuktOJBrochsGate".
   7974 No sensor for key "OAMStoevOJBrochsGate".
   7974 No sensor for key "OAMTempOJBrochsGate".
 413929 No sensor for key "NJBGassGroenllien".

But other than that, we're ready to go!

@joakim-hove Is it okay if we delete the filter-tables in the heroku db?

@joakim-hove
Copy link
Contributor

Delete delete!

When it comes to dummy sensors we might go the delete route instead?

@pgdr
Copy link
Member Author

pgdr commented Mar 4, 2017

Definitely. What's the gas sensor? Almost 500k rows?

@pgdr
Copy link
Member Author

pgdr commented Mar 4, 2017

The we can go back to, for each sensor,

def get_data_list(sensor, start, end):
    data_all = RawData.objects.filter(s_id=sensor.s_id,
                                      timestamp_data__range=
                                      (start,end)).values("value",
                                                          "timestamp_data"
                                      ).order_by('timestamp_data')
    datalist = downsample(list(data_all), minutes=30, cutoff=100)
    datalist = map( make_timestamp , datalist )
    return datalist

And we're more or less done with the migration. :)

@pgdr
Copy link
Member Author

pgdr commented Mar 5, 2017

Have performed the following steps upstream:

  1. make field sensor.s_id.
  2. make field sensor.s_id be the primary key for the Sensor model.
  3. created foreign key rawdata.s_id that formally links each rawdata entry to a sensor. (PR imminent)

The latter step is currently under testing at my machine. If successes galore, I will heroku push and migrate. This step, however, takes appx 10 minutes (strange, but perhaps because it write 500k rows and tries to make it reversible?).

@pgdr
Copy link
Member Author

pgdr commented Mar 5, 2017

In fact, after indexing on rawdata.timestamp (which we should consider),

                            min   p10   p25   p50   p75   p90   max
before:  0.488 +/- 0.037, (0.430 0.450 0.460 0.480 0.500 0.540 0.660)
after:   0.466 +/- 0.039, (0.410 0.430 0.440 0.450 0.480 0.530 0.640)
morten:  0.369 +/- 0.029, (0.330 0.340 0.350 0.360 0.380 0.410 0.550)
current: 0.339 +/- 0.038, (0.300 0.310 0.320 0.330 0.340 0.380 0.810)
int_id:  0.265 +/- 0.020, (0.240 0.250 0.250 0.260 0.270 0.290 0.370)
indexed: 0.188 +/- 0.024, (0.160 0.170 0.170 0.180 0.190 0.210 0.300)

When we started way back in the beginning (before quick), we were at several seconds for SQL. Now we're down to less than 0.2s!

@pgdr
Copy link
Member Author

pgdr commented Mar 6, 2017

#173 #174 #175 fixed this

@pgdr pgdr closed this as completed Mar 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants