
Ease development with Docker #57

Open
aaronjwood wants to merge 1 commit into master

Conversation

aaronjwood

One command and you're good to go :)

@JPHutchins
Owner

@aaronjwood Very exciting! I am testing this out now!

FYI we'll have to maintain the original process guide in the README (move to bottom instead of replacing) since it's informative for how production is running (Linux systemd).

Hopefully I work up the courage to switch production to the docker container.

@aaronjwood
Author

aaronjwood commented Jan 22, 2023

Sounds good, I'll adjust the readme when I get some time in a few days.

When I got everything up locally and fixed some crashes around the test data parsing, I found that the UI didn't show the test data that was loaded into the DB anywhere, and the UI was stuck on December 1969. Are you aware of this being an existing issue? I'm guessing it's specific to the local dev env, since things work for me on your live deployment with my PGE data, but I didn't dig in very much to see exactly why it wasn't working. The test data seems to be from 2019, but the front end doesn't allow navigating anywhere besides 1969.
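
For what it's worth, "December 1969" looks like a Unix timestamp of 0 being rendered in a US timezone, so my guess is the frontend is falling back to an empty/zero date somewhere instead of reading the 2019 test data. A quick Python illustration (the UTC-8 offset here is just an example):

    from datetime import datetime, timezone, timedelta

    # Unix epoch 0 (1970-01-01T00:00:00Z) shown in a UTC-8 zone like US/Pacific
    print(datetime.fromtimestamp(0, tz=timezone(timedelta(hours=-8))))
    # -> 1969-12-31 16:00:00-08:00, i.e. "December 1969" in the UI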

WORKDIR /frontend
RUN npm ci && npm run build

FROM python:3.8-slim
Author

@JPHutchins what do you think about moving to PyPy for the JIT sweetness?

@JPHutchins
Owner

JPHutchins commented Jan 28, 2023

@aaronjwood Unfortunately I am in a "how is this even working" sorta situation with the MQ and Celery tasks on the live server...

The docker container works for me up to the point of queuing the async jobs - LMK if this flow is working for you in the docker container: https://github.com/JPHutchins/open-energy-view#example-account-setup

Here's a description of what is supposed to be happening.

  • User registers for the first time and we'd like to get their historical data. PGE SMD team advised that I request 1 month of data at a time with some delay in between 🙄.
  • This starts a "fetch historical data" job that queues up to ~48 jobs, each of which requests 1 month of data at a time from the PGE SMD API:
    • real task:
      @celery.task(bind=True, name="fetch_task")
      def fetch_task(self, published_period_start, interval_block_url, headers, cert):
      four_weeks = 3600 * 24 * 28
      end = int(time.time())
      published_period_start = int(published_period_start)
      print(published_period_start, interval_block_url, headers, cert)
      while end > published_period_start:
      start = end - four_weeks + 3600
      params = {
      "published-min": start,
      "published-max": end,
      }
      response_text = request_url(
      "GET",
      interval_block_url,
      params=params,
      headers=headers,
      cert=cert,
      format="text",
      )
      save_espi_xml(response_text)
      db_insert_task = insert_espi_xml_into_db.delay(response_text)
      end = start - 3600
      sleep(2)
      retries = 0
      while not db_insert_task.ready():
      if retries > 60:
      print("Insert into DB failed!")
      break
      retries += 1
      sleep(1)
      return "done"
    • mock task (the one that should work in docker container):
      @celery.task(bind=True, name="fake_fetch")
      def fake_fetch(self):
      test_xml = [
      "/home/jp/open-energy-view/test/data/espi/espi_2_years.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-16.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-17.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-18.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-19.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-20.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-21.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-22.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-23.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-24.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-25.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-26.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-27.xml",
      ]
      test_xml.reverse()
      for xml_path in test_xml:
      time.sleep(2.5)
      with open(xml_path) as xml_reader:
      xml = xml_reader.read()
      db_insert_task = insert_espi_xml_into_db.delay(xml)
      retries = 0
      while not db_insert_task.ready():
      if retries > 60:
      break
      retries += 1
      sleep(1)
      return "done"
  • AFAICT the fetch task is running correctly in the highly-threaded celery-io pool.
  • Before the fetch task completes it queues up the insert_espi_xml_into_db task in the single-threaded celery-cpu pool (FIFO to write to the DB):
    @celery.task(bind=True, name="insert_espi_xml_into_db")
    def insert_espi_xml_into_db(self, xml, given_source_id=None, save=False):
    """Parse and insert the XML into the db."""
    print("CALLED")
    if not has_app_context():
    app = create_app(f"open_energy_view.{os.environ.get('FLASK_CONFIG')}")
    app.app_context().push()
    print(has_app_context())
    if save:
    try:
    save_espi_xml(xml)
    except Exception as e:
    print(e)
    save_espi_xml(xml.decode("utf-8"))
    finally:
    pass
    data_update = []
    source_id_memo = {}
    for start, duration, watt_hours, usage_point in parse_espi_data(xml):
    if usage_point not in source_id_memo:
    if given_source_id:
    source_id_memo[usage_point] = [given_source_id]
    else:
    sources = db.session.query(models.Source).filter_by(
    usage_point=usage_point
    )
    if sources.count() == 0:
    print(
    f"could not find usage point {usage_point} in db, probably gas"
    )
    source_id_memo[usage_point] = []
    elif sources.count() > 1:
    print(f"WARNING: {usage_point} is associated with multiple sources")
    source_id_memo[usage_point] = [source.id for source in sources]
    for source_id in source_id_memo[usage_point]:
    data_update.append(
    {
    "start": start,
    "duration": duration,
    "watt_hours": watt_hours,
    "source_id": source_id,
    }
    )
    try:
    db.session.bulk_insert_mappings(models.Espi, data_update)
    db.session.commit()
    except SQLiteException.IntegrityError:
    db.session.rollback()
    sql_statement = """
    INSERT OR REPLACE INTO espi (start, duration, watt_hours, source_id)
    VALUES (:start, :duration, :watt_hours, :source_id)
    """
    db.engine.execute(sql_statement, data_update)
    finally:
    timestamp = int(time.time() * 1000)
    for source_ids in source_id_memo.values():
    for source_id in source_ids:
    source_row = db.session.query(models.Source).filter_by(id=source_id)
    source_row.update({"last_update": timestamp})
    db.session.commit()
  • I'm sure that this is going into the MQ, but what's not happening for me is the celery-cpu queue getting processed. In fact, you can see the "CALLED" print at the top - I think I left that in from when I was trying to get set up on AWS and ran into the same "how was this ever working" situation. Anyway, if you see "CALLED" in the stdout, that would be a good sign! 😭
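
One way to sanity-check whether any worker is actually consuming the celery-cpu queue is Celery's inspect API. A minimal sketch - the import path for the celery app object is an assumption, adjust it to wherever it's actually defined:

    # Sketch: confirm a worker is bound to the celery-cpu queue and has the task
    # registered. The import path below is an assumption about the repo layout.
    from open_energy_view.celery_tasks import celery

    insp = celery.control.inspect()
    print(insp.ping())           # do any workers respond at all?
    print(insp.active_queues())  # is some worker consuming the celery-cpu queue?
    print(insp.registered())     # is insert_espi_xml_into_db registered on that worker?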

As I mentioned, in production these are all running under systemd. I've inspected my config and it does not seem to differ from what you have set up in the docker container.

LMK what you might find when you run that flow.

It's critical to be able to mock the PGE request/response in the development environment so that we have an efficient way to test data parsing, fetching, etc. Thank you for your help!
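
In the meantime, one way to stub the PGE request in tests without the MQ plumbing is to patch request_url directly. A sketch only - the patch target and the relative test-data path are assumptions about the repo layout:

    from unittest.mock import patch

    # Return canned ESPI XML instead of hitting the PGE SMD API.
    def fake_request_url(method, url, params=None, headers=None, cert=None, format="text"):
        with open("test/data/espi/Single Days/2019-10-16.xml") as f:
            return f.read()

    # The patch target is assumed; point it at wherever fetch_task imports request_url from.
    with patch("open_energy_view.celery_tasks.request_url", side_effect=fake_request_url):
        ...  # exercise the fetch flow against canned XML instead of the live API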

EDIT: just confirmed that the "fake fetch" is working in production.

  • create a new account with a fake email like "ejfsklfjkeljf@ghlgjrdkgjlr.vom", password "admin"
  • select the fake utility and name it whatever
  • you'll see it load in the first month and add a spinner in the upper right corner. The Network tab will show 202s coming in as the frontend checks on the celery tasks, and eventually it will prompt you to reload.
  • this is exactly what should be working in the development environment.

EDIT2: if it's not clear, the architecturally f*cky thing here is that the insert_to_db task needs the "flask application context" in order to set up the SQL ORM (SQLAlchemy).
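
If it helps, the usual Flask + Celery workaround is a Task base class that pushes the app context around every task run instead of doing it ad hoc inside each task. A sketch, not the project's current code - the celery and create_app names are assumed to be the same objects used in the snippets above:

    import os

    # Build the Flask app once; every task then runs inside its app context.
    app = create_app(f"open_energy_view.{os.environ.get('FLASK_CONFIG')}")

    class ContextTask(celery.Task):
        def __call__(self, *args, **kwargs):
            with app.app_context():
                return self.run(*args, **kwargs)

    # Must be assigned before the @celery.task decorators register any tasks.
    celery.Task = ContextTask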

@JPHutchins
Owner

JPHutchins commented Jan 28, 2023

JFC there is some embarrassing code in here

finally: 
    pass
