
Ease development with Docker #57

Open
aaronjwood wants to merge 1 commit into master

Conversation

aaronjwood

One command and you're good to go :)

@JPHutchins
Owner

@aaronjwood Very exciting! I am testing this out now!

FYI we'll have to maintain the original process guide in the README (move to bottom instead of replacing) since it's informative for how production is running (Linux systemd).

Hopefully I work up the courage to switch production to the docker container.

@aaronjwood
Author

aaronjwood commented Jan 22, 2023

Sounds good, I'll adjust the readme when I get some time in a few days.

When I got everything up locally and fixed some crashes around the test data parsing, I found that the UI didn't show the test data that was loaded into the DB anywhere, and the UI was stuck on December 1969. Are you aware of this being an existing issue? I'm guessing it's specific to the local dev env, since things work for me on your live deployment with my PGE data, but I didn't dig in very much to see exactly why it wasn't working. The test data seems to be from 2019, but the front end doesn't allow navigating anywhere besides 1969.
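
For what it's worth, "December 1969" looks like a Unix timestamp of 0 being rendered in a US timezone, so my guess is the frontend is falling back to an empty/zero date somewhere instead of reading the 2019 test data. A quick Python illustration (the UTC-8 offset here is just an example):

    from datetime import datetime, timezone, timedelta

    # Unix epoch 0 (1970-01-01T00:00:00Z) shown in a UTC-8 zone like US/Pacific
    print(datetime.fromtimestamp(0, tz=timezone(timedelta(hours=-8))))
    # -> 1969-12-31 16:00:00-08:00, i.e. "December 1969" in the UI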

WORKDIR /frontend
RUN npm ci && npm run build

FROM python:3.8-slim
Author

@JPHutchins what do you think about moving to PyPy for the JIT sweetness?

@JPHutchins
Owner

JPHutchins commented Jan 28, 2023

@aaronjwood Unfortunately I am in a "how is this even working" sorta situation with the MQ and Celery tasks on the live server...

The docker container works for me up to the point of queuing the async jobs - LMK if this flow is working for you in the docker container: https://github.com/JPHutchins/open-energy-view#example-account-setup

Here's a description of what is supposed to be happening.

  • User registers for the first time and we'd like to get their historical data. PGE SMD team advised that I request 1 month of data at a time with some delay in between 🙄.
  • This starts a "fetch historical data" job that queues up to ~48 jobs, each of which requests 1 month of data at a time from the PGE SMD API:
    • real task:
      @celery.task(bind=True, name="fetch_task")
      def fetch_task(self, published_period_start, interval_block_url, headers, cert):
      four_weeks = 3600 * 24 * 28
      end = int(time.time())
      published_period_start = int(published_period_start)
      print(published_period_start, interval_block_url, headers, cert)
      while end > published_period_start:
      start = end - four_weeks + 3600
      params = {
      "published-min": start,
      "published-max": end,
      }
      response_text = request_url(
      "GET",
      interval_block_url,
      params=params,
      headers=headers,
      cert=cert,
      format="text",
      )
      save_espi_xml(response_text)
      db_insert_task = insert_espi_xml_into_db.delay(response_text)
      end = start - 3600
      sleep(2)
      retries = 0
      while not db_insert_task.ready():
      if retries > 60:
      print("Insert into DB failed!")
      break
      retries += 1
      sleep(1)
      return "done"
    • mock task (the one that should work in docker container):
      @celery.task(bind=True, name="fake_fetch")
      def fake_fetch(self):
      test_xml = [
      "/home/jp/open-energy-view/test/data/espi/espi_2_years.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-16.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-17.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-18.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-19.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-20.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-21.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-22.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-23.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-24.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-25.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-26.xml",
      "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-27.xml",
      ]
      test_xml.reverse()
      for xml_path in test_xml:
      time.sleep(2.5)
      with open(xml_path) as xml_reader:
      xml = xml_reader.read()
      db_insert_task = insert_espi_xml_into_db.delay(xml)
      retries = 0
      while not db_insert_task.ready():
      if retries > 60:
      break
      retries += 1
      sleep(1)
      return "done"
  • AFAICT the fetch task is running correctly in the highly-threaded celery-io pool.
  • Before the fetch task completes it queues up the insert_espi_xml_into_db task in the single-threaded celery-cpu pool (FIFO to write to the DB):
    @celery.task(bind=True, name="insert_espi_xml_into_db")
    def insert_espi_xml_into_db(self, xml, given_source_id=None, save=False):
    """Parse and insert the XML into the db."""
    print("CALLED")
    if not has_app_context():
    app = create_app(f"open_energy_view.{os.environ.get('FLASK_CONFIG')}")
    app.app_context().push()
    print(has_app_context())
    if save:
    try:
    save_espi_xml(xml)
    except Exception as e:
    print(e)
    save_espi_xml(xml.decode("utf-8"))
    finally:
    pass
    data_update = []
    source_id_memo = {}
    for start, duration, watt_hours, usage_point in parse_espi_data(xml):
    if usage_point not in source_id_memo:
    if given_source_id:
    source_id_memo[usage_point] = [given_source_id]
    else:
    sources = db.session.query(models.Source).filter_by(
    usage_point=usage_point
    )
    if sources.count() == 0:
    print(
    f"could not find usage point {usage_point} in db, probably gas"
    )
    source_id_memo[usage_point] = []
    elif sources.count() > 1:
    print(f"WARNING: {usage_point} is associated with multiple sources")
    source_id_memo[usage_point] = [source.id for source in sources]
    for source_id in source_id_memo[usage_point]:
    data_update.append(
    {
    "start": start,
    "duration": duration,
    "watt_hours": watt_hours,
    "source_id": source_id,
    }
    )
    try:
    db.session.bulk_insert_mappings(models.Espi, data_update)
    db.session.commit()
    except SQLiteException.IntegrityError:
    db.session.rollback()
    sql_statement = """
    INSERT OR REPLACE INTO espi (start, duration, watt_hours, source_id)
    VALUES (:start, :duration, :watt_hours, :source_id)
    """
    db.engine.execute(sql_statement, data_update)
    finally:
    timestamp = int(time.time() * 1000)
    for source_ids in source_id_memo.values():
    for source_id in source_ids:
    source_row = db.session.query(models.Source).filter_by(id=source_id)
    source_row.update({"last_update": timestamp})
    db.session.commit()
  • I'm sure that this is going into the MQ, but what's not happening for me is the celery-cpu queue getting processed. In fact, you can see the "CALLED" print at the top - I think I left that in from when I was trying to get set up on AWS and ran into the same "how was this ever working" situation. Anyway, if you see "CALLED" in the stdout, that would be a good sign! 😭
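
One way to sanity-check whether any worker is actually consuming the celery-cpu queue is Celery's inspect API. A minimal sketch - the import path for the celery app object is an assumption, adjust it to wherever it's actually defined:

    # Sketch: confirm a worker is bound to the celery-cpu queue and has the task
    # registered. The import path below is an assumption about the repo layout.
    from open_energy_view.celery_tasks import celery

    insp = celery.control.inspect()
    print(insp.ping())           # do any workers respond at all?
    print(insp.active_queues())  # is some worker consuming the celery-cpu queue?
    print(insp.registered())     # is insert_espi_xml_into_db registered on that worker?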

As I mentioned, in production these are all running under systemd. I've inspected my config and it does not seem to differ from what you have set up in the docker container.

LMK what you might find when you run that flow.

It's critical to be able to mock the PGE request/response in the development environment so that we have an efficient way to test data parsing, fetching, etc. Thank you for your help!
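
In the meantime, one way to stub the PGE request in tests without the MQ plumbing is to patch request_url directly. A sketch only - the patch target and the relative test-data path are assumptions about the repo layout:

    from unittest.mock import patch

    # Return canned ESPI XML instead of hitting the PGE SMD API.
    def fake_request_url(method, url, params=None, headers=None, cert=None, format="text"):
        with open("test/data/espi/Single Days/2019-10-16.xml") as f:
            return f.read()

    # The patch target is assumed; point it at wherever fetch_task imports request_url from.
    with patch("open_energy_view.celery_tasks.request_url", side_effect=fake_request_url):
        ...  # exercise the fetch flow against canned XML instead of the live API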

EDIT: just confirmed that the "fake fetch" is working in production.

  • create a new account with a fake email like "ejfsklfjkeljf@ghlgjrdkgjlr.vom", password "admin"
  • select the fake utility and name it whatever
  • you'll see it load in the first month and add a spinner in the upper right corner. The Network tab will show 202s coming in as the frontend checks on the celery tasks, and eventually it will prompt you to reload.
  • this is exactly what should be working in the development environment.

EDIT2: if it's not clear, the architecturally f*cky thing here is that the insert_to_db task needs the "flask application context" in order to set up the SQL ORM (SQLAlchemy).
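
If it helps, the usual Flask + Celery workaround is a Task base class that pushes the app context around every task run instead of doing it ad hoc inside each task. A sketch, not the project's current code - the celery and create_app names are assumed to be the same objects used in the snippets above:

    import os

    # Build the Flask app once; every task then runs inside its app context.
    app = create_app(f"open_energy_view.{os.environ.get('FLASK_CONFIG')}")

    class ContextTask(celery.Task):
        def __call__(self, *args, **kwargs):
            with app.app_context():
                return self.run(*args, **kwargs)

    # Must be assigned before the @celery.task decorators register any tasks.
    celery.Task = ContextTask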

@JPHutchins
Owner

JPHutchins commented Jan 28, 2023

JFC there is some embarrassing code in here

finally: 
    pass
