[PR] Pacific Power Webscraper proof of concept #21

Merged
merged 49 commits into from
Feb 25, 2024


solderq35 (Member) commented Sep 23, 2023

EDIT: This first comment is where I list general information about this PR.

General Requirements

Running the code

  • Run npm i from within the PacificPower directory, then see below
  • Run node readPP.js
  • See this file for more detailed instructions on unit testing
  • Save this file as .env in the PacificPower directory

How it works

  • See comments in code
  • Set headless to false, as noted here, to watch it work in real time
  • In general:
    • If any step in this section fails, retry it, up to 5 times in total (across the whole scraper):
      • Log in
      • Go to the "usage details" page and switch from graph view to table view
    • If any step in this section fails, retry up to 5 times per meter. If any given meter reaches 5 retries, the loop ends; a success resets the counter:
      • Open the meter menu and iterate through the meters one by one until reaching a meter without data
      • If there is data, upload a table in this format: (meter selector number, as defined by the mat-option value in the PacificPower site source code; usage in kWh; PacificPower meter ID)
    • Eventually, a meter will return errors 5 times in a row, which hopefully means that all valid meters have already been read (PacificPower seems to list all valid meters first, then all invalid meters)
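The two retry budgets above can be sketched in plain JavaScript. This is a minimal sketch, not the code in readPP.js: `setup` and `readMeter` are hypothetical callbacks standing in for the Puppeteer steps (log in and open the table view; read one meter's row), and "retry 5 times" is read as 5 attempts in total.

```javascript
// Minimal sketch of the two retry budgets described above.
// `setup` (log in, open usage details, switch to table view) and
// `readMeter(index)` are hypothetical callbacks standing in for the
// real Puppeteer steps.
const MAX_ATTEMPTS = 5;

// Section 1: a single budget of attempts for the whole scraper.
async function withRetries(fn, maxAttempts = MAX_ATTEMPTS) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err; // try again until the budget is used up
    }
  }
  throw lastError;
}

// Section 2: a per-meter budget that resets after every success.
// readMeter resolves to [meterSelector, usageKwh, pacificPowerMeterId].
async function scrapeMeters(setup, readMeter, maxAttempts = MAX_ATTEMPTS) {
  await withRetries(setup, maxAttempts);
  const rows = [];
  let meter = 0;
  let failures = 0;
  while (failures < maxAttempts) {
    try {
      rows.push(await readMeter(meter));
      failures = 0; // success: reset the counter and move to the next meter
      meter++;
    } catch (err) {
      failures++; // the same meter is retried on the next iteration
    }
  }
  // Stopped after maxAttempts consecutive failures on one meter, which
  // hopefully means every valid meter has already been read.
  return rows;
}
```

Keeping the scraping steps behind callbacks lets the retry policy be exercised without a browser.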

Next steps

solderq35 marked this pull request as draft September 23, 2023 19:10
solderq35 (Member Author) commented Sep 25, 2023

Got through the login form inside the iframe (https://stackoverflow.com/questions/46529201/puppeteer-how-to-fill-form-that-is-inside-an-iframe).

Got to the usage details page.

Next: look into handling dropdown menus with Puppeteer.

See #5 for reference on "hovering the mouse" over the graph.

solderq35 (Member Author)

Need to add a function that automatically clicks on another date, to catch possible missed uploads or upload errors.

solderq35 (Member Author) commented Oct 23, 2023

New approach

  1. Detect missing data on the meter page (top row of the table)
  2. Click the previous meter
  3. Don't read its energy data, since we already know the previous meter's data exists
  4. Click next to return to the original meter
  5. If data is now present, proceed as normal. If not, go back to Step 2 and try again, up to 5 times. Valid data resets the counter
  6. If the meter still shows no data after 5 attempts of Steps 2-5, it is safe to assume we have reached all valid meters (all valid meters seem to be grouped ahead of all invalid meters on PacificPower), so end the webscraper

The bug is that the frontend sometimes shows "no data" even when data is present (PacificPower seems to be built with Angular, so possibly a state issue?). Going back to the previous meter and then returning to the original meter forces a "refresh".
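The steps above amount to a bounded retry around a "previous, then next" nudge. A minimal sketch, assuming a hypothetical `page` wrapper with `hasData()`, `goToPrevious()`, and `goToNext()` methods in place of the real Puppeteer clicks and selectors:

```javascript
// Minimal sketch of the "previous meter, then back" refresh workaround.
// `page` is a hypothetical wrapper around the Puppeteer page exposing:
//   hasData()      -> resolves true if the table's top row has data
//   goToPrevious() -> selects the previous meter (its data is not re-read)
//   goToNext()     -> selects the next meter, i.e. the original one
// Assumes the current meter is not the first one in the list.
async function ensureMeterData(page, maxAttempts = 5) {
  if (await page.hasData()) return true; // Step 1: data is already shown
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    await page.goToPrevious(); // Steps 2-3: step back without reading
    await page.goToNext(); // Step 4: return to the original meter
    if (await page.hasData()) return true; // Step 5: data appeared
  }
  return false; // Step 6: assume all valid meters have been read; stop
}
```

Because the counter lives inside each call, a meter that eventually shows data leaves the next meter with a fresh budget, which matches "valid data resets the counter".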

solderq35 changed the title from "Pacific Power Webscraper proof of concept" to "[PR] Pacific Power Webscraper proof of concept" Oct 23, 2023
solderq35 mentioned this pull request Nov 4, 2023
solderq35 (Member Author) commented Jan 16, 2024

Summary of TODO

Energy Dashboard Backend

MySQL

  • Add a row to the meters table
  • Add a row to the buildings table
    • Look up the building in OpenStreetMap (right-click > Query Features > way ID), then copy-paste the way ID into the buildings table's map_id column
  • Add a row to meter_group
    • Check the building_id_2 column as the foreign key to meter_groups
  • Add a row to meter_group_relation to link the meter and the meter_group together

Update Webscraper (wait until the 2 sections above are checked off)

  • Run against the local backend: change DASHBOARD_API in .env to localhost, port 3000
  • Add Unix timestamps
  • Add uploading (POST)
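For the Unix timestamp item, one possible helper converts the date of a reading to Unix seconds. This sketch assumes dates arrive as "YYYY-MM-DD" and uses UTC midnight; whether the dashboard should store UTC or Pacific-time midnight is an open assumption.

```javascript
// Minimal sketch: convert a "YYYY-MM-DD" reading date to a Unix timestamp
// in seconds. Assumes ISO-formatted input and UTC midnight; the real
// scraper may need Pacific-time midnight or a different input format.
function toUnixSeconds(isoDate) {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(isoDate);
  if (!match) throw new Error(`Unexpected date format: ${isoDate}`);
  const [year, month, day] = match.slice(1).map(Number);
  return Date.UTC(year, month - 1, day) / 1000; // Date.UTC returns milliseconds
}
```

For example, `toUnixSeconds("2024-02-21")` returns `1708473600`.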

Energy Dashboard Backend

Future Optimizations

solderq35 marked this pull request as ready for review January 18, 2024 19:35
solderq35 marked this pull request as draft January 18, 2024 19:36
s-egge (Member) commented Feb 1, 2024

Pacific Power Meters Upload

I added a meterslist.json listing the meters we currently display on the frontend, but all meters with daily data are uploaded to the database. If meterslist.json has no match for a Pacific Power meter ID, that meter's meter_id in the database is null. If the Pacific Power meter (PPM) is added to the frontend later, we can update all of its entries with the newly added meter_id and immediately display all the data collected so far (this requires turning off safe update mode):

UPDATE pacific_power_data SET meter_id = {meter_id} where pacificpower_meter_id = {pacific_power_meter_id}
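The meterslist.json fallback described above can be sketched as a lookup that returns `null` for unlisted meters. The entry shape (`id`, `pacificPowerMeterId`) is a hypothetical assumption, not the actual file format:

```javascript
// Minimal sketch: map a Pacific Power meter ID to the dashboard's meter_id,
// falling back to null when meterslist.json has no entry for it.
// The entry shape { id, pacificPowerMeterId } is a hypothetical assumption.
function buildMeterIdLookup(metersList) {
  const byPacificPowerId = new Map(
    metersList.map((meter) => [String(meter.pacificPowerMeterId), meter.id]),
  );
  // Compare as strings so numeric and string IDs still match.
  return (pacificPowerMeterId) =>
    byPacificPowerId.get(String(pacificPowerMeterId)) ?? null;
}
```

A `null` result is what ends up as a NULL meter_id, which the UPDATE above can later backfill once the meter is added to the frontend.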

See the commit for the Energy Dashboard here:
OSU-Sustainability-Office/energy-dashboard@67f6c26

The backend distinguishes solar meters from Pacific Power meters by a type that is sent in the upload request. I tested uploads for both Pacific Power meters and solar meters, and both work. To test, run the Energy Dashboard backend at the "add-pacific-power-meters" commit and set DASHBOARD_API in the PacificPower .env file to http://127.0.0.1:3000
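A request carrying the meter type might be assembled as follows. The `/upload` path, the body fields, and the `"pacific_power"` / `"solar"` type values are hypothetical placeholders, not the Energy Dashboard's actual API:

```javascript
// Minimal sketch of an upload request that tells solar and Pacific Power
// readings apart with a `type` field. The "/upload" path, the body field
// names, and the type strings are hypothetical placeholders.
const METER_TYPES = ["pacific_power", "solar"];

function buildUploadRequest(dashboardApi, reading, type) {
  if (!METER_TYPES.includes(type)) {
    throw new Error(`Unknown meter type: ${type}`);
  }
  const base = dashboardApi.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${base}/upload`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ ...reading, type }),
    },
  };
}
```

On Node 18+, the result could be sent with `await fetch(url, options)`, using `process.env.DASHBOARD_API` as the base URL.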

solderq35 (Member Author)

Sounds good. I'm still not 100% sure whether it's better to store the meters table's ID in the pacific_power_data table, or whether a join at download time would perform fine, so let's come back to that after download is implemented.

s-egge and others added 4 commits February 11, 2024 14:15
… arg for uploading

- Changed the date logic to use the date of the power usage reading, so the date always reflects the data
- Removed references to db_meter_id, as the EDB backend will grab data via pacific_power_meter_id instead of db_meter_id
- Added a command line argument (--no-upload) for easier testing
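A `--no-upload` switch like the one in this commit can be handled by checking `process.argv` once and gating the upload step on it. This is a sketch; `uploadRow` is a hypothetical callback, and readPP.js may parse its arguments differently:

```javascript
// Minimal sketch of the --no-upload switch: parse it once, then gate the
// upload step on it. `uploadRow` is a hypothetical upload callback.
function parseArgs(argv = process.argv) {
  const args = argv.slice(2); // drop the node binary and the script path
  return { upload: !args.includes("--no-upload") };
}

async function handleRow(row, options, uploadRow) {
  if (!options.upload) {
    console.log("--no-upload set; skipping upload of", row);
    return false;
  }
  await uploadRow(row);
  return true;
}
```

Running `node readPP.js --no-upload` would then scrape and log rows without posting them.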
solderq35 (Member Author) commented Feb 21, 2024

Added a function that checks whether the latest date shown on PacificPower matches yesterday's date. Once this webscraper is running on AWS with its logs in CloudWatch, we can set up CloudWatch to check for that log message.
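One way to implement the "latest date is yesterday" check is to compute yesterday's calendar date in Pacific time and log a fixed marker on a mismatch. The time zone, the "YYYY-MM-DD" format, and the `DATE_MISMATCH` marker are assumptions; a CloudWatch Logs metric filter could then match that marker:

```javascript
// Minimal sketch of the "is the latest date yesterday?" check.
// Assumptions: dates are compared as "YYYY-MM-DD" strings, "yesterday" is
// computed in Pacific time, and DATE_MISMATCH is a hypothetical log marker.
function calendarDateIn(timeZone, date) {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "numeric",
    day: "numeric",
  }).formatToParts(date);
  const pick = (type) => Number(parts.find((p) => p.type === type).value);
  return { year: pick("year"), month: pick("month"), day: pick("day") };
}

function yesterdayIso(timeZone, now = new Date()) {
  const { year, month, day } = calendarDateIn(timeZone, now);
  // Date.UTC rolls day 0 back to the last day of the previous month.
  return new Date(Date.UTC(year, month - 1, day - 1)).toISOString().slice(0, 10);
}

function latestDateIsYesterday(latestShown, timeZone = "America/Los_Angeles", now = new Date()) {
  const expected = yesterdayIso(timeZone, now);
  if (latestShown !== expected) {
    console.log(`DATE_MISMATCH latest=${latestShown} expected=${expected}`);
    return false;
  }
  return true;
}
```

Deriving yesterday from the local calendar date, rather than subtracting 24 hours, avoids off-by-one-day results around daylight-saving transitions.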

Otherwise, the PR looks pretty good to me.

solderq35 marked this pull request as ready for review February 21, 2024 19:38
solderq35 (Member Author) commented Feb 22, 2024

I think this PR is generally good to go, with the caveat that 2 meters (74264319 and 77293450) were still missing yesterday's data (only showing data up to 2 days ago) as late as 2 PM.

I think the error will be easier to address once we have more data in the database, so I think we can start on the uploads on AWS and come back to fix the date mismatch / "unavailable" data issues later.
