Add feature to get historical data for a particular location #3

Closed
Milind220 opened this issue Feb 21, 2022 · 21 comments · Fixed by #51
Labels: enhancement, help wanted

Comments

@Milind220
Collaborator

Currently only live data can be fetched given a location. Most users would find more utility in large datasets of historical data.

I don't think the WAQI API can provide historical data, but the WAQI website does have a resource for downloading CSVs and Excel sheets of historical data. Perhaps it would be possible to download and read the CSV from there programmatically.

Milind220 added the enhancement and help wanted labels Feb 21, 2022
@Milind220
Collaborator Author

I think there may be a way to do this by using Requests to fill in the form on this page, and then somehow programmatically clicking the download button to save the CSV locally on the user's computer, in the same directory as their project.
From there it could be read and imported into a dataframe to get all the data they need, or left as a CSV for them to do that themselves.
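
For reference, here is a minimal sketch of the "read the downloaded CSV" step using only the standard library. The column names and the sample text are hypothetical, since the real file layout has not been checked here. With pandas available, `pandas.read_csv` would be the natural equivalent.

```python
import csv
import io


def read_history_csv(text):
    """Parse CSV text into a list of dicts, one per data row."""
    # skipinitialspace drops the blank that often follows each comma.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for row in reader:
        rows.append(
            {
                key.strip(): (value or "").strip()
                for key, value in row.items()
                if key is not None  # ignore cells beyond the header width
            }
        )
    return rows


# Hypothetical sample; the real column names may differ.
SAMPLE = "date, pm25, pm10\n2022/2/1, 85, 40\n2022/2/2, 90, \n"
records = read_history_csv(SAMPLE)
```

Missing readings come back as empty strings, so the caller can decide how to treat them.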

@Samxx97
Contributor

Samxx97 commented Feb 27, 2022

Hello, this seems interesting. Can I work on it?

@Milind220
Collaborator Author

@Sam-damn yeah sure! That'd be awesome💯

This is a pretty major feature to add, so I think a new feature branch is probably a good idea for it.

@Milind220
Collaborator Author

@Sam-damn I've created a new branch - hist-data for this feature. Make your commits there, and when the feature's ready we'll merge it into dev.
If you're curious about the branching model we follow for Ozone, you can check out the discussion about it.

Good luck!

@Milind220 Milind220 assigned Milind220 and Samxx97 and unassigned Milind220 Feb 28, 2022
@Samxx97
Contributor

Samxx97 commented Feb 28, 2022

Alrighty, I'll check out the branching model so I can familiarize myself with the process and begin! Thanks for the info.

@Samxx97
Contributor

Samxx97 commented Feb 28, 2022

So I have been researching how this can be accomplished using the requests library. To submit the form and get the historical data file, you have to emulate the requests your own browser sends when you press the submit button. I inspected the network tab in my browser's developer tools, saw the POST request the browser sends on clicking submit, and tried to emulate it exactly using requests. The problem is that all I get back as a response is an IP and a status, whereas in a browser the same request produces a downloadable file with a Save As dialog. Do you have any idea whether I should be performing a GET request on the IP that is returned to me? (I'm not sure that is even possible, since GET requests are usually performed on a URL.)

Overall I feel this task could be accomplished using a headless browser tool such as Selenium, but Selenium requires other dependencies that cannot be listed as Python packages. What are your thoughts on this?
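
To make the "emulate the browser's POST" idea concrete, here is a rough sketch that builds such a request with the standard library without sending it. The URL and the form field names are placeholders, not the real WAQI ones; the actual values would have to be copied from the network tab. With the requests library, the equivalent call would be `requests.post(url, data=fields)`.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; replace with the URL seen in the network tab.
FORM_URL = "https://example.com/historical-data/submit"


def build_form_request(url, fields, user_agent="Mozilla/5.0"):
    """Build (but do not send) a form-encoded POST request."""
    body = urlencode(fields).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "User-Agent": user_agent,
    }
    return Request(url, data=body, headers=headers, method="POST")


# Hypothetical field names, for illustration only.
request = build_form_request(
    FORM_URL, {"city": "Delhi", "email": "user@example.com"}
)
```

Passing the result to `urllib.request.urlopen` would send it. Comparing the built body and headers against what the browser sends is one way to spot a field or header the script is missing.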

@Milind220
Collaborator Author

@Sam-damn I've used selenium before, and I'm not opposed to it being used for this feature.

What do you mean by 'other dependencies that cannot be listed as python packages' ?

Meanwhile, I'll do some research into whether Requests can actually be used for this at all.

@Samxx97
Contributor

Samxx97 commented Mar 1, 2022

@Milind220 One of Selenium's dependencies is a WebDriver, which is usually a binary that needs to be installed manually, so it cannot be listed as a pip package. Nevertheless, I think there's a Python package that helps with this. If you find anything about whether requests can be used for this, do let me know 😁.

@Milind220
Collaborator Author

@Sam-damn After doing some research I'm confident that requests could be used for this. Here are some links to YouTube videos that do similar things (you can refer to them if you need to)

To fill in the form to access the downloads, this video on logging into websites using Requests would help. It's a similar task to what we need: https://www.youtube.com/watch?v=bM50i7sKwwM

To download the files: https://www.youtube.com/watch?v=UMuO2_BVFwY

Lemme know what you find when you try it out!

Also did you have any luck with that other python package?

@Samxx97
Contributor

Samxx97 commented Mar 4, 2022

I have made a lot of progress using Selenium. However, much like with requests, I got stuck at the same stage, where a Save As dialog appears (“you have chosen to save this file”). This dialog is an operating system window, and since it's not an element within the browser it cannot be accessed using Selenium. I have tried to bypass this by changing the settings of the WebDriver profile to suppress the dialog box and allow an automatic save to a custom location, but this doesn't work for some reason.
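
For context, the profile settings usually suggested for this look roughly like the sketch below. It assumes Firefox (the dialog text quoted above matches Firefox's). The preference names are real Firefox preferences, but the helper functions and the MIME type list are illustrative only, and as noted above this approach did not work here.

```python
def firefox_download_prefs(download_dir, mime_types=("text/csv",)):
    """Return Firefox preferences that ask for silent downloads."""
    return {
        # 2 means "save to the directory named in browser.download.dir".
        "browser.download.folderList": 2,
        "browser.download.dir": download_dir,
        # Comma-separated MIME types to save without prompting.
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def apply_prefs(options, prefs):
    """Copy each preference onto an object exposing set_preference()."""
    for name, value in prefs.items():
        options.set_preference(name, value)
    return options


# With Selenium 4 installed, usage would look like this (not run here):
#   from selenium import webdriver
#   from selenium.webdriver.firefox.options import Options
#   options = apply_prefs(Options(), firefox_download_prefs("/tmp/aqi"))
#   driver = webdriver.Firefox(options=options)
```

One common reason these settings have no effect is that the server labels the file with a MIME type that is not in the list. That is a guess, not something confirmed for this site.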

PS: this dialog box seems to be a common problem, judging by the many Stack Overflow questions about it.

@Samxx97
Contributor

Samxx97 commented Mar 4, 2022

As for using the requests library, this same issue becomes even harder to solve, because to download a file using requests you need a URL to perform a GET request on, which we don't have. Unfortunately, the links you sent me do not tackle this issue. Nevertheless, I will keep trying with Selenium and keep you updated.

If downloading the actual CSV file doesn't work, as a last resort we can web-scrape the data from the table element (which appears after filling in the search bar and before submitting the form), construct a CSV file from that data, and then pass it to pandas or do whatever we want with it. However, I'm not sure whether that data table is complete or not.
I would love your input on this 😁.
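
A rough sketch of that fallback, using only the standard library: collect the cells of an HTML table and write them out as CSV text. The sample markup and column names are made up; the real table's structure is an assumption. With Selenium, the HTML could come from `driver.page_source`.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Convert the table rows found in an HTML string into CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Hypothetical markup standing in for the scraped page.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>pm25</th><th>pm10</th></tr>
  <tr><td>2022/2/1</td><td>85</td><td>40</td></tr>
  <tr><td>2022/2/2</td><td>90</td><td></td></tr>
</table>
"""
csv_text = table_to_csv(SAMPLE_HTML)
```

Empty cells are kept as empty fields, so gaps in the table stay visible in the CSV.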

@Milind220
Collaborator Author

@Sam-damn Ahhhh I know what you mean by the dialog box:
[Image: Screenshot 2022-03-04 at 12 22 00 PM]

@Milind220
Collaborator Author

@Sam-damn Now that you mention it, I think webscraping the table element is genius! It appears to be the easiest solution to this problem. I checked it out for a few locations, and the table is 100% complete for all the parameters.

Great thinking man!

Let's try this:

  • Requests to fill up the form, if possible. That way we don't have to worry about adding Selenium WebDriver as a dependency.
  • Web-scrape the table. I've got some experience with this.
  • Create a pandas dataframe with the scraped data. Then we can use the _format_data method to get it into whatever format the user desires.
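
As a sketch of the last step, scraped rows (header first, all cells as strings) can be turned into typed columns before building the dataframe. The date format and column name below are assumptions about the scraped table, and `rows_to_columns` is a hypothetical helper, not part of Ozone. The resulting dict of equal-length lists is a shape that `pandas.DataFrame(...)` accepts directly.

```python
from datetime import datetime


def to_float(text):
    """Return a float, or None for blank or non-numeric cells."""
    try:
        return float(text)
    except ValueError:
        return None


def rows_to_columns(rows, date_column="date", date_format="%Y/%m/%d"):
    """Turn [header, row, row, ...] into {column name: list of values}."""
    header, *body = rows
    columns = {name: [] for name in header}
    for row in body:
        for name, cell in zip(header, row):
            if name == date_column:
                parsed = datetime.strptime(cell, date_format).date()
                columns[name].append(parsed)
            else:
                columns[name].append(to_float(cell))
    return columns


# Hypothetical scraped rows.
columns = rows_to_columns(
    [["date", "pm25"], ["2022/2/1", "85"], ["2022/2/2", ""]]
)
```

Blank readings become `None`, which pandas treats as missing values.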

@Milind220
Collaborator Author

Milind220 commented Mar 4, 2022

@Sam-damn Actually, if you manage to webscrape the table with Selenium, that's fine too. I suppose we can ask users to download the WebDriver on their own, or perhaps set up a shell script to download it separately (idk if that's possible, but just an idea).

EDIT: I found this package which could help us out with the WebDriver part. It downloads the WebDriver on the spot, which would allow us to add selenium as a regular dependency.

@Samxx97
Contributor

Samxx97 commented Mar 5, 2022

Alright, I will focus on the scraping then and keep you updated. Also, what a coincidence: I actually came across that package two days ago and have been using it. It's quite handy!

@Milind220
Collaborator Author

@Sam-damn hahaha that's great. Let me know how it goes!

@Milind220
Collaborator Author

@Sam-damn Any progress with that?

This is a pretty exciting feature for us to add - it would create a lot of opportunity for expansion and usage of the package. Historical data is very important for researchers, and this would make it really simple for them to get data. I'm hoping to get some professors from my university to use it if we can get this to work!

@Samxx97
Contributor

Samxx97 commented Mar 9, 2022

@Milind220 It's almost finished! I got it working nicely now and tested it a lot, so hopefully it will work for everyone. Currently I am just organizing the file, making it more readable, and adding the docs (for the methods and the class), the packages, etc.
And yes, it was indeed quite challenging to get it to work, and quite fun.

Also, I apologize for the delay. We have a pretty bad electricity situation here, so I have been working on it whenever I could 👍🏻👍🏻

@Milind220 Milind220 linked a pull request Mar 10, 2022 that will close this issue
@Milind220
Collaborator Author

@Sam-damn No problem at all! Your work has been top notch :)

@Milind220
Collaborator Author

@Sam-damn Hey, check your email!

@Samxx97
Contributor

Samxx97 commented Mar 13, 2022

@Milind220 I sent a reply 😁
