Fetching data with PHP #288
That wiki article is very out of date; we need to find some time to update it. All of the items on the to-do list are now sorted except for multi-connected developer consoles, and we are using the new dev console in our current release. See #285 for a related question. You would need to use the |
Thank you for your quick answer. I've started studying the code in PasswordAuthenticator, DevConsoleV2 and BaseAuthenticator, but I don't understand what GALX represents: I see it several times in the code but can't figure out what kind of data it is. |
It is just a temporary cookie you get from Google. You need to pass it along when authenticating. |
Okay, thanks. I was pretty sure it would contain some useful auth information. (I'm curious, btw: what do those letters stand for?) I'm trying to follow the approach of your PasswordAuthenticator. For now, I have made the same GET request (I'm using cURL on the PHP side), but the only cookie I seem to get from the response is named GAPS... not GALX. Maybe there is something I didn't understand. Also, I saw you often use the CookieStore Java class. There is no such class in PHP, but with the extension I'm using for HTTP requests, I might be able to put the cookies in a file after each request and then retrieve them when needed. I would rather not have to parse that file, to save some time, but if it turns out to be necessary I'll do it. At this point, I'm not sure. So, is it really important to store them that way (similar to CookieStore in Java), or can I simply use variables? |
Not sure what GALX stands for... Check https://github.com/AndlyticsProject/andlytics/blob/master/src/com/github/andlyticsproject/console/v2/HttpClientFactory.java, which sets a number of properties used for the GET request. I don't think you need to use a CookieStore (which is probably just backed by a hashmap); it's just a convenience class, so variables would be fine (that is what the old version uses). |
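This is a rough illustration of "variables would be fine". The minimal Python sketch below is not Andlytics code. It keeps cookies in an ordinary dict, merges the `Set-Cookie` header values of each response into it, and builds the `Cookie` header for the next request. It deliberately ignores domain, path and expiry rules, which a real cookie store would respect. The cookie values shown are made up.

```python
from http.cookies import SimpleCookie


def update_cookies(cookies, set_cookie_headers):
    """Merge the Set-Cookie header values of one response into a plain dict (name -> value)."""
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)  # attributes such as Path, Secure, HttpOnly are parsed but not kept
        for name, morsel in parsed.items():
            cookies[name] = morsel.value
    return cookies


def cookie_header(cookies):
    """Build the value of the Cookie request header from the dict."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


cookies = {}
# Cookies may arrive on different responses (e.g. a redirect and the final page).
update_cookies(cookies, ["GAPS=1:abc; Path=/; Secure; HttpOnly"])
update_cookies(cookies, ["GALX=xyz; Path=/; Secure"])
print(cookie_header(cookies))
```

Cookies can be set on any response in a redirect chain. The dict therefore has to be updated after every request, not only after the last one.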
I've been trying to get this GALX value by parsing the response to my GET request, but I can't find how to retrieve it. I can see it in the Cookie: header in the Network panel of Chrome's developer tools. I noticed there are two requests when loading https://accounts.google.com/ServiceLogin?service=androiddeveloper, and one of them has the GALX value I'm looking for, but in the Cookie: header. When I check the headers fetched after executing the GET request in my page, it looks like I only have the response headers available... For example, I can easily get the Set-Cookie: header, which starts with "GAPS=", but not this GALX value... I don't have much experience with HTTP requests, so I'm sorry if my question is in fact dumb or trivial. |
Cookie names, etc. are not documented, so you have to guess. If you want to do this, you have to get familiar with how HTTP, cookies, etc. work. Generally, you get cookies from the server in the … Besides setting the proper headers, you also need to make sure your HTTP client follows redirects automatically, or handle them yourself properly. Curl should be able to handle this, as well as maintain the session cookie store for you. If not, try to find an HTTP library that does; otherwise you have to implement the whole thing yourself.

```python
import getpass
import re
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

email = input("Enter your Google username: ")
password = getpass.getpass("Enter your password: ")

# The cookie processor stores every cookie the server sets (including GALX)
# and sends it back automatically on later requests.
cookie_jar = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie_jar))
urllib.request.install_opener(opener)

# Define URLs
login_page_url = 'https://accounts.google.com/ServiceLogin?service=androiddeveloper'
authenticate_url = 'https://accounts.google.com/ServiceLoginAuth?service=androiddeveloper'
dev_console_url = 'https://play.google.com/apps/publish/v2/'

# Load the sign-in page
login_page_contents = opener.open(login_page_url).read().decode('utf-8')

# Find the GALX value in the hidden form field (empty string if it is absent)
galx_match = re.search(r'name="GALX"\s*value="([^"]+)"', login_page_contents, re.IGNORECASE)
galx_value = galx_match.group(1) if galx_match else ''
print("GALX: " + galx_value)

# Set up login credentials (a POST body must be bytes)
login_params = urllib.parse.urlencode({
    'Email': email,
    'Passwd': password,
    'continue': dev_console_url,
    'GALX': galx_value,
}).encode('utf-8')

# Log in; the default opener follows redirects automatically
auth_contents = opener.open(authenticate_url, login_params).read().decode('utf-8')
print(auth_contents)
with open("dev-console.html", "w", encoding="utf-8") as f:
    f.write(auth_contents)

for c in cookie_jar:
    print(c)
    print()

# dev_console_contents = opener.open(dev_console_url).read().decode('utf-8')
# print(dev_console_contents)
```
Thanks a lot! I've been able to get the GALX value (finally, by using Curl's cookie jar feature and parsing the resulting file; I couldn't find another way...). I've seen no expiration date for this cookie, so I'm wondering: does its value only refresh when the session ends? (It seems so, from what I've read about the 0 expiration date.) Now I'll try to do the POST with my auth parameters + GALX, and the Python code will certainly be much easier for me to understand (I've never used Java, in fact). |
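For anyone else who needs to read the cookie jar file curl writes (via `CURLOPT_COOKIEJAR` in PHP), here is a minimal Python sketch of a parser. It assumes the Netscape cookie-file layout that curl uses:

- seven tab-separated fields: domain, include-subdomains flag, path, secure flag, expiry, name, value;
- HttpOnly cookies have `#HttpOnly_` prefixed to their domain;
- session cookies have an expiry of `0`.

The sample contents and values are invented.

```python
from typing import Dict, NamedTuple


class CurlCookie(NamedTuple):
    domain: str
    path: str
    secure: bool
    expires: int  # 0 means a session cookie (no expiration date)
    name: str
    value: str
    http_only: bool


def parse_curl_cookie_jar(text: str) -> Dict[str, CurlCookie]:
    """Parse a Netscape-format cookie file into a dict keyed by cookie name.

    Keying by name alone means same-named cookies from different domains overwrite
    each other; that is acceptable for a single login flow but not in general.
    """
    cookies = {}
    for line in text.splitlines():
        http_only = line.startswith("#HttpOnly_")
        if http_only:
            line = line[len("#HttpOnly_"):]
        elif not line.strip() or line.startswith("#"):
            continue  # blank line or ordinary comment
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # not a cookie line
        domain, _include_subdomains, path, secure, expires, name, value = fields
        cookies[name] = CurlCookie(domain, path, secure == "TRUE", int(expires), name, value, http_only)
    return cookies


sample = (
    "# Netscape HTTP Cookie File\n"
    "#HttpOnly_accounts.google.com\tFALSE\t/\tTRUE\t0\tGAPS\t1:example\n"
    "accounts.google.com\tFALSE\t/\tTRUE\t0\tGALX\texample-galx\n"
)
jar = parse_curl_cookie_jar(sample)
print(jar["GALX"].value, jar["GALX"].expires)
```

An expiry of `0` matches the observation above: the cookie lives only as long as the session.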
I've done the POST request, and I do get the whole source code of the developer console's home page as a result, the same one I usually get when logging in with my browser, so here the job seems done! :) EDIT: Off-topic bit removed, so it won't confuse anyone else who reads this page |
This is getting quite a bit off topic. We can help with understanding the protocol, but not really with your program, especially without seeing it. Also, it's not exactly clear what you are trying to do. BTW, all cookies used are session cookies, so there is no expiration date. They will be gone once you close/destroy your HTTP client instance. |
I understand. Actually, I should have been more precise when asking. (comment edited) In fact, it looks like I'm authenticated correctly now (with the right cookies), so the next step is to send basic requests, such as getting the app list. I've been trying to do so by studying the related Andlytics files (mainly DevConsoleV2 and DevConsoleV2Protocol), and decided to start with fetchAppInfos() (DevConsoleV2). But there is a part I don't understand. At some point, in createFetchAppInfosRequest, I need an XSRF token to put into a template. The function returns: String.format(FETCH_APPS_TEMPLATE, sessionCredentials.getXsrfToken()); I can't see where the XSRF token property of these sessionCredentials was set beforehand, so that it can be returned and used to format the string with this template. Sorry for the previous off-topic bit, and thanks again for helping me. |
The token and developer ID are extracted from the first response and used for all subsequent ones. That's towards the end of |
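The idea of extracting the token and developer ID once and reusing them for every later request can be sketched in Python as below. The regular expressions are hypothetical placeholders, and so are the `XsrfToken` and `DeveloperId` field names they search for. They are not the console's actual markup, which is undocumented and has changed before. Copy the real patterns from the first response as seen in your browser's network panel. `SessionCredentials` here is a stand-in, not Andlytics' Java class.

```python
import re
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL patterns: replace them with whatever the real first response contains.
XSRF_PATTERN = re.compile(r'"XsrfToken"\s*:\s*"([^"]+)"')
DEVELOPER_ID_PATTERN = re.compile(r'"DeveloperId"\s*:\s*"([^"]+)"')


@dataclass(frozen=True)
class SessionCredentials:
    xsrf_token: str
    developer_id: str


def extract_credentials(first_response: str) -> Optional[SessionCredentials]:
    """Pull the token and developer ID out of the first response, once per session."""
    token = XSRF_PATTERN.search(first_response)
    dev_id = DEVELOPER_ID_PATTERN.search(first_response)
    if token is None or dev_id is None:
        return None  # the page format has probably changed
    return SessionCredentials(token.group(1), dev_id.group(1))


creds = extract_credentials('{"XsrfToken":"tok123","DeveloperId":"0123456789"}')
print(creds)
```

Returning `None` instead of guessing makes a format change visible immediately. Otherwise the failure would only show up later as an opaque error from the console.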
I've now managed to generate the fetchAppsInfosRequest with the xsrfToken, as a JSON string, and the URL for this request (fetchAppsUrl). I also have my cURL cookie jar file, which still contains all the cookies from the requests I've made so far (including the AD cookie, which I saw mentioned in your code, even if I have no idea of its actual purpose). Yet when I send this POST, Google understands my request but refuses to give me the data. I get this response: {"error":{"data":[null,-1] ,"code":-1}} I'm asking in case you've run into this error before and it has a specific cause. I'm quite stuck on this. |
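Because the failure is reported inside the JSON body, it helps to check every response for an `error` object before parsing it further. Here is a minimal Python sketch that relies only on the shape of the error response quoted above. `ConsoleError` and `parse_console_response` are hypothetical names. A code of `-1` on its own does not say which parameter is missing.

```python
import json


class ConsoleError(Exception):
    """Raised when the console answers with an {"error": ...} payload."""

    def __init__(self, code, data):
        super().__init__(f"console error (code={code}, data={data})")
        self.code = code
        self.data = data


def parse_console_response(body: str):
    """Decode a JSON response body, raising ConsoleError if it is an error object."""
    payload = json.loads(body)
    if isinstance(payload, dict) and "error" in payload:
        error = payload["error"]
        if isinstance(error, dict):
            raise ConsoleError(error.get("code"), error.get("data"))
        raise ConsoleError(None, error)
    return payload


try:
    parse_console_response('{"error":{"data":[null,-1] ,"code":-1}}')
except ConsoleError as exc:
    print(exc)
```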
Not sure about the exact cause, but you are probably missing some parameter. Did you append the developer ID to the URL? Compare with the identical request in Chrome/FF, etc. Rinse and repeat :) |
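Appending the developer ID to the request URL can be done with a small Python helper that keeps any query string already present. The parameter name `dev_acc` is an assumption for illustration only. Copy the real name from the identical request your browser sends.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# HYPOTHETICAL parameter name: confirm it against the request your browser sends.
DEVELOPER_ID_PARAM = "dev_acc"


def with_developer_id(url: str, developer_id: str) -> str:
    """Return url with the developer ID added as a query parameter, keeping existing ones."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    params = parse_qsl(query, keep_blank_values=True)
    params.append((DEVELOPER_ID_PARAM, developer_id))
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))


print(with_developer_id("https://example.com/fetch?a=1", "42"))
```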
I did it! I've managed to get the list of apps' data! Thank you for your advice! Next step: I'll try to write a parser much like yours and extract some things out of this JSON :) I think the hardest part is done, still. :D |
Congratulations :) JSON parsing is indeed somewhat easier, but also tricky because you are almost never sure which parts are always there and which can be omitted (null, etc.). |
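One common defensive pattern for these nested, null-ridden arrays is a helper that walks a path of indices and returns a default instead of crashing. Here is a minimal Python sketch. `dig` is a hypothetical helper name, and the sample row is invented, not the console's real layout.

```python
from typing import Any


def dig(data: Any, *path: Any, default: Any = None) -> Any:
    """Follow a path of list indices / dict keys; return default on any missing or null step."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (IndexError, KeyError, TypeError):
            return default  # index out of range, missing key, or indexing into None
        if current is None:
            return default
    return current


row = [["com.example.app", None, [None, 4.5]], None]
print(dig(row, 0, 0), dig(row, 0, 2, 1), dig(row, 0, 1, 3, default=0))
```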
Okay, so I managed to get some data about the app (ratings, active installs, etc.). However, I'm going to need to store data from past statistics. I saw that you have written a function for this (DevConsoleV2#fetchStatistics; thank you, I think it's going to help me a lot), but you are not using it at the moment. I would like to put historical statistics into a database, so I'll need to fetch some. I want to give it a try, but I'm missing some information about one parameter: statsType. |
You can get full historical stats by type, e.g. active installs or daily installs, and by breakdown, e.g. Android version or app version. However, you can also get simple active installs and total downloads in the main app info request, which is why we don't need the full one yet. See https://github.com/AndlyticsProject/andlytics/blob/master/src/com/github/andlyticsproject/console/v2/DevConsoleV2Protocol.java#L40:L51 for the constants. Also note that at the moment the full statistics parsing just jumps to the last entry, so you will need to iterate over all entries. |
Okay, so I gave it a try. It's true that this JSON is hell to parse: null nodes everywhere, data scattered across nested arrays... Plus, the data keeps changing (not every app has this or that Android version), but I managed to create objects to hold it. For now, I've done it by Android version and by device, but the method should stay the same for the other breakdowns. For example, if I want to get updates by Android version, I return an associative array with one entry per timestamp (converted to yyyy-mm-dd format) and one property per entry for each Android version. The property's value is the number of downloads/updates/whatever. Now I'll retrieve the data in the same way for the other types (country, etc.) and then see how I'm going to put this into a database. I thought about one table with general app info, and several others for version, device, etc. (keyed by the packageName of the app). |
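The per-date associative array described above can be sketched in Python as a pivot of `(timestamp, Android version, count)` triples into `{date: {version: count}}`. This rests on two assumptions:

- the console's raw JSON has already been flattened into such triples;
- timestamps are milliseconds since the epoch, interpreted as UTC.

The loop deliberately visits every entry rather than only the last one. The sample numbers are made up.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple


def pivot_by_date(rows: Iterable[Tuple[int, str, int]]) -> Dict[str, Dict[str, int]]:
    """Turn (timestamp_ms, android_version, count) rows into {yyyy-mm-dd: {version: count}}."""
    table: Dict[str, Dict[str, int]] = defaultdict(dict)
    for timestamp_ms, version, count in rows:  # every entry, not just the last one
        day = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")
        # Sum duplicates in case the same day/version pair appears twice.
        table[day][version] = table[day].get(version, 0) + count
    return dict(table)


rows = [
    (1356998400000, "4.1", 120),  # 2013-01-01 UTC
    (1356998400000, "2.3", 80),
    (1357084800000, "4.1", 125),  # 2013-01-02 UTC
]
print(pivot_by_date(rows))
```

Versions that are missing on a given day simply have no key, which mirrors the "not every app has this or that Android version" situation.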
Hi, it's me again. In the meantime, I fetched every kind of historical data into associative arrays, and the functions are working fine. I encounter this weird behaviour in any app in the developer console, for almost every sorting parameter. I should add that it is visible in the GUI, so it's not a fetching problem. I can't find the logic of it. Any clues? |
If it is the same in the console, there is not much you can do about it. And you should really treat those numbers as a reference rather than as absolutes; they are known to fluctuate or be outright wrong sometimes (they eventually get fixed/normalized). As for using a DB, look at the *Table.java files in andlytics for some hints. Depending on your purposes, though, this may not be the best schema design for you. |
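For the database, the layout described earlier in the thread can be sketched with Python's built-in `sqlite3`. That layout is one table of general app info plus one table per breakdown, keyed by the package name. The table and column names below are hypothetical and not taken from Andlytics' *Table.java files. As noted above, pick whatever schema fits your purposes.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS apps (
    package_name    TEXT PRIMARY KEY,
    name            TEXT,
    rating          REAL,
    active_installs INTEGER
);
CREATE TABLE IF NOT EXISTS stats_by_android_version (
    package_name    TEXT NOT NULL REFERENCES apps(package_name),
    day             TEXT NOT NULL,  -- yyyy-mm-dd
    android_version TEXT NOT NULL,
    amount          INTEGER NOT NULL,
    PRIMARY KEY (package_name, day, android_version)
);
"""


def store_version_stats(conn, package_name, pivot):
    """Insert or overwrite {day: {version: amount}} rows for one app."""
    conn.executemany(
        "INSERT OR REPLACE INTO stats_by_android_version VALUES (?, ?, ?, ?)",
        [(package_name, day, version, amount)
         for day, versions in pivot.items()
         for version, amount in versions.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO apps VALUES (?, ?, ?, ?)", ("com.example.app", "Example", 4.5, 1000))
store_version_stats(conn, "com.example.app", {"2013-01-01": {"4.1": 120, "2.3": 80}})
rows = conn.execute(
    "SELECT android_version, amount FROM stats_by_android_version ORDER BY android_version"
).fetchall()
print(rows)
```

The composite primary key plus `INSERT OR REPLACE` makes re-fetching a day idempotent. This matters because the console's numbers can be corrected after the fact.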
I am closing this issue to keep the issue log as short as possible. Discussion on this topic/issue can still go on :) |
So. Hello again, I'm still working on my project, and I have the weirdest error right now. Since this morning (GMT +1) I can't seem to make successful requests to Google anymore. What the hell is happening? Is Google trolling me or something...? I'm quite worried now. |
Usual but painful API changes, see #314 |
Sorry I'm posting this so late, but I've made a tutorial to fetch data with PHP, if anyone is interested in using/studying it, here's the link: http://neko-spirit.fr/public/tutorial/tuto.php I also wanted to thank you again for your help guys, I really appreciated it. |
You should set your client to follow redirects automatically; it seems this is not the default in newer versions. |
hello cfecherolle, thanks a lot |
@cfecherolle Hello cfecherolle, can you please share a different tutorial link? The one you posted is not working. |
Hi, the server on which I was hosting it has now been down for a few …
2013/10/9 weasr notifications@github.com |
@cfecherolle thanks a lot for your response. I would appreciate it if you could send me the file by email, since I'm currently working on a similar project and want to make some progress on it. Many thanks in advance. My email is: axel.rewdas@gmail.com |
Hello again @weasr! I've put it back online on my own student web space (I hadn't thought of this possibility before!) It might help others, so instead of giving it to you by email, I'd rather post it here :) |
Thanks a lot, cfecherolle, much appreciated that you took the time to put the tutorial online again, and I agree it might help others as well :) |
Hi!
I've been really impressed by your app so far, and what it's able to do, and I thought that maybe you could give me some piece of advice.
If you can't, that's okay :)
I've been looking for a solution all over the internet, and you are pretty much my last hope for this project.
I'm currently developing, for private use, a website that would allow my friends (or other people) to view the statistics of the mobile applications I developed, fetched from iTunes Connect and the Google Developer Console.
So far, I've managed to do interesting things for the iTunes part of the work (fetching data, getting it into a database... basic stuff, you might say) because there are existing scripts and APIs that I can tweak to fit my needs. But I can't figure out how to access the Developer Console's data.
Right now, I'm looking for guidelines on how to get data from the Google Developer Console, which doesn't seem very friendly to this kind of use.
I already tried sending POST requests (based on what I sniffed while looking at the console) after authenticating in PHP via the OAuth 2 process. I couldn't test my authentication because I didn't know which requests I should use. And since there is no API for communicating with Google about this developer console, I'm stuck.
I've read your information/TODO page about it, but it seems like the login part is still on the high-level to-do list... :(
I don't know if you've made recent progress on all this, but I'd be glad to have any further information, since you seem well informed on the subject.
Keep up the good work! :D