Issue at every first run #26

Closed
remagio opened this issue Nov 4, 2014 · 6 comments

Comments

@remagio

remagio commented Nov 4, 2014

I have been seeing this for a while on some Debian boxes, and now I see the same on my clean OS X box. On the first run it takes a while and then stops as if the requests package were not installed; in this case it stopped with 119 API attempts still remaining. All subsequent runs work fine, but the failed first run limits the tweets returned.
It happens with and without --scrape. The example below is a query with lots of tweets; when the results are limited it seems to work fine.

$ twarc.py --scrape "#moncler #report"

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/twarc.py", line 283, in <module>
    archive(args.query, tweets)
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/twarc.py", line 197, in archive
    for status in statuses:
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/twarc.py", line 123, in search
    for status in scrape_tweets(q, max_id=max_id):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/twarc.py", line 210, in scrape_tweets
    for tweet_id in scrape_tweet_ids(query, max_id, sleep=1):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/twarc.py", line 233, in scrape_tweet_ids
    r = requests.get(url, params=q)
NameError: global name 'requests' is not defined

The example does fetch tweets, so you can relaunch it and it will work. If you delete all the *.json files and start again, it fails in the same way.
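
A note on the traceback above: Python raises a NameError like this only when the offending line actually executes, not when the module is loaded. That may be why the failure appears only on runs that reach the scraping code path. The sketch below is a minimal illustration and not twarc's code; `fetch_page` and `run` are hypothetical names, and `requests` is deliberately left unimported.

```python
# Minimal illustration (hypothetical names, not twarc's actual code).
# The module-level name `requests` is never imported here on purpose.

def fetch_page(url):
    # This line fails with NameError, but only when fetch_page() is called.
    return requests.get(url)


def run(need_scrape):
    """Return a status string; only the scrape branch touches `requests`."""
    if not need_scrape:
        return "ok: scrape path not reached"
    try:
        fetch_page("https://example.invalid/")
    except NameError as exc:
        return "failed: %s" % exc
    return "ok"


if __name__ == "__main__":
    print(run(need_scrape=False))
    print(run(need_scrape=True))
```

If this is what is happening, having the requests package installed would not help on its own: the module that calls `requests.get` also needs its own `import requests` line.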

@edsu
Member

edsu commented Nov 4, 2014

Are you sure you are running the latest code? requests_oauthlib is included in both the requirements.txt and the setup.py

@edsu edsu closed this as completed Nov 4, 2014
@remagio
Author

remagio commented Nov 5, 2014

Sorry to say, @edsu, but I now get the same results on OS X as on the old Debian box.
I installed Twarc from your Git repository on Monday. Even so, I repeated the steps again just now:

  • git pull (nothing fetched, because there are no updates in the repo);
  • python setup.py install (looks the same as on Monday, no errors or odd output).

I ran the same query as yesterday. Results:

  1. it gets older tweets without --scrape than with --scrape
  2. every run restarts from the beginning instead of starting from the last saved ID
  3. files are saved with '%23' in the name instead of '#'
  4. the error from my previous comment no longer appears, and the log is clean.

Here are the results of utils/summarize.py with --scrape:

%23moncler%20%23report-20141105104103.json
  start: 529573570809057281 [Tue Nov 04 09:59:12 +0000 2014]
  end:   529930957868908544 [Wed Nov 05 09:39:20 +0000 2014]
  total: 6653

%23moncler%20%23report-20141105105537.json
  start: 529573570809057281 [Tue Nov 04 09:59:12 +0000 2014]
  end:   529934176045502464 [Wed Nov 05 09:52:07 +0000 2014]
  total: 6655

and without --scrape:

%23moncler%20%23report-20141105104930.json
  start: 529013237497737216 [Sun Nov 02 20:52:39 +0000 2014]
  end:   529932988579328000 [Wed Nov 05 09:47:24 +0000 2014]
  total: 6260

%23moncler%20%23report-20141105113241.json
  start: 529013237497737216 [Sun Nov 02 20:52:39 +0000 2014]
  end:   529943588109815808 [Wed Nov 05 10:29:31 +0000 2014]
  total: 6280

@edsu
Member

edsu commented Nov 5, 2014

I don't understand this ticket. I thought you opened it because you were getting an error about the missing requests module?

@remagio
Author

remagio commented Nov 5, 2014

I opened it because all the requirements appeared to be installed properly from the beginning. I reinstalled Twarc anyway and got what I posted above.
Then I checked requirements.txt again to understand, and tested installing each requirement manually. I found that running "pip install" directly actually installed the pytest package, as if "python setup.py install" had never installed it.

pip install pytest
Downloading/unpacking pytest
  Downloading pytest-2.6.4.tar.gz (512kB): 512kB downloaded
  Running setup.py (path:/private/var/folders/q4/ry4k5ymx2dvdhm8lqwd915nr0000gn/T/pip_build_remagio/pytest/setup.py) egg_info for package pytest

Downloading/unpacking py>=1.4.25 (from pytest)
  Downloading py-1.4.26.tar.gz (190kB): 190kB downloaded
  Running setup.py (path:/private/var/folders/q4/ry4k5ymx2dvdhm8lqwd915nr0000gn/T/pip_build_remagio/py/setup.py) egg_info for package py

Installing collected packages: pytest, py
  Running setup.py install for pytest

    Installing py.test-2.7 script to /Library/Frameworks/Python.framework/Versions/2.7/bin
    Installing py.test script to /Library/Frameworks/Python.framework/Versions/2.7/bin
  Running setup.py install for py
pip install python-dateutil
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): six in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages (from python-dateutil)

pip install requests_oauthlib
Requirement already satisfied (use --upgrade to upgrade): requests-oauthlib in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages
Cleaning up…
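
As a complement to running pip package by package, the same check can be done from Python itself. The sketch below is only an illustration under assumptions: `missing_modules` is a hypothetical helper, the import names are guessed from the packages mentioned in this thread, and it uses `importlib.util.find_spec`, which needs Python 3.4 or later rather than the 2.7 interpreter shown in the logs. It should be run with the same interpreter that runs twarc.py.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names assumed from the packages mentioned in this thread;
    # python-dateutil is imported as `dateutil`.
    wanted = ["requests", "requests_oauthlib", "dateutil", "pytest"]
    print("missing:", missing_modules(wanted))
```

Note that this only tells whether a package can be found. It says nothing about whether a script that uses the package has actually imported it.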

I did an --upgrade anyway. The other requirements were OK.
Then I tried again to check whether this solved the reported issue. It didn't. The only change is in the results with --scrape, which now start on Nov 03 instead of Nov 04:

%23moncler%20%23report-20141105115907.json
  start: 529389017947582464 [Mon Nov 03 21:45:52 +0000 2014]
  end:   529949921546612736 [Wed Nov 05 10:54:41 +0000 2014]
  total: 6687

%23moncler%20%23report-20141105120627.json
  start: 529389017947582464 [Mon Nov 03 21:45:52 +0000 2014]
  end:   529949921546612736 [Wed Nov 05 10:54:41 +0000 2014]
  total: 6687

The main issue is that it re-downloads all the same tweets from the beginning instead of starting from the last saved ID.
It looks related to the filename '%23something' for the query "#something", as described.
Testing again with "Keybase" instead of "#Keybase", it seems to work fine.
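
For reference, the '%23' and '%20' in the filenames are what URL percent-encoding produces for '#' and a space. The sketch below shows the round trip with Python 3's `urllib.parse`. The helper names are hypothetical, and whether Twarc builds its filenames exactly this way is an assumption; the filenames in this thread are merely consistent with it.

```python
from urllib.parse import quote, unquote


def query_to_filename_prefix(query):
    """Percent-encode a search query so it is safe to use in a filename."""
    return quote(query, safe="")


def filename_prefix_to_query(prefix):
    """Recover the original query from a percent-encoded prefix."""
    return unquote(prefix)


if __name__ == "__main__":
    prefix = query_to_filename_prefix("#moncler #report")
    print(prefix)  # %23moncler%20%23report
    print(filename_prefix_to_query(prefix))  # #moncler #report
```

A query without special characters, such as "Keybase", encodes to itself, which would fit the observation that only the '#' queries misbehave.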

@edsu
Member

edsu commented Nov 5, 2014

I'm afraid I still don't understand your problem. Would removing the --scrape functionality help you?

@remagio
Author

remagio commented Nov 5, 2014

It doesn't help; it doesn't matter whether I use --scrape or not. Simply put, by solving this issue I ran back into a previously opened issue, the one that made me stop using my old Debian box.

  1. The initial issue was the missing "requests", on a new OS X box. The error occurs only on the first run of a query, not on later runs of the same query.
  2. Following your suggestion to check the requirements anyway, I ran the installation again (python setup.py install) and checked the requirements. They all appeared to be satisfied. But when I tested each requirement directly, one by one (using pip install namepackage), it looked like the standard setup had missed only the "pytest" package, not "requests". Looking back at the console, there were no errors or abnormal output during any of the installs.

So I solved the initial issue about "requests", but a new issue appeared:
Twarc started naming the JSON files with "%23" instead of "#", unlike a few minutes earlier, before step 2. I think this is the cause of the new issue: at every run of the same query, in the same directory, Twarc does not properly read the previous JSON file to find the last saved ID. It then writes a new JSON file as if it were unable to read the previous JSON filename and every run were a first run.
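
The resume behaviour described above depends on the lookup using the same encoded name as the one used when writing the file. The sketch below illustrates that idea only and is not Twarc's implementation. `archive_prefix` and `last_archived_id` are hypothetical helpers, and both the "<encoded query>-<timestamp>.json" naming and the one-JSON-tweet-per-line layout with an integer "id" field are assumptions based on the filenames shown in this thread.

```python
import glob
import json
import os
from urllib.parse import quote


def archive_prefix(query):
    # Assumed naming scheme: percent-encoded query, then "-<timestamp>.json".
    return quote(query, safe="")


def last_archived_id(query, directory="."):
    """Return the highest tweet id in earlier archives of `query`, or None."""
    pattern = os.path.join(glob.escape(directory),
                           glob.escape(archive_prefix(query)) + "-*.json")
    newest = None
    for path in glob.glob(pattern):
        with open(path) as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                # Assumes one JSON tweet per line with an integer "id" field.
                tweet_id = json.loads(line)["id"]
                if newest is None or tweet_id > newest:
                    newest = tweet_id
    return newest
```

If the writer encodes the query but the reader searches for the raw "#..." name (or the other way round), the lookup finds nothing and every run looks like a first run, which would match the symptom reported here.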

2 participants