Issue at every first run #26

remagio · 2014-11-04T16:10:35Z

I have been getting this for a while on some Debian boxes, and now I get the same on my clean OSX box. On the first run it works for a while, but then it stops as if the requests package weren't installed. In this case it stopped with 119 API attempts still remaining. All subsequent runs work fine, but the first failure limits the tweet results.
It happens with and without --scrape. The example below returns lots of tweets; when results are limited it seems to work fine.

$ twarc.py --scrape "#moncler #report"

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/twarc.py", line 283, in <module>
    archive(args.query, tweets)
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/twarc.py", line 197, in archive
    for status in statuses:
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/twarc.py", line 123, in search
    for status in scrape_tweets(q, max_id=max_id):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/twarc.py", line 210, in scrape_tweets
    for tweet_id in scrape_tweet_ids(query, max_id, sleep=1):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/twarc.py", line 233, in scrape_tweet_ids
    r = requests.get(url, params=q)
NameError: global name 'requests' is not defined

The example does collect tweets before failing, so you can relaunch it and it will work. Delete all the *.json files and start again, and it fails in the same way.
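
The traceback is consistent with a name that is used inside a function but never imported at module level: Python looks up global names only when the line executes, so a script can run for some time before failing. A minimal sketch of that behaviour follows; `fetch` and `http_client` are hypothetical stand-ins, not twarc's code.

```python
def fetch(use_fallback):
    """Return a result; the fallback branch uses a module that was never imported."""
    if not use_fallback:
        return "ok"
    # `http_client` is deliberately not imported anywhere (hypothetical name):
    # the lookup fails only when this line is actually reached.
    return http_client.get("https://example.org")


print(fetch(False))      # the normal path works
try:
    fetch(True)          # the fallback path raises NameError
except NameError as err:
    print("NameError:", err)
```

If something like this is what happens here, the error would show up only on runs that reach the affected code path, which might explain why some runs complete cleanly.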

edsu · 2014-11-04T21:39:51Z

Are you sure you are running the latest code? requests_oauthlib is included in both the requirements.txt and the setup.py
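
One quick way to check this from the interpreter that actually runs the script is to ask it which modules it can find. This is only a sketch, written for Python 3; `missing_modules` is a hypothetical helper, and the import names listed are assumed from the packages mentioned in this thread.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed from the packages discussed here
# (python-dateutil is imported as `dateutil`).
required = ["requests", "requests_oauthlib", "dateutil"]
print(missing_modules(required))
```

An empty list means every module is importable by that interpreter; it says nothing about whether the script itself imports them.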

remagio · 2014-11-05T10:38:22Z

Sorry to say, @edsu, but now I get the same results on OSX as on the old Debian box.
I installed Twarc from your Git repo on Monday. I repeated the steps again now anyway:

git pull (no changes, because there were no updates on the repo);
python setup.py install (looks the same as on Monday, no errors or weird output).

I ran the same query as yesterday. Results:

older tweets are retrieved without --scrape than with --scrape
every run restarts from the beginning instead of starting from the last saved ID
file names are saved with '%23' instead of '#'
the error from my previous comment no longer appears and the log is clean.

Here are the results of utils/summarize.py with --scrape:

%23moncler%20%23report-20141105104103.json
  start: 529573570809057281 [Tue Nov 04 09:59:12 +0000 2014]
  end:   529930957868908544 [Wed Nov 05 09:39:20 +0000 2014]
  total: 6653

%23moncler%20%23report-20141105105537.json
  start: 529573570809057281 [Tue Nov 04 09:59:12 +0000 2014]
  end:   529934176045502464 [Wed Nov 05 09:52:07 +0000 2014]
  total: 6655

and without --scrape:

%23moncler%20%23report-20141105104930.json
  start: 529013237497737216 [Sun Nov 02 20:52:39 +0000 2014]
  end:   529932988579328000 [Wed Nov 05 09:47:24 +0000 2014]
  total: 6260

%23moncler%20%23report-20141105113241.json
  start: 529013237497737216 [Sun Nov 02 20:52:39 +0000 2014]
  end:   529943588109815808 [Wed Nov 05 10:29:31 +0000 2014]
  total: 6280

edsu · 2014-11-05T10:53:43Z

I don't understand this ticket. I thought you opened it because you were getting an error about the missing requests module?

remagio · 2014-11-05T11:38:48Z

I opened it because the requirements apparently had all been installed properly from the beginning. I reinstalled Twarc anyway and got what I posted in the issue.
Now I checked requirements.txt again to understand, then tested installing the requirements manually. I found that running "pip install" directly really did install the pytest package, as if "python setup.py install" had never installed it in the first place.

pip install pytest
Downloading/unpacking pytest
  Downloading pytest-2.6.4.tar.gz (512kB): 512kB downloaded
  Running setup.py (path:/private/var/folders/q4/ry4k5ymx2dvdhm8lqwd915nr0000gn/T/pip_build_remagio/pytest/setup.py) egg_info for package pytest

Downloading/unpacking py>=1.4.25 (from pytest)
  Downloading py-1.4.26.tar.gz (190kB): 190kB downloaded
  Running setup.py (path:/private/var/folders/q4/ry4k5ymx2dvdhm8lqwd915nr0000gn/T/pip_build_remagio/py/setup.py) egg_info for package py

Installing collected packages: pytest, py
  Running setup.py install for pytest

    Installing py.test-2.7 script to /Library/Frameworks/Python.framework/Versions/2.7/bin
    Installing py.test script to /Library/Frameworks/Python.framework/Versions/2.7/bin
  Running setup.py install for py

pip install python-dateutil
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): six in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages (from python-dateutil)

pip install requests_oauthlib
Requirement already satisfied (use --upgrade to upgrade): requests-oauthlib in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages
Cleaning up…

I did an --upgrade anyway. The other requirements were OK.
Then I tried again to check whether this solved the reported issue. It didn't. The only change is in the results with --scrape, which now start from Nov 03 instead of Nov 04:

%23moncler%20%23report-20141105115907.json
  start: 529389017947582464 [Mon Nov 03 21:45:52 +0000 2014]
  end:   529949921546612736 [Wed Nov 05 10:54:41 +0000 2014]
  total: 6687

%23moncler%20%23report-20141105120627.json
  start: 529389017947582464 [Mon Nov 03 21:45:52 +0000 2014]
  end:   529949921546612736 [Wed Nov 05 10:54:41 +0000 2014]
  total: 6687

The main issue is that it re-downloads all the same tweets from the beginning instead of starting from the last saved ID.
It looks related to the file name '%23something' versus the query "#something", as described.
Testing again with "Keybase" instead of "#Keybase", it seems to work fine.
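
For reference, '%23' is the percent-encoding of '#', and '%20' of a space. If a tool encodes the query when writing a file but uses the raw query when looking for earlier files, a hashtag query would never match while a plain word like "Keybase" would. Whether twarc does this is an assumption, not something confirmed in this thread; the sketch below only shows the encoding itself, using Python 3's standard library.

```python
from urllib.parse import quote, unquote

query = "#moncler #report"
encoded = quote(query, safe="")

print(encoded)                    # %23moncler%20%23report
print(unquote(encoded))           # #moncler #report
print(quote("Keybase", safe=""))  # Keybase (nothing to encode)
```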

edsu · 2014-11-05T13:02:37Z

I'm afraid I still don't understand your problem. Would removing the --scrape functionality help you?

remagio · 2014-11-05T14:07:37Z

It doesn't help; it doesn't matter whether I use --scrape or not. By solving this issue I simply ran back into a previously opened issue, the one that made me stop using the earlier Debian box.

The initial issue was the missing "requests", on a new OSX box. The error appears only on the first run of a query, not on later executions of the same query.
Following your suggestion to check the requirements anyway, I ran the installation again (python setup.py install) and checked the requirements. Apparently all were satisfied. But when I tested each requirement directly, one by one (using pip install namepackage), it looked like the standard setup had missed only the "pytest" package, not "requests". Checking back through the console, there were no errors or abnormal output during any of the installs.

So I solved the initial issue about "requests", but a new issue appeared:
Twarc started to write the JSON file names using "%23" instead of "#", unlike a few minutes earlier, before step 2. I think this is the cause of the new issue: at every execution of the same query, in the same path, Twarc doesn't properly read the previous JSON file to check the last ID. It returns a new JSON file as if it were unable to read the previous JSON file name and every run were a first run.
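
To make the expected behaviour concrete, here is a rough sketch of resuming from the last saved ID. It assumes the files are named `<encoded query>-<timestamp>.json` (as in the listings in this thread) and hold one JSON object per line with an integer `id` field. `last_saved_id` is a hypothetical helper written for Python 3, not a twarc function.

```python
import glob
import json
import os
from urllib.parse import quote


def last_saved_id(query, directory="."):
    """Return the highest tweet id in earlier archive files for `query`, or None."""
    # Build the pattern from the encoded query so it matches the names on disk.
    prefix = glob.escape(os.path.join(directory, quote(query, safe="")))
    newest = None
    for path in glob.glob(prefix + "-*.json"):
        with open(path) as fh:
            for line in fh:
                if line.strip():
                    tweet_id = json.loads(line)["id"]
                    if newest is None or tweet_id > newest:
                        newest = tweet_id
    return newest


print(last_saved_id("#moncler #report"))
```

The point of the sketch is that the lookup uses the same encoding as the file names; a lookup built from the raw "#..." query would find nothing and every run would look like a first run.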

edsu closed this as completed Nov 4, 2014
