feat (data): script to request tweets from twitter API #75

Closed
guillaume-salle opened this issue Mar 22, 2022 · 3 comments · Fixed by #83
Labels: feature (New feature), fixme (This issue will be soon fixed)

Comments

guillaume-salle commented Mar 22, 2022

Objective: Build the database and collect data covering as many days as possible.

📖 Describe what you want

Update the dataset script so that it can request specific tweets from the Twitter API based on their date or ID.

The script MUST save ALL the tweets it receives into csv files in the data/raw/twitter directory, with the date and ID of the first and last tweet specified (in the file name?). Possible formats:

  • data/raw/twitter/[candidat_name]_[startdate]_[enddate].csv
  • data/raw/twitter/[candidat_name]_[first_id_tweet]_[last_id_tweet].csv
  • data/raw/twitter/candidat_name/[startdate]_[enddate]_[first_id_tweet]_[last_id_tweet].csv
  • data/raw/twitter/week_#x/[candidat_name]_[startdate]_[enddate]_[first_id_tweet]_[last_id_tweet].csv
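As an illustration only, the third layout above could be produced by a small helper like the one below. The function name `build_filename`, its parameters, and the timestamp format are assumptions made for this sketch; no format has been chosen yet.

```python
from datetime import datetime
from pathlib import Path

RAW_DIR = Path("data/raw/twitter")  # directory named in this issue


def build_filename(candidate: str, start: datetime, end: datetime,
                   first_id: int, last_id: int) -> Path:
    """Build <RAW_DIR>/<candidate>/<start>_<end>_<first_id>_<last_id>.csv.

    Hypothetical helper: the naming scheme is one of the proposals in this
    issue, and the compact timestamp format is an assumption.
    """
    stamp = "%Y%m%dT%H%M"  # sortable and free of characters unsafe in paths
    name = f"{start.strftime(stamp)}_{end.strftime(stamp)}_{first_id}_{last_id}.csv"
    return RAW_DIR / candidate / name
```

Putting the dates first keeps the files of one candidate sorted chronologically in a directory listing, which is one argument for this layout over the ID-first one.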

One point needs particular care: the script should collect results in small chunks and save them little by little, to avoid issues related to cache memory, disk space, or similar. Create a tmp directory where the small portions are stored; after that, the script concatenates them into the final csv file.

This script should be designed to be launched periodically, every week (or every day?), and to collect a specified number of tweets about each candidate. The number of tweets per day and per candidate is yet to be determined.

✔️ Definition of done

  • a functioning script is written,
  • a format for the filename is chosen,
  • the script creates a tmp directory where it saves small chunks of the total results,
  • the script concatenates all the chunks into a final csv file.
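The chunk-then-concatenate flow asked for in this issue could look roughly like the sketch below. It uses only the standard library. The column list `FIELDS`, the function names, the chunk file naming, and the `cleanup` flag are all assumptions for illustration, not the existing script's code.

```python
import csv
import shutil
from pathlib import Path
from typing import Dict, List

FIELDS = ["id", "created_at", "text"]  # assumed columns, adapt to the real payload


def save_chunk(rows: List[Dict[str, str]], tmp_dir: Path, index: int) -> Path:
    """Write one small batch of tweets to its own csv file in tmp_dir."""
    tmp_dir.mkdir(parents=True, exist_ok=True)
    path = tmp_dir / f"chunk_{index:05d}.csv"  # zero-padded so sorting keeps order
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return path


def concatenate_chunks(tmp_dir: Path, final_path: Path, cleanup: bool = True) -> int:
    """Merge every chunk into final_path and return the number of rows written."""
    final_path.parent.mkdir(parents=True, exist_ok=True)
    total = 0
    with final_path.open("w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        for chunk in sorted(tmp_dir.glob("chunk_*.csv")):
            with chunk.open(newline="", encoding="utf-8") as fh:
                # Stream row by row so no chunk is ever fully held in memory.
                for row in csv.DictReader(fh):
                    writer.writerow(row)
                    total += 1
    if cleanup:
        shutil.rmtree(tmp_dir)  # assumption: chunks are disposable once merged
    return total
```

Calling `save_chunk` after each page of API results means an interrupted run loses at most one page, and the returned row count gives the tester a quick check against the number of tweets requested.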

This script should FIRST be tested ONLY with a small number of tweets requested from the API, in order to preserve the number of tweets we can still request: for instance 1k tweets for 2 or 3 candidates. The person testing the script should carefully check the points above.

This script will be used for larger amounts after the pull request is validated.

madvid commented Mar 22, 2022

Actually, you do not need to write a new script, only add a feature to the existing one.

madvid commented Mar 24, 2022

The request:

poetry run python -m src data --download twitter --mention Melenchon --start_time '2022-03-18 8:00' --end_time '2022-03-18 22:00'
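For context, the Twitter API v2 search endpoints expect `start_time` and `end_time` as RFC 3339 timestamps (`YYYY-MM-DDTHH:mm:ssZ`), so values typed as in the command above have to be converted at some point. A minimal sketch of that conversion follows. The helper name `to_api_timestamp` is hypothetical, and treating the input as UTC is an assumption; how the existing script handles this is not shown in this thread.

```python
from datetime import datetime, timezone


def to_api_timestamp(value: str) -> str:
    """Convert 'YYYY-MM-DD H:MM' (as typed on the command line) to RFC 3339.

    Assumption: the input is interpreted as UTC; adjust if local time is meant.
    """
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M")
    return parsed.replace(tzinfo=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```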

madvid commented Mar 25, 2022

At this time, the part concerning:

A particular point must be considered: the script should collect small chunks of results in order to save all the results little by little to avoid issue related to cache memory, disk memory or whatever: Create a tmp directory where the little portions are stored and after that the script.

is not implemented yet.
