Objective: Build the database and have data for as many days as possible.
📖 Describe what you want
Update the dataset script to request specific tweets from the Twitter API based on their date or ID.
The script MUST save ALL the tweets received into csv files in the `data/raw/twitter` directory, with the date and ID of the first and last tweet specified (in the file name?). Possible formats:

- `data/raw/twitter/[candidat_name]_[startdate]_[enddate].csv`
- `data/raw/twitter/[candidat_name]_[first_id_tweet]_[last_id_tweet].csv`
- `data/raw/twitter/candidat_name/[startdate]_[enddate]_[first_id_tweet]_[last_id_tweet].csv`
- `data/raw/twitter/week_#x/[candidat_name]_[startdate]_[enddate]_[first_id_tweet]_[last_id_tweet].csv`
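As a sketch only (the format is still an open question), a helper like the hypothetical `chunk_filename` below could build one of the proposed names, here the per-candidate-directory variant with dates and tweet IDs:

```python
from datetime import date


def chunk_filename(candidate: str, start: date, end: date,
                   first_id: int, last_id: int) -> str:
    """Build a csv path for one candidate's tweets.

    Follows the proposed format:
    data/raw/twitter/candidat_name/[startdate]_[enddate]_[first_id]_[last_id].csv
    """
    return (f"data/raw/twitter/{candidate}/"
            f"{start.isoformat()}_{end.isoformat()}_{first_id}_{last_id}.csv")
```

Keeping the candidate as a directory makes per-candidate cleanup and listing simpler than packing everything into one flat file name.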
A particular point must be considered: the script should collect results in small chunks and save them little by little, to avoid issues with cache memory, disk space, or similar limits. Create a `tmp` directory where the small portions are stored; afterwards the script concatenates them into the final csv file.
This script should be designed to be launched periodically, every week (or every day?), and to collect a specified number of tweets about each candidate. These amounts of tweets per day and per candidate are yet to be determined.
✔️ Definition of done
a functioning script is written,
a format for the filename is chosen,
the script creates a `tmp` directory where it saves small chunks of the total results,
the script concatenates all the chunks into a final csv file.
This script should FIRST be tested ONLY with small numbers of tweets requested from the API, in order to conserve the number of tweets we can request: for instance, 1k tweets for 2 or 3 candidates. The person testing the script should carefully check the points above.
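One way to enforce such a test cap is to pass a hard limit into the collection loop. In this sketch, `fetch_tweets` is a hypothetical callable standing in for the real Twitter API wrapper (it takes a candidate name and a maximum batch size and returns a list of rows); the loop never requests more than `limit` tweets in total:

```python
TEST_LIMIT = 1000  # small cap while testing, to conserve API quota


def collect_for_candidate(candidate, fetch_tweets, limit=TEST_LIMIT):
    """Collect at most `limit` tweets for one candidate, in small batches.

    `fetch_tweets(candidate, max_count)` is a hypothetical stand-in for the
    real API call; it returns a (possibly shorter or empty) list of rows.
    """
    collected = []
    while len(collected) < limit:
        batch = fetch_tweets(candidate, min(100, limit - len(collected)))
        if not batch:  # API exhausted before reaching the cap
            break
        collected.extend(batch)
    return collected[:limit]
```

Raising the limit for production then only means changing one number after the pull request is validated, rather than editing the loop itself.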
This script will be used for larger amounts after the pull request is validated.