SNAPP Soil Organic Carbon -- Tweet Parsing and Analysis

This repository contains several scripts developed to process Twitter data and investigate how soil organic carbon and soil health are related.

Two data sources are used: archived Twitter data and data collected via the Twitter API.

For the archive data:

Main script: raw_data_processing.R
- reads raw Twitter datasets from different sources (JSON or CSV format)
- cleans and standardizes them so they can be merged
- runs a simple analysis of what the data look like
- to correct parsing errors (overlapping cells) in the CSV files derived from the API, use fixed_tweet.sh. Important: this script must be edited from the command line and NOT from R, because it deals with hidden characters.
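The clean-and-merge step described above can be sketched in base R as follows. This is a minimal, hypothetical illustration, not the code in raw_data_processing.R: the column names (`id_str`, `full_text`, `status_id`, `text`), the `standardize()` helper and the toy input rows are all assumptions. Each source is mapped onto a shared schema, embedded line breaks (one way hidden characters can split a CSV cell) are replaced with spaces, and duplicate tweet IDs are dropped after binding.

```r
# Hypothetical sketch: map each source onto a shared schema, then merge.
# Column names below are assumptions, not the repository's actual fields.

# map: named character vector; names = shared column, values = source column
standardize <- function(df, map) {
  out <- df[, unname(map), drop = FALSE]
  names(out) <- names(map)
  # Replace embedded tabs, carriage returns and newlines (which can break
  # CSV rows) with a single space, then trim surrounding whitespace
  out$text <- trimws(gsub("[\r\n\t]+", " ", out$text))
  out
}

# Toy inputs standing in for an archive (JSON-derived) and an API (CSV) table
archive <- data.frame(id_str = c("1", "2"),
                      full_text = c(" soil health\r\nmatters ", "cover crops"),
                      stringsAsFactors = FALSE)
api <- data.frame(status_id = c("2", "3"),
                  text = c("cover crops", "soil carbon"),
                  stringsAsFactors = FALSE)

merged <- rbind(
  standardize(archive, c(status_id = "id_str", text = "full_text")),
  standardize(api, c(status_id = "status_id", text = "text"))
)
# Keep only the first occurrence of each tweet ID
merged <- merged[!duplicated(merged$status_id), , drop = FALSE]
rownames(merged) <- NULL
print(merged)
```

Tweet IDs are kept as character strings here because they are 64-bit integers larger than the values R's doubles can represent exactly, so reading them as numbers can silently corrupt them and break deduplication.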

For the data collected via the Twitter API:

Main script: automate.R
- saves two outputs:
  (1) raw data from the Twitter API in .csv format, saved in /home/shares/soilcarbon/Twitter/API_csv/
  (2) the cleaned and standardized data, with duplicates removed, merged into the master files in /home/shares/soilcarbon/Twitter/Merged_v3/, overwriting the previous versions
- runs twice a week, collecting the last 6-9 days of Twitter data based on the query words in tag_list.csv
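Because each run looks back 6-9 days but runs happen twice a week, consecutive batches overlap, so the master files must be deduplicated on every update. The sketch below illustrates that idea in base R; it is hypothetical and not the code in automate.R. The `update_master()` and `build_query()` helpers, the `status_id` column, the toy batches and the use of a `tag` column in tag_list.csv are assumptions, and the quoted-phrase/OR query syntax assumes Twitter's standard search operators.

```r
# Hypothetical sketch of the incremental update; not the code in automate.R.
# The status_id column and the "tag" column of tag_list.csv are assumptions.

# Build a search string, wrapping each term in double quotes so multi-word
# terms are treated as phrases, and joining terms with OR
build_query <- function(tags) {
  paste0("\"", tags, "\"", collapse = " OR ")
}

# Append a new batch to the master file, dropping tweets already stored.
# Needed because the 6-9 day collection windows overlap between runs.
update_master <- function(new_tweets, master_path) {
  if (file.exists(master_path)) {
    # Read everything as character so long tweet IDs are not altered
    master <- read.csv(master_path, colClasses = "character")
    combined <- rbind(master, new_tweets)
  } else {
    combined <- new_tweets
  }
  # Existing master rows come first, so they win over re-collected copies
  combined <- combined[!duplicated(combined$status_id), , drop = FALSE]
  write.csv(combined, master_path, row.names = FALSE)
  invisible(combined)
}

# Usage with a temporary file and toy batches
tags <- c("soil health", "soil carbon")  # stand-in for read.csv("tag_list.csv")$tag
query <- build_query(tags)

master_path <- tempfile(fileext = ".csv")
batch1 <- data.frame(status_id = c("1", "2"), text = c("a", "b"),
                     stringsAsFactors = FALSE)
batch2 <- data.frame(status_id = c("2", "3"), text = c("b", "c"),
                     stringsAsFactors = FALSE)
update_master(batch1, master_path)
result <- update_master(batch2, master_path)
print(query)
print(result)
```

Keeping the existing master rows first means a tweet collected in an earlier run is never replaced by a later copy; reversing the `rbind()` order would instead keep the most recent version of each tweet.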

Initial data exploration:
- Data_viz_script.R: data visualization and exploration
- sentiment_test.R: explores text-mining options with the archived JSON data; it can be reproduced on the larger merged dataset.
More specific explorations and visualizations can be found in the following folders (see their respective READMEs for details on specific analyses):
- tweet_content: various ways of visualizing the content of tweets by category
- influencers: attempts to identify what type of content appeals to different user groups
- both folders rely on the functions in text_analysis_functions.R
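As an illustration of the kind of text mining these analyses build on, the base R sketch below counts the most frequent words in tweets for each user group. It is a hypothetical example, not a function from text_analysis_functions.R: the `top_words()` helper, its tokenizer, the short stop-word list, the `group` column and the toy tweets are all assumptions.

```r
# Hypothetical sketch of a simple text-mining helper; not a function from
# text_analysis_functions.R. The stop-word list and tokenizer are assumptions.
top_words <- function(text, n = 5,
                      stopwords = c("the", "a", "an", "and", "of", "to", "is")) {
  # Lower-case, then split on anything that is not a letter, digit, # or @
  words <- unlist(strsplit(tolower(text), "[^a-z0-9#@]+"))
  words <- words[nzchar(words) & !(words %in% stopwords)]
  head(sort(table(words), decreasing = TRUE), n)
}

# Toy tweets labelled with a hypothetical user group
tweets <- data.frame(
  group = c("farmer", "farmer", "scientist"),
  text  = c("Cover crops build soil health", "Soil health is the goal",
            "Soil organic carbon and the data"),
  stringsAsFactors = FALSE
)

# One word-frequency table per group
by_group <- lapply(split(tweets$text, tweets$group), top_words)
print(by_group)
```

Keeping `#` and `@` inside tokens preserves hashtags and mentions as distinct terms, which is usually what tweet-content comparisons between groups need.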

The translation folder contains scripts for translating Hindi text using Google Translate via its web interface.

The pre_processing folder contains scripts for specific tasks that are usually run only once.

Repository contents:
- figures/
- influencers/
- influential_tweets/
- influential_users/
- mapping/
- natural_language_processing/
- paper_figures/
- pre_processing/
- tables/
- time_series/
- translation/
- .gitignore
- Data_viz_script.R
- README.md
- automate.R
- cron_job.md
- data_version_check.Rmd
- locations.Rmd
- sentiment_test.R
- soc-twitter.Rproj
- tag_list.csv
- text_analysis_functions.R
