easternhercules/sentiment-analysis-of-human-arthropod-interactions

Sentiment Analysis of Human-Arthropod Interactions

Purpose

This repository contains the .rmd files with the code used in my Master's thesis, Sentiment Analysis of Human-Arthropod Interactions (2021). Figures in .jp2 format are collected in figures.zip.

Requirements

Summary of steps

  • Downloading iNaturalist data
  1. Select your genera (or other taxa) of interest
  2. Export observations using the Export Observations tool on iNaturalist (alternatively, you can export directly from GBIF's "iNaturalist Research-grade Observations" occurrence dataset)
  • Processing iNaturalist data
  1. If the export produced multiple data tables, merge them
  2. Remove any observations that are not usable (not research-grade, coordinates hidden because the observation is marked as private, etc.)
  3. Convert tabular entries to points using latitude/longitude
  4. Ensure the coordinate reference systems (CRS) match, check for spatial overlap with a basemap, and clip to the area of interest
  5. Normalize by census data
  6. Create maps/graphs; if your results look off, return to the previous steps and re-examine your dataset
  7. Export data from R and import to GeoDa for running Moran's I spatial autocorrelation
  • Downloading Twitter data
  1. Install libraries and update packages
  2. Authenticate Twitter access and check token
  3. Get tweets using your parameters (use as few parameters as possible; it is better to have more data and cut down your dataset than to not have enough)
  4. Save tweets to CSV
  5. Read the CSV back in (if the dataset is large, you may need to break the original CSV into parts and combine them with rbind)
  • Processing Twitter data
  1. Check for and remove duplicate tweets (using isUnique on the text column; visually examine the duplicates before removing them)
  2. Remove any tweets that are not relevant to your topic (using grepl on the text column to find keywords)
  3. Get latitude/longitude from the bounding box coordinates: separate the bbox coords column, delete any unnecessary new columns, convert the new columns to numeric, get the centroid of each bbox using (minlong+maxlong)/2 and (minlat+maxlat)/2, then check for and drop any entries where is.na is TRUE
  4. Get state names: determine which entries do not have them, add a projection to the coordinates from the previous step, check for spatial overlap with a basemap, and use over to copy states to a new column
  5. Normalize by census data
  6. Create maps/graphs and perform sentiment analysis; if your results look off, return to the previous steps and re-examine your dataset
  7. Export data from R and import to GeoDa for running Moran's I spatial autocorrelation
  • Combination
  1. For multi-layered analysis, you can overlay the results from processing the iNaturalist data and the Twitter data. You can also choose to incorporate Google Trends data
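
As a rough illustration of the first three "Processing Twitter data" steps above (removing duplicates, filtering by keyword, and taking bounding box centroids), here is a minimal Python sketch. It is not the thesis code, which is in R. The field names `text` and `bbox`, the `[minlong, minlat, maxlong, maxlat]` ordering, and the case-insensitive keyword matching are all assumptions made for the example.

```python
import re


def split_duplicates(tweets):
    """Split tweets into (unique, duplicates) by exact match on 'text'.

    The first occurrence of each text is kept; later copies are returned
    separately so they can be inspected before being discarded.
    """
    seen = set()
    unique, duplicates = [], []
    for tweet in tweets:
        if tweet["text"] in seen:
            duplicates.append(tweet)
        else:
            seen.add(tweet["text"])
            unique.append(tweet)
    return unique, duplicates


def keep_relevant(tweets, keywords):
    """Keep tweets whose text contains at least one keyword.

    Matching is case-insensitive here; that is an assumption of this sketch.
    """
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [t for t in tweets if pattern.search(t["text"])]


def bbox_centroid(bbox):
    """Return the (longitude, latitude) centroid of a bounding box.

    Assumes bbox is [minlong, minlat, maxlong, maxlat]; values may be strings.
    Returns None when the box or any of its values is missing.
    """
    if bbox is None or any(v is None for v in bbox):
        return None
    min_long, min_lat, max_long, max_lat = (float(v) for v in bbox)
    return ((min_long + max_long) / 2, (min_lat + max_lat) / 2)


def add_centroids(tweets):
    """Attach centroid coordinates and drop tweets without a usable bbox."""
    located = []
    for tweet in tweets:
        centroid = bbox_centroid(tweet.get("bbox"))
        if centroid is not None:
            located.append({**tweet, "longitude": centroid[0], "latitude": centroid[1]})
    return located
```

Returning the duplicates instead of silently dropping them mirrors the advice to examine them visually before removal.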

These steps may change in the future, with some added or removed, as Twitter adds or deprecates features and attributes after the v1.1 API.
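
Both processing lists above include a "Normalize by census data" step. The README does not say which census variable or scaling was used, so the following Python sketch is only one plausible reading: it assumes normalization means counts per 100,000 residents of each state. The function name, the `per` parameter, and the dictionary layout are hypothetical, and the thesis code itself is in R.

```python
from collections import Counter


def counts_per_capita(states, population, per=100_000):
    """Return observations (or tweets) per `per` residents for each state.

    states     -- iterable with one state name per observation or tweet
    population -- dict mapping state name to its census population
    States with a missing or zero population are skipped.
    """
    counts = Counter(states)
    return {
        state: count * per / population[state]
        for state, count in counts.items()
        if population.get(state)
    }
```

For example, with placeholder values, `counts_per_capita(["A", "A", "B"], {"A": 200_000, "B": 50_000})` gives a rate of 1.0 for `A` and 2.0 for `B`, which shows why raw counts alone can be misleading for populous states.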

Questions/comments

@ me on Twitter or email julianholman96 at gmail dot com.