- Store all data files locally and DO NOT upload raw data to GitHub or any public repository
- If version control is needed, use `.gitignore` to exclude data files
- Consider using sample data or anonymized subsets for documentation purposes
- Remove entries for participants who did not consent to data usage
- Document the process of consent verification
- Keep a separate record of excluded entries for audit purposes
- Maintain a log of when and why data was excluded
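The consent steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed workflow: the field names `participant_id` and `consent`, the rule that only an explicit "yes" counts as consent, and the helper name `split_by_consent` are all assumptions made for the example.

```python
from datetime import date


def split_by_consent(rows, consent_field="consent", today=None):
    """Separate consenting rows from excluded ones and log each exclusion.

    Field names and the "yes" convention are hypothetical; adapt them to
    the actual survey export.
    """
    today = today or date.today()
    kept, excluded, log = [], [], []
    for row in rows:
        # Anything other than an explicit "yes" is treated as no consent.
        if str(row.get(consent_field, "")).strip().lower() == "yes":
            kept.append(row)
        else:
            excluded.append(row)
            log.append({
                "participant_id": row.get("participant_id"),
                "date": today.isoformat(),
                "reason": "no recorded consent",
            })
    return kept, excluded, log


rows = [
    {"participant_id": "p1", "consent": "Yes"},
    {"participant_id": "p2", "consent": "no"},
    {"participant_id": "p3"},  # consent field missing entirely
]
kept, excluded, exclusion_log = split_by_consent(rows, today=date(2024, 1, 15))
```

Keeping `excluded` and `exclusion_log` as separate outputs mirrors the audit record and the when-and-why log listed above; both should stay in local storage alongside the raw data.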
- Review and standardize educational fields:
  - Create a mapping table for similar entries (e.g., "Computer Science" = "CS" = "CS Vanderbilt")
  - Document all equivalences in a separate reference file
  - Include original and standardized values in the cleaning log
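One possible shape for the mapping table is a plain dictionary keyed on a normalized form of the entry. The sketch below assumes that lowercasing and collapsing whitespace is enough normalization; `EDUCATION_MAP` and `standardize_field` are hypothetical names, and the entries are taken from the example above.

```python
# Hypothetical mapping table: normalized entry -> standardized value.
EDUCATION_MAP = {
    "computer science": "Computer Science",
    "cs": "Computer Science",
    "cs vanderbilt": "Computer Science",
}


def standardize_field(value, mapping=EDUCATION_MAP):
    """Return (original, standardized) so both can go into the cleaning log."""
    # Collapse repeated whitespace and ignore case before the lookup.
    key = " ".join(value.split()).lower()
    # Unmapped entries are returned unchanged so they can be reviewed by hand.
    return value, mapping.get(key, value)


pairs = [standardize_field(v) for v in ["CS", "  cs   Vanderbilt ", "Physics"]]
```

Returning the original value alongside the standardized one makes it straightforward to record both, and leaving unmapped values untouched avoids silently guessing at equivalences.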
- Create a data cleaning log with:
  - Original value
  - Standardized value
  - Reason for change
  - Date of modification
  - Person responsible for the change
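The five log fields above map naturally onto one CSV row per change. The following is a rough sketch using only the standard library; the class name `CleaningLogEntry`, the column names, and the sample values are assumptions for illustration.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import date


@dataclass
class CleaningLogEntry:
    """One row of the cleaning log (column names are hypothetical)."""
    original_value: str
    standardized_value: str
    reason: str
    modified_on: str  # ISO date string
    modified_by: str


def write_cleaning_log(entries, handle):
    """Write entries as CSV with a header row to an open text handle."""
    columns = [f.name for f in fields(CleaningLogEntry)]
    writer = csv.DictWriter(handle, fieldnames=columns)
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))


entry = CleaningLogEntry(
    original_value="CS Vanderbilt",
    standardized_value="Computer Science",
    reason="Matches mapping table",
    modified_on=date(2024, 1, 15).isoformat(),
    modified_by="analyst_a",
)
buffer = io.StringIO()  # stands in for a local file opened with newline=""
write_cleaning_log([entry], buffer)
log_text = buffer.getvalue()
```

In practice the handle would be a file kept in local storage with the raw data, since original values may themselves be sensitive.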
- Calculate basic statistics:
  - Age distribution
  - Gender representation
  - Geographic distribution
  - Educational background
  - Professional experience
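As a starting point, the basic statistics listed above can be computed with the standard library alone. The sketch below assumes each cleaned record is a dictionary with hypothetical keys `age`, `gender`, `region`, and `education`; missing categorical values are counted as "unknown" rather than dropped.

```python
from collections import Counter
from statistics import mean, median


def summarize(rows):
    """Return simple descriptive statistics for a list of record dicts."""
    # Skip missing ages so they do not distort the mean and median.
    ages = [row["age"] for row in rows if row.get("age") is not None]
    return {
        "n": len(rows),
        "age_mean": mean(ages) if ages else None,
        "age_median": median(ages) if ages else None,
        "gender": Counter(row.get("gender", "unknown") for row in rows),
        "region": Counter(row.get("region", "unknown") for row in rows),
        "education": Counter(row.get("education", "unknown") for row in rows),
    }


sample = [
    {"age": 22, "gender": "F", "region": "South", "education": "Computer Science"},
    {"age": 30, "gender": "M", "region": "South", "education": "Physics"},
    {"age": 26, "gender": "F", "region": "West"},  # education missing
]
summary = summarize(sample)
```

Counting missing values explicitly keeps the totals consistent with `n` and makes gaps in the data visible in the summary itself.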
- Document any assumptions made during analysis
- Include clear labels and legends
- Note excluded categories or filtered data
- Provide context for interpretations
- Data exclusions documented
- Manual entry standardizations logged
- Privacy requirements met
- Analysis code version controlled
- Data cleaning steps documented
- Statistical assumptions stated
- Results independently verified