Security and Deployment Recommendations: Addressing Personal Identifiable Information (PII) and Enhancing User Guidance in GitHub Repository #251

Open
DoriKyll opened this issue Jan 28, 2024 · 0 comments


Summary:

This security and deployment report addresses critical findings in the GitHub repository of the health4all project. The identified issues include the inadvertent inclusion of Personally Identifiable Information (PII) and other sensitive data, such as patient medical details, in the SQL database dump. The recommendations cover both removing the exposed data and rewriting the Git history so that the leaked data no longer appears in earlier commits.

Furthermore, the report introduces recommendations to streamline the deployment process, discouraging users from forking the project for deployment purposes. Instead, it advocates for alternative deployment methods, such as using Docker or provided deployment scripts, to enhance efficiency and resource optimization. A dedicated section for contributors outlines the importance of open communication, reasons for forking, and references to contribution guidelines.

The overarching goal is to fortify the security posture of the health4all project while providing clear and user-friendly instructions for both deployment and contribution processes.

Details:

Findings:

  1. PII Leak:

    • The SQL database dump file (/db/health4all.sql) contains sensitive data, including PII of potential patients and volunteers in various tables:
      • helpline number and name in the helpline table (line 2606)
      • phone numbers in the helpline_call table (lines 2648, 2649)
      • emails, user IDs and personal notes in the helpline_email table (lines 2752 to 3151)
      • helpline phone numbers in the helpline_numbers table (lines 3177, 3178)
      • volunteer information in the helpline_receiver table (line 3202)
      • personal data in the patient table (lines 16282 to 16290)
      • patient IDs and symptoms in the patient_visit table (lines 16474 to 16481)
      • patient prescriptions in the prescription table (line 16549)
  2. Test Data:

    • Test data is present in several tables (area, department, helpline, hospital, patient, etc.) of the SQL database dump:
      • in table area (line 1350)
      • in table department (line 1963)
      • in hospital table (lines 3259, 3260)
      • in patient table (lines 16282 to 16290)
  3. Admin User in User Table:

    • An admin user is present in the user table of the SQL database dump.
  4. Insecure Password Storage:

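    • User passwords in the user table are stored as MD5 hashes. MD5 is a fast, general-purpose hash function that is unsuitable for password storage because hashes can be brute-forced cheaply.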
  5. Log File:

    • The repository includes a log file (error_log) containing potentially sensitive information.

Recommendations:

1. PII Leak:

  • Immediate Action:
    • Remove the SQL database dump file from the repository promptly.
    • Conduct a thorough data review before future uploads to prevent sensitive information inclusion.
    • Retroactive Correction in Git:
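      • Rewrite the Git history so that the leaked PII is removed from every commit since the initial commit, not only from the latest one.
      • Example: Use a dedicated history-rewriting tool such as git filter-repo (for example, git filter-repo --invert-paths --path db/health4all.sql), which the Git documentation recommends over git filter-branch. Git interactive rebase is practical only when a few recent commits are affected. The rewritten branches then have to be force-pushed, and existing clones have to be re-cloned or reset.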
    • Consider Breaking the health4all.sql File:
      • If possible, consider breaking the health4all.sql file into smaller ones specific to each table or action for better manageability.
      • Example: Separate the SQL dump into individual files like patients.sql, departments.sql, etc.
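The data review recommended above could be partly automated. The following is a minimal Python sketch, not part of the health4all project, that flags lines of a SQL dump containing email-like or phone-like strings. The pattern set and the function names are illustrative assumptions, and a clean result would not prove that a dump is free of PII.

```python
import re

# Hypothetical, deliberately simple patterns; a real review needs broader rules
# (names, addresses, free-text notes) and manual inspection.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"(?<!\d)\+?\d{10,13}(?!\d)"),
}


def scan_sql_text(sql_text):
    """Return (line_number, kind) pairs for lines that look like they hold PII.

    The matched values are intentionally not returned, so the report itself
    does not copy sensitive data into logs or CI output.
    """
    findings = []
    for line_number, line in enumerate(sql_text.splitlines(), start=1):
        for kind, pattern in PII_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, kind))
    return findings


def scan_sql_file(path):
    """Scan a dump file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return scan_sql_text(handle.read())
```

A check like this could run before each commit that touches the db/ folder, for example as `scan_sql_file("db/health4all.sql")`, with a non-empty result blocking the upload until someone has reviewed the flagged lines.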

2. Test Data:

  • Data Clean-Up Process:
    • Implement a process to exclude test data from the SQL dump.
    • Develop a testing script that inserts sanitized dummy data for testing purposes.
    • Example: Use SQL scripts or automation tools to remove or obfuscate test data before generating the dump.
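As an illustration of the testing-script idea, here is a minimal Python sketch that generates obviously synthetic patient rows and loads them into a database. It rests on assumptions: the column names (patient_id, name, phone) are hypothetical, and an in-memory SQLite database stands in for the project's real database so that the sketch stays self-contained.

```python
import random
import sqlite3

# Placeholder vocab that cannot be mistaken for real people.
FIRST_NAMES = ["Test", "Sample", "Demo"]
LAST_NAMES = ["Patient", "Person", "Record"]


def make_dummy_patients(count, seed=0):
    """Build `count` synthetic rows; a fixed seed keeps test data reproducible."""
    rng = random.Random(seed)
    rows = []
    for index in range(1, count + 1):
        name = f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)} {index}"
        phone = f"555{index:07d}"  # clearly fake, derived from the row index
        rows.append((index, name, phone))
    return rows


def load_dummy_patients(connection, rows):
    """Create a hypothetical patient table (if missing) and insert the rows."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS patient ("
        "patient_id INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
    )
    connection.executemany("INSERT INTO patient VALUES (?, ?, ?)", rows)
    connection.commit()
```

Keeping the generator separate from the loader means the same synthetic rows could be written to any database the project actually uses, while the schema-only dump in the repository stays free of data.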

3. Admin User in User Table:

  • Secure User Management:
    • Remove the admin user from the database.
    • Implement a setup script for secure admin user creation with a strong, unique password.
    • Example: Create a script that prompts the user to enter a secure password or generates one randomly for the admin account.
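A setup script along these lines might look like the following Python sketch. It is an assumption-laden illustration rather than the project's actual tooling: the minimum length, the function names, and the prompt text are all hypothetical, and storing the resulting account is left to the project's own user-management code.

```python
import getpass
import secrets

MIN_LENGTH = 12  # hypothetical policy; adjust to the project's requirements


def generate_admin_password(num_bytes=18):
    """Generate a random URL-safe password from a cryptographic RNG."""
    return secrets.token_urlsafe(num_bytes)


def resolve_admin_password(entered):
    """Return (password, was_generated).

    An empty input means "generate one for me"; a non-empty input that is
    shorter than MIN_LENGTH is rejected instead of being silently accepted.
    """
    if entered == "":
        return generate_admin_password(), True
    if len(entered) < MIN_LENGTH:
        raise ValueError(f"password must be at least {MIN_LENGTH} characters")
    return entered, False


def main():
    # getpass keeps the typed password out of the terminal echo.
    entered = getpass.getpass("Admin password (leave empty to generate one): ")
    password, generated = resolve_admin_password(entered)
    if generated:
        print("Generated admin password (store it in a password manager):", password)
    # Hashing the password and inserting the admin row is left to the
    # project's own user-management code.
```

Calling `main()` during installation would replace the admin row shipped in the dump, so every deployment ends up with its own credential instead of a shared, publicly known one.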

4. Insecure Password Storage:

  • Upgrade Password Storage:
    • Upgrade password storage to use a robust hashing algorithm (e.g., bcrypt or Argon2).
    • Example: Use a library or framework that supports modern password hashing algorithms, ensuring secure storage.
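To make the upgrade path concrete, here is a hedged Python sketch of salted, memory-hard password hashing plus a rehash-on-login migration. Two things are assumptions: scrypt from the Python standard library stands in for bcrypt or Argon2 (which need third-party packages), and the legacy column is assumed to hold an unsalted hex MD5 digest of the password. The record format and function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Commonly used interactive-login cost parameters; tune for the target hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1}


def hash_password(password):
    """Return a self-describing record: scheme$salt_hex$digest_hex."""
    salt = secrets.token_bytes(16)  # unique random salt per password
    digest = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, dklen=32, **SCRYPT_PARAMS
    )
    return f"scrypt${salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Recompute the digest with the stored salt and compare in constant time."""
    scheme, salt_hex, digest_hex = stored.split("$")
    if scheme != "scrypt":
        return False
    candidate = hashlib.scrypt(
        password.encode("utf-8"),
        salt=bytes.fromhex(salt_hex),
        dklen=32,
        **SCRYPT_PARAMS,
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))


def upgrade_legacy_md5(password, legacy_md5_hex):
    """On a successful login against the assumed legacy MD5 hash, return a new
    scrypt record to store in its place; return None if the password is wrong."""
    legacy = hashlib.md5(password.encode("utf-8")).hexdigest()
    if hmac.compare_digest(legacy, legacy_md5_hex.lower()):
        return hash_password(password)
    return None
```

Rehashing at login avoids forcing a password reset for every user, but accounts that never log in keep their MD5 hash, so a cut-off date after which remaining MD5 hashes are invalidated is worth considering.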

5. Log File:

  • Log File Management:
    • Exclude the log file from version control by adding it to .gitignore.
    • Rename the log file to error.log for better identification.
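    • Example: Add the log file name (error.log, or a pattern such as *.log) to the .gitignore file and rename the log file using a command like mv error_log error.log. Because error_log is already tracked, it also has to be removed from the index (git rm --cached error_log) before the ignore rule takes effect. See the gitignore pattern guide.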
    • Retroactive Correction in Git:
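      • Rewrite the Git history so that the log file is removed from every commit since the initial commit.
      • Example: Use a dedicated history-rewriting tool such as git filter-repo (for example, git filter-repo --invert-paths --path error_log), which the Git documentation recommends over git filter-branch. Git interactive rebase is practical only when a few recent commits are affected.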
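The .gitignore step can be scripted so that it is safe to run repeatedly. This is a small Python sketch under the assumption that the log file ends up named error.log; the function name is hypothetical.

```python
from pathlib import Path


def ensure_ignored(gitignore_path, pattern):
    """Append `pattern` to a .gitignore file unless an identical line exists.

    Returns True if the file was changed, False if the pattern was already
    present, so repeated runs do not create duplicate entries.
    """
    path = Path(gitignore_path)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    if pattern in (line.strip() for line in existing.splitlines()):
        return False
    # Make sure the new entry starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    path.write_text(existing + separator + pattern + "\n", encoding="utf-8")
    return True
```

For example, `ensure_ignored(".gitignore", "error.log")` adds the entry once. An ignore rule only affects untracked files, so a log file that is already committed still has to be removed from the index and from the history as described above.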

6. Apply Changes to Forks:

  • Provide instructions to contributors and users of forks to apply similar retroactive changes.
  • Suggest they follow the steps outlined for the original repository in their respective forked repositories.

7. Streamlined Deployment Instructions:

  • Update Deployment Instructions:
    • Revise the "Steps to be followed" section to strongly discourage users from forking the repository for deployment. Provide clear instructions for deploying the solution, emphasizing reasons to avoid unnecessary forks.
    • Example: Avoid forking the project for deployment purposes. Instead, follow the steps below for a more straightforward setup. Forking is discouraged for deployment due to the following reasons:
      • Efficiency: Forking introduces additional steps that can be avoided for a more efficient deployment process.
      • Maintenance: Deploying without forking ensures users receive updates and improvements without the need for manual synchronization.
      • Resource Optimization: Forking increases repository duplication, utilizing additional storage and bandwidth resources.
    • Highlight Deployment Alternatives:
      • Emphasize the benefits of deploying the solution without forking the project, providing users with compelling reasons to choose alternative deployment methods.
      • Example: Opt for a streamlined deployment process using Docker or provided deployment scripts. Avoid unnecessary forks to:
        • Simplify Deployment: Deploying directly reduces complexity and accelerates the setup process.
        • Stay Up-to-Date: Direct deployment ensures users automatically benefit from updates and enhancements.
        • Reduce Redundancy: Forking introduces redundancy, consuming additional resources and complicating maintenance.

8. Dedicated Contributors Section:

  • Create Contributors Section:
    • Introduce a new section in the README specifically for contributors, providing clear reasons to fork the project and submit changes.
    • Example:
      • For Contributors:
        • Open an issue if you have questions or if there's a specific need in the project.
        • Fork the project only if you plan to actively contribute code changes or enhancements; forking purely for deployment is discouraged. Forking is appropriate in the following cases:
          • Development Collaboration: Forking is encouraged for contributors actively participating in the development and collaboration process.
          • Isolation of Changes: Forking allows contributors to isolate changes before submitting them for review.
          • Code Contribution: Forking is essential when planning to contribute to the codebase by submitting pull requests.
    • Emphasize Communication:
      • Reiterate the importance of open communication with contributors, providing context for the reasons behind the recommendation.
      • Example: Engage in discussions in the issues section to ensure alignment with project goals before initiating development efforts. Effective communication helps in:
        • Aligning Contributions: Discussing changes beforehand ensures alignment with project objectives.
        • Avoiding Redundancy: Communication helps prevent redundant efforts and enhances collaboration.
    • Reference Contribution Guidelines:
      • Provide links to contribution guidelines, explaining the standards contributors should follow when submitting changes.
      • Example: Reference contribution guidelines to maintain consistency, quality, and adherence to project standards.

Steps to Reproduce:

  1. Go to the public GitHub repo: https://github.com/UCDS/health4all_v3.
  2. Navigate to the db/ folder.
  3. Open the health4all.sql file.
  4. Inspect the specified lines for leaked data in various tables.
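Because the dump is more than 16,000 lines long, maintainers verifying the findings on a local clone may prefer to pull out only the reported line ranges. The following Python helper is a sketch for that purpose; the function name is hypothetical, and its output contains the sensitive data itself, so it should not be pasted into public comments.

```python
from itertools import islice


def read_line_range(path, first, last):
    """Return lines first..last (1-indexed, inclusive) of a text file.

    islice streams through the file, so the whole dump is never held in memory.
    """
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    with open(path, encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, first - 1, last)]
```

For instance, `read_line_range("db/health4all.sql", 16282, 16290)` would return the lines reported above for the patient table.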

Disclaimer:

This document serves as an exploratory overview and is not intended to be construed as a comprehensive cybersecurity report. The examination conducted focused primarily on a rapid exploration of the source code within the repository. While concerning findings were identified within the SQL dump, no observations or assessments were made within the application source code. It is important to note that proper cybersecurity assessments involve comprehensive testing of the application's security, including the use of appropriate tools and methodologies. Additionally, the application was not built and deployed for thorough testing. Therefore, this document should not be considered a formal cybersecurity report. For any further inquiries or clarifications, please contact DoriKyll.
